A personal AI agent with a two-agent architecture, self-organizing memory based on a Rutgers research paper, and proactive scheduling. She lives on Telegram, remembers everything, and reaches out on her own.
I've thought for a while that chat is not the final form for AI interfaces. Karpathy said in a talk that the GUI for LLMs has not been created yet, and I think he's right. I remember being in awe the first time ChatGPT executed Python and ran web searches mid-conversation; that was the first glimpse of something bigger.
But there's still a fundamental problem with ChatGPT, Claude, all of them: the lack of real memory. Yes, the platforms have memory, but it doesn't feel accurate. It appears to be an LLM call over your recent chats that condenses who you are into a few paragraphs of text. For basic information this works fine. Claude knows I'm a software engineer and roughly the last few things I discussed with him. But I want my agent to be more in tune with what's going on in my life.
I am constantly thinking — coming up with ideas for things to make, stewing on X posts about AI, thinking about the future of software, dreaming up new business ideas, brain-dumping into voice notes. I want an agent that is up to speed on what's going on in my head. I think this might provide some interesting insights over time. I've always wanted a Spotify Wrapped for my life, and I think Astrud might be able to provide it.

This was the breakthrough that made everything work. I discovered that you cannot expect an LLM to simultaneously have a concise, human-feeling conversational persona and rich tool-calling behavior. When I prompted a single agent to be concise like a human, it became concise with its tool calls too — refusing to do the heavy memory and execution work.
The solution: split them. An interaction agent handles conversation — its entire existence revolves around feeling human, matching the user's tone and verbosity. An execution agent works behind the curtain, processing delegated tasks with no constraints on thoroughness. When you send a long voice note, the interaction agent acknowledges instantly and delegates the heavy memory work to execution, so it doesn't block conversation.
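Here's a simplified sketch of what that handoff can look like on the interaction side. The tool name delegate_task and the in-memory queue are illustrative placeholders, not the actual implementation:

```typescript
// Simplified sketch of the interaction agent's delegation tool.
// Names (delegate_task, executionQueue) are illustrative, not the real code.

interface DelegatedTask {
  description: string; // what the execution agent should do
  context: string;     // minimal context, not a shared chat thread
  createdAt: number;
}

const executionQueue: DelegatedTask[] = [];

// Tool exposed to the interaction agent's model. It returns immediately,
// so the conversational reply never blocks on heavy work.
const delegateTaskTool = {
  name: "delegate_task",
  description:
    "Hand off heavy work (memory storage, multi-step tasks, research) to the execution agent.",
  parameters: {
    type: "object",
    properties: {
      description: { type: "string" },
      context: { type: "string" },
    },
    required: ["description"],
  },
  handler: async (args: { description: string; context?: string }) => {
    executionQueue.push({
      description: args.description,
      context: args.context ?? "",
      createdAt: Date.now(),
    });
    return { queued: true }; // the agent can say "on it" and keep chatting
  },
};
```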
I landed on this approach after discovering that Poke — an AI assistant whose conversational feel I admired — uses a multi-agent architecture. Someone convinced their Poke to email them its system prompts (plural). In hindsight, it's genius. The natural benefit is cost: the interaction agent can be a cheap, fast model. I'm using Grok 4.1 fast reasoning at 20 cents in / 50 cents out per million tokens, so I can talk to it all day for basically free. Grok also gives me X search, which is useful for real-time information. The execution agent is Claude Opus 4.6. I was using Grok for execution too and wasn't getting the results I wanted. I started architecting a better system prompt, then thought: let me just plug in Opus and see if it just works. It just worked.
My first attempt at memory was a filing cabinet approach. I gave the agent tools — read_file, write_file — and access to files like Projects.md, Goals.md, Notes.md, Memory.md, User_Preferences.md, Personality.md. The idea was to give the agent freedom to organize its own internal state.
This didn't work. I was fighting two battles without knowing it. First, I was forcing the LLM to decide not only when to remember things, but how to categorize them. If you're an LLM and I tell you something, how do you decide whether it goes in Notes, Memory, or User Preferences? I was choosing these categories based on vibes — they were inherently unclear. Second, the concise persona problem was causing the agent to be concise with its tool calls, so it wasn't even trying to store memories properly.
I was discussing this with Opus 4.5, trying to architect a more human-like memory system. I remembered learning about human memory types in a college psychology course — short-term, working, long-term — and was trying to model encoding and surfacing. I asked Claude if there was active research on this, and it returned A-MEM — a paper out of Rutgers that presents a solution for creating a human-like memory bank for LLMs.
Here's how it works: when we want to remember something, we do an LLM call to format the memory as an atomic fact and generate tags for it. Then we generate a vector embedding and insert it into the brain. A smaller model analyzes whether the new memory node should be connected to existing nodes based on the embedding and the tags. At runtime, before the LLM responds to any query, we call surface_memories — we look at the user's message, get an embedding, and surface related memories from the brain. These get dynamically injected into the system prompt. Persistent, interconnected, atomic memories that surface automatically.
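In code, the write path looks roughly like the sketch below. The declared functions (formatAsAtomicFact, embed, findRelatedNodes, decideLinks, insertIntoBrain) are placeholders for the actual LLM, embedding, and database calls:

```typescript
import { randomUUID } from "node:crypto";

// Sketch of the A-MEM-style write path. The declared functions are
// placeholders standing in for LLM, embedding, and vector-search calls.
declare function formatAsAtomicFact(text: string): Promise<{ fact: string; tags: string[] }>;
declare function embed(text: string): Promise<number[]>;
declare function findRelatedNodes(embedding: number[], tags: string[]): Promise<MemoryNode[]>;
declare function decideLinks(fact: string, candidates: MemoryNode[]): Promise<string[]>;
declare function insertIntoBrain(node: MemoryNode): Promise<void>;

interface MemoryNode {
  id: string;
  fact: string;       // atomic fact, e.g. "User is building a plant agent on an ESP32"
  tags: string[];     // e.g. ["hardware", "projects"]
  embedding: number[];
  links: string[];    // ids of connected nodes
}

async function storeMemory(rawText: string): Promise<MemoryNode> {
  // 1. LLM call: condense the raw text into an atomic fact plus tags.
  const { fact, tags } = await formatAsAtomicFact(rawText);

  // 2. Embed the fact so it can be found by similarity later.
  const embedding = await embed(fact);

  // 3. A smaller model decides which existing nodes the new one should link to,
  //    given nearest neighbours by embedding and overlapping tags.
  const candidates = await findRelatedNodes(embedding, tags);
  const links = await decideLinks(fact, candidates);

  const node: MemoryNode = { id: randomUUID(), fact, tags, embedding, links };
  await insertIntoBrain(node); // SQLite row plus vector index entry
  return node;
}
```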

The first features I thought of were a morning briefing — where I'm prompted with my active pursuits and held accountable to commit to working toward them that day — and an evening briefing where I give a free-form recap of my day. This carries into a weekly review, where the LLM runs inference over my week and gives me a recap, then monthly, then yearly. Astrud doesn't just wait to be spoken to.
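The scheduling side is simple in principle: something fires at a fixed time and runs the execution agent with a briefing prompt. A minimal sketch, assuming a plain interval check and made-up briefing times rather than whatever scheduler the real system uses:

```typescript
// Minimal proactive-scheduling sketch. triggerBriefing stands in for
// "run the execution agent with a briefing prompt and send the result to Telegram".
declare function triggerBriefing(kind: "morning" | "evening"): Promise<void>;

const MORNING_HOUR = 7;  // assumed times; these would be configurable
const EVENING_HOUR = 21;

const lastRun: Record<string, string> = {};

setInterval(async () => {
  const now = new Date();
  const today = now.toISOString().slice(0, 10);

  if (now.getHours() === MORNING_HOUR && lastRun.morning !== today) {
    lastRun.morning = today;
    await triggerBriefing("morning"); // active pursuits + commitments for the day
  }
  if (now.getHours() === EVENING_HOUR && lastRun.evening !== today) {
    lastRun.evening = today;
    await triggerBriefing("evening"); // free-form recap of the day
  }
}, 60_000); // check once a minute
```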
A full web dashboard built in React shows every LLM call, tool invocation, and token count as they happen, via server-sent events. An Activity feed for raw event streaming. An Execution page for watching task processing in real time, with grouped tool calls nested under each LLM iteration. A Memory browser with live updates as new knowledge is stored.
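The streaming piece is essentially one Fastify route that holds the connection open and writes events as the agents emit them. A rough sketch; the shared event bus and payload shape are illustrative:

```typescript
import Fastify from "fastify";
import { EventEmitter } from "node:events";

// Shared bus the agents publish to: LLM calls, tool invocations, token counts.
// The "activity" event name and payload shape are illustrative.
export const activityBus = new EventEmitter();

const app = Fastify();

// Server-sent events endpoint the React dashboard subscribes to.
app.get("/events", (request, reply) => {
  reply.hijack(); // take over the raw response for streaming
  reply.raw.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  const onActivity = (event: unknown) => {
    reply.raw.write(`data: ${JSON.stringify(event)}\n\n`);
  };
  activityBus.on("activity", onActivity);

  // Stop writing when the dashboard disconnects.
  request.raw.on("close", () => activityBus.off("activity", onActivity));
});

app.listen({ port: 3000 });
```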
┌──────────────┐
│   Telegram   │
│    (you)     │
└──────┬───────┘
       │
┌──────▼───────────────────────────────────┐
│   Interaction Agent (Grok 4.1 fast)      │
│   "feel human, match tone, be concise"   │
│                                          │
│  ┌────────────────────────────────────┐  │
│  │  Delegates to execution agent      │  │
│  │  when heavy work is needed         │  │
│  └──────────────┬─────────────────────┘  │
└─────────────────┼────────────────────────┘
                  │
┌─────────────────▼────────────────────────┐
│   Execution Agent (Claude Opus 4.6)      │
│   "be thorough, use tools aggressively"  │
│                                          │
│  ┌──────────┐  ┌───────────────────┐     │
│  │  A-MEM   │  │  Task Execution   │     │
│  │  Memory  │  │  (calendar, web,  │     │
│  │  System  │  │   search, etc.)   │     │
│  └────┬─────┘  └───────────────────┘     │
│       │                                  │
│  ┌────▼──────────────────────────────┐   │
│  │  SQLite + Vector Embeddings       │   │
│  │  (memories, connections, tags)    │   │
│  └───────────────────────────────────┘   │
└──────────────────────────────────────────┘
       │
┌──────▼───────────────────────────────────┐
│   React Dashboard (Vite + SSE)           │
│   Activity │ Execution │ Memory Browser  │
└──────────────────────────────────────────┘

The backend is a Fastify server in TypeScript. The interaction agent handles all user-facing communication through Telegram. When it detects something that requires heavy processing — storing complex memories, executing multi-step tasks, analyzing information — it delegates to the execution agent. The two agents never share a conversation thread; delegation happens through structured task handoff.
The memory system runs in the execution agent's context. Every memory is an atomic node with vector embeddings and tags, connected to related nodes. At conversation time, the interaction agent calls surface_memories with the user's message, and relevant memories are injected into its system prompt under a section that says “these memories surfaced in your mind based on what the user said.”
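The read path is the mirror image: embed the incoming message, rank stored nodes by similarity, and prepend the top hits to the system prompt. A sketch with placeholder helpers (embed, loadAllNodes, cosineSimilarity) and a MemoryNode trimmed to the fields the read path needs:

```typescript
// Sketch of surface_memories and prompt injection. loadAllNodes and
// cosineSimilarity are placeholders; MemoryNode is trimmed to two fields.
interface MemoryNode { fact: string; embedding: number[] }

declare function embed(text: string): Promise<number[]>;
declare function loadAllNodes(): Promise<MemoryNode[]>;
declare function cosineSimilarity(a: number[], b: number[]): number;

async function surfaceMemories(userMessage: string, limit = 5): Promise<string[]> {
  const query = await embed(userMessage);
  const nodes = await loadAllNodes(); // fine at personal scale; a vector index otherwise
  return nodes
    .map((node) => ({ node, score: cosineSimilarity(query, node.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ node }) => node.fact);
}

function buildSystemPrompt(persona: string, memories: string[]): string {
  if (memories.length === 0) return persona;
  return [
    persona,
    "These memories surfaced in your mind based on what the user said:",
    ...memories.map((m) => `- ${m}`),
  ].join("\n");
}
```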

The discovery that you can't have a concise persona and aggressive tool use in the same agent was the turning point. A single agent either feels robotic and verbose (optimized for execution) or feels human but does nothing useful (optimized for conversation). Splitting them means each agent's system prompt is entirely devoted to its job. The interaction agent's prompt literally says “IF YOU IGNORE EVERYTHING ELSE, ADHERE TO THIS” about matching user verbosity.
The filing cabinet approach — giving the agent categorized files to maintain — was rigid but unclear. The categories were vibes-based and the LLM couldn't consistently decide where information belonged. A-MEM eliminates the categorization problem entirely. Every memory is an atomic node. Organization emerges from vector similarity and tag connections, not predetermined buckets.
The interaction agent runs on Grok 4.1 fast reasoning — 20 cents in, 50 cents out per million tokens. I can talk to it all day for basically free. Grok also gives me X search for real-time information, which the other models I tried don't offer. The execution agent runs on Opus 4.6. I haven't done extensive price/performance testing, but Opus just works for complex multi-step tasks where cheaper models fall short.
I'm actively building Astrud into a platform where I can host multiple independent agents for myself, each with their own custom connectors to the outside world. Astrud is the personal agent. A plant with an ESP32 and a Telegram account is another agent. Each one gets its own personality, its own tools, its own connections — but they all run on the same infrastructure. I swear I started working on this before I heard of Clawdbot. But even still, I wanted my own version — like the dad who says “no, don't buy that $500 table, I can make it for the low cost of $1,200 and 10 years off my lifespan.” And also for security reasons. I want my brain state running on my own infrastructure.
I want Astrud to do real tasks — manage my calendar and inbox, analyze my finances, tell me what's going on in the world. The memory and conversation layer works. The next step is integrating with the systems I actually use day-to-day, so Astrud goes from being someone I talk to into someone who gets things done.
There are also obvious security implications to uploading your brain state to a database. Every stray thought, business idea, and personal observation is sitting in SQLite. I need to think carefully about encryption and access control before I'd recommend this approach to anyone else.