Local-first agent memory

It's just layers.

People overthink this — they think they need to learn vector databases and embeddings. You don't. A "second brain" is just a few layers, each doing one obvious job. Here's the whole mental model — then the full build if you want it.

The mental model — this is all you actually need to understand.

🤖 The AIClaude Code · Codex · your assistant
💬 what you talk to
The part you actually talk to. It sits on top and uses every layer below — it doesn't store anything itself.
🧠 Memory layere.g. Hindsight
⚙️ plumbing · never touch
Carries context across sessions — decisions, preferences, what worked. So chat #50 knows what chat #1 figured out.📎 like an assistant who remembers your last visit
🔍 Retrieval layere.g. QMD
⚙️ plumbing · "vectors" hide here
Pulls the right note instantly instead of you re-explaining. This is the only place "search / vectors" live — and you never touch them.📎 like a librarian who knows where everything is
📚 Knowledge layerObsidian (markdown)
👁️ you read & own this
Your notes in plain markdown you own — human-readable, no lock-in. The source of truth. Everything above just serves this.📎 like the library + your own notebooks

↑ truth at the bottom · the AI on top · each layer does exactly one job

📌 Notice what's not here No "vector database" to run. No embeddings to understand. No graph theory. Those are implementation details that live inside one layer (retrieval) and stay there. You're organizing a library, not building a database — start with notes + a search tool and you already have a working second brain.

🗂️ Entity layer

A structured map of the people, projects & decisions that matter (e.g. GBrain). The "who/what" your notes are about.

💻 Session-recall layer

"Have we solved this before?" over past work sessions (e.g. CASS). Useful once you're coding with agents a lot.

📥 Ingestion + 🔁 upkeep

Auto-pull email/calendar/Slack in, and a nightly pass that tidies notes into durable memory. Pure convenience — bolt on last.

Same four layers — here's how the real pieces wire together on my box. Markdown owns the truth; everything else serves it.

flowchart TB
  subgraph SRC["📥 INGESTION"]
    direction LR
    GM["Gmail · Calendar"]:::src
    SL["Slack"]:::src
  end
  subgraph KNOW["📚 KNOWLEDGE LAYER — markdown owns the truth"]
    direction LR
    OBS["Obsidian Vault
wiki + brain"]:::truth GB["GBrain
typed entity graph"]:::truth AM["Agent Memory
~/.agents/memory"]:::truth end QMD["🔍 RETRIEVAL LAYER — QMD
lex · vec · hyde · remote embed (OpenRouter)"]:::retr HS["🧠 MEMORY LAYER — Hindsight
retain / recall / reflect · shared cloud bank"]:::work CASS["💻 CASS
coding-session recall"]:::code subgraph RT["🤖 THE AI — runtimes"] direction LR HER["Hermes
always-on assistant"]:::rt OMP["OMP
interactive coding"]:::rt end CRON["⏰ cron + nightly 'dream' pass"]:::sch GM --> GB SL --> GB OBS --> QMD GB --> QMD AM --> QMD QMD --> HER QMD --> OMP HER <==> HS OMP <==> HS OMP <==> CASS HS -. "reviewed promotions" .-> AM CRON --> GM CRON --> AM classDef src fill:#3a2412,stroke:#fb923c,color:#ffe6cf,stroke-width:1.5px; classDef truth fill:#0e3330,stroke:#34d399,color:#d7fff2,stroke-width:1.5px; classDef retr fill:#10233f,stroke:#22d3ee,color:#d6f6ff,stroke-width:2px; classDef work fill:#241646,stroke:#7c5cff,color:#ece4ff,stroke-width:2.5px; classDef code fill:#3a1226,stroke:#f471b5,color:#ffd6e6,stroke-width:1.5px; classDef rt fill:#1e2236,stroke:#9aa0b8,color:#fff,stroke-width:1.5px; classDef sch fill:#15161f,stroke:#5a607c,color:#cfd3e6,stroke-width:1.2px;

Knowledge owns the truth · QMD indexes it · Hindsight is the shared working memory both runtimes read/write · GBrain/CASS are the optional extra layers · collectors + a nightly pass keep it fed.

📚 Obsidian Vault — knowledge layer
Human-curated markdown · the source of truth
Plain markdown you own. Hosts the human wiki and a typed brain subtree. Read-mostly; synced across devices with Obsidian Sync. Everything else just serves this.
🔍 QMD — retrieval layer
Local markdown search index
Open-source search over your markdown (tobi/qmd). Modes: lex (keyword), vec (semantic), hyde. Embeddings + rerank run remotely via OpenRouter so it's fast even on a tiny ARM box. The one layer where "vectors" exist — read-only retrieval, never the truth.
🧠 Hindsight — memory layer
Shared working memory · retain / recall / reflect
Cloud-hosted, with both runtimes as clients of one shared bank. Holds live working context — decisions, preferences, what worked — scoped per-project so sessions don't bleed. Authoritative for context, not for truth.
🗂️ GBrain — entity layer (optional)
Typed graph of people / projects / decisions
A structured map of the real-world things your notes are about. Lives inside the vault; local + privacy-sensitive.
💻 CASS — session-recall layer (optional)
"Have we solved this before?"
Lexical search over raw coding transcripts across Claude Code / Codex / OpenCode.
🤖 Hermes & OMP — the AI
Always-on assistant + interactive coding agent
Hermes is always-on (Telegram / CLI / web) and owns the scheduler; OMP is the interactive coding agent. Both use the memory layer against the shared bank. Collectors pull Gmail/Calendar/Slack in; a nightly "dream" pass consolidates notes into durable memory.
🧭 The whole idea in one line: your notes own the truth · search finds them · memory carries context across sessions. Everything else is an optimization.