Local-first agent memory

It's just layers.

People overthink this — they think they need to learn vector databases and embeddings. You don't. A "second brain" is just a few layers, each doing one obvious job. Here's the whole mental model — then the full build if you want it.

The mental model — this is all you actually need to understand.

🤖 The AIClaude Code · Codex · your assistant

💬 what you talk to

The part you actually talk to. It sits on top and uses every layer below — it doesn't store anything itself.

🧠 Memory layere.g. Hindsight

⚙️ plumbing · never touch

Carries context across sessions — decisions, preferences, what worked. So chat #50 knows what chat #1 figured out.📎 like an assistant who remembers your last visit

🔍 Retrieval layere.g. QMD

⚙️ plumbing · "vectors" hide here

Pulls the right note instantly instead of you re-explaining. This is the only place "search / vectors" live — and you never touch them.📎 like a librarian who knows where everything is

📚 Knowledge layerObsidian (markdown)

👁️ you read & own this

Your notes in plain markdown you own — human-readable, no lock-in. The source of truth. Everything above just serves this.📎 like the library + your own notebooks

↑ truth at the bottom · the AI on top · each layer does exactly one job

📌 Notice what's not here No "vector database" to run. No embeddings to understand. No graph theory. Those are implementation details that live inside one layer (retrieval) and stay there. You're organizing a library, not building a database — start with notes + a search tool and you already have a working second brain.

If you want to go further (optional extra layers)

🗂️ Entity layer

A structured map of the people, projects & decisions that matter (e.g. GBrain). The "who/what" your notes are about.

💻 Session-recall layer

"Have we solved this before?" over past work sessions (e.g. CASS). Useful once you're coding with agents a lot.

📥 Ingestion + 🔁 upkeep

Auto-pull email/calendar/Slack in, and a nightly pass that tidies notes into durable memory. Pure convenience — bolt on last.

Same four layers — here's how the real pieces wire together on my box. Markdown owns the truth; everything else serves it.

flowchart TB
  subgraph SRC["📥 INGESTION"]
    direction LR
    GM["Gmail · Calendar"]:::src
    SL["Slack"]:::src
  end
  subgraph KNOW["📚 KNOWLEDGE LAYER — markdown owns the truth"]
    direction LR
    OBS["Obsidian Vault
wiki + brain"]:::truth
    GB["GBrain
typed entity graph"]:::truth
    AM["Agent Memory
~/.agents/memory"]:::truth
  end
  QMD["🔍 RETRIEVAL LAYER — QMD
lex · vec · hyde · remote embed (OpenRouter)"]:::retr
  HS["🧠 MEMORY LAYER — Hindsight
retain / recall / reflect · shared cloud bank"]:::work
  CASS["💻 CASS
coding-session recall"]:::code
  subgraph RT["🤖 THE AI — runtimes"]
    direction LR
    HER["Hermes
always-on assistant"]:::rt
    OMP["OMP
interactive coding"]:::rt
  end
  CRON["⏰ cron + nightly 'dream' pass"]:::sch

  GM --> GB
  SL --> GB
  OBS --> QMD
  GB --> QMD
  AM --> QMD
  QMD --> HER
  QMD --> OMP
  HER <==> HS
  OMP <==> HS
  OMP <==> CASS
  HS -. "reviewed promotions" .-> AM
  CRON --> GM
  CRON --> AM

  classDef src   fill:#3a2412,stroke:#fb923c,color:#ffe6cf,stroke-width:1.5px;
  classDef truth fill:#0e3330,stroke:#34d399,color:#d7fff2,stroke-width:1.5px;
  classDef retr  fill:#10233f,stroke:#22d3ee,color:#d6f6ff,stroke-width:2px;
  classDef work  fill:#241646,stroke:#7c5cff,color:#ece4ff,stroke-width:2.5px;
  classDef code  fill:#3a1226,stroke:#f471b5,color:#ffd6e6,stroke-width:1.5px;
  classDef rt    fill:#1e2236,stroke:#9aa0b8,color:#fff,stroke-width:1.5px;
  classDef sch   fill:#15161f,stroke:#5a607c,color:#cfd3e6,stroke-width:1.2px;

Knowledge owns the truth · QMD indexes it · Hindsight is the shared working memory both runtimes read/write · GBrain/CASS are the optional extra layers · collectors + a nightly pass keep it fed.

What each piece is

📚 Obsidian Vault — knowledge layer

Human-curated markdown · the source of truth

›

Plain markdown you own. Hosts the human wiki and a typed brain subtree. Read-mostly; synced across devices with Obsidian Sync. Everything else just serves this.

🔍 QMD — retrieval layer

Local markdown search index

›

Open-source search over your markdown (tobi/qmd). Modes: lex (keyword), vec (semantic), hyde. Embeddings + rerank run remotely via OpenRouter so it's fast even on a tiny ARM box. The one layer where "vectors" exist — read-only retrieval, never the truth.

🧠 Hindsight — memory layer

Shared working memory · retain / recall / reflect

›

Cloud-hosted, with both runtimes as clients of one shared bank. Holds live working context — decisions, preferences, what worked — scoped per-project so sessions don't bleed. Authoritative for context, not for truth.

🗂️ GBrain — entity layer (optional)

Typed graph of people / projects / decisions

›

A structured map of the real-world things your notes are about. Lives inside the vault; local + privacy-sensitive.

💻 CASS — session-recall layer (optional)

"Have we solved this before?"

›

Lexical search over raw coding transcripts across Claude Code / Codex / OpenCode.

🤖 Hermes & OMP — the AI

Always-on assistant + interactive coding agent

›

Hermes is always-on (Telegram / CLI / web) and owns the scheduler; OMP is the interactive coding agent. Both use the memory layer against the shared bank. Collectors pull Gmail/Calendar/Slack in; a nightly "dream" pass consolidates notes into durable memory.

🧭 The whole idea in one line: your notes own the truth · search finds them · memory carries context across sessions. Everything else is an optimization.