Memory Architecture

Not a RAG.
Something lighter.

Two layers of persistent memory: a knowledge graph powered by wikilinks, and on-demand vector search over your codebase. Both run locally. No server. No embeddings pipeline.

The problem with RAG

RAG is optimized
for millions of docs.

You have tens of notes, not millions. Vector databases are overkill — and they hide what your agents actually know.

Traditional RAG
Code → Chunking → Embeddings → Vector DB → Similarity search → Retrieved chunks → LLM → Answer
Requires: server, embeddings API, vector DB
Black box — you can't read what it retrieved
Cost per query — every note gets embedded
All agents retrieve from the same pool of knowledge
Chunks lose the context of their surrounding file

Layer 1

Wikilink
knowledge graph.

Each memory is a node. Each [[wikilink]] is an edge. Agents read only what's linked to them — 18 entries (~72 tokens) instead of everything (~5000 tokens).
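The node-and-edge model fits in a few lines. A toy sketch (not Tasuki's actual implementation) of how wikilinks turn plain Markdown memories into a graph an agent can filter by its own name — the memory names and bodies here are invented for illustration:

```python
import re

# Toy graph: each memory is a node; each [[wikilink]] in its body
# is an edge to the node it names.
memories = {
    "advisory-lock-bug": "Lock held after error. tags: [[backend-dev]] [[db-architect]]",
    "parameterized-queries": "Always parameterize SQL. tags: [[backend-dev]] [[security]]",
}

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")

# Adjacency list: memory name -> the nodes it links to
edges = {name: WIKILINK.findall(body) for name, body in memories.items()}

def memories_for(agent: str) -> list[str]:
    """An agent loads only the memories whose edges point at it."""
    return [name for name, links in edges.items() if agent in links]

print(memories_for("backend-dev"))  # both entries link to backend-dev
print(memories_for("security"))     # only the parameterized-queries entry
```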

Knowledge Graph — node types: Agent · Heuristic · Bug · Decision · Error · Tool/Stack
Layer 1 — Wikilinks (always on)
Individual .md files in memory-vault/. Each references agents with [[wikilinks]]. Before each task, agents grep for their own name and load only those files.
# Before You Act (MANDATORY)
grep -rF "[[backend-dev]]" heuristics/
grep -rF "[[backend-dev]]" errors/
grep -rF "related-keyword" bugs/

18 entries · ~72 tokens · $0
Layer 2 — Deep Memory (on demand)
Vector search over schema, APIs, migration history, PRDs, git log. Agents query it by semantic intent when they need historical context beyond their personal notes.
# vault query
tasuki vault query "how do we handle auth"

→ models/user.py (score: 0.94)
→ routers/auth.py (score: 0.91)
→ plans/jwt-auth/prd.md (score: 0.87)
50ms · local SQLite · no API
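What "search by semantic intent" means can be sketched with a toy scorer. The real backend (rag-memory-mcp) uses embeddings; this stand-in ranks files by cosine similarity of word counts, with an invented three-file corpus, just to show the query/answer shape:

```python
import math
from collections import Counter

# Toy corpus: file path -> text to index (invented for illustration).
corpus = {
    "models/user.py": "user model password hash auth token",
    "routers/auth.py": "login logout auth jwt token route",
    "routers/items.py": "item crud list create delete",
}

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def query(q: str, top: int = 2) -> list[tuple[str, float]]:
    qv = vectorize(q)
    scored = [(path, cosine(qv, vectorize(text))) for path, text in corpus.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top]

for path, score in query("how do we handle auth"):
    print(f"{path} (score: {score:.2f})")
```

The auth-related files outrank the unrelated CRUD file; a real embedding model does the same ranking with meaning instead of word overlap.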

Layer 2 in depth

Yes, we use RAG.
But differently.

Layer 1 (wikilinks) is the fast index — always loaded, zero cost. Layer 2 uses a real vector store via MCP, but only when the agent needs deep context. The key difference: you choose the backend, not us.

SCHEMA
Models, migrations, tables, columns, relations. Your entire database structure indexed and searchable by meaning.
API
Views, routes, serializers, controllers. Every endpoint with its code, auth requirements, and request/response types.
PLANS
PRDs, implementation plans, architectural decisions from past pipeline runs. The Planner doesn't repeat what was already designed.
MEMORIES
Heuristics, bugs, lessons, errors — the same nodes from Layer 1 but with full content (not just the 4-line summary).
GIT
Recent commits and diffs. "What changed last time we touched auth?" — answered without running git log.
CONFIG
Docker, settings, package.json, pyproject.toml. Infrastructure context without reading each file.
How Layer 2 works
1
Sync
tasuki vault sync indexes your entire project into the vector store. Runs automatically on onboard and when agents write new memories.
2
Query
Agent asks: "how do we handle payments?" The MCP searches by semantic similarity and returns full context — schema, code, plans, incidents.
3
Swap
Outgrow SQLite? Change one line in .mcp.json. Replace rag-memory-mcp with Qdrant or pgvector. The agent query stays the same.
The agent is backend-agnostic. Whether you're on Scale 0 (files only) or Scale 3 (pgvector in production), the agent runs the exact same query. The MCP abstraction means you never touch agent prompts when scaling. This is the adoption argument: start with zero infra, grow when you need to.
Default — local SQLite ($0)
"rag-memory": {
  "command": "npx",
  "args": ["rag-memory-mcp"]
}
Scale 2 — Qdrant (same query, different backend)
"rag-memory": {
  "command": "npx",
  "args": ["qdrant-mcp"]
}

Knowledge graph structure

Every node has
a type and purpose.

The vault is auto-initialized during tasuki onboard. Agents write to it when they learn something non-obvious. You can open it in Obsidian.

memory-vault/ structure
memory-vault/
├── index.md                      ← graph index · recent activity
├── agents/                       ← auto-generated from installed agents
│   ├── backend-dev.md            → [[fastapi]] [[postgres]] [[jwt]]
│   └── security.md               → [[semgrep-mcp]] [[owasp]]
├── heuristics/                   ← PERMANENT rules (never expire)
│   ├── always-index-lookups.md   → [[db-architect]] [[backend-dev]] [[reviewer]]
│   ├── parameterized-queries.md  → [[backend-dev]] [[security]]
│   └── tests-before-code.md      → [[qa]] [[backend-dev]] [[reviewer]]
├── bugs/                         ← EPISODIC: specific incidents
│   └── advisory-lock-bug.md      → [[db-architect]] [[backend-dev]]
├── errors/                       ← "DO NOT" entries · appear in project-facts
│   └── used-print-not-logger.md  → [[backend-dev]]
├── decisions/                    ← architectural decisions with alternatives
├── tools/                        ← MCP server nodes (auto-generated)
│   ├── sentry-mcp.md             → [[debugger]] [[devops]]
│   └── postgres-mcp.md           → [[db-architect]] [[debugger]]
└── stack/                        ← technology nodes (auto-generated)
    └── fastapi.md                → [[backend-dev]] [[qa]] [[security]]
4-dimension entry format
## 2026-03-14 — advisory_lock doesn't rollback on exception
**Pattern**: advisory_lock context manager in asyncpg does NOT auto-rollback if an exception
is raised inside the block. Must explicitly handle in finally clause.
**Evidence**: `app/services/payment.py:47` — lock held after IntegrityError, blocking retries
**Scope**: All advisory_lock usages in services/. Applies to any asyncpg lock context.
**Prevention**: grep for `advisory_lock` without `finally: await conn.execute("SELECT pg_advisory_unlock")`

tags: [[backend-dev]] [[db-architect]] [[asyncpg]] [[postgres]]
PATTERN
Reusable knowledge, not the symptom
EVIDENCE
Proof it's real — file:line observed
SCOPE
Where else this applies
PREVENTION
Grep pattern or rule to catch next time
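The four dimensions are easy to enforce mechanically. A sketch of a hypothetical validator (not a Tasuki command) that rejects an entry missing any of them:

```python
REQUIRED = ("Pattern", "Evidence", "Scope", "Prevention")

def validate_entry(text: str) -> list[str]:
    """Return the dimensions missing from a memory entry."""
    return [dim for dim in REQUIRED if f"**{dim}**" not in text]

entry = """## 2026-03-14 — advisory_lock doesn't rollback on exception
**Pattern**: context manager does not auto-rollback on exception.
**Evidence**: app/services/payment.py:47
**Scope**: all advisory_lock usages in services/.
**Prevention**: grep for advisory_lock without an unlock in finally.
tags: [[backend-dev]] [[db-architect]]
"""

assert validate_entry(entry) == []  # all four dimensions present
```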

The closed loop

Memory connects
to every stage.

Memory isn't decorative — it's a feedback loop. Each pipeline run reads from the vault, and Stage 9 writes back.

Memory flow — "add payment endpoint"
Task: "add payment endpoint"

[Planner] reads [[websocket-vs-sse]] → knows past architecture decisions

[QA] reads [[parameterized-queries]] → writes SQL injection tests first

[Backend] reads [[used-print-not-logger]] → uses structured logging
queries RAG: "payments?" → full schema + past PRDs

[Security] reads [[advisory-lock-bug]] → checks for transaction safety

[Stage 9] writes payment-idempotency.md → tagged [[backend-dev]] [[planner]]

Next task starts with this knowledge loaded.
CONNECTION 1
Before You Act
Every agent grepping for [[its-name]] before any task. Loads ~18 entries (~72 tokens) — not 200.
CONNECTION 2
Stage 9 Writes
After Reviewer APPROVES, Stage 9 writes to vault — only if non-obvious insight, not after every task.
CONNECTION 3
Error Memory
tasuki error "used print() not logger" → creates node AND appears in project-facts.md "Do NOT" section.
CONNECTION 4
Project Facts
Anti-hallucination file auto-generated from real imports. Versions from package files, paths from ls. Not guessed.
CONNECTION 5
Project Context
Business understanding from the onboard interview — read only by Planner. Other agents get business context through Taskmaster tasks (~500 tokens saved per agent).
CONNECTION 6
Capability Map
Auto-generated from agent frontmatter. Planner reads this to route tasks by domain, not by name. Adding a new agent auto-updates the map.
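Connection 4's "versions from package files, not guessed" reduces to reading manifests instead of asking the model. A toy sketch with an invented package.json (the real generator also scans imports and ls output):

```python
import json

# Toy sketch: derive dependency facts from a package file so versions
# come from disk, not from the model's guess.
package_json = json.loads("""{
  "dependencies": {"express": "^4.19.2", "pg": "^8.12.0"}
}""")

facts = [f"- {name} {version} (from package.json)"
         for name, version in package_json["dependencies"].items()]
print("\n".join(facts))
```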

Design constraints

20-entry limit.
Not a bug.

Without the cap, 6 months of active use would give each agent 200+ entries. Context fills with noise. Valuable insights get buried.

Anti-bloat rules
MAX 20 ENTRIES PER AGENT
When full, remove the oldest LOW-value entry. Signal-to-noise stays high forever.
TRIGGER CONDITIONS
Only write when: root cause was non-obvious, fix caused regression, new pattern discovered, security finding was new. Not after every task.
NO DUPLICATION
If info is already in the agent's prompt or TASUKI.md, don't repeat in memory. One entry per insight.
PROMOTION PROTOCOL
Same pattern appears 2+ times in bugs/lessons → create a permanent heuristic. The insight graduates to a rule.
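The cap-and-evict rule is simple to sketch. A toy version, assuming each entry carries a value level and a date (the field names are invented; CRITICAL and HIGH entries are never auto-removed):

```python
MAX_ENTRIES = 20

def evict_if_full(entries: list[dict]) -> list[dict]:
    """Archive the oldest LOW-value entry once the cap is reached."""
    if len(entries) < MAX_ENTRIES:
        return entries
    low = [e for e in entries if e["value"] == "LOW"]
    if not low:
        return entries  # nothing eligible for archiving
    oldest = min(low, key=lambda e: e["date"])
    return [e for e in entries if e is not oldest]

# 19 LOW entries plus one CRITICAL entry hits the cap of 20.
entries = [{"date": f"2026-01-{d:02d}", "value": "LOW"} for d in range(1, 20)]
entries.append({"date": "2026-01-20", "value": "CRITICAL"})
trimmed = evict_if_full(entries)  # drops the oldest LOW; CRITICAL survives
```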
Growth path — scale without changing agents
Scale · Backend · Use when
0 · Wikilinks only · Offline, no infra
1 · rag-memory-mcp · Default — local SQLite
2 · Qdrant MCP · Team / high volume
3 · pgvector · Production / multi-team
The agent query never changes. Whether you're on Scale 0 or Scale 3, the agent runs the same query. Only the MCP backend changes. No agent prompt updates needed.
Memory by mode
fast — Wikilinks only · no RAG queries
standard — Wikilinks + RAG · when needed
serious — Wikilinks + RAG · mandatory at every stage

Why not just RAG?

The decisions
behind the design.

Individual .md nodes + wikilinks (vs. a single MEMORY.md per agent with appended lines): flat files become noise after 50 entries; individual nodes are navigable and compatible with Obsidian.
Behavioral memory — the agent writes it (vs. a PostToolUse hook that auto-extracts learnings): a shell script can't evaluate significance; "root cause required 3 investigation rounds" needs LLM reasoning, not regex.
No RAG/vector DB by default (vs. ChromaDB or Pinecone from the start): grep works for tens of notes; vector DBs are optimized for millions, and over-engineering creates dependencies and infra costs.
Max 20 entries per agent (vs. unlimited entries): without the limit, 6 months = 200+ entries per agent and context fills with noise; the limit forces curation.
Local SQLite for Layer 2 (vs. an external vector API such as OpenAI embeddings): $0, offline, <50ms queries, no API key required; works in air-gapped environments.
Agent-specific filtering via [[wikilinks]] (vs. loading all memory for every agent): Backend Dev doesn't need Frontend Dev's heuristics; filtering reduces context from ~5000 to ~700 tokens per agent.
Real data

70 entries. 148 KB.
Under 50ms.

Numbers from a real Django + PostgreSQL project after onboarding and one pipeline run.

$ tasuki vault sync
Indexing memory vault...     19 memories
Indexing database schema...   8 schema files
Indexing API endpoints...     13 API files
Indexing pipeline plans...    7 plan files
Indexing recent git history... 20 commits
Indexing config files...      3 config files

RAG deep memory: 70 entries indexed
Database: 148 KB · Query time: <50ms
$ tasuki vault query "authentication"
[schema] authentication models
  CustomAPIKey, JWT tokens...
[api]    authentication views
  login, register, logout endpoints...
[api]    authentication serializers
  token validation, password change...
[plan]   auth-logout-profile
  PRD + implementation details

4 results in 12ms
50+ entries indexed · 148 KB database · <50ms per query · $0 infrastructure cost

Common questions

Memory FAQ

Can I read and edit the memory myself?
Yes — they're plain .md files. Open memory-vault/ in Obsidian or any text editor. Edit, delete, add [[wikilinks]] to connect nodes. The vault is fully human-readable and human-editable.

What happens if I delete a memory file?
The agent simply won't load that memory next time. No errors, no broken references. The [[wikilink]] becomes a dead link — same as in Obsidian. You can run tasuki vault sync to update the RAG index after deleting.

Does my code or memory leave my machine?
No. The default backend (rag-memory-mcp) uses SQLite locally. Everything runs on your machine — no API calls, no cloud, no embeddings service. Works in air-gapped environments.

How does an episodic bug become a permanent heuristic?
When the same pattern appears 2+ times in bugs/ or lessons/, the agent creates a permanent entry in heuristics/. The original episodic entries remain for historical context, but the heuristic becomes the canonical rule. Example: two separate SQL injection bugs → one "always use parameterized queries" heuristic.

What happens when an agent hits the 20-entry cap?
The oldest LOW-value entry is archived (moved out, not deleted). CRITICAL and HIGH entries are never auto-removed. The cap keeps signal-to-noise high — without it, 6 months of use would give each agent 200+ entries and context would fill with noise.

Can I swap the vector backend later?
Yes. Change the MCP backend in .mcp.json. Replace rag-memory-mcp with qdrant-mcp or pgvector. The agent query never changes — only the backend. See the Growth Path table above for options.

Do all agents share the same memory?
No. Each agent reads only memories tagged with its [[wikilink]]. Backend Dev reads [[backend-dev]] entries (~18 memories, ~72 tokens). Security reads [[security]] entries. A memory can be tagged with multiple agents if it's relevant to more than one.

How is this different from Claude's built-in memory?
Claude's memory is per-conversation and resets between sessions. Tasuki's memory is per-project and persists forever. It's also structured (typed nodes with wikilinks) instead of flat notes, agent-specific instead of global, and visible/editable by you instead of hidden inside the model.

Your AI gets smarter
with every task.

Onboard once. The knowledge graph initializes automatically.

npm install -g tasuki
click to copy