Memory Architecture

Not a RAG.
Something lighter.

Two layers of persistent memory: a knowledge graph powered by wikilinks, and on-demand vector search over your codebase. Both run locally. No server. No embeddings pipeline.

The problem with RAG

RAG is optimized
for millions of docs.

You have tens of notes, not millions. Vector databases are overkill — and they hide what your agents actually know.

Traditional RAG
Code → Chunking → Embeddings → Vector DB → Similarity search → Retrieved chunks → LLM → Answer
Requires: server, embeddings API, vector DB
Black box — you can't read what it retrieved
Cost per query — every note gets embedded
All agents retrieve from the same pool of knowledge
Chunks lose the context of their surrounding file

Layer 1

Wikilink
knowledge graph.

Each memory is a node. Each [[wikilink]] is an edge. Agents read only what's linked to them — 18 entries (~72 tokens) instead of everything (~5000 tokens).
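The node-and-edge model fits in a few lines. A toy sketch (not Tasuki's actual implementation) of how wikilinks turn plain Markdown memories into a graph an agent can filter by its own name — the memory names and bodies here are invented for illustration:

```python
import re

# Toy graph: each memory is a node; each [[wikilink]] in its body
# is an edge to the node it names.
memories = {
    "advisory-lock-bug": "Lock held after error. tags: [[backend-dev]] [[db-architect]]",
    "parameterized-queries": "Always parameterize SQL. tags: [[backend-dev]] [[security]]",
}

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")

# Adjacency list: memory name -> the nodes it links to
edges = {name: WIKILINK.findall(body) for name, body in memories.items()}

def memories_for(agent: str) -> list[str]:
    """An agent loads only the memories whose edges point at it."""
    return [name for name, links in edges.items() if agent in links]

print(memories_for("backend-dev"))  # both entries link to backend-dev
print(memories_for("security"))     # only the parameterized-queries entry
```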

Knowledge Graph — node types: Agent · Heuristic · Bug · Decision · Error · Tool/Stack
Layer 1 — Wikilinks (always on)
Individual .md files in memory-vault/. Each references agents with [[wikilinks]]. Before each task, agents grep for their own name and load only those files.
# Before You Act (MANDATORY)
grep -rF "[[backend-dev]]" heuristics/
grep -rF "[[backend-dev]]" errors/
grep -rF "related-keyword" bugs/

18 entries · ~72 tokens · $0
Layer 2 — Deep Memory (on demand)
Vector search over schema, APIs, migration history, PRDs, git log. Agents query it by semantic intent when they need historical context beyond their personal notes.
# vault query
tasuki vault query "how do we handle auth"

→ models/user.py (score: 0.94)
→ routers/auth.py (score: 0.91)
→ plans/jwt-auth/prd.md (score: 0.87)
50ms · local SQLite · no API
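What "search by semantic intent" means can be sketched with a toy scorer. The real backend (rag-memory-mcp) uses embeddings; this stand-in ranks files by cosine similarity of word counts, with an invented three-file corpus, just to show the query/answer shape:

```python
import math
from collections import Counter

# Toy corpus: file path -> text to index (invented for illustration).
corpus = {
    "models/user.py": "user model password hash auth token",
    "routers/auth.py": "login logout auth jwt token route",
    "routers/items.py": "item crud list create delete",
}

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def query(q: str, top: int = 2) -> list[tuple[str, float]]:
    qv = vectorize(q)
    scored = [(path, cosine(qv, vectorize(text))) for path, text in corpus.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:top]

for path, score in query("how do we handle auth"):
    print(f"{path} (score: {score:.2f})")
```

The auth-related files outrank the unrelated CRUD file; a real embedding model does the same ranking with meaning instead of word overlap.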

Layer 2 in depth

Yes, we use RAG.
But differently.

Layer 1 (wikilinks) is the fast index — always loaded, zero cost. Layer 2 uses a real vector store via MCP, but only when the agent needs deep context. The key difference: you choose the backend, not us.

SCHEMA
Models, migrations, tables, columns, relations. Your entire database structure indexed and searchable by meaning.
API
Views, routes, serializers, controllers. Every endpoint with its code, auth requirements, and request/response types.
PLANS
PRDs, implementation plans, architectural decisions from past pipeline runs. The Planner doesn't repeat what was already designed.
MEMORIES
Heuristics, bugs, lessons, errors — the same nodes from Layer 1 but with full content (not just the 4-line summary).
GIT
Recent commits and diffs. "What changed last time we touched auth?" — answered without running git log.
CONFIG
Docker, settings, package.json, pyproject.toml. Infrastructure context without reading each file.
How Layer 2 works
1
Sync
tasuki vault sync indexes your entire project into the vector store. Runs automatically on onboard and when agents write new memories.
2
Query
Agent asks: "how do we handle payments?" The MCP searches by semantic similarity and returns full context — schema, code, plans, incidents.
3
Swap
Outgrow SQLite? Change one line in .mcp.json. Replace rag-memory-mcp with Qdrant or pgvector. The agent query stays the same.
The agent is backend-agnostic. Whether you're on Scale 0 (files only) or Scale 3 (pgvector in production), the agent runs the exact same query. The MCP abstraction means you never touch agent prompts when scaling. This is the adoption argument: start with zero infra, grow when you need to.
Default — local SQLite ($0)
"rag-memory": {
  "command": "npx",
  "args": ["rag-memory-mcp"]
}
Scale 2 — Qdrant (same query, different backend)
"rag-memory": {
  "command": "npx",
  "args": ["qdrant-mcp"]
}

Knowledge graph structure

Every node has
a type and purpose.

The vault is auto-initialized during tasuki onboard. Agents write to it when they learn something non-obvious. You can open it in Obsidian.

memory-vault/ structure
memory-vault/
├── index.md                      ← graph index · recent activity
├── agents/                       ← auto-generated from installed agents
│   ├── backend-dev.md            → [[fastapi]] [[postgres]] [[jwt]]
│   └── security.md               → [[semgrep-mcp]] [[owasp]]
├── heuristics/                   ← PERMANENT rules (never expire)
│   ├── always-index-lookups.md   → [[db-architect]] [[backend-dev]] [[reviewer]]
│   ├── parameterized-queries.md  → [[backend-dev]] [[security]]
│   └── tests-before-code.md      → [[qa]] [[backend-dev]] [[reviewer]]
├── bugs/                         ← EPISODIC: specific incidents
│   └── advisory-lock-bug.md      → [[db-architect]] [[backend-dev]]
├── errors/                       ← "DO NOT" entries · appear in project-facts
│   └── used-print-not-logger.md  → [[backend-dev]]
├── decisions/                    ← architectural decisions with alternatives
├── tools/                        ← MCP server nodes (auto-generated)
│   ├── sentry-mcp.md             → [[debugger]] [[devops]]
│   └── postgres-mcp.md           → [[db-architect]] [[debugger]]
└── stack/                        ← technology nodes (auto-generated)
    └── fastapi.md                → [[backend-dev]] [[qa]] [[security]]
4-dimension entry format
## 2026-03-14 — advisory_lock doesn't rollback on exception
**Pattern**: advisory_lock context manager in asyncpg does NOT auto-rollback if an exception
is raised inside the block. Must explicitly handle in finally clause.
**Evidence**: `app/services/payment.py:47` — lock held after IntegrityError, blocking retries
**Scope**: All advisory_lock usages in services/. Applies to any asyncpg lock context.
**Prevention**: grep for `advisory_lock` without `finally: await conn.execute("SELECT pg_advisory_unlock")`

tags: [[backend-dev]] [[db-architect]] [[asyncpg]] [[postgres]]
PATTERN
Reusable knowledge, not the symptom
EVIDENCE
Proof it's real — file:line observed
SCOPE
Where else this applies
PREVENTION
Grep pattern or rule to catch next time
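The four dimensions are easy to enforce mechanically. A sketch of a hypothetical validator (not a Tasuki command) that rejects an entry missing any of them:

```python
REQUIRED = ("Pattern", "Evidence", "Scope", "Prevention")

def validate_entry(text: str) -> list[str]:
    """Return the dimensions missing from a memory entry."""
    return [dim for dim in REQUIRED if f"**{dim}**" not in text]

entry = """## 2026-03-14 — advisory_lock doesn't rollback on exception
**Pattern**: context manager does not auto-rollback on exception.
**Evidence**: app/services/payment.py:47
**Scope**: all advisory_lock usages in services/.
**Prevention**: grep for advisory_lock without an unlock in finally.
tags: [[backend-dev]] [[db-architect]]
"""

assert validate_entry(entry) == []  # all four dimensions present
```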

The closed loop

Memory connects
to every stage.

Memory isn't decorative — it's a feedback loop. Each pipeline run reads from the vault, and Stage 9 writes back.

Memory flow — "add payment endpoint"
Task: "add payment endpoint"

[Planner] reads [[websocket-vs-sse]] → knows past architecture decisions

[QA] reads [[parameterized-queries]] → writes SQL injection tests first

[Backend] reads [[used-print-not-logger]] → uses structured logging
queries RAG: "payments?" → full schema + past PRDs

[Security] reads [[advisory-lock-bug]] → checks for transaction safety

[Stage 9] writes payment-idempotency.md → tagged [[backend-dev]] [[planner]]

Next task starts with this knowledge loaded.
CONNECTION 1
Before You Act
Every agent grepping for [[its-name]] before any task. Loads ~18 entries (~72 tokens) — not 200.
CONNECTION 2
Stage 9 Writes
After Reviewer APPROVES, Stage 9 writes to vault — only if non-obvious insight, not after every task.
CONNECTION 3
Error Memory
tasuki error "used print() not logger" → creates node AND appears in project-facts.md "Do NOT" section.
CONNECTION 4
Project Facts
Anti-hallucination file auto-generated from real imports. Versions from package files, paths from ls. Not guessed.
CONNECTION 5
Project Context
Business understanding from the onboard interview — read only by Planner. Other agents get business context through Taskmaster tasks (~500 tokens saved per agent).
CONNECTION 6
Capability Map
Auto-generated from agent frontmatter. Planner reads this to route tasks by domain, not by name. Adding a new agent auto-updates the map.
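Connection 4's "versions from package files, not guessed" reduces to reading manifests instead of asking the model. A toy sketch with an invented package.json (the real generator also scans imports and ls output):

```python
import json

# Toy sketch: derive dependency facts from a package file so versions
# come from disk, not from the model's guess.
package_json = json.loads("""{
  "dependencies": {"express": "^4.19.2", "pg": "^8.12.0"}
}""")

facts = [f"- {name} {version} (from package.json)"
         for name, version in package_json["dependencies"].items()]
print("\n".join(facts))
```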

Design constraints

20-entry limit.
Not a bug.

Without the cap, 6 months of active use would give each agent 200+ entries. Context fills with noise. Valuable insights get buried.

Anti-bloat rules
MAX 20 ENTRIES PER AGENT
When full, remove the oldest LOW-value entry. Signal-to-noise stays high forever.
TRIGGER CONDITIONS
Only write when: root cause was non-obvious, fix caused regression, new pattern discovered, security finding was new. Not after every task.
NO DUPLICATION
If info is already in the agent's prompt or TASUKI.md, don't repeat in memory. One entry per insight.
PROMOTION PROTOCOL
Same pattern appears 2+ times in bugs/lessons → create a permanent heuristic. The insight graduates to a rule.
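The cap-and-evict rule is simple to sketch. A toy version, assuming each entry carries a value level and a date (the field names are invented; CRITICAL and HIGH entries are never auto-removed):

```python
MAX_ENTRIES = 20

def evict_if_full(entries: list[dict]) -> list[dict]:
    """Archive the oldest LOW-value entry once the cap is reached."""
    if len(entries) < MAX_ENTRIES:
        return entries
    low = [e for e in entries if e["value"] == "LOW"]
    if not low:
        return entries  # nothing eligible for archiving
    oldest = min(low, key=lambda e: e["date"])
    return [e for e in entries if e is not oldest]

# 19 LOW entries plus one CRITICAL entry hits the cap of 20.
entries = [{"date": f"2026-01-{d:02d}", "value": "LOW"} for d in range(1, 20)]
entries.append({"date": "2026-01-20", "value": "CRITICAL"})
trimmed = evict_if_full(entries)  # drops the oldest LOW; CRITICAL survives
```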
Growth path — scale without changing agents
Scale · Backend · Use when
0 · Wikilinks only · Offline, no infra
1 · rag-memory-mcp · Default — local SQLite
2 · Qdrant MCP · Team / high volume
3 · pgvector · Production / multi-team
The agent query never changes. Whether you're on Scale 0 or Scale 3, the agent runs the same query. Only the MCP backend changes. No agent prompt updates needed.
Memory by mode
fast — Wikilinks only · no RAG queries
standard — Wikilinks + RAG · when needed
serious — Wikilinks + RAG · mandatory at every stage

Why not just RAG?

The decisions
behind the design.

Individual .md nodes + wikilinks (vs. a single MEMORY.md per agent with appended lines): flat files become noise after 50 entries; individual nodes are navigable and compatible with Obsidian.
Behavioral memory — the agent writes it (vs. a PostToolUse hook that auto-extracts learnings): a shell script can't evaluate significance; "root cause required 3 investigation rounds" needs LLM reasoning, not regex.
No RAG/vector DB by default (vs. ChromaDB or Pinecone from the start): grep works for tens of notes; vector DBs are optimized for millions, and over-engineering creates dependencies and infra costs.
Max 20 entries per agent (vs. unlimited entries): without the limit, 6 months = 200+ entries per agent and context fills with noise; the limit forces curation.
Local SQLite for Layer 2 (vs. an external vector API such as OpenAI embeddings): $0, offline, <50ms queries, no API key required; works in air-gapped environments.
Agent-specific filtering via [[wikilinks]] (vs. loading all memory for every agent): Backend Dev doesn't need Frontend Dev's heuristics; filtering reduces context from ~5000 to ~700 tokens per agent.
Real data

70 entries. 148 KB.
Under 50ms.

Numbers from a real Django + PostgreSQL project after onboarding and one pipeline run.

$ tasuki vault sync
Indexing memory vault...     19 memories
Indexing database schema...   8 schema files
Indexing API endpoints...     13 API files
Indexing pipeline plans...    7 plan files
Indexing recent git history... 20 commits
Indexing config files...      3 config files

RAG deep memory: 70 entries indexed
Database: 148 KB · Query time: <50ms
$ tasuki vault query "authentication"
[schema] authentication models
  CustomAPIKey, JWT tokens...
[api]    authentication views
  login, register, logout endpoints...
[api]    authentication serializers
  token validation, password change...
[plan]   auth-logout-profile
  PRD + implementation details

4 results in 12ms
50+ entries indexed · 148 KB database · <50ms per query · $0 infrastructure cost

Common questions

Memory FAQ

Can I read and edit the memory myself?
Yes — they're plain .md files. Open memory-vault/ in Obsidian or any text editor. Edit, delete, add [[wikilinks]] to connect nodes. The vault is fully human-readable and human-editable.

What happens if I delete a memory file?
The agent simply won't load that memory next time. No errors, no broken references. The [[wikilink]] becomes a dead link — same as in Obsidian. You can run tasuki vault sync to update the RAG index after deleting.

Does my code or memory leave my machine?
No. The default backend (rag-memory-mcp) uses SQLite locally. Everything runs on your machine — no API calls, no cloud, no embeddings service. Works in air-gapped environments.

How does an episodic bug become a permanent heuristic?
When the same pattern appears 2+ times in bugs/ or lessons/, the agent creates a permanent entry in heuristics/. The original episodic entries remain for historical context, but the heuristic becomes the canonical rule. Example: two separate SQL injection bugs → one "always use parameterized queries" heuristic.

What happens when an agent hits the 20-entry cap?
The oldest LOW-value entry is archived (moved out, not deleted). CRITICAL and HIGH entries are never auto-removed. The cap keeps signal-to-noise high — without it, 6 months of use would give each agent 200+ entries and context would fill with noise.

Can I swap the vector backend later?
Yes. Change the MCP backend in .mcp.json. Replace rag-memory-mcp with qdrant-mcp or pgvector. The agent query never changes — only the backend. See the Growth Path table above for options.

Do all agents share the same memory?
No. Each agent reads only memories tagged with its [[wikilink]]. Backend Dev reads [[backend-dev]] entries (~18 memories, ~72 tokens). Security reads [[security]] entries. A memory can be tagged with multiple agents if it's relevant to more than one.

How is this different from Claude's built-in memory?
Claude's memory is per-conversation and resets between sessions. Tasuki's memory is per-project and persists forever. It's also structured (typed nodes with wikilinks) instead of flat notes, agent-specific instead of global, and visible/editable by you instead of hidden inside the model.

Your AI gets smarter
with every task.

Onboard once. The knowledge graph initializes automatically.

npm install -g tasuki
click to copy