Architecture

The 9-Stage Pipeline

Sequential by design — every stage has a reason to run exactly where it does. Each stage below shows what it does, which tools it uses, and how it hands off to the next.

1
Planner
Designs architecture · writes PRD · decomposes work · waits for your OK
Confirms with you · thinking · opus

What it reads

  • .tasuki/agents/planner.md — full role definition
  • project-context.md — business understanding from onboard interview
  • project-facts.md — verified stack, paths, commands
  • tasuki-plans/index.md — previous plans to avoid repetition
  • capability-map.yaml — which agents are available and their domains

What it produces

  • tasuki-plans/{feature}/prd.md — product requirements doc
  • tasuki-plans/{feature}/plan.md — implementation plan
  • tasuki-plans/{feature}/status.md — tracking file (updated by each stage)
  • One Taskmaster task per agent — ~50 tokens vs 2000 for full PRD
  • Brief plan summary shown to you before any code is written

MCP tools

Taskmaster MCP · Context7 MCP
Taskmaster decoupling: The Planner pushes one task per agent (~50 tokens each). When QA runs, it reads only its own task — not the full PRD. This saves ~12K tokens per pipeline.
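The decoupling above can be sketched as a plain function. The field names (`agent`, `task`, `plan_ref`) are illustrative, not Tasuki's actual task schema:

```python
# Hypothetical sketch: split one plan into small per-agent tasks so each
# agent reads ~50 tokens instead of the full PRD. Field names are assumed.

def decompose_plan(feature: str, assignments: dict[str, str]) -> list[dict]:
    """One Taskmaster task per agent, each pointing back at the full plan."""
    return [
        {"agent": agent, "task": task, "plan_ref": f"tasuki-plans/{feature}/plan.md"}
        for agent, task in assignments.items()
    ]

tasks = decompose_plan("jwt-auth", {
    "qa": "12 test cases, auth middleware",
    "dba": "users, sessions, refresh_tokens tables",
})
```

Each agent then fetches only its own entry; the full plan stays on disk for anyone who needs to drill down.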

Example output

[Planner] Feature: JWT Authentication
───────────────────────────────
Scope: Login, register, refresh, revoke
QA: 12 test cases, auth middleware
DBA: users, sessions, refresh_tokens
Backend: 4 routes, JWT service, middleware
Frontend: login/register/profile pages

Continue with implementation? [y/n]

Why it confirms with you

The Planner stops here because architecture decisions are the human's domain. Once you say yes, the full pipeline runs automatically on Claude Code. Aside from the Frontend design preview at Stage 5, this is the only confirmation the pipeline requires.

2
QA
Writes failing tests FIRST — the test is the specification
TDD enforced · execution · sonnet

Before it acts (MANDATORY)

  • Read project-facts.md
  • grep [[qa]] in memory-vault/heuristics/ — its permanent rules
  • grep [[qa]] in memory-vault/errors/ — mistakes to avoid

Schema & test protocol

  • If the model doesn't exist yet → BLOCK, report to DBA with schema needed
  • Tests MUST fail when written — if they pass, they're not testing new behavior
  • Writes test for every endpoint, every state, every auth edge case
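The "tests must fail when written" rule can be expressed as a small check. The outcome strings here are illustrative, not a real test-runner API:

```python
# Illustrative RED-phase check: newly written tests must ALL fail.
# Any test that already passes isn't specifying new behavior.

def verify_red_phase(results: dict[str, str]) -> list[str]:
    """Return names of tests that wrongly pass before implementation exists."""
    return [name for name, outcome in results.items() if outcome == "passed"]

offenders = verify_red_phase({
    "test_login_valid_credentials": "failed",
    "test_jwt_missing": "failed",
})
```

An empty result means the suite is a valid specification; a non-empty one means QA has to rewrite those tests.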

MCP tools

Taskmaster MCP · Playwright MCP · Context7 MCP

Example output

[QA] Writing failing tests (RED phase)
test_login_valid_credentials
test_login_wrong_password → 401
test_jwt_missing → 401 not 500
test_refresh_token_rotation
test_revoke_all_sessions
14 tests written · all failing ✓
Handoff → DBA: needs users, sessions tables

Why TDD is mechanically enforced

The tdd-guard hook intercepts every Write/Edit call. If you try to edit app/routers/users.py without tests/test_users.py existing → exit code 2, blocked. This is not advisory — it's a mechanical restriction. Without it, Backend Dev in Stage 4 could write implementation without any tests, and "testing" becomes "confirming what I already wrote works".
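A minimal sketch of the guard's logic, assuming a `tests/test_<name>.py` naming convention (the real hook's convention and implementation may differ):

```python
# Sketch of the tdd-guard idea: map a source file to its expected test file
# and block the edit when the test is missing. The naming convention and
# exit-code contract are assumptions, not Tasuki's actual hook code.

def expected_test_path(source: str) -> str:
    name = source.rsplit("/", 1)[-1].removesuffix(".py")
    return f"tests/test_{name}.py"

def guard(source: str, existing: set[str]) -> int:
    """Return 0 (allow the Write/Edit) or 2 (block it), mirroring exit codes."""
    return 0 if expected_test_path(source) in existing else 2

code = guard("app/routers/users.py", existing={"tests/test_users.py"})
```

The point is that the decision is mechanical: there is no prompt the agent can talk its way around, only a file that exists or doesn't.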

3
DB Architect
Zero-downtime migrations · verifies plan matches tests · production-safe DDL
execution · sonnet

Pipeline coordination check

  • Reads Planner's plan AND QA's tests
  • Verifies they agree on schema before creating anything
  • If discrepancy → reports to Planner before touching the DB

Production-safe DDL patterns

  • Column renames → expand-contract (never direct rename)
  • NOT NULL additions → add nullable first, backfill, then constrain
  • Large table defaults → batched update, never inline
  • Index creation → CONCURRENTLY to avoid table lock
  • Every migration has an upgrade() and downgrade()
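The "batched update, never inline" rule amounts to splitting a backfill into bounded id ranges. A sketch, using a hypothetical `users.plan` column:

```python
# Sketch of a batched backfill plan: split 1..max_id into bounded chunks so
# no single UPDATE holds a long lock. Table and column names are hypothetical.

def batch_ranges(max_id: int, batch_size: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) id ranges covering 1..max_id."""
    return [
        (start, min(start + batch_size - 1, max_id))
        for start in range(1, max_id + 1, batch_size)
    ]

statements = [
    f"UPDATE users SET plan = 'free' WHERE id BETWEEN {a} AND {b};"
    for a, b in batch_ranges(10_000, 4_000)
]
```

In a real migration each batch would be committed separately (and ideally throttled) so replication and concurrent writers keep up.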

MCP tools

Taskmaster MCP · Postgres MCP · Context7 MCP

Example output

[DB Arch] Creating migration
users table (id, email, hashed_pw)
sessions (id, user_id, token, expires)
refresh_tokens (id, session_id, hash)
INDEX CONCURRENTLY on sessions.token
RLS policy: users can only read own rows
Run: alembic upgrade head

Why DBA runs before Backend

Backend Dev imports from models. If the model doesn't exist, the import fails with ImportError. Running DBA first means Backend starts with tables that exist, indexes optimized, and models importable. Running in parallel would cause random import failures depending on race conditions.

4
Backend Dev
Runs failing tests first · implements until all pass · structured handoff to Frontend
execution · sonnet

Test consumption protocol

  • Runs QA's tests first — confirms they fail for the RIGHT reason
  • ImportError → DBA needs to create model → escalate
  • 404 → correct, route doesn't exist yet → implement
  • After implementation: ALL tests must pass (GREEN phase)
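The triage above can be sketched as a small classifier; the category names are illustrative:

```python
# Illustrative triage of a failing test's reason: the failure must be the
# RIGHT kind of failure before Backend Dev starts implementing.

def triage(failure: str) -> str:
    if "ImportError" in failure:
        return "escalate-to-dba"   # model missing: schema work comes first
    if "404" in failure:
        return "implement"         # route absent: the expected RED state
    return "investigate"           # anything else goes to the Debugger

outcome = triage("FAILED test_login -> 404")
```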

Implements

  • Routers / controllers / handlers
  • Service layer with business logic
  • Pydantic schemas / request-response models
  • Authentication middleware
  • Background jobs (Celery, ARQ, etc.)

Structured handoff to Frontend

  • Every new endpoint with method, path, auth requirements
  • Request/response schemas with types
  • Auth flows (JWT header format, token refresh endpoint)
  • Error codes and their meanings
  • New env vars needed
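One way to make that handoff machine-readable is a typed record. These field names mirror the bullet list above and are not an official Tasuki schema:

```python
# Hypothetical handoff record for one endpoint, Backend -> Frontend.
# Fields paraphrase the handoff bullet list; the shape is an assumption.
from dataclasses import dataclass, field

@dataclass
class EndpointHandoff:
    method: str
    path: str
    auth_required: bool
    request_schema: dict = field(default_factory=dict)
    response_schema: dict = field(default_factory=dict)
    error_codes: dict[int, str] = field(default_factory=dict)

login = EndpointHandoff(
    method="POST",
    path="/auth/login",
    auth_required=False,
    error_codes={401: "wrong credentials", 422: "malformed body"},
)
```

A structured record like this is what lets Frontend build against the contract instead of guessing at response shapes.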

Example output

[Backend] Running tests (should fail)
FAILED test_login → 404 ✓ expected
Implementing auth routes...
[Backend] Running tests (should pass)
PASSED 14/14 ✓
Handoff: POST /auth/login, POST /auth/register
POST /auth/refresh, DELETE /auth/logout
Bearer token required on: /users/*, /posts/*
⚡ test checkpoint — all pass → Stage 5 · any fail → Debugger (Stage 5.5)
5
Frontend Dev
Design preview first · accessible + responsive · all states covered
Design preview · execution · sonnet

Step 5a — Design reference

  • Option A — Figma: Figma MCP pulls exact specs → "Build with these specs?"
  • Option B — Stitch: Stitch MCP generates preview → you approve → then build
  • Option C — Skip: "Just build it" → uses project's existing design system

Step 5b — Implementation

  • Uses /ui-ux-pro-max skill: 161 rules, 67 styles, 57 font pairings
  • Responsive: mobile (320px), tablet (768px), desktop (1440px)
  • Accessibility: semantic HTML, ARIA labels, keyboard nav, 4.5:1 contrast
  • All states: loading, error, empty, success
  • XSS protection in every form and display

MCP tools

Taskmaster MCP · Figma MCP · Stitch MCP · Playwright MCP · Context7 MCP

Why Frontend goes after Backend

The frontend consumes the backend's endpoints. If both built in parallel, Frontend would assume a response structure that might not match what Backend implemented. The API contract is defined when Backend finishes — that's when Frontend can reliably build against it.

5.5
Debugger
Reactive — only activates on test failure · max 5 diagnostic steps · never restarts
Reactive · execution · sonnet

Investigation protocol

  • Step 1: Categorize symptom (crash, wrong data, slow, auth failure)
  • Step 2: Gather evidence (logs, DB state, git log, resource usage)
  • Step 3: Form hypotheses ranked by likelihood
  • Step 4: Test hypotheses (read code paths, check DB, verify auth)
  • Step 5: Confirm root cause — must explain ALL symptoms

Safety net

  • Max 5 diagnostic steps per investigation
  • If no root cause after 5 → UNCONFIRMED report → escalate to user
  • After fix fails: reads previous diagnosis + fix → investigates only the delta (never starts from zero)
  • Max 3 fix rounds before escalating

MCP tools

Sentry MCP · Postgres MCP · Context7 MCP

Example output

[Debugger] Tests failing at Stage 4
Symptom: 422 on POST /auth/login
Evidence: schema expects email_address, model has email
Root cause: field name mismatch DBA↔QA
Delegating fix → Backend Dev
Re-running tests...
14/14 PASSED ✓
⚡ test checkpoint — all pass → Stage 6 (Security) · any fail → Debugger again
6
Security
OWASP Top 10:2025 · variant analysis · always runs · no false positives accepted
Always runs · thinking · opus

4-phase audit

  • Phase 1: Automated scans (Semgrep MCP, language-specific rules)
  • Phase 2: OWASP Top 10:2025 checklist walk — 10 categories, 47+ items
  • Phase 3: Manual verification (project-specific attack vectors)
  • Phase 4: Variant analysis — for each CRITICAL/HIGH finding

Variant analysis (Phase 4)

  • Understand the root cause, not just the symptom
  • Search exact pattern across entire codebase
  • Identify abstraction points (wrappers, helpers)
  • Generalize search iteratively
  • Triage ALL instances as independent findings
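A toy version of variant analysis over in-memory sources: start from the exact vulnerable pattern, then add generalized patterns to catch sibling instances. The regexes are illustrative, not real Semgrep rules:

```python
# Toy variant analysis: search a set of sources with an exact pattern plus
# generalizations of it, and report every hit as an independent finding.
import re

def find_variants(files: dict[str, str], patterns: list[str]) -> dict[str, list[str]]:
    findings: dict[str, list[str]] = {}
    for path, source in files.items():
        hits = [p for p in patterns if re.search(p, source)]
        if hits:
            findings[path] = hits
    return findings

findings = find_variants(
    {"a.py": 'cursor.execute("SELECT * FROM users WHERE id=" + uid)',
     "b.py": 'cursor.execute(f"DELETE FROM posts WHERE id={pid}")'},
    [r'execute\(".*"\s*\+', r'execute\(f"'],   # exact pattern, then a variant
)
```

The second pattern exists because the first would miss the f-string sibling: same root cause, different surface syntax.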

False positive protocol

These rationalizations are explicitly rejected:

  • "It's only used internally" → internal services get compromised
  • "Input is validated elsewhere" → show me WHERE
  • "We'll fix it later" → accepted risk, not FP
  • "Same pattern everywhere" → N vulnerabilities, not zero

MCP tools

Semgrep MCP · Sentry MCP
Verdict: PASS / PASS (N accepted risks) / FAIL. CRITICAL unresolved → FAIL, pipeline stops.

Why Security runs after all code exists

You can't audit partial code. A security audit on half-written code produces false positives (auth is added later) and false negatives (you can't see component interactions). Security needs the complete system — backend + frontend + migrations — to find real vulnerabilities and their cross-layer interactions.

7
Reviewer
Reads every changed file · cross-references all layers · 3-round fix loop
Always runs · thinking · opus

What it reviews

  • Reads ALL changed files end-to-end (not just the diff)
  • Cross-references: models ↔ routes ↔ schemas ↔ tests ↔ migrations
  • Runs Semgrep MCP for patterns the eye might miss
  • Verifies test coverage for every new endpoint and feature

3-round fix loop

  • Round 1: Review → find issues → delegate fixes
  • Round 2: Re-review fixed files + files that import them (regression check)
  • Round 3: Final review — Clean → APPROVE · Broken → REQUEST CHANGES

Exit conditions

  • All CRITICAL + WARNING resolved → APPROVE
  • CRITICAL unresolved after 3 rounds → REQUEST CHANGES (NEVER approves)
  • Only SUGGESTIONs remaining → APPROVE with notes
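The exit conditions read naturally as a pure function. The severity labels follow the CRITICAL/WARNING/SUGGESTION levels used in the loop; the function shape itself is a sketch:

```python
# Sketch of the Reviewer's exit conditions: blocking severities gate approval,
# suggestions alone never block, and round 3 forces a final verdict.

def verdict(open_findings: list[str], rounds_used: int) -> str:
    blocking = [f for f in open_findings if f.startswith(("CRITICAL", "WARNING"))]
    if not blocking:
        return "APPROVE"          # nothing, or only SUGGESTIONs, remains
    return "REQUEST CHANGES" if rounds_used >= 3 else "CONTINUE"

v = verdict(["WARNING: missing test for token expiry"], rounds_used=1)
```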

Example output

[Reviewer] Round 1
CRITICAL: No rate limiting on /auth/login
WARNING: Missing test for token expiry
Delegating → Backend Dev...
[Reviewer] Round 2
Rate limiting fixed + tests added
No regressions detected
APPROVE
8–9
DevOps + Completion
Deploys · health checks · writes memory · sends summary
execution · sonnet

Stage 8 — DevOps

  • Updates Dockerfile / docker-compose if new services or ports
  • Updates CI/CD pipeline if new tests or env vars
  • Deploys or prepares deploy command
  • Checks Sentry MCP for errors after deploy

Stage 9 — Completion

  • Shows summary: feature name, stages run, files changed, tests passing
  • Lists next steps: env vars to set, migrations to run, docker restart needed
  • Writes to memory vault only if real insight (non-obvious root cause, new pattern)
  • Marks all Taskmaster tasks complete

Memory write conditions (Stage 9)

The agent only writes to the vault if one of these conditions is true — not after every task:

  • Root cause was non-obvious (required >2 investigation steps)
  • A fix introduced a regression that wasn't caught immediately
  • Discovered a pattern that applies to other parts of the codebase
  • A convention violation was caught by Reviewer
  • A security finding was new (not in OWASP checklist)
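The conditions above amount to a predicate; the flag names paraphrase the bullet list and are not Tasuki's internal field names:

```python
# Sketch of the memory-write gate: write to the vault only when at least one
# condition holds, never after every task. Flag names are paraphrases.

def should_write_memory(*, investigation_steps: int = 0,
                        regression_slipped: bool = False,
                        reusable_pattern: bool = False,
                        convention_violation: bool = False,
                        novel_security_finding: bool = False) -> bool:
    return (investigation_steps > 2      # root cause was non-obvious
            or regression_slipped
            or reusable_pattern
            or convention_violation
            or novel_security_finding)

write = should_write_memory(investigation_steps=4)
```

The default-False design makes "no write" the baseline, which is what keeps the vault from filling with noise.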

Execution modes

Stages run vary by complexity.

Capability-based routing skips agents when there's nothing for them to do. A pure bug fix doesn't need a DB migration.
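A toy sketch of capability-based routing: match the task's domains against each agent's declared domains and skip agents with nothing to do. The domain names and capability map are illustrative, not the real capability-map.yaml contents:

```python
# Toy capability router: agents whose domains intersect the task's domains
# run; Security and Reviewer always run. The map below is hypothetical.

CAPABILITIES = {
    "db-architect": {"schema", "migration"},
    "backend-dev": {"api", "service", "bugfix"},
    "frontend-dev": {"ui", "page"},
}
ALWAYS_RUN = ["security", "reviewer"]

def route(task_domains: set[str]) -> list[str]:
    matched = [a for a, d in CAPABILITIES.items() if d & task_domains]
    return matched + ALWAYS_RUN

agents = route({"bugfix"})   # a pure bug fix: no DBA, no Frontend
```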

fast — ~$0.26 typical · ~$0.15 – $0.50 · complexity 1–3

  • Planner — skippable
  • QA — runs
  • DB Architect — skipped
  • Backend Dev — runs
  • Frontend Dev — skipped
  • Security — runs
  • Reviewer — runs
  • DevOps — skipped
serious — ~$1–3 · varies with retries · complexity 8–10

  • All 9 agents — run
  • Memory (RAG) — mandatory every stage
  • Debugger rounds — up to 3
  • Security re-scans — up to 3
  • Review rounds — up to 3
  • Full variant analysis — all CRITICAL findings

Architecture decisions

Why this order.
Why these constraints.

Every pipeline decision was evaluated against alternatives. These aren't arbitrary — each has a specific consequence if changed.

Each entry lists the decision, the alternative evaluated, why this option won, and the consequence if changed.

  • Sequential pipeline — vs. Backend + Frontend in parallel. Why: the API is the contract; Frontend can only build against a finished Backend. If changed: API mismatches. Frontend builds 5 fields, Backend implements 7, discovered at integration.
  • QA before Dev (TDD) — vs. QA after Dev (test-after). Why: the test IS the specification; it defines expected behavior rather than validating existing code. If changed: tests become rubber-stamps, confirming what was already written instead of discovering bugs.
  • Security + Reviewer at end — vs. Security parallel with Dev. Why: you can't audit partial code; auth might be added later, and components interact in ways not visible mid-build. If changed: false positives (blocking on issues Dev was going to fix) and false negatives (cross-layer interactions go unseen).
  • 9 specialized agents — vs. 1 general agent run sequentially. Why: observability; when a test fails, you know exactly which stage to look at. If changed: failure diagnosis becomes archaeology, and a DBA who writes routers bypasses Security review.
  • tdd-guard mechanical block — vs. an advisory rule in the agent prompt. Why: prompts can be ignored; exit code 2 cannot. If changed: TDD becomes optional, Backend Dev skips tests when rushed, and quality collapses silently.
  • Taskmaster per-agent tasks — vs. passing the full PRD to every agent. Why: each agent reads ~50 tokens for its task instead of ~2000 for the full PRD. If changed: ~12K extra tokens per pipeline with no improvement in agent behavior; agents only need their slice.
  • Thinking model for Planner/Security/Reviewer — vs. Sonnet for all agents. Why: planning, security auditing, and code review need deep reasoning. If changed: the Planner misses architectural edge cases, Security produces surface-level audits, and the Reviewer approves bugs.
Real output

What a pipeline run
actually looks like.

From a real Django project — "add overdue loans endpoint". Full pipeline, standard mode.

$ tasuki progress
Pipeline: add overdue loans endpoint
████████████████████░ 89% (8/9)
Mode: standard | Status: running

Stages:
  ✓ Planner (done) — 02:25:49
  ✓ QA (done) — 02:25:55
  ✓ DB-Architect (done) — 02:26:10
  ✓ Backend-Dev (done) — 02:26:45
  ✓ Test Checkpoint — all pass
  ✓ Security (done) — 02:27:30
  ✓ Reviewer — APPROVED
  → Completion (running)
7 stages executed · 14 tests written · 0 security findings · ~$0.68 total cost

Common questions

Pipeline FAQ

Why does the pipeline run sequentially?

Because each stage consumes the output of the previous one. Backend Dev imports models that DBA created. Frontend calls endpoints that Backend built. Security audits code that exists. Running in parallel causes import errors, API mismatches, and incomplete audits. The order isn't arbitrary — it's dependency-driven.
Do all 9 stages always run?

No — stages skip automatically when they're not needed. A bug fix doesn't trigger DB Architect or Frontend Dev. tasuki mode fast skips Planner for simple tasks. The capability-based routing decides which agents run based on the task description. Security and Reviewer always run — those are non-negotiable.
What happens when tests fail?

At test checkpoints, if tests fail → the Debugger activates (Stage 5.5). It diagnoses the root cause in max 5 steps, delegates the fix to the right agent, and re-runs tests. If tests still fail after 3 rounds, it escalates to you. The pipeline never silently continues with failing tests.
What happens if my session is interrupted?

The pipeline resumes from where it stopped. Each stage updates tasuki-plans/{feature}/status.md with checkboxes. When you say "continue" in a new session, the AI reads the status file, sees which stages are marked [x], and continues from the first [ ].
Can I add my own agents?

Yes. Create a .md file in .tasuki/agents/ with frontmatter (domains, triggers, priority, activation). Run tasuki discover — it auto-registers in the capability map. The Planner will route tasks to your custom agent when its domains match. See CONTRIBUTING.md for the format.
Does it work with tools other than Claude Code?

Yes, with different levels. Claude Code gets automatic pipeline execution via Agent() sub-agents + mechanical hooks. Other tools get role-switching — the AI follows instructions in order but can't enforce hooks mechanically. The pipeline still runs, but TDD guard becomes advisory instead of blocking. Run tasuki onboard . --target=cursor to generate for your tool.
Why is there a 3-round limit on fix loops?

Without it, review loops can go infinite — the Reviewer finds an issue, Dev fixes it, the fix introduces a new issue, Reviewer catches it, Dev fixes that... The 3-round limit forces escalation to you. In practice, 95% of issues are resolved in round 1. Round 3 means something architectural is wrong and needs human judgment.
What's the difference between Debugger, Reviewer, and Doctor?

Debugger activates during a pipeline when tests fail — it diagnoses root causes and delegates fixes. Reviewer runs at the end to gate quality — it approves or requests changes. Doctor is a CLI command (tasuki doctor) that checks your Tasuki installation health outside of any pipeline — missing files, stale configs, broken hooks.

See it run on
your project.

10 seconds to onboard. Your AI assistant becomes a team.

npm install -g tasuki