Architecture

The 9-Stage Pipeline

Sequential by design — every stage has a reason to run exactly where it does. Each stage below shows what it does, which tools it uses, and how it hands off to the next.

1
Planner
Designs architecture · writes PRD · decomposes work · waits for your OK
Confirms with you · thinking · opus

What it reads

  • .tasuki/agents/planner.md — full role definition
  • project-context.md — business understanding from onboard interview
  • project-facts.md — verified stack, paths, commands
  • tasuki-plans/index.md — previous plans to avoid repetition
  • capability-map.yaml — which agents are available and their domains

What it produces

  • tasuki-plans/{feature}/prd.md — product requirements doc
  • tasuki-plans/{feature}/plan.md — implementation plan
  • tasuki-plans/{feature}/status.md — tracking file (updated by each stage)
  • One Taskmaster task per agent — ~50 tokens vs 2000 for full PRD
  • Brief plan summary shown to you before any code is written

MCP tools

Taskmaster MCP · Context7 MCP
Taskmaster decoupling: The Planner pushes one task per agent (~50 tokens each). When QA runs, it reads only its own task — not the full PRD. This saves ~12K tokens per pipeline.
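The decoupling above can be sketched as a plain function. The field names (`agent`, `task`, `plan_ref`) are illustrative, not Tasuki's actual task schema:

```python
# Hypothetical sketch: split one plan into small per-agent tasks so each
# agent reads ~50 tokens instead of the full PRD. Field names are assumed.

def decompose_plan(feature: str, assignments: dict[str, str]) -> list[dict]:
    """One Taskmaster task per agent, each pointing back at the full plan."""
    return [
        {"agent": agent, "task": task, "plan_ref": f"tasuki-plans/{feature}/plan.md"}
        for agent, task in assignments.items()
    ]

tasks = decompose_plan("jwt-auth", {
    "qa": "12 test cases, auth middleware",
    "dba": "users, sessions, refresh_tokens tables",
})
```

Each agent then fetches only its own entry; the full plan stays on disk for anyone who needs to drill down.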

Example output

[Planner] Feature: JWT Authentication
───────────────────────────────
Scope: Login, register, refresh, revoke
QA: 12 test cases, auth middleware
DBA: users, sessions, refresh_tokens
Backend: 4 routes, JWT service, middleware
Frontend: login/register/profile pages

Continue with implementation? [y/n]

Why it confirms with you

The Planner stops here because architecture decisions are the human's domain. Once you say yes, the full pipeline runs automatically on Claude Code. Aside from the Frontend design preview at Stage 5, this is the only confirmation the pipeline requires.

2
QA
Writes failing tests FIRST — the test is the specification
TDD enforced · execution · sonnet

Before it acts (MANDATORY)

  • Read project-facts.md
  • grep [[qa]] in memory-vault/heuristics/ — its permanent rules
  • grep [[qa]] in memory-vault/errors/ — mistakes to avoid

Schema & test protocol

  • If the model doesn't exist yet → BLOCK, report to DBA with schema needed
  • Tests MUST fail when written — if they pass, they're not testing new behavior
  • Writes test for every endpoint, every state, every auth edge case
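The "tests must fail when written" rule can be expressed as a small check. The outcome strings here are illustrative, not a real test-runner API:

```python
# Illustrative RED-phase check: newly written tests must ALL fail.
# Any test that already passes isn't specifying new behavior.

def verify_red_phase(results: dict[str, str]) -> list[str]:
    """Return names of tests that wrongly pass before implementation exists."""
    return [name for name, outcome in results.items() if outcome == "passed"]

offenders = verify_red_phase({
    "test_login_valid_credentials": "failed",
    "test_jwt_missing": "failed",
})
```

An empty result means the suite is a valid specification; a non-empty one means QA has to rewrite those tests.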

MCP tools

Taskmaster MCP · Playwright MCP · Context7 MCP

Example output

[QA] Writing failing tests (RED phase)
test_login_valid_credentials
test_login_wrong_password → 401
test_jwt_missing → 401 not 500
test_refresh_token_rotation
test_revoke_all_sessions
14 tests written · all failing ✓
Handoff → DBA: needs users, sessions tables

Why TDD is mechanically enforced

The tdd-guard hook intercepts every Write/Edit call. If you try to edit app/routers/users.py without tests/test_users.py existing → exit code 2, blocked. This is not advisory — it's a mechanical restriction. Without it, Backend Dev in Stage 4 could write implementation without any tests, and "testing" becomes "confirming what I already wrote works".
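A minimal sketch of the guard's logic, assuming a `tests/test_<name>.py` naming convention (the real hook's convention and implementation may differ):

```python
# Sketch of the tdd-guard idea: map a source file to its expected test file
# and block the edit when the test is missing. The naming convention and
# exit-code contract are assumptions, not Tasuki's actual hook code.

def expected_test_path(source: str) -> str:
    name = source.rsplit("/", 1)[-1].removesuffix(".py")
    return f"tests/test_{name}.py"

def guard(source: str, existing: set[str]) -> int:
    """Return 0 (allow the Write/Edit) or 2 (block it), mirroring exit codes."""
    return 0 if expected_test_path(source) in existing else 2

code = guard("app/routers/users.py", existing={"tests/test_users.py"})
```

The point is that the decision is mechanical: there is no prompt the agent can talk its way around, only a file that exists or doesn't.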

3
DB Architect
Zero-downtime migrations · verifies plan matches tests · production-safe DDL
execution · sonnet

Pipeline coordination check

  • Reads Planner's plan AND QA's tests
  • Verifies they agree on schema before creating anything
  • If discrepancy → reports to Planner before touching the DB

Production-safe DDL patterns

  • Column renames → expand-contract (never direct rename)
  • NOT NULL additions → add nullable first, backfill, then constrain
  • Large table defaults → batched update, never inline
  • Index creation → CONCURRENTLY to avoid table lock
  • Every migration has an upgrade() and downgrade()
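The "batched update, never inline" rule amounts to splitting a backfill into bounded id ranges. A sketch, using a hypothetical `users.plan` column:

```python
# Sketch of a batched backfill plan: split 1..max_id into bounded chunks so
# no single UPDATE holds a long lock. Table and column names are hypothetical.

def batch_ranges(max_id: int, batch_size: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) id ranges covering 1..max_id."""
    return [
        (start, min(start + batch_size - 1, max_id))
        for start in range(1, max_id + 1, batch_size)
    ]

statements = [
    f"UPDATE users SET plan = 'free' WHERE id BETWEEN {a} AND {b};"
    for a, b in batch_ranges(10_000, 4_000)
]
```

In a real migration each batch would be committed separately (and ideally throttled) so replication and concurrent writers keep up.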

MCP tools

Taskmaster MCP · Postgres MCP · Context7 MCP

Example output

[DB Arch] Creating migration
users table (id, email, hashed_pw)
sessions (id, user_id, token, expires)
refresh_tokens (id, session_id, hash)
INDEX CONCURRENTLY on sessions.token
RLS policy: users can only read own rows
Run: alembic upgrade head

Why DBA runs before Backend

Backend Dev imports from models. If the model doesn't exist, the import fails with ImportError. Running DBA first means Backend starts with tables that exist, indexes optimized, and models importable. Running in parallel would cause random import failures depending on race conditions.

4
Backend Dev
Runs failing tests first · implements until all pass · structured handoff to Frontend
execution · sonnet

Test consumption protocol

  • Runs QA's tests first — confirms they fail for the RIGHT reason
  • ImportError → DBA needs to create model → escalate
  • 404 → correct, route doesn't exist yet → implement
  • After implementation: ALL tests must pass (GREEN phase)
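The triage above can be sketched as a small classifier; the category names are illustrative:

```python
# Illustrative triage of a failing test's reason: the failure must be the
# RIGHT kind of failure before Backend Dev starts implementing.

def triage(failure: str) -> str:
    if "ImportError" in failure:
        return "escalate-to-dba"   # model missing: schema work comes first
    if "404" in failure:
        return "implement"         # route absent: the expected RED state
    return "investigate"           # anything else goes to the Debugger

outcome = triage("FAILED test_login -> 404")
```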

Implements

  • Routers / controllers / handlers
  • Service layer with business logic
  • Pydantic schemas / request-response models
  • Authentication middleware
  • Background jobs (Celery, ARQ, etc.)

Structured handoff to Frontend

  • Every new endpoint with method, path, auth requirements
  • Request/response schemas with types
  • Auth flows (JWT header format, token refresh endpoint)
  • Error codes and their meanings
  • New env vars needed
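One way to make that handoff machine-readable is a typed record. These field names mirror the bullet list above and are not an official Tasuki schema:

```python
# Hypothetical handoff record for one endpoint, Backend -> Frontend.
# Fields paraphrase the handoff bullet list; the shape is an assumption.
from dataclasses import dataclass, field

@dataclass
class EndpointHandoff:
    method: str
    path: str
    auth_required: bool
    request_schema: dict = field(default_factory=dict)
    response_schema: dict = field(default_factory=dict)
    error_codes: dict[int, str] = field(default_factory=dict)

login = EndpointHandoff(
    method="POST",
    path="/auth/login",
    auth_required=False,
    error_codes={401: "wrong credentials", 422: "malformed body"},
)
```

A structured record like this is what lets Frontend build against the contract instead of guessing at response shapes.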

Example output

[Backend] Running tests (should fail)
FAILED test_login → 404 ✓ expected
Implementing auth routes...
[Backend] Running tests (should pass)
PASSED 14/14 ✓
Handoff: POST /auth/login, POST /auth/register
POST /auth/refresh, DELETE /auth/logout
Bearer token required on: /users/*, /posts/*
⚡ test checkpoint — all pass → Stage 5 · any fail → Debugger (Stage 5.5)
5
Frontend Dev
Design preview first · accessible + responsive · all states covered
Design preview · execution · sonnet

Step 5a — Design reference

  • Option A — Figma: Figma MCP pulls exact specs → "Build with these specs?"
  • Option B — Stitch: Stitch MCP generates preview → you approve → then build
  • Option C — Skip: "Just build it" → uses project's existing design system

Step 5b — Implementation

  • Uses /ui-ux-pro-max skill: 161 rules, 67 styles, 57 font pairings
  • Responsive: mobile (320px), tablet (768px), desktop (1440px)
  • Accessibility: semantic HTML, ARIA labels, keyboard nav, 4.5:1 contrast
  • All states: loading, error, empty, success
  • XSS protection in every form and display

MCP tools

Taskmaster MCP · Figma MCP · Stitch MCP · Playwright MCP · Context7 MCP

Why Frontend goes after Backend

The frontend consumes the backend's endpoints. If both built in parallel, Frontend would assume a response structure that might not match what Backend implemented. The API contract is defined when Backend finishes — that's when Frontend can reliably build against it.

5.5
Debugger
Reactive — only activates on test failure · max 5 diagnostic steps · never restarts
Reactive · execution · sonnet

Investigation protocol

  • Step 1: Categorize symptom (crash, wrong data, slow, auth failure)
  • Step 2: Gather evidence (logs, DB state, git log, resource usage)
  • Step 3: Form hypotheses ranked by likelihood
  • Step 4: Test hypotheses (read code paths, check DB, verify auth)
  • Step 5: Confirm root cause — must explain ALL symptoms

Safety net

  • Max 5 diagnostic steps per investigation
  • If no root cause after 5 → UNCONFIRMED report → escalate to user
  • After fix fails: reads previous diagnosis + fix → investigates only the delta (never starts from zero)
  • Max 3 fix rounds before escalating

MCP tools

Sentry MCP · Postgres MCP · Context7 MCP

Example output

[Debugger] Tests failing at Stage 4
Symptom: 422 on POST /auth/login
Evidence: schema expects email_address, model has email
Root cause: field name mismatch DBA↔QA
Delegating fix → Backend Dev
Re-running tests...
14/14 PASSED ✓
⚡ test checkpoint — all pass → Stage 6 (Security) · any fail → Debugger again
6
Security
OWASP Top 10:2025 · variant analysis · always runs · no false positives accepted
Always runs · thinking · opus

4-phase audit

  • Phase 1: Automated scans (Semgrep MCP, language-specific rules)
  • Phase 2: OWASP Top 10:2025 checklist walk — 10 categories, 47+ items
  • Phase 3: Manual verification (project-specific attack vectors)
  • Phase 4: Variant analysis — for each CRITICAL/HIGH finding

Variant analysis (Phase 4)

  • Understand the root cause, not just the symptom
  • Search exact pattern across entire codebase
  • Identify abstraction points (wrappers, helpers)
  • Generalize search iteratively
  • Triage ALL instances as independent findings
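A toy version of variant analysis over in-memory sources: start from the exact vulnerable pattern, then add generalized patterns to catch sibling instances. The regexes are illustrative, not real Semgrep rules:

```python
# Toy variant analysis: search a set of sources with an exact pattern plus
# generalizations of it, and report every hit as an independent finding.
import re

def find_variants(files: dict[str, str], patterns: list[str]) -> dict[str, list[str]]:
    findings: dict[str, list[str]] = {}
    for path, source in files.items():
        hits = [p for p in patterns if re.search(p, source)]
        if hits:
            findings[path] = hits
    return findings

findings = find_variants(
    {"a.py": 'cursor.execute("SELECT * FROM users WHERE id=" + uid)',
     "b.py": 'cursor.execute(f"DELETE FROM posts WHERE id={pid}")'},
    [r'execute\(".*"\s*\+', r'execute\(f"'],   # exact pattern, then a variant
)
```

The second pattern exists because the first would miss the f-string sibling: same root cause, different surface syntax.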

False positive protocol

These rationalizations are explicitly rejected:

  • "It's only used internally" → internal services get compromised
  • "Input is validated elsewhere" → show me WHERE
  • "We'll fix it later" → accepted risk, not FP
  • "Same pattern everywhere" → N vulnerabilities, not zero

MCP tools

Semgrep MCP · Sentry MCP
Verdict: PASS / PASS (N accepted risks) / FAIL. CRITICAL unresolved → FAIL, pipeline stops.

Why Security runs after all code exists

You can't audit partial code. A security audit on half-written code produces false positives (auth is added later) and false negatives (you can't see component interactions). Security needs the complete system — backend + frontend + migrations — to find real vulnerabilities and their cross-layer interactions.

7
Reviewer
Reads every changed file · cross-references all layers · 3-round fix loop
Always runs · thinking · opus

What it reviews

  • Reads ALL changed files end-to-end (not just the diff)
  • Cross-references: models ↔ routes ↔ schemas ↔ tests ↔ migrations
  • Runs Semgrep MCP for patterns the eye might miss
  • Verifies test coverage for every new endpoint and feature

3-round fix loop

  • Round 1: Review → find issues → delegate fixes
  • Round 2: Re-review fixed files + files that import them (regression check)
  • Round 3: Final review — Clean → APPROVE · Broken → REQUEST CHANGES

Exit conditions

  • All CRITICAL + WARNING resolved → APPROVE
  • CRITICAL unresolved after 3 rounds → REQUEST CHANGES (NEVER approves)
  • Only SUGGESTIONs remaining → APPROVE with notes
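The exit conditions read naturally as a pure function. The severity labels follow the CRITICAL/WARNING/SUGGESTION levels used in the loop; the function shape itself is a sketch:

```python
# Sketch of the Reviewer's exit conditions: blocking severities gate approval,
# suggestions alone never block, and round 3 forces a final verdict.

def verdict(open_findings: list[str], rounds_used: int) -> str:
    blocking = [f for f in open_findings if f.startswith(("CRITICAL", "WARNING"))]
    if not blocking:
        return "APPROVE"          # nothing, or only SUGGESTIONs, remains
    return "REQUEST CHANGES" if rounds_used >= 3 else "CONTINUE"

v = verdict(["WARNING: missing test for token expiry"], rounds_used=1)
```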

Example output

[Reviewer] Round 1
CRITICAL: No rate limiting on /auth/login
WARNING: Missing test for token expiry
Delegating → Backend Dev...
[Reviewer] Round 2
Rate limiting fixed + tests added
No regressions detected
APPROVE
8–9
DevOps + Completion
Deploys · health checks · writes memory · sends summary
execution · sonnet

Stage 8 — DevOps

  • Updates Dockerfile / docker-compose if new services or ports
  • Updates CI/CD pipeline if new tests or env vars
  • Deploys or prepares deploy command
  • Checks Sentry MCP for errors after deploy

Stage 9 — Completion

  • Shows summary: feature name, stages run, files changed, tests passing
  • Lists next steps: env vars to set, migrations to run, docker restart needed
  • Writes to memory vault only if real insight (non-obvious root cause, new pattern)
  • Marks all Taskmaster tasks complete

Memory write conditions (Stage 9)

The agent only writes to the vault if one of these conditions is true — not after every task:

  • Root cause was non-obvious (required >2 investigation steps)
  • A fix introduced a regression that wasn't caught immediately
  • Discovered a pattern that applies to other parts of the codebase
  • A convention violation was caught by Reviewer
  • A security finding was new (not in OWASP checklist)
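The conditions above amount to a predicate; the flag names paraphrase the bullet list and are not Tasuki's internal field names:

```python
# Sketch of the memory-write gate: write to the vault only when at least one
# condition holds, never after every task. Flag names are paraphrases.

def should_write_memory(*, investigation_steps: int = 0,
                        regression_slipped: bool = False,
                        reusable_pattern: bool = False,
                        convention_violation: bool = False,
                        novel_security_finding: bool = False) -> bool:
    return (investigation_steps > 2      # root cause was non-obvious
            or regression_slipped
            or reusable_pattern
            or convention_violation
            or novel_security_finding)

write = should_write_memory(investigation_steps=4)
```

The default-False design makes "no write" the baseline, which is what keeps the vault from filling with noise.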

Execution modes

Stages run vary by complexity.

Capability-based routing skips agents when there's nothing for them to do. A pure bug fix doesn't need a DB migration.
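A toy sketch of capability-based routing: match the task's domains against each agent's declared domains and skip agents with nothing to do. The domain names and capability map are illustrative, not the real capability-map.yaml contents:

```python
# Toy capability router: agents whose domains intersect the task's domains
# run; Security and Reviewer always run. The map below is hypothetical.

CAPABILITIES = {
    "db-architect": {"schema", "migration"},
    "backend-dev": {"api", "service", "bugfix"},
    "frontend-dev": {"ui", "page"},
}
ALWAYS_RUN = ["security", "reviewer"]

def route(task_domains: set[str]) -> list[str]:
    matched = [a for a, d in CAPABILITIES.items() if d & task_domains]
    return matched + ALWAYS_RUN

agents = route({"bugfix"})   # a pure bug fix: no DBA, no Frontend
```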

fast — ~$0.26 typical · ~$0.15 – $0.50 · complexity 1–3

  • Planner — skippable
  • QA — runs
  • DB Architect — skipped
  • Backend Dev — runs
  • Frontend Dev — skipped
  • Security — runs
  • Reviewer — runs
  • DevOps — skipped
serious — ~$1–3 · varies with retries · complexity 8–10

  • All 9 agents — run
  • Memory (RAG) — mandatory every stage
  • Debugger rounds — up to 3
  • Security re-scans — up to 3
  • Review rounds — up to 3
  • Full variant analysis — all CRITICAL findings

Architecture decisions

Why this order.
Why these constraints.

Every pipeline decision was evaluated against alternatives. These aren't arbitrary — each has a specific consequence if changed.

Each entry lists the decision, the alternative evaluated, why this option won, and the consequence if changed.

  • Sequential pipeline — vs. Backend + Frontend in parallel. Why: the API is the contract; Frontend can only build against a finished Backend. If changed: API mismatches. Frontend builds 5 fields, Backend implements 7, discovered at integration.
  • QA before Dev (TDD) — vs. QA after Dev (test-after). Why: the test IS the specification; it defines expected behavior rather than validating existing code. If changed: tests become rubber-stamps, confirming what was already written instead of discovering bugs.
  • Security + Reviewer at end — vs. Security parallel with Dev. Why: you can't audit partial code; auth might be added later, and components interact in ways not visible mid-build. If changed: false positives (blocking on issues Dev was going to fix) and false negatives (cross-layer interactions go unseen).
  • 9 specialized agents — vs. 1 general agent run sequentially. Why: observability; when a test fails, you know exactly which stage to look at. If changed: failure diagnosis becomes archaeology, and a DBA who writes routers bypasses Security review.
  • tdd-guard mechanical block — vs. an advisory rule in the agent prompt. Why: prompts can be ignored; exit code 2 cannot. If changed: TDD becomes optional, Backend Dev skips tests when rushed, and quality collapses silently.
  • Taskmaster per-agent tasks — vs. passing the full PRD to every agent. Why: each agent reads ~50 tokens for its task instead of ~2000 for the full PRD. If changed: ~12K extra tokens per pipeline with no improvement in agent behavior; agents only need their slice.
  • Thinking model for Planner/Security/Reviewer — vs. Sonnet for all agents. Why: planning, security auditing, and code review need deep reasoning. If changed: the Planner misses architectural edge cases, Security produces surface-level audits, and the Reviewer approves bugs.
Real output

What a pipeline run
actually looks like.

From a real Django project — "add overdue loans endpoint". Full pipeline, standard mode.

$ tasuki progress
Pipeline: add overdue loans endpoint
████████████████████░ 89% (8/9)
Mode: standard | Status: running

Stages:
  ✓ Planner (done) — 02:25:49
  ✓ QA (done) — 02:25:55
  ✓ DB-Architect (done) — 02:26:10
  ✓ Backend-Dev (done) — 02:26:45
  ✓ Test Checkpoint — all pass
  ✓ Security (done) — 02:27:30
  ✓ Reviewer — APPROVED
  → Completion (running)
7 stages executed · 14 tests written · 0 security findings · ~$0.68 total cost

Common questions

Pipeline FAQ

Why does the pipeline run sequentially?

Because each stage consumes the output of the previous one. Backend Dev imports models that DBA created. Frontend calls endpoints that Backend built. Security audits code that exists. Running in parallel causes import errors, API mismatches, and incomplete audits. The order isn't arbitrary — it's dependency-driven.
Do all 9 stages always run?

No — stages skip automatically when they're not needed. A bug fix doesn't trigger DB Architect or Frontend Dev. tasuki mode fast skips Planner for simple tasks. The capability-based routing decides which agents run based on the task description. Security and Reviewer always run — those are non-negotiable.
What happens when tests fail?

At test checkpoints, if tests fail → the Debugger activates (Stage 5.5). It diagnoses the root cause in max 5 steps, delegates the fix to the right agent, and re-runs tests. If tests still fail after 3 rounds, it escalates to you. The pipeline never silently continues with failing tests.
What happens if my session is interrupted?

The pipeline resumes from where it stopped. Each stage updates tasuki-plans/{feature}/status.md with checkboxes. When you say "continue" in a new session, the AI reads the status file, sees which stages are marked [x], and continues from the first [ ].
Can I add my own agents?

Yes. Create a .md file in .tasuki/agents/ with frontmatter (domains, triggers, priority, activation). Run tasuki discover — it auto-registers in the capability map. The Planner will route tasks to your custom agent when its domains match. See CONTRIBUTING.md for the format.
Does it work with tools other than Claude Code?

Yes, with different levels. Claude Code gets automatic pipeline execution via Agent() sub-agents + mechanical hooks. Other tools get role-switching — the AI follows instructions in order but can't enforce hooks mechanically. The pipeline still runs, but TDD guard becomes advisory instead of blocking. Run tasuki onboard . --target=cursor to generate for your tool.
Why is there a 3-round limit on fix loops?

Without it, review loops can go infinite — the Reviewer finds an issue, Dev fixes it, the fix introduces a new issue, Reviewer catches it, Dev fixes that... The 3-round limit forces escalation to you. In practice, 95% of issues are resolved in round 1. Round 3 means something architectural is wrong and needs human judgment.
What's the difference between Debugger, Reviewer, and Doctor?

Debugger activates during a pipeline when tests fail — it diagnoses root causes and delegates fixes. Reviewer runs at the end to gate quality — it approves or requests changes. Doctor is a CLI command (tasuki doctor) that checks your Tasuki installation health outside of any pipeline — missing files, stale configs, broken hooks.

See it run on
your project.

10 seconds to onboard. Your AI assistant becomes a team.

npm install -g tasuki