A single binary. Zero infrastructure. PlanDB gives AI agents a compound task graph — recursive hierarchy with cross-level dependencies — so they can decompose complex work, parallelize safely, and adapt plans mid-flight. The graph structure itself is the scheduling algorithm: ready = run now, pending = blocked, multiple ready = parallelize.
We gave a single Claude Code instance one sentence: "Build a GPT from scratch in Rust, then train it to do tool calling." PlanDB handled the rest.
The agent autonomously built a 3,769-line transformer with full backpropagation in pure Rust (no ML frameworks), then designed and executed a 7-method RL experiment comparing SFT, REINFORCE, DPO, rejection sampling, and more. The task graph evolved from 6 planned tasks to 20 through mid-flight adaptation — splitting tasks when they proved complex, spawning parallel workers for independent experiments, and pivoting when REINFORCE catastrophically collapsed.
t-tc-design ──▶ t-sft ─────┬──▶ t-rl-reinforce ──┐
├──▶ t-rl-dpo ─────────┤
├──▶ t-rl-custom ──────┼──▶ t-compare ──▶ t-final
├──▶ t-sft-v2 ─────────┤ (fan-in)
├──▶ t-sft-v3 ─────────┤
└──▶ t-sft-reject ─────┘
| Method | Format Acc | Tool Acc | Composite |
|---|---|---|---|
| Rejection Sampling | 71.3% | 70.0% | 0.601 |
| SFT Baseline | 66.3% | 63.8% | 0.577 |
| DPO | 65.0% | 62.5% | 0.570 |
| REINFORCE | 0.0% | 0.0% | 0.090 |
Try it yourself — pre-trained weights included, runs instantly:
cd experiments/mini-gpt-rust && cargo run --release -- --demoFull source, experiment logs, and analysis: experiments/mini-gpt-rust/
More experiments in
experiments/— docs sites built autonomously by Codex, Claude Code, and Gemini CLI, each using PlanDB for task orchestration.
| The problem | What PlanDB does |
|---|---|
| Your agent can't think beyond one step. It finishes a task and forgets the big picture. | The graph IS the big picture. Every task knows what depends on it, what it depends on, and where it sits in the hierarchy. |
| You can't parallelize safely. Two agents grab the same work, or one starts before its dependency finishes. | Atomic claiming + dependency enforcement. plandb list --status ready = exactly what's safe to parallelize right now. |
| Plans break on contact with reality. Agent discovers something mid-flight, but the plan is frozen. | Split, insert, pivot, replan — six mid-flight adaptation primitives. The graph evolves as the agent learns. |
| Flat task lists can't model real work. "Backend" depends on "Schema" but also on "Auth" from a different team. | Compound graph: containment (hierarchy) and dependencies (flow) are orthogonal. Cross-level deps just work. |
| You have no idea what's blocking everything. Agent is busy, but is it on the critical path? | plandb critical-path shows the longest chain. plandb bottlenecks shows what's blocking the most work. plandb what-unlocks shows impact. |
| Agent output quality is hope-based. You pray it checked its work. | Pre/post conditions on every task. Post-condition shown on completion: "verify this before moving on." |
| Every project starts from zero. Same decomposition patterns reinvented every time. | plandb export saves a project's structure as a template. plandb import applies it. Best decompositions compound. |
# macOS / Linux
curl -fsSL https://github.com/Agent-Field/plandb/releases/latest/download/plandb-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m) -o /usr/local/bin/plandb && chmod +x /usr/local/bin/plandb
# From source
cargo install --path .plandb init "my-project"
plandb add "Design the API" --as design --description "Define REST endpoints and auth strategy"
plandb add "Build backend" --dep t-design --description "Implement the API spec"
plandb add "Write tests" --dep t-design --description "Integration tests for all endpoints"
plandb go # claim next ready task
# ... work ...
plandb done --next # complete + claim next (no ID needed)PlanDB is the task planning layer for AgentField — the open-source AI backend that lets you build and run AI agents like APIs. Use them together or independently.
| Project | What it does | Link |
|---|---|---|
| AgentField | AI backend — build and run AI agents like APIs | Agent-Field/agentfield |
| PlanDB | Agent-native task planning — compound graph for decomposition and coordination | you are here |
| SWE-AF | Autonomous software engineering — fleet of agents that plan, code, test, and ship PRs | Agent-Field/SWE-AF |
| PR-AF | AI-native multi-agent pull request reviewer | Agent-Field/pr-af |
| CloudSecurity-AF | AI-native cloud security scanner — attack path analysis from IaC | Agent-Field/cloudsecurity-af |
SWE-AF uses PlanDB internally to decompose engineering tasks into parallel agent workstreams — it's the reference implementation for PlanDB-driven multi-agent orchestration.
Generate a ready-to-use system prompt for any AI agent:
plandb prompt --for cli # system prompt for CLI agents (Codex, Claude Code, Gemini, Aider)
plandb prompt --for mcp # MCP config JSON for Claude Code / Cursor / Windsurf
plandb prompt --for http # REST API instructions for custom agentsOr copy this minimal prompt into your agent's instructions:
You have plandb installed for task planning. Use it to decompose work and track progress.
Core loop: plandb go → work → plandb done --next
Add tasks: plandb add "title" --description "detailed spec" --dep t-xxx
Split: plandb split --into "A, B, C" (independent) or "A > B > C" (chain)
Introspect: plandb critical-path | plandb bottlenecks | plandb what-unlocks <id>
Status: plandb status --detail
After each completion, reassess: plandb status --detail + plandb critical-path.
Plans are hypotheses — adapt as you learn.
When plandb list --status ready shows multiple tasks, parallelize them.
See examples/ for complete scripts running PlanDB with Codex, Claude Code, and Gemini CLI.
Two orthogonal structures composed together — more general than a DAG, hierarchical DAG, or hypergraph:
- Place graph (containment): tasks contain subtasks recursively, to any depth
- Link graph (dependencies): edges between tasks at ANY level, crossing containment boundaries
A subtask at depth 3 can depend on a task at depth 0 in a different branch. Nesting doesn't constrain flow. Composite tasks auto-complete when all children finish.
plandb go # claim next ready task (no --agent needed, no IDs)
plandb done --next # complete current + claim nextplandb critical-path # what to prioritize — longest chain to completion
plandb bottlenecks # what's blocking the most downstream work
plandb what-unlocks t-abc # impact of completing a specific task
plandb watch # live dashboardPlans are hypotheses. Agents discover reality during execution — new tasks emerge, dependencies shift, approaches change. PlanDB has six mid-flight primitives:
plandb split --into "A, B, C" # decompose a task that turned out complex
plandb task insert --after t-a --before t-b # add a missed step, rewire deps automatically
plandb task pivot t-abc --file new.yaml # replace an entire subtree with a new approach
plandb task amend t-abc --prepend "NOTE: ..." # annotate future tasks with discoveries
plandb task replan t-abc --file revised.yaml # cancel + recreate subtasks
plandb what-if cancel t-abc # preview effects before committingThe graph reshapes itself as agents learn. Dependencies rewire. Ready tasks recalculate. No replanning ceremony — just mutate and go.
plandb add "Implement API" \
--pre "schema must define all endpoints" \
--post "all routes return valid JSON" \
--description "..."plandb export > fullstack.yaml # save a successful project's structure
plandb import fullstack.yaml # apply pattern to new projectplandb init "project" # create project
plandb add "title" --description "spec" --dep t-xxx # add task
plandb add "title" --as custom-id --kind code # custom ID, typed
plandb go # claim next ready
plandb done --next # complete + claim next
plandb done --result '{"key": "value"}' # complete with dataplandb split --into "A, B, C" # independent subtasks
plandb split --into "A > B > C" # linear chain
plandb task decompose t-abc --file subtasks.yaml # from YAML
plandb use t-abc # scope into subtree
plandb use .. # scope outplandb status --detail # dependency tree
plandb status --full # compound graph view
plandb critical-path # longest chain
plandb bottlenecks # blocking tasks
plandb what-unlocks t-abc # downstream impact
plandb watch # live dashboardplandb task insert --after t-a --before t-b --title "X" # insert step
plandb task amend t-abc --prepend "NOTE: ..." # annotate
plandb task pivot t-abc --file new-plan.yaml # replace subtree
plandb what-if cancel t-abc # preview effects
plandb export > template.yaml # save pattern
plandb import template.yaml # apply patternPLANDB_AGENT=w1 plandb go # atomic claim (no double-assignment)
PLANDB_AGENT=w1 plandb done --next--kind:generic,code,research,review,test,shell--dep: upstream must exist first. Types:feeds_into(default),blocks,suggests--description: always include — it's the work order, not the title- IDs: short (
t-k3m9), custom (--as api→t-api), fuzzy-matched on typos
| Interface | Command | Use Case |
|---|---|---|
| CLI | plandb <command> |
Direct agent use |
| MCP | plandb mcp |
Claude Code, Cursor, Windsurf |
| HTTP | plandb serve --port 8484 |
Custom agents, webhooks |
PlanDB uses a compound graph — see docs/ARCHITECTURE.md for the full design: why two orthogonal structures, cross-level dependency mechanics, comparison with flat DAGs/hypergraphs, composite auto-completion, and when each structure matters.
Apache License 2.0
