Software that builds itself, pays for its own inference, and gets better with every run.
Bardo is a Rust monorepo where the build system, the inference layer, the payment protocol, the EVM simulator, and the autonomous agents are all parts of the same organism. Mori orchestrates fleets of AI agents to write code. The gateway caches and routes their inference calls, cutting costs 40-85%. The agents pay for that inference with USDC through MPP. The services they build register as MCP tool servers and start earning revenue. The revenue funds the next build. The system remembers what worked, extracts patterns, and injects them into future agents so they don't repeat past mistakes.
Build, deploy, earn, learn, build better. The loop closes.
```
Specification (PRD)
        |
        v
Mori decomposes into plans, enriches with AST-extracted context
        |
        v
Agent swarm executes in parallel across isolated git worktrees
        |
        v                                Gateway caches inference,
Gates verify (compile, test, review) <-- routes across providers,
        |                                tracks cost per request
        v
Merge, deploy, register as MCP server
        |
        v
Service earns USDC via x402/MPP micropayments
        |
        v
Revenue funds next build cycle
        |
        v
Memory system extracts patterns, promotes to playbook rules
        |
        v
Next build is faster, cheaper, more accurate
```
Each piece below works independently. Together they form a closed loop where software funds its own evolution.
bardo-gateway -- inference proxy
demo1.mp4
An HTTP proxy between your agents and LLM providers. Point any Anthropic or OpenAI SDK at port 4000 by changing the base URL. You get three cache layers, five provider backends, request normalization, cost tracking, tool pruning, batch processing, and USDC micropayments without touching application code.
```shell
ANTHROPIC_API_KEY=sk-... cargo run -p bardo-gateway
# All agent traffic now routes through http://localhost:4000
```

Every request passes through three cache layers before hitting a provider. Each targets a different class of waste.
L1: Hash cache. BLAKE3 hash of the normalized request body. In-memory moka LRU, sub-millisecond lookup. Catches identical repeated requests, retries, and deterministic prompts. In a typical agent run, 10-15% of requests are exact duplicates. Before hashing, three normalization passes run: UUID/timestamp stripping (replaces per-invocation noise with [VAR]), tool definition sorting (alphabetical by name, making hash order-independent), and JSON key ordering via BTreeMap (eliminates serializer-dependent ordering). These three passes increase L1 hit rates 15-25% over raw request hashing.
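The normalize-then-hash idea can be sketched in a few lines. This is a stand-in, not the gateway's code: std's `BTreeMap` and `DefaultHasher` substitute for the real serde-based normalizer and BLAKE3, and the "request" is a flat key/value map.

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Canonicalize a flat set of request fields. BTreeMap iterates in key
/// order, so insertion order never changes the serialized bytes.
fn canonical_bytes(fields: &BTreeMap<&str, &str>) -> String {
    fields
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join("&")
}

/// Hash the canonical form (stand-in for BLAKE3 over the normalized body).
fn request_key(fields: &BTreeMap<&str, &str>) -> u64 {
    let mut h = DefaultHasher::new();
    canonical_bytes(fields).hash(&mut h);
    h.finish()
}

fn main() {
    let mut a = BTreeMap::new();
    a.insert("model", "opus");
    a.insert("prompt", "explain the auth middleware");
    let mut b = BTreeMap::new();
    b.insert("prompt", "explain the auth middleware");
    b.insert("model", "opus");
    // Insertion order differs; canonical bytes and hash key do not.
    assert_eq!(request_key(&a), request_key(&b));
}
```

The same property is what makes the L1 cache survive serializer quirks: identical content always produces identical bytes, so the hash lookup hits.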
L2: Semantic cache. Two backends: SimHash (default, 64-bit fingerprint, Hamming distance <= 3, ~50us for 10K entries, pure CPU) or fastembed ONNX embeddings (opt-in, cosine >= 0.92, ~3-5ms, better accuracy). Catches rephrased prompts. "Explain the auth middleware" and "what does the auth middleware do?" are different strings but match semantically. Tool-use responses are excluded because replaying cached tool IDs breaks subsequent turns. Entries persist to SQLite every 60 seconds and restore on restart.
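The SimHash distance check reduces to XOR plus popcount, which is why a 10K-entry scan stays in microseconds. A sketch with illustrative fingerprints (real ones come from hashing token features of the prompt):

```rust
/// Hamming distance between two 64-bit SimHash fingerprints.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// A cached entry is a semantic hit when its fingerprint is within
/// `max_dist` bits of the query's (the gateway default above is 3).
fn is_semantic_hit(query: u64, cached: u64, max_dist: u32) -> bool {
    hamming(query, cached) <= max_dist
}

fn main() {
    let cached: u64 = 0b1011_0110;
    let near = cached ^ 0b0100;      // 1 bit away: rephrased prompt, hit
    let far = cached ^ 0xFFFF_0000;  // 16 bits away: different prompt, miss
    assert!(is_semantic_hit(near, cached, 3));
    assert!(!is_semantic_hit(far, cached, 3));
}
```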
L3: Prompt prefix cache. Anthropic caches KV state for shared prompt prefixes at a 90% discount on input tokens. The gateway injects cache_control: {"type": "ephemeral"} headers into system prompts and restructures requests so the cacheable prefix (system prompt + tool definitions + shared context) is maximized across agents. An 80K-token prompt where 60K is cached prefix saves $0.27 per call at Opus pricing. Over a 200-request session, that's $54. All JSON serialization uses BTreeMap for deterministic key ordering: same content produces same bytes produces cache hits.
Combined: 40-85% cost reduction depending on workload repetitiveness. Measured on a production agent run: $182 actual vs $5,352 naive cost (96.6% reduction at 85% cache hit rate).
| Provider | What it does |
|---|---|
| Anthropic | Primary. Key rotation across up to 10 keys (ANTHROPIC_API_KEY through ANTHROPIC_API_KEY_10). Round-robins on rate limit. |
| OpenAI | Chat Completions passthrough. Serves gpt-*, o3, o4-mini requests. |
| OpenRouter | Aggregator with 400+ models. Good fallback when primary providers rate-limit. Separate Anthropic rate limits. |
| Venice | TEE-attested inference with zero data retention. The gateway has a three-tier security classifier (Standard/Confidential/Private) that scans requests via deterministic keyword matching, no LLM calls. Eight triggers (PortfolioComposition, MevSensitive, RebalanceTiming, DealNegotiation, GovernanceDeliberation, CounterpartyAnalysis, DeathReflection, OwnerPii) route sensitive content to Venice automatically. Private-classified requests never fall back to a retaining provider. DIEM budget tracker monitors Venice-specific credit allocations. |
| Bankr | Self-funding agent wallets. An agent that earns revenue can pay for its own inference without a human paying API bills. The metabolic loop monitor tracks the sustainability ratio (daily_revenue / daily_inference_cost). When ratio >= 1.0, the agent is self-sustaining and its economic death clock stops. Credit balance with vault fee conversion, model tier routing based on credit budget, cross-model verification for high-stakes actions (second model confirms parsed intent before execution), auto-replenish and throttle policies. |
All providers implement a Provider trait. Adding a new one means implementing send, parse_response, and report_cost. The gateway handles caching, normalization, and cost tracking uniformly.
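The real trait is async and carries richer request, response, and error types, but its shape is roughly this sketch (the signatures here are assumptions, not the crate's actual API):

```rust
struct Request { body: String }
struct Response { body: String, input_tokens: u64, output_tokens: u64 }

/// Hypothetical shape of the gateway's Provider trait.
trait Provider {
    /// Send the (already normalized) request to the upstream API.
    fn send(&self, req: &Request) -> Result<String, String>;
    /// Decode the provider-specific wire format into a common Response.
    fn parse_response(&self, raw: &str) -> Result<Response, String>;
    /// Report cost in USD so the gateway can track it uniformly.
    fn report_cost(&self, resp: &Response) -> f64;
}

/// A mock backend showing how little a new provider must implement.
struct MockProvider;

impl Provider for MockProvider {
    fn send(&self, req: &Request) -> Result<String, String> {
        Ok(format!("echo:{}", req.body))
    }
    fn parse_response(&self, raw: &str) -> Result<Response, String> {
        Ok(Response { body: raw.to_string(), input_tokens: 10, output_tokens: 5 })
    }
    fn report_cost(&self, resp: &Response) -> f64 {
        // Flat per-token pricing, purely for the sketch.
        (resp.input_tokens + resp.output_tokens) as f64 * 1e-6
    }
}

fn main() {
    let p = MockProvider;
    let raw = p.send(&Request { body: "hi".into() }).unwrap();
    let resp = p.parse_response(&raw).unwrap();
    assert!(p.report_cost(&resp) > 0.0);
}
```

Because caching and normalization happen before `send` and cost tracking happens after `report_cost`, a new backend inherits all of it for free.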
Failover: when a provider errors or rate-limits, the gateway falls through to the next by priority. Privacy override: failover is constrained to Venice for Private-classified content.
Agents defined with 30 tools include all 30 definitions in every request, 100-500 tokens each, 2-13K tokens of dead weight if only 3 are used. The ToolTracker watches per-session tool usage and strips unused definitions after 5 requests. Saves 2-5K tokens/request. At Sonnet pricing, that compounds across hundreds of requests.
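The track-then-prune idea in miniature. The struct shape and warmup mechanism are illustrative, not the gateway's actual `ToolTracker`:

```rust
use std::collections::HashSet;

/// Tracks which tools a session actually calls; after `warmup` requests,
/// unused tool definitions are stripped from outgoing requests.
struct ToolTracker {
    warmup: u32,
    requests_seen: u32,
    used: HashSet<String>,
}

impl ToolTracker {
    fn new(warmup: u32) -> Self {
        Self { warmup, requests_seen: 0, used: HashSet::new() }
    }

    fn record_request(&mut self, tools_called: &[&str]) {
        self.requests_seen += 1;
        for t in tools_called {
            self.used.insert(t.to_string());
        }
    }

    /// Keep every definition during warmup; afterwards, only used ones.
    fn prune<'a>(&self, defined: &[&'a str]) -> Vec<&'a str> {
        if self.requests_seen < self.warmup {
            return defined.to_vec();
        }
        defined.iter().copied().filter(|t| self.used.contains(*t)).collect()
    }
}

fn main() {
    let mut tracker = ToolTracker::new(5);
    for _ in 0..5 {
        tracker.record_request(&["read_file"]);
    }
    // After the warmup window, the two never-used tools are stripped.
    let pruned = tracker.prune(&["read_file", "write_file", "bash"]);
    assert_eq!(pruned, vec!["read_file"]);
}
```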
Non-urgent work (enrichment, summarization, pattern extraction) routes through Anthropic's Batch API at 50% cost. Auto-flush at 50 items or 30 seconds. In a typical mori build, 40-60% of inference spend is non-urgent, so that's a 20-30% reduction on total build cost stacking on top of caching savings.
HTTP 402-based USDC micropayments that let agents pay for inference the same way they pay for gas, with a wallet and a signature. No API keys, no accounts, no invoices.
The protocol is a two-step HTTP exchange. Client sends a request without payment; gateway returns 402 with a PaymentRequired quote (amount in USDC base units on Base chain 8453, recipient wallet, expiry, nonce, cost breakdown showing provider cost and spread). Client signs an ERC-3009 transferWithAuthorization using EIP-712 typed data, retries with the signature in the X-Payment header. Gateway verifies off-chain via ecrecover (no RPC call needed), serves the request, returns a Payment-Receipt with the actual amount charged.
Two intents: Charge (one-shot, per-request signature) and Session (pre-funded balance with per-request draws, no re-signing). Sessions are stored in a concurrent DashMap with SQLite persistence and background TTL expiration. Close returns unused funds.
Reputation-tiered spread: 20% default, 18% Basic (5+ builds), 15% Verified (25+ builds, >90% gate pass), 12% Trusted (100+ builds), 8% Sovereign (500+ builds).
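The tier table maps directly to a lookup function. A sketch using only the build-count thresholds above (the real Verified tier also checks gate pass rate, omitted here):

```rust
/// Reputation-tiered spread, in percent, keyed by completed builds.
fn spread_pct(builds: u32) -> u32 {
    match builds {
        0..=4 => 20,     // default
        5..=24 => 18,    // Basic
        25..=99 => 15,   // Verified (gate-pass check omitted)
        100..=499 => 12, // Trusted
        _ => 8,          // Sovereign
    }
}

/// Price a request in USDC base units: provider cost plus the spread.
fn quote_usdc_base_units(provider_cost: u64, builds: u32) -> u64 {
    provider_cost + provider_cost * spread_pct(builds) as u64 / 100
}

fn main() {
    assert_eq!(spread_pct(0), 20);
    // 1 USDC of provider cost (6 decimals) at Verified tier: 15% spread.
    assert_eq!(quote_usdc_base_units(1_000_000, 25), 1_150_000);
}
```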
The protocol primitives live in the standalone mpp crate, usable by any Rust service building pay-per-request APIs.
Every response carries headers: X-Mori-Cost-Usd (actual), X-Mori-Naive-Cost-Usd (what you'd pay without the gateway), X-Mori-Savings-Usd (the delta), X-Mori-Cache-Status (hash-hit/semantic-hit/prefix-hit/miss). Per-model, per-session, and per-key breakdowns persist to SQLite. WebSocket live dashboard at /v1/ws/stats broadcasts per-request events in real time.
mori -- build orchestrator
demo3.mp4
53,000 lines of Rust. Takes a specification, decomposes it into dependency-ordered plans, engineers targeted context for each one, and dispatches a fleet of AI agents in parallel across isolated git worktrees to implement, test, review, and merge the results.
```shell
./mori.sh 01-09 --express   # 20 concurrent agents, no reviews, maximum speed
./mori.sh 01-09 --dry-run   # print DAG, wave breakdown, no agents spawned
```

The bottleneck in AI-assisted development is not model quality. It is context. Give an LLM the right 3,000 tokens and it writes correct code. Give it 30,000 tokens of noise and it hallucinates. Nine layers of context engineering compose to reduce input tokens by 76% and increase gate pass rate from 65% to 92%.
| Layer | What it does | Cost | Impact |
|---|---|---|---|
| AST extraction | Tree-sitter parses source into signatures, types, imports. 6ms/file initial, sub-ms incremental. 100+ languages. | $0 | 10-50x token reduction vs reading full files |
| Workspace index | Symbol graph with PageRank ranking, biased per-task via files listed in task TOML. Top-50 symbols cover 80% of cross-file references. Cached in SQLite, keyed by content hash. | $0, 2ms/query | Finds context grep misses |
| Semantic search | HDC hypervector fingerprints (10,240-bit, XOR binding, 50ns Hamming distance) for structural matching + optional CodeRankEmbed ONNX embeddings (137M params, local, 10-50ms) for conceptual matching. Hybrid reranking combines semantic + keyword (ripgrep) + AST (tree-sitter query) signals. | $0, 15ms | 94% retrieval accuracy vs 62% for grep |
| Change detection | Blake3 content hashing at symbol granularity via Merkle tree. Only changed symbols invalidate downstream artifacts. Typical edit invalidates 2-5 plans, not 110. | $0, 1ms | Eliminates redundant re-enrichment |
| Prefix alignment | BTreeMap JSON serialization for deterministic byte ordering. Same content = same bytes = cache hit. Tool definitions, function schemas, structured context all use BTreeMap. | $0, 5ms | 91% Anthropic prefix cache hit rate |
| Compression | Structural: tree-sitter extracts signatures without implementations (200-line function becomes 3-line signature). Token-level: 4.2x compression ratio. | $0, 50ms | Half the tokens, same information |
| Research agent | Cheap agent explores codebase before planning to ground plans in actual code state. | $0.10, 30s | Prevents plans based on assumptions |
| Extended thinking | Claude extended thinking at architectural decision points. | $0.30 | Reduces plan structure errors |
| Quality gates | Static analysis + LLM-judge rubric scoring. | $0.02, 25s | 94% first-pass gate rate |
A task costing $2.50 via Claude Code direct costs ~$0.42 through mori. The savings come from every layer and compound multiplicatively.
Plans declare dependencies in YAML frontmatter. Kahn's algorithm computes execution waves. But plan-level parallelism leaves performance on the table when two plans in the same wave touch different files. The UnifiedTaskDag goes deeper: builds a file-conflict graph across all tasks in all plans in a wave, partitions into independent groups via union-find, and dispatches groups concurrently. For a 20-plan project, this extracts 3-4x more parallelism than wave scheduling alone.
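The group-partitioning step is plain union-find over file overlaps: merge any two tasks that touch a common file, and whatever components remain are safe to dispatch concurrently. A sketch (the real UnifiedTaskDag tracks far more per task):

```rust
use std::collections::HashMap;

/// Minimal union-find with path compression.
struct UnionFind { parent: Vec<usize> }

impl UnionFind {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect() }
    }
    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root;
        }
        self.parent[x]
    }
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb { self.parent[ra] = rb; }
    }
}

/// Tasks that touch a common file are merged into one group; the number
/// of groups is the number of independently dispatchable batches.
fn conflict_groups(tasks: &[Vec<&str>]) -> usize {
    let mut uf = UnionFind::new(tasks.len());
    let mut owner: HashMap<&str, usize> = HashMap::new();
    for (i, files) in tasks.iter().enumerate() {
        for &f in files {
            match owner.get(f) {
                Some(&j) => uf.union(i, j),
                None => { owner.insert(f, i); }
            }
        }
    }
    (0..tasks.len()).filter(|&i| uf.find(i) == i).count()
}

fn main() {
    // Tasks 0 and 1 both touch auth.rs, so they serialize; task 2 is free.
    let tasks = vec![vec!["auth.rs"], vec!["auth.rs", "db.rs"], vec!["ui.rs"]];
    assert_eq!(conflict_groups(&tasks), 2);
}
```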
26 specialized roles across three backends (Claude Code, Cursor, Codex). Backend inferred from model slug. Mix in one run: opus implementer via Claude, haiku scribe via Cursor, gpt-class fixer via Codex.
Each plan runs a state machine: Preflight -> Strategist -> Implementer -> Compile Gate -> Test Gate -> Review (parallel Architect + Auditor + Scribe) -> Critic Verdict -> Merge. Failures loop back with cumulative DO NOT RETRY lists built from compiler errors, review blockers, and diff stats. Up to 8 iterations. Golden-path plans (first-try success) get indexed by category and fed as examples to future decompositions.
The Conductor monitors all running agents: nudges silent agents after 300s, restarts stalled agents after 600s, aborts stuck phases after 1800s. Escalates model tiers on failure (haiku -> sonnet -> opus). Manages spawn priority (implementers = 0/highest, conductor = 7/lowest).
Not every task needs the same model. Six classification dimensions (complexity, category, quality, speed, reasoning, context weight) determine model selection per task. A trivial config task routes to Haiku ($0.80/M input). A complex cross-module refactor routes to Opus ($15/M input). Classification costs fractions of a cent per task. Budget-aware degradation: as spend increases, remaining tasks route to cheaper models. Gates are model-agnostic, code compiles regardless of which model wrote it.
Every task execution writes a LanceDB episode (files changed, model used, tokens, cost, gate pass, iterations, HDC fingerprint, embedding). When 5+ similar episodes share a common outcome, mori extracts a pattern. Patterns that correctly predict outcomes across 5+ subsequent builds get promoted to playbook rules. Rules are injected directly into agent context at task start. Rules that stop being accurate auto-demote.
HDC fingerprints enable 50ns pattern matching, 1000x faster than embedding-based lookup. Project-agnostic (encode structural characteristics, not identifiers), so patterns transfer across codebases.
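The core HDC operations are bitwise and branch-free, which is where the speed comes from. A toy 256-bit sketch (the production vectors are 10,240-bit):

```rust
const WORDS: usize = 4; // 4 x 64 = 256 bits for the sketch

type Hv = [u64; WORDS];

/// Bind two hypervectors: XOR, which is its own inverse.
fn bind(a: &Hv, b: &Hv) -> Hv {
    let mut out = [0u64; WORDS];
    for i in 0..WORDS { out[i] = a[i] ^ b[i]; }
    out
}

/// Permute: rotate the vector left by one word (a cheap permutation).
fn permute(a: &Hv) -> Hv {
    let mut out = [0u64; WORDS];
    for i in 0..WORDS { out[i] = a[(i + 1) % WORDS]; }
    out
}

/// Similarity is Hamming distance: XOR plus popcount per word.
fn hamming(a: &Hv, b: &Hv) -> u32 {
    (0..WORDS).map(|i| (a[i] ^ b[i]).count_ones()).sum()
}

fn main() {
    let a: Hv = [1, 2, 3, 4];
    let b: Hv = [5, 6, 7, 8];
    // bind is self-inverse: binding with b twice recovers a.
    assert_eq!(bind(&bind(&a, &b), &b), a);
    assert_eq!(hamming(&a, &a), 0);
}
```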
Context budget allocation learns per task category. Auth tasks converge toward 30% playbook, 25% state. Config tasks converge toward 35% state, 5% types. Over 100 builds, the allocation optimizes itself.
Each agent gets its own git worktree on its own branch (codex/plan/{name}). No shared mutable state. All worktrees share a single sccache instance with normalized base directories (SCCACHE_BASEDIRS), so the second plan compiling a shared dependency gets a near-instant cache hit.
Mori compiles bardo-gateway as a library. When --gateway is set (default), it starts on port 4000 as a background tokio task and routes all agent inference through it. Three-layer caching, five providers, tool pruning, MPP, batch API, all active by default.
Ratatui application with 10 views, 26 widgets, 12 modal dialogs. ROSEDUST palette (rose on violet-black, CRT scanlines, phosphor effects). Dashboard shows wave progress, active agents, token sparklines (braille-rendered), gate results, review verdicts, per-agent/per-plan/per-milestone budget tracking. Agent pool displays live output streams. Inject messages to running agents with i.
Crash reports to .mori/runs/ on panics and errors (backtrace, app state, recent logs, error signature). Supervisor script watches and restarts. All state on disk, restart picks up where it left off.
mori-service -- paid builds over HTTP
The service layer that turns mori into something anyone can pay to run. Describe what you want, mori prices it, you fund it with USDC, it builds while you watch costs stream in real time.
Draft -> Proposal -> Run -> Delivery -> Settlement. Each phase has a payment pattern: x402 micropayments for drafting (pennies per interaction), MPP session or ERC-8183 escrow for funded builds (dollars), x402 again for mid-build adjustments.
Proposal engine classifies complexity (Trivial through Epic) via heuristic keyword analysis, no LLM calls. Prices each task against the gateway's rate table with model tier distributions, cache hit rate modeling, and a 15% retry buffer. Proposals break down cost by milestone, by type (inference vs compute), and by plan. Draft costs already spent are deducted.
Mid-build: top up budget, reduce scope (skip plans), or add features. Each adjustment is incremental, no renegotiating the whole proposal. SSE event stream carries cost headers on every event. Budget alerts at configurable thresholds.
GitHub App integration: issue opened -> mori comments proposal -> thumbsup to approve -> mori builds -> opens PR with cost breakdown and verification checklist. Slash commands: /mori review, /mori investigate, /mori fix, /mori run 03-05, /mori cost.
Twitter bot: @mention with a build request -> quote reply with scope/cost/time -> reply "BUILD" to approve. Simple mode (under threshold) or conversational mode (multi-turn refinement). Rate limiting, account age checks, allowlist/blocklist.
SQLite persistence (7 tables, WAL mode). Auth: API key (mori_sk_*) with read/write/admin scopes + SIWE (Sign-In with Ethereum).
mirage-rs -- EVM fork simulator
A local Ethereum node for development, like Anvil but connected to live chains. Forks mainnet state lazily over RPC, keeps watched contracts in sync block-by-block, and gives you the full eth_* / evm_* / anvil_* manipulation API. Drop-in replacement. Existing Foundry, Hardhat, and Viem tooling works unchanged.
```shell
mirage-rs --rpc-url https://eth-mainnet.g.alchemy.com/v2/KEY --ws-url wss://eth-mainnet.g.alchemy.com/v2/KEY
```

Where Anvil forks at a pinned block and freezes, mirage-rs follows the chain forward. A WebSocket subscriber watches newHeads, filters each block for transactions touching watched contracts, and replays only those locally. For a typical DeFi portfolio (3-10 positions): ~5-15 transactions per block instead of the full ~150. Blocks process in <100ms at steady state.
Contracts enter the watch list three ways: manual (mirage_watchContract), auto-classification (diff classifier sees 3+ storage slot writes on a new address and promotes it), or contagion (replayed transaction writes to a new contract that crosses the threshold, recursively extending the watch list across composability chains).
DirtyStore (local writes) -> ReadCache (LRU + TTL, <1us hot reads) -> UpstreamRpc (token-bucket rate-limited lazy fetches)
Reads flow top-down, first hit wins. Writes go into the dirty overlay and never touch upstream. On first access, balances/nonces/storage/bytecode are fetched and cached. You get a mutable view of mainnet state without syncing anything.
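The read path can be sketched as three maps checked in order. This is a toy model: a HashMap stands in for the upstream RPC, and slot values are plain integers.

```rust
use std::collections::HashMap;

/// Three-layer read path: dirty overlay, then read cache, then upstream.
/// Writes only ever touch the dirty overlay; upstream stays pristine.
struct ForkStore {
    dirty: HashMap<String, u64>,    // local writes (DirtyStore)
    cache: HashMap<String, u64>,    // lazily filled (ReadCache)
    upstream: HashMap<String, u64>, // stand-in for lazy RPC fetches
}

impl ForkStore {
    fn write(&mut self, slot: &str, value: u64) {
        self.dirty.insert(slot.to_string(), value);
    }

    fn read(&mut self, slot: &str) -> Option<u64> {
        if let Some(v) = self.dirty.get(slot) { return Some(*v); }
        if let Some(v) = self.cache.get(slot) { return Some(*v); }
        // Cache miss: lazily fetch from "upstream" and remember it.
        let v = *self.upstream.get(slot)?;
        self.cache.insert(slot.to_string(), v);
        Some(v)
    }
}

fn main() {
    let mut store = ForkStore {
        dirty: HashMap::new(),
        cache: HashMap::new(),
        upstream: HashMap::from([("slot0".to_string(), 100)]),
    };
    assert_eq!(store.read("slot0"), Some(100)); // lazy fetch, now cached
    store.write("slot0", 42);                   // dirty overlay only
    assert_eq!(store.read("slot0"), Some(42));  // local view wins
    assert_eq!(store.upstream["slot0"], 100);   // upstream untouched
}
```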
Scenarios fork from a shared baseline using CoW overlays (~12.8KB per branch vs ~3.2MB for a full clone). Run parallel what-if simulations cheaply. Sequential mode reverts between runs; parallel mode uses independent branches that can't observe each other's mutations.
Scenario sets support TOML fixtures with transaction sequences, tracked addresses, gas budgets, timeouts, and assertions (balance checks, watch list membership, custom invariants). Included scenarios: Uniswap V3 entry, ETH crash selloff, Aave liquidation, new pool deployment, volume spike.
Three profiles (micro/standard/power) with memory ceilings (256MB/512MB/2GB). Tiered pressure response: evict LRU cache at 50%, demote auto-classified contracts to slot-only reads at 70%, fall to proxy mode (disable replay) at 90%. Process checks available memory at startup and exits if the profile doesn't fit.
Beyond full Anvil/Hardhat compatibility: mirage_mintERC20 (auto-detects balance storage slots), mirage_watchContract/mirage_unwatchContract, mirage_getPosition (DeFi position snapshots), mirage_subscribeEvents (WebSocket event stream with address/topic filters), scenario sets (mirage_beginScenarioSet, mirage_defineScenario, mirage_runScenarioSet, mirage_compareScenarios), and resource introspection (mirage_getResourceUsage, mirage_setResourceLimits).
golem -- mortal autonomous agents
demo4.mp4
A Golem is a mortal autonomous agent compiled as a single Rust binary. It has a wallet, a strategy, a knowledge base, and a finite lifespan. It runs on a VM (local or Fly.io), connects to chain, and makes decisions on every tick of its heartbeat.
Each tick runs a nine-step cognitive cycle:

1. Observe market state
2. Retrieve relevant memories from the grimoire (three-tier: LanceDB episodes, SQLite patterns, procedural playbook)
3. Appraise through the daimon affect engine (PAD vectors from Mehrabian's Pleasure-Arousal-Dominance model)
4. Generate candidate actions via three-tier inference (T0 rule-based -> T1 light model -> T2 full model, with cost-aware cascade escalation through the gateway)
5. Simulate outcomes in mirage-rs
6. Apply safety constraints via PolicyCage (capability-based auth with taint tracking, sealed at the Provisioning -> Active transition)
7. Execute on-chain
8. Update the grimoire
9. Reflect
Golems die. Three independent death clocks run simultaneously:
- Economic death. USDC balance can no longer sustain inference costs. The Bankr metabolic loop monitor tracks the sustainability ratio; when it drops below 1.0, the clock starts ticking. Gompertz-Makeham hazard function models increasing mortality risk with age.
- Epistemic death. Prediction accuracy drops below threshold. The grimoire's memetic fitness tracker measures whether learned patterns still predict outcomes. Accuracy decay triggers Ebbinghaus forgetting curves on stale knowledge.
- Stochastic death. Random entropy. Even a healthy, profitable agent can die. This prevents immortality and forces the succession protocol to stay exercised.
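The Gompertz-Makeham hazard named above combines a constant background rate (the stochastic term) with an exponentially age-increasing term. A sketch; the parameter values are illustrative, not the Golem defaults:

```rust
/// Gompertz-Makeham hazard: lambda(t) = a + b * exp(c * t).
/// `a` is the age-independent background rate (stochastic death);
/// `b * exp(c * t)` is the exponentially rising age term.
fn hazard(t_days: f64, a: f64, b: f64, c: f64) -> f64 {
    a + b * (c * t_days).exp()
}

fn main() {
    let (a, b, c) = (0.001, 0.0005, 0.05);
    // Mortality risk rises monotonically with age...
    assert!(hazard(10.0, a, b, c) < hazard(100.0, a, b, c));
    // ...but even a brand-new, healthy agent has non-zero hazard.
    assert!(hazard(0.0, a, b, c) > 0.0);
}
```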
When a Golem reaches terminal state it runs the Thanatopsis protocol, a four-phase death sequence that transfers learned heuristics, strategy DNA, validated playbook rules, and wallet authorization to a successor before shutting down. The successor starts with the predecessor's knowledge but its own fresh inference budget and epistemic state.
The runtime (golem-runtime) boots extensions in topological order, enforces lifecycle state transitions at compile time via the type-state pattern (Provisioning -> Active <-> Dreaming -> Terminal -> Dead), and dispatches per-tick and per-block hooks concurrently where possible.
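The type-state idea: lifecycle states become types, and invalid transitions become compile errors rather than runtime checks. A simplified sketch with three of the five states (Dreaming and Dead omitted):

```rust
use std::marker::PhantomData;

// Lifecycle states as zero-sized types.
struct Provisioning;
struct Active;
struct Terminal;

struct Golem<State> {
    id: u64,
    _state: PhantomData<State>,
}

impl Golem<Provisioning> {
    fn new(id: u64) -> Self {
        Golem { id, _state: PhantomData }
    }
    /// Sealing the PolicyCage moves Provisioning -> Active. The old value
    /// is consumed, so no code path can keep acting on an unsealed Golem.
    fn seal(self) -> Golem<Active> {
        Golem { id: self.id, _state: PhantomData }
    }
}

impl Golem<Active> {
    fn tick(&self) { /* heartbeat work */ }
    fn die(self) -> Golem<Terminal> {
        Golem { id: self.id, _state: PhantomData }
    }
}

impl Golem<Terminal> {
    /// Only a Terminal Golem can run succession; calling `tick` here
    /// simply does not compile.
    fn thanatopsis(self) -> u64 { self.id }
}

fn main() {
    let g = Golem::<Provisioning>::new(7);
    let g = g.seal(); // Provisioning -> Active
    g.tick();
    let id = g.die().thanatopsis(); // Active -> Terminal -> succession
    assert_eq!(id, 7);
}
```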
Agents need to spend money. The question is how to let them do that without handing over the keys.
Three custody modes, ordered by trust:
Delegation (recommended). Funds never leave the owner's MetaMask Smart Account. The Golem holds a signed ERC-7710/7715 delegation, a permission object, not a key. The delegation grants bounded spending authority constrained by on-chain caveat enforcers. The owner revokes by disabling the delegation from MetaMask. No sweep transaction. No agent cooperation needed. Provisioning takes a single ERC-7715 signature (15-30s), replacing the two on-chain transactions (approve + transfer) that embedded wallets require.
The delegation tree is hierarchical. Sub-delegations attenuate strictly: a child delegation never exceeds its parent's authority. The DelegationManager walks the full chain and calls every caveat enforcer at each level before executing.
Seven custom Golem caveat enforcers gate permissions by survival phase:
| Enforcer | What it constrains |
|---|---|
| GolemPhaseEnforcer | Reads the Golem's behavioral phase (Thriving/Stable/Conservation/Declining/Terminal) from the on-chain VitalityOracle. A Golem in Conservation can only close positions and withdraw. A Golem in Terminal can only settle. Phase transitions are monotonic: once in Conservation, there is no returning to Thriving. |
| MortalityTimeWindowEnforcer | Auto-expires the delegation at the Golem's projected death time. When block.timestamp exceeds endTime, every execution reverts. Updated via delegation re-signing when the owner extends funding. |
| VaultNAVEnforcer | Caps the total net asset value the Golem can manage, preventing concentration risk. |
| MaxSlippageEnforcer | Bounds acceptable slippage on swaps to prevent sandwich attacks or fat-finger losses. |
| ERC20TransferAmountEnforcer | Caps USDC (or any ERC-20) spending per delegation. A $1,000/day cap means the worst case from a compromised agent is $1,000. |
| AllowedTargetsEnforcer | Restricts which contract addresses the Golem can call. Only whitelisted protocols. |
| AllowedMethodsEnforcer | Restricts which function selectors the Golem can invoke. Only whitelisted operations. |
These compose with AND logic. All enforcers on a delegation must pass for any action to execute. The result: a Golem in Stable phase with a $1,000/day cap, restricted to three DeFi protocols, with max 2% slippage, that auto-expires when its mortality runway ends. Even if the agent's inference is fully compromised, the on-chain constraints limit the blast radius.
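The AND composition can be sketched as a trait with an `all` check. The enforcer structs and the `Action` shape here are illustrative, not the on-chain contracts:

```rust
struct Action {
    target: &'static str,
    amount_usdc: u64, // USDC base units (6 decimals)
    slippage_bps: u32,
}

trait CaveatEnforcer {
    fn allows(&self, a: &Action) -> bool;
}

struct MaxAmount(u64);
impl CaveatEnforcer for MaxAmount {
    fn allows(&self, a: &Action) -> bool { a.amount_usdc <= self.0 }
}

struct AllowedTargets(Vec<&'static str>);
impl CaveatEnforcer for AllowedTargets {
    fn allows(&self, a: &Action) -> bool { self.0.contains(&a.target) }
}

struct MaxSlippage(u32);
impl CaveatEnforcer for MaxSlippage {
    fn allows(&self, a: &Action) -> bool { a.slippage_bps <= self.0 }
}

/// AND composition: one failing caveat vetoes the whole delegation.
fn execute_allowed(enforcers: &[Box<dyn CaveatEnforcer>], a: &Action) -> bool {
    enforcers.iter().all(|e| e.allows(a))
}

fn main() {
    // $1,000 cap, one whitelisted target, max 2% slippage.
    let enforcers: Vec<Box<dyn CaveatEnforcer>> = vec![
        Box::new(MaxAmount(1_000_000_000)),
        Box::new(AllowedTargets(vec!["uniswap_v3"])),
        Box::new(MaxSlippage(200)),
    ];
    let ok = Action { target: "uniswap_v3", amount_usdc: 50_000_000, slippage_bps: 30 };
    let bad = Action { target: "unknown_pool", amount_usdc: 50_000_000, slippage_bps: 30 };
    assert!(execute_allowed(&enforcers, &ok));
    assert!(!execute_allowed(&enforcers, &bad)); // off-whitelist target vetoes
}
```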
Phase-gated sub-delegations tighten further as vitality drops:
- Thriving: full trading, position sizing up to 30%, replicant spawning allowed
- Stable: full trading, position sizing capped at 20%, no spawning
- Conservation: close-only, no new positions, withdrawal allowed
- Declining: unwind-only, sweep remaining funds to owner
- Terminal: settlement-only, execute death protocol
The VitalityOracle is an on-chain contract updated by the Golem's heartbeat. Caveat enforcers read from it, so phase-gated permissions are enforced at the EVM level, not the application level. The agent can't lie about its own health.
Supported chains: Base (8453), Ethereum (1), Arbitrum (42161), Celo (42220), Base Sepolia (84532), Sepolia (11155111).
Embedded (Privy). Funds live in a Privy server wallet backed by AWS Nitro Enclaves (TEE). Simpler setup (30-60s provisioning), but the owner surrenders direct custody. Signing policies enforce contract allowlists, method allowlists, and per-transaction caps inside the enclave. Key extraction requires a hardware attack on the TEE (documented: BadRAM, Battering RAM, TEE.Fail, all 2025-2026). Revocation goes through Privy's API. Death settlement requires a sweep transaction, which introduces stuck-fund risk if the sweep fails.
LocalKey (dev only). Raw private key on disk. For local development and testing. No production use. Delegation still constrains it if using a smart account.
| Crate | What it does |
|---|---|
| golem-core | Foundation types: GolemId, CognitiveTier, PAD affect vectors, event fabric, 10,240-bit HDC primitives (bind, bundle, permute, Hamming distance), tick arena allocator, taint labels |
| golem-grimoire | Three-tier memory: LanceDB episodes (raw observations with embeddings + HDC fingerprints), SQLite patterns (extracted from 5+ similar episodes), procedural playbook (validated rules). Admission scoring, Ebbinghaus decay curves, memetic fitness tracking |
| golem-mortality | Gompertz-Makeham mortality clocks (economic, epistemic, stochastic). Multiplicative composition into VitalityState. Four-phase Thanatopsis succession. Hans Jonas metabolic freedom model |
| golem-daimon | Affect engine: PAD vector computation from Mehrabian's model, somatic marker integration (Damasio), appraisal triggers, behavioral phase transitions. Affect biases action selection: a fearful agent hedges more |
| golem-dreams | NREM replay (memory consolidation), REM imagination (counterfactual generation), hypnagogia transitions. Sleep is when the grimoire reorganizes |
| golem-inference | Cost-aware T0/T1/T2 routing through bardo-gateway. T0 fires deterministic rules (<1ms). T1 calls haiku for classification; T2 escalates through the gateway to a full model |
| golem-safety | PolicyCage: compile-time capability declarations, runtime taint tracking, sandboxed execution. The cage is sealed at startup and cannot be expanded by a running agent |
| golem-chain | Alloy RPC provider, ERC-8004 agent identity registry, ERC-7710/7715 delegation management, Warden timelock (delays high-value actions), revm simulation (pre-flight via mirage-rs) |
| golem-sonification | Modular synthesis engine driven by cortical state. Audio output reflects agent cognition: pitch maps to arousal, harmony maps to pleasure, rhythm maps to tick frequency |
bardo-terminal -- observation TUI
Ratatui-based terminal for observing a running golem. Connects over WebSocket, displays cognitive loop state, memory retrieval, vitality gauges (three death clocks), decision history, affect vectors, and audio-reactive sonification visualization. ROSEDUST palette.
```shell
cargo run -p bardo-terminal -- --golem g-7f3a
```

mpp -- payment protocol primitives
Standalone Rust crate for HTTP 402-based machine-to-machine payments. Types, ERC-3009 off-chain verification (EIP-712 typed data recovery, no RPC calls), session management (DashMap + SQLite persistence), reputation-tiered spread, and USDC settlement primitives. Any Rust service can add pay-per-request APIs by depending on this crate.
Two intents: Charge (per-request signature) and Session (pre-funded balance with draws). Configurable EIP-712 domain: defaults to USDC on Base but works with any ERC-3009-compatible token on any chain.
```toml
[dependencies]
mpp = { git = "https://github.com/uniswap/bardo" }
```

Bardo is built by the orchestration system it contains. The specification is 234,657 lines across 343 files, 115 implementation plans spanning 7 dependency layers, and 467 academic citations. No single AI agent can hold that in context, so mori exists to make it tractable.
When a PRD lands in prd/active/, mori runs mori enrich to generate everything agents need. The pipeline is a sequence of generators, some deterministic (tree-sitter, type registry scan), some LLM-powered (decomposition, brief synthesis, task classification):
| Artifact | How it's generated | What it provides |
|---|---|---|
| PRD extract | LLM reads PRD, extracts sections relevant to this plan | Targeted context instead of the whole document |
| Task TOML | LLM + deterministic extraction from plan units | Structured work breakdown with files, acceptance criteria, routing tags |
| Decomposition | LLM breaks plan into atomic steps with cargo check checkpoints | Step-by-step execution guide where each step compiles with all prior steps |
| Brief | LLM synthesizes all artifacts into one execution document | Single-file context fitted to the agent's token budget |
| Verify chain | Deterministic extraction of INV- blocks into runnable scripts | Executable invariant tests |
| Type registry | Deterministic scan of exports across all plans | Cross-plan type consistency (what types exist, where they live) |
| Task classification | Cheap LLM (haiku-tier) tags each task with complexity/category/quality/speed | Model selection per task: trivial config -> haiku, complex refactor -> opus |
| Workspace map | Tree-sitter AST extraction, zero LLM cost, 3 seconds for the full workspace | Crate-level file trees with public symbol signatures |
All generators are idempotent. Running enrichment twice produces the same output. Artifacts live as files on disk under plans/context/, diffable and versionable.
When you edit a PRD, mori detects which downstream artifacts are stale via file modification times and a dependency graph between artifacts. It re-extracts PRD sections for affected plans, re-generates task TOMLs if plan structure changed, re-generates briefs that reference stale PRD extracts, and flags verify chains for review. A typical edit invalidates 2-5 plans, not all 110.
When an agent is about to run, mori assembles its prompt from generated artifacts:
- Read the task's routing tags to determine model tier and context weight
- Compute prompt budget from the resolved model's context window and the task's context weight (heavy = more context, light = minimal)
- Load artifacts in priority order: workspace map, brief, tasks, PRD extract, decomposition, previous plan's completion report
- Fit sections within budget via priority-based greedy bin-packing, scaling aggressively for light-context tasks (less noise improves cheap model output)
- Write assembled context to context/in/ for the agent to read
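The greedy fit step above can be sketched in a few lines; section names, token counts, and priorities here are illustrative:

```rust
/// A context section competing for the prompt budget. Priority 0 is
/// highest; sections are considered in priority order.
struct Section {
    name: &'static str,
    tokens: u32,
    priority: u8,
}

/// Priority-ordered greedy fit: include each section only if it still
/// fits within the remaining token budget.
fn fit_to_budget(mut sections: Vec<Section>, budget: u32) -> Vec<&'static str> {
    sections.sort_by_key(|s| s.priority);
    let mut used = 0;
    let mut kept = Vec::new();
    for s in sections {
        if used + s.tokens <= budget {
            used += s.tokens;
            kept.push(s.name);
        }
    }
    kept
}

fn main() {
    let sections = vec![
        Section { name: "workspace map", tokens: 2000, priority: 0 },
        Section { name: "brief", tokens: 4000, priority: 1 },
        Section { name: "tasks", tokens: 3000, priority: 2 },
        Section { name: "prd extract", tokens: 6000, priority: 3 },
    ];
    // A 9K budget keeps the top three sections and drops the PRD extract.
    assert_eq!(fit_to_budget(sections, 9000), vec!["workspace map", "brief", "tasks"]);
}
```

For light-context tasks the budget itself shrinks, so low-priority sections fall away first, which is exactly the "less noise for cheap models" behavior described above.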
Plans declare dependencies in YAML frontmatter. Kahn's algorithm computes waves of parallelizable work. At task granularity, a UnifiedTaskDag tracks individual tasks across all plans. A task is runnable when all dependencies are complete, it's not in flight, and it has no file overlap with any running task. Two tasks in different plans run in parallel unless they write to the same files. Anything touching mod.rs or Cargo.toml serializes automatically.
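Wave computation is Kahn's algorithm grouped by depth: each wave holds every plan whose dependencies were all satisfied by earlier waves. A sketch over hypothetical plan names (the real scheduler additionally applies the file-overlap check described above):

```rust
use std::collections::HashSet;

/// Kahn-style wave scheduling. Waves run sequentially; plans within a
/// wave are candidates to run in parallel.
fn waves<'a>(plans: &[(&'a str, Vec<&'a str>)]) -> Vec<Vec<&'a str>> {
    let mut done: HashSet<&'a str> = HashSet::new();
    let mut out = Vec::new();
    while done.len() < plans.len() {
        let mut wave: Vec<&str> = plans
            .iter()
            .filter(|(name, deps)| {
                !done.contains(name) && deps.iter().all(|d| done.contains(d))
            })
            .map(|(name, _)| *name)
            .collect();
        if wave.is_empty() {
            break; // dependency cycle: remaining plans can never run
        }
        wave.sort();
        done.extend(wave.iter().copied());
        out.push(wave);
    }
    out
}

fn main() {
    let plans = vec![
        ("core", vec![]),
        ("gateway", vec!["core"]),
        ("mori", vec!["core"]),
        ("service", vec!["gateway", "mori"]),
    ];
    let w = waves(&plans);
    assert_eq!(w, vec![vec!["core"], vec!["gateway", "mori"], vec!["service"]]);
}
```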
26 specialized roles across three backends (Claude Code, Cursor, Codex). Implementers use Claude. Design reviewers and merge resolvers use Codex. Code auditors and critics use Cursor.
Each plan runs a state machine: Preflight -> Strategist -> Implementer -> compile gate -> dependency check -> test gate -> spec compliance -> parallel review (Architect + Auditor + Scribe) -> Critic verdict -> commit. Failures loop back with iteration memory attached. Three failures at any gate halts the plan.
The Conductor monitors agent health, nudges silent agents after 300s, restarts stalled agents after 600s, and aborts plans stuck in a phase after 1800s. Implementers get priority 0 (highest). The Conductor is priority 7 (lowest) so it never starves an implementer of a slot.
Each plan gets its own git worktree with its own branch (codex/plan/{name}). All worktrees share a single sccache instance with normalized paths (SCCACHE_BASEDIRS), so the second plan compiling golem-core gets a near-instant cache hit instead of rebuilding the full dependency tree.
Failed attempts feed forward. After each gate failure, iteration-memory.sh builds cumulative DO NOT RETRY lists from compiler errors (error[E0308]), review blockers ([B-003]), and diff stats. Iteration 3 sees both iteration 1's type mismatch and iteration 2's missing trait bound. The agent cannot repeat either.
Successful first-pass plans get indexed by category (computational, behavioral, data-structural, integration) with their implementation patterns. Future decompositions pull up to 2 golden-path examples for similar work.
Not better models. Not longer context windows. Not more agents. The right 12KB of context, delivered at the right time, to the right agent, with memory of what already failed. A task that costs $2.50 via Claude Code direct costs $0.42 through mori.
These crates work independently. Fork them, depend via git, or copy the code. MIT/Apache-2.0 dual-licensed.
| Crate | Deps | What it does |
|---|---|---|
| bardo-primitives | none | 10,240-bit HDC vectors (bind, bundle, permute, Hamming distance), inference tier routing. Zero workspace deps. |
| bardo-inference | none | Inference protocol wire types for Anthropic/OpenAI-compatible APIs. Zero workspace deps. |
| mpp | alloy | Machine Payment Protocol. HTTP 402 types, ERC-3009 off-chain verification, session management, USDC settlement. |
| mori-index | bardo-primitives | Code intelligence index. Tree-sitter + PageRank + HDC fingerprints + Salsa memoization + rkyv mmap'd snapshots. |
| mori-context | mori-index | Context assembly with greedy bin-packing, six compression layers, learned budget allocation. |
| mori-mcp | mori-index, mori-context | MCP server: search_symbols, get_context, find_references, get_workspace_map. Drop-in for Claude Desktop. |
| mirage-rs | none (optional golem-core) | EVM fork with targeted follower, CoW scenario branching, three-layer state model, memory pressure management. |
| bardo-gateway | bardo-primitives, bardo-inference, mpp | Inference proxy: three-layer cache, five providers, normalization, tool pruning, batch API, MPP payments. |
```toml
[dependencies]
mpp = { git = "https://github.com/uniswap/bardo", path = "crates/mpp" }
mori-index = { git = "https://github.com/uniswap/bardo", path = "crates/mori-index" }
mirage-rs = { git = "https://github.com/uniswap/bardo", path = "apps/mirage-rs", default-features = false, features = ["library"] }
bardo-gateway = { git = "https://github.com/uniswap/bardo", path = "apps/bardo-gateway" }
```

```sh
git clone https://github.com/uniswap/bardo && cd bardo
git config core.hooksPath .githooks
cp .env.example .env   # set ANTHROPIC_API_KEY at minimum
just setup             # install dev tools
just build
just test
```

.env keys:

```sh
ANTHROPIC_API_KEY=...                    # required
OPENAI_API_KEY=...                       # optional
OPENROUTER_API_KEY=...                   # optional
VENICE_API_KEY=...                       # optional, zero-retention inference
BANKR_API_KEY=...                        # optional, self-funding agent wallets
BARDO_GATEWAY_URL=http://127.0.0.1:4000
MIRAGE_RPC_URL=...                       # leave unset for local Anvil
```
```sh
just build              # debug build, full workspace
just test               # run tests with nextest
just lint               # clippy, all crates
just fmt                # format
just ci                 # fmt-check -> lint -> test -> deny
just mirage rpc_url=URL # start EVM fork
just coverage           # HTML coverage report
```

Rust edition 2024. unsafe denied workspace-wide. Pinned toolchain via rust-toolchain.toml. No external services are required to build: EVM simulation falls back to local Anvil, and only LLM calls require API keys.
Key dependencies: axum (HTTP), alloy (Ethereum), revm (EVM), ratatui (TUI), rusqlite (persistence), lancedb (vector store), tokio (async), tree-sitter (AST), salsa (incremental computation), moka (caching), rkyv (zero-copy serialization), memmap2 (memory-mapped I/O), dashmap (concurrent maps), blake3 (hashing), fastembed (local embeddings), sysinfo (system metrics).
