Summary
Upgrade terraphim-ai's learning system from defensive failure capture to a full Operational Skill Store (aka Context File System). Currently terraphim-agent learn only records failed commands. This proposal extends it to also capture successful multi-step workflows as reusable procedures, track their success rates, and auto-replay proven sequences -- transforming one-off successes into permanent organisational assets.
Inspired by Ben Lorica's Context File System analysis (Gradient Flow, March 2026) and evaluated against terraphim-ai's existing architecture.
Motivation
The Context Tax Problem
Current agent architectures treat every task as novel. The 1,000th execution costs the same as the 1st because the agent re-plans from scratch each time. In production ADF deployments, over half of token spend goes to rebuilding context from previous sessions.
What We Have Today (Partial CFS)
| CFS Feature | terraphim-ai Current | Coverage |
|---|---|---|
| Procedural memory | terraphim-agent learn (failures only) + CLAUDE.md boot rules | Partial -- no success capture |
| Tool discovery | Aho-Corasick KG (~/.config/terraphim/kg/) | Partial -- concept matching, not tool schema indexing |
| Reasoning/execution separation | Skills system + CLAUDE.md rules | Partial -- skills are manual, not auto-crystallised |
| Self-healing | None -- broken corrections require manual `learn correct <id>` | Gap |
| Model independence | Procedures in markdown/bash | Good |
| Governance/auditability | Session search, PostToolUse hook logging | Partial |
What We Do Better Than Generic CFS
- Aho-Corasick knowledge graph -- sub-millisecond domain concept matching (CFS has no semantic layer)
- Three-domain memory architecture -- behavioural/relational/technical split is more nuanced than a single procedure store
- Edge-level learning capture -- PostToolUse hooks capture at CLI level before any orchestration layer
- Curiosity loops -- prediction-outcome calibration (#601 EIDOS episodic reasoning) and friction detection go beyond CFS's purely defensive model
Proposed Architecture
```
                 ┌─────────────────────────────────────────┐
                 │        Operational Skill Store          │
                 │                                         │
Failures ──────► │  ┌─────────────┐  ┌──────────────────┐  │
(existing)       │  │  Defensive  │  │    Procedural    │  │
                 │  │  Learnings  │  │      Memory      │  │
Successes ─────► │  │  (failures, │  │   (successful    │  │
(NEW)            │  │  guardrails)│  │    workflows,    │  │
                 │  └──────┬──────┘  │    replayable)   │  │
                 │         │         └────────┬─────────┘  │
                 │         └────────┬─────────┘            │
                 │                  ▼                      │
                 │         ┌────────────────┐              │
                 │         │  Success-Rate  │              │
                 │         │    Monitor     │              │
                 │         │ (self-healing) │              │
                 │         └────────┬───────┘              │
                 │                  ▼                      │
                 │         ┌─────────────────┐             │
                 │         │  Replay Engine  │             │
                 │         │ (skip reasoning │             │
                 │         │ for known work) │             │
                 │         └─────────────────┘             │
                 └─────────────────────────────────────────┘
```
Implementation Phases
Phase 1: Success Capture (terraphim_agent + terraphim_agent_evolution)
Extend CapturedLearning with a success variant. When a multi-step bash/tool sequence completes with exit code 0 and matches a known task pattern, capture the full action sequence.
New type:

```rust
pub struct CapturedProcedure {
    pub id: String,
    pub task_type: String,           // e.g. "deploy-config", "fix-pipeline"
    pub steps: Vec<ProcedureStep>,   // ordered action sequence
    pub preconditions: Vec<String>,  // what must be true before replay
    pub postconditions: Vec<String>, // what should be true after replay
    pub source: LearningSource,
    pub context: LearningContext,
    pub success_count: u32,          // times replayed successfully
    pub failure_count: u32,          // times replay failed
    pub confidence: f64,             // derived from success/failure ratio
    pub tags: Vec<String>,
}

pub struct ProcedureStep {
    pub command: String,
    pub expected_exit_code: i32,
    pub expected_output_pattern: Option<String>, // regex for output validation
    pub timeout_secs: Option<u64>,
    pub env_vars: Vec<String>, // required env vars (names only, not values)
}
```

CLI extension:

```
terraphim-agent learn capture-success "deploy-config" --steps-from-session <session-id>
terraphim-agent learn procedures                      # list captured procedures
terraphim-agent learn procedures --task-type "deploy" # filter by type
```

Affected crates: terraphim_agent (learnings module), terraphim_types
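The proposal says `confidence` is derived from the success/failure ratio but does not fix a formula. One plausible derivation (an assumption for illustration, not the spec) is Laplace smoothing, which keeps a single lucky replay from scoring 1.0 and starts never-replayed procedures at a neutral 0.5:

```rust
// Minimal sketch of confidence derivation for a captured procedure.
// Only the counting fields are reproduced here; the formula is an
// assumption, not the actual terraphim-ai implementation.
struct CapturedProcedure {
    success_count: u32,
    failure_count: u32,
}

impl CapturedProcedure {
    /// Laplace-smoothed success rate: (s + 1) / (s + f + 2).
    /// A fresh procedure scores 0.5; confidence converges to the
    /// observed success rate as replay counts grow.
    fn confidence(&self) -> f64 {
        let s = self.success_count as f64;
        let f = self.failure_count as f64;
        (s + 1.0) / (s + f + 2.0)
    }
}

fn main() {
    let fresh = CapturedProcedure { success_count: 0, failure_count: 0 };
    let proven = CapturedProcedure { success_count: 14, failure_count: 0 };
    // fresh: 0.50, proven: 15/16 = 0.9375
    println!("fresh: {:.2}, proven: {:.2}", fresh.confidence(), proven.confidence());
}
```

This also gives Phase 2's confidence gate (> 0.8) a sensible cold-start: a procedure must succeed several times before it qualifies for replay.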
Phase 2: Procedure Replay Engine (terraphim_agent)
When learn query finds a matching procedure with confidence > 0.8, offer replay instead of re-planning.
```
terraphim-agent learn replay "deploy terraphim-llm-proxy config"
# Matches procedure: deploy-config (confidence: 0.92, replayed 14 times)
# Steps:
#   1. ssh alex@linux-small-box "sudo cp /etc/terraphim-llm-proxy/config.toml /etc/terraphim-llm-proxy/config.toml.bak"
#   2. scp config.toml alex@linux-small-box:/tmp/
#   3. ssh alex@linux-small-box "sudo mv /tmp/config.toml /etc/terraphim-llm-proxy/config.toml"
#   4. ssh alex@linux-small-box "sudo systemctl restart terraphim-llm-proxy"
#   5. ssh alex@linux-small-box "sudo systemctl status terraphim-llm-proxy"
# Execute? [y/n/edit]
```

Key behaviours:
- Dry-run by default -- show steps before executing
- Step-by-step validation -- check each step's exit code and output pattern against expectations
- Bail on divergence -- if a step fails or output doesn't match, stop replay and fall back to LLM reasoning
- Update confidence -- increment success_count or failure_count after each replay
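The step-by-step validation and bail-on-divergence behaviours can be sketched as follows. The names (`replay`, `ReplayResult`, the runner closure) are illustrative, not the actual terraphim_agent API, and a plain substring check stands in for regex matching against `expected_output_pattern`:

```rust
// Sketch of the replay engine's per-step validation loop (assumed
// logic). On any divergence it stops and reports where, so the caller
// can fall back to LLM reasoning.
struct ProcedureStep {
    command: String,
    expected_exit_code: i32,
    expected_output_pattern: Option<String>,
}

enum ReplayResult {
    Completed,
    Diverged { step: usize, reason: String },
}

// `run` abstracts command execution: it returns (exit_code, output).
fn replay(steps: &[ProcedureStep], run: impl Fn(&str) -> (i32, String)) -> ReplayResult {
    for (i, step) in steps.iter().enumerate() {
        let (code, output) = run(&step.command);
        if code != step.expected_exit_code {
            return ReplayResult::Diverged {
                step: i,
                reason: format!("exit code {code}, expected {}", step.expected_exit_code),
            };
        }
        if let Some(pat) = &step.expected_output_pattern {
            if !output.contains(pat.as_str()) {
                return ReplayResult::Diverged {
                    step: i,
                    reason: format!("output missing '{pat}'"),
                };
            }
        }
    }
    ReplayResult::Completed
}

fn main() {
    let steps = vec![ProcedureStep {
        command: "systemctl status terraphim-llm-proxy".into(),
        expected_exit_code: 0,
        expected_output_pattern: Some("active (running)".into()),
    }];
    // Fake runner standing in for real command execution.
    let result = replay(&steps, |_cmd| (0, "active (running)".into()));
    assert!(matches!(result, ReplayResult::Completed));
}
```

Keeping the runner as a closure makes the dry-run mode trivial: the same loop can print steps instead of executing them.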
Affected crates: terraphim_agent (new replay subcommand)
Phase 3: Success-Rate Monitoring (terraphim_agent_evolution)
Track confidence scores over time. When a procedure's confidence drops below threshold (e.g. 0.5 over last 10 executions), mark it as degraded and trigger re-learning.
```rust
pub struct ProcedureHealthReport {
    pub procedure_id: String,
    pub rolling_success_rate: f64,       // last N executions
    pub total_executions: u32,
    pub last_failure: Option<DateTime<Utc>>,
    pub failure_pattern: Option<String>, // common error across recent failures
    pub status: ProcedureHealth,
}

pub enum ProcedureHealth {
    Healthy,         // > 0.8 success rate
    Degraded,        // 0.5 - 0.8 success rate
    Broken,          // < 0.5 success rate -- auto-pulled from replay
    NeedsRelearning, // broken + marked for re-capture
}
```

CLI:

```
terraphim-agent learn health                 # show health report for all procedures
terraphim-agent learn health --degraded-only # show only degraded/broken
```

Integration with EIDOS (#601): health transitions emit predictions that feed into the prediction-outcome tracking loop.
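A minimal sketch of the rate computation and classification, using the thresholds above. The helper names are hypothetical; `NeedsRelearning` is omitted because it is a workflow transition out of `Broken` (marked for re-capture) rather than a rate band:

```rust
// Sketch of rolling-success-rate health classification (assumed
// helpers, not the terraphim_agent_evolution API).
#[derive(Debug, PartialEq)]
enum ProcedureHealth {
    Healthy,  // > 0.8
    Degraded, // 0.5 - 0.8
    Broken,   // < 0.5 -- auto-pulled from replay
}

/// Success rate over the last `window` outcomes (true = success).
fn rolling_success_rate(outcomes: &[bool], window: usize) -> f64 {
    let recent = &outcomes[outcomes.len().saturating_sub(window)..];
    if recent.is_empty() {
        return 0.0;
    }
    recent.iter().filter(|&&ok| ok).count() as f64 / recent.len() as f64
}

fn classify(rate: f64) -> ProcedureHealth {
    if rate > 0.8 {
        ProcedureHealth::Healthy
    } else if rate >= 0.5 {
        ProcedureHealth::Degraded
    } else {
        ProcedureHealth::Broken
    }
}

fn main() {
    // 9 successes out of the last 10 executions -> 0.9 -> Healthy.
    let outcomes = [true, true, false, true, true, true, true, true, true, true];
    let rate = rolling_success_rate(&outcomes, 10);
    println!("{:?}", classify(rate)); // prints "Healthy"
}
```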
Affected crates: terraphim_agent_evolution (health monitoring), terraphim_agent (health CLI)
Phase 4: Nightly Extraction as ADF Agent (terraphim_orchestrator)
Add a new Core-tier ADF agent that runs nightly:
- Import recent sessions (`terraphim-agent sessions import`)
- Scan sessions for successful multi-step sequences
- Match against existing procedures (Aho-Corasick deduplication)
- Crystallise novel sequences as new `CapturedProcedure` entries
- Run health check across all procedures
- Generate daily report: new procedures captured, degraded procedures flagged
This is the automated crystallisation loop -- the missing piece that turns one-off successes into institutional memory.
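The scan step can be sketched as finding maximal runs of consecutive zero-exit commands in a session's command log. The function name and the minimum-length cutoff are assumptions; deduplication against existing procedures (the Aho-Corasick step) is elided:

```rust
// Sketch of candidate-procedure extraction from a session command log.
// Each log entry is (command, exit_code); a run of >= min_len
// consecutive successes becomes a candidate procedure.
fn successful_runs(log: &[(&str, i32)], min_len: usize) -> Vec<Vec<String>> {
    let mut runs = Vec::new();
    let mut current: Vec<String> = Vec::new();
    for &(cmd, exit_code) in log {
        if exit_code == 0 {
            current.push(cmd.to_string());
        } else if current.len() >= min_len {
            // A failure ends the run; keep it if long enough.
            runs.push(std::mem::take(&mut current));
        } else {
            current.clear();
        }
    }
    if current.len() >= min_len {
        runs.push(current);
    }
    runs
}

fn main() {
    let log = [("git pull", 0), ("cargo build", 0), ("cargo test", 1), ("ls", 0)];
    // Only the first run survives: the trailing "ls" run is too short.
    println!("{:?}", successful_runs(&log, 2));
}
```

In the real pipeline each run would then be checked against known procedures and, if novel, crystallised with its pre/postconditions inferred from surrounding session context.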
Affected crates: terraphim_orchestrator (new agent config), terraphim_sessions (extraction API)
Phase 5: MCP Tool Index (terraphim_mcp_server)
Extend the MCP server to expose a tool capability index. Instead of stuffing all tool schemas into context, the index returns only relevant tool definitions for the current task -- using the existing Aho-Corasick automata to match task description against tool capability descriptions.
```
terraphim-agent tools relevant "fix the broken data pipeline"
# Returns: ssh, systemctl, config-edit, log-search (4 of 47 available tools)
```

Affected crates: terraphim_mcp_server, terraphim_automata
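The selection logic amounts to matching the task description against per-tool capability descriptions and returning only the hits. A keyword-containment check stands in here for the Aho-Corasick automata that terraphim_automata would actually provide; the function and the capability strings are illustrative:

```rust
// Sketch of the tool capability index: given a task description and a
// list of (tool name, comma-separated capability keywords), return the
// tools whose capabilities appear in the task. Substring matching is a
// stand-in for Aho-Corasick concept matching.
fn relevant_tools<'a>(task: &str, tools: &'a [(&'a str, &'a str)]) -> Vec<&'a str> {
    let task_lc = task.to_lowercase();
    tools
        .iter()
        .filter(|(_, caps)| caps.split(',').any(|kw| task_lc.contains(kw.trim())))
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    let tools = [
        ("ssh", "remote, pipeline"),
        ("systemctl", "service, pipeline"),
        ("log-search", "logs, pipeline"),
        ("calc", "math"),
    ];
    let hits = relevant_tools("fix the broken data pipeline", &tools);
    println!("{hits:?}"); // prints ["ssh", "systemctl", "log-search"]
}
```

The payoff is the same either way: only the matched tools' schemas enter the context window, instead of all 47.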
Dependencies on Existing Issues
- #599 Enhanced learning capture (multi-hook pipeline with importance scoring) -- Phase 1 builds on this
- #601 EIDOS episodic reasoning (prediction-outcome tracking for agent evolution) -- Phase 3 integrates with prediction-outcome tracking
- #637 Epic: Leverage Paperclip features into AI Dark Factory -- Phase 4 adds a new ADF agent
- #639 Session persistence for Claude Code agents in ADF -- Phase 4 depends on session data
- #683 Tree-structured session storage (Pi pattern) for terraphim_persistence -- Phase 4 benefits from structured session access
Economics
Based on Lorica's analysis and our token spend patterns:
- Current: every ADF agent execution re-plans from scratch (~15-30% of tokens on context rebuild)
- After Phase 2: proven workflows skip LLM reasoning entirely (claimed 90%+ token reduction for routine tasks)
- After Phase 3: degraded procedures caught automatically instead of producing silent failures
- After Phase 4: new procedures discovered without manual curation
Non-Goals
- This is NOT a general-purpose workflow engine (use ADF orchestrator for that)
- Procedures are advisory -- human or agent always confirms before replay
- No cross-organisation sharing (single-tenant only, matching terraphim's local-first philosophy)
- No proprietary prompt format -- procedures stored as structured Rust types serialised to JSON/CBOR via `terraphim_persistence`
References
- Your agents need runbooks, not bigger context windows (Ben Lorica, Gradient Flow)
- dex Context File System (commercial CFS implementation)
- Compounding Agency (Atlas Forge, learning loops framework)
- Existing terraphim-ai: `crates/terraphim_agent/src/learnings/`, `crates/terraphim_agent_evolution/`