Epic: Operational Skill Store -- from failure capture to procedural memory #692

@AlexMikhalev

Description

Summary

Upgrade terraphim-ai's learning system from defensive failure capture to a full Operational Skill Store (aka Context File System). Currently, terraphim-agent learn records only failed commands. This proposal extends it to also capture successful multi-step workflows as reusable procedures, track their success rates, and auto-replay proven sequences -- transforming one-off successes into permanent organisational assets.

Inspired by Ben Lorica's Context File System analysis (Gradient Flow, March 2026) and evaluated against terraphim-ai's existing architecture.

Motivation

The Context Tax Problem

Current agent architectures treat every task as novel. The 1,000th execution costs the same as the 1st because the agent re-plans from scratch each time. In production ADF deployments, over half of token spend goes to rebuilding context from previous sessions.

What We Have Today (Partial CFS)

| CFS Feature | terraphim-ai | Current Coverage |
| --- | --- | --- |
| Procedural memory | terraphim-agent learn (failures only) + CLAUDE.md boot rules | Partial -- no success capture |
| Tool discovery | Aho-Corasick KG (~/.config/terraphim/kg/) | Partial -- concept matching, not tool schema indexing |
| Reasoning/execution separation | Skills system + CLAUDE.md rules | Partial -- skills are manual, not auto-crystallised |
| Self-healing | None -- broken corrections require manual learn correct <id> | Gap |
| Model independence | Procedures in markdown/bash | Good |
| Governance/auditability | Session search, PostToolUse hook logging | Partial |

What We Do Better Than Generic CFS

  1. Aho-Corasick knowledge graph -- sub-millisecond domain concept matching (CFS has no semantic layer)
  2. Three-domain memory architecture -- behavioural/relational/technical split is more nuanced than a single procedure store
  3. Edge-level learning capture -- PostToolUse hooks capture at CLI level before any orchestration layer
  4. Curiosity loops -- prediction-outcome calibration (EIDOS episodic reasoning: prediction-outcome tracking for agent evolution, #601) and friction detection go beyond CFS's purely defensive model

Proposed Architecture

                     ┌─────────────────────────────────────────┐
                     │         Operational Skill Store         │
                     │                                         │
Failures ──────►     │  ┌──────────────┐  ┌─────────────────┐  │
(existing)           │  │  Defensive   │  │   Procedural    │  │
                     │  │  Learnings   │  │   Memory        │  │
Successes ──────►    │  │  (failures,  │  │   (successful   │  │
(NEW)                │  │  guardrails) │  │   workflows,    │  │
                     │  └──────┬───────┘  │   replayable)   │  │
                     │         │          └────────┬────────┘  │
                     │         └────────┬──────────┘           │
                     │                  ▼                      │
                     │         ┌─────────────────┐             │
                     │         │ Success-Rate    │             │
                     │         │ Monitor         │             │
                     │         │ (self-healing)  │             │
                     │         └────────┬────────┘             │
                     │                  ▼                      │
                     │         ┌─────────────────┐             │
                     │         │ Replay Engine   │             │
                     │         │ (skip reasoning │             │
                     │         │  for known work)│             │
                     │         └─────────────────┘             │
                     └─────────────────────────────────────────┘

Implementation Phases

Phase 1: Success Capture (terraphim_agent + terraphim_agent_evolution)

Extend CapturedLearning with a success variant. When a multi-step bash/tool sequence completes with exit code 0 and matches a known task pattern, capture the full action sequence.

New type:

pub struct CapturedProcedure {
    pub id: String,
    pub task_type: String,           // e.g. "deploy-config", "fix-pipeline"
    pub steps: Vec<ProcedureStep>,   // ordered action sequence
    pub preconditions: Vec<String>,  // what must be true before replay
    pub postconditions: Vec<String>, // what should be true after replay
    pub source: LearningSource,
    pub context: LearningContext,
    pub success_count: u32,          // times replayed successfully
    pub failure_count: u32,          // times replay failed
    pub confidence: f64,             // derived from success/failure ratio
    pub tags: Vec<String>,
}

pub struct ProcedureStep {
    pub command: String,
    pub expected_exit_code: i32,
    pub expected_output_pattern: Option<String>,  // regex for output validation
    pub timeout_secs: Option<u64>,
    pub env_vars: Vec<String>,       // required env vars (names only, not values)
}
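
The proposal leaves the derivation of confidence open. One minimal sketch (the function name and smoothing choice are illustrative, not part of the proposal) computes it as a Laplace-smoothed success ratio, so a never-replayed procedure starts at 0.5 rather than 0 or 1, and a single early success cannot yield full confidence:

```rust
/// Hypothetical sketch: derive `confidence` from replay counts.
/// Laplace smoothing (prior of one success and one failure) keeps
/// estimates conservative until a procedure has real history.
fn derive_confidence(success_count: u32, failure_count: u32) -> f64 {
    let total = success_count + failure_count;
    (success_count as f64 + 1.0) / (total as f64 + 2.0)
}
```

Under this formula, the 14-of-15 procedure shown in Phase 2 would sit around 0.88, comfortably above the 0.8 replay threshold.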

CLI extension:

terraphim-agent learn capture-success "deploy-config" --steps-from-session <session-id>
terraphim-agent learn procedures                      # list captured procedures
terraphim-agent learn procedures --task-type "deploy"  # filter by type

Affected crates: terraphim_agent (learnings module), terraphim_types

Phase 2: Procedure Replay Engine (terraphim_agent)

When learn query finds a matching procedure with confidence > 0.8, offer replay instead of re-planning.

terraphim-agent learn replay "deploy terraphim-llm-proxy config"
# Matches procedure: deploy-config (confidence: 0.92, replayed 14 times)
# Steps:
#   1. ssh alex@linux-small-box "sudo cp /etc/terraphim-llm-proxy/config.toml /etc/terraphim-llm-proxy/config.toml.bak"
#   2. scp config.toml alex@linux-small-box:/tmp/
#   3. ssh alex@linux-small-box "sudo mv /tmp/config.toml /etc/terraphim-llm-proxy/config.toml"
#   4. ssh alex@linux-small-box "sudo systemctl restart terraphim-llm-proxy"
#   5. ssh alex@linux-small-box "sudo systemctl status terraphim-llm-proxy"
# Execute? [y/n/edit]

Key behaviours:

  • Dry-run by default -- show steps before executing
  • Step-by-step validation -- check each step's exit code and output pattern against expectations
  • Bail on divergence -- if a step fails or output doesn't match, stop replay and fall back to LLM reasoning
  • Update confidence -- increment success_count or failure_count after each replay
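
The step-by-step validation and bail-on-divergence behaviours can be sketched as a pure check over each step's observed result. This is a hedged illustration with hypothetical names: the proposal specifies regex validation for `expected_output_pattern`, but the sketch uses a substring match to stay dependency-free:

```rust
// Hypothetical replay-validation sketch for Phase 2.
struct StepExpectation {
    expected_exit_code: i32,
    expected_output_pattern: Option<String>, // regex in the proposal
}

struct StepResult {
    exit_code: i32,
    stdout: String,
}

enum ReplayVerdict {
    Continue,         // step matched expectations, proceed to next step
    Diverged(String), // stop replay, fall back to LLM reasoning
}

fn check_step(exp: &StepExpectation, res: &StepResult) -> ReplayVerdict {
    if res.exit_code != exp.expected_exit_code {
        return ReplayVerdict::Diverged(format!(
            "exit code {} != expected {}",
            res.exit_code, exp.expected_exit_code
        ));
    }
    if let Some(pat) = &exp.expected_output_pattern {
        // Substring check here; the proposal calls for a regex match.
        if !res.stdout.contains(pat.as_str()) {
            return ReplayVerdict::Diverged(format!("output missing pattern {pat:?}"));
        }
    }
    ReplayVerdict::Continue
}
```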

Affected crates: terraphim_agent (new replay subcommand)

Phase 3: Success-Rate Monitoring (terraphim_agent_evolution)

Track confidence scores over time. When a procedure's confidence drops below threshold (e.g. 0.5 over last 10 executions), mark it as degraded and trigger re-learning.

pub struct ProcedureHealthReport {
    pub procedure_id: String,
    pub rolling_success_rate: f64,    // last N executions
    pub total_executions: u32,
    pub last_failure: Option<DateTime<Utc>>,
    pub failure_pattern: Option<String>,  // common error across recent failures
    pub status: ProcedureHealth,
}

pub enum ProcedureHealth {
    Healthy,           // > 0.8 success rate
    Degraded,          // 0.5 - 0.8 success rate
    Broken,            // < 0.5 success rate -- auto-pulled from replay
    NeedsRelearning,   // broken + marked for re-capture
}
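
The thresholds above map directly onto a small classifier. A sketch, re-declaring the enum locally and omitting NeedsRelearning (which requires re-capture state beyond the raw rate); the rolling window helper is an illustrative assumption:

```rust
#[derive(Debug, PartialEq)]
enum ProcedureHealth {
    Healthy,  // > 0.8 success rate
    Degraded, // 0.5 - 0.8 success rate
    Broken,   // < 0.5 success rate -- auto-pulled from replay
}

// Rate over the last `window` executions (true = success).
fn rolling_success_rate(outcomes: &[bool], window: usize) -> f64 {
    let recent = &outcomes[outcomes.len().saturating_sub(window)..];
    if recent.is_empty() {
        return 0.0;
    }
    recent.iter().filter(|&&ok| ok).count() as f64 / recent.len() as f64
}

fn classify(rolling_success_rate: f64) -> ProcedureHealth {
    if rolling_success_rate > 0.8 {
        ProcedureHealth::Healthy
    } else if rolling_success_rate >= 0.5 {
        ProcedureHealth::Degraded
    } else {
        ProcedureHealth::Broken
    }
}
```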

CLI:

terraphim-agent learn health                    # show health report for all procedures
terraphim-agent learn health --degraded-only    # show only degraded/broken

Integration with EIDOS (#601): Health transitions emit predictions that feed into the prediction-outcome tracking loop.

Affected crates: terraphim_agent_evolution (health monitoring), terraphim_agent (health CLI)

Phase 4: Nightly Extraction as ADF Agent (terraphim_orchestrator)

Add a new Core-tier ADF agent that runs nightly:

  1. Import recent sessions (terraphim-agent sessions import)
  2. Scan sessions for successful multi-step sequences
  3. Match against existing procedures (Aho-Corasick deduplication)
  4. Crystallise novel sequences as new CapturedProcedure entries
  5. Run health check across all procedures
  6. Generate daily report: new procedures captured, degraded procedures flagged

This is the automated crystallisation loop -- the missing piece that turns one-off successes into institutional memory.
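
Step 3's deduplication could be sketched as comparing normalised sequence signatures before crystallising. The proposal names Aho-Corasick for this matching; a hash of the whitespace-normalised command list is shown here purely as the simplest stand-in, and all names are illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Signature of an ordered command sequence. Whitespace is collapsed so
// cosmetic differences between sessions don't defeat deduplication.
fn sequence_signature(commands: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for cmd in commands {
        cmd.split_whitespace().collect::<Vec<_>>().join(" ").hash(&mut hasher);
    }
    hasher.finish()
}

// Only sequences whose signature is unseen become new CapturedProcedure
// candidates for the nightly crystallisation run.
fn is_novel(commands: &[&str], known: &HashSet<u64>) -> bool {
    !known.contains(&sequence_signature(commands))
}
```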

Affected crates: terraphim_orchestrator (new agent config), terraphim_sessions (extraction API)

Phase 5: MCP Tool Index (terraphim_mcp_server)

Extend the MCP server to expose a tool capability index. Instead of stuffing all tool schemas into context, the index returns only relevant tool definitions for the current task -- using the existing Aho-Corasick automata to match task description against tool capability descriptions.

terraphim-agent tools relevant "fix the broken data pipeline"
# Returns: ssh, systemctl, config-edit, log-search (4 of 47 available tools)
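
A minimal sketch of the relevance filter, assuming a tool index of (name, capability keywords) pairs. Everything here is illustrative: the real index would run the existing Aho-Corasick automata from terraphim_automata over tool capability descriptions rather than this naive keyword scan:

```rust
// Hypothetical tool-relevance filter for Phase 5. Returns only the tools
// whose capability keywords appear in the task description, instead of
// stuffing all tool schemas into context.
fn relevant_tools<'a>(task: &str, index: &'a [(&'a str, &'a [&'a str])]) -> Vec<&'a str> {
    let task = task.to_lowercase();
    index
        .iter()
        .filter(|(_, keywords)| keywords.iter().any(|kw| task.contains(*kw)))
        .map(|(name, _)| *name)
        .collect()
}
```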

Affected crates: terraphim_mcp_server, terraphim_automata

Dependencies on Existing Issues

Economics

Based on Lorica's analysis and our token spend patterns:

  • Current: every ADF agent execution re-plans from scratch (~15-30% of tokens on context rebuild)
  • After Phase 2: proven workflows skip LLM reasoning entirely (claimed 90%+ token reduction for routine tasks)
  • After Phase 3: degraded procedures caught automatically instead of producing silent failures
  • After Phase 4: new procedures discovered without manual curation

Non-Goals

  • This is NOT a general-purpose workflow engine (use ADF orchestrator for that)
  • Procedures are advisory -- human or agent always confirms before replay
  • No cross-organisation sharing (single-tenant only, matching terraphim's local-first philosophy)
  • No proprietary prompt format -- procedures stored as structured Rust types serialised to JSON/CBOR via terraphim_persistence

Labels: enhancement (New feature or request)