Epic: Operational Skill Store -- from failure capture to procedural memory #692

@AlexMikhalev

Description

Summary

Upgrade terraphim-ai's learning system from defensive failure capture to a full Operational Skill Store (aka Context File System). Currently, terraphim-agent learn records only failed commands. This proposal extends it to also capture successful multi-step workflows as reusable procedures, track their success rates, and auto-replay proven sequences -- transforming one-off successes into permanent organisational assets.

Inspired by Ben Lorica's Context File System analysis (Gradient Flow, March 2026) and evaluated against terraphim-ai's existing architecture.

Motivation

The Context Tax Problem

Current agent architectures treat every task as novel. The 1,000th execution costs the same as the 1st because the agent re-plans from scratch each time. In production ADF deployments, over half of token spend goes to rebuilding context from previous sessions.

What We Have Today (Partial CFS)

| CFS Feature | terraphim-ai | Current Coverage |
| --- | --- | --- |
| Procedural memory | terraphim-agent learn (failures only) + CLAUDE.md boot rules | Partial -- no success capture |
| Tool discovery | Aho-Corasick KG (~/.config/terraphim/kg/) | Partial -- concept matching, not tool schema indexing |
| Reasoning/execution separation | Skills system + CLAUDE.md rules | Partial -- skills are manual, not auto-crystallised |
| Self-healing | None -- broken corrections require manual learn correct <id> | Gap |
| Model independence | Procedures in markdown/bash | Good |
| Governance/auditability | Session search, PostToolUse hook logging | Partial |

What We Do Better Than Generic CFS

  1. Aho-Corasick knowledge graph -- sub-millisecond domain concept matching (CFS has no semantic layer)
  2. Three-domain memory architecture -- behavioural/relational/technical split is more nuanced than a single procedure store
  3. Edge-level learning capture -- PostToolUse hooks capture at CLI level before any orchestration layer
  4. Curiosity loops -- prediction-outcome calibration (EIDOS episodic reasoning: prediction-outcome tracking for agent evolution, #601) and friction detection go beyond CFS's purely defensive model

Proposed Architecture

                     ┌─────────────────────────────────────────┐
                     │         Operational Skill Store         │
                     │                                         │
Failures ──────►     │  ┌──────────────┐  ┌─────────────────┐  │
(existing)           │  │  Defensive   │  │   Procedural    │  │
                     │  │  Learnings   │  │   Memory        │  │
Successes ──────►    │  │  (failures,  │  │   (successful   │  │
(NEW)                │  │  guardrails) │  │   workflows,    │  │
                     │  └──────┬───────┘  │   replayable)   │  │
                     │         │          └────────┬────────┘  │
                     │         └────────┬──────────┘           │
                     │                  ▼                      │
                     │         ┌─────────────────┐             │
                     │         │ Success-Rate    │             │
                     │         │ Monitor         │             │
                     │         │ (self-healing)  │             │
                     │         └────────┬────────┘             │
                     │                  ▼                      │
                     │         ┌─────────────────┐             │
                     │         │ Replay Engine   │             │
                     │         │ (skip reasoning │             │
                     │         │  for known work)│             │
                     │         └─────────────────┘             │
                     └─────────────────────────────────────────┘

Implementation Phases

Phase 1: Success Capture (terraphim_agent + terraphim_agent_evolution)

Extend CapturedLearning with a success variant. When a multi-step bash/tool sequence completes with exit code 0 and matches a known task pattern, capture the full action sequence.

New type:

pub struct CapturedProcedure {
    pub id: String,
    pub task_type: String,           // e.g. "deploy-config", "fix-pipeline"
    pub steps: Vec<ProcedureStep>,   // ordered action sequence
    pub preconditions: Vec<String>,  // what must be true before replay
    pub postconditions: Vec<String>, // what should be true after replay
    pub source: LearningSource,
    pub context: LearningContext,
    pub success_count: u32,          // times replayed successfully
    pub failure_count: u32,          // times replay failed
    pub confidence: f64,             // derived from success/failure ratio
    pub tags: Vec<String>,
}

pub struct ProcedureStep {
    pub command: String,
    pub expected_exit_code: i32,
    pub expected_output_pattern: Option<String>,  // regex for output validation
    pub timeout_secs: Option<u64>,
    pub env_vars: Vec<String>,       // required env vars (names only, not values)
}
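
The proposal leaves the derivation of confidence open. One minimal sketch (the function name and smoothing choice are illustrative, not part of the proposal) computes it as a Laplace-smoothed success ratio, so a never-replayed procedure starts at 0.5 rather than 0 or 1, and a single early success cannot yield full confidence:

```rust
/// Hypothetical sketch: derive `confidence` from replay counts.
/// Laplace smoothing (prior of one success and one failure) keeps
/// estimates conservative until a procedure has real history.
fn derive_confidence(success_count: u32, failure_count: u32) -> f64 {
    let total = success_count + failure_count;
    (success_count as f64 + 1.0) / (total as f64 + 2.0)
}
```

Under this formula, the 14-of-15 procedure shown in Phase 2 would sit around 0.88, comfortably above the 0.8 replay threshold.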

CLI extension:

terraphim-agent learn capture-success "deploy-config" --steps-from-session <session-id>
terraphim-agent learn procedures                      # list captured procedures
terraphim-agent learn procedures --task-type "deploy"  # filter by type

Affected crates: terraphim_agent (learnings module), terraphim_types

Phase 2: Procedure Replay Engine (terraphim_agent)

When learn query finds a matching procedure with confidence > 0.8, offer replay instead of re-planning.

terraphim-agent learn replay "deploy terraphim-llm-proxy config"
# Matches procedure: deploy-config (confidence: 0.92, replayed 14 times)
# Steps:
#   1. ssh alex@linux-small-box "sudo cp /etc/terraphim-llm-proxy/config.toml /etc/terraphim-llm-proxy/config.toml.bak"
#   2. scp config.toml alex@linux-small-box:/tmp/
#   3. ssh alex@linux-small-box "sudo mv /tmp/config.toml /etc/terraphim-llm-proxy/config.toml"
#   4. ssh alex@linux-small-box "sudo systemctl restart terraphim-llm-proxy"
#   5. ssh alex@linux-small-box "sudo systemctl status terraphim-llm-proxy"
# Execute? [y/n/edit]

Key behaviours:

  • Dry-run by default -- show steps before executing
  • Step-by-step validation -- check each step's exit code and output pattern against expectations
  • Bail on divergence -- if a step fails or output doesn't match, stop replay and fall back to LLM reasoning
  • Update confidence -- increment success_count or failure_count after each replay
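
The step-by-step validation and bail-on-divergence behaviours can be sketched as a pure check over each step's observed result. This is a hedged illustration with hypothetical names: the proposal specifies regex validation for `expected_output_pattern`, but the sketch uses a substring match to stay dependency-free:

```rust
// Hypothetical replay-validation sketch for Phase 2.
struct StepExpectation {
    expected_exit_code: i32,
    expected_output_pattern: Option<String>, // regex in the proposal
}

struct StepResult {
    exit_code: i32,
    stdout: String,
}

enum ReplayVerdict {
    Continue,         // step matched expectations, proceed to next step
    Diverged(String), // stop replay, fall back to LLM reasoning
}

fn check_step(exp: &StepExpectation, res: &StepResult) -> ReplayVerdict {
    if res.exit_code != exp.expected_exit_code {
        return ReplayVerdict::Diverged(format!(
            "exit code {} != expected {}",
            res.exit_code, exp.expected_exit_code
        ));
    }
    if let Some(pat) = &exp.expected_output_pattern {
        // Substring check here; the proposal calls for a regex match.
        if !res.stdout.contains(pat.as_str()) {
            return ReplayVerdict::Diverged(format!("output missing pattern {pat:?}"));
        }
    }
    ReplayVerdict::Continue
}
```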

Affected crates: terraphim_agent (new replay subcommand)

Phase 3: Success-Rate Monitoring (terraphim_agent_evolution)

Track confidence scores over time. When a procedure's confidence drops below threshold (e.g. 0.5 over last 10 executions), mark it as degraded and trigger re-learning.

pub struct ProcedureHealthReport {
    pub procedure_id: String,
    pub rolling_success_rate: f64,    // last N executions
    pub total_executions: u32,
    pub last_failure: Option<DateTime<Utc>>,
    pub failure_pattern: Option<String>,  // common error across recent failures
    pub status: ProcedureHealth,
}

pub enum ProcedureHealth {
    Healthy,           // > 0.8 success rate
    Degraded,          // 0.5 - 0.8 success rate
    Broken,            // < 0.5 success rate -- auto-pulled from replay
    NeedsRelearning,   // broken + marked for re-capture
}
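
The thresholds above map directly onto a small classifier. A sketch, re-declaring the enum locally and omitting NeedsRelearning (which requires re-capture state beyond the raw rate); the rolling window helper is an illustrative assumption:

```rust
#[derive(Debug, PartialEq)]
enum ProcedureHealth {
    Healthy,  // > 0.8 success rate
    Degraded, // 0.5 - 0.8 success rate
    Broken,   // < 0.5 success rate -- auto-pulled from replay
}

// Rate over the last `window` executions (true = success).
fn rolling_success_rate(outcomes: &[bool], window: usize) -> f64 {
    let recent = &outcomes[outcomes.len().saturating_sub(window)..];
    if recent.is_empty() {
        return 0.0;
    }
    recent.iter().filter(|&&ok| ok).count() as f64 / recent.len() as f64
}

fn classify(rolling_success_rate: f64) -> ProcedureHealth {
    if rolling_success_rate > 0.8 {
        ProcedureHealth::Healthy
    } else if rolling_success_rate >= 0.5 {
        ProcedureHealth::Degraded
    } else {
        ProcedureHealth::Broken
    }
}
```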

CLI:

terraphim-agent learn health                    # show health report for all procedures
terraphim-agent learn health --degraded-only    # show only degraded/broken

Integration with EIDOS (#601): Health transitions emit predictions that feed into the prediction-outcome tracking loop.

Affected crates: terraphim_agent_evolution (health monitoring), terraphim_agent (health CLI)

Phase 4: Nightly Extraction as ADF Agent (terraphim_orchestrator)

Add a new Core-tier ADF agent that runs nightly:

  1. Import recent sessions (terraphim-agent sessions import)
  2. Scan sessions for successful multi-step sequences
  3. Match against existing procedures (Aho-Corasick deduplication)
  4. Crystallise novel sequences as new CapturedProcedure entries
  5. Run health check across all procedures
  6. Generate daily report: new procedures captured, degraded procedures flagged

This is the automated crystallisation loop -- the missing piece that turns one-off successes into institutional memory.
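
Step 3's deduplication could be sketched as comparing normalised sequence signatures before crystallising. The proposal names Aho-Corasick for this matching; a hash of the whitespace-normalised command list is shown here purely as the simplest stand-in, and all names are illustrative:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Signature of an ordered command sequence. Whitespace is collapsed so
// cosmetic differences between sessions don't defeat deduplication.
fn sequence_signature(commands: &[&str]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for cmd in commands {
        cmd.split_whitespace().collect::<Vec<_>>().join(" ").hash(&mut hasher);
    }
    hasher.finish()
}

// Only sequences whose signature is unseen become new CapturedProcedure
// candidates for the nightly crystallisation run.
fn is_novel(commands: &[&str], known: &HashSet<u64>) -> bool {
    !known.contains(&sequence_signature(commands))
}
```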

Affected crates: terraphim_orchestrator (new agent config), terraphim_sessions (extraction API)

Phase 5: MCP Tool Index (terraphim_mcp_server)

Extend the MCP server to expose a tool capability index. Instead of stuffing all tool schemas into context, the index returns only relevant tool definitions for the current task -- using the existing Aho-Corasick automata to match task description against tool capability descriptions.

terraphim-agent tools relevant "fix the broken data pipeline"
# Returns: ssh, systemctl, config-edit, log-search (4 of 47 available tools)
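
A minimal sketch of the relevance filter, assuming a tool index of (name, capability keywords) pairs. Everything here is illustrative: the real index would run the existing Aho-Corasick automata from terraphim_automata over tool capability descriptions rather than this naive keyword scan:

```rust
// Hypothetical tool-relevance filter for Phase 5. Returns only the tools
// whose capability keywords appear in the task description, instead of
// stuffing all tool schemas into context.
fn relevant_tools<'a>(task: &str, index: &'a [(&'a str, &'a [&'a str])]) -> Vec<&'a str> {
    let task = task.to_lowercase();
    index
        .iter()
        .filter(|(_, keywords)| keywords.iter().any(|kw| task.contains(*kw)))
        .map(|(name, _)| *name)
        .collect()
}
```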

Affected crates: terraphim_mcp_server, terraphim_automata

Dependencies on Existing Issues

Economics

Based on Lorica's analysis and our token spend patterns:

  • Current: every ADF agent execution re-plans from scratch (~15-30% of tokens on context rebuild)
  • After Phase 2: proven workflows skip LLM reasoning entirely (claimed 90%+ token reduction for routine tasks)
  • After Phase 3: degraded procedures caught automatically instead of producing silent failures
  • After Phase 4: new procedures discovered without manual curation

Non-Goals

  • This is NOT a general-purpose workflow engine (use ADF orchestrator for that)
  • Procedures are advisory -- human or agent always confirms before replay
  • No cross-organisation sharing (single-tenant only, matching terraphim's local-first philosophy)
  • No proprietary prompt format -- procedures stored as structured Rust types serialised to JSON/CBOR via terraphim_persistence

Labels: enhancement (New feature or request)