
Cortex consolidate — Performance Report #13

@darval

Plugin: cortex (v3.9.1, per skill path)
Date: 2026-04-15
Environment: macOS 25.4.0 (Darwin), Mac M2 Max, 64 GB RAM

Summary

A full consolidate({decay, compress, cls, memify}) run on a memory
store of ~66K memories took 35 min 50 s (2,150,496 ms). Most of
the work appears to be O(N) per-row updates across stages that could
plausibly be batched. One stage scanned but produced nothing, and the
homeostatic stage reported health_score: 0.0 and skipped scaling.

I'd like to suggest a few low-risk optimizations and — more usefully —
some per-stage telemetry that would let maintainers (and operators)
see where time actually goes on real stores.

Pre-run state

{
  "total_memories": 65949,
  "episodic_count": 25319,
  "semantic_count": 40630,
  "active_count":   65610,
  "archived_count": 339,
  "protected_count": 3088,
  "avg_heat": 0.6746,
  "total_entities": 10770,
  "total_relationships": 34735,
  "active_triggers": 1456,
  "last_consolidation": "2026-04-13T01:20:15Z",   // ~2 days prior
  "has_vector_search": true
}

Full-run result

{
  "decay":       { "memories_decayed": 62522, "metabolic_updates": 65937,
                   "entities_decayed": 10770, "total_memories": 65949 },
  "plasticity":  { "ltp": 17, "ltd": 33092, "edges_updated": 33109 },
  "pruning":     { "edges_pruned": 32024, "entities_archived": 718 },
  "compression": { "compressed_to_gist": 213, "compressed_to_tag": 0,
                   "protected_skipped": 3088, "semantic_skipped": 38502 },
  "cls":         { "patterns_found": 0, "new_semantics_created": 0,
                   "skipped_inconsistent": 0, "skipped_duplicate": 0,
                   "causal_edges_found": 0 },
  "memify":      { "pruned": 0, "strengthened": 0, "reweighted": 1345 },
  "cascade":     { "advanced": 503, "transitions": [... 503 entries ...] },
  "homeostatic": { "scaling_applied": false, "health_score": 0.0 },
  "duration_ms": 2150496
}

Observations

1. Most work is row-by-row across large result sets

  • Decay touched 62,522 memories (94.8 % of the store) plus 10,770
    entities. If this is implemented as per-row UPDATE statements
    inside a loop, it is the most plausible source of the bulk of the
    35 minutes.
  • Plasticity updated 33,109 edges (17 LTP + 33,092 LTD).
  • Pruning deleted 32,024 edges.

These three stages together account for ~138K row operations. A single
set-based UPDATE ... SET heat = heat * :decay WHERE ... (and the
analogous UPDATE / DELETE for plasticity and pruning) is typically
orders of magnitude faster than per-row application logic.
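To make the set-based idea concrete, here is a minimal sketch against a hypothetical memories(id, heat, protected) table in SQLite; the plugin's actual schema and storage engine are unknown to me, so every name here is an assumption:

```python
# Sketch only: assumes a SQLite store with a memories(id, heat, protected)
# table. One statement replaces a per-row UPDATE loop over ~62K rows.
import sqlite3

def decay_batch(conn: sqlite3.Connection, factor: float) -> int:
    """Apply heat decay to all unprotected memories in a single statement.

    Returns the number of rows updated; the per-row loop this replaces
    would issue one UPDATE (and often one transaction) per memory.
    """
    cur = conn.execute(
        "UPDATE memories SET heat = heat * ? WHERE protected = 0",
        (factor,),
    )
    conn.commit()
    return cur.rowcount
```

The same shape applies to plasticity (one UPDATE over the LTD edge set) and pruning (one DELETE with the prune predicate in the WHERE clause).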

2. CLS scanned ~25K episodic memories and produced zero output

patterns_found: 0, new_semantics_created: 0,
skipped_inconsistent: 0, skipped_duplicate: 0, causal_edges_found: 0

A stage that finds nothing after a 2-day gap on a 25K-episodic store
is either miscalibrated (threshold too strict) or doing expensive
work with no observable effect. Either way, it's a candidate for
early-exit when the input set hasn't changed meaningfully since the
last run.
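The early-exit guard could be as small as this (the function name, inputs, and the threshold of 50 are all illustrative, not from the plugin):

```python
# Hypothetical guard: skip the CLS scan entirely when too few episodic
# memories were created or accessed since the last consolidation run.
def should_run_cls(new_since_last: int, accessed_since_last: int,
                   min_delta: int = 50) -> bool:
    """Return True only when the input set changed meaningfully."""
    return (new_since_last + accessed_since_last) >= min_delta
```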

3. Homeostatic stage reports health_score: 0.0, scaling skipped

"scaling_applied": false, "health_score": 0.0 looks like either a
divide-by-zero / empty-input guard short-circuiting, or a metric that
legitimately is zero but whose name suggests a bug. Worth a glance.

4. Compression is heavily skipped

protected_skipped: 3088, semantic_skipped: 38502, with only
compressed_to_gist: 213. Skips are ~99 % of the candidate set. If
those skip checks are cheap, this is fine — but if the skip path
re-reads each memory's content/flags from the DB, that's another
O(N) tax.
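If the store is SQL-backed, one way to keep the skip path cheap is to fold the protected/semantic checks into the candidate query itself, so skipped rows are never fetched at all. A sketch with assumed column names (protected, kind, heat) and an assumed heat threshold:

```python
# Sketch: express the skip conditions as WHERE clauses so the ~41K
# skipped memories never leave the database. Column names and the
# 0.1 heat cutoff are assumptions, not the plugin's actual predicate.
import sqlite3

def compression_candidates(conn: sqlite3.Connection) -> list[int]:
    """Return ids of memories eligible for gist compression."""
    rows = conn.execute(
        "SELECT id FROM memories "
        "WHERE protected = 0 AND kind = 'episodic' AND heat < 0.1"
    ).fetchall()
    return [r[0] for r in rows]
```

With indexes on protected/kind this turns the skip check from N row reads into one index scan.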

5. No per-stage duration in the result payload

Today the response includes only duration_ms (total). Everything
above is inferred from row counts. Per-stage timings would make
bottleneck identification mechanical instead of speculative.

Suggested telemetry (the most useful change, probably)

Return duration_ms per stage in the existing result dict:

{
  "decay":       { ..., "duration_ms": 1234567 },
  "plasticity":  { ..., "duration_ms": 234567 },
  "pruning":     { ..., "duration_ms": 45678 },
  "compression": { ..., "duration_ms": 12345 },
  "cls":         { ..., "duration_ms": 456789 },   // scanned but empty
  "memify":      { ..., "duration_ms": 6789 },
  "cascade":     { ..., "duration_ms": 3456 },
  "homeostatic": { ..., "duration_ms": 12 },
  "duration_ms":  2150496
}
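On the implementation side, a small context manager would be enough to populate those fields; this sketch assumes the stages write into a shared result dict (timed_stage is a hypothetical name, not an existing API):

```python
# Sketch of per-stage timing: wrap each stage body so its duration_ms
# lands in the stage's entry of the result dict shown above.
import time
from contextlib import contextmanager

@contextmanager
def timed_stage(result: dict, stage: str):
    start = time.perf_counter()
    stage_result: dict = result.setdefault(stage, {})
    try:
        yield stage_result
    finally:
        stage_result["duration_ms"] = int((time.perf_counter() - start) * 1000)

# Usage inside consolidate (illustrative):
#   with timed_stage(result, "decay") as r:
#       r["memories_decayed"] = run_decay()
```

The finally block means a stage that raises still records its duration, which is exactly when you want the number most.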

If it's useful, I'd be happy to collect and share per-run timing
stats from my installation so you have at least one real-world
large-store data point to tune against.

Additional fields that would be valuable if cheap to collect:

Field                                        Why
rows_scanned vs rows_modified per stage      Catches the CLS-style "scanned 25K, produced 0" case
db_query_count, db_tx_count per stage        Surfaces N+1 patterns directly
avg_work_per_memory_ms                       Normalizes across memory-store sizes
peak_memory_mb                               Flags when in-memory structures should be streamed
delta_vs_last_run_ms                         Drift signal: "this run was 2× slower than last"

Suggested optimizations (in rough order of likely payoff)

  1. Batch the decay update. One SQL statement over the candidate
    set instead of per-row updates. Same for plasticity LTD and pruning.
  2. Cooldown window on decay. Skip memories whose last_decayed
    timestamp is within the configured cooldown — on a 2-day gap this
    may not help, but on daily runs it should dramatically shrink the
    working set.
  3. Early-exit CLS when the episodic-memory delta (new / accessed
    since last run) is below some minimum. The current run scanned the
    whole 25K to produce zero patterns.
  4. Instrument the *_skipped counters in compression so it's
    clear whether the skip check itself is the expense or only the
    (rare) non-skip path.
  5. Fix or explain health_score: 0.0 + scaling_applied: false.
    Either the metric name is misleading (rename) or the scaling path
    is silently disabled (fix).
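Suggestions 1 and 2 combine naturally into a single statement; a sketch assuming a last_decayed column storing epoch seconds (again, the schema is my guess, not the plugin's):

```python
# Sketch: batched decay with a cooldown window. Rows decayed within the
# last cooldown_s seconds are excluded by the WHERE clause, so on daily
# runs the working set shrinks to recently untouched memories.
import sqlite3
import time

def decay_with_cooldown(conn: sqlite3.Connection, factor: float,
                        cooldown_s: int) -> int:
    """Decay only rows whose last_decayed falls outside the cooldown."""
    now = int(time.time())
    cur = conn.execute(
        "UPDATE memories SET heat = heat * ?, last_decayed = ? "
        "WHERE protected = 0 AND last_decayed < ?",
        (factor, now, now - cooldown_s),
    )
    conn.commit()
    return cur.rowcount
```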

Offer to help

If any of the above is actionable and you'd like:

  • a real-world workload for benchmarking (my ~66K-memory store before
    and after a patched run),
  • per-stage timings once the telemetry lands, or
  • a PR for the telemetry change itself (adding duration_ms per
    stage is a small, contained change),

happy to do it. Let me know what's most useful.

Repro

Tool: mcp__plugin_cortex_cortex__consolidate
Args: { decay: true, compress: true, cls: true, memify: true, deep: false }
Store size at run: 65949 memories, 40630 semantic, 34735 relationships
Time since last consolidate: ~2 days
Wall-clock duration: 2,150,496 ms (35 min 50 s)

Labels: bug, enhancement