Cortex consolidate — Performance Report
Plugin: cortex (v3.9.1, per skill path)
Date: 2026-04-15
Environment: macOS 25.4.0 (Darwin), Mac M2 Max, 64 GB RAM
Summary
A full consolidate({decay, compress, cls, memify}) run on a memory
store of ~66K memories took 35 min 50 s (2,150,496 ms). Most of
the work appears to be O(N) per-row updates across stages that could
plausibly be batched. One stage scanned but produced nothing, and the
homeostatic stage reported health_score: 0.0 and skipped scaling.
I'd like to suggest a few low-risk optimizations and — more usefully —
some per-stage telemetry that would let maintainers (and operators)
see where time actually goes on real stores.
Pre-run state
{
"total_memories": 65949,
"episodic_count": 25319,
"semantic_count": 40630,
"active_count": 65610,
"archived_count": 339,
"protected_count": 3088,
"avg_heat": 0.6746,
"total_entities": 10770,
"total_relationships": 34735,
"active_triggers": 1456,
"last_consolidation": "2026-04-13T01:20:15Z", // ~2 days prior
"has_vector_search": true
}
Full-run result
{
"decay": { "memories_decayed": 62522, "metabolic_updates": 65937,
"entities_decayed": 10770, "total_memories": 65949 },
"plasticity": { "ltp": 17, "ltd": 33092, "edges_updated": 33109 },
"pruning": { "edges_pruned": 32024, "entities_archived": 718 },
"compression": { "compressed_to_gist": 213, "compressed_to_tag": 0,
"protected_skipped": 3088, "semantic_skipped": 38502 },
"cls": { "patterns_found": 0, "new_semantics_created": 0,
"skipped_inconsistent": 0, "skipped_duplicate": 0,
"causal_edges_found": 0 },
"memify": { "pruned": 0, "strengthened": 0, "reweighted": 1345 },
"cascade": { "advanced": 503, "transitions": [... 503 entries ...] },
"homeostatic": { "scaling_applied": false, "health_score": 0.0 },
"duration_ms": 2150496
}
Observations
1. Most work is row-by-row across large result sets
- Decay touched 62,522 memories (94.8 % of the store) plus 10,770
entities. If this is implemented as per-row UPDATE statements
inside a loop, it is the most plausible source of the bulk of the
35 minutes.
- Plasticity updated 33,109 edges (17 LTP + 33,092 LTD).
- Pruning deleted 32,024 edges.
These three stages together account for ~138K row operations. A single
set-based UPDATE ... SET heat = heat * :decay WHERE ... (and the
analogous UPDATE / DELETE for plasticity and pruning) is typically
orders of magnitude faster than per-row application logic.
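For illustration, here's the difference in miniature, assuming a SQLite store (the memories table, heat and protected columns, and the decay factor are all stand-ins for whatever cortex actually uses):

```python
import sqlite3

DECAY = 0.95  # hypothetical decay factor

def make_store(n=1000):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, heat REAL, protected INTEGER)")
    db.executemany("INSERT INTO memories VALUES (?, 1.0, ?)",
                   [(i, 1 if i % 20 == 0 else 0) for i in range(n)])
    return db

def decay_per_row(db):
    # N+1 pattern: one UPDATE statement per candidate memory.
    for (mid,) in list(db.execute("SELECT id FROM memories WHERE protected = 0")):
        db.execute("UPDATE memories SET heat = heat * ? WHERE id = ?", (DECAY, mid))

def decay_batched(db):
    # Set-based equivalent: one statement over the whole candidate set.
    db.execute("UPDATE memories SET heat = heat * ? WHERE protected = 0", (DECAY,))

a, b = make_store(), make_store()
decay_per_row(a)
decay_batched(b)
# Both produce identical end states; the batched form issues one statement
# instead of ~1000.
assert list(a.execute("SELECT heat FROM memories ORDER BY id")) == \
       list(b.execute("SELECT heat FROM memories ORDER BY id"))
```

The same shape applies to the plasticity LTD update and the pruning DELETE.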
2. CLS scanned ~25K episodic memories and produced zero output
patterns_found: 0, new_semantics_created: 0,
skipped_inconsistent: 0, skipped_duplicate: 0, causal_edges_found: 0
A stage that finds nothing after a 2-day gap on a 25K-episodic store
is either miscalibrated (threshold too strict) or doing expensive
work with no observable effect. Either way, it's a candidate for
early-exit when the input set hasn't changed meaningfully since the
last run.
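One possible shape for that early-exit, assuming a SQLite store with created_at / last_accessed columns (the schema, the column names, and the threshold are all hypothetical):

```python
import sqlite3

MIN_DELTA = 50  # hypothetical: minimum changed episodic memories to justify a CLS pass

def should_run_cls(db, last_run_at):
    """Return False when too few episodic memories changed since the last
    run, so the pipeline can skip the full scan entirely."""
    (delta,) = db.execute(
        "SELECT COUNT(*) FROM memories "
        "WHERE kind = 'episodic' AND (created_at > :t OR last_accessed > :t)",
        {"t": last_run_at},
    ).fetchone()
    return delta >= MIN_DELTA

# Demo: 100 episodic memories, none touched since the last consolidation.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (kind TEXT, created_at TEXT, last_accessed TEXT)")
db.executemany("INSERT INTO memories VALUES ('episodic', ?, ?)",
               [("2026-04-12T00:00:00Z", "2026-04-12T00:00:00Z")] * 100)
print(should_run_cls(db, "2026-04-13T01:20:15Z"))  # False: nothing changed, skip CLS
```

The COUNT is one indexed query, so the guard itself stays cheap even on large stores.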
3. Homeostatic stage reports health_score: 0.0, scaling skipped
"scaling_applied": false, "health_score": 0.0 looks like either a
divide-by-zero / empty-input guard short-circuiting, or a metric that
legitimately is zero but whose name suggests a bug. Worth a glance.
4. Compression is heavily skipped
protected_skipped: 3088, semantic_skipped: 38502, with only
compressed_to_gist: 213. Skips are ~99 % of the candidate set. If
those skip checks are cheap, this is fine — but if the skip path
re-reads each memory's content/flags from the DB, that's another
O(N) tax.
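If the skip path is re-reading rows, one fix is to push the skip predicates into the candidate query so skipped rows are never fetched at all. A sketch with an illustrative schema (table, columns, and values are assumptions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, kind TEXT, protected INTEGER)")
db.executemany("INSERT INTO memories VALUES (?, ?, ?)", [
    (1, "episodic", 0),   # eligible for compression
    (2, "semantic", 0),   # would be semantic_skipped
    (3, "episodic", 1),   # would be protected_skipped
    (4, "episodic", 0),   # eligible for compression
])

# The skip checks become WHERE clauses: the ~41K skipped rows are filtered
# in the database instead of being read and rejected one by one.
candidates = db.execute(
    "SELECT id FROM memories WHERE protected = 0 AND kind = 'episodic'"
).fetchall()
print([mid for (mid,) in candidates])  # -> [1, 4]
```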
5. No per-stage duration in the result payload
Today the response includes only duration_ms (total). Everything
above is inferred from row counts. Per-stage timings would make
bottleneck identification mechanical instead of speculative.
Suggested telemetry (the most useful change, probably)
Return duration_ms per stage in the existing result dict:
{
"decay": { ..., "duration_ms": 1234567 },
"plasticity": { ..., "duration_ms": 234567 },
"pruning": { ..., "duration_ms": 45678 },
"compression": { ..., "duration_ms": 12345 },
"cls": { ..., "duration_ms": 456789 }, // scanned but empty
"memify": { ..., "duration_ms": 6789 },
"cascade": { ..., "duration_ms": 3456 },
"homeostatic": { ..., "duration_ms": 12 },
"duration_ms": 2150496
}
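A payload like that is a few lines in the pipeline driver. One possible shape, sketched in Python (the staged helper and the loop structure are illustrative, not cortex's actual code):

```python
import time
from contextlib import contextmanager

@contextmanager
def staged(result, stage):
    """Time one consolidate stage and record duration_ms in its result entry."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        result.setdefault(stage, {})["duration_ms"] = int((time.perf_counter() - t0) * 1000)

# Sketch of use in the consolidate loop (stage bodies are stand-ins):
result = {}
with staged(result, "decay"):
    time.sleep(0.02)   # ... real decay work here ...
with staged(result, "cls"):
    pass               # ... real CLS work here ...
# Total stays in the top-level key, as today.
result["duration_ms"] = sum(v["duration_ms"] for v in result.values()
                            if isinstance(v, dict))
```

Because the timer writes into the stage's existing result dict, the response shape only gains one key per stage.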
If it's useful, I'd be happy to collect and share per-run timing
stats from my installation so you have at least one real-world
large-store data point to tune against.
Additional fields that would be valuable if cheap to collect:
| Field | Why |
|---|---|
| rows_scanned vs rows_modified per stage | Catches the CLS-style "scanned 25K, produced 0" case |
| db_query_count, db_tx_count per stage | Surfaces N+1 patterns directly |
| avg_work_per_memory_ms | Normalizes across memory-store sizes |
| peak_memory_mb | Flags when in-memory structures should be streamed |
| delta_vs_last_run_ms | Drift signal: "this run was 2× slower than last" |
Suggested optimizations (in rough order of likely payoff)
- Batch the decay update. One SQL statement over the candidate
set instead of per-row updates. Same for plasticity LTD and pruning.
- Cooldown window on decay. Skip memories whose last_decayed
  timestamp is within the configured cooldown — on a 2-day gap this
  may not help, but on daily runs it should dramatically shrink the
  working set.
- Early-exit CLS when the episodic-memory delta (new / accessed
since last run) is below some minimum. The current run scanned the
whole 25K to produce zero patterns.
- Instrument the *_skipped counters in compression so it's
  clear whether the skip check itself is the expense or only the
  (rare) non-skip path.
- Fix or explain health_score: 0.0 + scaling_applied: false.
  Either the metric name is misleading (rename) or the scaling path
  is silently disabled (fix).
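The first two items compose: the cooldown check can be folded into the batched decay statement so recently-decayed rows never enter the working set. A minimal SQLite sketch (table and column names, the cooldown value, and the decay factor are all assumptions):

```python
import sqlite3
from datetime import datetime, timedelta, timezone

COOLDOWN_HOURS = 20   # hypothetical cooldown window
DECAY = 0.95          # hypothetical decay factor

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, heat REAL, last_decayed TEXT)")
db.executemany("INSERT INTO memories VALUES (?, 1.0, ?)", [
    (1, "2026-04-15T01:00:00Z"),  # decayed earlier today -> inside cooldown, skipped
    (2, "2026-04-13T01:20:15Z"),  # last touched two days ago -> decayed
])

now = datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc)
cutoff = (now - timedelta(hours=COOLDOWN_HOURS)).strftime("%Y-%m-%dT%H:%M:%SZ")

# One batched statement; the cooldown predicate shrinks the candidate set
# and the UPDATE refreshes last_decayed for the rows it touched.
db.execute(
    "UPDATE memories SET heat = heat * ?, last_decayed = ? WHERE last_decayed < ?",
    (DECAY, now.strftime("%Y-%m-%dT%H:%M:%SZ"), cutoff),
)
```

ISO-8601 UTC timestamps compare correctly as strings, so the predicate needs no date parsing in SQL.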
Offer to help
If any of the above is actionable and you'd like:
- a real-world workload for benchmarking (my ~66K-memory store before
and after a patched run),
- per-stage timings once the telemetry lands, or
- a PR for the telemetry change itself (adding duration_ms per
  stage is a small, contained change),
happy to do it. Let me know what's most useful.
Repro
Tool: mcp__plugin_cortex_cortex__consolidate
Args: { decay: true, compress: true, cls: true, memify: true, deep: false }
Store size at run: 65949 memories, 40630 semantic, 34735 relationships
Time since last consolidate: ~2 days
Wall-clock duration: 2,150,496 ms (35 min 50 s)