
Commit 1303a20

iPythoning and claude committed
feat: cherry-pick 3 ideas from OpenViking — dual-threshold, ranking, expand
1. Dual-threshold compression (proactive-summary.mjs):
   - 50% BACKGROUND_SAVE: extract key facts to ChromaDB without compressing
   - 65% COMPRESS: full L2 compression (existing behavior)
   - Backward-compatible exports (TOKEN_THRESHOLD still available)

2. Memory re-ranking (chroma.mjs):
   - Replace simple word-count scoring with 3-factor ranking:
     lexical overlap (normalized) + recency decay + tag weighting
   - Fix bug: `score + 0.5` → properly integrated into rankResult()

3. Archive expand (chroma.mjs):
   - New `expand <turn_id>` command to view full original text
   - Searches all customer directories by ID

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 21e4eca commit 1303a20

3 files changed

Lines changed: 131 additions & 28 deletions


scripts/proactive-summary.mjs

Lines changed: 56 additions & 15 deletions
```diff
@@ -2,19 +2,23 @@
 /**
  * proactive-summary — L2 Proactive Compaction Trigger
  *
- * Monitors token usage and triggers conversation compression at 65% threshold.
+ * Dual-threshold token monitoring with background save + forced compression.
  * Uses a small/fast model (haiku-class) for compression to minimize cost.
  * Compressed summaries are stored in ChromaDB for long-term retrieval.
  *
+ * Thresholds:
+ * - 50% (BACKGROUND_THRESHOLD): Non-blocking — extract key facts to ChromaDB without compressing
+ * - 65% (COMPRESS_THRESHOLD): Blocking — full L2 compression (compress + store + trim)
+ *
  * Integration:
- * - Called by OpenClaw's compaction.memoryFlush hook (softThresholdTokens: 12000)
+ * - Called by OpenClaw's compaction.memoryFlush hook (softThresholdTokens: 10000)
  * - Can also run standalone: node proactive-summary.mjs --context <file> --output <file>
  *
  * Architecture:
  * 1. Read current conversation context
- * 2. Estimate token usage (if >= 65% of model's context window, trigger)
- * 3. Extract key facts → update MemOS (safety net)
- * 4. Compress with haiku-class model (fast, cheap)
+ * 2. Estimate token usage → tri-state: OK / BACKGROUND_SAVE / COMPRESS
+ * 3. BACKGROUND_SAVE: Extract key facts to ChromaDB (no compression, no blocking)
+ * 4. COMPRESS: Full compression via haiku-class model (fast, cheap)
  * 5. Store compressed summary in ChromaDB
  * 6. Return compressed messages (summary + last 3 raw turns)
  */
```
```diff
@@ -53,7 +57,10 @@ Compress the conversation history into a structured summary with ZERO informatio
 - [topic summary]
 === End ===`;
 
-const TOKEN_THRESHOLD = parseFloat(process.env.TOKEN_THRESHOLD || '0.65');
+const COMPRESS_THRESHOLD = parseFloat(process.env.COMPRESS_THRESHOLD || '0.65');
+const BACKGROUND_THRESHOLD = parseFloat(process.env.BACKGROUND_THRESHOLD || '0.50');
+// Backward compat: old env var still works
+const TOKEN_THRESHOLD = COMPRESS_THRESHOLD;
 
 const MODEL_CONTEXT_WINDOWS = {
   'claude-haiku-4-5': 200000,
```
```diff
@@ -73,15 +80,42 @@ function estimateTokens(text) {
   return Math.ceil(otherChars / 4 + cjkChars / 2);
 }
 
+const BACKGROUND_SAVE_PROMPT = `Extract key facts from this conversation for background storage. Do NOT compress — just list the important data points.
+
+## Extract:
+- Customer BANT data (Budget, Authority, Need, Timeline)
+- All numbers (prices, quantities, dates, percentages)
+- Commitments from both sides
+- Decision signals and objections
+- Stage progression
+
+## Output format:
+=== Key Facts Snapshot (${new Date().toISOString()}) ===
+[Facts]
+1. fact
+2. fact
+[Numbers]
+- exact figure or quote
+[Open Items]
+- pending action or decision
+=== End ===`;
+
 function shouldTrigger(contextText, model) {
   const tokens = estimateTokens(contextText);
   const maxTokens = MODEL_CONTEXT_WINDOWS[model] || 128000;
   const usage = tokens / maxTokens;
-  return { trigger: usage >= TOKEN_THRESHOLD, usage, tokens, maxTokens };
+  let action = 'OK';
+  if (usage >= COMPRESS_THRESHOLD) action = 'COMPRESS';
+  else if (usage >= BACKGROUND_THRESHOLD) action = 'BACKGROUND_SAVE';
+  return { trigger: action !== 'OK', action, usage, tokens, maxTokens };
 }
 
 // Export for use as a module
-export { COMPRESSION_PROMPT, TOKEN_THRESHOLD, estimateTokens, shouldTrigger };
+export {
+  COMPRESSION_PROMPT, BACKGROUND_SAVE_PROMPT,
+  TOKEN_THRESHOLD, COMPRESS_THRESHOLD, BACKGROUND_THRESHOLD,
+  estimateTokens, shouldTrigger,
+};
 
 // ─── CLI ────────────────────────────────────────────────────
 if (process.argv[1]?.endsWith('proactive-summary.mjs')) {
```
```diff
@@ -90,7 +124,7 @@ if (process.argv[1]?.endsWith('proactive-summary.mjs')) {
   const contextFile = args.find((_, i) => args[i - 1] === '--context');
 
   if (args.includes('--help') || !contextFile) {
-    console.log(`proactive-summary — L2 Proactive Compaction
+    console.log(`proactive-summary — Dual-Threshold Compaction
 
 Usage:
   node proactive-summary.mjs --context <file> [--model <model>]
```
```diff
@@ -100,10 +134,13 @@ Options:
   --model   AI model name for threshold calculation (default: gpt-4o)
 
 Environment:
-  TOKEN_THRESHOLD  Trigger threshold (default: 0.65)
+  BACKGROUND_THRESHOLD  Background save threshold (default: 0.50)
+  COMPRESS_THRESHOLD    Compression threshold (default: 0.65)
 
-This script checks if compaction should trigger and outputs the compression prompt.
-Actual compression is performed by the OpenClaw agent using its configured model.`);
+Actions:
+  OK               — Below 50%, no action needed
+  BACKGROUND_SAVE  — 50-65%, extract key facts to ChromaDB (non-blocking)
+  COMPRESS         — Above 65%, full L2 compression (blocking)`);
     process.exit(0);
   }
 
```

```diff
@@ -112,12 +149,16 @@ Actual compression is performed by the OpenClaw agent using its configured model
     const context = readFileSync(contextFile, 'utf-8');
     const result = shouldTrigger(context, model);
 
+    const prompt = result.action === 'COMPRESS' ? COMPRESSION_PROMPT
+      : result.action === 'BACKGROUND_SAVE' ? BACKGROUND_SAVE_PROMPT
+      : null;
+
     console.log(JSON.stringify({
       ...result,
       model,
-      threshold: TOKEN_THRESHOLD,
-      action: result.trigger ? 'COMPRESS' : 'OK',
-      compressionPrompt: result.trigger ? COMPRESSION_PROMPT : null,
+      backgroundThreshold: BACKGROUND_THRESHOLD,
+      compressThreshold: COMPRESS_THRESHOLD,
+      prompt,
     }, null, 2));
   } catch (e) {
     console.error(`Error: ${e.message}`);
```
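Taken together, the new trigger logic can be exercised standalone. The sketch below reconstructs `estimateTokens` and `shouldTrigger` from the diff; the CJK-detection regex and the single-entry model table are assumptions (the diff only shows the `/4` and `/2` divisors and one window entry):

```javascript
// Reconstruction of the dual-threshold trigger. The CJK regex is an assumption;
// only the /4 and /2 divisors appear in the commit.
const COMPRESS_THRESHOLD = 0.65;
const BACKGROUND_THRESHOLD = 0.50;
const MODEL_CONTEXT_WINDOWS = { 'claude-haiku-4-5': 200000 };

// Rough token estimate: CJK chars ≈ 2 chars/token, everything else ≈ 4 chars/token.
function estimateTokens(text) {
  const cjkChars = (text.match(/[\u4e00-\u9fff]/g) || []).length;
  const otherChars = text.length - cjkChars;
  return Math.ceil(otherChars / 4 + cjkChars / 2);
}

function shouldTrigger(contextText, model) {
  const tokens = estimateTokens(contextText);
  const maxTokens = MODEL_CONTEXT_WINDOWS[model] || 128000;
  const usage = tokens / maxTokens;
  let action = 'OK';
  if (usage >= COMPRESS_THRESHOLD) action = 'COMPRESS';
  else if (usage >= BACKGROUND_THRESHOLD) action = 'BACKGROUND_SAVE';
  return { trigger: action !== 'OK', action, usage, tokens, maxTokens };
}

// A 300k-char ASCII context ≈ 75k tokens → 58.6% of a 128k window.
console.log(shouldTrigger('x'.repeat(300000), 'gpt-4o').action); // → BACKGROUND_SAVE
```

Note that `trigger` stays `true` for both non-OK states, so old callers that only check `result.trigger` still fire at 50%, which is why the hook's `softThresholdTokens` was lowered in this commit.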

skills/chroma-memory/chroma.mjs

Lines changed: 62 additions & 8 deletions
```diff
@@ -9,6 +9,7 @@
  * node chroma.mjs store --customer <id> --turn <n> --user <msg> --agent <msg> [--stage <s>] [--topic <t>]
  * node chroma.mjs search <query> [--customer <id>] [--limit <n>]
  * node chroma.mjs recall <customer_id> [--limit <n>]
+ * node chroma.mjs expand <turn_id>
  * node chroma.mjs snapshot
  * node chroma.mjs stats
  */
```
```diff
@@ -74,6 +75,28 @@ function storeTurn({ customer, turn, user, agent, stage, topic }) {
   return id;
 }
 
+// ─── Ranking: lexical overlap + recency decay + tag boost ─────
+const TAG_WEIGHTS = { has_order: 0.12, has_quote: 0.10, has_commitment: 0.10, has_objection: 0.08, has_sample: 0.05 };
+
+function rankResult(doc, queryWords) {
+  // Factor 1: normalized lexical overlap (0-1)
+  const textLower = doc.text.toLowerCase();
+  const matchCount = queryWords.reduce((s, w) => s + (textLower.includes(w) ? 1 : 0), 0);
+  const lexical = queryWords.length > 0 ? matchCount / queryWords.length : 0;
+
+  // Factor 2: recency decay — half-life 30 days, floor at 0.5
+  const ageDays = doc.timestamp ? (Date.now() - new Date(doc.timestamp).getTime()) / 86400000 : 30;
+  const recency = Math.max(0.5, 1 - ageDays / 60);
+
+  // Factor 3: tag boost
+  let tagBoost = 0;
+  for (const [tag, weight] of Object.entries(TAG_WEIGHTS)) {
+    if (doc[tag]) tagBoost += weight;
+  }
+
+  return lexical * 0.5 + recency * 0.3 + tagBoost;
+}
+
 function search(query, customer, limit = 5) {
   const queryWords = query.toLowerCase().split(/\s+/);
   const results = [];
```
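The scoring can be checked in isolation. Below is the diff's `rankResult` with its comments lightly reworded (the decay is linear in `ageDays`, reaching the 0.5 floor at exactly 30 days, rather than a true exponential half-life); the sample document is invented for illustration:

```javascript
const TAG_WEIGHTS = { has_order: 0.12, has_quote: 0.10, has_commitment: 0.10, has_objection: 0.08, has_sample: 0.05 };

function rankResult(doc, queryWords) {
  // Factor 1: fraction of query words found in the turn text (0..1)
  const textLower = doc.text.toLowerCase();
  const matchCount = queryWords.reduce((s, w) => s + (textLower.includes(w) ? 1 : 0), 0);
  const lexical = queryWords.length > 0 ? matchCount / queryWords.length : 0;

  // Factor 2: linear recency decay, floored at 0.5 (reached at 30 days)
  const ageDays = doc.timestamp ? (Date.now() - new Date(doc.timestamp).getTime()) / 86400000 : 30;
  const recency = Math.max(0.5, 1 - ageDays / 60);

  // Factor 3: flat additive boost per metadata tag present on the document
  let tagBoost = 0;
  for (const [tag, weight] of Object.entries(TAG_WEIGHTS)) {
    if (doc[tag]) tagBoost += weight;
  }

  return lexical * 0.5 + recency * 0.3 + tagBoost;
}

// Hypothetical fresh turn: both query words matched, tagged as a quote.
// lexical 1.0 → 0.5, recency ≈ 1.0 → 0.3, tagBoost 0.10
const doc = { text: 'Quoted 500 units at AED 12 each', timestamp: new Date().toISOString(), has_quote: true };
console.log(rankResult(doc, ['quoted', 'units']).toFixed(2)); // → "0.90"
```

A document with no timestamp defaults to 30 days old, so its recency term contributes the floor value 0.15; this makes the `score > 0.1` cutoff in `search()` pass for any such document, unlike the old `score > 0` word-count check.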
```diff
@@ -88,13 +111,9 @@ function search(query, customer, limit = 5) {
     for (const file of readdirSync(dir).filter(f => f.endsWith('.json'))) {
       try {
         const doc = JSON.parse(readFileSync(join(dir, file), 'utf-8'));
-        if (doc.type === 'crm_snapshot') continue; // skip snapshots in search
-        const textLower = doc.text.toLowerCase();
-        const score = queryWords.reduce((s, w) => s + (textLower.includes(w) ? 1 : 0), 0);
-        // Boost tagged turns
-        if (doc.has_quote) score + 0.5;
-        if (doc.has_commitment) score + 0.5;
-        if (score > 0) results.push({ ...doc, score });
+        if (doc.type === 'crm_snapshot') continue;
+        const score = rankResult(doc, queryWords);
+        if (score > 0.1) results.push({ ...doc, score: Math.round(score * 1000) / 1000 });
       } catch { /* skip */ }
     }
   }
```
```diff
@@ -138,6 +157,38 @@ function snapshot() {
   console.log(`CRM snapshot marker created: ${id}`);
 }
 
+function expand(id) {
+  const dirs = readdirSync(CHROMA_DIR, { withFileTypes: true })
+    .filter(d => d.isDirectory())
+    .map(d => join(CHROMA_DIR, d.name));
+
+  for (const dir of dirs) {
+    const file = join(dir, `${id}.json`);
+    if (existsSync(file)) {
+      const doc = JSON.parse(readFileSync(file, 'utf-8'));
+      console.log(`=== Turn ${doc.turn_number || '?'}${doc.customer_id || 'unknown'} (${doc.timestamp || '?'}) ===`);
+      console.log(`Stage: ${doc.stage || 'unknown'} | Topic: ${doc.topic || 'general'}`);
+      const tags = Object.entries(TAG_WEIGHTS).filter(([t]) => doc[t]).map(([t]) => t);
+      if (tags.length) console.log(`Tags: ${tags.join(', ')}`);
+      console.log(`\n--- Customer ---\n${doc.user_message || '(empty)'}`);
+      console.log(`\n--- Agent ---\n${doc.agent_response || '(empty)'}`);
+      console.log(`\n=== End ===`);
+      return doc;
+    }
+  }
+
+  // Also check top-level files (snapshots)
+  const topFile = join(CHROMA_DIR, `${id}.json`);
+  if (existsSync(topFile)) {
+    const doc = JSON.parse(readFileSync(topFile, 'utf-8'));
+    console.log(JSON.stringify(doc, null, 2));
+    return doc;
+  }
+
+  console.log(`Turn ${id} not found in any customer directory.`);
+  return null;
+}
+
 function stats() {
   let totalTurns = 0;
   let totalSnapshots = 0;
```
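The lookup pattern inside `expand` (scan every customer subdirectory for `<id>.json`, then fall back to top-level snapshot files) can be sketched on its own. `CHROMA_DIR` and the on-disk layout below are assumptions for illustration; the real script defines them elsewhere:

```javascript
import { readdirSync, readFileSync, existsSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical archive root; the real chroma.mjs derives CHROMA_DIR itself.
const CHROMA_DIR = process.env.CHROMA_DIR || './chroma-archive';

// Returns the parsed document for <id>, or null if no <id>.json exists
// in any customer subdirectory or at the top level.
function findDoc(id) {
  const dirs = readdirSync(CHROMA_DIR, { withFileTypes: true })
    .filter(d => d.isDirectory())
    .map(d => join(CHROMA_DIR, d.name));
  for (const dir of [...dirs, CHROMA_DIR]) { // top level last, as in expand()
    const file = join(dir, `${id}.json`);
    if (existsSync(file)) return JSON.parse(readFileSync(file, 'utf-8'));
  }
  return null;
}
```

One caveat visible in the diff: the ID is interpolated into the path verbatim, so callers should ensure turn IDs cannot contain path separators.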
```diff
@@ -208,12 +259,15 @@ switch (command) {
   case 'recall':
     console.log(JSON.stringify(recall(opts._positional || args[0], parseInt(opts.limit) || 10), null, 2));
     break;
+  case 'expand':
+    expand(opts._positional || args[0]);
+    break;
   case 'snapshot':
     snapshot();
     break;
   case 'stats':
     stats();
     break;
   default:
-    console.log('Usage: chroma.mjs <store|search|recall|snapshot|stats> [args]');
+    console.log('Usage: chroma.mjs <store|search|recall|expand|snapshot|stats> [args]');
 }
```

workspace/MEMORY.md

Lines changed: 13 additions & 5 deletions
````diff
@@ -5,15 +5,15 @@
 ```
 Message In → L1 MemOS auto-recall
            → L3 chroma:store (every turn)
-           → L2 proactive-summary (at 65% token usage)
+           → L2 dual-threshold (50% background save → 65% compress)
            → L4 CRM snapshot (daily 12:00 fallback)
 ```
 
 | Layer | Engine | How It Works | Your Action |
 |-------|--------|-------------|-------------|
 | **L1: MemOS** | Structured memory | Auto-injects past memories at conversation start, auto-captures BANT/commitments/objections at end | Read what it gives you |
-| **L2: Proactive Summary** | Token monitoring | Compresses at 65% context usage via haiku-class model. Zero info loss on numbers/quotes/commitments | Embed key-data summary past 20 turns |
-| **L3: ChromaDB** | Per-turn vector store | Every turn stored with customer_id isolation + auto-tagging (quotes, commitments, objections) | Use `chroma:search` before outreach |
+| **L2: Proactive Summary** | Dual-threshold monitoring | **50%**: background save key facts to ChromaDB (non-blocking). **65%**: full compression via haiku-class model. Zero info loss on numbers/quotes/commitments | Embed key-data summary past 20 turns |
+| **L3: ChromaDB** | Per-turn store | Every turn stored with customer_id isolation + auto-tagging. Search uses recency-weighted ranking | Use `chroma:search` before outreach |
 | **L4: CRM Snapshot** | Daily backup | 12:00 daily pipeline snapshot to ChromaDB as disaster recovery | None — automatic |
 
 ## Operating Rules (Every Conversation)
````
````diff
@@ -43,6 +43,7 @@ memory:stats
 chroma:store --customer "+971501234567" --turn 5 --user "price?" --agent "let me quote..." --stage qualifying --topic pricing
 chroma:search "pricing discussion Dubai" --customer "+971501234567" --limit 5
 chroma:recall "+971501234567" --limit 10
+chroma:expand <turn_id> -- View full original text of a compressed/archived turn
 chroma:snapshot
 chroma:stats
 ```
````
```diff
@@ -80,16 +81,23 @@ Every stored turn is automatically analyzed and tagged:
 | `has_order` | "place order", "confirm purchase", "deposit" |
 | `has_sample` | "sample", "trial", "prototype" |
 
-## L2 Compression Rules
+## L2 Dual-Threshold Compression
 
-When token usage hits 65%, the proactive summary engine:
+**At 50% token usage** (BACKGROUND_SAVE):
+- Non-blocking background extraction of key facts
+- Facts stored to ChromaDB — no conversation compression
+- Protects critical data early in case of unexpected context loss
+
+**At 65% token usage** (COMPRESS):
 1. Updates MemOS first (safety net)
 2. Compresses with haiku-class model (fast, cheap)
 3. **Preserves verbatim**: all numbers, quotes, commitments, BANT data
 4. **Compresses**: small talk, repeated intros, multi-round confirmations
 5. Stores compressed summary in ChromaDB
 6. Keeps last 3 raw turns uncompressed
 
+**Recover compressed turns**: Use `chroma:expand <turn_id>` to view full original text.
+
 ## CRM Column Mapping
 > See USER.md → CRM Configuration
```
