fix: add per-ID mutex lock for update() to prevent concurrent corruption #143

baizenghu wants to merge 1 commit into CortexReach:main
Conversation
…add data loss Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review: fix: add per-ID mutex lock for update() to prevent concurrent delete+add data loss

Verdict: Fix-then-merge (two blockers — see below)

✅ What's working

The core mutex design is correct for JavaScript's single-threaded event loop:

```typescript
// withUpdateLock — per-ID Promise-chain mutex
let resolve!: () => void;
const prev = this.updateLocks.get(id) ?? Promise.resolve();
const next = new Promise<void>(r => { resolve = r; });
this.updateLocks.set(id, next);
try {
  await prev; // wait for previous holder
  return await fn();
} finally {
  resolve(); // release next waiter
  if (this.updateLocks.get(id) === next) this.updateLocks.delete(id); // cleanup
}
```
🔴 Blocker 1: Direct conflict with PR #144

The 102 lines added to tools.ts duplicate changes in PR #144. Whichever PR merges second will have a merge conflict on tools.ts.

🔴 Blocker 2: Zero test coverage for the mutex logic

A concurrent data-corruption fix with no tests is not safe to ship. If someone later simplifies the lock logic, nothing will catch the regression. At minimum, cover:

```typescript
// 1. Two concurrent updates on the same ID — only the last one should win
// 2. bulkDelete blocks during a concurrent update — no interleaving
// 3. Sequential updates still produce correct results (sanity check)
```

Wire these into the existing test suite.

🟡 Known limitation: prefix vs full UUID lock key mismatch
This is low-probability in practice, but it's a gap in the stated guarantee. Suggested fix: resolve prefix → full UUID before acquiring the lock, then use the canonical full UUID as the key. Or at minimum, add a code comment documenting this limitation.

⚪ Non-blocking
The mutex design itself is solid and addresses a real data-loss bug. The two blockers are the tools.ts duplication (coordination issue) and missing tests — neither is about the lock logic itself.
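The prefix-resolution fix suggested in the known-limitation note can be sketched as follows. `resolveFullId` and the `knownIds` list are hypothetical stand-ins for the store's real ID lookup:

```typescript
// Canonicalize a possibly-abbreviated ID to its full UUID *before* locking,
// so a prefix caller and a full-UUID caller serialize on the same key.
// knownIds and resolveFullId are illustrative, not from the PR.
const knownIds = [
  "3f2a9c1e-8b4d-4e6f-9a01-234567890abc",
  "7d1e5f20-1111-4222-8333-444455556666",
];

function resolveFullId(idOrPrefix: string): string {
  const matches = knownIds.filter(id => id.startsWith(idOrPrefix));
  if (matches.length !== 1) {
    throw new Error(`ambiguous or unknown id: ${idOrPrefix}`);
  }
  return matches[0]; // canonical full UUID, used as the lock key
}

// Usage sketch: lock on the canonical ID, never on the raw input.
// await this.withUpdateLock(resolveFullId(rawId), () => this.doUpdate(rawId));
```

Resolving before acquiring the lock is what closes the gap: once the key is canonical, a prefix and its full UUID can no longer hold two independent locks on the same entry.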
Problem
When a single agent processes rapid consecutive messages, the auto-capture hook can trigger multiple update() calls on the same memory entry simultaneously. Since update() performs a read-modify-write cycle without any locking, this leads to data corruption.

Solution
- Per-ID lock (withUpdateLock) using a Promise-chain pattern (zero external dependencies)
- Global delete barrier (withBulkDeleteLock) that drains all in-flight updates before executing bulk delete

Implementation Details
- updateLocks: Map<string, Promise<void>> — per-ID serialization via Promise chaining
- bulkDeleteLock: Promise<void> — global barrier that bulkDelete() waits on before proceeding

Testing
Verified in a production environment with agents handling rapid consecutive messages. No more orphaned entries or lost updates after applying this fix.
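The per-ID chains and the global barrier described under Implementation Details can be modeled together in a short sketch. The class, method bodies, and drain strategy here are assumptions about the described design, not the PR's actual code:

```typescript
// Minimal model: per-ID Promise chains (updateLocks) plus a global
// bulkDeleteLock barrier that drains in-flight updates before clearing.
class BarrierStore {
  private updateLocks = new Map<string, Promise<void>>();
  private bulkDeleteLock: Promise<void> = Promise.resolve();
  private data = new Map<string, string>();

  update(id: string, value: string): Promise<void> {
    let release!: () => void;
    const prev = this.updateLocks.get(id) ?? Promise.resolve();
    const next = new Promise<void>(r => { release = r; });
    // Register the chain entry synchronously, so a later bulkDelete
    // snapshot is guaranteed to see this in-flight update.
    this.updateLocks.set(id, next);
    const barrier = this.bulkDeleteLock; // read the barrier now, not later
    const run = async (): Promise<void> => {
      try {
        await barrier; // wait behind any bulk delete already in progress
        await prev;    // then behind earlier updates to the same ID
        this.data.set(id, value);
      } finally {
        release();
        if (this.updateLocks.get(id) === next) this.updateLocks.delete(id);
      }
    };
    return run();
  }

  async bulkDelete(): Promise<void> {
    let release!: () => void;
    this.bulkDeleteLock = new Promise<void>(r => { release = r; });
    try {
      // Drain: every chain registered so far captured the old (already
      // resolved) barrier, so awaiting them here cannot deadlock.
      await Promise.all([...this.updateLocks.values()]);
      this.data.clear();
    } finally {
      release(); // let updates queued behind the barrier proceed
    }
  }

  size(): number { return this.data.size; }
}
```

The ordering matters: update() registers its chain entry and captures the barrier synchronously, which guarantees that bulkDelete()'s snapshot of updateLocks includes every update that started before the barrier went up, and that none of those updates is itself waiting on the new barrier.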