feat: sanitize transport metadata before embedding/display by 76159482 · Pull Request #103 · CortexReach/memory-lancedb-pro

76159482 · 2026-03-08T02:17:40Z

Problem

OpenClaw's chat transport layers (Telegram, Discord, etc.) inject metadata blocks into message content:

Conversation info (untrusted metadata): ```json ... ```
Sender (untrusted metadata): ```json ... ```
[Queued messages while agent was busy]

These wrappers pollute memory embeddings and retrieval results, causing:

Irrelevant memories to match on metadata keywords
Noisy display in memory_recall output
Wasted embedding API calls on transport boilerplate

Solution

Add sanitizeMemoryText() function to strip known metadata patterns:

Removes metadata blocks via regex patterns
Compresses excessive whitespace
Preserves human-readable content
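
The three behaviors listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the actual contents of METADATA_BLOCK_PATTERNS are not shown in the description, so the patterns, the `escapeRegExp` and `fencedJsonBlock` helpers, and the exact envelope layout (label followed by a fenced json block) are assumptions.

```typescript
// Assumed envelope layout: a label followed by a fenced json block.
// The fence is built at runtime so this example does not embed a literal fence.
const FENCE = "`".repeat(3);

// Escape regex metacharacters so labels containing parentheses match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: matches "<label> <fence>json ... <fence>" non-greedily.
function fencedJsonBlock(label: string): RegExp {
  return new RegExp(
    escapeRegExp(label) + "\\s*" + FENCE + "json[\\s\\S]*?" + FENCE,
    "g",
  );
}

// Hypothetical pattern list; the PR's real patterns may differ.
const METADATA_BLOCK_PATTERNS: RegExp[] = [
  fencedJsonBlock("Conversation info (untrusted metadata):"),
  fencedJsonBlock("Sender (untrusted metadata):"),
  /\[Queued messages while agent was busy\]/g,
];

function sanitizeMemoryText(text: string): string {
  let cleaned = text;
  // Remove every known metadata block.
  for (const pattern of METADATA_BLOCK_PATTERNS) {
    cleaned = cleaned.replace(pattern, "");
  }
  // Compress whitespace: collapse runs of spaces/tabs and of 3+ newlines.
  return cleaned
    .replace(/[ \t]{2,}/g, " ")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

Text that contains none of the wrappers passes through unchanged apart from whitespace compression, which is what keeps the human-readable content intact.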

Integrated at 3 key points:

memory_recall - Clean text before display
memory_store - Clean text before embedding
memory_update - Clean text before re-embedding

Implementation

noise-filter.ts: Add METADATA_BLOCK_PATTERNS + sanitizeMemoryText()
tools.ts: Call sanitizeMemoryText() at embedding/display points
Backward compatible: No config changes required
Fail-safe: Returns original text if sanitization fails
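
One way to get the fail-safe behavior is a small wrapper around the sanitizer. The sketch below assumes a hypothetical `withFailSafe` helper and `Sanitizer` type; neither name comes from the PR, and the PR may implement the fallback inside sanitizeMemoryText() itself.

```typescript
// Hypothetical type: any function that maps raw text to cleaned text.
type Sanitizer = (text: string) => string;

// Wrap a sanitizer so that any thrown error falls back to the original text.
function withFailSafe(sanitize: Sanitizer): Sanitizer {
  return (text: string): string => {
    try {
      return sanitize(text);
    } catch {
      // Sanitization failed: return the input untouched rather than lose it.
      return text;
    }
  };
}
```

With this shape, a call site that embeds or displays text would call the wrapped function and never need its own error handling for sanitization.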

Testing

Tested in production for 2+ weeks with OpenClaw Telegram transport.

Observed improvements:

Reduced false-positive matches on metadata keywords
Cleaner memory_recall output
No regressions in existing functionality


rwmjhb · 2026-03-08T07:22:10Z

Thanks for tackling this.

I do see 3 issues with the current implementation:

1. It only covers part of the ingestion path.
   This PR updates memory_recall, memory_store, and memory_update, but the plugin's autoCapture path still reads raw event.messages content and directly embeds/stores it without applying this sanitization. So these wrappers can still enter the database through auto-capture.
2. memory_store is internally inconsistent.
   It uses cleanedText for embedding and duplicate detection, but still writes the original text to store.store() and mdMirror. That means the vector and persisted text can diverge, and the transport metadata is still being stored even though recall output looks cleaner.
3. There is no automated test coverage for these wrapper patterns.
   Since this is a behavior fix for a real production issue, the missing tests are a gap. Right now there is nothing locking in the expected behavior for these exact metadata envelopes.
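
As an illustration of the kind of coverage meant here, a table of wrapper cases could be checked against whatever sanitizer the plugin exports. This is only a sketch: `WRAPPER_CASES`, `failingCases`, the `{}` payloads, and the assumed envelope layout are hypothetical and not taken from the repository.

```typescript
type Sanitizer = (text: string) => string;

interface WrapperCase {
  name: string;
  input: string;
  expected: string;
}

// Built at runtime so the example does not embed a literal fence.
const FENCE = "`".repeat(3);

// Hypothetical envelopes; the JSON payloads are placeholders.
const WRAPPER_CASES: WrapperCase[] = [
  {
    name: "conversation info block",
    input: "Conversation info (untrusted metadata): " + FENCE + "json\n{}\n" + FENCE + "\nhello",
    expected: "hello",
  },
  {
    name: "sender block",
    input: "Sender (untrusted metadata): " + FENCE + "json\n{}\n" + FENCE + "\nhello",
    expected: "hello",
  },
  {
    name: "queued marker",
    input: "[Queued messages while agent was busy]\nhello",
    expected: "hello",
  },
  { name: "plain text is untouched", input: "hello", expected: "hello" },
];

// Returns the names of the cases the given sanitizer gets wrong.
function failingCases(sanitize: Sanitizer): string[] {
  return WRAPPER_CASES
    .filter((c) => sanitize(c.input) !== c.expected)
    .map((c) => c.name);
}
```

A test runner would then assert that `failingCases(sanitizeMemoryText)` is empty, and new envelopes can be added to the table as transports introduce them.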

So my assessment is: the problem is real and the direction is correct, but the current implementation is still partial and does not fully fix it.
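
To make the first two points concrete, one possible shape is a single preparation step that both memory_store and auto-capture go through, so the same cleaned string feeds embedding, duplicate detection, and persistence. The types and function names below (`ChatMessage`, `PreparedMemory`, `prepareMemory`, `prepareAutoCapture`) are hypothetical, and skipping messages that sanitize to an empty string is a design assumption rather than existing plugin behavior.

```typescript
// Hypothetical shapes; the plugin's real types are not shown in this thread.
interface ChatMessage {
  role: string;
  content: string;
}

interface PreparedMemory {
  // The one string to embed, deduplicate on, and persist.
  text: string;
}

type Sanitizer = (text: string) => string;

// Sanitize once and hand the same text to every downstream consumer.
function prepareMemory(raw: string, sanitize: Sanitizer): PreparedMemory | null {
  const text = sanitize(raw);
  // Assumption: a message that is pure transport boilerplate is not worth storing.
  return text.length > 0 ? { text } : null;
}

// Apply the same preparation to every message on the auto-capture path.
function prepareAutoCapture(
  messages: ChatMessage[],
  sanitize: Sanitizer,
): PreparedMemory[] {
  const prepared: PreparedMemory[] = [];
  for (const message of messages) {
    const memory = prepareMemory(message.content, sanitize);
    if (memory !== null) {
      prepared.push(memory);
    }
  }
  return prepared;
}
```

Because callers only ever see `PreparedMemory.text`, the vector and the persisted text cannot diverge in the way described for memory_store.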

76159482 wants to merge 1 commit into CortexReach:main from 76159482:feat/sanitize-transport-metadata
