feat: improve FTS5 search foundations by BYK · Pull Request #46 · BYK/opencode-lore

BYK · 2026-03-22T14:24:59Z

Phase 1 of search improvements

Fixes the FTS5 search foundations as the first step toward a comprehensive search overhaul.

Changes

New: src/search.ts — centralized search module

ftsQuery() — AND-based FTS5 query builder with stopword + single-char filtering
ftsQueryOr() — OR-based variant for fallback when AND returns nothing
STOPWORDS — conservative set (only genuinely content-free words, preserves domain terms like handle, state, type)
EMPTY_QUERY sentinel for all-stopword queries
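
A minimal sketch of how the two query builders described above could fit together. The stopword entries, the ASCII-only tokenizer, the sentinel value, and the helper names `tokenize` and `build` are assumptions for illustration; the actual contents of `src/search.ts` are not shown in this description.

```typescript
// Hypothetical stopword list; the real STOPWORDS set is not reproduced in the PR text.
const STOPWORDS = new Set(["a", "an", "the", "is", "are", "of", "to", "in", "it", "what", "how"]);

// Sentinel for queries with no searchable terms (the concrete value is an assumption).
const EMPTY_QUERY = "__empty__";

// ASCII-only tokenizer, a simplification: split on non-alphanumerics,
// then drop single-char tokens (e.g. the "s" left by "what's") and stopwords.
function tokenize(raw: string): string[] {
  return raw
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 1 && !STOPWORDS.has(t));
}

// Quote each term so FTS5 treats it as a literal string, then join with the operator.
function build(raw: string, op: "AND" | "OR"): string {
  const terms = tokenize(raw);
  if (terms.length === 0) return EMPTY_QUERY;
  return terms.map((t) => `"${t}"`).join(` ${op} `);
}

function ftsQuery(raw: string): string {
  return build(raw, "AND");
}

function ftsQueryOr(raw: string): string {
  return build(raw, "OR");
}
```

Under these assumptions, `ftsQuery("What's the handle state?")` keeps only the domain terms `handle` and `state`, and a query made entirely of stopwords yields the sentinel.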

Fixed: Knowledge search ranking

Was: ORDER BY updated_at DESC (most recently edited wins regardless of relevance)
Now: ORDER BY bm25(knowledge_fts, 6.0, 2.0, 3.0) (column weights: title 6.0, content 2.0, category 3.0)
Uses JOIN pattern instead of subquery for proper rank access
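
For illustration, the ranking change might correspond to a query shaped like the one below. Only `knowledge_fts` and the `bm25()` weights come from this PR; the base table name `knowledge`, its columns, the rowid join, and the `score` alias are assumptions.

```typescript
// Sketch of the JOIN pattern: the FTS table drives the MATCH and the ranking,
// while the (assumed) base table `knowledge` supplies the row data.
const KNOWLEDGE_SEARCH_SQL = `
  SELECT k.id, k.title, k.category,
         bm25(knowledge_fts, 6.0, 2.0, 3.0) AS score
  FROM knowledge_fts
  JOIN knowledge AS k ON k.rowid = knowledge_fts.rowid
  WHERE knowledge_fts MATCH ?
  ORDER BY score  -- FTS5 bm25() is lower-is-better, so ascending puts the best match first
  LIMIT ?
`;
```

Because FTS5 reports better matches as numerically lower `bm25()` values, the default ascending order is the right one here; adding `DESC` would invert the ranking.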

New: distillation_fts table (schema migration v7)

FTS5 on observations column with porter unicode61 tokenizer
Replaces LIKE-based distillation search with BM25-ranked FTS5 search
Backfills existing data, sync triggers for INSERT/UPDATE/DELETE
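
A rough sketch of what a migration like this could contain, written as a list of SQL statements. It assumes the base table is named `distillations` and that `distillation_fts` is an external-content FTS5 table keyed on rowid; the trigger names are hypothetical. Only the FTS table name, the `observations` column, and the tokenizer come from this PR.

```typescript
// Hypothetical migration statements, executed in order by the schema migrator.
const MIGRATION_V7: string[] = [
  // FTS5 index over the observations column, stemmed with porter on top of unicode61.
  `CREATE VIRTUAL TABLE distillation_fts USING fts5(
     observations,
     content='distillations',
     content_rowid='rowid',
     tokenize='porter unicode61'
   )`,
  // Backfill rows that existed before the migration.
  `INSERT INTO distillation_fts(rowid, observations)
     SELECT rowid, observations FROM distillations`,
  // Keep the index in sync on INSERT.
  `CREATE TRIGGER distillations_ai AFTER INSERT ON distillations BEGIN
     INSERT INTO distillation_fts(rowid, observations)
       VALUES (new.rowid, new.observations);
   END`,
  // On DELETE, external-content tables need the special 'delete' command.
  `CREATE TRIGGER distillations_ad AFTER DELETE ON distillations BEGIN
     INSERT INTO distillation_fts(distillation_fts, rowid, observations)
       VALUES ('delete', old.rowid, old.observations);
   END`,
  // On UPDATE, remove the old entry and index the new one.
  `CREATE TRIGGER distillations_au AFTER UPDATE ON distillations BEGIN
     INSERT INTO distillation_fts(distillation_fts, rowid, observations)
       VALUES ('delete', old.rowid, old.observations);
     INSERT INTO distillation_fts(rowid, observations)
       VALUES (new.rowid, new.observations);
   END`,
];
```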

Improved: AND→OR fallback pattern

All search functions try AND first (precision), fall back to OR when nothing matches (recall)
Blanket OR was tested empirically and rejected — adds noise even with stopwords
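
The fallback pattern can be sketched as a small generic helper. The name `searchWithFallback` and the `run` callback are hypothetical; `run` stands in for whatever executes a MATCH query against an FTS5 table.

```typescript
// Try the strict AND query first; only widen to OR when it finds nothing.
function searchWithFallback<T>(
  andQuery: string,
  orQuery: string,
  run: (match: string) => T[],
): T[] {
  const strict = run(andQuery);
  if (strict.length > 0) return strict;
  // Single-term queries produce identical AND/OR strings, so skip the second lookup.
  if (orQuery === andQuery) return strict;
  return run(orQuery);
}
```

Running OR only on an empty AND result keeps precision for queries that already match, which is consistent with the finding above that blanket OR adds noise.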

New: "Too vague" handling

When query is all stopwords/single-chars, recall tool returns guidance message instead of empty results
Prompts the LLM to reformulate with specific keywords
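
One way the recall tool could branch on the sentinel is sketched below. The `RecallOutcome` type, the `recall` signature, and the message wording are hypothetical; the PR only states that a guidance message replaces empty results.

```typescript
type RecallOutcome =
  | { kind: "results"; items: string[] }
  | { kind: "too_vague"; message: string };

// `matchQuery` is the output of the query builder; `emptySentinel` is its EMPTY_QUERY value.
function recall(
  matchQuery: string,
  emptySentinel: string,
  search: (match: string) => string[],
): RecallOutcome {
  if (matchQuery === emptySentinel) {
    // Never hit the database with a query that has no searchable terms.
    return {
      kind: "too_vague",
      message: "Query is too vague. Retry with specific keywords such as names, identifiers, or file paths.",
    };
  }
  return { kind: "results", items: search(matchQuery) };
}
```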

Test coverage

23 new tests in test/search.test.ts (query building, stopwords, edge cases)
New BM25 ranking test + AND→OR fallback test in test/ltm.test.ts
Schema v7 + distillation_fts verification in test/db.test.ts
Updated temporal.test.ts for new import path + behavior

- Create src/search.ts with centralized ftsQuery/ftsQueryOr functions
- Add stopword filtering (conservative list, preserves domain terms)
- Drop single-char tokens (contraction artifacts) but keep 2-char+ terms
- Implement AND-then-OR fallback: AND first for precision, OR when AND returns nothing
- Fix knowledge search to use BM25 rank instead of updated_at DESC
  - Uses bm25() with column weights: title=6.0, content=2.0, category=3.0
  - JOIN pattern instead of subquery for proper rank access
- Add distillation_fts table (schema migration v7)
  - FTS5 on observations column with porter unicode61 tokenizer
  - Backfill existing data, sync triggers for INSERT/UPDATE/DELETE
- Replace LIKE-based distillation search with FTS5 ranked search
- Add 'too vague' handling in recall tool for all-stopword queries
- Remove ftsQuery from temporal.ts (now in search.ts, no re-export)

## Phase 2 of search improvements (depends on #46)

Adds cross-source score fusion using Reciprocal Rank Fusion and rewrites the recall tool to produce a single ranked result list.

### Changes

**New in `src/search.ts`**

- `reciprocalRankFusion<T>()`: merges multiple ranked lists using RRF (k=60, Cormack et al. 2009). Rank-based, not score-based, so magnitude differences across FTS tables don't matter.
- `normalizeRank()`: min-max normalization of FTS5 BM25 ranks to 0–1 (for display only)

**New scored search variants**

- `ltm.searchScored()`: returns `KnowledgeEntry & { rank }` with BM25 scores via `bm25(knowledge_fts, 6, 2, 3)`
- `temporal.searchScored()`: returns `TemporalMessage & { rank }`
- `searchDistillationsScored()`: returns `Distillation & { rank }`

All scored variants include AND→OR fallback (same as Phase 1 search functions).

**Rewritten recall tool**

- Runs all 3 scored searches, tags results with source type
- Fuses via RRF into a single ranked list
- Output format: source-annotated list (`[knowledge/category]`, `[distilled]`, `[temporal/role]`)
- Most relevant results appear first regardless of which source they came from

### Test coverage

- 11 new tests for `normalizeRank()` and `reciprocalRankFusion()`
- Tests cover: multi-list merge, dedup, empty lists, single list, custom k, score correctness
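
A minimal sketch of Reciprocal Rank Fusion and min-max rank normalization as summarized in the Phase 2 description. The signatures are assumptions (in particular the `key` callback used for dedup), and the direction of `normalizeRank` assumes that the best (lowest) BM25 value should map to 1; the PR does not state either detail.

```typescript
// Each item scores sum(1 / (k + rank)) over the lists it appears in, with 1-based ranks.
function reciprocalRankFusion<T>(
  lists: T[][],
  key: (item: T) => string,
  k = 60,
): Array<{ item: T; score: number }> {
  const fused = new Map<string, { item: T; score: number }>();
  for (const list of lists) {
    list.forEach((item, index) => {
      const id = key(item);
      const contribution = 1 / (k + index + 1);
      const existing = fused.get(id);
      if (existing) existing.score += contribution;
      else fused.set(id, { item, score: contribution });
    });
  }
  // Highest fused score first.
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}

// Min-max normalization to 0..1. FTS5 bm25() is lower-is-better,
// so (by assumption) the minimum maps to 1 and the maximum to 0.
function normalizeRank(ranks: number[]): number[] {
  if (ranks.length === 0) return [];
  const min = Math.min(...ranks);
  const max = Math.max(...ranks);
  if (max === min) return ranks.map(() => 1);
  return ranks.map((r) => (max - r) / (max - min));
}
```

Only positions enter the fused score, so a BM25 value from one FTS table never has to be compared numerically with a value from another.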

…embedding BLOBs (#52)

## Problem

Both transform hooks in `src/index.ts`, `experimental.chat.system.transform` and `experimental.chat.messages.transform`, had no try-catch wrapping. Any SQLite error (corruption, busy timeout, schema mismatch) propagated through OpenCode's Plugin.trigger mechanism and surfaced as a 500 "Internal server error", halting the user's session.

Additionally, after adding the `embedding BLOB` column (schema v8), all `SELECT *` queries in `ltm.ts` were unnecessarily loading 4KB of Float32Array data per knowledge entry (~200KB per `forSession()` call) that was immediately discarded.

## Investigation: Embedding/vector search link

The embedding/vector code is **not in the transform hook call path**: `forSession()` uses only FTS5 BM25, not embeddings. The 500 errors were a latent bug (unprotected hooks) that predated the embedding feature. The temporal correlation with the Voyage AI rollout was coincidental; it coincided with the search overhaul (PRs #46-#50).

## Changes

### Error handling (`src/index.ts`, `src/gradient.ts`)

- **system.transform**: Wrap knowledge injection in try-catch. On error: log via `log.error()`, reset `setLtmTokens(0)`, push fallback note directing LLM to use recall tool. Track degraded sessions to avoid busting the provider's read-token cache on recovery: if conversation is longer than LTM content, keep fallback note.
- **messages.transform**: Wrap entire transform path in try-catch. On error: log and leave `output.messages` unmodified (layer 0 passthrough).
- Export `getLastTransformEstimate()` from gradient.ts for the cache trade-off calculation.

### Performance (`src/ltm.ts`)

- Define `KNOWLEDGE_COLS` / `KNOWLEDGE_COLS_K` constants listing exactly the 11 columns in `KnowledgeEntry`, excluding `embedding`.
- Replace all 10 `SELECT *` / `SELECT k.*` queries across 8 functions.

### Tests (`test/index.test.ts`)

4 new tests:

1. system.transform survives DB error → fallback note + `getLtmTokens() === 0`
2. messages.transform survives DB error → messages unchanged
3. LTM recovery skipped on long session (preserves prompt cache)
4. LTM recovery proceeds on short session (cheap cache bust)
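
The passthrough behaviour described for `messages.transform` could look roughly like the helper below. `safeTransform`, the `Message` shape, and the `logError` callback are hypothetical stand-ins, not the plugin's actual hook signature.

```typescript
type Message = { role: string; content: string };

// Run a transform, but fall back to the untouched input if it throws,
// so a database error degrades the session instead of halting it.
function safeTransform(
  messages: Message[],
  transform: (input: Message[]) => Message[],
  logError: (err: unknown) => void,
): Message[] {
  try {
    return transform(messages);
  } catch (err) {
    logError(err);
    return messages; // passthrough: messages are left unmodified
  }
}
```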

BYK enabled auto-merge (squash) March 22, 2026 14:25

BYK merged commit 60cfc76 into main Mar 22, 2026
1 check passed

BYK deleted the feat/fts5-foundations branch March 22, 2026 14:25

BYK mentioned this pull request Mar 22, 2026

feat: add RRF score fusion and rewrite recall tool #47

Merged

BYK mentioned this pull request Mar 24, 2026

fix: catch unhandled exceptions in transform hooks and avoid loading embedding BLOBs #52

Merged

craft-deployer bot mentioned this pull request Mar 24, 2026

publish: BYK/opencode-lore@0.7.0 #53

Closed

2 tasks

BYK merged 1 commit into main from feat/fts5-foundations
