perf: WAND and BM25 read optimizations #6214

Draft
esteban wants to merge 2 commits into lance-format:main from esteban:perf/wand-bm25-optimizations

@esteban
Contributor

@esteban esteban commented Mar 17, 2026

Optimize the FTS query hot path for lower CPU usage:

  • Inline BM25 scoring into WAND inner loop, pre-sort postings by df
  • Phase-split scheduler: async I/O loading then rayon CPU compute
  • Dedicated rayon thread pool sized to Lance CPU budget
  • Partition-level stats cache and per-token doc_freq cache
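To make the first bullet concrete, here is a minimal sketch of what "inlining" BM25 into a hot loop can look like: the per-term constants are folded once per query term so that each posting costs one multiply-add and one divide. This is not the code from this PR. `TermScorer`, `order_by_doc_freq`, the Lucene-style IDF variant, the `k1`/`b` defaults, and the ascending sort direction are all assumptions made for illustration.

```rust
// Common BM25 defaults; assumed here, not taken from the PR.
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Hypothetical per-term scorer with all query-time constants precomputed.
pub struct TermScorer {
    /// idf * (k1 + 1), folded into one factor.
    idf_k1: f32,
    /// k1 * (1 - b), the length-independent part of the denominator.
    norm_base: f32,
    /// k1 * b / avg_doc_len, multiplied by the document length per posting.
    norm_slope: f32,
}

impl TermScorer {
    pub fn new(num_docs: usize, doc_freq: usize, avg_doc_len: f32) -> Self {
        let n = num_docs as f32;
        let df = doc_freq as f32;
        // Non-negative IDF variant (as used by Lucene); an assumption here.
        let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
        Self {
            idf_k1: idf * (K1 + 1.0),
            norm_base: K1 * (1.0 - B),
            norm_slope: K1 * B / avg_doc_len,
        }
    }

    /// Scores a single posting; small enough to inline into the caller's loop.
    #[inline]
    pub fn score(&self, term_freq: u32, doc_len: u32) -> f32 {
        let tf = term_freq as f32;
        self.idf_k1 * tf / (tf + self.norm_base + self.norm_slope * doc_len as f32)
    }
}

/// Returns term indices ordered by ascending document frequency, so the
/// rarest (highest-IDF) terms come first. The direction is an assumption.
pub fn order_by_doc_freq(doc_freqs: &[usize]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..doc_freqs.len()).collect();
    order.sort_by_key(|&i| doc_freqs[i]);
    order
}
```

In a sketch like this, the scorer is built once per (term, partition) pair and `score` is called per candidate document, which is where precomputing the constants is meant to pay off.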

Benchmarking is WIP.

Esteban Gutierrez and others added 2 commits March 16, 2026 22:39

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@github-actions
Contributor

PR Review: perf: WAND and BM25 read optimizations

The overall approach is sound — phase-splitting async I/O from CPU compute, caching partition stats, pre-extracting block metadata, and avoiding redundant decompression via cached_doc_id are all good optimizations.

P1: Unbounded token_doc_freq_cache growth

token_doc_freq_cache (Mutex<HashMap<String, usize>>) grows without bound over the lifetime of the Arc<InvertedPartition>. Since partitions are long-lived (shared via Arc), every unique query token ever seen accumulates in this cache with no eviction. For services handling diverse FTS queries, this is a slow memory leak.

Consider either:

  • A bounded LRU cache (e.g., lru::LruCache or a simple size check before insert)
  • Or, since the cache is mainly valuable within a single query's scoring, scope it to IndexBM25Scorer instead of the partition
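As a rough illustration of the "simple size check before insert" option, the sketch below caps the number of cached tokens and simply stops caching once the cap is reached. It is not an LRU and not the PR's code. `BoundedDocFreqCache`, its methods, and the skip-when-full policy are hypothetical names and choices made for this example.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical bounded replacement for a `Mutex<HashMap<String, usize>>` cache.
pub struct BoundedDocFreqCache {
    capacity: usize,
    entries: Mutex<HashMap<String, usize>>,
}

impl BoundedDocFreqCache {
    pub fn new(capacity: usize) -> Self {
        Self {
            capacity,
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the cached doc frequency for `token`, or computes it.
    /// The value is stored only while the cache is below capacity, so
    /// memory use is bounded no matter how many distinct tokens are queried.
    pub fn get_or_insert_with(&self, token: &str, compute: impl FnOnce() -> usize) -> usize {
        // The lock guard is a temporary and is released at the end of this statement.
        let cached = self.entries.lock().unwrap().get(token).copied();
        if let Some(df) = cached {
            return df;
        }
        // Compute outside the lock so a slow lookup does not block other readers.
        let df = compute();
        let mut entries = self.entries.lock().unwrap();
        if entries.len() < self.capacity {
            entries.insert(token.to_owned(), df);
        }
        df
    }

    /// Number of tokens currently cached.
    pub fn len(&self) -> usize {
        self.entries.lock().unwrap().len()
    }
}
```

The trade-off of this policy is that tokens seen after the cap is reached are never cached; an LRU would keep the cache useful for shifting query mixes at the cost of extra bookkeeping on each hit.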

P1: No benchmark results

This is a perf-only PR but the description says "Benchmarking is WIP." The changes are non-trivial (new thread pool, phase splitting, caching, Ord rewrite) and could regress edge cases. Please include before/after numbers before merging — at minimum for single-partition and multi-partition FTS queries.

Minor observations (not blocking)

  • move_preceding (line 762) still uses .doc().is_none() instead of .empty() — inconsistent with the other changes and a missed optimization on the same hot path.
  • The Clone impl for InvertedPartition resets all caches. This is fine since the hot path clones Arc, but it's worth a doc comment noting that cloning the inner struct intentionally drops caches.
  • The check_pivot_aligned bubble_up removal looks correct — all postings 0..=pivot share pivot_doc so sort order is preserved, and move_preceding re-sorts after scoring.
  • cached_doc_id consistency looks correct across all update sites in next() for both compressed and plain paths.
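For the `Clone` observation above, a doc comment along the following lines would record the intent. This is a self-contained sketch on a stand-in type, not the real `InvertedPartition`: the `Partition` struct, its fields, and the helper methods are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a partition with shared data and a private cache.
pub struct Partition {
    postings: Arc<Vec<u32>>,
    token_doc_freq_cache: Mutex<HashMap<String, usize>>,
}

impl Partition {
    pub fn new(postings: Vec<u32>) -> Self {
        Self {
            postings: Arc::new(postings),
            token_doc_freq_cache: Mutex::new(HashMap::new()),
        }
    }

    /// Records a doc frequency for `token` in this instance's cache.
    pub fn remember(&self, token: &str, doc_freq: usize) {
        self.token_doc_freq_cache
            .lock()
            .unwrap()
            .insert(token.to_owned(), doc_freq);
    }

    pub fn cached_tokens(&self) -> usize {
        self.token_doc_freq_cache.lock().unwrap().len()
    }

    pub fn num_postings(&self) -> usize {
        self.postings.len()
    }
}

impl Clone for Partition {
    /// Cloning shares the immutable postings but intentionally starts with
    /// empty caches. Callers that want to keep warm caches should clone the
    /// surrounding `Arc<Partition>` instead of the inner struct.
    fn clone(&self) -> Self {
        Self {
            postings: Arc::clone(&self.postings),
            token_doc_freq_cache: Mutex::new(HashMap::new()),
        }
    }
}
```

With this shape, `Arc::clone` on a shared partition keeps the warm cache, while cloning the inner struct yields a cold copy over the same postings.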

🤖 Generated with Claude Code

@codecov

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 75.12438% with 50 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar/inverted/index.rs | 70.58% | 38 Missing and 2 partials ⚠️ |
| rust/lance-index/src/scalar/inverted/wand.rs | 80.00% | 9 Missing ⚠️ |
| rust/lance-index/src/scalar/inverted/scorer.rs | 95.00% | 0 Missing and 1 partial ⚠️ |