perf: add benchmark for distributed vector merge finalization#6176
Conversation
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, inspect the "PR Title Check" action.
PR Review: Clean PR — benchmark is well-structured and the metric addition is minimal and well-tested.

One minor issue: potential division by zero in `partial_aux_bytes_per_shard`: `fixture.partial_aux_bytes / fixture.partial_dir_count as u64`. If `fixture.partial_dir_count` is 0, this division will panic.

🟢 LGTM with the optional nit above.
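A minimal sketch of the guard the review bot is suggesting, using `checked_div` to avoid the panic when the shard count is zero. The `Fixture` struct and field types here are assumptions reconstructed from the snippet in the comment, not the actual benchmark code:

```rust
// Hypothetical stand-in for the benchmark fixture; field names mirror the
// snippet quoted in the review comment, types are assumed.
struct Fixture {
    partial_aux_bytes: u64,
    partial_dir_count: usize,
}

// Guarded version of the flagged expression: checked_div returns None when
// the divisor is 0, so we fall back to 0 instead of panicking.
fn partial_aux_bytes_per_shard(fixture: &Fixture) -> u64 {
    fixture
        .partial_aux_bytes
        .checked_div(fixture.partial_dir_count as u64)
        .unwrap_or(0)
}

fn main() {
    let empty = Fixture { partial_aux_bytes: 1024, partial_dir_count: 0 };
    assert_eq!(partial_aux_bytes_per_shard(&empty), 0); // no panic on zero shards

    let eight = Fixture { partial_aux_bytes: 1024, partial_dir_count: 8 };
    assert_eq!(partial_aux_bytes_per_shard(&eight), 128);
}
```

Whether returning 0 is the right fallback (versus skipping the metric entirely) depends on how the benchmark reports it; the point is only that the raw `/` panics on a zero divisor for integer types.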
Codecov Report: ✅ All modified and coverable lines are covered by tests.
```rust
    dataset
}

async fn train_shared_ivf_pq(
```
I believe we already have such a helper function; maybe we just need to change its visibility.
```rust
index_metrics: IndexMetrics::new(metrics, partition),
partitions_ranked: metrics.new_count(PARTITIONS_RANKED_METRIC, partition),
deltas_searched: metrics.new_count(DELTAS_SEARCHED_METRIC, partition),
find_partitions_calls: metrics.new_count(FIND_PARTITIONS_CALLS_METRIC, partition),
```
How about changing this to FIND_PARTITIONS_ELAPSED_METRIC? We may introduce optimizations for find_partitions in the future, and the number of calls alone can't capture that.
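The reviewer's point is that elapsed time survives optimizations that change per-call cost, while a call counter would report the same value before and after. A minimal sketch of recording both, with a hypothetical metric struct and a stubbed `find_partitions` (the real metrics API in this codebase will differ):

```rust
use std::time::Instant;

// Hypothetical recorder; the codebase's `metrics.new_count(...)` API differs.
#[derive(Default)]
struct ElapsedMetric {
    total_micros: u128,
    calls: u64,
}

impl ElapsedMetric {
    fn record(&mut self, micros: u128) {
        self.total_micros += micros;
        self.calls += 1;
    }
}

// Stand-in for the real partition-ranking work: returns nprobe partition ids.
fn find_partitions(_query: &[f32], nprobe: usize) -> Vec<usize> {
    (0..nprobe).collect()
}

fn main() {
    let mut elapsed = ElapsedMetric::default();

    let start = Instant::now();
    let parts = find_partitions(&[0.0; 128], 16);
    elapsed.record(start.elapsed().as_micros());

    assert_eq!(parts.len(), 16);
    assert_eq!(elapsed.calls, 1);
    // If find_partitions later gets faster, total_micros drops while the
    // call count stays the same -- which is exactly the reviewer's argument.
}
```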
This adds a dedicated benchmark for distributed vector index finalization and a small query-side metric to count `find_partitions` calls. Together they give us a baseline for analyzing the current single-node merge bottleneck and for evaluating future segmented-index work.

As context, the new `distributed_merge_only_ivf_pq` benchmark already shows that finalize cost grows much faster than input bytes as shard count and partition count increase. In the local filesystem benchmark, the mean finalize time grows from about 64 ms at 8 shards / 256 partitions to about 2.87 s at 128 shards / 1024 partitions.

Based on this benchmark, I noticed that our current logic performs poorly as the number of shards increases.
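A quick arithmetic check of the reported numbers makes the "grows much faster than input" claim concrete: shards grow 16x (8 to 128) and partitions 4x (256 to 1024), while finalize time grows roughly 45x (64 ms to 2.87 s), i.e. super-linear in shard count alone:

```rust
fn main() {
    let shard_growth = 128.0 / 8.0; // 16x more shards
    let partition_growth = 1024.0 / 256.0; // 4x more partitions
    let time_growth = 2.87 / 0.064; // ~44.8x longer finalize

    assert_eq!(shard_growth, 16.0);
    assert_eq!(partition_growth, 4.0);
    // Finalize time grows much faster than shard count by itself.
    assert!(time_growth / shard_growth > 2.0);

    println!(
        "time growth {:.1}x vs shard growth {:.0}x",
        time_growth, shard_growth
    );
}
```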