
perf: add benchmark for distributed vector merge finalization #6176

Open
Xuanwo wants to merge 2 commits into main from xuanwo/distributed-merge-benchmark

Conversation

@Xuanwo (Collaborator) commented Mar 12, 2026

This adds a dedicated benchmark for distributed vector index finalization and a small query-side metric to count find_partitions calls. Together they give us a baseline for analyzing the current single-node merge bottleneck and for evaluating future segmented-index work.

As context, the new distributed_merge_only_ivf_pq benchmark already shows that finalize cost grows much faster than input bytes as shard count and partition count increase. In the local filesystem benchmark, the mean finalize time grows from about 64 ms at 8 shards / 256 partitions to about 2.87 s at 128 shards / 1024 partitions.


Based on this benchmark, I noticed that our current logic performs poorly as the number of shards increases.

[Figure: distributed merge finalize-time trend, 2026-03-12 (matplotlib)]

@github-actions (Contributor)

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@github-actions (Contributor)

PR Review

Clean PR — benchmark is well-structured and the metric addition is minimal and well-tested.

One minor issue

Potential division by zero in write_case_metadata (line ~338 of the benchmark):

partial_aux_bytes_per_shard: fixture.partial_aux_bytes / fixture.partial_dir_count as u64,

If partial_dir_count is 0 (e.g., fixture building failed silently or was skipped), this will panic. Consider using .checked_div().unwrap_or(0) or guarding against it. Low severity since this is benchmark-only code, but it could make debugging confusing if a fixture doesn't build properly.
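A minimal standalone sketch of the suggested guard. The field names come from the snippet above; the concrete values here are hypothetical stand-ins, not taken from the benchmark:

```rust
fn main() {
    // Hypothetical stand-ins for fixture.partial_aux_bytes / fixture.partial_dir_count.
    let partial_aux_bytes: u64 = 4096;
    let partial_dir_count: u64 = 0; // e.g. the fixture failed to build or was skipped

    // `partial_aux_bytes / partial_dir_count` would panic here with
    // "attempt to divide by zero". checked_div returns None for a zero
    // divisor, so unwrap_or(0) makes the computation total.
    let per_shard = partial_aux_bytes.checked_div(partial_dir_count).unwrap_or(0);

    println!("partial_aux_bytes_per_shard={per_shard}");
}
```

With a nonzero divisor, `checked_div` behaves like plain integer division, so the guard only changes the failure case.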

Looks good

  • The find_partitions_calls metric is a clean addition with proper test coverage across all relevant test cases.
  • Benchmark design with fixture caching and iter_batched for isolation is solid.
  • No concerns with dependencies — all are already in the workspace.

🟢 LGTM with the optional nit above.

@Xuanwo changed the title from "Add benchmark for distributed vector merge finalization" to "perf: add benchmark for distributed vector merge finalization" on Mar 12, 2026
@codecov codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@BubbleCal (Contributor) left a comment


LGTM

dataset
}

async fn train_shared_ivf_pq(

I believe we already have such a helper function; maybe we just need to change its visibility.

index_metrics: IndexMetrics::new(metrics, partition),
partitions_ranked: metrics.new_count(PARTITIONS_RANKED_METRIC, partition),
deltas_searched: metrics.new_count(DELTAS_SEARCHED_METRIC, partition),
find_partitions_calls: metrics.new_count(FIND_PARTITIONS_CALLS_METRIC, partition),

How about changing this to FIND_PARTITIONS_ELAPSED_METRIC?
We may introduce optimizations for find_partitions in the future, and the number of calls alone can't demonstrate their effect.
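The idea above can be sketched in isolation. This is not the crate's metrics API: the `AtomicU64` counter and the function names are hypothetical stand-ins used only to show an elapsed-time metric in place of a call count:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

// Stand-in for a metrics registry entry: accumulates microseconds
// spent inside find_partitions across all calls.
static FIND_PARTITIONS_ELAPSED_US: AtomicU64 = AtomicU64::new(0);

// Hypothetical stand-in for the real partition search.
fn find_partitions_stub() {
    std::thread::sleep(Duration::from_millis(1));
}

fn timed_find_partitions() {
    let start = Instant::now();
    find_partitions_stub();
    FIND_PARTITIONS_ELAPSED_US
        .fetch_add(start.elapsed().as_micros() as u64, Ordering::Relaxed);
}

fn main() {
    timed_find_partitions();
    timed_find_partitions();
    let total_us = FIND_PARTITIONS_ELAPSED_US.load(Ordering::Relaxed);
    // Two ~1 ms calls: the metric reflects time spent, so a future
    // speedup to find_partitions would show up even with the same call count.
    assert!(total_us >= 2_000);
    println!("find_partitions elapsed: {total_us} us");
}
```

A timing metric also subsumes the call count if needed, since a counter can be incremented alongside the elapsed accumulation.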

