perf: add benchmark for distributed vector merge finalization#6176
Conversation
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, inspect the "PR Title Check" action.
PR Review: Clean PR — benchmark is well-structured and the metric addition is minimal and well-tested.

One minor issue: potential division by zero in `partial_aux_bytes_per_shard`: `fixture.partial_aux_bytes / fixture.partial_dir_count as u64`. If `fixture.partial_dir_count` is 0, this division will panic.

🟢 LGTM with the optional nit above.
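A minimal sketch of the guard the review bot is suggesting, using `checked_div` to avoid the panic when the shard count is zero. The `Fixture` struct and field types here are assumptions reconstructed from the snippet in the comment, not the actual benchmark code:

```rust
// Hypothetical stand-in for the benchmark fixture; field names mirror the
// snippet quoted in the review comment, types are assumed.
struct Fixture {
    partial_aux_bytes: u64,
    partial_dir_count: usize,
}

// Guarded version of the flagged expression: checked_div returns None when
// the divisor is 0, so we fall back to 0 instead of panicking.
fn partial_aux_bytes_per_shard(fixture: &Fixture) -> u64 {
    fixture
        .partial_aux_bytes
        .checked_div(fixture.partial_dir_count as u64)
        .unwrap_or(0)
}

fn main() {
    let empty = Fixture { partial_aux_bytes: 1024, partial_dir_count: 0 };
    assert_eq!(partial_aux_bytes_per_shard(&empty), 0); // no panic on zero shards

    let eight = Fixture { partial_aux_bytes: 1024, partial_dir_count: 8 };
    assert_eq!(partial_aux_bytes_per_shard(&eight), 128);
}
```

Whether returning 0 is the right fallback (versus skipping the metric entirely) depends on how the benchmark reports it; the point is only that the raw `/` panics on a zero divisor for integer types.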
Codecov Report: ✅ All modified and coverable lines are covered by tests.
```rust
    dataset
}

async fn train_shared_ivf_pq(
```
I believe we already have such a helper function; maybe we just need to change its visibility.
```rust
index_metrics: IndexMetrics::new(metrics, partition),
partitions_ranked: metrics.new_count(PARTITIONS_RANKED_METRIC, partition),
deltas_searched: metrics.new_count(DELTAS_SEARCHED_METRIC, partition),
find_partitions_calls: metrics.new_count(FIND_PARTITIONS_CALLS_METRIC, partition),
```
How about changing this to FIND_PARTITIONS_ELAPSED_METRIC? We may introduce optimizations for find_partitions in the future, and the number of calls alone can't capture that.
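The reviewer's point is that elapsed time survives optimizations that change per-call cost, while a call counter would report the same value before and after. A minimal sketch of recording both, with a hypothetical metric struct and a stubbed `find_partitions` (the real metrics API in this codebase will differ):

```rust
use std::time::Instant;

// Hypothetical recorder; the codebase's `metrics.new_count(...)` API differs.
#[derive(Default)]
struct ElapsedMetric {
    total_micros: u128,
    calls: u64,
}

impl ElapsedMetric {
    fn record(&mut self, micros: u128) {
        self.total_micros += micros;
        self.calls += 1;
    }
}

// Stand-in for the real partition-ranking work: returns nprobe partition ids.
fn find_partitions(_query: &[f32], nprobe: usize) -> Vec<usize> {
    (0..nprobe).collect()
}

fn main() {
    let mut elapsed = ElapsedMetric::default();

    let start = Instant::now();
    let parts = find_partitions(&[0.0; 128], 16);
    elapsed.record(start.elapsed().as_micros());

    assert_eq!(parts.len(), 16);
    assert_eq!(elapsed.calls, 1);
    // If find_partitions later gets faster, total_micros drops while the
    // call count stays the same -- which is exactly the reviewer's argument.
}
```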
This adds a dedicated benchmark for distributed vector index finalization and a small query-side metric to count `find_partitions` calls. Together they give us a baseline for analyzing the current single-node merge bottleneck and for evaluating future segmented-index work.

As context, the new `distributed_merge_only_ivf_pq` benchmark already shows that finalize cost grows much faster than input bytes as shard count and partition count increase. In the local filesystem benchmark, the mean finalize time grows from about 64 ms at 8 shards / 256 partitions to about 2.87 s at 128 shards / 1024 partitions.

Based on this benchmark, I noticed that our current logic performs poorly as the number of shards increases.
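A quick arithmetic check of the reported numbers makes the "grows much faster than input" claim concrete: shards grow 16x (8 to 128) and partitions 4x (256 to 1024), while finalize time grows roughly 45x (64 ms to 2.87 s), i.e. super-linear in shard count alone:

```rust
fn main() {
    let shard_growth = 128.0 / 8.0; // 16x more shards
    let partition_growth = 1024.0 / 256.0; // 4x more partitions
    let time_growth = 2.87 / 0.064; // ~44.8x longer finalize

    assert_eq!(shard_growth, 16.0);
    assert_eq!(partition_growth, 4.0);
    // Finalize time grows much faster than shard count by itself.
    assert!(time_growth / shard_growth > 2.0);

    println!(
        "time growth {:.1}x vs shard growth {:.0}x",
        time_growth, shard_growth
    );
}
```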