Skip to content

feat!: add progress monitoring via callbacks for distributed merge#6210

Open
vivek-bharathan wants to merge 2 commits intolance-format:mainfrom
vivek-bharathan:feat/progressfordistributedmerge
Open

feat!: add progress monitoring via callbacks for distributed merge#6210
vivek-bharathan wants to merge 2 commits intolance-format:mainfrom
vivek-bharathan:feat/progressfordistributedmerge

Conversation

@vivek-bharathan
Copy link
Contributor

No description provided.

@github-actions
Copy link
Contributor

PR Review

Bug (P0): Inconsistent Mutex type in inverted index test

In rust/lance-index/src/scalar/inverted/builder.rs, the RecordingProgress test helper uses tokio::sync::Mutex (with .lock().await), while the equivalent helpers in btree.rs and index_merger.rs use std::sync::Mutex (with .lock().unwrap()). This will compile (since tokio::sync::Mutex<T: Send> is Sync), but it's needlessly inconsistent and introduces an unnecessary async lock for a simple test helper. Should use std::sync::Mutex for consistency.

Magic number (P1): * 6 in inverted index rename total

In rust/lance-index/src/scalar/inverted/builder.rs, the total renames is calculated as:

let total_renames = id_mapping
    .iter()
    .filter(|(old_id, new_id)| old_id != new_id)
    .count() as u64
    * 6;

The 6 comes from 3 file suffixes × 2 rename phases, but this is fragile and not documented. If a file type is added/removed, this silently breaks the progress total. Consider deriving it from the actual constants, e.g.:

let num_suffixes = [TOKENS_FILE, INVERT_LIST_FILE, DOCS_FILE].len() as u64;
let num_phases = 2u64; // temp + final
let total_renames = changed_count * num_suffixes * num_phases;

Minor note: RecordingProgress duplicated 3 times

The same RecordingProgress test helper is copy-pasted in btree.rs, inverted/builder.rs, and index_merger.rs (plus ivf/v2.rs). Consider extracting it into lance-testing or a shared test utility module in lance-index to avoid divergence (as already evidenced by the Mutex inconsistency above).


Overall: the approach of threading Arc<dyn IndexBuildProgress> through the merge paths is clean and well-tested. The test coverage is good with stage ordering assertions. The issues above are the main things to address before merge.

@codecov
Copy link

codecov bot commented Mar 17, 2026

@vivek-bharathan vivek-bharathan force-pushed the feat/progressfordistributedmerge branch from 62b4466 to 4037180 Compare March 17, 2026 18:26
@vivek-bharathan vivek-bharathan marked this pull request as ready for review March 17, 2026 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant