Better handling of batching of search in MG replicated mode#1718
rapids-bot[bot] merged 3 commits into rapidsai:main
Conversation
I would expect CAGRA to be thread-safe. We expect to be able to search from multiple threads to achieve high throughput for concurrent small-batch queries. We should fix this in CAGRA.
tfeher
left a comment
The changes look good to me as a workaround. But we should find the root cause, please open an issue to track it.
@viclafargue can you also link the issue in a comment in the code before this is merged?
/merge
@divyegala @achirkin This is failing to merge because it is targeting
Force-pushed 7fda58b to ae8d0b8
This is a mistake. I re-targeted to the
This PR's base branch has been changed since the last
/merge
Answers #1720
In multi-GPU replicated mode, the search query is divided into batches, which are run in parallel with OpenMP. In some cases there may be more batches than available GPUs, which causes multiple threads to access the same GPU concurrently (a thread-safety issue, at least for CAGRA indices). This change solves the issue by giving each rank its own thread; that thread handles all of the rank's batches sequentially, so no GPU is ever accessed from more than one thread at a time.