Better handling of batching of search in MG replicated mode #1718

Merged
rapids-bot[bot] merged 3 commits into rapidsai:main from viclafargue:fix-mg-replicated-batching
Feb 11, 2026

Conversation

@viclafargue
Contributor

@viclafargue viclafargue commented Jan 21, 2026

Answers #1720

In multi-GPU replicated mode, the search queries are split into batches, and these batches are run in parallel with OpenMP. In some cases there may be more batches than available GPUs, which causes a thread-safety issue (at least for CAGRA indices) because several threads then access the same GPU concurrently. This change fixes the issue: each rank gets its own thread, and that thread handles all batches for that rank sequentially. This prevents concurrent access to the same GPU from multiple threads.
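
For illustration only, the scheduling described above could be sketched roughly as follows. This is not the code from this PR: the names `BatchRange` and `search_replicated` are hypothetical, the round-robin assignment of batches to ranks is an assumption, and the actual GPU search call is replaced by recording which query rows each rank would process.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of one processed batch: the query rows it covers.
struct BatchRange {
  int64_t offset;  // index of the first query row in the batch
  int64_t n_rows;  // number of query rows in the batch
};

// Returns, for each rank, the batches that rank handled, in processing order.
// Assumption: batches are assigned round-robin (batch b goes to rank b % n_ranks).
std::vector<std::vector<BatchRange>> search_replicated(int64_t n_queries,
                                                       int64_t batch_size,
                                                       int n_ranks)
{
  assert(n_queries >= 0 && batch_size > 0 && n_ranks > 0);
  const int64_t n_batches = (n_queries + batch_size - 1) / batch_size;
  std::vector<std::vector<BatchRange>> per_rank(n_ranks);

  // Parallelism is over ranks, not over batches: at most one thread per rank.
  // Without OpenMP enabled the pragma is ignored and the loop runs serially,
  // which yields the same result.
#pragma omp parallel for num_threads(n_ranks)
  for (int rank = 0; rank < n_ranks; ++rank) {
    // The thread owning this rank walks through its batches one after another,
    // so the rank's GPU is never used by two threads at the same time.
    for (int64_t b = rank; b < n_batches; b += n_ranks) {
      const int64_t offset = b * batch_size;
      const int64_t n_rows = std::min(batch_size, n_queries - offset);
      // A real implementation would run the search on this rank's GPU here.
      per_rank[rank].push_back({offset, n_rows});
    }
  }
  return per_rank;
}
```

With 10 queries, a batch size of 3, and 2 ranks, this sketch produces four batches: rank 0 handles the batches starting at rows 0 and 6, and rank 1 handles those starting at rows 3 and 9. Each thread writes only to its own `per_rank[rank]` entry, so no locking is needed in the sketch.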

@cjnolet cjnolet added bug Something isn't working non-breaking Introduces a non-breaking change labels Jan 21, 2026
@cjnolet cjnolet moved this from Todo to In Progress in Unstructured Data Processing Jan 21, 2026
@tfeher
Contributor

tfeher commented Jan 21, 2026

I would expect CAGRA to be thread safe. We expect that we can search using multiple threads to achieve large throughput for concurrent small batch size queries. We should fix this in CAGRA.

Contributor

@tfeher tfeher left a comment

The changes look good to me as a workaround. However, we should find the root cause; please open an issue to track it.

@cjnolet
Member

cjnolet commented Jan 26, 2026

@viclafargue can you also link the issue in a comment in the code before this is merged?

@achirkin
Contributor

achirkin commented Feb 7, 2026

/merge

@bdice
Contributor

bdice commented Feb 9, 2026

/merge

@divyegala
Member

/merge

@bdice
Contributor

bdice commented Feb 9, 2026

@divyegala @achirkin This is failing to merge because it is targeting release/26.02. (Is it a hotfix?) If that's correct, an admin-merge is needed from the ops team. Otherwise it should be retargeted to main.

@viclafargue viclafargue changed the base branch from release/26.02 to main February 10, 2026 09:08
@viclafargue viclafargue requested a review from a team as a code owner February 10, 2026 09:08
@viclafargue viclafargue force-pushed the fix-mg-replicated-batching branch from 7fda58b to ae8d0b8 Compare February 10, 2026 09:13
@viclafargue
Contributor Author

> @divyegala @achirkin This is failing to merge because it is targeting release/26.02. (Is it a hotfix?) If that's correct, an admin-merge is needed from the ops team. Otherwise it should be retargeted to main.

Targeting release/26.02 was a mistake. I have retargeted the PR to the main branch.

@rapids-bot
Contributor

rapids-bot Bot commented Feb 10, 2026

This PR's base branch has been changed since the last /merge command. Please issue the command again to confirm the merge with the new base branch.

@achirkin
Contributor

/merge

@rapids-bot rapids-bot Bot merged commit a46fa87 into rapidsai:main Feb 11, 2026
94 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Unstructured Data Processing Feb 11, 2026
6 participants