Better handling of batching of search in MG replicated mode#1718
rapids-bot[bot] merged 3 commits into rapidsai:main
Conversation
I would expect CAGRA to be thread-safe. We expect to be able to search from multiple threads to achieve high throughput for concurrent small-batch queries. We should fix this in CAGRA.
tfeher
left a comment
The changes look good to me as a workaround. But we should find the root cause, please open an issue to track it.
@viclafargue can you also link the issue in a comment in the code before this is merged?
/merge
@divyegala @achirkin This is failing to merge because it is targeting
Force-pushed 7fda58b to ae8d0b8
This is a mistake. I re-targeted to the
This PR's base branch has been changed since the last
/merge
Answers #1720
In multi-GPU replicated mode, the search query is divided into batches, which are run in parallel with OpenMP. In some cases there may be more batches than available GPUs, which causes multiple threads to access the same GPU concurrently (a thread-safety issue, at least for CAGRA indices). This change solves the issue by giving each rank its own thread; that thread handles all of the rank's batches sequentially, so no GPU is ever accessed from more than one thread at a time.