[fix](group commit) group commit select backend should consider isLoadAvailable#61555
[fix](group commit) group commit select backend should consider isLoadAvailable#61555mymeiyi wants to merge 1 commit intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
This PR refines backend selection for group commit loads by factoring the availability checks into a single helper and ensuring the chosen backend is load-available (not just alive).
Changes:
- Centralize backend eligibility logic into
isBackendAvailable(...). - Update cached-backend reuse and random backend selection to additionally require
backend.isLoadAvailable().
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/review |
There was a problem hiding this comment.
Code Review Summary
PR Goal & Accomplishment
This PR extracts backend availability checks into a shared isBackendAvailable() method and adds the missing isLoadAvailable() check to group commit backend selection. Previously, backends with isLoadDisabled=true or isShutDown=true could still be selected for group commit, which is incorrect.
The refactoring is clean and the extracted method correctly centralizes logic that was previously duplicated (with inconsistencies) between getCachedBackend and getRandomBackend.
Critical Checkpoint Conclusions
1. Does the code accomplish the goal?
Yes. The isLoadAvailable() check is now applied in both getCachedBackend and getRandomBackend. The refactoring into isBackendAvailable() also fixes a pre-existing inconsistency where getRandomBackend was missing the cloud cluster name check (though in practice this was harmless since the cloud path pre-filters backends by cluster).
2. Is the modification minimal and focused?
Yes. Single file, 16 additions / 6 deletions, well-scoped.
3. Concurrency concerns?
No new concurrency issues introduced. The existing ConcurrentHashMap usage pattern is preserved.
4. Parallel code paths?
The error messages in selectBackendForCloudGroupCommitInternal (line 302-307) and selectBackendForLocalGroupCommitInternal (line 337-341) now need updating — see inline comment.
5. Configuration / compatibility?
No new configs. No incompatible changes.
6. Test coverage?
No tests added. There are no existing unit tests for GroupCommitManager backend selection logic either. Given this is a bug fix that changes filtering behavior, a unit test would be valuable.
7. Observability?
See inline comment about error messages missing isLoadDisabled/isShutDown fields.
8. Performance?
No concerns. The refactoring has equivalent performance characteristics.
9. Other issues?
Minor: isAlive() is checked both explicitly and within isLoadAvailable(). This is redundant but harmless and aids readability.
| } | ||
| if (!Config.isCloudMode()) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
[Observability issue] Now that isBackendAvailable() rejects backends based on isLoadAvailable() (which checks isLoadDisabled and isShutDown), the error messages in selectBackendForCloudGroupCommitInternal (line ~303) and selectBackendForLocalGroupCommitInternal (line ~338) should include isLoadDisabled and isShutDown fields. Currently they only log alive, active, decommissioned (and decommissioning in the cloud version).
Without this, when all backends are alive but load-disabled, operators will see a misleading error showing all backends as healthy.
Suggested addition to the error message map lambdas:
+ ", loadDisabled=" + be.isLoadDisabled() + ", shutdown=" + be.isShutDown()(Note: isShutDown() is private — you may need to add isLoadAvailable() to the message instead, or make isShutDown accessible.)
|
run buildall |
TPC-H: Total hot run time: 26429 ms |
TPC-DS: Total hot run time: 167415 ms |
FE UT Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
Uh oh!
There was an error while loading. Please reload this page.