Skip to content

[fix](group commit) group commit select backend should consider isLoadAvailable#61555

Open
mymeiyi wants to merge 1 commit intoapache:masterfrom
mymeiyi:fix-group-commit
Open

[fix](group commit) group commit select backend should consider isLoadAvailable#61555
mymeiyi wants to merge 1 commit intoapache:masterfrom
mymeiyi:fix-group-commit

Conversation

@mymeiyi
Copy link
Contributor

@mymeiyi mymeiyi commented Mar 20, 2026

  1. extracts backend availability checks into a shared isBackendAvailable() method
  2. adds the missing isLoadAvailable() check to group commit backend selection. Previously, backends with isLoadDisabled=true or isShutDown=true could still be selected for group commit, which is incorrect.

Copilot AI review requested due to automatic review settings March 20, 2026 07:39
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines backend selection for group commit loads by factoring the availability checks into a single helper and ensuring the chosen backend is load-available (not just alive).

Changes:

  • Centralize backend eligibility logic into isBackendAvailable(...).
  • Update cached-backend reuse and random backend selection to additionally require backend.isLoadAvailable().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mymeiyi
Copy link
Contributor Author

mymeiyi commented Mar 20, 2026

/review

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

PR Goal & Accomplishment

This PR extracts backend availability checks into a shared isBackendAvailable() method and adds the missing isLoadAvailable() check to group commit backend selection. Previously, backends with isLoadDisabled=true or isShutDown=true could still be selected for group commit, which is incorrect.

The refactoring is clean and the extracted method correctly centralizes logic that was previously duplicated (with inconsistencies) between getCachedBackend and getRandomBackend.

Critical Checkpoint Conclusions

1. Does the code accomplish the goal?
Yes. The isLoadAvailable() check is now applied in both getCachedBackend and getRandomBackend. The refactoring into isBackendAvailable() also fixes a pre-existing inconsistency where getRandomBackend was missing the cloud cluster name check (though in practice this was harmless since the cloud path pre-filters backends by cluster).

2. Is the modification minimal and focused?
Yes. Single file, 16 additions / 6 deletions, well-scoped.

3. Concurrency concerns?
No new concurrency issues introduced. The existing ConcurrentHashMap usage pattern is preserved.

4. Parallel code paths?
The error messages in selectBackendForCloudGroupCommitInternal (line 302-307) and selectBackendForLocalGroupCommitInternal (line 337-341) now need updating — see inline comment.

5. Configuration / compatibility?
No new configs. No incompatible changes.

6. Test coverage?
No tests added. There are no existing unit tests for GroupCommitManager backend selection logic either. Given this is a bug fix that changes filtering behavior, a unit test would be valuable.

7. Observability?
See inline comment about error messages missing isLoadDisabled/isShutDown fields.

8. Performance?
No concerns. The refactoring has equivalent performance characteristics.

9. Other issues?
Minor: isAlive() is checked both explicitly and within isLoadAvailable(). This is redundant but harmless and aids readability.

}
if (!Config.isCloudMode()) {
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Observability issue] Now that isBackendAvailable() rejects backends based on isLoadAvailable() (which checks isLoadDisabled and isShutDown), the error messages in selectBackendForCloudGroupCommitInternal (line ~303) and selectBackendForLocalGroupCommitInternal (line ~338) should include isLoadDisabled and isShutDown fields. Currently they only log alive, active, decommissioned (and decommissioning in the cloud version).

Without this, when all backends are alive but load-disabled, operators will see a misleading error showing all backends as healthy.

Suggested addition to the error message map lambdas:

+ ", loadDisabled=" + be.isLoadDisabled() + ", shutdown=" + be.isShutDown()

(Note: isShutDown() is private — you may need to add isLoadAvailable() to the message instead, or make isShutDown accessible.)

@mymeiyi
Copy link
Contributor Author

mymeiyi commented Mar 20, 2026

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 26429 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6ea0907b098059ffa6c8f843d25b499cb886654c, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17657	4444	4244	4244
q2	q3	10648	795	516	516
q4	4673	364	251	251
q5	7544	1192	1024	1024
q6	174	175	145	145
q7	799	855	701	701
q8	9297	1445	1299	1299
q9	4960	4700	4518	4518
q10	6329	1915	1651	1651
q11	450	249	252	249
q12	738	569	467	467
q13	18046	2961	2165	2165
q14	234	223	207	207
q15	q16	729	745	661	661
q17	740	821	460	460
q18	6122	5361	5237	5237
q19	1107	985	606	606
q20	542	495	384	384
q21	4716	1818	1365	1365
q22	396	489	279	279
Total cold run time: 95901 ms
Total hot run time: 26429 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4728	4722	4621	4621
q2	q3	3878	4334	3803	3803
q4	872	1199	798	798
q5	4094	4412	4349	4349
q6	195	169	142	142
q7	1779	1658	1541	1541
q8	2527	2650	2552	2552
q9	7610	7433	7327	7327
q10	3875	4003	3575	3575
q11	491	437	419	419
q12	491	586	433	433
q13	2800	3109	2429	2429
q14	289	305	288	288
q15	q16	762	768	721	721
q17	1174	1331	1348	1331
q18	7165	6745	6585	6585
q19	930	986	936	936
q20	2054	2234	2081	2081
q21	3975	3472	3343	3343
q22	492	421	375	375
Total cold run time: 50181 ms
Total hot run time: 47649 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 167415 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6ea0907b098059ffa6c8f843d25b499cb886654c, data reload: false

query5	4332	643	500	500
query6	333	234	206	206
query7	4218	464	258	258
query8	339	236	223	223
query9	8735	2694	2685	2685
query10	503	408	342	342
query11	6988	5072	4926	4926
query12	183	130	124	124
query13	1286	449	355	355
query14	5762	3684	3445	3445
query14_1	2848	2840	2788	2788
query15	202	193	173	173
query16	1000	465	449	449
query17	1128	746	630	630
query18	2451	449	355	355
query19	215	210	190	190
query20	140	131	127	127
query21	215	133	110	110
query22	13289	13332	13194	13194
query23	15806	15550	15593	15550
query23_1	16159	15727	15617	15617
query24	8713	1800	1338	1338
query24_1	1281	1292	1332	1292
query25	665	533	482	482
query26	1282	295	178	178
query27	3033	529	348	348
query28	5568	1934	1870	1870
query29	822	567	465	465
query30	304	233	187	187
query31	1010	941	868	868
query32	85	70	71	70
query33	502	337	268	268
query34	897	864	532	532
query35	649	694	614	614
query36	1096	1134	992	992
query37	133	100	83	83
query38	2886	2891	2841	2841
query39	841	829	822	822
query39_1	798	805	796	796
query40	228	154	136	136
query41	61	61	58	58
query42	259	253	257	253
query43	248	249	215	215
query44	
query45	239	187	190	187
query46	884	983	610	610
query47	3066	3080	2032	2032
query48	308	334	226	226
query49	635	460	372	372
query50	672	273	217	217
query51	4078	4023	4042	4023
query52	261	267	255	255
query53	289	338	290	290
query54	300	270	268	268
query55	89	84	83	83
query56	321	314	303	303
query57	1942	1797	1691	1691
query58	284	273	272	272
query59	2777	2928	2743	2743
query60	348	334	329	329
query61	153	151	149	149
query62	631	574	535	535
query63	311	280	280	280
query64	4992	1265	981	981
query65	
query66	1478	481	353	353
query67	24256	24406	24172	24172
query68	
query69	393	304	282	282
query70	968	967	856	856
query71	345	315	300	300
query72	2846	2874	2438	2438
query73	542	532	324	324
query74	9621	9560	9403	9403
query75	2826	2811	2473	2473
query76	2278	1016	680	680
query77	367	384	309	309
query78	10985	11072	10492	10492
query79	3130	743	572	572
query80	1759	609	546	546
query81	579	259	225	225
query82	995	152	116	116
query83	330	271	254	254
query84	256	117	98	98
query85	905	501	451	451
query86	495	314	321	314
query87	3150	3130	2976	2976
query88	3585	2666	2612	2612
query89	425	376	343	343
query90	2059	178	174	174
query91	161	161	141	141
query92	82	78	69	69
query93	1689	819	499	499
query94	646	317	292	292
query95	604	404	324	324
query96	654	516	227	227
query97	2487	2472	2385	2385
query98	243	225	220	220
query99	1010	1003	850	850
Total cold run time: 256242 ms
Total hot run time: 167415 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/9) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 20, 2026
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x dev/4.1.x reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants