Fix flaky check raw file injection smoke test #10920
Draft
Two bugs contributed to flakiness in CI on JDK 8 variants (zulu8, semeru8) and under load:
1. Copy-paste bug in assertRawLogLinesWithInjection: the fallback assertion for 32-bit trace IDs incorrectly referenced logLines[0] (BEFORE FIRST SPAN) instead of logLines[4] (INSIDE THIRD SPAN) and logLines[6] (AFTER FORTH SPAN). This meant the 32-bit trace ID format was never actually validated for those two log lines.
2. BaseApplication.waitForCondition timeout was 10 seconds. On loaded CI machines with JDK 8 JVMs, the RC config change propagation through captureTraceConfig() could take several seconds, approaching the limit and causing "Logs injection config was never updated" failures. Increased to 30 seconds to give adequate headroom.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
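For reviewers who want to see the shape of the index bug in isolation, here is a minimal Java sketch. It is not the test's actual code: the class name, helper methods, log line layout, and trace ID values are all made up for illustration. It only shows why a fallback that reads index 0 can never match when index 0 holds "BEFORE FIRST SPAN".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and log layout are assumptions, not the smoke test's code.
final class FallbackIndexSketch {

    // Buggy shape: the primary check reads lines.get(index),
    // but the fallback was copy-pasted and still reads lines.get(0).
    static boolean buggyCheck(List<String> lines, int index, String fullId, String shortId) {
        return lines.get(index).contains(fullId) || lines.get(0).contains(shortId);
    }

    // Fixed shape: both alternatives inspect the same log line.
    static boolean fixedCheck(List<String> lines, int index, String fullId, String shortId) {
        String line = lines.get(index);
        return line.contains(fullId) || line.contains(shortId);
    }

    public static void main(String[] args) {
        String fullId = "aaaaaaaaaaaaaaaa1111111111111111"; // made-up long-form ID
        String shortId = "1111111111111111";                // made-up short-form ID
        List<String> lines = Arrays.asList(
            "BEFORE FIRST SPAN",
            "placeholder 1",
            "placeholder 2",
            "placeholder 3",
            "INSIDE THIRD SPAN trace_id=" + shortId,
            "placeholder 5",
            "AFTER FORTH SPAN trace_id=" + shortId);

        // The line at index 4 carries only the short form.
        System.out.println("buggy=" + buggyCheck(lines, 4, fullId, shortId)
            + " fixed=" + fixedCheck(lines, 4, fullId, shortId));
        // prints "buggy=false fixed=true"
    }
}
```

With the buggy shape, a line that carries only the short form fails the assertion even though it is correctly injected, because the fallback looks at a line that has no trace ID at all.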
Benchmarks
Startup
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 60 metrics, 11 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.056 s) : 0, 1056079
Total [baseline] (11.068 s) : 0, 11068356
Agent [candidate] (1.058 s) : 0, 1057599
Total [candidate] (11.092 s) : 0, 11092015
section appsec
Agent [baseline] (1.253 s) : 0, 1252589
Total [baseline] (11.295 s) : 0, 11294604
Agent [candidate] (1.25 s) : 0, 1249583
Total [candidate] (11.118 s) : 0, 11118254
section iast
Agent [baseline] (1.237 s) : 0, 1237346
Total [baseline] (11.383 s) : 0, 11383294
Agent [candidate] (1.23 s) : 0, 1229992
Total [candidate] (11.29 s) : 0, 11289627
section profiling
Agent [baseline] (1.182 s) : 0, 1181570
Total [baseline] (11.125 s) : 0, 11125191
Agent [candidate] (1.181 s) : 0, 1180977
Total [candidate] (10.964 s) : 0, 10963649
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.206 ms) : 0, 1206
crashtracking [candidate] (1.203 ms) : 0, 1203
BytebuddyAgent [baseline] (627.684 ms) : 0, 627684
BytebuddyAgent [candidate] (629.179 ms) : 0, 629179
AgentMeter [baseline] (29.355 ms) : 0, 29355
AgentMeter [candidate] (29.398 ms) : 0, 29398
GlobalTracer [baseline] (256.263 ms) : 0, 256263
GlobalTracer [candidate] (256.994 ms) : 0, 256994
AppSec [baseline] (31.662 ms) : 0, 31662
AppSec [candidate] (31.758 ms) : 0, 31758
Debugger [baseline] (60.305 ms) : 0, 60305
Debugger [candidate] (60.207 ms) : 0, 60207
Remote Config [baseline] (591.753 µs) : 0, 592
Remote Config [candidate] (584.092 µs) : 0, 584
Telemetry [baseline] (9.45 ms) : 0, 9450
Telemetry [candidate] (8.003 ms) : 0, 8003
Flare Poller [baseline] (3.598 ms) : 0, 3598
Flare Poller [candidate] (4.251 ms) : 0, 4251
section appsec
crashtracking [baseline] (1.209 ms) : 0, 1209
crashtracking [candidate] (1.187 ms) : 0, 1187
BytebuddyAgent [baseline] (662.132 ms) : 0, 662132
BytebuddyAgent [candidate] (660.939 ms) : 0, 660939
AgentMeter [baseline] (12.226 ms) : 0, 12226
AgentMeter [candidate] (12.097 ms) : 0, 12097
GlobalTracer [baseline] (259.455 ms) : 0, 259455
GlobalTracer [candidate] (258.52 ms) : 0, 258520
AppSec [baseline] (178.04 ms) : 0, 178040
AppSec [candidate] (177.83 ms) : 0, 177830
Debugger [baseline] (66.078 ms) : 0, 66078
Debugger [candidate] (66.026 ms) : 0, 66026
Remote Config [baseline] (657.533 µs) : 0, 658
Remote Config [candidate] (623.47 µs) : 0, 623
Telemetry [baseline] (8.447 ms) : 0, 8447
Telemetry [candidate] (8.275 ms) : 0, 8275
Flare Poller [baseline] (3.604 ms) : 0, 3604
Flare Poller [candidate] (3.525 ms) : 0, 3525
IAST [baseline] (24.351 ms) : 0, 24351
IAST [candidate] (24.266 ms) : 0, 24266
section iast
crashtracking [baseline] (1.246 ms) : 0, 1246
crashtracking [candidate] (1.227 ms) : 0, 1227
BytebuddyAgent [baseline] (802.459 ms) : 0, 802459
BytebuddyAgent [candidate] (798.089 ms) : 0, 798089
AgentMeter [baseline] (11.681 ms) : 0, 11681
AgentMeter [candidate] (11.419 ms) : 0, 11419
GlobalTracer [baseline] (248.968 ms) : 0, 248968
GlobalTracer [candidate] (247.324 ms) : 0, 247324
AppSec [baseline] (26.828 ms) : 0, 26828
AppSec [candidate] (26.46 ms) : 0, 26460
Debugger [baseline] (69.74 ms) : 0, 69740
Debugger [candidate] (70.807 ms) : 0, 70807
Remote Config [baseline] (528.067 µs) : 0, 528
Remote Config [candidate] (530.848 µs) : 0, 531
Telemetry [baseline] (10.276 ms) : 0, 10276
Telemetry [candidate] (9.185 ms) : 0, 9185
Flare Poller [baseline] (3.659 ms) : 0, 3659
Flare Poller [candidate] (3.321 ms) : 0, 3321
IAST [baseline] (25.628 ms) : 0, 25628
IAST [candidate] (25.407 ms) : 0, 25407
section profiling
crashtracking [baseline] (1.176 ms) : 0, 1176
crashtracking [candidate] (1.177 ms) : 0, 1177
BytebuddyAgent [baseline] (681.709 ms) : 0, 681709
BytebuddyAgent [candidate] (681.286 ms) : 0, 681286
AgentMeter [baseline] (8.948 ms) : 0, 8948
AgentMeter [candidate] (9.062 ms) : 0, 9062
GlobalTracer [baseline] (215.513 ms) : 0, 215513
GlobalTracer [candidate] (215.321 ms) : 0, 215321
AppSec [baseline] (31.995 ms) : 0, 31995
AppSec [candidate] (32.059 ms) : 0, 32059
Debugger [baseline] (65.966 ms) : 0, 65966
Debugger [candidate] (64.389 ms) : 0, 64389
Remote Config [baseline] (558.596 µs) : 0, 559
Remote Config [candidate] (585.52 µs) : 0, 586
Telemetry [baseline] (7.684 ms) : 0, 7684
Telemetry [candidate] (8.472 ms) : 0, 8472
Flare Poller [baseline] (3.436 ms) : 0, 3436
Flare Poller [candidate] (4.205 ms) : 0, 4205
ProfilingAgent [baseline] (93.796 ms) : 0, 93796
ProfilingAgent [candidate] (93.568 ms) : 0, 93568
Profiling [baseline] (94.355 ms) : 0, 94355
Profiling [candidate] (94.137 ms) : 0, 94137
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.055 s) : 0, 1055136
Total [baseline] (8.827 s) : 0, 8826826
Agent [candidate] (1.056 s) : 0, 1055844
Total [candidate] (8.908 s) : 0, 8908268
section iast
Agent [baseline] (1.244 s) : 0, 1243704
Total [baseline] (9.563 s) : 0, 9563481
Agent [candidate] (1.227 s) : 0, 1227391
Total [candidate] (9.537 s) : 0, 9536716
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.195 ms) : 0, 1195
crashtracking [candidate] (1.195 ms) : 0, 1195
BytebuddyAgent [baseline] (628.209 ms) : 0, 628209
BytebuddyAgent [candidate] (629.091 ms) : 0, 629091
AgentMeter [baseline] (29.44 ms) : 0, 29440
AgentMeter [candidate] (29.319 ms) : 0, 29319
GlobalTracer [baseline] (256.824 ms) : 0, 256824
GlobalTracer [candidate] (256.858 ms) : 0, 256858
AppSec [baseline] (31.739 ms) : 0, 31739
AppSec [candidate] (31.726 ms) : 0, 31726
Debugger [baseline] (59.638 ms) : 0, 59638
Debugger [candidate] (59.371 ms) : 0, 59371
Remote Config [baseline] (584.229 µs) : 0, 584
Remote Config [candidate] (594.772 µs) : 0, 595
Telemetry [baseline] (8.006 ms) : 0, 8006
Telemetry [candidate] (8.067 ms) : 0, 8067
Flare Poller [baseline] (3.497 ms) : 0, 3497
Flare Poller [candidate] (3.53 ms) : 0, 3530
section iast
crashtracking [baseline] (1.219 ms) : 0, 1219
crashtracking [candidate] (1.211 ms) : 0, 1211
BytebuddyAgent [baseline] (809.306 ms) : 0, 809306
BytebuddyAgent [candidate] (796.538 ms) : 0, 796538
AgentMeter [baseline] (11.869 ms) : 0, 11869
AgentMeter [candidate] (11.365 ms) : 0, 11365
GlobalTracer [baseline] (249.378 ms) : 0, 249378
GlobalTracer [candidate] (247.479 ms) : 0, 247479
AppSec [baseline] (26.941 ms) : 0, 26941
AppSec [candidate] (26.541 ms) : 0, 26541
Debugger [baseline] (66.617 ms) : 0, 66617
Debugger [candidate] (67.781 ms) : 0, 67781
Remote Config [baseline] (527.015 µs) : 0, 527
Remote Config [candidate] (521.37 µs) : 0, 521
Telemetry [baseline] (11.701 ms) : 0, 11701
Telemetry [candidate] (10.698 ms) : 0, 10698
Flare Poller [baseline] (4.092 ms) : 0, 4092
Flare Poller [candidate] (3.802 ms) : 0, 3802
IAST [baseline] (25.704 ms) : 0, 25704
IAST [candidate] (25.319 ms) : 0, 25319
Load
Parameters
See matching parameters
Summary
Found 3 performance improvements and 2 performance regressions! Performance is the same for 15 metrics, 16 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section baseline
no_agent (18.328 ms) : 18140, 18516
. : milestone, 18328,
appsec (18.509 ms) : 18320, 18698
. : milestone, 18509,
code_origins (17.902 ms) : 17724, 18080
. : milestone, 17902,
iast (17.828 ms) : 17651, 18006
. : milestone, 17828,
profiling (18.751 ms) : 18562, 18940
. : milestone, 18751,
tracing (18.275 ms) : 18092, 18458
. : milestone, 18275,
section candidate
no_agent (18.215 ms) : 18025, 18404
. : milestone, 18215,
appsec (18.902 ms) : 18714, 19091
. : milestone, 18902,
code_origins (18.059 ms) : 17879, 18238
. : milestone, 18059,
iast (17.885 ms) : 17709, 18062
. : milestone, 17885,
profiling (20.054 ms) : 19847, 20260
. : milestone, 20054,
tracing (17.746 ms) : 17570, 17921
. : milestone, 17746,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section baseline
no_agent (1.184 ms) : 1172, 1197
. : milestone, 1184,
iast (3.337 ms) : 3291, 3383
. : milestone, 3337,
iast_FULL (6.006 ms) : 5945, 6068
. : milestone, 6006,
iast_GLOBAL (3.588 ms) : 3528, 3648
. : milestone, 3588,
profiling (2.057 ms) : 2039, 2076
. : milestone, 2057,
tracing (1.78 ms) : 1766, 1794
. : milestone, 1780,
section candidate
no_agent (1.187 ms) : 1175, 1198
. : milestone, 1187,
iast (3.116 ms) : 3080, 3153
. : milestone, 3116,
iast_FULL (5.671 ms) : 5614, 5727
. : milestone, 5671,
iast_GLOBAL (3.447 ms) : 3390, 3505
. : milestone, 3447,
profiling (2.215 ms) : 2194, 2237
. : milestone, 2215,
tracing (1.806 ms) : 1790, 1822
. : milestone, 1806,
Dacapo
Parameters
See matching parameters
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section baseline
no_agent (1.473 ms) : 1461, 1484
. : milestone, 1473,
appsec (3.803 ms) : 3581, 4025
. : milestone, 3803,
iast (2.258 ms) : 2188, 2327
. : milestone, 2258,
iast_GLOBAL (2.303 ms) : 2233, 2373
. : milestone, 2303,
profiling (2.102 ms) : 2045, 2158
. : milestone, 2102,
tracing (2.076 ms) : 2021, 2130
. : milestone, 2076,
section candidate
no_agent (1.473 ms) : 1462, 1485
. : milestone, 1473,
appsec (3.8 ms) : 3573, 4027
. : milestone, 3800,
iast (2.248 ms) : 2180, 2317
. : milestone, 2248,
iast_GLOBAL (2.289 ms) : 2220, 2359
. : milestone, 2289,
profiling (2.107 ms) : 2050, 2164
. : milestone, 2107,
tracing (2.077 ms) : 2023, 2131
. : milestone, 2077,
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~7ed6a98d3d, baseline=1.61.0-SNAPSHOT~468d566f35
dateFormat X
axisFormat %s
section baseline
no_agent (14.99 s) : 14990000, 14990000
. : milestone, 14990000,
appsec (14.781 s) : 14781000, 14781000
. : milestone, 14781000,
iast (18.265 s) : 18265000, 18265000
. : milestone, 18265000,
iast_GLOBAL (17.86 s) : 17860000, 17860000
. : milestone, 17860000,
profiling (15.355 s) : 15355000, 15355000
. : milestone, 15355000,
tracing (14.97 s) : 14970000, 14970000
. : milestone, 14970000,
section candidate
no_agent (15.487 s) : 15487000, 15487000
. : milestone, 15487000,
appsec (14.506 s) : 14506000, 14506000
. : milestone, 14506000,
iast (17.969 s) : 17969000, 17969000
. : milestone, 17969000,
iast_GLOBAL (17.715 s) : 17715000, 17715000
. : milestone, 17715000,
profiling (14.798 s) : 14798000, 14798000
. : milestone, 14798000,
tracing (14.83 s) : 14830000, 14830000
. : milestone, 14830000,
What Does This Do
Fixes two bugs in LogInjectionSmokeTest.check raw file injection that caused flaky test failures in CI on JDK 8 variants (zulu8, semeru8) and under load.
Motivation
The test has been failing consistently in CI (visible in Datadog Test Visibility since March 13, when final_status tracking was added). All observed failures are in test classes with Logback or Log4j2 backends. The @Flaky annotation covers IBM8 and OracleJDK8, but the same failures occur on other JDK 8 variants.
Bug 1: Wrong index in assertRawLogLinesWithInjection (copy-paste error)
Lines 234 and 236 used logLines[0] in the fallback assertion instead of logLines[4] and logLines[6]. Since logLines[0] is "BEFORE FIRST SPAN", the fallback for 32-bit trace IDs was never exercised for INSIDE THIRD SPAN or AFTER FORTH SPAN.
Bug 2: RC config propagation timeout too tight
BaseApplication.TIMEOUT_IN_NANOS was 10 seconds. The test exercises remote config to toggle log injection mid-run, and BaseApplication polls captureTraceConfig() every 100ms waiting for the change. RC config propagation through the tracer's internal processing pipeline was observed taking ~500ms–1s even on JDK 21 locally. On loaded CI machines with slower JDK 8 JVMs this can approach or exceed 10s, causing:
"Logs injection config was never updated" → exitValue != 0 → test failure.
Increased to 30s (3× headroom).
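The polling pattern described for Bug 2 can be sketched roughly as follows. This is a minimal illustration, not the code in BaseApplication: the method signature, the BooleanSupplier parameter, and the POLL_INTERVAL_MILLIS constant are assumptions; only the 30-second timeout and the 100ms polling interval come from the description above.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of deadline-based polling; the signature is an assumption.
final class WaitForConditionSketch {
    static final long TIMEOUT_IN_NANOS = TimeUnit.SECONDS.toNanos(30); // was 10 seconds before the fix
    static final long POLL_INTERVAL_MILLIS = 100;

    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition was observed, false on timeout or interruption.
    static boolean waitForCondition(BooleanSupplier condition, long timeoutNanos, long pollMillis) {
        long deadline = System.nanoTime() + timeoutNanos;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Compare via subtraction so the check stays correct if nanoTime overflows.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stand-in for "config was updated": becomes true about 300 ms after start.
        boolean updated = waitForCondition(
            () -> System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(300),
            TIMEOUT_IN_NANOS,
            POLL_INTERVAL_MILLIS);
        System.out.println(updated ? "config updated" : "Logs injection config was never updated");
    }
}
```

In a loop of this shape, raising the timeout only changes how long the failure path takes. A run where the change arrives quickly still returns within about one polling interval of the change becoming visible.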
Additional Notes
gradle/gradle-daemon-jvm.properties requires JDK 21+ for the Gradle daemon, which always provides the test subprocess JVM via System.getProperty("java.home")
@Flaky condition intentionally not extended: fixing root causes rather than skipping on more JVMs
Contributor Checklist
Fix ...)
type: bug, comp: testing, tag: flaky test, tag: no release notes
Jira ticket: N/A