[system_tests] Fix flaky test_app_started_is_first_message by using seq_id ordering by vlad-scherbich · Pull Request #6387 · DataDog/system-tests

vlad-scherbich · 2026-02-26T22:11:24Z

https://datadoghq.atlassian.net/browse/PROF-13796

Motivation

test_telemetry.py::Test_Telemetry::test_app_started_is_first_message was flaky (e.g. 8 hits in dd-trace-py main CI, 2026). The test asserted that app-started is the first telemetry message, but it used capture order (log file order) instead of send order (seq_id). Capture order can be non-deterministic—when the first captured batch is not the first sent, the test fails.

Changes

test_app_started_is_first_message: Use seq_id as the source of truth for ordering. Flatten all lifecycle telemetry (single messages + message-batch payloads), skip non-lifecycle types (sketches, generate-metrics, logs, distributions), group by runtime_id, sort by (seq_id, batch_index), and assert per runtime that the first message is app-started. This removes dependence on capture/file order.

Reviewer checklist

Anything but tests/ or manifests/ is modified ? I have the approval from R&P team
A docker base image is modified?
- the relevant build-XXX-image label is present
A scenario is added, removed or renamed?
- Get a review from R&P team

github-actions · 2026-02-26T22:12:03Z

CODEOWNERS have been resolved as:

tests/test_telemetry.py                                                 @DataDog/libdatadog-telemetry @DataDog/apm-sdk-capabilities @DataDog/system-tests-core

… files)

cbeauchesne

From framework usage, AGTM.

But about the test logic, I'm a little bit concerned by the fact we are using seq_id as source of truth. The previous logic was using the capture order, and in consequence, there is an assertion that seq_id is coherent with that order.

I understand that at some point we may hit some painful non-deterministic behavior. But on the other hand, this assertion may be quite useful : are we 100% that a incoherent seq_id order has no effect on backend ?

Is there a way to keep a certain degree of assertion on this order?

This point is not my decision to make, but do you mind getting a review from someone familiar with the feature, to ensure the new test logic does not open the door to a downgrade of those assertions ?

vlad-scherbich · 2026-03-05T16:56:01Z

From framework usage, AGTM.

But about the test logic, I'm a little bit concerned by the fact we are using seq_id as source of truth. The previous logic was using the capture order, and in consequence, there is an assertion that seq_id is coherent with that order.

I understand that at some point we may hit some painful non-deterministic behavior. But on the other hand, this assertion may be quite useful : are we 100% that a incoherent seq_id order has no effect on backend ?

Is there a way to keep a certain degree of assertion on this order?

This point is not my decision to make, but do you mind getting a review from someone familiar with the feature, to ensure the new test logic does not open the door to a downgrade of those assertions ?

Thanks @cbeauchesne for catching this potential issue. I'm not sure who the feature owner might be:
@pawelchcki , @paullegranddc , @rachelyangdog - would you happen to know?

mabdinur · 2026-03-09T14:42:08Z

From framework usage, AGTM.
But about the test logic, I'm a little bit concerned by the fact we are using seq_id as source of truth. The previous logic was using the capture order, and in consequence, there is an assertion that seq_id is coherent with that order.
I understand that at some point we may hit some painful non-deterministic behavior. But on the other hand, this assertion may be quite useful : are we 100% that a incoherent seq_id order has no effect on backend ?
Is there a way to keep a certain degree of assertion on this order?
This point is not my decision to make, but do you mind getting a review from someone familiar with the feature, to ensure the new test logic does not open the door to a downgrade of those assertions ?

Thanks @cbeauchesne for catching this potential issue. I'm not sure who the feature owner might be: @pawelchcki , @paullegranddc , @rachelyangdog - would you happen to know?

seq_id is not used by the backend. It is mainly used to track and debug telemetry data.

Previously, we reset the seq_id whenever a process forked. Because of that, we had to rely on capture order to estimate when a payload was generated. This worked since payloads were sent in FIFO order, but as of a few weeks ago, that’s no longer the case (a child process can now submit telemetry before it's parent and this is the source of the current flakiness).

We recently changed this behavior so that seq_id is no longer reset on fork. As a result, we can now reliably use seq_id to determine which telemetry payload was generated first.

cbeauchesne · 2026-03-10T09:08:51Z

We recently changed this behavior so that seq_id is no longer reset on fork.

My question is about this point : seq_id now become the source of truth, rather timestamps. In consequence, we don't test it.

So it means that if at some point someone breaks this behavior (for instance, an error is pushed that reset again seq_id on forks), system-tests won't see it.

Just to be clear, I'm not opposing to this change, but I just wanted to be sure that it's an educated choice, and not a new loophole.

vlad-scherbich changed the title ~~Fix flakiness by relying on send order (seq_id) vs capture order (log…~~ [system_tests] Fix flaky test_app_started_is_first_message by using seq_id ordering Feb 27, 2026

vlad-scherbich force-pushed the vlad/system-tests-deflake-test_app_started_is_first_message branch 4 times, most recently from eba115d to c14cbfb Compare February 27, 2026 15:49

Fix flakiness by relying on send order (seq_id) vs capture order (log…

a418fd7

… files)

vlad-scherbich force-pushed the vlad/system-tests-deflake-test_app_started_is_first_message branch from c14cbfb to a418fd7 Compare February 27, 2026 16:05

vlad-scherbich marked this pull request as ready for review February 27, 2026 16:32

vlad-scherbich requested review from a team as code owners February 27, 2026 16:32

vlad-scherbich requested review from a team, KowalskiThomas, pawelchcki and rachelyangdog and removed request for a team February 27, 2026 16:32

cbeauchesne reviewed Mar 3, 2026

View reviewed changes

mabdinur approved these changes Mar 9, 2026

View reviewed changes

gh-worker-ownership-write-b05516 bot removed request for a team and KowalskiThomas March 9, 2026 14:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[system_tests] Fix flaky test_app_started_is_first_message by using seq_id ordering#6387

[system_tests] Fix flaky test_app_started_is_first_message by using seq_id ordering#6387
vlad-scherbich wants to merge 1 commit intomainfrom
vlad/system-tests-deflake-test_app_started_is_first_message

vlad-scherbich commented Feb 26, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 26, 2026

Uh oh!

cbeauchesne left a comment •

edited

Loading

Uh oh!

vlad-scherbich commented Mar 5, 2026

Uh oh!

mabdinur commented Mar 9, 2026

Uh oh!

cbeauchesne commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

vlad-scherbich commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Changes

Reviewer checklist

Uh oh!

github-actions bot commented Feb 26, 2026

Uh oh!

cbeauchesne left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vlad-scherbich commented Mar 5, 2026

Uh oh!

mabdinur commented Mar 9, 2026

Uh oh!

cbeauchesne commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vlad-scherbich commented Feb 26, 2026 •

edited

Loading

cbeauchesne left a comment •

edited

Loading