Skip to content

[system_tests] Fix flaky test_app_started_is_first_message by using seq_id ordering#6387

Open
vlad-scherbich wants to merge 1 commit intomainfrom
vlad/system-tests-deflake-test_app_started_is_first_message
Open

[system_tests] Fix flaky test_app_started_is_first_message by using seq_id ordering#6387
vlad-scherbich wants to merge 1 commit intomainfrom
vlad/system-tests-deflake-test_app_started_is_first_message

Conversation

@vlad-scherbich
Copy link
Contributor

@vlad-scherbich vlad-scherbich commented Feb 26, 2026

https://datadoghq.atlassian.net/browse/PROF-13796

Motivation

test_telemetry.py::Test_Telemetry::test_app_started_is_first_message was flaky (e.g. 8 hits in dd-trace-py main CI, 2026). The test asserted that app-started is the first telemetry message, but it used capture order (log file order) instead of send order (seq_id). Capture order can be non-deterministic—when the first captured batch is not the first sent, the test fails.

Changes

  • test_app_started_is_first_message: Use seq_id as the source of truth for ordering. Flatten all lifecycle telemetry (single messages + message-batch payloads), skip non-lifecycle types (sketches, generate-metrics, logs, distributions), group by runtime_id, sort by (seq_id, batch_index), and assert per runtime that the first message is app-started. This removes dependence on capture/file order.

Reviewer checklist

  • Anything but tests/ or manifests/ is modified ? I have the approval from R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added, removed or renamed?

@github-actions
Copy link
Contributor

CODEOWNERS have been resolved as:

tests/test_telemetry.py                                                 @DataDog/libdatadog-telemetry @DataDog/apm-sdk-capabilities @DataDog/system-tests-core

@vlad-scherbich vlad-scherbich changed the title Fix flakiness by relying on send order (seq_id) vs capture order (log… [system_tests] Fix flaky test_app_started_is_first_message by using seq_id ordering Feb 27, 2026
@vlad-scherbich vlad-scherbich force-pushed the vlad/system-tests-deflake-test_app_started_is_first_message branch 4 times, most recently from eba115d to c14cbfb Compare February 27, 2026 15:49
@vlad-scherbich vlad-scherbich force-pushed the vlad/system-tests-deflake-test_app_started_is_first_message branch from c14cbfb to a418fd7 Compare February 27, 2026 16:05
@vlad-scherbich vlad-scherbich marked this pull request as ready for review February 27, 2026 16:32
@vlad-scherbich vlad-scherbich requested review from a team as code owners February 27, 2026 16:32
@vlad-scherbich vlad-scherbich requested review from a team, KowalskiThomas, pawelchcki and rachelyangdog and removed request for a team February 27, 2026 16:32
Copy link
Collaborator

@cbeauchesne cbeauchesne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From framework usage, AGTM.

But about the test logic, I'm a little bit concerned by the fact we are using seq_id as source of truth. The previous logic was using the capture order, and in consequence, there is an assertion that seq_id is coherent with that order.

I understand that at some point we may hit some painful non-deterministic behavior. But on the other hand, this assertion may be quite useful : are we 100% that a incoherent seq_id order has no effect on backend ?

Is there a way to keep a certain degree of assertion on this order?

This point is not my decision to make, but do you mind getting a review from someone familiar with the feature, to ensure the new test logic does not open the door to a downgrade of those assertions ?

@vlad-scherbich
Copy link
Contributor Author

From framework usage, AGTM.

But about the test logic, I'm a little bit concerned by the fact we are using seq_id as source of truth. The previous logic was using the capture order, and in consequence, there is an assertion that seq_id is coherent with that order.

I understand that at some point we may hit some painful non-deterministic behavior. But on the other hand, this assertion may be quite useful : are we 100% that a incoherent seq_id order has no effect on backend ?

Is there a way to keep a certain degree of assertion on this order?

This point is not my decision to make, but do you mind getting a review from someone familiar with the feature, to ensure the new test logic does not open the door to a downgrade of those assertions ?

Thanks @cbeauchesne for catching this potential issue. I'm not sure who the feature owner might be:
@pawelchcki , @paullegranddc , @rachelyangdog - would you happen to know?

@mabdinur
Copy link
Contributor

mabdinur commented Mar 9, 2026

From framework usage, AGTM.
But about the test logic, I'm a little bit concerned by the fact we are using seq_id as source of truth. The previous logic was using the capture order, and in consequence, there is an assertion that seq_id is coherent with that order.
I understand that at some point we may hit some painful non-deterministic behavior. But on the other hand, this assertion may be quite useful : are we 100% that a incoherent seq_id order has no effect on backend ?
Is there a way to keep a certain degree of assertion on this order?
This point is not my decision to make, but do you mind getting a review from someone familiar with the feature, to ensure the new test logic does not open the door to a downgrade of those assertions ?

Thanks @cbeauchesne for catching this potential issue. I'm not sure who the feature owner might be: @pawelchcki , @paullegranddc , @rachelyangdog - would you happen to know?

seq_id is not used by the backend. It is mainly used to track and debug telemetry data.

Previously, we reset the seq_id whenever a process forked. Because of that, we had to rely on capture order to estimate when a payload was generated. This worked since payloads were sent in FIFO order, but as of a few weeks ago, that’s no longer the case (a child process can now submit telemetry before it's parent and this is the source of the current flakiness).

We recently changed this behavior so that seq_id is no longer reset on fork. As a result, we can now reliably use seq_id to determine which telemetry payload was generated first.

@cbeauchesne
Copy link
Collaborator

We recently changed this behavior so that seq_id is no longer reset on fork.

My question is about this point : seq_id now become the source of truth, rather timestamps. In consequence, we don't test it.

So it means that if at some point someone breaks this behavior (for instance, an error is pushed that reset again seq_id on forks), system-tests won't see it.

Just to be clear, I'm not opposing to this change, but I just wanted to be sure that it's an educated choice, and not a new loophole.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants