feat(ffe): add flag evaluation metrics E2E tests for Go#6410

Open

leoromanovsky wants to merge 13 commits into main from leo.romanovsky/ffe-eval-metrics

Conversation


@leoromanovsky leoromanovsky commented Mar 2, 2026

Motivation

Per the RFC "Flag evaluations tracking for APM tracers" (Oleksii Shmalko, 2026-01-20), we want to collect a metric for flag evaluations to track flag usage. The companion dd-trace-go PR implements the feature_flag.evaluations OTel metric in the OpenFeature provider. These system tests validate the end-to-end pipeline: flag evaluation in the Go weblog → OTel SDK aggregation → OTLP export to the agent → agent forwards to the backend (proxy) → system tests capture and assert.

Changes

  • Modified utils/_context/_scenarios/__init__.py: Added DD_METRICS_OTEL_ENABLED=true and OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://agent:4318/v1/metrics to the FFE scenario weblog env. The OTLP endpoint points directly to the agent container's OTLP receiver (port 4318), since the proxy does not have an OTLP listener in this scenario.
  • New tests/ffe/test_flag_eval_metrics.py: 4 E2E test classes:
    • Test_FFE_Eval_Metric_Basic: Verifies metric exists with correct tags (feature_flag.key, feature_flag.provider.name, feature_flag.result.variant, feature_flag.result.reason)
    • Test_FFE_Eval_Metric_Count: Evaluates same flag 5 times, verifies aggregated metric count >= 5
    • Test_FFE_Eval_Metric_Different_Flags: Evaluates two flags, verifies separate metric series per feature_flag.key
    • Test_FFE_Eval_Metric_Error: Evaluates non-existent flag, verifies feature_flag.result.reason=error and error.type=flag_not_found
  • Modified manifests/golang.yml: Added tests/ffe/test_flag_eval_metrics.py: v2.7.0-dev to enable new tests for Go.
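The scenario change can be sketched as follows. The two environment variables and their values are taken from the PR description; the surrounding dict is a hypothetical stand-in for the real structure in utils/_context/_scenarios/__init__.py:

```python
# Sketch of the FFE scenario weblog environment addition. The dict shape is
# hypothetical; the variable names and values are from the PR description.
weblog_env = {
    # Enable OTel metrics export in dd-trace-go.
    "DD_METRICS_OTEL_ENABLED": "true",
    # Point OTLP metric export directly at the agent's OTLP receiver
    # (port 4318), since the proxy has no OTLP listener in this scenario.
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "http://agent:4318/v1/metrics",
}
```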

Decisions

  • OTLP endpoint to agent: The FFE scenario's weblog uses DD_TRACE_AGENT_URL pointing to the proxy, but the proxy doesn't have an OTLP receiver. We set OTEL_EXPORTER_OTLP_METRICS_ENDPOINT directly to agent:4318/v1/metrics to bypass the proxy for metric export. The agent then forwards processed metrics to the proxy via /api/v2/series, where system tests capture them via interfaces.agent.get_metrics().
  • 25s sleep for pipeline latency: The OTLP pipeline (10s OTel SDK export + 10s agent flush + buffer) requires waiting for metrics to propagate. Each test setup includes a 25s sleep after flag evaluations.
  • Metrics captured via existing agent interface: Uses interfaces.agent.get_metrics() filtering /api/v2/series — no proxy changes needed.
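A minimal sketch of how such a test could assert on the captured series. interfaces.agent.get_metrics(), the metric name, and the tag names come from the PR; the series payload shape and the helper below are assumptions:

```python
# Hypothetical shape of a captured /api/v2/series entry; the real payload
# returned by interfaces.agent.get_metrics() may differ.
captured_series = [
    {
        "metric": "feature_flag.evaluations",
        "tags": [
            "feature_flag.key:my-flag",
            "feature_flag.provider.name:datadog",
            "feature_flag.result.reason:targeting_match",
        ],
        "points": [{"value": 5}],
    },
]

def series_for_flag(series, flag_key):
    """Return all feature_flag.evaluations series tagged with the given key."""
    return [
        s
        for s in series
        if s["metric"] == "feature_flag.evaluations"
        and f"feature_flag.key:{flag_key}" in s["tags"]
    ]

# Mirrors Test_FFE_Eval_Metric_Count: aggregated count must be >= 5.
matching = series_for_flag(captured_series, "my-flag")
assert matching, "feature_flag.evaluations series not found"
assert sum(p["value"] for s in matching for p in s["points"]) >= 5
```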

Local test evidence

System tests (all 17 FFE tests pass — 0 regressions)

Scenario: FEATURE_FLAGGING_AND_EXPERIMENTATION
Library: golang@2.7.0-dev.1

tests/ffe/test_dynamic_evaluation.py ..                                  [ 11%]
tests/ffe/test_exposures.py ...........                                  [ 76%]
tests/ffe/test_flag_eval_metrics.py ....                                 [100%]

=============== 17 passed, 2224 deselected in 228.93s (0:03:48) ================

Companion PR

Add system tests validating the feature_flag.evaluations OTel metric
emitted by dd-trace-go's OpenFeature provider.

- Enable DD_METRICS_OTEL_ENABLED and OTLP endpoint in FFE scenario
- 4 test cases: basic metric, count, different flags, error tags
- Update Go manifest for new test file

github-actions bot commented Mar 2, 2026

CODEOWNERS have been resolved as:

tests/ffe/test_flag_eval_metrics.py                                     @DataDog/feature-flagging-and-experimentation-sdk @DataDog/system-tests-core
manifests/cpp_httpd.yml                                                 @DataDog/dd-trace-cpp
manifests/cpp_kong.yml                                                  @DataDog/system-tests-core
manifests/cpp_nginx.yml                                                 @DataDog/dd-trace-cpp
manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/php.yml                                                       @DataDog/apm-php @DataDog/asm-php
manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby
manifests/rust.yml                                                      @DataDog/apm-rust
utils/_context/_scenarios/__init__.py                                   @DataDog/system-tests-core
utils/build/docker/golang/app/_shared/common/ffe.go                     @DataDog/dd-trace-go-guild @DataDog/system-tests-core

Attribute dropped from dd-trace-go — always "Datadog", adds no value.
…test

The Go weblog was calling ofClient.Object() for all evaluations,
ignoring the variationType field. This meant type conversion errors
could never occur, unlike Python/Node.js which dispatch to the
type-specific methods (BooleanValue, StringValue, etc.).

Fix the Go weblog to dispatch based on variationType, matching the
behavior of other language weblogs.
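The dispatch fix can be illustrated with a small Python sketch; the client class and method names below are hypothetical stand-ins for the type-specific OpenFeature calls (BooleanValue, StringValue, etc. in Go):

```python
# Hypothetical stand-in for an OpenFeature client; the method names mirror
# the type-specific calls mentioned above.
class FakeOpenFeatureClient:
    def boolean_value(self, key, default):
        return default

    def string_value(self, key, default):
        return default

    def object_value(self, key, default):
        return default

# Dispatch on the request's variationType instead of always calling the
# object-typed method, so type-conversion errors can actually occur.
DISPATCH = {
    "BOOLEAN": ("boolean_value", False),
    "STRING": ("string_value", ""),
}

def evaluate(client, key, variation_type):
    method_name, default = DISPATCH.get(variation_type, ("object_value", None))
    return getattr(client, method_name)(key, default)
```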

Add Test_FFE_Eval_Metric_Type_Mismatch: configures a STRING flag but
evaluates it as BOOLEAN, triggering a type conversion error that
happens after the core evaluate() returns. This test would fail with
the old evaluate()-level metric recording (which would see
targeting_match / no error) and only passes when metrics are recorded
via a Finally hook (which sees error / type_mismatch).
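The timing argument above can be sketched in Python: the type conversion runs after the core evaluate() returns, so only a finally-style hook observes the conversion error. Everything below (function names, result shape, tag values) is illustrative, not the dd-trace-go implementation:

```python
# Core evaluation succeeds: the flag exists and targeting matched, so a
# metric recorded here would report reason=targeting_match with no error.
def evaluate(key):
    return {"value": "on", "reason": "targeting_match"}

def evaluate_boolean(key, record_metric):
    result = evaluate(key)
    try:
        # The type conversion happens after evaluate() has returned.
        if not isinstance(result["value"], bool):
            result = {"reason": "error", "error_type": "type_mismatch"}
            raise TypeError("expected boolean variant")
        return result["value"]
    finally:
        # A finally-style hook observes the final outcome, including the
        # conversion error that evaluate()-level recording would miss.
        record_metric(result)
```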
Add type annotations to module-level helper functions and move
boolean default to keyword-only argument to satisfy ruff ANN001
and FBT002 rules.
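A hypothetical before/after of that lint fix (ANN001 flags missing parameter annotations; FBT002 flags a boolean default taken positionally); the function below is illustrative, not the actual helper:

```python
# Before (flagged by ruff): untyped parameters, boolean default accepted
# as a positional argument.
#   def make_request(path, body, strict=False): ...

# After: fully annotated, with the boolean default forced to keyword-only
# via the bare `*` separator.
def make_request(path: str, body: dict, *, strict: bool = False) -> str:
    return f"{path}?strict={strict}"
```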

datadog-datadog-prod-us1 bot commented Mar 3, 2026

⚠️ Tests


⚠️ Warnings

🧪 11 Tests failed

tests.debugger.test_debugger_symdb.Test_Debugger_SymDb.test_symdb_upload[chi] from system_tests_suite (Datadog)
ValueError: No scope containing debugger controller with scope_type CLASS or MODULE was found in the symbols

self = <tests.debugger.test_debugger_symdb.Test_Debugger_SymDb object at 0x7f6a679e0b60>

    def test_symdb_upload(self):
>       self._assert()

tests/debugger/test_debugger_symdb.py:90: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/debugger/test_debugger_symdb.py:24: in _assert
...
tests.test_config_consistency.Test_Config_UnifiedServiceTagging_CustomService.test_specified_service_name[haproxy-spoa] from system_tests_suite (Datadog)
ValueError: No trace has been found for request PVXJHNBEKAGEEUSOSAPFKJXCHTAZPVCOIPDS

self = <tests.test_config_consistency.Test_Config_UnifiedServiceTagging_CustomService object at 0x7f5b4c771640>

    def test_specified_service_name(self):
>       interfaces.library.assert_trace_exists(self.r)

tests/test_config_consistency.py:390: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

...
tests.test_standard_tags.Test_StandardTagsClientIp.test_client_ip[haproxy-spoa] from system_tests_suite (Datadog)
AssertionError: No root spans found

self = <tests.test_standard_tags.Test_StandardTagsClientIp object at 0x7f5b4c794500>

    def test_client_ip(self):
        """Test http.client_ip is always reported in the default scenario which has ASM enabled"""
>       meta = self._get_root_span_meta(self.request_with_attack)

tests/test_standard_tags.py:264: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
...
View all

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 41ac7f3

Only Go supports flag evaluation metrics via OTel so far. Without this,
the test file runs for all FFE-enabled languages and fails.
@leoromanovsky leoromanovsky marked this pull request as ready for review March 3, 2026 22:35
@leoromanovsky leoromanovsky requested review from a team as code owners March 3, 2026 22:35
@leoromanovsky leoromanovsky requested review from brettlangdon and removed request for a team March 3, 2026 22:35

@brettlangdon brettlangdon left a comment


manifests/python.yml lgtm

@xlamorlette-datadog xlamorlette-datadog removed their request for review March 9, 2026 09:44
Replace hardcoded time.sleep(25) in each test setup with
agent_interface_timeout=30 on the FFE scenario. The container
shutdown flushes metrics; the timeout gives the agent time to
receive and process them.
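That change can be sketched as follows; the Scenario class below is a hypothetical stand-in for the real FFE scenario definition, and only the agent_interface_timeout value comes from the commit message:

```python
# Hypothetical stand-in for the FFE scenario definition.
class Scenario:
    def __init__(self, name: str, agent_interface_timeout: int = 5) -> None:
        self.name = name
        # Gives the agent time to receive and process the metrics flushed
        # during container shutdown, replacing per-test time.sleep(25).
        self.agent_interface_timeout = agent_interface_timeout

ffe_scenario = Scenario(
    "FEATURE_FLAGGING_AND_EXPERIMENTATION",
    agent_interface_timeout=30,
)
```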

@cbeauchesne cbeauchesne left a comment


Framework usage: all good!

But could you get a review from someone familiar with the tested feature?

@leoromanovsky
Contributor Author

Framework usage: all good!

But could you get a review from someone familiar with the tested feature?

Thanks yes I have asked FFE engineers to review as well before merging.

Assert that feature_flag.result.allocation_key tag is present
with value "default-allocation" on successful flag evaluations.
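A sketch of that assertion; the tag format mirrors the other feature_flag.* tags and the "default-allocation" value is from the commit message, while the flat tags list below is an assumed payload shape:

```python
# Assumed tags captured from one successful evaluation series.
tags = [
    "feature_flag.key:my-flag",
    "feature_flag.result.reason:targeting_match",
    "feature_flag.result.allocation_key:default-allocation",
]

def assert_allocation_key(tags: list, expected: str = "default-allocation") -> None:
    """Fail unless the allocation_key tag is present with the expected value."""
    assert f"feature_flag.result.allocation_key:{expected}" in tags
```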