feat(ffe): add flag evaluation metrics E2E tests for Go#6410

Open

leoromanovsky wants to merge 13 commits into main from leo.romanovsky/ffe-eval-metrics

Conversation


@leoromanovsky leoromanovsky commented Mar 2, 2026

Motivation

Per the RFC "Flag evaluations tracking for APM tracers" (Oleksii Shmalko, 2026-01-20), we want to collect a metric for flag evaluations to track flag usage. The companion dd-trace-go PR implements the feature_flag.evaluations OTel metric in the OpenFeature provider. These system tests validate the end-to-end pipeline: flag evaluation in the Go weblog → OTel SDK aggregation → OTLP export to the agent → agent forwards to the backend (proxy) → system tests capture and assert.

Changes

  • Modified utils/_context/_scenarios/__init__.py: Added DD_METRICS_OTEL_ENABLED=true and OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://agent:4318/v1/metrics to the FFE scenario weblog env. The OTLP endpoint points directly to the agent container's OTLP receiver (port 4318), since the proxy does not have an OTLP listener in this scenario.
  • New tests/ffe/test_flag_eval_metrics.py: 4 E2E test classes:
    • Test_FFE_Eval_Metric_Basic: Verifies metric exists with correct tags (feature_flag.key, feature_flag.provider.name, feature_flag.result.variant, feature_flag.result.reason)
    • Test_FFE_Eval_Metric_Count: Evaluates same flag 5 times, verifies aggregated metric count >= 5
    • Test_FFE_Eval_Metric_Different_Flags: Evaluates two flags, verifies separate metric series per feature_flag.key
    • Test_FFE_Eval_Metric_Error: Evaluates non-existent flag, verifies feature_flag.result.reason=error and error.type=flag_not_found
  • Modified manifests/golang.yml: Added tests/ffe/test_flag_eval_metrics.py: v2.7.0-dev to enable new tests for Go.
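The scenario change can be sketched as follows. The two environment variables and their values are taken from the PR description; the surrounding dict is a hypothetical stand-in for the real structure in utils/_context/_scenarios/__init__.py:

```python
# Sketch of the FFE scenario weblog environment addition. The dict shape is
# hypothetical; the variable names and values are from the PR description.
weblog_env = {
    # Enable OTel metrics export in dd-trace-go.
    "DD_METRICS_OTEL_ENABLED": "true",
    # Point OTLP metric export directly at the agent's OTLP receiver
    # (port 4318), since the proxy has no OTLP listener in this scenario.
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT": "http://agent:4318/v1/metrics",
}
```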

Decisions

  • OTLP endpoint to agent: The FFE scenario's weblog uses DD_TRACE_AGENT_URL pointing to the proxy, but the proxy doesn't have an OTLP receiver. We set OTEL_EXPORTER_OTLP_METRICS_ENDPOINT directly to agent:4318/v1/metrics to bypass the proxy for metric export. The agent then forwards processed metrics to the proxy via /api/v2/series, where system tests capture them via interfaces.agent.get_metrics().
  • 25s sleep for pipeline latency: The OTLP pipeline (10s OTel SDK export + 10s agent flush + buffer) requires waiting for metrics to propagate. Each test setup includes a 25s sleep after flag evaluations.
  • Metrics captured via existing agent interface: Uses interfaces.agent.get_metrics() filtering /api/v2/series — no proxy changes needed.
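A minimal sketch of how such a test could assert on the captured series. interfaces.agent.get_metrics(), the metric name, and the tag names come from the PR; the series payload shape and the helper below are assumptions:

```python
# Hypothetical shape of a captured /api/v2/series entry; the real payload
# returned by interfaces.agent.get_metrics() may differ.
captured_series = [
    {
        "metric": "feature_flag.evaluations",
        "tags": [
            "feature_flag.key:my-flag",
            "feature_flag.provider.name:datadog",
            "feature_flag.result.reason:targeting_match",
        ],
        "points": [{"value": 5}],
    },
]

def series_for_flag(series, flag_key):
    """Return all feature_flag.evaluations series tagged with the given key."""
    return [
        s
        for s in series
        if s["metric"] == "feature_flag.evaluations"
        and f"feature_flag.key:{flag_key}" in s["tags"]
    ]

# Mirrors Test_FFE_Eval_Metric_Count: aggregated count must be >= 5.
matching = series_for_flag(captured_series, "my-flag")
assert matching, "feature_flag.evaluations series not found"
assert sum(p["value"] for s in matching for p in s["points"]) >= 5
```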

Local test evidence

System tests (all 17 FFE tests pass — 0 regressions)

Scenario: FEATURE_FLAGGING_AND_EXPERIMENTATION
Library: golang@2.7.0-dev.1

tests/ffe/test_dynamic_evaluation.py ..                                  [ 11%]
tests/ffe/test_exposures.py ...........                                  [ 76%]
tests/ffe/test_flag_eval_metrics.py ....                                 [100%]

=============== 17 passed, 2224 deselected in 228.93s (0:03:48) ================

Companion PR

Add system tests validating the feature_flag.evaluations OTel metric
emitted by dd-trace-go's OpenFeature provider.

- Enable DD_METRICS_OTEL_ENABLED and OTLP endpoint in FFE scenario
- 4 test cases: basic metric, count, different flags, error tags
- Update Go manifest for new test file

github-actions bot commented Mar 2, 2026

CODEOWNERS have been resolved as:

tests/ffe/test_flag_eval_metrics.py                                     @DataDog/feature-flagging-and-experimentation-sdk @DataDog/system-tests-core
manifests/cpp_httpd.yml                                                 @DataDog/dd-trace-cpp
manifests/cpp_kong.yml                                                  @DataDog/system-tests-core
manifests/cpp_nginx.yml                                                 @DataDog/dd-trace-cpp
manifests/dotnet.yml                                                    @DataDog/apm-dotnet @DataDog/asm-dotnet
manifests/golang.yml                                                    @DataDog/dd-trace-go-guild
manifests/java.yml                                                      @DataDog/asm-java @DataDog/apm-java
manifests/nodejs.yml                                                    @DataDog/dd-trace-js
manifests/php.yml                                                       @DataDog/apm-php @DataDog/asm-php
manifests/python.yml                                                    @DataDog/apm-python @DataDog/asm-python
manifests/ruby.yml                                                      @DataDog/ruby-guild @DataDog/asm-ruby
manifests/rust.yml                                                      @DataDog/apm-rust
utils/_context/_scenarios/__init__.py                                   @DataDog/system-tests-core
utils/build/docker/golang/app/_shared/common/ffe.go                     @DataDog/dd-trace-go-guild @DataDog/system-tests-core

Attribute dropped from dd-trace-go — always "Datadog", adds no value.
…test

The Go weblog was calling ofClient.Object() for all evaluations,
ignoring the variationType field. This meant type conversion errors
could never occur, unlike Python/Node.js which dispatch to the
type-specific methods (BooleanValue, StringValue, etc.).

Fix the Go weblog to dispatch based on variationType, matching the
behavior of other language weblogs.
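The dispatch fix can be illustrated with a small Python sketch; the client class and method names below are hypothetical stand-ins for the type-specific OpenFeature calls (BooleanValue, StringValue, etc. in Go):

```python
# Hypothetical stand-in for an OpenFeature client; the method names mirror
# the type-specific calls mentioned above.
class FakeOpenFeatureClient:
    def boolean_value(self, key, default):
        return default

    def string_value(self, key, default):
        return default

    def object_value(self, key, default):
        return default

# Dispatch on the request's variationType instead of always calling the
# object-typed method, so type-conversion errors can actually occur.
DISPATCH = {
    "BOOLEAN": ("boolean_value", False),
    "STRING": ("string_value", ""),
}

def evaluate(client, key, variation_type):
    method_name, default = DISPATCH.get(variation_type, ("object_value", None))
    return getattr(client, method_name)(key, default)
```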

Add Test_FFE_Eval_Metric_Type_Mismatch: configures a STRING flag but
evaluates it as BOOLEAN, triggering a type conversion error that
happens after the core evaluate() returns. This test would fail with
the old evaluate()-level metric recording (which would see
targeting_match / no error) and only passes when metrics are recorded
via a Finally hook (which sees error / type_mismatch).
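The timing argument above can be sketched in Python: the type conversion runs after the core evaluate() returns, so only a finally-style hook observes the conversion error. Everything below (function names, result shape, tag values) is illustrative, not the dd-trace-go implementation:

```python
# Core evaluation succeeds: the flag exists and targeting matched, so a
# metric recorded here would report reason=targeting_match with no error.
def evaluate(key):
    return {"value": "on", "reason": "targeting_match"}

def evaluate_boolean(key, record_metric):
    result = evaluate(key)
    try:
        # The type conversion happens after evaluate() has returned.
        if not isinstance(result["value"], bool):
            result = {"reason": "error", "error_type": "type_mismatch"}
            raise TypeError("expected boolean variant")
        return result["value"]
    finally:
        # A finally-style hook observes the final outcome, including the
        # conversion error that evaluate()-level recording would miss.
        record_metric(result)
```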
Add type annotations to module-level helper functions and move
boolean default to keyword-only argument to satisfy ruff ANN001
and FBT002 rules.
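A hypothetical before/after of that lint fix (ANN001 flags missing parameter annotations; FBT002 flags a boolean default taken positionally); the function below is illustrative, not the actual helper:

```python
# Before (flagged by ruff): untyped parameters, boolean default accepted
# as a positional argument.
#   def make_request(path, body, strict=False): ...

# After: fully annotated, with the boolean default forced to keyword-only
# via the bare `*` separator.
def make_request(path: str, body: dict, *, strict: bool = False) -> str:
    return f"{path}?strict={strict}"
```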

datadog-datadog-prod-us1 bot commented Mar 3, 2026

⚠️ Tests


⚠️ Warnings

🧪 11 Tests failed

tests.debugger.test_debugger_symdb.Test_Debugger_SymDb.test_symdb_upload[chi] from system_tests_suite (Datadog)
ValueError: No scope containing debugger controller with scope_type CLASS or MODULE was found in the symbols

self = <tests.debugger.test_debugger_symdb.Test_Debugger_SymDb object at 0x7f6a679e0b60>

    def test_symdb_upload(self):
>       self._assert()

tests/debugger/test_debugger_symdb.py:90: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/debugger/test_debugger_symdb.py:24: in _assert
...
tests.test_config_consistency.Test_Config_UnifiedServiceTagging_CustomService.test_specified_service_name[haproxy-spoa] from system_tests_suite (Datadog)
ValueError: No trace has been found for request PVXJHNBEKAGEEUSOSAPFKJXCHTAZPVCOIPDS

self = <tests.test_config_consistency.Test_Config_UnifiedServiceTagging_CustomService object at 0x7f5b4c771640>

    def test_specified_service_name(self):
>       interfaces.library.assert_trace_exists(self.r)

tests/test_config_consistency.py:390: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

...
tests.test_standard_tags.Test_StandardTagsClientIp.test_client_ip[haproxy-spoa] from system_tests_suite (Datadog)
AssertionError: No root spans found

self = <tests.test_standard_tags.Test_StandardTagsClientIp object at 0x7f5b4c794500>

    def test_client_ip(self):
        """Test http.client_ip is always reported in the default scenario which has ASM enabled"""
>       meta = self._get_root_span_meta(self.request_with_attack)

tests/test_standard_tags.py:264: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
...
View all

ℹ️ Info

❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 41ac7f3

Only Go supports flag evaluation metrics via OTel so far. Without this,
the test file runs for all FFE-enabled languages and fails.
@leoromanovsky leoromanovsky marked this pull request as ready for review March 3, 2026 22:35
@leoromanovsky leoromanovsky requested review from a team as code owners March 3, 2026 22:35
@leoromanovsky leoromanovsky requested review from brettlangdon and removed request for a team March 3, 2026 22:35

@brettlangdon brettlangdon left a comment


manifests/python.yml lgtm

@xlamorlette-datadog xlamorlette-datadog removed their request for review March 9, 2026 09:44
Replace hardcoded time.sleep(25) in each test setup with
agent_interface_timeout=30 on the FFE scenario. The container
shutdown flushes metrics; the timeout gives the agent time to
receive and process them.
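That change can be sketched as follows; the Scenario class below is a hypothetical stand-in for the real FFE scenario definition, and only the agent_interface_timeout value comes from the commit message:

```python
# Hypothetical stand-in for the FFE scenario definition.
class Scenario:
    def __init__(self, name: str, agent_interface_timeout: int = 5) -> None:
        self.name = name
        # Gives the agent time to receive and process the metrics flushed
        # during container shutdown, replacing per-test time.sleep(25).
        self.agent_interface_timeout = agent_interface_timeout

ffe_scenario = Scenario(
    "FEATURE_FLAGGING_AND_EXPERIMENTATION",
    agent_interface_timeout=30,
)
```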

@cbeauchesne cbeauchesne left a comment


Framework usage: all good!

But could you get a review from someone familiar with the tested feature?

@leoromanovsky
Contributor Author

Framework usage: all good!

But could you get a review from someone familiar with the tested feature?

Thanks yes I have asked FFE engineers to review as well before merging.

Assert that feature_flag.result.allocation_key tag is present
with value "default-allocation" on successful flag evaluations.
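A sketch of that assertion; the tag format mirrors the other feature_flag.* tags and the "default-allocation" value is from the commit message, while the flat tags list below is an assumed payload shape:

```python
# Assumed tags captured from one successful evaluation series.
tags = [
    "feature_flag.key:my-flag",
    "feature_flag.result.reason:targeting_match",
    "feature_flag.result.allocation_key:default-allocation",
]

def assert_allocation_key(tags: list, expected: str = "default-allocation") -> None:
    """Fail unless the allocation_key tag is present with the expected value."""
    assert f"feature_flag.result.allocation_key:{expected}" in tags
```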