feat(openfeature): add flag evaluation tracking via OTel Metrics #4489
Conversation
Count feature flag evaluations as custom metrics using the OTel Metrics API. The OTel SDK handles aggregation; metrics export to the Datadog agent via OTLP; the agent forwards to the Metrics Platform.

Metric: `feature_flag.evaluations` (Int64Counter, delta temporality)
Attributes: `feature_flag.key`, `feature_flag.provider.name`, `feature_flag.result.variant`, `feature_flag.result.reason`, `error.type`
Gated by `DD_METRICS_OTEL_ENABLED=true` (noop otherwise).
Benchmarks

Benchmark execution time: 2026-03-03 16:13:42. Comparing candidate commit 7a57d24 in PR branch. Found 0 performance improvements and 0 performance regressions! Performance is the same for 155 metrics, 9 unstable metrics.
The OpenFeature Reason constants (TARGETING_MATCH, DEFAULT, DISABLED, ERROR) just need lowercasing for metric attributes. The explicit switch mapping was unnecessary indirection.
Always "Datadog" — adds no value as a tag dimension since this metric is only emitted by our own provider.
The switch was a mechanical mapping from sentinel errors to snake_case strings. A declarative map is clearer and eliminates the indirection.
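For illustration, a minimal sketch of that pattern — the sentinel error names here are hypothetical stand-ins for the package's own:

```go
import "errors"

// Hypothetical sentinel errors; placeholders for the package's real ones.
var (
	errFlagNotFound = errors.New("flag not found")
	errTypeMismatch = errors.New("type mismatch")
)

// errorTypeTags declaratively maps sentinel errors to low-cardinality
// snake_case values for the error.type attribute.
var errorTypeTags = map[error]string{
	errFlagNotFound: "flag_not_found",
	errTypeMismatch: "type_mismatch",
}

// errorTypeTag resolves an evaluation error to its tag, falling back to
// "general" for anything unrecognized.
func errorTypeTag(err error) string {
	for sentinel, tag := range errorTypeTags {
		if errors.Is(err, sentinel) {
			return tag
		}
	}
	return "general"
}
```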
```go
// Record flag evaluation metric
if p.flagEvalMetrics != nil {
	p.flagEvalMetrics.record(ctx, flagKey, res.VariantKey,
		strings.ToLower(string(res.Reason)), res.Error)
}
```
nitpick: having 3 string parameters is error-prone. It would be cleaner if record() accepted evaluationResult:
Suggested change:

```go
// before
// Record flag evaluation metric
if p.flagEvalMetrics != nil {
	p.flagEvalMetrics.record(ctx, flagKey, res.VariantKey,
		strings.ToLower(string(res.Reason)), res.Error)
}
```

```go
// after
// Record flag evaluation metric
if p.flagEvalMetrics != nil {
	p.flagEvalMetrics.record(ctx, flagKey, res)
}
```
Thanks, agreed with that!
Remove the ownsProvider field from flagEvalMetrics since it was a test-only knob that made the shutdown test a no-op. ddmetric.Shutdown() already handles noop and SDK providers gracefully, so the guard is unnecessary. Remove TestShutdownClean which was meaningless because setupTestMetrics set ownsProvider=false, causing shutdown() to skip all real work.
Move metric recording from a defer in evaluate() to an OpenFeature Finally hook. The old approach missed type conversion errors (e.g., calling BooleanValue on a string flag) and "not ready" state evaluations because those happen after evaluate() returns. The Finally hook fires after ALL evaluation logic completes, including type-specific conversions in BooleanEvaluation/StringEvaluation/etc., so it captures the full picture. Also simplify record() to accept InterfaceEvaluationDetails instead of 3 separate string params, and use OpenFeature ErrorCode for error classification instead of matching against sentinel errors.
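A rough sketch of the hook shape, assuming a go-sdk version whose `Finally` signature includes the evaluation details; `flagEvalMetrics` and the `record(ctx, details)` call stand in for this PR's types:

```go
import (
	"context"

	"github.com/open-feature/go-sdk/openfeature"
)

// flagEvalHook records one count per evaluation from Finally, which runs
// after Before/After/Error and after type-specific conversions, so type
// mismatch and "not ready" outcomes are captured too.
type flagEvalHook struct {
	openfeature.UnimplementedHook // no-op Before/After/Error
	metrics *flagEvalMetrics      // stand-in for this PR's metrics type
}

func (h *flagEvalHook) Finally(ctx context.Context, hctx openfeature.HookContext,
	details openfeature.InterfaceEvaluationDetails, hints openfeature.HookHints) {
	if h.metrics != nil {
		h.metrics.record(ctx, details) // simplified record(), per this commit
	}
}
```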
Addressed all review comments.
System tests

Also updated system-tests in DataDog/system-tests#6410.

Local test results

All 18 FFE system tests pass (previously 17, +1 new type mismatch test). Unit tests also pass.
/merge
Motivation
Per the RFC "Flag evaluations tracking for APM tracers" (Oleksii Shmalko, 2026-01-20): we want to collect a metric for flag evaluations to track usage of flags. This data powers the FFE product's change tracking ("which services evaluate this flag?") and usage analytics.
The RFC evaluated 6 alternatives (tracer-side aggregation via EVP, custom agent aggregation, Metrics Platform, reuse agent pipeline with custom intake, OTel Events) and recommends the Metrics Platform approach: implement flag evaluations tracking as regular custom metrics sent via the OpenTelemetry Metrics API. Tracers aggregate metrics via OTel, send aggregated metrics to the agent via OTLP, and the agent sends metrics to Metrics Platform. This approach has the lowest SDK team effort, requires no backend changes, requires no agent changes, and has good performance.
Key RFC constraints:

- Metrics are aggregated in-process by the OTel SDK and exported with delta temporality to the agent over OTLP (see the sketch after this list).
- Only low-cardinality attributes (flag key, variant, reason, error type); no targeting keys or evaluation contexts.
- Collection is gated behind `DD_METRICS_OTEL_ENABLED`, so the default path adds zero overhead.
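For concreteness, a sketch of an OTel SDK pipeline with that shape (periodic OTLP/HTTP export, delta temporality). dd-trace-go's `ddmetric` support wraps this kind of wiring; the endpoint and interval here are illustrative:

```go
import (
	"context"
	"time"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/metric/metricdata"
)

// newDeltaMeterProvider builds a MeterProvider that exports
// delta-temporality metrics to a local agent over OTLP/HTTP every 10s.
func newDeltaMeterProvider(ctx context.Context) (*sdkmetric.MeterProvider, error) {
	exp, err := otlpmetrichttp.New(ctx,
		otlpmetrichttp.WithEndpoint("localhost:4318"), // agent OTLP/HTTP port
		otlpmetrichttp.WithInsecure(),
		otlpmetrichttp.WithTemporalitySelector(
			func(sdkmetric.InstrumentKind) metricdata.Temporality {
				return metricdata.DeltaTemporality
			}),
	)
	if err != nil {
		return nil, err
	}
	return sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(
			sdkmetric.NewPeriodicReader(exp, sdkmetric.WithInterval(10*time.Second))),
	), nil
}
```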
Changes
- `openfeature/flageval_metrics.go`: Creates a dedicated `MeterProvider` via dd-trace-go's OTel metrics support (`ddmetric.NewMeterProvider()`). Defines an `Int64Counter` instrument (`feature_flag.evaluations`, delta temporality, 10s export interval). Provides `record()` to emit the metric with attributes `feature_flag.key`, `feature_flag.result.variant`, `feature_flag.result.reason`, and `error.type` (on error). Error classification uses a declarative `errorTypeTags` map from sentinel errors to low-cardinality strings (sketched after this list).
- `openfeature/provider.go`: Added `flagEvalMetrics` field to `DatadogProvider`. Wired into `newDatadogProvider()` (creates metrics on init), `evaluate()` (records the metric via defer after every evaluation, reason lowercased directly from OpenFeature constants), and `ShutdownWithContext()` (graceful meter provider shutdown).
- `openfeature/flageval_metrics_test.go`: Table-driven unit tests using the OTel SDK `ManualReader` for in-memory metric collection. Covers success/error/default/disabled attributes, multiple-evaluation aggregation, different flag series, all error types, and integration with `evaluate()`.
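A rough sketch of the instrument setup and record path described in the first bullet; names are illustrative, not the exact code:

```go
import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// errorTypeTags maps sentinel evaluation errors to low-cardinality tags
// (populated elsewhere; shown empty to keep this sketch self-contained).
var errorTypeTags = map[error]string{}

type flagEvalMetrics struct {
	evaluations metric.Int64Counter
}

func newFlagEvalMetrics(mp metric.MeterProvider) (*flagEvalMetrics, error) {
	ctr, err := mp.Meter("openfeature").Int64Counter("feature_flag.evaluations",
		metric.WithDescription("count of feature flag evaluations"))
	if err != nil {
		return nil, err
	}
	return &flagEvalMetrics{evaluations: ctr}, nil
}

func (m *flagEvalMetrics) record(ctx context.Context, key, variant, reason string, evalErr error) {
	attrs := []attribute.KeyValue{
		attribute.String("feature_flag.key", key),
		attribute.String("feature_flag.result.variant", variant),
		attribute.String("feature_flag.result.reason", reason),
	}
	if evalErr != nil {
		tag := errorTypeTags[evalErr]
		if tag == "" {
			tag = "general" // fallback for unrecognized errors
		}
		attrs = append(attrs, attribute.String("error.type", tag))
	}
	m.evaluations.Add(ctx, 1, metric.WithAttributes(attrs...))
}
```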
Decisions

- Noop when `DD_METRICS_OTEL_ENABLED` is not `true` — zero overhead when disabled.
- Attributes are limited to `feature_flag.key`, `feature_flag.result.variant`, `feature_flag.result.reason`, and `error.type`. High-cardinality attributes (targeting_key, context, allocation) are explicitly excluded per the RFC to avoid blowing up custom metric cardinality. `feature_flag.provider.name` is also excluded — always "Datadog", adds no value.
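A sketch of the gating, reusing the illustrative names from the sketch under Changes; the constructor name and the `ddmetric.NewMeterProvider()` signature are assumptions:

```go
import "os"

// maybeNewFlagEvalMetrics returns nil unless metrics are explicitly
// enabled; call sites guard on nil, so the disabled path does no work.
func maybeNewFlagEvalMetrics() (*flagEvalMetrics, error) {
	if os.Getenv("DD_METRICS_OTEL_ENABLED") != "true" {
		return nil, nil
	}
	// ddmetric.NewMeterProvider() per the Changes section; its exact
	// signature may differ from this sketch.
	return newFlagEvalMetrics(ddmetric.NewMeterProvider())
}
```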
Enabling OTLP in production / dogfooding

The following is needed on the deployment side to receive these metrics:
- Application: set `DD_METRICS_OTEL_ENABLED=true` and `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://<agent-host>:4318/v1/metrics`.
- Agent: `DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT` doesn't properly nest the config — you need to mount a `datadog.yaml` with the nested YAML structure (sketched after this list).
- `pid: host` to avoid "failed to register process metrics: process does not exist", which crashes the OTLP pipeline.
- Dogfooding branch: https://github.com/DataDog/ffe-dogfooding/tree/leo.romanovsky/flageval-metrics-dogfooding
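A sketch of that nested structure; verify the keys against your agent version:

```yaml
# datadog.yaml (mounted into the agent container); sketch, values illustrative
otlp_config:
  receiver:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
```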
Dogfooding evidence
Metric `feature_flag.evaluations` confirmed registered in Datadog (Eppo org, datadoghq.com) with the Go dogfooding app running dd-trace-go v2.7.0-dev.1. Metric metadata confirmed in Datadog.
Local test evidence
Unit tests (all pass)
System tests (all 17 FFE tests pass — 0 regressions)
Companion PRs

- DataDog/system-tests#6410: FFE system tests covering the new metric