Add deployment name and build ID labels to backlog metrics#9705
Conversation
…ild_id labels to backlog metrics
The existing worker_version label encodes deployment name and build ID as a
single colon-delimited string (deploymentName:buildId). This is incompatible
with Kubernetes label length/format requirements for HPA via prometheus-adapter.
This adds two new labels that decompose the same information into separate
fields, enabling downstream consumers to match on deployment name and build ID
individually.
Cardinality impact: none. The new labels are a 1:1 decomposition of the
existing worker_version value — every unique (deployment_name, build_id) tuple
maps to exactly one worker_version string. No new time series are created.
The same BreakdownMetricsByBuildID dynamic config gates all three labels:
- Gate off: all three collapse to "__versioned__" / "__unversioned__"
- Gate on: real values are emitted
Example on an unversioned queue:
approximate_backlog_count{
task_priority="0",
worker_version="__unversioned__",
temporal_worker_deployment_name="__unversioned__",
temporal_worker_deployment_build_id="__unversioned__"
}
Example on a V3 versioned queue (gate on):
approximate_backlog_count{
task_priority="0",
worker_version="myDeployment:build123",
temporal_worker_deployment_name="myDeployment",
temporal_worker_deployment_build_id="build123"
}
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sites

Ensures both fetchAndEmitLogicalBacklogMetrics and emitZeroLogicalBacklogForQueue derive deployment tags from the same string via the same function, preventing stale gauge values when V2 build IDs contain the ":" delimiter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
func parseDeploymentFromVersionKey(versionKey string) (deploymentName, buildID string) {
	if name, id, found := strings.Cut(versionKey, worker_versioning.WorkerDeploymentVersionDelimiter); found {
		return name, id
	}
	return "", ""
}
I know this is kinda ugly, sorry about that. Not sure if there is any other way here?
Thoughts?
DescribeTaskQueuePartition request and response should treat WDV as a first-class citizen and use a struct instead of the old build ID strings. But I can see that being its own separate PR. We should do it sooner rather than later though, IMO.
workerDeploymentName = "temporal_worker_deployment_name"
workerDeploymentBuildID = "temporal_worker_deployment_build_id"
We don't prefix any other metric tag, so we should not do so for this one.
Separately, I usually prefer shorter tag names; would deployment_name and build_id be enough?
Dustin, the external metrics PM, suggested the temporal_worker_* prefix for these labels, since they are user facing. For reference, in external metrics we have the following labels with the temporal prefix: temporal_namespace, temporal_account, temporal_activity_type, temporal_task_queue, temporal_workflow_type; and these labels without the prefix: worker_version, operation, timeout_type, task_type, task_priority, is_background, namespace_mode.
Unfortunately we have to make a call on what to merge tonight, and Brandon + Dustin approved temporal_worker_deployment_name and temporal_worker_deployment_build_id, so I'm going to go with those.
Drop the temporal_worker_ prefix to be consistent with other metric tags in the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ent_build_id as agreed by PMs
Summary

- Adds two new labels (temporal_worker_deployment_name and temporal_worker_deployment_build_id) to approximate_backlog_count and related backlog metrics
- Decomposes the existing worker_version label (which encodes deploymentName:buildId as a single string) into two separate fields

Cardinality impact

None. The new labels are a 1:1 decomposition of the existing worker_version value — every unique (deployment_name, build_id) tuple maps to exactly one worker_version string. No new time series are created.

The same BreakdownMetricsByBuildID dynamic config gates all three labels:

| worker_version | temporal_worker_deployment_name | temporal_worker_deployment_build_id |
|---|---|---|
| __unversioned__ | __unversioned__ | __unversioned__ |
| __versioned__ | __versioned__ | __versioned__ |
| myDeploy:build1 | myDeploy | build1 |

Files changed

- common/metrics/tags.go — New tag constants and constructor functions
- service/matching/physical_task_queue_manager.go — Tags added to handler (propagates to db.go emission sites)
- service/matching/task_queue_partition_manager.go — Tags added to logical backlog emission sites + parseDeploymentFromVersionKey helper

Test plan
🤖 Generated with Claude Code
Note

Low risk: changes are limited to metrics tagging and are gated by the existing BreakdownMetricsByBuildID flag, but could affect downstream metric queries/dashboards expecting the prior label set.

Overview
Adds two new metrics tags, temporal_worker_deployment_name and temporal_worker_deployment_build_id, alongside worker_version.

Propagates these tags through matching task queue manager/backlog metric emission (including logical backlog and unload zeroing), extracting the deployment name and build ID from the version key when applicable and updating the affected matching metrics test expectations.
Written by Cursor Bugbot for commit 87a805d.