Add deployment name and build ID labels to backlog metrics#9705

Merged
carlydf merged 7 commits into main from
add-deployment-name-build-id-backlog-labels
Mar 27, 2026
Conversation


@Shivs11 Shivs11 commented Mar 26, 2026

Summary

  • Adds two new metric labels (temporal_worker_deployment_name and temporal_worker_deployment_build_id) to approximate_backlog_count and related backlog metrics
  • These decompose the existing worker_version label (which encodes deploymentName:buildId as a single string) into two separate fields
  • Enables downstream consumers like Kubernetes HPA via prometheus-adapter to match on deployment name and build ID individually, without hitting Kubernetes label length/format constraints

Cardinality impact

None. The new labels are a 1:1 decomposition of the existing worker_version value — every unique (deployment_name, build_id) tuple maps to exactly one worker_version string. No new time series are created.

The same BreakdownMetricsByBuildID dynamic config gates all three labels:

| Queue type | worker_version | temporal_worker_deployment_name | temporal_worker_deployment_build_id |
| --- | --- | --- | --- |
| Unversioned | `__unversioned__` | `""` | `""` |
| V3 versioned, gate off | `__versioned__` | `""` | `""` |
| V3 versioned, gate on | `myDeploy:build1` | `myDeploy` | `build1` |
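The gating behavior in the table can be sketched as a small Go program. This is illustrative only: `splitVersionKey` and the literal `":"` stand in for the PR's actual helper and the `worker_versioning.WorkerDeploymentVersionDelimiter` constant.

```go
package main

import (
	"fmt"
	"strings"
)

// splitVersionKey sketches the decomposition described above: a
// "deploymentName:buildId" worker_version value is cut on the first
// ":" into two labels. Sentinel values such as "__unversioned__" and
// "__versioned__" contain no delimiter, so both new labels fall back
// to the empty string, matching the table rows.
func splitVersionKey(workerVersion string) (deploymentName, buildID string) {
	if name, id, found := strings.Cut(workerVersion, ":"); found {
		return name, id
	}
	return "", ""
}

func main() {
	for _, v := range []string{"__unversioned__", "__versioned__", "myDeploy:build1"} {
		name, id := splitVersionKey(v)
		fmt.Printf("%s -> name=%q build=%q\n", v, name, id)
	}
}
```

Because the mapping is total and deterministic, every `worker_version` string yields exactly one `(name, id)` pair, which is why no new time series are created.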

Files changed

  • common/metrics/tags.go — New tag constants and constructor functions
  • service/matching/physical_task_queue_manager.go — Tags added to handler (propagates to db.go emission sites)
  • service/matching/task_queue_partition_manager.go — Tags added to logical backlog emission sites + parseDeploymentFromVersionKey helper

Test plan

  • Existing matching service tests pass (CI)
  • Verify labels appear in metrics output after deployment

🤖 Generated with Claude Code


Note

Low risk: changes are limited to metrics tagging and are gated by the existing BreakdownMetricsByBuildID flag, but could affect downstream metric queries/dashboards expecting the prior label set.

Overview
Adds two new metrics tags, temporal_worker_deployment_name and temporal_worker_deployment_build_id, alongside worker_version.

Propagates these tags through matching task queue manager/backlog metric emission (including logical backlog and unload zeroing), extracting deployment/build ID from the version key when applicable and updating the affected matching metrics test expectations.

Written by Cursor Bugbot for commit 87a805d.

…ild_id labels to backlog metrics

The existing worker_version label encodes deployment name and build ID as a
single colon-delimited string (deploymentName:buildId). This is incompatible
with Kubernetes label length/format requirements for HPA via prometheus-adapter.

This adds two new labels that decompose the same information into separate
fields, enabling downstream consumers to match on deployment name and build ID
individually.

Cardinality impact: none. The new labels are a 1:1 decomposition of the
existing worker_version value — every unique (deployment_name, build_id) tuple
maps to exactly one worker_version string. No new time series are created.
The same BreakdownMetricsByBuildID dynamic config gates all three labels:

- Gate off: all three collapse to "__versioned__" / "__unversioned__"
- Gate on: real values are emitted

Example on an unversioned queue:
  approximate_backlog_count{
    task_priority="0",
    worker_version="__unversioned__",
    temporal_worker_deployment_name="__unversioned__",
    temporal_worker_deployment_build_id="__unversioned__"
  }

Example on a V3 versioned queue (gate on):
  approximate_backlog_count{
    task_priority="0",
    worker_version="myDeployment:build123",
    temporal_worker_deployment_name="myDeployment",
    temporal_worker_deployment_build_id="build123"
  }

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Shivs11 Shivs11 requested review from a team as code owners March 26, 2026 21:54

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


…sites

Ensures both fetchAndEmitLogicalBacklogMetrics and emitZeroLogicalBacklogForQueue
derive deployment tags from the same string via the same function, preventing
stale gauge values when V2 build IDs contain the ":" delimiter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +1302 to +1308
func parseDeploymentFromVersionKey(versionKey string) (deploymentName, buildID string) {
	if name, id, found := strings.Cut(versionKey, worker_versioning.WorkerDeploymentVersionDelimiter); found {
		return name, id
	}
	return "", ""
}

Member Author

I know this is kinda ugly, sorry about this. not sure if there is any other way here?

thoughts?

Contributor

DescribeTaskQueuePartition request and response should treat WDV as a first-class citizen and use a struct instead of the old build ID strings. But I can see that being its own separate PR. We should do it sooner rather than later though, IMO.

Comment on lines +40 to +41
workerDeploymentName = "temporal_worker_deployment_name"
workerDeploymentBuildID = "temporal_worker_deployment_build_id"
Contributor

We don't prefix any other metric tag, so we shouldn't for this one.

Separately, I usually prefer shorter tag names, like would deployment_name and build_id be enough?

Contributor

Dustin, the external metrics PM, suggested the temporal_worker_* prefix for these labels, since they are user-facing. For reference, in external metrics we have the following labels with the temporal prefix: temporal_namespace, temporal_account, temporal_activity_type, temporal_task_queue, temporal_workflow_type; and these labels without it: worker_version, operation, timeout_type, task_type, task_priority, is_background, namespace_mode.

Unfortunately we have to make a call on what to merge tonight, and Brandon + Dustin approved temporal_worker_deployment_name and temporal_worker_deployment_build_id, so I'm going to go with those.

Shivs11 and others added 4 commits March 26, 2026 22:02
Drop the temporal_worker_ prefix to be consistent with other metric tags
in the codebase.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@carlydf carlydf merged commit b45bbca into main Mar 27, 2026
92 of 96 checks passed
@carlydf carlydf deleted the add-deployment-name-build-id-backlog-labels branch March 27, 2026 05:20