Enable Controller-managed versioned scaling resources with `WorkerResourceTemplate` by carlydf · Pull Request #217 · temporalio/temporal-worker-controller

carlydf · 2026-03-06T04:30:22Z

What was changed

New CRD: `WorkerResourceTemplate` (WRT)

A new WorkerResourceTemplate CRD that lets users attach arbitrary namespaced Kubernetes resources (HPAs, PDBs, custom scalers, etc.) to a TemporalWorkerDeployment. The controller creates one copy of the resource per active worker version, with auto-injection of scaleTargetRef, selector.matchLabels, and metric selector labels to point at the correct versioned Deployment.

Key behaviors:

- One copy per active Build ID, named {twdName}-{wrtName}-{buildID} (uniquely truncated to 47 chars, DNS-safe)
- Auto-injects spec.scaleTargetRef (when set to {}) to reference the versioned Deployment → enables per-version HPA autoscaling, and any other scaler that uses scaleTargetRef
- Auto-injects selector.matchLabels (when set to {}) with the correct per-version labels → enables per-version PDB targeting, and arbitrary CRDs that use selector.matchLabels to target versioned Deployments
- Auto-appends worker_deployment_name, worker_deployment_build_id, and temporal_namespace to spec.metrics[*].external.metric.selector.matchLabels whenever matchLabels is present (including {}). User labels like task_type: "Activity" coexist alongside the injected keys. Absent matchLabels = no injection for that metric entry.
- Applied via Server-Side Apply with field manager "temporal-worker-controller"
- Owner ref on each resource copy points to the WorkerResourceTemplate → k8s GC deletes all copies when the WRT is deleted
- Apply status written back to WorkerResourceTemplate.status.versions[*] (Applied, Message, BuildID)
- Resource spec lives in spec.template (raw JSON/YAML embedded object)
- Target TWD referenced via spec.temporalWorkerDeploymentRef.name
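
The naming rule in the list above can be sketched roughly as follows. This is an illustration only, not the controller's code: the helper name `computeResourceName`, the choice of SHA-256, and the 8-character hex suffix are assumptions (an earlier commit in this PR describes a `{prefix}-{8-char-hash}` scheme, and the final implementation may differ). The sketch also only lowercases its input and does not replace characters that are invalid in DNS names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// maxNameLen is the 47-character limit stated in the PR description.
const maxNameLen = 47

// computeResourceName is a hypothetical helper, not the controller's function.
// Names that fit are returned unchanged; longer names are cut and given a hash
// suffix so that distinct (twdName, wrtName, buildID) triples stay distinct.
// Assumptions: SHA-256 as the hash, 8 hex characters as the suffix, ASCII input.
func computeResourceName(twdName, wrtName, buildID string) string {
	full := strings.ToLower(twdName + "-" + wrtName + "-" + buildID)
	if len(full) <= maxNameLen {
		return full
	}
	sum := sha256.Sum256([]byte(twdName + "/" + wrtName + "/" + buildID))
	suffix := hex.EncodeToString(sum[:])[:8]
	// Reserve room for "-" plus the suffix, then drop trailing separators so
	// the result never contains "--" at the join point.
	prefix := strings.TrimRight(full[:maxNameLen-len(suffix)-1], "-.")
	return prefix + "-" + suffix
}

func main() {
	fmt.Println(computeResourceName("helloworld", "hpa", "v1")) // prints "helloworld-hpa-v1"
	fmt.Println(computeResourceName("a-very-long-temporal-worker-deployment-name", "backlog-hpa", "build-1"))
}
```

Hashing the full triple rather than the truncated string is what keeps two long names with the same prefix from colliding when only the Build ID differs.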

Validating Webhook

A WorkerResourceTemplateValidator webhook enforces:

- apiVersion and kind required; metadata.name/metadata.namespace forbidden (controller sets these)
- Allowed resource kinds configurable via ALLOWED_KINDS env var (default: HorizontalPodAutoscaler)
- minReplicas ≠ 0 (currently required for approximate_task_queue_backlog metric-based autoscaling to work when queue is idle, plan to relax this in future)
- scaleTargetRef must be absent or {} (opt-in sentinel); non-empty value rejected (controller owns injection)
- selector.matchLabels must be absent or {} (opt-in sentinel); non-empty value rejected (controller owns injection)
- metrics[*].external.metric.selector.matchLabels must not contain the controller-owned keys worker_deployment_name, worker_deployment_build_id, or temporal_namespace; user labels (e.g. task_type) are allowed
- SAR check: requesting user must be able to create/update the embedded resource type
- SAR check: controller service account must be able to create/update the embedded resource type
- spec.temporalWorkerDeploymentRef.name is immutable after creation
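
Two of the pure spec checks above (the `{}` opt-in sentinel and the controller-owned metric label keys) might look roughly like this. It is a sketch under assumptions: the function names `validateSentinel` and `validateMetricLabels` and the error wording are invented for illustration, and the real validator works on the embedded object and also performs the SAR checks, which are omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// controllerOwnedMetricLabelKeys lists the keys the PR says the controller
// appends to metric selectors; users may not set them.
var controllerOwnedMetricLabelKeys = []string{
	"worker_deployment_name",
	"worker_deployment_build_id",
	"temporal_namespace",
}

// validateSentinel (hypothetical) accepts a field that is absent or {} and
// rejects anything else, because the controller owns the injected value.
func validateSentinel(path string, obj map[string]any, key string) error {
	value, present := obj[key]
	if !present {
		return nil
	}
	m, ok := value.(map[string]any)
	if !ok || len(m) != 0 {
		return fmt.Errorf("%s must be absent or {}: the controller injects it", path)
	}
	return nil
}

// validateMetricLabels (hypothetical) rejects controller-owned keys but allows
// any other user label, such as task_type.
func validateMetricLabels(labels map[string]string) error {
	var errs []error
	for _, key := range controllerOwnedMetricLabelKeys {
		if _, found := labels[key]; found {
			errs = append(errs, fmt.Errorf("matchLabels key %q is controller-owned", key))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	spec := map[string]any{"scaleTargetRef": map[string]any{"name": "my-deployment"}}
	fmt.Println(validateSentinel("spec.scaleTargetRef", spec, "scaleTargetRef"))
	fmt.Println(validateMetricLabels(map[string]string{"task_type": "Activity"}))
}
```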

Helm chart updates

- helm/temporal-worker-controller-crds/templates/temporal.io_workerresourcetemplates.yaml (new CRD manifest)
- helm/temporal-worker-controller/templates/webhook.yaml (always-on WorkerResourceTemplate ValidatingWebhookConfiguration; TemporalWorkerDeployment webhook now behind webhook.enabled)
- helm/temporal-worker-controller/templates/certmanager.yaml (cert-manager Issuer + Certificate for TLS, default enabled)
- helm/temporal-worker-controller/Chart.yaml (cert-manager added as optional subchart dependency; opt in via certmanager.install: true)
- helm/temporal-worker-controller/templates/manager.yaml (cert volume/port always present; ALLOWED_KINDS, POD_NAMESPACE, SERVICE_ACCOUNT_NAME env vars)
- helm/temporal-worker-controller/templates/rbac.yaml (WorkerResourceTemplate + SAR rules in manager ClusterRole; editor/viewer roles; configurable attached-resource RBAC)
- helm/temporal-worker-controller/values.yaml (workerResourceConfig.allowedResources default: HPA, piped to ALLOWED_KINDS and to controller rbac)

Integration tests

New integration test subtests added to the existing envtest suite, all running through the shared testTemporalWorkerDeploymentCreation table-test runner:

- WorkerResourceTemplate (7 tests): Deployment owner ref, matchLabels injection, multiple WorkerResourceTemplates on same TemporalWorkerDeployment, metric selector label injection, multiple active versions, apply failure → Applied:false, SSA idempotency
- Rollout gaps (5 tests): Progressive ramp to Current, ConnectionSpecHash annotation repair, gate input from ConfigMap, gate input from Secret, multiple deprecated versions
- Webhook admission (5 tests, separate Ginkgo suite): Spec rejection, SAR pass, SAR fail (user), SAR fail (controller SA), temporalWorkerDeploymentRef.name immutability

Why?

HPA autoscaling for versioned Temporal workers requires a separate HPA per worker version, each targeting only that version's Deployment with the correct scaleTargetRef and label selectors. Without this CRD, users have no way to create per-version resources that the controller lifecycle-manages alongside the versioned Deployments.

Checklist

Closes Enable CRUD of controller-managed scaling objects (and other custom scalers) #207
How was this tested:
- Full envtest integration test suite: new subtests covering WRT lifecycle, previously uncovered rollout scenarios, and webhook admission via the real HTTP admission path
- Unit tests: webhook validator, SSA naming/injection helpers, planner integration
- All tests pass: KUBEBUILDER_ASSETS=.../bin/k8s/1.27.1-darwin-arm64 go test -tags test_dep ./...
Any docs updates needed?
- docs/worker-resource-template.md added: concept overview, HPA example with cert-manager setup, RBAC configuration guide

…ned Deployments

Introduces a new `TemporalWorkerOwnedResource` (TWOR) CRD that lets users attach arbitrary namespaced Kubernetes resources (HPA, PDB, WPA, custom CRDs, etc.) to each per-Build-ID versioned Deployment managed by a TemporalWorkerDeployment.

Key design points:
- One copy of the attached resource is created per active Build ID, owned by the corresponding versioned Deployment; Kubernetes GC deletes it automatically when the Deployment is removed, requiring no explicit cleanup logic.
- Resources are applied via Server-Side Apply (create-or-update), so the controller is idempotent and co-exists safely with other field managers (e.g. the HPA controller).
- Two-layer auto-population for well-known fields:
  - Layer 1: `scaleTargetRef: null` and `matchLabels: null` in spec.object are auto-injected with the versioned Deployment's identity and selector labels.
  - Layer 2: Go template expressions (`{{ .DeploymentName }}`, `{{ .BuildID }}`, `{{ .Namespace }}`) are rendered in all string values before apply.
- Generated resource names use a hash-suffix scheme (`{prefix}-{8-char-hash}`) to guarantee uniqueness per (twdName, tworName, buildID) triple even when the prefix is truncated; the buildID is always represented in the hash regardless of name length.
- `ComputeSelectorLabels` is now the single source of truth for selector labels used both in Deployment creation and in owned-resource matchLabels injection.
- Partial-failure isolation: all owned resources are attempted on each reconcile even if some fail; errors are collected and surfaced together.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Extract getOwnedResourceApplies into planner package so it can be tested without a live API client - Add OwnedResourceApply type and OwnedResourceApplies slice to Plan - Thread twors []TemporalWorkerOwnedResource through GeneratePlan - Add TestGetOwnedResourceApplies (8 cases: nil/empty inputs, N×M cartesian, nil Raw skipped, invalid template skipped) - Add TestGetOwnedResourceApplies_ApplyContents (field manager, kind, owner reference, deterministic name) - Add TestGetOwnedResourceApplies_FieldManagerDistinctPerTWOR - Add two TWOR cases to TestGeneratePlan for end-to-end count check - Add helpers: createTestTWOR, createDeploymentWithUID, createTestTWORWithInvalidTemplate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Both the controller plan field and the planner Plan field now share the same name, making the copy-assignment self-documenting: plan.ApplyOwnedResources = planResult.ApplyOwnedResources Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Users don't need to template the k8s namespace (they already know it when creating their TWOR in that namespace). The Temporal namespace is more useful since it configures where the worker connects to. - TemplateData.Namespace → TemplateData.TemporalNamespace - RenderOwnedResource gains a temporalNamespace string parameter - getOwnedResourceApplies threads the value from spec.WorkerOptions.TemporalNamespace down to RenderOwnedResource - Update all tests: {{ .Namespace }} → {{ .TemporalNamespace }} - GoTemplateRendering test now uses distinct k8s ns ("k8s-production") and Temporal ns ("temporal-production") to make the difference clear Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements the admission webhook for TemporalWorkerOwnedResource with:
- Pure spec validation: apiVersion/kind required, metadata.name/namespace forbidden, banned kinds (Deployment/StatefulSet/Job/Pod/CronJob by default), minReplicas≠0, scaleTargetRef/matchLabels absent-or-null enforcement
- API checks: RESTMapper namespace-scope assertion, SubjectAccessReview for the requesting user and controller SA (with correct SA group memberships)
- ValidateUpdate enforces workerRef.name immutability and uses verb="update"
- ValidateDelete checks delete permissions on the underlying resource
- Helm chart: injects POD_NAMESPACE and SERVICE_ACCOUNT_NAME via downward API, BANNED_KINDS from ownedResources.bannedKinds values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- cmd/main.go: register TemporalWorkerOwnedResourceValidator unconditionally
- webhook.yaml: rewrite to always create the webhook Service and TWOR ValidatingWebhookConfiguration; TWD validating webhook remains optional behind webhook.enabled
- certmanager.yaml: fix service DNS names, remove fail guard, default enabled
- manager.yaml: move cert volume mount and webhook port outside the webhook.enabled gate so the webhook server always starts
- values.yaml: default certmanager.enabled to true, clarify that webhook.enabled only controls the optional TWD webhook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add helm/crds/temporal.io_temporalworkerownedresources.yaml so Helm installs the CRD before the controller starts - Add temporalworkerownedresources get/list/watch/patch/update rules to the manager ClusterRole so the controller can watch and update status - Add authorization.k8s.io/subjectaccessreviews create permission for the validating webhook's SubjectAccessReview checks - Add editor and viewer ClusterRoles for end-user RBAC on TWOR objects Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

TemporalWorkerOwnedResource supports arbitrary user-defined resource types (HPA, PDB, custom CRDs) that are not known at install time. Add a wildcard rule to the manager ClusterRole so the controller can create/get/patch/update/delete any namespaced resource on behalf of TWOR objects. Security note: the TWOR validating webhook is a required admission control that verifies the requesting user has permission on the embedded resource type before the TWOR is admitted, so the controller's broad permissions act as executor, not gatekeeper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the wildcard ClusterRole rule with a configurable list of explicit resource type rules. Default to HPA and PDB — the two primary documented TWOR use cases. Wildcard mode is still available as an opt-in via ownedResources.rbac.wildcard=true for development clusters or when users attach many different custom CRD types. Operators add entries to ownedResources.rbac.rules for each additional API group their TWOR objects will use (e.g. keda.sh/scaledobjects). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Consolidates bannedKinds and rbac under a single top-level key for clarity. Update all template references accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add TWORName, TWORNamespace, BuildID to OwnedResourceApply so the executor knows which status entry to update after each apply attempt. Refactor the apply loop in execplan.go to collect per-(TWOR, BuildID) results (success or error) and then, after all applies complete, write OwnedResourceVersionStatus entries back to each TWOR's status subresource. This means:
- Applied=true + ResourceName set on success
- Applied=false + Message set on failure
- All Build IDs for a TWOR are written atomically in one status update
- Apply errors and status write errors are both returned via errors.Join so the reconcile loop retries on either kind of failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
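
The collect-then-report pattern this commit describes could be sketched like this. The type and function names (`applyResult`, `versionStatus`, `summarize`) are stand-ins invented for illustration, not the types in execplan.go, and the actual status write against the API server is left out.

```go
package main

import (
	"errors"
	"fmt"
)

// versionStatus is a simplified stand-in for the per-Build-ID status entry.
type versionStatus struct {
	BuildID      string
	Applied      bool
	ResourceName string
	Message      string
}

// applyResult is a hypothetical record of one apply attempt.
type applyResult struct {
	OwnerName    string // name of the TWOR/WRT the resource was rendered from
	BuildID      string
	ResourceName string
	Err          error
}

// errForbidden is a sample failure used by the demo in main.
var errForbidden = errors.New("forbidden")

// summarize groups results per owner so each owner's status can be written in
// one update, and joins every apply error so the caller retries on any failure.
func summarize(results []applyResult) (map[string][]versionStatus, error) {
	statuses := make(map[string][]versionStatus)
	var errs []error
	for _, r := range results {
		st := versionStatus{BuildID: r.BuildID, Applied: r.Err == nil}
		if r.Err != nil {
			st.Message = r.Err.Error()
			errs = append(errs, fmt.Errorf("apply %s/%s: %w", r.OwnerName, r.BuildID, r.Err))
		} else {
			st.ResourceName = r.ResourceName
		}
		statuses[r.OwnerName] = append(statuses[r.OwnerName], st)
	}
	return statuses, errors.Join(errs...) // nil when every apply succeeded
}

func main() {
	statuses, err := summarize([]applyResult{
		{OwnerName: "backlog-hpa", BuildID: "v1", ResourceName: "helloworld-backlog-hpa-v1"},
		{OwnerName: "backlog-hpa", BuildID: "v2", Err: errForbidden},
	})
	fmt.Printf("%+v\n", statuses["backlog-hpa"])
	fmt.Println("retry needed:", err != nil)
}
```

A failed apply for one Build ID does not stop the loop, which matches the partial-failure isolation described elsewhere in this PR.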

Replace the double-nested errors.Join with a single call over the concatenated slice, which is equivalent and more readable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

When a TemporalWorkerDeployment is reconciled, ensure each TemporalWorkerOwnedResource referencing it has an owner reference pointing back to the TWD (controller: true). This lets Kubernetes garbage-collect TWOR objects automatically when the TWD is deleted. The patch is skipped when the reference is already present (checked via metav1.IsControlledBy) to avoid a write on every reconcile loop. Uses client.MergeFrom to avoid conflicts with concurrent modifications. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

genplan should only read state and build a plan — not perform writes. Instead of patching TWORs directly in generatePlan, build (base, patched) pairs in genplan.go (pure computation) and let executePlan apply them, consistent with how all other writes are structured. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add TWOROwnerRefPatch type and EnsureTWOROwnerRefs to planner.Plan - Add getTWOROwnerRefPatches to planner package, unit-tested in planner_test.go (TestGetTWOROwnerRefPatches) - GeneratePlan now accepts a twdOwnerRef and populates EnsureTWOROwnerRefs; genplan.go builds the OwnerReference from the TWD object and passes it through - Owner ref patch failures in execplan.go now log-and-continue so that a deleted TWOR (race between list and patch) cannot block the more important owned-resource apply step Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ing it pre-built Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…bhook validator - Add +kubebuilder:object:generate=false to TemporalWorkerOwnedResourceValidator (client.Client interface field was blocking controller-gen) - Regenerate zz_generated.deepcopy.go: adds GateInputSource, GateWorkflowConfig, OwnedResourceVersionStatus deepcopy that were missing from the manual edit - Regenerate CRD manifests: adds type:object to spec.object in TWOR CRD, field ordering change in TWD CRD - Remove now-unused metav1 import from genplan.go (was missed in prior commit) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Tests the full reconciliation loop: create TWOR with HPA spec → controller applies one HPA per active Build ID via SSA → asserts scaleTargetRef is auto-injected with the correct versioned Deployment name → asserts TWOR.Status.Versions shows Applied: true → asserts TWD controller owner reference is set on the TWOR. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…functions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…etup - docs/owned-resources.md: full TWOR reference (auto-injection, RBAC, webhook TLS, examples) - examples/twor-hpa.yaml: ready-to-apply HPA example for the helloworld demo - helm/webhook.yaml + values.yaml: add certmanager.caBundle for BYO TLS without cert-manager - internal/demo/README.md: add cert-manager install step and TWOR demo walkthrough - README.md + docs/README.md: add cert-manager prerequisite, TWOR feature bullet, and doc link Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Truncate owned resource names to 47 chars (safe for Deployments; avoids per-resource-type special cases if Deployment is ever un-banned) - Fix docs: replace "active Build ID" with "worker version with running workers" throughout; "active" is reserved for Ramping/Current versions - Fix docs: owned-resource deletion is due to versioned Deployment sunset, not a separate "version delete" operation - Fix docs: scaleTargetRef injection applies to any resource type with that field, not just HPA; clarify webhook rejects non-null values because controller owns them - Fix docs: remove undocumented/untested BYO TLS path; cert-manager is required - Fix docs: expand TWOR abbreviation to full name throughout; remove ⏳ autoscaling bullet from README and clarify TWOR is the path for metric/backlog-based autoscaling - Add note on how to inspect the banned kinds list (BANNED_KINDS env var) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ission

Implements all envtest-capable test scenarios identified in docs/test-coverage-analysis.md:

**TWOR integration tests (tests 1–7)** in internal/tests/internal/twor_integration_test.go:
- Owner reference on versioned Deployment points to TWOR's owning Deployment
- matchLabels auto-injection (null sentinel → selector labels)
- Multiple TWORs on the same TWD each produce independent resources
- Go template variables (DeploymentName, BuildID, TemporalNamespace)
- Multiple active build IDs each get their own owned resource instance
- Partial SSA failure is isolated per resource (other versions still apply)
- SSA idempotency: repeated reconciles produce no spurious updates

**Rollout integration tests (tests 8, 9, 10, 13)** in internal/tests/internal/rollout_integration_test.go:
- Progressive rollout auto-promotes to Current after the 30s pause expires
- Controller repairs stale ConnectionSpecHashAnnotation on a versioned Deployment
- Gate input from ConfigMap: blocks Deployment creation until ConfigMap exists
- Three successive rollouts accumulate two deprecated versions in status

**Webhook integration tests (tests 14–18)** in api/v1alpha1/temporalworkerownedresource_webhook_integration_test.go:
- Banned kind rejected via the real HTTP admission path
- SAR pass: admin user + controller SA with HPA RBAC → creation allowed
- SAR fail: impersonated user without HPA permission → rejected
- SAR fail: controller SA without HPA RBAC → rejected
- workerRef.name immutability enforced via real HTTP update

Supporting changes:
- config/webhook/manifests.yaml: add TWOR ValidatingWebhookConfiguration
- api/v1alpha1/webhook_suite_test.go: register TWOR webhook; add corev1/rbacv1/authorizationv1 to scheme; set controller SA env vars
- docs/test-coverage-analysis.md: corrections to webhook and autoscaling sections

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…de review feedback - Convert runTWORTests and runRolloutTests into tworTestCases()/rolloutTestCases() table slices that run through the standard testTemporalWorkerDeploymentCreation runner, eliminating duplicate validation paths for TWD status and Temporal state - Add WithTWDMutatorFunc and WithPostTWDCreateFunc hooks to TestCaseBuilder to support gate-ConfigMap blocking and other pre/post-create mutations without polluting the builder API - Remove all test-number references (tests 1–7, Test 8, etc.) from comments; replace with self-contained scenario descriptions that don't depend on the coverage doc - Improve WithValidatorFunction doc comment to precisely state execution order: runs after both verifyTemporalWorkerDeploymentStatusEventually and verifyTemporalStateMatchesStatusEventually have confirmed the expected TWD state Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Mark TWOR gaps 1-7, rollout gaps 8-10/13, and webhook tests 14-18 as implemented. Update subtest counts (28→39 integration, 0→5 webhook suite). Update priority recommendations to reflect remaining gaps.

Mirror of gate-input-from-configmap using SecretKeyRef. Controller blocks Deployment creation while the Secret is absent; creating the Secret unblocks the reconcile loop and the version promotes to Current. Also updates the test-coverage-analysis.md to mark Gate input from Secret as covered (was the last open envtest gap).

docs/test-coverage-analysis.md

Without this, 'go test ./...' in CI (where etcd is not installed) caused BeforeSuite to crash immediately instead of skipping gracefully. - BeforeSuite: Skip() early when KUBEBUILDER_ASSETS is unset - AfterSuite: return early when testEnv is nil (BeforeSuite was skipped) - test-unit: add envtest prerequisite and set KUBEBUILDER_ASSETS so the webhook integration tests actually run (not skip) in the unit test job

…long for the full version string to fit <=63 chars

…emplate Rendered resources (HPAs, PDBs, etc.) are now owned by the WorkerResourceTemplate that defines them rather than the versioned Deployment. This means deleting a WRT cascades to all per-Build-ID copies via k8s GC, without any controller action. Integration test wrt-deletion-cascade verifies the GC cascade end-to-end. wrt-owner-ref test updated to assert WRT (not Deployment) as controller owner. wrt-template-variable test updated: .DeploymentName removed; use .K8sNamespace/.TWDName. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…clobber

DeleteWorkerResources: when a versioned Deployment is sunset (deleted), the controller now explicitly deletes the WRT-managed resources for that build ID. Because rendered resources are now owned by the WRT (not the Deployment), k8s GC no longer handles sunset cleanup; the controller does it instead via plan.DeleteWorkerResources. Names are computed deterministically (ComputeWorkerResourceTemplateName) so the delete is safe even if the WRT was never applied for that build ID.

HPA replica clobber fix: remove ClearReplicasDeployments entirely. Patching spec.replicas=nil failed because the k8s Deployment defaulter resets nil→1 on every admission, causing an infinite reconcile loop. Instead, updateDeploymentWithPodTemplateSpec now skips the replicas field entirely when spec.Replicas is nil, leaving an external autoscaler in full control without interference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
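
The replica fix above comes down to treating a nil desired replica count as "not managed here". A minimal sketch with plain structs follows; `deploymentSpec` and `applyDesired` are simplified stand-ins invented for illustration, not the Kubernetes types or the real updateDeploymentWithPodTemplateSpec.

```go
package main

import "fmt"

// deploymentSpec is a simplified stand-in for the parts of a Deployment spec
// that matter here.
type deploymentSpec struct {
	Replicas *int32
	Image    string
}

// applyDesired (hypothetical) copies the desired pod template fields onto the
// live spec. When desired.Replicas is nil it leaves the live replica count
// alone, so a value set by an external autoscaler is not overwritten.
func applyDesired(live *deploymentSpec, desired deploymentSpec) {
	live.Image = desired.Image
	if desired.Replicas == nil {
		return // replicas are owned by the external scaler
	}
	replicas := *desired.Replicas
	live.Replicas = &replicas
}

func main() {
	scaledByHPA := int32(5)
	live := deploymentSpec{Replicas: &scaledByHPA, Image: "worker:v1"}
	applyDesired(&live, deploymentSpec{Image: "worker:v2"})
	fmt.Println(*live.Replicas, live.Image) // prints "5 worker:v2"
}
```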

…tors autoInjectFields was recursively injecting controller selector labels into any matchLabels: {} it found in the spec tree, including metric selectors like spec.metrics[*].external.metric.selector.matchLabels, which are user-owned. Fix: split into autoInjectFields (matchLabels only at spec.selector.matchLabels) and injectScaleTargetRefRecursive (scaleTargetRef anywhere in spec, unchanged). scaleTargetRef remains recursive because it is unambiguous across all supported resource types; matchLabels is not. Add test case verifying that a metric selector matchLabels: {} is left untouched. Update webhook comment and docs to document which fields are controller-owned. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
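
The scoping difference described in this commit (scaleTargetRef filled anywhere in the spec, matchLabels filled only at spec.selector.matchLabels) might be sketched as below. The function names and the `app` label are assumptions for illustration; the real selector labels come from the controller (the PR names `ComputeSelectorLabels`), and the `apps/v1` Deployment reference is the assumed target shape.

```go
package main

import "fmt"

// isEmptyObject reports whether v is the {} opt-in sentinel.
func isEmptyObject(v any) bool {
	m, ok := v.(map[string]any)
	return ok && len(m) == 0
}

// injectScaleTargetRef walks the whole tree and fills every empty
// scaleTargetRef with a reference to the versioned Deployment.
func injectScaleTargetRef(node any, deploymentName string) {
	switch v := node.(type) {
	case map[string]any:
		for key, child := range v {
			if key == "scaleTargetRef" && isEmptyObject(child) {
				v[key] = map[string]any{
					"apiVersion": "apps/v1",
					"kind":       "Deployment",
					"name":       deploymentName,
				}
				continue
			}
			injectScaleTargetRef(child, deploymentName)
		}
	case []any:
		for _, child := range v {
			injectScaleTargetRef(child, deploymentName)
		}
	}
}

// injectSelectorLabels fills only spec.selector.matchLabels, and only when it
// is {}. Nested selectors (for example under spec.metrics) are not visited.
func injectSelectorLabels(spec map[string]any, labels map[string]string) {
	selector, _ := spec["selector"].(map[string]any)
	matchLabels, ok := selector["matchLabels"].(map[string]any)
	if !ok || len(matchLabels) != 0 {
		return
	}
	for k, v := range labels {
		matchLabels[k] = v
	}
}

func main() {
	spec := map[string]any{
		"scaleTargetRef": map[string]any{},
		"selector":       map[string]any{"matchLabels": map[string]any{}},
	}
	injectScaleTargetRef(spec, "helloworld-v1")
	injectSelectorLabels(spec, map[string]string{"app": "helloworld"})
	fmt.Println(spec)
}
```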

…controller into temporal-worker-owned-resource

…ve merge conflicts once the other PR merges to main, but that's fine

docs/crd-management.md

carlydf · 2026-03-25T08:29:51Z

> @carlydf a few inline comments but mostly looking good! Did I miss where that prometheus rule translation thing was included, though?

I pushed more changes that I discovered over the last day or two running in a minikube cluster.
All of the demo-related changes are mostly under internal/demo.

temporal-worker-controller/examples/wrt-hpa-backlog.yaml

Line 65 in d6d8f7c

- type: External

shows the templating working for backlog count with the re-written labels. Users who have short enough names can actually use the worker_version label directly, because I discovered that Temporal Cloud already replaces the / and the : characters with _. This PR is so big though that I'm wanting to save any "opt-in-to-shorter-build-id" option for a follow up, if we want it.

temporal-worker-controller/internal/demo/k8s/prometheus-stack-values.yaml

Line 111 in d6d8f7c

- record: temporal_backlog_count_by_version

is where the prometheus "recording rule" for backlog count is defined. The recording rule also tells k8s which k8s namespace to send the metric to, which is nice to not have to hard code.

Aside from the demo code, the main actual code changes I merged since your review Tuesday morning are:

- Make the WRT the owner of the rendered versioned resources, so that when WRT is deleted they are also deleted. As for deletion of versioned resources when controller deletes Deployment during sunset, controller now does that via DeleteWorkerResources (see changes in planner.go).
- Fix matchLabels injection scope: only spec.selector, not metric selectors
- Prevent controller from clobbering replicas changes made by HPA: When spec.replicas is nil (omitted), the controller follows the Kubernetes-recommended pattern for HPA coexistence: active Deployments are created with nil replicas and the controller never calls UpdateScale on them, allowing the external scaler to take sole ownership. To ensure that controller does not accidentally write to replicas field during UpdateDeployment, skip updating the replicas field when spec.replicas is nil (see how ScaleDeployments and UpdateDeployments are handled).
- Template key changes to align with how the metrics actually work (add TWDName and Namespace to templatable keys and rename Namespace template keyword -> K8sNamespace, remove DeploymentName)

…source template'

…e, build_id, temporal_namespace

…template support - Rename metric selector label build_id → worker_deployment_build_id throughout (workerresourcetemplates.go, prometheus-stack-values.yaml, tests) - Webhook now rejects ALL non-empty metrics[*].external.metric.selector.matchLabels; controller injects worker_deployment_name, worker_deployment_build_id, temporal_namespace automatically so users must leave matchLabels absent or {} - Remove Go template expression support entirely: validateTemplateExpressions now rejects any {{ }} with "not supported"; remove allowedTemplateFields and isAllowedFieldAction; remove text/template and text/template/parse imports - Update all tests to match: template cases now expect errors, metric selector matchLabels cases added, label names updated Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…template validation
- Allow user-defined labels in metrics[*].external.metric.selector.matchLabels; only the three controller-owned keys (worker_deployment_name, worker_deployment_build_id, temporal_namespace) are rejected
- Remove validateTemplateExpressions: no release supported templates, failures will surface at apply time instead
- Restore task_type: "Activity" in wrt-hpa-backlog.yaml example metric selector
- Prometheus recording rule: remove {task_type="Activity"} hard-filter, add task_type to sum by so HPAs can filter by any task type
- Fix docs: build_id -> worker_deployment_build_id in auto-injection table
- Nit: "injects into" -> "appends to" in workerresourcetemplates.go comment
- Update tests to match: template expressions now allowed by webhook, controller-owned key rejection tests added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…leanup - Move controllerOwnedKeys to package-level ControllerOwnedMetricLabelKeys var - Remove out-of-scope template expression test case - Prometheus recording rule: expand comment to list all produced labels explicitly (worker_deployment_name, worker_deployment_build_id, temporal_namespace, task_type) so it is clear both label_replace calls are present - Add note to metricSelectorLabels comment: matchLabels is a map; JSON marshal uses sorted keys so hash is deterministic; no explicit sorting needed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tion test The wrt-template-variable test tested template rendering which is no longer an officially supported feature. Replace it with wrt-metric-selector-injection which verifies the actual controller behaviour: worker_deployment_name, worker_deployment_build_id, and temporal_namespace are appended to any External metric selector matchLabels, and user-provided labels (task_type) coexist. Add waitForHPAWithInjectedMetricSelector helper. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
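
The behaviour that wrt-metric-selector-injection verifies can be sketched over an untyped spec as follows. The function name `injectMetricSelectorLabels` and its parameters are assumptions for illustration; only the three label keys and the "present means inject, absent means skip" rule come from this PR.

```go
package main

import "fmt"

// injectMetricSelectorLabels (hypothetical) appends the three controller-owned
// labels to every spec.metrics[*].external.metric.selector.matchLabels that is
// present, including {}. Entries without matchLabels are left unchanged, and
// user labels already in the map are kept.
func injectMetricSelectorLabels(spec map[string]any, twdName, buildID, temporalNamespace string) {
	metrics, _ := spec["metrics"].([]any)
	for _, entry := range metrics {
		metric, _ := entry.(map[string]any)
		external, _ := metric["external"].(map[string]any)
		inner, _ := external["metric"].(map[string]any)
		selector, _ := inner["selector"].(map[string]any)
		labels, ok := selector["matchLabels"].(map[string]any)
		if !ok {
			continue // matchLabels absent: no injection for this metric entry
		}
		labels["worker_deployment_name"] = twdName
		labels["worker_deployment_build_id"] = buildID
		labels["temporal_namespace"] = temporalNamespace
	}
}

func main() {
	spec := map[string]any{"metrics": []any{
		map[string]any{"type": "External", "external": map[string]any{"metric": map[string]any{
			"selector": map[string]any{"matchLabels": map[string]any{"task_type": "Activity"}},
		}}},
	}}
	injectMetricSelectorLabels(spec, "helloworld", "v1", "default")
	fmt.Println(spec)
}
```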

…controller into temporal-worker-owned-resource

## Summary
- Adds a required `kubectl label` + `kubectl annotate` step to the CRD migration instructions
- Without this, `helm install` of the new CRDs chart fails with "cannot be imported into the current release: invalid ownership metadata"
- Discovered during demo setup when migrating from previously-installed CRDs to the new charts format

Extracted from #217 per [this comment](https://github.com/temporalio/temporal-worker-controller/pull/217/changes/BASE..d6d8f7ce7f5651529c7cdf216fe1a8b4fbecbe1a#r2986526435); this needs to land independently in the chart version where CRDs are first split out.

## Test plan
- [ ] Verify the migration steps work on a cluster with pre-existing CRDs installed via the old `crds/` directory

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…controller into temporal-worker-owned-resource

…sions, document metric selector injection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ceTemplate

carlydf · 2026-03-26T00:07:09Z

internal/demo/helloworld/helm/helloworld/templates/deployment.yaml

    deleteDelay: 24h
  # Desired number of worker replicas
-  replicas: 1
+  #replicas:


remove instead of comment out

carlydf and others added 26 commits February 26, 2026 15:53

fix lint

dda34a3

Rename ownedResources to ownedResourceConfig in Helm values

d0394e2


Simplify errors.Join call for apply and status errors

e1043dc


Have planner build TWD OwnerReference from name+UID instead of receiving it pre-built

47c1ca6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Refactor TWOR integration test: extract validation steps into helper functions

27987b8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Update test-coverage-analysis.md to reflect implemented tests

3d8832a


carlydf requested review from a team and jlegrone as code owners March 6, 2026 04:30

carlydf commented Mar 6, 2026


docs/test-coverage-analysis.md (outdated, resolved)

carlydf and others added 10 commits March 24, 2026 22:35

go back to using recording rule, since most Deployment Names are too long for the full version string to fit <=63 chars

826acf0

rename Namespace template keyword -> K8sNamespace, remove DeploymentName

d96ffbf

fmt-imports

1f9348c

respond to Jay comments

165f5ab

Merge branch 'main' of https://github.com/temporalio/temporal-worker-controller into temporal-worker-owned-resource

9a3e6ff

just use ConditionTypeReady instead of ConditionTypeWRTReady; will have merge conflicts once the other PR merges to main, but that's fine

d6d8f7c

improve grafana login instructions for demo

5f34dd7

carlydf commented Mar 25, 2026


docs/crd-management.md (resolved)

carlydf and others added 6 commits March 25, 2026 10:58

replace logs and strings referencing 'owned resource' with 'worker resource template'

2ed58f4

make demo work with specific injectable labels: worker_deployment_name, build_id, temporal_namespace

c49a635

fix weird claude style and fmt-imports

ddafc7d

Shivs11 mentioned this pull request Mar 25, 2026

docs: add Helm ownership labeling step to CRD migration guide #245

Merged

1 task

carlydf and others added 2 commits March 25, 2026 14:14

Merge branch 'main' of https://github.com/temporalio/temporal-worker-controller into temporal-worker-owned-resource

4557afa

carlydf and others added 5 commits March 25, 2026 15:37

revert accidentally-committed demo patch

a5cf096

fix lint

b470eef

Merge branch 'main' of https://github.com/temporalio/temporal-worker-controller into temporal-worker-owned-resource

fb233b9

update WorkerResourceTemplateSpec doc comment: remove template expressions, document metric selector injection

c5476de

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

use ConditionReady for both TemporalWorkerDeployment and WorkerResourceTemplate

6f43c43

carlydf commented Mar 26, 2026


carlydf added 2 commits March 25, 2026 18:27

use helm chart for webhook validator test

2163975

remove all reference to templating

38240ba

carlydf wants to merge 103 commits into main from temporal-worker-owned-resource


3 participants
