Add finalizer management for application credentials#685
Add finalizer management for application credentials#685Deydra71 wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
Conversation
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/61090ad340f246b2a571069f7190fad7 ❌ openstack-k8s-operators-content-provider FAILURE in 8m 49s |
| if k8s_errors.IsNotFound(err) { | ||
| // If the ACID is set but the secret is not found, requeue to let the cache sync | ||
| if instance.Status.ACID != "" { | ||
| return ctrl.Result{RequeueAfter: time.Second * 5}, nil |
There was a problem hiding this comment.
If the ACID is set in status but the Secret is not found, the controller assumes an informer cache lag and requeues every 5 seconds. If the Secret was genuinely deleted (e.g. manual kubectl delete secret), this becomes an infinite requeue loop.
Consider adding a bounded retry - for example, an annotation counter or a timestamp check - and after a threshold (e.g. 30 seconds or N retries), fall through to doRotate = true so the controller self-heals instead of spinning indefinitely.
There was a problem hiding this comment.
We don't actually need the requeue anymore - we suppress reconcile trigger on create events, so this scenario can't happen anymore. I will remove the guard and that fixes the infinite loop possibility, which is valid and I replicated manually the problem when I manually deleted the finalziers.
| // finalizer should only be removed once all nodes across all NodeSets have | ||
| // been redeployed with the new credentials. This depends on per-node secret | ||
| // rotation tracking: https://github.com/openstack-k8s-operators/openstack-operator/pull/1781 | ||
| func hasConsumerFinalizer(secret *corev1.Secret) bool { |
There was a problem hiding this comment.
The substring check strings.Contains(f, "-ac-consumer") could match unrelated finalizers that happen to contain that substring. A tighter check would reduce false-positive risk:
if strings.HasPrefix(f, "openstack.org/") && strings.HasSuffix(f, "-ac-consumer") {
return true
}This still matches all service-operator consumer finalizers (openstack.org/barbican-ac-consumer, openstack.org/cinder-ac-consumer, etc.) while being robust against accidental collisions.
|
|
||
| for i := range secretList.Items { | ||
| s := &secretList.Items[i] | ||
| if protected[s.Name] || hasConsumerFinalizer(s) || !controllerutil.ContainsFinalizer(s, acSecretFinalizer) { |
There was a problem hiding this comment.
The !controllerutil.ContainsFinalizer(s, acSecretFinalizer) guard creates a permanent orphan scenario. If the controller crashes between line 625 (removing the protection finalizer via Update) and line 631 (deleting the Secret), the Secret is left without a finalizer. On the next reconcile, this guard causes the loop to skip it, and it is never cleaned up.
One fix is to remove the ContainsFinalizer pre-condition entirely - if the Secret is not protected by name and has no consumer finalizer, it should be deleted regardless of whether it still carries the protection finalizer:
if protected[s.Name] || hasConsumerFinalizer(s) {
continue
}Alternatively, invert the order: issue the Delete first (the API server won't actually remove the object while the finalizer exists), then remove the finalizer. That way, even on crash between the two operations, the object is already marked for deletion and the next reconcile just removes the finalizer.
There was a problem hiding this comment.
Yes, we don't need that additional check. Old mutable secrets with only the protection finalizers should be safe since the protected is checked before hasConsumerFinalizer if service operator would be late to add consumer finalizer to the secret.
I don't think inversing the order is a good way to go, we would need to handle the Terminating state, because deletion would be blocked.
| @@ -455,26 +728,25 @@ func (r *ApplicationCredentialReconciler) storeACSecret( | |||
|
|
|||
| op, err := controllerutil.CreateOrPatch(ctx, helperObj.GetClient(), secret, func() error { | |||
There was a problem hiding this comment.
Since each rotation produces a unique Secret name (ac-<svc>-<first5>-secret), CreateOrPatch will effectively always Create. However, if the controller crashes after the Create succeeds but before the status is patched, the retry will find the Secret already exists and attempt a Patch - which will try to update .data on an immutable Secret and fail.
Consider replacing this with a plain Create + IsAlreadyExists check:
err := helperObj.GetClient().Create(ctx, secret)
if k8s_errors.IsAlreadyExists(err) {
logger.Info("Immutable AC secret already exists (likely retry), proceeding", "secret", secretName)
return secretName, nil
}
if err != nil {
return "", fmt.Errorf("failed to create immutable AC secret %s: %w", secretName, err)
}| userID string, | ||
| ) error { | ||
| logger := r.GetLogger(ctx) | ||
| serviceName := strings.TrimPrefix(instance.Name, "ac-") |
There was a problem hiding this comment.
This derivation assumes every KeystoneApplicationCredential CR name follows the ac-<service> convention. If a CR is created with a name that doesn't start with ac-, TrimPrefix returns the full name unchanged and the resulting service label / label-based queries silently target the wrong set of Secrets.
Consider either:
- Validating the naming convention via a webhook at admission time, or
- Adding an explicit
serviceNamefield to the Spec so the derivation is unnecessary and the contract is explicit.
There was a problem hiding this comment.
openstack-operator creates AC CRs with a name convention defined in keystone-operator -https://github.com/openstack-k8s-operators/openstack-operator/blob/main/internal/openstack/applicationcredential.go#L113
It's still possible to manually create the AC CR ( we do that for tests bypassing openstack-op), but we want openstack-operator to be the sole creator of AC CR based on config in controlplane CR.
We could create additional api/v1beta1/keystoneapplicationcredential_webhook.go , let's see what other reviewers comment.
mauricioharley
left a comment
There was a problem hiding this comment.
Thanks for the work on this, Veronika. The overall architecture is solid - immutable per-rotation secrets, Keystone-side revocation, and consumer finalizer coordination are the right design choices.
I left five inline comments on areas that could lead to subtle issues in production:
- Infinite requeue loop (line 223) - unbounded retry when a Secret is genuinely deleted
- Fragile substring match (line 564) -
hasConsumerFinalizercould match unrelated finalizers - Orphaned secrets (line 606) - crash between finalizer removal and delete leaves permanently uncleaned secrets
- Immutable secret retry failure (line 729) -
CreateOrPatchfails on retry against an immutable Secret - Implicit naming convention (line 584) -
TrimPrefixsilently misbehaves if CR name doesn't followac-<service>
Please address these before merging.
Deleted unused `GetApplicationCredentialFromSecret` function and introduce immutable per-rotation AC secrets with deterministic names, add Keystone-side revocation of unused rotated ACs, and suppress Owns() create events on the secret watch to prevent a race condition caused by stale informer cach and sometimes causing additional AC secret to be created and deleted immediately during rotation. Signed-off-by: Veronika Fisarova <vfisarov@redhat.com> Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
7af98fc to
45357af
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Deydra71 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Jira: OSPRH-28176, OSPRH-27512
GetApplicationCredentialFromSecretfunctionapplication-credential-service: barbicanpreviousSecretNameused for tracking previously used AC secretapplication-credential-servicelabel is added to any existing secret that lacks it, making it visible to the label-based deletion. This ensures old secrets are properly revoked and cleaned up on the next rotation or CR deletion, and prevents orphaned protection finalizersEach service operator that consumes an AC secret now places a
openstack.org/<service>-ac-consumerfinalizer on the AC secret it is actively using. This ensures the keystone-operator cannot revoke or clean up secret while a service is still holding a reference to it.NOTE: This PR doesn't incldue changes to tracking services that have credentials deployed on EDPM, that depends on openstack-k8s-operators/openstack-operator#1781
Tested with openstack-k8s-operators/barbican-operator#356
Assisted-by: Claude Opus 4.6 noreply@anthropic.com