CNV-80608: PR15 - k8s: add orphan AlertRelabelConfig garbage collection#15
CNV-80608: PR15 - k8s: add orphan AlertRelabelConfig garbage collection#15sradco wants to merge 1 commit intoalert-mgmt-14-single-rulefrom
Conversation
b8e9698 to
bde1f65
Compare
3cf4844 to
e768021
Compare
bde1f65 to
1eb5d54
Compare
| return | ||
| } | ||
|
|
||
| arcs, err := rrm.alertRelabelConfigs.List(ctx, "") |
There was a problem hiding this comment.
we should try to filter by some label in the query instead fetching all
There was a problem hiding this comment.
Hi,
The GC workload is inherently "scan all, check each".
There's no way to pre-filter for orphans without already knowing which ones are orphans.
It's an informer cache read, not an API query.
There was a problem hiding this comment.
but then we are filtering the results by liveRuleIDs
are we not?
| } | ||
|
|
||
| if deleted > 0 { | ||
| log.Infof("orphan ARC GC: cleaned up %d orphaned ARCs", deleted) |
There was a problem hiding this comment.
not much use as we are already logging the deletion with the same level
|
|
||
| log.Infof("Synced %d relabeled rules in memory", len(alerts)) | ||
|
|
||
| rrm.gcOrphanedARCs(ctx, allRuleIDs) |
There was a problem hiding this comment.
this does not belong here
we want to run when PrometheusRule is updated or deleted
that should be handled in pkg/k8s/prometheus_rule.go
where you would add event handlers for those actions
logic for the garbage collection and respective unit test should also live in a dedicated file with matching names
pkg/k8s/relabeled_rules_gc_test.go is probably not convetional
There was a problem hiding this comment.
this does not belong here
we want to run when PrometheusRule is updated or deleted that should be handled in pkg/k8s/prometheus_rule.go where you would add event handlers for those actions
I disagree.
There are two separate PrometheusRule informers in this codebase, and they serve different purposes:
prometheus_rule.go is a CRUD manager. It wraps the Get/List/Update/Delete/AddRule operations.
It has no workqueue, no reconciliation loop, no event handlers.
relabeled_rules.go is a controller, It has a workqueue, event handlers, and a reconciliation loop.
Running GC on secret changes is unnecessary but harmless. It's an local cache scan that's a no-op when nothing is orphaned. Running on initial-sync and the 15-minute resync is actually important.
It catches orphans left behind from before the plugin started (for example: plugin crash, missed events). A pure event-driven approach would miss those.
logic for the garbage collection and respective unit test should also live in a dedicated file with matching names pkg/k8s/relabeled_rules_gc_test.go is probably not convetional
+1. Moved gcOrphanedARCs to alert_relabel_config_gc.go and renamed the test file to alert_relabel_config_gc_test.go
There was a problem hiding this comment.
the cache receives the event handlers for all the prometheus rule resources on startup
so thats not an issue
and although the garbage collector performs operations on ARC resources
those operations are necessary only due to operations on related prometheus rule resources
so the handler should be somewhere related
it actually can still be in this file, because we track the relationship between those resources
but should still track prometheus rule events, given by rrm.queue.Add("prometheus-rule-sync")
1eb5d54 to
fdee571
Compare
f62057b to
bad1af3
Compare
|
@machadovilaca thank you for the review. Please see my comments |
fdee571 to
e8a3aad
Compare
bad1af3 to
6ff63c9
Compare
e8a3aad to
a17e0df
Compare
6ff63c9 to
6b2d009
Compare
a17e0df to
5af9134
Compare
6b2d009 to
1c95614
Compare
5af9134 to
c37ca02
Compare
1c95614 to
6ace03c
Compare
When an operator or manual action removes rules from a PrometheusRule (or deletes the CR entirely), the ARCs created by the plugin for classification/drop/stamp become orphaned. This adds a GC pass to sync() that detects and deletes orphaned ARCs by comparing live rule IDs against each ARC's alertRuleId annotation. GitOps-managed ARCs are never auto-deleted; a warning is logged instead. Signed-off-by: Shirly Radco <sradco@redhat.com> Co-authored-by: AI Assistant <noreply@cursor.com>
6ace03c to
53c6cf9
Compare
When an operator or manual action removes rules from a PrometheusRule (or deletes the CR entirely),
the ARCs created by the plugin for
classification/drop/stamp become orphaned.
This adds a GC pass to sync() that detects and deletes orphaned ARCs by comparing live rule IDs against each ARC's alertRuleId annotation. GitOps-managed ARCs
are never auto-deleted; a warning is logged instead.