🌱 Add OLMv0-to-OLMv1 migration CLI and library#2581
🌱 Add OLMv0-to-OLMv1 migration CLI and library#2581perdasilva wants to merge 1 commit intooperator-framework:mainfrom
Conversation
Add a CLI tool (hack/tools/migrate/) and supporting migration library (internal/operator-controller/migration/) to migrate operators managed by OLMv0 (Subscription/CSV) to OLMv1 (ClusterExtension/ClusterExtensionRevision). CLI subcommands: - migrate (root): Full interactive migration of a single operator - migrate all: Scan cluster, check eligibility, and migrate all eligible operators - migrate check: Run readiness/compatibility checks without modifying resources - migrate gather: Collect and display migration info (dry-run) Migration library capabilities: - Operator profiling: read Subscription, CSV, InstallPlan state - Readiness checks: Subscription state, CSV health, uniqueness, dependencies - Compatibility checks: AllNamespaces mode, dependencies, APIServices, OperatorConditions - Resource collection: 4 strategies (CRD labels, owner labels, ownerRefs, InstallPlan steps) - ClusterCatalog resolution: match OLMv0 CatalogSource to OLMv1 ClusterCatalog - Zero-downtime migration with backup/recovery at each phase - Server-side apply with field manager olm.operatorframework.io/migration Signed-off-by: Per G. da Silva <pegoncal@redhat.com>
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Pull request overview
This PR introduces an OLMv0→OLMv1 migration library under internal/operator-controller/migration/ plus a hack/tools/migrate CLI that profiles OLMv0 Subscriptions/CSVs, runs eligibility checks, gathers installed resources, resolves a target ClusterCatalog, and then migrates the operator into ClusterExtension/ClusterExtensionRevision with backup/recovery and optional batch migration.
Changes:
- Add a migration library (readiness/compatibility checks, resource collection, catalog resolution, migration + recovery primitives).
- Add a
migrateCLI withmigrate,migrate all,migrate check, andmigrate gatherflows and interactive output/spinner UX. - Update Boxcutter applier to backfill missing
ownerReferenceson existing revisions (to support migrated revisions created before the CE exists).
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 17 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/operator-controller/migration/types.go | Adds core migration option/types and a Migrator wrapper. |
| internal/operator-controller/migration/scan.go | Implements cluster-wide Subscription scanning for eligibility. |
| internal/operator-controller/migration/readiness.go | Adds readiness checks for stable Subscription/CSV state. |
| internal/operator-controller/migration/compatibility.go | Adds compatibility checks (OperatorGroup mode, deps, APIServices, OperatorConditions). |
| internal/operator-controller/migration/compatibility_test.go | Unit tests for dependency/APIServices compatibility checks. |
| internal/operator-controller/migration/collector.go | Implements multi-strategy resource inventory collection from OLMv0 state. |
| internal/operator-controller/migration/catalog.go | Adds ClusterCatalog query/resolution + optional creation from CatalogSource image. |
| internal/operator-controller/migration/migration.go | Implements end-to-end migration steps, backup/recovery, CER/CE creation, cleanup helpers, and object stripping. |
| internal/operator-controller/migration/migration_test.go | Unit tests for resource stripping/annotation filtering and option defaults. |
| internal/operator-controller/applier/boxcutter.go | Ensures existing revisions get CE ownerRefs for GC (migration adoption support). |
| hack/tools/migrate/main.go | Wires CLI root + schemes + kubeconfig handling. |
| hack/tools/migrate/migrate.go | Implements interactive single-operator migration flow. |
| hack/tools/migrate/all.go | Implements migrate all batch scan + migrate loop. |
| hack/tools/migrate/check.go | Implements read-only readiness/compatibility/catalog availability checks. |
| hack/tools/migrate/gather.go | Implements dry-run profiling + resource inventory display. |
| hack/tools/migrate/output.go | Adds CLI formatting + spinner/progress plumbing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var cer ocv1.ClusterExtensionRevision | ||
| if err := m.Client.Get(ctx, types.NamespacedName{Name: cerName}, &cer); err != nil { | ||
| m.progress(fmt.Sprintf("Waiting for CER %s (not found yet)", cerName)) | ||
| return false, err | ||
| } |
There was a problem hiding this comment.
WaitForRevisionAvailable returns the Client.Get error from the poll function, which causes PollUntilContextTimeout to exit immediately on NotFound/transient errors. Ignore NotFound (return false, nil) so the wait can retry until the CER appears.
| case "Installed": | ||
| if c.Status == metav1.ConditionTrue { | ||
| return true, nil | ||
| } | ||
| installed = fmt.Sprintf("%s (%s)", c.Status, c.Reason) | ||
| case "Progressing": |
There was a problem hiding this comment.
Condition type matching here uses string literals ("Installed"/"Progressing") even though the API exposes constants (ocv1.TypeInstalled, ocv1.TypeProgressing). Using the constants matches existing repo patterns and avoids drift if condition names change.
| case "Installed": | |
| if c.Status == metav1.ConditionTrue { | |
| return true, nil | |
| } | |
| installed = fmt.Sprintf("%s (%s)", c.Status, c.Reason) | |
| case "Progressing": | |
| case ocv1.TypeInstalled: | |
| if c.Status == metav1.ConditionTrue { | |
| return true, nil | |
| } | |
| installed = fmt.Sprintf("%s (%s)", c.Status, c.Reason) | |
| case ocv1.TypeProgressing: |
| if err := m.PrepareForMigration(ctx, opts, csv); err != nil { | ||
| if recoverErr := m.RecoverFromBackup(ctx, opts, backup); recoverErr != nil { | ||
| return fmt.Errorf("preparation failed: %w; recovery also failed: %v", err, recoverErr) | ||
| } | ||
| return fmt.Errorf("preparation failed (recovered): %w", err) |
There was a problem hiding this comment.
In Migrate, PrepareForMigration (which deletes the Subscription/CSV) runs before resource collection. The CLI flow collects resources before preparation specifically to avoid losing ownerRef/label information during OLMv0 cleanup. Consider collecting resources before deleting OLMv0 management objects here as well so the library and CLI don’t diverge.
| success("Subscription state: AtLatestKnown, currentCSV == installedCSV") | ||
| success("CSV phase: Succeeded (InstallSucceeded)") | ||
| success("No olm.generated-by annotation (not a dependency)") | ||
| success("No duplicate Subscriptions for the same package") |
There was a problem hiding this comment.
These success messages claim checks that CheckReadiness doesn’t currently perform (e.g., “AtLatestKnown” specifically and currentCSV == installedCSV). Update the messages to reflect the actual readiness criteria (AtLatest/UpgradePending + CSV Succeeded + uniqueness/dependency checks) so users aren’t misled.
| success("Subscription state: AtLatestKnown, currentCSV == installedCSV") | |
| success("CSV phase: Succeeded (InstallSucceeded)") | |
| success("No olm.generated-by annotation (not a dependency)") | |
| success("No duplicate Subscriptions for the same package") | |
| success("Subscription state is acceptable for migration (e.g., AtLatest or UpgradePending)") | |
| success("CSV phase: Succeeded (InstallSucceeded)") | |
| success("Subscription is not marked as a dependency (no olm.generated-by annotation)") | |
| success("No duplicate Subscriptions detected for this package") |
| }) | ||
| if err := bc.Client.Update(ctx, rev); err != nil { | ||
| return fmt.Errorf("setting owner reference on revision %s: %w", rev.Name, err) | ||
| } |
There was a problem hiding this comment.
Using Client.Update to add ownerReferences is relatively conflict-prone because revisions are also updated by controllers (status/metadata). Consider patching only metadata.ownerReferences (e.g., merge/JSON patch) to reduce update conflicts and retries.
| fail(fmt.Sprintf("Readiness: %v", err)) | ||
| return fmt.Errorf("readiness check failed: %w", err) | ||
| } | ||
| success("Subscription is stable (state: AtLatestKnown, CSV: Succeeded)") |
There was a problem hiding this comment.
This output hard-codes “AtLatestKnown” as the stable state, but CheckReadiness allows both AtLatest and UpgradePending (and the constant name is SubscriptionStateAtLatest). Consider updating the message to reflect the allowed states rather than naming a single one.
| success("Subscription is stable (state: AtLatestKnown, CSV: Succeeded)") | |
| success("Subscription is stable (state: AtLatest or UpgradePending, CSV: Succeeded)") |
| if err := m.Client.Get(ctx, types.NamespacedName{ | ||
| Name: opts.SubscriptionName, | ||
| Namespace: opts.SubscriptionNamespace, | ||
| }, &restored); err != nil { | ||
| m.progress("Waiting for Subscription to appear...") |
There was a problem hiding this comment.
PollUntilContextTimeout stops polling when the condition function returns a non-nil error. Returning the raw Client.Get error here will abort recovery on transient errors (including NotFound right after create). Treat NotFound as a retry (return false, nil) and only return terminal errors.
| // Check readiness | ||
| if err := m.CheckReadiness(ctx, opts); err != nil { | ||
| result.ReadinessError = err | ||
| results = append(results, result) | ||
| continue | ||
| } |
There was a problem hiding this comment.
ScanAllSubscriptions already has the full Subscription list, but CheckReadiness re-GETs the Subscription and re-LISTs all Subscriptions again to enforce uniqueness. This makes migrate all O(N^2) API calls on clusters with many Subscriptions. Consider refactoring readiness checks to accept the current sub and/or the pre-fetched subList (or split uniqueness checking into a single pre-pass).
| objects, err := m.CollectResources(ctx, opts, csv, ip, info.PackageName) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to collect resources: %w", err) | ||
| } | ||
| info.CollectedObjects = objects |
There was a problem hiding this comment.
CollectResources is invoked after PrepareForMigration has already deleted the Subscription/CSV. If OLMv0 removes tracking metadata during CSV deletion, this can result in an incomplete inventory in the CER. Align ordering with the CLI (collect first) or make the dependency on deletion semantics explicit.
| blockOwnerDeletion := true | ||
| controller := true | ||
| rev.OwnerReferences = append(rev.OwnerReferences, metav1.OwnerReference{ | ||
| APIVersion: gvk.GroupVersion().String(), | ||
| Kind: gvk.Kind, |
There was a problem hiding this comment.
This always appends an ownerReference with Controller=true. If a revision already has some other controller ownerRef, adding another controller ref is invalid. Consider detecting an existing controller ownerRef and either skipping or adding a non-controller ownerRef instead.
| @@ -0,0 +1,300 @@ | |||
| package main | |||
There was a problem hiding this comment.
The hack dir is for we add tooling things for we maintain the project.
Are you thinking of providing this tool for end users?
If so I do not think we can have that here.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2581 +/- ##
==========================================
- Coverage 67.81% 58.25% -9.57%
==========================================
Files 137 150 +13
Lines 9526 11315 +1789
==========================================
+ Hits 6460 6591 +131
- Misses 2572 4217 +1645
- Partials 494 507 +13
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Demo today was really impressive! |
Summary
hack/tools/migrate/) for migrating OLMv0-managed operators (Subscription/CSV) to OLMv1 (ClusterExtension/ClusterExtensionRevision)internal/operator-controller/migration/) with operator profiling, readiness/compatibility checks, resource collection (4 strategies), ClusterCatalog resolution, and zero-downtime migration with backup/recoverymigrate all), dry-run (migrate gather), and pre-flight checks (migrate check)Details
CLI subcommands
migrate -s <sub> -n <ns>migrate allmigrate check -s <sub> -n <ns>migrate gather -s <sub> -n <ns>Migration library capabilities
Test plan
migration/compatibility_test.go)migration/migration_test.go)migrate checkagainst a running cluster with OLMv0 operatorsmigrate gatherto verify resource collectionmigrateflow for a single operatormigrate allfor batch migrationDemo
🤖 Generated with Claude Code