-
Notifications
You must be signed in to change notification settings - Fork 89
DNM: Test: disable pod restart workaround after file-system restore #2134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: oadp-dev
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,15 +1,13 @@ | ||
| package e2e_test | ||
|
|
||
| import ( | ||
| "context" | ||
| "fmt" | ||
| "log" | ||
| "strings" | ||
| "time" | ||
|
|
||
| "github.com/onsi/ginkgo/v2" | ||
| "github.com/onsi/gomega" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
|
|
||
| "github.com/openshift/oadp-operator/tests/e2e/lib" | ||
| ) | ||
|
|
@@ -175,20 +173,23 @@ func runApplicationBackupAndRestoreViaCLI(brCase ApplicationBackupRestoreCase, u | |
| // run restore via CLI | ||
| runRestoreViaCLI(brCase.BackupRestoreCase, backupName, restoreName, nsRequiredResticDCWorkaround) | ||
|
|
||
| // TODO: Testing whether this workaround is still needed. Remove if | ||
| // file-system restore tests pass without it. | ||
| // | ||
| // For file-system backup restores (KOPIA/restic), the restored pods may have | ||
| // broken networking because OVN-Kubernetes doesn't fully wire the network | ||
| // namespace for pods recreated by Velero with a restore-wait init container. | ||
| // Deleting the pods lets the deployment controller create fresh ones with | ||
| // proper networking while preserving the restored PVC data. | ||
| if brCase.BackupRestoreType == lib.KOPIA { | ||
| log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) | ||
| err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( | ||
| context.Background(), | ||
| metav1.DeleteOptions{}, | ||
| metav1.ListOptions{LabelSelector: "e2e-app=true"}, | ||
| ) | ||
| gomega.Expect(err).ToNot(gomega.HaveOccurred()) | ||
| } | ||
| // if brCase.BackupRestoreType == lib.KOPIA { | ||
| // log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) | ||
| // err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( | ||
| // context.Background(), | ||
| // metav1.DeleteOptions{}, | ||
| // metav1.ListOptions{LabelSelector: "e2e-app=true"}, | ||
| // ) | ||
| // gomega.Expect(err).ToNot(gomega.HaveOccurred()) | ||
| // } | ||
|
Comment on lines
+176
to
+192
|
||
|
|
||
| // Run optional custom verification | ||
| if brCase.PostRestoreVerify != nil { | ||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -266,20 +266,23 @@ func runApplicationBackupAndRestore(brCase ApplicationBackupRestoreCase, updateL | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // run restore | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| runRestore(brCase.BackupRestoreCase, backupName, restoreName, nsRequiredResticDCWorkaround) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // TODO: Testing whether this workaround is still needed. Remove if | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // file-system restore tests pass without it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // For file-system backup restores (KOPIA/restic), the restored pods may have | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| // For file-system backup restores (KOPIA/restic), the restored pods may have | |
| // For file-system backup restores using KOPIA, the restored pods may have |
Copilot
AI
Mar 25, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of leaving the workaround as a large commented-out code block, consider keeping it as live code and gating it behind an explicit toggle (e.g., a ginkgo flag or env var like E2E_DISABLE_KOPIA_POD_RESTART_WORKAROUND). That keeps the test intent clear, avoids accumulating dead code in the suite, and makes it easy to re-enable in CI without editing source again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t make this experiment the default KOPIA path.
The code comment still documents this restart as a workaround for a known post-restore networking issue, so commenting it out here will make every non-CLI KOPIA job depend on cluster-specific OVN behavior again. If this goes past DNM, keep the current behavior as default and disable it only via an env var/label or a dedicated experimental lane.
One way to gate the experiment
- // TODO: Testing whether this workaround is still needed. Remove if
- // file-system restore tests pass without it.
- //
+ // TODO: Remove this once filesystem-restore tests are stable without the OVN workaround.
// For file-system backup restores (KOPIA/restic), the restored pods may have
// broken networking because OVN-Kubernetes doesn't fully wire the network
// namespace for pods recreated by Velero with a restore-wait init container.
// Deleting the pods lets the deployment controller create fresh ones with
// proper networking while preserving the restored PVC data.
- // if brCase.BackupRestoreType == lib.KOPIA {
- // log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
- // err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
- // context.Background(),
- // metav1.DeleteOptions{},
- // metav1.ListOptions{LabelSelector: "e2e-app=true"},
- // )
- // gomega.Expect(err).ToNot(gomega.HaveOccurred())
- // }
+ if brCase.BackupRestoreType == lib.KOPIA && os.Getenv("OADP_DISABLE_FS_RESTORE_POD_RESTART_WORKAROUND") != "true" {
+ log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
+ err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
+ context.Background(),
+ metav1.DeleteOptions{},
+ metav1.ListOptions{LabelSelector: "e2e-app=true"},
+ )
+ gomega.Expect(err).ToNot(gomega.HaveOccurred())
+ }As per coding guidelines, **: Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // TODO: Testing whether this workaround is still needed. Remove if | |
| // file-system restore tests pass without it. | |
| // | |
| // For file-system backup restores (KOPIA/restic), the restored pods may have | |
| // broken networking because OVN-Kubernetes doesn't fully wire the network | |
| // namespace for pods recreated by Velero with a restore-wait init container. | |
| // Deleting the pods lets the deployment controller create fresh ones with | |
| // proper networking while preserving the restored PVC data. | |
| if brCase.BackupRestoreType == lib.KOPIA { | |
| log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) | |
| err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( | |
| context.Background(), | |
| metav1.DeleteOptions{}, | |
| metav1.ListOptions{LabelSelector: "e2e-app=true"}, | |
| ) | |
| gomega.Expect(err).ToNot(gomega.HaveOccurred()) | |
| } | |
| // if brCase.BackupRestoreType == lib.KOPIA { | |
| // log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) | |
| // err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( | |
| // context.Background(), | |
| // metav1.DeleteOptions{}, | |
| // metav1.ListOptions{LabelSelector: "e2e-app=true"}, | |
| // ) | |
| // gomega.Expect(err).ToNot(gomega.HaveOccurred()) | |
| // } | |
| // TODO: Remove this once filesystem-restore tests are stable without the OVN workaround. | |
| // For file-system backup restores (KOPIA/restic), the restored pods may have | |
| // broken networking because OVN-Kubernetes doesn't fully wire the network | |
| // namespace for pods recreated by Velero with a restore-wait init container. | |
| // Deleting the pods lets the deployment controller create fresh ones with | |
| // proper networking while preserving the restored PVC data. | |
| if brCase.BackupRestoreType == lib.KOPIA && os.Getenv("OADP_DISABLE_FS_RESTORE_POD_RESTART_WORKAROUND") != "true" { | |
| log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace) | |
| err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection( | |
| context.Background(), | |
| metav1.DeleteOptions{}, | |
| metav1.ListOptions{LabelSelector: "e2e-app=true"}, | |
| ) | |
| gomega.Expect(err).ToNot(gomega.HaveOccurred()) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/e2e/backup_restore_suite_test.go` around lines 269 - 285, The
commented-out pod-restart workaround for KOPIA restores should be restored but
gated behind a configurable opt-in flag rather than enabled by default:
re-enable the block that checks if brCase.BackupRestoreType == lib.KOPIA and
calls
kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(...),
but wrap it in a condition that reads an explicit environment variable or
test-case flag (e.g., ENABLE_KOPIA_RESTART or brCase.ExperimentalRestart bool)
that defaults to false; update test setup to read
os.Getenv("ENABLE_KOPIA_RESTART") (or the test harness config) and only perform
the DeleteCollection when that flag is true, and document the flag so jobs can
opt into the experimental behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says "KOPIA/restic", but this code path is guarded by lib.KOPIA and there is no restic BackupRestoreType in tests/e2e/lib. Please update the wording if restic is no longer applicable.