Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 12 additions & 11 deletions tests/e2e/backup_restore_cli_suite_test.go
Original file line number Diff line number Diff line change
@@ -1,15 +1,13 @@
package e2e_test

import (
"context"
"fmt"
"log"
"strings"
"time"

"github.com/onsi/ginkgo/v2"
"github.com/onsi/gomega"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

"github.com/openshift/oadp-operator/tests/e2e/lib"
)
Expand Down Expand Up @@ -175,20 +173,23 @@ func runApplicationBackupAndRestoreViaCLI(brCase ApplicationBackupRestoreCase, u
// run restore via CLI
runRestoreViaCLI(brCase.BackupRestoreCase, backupName, restoreName, nsRequiredResticDCWorkaround)

// TODO: Testing whether this workaround is still needed. Remove if
// file-system restore tests pass without it.
//
// For file-system backup restores (KOPIA/restic), the restored pods may have
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "KOPIA/restic", but this code path is guarded by lib.KOPIA and there is no restic BackupRestoreType in tests/e2e/lib. Please update the wording if restic is no longer applicable.

Suggested change
// For file-system backup restores (KOPIA/restic), the restored pods may have
// For file-system backup restores (KOPIA), the restored pods may have

Copilot uses AI. Check for mistakes.
// broken networking because OVN-Kubernetes doesn't fully wire the network
// namespace for pods recreated by Velero with a restore-wait init container.
// Deleting the pods lets the deployment controller create fresh ones with
// proper networking while preserving the restored PVC data.
if brCase.BackupRestoreType == lib.KOPIA {
log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
context.Background(),
metav1.DeleteOptions{},
metav1.ListOptions{LabelSelector: "e2e-app=true"},
)
gomega.Expect(err).ToNot(gomega.HaveOccurred())
}
// if brCase.BackupRestoreType == lib.KOPIA {
// log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
// err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
// context.Background(),
// metav1.DeleteOptions{},
// metav1.ListOptions{LabelSelector: "e2e-app=true"},
// )
// gomega.Expect(err).ToNot(gomega.HaveOccurred())
// }
Comment on lines +176 to +192
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of commenting out the pod-restart workaround (and having to also remove/re-add imports when toggling), consider keeping it as live code behind an explicit toggle (ginkgo flag or env var). That makes it easy to run both variants in CI and avoids leaving large blocks of commented code in the suite long-term.

Copilot uses AI. Check for mistakes.
Comment on lines +176 to +192
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the CLI lane behind the same opt-in switch.

This has the same reliability risk in the CLI suite: if a cluster still needs the workaround, all KOPIA CLI restore jobs will start failing by default. If this moves past DNM, please wire this through the same env-var/helper gate as the non-CLI path instead of hard-disabling it here.

As per coding guidelines, **: Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/backup_restore_cli_suite_test.go` around lines 176 - 192, The CLI
test disables the pod-restart workaround unconditionally; instead gate it behind
the same opt-in switch used by the non-CLI path so clusters that need the
workaround opt into it; update the commented block around
brCase.BackupRestoreType == lib.KOPIA to check the shared env-var/helper (the
same helper used by the non-CLI code) before calling
kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(...),
and preserve the existing behavior (log.Printf and
gomega.Expect(err).ToNot(gomega.HaveOccurred())) when the gate is enabled.


// Run optional custom verification
if brCase.PostRestoreVerify != nil {
Expand Down
21 changes: 12 additions & 9 deletions tests/e2e/backup_restore_suite_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -266,20 +266,23 @@ func runApplicationBackupAndRestore(brCase ApplicationBackupRestoreCase, updateL
// run restore
runRestore(brCase.BackupRestoreCase, backupName, restoreName, nsRequiredResticDCWorkaround)

// TODO: Testing whether this workaround is still needed. Remove if
// file-system restore tests pass without it.
//
// For file-system backup restores (KOPIA/restic), the restored pods may have
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions "KOPIA/restic", but this test suite only has a KOPIA backup/restore type (no restic type in lib.BackupRestoreType). If restic is no longer supported here, please update the wording to avoid confusion/outdated documentation.

Suggested change
// For file-system backup restores (KOPIA/restic), the restored pods may have
// For file-system backup restores using KOPIA, the restored pods may have

Copilot uses AI. Check for mistakes.
// broken networking because OVN-Kubernetes doesn't fully wire the network
// namespace for pods recreated by Velero with a restore-wait init container.
// Deleting the pods lets the deployment controller create fresh ones with
// proper networking while preserving the restored PVC data.
if brCase.BackupRestoreType == lib.KOPIA {
log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
context.Background(),
metav1.DeleteOptions{},
metav1.ListOptions{LabelSelector: "e2e-app=true"},
)
gomega.Expect(err).ToNot(gomega.HaveOccurred())
}
// if brCase.BackupRestoreType == lib.KOPIA {
// log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
// err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
// context.Background(),
// metav1.DeleteOptions{},
// metav1.ListOptions{LabelSelector: "e2e-app=true"},
// )
// gomega.Expect(err).ToNot(gomega.HaveOccurred())
// }
Comment on lines +269 to +285
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of leaving the workaround as a large commented-out code block, consider keeping it as live code and gating it behind an explicit toggle (e.g., a ginkgo flag or env var like E2E_DISABLE_KOPIA_POD_RESTART_WORKAROUND). That keeps the test intent clear, avoids accumulating dead code in the suite, and makes it easy to re-enable in CI without editing source again.

Copilot uses AI. Check for mistakes.
Comment on lines +269 to +285
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t make this experiment the default KOPIA path.

The code comment still documents this restart as a workaround for a known post-restore networking issue, so commenting it out here will make every non-CLI KOPIA job depend on cluster-specific OVN behavior again. If this goes past DNM, keep the current behavior as default and disable it only via an env var/label or a dedicated experimental lane.

One way to gate the experiment
- // TODO: Testing whether this workaround is still needed. Remove if
- // file-system restore tests pass without it.
- //
+ // TODO: Remove this once filesystem-restore tests are stable without the OVN workaround.
  // For file-system backup restores (KOPIA/restic), the restored pods may have
  // broken networking because OVN-Kubernetes doesn't fully wire the network
  // namespace for pods recreated by Velero with a restore-wait init container.
  // Deleting the pods lets the deployment controller create fresh ones with
  // proper networking while preserving the restored PVC data.
- // if brCase.BackupRestoreType == lib.KOPIA {
- // 	log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
- // 	err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
- // 		context.Background(),
- // 		metav1.DeleteOptions{},
- // 		metav1.ListOptions{LabelSelector: "e2e-app=true"},
- // 	)
- // 	gomega.Expect(err).ToNot(gomega.HaveOccurred())
- // }
+ if brCase.BackupRestoreType == lib.KOPIA && os.Getenv("OADP_DISABLE_FS_RESTORE_POD_RESTART_WORKAROUND") != "true" {
+ 	log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
+ 	err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
+ 		context.Background(),
+ 		metav1.DeleteOptions{},
+ 		metav1.ListOptions{LabelSelector: "e2e-app=true"},
+ 	)
+ 	gomega.Expect(err).ToNot(gomega.HaveOccurred())
+ }

As per coding guidelines, **: Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// TODO: Testing whether this workaround is still needed. Remove if
// file-system restore tests pass without it.
//
// For file-system backup restores (KOPIA/restic), the restored pods may have
// broken networking because OVN-Kubernetes doesn't fully wire the network
// namespace for pods recreated by Velero with a restore-wait init container.
// Deleting the pods lets the deployment controller create fresh ones with
// proper networking while preserving the restored PVC data.
if brCase.BackupRestoreType == lib.KOPIA {
log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
context.Background(),
metav1.DeleteOptions{},
metav1.ListOptions{LabelSelector: "e2e-app=true"},
)
gomega.Expect(err).ToNot(gomega.HaveOccurred())
}
// if brCase.BackupRestoreType == lib.KOPIA {
// log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
// err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
// context.Background(),
// metav1.DeleteOptions{},
// metav1.ListOptions{LabelSelector: "e2e-app=true"},
// )
// gomega.Expect(err).ToNot(gomega.HaveOccurred())
// }
// TODO: Remove this once filesystem-restore tests are stable without the OVN workaround.
// For file-system backup restores (KOPIA/restic), the restored pods may have
// broken networking because OVN-Kubernetes doesn't fully wire the network
// namespace for pods recreated by Velero with a restore-wait init container.
// Deleting the pods lets the deployment controller create fresh ones with
// proper networking while preserving the restored PVC data.
if brCase.BackupRestoreType == lib.KOPIA && os.Getenv("OADP_DISABLE_FS_RESTORE_POD_RESTART_WORKAROUND") != "true" {
log.Printf("Restarting pods in namespace %s to ensure proper networking after file-system restore", brCase.Namespace)
err = kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(
context.Background(),
metav1.DeleteOptions{},
metav1.ListOptions{LabelSelector: "e2e-app=true"},
)
gomega.Expect(err).ToNot(gomega.HaveOccurred())
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/backup_restore_suite_test.go` around lines 269 - 285, The
commented-out pod-restart workaround for KOPIA restores should be restored but
gated behind a configurable opt-in flag rather than enabled by default:
re-enable the block that checks if brCase.BackupRestoreType == lib.KOPIA and
calls
kubernetesClientForSuiteRun.CoreV1().Pods(brCase.Namespace).DeleteCollection(...),
but wrap it in a condition that reads an explicit environment variable or
test-case flag (e.g., ENABLE_KOPIA_RESTART or brCase.ExperimentalRestart bool)
that defaults to false; update test setup to read
os.Getenv("ENABLE_KOPIA_RESTART") (or the test harness config) and only perform
the DeleteCollection when that flag is true, and document the flag so jobs can
opt into the experimental behavior.


// Run optional custom verification
if brCase.PostRestoreVerify != nil {
Expand Down
Loading