More details on restore design #2151

Open
sseago wants to merge 1 commit into openshift:oadp-dev from sseago:restore-design

Conversation


@sseago sseago commented Apr 2, 2026

Why the changes were made

This PR updates the restore design for kubevirt datamover with more details.

How to test the changes made


coderabbitai bot commented Apr 2, 2026

Walkthrough

This pull request updates the KubeVirt datamover design document to expand on velero integration details, including new VM/PVC coordination annotations, restore plugin requirements, controller restore workflow enhancements, datamover pod operation sequencing, and backup metadata format extensions with per-checkpoint pvcSizes entries.

Changes

Cohort / File(s): Design Documentation (docs/design/kubevirt-datamover.md)
Summary: Extended velero integration section with new annotations (velerov1api.PVCNamespaceNameLabel, kubevirt-datamover-vm), updated restore plugin requirements for VirtualMachine and PVC RIA plugins, refined DataDownload reconciler workflow with explicit BSL metadata and temporary PVC/PV creation logic, detailed datamover pod sequential PVC processing with qcow2 handling and block device writes, and augmented backup metadata format with pvcSizes per checkpoint.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Comment @coderabbitai help to get the list of available commands and usage tips.


openshift-ci bot commented Apr 2, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sseago

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 2, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/kubevirt-datamover.md`:
- Around line 101-102: The doc's PVC sizing logic currently hardcodes a 10%
buffer when computing temp PVC size from qcow2 sizes and PVC sizes; update the
implementation (the PVC sizing calculation / any function named like
calculate_buffered_size or tempPVCSize computation in the datamover/controller)
to make the buffer percentage configurable and increase the default to a safer
value (e.g., 15–20%), add validation tests or notes documenting empirical
bounds, and implement runtime handling for insufficient-space errors by
detecting qemu-img or filesystem ENOSPC failures and retrying the operation
after increasing the PVC size (or failing with a clear actionable error). Ensure
the change is reflected in docs by replacing the hardcoded "10% buffer" text
with the new configurable/default value and a note about dynamic retry behavior.
- Line 118: The FIXME in docs/design/kubevirt-datamover.md about using the
unsafe `-u` flag for qemu-img rebase must be resolved: decide whether to use
`-u` (and under what conditions) by reviewing current KubeVirt incremental
backup docs and consulting KubeVirt SME/PRs, then replace the FIXME with a
definitive guidance sentence that states the chosen approach, the exact qemu-img
invocation pattern, and the safety rationale (when `-u` is allowed vs forbidden)
so implementers of the datamover know whether to include `-u` in the rebase step
and why.
- Line 126: Clarify cleanup by explicitly naming the temporary resources and
their lifecycle: state that the "qcow2 download PVC" (the temp PVC created for
the qcow2 download/staging) and its backing PV (referred to as the "qcow2
staging PV") are deleted after the datamover pod finishes and the qcow2 image is
converted; state that each "per-VM temp PVC" (the temp PVCs created for each VM
PVC) are also deleted after their corresponding VM disk has been copied to the
target PVC and the PV bind is completed; specify which PVs are retained (the
final restored PVs bound to the target PVCs) versus which PVs are removed (the
qcow2 staging PV and any PVs exclusively backing the temp PVCs), and add a
concise ordered lifecycle sequence (create qcow2 download PVC -> create per-VM
temp PVCs -> datamover copies/creates final PVCs -> datamover exits -> delete
per-VM temp PVCs and qcow2 download PVC -> leave restored PVs bound to final
PVCs).
- Around line 62-63: The document references a non-existent constant
velerov1api.PVCNamespaceNameLabel and mixes “label” vs “annotation”; update the
design to either (a) import/add a constant in Velero upstream only if agreed
with maintainers, or (b) define it locally in pkg/common/common.go (e.g.,
PVCNamespaceNameAnnotation) and use that symbol in the doc; rename the
identifier to remove “Label” if it will be an annotation, update the
VirtualMachine annotation description to state its exact key/name, expected
value format (e.g., "<namespace>/<name>" or just "<namespace>"), and how it is
used to compute the ConfigMap label selector during restore so the
implementation (OperationID/async progress/DU status) can rely on a concrete
annotation key and value shape.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 36d9a07b-999a-41a6-8690-0d2f27272d0e

📥 Commits

Reviewing files that changed from the base of the PR and between 3cceb1e and 5ed54cf.

📒 Files selected for processing (1)
  • docs/design/kubevirt-datamover.md

Comment on lines +62 to 63
- Add `velerov1api.PVCNamespaceNameLabel` annotation to VirtualMachine (doesn't need to be a label, since we're just using it to figure out what label selector to use for the ConfigMap on restore).
- OperationID will be created and returned similar to what's done with the CSI PVC plugin, and the async operation Progress method will report on progress based on the DU status (similar to CSI PVC plugin)
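The annotation-to-selector flow in the bullets above can be sketched as follows. This is a hypothetical illustration only: the annotation key and the selector label keys are assumptions (the design has not fixed them, and the existence of `velerov1api.PVCNamespaceNameLabel` is questioned in the review comments on this page); the value is assumed to be `"<namespace>/<name>"`.

```python
# Assumed annotation key; not an existing Velero or OADP constant.
PVC_NAMESPACE_NAME_ANNOTATION = "oadp.openshift.io/pvc-namespace-name"

def configmap_label_selector(vm_annotations: dict[str, str]) -> str:
    """Derive a ConfigMap label selector from the VM annotation on restore.

    Assumes the annotation value has the shape "<namespace>/<name>"; the
    selector label keys below are illustrative, not fixed by the design.
    """
    value = vm_annotations[PVC_NAMESPACE_NAME_ANNOTATION]
    namespace, name = value.split("/", 1)
    return f"pvc-namespace={namespace},pvc-name={name}"
```

A restore plugin would read this annotation off the VirtualMachine and use the resulting selector to locate the per-VM ConfigMap, which is why the review below asks for the key name and value shape to be pinned down.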

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Search for PVCNamespaceNameLabel definition in Velero API references

# Search for PVCNamespaceNameLabel in go files
rg -n "PVCNamespaceNameLabel" --type=go

# Also check if it's imported from velero
rg -n "velerov1api\." --type=go -A2 -B2 | rg -i "pvc.*label"

Repository: openshift/oadp-operator

Length of output: 49


🏁 Script executed:

# Search for similar label/annotation constants in the codebase
rg "Label\s*=" --type=go | head -20

# Search for velerov1api imports to understand what's available
rg "velerov1api\." --type=go -B5 | head -40

# Check pkg/common/common.go for existing constants
fd "common.go" --type=f | head -5

Repository: openshift/oadp-operator

Length of output: 1220


🏁 Script executed:

# Read pkg/common/common.go to see existing constants
cat -n pkg/common/common.go

Repository: openshift/oadp-operator

Length of output: 15794


🏁 Script executed:

# Search for how other Velero annotations are referenced and used
rg "velerov1\." --type=go | grep -i "annotation\|label" | head -20

Repository: openshift/oadp-operator

Length of output: 586


Clarify and verify the PVCNamespaceNameLabel annotation.

The constant velerov1api.PVCNamespaceNameLabel is referenced in the design document but does not exist anywhere in the codebase or Velero v1 API references found here. Additionally, the annotation name includes "Label" while the comment states it "doesn't need to be a label"—this contradiction is confusing.

Before implementation, clarify:

  1. Whether PVCNamespaceNameLabel needs to be added to Velero upstream or defined locally in pkg/common/common.go
  2. Why the name includes "Label" if it's an annotation, not a label
  3. The precise purpose and structure of this annotation for ConfigMap label selector resolution

Comment on lines +101 to +102
- PVC size based on the size of the qcow2 files in BSL needed for restore as well as the PVC sizes
- For each PVC, calculate the sum of all qcow2 files added to the PVC size, and then add 10% as a buffer. If there are multiple PVCs, take the max value, as we can process one PVC at a time, so we don't need to hold files for all PVCs on the temp disk at the same time.
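The sizing rule quoted above (per PVC, sum the qcow2 file sizes plus the PVC size, add a 10% buffer, then take the max across PVCs since they are processed one at a time) can be sketched as follows. The function name and parameters are illustrative, not from the design.

```python
def temp_pvc_size(pvc_qcow2_sizes: dict[str, list[int]],
                  pvc_sizes: dict[str, int],
                  buffer_pct: float = 0.10) -> int:
    """Compute the staging PVC size in bytes per the rule described above.

    pvc_qcow2_sizes maps each PVC name to the byte sizes of the qcow2
    files needed to restore it; pvc_sizes maps PVC name to PVC size.
    """
    candidates = []
    for pvc, qcow2_sizes in pvc_qcow2_sizes.items():
        # Space needed while restoring this one PVC, plus the buffer.
        needed = sum(qcow2_sizes) + pvc_sizes[pvc]
        candidates.append(int(needed * (1 + buffer_pct)))
    # PVCs are processed sequentially, so the max single-PVC need suffices.
    return max(candidates)
```

The review comment below questions whether the 10% buffer is sufficient; making `buffer_pct` configurable, as sketched here, is one way to address that.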

⚠️ Potential issue | 🟠 Major

Validate the 10% buffer sizing calculation.

The sizing calculation adds a 10% buffer to the sum of qcow2 files and PVC sizes. This buffer percentage appears arbitrary and may be insufficient in edge cases, such as:

  • Highly compressed qcow2 files that expand significantly during conversion
  • Temporary intermediate files created during the qemu-img operations
  • Filesystem overhead on the temporary PVC

Consider whether:

  1. The 10% buffer has been empirically validated with real-world VM backup scenarios
  2. A larger or dynamic buffer (e.g., 15-20%) would be safer
  3. The controller should detect and handle "insufficient space" errors by automatically retrying with a larger PVC

Would you like me to research typical qcow2 expansion ratios or generate logic to handle PVC resize scenarios?

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 101-101: Inconsistent indentation for list items at the same level
Expected: 4; Actual: 1

(MD005, list-indent)


- Process one PVC at a time:
- Download all required qcow2 files for this PVC.
- Call `qemu-img rebase -b fullbackup.qcow2 -f qcow2 -u incrementalN.qcow2` for each incremental backup, in order
- FIXME: Should we use the unsafe `-u` flag? kubevirt docs suggested this, but we should confirm that this is still appropriate.
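The per-incremental rebase step above can be sketched as a command builder. The filenames and the unsafe `-u` flag mirror the draft text exactly; since the FIXME about `-u` is still open, treat this as a sketch of the draft's current approach, not settled guidance.

```python
def rebase_commands(full_backup: str, incrementals: list[str],
                    unsafe: bool = True) -> list[list[str]]:
    """Build the qemu-img rebase invocations described in the step above.

    One command per incremental, applied in backup order. With unsafe=True
    this emits `-u`, which only rewrites the backing-file pointer without
    checking or copying backing data (the open FIXME in the design).
    """
    cmds = []
    for inc in incrementals:  # must be applied in backup order
        cmd = ["qemu-img", "rebase", "-b", full_backup, "-f", "qcow2"]
        if unsafe:
            cmd.append("-u")
        cmd.append(inc)
        cmds.append(cmd)
    return cmds
```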

⚠️ Potential issue | 🟠 Major

Resolve the FIXME about the unsafe -u flag before implementation.

The FIXME comment indicates uncertainty about whether to use the unsafe -u flag with qemu-img rebase. This is a critical decision point that affects data integrity during restore operations.

The unsafe flag bypasses backing file format consistency checks. Using it incorrectly could result in:

  • Corrupted restored VM disks
  • Silent data loss if the backing chain is inconsistent
  • Difficult-to-debug restore failures

This design decision must be resolved before implementation proceeds. Consult with KubeVirt subject matter experts or review the latest KubeVirt incremental backup documentation to confirm the recommended approach.

Would you like me to search the KubeVirt enhancement proposal and related issues to find guidance on this flag usage?


- Delete the raw file
- Note that the various `qemu-img` actions might eventually be combined into a single kubevirt API call, but for the moment this would need to be done manually.
- Once datamover pod has restored the VM disks, it will exit and the VirtualMachine can launch with these disks (following the velero datamover model where the temporary PVC is deleted, retaining the PV, which then binds to the VM's PVCs may work here). The temporary PVC (containing the qcow2 images, not the restored VM disk image) should be completely removed at this point, including PV content.
- Once datamover pod has restored the VM disks, it will exit, the temporary PVCs will be deleted, leaving the restored PVs but deleting the qcow2 staging PV.
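The cleanup rule the bullets above describe can be sketched as a simple partition: after the datamover pod exits, the qcow2 staging resources (temp PVC and its backing PV) are deleted, while PVs holding restored VM disks are retained to bind to the VM's PVCs. The role names and fields here are illustrative, not an actual API; the review below asks for the full lifecycle to be spelled out.

```python
from dataclasses import dataclass

@dataclass
class Volume:
    name: str
    role: str  # assumed roles: "qcow2-staging" or "restored-disk"

def partition_for_cleanup(volumes: list[Volume]) -> tuple[list[str], list[str]]:
    """Split post-restore volumes into (delete, retain) per the rule above."""
    delete = [v.name for v in volumes if v.role == "qcow2-staging"]
    retain = [v.name for v in volumes if v.role == "restored-disk"]
    return delete, retain
```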

⚠️ Potential issue | 🟡 Minor

Clarify the cleanup logic for temporary resources.

The statement "temporary PVCs will be deleted, leaving the restored PVs but deleting the qcow2 staging PV" is unclear about the distinction between:

  1. Temporary PVCs used for qcow2 download (created at line 99)
  2. Temporary PVCs for each VM PVC (created at line 103)
  3. The "qcow2 staging PV" mentioned here

Please clarify:

  • Which specific PVCs/PVs are deleted vs. retained
  • Whether "qcow2 staging PV" refers to the PV backing the temp PVC from line 99
  • The lifecycle of all temporary resources created during restore

This ambiguity could lead to resource leaks or incorrect cleanup during implementation.



openshift-ci bot commented Apr 2, 2026

@sseago: all tests passed!

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.
