
Allow parallel execution of NAS backup and delete commands #12847

Open
jmsperu wants to merge 1 commit into apache:4.20 from jmsperu:fix/nasbackup-parallel-execution

jmsperu commented Mar 17, 2026

Summary

  • Change the return value of executeInSequence() from true to false in TakeBackupCommand and DeleteBackupCommand
  • Allows the KVM agent to process multiple backup/delete operations concurrently via its existing worker thread pool
  • RestoreBackupCommand and PrepareForBackupRestorationCommand remain sequential (they modify VM state)
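The resulting split can be modelled in a few lines. This is a hedged sketch, not CloudStack source: the abstract Command base class below is a simplified stand-in, and only the executeInSequence() overrides described in the summary are reproduced.

```java
// Simplified stand-in for the agent command base class (assumption, not CloudStack code).
abstract class Command {
    /** true = the agent must run this command in order with other sequential commands. */
    public abstract boolean executeInSequence();
}

// Backup creation: each VM is backed up independently, so it may run concurrently (this PR).
class TakeBackupCommand extends Command {
    @Override
    public boolean executeInSequence() {
        return false;
    }
}

// Backup deletion: also independent, so it may run concurrently (this PR).
class DeleteBackupCommand extends Command {
    @Override
    public boolean executeInSequence() {
        return false;
    }
}

// Restore modifies VM state, so it stays sequential.
class RestoreBackupCommand extends Command {
    @Override
    public boolean executeInSequence() {
        return true;
    }
}

// Preparation for restore also touches VM state, so it stays sequential.
class PrepareForBackupRestorationCommand extends Command {
    @Override
    public boolean executeInSequence() {
        return true;
    }
}
```

A dispatcher only needs to read this flag; nothing else about the command objects changes.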

Motivation

Currently all backup commands are serialized on the agent — a large VM backup (e.g. 100+ GB taking 2+ hours) blocks all other backup and delete operations on the same host. This is the root cause of backup schedule delays and timeouts in environments with many VMs per host.

Each backup operation:

  • Mounts its own temporary NFS directory (mktemp -d -t csbackup.XXXXX)
  • Operates on independent VM disks via separate QEMU block jobs
  • Has no shared state with other backup operations
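The first point is what rules out mount conflicts: every backup gets a fresh, uniquely named directory. As a rough Java analogue of mktemp -d -t csbackup.XXXXX, Files.createTempDirectory atomically creates a new directory with a random suffix on each call. The helper class name is hypothetical; the source describes this step as a mktemp call, not Java code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: Java analogue of `mktemp -d -t csbackup.XXXXX`.
class BackupMountPoints {
    /** Creates a new, uniquely named directory under the system temp dir. */
    static Path newMountPoint() throws IOException {
        // Each call yields a distinct path such as /tmp/csbackup.<random>,
        // so two concurrent backups never share a mount point.
        return Files.createTempDirectory("csbackup.");
    }
}
```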

There is no technical reason to serialize these operations. The agent already has a thread pool (requestHandler) that can execute multiple commands concurrently; this change simply allows backup commands to use it.
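To illustrate what the flag controls, here is a hypothetical dispatcher that routes commands with executeInSequence() == true to a single-threaded lane and everything else to a fixed worker pool. It is a simplified model under that assumption, not the actual agent or requestHandler code; CommandDispatcher and its two-lane design are invented for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical two-lane dispatcher (illustrative model, not CloudStack's agent).
class CommandDispatcher implements AutoCloseable {
    // Sequential commands share this single thread, so they never overlap.
    private final ExecutorService sequentialLane = Executors.newSingleThreadExecutor();
    // Non-sequential commands may run side by side on these workers.
    private final ExecutorService workerPool;

    CommandDispatcher(int workers) {
        this.workerPool = Executors.newFixedThreadPool(workers);
    }

    /** Submits work to the lane selected by the command's sequencing flag. */
    <T> Future<T> submit(boolean executeInSequence, Callable<T> work) {
        ExecutorService lane = executeInSequence ? sequentialLane : workerPool;
        return lane.submit(work);
    }

    @Override
    public void close() throws InterruptedException {
        sequentialLane.shutdown();
        workerPool.shutdown();
        sequentialLane.awaitTermination(1, TimeUnit.MINUTES);
        workerPool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Under this model, a long backup submitted with the old flag value (true) occupies the only sequential thread, so every later sequential command waits behind it; with false it occupies one pool worker and other backups proceed on the remaining workers.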

Impact

  • Hosts with 10+ VMs will see significantly faster backup completion (backups run in parallel instead of queuing)
  • NFS bandwidth is shared across concurrent backups (it can be limited with the -b bandwidth throttle flag from PR #12846, "nasbackup.sh: add bandwidth throttle via -b flag")
  • No change to management server scheduling — it already submits backups as independent async jobs
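The queuing effect can be estimated with a simple list-scheduling model. This is an assumption-laden sketch: jobs start in submission order on the first free worker, and each job's duration is assumed not to change when others run alongside it. Because concurrent backups share NFS bandwidth, real parallel finish times will be longer, so treat the parallel figures as optimistic. The class names and durations are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical back-of-envelope model; ignores bandwidth contention.
class BackupQueueModel {
    /** Returns each job's finish time, in the same units as the durations. */
    static long[] finishTimes(long[] durations, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("need at least one worker");
        }
        PriorityQueue<Long> freeAt = new PriorityQueue<>(); // when each worker becomes idle
        for (int i = 0; i < workers; i++) {
            freeAt.add(0L);
        }
        long[] finish = new long[durations.length];
        for (int i = 0; i < durations.length; i++) {
            long start = freeAt.poll(); // earliest available worker
            finish[i] = start + durations[i];
            freeAt.add(finish[i]);
        }
        return finish;
    }
}

class BackupQueueModelDemo {
    public static void main(String[] args) {
        long[] minutes = {120, 10, 10, 10}; // hypothetical backup durations
        System.out.println(Arrays.toString(BackupQueueModel.finishTimes(minutes, 1))); // [120, 130, 140, 150]
        System.out.println(Arrays.toString(BackupQueueModel.finishTimes(minutes, 4))); // [120, 10, 10, 10]
    }
}
```

With one sequential lane, the three short backups finish at 130, 140 and 150 minutes because they queue behind the 120-minute one; with four workers each finishes at 10 minutes, before accounting for shared bandwidth.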

Test plan

  • Schedule 3+ VM backups at the same time on one host — verify they run concurrently (check virsh domjobinfo on multiple VMs)
  • Verify each backup gets its own mount point (no mount conflicts)
  • Run backup + delete concurrently — verify no interference
  • Verify restore operations still execute sequentially
  • Monitor host I/O during concurrent backups; consider using the -b bandwidth throttle if NFS saturates
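For a unit-level analogue of the first and fourth checks, a small probe can record how many wrapped tasks were running at once: a peak above 1 shows overlap, and a peak of exactly 1 shows sequential execution. ConcurrencyProbe is a hypothetical helper, not part of CloudStack; the manual checks with virsh domjobinfo remain the end-to-end test.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test helper that tracks peak task overlap.
class ConcurrencyProbe {
    private final AtomicInteger active = new AtomicInteger(); // tasks running right now
    private final AtomicInteger peak = new AtomicInteger();   // highest value of `active` seen

    /** Wraps a task so its start and end update the counters. */
    <T> Callable<T> wrap(Callable<T> task) {
        return () -> {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            try {
                return task.call();
            } finally {
                active.decrementAndGet();
            }
        };
    }

    int peak() {
        return peak.get();
    }
}
```

Submit wrapped backup-style tasks to a multi-threaded executor and expect a peak greater than 1; submit wrapped restore-style tasks to a single-threaded executor and expect a peak of exactly 1.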

Change executeInSequence() to return false for TakeBackupCommand
and DeleteBackupCommand, allowing the KVM agent to process multiple
backup/delete operations concurrently via its worker thread pool.

Previously, all backup commands were serialized — a large VM backup
(e.g. 100+ GB taking 2+ hours) would block all other backup and
delete operations on the same host. Since each backup mounts its
own temporary NFS directory and operates on independent VM disks,
there is no shared state requiring serialization.

Restore and PrepareForBackupRestoration commands remain sequential
as they modify VM state that should not be concurrent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 16.25%. Comparing base (61afb4c) to head (81f5dd2).

Files with missing lines | Patch % | Lines
.../apache/cloudstack/backup/DeleteBackupCommand.java | 0.00% | 1 Missing ⚠️
...rg/apache/cloudstack/backup/TakeBackupCommand.java | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.20   #12847   +/-   ##
=========================================
  Coverage     16.24%   16.25%           
  Complexity    13411    13411           
=========================================
  Files          5664     5664           
  Lines        500463   500463           
  Branches      60779    60779           
=========================================
+ Hits          81308    81333   +25     
+ Misses       410059   410035   -24     
+ Partials       9096     9095    -1     
Flag | Coverage Δ
uitests | 4.15% <ø> (ø)
unittests | 17.10% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.



Copilot AI left a comment


Pull request overview

This PR changes CloudStack backup agent command scheduling by making the NAS backup take and delete commands eligible for parallel execution on the agent (instead of being forced through the single-threaded “in-sequence” queue).

Changes:

  • Make TakeBackupCommand.executeInSequence() return false.
  • Make DeleteBackupCommand.executeInSequence() return false.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
core/src/main/java/org/apache/cloudstack/backup/TakeBackupCommand.java | Marks backup creation command as non-sequential (parallelizable) on the agent.
core/src/main/java/org/apache/cloudstack/backup/DeleteBackupCommand.java | Marks backup deletion command as non-sequential (parallelizable) on the agent.


Change (identical in TakeBackupCommand.java and DeleteBackupCommand.java):

 @Override
 public boolean executeInSequence() {
-    return true;
+    return false;
 }

3 participants