Skip to content

virtio: convert device disable to async poll-based pattern#2850

Merged
jstarks merged 6 commits intomicrosoft:mainfrom
jstarks:poll_disable
Mar 10, 2026
Merged

virtio: convert device disable to async poll-based pattern#2850
jstarks merged 6 commits intomicrosoft:mainfrom
jstarks:poll_disable

Conversation

@jstarks
Copy link
Copy Markdown
Member

@jstarks jstarks commented Feb 27, 2026

Summary

Convert the VirtioDevice::disable() method from a synchronous fire-and-forget pattern to an async poll-based poll_disable(), enabling transports to properly track device shutdown and report completion to the guest.

Motivation

The previous disable() implementation had each device spawn a detached task to stop its workers. These detached tasks could outlive the device and there was no way for the transport to know when shutdown was actually complete. The guest saw status=0 immediately regardless of whether workers had finished draining.

Changes

task_control: add poll_stop() method — Extracts the polling logic from stop() into a new poll_stop() that returns Poll<bool>, then rewrites stop() as poll_fn(|cx| self.poll_stop(cx)).await. This gives devices a non-async way to drive worker shutdown from poll_disable(). Includes unit tests.

virtio: change VirtioDevice::disable() to poll_disable() — Changes the trait method signature to fn poll_disable(&mut self, cx: &mut Context<'_>) -> Poll<()>. Device implementations (virtio_net, virtiofs, virtio_pmem, virtio_p9, TestDevice) now use TaskControl::poll_stop() directly instead of spawning detached tasks. Removes unused pal_async and futures dependencies from several crates.

virtio: implement PollDevice for async disable in transports — Both MMIO and PCI transports now implement PollDevice to drive pending disables via the chipset polling infrastructure:

  • On guest status=0 write: initiates poll_disable with a noop waker
  • Fast path: if the device completes synchronously, clears status immediately
  • Slow path: sets a disabling flag and returns — the guest sees non-zero status until PollDevice::poll_device drives it to completion
  • Implements ChangeDeviceState::reset() (previously a TODO) to async-drive poll_disable and reset all transport state

virtio: remove Drop impls from transports — The Drop-based poll_disable with a noop waker could initiate but never drive shutdown to completion. Removed in favor of proper async teardown via reset().

Files changed

Area Files
task_control support/task_control/src/lib.rs
virtio core vm/devices/virtio/virtio/src/common.rs, vm/devices/virtio/virtio/src/tests.rs
transports vm/devices/virtio/virtio/src/transport/mmio.rs, vm/devices/virtio/virtio/src/transport/pci.rs
devices vm/devices/virtio/virtio_net/src/lib.rs, vm/devices/virtio/virtio_p9/src/lib.rs, vm/devices/virtio/virtio_pmem/src/lib.rs, vm/devices/virtio/virtiofs/src/virtio.rs

@github-actions github-actions Bot added the unsafe Related to unsafe code label Feb 27, 2026
@github-actions
Copy link
Copy Markdown

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

@jstarks jstarks marked this pull request as ready for review March 2, 2026 04:01
@jstarks jstarks requested review from a team as code owners March 2, 2026 04:01
Copilot AI review requested due to automatic review settings March 2, 2026 04:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors virtio device shutdown from a synchronous disable() to a poll-driven poll_disable() so transports can keep guest-visible device status non-zero until device shutdown is actually complete, and it adds a TaskControl::poll_stop() helper to enable non-async shutdown driving.

Changes:

  • Add TaskControl::poll_stop() (and refactor stop() to use it), including unit tests.
  • Replace VirtioDevice::disable() with VirtioDevice::poll_disable() and update multiple virtio device implementations accordingly.
  • Update virtio MMIO/PCI transports to asynchronously drive disables via PollDevice, and implement ChangeDeviceState::reset() to fully reset transport/device state.

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
support/task_control/src/lib.rs Introduces poll_stop() and rewrites stop() to use poll-based driving; adds tests.
vm/devices/virtio/virtio/src/common.rs Updates the VirtioDevice trait to poll_disable() with documentation.
vm/devices/virtio/virtio/src/transport/mmio.rs Adds disable state tracking and PollDevice integration for async reset/disable.
vm/devices/virtio/virtio/src/transport/pci.rs Adds disable state tracking and PollDevice integration for async reset/disable.
vm/devices/virtio/virtio/src/tests.rs Updates the test device implementation to the new poll_disable() contract.
vm/devices/virtio/virtio_net/src/lib.rs Converts device shutdown hook to poll_disable() (currently synchronous completion).
vm/devices/virtio/virtio_p9/src/lib.rs Converts shutdown logic to poll_disable() and removes detached-task shutdown.
vm/devices/virtio/virtio_p9/Cargo.toml Removes unused async/spawn dependency.
vm/devices/virtio/virtio_pmem/src/lib.rs Converts shutdown logic to poll_disable() and removes detached-task shutdown.
vm/devices/virtio/virtio_pmem/Cargo.toml Removes unused async/spawn dependency.
vm/devices/virtio/virtiofs/src/virtio.rs Converts shutdown logic to poll_disable() and removes detached-task shutdown.
vm/devices/virtio/virtiofs/Cargo.toml Removes unused futures/spawn dependencies.
Cargo.lock Reflects dependency removals from the affected crates.

Comment thread vm/devices/virtio/virtio_pmem/src/lib.rs
Comment on lines +131 to +136
fn poll_disable(&mut self, cx: &mut Context<'_>) -> Poll<()> {
if let Some(worker) = &mut self.worker {
ready!(worker.poll_stop(cx));
}
self.worker = None;
Poll::Ready(())
Copy link

Copilot AI Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous disable() notified exit_event before stopping the worker. Since VirtioQueueWorker uses the exit_event to exit its run loop, stopping without notifying can cancel in-flight processing rather than allowing a graceful drain/exit.

Notify self.exit_event.notify(usize::MAX) when beginning poll_disable() (ideally once per disable) before polling worker.poll_stop(cx).

Copilot uses AI. Check for mistakes.
Comment thread vm/devices/virtio/virtio/src/tests.rs
Comment thread vm/devices/virtio/virtio/src/transport/mmio.rs
Comment thread vm/devices/virtio/virtio/src/transport/pci.rs
Comment thread vm/devices/virtio/virtiofs/src/virtio.rs
Brian-Perkins
Brian-Perkins previously approved these changes Mar 9, 2026
jstarks added 5 commits March 10, 2026 02:22
Add TaskControl::poll_stop() as the poll-based variant of stop(). Returns
Poll::Ready(true) when the task was running and has stopped,
Poll::Ready(false) if already stopped/not inserted/completed, and
Poll::Pending while waiting for the task to stop.

Rewrite stop() as poll_fn(|cx| self.poll_stop(cx)).await.

This is a prerequisite for virtio poll_disable, where devices need a
non-async way to drive worker shutdown from poll_disable().
Change the VirtioDevice trait method from sync fn disable(&mut self) to
fn poll_disable(&mut self, cx: &mut Context<'_>) -> Poll<()>.

Device implementations now use TaskControl::poll_stop() to stop workers
instead of spawning detached fire-and-forget tasks. This eliminates the
spawn+detach pattern that could outlive the device.

Updated implementations:
- virtio-net: sends CoordinatorMessage::Disable, returns Ready (unchanged
  semantics -- coordinator handles cleanup internally)
- virtiofs: poll_stop on all queue workers
- virtio-pmem: poll_stop on single worker
- virtio-9p: poll_stop on single worker
- TestDevice: poll_stop on all workers

Transport call sites (PCI and MMIO) use a noop waker to initiate the
disable, maintaining the same fire-and-forget behavior for now. Proper
async completion via PollDevice will be added in a follow-up.

Removed now-unused dependencies: pal_async from virtio_p9, virtio_pmem,
and virtiofs; futures from virtiofs.
Add PollDevice support to both MMIO and PCI virtio transports to enable
asynchronous device disabling. When a guest writes status=0 to trigger a
device reset, the transport now:

1. Initiates poll_disable with a noop waker
2. If the device completes synchronously (fast path), clears status
   immediately
3. If pending, sets a 'disabling' flag and returns -- the guest sees the
   old status bits (non-zero) until PollDevice drives completion

The PollDevice::poll_device implementation drives the pending disable to
completion via the chipset infrastructure's polling mechanism. Once ready,
it clears the device status to 0 so the guest can observe the reset.

Also implements ChangeDeviceState::reset() for both transports (previously
TODO), which drives poll_disable to completion and resets all transport
state. Updates Drop to only poll_disable when the device was actually
started or is in the middle of disabling.
The poll_disable calls in Drop were a half-measure -- polling with a noop
waker initiates shutdown but cannot drive it to completion. Proper async
teardown will require a different mechanism (e.g., an async method that
takes ownership). Remove the Drop impls to keep things clean in the
meantime.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Comment thread vm/devices/virtio/virtio/src/transport/pci.rs
Comment on lines +350 to 355
fn poll_disable(&mut self, _cx: &mut Context<'_>) -> Poll<()> {
if let Some(send) = self.coordinator_send.take() {
send.send(CoordinatorMessage::Disable);
}
Poll::Ready(())
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

virtio_net's new poll_disable() returns Poll::Ready(()) immediately after sending CoordinatorMessage::Disable, without polling/waiting for the coordinator + queue workers to actually stop. This appears to violate the new VirtioDevice::poll_disable contract (and the PR goal of keeping status non-zero until shutdown completes), since work may still be running after the transport considers the reset complete. Suggestion: have poll_disable() drive shutdown completion (e.g., by polling worker/coordinator shutdown to completion, or introducing an explicit completion/ack mechanism from the coordinator) and return Poll::Pending until all background tasks have actually stopped.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Known, will handle in future change.

Copy link
Copy Markdown
Contributor

@mattkur mattkur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo.lock LGTM

@jstarks jstarks merged commit 2fd7cea into microsoft:main Mar 10, 2026
60 checks passed
@jstarks jstarks deleted the poll_disable branch March 10, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

unsafe Related to unsafe code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants