virtio: convert device disable to async poll-based pattern#2850
virtio: convert device disable to async poll-based pattern#2850jstarks merged 6 commits intomicrosoft:mainfrom
Conversation
|
This PR modifies files containing For more on why we check whole files, instead of just diffs, check out the Rustonomicon |
There was a problem hiding this comment.
Pull request overview
This PR refactors virtio device shutdown from a synchronous disable() to a poll-driven poll_disable() so transports can keep guest-visible device status non-zero until device shutdown is actually complete, and it adds a TaskControl::poll_stop() helper to enable non-async shutdown driving.
Changes:
- Add
TaskControl::poll_stop()(and refactorstop()to use it), including unit tests. - Replace
VirtioDevice::disable()withVirtioDevice::poll_disable()and update multiple virtio device implementations accordingly. - Update virtio MMIO/PCI transports to asynchronously drive disables via
PollDevice, and implementChangeDeviceState::reset()to fully reset transport/device state.
Reviewed changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| support/task_control/src/lib.rs | Introduces poll_stop() and rewrites stop() to use poll-based driving; adds tests. |
| vm/devices/virtio/virtio/src/common.rs | Updates the VirtioDevice trait to poll_disable() with documentation. |
| vm/devices/virtio/virtio/src/transport/mmio.rs | Adds disable state tracking and PollDevice integration for async reset/disable. |
| vm/devices/virtio/virtio/src/transport/pci.rs | Adds disable state tracking and PollDevice integration for async reset/disable. |
| vm/devices/virtio/virtio/src/tests.rs | Updates the test device implementation to the new poll_disable() contract. |
| vm/devices/virtio/virtio_net/src/lib.rs | Converts device shutdown hook to poll_disable() (currently synchronous completion). |
| vm/devices/virtio/virtio_p9/src/lib.rs | Converts shutdown logic to poll_disable() and removes detached-task shutdown. |
| vm/devices/virtio/virtio_p9/Cargo.toml | Removes unused async/spawn dependency. |
| vm/devices/virtio/virtio_pmem/src/lib.rs | Converts shutdown logic to poll_disable() and removes detached-task shutdown. |
| vm/devices/virtio/virtio_pmem/Cargo.toml | Removes unused async/spawn dependency. |
| vm/devices/virtio/virtiofs/src/virtio.rs | Converts shutdown logic to poll_disable() and removes detached-task shutdown. |
| vm/devices/virtio/virtiofs/Cargo.toml | Removes unused futures/spawn dependencies. |
| Cargo.lock | Reflects dependency removals from the affected crates. |
| fn poll_disable(&mut self, cx: &mut Context<'_>) -> Poll<()> { | ||
| if let Some(worker) = &mut self.worker { | ||
| ready!(worker.poll_stop(cx)); | ||
| } | ||
| self.worker = None; | ||
| Poll::Ready(()) |
There was a problem hiding this comment.
Previous disable() notified exit_event before stopping the worker. Since VirtioQueueWorker uses the exit_event to exit its run loop, stopping without notifying can cancel in-flight processing rather than allowing a graceful drain/exit.
Notify self.exit_event.notify(usize::MAX) when beginning poll_disable() (ideally once per disable) before polling worker.poll_stop(cx).
Add TaskControl::poll_stop() as the poll-based variant of stop(). Returns Poll::Ready(true) when the task was running and has stopped, Poll::Ready(false) if already stopped/not inserted/completed, and Poll::Pending while waiting for the task to stop. Rewrite stop() as poll_fn(|cx| self.poll_stop(cx)).await. This is a prerequisite for virtio poll_disable, where devices need a non-async way to drive worker shutdown from poll_disable().
Change the VirtioDevice trait method from sync fn disable(&mut self) to fn poll_disable(&mut self, cx: &mut Context<'_>) -> Poll<()>. Device implementations now use TaskControl::poll_stop() to stop workers instead of spawning detached fire-and-forget tasks. This eliminates the spawn+detach pattern that could outlive the device. Updated implementations: - virtio-net: sends CoordinatorMessage::Disable, returns Ready (unchanged semantics -- coordinator handles cleanup internally) - virtiofs: poll_stop on all queue workers - virtio-pmem: poll_stop on single worker - virtio-9p: poll_stop on single worker - TestDevice: poll_stop on all workers Transport call sites (PCI and MMIO) use a noop waker to initiate the disable, maintaining the same fire-and-forget behavior for now. Proper async completion via PollDevice will be added in a follow-up. Removed now-unused dependencies: pal_async from virtio_p9, virtio_pmem, and virtiofs; futures from virtiofs.
Add PollDevice support to both MMIO and PCI virtio transports to enable asynchronous device disabling. When a guest writes status=0 to trigger a device reset, the transport now: 1. Initiates poll_disable with a noop waker 2. If the device completes synchronously (fast path), clears status immediately 3. If pending, sets a 'disabling' flag and returns -- the guest sees the old status bits (non-zero) until PollDevice drives completion The PollDevice::poll_device implementation drives the pending disable to completion via the chipset infrastructure's polling mechanism. Once ready, it clears the device status to 0 so the guest can observe the reset. Also implements ChangeDeviceState::reset() for both transports (previously TODO), which drives poll_disable to completion and resets all transport state. Updates Drop to only poll_disable when the device was actually started or is in the middle of disabling.
The poll_disable calls in Drop were a half-measure -- polling with a noop waker initiates shutdown but cannot drive it to completion. Proper async teardown will require a different mechanism (e.g., an async method that takes ownership). Remove the Drop impls to keep things clean in the meantime.
| fn poll_disable(&mut self, _cx: &mut Context<'_>) -> Poll<()> { | ||
| if let Some(send) = self.coordinator_send.take() { | ||
| send.send(CoordinatorMessage::Disable); | ||
| } | ||
| Poll::Ready(()) | ||
| } |
There was a problem hiding this comment.
virtio_net's new poll_disable() returns Poll::Ready(()) immediately after sending CoordinatorMessage::Disable, without polling/waiting for the coordinator + queue workers to actually stop. This appears to violate the new VirtioDevice::poll_disable contract (and the PR goal of keeping status non-zero until shutdown completes), since work may still be running after the transport considers the reset complete. Suggestion: have poll_disable() drive shutdown completion (e.g., by polling worker/coordinator shutdown to completion, or introducing an explicit completion/ack mechanism from the coordinator) and return Poll::Pending until all background tasks have actually stopped.
There was a problem hiding this comment.
Known, will handle in future change.
Summary
Convert the
VirtioDevice::disable()method from a synchronous fire-and-forget pattern to an async poll-basedpoll_disable(), enabling transports to properly track device shutdown and report completion to the guest.Motivation
The previous
disable()implementation had each device spawn a detached task to stop its workers. These detached tasks could outlive the device and there was no way for the transport to know when shutdown was actually complete. The guest saw status=0 immediately regardless of whether workers had finished draining.Changes
task_control: addpoll_stop()method — Extracts the polling logic fromstop()into a newpoll_stop()that returnsPoll<bool>, then rewritesstop()aspoll_fn(|cx| self.poll_stop(cx)).await. This gives devices a non-async way to drive worker shutdown frompoll_disable(). Includes unit tests.virtio: changeVirtioDevice::disable()topoll_disable()— Changes the trait method signature tofn poll_disable(&mut self, cx: &mut Context<'_>) -> Poll<()>. Device implementations (virtio_net,virtiofs,virtio_pmem,virtio_p9,TestDevice) now useTaskControl::poll_stop()directly instead of spawning detached tasks. Removes unusedpal_asyncandfuturesdependencies from several crates.virtio: implementPollDevicefor async disable in transports — Both MMIO and PCI transports now implementPollDeviceto drive pending disables via the chipset polling infrastructure:poll_disablewith a noop wakerdisablingflag and returns — the guest sees non-zero status untilPollDevice::poll_devicedrives it to completionChangeDeviceState::reset()(previously a TODO) to async-drivepoll_disableand reset all transport statevirtio: removeDropimpls from transports — TheDrop-basedpoll_disablewith a noop waker could initiate but never drive shutdown to completion. Removed in favor of proper async teardown viareset().Files changed
support/task_control/src/lib.rsvm/devices/virtio/virtio/src/common.rs,vm/devices/virtio/virtio/src/tests.rsvm/devices/virtio/virtio/src/transport/mmio.rs,vm/devices/virtio/virtio/src/transport/pci.rsvm/devices/virtio/virtio_net/src/lib.rs,vm/devices/virtio/virtio_p9/src/lib.rs,vm/devices/virtio/virtio_pmem/src/lib.rs,vm/devices/virtio/virtiofs/src/virtio.rs