fix: prevent silent volume migration in ControllerPublishVolume #143

Open
mweibel wants to merge 1 commit into master from multi-attach-staleness

mweibel (Collaborator) commented Mar 27, 2026

ControllerPublishVolume previously overwrote ServerUUIDs unconditionally, silently moving an attached volume to a new node. When the subsequent ControllerUnpublishVolume for the old node found the volume no longer there, it returned success without detaching, leaving a stale VolumeAttachment that caused Multi-Attach deadlocks in production.

Fix: fetch the volume before attaching. If it is attached to a different node, return FailedPrecondition so that the external-attacher waits for the old detach to complete first. If it is already attached to the requested node, return success (idempotent).
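
For reviewers, the decision described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Volume` type, the `attachDecision` helper, and `ErrFailedPrecondition` are hypothetical names, and the sentinel error stands in for what a real driver would return via `status.Error(codes.FailedPrecondition, ...)` from `google.golang.org/grpc`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrFailedPrecondition is a stand-in for a gRPC FailedPrecondition status.
// Assumption: a real driver would use status.Error(codes.FailedPrecondition, ...).
var ErrFailedPrecondition = errors.New("failed precondition")

// Volume is a hypothetical, simplified view of the backend volume.
type Volume struct {
	ID          string
	ServerUUIDs []string // servers the volume is currently attached to
}

// attachDecision inspects the freshly fetched volume and reports whether an
// attach call to the backend is still needed for nodeID.
func attachDecision(vol Volume, nodeID string) (needsAttach bool, err error) {
	// Not attached anywhere: safe to attach.
	if len(vol.ServerUUIDs) == 0 {
		return true, nil
	}
	// Already attached to the requested node: idempotent success, no call needed.
	for _, s := range vol.ServerUUIDs {
		if s == nodeID {
			return false, nil
		}
	}
	// Attached elsewhere: refuse instead of silently moving the volume.
	return false, fmt.Errorf("%w: volume %s is attached to %v, not %s",
		ErrFailedPrecondition, vol.ID, vol.ServerUUIDs, nodeID)
}

func main() {
	vol := Volume{ID: "vol-1", ServerUUIDs: []string{"node-a"}}
	for _, node := range []string{"node-a", "node-b"} {
		attach, err := attachDecision(vol, node)
		fmt.Println(node, attach, err)
	}
}
```

The point of returning an error rather than overwriting is that the old node's ControllerUnpublishVolume still finds the volume attached and performs a real detach, so no stale VolumeAttachment is left behind.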

Also add volume locks (TryAcquire/Release, already used on the node side) to DeleteVolume, ControllerPublishVolume, ControllerUnpublishVolume, CreateSnapshot, and ControllerExpandVolume to prevent TOCTOU races between concurrent mutating operations on the same volume.
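
A minimal sketch of the per-volume locking pattern, for context. The `TryAcquire`/`Release` names come from the description above; the `VolumeLocks` implementation and the `withVolumeLock` helper are assumptions made for illustration and may differ from the driver's actual code. Which gRPC code the driver returns on a lock conflict is not stated here, so the sketch returns a plain error.

```go
package main

import (
	"fmt"
	"sync"
)

// VolumeLocks tracks volume IDs that have an operation in flight.
// Hypothetical implementation; only the method names come from the PR text.
type VolumeLocks struct {
	mu    sync.Mutex
	locks map[string]struct{}
}

func NewVolumeLocks() *VolumeLocks {
	return &VolumeLocks{locks: make(map[string]struct{})}
}

// TryAcquire returns true if the lock for volumeID was taken, or false
// immediately (without blocking) if another operation already holds it.
func (vl *VolumeLocks) TryAcquire(volumeID string) bool {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	if _, held := vl.locks[volumeID]; held {
		return false
	}
	vl.locks[volumeID] = struct{}{}
	return true
}

// Release frees the lock for volumeID. Releasing an unheld lock is a no-op.
func (vl *VolumeLocks) Release(volumeID string) {
	vl.mu.Lock()
	defer vl.mu.Unlock()
	delete(vl.locks, volumeID)
}

// withVolumeLock (hypothetical helper) runs op while holding the lock, so a
// check-then-act sequence on one volume cannot interleave with another.
func withVolumeLock(vl *VolumeLocks, volumeID string, op func() error) error {
	if !vl.TryAcquire(volumeID) {
		return fmt.Errorf("an operation on volume %s is already in progress", volumeID)
	}
	defer vl.Release(volumeID)
	return op()
}

func main() {
	locks := NewVolumeLocks()
	err := withVolumeLock(locks, "vol-1", func() error {
		// The nested call simulates a second request for the same volume
		// arriving while the first is still running.
		return withVolumeLock(locks, "vol-1", func() error { return nil })
	})
	fmt.Println(err)
}
```

Failing fast instead of blocking lets the caller retry later, which matters here because the fetch-then-attach check is only safe if nothing else mutates the same volume between the two steps.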

Script to reproduce this case (only worked with forced mode). Generated by Claude Code.
reproduce-multiattach.sh