device-plugin: gate Enabled label on SriovNetworkNodeState sync status#15
device-plugin: gate Enabled label on SriovNetworkNodeState sync status#15beccccaboo wants to merge 9 commits intomasterfrom
Conversation
Add a wait-for-node-state init container to the sriov-device-plugin DaemonSet that blocks startup until SriovNetworkNodeState.status.syncStatus is Succeeded for the node. Without this, the device plugin can start before the config daemon finishes creating and binding VFs to vfio-pci on a new node. When it finds 0 matching devices it skips resource server creation entirely and never picks up VFs created later, requiring a manual pod restart to surface the node annotation. The init container polls the SriovNetworkNodeState CR every 10s using the auto-mounted service account credentials. It exits immediately on already- configured nodes where syncStatus is already Succeeded. Also adds get permission on sriovnetworknodestates to the sriov-plugin Role and exposes SRIOVNetworkConfigDaemonImage to the plugin manifest renderer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
ENVTEST_K8S_VERSION was referenced in Makefile test targets but never defined, causing `setup-envtest use` to run with an empty version argument. Newer versions of setup-envtest require an explicit version and fail silently on empty input, leaving KUBEBUILDER_ASSETS unset. envtest then falls back to searching PATH for etcd, which does not exist on GitHub Actions runners, causing all envtest-based test suites to fail at BeforeSuite. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GCS 401 error is the root cause, not the missing version variable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The GCS bucket used by setup-envtest from controller-runtime@v0.16.3 now returns 401 Unauthorized, leaving KUBEBUILDER_ASSETS empty and causing all envtest-based test suites to fail with "etcd not found". Add an explicit step to both test jobs that installs setup-envtest from release-0.19, which uses the GitHub-hosted envtest-releases.yaml index instead of GCS, and pre-populates KUBEBUILDER_ASSETS in the job environment before any make test-* targets run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This reverts commit 096ad0e.
Replace the init container approach with a controller-level fix. The SriovNetworkNodePolicyReconciler now delays setting the sriovnetwork.openshift.io/device-plugin=Enabled node label until SriovNetworkNodeState.Status.SyncStatus == "Succeeded". Since the device plugin DaemonSet uses this label as its nodeSelector, it cannot schedule on a node until VFs are fully configured, eliminating the race between the config daemon and device plugin on new nodes. During reconfiguration (SyncStatus briefly InProgress on an already-labelled node), the label is preserved to avoid evicting the running device plugin pod. A NodeStateSyncStatusPredicate watch on SriovNetworkNodeState ensures the reconciler re-triggers as soon as the config daemon transitions to Succeeded. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Yarosh
left a comment
There was a problem hiding this comment.
Overall this change looks good (see 1 comment I left) - I would:
- Test it in tcloud staging, with a full test- i.e.: scale down operator, delete sriovnetworknodestate, remove label from a node, scale operator back up and see it reconcile properly. Then roll this out to prod substrates
- Worth adding tests to this scenario - there are quite a few tests in this repo. Pretty sure there's a test for this device-plugin code that you can tweak
| // Only enable the device plugin when the node state has successfully synced. | ||
| // If already Enabled (reconfiguration case), keep it Enabled to avoid evicting the running pod. | ||
| ns := &sriovnetworkv1.SriovNetworkNodeState{} | ||
| err = r.Get(ctx, types.NamespacedName{Namespace: vars.Namespace, Name: node.Name}, ns) |
There was a problem hiding this comment.
could we move the check for currentLabel == constants.SriovDevicePluginLabelEnabled here, and just skip making any changes if already enabled?
You can short-circuit, and spare the etcd lookup for sriovnetworknodestate - saving some load on k8s api-server and etcd.
|
When sriov-network-operator/pkg/daemon/daemon.go Lines 525 to 530 in 1fb8405 |
Problem
When a new node joins the cluster, the
sriov-device-pluginpod can start beforesriov-network-config-daemonfinishes configuring VFs. The device plugin scans once at startup, finds 0 VFs, and never picks them up later — leaving the node without SR-IOV resources until the pod is manually restarted.Solution
The device plugin DaemonSet uses
nodeSelector: { sriovnetwork.openshift.io/device-plugin: Enabled }and only schedules on nodes with this label. The operator'sSriovNetworkNodePolicyReconcileris exclusively responsible for setting this label.This PR delays setting the label to
EnableduntilSriovNetworkNodeState.Status.SyncStatus == "Succeeded", ensuring the device plugin pod cannot start until VFs are fully configured.A
NodeStateSyncStatusPredicatewatch onSriovNetworkNodeStatetriggers reconciliation as soon as the config daemon transitions toSucceeded, so the label is applied promptly.Behaviour
""/"InProgress"Disabled; device plugin not scheduled"Succeeded"DisabledEnabled; device plugin starts, finds VFs"InProgress"Enabled"Succeeded"EnabledChanges
controllers/sriovnetworknodepolicy_controller.go: fetchSriovNetworkNodeStatebefore labeling; only setEnabledwhenSyncStatus == Succeeded(or alreadyEnabled). AddSriovNetworkNodeStatewatch withNodeStateSyncStatusPredicate.controllers/helper.go: addNodeStateSyncStatusPredicatethat fires only whenSyncStatuschanges.🤖 Generated with Claude Code