
[Bug]: nvidia-driver-daemonset podAntiAffinity causes permanent Misscheduled state on multi-GPU node clusters #2255

@dor1a

Description


Describe the bug

On clusters with two or more GPU nodes, the nvidia-driver-daemonset DaemonSet permanently reports
Desired=1 and Misscheduled=1, even though a driver pod is Running and Ready on every GPU node.

The root cause is a requiredDuringSchedulingIgnoredDuringExecution podAntiAffinity
rule hardcoded in assets/state-driver/0500_daemonset.yaml:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/component
          operator: In
          values:
          - nvidia-driver
      topologyKey: kubernetes.io/hostname

This conflicts with the fundamental behavior of a DaemonSet, which is supposed to run
one pod on every eligible node. The DaemonSet controller marks the pod on the second
node as misscheduled because the anti-affinity rule prohibits scheduling it there, but
it cannot evict the pod due to IgnoredDuringExecution. The result is a permanent
Misscheduled=1 state.

This rule cannot be overridden via daemonsets.affinity: {} in values.yaml
or ClusterPolicy, because it is injected directly from the asset file.
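For reference, this is a minimal sketch of the override that has no effect (assuming the chart-level daemonsets key in the GPU Operator's values.yaml; the driver DaemonSet's affinity still comes from the asset manifest):

```yaml
# values.yaml -- attempted override; has no effect on nvidia-driver-daemonset
# because its podAntiAffinity is hardcoded in assets/state-driver/0500_daemonset.yaml
daemonsets:
  affinity: {}
```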

Reproduced on both v25.3.4 and v26.3.0.

To Reproduce

  1. Deploy GPU Operator on a cluster with 2 or more GPU nodes
  2. Run kubectl get ds nvidia-driver-daemonset -n gpu-operator
  3. Observe DESIRED=1, AVAILABLE=1, but driver pods running on all GPU nodes
  4. Run kubectl describe ds nvidia-driver-daemonset -n gpu-operator | grep Misscheduled
  5. Observe Number of Nodes Misscheduled: 1

Expected behavior

DESIRED should match the number of GPU nodes, and Misscheduled should be 0.
DaemonSets already guarantee one pod per node, so requiredDuringScheduling
podAntiAffinity is redundant and causes incorrect state reporting on multi-GPU-node clusters.

Suggested Fix

Replace requiredDuringSchedulingIgnoredDuringExecution with
preferredDuringSchedulingIgnoredDuringExecution, or remove the podAntiAffinity
block entirely from assets/state-driver/0500_daemonset.yaml.
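If the first option is taken, the relaxed rule would look roughly like this in 0500_daemonset.yaml (same label selector and topology key as the current rule; note that the preferred form requires wrapping the term in weight/podAffinityTerm per the Kubernetes API):

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/component
            operator: In
            values:
            - nvidia-driver
        topologyKey: kubernetes.io/hostname
```

With a preferred rule the DaemonSet controller no longer counts the pods on additional GPU nodes as misscheduled, since the constraint is a scheduling preference rather than a hard requirement.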

Environment

  • GPU Operator Version: v25.3.4 (also reproduced on v26.3.0)
  • OS: Ubuntu 24.04
  • Kernel Version: 6.8.0-87-generic
  • Container Runtime Version: containerd (k3s embedded)
  • Kubernetes Distro and Version: k3s v1.33.5

Information to attach

NAME                                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE
nvidia-driver-daemonset                    1         1         1       1            1
nvidia-operator-validator                  1         1         1       1            1

Both daemonsets show DESIRED=1 while driver pods are Running/Ready on 2 GPU nodes:

nvidia-driver-daemonset-xxxxx   1/1   Running   k3s-gpu-worker1
nvidia-driver-daemonset-yyyyy   1/1   Running   k3s-gpu-worker2
Number of Nodes Misscheduled: 1

Labels

    bug, needs-triage
