Unfortunately, it looks like an inconsistency in the package naming breaks precompiled builds for OpenShift (RHEL9) when specifying driver version 580.65.06. The affected packages are:
nvidia-fabric-manager
libnvidia-nscq
For example:
580.65.06
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/nvidia-fabricmanager-580.65.06-1.x86_64.rpm
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/libnvidia-nscq-580.65.06-1.x86_64.rpm
nvidia-fabric-manager is now named nvidia-fabricmanager (no hyphen between "fabric" and "manager").
libnvidia-nscq no longer has the driver branch (i.e., 580) in the package name. Ideally, it would be: libnvidia-nscq-580-580.65.06-1.x86_64.rpm
In comparison to:
575.57.08
https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/nvidia-fabric-manager-575.57.08-1.x86_64.rpm
https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/libnvidia-nscq-575-575.57.08-1.x86_64.rpm
The Dockerfile (gpu-driver-container/rhel9/precompiled/Dockerfile, line 163 in 407b170) is currently unable to account for this discrepancy:
dnf install -y nvidia-fabric-manager-${DRIVER_VERSION} libnvidia-nscq-${DRIVER_BRANCH}-${DRIVER_VERSION} ; \
I hesitate to open a pull request in case this is an oversight with the package naming. In the meantime, it is easy enough to work around the issue by modifying the string formatting in the Dockerfile.
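For anyone hitting the same thing, here is a rough sketch of that workaround as a shell helper. The function name `fabric_packages` is made up for illustration, and the cutoff is an assumption: only 580.65.06 (new naming) and 575.57.08 (old naming) were checked above, so treating every branch from 580 onward as "new" may not hold if the naming turns out to be an oversight.

```shell
# Hypothetical helper: print the dnf package specs for a given driver version.
# Assumption: branches >= 580 use the new names; only 580.65.06 and 575.57.08
# were actually checked against the repository.
fabric_packages() {
    driver_version="$1"
    # Strip everything after the first dot, e.g. 580.65.06 -> 580
    driver_branch="${driver_version%%.*}"

    if [ "$driver_branch" -ge 580 ]; then
        # New naming: "fabricmanager" without a hyphen, no branch in libnvidia-nscq
        echo "nvidia-fabricmanager-${driver_version} libnvidia-nscq-${driver_version}"
    else
        # Old naming, matching the current Dockerfile line
        echo "nvidia-fabric-manager-${driver_version} libnvidia-nscq-${driver_branch}-${driver_version}"
    fi
}
```

In the Dockerfile RUN step this could replace the hard-coded names, for example `dnf install -y $(fabric_packages "${DRIVER_VERSION}")`, left unquoted on purpose so that both package specs are passed as separate arguments.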
Cheers,