Update NVIDIA driver symlink script #158
Although we don't have the symlinks yet, I can actually already test this in the container - it will just create the symlinks in What I did: Works as intended. After implementing the variant symlinks, we should retest, try to use the

Tested in the container using EESSI 2025.06 and without having configured the variant symlinks: With the variant symlink reconfigured as Wiping that dir and doing it again using Also checked the symlinks, and they pointed to the expected locations.
# Do some checks on existence of links and that we don't end up at /dev/null (the default), so we can print some informative information
# One downside is that we can't explicitely check if something is a variant symlink, so we'll just assume that if it's a link AND it
# lives in our CVMFS repository, it must be a variant symlink
nvidia_trusted_dir="${EESSI_EPREFIX}/lib/nvidia"
Does this mean that the script will no longer work for 2023.06?
Hm, yeah, that's annoying, this script is in an unversioned prefix. I mean, if we deploy this only for 2025.06, we keep the old version for 2023.06. But then if we want to update that, we have to revert all changes, etc. Maybe we should just duplicate the script? I.e. create something like scripts/gpu_support/nvidia/2023.06/link_nvidia_host_libraries.sh? Or should it be at higher level 2023.06/scripts/gpu_support...?
Ok, I did solve it differently in the end. Only the symlink_mode function changed. I now just duplicated it, and call the correct one based on the value of EESSI_VERSION (note that check_eessi_initialized is called before symlink_mode, so we can be sure that EESSI is initialized and that EESSI_VERSION has been set by the time it gets to that part of the script).
This does mean there is some code duplication, but actually not too much: the function has changed substantially, so I think the small amount of duplication that is still left is acceptable - especially since it's unlikely we'll still change anything on the 2023.06 side.
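For readers following along, the dispatch described above could look roughly like the sketch below. This is not the code from the PR: the function names `symlink_mode_2023_06`, `symlink_mode_default` and `dispatch_symlink_mode` are hypothetical, and the function bodies are placeholders. The only things taken from the discussion are that the choice is made on `EESSI_VERSION` and that this variable is already set by the time the function is called.

```bash
#!/usr/bin/env bash

# Hypothetical placeholder for the unchanged 2023.06 copy of symlink_mode.
symlink_mode_2023_06() {
    echo "legacy linking logic for 2023.06"
}

# Hypothetical placeholder for the rewritten copy used by newer versions.
symlink_mode_default() {
    echo "variant-symlink linking logic"
}

# Pick the implementation based on EESSI_VERSION.
# Assumes EESSI has been initialized, so EESSI_VERSION is set;
# the ':?' expansion aborts with a message if it is not.
dispatch_symlink_mode() {
    case "${EESSI_VERSION:?EESSI_VERSION is not set}" in
        2023.06) symlink_mode_2023_06 "$@" ;;
        *)       symlink_mode_default "$@" ;;
    esac
}
```

Keeping the old function untouched and routing only on the version string means the 2023.06 behaviour cannot change by accident when the newer code path is edited.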
Co-authored-by: Bob Dröge <b.e.droge@rug.nl>
…yer-scripts into link_nvidia_drivers
|
Tested using: Gives: As expected. Then, modify the CVMFS config: And run again: The result looks fine: (of course, we don't need to symlink this in a container, as singularity sets Similarly, if I set And the result looks as expected: |
|
Also tested Looks as expected! |
|
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx |
|
New job on instance
|
|
New job on instance
|
|
Staging PR merged. |
|
@casparvl Do we need to update docs accordingly @ https://eessi.io/docs/site_specific_config/gpu/#exposing-nvidia-gpu-drivers ? |
|
We'll need the following variant symlinks to be in place before this script can work as intended:
And then:
This can then be quite easily tested from within the container:
This should error out stating that the variant symlink resolves to `/dev/null`. Then, you can change `/etc/cvmfs/default.local` to set e.g. `EESSI_NVIDIA_OVERRIDE_DEFAULT` (e.g. to `/opt/eessi/nvidia`) and run the linking script again - this should then install the symlinks.
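As an illustration of the check described here, a minimal sketch of testing whether a link still resolves to `/dev/null` could look like the following. The function name `check_variant_symlink` and its return codes are hypothetical and not taken from the script in this PR; it also assumes a `readlink` that supports `-f` (as in GNU coreutils). Like the script, it cannot tell a variant symlink from a regular one, so it only checks that the path is a link and where it ends up.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the resolved target of a symlink.
# Returns 1 if the path is not a symlink at all,
# and 2 if it resolves to /dev/null (i.e. no override configured).
check_variant_symlink() {
    local link="$1"
    local target

    if [ ! -L "$link" ]; then
        echo "ERROR: $link is not a symlink" >&2
        return 1
    fi

    # Follow the link chain to its final destination.
    target="$(readlink -f "$link")"

    if [ "$target" = "/dev/null" ]; then
        echo "ERROR: $link resolves to /dev/null; configure the override in the CVMFS client config" >&2
        return 2
    fi

    echo "$target"
}
```

Calling it on the link before and after editing the CVMFS client configuration should show the change from the `/dev/null` error to the configured directory.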