From 9b6eca3ce16ca620742f6bfd52a439d7e20ae6cd Mon Sep 17 00:00:00 2001 From: Aleksei Sviridkin Date: Thu, 2 Apr 2026 15:19:29 +0300 Subject: [PATCH 1/3] docs(gpu): add vGPU setup guide for GPU sharing between VMs Add practical instructions for deploying GPU Operator with vGPU variant: - Building proprietary vGPU Manager container image - Deploying with vgpu variant via Package CR - NLS license server configuration - KubeVirt mediatedDeviceTypes setup - vGPU profile reference table for L40S - VM creation example with vGPU resource Assisted-By: Claude Signed-off-by: Aleksei Sviridkin --- content/en/docs/v1/virtualization/gpu.md | 208 +++++++++++++++++++++-- 1 file changed, 198 insertions(+), 10 deletions(-) diff --git a/content/en/docs/v1/virtualization/gpu.md b/content/en/docs/v1/virtualization/gpu.md index 36ef3542..676e5b49 100644 --- a/content/en/docs/v1/virtualization/gpu.md +++ b/content/en/docs/v1/virtualization/gpu.md @@ -198,25 +198,213 @@ We are now ready to create a VM. Kernel modules: nvidiafb, nvidia_drm, nvidia ``` -## GPU Sharing for Virtual Machines +## GPU Sharing for Virtual Machines (vGPU) -GPU passthrough assigns an entire physical GPU to a single VM. To share one GPU between multiple VMs, you need **NVIDIA vGPU**. +GPU passthrough assigns an entire physical GPU to a single VM. To share one GPU between multiple VMs, you can use **NVIDIA vGPU**, which creates virtual GPUs from a single physical GPU using mediated devices (mdev). -### vGPU (Virtual GPU) +{{% alert color="info" %}} +**Why not MIG?** MIG (Multi-Instance GPU) partitions a GPU into isolated instances, but these are logical divisions within a single PCIe device. VFIO cannot pass them to VMs — MIG only works with containers. To use MIG with VMs, you need vGPU on top of MIG partitions (still requires a vGPU license). +{{% /alert %}} -NVIDIA vGPU uses mediated devices (mdev) to create virtual GPUs assignable to VMs. This is the only production-ready solution for GPU sharing between VMs. 
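+
+With the vGPU Manager driver loaded, each physical GPU advertises the vGPU types it can create through the standard mediated-device (mdev) interface in sysfs. As a quick sanity check on a GPU node (a sketch, not part of the setup flow: the PCI address below is a placeholder, substitute your card's address from `lspci`; `nvidia-592` is one example type ID):
+
+```bash
+# List the vGPU type IDs this GPU can expose
+ls /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types
+# Show the human-readable profile name and remaining capacity for one type
+cat /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types/nvidia-592/name
+cat /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types/nvidia-592/available_instances
+```
+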
+### Prerequisites -**Requirements:** -- NVIDIA vGPU license (commercial, purchased from NVIDIA) -- NVIDIA vGPU Manager installed on host nodes +- A GPU that supports vGPU (e.g., NVIDIA L40S, A100, A30, A16) +- An NVIDIA vGPU Software license (NVIDIA AI Enterprise or vGPU subscription) +- Access to the [NVIDIA Licensing Portal](https://ui.licensing.nvidia.com) to download the vGPU Manager driver -{{% alert color="info" %}} -**Why not MIG?** MIG (Multi-Instance GPU) partitions a GPU into isolated instances, but these are logical divisions within a single PCIe device. VFIO cannot pass them to VMs — MIG only works with containers. To use MIG with VMs, you need vGPU on top of MIG partitions (still requires a license). +{{% alert color="warning" %}} +The vGPU Manager driver is proprietary software distributed by NVIDIA under a commercial license. Cozystack does not include or redistribute this driver. You must obtain it directly from NVIDIA and build the container image yourself. {{% /alert %}} +### 1. Build the vGPU Manager Image + +Download the vGPU Manager driver from the [NVIDIA Licensing Portal](https://ui.licensing.nvidia.com) and build a container image: + +```bash +# Example Containerfile +FROM ubuntu:22.04 +ARG DRIVER_VERSION +COPY NVIDIA-Linux-x86_64-${DRIVER_VERSION}-vgpu-kvm.run /opt/ +RUN chmod +x /opt/NVIDIA-Linux-x86_64-${DRIVER_VERSION}-vgpu-kvm.run +``` + +```bash +docker build --build-arg DRIVER_VERSION=550.90.05 \ + --tag registry.example.com/nvidia/vgpu-manager:550.90.05 . +docker push registry.example.com/nvidia/vgpu-manager:550.90.05 +``` + +Refer to the [NVIDIA GPU Operator documentation](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/install-gpu-operator-vgpu.html) for detailed instructions on building the vGPU Manager image. + +### 2. 
Install the GPU Operator with vGPU Variant + +The GPU Operator provides a `vgpu` variant that enables the vGPU Manager and vGPU Device Manager instead of the VFIO Manager used in passthrough mode. + +1. Label the worker node for vGPU workloads: + + ```bash + kubectl label node --overwrite nvidia.com/gpu.workload.config=vm-vgpu + ``` + +2. Create the GPU Operator Package with the `vgpu` variant, providing your vGPU Manager image coordinates: + + ```yaml + apiVersion: cozystack.io/v1alpha1 + kind: Package + metadata: + name: cozystack.gpu-operator + spec: + variant: vgpu + components: + gpu-operator: + values: + gpu-operator: + vgpuManager: + repository: registry.example.com/nvidia + version: "550.90.05" + ``` + + If your registry requires authentication, create an `imagePullSecret` in the `cozy-gpu-operator` namespace first, then reference it: + + ```yaml + gpu-operator: + vgpuManager: + repository: registry.example.com/nvidia + version: "550.90.05" + imagePullSecrets: + - name: nvidia-registry-secret + ``` + +3. Verify all pods are running: + + ```bash + kubectl get pods -n cozy-gpu-operator + ``` + + Example output: + + ```console + NAME READY STATUS RESTARTS AGE + ... + nvidia-vgpu-manager-daemonset-xxxxx 1/1 Running 0 60s + nvidia-vgpu-device-manager-xxxxx 1/1 Running 0 45s + nvidia-sandbox-validator-xxxxx 1/1 Running 0 30s + ``` + +### 3. Configure NVIDIA License Server (NLS) + +vGPU requires a license to operate. Create a ConfigMap with the NLS client configuration: + +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: licensing-config + namespace: cozy-gpu-operator +data: + gridd.conf: | + ServerAddress=nls.example.com + ServerPort=443 + FeatureType=1 +``` + +Then reference it in the Package values: + +```yaml +gpu-operator: + vgpuManager: + repository: registry.example.com/nvidia + version: "550.90.05" + driver: + licensingConfig: + configMapName: licensing-config +``` + +### 4. 
Update the KubeVirt Custom Resource + +Configure KubeVirt to permit mediated devices. The `mediatedDeviceTypes` field specifies which vGPU profiles to use, and `permittedHostDevices` makes them available to VMs: + +```bash +kubectl edit kubevirt -n cozy-kubevirt +``` + +```yaml +spec: + configuration: + mediatedDevicesConfiguration: + mediatedDeviceTypes: + - nvidia-592 # Example: NVIDIA L40S-24Q + permittedHostDevices: + mediatedDevices: + - mdevNameSelector: NVIDIA L40S-24Q + resourceName: nvidia.com/NVIDIA_L40S-24Q +``` + +To find the correct type ID and profile name for your GPU, consult the [NVIDIA vGPU User Guide](https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/). + +### 5. Create a Virtual Machine with vGPU + +```yaml +apiVersion: apps.cozystack.io/v1alpha1 +appVersion: '*' +kind: VirtualMachine +metadata: + name: gpu-vgpu + namespace: tenant-example +spec: + running: true + instanceProfile: ubuntu + instanceType: u1.medium + systemDisk: + image: ubuntu + storage: 5Gi + storageClass: replicated + gpus: + - name: nvidia.com/NVIDIA_L40S-24Q + cloudInit: | + #cloud-config + password: ubuntu + chpasswd: { expire: False } +``` + +```bash +kubectl apply -f vmi-vgpu.yaml +``` + +Once the VM is running, log in and verify the vGPU is available: + +```bash +virtctl console virtual-machine-gpu-vgpu +``` + +```console +ubuntu@gpu-vgpu:~$ nvidia-smi ++-----------------------------------------------------------------------------------------+ +| NVIDIA-SMI 550.90.05 Driver Version: 550.90.05 CUDA Version: 12.4 | +| | +| GPU Name ... MIG M. | +| 0 NVIDIA L40S-24Q ... N/A | ++-----------------------------------------------------------------------------------------+ +``` + +### vGPU Profiles + +Each GPU model supports specific vGPU profiles that determine how the GPU is partitioned. 
Common profiles for NVIDIA L40S: + +| Profile | Frame Buffer | Max Instances | Use Case | +| --- | --- | --- | --- | +| NVIDIA L40S-1Q | 1 GB | 48 | Light 3D / VDI | +| NVIDIA L40S-2Q | 2 GB | 24 | Medium 3D / VDI | +| NVIDIA L40S-4Q | 4 GB | 12 | Heavy 3D / VDI | +| NVIDIA L40S-6Q | 6 GB | 8 | Professional 3D | +| NVIDIA L40S-8Q | 8 GB | 6 | AI/ML inference | +| NVIDIA L40S-12Q | 12 GB | 4 | AI/ML training | +| NVIDIA L40S-24Q | 24 GB | 2 | Large AI workloads | +| NVIDIA L40S-48Q | 48 GB | 1 | Full GPU equivalent | + ### Open-Source vGPU (Experimental) -NVIDIA is developing open-source vGPU support for the Linux kernel. Once merged, this could enable GPU sharing without a license. +NVIDIA is developing open-source vGPU support for the Linux kernel. Once merged, this could enable GPU sharing without a commercial license. - Status: RFC stage, not merged into mainline kernel - Supports Ada Lovelace and newer (L4, L40, etc.) From 468dd7b8381b320bca8d9838adae8c2a644a0aba Mon Sep 17 00:00:00 2001 From: Aleksei Sviridkin Date: Thu, 2 Apr 2026 15:48:48 +0300 Subject: [PATCH 2/3] docs(gpu): fix vGPU driver container build instructions and NLS config Replace simplified Containerfile with NVIDIA's Makefile-based build system from gitlab.com/nvidia/container-images/driver. The GPU Operator expects pre-compiled kernel modules, not a raw .run file. Add EULA warning about public redistribution of vGPU driver images. Add note about NLS ServerPort being deployment-dependent. Assisted-By: Claude Signed-off-by: Aleksei Sviridkin --- content/en/docs/v1/virtualization/gpu.md | 35 ++++++++++++++++-------- 1 file changed, 24 insertions(+), 11 deletions(-) diff --git a/content/en/docs/v1/virtualization/gpu.md b/content/en/docs/v1/virtualization/gpu.md index 676e5b49..b983b905 100644 --- a/content/en/docs/v1/virtualization/gpu.md +++ b/content/en/docs/v1/virtualization/gpu.md @@ -218,23 +218,35 @@ The vGPU Manager driver is proprietary software distributed by NVIDIA under a co ### 1. 
Build the vGPU Manager Image -Download the vGPU Manager driver from the [NVIDIA Licensing Portal](https://ui.licensing.nvidia.com) and build a container image: +The GPU Operator expects a pre-built driver container image — it does not install the driver from a raw `.run` file at runtime. -```bash -# Example Containerfile -FROM ubuntu:22.04 -ARG DRIVER_VERSION -COPY NVIDIA-Linux-x86_64-${DRIVER_VERSION}-vgpu-kvm.run /opt/ -RUN chmod +x /opt/NVIDIA-Linux-x86_64-${DRIVER_VERSION}-vgpu-kvm.run -``` +1. Download the vGPU Manager driver from the [NVIDIA Licensing Portal](https://ui.licensing.nvidia.com) (Software Downloads → NVIDIA AI Enterprise → Linux KVM) +2. Build the driver container image using NVIDIA's Makefile-based build system: ```bash -docker build --build-arg DRIVER_VERSION=550.90.05 \ - --tag registry.example.com/nvidia/vgpu-manager:550.90.05 . +# Clone the NVIDIA driver container repository +git clone https://gitlab.com/nvidia/container-images/driver.git +cd driver + +# Place the downloaded .run file in the appropriate directory +cp NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm.run vgpu/ + +# Build using the provided Makefile +make OS_TAG=ubuntu22.04 \ + VGPU_DRIVER_VERSION=550.90.05 \ + PRIVATE_REGISTRY=registry.example.com/nvidia + +# Push to your private registry docker push registry.example.com/nvidia/vgpu-manager:550.90.05 ``` -Refer to the [NVIDIA GPU Operator documentation](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/install-gpu-operator-vgpu.html) for detailed instructions on building the vGPU Manager image. +{{% alert color="info" %}} +The build process compiles kernel modules against the host kernel version. Refer to the [NVIDIA GPU Operator vGPU documentation](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/install-gpu-operator-vgpu.html) for the complete build procedure and supported OS/kernel combinations. 
+{{% /alert %}} + +{{% alert color="warning" %}} +Uploading the vGPU driver to a publicly available registry is a violation of the NVIDIA vGPU EULA. Always use a private registry. +{{% /alert %}} ### 2. Install the GPU Operator with vGPU Variant @@ -306,6 +318,7 @@ data: ServerAddress=nls.example.com ServerPort=443 FeatureType=1 + # ServerPort depends on your NLS deployment (commonly 443 for DLS or 7070 for legacy NLS) ``` Then reference it in the Package values: From 492f318894bbdaca8e5305e3a17d10cfa5b52713 Mon Sep 17 00:00:00 2001 From: Aleksei Sviridkin Date: Thu, 2 Apr 2026 15:57:14 +0300 Subject: [PATCH 3/3] fix(gpu): use Secret for licensing config, fix console hostname - Switch licensing config from ConfigMap to Secret (configMapName deprecated) - Add FeatureType comment explaining values (1=vGPU, 2=vCS) - Fix console hostname to match Cozystack naming convention (virtual-machine- prefix) Assisted-By: Claude Signed-off-by: Aleksei Sviridkin --- content/en/docs/v1/virtualization/gpu.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/content/en/docs/v1/virtualization/gpu.md b/content/en/docs/v1/virtualization/gpu.md index b983b905..404538d1 100644 --- a/content/en/docs/v1/virtualization/gpu.md +++ b/content/en/docs/v1/virtualization/gpu.md @@ -305,23 +305,23 @@ The GPU Operator provides a `vgpu` variant that enables the vGPU Manager and vGP ### 3. Configure NVIDIA License Server (NLS) -vGPU requires a license to operate. Create a ConfigMap with the NLS client configuration: +vGPU requires a license to operate. 
Create a Secret with the NLS client configuration:
 
 ```yaml
 apiVersion: v1
-kind: ConfigMap
+kind: Secret
 metadata:
   name: licensing-config
   namespace: cozy-gpu-operator
-data:
+stringData:
   gridd.conf: |
     ServerAddress=nls.example.com
     ServerPort=443
-    FeatureType=1
+    # FeatureType: 1 for vGPU (vPC/vWS), 2 for Virtual Compute Server (vCS)
+    FeatureType=1
     # ServerPort depends on your NLS deployment (commonly 443 for DLS or 7070 for legacy NLS)
 ```
 
-Then reference it in the Package values:
+Then reference the Secret in the Package values:
 
 ```yaml
 gpu-operator:
@@ -330,7 +330,7 @@ gpu-operator:
     version: "550.90.05"
     driver:
       licensingConfig:
-        configMapName: licensing-config
+        secretName: licensing-config
 ```
 
 ### 4. Update the KubeVirt Custom Resource
@@ -391,7 +391,7 @@ virtctl console virtual-machine-gpu-vgpu
 ```
 
 ```console
-ubuntu@gpu-vgpu:~$ nvidia-smi
+ubuntu@virtual-machine-gpu-vgpu:~$ nvidia-smi
 +-----------------------------------------------------------------------------------------+
 | NVIDIA-SMI 550.90.05              Driver Version: 550.90.05      CUDA Version: 12.4     |
 |                                                                                         |