
[docs] Add vGPU setup guide for GPU sharing between VMs#467

Open
lexfrei wants to merge 3 commits into main from docs/gpu-vgpu-setup

Conversation

@lexfrei
Contributor

@lexfrei lexfrei commented Apr 2, 2026

What this PR does

Expands the GPU documentation page with a practical guide for deploying the GPU Operator in vGPU mode. Replaces the brief theoretical section with step-by-step instructions covering:

  • Building the proprietary vGPU Manager container image
  • Deploying GPU Operator with the vgpu variant via Package CR
  • NVIDIA License Server (NLS) configuration
  • KubeVirt mediatedDeviceTypes setup for VM access
  • vGPU profile reference table for L40S
  • Complete VM creation example with vGPU resource
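
As a rough sketch of the Package CR step listed above, assembled from the values hierarchy discussed in the review comments on this PR (the apiVersion, kind, and surrounding field layout are assumptions for illustration, not taken from the guide itself):

```yaml
# Hypothetical sketch only: apiVersion, kind, and field layout are assumed.
apiVersion: cozystack.io/v1alpha1
kind: Package
metadata:
  name: gpu-operator
spec:
  variant: vgpu                 # select the vgpu variant of the GPU Operator
  components:
    gpu-operator:
      values:
        gpu-operator:
          vgpuManager:
            # Private registry holding the vGPU Manager image built from
            # NVIDIA's driver container sources (EULA forbids public redistribution).
            repository: registry.example.com/nvidia
            version: "550.90.05"
            imagePullSecrets:
            - name: nvidia-registry-secret
```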

Summary by CodeRabbit

  • Documentation
    • Replaced GPU Sharing guide with an expanded vGPU guide covering mediated devices and why MIG isn’t suitable for passthrough
    • Added prerequisites, licensing notes, and warning that proprietary vGPU Manager driver must be obtained from NVIDIA
    • Provided step-by-step workflow: pre-built driver image build/publish, GPU Operator vgpu variant, License Server wiring, and VM updates to permit mediated devices
    • Added vGPU VM examples, sample verification output, profile reference table, and updated open-source vGPU wording

Add practical instructions for deploying GPU Operator with vGPU variant:
- Building proprietary vGPU Manager container image
- Deploying with vgpu variant via Package CR
- NLS license server configuration
- KubeVirt mediatedDeviceTypes setup
- vGPU profile reference table for L40S
- VM creation example with vGPU resource

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@netlify

netlify bot commented Apr 2, 2026

Deploy Preview for cozystack ready!

  • 🔨 Latest commit: 492f318
  • 🔍 Latest deploy log: https://app.netlify.com/projects/cozystack/deploys/69ce67b64ef3e60008af1961
  • 😎 Deploy Preview: https://deploy-preview-467--cozystack.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Contributor

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

Walkthrough

Documentation replaces the previous GPU-sharing overview with a focused vGPU (mediated device) guide covering prerequisites, NVIDIA vGPU licensing, the GPU Operator vgpu variant, the vGPU Manager image build, NVIDIA License Server wiring, KubeVirt mediated device config, VM examples, and vGPU profile details.

Changes

Cohort / File(s): GPU vGPU Documentation (content/en/docs/v1/virtualization/gpu.md)
Summary: Rewrote the GPU sharing section into a full vGPU (mdev) guide: new prerequisites and licensing notes; instructions to build and publish the vGPU Manager driver container; GPU Operator variant: vgpu installation and node labeling; NVIDIA License Server Secret/ConfigMap example and Package wiring; KubeVirt mediatedDevices configuration; a VM example requesting nvidia.com/<profile> and its verification; an added L40S vGPU profiles table; clarified open-source vGPU wording.
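
The KubeVirt mediated-device configuration summarized above could look roughly like the following sketch (the mdev type nvidia-1056 and the selector string are illustrative assumptions; actual types are enumerated under /sys/class/mdev_bus/*/mdev_supported_types on the GPU node):

```yaml
# Sketch only: the mdev type and selector names are assumed, not taken from the guide.
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    mediatedDevicesConfiguration:
      mediatedDeviceTypes:
      - nvidia-1056                           # hypothetical mdev type for an L40S profile
    permittedHostDevices:
      mediatedDevices:
      - mdevNameSelector: "NVIDIA L40S-24Q"   # assumed profile display name
        resourceName: nvidia.com/NVIDIA_L40S-24Q
```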

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped through docs with glee today,
Tucked mdev notes and profiles away,
Built driver images, licenses in tow,
VMs now share GPUs—watch them go! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main change: adding a comprehensive vGPU setup guide for GPU sharing between virtual machines, which aligns with the expanded documentation covering vGPU Manager deployment, licensing, KubeVirt configuration, and VM examples.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request provides comprehensive documentation for configuring NVIDIA vGPU sharing for virtual machines, including prerequisites, image building, operator installation, and licensing setup. The review feedback suggests clarifying the FeatureType parameter in the licensing configuration and updating the example command prompt to maintain consistency with the platform's virtual machine naming conventions.

```yaml
gridd.conf: |
  ServerAddress=nls.example.com
  ServerPort=443
  FeatureType=1
```
Contributor


Severity: medium

It is helpful to clarify what the FeatureType value represents to assist users in customizing their configuration. In the NVIDIA Grid configuration, 1 corresponds to the "NVIDIA vGPU" (vPC/vWS) feature, while 2 is for "NVIDIA Virtual Compute Server" (vCS).

Suggested change:

```diff
-FeatureType=1
+FeatureType=1 # 1 for vGPU
```
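
For context, a later commit in this PR notes that the licensing config moved from a ConfigMap to a Secret, so the gridd.conf shown here would typically be carried along these lines (the Secret name and namespace are assumptions; FeatureType values per the review comment above):

```yaml
# Sketch only: metadata values are assumed.
# FeatureType: 1 = NVIDIA vGPU (vPC/vWS), 2 = NVIDIA Virtual Compute Server (vCS).
apiVersion: v1
kind: Secret
metadata:
  name: licensing-config          # assumed name
  namespace: cozy-gpu-operator    # assumed namespace
type: Opaque
stringData:
  gridd.conf: |
    ServerAddress=nls.example.com
    ServerPort=443
    FeatureType=1
```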

```console
ubuntu@gpu-vgpu:~$ nvidia-smi
```
Contributor


Severity: medium

For consistency with the GPU passthrough example (line 194) and Cozystack's default naming convention for virtual machine instances, the hostname in the command prompt should include the virtual-machine- prefix. Since the VM name is defined as gpu-vgpu on line 352, the resulting instance name is virtual-machine-gpu-vgpu.

Suggested change:

```diff
-ubuntu@gpu-vgpu:~$ nvidia-smi
+ubuntu@virtual-machine-gpu-vgpu:~$ nvidia-smi
```

@lexfrei lexfrei self-assigned this Apr 2, 2026
Replace simplified Containerfile with NVIDIA's Makefile-based build
system from gitlab.com/nvidia/container-images/driver. The GPU Operator
expects pre-compiled kernel modules, not a raw .run file.

Add EULA warning about public redistribution of vGPU driver images.
Add note about NLS ServerPort being deployment-dependent.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei lexfrei marked this pull request as ready for review April 2, 2026 12:51
@lexfrei lexfrei requested review from kvaps and lllamnyp as code owners April 2, 2026 12:51
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
content/en/docs/v1/virtualization/gpu.md (1)

360-385: Name the manifest file explicitly before the apply command.

kubectl apply -f vmi-vgpu.yaml appears without first labeling the YAML block as vmi-vgpu.yaml (unlike the earlier passthrough example). Adding a filename label right above the manifest would remove ambiguity for copy/paste users.

✏️ Suggested doc tweak:

````diff
 ### 5. Create a Virtual Machine with vGPU
 
+**vmi-vgpu.yaml**:
+
 ```yaml
 apiVersion: apps.cozystack.io/v1alpha1
 appVersion: '*'
 kind: VirtualMachine
 ...
````
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@content/en/docs/v1/virtualization/gpu.md`:
- Around line 360-385: Add an explicit filename label above the YAML manifest
block so users know the file name to save before running kubectl; specifically,
annotate the VirtualMachine manifest block with "vmi-vgpu.yaml" (the same name
used in the kubectl apply -f vmi-vgpu.yaml command) by placing the filename line
immediately before the ```yaml fence, ensuring consistency between the manifest
and the kubectl apply invocation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 150cd704-7d01-4c3d-86f5-81294e0a31b5

📥 Commits

Reviewing files that changed from the base of the PR and between 624a38c and 468dd7b.

📒 Files selected for processing (1)
  • content/en/docs/v1/virtualization/gpu.md

- Switch licensing config from ConfigMap to Secret (configMapName deprecated)
- Add FeatureType comment explaining values (1=vGPU, 2=vCS)
- Fix console hostname to match Cozystack naming convention (virtual-machine- prefix)

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
content/en/docs/v1/virtualization/gpu.md (1)

279-288: Make the imagePullSecrets snippet fully qualified to prevent misplacement.

At Line 279, the snippet is context-trimmed and can be pasted under the wrong key. Please show the full values path to avoid broken Package configuration.

Proposed doc patch:

````diff
-    ```yaml
-    gpu-operator:
-      vgpuManager:
-        repository: registry.example.com/nvidia
-        version: "550.90.05"
-        imagePullSecrets:
-        - name: nvidia-registry-secret
-    ```
+    ```yaml
+    components:
+      gpu-operator:
+        values:
+          gpu-operator:
+            vgpuManager:
+              repository: registry.example.com/nvidia
+              version: "550.90.05"
+              imagePullSecrets:
+              - name: nvidia-registry-secret
+    ```
````
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@content/en/docs/v1/virtualization/gpu.md` around lines 279 - 288, The snippet
for imagePullSecrets is too context-trimmed and can be pasted under the wrong
key; update the example so it shows the full values path (wrap the existing
gpu-operator.vgpuManager block under components -> gpu-operator -> values ->
gpu-operator -> vgpuManager) so users see the complete hierarchy and the
imagePullSecrets entry (refer to symbols: components, gpu-operator, values,
gpu-operator, vgpuManager, imagePullSecrets) and replace the trimmed snippet
with this fully-qualified version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@content/en/docs/v1/virtualization/gpu.md`:
- Around line 358-391: Add an explicit readiness check for the
VirtualMachineInstance before calling "virtctl console": after applying
vmi-vgpu.yaml (kubectl apply -f vmi-vgpu.yaml) add a step that waits for the VMI
to become Ready (e.g., using "kubectl get vmi -n tenant-example -w" or "kubectl
wait --for=condition=Ready vmi/gpu-vgpu -n tenant-example") so the subsequent
"virtctl console virtual-machine-gpu-vgpu" call won't fail intermittently.

---

Nitpick comments:
In `@content/en/docs/v1/virtualization/gpu.md`:
- Around line 279-288: The snippet for imagePullSecrets is too context-trimmed
and can be pasted under the wrong key; update the example so it shows the full
values path (wrap the existing gpu-operator.vgpuManager block under components
-> gpu-operator -> values -> gpu-operator -> vgpuManager) so users see the
complete hierarchy and the imagePullSecrets entry (refer to symbols: components,
gpu-operator, values, gpu-operator, vgpuManager, imagePullSecrets) and replace
the trimmed snippet with this fully-qualified version.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d6caa76-d2a5-4856-97eb-524345257f6b

📥 Commits

Reviewing files that changed from the base of the PR and between 468dd7b and 492f318.

📒 Files selected for processing (1)
  • content/en/docs/v1/virtualization/gpu.md

Comment on lines +358 to +391
### 5. Create a Virtual Machine with vGPU

```yaml
apiVersion: apps.cozystack.io/v1alpha1
appVersion: '*'
kind: VirtualMachine
metadata:
  name: gpu-vgpu
  namespace: tenant-example
spec:
  running: true
  instanceProfile: ubuntu
  instanceType: u1.medium
  systemDisk:
    image: ubuntu
    storage: 5Gi
    storageClass: replicated
  gpus:
  - name: nvidia.com/NVIDIA_L40S-24Q
  cloudInit: |
    #cloud-config
    password: ubuntu
    chpasswd: { expire: False }
```

```bash
kubectl apply -f vmi-vgpu.yaml
```

Once the VM is running, log in and verify the vGPU is available:

```bash
virtctl console virtual-machine-gpu-vgpu
```
Contributor


⚠️ Potential issue | 🟡 Minor

Add an explicit VM readiness check before opening console.

After Line 384, jumping directly to virtctl console can fail intermittently if the VM/VMI is not ready yet. Add a wait/check step to keep the flow deterministic.

Proposed doc patch:

````diff
 ```bash
 kubectl apply -f vmi-vgpu.yaml
 ```
 
+Wait until the VM instance is ready:
+
+```bash
+kubectl get vmi -n tenant-example -w
+```
+
 Once the VM is running, log in and verify the vGPU is available:
````
