
NVIDIA GPU Operator

Juno integrates with NVIDIA's official GPU Operator to manage GPU resources in Kubernetes.

Step 1: Install the NVIDIA GPU Operator

Install the latest version of the GPU Operator:
  • Navigate to GenesisApp Store
  • Install the official NVIDIA GPU Operator
  • Use the latest stable CRDs and kernel modules

Step 2: Configure Nodes for GPU

Each GPU node is configured through its node labels. Set a label on each node to specify which of the following modes its GPU is used for:

  • Virtual Machine Passthrough
  • MIG
  • Timeslicing

Tip

For virtual machine passthrough, ensure your GPU has been configured properly. You can follow our guide here.

1. Navigate to Node Labels Table
  1. Go to Networking tab
  2. Click Node Labels (top right)
2. Update Node Labels
  1. Create a new label:

    • Key: nvidia.com/gpu.workload.config
    • Value: one of vm-passthrough, timeslicing, or mig
  2. Click Select Servers

  3. Select all servers with GPUs already configured

Critical

If using vm-passthrough, ensure the GPU has been configured for VFIO passthrough. If you haven't done this, see our guide for setting up VM GPU passthrough here.
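
The UI steps above set an ordinary Kubernetes node label, so the same label could in principle be applied with kubectl. The following is a minimal sketch, assuming direct kubectl access to the cluster. The helper name label_cmd and the node name gpu-node-1 are hypothetical, and whether Juno expects labels to be managed only through its UI is not covered here. The helper prints the command rather than running it, so it can be reviewed first.

```shell
# Hypothetical helper (not part of Juno): checks the mode against the three
# values listed above and prints the matching kubectl command.
label_cmd() {
  node="$1"
  mode="$2"
  case "$mode" in
    vm-passthrough|timeslicing|mig) ;;
    *) echo "unsupported workload config: $mode" >&2; return 1 ;;
  esac
  printf 'kubectl label node %s nvidia.com/gpu.workload.config=%s --overwrite\n' "$node" "$mode"
}

# gpu-node-1 is a placeholder; substitute a real node name.
label_cmd gpu-node-1 vm-passthrough
```

After a label has been applied, by either route, kubectl get nodes -L nvidia.com/gpu.workload.config shows its value as an extra column per node.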

Verify GPU Operator Deployment

1. Ensure Sandbox Components are Running

List the sandbox pods:

kubectl get pods -A | grep sandbox

Expected output:

testing-argocd-gpu-operator   nvidia-sandbox-device-plugin-daemonset-67mvp    1/1   Running   0   19h
testing-argocd-gpu-operator   nvidia-sandbox-validator-jqw9d                  1/1   Running   0   19h
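
If you would rather not read the STATUS column by eye, the check could be scripted. The sketch below assumes the six-column layout that kubectl get pods -A prints (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE). The function name check_sandbox is hypothetical, and the sample input is the expected output shown above.

```shell
# Hypothetical helper: reads `kubectl get pods -A` output on stdin and fails
# if no sandbox pod is found or if any sandbox pod is not in Running status.
check_sandbox() {
  awk '
    $2 ~ /sandbox/ {
      found++
      if ($4 != "Running") { bad++; print "not running: " $2 }
    }
    END {
      if (found == 0) { print "no sandbox pods found"; exit 1 }
      if (bad > 0) exit 1
    }
  '
}

# Demonstration on the sample output from this guide.
check_sandbox <<'EOF' && echo "sandbox pods OK"
testing-argocd-gpu-operator   nvidia-sandbox-device-plugin-daemonset-67mvp    1/1   Running   0   19h
testing-argocd-gpu-operator   nvidia-sandbox-validator-jqw9d                  1/1   Running   0   19h
EOF
```

Against a live cluster, the same function would be fed with kubectl get pods -A | check_sandbox.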

2. Confirm GPU Detection

Verify the GPU is listed as an allocatable resource:

kubectl get nodes <node-name> -o json | jq '.status.allocatable'

Expected output:

{
  "cpu": "16",
  "devices.kubevirt.io/kvm": "1k",
  "devices.kubevirt.io/tun": "1k",
  "devices.kubevirt.io/vhost-net": "1k",
  "ephemeral-storage": "442866475890",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "32790116Ki",
  "nvidia.com/TU104_GEFORCE_RTX_2080": "1",
  "nvidia.com/TU104_HD_AUDIO_CONTROLLER": "0",
  "nvidia.com/gpu": "0",
  "pods": "110"
}

Key indicator: "nvidia.com/TU104_GEFORCE_RTX_2080": "1" confirms the GPU is registered and available for passthrough.
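
To pick out the key indicator without scanning the whole map, the output could be filtered down to nvidia.com/ resources with a non-zero count. This is a small sketch, assuming the one-entry-per-line layout that jq prints for the allocatable map. The function name gpu_resources is hypothetical, and the sample input is abridged from the expected output above.

```shell
# Hypothetical helper: reads the pretty-printed allocatable map on stdin and
# prints every nvidia.com/ resource whose count is greater than zero.
gpu_resources() {
  # Splitting on double quotes puts the key in $2 and the value in $4.
  awk -F'"' '$2 ~ /^nvidia\.com\// && $4 + 0 > 0 { print $2 "=" $4 }'
}

# Demonstration on an abridged copy of the sample output from this guide.
gpu_resources <<'EOF'
{
  "memory": "32790116Ki",
  "nvidia.com/TU104_GEFORCE_RTX_2080": "1",
  "nvidia.com/TU104_HD_AUDIO_CONTROLLER": "0",
  "nvidia.com/gpu": "0",
  "pods": "110"
}
EOF
# prints: nvidia.com/TU104_GEFORCE_RTX_2080=1
```

Against a live node, append | gpu_resources to the kubectl and jq command shown above. Empty output would mean no nvidia.com/ resource is currently allocatable on that node.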

Info

Juno's UI doesn't currently list all allocatable resources in the dashboard. This feature will be released soon. Use the command line in the meantime.

If you came from our KubeVirt VM GPU passthrough guide, you can continue to the next step here.