Troubleshooting
Common Issues
GPU not showing in allocatable resources:
- Verify VFIO binding (the output should show vfio-pci as the kernel driver in use):
  lspci -nnk -s 08:00.0
- Check GPU Operator logs:
  kubectl logs -n gpu-operator-resources <pod-name>
- Ensure the vm-passthrough label is set on the correct node
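The VFIO binding check can also be scripted. The following is a minimal sketch, not part of any tool: `driver_in_use` is a hypothetical helper name, and it only assumes the usual "Kernel driver in use:" line that `lspci -nnk` prints for a bound device.

```shell
#!/bin/sh
# Hypothetical helper: print the driver named on the
# "Kernel driver in use:" line of `lspci -nnk` output read from stdin.
# Prints nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p' | head -n 1
}

# Usage on the host (08:00.0 is the example address used in this guide):
#   lspci -nnk -s 08:00.0 | driver_in_use
#   [ "$(lspci -nnk -s 08:00.0 | driver_in_use)" = "vfio-pci" ] && echo "bound to vfio-pci"
```

Reading from stdin keeps the helper independent of the device address, so the same function works for every GPU function (for example the audio function at a neighbouring address).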
IOMMU group contains other devices:
- You may need ACS override patches (advanced)
- Consider passing through the entire IOMMU group
- Consult your motherboard documentation for IOMMU configuration
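Before deciding between ACS overrides and passing through the whole group, it helps to see exactly which devices share the GPU's IOMMU group. The sketch below reads the group membership from sysfs; `iommu_group_peers` is a hypothetical helper name, and the optional second argument (an alternate sysfs root) exists only to make the function testable.

```shell
#!/bin/sh
# Hypothetical helper: list every PCI device that shares an IOMMU group
# with the given device.
#   $1: full PCI address including the domain, e.g. 0000:08:00.0
#   $2: sysfs root (optional, defaults to /sys)
iommu_group_peers() {
    dev=$1
    sysfs=${2:-/sys}
    group_dir="$sysfs/bus/pci/devices/$dev/iommu_group/devices"
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $dev (is IOMMU enabled?)" >&2
        return 1
    fi
    # Each entry in this directory is one device in the group.
    ls "$group_dir"
}

# Usage on the host:
#   iommu_group_peers 0000:08:00.0
```

If the output lists anything beyond the GPU and its own functions, those devices are in the same isolation domain as the GPU.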
Bindings don't persist after reboot:
- Verify kernel parameters:
  cat /proc/cmdline | grep vfio-pci.ids
- Check initramfs date:
  ls -lh /boot/initrd*
- Enable systemd service as backup:
  systemctl enable vfio-pci-bind.service
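The kernel parameter check above prints the whole command line. As a minimal sketch, the hypothetical helper `vfio_ids_from_cmdline` below extracts only the `vfio-pci.ids=` value so it can be compared against the vendor:device IDs you expect.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the vfio-pci.ids= parameter
# from a kernel command line read from stdin.
# Prints nothing if the parameter is absent.
vfio_ids_from_cmdline() {
    # Put each parameter on its own line, then keep only the ids value.
    tr ' ' '\n' | sed -n 's/^vfio-pci\.ids=//p'
}

# Usage on the host:
#   vfio_ids_from_cmdline < /proc/cmdline
```

An empty result after a reboot suggests the bootloader configuration was not regenerated or the parameter was added to the wrong entry.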
VM fails to start with GPU:
- Check that the VM definition includes the correct device name
- Verify the host has available GPU resources:
  kubectl describe node <node-name>
- Review the virt-launcher pod logs (the pod runs in the VM's namespace, not in kubevirt):
  kubectl logs -n <vm-namespace> <virt-launcher-pod>
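The `kubectl describe node` output is long, so a filter helps when checking whether the GPU is actually allocatable. The sketch below is a hypothetical helper (`allocatable_with_prefix` is not a kubectl feature) that keeps only the Allocatable entries whose resource name starts with a given prefix. The `nvidia.com/` prefix in the usage line is an assumption; use whatever resource name your node advertises.

```shell
#!/bin/sh
# Hypothetical helper: print "name=quantity" for each entry in the
# Allocatable section of `kubectl describe node` output (read from stdin)
# whose resource name starts with the prefix given in $1.
allocatable_with_prefix() {
    awk -v prefix="$1" '
        /^Allocatable:/ { in_section = 1; next }
        /^[^[:space:]]/ { in_section = 0 }   # next top-level section ends the block
        in_section {
            name = $1
            sub(/:$/, "", name)              # drop the trailing colon
            if (index(name, prefix) == 1) print name "=" $2
        }
    '
}

# Usage (the nvidia.com/ prefix is an assumption):
#   kubectl describe node <node-name> | allocatable_with_prefix nvidia.com/
```

No output means the node advertises no matching resource; a quantity of 0 means the resource exists but nothing is currently available.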
Verification Commands Reference
# Check IOMMU enabled
cat /proc/cmdline | grep iommu
# List VFIO modules
lsmod | grep vfio
# Check device driver
lspci -nnk -s 08:00.0
# View IOMMU groups
find /sys/kernel/iommu_groups/ -type l
# Check node allocatable resources
kubectl get nodes -o json | jq '.items[].status.allocatable'
# GPU Operator status
kubectl get pods -n gpu-operator-resources
kubectl describe node <node-name> | grep -A10 Allocatable
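The raw `find` listing of IOMMU groups prints one full sysfs path per device, which is hard to scan. As a small sketch, the hypothetical helper `format_iommu_groups` below rewrites each path as a group number and PCI address. It assumes the standard `/sys/kernel/iommu_groups/<group>/devices/<address>` layout.

```shell
#!/bin/sh
# Hypothetical helper: turn `find /sys/kernel/iommu_groups/ -type l` output
# (read from stdin) into "group <n>: <pci-address>" lines.
format_iommu_groups() {
    # The last path component is the device address; the group number
    # sits two components before it.
    awk -F/ 'NF >= 3 { print "group " $(NF-2) ": " $NF }'
}

# Usage on the host, sorted numerically by group:
#   find /sys/kernel/iommu_groups/ -type l | format_iommu_groups | sort -k2n
```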
Additional Resources
- VFIO Documentation: kernel.org/doc/vfio
- KubeVirt GPU Passthrough: kubevirt.io
- NVIDIA GPU Operator: docs.nvidia.com/datacenter/cloud-native
- Juno Documentation: Contact your Juno representative for the latest integration guides
Document Version: 1.0
Last Updated: February 2025
Tested On: Debian 13, RTX 2080, KubeVirt 1.x, NVIDIA GPU Operator 24.x