Troubleshooting

Common Issues

GPU not showing in allocatable resources:

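  • Verify the VFIO binding: lspci -nnk -s 08:00.0 should report "Kernel driver in use: vfio-pci" (replace 08:00.0 with your GPU's PCI address)
  • Check the GPU Operator logs, especially the sandbox device plugin pod that advertises passthrough GPUs: kubectl logs -n gpu-operator-resources <pod-name> (use the namespace the operator was installed into; NVIDIA's Helm instructions use gpu-operator)
  • Ensure the nvidia.com/gpu.workload.config=vm-passthrough label is set on the node that hosts the GPU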
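The binding check can be scripted. A minimal sketch, assuming a POSIX shell and pciutils on the host: `vfio_driver_in_use` and `is_bound_to_vfio` are hypothetical helper names, not part of any NVIDIA or KubeVirt tooling, and 08:00.0 is the example address used above.

```shell
#!/bin/sh
# Hypothetical helper: print the driver named on the "Kernel driver in use:"
# line of `lspci -nnk` output read from stdin (prints nothing if none is bound).
vfio_driver_in_use() {
  awk -F': ' '/Kernel driver in use:/ { print $2; exit }'
}

# Hypothetical helper: succeed only if the device at PCI address $1 is bound
# to vfio-pci. Requires lspci (pciutils) on the host.
is_bound_to_vfio() {
  drv=$(lspci -nnk -s "$1" | vfio_driver_in_use)
  [ "$drv" = "vfio-pci" ]
}

# Usage (08:00.0 is the example address from this guide):
#   is_bound_to_vfio 08:00.0 && echo "bound to vfio-pci" || echo "NOT bound to vfio-pci"
# Show the node's workload label as an extra column:
#   kubectl get node <node-name> -L nvidia.com/gpu.workload.config
```

If the driver comes back as nouveau or nvidia, the host claimed the GPU before vfio-pci could, and the binding steps need revisiting before the operator can advertise it.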

IOMMU group contains other devices:

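  • You may need the ACS override kernel patch (advanced); it splits groups by telling the kernel to treat devices as isolated even when the hardware does not guarantee it
  • Alternatively, pass through the entire IOMMU group: every endpoint device in the group (for example the GPU's audio function) must be bound to vfio-pci or left without a driver; PCI bridges are the exception
  • Consult your motherboard documentation for IOMMU settings; moving the GPU to a different PCIe slot can place it in its own group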
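To see exactly which devices share the GPU's group, and what each one is bound to, a sketch like the following walks sysfs. It assumes the standard /sys/bus/pci layout; `iommu_group_peers` and the `SYSFS_ROOT` override are hypothetical, the override existing only so the function can be tested without real hardware.

```shell
#!/bin/sh
# Hypothetical helper: list every device in the same IOMMU group as the PCI
# device $1 (full address, e.g. 0000:08:00.0) together with its bound driver.
# SYSFS_ROOT defaults to /sys; override it only to test against a fake tree.
iommu_group_peers() {
  root=${SYSFS_ROOT:-/sys}
  group_dir="$root/bus/pci/devices/$1/iommu_group/devices"
  if [ ! -d "$group_dir" ]; then
    echo "no IOMMU group found for $1 (is the IOMMU enabled?)" >&2
    return 1
  fi
  for dev in "$group_dir"/*; do
    # The driver entry is a symlink into /sys/bus/pci/drivers/<name>
    if [ -e "$dev/driver" ]; then
      drv=$(basename "$(readlink -f "$dev/driver")")
    else
      drv="(none)"
    fi
    echo "$(basename "$dev") $drv"
  done
}

# Usage: iommu_group_peers 0000:08:00.0
```

Before passthrough, each line should show vfio-pci or (none), apart from PCI bridges. A peer still held by a host driver (an audio function on snd_hda_intel, for instance) keeps the group from being assigned.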

Bindings don't persist after reboot:

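  • Verify the kernel parameters: cat /proc/cmdline | grep vfio-pci.ids
  • Check that the initramfs was regenerated after the VFIO changes (its timestamp should be newer): ls -lh /boot/initrd*; on Debian, rebuild it with update-initramfs -u
  • Enable the systemd service as a backup: systemctl enable vfio-pci-bind.service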
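The first two checks can be wrapped in small functions. A sketch, assuming Debian's /boot/initrd.img-<kernel-version> naming; `cmdline_vfio_ids`, `initramfs_is_current`, and the /etc/modprobe.d/vfio.conf path are hypothetical placeholders for whatever your setup uses.

```shell
#!/bin/sh
# Hypothetical helper: print the vfio-pci.ids value from a kernel command line
# file (default /proc/cmdline). Also accepts vfio_pci.ids, since the kernel
# treats dashes and underscores in parameter names as equivalent.
cmdline_vfio_ids() {
  tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^vfio[-_]pci\.ids=//p'
}

# Hypothetical helper: succeed if initramfs $1 is newer than config file $2,
# i.e. it was rebuilt after the VFIO configuration last changed.
initramfs_is_current() {
  [ -e "$1" ] && [ -e "$2" ] && [ "$1" -nt "$2" ]
}

# Usage on Debian (vfio.conf is an assumed file name):
#   cmdline_vfio_ids
#   initramfs_is_current "/boot/initrd.img-$(uname -r)" /etc/modprobe.d/vfio.conf ||
#     echo "initramfs predates the config; run update-initramfs -u"
```

Empty output from `cmdline_vfio_ids` means the bootloader change never reached the running kernel, which usually points at a bootloader configuration that was edited but not regenerated.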

VM fails to start with GPU:

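  • Check that the GPU deviceName in the VM definition exactly matches the nvidia.com/ resource name listed under the node's Allocatable resources
  • Verify the host has available GPU resources: kubectl describe node <node-name>
  • Review the logs: the virt-launcher pod runs in the VM's namespace, so use kubectl logs -n <vm-namespace> <virt-launcher-pod> -c compute; virt-handler and virt-controller logs are in the kubevirt namespace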
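The device name to put in the VM definition can be read straight from the node. A sketch, assuming the usual `kubectl describe node` layout in which the Allocatable section ends at the next unindented heading; `gpu_allocatable` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl describe node` output on stdin and print
# each nvidia.com/ resource from the Allocatable section with its count.
# The printed name is what the VM's GPU deviceName must match.
gpu_allocatable() {
  awk '
    /^Allocatable:/ { in_alloc = 1; next }   # start of the section
    /^[^ ]/         { in_alloc = 0 }         # any unindented line ends it
    in_alloc && $1 ~ /^nvidia\.com\// {
      name = $1
      sub(/:$/, "", name)                    # drop the trailing colon
      print name, $2
    }
  '
}

# Usage:
#   kubectl describe node <node-name> | gpu_allocatable
```

No output, or a count of 0, means the node is not advertising a passthrough GPU at all; if the count is positive, compare it with the "Allocated resources" part of the same output to see whether running VMs already hold every GPU.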

Verification Commands Reference

# Check IOMMU kernel parameters (e.g. intel_iommu=on on Intel hosts)
cat /proc/cmdline | grep iommu

# List VFIO modules
lsmod | grep vfio

# Check device driver
lspci -nnk -s 08:00.0

# View IOMMU groups
find /sys/kernel/iommu_groups/ -type l

# Check node allocatable resources
kubectl get nodes -o json | jq '.items[].status.allocatable'

# GPU Operator status (adjust the namespace if the operator is installed elsewhere)
kubectl get pods -n gpu-operator-resources
kubectl describe node <node-name> | grep -A10 Allocatable
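The host-side checks above can also be run as a single preflight. A minimal sketch: the function names and the `SYSFS_ROOT`/`PROC_ROOT` overrides are hypothetical, and counting /sys/kernel/iommu_groups is used in place of grepping /proc/cmdline because it also covers hosts where the IOMMU is active without any kernel parameter.

```shell
#!/bin/sh
# Hypothetical host preflight. SYSFS_ROOT and PROC_ROOT default to /sys and
# /proc; override them only to exercise the functions against a fake tree.

# Print the number of IOMMU groups; 0 means the IOMMU is not active.
iommu_group_count() {
  dir="${SYSFS_ROOT:-/sys}/kernel/iommu_groups"
  [ -d "$dir" ] || { echo 0; return; }
  ls "$dir" | wc -l | tr -d ' '
}

# Succeed if vfio_pci is listed as a loaded module. A kernel with vfio-pci
# built in (not as a module) would not list it here.
vfio_pci_loaded() {
  grep -q '^vfio_pci ' "${PROC_ROOT:-/proc}/modules"
}

preflight() {
  groups=$(iommu_group_count)
  if [ "$groups" -gt 0 ]; then echo "OK   IOMMU groups: $groups"; else echo "FAIL no IOMMU groups"; fi
  if vfio_pci_loaded; then echo "OK   vfio_pci loaded"; else echo "FAIL vfio_pci not loaded"; fi
}

# Usage: preflight
```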

Document Version: 1.0
Last Updated: February 2025
Tested On: Debian 13, RTX 2080, KubeVirt 1.x, NVIDIA GPU Operator 24.x