PVE IOMMU Isolation for Passthrough

From WikiMLT
Revision as of 17:56, 1 November 2022 by Spas

IOMMU Hardware Isolation at the Proxmox host

Here we will set up IOMMU on an Intel-based host system. More details (and information about the host's hardware in use) are provided in the manual QEMU/KVM and GPU Passthrough in Details; only the steps used to isolate one NVIDIA Tesla K20Xm are listed here. The card will be used as a GPU in a Windows guest, but that is another story. So let's begin. One important point in this case is that the Tesla K20Xm is a PCI-E 2.0 card, so to guarantee stable operation of the server I had to go into its BIOS and set the link speed of the slot in use to PCI-E 2.0. The Above 4G Decoding option is also enabled in the BIOS.

Enable IOMMU

Enable IOMMU isolation by performing the following steps.

nano /etc/default/grub
#GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
update-grub
systemctl reboot
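After the reboot it is worth confirming that the new parameters actually reached the kernel. The sketch below assumes the GRUB-based boot shown above; `check_iommu_cmdline` is a hypothetical helper, not a Proxmox tool.

```shell
#!/bin/bash
# Hypothetical helper: report whether the IOMMU parameters set in
# /etc/default/grub are present on a kernel command line.
# $1 defaults to the running kernel's command line (/proc/cmdline).
check_iommu_cmdline() {
    local cmdline param
    cmdline=${1:-$(cat /proc/cmdline)}
    for param in intel_iommu=on iommu=pt; do
        # Pad with spaces so only whole parameters match
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing: $param"; return 1 ;;
        esac
    done
    echo "ok"
}
```

Run `check_iommu_cmdline` without an argument on the host; `dmesg | grep -e DMAR -e IOMMU` gives a complementary view of what the kernel reported while booting.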

Find the devices to be isolated

Find the devices to be isolated by using the following script; its source is credited in the script's header comment.

nano /usr/local/bin/get_iommu_groups.sh && chmod +x /usr/local/bin/get_iommu_groups.sh
#Script
#!/bin/bash
# https://mathiashueber.com/pci-passthrough-ubuntu-2004-virtual-machine/
# change the 9999 if needed
shopt -s nullglob
for d in /sys/kernel/iommu_groups/{0..9999}/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
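The loop relies on two parameter expansions to turn each sysfs path into a group number, and a third to get the PCI address passed to lspci. The hypothetical function below isolates just that step so it can be tried on a single path; the example path assumes the group and address of the GPU on this host.

```shell
#!/bin/bash
# Hypothetical helper mirroring the expansions in get_iommu_groups.sh.
# Input: a path such as /sys/kernel/iommu_groups/<group>/devices/<address>
group_from_path() {
    local d=$1 n
    n=${d#*/iommu_groups/*}   # drop the shortest prefix through ".../iommu_groups/"
    n=${n%%/*}                # drop everything from the first remaining "/"
    # ${d##*/} keeps only the last path component, i.e. the PCI address
    printf '%s %s\n' "$n" "${d##*/}"
}
```

For example, `group_from_path /sys/kernel/iommu_groups/40/devices/0000:02:00.0` prints `40 0000:02:00.0`.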

Run the script and filter the output.

get_iommu_groups.sh | grep -iP 'Ethernet controller|NVIDIA'
#Output
IOMMU Group 40 02:00.0 3D controller [0302]: NVIDIA Corporation GK110GL [Tesla K20Xm] [10de:1021] (rev a1)
IOMMU Group 43 08:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU Group 44 09:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)

We will isolate IOMMU Group 40, which contains the device at PCI address 02:00.0 with vendor:device ID 10de:1021.
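Reading the group number and the vendor:device ID off this output by eye is easy for one card; the following sketch does it for a given address, assuming the exact line format printed by get_iommu_groups.sh above. `device_info` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read get_iommu_groups.sh output on stdin and print
# the IOMMU group and [vendor:device] ID of one PCI address.
# Assumed line format: IOMMU Group <n> <address> <class> ... [vvvv:dddd] (rev xx)
device_info() {
    awk -v addr="$1" '$4 == addr {
        id = ""
        # Keep the last token shaped like [vvvv:dddd]; the class code [0302]: is skipped
        for (i = 5; i <= NF; i++)
            if (length($i) == 11 && substr($i, 1, 1) == "[" &&
                substr($i, 6, 1) == ":" && substr($i, 11, 1) == "]")
                id = substr($i, 2, 9)
        print "group=" $3, "id=" id
    }'
}
```

On this host, `get_iommu_groups.sh | device_info 02:00.0` should print `group=40 id=10de:1021`. Because all devices in an IOMMU group are assigned to the guest together, it is also worth checking with `ls /sys/kernel/iommu_groups/40/devices/` that the group holds nothing but 0000:02:00.0.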

VFIO Modules

You'll need to add a few VFIO modules to your Proxmox system.

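nano /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

vfio
vfio_iommu_type1
vfio_pci
# vfio_virqfd is only needed on kernels older than 6.2, where it was merged into vfio
vfio_virqfd
# Allow VFIO on platforms without interrupt remapping (weakens isolation; skip if not needed)
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
# Let KVM ignore unhandled MSR accesses instead of faulting the guest (commonly set for Windows guests)
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf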
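Once the host has been rebooted, a quick way to confirm that the modules from /etc/modules were loaded is to compare them against lsmod. The helper below is a hypothetical sketch; it checks vfio, vfio_iommu_type1 and vfio_pci only, leaving out vfio_virqfd because kernels since 6.2 no longer ship it as a separate module.

```shell
#!/bin/bash
# Hypothetical helper: read `lsmod` output on stdin and print every
# required VFIO module that does not appear in the first column.
missing_vfio_modules() {
    local lsmod_out mod
    lsmod_out=$(cat)
    for mod in vfio vfio_iommu_type1 vfio_pci; do
        # awk exits 0 only if some line's first field equals $mod
        if ! printf '%s\n' "$lsmod_out" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            echo "$mod"
        fi
    done
}
```

`lsmod | missing_vfio_modules` prints nothing when all three modules are loaded.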

Blacklisting Drivers

nano /etc/modprobe.d/blacklist.conf
# Blacklist NVIDIA Modules - 'lsmod', 'dmesg' and 'lspci -nnv'
blacklist nvidiafb
blacklist nouveau
blacklist nvidia_drm
blacklist nvidia
blacklist rivafb
blacklist rivatv
blacklist snd_hda_intel
blacklist radeon

options nouveau modeset=0
update-initramfs -u
reset 
reboot
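After this reboot the blacklisted NVIDIA drivers should no longer be loaded. This hypothetical check filters lsmod output for the NVIDIA-related names from the list above; it is only a sketch and does not cover every module in blacklist.conf.

```shell
#!/bin/bash
# Hypothetical helper: read `lsmod` output on stdin and print any of the
# blacklisted NVIDIA-related modules that are still loaded.
loaded_gpu_drivers() {
    # NR > 1 skips the "Module Size Used by" header line
    awk 'NR > 1 && ($1 == "nouveau" || $1 == "nvidia" ||
                    $1 == "nvidiafb" || $1 == "nvidia_drm") { print $1 }'
}
```

`lsmod | loaded_gpu_drivers` should print nothing; if, say, nouveau shows up, the blacklist or the initramfs update did not take effect.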

Adding the GPU to VFIO

lspci -n -s 02:00
02:00.0 0302: 10de:1021 (rev a1)
nano /etc/modprobe.d/vfio.conf
softdep nouveau pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
options vfio-pci ids=10de:1021 disable_vga=1
update-initramfs -u
reset 
systemctl reboot
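The ids= value is the vendor:device pair reported by `lspci -n`. When a card exposes more than one function at the same slot (a GPU with an HDMI audio function, for example), the IDs of all functions are commonly listed together, comma-separated. The hypothetical helper below builds that line from `lspci -n -s` output and keeps disable_vga=1 as used above.

```shell
#!/bin/bash
# Hypothetical helper: read `lspci -n -s <bus:device>` output on stdin,
# one line per function, e.g.  02:00.0 0302: 10de:1021 (rev a1)
# and print an options line for /etc/modprobe.d/vfio.conf.
vfio_ids_line() {
    local ids
    # Field 3 is vendor:device; de-duplicate and join with commas
    ids=$(awk 'NF >= 3 { print $3 }' | sort -u | paste -sd, -)
    echo "options vfio-pci ids=${ids} disable_vga=1"
}
```

For the Tesla, `lspci -n -s 02:00 | vfio_ids_line` prints `options vfio-pci ids=10de:1021 disable_vga=1`, matching the line in vfio.conf.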

Verify the Isolation

Verify the isolation of the GPU with the following command. Note the line Kernel driver in use: vfio-pci.

sudo lspci -nnv -s 02:00
#Output
02:00.0 3D controller [0302]: NVIDIA Corporation GK110GL [Tesla K20Xm] [10de:1021] (rev a1)
        Subsystem: NVIDIA Corporation GK110GL [Tesla K20Xm] [10de:097d]
        Physical Slot: 1
        Flags: fast devsel, IRQ 255, NUMA node 0, IOMMU group 40
        Memory at fa000000 (32-bit, non-prefetchable) [disabled] [size=16M]
        Memory at 23fe0000000 (64-bit, prefetchable) [disabled] [size=256M]
        Memory at 23ff0000000 (64-bit, prefetchable) [disabled] [size=32M]
        Expansion ROM at fb000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
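For scripted checks, for instance after a kernel upgrade, the Kernel driver in use line can be extracted directly. A minimal sketch, with `driver_in_use` as a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: read `lspci -v` / `lspci -nnv` output on stdin and
# print the driver named on the "Kernel driver in use:" line, if any.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}
```

Usage: `[ "$(lspci -nnv -s 02:00 | driver_in_use)" = vfio-pci ] && echo "02:00.0 is bound to vfio-pci"`.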

Ref­er­ences