QEMU/KVM and GPU Passthrough in Details
Revision as of 14:57, 2 September 2022
This is documentation of my experience creating a virtual machine capable of running a Windows 10 guest OS (for desktop use) on my home server, whose operating system is Ubuntu Server. The Windows 10 guest itself must be capable of running virtualization in order to use WSL2 inside it. The passthrough option is not mandatory for my use case, but I decided to try it.
The Host System
The host operating system is Ubuntu Server 20.04 with kernel 5.4. Proxmox VE 7.2 with kernel 5.15 is also a tested, valid host. The host CPU is an Intel Xeon, but this manual also provides AMD-specific parameters and commands.
Host Hardware
- Lenovo ThinkServer TD350 with Xeon E5-2673 v3 » with the latest firmware
- 2x Gigabit Ethernet Adapter Intel i210
- 2x 8GB DDR4-2133MHz (1Rx4) RDIMM
- NVIDIA NVS 315 at PCIE Slot 1 » FOR PASSTHROUGH
- Onboard Aspeed AST2400 with 16MB memory, 1920x1200@60Hz max. (Roms and Drivers)
lspci | grep VGA
02:00.0 VGA compatible controller: NVIDIA Corporation GF119 [NVS 315] (rev a1)
07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
sudo lshw -class memory | sed -n -e '/bank:0/,/bank:1/p' -e '/bank:2/,/bank:3/p'| sed -e 's/^.*bank.*$//'
description: DIMM DDR4 Synchronous 2133 MHz (0.5 ns)
product: HMA41GR7AFR4N-TF
vendor: Hynix Semiconductor
physical id: 0
serial: 517692CB
slot: CPU1 DIMM A1
size: 8GiB
width: 64 bits
clock: 2133MHz (0.5ns)
description: DIMM DDR4 Synchronous 2133 MHz (0.5 ns)
product: HMA41GR7AFR4N-TF
vendor: Hynix Semiconductor
physical id: 2
serial: 51769065
slot: CPU1 DIMM B1
size: 8GiB
width: 64 bits
clock: 2133MHz (0.5ns)
lscpu | sed -nr '/Model name/ s/.*:\s*(.*) @ .*/\1/p'
Intel(R) Xeon(R) CPU E5-2673 v3
sudo lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
Stepping: 2
CPU MHz: 1197.645
CPU max MHz: 3100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4789.15
Virtualization: VT-x
L1d cache: 384 KiB
L1i cache: 384 KiB
L2 cache: 3 MiB
L3 cache: 30 MiB
NUMA node0 CPU(s): 0-23
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm
constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16
xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin
ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
dtherm ida arat pln pts md_clear flush_l1d
Test the Virtualization Capabilities of the System
Check whether the system supports virtualisation. The following command must return at least 1:
grep -Ec '(vmx|svm)' /proc/cpuinfo
24
Alternatively, use kvm-ok from the package cpu-checker:
sudo apt install cpu-checker && kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
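The flag check above can be wrapped in a small helper that also tells which extension was found. This is only a sketch: the function name virt_flag is hypothetical (not part of any package), and it assumes a cpuinfo-style file where the CPU flags appear as space-separated words.

```shell
# Hypothetical helper (not from the original guide): report which hardware
# virtualization flag a cpuinfo-style file advertises.
virt_flag() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    if grep -qw 'vmx' "$cpuinfo"; then
        echo "vmx"      # Intel VT-x
    elif grep -qw 'svm' "$cpuinfo"; then
        echo "svm"      # AMD-V
    else
        echo "none"     # no flag found; check the UEFI/BIOS settings
    fi
}

# Usage on the host (reads /proc/cpuinfo by default):
#   virt_flag
```

Knowing the vendor flag is handy later, because the kernel parameters differ between Intel and AMD hosts.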
Enable the relevant virtualisation settings (VT-x/AMD-V) in the UEFI/BIOS.
Setting-up the PCI Passthrough
This section is primarily based on Mathias Hueber's great manual, so not only the commands but also some of the sentences are copied from there.
Enabling IOMMU
To enable the IOMMU feature, edit the configuration file /etc/default/grub as follows:
sudo nano /etc/default/grub # cat /etc/default/grub | grep 'GRUB_CMDLINE_LINUX_DEFAULT'
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt" # For Intel CPU (current case)
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt" # For AMD CPU
Short explanations:
- In computing, an input–output memory management unit (IOMMU) is a memory management unit (MMU) that connects a direct-memory-access–capable (DMA-capable) I/O bus to the main memory. Like a traditional MMU, which translates CPU-visible virtual addresses to physical addresses, the IOMMU maps device-visible virtual addresses (also called device addresses or I/O addresses in this context) to physical addresses. Some units also provide memory protection from faulty or malicious devices.
- To enable single-root input/output virtualization (SR-IOV) in the kernel, configure intel_iommu=on in the grub file. To get the best performance, add iommu=pt (pass-through) to the grub file when using SR-IOV. In pass-through mode the adapter does not need to use DMA translation to the memory, which improves performance. iommu=pt is needed mainly when hypervisor performance matters.
- The Open Virtual Machine Firmware (OVMF) is a project to enable UEFI support for virtual machines. Starting with Linux 3.9 and recent versions of QEMU, it is now possible to pass through a graphics card, offering the VM native graphics performance, which is useful for graphics-intensive tasks.
Update the boot manager configuration and reboot the system.
sudo update-grub
sudo systemctl reboot
For the systemd-boot manager, as used in Pop!_OS.
On operating systems that boot with systemd-boot, the kernelstub utility can be used to provide the boot parameters. Use it like so:
sudo kernelstub -o "amd_iommu=on iommu=pt"
Later, to do the isolation, use (with the correct IDs):
sudo kernelstub --add-options "vfio-pci.ids=10de:1b80,10de:10f0,8086:1533"
After the reboot, verify that IOMMU is enabled:
sudo dmesg | grep -i 'IOMMU'
[0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-77-generic root=UUID=09e7c...14 ro intel_iommu=on iommu=pt
[0.068270] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-77-generic root=UUID=09e7c...14 ro intel_iommu=on iommu=pt
[0.068326] DMAR: IOMMU enabled
[0.140127] DMAR-IR: IOAPIC id 1 under DRHD base 0xfbffc000 IOMMU 0
[0.140129] DMAR-IR: IOAPIC id 2 under DRHD base 0xfbffc000 IOMMU 0
[0.480605] iommu: Default domain type: Passthrough (set via kernel command line)
[0.764139] pci 0000:ff:0b.0: Adding to iommu group 0
[0.764165] pci 0000:ff:0b.1: Adding to iommu group 0
[0.764188] pci 0000:ff:0b.2: Adding to iommu group 0
[0.764348] pci 0000:ff:0c.0: Adding to iommu group 1
...
sudo dmesg | grep -i 'vfio' # For Intel CPU (current case)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-77-generic root=UUID=09e7c8ed-fb55-4a44-8be4-18b1696fc714 ro intel_iommu=on iommu=pt kvm.ignore_msrs=1 irqpoll vfio-pci.ids=10de:107c,10de:0e08 vfio-pci.disable_vga=1
[ 0.068558] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-77-generic root=UUID=09e7c8ed-fb55-4a44-8be4-18b1696fc714 ro intel_iommu=on iommu=pt kvm.ignore_msrs=1 irqpoll vfio-pci.ids=10de:107c,10de:0e08 vfio-pci.disable_vga=1
[ 0.862286] VFIO - User Level meta-driver version: 0.3
[ 0.862594] vfio-pci 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 0.880568] vfio_pci: add [10de:107c[ffffffff:ffffffff]] class 0x000000/00000000
[ 0.900583] vfio_pci: add [10de:0e08[ffffffff:ffffffff]] class 0x000000/00000000
sudo dmesg |grep 'AMD-Vi' # For AMD CPU
[ 0.607751] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.608569] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.608569] AMD-Vi: Extended features (0x58f77ef22294a5a): PPR NX GT IA PC GA_vAPIC
[ 0.608572] AMD-Vi: Interrupt remapping enabled
[ 0.890747] AMD-Vi: AMD IOMMUv2 loaded and initialized
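Besides reading dmesg, the running kernel's command line can be checked directly. The sketch below assumes the parameters are space-separated in /proc/cmdline; the function name has_kernel_param is hypothetical.

```shell
# Hypothetical helper: succeed when PARAM appears as a whole word in the
# kernel command line (default: /proc/cmdline of the running kernel).
has_kernel_param() {
    local param="$1" cmdline="${2:-/proc/cmdline}"
    # One parameter per line, then an exact fixed-string match of the whole line.
    tr -s ' ' '\n' < "$cmdline" | grep -qxF -e "$param"
}

# Usage on an Intel host after the reboot:
#   for p in intel_iommu=on iommu=pt; do
#       if has_kernel_param "$p"; then echo "ok: $p"; else echo "MISSING: $p"; fi
#   done
```

A missing parameter usually means that update-grub was not run or that the edited line in /etc/default/grub is not the active one.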
Identification of the Group Controllers
To generate a tidy list of your grouped devices, create a script as follows.
nano ~/bin/get_iommu_groups.sh && chmod +x ~/bin/get_iommu_groups.sh
#!/bin/bash
# https://mathiashueber.com/pci-passthrough-ubuntu-2004-virtual-machine/
# change the 9999 if needed
shopt -s nullglob
for d in /sys/kernel/iommu_groups/{0..9999}/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
Run the script and filter the output.
get_iommu_groups.sh | grep -iP 'VGA compatible controller|Ethernet controller|SATA controller|USB controller|NVIDIA'
IOMMU Group 30 00:11.4 SATA controller [0106]: Intel Corporation C610/X99 series chipset sSATA Controller [AHCI mode] [8086:8d62] (rev 05)
IOMMU Group 31 00:14.0 USB controller [0c03]: Intel Corporation C610/X99 series chipset USB xHCI Host Controller [8086:8d31] (rev 05)
IOMMU Group 33 00:1a.0 USB controller [0c03]: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #2 [8086:8d2d] (rev 05)
IOMMU Group 38 00:1d.0 USB controller [0c03]: Intel Corporation C610/X99 series chipset USB Enhanced Host Controller #1 [8086:8d26] (rev 05)
IOMMU Group 39 00:1f.2 SATA controller [0106]: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] [8086:8d02] (rev 05)
IOMMU Group 40 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 315] [10de:107c] (rev a1)
IOMMU Group 40 02:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08] (rev a1)
IOMMU Group 42 07:00.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 30)
IOMMU Group 43 08:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU Group 44 09:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
We will isolate IOMMU Group 40, which contains the PCI devices 02:00.0 [device ID 10de:107c] and 02:00.1 [device ID 10de:0e08].
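The device IDs of a group can also be collected automatically from the output of get_iommu_groups.sh. The following is a minimal sketch: the function name vfio_ids_for_group is hypothetical, and it assumes the line format shown above, where the vendor:device pair is the last [xxxx:xxxx] token on each line.

```shell
# Hypothetical helper: read get_iommu_groups.sh output on stdin and print the
# vendor:device IDs of one IOMMU group as a comma-separated list.
vfio_ids_for_group() {
    awk -v g="$1" '
        $1 == "IOMMU" && $2 == "Group" && $3 == g {
            id = ""
            # Walk through every [xxxx:xxxx] token and keep the last one.
            while (match($0, /\[[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f][0-9a-f][0-9a-f]\]/)) {
                id = substr($0, RSTART + 1, RLENGTH - 2)
                $0 = substr($0, RSTART + RLENGTH)
            }
            if (id != "") ids = ids (ids == "" ? "" : ",") id
        }
        END { print ids }
    '
}

# Usage:
#   get_iommu_groups.sh | vfio_ids_for_group 40
```

The resulting list has the form expected by the vfio-pci.ids kernel parameter. Every device in the group should be reviewed before isolating it, since all devices of a group are passed through together.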
Isolation of the Guest GPU
To isolate the GPU we have two options: select the devices by PCI bus address or by device ID. Both options have pros and cons. Here we will bind the vfio-pci driver by device ID. This option should be used only when the graphics cards (or other devices that will be isolated) in the system are not exactly the same model; otherwise we need to isolate by PCI bus address, because identical devices have identical IDs.
sudo nano /etc/default/grub # cat /etc/default/grub | grep 'GRUB_CMDLINE_LINUX_DEFAULT'
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt kvm.ignore_msrs=1 kvm.report_ignored_msrs=0 irqpoll vfio-pci.ids=10de:107c,10de:0e08 vfio-pci.disable_vga=1"
Short explanations:
- The ignore_msrs option is only necessary for Windows 10 versions newer than 1803 (otherwise BSOD).
- The irqpoll option is a workaround for an error like irq XX: nobody cared (try booting with the "irqpoll" option)…, possibly followed by a hang and restart of the host. Actually, I think this problem was solved by enabling the MSI (Message Signaled Interrupts) option in the guest OS, as described below.
- The vfio-pci.disable_vga=1 option is an attempted workaround for my system, which hangs during boot while a monitor is connected to the guest GPU. In my particular case it doesn't actually change anything.
Update the boot manager configuration and reboot the system.
sudo update-grub
sudo systemctl reboot
After this reboot the isolated GPU will be ignored by the host OS, so you have to use the other GPU for the host OS. After the reboot, verify the isolation of the guest GPU by analyzing the output of the following command:
sudo lspci -nnv -s 02:00
- Note the lines "Kernel driver in use: vfio-pci".
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF119 [NVS 315] [10de:107c] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation GF119 [NVS 315] [10de:102f]
    Physical Slot: 1
    Flags: bus master, fast devsel, latency 0, IRQ 255, NUMA node 0
    Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Memory at 23ff0000000 (64-bit, prefetchable) [size=128M]
    Memory at 23ff8000000 (64-bit, prefetchable) [size=32M]
    I/O ports at d000 [size=128]
    Expansion ROM at fb000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [b4] Vendor Specific Information: Len=14 <?>
    Capabilities: [100] Virtual Channel
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
02:00.1 Audio device [0403]: NVIDIA Corporation GF119 HDMI Audio Controller [10de:0e08] (rev a1)
    Subsystem: NVIDIA Corporation GF119 HDMI Audio Controller [10de:102f]
    Physical Slot: 1
    Flags: fast devsel, IRQ 255, NUMA node 0
    Memory at fb080000 (32-bit, non-prefetchable) [disabled] [size=16K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
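The same check can be done without parsing lspci output, because sysfs exposes the bound driver of each PCI device as a symbolic link. This is a sketch under that assumption; the function name pci_driver and its optional second argument (an alternative sysfs root, useful only for testing) are hypothetical.

```shell
# Hypothetical helper: print the kernel driver bound to a PCI device,
# or "none" when no driver is bound.
pci_driver() {
    local addr="$1" sysfs="${2:-/sys}"
    local link="$sysfs/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The link target ends with the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Usage for the two functions of the guest GPU (full address with domain 0000):
#   pci_driver 0000:02:00.0
#   pci_driver 0000:02:00.1
```

If either call prints something other than vfio-pci, the host has claimed that function with another driver and the isolation is incomplete.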
Congratulations, the hardest part is done!
Setting-up the Software Environment
There are plenty of manuals on how to install and manage KVM and create VMs, so I will post a few references that I have read and explain just a few things about this process.
Install QEMU, KVM, LIBVIRT
sudo apt install qemu-system-x86 libvirt-daemon bridge-utils
sudo apt install libvirt-clients virtinst libosinfo-bin ovmf
sudo apt install virt-manager virt-viewer remmina # For desktop user
The above set of packages is for the latest Debian-based distributions. For slightly older releases such as Ubuntu 20.04, try the following set.
sudo apt install qemu qemu-kvm bridge-utils
sudo apt install libvirt-daemon libvirt-clients virtinst libosinfo-bin ovmf
sudo apt install virt-manager virt-viewer remmina # For desktop user
Explanation about the packages:
- The qemu package (quick emulator) is an application that allows you to perform hardware virtualization.
- The qemu-kvm package is the main KVM package.
- The libvirt-daemon package is the virtualization daemon.
- The bridge-utils package helps you create a bridge connection, which allows systems other than the host to access a virtual machine.
- The virtinst package is a set of command-line tools to create and clone virtual machines using libvirt.
- The libosinfo-bin package contains tools for querying the osinfo database via libosinfo. It includes a database containing device metadata and provides APIs to match/identify optimal devices for deploying an operating system on a hypervisor.
- The ovmf package is UEFI firmware for 64-bit x86 virtual machines. Open Virtual Machine Firmware is a build of EDK II for 64-bit x86 virtual machines. It includes full support for UEFI, including Secure Boot, allowing use of UEFI in place of a traditional BIOS in your VM.
- The virt-manager package is an application for managing virtual machines through a graphical user interface.
- The virt-viewer package is a SPICE GUI client.
- The remmina package is a remote session manager that supports the RDP, VNC, SSH and SFTP protocols.
Verify that the libvirt daemon is running and that the KVM module is loaded into the kernel:
sudo systemctl is-active libvirtd
lsmod | grep -i kvm
Basic management – enable, start, get the status of, stop or disable libvirtd.service (pick one action):
sudo systemctl (enable|start|status|stop|disable) libvirtd.service
Add your user to the libvirt and kvm groups in order to execute the related commands without sudo:
sudo usermod -aG libvirt $USER
sudo usermod -aG kvm $USER
grep "$USER" /etc/group
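Group changes made with usermod apply only to new login sessions, so the current shell may still lack them. The sketch below compares a list of group names against the required ones; the function name missing_groups is hypothetical.

```shell
# Hypothetical helper: given a space-separated list of groups the user has,
# print the required groups that are missing from it.
missing_groups() {
    local have=" $1 " g missing=""
    shift
    for g in "$@"; do
        case "$have" in
            *" $g "*) ;;                      # already a member
            *) missing="$missing $g" ;;       # not found in the list
        esac
    done
    echo "${missing# }"
}

# Usage in the current session:
#   missing_groups "$(id -nG)" libvirt kvm
```

An empty result means the session already has both groups. Otherwise log out and back in (or reboot) and check again.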
NVIDIA Kernel Modules and Drivers at the Host Level
Note that this guide covers installation on a Linux server where the host won't use the guest GPU or any other NVIDIA GPU.
Remove any previously installed NVIDIA drivers. Then analyze the output of lsmod, dmesg and lspci -nnv in order to find which modules are related to the guest GPU, and blacklist them by creating a new section at the bottom of the file /etc/modprobe.d/blacklist.conf.
sudo apt remove --purge nvidia-headless-390 && sudo apt autoremove && sudo apt autoclean
dpkg -l | grep -i nvidia
sudo apt-get remove --purge '^nvidia-.*'
sudo apt autoremove && sudo apt autoclean
sudo nano /etc/modprobe.d/blacklist.conf
# Blacklist NVIDIA Modules:
# 'lsmod', 'dmesg' and 'lspci -nnv'
blacklist nvidiafb
blacklist nouveau
blacklist nvidia_drm
blacklist nvidia
blacklist rivafb
blacklist rivatv
blacklist snd_hda_intel
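After the next reboot it is worth confirming that none of the blacklisted modules is still loaded. The helper below is a sketch with a hypothetical name, loaded_blacklisted; it assumes the simple "blacklist NAME" line format used above. On Ubuntu it is usually also necessary to run sudo update-initramfs -u before rebooting, so that the blacklist applies during early boot.

```shell
# Hypothetical helper: print every module that is blacklisted in CONF but
# still present in the list of loaded modules (default: /proc/modules).
loaded_blacklisted() {
    local conf="$1" modules="${2:-/proc/modules}" mod
    awk '$1 == "blacklist" { print $2 }' "$conf" |
    while read -r mod; do
        # /proc/modules lists module names with underscores, never dashes.
        mod=$(printf '%s' "$mod" | sed 's/-/_/g')
        if grep -q "^${mod} " "$modules"; then
            echo "$mod"
        fi
    done
}

# Usage:
#   loaded_blacklisted /etc/modprobe.d/blacklist.conf
```

No output means that none of the listed modules is loaded. Note that a blacklist entry only prevents automatic loading by alias, so a module can still be pulled in explicitly or as a dependency of another module.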
Deploy fresh OVMF Firmware for VMs
This step is no longer required. The UEFI images from the latest versions of the ovmf package installed above are robust, stable and fast.
The default OVMF images that were installed in the previous section are relatively old. In addition, I didn't succeed in setting up WSL2 with these UEFI images. Also, with the images provided below the VM performance is about 5% faster.
Download and deploy an appropriate package with fresh OVMF images from kraxel.org/repos/jenkins/edk2/:
mkdir ~/Downloads/kvm-ovmf-kraxel/
cd ~/Downloads/kvm-ovmf-kraxel/
wget https://www.kraxel.org/repos/jenkins/edk2/edk2.git-ovmf-x64-0-20210421.18.g15ee7b7689.noarch.rpm
rpm2cpio edk2.git-ovmf-x64-0-20210421.18.g15ee7b7689.noarch.rpm | cpio -idmv
sudo cp -R usr/share/edk2.git /usr/share/
sudo cp usr/share/qemu/firmware/* /usr/share/qemu/firmware/
OVMF_VARS-with-csm.fd
.
Additional Setup, Issues, Solutions and Tips
- Linux Network Bridge (refs)
- Virt-manager Setting-up Windows Virtual Machines
- QEMU/KVM and GPU Passthrough Troubleshooting
- QEMU/KVM Guest tools
General References
- Mathias Hueber: VMs with PCI passthrough on Ubuntu 20.04, straightforward guide for gaming on a virtual machine [the main reference]
- Server world: KVM: Install, Create VM, Libvirt basics, Management tools, SPICE, Nested, Migration [simple examples]
- Adam Gradzki's Blog: Faster Virtual Machines on Linux Hosts with GPU Acceleration [interesting advanced manual]
- ArchLinux: PCI passthrough via OVMF | ArchLinux: PCI passthrough via OVMF » Performance tuning
- Red Hat Customer Portal: Configure a host for PCI Passthrough
- KVM on Z: Disk Performance Hints & Tips [not so useful]
- Ubuntu Community Wiki: KVM/Networking
- RedHat OpenShift: Installing the QEMU guest agent on virtual machines | Proxmox: Qemu-guest-agent
- Get Labs Done: How To Install Windows 10 on Ubuntu KVM?
- Get Labs Done: How To Install Ubuntu Desktop On KVM?
- Ubuntu Downloads: Install Ubuntu Core on KVM
- How to Install Nvidia Drivers on Ubuntu 20.04
- Is there something like "VirtualBox Guest Additions" for QEMU/KVM?
- Mouse cursor in KVM Ubuntu Guest! [SOLVED]