[SRU][N/O][PATCH 0/1] Fix vfio_pci soft lockup on VM start while using PCIe passthrough

Jacob Martin jacob.martin at canonical.com
Tue Dec 10 15:23:16 UTC 2024


BugLink: https://bugs.launchpad.net/bugs/2089306

SRU Justification

[Impact]

The patch "vfio/pci: Use unmap_mapping_range()" rewrote the way VFIO tracks
mapped regions to use the "vmf_insert_pfn" function instead of tracking them
itself and using "io_remap_pfn_range". The implementation using
"vmf_insert_pfn" is significantly slower. To mitigate this slowdown, "vfio/pci:
Insert full vma on mmap'd MMIO fault" was introduced to prefault the entirety
of areas mapped by vfio_pci, but this results in soft lockup warnings on the
host for devices with large BAR regions. Reverting only the prefaulting
behavior does not fully resolve the slowness, as a VM still experiences
extremely slow accesses to the passthrough devices as VMAs get faulted in,
causing soft lockup warnings in the guest during boot. Thus, "vfio/pci: Use
unmap_mapping_range()" must also be
reverted to restore performance to that of versions prior to 6.8.0-48-generic.
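
To judge whether a given passthrough device has large BAR regions, the sizes
can be read from its sysfs "resource" file, where each line holds a start
address, end address and flags in hex. The following is a minimal sketch, not
part of the patch; the helper name bar_sizes_mib is hypothetical, and the
device address in the usage comment is assumed to correspond to the first
--hostdev entry in the test plan.

```shell
# Print the size in MiB of each non-empty entry in a PCI "resource" file.
# Each line of the file is "start end flags", all as 0x-prefixed hex.
# Entries smaller than 1 MiB are reported as 0 MiB (integer division).
bar_sizes_mib() {
	idx=0
	while read -r start end _flags; do
		# Unused entries are all zeroes; skip them but keep counting.
		if [ "$(($end))" -ne 0 ]; then
			echo "entry $idx: $(( ($end - $start + 1) / 1048576 )) MiB"
		fi
		idx=$((idx + 1))
	done < "$1"
}

# Example usage (device address is an assumption):
#   bar_sizes_mib /sys/bus/pci/devices/0000:1b:00.0/resource
```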

[Fix]

Both of these performance issues are resolved upstream by patchset [1], but
this would be a complex backport to 6.8 and 6.11, with significant changes to
core parts of the kernel.

Reverting the following commits resolves the issue, with a much reduced
potential for regression:
- "mm: use rwsem assertion macros for mmap_lock" (revert needed in Oracular,
  not present in Noble)
- "vfio/pci: Insert full vma on mmap'd MMIO fault"
- "vfio/pci: Use unmap_mapping_range()"

[Test Plan]

Tested on a DGX H100 system. The reverts were verified to reduce VM start time
with 8 passthrough H100 GPUs from 45 minutes back down to 5 minutes and to
eliminate the soft lockup warnings.

Reproduced using a libvirt VM, created with:
	$ sudo virt-install --connect qemu:///system -v --name gpu-pt-test \
		--memory 16384 --vcpus 16 --cpu host --cdrom \
		/ubuntu-24.04.1-live-server-amd64.iso --os-variant ubuntu24.04 \
		--disk size=512 -w bridge=virbr0 --graphics none \
		--console pty,target.type=virtio \
		--hostdev pci_0000_1b_00_0 --hostdev pci_0000_43_00_0 \
		--hostdev pci_0000_52_00_0 --hostdev pci_0000_61_00_0 \
		--hostdev pci_0000_9d_00_0 --hostdev pci_0000_d1_00_0 \
		--hostdev pci_0000_df_00_0 --hostdev pci_0000_c3_00_0
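
One way to confirm that the soft lockup warnings are gone is to count them in
the kernel log on the host and again inside the guest after boot. This is a
minimal sketch rather than the exact procedure used for the test above; the
helper name count_soft_lockups is hypothetical, and the match string assumes
the usual watchdog wording ("BUG: soft lockup - CPU#N stuck for Ns!").

```shell
# Count soft lockup reports in kernel log text read from stdin.
# grep -c exits non-zero when there are no matches, so "|| true" keeps
# the function usable under "set -e" while still printing 0.
count_soft_lockups() {
	grep -c 'BUG: soft lockup' || true
}

# Example usage, on the host and then inside the guest:
#   sudo dmesg | count_soft_lockups
```

A count of 0 on both host and guest after the VM has finished booting is the
expected result with the reverts applied.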

[Where problems could occur]

The reverts here primarily affect the vfio_pci driver, so a regression would
most likely show up as misbehavior of that driver. In Oracular, "mm: use rwsem
assertion macros for mmap_lock" is also reverted, which could result in mmap
locking bugs going undetected unless testing is done with lockdep enabled.

[1] https://patchwork.kernel.org/project/linux-mm/list/?series=883517

-- 
2.43.0



