ACK/Cmnt: [SRU][N/O][PATCH 0/1] Fix vfio_pci soft lockup on VM start while using PCIe passthrough

Koichiro Den koichiro.den at canonical.com
Wed Dec 11 02:56:24 UTC 2024


On Tue, Dec 10, 2024 at 09:23:16AM -0600, Jacob Martin wrote:
> BugLink: https://bugs.launchpad.net/bugs/2089306
> 
> SRU Justification
> 
> [Impact]
> 
> The patch "vfio/pci: Use unmap_mapping_range()" rewrote the way VFIO tracks
> mapped regions to use the "vmf_insert_pfn" function instead of tracking them
> itself and using "io_remap_pfn_range". The implementation using
> "vmf_insert_pfn" is significantly slower. To mitigate this slowdown, "vfio/pci:
> Insert full vma on mmap'd MMIO fault" was introduced to prefault the entirety
> of areas mapped by vfio_pci, resulting in soft lockup warnings on the host for
> large BAR region devices. Reverting this prefaulting behavior does not fully
> resolve the slowness, as a VM still experiences extremely slow accesses to the
> passthrough devices as VMAs get faulted in, causing soft lockup warnings in the
> guest during boot. Thus, "vfio/pci: Use unmap_mapping_range()" must also be
> reverted to restore performance to that of versions prior to 6.8.0-48-generic.
> 
> [Fix]
> 
> Both of these performance issues are resolved upstream by patchset [1], but
> this would be a complex backport to 6.8 and 6.11, with significant changes to
> core parts of the kernel.
> 
> Reverting the following commits resolves the issue, with a much reduced
> potential for regression:
> - "mm: use rwsem assertion macros for mmap_lock" (revert needed in Oracular,
>   not present in Noble)
> - "vfio/pci: Insert full vma on mmap'd MMIO fault"
> - "vfio/pci: Use unmap_mapping_range()"
> 
> [Test Plan]
> 
> Tested on a DGX H100 system, verified to reduce VM start time with 8
> passthrough H100 GPUs from 45 minutes back down to 5 minutes and eliminate the
> soft lockup warnings.
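One way to confirm that the soft lockup warnings are gone, on the host and in the guest, is to count the kernel watchdog's reports in the kernel log. The helper below is a minimal, hypothetical sketch (`count_soft_lockups` is not part of any existing tool). It assumes the standard watchdog message format, e.g. "watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [qemu-system-x86:4242]".

```sh
#!/bin/sh
# Hypothetical helper: count kernel soft lockup reports in a log read from stdin.
# Assumes the watchdog's usual format:
#   watchdog: BUG: soft lockup - CPU#<n> stuck for <n>s! [<comm>:<pid>]
count_soft_lockups() {
    # grep -c prints the number of matching lines ("0" if none), but exits
    # with status 1 on no match; "|| true" keeps the exit status at 0.
    grep -c 'BUG: soft lockup - CPU#' || true
}
```

For example, run `sudo dmesg | count_soft_lockups` on the host after starting the VM, and the same inside the guest once it has booted. A result of 0 in both places would be consistent with the outcome described in the test plan.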
> 
> Reproduced using a libvirt VM, created with:
> 	$ sudo virt-install --connect qemu:///system -v --name gpu-pt-test \
> 		--memory 16384 --vcpus 16 --cpu host --cdrom \
> 		/ubuntu-24.04.1-live-server-amd64.iso --os-variant ubuntu24.04 \
> 		--disk size=512 -w bridge=virbr0 --graphics none \
> 		--console pty,target.type=virtio \
> 		--hostdev pci_0000_1b_00_0 --hostdev pci_0000_43_00_0 \
> 		--hostdev pci_0000_52_00_0 --hostdev pci_0000_61_00_0 \
> 		--hostdev pci_0000_9d_00_0 --hostdev pci_0000_d1_00_0 \
> 		--hostdev pci_0000_df_00_0 --hostdev pci_0000_c3_00_0
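The `--hostdev` arguments above are libvirt node-device names of the form `pci_DOMAIN_BUS_SLOT_FUNCTION` (as listed by `virsh nodedev-list --cap pci`). When passing through many GPUs, a small helper can build them from PCI addresses as printed by `lspci -D`. This is a hypothetical sketch: `hostdev_args` is not a virt-install feature. It prints one word per line so that an unquoted command substitution splits the output into separate arguments.

```sh
#!/bin/sh
# Hypothetical helper: turn PCI addresses such as 0000:1b:00.0 into
# virt-install arguments of the form "--hostdev pci_0000_1b_00_0".
hostdev_args() {
    for addr in "$@"; do
        # Replace ':' and '.' with '_' and add the "pci_" prefix libvirt uses.
        printf '%s\n' --hostdev "pci_$(printf '%s' "$addr" | tr ':.' '__')"
    done
}
```

Usage, under the same assumptions: `virt-install ... $(hostdev_args 0000:1b:00.0 0000:43:00.0)` expands to `--hostdev pci_0000_1b_00_0 --hostdev pci_0000_43_00_0`.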
> 
> [Where problems could occur]
> 
> The reverts here primarily affect the vfio_pci driver. However, in Oracular
> "mm: use rwsem assertion macros for mmap_lock" is also reverted. This could
> result in misbehavior of the vfio_pci driver. In Oracular, it could also result
> in mmap locking bugs going undetected unless testing is done with lockdep
> enabled.
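Whether a test kernel has lockdep enabled can be read from its build configuration. The following is a minimal sketch. It assumes the config is installed as `/boot/config-<release>` (as Ubuntu does) and that `CONFIG_PROVE_LOCKING`, which selects `CONFIG_LOCKDEP`, is the option of interest. `lockdep_enabled` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given kernel config file
# enables lockdep's lock-correctness checking (CONFIG_PROVE_LOCKING=y).
lockdep_enabled() {
    grep -q '^CONFIG_PROVE_LOCKING=y$' "$1"
}
```

For example: `lockdep_enabled "/boot/config-$(uname -r)" && echo "lockdep enabled"`.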
> 
> [1] https://patchwork.kernel.org/project/linux-mm/list/?series=883517
> 
> 

Acked-by: Koichiro Den <koichiro.den at canonical.com>

I agree with not backporting the large changes introduced in 6.12+ to the
generic kernels, in the interest of stability and safety. This may also be of
interest to those involved with LP#2086668 or similar issues.


