[SRU][N/O][PATCH 0/1] PCI: Batch BAR sizing operations

Mitchell Augustin mitchell.augustin at canonical.com
Wed Feb 5 15:52:54 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2097389

SRU Justification:

[ Impact ]

VM guests that have large-BAR GPUs passed through to them
take roughly 2x as long to initialize those devices' BARs without
this patch.

[ Test Plan ]

I verified that this patch applies cleanly to the Noble kernel
at 6.8.0-53.55 and resolves the bug on DGX H100 and DGX A100. I
observed no regressions. This can be verified on any machine with
a GPU that has a sufficiently large BAR and that supports
passthrough to a VM using vfio.

ppa:mitchellaugustin/linux-generic-pci-redundancy-fix contains
the noble-generic kernel with this patch applied and can be
used to validate this patch.

To verify no regressions, I installed the kernel from that PPA
in the guest VM, then rebooted and confirmed that:
1. The measured PCI initialization time on boot was ~50% of the
time measured with the unmodified kernel
2. Relevant parts of /proc/iomem mappings, the PCI init section
of dmesg output, and lspci -vv output remained unchanged between
the system with the unmodified kernel and with the patched kernel
3. The Nvidia driver still successfully loaded and was shown via
nvidia-smi after the patch was applied

[ Fix ]

Roughly half of the time-consuming device configuration operations
invoked during the PCI probe function can be eliminated by
rearranging the memory and I/O disable/enable calls so that
they occur once per device rather than once per BAR. This is what the
upstream patch does, and it reliably eliminates roughly half of the
excess initialization time during VM boot.
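
As a rough illustration of the per-BAR versus per-device ordering
described above, here is a minimal user-space sketch in C. It is not
the upstream patch and it is not kernel code: struct sim_dev,
sim_set_decode(), sim_size_bar(), toggles_per_bar() and
toggles_batched() are hypothetical names invented for this example.
The model only counts how often memory/I/O would be disabled or
enabled, and it does not try to reproduce the timing figures above.

```c
/* Hypothetical user-space model, not kernel code: it only counts how many
 * times memory/I/O would be disabled or enabled while sizing a device's BARs. */
struct sim_dev {
    int num_bars;            /* number of BARs to size */
    unsigned decode_toggles; /* simulated disable/enable calls */
    unsigned bars_sized;     /* simulated BAR sizing sequences */
};

/* Stand-in for a configuration write that disables or enables memory/I/O. */
static void sim_set_decode(struct sim_dev *dev, int enable)
{
    (void)enable;
    dev->decode_toggles++;
}

/* Stand-in for sizing a single BAR. */
static void sim_size_bar(struct sim_dev *dev, int bar)
{
    (void)bar;
    dev->bars_sized++;
}

/* Per-BAR ordering: disable and re-enable around every BAR. */
unsigned toggles_per_bar(int num_bars)
{
    struct sim_dev dev = { .num_bars = num_bars };

    for (int i = 0; i < dev.num_bars; i++) {
        sim_set_decode(&dev, 0);
        sim_size_bar(&dev, i);
        sim_set_decode(&dev, 1);
    }
    return dev.decode_toggles;
}

/* Per-device ordering: disable once, size every BAR, enable once. */
unsigned toggles_batched(int num_bars)
{
    struct sim_dev dev = { .num_bars = num_bars };

    sim_set_decode(&dev, 0);
    for (int i = 0; i < dev.num_bars; i++)
        sim_size_bar(&dev, i);
    sim_set_decode(&dev, 1);
    return dev.decode_toggles;
}
```

For a hypothetical device with six BARs, toggles_per_bar(6) returns 12
and toggles_batched(6) returns 2. How much wall-clock time that saves
on a real guest depends on the device and the hypervisor.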

[ Where problems could occur ]

I do not expect any regressions. The only callers of ABIs changed
by this patch are also adjusted within this patch, and the functional
change only removes entirely redundant calls to disable/enable PCI
memory/IO. With that said, the main altered function is the PCI
probe function, which is heavily used across Ubuntu deployments, so
we should watch for any user reports regarding PCI device
initialization in case they are related.

[ Additional Context ]

Upstream patch: https://lore.kernel.org/all/20250111210652.402845-1-alex.williamson@redhat.com/
Upstream bug report: https://lore.kernel.org/all/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com/



Alex Williamson (1):
  PCI: Batch BAR sizing operations

 drivers/pci/iov.c   |  8 +++-
 drivers/pci/pci.h   |  4 +-
 drivers/pci/probe.c | 93 +++++++++++++++++++++++++++++++++------------
 3 files changed, 78 insertions(+), 27 deletions(-)

-- 
2.43.0



