ACK/Cmnt: [SRU][J][PATCH 0/1] PCI: Batch BAR sizing operations

Mitchell Augustin mitchell.augustin at canonical.com
Thu Apr 17 16:57:21 UTC 2025


Hi Kuba,

We did not include the "PCI: Fix BUILD_BUG_ON usage for old gcc" patch
in our SRU requests because the bug it fixes has only been reported to
affect GCC 5.3.1, which is several versions older than any GCC we support
in Jammy. Since Jammy's default GCC compiles the Jammy kernel with
"PCI: Batch BAR sizing operations" without any issues, I felt there was
no justification for pulling it into the Jammy kernel.

Please let me know if you or anyone else on the team would prefer me
to submit it for review as well, and I can get that prepared.

-Mitchell Augustin



On Wed, Apr 16, 2025 at 4:27 PM Kuba Pawlak <kuba.pawlak at canonical.com> wrote:
>
> On 14.04.2025 17:15, Keifer Snedeker wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2097389
> >
> > SRU Justification:
> >
> > [ Impact ]
> >
> > Without this patch, VM guests that have large-BAR GPUs passed
> > through to them take twice as long to initialize those devices'
> > BARs.
> >
> > [ Test Plan ]
> >
> > I verified that this patch applies cleanly to the Jammy kernel
> > at 5.15.0-138.148 and resolves the bug on DGX H100 and DGX A100.
> > I observed no regressions. This can be verified on any machine
> > that has a GPU with a sufficiently large BAR and supports passing
> > it through to a VM using VFIO.
> >
> > The PPA ppa:ks0/jammy-pci-probe-patch contains the jammy-generic
> > kernel with this patch applied and can be used to validate it.
> >
> > To verify that there were no regressions, I installed the kernel
> > from that PPA in the guest VM, rebooted, and confirmed that:
> > 1. The measured PCI initialization time at boot was ~50% of that
> > of the unmodified kernel
> > 2. The relevant parts of the /proc/iomem mappings, the PCI init
> > section of the dmesg output, and the lspci -vv output were unchanged
> > between the unmodified kernel and the patched kernel
> > 3. The NVIDIA driver still loaded successfully after the patch was
> > applied, as shown by nvidia-smi
> >
> > [ Fix ]
> >
> > Roughly half of the time-consuming device configuration operations
> > invoked during the PCI probe function can be eliminated by
> > rearranging the memory and I/O disable/enable calls so that they
> > occur once per device rather than once per BAR. This is what the
> > upstream patch does, and it reliably eliminates roughly half of the
> > excess initialization time during VM boot.
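The per-BAR versus per-device difference described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the upstream patch: `struct model_dev`, the `model_*` helpers, and the `MODEL_CMD_*` bits are hypothetical stand-ins for a device's command register and BARs, each BAR is treated as an independent 32-bit memory BAR, and the model only counts command-register writes rather than measuring time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of one PCI function; not kernel code.
 * Each BAR is treated as an independent 32-bit memory BAR. */
#define MODEL_NUM_BARS   6
#define MODEL_CMD_IO     0x1u   /* I/O space decode enable bit */
#define MODEL_CMD_MEMORY 0x2u   /* memory space decode enable bit */
#define MODEL_CMD_DECODE (MODEL_CMD_IO | MODEL_CMD_MEMORY)

struct model_dev {
    uint16_t command;                  /* simulated command register */
    unsigned int cmd_writes;           /* command-register writes issued */
    uint32_t bar[MODEL_NUM_BARS];      /* current BAR contents */
    uint32_t bar_size[MODEL_NUM_BARS]; /* power-of-two size each BAR decodes */
};

static void model_write_command(struct model_dev *d, uint16_t val)
{
    d->command = val;
    d->cmd_writes++;
}

/* Size one BAR: write all-ones, read back, restore the original value. */
static uint32_t model_size_one_bar(struct model_dev *d, int i)
{
    uint32_t size = d->bar_size[i];
    uint32_t orig = d->bar[i];

    assert(size >= 16 && (size & (size - 1)) == 0); /* model sanity check */
    assert(!(d->command & MODEL_CMD_DECODE));       /* decode must be off */

    d->bar[i] = ~(size - 1);      /* writable address bits latch 1s */
    uint32_t readback = d->bar[i];
    d->bar[i] = orig;             /* restore the programmed address */
    return ~readback + 1;         /* recover the decoded size */
}

/* Old pattern: disable and re-enable decode around every BAR. */
static void model_size_bars_per_bar(struct model_dev *d, int nbars,
                                    uint32_t *sizes)
{
    for (int i = 0; i < nbars; i++) {
        uint16_t orig_cmd = d->command;
        if (orig_cmd & MODEL_CMD_DECODE)
            model_write_command(d, (uint16_t)(orig_cmd & ~MODEL_CMD_DECODE));
        sizes[i] = model_size_one_bar(d, i);
        if (orig_cmd & MODEL_CMD_DECODE)
            model_write_command(d, orig_cmd);
    }
}

/* Batched pattern: disable decode once, size every BAR, restore once. */
static void model_size_bars_batched(struct model_dev *d, int nbars,
                                    uint32_t *sizes)
{
    uint16_t orig_cmd = d->command;
    if (orig_cmd & MODEL_CMD_DECODE)
        model_write_command(d, (uint16_t)(orig_cmd & ~MODEL_CMD_DECODE));
    for (int i = 0; i < nbars; i++)
        sizes[i] = model_size_one_bar(d, i);
    if (orig_cmd & MODEL_CMD_DECODE)
        model_write_command(d, orig_cmd);
}
```

With decode enabled and six BARs, the per-BAR pattern issues 12 command-register writes and the batched pattern issues 2, while both report the same BAR sizes and leave the command register as they found it. How much wall-clock time that saves in a guest depends on how costly each write is for the hypervisor, which this model does not capture.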
> >
> > [ Where problems could occur ]
> >
> > I do not expect any regressions. The only callers of interfaces
> > changed by this patch are also adjusted within this patch, and the
> > functional change only removes entirely redundant calls to
> > disable/enable PCI memory/IO decoding. That said, the main function
> > altered is the PCI probe function, which is heavily used across
> > Ubuntu deployments, so we should watch for any user reports about
> > PCI device initialization in case they are related.
> >
> > [ Additional Context ]
> >
> > Upstream patch: https://lore.kernel.org/all/20250111210652.402845-1-alex.williamson@redhat.com/
> > Upstream bug report: https://lore.kernel.org/all/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com/
> > SRU request for this patch in Noble & Oracular (approved): https://lists.ubuntu.com/archives/kernel-team/2025-February/156788.html
> >
> >
> >
> > Alex Williamson (1):
> >    PCI: Batch BAR sizing operations
> >
> >   drivers/pci/iov.c   |  8 +++-
> >   drivers/pci/pci.h   |  4 +-
> >   drivers/pci/probe.c | 93 +++++++++++++++++++++++++++++++++------------
> >   3 files changed, 78 insertions(+), 27 deletions(-)
> >
>
> There is a follow-up commit for this one:
>
> commit 472ff48e2c09e49f2f90eeb6922f747306559506
> Author: Alex Williamson <alex.williamson at redhat.com>
> Date:   Wed Feb 12 11:53:32 2025 -0700
>
>      PCI: Fix BUILD_BUG_ON usage for old gcc
>
>      As reported in the below link, it seems older versions of gcc cannot
>      determine that the howmany variable is known for all callers. Include
>      a test so that newer compilers can enforce this sanity check and older
>      compilers can still work.  Add __always_inline attribute to give the
>      compiler an even better chance to know the inputs.
>
>      Link: https://lore.kernel.org/r/20250212185337.293023-1-alex.williamson@redhat.com
>      Fixes: 4453f360862e ("PCI: Batch BAR sizing operations")
>      Reported-by: Oleg Nesterov <oleg at redhat.com>
>      Link: https://lore.kernel.org/all/20250209154512.GA18688@redhat.com
>      Signed-off-by: Alex Williamson <alex.williamson at redhat.com>
>      Signed-off-by: Bjorn Helgaas <bhelgaas at google.com>
>      Tested-by: Oleg Nesterov <oleg at redhat.com>
>      Tested-by: Mitchell Augustin <mitchell.augustin at canonical.com>
>
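The guarded check the commit message describes can be sketched as follows. This assumes GCC or Clang's `__builtin_constant_p` builtin; `MODEL_STATICALLY_TRUE`, `MODEL_HOWMANY_CHECK_FIRES`, and `MODEL_NUM_BARS` are hypothetical names for illustration, and this is not a copy of the upstream diff. The guarded condition is true only when the compiler can prove it at compile time, so a BUILD_BUG_ON-style check built on it still rejects a provably bad constant but drops out, instead of breaking the build, when a compiler cannot work out the value.

```c
/* Hypothetical stand-in for the standard BAR count the check compares to. */
#define MODEL_NUM_BARS 6

/*
 * True only when the compiler can prove `cond` at compile time.
 * If the value is unknown to the compiler (for example, an older gcc that
 * does not see the caller's constant), __builtin_constant_p() yields 0,
 * `cond` is never evaluated, and the whole expression is false.
 */
#define MODEL_STATICALLY_TRUE(cond) (__builtin_constant_p(cond) && (cond))

/* Hypothetical helper: would a guarded "howmany > MODEL_NUM_BARS" check fire? */
#define MODEL_HOWMANY_CHECK_FIRES(howmany) \
    MODEL_STATICALLY_TRUE((howmany) > MODEL_NUM_BARS)
```

For a literal 7 the guard is true, so a check built on it would reject the value; for a literal 6 it is false; for a value the compiler cannot see (such as a `volatile` variable) it is also false, so the check is skipped rather than failing to build. The commit's `__always_inline` change, which helps the caller's constant reach the check, is not modeled here.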
>
> I don't know which "older version of gcc" this refers to, but consider
> adding that patch to this review.
>
>
> Acked-by: Kuba Pawlak <kuba.pawlak at canonical.com>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team



--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering


