[SRU][P/N][PATCH 0/2] [UBUNTU 24.04] s390/pci: Fix immediate re-add of PCI function after remove (LP: #2114174)
Massimiliano Pellizzer
massimiliano.pellizzer at canonical.com
Thu Jun 12 17:42:51 UTC 2025
BugLink: https://bugs.launchpad.net/bugs/2114174
[ Impact ]
s390/pci: Fix immediate re-add of PCI function after remove
A PCI function may be reserved directly after being
deconfigured. If it subsequently returns in the standby state,
Linux may not be able to use the new instance and generates
a kernel warning about trying to create an already existing
sysfs file for the IOMMU.
The problem occurs because the new instance of the same
underlying device is created before the prior instance is
completely torn down. This happens because the lifetime of the
PCI device representation in Linux is determined by reference
counts. A driver, the network stack, or even user-space
(including via vfio-pci) may be holding onto the device
representation even after the underlying device is gone.
The solution to this is twofold. Firstly, allow re-using the
pre-existing struct zpci_dev and/or struct pci_dev for the newly
re-added instance of the underlying device up until the point
where the struct zpci_dev is fully removed. Secondly, serialize
the addition and removal of PCI functions such that re-adding
a new instance, after the old one is already being removed, will
wait for the removal to finish before adding the new instance.
This fix also builds on prior upstream work of serializing state
transitions for PCI devices, e.g. from configured to standby.
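The two parts of the solution can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
implementation: struct fn_dev, the FN_* states, add_remove_lock and all
fn_*() helpers are hypothetical names invented for this example, and a
pthread mutex stands in for whatever locking the backported patches use.
An entry that is reserved but still referenced stays in the registry, so
an immediate re-add reuses it instead of creating a second instance, and
every add or remove runs under one lock so a re-add waits for a removal
in progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum fn_state { FN_STANDBY, FN_RESERVED };

struct fn_dev {
	uint32_t fid;		/* function ID, used as the lookup key */
	enum fn_state state;
	int refs;		/* only modified under add_remove_lock */
	struct fn_dev *next;
};

/* One lock serializes every addition and removal. */
static pthread_mutex_t add_remove_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fn_dev *registry;

static struct fn_dev *find_locked(uint32_t fid)
{
	struct fn_dev *d;

	for (d = registry; d; d = d->next)
		if (d->fid == fid)
			return d;
	return NULL;
}

/* Drop one reference; unlink and free on the last one. Returns 1 if freed. */
static int put_locked(struct fn_dev *d)
{
	struct fn_dev **pp;

	if (--d->refs > 0)
		return 0;
	for (pp = &registry; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			break;
		}
	}
	free(d);
	return 1;
}

/* Add a function. A reserved, not yet freed entry is reused. */
struct fn_dev *fn_add(uint32_t fid)
{
	struct fn_dev *d;

	pthread_mutex_lock(&add_remove_lock);
	d = find_locked(fid);
	if (d && d->state == FN_RESERVED) {
		d->state = FN_STANDBY;	/* old instance lives on: reuse it */
		d->refs++;		/* the registry's own reference */
	} else if (!d) {
		d = calloc(1, sizeof(*d));
		if (d) {
			d->fid = fid;
			d->state = FN_STANDBY;
			d->refs = 1;
			d->next = registry;
			registry = d;
		}
	}
	/* An entry already in standby is returned unchanged. */
	pthread_mutex_unlock(&add_remove_lock);
	return d;
}

/* Extra reference, e.g. held by a driver or by user space. */
void fn_get(struct fn_dev *d)
{
	pthread_mutex_lock(&add_remove_lock);
	d->refs++;
	pthread_mutex_unlock(&add_remove_lock);
}

int fn_put(struct fn_dev *d)
{
	int freed;

	pthread_mutex_lock(&add_remove_lock);
	freed = put_locked(d);
	pthread_mutex_unlock(&add_remove_lock);
	return freed;
}

/* Mark the function reserved and drop the registry's reference. */
int fn_reserve(struct fn_dev *d)
{
	int freed;

	pthread_mutex_lock(&add_remove_lock);
	d->state = FN_RESERVED;
	freed = put_locked(d);
	pthread_mutex_unlock(&add_remove_lock);
	return freed;
}

/* Remove followed by an immediate re-add while a holder still exists. */
int demo_readd(void)
{
	struct fn_dev *a = fn_add(0x42), *b;
	int reused;

	if (!a)
		return -1;
	fn_get(a);		/* a holder keeps the entry alive: refs == 2 */
	fn_reserve(a);		/* reserved, but not freed: refs == 1 */
	assert(a->refs == 1);
	b = fn_add(0x42);	/* immediate re-add finds the old entry */
	reused = (a == b);
	fn_put(b);		/* holder lets go: refs == 1 */
	fn_reserve(b);		/* last reference dropped: entry is freed */
	return reused;
}
```

In this model demo_readd() returns 1 because the second fn_add() hands
back the entry that was still referenced. The model takes the lock for
the whole of each operation, which is the simplest way to make a re-add
wait for a running removal. The real patches may scope their locking
differently.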
[ Fix ]
Backport from mainline:
- 0d48566d4b58 s390/pci: rename lock member in struct zpci_dev
- bcb5d6c76903 s390/pci: introduce lock to synchronize state of zpci_dev's
- 6ee600bfbe0f s390/pci: remove hotplug slot when releasing the device
- c4a585e952ca s390/pci: Fix potential double remove of hotplug slot
- 42420c50c68f s390/pci: Fix missing check for zpci_create_device() error return
- 05a2538f2b48 s390/pci: Fix duplicate pci_dev_put() in disable_slot() when PF
  has child VFs
- d76f96332967 s390/pci: Remove redundant bus removal and disable from
  zpci_release_device()
- 47c397844869 s390/pci: Prevent self deletion in disable_slot()
- 4b1815a52d7e s390/pci: Allow re-add of a reserved but not yet removed device
- 774a1fa880bc s390/pci: Serialize device addition and removal
[ Test Plan ]
Compile tested only.
The fix will be tested further by IBM following the procedure below.
The issue can be reproduced by observing how the kernel handles NETH PCI
functions. IBM Z firmware temporarily reserves NETH PCI functions to check for
pending service when the last FID of a PCHID is deconfigured. When nothing is
pending, the PCI function is immediately returned in the standby state, which
triggers this issue quite reliably.
[ Where Problems Could Occur ]
The fix affects the PCI function lifecycle management in the s390 PCI hotplug
infrastructure, specifically the serialization and reuse logic of zpci_dev and
pci_dev structures during rapid remove and re-add cycles. A regression in this
fix could leave stale or incorrectly reused device state behind, leading to
improper reinitialization of PCI functions.