APPLIED: [SRU][P/N][PATCH 0/2] [UBUNTU 24.04] s390/pci: Don't abort recovery for user-space drivers (LP: #2121150)

Stefan Bader stefan.bader at canonical.com
Fri Sep 12 11:40:33 UTC 2025


On 29/08/2025 13:33, Massimiliano Pellizzer wrote:
> BugLink: https://bugs.launchpad.net/bugs/2121150
> 
> [ Impact ]
> 
> s390/pci: Don't abort recovery for user-space drivers
> 
> When a PCI device under the control of a vfio-pci based user-space driver
> encounters a PCI error event, the subsequent error recovery flow in the kernel
> is aborted because the vfio-pci driver only implements the error_detected()
> PCI error handler callback. This leaves the PCI device in the error state,
> requiring the driver to be unbound and re-bound to make it operational again,
> instead of only having to re-initialize the user-space driver.
> 
> According to the kernel documentation, implementing only the error_detected()
> callback from the error handling operations should be enough for minimal
> recovery support. Contrary to this, s390 has so far also required the
> slot_reset() and resume() callbacks to be implemented; otherwise recovery
> would be aborted.
> 
> Remove the requirement for the additional operations, bringing s390 in line
> with the AER and EEH error recovery flows.
> 
> [ Fix ]
> 
> Backport the following commit from upstream:
> - 62355f1f87b8 s390/pci: Allow automatic recovery with minimal driver support
> 
> [ Test Plan ]
> 
> Bind a PCI device to vfio-pci.
> Start a user-space workload using the device.
> Use the s390 PCI error injection interface to trigger a recoverable PCI error.
> Observe kernel logs (dmesg) and confirm that the vfio-pci driver’s
> error_detected() callback is invoked and recovery proceeds without abort.
> After recovery, check that the device is functional again in the guest or
> user-space application without requiring manual unbind/rebind.
> 
> [ Regression Potential ]
> 
> The fix affects how the s390 PCI error handler interprets missing callbacks and
> the PCI_ERS_RESULT_NONE return code.
> A bug here could cause the recovery flow to proceed when it should have aborted,
> or to treat driver abstention as successful recovery even in faulty situations.
> Users may see PCI devices reported as recovered but remaining non-functional,
> recovery loops that repeatedly attempt to re-enable or reset devices, or devices
> silently failing I/O without triggering the expected operator intervention.
> 
> 


Applied to plucky,noble:linux/master-next. Thanks.

-Stefan
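
For readers skimming the archive, the change described under [ Impact ] can be
illustrated with a small stand-alone C model. This is only a sketch under
stated assumptions, not the code from the upstream commit: the struct and the
function names (model_err_handlers, recovery_allowed_strict,
recovery_allowed_minimal, sample_error_detected, vfio_like_handlers) are
hypothetical, and the callback signatures are simplified compared to the
kernel's error handler table.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified, hypothetical model of a driver's PCI error handler table.
 * Only the three callbacks named in the cover letter are represented;
 * this is not the kernel's struct pci_error_handlers.
 */
struct model_err_handlers {
    int  (*error_detected)(void);
    int  (*slot_reset)(void);
    void (*resume)(void);
};

/* Behaviour before the fix, as described: all three callbacks required. */
static bool recovery_allowed_strict(const struct model_err_handlers *h)
{
    return h && h->error_detected && h->slot_reset && h->resume;
}

/* Behaviour after the fix, as described: error_detected() alone suffices. */
static bool recovery_allowed_minimal(const struct model_err_handlers *h)
{
    return h && h->error_detected;
}

/* Placeholder callback; its return value is not interpreted by this model. */
static int sample_error_detected(void)
{
    return 0;
}

/* A vfio-pci-like driver in this model: only error_detected() is provided. */
static const struct model_err_handlers vfio_like_handlers = {
    .error_detected = sample_error_detected,
};
```

In this model, recovery_allowed_strict(&vfio_like_handlers) is false while
recovery_allowed_minimal(&vfio_like_handlers) is true, which mirrors the
difference between aborting recovery and letting it proceed for a driver that
implements only error_detected().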


