[SRU][J:linux-bluefield][PATCH v1 0/1] UBUNTU: SAUCE: ipmb_host: IPMI panic event causes hang
Chris Babroski
cbabroski at nvidia.com
Tue May 13 12:36:44 UTC 2025
BugLink: https://bugs.launchpad.net/bugs/2110498
SRU Justification:
[Impact]
When the kernel config options CONFIG_IPMI_PANIC_EVENT or
CONFIG_IPMI_PANIC_STRING are enabled, the ipmi_msghandler driver attempts to
send the kernel panic event to the BMC to be recorded in the SEL. On
BlueField this causes a hang, which can block kdump from running and prevent
the system from rebooting after a kernel panic.
[Fix]
The ipmi_msghandler driver requires the ipmi_smi_handlers "poll" and
"set_run_to_completion" callbacks to be implemented in order to send the
panic event. If those callbacks are not registered, panic eventing is
skipped.
In the BlueField ipmb_host driver these callbacks are registered but do
nothing. When the IPMI panic handler runs, it sends the panic request and
then polls while waiting for the operation to complete. Because the poll
handler is not fully implemented, the request never completes and the wait
becomes an infinite loop.
The fix is to remove the unimplemented ipmi_smi_handlers callbacks, because
IPMI panic eventing is not supported on BlueField.
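
The behaviour described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel or
ipmb_host code: every name in it (model_smi_handlers, model_intf,
model_send_panic_event, the max_polls bound) is hypothetical, and the bound
exists only so the example terminates. The hang described above has no such
bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_intf;

/* Hypothetical, simplified stand-in for the kernel's ipmi_smi_handlers. */
struct model_smi_handlers {
    void (*sender)(struct model_intf *intf);
    void (*poll)(struct model_intf *intf);
    void (*set_run_to_completion)(struct model_intf *intf, bool on);
};

struct model_intf {
    const struct model_smi_handlers *handlers;
    bool response_received;   /* set once the request has completed */
    unsigned long poll_calls; /* how many times poll() was invoked */
};

/* Queues the request; in this model nothing ever completes it. */
static void model_sender(struct model_intf *intf) { (void)intf; }

/* Registered but empty callbacks, as described for ipmb_host before the fix. */
static void stub_poll(struct model_intf *intf) { (void)intf; }
static void stub_set_run_to_completion(struct model_intf *intf, bool on)
{
    (void)intf;
    (void)on;
}

static const struct model_smi_handlers handlers_before_fix = {
    .sender = model_sender,
    .poll = stub_poll,
    .set_run_to_completion = stub_set_run_to_completion,
};

/* After the fix the two callbacks are simply left unset (NULL). */
static const struct model_smi_handlers handlers_after_fix = {
    .sender = model_sender,
};

enum panic_result { PANIC_EVENT_SKIPPED, PANIC_EVENT_SENT, PANIC_EVENT_STUCK };

/* Model of the panic path: skip if callbacks are missing, else send and poll. */
static enum panic_result model_send_panic_event(struct model_intf *intf,
                                                unsigned long max_polls)
{
    const struct model_smi_handlers *h = intf->handlers;

    assert(h != NULL && h->sender != NULL);

    if (h->poll == NULL || h->set_run_to_completion == NULL)
        return PANIC_EVENT_SKIPPED;

    h->set_run_to_completion(intf, true);
    h->sender(intf);

    while (!intf->response_received) {
        if (intf->poll_calls >= max_polls)
            return PANIC_EVENT_STUCK; /* demo-only escape hatch */
        h->poll(intf);
        intf->poll_calls++;
    }
    return PANIC_EVENT_SENT;
}
```

With handlers_before_fix the model polls until the demo bound is reached and
reports PANIC_EVENT_STUCK. With handlers_after_fix it returns
PANIC_EVENT_SKIPPED without polling at all, which is the intended outcome of
the patch.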
[Test Case]
* Boot the image on a BF3 platform with the updated ipmb_host driver.
* Verify no error messages or failures when loading the ipmb_host driver.
* Verify IPMI communication with the BMC using "ipmitool mc info"
and "ipmitool lan print".
* Enable the NMI watchdog, trigger a CPU hard lockup, and verify that BF3
saves crash information and reboots automatically instead of hanging.
[Regression Potential]
Low potential for regression, because the removed callback functions did
nothing and the ipmi_msghandler driver checks whether the callbacks are NULL
before calling them.