[SRU][Q/R][PATCH 0/1] Fix graceful fault handling after FPU softirq changes causes hard freeze on EFI runtime calls
Ivan Hu
ivan.hu at canonical.com
Fri May 22 07:07:45 UTC 2026
BugLink: https://bugs.launchpad.net/bugs/2153976
[Impact]
Running `sudo fwts uefirttime` on HP systems (CID: 202511-38089, 202511-38088,
02511-38091, 202511-38068, 202511-38069) with 6.17 causes a hard system freeze.
The machine becomes unreachable by ping/SSH and requires a hard power cycle.
Root cause: commit d02198550423 ("x86/fpu: Improve crypto performance by making
kernel-mode FPU reliably usable in softirqs") changed kernel_fpu_begin() to use
local_bh_disable() instead of preempt_disable(). This raises the softirq count
in preempt_count (by SOFTIRQ_DISABLE_OFFSET) during EFI runtime calls, making
in_interrupt() return true in normal task context. The EFI graceful page fault
handler efi_crash_gracefully_on_page_fault() uses in_interrupt() to bail out for
faults in real interrupt context. With the softirq count now non-zero, the
handler always bails out, leaving firmware page faults unhandled. The fault then
escalates to die(), which also sees in_interrupt() as true and calls
panic("Fatal exception in interrupt"), freezing the system.
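
The preempt_count arithmetic behind this can be modelled in user space. The
sketch below is not kernel code: the bit layout is assumed to match
include/linux/preempt.h on a non-PREEMPT_RT build (8 preempt bits, 8 softirq
bits, 4 hardirq bits, 4 NMI bits), and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count layout (non-PREEMPT_RT); not taken from the patch. */
#define SOFTIRQ_MASK           (0xffu << 8)
#define HARDIRQ_MASK           (0xfu << 16)
#define NMI_MASK               (0xfu << 20)
#define SOFTIRQ_OFFSET         (1u << 8)
#define SOFTIRQ_DISABLE_OFFSET (2u * SOFTIRQ_OFFSET)

/* Model of in_interrupt(): any NMI, hardirq or softirq bit is set. */
bool model_in_interrupt(unsigned int pc)
{
	return (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/* Model of preempt_disable(): bumps only the low preempt bits. */
unsigned int model_preempt_disable(unsigned int pc)
{
	return pc + 1;
}

/* Model of local_bh_disable(): bumps the softirq field. */
unsigned int model_local_bh_disable(unsigned int pc)
{
	return pc + SOFTIRQ_DISABLE_OFFSET;
}

/* Task context before and after the FPU change, as seen by in_interrupt(). */
void model_root_cause_check(void)
{
	unsigned int task = 0;

	assert(!model_in_interrupt(model_preempt_disable(task)));
	assert(model_in_interrupt(model_local_bh_disable(task)));
}
```

In this model, a task that only disabled preemption is not reported as being
in interrupt context, while the same task after local_bh_disable() is, which
is the false positive described above.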
[Fix]
Replace in_interrupt() with !in_task() in efi_crash_gracefully_on_page_fault().
This preserves the original intent of bailing for interrupts or NMI faults,
while no longer falsely triggering from the FPU code path's local_bh_disable().
[Test Plan]
1. Boot an affected HP machine with the patched 6.17-oem kernel.
2. Run:
   $ sudo fwts uefirttime
Without patch: system hard-freezes, requires power cycle.
With patch: fwts completes (pass or fail), system remains responsive.
[Where problems could occur]
The change is in the EFI page fault handler (arch/x86/platform/efi/quirks.c).
If !in_task() incorrectly identifies a real interrupt-context fault as task
context, the handler would try to process it as an EFI firmware fault instead
of letting the normal oops path handle it. This could mask real kernel bugs
during interrupt-context EFI faults, though such faults are extremely rare.
Also, a softirq taken between efi_rts_work.efi_rts_id assignment and the
fpregs_lock() call could cause a page fault that gets misidentified. The use
of !in_task() (which incorporates in_serving_softirq()) handles this window
correctly.
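
The difference between the two guards across contexts can be sketched with a
small user-space model. This is an illustration only: the constants are
assumed to mirror include/linux/preempt.h on a non-PREEMPT_RT build, and
old_guard_bails(), new_guard_bails() and guard_model_check() are hypothetical
names, not functions from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count layout (non-PREEMPT_RT); not taken from the patch. */
#define SOFTIRQ_MASK           (0xffu << 8)
#define HARDIRQ_MASK           (0xfu << 16)
#define NMI_MASK               (0xfu << 20)
#define SOFTIRQ_OFFSET         (1u << 8)
#define SOFTIRQ_DISABLE_OFFSET (2u * SOFTIRQ_OFFSET)
#define HARDIRQ_OFFSET         (1u << 16)
#define NMI_OFFSET             (1u << 20)

/* Old guard: bail out whenever the in_interrupt() model is true. */
bool old_guard_bails(unsigned int pc)
{
	return (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK)) != 0;
}

/*
 * New guard: bail out when the !in_task() model is true, that is, in NMI,
 * hardirq, or while actually serving a softirq (SOFTIRQ_OFFSET bit set).
 */
bool new_guard_bails(unsigned int pc)
{
	return (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)) != 0;
}

void guard_model_check(void)
{
	/* Task context with BH disabled by the FPU path: only the old guard bails. */
	assert(old_guard_bails(SOFTIRQ_DISABLE_OFFSET));
	assert(!new_guard_bails(SOFTIRQ_DISABLE_OFFSET));

	/* Softirq taken before fpregs_lock(): both guards bail. */
	assert(old_guard_bails(SOFTIRQ_OFFSET));
	assert(new_guard_bails(SOFTIRQ_OFFSET));

	/* Hardirq and NMI: both guards bail. */
	assert(new_guard_bails(HARDIRQ_OFFSET));
	assert(new_guard_bails(NMI_OFFSET));
}
```

Under these assumptions the two guards disagree only for task context with
bottom halves disabled, which is the case the fix is meant to change.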
Ivan Hu (1):
  x86/efi: Fix graceful fault handling after FPU softirq changes

 arch/x86/platform/efi/quirks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--
2.53.0