[SRU][J:linux-bluefield][PATCH v1 0/7] arm64/sme: Implement ZA context switching
Stav Aviram
saviram at nvidia.com
Thu Aug 7 13:33:18 UTC 2025
BugLink: https://bugs.launchpad.net/bugs/2119457
SRU Justification:
[IMPACT]
On Bluefield-2 and Bluefield-3 embedded ARM cores running Ubuntu 22.04
Jammy (linux-bluefield-5.15), ptp4l intermittently loses synchronization
during long-running operation (after roughly 24 hours), logging:
"ptp4l[3416283.946]: port 1: SLAVE to FAULTY on FAULT_DETECTED (FT_UNSPECIFIED)"
Debugging traces show that the failure occurs in the sendto() system
call when ptp4l attempts to send DelayReq messages: the call returns
error code -6 (ENXIO, "No such device or address").
The root cause is corruption of FPSIMD register state across context
switches. ARM64 kernel code that uses NEON/FPSIMD instructions for
network operations, cryptographic functions, or other
performance-critical tasks can lose its register state when preempted,
because the current kernel does not preserve kernel mode FPSIMD state
across context switches. The corruption shows up as unpredictable
behavior in subsequent operations, including network socket calls, and
leads to the observed sendto() failures that break PTP synchronization
on Bluefield hardware.
[FIX]
Cherry-pick and backport seven upstream patches centered on the core
fix for the FPSIMD register corruption.
The backport required some adaptation to drop SME dependencies, since
SME is not supported in linux-bluefield-5.15, while preserving all core
kernel NEON functionality.
Patches [PATCH v1 1/7] - [PATCH v1 2/7] provide the necessary
infrastructure, [PATCH v1 3/7] - [PATCH v1 5/7] form the core functional
series with [PATCH v1 4/7] as the primary fix, and [PATCH v1 6/7] -
[PATCH v1 7/7] address issues discovered in the core implementation:
**Core fix:**
[PATCH v1 4/7]: This is the primary patch. It fixes the FPSIMD register
corruption by introducing the TIF_KERNEL_FPSTATE thread flag and
per-thread kernel_fpsimd_state storage, so that kernel mode FPSIMD
register state is preserved and restored across context switches.
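To illustrate the idea behind the core fix, here is a minimal userspace
model in C. It is not kernel code and not the upstream implementation:
it assumes a toy register file of four 64-bit values, and
struct model_task, model_context_switch() and
model_preempted_neon_survives() are hypothetical names. The
kernel_fpstate field and kernel_fpsimd_state array only stand in for
TIF_KERNEL_FPSTATE and kernel_fpsimd_state. The preserve_kernel_state
argument compares behavior without and with the save/restore step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Userspace model only; all names are hypothetical stand-ins.
 * NREGS is a toy size (the real FPSIMD file is 32 x 128-bit V registers).
 */
#define NREGS 4

struct model_task {
	bool kernel_fpstate;                 /* stands in for TIF_KERNEL_FPSTATE */
	uint64_t kernel_fpsimd_state[NREGS]; /* stands in for kernel_fpsimd_state */
};

/* Registers of the single modeled CPU, shared by whichever task runs. */
static uint64_t cpu_regs[NREGS];

/*
 * Switch from prev to next. When preserve_kernel_state is true, a task
 * flagged as using FPSIMD in kernel mode has its registers saved on
 * switch-out and restored on switch-in; when false, nothing is saved.
 */
static void model_context_switch(struct model_task *prev,
				 struct model_task *next,
				 bool preserve_kernel_state)
{
	assert(prev != next);
	if (preserve_kernel_state && prev->kernel_fpstate)
		memcpy(prev->kernel_fpsimd_state, cpu_regs, sizeof(cpu_regs));
	if (preserve_kernel_state && next->kernel_fpstate)
		memcpy(cpu_regs, next->kernel_fpsimd_state, sizeof(cpu_regs));
}

/* Task A is preempted mid-way through kernel mode NEON work by task B. */
static bool model_preempted_neon_survives(bool preserve_kernel_state)
{
	struct model_task a = { .kernel_fpstate = true };
	struct model_task b = { .kernel_fpstate = false };
	int i;

	for (i = 0; i < NREGS; i++)
		cpu_regs[i] = 0xA0 + i;  /* A's intermediate values */

	model_context_switch(&a, &b, preserve_kernel_state);
	for (i = 0; i < NREGS; i++)
		cpu_regs[i] = 0xB0 + i;  /* B overwrites the registers */
	model_context_switch(&b, &a, preserve_kernel_state);

	/* Does A find its own values again? */
	for (i = 0; i < NREGS; i++)
		if (cpu_regs[i] != (uint64_t)(0xA0 + i))
			return false;
	return true;
}
```

With preserve_kernel_state set to false, task B's writes are still in
the registers when task A resumes, so model_preempted_neon_survives(false)
returns false; with it set to true, A's values are saved on switch-out
and restored on switch-in, so the call returns true. The model leaves
out user mode FPSIMD state, which the real kernel tracks separately
(for example via TIF_FOREIGN_FPSTATE).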
**Core fix series:**
[PATCH v1 3/7]: Removes the fpsimd_context_busy flag and its associated
infrastructure, which would otherwise block the core change.
[PATCH v1 5/7]: Optimizes the core fix with lazy restore, using CPU
tracking to skip reloading FPSIMD state from memory when the registers
already hold it.
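The lazy restore idea can be sketched with a similar toy model, again
an illustration under stated assumptions rather than the upstream code.
Each hypothetical struct lazy_cpu records whose state is live in its
registers, and each hypothetical struct lazy_task records the CPU that
last held its state; the reload on switch-in is skipped only when both
agree. All names, including lazy_demo(), are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; sizes and names are hypothetical. */
#define NR_CPUS_MODEL 2
#define NREGS_MODEL 4

struct lazy_task {
	unsigned long long saved[NREGS_MODEL]; /* in-memory copy of the state */
	int last_cpu;                          /* CPU that last held it, -1 if none */
};

struct lazy_cpu {
	unsigned long long regs[NREGS_MODEL];
	const struct lazy_task *owner;         /* whose state regs hold, or NULL */
};

static struct lazy_cpu cpus[NR_CPUS_MODEL];
static int reload_count;

/* Switch-out: save to memory; the CPU's registers still hold a valid copy. */
static void lazy_switch_out(struct lazy_task *t, int cpu)
{
	memcpy(t->saved, cpus[cpu].regs, sizeof(t->saved));
	t->last_cpu = cpu;
	cpus[cpu].owner = t;
}

/* Another task using FPSIMD on this CPU invalidates the cached copy. */
static void lazy_clobber(int cpu, unsigned long long v)
{
	for (int i = 0; i < NREGS_MODEL; i++)
		cpus[cpu].regs[i] = v;
	cpus[cpu].owner = NULL;
}

/* Switch-in: reload from memory unless this CPU still holds t's state. */
static void lazy_switch_in(struct lazy_task *t, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (cpus[cpu].owner == t && t->last_cpu == cpu)
		return;                        /* registers are current: elide */
	memcpy(cpus[cpu].regs, t->saved, sizeof(t->saved));
	cpus[cpu].owner = t;
	t->last_cpu = cpu;
	reload_count++;
}

/* Returns the number of reloads over a fixed switch sequence. */
static int lazy_demo(void)
{
	struct lazy_task t = { .last_cpu = -1 };

	for (int c = 0; c < NR_CPUS_MODEL; c++)
		cpus[c].owner = NULL;
	reload_count = 0;

	lazy_switch_in(&t, 0);   /* first run: load */
	lazy_switch_out(&t, 0);
	lazy_switch_in(&t, 0);   /* nothing ran in between: elided */
	lazy_switch_out(&t, 0);
	lazy_clobber(0, 0xB0);   /* another task used FPSIMD on CPU 0 */
	lazy_switch_in(&t, 0);   /* reload */
	lazy_switch_out(&t, 0);
	lazy_switch_in(&t, 1);   /* migrated to CPU 1: load */
	lazy_switch_out(&t, 1);
	lazy_switch_in(&t, 0);   /* CPU 0 still names t, but its copy is stale */
	return reload_count;
}
```

lazy_demo() performs four reloads across five switch-ins, eliding only
the one where nothing touched CPU 0 in between. The last switch-in shows
why the CPU tracking matters: CPU 0 still lists the task as owner, but
the task has since run on CPU 1, so the per-task last_cpu check forces
the reload.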
**Prerequisites for the core fix series:**
Required for the core series to compile and function:
[PATCH v1 1/7]: Provides SME infrastructure dependencies (most of which
are not needed in Jammy, since it does not support SME).
[PATCH v1 2/7]: Refactors fpsimd_bind_state_to_cpu() to take its data in
a struct, supporting the richer state management needed by the core
functionality.
**Bug fixes for the core patch:**
[PATCH v1 6/7]: Fixes the cond_yield assembly macro, which was broken by
the preemption model changes in [PATCH v1 4/7].
[PATCH v1 7/7]: Fixes a critical bug, introduced by the
TIF_FOREIGN_FPSTATE management changes in [PATCH v1 4/7], in which the
reload of user FPSIMD state could be erroneously elided.
[TEST CASE]
Compile tested on linux-bluefield-5.15 (master-next branch); all patches
compile cleanly with no warnings or errors. Before the patch series,
ptp4l consistently failed within 24 hours with ENXIO errors from sendto()
calls during DelayReq message transmission. With the fix applied, the
system ran for 7 consecutive days under the same conditions that
previously triggered failures, and no ptp4l synchronization failures,
ENXIO errors from sendto(), or FPSIMD-related corruption were observed.
[REGRESSION POTENTIAL]
The patches introduce new code paths for kernel mode FPSIMD state
management and may affect context switch performance. However, the
upstream patches are well tested and present in mainline kernels, the
backport keeps the upstream behavior apart from the SME-specific parts
it drops, and 7 days of testing under the original failure conditions
provide confidence in the implementation's stability.
Ard Biesheuvel (5):
arm64: fpsimd: Drop unneeded 'busy' flag
arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
arm64: fpsimd: Bring cond_yield asm macro in line with new rules
arm64/fpsimd: Avoid erroneous elide of user state reload
Mark Brown (2):
arm64/sme: Implement ZA context switching
arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu()
arch/arm64/include/asm/assembler.h | 25 ++--
arch/arm64/include/asm/fpsimd.h | 15 +-
arch/arm64/include/asm/kvm_host.h | 1 +
arch/arm64/include/asm/processor.h | 3 +
arch/arm64/include/asm/simd.h | 11 +-
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/kernel/asm-offsets.c | 2 -
arch/arm64/kernel/fpsimd.c | 215 +++++++++++++++------------
arch/arm64/kvm/fpsimd.c | 23 +--
9 files changed, 165 insertions(+), 131 deletions(-)