[SRU][J:linux-bluefield][PATCH v1 0/1] genetlink: correctly begin the iteration over policies

Stav Aviram saviram at nvidia.com
Sun Jul 20 15:20:49 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2117349

SRU Justification:

[Impact]
In BF5.15 (Jammy), CX7 cards experience consistent CQ errors with
syndrome 0x1 when running a performance script:
cq_err_event_notifier:538:(pid 9712): CQ error on CQN 0x424, syndrome 0x1

Multiple call traces then appear in dmesg and the system becomes
unresponsive. The test may need several iterations to trigger the
issue.
The root cause appears to be a missing upstream fix. Without it, the
kernel can crash or emit warnings when a netlink policy is not found,
which may cause the observed CQ errors during high-connection testing
scenarios.

[Fix]
Cherry-pick the upstream commit:
154ba79c9f16 ("genetlink: correctly begin the iteration over policies")

This commit fixes how the iteration over policies is started:
genl_op_iter_init() on its own does not begin the iteration, so the fix
makes sure genl_op_iter_next() is called to properly begin it.
This prevents crashes and warnings in
netlink_policy_dump_get_policy_idx() when a policy is not found, which
may be contributing to the CQ error condition during intensive
connection testing.
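
As an illustration of the pattern described above, here is a minimal
userspace sketch of an iterator whose init step only prepares the walk,
while the next step loads the first entry. It is not the kernel code:
struct op, struct op_iter, op_iter_init(), op_iter_next() and
count_policies() are hypothetical stand-ins chosen for this example, and
the assumption that init leaves the current entry unloaded is taken from
the fix description, not from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct op {
	int cmd;
	const char *policy;	/* NULL means "no policy" */
};

struct op_iter {
	const struct op *ops;	/* table being walked */
	size_t n_ops;		/* number of entries in the table */
	size_t next;		/* index of the next entry to load */
	const struct op *cur;	/* valid only after op_iter_next() returned true */
};

/*
 * Prepare the iterator. The return value only says whether the table
 * has any entries; the first entry is NOT loaded here.
 */
bool op_iter_init(struct op_iter *it, const struct op *ops, size_t n_ops)
{
	assert(it != NULL);
	it->ops = ops;
	it->n_ops = n_ops;
	it->next = 0;
	it->cur = NULL;
	return n_ops > 0;
}

/*
 * Load the next entry into it->cur. Returns false once the table is
 * exhausted, so it is safe to call on an empty table.
 */
bool op_iter_next(struct op_iter *it)
{
	if (it->next >= it->n_ops) {
		it->cur = NULL;
		return false;
	}
	it->cur = &it->ops[it->next++];
	return true;
}

/* Count entries that carry a policy, beginning the walk with next(). */
size_t count_policies(const struct op *ops, size_t n_ops)
{
	struct op_iter it;
	size_t count = 0;

	op_iter_init(&it, ops, n_ops);
	while (op_iter_next(&it))
		if (it.cur->policy != NULL)
			count++;
	return count;
}
```

In this sketch, reading it.cur right after op_iter_init() would
dereference an entry that was never loaded, which is the class of
mistake the fix description refers to. Calling op_iter_next() first
makes the current entry valid before it is used.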

[Test Case]
Compile tested on linux-bluefield-5.15 on the master-next branch.
Functional testing: run the performance test for multiple iterations on
CX7 hardware with a linux-bluefield-5.15 kernel that includes the fix.
With the patch applied, the test should complete without CQ errors and
the system should remain responsive.

[Regression Potential]
The change is minimal and matches the upstream implementation exactly.

Jakub Kicinski (1):
  genetlink: correctly begin the iteration over policies

 net/netlink/genetlink.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
2.38.1



