Cmnt: [SRU][J:linux-bluefield][PATCH v1 0/9] Kernel panic in restart driver after configuring IPsec full offload

Tony Duan yifeid at nvidia.com
Thu Jan 4 03:12:42 UTC 2024


On 1/3/2024 10:36 PM, Bartlomiej Zolnierkiewicz wrote:
> Hi,
>
> On Mon, Dec 25, 2023 at 7:21 AM Tony Duan <yifeid at nvidia.com> wrote:
>> BugLink: https://bugs.launchpad.net/bugs/2044427
>>
>> SRU Justification:
>>
>> [Impact]
>>
>> * This series ports xfrm fixes that prevent a kernel panic when the driver is restarted after IPsec full offload has been configured
>>
>> [Fix]
>>
>> * cherry-pick afa8cc09c0effbc6532b4a6d89027c63a4f4dfa2 afa8cc0 net: xfrm: Fix xfrm_address_filter OOB read
>>    cherry-pick 027657f5b0e5786fb4a3f81f0c56807128c38e8d 027657f xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH
>>    cherry-pick e2cfb0384b887db477b969e998c53c4745513f92 e2cfb03 xfrm: Silence warnings triggerable by bad packets
>>    cherry-pick 7cbe43787657bc3d6edd175ba3e486980a89afdf 7cbe437 xfrm: Remove inner/outer modes from input path
>>    cherry-pick 7e4e5880259f9e85d322969577a36f61d98deff4 7e4e588 net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
>>    cherry-pick 92ad4f000093dcb14dd131a2fd7bf7d59ae956c0 92ad4f0 net: af_key: fix sadb_x_filter validation
>>    cherry-pick 4c8893c6d1f25a9d04740afc27ce0166d1662609 4c8893c xfrm: Flush xfrm state synchronously on netdev close or unregister
>>    backport 1a18e06a37ae5c0eb83f47bdc91a3923a7c21c6f 1a18e06 xfrm: get global statistics from the offloaded device
>>    backport aabb407c261858f1b772eb1f4fa92bc38a203098 aabb407 xfrm: generalize xdo_dev_state_update_curlft to allow statistics update
> The last three commit ids (for the first three patches in the series)
> cannot be found in either the upstream or the linux-next tree. Where do
> these commits come from? (If they are not from public trees, they should
> be marked as "UBUNTU: SAUCE: ...").
>
> Otherwise everything looks fine to me.
>
> --
> Best regards,
> Bartlomiej

Hi Bartlomiej,

I checked with the owners of the last three patches. They are currently
in NVIDIA's repo and are still waiting to be merged upstream. Could you
please merge the remaining six patches first? I will file another request
once these three patches have been merged upstream.
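
For future submissions, a check along the following lines could be run before sending, so that commit ids missing from public trees are caught early. This is only a minimal sketch: the helper names (`is_ancestor_of`, `split_by_origin`) are made up for illustration, and the ref name and checkout path in the usage note are assumptions about the local setup, not part of any existing tooling.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def is_ancestor_of(ref: str, repo: str = ".") -> Callable[[str], bool]:
    """Build a predicate: is a commit reachable from `ref` in `repo`?"""
    def check(commit: str) -> bool:
        # `git merge-base --is-ancestor A B` exits 0 when A is an ancestor
        # of B. Any non-zero exit (not an ancestor, or a commit unknown to
        # this repository) is treated as "not found".
        result = subprocess.run(
            ["git", "-C", repo, "merge-base", "--is-ancestor", commit, ref],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0
    return check


def split_by_origin(
    commits: Iterable[str], in_upstream: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Group commit ids by whether the predicate finds them upstream."""
    groups: Dict[str, List[str]] = {"upstream": [], "not_found": []}
    for commit in commits:
        key = "upstream" if in_upstream(commit) else "not_found"
        groups[key].append(commit)
    return groups
```

Assuming a local checkout whose `origin/master` tracks the upstream tree, `split_by_origin(ids, is_ancestor_of("origin/master", "/path/to/linux"))` would list under `not_found` the ids that need either a public source or an "UBUNTU: SAUCE:" prefix.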

Thank you,

Tony

>> [Test Plan]
>>
>> * Restarting the driver with an IPsec full offload transparent mode configuration causes a kernel panic.
>> The kernel version is linux-bluefield 5.15.
>>
>> Test steps:
>> 1) configure xfrm rules
>> 2) configure VF
>> 3) configure FW steering mode
>> 4) restart driver
>> 5) check dmesg
>>
>> Test result:
>>   [ 937.989359] ------------[ cut here ]------------
>>   [ 937.989786] WARNING: CPU: 11 PID: 60463 at /tmp/23.10-0.1.8/6.5.0-rc6_mlnx/fedora_32/mlnx-ofa_kernel/BUILD/mlnx-ofa_kernel-23.10/obj/default/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c:1828 mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
>>   [ 937.991698] fuse virtio_net net_failover failover [last unloaded: vdpa]
>>   [ 937.999155] CPU: 11 PID: 60463 Comm: modprobe Tainted: G OE 6.5.0-rc6_mlnx #1
>>   [ 937.999891] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>>   [ 938.000823] RIP: 0010:mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
>>   [ 938.001459] Code: f6 45 31 c0 48 89 ea 31 ff e8 d4 d5 df ff 59 e9 8c fe ff ff c3 0f 0b e9 3b fe ff ff 0f 0b e9 e8 fd ff ff 0f 0b e9 07 fe ff ff <0f> 0b e9 65 fe ff ff 0f 0b e9 82 fe ff ff 66 2e 0f 1f 84 00 00 00
>>   [ 938.002949] RSP: 0018:ffffc90001183c08 EFLAGS: 00010202
>>   [ 938.003418] RAX: 0000000000000000 RBX: ffff8882f3869c00 RCX: 0000000000000001
>>   [ 938.004024] RDX: ffffffff82a305c0 RSI: 0000000000000002 RDI: ffff888103aa2b30
>>   [ 938.004624] RBP: ffff888103aa2d80 R08: 0000000000000001 R09: ffff888100042800
>>   [ 938.005238] R10: 0000000000000002 R11: ffffc90001183ba8 R12: ffff8881312e6800
>>   [ 938.005836] R13: ffff8881127401a0 R14: ffff8881312e6800 R15: ffff888148bbd160
>>   [ 938.006444] FS: 00007fd22b82c740(0000) GS:ffff88885fac0000(0000) knlGS:0000000000000000
>>   [ 938.009456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   [ 938.009970] CR2: 00007f26ca697000 CR3: 000000012e73f003 CR4: 0000000000770ee0
>>   [ 938.010568] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>   [ 938.011173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>   [ 938.011772] PKRU: 55555554
>>   [ 938.012065] Call Trace:
>>   [ 938.012333]
>>   [ 938.012583] ? __warn+0x7d/0x120
>>   [ 938.012921] ? mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
>>   [ 938.013494] ? report_bug+0xf1/0x1c0
>>   [ 938.013850] ? handle_bug+0x44/0x70
>>   [ 938.014201] ? exc_invalid_op+0x13/0x60
>>   [ 938.014568] ? asm_exc_invalid_op+0x16/0x20
>>   [ 938.014970] ? mlx5e_accel_ipsec_fs_cleanup+0x298/0x2b0 [mlx5_core]
>>   [ 938.015532] ? mlx5e_accel_ipsec_fs_cleanup+0xf2/0x2b0 [mlx5_core]
>>   [ 938.016093] mlx5e_ipsec_cleanup+0x1e/0x100 [mlx5_core]
>>   [ 938.016594] mlx5e_detach_netdev+0x46/0x80 [mlx5_core]
>>   [ 938.017098] mlx5e_vport_rep_unload+0x147/0x1a0 [mlx5_core]
>>   [ 938.017623] mlx5_eswitch_unregister_vport_reps+0x13e/0x190 [mlx5_core]
>>   [ 938.018221] auxiliary_bus_remove+0x18/0x30
>>   [ 938.018616] device_release_driver_internal+0xaa/0x130
>>   [ 938.019076] bus_remove_device+0xc3/0x130
>>   [ 938.019451] device_del+0x157/0x380
>>   [ 938.019792] ? kobject_put+0xb3/0x200
>>   [ 938.020153] delete_drivers+0x72/0xa0 [mlx5_core]
>>   [ 938.020608] mlx5_unregister_device+0x34/0x70 [mlx5_core]
>>   [ 938.021113] mlx5_uninit_one+0x25/0x130 [mlx5_core]
>>   [ 938.021572] remove_one+0x72/0xc0 [mlx5_core]
>>   [ 938.022002] pci_device_remove+0x31/0xb0
>>   [ 938.022376] device_release_driver_internal+0xaa/0x130
>>   [ 938.022827] driver_detach+0x3f/0x80
>>   [ 938.023181] bus_remove_driver+0x69/0xe0
>>   [ 938.023553] pci_unregister_driver+0x22/0x90
>>   [ 938.023957] mlx5_cleanup+0xc/0x4c [mlx5_core]
>>   [ 938.024384] __x64_sys_delete_module+0x157/0x280
>>   [ 938.024806] do_syscall_64+0x34/0x80
>>   [ 938.025163] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>   [ 938.025616] RIP: 0033:0x7fd22b93812b
>>   [ 938.025969] Code: 73 01 c3 48 8b 0d 6d 0d 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 0d 0c 00 f7 d8 64 89 01 48
>>   [ 938.027458] RSP: 002b:00007ffce1ea2658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
>>   [ 938.028129] RAX: ffffffffffffffda RBX: 000055b5a4efb3b0 RCX: 00007fd22b93812b
>>   [ 938.028719] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055b5a4efb418
>>   [ 938.029327] RBP: 000055b5a4efb3b0 R08: 1999999999999999 R09: 0000000000000000
>>   [ 938.029932] R10: 00007fd22b9acac0 R11: 0000000000000206 R12: 0000000000000000
>>   [ 938.030529] R13: 000055b5a4efb418 R14: 000055b5a4efe350 R15: 000055b5a4efb150
>>   [ 938.031134]
>>   [ 938.031388] ---[ end trace 0000000000000000 ]---
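
Step 5 of the test plan (check dmesg) could be automated roughly as follows. This is a hedged sketch rather than part of the submitted test: `find_splats` is a hypothetical helper, and it assumes the log uses the usual "cut here" and "end trace" markers seen in the trace above.

```python
import re
from typing import List

CUT_HERE = "[ cut here ]"
END_TRACE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")
HEADLINE = re.compile(r"\b(WARNING|BUG|Oops|Kernel panic)\b.*")


def find_splats(log_text: str, needle: str = "") -> List[str]:
    """Return the headline of each cut-here/end-trace block mentioning `needle`."""
    hits: List[str] = []
    block = None  # lines of the block being collected, or None outside a block
    for line in log_text.splitlines():
        if CUT_HERE in line:
            block = []  # a new block starts after the marker line
        elif block is not None:
            block.append(line)
            if END_TRACE.search(line):
                text = "\n".join(block)
                match = HEADLINE.search(text)
                # Keep the block only if it has a headline and names the symbol.
                if match and needle in text:
                    hits.append(match.group(0).strip())
                block = None
    return hits
```

Feeding it the text of `dmesg` after the driver restart and passing `"mlx5e_accel_ipsec_fs_cleanup"` as `needle` should return an empty list once the fixes are applied, and one headline per warning otherwise.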
>>
>> [Where problems could occur]
>>
>> * Without these patches, the kernel panic trace appears in dmesg
>>
>> [Other Info]
>>
>> * nothing
>>
>> Herbert Xu (2):
>>    xfrm: Remove inner/outer modes from input path
>>    xfrm: Silence warnings triggerable by bad packets
>>
>> Jianbo Liu (1):
>>    xfrm: Flush xfrm state synchronously on netdev close or unregister
>>
>> Leon Romanovsky (2):
>>    xfrm: generalize xdo_dev_state_update_curlft to allow statistics
>>      update
>>    xfrm: get global statistics from the offloaded device
>>
>> Lin Ma (4):
>>    net: af_key: fix sadb_x_filter validation
>>    net: xfrm: Amend XFRMA_SEC_CTX nla_policy structure
>>    xfrm: add forgotten nla_policy for XFRMA_MTIMER_THRESH
>>    net: xfrm: Fix xfrm_address_filter OOB read
>>
>>   Documentation/networking/xfrm_device.rst |  4 +-
>>   include/linux/netdevice.h                |  2 +-
>>   include/net/xfrm.h                       | 14 +++---
>>   net/key/af_key.c                         |  4 +-
>>   net/xfrm/xfrm_compat.c                   |  2 +-
>>   net/xfrm/xfrm_input.c                    | 78 +++++++++++---------------------
>>   net/xfrm/xfrm_proc.c                     |  1 +
>>   net/xfrm/xfrm_state.c                    | 19 ++++++--
>>   net/xfrm/xfrm_user.c                     | 14 +++++-
>>   9 files changed, 69 insertions(+), 69 deletions(-)
>>



