APPLIED: [SRU][Bionic][PATCH 0/2] mlx5_core reports hardware checksum error for padded packets on Mellanox NICs

Kleber Souza kleber.souza at canonical.com
Tue Jan 7 10:47:30 UTC 2020


On 2019-12-11 00:24, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1854842
> 
> [Impact]
> 
> On machines equipped with Mellanox NICs, in this particular case Mellanox 5
> series NICs using the mlx5_core driver, there is a kernel splat when sending
> large IP packets that have padding at the end.
> 
> enp6s0f0: hw csum failure
> CPU: 19 PID: 0 Comm: swapper/19 Not tainted 4.15.0-72-generic
> Call Trace:
> <IRQ>
> dump_stack+0x63/0x8e
> netdev_rx_csum_fault+0x38/0x40
> __skb_checksum_complete+0xbc/0xd0
> nf_ip_checksum+0xc3/0xf0
> icmp_error+0x27d/0x310 [nf_conntrack_ipv4]
> nf_conntrack_in+0x15a/0x510 [nf_conntrack]
> ? __skb_checksum+0x68/0x330
> ipv4_conntrack_in+0x1c/0x20 [nf_conntrack_ipv4]
> nf_hook_slow+0x48/0xc0
> ? skb_send_sock+0x50/0x50
> ip_rcv+0x301/0x360
> ? inet_del_offload+0x40/0x40
> __netif_receive_skb_core+0x432/0xb80
> __netif_receive_skb+0x18/0x60
> ? __netif_receive_skb+0x18/0x60
> netif_receive_skb_internal+0x45/0xe0
> napi_gro_receive+0xc5/0xf0
> mlx5e_handle_rx_cqe+0x48d/0x5e0 [mlx5_core]
> ? enqueue_task_rt+0x1b4/0x2e0
> mlx5e_poll_rx_cq+0xd1/0x8c0 [mlx5_core]
> mlx5e_napi_poll+0x9d/0x290 [mlx5_core]
> net_rx_action+0x140/0x3a0
> __do_softirq+0xe4/0x2d4
> irq_exit+0xc5/0xd0
> do_IRQ+0x86/0xe0
> common_interrupt+0x8c/0x8c
> </IRQ>
> 
> This bug is a further attempt to fix these splats. Previous fixes landed in
> LP #1840854 and in a series of commits that arrived in 4.15.0-67
> (LP #1847155) as part of the upstream -stable patches.
> 
> This fix also resolves the same problem on the new Mellanox CX6 and BlueField
> hardware, which was already enabled by earlier upstream -stable patches that
> landed in LP #1847155.
> 
> [Fix]
> 
> This particular issue was fixed for Mellanox series 5 drivers in the following 
> commits:
> 
> commit 0aa1d18615c163f92935b806dcaff9157645233a
> Author: Saeed Mahameed <saeedm at mellanox.com>
> Date:   Tue Mar 12 00:24:52 2019 -0700
> Subject: net/mlx5e: Rx, Fixup skb checksum for packets with tail padding
> 
> This commit required a minor backport.
> 
> This commit was selected for upstream -stable in 4.19.76 and 5.0.10.
> It appears to have been omitted from "Bionic update: upstream stable patchset
> 2019-10-07" (LP #1847155), probably because it required a backport.
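A rough Python sketch of what "fixing up the skb checksum for tail padding" means arithmetically. It is not the mlx5 C code. It assumes, as the commit subject suggests, that the checksum the NIC reports covers the whole L2 payload including the tail padding, while the stack only cares about the first IPv4-total-length bytes; subtracting the padding's partial sum with ones' complement arithmetic (RFC 1071) recovers the checksum of the packet alone. All helper names (csum16, csum_sub, fixup_tail_padding, ...) are hypothetical.

```python
# Sketch only: ones' complement checksum arithmetic (RFC 1071).
# Assumption: the NIC-reported checksum also covers the tail padding.


def csum16(data: bytes) -> int:
    """Folded 16-bit ones' complement sum of big-endian words (no final inversion)."""
    if len(data) % 2:
        data += b"\x00"  # an odd trailing byte is a high byte padded with zero
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # end-around carry: fold overflow back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_add(a: int, b: int) -> int:
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def csum_sub(a: int, b: int) -> int:
    """Ones' complement subtraction: a - b is computed as a + (~b)."""
    return csum_add(a, ~b & 0xFFFF)


def swap16(x: int) -> int:
    return ((x & 0xFF) << 8) | (x >> 8)


def csum_equal(a: int, b: int) -> bool:
    """0x0000 and 0xFFFF are both 'zero' in ones' complement, so compare mod 0xFFFF."""
    return a % 0xFFFF == b % 0xFFFF


def fixup_tail_padding(hw_csum: int, l2_payload: bytes, ip_len: int) -> int:
    """Hypothetical helper: remove the bytes after the first ip_len bytes
    (the tail padding) from a checksum computed over the whole L2 payload."""
    pad_sum = csum16(l2_payload[ip_len:])
    if ip_len % 2:
        # Padding that starts at an odd offset sits in swapped byte lanes.
        pad_sum = swap16(pad_sum)
    return csum_sub(hw_csum, pad_sum)


if __name__ == "__main__":
    ip_packet = bytes(range(1, 29))   # stand-in for a 28-byte IPv4 + ICMP packet
    frame = ip_packet + b"\xfe" * 18  # hypothetical tail padding
    hw = csum16(frame)                # checksum over packet + padding
    print(csum_equal(hw, csum16(ip_packet)))  # False
    print(csum_equal(fixup_tail_padding(hw, frame, len(ip_packet)), csum16(ip_packet)))  # True
```

The comparison is done modulo 0xFFFF because ones' complement has two encodings of zero; a real driver works on the kernel's own checksum helpers instead.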
> 
> commit db849faa9bef993a1379dc510623f750a72fa7ce
> Author: Saeed Mahameed <saeedm at mellanox.com>
> Date:   Fri May 3 13:14:59 2019 -0700
> Subject: net/mlx5e: Rx, Fix checksum calculation for new hardware
> 
> This commit required a minor backport.
> 
> This commit was selected for upstream -stable in 5.1.21 and 5.2.4.
> It has already been applied to the Disco kernel as part of the stable
> updates.
> 
> [Testcase]
> 
> The following scapy commands reproduce the issue. Run them from the machine
> with the Mellanox series 5 NIC:
> 
> 1) a = Ether(dst='ff:ff:ff:ff:ff:ff')/IP(dst='127.0.0.1')/ICMP()/Padding(
>        load=b'\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe'
>             b'\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe'
>             b'\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe'
>             b'\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe'
>             b'\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe\xfe')
> 
> 2) sendp(a, iface='enp6s0f0')
> 
> 3) Check dmesg on the receiving side. Since the example targets localhost
>    (127.0.0.1), check dmesg on the local machine.
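To see what the receiver gets from the reproducer above, the following standard-library-only sketch builds a frame with roughly the same layout as the scapy packet (broadcast Ethernet, IPv4 to 127.0.0.1, ICMP echo request, trailing 0xfe bytes) and measures the tail padding as the bytes beyond the IPv4 total length. The function names, the zeroed source MAC, IP ID and header checksums are illustrative assumptions, not what scapy actually emits; only the lengths matter here.

```python
import struct

ETH_HLEN = 14       # destination MAC + source MAC + EtherType
ETH_P_IP = 0x0800   # EtherType for IPv4


def build_test_frame(pad: bytes) -> bytes:
    """Hypothetical stand-in for the scapy packet. MACs, IDs and checksums
    are placeholders (zeros); only the lengths matter for this sketch."""
    eth = b"\xff" * 6 + b"\x00" * 6 + struct.pack("!H", ETH_P_IP)
    icmp = struct.pack("!BBHHH", 8, 0, 0, 0, 0)  # type 8 = echo request
    loopback = bytes([127, 0, 0, 1])
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(icmp),  # version/IHL, TOS, total length (excludes pad)
        0, 0, 64, 1, 0,           # ID, flags/fragment, TTL, protocol 1 = ICMP, checksum
        loopback, loopback,       # source and destination address
    )
    return eth + ip + icmp + pad


def tail_padding_len(frame: bytes) -> int:
    """Bytes that follow the IPv4 packet, per the IPv4 total length field.
    Returns 0 for non-IPv4 frames."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return 0
    (ip_total_len,) = struct.unpack_from("!H", frame, ETH_HLEN + 2)
    return max(0, len(frame) - ETH_HLEN - ip_total_len)


if __name__ == "__main__":
    frame = build_test_frame(b"\xfe" * 16)
    print(len(frame), tail_padding_len(frame))  # 58 16
```

Any bytes counted by tail_padding_len are not covered by the IPv4 total length, which is the situation the tail-padding commit addresses.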
> 
> I have built some test kernels, which are available here:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test
> This kernel contains 0aa1d18615c163f92935b806dcaff9157645233a.
> 
> and
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/lp1854842-test-2
> This kernel contains db849faa9bef993a1379dc510623f750a72fa7ce.
> 
> With the test kernels installed, the issue is resolved.
> 
> [Regression Potential]
> 
> The changes are limited to the mlx5_core driver and only modify how receive
> checksums are calculated for padded packets and on new hardware.
> 
> Both patches have been released through upstream -stable and are widely
> accepted by the community.
> 
> Because of this, I believe the risk of regression is low.
> 
> Saeed Mahameed (2):
>   net/mlx5e: Rx, Fixup skb checksum for packets with tail padding
>   net/mlx5e: Rx, Fix checksum calculation for new hardware
> 
>  drivers/net/ethernet/mellanox/mlx5/core/en.h  |  1 +
>  .../net/ethernet/mellanox/mlx5/core/en_main.c |  5 ++
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 85 +++++++++++++++----
>  .../ethernet/mellanox/mlx5/core/en_stats.c    |  4 +
>  .../ethernet/mellanox/mlx5/core/en_stats.h    |  4 +
>  include/linux/mlx5/mlx5_ifc.h                 |  3 +-
>  6 files changed, 85 insertions(+), 17 deletions(-)
> 

Applied to bionic/linux.

Thanks,
Kleber


