APPLIED[F/G]: [SRU][F][G][H][PATCH 0/1] qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload.

Kelsey Skunberg kelsey.skunberg at canonical.com
Fri Jan 22 19:50:07 UTC 2021


Applied to F/G master-next. Thank you!

-Kelsey

On 2021-01-15 11:12:42, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1909062
> 
> [Impact]
> 
> For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series
> 10/25/40/50GbE Controller, Kubernetes internal DNS requests fail after
> upgrading from the 4.15 kernel to the 5.4 kernel, because the packets are
> corrupted on transmit.
> 
> Kubernetes uses IPIP tunnelled packets for internal DNS resolution. The NIC
> does not support hardware tx checksum offload for this packet type, so the
> packets end up corrupted when the qede driver asks the NIC to checksum them.
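An IPIP (IP-in-IP, RFC 2003) packet can be recognised from its outer IPv4 header alone: the Protocol field at byte offset 9 holds 4, the value glibc exposes as IPPROTO_IPIP. The following is a minimal userspace sketch of that check, not driver code; the function name is_ipip_packet is hypothetical, and it handles only IPv4 outer headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h> /* IPPROTO_IPIP (4), IPPROTO_UDP (17) */

/* Byte offset of the Protocol field in an IPv4 header (RFC 791). */
#define IPV4_PROTOCOL_OFFSET 9
/* Smallest legal IPv4 header: five 32-bit words. */
#define IPV4_MIN_HEADER_LEN 20

/* Hypothetical helper: returns true if buf starts with an IPv4 header whose
 * payload is another IP packet, i.e. an IPIP tunnelled packet. */
bool is_ipip_packet(const uint8_t *buf, size_t len)
{
    /* Too short to contain even a minimal IPv4 header. */
    if (buf == NULL || len < IPV4_MIN_HEADER_LEN)
        return false;
    /* The high nibble of the first byte is the IP version; only IPv4 here. */
    if ((buf[0] >> 4) != 4)
        return false;
    /* The outer Protocol field names the encapsulated protocol. */
    return buf[IPV4_PROTOCOL_OFFSET] == IPPROTO_IPIP;
}
```

A tunnelled DNS query therefore looks like UDP only after the outer header is skipped; a NIC that does not understand IPIP would parse the wrong header when filling in checksums.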
> 
> This only affects internal Kubernetes DNS: lookups of regular external domains
> still succeed, because those packets are not IPIP tunnelled.
> 
> [Fix]
> 
> Marvell has developed a fix for the qede driver that checks the packet type
> and, if it is IPPROTO_IPIP, disables tx checksum offload for that socket
> buffer.
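The decision the fix makes can be modelled as a per-packet feature mask. This is a hedged sketch under stated assumptions: FEAT_TX_CSUM, FEAT_SG and features_for_packet are hypothetical stand-ins for the kernel's netdev_features_t flags and for the driver's per-packet features check (in the kernel, such decisions are typically made in a driver's ndo_features_check callback). It is not the upstream patch. When a driver withholds checksum offload for a packet, the networking core computes the checksum in software instead.

```c
#include <stdint.h>
#include <netinet/in.h> /* IPPROTO_IPIP, IPPROTO_TCP, IPPROTO_UDP */

/* Hypothetical feature bits; the values are illustrative only. */
#define FEAT_TX_CSUM (1u << 0) /* hardware tx checksum offload */
#define FEAT_SG      (1u << 1) /* scatter-gather, untouched by the fix */

/* Hypothetical model of a per-packet features check: given the features the
 * device advertises and the protocol in the outer IP header, return the
 * features that may be used for this packet. */
uint32_t features_for_packet(uint32_t features, uint8_t outer_proto)
{
    /* The NIC cannot checksum IPIP packets correctly, so withhold tx
     * checksum offload for them and leave every other bit alone. */
    if (outer_proto == IPPROTO_IPIP)
        return features & ~FEAT_TX_CSUM;
    /* All other packet types keep the full advertised feature set. */
    return features;
}
```

For example, features_for_packet(FEAT_TX_CSUM | FEAT_SG, IPPROTO_IPIP) leaves only FEAT_SG set, while a plain UDP or TCP packet keeps both bits. Scoping the change to one protocol value is what limits the regression risk to IPIP traffic.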
> 
> commit 5d5647dad259bb416fd5d3d87012760386d97530
> Author: Manish Chopra <manishc at marvell.com>
> Date: Mon Dec 21 06:55:30 2020 -0800
> Subject: qede: fix offload for IPIP tunnel packets
> Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
> 
> This commit landed in mainline in 5.11-rc3, and was accepted into the upstream
> stable releases 4.14.215, 4.19.167, 5.4.89 and 5.10.7.
> 
> Note that this SRU is not targeted at Bionic, because tx csum offload support
> only landed in 5.0 and later, so the 4.15 kernel works even without this
> patch. Bionic can instead pick the patch up naturally from upstream stable.
> 
> [Testcase]
> 
> The system must have a QLogic QL41xxx series NIC fitted, and must be part of a
> Kubernetes cluster.
> 
> First, list the active network devices in the system:
> 
> $ sudo ifconfig
> 
> Next, bring each of those devices down with:
> 
> $ sudo ifconfig <device> down
> 
> Next, bring up the QLogic QL41xxx device:
> 
> $ sudo ifconfig <qlogic nic device> up
> 
> Then, attempt to look up an internal Kubernetes domain:
> 
> $ nslookup <internal kubernetes domain address>
> 
> Without the patch, the connection will time out:
> 
> ;; connection timed out; no servers could be reached 
> 
> Packet traces taken with tcpdump show the request leaving the source, but it
> never arrives at the destination.
> 
> There is a test kernel available in the following PPA:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
> 
> With this test kernel installed, Kubernetes internal DNS lookups succeed.
> 
> [Where problems could occur]
> 
> If a regression were to occur, it would affect users of the qede driver, which
> is limited to those with QLogic QL41xxx series NICs. The patch explicitly
> checks for IPIP packets, so only those particular packets would be affected.
> 
> Since IPIP packets are uncommon, a regression would not cause a total outage,
> as most packets are not IPIP tunnelled. It could, however, cause problems for
> users who frequently handle VPN or Kubernetes internal DNS traffic.
> 
> A workaround would be to revert to an older kernel, or to use ethtool to
> disable tx csum offload for all packet types on the QLogic device:
> 
> $ sudo ethtool -K <qlogic nic device> tx off
> 
> Manish Chopra (1):
>   qede: fix offload for IPIP tunnel packets
> 
>  drivers/net/ethernet/qlogic/qede/qede_fp.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
