APPLIED[F/G]: [SRU][F][G][H][PATCH 0/1] qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload.
Kelsey Skunberg
kelsey.skunberg at canonical.com
Fri Jan 22 19:50:07 UTC 2021
Applied to F/G master-next. Thank you!
-Kelsey
On 2021-01-15 11:12:42 , Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1909062
>
> [Impact]
>
> Users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series
> 10/25/40/50GbE Controller, find that Kubernetes internal DNS requests fail
> after upgrading from the 4.15 kernel to the 5.4 kernel, because the request
> packets are corrupted.
>
> Kubernetes uses IPIP tunnelled packets for internal DNS resolution. This
> packet type is not supported for hardware tx checksum offload, so the packets
> end up corrupted when the qede driver attempts to checksum them.
>
> This only affects internal Kubernetes DNS. Regular DNS lookups for external
> domains succeed because they do not use IPIP packets.
>
> [Fix]
>
> Marvell has developed a fix for the qede driver, which checks the packet type
> and, if it is IPPROTO_IPIP, disables csum offloads for that socket buffer.
>
> commit 5d5647dad259bb416fd5d3d87012760386d97530
> Author: Manish Chopra <manishc at marvell.com>
> Date: Mon Dec 21 06:55:30 2020 -0800
> Subject: qede: fix offload for IPIP tunnel packets
> Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
>
> This commit landed in mainline in 5.11-rc3. The commit was accepted into upstream
> stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.
>
> Note that this SRU isn't targeted at Bionic: tx csum offload support only
> landed in 5.0 and onward, so the 4.15 kernel still works without this patch.
> Bionic can therefore pick the patch up naturally from upstream stable.
>
> [Testcase]
>
> The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part
> of a Kubernetes cluster.
>
> Firstly, get a list of all devices in the system:
>
> $ sudo ifconfig
>
> Next, set all devices down with:
>
> $ sudo ifconfig <device> down
>
> Next, bring up the QLogic QL41xxx device:
>
> $ sudo ifconfig <qlogic nic device> up
>
> Then, attempt to look up an internal Kubernetes domain:
>
> $ nslookup <internal kubernetes domain address>
>
> Without the patch, the connection will time out:
>
> ;; connection timed out; no servers could be reached
>
> If we look at packet traces with tcpdump, we see that the request leaves the
> source but never arrives at the destination.
>
> There is a test kernel available in the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
>
> If you install it, then Kubernetes internal DNS lookups will succeed.
>
> [Where problems could occur]
>
> If a regression were to occur, then users of the qede driver would be affected.
> This is limited to those with QLogic QL41xxx series NICs. The patch explicitly
> checks for IPIP type packets, so only those particular packets would be affected.
>
> Since most packets are not IPIP tunnelled, a regression would not cause a
> total outage. It could, however, cause problems for users who frequently
> handle VPN or Kubernetes internal DNS traffic.
>
> A workaround would be to use ethtool to disable tx csum offload for all packet
> types, or to revert to an older kernel.
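
The ethtool workaround mentioned above could look roughly like the following. This is a sketch under assumptions: the helper name `disable_tx_csum` and the overridable `ETHTOOL` variable are illustrative, and the device name has to be supplied by the user.

```shell
# Hypothetical helper (not from the original report): turn off tx checksum
# offload for every packet type on one device. ETHTOOL may be overridden,
# e.g. for a dry run.
disable_tx_csum() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: disable_tx_csum <device>" >&2
        return 1
    fi
    # -K changes offload settings; "tx off" disables transmit checksumming.
    ${ETHTOOL:-ethtool} -K "$dev" tx off
}
```

Run it as root, for example `disable_tx_csum <qlogic nic device>`. `ethtool -k <device>` shows the current offload state. The setting does not persist across reboots, so it would need to be reapplied at boot.
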
>
> Manish Chopra (1):
> qede: fix offload for IPIP tunnel packets
>
> drivers/net/ethernet/qlogic/qede/qede_fp.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> --
> 2.27.0