APPLIED: [SRU B] [PATCH 0/1] qlcnic: Firmware aborts/hangs in QLogic NIC
Khaled Elmously
khalid.elmously at canonical.com
Mon Feb 18 08:59:34 UTC 2019
On 2019-02-07 10:44:21, Guilherme G. Piccoli wrote:
> BugLink: http://bugs.launchpad.net/bugs/1815033
>
> [Impact]
>
> * In multi-queue configurations of the qlcnic driver, there is a corner
> case in which TX queue zero is used for regular data transmission by
> one CPU while, at the same time, another CPU uses the same queue's
> descriptors for MAC configuration.
>
> * When this race happens, it can corrupt TX queue zero, which in turn
> triggers seemingly random firmware aborts/hangs. The following kernel
> log messages were collected during the corruption event:
>
> qlcnic 0000:01:00.0: Pause control frames disabled on all ports
> qlcnic 0000:01:00.0: firmware hang detected
> qlcnic 0000:01:00.0: Dumping hw/fw registers
> PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3de7a0,
> PEG_NET_0_PC: 0x6d268, PEG_NET_1_PC: 0x6d2ac,
> PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6e105,
> PEG_NET_4_PC: 0x1e00b
> [...]
> qlcnic 0000:01:00.0: Detected state change from DEV_NEED_RESET, skipping ack check
>
> * The following device is known to suffer from the issue (lspci output),
> although the whole class of devices (the vendor's 82XX series) is
> susceptible to it:
> 01:00.0 Ethernet controller [0200]: QLogic Corp. cLOM8214 1/10GbE Controller [1077:8020]
>
> * The fix is the following patch, present in the mainline kernel as well
> as in the supported stable branches:
> c333fa0c4f22 ("qlcnic: fix Tx descriptor corruption on 82xx devices").
> Link to the patch in Linus' tree: http://git.kernel.org/linus/c333fa0c4f22
>
> [Test Case]
>
> * Unfortunately this is not easy to reproduce in general, but we have a
> user report with a fairly reliable reproducer: the user runs an NFS
> workload on top of the above PCI adapter. The problem occurs on both
> the 4.4 and 4.15 kernels, and goes away on both with the patch
> proposed here for SRU.
> (Note that this is a Bionic-only SRU, since the Ubuntu 4.4 kernel
> already got the patch from Greg's supported stable branch.)
>
> [Regression Potential]
>
> * The patch scope is restricted to a single driver, and the change
> itself is self-contained: basically a restriction to a specific
> tx_ring when setting filters. A regression in this driver path could,
> for example, cause different firmware issues, but user testing showed
> the fix to be reliable: without the patch the issue happens about 6
> hours after boot, while with the patch the machine ran for more than
> 8 days without issues.
>
> * The patch is also present in the mainline kernel and in the supported
> stable branches, and is already included in the Ubuntu 4.4 kernel.
>
> Shahed Shaikh (1):
>   qlcnic: fix Tx descriptor corruption on 82xx devices
>
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic.h         |  8 +++++---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |  3 ++-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h |  3 ++-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h      |  3 ++-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c      | 12 ++++++------
>  5 files changed, 17 insertions(+), 12 deletions(-)
>
> --
> 2.19.2
>
>
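For readers unfamiliar with the race described under [Impact], here is a
minimal user-space sketch in C. It is not the driver code: every name in it
(struct tx_ring, ring_post, xmit_old, xmit_fixed, foreign_writes, the ring
and descriptor counts) is hypothetical, and it assumes that each TX queue's
transmit path is serialized only against itself, so a descriptor posted to a
ring from another queue's transmit path counts as a potentially
unsynchronized write. It only illustrates why restricting the MAC config
descriptor to the ring already in use removes cross-queue writes to ring 0.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TX_RINGS 4  /* hypothetical number of TX queues */
#define RING_SIZE    8  /* hypothetical descriptors per ring */

enum desc_kind { DESC_DATA, DESC_MAC_CONFIG };

/* Simplified model of one TX ring (not the real qlcnic structure). */
struct tx_ring {
    unsigned int producer;           /* next free descriptor slot */
    enum desc_kind desc[RING_SIZE];  /* what was posted in each slot */
    int posted_by_cpu[RING_SIZE];    /* which CPU posted each slot */
};

struct adapter {
    struct tx_ring rings[NUM_TX_RINGS];
};

static void adapter_init(struct adapter *a)
{
    for (size_t i = 0; i < NUM_TX_RINGS; i++)
        a->rings[i].producer = 0;
}

/* Post one descriptor to a ring and advance its producer index. */
static void ring_post(struct tx_ring *ring, enum desc_kind kind, int cpu)
{
    assert(ring->producer < RING_SIZE);
    ring->desc[ring->producer] = kind;
    ring->posted_by_cpu[ring->producer] = cpu;
    ring->producer++;
}

/* Pre-fix model: MAC config always goes through ring 0, whatever ring the
 * data is sent on. */
static void xmit_old(struct adapter *a, int ring_id, int cpu)
{
    ring_post(&a->rings[0], DESC_MAC_CONFIG, cpu);
    ring_post(&a->rings[ring_id], DESC_DATA, cpu);
}

/* Post-fix model: MAC config uses the same ring as the data. */
static void xmit_fixed(struct adapter *a, int ring_id, int cpu)
{
    struct tx_ring *ring = &a->rings[ring_id];

    ring_post(ring, DESC_MAC_CONFIG, cpu);
    ring_post(ring, DESC_DATA, cpu);
}

/* Count descriptors on a ring posted by a CPU other than its owner. */
static int foreign_writes(const struct tx_ring *ring, int owner_cpu)
{
    int count = 0;

    for (unsigned int i = 0; i < ring->producer; i++)
        if (ring->posted_by_cpu[i] != owner_cpu)
            count++;
    return count;
}
```

In this model, with CPU 0 transmitting on ring 0 and CPU 2 on ring 2,
xmit_old() leaves a descriptor on ring 0 that was posted by CPU 2, while
xmit_fixed() leaves ring 0 touched only by CPU 0.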
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team