APPLIED: [SRU B] [PATCH 0/1] qlcnic: Firmware aborts/hangs in QLogic NIC

Khaled Elmously khalid.elmously at canonical.com
Mon Feb 18 08:59:34 UTC 2019


On 2019-02-07 10:44:21, Guilherme G. Piccoli wrote:
> BugLink: http://bugs.launchpad.net/bugs/1815033
> 
> [Impact]
> 
> * In multi-queue configurations of the qlcnic driver, there is a corner
>   case in which one CPU uses TX queue zero for regular data transmission
>   while another CPU uses the same queue's descriptors for MAC config.
> 
> * When this race occurs, it can corrupt TX queue zero, and the net
>   result is firmware aborts/hangs with no apparent trigger. The
>   following kernel log messages were collected during a corruption
>   event:
> 
>   qlcnic 0000:01:00.0: Pause control frames disabled on all ports
>   qlcnic 0000:01:00.0: firmware hang detected
>   qlcnic 0000:01:00.0: Dumping hw/fw registers
>   PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3de7a0,
>   PEG_NET_0_PC: 0x6d268, PEG_NET_1_PC: 0x6d2ac,
>   PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6e105,
>   PEG_NET_4_PC: 0x1e00b
>   [...]
>   qlcnic 0000:01:00.0: Detected state change from DEV_NEED_RESET, skipping ack check
> 
> * The following device is known to suffer from the issue (lspci output),
>   although the vendor's whole 82XX series of devices is susceptible to
>   it:
>   01:00.0 Ethernet controller [0200]: QLogic Corp. cLOM8214 1/10GbE Controller [1077:8020]
> 
> * The fix is the following patch, present in the mainline kernel as well
>   as in the supported stable branches:
>   c333fa0c4f22 ("qlcnic: fix Tx descriptor corruption on 82xx devices").
>   Link to the patch in Linus' tree: http://git.kernel.org/linus/c333fa0c4f22
> 
> [Test Case]
> 
> * Unfortunately this is not easy to reproduce. We have a user report of
>   the issue with a fairly reliable reproducer: the user runs an NFS
>   workload on top of the above PCI adapter. The problem occurs on both
>   the 4.4 and 4.15 kernels, and the patch proposed here for SRU fixes it
>   on both.
>   (Note that this is a Bionic-only SRU, since the Ubuntu 4.4 kernel got
>   the patch from Greg's supported stable branch.)
> 
> [Regression Potential]
> 
> * The patch scope is restricted to a single driver, and the code itself
>   is self-contained: basically a restriction to a specific tx_ring when
>   setting filters. A regression in this driver path could, for example,
>   cause different firmware issues, but user testing showed good
>   reliability: without the patch the issue happens ~6h after machine
>   boot, whereas with the patch the machine ran for more than 8 days
>   without issues.
> 
> * Also, the patch is present in the mainline kernel as well as in the
>   supported stable branches, and is already present in the Ubuntu 4.4
>   kernel.
> 
> Shahed Shaikh (1):
>   qlcnic: fix Tx descriptor corruption on 82xx devices
> 
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic.h         |  8 +++++---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |  3 ++-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h |  3 ++-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h      |  3 ++-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c      | 12 ++++++------
>  5 files changed, 17 insertions(+), 12 deletions(-)
> 
> -- 
> 2.19.2
> 
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
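
The race described under [Impact], and the "restriction to specific
tx_ring" mentioned under [Regression Potential], can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not
the qlcnic driver code: every name below (struct tx_ring, post_descriptor,
mac_config_always_ring0, mac_config_same_ring, run_scenario) is
hypothetical, and the model assumes each CPU holds a per-queue lock only
for the ring it is currently transmitting on.

```c
/* Simplified userspace model of the race; NOT the qlcnic driver code.
 * All types and function names here are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TX_RINGS 4
#define NO_CPU (-1)

struct tx_ring {
    int locked_by;    /* CPU assumed to hold this queue's lock, or NO_CPU */
    int producer;     /* next free descriptor slot */
    int unsafe_posts; /* descriptors written without holding the lock */
};

static void rings_init(struct tx_ring *rings, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        rings[i].locked_by = NO_CPU;
        rings[i].producer = 0;
        rings[i].unsafe_posts = 0;
    }
}

/* Write one descriptor and record whether the caller held the lock. */
static void post_descriptor(struct tx_ring *ring, int cpu)
{
    assert(ring != NULL);
    if (ring->locked_by != cpu)
        ring->unsafe_posts++;
    ring->producer++;
}

/* Pattern described as buggy: MAC config always goes to ring 0,
 * regardless of which ring the calling CPU is transmitting on. */
static void mac_config_always_ring0(struct tx_ring *rings, int cpu)
{
    post_descriptor(&rings[0], cpu);
}

/* Pattern described as the fix: MAC config is posted on the same ring
 * the caller is already transmitting on (and holds the lock for). */
static void mac_config_same_ring(struct tx_ring *ring, int cpu)
{
    post_descriptor(ring, cpu);
}

/* CPU 0 transmits on ring 0 while CPU 1 transmits on ring 1 and needs to
 * configure a MAC filter. Returns the number of unlocked writes to ring 0. */
static int run_scenario(bool fixed)
{
    struct tx_ring rings[NUM_TX_RINGS];

    rings_init(rings, NUM_TX_RINGS);
    rings[0].locked_by = 0;
    rings[1].locked_by = 1;

    post_descriptor(&rings[0], 0);         /* CPU 0: regular data */
    if (fixed)
        mac_config_same_ring(&rings[1], 1);
    else
        mac_config_always_ring0(rings, 1); /* CPU 1 touches ring 0 */
    post_descriptor(&rings[1], 1);         /* CPU 1: regular data */

    return rings[0].unsafe_posts;
}
```

In this model, run_scenario(false) counts one write to ring 0 by a CPU that
does not hold its lock, while run_scenario(true) counts none. The real
driver's locking and descriptor handling are more involved; see commit
c333fa0c4f22 for the actual change.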


