[SRU B] [PATCH 0/1] qlcnic: Firmware aborts/hangs in QLogic NIC

Guilherme G. Piccoli gpiccoli at canonical.com
Thu Feb 7 12:44:21 UTC 2019


BugLink: http://bugs.launchpad.net/bugs/1815033

[Impact]

* In multi-queue configurations of the qlcnic driver, there is a corner
  case in which TX queue zero is used for regular data transmission by
  one CPU while, at the same time, another CPU uses the same queue's
  descriptor for MAC configuration.

* When this race happens, it can corrupt TX queue zero, which in turn
  triggers seemingly random firmware aborts/hangs. The following kernel
  log messages were collected during the corruption event:

  qlcnic 0000:01:00.0: Pause control frames disabled on all ports
  qlcnic 0000:01:00.0: firmware hang detected
  qlcnic 0000:01:00.0: Dumping hw/fw registers
  PEG_HALT_STATUS1: 0x40001502, PEG_HALT_STATUS2: 0x3de7a0,
  PEG_NET_0_PC: 0x6d268, PEG_NET_1_PC: 0x6d2ac,
  PEG_NET_2_PC: 0x149, PEG_NET_3_PC: 0x6e105,
  PEG_NET_4_PC: 0x1e00b
  [...]
  qlcnic 0000:01:00.0: Detected state change from DEV_NEED_RESET, skipping ack check

* The following device is known to suffer from the issue (lspci output),
  although a whole class of devices (named 82XX series by the vendor)
  is susceptible to it:
  01:00.0 Ethernet controller [0200]: QLogic Corp. cLOM8214 1/10GbE Controller [1077:8020]

* The fix is the following patch, present in the mainline kernel as well
  as in the supported stable branches:
  c333fa0c4f22 ("qlcnic: fix Tx descriptor corruption on 82xx devices").
  Link to the patch in Linus' tree: http://git.kernel.org/linus/c333fa0c4f22
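
  The race described above can be illustrated with a small user-space
  model. This is only a sketch under stated assumptions, not the driver's
  code: the ring layout, RING_SIZE, and every function name below are
  hypothetical, and the two-CPU interleaving is replayed by hand in a
  single thread so that the outcome is deterministic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8   /* hypothetical size, for illustration only */
#define NUM_RINGS 2   /* ring 0 and one additional TX ring */

enum desc_kind { DESC_EMPTY = 0, DESC_DATA, DESC_MAC_CFG };

/* Hypothetical, heavily simplified model of a TX descriptor ring. */
struct tx_ring {
    uint32_t producer;              /* index of the next free slot */
    enum desc_kind desc[RING_SIZE];
};

/* Step 1 of posting: read the slot the producer index points at. */
static uint32_t claim_slot(const struct tx_ring *ring)
{
    return ring->producer;
}

/* Step 2 of posting: write the descriptor and advance the producer. */
static void fill_slot(struct tx_ring *ring, uint32_t slot,
                      enum desc_kind kind)
{
    assert(slot < RING_SIZE);
    ring->desc[slot] = kind;
    ring->producer = (slot + 1) % RING_SIZE;
}

/* Count the non-empty descriptors currently stored in a ring. */
static unsigned count_posted(const struct tx_ring *ring)
{
    unsigned n = 0;
    for (uint32_t i = 0; i < RING_SIZE; i++)
        if (ring->desc[i] != DESC_EMPTY)
            n++;
    return n;
}

/*
 * Problem case: CPU A sends data on ring 0 while CPU B, which transmits
 * on ring 1, posts its MAC configuration on ring 0 as well. Both claim
 * the same slot before either fills it, so one descriptor is lost.
 * Returns how many of the 2 posted descriptors survive.
 */
unsigned replay_shared_ring0(void)
{
    struct tx_ring rings[NUM_RINGS];
    memset(rings, 0, sizeof(rings));

    uint32_t slot_a = claim_slot(&rings[0]);     /* CPU A */
    uint32_t slot_b = claim_slot(&rings[0]);     /* CPU B: same slot */
    fill_slot(&rings[0], slot_a, DESC_DATA);
    fill_slot(&rings[0], slot_b, DESC_MAC_CFG);  /* overwrites CPU A */

    return count_posted(&rings[0]) + count_posted(&rings[1]);
}

/*
 * Same interleaving, but CPU B posts the MAC configuration on the ring
 * it is already transmitting on (ring 1), so the rings never overlap.
 */
unsigned replay_per_cpu_ring(void)
{
    struct tx_ring rings[NUM_RINGS];
    memset(rings, 0, sizeof(rings));

    uint32_t slot_a = claim_slot(&rings[0]);     /* CPU A, ring 0 */
    uint32_t slot_b = claim_slot(&rings[1]);     /* CPU B, ring 1 */
    fill_slot(&rings[0], slot_a, DESC_DATA);
    fill_slot(&rings[1], slot_b, DESC_MAC_CFG);

    return count_posted(&rings[0]) + count_posted(&rings[1]);
}
```

  In this model, replay_shared_ring0() loses one of the two descriptors,
  while replay_per_cpu_ring() keeps both, which mirrors the idea of
  restricting the MAC configuration to the ring already used for regular
  transmission.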

[Test Case]

* Unfortunately this is not easy to reproduce in general, but we have a
  user report with a fairly reliable reproducer: the user is running an
  NFS workload on top of the above PCI adapter. The problem happens in
  both kernels 4.4 and 4.15, and the patch proposed here for SRU fixes
  it in both of them.
  (Note that this is a Bionic-only SRU, since the Ubuntu 4.4 kernel
  already got the patch from Greg's supported stable branch).

[Regression Potential]

* The patch scope is restricted to a single driver, and the change
  itself is self-contained: basically a restriction to a specific
  tx_ring when setting filters. There is potential for regressions in
  this driver path, which could for example cause different firmware
  issues, but user testing showed good reliability: without the patch,
  the issue happens ~6h after machine boot; with the patch, the machine
  ran for more than 8 days without issues.

* Also, the patch is present in the mainline kernel as well as in the
  supported stable branches, and is already present in the Ubuntu 4.4
  kernel.

Shahed Shaikh (1):
  qlcnic: fix Tx descriptor corruption on 82xx devices

 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h         |  8 +++++---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |  3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h |  3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h      |  3 ++-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c      | 12 ++++++------
 5 files changed, 17 insertions(+), 12 deletions(-)

-- 
2.19.2



