[SRU X] [PATCH 0/1] UBUNTU: SAUCE: bnxt_en_bpo: Fix TX timeout during netpoll

Juerg Haefliger juerg.haefliger at canonical.com
Tue Feb 26 09:37:39 UTC 2019


On Fri, 22 Feb 2019 11:20:18 +0100
Nivedita Singhvi <nivedita.singhvi at canonical.com> wrote:

> BugLink: http://bugs.launchpad.net/bugs/1814095
> 
> 
> [Impact]
> 
> The bnxt_en_bpo driver hit TX timeouts, causing network stalls on the system
> and failures to send data and heartbeat packets.
> 
> The following error was seen once on a 25Gb Broadcom NIC on an amd64 host
> running Xenial with the 4.4.0-141-generic kernel under moderate to heavy
> network traffic:
> 
> * The bnxt_en_bpo driver froze on a "TX timed out" error and triggered the
>   Netdev Watchdog timer under load.
> 
> * From kernel log:
>   "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
>   See attached kern.log excerpt file for full excerpt of error log.
> 
> * Release = Xenial
>   Kernel = 4.4.0-141-generic #167
>   eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet
> 
> * This caused the driver to reset in order to recover:
> 
>   "bnxt_en_bpo 0000:19:00.1 eno2d1: TX timeout detected, starting reset task!"
> 
>   driver: bnxt_en_bpo
>   version: 1.8.1
>   source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()
> 
> * The loss of connectivity and softirq stall caused other cascading failures
>   on the system.
> 
> * The bnxt_en_bpo driver is the imported Broadcom driver pulled in to support
>   newer Broadcom HW (specific boards) while the bnxt_en module continues to
>   support the older HW. The current Linux upstream driver does not compile
>   easily with the 4.4 kernel (too many changes).
> 
> * This upstream fix to the in-tree bnxt_en driver is a likely solution:
>   "bnxt_en: Fix TX timeout during netpoll"
>   commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
> 
>   This fix has not been applied to the bnxt_en_bpo driver, but review of the
>   code indicates that it is susceptible to the same bug and that the fix
>   would be reasonable.
> 
> 
> [Test Case]
> 
> * Unfortunately, this is not easy to reproduce. Also, it is only seen on
>   4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo driver.
> 
> 
> [Regression Potential]
> 
> * The patch is restricted to the bpo driver, with very constrained scope
>   - just the newest Broadcom NICs being used by the Xenial 4.4 kernel (as
>   opposed to the hwe 4.15 etc. kernels, which would have the in-tree fixed
>   driver).

4.15 doesn't have this patch.


> * The patch is very small, and the backport is minimal and simple.
> 
> * The fix has been running in the in-tree driver in upstream mainline as well
>   as in the Ubuntu Linux in-tree driver. Although the Broadcom driver has a
>   lot of lower-level code that is different, this piece is still the same.

I'm a little reluctant to ACK this given that a) it's an upstream patch that
is applied to an out-of-tree vendor driver and b) the problem can't be
reproduced (or can it?). If I gave you a test kernel with a newer Broadcom
driver, would you be able to do some testing?

The latest Broadcom driver is version 1.9.2 from 01/02/2019 but it doesn't
have the patch that you want.
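
Since the failure can't be triggered on demand, one way to judge a test kernel
is to scan kern.log for the two messages quoted in the report. Below is a
minimal sketch in Python, assuming the log lines keep the format shown above.
The function name and the regular expressions are illustrative only and are not
part of any existing tool.

```python
import re

# Patterns modeled on the two log lines quoted in the bug report. Whether other
# kernels or driver versions print exactly this format is an assumption.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
RESET_RE = re.compile(
    r"(?P<driver>\S+) (?P<pci>[0-9a-fA-F:.]+) (?P<iface>\S+): "
    r"TX timeout detected, starting reset task!"
)


def find_tx_timeouts(lines, driver="bnxt_en_bpo"):
    """Return (kind, iface, detail) tuples for log lines emitted for `driver`.

    kind is "watchdog" (detail = queue number) or "reset" (detail = PCI address).
    """
    hits = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m and m.group("driver") == driver:
            hits.append(("watchdog", m.group("iface"), int(m.group("queue"))))
            continue
        m = RESET_RE.search(line)
        if m and m.group("driver") == driver:
            hits.append(("reset", m.group("iface"), m.group("pci")))
    return hits
```

Feeding it the lines of /var/log/kern.log (for example via open(...)) gives a
quick yes/no on whether the timeout recurred during a test run. It says nothing
about the root cause.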

...Juerg
 
