[Bug 1920991] Re: Ubuntu 20.04 - NVMe/IB I/O error detected while manually resetting controller

Christian Ehrhardt  1920991 at bugs.launchpad.net
Wed Mar 24 06:24:55 UTC 2021


Hi Jennifer,

I'm sorry, but without further information all I can confirm is that you
do indeed have I/O errors:

 381 Mar 23 12:23:58 ICTM1605S01H4 kernel: [ 1141.462069] blk_update_request: I/O error, dev nvme0c0n43, sector 2537840 op 0x0:(READ) flags 0x4004000 phys_seg 66 prio class 0
 391 Mar 23 12:23:58 ICTM1605S01H4 kernel: [ 1141.464827] Buffer I/O error on dev nvme0n43, logical block 0, async page read
 394 Mar 23 12:23:58 ICTM1605S01H4 kernel: [ 1141.465199] ldm_validate_partition_table(): Disk read failed.

What may be the underlying issue is a problematic RDMA connection:

 566 Mar 23 12:24:18 ICTM1605S01H4 kernel: [ 1161.461659] nvme nvme1: rdma connection establishment failed (-104)
 567 Mar 23 12:24:18 ICTM1605S01H4 kernel: [ 1161.461673] nvme nvme1: Failed reconnect attempt 1
 568 Mar 23 12:24:18 ICTM1605S01H4 kernel: [ 1161.461678] nvme nvme1: Reconnecting in 10 seconds...
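
To check whether the other hosts show the same two patterns, the log
lines can be scanned mechanically. Below is a minimal Python sketch, not
a tool used in this bug: the function name summarize() and both regular
expressions are my own assumptions, derived only from the message
formats quoted above. It groups block-layer I/O errors by device and
lists the failed RDMA connection attempts with their kernel timestamps.

```python
import re
from collections import defaultdict

# Patterns modelled on the kernel messages quoted in this mail (assumed formats).
IO_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] blk_update_request: I/O error, "
    r"dev (?P<dev>[^,\s]+), sector (?P<sector>\d+) "
    r"op 0x[0-9a-f]+:\((?P<op>\w+)\)"
)
RDMA_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] nvme (?P<ctrl>nvme\d+): "
    r"rdma connection establishment failed \((?P<errno>-?\d+)\)"
)


def summarize(lines):
    """Return (io_errors, rdma_failures) found in an iterable of log lines."""
    io_errors = defaultdict(list)  # device -> [(timestamp, op, sector)]
    rdma_failures = []             # [(timestamp, controller, errno)]
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            io_errors[m["dev"]].append(
                (float(m["ts"]), m["op"], int(m["sector"]))
            )
            continue
        m = RDMA_FAIL.search(line)
        if m:
            rdma_failures.append(
                (float(m["ts"]), m["ctrl"], int(m["errno"]))
            )
    return dict(io_errors), rdma_failures
```

It could be fed a saved kernel log, for example
summarize(open("kern.log")), where the file name is only a placeholder.
Comparing the timestamps of the first I/O error and the first RDMA
failure per host would show which of the two comes first.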

Unfortunately, the logs do not contain more than what you've already
found, and your description is not detailed enough to "try the same in
other environments".

You wrote
"
  On all four of my Ubuntu 20.04 hosts, an I/O error is detected almost
  immediately after my E-Series storage controller ***???***.
"

I've marked the missing spot in your text. What steps exactly did you
take to trigger this error? The sentence in the bug report ends rather
abruptly.

Furthermore, is that smash tool a helper for running tests or stress
loads against the disks? If so, where did you get it? I have not found
any good hits for it.

And finally, if Linux sees just I/O errors and failing RDMA
connections, there should be something on the other end as well, at
least disconnects or similar. Is there anything on the NetApp side that
might help to understand this?

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nvme-cli in Ubuntu.
https://bugs.launchpad.net/bugs/1920991

Title:
  Ubuntu 20.04 - NVMe/IB I/O error detected while manually resetting
  controller

Status in nvme-cli package in Ubuntu:
  Incomplete

Bug description:
  On all four of my Ubuntu 20.04 hosts, an I/O error is detected almost
  immediately after my E-Series storage controller. I am currently
  running with Ubuntu 20.04, kernel-5.4.0-67-generic,
  rdma-core-28.0-1ubuntu1, nvme-cli-1.9-1ubuntu0.1, and native NVMe
  multipathing enabled. These messages appear to coincide with when my
  test fails:

  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.616408] blk_update_request: I/O error, dev nvme0c0n12, sector 289440 op 0x1:(WRITE) flags 0x400c800 phys_seg 6 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.616433] blk_update_request: I/O error, dev nvme0c0n12, sector 291488 op 0x1:(WRITE) flags 0x4008800 phys_seg 134 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.617137] blk_update_request: I/O error, dev nvme0c0n12, sector 295048 op 0x1:(WRITE) flags 0x4008800 phys_seg 87 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.617184] blk_update_request: I/O error, dev nvme0c0n12, sector 293000 op 0x1:(WRITE) flags 0x400c800 phys_seg 180 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.617624] blk_update_request: I/O error, dev nvme0c0n12, sector 298608 op 0x1:(WRITE) flags 0x4008800 phys_seg 47 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.617678] blk_update_request: I/O error, dev nvme0c0n12, sector 296560 op 0x1:(WRITE) flags 0x400c800 phys_seg 62 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.618070] blk_update_request: I/O error, dev nvme0c0n12, sector 302160 op 0x1:(WRITE) flags 0x4008800 phys_seg 24 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.618084] blk_update_request: I/O error, dev nvme0c0n12, sector 300112 op 0x1:(WRITE) flags 0x400c800 phys_seg 47 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.618497] blk_update_request: I/O error, dev nvme0c0n12, sector 305712 op 0x1:(WRITE) flags 0x4008800 phys_seg 25 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.618521] blk_update_request: I/O error, dev nvme0c0n12, sector 303664 op 0x1:(WRITE) flags 0x400c800 phys_seg 63 prio class 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.640763] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.641099] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.641305] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.641317] ldm_validate_partition_table(): Disk read failed.
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.641551] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.641751] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.641955] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.642160] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.642172] Dev nvme0n12: unable to read RDB block 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.642394] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.642600] Buffer I/O error on dev nvme0n12, logical block 3, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.642802] Buffer I/O error on dev nvme0n12, logical block 0, async page read
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.643015] nvme0n12: unable to read partition table
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.653495] ldm_validate_partition_table(): Disk read failed.
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.654188] Dev nvme0n20: unable to read RDB block 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.654850] nvme0n20: unable to read partition table
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.665151] ldm_validate_partition_table(): Disk read failed.
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.665673] Dev nvme0n126: unable to read RDB block 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.666194] nvme0n126: unable to read partition table
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.685662] ldm_validate_partition_table(): Disk read failed.
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.686504] Dev nvme0n124: unable to read RDB block 0
  Mar 23 12:23:58 ICTM1605S01H1 kernel: [ 1232.687187] nvme0n124: unable to read partition table

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nvme-cli/+bug/1920991/+subscriptions


