[PATCH 0/3][SRU][N] NVMe namespace ID mismatch on repeated map/unmap

Heitor Alves de Siqueira halves at canonical.com
Fri Jul 4 18:12:25 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2115209

SRU Justification:

[Impact]
During repeated namespace (NS) map/unmap operations in ONTAP, which trigger
Namespace Attribute Changed AENs, and where newly mapped NSs reuse an old NSID,
an Ubuntu 24.04 NVMe/TCP host occasionally ends up with inconsistent device
nodes: the NVMe block device (e.g. /dev/nvmeXnY) is present, but the
corresponding NVMe generic character device (e.g. /dev/ngXnY) is not. The issue
does not occur when the same NS is remapped to its original NSID; it is only hit
when a new NS is mapped to an NSID previously used by a different NS.

The following errors appear in the messages file when the host is in this
inconsistent state:

...
kernel: [267011.744167][ T2016] nvme nvme6: rescanning namespaces.
kernel: [267011.744347][T46805] nvme nvme2: rescanning namespaces.
kernel: [267011.750418][ T7876] nvme nvme1: rescanning namespaces.
kernel: [267011.784466][ T2016] nvme nvme6: IDs don't match for shared namespace 1
kernel: [267011.784791][T46805] nvme nvme2: IDs don't match for shared namespace 1
kernel: [267011.790843][ T7876] nvme nvme1: IDs don't match for shared namespace 1
kernel: [267011.804852][ T2016] nvme nvme6: IDs don't match for shared namespace 2
kernel: [267011.804867][T46805] nvme nvme2: IDs don't match for shared namespace 2
kernel: [267011.810788][ T7876] nvme nvme1: IDs don't match for shared namespace 2
kernel: [267011.824600][ T2016] nvme nvme6: IDs don't match for shared namespace 3
kernel: [267011.825114][T46805] nvme nvme2: IDs don't match for shared namespace 3
kernel: [267011.830982][ T7876] nvme nvme1: IDs don't match for shared namespace 3
kernel: [267011.844712][ T2016] nvme nvme6: duplicate IDs in subsystem for nsid 4
kernel: [267011.845161][T46805] nvme nvme2: duplicate IDs in subsystem for nsid 4
kernel: [267011.851060][ T7876] nvme nvme1: duplicate IDs in subsystem for nsid 4
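The "IDs don't match for shared namespace" messages appear to come from the host
comparing the identifiers (NGUID, EUI-64, UUID, command set) reported for an
NSID against those it already holds for that NSID. The sketch below is a
minimal userspace model of that comparison, assuming a simplified identifier
struct; `struct ns_ids`, `ns_ids_equal()` and `attach_to_head()` are
hypothetical names for illustration, not the nvme driver's code. It shows why a
new NS reusing an NSID is rejected while the host still holds the previous NS's
identifiers, whereas remapping the same NS passes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the identifiers a host records for a namespace.
 * Field names are illustrative only, not the kernel's layout. */
struct ns_ids {
	uint8_t nguid[16];
	uint8_t eui64[8];
	uint8_t uuid[16];
	uint8_t csi;      /* command set identifier */
};

/* Two descriptors describe the same namespace only if every ID matches. */
static bool ns_ids_equal(const struct ns_ids *a, const struct ns_ids *b)
{
	return memcmp(a->nguid, b->nguid, sizeof(a->nguid)) == 0 &&
	       memcmp(a->eui64, b->eui64, sizeof(a->eui64)) == 0 &&
	       memcmp(a->uuid, b->uuid, sizeof(a->uuid)) == 0 &&
	       a->csi == b->csi;
}

enum attach_result { ATTACH_OK, ATTACH_ID_MISMATCH };

/* Model of adding a path for an NSID the host already tracks: if the
 * stored IDs still belong to the previously mapped NS, the new NS is
 * rejected (the "IDs don't match" case). */
static enum attach_result attach_to_head(const struct ns_ids *stored,
					 const struct ns_ids *reported)
{
	return ns_ids_equal(stored, reported) ? ATTACH_OK : ATTACH_ID_MISMATCH;
}
```

In this model a remap of the same NS (identical IDs) attaches cleanly, which
matches the observation that same-NS remaps are unaffected; only a different NS
behind a reused NSID hits the mismatch.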

[Fix]
The following upstream commits are required:

  9546ad1a9bda nvme: requeue namespace scan on missed AENs
  62baf70c3274 nvme: re-read ANA log page after ns scan completes
  26d7fb4fd4ca nvme: fixup scan failure for non-ANA multipath controllers

$ git describe --contains 9546ad1a9bda 62baf70c3274 26d7fb4fd4ca
v6.15-rc2~11^2~1^2~11
v6.15-rc2~11^2~1^2~10
v6.15-rc3~27^2^2~5

These commits are already in the Plucky tree, and the Questing kernel appears to
be based on v6.15 already, so only Noble needs the cherry-picks.
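The first commit's title describes requeueing the namespace scan on missed AENs.
A rough single-threaded model of that idea follows: if the "namespace changed"
event is pending again after a scan has consumed it, one more scan is queued
instead of the event being left unhandled. Everything here (`struct
ctrl_model`, `aen_ns_changed()`, `scan_work()`, `run_until_idle()`) is
hypothetical, and the way an AEN's scan request gets lost is modeled with an
explicit flag rather than taken from the driver.

```c
#include <stdbool.h>

/* Hypothetical controller model; names are illustrative only. */
struct ctrl_model {
	bool ns_changed;            /* stands in for the NS-changed event bit */
	bool scan_queued;           /* a scan work item is pending */
	int scans_run;              /* number of scans executed so far */
	bool requeue_on_missed_aen; /* toggles the behaviour of the fix */
};

/* Record a Namespace Attribute Changed AEN. 'queued' says whether a scan
 * was queued for it; false models a lost scan request (an assumption). */
static void aen_ns_changed(struct ctrl_model *c, bool queued)
{
	c->ns_changed = true;
	if (queued)
		c->scan_queued = true;
}

/* One run of the scan work. 'aen_mid_scan' injects an AEN that arrives
 * after the event bit was consumed but before the scan finishes. */
static void scan_work(struct ctrl_model *c, bool aen_mid_scan)
{
	c->scan_queued = false;
	c->ns_changed = false;      /* consume the pending event */
	c->scans_run++;             /* namespaces would be rescanned here */

	if (aen_mid_scan)
		aen_ns_changed(c, false);

	/* The idea of the fix: an event still pending after the scan
	 * causes another scan to be queued. */
	if (c->requeue_on_missed_aen && c->ns_changed)
		c->scan_queued = true;
}

/* Run queued scans until none remain; only the first scan sees the
 * injected mid-scan AEN. Returns the total number of scans run. */
static int run_until_idle(struct ctrl_model *c, bool aen_mid_first_scan)
{
	bool inject = aen_mid_first_scan;

	while (c->scan_queued) {
		scan_work(c, inject);
		inject = false;
	}
	return c->scans_run;
}
```

With `requeue_on_missed_aen` disabled, the model stops after one scan with the
event still pending; with it enabled, a second scan consumes the event. That
extra scan is also where the delays discussed under [Where Problems Could Occur]
would come from.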

[Test Case]
The ns-stress.sh script should be able to reproduce this: it repeatedly creates
and deletes NVMe namespaces that reuse the same NSID. A run on an affected
system looks like this:

# ./ns-stress.sh /dev/nvme2
Starting test with parameters:
Controller: /dev/nvme2
NSID: 1
Iterations: 100
Size1: 0x200000
Size2: 0x400000
Iteration 1/100
create-ns: Success, created nsid:1
attach-ns: Success, nsid:1
Char device missing after first attach
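The failure reported above is a block device without its generic char device.
A hypothetical C helper for that check is sketched below; `format_ns_paths()`
and `check_ns_nodes()` are illustrative names, not part of ns-stress.sh, whose
source is not shown here. X and Y follow the /dev/nvmeXnY and /dev/ngXnY
pattern from [Impact]; Y is a kernel-assigned namespace instance number, which
may differ from the NSID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Outcome of checking the device nodes for one namespace. */
enum ns_node_state {
	NS_NODES_OK,            /* block and char device both present */
	NS_NODES_CHAR_MISSING,  /* the inconsistency described in this bug */
	NS_NODES_BLOCK_MISSING,
	NS_NODES_BOTH_MISSING,
};

/* Hypothetical helper: build the /dev/nvmeXnY and /dev/ngXnY paths. */
static void format_ns_paths(int x, int y, char *blk, size_t blk_len,
			    char *chr, size_t chr_len)
{
	snprintf(blk, blk_len, "/dev/nvme%dn%d", x, y);
	snprintf(chr, chr_len, "/dev/ng%dn%d", x, y);
}

/* True if 'path' exists and is a block device. */
static bool is_block_node(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISBLK(st.st_mode);
}

/* True if 'path' exists and is a character device. */
static bool is_char_node(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISCHR(st.st_mode);
}

/* Hypothetical helper: classify the device nodes for X and Y. */
static enum ns_node_state check_ns_nodes(int x, int y)
{
	char blk[64], chr[64];
	bool has_blk, has_chr;

	format_ns_paths(x, y, blk, sizeof(blk), chr, sizeof(chr));
	has_blk = is_block_node(blk);
	has_chr = is_char_node(chr);

	if (has_blk && has_chr)
		return NS_NODES_OK;
	if (has_blk)
		return NS_NODES_CHAR_MISSING;
	if (has_chr)
		return NS_NODES_BLOCK_MISSING;
	return NS_NODES_BOTH_MISSING;
}
```

Namespace scanning runs asynchronously after the attach command returns, so a
real check would likely retry for a short time before reporting
NS_NODES_CHAR_MISSING.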

[Where Problems Could Occur]
The fix requeues the namespace scan if there are any pending or missed AENs, and
re-reads the ANA log page once a scan completes. This can introduce delays when
managing NVMe namespaces, so we should look out for delays or hangs in such
operations.

Hannes Reinecke (3):
  [SRU][N] nvme: requeue namespace scan on missed AENs
  [SRU][N] nvme: re-read ANA log page after ns scan completes
  [SRU][N] nvme: fixup scan failure for non-ANA multipath controllers

 drivers/nvme/host/core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

-- 
2.50.0