[Bug 2100738] [NEW] Intel VROC raid mdadm stop bug

TVKain 2100738 at bugs.launchpad.net
Mon Mar 3 06:13:57 UTC 2025


Public bug reported:

When using Intel VROC RAID with curtin, if a RAID is already configured,
curtin tries to run `mdadm --stop` on the existing arrays. However, it
stops /dev/md127 first, even though /dev/md127 is the IMSM container and
must be stopped last (after the member array /dev/md126).
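The logs below show curtin already running `mdadm --query --detail --export` on each array; as I understand it, a container can be told apart from a member array because `mdadm --detail --export` reports `MD_LEVEL=container` for containers. A minimal sketch of that check (the helper names and the sample values are mine, not curtin's):

```python
def parse_mdadm_export(output: str) -> dict:
    """Parse KEY=value lines from `mdadm --detail --export` output."""
    pairs = {}
    for line in output.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            pairs[key.strip()] = value.strip()
    return pairs


def is_container(export_output: str) -> bool:
    """True when the array is a metadata container rather than a member array."""
    return parse_mdadm_export(export_output).get("MD_LEVEL") == "container"


# Illustrative export output for an IMSM container like /dev/md127
# (field values are assumptions, not copied from a real system):
sample = "MD_LEVEL=container\nMD_METADATA=imsm\nMD_DEVNAME=imsm0\n"
print(is_container(sample))           # True
print(is_container("MD_LEVEL=raid1")) # False
```

With a check like this, the shutdown planner could schedule containers at a deeper level than the arrays inside them instead of relying on holder order.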

This is my curtin config file:

```
storage:
  config:
    - id: nvme8n1
      model: SAMSUNG MZQL21T9HCJR-00A07
      name: nvme8n1
      serial: S64GNN0X304332
      type: disk
      wipe: superblock
    - id: nvme9n1
      model: SAMSUNG MZQL21T9HCJR-00A07
      name: nvme9n1
      serial: S64GNN0X304328
      type: disk
      wipe: superblock
    - id: raid_container
      type: raid
      metadata: imsm
      raidlevel: container
      devices:
        - nvme8n1
        - nvme9n1
      name: imsm0
    - id: raid_device
      type: raid
      raidlevel: 1
      name: test_raid
      container: raid_container
```

These are the deploy logs:
```
...
Generating device storage trees for path(s): ['/dev/nvme8n1', '/dev/nvme9n1']
devname '/sys/class/block/nvme8n1' had holders: ['md127', 'md126']
/dev/nvme8n1 is multipath device partition? False
/dev/nvme8n1 is multipath device partition? False
/dev/nvme8n1 is multipath device partition? False
devname '/sys/class/block/md127' had holders: []
/dev/md127 is multipath device partition? False
/dev/md127 is multipath device partition? False
devname '/sys/class/block/md126' had holders: []
/dev/md126 is multipath device partition? False
/dev/md126 is multipath device partition? False
devname '/sys/class/block/nvme9n1' had holders: ['md127', 'md126']
/dev/nvme9n1 is multipath device partition? False
/dev/nvme9n1 is multipath device partition? False
/dev/nvme9n1 is multipath device partition? False
devname '/sys/class/block/md127' had holders: []
/dev/md127 is multipath device partition? False
/dev/md127 is multipath device partition? False
devname '/sys/class/block/md126' had holders: []
/dev/md126 is multipath device partition? False
/dev/md126 is multipath device partition? False
Current device storage tree:
nvme8n1
|-- md127
`-- md126
nvme9n1
|-- md127
`-- md126
Shutdown Plan:
{'level': 3, 'device': '/sys/class/block/md127', 'dev_type': 'raid'}
{'level': 3, 'device': '/sys/class/block/md126', 'dev_type': 'raid'}
{'level': 1, 'device': '/sys/class/block/nvme8n1', 'dev_type': 'disk'}
{'level': 1, 'device': '/sys/class/block/nvme9n1', 'dev_type': 'disk'}
shutdown running on holder type: 'raid' syspath: '/sys/class/block/md127'
Running command ['mdadm', '--query', '--detail', '--export', '/dev/md127'] with allowed return codes [0] (capture=True)
Discovering raid devices and spares for /sys/class/block/md127
Wiping superblock on raid device: /sys/class/block/md127
wiping superblock on /dev/md127
wiping /dev/md127 attempt 1/4
Running command ['wipefs', '--all', '--force', '/dev/md127'] with allowed return codes [0] (capture=False)
wiping 1M on /dev/md127 at offsets [0, -1048576]
/dev/md127 (size=0): 1048576 bytes from 0 > size. Shortened to 0 bytes.
/dev/md127 (size=0): invalid offset -1048576. Skipping.
successfully wiped device /dev/md127 on attempt 1/4
Removing raid array members: ['/dev/nvme9n1', '/dev/nvme8n1']
mdadm mark faulty: /dev/nvme9n1 in array /dev/md127
Running command ['mdadm', '--fail', '/dev/md127', '/dev/nvme9n1'] with allowed return codes [0] (capture=True)
Non-fatal error clearing raid array: mdadm: set device faulty failed for /dev/nvme9n1:  No such device

mdadm mark faulty: /dev/nvme8n1 in array /dev/md127
Running command ['mdadm', '--fail', '/dev/md127', '/dev/nvme8n1'] with allowed return codes [0] (capture=True)
Non-fatal error clearing raid array: mdadm: set device faulty failed for /dev/nvme8n1:  No such device

using mdadm.mdadm_stop on dev: /dev/md127
mdadm stopping: /dev/md127
mdadm: stop on /dev/md127 attempt 0
/sys/class/block/md127/md/sync_action/sync_max =
mdadm: setting array sync_action=idle
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_action failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
/sys/class/block/md127/md/sync_max/sync_max =
mdadm: setting array sync_{min,max}=0
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_max failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_max'
Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
mdadm stop failed, retrying
/proc/mdstat:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 nvme9n1[1] nvme8n1[0]
      1875368960 blocks super external:/md127/0 [2/2] [UU]
      [=>...................]  resync =  8.0% (151198784/1875368960) finish=142.5min speed=201600K/sec

md127 : inactive nvme9n1[1](S) nvme8n1[0](S)
      2210 blocks super external:imsm

unused devices: <none>

mdadm: stop failed, retrying in 0.2 seconds
mdadm: stop on /dev/md127 attempt 1
/sys/class/block/md127/md/sync_action/sync_max =
mdadm: setting array sync_action=idle
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_action failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
/sys/class/block/md127/md/sync_max/sync_max =
mdadm: setting array sync_{min,max}=0
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_max failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_max'
Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
mdadm stop failed, retrying
...
```
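The `/proc/mdstat` excerpt above also encodes the dependency that the shutdown plan misses: md126's superblock line reads `super external:/md127/0`, meaning md126 lives inside container md127. A sketch of extracting that relationship from mdstat text (the function is my illustration, not curtin code):

```python
import re


def container_members(mdstat: str) -> dict:
    """Map member array name -> container name, using the
    'super external:/mdXXX/N' lines in /proc/mdstat."""
    members = {}
    current = None
    for line in mdstat.splitlines():
        m = re.match(r"^(md\d+)\s*:", line)
        if m:
            current = m.group(1)  # start of a new array stanza
            continue
        m = re.search(r"super external:/(md\d+)/\d+", line)
        if m and current:
            members[current] = m.group(1)
    return members


# Excerpt from the /proc/mdstat shown in the logs above:
mdstat = """\
md126 : active raid1 nvme9n1[1] nvme8n1[0]
      1875368960 blocks super external:/md127/0 [2/2] [UU]

md127 : inactive nvme9n1[1](S) nvme8n1[0](S)
      2210 blocks super external:imsm
"""
print(container_members(mdstat))  # {'md126': 'md127'}
```

Any array that appears as a value in this mapping is a container and should be stopped only after all of its keys have been stopped.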

As a workaround, I patched the get_holders() function so the holders
list is sorted and /dev/md126 is stopped before /dev/md127, and it
works:


```
def get_holders(device):
    """
    Look up any block device holders, return list of knames
    """
    # block.sys_block_path works when given a /sys or /dev path
    sysfs_path = block.sys_block_path(device)
    # get holders
    hpath = os.path.join(sysfs_path, 'holders')
    holders = os.listdir(hpath)

    # sort the holders list so md126 is stopped before the container md127
    holders.sort()
    LOG.debug("devname '%s' had holders: %s", device, holders)
    return holders
```
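One caveat with the plain `sort()`: it orders lexicographically, which happens to put `md126` before `md127` but would misplace names like `md9` (lexicographically `md126` < `md9`). A variant sort key (my sketch, not a proposed curtin fix) compares the numeric suffix instead, under the assumption that the container receives the highest md number, which matches mdadm's default allocation counting down from 127 but is not guaranteed; a robust fix would inspect the array metadata rather than the name:

```python
import re


def holder_sort_key(kname: str):
    """Sort kernel names numerically (md9 before md126) instead of
    lexicographically, so member arrays (lower numbers) precede the
    IMSM container (typically the highest free number, e.g. md127)."""
    m = re.match(r"([a-z]+)(\d+)$", kname)
    if m:
        return (m.group(1), int(m.group(2)))
    return (kname, -1)  # names without a numeric suffix sort first


holders = ["md127", "md126"]
print(sorted(holders, key=holder_sort_key))  # ['md126', 'md127']
```

With the lexicographic sort, `sorted(["md127", "md9", "md126"])` would yield `['md126', 'md127', 'md9']`; the numeric key yields `['md9', 'md126', 'md127']`.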

** Affects: curtin
     Importance: Undecided
         Status: New


** Tags: intel-vroc raid

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to curtin.
https://bugs.launchpad.net/bugs/2100738
