[Bug 2100738] [NEW] Intel VROC raid mdadm stop bug
TVKain
2100738 at bugs.launchpad.net
Mon Mar 3 06:13:57 UTC 2025
Public bug reported:
When using Intel VROC RAID with curtin, if a RAID is already configured,
curtin tries to run `mdadm --stop` on the existing arrays, but /dev/md127
is an IMSM container and must be stopped last (after its member volume
/dev/md126).
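To illustrate the constraint: the member volume has to be stopped before its container; the reverse order is what fails and retries in the log below. A minimal sketch, assuming the device names from this report (not curtin code):
```
import subprocess

# stop the RAID1 member volume first, then the IMSM container;
# stopping /dev/md127 while /dev/md126 is still active fails
for dev in ('/dev/md126', '/dev/md127'):
    subprocess.run(['mdadm', '--manage', '--stop', dev], check=True)
```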
This is my curtin config file:
```
storage:
  config:
    - id: nvme8n1
      model: SAMSUNG MZQL21T9HCJR-00A07
      name: nvme8n1
      serial: S64GNN0X304332
      type: disk
      wipe: superblock
    - id: nvme9n1
      model: SAMSUNG MZQL21T9HCJR-00A07
      name: nvme9n1
      serial: S64GNN0X304328
      type: disk
      wipe: superblock
    - id: raid_container
      type: raid
      metadata: imsm
      raidlevel: container
      devices:
        - nvme8n1
        - nvme9n1
      name: imsm0
    - id: raid_device
      type: raid
      raidlevel: 1
      name: test_raid
      container: raid_container
```
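For context, this is the standard two-step IMSM layout: the container is created from the disks first, and the RAID1 volume is then created inside it. Roughly equivalent mdadm invocations would look like this (a sketch of what the config describes, not what curtin literally runs):
```
import subprocess

# create the IMSM container from the two NVMe disks
subprocess.run(['mdadm', '--create', '/dev/md/imsm0', '--metadata=imsm',
                '--raid-devices=2', '/dev/nvme8n1', '/dev/nvme9n1'], check=True)
# create the RAID1 volume inside the container
subprocess.run(['mdadm', '--create', '/dev/md/test_raid', '--level=1',
                '--raid-devices=2', '/dev/md/imsm0'], check=True)
```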
This is the deploy log:
```
...
Generating device storage trees for path(s): ['/dev/nvme8n1', '/dev/nvme9n1']
devname '/sys/class/block/nvme8n1' had holders: ['md127', 'md126']
/dev/nvme8n1 is multipath device partition? False
/dev/nvme8n1 is multipath device partition? False
/dev/nvme8n1 is multipath device partition? False
devname '/sys/class/block/md127' had holders: []
/dev/md127 is multipath device partition? False
/dev/md127 is multipath device partition? False
devname '/sys/class/block/md126' had holders: []
/dev/md126 is multipath device partition? False
/dev/md126 is multipath device partition? False
devname '/sys/class/block/nvme9n1' had holders: ['md127', 'md126']
/dev/nvme9n1 is multipath device partition? False
/dev/nvme9n1 is multipath device partition? False
/dev/nvme9n1 is multipath device partition? False
devname '/sys/class/block/md127' had holders: []
/dev/md127 is multipath device partition? False
/dev/md127 is multipath device partition? False
devname '/sys/class/block/md126' had holders: []
/dev/md126 is multipath device partition? False
/dev/md126 is multipath device partition? False
Current device storage tree:
nvme8n1
|-- md127
`-- md126
nvme9n1
|-- md127
`-- md126
Shutdown Plan:
{'level': 3, 'device': '/sys/class/block/md127', 'dev_type': 'raid'}
{'level': 3, 'device': '/sys/class/block/md126', 'dev_type': 'raid'}
{'level': 1, 'device': '/sys/class/block/nvme8n1', 'dev_type': 'disk'}
{'level': 1, 'device': '/sys/class/block/nvme9n1', 'dev_type': 'disk'}
shutdown running on holder type: 'raid' syspath: '/sys/class/block/md127'
Running command ['mdadm', '--query', '--detail', '--export', '/dev/md127'] with allowed return codes [0] (capture=True)
Discovering raid devices and spares for /sys/class/block/md127
Wiping superblock on raid device: /sys/class/block/md127
wiping superblock on /dev/md127
wiping /dev/md127 attempt 1/4
Running command ['wipefs', '--all', '--force', '/dev/md127'] with allowed return codes [0] (capture=False)
wiping 1M on /dev/md127 at offsets [0, -1048576]
/dev/md127 (size=0): 1048576 bytes from 0 > size. Shortened to 0 bytes.
/dev/md127 (size=0): invalid offset -1048576. Skipping.
successfully wiped device /dev/md127 on attempt 1/4
Removing raid array members: ['/dev/nvme9n1', '/dev/nvme8n1']
mdadm mark faulty: /dev/nvme9n1 in array /dev/md127
Running command ['mdadm', '--fail', '/dev/md127', '/dev/nvme9n1'] with allowed return codes [0] (capture=True)
Non-fatal error clearing raid array: mdadm: set device faulty failed for /dev/nvme9n1: No such device
mdadm mark faulty: /dev/nvme8n1 in array /dev/md127
Running command ['mdadm', '--fail', '/dev/md127', '/dev/nvme8n1'] with allowed return codes [0] (capture=True)
Non-fatal error clearing raid array: mdadm: set device faulty failed for /dev/nvme8n1: No such device
using mdadm.mdadm_stop on dev: /dev/md127
mdadm stopping: /dev/md127
mdadm: stop on /dev/md127 attempt 0
/sys/class/block/md127/md/sync_action/sync_max =
mdadm: setting array sync_action=idle
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_action failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
/sys/class/block/md127/md/sync_max/sync_max =
mdadm: setting array sync_{min,max}=0
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_max failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_max'
Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
mdadm stop failed, retrying
/proc/mdstat:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 nvme9n1[1] nvme8n1[0]
      1875368960 blocks super external:/md127/0 [2/2] [UU]
      [=>...................] resync = 8.0% (151198784/1875368960) finish=142.5min speed=201600K/sec
md127 : inactive nvme9n1[1](S) nvme8n1[0](S)
      2210 blocks super external:imsm
unused devices: <none>
mdadm: stop failed, retrying in 0.2 seconds
mdadm: stop on /dev/md127 attempt 1
/sys/class/block/md127/md/sync_action/sync_max =
mdadm: setting array sync_action=idle
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_action failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_action'
/sys/class/block/md127/md/sync_max/sync_max =
mdadm: setting array sync_{min,max}=0
mdadm: (non-fatal) write to /sys/class/block/md127/md/sync_max failed [Errno 13] Permission denied: '/sys/class/block/md127/md/sync_max'
Running command ['mdadm', '--manage', '--stop', '/dev/md127'] with allowed return codes [0] (capture=True)
mdadm stop failed, retrying
...
```
I patched the `get_holders()` function as a workaround for this issue,
and it works:
```
def get_holders(device):
    """
    Look up any block device holders, return list of knames
    """
    # block.sys_block_path works when given a /sys or /dev path
    sysfs_path = block.sys_block_path(device)
    # get holders
    hpath = os.path.join(sysfs_path, 'holders')
    holders = os.listdir(hpath)
    # sort holders list so md126 would be first to be stopped
    holders.sort()
    LOG.debug("devname '%s' had holders: %s", device, holders)
    return holders
```
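The plain `holders.sort()` relies on lexical device naming happening to put the member volume before the container. A more robust variant would detect containers from sysfs and stop them last; below is a sketch of that idea (my assumption, not curtin code; `is_md_container()` and `sort_holders_for_shutdown()` are hypothetical helpers), based on the metadata strings visible in the /proc/mdstat output above ('external:imsm' for the container versus 'external:/md127/0' for the member volume):
```
import os

def is_md_container(kname):
    # Containers report their metadata format directly (e.g. 'external:imsm'),
    # while member volumes reference their container (e.g. 'external:/md127/0').
    path = os.path.join('/sys/class/block', kname, 'md', 'metadata_version')
    try:
        with open(path) as f:
            meta = f.read().strip()
    except OSError:
        return False  # not an md device
    return meta.startswith('external:') and not meta.startswith('external:/')

def sort_holders_for_shutdown(holders):
    # member volumes first, containers last (False sorts before True)
    return sorted(holders, key=is_md_container)
```
In `get_holders()` the plain `holders.sort()` would then become `holders = sort_holders_for_shutdown(holders)`.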
** Affects: curtin
   Importance: Undecided
       Status: New

** Tags: intel-vroc raid
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to curtin.
https://bugs.launchpad.net/bugs/2100738