[Bug 2025563] Re: System can not shutdown if system has multiple VROC RAID arrays
Cyrus Lien
2025563 at bugs.launchpad.net
Fri Aug 25 11:33:30 UTC 2023
SUT information:
$ cat /proc/mdstat
Personalities : [raid0] [raid10] [linear] [multipath] [raid1] [raid6] [raid5] [raid4]
md124 : active raid10 nvme2n1[3] nvme3n1[2] nvme4n1[1] nvme5n1[0]
500107264 blocks super external:/md127/0 64K chunks 2 near-copies [4/4] [UUUU]
[>....................] resync = 2.6% (13288960/500107264) finish=39.3min speed=205974K/sec
md125 : active raid0 nvme0n1[1] nvme1n1[0]
950198272 blocks super external:/md126/0 128k chunks
md126 : inactive nvme1n1[1](S) nvme0n1[0](S)
10402 blocks super external:imsm
md127 : inactive nvme3n1[3](S) nvme5n1[2](S) nvme2n1[1](S) nvme4n1[0](S)
20804 blocks super external:imsm
unused devices: <none>
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 4K 1 loop /snap/bare/5
loop1 7:1 0 63.3M 1 loop /snap/core20/1879
loop2 7:2 0 73.1M 1 loop /snap/core22/634
loop3 7:3 0 242M 1 loop /snap/firefox/2667
loop4 7:4 0 349.7M 1 loop /snap/gnome-3-38-2004/140
loop5 7:5 0 460.6M 1 loop /snap/gnome-42-2204/102
loop6 7:6 0 91.7M 1 loop /snap/gtk-common-themes/1535
loop7 7:7 0 12.3M 1 loop /snap/snap-store/959
loop8 7:8 0 53.2M 1 loop /snap/snapd/19122
loop9 7:9 0 452K 1 loop /snap/snapd-desktop-integration/83
sr0 11:0 1 1024M 0 rom
nvme5n1 259:0 0 238.5G 0 disk
├─md124 9:124 0 476.9G 0 raid10
└─md127 9:127 0 0B 0 md
nvme4n1 259:1 0 238.5G 0 disk
├─md124 9:124 0 476.9G 0 raid10
└─md127 9:127 0 0B 0 md
nvme3n1 259:2 0 238.5G 0 disk
├─md124 9:124 0 476.9G 0 raid10
└─md127 9:127 0 0B 0 md
nvme2n1 259:3 0 238.5G 0 disk
├─md124 9:124 0 476.9G 0 raid10
└─md127 9:127 0 0B 0 md
nvme0n1 259:4 0 476.9G 0 disk
├─md125 9:125 0 906.2G 0 raid0
│ ├─md125p1 259:6 0 238.4M 0 part /boot/efi
│ ├─md125p2 259:7 0 7.6G 0 part
│ └─md125p3 259:8 0 898.3G 0 part /var/snap/firefox/common/host-hunspell
│ /
└─md126 9:126 0 0B 0 md
nvme1n1 259:5 0 476.9G 0 disk
├─md125 9:125 0 906.2G 0 raid0
│ ├─md125p1 259:6 0 238.4M 0 part /boot/efi
│ ├─md125p2 259:7 0 7.6G 0 part
│ └─md125p3 259:8 0 898.3G 0 part /var/snap/firefox/common/host-hunspell
│ /
└─md126 9:126 0 0B 0 md
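The /proc/mdstat listing in the SUT information shows two kinds of md device: the member arrays (md124, md125), whose metadata field points at a container ("super external:/md127/0"), and the containers themselves (md126, md127), which report "super external:imsm". A minimal Python sketch of that distinction follows. The function name classify_mdstat and the rule "an external metadata value that does not start with '/' denotes a container" are assumptions made for illustration, based only on the output above, not a definition taken from the kernel or mdadm.

```python
import re

# Matches the first line of each mdstat entry, e.g. "md126 : inactive ...".
ARRAY_RE = re.compile(r"^(md\d+) : (active|inactive)\b")


def classify_mdstat(text):
    """Map each md device name to 'array' or 'container' (heuristic)."""
    result = {}
    current = None
    for line in text.splitlines():
        m = ARRAY_RE.match(line.strip())
        if m:
            current = m.group(1)
            result[current] = "array"  # default until proven otherwise
            continue
        if current and "super external:" in line:
            parts = line.split("super external:", 1)[1].split()
            # Assumption: member arrays reference their container by path
            # ("/md127/0"); containers name the metadata format ("imsm").
            if parts and not parts[0].startswith("/"):
                result[current] = "container"
    return result


# Lines copied from the /proc/mdstat output in this report.
MDSTAT = """\
Personalities : [raid0] [raid10] [linear] [multipath] [raid1] [raid6] [raid5] [raid4]
md124 : active raid10 nvme2n1[3] nvme3n1[2] nvme4n1[1] nvme5n1[0]
500107264 blocks super external:/md127/0 64K chunks 2 near-copies [4/4] [UUUU]
md125 : active raid0 nvme0n1[1] nvme1n1[0]
950198272 blocks super external:/md126/0 128k chunks
md126 : inactive nvme1n1[1](S) nvme0n1[0](S)
10402 blocks super external:imsm
md127 : inactive nvme3n1[3](S) nvme5n1[2](S) nvme2n1[1](S) nvme4n1[0](S)
20804 blocks super external:imsm
unused devices: <none>
"""

if __name__ == "__main__":
    for name, kind in classify_mdstat(MDSTAT).items():
        print(name, kind)
```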
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2025563
Title:
System can not shutdown if system has multiple VROC RAID arrays
Status in OEM Priority Project:
In Progress
Status in systemd package in Ubuntu:
Fix Released
Status in systemd source package in Jammy:
Fix Committed
Status in systemd source package in Kinetic:
Fix Released
Bug description:
[ Impact ]
The system cannot shut down if it has multiple VROC RAID arrays.
Intel fixed this in systemd v251 [1].
The commit needs to be cherry-picked into ubuntu-jammy systemd 249.11-0ubuntu3.9.
[1] The commit that fixes the issue:
commit 3a3b022d2cc112803ea7b9beea98bbcad110368a
Author: Mariusz Tkaczyk <mariusz.tkaczyk at linux.intel.com>
Date: Tue Mar 29 12:49:54 2022 +0200
shutdown: get only active md arrays.
Current md_list_get() implementation filters all block devices, started from
"md*". This is ambiguous because list could contain:
- partitions created upon md device (mdXpY)
- external metadata container- specific type of md array.
For partitions there is no issue, because they aren't handle STOP_ARRAY
ioctl sent later. It generates misleading errors only.
Second case is more problematic because containers are not locked in kernel.
They are stopped even if container member array is active. For that reason
reboot or shutdown flow could be blocked because metadata manager cannot be
restarted after switch root on shutdown.
Add filters to remove partitions and containers from md_list. Partitions
can be excluded by DEVTYPE. Containers are determined by MD_LEVEL
property, we are excluding all with "container" value.
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk at linux.intel.com>
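The filtering rule described in the commit message can be illustrated with a short Python sketch. This is not the systemd implementation (which is written in C); it only models the two conditions named above, DEVTYPE and MD_LEVEL, on plain dictionaries. The function name select_stoppable_md and the sample property sets are hypothetical, loosely modelled on the reported system.

```python
from typing import Dict, Iterable, List


def select_stoppable_md(devices: Iterable[Dict[str, str]]) -> List[str]:
    """Return the md device nodes that should receive a stop request."""
    selected = []
    for props in devices:
        name = props.get("DEVNAME", "")
        # Only md block devices are of interest.
        if not name.startswith("/dev/md"):
            continue
        # Partitions created on an md device (mdXpY) are excluded by DEVTYPE.
        if props.get("DEVTYPE") != "disk":
            continue
        # External-metadata containers are excluded by MD_LEVEL.
        if props.get("MD_LEVEL") == "container":
            continue
        selected.append(name)
    return selected


# Hypothetical udev-style property sets for illustration only.
SAMPLE = [
    {"DEVNAME": "/dev/md124", "DEVTYPE": "disk", "MD_LEVEL": "raid10"},
    {"DEVNAME": "/dev/md125", "DEVTYPE": "disk", "MD_LEVEL": "raid0"},
    {"DEVNAME": "/dev/md125p1", "DEVTYPE": "partition"},
    {"DEVNAME": "/dev/md126", "DEVTYPE": "disk", "MD_LEVEL": "container"},
    {"DEVNAME": "/dev/md127", "DEVTYPE": "disk", "MD_LEVEL": "container"},
]

if __name__ == "__main__":
    print(select_stoppable_md(SAMPLE))
```

With this sample, only the two member arrays remain in the list; the partition and both containers are skipped, so no stop request would be sent to them.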
In the journal, we can see systemd-shutdown looping repeatedly as it
tries and fails to detach all md devices:
...
[ 513.416293] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:5).
[ 513.422953] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
[ 513.431227] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:4).
[ 513.437952] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
[ 513.449298] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
[ 513.456278] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
[ 513.465323] systemd-shutdown[1]: Not all MD devices stopped, 4 left.
[ 513.472564] systemd-shutdown[1]: Couldn't finalize remaining MD devices, trying again.
[ 513.485302] systemd-shutdown[1]: Failed to open watchdog device /dev/watchdog: No such file or directory
[ 513.496195] systemd-shutdown[1]: Stopping MD devices.
[ 513.502176] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
[ 513.513382] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
[ 513.521436] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
[ 513.534810] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
[ 513.545384] systemd-shutdown[1]: Failed to sync MD block device /dev/md126, ignoring: Input/output error
[ 513.557265] md: md126 stopped.
[ 513.561451] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:5).
[ 513.576673] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
[ 513.589274] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:4).
[ 513.597976] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
[ 513.607263] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
[ 513.615067] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
[ 513.625157] systemd-shutdown[1]: Not all MD devices stopped, 4 left.
[ 513.632209] systemd-shutdown[1]: Couldn't finalize remaining MD devices, trying again.
[ 513.641474] systemd-shutdown[1]: Failed to open watchdog device /dev/watchdog: No such file or directory
[ 513.653660] systemd-shutdown[1]: Stopping MD devices.
[ 513.661257] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
[ 513.668833] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
[ 513.677347] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
[ 513.687047] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
[ 513.697206] systemd-shutdown[1]: Failed to sync MD block device /dev/md126, ignoring: Input/output error
[ 513.707193] md: md126 stopped.
...
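One way to confirm the retry loop from a captured console log is to count how often each device fails to stop; a device that fails more than once has been retried. The sketch below is an illustration only: the function name count_stop_failures is hypothetical, and the regular expression assumes the message format seen in the excerpt above.

```python
import re
from collections import Counter

# Assumed message format, taken from the journal excerpt in this report.
FAIL_RE = re.compile(r"systemd-shutdown\[1\]: Could not stop MD (\S+):")


def count_stop_failures(lines):
    """Count 'Could not stop MD' messages per device node."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Four lines copied from the excerpt above.
LOG = """\
[ 513.422953] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
[ 513.456278] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
[ 513.576673] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
[ 513.615067] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
"""

if __name__ == "__main__":
    for device, n in count_stop_failures(LOG.splitlines()).items():
        note = "retried" if n > 1 else "failed once"
        print(device, n, note)
```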
[ Test Plan ]
1. Build two VROC RAID arrays: one RAID 0 for the System volume and one RAID 10 for the Data volume.
2. Install the system on the System volume.
3. Update systemd.
4. Reboot the system.
5. Verify that the system reboots successfully.
[ Where problems could occur ]
The patch is confirmed to fix the reboot issue on a system with two VROC
RAID arrays, but systems with more than two arrays and other combinations
of RAID levels have not all been tested. The patch itself adds logic to
skip partitions and containers in the list of md devices that
systemd-shutdown tries to stop. Any regressions would therefore also be
related to stopping md devices in systemd-shutdown.
[ Scope ]
Jammy
To manage notifications about this bug go to:
https://bugs.launchpad.net/oem-priority/+bug/2025563/+subscriptions