[Bug 2025563] Re: System can not shutdown if system has multiple VROC RAID arrays

Cyrus Lien 2025563 at bugs.launchpad.net
Fri Aug 25 11:33:30 UTC 2023


SUT information:

$ cat /proc/mdstat 
Personalities : [raid0] [raid10] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] 
md124 : active raid10 nvme2n1[3] nvme3n1[2] nvme4n1[1] nvme5n1[0]
      500107264 blocks super external:/md127/0 64K chunks 2 near-copies [4/4] [UUUU]
      [>....................]  resync =  2.6% (13288960/500107264) finish=39.3min speed=205974K/sec
      
md125 : active raid0 nvme0n1[1] nvme1n1[0]
      950198272 blocks super external:/md126/0 128k chunks
      
md126 : inactive nvme1n1[1](S) nvme0n1[0](S)
      10402 blocks super external:imsm
       
md127 : inactive nvme3n1[3](S) nvme5n1[2](S) nvme2n1[1](S) nvme4n1[0](S)
      20804 blocks super external:imsm
       
unused devices: <none>


$ lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINTS
loop0         7:0    0     4K  1 loop   /snap/bare/5
loop1         7:1    0  63.3M  1 loop   /snap/core20/1879
loop2         7:2    0  73.1M  1 loop   /snap/core22/634
loop3         7:3    0   242M  1 loop   /snap/firefox/2667
loop4         7:4    0 349.7M  1 loop   /snap/gnome-3-38-2004/140
loop5         7:5    0 460.6M  1 loop   /snap/gnome-42-2204/102
loop6         7:6    0  91.7M  1 loop   /snap/gtk-common-themes/1535
loop7         7:7    0  12.3M  1 loop   /snap/snap-store/959
loop8         7:8    0  53.2M  1 loop   /snap/snapd/19122
loop9         7:9    0   452K  1 loop   /snap/snapd-desktop-integration/83
sr0          11:0    1  1024M  0 rom    
nvme5n1     259:0    0 238.5G  0 disk   
├─md124       9:124  0 476.9G  0 raid10 
└─md127       9:127  0     0B  0 md     
nvme4n1     259:1    0 238.5G  0 disk   
├─md124       9:124  0 476.9G  0 raid10 
└─md127       9:127  0     0B  0 md     
nvme3n1     259:2    0 238.5G  0 disk   
├─md124       9:124  0 476.9G  0 raid10 
└─md127       9:127  0     0B  0 md     
nvme2n1     259:3    0 238.5G  0 disk   
├─md124       9:124  0 476.9G  0 raid10 
└─md127       9:127  0     0B  0 md     
nvme0n1     259:4    0 476.9G  0 disk   
├─md125       9:125  0 906.2G  0 raid0  
│ ├─md125p1 259:6    0 238.4M  0 part   /boot/efi
│ ├─md125p2 259:7    0   7.6G  0 part   
│ └─md125p3 259:8    0 898.3G  0 part   /var/snap/firefox/common/host-hunspell
│                                       /
└─md126       9:126  0     0B  0 md     
nvme1n1     259:5    0 476.9G  0 disk   
├─md125       9:125  0 906.2G  0 raid0  
│ ├─md125p1 259:6    0 238.4M  0 part   /boot/efi
│ ├─md125p2 259:7    0   7.6G  0 part   
│ └─md125p3 259:8    0 898.3G  0 part   /var/snap/firefox/common/host-hunspell
│                                       /
└─md126       9:126  0     0B  0 md

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2025563

Title:
  System can not shutdown if system has multiple VROC RAID arrays

Status in OEM Priority Project:
  In Progress
Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Released

Bug description:
  [ Impact ]

  The system cannot shut down if it has multiple VROC RAID arrays.
  Intel fixed this in systemd v251 [1].
  The commit needs to be cherry-picked into the ubuntu-jammy systemd package (249.11-0ubuntu3.9).

  [1] The commit that fixes the issue:
  commit 3a3b022d2cc112803ea7b9beea98bbcad110368a
  Author: Mariusz Tkaczyk <mariusz.tkaczyk at linux.intel.com>
  Date:   Tue Mar 29 12:49:54 2022 +0200

      shutdown: get only active md arrays.

      The current md_list_get() implementation includes every block device
      whose name starts with "md*". This is ambiguous because the list can
      contain:
      - partitions created on top of an md device (mdXpY)
      - external metadata containers, a specific type of md array.

      Partitions are not a real problem, because they do not handle the
      STOP_ARRAY ioctl sent later; they only generate misleading errors.

      The second case is more problematic, because containers are not locked
      in the kernel: they are stopped even if a container member array is
      still active. As a result, the reboot or shutdown flow can block,
      because the metadata manager cannot be restarted after switch-root on
      shutdown.

      Add filters to remove partitions and containers from md_list.
      Partitions are excluded by DEVTYPE; containers are identified by the
      MD_LEVEL property, excluding all devices with the value "container".

      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk at linux.intel.com>
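  The filtering rule the commit describes can be sketched in shell. This is
  an illustration of the rule, not systemd's actual C implementation; the
  sample records below are hypothetical and mimic the DEVTYPE and MD_LEVEL
  udev properties you would see from `udevadm info --query=property /dev/mdX`
  on a system like the one in this report:

  ```shell
  # Keep only md devices that are neither partitions (DEVTYPE=partition)
  # nor IMSM containers (MD_LEVEL=container) -- the rule added by the patch.
  filter_md() {
    while read -r name devtype md_level; do
      [ "$devtype" = "partition" ] && continue
      [ "$md_level" = "container" ] && continue
      echo "$name"
    done
  }

  # Sample data modeled on this SUT: two volumes, one partition, two containers.
  printf '%s\n' \
    'md124 disk raid10' \
    'md125 disk raid0' \
    'md125p1 partition raid0' \
    'md126 disk container' \
    'md127 disk container' | filter_md
  # -> md124
  # -> md125
  ```

  With the filter applied, systemd-shutdown would only attempt STOP_ARRAY on
  md124 and md125, leaving the md126/md127 containers alone.
  
  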

  In the journal, we can see systemd-shutdown looping repeatedly as it
  tries and fails to detach all md devices:

  ...
  [  513.416293] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:5).
  [  513.422953] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
  [  513.431227] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:4).
  [  513.437952] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
  [  513.449298] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
  [  513.456278] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
  [  513.465323] systemd-shutdown[1]: Not all MD devices stopped, 4 left.
  [  513.472564] systemd-shutdown[1]: Couldn't finalize remaining  MD devices, trying again.
  [  513.485302] systemd-shutdown[1]: Failed to open watchdog device /dev/watchdog: No such file or directory
  [  513.496195] systemd-shutdown[1]: Stopping MD devices.
  [  513.502176] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
  [  513.513382] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
  [  513.521436] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
  [  513.534810] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
  [  513.545384] systemd-shutdown[1]: Failed to sync MD block device /dev/md126, ignoring: Input/output error
  [  513.557265] md: md126 stopped.
  [  513.561451] systemd-shutdown[1]: Stopping MD /dev/md124p2 (259:5).
  [  513.576673] systemd-shutdown[1]: Could not stop MD /dev/md124p2: Device or resource busy
  [  513.589274] systemd-shutdown[1]: Stopping MD /dev/md124p1 (259:4).
  [  513.597976] systemd-shutdown[1]: Could not stop MD /dev/md124p1: Device or resource busy
  [  513.607263] systemd-shutdown[1]: Stopping MD /dev/md124 (9:124).
  [  513.615067] systemd-shutdown[1]: Could not stop MD /dev/md124: Device or resource busy
  [  513.625157] systemd-shutdown[1]: Not all MD devices stopped, 4 left.
  [  513.632209] systemd-shutdown[1]: Couldn't finalize remaining  MD devices, trying again.
  [  513.641474] systemd-shutdown[1]: Failed to open watchdog device /dev/watchdog: No such file or directory
  [  513.653660] systemd-shutdown[1]: Stopping MD devices.
  [  513.661257] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
  [  513.668833] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
  [  513.677347] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
  [  513.687047] systemd-shutdown[1]: Stopping MD /dev/md126 (9:126).
  [  513.697206] systemd-shutdown[1]: Failed to sync MD block device /dev/md126, ignoring: Input/output error
  [  513.707193] md: md126 stopped.
  ...

  [ Test Plan ]

  1. Build two VROC RAID arrays: one RAID 0 for the system volume, another RAID 10 for the data volume.
  2. Install the system on the RAID 0 system volume.
  3. Update systemd.
  4. Reboot the system.
  5. Verify that the system reboots successfully.
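  For reference, a layout like step 1 can be created with mdadm using IMSM
  (VROC) external metadata: each volume lives inside a container, which is
  what produces the md126/md127 container devices seen above. The device
  names below match this report's SUT but are examples; adjust them to the
  hardware at hand (requires root, and destroys data on the listed disks):

  ```shell
  # RAID 0 system volume: 2-disk IMSM container, then a RAID 0 volume in it.
  mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=2 \
        /dev/nvme0n1 /dev/nvme1n1
  mdadm --create /dev/md/sys0 --level=0 --raid-devices=2 /dev/md/imsm0

  # RAID 10 data volume: 4-disk IMSM container, then a RAID 10 volume in it.
  mdadm --create /dev/md/imsm1 --metadata=imsm --raid-devices=4 \
        /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1
  mdadm --create /dev/md/data0 --level=10 --raid-devices=4 /dev/md/imsm1

  # Confirm both volumes and both containers are assembled.
  cat /proc/mdstat
  ```
  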

  [ Where problems could occur ]

  The patch is confirmed to fix the reboot issue on a system with two VROC
  RAID arrays, but setups with more than two arrays, and other combinations
  of RAID levels, have not all been tested. The patch itself adds logic to
  skip partitions and containers in the list of md devices that systemd-
  shutdown tries to stop, so any regressions would likewise be related to
  stopping md devices in systemd-shutdown.

  [ Scope ]

  Jammy

To manage notifications about this bug go to:
https://bugs.launchpad.net/oem-priority/+bug/2025563/+subscriptions
