[Bug 1587142] Re: Shutdown hangs in md kworker after "Reached target Shutdown."
John Center
jlcenter at comcast.net
Fri Feb 23 00:33:28 UTC 2018
Hi Dimitri,
> On Feb 22, 2018, at 8:32 AM, Dimitri John Ledkov <launchpad at surgut.co.uk> wrote:
>
> john-center - I believe mdadm is used by default, instead of dmraid.
> However, I'm not sure whether we correctly activate Intel RAIDs with
> mdadm. I have not done an install with either 16.04 or 18.04, or even
> started the install up to the partitioning screen. My expectation is
> that, if one has RAID arrays pre-set up in the BIOS (Ctrl+I on boot),
> they should show up as assembled and be offered for autopartitioning,
> with both 16.04 and 18.04.
>
I know that with 16.04 it wasn't used by default. I had to do a lot of manipulation to set up the RAID array before I could do the install. The installer used dmraid once it detected the IMSM RAID. I removed dmraid completely, installed mdadm, assembled the array, and ran the installation program again. I was hoping that mdadm would just handle it all instead.
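
Roughly, the sequence was something like this (from memory, so the exact commands may have differed):

$ sudo apt-get purge dmraid        # stop dmraid from claiming the IMSM disks
$ sudo apt-get install mdadm
$ sudo mdadm --assemble --scan     # assemble the IMSM container and its RAID1 member array
$ cat /proc/mdstat                 # confirm the arrays are up before re-running the installer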
-John
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1587142
>
> Title:
> Shutdown hangs in md kworker after "Reached target Shutdown."
>
> Status in systemd package in Ubuntu:
> Confirmed
>
> Bug description:
> I'm booting a fully patched 16.04 from an Intel Rapid Storage
> Technology enterprise RAID1 volume (ThinkServer TS140 with two SATA
> ST1000NM0033-9ZM drives, ext4 root partition, no LVM, UEFI mode).
>
> If the RAID volume is recovering or resyncing for whatever reason, then `sudo systemctl reboot` and `sudo systemctl poweroff` work fine (I had to `sudo systemctl --now disable lvm2-lvmetad lvm2-lvmpolld lvm2-monitor` in order to consistently get that). However, once the recovery/resync is complete and clean, the reboot and poweroff commands above hang forever after "Reached target Shutdown.". Note that issuing `sudo swapoff -a` beforehand (suggested in bug #1464917) does not help.
> [EDIT]Actually, the shutdown also hangs from time to time during a resync. But I've never seen it succeed once the resync is complete.[/EDIT]
>
> Then, if the server has been forcibly restarted with the power button,
> the Intel Matrix Storage Manager indicates a "Normal" status for the
> RAID1 volume, but Ubuntu then resyncs the volume anyway:
>
> [ 1.223649] md: bind<sda>
> [ 1.228426] md: bind<sdb>
> [ 1.230030] md: bind<sdb>
> [ 1.230738] md: bind<sda>
> [ 1.232985] usbcore: registered new interface driver usbhid
> [ 1.233494] usbhid: USB HID core driver
> [ 1.234022] md: raid1 personality registered for level 1
> [ 1.234876] md/raid1:md126: not clean -- starting background reconstruction
> [ 1.234956] input: CHESEN USB Keyboard as /devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0/0003:0A81:0101.0001/input/input5
> [ 1.236273] md/raid1:md126: active with 2 out of 2 mirrors
> [ 1.236797] md126: detected capacity change from 0 to 1000202043392
> [ 1.246271] md: md126 switched to read-write mode.
> [ 1.246834] md: resync of RAID array md126
> [ 1.247325] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> [ 1.247503] md126: p1 p2 p3 p4
> [ 1.248269] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> [ 1.248774] md: using 128k window, over a total of 976759940k.
>
> Note that the pain of a resync upon every (re)boot cannot even be
> partially relieved with write-intent bitmaps, because mdadm does not
> support them for IMSM containers:
>
> $ sudo mdadm --grow --bitmap=internal /dev/md126
> mdadm: Cannot add bitmaps to sub-arrays yet
>
> I also get this in syslog during boot when the individual drives are
> detected, but this seems to be harmless:
>
> May 30 17:26:07 wssrv1 systemd-udevd[608]: Process '/sbin/mdadm --incremental /dev/sdb --offroot' failed with exit code 1.
> May 30 17:26:07 wssrv1 systemd-udevd[608]: Process '/lib/udev/hdparm' failed with exit code 1.
>
> May 30 17:26:07 wssrv1 systemd-udevd[606]: Process '/sbin/mdadm --incremental /dev/sda --offroot' failed with exit code 1.
> May 30 17:26:07 wssrv1 systemd-udevd[606]: Process '/lib/udev/hdparm' failed with exit code 1.
>
> During a resync, `sudo sh -c 'echo idle >
> /sys/block/md126/md/sync_action'` actually stops it as expected, but
> it restarts immediately, even though nothing seems to have triggered it:
>
> May 30 18:17:02 wssrv1 kernel: [ 3106.826710] md: md126: resync interrupted.
> May 30 18:17:02 wssrv1 kernel: [ 3106.836320] md: checkpointing resync of md126.
> May 30 18:17:02 wssrv1 kernel: [ 3106.836623] md: resync of RAID array md126
> May 30 18:17:02 wssrv1 kernel: [ 3106.836625] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> May 30 18:17:02 wssrv1 kernel: [ 3106.836626] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
> May 30 18:17:02 wssrv1 kernel: [ 3106.836627] md: using 128k window, over a total of 976759940k.
> May 30 18:17:02 wssrv1 kernel: [ 3106.836628] md: resuming resync of md126 from checkpoint.
> May 30 18:17:02 wssrv1 mdadm[982]: RebuildStarted event detected on md device /dev/md/Volume0
>
> I attach screenshots of the hanging shutdown log, taken after a `sudo sh -c 'echo 8 > /proc/sys/kernel/printk'`. The second screenshot shows that the kernel has deadlocked in md_write_start(). Note that `sudo systemctl start debug-shell` is unusable on this machine at this point, because Ctrl+Alt+F9 brings up tty9 but the keyboard does not work there.
> [EDIT]But I can still switch back to tty1.[/EDIT]
>
> I have also tried with much lower values for vm.dirty_background_ratio
> and vm.dirty_ratio, but to no avail.
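>
> For instance, something along these lines (the exact values I tried
> varied):
>
> $ sudo sysctl -w vm.dirty_background_ratio=1
> $ sudo sysctl -w vm.dirty_ratio=2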
>
> Linux 4.6.0-040600-generic_4.6.0-040600.201605151930_amd64 from
> http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-yakkety/ did not
> help either.
>
> More information below:
>
> $ lsb_release -rd
> Description: Ubuntu 16.04 LTS
> Release: 16.04
>
> $ uname -a
> Linux wssrv1 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> $ apt-cache policy systemd
> systemd:
> Installed: 229-4ubuntu6
> Candidate: 229-4ubuntu6
> Version table:
> *** 229-4ubuntu6 500
> 500 http://fr.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
> 100 /var/lib/dpkg/status
> 229-4ubuntu4 500
> 500 http://fr.archive.ubuntu.com/ubuntu xenial/main amd64 Packages
>
> $ cat /proc/mdstat
> Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
> md126 : active raid1 sda[1] sdb[0]
> 976759808 blocks super external:/md127/0 [2/2] [UU]
> [>....................] resync = 3.3% (32651584/976759940) finish=85.9min speed=183164K/sec
>
> md127 : inactive sdb[1](S) sda[0](S)
> 5288 blocks super external:imsm
>
> unused devices: <none>
>
> $ sudo mdadm -D /dev/md127
> /dev/md127:
> Version : imsm
> Raid Level : container
> Total Devices : 2
>
> Working Devices : 2
>
> UUID : e9bb2216:cb1bbc0f:96943390:bb65943c
> Member Arrays : /dev/md/Volume0
>
> Number Major Minor RaidDevice
>
> 0 8 0 - /dev/sda
> 1 8 16 - /dev/sdb
>
> $ sudo mdadm -D /dev/md126
> /dev/md126:
> Container : /dev/md/imsm0, member 0
> Raid Level : raid1
> Array Size : 976759808 (931.51 GiB 1000.20 GB)
> Used Dev Size : 976759940 (931.51 GiB 1000.20 GB)
> Raid Devices : 2
> Total Devices : 2
>
> State : clean, resyncing
> Active Devices : 2
> Working Devices : 2
> Failed Devices : 0
> Spare Devices : 0
>
> Resync Status : 5% complete
>
> UUID : 3d724b1d:ac75cddb:600ac81a:ccdc2090
> Number Major Minor RaidDevice State
> 1 8 0 0 active sync /dev/sda
> 0 8 16 1 active sync /dev/sdb
>
> $ sudo mdadm -E /dev/sda
> /dev/sda:
> Magic : Intel Raid ISM Cfg Sig.
> Version : 1.1.00
> Orig Family : 92b6e3e4
> Family : 92b6e3e4
> Generation : 00000075
> Attributes : All supported
> UUID : e9bb2216:cb1bbc0f:96943390:bb65943c
> Checksum : 5ad6e3c8 correct
> MPB Sectors : 2
> Disks : 2
> RAID Devices : 1
>
> Disk00 Serial : Z1W50P5E
> State : active
> Id : 00000000
> Usable Size : 1953519880 (931.51 GiB 1000.20 GB)
>
> [Volume0]:
> UUID : 3d724b1d:ac75cddb:600ac81a:ccdc2090
> RAID Level : 1 <-- 1
> Members : 2 <-- 2
> Slots : [UU] <-- [UU]
> Failed disk : none
> This Slot : 0
> Array Size : 1953519616 (931.51 GiB 1000.20 GB)
> Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
> Sector Offset : 0
> Num Stripes : 7630936
> Chunk Size : 64 KiB <-- 64 KiB
> Reserved : 0
> Migrate State : repair
> Map State : normal <-- normal
> Checkpoint : 201165 (512)
> Dirty State : dirty
>
> Disk01 Serial : Z1W519DN
> State : active
> Id : 00000001
> Usable Size : 1953519880 (931.51 GiB 1000.20 GB)
>
> $ sudo mdadm -E /dev/sdb
> /dev/sdb:
> Magic : Intel Raid ISM Cfg Sig.
> Version : 1.1.00
> Orig Family : 92b6e3e4
> Family : 92b6e3e4
> Generation : 00000075
> Attributes : All supported
> UUID : e9bb2216:cb1bbc0f:96943390:bb65943c
> Checksum : 5ad6e3c8 correct
> MPB Sectors : 2
> Disks : 2
> RAID Devices : 1
>
> Disk01 Serial : Z1W519DN
> State : active
> Id : 00000001
> Usable Size : 1953519880 (931.51 GiB 1000.20 GB)
>
> [Volume0]:
> UUID : 3d724b1d:ac75cddb:600ac81a:ccdc2090
> RAID Level : 1 <-- 1
> Members : 2 <-- 2
> Slots : [UU] <-- [UU]
> Failed disk : none
> This Slot : 1
> Array Size : 1953519616 (931.51 GiB 1000.20 GB)
> Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
> Sector Offset : 0
> Num Stripes : 7630936
> Chunk Size : 64 KiB <-- 64 KiB
> Reserved : 0
> Migrate State : repair
> Map State : normal <-- normal
> Checkpoint : 201165 (512)
> Dirty State : dirty
>
> Disk00 Serial : Z1W50P5E
> State : active
> Id : 00000000
> Usable Size : 1953519880 (931.51 GiB 1000.20 GB)
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1587142/+subscriptions