[Bug 1569925] Re: Shutdown hang on 16.04 with iscsi targets

Rafael David Tinoco rafael.tinoco at canonical.com
Wed Aug 23 14:22:59 UTC 2017


Hypothesis,

Test (1) - The error is NEVER propagated to upper layers:

# xfs and ext4 mounted automatically

inaddy at iscsihang:~$ mount | grep _netde
/dev/sda1 on /ext4 type ext4 (rw,relatime,stripe=32,data=ordered,_netdev)
/dev/sdb1 on /xfs type xfs (rw,relatime,attr2,inode64,noquota,_netdev)

# no error propagation

inaddy at iscsihang:~$ sudo iscsiadm -m node -o show | grep timeo.replace
node.session.timeo.replacement_timeout = -1
node.session.timeo.replacement_timeout = -1
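
According to the comments in open-iscsi's iscsid.conf, a negative
replacement_timeout keeps I/O queued until the session logs back in or
is logged out, so no error ever reaches the filesystems. As a rough
sketch (not part of the original test), the helper below counts node
records configured that way. The function name count_unbounded_timeouts
is made up for illustration, and it assumes the "name = value" layout
shown above.

```shell
# Count node records whose replacement_timeout is negative, i.e. I/O is
# queued indefinitely instead of being failed to the upper layers.
# Reads `iscsiadm -m node -o show` style output on stdin.
# NOTE: count_unbounded_timeouts is a hypothetical helper name.
count_unbounded_timeouts() {
    awk -F' = ' '
        $1 == "node.session.timeo.replacement_timeout" && ($2 + 0) < 0 { n++ }
        END { print n + 0 }
    '
}

# Possible usage on a live initiator (needs root):
#   sudo iscsiadm -m node -o show | count_unbounded_timeouts
```

On the setup above this should report both node records.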

# target server drops all iSCSI traffic (TCP port 3260) coming from the guest:

inaddy at machete:~$ sudo iptables -A INPUT -s 192.168.49.8 -p tcp --destination-port 3260 -j DROP

# reboot can't succeed

inaddy at iscsihang:~$ sudo reboot

[   27.596135]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294896692, last ping 4294897944, now 4294899196
[   27.628109]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294896700, last ping 4294897952, now 4294899204

Systemd hangs forever:

[  OK  ] Stopped target Remote File Systems.
         Unmounting /ext4...
         Unmounting /xfs...

Note: There is a tight relationship between the connection disappearing
before the umount service runs and systemd's ability to shut the machine
down completely. I would say that, with no error propagation, it is even
worse, since the kernel would be locked up forever:

[  240.132208] INFO: task systemd:1094 blocked for more than 120 seconds.
[  240.133499]       Not tainted 4.4.0-93-generic #116-Ubuntu
[  240.134544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.136092] INFO: task umount:1199 blocked for more than 120 seconds.
[  240.137262]       Not tainted 4.4.0-93-generic #116-Ubuntu
[  240.138302] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.139742] INFO: task umount:1201 blocked for more than 120 seconds.
[  240.140898]       Not tainted 4.4.0-93-generic #116-Ubuntu
[  240.141953] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
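
To see at a glance which tasks the hung-task detector is complaining
about, something like the following could be used. It is only a sketch:
the function name list_blocked_tasks is made up, and it assumes the
"INFO: task NAME:PID blocked for more than" message format seen in the
log above.

```shell
# Print "name pid" once for every task reported as blocked by the kernel
# hung-task detector. Reads kernel log text (e.g. from dmesg) on stdin.
# NOTE: list_blocked_tasks is a hypothetical helper name.
list_blocked_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than .*/\1 \2/p' |
        LC_ALL=C sort -u
}

# Possible usage:
#   dmesg | list_blocked_tasks
```

Repeated reports for the same task (one every 120 seconds here) collapse
into a single line.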

Systemd is still trying...

[  OK  ] Unmounted /ext4.
[  OK  ] Unmounted /xfs.
[  OK  ] Stopped File System Check on /dev/disk/by-label/XFS.
[  OK  ] Stopped File System Check on /dev/disk/by-label/EXT4.
[  OK  ] Removed slice system-systemd\x2dfsck.slice.
[  OK  ] Stopped target Remote File Systems (Pre).
         Stopping Login to default iSCSI targets...

[  360.140109] INFO: task systemd:1094 blocked for more than 120 seconds.
[  360.141219]       Not tainted 4.4.0-93-generic #116-Ubuntu
[  360.142100] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.143377] INFO: task umount:1199 blocked for more than 120 seconds.
[  360.144451]       Not tainted 4.4.0-93-generic #116-Ubuntu
[  360.145333] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.146576] INFO: task umount:1201 blocked for more than 120 seconds.
[  360.147586]       Not tainted 4.4.0-93-generic #116-Ubuntu
[  360.148472] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

This will go on forever. I still have to find a way to make systemd
shut down the network and cause this hang, because the error is likely
propagated only after the umount service gives up (or something like
it) <-- theory.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1569925

Title:
  Shutdown hang on 16.04 with iscsi targets

Status in systemd package in Ubuntu:
  Confirmed
Status in systemd source package in Xenial:
  In Progress

Bug description:
  I have 4 servers running the latest 16.04 updates from the development
  branch (as of right now).

  Each server is connected to NetApp storage using iscsi software
  initiator.  There are a total of 56 volumes spread across two NetApp
  arrays.  Each volume has 4 paths available to it which are being
  managed by device mapper.

  While logged into the iscsi sessions all I have to do is reboot the
  server and I get a hang.

  I see a message that says:

    "Reached target Shutdown"

  followed by

    "systemd-shutdown[1]: Failed to finalize DM devices, ignoring"

  and then I see 8 lines that say:

    "connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    NOTE: the actual values of the *'s differ for each line above.

  This seems like a bug somewhere but I am unaware of any additional
  logging that I could turn on to pinpoint the problem.

  Note I also have similar setups that are not doing iscsi and they
  don't have this problem.

  Here is a screenshot of what I see on the shell when I try to reboot:

  (https://launchpadlibrarian.net/291303059/Screenshot.jpg)

  This is being tracked in NetApp bug tracker CQ number 860251.

  If I log out of all iscsi sessions before rebooting then I do not
  experience the hang:

  iscsiadm -m node -U all

  We are wondering if this could be some kind of shutdown ordering
  problem: for example, the network devices have already disappeared
  when iscsi tries to perform some operation (hence the ping timeouts).



