[Bug 1569925] Re: Shutdown hang on 16.04 with iscsi targets

Tue Jun 6 15:11:18 UTC 2017

@gimpeystrada: Thank you for providing your journal logs (in the future,
providing the raw journal file is better, as then we can run
`journalctl` on it locally). From the logs:

Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: initiator reported error (8 - connection timed out)
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: Could not log into all portals
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: :0113,3260] successful.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Child 3234 belongs to open-iscsi.service
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Main process exited, code=exited, status=8/n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Changed start -> failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd-logind[2801]: Got message type=signal sender=:1.2 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Job open-iscsi.service/start finished, result=failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Failed to start Login to default iSCSI targets.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=937 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Unit entered failed state.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Failed with result 'exit-code'.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: cgroup is empty

I *think* the issue in this case is that open-iscsi.service is actually
in a failed state before shutdown, which I think means it does not run
the ExecStop commands? Can you verify that is the case (boot your
system, `systemctl is-system-running` should indicate "degraded",
`systemctl status open-iscsi.service` should indicate Failed). The
problem is that the open-iscsi.service has:

ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh

This indicates it, on 'start' of the service, it will try to login to
and active all configured iSCSI targets (the two ExecStart lines).
However, if either of those fail (as they did in the journal in this
case), the ExecStop lines *do not* run. From `man systemd.service`:

           Note that if any of the commands specified in ExecStartPre=,
           ExecStart=, or ExecStartPost= fail (and are not prefixed with "-",
           see above) or time out before the service is fully up, execution
           continues with commands specified in ExecStopPost=, the commands in
           ExecStop= are skipped.

So, I think that we should be using ExecStopPost instead.

@gimpeystrada, can you test locally if your system works correctly by
editing /lib/systemd/system/open-iscsi.service to use ExecStopPost
rather than ExecStop? You will need to run `systemctl daemon-reload`
after editing the service file, I believe.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1569925

Title:
  Shutdown hang on 16.04 with iscsi targets

Status in systemd package in Ubuntu:
  Confirmed

Bug description:
  I have 4 servers running the latest 16.04 updates from the development
  branch (as of right now).

  Each server is connected to NetApp storage using iscsi software
  initiator.  There are a total of 56 volumes spread across two NetApp
  arrays.  Each volume has 4 paths available to it which are being
  managed by device mapper.

  While logged into the iscsi sessions all I have to do is reboot the
  server and I get a hang.

  I see a message that says:

    "Reached target Shutdown"

  followed by

    "systemd-shutdown[1]: Failed to finalize DM devices, ignoring"

  and then I see 8 lines that say:

    "connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    "connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
    NOTE: the actual values of the *'s differ for each line above.

  This seems like a bug somewhere but I am unaware of any additional
  logging that I could turn on to pinpoint the problem.

  Note I also have similar setups that are not doing iscsi and they
  don't have this problem.

  Here is a screenshot of what I see on the shell when I try to reboot:

  (https://launchpadlibrarian.net/291303059/Screenshot.jpg)

  This is being tracked in NetApp bug tracker CQ number 860251.

  If I log out of all iscsi sessions before rebooting then I do not
  experience the hang:

  iscsiadm -m node -U all

  We are wondering if this could be some kind of shutdown ordering
  problem.  Like the network devices have already disappeared and then
  iscsi tries to perform some operation (hence the ping timeouts).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1569925/+subscriptions