[Bug 1569925] Re: Shutdown hang on 16.04 with iscsi targets
Nish Aravamudan
nish.aravamudan at canonical.com
Tue Jun 6 15:11:18 UTC 2017
@gimpeystrada: Thank you for providing your journal logs (in the future,
providing the raw journal file is better, as then we can run
`journalctl` on it locally). From the logs:
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: initiator reported error (8 - connection timed out)
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: iscsiadm: Could not log into all portals
Apr 26 08:52:13 ICTM1612S02H1 iscsiadm[3234]: :0113,3260] successful.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Child 3234 belongs to open-iscsi.service
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Main process exited, code=exited, status=8/n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Changed start -> failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd-logind[2801]: Got message type=signal sender=:1.2 destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.DBus.Properties member=PropertiesChanged cookie=936 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Job open-iscsi.service/start finished, result=failed
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Failed to start Login to default iSCSI targets.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: Sent message type=signal sender=n/a destination=n/a object=/org/freedesktop/systemd1 interface=org.freedesktop.systemd1.Manager member=JobRemoved cookie=937 reply_cookie=0 error=n/a
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Unit entered failed state.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: Failed with result 'exit-code'.
Apr 26 08:52:13 ICTM1612S02H1 systemd[1]: open-iscsi.service: cgroup is empty
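The raw journal file is more useful than pasted excerpts like the ones above. Here is a minimal sketch of collecting it and reading it locally. It assumes persistent journald storage under /var/log/journal, which is not guaranteed on every install. The function names are hypothetical and only wrap standard `tar` and `journalctl --directory`/`-u` usage:

```shell
#!/bin/sh
# Hypothetical helpers; assumes the journal is persisted under /var/log/journal.

# On the affected machine (as root): bundle the raw journal files.
# -C / keeps the archive paths relative (var/log/journal/...).
pack_journal() {
  tar -czf "${1:-journal.tar.gz}" -C / var/log/journal
}

# Locally, after extracting the archive into DIR: show only the iSCSI units.
read_iscsi_journal() {
  journalctl --directory="$1/var/log/journal" -u open-iscsi.service -u iscsid.service
}
```

For example, run `pack_journal /tmp/journal.tar.gz` on the server. Then, locally, run `mkdir extracted && tar -xzf journal.tar.gz -C extracted` followed by `read_iscsi_journal ./extracted`.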
I *think* the issue in this case is that open-iscsi.service is already
in a failed state before shutdown, which I think means its ExecStop
commands are never run. Can you verify that this is the case? Boot your
system; `systemctl is-system-running` should report "degraded", and
`systemctl status open-iscsi.service` should show the unit as failed.
The problem is that open-iscsi.service has:
ExecStartPre=/bin/systemctl --quiet is-active iscsid.service
ExecStart=/sbin/iscsiadm -m node --loginall=automatic
ExecStart=/lib/open-iscsi/activate-storage.sh
ExecStop=/lib/open-iscsi/umountiscsi.sh
ExecStop=/bin/sync
ExecStop=/lib/open-iscsi/logout-all.sh
This means that, on 'start' of the service, it will try to log in to
and activate all configured iSCSI targets (the two ExecStart lines).
However, if either of those fails (as happened in the journal in this
case), the ExecStop lines *do not* run. From `man systemd.service`:
Note that if any of the commands specified in ExecStartPre=,
ExecStart=, or ExecStartPost= fail (and are not prefixed with "-",
see above) or time out before the service is fully up, execution
continues with commands specified in ExecStopPost=, the commands in
ExecStop= are skipped.
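The following toy shell model makes the quoted rule concrete. `model_unit` is a hypothetical name. It only mimics the documented ordering, not systemd's real state machine; for example, it ignores ExecStartPre= and timeouts:

```shell
#!/bin/sh
# Toy model of the rule quoted from systemd.service(5). Hypothetical; this
# is not how systemd is implemented.
# Arguments: START_CMD STOP_CMD STOP_POST_CMD (each run with sh -c).
model_unit() {
  if sh -c "$1"; then
    echo "start succeeded: unit is active"
    # Later, when the unit is stopped (e.g. at shutdown):
    # ExecStop= runs first, then ExecStopPost=.
    sh -c "$2"
  else
    # Start failed: ExecStop= is skipped entirely, and execution continues
    # with ExecStopPost= right away; systemd then marks the unit "failed".
    echo "start failed: ExecStop= skipped"
  fi
  sh -c "$3"
}
```

For example, `model_unit 'exit 8' 'echo stop' 'echo stop-post'` (exit status 8 mirrors the journal above) prints `start failed: ExecStop= skipped` followed by `stop-post`. In the failure branch, ExecStopPost= runs as part of the failed start itself, which is what the quoted man page describes.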
So, I think that we should be using ExecStopPost instead.
@gimpeystrada, can you test locally whether your system shuts down
correctly after editing /lib/systemd/system/open-iscsi.service to use
ExecStopPost rather than ExecStop? You will need to run
`systemctl daemon-reload` after editing the service file.
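Instead of editing the packaged file under /lib, which a package upgrade would overwrite, you can run the same experiment with a drop-in override. Here is a minimal sketch, assuming the standard /etc/systemd/system/open-iscsi.service.d/ drop-in directory. The function name and the stop-post.conf file name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that moves open-iscsi's stop commands
# from ExecStop= to ExecStopPost=. DIR defaults to the standard drop-in path.
write_stop_post_dropin() {
  dir="${1:-/etc/systemd/system/open-iscsi.service.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/stop-post.conf" <<'EOF'
[Service]
# An empty assignment resets the ExecStop= list inherited from the packaged unit.
ExecStop=
ExecStopPost=/lib/open-iscsi/umountiscsi.sh
ExecStopPost=/bin/sync
ExecStopPost=/lib/open-iscsi/logout-all.sh
EOF
}
```

Run it as root, then run `systemctl daemon-reload`. After that, `systemctl cat open-iscsi.service` should list the drop-in after the packaged unit. `systemctl edit open-iscsi.service` is the interactive equivalent.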
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1569925
Title:
Shutdown hang on 16.04 with iscsi targets
Status in systemd package in Ubuntu:
Confirmed
Bug description:
I have 4 servers running the latest 16.04 updates from the development
branch (as of right now).
  Each server is connected to NetApp storage using the iSCSI software
  initiator. There are a total of 56 volumes spread across two NetApp
  arrays. Each volume has 4 paths available to it, which are managed
  by device mapper.
  While the iSCSI sessions are logged in, all I have to do is reboot
  the server and it hangs.
I see a message that says:
"Reached target Shutdown"
followed by
"systemd-shutdown[1]: Failed to finalize DM devices, ignoring"
and then I see 8 lines that say:
"connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection5:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection6:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection7:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
"connection8:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311815***, last ping 43118164**, now 4311817***"
NOTE: the actual values of the *'s differ for each line above.
  This seems like a bug somewhere, but I am not aware of any additional
  logging that I could turn on to pinpoint the problem.
  Note that I also have similar setups that do not use iSCSI, and they
  don't have this problem.
Here is a screenshot of what I see on the shell when I try to reboot:
(https://launchpadlibrarian.net/291303059/Screenshot.jpg)
This is being tracked in NetApp bug tracker CQ number 860251.
If I log out of all iscsi sessions before rebooting then I do not
experience the hang:
iscsiadm -m node -U all
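You can wrap the workaround above in a small script so that the reboot only proceeds once the logout has succeeded. This is a minimal sketch. The function name is hypothetical, and the script only chains the reporter's `iscsiadm -m node -U all` with `systemctl reboot`. Unlike the packaged service, it does not run umountiscsi.sh or sync first:

```shell
#!/bin/sh
# Hypothetical wrapper for the manual workaround: log out of all iSCSI
# sessions first, and reboot only if the logout succeeded.
logout_then_reboot() {
  if iscsiadm -m node -U all; then
    systemctl reboot
  else
    echo "iSCSI logout failed; not rebooting" >&2
    return 1
  fi
}
```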
  We are wondering whether this could be some kind of shutdown ordering
  problem: for example, the network devices have already disappeared
  when iSCSI tries to perform some operation (hence the ping timeouts).