[Bug 1557761] Re: rc-sysinit run twice due to failsafe race condition
Will Bryant
will.bryant at gmail.com
Tue Mar 15 21:56:16 UTC 2016
Current versions:
will@nz-stg-app-wlg-d7:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
will@nz-stg-app-wlg-d7:~$ uname -a
Linux nz-stg-app-wlg-d7 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:27:58 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
will@nz-stg-app-wlg-d7:~$ dpkg -l upstart
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name     Version            Architecture  Description
+++-========-==================-=============-========================
ii  upstart  1.12.1-0ubuntu4.2  amd64         event-based init daemon
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1557761
Title:
rc-sysinit run twice due to failsafe race condition
Status in upstart package in Ubuntu:
New
Bug description:
We have noticed that after the upgrade from 12.04 to 14.04, daemons
that are started from rc.d scripts are sometimes being run twice.
We've tracked this down to a race condition in the failsafe script.
Here is the normal sequence of events:
Mar 16 10:39:37 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:39:37 failsafe: net-device-up start event emitted
Mar 16 10:39:37 failsafe: starting failsafe script
Mar 16 10:39:37 failsafe: sleeping in failsafe script
Mar 16 10:39:37 failsafe: static-network-up start event emitted
Mar 16 10:39:37 failsafe: rc-sysinit starting event emitted
Mar 16 10:39:37 kernel: [ 2.056689] init: failsafe main process (642) killed by TERM signal
(The "Failsafe of 120 seconds reached" message is inaccurate: it is
actually logged immediately on boot, so it is best ignored. The TERM
warning is also harmless; it is the normal result.)
Here is what we see on a bad boot, where the rc.d scripts are started
twice:
Mar 16 10:24:47 failsafe: static-network-up start event emitted
Mar 16 10:24:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:24:47 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:24:47 failsafe: net-device-up start event emitted
Mar 16 10:24:47 failsafe: starting failsafe script
Mar 16 10:24:47 failsafe: sleeping in failsafe script
Mar 16 10:26:47 failsafe: emitting from failsafe script
Mar 16 10:26:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:26:47 kernel: [ 122.229597] init: failsafe main process (797) killed by TERM signal
rc-sysinit has been emitted twice.
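A quick way to spot an affected boot is to count the rc-sysinit lines in
the failsafe log output. The following is a minimal Python sketch,
assuming the log lines look like the excerpts above; the function name
and the sample lists are illustrative only and are not part of upstart.

```python
# Marker text taken from the log excerpts in this report.
MARKER = "rc-sysinit starting event emitted"


def count_rc_sysinit_emissions(log_lines):
    """Return how many log lines record rc-sysinit being started."""
    return sum(1 for line in log_lines if MARKER in line)


# Sample lines copied from the normal boot shown in this report.
NORMAL_BOOT = [
    "Mar 16 10:39:37 failsafe: starting failsafe script",
    "Mar 16 10:39:37 failsafe: sleeping in failsafe script",
    "Mar 16 10:39:37 failsafe: static-network-up start event emitted",
    "Mar 16 10:39:37 failsafe: rc-sysinit starting event emitted",
]

# Sample lines copied from the bad boot shown in this report.
BAD_BOOT = [
    "Mar 16 10:24:47 failsafe: static-network-up start event emitted",
    "Mar 16 10:24:47 failsafe: rc-sysinit starting event emitted",
    "Mar 16 10:24:47 failsafe: starting failsafe script",
    "Mar 16 10:24:47 failsafe: sleeping in failsafe script",
    "Mar 16 10:26:47 failsafe: emitting from failsafe script",
    "Mar 16 10:26:47 failsafe: rc-sysinit starting event emitted",
]

if __name__ == "__main__":
    print("normal boot:", count_rc_sysinit_emissions(NORMAL_BOOT))
    print("bad boot:", count_rc_sysinit_emissions(BAD_BOOT))
```

A count above one for a single boot would indicate the double start
described here.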
Note that the rc-sysinit event has been emitted before the failsafe
script has been started, because in this boot it happens that the
static-network-up event was emitted before the net-device-up event.
As a result, the normal "stop on starting rc-sysinit" rule in the
failsafe job definition has no effect, because the failsafe job is not
yet running when rc-sysinit starts.
Another way to look at the issue is that the rc-sysinit job
definition's "start on (filesystem and static-network-up) or
failsafe-boot" means that it will start twice whenever it finishes
before the failsafe handler fires.
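The ordering dependence can be illustrated with a small Python model.
This is only a sketch under simplifying assumptions, not upstart's
actual logic: it assumes the failsafe job starts when net-device-up is
seen, that its timer fires after all listed events unless the job was
stopped first, and that rc-sysinit follows the "start on" rule quoted
above. The function name simulate_boot is hypothetical.

```python
def simulate_boot(events):
    """Replay boot events in order; return how often rc-sysinit starts.

    Simplified model (assumption): the failsafe timer fires after the
    last listed event if the failsafe job is still running by then.
    """
    seen = set()
    failsafe_running = False
    rc_sysinit_starts = 0
    needed = {"filesystem", "static-network-up"}

    for event in events:
        seen.add(event)

        if event == "net-device-up":
            # Assumed (simplified) trigger for the failsafe job.
            failsafe_running = True

        # rc-sysinit: start on (filesystem and static-network-up) ...
        if event in needed and needed <= seen:
            # "stop on starting rc-sysinit" can only stop a failsafe
            # job that is already running at this point.
            failsafe_running = False
            rc_sysinit_starts += 1

    # ... or failsafe-boot, emitted when the failsafe timer expires.
    if failsafe_running:
        rc_sysinit_starts += 1

    return rc_sysinit_starts


if __name__ == "__main__":
    good = ["filesystem", "net-device-up", "static-network-up"]
    bad = ["filesystem", "static-network-up", "net-device-up"]
    print("good order:", simulate_boot(good))
    print("bad order:", simulate_boot(bad))
```

In this model the only difference between the two runs is whether
static-network-up arrives before or after net-device-up, which matches
the two log excerpts in this report.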
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/1557761/+subscriptions