[Bug 1896614] Re: Race condition when starting dbus services
Victor Tapia
1896614 at bugs.launchpad.net
Wed Oct 28 15:19:54 UTC 2020
# VERIFICATION
Note: As a reminder, the issue here is a race condition between any DBUS
service and systemctl daemon-reload: systemd adds the DBUS match rule
(AddMatch) that watches for a name owner change after that change has
already happened. I'll be using systemd-logind as the DBUS service in my
reproducer.
Using the following reproducer:
for i in $(seq 1 1000); do echo $i; ssh $SERVER 'sudo systemctl daemon-reload & sudo systemctl restart systemd-logind'; done
- With systemd=237-3ubuntu10.42 (-updates), after a few runs, systemd-logind is stuck as a running job and ssh becomes unresponsive. DBUS messages[1] show that systemd sets the AddMatch filter after systemd-logind has already acquired its final name (org.freedesktop.login1).
- With systemd=237-3ubuntu10.43 (-proposed), systemd-logind does not get stuck and everything continues to work. In a scenario[2] where the systemd DBUS AddMatch message arrives after the final systemd-logind NameOwnerChanged, systemd is able to catch up thanks to the GetNameOwner call introduced in the patch.
[1] https://pastebin.ubuntu.com/p/NxRNX9bwCP/
[2] https://pastebin.ubuntu.com/p/jpKpW3g2bK/
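
To make the ordering problem concrete, here is a toy model of the two cases above. It is only a sketch and not systemd code: the function name track_service, the event names used as arguments and the "job done"/"job running" strings are made up for illustration, and the only behaviour it assumes is what is described above (a NameOwnerChanged signal emitted before the match exists is missed, and the patched systemd follows AddMatch with GetNameOwner).

```shell
# Toy model of the race (illustrative only, not systemd code). Usage:
#   track_service <patched|unpatched> <event>...
# Events are replayed in the order the bus sees them; the function prints
# the state the start job would end up in.
track_service() {
    mode=$1
    shift
    owner_on_bus=no     # has the service acquired its bus name yet?
    match_installed=no  # has systemd's AddMatch been processed yet?
    start_seen=no       # does systemd know the service is up?
    for event in "$@"; do
        case $event in
            NameOwnerChanged)
                owner_on_bus=yes
                # The signal only reaches systemd if the match already exists.
                if [ "$match_installed" = yes ]; then
                    start_seen=yes
                fi
                ;;
            AddMatch)
                match_installed=yes
                # Patched behaviour: query the current owner (GetNameOwner),
                # which reveals an owner that appeared before the match.
                if [ "$mode" = patched ] && [ "$owner_on_bus" = yes ]; then
                    start_seen=yes
                fi
                ;;
        esac
    done
    if [ "$start_seen" = yes ]; then
        echo "job done"
    else
        echo "job running"
    fi
}

track_service unpatched AddMatch NameOwnerChanged   # match first: no race
track_service unpatched NameOwnerChanged AddMatch   # match too late: job stays running
track_service patched   NameOwnerChanged AddMatch   # match too late: GetNameOwner catches up
```

The second call corresponds to the 10.42 behaviour in [1], the third to the 10.43 behaviour in [2].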
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1896614
Title:
Race condition when starting dbus services
Status in systemd:
Fix Released
Status in systemd package in Ubuntu:
Fix Released
Status in systemd source package in Bionic:
Fix Committed
Bug description:
[impact]
In certain scenarios, such as high load environments or when
"systemctl daemon-reload" runs at the same time a dbus service is
starting (e.g. systemd-logind), systemd is not able to track properly
when the service has started, keeping the job 'running' forever.
[test case]
set up a 1-cpu VM with Bionic, and configure the system with an ssh key
so the user can ssh to localhost. Then run something like:
$ while timeout 5 ssh localhost true; do echo 'reloading'; sudo systemctl restart systemd-logind & sudo systemctl daemon-reload; done
if that doesn't work try:
$ while timeout 5 ssh localhost true; do echo 'reloading'; sudo sh -c 'systemctl restart systemd-logind & systemctl daemon-reload'; done
once the reproducer exits the while loop, there should be a running job for systemd-logind, and any logins attempted after the bug is reproduced should also hang waiting for the systemd-logind job to complete, e.g.:
ubuntu@lp1896614-b:~$ systemctl list-jobs
JOB UNIT                   TYPE  STATE
525 systemd-logind.service start running
669 session-6.scope        start waiting
664 session-5.scope        start waiting

3 jobs listed.
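
if you want to check for the stuck job from a script rather than by eye, something like the following may help. it is only a sketch: stuck_jobs is a made-up helper name, and it assumes the four-column list-jobs layout shown above (JOB, UNIT, TYPE, STATE).

```shell
# Read `systemctl list-jobs` output on stdin and print the unit name of
# every job whose state is "running". The header line and the
# "N jobs listed." trailer are skipped: the header does not start with a
# number, and the trailer has no fourth field equal to "running".
stuck_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $4 == "running" { print $2 }'
}

# Example with the output shown above:
printf '%s\n' \
    'JOB UNIT                   TYPE  STATE' \
    '525 systemd-logind.service start running' \
    '669 session-6.scope        start waiting' \
    '664 session-5.scope        start waiting' \
    '' \
    '3 jobs listed.' | stuck_jobs
```

on a live system this would be used as "systemctl list-jobs | stuck_jobs". note that a job can legitimately be in the running state for a short time, so a unit only counts as stuck if it is still listed after the reproducer loop has exited.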
[regression potential]
any regression would likely involve services that are Type=dbus
failing to complete starting. as with any systemd change, regressions
could also involve assertion failures in systemd that cause it to
exit.
[scope]
this is needed only for bionic.
this is fixed upstream with commit
a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870 which is included starting
in v243, so it is included already in focal and later.
(per upstream bug) this was introduced by upstream commit
75152a4d6aedbfd3ee8b2d5782b9edf27407622a which was included starting
in v237, so this bug is not present in xenial or earlier.
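
the version range above can be written down as a small check. this is only a sketch: upstream_affected is a made-up helper name, the sample versions in the loop are arbitrary, and it looks at the upstream release number only, so it says nothing about distro packages that carry the backported fix (e.g. 237-3ubuntu10.43).

```shell
# An upstream systemd release N carries the bug when 237 <= N < 243:
# introduced by commit 75152a4d (first in v237), fixed by commit
# a5a8776a (first in v243).
upstream_affected() {
    [ "$1" -ge 237 ] && [ "$1" -lt 243 ]
}

for v in 229 237 242 243 245; do
    if upstream_affected "$v"; then
        echo "v$v: affected"
    else
        echo "v$v: not affected"
    fi
done
```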
[original description]
In certain scenarios, such as high load environments or when
"systemctl daemon-reload" runs at the same time a dbus service is
starting (e.g. systemd-logind), systemd is not able to track properly
when the service has started, keeping the job 'running' forever.
The issue appears when systemd issues the "AddMatch" dbus method call to
track the service's "NameOwnerChanged" signal after that signal has
already been emitted. A working instance would look like this:
https://pastebin.ubuntu.com/p/868J6WBRQx/
A failing instance would be:
https://pastebin.ubuntu.com/p/HhJZ4p8dT5/
I've been able to reproduce the issue on Bionic (237-3ubuntu10.42)
running:
sudo systemctl daemon-reload & sudo systemctl restart systemd-logind
To manage notifications about this bug go to:
https://bugs.launchpad.net/systemd/+bug/1896614/+subscriptions