[Bug 1896614] Re: Race condition when starting dbus services

Wed Sep 23 17:53:41 UTC 2020

** Description changed:

+ [impact]
+ 
+ In certain scenarios, such as high load environments or when "systemctl
+ daemon-reload" runs at the same time a dbus service is starting (e.g.
+ systemd-logind), systemd is not able to track properly when the service
+ has started, keeping the job 'running' forever.
+ 
+ [test case]
+ 
+ Set up a 1-CPU VM with Bionic, and configure the system with an SSH key
+ so the user can ssh to localhost. Then run:
+ 
+ ubuntu@lp1896614-b:~$ while timeout 5 ssh localhost true; do echo 'reloading'; \
+     sudo systemctl restart systemd-logind & sudo systemctl daemon-reload; done
+ 
+ That should exit the while loop after only a few attempts. At that
+ point, there should be a running job for systemd-logind, and any logins
+ attempted after the bug is reproduced should also hang waiting for the
+ systemd-logind job to complete, e.g.:
+ 
+ ubuntu@lp1896614-b:~$ systemctl list-jobs
+ JOB UNIT                   TYPE  STATE  
+ 525 systemd-logind.service start running
+ 669 session-6.scope        start waiting
+ 664 session-5.scope        start waiting
+ 
+ 3 jobs listed.
+ 
+ [regression potential]
+ 
+ Any regression would likely involve services that are Type=dbus failing
+ to complete starting. As with any systemd change, regressions could also
+ involve assertion failures in systemd, which cause it to exit.
+ 
+ [scope]
+ 
+ This is needed only for Bionic.
+ 
+ TBD - needed for xenial?
+ 
+ This is fixed upstream with commit
+ a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870, which is included starting in
+ v243, so it is already included in Focal and later.
+ 
+ [original description]
+ 
  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  The issue appears when systemd runs the "AddMatch" dbus method call to
  track the service's "NameOwnerChanged" once it has already run. A working
  instance would look like this:

  https://pastebin.ubuntu.com/p/868J6WBRQx/

  A failing instance would be:

  https://pastebin.ubuntu.com/p/HhJZ4p8dT5/

  I've been able to reproduce the issue on Bionic (237-3ubuntu10.42)
  running:

  sudo systemctl daemon-reload & sudo systemctl restart systemd-logind

** Also affects: systemd via
   https://github.com/systemd/systemd/issues/12956
   Importance: Unknown
       Status: Unknown

** Description changed:

  [impact]

  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  [test case]

  Set up a 1-CPU VM with Bionic, and configure the system with an SSH key
  so the user can ssh to localhost. Then run:

  ubuntu@lp1896614-b:~$ while timeout 5 ssh localhost true; do echo 'reloading'; \
      sudo systemctl restart systemd-logind & sudo systemctl daemon-reload; done

  That should exit the while loop after only a few attempts. At that
  point, there should be a running job for systemd-logind, and any logins
  attempted after the bug is reproduced should also hang waiting for the
  systemd-logind job to complete, e.g.:

  ubuntu@lp1896614-b:~$ systemctl list-jobs
- JOB UNIT                   TYPE  STATE  
+ JOB UNIT                   TYPE  STATE
  525 systemd-logind.service start running
  669 session-6.scope        start waiting
  664 session-5.scope        start waiting

  3 jobs listed.

  [regression potential]

  Any regression would likely involve services that are Type=dbus failing
  to complete starting. As with any systemd change, regressions could also
  involve assertion failures in systemd, which cause it to exit.

  [scope]

  This is needed only for Bionic.

- TBD - needed for xenial?
- 
  This is fixed upstream with commit
  a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870, which is included starting in
  v243, so it is already included in Focal and later.
+ 
+ (Per upstream bug) This was introduced by upstream commit
+ 75152a4d6aedbfd3ee8b2d5782b9edf27407622a, which was included starting in
+ v237, so this bug is not present in Xenial or earlier.

  [original description]

  In certain scenarios, such as high load environments or when "systemctl
  daemon-reload" runs at the same time a dbus service is starting (e.g.
  systemd-logind), systemd is not able to track properly when the service
  has started, keeping the job 'running' forever.

  The issue appears when systemd runs the "AddMatch" dbus method call to
  track the service's "NameOwnerChanged" once it has already run. A working
  instance would look like this:

  https://pastebin.ubuntu.com/p/868J6WBRQx/

  A failing instance would be:

  https://pastebin.ubuntu.com/p/HhJZ4p8dT5/

  I've been able to reproduce the issue on Bionic (237-3ubuntu10.42)
  running:

  sudo systemctl daemon-reload & sudo systemctl restart systemd-logind

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1896614

Title:
  Race condition when starting dbus services

Status in systemd:
  Unknown
Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Bionic:
  In Progress

Bug description:
  [impact]

  In certain scenarios, such as high load environments or when
  "systemctl daemon-reload" runs at the same time a dbus service is
  starting (e.g. systemd-logind), systemd is not able to track properly
  when the service has started, keeping the job 'running' forever.

  [test case]

  Set up a 1-CPU VM with Bionic, and configure the system with an SSH
  key so the user can ssh to localhost. Then run:

  ubuntu@lp1896614-b:~$ while timeout 5 ssh localhost true; do echo 'reloading'; \
      sudo systemctl restart systemd-logind & sudo systemctl daemon-reload; done

  That should exit the while loop after only a few attempts. At that
  point, there should be a running job for systemd-logind, and any
  logins attempted after the bug is reproduced should also hang waiting
  for the systemd-logind job to complete, e.g.:

  ubuntu@lp1896614-b:~$ systemctl list-jobs
  JOB UNIT                   TYPE  STATE
  525 systemd-logind.service start running
  669 session-6.scope        start waiting
  664 session-5.scope        start waiting

  3 jobs listed.
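
The check above can also be scripted. The following is a minimal sketch, not part of systemd: the helper name running_start_jobs is hypothetical, and it only assumes `systemctl list-jobs` output in the four-column shape shown above (JOB, UNIT, TYPE, STATE).

```shell
#!/bin/sh
# Hypothetical helper (not a systemd tool): read `systemctl list-jobs`
# output on stdin and print the job ID of every "start" job for the given
# unit that is in the "running" state.
running_start_jobs() {
    unit="$1"
    # Columns are: JOB UNIT TYPE STATE. The header line and the
    # "N jobs listed." footer never match because field 2 is not the unit.
    awk -v unit="$unit" \
        '$2 == unit && $3 == "start" && $4 == "running" { print $1 }'
}
```

On a test machine it could be used as `systemctl list-jobs | running_start_jobs systemd-logind.service`. A start job is briefly in the running state during any normal start, so only an ID that is still printed after several seconds suggests the hang described here.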

  [regression potential]

  Any regression would likely involve services that are Type=dbus
  failing to complete starting. As with any systemd change, regressions
  could also involve assertion failures in systemd, which cause it to
  exit.

  [scope]

  This is needed only for Bionic.

  This is fixed upstream with commit
  a5a8776ae5e4244b7f5acb2a1bfbe6e0b4d8a870, which is included starting
  in v243, so it is already included in Focal and later.

  (Per upstream bug) This was introduced by upstream commit
  75152a4d6aedbfd3ee8b2d5782b9edf27407622a, which was included starting
  in v237, so this bug is not present in Xenial or earlier.
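
The version window stated above (introduced in v237, fixed in v243) can be turned into a rough first check. This is only a sketch under that assumption: the helper names are hypothetical, and a distribution package may carry the fix as a backport, so the upstream version number alone is not conclusive.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the window given in this report:
# introduced in upstream v237, fixed in upstream v243.

# Succeed (exit 0) when the given upstream version is >= 237 and < 243.
in_affected_range() {
    [ "$1" -ge 237 ] && [ "$1" -lt 243 ]
}

# The first line of `systemctl --version` starts with "systemd <number>";
# print that number.
parse_systemd_version() {
    awk 'NR == 1 { print $2 }'
}
```

For example, `in_affected_range "$(systemctl --version | parse_systemd_version)" && echo 'possibly affected'` would print the message on an unpatched Bionic system (237) and stay silent on Xenial (229) or Focal (245).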

  [original description]

  In certain scenarios, such as high load environments or when
  "systemctl daemon-reload" runs at the same time a dbus service is
  starting (e.g. systemd-logind), systemd is not able to track properly
  when the service has started, keeping the job 'running' forever.

  The issue appears when systemd runs the "AddMatch" dbus method call to
  track the service's "NameOwnerChanged" once it has already run. A
  working instance would look like this:

  https://pastebin.ubuntu.com/p/868J6WBRQx/

  A failing instance would be:

  https://pastebin.ubuntu.com/p/HhJZ4p8dT5/
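
To watch the signal involved on a test system, an ordinary D-Bus match rule can be passed to dbus-monitor. The sketch below assumes logind's well-known bus name org.freedesktop.login1; the helper name is hypothetical, and the rule it builds is a generic one, not necessarily the exact string systemd passes to "AddMatch".

```shell
#!/bin/sh
# Hypothetical helper: print a D-Bus match rule that selects the bus
# daemon's NameOwnerChanged signal for one bus name (matched as arg0).
name_owner_changed_rule() {
    printf "type='signal',sender='org.freedesktop.DBus',interface='org.freedesktop.DBus',member='NameOwnerChanged',arg0='%s'\n" "$1"
}
```

Running `sudo dbus-monitor --system "$(name_owner_changed_rule org.freedesktop.login1)"` in one terminal while restarting systemd-logind in another shows when the name changes owner, which can then be compared against the working and failing traces linked above.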

  I've been able to reproduce the issue on Bionic (237-3ubuntu10.42)
  running:

  sudo systemctl daemon-reload & sudo systemctl restart systemd-logind
