[Bug 1591411] Re: systemd-logind must be restarted every ~1000 SSH logins to prevent a ~25 second delay
William Van Hevelingen
1591411 at bugs.launchpad.net
Mon Oct 30 00:24:35 UTC 2017
This bug does not appear to be resolved on Xenial: we are still seeing
scope files leak, which causes systemctl to hang.
We are running the version of dbus that contains the fix:
# dpkg -s dbus | grep Version
Version: 1.10.6-1ubuntu3.3
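One way to quantify the scope file leakage is to count the session scope
entries in the systemd runtime unit directory and watch whether the number
keeps growing. The sketch below is not from the original report: the
function name count_session_scopes is hypothetical, and the default
directory /run/systemd/system is an assumption about where the leaked
session-*.scope files end up.

```shell
# Count session scope entries in a systemd runtime unit directory.
# ASSUMPTION: leaked scope files live in /run/systemd/system; pass a
# different directory as the first argument if that is not the case.
count_session_scopes() {
    unit_dir=${1:-/run/systemd/system}
    # -name matches the whole basename, so session-*.scope.d drop-in
    # directories are not counted here.
    find "$unit_dir" -maxdepth 1 -name 'session-*.scope' 2>/dev/null \
        | wc -l | tr -d ' '
}
```

Running count_session_scopes before and after a batch of SSH logins
should show whether entries are accumulating.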
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1591411
Title:
systemd-logind must be restarted every ~1000 SSH logins to prevent a
~25 second delay
Status in D-Bus:
Fix Released
Status in systemd:
Unknown
Status in dbus package in Ubuntu:
Fix Released
Status in systemd package in Ubuntu:
Fix Released
Status in dbus source package in Xenial:
Fix Released
Status in systemd source package in Xenial:
Invalid
Status in dbus source package in Yakkety:
Won't Fix
Status in systemd source package in Yakkety:
Invalid
Bug description:
[Impact]
The bug affects multiple users and introduces a user-visible delay
(~25 seconds) on SSH connections after a large number of sessions have
been processed. This has a serious impact on big systems and servers
running our software.
The currently proposed fix is actually a safe workaround for the bug
as proposed by the dbus upstream. The workaround makes uid 0 immune to
the pending_fd_timeout limit that kicks in and causes the original
issue.
[Test Case]
lxc launch ubuntu:x test
lxc exec test -- login -f ubuntu
ssh-import-id <whatever>
Then run a script as follows (passing in ubuntu@<container-ip>):
# Log the wall-clock ("real") time of each SSH login; stop with Ctrl-C.
while true; do
(time ssh "$1" "echo OK > /dev/null") 2>&1 | grep ^real >> log
done
Then check the log file for any ssh sessions that take 25+ seconds to
complete.
Multiple instances of the same script can be used at the same time.
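The log check can be automated. The following is a minimal sketch that
assumes the log holds lines in the shape produced by bash's time keyword
("real" followed by a value such as 0m25.232s); the function name
count_slow_sessions and the 25-second default threshold are illustrative
choices, not part of the original test case.

```shell
# Print how many logged sessions took at least THRESHOLD seconds.
# Usage: count_slow_sessions LOGFILE [THRESHOLD]   (threshold defaults to 25)
count_slow_sessions() {
    log_file=$1
    threshold=${2:-25}
    awk -v limit="$threshold" '
        $1 == "real" {
            split($2, parts, "m")      # "0m25.232s" -> "0" and "25.232s"
            sub(/s$/, "", parts[2])    # drop the trailing "s"
            if (parts[1] * 60 + parts[2] >= limit) slow++
        }
        END { print slow + 0 }         # print 0 when nothing matched
    ' "$log_file"
}
```

A non-zero result from count_slow_sessions log would indicate that the
delay has been reproduced.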
[Regression Potential]
The fix has a rather low regression potential as the workaround is a
very small change only affecting one particular case - handling of uid
0. The fix has been tested by multiple users and has been around in
zesty for a while, with multiple people involved in reviewing the
change. It's also a change that has been proposed by upstream.
[Original Description]
I noticed on a system that accepts large numbers of SSH connections
that after a while, SSH sessions were taking ~25 seconds to complete.
Looking in /var/log/auth.log, systemd-logind starts failing with the
following:
Jun 10 23:55:28 test sshd[3666]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)
Jun 10 23:55:28 test systemd-logind[105]: New session c1052 of user ubuntu.
Jun 10 23:55:28 test systemd-logind[105]: Failed to abandon session scope: Transport endpoint is not connected
Jun 10 23:55:28 test sshd[3666]: pam_systemd(sshd:session): Failed to create session: Message recipient disconnected from message bus without replying
I reproduced this in an LXD container by doing something like:
lxc launch ubuntu:x test
lxc exec test -- login -f ubuntu
ssh-import-id <whatever>
Then ran a script as follows (passing in ubuntu@<container-ip>):
while [ 1 ]; do
(time ssh $1 "echo OK > /dev/null") 2>&1 | grep ^real >> log
done
In my case, after 1052 logins, the 1053rd and thereafter were taking
25+ seconds to complete. Here are some snippets from the log file:
$ cat log | grep 0m0 | wc -l
1052
$ cat log | grep 0m25 | wc -l
4
$ tail -5 log
real 0m0.222s
real 0m25.232s
real 0m25.235s
real 0m25.236s
real 0m25.239s
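To find the point at which the delay sets in, rather than counting
matches by hand, the position of the first slow entry can be read from the
same log. This is a sketch under the same assumption about the log format
("real" followed by a value such as 0m25.232s); the function name
first_slow_login is hypothetical.

```shell
# Print the 1-based position of the first login that took at least
# THRESHOLD seconds, or 0 if every login was faster.
# Usage: first_slow_login LOGFILE [THRESHOLD]   (threshold defaults to 25)
first_slow_login() {
    awk -v limit="${2:-25}" '
        $1 == "real" {
            n++
            split($2, parts, "m")      # minutes and "seconds + s"
            sub(/s$/, "", parts[2])
            if (parts[1] * 60 + parts[2] >= limit) {
                print n
                found = 1
                exit                   # END still runs, hence the flag
            }
        }
        END { if (!found) print 0 }
    ' "$1"
}
```

For a log like the one described above, with 1052 fast logins followed by
slow ones, first_slow_login log would be expected to report the 1053rd
entry.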
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: systemd 229-4ubuntu5
ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
Date: Sat Jun 11 00:09:34 2016
MachineType: Notebook W230SS
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-22-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash
SourcePackage: systemd
SystemdDelta:
[EXTENDED] /lib/systemd/system/rc-local.service → /lib/systemd/system/rc-local.service.d/debian.conf
[EXTENDED] /lib/systemd/system/systemd-timesyncd.service → /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
2 overridden configuration files found.
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 04/15/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 4.6.5
dmi.board.asset.tag: Tag 12345
dmi.board.name: W230SS
dmi.board.vendor: Notebook
dmi.board.version: Not Applicable
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: Notebook
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4.6.5:bd04/15/2014:svnNotebook:pnW230SS:pvrNotApplicable:rvnNotebook:rnW230SS:rvrNotApplicable:cvnNotebook:ct9:cvrN/A:
dmi.product.name: W230SS
dmi.product.version: Not Applicable
dmi.sys.vendor: Notebook
To manage notifications about this bug go to:
https://bugs.launchpad.net/dbus/+bug/1591411/+subscriptions