[Bug 1794169] Re: AWS ubuntu became unreachable after ssh login
Steve Beattie
sbeattie at ubuntu.com
Tue Sep 25 22:50:15 UTC 2018
Not sure whether this is a poor interaction between sd-pam and the
kernel or strictly a kernel issue.
Kernel timeout backtrace:
Sep 21 03:00:33 mainframe01 kernel: [292411.276266] Not tainted 4.15.0-1021-aws #21-Ubuntu
Sep 21 03:00:33 mainframe01 kernel: [292411.277931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 21 03:00:33 mainframe01 kernel: [292411.280331] kworker/u8:5 D 0 25806 2 0x80000080
Sep 21 03:00:33 mainframe01 kernel: [292411.280339] Workqueue: events_unbound fsnotify_mark_destroy_workfn
Sep 21 03:00:33 mainframe01 kernel: [292411.280340] Call Trace:
Sep 21 03:00:33 mainframe01 kernel: [292411.280347] __schedule+0x291/0x8a0
Sep 21 03:00:33 mainframe01 kernel: [292411.280349] schedule+0x2c/0x80
Sep 21 03:00:33 mainframe01 kernel: [292411.280350] schedule_timeout+0x1cf/0x350
Sep 21 03:00:33 mainframe01 kernel: [292411.280354] ? add_timer+0x124/0x280
Sep 21 03:00:33 mainframe01 kernel: [292411.280357] wait_for_completion+0xba/0x140
Sep 21 03:00:33 mainframe01 kernel: [292411.280362] ? wake_up_q+0x80/0x80
Sep 21 03:00:33 mainframe01 kernel: [292411.280365] __synchronize_srcu.part.13+0x85/0xb0
Sep 21 03:00:33 mainframe01 kernel: [292411.280367] ? trace_raw_output_rcu_utilization+0x50/0x50
Sep 21 03:00:33 mainframe01 kernel: [292411.280369] synchronize_srcu+0x66/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280370] ? synchronize_srcu+0x66/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280372] fsnotify_mark_destroy_workfn+0x7b/0xe0
Sep 21 03:00:33 mainframe01 kernel: [292411.280375] process_one_work+0x1de/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280377] worker_thread+0x253/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280379] kthread+0x121/0x140
Sep 21 03:00:33 mainframe01 kernel: [292411.280380] ? process_one_work+0x410/0x410
Sep 21 03:00:33 mainframe01 kernel: [292411.280382] ? kthread_create_worker_on_cpu+0x70/0x70
Sep 21 03:00:33 mainframe01 kernel: [292411.280385] ? do_syscall_64+0x73/0x130
Sep 21 03:00:33 mainframe01 kernel: [292411.280387] ? SyS_exit+0x17/0x20
Sep 21 03:00:33 mainframe01 kernel: [292411.280391] ret_from_fork+0x35/0x40
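A minimal sketch, not part of the original report, for spotting the same symptoms on a live instance: it filters ps output for tasks in uninterruptible sleep (state D, like the kworker in the trace above) or in zombie state (Z). The function name filter_stuck and the choice of ps columns are assumptions made for illustration.

```shell
#!/bin/sh
# filter_stuck (hypothetical helper): read `ps -eo pid,ppid,stat,comm`
# style output on stdin and print only the rows whose STAT column
# starts with D (uninterruptible sleep) or Z (zombie).
# NR > 1 skips the ps header line.
filter_stuck() {
    awk 'NR > 1 && $3 ~ /^[DZ]/ { print $1, $2, $3, $4 }'
}
```

On an affected system this would be used as `ps -eo pid,ppid,stat,comm | filter_stuck`. Only the first word of the command name is printed, which is enough to tell kworker, systemd and (sd-pam) entries apart.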
** Information type changed from Private Security to Public Security
** Also affects: linux (Ubuntu)
Importance: Undecided
Status: New
** Tags added: bionic
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1794169
Title:
AWS ubuntu became unreachable after ssh login
Status in linux package in Ubuntu:
Incomplete
Status in systemd package in Ubuntu:
New
Bug description:
I've hit a strange situation with Ubuntu 18.04 LTS with the latest
kernel on an AWS m5.xlarge instance.
The system became unreachable after a series of successful ssh logins.
systemd --user became a zombie and blocked the main systemd daemon (PID 1).
I filed https://github.com/systemd/systemd/issues/10123 but
it was closed with "there's a problem with your kernel":
https://github.com/systemd/systemd/issues/10123#issuecomment-423984751
The symptoms are very similar to
https://github.com/systemd/systemd/issues/8598
apetren+ 26679 0.0 0.0 0 0 ? Z 02:56 0:00 \_ [(sd-pam)] <defunct>
apetren+ 26855 0.0 0.0 76636 7816 ? Ds 02:57 0:00 /lib/systemd/systemd --user
apetren+ 26856 0.0 0.0 0 0 ? Z 02:57 0:00 \_ [(sd-pam)] <defunct>
apetren+ 26954 0.0 0.0 0 0 ? Zs 02:57 0:00 \_ [kill] <defunct>
apetren+ 27053 0.0 0.0 76636 7496 ? Ss 02:58 0:00 /lib/systemd/systemd --user
apetren+ 27054 0.0 0.0 193972 2768 ? S 02:58 0:00 \_ (sd-pam)
This situation is reproducible on 7 instances, 1-2 times per week.
How to reproduce:
1. Install Ubuntu 18.04 LTS from the official Ubuntu image.
2. Update the kernel and packages to the latest versions.
3. From another instance, run:
while true; do ssh ubuntu@your.instance.ip "hostname; ps -ef | grep defunct | grep -v grep"; done
With this command running, within a couple of days I see 2->4->6->8...
zombies, and within an hour the system is frozen.
sudo reboot does not work, because systemd (PID 1) is unresponsive.
kill -9 1 does not work either.
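To track the 2->4->6->8 growth over time rather than eyeballing the ssh output, something like the following could be used. This is only a sketch under assumptions: the helper name count_defunct, the 60-second interval, and the timestamp format are illustrative and not from the report.

```shell
#!/bin/sh
# count_defunct (hypothetical helper): count the lines of `ps -ef`
# style output on stdin that are marked <defunct>.
# grep -c still prints 0 when nothing matches but exits non-zero,
# so `|| true` keeps the exit status clean for callers using `set -e`.
count_defunct() {
    grep -c '<defunct>' || true
}

# Example monitoring loop (left commented out; host and interval are
# placeholders):
# while true; do
#     printf '%s %s\n' "$(date -u +%FT%TZ)" \
#         "$(ssh ubuntu@your.instance.ip 'ps -ef' | count_defunct)"
#     sleep 60
# done
```

Each iteration of the loop would log one timestamped zombie count, so the point at which the count starts climbing can be lined up with the kernel log.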
# uname -a
Linux mainframe04 4.15.0-1021-aws #21-Ubuntu SMP Tue Aug 28 10:23:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
# systemd --version
systemd 237
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid
AWS instance m5.xlarge
Please let me know if you need any information.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794169/+subscriptions