[Bug 1636912] Re: systemd-networkd runs too late for cloud-init.service (net)
David Glasser
1636912 at bugs.launchpad.net
Wed Nov 30 05:28:57 UTC 2016
Hi. This issue affected us on Xenial: we explicitly enable systemd-networkd
on our images (when creating our AMI), and after a recent AMI
rebuild we were no longer able to start our AMIs. When I looked at the
system console, I saw output like this:
[ 52.866176] cloud-init[721]: Cloud-init v. 0.7.8 running 'init' at Wed, 30 Nov 2016 03:13:22 +0000. Up 51.74 seconds.
[ 52.873058] cloud-init[721]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
[ 52.879734] cloud-init[721]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[ 52.886030] cloud-init[721]: ci-info: | Device | Up | Address | Mask | Scope | Hw-Address |
[ 52.892162] cloud-init[721]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[ 52.897909] cloud-init[721]: ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . | . |
[ 52.904408] cloud-init[721]: ci-info: | lo | True | ::1/128 | . | host | . |
[ 52.910315] cloud-init[721]: ci-info: | ens3 | False | . | . | . | 0a:c6:90:b1:76:26 |
[ 52.916070] cloud-init[721]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[ 52.921096] cloud-init[721]: 2016-11-30 03:13:23,567 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/instance-id (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f4feee32cf8>: Failed to establish a new connection: [Errno 101] Network is unreachable',))]
I eventually noticed that, compared with the system log of an older
working AMI, the "Starting Network Service" line was missing, and then I found
this bug. (I included the text above mostly in case anybody else sees the
same issue and searches for the error.)
I tested with xenial-proposed and 229-4ubuntu13, and it fixed the issue.
I'd love to see this fix in stable xenial soon!
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1636912
Title:
systemd-networkd runs too late for cloud-init.service (net)
Status in systemd:
Fix Released
Status in cloud-init package in Ubuntu:
Triaged
Status in systemd package in Ubuntu:
Fix Committed
Status in cloud-init source package in Xenial:
Confirmed
Status in systemd source package in Xenial:
Fix Committed
Status in cloud-init source package in Yakkety:
New
Status in systemd source package in Yakkety:
Fix Committed
Bug description:
Ubuntu Core 16 images using cloud-init fail to function when the
DataSource is over the network (Like OpenStack) as networking is not
yet available when cloud-init.service runs.
The cloud-init.service unit dependencies look like this:
[Unit]
Description=Initial cloud-init job (metadata service crawler)
DefaultDependencies=no
Wants=cloud-init-local.service
Wants=local-fs.target
Wants=sshd-keygen.service
Wants=sshd.service
After=cloud-init-local.service
After=networking.service
Requires=networking.service
Before=basic.target
Before=dbus.socket
Before=network-online.target
Before=sshd-keygen.service
Before=sshd.service
Before=systemd-user-sessions.service
Conflicts=shutdown.target
Here are the systemd-networkd.service unit dependencies:
[Unit]
Description=Network Service
Documentation=man:systemd-networkd.service(8)
ConditionCapability=CAP_NET_ADMIN
DefaultDependencies=no
# dbus.service can be dropped once on kdbus, and systemd-udevd.service can be
# dropped once tuntap is moved to netlink
After=systemd-udevd.service dbus.service network-pre.target systemd-sysusers.service systemd-sysctl.service
Before=network.target multi-user.target shutdown.target
Conflicts=shutdown.target
Wants=network.target
# On kdbus systems we pull in the busname explicitly, because it
# carries policy that allows the daemon to acquire its name.
Wants=org.freedesktop.network1.busname
After=org.freedesktop.network1.busname
And the critical-chain output:
root@snap-test7:~# systemd-analyze critical-chain systemd-networkd
Failed to get ID: Unit name systemd-networkd is not valid.
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
root@snap-test7:~# systemd-analyze critical-chain systemd-networkd.service
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.
systemd-networkd.service +440ms
└─dbus.service @11.461s
  └─basic.target @11.403s
    └─sockets.target @11.401s
      └─dbus.socket @11.398s
        └─cloud-init.service @10.127s +1.266s
          └─networking.service @9.305s +799ms
            └─network-pre.target @9.295s
              └─cloud-init-local.service @3.822s +5.469s
                └─local-fs.target @3.813s
                  └─run-cgmanager-fs.mount @12.687s
                    └─local-fs-pre.target @1.393s
                      └─systemd-tmpfiles-setup-dev.service @1.116s +195ms
                        └─kmod-static-nodes.service @887ms +193ms
                          └─system.slice @783ms
                            └─-.slice @721ms
cloud-init needs networkd to run at or before
networking.service, so that networking is already up when cloud-init
looks for and uses network-based datasources.
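To read the "@" and "+" timings of a critical-chain listing like the one above programmatically, here is a minimal Python sketch. It assumes only plain "ms" and "s" values (compound durations such as "1min 2s" are not handled) and only the unit suffixes that appear in this report; the names parse_critical_chain, ChainEntry and UNIT_SUFFIXES are illustrative, not part of systemd.

```python
import re
from typing import List, NamedTuple, Optional

# Unit types that appear in the chain above (assumption: other suffixes,
# e.g. ".device", would have to be added here).
UNIT_SUFFIXES = (".service", ".target", ".socket", ".mount", ".slice")

# "<unit> [@<time>] [+<time>]"; only plain "ms"/"s" values are recognised.
_ENTRY = re.compile(
    r"(?P<unit>\S+)"
    r"(?:\s+@(?P<at>[\d.]+)(?P<at_u>ms|s))?"
    r"(?:\s+\+(?P<took>[\d.]+)(?P<took_u>ms|s))?"
)


class ChainEntry(NamedTuple):
    unit: str
    at_s: Optional[float]    # value printed after "@", in seconds
    took_s: Optional[float]  # value printed after "+", in seconds


def _seconds(value: Optional[str], suffix: Optional[str]) -> Optional[float]:
    if value is None:
        return None
    return float(value) / 1000 if suffix == "ms" else float(value)


def parse_critical_chain(text: str) -> List[ChainEntry]:
    entries = []
    for raw in text.splitlines():
        # Drop indentation and the box-drawing tree characters
        # (the ASCII "-" of "-.slice" is deliberately not stripped).
        line = raw.strip().lstrip("└├│─ ")
        m = _ENTRY.fullmatch(line)
        if m is None or not m.group("unit").endswith(UNIT_SUFFIXES):
            continue  # explanatory header lines, prompts, blank lines
        entries.append(ChainEntry(
            m.group("unit"),
            _seconds(m.group("at"), m.group("at_u")),
            _seconds(m.group("took"), m.group("took_u")),
        ))
    return entries


# The first seven entries of the chain shown above.
SAMPLE = """\
The time after the unit is active or started is printed after the "@" character.
systemd-networkd.service +440ms
└─dbus.service @11.461s
  └─basic.target @11.403s
    └─sockets.target @11.401s
      └─dbus.socket @11.398s
        └─cloud-init.service @10.127s +1.266s
          └─networking.service @9.305s +799ms
"""

if __name__ == "__main__":
    for entry in parse_critical_chain(SAMPLE):
        print(entry)
```

Keeping each unit next to its timings makes it easy to compare, for example, when cloud-init.service started (10.127 s) and how long it took (1.266 s) against the dbus.socket and dbus.service entries that follow it in the chain.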
# grep systemd /usr/share/snappy/dpkg.list
ii libnss-resolve:amd64 229-4ubuntu11 amd64 nss module to resolve names via systemd-resolved
ii libpam-systemd:amd64 229-4ubuntu11 amd64 system and service manager - PAM module
ii libsystemd0:amd64 229-4ubuntu11 amd64 systemd utility library
ii systemd 229-4ubuntu11 amd64 system and service manager
ii systemd-sysv 229-4ubuntu11 amd64 system and service manager - SysV links
# grep cloud-init /usr/share/snappy/dpkg.list
ii cloud-init 0.7.8-201610260005-gf7a5756-0ubuntu1~trunk~ubuntu16.04.1 all Init scripts for cloud instances
SRU INFORMATION FOR systemd
===========================
Fix: For xenial it is sufficient to drop systemd-networkd's After=dbus.service (https://github.com/systemd/systemd/commit/5f004d1e32) and (for xenial only) drop the useless org.freedesktop.network1.busname unit (its condition always fails because there is no kdbus, but it still orders systemd-networkd.service after sockets.target, which is too late for cloud-init).
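To make the ordering argument concrete, here is a small Python model that treats each ordering constraint as a directed edge and checks reachability. It is a simplified sketch, not systemd's actual transaction logic: the edge list is hand-reconstructed from the unit files and critical chain in this report, the dbus.socket, sockets.target, basic.target and dbus.service edges are assumed from that chain, and the busname unit is modelled only by its net effect as described in the fix above. The names BEFORE_FIX, AFTER_FIX and is_ordered_after are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

# Each pair (x, y) means "y is ordered after x", i.e. x starts first.
Edge = Tuple[str, str]

# Hypothetical, simplified xenial ordering graph; only edges relevant to
# this bug are included.
BEFORE_FIX: List[Edge] = [
    ("cloud-init.service", "dbus.socket"),         # cloud-init: Before=dbus.socket
    ("dbus.socket", "sockets.target"),             # assumed from the critical chain
    ("sockets.target", "basic.target"),            # assumed from the critical chain
    ("basic.target", "dbus.service"),              # assumed from the critical chain
    ("dbus.service", "systemd-networkd.service"),  # networkd: After=dbus.service
    # Net effect of networkd's After=org.freedesktop.network1.busname,
    # as described in the fix text (networkd ends up after sockets.target).
    ("sockets.target", "systemd-networkd.service"),
]

# The xenial fix drops both edges that order networkd after something.
AFTER_FIX: List[Edge] = [
    edge for edge in BEFORE_FIX if edge[1] != "systemd-networkd.service"
]


def is_ordered_after(edges: Iterable[Edge], later: str, earlier: str) -> bool:
    """True if `later` is transitively ordered after `earlier` (BFS)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for first, then in edges:
        graph[first].add(then)
    seen = {earlier}
    queue = deque([earlier])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt == later:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    for name, edges in (("before fix", BEFORE_FIX), ("after fix", AFTER_FIX)):
        print(name, is_ordered_after(
            edges, "systemd-networkd.service", "cloud-init.service"))
```

In this model, removing the two networkd edges is enough that nothing orders systemd-networkd.service after cloud-init.service any more, while dbus.service is still ordered after it; that is the property the fix relies on.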
Regression potential: Low. networkd is not widely used outside of netplan/snappy in xenial. Running it before dbus.service is up has two consequences:
- It cannot immediately expose its D-Bus status interface, but it retries every 5 s until that succeeds, so the D-Bus status interface still works (see test case).
- If a DHCP response with a hostname or timezone is received before dbus.service is running, it cannot talk to systemd-hostnamed/systemd-timedated to set these properties (if enabled). However, this is already broken in xenial because it fails on polkit permissions (both this and retrying the configuration once D-Bus is up have now been fixed in upstream master).
As for removing the "*.busname" units in xenial: kdbus has never been
part of any distribution; there was only an experimental DKMS
package in a PPA. It is dead as an upstream project, so
dropping the *.busname unit(s) from xenial should have no
practical effect, as they never start anyway ("condition
failed"). Yakkety's systemd already has them removed.
Test case:
- Install nplan, set up a netplan configuration and remove /etc/network/interfaces.
- Upgrade to the proposed packages.
- Ensure that the network is still functional and "busctl" shows org.freedesktop.network1, i.e. networkd successfully connected to the bus.
- Check in the journal that systemd-networkd.service starts before dbus.service, which should usually be the case with this fix: compare the positions of "Started Network Service." and "Started D-Bus System Message Bus." in "journalctl -b".
If it repeatedly starts the other way around, you can force the order with "sudo systemctl edit systemd-networkd.service" by adding:
[Unit]
Before=sysinit.target
(This is effectively what cloud-init.service will do soon.)
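The journal comparison in the test case can be scripted. Below is a minimal Python sketch, assuming the journal text has already been captured (for example with subprocess.run(["journalctl", "-b", "--no-pager"], capture_output=True, text=True).stdout) and that both services log the exact "Started ..." messages quoted above. The function name networkd_started_before_dbus is hypothetical.

```python
from typing import List, Optional

NETWORKD_STARTED = "Started Network Service."
DBUS_STARTED = "Started D-Bus System Message Bus."


def _first_line_with(lines: List[str], needle: str) -> Optional[int]:
    """Index of the first line containing `needle`, or None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


def networkd_started_before_dbus(journal_text: str) -> Optional[bool]:
    """Compare the first occurrence of each start message.

    Expects chronological output as printed by `journalctl -b`.
    Returns None if either message is missing (e.g. networkd not enabled).
    """
    lines = journal_text.splitlines()
    networkd = _first_line_with(lines, NETWORKD_STARTED)
    dbus = _first_line_with(lines, DBUS_STARTED)
    if networkd is None or dbus is None:
        return None
    return networkd < dbus
```

Returning None rather than False when a message is missing keeps "networkd never started" distinct from "networkd started too late".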
To manage notifications about this bug go to:
https://bugs.launchpad.net/systemd/+bug/1636912/+subscriptions