[Bug 1636708] Re: ifup -a does not start dependants last, causes deadlocks with vlans/bonding
Dimitri John Ledkov
launchpad at surgut.co.uk
Tue May 9 11:30:10 UTC 2017
Not quite.
On boot, there are multiple ways that ifup is called, and effectively it
races with itself.
In my case I have two vlans, on top of a bond, of two NICs. By the time
networking.service is called, the two NICs are present.
networking.service is essentially `ifup -a`: it looks at the eni file
and realises that it should bring up bond0. It looks for bond0 in its
internal state and creates it.
This is where the race starts.
ifupdown ships /lib/udev/rules.d/80-ifupdown.rules which calls /lib/udev/ifupdown-hotplug which effectively does
$ exec systemctl --no-block start $(systemd-escape --template ifup@.service $INTERFACE)
(very strange to do this on systemd systems, because one could have just used SYSTEMD_WANTS, but anyway)
At this point bond0 is being brought up by both the networking.service unit
(ifup -a) and ifup@bond0.service (ifup bond0). Sometimes one can see an
"already configured" message from either of the two units in the logs.
But also, at this point in time, ifup@bond0.101.service and
ifup@bond0.401.service may have been started as well.
In my case my machine manages to hit this race quite a bit. I am
attaching a journal log of what is happening.
The log is produced using:
journalctl -u ifup@*.service -u networking -o verbose | grep -e UTC -e UNIT -e MESSAGE
You can see messages that things are waiting on bond0 to be up, and that
one or the other vlan is waiting on the bond0 lock. To beat the locks and to
prevent ifup@.service interfering with networking.service, or executing
in parallel and creating deadlocks, I had to encode the dependencies
between these units in systemd's brain by doing this:
# cat /etc/systemd/system/ifup@bond0.101.service.d/order.conf
[Unit]
Wants=ifup@bond0.service
After=ifup@bond0.service
# cat /etc/systemd/system/ifup@bond0.401.service.d/order.conf
[Unit]
Wants=ifup@bond0.service
After=ifup@bond0.service
This way the ordering is enforced for the ifup@.service hotplug. IMHO
ifupdown should ship a generator that would create these dependencies
and orderings between interfaces. And possibly ifup -a should be reduced
to starting ifup@%I.service for every interface it is meant to start for
a given command.
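A minimal sketch of what the body of such a generator could look like, for illustration only. The function name write_vlan_dropins is hypothetical, only vlan-raw-device stanzas are handled, and interface names are assumed not to need systemd-escape. It writes the same order.conf drop-ins as shown above, but into a directory passed as an argument (systemd hands generators their output directories as arguments).

```shell
#!/bin/sh
# Hypothetical helper: read an interfaces(5)-style file and, for every
# "vlan-raw-device" stanza, write an ordering drop-in for the vlan's
# ifup@ instance into the given output directory.
# Assumption: interface names contain nothing systemd-escape would rewrite.
write_vlan_dropins() {
    eni=$1
    outdir=$2
    # Emit "<vlan> <raw device>" pairs, one per line.
    awk '$1 == "iface"           { cur = $2 }
         $1 == "vlan-raw-device" { print cur, $2 }' "$eni" |
    while read -r vlan raw; do
        dir="$outdir/ifup@$vlan.service.d"
        mkdir -p "$dir"
        printf '[Unit]\nWants=ifup@%s.service\nAfter=ifup@%s.service\n' \
            "$raw" "$raw" > "$dir/order.conf"
    done
}

# In a real generator this would be called roughly as:
#   write_vlan_dropins /etc/network/interfaces "$1"
```

Whether drop-ins written by a generator are honoured for these template instances in the same way as the ones under /etc/systemd/system would need to be verified before relying on this.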
I'm not sure if we can cheat and state that ifup@.service should be
Wants=networking.service After=networking.service, because I think then
we may get ourselves into the situation where ifupdown fails to resolve
cycles in the eni when the eni is specified out of order.
For cloud-init, this is more complicated, as on boot the generators will
fire before eni is populated. Therefore cloud-init should probably re-run
this magical ifupdown generator (just like it does for netplan), or
cloud-init should create these symlinks directly and reload systemd
before moving on to networking.service.
Does the above make sense at all?
** Attachment added: "ifupdown-race-itself.txt"
https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1636708/+attachment/4874021/+files/ifupdown-race-itself.txt
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to ifupdown in Ubuntu.
https://bugs.launchpad.net/bugs/1636708
Title:
ifup -a does not start dependants last, causes deadlocks with
vlans/bonding
Status in ifupdown package in Ubuntu:
Confirmed
Bug description:
This is a problem I've been struggling with since moving to 16.04.1
from 14.04 (fresh install)
I don't believe this problem affected 14.04. I have used an almost
identical interfaces file on 14.04 without problem.
On 16.04.1, however, 9/10 boots would hang during network
configuration and leave the network incorrectly configured.
When calling "ifup -a", all candidate interfaces appear to be started
in parallel, leading to collisions over locks. This causes a hang
(until timeout) during boot and leaves the network interfaces
incorrectly configured.
Imagine this /etc/network/interfaces
auto eno1 bond0 bond0.5
iface eno1 inet manual
bond-master bond0
iface bond0 inet manual
bond-slaves eno1
bond-mode 4
bond-lacp-rate 1
bond-miimon 100
bond-updelay 200
bond-downdelay 200
iface bond0.5 inet dhcp
vlan-raw-device bond0
eno1 -> bond0 -> bond0.5 -> dhcp
When calling "ifup -a" at boot time all three interfaces are started
at the same time.
bond0 and bond0.5 both attempt to share the same lock file:
/run/network/ifstate.bond0
If bond0 wins the race, the system will start correctly (1/10):
* bond0 starts and creates the bond0 device and the ifenslave.bond0 file to indicate the bond is ready
* eno1 polls for the ifenslave.bond0 file, when it appears it attaches eno1 to bond0
* bond0 finishes and releases the lock
* bond0.5 now acquires the lock.
* bond0.5 starts dhclient, which can talk to the network and configure the interface
If, however, bond0.5 wins the lock race, the system will hang at boot (5 mins) and fail to set up the network.
* bond0.5 is awarded the ifstate.bond0 lockfile
* bond0.5 starts dhclient waiting to hear from the network
* bond0 is blocked, so neither bond0 nor the ifenslave.bond0 file is created
* eno1 polls but never finds the ifenslave.bond0 file so never attaches to bond0
* bond0.5's dhclient is trying to talk to a disconnected network and never receives an answer
! bond0.5 is stuck running dhclient
! bond0 is stuck waiting for bond0.5 to finish
! eno1 is stuck waiting for bond0 to create the ifenslave.bond0 file
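The blocking step in the list above can be reproduced in miniature. This is only an analogy: flock(1) from util-linux stands in for whatever locking ifupdown does on ifstate.bond0, and a sleep stands in for the dhclient that never gets a reply. The function name demo_lock_race is made up for this sketch.

```shell
#!/bin/sh
# Illustration only: two processes contend for one lock file.
demo_lock_race() {
    lock=$(mktemp)
    # "bond0.5" wins the lock and keeps it while its long-running command runs.
    flock "$lock" sleep 3 &
    sleep 1
    # "bond0" now asks for the same lock, without waiting.
    if flock -n "$lock" true; then
        echo "bond0 got the lock"
    else
        echo "bond0 is blocked"
    fi
    wait
    rm -f "$lock"
}

demo_lock_race
```

In the real failure the holder never finishes on its own, because what it is waiting for (a working bond0) can only be set up by the process it is blocking.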
I believe ifup should start interfaces (that share lock files) in dependency order. The most basic interface must be awarded the lock ahead of its dependants. In this case:
1 eno1
2 bond0
3 bond0.5
but never:
1 eno1
2 bond0.5
3 bond0
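As a rough sketch of how such an order could be derived from the interfaces file, the pairs "this one before that one" can be extracted with awk and handed to tsort(1). The function names eni_edges and eni_order are hypothetical, only the three options used in this report are understood, and placing a slave before its bond simply follows the order listed above.

```shell
#!/bin/sh
# eni_edges: read an interfaces(5)-style file on stdin and print
# "<first> <second>" pairs meaning <first> should be started before <second>.
eni_edges() {
    awk '
        # A self-pair tells tsort the interface exists even with no dependency.
        $1 == "iface"           { cur = $2; print cur, cur; next }
        # The raw device comes before the vlan on top of it.
        $1 == "vlan-raw-device" { print $2, cur }
        # Slaves come before the bond, matching the order in this report.
        $1 == "bond-slaves"     { for (i = 2; i <= NF; i++) if ($i != "none") print $i, cur }
        $1 == "bond-master"     { print cur, $2 }
    '
}

# eni_order: one interface per line, dependencies first.
eni_order() {
    eni_edges | tsort
}

eni_order <<'EOF'
iface eno1 inet manual
    bond-master bond0
iface bond0 inet manual
    bond-slaves eno1
    bond-mode 4
iface bond0.5 inet dhcp
    vlan-raw-device bond0
EOF
# prints:
# eno1
# bond0
# bond0.5
```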
As a workaround, in /etc/network/interfaces
-auto eno1 bond0 bond0.5
+auto eno1 bond0
+allow-bond bond0.5
And also in /lib/systemd/system/networking.service
ExecStart=/sbin/ifup -a --read-environment
+ExecStart=/sbin/ifup -a --allow=bond --read-environment
ExecStop=/sbin/ifdown -a --read-environment
Then run:
systemctl daemon-reload
This causes all "auto" interfaces to start then, when they've
completed, all allow-bond interfaces to start.
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: ifupdown 0.8.10ubuntu1.1 [modified: lib/systemd/system/networking.service]
ProcVersionSignature: Ubuntu 4.4.0-45.66-generic 4.4.21
Uname: Linux 4.4.0-45-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.1
Architecture: amd64
Date: Wed Oct 26 06:32:57 2016
InstallationDate: Installed on 2016-10-24 (1 days ago)
InstallationMedia: Ubuntu-Server 16.04.1 LTS "Xenial Xerus" - Release amd64 (20160719)
SourcePackage: ifupdown
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.init.networking.conf: [modified]
mtime.conffile..etc.init.networking.conf: 2016-10-26T04:52:05.750927
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1636708/+subscriptions