[Bug 1988119] Re: systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS outages on Azure
Matthew Ruffell
1988119 at bugs.launchpad.net
Thu Sep 1 00:04:20 UTC 2022
The failure mode still exists if "udevadm trigger" has been issued
before the package upgrade to systemd 237-3ubuntu10.55.
That is, if unattended-upgrades or the user has installed open-vm-tools
and the instance has not been rebooted since, it will lose its network
connection on upgrade to 237-3ubuntu10.55.
We need to implement a way to add ID_NET_DRIVER back to the device
before the systemd upgrade takes place, otherwise an outage will occur.
Release admins - DO NOT RELEASE systemd 237-3ubuntu10.55 yet.
Tagging block-proposed.
$ ping google.com
PING google.com (142.251.45.110) 56(84) bytes of data.
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=1 ttl=56 time=1.51 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=2 ttl=56 time=1.35 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=3 ttl=56 time=1.17 ms
^C
--- google.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 1.172/1.349/1.516/0.140 ms
azureuser@mruffell-test:~$ sudo apt-cache policy systemd | grep Installed
Installed: 237-3ubuntu10.53
azureuser@mruffell-test:~$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
E: ID_NET_DRIVER=hv_netvsc
azureuser@mruffell-test:~$ sudo udevadm trigger
azureuser@mruffell-test:~$ ping google.com
PING google.com (142.251.45.110) 56(84) bytes of data.
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=1 ttl=56 time=2.15 ms
64 bytes from iad23s04-in-f14.1e100.net (142.251.45.110): icmp_seq=2 ttl=56 time=1.21 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.212/1.682/2.152/0.470 ms
azureuser@mruffell-test:~$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
azureuser@mruffell-test:~$ sudo apt install libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following package was automatically installed and is no longer required:
linux-headers-4.15.0-191
Use 'sudo apt autoremove' to remove it.
Suggested packages:
systemd-container
The following packages will be upgraded:
libnss-systemd libpam-systemd libsystemd0 libudev1 systemd systemd-sysv udev
7 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
Need to get 4497 kB of archives.
After this operation, 8192 B of additional disk space will be used.
Get:1 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libsystemd0 amd64 237-3ubuntu10.55 [205 kB]
Get:2 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libnss-systemd amd64 237-3ubuntu10.55 [105 kB]
Get:3 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libpam-systemd amd64 237-3ubuntu10.55 [107 kB]
Get:4 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 systemd amd64 237-3ubuntu10.55 [2915 kB]
Get:5 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 udev amd64 237-3ubuntu10.55 [1099 kB]
Get:6 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 libudev1 amd64 237-3ubuntu10.55 [54.2 kB]
Get:7 http://ppa.launchpad.net/ubuntu-security-proposed/ppa/ubuntu bionic/main amd64 systemd-sysv amd64 237-3ubuntu10.55 [12.0 kB]
Fetched 4497 kB in 3s (1461 kB/s)
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../libsystemd0_237-3ubuntu10.55_amd64.deb ...
Unpacking libsystemd0:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libsystemd0:amd64 (237-3ubuntu10.55) ...
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../libnss-systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking libnss-systemd:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../libpam-systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking libpam-systemd:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../systemd_237-3ubuntu10.55_amd64.deb ...
Unpacking systemd (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../udev_237-3ubuntu10.55_amd64.deb ...
Unpacking udev (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Preparing to unpack .../libudev1_237-3ubuntu10.55_amd64.deb ...
Unpacking libudev1:amd64 (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libudev1:amd64 (237-3ubuntu10.55) ...
Setting up systemd (237-3ubuntu10.55) ...
(Reading database ... 77176 files and directories currently installed.)
Preparing to unpack .../systemd-sysv_237-3ubuntu10.55_amd64.deb ...
Unpacking systemd-sysv (237-3ubuntu10.55) over (237-3ubuntu10.53) ...
Setting up libnss-systemd:amd64 (237-3ubuntu10.55) ...
Setting up systemd-sysv (237-3ubuntu10.55) ...
Setting up udev (237-3ubuntu10.55) ...
update-initramfs: deferring update (trigger activated)
Setting up libpam-systemd:amd64 (237-3ubuntu10.55) ...
Processing triggers for libc-bin (2.27-3ubuntu1.6) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Processing triggers for dbus (1.12.2-1ubuntu1.3) ...
Processing triggers for ureadahead (0.100.0-21) ...
Processing triggers for initramfs-tools (0.130ubuntu3.13) ...
update-initramfs: Generating /boot/initrd.img-5.4.0-1089-azure
azureuser@mruffell-test:~$ ping google.com
ping: google.com: Temporary failure in name resolution
azureuser@mruffell-test:~$ udevadm info /sys/class/net/eth0 | grep ID_NET_DRIVER
azureuser@mruffell-test:~$
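
The check used in the session above can be wrapped in a small helper, for
example to run before upgrading. This is only a sketch: has_net_driver and
check_iface are hypothetical names introduced here, and the helper only
detects the missing property; it does not restore it.

```shell
#!/bin/sh
# Sketch: detect whether a network interface has lost ID_NET_DRIVER.
# has_net_driver and check_iface are hypothetical helper names.

# Reads `udevadm info` output on stdin; succeeds if an ID_NET_DRIVER
# property with a non-empty value is present.
has_net_driver() {
    grep -q '^E: ID_NET_DRIVER=.'
}

# Checks one interface (default: eth0) and reports the result.
check_iface() {
    iface="${1:-eth0}"
    if udevadm info "/sys/class/net/$iface" | has_net_driver; then
        echo "$iface: ID_NET_DRIVER present"
    else
        echo "$iface: ID_NET_DRIVER missing" >&2
        return 1
    fi
}
```

Running "check_iface eth0" after "sudo udevadm trigger" on an affected
instance should report the property as missing, matching the empty grep
output in the log.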
** Tags added: block-proposed
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1988119
Title:
systemd-udevd: Run net_setup_link on 'change' uevents to prevent DNS
outages on Azure
Status in systemd package in Ubuntu:
Fix Released
Status in systemd source package in Bionic:
Fix Committed
Bug description:
[Impact]
A widespread outage was caused on Azure instances earlier today, when
systemd 237-3ubuntu10.54 was published to the bionic-security pocket.
Instances could no longer resolve DNS queries, breaking networking.
For affected users, the following workarounds are available. Use whichever is most convenient.
- Reboot your instances
- or -
- Issue "udevadm trigger -cadd -yeth0 && systemctl restart systemd-networkd" as root
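
For reference, the second workaround can be written with long options. A
minimal sketch, assuming the short options -c and -y correspond to --action
and --sysname-match as documented in udevadm(8). recover_iface and DRY_RUN
are hypothetical names introduced here, and the function needs to run as
root unless DRY_RUN is set.

```shell
#!/bin/sh
# Sketch: replay an 'add' uevent for one interface, then restart
# systemd-networkd. recover_iface and DRY_RUN are hypothetical names.
recover_iface() {
    iface="${1:-eth0}"
    run=""
    # With DRY_RUN set, print the commands instead of executing them.
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    $run udevadm trigger --action=add --sysname-match="$iface" &&
        $run systemctl restart systemd-networkd
}
```

Setting DRY_RUN=1 before calling "recover_iface eth0" prints the two
commands without touching the system, which is useful for reviewing them
first.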
The trigger was found to be open-vm-tools issuing "udevadm trigger".
Azure has a specific netplan setup that uses the `driver` match to set
up networking. If a udevadm trigger is executed, the KV pair that
contains this info is lost. The next time netplan is executed, the server
loses its DNS information.
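
To see whether a given netplan configuration relies on the `driver` match,
a rough text check may be enough. The sketch below is not a YAML parser, and
the sample configuration is an illustrative assumption (layout and MAC
address are made up), not a copy of what Azure ships. sample_netplan and
uses_driver_match are hypothetical names.

```shell
#!/bin/sh
# Illustrative netplan fragment. The layout and the MAC address are
# assumptions for this sketch, not taken from a real instance.
sample_netplan() {
    cat <<'EOF'
network:
    version: 2
    ethernets:
        eth0:
            dhcp4: true
            match:
                driver: hv_netvsc
                macaddress: "00:11:22:33:44:55"
            set-name: eth0
EOF
}

# Naive text check: succeeds if stdin contains an indented "driver:"
# key with a value. It does not verify that the key sits under "match:".
uses_driver_match() {
    grep -Eq '^[[:space:]]+driver:[[:space:]]*[^[:space:]]'
}

if sample_netplan | uses_driver_match; then
    echo "driver match found: this config depends on ID_NET_DRIVER"
fi
```

On a real system the same function could be fed the files under
/etc/netplan, for example "cat /etc/netplan/*.yaml | uses_driver_match".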
This is the same as bug 1902960 experienced on Focal two years ago.
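The root cause was found to be a bug in systemd: when systemd-udevd
receives a non-"add" uevent such as "change", it does not run
net_setup_link(), so properties such as ID_NET_DRIVER are lost. We need
to run net_setup_link() on these uevents as well, while skipping the
device rename and keeping the old name.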
[Testcase]
Start an instance up on Azure, any type. Simply issue udevadm trigger
and reload systemd-networkd:
$ ping google.com
PING google.com (172.253.62.102) 56(84) bytes of data.
64 bytes from bc-in-f102.1e100.net (172.253.62.102): icmp_seq=1 ttl=56 time=1.85 ms
$ sudo udevadm trigger && sudo systemctl restart systemd-networkd
$ ping google.com
ping: google.com: Temporary failure in name resolution
To fix a broken instance, you can run:
$ sudo udevadm trigger -cadd -yeth0 && sudo systemctl restart systemd-networkd
and then install the test packages, which are available in the following PPA:
https://launchpad.net/~mruffell/+archive/ubuntu/sf343528-test
If you install them, the issue should no longer occur.
[Where problems could occur]
If a regression were to occur, it would affect systemd-udevd
processing 'change' events from network devices, which could lead to
network outages. Since this would happen when systemd-networkd is
restarted on postinstall, a regression would cause widespread outages,
because this SRU is targeted to the security pocket, from which
unattended-upgrades installs packages automatically.
Side effects could include incorrect udevd device properties.
It is very important that this SRU is well tested before release.
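
One way to look for incorrect device properties while testing is to save
the "udevadm info" output before and after the upgrade and list the
properties that disappeared. This is a sketch only; props_only and
lost_props are hypothetical names, and the two input files are assumed to
hold saved "udevadm info" output.

```shell
#!/bin/sh
# Keep only property lines ("E: KEY=VALUE") from stdin, sorted.
props_only() {
    grep '^E: ' | sort
}

# lost_props BEFORE AFTER
# Prints properties that are present in BEFORE but absent from AFTER.
lost_props() {
    before_sorted="$(mktemp)"
    after_sorted="$(mktemp)"
    props_only < "$1" > "$before_sorted"
    props_only < "$2" > "$after_sorted"
    # comm -23 keeps lines that appear only in the first file.
    comm -23 "$before_sorted" "$after_sorted"
    rm -f "$before_sorted" "$after_sorted"
}
```

For example, run "udevadm info /sys/class/net/eth0 > before.txt" before the
upgrade and the same command into after.txt afterwards, then call
"lost_props before.txt after.txt". A changed value also shows up, since the
old KEY=VALUE line no longer appears in the second file.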
[Other info]
This was fixed in systemd 247 with the following commit:
commit e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
Author: Yu Watanabe <watanabe.yu+github at gmail.com>
Date: Mon, 14 Sep 2020 15:21:04 +0900
Subject: udev: re-assign ID_NET_DRIVER=, ID_NET_LINK_FILE=, ID_NET_NAME= properties on non-'add' uevent
Link: https://github.com/systemd/systemd/commit/e0e789c1e97e2cdf1cafe0c6b7d7e43fa054f151
This was backported to Focal's systemd 245.4-4ubuntu3.4 in bug 1902960
two years ago. Focal required a heavy backport, which was performed by
Dan Streetman. Focal's backport can be found in
d/p/lp1902960-udev-re-assign-ID_NET_DRIVER-ID_NET_LINK_FILE-ID_NET.patch,
or in the pastebin below:
https://paste.ubuntu.com/p/K5k7bGt3Wx/
The changes between the Focal backport and the Bionic backport are:
- We use udev_device_get_action() instead of device_get_action()
- device_action_from_string() is used to get to enum DeviceAction
- We return 0 from the "if (a == DEVICE_ACTION_MOVE) " hunk instead of "goto no_rename"
- log_device_* has been changed to log_*.
See attached debdiff for Bionic backport.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1988119/+subscriptions