[Bug 2003250] Re: networkctl reload with bond devices causes slaves to go DOWN and UP, causing a couple of seconds of network loss
Andreas Hasenack
2003250 at bugs.launchpad.net
Thu Feb 27 18:52:29 UTC 2025
Hello frantisek, or anyone else affected,
Accepted systemd into jammy-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/systemd/249.11-0ubuntu3.15 in a few
hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will help us get this
update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from verification-needed-jammy
to verification-done-jammy. If it does not fix the bug for you,
please add a comment stating that, and change the tag to
verification-failed-jammy. In either case, without details of your testing
we will not be able to proceed.
Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!
N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.
** Tags removed: verification-done verification-done-jammy
** Tags added: verification-needed verification-needed-jammy
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2003250
Title:
  networkctl reload with bond devices causes slaves to go DOWN and UP,
  causing a couple of seconds of network loss
Status in systemd package in Ubuntu:
Fix Released
Status in systemd source package in Jammy:
Fix Committed
Status in systemd source package in Kinetic:
Won't Fix
Bug description:
[SRU TEMPLATE]
[DESCRIPTION]
We currently use Ubuntu 22.04.1 LTS, including updates, for our production cloud (we switched from legacy CentOS 7).
Although we like the distribution, we recently hit the serious systemd bug described in bug report [1], using packages [2].
Unfortunately, the clouds we run consist of OpenStack on top
of Kubernetes, and we need a complex network configuration,
including Linux bond devices.
Our observation is that every time we apply our configuration via
CI/CD infrastructure using Ansible and netplan (regardless of whether
there is an actual network configuration change), we see network
interruptions of approximately 8-16 seconds, with bond interfaces going
DOWN and then UP.
We expect bond interfaces to stay UP when there is no network
configuration change.
We went through a couple of options for solving the issue, and the
first one is to add the existing patch [3] to the current
systemd-249.11-0ubuntu3.6.
Could you comment on whether this kind of non-security patch is likely to land in 22.04.1 LTS soon?
We are able to help bring the patch into the systemd package the community way if you suggest the steps.
[TESTING]
On a Jammy system, create a bond interface with two subordinate
devices. Assuming the interfaces ens3 and ens9 exist on the system,
this can be done using the following:
  $ cat > /etc/netplan/bond.yaml << EOF
  network:
    version: 2
    renderer: networkd
    ethernets:
      ens3:
        dhcp4: no
      ens9:
        dhcp4: no
    bonds:
      bond0:
        dhcp4: yes
        interfaces:
          - ens3
          - ens9
        parameters:
          mode: active-backup
          primary: ens3
  EOF
$ netplan generate && netplan apply
From here, there are two tests that can be used to verify the fix.
1. Update the modification time of the generated network files, and
call networkctl reload. From networkctl(1), when "reload" is called:
[...] If a new, modified or removed .network file is found, then all
interfaces which match the file are reconfigured.
Hence, the following will trigger the desired code path:
$ touch /run/systemd/network/*
$ networkctl reload
  Without the fix, the logs show the interfaces of the bond going DOWN
  and UP. With the fix, this should not happen. Note that -b matches the
  whole boot, so only entries logged after the reload are relevant:
$ journalctl -b -u systemd-networkd.service --grep="Link DOWN"
Finally, check that everything is back in the configured state:
$ networkctl status
2. This bug can also be triggered by calling networkctl reconfigure
directly.
$ networkctl reconfigure ens3
$ networkctl reconfigure ens9
Check the logs that the links were not brought down:
$ journalctl -b -u systemd-networkd.service --grep="Link DOWN"
Finally, check that everything is back in the configured state:
$ networkctl status
[REGRESSION POTENTIAL]
  This patch is confined to the SET_LINK_MASTER logic for configuring
  links in systemd-networkd. While bond interfaces are the motivation
  for the fix, the early return it adds (skipping SET_LINK_MASTER when
  the requested master is already set) applies to all interface types for
  which SET_LINK_MASTER is supported, e.g. bridge interfaces as well.
  This logic has been exercised in newer releases of systemd and Ubuntu
  without further modification, so I would not expect to see regressions
for other interface types. Furthermore, the bond type is the only type
where the link is set to down in order to configure the master
interface index, so this call was already effectively a no-op for
those other interface types.
  If any problems did occur, they would most likely be related to
  (re-)configuring links that have a master interface set.
[OTHER]
This fix requires two upstream patches:
https://github.com/systemd/systemd/commit/9f913d37a0
https://github.com/systemd/systemd/commit/c3e12de0a6
The second is a follow-up to the first, to complete the fix.
  These patches do not apply cleanly to v249, so some trivial conflicts
  were resolved to make them apply. In addition, some extra logic was
  added to the patches so that the link state is correctly set when
  this new branch is hit.
  Specifically, we decrement the set_link_messages counter and call
  link_check_ready() before returning -EALREADY. This is necessary
  because the area of systemd-networkd these patches come from has been
  heavily refactored since v249. In newer versions of systemd, the
  message counter is handled correctly and link_check_ready() is
  eventually called even though the SET_LINK_MASTER request is
  cancelled; neither happens when these patches are applied to v249
  as-is. Hence, we add the necessary steps to the patch.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2003250/+subscriptions