[Bug 2117280] Re: [SRU] Asymmetric routing issue on amphorae in ACTIVE_STANDBY topology
Bryan Fraschetti
2117280 at bugs.launchpad.net
Fri Jul 18 19:39:49 UTC 2025
** Description changed:
[Description] (SRU template below)
There is an asymmetric routing issue present when creating Octavia
amphorae (loadbalancer appliances/VMs) in ACTIVE_STANDBY topology on
Yoga. The setup is as follows:
- There are two private networks: network1 with subnet1 and network2 with subnet2, which are connected by an L3 router
- The loadbalancer has an interface on each network.
- The loadbalancer has a virtual ip (VIP) on network1. This is the intended address for ingress traffic, which (via keepalived) floats between MASTER and BACKUP upon failover
- The member VM is on network2. This is the ultimate target machine for incident requests on the VIP.
The expectation is that, bar security group restrictions, any machine
that can reach the VIP should be able to access the target machine since
the amphora will reverse proxy traffic to the member VM. Connections on
network1 to the VIP work as expected, however, in practice we observe
that requests originating on network2 to the VIP do not route correctly.
To contextualize the following content, in my environment subnet1 (the
vip subnet) has the cidr 192.168.21.0/24 while subnet2 (the member
subnet) is 172.16.0.0/24. If we look at the amphora-haproxy namespace
in the amphora we see the following ip rules:
$ sudo ip netns exec amphora-haproxy ip rule
0: from all lookup local
100: from 192.168.21.155 lookup 1 proto keepalived # from VIP
32766: from all lookup main
32767: from all lookup default
This means when the amphora is using its VIP as its src ip, it will reference table 1 for routing. Inspecting the available routes,
$ sudo ip netns exec amphora-haproxy ip route show table 1
192.168.21.0/24 dev eth1 proto keepalived scope link src 192.168.21.155
- There is only a route to the 192.168.21.0/24 subnet, on which it will
- use the interface's primary ip address (rather than the vip). What this
- means is that there is no route to the 172.16.0.0/24 subnet. This means
- if the client vm is on any subnet that isn't the vip-subnet the return
- path is broken
+ There is only a route to the 192.168.21.0/24 subnet (vip subnet), on
+ which it will use the vip address as the source address. What this means
+ is that there is no route to the 172.16.0.0/24 subnet (target subnet) or
+ another subnet. Essentially if the client vm is on any subnet that isn't
+ the vip subnet the return path is broken
This is not a problem in the ACTIVE or ACTIVE_ACTIVE topologies. The
reason is that those are not maintained by keepalived and instead have
default routes programmed into the amphora's table 1 at [1]. Note the if
topology != consts.TOPOLOGY_ACTIVE_STANDBY predicate, which is
indicative of the different way in which ACTIVE_STANDBY is managed.
ACTIVE_STANDBY is instead configured by the vrrp driver populating the
keepalived template at [2]. Unlike the other topologies, in the
keepalived template there is no programmed default route in table 1.
This was fixed in [3], which merged after Yoga but before Zed. This
commit contains feature implementations and small schema changes and as
such I'm not suggesting we SRU this change, but simply mentioning it for
the context of affected versions. Instead, I have prepared a minimal
patch that simply adds the default route to the template
[1] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/backends/utils/interface_file.py?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5#n135
[2] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/drivers/keepalived/jinja/templates/keepalived_base.template?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5
[3] https://github.com/openstack/octavia/commit/d9ee63f561019c247a49de5805b6d9dcbafeeadf
[Impact]
- Amphorae in ACTIVE_STANDBY topology exhibit an asymmetric routing
issue that prevents traffic from passing as expected.
- As a result of the above, the target VM cannot be on a different
subnet than the vip
- More flexible and complicated networking implementations are not
possible
[Test Plan]
- Set up the network and virtual machines as described
- Check that the patch correctly adds the default route to table 1 of
the amphora network namespace on newly created LBs.
- Validate that the target is now reachable from subnets other than the
vip subnet
- Ensure that amphorae of existing loadbalancers obtain the route when
failed over
[What can go wrong]
- The patch adds a default route if a gateway is detected on the vip
subnet. This should be true, but in the event that a gateway is not
detected no default route will be created (essentially in a worst case
the behaviour matches the current behaviour)
- In a distributed / ha environment with multiple machines running
octavia services, if not all are upgraded, the benefits of the patch may
not be observed. Even if one uses the openstack command line client to
set the octavia service endpoint equal to the fqdn of the upgraded
machine, subsequent activity (such as the processing of the keepalived
template) may be distributed and, therefore, occur on an octavia machine
without the upgrade resulting in the bug not being fixed. Accordingly,
to achieve consistent results it is important to upgrade all octavia
units/hosts.
[Other Info]
- Fortunately, the configuration of amphorae in ACTIVE_STANDBY is done
by the octavia-worker service which runs the vrrp driver that populates
the keepalived template and then uploads the resultant configuration to
a flask server hosted by the amphorae, which digests the file, writes
the contents to its own filesystem, and starts the keepalived service.
What this means is that amphorae images need not be rebuilt to contain
the changes. Simply upgrading the machines running the octavia-worker
service is sufficient. It also means that failing over an existing
amphora results in the new amphora obtaining the route since the unit
that is running octavia has been updated with the new template.
** Description changed:
[Description] (SRU template below)
There is an asymmetric routing issue present when creating Octavia
amphorae (loadbalancer appliances/VMs) in ACTIVE_STANDBY topology on
Yoga. The setup is as follows:
- There are two private networks: network1 with subnet1 and network2 with subnet2, which are connected by an L3 router
- The loadbalancer has an interface on each network.
- The loadbalancer has a virtual ip (VIP) on network1. This is the intended address for ingress traffic, which (via keepalived) floats between MASTER and BACKUP upon failover
- The member VM is on network2. This is the ultimate target machine for incident requests on the VIP.
The expectation is that, bar security group restrictions, any machine
that can reach the VIP should be able to access the target machine since
the amphora will reverse proxy traffic to the member VM. Connections on
network1 to the VIP work as expected, however, in practice we observe
that requests originating on network2 to the VIP do not route correctly.
To contextualize the following content, in my environment subnet1 (the
vip subnet) has the cidr 192.168.21.0/24 while subnet2 (the member
subnet) is 172.16.0.0/24. If we look at the amphora-haproxy namespace
in the amphora we see the following ip rules:
$ sudo ip netns exec amphora-haproxy ip rule
0: from all lookup local
100: from 192.168.21.155 lookup 1 proto keepalived # from VIP
32766: from all lookup main
32767: from all lookup default
This means when the amphora is using its VIP as its src ip, it will reference table 1 for routing. Inspecting the available routes,
$ sudo ip netns exec amphora-haproxy ip route show table 1
192.168.21.0/24 dev eth1 proto keepalived scope link src 192.168.21.155
There is only a route to the 192.168.21.0/24 subnet (vip subnet), on
which it will use the vip address as the source address. What this means
is that there is no route to the 172.16.0.0/24 subnet (target subnet) or
another subnet. Essentially if the client vm is on any subnet that isn't
the vip subnet the return path is broken
This is not a problem in the ACTIVE or ACTIVE_ACTIVE topologies. The
reason is that those are not maintained by keepalived and instead have
default routes programmed into the amphora's table 1 at [1]. Note the if
topology != consts.TOPOLOGY_ACTIVE_STANDBY predicate, which is
indicative of the different way in which ACTIVE_STANDBY is managed.
ACTIVE_STANDBY is instead configured by the vrrp driver populating the
keepalived template at [2]. Unlike the other topologies, in the
keepalived template there is no programmed default route in table 1.
This was fixed in [3], which merged after Yoga but before Zed. This
commit contains feature implementations and small schema changes and as
such I'm not suggesting we SRU this change, but simply mentioning it for
the context of affected versions. Instead, I have prepared a minimal
patch that simply adds the default route to the template
[1] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/backends/utils/interface_file.py?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5#n135
[2] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/drivers/keepalived/jinja/templates/keepalived_base.template?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5
[3] https://github.com/openstack/octavia/commit/d9ee63f561019c247a49de5805b6d9dcbafeeadf
[Impact]
- Amphorae in ACTIVE_STANDBY topology exhibit an asymmetric routing
issue that prevents traffic from passing as expected.
- - As a result of the above, the target VM cannot be on a different
- subnet than the vip
+ - As a result of the above, the target and client VMs cannot be on a
+ different subnet than the vip
- More flexible and complicated networking implementations are not
possible
[Test Plan]
- Set up the network and virtual machines as described
- Check that the patch correctly adds the default route to table 1 of
the amphora network namespace on newly created LBs.
- Validate that the target is now reachable from subnets other than the
vip subnet
- Ensure that amphorae of existing loadbalancers obtain the route when
failed over
[What can go wrong]
- The patch adds a default route if a gateway is detected on the vip
subnet. This should be true, but in the event that a gateway is not
detected no default route will be created (essentially in a worst case
the behaviour matches the current behaviour)
- In a distributed / ha environment with multiple machines running
octavia services, if not all are upgraded, the benefits of the patch may
not be observed. Even if one uses the openstack command line client to
set the octavia service endpoint equal to the fqdn of the upgraded
machine, subsequent activity (such as the processing of the keepalived
template) may be distributed and, therefore, occur on an octavia machine
without the upgrade resulting in the bug not being fixed. Accordingly,
to achieve consistent results it is important to upgrade all octavia
units/hosts.
[Other Info]
- Fortunately, the configuration of amphorae in ACTIVE_STANDBY is done
by the octavia-worker service which runs the vrrp driver that populates
the keepalived template and then uploads the resultant configuration to
a flask server hosted by the amphorae, which digests the file, writes
the contents to its own filesystem, and starts the keepalived service.
What this means is that amphorae images need not be rebuilt to contain
the changes. Simply upgrading the machines running the octavia-worker
service is sufficient. It also means that failing over an existing
amphora results in the new amphora obtaining the route since the unit
that is running octavia has been updated with the new template.
--
You received this bug notification because you are a member of Ubuntu
Sponsors, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/2117280
Title:
[SRU] Asymmetric routing issue on amphorae in ACTIVE_STANDBY topology
Status in octavia:
New
Bug description:
[Description] (SRU template below)
There is an asymmetric routing issue present when creating Octavia
amphorae (loadbalancer appliances/VMs) in ACTIVE_STANDBY topology on
Yoga. The setup is as follows:
- There are two private networks: network1 with subnet1 and network2 with subnet2, which are connected by an L3 router.
- The loadbalancer has an interface on each network.
- The loadbalancer has a virtual IP (VIP) on network1. This is the intended address for ingress traffic, and it floats (via keepalived) between MASTER and BACKUP upon failover.
- The member VM is on network2. This is the ultimate target machine for requests arriving on the VIP.
The expectation is that, barring security group restrictions, any
machine that can reach the VIP should be able to access the target
machine, since the amphora will reverse proxy traffic to the member VM.
Connections from network1 to the VIP work as expected. In practice,
however, we observe that requests originating on network2 to the VIP
do not route correctly.
To contextualize the following content, in my environment subnet1 (the
vip subnet) has the cidr 192.168.21.0/24 while subnet2 (the member
subnet) is 172.16.0.0/24. If we look at the amphora-haproxy namespace
in the amphora we see the following ip rules:
$ sudo ip netns exec amphora-haproxy ip rule
0: from all lookup local
100: from 192.168.21.155 lookup 1 proto keepalived # from VIP
32766: from all lookup main
32767: from all lookup default
This means when the amphora is using its VIP as its src ip, it will reference table 1 for routing. Inspecting the available routes,
$ sudo ip netns exec amphora-haproxy ip route show table 1
192.168.21.0/24 dev eth1 proto keepalived scope link src 192.168.21.155
There is only a route to the 192.168.21.0/24 subnet (vip subnet), on
which it will use the vip address as the source address. What this
means is that table 1 has no route to the 172.16.0.0/24 subnet (target
subnet) or to any other subnet. Essentially, if the client vm is on any
subnet that isn't the vip subnet, the return path is broken.
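The lookup described above can be modelled in a few lines of Python. This
is only a simplified sketch of policy routing, not full kernel behaviour.
The contents of the main table, the eth2 interface name, the client
address 172.16.0.50 and the gateway 192.168.21.1 are assumptions made for
illustration; they are not taken from the environment shown above.

```python
import ipaddress

VIP = "192.168.21.155"
CLIENT = "172.16.0.50"  # assumed client address on the member subnet

# Rules from `ip rule` above (local and default tables left out for brevity).
RULES = [
    {"priority": 100, "from": VIP, "table": "1"},
    {"priority": 32766, "from": "all", "table": "main"},
]

# Table 1 as shown above; the main table contents are an assumption.
TABLES_BEFORE = {
    "1": [{"prefix": "192.168.21.0/24", "dev": "eth1"}],
    "main": [
        {"prefix": "192.168.21.0/24", "dev": "eth1"},
        {"prefix": "172.16.0.0/24", "dev": "eth2"},  # assumed member interface
    ],
}

# Same tables with a default route added to table 1 (gateway is assumed).
TABLES_AFTER = {
    "1": TABLES_BEFORE["1"]
    + [{"prefix": "0.0.0.0/0", "dev": "eth1", "via": "192.168.21.1"}],
    "main": TABLES_BEFORE["main"],
}


def lookup(table, dst):
    """Longest-prefix match inside one table; None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in table if addr in ipaddress.ip_network(r["prefix"])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)


def select_route(rules, tables, src, dst):
    """Walk the rules by priority; fall through when a table has no match."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["from"] not in ("all", src):
            continue
        route = lookup(tables.get(rule["table"], []), dst)
        if route is not None:
            return rule["table"], route
    return None


if __name__ == "__main__":
    print("before:", select_route(RULES, TABLES_BEFORE, VIP, CLIENT))
    print("after: ", select_route(RULES, TABLES_AFTER, VIP, CLIENT))
```

Under these assumptions, a reply sourced from the VIP finds no match in
table 1 and falls through to the main table, so it is routed by whatever
the main table holds rather than back through the vip interface. Once
table 1 carries a default route, the same reply is resolved in table 1
and leaves on eth1.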
This is not a problem in the ACTIVE or ACTIVE_ACTIVE topologies. The
reason is that those are not maintained by keepalived and instead have
default routes programmed into the amphora's table 1 at [1]. Note the
if topology != consts.TOPOLOGY_ACTIVE_STANDBY predicate, which is
indicative of the different way in which ACTIVE_STANDBY is managed.
ACTIVE_STANDBY is instead configured by the vrrp driver populating the
keepalived template at [2]. Unlike the other topologies, in the
keepalived template there is no programmed default route in table 1.
This was fixed in [3], which merged after Yoga but before Zed. That
commit contains feature implementations and small schema changes, so
I'm not suggesting we SRU it; I mention it only for the context of
affected versions. Instead, I have prepared a minimal patch that simply
adds the default route to the template.
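To make the shape of such a change concrete, here is a hedged sketch in
plain Python of rendering a keepalived virtual_routes stanza with an
optional default route. It is not the actual patch and does not use the
real Jinja template: the function name, its parameters, the gateway
address and the exact route syntax are assumptions for illustration, and
the real template variables may differ.

```python
from typing import Optional


def render_virtual_routes(
    vip_cidr: str,
    vip_ip: str,
    interface: str,
    gateway: Optional[str] = None,
) -> str:
    """Build a virtual_routes stanza for table 1 (hypothetical helper)."""
    lines = [
        "virtual_routes {",
        # Existing behaviour: only the on-link vip subnet route.
        f"    {vip_cidr} dev {interface} src {vip_ip} scope link table 1",
    ]
    if gateway:
        # Assumed addition: a default route, emitted only when a gateway
        # is known for the vip subnet.
        lines.append(f"    default via {gateway} dev {interface} table 1")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # 192.168.21.1 is an assumed gateway address.
    print(render_virtual_routes(
        "192.168.21.0/24", "192.168.21.155", "eth1", gateway="192.168.21.1"
    ))
```

Emitting the default route only when a gateway is passed in mirrors the
fallback described under [What can go wrong]: with no gateway, the
output is the same single route as today.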
[1] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/backends/utils/interface_file.py?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5#n135
[2] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/drivers/keepalived/jinja/templates/keepalived_base.template?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5
[3] https://github.com/openstack/octavia/commit/d9ee63f561019c247a49de5805b6d9dcbafeeadf
[Impact]
- Amphorae in ACTIVE_STANDBY topology exhibit an asymmetric routing
issue that prevents traffic from passing as expected.
- As a result of the above, the target and client VMs cannot be on a
different subnet than the vip
- More flexible and complicated networking implementations are not
possible
[Test Plan]
- Set up the network and virtual machines as described
- Check that the patch correctly adds the default route to table 1 of
the amphora network namespace on newly created LBs.
- Validate that the target is now reachable from subnets other than
the vip subnet
- Ensure that amphorae of existing loadbalancers obtain the route when
failed over
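The route check in the test plan could be scripted along these lines.
This is a minimal sketch that inspects captured output of
`sudo ip netns exec amphora-haproxy ip route show table 1`. The helper
name is hypothetical, and the "after" sample is an assumed example of
patched output, not output captured from a real amphora.

```python
def has_default_route(ip_route_output: str) -> bool:
    """Return True if any route line in the output is a default route."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default":
            return True
    return False


# Output reported in this bug for an unpatched amphora.
BEFORE = (
    "192.168.21.0/24 dev eth1 proto keepalived scope link src 192.168.21.155\n"
)

# Assumed shape of the output once the default route is present.
AFTER = (
    "default via 192.168.21.1 dev eth1 proto keepalived\n"
    "192.168.21.0/24 dev eth1 proto keepalived scope link src 192.168.21.155\n"
)

if __name__ == "__main__":
    print("before:", has_default_route(BEFORE))
    print("after: ", has_default_route(AFTER))
```

The same check applies to both newly created loadbalancers and to
amphorae of existing loadbalancers after a failover.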
[What can go wrong]
- The patch adds a default route only if a gateway is detected on the vip
subnet. A gateway should normally be present, but if none is detected
no default route is created (in the worst case the behaviour matches
the current behaviour).
- In a distributed / ha environment with multiple machines running
octavia services, the benefits of the patch may not be observed unless
all of them are upgraded. Even if one uses the openstack command line
client to set the octavia service endpoint to the fqdn of the upgraded
machine, subsequent activity (such as the processing of the keepalived
template) may be distributed and therefore occur on an octavia machine
without the upgrade, resulting in the bug not being fixed. Accordingly,
to achieve consistent results it is important to upgrade all octavia
units/hosts.
[Other Info]
- Fortunately, the configuration of amphorae in ACTIVE_STANDBY is done
by the octavia-worker service. It runs the vrrp driver, which populates
the keepalived template and uploads the resulting configuration to a
flask server hosted by the amphora. The amphora digests the file, writes
the contents to its own filesystem, and starts the keepalived service.
This means amphora images need not be rebuilt to contain the change;
upgrading the machines running the octavia-worker service is
sufficient. It also means that failing over an existing amphora results
in the new amphora obtaining the route, since the unit running octavia
has been updated with the new template.
To manage notifications about this bug go to:
https://bugs.launchpad.net/octavia/+bug/2117280/+subscriptions