[Bug 2117280] Re: [SRU] Asymmetric routing issue on amphorae in ACTIVE_STANDBY topology

Bryan Fraschetti 2117280 at bugs.launchpad.net
Tue Sep 2 21:04:39 UTC 2025


-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2117280

Title:
  [SRU] Asymmetric routing issue on amphorae in ACTIVE_STANDBY topology

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in octavia:
  New
Status in octavia package in Ubuntu:
  New
Status in octavia source package in Jammy:
  New

Bug description:
  [Description] (SRU template below)

  There is an asymmetric routing issue present when creating Octavia
  amphorae (loadbalancer appliances/VMs) in ACTIVE_STANDBY topology on
  Yoga. The setup is as follows:

  - There are two private networks: network1 with subnet1 and network2 with subnet2, which are connected by an L3 router
  - The loadbalancer has an interface on each network.
  - The loadbalancer has a virtual IP (VIP) on network1. This is the intended address for ingress traffic, which (via keepalived) floats between MASTER and BACKUP upon failover.
  - The member VM is on network2. This is the ultimate target machine for incoming requests to the VIP.

  The expectation is that, barring security group restrictions, any
  machine that can reach the VIP should be able to access the target
  machine, since the amphora will reverse proxy traffic to the member
  VM. Connections to the VIP from network1 work as expected; in
  practice, however, requests originating on network2 to the VIP do not
  route correctly.

  To contextualize the following content: in my environment the subnet1
  (VIP subnet) CIDR is 192.168.21.0/24, while subnet2 (member subnet) is
  172.16.0.0/24. If we look at the amphora-haproxy namespace in the
  amphora, we see the following ip rules:

  $ sudo ip netns exec amphora-haproxy ip rule
  0:      from all lookup local
  100:    from 192.168.21.155 lookup 1 proto keepalived # from VIP
  32766:  from all lookup main
  32767:  from all lookup default

  This means that when the amphora uses its VIP as the source IP, it consults table 1 for routing. Inspecting the available routes:
  $ sudo ip netns exec amphora-haproxy ip route show table 1
  192.168.21.0/24 dev eth1 proto keepalived scope link src 192.168.21.155

  There is only a route to the 192.168.21.0/24 subnet (the VIP subnet),
  on which the VIP address is used as the source address. This means
  there is no route to the 172.16.0.0/24 subnet (the member subnet) or
  to any other subnet. Essentially, if the client VM is on any subnet
  other than the VIP subnet, the return path is broken.

  This is not a problem in the ACTIVE or ACTIVE_ACTIVE topologies,
  because those are not maintained by keepalived and instead have
  default routes programmed into the amphora's table 1 at [1]. Note the
  "if topology != consts.TOPOLOGY_ACTIVE_STANDBY" predicate, which is
  indicative of the different way in which ACTIVE_STANDBY is managed.
  ACTIVE_STANDBY is instead configured by the vrrp driver populating the
  keepalived template at [2]. Unlike the other topologies, the
  keepalived template programs no default route into table 1.
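
  For the example environment above, the missing route corresponds
  roughly to what the following manual command would add inside the
  namespace (an illustration only, using subnet1's gateway 192.168.21.1;
  it is not persistent and is not the proposed fix, which adds the route
  via the keepalived template):

  sudo ip netns exec amphora-haproxy ip route add table 1 default via 192.168.21.1 dev eth1 onlink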

  This was fixed in [3], which merged after Yoga but before Zed. That
  commit contains feature implementations and small schema changes, and
  as such I'm not suggesting we SRU that change; I mention it only for
  the context of affected versions. Instead, I have prepared a minimal
  patch that simply adds the default route to the template.

  [1] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/backends/utils/interface_file.py?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5#n135
  [2] https://git.launchpad.net/ubuntu/+source/octavia/tree/octavia/amphorae/drivers/keepalived/jinja/templates/keepalived_base.template?h=applied/ubuntu/jammy-updates&id=65552cbabcfc7f230bc66fccfac7019d409409b5
  [3] https://github.com/openstack/octavia/commit/d9ee63f561019c247a49de5805b6d9dcbafeeadf

  [Impact]

  - Amphorae in ACTIVE_STANDBY topology exhibit an asymmetric routing
  issue that prevents traffic from passing as expected.

  - As a result of the above, the target and client VMs cannot be on a
  different subnet than the VIP subnet.

  - More flexible or complex network layouts (for example, clients
  reaching the VIP from other routed subnets) are therefore not
  possible.

  [Test Plan]

  - Run the following steps first against the unpatched octavia packages

  1. Deploy OpenStack with Octavia using any method you like (juju,
  devstack, kolla-ansible, or manually), and ensure that when
  configuring Octavia, the load-balancer topology is set to
  ACTIVE_STANDBY. As there are many ways to deploy OpenStack, each with
  its own nuances and unique steps, it is not practical to elaborate
  here; choose whichever method you prefer for this step. Generally, the
  recommendation is to follow the upstream deployment guide for
  whichever platform you are using. I will be using juju. To set the
  topology with juju, run:

  juju config octavia loadbalancer-topology=ACTIVE_STANDBY

  2. Once the openstack services are up and the environment is ready,
  authorize the openstack command line client to the desired scope by
  sourcing credentials in whatever way you prefer (via a .creds-rc file,
  setting the OS_CLOUD environment variable, etc.).
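
  For example (the file and cloud names below are placeholders):

  source ~/admin-openrc        # an rc file exporting OS_* credentials
  # or
  export OS_CLOUD=mycloud      # an entry in clouds.yaml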

  3. Once authorized to the desired scope (user and project), create
  the networks and router as described in the [Description] section:

  openstack network create net1 # This is the network for the VIP (subnet1)

  openstack subnet create subnet1 \
    --network net1 \
    --subnet-range 192.168.21.0/24 \
    --gateway 192.168.21.1 \
    --dns-nameserver 8.8.8.8

  openstack network create net2 # This is the network for the target (member) machine

  openstack subnet create subnet2 \
    --network net2 \
    --subnet-range 172.16.0.0/24 \
    --gateway 172.16.0.1 \
    --dns-nameserver 8.8.8.8

  - Create a router and attach it to the two subnets

  openstack router create router1
  openstack router add subnet router1 subnet1
  openstack router add subnet router1 subnet2

  4. Create a machine on each network. Note that this assumes you have
  uploaded a cirros image to glance called cirros-0.4.0, created a
  flavor called m1.tiny, and that the default security group allows SSH
  (TCP port 22) from anywhere (0.0.0.0/0); a sketch for creating these
  prerequisites follows the server create commands below.

  openstack server create --flavor m1.tiny --image cirros-0.4.0 --net net1 server1 # This machine will act as a client on the VIP network
  openstack server create --flavor m1.tiny --image cirros-0.4.0 --net net2 server2 # This machine will act as the destination of the loadbalancer
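
  If those prerequisites do not exist yet, they can be created roughly
  as follows (the image file name is an assumption; adjust it to your
  local copy of the cirros image):

  openstack image create cirros-0.4.0 --disk-format qcow2 --container-format bare --file cirros-0.4.0-x86_64-disk.img
  openstack flavor create m1.tiny --vcpus 1 --ram 512 --disk 1
  openstack security group rule create default --protocol tcp --dst-port 22 --remote-ip 0.0.0.0/0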

  5. Create an amphora-based loadbalancer. This assumes you have created
  an amphora image, either manually, with octavia's diskimage-create.sh
  tool, or using the disk-image-retrofit snap, and that it has been
  properly uploaded to glance with the octavia-amphora image tag (a
  sketch of the upload step follows this paragraph). We're going to use
  the LB to reverse proxy all SSH traffic to the target machine in order
  to test connectivity.
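
  If the amphora image still needs to be uploaded, a rough sketch (the
  image name and file name are assumptions; the tag follows the
  assumption stated above):

  openstack image create amphora-haproxy-x86_64 --tag octavia-amphora --disk-format qcow2 --container-format bare --file amphora-haproxy-x86_64.qcow2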

  openstack loadbalancer create --name lb --vip-network-id net1 --wait
  openstack loadbalancer pool create --name pool --protocol TCP --loadbalancer lb --lb-algorithm ROUND_ROBIN --wait
  export SERVER2_IP=$(openstack server show server2 --format json --column addresses | jq --raw-output '.addresses.net2[]')
  openstack loadbalancer member create --name server2 --subnet-id subnet2 --address ${SERVER2_IP} --protocol-port 22 pool --wait
  openstack loadbalancer listener create lb --protocol TCP --protocol-port 22 --name listener --default-pool pool --wait

  At this point, the environment is configured and we should have two
  amphorae, which can be checked via

  openstack loadbalancer amphora list

  6. Now we need to test whether or not machines can reach the target
  machine through the amphora VIP. Open a session on the compute
  hypervisor with the needed environment variables (network UUIDs and
  machine IPs):

  juju ssh nova-compute/0 "export NET1_UUID=$(openstack network show
  net1 -f json | jq --raw-output .id); export NET2_UUID=$(openstack
  network show net2 -f json | jq --raw-output .id); export
  SERVER1_IP=$(openstack server show server1 --format json --column
  addresses | jq --raw-output '.addresses.net1[]'); export
  SERVER2_IP=$(openstack server show server2 --format json --column
  addresses | jq --raw-output '.addresses.net2[]'); export
  VIP_IP=$(openstack loadbalancer list -f json | jq --raw-output
  .[].vip_address); bash -l"

  - Connect to the machine on the VIP subnet

  sudo ip netns exec ovnmeta-$NET1_UUID ssh cirros@$SERVER1_IP "export
  VIP_IP=$VIP_IP; sh -l"

  - ssh to the target via the VIP

  ssh cirros@$VIP_IP # This works successfully

  - Exit back to the juju machine (nova hypervisor) and connect to the target machine
  sudo ip netns exec ovnmeta-$NET2_UUID ssh cirros@$SERVER2_IP "export VIP_IP=$VIP_IP; sh -l"

  7. From the target machine, try to ssh to itself through the VIP.
  Note that, if you prefer, instead of ssh-ing to itself you could
  create a third server on net2 and validate that ssh-ing to the target
  machine from there through the VIP is also broken (a sketch of that
  variant follows the command below).

  ssh cirros@$VIP_IP # This does not work, the command hangs.
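
  The third-server variant would look roughly like this (server3 is a
  hypothetical name; run the server create where the openstack client is
  authorized, then connect from the hypervisor session as in step 6):

  openstack server create --flavor m1.tiny --image cirros-0.4.0 --net net2 server3
  export SERVER3_IP=$(openstack server show server3 --format json --column addresses | jq --raw-output '.addresses.net2[]')
  sudo ip netns exec ovnmeta-$NET2_UUID ssh cirros@$SERVER3_IP "export VIP_IP=$VIP_IP; sh -l"
  ssh cirros@$VIP_IP # Expected to hang as well before the fix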

  - Exit back to the machine with the openstack and juju clients

  8. Optionally, you can check that the amphora does not have the
  default route by copying the amphora ssh key to the octavia unit
  (which has the octavia-lb-mgmt network namespace), ssh-ing into the
  MASTER amphora, and running "sudo ip netns exec amphora-haproxy ip
  route show table 1" (a sketch of these steps follows).
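
  A rough sketch of step 8 for a juju deployment; the key path, unit
  name, and amphora login user (ubuntu) are assumptions:

  # From the machine with the juju and openstack clients:
  openstack loadbalancer amphora list -f json | jq --raw-output '.[] | select(.role=="MASTER") | .lb_network_ip' # note the MASTER amphora's lb-mgmt address
  juju scp ~/octavia_amphora_key octavia/0:
  juju ssh octavia/0
  # On the octavia unit:
  sudo ip netns exec octavia-lb-mgmt ssh -i octavia_amphora_key ubuntu@<MASTER lb-mgmt address>
  # Inside the amphora:
  sudo ip netns exec amphora-haproxy ip route show table 1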

  9. Upgrade all octavia units to the packages from -proposed and
  restart all octavia-* services if they are not restarted automatically
  (one possible approach is sketched below).
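
  One way to do this on a jammy-based octavia unit (an assumption;
  adjust the pocket or cloud-archive pocket to match your deployment):

  sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu jammy-proposed main universe"
  sudo apt update
  sudo apt install --only-upgrade octavia-common octavia-api octavia-worker octavia-health-manager octavia-housekeeping
  sudo systemctl restart octavia-api octavia-worker octavia-health-manager octavia-housekeeping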

  10. Fortunately, because the keepalived configuration rendered by the
  vrrp driver is uploaded to the amphora by the octavia-worker service,
  we do not need to rebuild the amphora image. All we need to do is fail
  over the loadbalancer so that the old MASTER amphora is deleted and
  the amphora that replaces it receives a configuration rendered from
  the updated template.

  openstack loadbalancer failover lb

  11. Repeat steps 6 and 7, verifying that ssh to the VIP now works
  from both subnets.

  12. Optionally, repeat step 8 and observe that table 1 now contains a
  default route (an illustration follows).
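
  Illustrative expected output for the example environment (the exact
  flags may differ slightly):

  $ sudo ip netns exec amphora-haproxy ip route show table 1
  192.168.21.0/24 dev eth1 proto keepalived scope link src 192.168.21.155
  default via 192.168.21.1 dev eth1 proto keepalived onlink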

  [Other Info]

  - Fortunately, the configuration of amphorae in ACTIVE_STANDBY is done
  by the octavia-worker service, which runs the vrrp driver: the driver
  populates the keepalived template and uploads the resulting
  configuration to a flask server hosted by the amphora, which writes
  the contents to its own filesystem and starts the keepalived service.
  This means that amphora images need not be rebuilt to contain the
  change; simply upgrading the machines running the octavia-worker
  service is sufficient. It also means that failing over an existing
  loadbalancer results in the new amphora obtaining the route, since the
  host running octavia has already been updated with the new template.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2117280/+subscriptions




More information about the Ubuntu-openstack-bugs mailing list