[Bug 2038529] Fix merged to octavia (stable/2025.1)
OpenStack Infra
2038529 at bugs.launchpad.net
Fri Oct 17 06:44:26 UTC 2025
Reviewed: https://review.opendev.org/c/openstack/octavia/+/961053
Committed: https://opendev.org/openstack/octavia/commit/6cb77654934f3e84641d7f7e0554492217ca6483
Submitter: "Zuul (22348)"
Branch: stable/2025.1
commit 6cb77654934f3e84641d7f7e0554492217ca6483
Author: Gregory Thiemonge <gthiemon at redhat.com>
Date: Thu Oct 5 11:13:57 2023 -0400
Fix race condition in cascade delete
update_vip was called multiple times concurrently when cascade deleting
a load balancer with many listeners, which could trigger a race
condition when fetching, computing and updating the SGs.
Calling update_vip for each listener is not necessary; it is now called
only once, which fixes the race condition and optimizes the delete flow.
Closes-Bug: #2038529
Change-Id: I4a6e4830d0e916b1af1a11dd10097980a57d97ea
Signed-off-by: Gregory Thiemonge <gthiemon at redhat.com>
(cherry picked from commit 5c802aad949e4abd878f326c53123ddf8a299be0)
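To illustrate the race the commit message above describes, here is a
minimal, self-contained Python sketch -- not Octavia's actual flow code;
all names (sg_rules, Conflict, update_vip's arguments) are illustrative
stand-ins. Concurrent update_vip callers each do a non-atomic
fetch/compute/create of the same security group rules, and the store,
like Neutron, rejects the duplicate create:

import threading
import time

sg_rules = set()              # stands in for Neutron's stored SG rules
rules_lock = threading.Lock()


class Conflict(Exception):
    """Stands in for neutronclient's Conflict on duplicate rules."""


def create_security_group_rule(rule):
    # Like Neutron, reject a duplicate rule instead of ignoring it.
    with rules_lock:
        if rule in sg_rules:
            raise Conflict("Security group rule already exists: %r" % (rule,))
        sg_rules.add(rule)


def update_vip(caller):
    # Non-atomic read-modify-write: fetch the current rules, compute
    # the missing ones, then create them one by one.
    needed = {("tcp", 8080)}        # e.g. a rule for the VIP port
    missing = needed - sg_rules     # fetch + compute
    time.sleep(0.01)                # widen the race window
    for rule in missing:
        try:
            create_security_group_rule(rule)    # update
        except Conflict as exc:
            print("%s lost the race: %s" % (caller, exc))


# Pre-fix behaviour: one update_vip call per listener, run concurrently,
# so several callers compute the same "missing" rule and all try to
# create it; all but one hit Conflict. Post-fix, update_vip runs once
# per load balancer, so there is nothing to race against.
threads = [threading.Thread(target=update_vip, args=("listener-%d" % i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()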
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2038529
Title:
Cascade delete with prom listener fails
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive antelope series:
Won't Fix
Status in Ubuntu Cloud Archive bobcat series:
Won't Fix
Status in Ubuntu Cloud Archive caracal series:
New
Status in Ubuntu Cloud Archive dalmatian series:
New
Status in Ubuntu Cloud Archive epoxy series:
New
Status in Ubuntu Cloud Archive flamingo series:
Fix Released
Status in Ubuntu Cloud Archive ussuri series:
New
Status in Ubuntu Cloud Archive victoria series:
Won't Fix
Status in Ubuntu Cloud Archive wallaby series:
Won't Fix
Status in Ubuntu Cloud Archive xena series:
Won't Fix
Status in Ubuntu Cloud Archive yoga series:
New
Status in Ubuntu Cloud Archive zed series:
Won't Fix
Status in octavia:
Fix Released
Status in octavia package in Ubuntu:
New
Status in octavia source package in Focal:
New
Status in octavia source package in Jammy:
New
Status in octavia source package in Noble:
New
Status in octavia source package in Plucky:
New
Status in octavia source package in Questing:
Fix Released
Bug description:
Greetings,
a couple of days ago we upgraded Octavia to Yoga (10.1.0) in our test environment.
We also upgraded our octavia-tempest-plugin to version 2.4.1 to get the new Prometheus listener tests.
Since those upgrades, tempest fails in tearDownClass for
`octavia_tempest_plugin.tests.api.v2.test_listener.ListenerAPITest.*`.
As this fails almost every time for us, I tried to debug it, and it
seems to me that there could be a race condition in cascade delete.
The traceback I get when the cascade delete fails is the following:
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
    result = task.execute(**arguments)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/controller/worker/v2/tasks/network_tasks.py", line 704, in execute
    self.network_driver.update_vip(loadbalancer, for_delete=True)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py", line 644, in update_vip
    self._update_security_group_rules(load_balancer,
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py", line 221, in _update_security_group_rules
    self._create_security_group_rule(sec_grp_id, port_protocol[1],
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/network/drivers/neutron/base.py", line 160, in _create_security_group_rule
    self.neutron_client.create_security_group_rule(rule)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 1049, in create_security_group_rule
    return self.post(self.security_group_rules_path, body=body)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 361, in post
    return self.do_request("POST", action, body=body,
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 297, in do_request
    self._handle_fault_response(status_code, replybody, resp)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 272, in _handle_fault_response
    exception_handler_v20(status_code, error_body)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 90, in exception_handler_v20
    raise client_exc(message=error_message,
neutronclient.common.exceptions.Conflict: Security group rule already exists. Rule id is 08bedc57-cc6e-41bb-8a13-597887980dc5.
Neutron server returns request_ids: ['req-f1bdc5cc-bfda-412d-952a-98eb4e18dc81']
This is triggered by the following flow:
Task 'delete_update_vip_8beed3b6-b8e8-472b-a9a4-883a52675176' (33c5a41f-f3ab-4406-831e-4175d353d585) transitioned into state 'FAILURE' from state 'RUNNING'
After digging through the code, the delete ends up going through the
following code [1], which, as far as I can tell, it should never reach
on a delete task.
If I downgrade the octavia-tempest-plugin to a version that does not
include the Prometheus protocol, the delete always works without any
issue, which leads me to believe that there might be some race
condition when the new Prometheus listener is configured on a
load balancer.
A load balancer that ends up in provisioning_status ERROR after a
cascade delete can be deleted correctly by executing a cascade delete
on it a second time.
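For reference, that retry can be scripted; a minimal sketch with
openstacksdk, where the cloud name and load balancer ID are
placeholders and the cascade parameter is assumed to be available in
the installed SDK version:

import openstack

conn = openstack.connect(cloud="mycloud")   # hypothetical clouds.yaml entry
lb_id = "REPLACE-WITH-LOADBALANCER-ID"      # the LB stuck in ERROR

# Re-issue the cascade delete: the second attempt succeeds because the
# conflicting security group rules already exist by then.
conn.load_balancer.delete_load_balancer(lb_id, cascade=True)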
Does anyone have an idea what could be triggering this?
[1]
https://github.com/openstack/octavia/blob/10.1.0/octavia/network/drivers/neutron/allowed_address_pairs.py#L220-L225
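For context on the failure mode in the traceback above: Neutron's
security-group-rule create is not idempotent, so a duplicate POST
raises Conflict. A common defensive pattern -- not what the merged fix
does; the fix removes the concurrency instead -- is to treat the
Conflict as "already done". A hedged sketch (create_rule_idempotent is
not an Octavia helper; only the exception class comes from the
traceback):

from neutronclient.common import exceptions as neutron_exc


def create_rule_idempotent(neutron_client, rule_body):
    """Create an SG rule, treating 'already exists' as success."""
    try:
        return neutron_client.create_security_group_rule(rule_body)
    except neutron_exc.Conflict:
        # A concurrent caller created the same rule between our fetch
        # and our create; the desired state already holds.
        return None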
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2038529/+subscriptions