[Bug 2038529] Fix merged to octavia (stable/2025.1)
OpenStack Infra
2038529 at bugs.launchpad.net
Fri Oct 17 06:44:26 UTC 2025
Reviewed: https://review.opendev.org/c/openstack/octavia/+/961053
Committed: https://opendev.org/openstack/octavia/commit/6cb77654934f3e84641d7f7e0554492217ca6483
Submitter: "Zuul (22348)"
Branch: stable/2025.1
commit 6cb77654934f3e84641d7f7e0554492217ca6483
Author: Gregory Thiemonge <gthiemon at redhat.com>
Date: Thu Oct 5 11:13:57 2023 -0400
Fix race condition in cascade delete
update_vip was called multiple times concurrently when cascade deleting
a load balancer with many listeners, which could trigger a race
condition when fetching, computing and updating the SGs.
Calling update_vip for each listener is not necessary; it is now called
only once, which fixes the race condition and optimizes the delete flow.
Closes-Bug: #2038529
Change-Id: I4a6e4830d0e916b1af1a11dd10097980a57d97ea
Signed-off-by: Gregory Thiemonge <gthiemon at redhat.com>
(cherry picked from commit 5c802aad949e4abd878f326c53123ddf8a299be0)
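To illustrate the race the commit message above describes, here is a
minimal, self-contained Python sketch -- not Octavia's actual flow code;
all names (sg_rules, Conflict, update_vip's arguments) are illustrative
stand-ins. Concurrent update_vip callers each do a non-atomic
fetch/compute/create of the same security group rules, and the store,
like Neutron, rejects the duplicate create:

import threading
import time

sg_rules = set()              # stands in for Neutron's stored SG rules
rules_lock = threading.Lock()


class Conflict(Exception):
    """Stands in for neutronclient's Conflict on duplicate rules."""


def create_security_group_rule(rule):
    # Like Neutron, reject a duplicate rule instead of ignoring it.
    with rules_lock:
        if rule in sg_rules:
            raise Conflict("Security group rule already exists: %r" % (rule,))
        sg_rules.add(rule)


def update_vip(caller):
    # Non-atomic read-modify-write: fetch the current rules, compute
    # the missing ones, then create them one by one.
    needed = {("tcp", 8080)}        # e.g. a rule for the VIP port
    missing = needed - sg_rules     # fetch + compute
    time.sleep(0.01)                # widen the race window
    for rule in missing:
        try:
            create_security_group_rule(rule)    # update
        except Conflict as exc:
            print("%s lost the race: %s" % (caller, exc))


# Pre-fix behaviour: one update_vip call per listener, run concurrently,
# so several callers compute the same "missing" rule and all try to
# create it; all but one hit Conflict. Post-fix, update_vip runs once
# per load balancer, so there is nothing to race against.
threads = [threading.Thread(target=update_vip, args=("listener-%d" % i,))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()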
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2038529
Title:
Cascade delete with prom listener fails
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive antelope series:
Won't Fix
Status in Ubuntu Cloud Archive bobcat series:
Won't Fix
Status in Ubuntu Cloud Archive caracal series:
New
Status in Ubuntu Cloud Archive dalmatian series:
New
Status in Ubuntu Cloud Archive epoxy series:
New
Status in Ubuntu Cloud Archive flamingo series:
Fix Released
Status in Ubuntu Cloud Archive ussuri series:
New
Status in Ubuntu Cloud Archive victoria series:
Won't Fix
Status in Ubuntu Cloud Archive wallaby series:
Won't Fix
Status in Ubuntu Cloud Archive xena series:
Won't Fix
Status in Ubuntu Cloud Archive yoga series:
New
Status in Ubuntu Cloud Archive zed series:
Won't Fix
Status in octavia:
Fix Released
Status in octavia package in Ubuntu:
New
Status in octavia source package in Focal:
New
Status in octavia source package in Jammy:
New
Status in octavia source package in Noble:
New
Status in octavia source package in Plucky:
New
Status in octavia source package in Questing:
Fix Released
Bug description:
Greetings,
a couple of days ago we upgraded Octavia to Yoga (10.1.0) in our test environment.
We also upgraded our octavia-tempest-plugin to version 2.4.1 to get the new Prometheus listener tests.
Since those upgrades, tempest fails in tearDownClass for
`octavia_tempest_plugin.tests.api.v2.test_listener.ListenerAPITest.*`.
As this fails almost every time for us, I tried to debug it, and it
seems to me that there could be a race condition in cascade delete.
The traceback I get when the cascade delete fails is the following:
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
    result = task.execute(**arguments)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/controller/worker/v2/tasks/network_tasks.py", line 704, in execute
    self.network_driver.update_vip(loadbalancer, for_delete=True)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py", line 644, in update_vip
    self._update_security_group_rules(load_balancer,
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py", line 221, in _update_security_group_rules
    self._create_security_group_rule(sec_grp_id, port_protocol[1],
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/octavia/network/drivers/neutron/base.py", line 160, in _create_security_group_rule
    self.neutron_client.create_security_group_rule(rule)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 1049, in create_security_group_rule
    return self.post(self.security_group_rules_path, body=body)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 361, in post
    return self.do_request("POST", action, body=body,
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 297, in do_request
    self._handle_fault_response(status_code, replybody, resp)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 272, in _handle_fault_response
    exception_handler_v20(status_code, error_body)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/neutronclient/v2_0/client.py", line 90, in exception_handler_v20
    raise client_exc(message=error_message,
neutronclient.common.exceptions.Conflict: Security group rule already exists. Rule id is 08bedc57-cc6e-41bb-8a13-597887980dc5.
Neutron server returns request_ids: ['req-f1bdc5cc-bfda-412d-952a-98eb4e18dc81']
This is triggered by the following flow:
Task 'delete_update_vip_8beed3b6-b8e8-472b-a9a4-883a52675176' (33c5a41f-f3ab-4406-831e-4175d353d585) transitioned into state 'FAILURE' from state 'RUNNING'
After digging through the code, the delete ends up going through the
following code [1], which, as far as I can tell, it should never reach
on a delete task.
If I downgrade the octavia-tempest-plugin to a version that does not
include the Prometheus protocol, the delete always works without any
issue, which leads me to believe that there might be some race
condition when the new Prometheus listener is configured on a
load balancer.
A load balancer that ends up in provisioning_status ERROR after a
cascade delete can be deleted correctly by executing a cascade delete
on it a second time.
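For reference, that retry can be scripted; a minimal sketch with
openstacksdk, where the cloud name and load balancer ID are
placeholders and the cascade parameter is assumed to be available in
the installed SDK version:

import openstack

conn = openstack.connect(cloud="mycloud")   # hypothetical clouds.yaml entry
lb_id = "REPLACE-WITH-LOADBALANCER-ID"      # the LB stuck in ERROR

# Re-issue the cascade delete: the second attempt succeeds because the
# conflicting security group rules already exist by then.
conn.load_balancer.delete_load_balancer(lb_id, cascade=True)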
Does anyone have an idea what could be triggering this?
[1]
https://github.com/openstack/octavia/blob/10.1.0/octavia/network/drivers/neutron/allowed_address_pairs.py#L220-L225
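For context on the failure mode in the traceback above: Neutron's
security-group-rule create is not idempotent, so a duplicate POST
raises Conflict. A common defensive pattern -- not what the merged fix
does; the fix removes the concurrency instead -- is to treat the
Conflict as "already done". A hedged sketch (create_rule_idempotent is
not an Octavia helper; only the exception class comes from the
traceback):

from neutronclient.common import exceptions as neutron_exc


def create_rule_idempotent(neutron_client, rule_body):
    """Create an SG rule, treating 'already exists' as success."""
    try:
        return neutron_client.create_security_group_rule(rule_body)
    except neutron_exc.Conflict:
        # A concurrent caller created the same rule between our fetch
        # and our create; the desired state already holds.
        return None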
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2038529/+subscriptions