[Bug 2017748] Re: [SRU] OVN: ovnmeta namespaces missing during scalability test causing DHCP issues
Matthew Ruffell
2017748 at bugs.launchpad.net
Wed Apr 16 23:45:47 UTC 2025
Feedback for Joshua:
I have attached the final jammy/yoga debdiff for you to study. The same points
apply as when I last sponsored your octavia upload.
- debian/changelog: You need to describe your change and then follow with the
patch file, instead of just having a single line of the patch file. The
description I wrote is:
    * Under heavy load, OVN metadata notifications can be held up
      leading to ovsdb-server merging insert and update notifications.
      This can lead to metadata port being missing for some VMs which
      breaks connectivity, e.g. missing DHCP leases. (LP: #2017748)
      - d/p/lp2017748-handle-creation-of-Port_Binding-with-chassis-set.patch
- I renamed the patch to "lp2017748-handle-creation-of-Port_Binding-with-chassis-set.patch"
to put the LP bug number in front of it.
- I refreshed the patch, and moved the dep3 tags to under the Subject block.
I also indented the Subject block to match dep3 requirements.
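To illustrate the merged-notification case that the changelog entry above
describes, here is a minimal sketch in plain Python. It is not the neutron
code: PortBindingRow, update_only_match, create_or_update_match and the
chassis name "compute-1" are all hypothetical names made up for this example.
It only shows why a matcher that looks at "update" events alone can miss a
Port_Binding row that arrives as a single "create" with the chassis already
set.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PortBindingRow:
    """Hypothetical stand-in for a Southbound Port_Binding row."""
    logical_port: str
    chassis: Optional[str]  # None until the port is bound to a chassis


# (event type, new row, old row or None)
Event = Tuple[str, PortBindingRow, Optional[PortBindingRow]]


def update_only_match(event: str, row: PortBindingRow,
                      old: Optional[PortBindingRow], our_chassis: str) -> bool:
    """Match only 'update' events where the chassis changed to ours."""
    if event != "update" or old is None:
        return False
    return old.chassis != row.chassis and row.chassis == our_chassis


def create_or_update_match(event: str, row: PortBindingRow,
                           old: Optional[PortBindingRow],
                           our_chassis: str) -> bool:
    """Also match 'create' events whose row already has our chassis set."""
    if event == "create":
        return row.chassis == our_chassis
    return update_only_match(event, row, old, our_chassis)


def count_matches(matcher, events: List[Event],
                  our_chassis: str = "compute-1") -> int:
    """Count how many events in the stream the matcher would act on."""
    return sum(1 for event, row, old in events
               if matcher(event, row, old, our_chassis))


# Usual sequence: insert without chassis, then update that sets it.
normal: List[Event] = [
    ("create", PortBindingRow("vm-port", None), None),
    ("update", PortBindingRow("vm-port", "compute-1"),
     PortBindingRow("vm-port", None)),
]

# Assumed sequence under load: both changes delivered as one create.
merged: List[Event] = [
    ("create", PortBindingRow("vm-port", "compute-1"), None),
]

if __name__ == "__main__":
    for name, events in (("normal", normal), ("merged", merged)):
        print(name,
              count_matches(update_only_match, events),
              count_matches(create_or_update_match, events))
```

In this toy model the update-only matcher sees the port once in the normal
sequence and never in the merged one, while the matcher that also accepts a
create with the chassis set sees it once in both.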
As for the SRU template, I think you really need to be more descriptive of what
the change does, e.g. the impact section needs to be more than one line of
> ovnmeta- namespaces are missing intermittently then can't reach to VMs
I sponsored for now due to the original description having the necessary
details for the SRU Team to make an informed decision.
For the "where problems could occur" section, I think you really need to consider
the impact to users if a regression were to occur, what symptoms users would
likely see, and how they might be able to correct it or work around it.
** Patch added: "Final debdiff for jammy/yoga"
https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/2017748/+attachment/5872269/+files/lp2017748_jammy.debdiff
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2017748
Title:
  [SRU] OVN: ovnmeta namespaces missing during scalability test causing
  DHCP issues
Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  Won't Fix
Status in Ubuntu Cloud Archive bobcat series:
  Won't Fix
Status in Ubuntu Cloud Archive caracal series:
  Fix Released
Status in Ubuntu Cloud Archive dalmatian series:
  Fix Released
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  In Progress
Status in Ubuntu Cloud Archive zed series:
  Won't Fix
Status in neutron:
  New
Status in neutron ussuri series:
  Fix Released
Status in neutron victoria series:
  New
Status in neutron wallaby series:
  New
Status in neutron xena series:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  In Progress
Status in neutron source package in Jammy:
  In Progress
Status in neutron source package in Noble:
  Fix Released
Status in neutron source package in Oracular:
  Fix Released
Status in neutron source package in Plucky:
  Fix Released
Bug description:
[Impact]
ovnmeta- namespaces are missing intermittently then can't reach to VMs
[Test Case]
I was not able to reproduce this easily, so I ran charmed-openstack-tester. The results are below:
======
Totals
======
Ran: 469 tests in 4273.6309 sec.
- Passed: 398
- Skipped: 69
- Expected Fail: 0
- Unexpected Success: 0
- Failed: 2
Sum of execute time for each test: 4387.2727 sec.
The 2 failed tests
(tempest.api.object_storage.test_account_quotas.AccountQuotasTest and
octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest)
are not related to the fix.
[Where problems could occur]
These patches are related to the OVN metadata agent on compute nodes.
VM connectivity could be affected by this patch when OVN is used.
Binding ports to the datapath could be affected.
[Others]
== ORIGINAL DESCRIPTION ==
Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650
During a scalability test it was noted that a few VMs were having
issues being pinged (2 out of ~5000 VMs in the test conducted). After
some investigation it was found that the VMs in question did not
receive a DHCP lease:
udhcpc: no lease, failing
FAIL
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 181.90. request failed
And the ovnmeta- namespaces for the networks that the VMs were booting
from were missing. Looking into the ovn-metadata-agent.log:
2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent
[-] There is no metadata port for network
9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses
configured, tearing the namespace down if needed _get_provision_params
/usr/lib/python3.9/site-packages/neutron/agent/ovn/metadata/agent.py:495
Apparently, when the system is under stress (scalability tests), there
are some edge cases where the metadata port information has not yet
been propagated by OVN to the Southbound database. When the
PortBindingChassisEvent event is handled and tries to find either
the metadata port or the IP information on it (which is updated by
ML2/OVN during subnet creation), it cannot be found and the handler
fails silently with the error shown above.
Note that running the same tests with less concurrency did not
trigger this issue, so it only happens when the system is overloaded.
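The ordering problem described above can be modelled with a small,
self-contained Python sketch. This is an illustration only, not the agent's
implementation: MetadataPort, get_provision_params_sketch,
handle_binding_event and the dict standing in for the Southbound database are
hypothetical, and the MAC and IP values are made up. It assumes, as the log
message suggests, that a missing metadata port (or one without MAC or IP
addresses) leads to the namespace being torn down rather than to an error.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MetadataPort:
    """Hypothetical view of a network's metadata port in the Southbound DB."""
    mac: Optional[str] = None
    ips: List[str] = field(default_factory=list)


def get_provision_params_sketch(
        sb_db: Dict[str, MetadataPort],
        network_id: str) -> Optional[Tuple[str, List[str]]]:
    """Return (mac, ips) for the metadata port, or None if incomplete."""
    port = sb_db.get(network_id)
    if port is None or not port.mac or not port.ips:
        # Mirrors the logged case: no metadata port, or no MAC/IP on it.
        return None
    return port.mac, list(port.ips)


def handle_binding_event(sb_db: Dict[str, MetadataPort],
                         namespaces: Set[str], network_id: str) -> bool:
    """Provision or tear down the ovnmeta- namespace for one binding event."""
    namespace = f"ovnmeta-{network_id}"
    if get_provision_params_sketch(sb_db, network_id) is None:
        namespaces.discard(namespace)  # silent teardown, nothing is raised
        return False
    namespaces.add(namespace)
    return True


if __name__ == "__main__":
    sb_db: Dict[str, MetadataPort] = {}
    namespaces: Set[str] = set()

    # The binding event is handled before the metadata port info arrives.
    print(handle_binding_event(sb_db, namespaces, "net-1"), namespaces)

    # The info arrives later, but no further event is handled in this
    # model, so the namespace stays missing.
    sb_db["net-1"] = MetadataPort(mac="fa:16:3e:00:00:01", ips=["10.0.0.2"])
    print(namespaces)
```

In this model the outcome depends only on which arrives first: handling the
same event after the metadata port information is present provisions the
namespace, which is consistent with the issue appearing only under high
concurrency.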
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2017748/+subscriptions