[Bug 2017748] Re: [SRU] OVN: ovnmeta namespaces missing during scalability test causing DHCP issues

Hua Zhang 2017748 at bugs.launchpad.net
Fri May 9 04:46:13 UTC 2025


** Description changed:

  [Impact]
  
- ovnmeta- namespaces are missing intermittently then can't reach to VMs
+ ovnmeta- namespaces are intermittently missing, leaving VMs unreachable.
+ 
+ The ovn metadata namespace may be missing intermittently under certain
+ conditions, such as high load. This prevents VMs from retrieving
+ metadata (e.g., ssh keys), making them unreachable. The issue is not
+ easily reproducible.
  
  [Test Case]
- Not able to reproduce this easily, so I run charmed-openstack-tester, the result is below:
  
- ======                                                                                     
- Totals                                                                                     
- ======                                                                                     
- Ran: 469 tests in 4273.6309 sec.                                                           
-  - Passed: 398                                                                             
-  - Skipped: 69                                                                             
-  - Expected Fail: 0                                                                        
-  - Unexpected Success: 0                                                                   
-  - Failed: 2                                                                               
- Sum of execute time for each test: 4387.2727 sec. 
+ This issue is theoretically reproducible under certain conditions, such
+ as high load. However, in practice, it has proven extremely difficult to
+ reproduce.
+ 
+ I first talked with the fix author, Brian, who confirmed that he does
+ not have a reproducer. I then made almost 10 attempts to reproduce
+ the issue, but was unsuccessful. Please refer to this pastebin for more
+ details: https://paste.ubuntu.com/p/H6vh8jycvC/
+ 
+ Given the lack of a reproducer, I continued to run the charmed-
+ openstack-tester according to SRU standards to ensure no regressions
+ were introduced.
+ 
+ As of today (20250509), this fix has also been deployed in a
+ customer environment via hotfix, and no regressions have been observed
+ so far. Of course, it remains unclear whether the fix actually resolves
+ the original problem, as the issue itself is rare in the customer
+ environment as well. I am nonetheless confident (99.99%) that there are
+ no regressions.
+ 
+ The charmed-openstack-tester results are below:
+ 
+ ======
+ Totals
+ ======
+ Ran: 469 tests in 4273.6309 sec.
+  - Passed: 398
+  - Skipped: 69
+  - Expected Fail: 0
+  - Unexpected Success: 0
+  - Failed: 2
+ Sum of execute time for each test: 4387.2727 sec.
  
  2 failed tests
  (tempest.api.object_storage.test_account_quotas.AccountQuotasTest and
  octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest)
- is not related to the fix
+ are not related to ovn metadata or to this fix. These 2 tests fail with
+ or without the fix, so they can be ignored.
  
  [Where problems could occur]
  These patches are related to the ovn metadata agent on compute nodes.
  VM connectivity could be affected by this patch when ovn is used.
  Binding ports to a datapath could be affected.
  
  [Others]
  
  == ORIGINAL DESCRIPTION ==
  
  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650
  
  During a scalability test it was noted that a few VMs were having
  issues being pinged (2 out of ~5000 VMs in the test conducted). After
  some investigation it was found that the VMs in question did not receive
  a DHCP lease:
  
  udhcpc: no lease, failing
  FAIL
  checking http://169.254.169.254/2009-04-04/instance-id
  failed 1/20: up 181.90. request failed
  
  And the ovnmeta- namespaces for the networks that the VMs were booting
  from were missing. Looking into the ovn-metadata-agent.log:
  
  2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent
  [-] There is no metadata port for network
  9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses
  configured, tearing the namespace down if needed _get_provision_params
  /usr/lib/python3.9/site-packages/neutron/agent/ovn/metadata/agent.py:495
  
  Apparently, when the system is under stress (scalability tests), there
  are edge cases where the metadata port information has not yet been
  propagated by OVN to the Southbound database. When the
  PortBindingChassisEvent event is then handled and the agent tries to
  find either the metadata port or the IP information on it (which is
  updated by ML2/OVN during subnet creation), the lookup fails silently
  with the DEBUG message shown above.
  
  Note that running the same tests with less concurrency did not
  trigger this issue, so it only happens when the system is overloaded.

** Changed in: neutron (Ubuntu Jammy)
       Status: Incomplete => New

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/2017748

Title:
  [SRU] OVN:  ovnmeta namespaces missing during scalability test causing
  DHCP issues

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  Won't Fix
Status in Ubuntu Cloud Archive bobcat series:
  Won't Fix
Status in Ubuntu Cloud Archive caracal series:
  Fix Released
Status in Ubuntu Cloud Archive dalmatian series:
  Fix Released
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  In Progress
Status in Ubuntu Cloud Archive zed series:
  Won't Fix
Status in neutron:
  New
Status in neutron ussuri series:
  Fix Released
Status in neutron victoria series:
  New
Status in neutron wallaby series:
  New
Status in neutron xena series:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  In Progress
Status in neutron source package in Jammy:
  New
Status in neutron source package in Noble:
  Fix Released
Status in neutron source package in Oracular:
  Fix Released
Status in neutron source package in Plucky:
  Fix Released

Bug description:
  [Impact]

  ovnmeta- namespaces are intermittently missing, leaving VMs
  unreachable.

  The ovn metadata namespace may be missing intermittently under certain
  conditions, such as high load. This prevents VMs from retrieving
  metadata (e.g., ssh keys), making them unreachable. The issue is not
  easily reproducible.

  [Test Case]

  This issue is theoretically reproducible under certain conditions,
  such as high load. However, in practice, it has proven extremely
  difficult to reproduce.

  I first talked with the fix author, Brian, who confirmed that he does
  not have a reproducer. I then made almost 10 attempts to reproduce
  the issue, but was unsuccessful. Please refer to this pastebin for more
  details: https://paste.ubuntu.com/p/H6vh8jycvC/

  Given the lack of a reproducer, I continued to run the charmed-
  openstack-tester according to SRU standards to ensure no regressions
  were introduced.

  As of today (20250509), this fix has also been deployed in a
  customer environment via hotfix, and no regressions have been observed
  so far. Of course, it remains unclear whether the fix actually
  resolves the original problem, as the issue itself is rare in the
  customer environment as well. I am nonetheless confident (99.99%) that
  there are no regressions.

  The charmed-openstack-tester results are below:

  ======
  Totals
  ======
  Ran: 469 tests in 4273.6309 sec.
   - Passed: 398
   - Skipped: 69
   - Expected Fail: 0
   - Unexpected Success: 0
   - Failed: 2
  Sum of execute time for each test: 4387.2727 sec.

  2 failed tests
  (tempest.api.object_storage.test_account_quotas.AccountQuotasTest and
  octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest)
  are not related to ovn metadata or to this fix. These 2 tests fail
  with or without the fix, so they can be ignored.
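
  As a quick sanity check on the totals above, the per-outcome counts
  should add up to the number of tests run. A minimal Python sketch (the
  names below are illustrative and not part of charmed-openstack-tester):

```python
# Counts copied from the "Totals" block of the test run above.
RAN = 469
OUTCOMES = {
    "Passed": 398,
    "Skipped": 69,
    "Expected Fail": 0,
    "Unexpected Success": 0,
    "Failed": 2,
}


def unaccounted(ran, outcomes):
    """Number of tests that appear in no outcome bucket."""
    return ran - sum(outcomes.values())


def executed(outcomes):
    """Tests that actually ran, i.e. everything except the skipped ones."""
    return sum(outcomes.values()) - outcomes["Skipped"]


if __name__ == "__main__":
    print("unaccounted:", unaccounted(RAN, OUTCOMES))
    print("executed:", executed(OUTCOMES))
```

  Every test is accounted for, and the 2 failures are out of 400 tests
  that were actually executed.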

  [Where problems could occur]
  These patches are related to the ovn metadata agent on compute nodes.
  VM connectivity could be affected by this patch when ovn is used.
  Binding ports to a datapath could be affected.

  [Others]

  == ORIGINAL DESCRIPTION ==

  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650

  During a scalability test it was noted that a few VMs were having
  issues being pinged (2 out of ~5000 VMs in the test conducted). After
  some investigation it was found that the VMs in question did not
  receive a DHCP lease:

  udhcpc: no lease, failing
  FAIL
  checking http://169.254.169.254/2009-04-04/instance-id
  failed 1/20: up 181.90. request failed

  And the ovnmeta- namespaces for the networks that the VMs were booting
  from were missing. Looking into the ovn-metadata-agent.log:

  2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent
  [-] There is no metadata port for network
  9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses
  configured, tearing the namespace down if needed _get_provision_params
  /usr/lib/python3.9/site-packages/neutron/agent/ovn/metadata/agent.py:495
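
  To check whether a deployment is affected, the log message above can be
  cross-referenced with the namespaces present on the compute node. The
  sketch below is a hedged example, not part of neutron: the function
  names are made up, and it assumes each namespace is named "ovnmeta-"
  followed by the network UUID, as the prefix in this report suggests. It
  works on text only, so the `ip netns list` output has to be supplied by
  the caller.

```python
import re

# Assumption: each metadata namespace is named "ovnmeta-<network UUID>".
NS_PREFIX = "ovnmeta-"

UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
NO_PORT_RE = re.compile(r"There is no metadata port for network\s+(" + UUID + ")")


def networks_without_metadata_port(log_text):
    """Return network UUIDs named in 'no metadata port' log messages."""
    # Collapse whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return sorted(set(NO_PORT_RE.findall(flat)))


def missing_namespaces(network_ids, netns_listing):
    """Return expected namespace names absent from `ip netns list` output."""
    # The first whitespace-separated token of each line is the namespace name.
    present = {
        line.split()[0] for line in netns_listing.splitlines() if line.strip()
    }
    return [
        NS_PREFIX + net for net in network_ids if NS_PREFIX + net not in present
    ]
```

  Feeding it the contents of ovn-metadata-agent.log and the output of
  `ip netns list` from the same host yields the namespaces that the
  agent reported problems with and that are currently absent.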

  Apparently, when the system is under stress (scalability tests), there
  are edge cases where the metadata port information has not yet been
  propagated by OVN to the Southbound database. When the
  PortBindingChassisEvent event is then handled and the agent tries to
  find either the metadata port or the IP information on it (which is
  updated by ML2/OVN during subnet creation), the lookup fails silently
  with the DEBUG message shown above.
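
  The race described above can be illustrated with a toy model. This is
  only a sketch under stated assumptions: DelayedSouthbound and both
  lookup functions are hypothetical names, not the OVN or neutron API,
  the MAC and IP values in the usage note are made up, and the retry
  loop is one generic way to tolerate late propagation rather than the
  change that was actually merged.

```python
import time


class DelayedSouthbound:
    """Toy stand-in for the Southbound DB (hypothetical, not the OVN API).

    The metadata port record only becomes visible after `ready_after`
    lookups, which mimics slow propagation under load.
    """

    def __init__(self, ports, ready_after):
        self._ports = ports
        self._ready_after = ready_after
        self._lookups = 0

    def metadata_port(self, network_id):
        self._lookups += 1
        if self._lookups <= self._ready_after:
            return None  # record not propagated yet
        return self._ports.get(network_id)


def provision_params(sb, network_id):
    """Return (mac, ips), or None if the port or its addresses are missing."""
    port = sb.metadata_port(network_id)
    if not port or not port.get("mac") or not port.get("ips"):
        return None
    return port["mac"], port["ips"]


def provision_params_with_retry(sb, network_id, attempts=5, delay=0.0):
    """Retry the lookup a few times instead of giving up on the first miss."""
    for _ in range(attempts):
        params = provision_params(sb, network_id)
        if params is not None:
            return params
        time.sleep(delay)
    return None
```

  With a record such as {"mac": "fa:16:3e:00:00:01", "ips": ["10.0.0.2"]}
  that becomes visible only on the third lookup, the single-shot
  provision_params() returns None, which corresponds to the namespace
  being torn down, while the retrying variant eventually returns the MAC
  and IP list.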

  Note that running the same tests with less concurrency did not
  trigger this issue, so it only happens when the system is overloaded.
