OVN: ovnmeta namespaces missing during scalability test causing DHCP issues

Bug #2017748 reported by Lucas Alvares Gomes
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
In Progress
High
Lucas Alvares Gomes

Bug Description

Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650

During a scalability test it was noted that a few VMs where having issues being pinged (2 out of ~5000 VMs in the test conducted). After some investigation it was found that the VMs in question did not receive a DHCP lease:

udhcpc: no lease, failing
FAIL
checking http://169.254.169.254/2009-04-04/instance-id
failed 1/20: up 181.90. request failed

And the ovnmeta- namespaces for the networks that the VMs was booting from were missing. Looking into the ovn-metadata-agent.log:

2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent [-] There is no metadata port for network 9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses configured, tearing the namespace down if needed _get_provision_params /usr/lib/python3.9/site-packages/neutron/agent/ovn/metadata/agent.py:495

Apparently, when the system is under stress (scalability tests) there are some edge cases where the metadata port information has not yet being propagated by OVN to the Southbound database and when the PortBindingChassisEvent event is being handled and try to find either the metadata port of the IP information on it (which is updated by ML2/OVN during subnet creation) it can not be found and fails silently with the error shown above.

Note that, running the same tests but with less concurrency did not trigger this issue. So only happens when the system is overloaded.

Tags: ovn
Changed in neutron:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/881487
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.