[OVN] Hash Ring nodes removed when "periodic worker" is killed
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | In Progress | High | Lucas Alvares Gomes | | |
Bug Description
Reported at: https:/
In the ML2/OVN driver we set a signal handler for SIGTERM to remove the hash ring nodes upon service exit [0]. However, while investigating a customer bug we identified that if an unrelated Neutron worker is killed (such as the "periodic worker" in this case), that process could remove the entries from the ovn_hash_ring table for that hostname.
If this happens on all controllers, the ovn_hash_ring table is rendered empty and OVSDB events are no longer processed by ML2/OVN.
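The failure mode above can be sketched with a small in-memory model. This is only an illustration: the dictionary standing in for the ovn_hash_ring table, the function names, the hostnames, and the number of workers per controller are all hypothetical and are not Neutron's actual code.

```python
import uuid


def register_node(table, hostname):
    """Add one hash ring node for a worker running on `hostname`."""
    node_id = str(uuid.uuid4())
    table[node_id] = {"hostname": hostname}
    return node_id


def remove_nodes_on_exit(table, hostname):
    """Model of the exit handler: drop every node registered for `hostname`."""
    stale = [node_id for node_id, row in table.items()
             if row["hostname"] == hostname]
    for node_id in stale:
        del table[node_id]


# Hypothetical deployment: three controllers, three workers each.
controllers = ("controller-0", "controller-1", "controller-2")
ring_table = {}
for host in controllers:
    for _ in range(3):
        register_node(ring_table, host)

# A single worker is killed on each controller. Its handler removes
# all nodes for that hostname, not only its own entry.
for host in controllers:
    remove_nodes_on_exit(ring_table, host)

print(len(ring_table))  # prints 0
```

In this model, one killed process per controller is enough to leave the table with no nodes at all.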
Proposed solution:
This LP proposes to make this more reliable: instead of removing the nodes from the ovn_hash_ring table on exit, we would mark them as offline. That way, if a worker dies, the nodes will remain registered in the table and the heartbeat thread will set them as online again on the next beat. If the service is properly stopped, the heartbeat won't be running and the nodes will be seen as offline by the Hash Ring manager.
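A minimal sketch of the proposed behaviour follows. It assumes, purely for illustration, that "offline" is represented by an expired heartbeat timestamp; the `last_beat` field, the timeout value, and the function names are hypothetical and may not match how the actual patch implements it.

```python
HEARTBEAT_TIMEOUT = 60.0  # seconds; illustrative value, not Neutron's setting


def mark_offline(table, hostname):
    """Exit handler sketch: keep the rows, but expire their timestamps."""
    for row in table.values():
        if row["hostname"] == hostname:
            row["last_beat"] = 0.0


def heartbeat(table, hostname, now):
    """Heartbeat sketch: refresh every node registered for `hostname`."""
    for row in table.values():
        if row["hostname"] == hostname:
            row["last_beat"] = now


def online_nodes(table, now):
    """Nodes whose last beat is recent enough to count as online."""
    return sorted(
        node_id for node_id, row in table.items()
        if now - row["last_beat"] <= HEARTBEAT_TIMEOUT
    )


ring_table = {
    "node-a": {"hostname": "controller-0", "last_beat": 1000.0},
    "node-b": {"hostname": "controller-0", "last_beat": 1000.0},
}

# A worker on controller-0 is killed: nodes go offline but stay registered.
mark_offline(ring_table, "controller-0")
offline_view = online_nodes(ring_table, now=1000.0)

# The surviving heartbeat fires 30 seconds later and revives them.
heartbeat(ring_table, "controller-0", now=1030.0)
revived_view = online_nodes(ring_table, now=1030.0)

print(offline_view, revived_view)  # prints [] ['node-a', 'node-b']
```

If the whole service is stopped, no heartbeat runs afterwards, so in this model the nodes simply stay offline.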
As a note, upon the next startup of the service, the nodes matching the server hostname will be removed from the ovn_hash_ring table and added again as Neutron workers are spawned [1].
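The startup behaviour described in this note could look roughly like the sketch below. The table is again a plain dictionary, and the node IDs, hostnames, and function names are hypothetical placeholders rather than Neutron's real identifiers.

```python
def remove_nodes_by_hostname(table, hostname):
    """Startup cleanup sketch: drop leftover nodes for this host only."""
    stale = [node_id for node_id, row in table.items()
             if row["hostname"] == hostname]
    for node_id in stale:
        del table[node_id]
    return len(stale)


def register_node(table, node_id, hostname):
    """Called once per worker as it is spawned."""
    table[node_id] = {"hostname": hostname}


# Rows left over from the previous run, plus a node from another controller.
ring_table = {
    "old-1": {"hostname": "controller-0"},
    "old-2": {"hostname": "controller-0"},
    "peer-1": {"hostname": "controller-1"},
}

# The service restarts on controller-0: clean up first, then re-register.
removed = remove_nodes_by_hostname(ring_table, "controller-0")
for worker_index in range(2):
    register_node(ring_table, f"new-{worker_index}", "controller-0")

print(removed, sorted(ring_table))  # prints 2 ['new-0', 'new-1', 'peer-1']
```

Nodes belonging to other hostnames are left untouched, so a restart on one controller does not disturb the others.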
[0] https:/
[1] https:/
Changed in neutron:
status: Fix Committed → Confirmed
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/886279