HA router state may be set incorrectly in the Neutron DB in some cases

Bug #2030735 reported by Slawek Kaplonski
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
neutron   New      Low          Unassigned

Bug Description

The Neutron L3 agent reports the state of an HA router to the Neutron server with a message like:

2023-07-30 23:06:49.216 3139 DEBUG neutron.agent.l3.ha [-] Updating server with HA routers states {'f5d52aaf-30e1-4396-bde4-8acb7506c301': 'active'} notify_server /usr/lib/python3.9/site-packages/neutron/agent/l3/ha.py:243

During a controller reboot or some other fault in the cloud (we hit this in the Tobiko faults test in our downstream CI), this message may never be delivered to the Neutron server. The router's state in the Neutron DB is then incorrect, and so it is incorrectly reported through the Neutron API.
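
To make the failure mode concrete, here is a toy model of it. This is not Neutron code: `FakeServerDB`, this simplified `notify_server` and the `server_reachable` flag are hypothetical stand-ins, and the sketch assumes the state is sent once per transition with nothing re-sending it later.

```python
# Toy model, not Neutron code: every name here is hypothetical.

class FakeServerDB:
    """Stands in for the HA router state the Neutron server stores."""

    def __init__(self):
        self.ha_states = {}

    def update_states(self, states):
        self.ha_states.update(states)


def notify_server(db, states, server_reachable):
    """One-shot notification: if the server is unreachable, the update is lost."""
    if server_reachable:
        db.update_states(states)


router_id = "f5d52aaf-30e1-4396-bde4-8acb7506c301"
db = FakeServerDB()

# Normal case: the agent reports 'standby' and the DB records it.
notify_server(db, {router_id: "standby"}, server_reachable=True)

# The router becomes 'active' while the controller is rebooting.
# The single notification is dropped and, in this model, never re-sent.
agent_view = {router_id: "active"}
notify_server(db, agent_view, server_reachable=False)

print("DB says:   ", db.ha_states[router_id])
print("Agent says:", agent_view[router_id])
```

In this model the DB keeps the old value until the router changes state again, which matches the stale state seen through the API.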

I see two potential ways to solve this issue:
* An L3 agent hosting HA routers could include the state of all its routers in the heartbeat, and the Neutron server could then update their state in the DB while processing heartbeat messages from that agent, or
* The Neutron server could periodically ask each L3 agent hosting HA routers for the state of its routers and update the DB accordingly.
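
Both options come down to the same server-side step: compare the states an agent currently reports with what is stored, and correct any difference. Below is a rough sketch of that step using the heartbeat variant. All function names and the `ha_router_states` payload key are assumptions made for illustration, not existing Neutron APIs, and the stored states are simplified to one plain dict per agent.

```python
# Hypothetical sketch: none of these names are Neutron APIs.

def reconcile_ha_states(db_states, reported_states):
    """Apply agent-reported HA states to the stored ones.

    db_states: dict of router_id -> state stored for one agent (mutated in place).
    reported_states: dict of router_id -> state as the agent currently sees it.
    Returns a dict with only the entries whose stored state was corrected.
    """
    corrected = {}
    for router_id, state in reported_states.items():
        if db_states.get(router_id) != state:
            db_states[router_id] = state
            corrected[router_id] = state
    return corrected


def build_heartbeat(agent_host, local_ha_states):
    """Agent side: attach a snapshot of all local HA router states."""
    return {"host": agent_host, "ha_router_states": dict(local_ha_states)}


def handle_heartbeat(db_states, heartbeat):
    """Server side: reconcile on every heartbeat that carries HA states."""
    return reconcile_ha_states(db_states, heartbeat.get("ha_router_states", {}))


# The DB missed the transition of router-1 to 'active'.
db_states = {"router-1": "standby", "router-2": "standby"}
heartbeat = build_heartbeat(
    "controller-0", {"router-1": "active", "router-2": "standby"}
)

corrected = handle_heartbeat(db_states, heartbeat)
repeated = handle_heartbeat(db_states, heartbeat)
```

Because every heartbeat carries the full snapshot, a lost message is repaired by the next one, and a repeated heartbeat changes nothing. For the second option, the server could pass the result of its periodic query to the same `reconcile_ha_states` helper instead of a heartbeat payload.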

tags: added: low-hanging-fruit