2/3 snat namespace transitions to master
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | New | Undecided | Unassigned |
Bug Description
neutron version: 14.0.2
general deployment version: stein
deployment method: kolla-ansible
neutron configuration:
- l3 = ha
- agent_mode = dvr_snat
- ovs
general info: multi-node deployment, ~100 computes
when spawning larger heat stacks with multiple instances (think k8s infrastructure), we sometimes (roughly 50% of the time) get a "split brain" on the snat namespaces.
the logs look like this on one of the three controller/network nodes:
11:53:43.402 Handling notification for router 2a218a31-
Router 2a218a31-
and then this happens on another of the three controller/network nodes:
11:53:57.582 Handling notification for router 2a218a31-
11:53:57.583 Router 2a218a31-
so neutron sets up all routes on both controller nodes, which wreaks havoc on the sessions that instances open to the outside. obviously, deleting the routes from the faulty namespace solves the issue.
i can't really find the reason for it being promoted to master even when looking through the debug logs. would greatly appreciate any helpful pointers.
the only thing i can think of is some kind of race condition, which would explain why everything in neutron looks fine.
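For anyone trying to confirm the same symptom, here is a minimal sketch of how the "more than one master" condition could be detected from collected HA state. It assumes the L3 agent writes each router's keepalived state to a file named `state` under `<ha_confs_path>/<router_id>/`; the exact path depends on `state_path` and, with kolla-ansible, on how the container volumes are mounted, so treat the directory as an assumption. The function names, node names and router IDs below are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_local_states(ha_confs_dir):
    """Read per-router HA state on one node.

    Assumes a layout of <ha_confs_dir>/<router_id>/state, where the file
    holds a single word such as "master" or "backup".
    """
    states = {}
    for state_file in Path(ha_confs_dir).glob("*/state"):
        states[state_file.parent.name] = state_file.read_text().strip()
    return states


def find_split_brain(states_by_node):
    """Return {router_id: [nodes]} for routers that are master on >1 node.

    states_by_node maps a node name to the dict produced by
    read_local_states() on that node.
    """
    masters = defaultdict(list)
    for node, routers in states_by_node.items():
        for router_id, state in routers.items():
            # Newer releases may use "primary" instead of "master".
            if state.strip().lower() in ("master", "primary"):
                masters[router_id].append(node)
    return {rid: sorted(nodes) for rid, nodes in masters.items() if len(nodes) > 1}


if __name__ == "__main__":
    # Hypothetical data gathered from three network nodes.
    collected = {
        "ctl1": {"router-a": "master", "router-b": "backup"},
        "ctl2": {"router-a": "master", "router-b": "master"},
        "ctl3": {"router-a": "backup", "router-b": "backup"},
    }
    for router_id, nodes in find_split_brain(collected).items():
        print(f"{router_id} is master on: {', '.join(nodes)}")
```

Running `read_local_states()` on each network node and feeding the results into `find_split_brain()` only shows which routers are affected; it does not explain why keepalived promoted a second node.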
Hi,
It's the keepalived process that decides which node is master and which is backup. Can you check the keepalived logs? Maybe they contain some information about the cause of this problem.
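As a rough aid for that check, the sketch below pulls VRRP state transitions out of keepalived log text. It assumes the usual keepalived message shapes, `VRRP_Instance(<name>) Entering MASTER STATE` or `Transition to MASTER STATE` in 1.x and `(<name>) Entering MASTER STATE` in 2.x; wording can differ between versions, so the pattern may need adjusting. The function name and the sample lines are illustrative and not taken from the reporter's system.

```python
import re

# Matches "VRRP_Instance(VR_1) Entering MASTER STATE",
# "VRRP_Instance(VR_1) Transition to MASTER STATE" and
# "(VR_1) Entering BACKUP STATE (init)".
TRANSITION_RE = re.compile(
    r"(?:VRRP_Instance)?\((?P<instance>[^)]+)\)\s+"
    r"(?:Entering|Transition to)\s+"
    r"(?P<state>MASTER|BACKUP|FAULT)\s+STATE",
    re.IGNORECASE,
)


def extract_transitions(log_text):
    """Return (instance, state, raw_line) tuples in log order."""
    transitions = []
    for line in log_text.splitlines():
        match = TRANSITION_RE.search(line)
        if match:
            transitions.append(
                (match.group("instance"), match.group("state").upper(), line.strip())
            )
    return transitions


if __name__ == "__main__":
    # Illustrative sample lines only.
    sample = (
        "Keepalived_vrrp[1]: VRRP_Instance(VR_1) Transition to MASTER STATE\n"
        "Keepalived_vrrp[1]: (VR_2) Entering BACKUP STATE (init)\n"
        "Keepalived_vrrp[1]: some unrelated message\n"
    )
    for instance, state, raw in extract_transitions(sample):
        print(instance, state, raw)
```

Comparing the timestamps of the extracted transitions on both nodes against the neutron agent log entries may help show whether the second node stopped receiving VRRP advertisements shortly before it took over.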