Neutron L3 agent doesn't reschedule routers when MQ is down
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mirantis OpenStack | Invalid | High | MOS Neutron |
6.0.x | Invalid | High | MOS Maintenance |
6.1.x | Invalid | High | MOS Maintenance |
7.0.x | Invalid | High | MOS Neutron |
8.0.x | Invalid | High | MOS Neutron |
Bug Description
HA, Neutron/VLAN, Ubuntu, MOS 6.0 (Juno); Kilo is likely to be affected as well.
Easiest way to reproduce:
- deploy HA/Neutron+
- reboot a controller that has "qrouter" network namespace
Expected result:
- router is rescheduled to another controller, along with qrouter namespace
Actual result:
- router is not rescheduled
Additional analysis:
When L3 agent starts, it runs periodic_
Such a sync is performed on any router or agent update. After a successful sync, the "fullsync" parameter is set to False [2]. However, if a controller node that hosts a router is rebooted, the other nodes do not perform a full sync; as a result, routers are not rescheduled away from the dead L3 agent. This is likely because failover in MOS 6.0 requires a shutdown and reconfiguration of the RabbitMQ cluster, during which RPC calls are unavailable. So the server tries to auto-reschedule routers from a dead agent (due to "allow_
[1] https:/
[2] https:/
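The flag behaviour described in the analysis can be sketched roughly as follows. This is a toy model, not Neutron's actual L3 agent code: `ToyL3Agent`, `fetch_routers`, `on_update` and `periodic_sync` are hypothetical names chosen for illustration, and only the "fullsync" flag itself comes from the report.

```python
class ToyL3Agent:
    """Toy model of the agent-side sync flag (not Neutron's real class)."""

    def __init__(self, fetch_routers):
        # fetch_routers is a hypothetical callable standing in for the
        # RPC call that asks the server for this agent's routers.
        self.fetch_routers = fetch_routers
        self.fullsync = True  # a full sync is requested at startup
        self.routers = {}

    def on_update(self):
        # A router/agent update notification requests another full sync.
        self.fullsync = True

    def periodic_sync(self):
        """Run one periodic iteration; return True if a full sync ran."""
        if not self.fullsync:
            # Nothing requested a sync, so the agent does no work.
            return False
        self.routers = dict(self.fetch_routers())
        self.fullsync = False  # cleared after a successful sync
        return True
```

In this model, once the flag has been cleared, later periodic runs are no-ops until something calls `on_update()`. If no notification reaches the surviving agents while the message queue is down, nothing sets the flag again, which matches the symptom described above.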
Update 1:
The server gives up because no live L3 agents remain for the duration of the HA failover (the RabbitMQ cluster is down during that time, which causes the agents to be marked as dead):
2015-09-09 07:40:33.247 15781 WARNING neutron.scheduler.l3_agent_scheduler [-] No active L3 agents
2015-09-09 07:40:33.252 15781 ERROR neutron.db.l3_agentschedulers_db [-] Failed to reschedule router 5924f1d2-a47c-4085-af02-79fa381cfe5d
2015-09-09 07:40:33.252 15781 TRACE neutron.db.l3_agentschedulers_db Traceback (most recent call last):
2015-09-09 07:40:33.252 15781 TRACE neutron.db.l3_agentschedulers_db   File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 136, in reschedule_routers_from_down_agents
2015-09-09 07:40:33.252 15781 TRACE neutron.db.l3_agentschedulers_db     self.reschedule_router(context, binding.router_id)
2015-09-09 07:40:33.252 15781 TRACE neutron.db.l3_agentschedulers_db   File "/usr/lib/python2.7/dist-packages/neutron/db/l3_agentschedulers_db.py", line 273, in reschedule_router
2015-09-09 07:40:33.252 15781 TRACE neutron.db.l3_agentschedulers_db     router_id=router_id)
2015-09-09 07:40:33.252 15781 TRACE neutron.db.l3_agentschedulers_db RouterReschedulingFailed: Failed rescheduling router 5924f1d2-a47c-4085-af02-79fa381cfe5d: no eligible l3 agent found.
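The failure in the log above can be illustrated with a small sketch. It assumes that an agent counts as alive only if its last heartbeat is recent enough; the helper names, the heartbeat dictionary and the 75-second threshold are illustrative assumptions, not Neutron's implementation. Only the exception name and its message are taken from the traceback.

```python
class RouterReschedulingFailed(Exception):
    """Local stand-in named after the exception in the traceback."""


def alive_agents(heartbeats, now, agent_down_time):
    # An agent is treated as alive if it reported within agent_down_time
    # seconds (assumed liveness rule for this sketch).
    return sorted(
        host for host, seen in heartbeats.items()
        if now - seen < agent_down_time
    )


def reschedule_router(router_id, current_host, heartbeats, now,
                      agent_down_time=75):
    # Candidates are all live agents except the one hosting the router.
    candidates = [
        host for host in alive_agents(heartbeats, now, agent_down_time)
        if host != current_host
    ]
    if not candidates:
        raise RouterReschedulingFailed(
            "Failed rescheduling router %s: no eligible l3 agent found."
            % router_id
        )
    return candidates[0]
```

With every heartbeat stale, as would happen while the message queue is unavailable, `reschedule_router` raises. Once any other agent reports again, the same call returns that agent's host, so a single attempt made during the outage is not enough to move the router.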
Update 2:
There seems to be exactly the same or a similar bug in Neutron: https://bugs.launchpad.net/neutron/+bug/1403921
However, it has been marked as expired.