L3 agent failed to respawn keepalived process
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| neutron | Fix Released | Undecided | Hong Hui Xiao | |
| Kilo | Fix Committed | Undecided | Unassigned | |
Bug Description
I enabled L3 HA in the neutron configuration, and I often see the following log in l3_agent.log:
2015-10-14 22:30:16.397 21460 ERROR neutron.
2015-10-14 22:30:16.397 21460 ERROR neutron.
2015-10-14 22:30:16.397 21460 DEBUG neutron.
2015-10-14 22:30:16.398 21460 DEBUG neutron.
I also noticed that the number of VRRP pid files was usually larger than the number of "pid" files:
root@neutron2:~# ls /var/lib/
664
root@neutron2:~# ls /var/lib/
677
It seems that if the "pid.vrrp" file exists, the keepalived process cannot be respawned with a command like this:
keepalived -P -f /var/lib/
So I suggest that neutron, after checking that the pid is not active, should check for the "pid" file and the "vrrp pid" file and remove them before respawning the keepalived process, so that the process is guaranteed to start. Would that work?
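A minimal Python sketch of the check proposed above, assuming each pid file holds a bare PID. The helper names `pid_is_active` and `remove_stale_pid_files` are hypothetical and are not part of neutron's API; the real pid file paths are elided in the report.

```python
import errno
import os


def pid_is_active(pid_file):
    """Return True if pid_file names a process that is currently running."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable or malformed pid file: treat as not active.
        return False
    if pid <= 0:
        # 0 and negative values address process groups in kill(), not a PID.
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except OSError as e:
        # EPERM: the process exists but is owned by another user.
        return e.errno == errno.EPERM
    return True


def remove_stale_pid_files(*pid_files):
    """Delete each pid file whose process is not running; return those removed."""
    removed = []
    for path in pid_files:
        if os.path.exists(path) and not pid_is_active(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Calling `remove_stale_pid_files(pid_file, vrrp_pid_file)` right before the `keepalived -P -f ...` command follows the order the report suggests. This variant only deletes a pid file whose process is gone; a file that points at a live process is left in place.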
https:/
Changed in neutron:
assignee: nobody → Hong Hui Xiao (xiaohhui)
I can't reproduce this bug by removing the .pid file and keeping the .pid-vrrp file: when I restart neutron-l3-agent, the keepalived process is respawned.
But looking into the keepalived code [1-3], the cause may be that the VRRP child process is still alive while the main keepalived process is dead. In that case neutron cannot detect the keepalived process, and at the same time it cannot respawn it.
[1] https://github.com/acassen/keepalived/blob/03da0d2d0393808bbb2feac7abc07aaf8d647855/keepalived/core/main.c#L236
[2] https://github.com/acassen/keepalived/blob/03da0d2d0393808bbb2feac7abc07aaf8d647855/keepalived/core/main.c#L291
[3] https://github.com/acassen/keepalived/blob/03da0d2d0393808bbb2feac7abc07aaf8d647855/keepalived/core/pidfile.c#L92
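The scenario in this comment (VRRP child alive, main keepalived process dead) can be modelled in a few lines of Python. This is a hedged sketch: `any_daemon_running` only approximates the startup check the comment points to in keepalived's `main.c`/`pidfile.c`, and `kill_orphan_vrrp` is a hypothetical helper showing one way an agent could clear an orphaned VRRP child before respawning. Neither is neutron's or keepalived's actual code.

```python
import errno
import os
import signal


def read_pid(pid_file):
    """Return the positive PID stored in pid_file, or None if unavailable."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid):
    """Check process existence with signal 0 (EPERM still means it exists)."""
    if pid is None:
        return False
    try:
        os.kill(pid, 0)
    except OSError as e:
        return e.errno == errno.EPERM
    return True


def any_daemon_running(pid_files):
    """Approximation (assumed) of keepalived's startup check: if any recorded
    daemon is alive, keepalived refuses to start a new instance."""
    return any(is_running(read_pid(p)) for p in pid_files)


def kill_orphan_vrrp(pid_file, vrrp_pid_file, sig=signal.SIGTERM):
    """Signal the VRRP child if its parent is dead; return the PID or None."""
    parent = read_pid(pid_file)
    child = read_pid(vrrp_pid_file)
    if not is_running(parent) and is_running(child):
        os.kill(child, sig)
        return child
    return None
```

With a dead parent PID and a live child PID, `any_daemon_running` returns True, which matches the reported symptom: the agent sees no keepalived, yet keepalived will not start. Signalling the orphan, rather than deleting its pid file, keeps that check meaningful, because a stale file is never hidden while a real VRRP process is still running.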