Comment 0 for bug 1947585

Aurelien Lourot (aurelien-lourot) wrote :

This is a spin-off of lp:1927277. On a fresh deployment, some ceilometer-agent units end up blocked because the ceilometer-agent-compute service is not running.

The ceilometer-agent-compute.log shows no problems.

`systemctl status ceilometer-agent-compute` shows the service as failed, and `journalctl -u ceilometer-agent-compute` shows:

Oct 18 10:14:25 solqa-lab1-server-11 systemd[1]: ceilometer-agent-compute.service: Start request repeated too quickly.
Oct 18 10:14:25 solqa-lab1-server-11 systemd[1]: ceilometer-agent-compute.service: Failed with result 'start-limit-hit'.
Oct 18 10:14:25 solqa-lab1-server-11 systemd[1]: Failed to start Ceilometer Agent Compute.
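
For anyone checking whether a unit is in the same state, here is a minimal sketch. The helper name `count_start_limit_hits` is hypothetical (not part of any charm or package); it only counts the 'start-limit-hit' lines in journal text fed on stdin. The commented commands are standard systemd ones, but clearing the failed state only retries the service and does not address the underlying cause.

```shell
#!/bin/sh
# Hypothetical helper: count 'start-limit-hit' failures in journal text
# read from stdin. Prints 0 when there are none.
count_start_limit_hits() {
    # grep -c exits non-zero when the count is 0; '|| true' keeps the
    # helper usable under 'set -e'.
    grep -c "Failed with result 'start-limit-hit'" || true
}

# Possible usage on an affected host (requires systemd):
#   journalctl -u ceilometer-agent-compute --no-pager | count_start_limit_hits
#   sudo systemctl reset-failed ceilometer-agent-compute   # clear the start-limit state
#   sudo systemctl start ceilometer-agent-compute          # retry the start
```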

I suspect ceilometer-agent-compute was being brought up before the nova-compute service, even though ceilometer-agent-compute has a dependency on nova-compute at the service level. Or maybe ceilometer-agent-compute was being brought up at a very specific moment when nova-compute was down. This seems to happen often in certain labs and almost never in others, so it smells like a race condition.
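
One way to test the ordering part of this suspicion, as a rough sketch only: in systemd, a `Requires=` or `Wants=` dependency does not by itself order startup; only `After=`/`Before=` do. The helper name `is_ordered_after` is hypothetical, and I am assuming the units are named `ceilometer-agent-compute.service` and `nova-compute.service`. It reads `systemctl show -p After` output on stdin and succeeds if the given unit is listed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if unit "$1" appears in the 'After=' line
# read from stdin (the format printed by 'systemctl show -p After UNIT').
is_ordered_after() {
    # Split 'After=a.service b.service' into one token per line, then
    # look for an exact, literal match of the requested unit name.
    grep '^After=' | tr ' =' '\n\n' | grep -qxF "$1"
}

# Possible usage on an affected host (requires systemd):
#   systemctl show -p After ceilometer-agent-compute.service \
#       | is_ordered_after nova-compute.service && echo "ordered after nova-compute"
#   systemctl show -p ActiveEnterTimestamp nova-compute.service
```

If the ordering is present, the second hypothesis (nova-compute being briefly down at the wrong moment) becomes the more likely one; comparing the `ActiveEnterTimestamp` of nova-compute with the journal timestamps above may help there.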