Ceilometer-agent-compute service not running after scale-out of nova-cloud-controller application
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Ceilometer Agent Charm | New | Undecided | Unassigned |
Bug Description
On one deployment, all but 2 of the 114 ceilometer-agent units are blocked with the status "Services not running that should be: ceilometer-
This is mostly due to nova-compute.
The workaround was simply to run a systemctl restart ceilometer-
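With 112 of 114 units affected, applying the restart workaround by hand is tedious. The following is a minimal Python sketch, not part of any charm, that walks the output of juju status --format=json, finds subordinate ceilometer-agent units whose workload is blocked with the "Services not running that should be:" message, and builds one juju ssh restart command per unit. Assumptions: the application is named ceilometer-agent, subordinate units appear under each principal unit's "subordinates" key, and the service name is ceilometer-agent-compute as given in the bug title. The helper names are hypothetical.

```python
# Prefix of the blocked workload message quoted in this bug report.
BLOCKED_PREFIX = "Services not running that should be:"

# Assumption: service name taken from the bug title; adjust if yours differs.
SERVICE = "ceilometer-agent-compute"


def blocked_subordinates(status: dict, app_name: str = "ceilometer-agent") -> list[str]:
    """Return sorted names of subordinate units blocked with BLOCKED_PREFIX.

    Expects the parsed output of `juju status --format=json`, where
    subordinate units (such as ceilometer-agent) are nested under each
    principal unit's "subordinates" key.
    """
    found = []
    for app in status.get("applications", {}).values():
        # Subordinate applications have no "units" key of their own.
        for unit in app.get("units", {}).values():
            for name, sub in unit.get("subordinates", {}).items():
                if not name.startswith(app_name + "/"):
                    continue
                ws = sub.get("workload-status", {})
                if ws.get("current") == "blocked" and ws.get("message", "").startswith(BLOCKED_PREFIX):
                    found.append(name)
    return sorted(found)


def restart_commands(units: list[str], service: str = SERVICE) -> list[str]:
    """Build one `juju ssh` command per unit that restarts the stopped service."""
    return [f"juju ssh {u} sudo systemctl restart {service}" for u in units]
```

The sketch only prints commands rather than running them, so an operator in a locked-down environment can review the list before executing it from the jump host.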
The system is Ussuri/Focal; charm revisions:
cs:ceilometer-282, cs:ceilometer-
cs:~openstack-
cs:nova-compute-334
(FCE template 21.10)
Event flow on one machine:
- ceilometer-
- last entry in this journalctl ceilometer-
- nova-compute-
- between 17:30:04 and 17:30:16, the cloud-compute:483 relation of the nova-compute Juju unit is executed and completes. The unit is ready.
However, there is no communication between the nova-compute unit and the ceilometer-agent Juju units, so the ceilometer-agent service is never restarted and remains stopped.
Getting logs out of the systems is hard: it is a secured environment reachable only through locked-down jump hosts.
Tags added: aubergine
I believe this is a duplicate of lp:1947585