[Bionic/Stein] Ceilometer-agent fails to collect metrics after restart
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Ceilometer Agent Charm | Confirmed | Undecided | Unassigned | |
Ubuntu Cloud Archive | Fix Committed | Medium | Unassigned | |
Stein | Fix Committed | Medium | Unassigned | |
Train | Fix Released | Medium | Unassigned | |
Ussuri | Fix Released | Medium | Unassigned | |
Victoria | Fix Released | Medium | Unassigned | |
ceilometer (Ubuntu) | Fix Released | Medium | Unassigned | |
Focal | Fix Released | Medium | Unassigned | |
Groovy | Fix Released | Medium | Unassigned | |
Hirsute | Fix Released | Medium | Unassigned | |
Bug Description
Bionic/Stein - stable 20.05 charms
Juju 2.7.6
I am aware of: https:/
Decided to open a new bug since there was no activity on the previous one and it expired.
After rebooting my cloud (rack-by-rack), I got into a situation where I could not collect memory.usage from VMs anymore.
Looking into: openstack metric resource --type instance <ID>
I could not see memory.usage there.
I accessed the ceilometer-agent unit and could see the services were in active/running status, but the following log entries were present:
Jun 27 22:34:09 sgdemr0114bp033 ceilometer-
Jun 27 22:34:09 sgdemr0114bp033 ceilometer-
Jun 27 22:34:09 sgdemr0114bp033 ceilometer-
stat on that /var/run file shows me:
stat /var/run/
File: /var/run/
Size: 0 Blocks: 0 IO Block: 4096 socket
Device: 17h/23d Inode: 1289 Links: 1
Access: (0777/srwxrwxrwx) Uid: ( 0/ root) Gid: ( 118/ libvirt)
Access: 2020-06-28 14:28:47.292838669 +0000
Modify: 2020-06-27 22:34:11.010520529 +0000
Change: 2020-06-27 22:34:11.010520529 +0000
Birth: -
So, I guess there is a race-condition here, where libvirt is opening the socket after ceilometer-
Restarting it restores memory.usage back to normal.
However, I still cannot see all the metrics as shown in: https:/
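To illustrate the suspected race, here is a minimal sketch of a pre-start check that waits until a Unix socket actually accepts connections before the agent is started. This is not what the ceilometer package or the charm does; the function name `wait_for_unix_socket` and its parameters are hypothetical, and the socket path is left as an argument because the path in the log above is truncated.

```python
import socket
import time


def wait_for_unix_socket(path, timeout=60.0, interval=1.0):
    """Return True once connect() to the Unix socket at `path` succeeds.

    Returns False if no connection could be made before `timeout` seconds.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return True
        except OSError:
            # Socket file is missing, or nothing is listening on it yet.
            pass
        finally:
            sock.close()
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A wrapper could call this with the libvirt socket path and only start the polling agent when it returns True, instead of relying on a manual restart.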
Changed in charm-ceilometer-agent:
status: New → Confirmed
I'm seeing this as well.
On a bionic-stein nova-compute node with the ceilometer-agent subordinate, after upgrading to ceilometer-agent-compute 1:12.1.1-0ubuntu1~cloud0, I see the following when a node reboots. This line appears in the ceilometer-agent log and nothing is logged after it, though the service continues to run:
2021-02-16 03:17:13.423 3963 WARNING ceilometer.polling.manager [-] No valid pollsters can be loaded from ['compute'] namespaces
When I checked the service startup in systemd, I found that nova-compute hadn't started up until 2021-02-16 03:19, about two minutes after the warning above.
I'm wondering if a "Should-Start: nova-compute.service" dependency is missing from ceilometer-agent, or if ceilometer-agent should have a retry loop that re-checks for valid pollsters in the failed namespace until the compute service is online for querying.
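As a rough sketch of the retry-loop idea, assuming a callable that returns the list of pollsters it managed to load (empty when none are valid): `load_with_retry` and `load_pollsters` are hypothetical names for illustration, not ceilometer APIs, and the attempt count and delay are arbitrary.

```python
import time


def load_with_retry(load_pollsters, attempts=10, delay=30.0, sleep=time.sleep):
    """Call load_pollsters() until it returns a non-empty list.

    Gives up and returns [] after `attempts` tries. `sleep` is injectable
    so the loop can be exercised without real waiting.
    """
    for attempt in range(1, attempts + 1):
        pollsters = load_pollsters()
        if pollsters:
            return pollsters
        if attempt < attempts:
            # Compute service may not be up yet; wait before re-checking.
            sleep(delay)
    return []
```

With something like this, an agent that starts two minutes before nova-compute would pick up the 'compute' pollsters on a later attempt instead of running indefinitely with none loaded.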