No Metrics in Influx DB or Grafana for Certain Nodes MOS 9
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StackLight | New | Undecided | Unassigned |
Bug Description
I have run into this issue multiple times and have not been able to find a solution. I have tried the following:
* Restarting metric_collector on the broken nodes via "CRM" or the "restart" command.
* Running "crm resource cleanup metric_collector" on the controllers.
* Force quitting collectd and hekad, then restarting metric_collector.
* Completely deleting and rebuilding my Fuel environment from scratch.
* Running hiera and post-deployment as detailed in the LMA collector user guide.
* Moving the StackLight node to different physical servers.
My environment consists of 14 nodes: 3 controllers, 7 compute, 3 Ceph, and 1 StackLight. The StackLight node and all of the compute nodes except one show metric data in Grafana. The controllers and Ceph nodes DO NOT show any data in Grafana. If I run the queries manually in InfluxDB, no data is returned. As far as I can tell, all the correct collector processes (hekad, collectd, metric_collector) are running on the nodes that have no data.
It appears that metrics for the broken nodes are shown, at least partly, during deployment; however, once deployment completes, the metrics no longer show. The screenshots detail this as well.
I have attached the LMA diagnostic snapshot for all nodes. I have also placed a link below with screenshots of my Grafana dashboard. Please let me know if you need any more information. This is currently running in a DEV environment.
Screenshots: http://
Thanks!
Thanks for the report.
Looks like InfluxDB cannot handle the load (probably due to a "slow" disk):
* disks are busy (sdb and sdc)
* collectors are buffering metrics (/var/cache/metric_collector/output_queues/influxdb_output)
IIRC, Elasticsearch is using the same disk, which doesn't help.
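To confirm that a collector is buffering rather than delivering, one option is to watch how much data sits in its output queue directory over time. The following is only a rough sketch: `buffered_bytes` is a hypothetical helper (not part of the LMA collector), and the directory to inspect is whatever queue path the collector uses on the affected node.

```python
import os


def buffered_bytes(queue_dir):
    """Return the total size in bytes of all files under queue_dir.

    Hypothetical helper for illustration; it only walks the directory
    tree and sums file sizes, it does not inspect the queue contents.
    """
    total = 0
    for root, _dirs, files in os.walk(queue_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file was rotated or removed between listing and stat.
                continue
    return total
```

Running it a few times, a minute or so apart, should show the trend: a value that keeps growing suggests the output cannot keep up with InfluxDB, while a value that shrinks back suggests the backlog is draining.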
I'd recommend one of the following:
* ensure that Elasticsearch and InfluxDB do not use the same disk
* OR use an SSD for InfluxDB
* OR enable the InfluxDB option: "Store WAL files in memory"
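As a quick first check for the shared-disk case, the data directories of the two services can be compared by device ID. This is a minimal sketch under assumptions: `same_device` is a hypothetical helper, and the actual data directories depend on how the StackLight node was deployed. Note that `st_dev` identifies a filesystem, so a `True` result means the two paths definitely share one, while a `False` result does not rule out two partitions on the same physical disk.

```python
import os


def same_device(path_a, path_b):
    """Return True if both paths live on the same filesystem.

    Compares the st_dev field reported by os.stat(). Different values
    can still map to partitions of one physical disk, so treat False
    as "needs a closer look", not as proof of separate disks.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

It would be called with the InfluxDB and Elasticsearch data directories on the StackLight node; if it returns `True`, both services are competing for the same filesystem.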