collect_rabbitmq_stats.sh locking can time out, causing a monitoring self-DoS
Bug #1830036 reported by James Troup
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack RabbitMQ Server Charm | Incomplete | Medium | Martin Kalcok | |
Bug Description
collect_
Once a file is locked, the lock must be touched at least once every five minutes or the lock will be considered stale
The end result of this is that if your rabbitmq server is not down but also not returning results, the monitoring will keep running new collect_
I believe the locking should be switched to use something which does not make purely time-based assumptions about freshness.
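One possible direction, sketched here only as an illustration, is a kernel-managed lock taken with `flock` from util-linux. The lock lasts exactly as long as the holding process keeps the file descriptor open, so no age threshold is involved. The names `LOCK_FILE`, `collect_stats` and `run_exclusively` are hypothetical and are not taken from the charm's script.

```shell
#!/bin/bash
# Sketch only: LOCK_FILE, collect_stats and run_exclusively are hypothetical
# names, not taken from the charm's collect_rabbitmq_stats.sh.
LOCK_FILE="${LOCK_FILE:-/tmp/collect_rabbitmq_stats.lock}"

collect_stats() {
    # Placeholder for the real stats collection.
    echo "collecting stats"
}

run_exclusively() {
    # The subshell keeps the lock file open on file descriptor 9 while it runs.
    (
        # -n: fail immediately instead of waiting if another run holds the lock.
        if ! flock -n 9; then
            echo "previous run still holds the lock, skipping" >&2
            exit 1
        fi
        "$@"
    ) 9>"$LOCK_FILE"
}

run_exclusively collect_stats
```

With this approach a hung run keeps the lock for as long as it is alive, so new runs are skipped instead of piling up, and the lock disappears on its own when the holder exits or is killed.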
Changed in charm-rabbitmq-server:
status: New → Triaged
importance: Undecided → Medium

Changed in charm-rabbitmq-server:
assignee: nobody → Martin Kalcok (martin-kalcok)
status: Triaged → In Progress

Changed in charm-rabbitmq-server:
status: In Progress → Incomplete
Isn't the main issue that we can have 'collect_rabbitmq_stats.sh' scripts hanging forever (or at least for more than 5 minutes)? I'm not that familiar with RabbitMQ, but 5 minutes just to collect stats seems like a lot to me.
We could use "-t TIMEOUT" on the rabbitmqctl command [1] to ensure that the nrpe script does not hang forever, and we could report unresponsive services back to monitoring.
[1] https://www.rabbitmq.com/rabbitmqctl.8.html
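A rough sketch of that idea follows. It pairs the `-t` option of rabbitmqctl with the coreutils `timeout` command as an outer hard cap, since the rabbitmqctl manual notes that not all commands support timeouts. The helper `run_with_deadline`, the `STATS_TIMEOUT` variable and the `list_queues` invocation in the comment are assumptions for illustration, not what the charm currently runs.

```shell
#!/bin/bash
# Sketch only: run_with_deadline and STATS_TIMEOUT are hypothetical names,
# not part of the charm.
STATS_TIMEOUT="${STATS_TIMEOUT:-30}"

run_with_deadline() {
    # Usage: run_with_deadline SECONDS COMMAND [ARGS...]
    local seconds="$1"
    shift
    local rc=0
    # coreutils timeout terminates the command after $seconds and exits 124.
    timeout "$seconds" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "command timed out after ${seconds}s: $*" >&2
    fi
    return "$rc"
}

# Possible use inside the stats script (assumed command line):
#   run_with_deadline "$((STATS_TIMEOUT + 5))" \
#       rabbitmqctl -t "$STATS_TIMEOUT" list_queues name messages
```

The outer cap is set slightly above the rabbitmqctl timeout so that the CLI gets a chance to fail cleanly first. An exit status of 124 could then be reported to NRPE as an unresponsive service.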