rabbitmq-server died abruptly
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack RabbitMQ Server Charm |
Triaged
|
High
|
Unassigned |
Bug Description
We saw a rabbimq-server unit which is the leader in a cluster with 2 others die abruptly. Cluster status on other units remained ok, except for inability to communicate with the downed node. We were unable to determine a root cause. We've seen this happen on 3 other occasions and on one we observed that before starting the rabbit service the jujud agent was in error state; afterwards it was in executing state. I do not know if that was the case this time.
== LOGS ETC. ==
Service status after failure:
https:/
=======
From juju unit log, I see repetitions similar to the first section (02:05:42-02:05:44) going back for days. At the time of the failure 02:05:45, it abruptly cuts away to reporting inability to connect. This section of enteries (02:05:45-02:10:49) basically repeats until the service is manually restarted around 02:46:18, where it goes back to the WARNING and entries resembling the first section:
https:/
=======
In the rabbit logs (/<email address hidden>) we see it abruptly stop logging at the time of death and then start up again with the unit starting up at recovery:
https:/
=======
syslog from time of failure to time of recovery:
tags: | added: canonical-bootstack |
This cloud is running juju 2.2.4-xenial-amd64