'/usr/sbin/rabbitmqctl wait' results in 'Error: process_not_running' during amqp-relation-changed
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
rabbitmq-server (Juju Charms Collection) | New | Undecided | Unassigned |
Bug Description
This was found during CI testing of Landscape Openstack Autopilot [1].
It looks like rabbitmq restarts unexpectedly while the charm is checking that it is running. There are multiple successful runs of the 'wait' check earlier in this log, all with the same pid of 12237. However, I cannot find any evidence of rabbitmq restarting in any of the log files (syslog or the rabbitmq/*log files).
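For anyone trying to reproduce the check outside the hook, here is a minimal sketch of a wrapper that re-runs a wait command a few times and keeps the output of every attempt. It is not the charm's code: `wait_with_retries`, `attempts` and `delay` are hypothetical names, and whether retrying is an appropriate fix for this bug is an open question.

```python
import subprocess
import time


def wait_with_retries(cmd, attempts=3, delay=5.0):
    """Run cmd until it exits 0 or the attempts run out.

    Returns (succeeded, outputs), where outputs holds the combined
    stdout/stderr text of every attempt so a transient failure such as
    'Error: process_not_running' can still be inspected afterwards.
    """
    outputs = []
    for attempt in range(1, attempts + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        outputs.append(result.stdout)
        if result.returncode == 0:
            return True, outputs
        if attempt < attempts:
            time.sleep(delay)  # pause before checking again
    return False, outputs
```

It could be called as `wait_with_retries(['timeout', '180', '/usr/sbin/rabbitmqctl', 'wait', pid_file])`, where `pid_file` stands for whatever pid file path the charm passes (the path is cut off in the log below).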
The result of this failure is an error from the amqp-relation-
[from landscape-
2017-03-06 07:38:25 DEBUG juju-log amqp:41: Checking for minimum of 3 peer units
2017-03-06 07:38:25 INFO juju-log amqp:41: Sufficient number of peer units to form cluster 3
2017-03-06 07:38:26 DEBUG juju-log amqp:41: Waiting for rabbitmq app to start: /<email address hidden>
2017-03-06 07:38:26 DEBUG juju-log amqp:41: Running ['timeout', '180', '/usr/sbin/
2017-03-06 07:38:26 INFO amqp-relation-
2017-03-06 07:38:26 INFO amqp-relation-
2017-03-06 07:38:26 INFO amqp-relation-
2017-03-06 07:38:27 DEBUG juju-log amqp:41: Status of node 'rabbit@
[{pid,12414},
{running_
{os,{unix,linux}},
{erlang_
{memory,
{alarms,[]},
{listeners,
{vm_memory_
{vm_memory_
{disk_
{disk_
{file_
{processes,
{run_queue,0},
{uptime,2512}]
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 INFO amqp-relation-
2017-03-06 07:38:27 ERROR juju.worker.
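To compare the node status across runs, the integer fields of the status output can be pulled out with a small script. This is only a sketch: it uses just the status lines that survive intact in the excerpt above, and `integer_fields` is a hypothetical helper, not part of the charm or of rabbitmqctl.

```python
import re

# Only the status lines that are not truncated in the log excerpt.
STATUS_EXCERPT = """\
[{pid,12414},
 {os,{unix,linux}},
 {alarms,[]},
 {run_queue,0},
 {uptime,2512}]
"""

# Matches simple {name,integer} entries; nested tuples and lists are skipped.
INT_ENTRY = re.compile(r"\{(\w+),(\d+)\}")


def integer_fields(status_text):
    """Return a dict of the {name,integer} entries found in status_text."""
    return {key: int(value) for key, value in INT_ENTRY.findall(status_text)}


fields = integer_fields(STATUS_EXCERPT)
print(fields["pid"], fields["uptime"])  # prints "12414 2512"
```

The extracted pid can then be checked against the pid seen by the earlier 'wait' checks. If the uptime value is in seconds, which I believe is the case for this output but have not verified here, 2512 would be roughly 42 minutes.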
Attaching log files from all of the rabbitmq-server units from the CI run [1] and from base-machine-2 (which contains var/log/ps-fauxww.txt and var/log/ps_mem.txt with process listings and memory usage).
[1] - https://ci.lscape.net/job/landscape-system-tests/5448
(Note this is not accessible outside of the landscape team, it is included here for reference)