I have deployed cs:trusty/rabbitmq-server-43 and the service is stuck:
rabbitmq-server/0 maintenance executing 2.0-beta6 0/lxc/3 10.96.12.161 (config-changed) Waiting for rabbitmq app to start: /<email address hidden>
It's indeed waiting for a pid file to appear:
  965 ? Ssl 0:00 /var/lib/juju/tools/unit-rabbitmq-server-0/jujud unit --data-dir /var/lib/juju --unit-name rabbitmq-server/0 --debug
 6634 ? S 0:00 \_ /usr/bin/python /var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/config-changed
 8182 ? S 0:00 \_ /bin/sh /usr/sbin/rabbitmqctl wait /<email address hidden>
 8191 ? S 0:00 \_ su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl "wait" "/<email address hidden>"
 8192 ? Ss 0:00 \_ sh -c /usr/lib/rabbitmq/bin/rabbitmqctl "wait" "/<email address hidden>"
 8193 ? Sl 0:02 \_ /usr/lib/erlang/erts-5.10.4/bin/beam.smp -- -root /usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.2.4/sbin/../ebin -noshell -noinput -hidden -sname rabbitmqctl8193 -boot start_clean -s rabbit_control_main -nodename rabbit@euphoric-hook -extra wait /<email address hidden>
That directory has no such file.
Turns out rabbit failed to start:
# cat /var/log/rabbitmq/startup_log
ERROR: epmd error for host "euphoric-hook": nxdomain (non-existing domain)
Somehow that failure was not caught, and now the charm is telling rabbit to wait forever for a pid file that will never show up.
Looking at the charm code, wait_app() is used after a restart in all cases. Something like this:
service_restart('rabbitmq-server')
rabbit.wait_app()
So bug number one: the service_restart() failure was not caught.
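To illustrate what catching that failure could look like, here is a minimal sketch. It is not the charm's code: restart_checked() and RestartFailed are hypothetical names, and it assumes the restart command exits non-zero when rabbit fails to start, which this report does not confirm.

```python
import subprocess


class RestartFailed(Exception):
    """Raised when the restart command exits with a non-zero status."""


def restart_checked(cmd):
    """Run a restart command (a list of arguments) and fail loudly on error.

    Hypothetical helper: the idea is that the hook stops here instead of
    going on to wait for a pid file that will never be written.
    """
    status = subprocess.call(cmd)
    if status != 0:
        raise RestartFailed("%r exited with status %d" % (cmd, status))
```

In the hook this would be called with something like ['service', 'rabbitmq-server', 'restart'] before wait_app(), so the wait is only reached when the restart reported success.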
Bug number two: wait_app() should not wait forever. It should implement a reasonable timeout.
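A bounded wait could look roughly like the sketch below. This is only an illustration under assumptions: wait_with_timeout() and WaitTimeout are hypothetical names, the timeout value is up to the charm, and it relies on the timeout argument of Python 3's subprocess.run, which may not match the Python version the charm runs under.

```python
import subprocess


class WaitTimeout(Exception):
    """Raised when the wait command does not finish in time."""


def wait_with_timeout(cmd, timeout):
    """Run cmd and raise WaitTimeout if it takes longer than timeout seconds.

    cmd is a list of arguments, for example the rabbitmqctl "wait" call
    followed by the pid file path. A non-zero exit status raises
    subprocess.CalledProcessError because check=True is set.
    """
    try:
        # On timeout, subprocess.run kills the direct child before raising.
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise WaitTimeout(
            "%r did not finish within %s seconds" % (cmd, timeout)
        )
```

Only the direct child is killed on timeout. With the su / sh / beam.smp chain shown in the process listing above, the grandchildren might need separate cleanup.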