Comment 0 for bug 1575349

Andreas Hasenack (ahasenack) wrote :

I have deployed cs:trusty/rabbitmq-server-43 and the service is stuck:

rabbitmq-server/0 maintenance executing 2.0-beta6 0/lxc/3 10.96.12.161 (config-changed) Waiting for rabbitmq app to start: /<email address hidden>

It's indeed waiting for a pid file to appear:
  965 ? Ssl 0:00 /var/lib/juju/tools/unit-rabbitmq-server-0/jujud unit --data-dir /var/lib/juju --unit-name rabbitmq-server/0 --debug
 6634 ? S 0:00  \_ /usr/bin/python /var/lib/juju/agents/unit-rabbitmq-server-0/charm/hooks/config-changed
 8182 ? S 0:00      \_ /bin/sh /usr/sbin/rabbitmqctl wait /<email address hidden>
 8191 ? S 0:00          \_ su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl "wait" "/<email address hidden>"
 8192 ? Ss 0:00              \_ sh -c /usr/lib/rabbitmq/bin/rabbitmqctl "wait" "/<email address hidden>"
 8193 ? Sl 0:02                  \_ /usr/lib/erlang/erts-5.10.4/bin/beam.smp -- -root /usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.2.4/sbin/../ebin -noshell -noinput -hidden -sname rabbitmqctl8193 -boot start_clean -s rabbit_control_main -nodename rabbit@euphoric-hook -extra wait /<email address hidden>

The directory named in that path contains no such file.

Turns out rabbit failed to start:

# cat /var/log/rabbitmq/startup_log
ERROR: epmd error for host "euphoric-hook": nxdomain (non-existing domain)

Somehow that failure was not caught, and now the charm has rabbitmqctl waiting forever for a pid file that will never show up.

Looking at the charm code, wait_app() is used after a restart in all cases. Something like this:

    service_restart('rabbitmq-server')
    rabbit.wait_app()

So bug number one: the service_restart() failure was not caught.
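
For illustration, a minimal sketch of what catching the restart failure could look like. This is not the charm's actual code: restart_then_wait() and ServiceRestartError are hypothetical names, and the sketch assumes the restart helper returns a truthy value on success (I have not checked what service_restart() really returns in the charm's version of its helpers).

```python
class ServiceRestartError(Exception):
    """Raised when the restart helper reports a failure (hypothetical name)."""


def restart_then_wait(restart, wait, service_name='rabbitmq-server'):
    """Restart the service and only wait for the app if the restart worked.

    `restart` and `wait` are passed in as callables so the sketch stays
    self-contained; in the charm they would presumably be service_restart
    and rabbit.wait_app. Assumption: `restart` returns a truthy value on
    success and a falsy value on failure.
    """
    if not restart(service_name):
        # Fail loudly instead of going on to wait for a pid file that
        # will never be written.
        raise ServiceRestartError(
            '%s failed to restart, not waiting for the app' % service_name)
    wait()
```

With something along these lines, the hook would error out right after the failed restart, and the epmd nxdomain problem would surface in the unit status instead of a hang.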

Bug number two: wait_app() should not wait forever. It should implement a reasonable timeout.
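
A rough sketch of a bounded wait, assuming Python 3's subprocess.run() is available (the hook in the process list above runs under /usr/bin/python, so the charm itself may need a different approach). wait_with_timeout(), WaitTimeout and the 180 second default are all made up for this example; I don't know what timeout would be reasonable for rabbit.

```python
import subprocess


class WaitTimeout(Exception):
    """Raised when the wait command does not finish in time (hypothetical name)."""


def wait_with_timeout(cmd, timeout=180):
    """Run `cmd` and give up after `timeout` seconds.

    `cmd` is an argument list, for example the rabbitmqctl wait invocation
    with the pid file path. The 180 second default is an arbitrary
    placeholder, not a recommended value.
    """
    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        # On timeout, subprocess.run() kills the direct child and raises
        # TimeoutExpired.
        subprocess.run(cmd, check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise WaitTimeout(
            '%r did not finish within %s seconds' % (cmd, timeout))
```

One caveat: only the direct child is killed on timeout. Since rabbitmqctl here is a shell script that goes through su and sh before reaching the Erlang VM, the grandchildren might survive, so a real fix would probably need to clean up the whole process group as well.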