rabbitmq unit in cluster goes AWOL while cluster-relation-changed hook runs

Bug #1620641 reported by Ursula Junque
This bug affects 1 person
Affects: Autopilot Log Analyser
status: Fix Committed
assignee: Ursula Junque

Affects: rabbitmq-server (Juju Charms Collection)

Bug Description

In an autopilot install with three rabbitmq-server units (HA), the whole cloud deployment fails because one of the rabbitmq-server units disappears while the cluster-relation-changed hook runs. The other units keep waiting for relations to be established, but they can't reach their rabbitmq.

In this failed deployment, the AWOL unit is rabbitmq-server-2:

- rabbitmq-server service juju status:
agent-status: {current: executing, message: running cluster-relation-changed hook, since: '06 Sep 2016 01:23:08Z', version: 1.25.6}

- Unit juju logs:
unit-rabbitmq-server-2[1085]: ... INFO unit.rabbitmq-server/2.cluster-relation-changed logger.go:40 Stopping node 'rabbit@juju-machine-2-lxc-5' ...
unit-rabbitmq-server-2[1085]: ... INFO unit.rabbitmq-server/2.cluster-relation-changed logger.go:40 ...done.

The unit is not accessible via ssh either. Something changes while the hook is running (coincidence?) that makes the container unreachable. Maybe its IP address?

Ursula Junque (ursinha)
Changed in autopilot-log-analyser:
assignee: nobody → Ursula Junque (ursinha)
status: New → In Progress
importance: Undecided → High
Changed in autopilot-log-analyser:
status: In Progress → Fix Committed
Ursula Junque (ursinha) wrote :

It turns out that the unit that failed to find the other rabbit unit didn't have an /etc/hosts entry for it. Once I added the entry manually, the hook completed instantly, with no restart needed.
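
For anyone hitting the same symptom, here is a minimal sketch of how the missing /etc/hosts entry could be detected before touching the file by hand. It is not part of the charm or of the log analyser; the function names are made up for illustration, and the peer hostnames have to be supplied by whoever runs it (juju-machine-2-lxc-5 is taken from the log above, any other name is hypothetical).

```python
def parse_hosts(text):
    """Return the set of hostnames and aliases listed in hosts-file text."""
    names = set()
    for line in text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The first field is the address; everything after it is a name.
        names.update(name.lower() for name in fields[1:])
    return names


def missing_peers(hosts_text, peers):
    """Return the peers that have no entry in the given hosts-file text."""
    known = parse_hosts(hosts_text)
    return [peer for peer in peers if peer.lower() not in known]


def missing_peers_in_file(peers, path="/etc/hosts"):
    """Same check, reading the hosts file from disk."""
    with open(path) as hosts_file:
        return missing_peers(hosts_file.read(), peers)
```

Run on the stuck unit with the hostnames of the other cluster members, for example `missing_peers_in_file(["juju-machine-2-lxc-5"])`; any name that comes back has no entry in /etc/hosts. This only checks the hosts file, so a name that resolves through DNS would still be reported as missing.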
