rabbitmq unit in cluster goes AWOL while cluster-relation-changed hook runs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Autopilot Log Analyser | Fix Committed | High | Ursula Junque |
rabbitmq-server (Juju Charms Collection) | New | Undecided | Unassigned |
Bug Description
In an autopilot install with three rabbitmq-server units (HA), a whole cloud deployment fails due to one of the rabbitmq-server units disappearing when cluster-
In this failed deployment, the AWOL unit is rabbitmq-server-2:
- rabbitmq-server service juju status:
https:/
{{{
agent-status: {current: executing, message: running cluster-
}}}
- Unit juju logs:
https:/
{{{
unit-rabbitmq-
unit-rabbitmq-
}}}
The unit is not accessible via ssh either. Something changes while the hook is running (coincidence?) that makes the container unreachable. Maybe its IP?
Changed in autopilot-log-analyser:
assignee: nobody → Ursula Junque (ursinha)
status: New → In Progress
importance: Undecided → High
Changed in autopilot-log-analyser:
status: In Progress → Fix Committed
It turns out that the unit failing to find the other rabbit unit didn't have an /etc/hosts entry for it. Once I added the entry manually, the hook completed immediately, without needing a restart.
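For anyone hitting the same symptom, a quick way to confirm the diagnosis is to check which peer hostnames are missing from the hosts file on the stuck unit. The following is only a rough sketch, not something taken from the charm: the function name `missing_host_entries` and the `rabbit-node-*` hostnames and addresses are made up for illustration, and it assumes the usual hosts-file layout of an address followed by one or more names.

```python
def missing_host_entries(hosts_text, peer_hostnames):
    """Return the peer hostnames that have no entry in the given hosts-file text.

    hosts_text is the content of a hosts file (e.g. read from /etc/hosts);
    peer_hostnames is the list of names the unit is expected to resolve.
    """
    known = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # The first field is the address; the rest are hostnames and aliases.
        known.update(name.lower() for name in fields[1:])
    # Hostnames are compared case-insensitively.
    return [h for h in peer_hostnames if h.lower() not in known]


if __name__ == "__main__":
    # Hypothetical hosts-file content; names and addresses are placeholders.
    sample = """
127.0.0.1 localhost
10.0.0.11 rabbit-node-0
10.0.0.12 rabbit-node-1
"""
    peers = ["rabbit-node-0", "rabbit-node-1", "rabbit-node-2"]
    print(missing_host_entries(sample, peers))  # ['rabbit-node-2']
```

On a real unit the text would come from something like `open("/etc/hosts").read()`, with the peer list taken from whatever hostnames the other cluster members actually use.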