Intermitent HA controller down
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Canonical Juju |
Triaged
|
High
|
Heather Lanigan |
Bug Description
During the triage of bug https:/
To reproduce use the same steps mentioned in the bug above and kill a secondary :
juju bootstrap localhost lxd-lcl-controller
juju add-machine -m controller -n 2
juju enable-ha --to 1,2
Find the HA primary:
controller-
"0":
instance-id: juju-4fe7ec-0
ha-status: ha-enabled
ha-primary: true
PRIMARY=0
juju ssh -m $PRIMARY -- sudo reboot
In my case I rebooted controller 0. Then, juju status reports:
Model Controller Cloud/Region Version SLA Timestamp
controller lxd-lcl-controller localhost/localhost 2.9.29 unsupported 14:53:51+02:00
Machine State DNS Inst id Series AZ Message
0 started 10.73.25.245 juju-c924af-0 focal Running
1 started 10.73.25.60 juju-c924af-1 focal Running
2 down 10.73.25.130 juju-c924af-2 focal Running
and then...
Model Controller Cloud/Region Version SLA Timestamp
controller lxd-lcl-controller localhost/localhost 2.9.29 unsupported 14:54:40+02:00
Machine State DNS Inst id Series AZ Message
0 started 10.73.25.245 juju-c924af-0 focal Running
1 started 10.73.25.60 juju-c924af-1 focal Running
2 started 10.73.25.130 juju-c924af-2 focal Running
This might be a concurrency issue because the problem may appear eventually.
Changed in juju: | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in juju: | |
milestone: | 2.9.31 → none |
Please upload controller logs for investigative purposes. My understanding this not always reproduced.