HA Juju controllers showing inconsistent status
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Undecided | Joseph Phillips |
Bug Description
Hi,
We've experienced an issue with the consistency and stability of our Juju controllers, and are struggling to pinpoint what's actually happening.
We're operating an HA controller set, running Juju 2.9.42, deployed in an OpenStack cloud.
Symptoms we've observed have been:
* Issues with the stability of relationship hooks in deployed models (we have observed issues with relationships being created, updated, and departed)
* Controllers returning inconsistent "juju status" results
When running "juju status --debug" to make sure we get one result from each controller, we have observed that at least one controller will consistently return a different result than the other(s).
For example, this paste shows both secondary controllers reporting the primary controller as "agent-lost", while the primary disagrees: https:/
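One way to compare the per-controller results mechanically is sketched below in Python. It assumes the output of "juju status --format json" as answered by each controller has been saved separately, and that each machine's agent state sits under machines.<id>.juju-status.current. That field layout and the helper names agent_states and disagreements are assumptions for illustration, not something taken from the paste above.

```python
import json


def agent_states(status_json):
    """Map machine id -> agent state from one saved status document.

    Assumes the layout machines.<id>.juju-status.current (an assumption).
    """
    doc = json.loads(status_json)
    return {
        machine_id: machine.get("juju-status", {}).get("current", "unknown")
        for machine_id, machine in doc.get("machines", {}).items()
    }


def disagreements(reports):
    """Return the machines whose state differs between reports.

    reports maps a label (for example a controller name) to the status
    JSON text obtained from that controller.
    """
    states = {label: agent_states(text) for label, text in reports.items()}
    machine_ids = set().union(*(s.keys() for s in states.values()))
    result = {}
    for machine_id in sorted(machine_ids):
        seen = {label: s.get(machine_id, "missing") for label, s in states.items()}
        # Keep only machines on which at least two reports disagree.
        if len(set(seen.values())) > 1:
            result[machine_id] = seen
    return result
```

An empty result would mean that all saved reports agree on every machine; any entry names a machine and the state each controller reported for it.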
Controller logs from the period in question have been made available via secure portal https:/
Model logs for the specific model in which we observed relationship hook issues are located in "special-request" under that directory.
Please advise if there are any additional logs we should supply, any metrics we can gather from the time, or anything else.
Thanks!
tags: added: canonical-is
Changed in juju: status: New → Incomplete
Changed in juju: status: Incomplete → New
Changed in juju: status: New → Triaged; assignee: nobody → Joseph Phillips (manadart)
The controllers share agent connectivity info (aka presence) using pubsub. I don't think there's an explicit delivery guarantee for such messages.
Logging to turn on would be
juju.worker.pubsub=TRACE
juju.worker.presence=TRACE
Logs could also contain messages matching the format string
"%p programming error, e.ch=%v did not accept %v - missing Unwatch?\nwatch source:\n%s"
Extra relation debug can be obtained by setting
juju.worker.uniter.relation=TRACE
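As a rough sketch of how the suggestions above could be applied, the Python below joins logger levels into a single semicolon-separated logging-config value and scans collected log lines for the "missing Unwatch?" message. The logger names juju.worker.pubsub, juju.worker.presence and juju.worker.uniter.relation are as read from this comment and should be treated as assumptions; the helper names logging_config and find_unwatch_errors are hypothetical.

```python
import re


def logging_config(levels):
    """Join logger=LEVEL pairs into one semicolon-separated config value."""
    return ";".join(f"{name}={level}" for name, level in levels.items())


# Logger names as read from the comment above (assumed, not verified here).
CONTROLLER_LOGGERS = {
    "juju.worker.pubsub": "TRACE",
    "juju.worker.presence": "TRACE",
}
UNITER_LOGGERS = {
    "juju.worker.uniter.relation": "TRACE",
}

# Literal parts of the format string quoted above; the %p and %v values
# are matched loosely because their rendering is not known in advance.
UNWATCH_ERROR = re.compile(
    r"programming error, e\.ch=.* did not accept .* - missing Unwatch\?"
)


def find_unwatch_errors(lines):
    """Return (line_number, text) pairs for log lines matching the message."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if UNWATCH_ERROR.search(line)
    ]
```

The resulting string would normally be supplied as the logging-config model setting, for example via "juju model-config -m controller logging-config=..." for the controller loggers. The uniter logger would presumably need to be set on the affected workload model rather than on the controller model.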