Comment 3 for bug 1810331

William Grant (wgrant) wrote :

The stg-ols-snap-store controller (currently 2.5-rc1, using raft leases) is affected by what seems to be a similar bug. The controller has one non-controller model; its jsft output is at https://pastebin.canonical.com/p/3Kyq4fbW4F/. For a number of applications, juju status shows no leader, yet is-leader returns true on exactly one unit.

[STAGING] stg-ols-snap-store@wendigo:~$ juju run --application cassandra is-leader
- Stdout: |
    False
  UnitId: cassandra/3
- Stdout: |
    False
  UnitId: cassandra/4
- Stdout: |
    True
  UnitId: cassandra/5

[STAGING] stg-ols-snap-store@wendigo:~$ jsft | grep ^cassandra
cassandra active 3 cassandra local 1 ubuntu
cassandra/3 active idle 237 10.50.79.95 9042/tcp,9160/tcp Live seed
cassandra/4 active idle 238 10.50.79.96 9042/tcp,9160/tcp Live node
cassandra/5 active idle 239 10.50.79.97 9042/tcp,9160/tcp Live seed
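
To check other applications for the same mismatch, something like the following rough sketch might help. It assumes the two output formats pasted above, and that tabular status marks the leader by appending "*" to the unit name. The function names are made up for illustration; none of this is part of juju.

```python
import re

# Sample data copied from the output pasted above.
RUN_OUTPUT = """\
- Stdout: |
    False
  UnitId: cassandra/3
- Stdout: |
    False
  UnitId: cassandra/4
- Stdout: |
    True
  UnitId: cassandra/5
"""

STATUS_OUTPUT = """\
cassandra active 3 cassandra local 1 ubuntu
cassandra/3 active idle 237 10.50.79.95 9042/tcp,9160/tcp Live seed
cassandra/4 active idle 238 10.50.79.96 9042/tcp,9160/tcp Live node
cassandra/5 active idle 239 10.50.79.97 9042/tcp,9160/tcp Live seed
"""


def leaders_from_run(output):
    """Units whose is-leader Stdout was "True" (hypothetical helper)."""
    leaders = []
    stdout_value = None
    expect_value = False
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("- Stdout:") or text.startswith("Stdout:"):
            expect_value = True  # the value is on the next non-empty line
        elif expect_value and text:
            stdout_value = text
            expect_value = False
        elif text.startswith("UnitId:"):
            unit = text.split(":", 1)[1].strip()
            if stdout_value == "True":
                leaders.append(unit)
            stdout_value = None
    return leaders


def leaders_from_status(status):
    """Units flagged with a trailing "*" in tabular status (assumed marker)."""
    leaders = []
    for line in status.splitlines():
        fields = line.split()
        if fields and re.fullmatch(r"[a-z0-9-]+/\d+\*", fields[0]):
            leaders.append(fields[0].rstrip("*"))
    return leaders


if __name__ == "__main__":
    claimed = leaders_from_run(RUN_OUTPUT)
    shown = leaders_from_status(STATUS_OUTPUT)
    if claimed != shown:
        print("mismatch: is-leader says", claimed, "but status shows", shown)
```

With the cassandra output above, the unit-side list contains only cassandra/5 while the status-side list is empty, which is the discrepancy described here.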

Controller log since the upgrade: https://pastebin.canonical.com/p/3NFNkYcBQ5/

Unit log from the sole unit of an application that has no leader in status:
  Immediately after the upgrade: https://pastebin.canonical.com/p/p8hRsRpqSB/
  All mentions of "leader": https://pastebin.canonical.com/p/v4Jq7QNs7M/

While I was interrogating the controller, it OOMed and restarted. The cassandra application, at least, remains in an identical state: status reports no leader, but is-leader is true only on cassandra/5.

Controller log for the restart: https://pastebin.canonical.com/p/M45gXhhVHM/
cassandra/5 agent log for the restart: https://pastebin.canonical.com/p/JR9dBpWvGj/