relation-list missing a unit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | New | Undecided | Unassigned |
Bug Description
After an OpenStack deployment completed, the hacluster subordinate reported that it wasn't at the expected scale. Looking into it showed that unit hacluster-aodh/1 was missing from the hanode relation, but only from hacluster-aodh/0's point of view. Inspecting the relation from hacluster-aodh/1's point of view correctly shows both peer units (hacluster-aodh/0 and hacluster-aodh/2).
juju version: 2.9.33-ubuntu-amd64
$ juju run --application hacluster-aodh "relation-ids hanode"
- Stdout: |
hanode:8
UnitId: hacluster-aodh/1
- Stdout: |
hanode:8
UnitId: hacluster-aodh/2
- Stdout: |
hanode:8
UnitId: hacluster-aodh/0
$ juju run --application hacluster-aodh "relation-list -r hanode:8"
- Stdout: |
hacluster-
UnitId: hacluster-aodh/0
- Stdout: |
hacluster-
hacluster-
UnitId: hacluster-aodh/1
- Stdout: |
hacluster-
hacluster-
UnitId: hacluster-aodh/2
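The asymmetry in the relation-list output above can be checked mechanically. The following is only a minimal sketch, not part of the original report: the function name `find_missing_peers` is hypothetical, and the sample data is reconstructed from the bug description (the pasted output is truncated), with the view from hacluster-aodh/2 assumed to contain both of its peers.

```python
from typing import Dict, List, Set


def find_missing_peers(views: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return, for each unit, the peers it should see but does not.

    `views` maps a unit name to the units it reports via relation-list.
    Every unit is expected to see all other units in the mapping.
    """
    all_units = set(views)
    missing: Dict[str, Set[str]] = {}
    for unit, seen in views.items():
        expected = all_units - {unit}  # a unit never lists itself
        gap = expected - set(seen)
        if gap:
            missing[unit] = gap
    return missing


# Assumed data, reconstructed from the description rather than copied
# from the (truncated) command output.
views = {
    "hacluster-aodh/0": ["hacluster-aodh/2"],
    "hacluster-aodh/1": ["hacluster-aodh/0", "hacluster-aodh/2"],
    "hacluster-aodh/2": ["hacluster-aodh/0", "hacluster-aodh/1"],
}

print(find_missing_peers(views))
```

With the assumed data, only hacluster-aodh/0 is reported, with hacluster-aodh/1 as the peer it fails to see.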
$ juju status aodh
Model Controller Cloud/Region Version SLA Timestamp
openstack foundations-maas maas_cloud/default 2.9.33 unsupported 08:46:42Z
SAAS Status Store URL
grafana active foundations-maas admin/lma-
graylog active foundations-maas admin/lma-
nagios active foundations-maas admin/lma-
prometheus active foundations-maas admin/lma-
App Version Status Scale Charm Channel Rev Exposed Message
aodh 14.0.0 active 3 aodh yoga/stable 77 no Unit is ready
aodh-mysql-router 8.0.30 active 3 mysql-router 8.0/stable 35 no Unit is ready
filebeat 6.8.23 active 3 filebeat candidate 38 no Filebeat ready.
hacluster-aodh waiting 3 hacluster edge 109 no Resource: res_aodh_
logrotated active 3 logrotated candidate 7 no Unit is ready.
nrpe active 3 nrpe candidate 94 no Ready
prometheus-
public-
telegraf active 3 telegraf candidate 54 no Monitoring ceph-osd/2 (source version/commit 76901fd)
Unit Workload Agent Machine Public address Ports Message
aodh/0* active idle 0/lxd/0 10.246.165.92 8042/tcp Unit is ready
aodh-
filebeat/30 active idle 10.246.165.92 Filebeat ready.
hacluster-aodh/0* waiting idle 10.246.165.92 Resource: res_aodh_
logrotated/24 active idle 10.246.165.92 Unit is ready.
nrpe/36 active idle 10.246.165.92 icmp,5666/tcp Ready
prometheus-
public-
telegraf/29 active idle 10.246.165.92 9103/tcp Monitoring aodh/0 (source version/commit 76901fd)
aodh/1 active idle 1/lxd/0 10.246.166.208 8042/tcp Unit is ready
aodh-
filebeat/31 active idle 10.246.166.208 Filebeat ready.
hacluster-aodh/1 waiting idle 10.246.166.208 Resource: res_aodh_
logrotated/25 active idle 10.246.166.208 Unit is ready.
nrpe/37 active idle 10.246.166.208 icmp,5666/tcp Ready
prometheus-
public-
telegraf/31 active idle 10.246.166.208 9103/tcp Monitoring aodh/1 (source version/commit 76901fd)
aodh/2 active idle 2/lxd/0 10.246.165.66 8042/tcp Unit is ready
aodh-
filebeat/64 active idle 10.246.165.66 Filebeat ready.
hacluster-aodh/2 waiting idle 10.246.165.66 Resource: res_aodh_
logrotated/56 active idle 10.246.165.66 Unit is ready.
nrpe/68 active idle 10.246.165.66 icmp,5666/tcp Ready
prometheus-
public-
telegraf/63 active idle 10.246.165.66 9103/tcp Monitoring aodh/2 (source version/commit 76901fd)
Machine State Address Inst id Series AZ Message
0 started 10.246.164.163 solqa-lab1-
0/lxd/0 started 10.246.165.92 juju-30865a-0-lxd-0 jammy zone1 Container started
1 started 10.246.166.192 solqa-lab1-
1/lxd/0 started 10.246.166.208 juju-30865a-1-lxd-0 jammy zone2 Container started
2 started 10.246.165.238 solqa-lab1-
2/lxd/0 started 10.246.165.66 juju-30865a-2-lxd-0 jammy zone3 Container started
Ideally we'd get the output of juju show-unit for the affected units.
Plus a juju dump-db output (after exporting JUJU_DEV_FEATURE_FLAGS=developer-mode).
The above will tell us what the internal state of the juju model is, and we can use that plus logging info to try and see what's going on.