"NOT tolerant to any failures" status shouldn't be considered as green

Bug #1997235 reported by Nobuto Murata
Affects: MySQL InnoDB Cluster Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

When one node out of the 3 units fails, the Juju status message changes to "Cluster is NOT tolerant to any failures. 1 member is not active", which is good. However, the workload status itself remains "active", and it shouldn't be considered green. The expected status in this case is "blocked", since it requires human intervention.
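For context on why active/green is a problem: deployment and monitoring tooling commonly gates on the workload status alone, so a degraded cluster that still reads "active" passes such checks. A minimal illustrative check (Python, not part of the charm; it only reads the standard `juju status` JSON output):

import json
import subprocess

# Ask Juju for the machine-readable status of the application.
out = subprocess.run(
    ["juju", "status", "--format=json", "mysql-innodb-cluster"],
    capture_output=True, text=True, check=True,
).stdout

units = json.loads(out)["applications"]["mysql-innodb-cluster"]["units"]
for name, unit in units.items():
    ws = unit["workload-status"]
    # In the degraded state described above, "current" is still "active"
    # even though the message says the cluster cannot tolerate failures.
    print(name, ws["current"], "-", ws.get("message", ""))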

Steps:

1. deploy (in this case, the LXD provider with a single L2 network for simplicity, as the charm behaves differently with L3)

$ juju deploy --series focal mysql-innodb-cluster --channel latest/edge -n3

[status]
Unit Workload Agent Machine Public address Ports Message
mysql-innodb-cluster/0 active idle 0 10.0.9.169 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/1 active idle 1 10.0.9.94 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/2* active idle 2 10.0.9.57 Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.

2. simulate a node failure

juju ssh mysql-innodb-cluster/0 "
    sudo systemctl mask mysql.service
    sudo systemctl kill -s 9 mysql.service

    sudo systemctl stop jujud-machine-0.service
"

[status]

Unit Workload Agent Machine Public address Ports Message
mysql-innodb-cluster/0 unknown lost 0 10.0.9.169 agent lost, see 'juju show-status-log mysql-innodb-cluster/0'
mysql-innodb-cluster/1 active idle 1 10.0.9.94 Unit is ready: Mode: R/O, Cluster is NOT tolerant to any failures. 1 member is not active.
mysql-innodb-cluster/2* active idle 2 10.0.9.57 Unit is ready: Mode: R/W, Cluster is NOT tolerant to any failures. 1 member is not active.

^^^ the status message changed to "NOT tolerant to any failures", but the workload status is still active/green. This should be blocked or something similar.
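For reference, the degradation the charm is reporting can be confirmed directly with MySQL Shell's AdminAPI on a surviving unit. A sketch for `mysqlsh --py`; the address and account below are illustrative placeholders:

# Connect to a surviving member; credentials and address are placeholders.
shell.connect("clusteradmin@10.0.9.94:3306")
cluster = dba.get_cluster()
status = cluster.status()

# The cluster-level state (e.g. "OK_NO_TOLERANCE") and its text are what
# the charm surfaces in the unit's status message.
print(status["defaultReplicaSet"]["status"])
print(status["defaultReplicaSet"]["statusText"])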

3. remove the failed machine

$ juju remove-machine --force 0

[status]
Unit Workload Agent Machine Public address Ports Message
mysql-innodb-cluster/1 waiting idle 1 10.0.9.94 'cluster' incomplete
mysql-innodb-cluster/2* active idle 2 10.0.9.57 Unit is ready: Mode: R/W, Cluster is NOT tolerant to any failures. 1 member is not active.

^^^ "'cluster' incomplete" is also a weird status. In this step, all units should be blocked or something.

4. re-add a unit

$ juju add-unit mysql-innodb-cluster

Unit Workload Agent Machine Public address Ports Message
mysql-innodb-cluster/1 active idle 1 10.0.9.94 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/2* active idle 2 10.0.9.57 Unit is ready: Mode: R/W, Cluster is ONLINE and can tolerate up to ONE failure.
mysql-innodb-cluster/3 active idle 3 10.0.9.143 Unit is ready: Mode: R/O, Cluster is ONLINE and can tolerate up to ONE failure.

^^^ all green as expected.

Changed in charm-mysql-innodb-cluster:
status: New → Triaged
importance: Undecided → Wishlist
tags: added: good-first-bug
Alex Kavanagh (ajkavanagh) wrote:

I don't disagree with the sentiment in the bug report. The active, blocked, error, and unknown statuses are currently viewed (by the charm) from the perspective of "whether that instance is working correctly". A 'cluster-level' status doesn't really exist, as it's a meta-state of the complete cluster.

We'd need to change the meaning of the unit status *when clustered* to mean something subtly different: a compound of the individual unit status and the overall cluster health, which is what the status message is trying to indicate - i.e. the charm is aware of the cluster status.

Ideally, there would be a 'degraded' status (or similar) that could be set to indicate a cluster-level status, AND the ability of the units to indicate their individual statuses. Sadly, this isn't currently available.

I'd probably be okay with changing the status to blocked in the event that the cluster is degraded, but I'm interested in other viewpoints.
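As a rough sketch of that compound mapping (illustrative Python, not the charm's actual code; the function and its arguments are hypothetical):

def compound_workload_status(unit_ok, cluster_online, tolerates_failure, message):
    # Combine per-unit health with cluster-level health (illustrative only).
    if not unit_ok:
        return ("blocked", "MySQL instance is not running")
    if not cluster_online:
        return ("blocked", message)
    if not tolerates_failure:
        # Juju has no 'degraded' workload status today, so fall back to
        # 'blocked' to make a degraded cluster visible at a glance.
        return ("blocked", message)
    return ("active", message)

With a mapping like this, step 2 in the report would show blocked units instead of active ones.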
