2021-12-03 03:51:45
Trent Lloyd
description
It would be helpful to have a charm action that re-configures the local MySQL installation from scratch, primarily re-creating the clusteruser (which the add-instance action needs in order to work). Without it, recovery is tedious: it requires manually clearing flags, setting the root password to the value hidden in the charm's leader settings database, and re-creating the clusteruser with the correct password and IP access by hand.
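For illustration, a prototype invocation might look like the following (the reinitialize action name and its parameter are hypothetical; no such action exists in the charm today):
juju run-action --wait mysql-innodb-cluster/4 reinitialize i-really-mean-it=true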
We hit a production situation where a unit was broken; the decision was to remove the unit and add another one. Unfortunately the new unit installed a newer MySQL (8.0.27) than the other nodes were running (8.0.25), so it failed to join the cluster with the old units. And because we only had 2/3 units, we could not upgrade the others without disruption either, as we would lose quorum.
The solution chosen was to downgrade the package on the new node, but then the data directory is not compatible and MySQL won't start, so we needed to remove the data directory and re-initialize MySQL. From this point there is no clean way to have the charm re-configure MySQL from scratch again. Although there is an add-instance action to add the unit back to the cluster, it requires that the clusteruser already be configured.
A charm action to purge and rebuild a unit would have solved the original problem as well as the one we created, and would make it easy to get the unit back into a working state. Here are the steps I used to prototype what such an action would look like.
(1) [new unit] Stop MySQL and move data directory
systemctl stop mysql
mv /var/lib/mysql /var/lib/mysql.old # this is so that the dpkg script will re-create the data directory
(2) [juju jumphost] Remove member from cluster
juju run-action --wait mysql-innodb-cluster/leader remove-instance address=1.2.3.4 force=true
force=true is required because otherwise the cluster member has to be joined and healthy before it can be removed.
(3) [mysql-innodb-cluster/4] Downgrade package and/or re-initialize data directory
Note that the initial root password for MySQL (which the charm needs to be correct so it can then connect and create the clusteruser) is set by the postinst dpkg script, using a password the charm pre-seeds via debconf-set-selections. We need to ensure this is still in place: if you purge the apt package, for example, this password is lost and has to be set again.
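As a sanity check before reinstalling, you can inspect or re-seed the pre-seeded values. This assumes the standard Debian template names mysql-server/root_password and mysql-server/root_password_again owned by mysql-server-8.0; <charm-root-pw> is a placeholder for the password from the charm's leader settings:
debconf-get-selections | grep mysql-server   # requires the debconf-utils package
printf 'mysql-server-8.0 mysql-server/root_password password <charm-root-pw>\nmysql-server-8.0 mysql-server/root_password_again password <charm-root-pw>\n' | debconf-set-selections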
In this case, to re-create the data directory we simply rely on the dpkg postinst script, which runs because /var/lib/mysql was removed and the script is triggered during the downgrade. If you are trying to re-initialize in place, you may be able to use "dpkg-reconfigure mysql-server-8.0" to trigger the script (I didn't test that).
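For reference, the downgrade itself can be done with an apt version pin; the exact 8.0.25 version string below is illustrative and depends on the series and archive in use:
apt-get install --allow-downgrades mysql-server-8.0=8.0.25-0ubuntu0.20.04.1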
If you run mysqld --initialize on your own, the root password is not set; instead a random password is written to the error log, which you then need to log in with and change to the password the charm expects. There is also an --initialize-insecure mode that lets you connect with no password, which is what the dpkg script uses.
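If you did initialize manually, the generated password can be recovered from the error log and reset to the value the charm expects (<charm-root-pw> is a placeholder; --connect-expired-password is needed because the temporary password is marked expired):
grep 'temporary password' /var/log/mysql/error.log
mysql -u root -p --connect-expired-password -e "ALTER USER 'root'@'localhost' IDENTIFIED BY '<charm-root-pw>';"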
Extra note: the mysql-server packaging also has code to detect a downgrade and "freeze" the package (prevent it from starting, etc.) until after manual intervention. But it appears to be broken currently, so this didn't happen for me: it relies on /var/lib/mysql/debian_flags, which didn't seem to exist with the correct contents either in the original data directory or, obviously, after removing the data directory. If that code is fixed, you would also have to remove /etc/mysql/FROZEN in the specific case where you downgraded before moving the data directory out of the way.
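A minimal guard for that case might be:
[ -f /etc/mysql/FROZEN ] && rm /etc/mysql/FROZEN   # only needed if the downgrade-detection code actually fired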
(4) [juju jumphost] Clear flags to force charm to re-create cluster users
Once we have a new working data directory and MySQL starts in standalone mode with the root password the charm expects, we can manually clear these two flags and trigger update-status so that the user creation runs again.
juju run --unit mysql-innodb-cluster/4 -- charms.reactive clear_flag local.cluster.user-created
juju run --unit mysql-innodb-cluster/4 -- charms.reactive clear_flag local.cluster.all-users-created
juju run --unit mysql-innodb-cluster/4 -- ./hooks/update-status
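Before re-adding the instance, it is worth confirming on the affected unit that the charm actually re-created the cluster user (the exact username follows the charm's clusteruser naming; the root password is the one seeded by the charm):
mysql -u root -p -e "SELECT user, host FROM mysql.user WHERE user = 'clusteruser';"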
(5) [juju jumphost] Re-add instance to cluster
juju run-action --wait mysql-innodb-cluster/leader add-instance address=1.2.3.4
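Afterwards, assuming the charm's cluster-status action is available, the cluster state can be verified from the leader:
juju run-action --wait mysql-innodb-cluster/leader cluster-status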