Cannot get out of blocked state (Vault failed to start; check journalctl -u vault)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
vault-charm | Confirmed | Undecided | Jorge Niedbalski |
Bug Description
If you reboot vault while MySQL is not yet ready, vault enters a blocked state (as per Bug #1818973, "vault fails to start when MySQL backend down").
However in my scenario it is impossible to get the charm out of this blocked state. This environment is a single vault unit (no HA) with totally-
I tried the following things:
(1) systemctl start vault # works
(2) juju run --unit vault/1 ./hooks/
(3) juju run-action vault/leader resume
(4) juju run-action vault/leader pause; and then resume again;
(4) manually unseal the vault using "VAULT_ADDR=http://
(5) retrying 2-3 after manually unsealing the vault
(6) rebooting the node, when vault starts successfully and then trying 2/3/4/5 again
In all cases the charm never escapes "blocked (Vault failed to start; check journalctl -u vault)" even though vault is in fact started and even unsealed.
The debug log shows the following flags simultaneously set: started, failed.to.start, configured
Looking at the logic around these flags, the only function that clears failed.to.start is start_vault. It only runs @when('configured') @when_not(
However, several functions set failed.to.start without clearing started, such as publish_ca_info, tune_pki_
Additionally, the 'resume' action just invokes charmhelpers resume_unit and doesn't clear the failed.to.start flag or any other flag.
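To make the dead end concrete, here is a small toy model of the flag logic described above. It is not the charm's code. The guard on start_vault is an assumption: the decorator text in this report is cut off, so the sketch guesses that it requires 'configured' to be set and 'started' to be unset, which is what the rest of the description implies. The helper names are hypothetical.

```python
# Toy model of the flag state described in this report; NOT the charm's code.

def start_vault_can_run(flags):
    # ASSUMPTION: start_vault is guarded by "configured set, started not set".
    # The real decorator is truncated in the report, so this is a guess.
    return "configured" in flags and "started" not in flags


def resume_action(flags):
    # Per the report, 'resume' only calls resume_unit and changes no flags.
    return set(flags)


# The combination seen in the debug log.
stuck = {"started", "failed.to.start", "configured"}

# start_vault (the only handler that clears failed.to.start) is skipped
# because 'started' is still set, and 'resume' changes nothing.
print(start_vault_can_run(stuck))
print(start_vault_can_run(resume_action(stuck)))

# Only once 'started' is cleared by hand can start_vault run again.
print(start_vault_can_run(stuck - {"started"}))
```

Under that assumed guard, the first two checks are false and the last is true, which matches the observation that no action or hook gets the unit out of the blocked state.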
This leaves no path within the charm to get out of this situation. Perhaps _assess_status (which checks for failed.to.start and sets blocked) could also check whether vault is actually running: if so, set started and clear failed.to.start; if it's stopped, clear started and set failed.to.start?
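A minimal sketch of that reconciliation idea, written as a pure function so it can be run outside a charm. The function name and the idea of passing flags as a set are hypothetical. In the real charm this would presumably go through charms.reactive's set_flag / clear_flag and a service check such as charmhelpers' service_running, but how it would fit into _assess_status is not shown in this report.

```python
from typing import Callable, Set


def reconcile_start_flags(flags: Set[str],
                          service_running: Callable[[], bool]) -> Set[str]:
    """Return a copy of `flags` where started / failed.to.start match reality.

    `service_running` is a hypothetical callable standing in for a real
    check of the vault systemd service.
    """
    updated = set(flags)
    if service_running():
        # Vault is up: record that and drop the stale failure marker.
        updated.add("started")
        updated.discard("failed.to.start")
    else:
        # Vault is down: make the flags say so, so a start can be retried.
        updated.discard("started")
        updated.add("failed.to.start")
    return updated


stuck = {"started", "failed.to.start", "configured"}
print(sorted(reconcile_start_flags(stuck, lambda: True)))
# ['configured', 'started']
```

With a check like this in the status path, the contradictory combination of started and failed.to.start could not persist across a hook run.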
tags: added: seg
Changed in vault-charm:
status: New → Confirmed
assignee: nobody → Jorge Niedbalski (niedbalski)
As a workaround: the blocked state of the charm is only shown when the failed.to.start flag is set. This flag can be cleared with:
juju run --unit vault/<unit_num> -- charms.reactive clear_flag failed.to.start
Additionally, if the charm needs to restart the vault service (i.e. you haven't started it manually), you can clear the started flag as well:
juju run --unit vault/<unit_num> -- charms.reactive clear_flag started
If the vault service is already running, clearing the started flag will make the charm restart it. The unit will then need to be unsealed again.