ocn rev 105 Unable to authorize approle after unseal
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
vault-charm | Fix Released | Critical | David Ames | | |
Bug Description
https:/
After initializing all 3 vault units, the Vault units go into a bad state: they are unable to authorize approle and unable to generate the authorization token.
2020-07-30 08:24:37 DEBUG juju-log Could not retrieve app_role_id
2020-07-30 08:24:37 DEBUG jujuc server.go:211 running hook tool "juju-log"
2020-07-30 08:24:37 WARNING juju-log InternalServerE
2020-07-30 08:24:37 DEBUG jujuc server.go:211 running hook tool "juju-log"
2020-07-30 08:24:37 ERROR juju-log Traceback (most recent call last):
File "/var/lib/
vault.
File "/var/lib/
return self.call(f, *args, **kw)
File "/var/lib/
do = self.iter(
File "/var/lib/
raise retry_exc.reraise()
File "/var/lib/
raise self.last_
File "/usr/lib/
return self.__get_result()
File "/usr/lib/
raise self._exception
File "/var/lib/
result = fn(*args, **kwargs)
File "/var/lib/
raise VaultNotReady(
lib.charm.
2020-07-30 08:24:37 DEBUG jujuc server.go:211 running hook tool "status-set"
2020-07-30 08:24:37 DEBUG jujuc server.go:211 running hook tool "relation-set"
summary: |
- ocn rev 105 Unable to athorize approle after unseal + ocn rev 105 Unable to authorize approle after unseal |
tags: | added: cdo-qa cdoqa-release-blocker foundations-engine |
Changed in vault-charm: | |
importance: | Undecided → Critical |
assignee: | nobody → Alex Kavanagh (ajkavanagh) |
tags: |
added: cdo-release-blocker removed: cdoqa-release-blocker |
Changed in vault-charm: | |
status: | Fix Committed → Fix Released |
The following introduced a gate health check, client_approle_authorized, to handle database topology changes (rolling restarts, pause/resumes, etc). It checks that the local charm can authorize itself. A tenacity retry is also added.
https://review.opendev.org/#/c/740086/
https://review.opendev.org/#/c/739129/
These changes have caused delays during long update-status hook executions.
As it turns out, SQA uses a 5 minute TTL on their token create:
juju run -u vault/leader 'export VAULT_TOKEN=<token> && export VAULT_ADDR=http://127.0.0.1:8200 && /snap/bin/vault token create --ttl=5m'
If an update-status hook is running ahead of the above juju run, the juju run times out.
If update-status runs after the juju run but before the authorize-charm action is run, we get the following "permission denied" error because the token TTL has been exceeded.
2020-08-02 06:18:05 ERROR juju-log Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-0/charm/actions/authorize-charm", line 185, in main
    action(args)
  File "/var/lib/juju/agents/unit-vault-0/charm/actions/authorize-charm", line 45, in authorize_charm_action
    role_id = vault.setup_charm_vault_access(action_config['token'])
  File "/var/lib/juju/agents/unit-vault-0/charm/lib/charm/vault.py", line 213, in setup_charm_vault_access
    enable_approle_auth(client)
  File "/var/lib/juju/agents/unit-vault-0/charm/lib/charm/vault.py", line 178, in enable_approle_auth
    if 'approle/' not in client.list_auth_backends():
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/hvac/v1/__init__.py", line 1738, in list_auth_backends
    return self._adapter.get('/v1/sys/auth').json()
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 90, in get
    return self.request('get', url, **kwargs)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 233, in request
    utils.raise_for_error(response.status_code, text, errors=errors)
  File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.6/site-packages/hvac/utils.py", line 33, in raise_for_error
    raise exceptions.Forbidden(message, errors=errors)
hvac.exceptions.Forbidden: permission denied
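The timing behind this failure can be illustrated with a small sketch. It is not charm code: the function name, the timestamp and the hook durations are all hypothetical, and it assumes that hooks queued ahead of the action run back to back from the moment the token is created. Only the 5 minute TTL comes from the report.

```python
from datetime import datetime, timedelta


def token_valid_at_action(created, ttl, hook_durations):
    """Return True if the token is still valid when the action starts.

    Assumes the hooks queued ahead of the action run back to back,
    beginning at the moment the token is created (a simplification).
    """
    action_start = created + sum(hook_durations, timedelta())
    return action_start < created + ttl


created = datetime(2020, 8, 2, 6, 12, 0)  # illustrative timestamp only
ttl = timedelta(minutes=5)                # matches --ttl=5m in the report

# A short update-status leaves time for authorize-charm to use the token.
fast = token_valid_at_action(created, ttl, [timedelta(seconds=20)])

# An update-status stretched by retries (hypothetical 6 minutes) does not.
slow = token_valid_at_action(created, ttl, [timedelta(minutes=6)])
```

With these illustrative numbers, `fast` is True and `slow` is False: once a hook ahead of the action runs longer than the TTL, the token has already expired by the time authorize-charm uses it.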
Root cause:
The gate, client_approle_authorized, which checks for app role authorization, is called before the charm has been authorized. This causes tenacity retries and long update-status hook executions, ultimately exceeding the 5 minute token TTL.
TRIAGE:
Add a check in client_approle_authorized for the leader setting local-charm-access-id, which is set during the authorize-charm action.
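A minimal sketch of the proposed check follows, under stated assumptions. It is not the charm's actual implementation: `leader_get` and `try_approle_login` are hypothetical injected callables, the `VaultNotReady` class is a local stand-in for the charm's exception of that name, and a plain loop stands in for the tenacity retry. It shows the gate returning early when local-charm-access-id is unset, so that no retries run before authorize-charm has been executed.

```python
import time


class VaultNotReady(Exception):
    """Local stand-in for the charm's exception of the same name."""


def client_approle_authorized(leader_get, try_approle_login,
                              attempts=3, wait_seconds=0.0):
    """Return True if the charm can authenticate via AppRole.

    leader_get and try_approle_login are hypothetical callables used
    for illustration; the real charm has its own helpers and uses a
    tenacity retry rather than this loop.
    """
    # Short-circuit: the authorize-charm action has not run yet, so
    # retrying the login would only lengthen the hook.
    if not leader_get("local-charm-access-id"):
        return False
    for attempt in range(attempts):
        try:
            try_approle_login()
            return True
        except VaultNotReady:
            # Wait between attempts, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(wait_seconds)
    return False


login_calls = []


def failing_login():
    """Record the attempt, then fail as an unauthorized charm would."""
    login_calls.append(1)
    raise VaultNotReady("approle not authorized")


# Leader setting is unset: the gate returns without attempting a login.
before_action = client_approle_authorized(lambda key: None, failing_login)
```

In this sketch the unauthorized case costs a single leader-settings lookup instead of a full retry cycle, which is the effect the triage note is after.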