"chain" and "ca" values sometimes not shared due to incorrect return value of is_unit_paused_set()
Affects: vault-charm
Status: Confirmed
Importance: Undecided
Assigned to: dongdong tao
Milestone: (none)
Bug Description
This may be a shared issue between both vault and charmhelpers.
I have a 3-unit vault cluster in use by an OpenStack cloud. Certificates have recently been updated, but on at least one of the relations we're seeing that the "chain" and "ca" values aren't being provided, and thus the client (octavia) isn't able to set up its certs appropriately.
I traced this to reactive/
That branch depends directly on the return value of is_unit_paused_set().
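A minimal sketch of the pattern described above (not the charm's actual code; the function bodies, flag names, and relation keys are illustrative): a reactive handler that skips publishing CA data whenever is_unit_paused_set() reports True, which is how a stale 'unit-paused' record silently withholds the "chain" and "ca" values.

```python
# Illustrative sketch only -- simplified stand-ins for the charmhelpers
# helper and the vault charm's publishing branch.
def is_unit_paused_set(kv_store):
    # charmhelpers answers this by reading the 'unit-paused' key from
    # its state DB; a leftover 'true' makes the unit look paused.
    return bool(kv_store.get("unit-paused"))

def publish_ca_info(kv_store, relation):
    if is_unit_paused_set(kv_store):
        return  # a "paused" unit publishes nothing, so clients never see ca/chain
    relation["ca"] = "<ca-cert>"          # placeholder values
    relation["chain"] = "<intermediate-chain>"

state = {"unit-paused": "true"}  # stale record left over from an earlier pause
rel = {}
publish_ca_info(state, rel)
print(rel)  # {} -- nothing is published while the stale flag persists
```

Clearing the stale key makes the same handler publish normally, which matches the behaviour seen on the affected unit.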
"juju status" shows all 3 vault units as active and running:
$ juju status vault | grep ^vault/
vault/0 active idle 15 <REDACTED> 8200/tcp Unit is ready (active: true, mlock: enabled)
vault/1 active idle 16 <REDACTED> 8200/tcp Unit is ready (active: false, mlock: enabled)
vault/2* active idle 17 <REDACTED> 8200/tcp Unit is ready (active: false, mlock: enabled)
However, vault/1 and vault/2 both have the unit-paused flag set in the state DB. Example:
ubuntu@vault-2:~$ sudo su -
root@vault-2:~# cd /var/lib/
root@vault-
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pprint
>>> import sqlite3
>>> conn = sqlite3.
>>> pprint.
[('unit-paused', 'true')]
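The truncated session above can be reproduced with a short Python sketch. The table name (kv) and the row shape come from the report itself; the on-disk location of the charm's state DB is not shown here, so this sketch simulates it with an in-memory SQLite database instead.

```python
import sqlite3

# Simulate the charmhelpers key/value store: a SQLite table named "kv".
# Schema and row contents are assumptions modelled on the report's output.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO kv VALUES ('unit-paused', 'true')")

# Query for the stale pause record, as the reporter did on vault/2.
rows = conn.execute(
    "SELECT key, data FROM kv WHERE key='unit-paused'"
).fetchall()
print(rows)  # [('unit-paused', 'true')]
```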
My instinct was that pausing and resuming the vault units might resolve this. However, in this particular situation it does not: we end up hitting the publish_ca_info reactive hook after the pause logic has completed, on the embedded call to charms.
Changed in vault-charm:
assignee: nobody → dongdong tao (taodd)
I suspect, based upon reviewing the charmhelpers pause_unit implementation, that if the services are in fact running normally, we can work around this manually by clearing the incorrect unit-paused record with the SQL command "DELETE FROM kv WHERE key='unit-paused'".
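The DELETE statement is quoted from the report; the sketch below simulates applying it, again against an in-memory copy of the kv table, since the real state DB path is not shown. On a real unit you would first confirm the vault services are healthy, then run the same statement against the charm's state database.

```python
import sqlite3

# Recreate a kv table containing the stale pause record (schema assumed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, data TEXT)")
conn.execute("INSERT INTO kv VALUES ('unit-paused', 'true')")

# The workaround quoted in the report: drop the incorrect record.
conn.execute("DELETE FROM kv WHERE key='unit-paused'")
conn.commit()

remaining = conn.execute(
    "SELECT * FROM kv WHERE key='unit-paused'"
).fetchall()
print(remaining)  # [] -- the unit no longer appears paused
```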