Cluster outage - Vault Certificate Expired

Bug #1830937 reported by Aaron Jennings
Affects: vault-charm
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

We experienced a full cluster outage that appears to be caused by expired certificates.

The symptoms encountered were:
- kubectl commands returned: Unable to connect to the server: x509: certificate has expired or is not yet valid
- all etcd nodes reported: Errored with 0 known peers
- certs in /root/cdk on the master and workers had expired; they had been issued with a lifespan of 30 days
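
For anyone wanting to check for the third symptom ahead of time, here is a minimal Python sketch that reports how many days each certificate has left. It assumes the `openssl` CLI is installed on the unit and that the certificates under /root/cdk are PEM files ending in `.crt`; that file pattern and the function names are illustrative assumptions, not something taken from the charm.

```python
import ssl
import subprocess
from pathlib import Path


def parse_not_after(openssl_line: str) -> float:
    """Convert a line such as 'notAfter=Jan  1 00:00:00 2020 GMT'
    (the output of `openssl x509 -enddate -noout`) to a Unix timestamp."""
    prefix = "notAfter="
    line = openssl_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {line!r}")
    return float(ssl.cert_time_to_seconds(line[len(prefix):]))


def days_remaining(not_after: float, now: float) -> float:
    """Days until expiry; negative means the certificate has expired."""
    return (not_after - now) / 86400.0


def cert_not_after(path: Path) -> float:
    """Ask the openssl CLI for the expiry time of one PEM certificate."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", str(path)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_not_after(result.stdout)


def report(cert_dir: str, now: float) -> None:
    """Print one line per certificate found in cert_dir.

    The '*.crt' pattern is an assumption about how the files are named.
    """
    for path in sorted(Path(cert_dir).glob("*.crt")):
        left = days_remaining(cert_not_after(path), now)
        state = "EXPIRED" if left < 0 else "ok"
        print(f"{path}: {left:.1f} days left ({state})")


# Example call on a master or worker unit (requires root to read /root/cdk):
#   import time; report("/root/cdk", time.time())
```

Running something like this from cron on each unit would give a warning well before a 30-day certificate lapses, though it only detects the problem and does not renew anything.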

The behavior I saw was identical to the issue described here:
https://github.com/charmed-kubernetes/bundle/issues/723

We are using the standard charmed-kubernetes bundle with the vault overlay:

cs:~containers/etcd-397
cs:~containers/flannel-386
cs:~containers/kubeapi-load-balancer-583
cs:~containers/kubernetes-master-604
cs:~containers/kubernetes-worker-472
cs:percona-cluster
cs:~openstack-charmers-next/vault-41

I had been under the impression that the commit for this ticket would have prevented this issue:
https://bugs.launchpad.net/vault-charm/+bug/1788945

Since we were using vault-41, we thought we had the fix.

Could someone clarify whether this is a known issue and whether it has been fixed? I also saw this ticket: https://bugs.launchpad.net/vault-charm/+bug/1813180, which I thought might be part of the issue, but I wasn't sure.

We are now using cs:~openstack-charmers-next/vault-50 and things are fine, but it's unclear to me whether the certs will be renewed or whether we need to take some action on the cluster before they expire.

Chris MacNaughton (chris.macnaughton) wrote :

The revision numbers of cs:vault-41 and cs:~openstack-charmers-next/vault-41 are not the same; the two track independent lists of versions. Can you confirm which version of the charm you were using, giving the fully qualified version?
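
To make the distinction concrete, here is a rough Python sketch that splits a charm store URL into owner, name, and revision. It assumes the simple `cs:[~owner/]name[-revision]` shape seen in this report and deliberately ignores any series component; the `CharmRef` type and `parse_charm_url` function are hypothetical names for illustration, not part of any Juju tooling.

```python
import re
from typing import NamedTuple, Optional


class CharmRef(NamedTuple):
    owner: Optional[str]     # None means the promulgated (un-namespaced) charm
    name: str
    revision: Optional[int]  # None means no revision was pinned


# Assumed shape: cs:[~owner/]name[-revision]; series segments are not handled.
_CHARM_URL = re.compile(
    r"^cs:(?:~(?P<owner>[^/]+)/)?(?P<name>[a-z][a-z0-9-]*?)(?:-(?P<rev>\d+))?$"
)


def parse_charm_url(url: str) -> CharmRef:
    """Split a charm store URL into its owner, name, and revision parts."""
    match = _CHARM_URL.match(url)
    if match is None:
        raise ValueError(f"not a recognised charm store URL: {url!r}")
    rev = match.group("rev")
    return CharmRef(
        owner=match.group("owner"),
        name=match.group("name"),
        revision=int(rev) if rev is not None else None,
    )


promulgated = parse_charm_url("cs:vault-41")
namespaced = parse_charm_url("cs:~openstack-charmers-next/vault-41")

# Same name and revision number, but different owners, so these refer to
# different charm builds.
print(promulgated)
print(namespaced)
print(promulgated == namespaced)
```

The two references share a name and a revision number but differ in owner, which is why the revision number alone does not identify a charm build.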

Changed in vault-charm:
status: New → Incomplete
Chris MacNaughton (chris.macnaughton) wrote :

Looking at the linked bug and issue, neither of them actually addresses certificate renewal, unless I'm misreading something?

Aaron Jennings (extulsan) wrote :

Thank you very much for the response. Sorry for the confusion in the ticket. When we experienced the issue with expired certificates, we were using this version of the vault charm: cs:~openstack-charmers-next/vault-41

I've probably misunderstood the details of the issues I linked above. They are both issues that we've run into, and that we initially tried to document here: https://github.com/charmed-kubernetes/bundle/issues/719.

The comment here is what made me think these issues were related: https://github.com/charmed-kubernetes/bundle/issues/723#issuecomment-457293020, but I admit it's not at all clear to me.
