Control plane crashloop due to cert regeneration caused by inconsistent SANs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Kubernetes API Load Balancer | New | Undecided | Unassigned |
Kubernetes Control Plane Charm | New | Undecided | Unassigned |
Bug Description
Observed Behaviour
------------------
1. kubernetes-
2. kubernetes-
3. easyrsa units detect client relation changes, triggering certificate revocation, and generate new certificates with different SANs.
It would be great if someone has a workaround for this that we can use.
Probable Root Cause
-------------------
When multiple DNS records exist for the host, Python's socket.getfqdn() returns inconsistent results between calls due to https:/
For example, the following commands were executed consecutively on one of the control plane machines:
root@juju-
10-XXX-
root@juju-
juju-bf6a17-
This causes the SAN list generated for the certificate request to be different every time. https:/
This then triggers a certificate change and the cycle continues.
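The instability described above can be confirmed with a small diagnostic (this snippet is my own sketch, not part of the charm code): sample socket.getfqdn() several times and see whether more than one name comes back.

```python
import socket

# Diagnostic sketch: sample socket.getfqdn() several times. On an affected
# host (multiple DNS records for the address), the set below can contain
# more than one name; on a healthy host it has exactly one entry.
samples = {socket.getfqdn() for _ in range(10)}
if len(samples) > 1:
    print("Inconsistent FQDN results:", sorted(samples))
else:
    print("FQDN stable:", next(iter(samples)))
```

On an affected control plane machine, the set will flip between the internal IP-derived name and the juju machine name, matching the consecutive command output shown above.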
Proposed Fix
------------
A potential fix here could be to replace the call to `socket.getfqdn()` with the patched method from the cpython upstream issue. For convenience, I have put the code into a gist at https:/
Alternatively, you can simply replace the call with the following:
socket.
Ideally the fix can be replicated and/or reused across all the charms that request certs.
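A deterministic replacement could be sketched as follows. This is my own sketch, not the exact upstream patch: the function name `stable_fqdn` and the use of getaddrinfo with AI_CANONNAME (which asks the resolver for a single canonical name instead of walking gethostbyaddr() aliases) are assumptions for illustration.

```python
import socket


def stable_fqdn(name: str = "") -> str:
    """Resolve an FQDN deterministically.

    Unlike socket.getfqdn(), which inspects gethostbyaddr() aliases in an
    order that can vary between calls when multiple DNS records exist,
    getaddrinfo() with AI_CANONNAME returns the resolver's canonical name.
    Falls back to the input name if resolution fails.
    """
    name = name.strip() or socket.gethostname()
    try:
        addrs = socket.getaddrinfo(
            name, None, 0, socket.SOCK_DGRAM, 0, socket.AI_CANONNAME
        )
    except socket.gaierror:
        return name
    for family, socktype, proto, canonname, sockaddr in addrs:
        if canonname:  # set on the first entry when the resolver provides it
            return canonname
    return name
```

Because the canonical name comes from a single resolver answer rather than an alias list, repeated calls produce the same SAN entry and the certificate request stays stable.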
Workaround
----------
You should adjust the command to your specific case.
juju exec --all -- bash -c 'sudo sed -i s/"127.0.0.1 localhost"
Since the hosts file has precedence, this seems to have at least mitigated the issue for now.
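To confirm the workaround took effect, a quick check can be run on each machine. This verification snippet is my addition, not part of the original workaround, and assumes a standard /etc/hosts layout.

```python
import socket
from pathlib import Path

# Verification sketch: after applying the hosts-file workaround, the local
# hostname should appear in /etc/hosts, so name resolution no longer
# round-robins between DNS records.
hostname = socket.gethostname()
hosts = Path("/etc/hosts").read_text()
pinned = any(
    hostname in line.split()
    for line in hosts.splitlines()
    if line.strip() and not line.lstrip().startswith("#")
)
print("hostname pinned in /etc/hosts:", pinned)
```

If this prints False, the sed command did not match on that machine and the entry needs to be added by hand.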
Possibly related bug: https://bugs.launchpad.net/maas/+bug/2012801