2023-06-04 16:14:09 |
Arun Neelicattu |
bug |
|
|
added bug |
2023-06-04 16:17:19 |
Arun Neelicattu |
bug task added |
|
charm-kubeapi-load-balancer |
|
2023-06-04 16:19:03 |
Arun Neelicattu |
description |
Observed Behaviour
------------------
1. kubernetes-control-plane units never get to active/idle state.
2. kubernetes-control-plane units continuously react to client relation / config change events, causing restarts.
3. easyrsa units detect client relation changes, triggering certificate revocation, and generate new certificates with different SANs.
It would be great if someone has a workaround for this that we can use.
Probably Root Cause
-------------------
When multiple DNS records exist for the host, Python's socket.getfqdn() returns inconsistent results between calls due to https://github.com/python/cpython/issues/49254.
For example, the following commands were executed consecutively on one of the control plane machines.
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
10-XXX-XXX-XXX.example.net
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
juju-bf6a17-0.example.net
This causes the SAN list generated for the certificate request to differ on each run; see https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/7258630cf0a5560a665ed1e4770cc8f2f52013c4/reactive/kubernetes_control_plane.py#L1570C7-L1588
This then triggers a certificate change and the cycle continues.
Proposed Fix
------------
A potential fix could be to replace the call to `socket.getfqdn()` with the patched method from the upstream cpython issue. For convenience, I have put the code into a gist at https://gist.github.com/abn/c4165a6d288e5f7137bdec5a4db199d1.
Alternatively, you can simply replace the call with the following:
socket.getaddrinfo(socket.gethostname(), None, 0, socket.SOCK_DGRAM, 0, socket.AI_CANONNAME)
Ideally the fix can be replicated and/or reused across all the charms that request certs. |
Observed Behaviour
------------------
1. kubernetes-control-plane units never get to active/idle state.
2. kubernetes-control-plane units continuously react to client relation / config change events, causing restarts.
3. easyrsa units detect client relation changes, triggering certificate revocation, and generate new certificates with different SANs.
It would be great if someone has a workaround for this that we can use.
Probable Root Cause
-------------------
When multiple DNS records exist for the host, Python's socket.getfqdn() returns inconsistent results between calls due to https://github.com/python/cpython/issues/49254.
For example, the following commands were executed consecutively on one of the control plane machines.
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
10-XXX-XXX-XXX.example.net
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
juju-bf6a17-0.example.net
This causes the SAN list generated for the certificate request to differ on each run; see https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/7258630cf0a5560a665ed1e4770cc8f2f52013c4/reactive/kubernetes_control_plane.py#L1570C7-L1588
This then triggers a certificate change and the cycle continues.
Proposed Fix
------------
A potential fix could be to replace the call to `socket.getfqdn()` with the patched method from the upstream cpython issue. For convenience, I have put the code into a gist at https://gist.github.com/abn/c4165a6d288e5f7137bdec5a4db199d1.
Alternatively, you can simply replace the call with the following:
socket.getaddrinfo(socket.gethostname(), None, 0, socket.SOCK_DGRAM, 0, socket.AI_CANONNAME)
Ideally the fix can be replicated and/or reused across all the charms that request certs. |
|
2023-06-04 19:15:52 |
Arun Neelicattu |
description |
Observed Behaviour
------------------
1. kubernetes-control-plane units never get to active/idle state.
2. kubernetes-control-plane units continuously react to client relation / config change events, causing restarts.
3. easyrsa units detect client relation changes, triggering certificate revocation, and generate new certificates with different SANs.
It would be great if someone has a workaround for this that we can use.
Probable Root Cause
-------------------
When multiple DNS records exist for the host, Python's socket.getfqdn() returns inconsistent results between calls due to https://github.com/python/cpython/issues/49254.
For example, the following commands were executed consecutively on one of the control plane machines.
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
10-XXX-XXX-XXX.example.net
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
juju-bf6a17-0.example.net
This causes the SAN list generated for the certificate request to differ on each run; see https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/7258630cf0a5560a665ed1e4770cc8f2f52013c4/reactive/kubernetes_control_plane.py#L1570C7-L1588
This then triggers a certificate change and the cycle continues.
Proposed Fix
------------
A potential fix could be to replace the call to `socket.getfqdn()` with the patched method from the upstream cpython issue. For convenience, I have put the code into a gist at https://gist.github.com/abn/c4165a6d288e5f7137bdec5a4db199d1.
Alternatively, you can simply replace the call with the following:
socket.getaddrinfo(socket.gethostname(), None, 0, socket.SOCK_DGRAM, 0, socket.AI_CANONNAME)
Ideally the fix can be replicated and/or reused across all the charms that request certs. |
Observed Behaviour
------------------
1. kubernetes-control-plane units never get to active/idle state.
2. kubernetes-control-plane units continuously react to client relation / config change events, causing restarts.
3. easyrsa units detect client relation changes, triggering certificate revocation, and generate new certificates with different SANs.
It would be great if someone has a workaround for this that we can use.
Probable Root Cause
-------------------
When multiple DNS records exist for the host, Python's socket.getfqdn() returns inconsistent results between calls due to https://github.com/python/cpython/issues/49254.
For example, the following commands were executed consecutively on one of the control plane machines.
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
10-XXX-XXX-XXX.example.net
root@juju-bf6a17-0:/# python3 -c 'import socket; print(socket.getfqdn())'
juju-bf6a17-0.example.net
This causes the SAN list generated for the certificate request to differ on each run; see https://github.com/charmed-kubernetes/charm-kubernetes-control-plane/blob/7258630cf0a5560a665ed1e4770cc8f2f52013c4/reactive/kubernetes_control_plane.py#L1570C7-L1588
This then triggers a certificate change and the cycle continues.
Proposed Fix
------------
A potential fix could be to replace the call to `socket.getfqdn()` with the patched method from the upstream cpython issue. For convenience, I have put the code into a gist at https://gist.github.com/abn/c4165a6d288e5f7137bdec5a4db199d1.
Alternatively, you can simply replace the call with the following:
socket.getaddrinfo(socket.gethostname(), None, 0, socket.SOCK_DGRAM, 0, socket.AI_CANONNAME)
Ideally the fix can be replicated and/or reused across all the charms that request certs.
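For illustration, the replacement call above could be wrapped into a drop-in helper. This is a sketch modeled on the patch from the upstream cpython issue; the function name and the fallback behaviour are assumptions, not the charm's actual code.

```python
import socket


def stable_getfqdn(name: str = "") -> str:
    """Return a canonical FQDN deterministically via getaddrinfo().

    Sketch of a replacement for socket.getfqdn(): the canonical name
    is taken from the AI_CANONNAME hint instead of reverse lookups,
    which avoids the alias-ordering instability described above.
    """
    name = name.strip() or socket.gethostname()
    try:
        addrinfo = socket.getaddrinfo(
            name, None, 0, socket.SOCK_DGRAM, 0, socket.AI_CANONNAME
        )
    except socket.gaierror:
        # Fall back to the bare hostname if resolution fails entirely.
        return name
    for _family, _type, _proto, canonname, _sockaddr in addrinfo:
        if canonname:
            return canonname
    return name
```

Taking the first non-empty canonical name should keep the generated SAN list stable across consecutive calls on the same host.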
Workaround
----------
Adjust the command to your specific case.
juju exec --all -- bash -c 'sudo sed -i s/"127.0.0.1 localhost"/"127.0.0.1 $(hostname -f) localhost"/ /etc/hosts'
Since the hosts file takes precedence, this seems to have at least mitigated the issue for now. |
|
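The workaround above can be sanity-checked directly on a unit: after the /etc/hosts change, repeated getfqdn() calls should all agree. A minimal check, assuming the unit's resolver is otherwise healthy:

```python
import socket

# Collect several consecutive getfqdn() results; after the /etc/hosts
# mitigation they should all be identical.
names = {socket.getfqdn() for _ in range(5)}
print(names)
assert len(names) == 1, f"FQDN still inconsistent: {names}"
```

If the set contains more than one name, the host still has competing DNS records winning over the hosts-file entry.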