Kubernetes Control Plane unit goes into error state: hook failed: "aws-relation-changed" or "leader-settings-changed"

Bug #1983721 reported by Bas de Bruijne
This bug affects 1 person
Affects: Kubernetes Control Plane Charm
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

In test run https://solutions.qa.canonical.com/testruns/testRun/89e74981-4d6c-4efc-b5d4-45868278f45d (FCE logs: https://oil-jenkins.canonical.com/job/fce_build/4480//console), a Kubernetes control-plane unit goes into an error state:

```
kubernetes-control-plane/0* error idle 7 54.237.76.42 hook failed: "aws-relation-changed"
  calico/1 waiting idle 54.237.76.42 Waiting to retry Calico node configuration
  canonical-livepatch/7 active idle 54.237.76.42 Running kernel 5.13.0-1031.35~20.04.1-aws, patchState: nothing-to-apply (source version/commit dad6199)
  containerd/1 active idle 54.237.76.42 Container runtime available
  filebeat/7 active idle 54.237.76.42 Filebeat ready.
  ntp/7 active idle 54.237.76.42 123/udp chrony: Ready
  telegraf/8 active idle 54.237.76.42 9103/tcp Monitoring kubernetes-control-plane/0
kubernetes-control-plane/1 maintenance executing 8 54.227.155.67 6443/tcp (leader-settings-changed) Restarting snap.kube-scheduler.daemon service
  calico/4 waiting idle 54.227.155.67 Waiting to retry Calico node configuration
  canonical-livepatch/12 active idle 54.227.155.67 Running kernel 5.13.0-1031.35~20.04.1-aws, patchState: nothing-to-apply (source version/commit dad6199)
  containerd/4 active idle 54.227.155.67 Container runtime available
  filebeat/12 active idle 54.227.155.67 Filebeat ready.
  ntp/12 active idle 54.227.155.67 123/udp chrony: Ready
  telegraf/12 active idle 54.227.155.67 9103/tcp Monitoring kubernetes-control-plane/1
```

In the logs we see:
```
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 73, in main
    hookenv._run_atstart()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1348, in _run_atstart
    callback(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/reactive/vault_kv.py", line 46, in manage_app_kv_flags
    app_kv = vault_kv.VaultAppKV()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 33, in __call__
    cls._singleton_instance = super().__call__(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 131, in __init__
    super().__init__()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 41, in __init__
    response = self._client.read(self._path)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 60, in _client
    client.auth_approle(self._config["role_id"], self._config["secret_id"])
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/utils.py", line 201, in new_func
    return method(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/v1/__init__.py", line 1805, in auth_approle
    return self.login(
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/v1/__init__.py", line 1495, in login
    return self._adapter.login(url=url, use_token=use_token, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 197, in login
    response = self.post(url, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 126, in post
    return self.request("post", url, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 364, in request
    response = super(JSONAdapter, self).request(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 330, in request
    utils.raise_for_error(
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/utils.py", line 49, in raise_for_error
    raise exceptions.InternalServerError(
hvac.exceptions.InternalServerError: internal error, on post http://172.31.45.105:8200/v1/auth/approle/login
```

172.31.45.105 is the IP of machine 15, which is the vault unit. It could just be a network interruption, but maybe there is something else behind it.
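
One way to tell a network interruption apart from a problem inside Vault would be to query Vault's `/v1/sys/health` endpoint from the affected unit. hvac raises `InternalServerError` for an HTTP 500 response, which suggests the login request did reach Vault and was answered. The sketch below is only a diagnostic idea, not part of the charm: `describe_health_status` and `probe_vault` are hypothetical helper names, and the status meanings are Vault's documented defaults for that endpoint.

```python
import urllib.error
import urllib.request

# Default meanings of /v1/sys/health status codes, per Vault's API docs.
HEALTH_STATUS = {
    200: "initialized, unsealed, active",
    429: "unsealed, standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health_status(code):
    """Translate a sys/health HTTP status code into a short description."""
    return HEALTH_STATUS.get(code, "unexpected status {}".format(code))


def probe_vault(base_url, timeout=5.0):
    """Hypothetical helper: report Vault's health, or that it is unreachable."""
    url = base_url.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_health_status(resp.status)
    except urllib.error.HTTPError as err:
        # Vault answered, but with a non-2xx code (for example 503 when sealed).
        return describe_health_status(err.code)
    except (urllib.error.URLError, OSError) as err:
        # No HTTP response at all: this is the network-interruption case.
        return "unreachable: {}".format(err)
```

Running `probe_vault("http://172.31.45.105:8200")` on the control-plane unit while the hook is failing would show whether Vault is reachable and, if so, which state it reports.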

Crashdumps etc:
https://oil-jenkins.canonical.com/artifacts/89e74981-4d6c-4efc-b5d4-45868278f45d/index.html

Bas de Bruijne (basdbruijne) wrote :

I'm seeing this again in https://solutions.qa.canonical.com/testruns/testRun/2e420b50-01b5-496f-bb01-0a91dcbcd644, except that the message is `hook failed: "leader-settings-changed"`. In the logs:

```
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charms/reactive/__init__.py", line 73, in main
    hookenv._run_atstart()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/charmhelpers/core/hookenv.py", line 1348, in _run_atstart
    callback(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/reactive/vault_kv.py", line 46, in manage_app_kv_flags
    app_kv = vault_kv.VaultAppKV()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/lib/charms/layer/vault_kv.py", line 33, in __call__
    cls._singleton_instance = super().__call__(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/lib/charms/layer/vault_kv.py", line 131, in __init__
    super().__init__()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/lib/charms/layer/vault_kv.py", line 41, in __init__
    response = self._client.read(self._path)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/charm/lib/charms/layer/vault_kv.py", line 60, in _client
    client.auth_approle(self._config["role_id"], self._config["secret_id"])
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/utils.py", line 201, in new_func
    return method(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/v1/__init__.py", line 1805, in auth_approle
    return self.login(
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/v1/__init__.py", line 1495, in login
    return self._adapter.login(url=url, use_token=use_token, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/adapters.py", line 197, in login
    response = self.post(url, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/adapters.py", line 126, in post
    return self.request("post", url, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/adapters.py", line 364, in request
    response = super(JSONAdapter, self).request(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/adapters.py", line 330, in request
    utils.raise_for_error(
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-1/.venv/lib/python3.10/site-packages/hvac/utils.py", line 49, in raise_for_error
    raise exceptions.InternalServerError(
hvac.exceptions.InternalServerError: internal error, on post http://172.31.35.161:8200/v1/auth/approle/login
```
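
Both tracebacks fail on a single AppRole login attempt inside `_client`, with no retry visible in the call stack. If the HTTP 500 turns out to be transient, a bounded retry with backoff around the login call might keep the hook from failing. This is a minimal sketch under that assumption; `retry_call` is a hypothetical helper, not part of the charm or of hvac.

```python
import time


def retry_call(func, exceptions, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on one of `exceptions`, retry with exponential backoff.

    `sleep` is injectable so the helper can be exercised without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

In the vault-kv layer this could, untested, wrap the failing call as `retry_call(lambda: client.auth_approle(role_id, secret_id), (hvac.exceptions.InternalServerError,))`. Whether retrying is the right fix depends on why Vault returns the 500 in the first place.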

summary:
- Kubernetes Control Plane unit goes into error state: hook failed: "aws-relation-changed"
+ Kubernetes Control Plane unit goes into error state: hook failed: "aws-relation-changed" or "leader-settings-changed"
Bas de Bruijne (basdbruijne) wrote :

I'm seeing the same error with `hook failed: "certificates-relation-changed"` as well.
