Comment 0 for bug 1983721

Bas de Bruijne (basdbruijne) wrote : Kubernetes Control Plane unit goes into error state: hook failed: "aws-relation-changed"

In test run https://solutions.qa.canonical.com/testruns/testRun/89e74981-4d6c-4efc-b5d4-45868278f45d, with FCE logs at https://oil-jenkins.canonical.com/job/fce_build/4480//console, we see a kubernetes-control-plane unit going into an error state:

```
kubernetes-control-plane/0* error idle 7 54.237.76.42 hook failed: "aws-relation-changed"
  calico/1 waiting idle 54.237.76.42 Waiting to retry Calico node configuration
  canonical-livepatch/7 active idle 54.237.76.42 Running kernel 5.13.0-1031.35~20.04.1-aws, patchState: nothing-to-apply (source version/commit dad6199)
  containerd/1 active idle 54.237.76.42 Container runtime available
  filebeat/7 active idle 54.237.76.42 Filebeat ready.
  ntp/7 active idle 54.237.76.42 123/udp chrony: Ready
  telegraf/8 active idle 54.237.76.42 9103/tcp Monitoring kubernetes-control-plane/0
kubernetes-control-plane/1 maintenance executing 8 54.227.155.67 6443/tcp (leader-settings-changed) Restarting snap.kube-scheduler.daemon service
  calico/4 waiting idle 54.227.155.67 Waiting to retry Calico node configuration
  canonical-livepatch/12 active idle 54.227.155.67 Running kernel 5.13.0-1031.35~20.04.1-aws, patchState: nothing-to-apply (source version/commit dad6199)
  containerd/4 active idle 54.227.155.67 Container runtime available
  filebeat/12 active idle 54.227.155.67 Filebeat ready.
  ntp/12 active idle 54.227.155.67 123/udp chrony: Ready
  telegraf/12 active idle 54.227.155.67 9103/tcp Monitoring kubernetes-control-plane/1
```

In the logs we see:
```
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 73, in main
    hookenv._run_atstart()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1348, in _run_atstart
    callback(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/reactive/vault_kv.py", line 46, in manage_app_kv_flags
    app_kv = vault_kv.VaultAppKV()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 33, in __call__
    cls._singleton_instance = super().__call__(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 131, in __init__
    super().__init__()
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 41, in __init__
    response = self._client.read(self._path)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/charm/lib/charms/layer/vault_kv.py", line 60, in _client
    client.auth_approle(self._config["role_id"], self._config["secret_id"])
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/utils.py", line 201, in new_func
    return method(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/v1/__init__.py", line 1805, in auth_approle
    return self.login(
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/v1/__init__.py", line 1495, in login
    return self._adapter.login(url=url, use_token=use_token, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 197, in login
    response = self.post(url, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 126, in post
    return self.request("post", url, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 364, in request
    response = super(JSONAdapter, self).request(*args, **kwargs)
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/adapters.py", line 330, in request
    utils.raise_for_error(
  File "/var/lib/juju/agents/unit-kubernetes-control-plane-0/.venv/lib/python3.8/site-packages/hvac/utils.py", line 49, in raise_for_error
    raise exceptions.InternalServerError(
hvac.exceptions.InternalServerError: internal error, on post http://172.31.45.105:8200/v1/auth/approle/login
```

172.31.45.105 is the IP of machine 15, which hosts the vault unit. It could just be a network interruption, but maybe there is something else behind it.
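
If this does turn out to be a transient failure, retrying the approle login a few times before failing the hook might be enough to ride it out. A minimal sketch of that idea, not the charm's actual code: `retry_with_backoff`, `TransientError` and `flaky_login` are hypothetical names, and `TransientError` only stands in for `hvac.exceptions.InternalServerError` so the snippet runs without a Vault server.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for hvac.exceptions.InternalServerError."""


def retry_with_backoff(func, retriable=(TransientError,), attempts=5,
                       base_delay=1.0, sleep=time.sleep):
    """Call func(); on a retriable error wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return func()
        except retriable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# Hypothetical stand-in for the approle login: fails twice, then succeeds.
calls = {"count": 0}


def flaky_login():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("internal error on approle login")
    return "token"


# Record the delays instead of actually sleeping, to keep the example fast.
delays = []
result = retry_with_backoff(flaky_login, base_delay=0.5, sleep=delays.append)
print(result, calls["count"], delays)
```

In the charm, the wrapped call would presumably be the `client.auth_approle(...)` line from the traceback. If the error still shows up after several retries, that would point away from a simple network blip.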

Crashdumps etc:
https://oil-jenkins.canonical.com/artifacts/89e74981-4d6c-4efc-b5d4-45868278f45d/index.html