juju upgrade-machine for a kubernetes-worker stuck on pre-series-upgrade hook
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Kubernetes Control Plane Charm | Triaged | Wishlist | Unassigned | 1.29
Kubernetes Worker Charm | Triaged | Wishlist | Unassigned | 1.29
Bug Description
While upgrading the machine for a kubernetes-worker unit with juju upgrade-machine, the upgrade got stuck on the pre-series-upgrade hook.
# =======
# =============== juju upgrade-machine 11 prepare jammy (stuck)
machine-11 started upgrade series from "focal" to "jammy"
16:26:17 INFO cmd upgrademachine.
advanced-
advanced-
ntp/9 pre-series-upgrade hook running
16:26:17 INFO cmd upgrademachine.
logrotated/15 pre-series-upgrade hook running
16:26:18 INFO cmd upgrademachine.
16:26:18 INFO cmd upgrademachine.
16:26:21 INFO cmd upgrademachine.
16:26:21 INFO cmd upgrademachine.
16:26:23 INFO cmd upgrademachine.
16:26:23 INFO cmd upgrademachine.
16:26:24 INFO cmd upgrademachine.
16:26:24 INFO cmd upgrademachine.
16:26:24 INFO cmd upgrademachine.
16:26:24 INFO cmd upgrademachine.
# =============== juju status
Every 2.0s: juju status --color | egrep 'executing|
containerd 1.6.8 blocked 9 containerd 1.28/stable 69 no Series upgrade in progress
kubernetes-worker/5 active executing 11 138.26.125.127 80/tcp,443/tcp (pre-series-
containerd/9 blocked idle 138.26.125.127 Series upgrade in progress
# =============== juju ssh 11 cat /var/log/
# ..
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:17 INFO juju.worker.
2023-09-08 16:26:18 INFO juju.worker.
2023-09-08 16:26:18 INFO juju.worker.
2023-09-08 16:26:18 INFO juju.worker.
2023-09-08 16:26:18 INFO juju.worker.
2023-09-08 16:26:21 INFO juju.worker.
2023-09-08 16:26:21 INFO juju.worker.
2023-09-08 16:26:21 INFO juju.worker.
2023-09-08 16:26:21 INFO juju.worker.
2023-09-08 16:26:23 INFO juju.worker.
2023-09-08 16:26:23 INFO juju.worker.
2023-09-08 16:26:23 INFO juju.worker.
2023-09-08 16:26:23 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:26:24 INFO juju.worker.
2023-09-08 16:39:19 ERROR juju.worker.
d34-4aeb-
# ================ juju ssh kubernetes-worker/5 cat /var/log/
https:/
# =============== juju show-status-log kubernetes-worker/5
# ..
08 Sep 2023 16:23:37Z workload active Kubernetes worker running (without gpu support).
08 Sep 2023 16:26:24Z juju-unit executing running pre-series-upgrade hook
$ date
Fri 08 Sep 2023 05:44:13 PM UTC # About 1h+ after executing started
# =======
The kubernetes-worker unit was stuck running this hook:
https:/
It was stuck draining pods from the worker node.
So, I propose we add a status message or log entry before the draining starts.
That way, from the Juju point of view (ideally in juju status, but in juju debug-log as well), we can tell what is happening, especially in cases where pods take a long time to evict.
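As a rough sketch of the idea (not the charm's actual code), the hook could announce the drain through the standard status-set and juju-log hook tools just before kicking it off:

# hypothetical sketch of the proposed message, emitted right before the drain starts
status-set maintenance "Draining pods from node before series upgrade"
juju-log -l INFO "pre-series-upgrade: draining pods from the node, this may take a while"

With something like that in place, juju status shows the maintenance message and juju debug-log carries the log line for as long as the eviction takes.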
# ---
Here is how to force the pods out if the drain gets stuck:
kubectl drain <k8s-node-name> --ignore-daemonsets --delete-
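The flag above is cut off; it is presumably --delete-emptydir-data (older kubectl releases spell it --delete-local-data), so the full manual workaround would look roughly like this, with the node name as a placeholder:

kubectl drain <k8s-node-name> --ignore-daemonsets --delete-emptydir-data
# after the series upgrade completes, allow the node to schedule pods again
kubectl uncordon <k8s-node-name>

Whether the uncordon needs to be done by hand depends on what the post-series-upgrade hook already does for the node.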
description: updated
Changed in charm-kubernetes-worker:
  milestone: none → 1.29
  status: New → Triaged
  importance: Undecided → Wishlist
Changed in charm-kubernetes-master:
  status: New → Triaged
  milestone: none → 1.29
  importance: Undecided → Wishlist