kubelet and kube-proxy hang due to systemd start limit

Bug #1999419 reported by George Kraft
This bug affects 4 people
Affects                         Status        Importance  Assigned to   Milestone
Kubernetes Common Layer         Fix Released  High        George Kraft
Kubernetes Control Plane Charm  Fix Released  High        George Kraft
Kubernetes Worker Charm         Fix Released  High        George Kraft

Bug Description

In a CI test run, we saw kube-proxy hang due to systemd start rate limiting.

The charm had just upgraded to the k8s 1.26 snaps. kube-proxy was still being started with the deprecated --logtostderr argument, which 1.26 no longer accepts, so it crashed on every start. The repeated crashes pushed kube-proxy past the systemd start limit, after which systemd attempts no further automatic restarts.
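To illustrate how a crash loop runs into the start limit, here is a simplified model of the rate limiter. It is a sketch, not systemd's actual implementation: the class name StartLimiter is hypothetical, and the values 5 and 10 seconds are assumed to match systemd's stock defaults for StartLimitBurst and StartLimitIntervalSec (the snap's unit file may override them).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLimiter:
    """Simplified fixed-window model of systemd start rate limiting."""

    burst: int = 5          # assumed default StartLimitBurst
    interval: float = 10.0  # assumed default StartLimitIntervalSec, in seconds
    window_start: Optional[float] = None
    count: int = 0

    def allow_start(self, now: float) -> bool:
        """Return True if a start attempt at time `now` is permitted."""
        # Open a new window on the first attempt or once the interval has passed.
        if self.window_start is None or now - self.window_start > self.interval:
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        # Over the limit: systemd would log "Start request repeated too quickly".
        return False


# A service that crashes immediately and is restarted every 100 ms.
limiter = StartLimiter()
attempts = [limiter.allow_start(i * 0.1) for i in range(7)]
```

In this model the first five attempts are allowed and the rest are refused until the interval has elapsed. A manual restart issued inside the same window is refused in the same way, which matches the failure seen when the charm tried to restart the service.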

The charm then reconfigured the service and tried to restart it, but the restart failed:

2022-12-09 23:34:40 INFO unit.kubernetes-worker/1.juju-log server.go:316 Restarting kubelet and kube-proxy.
2022-12-09 23:34:40 WARNING unit.kubernetes-worker/1.upgrade logger.go:60 Job for snap.kube-proxy.daemon.service failed because the control process exited with error code.
2022-12-09 23:34:40 WARNING unit.kubernetes-worker/1.upgrade logger.go:60 See "systemctl status snap.kube-proxy.daemon.service" and "journalctl -xe" for details.

The restart was rejected because the unit had already hit the systemd start limit:

Dec 9 23:34:40 juju-86e872-8 systemd[1]: snap.kube-proxy.daemon.service: Start request repeated too quickly.
Dec 9 23:34:40 juju-86e872-8 systemd[1]: snap.kube-proxy.daemon.service: Failed with result 'exit-code'.
Dec 9 23:34:40 juju-86e872-8 systemd[1]: Failed to start Service for snap application kube-proxy.daemon.

No further restart attempts occurred.
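
One possible way to recover from this state is to flush the unit's start counter with "systemctl reset-failed" before restarting it. The sketch below is an illustration only and is not taken from the charm's fix; the function name restart_clearing_start_limit and the injectable run parameter are hypothetical, and error handling is left out.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def _run(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def restart_clearing_start_limit(service: str, run: Runner = _run) -> None:
    """Clear the failed state and start counter of a unit, then restart it.

    Hypothetical helper: `run` is injectable so the command sequence can be
    inspected without touching systemd.
    """
    # reset-failed flushes the restart rate counter for the unit.
    run(["systemctl", "reset-failed", service])
    run(["systemctl", "restart", service])


# Dry run that records the commands instead of executing them.
recorded = []
restart_clearing_start_limit("snap.kube-proxy.daemon.service", run=recorded.append)
```

Resetting first means the restart is not counted against crashes that happened before the service was reconfigured.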

George Kraft (cynerva)
Changed in charm-kubernetes-worker:
assignee: nobody → George Kraft (cynerva)
importance: Undecided → High
status: New → In Progress
Adam Dyess (addyess)
Changed in charm-kubernetes-master:
milestone: none → 1.26
Changed in charm-kubernetes-worker:
milestone: none → 1.26
Changed in charm-kubernetes-master:
status: New → In Progress
Changed in layer-kubernetes-common:
status: New → In Progress
Changed in charm-kubernetes-master:
importance: Undecided → High
Changed in layer-kubernetes-common:
importance: Undecided → High
milestone: none → 1.26
tags: added: backport-needed
George Kraft (cynerva) wrote :
tags: removed: backport-needed
Changed in layer-kubernetes-common:
status: In Progress → Fix Committed
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
Changed in layer-kubernetes-common:
assignee: nobody → George Kraft (cynerva)
Changed in charm-kubernetes-master:
assignee: nobody → George Kraft (cynerva)
Adam Dyess (addyess)
Changed in layer-kubernetes-common:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released