Activity log for bug #1928806

Date Who What changed Old value New value Message
2021-05-18 14:30:19 Przemyslaw Lal bug added bug
2021-05-18 14:30:19 Przemyslaw Lal attachment added Full list of leftover systemd scopes https://bugs.launchpad.net/bugs/1928806/+attachment/5498496/+files/scope_bug.txt
2021-05-18 14:31:44 Przemyslaw Lal description

Old value:

We found over 8.5k snap kubectl systemd scope units on each of our Kubernetes master nodes. This causes 100% CPU usage spikes caused by systemd and /sbin/init processes hosing the entire cluster.

$ sudo systemctl list-units --type scope | grep snap | wc -l
8643

Typical entries looks like this:

snap.kubectl.kubectl.001b8133-30f5-42c3-90e4-39da4229cbef.scope loaded active running snap.kubectl.kubectl.001b8133-30f5-42c3-90e4-39da4229cbef.scope
snap.kubectl.kubectl.001e4ffa-17c6-49ea-8fd2-1d7d07fb0e2f.scope loaded active running snap.kubectl.kubectl.001e4ffa-17c6-49ea-8fd2-1d7d07fb0e2f.scope
snap.kubectl.kubectl.0022dadb-f7b5-456f-ac51-98cfcd128035.scope loaded active running snap.kubectl.kubectl.0022dadb-f7b5-456f-ac51-98cfcd128035.scope
snap.kubectl.kubectl.0026e64c-8794-47d9-b58c-a05759814dbd.scope loaded active running snap.kubectl.kubectl.0026e64c-8794-47d9-b58c-a05759814dbd.scope
snap.kubectl.kubectl.002d028f-b30c-4ce7-b6b9-4c72ef74f308.scope loaded active running snap.kubectl.kubectl.002d028f-b30c-4ce7-b6b9-4c72ef74f308.scope
snap.kubectl.kubectl.003584be-b585-48b9-8385-fca6a642433a.scope loaded active running snap.kubectl.kubectl.003584be-b585-48b9-8385-fca6a642433a.scope
snap.kubectl.kubectl.0042ab28-ac94-476b-877d-9cde2d5e2312.scope loaded active running snap.kubectl.kubectl.0042ab28-ac94-476b-877d-9cde2d5e2312.scope
snap.kubectl.kubectl.00537acd-ce13-4a9c-a70d-059d7d1088dc.scope loaded active running snap.kubectl.kubectl.00537acd-ce13-4a9c-a70d-059d7d1088dc.scope

Please note that all of them are in status active/running.

After manually stopping them using this one-liner:

sudo systemctl list-units --type scope | grep kubectl | awk '{print $1}' | xargs sudo systemctl stop

The number goes down to expected values:

sudo systemctl list-units --type scope | grep snap | wc -l
11

And the system becomes much snappier again.

The increased load caused by this issue, causes transient failures in communication between API servers and kubelets, resulting in errors similar to this: [0]

Then we end up restarting kubelets which is the only way to restore connectivity between kubelets and API servers.

Additionally we see a lot of similarities with this bug [1] reported for etcdctl. Both kubectl and etcdctl from that bug are running as snaps, leaving thousands of systemd scope units, slowing down the system.

Versions:
kubernetes-master charm: 1.18.15
charm revision: 895
Ubuntu: 18.04.5 LTS
kubectl snap: 1.18.15 1.18/stable

$ snap --version
snap 2.50
snapd 2.50
series 16
ubuntu 18.04
kernel 5.4.0-1046-azure

[0] https://github.com/kubernetes/kubernetes/issues/87615
[1] https://bugs.launchpad.net/charm-etcd/+bug/1926185

New value:

We found over 8.5k snap kubectl systemd scope units on each of our Kubernetes master nodes. This causes 100% CPU usage spikes caused by systemd and /sbin/init processes hosing the entire cluster.

$ sudo systemctl list-units --type scope | grep snap | wc -l
8643

Typical entries look like these:

  snap.kubectl.kubectl.001b8133-30f5-42c3-90e4-39da4229cbef.scope loaded active running snap.kubectl.kubectl.001b8133-30f5-42c3-90e4-39da4229cbef.scope
  snap.kubectl.kubectl.001e4ffa-17c6-49ea-8fd2-1d7d07fb0e2f.scope loaded active running snap.kubectl.kubectl.001e4ffa-17c6-49ea-8fd2-1d7d07fb0e2f.scope
  snap.kubectl.kubectl.0022dadb-f7b5-456f-ac51-98cfcd128035.scope loaded active running snap.kubectl.kubectl.0022dadb-f7b5-456f-ac51-98cfcd128035.scope
  snap.kubectl.kubectl.0026e64c-8794-47d9-b58c-a05759814dbd.scope loaded active running snap.kubectl.kubectl.0026e64c-8794-47d9-b58c-a05759814dbd.scope
  snap.kubectl.kubectl.002d028f-b30c-4ce7-b6b9-4c72ef74f308.scope loaded active running snap.kubectl.kubectl.002d028f-b30c-4ce7-b6b9-4c72ef74f308.scope
  snap.kubectl.kubectl.003584be-b585-48b9-8385-fca6a642433a.scope loaded active running snap.kubectl.kubectl.003584be-b585-48b9-8385-fca6a642433a.scope
  snap.kubectl.kubectl.0042ab28-ac94-476b-877d-9cde2d5e2312.scope loaded active running snap.kubectl.kubectl.0042ab28-ac94-476b-877d-9cde2d5e2312.scope
  snap.kubectl.kubectl.00537acd-ce13-4a9c-a70d-059d7d1088dc.scope loaded active running snap.kubectl.kubectl.00537acd-ce13-4a9c-a70d-059d7d1088dc.scope

Please note that all of them are in status active/running.

After manually stopping them using this one-liner:

sudo systemctl list-units --type scope | grep kubectl | awk '{print $1}' | xargs sudo systemctl stop

The number goes down to expected values:

sudo systemctl list-units --type scope | grep snap | wc -l
11

And the system becomes much snappier again.

The increased load caused by this issue, causes transient failures in communication between API servers and kubelets, resulting in errors similar to this: [0]

Then we end up restarting kubelets which is the only way to restore connectivity between kubelets and API servers.

Additionally we see a lot of similarities with this bug [1] reported for etcdctl. Both kubectl and etcdctl from that bug are running as snaps, leaving thousands of systemd scope units, slowing down the system.

Versions:
kubernetes-master charm: 1.18.15
charm revision: 895
Ubuntu: 18.04.5 LTS
kubectl snap: 1.18.15 1.18/stable

$ snap --version
snap 2.50
snapd 2.50
series 16
ubuntu 18.04
kernel 5.4.0-1046-azure

[0] https://github.com/kubernetes/kubernetes/issues/87615
[1] https://bugs.launchpad.net/charm-etcd/+bug/1926185
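
The cleanup one-liner in the description matches any line containing "kubectl". A slightly stricter variant could match only unit names of the form snap.kubectl.kubectl.<id>.scope, as in the entries listed above. The following is a minimal sketch, not part of the original report: the function names filter_kubectl_scopes and count_kubectl_scopes are hypothetical, and the commented usage assumes a systemctl that supports --no-legend and --plain and a GNU xargs (for -r).

```shell
#!/bin/sh
# Hypothetical helpers (not from the bug report) that work on text shaped like
# the output of `systemctl list-units --type scope`, read from stdin.

# Print only the unit names (first column) that look like
# snap.kubectl.kubectl.<id>.scope; every other line is ignored.
filter_kubectl_scopes() {
    awk '$1 ~ /^snap\.kubectl\.kubectl\..*\.scope$/ { print $1 }'
}

# Count the matching units.
count_kubectl_scopes() {
    filter_kubectl_scopes | wc -l | tr -d ' '
}

# Possible usage on an affected node (not executed here; stopping units needs root):
#   systemctl list-units --type scope --no-legend --plain | count_kubectl_scopes
#   systemctl list-units --type scope --no-legend --plain \
#       | filter_kubectl_scopes | xargs -r sudo systemctl stop
```

Anchoring the pattern to the first column leaves unrelated scopes such as init.scope or session scopes untouched, even if their description text happens to mention kubectl.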
2021-05-18 15:08:14 Garrett Neugent bug added subscriber Canonical IS BootStack
2021-05-19 09:06:43 Calvin Hartwell bug added subscriber Canonical Field High
2021-05-19 09:06:52 Calvin Hartwell removed subscriber Canonical Field High
2021-05-19 09:07:18 Calvin Hartwell bug added subscriber Canonical Field High
2021-05-19 14:46:48 George Kraft charm-kubernetes-master: status New Incomplete
2021-05-20 15:16:40 George Kraft charm-kubernetes-master: status Incomplete New
2021-05-20 21:28:17 George Kraft charm-kubernetes-master: status New Incomplete
2021-05-21 13:22:16 George Kraft charm-kubernetes-master: status Incomplete New
2021-05-24 14:22:27 George Kraft charm-kubernetes-master: status New Incomplete
2021-05-24 14:22:39 George Kraft bug task added snapd
2021-07-22 20:23:10 Ian Johnson attachment added zoom hook sysfs tarball https://bugs.launchpad.net/charm-kubernetes-master/+bug/1928806/+attachment/5512966/+files/zoom-hook-stuck-cgroup.tgz
2021-07-23 11:40:43 Ian Johnson marked as duplicate 1934147
2021-07-26 01:53:33 Dominique Poulain bug added subscriber Dominique Poulain
2022-01-03 21:43:55 Michael Iatrou bug added subscriber Michael Iatrou