increase inotify limits for kubelet/cAdvisor

Bug #1828759 reported by Paul Collins
This bug affects 2 people
Affects: Kubernetes Worker Charm
Status: Fix Released
Importance: Medium
Assigned to: Mike Wilson
Milestone: 1.16

Bug Description

One of our k8s clusters, running v1.12.8, has a lot of cron jobs running, and therefore spawns a lot of pods. This seems to provoke an inotify leak somewhere in k8s that eventually causes our nodes to stop working and become NotReady. kubelet was logging this at the end of each attempt to start:

May 12 06:25:43 juju-66cffb-mojo-is-kubernetes-24 kubelet.daemon[14274]: E0512 06:25:43.547072 14274 raw.go:146] Failed to watch directory "/sys/fs/cgroup/blkio/system.slice/run-r21e036a699424d61aab9c6320782209e.scope": inotify_add_watch /sys/fs/cgroup/blkio/system.slice/run-r21e036a699424d61aab9c6320782209e.scope: no space left on device
May 12 06:25:43 juju-66cffb-mojo-is-kubernetes-24 kubelet.daemon[14274]: E0512 06:25:43.547173 14274 raw.go:146] Failed to watch directory "/sys/fs/cgroup/blkio/system.slice": inotify_add_watch /sys/fs/cgroup/blkio/system.slice/run-r21e036a699424d61aab9c6320782209e.scope: no space left on device
May 12 06:25:43 juju-66cffb-mojo-is-kubernetes-24 kubelet.daemon[14274]: F0512 06:25:43.547217 14274 kubelet.go:1344] Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/blkio/system.slice/run-r21e036a699424d61aab9c6320782209e.scope: no space left on device
May 12 06:25:45 juju-66cffb-mojo-is-kubernetes-24 systemd[1]: snap.kubelet.daemon.service: Main process exited, code=exited, status=255/n/a

fs.inotify.max_user_watches was set to 8192 when I investigated. After I changed it to 1048576, kubelet stayed up on the next restart attempt, the nodes became Ready, and our cron jobs started running again.
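
For anyone hitting this before a fixed release lands, the limit can be checked and raised by hand on an affected node with sysctl; the drop-in file name under /etc/sysctl.d below is only an example:

sysctl fs.inotify.max_user_watches                    # show the current value
sudo sysctl -w fs.inotify.max_user_watches=1048576    # raise it until the next reboot
echo 'fs.inotify.max_user_watches = 1048576' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system                                  # load sysctl.d files; the drop-in keeps the value across reboots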

Various third parties have increased these limits:
  * https://github.com/kubermatic/machine-controller/pull/471/files
  * https://github.com/jetstack/tarmak/pull/757/files

It also seems that cAdvisor has fixed the leak, and the change may have made it into some versions of Kubernetes itself, although the precise status of the upstream issue is not entirely clear to me:
  * https://github.com/google/cadvisor/pull/1916
  * https://github.com/kubernetes/kubernetes/issues/63204

If the fixed cAdvisor has not yet made it to all current releases, then the Juju charm should probably bump the inotify limits in the meantime.

Tags: sts
Revision history for this message
Mike Wilson (knobby) wrote :

This will be configurable after the next stable release. It was fixed in https://github.com/charmed-kubernetes/layer-kubernetes-master-worker-base/pull/3.

The way it will work is `juju config kubernetes-worker sysctl="{ fs.inotify.max_user_watches=1048576 }"`

Changed in charm-kubernetes-worker:
assignee: nobody → Mike Wilson (knobby)
importance: Undecided → Medium
status: New → Fix Committed
Revision history for this message
Tom Haddon (mthaddon) wrote :

What will the default be? Surely CDK should set the defaults to be sensible enough that people won't run into this.

Revision history for this message
Mike Wilson (knobby) wrote :

Excellent point. Currently the defaults do not include inotify settings. Do you have any input on what the defaults should be?

Revision history for this message
Paul Collins (pjdc) wrote :

I've set them as follows on our spawn-heavy cluster:

fs.inotify.max_user_instances = 8192 # default 128
fs.inotify.max_user_watches = 1048576 # default 8192

and so far so good, although I don't know how large these structures are, so this could represent a surprisingly large amount of unpageable kernel memory. They don't appear to live in their own slabs, so /proc/slabinfo is no help. FWIW, the first two links in my original report used similar values.
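
A rough way to see how much of the limit is actually in use is to tally the watches from /proc, one line per inotify instance; this assumes the fdinfo format documented in proc(5), where each watch shows up as a line beginning with "inotify wd:", and it needs root to see every process:

for fdinfo in /proc/[0-9]*/fdinfo/*; do
    # grep -c exits non-zero when a file has no inotify lines; skip those fds
    n=$(grep -c '^inotify' "$fdinfo" 2>/dev/null) || continue
    pid=${fdinfo#/proc/}; pid=${pid%%/*}
    echo "$n watches on an inotify fd of pid $pid ($(cat /proc/$pid/comm 2>/dev/null))"
done | sort -rn | head

As for the memory cost, the figure I've seen quoted is on the order of 1 KiB of kernel memory per watch on 64-bit, which would put a fully consumed 1048576-watch limit at roughly 1 GiB per user, and then only if the watches are actually registered; I haven't verified that against a running kernel.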

Felipe Reyes (freyes)
tags: added: sts
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

It seems that this fix has been released.
Could somebody confirm whether it has?

Thanks

Revision history for this message
George Kraft (cynerva) wrote :

The sysctl config has been released to stable with these charm revisions:
cs:~containers/kubernetes-master-684
cs:~containers/kubernetes-worker-541

However, we have not addressed the second part of this issue: bumping the inotify limits by default. For now, you can work around it by using the kubernetes-worker charm's sysctl config to manually set the inotify limits.
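
For example, following the syntax shown earlier in this bug and the values Paul used (the exact key/value separator and quoting should be checked against the charm's sysctl option documentation):

juju config kubernetes-worker sysctl="{ fs.inotify.max_user_instances=8192, fs.inotify.max_user_watches=1048576 }"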

Removing Fix Committed status, since part of this issue is still unresolved.

Changed in charm-kubernetes-worker:
status: Fix Committed → Confirmed
Revision history for this message
Mike Wilson (knobby) wrote :
Changed in charm-kubernetes-worker:
status: Confirmed → In Progress
Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

The new fs.inotify defaults will be available in edge builds:
cs:~containers/kubernetes-master-714
cs:~containers/kubernetes-worker-562

Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
Revision history for this message
Seyeong Kim (seyeongkim) wrote :

Has this been reverted?

I was able to see this in 714 but not in 724.

Revision history for this message
George Kraft (cynerva) wrote :

This hasn't been reverted. The fix is available in edge.

kubernetes-master-714 was an edge build, so it makes sense that you saw the fix there.

kubernetes-master-724 was a candidate/stable build, and we have not backported this fix to stable, so it's expected that you won't see the fix there.

Changed in charm-kubernetes-worker:
milestone: none → 1.16
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released