Can't start more than 35 containers on my machine

Bug #948623 reported by Stéphane Graber
This bug affects 1 person
Affects: lxc (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Serge Hallyn
Milestone: none listed

Bug Description

Doing some load testing on LXC, I noticed that I'm unable to start more than 35 containers on my machine.

The limitation is apparently inotify and the utmp inotify watch:
      lxc-start 1331084250.970 NOTICE lxc_conf - 'test40' is setup.
      lxc-start 1331084250.970 NOTICE lxc_start - exec'ing '/sbin/init'
      lxc-start 1331084250.970 NOTICE lxc_start - '/sbin/init' started with pid '20939'
      lxc-start 1331084251.275 ERROR lxc_utmp - Too many open files - failed to inotify_init
      lxc-start 1331084251.275 ERROR lxc_start - failed to add utmp handler to mainloop
      lxc-start 1331084251.275 ERROR lxc_start - mainloop exited with an error
      lxc-start 1331084251.276 DEBUG lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /
      lxc-start 1331084251.276 DEBUG lxc_cgroup - destroying /sys/fs/cgroup/cpuset//lxc/test40
      lxc-start 1331084251.276 ERROR lxc_cgroup - Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpuset//lxc/test40'

Which seems odd, as we're not supposed to use the utmp watch in 12.04 now that we have the kernel patch, right?
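For anyone who wants to see the same failure outside of LXC: "Too many open files" is the message for errno EMFILE, which inotify_init() returns when the per-user inotify instance limit (or the process file descriptor limit) is reached. The following is a minimal sketch, not LXC code. The function name and the safety cap are assumptions made for illustration, and it relies on a Linux libc being reachable through ctypes.

```python
import ctypes
import errno
import os


def count_inotify_instances(cap=4096):
    """Open inotify instances until the kernel refuses or `cap` is reached.

    Returns (number_opened, errno_name_or_None). `cap` is a safety limit
    chosen for this sketch, not a kernel value.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # Linux: symbols from the running libc
    fds = []
    reason = None
    try:
        while len(fds) < cap:
            fd = libc.inotify_init()
            if fd < 0:
                # On a default setup this is usually EMFILE
                reason = errno.errorcode.get(ctypes.get_errno(), "unknown")
                break
            fds.append(fd)
    finally:
        # Release every instance so the test does not starve other programs
        for fd in fds:
            os.close(fd)
    return len(fds), reason


if __name__ == "__main__":
    opened, reason = count_inotify_instances()
    print("opened", opened, "inotify instances; stopped because:", reason)
```

The count it reports is what is left of the limit for the current user at that moment, so it will be lower while containers or desktop tools are already holding inotify instances.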

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 948623] [NEW] Can't start more than 35 containers on my machine

> lxc-start 1331084251.275 ERROR lxc_utmp - Too many open files - failed to inotify_init

Note that I don't know whether just stopping the utmp watch will help with this, however.

> lxc-start 1331084251.275 ERROR lxc_start - failed to add utmp handler to mainloop
> lxc-start 1331084251.275 ERROR lxc_start - mainloop exited with an error
> lxc-start 1331084251.276 DEBUG lxc_cgroup - get_init_cgroup: found init cgroup for subsys (null) at /
> lxc-start 1331084251.276 DEBUG lxc_cgroup - destroying /sys/fs/cgroup/cpuset//lxc/test40
> lxc-start 1331084251.276 ERROR lxc_cgroup - Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpuset//lxc/test40'
>
> Which seems odd as we're not supposed to use that in 12.04 now that we have the kernel patch right?

Well, this is embarrassing.

The code to check whether utmp needs to be checked is run in a thread
which is actually cloned from the code which uses it. We need to
run must_drop_cap_sys_boot() earlier and cache its results in the
handler for both lxc_poll() and the do_start() which is run by a new
thread in lxc_spawn().

So, the check of must_drop_cap_sys_boot() should be run right before
lxc_spawn() is called in __lxc_start(), the results cached in
handler->conf, and used in do_start() (to decide whether to drop
caps) and lxc_poll() (to decide whether to watch utmp).

I can do this next week if you don't get to it first.
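The problem described above can be illustrated with a small sketch. It is not LXC code: it uses os.fork() as a stand-in for the clone in lxc_spawn(), and it assumes the cloned task does not share memory with its parent, which the comment implies but does not state. The Conf class, the need_utmp_watch field, and the stubbed must_drop_cap_sys_boot() are hypothetical names for illustration only.

```python
import os


class Conf:
    """Hypothetical stand-in for handler->conf."""

    def __init__(self):
        # Hypothetical field; the default is to watch utmp
        self.need_utmp_watch = True


def must_drop_cap_sys_boot():
    """Stub for the real check: pretend the kernel supports container reboot,
    so CAP_SYS_BOOT need not be dropped and utmp need not be watched."""
    return False


def start(cache_before_spawn):
    """Return the value the parent (the lxc_poll() side) ends up seeing."""
    conf = Conf()
    if cache_before_spawn:
        # Proposed fix: run the check once, before the child is created
        conf.need_utmp_watch = must_drop_cap_sys_boot()
    pid = os.fork()
    if pid == 0:
        # Child (the do_start() side): this write only changes the child's copy
        conf.need_utmp_watch = must_drop_cap_sys_boot()
        os._exit(0)
    os.waitpid(pid, 0)
    return conf.need_utmp_watch


if __name__ == "__main__":
    print("check only in child  ->", start(cache_before_spawn=False))  # True
    print("check before spawn   ->", start(cache_before_spawn=True))   # False
```

With the check done only in the child, the parent keeps the default and still sets up the utmp watch. Running the check before the spawn lets both sides read the same cached result.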

Changed in lxc (Ubuntu):
status: New → Confirmed
importance: Undecided → High
David Ward (dpward) wrote :

Note that you can increase the number of inotify instances allowed per user. See the "/proc interfaces" section of:
http://www.kernel.org/doc/man-pages/online/pages/man7/inotify.7.html
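As a quick way to check the current values, here is a minimal sketch that reads the three files listed in the "/proc interfaces" section of that man page. The function name and the `base` parameter are assumptions made for illustration; a limit whose file cannot be read is reported as None.

```python
import os

# File names documented in inotify(7) under /proc/sys/fs/inotify
INOTIFY_LIMITS = ("max_queued_events", "max_user_instances", "max_user_watches")


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return a dict mapping each inotify limit name to an int, or None if unreadable."""
    limits = {}
    for name in INOTIFY_LIMITS:
        try:
            with open(os.path.join(base, name)) as f:
                limits[name] = int(f.read().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(name, "=", value)
```

The limit relevant to this bug is max_user_instances. Raising it requires root, either by writing a new value to that file or by setting the fs.inotify.max_user_instances sysctl.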

Stéphane Graber (stgraber) wrote :

Oh, thanks for the link. I googled it a bit when I first discovered the bug but couldn't find that parameter; apparently I didn't try hard enough :)

LXC should still be modified to not call the utmp watch when using a recent kernel though.

Serge Hallyn (serge-hallyn) wrote :

I'm not sure why I don't see my linked tree, but the proposed fix is at lp:~serge-hallyn/ubuntu/precise/lxc/lxc-dont-watch-utmp

Stéphane Graber (stgraber) wrote :

I didn't do extensive testing on it besides running the same test I used to trigger the bug initially, but I can now run 150 containers on the same machine using the fix from your branch, so it looks like it works :)

Changed in lxc (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 0.7.5-3ubuntu35

---------------
lxc (0.7.5-3ubuntu35) precise; urgency=low

  [Gary Poster]
  * lxc-start-ephemeral: convert ephemeral approach to change all bound fstab
    mounts; convert binding to also modify fstab
  [Benji York]
  * lxc-start-ephemeral: munge the fstab and comment out a flaky line
  [Serge Hallyn]
  * 0056-dont-watch-utmp: don't watch utmp if kernel supports container
    reboot. (LP: #948623)
  * debian/control: add dh-apparmor to Build-Depends (LP: #948481)
  * lxc-start-ephemeral: add '-d' option to daemonize.
  * debian/lxc.upstart: don't run post-stop if LXC_AUTO=false (LP: #949362)
 -- Serge Hallyn <email address hidden> Mon, 12 Mar 2012 09:51:59 -0500

Changed in lxc (Ubuntu):
status: In Progress → Fix Released