Comment 2 for bug 1629226

Balint Reczey (rbalint) wrote : Re: systemd's service killed by cgroup controller pids

Regarding the original report: the following simple program keeps the maximum allowed number of children running. It does not get killed by cgroups; only the fork() call fails:
---
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MASTER_SLEEP_NS 1000000L
#define CHILD_SLEEP_S 5

int main(void)
{
  pid_t pid;
  struct timespec master_sleep = {0, MASTER_SLEEP_NS};

  for (;;) {
    pid = fork();
    if (pid < 0) {
      perror("fork failed:");
      nanosleep(&master_sleep, NULL);
    }
    if (pid == 0) {
      sleep(CHILD_SLEEP_S);
      exit(0);
    }
    nanosleep(&master_sleep, NULL);
    /* collect exited children */
    while (waitpid(-1, NULL, WNOHANG) > 0);
  }
}
---
[Unit]
Description=Reproducer
After=multi-user.target

[Service]
ExecStart=/home/user/reproducer
Type=simple
TasksMax=512

[Install]
WantedBy=multi-user.target
---
● reproducer.service - Reproducer
   Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-05-22 13:16:50 UTC; 3min 3s ago
 Main PID: 11778 (reproducer)
    Tasks: 512 (limit: 512)
   Memory: 55.4M
      CPU: 1min 2.794s
   CGroup: /system.slice/reproducer.service
           ├─11778 /home/rbalint/reproducer
           ├─18144 /home/rbalint/reproducer
...
           ├─26763 /home/rbalint/reproducer
           ├─26764 /home/rbalint/reproducer
           ├─26765 /home/rbalint/reproducer
           └─26766 /home/rbalint/reproducer

May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable
May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable
May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable

---

Bash, on the other hand, gives up and exits after a few failed forks:

● reproducer.service - Reproducer
   Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2017-05-22 13:22:38 UTC; 3s ago
  Process: 14281 ExecStart=/home/rbalint/reproducer.sh (code=exited, status=0/SUCCESS)
 Main PID: 14287 (code=exited, status=254)
      CPU: 639ms

May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:38 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: Interrupted system call
May 22 13:22:38 zesty-test systemd[1]: reproducer.service: Main process exited, code=exited, status=254/n/a
May 22 13:22:38 zesty-test systemd[1]: reproducer.service: Unit entered failed state.

http://sources.debian.net/src/bash/4.4-5/jobs.c/?hl=1919#L1919

  /* Create the child, handle severe errors. Retry on EAGAIN. */
  while ((pid = fork ()) < 0 && errno == EAGAIN && forksleep < FORKSLEEP_MAX)
    {
      /* bash-4.2 */
      sigprocmask (SIG_SETMASK, &oset, (sigset_t *)NULL);
      /* If we can't create any children, try to reap some dead ones. */
      waitchld (-1, 0);

      errno = EAGAIN; /* restore errno */
      sys_error ("fork: retry");
      RESET_SIGTERM;

      if (sleep (forksleep) != 0)
        break;
      forksleep <<= 1;

      if (interrupt_state)
        break;
      sigprocmask (SIG_SETMASK, &set, (sigset_t *)NULL);
    }
...
  if (pid < 0)
    {
      sys_error ("fork");

      /* Kill all of the processes in the current pipeline. */
      terminate_current_pipeline ();

      /* Discard the current pipeline, if any. */
      if (the_pipeline)
        kill_current_pipeline ();

      last_command_exit_value = EX_NOEXEC;
      throw_to_top_level (); /* Reset signals, etc. */
    }
...

I believe this is by design and I think this approach is reasonable.

A shell should not try to keep itself alive by forking new processes after it has already hit system limits several times. Other tools are available for implementing servers with worker pools that adapt to system limits not known in advance.
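
For a program that does want to ride out transient fork() failures, the bounded retry idea seen in the bash excerpt can be written in a few lines of C. This is only a minimal sketch, not bash's code: fork_with_backoff, its millisecond parameters and the always_eagain stub are hypothetical names introduced here, and the delays are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical type: lets the caller pass fork() itself or a stand-in. */
typedef pid_t (*fork_fn)(void);

/*
 * Hypothetical helper: call do_fork() and retry only on EAGAIN, doubling
 * the delay after every failure. Gives up once the delay has reached
 * max_delay_ms, or immediately on any other error. Returns whatever the
 * last do_fork() call returned and stores the number of calls in *attempts.
 */
static pid_t fork_with_backoff(fork_fn do_fork, long first_delay_ms,
                               long max_delay_ms, int *attempts)
{
  long delay_ms = first_delay_ms;
  pid_t pid;

  *attempts = 0;
  for (;;) {
    (*attempts)++;
    pid = do_fork();
    if (pid >= 0 || errno != EAGAIN || delay_ms >= max_delay_ms)
      return pid;

    /* Back off before the next attempt. */
    struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    delay_ms *= 2;
  }
}

/* Stand-in that always fails like fork() does when the pids limit is hit. */
static pid_t always_eagain(void)
{
  errno = EAGAIN;
  return -1;
}
```

A real caller would pass fork itself, for example fork_with_backoff(fork, 1000, 16000, &attempts), and handle the 0 (child), positive (parent) and -1 (gave up) results as usual. The always_eagain stub exists only to exercise the give-up path without hitting a real limit.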

If you know the number of workers needed in advance, I suggest setting TasksMax to a high enough value, or to infinity if you don't want to rely on cgroup fork limits.
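
As an illustration of that suggestion, the limit can be raised without editing the unit file itself by using a drop-in. The drop-in file name below is a hypothetical choice, and the unit name is the reproducer's from above; adjust both to the real service.

```ini
# /etc/systemd/system/reproducer.service.d/tasksmax.conf  (hypothetical file name)
[Service]
# Either a fixed number that covers all workers, or "infinity" to lift the limit
TasksMax=infinity
```

After adding the file, run systemctl daemon-reload and restart the service; systemctl show reproducer.service -p TasksMax should then report the new value.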