[Bug 1629226] Re: systemd's service killed by cgroup controller pids

Mon May 22 13:23:49 UTC 2017

Regarding the original report this is a simple program which keeps the maximal allowed children running and it does not get killed by cgroups, just the fork() call fails:
---
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MASTER_SLEEP_NS 1000000L
#define CHILD_SLEEP_S 5

void  main(void)
{
  pid_t  pid;
  struct timespec master_sleep = {0, MASTER_SLEEP_NS};

  for (;;) {
    pid = fork();
    if (pid < 0) {
      perror("fork failed:");
      nanosleep(&master_sleep, NULL);
    }
    if (pid == 0) {
      sleep(CHILD_SLEEP_S);
      exit(0);
    }
    nanosleep(&master_sleep, NULL);
    /* collect exited children */
    while (waitpid(-1, NULL, WNOHANG) > 0);
  } 
}
---
[Unit]
Description=Reproducer 
After=multi-user.target

[Service]
ExecStart=/home/user/reproducer
Type=simple
TasksMax=512

[Install]
WantedBy=multi-user.target
---
● reproducer.service - Reproducer
   Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-05-22 13:16:50 UTC; 3min 3s ago
 Main PID: 11778 (reproducer)
    Tasks: 512 (limit: 512)
   Memory: 55.4M
      CPU: 1min 2.794s
   CGroup: /system.slice/reproducer.service
           ├─11778 /home/rbalint/reproducer
           ├─18144 /home/rbalint/reproducer
...
           ├─26763 /home/rbalint/reproducer
           ├─26764 /home/rbalint/reproducer
           ├─26765 /home/rbalint/reproducer
           └─26766 /home/rbalint/reproducer

May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable
May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable
May 22 13:20:14 zesty-test reproducer[11778]: fork failed:: Resource temporarily unavailable

---

Bash on the other hand kills itself after a few failing forks:

● reproducer.service - Reproducer
   Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2017-05-22 13:22:38 UTC; 3s ago
  Process: 14281 ExecStart=/home/rbalint/reproducer.sh (code=exited, status=0/SUCCESS)
 Main PID: 14287 (code=exited, status=254)
      CPU: 639ms

May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:35 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: retry: Resource temporarily unavailable
May 22 13:22:38 zesty-test reproducer.sh[14281]: /home/rbalint/reproducer.sh: fork: Interrupted system call
May 22 13:22:38 zesty-test systemd[1]: reproducer.service: Main process exited, code=exited, status=254/n/a
May 22 13:22:38 zesty-test systemd[1]: reproducer.service: Unit entered failed state.

http://sources.debian.net/src/bash/4.4-5/jobs.c/?hl=1919#L1919

  /* Create the child, handle severe errors.  Retry on EAGAIN. */
  while ((pid = fork ()) < 0 && errno == EAGAIN && forksleep < FORKSLEEP_MAX)
    {
      /* bash-4.2 */
      sigprocmask (SIG_SETMASK, &oset, (sigset_t *)NULL);
      /* If we can't create any children, try to reap some dead ones. */
      waitchld (-1, 0);

      errno = EAGAIN;		/* restore errno */
      sys_error ("fork: retry");
      RESET_SIGTERM;

      if (sleep (forksleep) != 0)
	break;
      forksleep <<= 1;

      if (interrupt_state)
	break;
      sigprocmask (SIG_SETMASK, &set, (sigset_t *)NULL);
    }
...
  if (pid < 0)
    {
      sys_error ("fork");

      /* Kill all of the processes in the current pipeline. */
      terminate_current_pipeline ();

      /* Discard the current pipeline, if any. */
      if (the_pipeline)
	kill_current_pipeline ();

      last_command_exit_value = EX_NOEXEC;
      throw_to_top_level ();	/* Reset signals, etc. */
    }
...

I believe this is by design and I think this approach is reasonable.

A shell should not try to keep itself alive forking new processes when
it hit system limits already for a few times. There are other tools
available for implementing servers with worker pools which adapt to
system limits which are not defined in advance.

If you know the number of workers need in advance I suggest setting
TasksMax to high enough or to infinity in case you don't want to rely on
cgroup fork limits.

** Changed in: bash (Ubuntu)
       Status: In Progress => Invalid

** Summary changed:

- systemd's service killed by cgroup controller pids
+ Bash exits after a few failed fork()-s

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to bash in Ubuntu.
https://bugs.launchpad.net/bugs/1629226

Title:
  Bash exits after a few failed fork()-s

Status in The Ubuntu-power-systems project:
  New
Status in bash package in Ubuntu:
  Invalid

Bug description:
  Problem Description
  ===========================
  I write a simple systemd service which will fork child processes fiercely. But quickly the service failed:

  % sudo systemctl status reproducer.service
  ? reproducer.service - Reproducer of systemd services killed by ips
     Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2016-03-18 06:58:37 CDT; 2min 43s ago
    Process: 5103 ExecStart=/home/hpt/reproducer/reproducer.sh (code=exited, status=0/SUCCESS)
   Main PID: 5105 (code=exited, status=254)

  Mar 18 06:58:36 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:36 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Main process exited, code=exited, status=254/n/a
  Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
  Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Unit entered failed state.
  Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Failed with result 'exit-code'.

  The default task limit of systemd services is 512. Looks like the
  service is terminated by the kernel's ips cgroup controller. I think
  this isn't correct. Child processes cannot be forked shouldn't cause
  parent to die.

  % cat /etc/systemd/system/reproducer.service 
  [Unit]
  Description=Reproducer of systemd services killed by ips
  After=multi-user.target

  [Service]
  ExecStart=/home/hpt/reproducer/reproducer.sh
  Type=forking

  [Install]
  WantedBy=multi-user.target

  % cat /home/hpt/reproducer/reproducer.sh
  #!/bin/bash

  foo()
  {
          #exec sh -c "echo $1: \$\$;sleep 60"
          echo $1: 
          sleep 60
  }

  bar()
  {
          c=1
          while true
          do
                  for ((i=1;i<=2048;i++))
                  do
                          foo $c &
                          ((c++))
                  done

                  wait
                  c=1
          done
  }

  # main
  bar &

  disown -a

  exit 0

  ---uname output---
  Linux pinelp3 4.4.0-12-generic #28-Ubuntu SMP Wed Mar 9 00:40:38 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

  Machine Type = IBM,8408-E8E,lpar 

  Steps to Reproduce
  ================================
  1. install the simple service in "Problem description"
  2. sudo systemctl start reproducer.service
  3. wait 2~3 minutes

  == Comment: #3 - Vaishnavi Bhat <vaish123 at in.ibm.com> - 2016-03-22 11:21:55 ==
  From the machine,
  root at pinelp3:~# ulimit -a
  core file size          (blocks, -c) 0
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 48192
  max locked memory       (kbytes, -l) 64
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 1024
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) 819200
  real-time priority              (-r) 0
  stack size              (kbytes, -s) 8192
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 48192
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited

  root at pinelp3:~# ps aux | wc -l          --------->While the service is running 
  1084
  root at pinelp3:~# ps aux | wc -l                                   "
  1084
  root at pinelp3:~# ps aux | wc -l                                   "
  1084
  root at pinelp3:~# ps aux | wc -l                                   "
  1084
  root at pinelp3:~# ps aux | wc -l    ---------->While the service is not running. 
  572

  root at pinelp3:~# free -m   --------------> While service is running
                total        used        free      shared  buff/cache   available
  Mem:          12117         628         459          22       11029        9541
  Swap:          2052           8        2044

  root at pinelp3:~# free -m   ------------> while the service is not running. 
                total        used        free      shared  buff/cache   available
  Mem:          12117         308         809          22       10999        9890
  Swap:          2052           8        2044

  == Comment: #4 - Breno Henrique Leitao <brenohl at br.ibm.com> - 2016-03-22 11:48:57 ==
  This is a new feature in Ubuntu and Systemd that limits the amount of processes/child created.

  You can disable it as doucmented in
  https://wiki.ubuntu.com/ppc64el/Recommendations#Max_pids_on_Ubuntu_16.04

  == Comment: #5 - Ping Tian Han <pthan at cn.ibm.com> - 2016-03-22 20:24:33 ==
  (In reply to comment #4)
  > This is a new feature in Ubuntu and Systemd that limits the amount of
  > processes/child created.
  > 
  > You can disable it as doucmented in
  > https://wiki.ubuntu.com/ppc64el/Recommendations#Max_pids_on_Ubuntu_16.04

  Hi,

  We knew this feature. But it is then main process of the service
  failed when it cannont fork chlid processes. This the problem we want
  to know how to fix. Thanks.

  == Comment: #6 - Breno Henrique Leitao <brenohl at br.ibm.com> - 2016-03-24 13:58:08 ==
  Hi,

  > We knew this feature. But it is then main process of the service failed when
  > it cannont fork chlid processes. This the problem we want to know how to
  > fix. Thanks.

  What do you want to fix exactly? There was an RFC limiting all systemd
  services to spawn, a maximum of process.

  This came from the Systemd creator at
  https://lists.freedesktop.org/archives/systemd-
  devel/2015-November/035006.html.

  If you wish to have a service that needs more than 512 process, just
  enable it on the service file, as:

  [Unit]
  Description=Reproducer of systemd services killed by ips
  After=multi-user.target

  [Service]
  ExecStart=/home/hpt/reproducer/reproducer.sh
  Type=forking
  TasksMax=1024 (or bigger)

  [Install]
  WantedBy=multi-user.target

  Anyway, what are you trying to test exactly, and why do you think it
  fails?

  == Comment: #7 - Ping Tian Han <pthan at cn.ibm.com> - 2016-03-24 21:26:35 ==
  (In reply to comment #6)
  > Hi,
  > 
  > > We knew this feature. But it is then main process of the service failed when
  > > it cannont fork chlid processes. This the problem we want to know how to
  > > fix. Thanks.
  > 
  > What do you want to fix exactly? There was an RFC limiting all systemd
  > services to spawn, a maximum of process.
  > 
  > This came from the Systemd creator at
  > https://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.
  > html.
  > 
  > If you wish to have a service that needs more than 512 process, just enable
  > it on the service file, as:
  > 
  > [Unit]
  > Description=Reproducer of systemd services killed by ips
  > After=multi-user.target
  > 
  > [Service]
  > ExecStart=/home/hpt/reproducer/reproducer.sh
  > Type=forking
  > TasksMax=1024 (or bigger)
  > 
  > [Install]
  > WantedBy=multi-user.target
  > 
  > Anyway, what are you trying to test exactly, and why do you think it fails?

  Hi,

  We saw that the main process of the example service quited (or
  killed?) by some reason when it cannot fork more chlid processes. This
  is the problem. We can tolerate that there is only 512 child processes
  but cannot if the main process being killed.

  We think that even though the main process cannot fork more than 512
  child processes, it shouldn't be killed.

  thanks.

  == Comment: #8 - Breno Henrique Leitao <brenohl at br.ibm.com> - 2016-03-29 10:16:04 ==
  Hi PIng,

  > We think that even though the main process cannot fork more than 512 child
  > processes, it shouldn't be killed.

  Right. Now I understand your point. So, what is the main process here?
  Is it your serivce process, or the systemd process?

  PS: I am decreasing the priority to normal, since this is a very
  corner case usage.

  == Comment: #9 - Kevin W. Rudd <ruddk at us.ibm.com> - 2016-03-29 15:50:15 ==
  The process in question wasn't killed. It did its own exit:

  ...
  19236 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x3fff96c25aa0) = -1 EAGAIN (Resource temporarily unavailable)
  19236 write(2, "/usr/local/bin/real_forkit.sh: f"..., 70) = 70
  19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  19236 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
  19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  19236 exit_group(254)                   = ?
  19236 +++ exited with 254 +++

  The problem here is that the replication script is not *handling* the
  error.  It is not even looking at the return code of the function it
  is trying to fork.  It is very possible that the shell itself
  triggered its own protection mechanism when faced with a script that
  appeared to be  running away (the above exit was not after the first
  clone failure, but after several calls to clone had failed).

  Personally, if I were the admin of this system, this is exactly what I
  would want to happen to a run away process (smack it down quickly).

  == Comment: #10 - Ping Tian Han <pthan at cn.ibm.com> - 2016-03-29 21:17:16 ==
  (In reply to comment #9)
  > The process in question wasn't killed. It did its own exit:
  > 
  > ...
  > 19236 clone(child_stack=0,
  > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
  > child_tidptr=0x3fff96c25aa0) = -1 EAGAIN (Resource temporarily unavailable)
  > 19236 write(2, "/usr/local/bin/real_forkit.sh: f"..., 70) = 70
  > 19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  > 19236 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
  > 19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  > 19236 exit_group(254)                   = ?
  > 19236 +++ exited with 254 +++
  > 
  > The problem here is that the replication script is not *handling* the error.
  > It is not even looking at the return code of the function it is trying to
  > fork.  It is very possible that the shell itself triggered its own
  > protection mechanism when faced with a script that appeared to be  running
  > away (the above exit was not after the first clone failure, but after
  > several calls to clone had failed).
  > 

  Thanks, Kevin. Looks like this is a problem of bash? I think there
  isn't any reason casuing a shell process as parents to quit just
  because it cannot fork more child processes...

  > Personally, if I were the admin of this system, this is exactly what I would
  > want to happen to a run away process (smack it down quickly).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1629226/+subscriptions