I am able to semi-reliably reproduce this (or a very similar) problem on a setup very close to the one in comment #21:
- kernel: 4.2.0-30-generic (Ubuntu 15.10)
- 2 GB RAM, 1 CPU, running under Xen (EC2 t2.small instance)
- Docker with LVM thin-pool storage backend, running 3 containers, no memory limits set for their memcgs
- server is mostly idling (load average 0.0-0.1)
To reproduce it I have to:
1. set vm.overcommit_memory=1
2. initiate some disk activity:
find / -xdev -type f |xargs -P10 -n1 md5sum &>/dev/null &
find /var/lib/docker -type f |xargs -P10 -n1 md5sum &>/dev/null &
3. run some memory allocations until you hit OOM
for x in {1..200}; do ./memalloc & : ; done
memalloc above is a simple C program which allocates 100MB and memsets it with 'x':
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int block_mb = 100;
    char *buf;

    printf("allocing %dMB: ", block_mb);
    /* 1024 * 1000 bytes per "MB", so slightly under 100 MiB in total */
    buf = malloc(block_mb * 1024 * 1000);
    if (!buf) {
        printf("FAILED!\n");
        exit(EXIT_FAILURE);
    }
    printf("ok\n");
    /* touch every page so the memory is really backed, not just reserved */
    memset(buf, 'x', block_mb * 1024 * 1000);
    /* hold on to the memory for a while */
    sleep(180);
    return 0;
}
Once you hit OOM, the console slows down; that is the time to press CTRL+C, run pkill memalloc and then check top. Most of the time `kswapd0` spins and then recovers within tens of seconds, but once in a while it stays there for hours (I didn't have the patience to check for longer).
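If you would rather not stare at top, a rough way to check whether `kswapd0` is still spinning is to sample its CPU time from /proc/<pid>/stat twice and compare. This is only a sketch: the helper name task_cpu_ticks is made up for this example, and it assumes the usual stat layout where utime and stime are fields 14 and 15.

```sh
# Sketch: total CPU ticks (utime + stime) used by a task, read from a
# /proc/<pid>/stat-style file. The helper name is made up for this example.
task_cpu_ticks() (
    read -r line < "$1" || exit 1
    # Drop "pid (comm) " first; comm may itself contain spaces.
    rest=${line##*) }
    # Word splitting is intended: state is now $1, utime $12, stime $13.
    set -- $rest
    echo $(( ${12} + ${13} ))
)

# Usage (assumes pgrep is available):
#   pid=$(pgrep -x kswapd0)
#   a=$(task_cpu_ticks "/proc/$pid/stat"); sleep 5
#   b=$(task_cpu_ticks "/proc/$pid/stat")
#   echo "kswapd0 used $((b - a)) ticks in 5 s"
```

A difference that stays close to 5 s worth of ticks on every sample would match the "spinning" state described above; a difference near zero means it has recovered.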
Once I triggered the bug, I tried to get as much information as possible from the running system. I am attaching /proc/*info files (some taken 5 s apart), ftrace output from the event tracer (vmscan events only), and ftrace output from the function_graph tracer. Let me know if you need more information.
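For anyone who wants to collect the same kind of data, something along these lines should work. It is a sketch, not the exact commands used for the attachments: the function names and output layout are made up, the tracing directory is assumed to be /sys/kernel/debug/tracing (debugfs mounted), and the tracing part needs root.

```sh
# Sketch: copy every readable <proc_dir>/*info file into out_dir/snap-N,
# count times, interval seconds apart. Names and layout are made up here.
snapshot_info() (
    out_dir=$1; count=${2:-2}; interval=${3:-5}; proc_dir=${4:-/proc}
    i=0
    while [ "$i" -lt "$count" ]; do
        dest="$out_dir/snap-$i"
        mkdir -p "$dest" || exit 1
        for f in "$proc_dir"/*info; do
            # Some files (e.g. slabinfo) are root-only; skip what we cannot read.
            if [ -f "$f" ] && [ -r "$f" ]; then
                cat "$f" > "$dest/${f##*/}"
            fi
        done
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    done
)

# Sketch: enable only the vmscan trace events, record for a while,
# save the trace buffer to out_file, then switch the events off again.
trace_vmscan() (
    out_file=$1; seconds=${2:-10}; tracing=${3:-/sys/kernel/debug/tracing}
    echo 1 > "$tracing/events/vmscan/enable" || exit 1
    echo 1 > "$tracing/tracing_on" || exit 1
    sleep "$seconds"
    echo 0 > "$tracing/tracing_on"
    cat "$tracing/trace" > "$out_file"
    echo 0 > "$tracing/events/vmscan/enable"
)

# Usage (as root, while kswapd0 is spinning):
#   snapshot_info /tmp/kswapd-debug 2 5
#   trace_vmscan /tmp/kswapd-debug/vmscan.trace 10
```

The last argument of each function only exists so the source directory can be pointed somewhere else for a dry run.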
To recover from the situation, enough memory has to be freed in a short period of time. Sometimes dropping caches helps, sometimes I needed to close applications/containers as well, but I never had to reboot to recover.
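For reference, dropping caches goes through /proc/sys/vm/drop_caches. A minimal sketch follows; the wrapper name is made up, defaulting to mode 3 is an assumption rather than a statement of which mode was used here, and writing to the real file needs root.

```sh
# Sketch: flush dirty pages, then ask the kernel to drop clean caches.
# mode: 1 = page cache, 2 = dentries and inodes, 3 = both.
drop_caches() (
    mode=${1:-3}; ctl=${2:-/proc/sys/vm/drop_caches}
    case $mode in
        1|2|3) ;;
        *) echo "mode must be 1, 2 or 3" >&2; exit 2 ;;
    esac
    # Dirty pages cannot be dropped, so write them back first.
    sync
    echo "$mode" > "$ctl"
)

# Usage (as root):
#   drop_caches 3
```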