We're getting zombies here which aren't being reaped:
130428 ? Z 0:00 [stress-ng-brk] <defunct>
130432 ? Z 0:00 [stress-ng-brk] <defunct>
130434 ? Z 0:00 [stress-ng-brk] <defunct>
130436 ? Z 0:00 [stress-ng-brk] <defunct>
The reason for this is that memory stressors like brk have a parent that forks off a child. The child performs the stressing and, if it gets OOM'd, the parent can spawn another stressor. So I think the SIGKILL on the stress-ng brk stressor is killing the parent, but the child (which is still holding onto a load of memory on the heap) is not being waited for and hence is left in a memory-hogging zombie state. We may be in a pathologically memory-hogging state because the zombies may be holding brk regions that have been swapped out to disk due to memory pressure, so we're hitting a low-memory state that is not being cleared up.
I suggest modifying the test bash script as follows:
1. run stress-ng with the -k flag (so that all the processes keep the same stress-ng name)
2. kill with ALRM first
3. after a small grace period, kill all the remaining stress-ng processes with KILL
4. report on any unkillable stressors
Refer to the changes I made to https://launchpadlibrarian.net/296974522/disk_stress_ng
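For illustration only, the ALRM-then-KILL sequence suggested above could be sketched in bash roughly as follows. This is not the content of the linked script; the function name stop_stressors, the default 10 second grace period and the use of pkill/pgrep are assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical helper: stop_stressors NAME [GRACE_SECONDS]
# Sends SIGALRM to every process called NAME, waits up to GRACE_SECONDS,
# then sends SIGKILL to whatever is left and reports any survivors.
stop_stressors() {
    local name="$1" grace="${2:-10}" i

    # Ask the stressors to stop with SIGALRM first, so that parents get a
    # chance to wait for their children before exiting.
    pkill -ALRM -x "$name" 2>/dev/null || true

    # Grace period: poll once a second until no matching process remains.
    for ((i = 0; i < grace; i++)); do
        pgrep -x "$name" >/dev/null || return 0
        sleep 1
    done

    # Anything still present after the grace period gets SIGKILL.
    pkill -KILL -x "$name" 2>/dev/null || true
    sleep 1

    # Report processes that are still listed under that name.
    if pgrep -x "$name" >/dev/null; then
        echo "unkillable $name processes:" >&2
        pgrep -l -x "$name" >&2
        return 1
    fi
    return 0
}
```

With the stressors started under -k, for example `stress-ng -k --brk 4 --timeout 60 &`, every process keeps the name stress-ng, so a single `stop_stressors stress-ng 10` covers parents and children alike. A non-zero return means something survived KILL and is worth reporting in the test log.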