Processes in "D" state due to zap_pid_ns_processes kernel call with Ubuntu + Docker
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Seth Forshee |
Xenial | Fix Released | Medium | Seth Forshee |
Yakkety | Fix Released | Medium | Seth Forshee |
Zesty | Fix Released | Medium | Seth Forshee |
Bug Description
SRU Justification
Impact: In some cases, Docker processes can become stuck in the D (uninterruptible sleep) state after a container has terminated. They remain in this state until the host is rebooted.
Fix: Cherry pick upstream commit b9a985db98961ae
Test case: See below.
Regression potential: Low. This is a simple change, and the patch has already shipped in upstream stable kernels.
---
(please refer to https:/
Precondition: Ubuntu 16.04.2 with Docker 17.03 (kernel 4.4)
Steps to reproduce:
- Install latest Docker
- Run 300 containers with health check (for i in {1..300}; do docker run -d -it --restart=always --name poc_$i talves/health_poc; done)
- Send a termination signal to the containers (docker kill -s TERM $(docker ps -q))
- A few processes will be stuck in "uninterruptible sleep" ("D" state). The only known way to recover from this is a host reboot
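To check for the stuck processes after the last step, something like the following sketch may help. It is not part of the original report: the function names `filter_d_state` and `list_d_state` are hypothetical, and it assumes a procps-style `ps` that accepts `-eo pid,stat,wchan:32,comm`. The WCHAN column may show the kernel function the process is sleeping in, which on an affected kernel would be expected to point at `zap_pid_ns_processes`.

```shell
#!/bin/sh
# Hypothetical helper: keep the header line plus every row whose STAT
# column (second field) starts with "D" (uninterruptible sleep).
# Reads ps-style output on stdin so it can be checked against canned input.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical helper: list D-state processes on the current host.
# Assumes procps ps; wchan:32 widens the column so long kernel
# function names are not truncated.
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

Running `list_d_state` a few minutes after the `docker kill` step should print only the header on a healthy host. Rows that persist across repeated runs are candidates for the problem described here.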
Expected behavior:
- All containers should be terminated without any dangling process
Actual behavior:
- Some processes are left in "D" state. In our production environment this leads over time to performance degradation and maintenance issues due to containers that cannot be stopped / removed.
A fix is available in kernel 4.12. It would be good if it could be backported and included in the next Ubuntu release of the supported kernel.
Thanks in advance
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 May 29 16:54 seq
crw-rw---- 1 root audio 116, 33 May 29 16:54 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse:
Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Cannot stat file /proc/11652/fd/4: Stale file handle
Cannot stat file /proc/11652/fd/5: Stale file handle
Cannot stat file /proc/11652/fd/6: Stale file handle
Cannot stat file /proc/11652/fd/7: Stale file handle
Cannot stat file /proc/11652/fd/11: Stale file handle
DistroRelease: Ubuntu 16.04
Ec2AMI: ami-45b69e52
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: t2.large
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB: 0 cirrusdrmfb
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.157.10
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial ec2-images
Uname: Linux 4.4.0-78-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
_MarkForUpload: True
dmi.bios.date: 02/16/2017
dmi.bios.vendor: Xen
dmi.bios.version: 4.2.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:
dmi.product.name: HVM domU
dmi.product.
dmi.sys.vendor: Xen
Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Seth Forshee (sforshee)
description: updated
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1698264
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.