OOM and High CPU utilization in update_blocked_averages because of too many cfs_rqs in rq->leaf_cfs_rq_list
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Xenial | Fix Released | Medium | Gavin Guo | |
Bug Description
[Impact]
CPU utilization stays high, and the flamegraph[1] shows that the CPU
is busy updating the load average in the for loop inside
update_
the decayed cfs_rqs are not released.
[Fix]
commit a9e7f6544b9cebd
Author: Tejun Heo <email address hidden>
Date: Tue Apr 25 17:43:50 2017 -0700
sched/fair: Fix O(nr_cgroups) in load balance path
Currently, rq->leaf_
live cfs_rqs which have ever been active on the CPU; unfortunately,
this makes update_
scalable at all.
This shows up as a small CPU consumption and scheduling latency
increase in the load balancing path in systems with CPU controller
enabled across most cgroups. In an edge case where temporary cgroups
were leaking, this caused the kernel to consume good several tens of
percents of CPU cycles running update_
taking multiple millisecs.
This patch fixes the issue by taking empty and fully decayed cfs_rqs
off the rq->leaf_
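To make the O(nr_cgroups) cost and the effect of the fix concrete, here is a toy Python model. It is not kernel code, and everything in it is a hypothetical simplification: each cgroup's per-CPU cfs_rq is reduced to a task count and an integer blocked-load value, every call to `update_blocked_averages_model()` is assumed to advance time by exactly one load half-life, and each of 3000 short-lived sessions is assumed to leave one idle cgroup behind. The real `update_blocked_averages()` walks `struct cfs_rq` entries and uses the kernel's own PELT decay arithmetic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CfsRq:
    """Hypothetical stand-in for one cgroup's cfs_rq on a single CPU."""
    name: str
    nr_running: int = 0  # tasks currently queued on this cfs_rq
    load_sum: int = 0    # blocked load that still has to be decayed


def is_decayed(cfs_rq: CfsRq) -> bool:
    """Empty and fully decayed: the cfs_rq no longer contributes any load."""
    return cfs_rq.nr_running == 0 and cfs_rq.load_sum == 0


def update_blocked_averages_model(leaf_list: List[CfsRq], prune: bool) -> int:
    """Decay every cfs_rq on the leaf list and return how many were visited.

    prune=False mimics the pre-fix behaviour: entries stay on the list forever.
    prune=True mimics the fix: empty, fully decayed entries are unlinked.
    """
    for cfs_rq in leaf_list:
        cfs_rq.load_sum //= 2  # assumption: each call advances one half-life
    visited = len(leaf_list)
    if prune:
        leaf_list[:] = [c for c in leaf_list if not is_decayed(c)]
    return visited


def simulate(prune: bool, sessions: int = 3000) -> List[int]:
    """Each short-lived session adds one cgroup whose task has already exited."""
    leaf_list: List[CfsRq] = []
    visits = []
    for i in range(sessions):
        leaf_list.append(CfsRq(f"session-{i}", nr_running=0, load_sum=1024))
        visits.append(update_blocked_averages_model(leaf_list, prune))
    return visits


print(simulate(prune=False)[-1])  # 3000: every leaked cfs_rq is walked on every call
print(simulate(prune=True)[-1])   # 11: only recently used cfs_rqs are still listed
```

Without pruning, the work per call grows with every cgroup that was ever active on the CPU; with pruning it is bounded by how many cgroups were active within the last few half-lives, which is consistent with the periodic decrease of cfs_rqs reported in the [Test] section.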
[Test]
1) Run the script:
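#!/bin/bash
# Start 10 parallel loops, each performing 3000 short ssh logins to localhost.
# -S none disables ssh connection sharing, so every iteration is a fresh login.
for i in $(seq 1 10); do
( for j in $(seq 1 3000); do ssh -S none u@localhost date; done; echo "done $i" ) &
done
wait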
2) Observe the number of cfs_rqs:
$ watch -n1 "grep cfs_rq /proc/sched_debug | wc -l"
3) Observe the CPU utilization rate:
$ sudo htop
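Steps 2) and 3) can also be combined into one sampling loop that logs the cfs_rq count next to available memory, which makes an unbounded climb easier to spot than in `watch`. This is a minimal Python sketch under stated assumptions: it reads `/proc/sched_debug` and the `MemAvailable` line of `/proc/meminfo` as plain text, counts matching lines the same way the `grep cfs_rq ... | wc -l` pipeline does, and flags growth with a hypothetical `is_growing()` heuristic whose window size is chosen arbitrarily.

```python
import re
import time
from typing import List, Tuple


def count_cfs_rq_lines(sched_debug: str) -> int:
    """Same number as `grep cfs_rq /proc/sched_debug | wc -l`."""
    return sum(1 for line in sched_debug.splitlines() if "cfs_rq" in line)


def mem_available_kb(meminfo: str) -> int:
    """Parse the MemAvailable field (in kB) from /proc/meminfo text."""
    match = re.search(r"^MemAvailable:\s+(\d+) kB", meminfo, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found in meminfo text")
    return int(match.group(1))


def is_growing(counts: List[int], window: int = 5) -> bool:
    """Hypothetical heuristic: the last `window` samples never decrease
    and end higher than they started."""
    if len(counts) < window:
        return False
    tail = counts[-window:]
    return all(a <= b for a, b in zip(tail, tail[1:])) and tail[-1] > tail[0]


def sample(sched_path: str = "/proc/sched_debug",
           meminfo_path: str = "/proc/meminfo") -> Tuple[int, int]:
    """Return (number of cfs_rq lines, MemAvailable in kB)."""
    with open(sched_path) as f:
        n = count_cfs_rq_lines(f.read())
    with open(meminfo_path) as f:
        avail = mem_available_kb(f.read())
    return n, avail


def monitor(samples: int = 60, interval: float = 1.0) -> None:
    """Print one line per sample while the reproduction script is running."""
    counts: List[int] = []
    for _ in range(samples):
        n, avail = sample()
        counts.append(n)
        flag = "  <- still growing" if is_growing(counts) else ""
        print(f"{time.strftime('%H:%M:%S')} cfs_rq={n} MemAvailable={avail} kB{flag}")
        time.sleep(interval)
```

Call `monitor()` from a second terminal while the script from step 1) is running. On an affected kernel the count keeps climbing while MemAvailable shrinks; on the patched kernel the count should dip periodically instead.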
With the patched kernel[2], the CPU utilization rate is normal, the number
of cfs_rqs decreases periodically, and memory usage stays bounded.
[Reference]
[1]. http://
[2]. https:/
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-done-xenial; removed: verification-needed-xenial
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1747896
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.