[KVM] Lower the default for halt_poll_ns to 200000 ns
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | | | |
Xenial | Fix Released | Medium | Unassigned |
Zesty | Won't Fix | Medium | Unassigned |
Bug Description
[Environment]
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
Linux porygon 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[Description]
We've identified constantly high (~90%) system time on the host
when a VCPU in a KVM guest repeatedly enters and leaves the halt/idle state
at a constant frequency, with each idle period usually slightly shorter than the
default polling period.
The halt polling mechanism is intended to reduce latency in cases
where the guest is resumed quickly, by saving a call to the scheduler.
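The trade-off described above can be illustrated with a deliberately simplified, fixed-window model. This is only a sketch and not the kernel's implementation: the function names and the 300000 ns idle period are hypothetical, and real kernels also adjust the per-VCPU polling window dynamically, which this model ignores. It therefore is not expected to reproduce the percentages measured in this report.

```python
from typing import Tuple


def simulate_halt(idle_ns: int, poll_window_ns: int) -> Tuple[int, bool]:
    """Model one guest halt (hypothetical helper, not kernel code).

    Returns (nanoseconds spent busy-polling on the host CPU,
             whether the scheduler had to be invoked).
    """
    if idle_ns <= poll_window_ns:
        # The wakeup arrives while the host is still polling,
        # so the call to the scheduler is saved.
        return idle_ns, False
    # The polling window expires first: the host stops polling
    # and hands the CPU back through the scheduler.
    return poll_window_ns, True


def polling_share(idle_ns: int, poll_window_ns: int) -> float:
    """Fraction of one idle period that the host spends polling."""
    if idle_ns <= 0:
        raise ValueError("idle_ns must be positive")
    polled_ns, _ = simulate_halt(idle_ns, poll_window_ns)
    return polled_ns / idle_ns


if __name__ == "__main__":
    idle_ns = 300_000  # hypothetical idle period, just under the 400000 ns default
    for window_ns in (400_000, 200_000):
        polled_ns, scheduled = simulate_halt(idle_ns, window_ns)
        print(window_ns, polled_ns, scheduled, polling_share(idle_ns, window_ns))
```

In this model, an idle period slightly shorter than the window is the worst case for host CPU usage: the whole idle period is spent polling.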
We've performed some testing by adjusting the /sys/module/
value, which defines the maximum time that should be spent polling before calling the
scheduler to allow it to run other tasks (it defaults to 400000 ns in Ubuntu).
With the default value, the tests show that the load remains at nearly 90% on a
VCPU that has a single task in the run queue.
We've also tested changing the halt_poll_ns value to 200000 ns, and the results
seem to show system time usage dropping from 90% to ~25%.
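As a rough sketch of how such a test might be scripted, the helpers below read and write a module parameter file. The function names are hypothetical, and the file location is passed in by the caller because the sysfs path is truncated in this report. Writing to a real parameter file normally requires root, and the change does not persist across reboots.

```python
from pathlib import Path


def read_halt_poll_ns(param_file: Path) -> int:
    """Return the current value stored in the parameter file, in ns."""
    return int(param_file.read_text().strip())


def write_halt_poll_ns(param_file: Path, value_ns: int) -> None:
    """Write a new polling limit (in ns) to the parameter file."""
    if value_ns < 0:
        raise ValueError("value_ns must not be negative")
    param_file.write_text(f"{value_ns}\n")


def halve_halt_poll_ns(param_file: Path) -> int:
    """Halve the current value (e.g. 400000 -> 200000) and return it."""
    new_value = read_halt_poll_ns(param_file) // 2
    write_halt_poll_ns(param_file, new_value)
    return new_value
```

Keeping the path as an argument also makes it possible to try the helpers against a scratch file before touching the live parameter.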
root@porygon:
root@porygon:
Linux 4.4.0-112-generic (porygon) 01/24/2018 _x86_64_ (64 CPU)
02:06:08 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
02:06:09 PM 6 0.00 0.00 4.85 0.00 0.00 0.00 0.00 16.50 0.00 78.64
[...]
Average: 6 0.00 0.00 4.26 0.00 0.00 0.00 0.00 17.83 0.00 77.91
root@porygon:
root@porygon:
Linux 4.4.0-112-generic (porygon) 01/24/2018 _x86_64_ (64 CPU)
02:06:20 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
02:06:21 PM 6 0.00 0.00 87.13 0.00 0.00 0.00 0.00 11.88 0.00 0.99
[...]
Average: 6 0.00 0.00 89.59 0.00 0.00 0.00 0.00 8.45 0.00 1.96
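To compare two mpstat runs like the ones above without reading the columns by eye, the "Average:" row can be parsed into a mapping from column name to value. This is a minimal sketch: the function name is hypothetical, and it assumes the column layout shown in the samples above (a header row containing "CPU" and "%sys", then an "Average:" row whose second field is the CPU number).

```python
from typing import Dict


def parse_mpstat_average(report: str) -> Dict[str, float]:
    """Return {column name: value} for the first 'Average:' data row."""
    names = None
    for line in report.splitlines():
        tokens = line.split()
        if "CPU" in tokens and "%sys" in tokens:
            # Header row: the metric names follow the "CPU" column.
            names = tokens[tokens.index("CPU") + 1:]
        elif tokens and tokens[0] == "Average:" and names is not None:
            # Data row: skip the "Average:" label and the CPU number.
            values = [float(v) for v in tokens[2:]]
            return dict(zip(names, values))
    raise ValueError("no 'Average:' row found after an mpstat header")


if __name__ == "__main__":
    # Lines copied from the second sample in this report.
    sample = """\
02:06:20 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
02:06:21 PM 6 0.00 0.00 87.13 0.00 0.00 0.00 0.00 11.88 0.00 0.99
Average: 6 0.00 0.00 89.59 0.00 0.00 0.00 0.00 8.45 0.00 1.96
"""
    average = parse_mpstat_average(sample)
    print(average["%sys"], average["%guest"], average["%idle"])
```

Running the parser on both captures and comparing the "%sys" and "%idle" entries gives the same comparison that the report makes by inspection.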
[Reproducer]
1) Configure a KVM guest with a single pinned VCPU.
2) Run the following program (http://
$ gcc test.c -lpthread -o test && ./test 250 0
3) Run mpstat on the host for the pinned CPU and compare the stats
$ sudo mpstat 1 -P 6 5
[Fix]
Change the halt polling max time to half of the current value.
In some fio benchmarks, halt_poll_ns=400000 caused CPU utilization to
increase heavily even in cases where the performance improvement was
small. In particular, bandwidth divided by CPU usage was as much as
60% lower.
To some extent this is the expected effect of the patch, and the
additional CPU utilization is only visible when running the
benchmarks. However, halving the threshold also halves the extra
CPU utilization (from +30-130% to +20-70%) and has no negative
effect on performance.
Signed-off-by: Paolo Bonzini <email address hidden>
* https:/
tags: added: sts
description: updated
Changed in linux (Ubuntu Zesty):
status: Triaged → Won't Fix
Changed in linux (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-done-xenial removed: verification-needed-xenial
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Xenial)
Changed in linux (Ubuntu Xenial):
status: New → Fix Released
no longer affects: linux (Ubuntu Groovy)
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
Do you plan on submitting an SRU request to the kernel team mailing list?