acpi_pad consumes 100% of resources
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Nvidia | New | Undecided | Unassigned |
linux (Ubuntu) | Incomplete | Undecided | Unassigned |
Bug Description
acpi_pad will take up 100% of the CPU resources and slow the system to a crawl. 'rmmod acpi_pad' removes the offender and brings the system response back.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20765 root 20 0 0 0 0 R 100.0 0.0 5:07.99 xhpl
20879 root -2 0 0 0 0 R 100.0 0.0 7:12.40 acpi_pad/5
20887 root -2 0 0 0 0 R 100.0 0.0 6:57.72 acpi_pad/13
20891 root -2 0 0 0 0 R 100.0 0.0 7:05.74 acpi_pad/17
20874 root -2 0 0 0 0 R 100.0 0.0 7:15.16 acpi_pad/0
20875 root -2 0 0 0 0 R 100.0 0.0 7:14.76 acpi_pad/1
20876 root -2 0 0 0 0 R 100.0 0.0 7:13.54 acpi_pad/2
20877 root -2 0 0 0 0 R 100.0 0.0 7:13.54 acpi_pad/3
20880 root -2 0 0 0 0 R 100.0 0.0 7:11.44 acpi_pad/6
20881 root -2 0 0 0 0 R 100.0 0.0 7:11.17 acpi_pad/7
20882 root -2 0 0 0 0 R 100.0 0.0 7:05.42 acpi_pad/8
20883 root -2 0 0 0 0 R 100.0 0.0 7:10.80 acpi_pad/9
20884 root -2 0 0 0 0 R 100.0 0.0 7:09.50 acpi_pad/10
20885 root -2 0 0 0 0 R 100.0 0.0 7:09.66 acpi_pad/11
20888 root -2 0 0 0 0 R 100.0 0.0 7:07.30 acpi_pad/14
20889 root -2 0 0 0 0 R 100.0 0.0 7:07.37 acpi_pad/15
20890 root -2 0 0 0 0 R 100.0 0.0 7:05.50 acpi_pad/16
20892 root -2 0 0 0 0 R 100.0 0.0 7:04.40 acpi_pad/18
20893 root -2 0 0 0 0 R 100.0 0.0 7:04.21 acpi_pad/19
20894 root -2 0 0 0 0 R 100.0 0.0 7:03.70 acpi_pad/20
20895 root -2 0 0 0 0 R 100.0 0.0 7:03.63 acpi_pad/21
20896 root -2 0 0 0 0 R 100.0 0.0 7:01.61 acpi_pad/22
20897 root -2 0 0 0 0 R 100.0 0.0 7:01.66 acpi_pad/23
20898 root -2 0 0 0 0 R 100.0 0.0 7:00.80 acpi_pad/24
20899 root -2 0 0 0 0 R 100.0 0.0 7:00.81 acpi_pad/25
20901 root -2 0 0 0 0 R 100.0 0.0 6:58.79 acpi_pad/26
20902 root -2 0 0 0 0 R 100.0 0.0 6:58.96 acpi_pad/27
20903 root -2 0 0 0 0 R 100.0 0.0 6:57.82 acpi_pad/28
20904 root -2 0 0 0 0 R 100.0 0.0 6:57.83 acpi_pad/29
20906 root -2 0 0 0 0 R 100.0 0.0 6:55.54 acpi_pad/31
20886 root -2 0 0 0 0 R 99.7 0.0 7:08.80 acpi_pad/12
20878 root -2 0 0 0 0 R 98.4 0.0 7:12.20 acpi_pad/4
20905 root -2 0 0 0 0 R 98.4 0.0 6:55.85 acpi_pad/30
3049 newrelic 20 0 245800 8388 4724 S 22.3 0.0 0:14.74 nrsysmond
22126 root 20 0 19592 3876 2392 R 6.0 0.0 0:00.99 top
1441 root 39 19 0 0 0 S 3.4 0.0 3:05.47 kipmi0
20720 root 20 0 870276 13080 6208 S 1.6 0.0 0:01.50 collectd
8 root 20 0 0 0 0 S 0.9 0.0 0:03.19 rcu_sched
13 root rt 0 0 0 0 S 0.3 0.0 0:00.03 watchdog/0
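A quick way to check for the symptom shown in the listing above is to count the acpi_pad kernel threads in a process listing. This is only a sketch: the function name `count_acpi_pad_threads` is made up for illustration, and it assumes the thread names look like `acpi_pad/5`, as they do in the output above.

```shell
# Count acpi_pad kernel threads in a process listing read from stdin.
# Matches thread names such as "acpi_pad/5" (hypothetical helper, not part of any tool).
count_acpi_pad_threads() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'acpi_pad/[0-9]' || true
}
```

For example, `ps -eo pid,pcpu,comm | count_acpi_pad_threads` prints the number of such threads. A non-zero count while the system is sluggish suggests the module is active, in which case `rmmod acpi_pad` may be worth trying.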
This has been seen on the 4.2 and 4.4 kernels. I believe the LINPACK test suite was running in all cases this was seen. However, it occurs pretty infrequently, and I don't know how to reliably recreate the issue. It has only been seen on the DGX-1 Server, not on the DGX Station. I'm not sure if any other systems have seen it.
Another data point which may or may not be relevant is that C-states and P-states are enabled.
We can work around this issue by blacklisting the acpi_pad module, or by using the acpi_pad.disable=1 kernel boot argument. What are the implications of disabling acpi_pad?
Googling "acpi_pad uses up all the resource" returns many hits, all of which suggest simply disabling it.
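The blacklisting workaround mentioned above could look roughly like the sketch below. The helper name `write_acpi_pad_blacklist` and the file name in the usage note are assumptions chosen for illustration; modprobe reads any `*.conf` file under `/etc/modprobe.d/`.

```shell
# Write a modprobe blacklist entry for acpi_pad to the file given as $1.
# Hypothetical helper; the target path is chosen by the caller.
write_acpi_pad_blacklist() {
    printf 'blacklist %s\n' acpi_pad > "$1"
}
```

On an Ubuntu system this might be used as root with `write_acpi_pad_blacklist /etc/modprobe.d/blacklist-acpi_pad.conf`, followed by `update-initramfs -u` and a reboot. A `blacklist` entry only stops the module from being loaded automatically; an explicit `modprobe acpi_pad` would still load it. The acpi_pad.disable=1 boot argument is quoted from the report as-is and is not covered by this sketch.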
information type: | Proprietary → Public |
information type: | Public → Private |
information type: | Private → Private Security |
information type: | Private Security → Public |
tags: | added: cscc |
The Processor Aggregator Device was introduced in ACPI 4.0 as a mechanism for platforms to ask the OS to force processors into a (power-saving) idle state, in order to reduce the platform's power consumption.
The ACPI spec describes it as follows:
"The following section describes the definition and operation of the optional Processor Aggregator device. The Processor Aggregator Device provides a control point that enables the platform to perform specific processor configuration and control that applies to all processors in the platform. The Plug and Play ID of the Processor Aggregator Device is ACPI000C."
The acpi_pad driver is what forces processors into idle when the platform requests it via a notification and the ACPI _PUR method.
With the acpi_pad driver disabled, the processors will not be forced idle when the platform requests it, so the power reduction the platform asked for will not happen.