Comment 8 for bug 1803179

peter (peter-linux-kernel-bugs) wrote:

Created attachment 232611
dmesg for v4.7-rc5 (triggered runtime-resume via writing "on" to (nvidia device)/power/control)

See also https://www.spinics.net/lists/linux-pci/msg53694.html ("Kernel Freeze with American Megatrends BIOS") for more details (acpidump, lspci, some analysis, etc.).

Steps to reproduce on the affected machines:

 1. Load nouveau.
 2. Wait for it to runtime suspend.
 3. Invoke 'lspci'; this resumes the Nvidia PCI device via nouveau. (Alternatively: write "on" to /sys/bus/pci/devices/0000:01:00.0/power/control)
 4. lspci never returns; a few moments later an AML_INFINITE_LOOP is reported.
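
The alternative trigger in the resume step can be sketched as a small Python helper. This is only an illustration, assuming the device address 0000:01:00.0 from the steps above and the standard runtime PM sysfs attributes (power/control and power/runtime_status). The function names are hypothetical. On an affected machine the write is expected to hang, so treat it as a reproducer, not a fix.

```python
from pathlib import Path


def pm_paths(device_dir):
    """Return the (control, runtime_status) sysfs paths for a PCI device dir."""
    power = Path(device_dir) / "power"
    return power / "control", power / "runtime_status"


def runtime_status(device_dir):
    """Read the runtime PM state, e.g. 'suspended' or 'active'."""
    return pm_paths(device_dir)[1].read_text().strip()


def force_resume(device_dir):
    """Write "on" to power/control, which forces a runtime resume.

    Needs root on a real system. On the affected laptops this call
    may never return.
    """
    control, _ = pm_paths(device_dir)
    control.write_text("on")


# Example use (as root, after nouveau has runtime suspended the GPU):
#   dev = "/sys/bus/pci/devices/0000:01:00.0"
#   print(runtime_status(dev))
#   force_resume(dev)
```

Checking runtime_status before the write confirms that the device really was suspended, which is the precondition for hitting the hang.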

Affected machines from
https://github.com/Bumblebee-Project/Bumblebee/issues/764#issuecomment-234494238
- Clevo P651RA (and other Clevo P6xxRx models).
- MSI GE62 Apache Pro
- Gigabyte P35V5
- Razer Blade 14" (2016)
- Dell Inspiron 7559

These *new* laptops all have a Skylake CPU (i7-6700HQ) and an Nvidia GTX 9xxM GPU. Originally the problem was only observed on laptops with AMI BIOSes, but later we found a Dell laptop as well. The workaround acpi_osi="!Windows 2015" prevents Linux from reporting Windows 10 compatibility and helps *in some cases* because the ACPI code falls back to a different approach to power on the device (or PCIe link?).
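
When collecting reports from affected machines it helps to know whether the acpi_osi workaround was active for a given boot. The following is a minimal sketch, assuming the kernel command line has been read from /proc/cmdline. The helper names are hypothetical. shlex is used so that both quoting forms, acpi_osi="!Windows 2015" and "acpi_osi=!Windows 2015", are treated as one token.

```python
import shlex


def acpi_osi_values(cmdline):
    """Return every value passed via acpi_osi= on a kernel command line."""
    values = []
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if key == "acpi_osi" and sep:
            values.append(value)
    return values


def has_win10_workaround(cmdline):
    """True if the command line disables the "Windows 2015" _OSI string."""
    return "!Windows 2015" in acpi_osi_values(cmdline)


# Example use on a running system:
#   with open("/proc/cmdline") as f:
#       print(has_win10_workaround(f.read()))
```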

Attached is one of the more interesting dmesg dumps that could be obtained; it shows how the system breaks down over time. (This was v4.7-rc5 with PCI/PM D3cold + nouveau power resource/PM refcount leak patches, but the problem was also visible on unpatched 4.4.0, for example.)