4.2 (linux-generic-lts-wily) kernel hangs on Bay Trail based Asus X553MA

Bug #1531865 reported by Janne Heikkinen
This bug affects 12 people
Affects         Status   Importance  Assigned to  Milestone
linux (Debian)  New      Undecided   Unassigned
linux (Ubuntu)  Triaged  Medium      Unassigned

Bug Description

Kernels since 4.2-rc1 contain:

[8fb55197e64d5988ec57b54e973daeea72c3f2ff] drm/i915: Agressive downclocking on Baytrail

which causes hangs on my Bay Trail based Asus X553MA and likely on other Bay Trail based machines
as well. The hangs occur anywhere from minutes to hours after booting, and nothing related to them
appears in the kernel message buffer.

I found the above commit by bisecting between 4.1.0 and 4.2-rc1 and reported it to the intel-gfx
mailing list:

http://lists.freedesktop.org/archives/intel-gfx/2016-January/084203.html
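Bisection is a binary search over the commits between a known-good release (here 4.1.0) and a known-bad one (4.2-rc1). In practice `git bisect` manages the commit range, and the "test" is booting each candidate kernel and waiting for a hang. The Python sketch below only illustrates the search itself; the commit list and the `boots_and_hangs` test are hypothetical stand-ins for real test boots.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest to newest, the commit before
    commits[0] is good, commits[-1] is bad, and every commit after the
    first bad one is also bad.
    """
    if not commits:
        raise ValueError("empty commit range")
    lo, hi = 0, len(commits) - 1  # invariant: first bad commit is in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # mid is bad: first bad is mid or earlier
        else:
            lo = mid + 1  # mid is good: first bad is after mid
    return commits[lo]


# Hypothetical history of ten commits where c6 introduced the regression.
history = [f"c{i}" for i in range(10)]
tested = []


def boots_and_hangs(commit: str) -> bool:
    tested.append(commit)  # record each simulated "test boot"
    return int(commit[1:]) >= 6


if __name__ == "__main__":
    print(first_bad(history, boots_and_hangs), "after", len(tested), "tests")
    # prints: c6 after 4 tests
```

Each test halves the remaining range, so roughly log2(N) test boots are needed. The weak point for a bug like this one is the test itself: a kernel that survives a few hours may still hang later, so a "good" verdict can be wrong.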

I verified that these hangs occur on 14.04 LTS with the latest linux-generic-lts-wily kernel (4.2.0-23-generic),
exactly as with every kernel from 4.2-rc1 up to 4.4-rc8.

I'm writing this bug report on 4.4-rc8 with the above commit reverted, because
4.2.0-23-generic just hung while I was trying to send the report.
---
ApportVersion: 2.14.1-0ubuntu3.19
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: jamse 2144 F.... pulseaudio
CurrentDesktop: Unity
CurrentDmesg:
 [ 20.017300] init: plymouth-upstart-bridge main process (1462) terminated with status 1
 [ 20.017332] init: plymouth-upstart-bridge main process ended, respawning
DistroRelease: Ubuntu 14.04
EcryptfsInUse: Yes
InstallationDate: Installed on 2014-11-17 (415 days ago)
InstallationMedia: Ubuntu 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.2)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 004: ID 04ca:3010 Lite-On Technology Corp.
 Bus 001 Device 002: ID 04f2:b424 Chicony Electronics Co., Ltd
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: ASUSTeK COMPUTER INC. X553MA
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-23-generic root=UUID=85bd44f0-e67f-450d-bfb5-0f87c90482cd ro quiet splash console=tty0 console=ttyUSB0,115200n8 ignore_loglevel vt.handoff=7
ProcVersionSignature: Ubuntu 4.2.0-23.28~14.04.1-generic 4.2.6
RelatedPackageVersions:
 linux-restricted-modules-4.2.0-23-generic N/A
 linux-backports-modules-4.2.0-23-generic N/A
 linux-firmware 1.127.19
Tags: trusty
Uname: Linux 4.2.0-23-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 08/08/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: X553MA.209
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: X553MA
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK COMPUTER INC.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrX553MA.209:bd08/08/2014:svnASUSTeKCOMPUTERINC.:pnX553MA:pvr1.0:rvnASUSTeKCOMPUTERINC.:rnX553MA:rvr1.0:cvnASUSTeKCOMPUTERINC.:ct10:cvr1.0:
dmi.product.name: X553MA
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK COMPUTER INC.

summary: - 4.2 (linux-generic-ltw-wily) kernel hangs on Bay Trail based Asus X553MA
+ 4.2 (linux-generic-lts-wily) kernel hangs on Bay Trail based Asus X553MA
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1531865

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Janne Heikkinen (janne-m-heikkinen) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected trusty
description: updated
Janne Heikkinen (janne-m-heikkinen) wrote : BootDmesg.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : CRDA.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : IwConfig.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : Lspci.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : ProcCpuinfo.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : ProcEnviron.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : ProcInterrupts.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : ProcModules.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : PulseList.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : RfKill.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : UdevDb.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : UdevLog.txt

apport information

Janne Heikkinen (janne-m-heikkinen) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-da-key vivid wily xenial
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

Since this bug still exists upstream, it might be best to also cc the following people on the message you sent to intel-gfx:

Author: Chris Wilson <email address hidden>
    Cc: Deepak S <email address hidden>
    Cc: Ville Syrjälä <email address hidden>
    Cc: Rodrigo Vivi <email address hidden>
    Cc: Daniel Vetter <email address hidden>

Janne Heikkinen (janne-m-heikkinen) wrote :

I did cc Chris Wilson. After that someone added Deepak S, and Chris Wilson also replied, so I believe it reached the right people.

Thomas Moestl (tmoestl) wrote :

The drm-intel team seems to have concluded that reverting this change does not fix the issue, at least not for everybody, although it may make the hangs much rarer. There is a drm-intel bug report for this issue with all the details:
https://bugs.freedesktop.org/show_bug.cgi?id=88012
After the drm-intel developers concluded that the fault most likely did not lie with the graphics driver, as it first seemed, Daniel Vetter of drm-intel filed a kernel bug assigned to Len Brown (who wrote the intel_idle driver):
https://bugzilla.kernel.org/show_bug.cgi?id=109051

I will nevertheless try the kernel with that change reverted as soon as I get a chance.

Janne Heikkinen (janne-m-heikkinen) wrote :

I had been running 4.4-rc8 with the patch reverted for three days with no problems, but yesterday evening I left an OpenGL application running that puts a heavy load on the graphics pipeline, and the machine hung overnight. Today after booting I ran the OpenGL application for a couple of minutes, and a few minutes later the machine hung. During those three days I had been streaming video from the web for several hours. I'll test whether normal video playback also triggers the hang.

Janne Heikkinen (janne-m-heikkinen) wrote :

Or it could have been something else. I decided to test 4.2.8 with "drm/i915: Agressive downclocking on Baytrail" reverted while running that GPU-intensive application.

Thomas Moestl (tmoestl) wrote :

While testing your kernel image, I have also just had another hang. The hangs seem much less frequent, but they still occur, as others also reported in the freedesktop.org bug report. The symptoms were slightly different this time (the display turned black, which has happened before but is unusual, and the sound card played junk for a short time), but I do think this is still the same problem, since it has never happened during much longer testing with intel_idle.max_cstate=1. (Also, much of the sound hardware seems to be on-chip on Bay Trail processors, so an invalid processor state might conceivably cause other on-board hardware to malfunction as well.)

Janne Heikkinen (janne-m-heikkinen) wrote :

So far it seems that 4.2.8 doesn't have problems with high GPU load.

Janne Heikkinen (janne-m-heikkinen) wrote :

4.2.8 didn't hang overnight with the OpenGL application either.

Some people have reported this hang with 4.1.x kernels too. I wonder whether I've really been so damn lucky as to not have had a single hang with 4.1.13 since November.

Janne Heikkinen (janne-m-heikkinen) wrote :

>Thomas Moestl (tmoestl) wrote 18 hours ago: #21
>
>In my test of your kernel image, I also have just had a hang again.

I have also put the mainline 4.2.8 kernel I've been testing here:

http://www.helsinki.fi/~jmoheikk/baytrail/

I wasn't able to make it hang by running the OpenGL application. Today I played videos with VLC for 4 hours while building a kernel with the "-j 4" option. It seems as stable as any 4.1.x kernel I've used.

Looking at your BootDmesg.txt, it seems that you still have an HDD. I've replaced mine with a Samsung 840 EVO SSD, which is at least one thing that could make my system behave quite differently.

Janne Heikkinen (janne-m-heikkinen) wrote :

I have now had VLC running overnight with 4.2.8, with no hang.

At the moment I'm assuming that with 4.2.8 and "drm/i915: Agressive downclocking on Baytrail" I'm getting the same behavior as with 4.1.13. The hangs may still happen, but something makes them extremely rare for me.

Janne Heikkinen (janne-m-heikkinen) wrote :

I meant:

and "drm/i915: Agressive downclocking on Baytrail" reverted.

Janne Heikkinen (janne-m-heikkinen) wrote :

OK, during the third night the OpenGL application caused a hang with 4.2.8.

I will now begin testing how "intel_idle.max_cstate=1" works :)

Janne Heikkinen (janne-m-heikkinen) wrote :

So far "intel_idle.max_cstate=1" has been working with the Ubuntu 4.2.0-23.28 kernel. Uptime is now 2 days 11 hours. I've been playing DVDs from the DVD drive, left my OpenGL application running while streaming online video overnight, and also used it for normal web browsing.

Janne Heikkinen (janne-m-heikkinen) wrote :

After more than 4 days of uptime with "intel_idle.max_cstate=1", I seemed to have a hang, but that was not the case. The screen was off, and touching or pressing the touchpad didn't bring it back on, so I thought it had hung. Then I tried opening an ssh connection from my workstation to the ASUS X553MA; it succeeded, and I was able to reboot it from the ssh shell.

All of the previous "overnight hangs" without "intel_idle.max_cstate=1" happened like this: in the morning I couldn't wake the machine using the touchpad, so I assumed it had hung. But I should also have tested whether an ssh connection would work.

Janne Heikkinen (janne-m-heikkinen) wrote :

But I just verified that when it does hang with the screen active, it is not reachable over the network with ssh.
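The check described in these comments (a machine whose screen stays dark but which still accepts ssh connections has not really hung) can be scripted from another machine. A minimal Python sketch, assuming sshd listens on TCP port 22 on the laptop; it only tests whether a TCP connection opens, not whether login works, and `port_open` and the hostname in the usage line are hypothetical.

```python
import socket


def port_open(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out
        # (socket.timeout is a subclass of OSError).
        return False


if __name__ == "__main__":
    # Hypothetical hostname for the laptop on the local network.
    print(port_open("x553ma.local"))
```

A False result while the screen is dark would point to a real hang, matching the observation that a hung machine is not reachable over the network; True means the kernel is still running and only the display is off.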

Janne Heikkinen (janne-m-heikkinen) wrote :

I don't have more time to spend on this, so I conclude that the workaround of adding the kernel command line argument "intel_idle.max_cstate=1" solves the issue for me too.
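To confirm that the workaround is actually in effect after changing the boot configuration, one can inspect the running kernel's command line. A minimal Python sketch, assuming a Linux system where `/proc/cmdline` holds the command line the kernel was booted with; `cmdline_param` and `read_proc_cmdline` are hypothetical helper names, and the plain whitespace split does not handle quoted values containing spaces.

```python
from typing import Optional


def cmdline_param(cmdline: str, key: str) -> Optional[str]:
    """Return the value of `key=value` on a kernel command line, or None.

    Tokens are split on whitespace only. If the key appears more than
    once, the last occurrence is returned (assumed here to be the one
    that takes effect). Bare flags such as `quiet` yield None.
    """
    value = None
    for token in cmdline.split():
        name, sep, val = token.partition("=")
        if sep and name == key:
            value = val
    return value


def read_proc_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running Linux kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    try:
        current = cmdline_param(read_proc_cmdline(), "intel_idle.max_cstate")
    except OSError:
        current = None  # not Linux, or /proc is unavailable
    print("intel_idle.max_cstate =", current)
```

For example, `cmdline_param("ro quiet splash intel_idle.max_cstate=1 vt.handoff=7", "intel_idle.max_cstate")` returns `"1"`, while a command line without the parameter returns `None`.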

Vladimír Jícha (jech) wrote :

I'm also affected by this bug. Basically every kernel newer than 3.16 has it; some kernel versions are more stable than others.

The problem is that even the "intel_idle.max_cstate=1" workaround doesn't work on all kernels. I think it is very important to fix this bug. People running Ubuntu 14.04 LTS with the original 3.13 kernel are fine, but once they upgrade to 16.04 and a new kernel, their Bay Trail computers will most likely hang within a few minutes.

justin parker (s0m3f00l) wrote :

I have the same bug on my Dell 3531 with Debian testing, also Bay Trail. I have been going crazy looking for an answer.

justin parker (s0m3f00l) wrote :

FYI, I reported the bug to Debian as well; it was assigned Bug#818330.

justin parker (s0m3f00l) wrote :

The workaround "intel_idle.max_cstate=1" didn't work completely for me. It reduced the frequency of the hangs, but they eventually return with the same symptoms.

RussianNeuroMancer (russianneuromancer) wrote :

Patches for Linux 4.4: https://github.com/hadess/rtl8723bs/tree/master/patches
Patches for Linux 4.5: https://github.com/hadess/rtl8723bs/tree/master/patches_4.5

But this won't be enough (these patches don't cover all hang cases), so more work on this is going on: https://bugzilla.kernel.org/show_bug.cgi?id=109051#c202

micha (kbt-d) wrote :

I got the same freezes with my ASUS F551 (N2930, Bay Trail). No more hangs for 2 days now with intel_idle.max_cstate=1.

Interestingly, I started getting those freezes after updating from Windows 8.1 to Windows 10.
In fact, that's why I landed here: at first sight it looked like a hardware defect to me, so my idea was to try a different OS to verify that.
Even if it sounds a little weird, it looks like the same bug has made it into both Linux AND Windows. Too bad I don't know how to run Windows with a similar boot option; I'd expect it to fix the problem on that side, too.

Maura (mdhausman) wrote :

I'm on 16.04, kernel 4.4, with the Pentium N3530 CPU and Intel Bay Trail graphics, suffering from this same problem. I've applied the workaround and hope it solves the issue, because 16.04 has been pretty much unusable for me.

Janne Heikkinen (janne-m-heikkinen) wrote :

I installed 16.04 weeks ago, and the first hang happened in less than an hour. After that I added the
intel_idle.max_cstate=1 kernel command line parameter, and no hangs have happened since.

carlix (carlixlinux) wrote :

I have the same problem, solved with intel_idle.max_cstate=1.

Vincent Gerris (vgerris) wrote :

It's not really solved; it's a workaround that makes the machine use more power.
Can anyone get Intel to fix this?
I am trying to get people at Intel and Canonical to pick this up.
Let's hope someone acts.

carlix (carlixlinux) wrote :

A lot of mini PCs, x86 tablets and 2-in-1 netbooks have this problem.

Hanno (hzulla) wrote :

There's been progress on this bug, which resulted in this patch:

https://lists.freedesktop.org/archives/intel-gfx/2017-February/120021.html
