[sandybridge-m-gt2+] False GPU lockup IPEHR: 0x3b000000 IPEHR: 0x0b140001

Bug #1154591 reported by Stan Schymanski
This bug affects 2 people
Affects: xserver-xorg-video-intel (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

I am being bombarded with these apports, but whenever I try to identify an existing bug that resembles this one, the information is not uploaded and then I'm being told that it may be a different bug. Therefore I am posting this as a new bug report, hoping that someone will be able to point me to the correct duplicate. This one is marked as False GPU lockup but in reality I had a real one, followed by a lot of apports after reboot, and I was not sure whether these later apports relate to new "false" lockups or the original fatal one.

ProblemType: Crash
DistroRelease: Ubuntu 12.10
Package: xserver-xorg-video-intel 2:2.20.9-0ubuntu2
Uname: Linux 3.8.2-030802-generic x86_64
.tmp.unity.support.test.0:

ApportVersion: 2.6.1-0ubuntu10
Architecture: amd64
Chipset: sandybridge-m-gt2+
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: compiz
Date: Tue Mar 12 12:11:55 2013
DistUpgraded: 2012-10-27 09:30:52,172 DEBUG enabling apt cron job
DistroCodename: quantal
DistroVariant: ubuntu
DuplicateSignature: [sandybridge-m-gt2+] GPU lockup IPEHR: 0x3b000000 IPEHR: 0x0b140001 Ubuntu 12.10
ExecutablePath: /usr/share/apport/apport-gpu-error-intel.py
ExtraDebuggingInterest: Yes
GraphicsCard:
 Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller [8086:0126] (rev 09) (prog-if 00 [VGA controller])
   Subsystem: Dell Device [1028:0492]
InstallationDate: Installed on 2011-06-28 (623 days ago)
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release amd64 (20110427)
InterpreterPath: /usr/bin/python3.2mu
MachineType: Dell Inc. Latitude E6320
MarkForUpload: True
ProcCmdline: /usr/bin/python3 /usr/share/apport/apport-gpu-error-intel.py
ProcEnviron:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.8.2-030802-generic root=UUID=5083e04c-1bad-44bf-a241-c839914a697a ro crashkernel=384M-2G:64M,2G-:128M quiet splash i915.i915_enable_rc6=0
RelatedPackageVersions:
 xserver-xorg 1:7.7+1ubuntu4
 libdrm2 2.4.39-0ubuntu1
 xserver-xorg-video-intel 2:2.20.9-0ubuntu2
SourcePackage: xserver-xorg-video-intel
Title: [sandybridge-m-gt2+] False GPU lockup IPEHR: 0x3b000000 IPEHR: 0x0b140001
UpgradeStatus: Upgraded to quantal on 2012-10-27 (137 days ago)
UserGroups:

dmi.bios.date: 08/15/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A15
dmi.board.name: 087HK7
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 9
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA15:bd08/15/2012:svnDellInc.:pnLatitudeE6320:pvr01:rvnDellInc.:rn087HK7:rvrA00:cvnDellInc.:ct9:cvr:
dmi.product.name: Latitude E6320
dmi.product.version: 01
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.8.6-0ubuntu1
version.ia32-libs: ia32-libs 20090808ubuntu36
version.libdrm2: libdrm2 2.4.39-0ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 9.0.2-0ubuntu0.1
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 9.0.2-0ubuntu0.1
version.xserver-xorg-core: xserver-xorg-core 2:1.13.0-0ubuntu6.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.7.3-0ubuntu2
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.99.99~git20120913.8637f772-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.20.9-0ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.2-0ubuntu3

Stan Schymanski (schymans) wrote :
tags: removed: need-duplicate-check
Chris Wilson (ickle) wrote :

The root cause here is the missing TLB invalidate, fixed in raring.

Changed in xserver-xorg-video-intel (Ubuntu):
status: New → Fix Released
Stan Schymanski (schymans) wrote :

Thanks, Chris! How can I fix it for my system without installing the not-yet released raring?

Chris Wilson (ickle) wrote :

You need a v3.8 kernel as the bug fix wasn't marked for stable.

Stan Schymanski (schymans) wrote :

Not sure why this does not show up in the bug report, but I am already using v3.8.2 kernel.
Does this mean that the fix does not work for my system?

Stan Schymanski (schymans) wrote :

Ok, to draw things together again:
I originally submitted Bug #1153587, where you identified the missing TLB invalidate as the reason and suggested that an upgrade to kernel v3.8 would fix it. I installed Kernel v3.8.2 but the problems continued. You asked to provide /sys/kernel/debug/dri/0/i915_error_state, which I was not able to provide, so instead I submitted this bug report here, hoping that the relevant information would be added automatically. Based on the current crash report you again identified the missing TLB invalidate as the reason and claim that Kernel v3.8 would fix it. In your last comment about Bug #1153587, you mentioned that something might be wrong with rc6. Would deactivating rc6 be a way to break out of this loop? How can I do this?

Chris Wilson (ickle) wrote :

The error state here says Kernel: 3.5.0-26-generic, so I would check the grub menu to make sure it is not defaulting to the older kernel. To get past #1153587, we need i915.i915_enable_rc6=0 - the root cause behind those is still a mystery. Some have been linked to a severe GPU hang, some have been linked to needing a BIOS update, most seem to have no obvious cause. And you are also hitting the semaphore issue, so use i915.semaphores=0 to work around that issue as well. I'm sorry, you seem to have found all of our outstanding issues. :(
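
One way to confirm which kernel and which i915 options are actually in effect is to compare the running kernel release against the contents of /proc/cmdline. The following is only a minimal sketch in Python; the function names are hypothetical and are not part of apport or the i915 driver.

```python
import platform


def parse_kernel_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as "quiet" map to None. Only the first "=" splits
    name from value, so "root=UUID=..." keeps its full value.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def running_kernel_and_params(cmdline_path="/proc/cmdline"):
    """Return (kernel release, parsed boot parameters) for the running system."""
    with open(cmdline_path) as fh:
        return platform.release(), parse_kernel_cmdline(fh.read())
```

Calling running_kernel_and_params() after a reboot should report a 3.8.x release and list both i915.i915_enable_rc6 and i915.semaphores if the intended grub entry was booted. A parameter that is missing from the result never reached the kernel.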

Stan Schymanski (schymans) wrote :

Thanks for putting my issues in context with the rest.
However, I'm still confused, as the 4th line of the above bug report says:
"Uname: Linux 3.8.2-030802-generic x86_64"
I currently have i915.i915_enable_rc6=0 set (see e.g. https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1153587/comments/9) and I had i915.semaphores=0 before. How can I set both at the same time? I set it in /etc/default/grub as GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0". Can I add a second GRUB_CMDLINE_LINUX_DEFAULT?
I am also wondering if the Virtualbox kernel additions might be causing problems. And another thought is that there may also be a problem in apport-gtk, as it really keeps bombarding me with more and more gpu hung apports if I try to submit the first one. If I click on "Cancel" instead, it leaves me alone. Except that the system just gets unresponsive after a while without any crash report...

Chris Wilson (ickle) wrote :

And vbox is not without issues ;-) I'm not sure about the inner workings of apport myself, so I'll let that be. Maybe Bryce could check it over?

In order to add multiple options, just list them inside the single GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0 i915.i915_enable_rc6=0"
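
As a rough illustration of merging options into that single line rather than adding a second assignment, here is a small Python sketch. The helper name add_kernel_options is hypothetical; it only rewrites a line of text and does not touch /etc/default/grub itself. Note that the module prefix for both parameters is i915.

```python
import shlex

KEY = "GRUB_CMDLINE_LINUX_DEFAULT"


def add_kernel_options(line, new_options):
    """Return the GRUB_CMDLINE_LINUX_DEFAULT line with new_options merged in.

    An option whose name (the part before "=") is already present is
    replaced instead of being listed twice.
    """
    key, sep, raw = line.strip().partition("=")
    if key != KEY or not sep:
        raise ValueError(f"not a {KEY} assignment: {line!r}")
    # shlex strips the surrounding quotes; then split on whitespace.
    options = " ".join(shlex.split(raw)).split()
    for opt in new_options:
        name = opt.partition("=")[0]
        options = [o for o in options if o.partition("=")[0] != name]
        options.append(opt)
    joined = " ".join(options)
    return f'{KEY}="{joined}"'


old = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0"'
print(add_kernel_options(old, ["i915.i915_enable_rc6=0"]))
```

After changing /etc/default/grub by hand, the new line only takes effect once sudo update-grub has been run and the machine has been rebooted.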

Stan Schymanski (schymans) wrote :

Thanks heaps! I modified the line in grub accordingly and I am about to reboot. Will let you know if it makes much of a difference.

Stan Schymanski (schymans) wrote :

Unfortunately, an hour later, apport catapulted me back onto this site. So this is with the 3.8.2 kernel and with the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash i915.semaphores=0 i915.i915_enable_rc6=0" in grub.

This time, I cannot find any error messages relating to i915 in the syslog or kernel.log. Not sure why apport sent me to this site here again. I now remembered that I installed some extension to apport in order to help capture kernel errors. All these error reports are generated by /usr/share/apport/apport-gpu-error-intel.py. It contains the line:
attach_file_if_exists(report, '/sys/kernel/debug/dri/0/i915_error_state', 'i915_error_state')

Unfortunately:
root@machine:/usr/share/apport# more /sys/kernel/debug/dri/0/i915_error_state
no error state collected
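
A check like the one above can be scripted so that an empty error state is not mistaken for a captured hang. This is a minimal Python sketch, assuming the debugfs path quoted in this report and the "Kernel:" line that the error state is said to contain; the function names are hypothetical, and reading debugfs normally requires root.

```python
from pathlib import Path

ERROR_STATE = Path("/sys/kernel/debug/dri/0/i915_error_state")
EMPTY_MARKER = "no error state collected"


def has_error_state(text):
    """True only if the dump holds something other than the empty marker."""
    return bool(text.strip()) and EMPTY_MARKER not in text


def read_error_state(path=ERROR_STATE):
    """Return the error state text, or None if absent, unreadable or empty."""
    try:
        text = path.read_text(errors="replace")
    except (FileNotFoundError, PermissionError):
        return None
    return text if has_error_state(text) else None


def kernel_from_error_state(text):
    """Return the value of the first 'Kernel:' line, or None (assumed format)."""
    for line in text.splitlines():
        if line.startswith("Kernel:"):
            return line.split(":", 1)[1].strip()
    return None
```

If read_error_state() returns None, there is nothing useful to attach. If it returns text, kernel_from_error_state() shows which kernel recorded the hang, which can then be compared with the output of uname -r.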

Any more ideas on how I could shed light on this? Otherwise I'll have to try a clean install in the coming days.

Stan Schymanski (schymans) wrote :

UPDATE:
I re-installed Ubuntu 12.10 from scratch on a new hard drive, no virtualbox installed yet and I keep getting both false and real GPU lockups. So it seems like a problem in the 3.5 kernel, which carries through to the 3.8 kernel. Would the problem likely go away if I changed computers and used one with a real graphics card instead of the on-board one?

Chris Wilson (ickle) wrote :

On a second look, that IPEHR indicates the TLB invalidate bug which is indeed fixed in v3.8.
