2.6.27 dell studio 15 resume hang

Bug #289212 reported by Andy Whitcroft
Affects: linux (Ubuntu)
Status: In Progress
Importance: Medium
Assigned to: Andy Whitcroft

Bug Description

I have a Dell Studio 15 system which was recently updated to the Intrepid Ibex Release Candidate. Since that upgrade, suspend/resume and hibernate/restore have both failed shortly after the graphical interface is restored, often with a black screen and only the cursor visible, sometimes with just the bare white square of the password prompt. The machine suspends and resumes correctly on the latest kernel from Hardy Heron (2.6.24) with the exact same Intrepid root filesystem.

I did some debugging on this. If, instead of using the fast-user-switcher to initiate suspend, I sudo to root and run '/etc/acpi/sleep.sh force', the system suspends and partially resumes. Specifically, it comes back up and you can get the X11 console back with ctrl-alt-F7. However, /etc/acpi/sleep.sh has not completed as expected; it is waiting for 'vbetool post', which seems to be running in a tight loop in userspace. Killing vbetool leads to a hang.
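
One way to check whether a process such as vbetool really is spinning in userspace, rather than blocked in the kernel, is to sample its CPU counters from /proc/<pid>/stat twice and compare them. The following is a minimal Python sketch, assuming the standard Linux procfs layout in which utime and stime are fields 14 and 15. The helper names `parse_cpu_ticks` and `read_cpu_ticks` are hypothetical, and this was not part of the debugging described above:

```python
def parse_cpu_ticks(stat_line):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so cut at the last ')' before splitting.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]), int(rest[12])


def read_cpu_ticks(pid, proc_root="/proc"):
    """Read the current (utime, stime) pair for a live process."""
    with open(f"{proc_root}/{pid}/stat") as f:
        return parse_cpu_ticks(f.read())
```

Calling `read_cpu_ticks` twice a second or so apart and seeing utime climb while stime stays flat would be consistent with a tight userspace loop.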

Due to the vbetool interaction and the normal hang with just a cursor visible, I suspect that this is graphics related. We seem to be using the i915 module for DRM.

apw@dm:~$ lsb_release -rd
Description: Ubuntu 8.10
Release: 8.10
apw@dm:~$ apt-cache policy linux
linux:
  Installed: (none)
  Candidate: 2.6.27.7.11
  Version table:
     2.6.27.7.11 0
        500 http://gb.archive.ubuntu.com intrepid/restricted Packages
apw@dm:~$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 4 Series Chipset Memory Controller Hub (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:02.1 Display controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07)
00:1a.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 2 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 4 (rev 03)
00:1c.5 PCI bridge: Intel Corporation 82801I (ICH9 Family) PCI Express Port 6 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 93)
00:1f.0 ISA bridge: Intel Corporation ICH9M LPC Interface Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation ICH9M/M-E SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 03)
04:00.0 Network controller: Intel Corporation PRO/Wireless 5300 AGN [Shiloh] Network Connection
08:00.0 Ethernet controller: Broadcom Corporation NetLink BCM5784M Gigabit Ethernet PCIe (rev 10)
09:01.0 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 05)
09:01.1 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 22)
09:01.2 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev 12)
09:01.3 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 12)
09:01.4 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev ff)

Tags: intrepid
Andy Whitcroft (apw) wrote :

I did some further testing on Hardy's kernel and found that resume is not 100% reliable, but works something like 9 out of 10 times. Further testing on Intrepid's kernel indicates that it does not always crash, working about 1 time in 20. The symptoms of the failure on Hardy's kernel were identical to those seen on Intrepid's.

Changed in linux:
assignee: nobody → ubuntu-kernel-team
importance: Undecided → Medium
status: New → Triaged
Andy Whitcroft (apw) wrote :

This problem is exhibiting under the 64 bit (amd64) port. Now that Intrepid has shipped I will download some live CDs (i386 and amd64) and see if they exhibit the problem.

Stefan Bader (smb) wrote :

Is there any increased fan activity noticeable? There have been other machines with i915 and similar symptoms. On those the fan started to go to full speed after some time, which felt like the driver went into a tight loop (spinlock?). Booting with maxcpus=1 made the problem vanish there (err, rather the symptom), so this seems to be a race...

Andy Whitcroft (apw) wrote :

I can confirm that if I leave the machine in the 'hung' state the fans do indeed come on, which seems to confirm the other findings. Booting with maxcpus=1 sadly changes the problem: resume gets stuck in the first bit of disk activity, with the disk light flickering like it's loading a couple of blocks a second, and the screen never comes back.

Andy Whitcroft (apw)
Changed in linux:
assignee: ubuntu-kernel-team → apw
status: Triaged → In Progress
Andy Whitcroft (apw) wrote :

When experimenting with '/etc/acpi/sleep.sh force' it was found that the system would resume just fine, but that the resume process was locked up performing a vbetool post. This led me to try suspending from a text console. Using 'pmi action suspend' the machine suspends, and resumes just fine to the text console. The act of then switching to the X-server VT causes the lockup.

After waiting a while following resume, networking seems to get reinitialised, which implies network manager is waking up and doing its thing. I can then log in remotely. Trying to take an strace of X to a file left me with nothing in the file. Doing an strace over the network got me less output than is recorded in the X11 log file for the event:

    [...]
    (II) AIGLX: Resuming AIGLX clients after VT switch
    (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x01fff000 (pgoffset 8191)
    (II) intel(0): xf86BindGARTMemory: bind key 1 at 0x03a40000 (pgoffset 14912)
    (II) intel(0): xf86BindGARTMemory: bind key 2 at 0x03a41000 (pgoffset 14913)
    (II) intel(0): xf86BindGARTMemory: bind key 3 at 0x04851000 (pgoffset 18513)
    (II) intel(0): xf86BindGARTMemory: bind key 4 at 0x05661000 (pgoffset 22113)
    (II) intel(0): Fixed memory allocation layout:
    (II) intel(0): 0x00000000-0x0001ffff: ring buffer (128 kB)
    (II) intel(0): 0x00020000-0x00100fff: compressed frame buffer (900 kB, 0x00000000be020000 physical)
    (II) intel(0): 0x00101000-0x0010afff: HW cursors (40 kB)
    (II) intel(0): 0x0010b000-0x00112fff: logical 3D context (32 kB)
    (II) intel(0): 0x00113000-0x00124fff: exa G965 state buffer (72 kB)
    (II) intel(0): 0x00125000-0x00125fff: power context (4 kB)
    (II) intel(0): 0x00200000-0x0100ffff: front buffer (14400 kB) X tiled
    (II) intel(0): 0x01010000-0x03a3ffff: exa offscreen (43200 kB)
    (II) intel(0): 0x01fff000: end of stolen memory
    (II) intel(0): 0x03a40000-0x03a40fff: HW status (4 kB)
    (II) intel(0): 0x03a41000-0x04850fff: back buffer (14400 kB) X tiled
    (II) intel(0): 0x04851000-0x05660fff: depth buffer (14400 kB) Y tiled
    (II) intel(0): 0x05661000-0x07660fff: classic textures (32768 kB)
    (II) intel(0): 0x10000000: end of aperture
    (II) intel(0): using SSC reference clock of 96 MHz
    (II) intel(0): Selecting standard 18 bit TMDS pixel format.

Note that in a regular resume from VT switch we then get:
    (II) intel(0): Output configuration:
    (II) intel(0): Pipe A is off
    (II) intel(0): Display plane A is now disabled and connected to pipe A.
    (II) intel(0): Pipe B is on
    (II) intel(0): Display plane B is now enabled and connected to pipe B.
    (II) intel(0): Output VGA is connected to pipe none
    (II) intel(0): Output LVDS is connected to pipe B
    (II) intel(0): Output HDMI-1 is connected to pipe none
    (II) intel(0): [drm] dma control initialized, using IRQ 2298
    (II) AlpsPS/2 ALPS GlidePoint: x-axis range 0 - 1023
    (II) AlpsPS/2 ALPS GlidePoint: y-axis range 0 - 767
    (--) AlpsPS/2 ALPS GlidePoint touchpad found
    [...]
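
The comparison between the two log excerpts above can be automated. The following is a rough Python sketch, assuming the two marker strings seen in these logs ("Resuming AIGLX clients after VT switch" and "Output configuration:") reliably bracket a successful switch back to X. The function name `resume_completed` is hypothetical:

```python
def resume_completed(log_lines,
                     start_marker="Resuming AIGLX clients after VT switch",
                     done_marker="Output configuration:"):
    """Check whether the last VT-switch resume in an X log finished.

    Returns True if the output configuration step was reached after the
    last resume, False if the log stops short of it, and None if no
    resume was logged at all.
    """
    lines = list(log_lines)
    last_start = None
    for i, line in enumerate(lines):
        if start_marker in line:
            last_start = i
    if last_start is None:
        return None
    return any(done_marker in line for line in lines[last_start + 1:])
```

It could be fed an open log file, for example `resume_completed(open(path))`, where `path` is wherever the X server writes its log on the system in question.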

Some other bugs on Intel graphics have recommended forcing on Pipe A. I tried this with no effect:

    Option "ForceEnablePipeA" "true"

Andy Whitcroft (apw) wrote :

This is very likely related to bug #276943, whose conclusion hints at a concurrency bug in DRI support. That in turn links to a workaround of turning off all but one CPU during suspend:

    http://ubuntuforums.org/showpost.php?p=6105510&postcount=12

This seems to work here, over a few suspend cycles. More testing required.
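
For illustration, the general idea of taking all but one CPU offline can be sketched with the kernel's CPU hotplug files under /sys/devices/system/cpu. This is a minimal Python sketch, not the script from the linked forum post. The function names are hypothetical, and how it would be hooked into the suspend and resume path is left open:

```python
import glob
import os
import re


def secondary_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return sorted CPU numbers, excluding cpu0, that have an 'online' file.

    cpu0 often has no such file because it cannot be hot-unplugged, so it
    is skipped explicitly as well.
    """
    cpus = []
    for path in glob.glob(os.path.join(sysfs_root, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(path))
        if not match or int(match.group(1)) == 0:
            continue
        if os.path.exists(os.path.join(path, "online")):
            cpus.append(int(match.group(1)))
    return sorted(cpus)


def set_secondary_cpus(online, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to each secondary CPU's 'online' file (needs root)."""
    for cpu in secondary_cpus(sysfs_root):
        with open(os.path.join(sysfs_root, f"cpu{cpu}", "online"), "w") as f:
            f.write("1" if online else "0")
```

Under these assumptions, `set_secondary_cpus(False)` would be run before suspending and `set_secondary_cpus(True)` after resume has completed.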
