Frequent hard reboots on RaptorLake

Bug #2022981 reported by Shane McKee
This bug affects 1 person
Affects: Ubuntu
Status: Incomplete
Importance: Undecided
Assigned to: Unassigned

Bug Description

I can repro in multiple ways:

1. Unplug my HDMI monitor. This is pretty much a guaranteed reboot.
2. Using hardware-accelerated Chromium, switching between Google profiles seems to trigger a crash and full reboot pretty often.

These could be two separate bugs, though I'm not sure I'd be unlucky enough to have two full-reboot bugs at the same time. Happy to help with more debugging.

Release: Jammy
Chromium version: 115.0.5762.4-hwacc, installed with:
snap refresh --channel=latest/edge/hwacc chromium

What you expected to happen
1. HDMI monitor disconnects, and no reboots happen.
2. I can switch Chromium profiles without a reboot.

What happened instead
Both actions cause a reboot.

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: xorg 1:7.7+23ubuntu2
ProcVersionSignature: Ubuntu 5.19.0-43.44~22.04.1-generic 5.19.17
Uname: Linux 5.19.0-43-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
.tmp.unity_support_test.0:

ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: pass
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Tue Jun 6 16:09:12 2023
DistUpgraded: Fresh install
DistroCodename: jammy
DistroVariant: ubuntu
DkmsStatus:
 fwts-efi-runtime-dkms/23.01.00, 5.19.0-41-generic, x86_64: installed (WARNING! Diff between built and installed module!)
 fwts-efi-runtime-dkms/23.01.00, 5.19.0-43-generic, x86_64: installed (WARNING! Diff between built and installed module!)
 tp_smapi/0.43, 5.19.0-41-generic, x86_64: installed
 tp_smapi/0.43, 5.19.0-43-generic, x86_64: installed
ExtraDebuggingInterest: Yes, including running git bisection searches
GpuHangFrequency: Several times a day
GpuHangReproducibility: Occurs more often under certain circumstances
GpuHangStarted: Within the last week or two
GraphicsCard:
 Intel Corporation Device [8086:a7a0] (rev 04) (prog-if 00 [VGA controller])
   Subsystem: CLEVO/KAPOK Computer Device [1558:5630]
 NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] [10de:25a2] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] [10de:0000]
InstallationDate: Installed on 2023-04-20 (46 days ago)
InstallationMedia: Ubuntu 22.04.2 LTS "Jammy Jellyfish" - Release amd64 (20230223)
MachineType: System76 Gazelle
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_90m3zy@/vmlinuz-5.19.0-43-generic root=ZFS=rpool/ROOT/ubuntu_90m3zy ro drm.debug=0xe quiet splash vt.handoff=1
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/22/2023
dmi.bios.release: 4.19
dmi.bios.vendor: coreboot
dmi.bios.version: 2023-03-22_799ed79
dmi.board.name: Gazelle
dmi.board.vendor: System76
dmi.board.version: gaze18
dmi.chassis.type: 9
dmi.chassis.vendor: System76
dmi.ec.firmware.release: 0.0
dmi.modalias: dmi:bvncoreboot:bvr2023-03-22_799ed79:bd03/22/2023:br4.19:efr0.0:svnSystem76:pnGazelle:pvrgaze18:rvnSystem76:rnGazelle:rvrgaze18:cvnSystem76:ct9:cvr:sku:
dmi.product.name: Gazelle
dmi.product.version: gaze18
dmi.sys.vendor: System76
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.113-2~ubuntu0.22.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 22.2.5-0ubuntu0.1~22.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.4-2ubuntu1.7~22.04.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau N/A

Shane McKee (mckeesh) wrote:
affects: ubuntu → xorg (Ubuntu)
Sebastien Bacher (seb128) wrote:

Thank you for your bug report. After the system has restarted, could you run
$ journalctl -b -1 > journal.log
and attach the log to the report?

affects: xorg (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote:

Please also remove the kernel parameter 'drm.debug=0xe' because it seems to be causing more noise than help right now.
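A minimal sketch of removing the parameter, assuming it was added to `GRUB_CMDLINE_LINUX_DEFAULT` or `GRUB_CMDLINE_LINUX` in `/etc/default/grub` (the report does not say where it was set). The helper name `strip_drm_debug` is hypothetical; `update-grub` then regenerates the boot configuration, and `/proc/cmdline` shows the parameters the running kernel actually booted with.

```shell
# Hypothetical helper: delete "drm.debug=0xe" (plus one following space, if any)
# from the GRUB_CMDLINE_LINUX / GRUB_CMDLINE_LINUX_DEFAULT lines of a GRUB
# defaults file. Assumed location: /etc/default/grub. Other lines are untouched.
strip_drm_debug() {
  grub_file="${1:-/etc/default/grub}"
  sed -i -E '/^GRUB_CMDLINE_LINUX(_DEFAULT)?=/ s/drm\.debug=0xe ?//' "$grub_file"
}

# Usage from a root shell (e.g. after `sudo -i`):
#   strip_drm_debug /etc/default/grub && update-grub
# After rebooting, check what the kernel was booted with:
#   cat /proc/cmdline
```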

Shane McKee (mckeesh) wrote:

Unfortunately, so far I haven't been able to reproduce this bug after removing drm.debug=0xe, so it sounds like we're dealing with some sort of timing issue.

The journal.log file won't attach to a comment here, so I uploaded it to Google Drive instead (from a boot with drm.debug=0xe):
https://drive.google.com/file/d/1zQdwnHwaeaMnlSTMeSv2H_ObjGtQObep/view?usp=drive_link

Daniel van Vugt (vanvugt) wrote:

Compressing large text files before attaching them usually solves that.
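Journal text compresses very well. The sketch below uses gzip as one possible choice (xz would work too); the helper name `compress_log` is hypothetical, and no particular attachment size limit is assumed.

```shell
# Hypothetical helper: compress a text log so it is small enough to attach.
# Writes <file>.gz next to the original (the original is kept) and prints its path.
compress_log() {
  log="$1"
  gzip -9 -c "$log" > "$log.gz" && printf '%s\n' "$log.gz"
}

# Usage, compressing an existing log:
#   compress_log journal.log
# or capturing the previous boot's journal straight into a compressed file:
#   journalctl -b -1 --no-pager | gzip -9 > journal.log.gz
```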

affects: linux (Ubuntu) → linux-hwe-5.19 (Ubuntu)
Daniel van Vugt (vanvugt) wrote:

That log is too large and cluttered to find the "hard reboots". Can you try identifying the exact date and time of such a reboot? Maybe with a new log?
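One way to pin down when a reboot happened is to look at the last entry logged before it. On a live system, `journalctl --list-boots` shows the first and last entry of each boot. For a log already saved with the default "short" output format, the hypothetical helper below prints the timestamp of its final entry. It assumes each entry starts with a 15-character timestamp such as `Jun 06 16:09:12`. After a hard reset the last entries may never reach the disk, so treat the result as approximate.

```shell
# Hypothetical helper: print the timestamp of the last entry in a journal saved
# with journalctl's default "short" format, e.g.
#   Jun 06 16:09:12 hostname process[pid]: message
# Lines starting with "--" (journalctl header/separator lines) and blank lines
# are skipped.
last_entry_time() {
  grep -v -e '^--' -e '^$' "$1" | tail -n 1 | cut -c1-15
}

# Usage:
#   journalctl -b -1 --no-pager > journal.log
#   last_entry_time journal.log
```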

Also I notice your gnome-shell crashed a couple of times so please be sure to report that in a separate bug: https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment
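The wiki page linked above covers finding crash files and the errors.ubuntu.com page for a machine. The helpers below are a hypothetical sketch of those steps. They assume the usual Ubuntu locations: `/var/crash` for `.crash` files and `/var/lib/whoopsie/whoopsie-id` for the machine's error-tracker ID, which may need `sudo` to read.

```shell
# Hypothetical helper: list *.crash files, newest first; prints nothing if none exist.
list_crash_files() {
  dir="${1:-/var/crash}"
  ls -1t "$dir"/*.crash 2>/dev/null || true
}

# Hypothetical helper: build the errors.ubuntu.com URL for this machine
# from the whoopsie ID file.
errors_url() {
  id_file="${1:-/var/lib/whoopsie/whoopsie-id}"
  printf 'https://errors.ubuntu.com/user/%s\n' "$(cat "$id_file")"
}

# Usage:
#   list_crash_files       # then report one with: ubuntu-bug /var/crash/<file>.crash
#   errors_url             # open the printed URL in a browser
```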

Shane McKee (mckeesh) wrote:

Here's the output without drm.debug=0xe. I was able to repro by putting the GPU under stress with a bunch of WebGL aquarium windows in the Chromium hwaccel beta snap and then using the HDMI unplug method.

Would you like me to re-do all the logs this way or just this one?

Daniel van Vugt (vanvugt) wrote:

The log in comment #7 seems to show a controlled shutdown, no "hard reboot". But gnome-shell does seem to have crashed with signal 11 again so maybe that's the main issue. Please follow:
https://wiki.ubuntu.com/Bugs/Responses#Missing_a_crash_report_or_having_a_.crash_attachment
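The distinction between a controlled shutdown and a hard reboot can be checked roughly from the previous boot's journal. The sketch below is a heuristic under stated assumptions, not how the log was actually read here. It assumes a clean systemd shutdown leaves lines such as "Reached target ... Shutdown/Reboot/Power-Off" or journald's "Journal stopped" near the end of the boot. Exact wording varies between systemd versions, and `classify_boot_end` is a hypothetical name.

```shell
# Heuristic sketch: look for systemd shutdown markers in the last N lines
# (default 200) of a saved journal. The marker strings are assumptions and
# differ between systemd versions ("Reboot" vs "System Reboot", etc.).
classify_boot_end() {
  log="$1"
  n="${2:-200}"
  if tail -n "$n" "$log" | grep -Eq 'Reached target .*(Shutdown|Reboot|Power[- ]Off)|Journal stopped'; then
    echo "controlled"
  else
    echo "no shutdown markers found"
  fi
}

# Usage:
#   journalctl -b -1 --no-pager > prev.log && classify_boot_end prev.log
```

A missing marker only suggests an abrupt end; it does not prove a hardware reset, since a crash of the user session (as with gnome-shell here) leaves the system journal running.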

Shane McKee (mckeesh) wrote:

Interesting, maybe something is triggering a controlled shutdown for some reason then? For context, here's what I'm seeing:
https://drive.google.com/file/d/1OlPV7akUgekD9-rl1gsMes_zhO68aius/view?usp=sharing

My GNOME Shell is now crashing while trying to upload the crash file for the earlier gnome-shell crash, and nothing on https://errors.ubuntu.com/user/ID has timestamps matching my reboots. However, there are some issues related to kdeconnect, apport, and gnome-shell that could be relevant, so here are the links to those:

https://errors.ubuntu.com/oops/156e3de2-06ed-11ee-a27e-fa163e993415
https://errors.ubuntu.com/oops/127fe0f8-06ed-11ee-b088-fa163e55efd0
https://errors.ubuntu.com/oops/43df2b8e-063c-11ee-bccf-fa163ef35206
https://errors.ubuntu.com/oops/18bb3ea2-0623-11ee-b07b-fa163e55efd0

You said in a previous comment to file this as a separate bug, so would you prefer that I close this one out and add those links to a new bug, or are we considering them relevant to the reboots I'm getting in this bug?

Daniel van Vugt (vanvugt) wrote:

Thanks for the video. That does look like a gnome-shell crash so I guess you meant reboot of the shell rather than the whole system.

One of your gnome-shell crashes is bug 1933186 so let's track it there for now.

affects: linux-hwe-5.19 (Ubuntu) → ubuntu