[amdgpu] gnome-shell gets SIGKILL'd when lock screen or under heavy load in Wayland

Bug #2034619 reported by Martin Randau
This bug affects 16 people

Affects            Status                    Importance  Assigned to  Milestone
Linux              New                       Unknown
Mutter             Fix Released              Unknown
X.Org X server     Fix Released              Unknown
linux (Ubuntu)     Status tracked in Mantic
  Mantic           Triaged                   Undecided   Unassigned
mutter (Ubuntu)    Status tracked in Mantic
  Mantic           Fix Committed             High        Unassigned

Bug Description

[ Impact ]

gnome-shell gets unceremoniously SIGKILLed on some Ryzen systems, sometimes when the screen locks, sometimes when launching particular apps.

[ Workaround ]

Add this to /etc/environment:

  MUTTER_DEBUG_KMS_THREAD_TYPE=user

and then reboot.
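If you would rather not edit /etc/environment by hand, the workaround above can be applied with a small shell helper. This is a minimal sketch, not an official tool: the function name add_env_line is hypothetical, and it only appends the line when that exact line is not already present, so running it twice does no harm.

```shell
#!/usr/bin/env bash
# add_env_line FILE LINE
# Hypothetical helper: append LINE to FILE unless FILE already contains
# that exact line, so running it more than once does not add duplicates.
add_env_line() {
  local file="$1" line="$2"
  # -q: quiet, -x: match the whole line, -F: treat LINE as a literal string.
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    return 0
  fi
  # If the file is non-empty and does not end in a newline, add one first
  # so the new entry is not glued onto the previous line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# Usage (needs root, because /etc/environment is owned by root), then reboot:
#   sudo bash -c "$(declare -f add_env_line); add_env_line /etc/environment MUTTER_DEBUG_KMS_THREAD_TYPE=user"
```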

[ Test Plan ]

Not all Ryzen systems (including one I just purchased) are able to reproduce the bug. We have no choice but to leave final verification to the community. Anyone affected should try locking their screen and verify they are not instantly returned to the login screen.
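One way to make the check above less subjective is to compare gnome-shell's PID before and after locking: if the session was killed, the old process is gone. This is a sketch under assumptions: it uses pgrep to find the process, the helper name shell_survived is hypothetical, and the PIDs should be captured from an SSH login or a second virtual terminal, because a terminal inside the affected session is killed along with it.

```shell
#!/usr/bin/env bash
# shell_survived BEFORE_PID AFTER_PID
# Hypothetical helper: succeed only if gnome-shell had a PID before the
# lock test and still has the same PID afterwards (i.e. it was not killed
# and the session did not end).
shell_survived() {
  local before="$1" after="$2"
  [ -n "$before" ] && [ "$before" = "$after" ]
}

# Usage, from an SSH login or a second virtual terminal (Ctrl+Alt+F3):
#   before=$(pgrep -o -u "$USER" -x gnome-shell)
#   # switch to the graphical session, lock with Super+L, wait, unlock
#   after=$(pgrep -o -u "$USER" -x gnome-shell)
#   shell_survived "$before" "$after" && echo PASS || echo FAIL
```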

[ Where problems could occur ]

Anywhere in frame scheduling, and particularly in mouse cursor movement, since optimizing that is why the real-time thread exists.

[ Original Description ]

I have this issue on Ubuntu 23.10. Locking the screen works only with an external monitor connected. Otherwise the session ends, and the user is logged out and returned to the GDM screen.
Everything works in Xorg.

ProblemType: Crash
DistroRelease: Ubuntu 23.10
Package: gnome-shell 45~beta.1-0ubuntu2
ProcVersionSignature: Ubuntu 6.3.0-7.7-generic 6.3.5
Uname: Linux 6.3.0-7-generic x86_64
ApportVersion: 2.27.0-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: ubuntu:GNOME
Date: Wed Sep 6 22:32:22 2023
DisplayManager: gdm3
ExecutablePath: /usr/bin/gnome-shell
InstallationDate: Installed on 2023-09-03 (3 days ago)
InstallationMedia: Ubuntu 23.10 "Mantic Minotaur" - Daily amd64 (20230901.1)
ProcCmdline: /usr/bin/gnome-shell
RelatedPackageVersions: mutter-common 45~beta.1-1ubuntu2
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sudo users

Martin Randau (cmmrandau) wrote :
Apport retracing service (apport) wrote :

Stacktrace:
 #0 0x00007fa67dc9999b in ?? ()
 No symbol table info available.
 Backtrace stopped: Cannot access memory at address 0x7ffda41cde60
StacktraceSource: #0 0x00007fa67dc9999b in ?? ()
StacktraceTop: ?? ()

Apport retracing service (apport) wrote : ThreadStacktrace.txt
tags: added: apport-failed-retrace
tags: removed: need-amd64-retrace
Daniel van Vugt (vanvugt) wrote : Re: gnome-shell crashed with SIGSEGV when lock screen in wayland (WORKS WHEN LOCK SCREEN WITH SECOND MONITOR CONNECTED!!)

Thank you for taking the time to report this bug and helping to make Ubuntu better.

However, processing it in order to get sufficient information for the developers failed (it does not generate a useful symbolic stack trace). This might be caused by some outdated packages which were installed on your system at the time of the report. Please upgrade your system to the latest package versions. If you still encounter the crash, please file a new report.

Thank you for your understanding, and sorry for the inconvenience!

Changed in gnome-shell (Ubuntu):
status: New → Invalid
Daniel van Vugt (vanvugt) wrote :

Before reporting the next crash it's probably a good idea to try removing these unsupported extensions:

  '<email address hidden>',
  '<email address hidden>'

You can do it quickly with:

  cd ~/.local/share/gnome-shell
  rm -rf extensions

and then reboot.
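A less destructive variant moves the extensions directory aside instead of deleting it, so it can be restored later. Note that this only affects per-user extensions under ~/.local/share/gnome-shell/extensions; on a stock Ubuntu install the bundled extensions (Ubuntu Dock, Desktop Icons NG, Tiling Assistant and so on) live under /usr/share/gnome-shell/extensions and have to be turned off with gnome-extensions disable instead. The helper name below is hypothetical.

```shell
#!/usr/bin/env bash
# disable_user_extensions [DIR]
# Hypothetical helper: rename the per-user extensions directory instead of
# deleting it, and print the backup path so it can be restored later.
disable_user_extensions() {
  local dir="${1:-$HOME/.local/share/gnome-shell/extensions}"
  if [ ! -d "$dir" ]; then
    echo "no extensions directory at $dir" >&2
    return 1
  fi
  local backup
  backup="$dir.disabled-$(date +%Y%m%d%H%M%S)"
  mv -- "$dir" "$backup" && printf '%s\n' "$backup"
}

# Usage: run it, then reboot (or log out and back in).
# To restore: mv -- "<printed backup path>" ~/.local/share/gnome-shell/extensions
```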

Martin Randau (cmmrandau) wrote :

Yes, sorry, my bad, but that doesn't fix it. I attach the output of dmesg -t --level=alert,crit,err,warn and of journalctl after a fresh boot in which I logged in, tried to lock the screen with Win+L, and was brought back to GDM with the session ended.

In the journalctl output, notice this error, which I have searched for online without success:

Sep 07 06:33:30 ThinkPad-P14s-Gen-3 /usr/libexec/gdm-wayland-session[1399]: dbus-daemon[1399]: [session uid=123 pid=1399] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1

Note that everything works when I have an external monitor connected. Also, locking the screen sometimes works ONCE in a fresh session. It did just now, so I thought removing the extensions had fixed it, but it failed when I tried to lock the screen a second time.

Martin Randau (cmmrandau) wrote :

Here is the journalctl output.

Daniel van Vugt (vanvugt) wrote :

Those logs appear to be incomplete and also not relevant to this bug. Next time it happens please run:

  journalctl -b0 > journal.txt

or if you had to reboot then run:

  journalctl -b-1 > prevjournal.txt

and attach the resulting text file here.

Martin Randau (cmmrandau) wrote :

Hello Daniel and thanks for the help.
I attach two logs, both after a fresh reboot. The first (_lock) is from locking with Win+L, the second (_blank) from the screen blanking after 1 minute; I have set it not to lock automatically when the screen blanks. Both cause Wayland to crash and return me to GDM with the session ended. There is of course some overlap between the logs.

I'm not an expert like you, but I'm very curious about what's causing it. There is at least some indication of a Wayland crash:

Sep 07 09:09:41 ThinkPad-P14s-Gen-3 systemd[3855]: Started snap.snapd-desktop-integration.snapd-desktop-integration.service - Service for snap application snapd-desktop-integration.snapd-desktop-integration.
Sep 07 09:09:41 ThinkPad-P14s-Gen-3 gnome-shell[3352]: Connection to xwayland lost
Sep 07 09:09:41 ThinkPad-P14s-Gen-3 gnome-shell[3352]: Xwayland terminated, exiting since it was mandatory
Sep 07 09:09:41 ThinkPad-P14s-Gen-3 gnome-shell[3352]: (../src/core/meta-context.c:533):meta_context_terminate: runtime check failed: (g_main_loop_is_running (priv->main_loop))
Sep 07 09:09:41 ThinkPad-P14s-Gen-3 systemd[3855]: Started tracker-miner-fs-3.service - Tracker file system data miner.
Sep 07 09:09:41 ThinkPad-P14s-Gen-3 gnome-shell[3352]: JS ERROR: Gio.IOErrorEnum: Xwayland exited unexpectedly
                                                       @resource:///org/gnome/shell/ui/init.js:21:20
Sep 07 09:09:41 ThinkPad-P14s-Gen-3 gnome-shell[3352]: Execution of main.js threw exception: Module resource:///org/gnome/shell/ui/init.js threw an exception
Sep 07 09:09:41 ThinkPad-P14s-Gen-3 gdm-launch-environment][3244]: pam_unix(gdm-launch-environment:session): session closed for user gdm

Martin Randau (cmmrandau) wrote :
Martin Randau (cmmrandau) wrote :
Daniel van Vugt (vanvugt) wrote :

I don't see any SIGSEGV in those logs (or "signal 11" or "segfault"). But it does look like gnome-shell got manually killed (SIGKILL) for some reason. The kill seems to occur soon after bug 2034665.

Martin Randau (cmmrandau) wrote :

It also displays an error in the extension manager. I have noticed that this error is not always present. The extension seems to work, i.e., I can tile windows by dragging them towards corners.

Daniel van Vugt (vanvugt) wrote :

Indeed, that might be the error leading to gnome-shell's demise, but it's hard to tell because 'JS ERROR' is not uncommon and not fatal in itself.

Martin Randau (cmmrandau) wrote :

I think it's worth noting that both screen blanking and screen locking work when I have an external monitor connected (through a Lenovo USB dock), regardless of whether the laptop display is set as primary. Let me know if you want logs from that, or if I can provide any other relevant information.

Daniel van Vugt (vanvugt) wrote :

Try disabling the tiling assistant extension and see if the bug goes away.

Martin Randau (cmmrandau) wrote :

Unfortunately not. Here is a log where you can see the crash even though tiling-manager was disabled with "gnome-extensions disable <email address hidden>". The first crash was with tiling-manager enabled and the second without it, though with no reboot in between.

Martin Randau (cmmrandau) wrote :

Hello again!
I've found this bug, which similarly caused crashes on suspend/blank/lock screen and also logged "Error reading events from display: Broken pipe": https://askubuntu.com/questions/1465560/lunar-lobster-wayland-crashes and the corresponding bug report: https://bugs.launchpad.net/ubuntu/+source/mutter/+bug/2012230

Martin Randau (cmmrandau) wrote :

I just realized I can't even log in via Xorg anymore, and I'm not sure what has changed. It crashes, tells me to log out, and I'm back at GDM. I think the Xorg login is around 19:03:07 in this log (right after boot).

Martin Randau (cmmrandau) wrote :

I tried a fresh install of 23.10 and the problem persists. The only thing I changed was adding the amd_pstate=active kernel boot parameter, which should not matter here. In fact, Xorg crashes at the first login attempt and sends me back to GDM, where Wayland is unavailable. After a reboot, Wayland works again, but with the same suspend/blank/lock failure described above.

Daniel van Vugt (vanvugt) wrote :

Tiling Assistant is still enabled in your logs. It might be one of those cases where an extension still causes problems when disabled.

I do still see gnome-shell being SIGKILL'd by something external. Not sure why, but it's understandable that it feels like a crash.

summary: - gnome-shell crashed with SIGSEGV when lock screen in wayland (WORKS WHEN
- LOCK SCREEN WITH SECOND MONITOR CONNECTED!!)
+ gnome-shell gets SIGKILL'd when lock screen in Wayland
Changed in gnome-shell (Ubuntu):
status: Invalid → New
Daniel van Vugt (vanvugt) wrote : Re: gnome-shell gets SIGKILL'd when lock screen in Wayland

Please try commenting out or removing line 274 of /<email address hidden>/src/extension/altTab.js

        AltTab.AppSwitcherPopup = this._originalAltTab;

and then log in again.

Martin Randau (cmmrandau) wrote :

Hello Daniel
No luck with that either. This is with the line commented out, on a fresh install, and without manually disabling the extension.

Daniel van Vugt (vanvugt) wrote :

I don't think this is a gnome-shell bug. gnome-shell is just the part of the problem you can see. What's happening before gnome-shell is killed is a bunch of daemons going down:

Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gnome-calendar[5010]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 xdg-desktop-por[4930]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gjs[5196]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gjs[5213]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gnome-terminal-[5019]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gnome-control-c[5018]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 snapd-desktop-i[4860]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 xdg-desktop-por[4754]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 evolution-alarm[4440]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gsd-media-keys[4364]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gsd-color[4351]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gsd-power[4366]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gsd-wacom[4387]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 gsd-keyboard[4362]: Error reading events from display: Broken pipe
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 systemd[2065]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 polkitd[806]: Unregistered Authentication Agent for unix-session:5 (system bus name :1.132, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Sep 07 14:46:56 ThinkPad-P14s-Gen-3 systemd[2065]: <email address hidden>: Main process exited, code=killed, status=9/KILL

It might be the main settings daemon or again Xwayland since I think a lot of GNOME code still has some X11 connection.

Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
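The two steps above can be scripted roughly as follows. This is a sketch: the function names are hypothetical, the paths are the ones given in the steps, and reading /var/lib/whoopsie/whoopsie-id may require root depending on its permissions.

```shell
#!/usr/bin/env bash
# list_crash_files [DIR]
# Hypothetical helper for step 1: print each *.crash file in DIR
# (default /var/crash). Returns non-zero if there are none.
list_crash_files() {
  local dir="${1:-/var/crash}" f found=1
  for f in "$dir"/*.crash; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '%s\n' "$f"
    found=0
  done
  return "$found"
}

# whoopsie_url [ID_FILE]
# Hypothetical helper for step 2: build the errors.ubuntu.com page URL
# from the machine's whoopsie ID (default /var/lib/whoopsie/whoopsie-id).
whoopsie_url() {
  local idfile="${1:-/var/lib/whoopsie/whoopsie-id}" id
  id=$(tr -d '[:space:]' < "$idfile") || return 1
  [ -n "$id" ] || return 1
  printf 'https://errors.ubuntu.com/user/%s\n' "$id"
}

# Usage:
#   list_crash_files || whoopsie_url
# Then run "ubuntu-bug FILE" for each file printed, or open the URL.
```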

Changed in gnome-shell (Ubuntu):
status: New → Incomplete
affects: gnome-shell (Ubuntu) → ubuntu
Martin Randau (cmmrandau) wrote :

Cool, looking forward to finding out what's causing it. Note again that everything works when an external monitor is connected. Let me know if I can supply any logs.

Martin Randau (cmmrandau) wrote :

I submitted a report on the one crash log in /var/crash: https://bugs.launchpad.net/ubuntu/+source/geoclue-2.0/+bug/2034902

Not sure how geoclue plays into it.

Daniel van Vugt (vanvugt) wrote :

It's unlikely geoclue would bring down all the settings daemons, or gnome-shell.

Each time the problem happens please repeat the steps in comment #25...

Martin Randau (cmmrandau) wrote :

Well, it "crashes" EVERY time I suspend, lock the screen, or blank the screen, unless the external monitor is connected, but this does not generate a report in /var/crash. I'll try it a few more times and see if any of the "crashes" generate a file in /var/crash.

Martin Randau (cmmrandau) wrote :

They have possibly identified the bug here: https://gitlab.gnome.org/GNOME/mutter/-/issues/3012

Martin Randau (cmmrandau) wrote :

For some reason tiling-assistant is giving additional errors now; see the attached log at 00:00:59, which is when I tried to lock the screen. For example:

JS ERROR: Error: Expected an object of type MtkRectangle for argument 'src2' but got type undefined
                                                       equal@file:///<email address hidden>/src/extension/utility.js:443:27
                                                       _edgeTilingPreview@file:///<email address hidden>/src/extension/moveHandler.js:564:39

edit: But uninstalling gnome-extension-ubuntu-tiling-manager does not solve the problem

Daniel van Vugt (vanvugt) wrote :

Each time the problem happens please repeat all the steps in comment #25.

Martin Randau (cmmrandau) wrote :

It very rarely generates a bug/crash report, but this morning it did, from Xwayland, and it was identical to what I previously reported here: https://bugs.launchpad.net/ubuntu/+source/xwayland/+bug/2034995

However, launchpad said that bug was a duplicate of a much older bug.

I reported it again here: https://bugs.launchpad.net/ubuntu/+source/xwayland/+bug/2035057

Gnome-shell and other packages have been updated since. It doesn't (yet) say it's a duplicate of an older bug.

Edit: it does now. Not sure where to go from here. The bug is very reproducible and specific: wayland, no external monitor -> lock screen or screen blank --> crash and return to gdm. If in xorg or in wayland with an external monitor --> everything works.

Seth Arnold (seth-arnold) wrote :

Does this bug need to remain private? If it were public it might get more attention or serve as a bug for other people to subscribe to, etc.

Please delete the CoreDump.gz attachment before making this public.

Thanks

Changed in ubuntu:
status: Incomplete → Confirmed
information type: Private → Public
description: updated
Julian Andres Klode (juliank) wrote :

I'm seeing the same issue: closing the lid or pressing Windows+L kills gnome-shell. I'm not sure yet how to debug it, especially while at a conference without a second laptop or common Wi-Fi.

I'll go attach gdb from tty3 and then lock the screen and see what happens.

Julian Andres Klode (juliank) wrote :

I have attached gdb and strace, and there's no SIGTERM sent before it and no backtrace of any sort; it just gets a SIGKILL, but I don't know who is sending it.

Changed in ubuntu:
importance: Undecided → High
tags: added: rls-mm-incoming
Daniel van Vugt (vanvugt) wrote :

The log says it's systemd reporting the SIGKILL. Does that mean it's also systemd issuing the SIGKILL?

Random idea: Does the bug still occur without Desktop Icons NG enabled?

Martin Randau (cmmrandau) wrote :

Yes, it still occurs when icons are disabled.

However, it does not occur when an external monitor is connected, at least not here. The monitor is connected through a USB dock.

Daniel van Vugt (vanvugt) wrote :

I have multiple 23.10 machines where screen locking works correctly, no bug. So we probably need to figure out what the common feature is of machines that do experience the bug. Is it 'amdgpu' specific?

Martin Randau (cmmrandau) wrote :

I have an AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz. This model, the Lenovo P14s (and T14) Gen 3, has seen several GPU-related issues with earlier kernels that required boot parameters to avoid freezing and flickering. However, since kernel 6.3 and amd_pstate=active, these problems are gone.

Let me know what I can do to help debugging.

Daniel van Vugt (vanvugt) wrote :

It's more likely to be a software configuration than hardware.

Julian Andres Klode (juliank) wrote :

I ran both killsnoop and killsnoop-bpfcc but didn't see any SIGKILL delivered to the gnome-shell process. My current theory is that the kernel is killing the process directly, possibly due to some GPU-related code path.

I'm trying to find out how to ftrace the kernel function that kills the process, to get a kernel stack trace and see where it comes from.
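A possible way to get that stack trace without a custom kernel is the kernel's signal:signal_generate tracepoint. Unlike killsnoop, which traces kill() system calls, this tracepoint also fires for signals generated inside the kernel, so a SIGKILL sent by the kernel itself should still show up. The sketch below assumes tracefs is mounted at /sys/kernel/tracing (some systems only have /sys/kernel/debug/tracing), must run as root, and uses hypothetical helper names; the parser assumes the event's usual sig=, comm= and pid= fields.

```shell
#!/usr/bin/env bash
# Sketch: record every SIGKILL the kernel generates, whoever causes it,
# via the signal:signal_generate tracepoint. Run the enable/disable
# helpers as root. Helper names are hypothetical.
TRACEFS="${TRACEFS:-/sys/kernel/tracing}"
EVENT="$TRACEFS/events/signal/signal_generate"

enable_sigkill_trace() {
  echo 'sig == 9' > "$EVENT/filter"                  # only SIGKILL
  echo 'stacktrace if sig == 9' > "$EVENT/trigger"   # plus a kernel stack trace
  echo 1 > "$EVENT/enable"
}

disable_sigkill_trace() {
  echo 0 > "$EVENT/enable"
  echo '!stacktrace' > "$EVENT/trigger"              # remove the trigger
  echo 0 > "$EVENT/filter"                           # clear the filter
}

# sigkill_targets: read trace lines on stdin and print "COMM PID" for the
# task each SIGKILL was sent to.
sigkill_targets() {
  sed -n '/signal_generate: sig=9 /s/.* comm=\([^ ]*\) pid=\([0-9]*\).*/\1 \2/p'
}

# Usage (as root): enable_sigkill_trace, lock the screen to reproduce, then
#   cat "$TRACEFS/trace"                     # events with their stack traces
#   sigkill_targets < "$TRACEFS/trace"       # summary of who was killed
#   disable_sigkill_trace
```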

Martin Randau (cmmrandau) wrote :

A bug with similar symptoms has recently been fixed in mutter: https://gitlab.gnome.org/GNOME/mutter/-/issues/3012

But I can't say if it's related to this.

Daniel van Vugt (vanvugt) wrote :

That's a crash (SIGSEGV) so unlikely related to this bug. Although your original report started as a SIGSEGV it has not been reproduced so this bug is now exclusively about the SIGKILL.

description: updated
Martin Randau (cmmrandau) wrote :

It works fine in 22.04.3 with the 6.5 kernel (linux-oem-22.04d).

Martin Randau (cmmrandau) wrote :

Another observation: since it works with an external monitor connected, I tried locking the screen and then disconnecting the monitor. Not only did this not kill gnome-shell, but I was also able to unlock AND lock the screen again, though only once. At the second lock attempt, gnome-shell was killed and I was returned to GDM as usual. This suggests the problem lies in some config file related to which monitor is active, e.g. monitors.xml, but I have not tried editing that file.

Daniel van Vugt (vanvugt) wrote :

Please check each time that you still have this in the log:

  <email address hidden>: Main process exited, code=killed, status=9/KILL

and that it hasn't reverted to a SIGSEGV somewhere.
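To make that check quick and repeatable, a small filter over the journal can report which of the two endings it finds. This is a sketch: classify_shell_exit is a hypothetical name, it matches the systemd message format quoted above and systemd's usual "code=dumped, status=11/SEGV" form for a segfault, and since the unit name is hidden in this report, you pass the unit name exactly as it appears in your own log to narrow the match.

```shell
#!/usr/bin/env bash
# classify_shell_exit [UNIT]
# Hypothetical helper: read journal lines on stdin and report how a unit's
# main process ended: SIGKILL (code=killed, status=9/KILL) or SIGSEGV
# (code=dumped, status=11/SEGV). If UNIT is given, only lines containing
# that text are considered. Prints "none" when neither is found.
classify_shell_exit() {
  awk -v unit="${1:-}" '
    unit != "" && index($0, unit) == 0 { next }
    /Main process exited, code=killed, status=9\/KILL/  { print "SIGKILL"; found = 1 }
    /Main process exited, code=dumped, status=11\/SEGV/ { print "SIGSEGV"; found = 1 }
    END { if (!found) print "none" }
  '
}

# Usage (use -b-1 instead of -b0 if you had to reboot):
#   journalctl -b0 | classify_shell_exit 'UNIT-NAME-FROM-YOUR-LOG'
```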

Martin Randau (cmmrandau) wrote :

It often does not generate a report, but I can trigger it at will. I did so at 06:35:01, and journalctl only says this:

Sep 19 06:35:00 ThinkPad-P14s-Gen-3 update-notifier[4354]: gtk_widget_get_scale_factor: assertion 'GTK_IS_WIDGET (widget)' failed
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 gsd-keyboard[3421]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 update-notifier[4354]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 gsd-power[3429]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 xdg-desktop-por[3916]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 xdg-desktop-por[4004]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 gsd-wacom[3462]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 snapd-desktop-i[3993]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 evolution-alarm[3479]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 gsd-color[3410]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 gsd-media-keys[3422]: Error reading events from display: Broken pipe
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.Keyboard.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 polkitd[1226]: Unregistered Authentication Agent for unix-session:2 (system bus name :1.71, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: <email address hidden>: Main process exited, code=killed, status=9/KILL
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.MediaKeys.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.Power.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.Wacom.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: xdg-desktop-portal-gnome.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: xdg-desktop-portal-gnome.service: Failed with result 'exit-code'.
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: xdg-desktop-portal-gtk.service: Main process exited, code=exited, status=1/FAILURE
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: xdg-desktop-portal-gtk.service: Failed with result 'exit-code'.
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.Keyboard.service: Failed with result 'exit-code'.
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.Color.service: Failed with result 'exit-code'.
Sep 19 06:35:01 ThinkPad-P14s-Gen-3 systemd[2758]: org.gnome.SettingsDaemon.Medi...

Daniel van Vugt (vanvugt) wrote :

It looks like the real problem started earlier than that. Can you attach the full log?

Daniel van Vugt (vanvugt) wrote :

Please also repeat the steps in comment #25 each time, in case we get lucky.

Martin Randau (cmmrandau) wrote :

Here is a log from a fresh boot. I pressed Win+L to lock at Sep 19 09:27:00, where this is the first error:

Sep 19 09:27:00 ThinkPad-P14s-Gen-3 gnome-shell[3270]: JS ERROR: Extension <email address hidden>: TypeError: "AppSwitcherPopup" is read-only
                                                       destroy@file:///<email address hidden>/src/extension/altTab.js:274:9
                                                       disable@file:///<email address hidden>/extension.js:248:30
                                                       _callExtensionDisable@resource:///org/gnome/shell/ui/extensionSystem.js:202:32
                                                       _onEnabledExtensionsChanged@resource:///org/gnome/shell/ui/extensionSystem.js:624:24
                                                       async*_sessionUpdated@resource:///org/gnome/shell/ui/extensionSystem.js:808:20
                                                       ExtensionManager/<@resource:///org/gnome/shell/ui/extensionSystem.js:44:18
                                                       _callHandlers@resource:///org/gnome/gjs/modules/core/_signals.js:130:42
                                                       _emit@resource:///org/gnome/gjs/modules/core/_signals.js:119:10
                                                       _sync@resource:///org/gnome/shell/ui/sessionMode.js:216:14
                                                       pushMode@resource:///org/gnome/shell/ui/sessionMode.js:174:14
                                                       activate@resource:///org/gnome/shell/ui/screenShield.js:666:34
                                                       lock@resource:///org/gnome/shell/ui/screenShield.js:733:14
                                                       LockAsync@resource:///org/gnome/shell/ui/shellDBus.js:519:28
                                                       _handleMethodCall@resource:///org/gnome/gjs/modules/core/overrides/Gio.js:373:35
                                                       _wrapJSObject/<@resource:///org/gnome/gjs/modules/core/overrides/Gio.js:408:34
                                                       @resource:///org/gnome/shell/ui/init.js:21:20

Martin Randau (cmmrandau) wrote :
Martin Randau (cmmrandau) wrote :

If I disable Tiling Assistant, I get this (I pressed Win+L at 09:38:40):

Sep 19 09:38:31 ThinkPad-P14s-Gen-3 gnome-shell[8138]: Meta.Rectangle is deprecated, use Mtk.Rectangle instead
Sep 19 09:38:31 ThinkPad-P14s-Gen-3 gnome-shell[8138]: Meta.Rectangle is deprecated, use Mtk.Rectangle instead
Sep 19 09:38:31 ThinkPad-P14s-Gen-3 gnome-shell[8138]: Meta.Rectangle is deprecated, use Mtk.Rectangle instead
Sep 19 09:38:31 ThinkPad-P14s-Gen-3 gnome-shell[8138]: Meta.Rectangle is deprecated, use Mtk.Rectangle instead
Sep 19 09:38:37 ThinkPad-P14s-Gen-3 gnome-character[9259]: JS LOG: Characters Application exiting
Sep 19 09:38:40 ThinkPad-P14s-Gen-3 update-notifier[9048]: gtk_widget_get_scale_factor: assertion 'GTK_IS_WIDGET (widget)' failed
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 update-notifier[9048]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gnome-tweaks[9196]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 nautilus[8701]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gnome-control-c[9097]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 xdg-desktop-por[8840]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 xdg-desktop-por[8743]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gnome-terminal-[8898]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 evolution-alarm[8452]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gsd-media-keys[8405]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gsd-wacom[8444]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 snapd-desktop-i[8320]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gsd-power[8410]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gsd-color[8390]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gnome-text-edit[8980]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 gsd-keyboard[8401]: Error reading events from display: Broken pipe
Sep 19 09:38:41 ThinkPad-P14s-Gen-3 systemd[2987]: org.gnome.SettingsDaemon.Color.service: Main process exited, code=exited, status=1/FAILURE

Julian Andres Klode (juliank) wrote :

Updates on my end:

Interestingly, I also got my session killed yesterday with an external screen attached, just not one it had seen before. But I was only there over the weekend, so I can't continue testing with it.

I also created a new user and that shows the same issues, so this is not a user configuration issue, but either system state or a general bug in the code.

As I've said before, I've done plenty of log reading and nothing is in there, just broken pipes and then the message that gnome-shell got killed. Nothing is visible when you bpf trace signals to gnome-shell PID either. I still need to use trace-cmd to ftrace the exit point of gnome-shell to see the path leading to that.

The only explanation I can see for neither killsnoop implementation showing anything is that the kernel is killing the process itself. But reverting from the 6.5 to the 6.3 kernel did not improve things either.

Daniel van Vugt (vanvugt) wrote :

Comment #51 is bug 2034665.

Comment #52 makes me think it's Xwayland that went away. But NOT related to any log messages mentioning "Xwayland" because those are just from the login screen and harmless. It looks like the lock screen may have caused the login's Xwayland instance to crash silently or just exit, which is why a bunch of binaries start failing and one explicitly mentions ":0" (the connection to Xwayland).

Daniel van Vugt (vanvugt) wrote :
Daniel van Vugt (vanvugt) wrote :

Please try adding this to /etc/environment:

  MUTTER_DEBUG_KMS_THREAD_TYPE=user

and then reboot.
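After rebooting, one way to confirm the variable actually reached gnome-shell is to look at the process environment in /proc. This is a sketch with a hypothetical helper name; /proc/PID/environ separates entries with NUL bytes and is readable only by the process owner or root.

```shell
#!/usr/bin/env bash
# env_has ENVIRON_FILE NAME=VALUE
# Hypothetical helper: succeed if a process environment file contains the
# exact entry. NUL separators are converted to newlines, then whole lines
# are matched literally.
env_has() {
  local environ_file="$1" entry="$2"
  tr '\0' '\n' < "$environ_file" | grep -qxF -- "$entry"
}

# Usage, after rebooting and logging in (run as the session user):
#   pid=$(pgrep -o -u "$USER" -x gnome-shell)
#   env_has "/proc/$pid/environ" MUTTER_DEBUG_KMS_THREAD_TYPE=user \
#     && echo "set in gnome-shell" || echo "not set in gnome-shell"
```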

no longer affects: ubuntu
no longer affects: Ubuntu Mantic
no longer affects: mutter (Ubuntu Mantic)
Changed in mutter (Ubuntu):
status: New → Confirmed
Changed in xwayland (Ubuntu):
status: New → Confirmed
tags: added: amdgpu
summary: - gnome-shell gets SIGKILL'd when lock screen in Wayland
+ [amdgpu gnome-shell gets SIGKILL'd when lock screen in Wayland
summary: - [amdgpu gnome-shell gets SIGKILL'd when lock screen in Wayland
+ [amdgpu] gnome-shell gets SIGKILL'd when lock screen in Wayland
Changed in linux (Ubuntu):
status: New → Confirmed
Changed in mutter (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
status: Confirmed → In Progress
Changed in xwayland (Ubuntu):
status: Confirmed → Opinion
no longer affects: xwayland (Ubuntu)
Daniel van Vugt (vanvugt) wrote : Re: [amdgpu] gnome-shell gets SIGKILL'd when lock screen in Wayland
Martin Randau (cmmrandau) wrote :

I can confirm that it works! <:D

Edit: I noticed a significantly longer boot time, where it seemed to pause before reaching GDM; before, it booted in a few seconds. Never mind, it must have been something else: it boots as fast as before again. :)

Changed in xorg-server:
status: Unknown → New
Daniel van Vugt (vanvugt) wrote :

Yeah I think the major kernel upgrade (or something around the same time) caused one boot to be very slow.

Daniel van Vugt (vanvugt) wrote :

It looks like mutter is *asking* to be killed if any system calls take too long:

  setrlimit (RLIMIT_RTTIME...

and that only seems to happen on some Ryzen systems. Still, I'm not so sure the default limit of 200ms (inherited from rtkit) is sensible if the machine would otherwise recover. It's not like the problem is happening on every frame.
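For context, RLIMIT_RTTIME (documented in setrlimit(2)) limits, in microseconds, how much CPU time a process scheduled under a real-time policy may consume without making a blocking system call. When the soft limit is reached the kernel sends SIGXCPU, repeated once per second if the process catches or ignores it and keeps running, and at the hard limit it sends SIGKILL. Since that SIGKILL comes from the kernel rather than from another process calling kill(), it would also fit the earlier observation that killsnoop saw nothing. A process's current values appear in the "Max realtime timeout" row of /proc/PID/limits; the sketch below, with a hypothetical helper name, extracts that row.

```shell
#!/usr/bin/env bash
# rttime_limits LIMITS_FILE
# Hypothetical helper: print the soft and hard RLIMIT_RTTIME values from a
# /proc/PID/limits file. The kernel reports this limit in microseconds,
# or "unlimited" when none is set.
rttime_limits() {
  # Row format: "Max realtime timeout  <soft>  <hard>  us", so the values
  # are fields 4 and 5 after splitting on whitespace.
  awk '/^Max realtime timeout/ { print $4, $5 }' "$1"
}

# Usage (as the session user, or root):
#   rttime_limits "/proc/$(pgrep -o -u "$USER" -x gnome-shell)/limits"
# A limit of 200 ms would show up as 200000.
```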

Changed in mutter (Ubuntu):
importance: Undecided → High
Changed in linux:
status: Unknown → New
Changed in xorg-server:
status: New → Fix Released
Tim Holmes-Mitra (timhm)
tags: removed: rls-mm-incoming
Changed in mutter:
status: Unknown → New
description: updated
summary: - [amdgpu] gnome-shell gets SIGKILL'd when lock screen in Wayland
+ [amdgpu] gnome-shell gets SIGKILL'd when lock screen or under heavy load
+ in Wayland
Changed in mutter (Ubuntu Mantic):
milestone: none → ubuntu-23.10
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mutter - 45.0-3ubuntu3

---------------
mutter (45.0-3ubuntu3) mantic; urgency=medium

  * Update build autopkgtest to not run dh_auto_test (LP: #2038564)

 -- Jeremy Bícha <email address hidden> Thu, 05 Oct 2023 14:29:32 -0400

Changed in mutter (Ubuntu Mantic):
status: In Progress → Fix Released
Martin Randau (cmmrandau) wrote :

I just updated, and mutter 45.0-3ubuntu3 does not fix this error. It still crashes when I lock the screen, and it works with MUTTER_DEBUG_KMS_THREAD_TYPE=user added to /etc/environment.

Nicolás Abel Carbone (nicocarbone) wrote :

I can confirm @cmmrandau's findings. mutter 45.0-3ubuntu3 does not fix the bug, but Martin's workaround works for me.

Jeremy Bícha (jbicha)
Changed in mutter (Ubuntu Mantic):
status: Fix Released → Triaged
Changed in mutter (Ubuntu Mantic):
status: Triaged → In Progress
description: updated
Changed in mutter (Ubuntu Mantic):
milestone: ubuntu-23.10 → mantic-updates
Daniel van Vugt (vanvugt) wrote :
Changed in mutter:
status: New → Fix Released
Changed in mutter (Ubuntu Mantic):
status: In Progress → Fix Committed
assignee: Daniel van Vugt (vanvugt) → nobody
tags: added: fixed-in-mutter-45.1 fixed-upstream
Changed in mutter (Ubuntu Mantic):
assignee: nobody → Daniel van Vugt (vanvugt)
Daniel van Vugt (vanvugt) wrote :

Removing myself because we'll get the upstream fix in 45.1

Changed in mutter (Ubuntu Mantic):
assignee: Daniel van Vugt (vanvugt) → nobody
Mario Limonciello (superm1) wrote :

This is the upstream fix for this issue: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/3324

Changed in linux (Ubuntu Mantic):
status: Confirmed → Triaged