Repeatedly unusable truncated crash files (on log-out)

Bug #2015857 reported by Daniel van Vugt
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| apport (Ubuntu) | Triaged | High | Unassigned | |
| linux (Ubuntu) | Confirmed | Undecided | Unassigned | |

Bug Description

Repeatedly unusable truncated crash files:

Bug 2012974, bug 2015842, bug 2015140, bug 2012075

Based on my own testing, the problem seems to happen if multiple binaries crash simultaneously (like at logout). One crash file gets fully written and the other is incomplete.

If I remove apport from the equation and just get the kernel to dump core files then the core files are always written reliably (should be several hundred MB in the case of gnome-shell).
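Whether the kernel writes core files itself or pipes them to a handler such as apport is controlled by `/proc/sys/kernel/core_pattern`: a value starting with `|` means the dump is piped to a user-space program. A minimal sketch for checking which mode is active (the helper name `describe_core_pattern` is made up for illustration):

```python
from pathlib import Path

# Kernel setting that decides where core dumps go.
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")


def describe_core_pattern(pattern: str) -> tuple[str, str]:
    """Classify a core_pattern value.

    Returns ("pipe", command) if the kernel pipes the dump to a program,
    otherwise ("file", template) if the kernel writes the file itself.
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return ("pipe", pattern[1:].lstrip())
    return ("file", pattern)


if CORE_PATTERN.exists():
    kind, target = describe_core_pattern(CORE_PATTERN.read_text())
    print(f"{kind}: {target}")
```

With a handler removed from the equation, the result would be `file`, and the kernel writes the core without depending on how fast a reader consumes it.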

Running `apport-retrace -g $crash` on the .crash files will fail with "/tmp/apport_core_[...] is not a core dump: file format not recognized"

Steps to reproduce
==================

1. Use Ubuntu 23.04 (lunar) desktop with GNOME shell

2. Downgrade gjs and libgjs0g to 1.76.0-1 (for triggering bug 1974293)

3. Remove all crashes: `sudo rm -f /var/crash/*`

4. Interact with the desktop, icon grid and calendar for 30 seconds.

5. Log out.

6. Crash files written.

To test the core file, either use `apport-unpack` on the /var/crash/*.crash file to extract the CoreDump file or use `apport-retrace -g` on the .crash file. It will fail to print the backtrace:

```
$ apport-retrace -g /var/crash/_usr_bin_gnome-shell.1000.crash
[...]
warning: Error reading shared library list entry at 0x646c747200000000
Failed to read a valid object file image from memory.
Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/10527' in core file too small.
(gdb) bt
#0 0x00007fe1c8090ffb in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ffd59a40fe0
```
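A truncated core can also be spotted without gdb by comparing the file size against what the ELF program headers claim. The following is only a sketch under assumptions: it handles little-endian ELF64 cores (as on amd64), ignores the extended program header count used for more than 65534 segments, and the function name `check_core` is hypothetical. It would be run on the CoreDump file extracted with `apport-unpack`.

```python
import os
import struct

# ELF64 file header and program header layouts (little-endian).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
ET_CORE = 4  # e_type value for core files


def check_core(path: str) -> tuple[int, int]:
    """Return (expected_size, actual_size) for an ELF64 core file.

    expected_size is the furthest byte any program header refers to.
    Raises ValueError if the file is not a little-endian ELF64 core.
    """
    with open(path, "rb") as core:
        raw = core.read(ELF64_HEADER.size)
        if len(raw) < ELF64_HEADER.size or raw[:4] != b"\x7fELF":
            raise ValueError("not an ELF file")
        if raw[4] != 2 or raw[5] != 1:
            raise ValueError("only little-endian ELF64 is handled here")
        fields = ELF64_HEADER.unpack(raw)
        e_type, e_phoff = fields[1], fields[5]
        e_phentsize, e_phnum = fields[9], fields[10]
        if e_type != ET_CORE:
            raise ValueError("ELF file is not a core dump")
        expected = e_phoff + e_phentsize * e_phnum
        for index in range(e_phnum):
            core.seek(e_phoff + index * e_phentsize)
            phdr = core.read(ELF64_PHDR.size)
            if len(phdr) < ELF64_PHDR.size:
                break  # the program header table itself is cut off
            p_offset, p_filesz = ELF64_PHDR.unpack(phdr)[2], ELF64_PHDR.unpack(phdr)[5]
            expected = max(expected, p_offset + p_filesz)
    return expected, os.path.getsize(path)
```

If the actual size is smaller than the expected size, some segments are missing from the file, which would match the "Cannot access memory" failure above.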

You can apply the 0001-apport-Write-coredump-at-beginning-and-quit.patch to apport so that it only writes the coredump, then adjust the slowdown to reproduce the behavior on log-out 100% of the time.

ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
DistroRelease: Ubuntu 23.04
Package: linux-image-6.2.0-20-generic 6.2.0-20.20
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-20-generic root=UUID=6278216c-6f7c-4a94-8eab-e9c43da65227 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 6.2.0-20.20-generic 6.2.6

Tags: lunar patch
tags: added: lunar
Revision history for this message
Benjamin Drung (bdrung) wrote :

Do you have logs and an example for it? The linked bugs do not contain any apport logs or the original .crash file. Can you provide an example of how to reproduce it?

Changed in apport (Ubuntu):
status: New → Incomplete
Brian Murray (brian-murray) wrote :

I sort of recall there being code in apport to not capture crash files during the log out process.

Daniel van Vugt (vanvugt) wrote :

I suspect the fix for bug 1974293 (released a minute ago) will hide the problem for now, but my machine does still have one "useless" crash file from last week (attached).

Daniel van Vugt (vanvugt) wrote :

And another.

Daniel van Vugt (vanvugt) wrote :

Steps to reproduce:

1. Make sure you don't have the fix for bug 1974293, so make sure libgjs0g == 1.76.0-1

2. Log into Lunar.

3. Interact with the desktop, icon grid and calendar for 30 seconds.

4. Log out.

5. Crash files written.

Changed in apport (Ubuntu):
status: Incomplete → New
Benjamin Drung (bdrung)
description: updated
Benjamin Drung (bdrung) wrote :

Thanks for providing the steps to reproduce. Preparation:

1. Ubuntu 23.04 (lunar) desktop in a VM
2. Remove all crashes: `sudo rm -f /var/crash/*`
3. Downgrade gjs and libgjs0g to 1.76.0-1

Then I followed your steps. On the first tries the crash file was 49 MB in size and retracing looked correct. On my last try it was only 14 MB, and `apport-retrace -g` showed the following warnings (and the backtrace failed):

```
warning: Section `.reg-xstate/10527' in core file too small.
warning: Error reading shared library list entry at 0x646c747200000000
Failed to read a valid object file image from memory.
Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/10527' in core file too small.
(gdb) bt
#0 0x00007fe1c8090ffb in ?? ()
Backtrace stopped: Cannot access memory at address 0x7ffd59a40fe0
```

```
$ ls /var/crash/ -alh
total 12M
drwxrwxrwt 2 root whoopsie 4.0K Apr 17 15:44 .
drwxr-xr-x 14 root root 4.0K Mar 29 11:14 ..
-rw-r----- 1 bdrung whoopsie 12M Apr 17 15:44 _usr_bin_gnome-shell.1000.crash
-rw-r----- 1 bdrung whoopsie 376K Apr 17 15:44 _usr_lib_x86_64-linux-gnu_indicator-messages_indicator-messages-service.1000.crash
```

I disabled all processing afterwards to rule that out as the reason:

```
sudo systemctl stop whoopsie.path apport-autoreport.path apport-autoreport.timer
```

Changed in apport (Ubuntu):
importance: Undecided → High
status: New → Triaged
Benjamin Drung (bdrung) wrote :

If I kill gnome-shell the crash file is written correctly:

```
$ killall -11 gnome-shell
$ apport-retrace -g /var/crash/_usr_bin_gnome-shell.1000.crash
[...]
[New LWP 7620]
warning: Section `.reg-xstate/7229' in core file too small.
Downloading separate debug info for /usr/lib/x86_64-linux-gnu/mutter-12/libmutter-clutter-12.so.0
[...]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/gnome-shell'.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/7229' in core file too small.
#0 __pthread_kill_implementation (no_tid=0, signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
Download failed: Invalid argument. Continuing without source file ./nptl/./nptl/pthread_kill.c.
44 ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f250baaf600 (LWP 7229))]
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=11, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
[...]
```

Benjamin Drung (bdrung) wrote :

I modified apport to write the core file right at the beginning and it was correctly written. That points to an issue with apport itself. The written core file is around 500 MB.

Daniel van Vugt (vanvugt) wrote :

Yes I noticed full gnome-shell core files are often 500MB (as is the RSS), though they seem to be mostly zeroes as evidenced by the reported filesystem usage. Not saying that's a bug, just an observation.
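The "mostly zeroes" observation can be checked by comparing a file's apparent size with the space actually allocated for it. A small sketch, assuming Linux, where `st_blocks` is counted in 512-byte units (the helper name `sparse_usage` is made up):

```python
import os


def sparse_usage(path: str) -> tuple[int, int]:
    """Return (apparent_size, allocated_bytes) for a file.

    A sparse core file has an allocated size well below its apparent size,
    because runs of zeroes are stored as holes.
    """
    info = os.stat(path)
    # st_blocks counts 512-byte units, independent of the filesystem block size.
    return info.st_size, info.st_blocks * 512
```

For a 500 MB core that is mostly holes, the second value would be much smaller than the first; the same comparison is what `ls -l` versus `du` shows.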

Benjamin Drung (bdrung) wrote :

One day of debugging gave more insight. Apport is not directly the culprit. If reading from stdin takes too long, the input will be truncated. I attached a patch for apport that writes the coredump at the beginning and then quits. It can be slowed down by sleeping between the 1 MB blocks. The longer the sleep, the smaller the files get. For my VM: sleeping for 0.001s truncates the file slightly to ~400 MB; sleeping 0.005s truncates it to 42 MB. I verified that /proc/<pid> is still there afterwards.
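To illustrate the idea of the slowed-down reader (this is not the attached patch, just a sketch of the same approach with made-up names), a core_pattern pipe handler could copy its stdin in 1 MB blocks and sleep between them:

```python
import time
from typing import BinaryIO

BLOCK_SIZE = 1024 * 1024  # read in 1 MB blocks, as described above


def copy_slowly(source: BinaryIO, target: BinaryIO, delay: float,
                block_size: int = BLOCK_SIZE) -> int:
    """Copy source to target block by block, sleeping after each block.

    Returns the number of bytes written, so a truncated dump shows up
    as a smaller count than the expected core size.
    """
    written = 0
    while True:
        block = source.read(block_size)
        if not block:  # end of input (or the writer stopped early)
            break
        target.write(block)
        written += len(block)
        time.sleep(delay)  # artificial slowdown to provoke the truncation
    return written
```

In a real handler, `source` would be `sys.stdin.buffer` and `target` a file opened in binary mode; `delay` corresponds to the 0.001s and 0.005s values tried above.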

So I would say that the kernel team should have a look next.

Benjamin Drung (bdrung) wrote :

Note: Apport triggers this behavior because it compresses the core dump and base64-encodes the compressed result. That increases the time needed to read the coredump from stdin. In addition, apport also takes some time to collect the crash information.
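As a rough illustration of why this is slower than a plain copy, the sketch below compresses a stream with zlib and base64-encodes the result block by block. It is an assumption-laden stand-in, not apport's actual code or file format, and `encode_core` is a made-up name:

```python
import base64
import zlib
from typing import BinaryIO, Iterator


def encode_core(source: BinaryIO, block_size: int = 1024 * 1024) -> Iterator[bytes]:
    """Yield base64 chunks of the zlib-compressed input stream."""
    compressor = zlib.compressobj(6)
    pending = b""
    while True:
        block = source.read(block_size)
        if not block:
            break
        pending += compressor.compress(block)
        # base64 maps 3 input bytes to 4 output bytes; only encode whole
        # groups of 3 so the chunks can be concatenated without padding.
        usable = len(pending) - len(pending) % 3
        if usable:
            yield base64.b64encode(pending[:usable])
            pending = pending[usable:]
    pending += compressor.flush()
    if pending:
        yield base64.b64encode(pending)
```

Every block read from the pipe has to pass through the compressor and the encoder before the next read happens, so the reader falls behind a writer that expects its data to be consumed quickly.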

description: updated
description: updated
summary: - Repeatedly unusable truncated crash files
+ Repeatedly unusable truncated crash files (on log-out)
Benjamin Drung (bdrung) wrote :

This behavior is only noticeable on log-out. Killing gnome-shell works. I am using the 0001-apport-Write-coredump-at-beginning-and-quit.patch with a delay of 0.005s:

```
$ killall -11 gnome-shell
$ tail /var/log/apport.log
ERROR: apport (pid 15107) 2023-04-19 13:53:03,089: called for pid 14206, signal 11, core limit 0, dump mode 1
WARNING: apport (pid 15107) 2023-04-19 13:53:57,167: Wrote 639,475,712 bytes of core to /var/lib/apport/coredump/core._usr_bin_gnome-shell.1000.af6fdb2c-b9ba-4eeb-8727-683ac2247576.14206.5881189
$ gdb /usr/bin/gnome-shell /var/lib/apport/coredump/core._usr_bin_gnome-shell.1000.*
```

The 639 MB were successfully written, and that took 54 seconds.
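The duration and size can be read straight from the two log lines. A minimal sketch that parses them (the regular expressions are fitted to the lines quoted above and are an assumption about the log format in general; `dump_stats` is a made-up name):

```python
import re
from datetime import datetime

# Matches "apport (pid N) YYYY-MM-DD HH:MM:SS,mmm: message".
LOG_LINE = re.compile(
    r"apport \(pid \d+\) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}): (.*)"
)
WROTE = re.compile(r"Wrote ([\d,]+) bytes of core")


def dump_stats(lines: list[str]) -> tuple[int, float]:
    """Return (bytes_written, seconds between first and last log line)."""
    timestamps = []
    size = None
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        timestamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f"))
        wrote = WROTE.search(match.group(2))
        if wrote:
            size = int(wrote.group(1).replace(",", ""))
    if size is None or len(timestamps) < 2:
        raise ValueError("need a start line and a 'Wrote ... bytes' line")
    seconds = (timestamps[-1] - timestamps[0]).total_seconds()
    return size, seconds
```

For the log excerpt above this gives 639,475,712 bytes in about 54.1 seconds, i.e. roughly 11.8 MB/s.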

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2015857

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: patch
Benjamin Drung (bdrung) wrote :

Added package version to the description (apport-collect does not allow me to collect the data since I am not the original reporter).

description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed