WARN in trace_event_dyn_put_ref

Bug #1987232 reported by Krister Johansen
This bug affects 1 person
Affects         Status        Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed     Undecided   Unassigned
Jammy           Fix Released  Medium      Unassigned
Kinetic         Won't Fix     Undecided   Unassigned

Bug Description

[SRU Justification]

Impact: Imbalanced ref-counting regularly produces kernel warnings. Because these are logged at warning level, they trigger system monitoring on servers, which in turn causes unnecessary work inspecting the logs.

Fix: There is a fix upstream, which has also been backported to the upstream stable branch. However, we are still a bit behind in catching up with the latest stable versions. Since this is having quite an impact and the fix is rather straightforward, we pull it in from upstream stable ahead of time.

Test case: tbd

Regression potential: Regressions would manifest as different errors related to ref-counting.

---

I have systems that are regularly hitting a WARN in trace_event_dyn_put_ref.

The exact message is:

WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20

With the following stacktrace:

 perf_trace_init+0x8f/0xd0
 perf_tp_event_init+0x1f/0x40
 perf_try_init_event+0x4a/0x130
 perf_event_alloc+0x497/0xf40
 __do_sys_perf_event_open+0x1d4/0xf70
 __x64_sys_perf_event_open+0x20/0x30
 do_syscall_64+0x5c/0xc0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
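
For anyone who wants to check their own logs for the same message, here is a minimal sketch that scans kernel log text for a WARNING header of the shape quoted above. The function name `find_warns` and the regular expression are illustrative assumptions, not part of any existing tool; the pattern tolerates the symbol being wrapped onto the next line with a leading `+`.

```python
import re

# Matches a WARNING header like the one quoted in this report. The symbol may
# sit on the same line or be wrapped onto the next one with a leading '+'.
WARN_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+)\s+\+?(?P<symbol>[A-Za-z_]\w*)\+0x"
)


def find_warns(log_text, symbol="trace_event_dyn_put_ref"):
    """Return (cpu, pid, file, line) tuples for WARNs raised in `symbol`."""
    hits = []
    for m in WARN_RE.finditer(log_text):
        if m.group("symbol") == symbol:
            hits.append((int(m.group("cpu")), int(m.group("pid")),
                         m.group("file"), int(m.group("line"))))
    return hits


sample = (
    "WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 "
    "trace_event_dyn_put_ref+0x15/0x20\n"
)
print(find_warns(sample))  # [(1, 30309, 'kernel/trace/trace_dynevent.c', 46)]
```

In practice the input would be the saved output of `dmesg` or a journal export; the sample string here is just the header from this report.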

I've debugged this and worked with upstream to get a fix into Linux. It was recently merged in 6.0-rc2. See here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0-rc2&id=7249921d94ff64f67b733eca0b68853a62032b3d

The problem started appearing as soon as our systems picked up the linux-aws-5.15 branch for Focal. (That was 5.15.0-1015-aws, if memory serves.) Could you please cherry-pick this fix and pull it back to the linux and linux-aws kernels for Focal? There's a test here: https://<email address hidden>/ that reproduces the problem very reliably for me. With the patch applied, I no longer get the WARNs.
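
The fix is titled "tracing/perf: Fix double put of trace event when init fails". As a rough illustration of that class of bug, the toy model below shows how releasing a reference twice on an error path trips a warning. It is not kernel code: the class `RefCounted`, the helpers `try_init` and `count_warnings`, and the warn-and-clamp behaviour of `put` are all assumptions made for the sketch.

```python
import warnings


class RefCounted:
    """Toy stand-in for a reference-counted event (hypothetical, not kernel code)."""

    def __init__(self):
        self.refcnt = 0

    def get(self):
        self.refcnt += 1

    def put(self):
        # Assumption of this model: a put with no reference held emits a
        # warning and leaves the count at zero rather than going negative.
        if self.refcnt <= 0:
            warnings.warn("put without a matching get")
            self.refcnt = 0
            return
        self.refcnt -= 1


def try_init(event, init_ok, buggy):
    """Take a reference, attempt init, and release the reference on failure."""
    event.get()
    if init_ok:
        return True
    if buggy:
        event.put()  # the failing init step already released the reference
    event.put()      # the caller releases it too on failure
    return False


def count_warnings(buggy):
    """Run one failing init and report how many warnings it produced."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try_init(RefCounted(), init_ok=False, buggy=buggy)
    return len(caught)


print(count_warnings(buggy=True))   # 1
print(count_warnings(buggy=False))  # 0
```

With a single owner of the release on the failure path, the count stays balanced and no warning fires.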
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Aug 22 17:32 seq
 crw-rw---- 1 root audio 116, 33 Aug 22 17:32 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu27.24
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:

Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: Amazon EC2 c5d.12xlarge
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-1015-aws root=PARTUUID=4986e35b-1bd5-45d3-b528-fa2edb861a38 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
ProcVersionSignature: Ubuntu 5.15.0-1015.19~20.04.1-aws 5.15.39
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-1015-aws N/A
 linux-backports-modules-5.15.0-1015-aws N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: focal uec-images
Uname: Linux 5.15.0-1015-aws x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: False
dmi.bios.date: 10/16/2017
dmi.bios.release: 1.0
dmi.bios.vendor: Amazon EC2
dmi.bios.version: 1.0
dmi.board.asset.tag: i-03f5d8581c7ad94aa
dmi.board.vendor: Amazon EC2
dmi.chassis.asset.tag: Amazon EC2
dmi.chassis.type: 1
dmi.chassis.vendor: Amazon EC2
dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnc5d.12xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
dmi.product.name: c5d.12xlarge
dmi.sys.vendor: Amazon EC2

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1987232

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Krister Johansen (kmjohansen) wrote : CurrentDmesg.txt

apport information

tags: added: apport-collected focal uec-images
description: updated
Krister Johansen (kmjohansen) wrote : Lspci.txt

apport information

Krister Johansen (kmjohansen) wrote : Lspci-vt.txt

apport information

Krister Johansen (kmjohansen) wrote : ProcCpuinfoMinimal.txt

apport information

Krister Johansen (kmjohansen) wrote : ProcInterrupts.txt

apport information

Krister Johansen (kmjohansen) wrote : ProcModules.txt

apport information

Krister Johansen (kmjohansen) wrote : UdevDb.txt

apport information

Krister Johansen (kmjohansen) wrote : WifiSyslog.txt

apport information

Krister Johansen (kmjohansen) wrote : acpidump.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Krister Johansen (kmjohansen) wrote :
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
status: New → In Progress
Krister Johansen (kmjohansen) wrote :

Should this also get nominated as affecting Focal? I hit this on the 5.15 kernel that was attached to linux-aws for Focal.

Stefan Bader (smb) wrote :

@Krister, no. The affected series is determined by where the primarily affected kernel version sits; for 5.15 kernels this is 22.04/Jammy. On AWS the custom kernels roll forward, which is why you are running a 5.15-based kernel on Focal. The native Focal kernel is 5.4, and that is not affected, so overall this is not tracked against Focal.

description: updated
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Krister Johansen (kmjohansen) wrote :

@Stefan thanks for explaining how the process works. I appreciate your willingness to take this patch ahead of its arrival in the stable pull for the Jammy train. One of your updates mentioned TBD on a test. I have a reproducer in the original cover letter to Steven here, if it helps:

https://<email address hidden>/

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-50.56 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Krister Johansen (kmjohansen) wrote :

I ran the original reproducer on VMs running linux/5.15.0-50.56 and linux/5.15.0-46.49. On the former the problem did not reproduce, but on the latter it did. Marking this as verified via testing and setting 'verification-done-jammy'.
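
A rough sketch of how the two versions tested above could be compared in a script. It only understands the `X.Y.Z-ABI.UPLOAD` shape seen in this report and is not a full Debian version comparison; the names `parse_version` and `has_fix` are hypothetical. The comparison is only meaningful within the Jammy `linux` source package, since derivative kernels use their own numbering.

```python
import re

# Only the 'X.Y.Z-ABI.UPLOAD' shape used in this report is recognised.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)")


def parse_version(text):
    """Turn '5.15.0-50.56' into (5, 15, 0, 50, 56)."""
    m = _VERSION_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"unsupported version format: {text!r}")
    return tuple(int(part) for part in m.groups())


def has_fix(running, fixed="5.15.0-50.56"):
    """True if `running` is at or after the upload that carries the fix."""
    return parse_version(running) >= parse_version(fixed)


print(has_fix("5.15.0-50.56"))  # True
print(has_fix("5.15.0-46.49"))  # False
```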

tags: added: verification-done-jammy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-50.56

---------------
linux (5.15.0-50.56) jammy; urgency=medium

  * jammy/linux: 5.15.0-50.56 -proposed tracker (LP: #1990148)

  * CVE-2022-3176
    - io_uring: refactor poll update
    - io_uring: move common poll bits
    - io_uring: kill poll linking optimisation
    - io_uring: inline io_poll_complete
    - io_uring: correct fill events helpers types
    - io_uring: clean cqe filling functions
    - io_uring: poll rework
    - io_uring: remove poll entry from list when canceling all
    - io_uring: bump poll refs to full 31-bits
    - io_uring: fail links when poll fails
    - io_uring: fix wrong arm_poll error handling
    - io_uring: fix UAF due to missing POLLFREE handling

  * ip/nexthop: fix default address selection for connected nexthop
    (LP: #1988809)
    - selftests/net: test nexthop without gw

  * ip/nexthop: fix default address selection for connected nexthop
    (LP: #1988809) // icmp_redirect.sh in ubuntu_kernel_selftests failed on
    Jammy 5.15.0-49.55 (LP: #1990124)
    - ip: fix triggering of 'icmp redirect'

linux (5.15.0-49.55) jammy; urgency=medium

  * jammy/linux: 5.15.0-49.55 -proposed tracker (LP: #1989785)

  * amdgpu module crash after 5.15 kernel update (LP: #1981883)
    - drm/amdgpu: fix check in fbdev init

  * scsi: hisi_sas: Increase debugfs_dump_index after dump is completed
    (LP: #1982070)
    - scsi: hisi_sas: Increase debugfs_dump_index after dump is completed

  * [UBUNTU 22.04] s390/qeth: cache link_info for ethtool (LP: #1984103)
    - s390/qeth: cache link_info for ethtool

  * WARN in trace_event_dyn_put_ref (LP: #1987232)
    - tracing/perf: Fix double put of trace event when init fails

  * Jammy update: v5.15.60 upstream stable release (LP: #1989221)
    - x86/speculation: Make all RETbleed mitigations 64-bit only
    - selftests/bpf: Extend verifier and bpf_sock tests for dst_port loads
    - selftests/bpf: Check dst_port only on the client socket
    - block: fix default IO priority handling again
    - tools/vm/slabinfo: Handle files in debugfs
    - ACPI: video: Force backlight native for some TongFang devices
    - ACPI: video: Shortening quirk list by identifying Clevo by board_name only
    - ACPI: APEI: Better fix to avoid spamming the console with old error logs
    - crypto: arm64/poly1305 - fix a read out-of-bound
    - KVM: x86: do not report a vCPU as preempted outside instruction boundaries
    - KVM: x86: do not set st->preempted when going back to user space
    - KVM: selftests: Make hyperv_clock selftest more stable
    - tools/kvm_stat: fix display of error when multiple processes are found
    - selftests: KVM: Handle compiler optimizations in ucall
    - KVM: x86/svm: add __GFP_ACCOUNT to __sev_dbg_{en,de}crypt_user()
    - arm64: set UXN on swapper page tables
    - btrfs: zoned: prevent allocation from previous data relocation BG
    - btrfs: zoned: fix critical section of relocation inode writeback
    - Bluetooth: hci_bcm: Add BCM4349B1 variant
    - Bluetooth: hci_bcm: Add DT compatible for CYW55572
    - dt-bindings: bluetooth: broadcom: Add BCM4349B1 DT binding
    - Bluetooth: btusb: Add support of IMC Netw...

Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gkeop-5.15/5.15.0-1005.7~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-bluefield/5.15.0-1010.12 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-bluefield verification-needed-jammy
removed: verification-done-jammy
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia/5.15.0-1011.11 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-jammy-linux-nvidia
Utkarsh Gupta (utkarsh) wrote :

Ubuntu 22.10 (Kinetic Kudu) has reached end of life, so this bug will not be fixed for that specific release.

Changed in linux (Ubuntu Kinetic):
status: Confirmed → Won't Fix