Hard lockup with "watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [Xorg:13615]" in the journal
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
nvidia-graphics-drivers-450 (Ubuntu) | New | Undecided | Unassigned | |
Bug Description
The system was restored from hibernation this morning, but the issue did not appear until roughly 30 minutes after "boot". I have also seen hard locks without hibernation, but those have never produced any journal output, so they may be a different issue. In `journalctl -k` I see a block like the one below repeated every few seconds. I've attached the `journalctl -k` output, truncated to start at this morning's resume from hibernation.
Nov 27 09:42:09 surprise kernel: watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [Xorg:13615]
Nov 27 09:42:09 surprise kernel: Modules linked in: hid_logitech unix_diag vhost_net tap vhost_vsock vmw_vsock_
Nov 27 09:42:09 surprise kernel: sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_crypt hid_logitech_hidpp hid_microsoft hid_logitech_dj ff_memless hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops crypto_simd cec cryptd glue_helper rc_core drm i2c_piix4 i2c_nvidia_gpu nvme r8169 ahci xhci_pci nvme_core realtek xhci_pci_renesas libahci wmi gpio_amdpt gpio_generic
Nov 27 09:42:09 surprise kernel: CPU: 10 PID: 13615 Comm: Xorg Tainted: P OE 5.8.0-29-generic #31-Ubuntu
Nov 27 09:42:09 surprise kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
Nov 27 09:42:09 surprise kernel: RIP: 0010:_nv001550k
Nov 27 09:42:09 surprise kernel: Code: 53 28 e9 0e fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 41 54 55 49 89 fc 53 48 8d 5f 38 48 89 f5 48 89 df e8 9a 48 00 00 <84> c0 75 46 49 8b 54 24 40 48 39 d3 48 8b 42 18 74 17 48 39 c5 75
Nov 27 09:42:09 surprise kernel: RSP: 0018:ffffb479cf
Nov 27 09:42:09 surprise kernel: RAX: ffffffffc1a15800 RBX: ffff9c30b2233640 RCX: 00000000001f1623
Nov 27 09:42:09 surprise kernel: RDX: ffff9c2fd6186ac8 RSI: ffff9c2d33ef4008 RDI: ffff9c30b2233640
Nov 27 09:42:09 surprise kernel: RBP: ffff9c2d33ef4008 R08: ffffb479cf577830 R09: 0000000000000001
Nov 27 09:42:09 surprise kernel: R10: ffff9c2cd20fbbc0 R11: 000000000000001a R12: ffff9c30b2233608
Nov 27 09:42:09 surprise kernel: R13: 0000000000000000 R14: ffff9c2d33ef4008 R15: 0000000000000002
Nov 27 09:42:09 surprise kernel: FS: 00007f82d95e4a4
Nov 27 09:42:09 surprise kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 27 09:42:09 surprise kernel: CR2: 000055683e860ff8 CR3: 000000037893c000 CR4: 00000000003406e0
Nov 27 09:42:09 surprise kernel: Call Trace:
Nov 27 09:42:09 surprise kernel: ? _nv001123kms+
Nov 27 09:42:09 surprise kernel: ? _nv000732kms+
Nov 27 09:42:09 surprise kernel: ? _nv002395kms+
Nov 27 09:42:09 surprise kernel: ? _nv000515kms+
Nov 27 09:42:09 surprise kernel: ? _nv000019kms+
Nov 27 09:42:09 surprise kernel: ? kfree+0xb8/0x220
Nov 27 09:42:09 surprise kernel: ? os_free_
Nov 27 09:42:09 surprise kernel: ? _nv008503rm+
Nov 27 09:42:09 surprise kernel: ? _nv035038rm+
Nov 27 09:42:09 surprise kernel: ? _nv030385rm+
Nov 27 09:42:09 surprise kernel: ? _nv033621rm+
Nov 27 09:42:09 surprise kernel: ? _nv008135rm+
Nov 27 09:42:09 surprise kernel: ? os_acquire_
Nov 27 09:42:09 surprise kernel: ? os_release_
Nov 27 09:42:09 surprise kernel: ? _nv037019rm+
Nov 27 09:42:09 surprise kernel: ? nvidia_
Nov 27 09:42:09 surprise kernel: ? _nv002759kms+
Nov 27 09:42:09 surprise kernel: ? mpol_rebind_
Nov 27 09:42:09 surprise kernel: ? _nv000531kms+
Nov 27 09:42:09 surprise kernel: ? nvKmsIoctl+
Nov 27 09:42:09 surprise kernel: ? nvkms_ioctl+
Nov 27 09:42:09 surprise kernel: ? nvidia_
Nov 27 09:42:09 surprise kernel: ? ksys_ioctl+
Nov 27 09:42:09 surprise kernel: ? __x64_sys_
Nov 27 09:42:09 surprise kernel: ? do_syscall_
Nov 27 09:42:09 surprise kernel: ? entry_SYSCALL_
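Because the watchdog message above repeats every few seconds, it can help to tally the reports and check that they always name the same CPU and process. A minimal Python sketch, assuming the exact `watchdog: BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]` layout shown here; `parse_soft_lockups` and `summarize` are hypothetical helper names, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the kernel watchdog message, e.g.
# "watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [Xorg:13615]"
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(journal_text):
    """Return (cpu, seconds, comm, pid) tuples for every soft-lockup line."""
    hits = []
    for line in journal_text.splitlines():
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            hits.append((int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"])))
    return hits


def summarize(hits):
    """Count soft-lockup reports per (CPU, "comm:pid") pair."""
    return Counter((cpu, f"{comm}:{pid}") for cpu, _, comm, pid in hits)
```

On the affected machine the input text could come from `subprocess.run(["journalctl", "-k"], capture_output=True, text=True).stdout` (Python 3.7 or later).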
ProblemType: Bug
DistroRelease: Ubuntu 20.10
Package: nvidia-driver-450 450.80.02-0ubuntu1
ProcVersionSign
Uname: Linux 5.8.0-29-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.11-0ubuntu50.2
Architecture: amd64
CasperMD5CheckR
CurrentDesktop: i3
Date: Fri Nov 27 10:11:35 2020
InstallationDate: Installed on 2019-05-07 (569 days ago)
InstallationMedia: Ubuntu 18.04.2 LTS "Bionic Beaver" - Release amd64 (20190210)
SourcePackage: nvidia-
UpgradeStatus: Upgraded to groovy on 2020-06-22 (157 days ago)
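The call-trace lines in this report follow the kernel's `symbol+0xOFFSET/0xSIZE [module]` layout, where a leading `?` marks an address the unwinder found on the stack but could not confirm as part of the call chain (for example `? kfree+0xb8/0x220` in the Xorg trace above). A minimal Python sketch for pulling complete frames out of journal text, assuming that layout; `Frame` and `parse_frame` are hypothetical names. Frames whose text is cut off in this report, such as `? _nv001123kms+`, are skipped rather than guessed.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    reliable: bool          # False when the kernel prefixed the frame with "?"
    symbol: str
    offset: int             # bytes into the function
    size: int               # total length of the function
    module: Optional[str]   # None for frames in the core kernel image


# "<timestamp> <host> kernel: [? ]symbol+0xOFF/0xSIZE[ [module]]"
FRAME_RE = re.compile(
    r"kernel: (?P<unreliable>\? )?(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<mod>\w+)\])?\s*$"
)


def parse_frame(line):
    """Parse one call-trace line; return None if it is not a complete frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(
        reliable=m["unreliable"] is None,
        symbol=m["sym"],
        offset=int(m["off"], 16),
        size=int(m["size"], 16),
        module=m["mod"],
    )
```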
Looking further through the journal, I also see call traces that do not involve NVIDIA code, such as:
Nov 27 09:43:52 surprise kernel: INFO: task qemu-system-x86:16736 blocked for more than 120 seconds.
Nov 27 09:43:52 surprise kernel: Tainted: P OEL 5.8.0-29-generic #31-Ubuntu
Nov 27 09:43:52 surprise kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 27 09:43:52 surprise kernel: qemu-system-x86 D 0 16736 1 0x00000320
Nov 27 09:43:52 surprise kernel: Call Trace:
Nov 27 09:43:52 surprise kernel: __schedule+0x212/0x5d0
Nov 27 09:43:52 surprise kernel: ? usleep_range+0x90/0x90
Nov 27 09:43:52 surprise kernel: schedule+0x55/0xc0
Nov 27 09:43:52 surprise kernel: schedule_timeout+0x10f/0x160
Nov 27 09:43:52 surprise kernel: ? do_sync_core+0x1d/0x20
Nov 27 09:43:52 surprise kernel: __wait_for_common+0xa8/0x150
Nov 27 09:43:52 surprise kernel: wait_for_completion+0x24/0x30
Nov 27 09:43:52 surprise kernel: __wait_rcu_gp+0x11b/0x120
Nov 27 09:43:52 surprise kernel: synchronize_rcu+0x67/0x70
Nov 27 09:43:52 surprise kernel: ? __call_rcu+0x250/0x250
Nov 27 09:43:52 surprise kernel: ? __bpf_trace_rcu_utilization+0x10/0x10
Nov 27 09:43:52 surprise kernel: account_event+0x1e8/0x1f0
Nov 27 09:43:52 surprise kernel: perf_event_alloc+0x77e/0x920
Nov 27 09:43:52 surprise kernel: ? kvm_perf_overflow+0x40/0x40 [kvm]
Nov 27 09:43:52 surprise kernel: perf_event_create_kernel_counter.part.0+0x21/0x160
Nov 27 09:43:52 surprise kernel: perf_event_create_kernel_counter+0xf/0x20
Nov 27 09:43:52 surprise kernel: pmc_reprogram_counter+0x105/0x190 [kvm]
Nov 27 09:43:52 surprise kernel: reprogram_gp_counter+0x194/0x210 [kvm]
Nov 27 09:43:52 surprise kernel: amd_pmu_set_msr+0x17d/0x190 [kvm_amd]
Nov 27 09:43:52 surprise kernel: kvm_pmu_set_msr+0x4e/0x60 [kvm]
Nov 27 09:43:52 surprise kernel: kvm_set_msr_common+0x4cc/0xf00 [kvm]
Nov 27 09:43:52 surprise kernel: svm_set_msr+0x39d/0x6e0 [kvm_amd]
Nov 27 09:43:52 surprise kernel: __kvm_set_msr+0x8a/0x150 [kvm]
Nov 27 09:43:52 surprise kernel: kvm_emulate_wrmsr+0x3c/0x120 [kvm]
Nov 27 09:43:52 surprise kernel: handle_exit+0x39a/0x420 [kvm_amd]
Nov 27 09:43:52 surprise kernel: ? kvm_set_cr8+0x22/0x40 [kvm]
Nov 27 09:43:52 surprise kernel: vcpu_enter_guest+0x862/0xd90 [kvm]
Nov 27 09:43:52 surprise kernel: ? kvm_apic_has_interrupt+0x41/0x80 [kvm]
Nov 27 09:43:52 surprise kernel: ? kvm_cpu_has_interrupt+0x7a/0x90 [kvm]
Nov 27 09:43:52 surprise kernel: ? kvm_vcpu_has_events+0x134/0x190 [kvm]
Nov 27 09:43:52 surprise kernel: vcpu_run+0x76/0x240 [kvm]
Nov 27 09:43:52 surprise kernel: kvm_arch_vcpu_ioctl_run+0x9f/0x2b0 [kvm]
Nov 27 09:43:52 surprise kernel: kvm_vcpu_ioctl+0x247/0x600 [kvm]
Nov 27 09:43:52 surprise kernel: ksys_ioctl+0x8e/0xc0
Nov 27 09:43:52 surprise kernel: __x64_sys_ioctl+0x1a/0x20
Nov 27 09:43:52 surprise kernel: do_syscall_64+0x49/0xc0
Nov 27 09:43:52 surprise kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 27 09:43:52 surprise kernel: RIP: 0033:0x7fc2853b16d7
Nov 27 09:43:52 surprise kernel: Code: Bad RIP value.
Nov 27 09:43:52 surprise kernel: RSP: 002b:00007fc276220068 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 27 09:43:52 surp...
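The two reports carry different `Tainted:` fields: `P OE` on the Xorg soft lockup and `P OEL` on the later qemu hung-task report. Per the kernel's tainted-kernels documentation, P, O and E mean a proprietary, an externally built and an unsigned module are loaded (consistent with `nvidia(POE)` in the module list), and L means a soft lockup has already occurred on the system. A small Python decoder covering only those letters plus G is sketched below; `TAINT_FLAGS` and `decode_taint` are hypothetical names, and the table is deliberately a subset of the kernel's full list.

```python
# Subset of the kernel's taint letters (see the kernel's
# Documentation/admin-guide/tainted-kernels.rst); only the letters seen
# in this report, plus G, are listed here.
TAINT_FLAGS = {
    "G": "no proprietary module loaded",
    "P": "proprietary module loaded",
    "O": "externally-built (out-of-tree) module loaded",
    "E": "unsigned module loaded",
    "L": "soft lockup occurred",
}


def decode_taint(taint):
    """Map each letter of a 'Tainted:' field to its meaning.

    Whitespace is ignored because the kernel prints a space for most unset
    flags (collapsed to single spaces in this report). Letters outside the
    subset above are reported as unknown rather than guessed.
    """
    return {
        ch: TAINT_FLAGS.get(ch, "not in this subset")
        for ch in taint
        if not ch.isspace()
    }
```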