NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out

Bug #1874464 reported by Dan Watkins
This bug affects 10 people
Affects         Status      Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed   Undecided   Unassigned
Focal           Incomplete  Undecided   Unassigned
Groovy          Confirmed   Undecided   Unassigned

Bug Description

Running focal on a desktop, I accidentally clicked "Enable networking" in nm-applet, disabling my networking. When I clicked it again to re-enable it, my networking did not return. After unsuccessfully poking at it for a while, I rebooted and saw the trace below (and still no networking). `rmmod r8169; modprobe r8169` had no (apparent) effect, nor did further reboots. I rebooted into two different kernels, 5.4.0-21-generic and 5.4.0-26-generic; both exhibited the same behaviour.

I was finally only able to restore networking by _rebooting into Windows_ and then rebooting back into Ubuntu.

(My supposition is that NetworkManager/the kernel set *waves hands* something on the network card that persists across boots when it was disabled, and that wasn't correctly unset when I reenabled networking (or on following boots), but Windows _does_ correctly handle that case on boot, and reset it to a working state.)

Apr 23 10:07:43 surprise kernel: ------------[ cut here ]------------
Apr 23 10:07:43 surprise kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
Apr 23 10:07:43 surprise kernel: WARNING: CPU: 9 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x258/0x260
Apr 23 10:07:43 surprise kernel: Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) zlua(PO) xt_comment dummy xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_>
Apr 23 10:07:43 surprise kernel: autofs4 btrfs xor zstd_compress raid6_pq libcrc32c dm_crypt hid_microsoft ff_memless hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drm_km>
Apr 23 10:07:43 surprise kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: P OE 5.4.0-26-generic #30-Ubuntu
Apr 23 10:07:43 surprise kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
Apr 23 10:07:43 surprise kernel: RIP: 0010:dev_watchdog+0x258/0x260
Apr 23 10:07:43 surprise kernel: Code: 85 c0 75 e5 eb 9f 4c 89 ff c6 05 bf 06 e8 00 01 e8 6d bb fa ff 44 89 e9 4c 89 fe 48 c7 c7 50 6d a3 b4 48 89 c2 e8 83 3f 71 ff <0f> 0b eb 80 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7
Apr 23 10:07:43 surprise kernel: RSP: 0018:ffffa90d40378e30 EFLAGS: 00010286
Apr 23 10:07:43 surprise kernel: RAX: 0000000000000000 RBX: ffff8a7578b00400 RCX: 0000000000000000
Apr 23 10:07:43 surprise kernel: RDX: ffff8a758ee67740 RSI: ffff8a758ee578c8 RDI: 0000000000000300
Apr 23 10:07:43 surprise kernel: RBP: ffffa90d40378e60 R08: ffff8a758ee578c8 R09: 0000000000000004
Apr 23 10:07:43 surprise kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
Apr 23 10:07:43 surprise kernel: R13: 0000000000000000 R14: ffff8a758cadc480 R15: ffff8a758cadc000
Apr 23 10:07:43 surprise kernel: FS: 0000000000000000(0000) GS:ffff8a758ee40000(0000) knlGS:0000000000000000
Apr 23 10:07:43 surprise kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 23 10:07:43 surprise kernel: CR2: 00007f4d2000d5eb CR3: 00000003fcfe2000 CR4: 00000000003406e0
Apr 23 10:07:43 surprise kernel: Call Trace:
Apr 23 10:07:43 surprise kernel: <IRQ>
Apr 23 10:07:43 surprise kernel: ? pfifo_fast_enqueue+0x150/0x150
Apr 23 10:07:43 surprise kernel: call_timer_fn+0x32/0x130
Apr 23 10:07:43 surprise kernel: __run_timers.part.0+0x180/0x280
Apr 23 10:07:43 surprise kernel: ? tick_sched_handle+0x33/0x60
Apr 23 10:07:43 surprise kernel: ? tick_sched_timer+0x3d/0x80
Apr 23 10:07:43 surprise kernel: ? ktime_get+0x3e/0xa0
Apr 23 10:07:43 surprise kernel: run_timer_softirq+0x2a/0x50
Apr 23 10:07:43 surprise kernel: __do_softirq+0xe1/0x2d6
Apr 23 10:07:43 surprise kernel: ? hrtimer_interrupt+0x13b/0x220
Apr 23 10:07:43 surprise kernel: irq_exit+0xae/0xb0
Apr 23 10:07:43 surprise kernel: smp_apic_timer_interrupt+0x7b/0x140
Apr 23 10:07:43 surprise kernel: apic_timer_interrupt+0xf/0x20
Apr 23 10:07:43 surprise kernel: </IRQ>
Apr 23 10:07:43 surprise kernel: RIP: 0010:cpuidle_enter_state+0xc5/0x450
Apr 23 10:07:43 surprise kernel: Code: ff e8 df 0d 81 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 32 7a 87 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
Apr 23 10:07:43 surprise kernel: RSP: 0018:ffffa90d4016fe38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Apr 23 10:07:43 surprise kernel: RAX: ffff8a758ee6ad00 RBX: ffffffffb4d69160 RCX: 000000000000001f
Apr 23 10:07:43 surprise kernel: RDX: 0000000000000000 RSI: 00000000239f52d0 RDI: 0000000000000000
Apr 23 10:07:43 surprise kernel: RBP: ffffa90d4016fe78 R08: 0000011276ac078f R09: 00000000000148ba
Apr 23 10:07:43 surprise kernel: R10: ffff8a758ee69a00 R11: ffff8a758ee699e0 R12: ffff8a757a318000
Apr 23 10:07:43 surprise kernel: R13: 0000000000000002 R14: 0000000000000002 R15: ffff8a757a318000
Apr 23 10:07:43 surprise kernel: ? cpuidle_enter_state+0xa1/0x450
Apr 23 10:07:43 surprise kernel: cpuidle_enter+0x2e/0x40
Apr 23 10:07:43 surprise kernel: call_cpuidle+0x23/0x40
Apr 23 10:07:43 surprise kernel: do_idle+0x1dd/0x270
Apr 23 10:07:43 surprise kernel: cpu_startup_entry+0x20/0x30
Apr 23 10:07:43 surprise kernel: start_secondary+0x167/0x1c0
Apr 23 10:07:43 surprise kernel: secondary_startup_64+0xa4/0xb0
Apr 23 10:07:43 surprise kernel: ---[ end trace cf93a9794ecfd126 ]---
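When comparing traces from several machines, it can help to pull just the interface, driver, and queue number out of each watchdog line. The following is a minimal sketch, assuming the kernel log is piped in on stdin (for example from `journalctl -k` or `dmesg`); the function name `watchdog_summary` is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: summarise "NETDEV WATCHDOG" lines from a kernel log
# read on stdin. Prints one "interface driver queue" triple per matching line.
watchdog_summary() {
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit queue \([0-9]*\) timed out.*/\1 \2 \3/p'
}

# Example (not run here): journalctl -k -b | watchdog_summary
```

For the trace above this would reduce the watchdog line to the interface `enp5s0`, the driver `r8169`, and queue `0`.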

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-generic 5.4.0.26.32
ProcVersionSignature: Ubuntu 5.4.0-26.30-generic 5.4.30
Uname: Linux 5.4.0-26-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: i3
Date: Thu Apr 23 10:38:05 2020
InstallationDate: Installed on 2019-05-07 (351 days ago)
InstallationMedia: Ubuntu 18.04.2 LTS "Bionic Beaver" - Release amd64 (20190210)
MachineType: Gigabyte Technology Co., Ltd. B450M DS3H
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-26-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash resume=UUID=73909634-a75d-42c9-8f66-a69138690756 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-26-generic N/A
 linux-backports-modules-5.4.0-26-generic N/A
 linux-firmware 1.187
RfKill:

SourcePackage: linux
UpgradeStatus: Upgraded to focal on 2019-11-15 (159 days ago)
dmi.bios.date: 01/25/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F4
dmi.board.asset.tag: Default string
dmi.board.name: B450M DS3H-CF
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF4:bd01/25/2019:svnGigabyteTechnologyCo.,Ltd.:pnB450MDS3H:pvrDefaultstring:rvnGigabyteTechnologyCo.,Ltd.:rnB450MDS3H-CF:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: Default string
dmi.product.name: B450M DS3H
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Dan Watkins (oddbloke) wrote :

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Dan Watkins (oddbloke) wrote :

Here are the NetworkManager logs when I first disabled my networking (note that enp5s0 does get its link up reported correctly, but no traffic seems to go over it):

Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1626] manager: disable requested (sleeping: no enabled: yes)
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1627] manager: NetworkManager state is now ASLEEP
Apr 23 09:42:54 surprise dbus-daemon[1280]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.9' (uid=0 pid=1281 comm="/usr/sbin/NetworkManager --no-daemon " label="unconfined")
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1654] device (enp5s0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'managed')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1695] device (virbr0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'external')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1703] device (lxdbr0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'external')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1711] device (docker0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'external')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1718] device (tun0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'external')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1744] device (mpqemubr0): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'external')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1754] device (tap-2eb5e2d1a05): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'external')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1756] device (mpqemubr0): bridge port tap-2eb5e2d1a05 was detached
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1756] device (tap-2eb5e2d1a05): released from master device mpqemubr0
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1764] device (tap-8b9ffd8bf3f): state change: activated -> deactivating (reason 'sleeping', sys-iface-state: 'external')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1765] device (mpqemubr0): bridge port tap-8b9ffd8bf3f was detached
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.1765] device (tap-8b9ffd8bf3f): released from master device mpqemubr0
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.2065] device (enp5s0): state change: deactivating -> disconnected (reason 'sleeping', sys-iface-state: 'managed')
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.2124] dhcp4 (enp5s0): canceled DHCP transaction
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.2124] dhcp4 (enp5s0): state changed extended -> done
Apr 23 09:42:54 surprise NetworkManager[1281]: <info> [1587649374.2132] dhcp6 (enp5s0): canceled DHCP tr...

Dan Watkins (oddbloke) wrote :

I'd be happy to assist in debugging this, is there anything I can do to help track it down?

You-Sheng Yang (vicamo)
tags: added: hwe-networking-ethernet

Nisalon Caje (nisalon-caje) wrote :

I have the exact same bug on focal.

But it happens randomly for me, while my server is under load.
2 of the servers I migrated to focal have the same issue (not in the same DC), which rules out a hardware fault in this particular machine.

Here is my syslog
May 26 10:04:15 service01K kernel: [161735.901135] ------------[ cut here ]------------
May 26 10:04:15 service01K kernel: [161735.901136] NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 2 timed out
May 26 10:04:15 service01K kernel: [161735.901145] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x258/0x260
May 26 10:04:15 service01K kernel: [161735.901146] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport isofs ip6table_filter ip6_tables xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif kvm_intel kvm joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler video acpi_pad acpi_tad sch_fq_codel ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear hid_generic uas usbhid hid usb_storage raid1 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt crypto_simd fb_sys_fops ixgbe cryptd glue_helper nvme drm xfrm_algo ahci dca mdio libahci nvme_core
May 26 10:04:15 service01K kernel: [161735.901163] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-31-generic #35-Ubuntu
May 26 10:04:15 service01K kernel: [161735.901163] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./E3C246D4U2-2T, BIOS L2.02K 12/18/2019
May 26 10:04:15 service01K kernel: [161735.901164] RIP: 0010:dev_watchdog+0x258/0x260
May 26 10:04:15 service01K kernel: [161735.901165] Code: 85 c0 75 e5 eb 9f 4c 89 ff c6 05 ef f6 e7 00 01 e8 6d bb fa ff 44 89 e9 4c 89 fe 48 c7 c7 40 73 43 ba 48 89 c2 e8 03 30 71 ff <0f> 0b eb 80 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7
May 26 10:04:15 service01K kernel: [161735.901165] RSP: 0018:ffffb8774003ce30 EFLAGS: 00010286
May 26 10:04:15 service01K kernel: [161735.901166] RAX: 0000000000000000 RBX: ffff891bdf924ec0 RCX: 0000000000000006
May 26 10:04:15 service01K kernel: [161735.901166] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff891bee8578c0
May 26 10:04:15 service01K kernel: [161735.901167] RBP: ffffb8774003ce60 R08: 000000000000046b R09: 0000000000000004
May 26 10:04:15 service01K kernel: [161735.901167] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000040
May 26 10:04:15 service01K kernel: [161735.901167] R13: 0000000000000002 R14: ffff891bdf980480 R15: ffff891bdf980000
May 26 10:04:15 service01K kernel: [161735.901168] FS: 0000000000000000(0000) GS:ffff891bee840000(0000) knlGS:0000000000000000
May 26 10:04:15 service01K kernel: [161735.901168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 26 10:04:15 service01K kernel: [161735.901169] CR2: 00007fb0077d6148 CR3: 0000000894f1c006 CR4: 00000000003606e0
May 26 10:04:15 service01K kernel: [161735.901169] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 2...

Kai-Heng Feng (kaihengfeng) wrote :

Can you please test kernel parameter "pcie_aspm=off"?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete

Jeremy Soller (jackpot51) wrote :

AceLan Kao (acelankao) wrote :

Dan,

Please try Kai-Heng's suggestion of adding "pcie_aspm=off" to see if it helps.
If it does, please also attach the following logs:
   1. sudo lspci -xxxs 00:01.3
   2. sudo lspci -xxxs 05:00.0

Thanks.

Kai-Heng Feng (kaihengfeng) wrote :

Jeremy, the link you provided is the same as this bug.

Kai-Heng Feng (kaihengfeng) wrote :

Ok, it's LP: #1874464.

Does this issue also happen on System76 platforms?
In addition to L1.1, does it help if L1.2 gets disabled?

AceLan Kao (acelankao)
Changed in linux (Ubuntu Focal):
status: New → Incomplete

Rene Meier (meier.rene) wrote :

I can confirm this error on my system after upgrading from bionic to focal. The system works "normally", but the error appears in the logs.

Adding "pcie_aspm=off" does not help in my case.

Rene Meier (meier.rene) wrote :

here comes the error:

[ 60.259403] ------------[ cut here ]------------
[ 60.259405] NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
[ 60.259416] WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x258/0x260
[ 60.259417] Modules linked in: vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter aufs overlay nls_iso8859_1 snd_hda_codec_hdmi intel_rapl_msr mei_hdcp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi intel_rapl_common bridge x86_pkg_temp_thermal intel_powerclamp stp snd_seq llc coretemp kvm_intel snd_seq_device kvm input_leds snd_timer intel_cstate intel_rapl_perf eeepc_wmi asus_wmi sparse_keymap wmi_bmof snd soundcore mei_me mei mac_hid acpi_pad sch_fq_codel parport_pc ppdev lp parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
[ 60.259438] crypto_simd i915 mxm_wmi i2c_algo_bit cryptd glue_helper drm_kms_helper nvme syscopyarea sysfillrect sysimgblt fb_sys_fops r8169 i2c_i801 drm nvme_core ahci realtek libahci wmi video
[ 60.259444] CPU: 6 PID: 0 Comm: swapper/6 Tainted: G OE 5.4.0-33-generic #37-Ubuntu
[ 60.259445] Hardware name: bluechip Computer AG BUSINESSline/Z170-K, BIOS 3805 05/16/2018
[ 60.259446] RIP: 0010:dev_watchdog+0x258/0x260
[ 60.259447] Code: 85 c0 75 e5 eb 9f 4c 89 ff c6 05 ef f6 e7 00 01 e8 6d bb fa ff 44 89 e9 4c 89 fe 48 c7 c7 40 73 e3 8f 48 89 c2 e8 03 30 71 ff <0f> 0b eb 80 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7
[ 60.259448] RSP: 0018:ffffaf7200234e30 EFLAGS: 00010286
[ 60.259448] RAX: 0000000000000000 RBX: ffff9c6ded99fa00 RCX: 0000000000000006
[ 60.259449] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9c6df6b978c0
[ 60.259449] RBP: ffffaf7200234e60 R08: 00000000000004a0 R09: 0000000000000004
[ 60.259450] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
[ 60.259450] R13: 0000000000000000 R14: ffff9c6defed4480 R15: ffff9c6defed4000
[ 60.259451] FS: 0000000000000000(0000) GS:ffff9c6df6b80000(0000) knlGS:0000000000000000
[ 60.259451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 60.259452] CR2: 00001a62928112d0 CR3: 00000009e1e0a002 CR4: 00000000003606e0
[ 60.259452] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 60.259453] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 60.259453] Call Trace:
[ 60.259454] <IRQ>
[ 60.259456] ? pfifo_fast_enqueue+0x150/0x150
[ 60.259458] call_timer_fn+0x32/0x130
[ 60.259459] __run_timers.part.0+0x180/0x280
[ 60.259461] ? timerqueue_add+0x68/0xb0
[ 60.259462] ? enqueue_hrtimer+0x3d/0x90
[ 60.259464] ? recalibrate_cpu_khz+0x10/0x10
[ 60.259465] ? ktime_get+0x3e/0xa0
[ 60.259466] run_timer_softirq+0x2a/0x50
[ 60.259467] __do_soft...

Dan Watkins (oddbloke) wrote :

I'll try the command line change after my morning meetings; here's the requested debug output:

$ sudo lspci -xxxs 00:01.3
00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00: 22 10 53 14 07 04 10 00 00 00 04 06 10 00 81 00
10: 00 00 00 00 00 00 00 00 00 02 06 00 f1 f1 00 20
20: 50 f7 60 f7 21 f2 21 f2 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 12 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 01 58 03 c8 00 00 00 00 10 a0 42 01 22 80 00 00
60: 1f 29 00 00 43 f8 72 02 40 00 43 70 00 00 04 00
70: 00 00 40 01 18 00 01 00 00 00 00 00 bf 01 70 00
80: 06 00 00 00 0e 00 00 00 03 00 1f 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 05 c0 81 00 00 00 e0 fe 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 0d c8 00 00 22 10 53 14 08 00 03 a8 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 ff ff ff ff 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

$ sudo lspci -xxxs 05:00.0
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
00: ec 10 68 81 07 04 10 00 0c 00 00 02 10 00 00 00
10: 01 f0 00 00 00 00 00 00 04 00 50 f7 00 00 00 00
20: 0c 00 20 f2 00 00 00 00 00 00 00 00 58 14 00 e0
30: 00 00 00 00 40 00 00 00 00 00 00 00 0a 01 00 00
40: 01 50 c3 ff 08 00 00 00 00 00 00 00 00 00 00 00
50: 05 70 80 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 10 b0 02 02 c0 8c 68 00 10 50 19 00 11 7c 47 00
80: 40 00 11 10 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 1f 08 0c 00 00 00 00 00 02 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 11 d0 03 80 04 00 00 00 04 08 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
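For readers who want to see what such dumps say about ASPM: the PCI Express capability (ID 0x10) contains a Link Control register at offset 0x10 from the start of the capability, and its low two bits are the ASPM control field (0 = disabled, 1 = L0s, 2 = L1, 3 = L0s and L1). The sketch below walks the capability list of an `lspci -xxx` dump read on stdin. The function name `aspm_from_dump` is hypothetical, and the code assumes a standard header with the capabilities pointer at offset 0x34.

```shell
#!/bin/sh
# Hypothetical helper: read an `lspci -xxx` hex dump on stdin and print the
# ASPM control field from the PCI Express Link Control register.
aspm_from_dump() {
    awk '
    # Convert a hex string to a number (portable; avoids gawk-only strtonum).
    function hex(s,    i, n) {
        n = 0
        s = tolower(s)
        for (i = 1; i <= length(s); i++)
            n = n * 16 + index("0123456789abcdef", substr(s, i, 1)) - 1
        return n
    }
    # Capability pointers have their two low bits reserved; mask them off.
    function capptr(v) { return v - v % 4 }
    # Dump rows look like "70: 10 b0 02 02 ..."; store each byte by offset.
    /^[0-9a-f]+: / {
        base = hex(substr($1, 1, length($1) - 1))
        for (i = 2; i <= NF; i++)
            byte[base + i - 2] = hex($i)
    }
    END {
        found = 0
        ptr = capptr(byte[52] + 0)          # capabilities pointer at 0x34
        for (guard = 0; ptr != 0 && guard < 48; guard++) {
            if (byte[ptr] == 16) {          # 0x10 = PCI Express capability
                aspm = byte[ptr + 16] % 4   # Link Control, bits 1:0
                split("disabled,L0s,L1,L0s+L1", name, ",")
                print name[aspm + 1]
                found = 1
                break
            }
            ptr = capptr(byte[ptr + 1] + 0) # follow next-capability pointer
        }
        if (!found) print "no PCIe capability found"
    }'
}

# Example (not run here): sudo lspci -xxxs 05:00.0 | aspm_from_dump
```

Read this way, both dumps above have a Link Control value of 0x0040, so the sketch should print `disabled` for each. That is only a reading of the raw bytes at the time the dumps were taken, not a diagnosis.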

Dan Watkins (oddbloke) wrote :

OK, I just did the following (all on 5.4.0-33-generic):

* modified /etc/default/grub to include " pcie_aspm=off" in the kernel command line
* `update-grub`
* `reboot`
* double-checked that "pcie_aspm=off" was in the kernel command line, then booted
* once booted and logged in, I disabled networking

Unfortunately:

* re-enabling networking did not restore my network connection
* rebooting did not restore my network connection
* only rebooting into Windows fixed my network connection
* I see the same trace on the boots with broken networking as in the original report

AFAICT, the kernel command line option had no effect on the issue at hand.

Anything else I can try?
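The "double-checked the kernel command line" step above can also be done from the running system. A minimal sketch, assuming the booted command line is read from `/proc/cmdline`; the function name `has_cmdline_param` and its optional second argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given parameter appears as a whole word
# on the kernel command line. The file defaults to /proc/cmdline; the second
# argument exists only so the check can be tried against a saved copy.
has_cmdline_param() {
    param=$1
    file=${2:-/proc/cmdline}
    # One parameter per line, then match the whole line literally.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Example (not run here):
#   has_cmdline_param pcie_aspm=off && echo "pcie_aspm=off is active"
```

Matching whole words avoids a false positive from a longer parameter that merely contains the same text.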

Kai-Heng Feng (kaihengfeng) wrote :

Kai-Heng Feng (kaihengfeng) wrote :

If the latest kernel doesn't work, then we need to bisect between the Bionic kernel and the Focal kernel.

Rene Meier (meier.rene) wrote :

I tried the mainline kernel and the error is gone. What next?

Kai-Heng Feng (kaihengfeng) wrote :

Would it be possible for you to do a reverse kernel bisection?

First, find the first -rc kernel that works and the last -rc kernel that doesn’t work from http://kernel.ubuntu.com/~kernel-ppa/mainline/

Then,
$ sudo apt build-dep linux
$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
$ cd linux
$ git bisect start
$ git bisect new <the working version you found>
$ git bisect old <the non-working version you found>
$ make localmodconfig
$ make -j`nproc` deb-pkg
Install the newly built kernel, then reboot into it.
If it doesn’t work,
$ git bisect old
Otherwise,
$ git bisect new
Repeat from "make -j`nproc` deb-pkg" until you find the commit that fixes the issue.
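The good/bad decision at each step of the reverse bisection above can be sketched as a small filter over the kernel log of the freshly booted test kernel. The function name `bisect_verdict` is hypothetical. The sketch assumes the machine has been up long enough for the watchdog to fire if it is going to; one of the traces in this bug appeared about 60 seconds after boot.

```shell
#!/bin/sh
# Hypothetical helper for the reverse bisection: read the kernel log of the
# test kernel on stdin and print the term to pass to `git bisect`.
# In this reverse bisect, "old" means the bug is still present and "new"
# means it is fixed.
bisect_verdict() {
    if grep -q 'NETDEV WATCHDOG: .* transmit queue [0-9]* timed out'; then
        echo old
    else
        echo new
    fi
}

# Example (not run here): git bisect "$(journalctl -k -b | bisect_verdict)"
```

A missing warning is weaker evidence than a present one, so it is worth also confirming that traffic actually flows before marking a build as `new`.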

Dan Watkins (oddbloke) wrote :

I have tested both rc3 and rc4; neither addresses the issue for me. I am seeing a slightly different call trace now though (this is from an rc3 boot):

Jul 06 08:44:48 surprise kernel: ------------[ cut here ]------------
Jul 06 08:44:48 surprise kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
Jul 06 08:44:48 surprise kernel: WARNING: CPU: 15 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x25e/0x270
Jul 06 08:44:48 surprise kernel: Modules linked in: xt_comment dummy xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bpfilter rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm ib_core overlay snd_hda_codec_realtek snd_hda_codec_generic nls_iso8859_1 ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device joydev snd_timer edac_mce_amd ucsi_ccg mousedev input_leds amd_energy typec_ucsi rapl wmi_bmof efi_pstore snd typec k10temp soundcore mac_hid sch_fq_codel kvm_amd ccp kvm iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables parport_pc ppdev lp parport drm backlight ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_crypt hid_microsoft ff_memless hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid
Jul 06 08:44:48 surprise kernel: crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper r8169 i2c_piix4 nvme ahci i2c_nvidia_gpu realtek xhci_pci nvme_core libahci xhci_pci_renesas wmi gpio_amdpt gpio_generic
Jul 06 08:44:48 surprise kernel: CPU: 15 PID: 0 Comm: swapper/15 Not tainted 5.8.0-050800-generic #202006282330
Jul 06 08:44:48 surprise kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
Jul 06 08:44:48 surprise kernel: RIP: 0010:dev_watchdog+0x25e/0x270
Jul 06 08:44:48 surprise kernel: Code: 85 c0 75 e5 eb 9c 4c 89 ff c6 05 eb ef 22 01 01 e8 d7 d1 fa ff 44 89 e9 4c 89 fe 48 c7 c7 90 28 e8 b7 48 89 c2 e8 17 1a 6a ff <0f> 0b e9 7a ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
Jul 06 08:44:48 surprise kernel: RSP: 0018:ffffa29e404b4e70 EFLAGS: 00010282
Jul 06 08:44:48 surprise kernel: RAX: 0000000000000000 RBX: ffff943bca232600 RCX: 0000000000000000
Jul 06 08:44:48 surprise kernel: RDX: ffff943bccfe9020 RSI: ffff943bccfd8cd0 RDI: 0000000000000300
Jul 06 08:44:48 surprise kernel: RBP: ffffa29e404b4ea0 R08: 0000000000000545 R09: 0000000000000004
Jul 06 08:44:48 surprise kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
Jul 06 08:44:48 surprise kernel: R13: 0000000000000000 R14: ffff943bca938480 R15: ffff943bca938000
Jul 06 08:44:48 surprise kernel: FS: 0000000000000000(0000) GS:ffff943bccfc0000(0000) knlGS:0000000000000000
Jul 06 08:44:48 surprise kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 06 08:44:48 surprise kernel: CR2: 00007f507eb35160 CR3: 00000003d81f4000 CR4: 00000000003406e0
Jul 06 08:44:48 surprise kernel: Call Trace:
Jul 06 08:44:48 surprise kernel: <IRQ>
Jul 06 08:44...

Dan Watkins (oddbloke) wrote :

(I'm now running groovy on this system, in case that changes anything.)

Kai-Heng Feng (kaihengfeng) wrote :

Dan,

Possible to try older kernel releases? Like 4.15?

Dan Watkins (oddbloke) wrote :

> Possible to try older kernel releases? Like 4.15?

I'm happy to try; how would I go about installing them on groovy?

Kai-Heng Feng (kaihengfeng) wrote :

Dan Watkins (oddbloke) wrote :

I just saw this again on 5.8.0-18-generic, and this time I did not manually modify my networking state; I installed the new kernel and rebooted into it, and the problem exhibited immediately. (Fortunately, for whatever reason, a reboot into Windows and back to -18 _has_ come up with networking, so it isn't happening on every boot.)

I'll try a 4.15 kernel as requested in the last comment, and see what happens.

Dan Watkins (oddbloke) wrote :

I saw this traceback on the 4.15.18 mainline build:

Sep 04 14:47:41 surprise kernel: ------------[ cut here ]------------
Sep 04 14:47:41 surprise kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
Sep 04 14:47:41 surprise kernel: WARNING: CPU: 11 PID: 0 at /home/kernel/COD/linux/net/sched/sch_generic.c:323 dev_watchdog+0x221/0x230
Sep 04 14:47:41 surprise kernel: Modules linked in: devlink nft_meta nft_compat ipt_REJECT nf_reject_ipv4 xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_CHECKSUM xt_comment xt_tcpudp nft_chain_route_ipv6 iptable_mangle iptable_nat nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_def>
Sep 04 14:47:41 surprise kernel: videodev joydev input_leds media snd_seq snd_seq_device irqbypass snd_timer k10temp wmi_bmof snd ccp nvidia_uvm(OE) 8250 soundcore 8250_base shpchp mac_hid wmi iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables parpor>
Sep 04 14:47:41 surprise kernel: CPU: 11 PID: 0 Comm: swapper/11 Tainted: P OE 4.15.18-041518-generic #201804190330
Sep 04 14:47:41 surprise kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
Sep 04 14:47:41 surprise kernel: RIP: 0010:dev_watchdog+0x221/0x230
Sep 04 14:47:41 surprise kernel: RSP: 0018:ffff96204e8c3e58 EFLAGS: 00010286
Sep 04 14:47:41 surprise kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Sep 04 14:47:41 surprise kernel: RDX: 0000000000040400 RSI: 00000000000000f6 RDI: 0000000000000300
Sep 04 14:47:41 surprise kernel: RBP: ffff96204e8c3e88 R08: 0000000000000001 R09: 00000000000004c6
Sep 04 14:47:41 surprise kernel: R10: ffff96204e8c3ee0 R11: 0000000000000000 R12: 0000000000000001
Sep 04 14:47:41 surprise kernel: R13: ffff96204e32e000 R14: ffff96204e32e478 R15: ffff96203bd19880
Sep 04 14:47:41 surprise kernel: FS: 0000000000000000(0000) GS:ffff96204e8c0000(0000) knlGS:0000000000000000
Sep 04 14:47:41 surprise kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 04 14:47:41 surprise kernel: CR2: 00007f564479a0c0 CR3: 000000040adb0000 CR4: 00000000003406e0
Sep 04 14:47:41 surprise kernel: Call Trace:
Sep 04 14:47:41 surprise kernel: <IRQ>
Sep 04 14:47:41 surprise kernel: ? dev_deactivate_queue.constprop.33+0x60/0x60
Sep 04 14:47:41 surprise kernel: call_timer_fn+0x30/0x130
Sep 04 14:47:41 surprise kernel: run_timer_softirq+0x3fb/0x450
Sep 04 14:47:41 surprise kernel: ? ktime_get+0x43/0xa0
Sep 04 14:47:41 surprise kernel: ? lapic_next_event+0x20/0x30
Sep 04 14:47:41 surprise kernel: __do_softirq+0xdf/0x2b2
Sep 04 14:47:41 surprise kernel: irq_exit+0xb6/0xc0
Sep 04 14:47:41 surprise kernel: smp_apic_timer_interrupt+0x71/0x130
Sep 04 14:47:41 surprise kernel: apic_timer_interrupt+0x84/0x90
Sep 04 14:47:41 surprise kernel: </IRQ>
Sep 04 14:47:41 surprise kernel: RIP: 0010:cpuidle_enter_state+0xa7/0x2f0
Sep 04 14:47:41 surprise kernel: RSP: 0018:ffffabdc019fbe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
Sep 04 14:47:41 surprise kernel: RAX: ffff96204e8e2880 RBX: 00000013b8893c67 RCX: 000000000000001f
Sep 04 14:47:41 surprise kernel: RDX: 00000013b8893c67 RSI: fffffffc216ba992 RDI: 0000000000000000
Sep 04 14:47:41 surprise ker...

Dan Watkins (oddbloke) wrote :

And this is what I see on 5.9.0-050900rc3-generic:

Sep 04 15:05:02 surprise kernel: ------------[ cut here ]------------
Sep 04 15:05:02 surprise kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
Sep 04 15:05:02 surprise kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x25b/0x270
Sep 04 15:05:02 surprise kernel: Modules linked in: xt_comment iptable_mangle iptable_nat bpfilter dummy xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfne>
Sep 04 15:05:02 surprise kernel: xor raid6_pq libcrc32c dm_crypt hid_logitech_hidpp hid_logitech_dj hid_microsoft ff_memless hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper i2c_piix4 r8169 nvme i2c_nvidia_gpu rea>
Sep 04 15:05:02 surprise kernel: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-050900rc3-generic #202008302030
Sep 04 15:05:02 surprise kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
Sep 04 15:05:02 surprise kernel: RIP: 0010:dev_watchdog+0x25b/0x270
Sep 04 15:05:02 surprise kernel: Code: 85 c0 75 e5 eb 9c 4c 89 ff c6 05 3a 29 1d 01 01 e8 5a 7c fa ff 44 89 e9 4c 89 fe 48 c7 c7 b0 9b a5 ac 48 89 c2 e8 aa 21 64 ff <0f> 0b e9 7a ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f
Sep 04 15:05:02 surprise kernel: RSP: 0018:ffffb1c340003e78 EFLAGS: 00010286
Sep 04 15:05:02 surprise kernel: RAX: 0000000000000000 RBX: ffff9a27782a0000 RCX: 0000000000000000
Sep 04 15:05:02 surprise kernel: RDX: ffff9a278cc28fe0 RSI: ffff9a278cc18cc0 RDI: 0000000000000300
Sep 04 15:05:02 surprise kernel: RBP: ffffb1c340003ea8 R08: 0000000000000004 R09: 000000000000053a
Sep 04 15:05:02 surprise kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9a27782a0080
Sep 04 15:05:02 surprise kernel: R13: 0000000000000000 R14: ffff9a278ab9a480 R15: ffff9a278ab9a000
Sep 04 15:05:02 surprise kernel: FS: 0000000000000000(0000) GS:ffff9a278cc00000(0000) knlGS:0000000000000000
Sep 04 15:05:02 surprise kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 04 15:05:02 surprise kernel: CR2: 000055b1cb035468 CR3: 0000000400f86000 CR4: 00000000003506f0
Sep 04 15:05:02 surprise kernel: Call Trace:
Sep 04 15:05:02 surprise kernel: <IRQ>
Sep 04 15:05:02 surprise kernel: ? pfifo_fast_enqueue+0x150/0x150
Sep 04 15:05:02 surprise kernel: call_timer_fn+0x32/0x130
Sep 04 15:05:02 surprise kernel: __run_timers.part.0+0x1eb/0x270
Sep 04 15:05:02 surprise kernel: run_timer_softirq+0x2a/0x50
Sep 04 15:05:02 surprise kernel: __do_softirq+0xd0/0x2a5
Sep 04 15:05:02 surprise kernel: asm_call_on_stack+0x12/0x20
Sep 04 15:05:02 surprise kernel: </IRQ>
Sep 04 15:05:02 surprise kernel: do_softirq_own_stack+0x3f/0x50
Sep 04 15:05:02 surprise kernel: irq_exit_rcu+0x95/0xd0
Sep 04 15:05:02 surprise kernel: sysvec_call_function_single+0x3d/0xa0
Sep 04 15:05:02 surprise kernel: asm_sysvec_call_function_single+0x12/0x20
Sep 04 15:05:02 surprise kernel: RIP: 0010:cpuidle_enter_state+0xc2/0x390
Sep 04 15:05:02 surprise kernel: Code: f4 22 07 54 e8 ff 77 74...

Changed in linux (Ubuntu Groovy):
status: Incomplete → New
Revision history for this message
Dan Watkins (oddbloke) wrote :

(Moved the groovy task back to New, not sure if that's the right process!)

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Kreisch István András (istvan-kreisch) wrote :

Hi!

I'm using a really old kernel with this same error: v3.13.170 with Ubuntu 14.04.6. I could work around the issue by reducing the speed of the Ethernet interface from 1 Gb/s to 100 Mb/s using ethtool:

ethtool -s eth3 speed 100 duplex full autoneg on

Maybe this helps to keep things running until a fix is implemented and released.

Br,
István

Revision history for this message
Dan Watkins (oddbloke) wrote : Re: [Bug 1874464] Re: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out

On Sat, Sep 05, 2020 at 08:46:51PM -0000, Kreisch István András wrote:
> I'm using a really old kernel with this same error: v3.13.170 with
> Ubuntu 14.04.6. I could circumvent the issue by reduce the speed of the
> ethernet interface from 1Gb to 100Mb using ethtool.
>
> ethtool -s eth3 speed 100 duplex full autoneg on
>
> Maybe it helps to operate until the fix is implemented and released.

Unfortunately, this did not address the issue I was seeing. (Thanks for
the suggestion!)

Revision history for this message
Dan Watkins (oddbloke) wrote :

I'm still seeing this issue, and it now sometimes appears on boot without me having done anything. What can I do to help move this forward?
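
For anyone who wants to count how often the watchdog fires, and for which interface and driver, here is a minimal sketch that scans kernel log lines for the "NETDEV WATCHDOG" message in the format shown in the traces in this report. The function name is hypothetical and not part of any tool mentioned here.

```python
import re

# Matches the watchdog line as it appears in the traces in this report, e.g.
# "NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return an (interface, driver, queue) tuple for every watchdog line."""
    hits = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            hits.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return hits
```

The function accepts any iterable of lines, so it can be fed a saved log file or the output of `journalctl -k -b` read from standard input.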

Revision history for this message
AceLan Kao (acelankao) wrote :

Could you try this test kernel to see if it helps?
I applied one commit [1] on top of the focal kernel master-next branch:
https://people.canonical.com/~acelan/bugs/lp1874464

1. https://patchwork.ozlabs<email address hidden>/

Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks for the test kernel! I can no longer reproduce this on the most recent two kernels in groovy (5.8.0-19-generic, 5.8.0-20-generic) nor with that test kernel.

I think we can mark this Incomplete for groovy too, and I'll respond if I see this again.

Thanks to you and Kai-Heng for all your help throughout this process!

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Dan, it would be great if you could revert the workaround [1] and apply the possible fix [2] to see if it helps.

I guess you no longer see the issue because of the workaround.

[1] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/unstable/commit/?id=759bc16ddfd4d7f3e195a9662d9d067625b805b6
[2] https://patchwork.ozlabs<email address hidden>/

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Oh, the workaround actually enables L0s and L1. Please just comment the line out.

Revision history for this message
Dan Watkins (oddbloke) wrote :

On Sat, Oct 10, 2020 at 09:01:15PM -0000, Kai-Heng Feng wrote:
> Dan, it will be great if you can revert workaround [1] and apply
> possible fix [2] to see if it helps.
>
> I guess you no longer see the issue because of the workaround.
>
> [1] https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/unstable/commit/?id=759bc16ddfd4d7f3e195a9662d9d067625b805b6
> [2] https://patchwork.ozlabs<email address hidden>/

I'm happy to help, but I haven't compiled a kernel before; what's the
process for going about it?

Revision history for this message
Dan Watkins (oddbloke) wrote :

Actually, looks like I spoke too soon. I just upgraded to 5.8.0-22-generic and I'm seeing the issue still:

Oct 13 10:43:37 surprise kernel: ------------[ cut here ]------------
Oct 13 10:43:37 surprise kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
Oct 13 10:43:37 surprise kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x25b/0x270
Oct 13 10:43:37 surprise kernel: Modules linked in: nft_compat ipt_REJECT nf_reject_ipv4 xt_conntrack nft_counter xt_MASQUERADE xt_CHECKSUM xt_comment xt_tcpudp iptable_mangle iptable_nat bpfilter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink >
Oct 13 10:43:37 surprise kernel: xor raid6_pq libcrc32c dm_crypt hid_logitech_hidpp hid_microsoft hid_logitech_dj ff_memless hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt crct10dif_pclmul crc32_pclmul ghash>
Oct 13 10:43:37 surprise kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: P OE 5.8.0-22-generic #23-Ubuntu
Oct 13 10:43:37 surprise kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
Oct 13 10:43:37 surprise kernel: RIP: 0010:dev_watchdog+0x25b/0x270
Oct 13 10:43:37 surprise kernel: Code: 85 c0 75 e5 eb 9c 4c 89 ff c6 05 46 85 1c 01 01 e8 2a 93 fa ff 44 89 e9 4c 89 fe 48 c7 c7 48 7a e8 84 48 89 c2 e8 da 30 64 ff <0f> 0b e9 7a ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f
Oct 13 10:43:37 surprise kernel: RSP: 0018:ffffb770401dce78 EFLAGS: 00010286
Oct 13 10:43:37 surprise kernel: RAX: 0000000000000000 RBX: ffff8ed4b848a000 RCX: ffff8ed4cee58cd8
Oct 13 10:43:37 surprise kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8ed4cee58cd0
Oct 13 10:43:37 surprise kernel: RBP: ffffb770401dcea8 R08: 0000000000000004 R09: 0000000000000551
Oct 13 10:43:37 surprise kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed4b848a080
Oct 13 10:43:37 surprise kernel: R13: 0000000000000000 R14: ffff8ed4b8952480 R15: ffff8ed4b8952000
Oct 13 10:43:37 surprise kernel: FS: 0000000000000000(0000) GS:ffff8ed4cee40000(0000) knlGS:0000000000000000
Oct 13 10:43:37 surprise kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 10:43:37 surprise kernel: CR2: 00007f738e845d90 CR3: 000000010960a000 CR4: 00000000003406e0
Oct 13 10:43:37 surprise kernel: Call Trace:
Oct 13 10:43:37 surprise kernel: <IRQ>
Oct 13 10:43:37 surprise kernel: ? pfifo_fast_enqueue+0x150/0x150
Oct 13 10:43:37 surprise kernel: call_timer_fn+0x32/0x130
Oct 13 10:43:37 surprise kernel: __run_timers.part.0+0x184/0x280
Oct 13 10:43:37 surprise kernel: ? lapic_next_event+0x21/0x30
Oct 13 10:43:37 surprise kernel: ? clockevents_program_event+0x8f/0xe0
Oct 13 10:43:37 surprise kernel: run_timer_softirq+0x2a/0x50
Oct 13 10:43:37 surprise kernel: __do_softirq+0xd0/0x2a1
Oct 13 10:43:37 surprise kernel: asm_call_irq_on_stack+0x12/0x20
Oct 13 10:43:37 surprise kernel: </IRQ>
Oct 13 10:43:37 surprise kernel: do_softirq_own_stack+0x3d/0x50
Oct 13 10:43:37 surprise kernel: irq_exit_rcu+0x95/0xd0
Oct 13 10:43:37 surprise kernel: sysvec_apic_timer_interrupt+0x3b/0x90
Oct 13 10:...

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Can you please test this kernel:
https://people.canonical.com/~khfeng/lp1896576/

Revision history for this message
Dan Watkins (oddbloke) wrote :

On Fri, Oct 16, 2020 at 03:05:09AM -0000, Kai-Heng Feng wrote:
> Can you please test this kernel:
> https://people.canonical.com/~khfeng/lp1896576/

Thanks for the kernel! Still seeing this, unfortunately:

kernel: ------------[ cut here ]------------
kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
kernel: WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x25e/0x270
kernel: Modules linked in: nft_compat nft_counter nft_chain_nat nf_tables nfnetlink ipt_REJECT nf_reject_ipv4 xt_conntrack xt_MASQUERADE xt_CHECKSUM xt_comment xt_tcpudp iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter >
kernel: bridge stp llc arp_tables parport_pc ppdev lp parport drm ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c dm_crypt hid_logitech_hidpp hid_microsoft hid_logitech_dj ff_memless hid_generic usbhid hid crct10dif_p>
kernel: CPU: 8 PID: 0 Comm: swapper/8 Tainted: P OE 5.6.0-2030-oem #30~lp1896576
kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
kernel: RIP: 0010:dev_watchdog+0x25e/0x270
kernel: Code: 85 c0 75 e5 eb 9c 4c 89 ff c6 05 1f a5 e7 00 01 e8 a7 00 fb ff 44 89 e9 4c 89 fe 48 c7 c7 f0 3b 07 9e 48 89 c2 e8 57 95 6e ff <0f> 0b e9 7a ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
kernel: RSP: 0018:ffffb0eb40344e30 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff89ae4bd0f200 RCX: 0000000000000007
kernel: RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff89ae4f019800
kernel: RBP: ffffb0eb40344e60 R08: 0000000000000545 R09: 0000000000000004
kernel: R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
kernel: R13: 0000000000000000 R14: ffff89ae4a660480 R15: ffff89ae4a660000
kernel: FS: 0000000000000000(0000) GS:ffff89ae4f000000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000562e21d43b88 CR3: 000000023960a000 CR4: 00000000003406e0
kernel: Call Trace:
kernel: <IRQ>
kernel: ? pfifo_fast_enqueue+0x150/0x150
kernel: call_timer_fn+0x32/0x130
kernel: __run_timers.part.0+0x180/0x280
kernel: ? timerqueue_add+0x9b/0xb0
kernel: ? enqueue_hrtimer+0x3d/0x90
kernel: ? ktime_get+0x3e/0xa0
kernel: run_timer_softirq+0x2a/0x50
kernel: __do_softirq+0xe1/0x2d6
kernel: ? hrtimer_interrupt+0x13b/0x220
kernel: irq_exit+0xae/0xb0
kernel: smp_apic_timer_interrupt+0x7b/0x140
kernel: apic_timer_interrupt+0xf/0x20
kernel: </IRQ>
kernel: RIP: 0010:cpuidle_enter_state+0xca/0x3e0
kernel: Code: ff e8 fa 69 7e ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ea 02 00 00 31 ff e8 7d ed 84 ff fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 3f 02 00 00 49 63 d4 4c 8b 7d d0 4c 2b 7d c8 48 8d
kernel: RSP: 0018:ffffb0eb40167e38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
kernel: RAX: ffff89ae4f02ce00 RBX: ffff89ae3b54f800 RCX: 000000000000001f
kernel: RDX: 0000000000000000 RSI: 00000000239f541c RDI: 0000000000000000
kernel: RBP: ffffb0eb40167e78 R08: 000000090b4aca5a R09: 0000000000000e17
kernel: R10: ffff89ae4f02bac4 R11: ffff89ae4f02baa4 R12: 0000000000000002
kernel: R13: ffffffff9e378700 R14: 0000000000000002 R15: ffff89ae3...

Revision history for this message
Bruce Pieterse (octoquad) wrote :

I'd like to add that I'm seeing this with Groovy as well:

Oct 26 19:08:40 ubuntu kernel: ------------[ cut here ]------------
Oct 26 19:08:40 ubuntu kernel: NETDEV WATCHDOG: enp3s0 (r8169): transmit queue 0 timed out
Oct 26 19:08:40 ubuntu kernel: WARNING: CPU: 6 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x25b/0x270
Oct 26 19:08:40 ubuntu kernel: Modules linked in: cfg80211 8021q garp mrp stp llc nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr mei_hdcp ledtrig_audio snd_hda_codec_hdmi snd_hda_intel in>
Oct 26 19:08:40 ubuntu kernel: hid_generic usbhid hid uas usb_storage amdgpu iommu_v2 gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_inte>
Oct 26 19:08:40 ubuntu kernel: CPU: 6 PID: 0 Comm: swapper/6 Tainted: G OE 5.8.0-7625-generic #26~1603126178~20.10~210fe73-Ubuntu
Oct 26 19:08:40 ubuntu kernel: Hardware name: MSI MS-7850/H97 PC Mate(MS-7850), BIOS V5.9 02/16/2016
Oct 26 19:08:40 ubuntu kernel: RIP: 0010:dev_watchdog+0x25b/0x270
Oct 26 19:08:40 ubuntu kernel: Code: 85 c0 75 e5 eb 9c 4c 89 ff c6 05 26 85 1c 01 01 e8 2a 93 fa ff 44 89 e9 4c 89 fe 48 c7 c7 c0 7a a8 b1 48 89 c2 e8 ba 30 64 ff <0f> 0b e9 7a ff ff ff 66 66 2e 0f 1f 84 00 00 00 >
Oct 26 19:08:40 ubuntu kernel: RSP: 0018:ffffad1500210e78 EFLAGS: 00010286
Oct 26 19:08:40 ubuntu kernel: RAX: 0000000000000000 RBX: ffff9b0179912200 RCX: ffff9b018dd98cd8
Oct 26 19:08:40 ubuntu kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9b018dd98cd0
Oct 26 19:08:40 ubuntu kernel: RBP: ffffad1500210ea8 R08: 0000000000000004 R09: 000000000000047a
Oct 26 19:08:40 ubuntu kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9b0179912280
Oct 26 19:08:40 ubuntu kernel: R13: 0000000000000000 R14: ffff9b01788da480 R15: ffff9b01788da000
Oct 26 19:08:40 ubuntu kernel: FS: 0000000000000000(0000) GS:ffff9b018dd80000(0000) knlGS:0000000000000000
Oct 26 19:08:40 ubuntu kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 26 19:08:40 ubuntu kernel: CR2: 00007f7677e18c90 CR3: 00000000a5c0a005 CR4: 00000000001606e0
Oct 26 19:08:40 ubuntu kernel: Call Trace:
Oct 26 19:08:40 ubuntu kernel: <IRQ>
Oct 26 19:08:40 ubuntu kernel: ? pfifo_fast_enqueue+0x150/0x150
Oct 26 19:08:40 ubuntu kernel: call_timer_fn+0x32/0x130
Oct 26 19:08:40 ubuntu kernel: __run_timers.part.0+0x184/0x280
Oct 26 19:08:40 ubuntu kernel: ? lapic_next_deadline+0x26/0x30
Oct 26 19:08:40 ubuntu kernel: ? clockevents_program_event+0x8f/0xe0
Oct 26 19:08:40 ubuntu kernel: run_timer_softirq+0x2a/0x50
Oct 26 19:08:40 ubuntu kernel: __do_softirq+0xd0/0x2a1
Oct 26 19:08:40 ubuntu kernel: asm_call_irq_on_stack+0x12/0x20
Oct 26 19:08:40 ubuntu kernel: </IRQ>
Oct 26 19:08:40 ubuntu kernel: do_softirq_own_stack+0x3d/0x50
Oct 26 19:08:40 ubuntu kernel: irq_exit_rcu+0x95/0xd0
Oct 26 19:08:40 ubuntu kernel: sysvec_apic_timer_interrupt+0x3b/0x90
Oct 26 19:08:40 ubuntu kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
Oct 26 19:08:40 ubuntu kernel: RIP: 0010:cpuidle_enter_state+0xb7/0x3f0
Oct 26 19:08:40 ubuntu kernel: Code: 3f fb 06 4f e8 4a 5d 74 ff 48 89 45 d0 0f 1f 44 00 00 31 ff e8 fa 68 74 ff 80 7...

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Sorry, I pasted the wrong link, here it is:
https://people.canonical.com/~khfeng/lp1874464/

Revision history for this message
Dan Watkins (oddbloke) wrote :

Still seeing this with that kernel:

kernel: ------------[ cut here ]------------
kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
kernel: WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x25b/0x270
kernel: Modules linked in: xt_comment iptable_mangle iptable_nat bpfilter xt_CHECKSUM xt_MASQUERADE dummy xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfne>
kernel: xor raid6_pq libcrc32c dm_crypt hid_logitech_hidpp hid_microsoft hid_logitech_dj ff_memless hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect ghash_clmulni_i>
kernel: CPU: 8 PID: 0 Comm: swapper/8 Tainted: P OE 5.8.0-24-generic #25~lp1874464
kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
kernel: RIP: 0010:dev_watchdog+0x25b/0x270
kernel: Code: 85 c0 75 e5 eb 9c 4c 89 ff c6 05 36 85 1c 01 01 e8 2a 93 fa ff 44 89 e9 4c 89 fe 48 c7 c7 50 7c e8 8c 48 89 c2 e8 ca 30 64 ff <0f> 0b e9 7a ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f
kernel: RSP: 0018:ffffbd1800348e78 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff95808d02dc00 RCX: ffff95808ee18cd8
kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff95808ee18cd0
kernel: RBP: ffffbd1800348ea8 R08: 0000000000000004 R09: 0000000000000554
kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff95808d02dc80
kernel: R13: 0000000000000000 R14: ffff95808c9be480 R15: ffff95808c9be000
kernel: FS: 0000000000000000(0000) GS:ffff95808ee00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007ffd799aee88 CR3: 00000003ac6d8000 CR4: 00000000003406e0
kernel: Call Trace:
kernel: <IRQ>
kernel: ? pfifo_fast_enqueue+0x150/0x150
kernel: call_timer_fn+0x32/0x130
kernel: __run_timers.part.0+0x184/0x280
kernel: ? lapic_next_event+0x21/0x30
kernel: ? clockevents_program_event+0x8f/0xe0
kernel: run_timer_softirq+0x2a/0x50
kernel: __do_softirq+0xd0/0x2a1
kernel: asm_call_irq_on_stack+0x12/0x20
kernel: </IRQ>
kernel: do_softirq_own_stack+0x3d/0x50
kernel: irq_exit_rcu+0x95/0xd0
kernel: sysvec_apic_timer_interrupt+0x3b/0x90
kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
kernel: RIP: 0010:cpuidle_enter_state+0xb7/0x3f0
kernel: Code: 4f fb c6 73 e8 5a 5d 74 ff 48 89 45 d0 0f 1f 44 00 00 31 ff e8 0a 69 74 ff 80 7d c7 00 0f 85 d3 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 e4 0f 88 df 01 00 00 49 63 d4 48 8d 04 52 48 8d 0c d5 00 00
kernel: RSP: 0018:ffffbd1800167e48 EFLAGS: 00000246
kernel: RAX: ffff95808ee2c6c0 RBX: ffff958079782400 RCX: 000000000000001f
kernel: RDX: 0000000000000000 RSI: 00000000239f5376 RDI: 0000000000000000
kernel: RBP: ffffbd1800167e88 R08: 0000000853f09bec R09: 00000000ffffffff
kernel: R10: 0000000000000a06 R11: ffff95808ee2b364 R12: 0000000000000002
kernel: R13: ffffffff8d577ba0 R14: 0000000000000002 R15: 0000000000000000
kernel: ? cpuidle_enter_state+0xa6/0x3f0
kernel: cpuidle_enter+0x2e/0x40
kernel: cpuidle_idle_call+0x145/0x200
kernel: do_idle+0x7a/0xe0
kernel: ...

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Dan,
Given that it's a kernel regression, would it be possible to try a mainline kernel and do a kernel bisection if the issue persists?

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Dan, let's try the latest mainline kernel again:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.10-rc2/amd64/

... and report the issue upstream if it persists.

Revision history for this message
Dan Watkins (oddbloke) wrote :

Hi Kai-Heng,

Here is the (much longer) trace from that kernel.

Thanks!

kernel: ------------[ cut here ]------------
kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x24c/0x250
kernel: Modules linked in: scsi_transport_iscsi binfmt_misc veth nft_masq xt_comment iptable_mangle iptable_nat bpfilter dummy xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm ib_core overlay snd_hda_codec_realtek nls_iso8859>
kernel: autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_crypt hid_logitech_hidpp hid_microsoft ff_memless hid_logitech_dj hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper i2c_piix4 r8169 nvme ahci i2c_nvidia_gpu xhci_pci realtek nvme_core libahci xhci_pci_renesas wmi gpio_amdpt gpio_generic
kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.10.0-051000rc2-generic #202011012330
kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
kernel: RIP: 0010:dev_watchdog+0x24c/0x250
kernel: Code: 5a 94 fd ff eb ab 4c 89 ff c6 05 e2 58 4d 01 01 e8 99 4c fa ff 44 89 e9 4c 89 fe 48 c7 c7 48 35 48 97 48 89 c2 e8 2a 69 16 00 <0f> 0b eb 8c 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 55
kernel: RSP: 0018:ffffa5e700210e90 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff9a1800d12e00 RCX: 0000000000000000
kernel: RDX: ffff9a1b0eea8ca0 RSI: ffff9a1b0ee98980 RDI: 0000000000000300
kernel: RBP: ffffa5e700210ec0 R08: 0000000000000000 R09: ffffa5e700210c70
kernel: R10: ffffa5e700210c68 R11: ffffffff97b52ca8 R12: ffff9a1800d12e80
kernel: R13: 0000000000000000 R14: ffff9a18008164c0 R15: ffff9a1800816000
kernel: FS: 0000000000000000(0000) GS:ffff9a1b0ee80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000055cfd3d2d008 CR3: 0000000107f48000 CR4: 00000000003506e0
kernel: Call Trace:
kernel: <IRQ>
kernel: ? pfifo_fast_enqueue+0x150/0x150
kernel: call_timer_fn+0x2e/0x100
kernel: __run_timers.part.0+0x1d8/0x250
kernel: ? ktime_get+0x3e/0xa0
kernel: ? lapic_next_event+0x21/0x30
kernel: ? clockevents_program_event+0x8f/0xe0
kernel: run_timer_softirq+0x2a/0x50
kernel: __do_softirq+0xce/0x281
kernel: asm_call_irq_on_stack+0x12/0x20
kernel: </IRQ>
kernel: do_softirq_own_stack+0x3d/0x50
kernel: irq_exit_rcu+0x95/0xd0
kernel: sysvec_apic_timer_interrupt+0x3d/0x90
kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x360
kernel: Code: 3d e9 64 88 69 e8 64 31 75 ff 49 89 c6 0f 1f 44 00 00 31 ff e8 f5 3c 75 ff 80 7d d7 00 0f 85 01 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 0d 01 00 00 49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89
kernel: RSP: 0018:ffffa5e700137e60 EFLAGS: 00000246
kernel: RAX: ffff9a1b0eeac480 RBX: 0000000000000002 RCX: 000000000000001f
kernel: RDX: 0000000000000000 RSI: 00000000239f5376 RDI: 0000000000000000
kernel: RBP: ffffa5e700137e98 R08: 00000009851dc34e R09: 00...

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Subscribing r8169 maintainer Heiner Kallweit...

Revision history for this message
Heiner Kallweit (kalle1) wrote :

Well, the reason could be anything. Most users of this chip version don't have this problem, so it may be the BIOS. Is it known by now whether any (mainline) kernel version works fine on this system (so that the issue can be bisected)? It would also be interesting to know whether the issue happens with r8168 too.

Revision history for this message
Dan Watkins (oddbloke) wrote :

Trace from that mainline kernel:

kernel: ------------[ cut here ]------------
kernel: NETDEV WATCHDOG: enp5s0 (r8169): transmit queue 0 timed out
kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x24c/0x250
kernel: Modules linked in: scsi_transport_iscsi binfmt_misc veth nft_masq xt_comment iptable_mangle iptable_nat bpfilter dummy xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink rdma_ucm ib_uverbs rdma_cm iw_cm ib_cm ib_core overlay snd_hda_codec_realtek nls_iso8859_1 snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_usb_audio snd_hda_core snd_usbmidi_lib snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event uvcvideo snd_raw>
kernel: autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c dm_crypt hid_logitech_hidpp hid_microsoft ff_memless hid_logitech_dj hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper i2c_piix4 r8169 nvme ahci i2c_nvidia_gpu xhci_pci realtek nvme_core libahci xhci_pci_renesas wmi gpio_amdpt gpio_generic
kernel: CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.10.0-051000rc2-generic #202011012330
kernel: Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F4 01/25/2019
kernel: RIP: 0010:dev_watchdog+0x24c/0x250
kernel: Code: 5a 94 fd ff eb ab 4c 89 ff c6 05 e2 58 4d 01 01 e8 99 4c fa ff 44 89 e9 4c 89 fe 48 c7 c7 48 35 48 97 48 89 c2 e8 2a 69 16 00 <0f> 0b eb 8c 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 55
kernel: RSP: 0018:ffffa5e700210e90 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff9a1800d12e00 RCX: 0000000000000000
kernel: RDX: ffff9a1b0eea8ca0 RSI: ffff9a1b0ee98980 RDI: 0000000000000300
kernel: RBP: ffffa5e700210ec0 R08: 0000000000000000 R09: ffffa5e700210c70
kernel: R10: ffffa5e700210c68 R11: ffffffff97b52ca8 R12: ffff9a1800d12e80
kernel: R13: 0000000000000000 R14: ffff9a18008164c0 R15: ffff9a1800816000
kernel: FS: 0000000000000000(0000) GS:ffff9a1b0ee80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 000055cfd3d2d008 CR3: 0000000107f48000 CR4: 00000000003506e0
kernel: Call Trace:
kernel: <IRQ>
kernel: ? pfifo_fast_enqueue+0x150/0x150
kernel: call_timer_fn+0x2e/0x100
kernel: __run_timers.part.0+0x1d8/0x250
kernel: ? ktime_get+0x3e/0xa0
kernel: ? lapic_next_event+0x21/0x30
kernel: ? clockevents_program_event+0x8f/0xe0
kernel: run_timer_softirq+0x2a/0x50
kernel: __do_softirq+0xce/0x281
kernel: asm_call_irq_on_stack+0x12/0x20
kernel: </IRQ>
kernel: do_softirq_own_stack+0x3d/0x50
kernel: irq_exit_rcu+0x95/0xd0
kernel: sysvec_apic_timer_interrupt+0x3d/0x90
kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x360
kernel: Code: 3d e9 64 88 69 e8 64 31 75 ff 49 89 c6 0f 1f 44 00 00 31 ff e8 f5 3c 75 ff 80 7d d7 00 0f 85 01 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 0d 01 00 00 49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89
kernel: RSP: 0018:ffffa5e700137e60 EFLAGS: 00000246
kernel: RAX: ffff9a1b0eeac480 RBX: 0...

Revision history for this message
Bruce.Zhao1 (ryzen-linux) wrote :

I see this issue, with the same call trace as pasted above, on Ubuntu 20.04/20.10/21.04.
I had a look at the relevant kernel code and found that all of these kernels disable only ASPM L1.1 rather than ASPM completely. After I rebuilt the kernel with ASPM disabled entirely, the NETDEV WATCHDOG call trace disappeared.
The code I changed looks like this:
pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_1); // issue appears

pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S |
                             PCIE_LINK_STATE_L1); // issue disappears

I believe the experiment shows there is an ASPM issue with the r8169 driver. Can anyone share their view on this problem and shed some light on the way forward? Is there a Realtek contact we could reach out to? Thanks to anyone who can help.

Revision history for this message
Heiner Kallweit (kalle1) wrote :

That's why mainline r8169 disables ASPM completely. Users still have the option to re-enable individual ASPM states via sysfs, but that's at their own risk.
It's not known why, and for which combinations of board chipset / BIOS / NIC chipset version, there is an issue when ASPM L1 is enabled. All three components may contribute; unfortunately, Realtek doesn't release errata information.

Revision history for this message
Bruce Pieterse (octoquad) wrote :

I just want to share a quick observation related to this bug. I've noticed that this problem shows up when I do a soft reboot after a kernel update. When this occurs, I do a soft shutdown and power up again, and the problem is resolved.

Revision history for this message
Bruce Pieterse (octoquad) wrote :

Forgot to mention this problem also exists in hirsute with 5.11.0-7620-generic.

Revision history for this message
Bruce.Zhao1 (ryzen-linux) wrote :

If the problem happens on a certain platform, then the SoC, BIOS and NIC chipset are known, so which combination of them causes the issue could also be determined by regression testing.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Heiner, can we use something like a DMI-based blacklist in the r8169 driver? A whitelist doesn't scale that well...

Revision history for this message
Heiner Kallweit (kalle1) wrote :

Blacklist for what? ASPM L1? In mainline this wouldn't be needed because L1 is disabled by default. Downstream this could be an option, of course.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

I mean keeping the default ASPM L0s/L1 setting from PCI, and creating a blacklist in r8169.

Revision history for this message
Heiner Kallweit (kalle1) wrote :

For mainline that's too risky, because there have been a number of different symptoms of ASPM-related problems. And it would take time to assemble a blacklist; in the meantime, users would complain about network problems.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Is it possible to default to ASPM enabled, and disable ASPM in rtl8169_tx_timeout()?

Revision history for this message
Heiner Kallweit (kalle1) wrote :

ASPM issues come with quite different symptoms. Sometimes there's just a number of rx_missed errors and performance is significantly reduced, w/o tx timeout. Therefore, at least in mainline, I'd like to keep ASPM disabled by default. But every user or distro can use sysfs to enable selected ASPM states by using the attributes under /sys/class/net/<if>/device/link (provided that the BIOS allows the OS to control ASPM).
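
To see which link power-management states are currently enabled for an interface, the attributes in that `link` directory can be read directly. The following is a minimal sketch, assuming a kernel recent enough to expose them (the attribute names below are the ones the PCI core documents; a given device only exposes the states it supports). The function name and the `sysfs_root` parameter are hypothetical and exist only to make the sketch testable.

```python
from pathlib import Path

# Per-state attributes the PCI core may expose in the device's "link" directory.
# Assumption: not every attribute exists for every device or kernel version.
ASPM_ATTRS = (
    "l0s_aspm", "l1_aspm", "l1_1_aspm", "l1_2_aspm",
    "l1_1_pcipm", "l1_2_pcipm", "clkpm",
)


def read_link_states(iface, sysfs_root="/sys"):
    """Map each available link attribute of `iface` to True (enabled) or False."""
    link_dir = Path(sysfs_root) / "class" / "net" / iface / "device" / "link"
    states = {}
    for name in ASPM_ATTRS:
        attr = link_dir / name
        if attr.is_file():
            # The kernel reports "1" for an enabled state and "0" for a disabled one.
            states[name] = attr.read_text().strip() == "1"
    return states
```

Calling `read_link_states("enp5s0")` returns an empty dictionary if the directory is missing, for example when the BIOS does not let the OS control ASPM. Enabling a state means writing `1` to the matching attribute as root, at your own risk as noted above.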

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

The distro kernel also needs to take laptop power consumption into account, so we need to enable ASPM by default to achieve maximum battery life.

So it's a choice between keeping the blacklist in the distro kernel and keeping it in a distro udev rules file.

Let me think about it...
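
As a rough illustration of what a userspace, DMI-based blacklist check could look like, here is a minimal sketch. It is neither the r8169 driver's logic nor an existing udev rule. The single entry is taken from the "Hardware name" line of the traces in this report purely as an example, not as a confirmed list of affected boards, and the function names are hypothetical.

```python
from pathlib import Path

# Hypothetical blacklist of (sys_vendor, product_name) pairs.
# Illustrative only: the entry mirrors the "Hardware name" line in this report.
ASPM_BLACKLIST = {
    ("Gigabyte Technology Co., Ltd.", "B450M DS3H"),
}


def read_dmi(field, dmi_root="/sys/class/dmi/id"):
    """Read one DMI identification field, or return "" if it is unavailable."""
    path = Path(dmi_root) / field
    return path.read_text().strip() if path.is_file() else ""


def aspm_should_be_disabled(dmi_root="/sys/class/dmi/id"):
    """Return True if this machine's DMI identity is on the blacklist."""
    key = (read_dmi("sys_vendor", dmi_root), read_dmi("product_name", dmi_root))
    return key in ASPM_BLACKLIST
```

A helper like this could decide at boot whether to leave ASPM states disabled for the NIC. Whether such a list belongs in the kernel or in userspace is the open question here.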

Revision history for this message
Bruce.Zhao1 (ryzen-linux) wrote :

By accident, I found that the Realtek NIC enters and leaves the ASPM L1 state at an unreasonably high frequency, even while downloading something. An NVMe device does not show this symptom while it is working and has workloads in flight; this can be seen by monitoring CLKREQ# being asserted/deasserted. To dig deeper into this issue: is anyone familiar with the Realtek NIC's internal ASPM logic? Maybe tweaking some specific registers would do the trick.

Revision history for this message
Kai-Heng Feng (kaihengfeng) wrote :

Bruce, how do you observe ASPM L1 residency? I'd like to do the same.

Revision history for this message
Bruce.Zhao1 (ryzen-linux) wrote :

You probably need to hack the BIOS code to keep monitoring CLKREQ# as an interrupt. If that's not possible, I believe you can capture it with an oscilloscope.

Revision history for this message
Gustavo A. Díaz (gdiaz) wrote :

Hi,

I am facing the same problem in Focal, not with the onboard RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller, but with the USB 3.0 adapter I have (ASIX Electronics Corp. AX88179 Gigabit Ethernet):

------------[ cut here ]------------
[94104.121581] NETDEV WATCHDOG: waneth (ax88179_178a): transmit queue 0 timed out
[94104.121606] WARNING: CPU: 2 PID: 217952 at net/sched/sch_generic.c:467 dev_watchdog+0x24f/0x260
[94104.121615] Modules linked in: binfmt_misc xt_recent nfnetlink nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua snd_sof_pci_intel_apl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof soundwire_bus snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core intel_rapl_msr snd_soc_sst_ipc intel_rapl_common 8814au(OE) mei_hdcp snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi intel_pmc_bxt snd_soc_core intel_telemetry_pltdrv intel_punit_ipc snd_compress intel_telemetry_core snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic ac97_bus ledtrig_audio snd_pcm_dmaengine kvm_intel snd_hda_intel kvm snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec rapl intel_cstate snd_hda_core snd_hwdep snd_pcm snd_timer ax88179_178a cfg80211 efi_pstore mei_me usbnet ee1004 snd mii soundcore mei mac_hid bridge ip6t_REJECT nf_reject_ipv6 stp llc xt_hl
[94104.121696] ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_limit xt_addrtype xt_tcpudp xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6table_filter ip6_tables sch_fq_codel iptable_filter bpfilter msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage i915 i2c_algo_bit drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core aesni_intel crypto_simd i2c_i801 xhci_pci i2c_smbus cryptd r8169 realtek xhci_pci_renesas drm sdhci_pci ahci cqhci sdhci libahci video
[94104.121771] CPU: 2 PID: 217952 Comm: PLUGIN[cgroups] Tainted: G OE 5.13.0-22-generic #22~20.04.1-Ubuntu
[94104.121775] Hardware name: GIGABYTE MZGLKDP-00/MZGLKDP-00, BIOS F1 12/21/2017
[94104.121777] RIP: 0010:dev_watchdog+0x24f/0x260
[94104.121782] Code: c7 36 fd ff eb ab 4c 89 ff c6 05 60 f1 6e 01 01 e8 86 11 fa ff 44 89 e9 4c 89 fe 48 c7 c7 20 9c ca 89 48 89 c2 e8 38 17 17 00 <0f> 0b eb 8c 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00
[94104.121785] RSP: 0018:ffffb515c0138e88 EFLAGS: 00010282
[94104.121788] RAX: 0000000000000000 RBX: ffff91214fc45e00 RCX: 0000000000000027
[94104.121790] RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff9124b05189c8
[94104.121792] RBP: ffffb515c0138eb8 R08: ffff9124b05189c0 R09: ffffb515c0138c60
[94104.121794] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
[94104.121795] R13: 0000000000000000 R14: ffff912154a7d480 R15: ffff912154a7d000
[94104.121797] FS: 00007f74619e5700(0000) GS:ffff9124b0500000(0000) knlGS:0000000000000000
[94104....
