Soft lockup due to interrupt storm from smbus
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Linux | Fix Released | Medium | |
Fedora | Confirmed | Undecided | |
linux (Ubuntu) | Incomplete | Undecided | Unassigned |
linux-hwe-5.11 (Ubuntu) | Confirmed | Undecided | Unassigned |
linux-hwe-5.13 (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Ubuntu 20.04 LTS and Ubuntu 21.04 occasionally boot with very poor performance and are very unresponsive to user input on the Lenovo 300e 2nd Gen 81M9 laptop (LENOVO_
When this happens, messages like these appear in the journal:
---
root@alumne-1-58:~# journalctl | grep "BUG: soft"
may 20 21:44:35 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
may 20 21:44:35 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
may 22 09:33:34 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
may 24 16:45:14 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [prometheus-
may 24 16:45:14 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:0]
jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
jun 03 00:02:15 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [swapper/0:0]
jun 05 08:22:58 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [irq/138-
jun 05 08:25:06 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
jun 05 08:25:06 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [irq/138-
jun 05 08:26:42 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [lxd:3975]
jun 05 08:26:42 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [swapper/2:0]
jun 05 08:26:42 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [irq/138-
jun 05 08:27:38 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [irq/138-
jun 05 08:28:34 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [irq/138-
jun 05 08:29:46 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [irq/138-
root@alumne-1-58:~#
---
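When the journal holds many of these reports, a small script makes it easier to see which CPUs and tasks are involved. The following is a minimal sketch, assuming plain journal text as input (for example the output of `journalctl -k | grep "soft lockup"`); `summarize_lockups` and `LOCKUP_RE` are hypothetical names, not part of any existing tool.

```python
import re
from collections import Counter

# Matches the tail of a watchdog line such as
#   "... kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]"
# The task field can be cut off (e.g. "[irq/138-"), so the closing "]" is optional.
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]*)"
)


def summarize_lockups(lines):
    """Count soft-lockup reports per CPU and per task, and find the longest stall."""
    per_cpu = Counter()
    per_task = Counter()
    longest = 0
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m is None:
            continue
        per_cpu[int(m.group("cpu"))] += 1
        # Drop the ":pid" suffix ("swapper/3:0" -> "swapper/3"); truncated
        # names such as "irq/138-" are counted exactly as they appear.
        per_task[m.group("task").split(":", 1)[0]] += 1
        longest = max(longest, int(m.group("secs")))
    return per_cpu, per_task, longest


sample = [
    "jun 05 08:22:58 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [irq/138-",
    "jun 05 08:25:06 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]",
    "jun 05 08:26:42 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [lxd:3975]",
    "jun 05 08:27:38 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [irq/138-",
]
cpus, tasks, longest = summarize_lockups(sample)
print("per CPU:", cpus.most_common())
print("per task:", tasks.most_common())
print("longest stall:", longest, "s")
```

On a live system, `summarize_lockups(sys.stdin)` can read the piped journal instead of the sample list. Threads named `irq/<N>-...` are the kernel's threaded interrupt handlers, so repeated hits on them are worth cross-checking against /proc/interrupts.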
Usually a reboot makes everything work fine again, but it is very annoying when it happens.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #16 |
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #17 |
The jc42 module seems to work: after loading it, lm_sensors finds the sensors:
Galactica ~ # sensors
jc42-i2c-1-19
Adapter: SMBus I801 adapter at e000
temp1: +30.8°C (low = +0.0°C) ALARM (HIGH, CRIT)
jc42-i2c-1-1a
Adapter: SMBus I801 adapter at e000
temp1: +29.5°C (low = +0.0°C) ALARM (HIGH, CRIT)
jc42-i2c-1-18
Adapter: SMBus I801 adapter at e000
temp1: +27.2°C (low = +0.0°C) ALARM (HIGH, CRIT)
jc42-i2c-1-1b
Adapter: SMBus I801 adapter at e000
temp1: +28.2°C (low = +0.0°C) ALARM (HIGH, CRIT)
In Linux Kernel Bug Tracker #177311, linux (linux-linux-kernel-bugs) wrote : | #18 |
You need to set the temperature limits correctly. Without limits, the chips will persistently generate alarms which is the likely cause of the interrupts.
That won't solve the completion interrupt timeouts, though. That may be another problem.
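Guenter's suggestion can be applied through the hwmon sysfs interface. A sketch, assuming the standard hwmon ABI: each `/sys/class/hwmon/hwmonN/name` file identifies the driver ("jc42" here), and `temp1_min`, `temp1_max` and `temp1_crit` take values in millidegrees Celsius. The helper names are hypothetical, and the example thresholds below are placeholders rather than recommended limits.

```python
from pathlib import Path

# Hypothetical helpers built on the hwmon sysfs ABI: limits are written as
# integer millidegrees Celsius to temp1_min / temp1_max / temp1_crit.


def find_hwmon(name, root="/sys/class/hwmon"):
    """Return every hwmon directory whose 'name' attribute equals `name`."""
    found = []
    for d in sorted(Path(root).glob("hwmon*")):
        name_file = d / "name"
        if name_file.is_file() and name_file.read_text().strip() == name:
            found.append(d)
    return found


def limit_values(min_c, max_c, crit_c):
    """Map each limit attribute to the millidegree string that would be written."""
    if not min_c < max_c <= crit_c:
        raise ValueError("expected min < max <= crit")
    return {
        "temp1_min": str(round(min_c * 1000)),
        "temp1_max": str(round(max_c * 1000)),
        "temp1_crit": str(round(crit_c * 1000)),
    }


def set_limits(hwmon_dir, min_c, max_c, crit_c):
    """Write the limits (needs root); writes may fail if the chip's limits are locked."""
    for attr, value in limit_values(min_c, max_c, crit_c).items():
        (Path(hwmon_dir) / attr).write_text(value + "\n")
```

Run as root, for example `for d in find_hwmon("jc42"): set_limits(d, 0, 85, 95)`, then check that `sensors` no longer reports ALARM.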
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #19 |
(In reply to Guenter Roeck from comment #2)
> You need to set the temperature limits correctly. Without limits, the chips
> will persistently generate alarms which is the likely cause of the
> interrupts.
>
> That won't solve the completion interrupt timeouts, though. That may be
> another problem.
Hi!
Thanks for your answer. I gave it a try and set those limits, so sensors no longer shows any ALARM. That doesn't seem to be the cause, though, because after setting them the interrupts are still generated massively.
jc42-i2c-1-1b
Adapter: SMBus I801 adapter at e000
RAM: +30.0°C (low = +0.0°C)
jc42-i2c-1-19
Adapter: SMBus I801 adapter at e000
RAM: +32.0°C (low = +0.0°C)
jc42-i2c-1-1a
Adapter: SMBus I801 adapter at e000
RAM: +31.0°C (low = +0.0°C)
jc42-i2c-1-18
Adapter: SMBus I801 adapter at e000
RAM: +28.0°C (low = +0.0°C)
Cheers
Conrad
In Linux Kernel Bug Tracker #177311, linux (linux-linux-kernel-bugs) wrote : | #20 |
Weird, especially since the chips should not generate interrupts in the first place unless this is explicitly enabled (which the driver doesn't do, or at least shouldn't do). My wild guess is that taking the chips out of shutdown mode for some reason enables the interrupt.
Can you send the output of "i2cdump -y -f 1 0x18 w" ? Also, do the interrupts stop when you unload the driver ?
Thanks,
Guenter
In Linux Kernel Bug Tracker #177311, linux (linux-linux-kernel-bugs) wrote : | #21 |
Please forget the question about the unload, as you already answered it.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #22 |
(In reply to Guenter Roeck from comment #4)
> Weird, especially since the chips should not generate interrupts in the
> first place unless it is explicitly enabled (which the driver doesn't do, or
> at least shouldn't do). My wild guess is that taking the chips out of
> shutdown mode for some reason enables the interrupt.
>
> Can you send the output of "i2cdump -y -f 1 0x18 w" ?
Here we go:
╭─root@Galactica ~
╰─➤ i2cdump -y -f 1 0x18 w
     0,8  1,9  2,a  3,b  4,c  5,d  6,e  7,f
00: ef00 0000 0005 0000 0005 c801 1f00 0182
08: 0000 0000 0000 0000 0000 0000 0000 0000
10: 0000 0000 0000 0000 0000 0000 0000 0000
18: 0000 0000 0000 0000 0000 0000 0000 0000
20: 0000 0000 0000 0000 0000 0000 0000 0000
28: 0000 0000 0000 0000 0000 0000 0000 0000
30: 0000 0000 0000 0000 0000 0000 0000 0000
38: 0000 0000 0000 0000 0000 0000 0000 0000
40: 0000 0000 0000 0000 0000 0000 0000 0000
48: 0000 0000 0000 0000 0000 0000 0000 0000
50: 0000 0000 0000 0000 0000 0000 0000 0000
58: 0000 0000 0000 0000 0000 0000 0000 0000
60: 0000 0000 0000 0000 0000 0000 0000 0000
68: 0000 0000 0000 0000 0000 0000 0000 0000
70: 0000 0000 0000 0000 0000 0000 0000 0000
78: 0000 0000 0000 0000 0000 0000 0000 0000
80: 0000 0000 0000 0000 0000 0000 0000 0000
88: 0000 0000 0000 0000 0000 0000 0000 0000
90: 0000 0000 0000 0000 0000 0000 0000 0000
98: 0000 0000 0000 0000 0000 0000 0000 0000
a0: 0000 0000 0000 0000 0000 0000 0000 0000
a8: 0000 0000 0000 0000 0000 0000 0000 0000
b0: 0000 0000 0000 0000 0000 0000 0000 0000
b8: 0000 0000 0000 0000 0000 0000 0000 0000
c0: 0000 0000 0000 0000 0000 0000 0000 0000
c8: 0000 0000 0000 0000 0000 0000 0000 0000
d0: 0000 0000 0000 0000 0000 0000 0000 0000
d8: 0000 0000 0000 0000 0000 0000 0000 0000
e0: 0000 0000 0000 0000 0000 0000 0000 0000
e8: 0000 0000 0000 0000 0000 0000 0000 0000
f0: 0000 0000 0000 0000 0000 0000 0000 0000
f8: 0000 0000 0000 0000 0000 0000 0000 0000
> Also, do the interrupts stop when you unload the driver?
No, they only stop when I do a complete server reboot.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #23 |
Ah, I forgot to add: loading the old "eeprom" module causes the same problem with the interrupts, see [1]. Maybe this is somehow connected?
In Linux Kernel Bug Tracker #177311, linux (linux-linux-kernel-bugs) wrote : | #24 |
This is an Atmel AT30TS00. Per configuration register, events are disabled, and there is no event pending, meaning it should not really be the JC42s generating the interrupts.
Another question: if you only load the i801 module after boot (i.e. prevent the jc42 module from loading, e.g. by blacklisting it, but still load the i801 module), do you still get the interrupts?
Thanks,
Guenter
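Guenter's reading of the dump can be reproduced mechanically. The sketch below assumes that i2cdump's word mode shows each SMBus word low byte first while JC42.4 registers are big-endian, so every word is byte-swapped before decoding. The register offsets (0x01 configuration, 0x02 to 0x04 limits, 0x05 temperature, 0x06 manufacturer ID, 0x07 device ID) and configuration bits follow my reading of the JEDEC JC42.4 layout; all function names are hypothetical.

```python
def parse_i2cdump_words(text):
    """Return {register: word} from i2cdump word-mode rows ("00: ef00 0000 ...")."""
    regs = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep:
            continue
        try:
            base = int(head.strip(), 16)
            # i2cdump prints XXXX for registers it failed to read.
            words = [None if t == "XXXX" else int(t, 16) for t in rest.split()]
        except ValueError:
            continue  # prompts, headers and other non-dump lines
        for i, word in enumerate(words):
            if word is not None:
                regs[base + i] = word
    return regs


def swab16(word):
    """Swap the two bytes of a 16-bit word (SMBus order -> JC42.4 big-endian)."""
    return ((word & 0xFF) << 8) | (word >> 8)


def jc42_temp_c(reg):
    """JC42.4 temperature format: 13-bit two's complement in 1/16 degree C steps."""
    raw = reg & 0x1FFF
    if raw & 0x1000:
        raw -= 0x2000
    return raw / 16


def decode_jc42(regs):
    cfg = swab16(regs[0x01])
    return {
        "manufacturer_id": swab16(regs[0x06]),
        "device_id": swab16(regs[0x07]),
        # Configuration bits as I read JC42.4: bit 3 = EVENT output enabled,
        # bit 4 = EVENT output asserted, bit 8 = shutdown.
        "events_enabled": bool(cfg & (1 << 3)),
        "event_pending": bool(cfg & (1 << 4)),
        "shutdown": bool(cfg & (1 << 8)),
        "upper_c": jc42_temp_c(swab16(regs[0x02])),
        "lower_c": jc42_temp_c(swab16(regs[0x03])),
        "crit_c": jc42_temp_c(swab16(regs[0x04])),
        "temp_c": jc42_temp_c(swab16(regs[0x05])),
    }


dump = """\
     0,8  1,9  2,a  3,b  4,c  5,d  6,e  7,f
00: ef00 0000 0005 0000 0005 c801 1f00 0182
08: 0000 0000 0000 0000 0000 0000 0000 0000"""
info = decode_jc42(parse_i2cdump_words(dump))
for key, value in info.items():
    print(key, hex(value) if key.endswith("_id") else value)
```

Fed the first rows of the dump from comment #22, it reports manufacturer ID 0x001F and device ID 0x8201 (consistent with the Atmel AT30TS00 identification above), a temperature of 28.5 °C, and a configuration word of 0, i.e. event output disabled with nothing pending.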
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #25 |
(In reply to Guenter Roeck from comment #8)
> Another question: If you only load the i801 module after boot (ie prevent
> the jc42 module from loading, eg by blacklisting it, but still load the i801
> module), do you still get the interrupts ?
That's my current situation ;-) jc42 is built as a module, which is currently not loaded at system startup, and i801 is compiled into my kernel. In that case, zero interrupts are generated on i801_smbus.
Cheers
Conrad
In Linux Kernel Bug Tracker #177311, linux (linux-linux-kernel-bugs) wrote : | #26 |
#7 suggests a problem with the i801 driver and its interrupt handling. #9 contradicts that a bit, though.
Maybe the C2000 has problems with interrupts, or implements them differently from what the driver expects. This may be triggered by an actual access on the bus. You could try to confirm it by running the i2cdump command after booting without the jc42 module loaded (i2cdetect -y 1 should show no reserved addresses) and checking whether the interrupts start happening.
Thanks,
Guenter
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #27 |
(In reply to Guenter Roeck from comment #10)
> #7 suggests a problem with the i801 driver and its interrupt handling. #9
> contradicts that a bit, though.
>
> Maybe the C2000 has problems with interrupts, or implements it differently
> than handled by the driver. This may be triggered by an actual access on the
> bus. You could try to confirm it by running the i2cdump command after
> booting without the jc42 module loaded (i2cdetect -y 1 should show no
> reserved addresses) and see if the interrupts start happening.
>
> Thanks,
> Guenter
You nailed it ;-) Right after executing "i2cdump -y -f 1 0x18 w", the interrupts start massively, even though jc42 wasn't loaded.
Cheers
Conrad
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #28 |
Sorry, but I don't understand what you mean by "reserved" here.
Before/after executing i2cdump (output is the same):
╭─root@Galactica ~
╰─➤ i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- 18 19 1a 1b -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2e --
30: 30 31 32 33 -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- 49 -- -- -- -- -- --
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- 69 -- -- 6c -- -- --
70: -- -- -- -- -- -- -- --
A simple "i2cdetect -y 1" also triggers the interrupts.
In Linux Kernel Bug Tracker #177311, linux (linux-linux-kernel-bugs) wrote : | #29 |
With "reserved" I meant "a driver for a chip is loaded". After you load the jc42 driver (or the eeprom driver), you'll see that some of the addresses show up as "UU".
Anyway, I think the conclusion is that the i801 driver has problems with interrupt support on your hardware, as I suspected in #10. Issue #177291 is really the same problem. Jean maintains that driver as well, so he should be able to help.
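To compare i2cdetect grids before and after loading a driver, the output can be turned into a map of addresses. A sketch with a hypothetical `parse_i2cdetect` helper; it assumes the default scan range of 0x03 to 0x77 (which is why the "00:" row has only 13 cells) and treats "UU" as "claimed by a kernel driver", as Guenter explains above.

```python
def parse_i2cdetect(text):
    """Map 7-bit address -> "present" (device answered) or "in_use" ("UU")."""
    result = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep:
            continue  # column header, shell prompts
        try:
            base = int(head.strip(), 16)
        except ValueError:
            continue
        cells = rest.split()
        # The default scan covers 0x03-0x77, so the "00:" row holds only
        # 13 cells (0x03-0x0f); every other row starts at its base address.
        start = (base + 16 - len(cells)) if base == 0 else base
        for offset, cell in enumerate(cells):
            if cell == "UU":
                result[start + offset] = "in_use"
            elif cell != "--":
                result[start + offset] = "present"
    return result


after_jc42 = """\
00: -- -- -- -- -- 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- UU UU UU UU -- -- -- --
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
70: -- -- -- -- -- -- -- --"""
for addr, state in sorted(parse_i2cdetect(after_jc42).items()):
    print(f"0x{addr:02x} {state}")
```

On the grid after loading jc42, addresses 0x18 to 0x1b come back as in_use, while 0x08 and 0x50 to 0x53 are merely present.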
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #30 |
(In reply to Guenter Roeck from comment #13)
> With "reserved" I meant "a driver for a chip is loaded". After you load the
> jc42 driver (or the eeprom driver), you'll see that some of the addresses
> show up as "UU".
Ah I see. Yes, after loading jc42, I can see "UU".
╭─root@Galactica ~
╰─➤ i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- UU UU UU UU -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2e --
30: 30 31 32 33 -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- 49 -- -- -- -- -- --
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- 69 -- -- 6c -- -- --
70: -- -- -- -- -- -- -- --
> Anyway, I think the conclusion is that the i801 driver has problems with
> interrupt support on your hardware, as I suspected in #10. Issue #177291 is
> really the same problem. Jean maintains that driver as well, so he should be
> able to help.
Should I close #177291 as a duplicate, since it's my ticket?
Thanks for your support. I hope Jean has an idea :)
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #31 |
Thanks Guenter for stepping in. I always suspected the problem was with the SMBus controller (i2c-i801 driver) and I intended to comment about it long ago but then forgot, sorry about that :-(
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #32 |
Conrad, I need detailed information about the SMBus PCI devices and the IRQs on your machine. Please attach the output of:
$ /sbin/lspci -nn | grep SMBus
$ /sbin/lspci -xxx -s <device>
(for each device listed above)
$ cat /proc/interrupts
Also look for any message related to i2c, SMBus, i801 or the PCI devices above in the kernel logs.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #33 |
Hello Jean!
(In reply to Jean Delvare from comment #16)
> $ /sbin/lspci -nn | grep SMBus
00:13.0 System peripheral [0880]: Intel Corporation Atom processor C2000 SMBus 2.0 [8086:1f15] (rev 02)
00:1f.3 SMBus [0c05]: Intel Corporation Atom processor C2000 PCU SMBus [8086:1f3c] (rev 02)
> $ /sbin/lspci -xxx -s <device>
> (for each device listed above)
╭─root@Galactica /home/kostecki
╰─➤ lspci -xxx -s 00:13.0
00:13.0 System peripheral: Intel Corporation Atom processor C2000 SMBus 2.0 (rev 02)
00: 86 80 15 1f 46 05 10 00 02 00 80 08 00 00 00 00
10: 04 40 f1 ff 0f 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 10 80 92 00 01 80 00 10 20 08 04 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 8c 03 00 00 00 00 00 00 00 00 00 05 00 81 01
90: 0c f0 ef fe 00 00 00 00 a6 41 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 01 00 10 00 10 80
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
╭─root@Galactica /home/kostecki
╰─➤ lspci -xxx -s 00:1f.3
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 02)
00: 86 80 3c 1f 43 01 98 02 02 00 05 0c 00 00 00 00
10: 00 00 50 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00
40: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 03 04 04 00 00 00 08 08 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 0f 02 01 03 03 03 00
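Several fields in these dumps can be cross-checked against the kernel messages. The sketch below decodes a few standard type-0 header fields (PCI configuration space is little-endian). Treating BAR4 at offset 0x20 as the SMBus I/O BAR reflects what the i2c-i801 driver uses as far as I know; the function names are hypothetical.

```python
import re

# "20: 01 e0 00 00 ..." -> an offset followed by up to 16 hex bytes.
ROW_RE = re.compile(r"^([0-9a-f]{2,3}): ((?:[0-9a-f]{2} ?)+)$", re.IGNORECASE)


def parse_lspci_hex(text):
    """Collect the hex rows of an `lspci -xxx` dump into a 256-byte image."""
    cfg = bytearray(256)
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue  # device title line, shell prompts, ...
        base = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            if base + i < len(cfg):
                cfg[base + i] = int(tok, 16)
    return bytes(cfg)


def u16(cfg, off):
    return int.from_bytes(cfg[off:off + 2], "little")


def u32(cfg, off):
    return int.from_bytes(cfg[off:off + 4], "little")


def decode_header(cfg):
    """Decode a few type-0 header fields from a config-space image."""
    command, status, bar4 = u16(cfg, 0x04), u16(cfg, 0x06), u32(cfg, 0x20)
    return {
        "vendor": u16(cfg, 0x00),
        "device": u16(cfg, 0x02),
        "class": (cfg[0x0B] << 8) | cfg[0x0A],
        "command": command,
        "intx_disabled": bool(command & (1 << 10)),   # Command bit 10: Interrupt Disable
        "interrupt_status": bool(status & (1 << 3)),  # Status bit 3: INTx asserted
        # Assumption: BAR4 (0x20) is the SMBus I/O BAR; bit 0 set marks an I/O BAR.
        "bar4_io": (bar4 & ~0x3) if bar4 & 0x1 else None,
        "interrupt_pin": cfg[0x3D],                   # 0 = none, 1 = INTA# ... 4 = INTD#
    }


pcu_smbus = """\
00: 86 80 3c 1f 43 01 98 02 02 00 05 0c 00 00 00 00
10: 00 00 50 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00"""
hdr = decode_header(parse_lspci_hex(pcu_smbus))
for key, value in hdr.items():
    print(key, value)
```

For 00:1f.3 this gives vendor 0x8086, device 0x1f3c, class 0x0c05, command 0x0143 (matching "enabling device (0140 -> 0143)" in dmesg), an I/O base of 0xe000 (the "SMBus I801 adapter at e000" shown by sensors) and interrupt pin 2 (INTB#). The Status word 0x0298 also has bit 3 set, so the function was asserting its interrupt at the moment of the dump; a single snapshot proves little on its own.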
> $ cat /proc/interrupts
See attachment.
> Also look for any message related to i2c, SMBus, i801 or the PCI devices
> above in the kernel logs.
╭─root@Galactica /
╰─➤ dmesg|grep -i smbus
[ 7.968653] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[ 7.970338] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[ 7.974068] ismt_smbus 0000:00:13.0: enabling device (0140 -> 0142)
[ 974.471917] ismt_smbus 0000:00:13.0: completion wait timed out
[ 975.512022] ismt_smbus 0000:00:13.0: completion wait timed out
[ 976.552097] ismt_smbus 0000:00:13.0: completion wait timed out
[ 977.592124] ismt_smbus 0000:00:13.0: completion wait timed out
[ 978.632168] ismt_smbus 0000:00:13.0: completion wait timed out
[ 979.682207] ismt_smbus 0000:00:13.0: completion wait timed out
[ 980.712251] ismt_smbus 0000:00:13.0: completion wait timed out
[ 981.752310] ismt_smbus 0000:00:13...
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #34 |
Created attachment 246221
cat /proc/interrupts
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #35 |
Created attachment 246231
dmesg output
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #36 |
Can you blacklist ismt-msi, reboot and see if it makes any difference?
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #37 |
(In reply to Jean Delvare from comment #20)
> Can you blacklist ismt-msi, reboot and see if it makes any difference?
No, it didn't change anything. I compiled a new kernel without ismt-msi (CONFIG_I2C_ISMT=n), and after loading jc42 the interrupts still go very high.
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #38 |
OK, thanks. I have added Intel folks to Cc. I can't find the register descriptions for the Atom C2000 SMBus function, so there's not so much I can do.
Conrad, support for the SMBus in this CPU family was added several years ago to the i2c-i801 driver, so I am wondering why this bug is only reported now.
Is this new hardware for you? Or have you had it for some time, with it working fine until a kernel or OS update broke it?
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #39 |
I found some datasheet through Avoton C2750
http://
->
https:/
I guess both C2758 and C2750 are compatible as they are listed in C2000 Product Family for Communications.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #40 |
(In reply to Jean Delvare from comment #22)
> Is this new hardware for you? Or you have it for some time, and it was
> working fine so far, and broke with a kernel or OS update?
Yes, this is new hardware. I bought it a few weeks before opening this ticket, so I can't tell whether it was working before.
(In reply to Jarkko Nikula from comment #23)
> I found some datasheet through Avoton C2750
> http://
> GHz
> ->
> https:/
> atom-c2000-
>
> I guess both C2758 and C2750 are compatible as they are listed in C2000
> Product Family for Communications.
The C2750 has Turbo Boost; the C2758 has a QuickAssist accelerator instead of Turbo Boost. (I don't know whether this makes a difference for the registers.)
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #41 |
Jarkko, I found the same document, however it doesn't appear to contain register definitions, or I am blind.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #42 |
(In reply to Jean Delvare from comment #25)
> Jarkko, I found the same document, however it doesn't appear to contain
> register definitions, or I am blind.
Maybe chapters 15.8 and 18.5? Sorry if that's wrong; I don't know whether that's what you are looking for.
In Linux Kernel Bug Tracker #177311, linux (linux-linux-kernel-bugs) wrote : | #43 |
Problem is that only the register addresses are provided, not the register definitions. Sure, there is a status register, and we know its address, but we don't know how the bits are defined and if they are defined exactly like in other Intel CPUs.
With the C2000 being a different micro-architecture than the "mainline" Intel CPUs, there is a real possibility that the register definitions are different.
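To make the concern concrete: the i2c-i801 driver interprets the SMBus host status register (offset 0 from the SMBus I/O base) with the bit layout below, taken from its `SMBHSTSTS_*` definitions for other Intel chipsets. Whether the Atom C2000 follows the same layout is exactly the open question, so this hypothetical decoder describes what the driver assumes, not what the C2000 actually does.

```python
# Host status register bits as named in the Linux i2c-i801 driver (SMBHSTSTS_*)
# for Intel ICH/PCH SMBus controllers. NOT confirmed for the Atom C2000.
I801_HSTSTS_BITS = (
    (7, "BYTE_DONE"),
    (6, "INUSE_STS"),
    (5, "SMBALERT_STS"),
    (4, "FAILED"),
    (3, "BUS_ERR"),
    (2, "DEV_ERR"),
    (1, "INTR"),
    (0, "HOST_BUSY"),
)


def decode_hststs(value):
    """Names of the bits set in an 8-bit host status value, most significant first."""
    if not 0 <= value <= 0xFF:
        raise ValueError("the host status register is 8 bits wide")
    return [name for bit, name in I801_HSTSTS_BITS if value & (1 << bit)]


print(decode_hststs(0x42))
```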
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #44 |
Sorry, I looked at it too quickly. Indeed definitions are missing. I'll ask http://
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #45 |
Conrad, until we sort it out, you may be able to work around the problem by passing option disable_
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #46 |
(In reply to Jean Delvare from comment #29)
> Conrad, until we sort it out, you may be able to work around the problem by
> passing option disable_
Hey Jean,
disabling the interrupts for i2c-i801 seems to help as a workaround.
[ 7.950079] i801_smbus 0000:00:1f.3: Interrupt disabled by user
[ 7.951624] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[ 7.953270] i801_smbus 0000:00:1f.3: SMBus using polling
Cheers
Conrad
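Whether the workaround took effect is visible in the kernel log, since the driver reports "SMBus using polling" instead of "SMBus using PCI interrupt". A sketch that checks this and also summarizes the iSMT (ismt_smbus, 00:13.0) "completion wait timed out" messages from comment #33; it assumes plain `dmesg` output with the default bracketed timestamps, and `smbus_summary` is a hypothetical name.

```python
import re

# "[  974.471917] ismt_smbus 0000:00:13.0: completion wait timed out"
DMESG_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<drv>\S+)\s+(?P<dev>[0-9a-f:.]+):\s+(?P<msg>.*)$"
)


def smbus_summary(lines):
    """Report how i801_smbus says it runs and how often iSMT completions time out."""
    mode = None
    timeouts = []
    for line in lines:
        m = DMESG_RE.match(line.strip())
        if m is None:
            continue
        drv, msg = m.group("drv"), m.group("msg")
        if drv == "i801_smbus" and msg.startswith("SMBus using "):
            mode = msg[len("SMBus using "):]  # "PCI interrupt" or "polling"
        elif drv == "ismt_smbus" and msg == "completion wait timed out":
            timeouts.append(float(m.group("ts")))
    gaps = [b - a for a, b in zip(timeouts, timeouts[1:])]
    return {
        "i801_mode": mode,
        "ismt_timeouts": len(timeouts),
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else None,
    }


log = [
    "[    7.970338] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt",
    "[  974.471917] ismt_smbus 0000:00:13.0: completion wait timed out",
    "[  975.512022] ismt_smbus 0000:00:13.0: completion wait timed out",
    "[  976.552097] ismt_smbus 0000:00:13.0: completion wait timed out",
]
print(smbus_summary(log))
```

For the full run of timeout lines in comment #33 the mean gap comes out at about 1.04 s, i.e. roughly one timeout per second.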
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #47 |
*** Bug 177291 has been marked as a duplicate of this bug. ***
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #48 |
Any news for me? :)
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #49 |
Jarkko, were you able to get your hands on a datasheet? It doesn't need to be public, if you can check the register definitions for us.
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #50 |
I got one contact back in December but no response. Maybe they were busy before the holidays, and I forgot to ping again. I'll ask again.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #51 |
(In reply to Jarkko Nikula from comment #34)
> I got one contact info back in December but no response. Maybe busy before
> holidays and I forgot to ping again. I'll ask again.
Did you get any reply?
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #52 |
Only an out-of-office reply back in March, but I have pinged again now.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #53 |
(In reply to Jarkko Nikula from comment #36)
> Just only out of office reply back in March but pinged again now.
And now? ;-)
In Linux Kernel Bug Tracker #177311, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #54 |
Hmm... It seems this one has somehow been abandoned. Jarkko, any news on this? Same question to Conrad: do you have any luck with v5.11-based kernels (or closer to the latest)?
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #55 |
(In reply to Andy Shevchenko from comment #38)
> Hmm... Seems this one gets somehow abandoned. Jarkko, any news on this? Same
> question to Conrad, do you have any luck with v5.11 based kernels (or closer
> to latest)?
Nope. No news. Problem still exists with latest kernel.
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #56 |
Unfortunately I don't have any updates on this.
vcarceler (vcarceler-b) wrote : | #1 |
Ubuntu Foundations Team Bug Bot (crichton) wrote : | #2 |
Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https:/
To change the source package that this bug is filed about visit https:/
[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]
tags: | added: bot-comment |
affects: | ubuntu → linux (Ubuntu) |
Thadeu Lima de Souza Cascardo (cascardo) wrote : | #3 |
Hi, vcarceler.
Can you give complete dmesg from when this happens?
Thanks for your report.
Cascardo.
vcarceler (vcarceler-b) wrote : | #4 |
Hello Cascardo.
Here you will find dmesg.tgz with:
dmesg/dmesg-
dmesg/dmesg-
dmesg/dmesg-
dmesg/dmesg-
dmesg-normal.txt is a full dmesg when the computer works fine.
dmesg-unrespons
dmesg-2021-
When this happens nothing works well. I even deployed a small script to reboot the laptop when this happens.
We are a school with hundreds of desktops and laptops running Ubuntu 20.04 without problems. But we have received a large number of these Lenovo laptops, which don't work well with Ubuntu 20.04 or 21.04.
I don't know if it helps, but with Fedora 34 the laptop works fine.
Thank you for your attention.
In Linux Kernel Bug Tracker #177311, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #57 |
This bug gave me the idea of trying MSI on i801, but it appears that none of the platforms have MSI capability on this device. I'm not sure whether this is useful information, but I think it's better to share it anyway.
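This observation can be checked against the config-space dumps from comment #33 by walking the standard PCI capability list (pointer at offset 0x34, each entry holding a capability ID and a next pointer; ID 0x05 is MSI). A sketch with hypothetical helper names; it repeats a small `lspci -xxx` parser so that it runs on its own.

```python
import re

ROW_RE = re.compile(r"^([0-9a-f]{2,3}): ((?:[0-9a-f]{2} ?)+)$", re.IGNORECASE)
CAP_NAMES = {0x01: "Power Management", 0x05: "MSI", 0x10: "PCI Express", 0x11: "MSI-X"}


def parse_lspci_hex(text):
    """Collect the hex rows of an `lspci -xxx` dump into a 256-byte image."""
    cfg = bytearray(256)
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m is None:
            continue
        base = int(m.group(1), 16)
        for i, tok in enumerate(m.group(2).split()):
            if base + i < len(cfg):
                cfg[base + i] = int(tok, 16)
    return bytes(cfg)


def capabilities(cfg):
    """Walk the PCI capability list; return [(offset, cap_id), ...]."""
    caps = []
    if not (cfg[0x06] & 0x10):  # Status bit 4: capability list present
        return caps
    ptr = cfg[0x34] & 0xFC
    seen = set()
    while ptr and ptr not in seen:  # "seen" guards against a looping list
        seen.add(ptr)
        caps.append((ptr, cfg[ptr]))
        ptr = cfg[ptr + 1] & 0xFC
    return caps


def has_msi(cfg):
    return any(cap_id == 0x05 for _, cap_id in capabilities(cfg))


ismt = """\
00: 86 80 15 1f 46 05 10 00 02 00 80 08 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 10 80 92 00 01 80 00 10 20 08 04 00 00 00 00 00
80: 01 8c 03 00 00 00 00 00 00 00 00 00 05 00 81 01"""
i801 = """\
00: 86 80 3c 1f 43 01 98 02 02 00 05 0c 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00"""
for label, dump in (("00:13.0", ismt), ("00:1f.3", i801)):
    cfg = parse_lspci_hex(dump)
    names = [CAP_NAMES.get(cid, hex(cid)) for _, cid in capabilities(cfg)]
    print(label, names or "no capabilities", "MSI" if has_msi(cfg) else "no MSI")
```

With those dumps, 00:13.0 (iSMT) lists PCI Express (0x10), Power Management (0x01) and MSI (0x05), while 00:1f.3 (i801) has a capability pointer of 0 and therefore no MSI.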
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs. | #5 |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1931001
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
affects: | linux (Ubuntu) → linux-hwe-5.11 (Ubuntu) |
Changed in linux-hwe-5.11 (Ubuntu): | |
status: | Incomplete → Confirmed |
Jan Herold (yzle) wrote : Re: kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! | #6 |
- logs_and_infos.zip (238.8 KiB, application/zip)
I also have this problem. It started after the automatic update from kernel 5.8 to 5.11.
The error only occurs during a cold boot. When rebooting the system, this error does not occur.
Here is an interesting thread about this problem: https:/
In Red Hat Bugzilla #2009977, byron.c.hawkins (byron.c.hawkins-redhat-bugs) wrote : | #12 |
1. Please describe the problem:
Fedora 34 is totally unusable on an Acer Aspire 1 A114-32-P9MN (laptop), which probably does not have a quality BIOS implementation, but does work fine with Debian 11, Oracle Linux 8.4, etc. It only has problems with Fedora 34. The machine constantly reports "soft lockup" and something about a watchdog, which I know nothing about, really. The "soft lockup" occurs in many different modules and contexts (as indicated by the vast number of stack traces in the system logs). Booting from a live USB of Fedora 34, it often took more than 30 minutes to reach the initial desktop, whereas Oracle Linux boots in about 10 seconds and never causes a "soft lockup". I tried dozens of configuration adjustments to workaround the problem, but nothing improved. Considering the large number of user reports mentioning "soft lockup" on Fedora 34, it seems to me that something is seriously wrong with the build. For now, I have moved to Oracle Linux and will not install Fedora again on any machine.
2. What is the Version-Release number of the kernel:
5.13.4-
3. Did it work previously in Fedora? If so, what kernel version did the issue
*first* appear? Old kernels are available for download at
https:/
I didn't notice any problems on Fedora 32. After upgrading to Fedora 34 (5.13.4-
4. Can you reproduce this issue? If so, please provide the steps to reproduce
the issue below:
Install Fedora 34 on an Acer Aspire 1 A114-32-P9MN, or just boot from a live USB. It will hang with soft lockups.
5. Does this problem occur with the latest Rawhide kernel? To install the
Rawhide kernel, run ``sudo dnf install fedora-
``sudo dnf update --enablerepo=
Sorry, I switched to Oracle Linux, and am in the process of migrating all my machines. Fedora is not an option if it has such severe problems on basic commodity hardware.
6. Are you running any modules that are not shipped directly with Fedora's kernel?
No, just a plain live USB will trigger the problem at its fullest severity.
7. Please attach the kernel logs. You can get the complete kernel log
for a boot with ``journalctl --no-hostname -k > dmesg.txt``. If the
issue occurred on a previous boot, use the journalctl ``-b`` flag.
Sorry, the system has been wiped clean for an install of Oracle Linux, which works with no problems. But in any case, the machine was responding so poorly under Fedora 34 that it would have been nearly impossible to obtain the logs, even by simply copying them to a USB drive. The machine is entirely crippled under Fedora 34.
paul janssen (pauluswaulus) wrote (last edit ): | #7 |
- dmesg, lshw, lspci, syslog, screenshot atop (1.3 MiB, application/zip)
Same here, after upgrading from 5.8 to 5.11.
Soft lockups during boot, long boot time, and after that a very slow machine.
My machine is also Intel Celeron based, like the previous reports.
After logging into the desktop environment, the "atop" program shows that almost all CPU time is spent in IRQ handling, where this is normally close to 0 percent (see attachment).
Logging in was hard because not all keyboard input was processed.
The "old" ubuntu is still working as expected (see attachment).
See attachment for logs.
I tried the following:
* Ubuntu Live image on usb: same problem
* Fedora Live image on usb: same problem
* wait until the boot process comes through and collect the logs (kernel params: nomodeset debug verbose)
* perform an apt-get upgrade; apt-get update, reboot, problem still present
* fsck , all was okay.
* tried kernel parameter intel_idle.
* tried kernel parameter noapic (following a hunch), same problem
paul janssen (pauluswaulus) wrote (last edit ): | #8 |
Since I was not able to use "Also affects distribution/
* https:/
* https:/
* https:/
* https:/
* https:/
* https:/
* https:/
The match I looked for was:
* soft lock ups during boot
* If boot log available: RIP at either "__do_softirq" or "cpuidle_
paul janssen (pauluswaulus) wrote (last edit ): | #9 |
Also tried:
* kernel parameter watchdog_thresh=20, same problem
* BIOS setting fast boot=disabled ( was enabled), same problem
paul janssen (pauluswaulus) wrote : | #10 |
Possible workaround (not a fix): blacklist the i2c_i801 module. It works for me...
Since I noticed a high amount of CPU time spent in interrupt handling I looked at /proc/interrupts (right after the slow boot and slow login):
$ cat /proc/interrupts
CPU0 CPU1
0: 9 0 IR-IO-APIC 2-edge timer
1: 0 249 IR-IO-APIC 1-edge i8042
8: 1 0 IR-IO-APIC 8-fasteoi rtc0
9: 0 1017 IR-IO-APIC 9-fasteoi acpi
14: 0 591 IR-IO-APIC 14-fasteoi INT3453:00, INT3453:01, INT3453:03
15: 0 0 IR-IO-APIC 15-fasteoi INT3453:02
20: 190734634 0 IR-IO-APIC 20-fasteoi i801_smbus
31: 8350 0 IR-IO-APIC 31-fasteoi idma64.0, i2c_designware.0
39: 0 84628 IR-IO-APIC 39-fasteoi mmc0
120: 0 0 DMAR-MSI 0-edge dmar0
121: 0 0 DMAR-MSI 1-edge dmar1
122: 0 0 IR-PCI-MSI 311296-edge PCIe PME
123: 0 0 IR-PCI-MSI 315392-edge PCIe PME
124: 0 0 IR-PCI-MSI 317440-edge PCIe PME
125: 0 0 IR-PCI-MSI 294912-edge ahci[0000:00:12.0]
126: 0 3 IR-PCI-MSI 1048576-edge rtsx_pci
127: 4171 0 IR-PCI-MSI 344064-edge xhci_hcd
128: 0 296 INT3453:00 18 ELAN0503:00
129: 0 0 IR-PCI-MSI 1050624-edge enp2s0f1
130: 0 44 IR-PCI-MSI 245760-edge mei_me
131: 18279 0 IR-PCI-MSI 1572864-edge ath10k_pci
132: 0 669 IR-PCI-MSI 229376-edge snd_hda_intel:card0
NMI: 690 49 Non-maskable interrupts
LOC: 693366 704015 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 690 49 Performance monitoring interrupts
IWI: 31340 91937 IRQ work interrupts
RTR: 0 0 APIC ICR read retries
RES: 23071 21772 Rescheduling interrupts
CAL: 10091 3666 Function call interrupts
TLB: 2750 4570 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
DFR: 0 0 Deferred Error APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 10 11 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 Posted-interrupt notification event
NPI: 0 0 Nested posted-interrupt event
PIW: 0 0 Posted-interrupt wakeup event
This led me to i801_smbus, which belongs to the i2c_i801 module (I found this out using lsmod).
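The per-second rate can be measured instead of eyeballed from a single snapshot. The following is a minimal Python sketch, not part of the original report: it parses two /proc/interrupts snapshots and reports interrupts per second for a named device. The function names are hypothetical, and the parser assumes the column layout shown above (a header of CPU names, then one count per CPU followed by the chip, trigger type and device names).

```python
import time


def irq_totals(text):
    """Parse a /proc/interrupts dump into {irq_label: (total_count, description)}.

    Rows with fewer counts than CPU columns (e.g. "ERR:" and "MIS:") are skipped.
    """
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header line: "CPU0 CPU1 ..."
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 1 + ncpu or not fields[0].endswith(":"):
            continue
        per_cpu = fields[1:1 + ncpu]
        if not all(f.isdigit() for f in per_cpu):
            continue
        totals[fields[0][:-1]] = (sum(int(f) for f in per_cpu),
                                  " ".join(fields[1 + ncpu:]))
    return totals


def rate_per_second(before, after, interval_s, device):
    """Interrupts per second for the IRQ whose description names `device`."""
    for label, (count, desc) in after.items():
        if device in desc.split():
            return (count - before[label][0]) / interval_s
    raise KeyError(device)


def sample_rate(device="i801_smbus", interval_s=1.0):
    """Read /proc/interrupts twice, interval_s apart (Linux only)."""
    with open("/proc/interrupts") as f:
        before = irq_totals(f.read())
    time.sleep(interval_s)
    with open("/proc/interrupts") as f:
        after = irq_totals(f.read())
    return rate_per_second(before, after, interval_s, device)
```

On an affected machine, `sample_rate("i801_smbus")` should show a large number; compare it with the near-zero i801_smbus counts reported later in this thread for unaffected or patched kernels.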
Following this ~similar~ issue (https:/
I added "module_
Note: I do not fully understand the consequences of not having the i2c_i801 and i801_smbus modules.
The ?bett...
In Linux Kernel Bug Tracker #177311, stephane.poignant (stephane.poignant-linux-kernel-bugs) wrote : | #58 |
Not sure it's completely related, but I would assume at least partially.
I have two mini-servers, one with a Supermicro A2SDi-8C-HLN4F (Atom C3758), and the other one with an older Supermicro A1SRM-2758F (Atom C2758F).
I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye (5.10.46-5). No issue on the C3758, but I ran into a severe performance regression on the C2758F.
When running 5.10 on the C2758F, /proc/interrupts shows about 100k interrupts per second for 'IO-APIC 18-fasteoi i801_smbus', and overall performance suffers a lot (e.g. iperf between two KVM virtual machines bridged together is 93% slower with 5.10 than with 4.19).
So far I was getting around the issue by blocklisting i2c_i801. After I found this, I tried adding the disable_
I'm not using jc42 at all; sensor thresholds are set to correct values by the distro tools.
# i2cdetect -l
# sensors
nvme-pci-0400
Adapter: PCI adapter
Composite: +30.9°C (low = -273.1°C, high = +84.8°C)
Sensor 1: +30.9°C (low = -273.1°C, high = +65261.8°C)
Sensor 2: +31.9°C (low = -273.1°C, high = +65261.8°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +48.0°C (high = +98.0°C, crit = +98.0°C)
Core 1: +48.0°C (high = +98.0°C, crit = +98.0°C)
Core 2: +48.0°C (high = +98.0°C, crit = +98.0°C)
Core 3: +48.0°C (high = +98.0°C, crit = +98.0°C)
Core 4: +47.0°C (high = +98.0°C, crit = +98.0°C)
Core 5: +46.0°C (high = +98.0°C, crit = +98.0°C)
Core 6: +47.0°C (high = +98.0°C, crit = +98.0°C)
Core 7: +47.0°C (high = +98.0°C, crit = +98.0°C)
# dmesg | egrep -i '(smbus|i801)'
[ 2.226240] ismt_smbus 0000:00:13.0: enabling device (0000 -> 0002)
[ 2.229927] i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
[ 2.230089] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[ 2.230136] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
# lspci -nn | grep SMBus
00:13.0 System peripheral [0880]: Intel Corporation Atom processor C2000 SMBus 2.0 [8086:1f15] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation Atom processor C2000 PCU SMBus [8086:1f3c] (rev 03)
# lspci -xxx -s 00:13.0
00:13.0 System peripheral: Intel Corporation Atom processor C2000 SMBus 2.0 (rev 03)
00: 86 80 15 1f 06 04 10 00 03 00 80 08 00 00 00 00
10: 04 70 31 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 10 80 92 00 01 80 00 10 20 08 04 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 8c 03 00 00 00 00 00 00 00 00 00 05 00 81 01
90: 04 00 e4 fe 00 00 00 00 21 40 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 01 00 10 00 10 80
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
# lspci -xxx -s 00:1f.3
00:1f.3 SMBus: Intel Corporat...
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #59 |
Yes, this is the same problem here. But Intel doesn't seem to be interested here :-(
paul janssen (pauluswaulus) wrote : | #11 |
I also tried blacklisting only "i801_smbus" but that gave the same issue.
Only blacklisting "i2c_i801" is currently the best workaround.
Changed in fedora: | |
importance: | Unknown → Undecided |
status: | Unknown → Confirmed |
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #60 |
(In reply to stephane.poignant from comment #42)
> I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye
> (5.10.46-5). No issue on the C3758, but i was faced with severe performance
> regression on the C2758F.
>
Interesting, so was the 4.19 working on the C2758F without interrupt storm?
In Linux Kernel Bug Tracker #177311, stephane.poignant (stephane.poignant-linux-kernel-bugs) wrote : | #61 |
(In reply to Jarkko Nikula from comment #44)
> (In reply to stephane.poignant from comment #42)
> > I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye
> > (5.10.46-5). No issue on the C3758, but i was faced with severe performance
> > regression on the C2758F.
> >
> Interesting, so was the 4.19 working on the C2758F without interrupt storm?
I haven't checked /proc/interrupts when running 4.19, so I cannot tell for sure that the interrupts were not there. The performance regression definitely was not there. I can check this in a couple of weeks (the server is at a remote location with no OOBM network).
Dmesg when running 4.19 shows it had interrupts enabled:
[ 0.000000] Linux version 4.19.0-17-amd64 (<email address hidden>) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.194-3 (2021-07-18)
[ 0.000000] Command line: BOOT_IMAGE=
...
[ 1.434097] Run /init as init process
[ 1.782787] dca service started, version 1.12.1
[ 1.783203] ismt_smbus 0000:00:13.0: enabling device (0000 -> 0002)
[ 1.796694] cryptd: max_cpu_qlen set to 1000
[ 1.801177] i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
[ 1.801317] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[ 1.801356] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[ 1.805199] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[ 1.805202] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 1.805246] igb 0000:00:14.0: enabling device (0000 -> 0002)
[ 1.816722] SSE version of gcm_enc/dec engaged.
...
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #62 |
The problem does persist in kernel 4.19 and other versions. It only depends on whether a different driver triggers the interrupts; if so, the count climbs very high. So it's possible that no driver in your 4.19 setup was using those interrupts and, as a consequence, the bug did not trigger.
@Jarkko Nikula: Since you are still replying, could you please try again to get the needed docs, as requested by Jean Delvare?
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #63 |
@Conrad Kostecki: Yeah, I agree with you that it's unlikely the problem was absent in 4.19, as it was present well before that.
I was in contact with our sales support and they told me the Atom C2758 with the F suffix is custom to Supermicro. Unfortunately they didn't find an explicit specification for its SMBus controller, but they told me it's based on the same 22 nm Silvermont architecture as Bay Trail. I suppose the SMBus IO should be compatible.
Unfortunately, public datasheets for Bay Trail seem scarce too, but I was able to find something when searching for datasheets for the Bay Trail E3825 used in the MinnowBoard Max. The following document seems to be available to registered ark.intel.com users or via search engines:
"Intel Atom® Processor E3800 Product Family" with Document Number: 538136 and Chapter 33 "PCU – System Management Bus (SMBus)"
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #64 |
Created attachment 299193
Debug patch for the i2c-i801 interrupts
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #65 |
Could you try the attached patch and see which interrupt statuses it prints during an interrupt storm? It's a rate-limited debug print, so it shouldn't flood dmesg.
You need to have CONFIG_
mount none /sys/kernel/debug -t debugfs
echo -n "func i801_isr +p" >/sys/kernel/
or by appending that to your kernel command line:
i2c_i801.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #66 |
Here is the output:
pcicst 0x298, SMBHSTSTS 0x60
[ 359.205884] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 359.205918] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210031] i801_isr: 375367 callbacks suppressed
[ 364.210043] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210085] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210126] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210142] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210178] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210217] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210234] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210253] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210292] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 364.210329] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220035] i801_isr: 380909 callbacks suppressed
[ 369.220047] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220069] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220109] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220146] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220185] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220222] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220262] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220278] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220317] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 369.220333] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230078] i801_isr: 393736 callbacks suppressed
[ 374.230109] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230151] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230191] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230210] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230248] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230283] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230297] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230332] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230345] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 374.230358] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240037] i801_isr: 382705 callbacks suppressed
[ 379.240068] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240090] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240110] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240130] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240150] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240186] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240205] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240242] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240281] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 379.240297] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 384.250032] i801_isr: 387109 callback...
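The "callbacks suppressed" lines also give a rough storm rate. A Python sketch, assuming the kernel's default printk rate limit of 10 messages per 5-second window (consistent with the groups of 10 lines between reports in the log above): each report closes one window that began at the previous report, so that window saw roughly the suppressed count plus the 10 printed lines. The helper name is hypothetical.

```python
import re

BURST = 10  # assumed ratelimit burst: 10 printed messages per 5 s window

SUPPRESSED = re.compile(r"\[\s*(\d+\.\d+)\]\s+i801_isr: (\d+) callbacks suppressed")


def isr_rates(log):
    """Estimate i801_isr calls per second between consecutive ratelimit reports.

    Each "N callbacks suppressed" line closes one window that began at the
    previous report; that window saw N dropped plus BURST printed messages.
    """
    points = [(float(t), int(n)) for t, n in SUPPRESSED.findall(log)]
    return [(n1 + BURST) / (t1 - t0)
            for (t0, _), (t1, n1) in zip(points, points[1:])]
```

Fed the four complete reports above, this gives roughly 76,000 to 79,000 handler calls per second, the same order of magnitude as the ~100k/s stephane.poignant reports on his C2758F.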
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #67 |
Thanks. Those debug prints confirm the interrupt is really coming from the SMBus controller (bit 3 is set in PCI status) and the SMB alert bit is set.
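The two values can be checked bit by bit. This Python sketch uses the SMBHSTSTS bit names from the Linux i2c-i801 driver (drivers/i2c/busses/i2c-i801.c) and the standard PCI Status register layout, where bit 3 is Interrupt Status; the helper names are hypothetical.

```python
PCI_STATUS_INTERRUPT = 1 << 3   # PCI Status register: Interrupt Status

# SMBHSTSTS bits, named as in drivers/i2c/busses/i2c-i801.c
SMBHSTSTS_BITS = {
    7: "BYTE_DONE",
    6: "INUSE_STS",
    5: "SMBALERT_STS",
    4: "FAILED",
    3: "BUS_ERR",
    2: "DEV_ERR",
    1: "INTR",
    0: "HOST_BUSY",
}


def decode_hststs(value):
    """Names of the SMBHSTSTS bits set in `value`, highest bit first."""
    return [SMBHSTSTS_BITS[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def pci_interrupt_pending(pcists):
    """True if the PCI Status register reports a pending interrupt."""
    return bool(pcists & PCI_STATUS_INTERRUPT)


print(pci_interrupt_pending(0x298))   # True
print(decode_hststs(0x60))            # ['INUSE_STS', 'SMBALERT_STS']
```

INTR (bit 1) is clear in 0x60, so no host transaction completed; the latched alert is the only plausible source of the repeated interrupts.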
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #68 |
Created attachment 299201
Experimental patch disabling SMB_ALERT signal
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #69 |
@Conrad Kostecki: Could you check whether the attached experimental patch, which disables SMB_ALERT, helps here?
In Linux Kernel Bug Tracker #177311, stephane.poignant (stephane.poignant-linux-kernel-bugs) wrote : | #70 |
Thanks for the follow-up, I will test the patch on my setup as well by next week.
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #71 |
I just tested the patch and can confirm it works. After applying the patch, interrupts on i801_smbus dropped to nearly zero.
In Linux Kernel Bug Tracker #177311, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #72 |
(In reply to Conrad Kostecki from comment #55)
> I just tested the patch and can confirm, it works. After applying patch,
> interrupts dropped nearly to zero on i801_smbus.
According to the specification, the host (if it implements ALERT) should issue a special byte read command to find out which device wants to send something. If a proper implementation doesn't fix this, it might be a pin configuration issue (like a pull-down sitting on the respective pin) or a PCB or firmware (BIOS) issue.
It would be nice to understand, if it can be done without much effort, what exactly is causing ALERT to be asserted.
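The "special byte read" is the SMBus Alert Response Address (ARA, 7-bit address 0x0C) protocol. Below is a pure-Python model of it, not driver code: the host reads one byte from the ARA, every alerting device tries to answer with its own address, wired-AND arbitration lets the lowest address win, and the winner releases SMBALERT#. The function names are hypothetical, and bit 0 of the response is simply left 0 here.

```python
# The host reads one byte from the SMBus Alert Response Address, 0x0C (7-bit).


def ara_response(alerting):
    """Byte the host would read from the ARA while the devices in `alerting`
    (7-bit addresses) hold SMBALERT# low, or None if nobody answers.
    Arbitration is won by the lowest address, which returns its address in
    bits 7..1; bit 0 is left 0 in this model."""
    if not alerting:
        return None
    return min(alerting) << 1


def service_alerts(alerting):
    """Read the ARA repeatedly until no device is alerting; return the
    addresses in the order they were served."""
    pending = set(alerting)
    served = []
    while (resp := ara_response(pending)) is not None:
        addr = resp >> 1
        served.append(addr)
        pending.discard(addr)  # the device releases SMBALERT# once its address is read
    return served


print([hex(a) for a in service_alerts({0x50, 0x18})])  # ['0x18', '0x50']
```

If no device ever answers the ARA read while the line stays low, that would point to the pin, PCB, or firmware causes listed above rather than to a real alerting device.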
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #73 |
I was also wondering whether there should be proper acknowledgement of SMB_ALERT, but since the driver currently doesn't support it, I wanted to see whether simply disabling it helps.
Fortunately, I was able to reproduce the issue locally on another platform where SMB_ALERT was connected to a resistor, and I could pull the signal down with a wire. The interrupt storm begins when SMB_ALERT goes low for a moment and continues afterwards.
I'll test a bit more and make a proper patch. One thing I'm wondering is whether the driver should restore the original disable status on driver removal, like what is done for host notify in i801_disable_
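The storm described above can be illustrated with a toy Python model. This is an assumption-laden sketch, not the real controller: a momentary SMBALERT# pulse latches SMBALERT_STS, the interrupt line stays asserted while that bit is latched and alert interrupts are enabled, and the handler never clears it. Masking the alert interrupt (the approach of the experimental patch) stops the calls, but the status bit itself stays latched. All class and function names are hypothetical.

```python
SMBALERT_STS = 1 << 5  # SMBHSTSTS bit name as used by the i2c-i801 driver


class ToyController:
    """Hypothetical model, not real hardware: the IRQ line is asserted while
    SMBALERT_STS is latched and alert interrupts are enabled."""

    def __init__(self, alert_irq_enabled=True):
        self.hststs = 0
        self.alert_irq_enabled = alert_irq_enabled

    def pulse_alert(self):
        # A momentary low on SMBALERT# latches the status bit; it stays set
        # after the pin returns high.
        self.hststs |= SMBALERT_STS

    def irq_asserted(self):
        return self.alert_irq_enabled and bool(self.hststs & SMBALERT_STS)


def isr_ignoring_alert(ctrl):
    """Stand-in for a handler that never clears SMBALERT_STS."""


def count_isr_calls(ctrl, isr, cap=1000):
    """Re-run the handler while the line is asserted; `cap` bounds the storm."""
    calls = 0
    while ctrl.irq_asserted() and calls < cap:
        isr(ctrl)
        calls += 1
    return calls


storm = ToyController()
storm.pulse_alert()
print(count_isr_calls(storm, isr_ignoring_alert))   # 1000: hits the cap

masked = ToyController(alert_irq_enabled=False)
masked.pulse_alert()
print(count_isr_calls(masked, isr_ignoring_alert))  # 0, but the bit stays latched
```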
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #74 |
Created attachment 299217
2nd version of patch disabling SMB_ALERT signal
I moved the SMB_ALERT signal disabling to i801_enable_
In Linux Kernel Bug Tracker #177311, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #75 |
(In reply to Jarkko Nikula from comment #58)
> 2nd version of patch disabling SMB_ALERT signal
Side remark: Looking into this code, shouldn't you first clean current notifications and only after that enable IRQ?
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #76 |
Patch v2 works for me. Interrupts are still fine and do not go crazy.
In Linux Kernel Bug Tracker #177311, stephane.poignant (stephane.poignant-linux-kernel-bugs) wrote : | #77 |
I can confirm that I am getting the same results with the two patches on my setup with the Debian kernels.
Debug patch produces the same messages, and with SMB_ALERT disable patch there was no longer any interrupt triggered.
Also when booting into the previous kernel I was using (linux-
Will test the second version of the patch ASAP and provide you with the results.
## Kernel 4.19
# uname -a
Linux hrbpsrv01.intra.lan 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 GNU/Linux
# cat /proc/interrupts | grep i801
18: 0 0 0 0 0 0 0 0 IO-APIC 18-fasteoi i801_smbus
# dmesg
...
[ 6652.023634] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[ 6652.023689] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
...
## Debian linux-image-
# uname -a
Linux hrbpsrv01.intra.lan 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux
# cat /proc/interrupts | grep i801
18: 0 0 0 0 0 7358862 0 0 IO-APIC 18-fasteoi i801_smbus
(increasing at about 100k interrupts/sec)
# dmesg
...
[ 516.429120] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[ 516.429140] i801_smbus 0000:00:1f.3: An interrupt is pending!
[ 516.429161] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[ 516.429933] i2c i2c-1: 4/4 memory slots populated (from DMI)
[ 516.430337] at24 1-0050: supply vcc not found, using dummy regulator
[ 516.431043] at24 1-0050: 256 byte spd EEPROM, read-only
[ 516.431078] i2c i2c-1: Successfully instantiated SPD at 0x50
[ 516.431455] at24 1-0051: supply vcc not found, using dummy regulator
[ 516.432148] at24 1-0051: 256 byte spd EEPROM, read-only
[ 516.432174] i2c i2c-1: Successfully instantiated SPD at 0x51
[ 516.432576] at24 1-0052: supply vcc not found, using dummy regulator
[ 516.433284] at24 1-0052: 256 byte spd EEPROM, read-only
[ 516.433325] i2c i2c-1: Successfully instantiated SPD at 0x52
[ 516.433748] at24 1-0053: supply vcc not found, using dummy regulator
[ 516.434454] at24 1-0053: 256 byte spd EEPROM, read-only
[ 516.434497] i2c i2c-1: Successfully instantiated SPD at 0x53
[ 525.513104] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513133] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513161] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513185] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513209] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513234] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513258] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513281] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513316] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 525.513352] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[ 530.514207] i801_isr: 297603 callbacks suppressed
[ 530.5...
In Linux Kernel Bug Tracker #177311, stephane.poignant (stephane.poignant-linux-kernel-bugs) wrote : | #78 |
Patch V2 works for me too.
# cat /proc/interrupts | grep i801
18: 0 0 0 0 0 8 0 0 IO-APIC 18-fasteoi i801_smbus
In Linux Kernel Bug Tracker #177311, jarkko.nikula (jarkko.nikula-linux-kernel-bugs) wrote : | #79 |
(In reply to Andy Shevchenko from comment #59)
> (In reply to Jarkko Nikula from comment #58)
> > 2nd version of patch disabling SMB_ALERT signal
>
> Side remark: Looking into this code, shouldn't you first clean current
> notifications and only after that enable IRQ?
That's a good question, and it made me debug further. In fact, disabling doesn't disable detection: SMBALERT_STS will still be set and cause a short burst of interrupts at driver load and unload time if the SMB_ALERT signal was asserted. It looks better to add basic acknowledgement for it in i801_isr().
I'm not sure whether clearing pending interrupts at probe time would cause any regression, but acknowledging SMBALERT_STS in i801_isr() makes sure the status doesn't stay set forever if it occurs after probe.
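A toy Python model of the two ideas in this exchange: clear stale status before enabling the interrupt (Andy's ordering question), and acknowledge SMBALERT_STS in the handler (Jarkko's plan). It is a sketch under stated assumptions, not the merged i2c-i801 commit: the status register is modelled as write-1-to-clear, only INTR and SMBALERT_STS are treated as interrupt sources, and all class and function names are hypothetical.

```python
SMBHSTSTS_INTR = 1 << 1          # bit names as in the i2c-i801 driver
SMBHSTSTS_SMBALERT_STS = 1 << 5
IRQ_SOURCES = SMBHSTSTS_INTR | SMBHSTSTS_SMBALERT_STS


class AckController:
    """Hypothetical model: write-1-to-clear status register; the IRQ line is
    asserted while interrupts are enabled and an interrupt-source bit is set."""

    def __init__(self, stale_status=0):
        self.hststs = stale_status   # e.g. an alert latched before the driver loaded
        self.irq_enabled = False

    def read_hststs(self):
        return self.hststs

    def write_hststs(self, value):
        self.hststs &= ~value        # writing 1 clears that bit

    def irq_asserted(self):
        return self.irq_enabled and bool(self.hststs & IRQ_SOURCES)


def probe(ctrl):
    # Clear stale status first, only then enable the interrupt.
    ctrl.write_hststs(ctrl.read_hststs())
    ctrl.irq_enabled = True


def isr(ctrl):
    status = ctrl.read_hststs()
    if not status & IRQ_SOURCES:
        return False                               # like IRQ_NONE: not ours
    if status & SMBHSTSTS_SMBALERT_STS:
        ctrl.write_hststs(SMBHSTSTS_SMBALERT_STS)  # acknowledge the alert
    if status & SMBHSTSTS_INTR:
        ctrl.write_hststs(SMBHSTSTS_INTR)          # transaction done (handling omitted)
    return True                                    # like IRQ_HANDLED


def deliver(ctrl, cap=1000):
    """Call isr() while the line is asserted; return how many times it ran."""
    calls = 0
    while ctrl.irq_asserted() and calls < cap:
        isr(ctrl)
        calls += 1
    return calls


c = AckController(stale_status=SMBHSTSTS_SMBALERT_STS)
probe(c)                                  # stale alert cleared before enabling
c.hststs |= SMBHSTSTS_SMBALERT_STS        # a new alert arrives after probe
print(deliver(c))                         # 1: acknowledged on the first call
```

With the acknowledgement in place, an alert after probe costs one handler call instead of an unbounded storm.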
In Linux Kernel Bug Tracker #177311, andy.shevchenko (andy.shevchenko-linux-kernel-bugs) wrote : | #80 |
(In reply to Jarkko Nikula from comment #63)
> (In reply to Andy Shevchenko from comment #59)
> > (In reply to Jarkko Nikula from comment #58)
> > > 2nd version of patch disabling SMB_ALERT signal
> >
> > Side remark: Looking into this code, shouldn't you first clean current
> > notifications and only after that enable IRQ?
>
> That's a good question and made me debugging more. In fact disabling doesn't
> disable detection and SMBALERT_STS will be set and cause short burst of
> interrupts during driver load and unload time if SMB_ALERT signal was
> asserted. Looks like it's better to add basic acknowledging for it into
> i801_isr().
>
> I'm not sure would clearing pending interrupts at the probe time cause any
> regression but acknowledging the SMBALERT_STS in i801_isr() makes sure the
> status doesn't stay forever if it occurs after probe.
It also makes sense to test it with DEBUG_SHIRQ enabled (yes, I know that more than half of the drivers in the Linux kernel will either crash or behave badly with it; not many developers know about this debugging feature).
paul janssen (pauluswaulus) wrote : Re: kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! | #13 |
I started kernel bisecting in an attempt to find the commit that causes this issue.
Painful process.
I found another workaround (not a solution) on Stack Overflow (the post has since been deleted). The workaround was to disable virtualization in the BIOS: Intel VT-x -> disabled, Intel VT-d -> disabled. This "worked" for me: the machine booted. But /proc/interrupts still showed about 50,000 interrupts/sec from the SMBus. So the interrupt flood is still there, but it is somehow rate limited, allowing the machine to boot and be sufficiently responsive. So far I still prefer blacklisting i2c_i801.
paul janssen (pauluswaulus) wrote : | #14 |
New best workaround: instead of blacklisting i2c-i801, keep it loaded but disable its interrupts so it uses polling instead.
Step 1 (temporary, to be able to boot for step 2):
a. To be able to boot, enter the GRUB menu (press ESC once during boot).
b. Select the (Ubuntu) Linux entry you want to boot and press "e" to edit it.
c. Edit the line starting with "linux /boot/vmlinuz .....".
d. At the end of this line add " i2c-i801.
e. Press F10.
Now the machine will boot with this new i2c-i801 module parameter. This will happen only once; the next boot will be without this parameter (unless you manually add it again by repeating the above steps).
Step 2 (after booting and logging in, make it permanent):
a. Run "sudo vi /etc/modprobe.
b. Add the line "options i2c-i801 disable_
c. To make sure it's used at boot time, run: "sudo update-initramfs -u"
With this workaround the i2c-i801 module is still loaded but uses polling instead of interrupts. I think this is better than no i2c-i801 at all.
I can boot, the issue does not occur.
Still a workaround ..
Tobias Karnat (tobiaskarnat) wrote : | #15 |
My Lenovo Ideapad Duet 3i with Ubuntu 22.04 (Kernel 5.13) is also affected (Current workaround disable_
Changed in linux: | |
importance: | Unknown → Medium |
status: | Unknown → Incomplete |
Tobias Karnat (tobiaskarnat) wrote : Re: kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! | #81 |
I cannot collect any logs with apport-collect, because the boot is too slow to finish when this happens.
So please change the status to Confirmed.
Changed in linux-hwe-5.13 (Ubuntu): | |
status: | New → Confirmed |
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
status: | Confirmed → Incomplete |
status: | Incomplete → New |
status: | New → Incomplete |
Tobias Karnat (tobiaskarnat) wrote : | #82 |
- i2c-i801-disable-alert-v2.diff (1.0 KiB, text/plain)
The proposed patch from bugzilla.kernel.org
tags: | added: patch |
In Linux Kernel Bug Tracker #177311, jdelvare (jdelvare-linux-kernel-bugs) wrote : | #84 |
This bug is believed to be fixed in kernel v5.16 by the following 2 commits:
commit 03a976c9afb5e3c
Author: Jarkko Nikula
Date: Wed Nov 17 11:45:09 2021 +0200
i2c: i801: Fix interrupt storm from SMB_ALERT signal
commit 9b5bf5878138293
Author: Jean Delvare
Date: Tue Nov 9 16:02:57 2021 +0100
i2c: i801: Restore INTREN on unload
paul janssen (pauluswaulus) wrote : | #83 |
Believed to be fixed in the kernel by two commits.
See: https:/
Changed in linux: | |
status: | Incomplete → Fix Released |
summary: |
- kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s! + Soft lockup due to interrupt storm from smbus |
In Linux Kernel Bug Tracker #177311, ck+kernelbugzilla (ck+kernelbugzilla-linux-kernel-bugs) wrote : | #85 |
Upgraded to kernel 5.16 today: no more IRQ noise. Thank you!
In Red Hat Bugzilla #2009977, bcotton (bcotton-redhat-bugs) wrote : | #86 |
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version'
to a later Fedora Linux version.
Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora Linux 34 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.
Dave Jones (waveform) wrote : | #87 |
Also affects my Acer Aspire TravelMate Spin B118 on Ubuntu 22.04. The i2c-i801 workaround from comment 14 above (https:/
I've noticed that when I enable CONFIG_SENSORS_JC42, either as a module or built into
my kernel, it causes a very high rate of interrupts on i801_smbus, about
6000-8000 per second according to /proc/interrupts. After 20 minutes, there
were about 5 million interrupts generated on i801_smbus.
When I unload the jc42 module, the interrupts do not stop until I do a
complete reboot.
Mainboard: Supermicro A1SRM-2758F
Kernel: Gentoo-Sources 4.8.1 (Happens also with Vanilla 4.8.1 and older kernel
versions)
dmesg:
[ 8.319900] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[ 8.321864] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[ 8.326098] ismt_smbus 0000:00:13.0: enabling device (0140 -> 0142)
lspci:
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 02)
When the module is loaded, I am also getting these errors:
[ 73.934901] ismt_smbus 0000:00:13.0: completion wait timed out
[ 74.974970] ismt_smbus 0000:00:13.0: completion wait timed out
[ 76.014949] ismt_smbus 0000:00:13.0: completion wait timed out
[ 77.054903] ismt_smbus 0000:00:13.0: completion wait timed out
[ 78.094961] ismt_smbus 0000:00:13.0: completion wait timed out
[ 79.134982] ismt_smbus 0000:00:13.0: completion wait timed out
[ 80.175116] ismt_smbus 0000:00:13.0: completion wait timed out
[ 81.215057] ismt_smbus 0000:00:13.0: completion wait timed out