linux autopkgtest: eternal hang/ BUG: soft lockup
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) |
Confirmed
|
Undecided
|
Colin Ian King |
Bug Description
Following up to bug 1632252, tests still hang (but not due to the suspend issue any more). In a local QEMU run:
$ autopkgtest --testname ubuntu-
[...]
14:46:41 DEBUG| [stdout] Test icebp [Ok]
14:46:41 DEBUG| [stdout] Test int 3 trap [Ok]
14:46:41 DEBUG| [stdout] selftests: breakpoint_test [PASS]
and since then (1 hour) it's hung. But now I still can log into ttyS0. dmesg is almost empty:
[ 5501.499217] ata2.01: NODEV after polling detection
[ 5501.500138] ata2.00: configured for MWDMA2
(something in the test clears the ring buffer), and journalctl confirms that suspend/resume worked fine:
Oct 18 14:47:11 autopkgtest kernel: PM: suspend of devices complete after 65.273 msecs
Oct 18 14:47:11 autopkgtest kernel: PM: late suspend of devices complete after 0.172 msecs
Oct 18 14:47:11 autopkgtest kernel: PM: noirq suspend of devices complete after 1.963 msecs
Oct 18 14:47:11 autopkgtest kernel: ACPI: Preparing to enter system sleep state S3
Oct 18 14:47:11 autopkgtest kernel: PM: Saving platform NVS memory
Oct 18 14:47:11 autopkgtest kernel: Disabling non-boot CPUs ...
Oct 18 14:47:11 autopkgtest kernel: kvm-clock: cpu 0, msr 1:3fff4001, primary cpu clock, resume
Oct 18 14:47:11 autopkgtest kernel: ACPI: Low-level resume complete
Oct 18 14:47:11 autopkgtest kernel: PM: Restoring platform NVS memory
Oct 18 14:47:11 autopkgtest kernel: ACPI: Waking up from system sleep state S3
Oct 18 14:47:11 autopkgtest kernel: PM: noirq resume of devices complete after 6.973 msecs
Oct 18 14:47:11 autopkgtest kernel: PM: early resume of devices complete after 0.105 msecs
Oct 18 14:47:11 autopkgtest kernel: pci 0000:00:01.0: PIIX3: Enabling Passive Release
Oct 18 14:47:11 autopkgtest kernel: rtc_cmos 00:00: System wakeup disabled by ACPI
Oct 18 14:47:11 autopkgtest kernel: PM: resume of devices complete after 8.898 msecs
Oct 18 14:47:11 autopkgtest kernel: PM: Finishing wakeup.
Oct 18 14:47:11 autopkgtest systemd[1]: Time has been changed
Oct 18 14:47:11 autopkgtest systemd[1]: apt-daily.timer: Adding 10h 56min 28.694439s random time.
Oct 18 14:47:11 autopkgtest systemd[3986]: Time has been changed
Oct 18 14:47:11 autopkgtest kernel: Restarting tasks ... done.
Oct 18 14:47:11 autopkgtest sudo[30731]: pam_unix(
Oct 18 14:47:11 autopkgtest kernel: ata2.01: NODEV after polling detection
Oct 18 14:47:11 autopkgtest kernel: ata2.00: configured for MWDMA2
But also, no messages beyond that (last message one hour ago).
On the currently running amd64 yakkety test on the ifrastructure, the int trap bug fails:
13:15:46 DEBUG| [stdout] Test icebp [Ok]
13:15:46 DEBUG| [stdout] Test int 3 trap [Failed]
13:15:46 DEBUG| [stdout] selftests: breakpoint_test [PASS]
and then it's apparently hung at exactly the same problem. But unlike with local QEMU I cannot ssh in any more (Connection refused), and console-log shows trouble:
autopkgtest login: [ 1927.192886] AppArmor: change_hat: Invalid input '^open'
[ 1927.194060] AppArmor: change_hat: Invalid input '^'[ 1927.198501] AppArmor: change_hat: Invalid input, NULL hat and NULL magic
[ 1927.203181] AppArmor: change_hat: Invalid input, NULL hat and NULL magic
[ 1927.207249] AppArmor: change_hat: Invalid input '^open'AppArmor: change_hat: Invalid input '^'
[ 1927.208826] AppArmor: change_hat: Invalid input '^'[[0;32m OK [0m] Stopped target Graphical Interface.
[[0;32m OK [0m] Stopped target Multi-User System.
Stopping OpenBSD Secure Shell server...
Stopping autopkgtest root shell on ttyS1...
[ 2909.360975] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 2937.360965] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 2941.368025] INFO: rcu_sched self-detected stall on CPU
[ 2941.368095] 3-...: (1 GPs behind) idle=ea7/2/0 softirq=
[ 2941.368095] (t=15000 jiffies g=205236 c=205235 q=134637)
[ 2969.360984] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 2997.360987] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 3021.796060] INFO: task systemd:1 blocked for more than 120 seconds.
[ 3021.797626] Tainted: G W OEL 4.8.0-25-generic #27-Ubuntu
[ 3021.797627] "echo 0 > /proc/sys/
[ 3021.797727] INFO: task systemd-journal:285 blocked for more than 120 seconds.
[ 3021.797728] Tainted: G W OEL 4.8.0-25-generic #27-Ubuntu
[ 3021.797728] "echo 0 > /proc/sys/
[ 3021.797772] INFO: task kworker/1:2:303 blocked for more than 120 seconds.
[ 3021.797772] Tainted: G W OEL 4.8.0-25-generic #27-Ubuntu
[ 3021.797773] "echo 0 > /proc/sys/
[ 3021.797814] INFO: task kworker/2:2:350 blocked for more than 120 seconds.
[ 3021.797815] Tainted: G W OEL 4.8.0-25-generic #27-Ubuntu
[ 3021.797815] "echo 0 > /proc/sys/
[ 3021.797853] INFO: task acpid:659 blocked for more than 120 seconds.
[ 3021.797853] Tainted: G W OEL 4.8.0-25-generic #27-Ubuntu
[ 3021.797854] "echo 0 > /proc/sys/
[ 3021.797940] INFO: task (exim4):31259 blocked for more than 120 seconds.
[ 3021.797941] Tainted: G W OEL 4.8.0-25-generic #27-Ubuntu
[ 3021.797941] "echo 0 > /proc/sys/
[ 3025.361006] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 3053.360962] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 3081.360962] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
[ 3109.360971] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [swapper/3:0]
... and a couple of similar repetitions.
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
assignee: | nobody → Colin Ian King (colin-king) |
tags: | added: bot-stop-nagging |
The local QEMU run also doesn't time out cleanly:
qqemu-system- x86_64: /build/ qemu-3KpUoe/ qemu-2. 6.1+dfsg/ hw/char/ serial. c:231: serial_xmit: Assertion `!(s->lsr & UART_LSR_TEMT)' failed. regression- suite: ------- ------- ------- --] st-virt- qemu._36vlf46/ runcmd" , line 63, in <module> '/tmp/autopkgte st-virt- qemu._36vlf46/ ttyS1') edError: [Errno 111] Connection refused
autopkgtest [16:03:27]: ERROR: timed out on command "[...]" (kind: test)
autopkgtest [16:03:28]: test ubuntu-
Traceback (most recent call last):
File "/tmp/autopkgte
s.connect(
ConnectionRefus
I. e. it also breaks the status of ttyS1 somehow.