NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Incomplete | High | Canonical Kernel Team |
Vivid | Expired | High | Unassigned |
linux-lts-xenial (Ubuntu) | Expired | High | Unassigned |
Bug Description
---Problem Description---
NMI watchdog: BUG: soft lockup errors when we execute lock_torture_wr tests
---uname output---
Linux alp15 3.19.0-18-generic #18~14.04.1-Ubuntu SMP Wed May 20 09:40:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
Machine Type = P8
---Steps to Reproduce---
Install Ubuntu 14.04.2 from the ISO on a P8 PowerVM LPAR.
Then install the Ubuntu 14.04.3 kernel on the same system and reboot.
Then build the latest LTP test suite on that system.
root@alp15:~# tar -xvf ltp-full-
root@alp15:~# cd ltp-full-20150420/
root@alp15:
aclocal.m4 configure execltp.in install-sh Makefile README runltplite.sh testcases utils
autom4te.cache configure.ac IDcheck.sh lib Makefile.release README.
config.guess COPYING include ltpmenu missing runalltests.sh scenario_groups TODO VERSION
config.sub doc INSTALL m4 pan runltp scripts tools
root@alp15:
root@alp15:
root@alp15:
root@alp15:
lock_torture 1 TINFO : estimate time 6.00 min
lock_torture 1 TINFO : spin_lock: running 60 sec...
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034386] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [lock_torture_
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034389] NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [lock_torture_
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034394] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [lock_torture_
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034396] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [lock_torture_
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034398] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 21s! [lock_torture_
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034410] NMI watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [lock_torture_
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.034412] NMI watchdog: BUG: soft lockup - CPU#9 stuck for 22s! [lock_torture_
Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...
alp15 vmunix: [ 308.038386] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [lock_torture_
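To get a quick overview of which CPUs are affected, the watchdog messages above can be tallied with a small script. The following is a minimal sketch in Python, assuming the message layout shown in this report; the function name `summarize_soft_lockups` and the regular expression are illustrative and not part of LTP or the kernel. Task names appear truncated in the console output (`lock_torture_`), so the script reports them exactly as logged.

```python
import re
from collections import Counter

# Matches watchdog lines in the layout quoted in this report, e.g.
# "... [ 308.034386] NMI watchdog: BUG: soft lockup - CPU#10 stuck for 21s! [lock_torture_"
# The pattern is an assumption based on those lines only.
SOFT_LOCKUP = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NMI watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]\s]*)"
)


def summarize_soft_lockups(lines):
    """Return (cpus, tasks).

    cpus maps each CPU number to the longest stuck time (seconds) reported
    for it; tasks counts how many reports named each (possibly truncated)
    task.
    """
    cpus = {}
    tasks = Counter()
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match is None:
            continue  # skip "Message from syslogd" and unrelated lines
        cpu = int(match.group("cpu"))
        secs = int(match.group("secs"))
        cpus[cpu] = max(secs, cpus.get(cpu, 0))
        tasks[match.group("task")] += 1
    return cpus, tasks


if __name__ == "__main__":
    sample = [
        "Message from syslogd@alp15 at Thu Jun 18 01:23:32 2015 ...",
        "alp15 vmunix: [ 308.034386] NMI watchdog: BUG: soft lockup - "
        "CPU#10 stuck for 21s! [lock_torture_",
        "alp15 vmunix: [ 308.034389] NMI watchdog: BUG: soft lockup - "
        "CPU#6 stuck for 22s! [lock_torture_",
    ]
    cpus, tasks = summarize_soft_lockups(sample)
    for cpu in sorted(cpus):
        print(f"CPU#{cpu}: stuck for up to {cpus[cpu]}s")
    for task, count in tasks.items():
        print(f"{task}: {count} report(s)")
```

The same function can be fed the lines of a saved syslog or `dmesg` capture; lines that do not match the watchdog pattern are ignored.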
Stack trace output:
root@alp15:~# dmesg | more
[ 1717.146881] lock_torture_wr R running task
[ 1717.146881]
[ 1717.146885] 0 2555 2 0x00000804
[ 1717.146887] Call Trace:
[ 1717.146894] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860 (unreliable)
[ 1717.146899] [c000000c7551b860] [c0000000000b4fb0] __do_softirq+
[ 1717.146904] [c000000c7551b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146909] [c000000c7551b980] [c00000000001fa54] timer_interrupt
[ 1717.146913] [c000000c7551b9b0] [c000000000002758] decrementer_
[ 1717.146922] --- interrupt: 901 at _raw_write_
[ 1717.146922] LR = torture_
[ 1717.146927] [c000000c7551bca0] [c000000c7551bcd0] 0xc000000c7551bcd0 (unreliable)
[ 1717.146934] [c000000c7551bcd0] [d00000000d4810b8] torture_
[ 1717.146939] [c000000c7551bcf0] [d00000000d480578] lock_torture_
[ 1717.146944] [c000000c7551bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.146948] [c000000c7551be30] [c00000000000956c] ret_from_
[ 1717.146951] Task dump for CPU 10:
[ 1717.146953] lock_torture_wr R running task 0 2537 2 0x00000804
[ 1717.146957] Call Trace:
[ 1717.146961] [c000000c7557b820] [c000000c7557b860] 0xc000000c7557b860 (unreliable)
[ 1717.146966] [c000000c7557b860] [c0000000000b4fb0] __do_softirq+
[ 1717.146970] [c000000c7557b960] [c0000000000b5478] irq_exit+0x98/0x100
[ 1717.146975] [c000000c7557b980] [c00000000001fa54] timer_interrupt
[ 1717.146979] [c000000c7557b9b0] [c000000000002758] decrementer_
[ 1717.146988] --- interrupt: 901 at _raw_write_
[ 1717.146988] LR = torture_
[ 1717.146993] [c000000c7557bca0] [c000000c7557bcd0] 0xc000000c7557bcd0 (unreliable)
[ 1717.147000] [c000000c7557bcd0] [d00000000d4810b8] torture_
[ 1717.147006] [c000000c7557bcf0] [d00000000d480578] lock_torture_
[ 1717.147013] [c000000c7557bd80] [c0000000000da4d4] kthread+0x114/0x140
[ 1717.147017] [c000000c7557be30] [c00000000000956c] ret_from_
[ 1717.147020] Task dump for CPU 17:
[ 1717.147021] Task dump for CPU 2:
[ 1717.147028] lock_torture_wr R
[ 1717.147028] lock_torture_wr R running task
[ 1717.147033] running task 0 2547 2 0x00000804
[ 1717.147042] 0 2533 2 0x00000804
[ 1717.147044] Call Trace:
[ 1717.147045] Call Trace:
[ 1717.147053] [c000000c732a3820] [c000000c7f688448] 0xc000000c7f688448
[ 1717.147056] [c000000c7555f820] [c000000c7fa48448] 0xc000000c7fa48448
[ 1717.147059] (unreliable)
[ 1717.147063] (unreliable)
[ 1717.147063]
[ 1717.147067]
[ 1717.147072] Task dump for CPU 18:
[ 1717.147073] Task dump for CPU 7:
[ 1717.147077] lock_torture_wr R running task
[ 1717.147082] lock_torture_wr R 0 2555 2 0x00000804
[ 1717.147088] running task
[ 1717.147088] Call Trace:
[ 1717.147096] [c000000c7551b820] [c000000c7551b860] 0xc000000c7551b860
[ 1717.147096] 0 2559 2 0x00000804
[ 1717.147102] Call Trace:
[ 1717.147105] (unreliable)
It is possible that we are missing this commit that fixes a deadlock during these tests:
I will check the Ubuntu source shortly to see if this is the case, and we can suggest building a kernel to see if it helps.
Running apt-get source linux-image- on the test system didn't pull down the sources, but the kernel in use is close to the one used for Vivid (3.19.0-25.26), so I pulled down the git source tree for it with git clone git://kernel.
As I basically understand it, the problem that was fixed is that while torture_
torture_rwlock
anything that calls the counterpart torture_
I'll go ahead and mirror this since I am pretty confident this is the issue (it should also affect Vivid).
We'll have to figure out how to get the sources for the LTS kernel to build a test kernel as well.
affects: | ubuntu → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
assignee: | nobody → Taco Screen team (taco-screen-team) |
Changed in linux (Ubuntu): | |
assignee: | Taco Screen team (taco-screen-team) → Canonical Kernel Team (canonical-kernel-team) |
importance: | Undecided → High |
status: | New → Triaged |
tags: | added: kernel-da-key |
Changed in linux (Ubuntu Vivid): | |
importance: | Undecided → High |
status: | New → In Progress |
assignee: | nobody → Joseph Salisbury (jsalisbury) |
tags: | added: severity-high targetmilestone-inin14044 removed: severity-critical targetmilestone-inin14043 |
tags: | added: kernel-key |
tags: | removed: kernel-key |
no longer affects: | linux-lts-xenial (Ubuntu Vivid) |
Changed in linux-lts-xenial (Ubuntu): | |
importance: | Undecided → High |
status: | New → In Progress |
status: | In Progress → Triaged |
Changed in linux (Ubuntu Vivid): | |
status: | In Progress → Incomplete |
tags: | added: targetmilestone-inin1610 removed: targetmilestone-inin14044 |
Changed in linux (Ubuntu): | |
status: | Triaged → Incomplete |
Changed in linux (Ubuntu Vivid): | |
assignee: | Joseph Salisbury (jsalisbury) → nobody |
Changed in linux-lts-xenial (Ubuntu): | |
status: | Triaged → Incomplete |