cpuset_hotplug in ubuntu_ltp_controllers hang with X-aws VM instances
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
ubuntu-kernel-tests |
New
|
Undecided
|
Unassigned |
Bug Description
Issue found on X-aws 4.4.0-1158.173
The cpuset_hotplug test will timeout with the 45 minutes on specific VMs (passed on AWS bare-metals) because of task hang. Here is the list of failed instances:
* c4.large
- xen / xen-hvm, 2 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single kaiser fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
* c5n.large
- kvm, 2 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
* m5a.large
- kvm, 2 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt nrip_save
* r5.large
- kvm, 2 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
* t3.medium
- kvm, 2 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f rdseed adx smap clflushopt clwb avx512cd xsaveopt xsavec xgetbv1 ida arat pku
Passed on these VMs:
* c3.xlarge
- xen / xen-hvm, 4 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm kaiser fsgsbase smep erms xsaveopt
* t3a.2xlarge
- kvm, 8 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero arat npt nrip_save
* x1e.xlarge
- xen / xen-hvm, 4 cpus
- fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm xsaveopt
Test output:
<<<test_start>>>
tag=cpuset_hotplug stime=1688982088
cmdline=
contacts=""
analysis=exit
<<<test_output>>>
cpuset_hotplug 1 TINFO: CPUs are numbered continuously starting at 0 (0-1)
cpuset_hotplug 1 TINFO: Nodes are numbered continuously starting at 0 (0)
cpuset_hotplug 1 TPASS: Cpuset vs CPU hotplug test succeeded.
cpuset_hotplug 3 TPASS: Cpuset vs CPU hotplug test succeeded.
cpuset_hotplug 5 TPASS: Cpuset vs CPU hotplug test succeeded.
This issue can be reproduced with X-aws-1128, dmesg output:
[ 1083.816153] LTP: starting cpuset_hotplug (cpuset_
[ 1083.948273] Broke affinity for irq 4
[ 1083.948396] Broke affinity for irq 61
[ 1083.948402] Broke affinity for irq 62
[ 1083.948408] Broke affinity for irq 63
[ 1083.948413] Broke affinity for irq 64
[ 1083.949824] smpboot: CPU 1 is now offline
[ 1085.000502] installing Xen timer for CPU 1
[ 1085.020261] x86: Booting SMP configuration:
[ 1085.020263] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 1085.021673] Skipped synchronization checks as TSC is reliable.
[ 1085.021693] cpu 1 spinlock event irq 59
[ 1085.069370] smpboot: CPU 1 is now offline
[ 1085.081932] installing Xen timer for CPU 1
[ 1085.100218] x86: Booting SMP configuration:
[ 1085.100220] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 1085.101588] Skipped synchronization checks as TSC is reliable.
[ 1085.101608] cpu 1 spinlock event irq 59
[ 1086.405418] smpboot: CPU 1 is now offline
[ 1087.467625] installing Xen timer for CPU 1
[ 1087.484259] x86: Booting SMP configuration:
[ 1087.484261] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 1087.485700] Skipped synchronization checks as TSC is reliable.
[ 1087.485721] cpu 1 spinlock event irq 59
[ 1320.308124] INFO: task systemd:1 blocked for more than 120 seconds.
[ 1320.311249] Not tainted 4.4.0-1128-aws #142-Ubuntu
[ 1320.313959] "echo 0 > /proc/sys/
[ 1320.318032] systemd D ffff8800ea4f7d28 0 1 0 0x00000000
[ 1320.318037] ffff8800ea4f7d28 ffff8800372adb40 ffffffff81e13500 ffff8800ea4f8000
[ 1320.318042] ffff8800ea4f8000 ffffffff81e605e4 ffff8800ea4f8000 00000000ffffffff
[ 1320.318045] ffffffff81e605e8 ffff8800ea4f7d40 ffffffff81844555 ffffffff81e605e0
[ 1320.318048] Call Trace:
[ 1320.318057] [<ffffffff81844
[ 1320.318061] [<ffffffff81844
[ 1320.318065] [<ffffffff81846
[ 1320.318068] [<ffffffff81846
[ 1320.318074] [<ffffffff8111f
[ 1320.318078] [<ffffffff81289
[ 1320.318083] [<ffffffff8123e
[ 1320.318088] [<ffffffff81354
[ 1320.318093] [<ffffffff81218
[ 1320.318096] [<ffffffff81218
[ 1320.318100] [<ffffffff81219
[ 1320.318104] [<ffffffff81849
[ 1320.318118] INFO: task kworker/0:3:473 blocked for more than 120 seconds.
[ 1320.321669] Not tainted 4.4.0-1128-aws #142-Ubuntu
[ 1320.324364] "echo 0 > /proc/sys/
[ 1320.328026] kworker/0:3 D ffff8800364cfc68 0 473 2 0x00000000
[ 1320.328031] Workqueue: cgroup_destroy css_killed_work_fn
[ 1320.328033] ffff8800364cfc68 ffff8800d3ff4cc0 ffff880036d94600 ffff880036f9f000
[ 1320.328035] ffff8800364d0000 ffffffff81e42fa4 ffff880036f9f000 00000000ffffffff
[ 1320.328037] ffffffff81e42fa8 ffff8800364cfc80 ffffffff81844555 ffffffff81e42fa0
[ 1320.328039] Call Trace:
[ 1320.328043] [<ffffffff81844
[ 1320.328045] [<ffffffff81844
[ 1320.328047] [<ffffffff81846
[ 1320.328052] [<ffffffff81846
[ 1320.328072] [<ffffffff81084
[ 1320.328078] [<ffffffff81033
[ 1320.328081] [<ffffffff81844
[ 1320.328085] [<ffffffff81193
[ 1320.328088] [<ffffffff81193
[ 1320.328092] [<ffffffff81193
[ 1320.328095] [<ffffffff81193
[ 1320.328098] [<ffffffff81122
[ 1320.328102] [<ffffffff81119
[ 1320.328106] [<ffffffff8109e
[ 1320.328109] [<ffffffff8109f
[ 1320.328116] [<ffffffff8109e
[ 1320.328119] [<ffffffff810a5
[ 1320.328122] [<ffffffff81844
[ 1320.328128] [<ffffffff810a5
[ 1320.328131] [<ffffffff81849
[ 1320.328134] [<ffffffff810a5
[ 1320.328142] INFO: task irqbalance:1189 blocked for more than 120 seconds.
[ 1320.331605] Not tainted 4.4.0-1128-aws #142-Ubuntu
[ 1320.334291] "echo 0 > /proc/sys/
[ 1320.338224] irqbalance D ffff8800e8b07ce8 0 1189 1 0x00000000
[ 1320.338228] ffff8800e8b07ce8 ffff8800e8b07dd0 ffffffff81e13500 ffff8800369dc600
[ 1320.338232] ffff8800e8b08000 ffff8800eb64f7cc ffff8800369dc600 00000000ffffffff
[ 1320.338235] ffff8800eb64f7d0 ffff8800e8b07d00 ffffffff81844555 ffff8800eb64f7c8
[ 1320.338238] Call Trace:
[ 1320.338242] [<ffffffff81844
[ 1320.338249] [<ffffffff81844
[ 1320.338256] [<ffffffff81846
[ 1320.338259] [<ffffffff81846
[ 1320.338264] [<ffffffff81554
[ 1320.338269] [<ffffffff81554
[ 1320.338272] [<ffffffff81846
[ 1320.338279] [<ffffffff8129a
[ 1320.338283] [<ffffffff81299
[ 1320.338285] [<ffffffff8123e
[ 1320.338289] [<ffffffff81299
[ 1320.338292] [<ffffffff81354
[ 1320.338296] [<ffffffff81218
[ 1320.338299] [<ffffffff81218
[ 1320.338303] [<ffffffff81219
[ 1320.338306] [<ffffffff81849
[ 1320.338317] INFO: task kworker/0:2:1448 blocked for more than 120 seconds.
[ 1320.341877] Not tainted 4.4.0-1128-aws #142-Ubuntu
[ 1320.344538] "echo 0 > /proc/sys/
[ 1320.348296] kworker/0:2 D ffff8800bf097d38 0 1448 2 0x00000000
[ 1320.348305] Workqueue: events vmstat_shepherd
[ 1320.348307] ffff8800bf097d38 ffffffff8141c925 ffffffff81e13500 ffff88008aa32a00
[ 1320.348310] ffff8800bf098000 ffffffff81e42fa4 ffff88008aa32a00 00000000ffffffff
[ 1320.348313] ffffffff81e42fa8 ffff8800bf097d50 ffffffff81844555 ffffffff81e42fa0
[ 1320.348316] Call Trace:
[ 1320.348321] [<ffffffff8141c
[ 1320.348329] [<ffffffff81844
[ 1320.348332] [<ffffffff81844
[ 1320.348336] [<ffffffff81846
[ 1320.348342] [<ffffffff81846
[ 1320.348345] [<ffffffff81084
[ 1320.348350] [<ffffffff811b6
[ 1320.348354] [<ffffffff8109e
[ 1320.348362] [<ffffffff8109f
[ 1320.348366] [<ffffffff8109e
[ 1320.348369] [<ffffffff810a5
[ 1320.348372] [<ffffffff81844
[ 1320.348375] [<ffffffff810a5
[ 1320.348378] [<ffffffff81849
[ 1320.348381] [<ffffffff810a5
[ 1320.348386] INFO: task echo:1878 blocked for more than 120 seconds.
[ 1320.351463] Not tainted 4.4.0-1128-aws #142-Ubuntu
[ 1320.354182] "echo 0 > /proc/sys/
[ 1320.358193] echo D ffff88008aaf3c28 0 1878 1761 0x00000000
[ 1320.358197] ffff88008aaf3c28 000000000000001b ffffffff81e13500 ffff880036a83800
[ 1320.358201] ffff88008aaf4000 ffffffff81e6aec4 ffff880036a83800 00000000ffffffff
[ 1320.358204] ffffffff81e6aec8 ffff88008aaf3c40 ffffffff81844555 ffffffff81e6aec0
[ 1320.358207] Call Trace:
[ 1320.358212] [<ffffffff81844
[ 1320.358218] [<ffffffff81844
[ 1320.358226] [<ffffffff81846
[ 1320.358230] [<ffffffff810dd
[ 1320.358233] [<ffffffff81846
[ 1320.358239] [<ffffffff81193
[ 1320.358243] [<ffffffff810ab
[ 1320.358247] [<ffffffff810a6
[ 1320.358251] [<ffffffff810a6
[ 1320.358258] [<ffffffff81084
[ 1320.358262] [<ffffffff81085
[ 1320.358265] [<ffffffff81085
[ 1320.358268] [<ffffffff8155c
[ 1320.358271] [<ffffffff81556
[ 1320.358274] [<ffffffff81556
[ 1320.358277] [<ffffffff81553
[ 1320.358281] [<ffffffff8129a
[ 1320.358284] [<ffffffff8129a
[ 1320.358288] [<ffffffff81218
[ 1320.358291] [<ffffffff81218
[ 1320.358295] [<ffffffff81219
[ 1320.358297] [<ffffffff81849
This failure will cause the following cpuset_memory to fail. And leads to test interruption.
tags: | added: ubuntu-ltp-controllers |
description: | updated |
description: | updated |
description: | updated |
I have cross checked -1128, -1158 with our LTP latest commit 3ebc2dfa85 and the older version ac1a3e40c5. This hang can be 100% reproduced.
And this is our first time running ubuntu_ ltp_controllers test on X-aws.