Comment 0 for bug 1862312

Po-Hsu Lin (cypressyew) wrote : P9 node baltar hang with ubuntu_kernel_selftests (kernel oops)

It looks like a test in ubuntu_kernel_selftests triggered this issue. The Jenkins job "sru-misc__B_ppc64el-generic__using_baltar__for_kernel" hung at the same spot (the beginning of the KVM unit tests) in two out of two attempts:

05:06:37 INFO | GOOD ubuntu_kvm_unit_tests.setup ubuntu_kvm_unit_tests.setup timestamp=1580792797 localtime=Feb 04 05:06:37 completed successfully
05:06:37 INFO | END GOOD ubuntu_kvm_unit_tests.setup ubuntu_kvm_unit_tests.setup timestamp=1580792797 localtime=Feb 04 05:06:37
05:06:37 DEBUG| Persistent state client._record_indent now set to 1
05:06:37 DEBUG| Persistent state client.unexpected_reboot deleted
05:06:37 INFO | START ubuntu_kvm_unit_tests.emulator ubuntu_kvm_unit_tests.emulator timestamp=1580792797 localtime=Feb 04 05:06:37
05:06:37 DEBUG| Persistent state client._record_indent now set to 2
05:06:37 DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_kvm_unit_tests.emulator', 'ubuntu_kvm_unit_tests.emulator')
05:06:37 DEBUG| Running 'kvm-ok'
05:06:37 DEBUG| [stdout] INFO: /dev/kvm exists
05:06:37 DEBUG| [stdout] KVM acceleration can be used
05:06:37 DEBUG| Running 'ppc64_cpu --smt=off'
Build was aborted
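
To spot where a job like this stalled without reading the whole console log, one option is to look for tests that logged START but never logged END. The following is only a rough sketch: the function name `unfinished_tests` is made up for illustration, and the regular expression assumes the status-line layout seen in the excerpt above (`INFO | START <name>` and `INFO | END <status> <name>`).

```python
import re

# Matches autotest status lines such as
#   "05:06:37 INFO | START ubuntu_kvm_unit_tests.emulator ..."
#   "05:06:37 INFO | END GOOD ubuntu_kvm_unit_tests.setup ..."
# Assumption: the layout is the one shown in the log excerpt above.
STATUS_RE = re.compile(r"INFO\s*\|\s*(START|END \w+)\s+(\S+)")


def unfinished_tests(log_text):
    """Return names of tests that logged START but never logged END."""
    open_tests = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if not match:
            continue
        kind, name = match.groups()
        if kind == "START":
            open_tests.append(name)
        elif name in open_tests:
            # An END line closes the matching START.
            open_tests.remove(name)
    return open_tests
```

Run against the excerpt above, this should report only ubuntu_kvm_unit_tests.emulator, which matches the point where the build was aborted.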

The syslog shows a kernel BUG with a call trace after the memory pages were offlined and before test_bpf started:
[ 1195.321441] Offlined Pages 4096
[ 1195.335056] Offlined Pages 4096
[ 1195.354614] Offlined Pages 4096
[ 1198.491967] Offlined Pages 4096
[ 1199.457587] Injecting error (-12) to MEM_GOING_ONLINE
[ 1200.473838] ------------[ cut here ]------------
[ 1200.473841] kernel BUG at /build/linux-CWyQTi/linux-4.15.0/kernel/rcu/sync.c:128!
[ 1200.473909] Oops: Exception in kernel mode, sig: 5 [#1]
[ 1200.473953] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 1200.473999] Modules linked in: memory_notifier_error_inject notifier_error_inject overlay veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter binfmt_misc joydev input_leds mac_hid idt_89hpesx opal_prd ofpart at24 cmdlinepart powernv_flash ipmi_powernv uio_pdrv_genirq uio mtd ipmi_devintf ibmpowernv ipmi_msghandler sch_fq_codel vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure scsi_transport_sas ast i2c_algo_bit hid_generic ttm drm_kms_helper
[ 1200.474641] syscopyarea usbhid sysfillrect sysimgblt hid fb_sys_fops crct10dif_vpmsum crc32c_vpmsum drm i40e aacraid [last unloaded: test_bpf]
[ 1200.474792] CPU: 12 PID: 139071 Comm: mem-on-off-test Not tainted 4.15.0-87-generic #87-Ubuntu
[ 1200.474894] NIP: c0000000001a8490 LR: c0000000001a8478 CTR: c00000000026c5e0
[ 1200.474981] REGS: c000000c830ff7c0 TRAP: 0700 Not tainted (4.15.0-87-generic)
[ 1200.475084] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 28222888 XER: 20040000
[ 1200.475219] CFAR: c00000000001940c SOFTE: 1
[ 1200.475219] GPR00: c0000000001a8434 c000000c830ffa40 c00000000172c900 0000000000000001
[ 1200.475219] GPR04: 00000000000001f0 c000000c7a4d2480 0000000028228882 c00000000001e730
[ 1200.475219] GPR08: 0000000ff9a10000 0000000000000001 0000000000000000 c000000c61bab790
[ 1200.475219] GPR12: 0000000000002000 c00000000fa88400 0000058d97936070 0000000000000000
[ 1200.475219] GPR16: 0000058d6b6e9690 0000058d6b776ab0 0000058d6b7a8204 0000058d6b776ae8
[ 1200.475219] GPR20: 0000058d6b7ad5d8 0000000000000001 0000000000000000 00007fffd1cb80e4
[ 1200.475219] GPR24: 00007fffd1cb80e0 c000000001763428 c0000000015f6ba8 0000000000000000
[ 1200.475219] GPR28: 0000000000000020 c0000000015f6bb0 ffffffffffffffff c0000000015f6ba8
[ 1200.476036] NIP [c0000000001a8490] rcu_sync_enter+0xa0/0x1e0
[ 1200.476124] LR [c0000000001a8478] rcu_sync_enter+0x88/0x1e0
[ 1200.476180] Call Trace:
[ 1200.476215] [c000000c830ffa40] [c000000c830ffaa0] 0xc000000c830ffaa0 (unreliable)
[ 1200.476311] [c000000c830ffab0] [c0000000001889a8] percpu_down_write+0x38/0x140
[ 1200.476407] [c000000c830ffb00] [c00000000039fa6c] online_pages+0x1fc/0x440
[ 1200.476456] [c000000c830ffbd0] [c0000000008a7320] memory_subsys_online+0x180/0x250
[ 1200.476495] [c000000c830ffc60] [c000000000879f54] device_online+0x84/0x120
[ 1200.476528] [c000000c830ffca0] [c0000000008a7ee8] store_mem_state+0xb8/0x180
[ 1200.476566] [c000000c830ffce0] [c0000000008744bc] dev_attr_store+0x3c/0x60
[ 1200.476599] [c000000c830ffd00] [c0000000004ae254] sysfs_kf_write+0x64/0x90
[ 1200.476631] [c000000c830ffd20] [c0000000004acf2c] kernfs_fop_write+0x1ac/0x240
[ 1200.476670] [c000000c830ffd70] [c0000000003e147c] __vfs_write+0x3c/0x70
[ 1200.476703] [c000000c830ffd90] [c0000000003e16d8] vfs_write+0xd8/0x220
[ 1200.476735] [c000000c830ffde0] [c0000000003e1a38] SyS_write+0x78/0x140
[ 1200.476768] [c000000c830ffe30] [c00000000000b288] system_call+0x5c/0x70
[ 1200.476799] Instruction dump:
[ 1200.476819] 409e00b0 7c2004ac 39200000 38600001 913f0008 4be70f85 60000000 2fbe0000
[ 1200.476858] 39200000 419e000c 7f9c0034 5789d97e <0b090000> 4092008c 813f0038 3d42fffb
[ 1200.476909] ---[ end trace 5ef11694541f2535 ]---
[ 1200.527850]
[ 1224.784549] test_bpf: #0 TAX jited:1 36 35 33 PASS
[ 1224.785669] test_bpf: #1 TXA jited:1 11 11 11 PASS
[ 1224.786073] test_bpf: #2 ADD_SUB_MUL_K jited:1 10 PASS
[ 1224.786236] test_bpf: #3 DIV_MOD_KX jited:1 15 PASS
[ 1224.786444] test_bpf: #4 AND_OR_LSH_K jited:1 10 10 PASS
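
For comparing this oops across runs, the BUG location and the call chain can be pulled out of the syslog text. This is a minimal sketch, not an established tool: `summarize_oops` is a hypothetical helper name, and the patterns assume the ppc64el trace layout shown above (two bracketed 16-digit hex values followed by `symbol+0xoff/0xsize`).

```python
import re

# "kernel BUG at <file>:<line>!"
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")

# Stack frame lines in the trace above look like:
#   "[c000000c830ffab0] [c0000000001889a8] percpu_down_write+0x38/0x140"
# Frames without a symbol name (e.g. the "(unreliable)" one) do not match.
FRAME_RE = re.compile(
    r"\[[0-9a-f]{16}\]\s+\[[0-9a-f]{16}\]\s+([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_oops(dmesg_text):
    """Return ((source_file, line_number) or None, [symbols in trace order])."""
    bug = None
    frames = []
    for line in dmesg_text.splitlines():
        bug_match = BUG_RE.search(line)
        if bug_match and bug is None:
            bug = (bug_match.group(1), int(bug_match.group(2)))
        frame_match = FRAME_RE.search(line)
        if frame_match:
            frames.append(frame_match.group(1))
    return bug, frames
```

On the trace above, the first symbols returned would be percpu_down_write and online_pages, that is, the memory online path entered through the sysfs write from mem-on-off-test.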

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-87-generic 4.15.0-87.87
ProcVersionSignature: Ubuntu 4.15.0-87.87-generic 4.15.18
Uname: Linux 4.15.0-87-generic ppc64le
.sys.firmware.opal.msglog: Error: [Errno 13] Permission denied: '/sys/firmware/opal/msglog'
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Feb 6 06:35 seq
 crw-rw---- 1 root audio 116, 33 Feb 6 06:35 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.10
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CurrentDmesg:

Date: Fri Feb 7 07:57:32 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 0451:80ff Texas Instruments, Inc.
 Bus 001 Device 004: ID 0557:2419 ATEN International Co., Ltd
 Bus 001 Device 002: ID 0557:7000 ATEN International Co., Ltd Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
PciMultimedia:

ProcFB: 0 astdrmfb
ProcKernelCmdLine: root=UUID=acd1a0d7-f6fc-4130-928c-c8b11ad6e4be ro console=hvc0
ProcLoadAvg: 2.02 1.31 1.11 1/1377 37783
ProcSwaps:
 Filename Type Size Used Priority
 /swap.img file 8388544 0 -2
ProcVersion: Linux version 4.15.0-87-generic (buildd@bos02-ppc64el-002) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)) #87-Ubuntu SMP Fri Jan 31 19:32:29 UTC 2020
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-87-generic N/A
 linux-backports-modules-4.15.0-87-generic N/A
 linux-firmware 1.173.15
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
VarLogDump_list: total 0
cpu_cores: Number of cores present = 40
cpu_coreson: Number of cores online = 39
cpu_smt: SMT=4