--Problem Description--
Hard LOCKUP on stressing Ubuntu 18.04

---Issue observed---
Hard LOCKUP on stressing Ubuntu 18.04 using stress-ng, which sometimes leads to rcu_stalls.

Apr 17 00:00:23 lep8d kernel: [ 4309.786755] Watchdog CPU:3 Hard LOCKUP
Apr 17 00:00:23 lep8d kernel: [ 4309.786759] Modules linked in: algif_rng salsa20_generic userio camellia_generic cast6_generic cast_common snd_seq snd_seq_device snd_timer snd soundcore vhost_net serpent_generic tap twofish_generic twofish_common vhost_vsock vmw_vsock_virtio_transport_common vhost vsock lrw unix_diag algif_skcipher cuse sctp tgr192 wp512 rmd320 rmd256 rmd160 hci_vhci rmd128 bluetooth ecdh_generic dccp_ipv4 md4 uhid hid algif_hash dccp af_alg xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables devlink ip6table_filter ip6_tables iptable_filter kvm_hv kvm binfmt_misc uio_pdrv_genirq uio vmx_crypto ibmpowernv powernv_op_panel ipmi_powernv
Apr 17 00:00:23 lep8d kernel: [ 4309.786899] ipmi_devintf ipmi_msghandler powernv_rng leds_powernv crct10dif_vpmsum sch_fq_codel ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs xor zstd_compress raid6_pq uas usb_storage crc32c_vpmsum tg3 ipr
Apr 17 00:00:23 lep8d kernel: [ 4309.786944] CPU: 3 PID: 28361 Comm: stress-ng-hrtim Not tainted 4.15.0-15-generic #16-Ubuntu
Apr 17 00:00:23 lep8d kernel: [ 4309.786950] NIP: c000000000d0c8b8 LR: c000000000120dbc CTR: c000000000024480
Apr 17 00:00:23 lep8d kernel: [ 4309.786956] REGS: c000000007f7fd80 TRAP: 0900 Not tainted (4.15.0-15-generic)
Apr 17 00:00:23 lep8d kernel: [ 4309.786957] MSR: 9000000000009033 CR: 28000442 XER: 20000000
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] CFAR: c000000000120db8 SOFTE: 0
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR00: c000000000120dbc c000002d51377ba0 c0000000016eb400 c000002d512b0f88
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR04: 0000000000000000 0000000000000001 0000000001f40668 0000000000000001
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR08: 0000000000000001 0000000000000000 0000000080000003 0000000000000000
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR12: c000000000024480 c000000007a22100 0000000000000000 00000000000186a0
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR16: 00007fffed3687e0 00000abda98e5db8 0000000000008005 0000000000040100
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR20: 0000000000000000 00000000418004fc 00000000003c0000 0000000008430000
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR24: c000002d512b0f88 0000000000000000 c000002d51377d30 c000002d52d47c00
Apr 17 00:00:23 lep8d kernel: [ 4309.786972] GPR28: c000002d51377d50 c000002d51377d50 c000002d52f04500 c000002d512b0f88
Apr 17 00:00:23 lep8d kernel: [ 4309.787035] NIP [c000000000d0c8b8] _raw_spin_lock+0x38/0xe0
Apr 17 00:00:23 lep8d kernel: [ 4309.787045] LR [c000000000120dbc] dequeue_signal+0xcc/0x260
Apr 17 00:00:23 lep8d kernel: [ 4309.787046] Call Trace:
Apr 17 00:00:23 lep8d kernel: [ 4309.787052] [c000002d51377bd0] [c000000000120dac] dequeue_signal+0xbc/0x260
Apr 17 00:00:23 lep8d kernel: [ 4309.787059] [c000002d51377c20] [c00000000012459c] get_signal+0x13c/0x7a0
Apr 17 00:00:23 lep8d kernel: [ 4309.787066] [c000002d51377d10] [c00000000001dacc] do_signal+0x7c/0x2c0
Apr 17 00:00:23 lep8d kernel: [ 4309.787072] [c000002d51377e00] [c00000000001deb0] do_notify_resume+0xd0/0x100
Apr 17 00:00:23 lep8d kernel: [ 4309.787083] [c000002d51377e30] [c00000000000b7c4] ret_from_except_lite+0x70/0x74
Apr 17 00:00:23 lep8d kernel: [ 4309.787085] Instruction dump:
Apr 17 00:00:23 lep8d kernel: [ 4309.787090] 7c0802a6 60000000 fbe1fff8 f821ffd1 7c7f1b78 39400000 994d028c 814d0008
Apr 17 00:00:23 lep8d kernel: [ 4309.787102] 7d201829 2c090000 40c20010 7d40192d <40c2fff0> 7c2004ac 2fa90000 409e001c
Apr 17 00:00:23 lep8d kernel: [ 4313.015781] kauditd_printk_skb: 13 callbacks suppressed

---uname output---
# uname -a
Linux lep8d 4.15.0-15-generic #16-Ubuntu SMP Wed Apr 4 13:57:51 UTC 2018 ppc64le ppc64le ppc64le GNU/Linux

Machine Type = Power 8 BML/Tuleta

----Additional Info-----
Hard LOCKUP is also seen on garri BML. syslog is attached.
Reproducible : 90%

---Steps to Reproduce---
1. wget https://github.com/ColinIanKing/stress-ng/archive/master.zip
2. unzip master.zip; cd stress-ng-master;
3. make; make install;
4. Run the following command multiple times
stress-ng --all --vm-bytes 80% --aggressive --maximize --oomable --timeout 300 --verify --syslog --metrics --times

The issue is observed on Power 9 BML machines as well.

[Tue Apr 17 04:13:42 2018] Watchdog CPU:37 Hard LOCKUP
[Tue Apr 17 04:13:42 2018] Modules linked in: vsock lrw algif_skcipher tgr192 wp512 rmd320 rmd256 hci_vhci unix_diag bluetooth rmd160 sctp rmd128 ecdh_generic md4 dccp_ipv4 algif_hash cuse dccp af_alg vhost_net vhost tap xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink binfmt_misc kvm_hv kvm dm_crypt dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua idt_89hpesx joydev input_leds mac_hid ofpart cmdlinepart vmx_crypto ipmi_powernv ipmi_devintf at24 uio_pdrv_genirq ibmpowernv opal_prd crct10dif_vpmsum powernv_flash ipmi_msghandler mtd uio sch_fq_codel ib_iser rdma_cm iw_cm ib_cm
[Tue Apr 17 04:13:42 2018] ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi jc42 ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq ses enclosure scsi_transport_sas hid_generic ast i2c_algo_bit ttm drm_kms_helper usbhid hid syscopyarea sysfillrect sysimgblt fb_sys_fops crc32c_vpmsum drm i40e aacraid
[Tue Apr 17 04:13:42 2018] CPU: 37 PID: 11524 Comm: stress-ng-hrtim Not tainted 4.15.0-15-generic #16-Ubuntu
[Tue Apr 17 04:13:42 2018] NIP: c00000000012058c LR: c000000000120554 CTR: c00000000002bd30
[Tue Apr 17 04:13:42 2018] REGS: c000000007debd80 TRAP: 0900 Not tainted (4.15.0-15-generic)
[Tue Apr 17 04:13:42 2018] MSR: 9000000000009033 CR: 22000442 XER: 00000000
[Tue Apr 17 04:13:42 2018] CFAR: c00000000011f8c4 SOFTE: 0
GPR00: c000000000120554 c0000004cfc7bd10 c0000000016eb400 c0000004cfb54200
GPR04: c0000004cfc7be00 0000000042000442 0000000000007338 0000000000040000
GPR08: c0000004cfc78080 c0000004cfc78000 0000000000000002 c000000000d10f78
GPR12: c00000000002bd30 c000000007a39700 0000000000000000 00000000000186a0
GPR16: 00007fffd2cab1d0 00000edac67c5db8 00007fffd2cab1c8 00000edac67c5dc0
GPR20: 00007fffd2cab1cc 00007fffd2cab48b ffffffffffffffff 00007fffd2cab48a
GPR24: 0000000000010000 000072164cd80000 00007fffd2cab0c4 00007fffd2cab1d8
GPR28: 0000000000000000 c0000004cfc7be00 c0000004cfc7be00 c0000004cfb54200
[Tue Apr 17 04:13:42 2018] NIP [c00000000012058c] recalc_sigpending+0x5c/0x90
[Tue Apr 17 04:13:42 2018] LR [c000000000120554] recalc_sigpending+0x24/0x90
[Tue Apr 17 04:13:42 2018] Call Trace:
[Tue Apr 17 04:13:42 2018] [c0000004cfc7bd10] [c00000000001db60] do_signal+0x110/0x2c0 (unreliable)
[Tue Apr 17 04:13:42 2018] [c0000004cfc7bd30] [c000000000121658] __set_task_blocked+0x48/0x90
[Tue Apr 17 04:13:42 2018] [c0000004cfc7bd70] [c000000000124ed8] __set_current_blocked+0x58/0xb0
[Tue Apr 17 04:13:42 2018] [c0000004cfc7bda0] [c00000000002be18] sys_rt_sigreturn+0xe8/0x270
[Tue Apr 17 04:13:42 2018] [c0000004cfc7be30] [c00000000000b184] system_call+0x58/0x6c
[Tue Apr 17 04:13:42 2018] Instruction dump:
[Tue Apr 17 04:13:42 2018] e86d0260 3d220020 3929def8 81290000 2f890000 409e0030 78290464 39400002
[Tue Apr 17 04:13:42 2018] 39090080 7ce040a8 7ce75078 7ce041ad <40c2fff4> 38210020 e8010010 7c0803a6

Continuous lockups are observed.
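Step 4 of the steps to reproduce (run stress-ng multiple times and watch for the watchdog message) could be scripted roughly as below. This is only a sketch: the stress-ng options are the ones from this report, but `RUNS`, `RUN_REPRO`, `count_lockups` and `run_repro` are made-up names, and it assumes `dmesg` is readable by the user running it. If the machine locks up completely, the check after the run may of course never execute.

```shell
#!/bin/sh
# Sketch only: repeat the stress-ng run from this report and stop at the
# first watchdog hard lockup message found in the kernel log.
# RUNS, RUN_REPRO, count_lockups and run_repro are hypothetical names.

RUNS="${RUNS:-5}"

# Count "Watchdog CPU:<n> Hard LOCKUP" lines in the text read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
count_lockups() {
    grep -c 'Watchdog CPU:[0-9][0-9]* Hard LOCKUP' || true
}

# Run stress-ng up to $RUNS times, checking dmesg after each run.
run_repro() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        stress-ng --all --vm-bytes 80% --aggressive --maximize --oomable \
            --timeout 300 --verify --syslog --metrics --times
        hits=$(dmesg | count_lockups)
        if [ "$hits" -gt 0 ]; then
            echo "run $i: $hits hard lockup message(s) in dmesg"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no hard lockup message after $RUNS runs"
}

# Nothing is stressed unless explicitly requested.
if [ "${RUN_REPRO:-0}" = "1" ]; then
    run_repro
fi
```

Saved as, say, repro.sh, it would be started with `RUN_REPRO=1 RUNS=10 sh repro.sh`. `count_lockups` can also be fed an attached syslog directly, e.g. `count_lockups < syslog`.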
- Harish

== ==
Hardware: P9 Boston/ P8 Tuleta
DD revision: P9 DD2.2
Operating Env.: BML
PNOR: version-SUPERMICRO-P9DSU-V1.10-20180413-imp
Host OS: Ubuntu 18.04

====
(In reply to comment #5)
> hi Harish,
>
> Does this bug happen if you set powersave=off? I am wondering if this
> problem might be related to the stop state issue.

The following issue is seen with powersave=off. This is on Power 8 BML.

[ 517.480153] Modules linked in: salsa20_generic(+) camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common lrw algif_skcipher cuse hci_vhci bluetooth snd_seq snd_seq_device tgr192 ecdh_generic snd_timer snd wp512 soundcore rmd320 rmd256 rmd160 uhid hid vhost_net tap sctp unix_diag userio rmd128 vhost_vsock vmw_vsock_virtio_transport_common md4 dccp_ipv4 vhost vsock algif_hash dccp af_alg xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_hv kvm binfmt_misc ipmi_powernv leds_powernv ipmi_devintf vmx_crypto uio_pdrv_genirq ipmi_msghandler
[ 517.480305] powernv_rng crct10dif_vpmsum uio ibmpowernv powernv_op_panel sch_fq_codel ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs xor zstd_compress raid6_pq tg3 uas crc32c_vpmsum ipr usb_storage
[ 517.480343] CPU: 73 PID: 4388 Comm: stress-ng-dev Not tainted 4.15.0-15-generic #16-Ubuntu
[ 517.480347] NIP: c00000000000a724 LR: c000000000016e74 CTR: 0000000030061154
[ 517.480352] REGS: c000001fef1db890 TRAP: 0901 Not tainted (4.15.0-15-generic)
[ 517.480353] MSR: 9000000000009033 CR: 48002224 XER: 20000000
[ 517.480368] CFAR: c000000000d0c8fc SOFTE: 1
[ 517.480368] GPR00: c00000000018c5bc c000001fef1dbb10 c0000000016eb400 0000000000000500
[ 517.480368] GPR04: 0000000000000000 c0000000000a2608 9000000000001033 0000000000000004
[ 517.480368] GPR08: c000000007a52300 0000000000000000 0000000080000049 9000000000001003
[ 517.480368] GPR12: 0000000000000040 c000000007a52300
[ 517.480406] NIP [c00000000000a724] replay_interrupt_return+0x0/0x4
[ 517.480411] LR [c000000000016e74] arch_local_irq_restore+0x74/0x90
[ 517.480412] Call Trace:
[ 517.480420] [c000001fef1dbb10] [c0000000018bd940] log_first_seq+0x0/0x8 (unreliable)
[ 517.480427] [c000001fef1dbb30] [c00000000018c5bc] console_unlock+0x2fc/0x6c0
[ 517.480432] [c000001fef1dbc20] [c00000000018ccec] vprintk_emit+0x36c/0x420
[ 517.480437] [c000001fef1dbc90] [c00000000018ec54] vprintk_func+0x64/0xf0
[ 517.480442] [c000001fef1dbcb0] [c00000000018e354] printk+0x40/0x54
[ 517.480455] [c000001fef1dbcd0] [d00000001efc2f20] vsock_dev_do_ioctl.isra.4+0xb8/0xe0 [vsock]
[ 517.480462] [c000001fef1dbd40] [c0000000003efc34] do_vfs_ioctl+0xd4/0xa00
[ 517.480467] [c000001fef1dbde0] [c0000000003f0624] SyS_ioctl+0xc4/0x130
[ 517.480473] [c000001fef1dbe30] [c00000000000b184] system_call+0x58/0x6c
[ 517.480475] Instruction dump:
[ 517.480480] 7d8000a6 e9628008 7d200026 618c8000 2c030900 4182e7f8 2c030500 4182e310
[ 517.480491] 2c030a00 4182ffa4 2c030e60 4182f090 <4e800020> 7c781b78 48000359 48000371

This then leads to the Hard LOCKUP and rcu_stalls.

[ 629.383369] Watchdog CPU:73 Hard LOCKUP
[ 629.383372] Modules linked in: salsa20_generic(+) camellia_generic cast6_generic cast_common serpent_generic twofish_generic twofish_common lrw algif_skcipher cuse hci_vhci bluetooth snd_seq snd_seq_device tgr192 ecdh_generic snd_timer snd wp512 soundcore rmd320 rmd256 rmd160 uhid hid vhost_net tap sctp unix_diag userio rmd128 vhost_vsock vmw_vsock_virtio_transport_common md4 dccp_ipv4 vhost vsock algif_hash dccp af_alg xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack libcrc32c ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc devlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter kvm_hv kvm binfmt_misc ipmi_powernv leds_powernv ipmi_devintf vmx_crypto uio_pdrv_genirq ipmi_msghandler
[ 629.383451] powernv_rng crct10dif_vpmsum uio ibmpowernv powernv_op_panel sch_fq_codel ip_tables x_tables autofs4 ses enclosure scsi_transport_sas btrfs xor zstd_compress raid6_pq tg3 uas crc32c_vpmsum ipr usb_storage
[ 629.383475] CPU: 73 PID: 4388 Comm: stress-ng-dev Tainted: G L 4.15.0-15-generic #16-Ubuntu
[ 629.383478] NIP: c0000000000a259c LR: c00000000009df5c CTR: 0000000030036830
[ 629.383481] REGS: c000000007c37d80 TRAP: 0900 Tainted: G L (4.15.0-15-generic)
[ 629.383482] MSR: 9000000000009033 CR: 48002222 XER: 20000000
[ 629.383494] CFAR: c00000000009df48 SOFTE: 0
[ 629.383497] GPR00: 0000000030005128 c000001fef1dba30 c0000000016eb400 0000000000000000
[ 629.383504] GPR04: 0000000048002222 c0000000000a259c 9000000000009033 00000000000000f1
[ 629.383510] GPR08: 0000000000000000 00000000300b0218 c00000000009df70 9000000000001003
[ 629.383517] GPR12: c00000000009df48 c000000007a52300 0000714a9655a560 0000000000000000
[ 629.383524] GPR16: 0000000000000027 0000000000000027 c000000001572a00 0000000000000000
[ 629.383530] GPR20: 20c49ba5e353f7cf 0000000000000000 0000000000000017 000000000000000d
[ 629.383537] GPR24: fffffffffffffff5 0000000000000000 0000000000000010 c0000000018a8d18
[ 629.383544] GPR28: 0000000000000000 0000000000000010 c000001fef1dbad0 0000000000000010
[ 629.383551] NIP [c0000000000a259c] opal_put_chars+0x19c/0x280
[ 629.383553] LR [c00000000009df5c] opal_return+0x14/0x48
[ 629.383554] Call Trace:
[ 629.383556] [c000001fef1dba30] [c0000000000a259c] opal_put_chars+0x19c/0x280 (unreliable)
[ 629.383561] [c000001fef1dbab0] [c0000000008089b0] hvc_console_print+0xd0/0x210
[ 629.383564] [c000001fef1dbb30] [c00000000018c59c] console_unlock+0x2dc/0x6c0
[ 629.383568] [c000001fef1dbc20] [c00000000018ccec] vprintk_emit+0x36c/0x420
[ 629.383571] [c000001fef1dbc90] [c00000000018ec54] vprintk_func+0x64/0xf0
[ 629.383574] [c000001fef1dbcb0] [c00000000018e354] printk+0x40/0x54
[ 629.383577] [c000001fef1dbcd0] [d00000001efc2f20] vsock_dev_do_ioctl.isra.4+0xb8/0xe0 [vsock]
[ 629.383581] [c000001fef1dbd40] [c0000000003efc34] do_vfs_ioctl+0xd4/0xa00
[ 629.383584] [c000001fef1dbde0] [c0000000003f0624] SyS_ioctl+0xc4/0x130
[ 629.383587] [c000001fef1dbe30] [c00000000000b184] system_call+0x58/0x6c
[ 629.383589] Instruction dump:
[ 629.383591] 7f03c378 38210080 eb01ffc0 eb61ffd8 4e800020 7f63db78 7f24cb78 48c6a5e1
[ 629.383600] 60000000 38600000 3b00fff5 4bffc01d <60000000> e8010090 eb210048 eb810060
[ 638.478556] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 638.478581] 73-....: (149 ticks this GP) idle=0aa/140000000000000/0 softirq=14950/14950 fqs=15015
[ 638.478586] (detected by 83, t=37345 jiffies, g=2733, c=2732, q=9230463)
[ 638.478645] Sending NMI from CPU 83 to CPUs 73:

So, the actual issue might be with powersave. These issues are due to the huge amount of data dumped to the console.

- Harish

== Harish Sriram