[thunderx] Synchronous External Abort: synchronous parity or ECC error
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Triaged | Undecided | Unassigned |
Bionic | Confirmed | Undecided | Unassigned |
Disco | Won't Fix | Undecided | Unassigned |
Eoan | Triaged | Undecided | Unassigned |
Focal | Triaged | Undecided | Unassigned |
Bug Description
[Impact]
Under load, ThunderX systems eventually fail with:
[ 282.360376] Synchronous External Abort: synchronous parity or ECC error (0x96000018) at 0x0000ffffa6eb7000
[ 282.372351] Internal error: : 96000018 [#1] SMP
[ 282.379152] Modules linked in: nls_iso8859_1 thunderx_edac thunderx_zip shpchp cavium_rng_vf cavium_rng gpio_keys uio_pdrv_genirq uio ipmi_ssif ipmi_devintf ipmi_msghandler sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_
[ 282.467284] Process cc1 (pid: 39700, stack limit = 0x00000000e0c44146)
[ 282.477172] CPU: 25 PID: 39700 Comm: cc1 Not tainted 4.15.0-75-generic #85+lp1857074.1
[ 282.488379] Hardware name: Cavium ThunderX CRB/To be filled by O.E.M., BIOS 5.11 12/12/2012
[ 282.500121] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 282.508297] pc : __arch_
[ 282.516430] lr : cp_new_
[ 282.523768] sp : ffff00002e4d3d40
[ 282.530369] x29: ffff00002e4d3d40 x28: ffff801f51fa2d00
[ 282.538988] x27: ffff000008b52000 x26: 0000000000000050
[ 282.548031] x25: 0000000000000124 x24: 0000000000000015
[ 282.556872] x23: 0000000000000000 x22: 000000002e4d3d88
[ 282.565449] x21: ffff801f51fa2d00 x20: ffff000009588000
[ 282.574109] x19: ffff00002e4d3e30 x18: 0000ffffa87e7a70
[ 282.582790] x17: 0000ffffa8756110 x16: ffff0000082f4448
[ 282.591433] x15: 0000000000000000 x14: 0000000000000012
[ 282.599986] x13: 00682e6c746e6366 x12: 2f78756e696c2f69
[ 282.608730] x11: 0000000000000000 x10: 0000000000000cf0
[ 282.617283] x9 : 0000000000001000 x8 : 00000001000081a4
[ 282.625839] x7 : 0000000001001a2b x6 : 000000002e4d3da0
[ 282.634238] x5 : 000000002e4d3e08 x4 : 0000000000000008
[ 282.642754] x3 : 0000000000000802 x2 : fffffffffffffff8
[ 282.651250] x1 : ffff00002e4d3d90 x0 : 000000002e4d3d88
[ 282.660013] Call trace:
[ 282.665421] __arch_
[ 282.672979] SyS_newfstat+
[ 282.679272] el0_svc_
[ 282.685605] Code: a8c12027 a88120c7 d503201f d503201f (a8c12829)
[ 282.694411] ---[ end trace 863693cf0c3fd297 ]---
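For readers unfamiliar with the syndrome value in the log, the following is a minimal sketch that splits 0x96000018 into its fields, assuming the standard Armv8-A ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and the data fault status code in the low six bits of the ISS). The function name `decode_esr` is made up for this illustration and is not part of any kernel tooling.

```python
# Sketch only: field positions follow the Armv8-A ESR_ELx register layout.
# decode_esr is a hypothetical helper name used for illustration.

def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR_ELx value into its top-level fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits 31:26
        "il": (esr >> 25) & 0x1,    # instruction length bit, bit 25
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome, bits 24:0
        "dfsc": esr & 0x3F,         # data fault status code, ISS bits 5:0
    }

fields = decode_esr(0x96000018)
print({name: hex(value) for name, value in fields.items()})
# → {'ec': '0x25', 'il': '0x1', 'iss': '0x18', 'dfsc': '0x18'}
```

EC 0x25 is a data abort taken without a change in exception level, which fits a fault raised while the kernel itself was running `__arch_copy_to_user`. DFSC 0x18 is the encoding for a synchronous parity or ECC error on a memory access (not on a translation table walk), matching the text the kernel printed.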
[Test Case]
We found this by doing a reboot/kernel build loop. (The reboot may be unnecessary.) Code to automate this setup is at:
https:/
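Independently of the automation linked above, the build half of the loop could be sketched roughly as follows. This is not the reporter's code: the helper names, the build command, and the use of `dmesg` to read the kernel log are all assumptions made for illustration. The pattern is written to match the "Internal error" line in both log formats quoted in this report (4.15 prints an empty reason, 5.0 prints the reason text).

```python
import re
import subprocess
from typing import Callable, List, Optional, Tuple

# Matches e.g. "Internal error: : 96000018 [#1] SMP" (4.15) and
# "Internal error: synchronous parity or ECC error: 96000018 [#1] SMP" (5.0).
INTERNAL_ERROR = re.compile(
    r"Internal error: (?P<reason>.*?): (?P<esr>[0-9a-fA-F]{8}) \[#\d+\]"
)

def find_abort(log_text: str) -> Optional[dict]:
    """Return reason and ESR of the first internal error in log_text, or None."""
    match = INTERNAL_ERROR.search(log_text)
    return match.groupdict() if match else None

def read_kernel_log() -> str:
    # Assumption: dmesg is installed and readable by the current user.
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout

def build_until_abort(
    build_cmd: List[str],
    max_iterations: int,
    read_log: Callable[[], str] = read_kernel_log,
    run: Callable[..., object] = subprocess.run,
) -> Optional[Tuple[int, dict]]:
    """Run build_cmd repeatedly; return (iteration, error info) once the
    kernel log shows an internal error, or None if it never appears."""
    for iteration in range(1, max_iterations + 1):
        run(build_cmd, check=False)  # a failed build is expected when cc1 is killed
        hit = find_abort(read_log())
        if hit:
            return iteration, hit
    return None
```

A call such as `build_until_abort(["make", "-j48"], 100)` from inside a kernel source tree would be one way to drive it; the command and iteration count are placeholders, and the reboot step from the original loop is left out.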
[Fix]
[Regression Risk]
Changed in linux (Ubuntu Bionic): status: New → Confirmed
Changed in linux (Ubuntu Disco): status: New → Triaged
Changed in linux (Ubuntu Eoan): status: New → Triaged
Changed in linux (Ubuntu Focal): status: New → Triaged
Changed in linux (Ubuntu Disco): status: Triaged → Won't Fix
Also reproducible w/ the 5.0.0-37.40 kernel. I'll try a mainline 5.5-rc6 build next.
[ 602.796765] Internal error: synchronous parity or ECC error: 96000018 [#1] SMP
[ 602.803994] Modules linked in: nls_iso8859_1 cavium_rng_vf ipmi_ssif ipmi_devintf input_leds joydev ipmi_msghandler thunderx_edac cavium_rng sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear aes_ce_blk aes_ce_cipher nicvf cavium_ptp ast i2c_algo_bit ttm drm_kms_helper crct10dif_ce ghash_ce syscopyarea sysfillrect sha2_ce sysimgblt uas hid_generic nicpf fb_sys_fops sha256_arm64 drm sha1_ce usbhid usb_storage hid thunder_bgx ahci thunder_xcv i2c_thunderx mdio_thunder thunderx_mmc mdio_cavium aes_neon_bs aes_neon_blk crypto_simd cryptd aes_arm64
[ 602.872414] Process cc1 (pid: 40126, stack limit = 0x0000000090887c2f)
[ 602.878949] CPU: 10 PID: 40126 Comm: cc1 Not tainted 5.0.0-37-generic #40~18.04.1-Ubuntu
[ 602.887040] Hardware name: GIGABYTE R120-T33/MT30-GS1, BIOS T49 02/02/2018
[ 602.893921] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 602.898724] pc : __arch_copy_to_user+0x13c/0x248
[ 602.903353] lr : cp_new_stat+0x140/0x178
[ 602.907277] sp : ffff00002599bcc0
[ 602.910594] x29: ffff00002599bcc0 x28: ffff800ed0538ec0
[ 602.915912] x27: 0000000000000000 x26: 0000000000000000
[ 602.921229] x25: 0000000056000000 x24: 0000000000000015
[ 602.926547] x23: ffff000010c716d8 x22: 000000002599bd08
[ 602.931865] x21: ffff800ed0538ec0 x20: ffff00001170c000
[ 602.937181] x19: ffff00002599bdb0 x18: 0000000000000000
[ 602.942498] x17: 0000000000000000 x16: 0000000000000000
[ 602.947818] x15: 0000000000000000 x14: 0000000000000000
[ 602.953134] x13: 0000000000000000 x12: 0000000000000000
[ 602.958452] x11: 0000000000000000 x10: 000000000000152f
[ 602.963769] x9 : 0000000000001000 x8 : 00000001000081a4
[ 602.969087] x7 : 0000000000a60da3 x6 : 000000002599bd20
[ 602.974405] x5 : 000000002599bd88 x4 : 0000000000000008
[ 602.979721] x3 : 0000000000000802 x2 : fffffffffffffff8
[ 602.985038] x1 : ffff00002599bd10 x0 : 000000002599bd08
[ 602.990356] Call trace:
[ 602.992821] __arch_copy_to_user+0x13c/0x248
[ 602.997107] __se_sys_newfstat+0x58/0x88
[ 603.001045] __arm64_sys_newfstat+0x20/0x30
[ 603.005243] el0_svc_common+0x88/0x180
[ 603.009005] el0_svc_handler+0x38/0x78
[ 603.012770] el0_svc+0x8/0xc
[ 603.015664] Code: a8c12027 a88120c7 d503201f d503201f (a8c12829)
[ 603.021765] ---[ end trace 08068f2978fb8211 ]---