memcg_regression_test in ubuntu_ltp_controllers causes system hang on J-ARM64
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ubuntu-kernel-tests | New | Undecided | Unassigned |
Bug Description
Issue found on J-5.15.0-47.51 with the following ARM64 instances:
* howzit-kernel.arm64
* kuzzle.arm64
* helo-kernel.arm64 (with lowlatency 64k kernel)
The only exception for the moment is:
* appleton-kernel (with lowlatency kernel)
This issue came up after the LTP test suite update (bug 1982995). It should not be considered a regression since memcg_regressio
In this case, the system complains with the following at the end of test case 1:
[ 5481.129771] UBSAN: array-index-
[ 5481.139769] index 256 is out of range for type 'long unsigned int [256]'
[ 5481.146467] CPU: 13 PID: 104657 Comm: memcg_regressio Not tainted 5.15.0-46-generic #49-Ubuntu
[ 5481.146472] Hardware name: Lenovo HR330A 7X33CTO1WW /FALCON , BIOS hve104r-1.15 02/26/2021
[ 5481.146474] Call trace:
[ 5481.146476] dump_backtrace+
[ 5481.146481] show_stack+
[ 5481.146483] dump_stack_
[ 5481.146486] dump_stack+
[ 5481.146489] ubsan_epilogue+
[ 5481.146491] __ubsan_
[ 5481.146495] dl_task_
[ 5481.146499] task_can_
[ 5481.146502] cpuset_
[ 5481.146506] cgroup_
[ 5481.146509] cgroup_
[ 5481.146512] cgroup_
[ 5481.146514] __cgroup_
[ 5481.146517] cgroup_
[ 5481.146520] cgroup_
[ 5481.146523] kernfs_
[ 5481.146527] new_sync_
[ 5481.146531] vfs_write+
[ 5481.146533] ksys_write+
[ 5481.146536] __arm64_
[ 5481.146538] invoke_
[ 5481.146541] el0_svc_
[ 5481.146544] do_el0_
[ 5481.146547] el0_svc+0x48/0x1b0
[ 5481.146550] el0t_64_
[ 5481.146552] el0t_64_
[ 5481.146555] =======
[ 5481.154990] Unable to handle kernel paging request at virtual address ffff80000a17abb0
[ 5481.162903] Mem abort info:
[ 5481.165693] ESR = 0x96000007
[ 5481.168742] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5481.174052] SET = 0, FnV = 0
[ 5481.177101] EA = 0, S1PTW = 0
[ 5481.180237] FSC = 0x07: level 3 translation fault
[ 5481.185109] Data abort info:
[ 5481.187984] ISV = 0, ISS = 0x00000007
[ 5481.191814] CM = 0, WnR = 0
[ 5481.194770] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000bf1a
[ 5481.201465] [ffff80000a17abb0] pgd=100000bffff
[ 5481.213982] Internal error: Oops: 96000007 [#1] SMP
[ 5481.218848] Modules linked in: nls_iso8859_1 acpi_ipmi joydev input_leds ipmi_ssif efi_pstore xgene_hwmon cppc_cpufreq sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear mlx5_ib ib_uverbs ib_core uas hid_generic usbhid hid usb_storage dwc3 ast ulpi drm_vram_helper udc_core drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt mlx5_core crct10dif_ce fb_sys_fops cec ghash_ce rc_core sha2_ce sha256_arm64 mlxfw sha1_ce nvme psample igb drm nvme_core tls i2c_algo_bit i2c_xgene_slimpro ahci_platform gpio_dwapb xhci_plat_hcd aes_neon_bs aes_neon_blk aes_ce_blk crypto_simd cryptd aes_ce_cipher
[ 5481.296632] CPU: 13 PID: 104657 Comm: memcg_regressio Not tainted 5.15.0-46-generic #49-Ubuntu
[ 5481.305230] Hardware name: Lenovo HR330A 7X33CTO1WW /FALCON , BIOS hve104r-1.15 02/26/2021
[ 5481.315042] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5481.321990] pc : dl_task_
[ 5481.326423] lr : dl_task_
[ 5481.330941] sp : ffff80004210b8d0
[ 5481.334242] x29: ffff80004210b8d0 x28: ffff000817e3ee40 x27: 0000000000000000
[ 5481.341366] x26: ffff80004210bae0 x25: 0000000000000000 x24: ffff000807041800
[ 5481.348489] x23: ffff80000a17a140 x22: 0000000000000100 x21: ffff80000a17a140
[ 5481.355613] x20: ffff80000a912818 x19: ffff80000a90dab0 x18: 0000000000000000
[ 5481.362736] x17: 3d3d3d3d3d3d3d3d x16: 3d3d3d3d3d3d3d3d x15: 3d3d3d3d3d3d3d3d
[ 5481.369860] x14: 3d3d3d3d3d3d3d3d x13: 3d3d3d3d3d3d3d3d x12: 3d3d3d3d3d3d3d3d
[ 5481.376983] x11: 3d3d3d3d3d3d3d3d x10: 3d3d3d3d3d3d3d3d x9 : ffff800008370c18
[ 5481.384106] x8 : 3d3d3d3d3d3d3d3d x7 : 0000000000000001 x6 : 0000000000000001
[ 5481.391229] x5 : 0000000000000000 x4 : ffff00bf5d705a88 x3 : 0000000000000000
[ 5481.398352] x2 : ffff000817e3ee40 x1 : ffff80000a90d000 x0 : ffff80000a17a140
[ 5481.405476] Call trace:
[ 5481.407909] dl_task_
[ 5481.411993] task_can_
[ 5481.415729] cpuset_
[ 5481.419727] cgroup_
[ 5481.424158] cgroup_
[ 5481.427808] cgroup_
[ 5481.431978] __cgroup_
[ 5481.436322] cgroup_
[ 5481.440320] cgroup_
[ 5481.444316] kernfs_
[ 5481.448748] new_sync_
[ 5481.452485] vfs_write+
[ 5481.455874] ksys_write+
[ 5481.459263] __arm64_
[ 5481.463173] invoke_
[ 5481.466910] el0_svc_
[ 5481.471689] do_el0_
[ 5481.474992] el0_svc+0x48/0x1b0
[ 5481.478122] el0t_64_
[ 5481.482379] el0t_64_
[ 5481.486030] Code: b0013734 91206294 f8767a80 8b170000 (f945381c)
[ 5481.492111] ---[ end trace 17955f4bab6956d4 ]---
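For triage, the numbers in the oops above can be cross-checked by hand. The following is a minimal sketch, not part of the original report: it decodes the ESR value using the Armv8-A ESR_EL1 data abort field layout and decodes the faulting instruction word from the "Code:" line as a 64-bit LDR with unsigned immediate offset. The function names `decode_esr` and `decode_ldr_x_uimm` are hypothetical helpers introduced only for illustration.

```python
def decode_esr(esr: int) -> dict:
    """Split an ESR_EL1 value for a data abort into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF                  # ISS is bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,          # exception class, bits [31:26]
        "IL_bits": 32 if (esr >> 25) & 1 else 16,
        "ISV": (iss >> 24) & 1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 1,
        "EA": (iss >> 9) & 1,
        "CM": (iss >> 8) & 1,
        "S1PTW": (iss >> 7) & 1,
        "WnR": (iss >> 6) & 1,             # 0 = read, 1 = write
        "FSC": iss & 0x3F,                 # fault status code, bits [5:0]
    }


def decode_ldr_x_uimm(insn: int) -> dict:
    """Decode a 64-bit 'LDR Xt, [Xn, #imm]' (unsigned offset) instruction word."""
    if insn & 0xFFC00000 != 0xF9400000:
        raise ValueError("not a 64-bit LDR (immediate, unsigned offset)")
    imm12 = (insn >> 10) & 0xFFF
    return {
        "rt": insn & 0x1F,                 # destination register number
        "rn": (insn >> 5) & 0x1F,          # base register number
        "offset": imm12 * 8,               # imm12 is scaled by the 8-byte access size
    }


# Values copied from the log above.
esr = decode_esr(0x96000007)
ldr = decode_ldr_x_uimm(0xF945381C)        # the word in parentheses on the "Code:" line
x0 = 0xFFFF80000A17A140                    # x0 from the register dump
fault_addr = x0 + ldr["offset"]

print(f"EC=0x{esr['EC']:02x} FSC=0x{esr['FSC']:02x} WnR={esr['WnR']}")
print(f"ldr x{ldr['rt']}, [x{ldr['rn']}, #0x{ldr['offset']:x}] -> 0x{fault_addr:016x}")
```

The decoded fields match the "Mem abort info" and "Data abort info" lines (EC = 0x25, FSC = 0x07, WnR = 0, a read fault), and x0 + 0xa70 reproduces the reported fault address ffff80000a17abb0. Note also that x22 holds 0x100 (256), the same value as the out-of-range index in the UBSAN message, which suggests the faulting load and the UBSAN report stem from the same out-of-range index.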
Test output:
COMMAND: /opt/ltp/
LOG File: /opt/ltp/
FAILED COMMAND File: /opt/ltp/
TCONF COMMAND File: /opt/ltp/
Running tests.......
<<<test_start>>>
tag=memcg_
cmdline=
contacts=""
analysis=exit
<<<test_output>>>
incrementing stop
memcg_regressio
memcg_regressio
memcg_regressio
memcg_regressio
Test timed out, sending SIGTERM!
If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
Test is still running... 10
Test is still running... 9
Test is still running... 8
Test is still running... 7
Test is still running... 6
Test is still running... 5
Test is still running... 4
Test is still running... 3
Test is still running... 2
Test is still running... 1
Test is still running, sending SIGKILL
I tried bumping LTP_TIMEOUT_MUL to 10, but the test still does not complete, and the system stops responding at this point.
Please see the attachment for the complete syslog output.