OOM triggered by msgstress04 in ubuntu_ltp_syscalls causes loss of network connectivity on OpenStack P8 with B-hwe-5.4
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
ubuntu-kernel-tests | New | Undecided | Unassigned | | |
Bug Description
This is not a regression; it has been reproducible since cycle 2023.07.10 with B-hwe-5.
It only affects P8 instances on OpenStack.
As with bug 2039515, the instance gets disconnected while running the msgstress04 test. Test output:
04:36:36 INFO | START ubuntu_
04:36:36 DEBUG| Persistent state client.
04:36:36 DEBUG| Persistent state client.
04:36:36 DEBUG| Waiting for pid 14884 for 3600 seconds
Connection to 10.43.123.7 closed by remote host.
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: unexplained error (code 255) at io.c(231) [Receiver=3.2.7]
TEST SYSTEM FAILURE DETECTED
A failure of the system under test has been detected.
Please review log files for a potential panic, hang or unexpected reboot
-------
R E S U L T S
-------
Running the test manually shows that it triggers an OOM and kills sshd:
Oct 17 05:25:44 10 kernel: [ 297.512158] LTP: starting msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479487] msgstress04 invoked oom-killer: gfp_mask=
Oct 17 05:25:52 10 kernel: [ 305.479493] CPU: 1 PID: 8725 Comm: msgstress04 Not tainted 5.4.0-165-generic #182~18.04.1-Ubuntu
Oct 17 05:25:52 10 kernel: [ 305.479496] Call Trace:
Oct 17 05:25:52 10 kernel: [ 305.479502] [c00000006aa2b370] [c000000000f2da68] dump_stack+
Oct 17 05:25:52 10 kernel: [ 305.479506] [c00000006aa2b3b0] [c00000000038e53c] dump_header+
Oct 17 05:25:52 10 kernel: [ 305.479508] [c00000006aa2b440] [c00000000038ed9c] oom_kill_
Oct 17 05:25:52 10 kernel: [ 305.479510] [c00000006aa2b480] [c000000000390088] out_of_
Oct 17 05:25:52 10 kernel: [ 305.479512] [c00000006aa2b520] [c00000000040cb34] __alloc_
Oct 17 05:25:52 10 kernel: [ 305.479514] [c00000006aa2b6e0] [c00000000040d188] __alloc_
Oct 17 05:25:52 10 kernel: [ 305.479517] [c00000006aa2b760] [c000000000436bf8] alloc_pages_
Oct 17 05:25:52 10 kernel: [ 305.479519] [c00000006aa2b7d0] [c0000000003dbd10] wp_page_
Oct 17 05:25:52 10 kernel: [ 305.479520] [c00000006aa2b8a0] [c0000000003dfab4] do_wp_page+
Oct 17 05:25:52 10 kernel: [ 305.479522] [c00000006aa2b8f0] [c0000000003e36a0] __handle_
Oct 17 05:25:52 10 kernel: [ 305.479523] [c00000006aa2b9e0] [c0000000003e40d0] handle_
Oct 17 05:25:52 10 kernel: [ 305.479525] [c00000006aa2ba20] [c00000000008b65c] __do_page_
Oct 17 05:25:52 10 kernel: [ 305.479527] [c00000006aa2baf0] [c00000000000a908] handle_
Oct 17 05:25:52 10 kernel: [ 305.479531] --- interrupt: 301 at schedule_
Oct 17 05:25:52 10 kernel: [ 305.479531] LR = schedule_
Oct 17 05:25:52 10 kernel: [ 305.479532] [c00000006aa2bdf0] [c0000000001977a4] schedule_
Oct 17 05:25:52 10 kernel: [ 305.479534] [c00000006aa2be20] [c00000000000b69c] ret_from_
Oct 17 05:25:52 10 kernel: [ 305.479535] Mem-Info:
Oct 17 05:25:52 10 kernel: [ 305.479540] active_anon:52102 inactive_anon:42 isolated_anon:0
Oct 17 05:25:52 10 kernel: [ 305.479540] active_file:16 inactive_file:0 isolated_file:1
Oct 17 05:25:52 10 kernel: [ 305.479540] unevictable:0 dirty:0 writeback:0 unstable:0
Oct 17 05:25:52 10 kernel: [ 305.479540] slab_reclaimabl
Oct 17 05:25:52 10 kernel: [ 305.479540] mapped:0 shmem:127 pagetables:1702 bounce:0
Oct 17 05:25:52 10 kernel: [ 305.479540] free:2728 free_pcp:18 free_cma:0
Oct 17 05:25:52 10 kernel: [ 305.479543] Node 0 active_
Oct 17 05:25:52 10 kernel: [ 305.479544] Node 0 Normal free:174592kB min:180224kB low:225280kB high:270336kB active_
Oct 17 05:25:52 10 kernel: [ 305.479548] lowmem_reserve[]: 0 0 0
Oct 17 05:25:52 10 kernel: [ 305.479549] Node 0 Normal: 114*64kB (UME) 21*128kB (E) 3*256kB (UE) 26*512kB (M) 13*1024kB (UM) 1*2048kB (M) 1*4096kB (U) 16*8192kB (M) 0*16384kB = 174592kB
Oct 17 05:25:52 10 kernel: [ 305.479556] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
Oct 17 05:25:52 10 kernel: [ 305.479558] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
Oct 17 05:25:52 10 kernel: [ 305.479558] 147 total pagecache pages
Oct 17 05:25:52 10 kernel: [ 305.479560] 0 pages in swap cache
Oct 17 05:25:52 10 kernel: [ 305.479561] Swap cache stats: add 0, delete 0, find 0/0
Oct 17 05:25:52 10 kernel: [ 305.479561] Free swap = 0kB
Oct 17 05:25:52 10 kernel: [ 305.479562] Total swap = 0kB
Oct 17 05:25:52 10 kernel: [ 305.479562] 65536 pages RAM
Oct 17 05:25:52 10 kernel: [ 305.479563] 0 pages HighMem/MovableOnly
Oct 17 05:25:52 10 kernel: [ 305.479563] 778 pages reserved
Oct 17 05:25:52 10 kernel: [ 305.479564] 0 pages cma reserved
Oct 17 05:25:52 10 kernel: [ 305.479564] 0 pages hwpoisoned
Oct 17 05:25:52 10 kernel: [ 305.479566] Tasks state (memory values in pages):
Oct 17 05:25:52 10 kernel: [ 305.479566] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Oct 17 05:25:52 10 kernel: [ 305.479570] [ 398] 0 398 771 65 29696 0 0 systemd-journal
Oct 17 05:25:52 10 kernel: [ 305.479572] [ 410] 0 410 322 53 27392 0 -1000 systemd-udevd
Oct 17 05:25:52 10 kernel: [ 305.479574] [ 411] 0 411 112 22 26368 0 0 blkmapd
Oct 17 05:25:52 10 kernel: [ 305.479576] [ 414] 0 414 1266 22 30976 0 0 lvmetad
Oct 17 05:25:52 10 kernel: [ 305.479578] [ 455] 0 455 88 26 26368 0 0 rpc.idmapd
Oct 17 05:25:52 10 kernel: [ 305.479580] [ 469] 62583 469 1411 63 28160 0 0 systemd-timesyn
Oct 17 05:25:52 10 kernel: [ 305.479581] [ 470] 0 470 173 46 26880 0 0 rpcbind
Oct 17 05:25:52 10 kernel: [ 305.479583] [ 531] 0 531 147 105 30976 0 0 haveged
Oct 17 05:25:52 10 kernel: [ 305.479585] [ 883] 100 883 428 68 28160 0 0 systemd-network
Oct 17 05:25:52 10 kernel: [ 305.479587] [ 906] 101 906 276 69 32256 0 0 systemd-resolve
Oct 17 05:25:52 10 kernel: [ 305.479589] [ 940] 0 940 142 45 30720 0 0 rpc.mountd
Oct 17 05:25:52 10 kernel: [ 305.479591] [ 1072] 0 1072 266 72 32256 0 0 systemd-logind
Oct 17 05:25:52 10 kernel: [ 305.479593] [ 1073] 0 1073 3786 77 34304 0 0 accounts-daemon
Oct 17 05:25:52 10 kernel: [ 305.479594] [ 1075] 0 1075 1739 207 30464 0 0 networkd-dispat
Oct 17 05:25:52 10 kernel: [ 305.479596] [ 1079] 0 1079 166 34 27136 0 0 cron
Oct 17 05:25:52 10 kernel: [ 305.479598] [ 1080] 0 1080 101 31 26368 0 0 atd
Oct 17 05:25:52 10 kernel: [ 305.479599] [ 1084] 0 1084 2381 47 26880 0 0 lxcfs
Oct 17 05:25:52 10 kernel: [ 305.479601] [ 1085] 103 1085 186 53 31232 0 -900 dbus-daemon
Oct 17 05:25:52 10 kernel: [ 305.479603] [ 1086] 0 1086 1331 8 26112 0 0 iprdump
Oct 17 05:25:52 10 kernel: [ 305.479604] [ 1091] 0 1091 1331 42 31232 0 0 irqbalance
Oct 17 05:25:52 10 kernel: [ 305.479606] [ 1092] 102 1092 3513 52 32256 0 0 rsyslogd
Oct 17 05:25:52 10 kernel: [ 305.479608] [ 1097] 0 1097 1841 205 30976 0 0 unattended-upgr
Oct 17 05:25:52 10 kernel: [ 305.479610] [ 1104] 0 1104 118 27 26368 0 0 rtas_errd
Oct 17 05:25:52 10 kernel: [ 305.479611] [ 1213] 0 1213 269 67 32000 0 -1000 sshd
Oct 17 05:25:52 10 kernel: [ 305.479613] [ 1258] 0 1258 3741 117 37632 0 0 polkitd
Oct 17 05:25:52 10 kernel: [ 305.479614] [ 1270] 0 1270 54 9 26112 0 0 iprinit
Oct 17 05:25:52 10 kernel: [ 305.479616] [ 1273] 0 1273 54 9 26112 0 0 iprupdate
Oct 17 05:25:52 10 kernel: [ 305.479618] [ 1350] 0 1350 130 16 30720 0 0 agetty
Oct 17 05:25:52 10 kernel: [ 305.479620] [ 1364] 0 1364 96 16 30464 0 0 agetty
Oct 17 05:25:52 10 kernel: [ 305.479622] [ 1490] 0 1490 334 107 28416 0 0 sshd
Oct 17 05:25:52 10 kernel: [ 305.479623] [ 1505] 1000 1505 315 78 32000 0 0 systemd
Oct 17 05:25:52 10 kernel: [ 305.479625] [ 1507] 1000 1507 1682 140 29440 0 0 (sd-pam)
Oct 17 05:25:52 10 kernel: [ 305.479627] [ 1620] 1000 1620 334 105 28416 0 0 sshd
Oct 17 05:25:52 10 kernel: [ 305.479629] [ 1621] 1000 1621 186 45 31232 0 0 bash
Oct 17 05:25:52 10 kernel: [ 305.479630] [ 1633] 0 1633 230 62 31744 0 0 sudo
Oct 17 05:25:52 10 kernel: [ 305.479632] [ 1635] 0 1635 55 11 29952 0 0 runltp
Oct 17 05:25:52 10 kernel: [ 305.479634] [ 1769] 0 1769 52 7 25856 0 0 ltp-pan
Oct 17 05:25:52 10 kernel: [ 305.479636] [ 1770] 0 1770 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479637] [ 6836] 0 6836 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479639] [ 6838] 0 6838 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479641] [ 6839] 0 6839 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479642] [ 6841] 0 6841 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479644] [ 6842] 0 6842 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.479646] [ 6843] 0 6843 178 75 30976 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480800] [ 9051] 0 9051 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480801] [ 9052] 0 9052 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480803] [ 9053] 0 9053 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480804] [ 9054] 0 9054 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480805] [ 9055] 0 9055 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480807] [ 9056] 0 9056 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480808] [ 9057] 0 9057 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480810] [ 9058] 0 9058 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480811] [ 9059] 0 9059 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480813] [ 9060] 0 9060 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480815] [ 9061] 0 9061 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480816] [ 9062] 0 9062 178 75 30720 0 0 msgstress04
Oct 17 05:25:52 10 kernel: [ 305.480818] [ 9063] 0 9063 178 75 30720 0 0 msgstress04
...
Oct 17 05:25:52 10 kernel: [ 305.482178] oom-kill:
Oct 17 05:25:52 10 kernel: [ 305.482242] Out of memory: Killed process 1075 (networkd-dispat) total-vm:111296kB, anon-rss:13120kB, file-rss:128kB, shmem-rss:0kB, UID:0 pgtables:29kB oom_score_adj:0
$ grep Killed /var/log/syslog
Oct 17 05:25:52 10 kernel: [ 305.482242] Out of memory: Killed process 1075 (networkd-dispat) total-vm:111296kB, anon-rss:13120kB, file-rss:128kB, shmem-rss:0kB, UID:0 pgtables:29kB oom_score_adj:0
Oct 17 05:25:52 10 kernel: [ 305.687351] Out of memory: Killed process 1097 (unattended-upgr) total-vm:117824kB, anon-rss:13120kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:00 10 kernel: [ 312.817879] Out of memory: Killed process 1507 ((sd-pam)) total-vm:107648kB, anon-rss:8960kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:28kB oom_score_adj:0
Oct 17 05:26:00 10 kernel: [ 312.859713] Out of memory: Killed process 1258 (polkitd) total-vm:239424kB, anon-rss:7616kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:36kB oom_score_adj:0
Oct 17 05:26:52 10 kernel: [ 364.547278] Out of memory: Killed process 14222 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:52 10 kernel: [ 364.952675] Out of memory: Killed process 14221 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:52 10 kernel: [ 365.401469] Out of memory: Killed process 13876 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:64kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
Oct 17 05:26:53 10 kernel: [ 365.499581] Out of memory: Killed process 13787 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:448kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0
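To see at a glance which processes the OOM killer picked, the "Killed process" lines above can be parsed into structured records. The following is a minimal sketch, not part of the original report: the names `OomKill`, `KILL_RE` and `parse_oom_kills` are hypothetical, and the regular expression assumes the message format shown in this log (5.4 kernel). The sample lines are copied from the grep output above.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple


class OomKill(NamedTuple):
    """One OOM-killer victim (hypothetical helper type)."""
    pid: int
    comm: str
    anon_rss_kb: int


# Assumes the "Out of memory: Killed process PID (COMM) ..." format seen in this log.
KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>.+)\) "
    r"total-vm:\d+kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return one record per 'Killed process' line, in log order."""
    kills = []
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            kills.append(OomKill(int(m["pid"]), m["comm"], int(m["anon_rss"])))
    return kills


# Three lines copied verbatim from the syslog excerpt above.
SAMPLE = [
    "Oct 17 05:25:52 10 kernel: [ 305.482242] Out of memory: Killed process 1075 (networkd-dispat) total-vm:111296kB, anon-rss:13120kB, file-rss:128kB, shmem-rss:0kB, UID:0 pgtables:29kB oom_score_adj:0",
    "Oct 17 05:26:00 10 kernel: [ 312.817879] Out of memory: Killed process 1507 ((sd-pam)) total-vm:107648kB, anon-rss:8960kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:28kB oom_score_adj:0",
    "Oct 17 05:26:52 10 kernel: [ 364.547278] Out of memory: Killed process 14222 (msgstress04) total-vm:11392kB, anon-rss:4800kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:30kB oom_score_adj:0",
]

victims = parse_oom_kills(SAMPLE)
per_command = Counter(v.comm for v in victims)

if __name__ == "__main__":
    for v in victims:
        print(f"pid={v.pid} comm={v.comm} anon_rss={v.anon_rss_kb}kB")
    print(dict(per_command))
```

To run it against a full log, pass an open file instead of `SAMPLE`, for example `parse_oom_kills(open("/var/log/syslog"))`.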
Memory on this instance:
$ free -mh
total used free shared buff/cache available
Mem: 4.0G 300M 3.5G 7.9M 140M 3.3G
Swap: 0B 0B 0B
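The page counts in the Mem-Info dump can be cross-checked against the 4.0G total that free reports. The following is a small sketch of that arithmetic, assuming 64 kB base pages; the page size is not stated in the report and is inferred here from the smallest buddy-allocator entry (114*64kB). All numbers are copied from the OOM report above, and the variable names are illustrative only.

```python
# Assumption: 64 kB base pages, inferred from the smallest buddy entry "114*64kB".
PAGE_KB = 64


def pages_to_kb(pages: int, page_kb: int = PAGE_KB) -> int:
    """Convert a page count from the Mem-Info dump to kB."""
    return pages * page_kb


total_ram_kb = pages_to_kb(65536)    # "65536 pages RAM"
free_kb = pages_to_kb(2728)          # "free:2728"
active_anon_kb = pages_to_kb(52102)  # "active_anon:52102"
min_watermark_kb = 180224            # "Node 0 Normal free:174592kB min:180224kB"

# Buddy free list from the "Node 0 Normal:" line: block size in kB -> block count.
buddy = {64: 114, 128: 21, 256: 3, 512: 26, 1024: 13,
         2048: 1, 4096: 1, 8192: 16, 16384: 0}
buddy_free_kb = sum(size_kb * count for size_kb, count in buddy.items())

# Each msgstress04 task in the task dump shows rss=75 pages.
msgstress_rss_kb = pages_to_kb(75)

if __name__ == "__main__":
    print(f"total RAM:     {total_ram_kb} kB ({total_ram_kb / 1024 / 1024:.1f} GiB)")
    print(f"free:          {free_kb} kB (buddy list sums to {buddy_free_kb} kB)")
    print(f"min watermark: {min_watermark_kb} kB, below min: {free_kb < min_watermark_kb}")
    print(f"active_anon:   {active_anon_kb} kB")
    print(f"msgstress04 rss per task: {msgstress_rss_kb} kB")
```

Under that assumption the figures are consistent with each other: 65536 pages come to 4 GiB, the 2728 free pages equal the 174592kB on the Node 0 Normal line (and the sum of the buddy list), which is below the min watermark of 180224kB, and 75 pages per msgstress04 task equal the anon-rss:4800kB reported when those tasks are killed.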
This issue also affects Focal OpenStack PowerPC VMs.