Comment 0 for bug 2023143

Ariel E (arielpagaya) wrote :

Hi,
We are trying to diagnose a kernel memory leak on a production Ubuntu 22.04.2 LTS server.
We have tried several official Ubuntu kernels: 5.15-aws, 5.19-aws, and now even 6.2.0-1004-aws (all Ubuntu-signed):
```
# cat /proc/version_signature
Ubuntu 6.2.0-1004.4-aws 6.2.6
```

This is a production server, so we would appreciate any and all help diagnosing and solving this issue!

The server is a u-112 instance with 12TB of RAM, and it is losing more than 1TB of memory a day to a kernel leak.
For example, at the current uptime of 3.5 days only 1.8TiB is available, yet RSS plus slab together account for only about 4.1TB of the 12TB total, leaving roughly 6TB unaccounted for.

All active processes together take about 4TB of RAM (`ps -eo rss | awk 'BEGIN {x=0} {x = x + $1} END {print x}'` prints 4088636708; `ps` reports RSS in KiB).
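
Converted from KiB to TB, that sum comes to about 4.2TB; a minimal one-liner sketch:
```
# sum RSS across all processes and convert KiB -> TB
ps -eo rss= | awk '{sum += $1} END {printf "%.2f TB\n", sum * 1024 / 1e12}'
```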

From slabtop we see about 120GB consumed by slab (`slabtop -o -s t | head`):
```
 Active / Total Objects (% used) : 303580174 / 332642344 (91.3%)
 Active / Total Slabs (% used) : 6697552 / 6697552 (100.0%)
 Active / Total Caches (% used) : 158 / 215 (73.5%)
 Active / Total Size (% used) : 112801663.93K / 121442845.45K (92.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.36K / 16.00K

  OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
67537280 59696907 88% 0.03K 527635 128 2110540K kmalloc-32
65247564 65241398 99% 0.31K 1279364 51 20469824K arc_buf_hdr_t_full
58270446 58040685 99% 0.10K 747057 78 5976456K abd_t
16697268 13731405 82% 0.38K 397554 42 6360864K dmu_buf_impl_t
15982912 10366686 64% 0.50K 249733 64 7991456K kmalloc-512
14975616 11605380 77% 0.06K 233994 64 935976K kmalloc-64
```
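
The slabtop total can be cross-checked against the kernel's own accounting in /proc/meminfo (a quick sketch, not part of the original report):
```
# slab usage as the kernel reports it, converted from kB to GB
awk '/^(Slab|SReclaimable|SUnreclaim):/ {printf "%-14s %6.1f GB\n", $1, $2 * 1024 / 1e9}' /proc/meminfo
```
With the dump below this prints roughly 125GB for Slab, of which about 115GB is SUnreclaim.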

In /proc/meminfo:
```
MemTotal: 12656421408 kB
MemFree: 1975976204 kB
MemAvailable: 1968415088 kB
Buffers: 1087956 kB
Cached: 101168004 kB
SwapCached: 17912340 kB
Active: 101022084 kB
Inactive: 4129984264 kB
Active(anon): 94623216 kB
Inactive(anon): 4104673512 kB
Active(file): 6398868 kB
Inactive(file): 25310752 kB
Unevictable: 338908 kB
Mlocked: 332132 kB
SwapTotal: 4294967292 kB
SwapFree: 3500705532 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 2908 kB
Writeback: 0 kB
AnonPages: 4123489132 kB
Mapped: 3761620 kB
Shmem: 70756156 kB
KReclaimable: 10319220 kB
Slab: 122355620 kB
SReclaimable: 10319220 kB
SUnreclaim: 112036400 kB
KernelStack: 1793296 kB
PageTables: 21748556 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 10623177996 kB
Committed_AS: 6775476544 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 296984480 kB
VmallocChunk: 0 kB
Percpu: 1326080 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1630980096 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 2056036 kB
DirectMap2M: 40935424 kB
DirectMap1G: 12814647296 kB
```
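
To make the leak visible from this dump alone, one can subtract everything /proc/meminfo does account for from MemTotal (a rough sketch; kernel memory can never be fully reconciled from /proc/meminfo alone, and Shmem is already counted inside Cached, so it is not subtracted again):
```
# everything meminfo accounts for, subtracted from MemTotal (values in kB)
awk '/^MemTotal:/{t=$2}    /^MemFree:/{f=$2}   /^Buffers:/{b=$2}
     /^Cached:/{c=$2}      /^AnonPages:/{a=$2} /^Slab:/{s=$2}
     /^KernelStack:/{k=$2} /^PageTables:/{p=$2}
     /^VmallocUsed:/{v=$2} /^Percpu:/{pc=$2}
     END {printf "unaccounted: %.1f TB\n", (t-f-b-c-a-s-k-p-v-pc) * 1024 / 1e12}' /proc/meminfo
```
With the values above this leaves roughly 6TB unaccounted for after 3.5 days of uptime.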

It's not a tmpfs/shm filesystem issue either; total tmpfs usage is only about 70GB, as the sum after the listing shows:
```
df -h | grep -E 'tmpfs|shm'
tmpfs 256G 70G 187G 27% /dev/shm
tmpfs 256G 3.4M 256G 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 8.0G 24K 8.0G 1% /run/user/10102
tmpfs 8.0G 24K 8.0G 1% /run/user/1002
tmpfs 8.0G 24K 8.0G 1% /run/user/10030
tmpfs 8.0G 24K 8.0G 1% /run/user/10194
tmpfs 8.0G 24K 8.0G 1% /run/user/10200
tmpfs 8.0G 24K 8.0G 1% /run/user/10136
tmpfs 8.0G 24K 8.0G 1% /run/user/10198
tmpfs 8.0G 24K 8.0G 1% /run/user/10143
tmpfs 8.0G 24K 8.0G 1% /run/user/10188
tmpfs 8.0G 24K 8.0G 1% /run/user/10124
tmpfs 8.0G 24K 8.0G 1% /run/user/10174
tmpfs 8.0G 24K 8.0G 1% /run/user/10165
tmpfs 8.0G 24K 8.0G 1% /run/user/10197
tmpfs 8.0G 24K 8.0G 1% /run/user/10183
tmpfs 8.0G 24K 8.0G 1% /run/user/10033
tmpfs 8.0G 24K 8.0G 1% /run/user/10023
tmpfs 8.0G 24K 8.0G 1% /run/user/10133
tmpfs 8.0G 24K 8.0G 1% /run/user/10185
tmpfs 8.0G 24K 8.0G 1% /run/user/10201
tmpfs 8.0G 24K 8.0G 1% /run/user/1004
tmpfs 8.0G 24K 8.0G 1% /run/user/10014
```
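
Summing the used column confirms the tmpfs total (a quick cross-check, assuming GNU df):
```
# total bytes used across all tmpfs mounts
df -B1 -t tmpfs --output=used | awk 'NR > 1 {sum += $1} END {printf "%.1f GB\n", sum / 1e9}'
```
This comes to about 70GB, essentially all of it in /dev/shm, which is nowhere near the multiple TB going missing.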