Memory leak on large server

Bug #2023143 reported by Ariel E
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,
We are trying to diagnose a kernel memory leak on a production Ubuntu 22.04.2 LTS server.
We have tried several official Ubuntu kernels: 5.15-aws, 5.19-aws, and now 6.2.0-1004-aws (all Ubuntu-signed):
```
# cat /proc/version_signature
Ubuntu 6.2.0-1004.4-aws 6.2.6
```

This is a production server, so we would appreciate any help diagnosing and solving this issue!

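The server is a u-12tb1.112xlarge instance with 12 TB of RAM, and it is losing more than 1 TB of memory a day to a kernel leak.
For example, with a current uptime of 3.5 days, only 1.8 TiB is available, yet RSS plus slab adds up to only about 4.1 TB. That leaves roughly 6 TB that neither processes nor slab account for.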

All active processes together take about 4 TB of RAM (`ps -eo rss | awk 'BEGIN {x=0} {x = x + $1} END {print x}'` gives 4088636708, in kB).
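
As a cross-check of the one-liner above, here is a minimal Python sketch of the same sum. It is only an illustration: the helper names `total_rss_kib` and `kib_to_tib` are made up for this example, and it assumes the input is the raw text printed by `ps -eo rss` (a header line followed by one value in KiB per process). Summing RSS counts pages shared between processes more than once, so the result should be read as an upper bound.

```python
def total_rss_kib(ps_output: str) -> int:
    """Sum the RSS column of `ps -eo rss` output (values are in KiB)."""
    total = 0
    for line in ps_output.splitlines():
        field = line.strip()
        if field.isdigit():  # skips the "RSS" header and blank lines
            total += int(field)
    return total


def kib_to_tib(kib: int) -> float:
    """Convert KiB to TiB (1 TiB = 1024**3 KiB)."""
    return kib / 1024**3


# The total quoted in this report, converted to TiB.
print(f"{kib_to_tib(4088636708):.2f} TiB")
```

To use it on a live system, capture the output of `ps -eo rss` into a string and pass it to `total_rss_kib`.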

From slabtop we see that only about 100 GB is consumed by slab (`slabtop -o -s t | head`):
```
 Active / Total Objects (% used) : 303580174 / 332642344 (91.3%)
 Active / Total Slabs (% used) : 6697552 / 6697552 (100.0%)
 Active / Total Caches (% used) : 158 / 215 (73.5%)
 Active / Total Size (% used) : 112801663.93K / 121442845.45K (92.9%)
 Minimum / Average / Maximum Object : 0.01K / 0.36K / 16.00K

    OBJS   ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
67537280 59696907  88%    0.03K  527635      128   2110540K kmalloc-32
65247564 65241398  99%    0.31K 1279364       51  20469824K arc_buf_hdr_t_full
58270446 58040685  99%    0.10K  747057       78   5976456K abd_t
16697268 13731405  82%    0.38K  397554       42   6360864K dmu_buf_impl_t
15982912 10366686  64%    0.50K  249733       64   7991456K kmalloc-512
14975616 11605380  77%    0.06K  233994       64    935976K kmalloc-64
```
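
The rows above are ordered by object count, which hides which caches hold the most memory. The following sketch re-ranks them by the CACHE SIZE column. It is an illustration only: `rank_slab_caches` and `SAMPLE` are names invented here, and the parser assumes the eight-column row layout shown in this report. Three of the listed caches (`arc_buf_hdr_t_full`, `abd_t`, `dmu_buf_impl_t`) are ZFS caches, which may be worth keeping in mind given that the zfs module is loaded.

```python
def rank_slab_caches(slabtop_text: str) -> list:
    """Return (cache name, cache size in KiB) pairs, largest first."""
    caches = []
    for line in slabtop_text.splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with the object count;
        # summary lines and the column header do not match.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        size = fields[6]
        if not size.endswith("K"):
            continue
        caches.append((fields[7], int(float(size[:-1]))))
    return sorted(caches, key=lambda cache: cache[1], reverse=True)


# Rows copied from the slabtop output in this report.
SAMPLE = """\
  OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
67537280 59696907 88% 0.03K 527635 128 2110540K kmalloc-32
65247564 65241398 99% 0.31K 1279364 51 20469824K arc_buf_hdr_t_full
58270446 58040685 99% 0.10K 747057 78 5976456K abd_t
16697268 13731405 82% 0.38K 397554 42 6360864K dmu_buf_impl_t
15982912 10366686 64% 0.50K 249733 64 7991456K kmalloc-512
14975616 11605380 77% 0.06K 233994 64 935976K kmalloc-64
"""

for name, kib in rank_slab_caches(SAMPLE):
    print(f"{name:22s} {kib / 1024**2:6.1f} GiB")
```

Even the largest cache in this listing is about 20 GiB, so slab alone does not come close to explaining a loss measured in terabytes.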

In /proc/meminfo:
```
MemTotal: 12656421408 kB
MemFree: 1975976204 kB
MemAvailable: 1968415088 kB
Buffers: 1087956 kB
Cached: 101168004 kB
SwapCached: 17912340 kB
Active: 101022084 kB
Inactive: 4129984264 kB
Active(anon): 94623216 kB
Inactive(anon): 4104673512 kB
Active(file): 6398868 kB
Inactive(file): 25310752 kB
Unevictable: 338908 kB
Mlocked: 332132 kB
SwapTotal: 4294967292 kB
SwapFree: 3500705532 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 2908 kB
Writeback: 0 kB
AnonPages: 4123489132 kB
Mapped: 3761620 kB
Shmem: 70756156 kB
KReclaimable: 10319220 kB
Slab: 122355620 kB
SReclaimable: 10319220 kB
SUnreclaim: 112036400 kB
KernelStack: 1793296 kB
PageTables: 21748556 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 10623177996 kB
Committed_AS: 6775476544 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 296984480 kB
VmallocChunk: 0 kB
Percpu: 1326080 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1630980096 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 2056036 kB
DirectMap2M: 40935424 kB
DirectMap1G: 12814647296 kB
```
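
To put a number on the gap, the counters above can be subtracted from the memory in use. The sketch below is a rough estimate under stated assumptions, not an exact accounting: the function names and the choice of counters in `ACCOUNTED` are made up for this example, some counters may overlap, and others (for example `VmallocUsed`) are deliberately left out.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])  # kB for most fields
    return values


# Assumption: these counters are treated as the "explained" memory.
ACCOUNTED = ("AnonPages", "Cached", "Buffers", "Slab",
             "KernelStack", "PageTables", "Percpu")


def unaccounted_kb(info: dict, accounted=ACCOUNTED) -> int:
    """Memory in use (MemTotal - MemFree) minus the explained counters."""
    in_use = info["MemTotal"] - info["MemFree"]
    return in_use - sum(info.get(key, 0) for key in accounted)


# Subset of the /proc/meminfo values from this report.
SAMPLE = """\
MemTotal: 12656421408 kB
MemFree: 1975976204 kB
Buffers: 1087956 kB
Cached: 101168004 kB
AnonPages: 4123489132 kB
Slab: 122355620 kB
KernelStack: 1793296 kB
PageTables: 21748556 kB
Percpu: 1326080 kB
"""

gap = unaccounted_kb(parse_meminfo(SAMPLE))
print(f"unaccounted: {gap} kB ({gap / 1024**3:.2f} TiB)")
```

On a live system the same functions can be fed the contents of `/proc/meminfo`, and the result compared across days to see how fast the gap grows.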

It's not a tmpfs/shm issue either:
```
df -h | grep -E 'tmpfs|shm'
tmpfs 256G 70G 187G 27% /dev/shm
tmpfs 256G 3.4M 256G 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 8.0G 24K 8.0G 1% /run/user/10102
tmpfs 8.0G 24K 8.0G 1% /run/user/1002
tmpfs 8.0G 24K 8.0G 1% /run/user/10030
tmpfs 8.0G 24K 8.0G 1% /run/user/10194
tmpfs 8.0G 24K 8.0G 1% /run/user/10200
tmpfs 8.0G 24K 8.0G 1% /run/user/10136
tmpfs 8.0G 24K 8.0G 1% /run/user/10198
tmpfs 8.0G 24K 8.0G 1% /run/user/10143
tmpfs 8.0G 24K 8.0G 1% /run/user/10188
tmpfs 8.0G 24K 8.0G 1% /run/user/10124
tmpfs 8.0G 24K 8.0G 1% /run/user/10174
tmpfs 8.0G 24K 8.0G 1% /run/user/10165
tmpfs 8.0G 24K 8.0G 1% /run/user/10197
tmpfs 8.0G 24K 8.0G 1% /run/user/10183
tmpfs 8.0G 24K 8.0G 1% /run/user/10033
tmpfs 8.0G 24K 8.0G 1% /run/user/10023
tmpfs 8.0G 24K 8.0G 1% /run/user/10133
tmpfs 8.0G 24K 8.0G 1% /run/user/10185
tmpfs 8.0G 24K 8.0G 1% /run/user/10201
tmpfs 8.0G 24K 8.0G 1% /run/user/1004
tmpfs 8.0G 24K 8.0G 1% /run/user/10014
```
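
The tmpfs check can also be totalled programmatically, since tmpfs pages are counted under `Shmem` in /proc/meminfo. This is a minimal sketch with invented helper names (`parse_size`, `tmpfs_used_bytes`); it assumes `df -h` style output with the six columns shown above and power-of-1024 suffixes.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a `df -h` size such as '70G', '3.4M' or '0' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def tmpfs_used_bytes(df_output: str) -> float:
    """Sum the Used column over all tmpfs lines of `df -h` output."""
    total = 0.0
    for line in df_output.splitlines():
        fields = line.split()
        # Columns: filesystem, size, used, avail, use%, mount point.
        if len(fields) >= 6 and fields[0] == "tmpfs":
            total += parse_size(fields[2])
    return total


# A few of the lines from the df output in this report.
SAMPLE = """\
tmpfs 256G 70G 187G 27% /dev/shm
tmpfs 256G 3.4M 256G 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
"""

print(f"{tmpfs_used_bytes(SAMPLE) / 1024**3:.1f} GiB used by tmpfs")
```

In this report nearly all tmpfs usage is the roughly 70 GB in /dev/shm, which is in line with the `Shmem` figure and far too small to explain the missing memory.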
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
CRDA: N/A
CasperMD5CheckResult: unknown
DistroRelease: Ubuntu 22.04
Ec2AMI: ami-08c40ec9ead489470
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: u-12tb1.112xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci: Error: [Errno 2] No such file or directory: 'lspci'
Lspci-vt: Error: [Errno 2] No such file or directory: 'lspci'
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:

Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: Amazon EC2 u-12tb1.112xlarge
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LC_CTYPE=C.UTF-8
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=C.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-1004-aws root=PARTUUID=cbb5015f-ca94-467b-91ae-cce97828a042 ro quiet mitigations=off console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295 panic=-1
ProcVersionSignature: Ubuntu 6.2.0-1004.4-aws 6.2.6
RelatedPackageVersions:
 linux-restricted-modules-6.2.0-1004-aws N/A
 linux-backports-modules-6.2.0-1004-aws N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy ec2-images
Uname: Linux 6.2.0-1004-aws x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: False
dmi.bios.date: 10/16/2017
dmi.bios.release: 1.0
dmi.bios.vendor: Amazon EC2
dmi.bios.version: 1.0
dmi.board.asset.tag: i-0b8914fe51e3d7555
dmi.board.vendor: Amazon EC2
dmi.chassis.asset.tag: Amazon EC2
dmi.chassis.type: 1
dmi.chassis.vendor: Amazon EC2
dmi.modalias: dmi:bvnAmazonEC2:bvr1.0:bd10/16/2017:br1.0:svnAmazonEC2:pnu-12tb1.112xlarge:pvr:rvnAmazonEC2:rn:rvr:cvnAmazonEC2:ct1:cvr:sku:
dmi.product.name: u-12tb1.112xlarge
dmi.sys.vendor: Amazon EC2

Ariel E (arielpagaya) wrote : AudioDevicesInUse.txt

apport information

tags: added: apport-collected ec2-images jammy
description: updated
Ariel E (arielpagaya) wrote : CurrentDmesg.txt

apport information

Ariel E (arielpagaya) wrote : ProcCpuinfoMinimal.txt

apport information

Ariel E (arielpagaya) wrote : ProcInterrupts.txt

apport information

Ariel E (arielpagaya) wrote : ProcModules.txt

apport information

Ariel E (arielpagaya) wrote : UdevDb.txt

apport information

Ariel E (arielpagaya) wrote : WifiSyslog.txt

apport information

Ariel E (arielpagaya) wrote : acpidump.txt

apport information

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2023143

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Ariel E (arielpagaya)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed