[SRU] the leak in bluestore_cache_other mempool
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Ubuntu Cloud Archive | New | Undecided | Unassigned | |
| Ussuri | New | Undecided | Unassigned | |
| Wallaby | Fix Released | Undecided | Unassigned | |
| Xena | Fix Released | Undecided | Unassigned | |
| Yoga | Fix Released | Undecided | Unassigned | |
| ceph (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Focal | Fix Committed | Undecided | Unassigned | |
| Jammy | Fix Released | Undecided | Unassigned | |
| Kinetic | Fix Released | Undecided | Unassigned | |
| Lunar | Fix Released | Undecided | Unassigned | |
Bug Description
[Impact]
This issue has been observed since Ceph Octopus 15.2.16.
BlueStore's onode cache can end up completely disabled because of an entry leak in the bluestore_cache_other mempool.
The log below shows that the cache's maximum size had dropped to 0:
------
2022-10-
-------
The corresponding dump_mempools output for the bluestore_ mempools:
---------------
    "bluestore_": {
        "items": 3,
        "bytes": 1848
    },
    "bluestore_": {
        "items": 13973,
        "bytes": 111338
    },
    "bluestore_": {
        "items": 5601156,
        "bytes": 224152996
    },
    "bluestore_Buffer": {
        "items": 1,
        "bytes": 96
    },
    "bluestore_Extent": {
        "items": 20,
        "bytes": 960
    },
    "bluestore_Blob": {
        "items": 8,
        "bytes": 832
    },
    "bluestore_": {
        "items": 8,
        "bytes": 896
    },
--------------
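For reference, a dump like the one above can be pulled from a running OSD over its admin socket; the OSD id below is only an example:
------
# run on the host where the OSD is running; osd.0 is an example id
ceph daemon osd.0 dump_mempools | grep -A 2 '"bluestore_'
------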
This can cause I/O to experience high latency, as the zero-sized cache significantly increases the need to fetch metadata from RocksDB or even from disk.
Another impact is that it significantly increases the chance of hitting the race condition in Onode::put [2], which crashes OSDs, especially in large clusters.
[Test Case]
1. Deploy a Ceph 15.2.16 cluster.
2. Create enough RBD images to spread across all OSDs.
3. Stress them in parallel with a fio 4k randwrite workload until the OSDs have accumulated enough onodes in their caches (more than 60k onodes, and you'll see the bluestore_ mempool usage grow accordingly), e.g.:
fio --name=randwrite --rw=randwrite --ioengine=rbd --bs=4k --direct=1 --numjobs=1 --size=100G --iodepth=16 --clientname=admin --pool=bench --rbdname=test
4. Shrink pg_num to a very low number so that there is roughly 1 PG per OSD, and wait for the shrink to finish.
5. Enable debug_bluestore (a rough command sketch for these steps follows below).
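The commands below are only a sketch of the steps above; the pool name (bench), image name (test), pg_num values, and osd.0 are example values that need adjusting to the actual cluster:
------
# 2. example pool/image setup (names, counts and sizes are illustrative)
ceph osd pool create bench 256
ceph osd pool application enable bench rbd
rbd create bench/test --size 100G

# 3. run the fio randwrite workload from the description against each image

# 4. shrink pg_num so there is roughly 1 PG per OSD (value is an example);
#    disable the autoscaler first so it does not undo the change
ceph osd pool set bench pg_autoscale_mode off
ceph osd pool set bench pg_num 8

# 5. raise bluestore debug logging on all OSDs, then watch the OSD logs
ceph tell 'osd.*' config set debug_bluestore 20/20

# check the mempool accounting while the workload runs
ceph daemon osd.0 dump_mempools
------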
[Potential Regression]
The patch corrects the apparently wrong AU calculation of the bluestore_
[Other Info]
The patch [1] has been backported to upstream Pacific and Quincy, but not to Octopus.
Pacific will get it in 16.2.11, which is still pending.
Quincy already has it in 17.2.4.
We'll need to backport this fix to Octopus.
description: updated
summary: - the leak in bluestore_cache_other mempool
         + [SRU] the leak in bluestore_cache_other mempool
tags: added: sts-sru-needed
tags: added: seg
affects: xena → cloud-archive
no longer affects: cloud-archive/victoria
Changed in ceph (Ubuntu Lunar): status: Confirmed → Fix Released
Changed in ceph (Ubuntu Kinetic): status: New → Fix Released
Changed in ceph (Ubuntu Jammy): status: New → Fix Released
tags: removed: patch
Changed in ceph (Ubuntu): status: Confirmed → Fix Released
This is the debdiff based on the focal-proposed 15.2.17 package.