chk groups possibly need more clustering
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Confirmed | High | Unassigned |
Breezy | Triaged | Medium | Unassigned |
Bug Description
Split out from bug #402114
The current CHK streaming code creates lots of mini-streams, essentially one for every search-key prefix. This has the nice property that similar CHK pages are put together into groups, while CHK pages that are unlikely to compress well together are not commingled.
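To make the grouping trade-off concrete, here is a toy model (not bzrlib's streaming code) of two policies: one mini-stream per search-key prefix, and a hypothetical clustering pass that merges consecutive prefix streams until a size cap is reached. The `Page` tuples, `prefix_len`, and `max_bytes` are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical CHK page record: (search_key, serialized_size_in_bytes).
Page = Tuple[bytes, int]


def streams_by_prefix(pages: List[Page], prefix_len: int = 1) -> List[List[Page]]:
    """Model of the current policy: one mini-stream per search-key prefix."""
    streams: Dict[bytes, List[Page]] = {}
    # Sorting keeps pages with the same prefix adjacent; dicts preserve
    # insertion order, so streams come out in prefix order.
    for page in sorted(pages):
        streams.setdefault(page[0][:prefix_len], []).append(page)
    return list(streams.values())


def cluster_streams(streams: List[List[Page]], max_bytes: int) -> List[List[Page]]:
    """Hypothetical clustering: merge consecutive prefix streams into one
    group until adding the next stream would push it past max_bytes."""
    groups: List[List[Page]] = []
    current: List[Page] = []
    current_size = 0
    for stream in streams:
        size = sum(page_size for _, page_size in stream)
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.extend(stream)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

With a `max_bytes` smaller than any single stream, `cluster_streams` degenerates to the current one-group-per-prefix behaviour; raising it yields fewer groups to fetch at the cost of mixing pages from different prefixes.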
The main downside is that something like a dumb fetch ends up downloading each CHK group separately, which can add significant per-request overhead.
Note that if we just fix bug #402657 (buffer multiple groups so they are read at the same time), this may not be an issue.
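One common way to read multiple groups at once over a dumb transport is to sort the requested byte ranges and merge adjacent or nearly adjacent ones into a single request. The sketch below shows that idea; it is not bzrlib's `readv` implementation, and `max_gap` is a hypothetical tuning knob.

```python
from typing import List, Tuple


def coalesce_ranges(ranges: List[Tuple[int, int]],
                    max_gap: int = 0) -> List[Tuple[int, int]]:
    """Merge (offset, length) requests that overlap, touch, or are separated
    by at most max_gap bytes, so several groups can be fetched in one read."""
    merged: List[Tuple[int, int]] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_off, prev_len = merged[-1]
            prev_end = prev_off + prev_len
            if offset <= prev_end + max_gap:
                # Extend the previous request instead of issuing a new one.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_off, new_end - prev_off)
                continue
        merged.append((offset, length))
    return merged
```

A larger `max_gap` means fewer round trips but some wasted bytes downloaded between groups.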
What needs to be evaluated is:
1) If we start grouping more CHK pages into larger groups, what is the effect on overall compression? (Compression is expected to get worse: the number of regions that can be copied will not increase, but the offsets into the group will, so the variable-width offset field consumes more bytes per reference.) The expected benefit is that something like a dumb-transport copy has fewer groups to consider. Fewer groups also means better compression of the '.cix' index, since more of its content is the same.
2) What is the effect on text extraction? Initial results from testing a while ago showed that combining too many CHK pages into a single gc group could cause significant zlib decompression overhead. If you need 200 bytes in the middle of a 2MB group, you have to decompress 1MB of zlib data to get at them.
3) I also looked at "pack recent", which groups recently referenced CHK pages separately from 'very old' CHK pages. This would probably exacerbate the problem further, though again, fixing bug #402657 may make it irrelevant. (There was a modest win for something like 'bzr ls -r -1' under those conditions, which would affect 'bzr checkout' times as well.)
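Item 1's concern about the offset field can be illustrated with a generic LEB128-style varint (7 payload bits per byte). This encoding and the one-byte opcode are assumptions for illustration; groupcompress's actual copy-instruction format may differ, but with a variable-width encoding, larger offsets tend to need more bytes.

```python
def varint_encode(value: int) -> bytes:
    """Encode a non-negative int as LEB128: 7 payload bits per byte, with the
    high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("varint_encode expects a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def copy_reference_cost(offset: int, length: int) -> int:
    """Bytes for one hypothetical copy reference: a 1-byte opcode plus a
    varint offset and a varint length."""
    return 1 + len(varint_encode(offset)) + len(varint_encode(length))


if __name__ == "__main__":
    # The same 64-byte copy, taken from progressively deeper inside a group.
    for offset in (100, 4_000, 1_000_000, 2 * 1024 * 1024):
        print(offset, copy_reference_cost(offset, 64))  # 3, 4, 5, 6 bytes
```

The copied region is identical in every case; only its position in the group changes, and the reference grows from 3 to 6 bytes.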
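Item 2's extraction cost follows from zlib being a sequential stream with no random access: reaching a byte range means inflating everything before it. The sketch below uses only Python's standard `zlib` module; the chunk size is an arbitrary assumption and the sample data is synthetic.

```python
import zlib
from typing import Tuple


def extract_range(compressed: bytes, start: int, length: int,
                  chunk: int = 16 * 1024) -> Tuple[bytes, int]:
    """Return (wanted_bytes, bytes_inflated) for one range of a zlib stream.

    Compressed input is fed in `chunk`-sized pieces and decompression stops
    as soon as the wanted range is available, but every byte before `start`
    still has to be inflated (and then discarded).
    """
    inflater = zlib.decompressobj()
    produced = bytearray()
    end = start + length
    pos = 0
    while len(produced) < end and pos < len(compressed):
        produced += inflater.decompress(compressed[pos:pos + chunk])
        pos += chunk
    return bytes(produced[start:end]), len(produced)


if __name__ == "__main__":
    # Synthetic ~2MB "group" of page-like records.
    group = b"".join(b"page %08d\n" % i for i in range(150_000))
    compressed = zlib.compress(group)
    wanted, inflated = extract_range(compressed, len(group) // 2, 200)
    print(len(wanted), inflated)
```

For 200 bytes from the middle of the group, `inflated` is at least half the group size, which is the overhead item 2 describes; a page near the start of the group is far cheaper to reach.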
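The "pack recent" idea from item 3 can be sketched as partitioning CHK pages into those reachable from the newest revisions and the rest, so each set can be written into its own groups. The `reachable` mapping and `recent_count` are hypothetical stand-ins for walking the CHK maps from each revision's root; this is not the code that was tested.

```python
from typing import Dict, List, Set, Tuple


def pack_recent(reachable: Dict[str, Set[str]],
                revisions_newest_first: List[str],
                recent_count: int) -> Tuple[List[str], List[str]]:
    """Split CHK page keys into (recent, old) groups.

    `reachable` maps each revision id to the set of CHK page keys reachable
    from its root (a hypothetical precomputed input). Pages referenced by
    any of the newest `recent_count` revisions go into the recent group.
    """
    recent: Set[str] = set()
    for rev in revisions_newest_first[:recent_count]:
        recent |= reachable[rev]
    everything: Set[str] = set().union(*reachable.values())
    # Sorted only to make the output deterministic.
    return sorted(recent), sorted(everything - recent)
```

Splitting this way keeps the pages that a command like 'bzr ls -r -1' needs close together, at the cost of producing more, smaller groups.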
Changed in bzr:
importance: Medium → High
Changed in bzr:
status: Triaged → Confirmed
tags: added: check-for-breezy
tags: added: performance; removed: check-for-breezy
Changed in brz:
status: New → Triaged
importance: Undecided → Medium
tags: added: chk