pack index has no topological locality of reference
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Bazaar |
Confirmed
|
Medium
|
Unassigned | ||
Breezy |
Triaged
|
Medium
|
Unassigned |
Bug Description
Pack indices are currently lexographically sorted. They provide
reasonable partial access performance, but we would have many fewer
round trips if they were topologically sorted. A candidate design for
this is:
---
prelude (as current - key count, reference list length, plus the used
hash length etc)
compressed hash table
key data (as currently formatted, possibly with binary lengths rather
than ascii)
---
The compressed hash table will contain a list:
[hash location]*
Where:
hash is the first N bytes of the sha/md5 of the serialised key.
location is a binary location in the file of the key with that hash.
Collisions are handled by repeating the hash with different locations.
(That is, linear chaining).
The hash table is sorted by the hash.
The number of used bytes in the hash table is determined by generating
the entire table and doing a single pass across it to determine how many
bits are needed to balance collisions vs size. (The cost of a collision
is having to read two locations to determine which has the key we want).
the key data would be topologically sorted by one of the key reference
lists - probably the graph one, but possibly the compression one.
Because it's easier to read additional keys after the one we jump to
(because they are internally delimited) we'd probably want children of a
key K to come after K in the file.
This will make a local and remote performance difference.
affects bzr
tag packs
--
GPG key available at: <http://
Changed in bzr: | |
importance: | Undecided → Medium |
status: | New → Confirmed |
tags: | added: check-for-breezy |
tags: | removed: check-for-breezy |
tags: | added: performance |
Changed in brz: | |
status: | New → Triaged |
importance: | Undecided → Medium |