Bazaar

pack index has no topological locality of reference

Bug #165309 reported by Robert Collins on 2007-11-27

Affects		Status	Importance	Assigned to	Milestone
	Bazaar	Confirmed	Medium	Unassigned
	Breezy	Triaged	Medium	Unassigned

Bug Description

Pack indices are currently lexographically sorted. They provide
reasonable partial access performance, but we would have many fewer
round trips if they were topologically sorted. A candidate design for
this is:
---
prelude (as current - key count, reference list length, plus the used
hash length etc)
compressed hash table
key data (as currently formatted, possibly with binary lengths rather
than ascii)
---

The compressed hash table will contain a list:
[hash location]*

Where:
hash is the first N bytes of the sha/md5 of the serialised key.
location is a binary location in the file of the key with that hash.
Collisions are handled by repeating the hash with different locations.
(That is, linear chaining).

The hash table is sorted by the hash.
The number of used bytes in the hash table is determined by generating
the entire table and doing a single pass across it to determine how many
bits are needed to balance collisions vs size. (The cost of a collision
is having to read two locations to determine which has the key we want).

the key data would be topologically sorted by one of the key reference
lists - probably the graph one, but possibly the compression one.

Because it's easier to read additional keys after the one we jump to
(because they are internally delimited) we'd probably want children of a
key K to come after K in the file.

This will make a local and remote performance difference.

affects bzr
tag packs

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Tags:

Martin Pool (mbp) on 2007-11-27

Changed in bzr:
importance:	Undecided → Medium
status:	New → Confirmed

Jelmer Vernooij (jelmer) on 2017-11-08

tags:

added: check-for-breezy

Jelmer Vernooij (jelmer) on 2020-03-22

tags:	removed: check-for-breezy
tags:	added: performance
Changed in brz:
status:	New → Triaged
importance:	Undecided → Medium

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.