loggerhead's revision graph cache may impair performance improvements
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
loggerhead |
Triaged
|
High
|
Unassigned | ||
loggerhead-breezy |
Triaged
|
High
|
Unassigned |
Bug Description
So, for the last few days, I've been investigating how and why we use the revision graph cache, to determine whether or not it can be removed.
There are only two purposes we use the cache for:
1. Translating revision ids to revnos. (This is what _rev_indices is for.)
2. Finding out where a revision id was merged in, and where it was merged from. We also sometimes specifically want to know when it was merged into mainline. (These uses are what _rev_info is for, and they're all essentially one use--figuring out the merge map.)
It turns out that the data that we need for both of those operations is actually *already* being cached by bzrlib:
1. revno info is already cached by get_revision_
2. Calling branch.
So what I was thinking is:
1. Instead of caching all of this data twice, essentially, we should just cache the branch objects themselves in the LRUCache. Since loggerhead doesn't do any write operations to the branches, I imagine it'd be fairly safe to use the same branch object across all our threads.
2. If there turns out to be any performance problem once this is done, then we create three simple additional caches--just a revid-to-revno map, a "where I was merged in" map, and a "where I was merged from" map. These should be sufficiently small data structures that we could store them for 100+ branches. (It's also possible that we would just have to cache them per-request instead of persistently.) Also, they would be far simpler to use and understand than the current cache data structures.
3. We can remove any internal disk caching, and encourage people to use bzr-historycache if they want a speedier loggerhead. bzr-historycache actually caches iter_merge_
4. We could possibly work on bzr-historycache to make it able to append revisions to the cache instead of regenerating it fully whenever the graph changes. (There's already some preliminary code in historycache to account for when it's able to do this, but it can't do it yet.)
Basically, out of this we'd get simplicity and speed by mostly removing code, though altogether it would be a fairly involved project.
summary: |
- remove/refactor loggerhead's revision graph cache + loggerhead's revision graph cache may impair performance improvements |
Changed in loggerhead: | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in loggerhead-breezy: | |
status: | New → Triaged |
importance: | Undecided → High |
Okay, so I talked a bit with mwhudson about that, and if it turns out that branches aren't thread-safe and can't be made thread-safe, then we can create the three simplified caches described in step 2, but only lazily, when they're needed. (That would be a big improvement since we only need the full-graph ones during RevLogUI and RevisionUI.)