building a tree should populate dirstate hashcache
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Bazaar | Confirmed | Medium | Unassigned |
Bug Description
When running "bzr status" for the first time after creating a new checkout, it seems to take a very long time. For example, when I create a lightweight checkout of a large branch (22000+ files) in a shared repository, the initial checkout takes around 6 minutes. Running "bzr status" then takes *another* 6 minutes to complete. Subsequent statuses take only a few seconds.
I don't think this is simply a result of the OS caching files/directories. After a reboot the first status will take a little longer (up to 30 seconds) but nowhere near as long as 6 minutes.
Looking at filemon during the initial status indicates that it's reading through every file in the tree and rebuilding the dirstate file. I would have thought this wouldn't be necessary given that the "checkout" operation just did the same thing a moment ago and no files have changed.
I've tested this using Bazaar 2.0.1 on Windows XP SP3.
Steps to reproduce:
* Locate a branch with a large number of files (e.g. >10000)
* bzr checkout --lightweight <branch location> checkout
* cd checkout
* bzr status (takes a long time)
* bzr status (very quick)
Even on a branch with a smaller number of files it always seems to rebuild the dirstate on the first status (it just takes much less time to do so!). When I did this on a branch with 500 files I saw the following messages in the log for the "checkout" command; I don't know whether they are related to the issue:
5.516 Adding the key (<bzrlib.
16.438 Adding the key (<bzrlib.
Related branches
tags: added: dirstate transform
summary: changed from "bzr status slow after creating checkout" to "building a tree should populate dirstate hashcache"
I'm pretty sure this is a duplicate, but I don't remember the number :(.
Anyhow, here is the cause:
We can't cache the stat fingerprint for new files, only for files whose stat
value is far enough back in time that filesystem timestamp granularity allows
us to detect whether the file is modified subsequently.
As we write to a limbo area, it is possible for us to verify the stat
information for files we are about to move into the working tree and
capture it: on a big project (such as yours sounds to be) it's very
likely that we would be able to get usable fingerprints for much of the
tree by doing this.
status confirmed
importance medium
tags transform