building a tree should populate dirstate hashcache

Bug #488172 reported by Gareth White
This bug affects 2 people
Affects   Status      Importance   Assigned to   Milestone
Bazaar    Confirmed   Medium       Unassigned

Bug Description

When running "bzr status" for the first time after creating a new checkout, it seems to take a very long time. For example, when I create a lightweight checkout of a large branch (22000+ files) in a shared repository, the initial checkout takes around 6 minutes. Running "bzr status" then takes *another* 6 minutes to complete. Subsequent statuses take only a few seconds.

I don't think this is simply a result of the OS caching files/directories. After a reboot the first status will take a little longer (up to 30 seconds) but nowhere near as long as 6 minutes.

Looking at filemon during the initial status indicates that it's reading through every file in the tree and rebuilding the dirstate file. I would have thought this wouldn't be necessary given that the "checkout" operation just did the same thing a moment ago and no files have changed.

I've tested this using Bazaar 2.0.1 on Windows XP SP3.

Steps to reproduce:
* Locate a branch with a large number of files (e.g. >10000)
* bzr checkout --lightweight <branch location> checkout
* cd checkout
* bzr status (takes a long time)
* bzr status (very quick)
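
The two status runs in the steps above can be timed with a small script. This is only a sketch, not part of Bazaar: it assumes `bzr` is on the PATH and that the checkout directory is named `checkout` as in the steps, and the helper names `time_command` and `compare_status_runs` are made up for illustration.

```python
import subprocess
import time


def time_command(argv, cwd=None):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # Discard the command's output; only the duration matters here.
    subprocess.run(argv, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare_status_runs(checkout_dir, command=("bzr", "status")):
    """Time the same command twice in checkout_dir.

    Returns (first_seconds, second_seconds). On a fresh checkout the
    report above suggests the first value is much larger than the second.
    """
    first = time_command(list(command), cwd=checkout_dir)
    second = time_command(list(command), cwd=checkout_dir)
    return first, second
```

Calling `compare_status_runs("checkout")` immediately after the checkout step returns the two durations, which makes it easier to attach concrete numbers to a report.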

Even on a branch with a smaller number of files it seems to always rebuild the dirstate on the first status (it just takes much less time to do so!). When I did this on a branch with 500 files I saw the following messages in the log for the "checkout" command - I don't know if they're related to the issue:

5.516 Adding the key (<bzrlib.btree_index.BTreeGraphIndex object at 0x01558DD0>, 1360142, 42699765) to an LRUSizeCache failed. value 85901221 is too big to fit in a the cache with size 41943040 52428800
16.438 Adding the key (<bzrlib.btree_index.BTreeGraphIndex object at 0x01558DD0>, 105598514, 23475379) to an LRUSizeCache failed. value 47154148 is too big to fit in a the cache with size 41943040 52428800

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 488172] [NEW] bzr status slow after creating checkout

I'm pretty sure this is a duplicate, but I don't remember the number :(.

Anyhow, here is the cause:

We can't cache the stat fingerprint for new files, only files whose stat
value is far enough back in time that file system granularity allows us
to detect if the file is modified subsequently.
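
The rule described above can be sketched roughly as follows. This is an illustration of the idea, not bzrlib's actual code: the function names and the 3-second window are assumptions chosen for the example.

```python
import time

# Assumed safety window in seconds. It only needs to be larger than the
# coarsest timestamp granularity of the file systems in use.
CUTOFF_WINDOW_SECONDS = 3


def fingerprint(st):
    """Reduce an os.stat() result to a small comparable tuple."""
    return (st.st_size, int(st.st_mtime), int(st.st_ctime))


def is_safe_to_cache(st, now=None, window=CUTOFF_WINDOW_SECONDS):
    """Return True if the file's timestamps are old enough to trust.

    A file modified within the last `window` seconds could be modified
    again without its timestamp visibly changing, so a hash cached
    against that fingerprint might silently go stale.
    """
    if now is None:
        now = time.time()
    cutoff = int(now) - window
    return st.st_mtime < cutoff and st.st_ctime < cutoff
```

Under this rule every file written by a fresh checkout is "too new" when the checkout finishes, so nothing is cached and the first status has to re-read and re-hash the whole tree.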

As we write to a limbo area it is possible for us to verify the stat
information for files we are about to move into the working tree and
capture that: on a big project (such as yours sounds to be) it's very
likely that we would be able to get usable fingerprints for much of the
tree doing this.

 status confirmed
 importance medium
 tags transform

Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Gareth White (gwhite-deactivatedaccount) wrote : Re: bzr status slow after creating checkout

Thanks for the explanation. Is there any documentation on how the dirstate cache works and when it's updated? We've had several cases here where bzr decided to regenerate the whole cache even after the initial "status" and it would be good to know what we can do to help avoid this when possible (since it can take a while).

BTW, there are some comments at the bottom of 380202 which sound similar to this issue but I can't find a separate bug for it.

It would also be great if there was a progress bar for "status" just so we knew it was doing something - I'll enter a separate bug for that!

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 488172] Re: bzr status slow after creating checkout

2009/11/26 Gareth White <email address hidden>:
> Thanks for the explanation. Is there any documentation on how the
> dirstate cache works and when it's updated?

There is some in the developer docs and some comments in the source,
and you can find some more bugs relating to dirstate caching.

I thought there was already a bug for dirstate being filled when it's
created, but I can't find one. So this can be it.

> We've had several cases here
> where bzr decided to regenerate the whole cache even after the initial
> "status" and it would be good to know what we can do help avoid this
> when possible (since it can take a while).

If you can characterize the other cases that would help.

--
Martin <http://launchpad.net/~mbp/>

Martin Pool (mbp)
tags: added: dirstate transform
summary: - bzr status slow after creating checkout
+ building a tree should populate dirstate hashcache
Revision history for this message
Gareth White (gwhite-deactivatedaccount) wrote :

> If you can characterize the other cases that would help.

One case was when the checkout had been moved to a different location (I believe there's already an issue for this). Another case was when the checkout hadn't been used for a few weeks - the next time "bzr status" was used it decided to rebuild the dirstate. If it happens again I'll try to get more information and raise a separate bug.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 488172] Re: building a tree should populate dirstate hashcache

2009/11/26 Gareth White <email address hidden>:
>> If you can characterize the other cases that would help.
>
> One case was when the checkout had been moved to a different location (I
> believe there's already an issue for this).

If it was copied to a different filesystem then rehashing it was
probably unavoidable.

> Another case was when the
> checkout hadn't been used for a few weeks - the next time "bzr status"
> was used it decided to rebuild the dirstate. If it happens again I'll
> try to get more information and raise a separate bug.

There are no 'staleness' criteria other than whether the files have been changed.

--
Martin <http://launchpad.net/~mbp/>

Revision history for this message
Aaron Bentley (abentley) wrote :

I've done some work on the hashcache-from-limbo approach, but it didn't improve performance. See linked branch.

Revision history for this message
Ken Tang (ktlb) wrote :

Can the priority of this be elevated? We need a fix or workaround.
Using Bazaar 2.0.0 we seem to be stuck on this problem. We have a critical branch with lots of files and file changes that we cannot check out as it stalls partway through the "Build phase:Adding file contents". A bzr status on the repository also stalls giving a similar error in the bzr log file.

Here is sample output from the bzr log file:
42.436 Adding the key (<bzrlib.btree_index.BTreeGraphIndex object at 0x0150C110>, 1429460, 8584875) to an LRUSizeCache failed. value 64438539 is too big to fit in a the cache with size 41943040 52428800

55.249 Adding the key (<bzrlib.btree_index.BTreeGraphIndex object at 0x0150C110>, 16887081, 43440320) to an LRUSizeCache failed. value 87426735 is too big to fit in a the cache with size 41943040 52428800

1113.447 return code 0

[10256] 2010-03-04 19:56:24.390 INFO: Client disconnected from pipe - closing connection

229.433 Adding the key (<bzrlib.btree_index.BTreeGraphIndex object at 0x015483F0>, 6453685, 8559445) to an LRUSizeCache failed. value 64034471 is too big to fit in a the cache with size 41943040 52428800

Revision history for this message
Andrew Bennetts (spiv) wrote :

Ken: Please file a new bug report. That appears to be an unrelated problem.

Revision history for this message
Martin Pool (mbp) wrote :

Hi Ken,

Are you saying that it actually permanently hangs, or just that it takes a long time to update?

If it's hanging then please press Ctrl-C and attach the traceback that will be in .bzr.log.

Revision history for this message
Gareth White (gwhite-deactivatedaccount) wrote :

I think the original bug here may be a duplicate of 740932. Perhaps someone can confirm?

Revision history for this message
Martin Packman (gz) wrote :

Thanks Gareth, indeed it looks like they are about the same problem, which should be fixed in 2.4 - I expect your original case now has both `bzr st` runs similarly fast?

Revision history for this message
Gareth White (gwhite-deactivatedaccount) wrote :

I'd think it's pretty likely. Unfortunately I'm not really in a position to test it right now!
