TreeTransform.finalize() spends a lot of time deleting stuff that isn't there

Bug #611037 reported by John A Meinel
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Bazaar
Confirmed
Low
Unassigned

Bug Description

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I was looking into the lp:gcc-linaro issues because of its size.

While there, I noticed that it ended up spending a fair amount of time
in the 'delete_any()' code during finalize.

In that instance, there are about 69,000 files in the tree. And it looks
like delete_any() is called for each and every one of them. Which means
that it does 70k calls to:

 if os.path.isdir(path):
  ...
 else:
   os.unlink(path)

And in all of these cases, the path doesn't exist. Nor do any of its
containing directories.

I think we could shortcut a lot of this, by testing to see if a
containing directory exists, and if it doesn't, we don't need to test
any of the subdirectories.

We already know some of this information, because we create files in
their containing directory in limbo when the target doesn't exist. (So
that we can just rename the containing dir, rather than renaming every
object independently.)

I think we could probably change the iteration to just check those
objects that exist at the top level in limbo.

In this particular case, it would change the number of operations from
70k, down to about 36.

 affects bzr
 status confirmed
 importance low
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkxQkDEACgkQJdeBCYSNAAM9yACeMcKvdFl+anX9IHg+qfxF4x6r
j1QAmgMA+I9Cj0yOqvIB7o6CIH/r+4Rn
=fuR/
-----END PGP SIGNATURE-----

Revision history for this message
John A Meinel (jameinel) wrote :

I also saw a fair amount of time spent before the 'Build phase: adding file contents' was displayed. It is also possible that it is spending a lot of time trying to stat all 70k files before it decides whether or not there is going to be a file that is in conflict. Again, it should be possible to notice that a directory is not present, and thus we don't need to check all possible files underneath that dir.

I don't have profiles to specifically support this, but I did do a breakin during the post-build phase and it trapped on 'isdir()'.

ISTR that some of the os.path functions actually do multiple stats. (maybe it was just us doing os.path.isdir(), os.path.isfile(), etc.)

We might want to fix the _delete_any code to do a stat and check if it even exists first.

Martin Pool (mbp)
tags: added: performance treetransform
Martin Pool (mbp)
tags: added: affects-linaro
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.