TreeTransform.finalize() spends a lot of time deleting stuff that isn't there
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Bazaar |
Confirmed
|
Low
|
Unassigned |
Bug Description
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I was looking into the lp:gcc-linaro issues because of its size.
While there, I noticed that it ended up spending a fair amount of time
in the 'delete_any()' code during finalize.
In that instance, there are about 69,000 files in the tree. And it looks
like delete_any() is called for each and every one of them. Which means
that it does 70k calls to:
if os.path.
...
else:
os.unlink(path)
And in all of these cases, the path doesn't exist. Nor do any of its
containing directories.
I think we could shortcut a lot of this, by testing to see if a
containing directory exists, and if it doesn't, we don't need to test
any of the subdirectories.
We already know some of this information, because we create files in
their containing directory in limbo when the target doesn't exist. (So
that we can just rename the containing dir, rather than renaming every
object independently.)
I think we could probably change the iteration to just check those
objects that exist at the top level in limbo.
In this particular case, it would change the number of operations from
70k, down to about 36.
affects bzr
status confirmed
importance low
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://
iEYEARECAAYFAkx
j1QAmgMA+
=fuR/
-----END PGP SIGNATURE-----
tags: | added: performance treetransform |
tags: | added: affects-linaro |
tags: | added: check-for-breezy |
I also saw a fair amount of time spent before the 'Build phase: adding file contents' was displayed. It is also possible that it is spending a lot of time trying to stat all 70k files before it decides whether or not there is going to be a file that is in conflict. Again, it should be possible to notice that a directory is not present, and thus we don't need to check all possible files underneath that dir.
I don't have profiles to specifically support this, but I did do a breakin during the post-build phase and it trapped on 'isdir()'.
ISTR that some of the os.path functions actually do multiple stats. (maybe it was just us doing os.path.isdir(), os.path.isfile(), etc.)
We might want to fix the _delete_any code to do a stat and check if it even exists first.