Discovering tags as part of a push back to a svn repository takes too long

Bug #592981 reported by Max Bowsher
This bug report is a duplicate of: Bug #520694: decent cache format.
This bug affects 1 person
Affects: Bazaar Subversion Plugin
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I have a large svn repository (~410000 revisions, though the individual project I'm working with has only 7728 bzr revisions). As part of a push, there's a 'discovering tags' phase, which is prohibitively time-consuming unless the entire bzr-svn TDB cache is already in the kernel's disk cache (it's vastly faster to cat cache.tdb to /dev/null first and then run the bzr push).
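For anyone who wants to reproduce the workaround without cat, here is a minimal Python sketch of the same idea: read the cache file once and discard the data, so that the kernel has a chance to keep its pages in the disk cache before the push. The function name `warm_page_cache` and the chunk size are my own choices for illustration, and the path to cache.tdb is deliberately left as a parameter because its location depends on the setup.

```python
def warm_page_cache(path, chunk_size=1 << 20):
    """Read the whole file sequentially and throw the data away.

    Equivalent in spirit to `cat FILE > /dev/null`: the only goal is to
    make the kernel pull the file's pages into its disk cache.
    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)  # data is discarded, only the count is kept
    return total


# Hypothetical usage; substitute the real location of cache.tdb:
# warm_page_cache("/path/to/cache.tdb")
```

This only helps if the machine has enough free memory to hold the whole cache file, and the effect lasts only until the pages are evicted again.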

Here is a backtrace taken during the operation:
(Pdb) bt
  /home/maxb/wc/bzr/bzr/2.1/bzr(142)<module>()
-> exit_val = bzrlib.commands.main()
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/commands.py(1133)main()
-> ret = run_bzr_catch_errors(argv)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/commands.py(1148)run_bzr_catch_errors()
-> return exception_to_return_code(run_bzr, argv)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/commands.py(853)exception_to_return_code()
-> return the_callable(*args, **kwargs)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/commands.py(1055)run_bzr()
-> ret = run(*run_argv)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/commands.py(661)run_argv_aliases()
-> return self.run_direct(**all_cmd_args)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/commands.py(665)run_direct()
-> return self._operation.run_simple(*args, **kwargs)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/cleanup.py(122)run_simple()
-> self.cleanups, self.func, *args, **kwargs)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/cleanup.py(156)_do_with_cleanups()
-> result = func(*args, **kwargs)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/builtins.py(1146)run()
-> use_existing_dir=use_existing_dir)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/push.py(141)_show_push_branch()
-> remember, create_prefix)
  /home/maxb/.bazaar/plugins/svn/remote.py(250)push_branch()
-> overwrite=overwrite)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/branch.py(971)push()
-> *args, **kwargs)
  /home/maxb/.bazaar/plugins/svn/branch.py(1011)push()
-> result.tag_conflicts = self.update_tags(overwrite)
  /home/maxb/.bazaar/plugins/svn/branch.py(996)update_tags()
-> return self.source.tags.merge_to(self.target.tags, overwrite)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/tag.py(210)merge_to()
-> dest_dict = to_tags.get_tag_dict()
  /home/maxb/.bazaar/plugins/svn/tags.py(285)get_tag_dict()
-> tag_revmetas = self._get_tag_dict_revmeta()
  /home/maxb/.bazaar/plugins/svn/tags.py(245)_get_tag_dict_revmeta()
-> revnum=self.branch._revnum)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/decorators.py(140)read_locked()
-> result = unbound(self, *args, **kwargs)
  /home/maxb/.bazaar/plugins/svn/repository.py(1156)find_tags()
-> to_revnum=revnum)
  /home/maxb/wc/bzr/bzr/2.1/bzrlib/decorators.py(140)read_locked()
-> result = unbound(self, *args, **kwargs)
  /home/maxb/.bazaar/plugins/svn/repository.py(1104)find_tags_between()
-> for kind, item in self._revmeta_provider.iter_all_changes(layout, mapping.is_branch_or_tag, to_revnum, from_revnum, project=project):
  /home/maxb/.bazaar/plugins/svn/revmeta.py(1419)iter_all_changes()
-> for kind, item in browser:
  /home/maxb/.bazaar/plugins/svn/revmeta.py(927)next()
-> return self.it()
  /home/maxb/.bazaar/plugins/svn/revmeta.py(1118)next()
-> ret = self._iter.next()
  /home/maxb/.bazaar/plugins/svn/revmeta.py(1127)do()
-> for (paths, revnum, revprops) in self._iter_log:
  /home/maxb/.bazaar/plugins/svn/logwalker.py(74)iter_changes()
-> if path == "" or changes.changes_path(revpaths, path, True):
  /home/maxb/.bazaar/plugins/svn/changes.py(87)changes_path()
-> if path_is_child(path, p):
  /home/maxb/.bazaar/plugins/svn/changes.py(27)path_is_child()
-> path.startswith(branch_path+"/"))
> /home/maxb/wc/bzr/bzr/2.1/bzrlib/breakin.py(41)_debug()
-> signal.signal(_breakin_signal_number, _debug)

Max Bowsher (maxb) wrote :

I'm not sure this can really be said to be a duplicate of "decent cache format". Whilst that might ameliorate the situation a bit, I think the real problem here is that bzr-svn is doing a scan of the entire repository, not scoping its search correctly to the single project involved.

Jelmer Vernooij (jelmer) wrote :

bzr-svn doesn't scan the entire repository; it asks the cache for the revisions that changed something under a particular path (the path for your project), and the cache has to process its full contents to find those entries. A smarter cache format will eliminate that need.
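The difference can be shown with a small, self-contained Python sketch. It is not bzr-svn's code and not the real cache format: the in-memory `log` dict (revision number mapped to changed paths), the simplified `path_is_child` helper, and the prefix index are all assumptions made for illustration. `scan_all` mimics a cache that must look at every stored revision to answer "which revisions touched this path?", while `build_index` and `lookup` show how a format keyed by path could answer the same question without touching unrelated projects.

```python
from collections import defaultdict


def path_is_child(branch_path, path):
    """True if path is branch_path itself or lies beneath it (simplified)."""
    return path == branch_path or path.startswith(branch_path + "/")


def scan_all(log, prefix):
    """Linear scan: every cached revision is examined, whatever its project."""
    return [revnum for revnum, paths in sorted(log.items())
            if any(path_is_child(prefix, p) for p in paths)]


def build_index(log):
    """Map each changed path and each of its ancestors to the revisions touching it."""
    index = defaultdict(set)
    for revnum, paths in log.items():
        for p in paths:
            parts = p.split("/")
            for i in range(1, len(parts) + 1):
                index["/".join(parts[:i])].add(revnum)
    return index


def lookup(index, prefix):
    """Indexed query: cost depends on the matches, not on the repository size."""
    return sorted(index.get(prefix, ()))


# Toy data: two projects sharing one repository (hypothetical paths).
log = {
    1: ["projA/trunk/a.txt"],
    2: ["projB/trunk/b.txt"],
    3: ["projA/tags/1.0"],
}
index = build_index(log)
print(scan_all(log, "projA"))   # walks all three revisions
print(lookup(index, "projA"))   # same answer from a single dictionary lookup
```

Both queries return the same revisions; the scan cost grows with the total number of cached revisions (~410000 in the report), whereas the indexed lookup depends only on the entries for the requested path. The sketch assumes a non-empty path prefix and ignores the extra storage an index needs.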
