Comment 4 for bug 721250

John A Meinel (jameinel) wrote :

Related to bug #374730 and bug #503071

There are a few conflicting issues. Which approach wins depends partly on the average size of changes vs the average size of SUBDIR.

If you look at the callgrind, it is pretty clear that the bulk of the time is spent in _filtered_revision_trees, which is the point where we expand SUBDIR to all files that are children of SUBDIR, for every revision.

One thing I've tried in the past is caching that value, keyed on the tree's file_id=>path map. If a given revision and its parent have identical tree shape, you don't have to recompute which children are available.
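A minimal sketch of that caching idea, not the actual bzr code. ShapeCache, children_of and shape_key are hypothetical names, trees are modelled as plain file_id=>path dicts, and shape_key stands in for whatever cheap token a repository could offer to say "same tree shape as before":

```python
def children_of(paths, subdir):
    """Return the file ids whose path is subdir itself or lies beneath it."""
    prefix = subdir.rstrip("/") + "/"
    return frozenset(
        fid for fid, path in paths.items()
        if path == subdir or path.startswith(prefix)
    )


class ShapeCache:
    """Reuse the last expansion of SUBDIR while the tree shape is unchanged."""

    def __init__(self, subdir):
        self.subdir = subdir
        self._shape_key = None
        self._children = None
        self.misses = 0  # how many times we really had to expand

    def interesting_ids(self, shape_key, paths):
        # shape_key is an assumption: any cheap value that is equal for two
        # revisions exactly when their file_id=>path maps are equal.
        if self._children is None or shape_key != self._shape_key:
            self._shape_key = shape_key
            self._children = children_of(paths, self.subdir)
            self.misses += 1
        return self._children
```

The saving only materialises if shape_key is cheaper to obtain than walking the whole map, which is the part that depends on the repository format.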

Right now, the loop looks something like:

  for rev in requested_revisions:
    t = get_tree(rev)
    interesting = find_children_and_direct_parents(t, SUBDIR)
    inv = filter_leaving_only(t, interesting)
    delta = compute_delta(prev_inv, inv)
    prev_inv = inv
    if is_interesting(delta):
      yield rev, delta

The other way to do it is something like:

  for rev in requested_revisions:
    delta = get_delta(rev, rev.parent_rev)
    if is_interesting(delta, filter_on=SUBDIR):
      yield rev, filter(delta)

Or something along those lines. The former tends to do well when SUBDIR is a very small fraction of the overall tree.

The best way to do it would be something like:

  for rev in requested_revisions:
    delta = get_partial_delta(rev, rev.parent_rev, filter_on=SUBDIR)
    if delta:
      yield rev, delta

The idea is that we could push the filtering-by-subdir into the code that computes the changes between two revisions, so it can ignore changes outside of SUBDIR.
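To make the contract of get_partial_delta concrete, here is a toy version over in-memory trees. It is only a sketch under assumptions: trees are dicts of file_id => (path, content_hash), which is not how bzr stores inventories, and the helper name under is made up. This version still visits every entry; a real implementation would skip unchanged regions in the storage layer.

```python
def under(path, subdir):
    """True if path is subdir itself or lives somewhere beneath it."""
    return path == subdir or path.startswith(subdir.rstrip("/") + "/")


def get_partial_delta(tree, parent_tree, filter_on):
    """Return {file_id: (old_entry, new_entry)} for changes touching filter_on.

    Each entry is a (path, content_hash) tuple, or None if the file does not
    exist on that side. A change counts if either side is under filter_on,
    so files renamed into or out of the subdir are reported too.
    """
    delta = {}
    for fid in tree.keys() | parent_tree.keys():
        old = parent_tree.get(fid)
        new = tree.get(fid)
        if old == new:
            continue  # unchanged entry, never interesting
        touches_old = old is not None and under(old[0], filter_on)
        touches_new = new is not None and under(new[0], filter_on)
        if touches_old or touches_new:
            delta[fid] = (old, new)
    return delta
```

An empty dict is falsy, so the "if delta:" test in the loop above works unchanged with this return type.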

There is enough state involved that we would likely want a DeltaSearch object that is repository specific, so that CHK repos could customize it for the specifics of how the data is stored. (filter_on=SUBDIR can then know when the tree shape doesn't change between revisions, allowing state to be cached between the revisions that we care about.)
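Roughly the shape I have in mind, as a sketch only. No DeltaSearch class exists today; the method names, the (rev_id, parent_id) pairs and the DictDeltaSearch toy backend are all assumptions for illustration. The base class owns the loop, and each repository format would supply its own partial_delta:

```python
from abc import ABC, abstractmethod


class DeltaSearch(ABC):
    """Hypothetical base: walks revisions and yields those touching subdir."""

    def __init__(self, subdir):
        self.subdir = subdir.rstrip("/")

    def _is_under(self, path):
        return path == self.subdir or path.startswith(self.subdir + "/")

    @abstractmethod
    def partial_delta(self, rev_id, parent_id):
        """Return the changes between two revisions, limited to self.subdir."""

    def search(self, revisions):
        # revisions is an iterable of (rev_id, parent_id) pairs.
        for rev_id, parent_id in revisions:
            delta = self.partial_delta(rev_id, parent_id)
            if delta:
                yield rev_id, delta


class DictDeltaSearch(DeltaSearch):
    """Toy backend: each revision is a dict of path -> content."""

    def __init__(self, subdir, trees):
        super().__init__(subdir)
        self.trees = trees

    def partial_delta(self, rev_id, parent_id):
        new = self.trees[rev_id]
        old = self.trees.get(parent_id, {})  # no parent: compare to empty tree
        changed = {p for p in new.keys() | old.keys() if new.get(p) != old.get(p)}
        return sorted(p for p in changed if self._is_under(p))
```

A CHK-specific subclass would be the place to keep per-search state, such as a cached expansion of SUBDIR that is reused while the tree shape stays the same.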

Anyway, I have a fair amount of (old) knowledge in this space, but not enough time to focus on it. I would be happy to mentor someone who wants to get into the details.