So the crucial things here are:
- we want every 'revision in a repository' to be able to produce a delta-stream for it, for fetching, so that we know what texts to copy, and what texts to error on if they are absent.
- that means we depend on having *enough* inventory data for the parents present to do a set difference against the parent revisions (and ghost revisions mean we have more file texts present to compensate) - see the sketch after this list.
- we don't strictly need to require that a repository have the full inventory for a revision that is present - we can expect stacking to take care of that (stacking just can't be used during 'fetch').
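Here is a minimal sketch of that set-difference idea. The inventories are plain dicts mapping file ids to text keys; the function name and data shapes are illustrative only, not the real bzrlib API:

    # Sketch only: inventories as dicts of file_id -> (file_id, text_revision).
    def texts_for_delta(revision_inventory, parent_inventories):
        """Return the text keys a fetch must copy for one revision.

        A text is needed when the revision references it but none of the
        parents we have inventory data for does.  A ghost parent (no
        inventory at all) contributes nothing here, so more texts fall into
        the 'needed' set - the compensation described above.
        """
        present_in_parents = set()
        for parent_inv in parent_inventories:
            present_in_parents.update(parent_inv.values())
        return set(revision_inventory.values()) - present_in_parents

    # Example: one changed file and one unchanged file against a single parent.
    parent = {'file-a': ('file-a', 'rev-1'), 'file-b': ('file-b', 'rev-1')}
    child = {'file-a': ('file-a', 'rev-2'), 'file-b': ('file-b', 'rev-1')}
    print(texts_for_delta(child, [parent]))   # {('file-a', 'rev-2')}
    print(texts_for_delta(child, []))         # both texts: the parent is a ghost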
So there are two issues here:
- a new stacked repository currently has to copy up the full inventory when its first revision is added, because that's what we do today.
- when we copy up the inventory data for adjacent parents, we take a very simple approach to what is copied (the main thing is simply that no one is focusing on this part of push performance at the moment: we always knew we had more to do - the streaming API can incrementally request more data, and we do multiple round trips already; the main thing is to avoid explosive scaling in the round-trip count - see the round-trip sketch below). Falling back to VFS will suck immensely - more than a 1MB stream.
Until *both* of these are fixed, the first commit pushed to a new stacked branch will be size(inventory), not size(delta). Once the second is fixed, merges pushed to existing stacked branches will be much smaller (because the inventories for the merged revisions won't need to be fully copied), but we'll still want a full inventory for every revision.
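A rough sketch of how incremental requests can avoid explosive round-trip counts: rather than one VFS-style request per missing parent inventory, the missing keys are batched, so the number of round trips grows with the number of batches rather than the number of revisions. The request_inventories callable below stands in for whatever streaming call the smart server exposes; it is an assumption for illustration, not the real API.

    # Sketch only: batch missing inventory keys to bound the round-trip count.
    def fetch_missing_inventories(missing_keys, request_inventories, batch_size=100):
        """Fetch missing inventories in a bounded number of round trips."""
        fetched = {}
        missing_keys = list(missing_keys)
        for start in range(0, len(missing_keys), batch_size):
            batch = missing_keys[start:start + batch_size]
            # One round trip per batch; each response may be a large stream,
            # but the trip count stays proportional to the batch count.
            fetched.update(request_inventories(batch))
        return fetched

    # Example with a fake transport call: 3 keys, batch_size=2 -> 2 round trips.
    calls = []
    def fake_request(keys):
        calls.append(keys)
        return {k: '<inventory bytes for %s>' % k for k in keys}

    fetch_missing_inventories(['inv-1', 'inv-2', 'inv-3'], fake_request, batch_size=2)
    print(len(calls))  # 2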