The linked branch, lp:~spiv/bzr/stacked-fetch-same-chks, adds a failing test for this case (or at least a case *like* this). The challenge is figuring out what the fix should be.
This is a pretty fiddly problem, but I'll try to explain it as clearly and concisely as possible. When fetching multiple heads from a stacked repository via a smart server, the assumptions we use to stream only O(changes) data rather than O(tree) data aren't necessarily correct. The assumption is basically “we can omit transferring records that are unchanged in the parent revisions of the set we are sending”, and the justification is that the parent will be present anyway: we assume another fetch in the chain of stacked fetches will include it (or that the target already has it).
The problem is that justification isn't true: the other fetches in the stacked chain, following the same assumption, can exclude the same CHK records, so it is possible they are never sent. It is possible to construct fetches that always have ‘present parent inventories’ at every step in the stacking chain, by starting with a fetch with multiple heads. e.g. a revision graph like:
* D has parents: C, A
* C has parents: B
* B has parents: A
* A has parents: (none)
Then fetch [D,A] from the first repo (with C as a present parent inventory), and then [C,B] from the next (A is already present, so it is a present parent inventory). If A and C share some records, both fetches will conclude they can omit those records. That then triggers BzrCheckError during commit_write_group (thank goodness for that integrity check!).
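The failure mode above can be sketched in a few lines of Python. This is a hypothetical simulation, not bzrlib code: the record sets and the `fetch` helper are made up for illustration, with A and C deliberately sharing a record.

```python
# Hypothetical model (not bzrlib code): each fetch sends the CHK records
# reachable from the requested revisions, minus those reachable from a
# "present parent" inventory, on the assumption that another fetch in
# the stacking chain will supply them.

# CHK record keys reachable from each revision's inventory; A and C
# share 'shared-leaf', as described above.
records = {
    'A': {'root-A', 'shared-leaf'},
    'B': {'root-B'},
    'C': {'root-C', 'shared-leaf'},
    'D': {'root-D'},
}

def fetch(wanted, present_parents):
    """Simulate the O(changes) filter: omit records also reachable
    from a present parent inventory."""
    interesting = set().union(*(records[r] for r in wanted))
    uninteresting = set().union(*(records[r] for r in present_parents))
    return interesting - uninteresting

# Fetch [D, A] from the first repo; C is a present parent inventory.
sent1 = fetch(['D', 'A'], ['C'])
# Fetch [C, B] from the fallback repo; A is a present parent inventory.
sent2 = fetch(['C', 'B'], ['A'])

# Both fetches omit 'shared-leaf', so it is never transferred at all:
missing = set().union(*records.values()) - (sent1 | sent2)
print(missing)  # {'shared-leaf'}
```

Each fetch is locally correct under its own assumption; it is only the combination across the stacking chain that loses the shared record.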
Examples of records that can be omitted by this erroneous logic:
* the parent_id_basename_to_file_id map can easily be shared between multiple revisions ('bzr revert -r -2 && bzr commit'), which shares the CHK root key
* an id_to_entry map with some LeafNodes in common (e.g. it's possible to have all revisions sharing a LeafNode with just the tree root, and have that leaf be the same in all revisions).
* the texts to send are determined by the id_to_entry records we send, so if any id_to_entry nodes are missing, then so are some texts.
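The last point is worth making concrete. A hypothetical sketch (the leaf contents and helper are invented for illustration, not bzrlib code) of how an omitted id_to_entry leaf silently drops texts too:

```python
# Hypothetical sketch (not bzrlib code): text keys to stream are derived
# from the id_to_entry CHK leaves we actually send, so a leaf wrongly
# filtered as "uninteresting" takes its texts with it.

# Each id_to_entry leaf references (file_id, revision) text keys.
leaves = {
    'leaf-1': [('file-a', 'rev-1')],
    'leaf-2': [('file-b', 'rev-2')],
}

def texts_to_send(sent_leaves):
    # Only texts referenced by leaves that are streamed get scheduled.
    return {key for leaf in sent_leaves for key in leaves[leaf]}

# If 'leaf-2' is wrongly omitted, ('file-b', 'rev-2') is never sent:
print(texts_to_send(['leaf-1']))  # {('file-a', 'rev-1')}
```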
The challenge is how we can fix this while still sending only O(changes) data (which is a key benefit of using stacking). What doesn't work:
* simply re-requesting those inventories in the 'fix missing inventories' step, because that doesn't cause texts to be transferred (perhaps if we allowed a *third* fetch round to fixup those missing texts then maybe that would work? Ugh.)
* discarding known-interesting CHK roots from the uninteresting-roots set before calling iter_interesting_nodes: that still allows shared non-root CHK nodes to be omitted.
* dropping the filtering of “uninteresting” CHK nodes: that would regress to sending O(tree)-sized data.
Ideas welcome!