The bug is in KnitVersionedFiles.insert_record_stream, or at least in its interaction with add_records(..., missing_compression_parents=True). insert_record_stream doesn't pass all the buffered records in one batch, but in several (one per key being depended on). This in turn breaks an assumption in add_records, so that depending on the ordering of keys it's possible for it to think a key has a missing parent, when in fact it just has a buffered parent (because that parent, or one of that parent's parents, etc., has a missing parent). Probably this bug is so finicky in part because it depends on the order in which a Python dict is iterated, which in turn is perhaps partly dependent on the order in which records are received from the source.
The simple fix is to accumulate all buffered_index_entries into one list before calling add_records at the end of insert_record_stream.
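To illustrate the ordering dependence, here is a toy model rather than the real bzrlib code. ToyIndex, insert_per_key and insert_single_batch are hypothetical names, and the bookkeeping in ToyIndex.add_records is only my assumption about the kind of "any parent not in this batch is missing" logic that would produce this behaviour.

```python
class ToyIndex:
    """Hypothetical stand-in for a knit index; not the real bzrlib class."""

    def __init__(self):
        self.missing_parents = set()

    def add_records(self, records):
        """records is a list of (key, compression_parent) pairs.

        Assumed behaviour: a parent that is not a key in this same
        batch is treated as missing from the repository.
        """
        keys = {key for key, _ in records}
        parents = {parent for _, parent in records if parent is not None}
        self.missing_parents -= keys
        self.missing_parents |= parents - keys


# Buffered entries, grouped by the parent they are waiting on.
buffered = {
    "A": [("B", "A")],  # B waits on A, which is genuinely missing
    "B": [("C", "B")],  # C waits on B, which is merely buffered
}


def insert_per_key(buffered, order):
    """One add_records call per depended-on key (the buggy pattern)."""
    index = ToyIndex()
    for parent in order:
        index.add_records(buffered[parent])
    return index.missing_parents


def insert_single_batch(buffered):
    """Accumulate everything into one list, then call add_records once."""
    index = ToyIndex()
    all_entries = []
    for entries in buffered.values():
        all_entries.extend(entries)
    index.add_records(all_entries)
    return index.missing_parents


print(insert_per_key(buffered, ["A", "B"]))  # B wrongly reported missing
print(insert_per_key(buffered, ["B", "A"]))  # only A reported missing
print(insert_single_batch(buffered))         # only A, whatever the order
```

In this sketch the per-key version reports both A and B as missing when the "A" group happens to be processed first, and only A when the "B" group comes first. The single-batch version reports only A either way, since B is always seen as a key in the same call.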
This wouldn't affect 2a repositories, so that's another reason for people to upgrade...
I think this bug has been lurking for a long time, although it requires stacking and HPSS to provoke, as well as some bad luck with how Python orders a dict. Hmm, perhaps it was masked before because we weren't using 'unordered' fetches as often?