Reported by abderrahim on GitHub:
I made some patches a while ago; they are in the pack branch of my fork. As far as I'm concerned, all but the last patch are ready.
Basically, I've fixed and enabled the commented-out code for writing deltas. The result is very slow, so I've tried to speed things up by various means: reusing deltas from existing packs, reusing the SequenceMatcher, etc. The last patch contains a C extension for diffing (actually, for finding blocks that match). I'm not sure this is the best idea; it reduced the time for writing a pack (a push to an empty repository) from 2m20 to 1m30. I still find this excessive, so I tried other things. A line-based diff (as opposed to a character-based one) is much faster, only 10 seconds, but the pack is somewhat larger (216k vs 161k). A "word"-based diff (splitting on spaces) is somewhere in the middle (30 seconds, 166k). I think that this, or some "adaptive" method (like splitting on the most common character), may be the best idea. (The line-based and word-based variants both use pure Python.)
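To make the character/word/line trade-off concrete, here is a minimal sketch of the idea using difflib.SequenceMatcher on token lists. It is not the code from the patches, and it does not emit the real git pack delta encoding: the function names (tokenize, delta_ops, apply_ops) and the ("copy", offset, length) / ("insert", data) tuples are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def tokenize(data, mode):
    """Split bytes into tokens; joining the tokens gives back `data`."""
    if mode == "char":
        return [data[i:i + 1] for i in range(len(data))]
    if mode == "line":
        return data.splitlines(keepends=True)
    if mode == "word":
        # Split on spaces, keeping each space attached to the token before it.
        parts = data.split(b" ")
        tokens = [part + b" " for part in parts[:-1]]
        if parts[-1]:
            tokens.append(parts[-1])
        return tokens
    raise ValueError("unknown mode: %r" % (mode,))


def delta_ops(base, target, mode):
    """Return copy/insert operations that rebuild `target` from `base`."""
    a = tokenize(base, mode)
    b = tokenize(target, mode)

    # Byte offset in `base` at which each token starts (plus the end offset).
    offsets = [0]
    for token in a:
        offsets.append(offsets[-1] + len(token))

    ops = []
    pos = 0  # index of the first token of `b` not yet covered by an op
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # The final block is always a zero-length sentinel at (len(a), len(b)),
    # which flushes any trailing unmatched tokens as an insert.
    for i, j, n in matcher.get_matching_blocks():
        if j > pos:
            ops.append(("insert", b"".join(b[pos:j])))
        if n:
            ops.append(("copy", offsets[i], offsets[i + n] - offsets[i]))
        pos = j + n
    return ops


def apply_ops(base, ops):
    """Rebuild the target from `base` and the operations from delta_ops."""
    out = []
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out.append(base[offset:offset + length])
        else:
            out.append(op[1])
    return b"".join(out)
```

Coarser tokens give the matcher far fewer items to compare, which is consistent with the timings above, but a match can then never start or end inside a token, so more data ends up in insert operations and the pack grows.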
I think we should aim to do something similar to what C git is doing - their implementation has been heavily optimized. Have you looked at what they're doing at all?