The revision_history function is implemented in a really slow manner, which also uncovers some things in the library that seem suboptimal to me.
The main issues I see are (a rough sketch of the pattern follows the list):
* Commit objects are compared to each other (really inefficient)
* A search is done on a list instead of a set() or similar to see if a commit already exists
* (sorting is done by a pretty dumb insertion sort, not sure what the effect of this is)
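To make the problem concrete, here is a paraphrase of what I understand the current pattern to be - this is a hypothetical sketch, not the actual library code, and it assumes `repo.get_object(sha)` returns a Commit with `parents` (a list of parent shas) and `commit_time` attributes:

```python
def revision_history_slow(repo, head_sha):
    history = []
    pending = [repo.get_object(head_sha)]
    while pending:
        commit = pending.pop()
        if commit in history:  # linear scan; each step calls Commit.__eq__
            continue
        # insertion sort: walk the list to find the slot (newest first)
        i = 0
        while i < len(history) and history[i].commit_time > commit.commit_time:
            i += 1
        history.insert(i, commit)
        pending.extend(repo.get_object(p) for p in commit.parents)
    return history
```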
With the first point, one issue is that comparing two commit objects needs to call the __eq__ function, which may result in the hash being recalculated (which, in theory, is not necessary, as the hash was used to retrieve the data in the first place).
It is not very hard to work around these issues by just storing the hash as a string and then doing the checks in the function based on this. However, I suspect that a better way may exist.
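For example, something along these lines (a minimal sketch under the same assumptions about `get_object`, `parents`, and `commit_time` as above):

```python
def revision_history(repo, head_sha):
    seen = set()   # sha strings: O(1) membership test, no __eq__ on commits
    history = []   # (commit_time, commit) pairs, sorted once at the end
    pending = [head_sha]
    while pending:
        sha = pending.pop()
        if sha in seen:
            continue
        seen.add(sha)
        commit = repo.get_object(sha)
        history.append((commit.commit_time, commit))
        pending.extend(commit.parents)
    # one sort, newest first, instead of insertion-sorting as we go
    history.sort(key=lambda entry: entry[0], reverse=True)
    return [commit for _, commit in history]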
For this I considered the following:
* Commit objects (and other objects) that are written to the repository are basically immutable. They should never be modified.
* A commit that may be modified does not belong to any repository
* If Commits may not be modified, the same object could be returned when an object is retrieved multiple times
Now, if that happens, i.e. the same object is returned when repo.get_object() is called twice with the same ID, then the check whether two commits are the same becomes a simple Python object identity test.
For this to work, one would need to keep a hash table that contains all objects retrieved from the repository. Of course it needs to use weak references, so that the objects are freed again once they are no longer needed.
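A minimal sketch of such a cache, using weakref.WeakValueDictionary (here `_load_object` is a hypothetical stand-in for whatever actually reads the object from disk):

```python
import weakref

class CachingObjectStore:
    """Sketch: return the same object for repeated lookups of one sha."""

    def __init__(self):
        # sha -> object; an entry vanishes automatically once nothing
        # else holds a reference to the object.
        self._cache = weakref.WeakValueDictionary()

    def get_object(self, sha):
        obj = self._cache.get(sha)
        if obj is None:
            obj = self._load_object(sha)  # hypothetical disk/pack lookup
            self._cache[sha] = obj
        return obj

    def _load_object(self, sha):
        raise NotImplementedError("stand-in for the real object reader")
```

With this in place, `repo.get_object(sha) is repo.get_object(sha)` holds, so comparing two retrieved commits degrades to a pointer comparison.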
Comparing commits just compares their hashes, so this shouldn't be a problem - it's not significantly less efficient than working with those hashes as strings directly. We eventually retrieve the full contents of the commit anyway, to be able to see what its parents and commit timestamp are.
Commits are only retrieved once for each sha anyway, so I don't see the point of caching them.
history is a list intentionally, so it can be sorted. Optionally we could keep a set of shas that have already been processed, so that we don't have to iterate the list each time.
Another optimization I see is using binary search to find the right place to insert a commit sha.
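A sketch combining both ideas, using the bisect module (this assumes the commit object exposes its sha as `.id` and its timestamp as `.commit_time`):

```python
import bisect

def add_commit(history, times, seen, commit):
    """Keep history sorted by commit_time (oldest first); 'times' is a
    parallel list of timestamps so bisect has something to search on."""
    if commit.id in seen:  # O(1) instead of scanning the list
        return
    seen.add(commit.id)
    # O(log n) to locate the slot; note that list.insert itself is still
    # O(n), since the trailing elements have to shift.
    i = bisect.bisect_right(times, commit.commit_time)
    times.insert(i, commit.commit_time)
    history.insert(i, commit)
```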