Comment 4 for bug 842695

Martin von Gagern (gagern) wrote :

OK, finally got this thing analyzed and wrote a test case for it. It's in lp:~gagern/bzr/bug842695-log-dir and can be used as a basis for fixing this bug. I wrote a rather verbose commit message, which I'll simply paste here:

When restricting a log to a given directory, _generate_deltas will be used to find out matching revisions. It does so using repository.get_deltas_for_revisions, which describes the difference that the given revision introduced with respect to its left-hand parent. So files introduced by the right-hand parent of a commit will be considered "added" by the delta. This can lead to false positives: commits are reported as touching a given dir although they only merged stuff introducing these files.
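To make the left-hand-parent effect concrete, here is a minimal sketch in plain Python. It is a toy model, not bzrlib code: trees are plain dicts mapping a path to its content, and the function name and paths are made up for illustration.

```python
def added_vs_left_parent(tree, parent_trees):
    """Return the paths present in tree but absent from the left-hand parent.

    Toy stand-in for a delta computed against the left-hand parent only;
    the other parents are deliberately ignored.
    """
    left = parent_trees[0] if parent_trees else {}
    return sorted(set(tree) - set(left))


# A side branch that forked before plugin_dir existed (hypothetical paths).
side_tip = {"NEWS": "n0"}
# The trunk tip, which already contains plugin_dir.
trunk_tip = {"NEWS": "n1", "plugin_dir/file.py": "c1"}
# The side branch merges trunk: left parent is side_tip, right parent is trunk_tip.
merge_tree = {"NEWS": "n1", "plugin_dir/file.py": "c1"}

# The merge looks as if it "added" the file, although it only merged it.
false_positive = added_vs_left_parent(merge_tree, [side_tip, trunk_tip])
```

In this model the merge commit on the side branch is indistinguishable from a commit that really created the file, which is the false positive described above.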

Note that in some cases, this is expected behaviour: every merge on the route from a modification to its first merge into mainline should be considered touched by that modification. But for stuff already included in mainline, those modifications should not be reported again if they are merged into some side line. In other words, every change should have one direct child reporting it, but no more.

_generate_deltas apparently processes revisions in batches of 200. After each batch, the found additions are removed from the fileid_set and won't be tracked in the next batch. Processing terminates if there are no more files to track. Due to this logic, a false positive in one batch can lead to false negatives later on, as the file gets removed too early, and its actual addition is therefore lost.
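The interaction between batching and false positives can be sketched as follows. This is a simplified model of the behaviour described above, written in plain Python, and not the actual _generate_deltas code: each revision is a hypothetical (rev_id, added, modified) tuple listed newest first, and the names are invented for the example.

```python
def filtered_log(revisions, file_ids, batch_size=200):
    """Report revisions whose delta touches a tracked file, batch by batch.

    After each batch, files seen as "added" in that batch are dropped from
    the tracked set; processing stops once nothing is left to track.
    """
    tracked = set(file_ids)
    reported = []
    for start in range(0, len(revisions), batch_size):
        if not tracked:
            break  # nothing left to track, older revisions are never examined
        added_in_batch = set()
        for rev_id, added, modified in revisions[start:start + batch_size]:
            if (added | modified) & tracked:
                reported.append(rev_id)
            added_in_batch |= added & tracked
        tracked -= added_in_batch
    return reported


# Newest first. "s2" is a side-branch merge from trunk whose delta against
# its left-hand parent shows file "f" as added (the false positive).
history = [
    ("r5", set(), {"f"}),   # modifies f
    ("s2", {"f"}, set()),   # false positive: only merged f
    ("r3", set(), {"f"}),   # modifies f
    ("r2", {"f"}, set()),   # the actual addition of f
    ("r1", set(), set()),   # unrelated
]

small_batches = filtered_log(history, {"f"}, batch_size=2)
one_batch = filtered_log(history, {"f"}, batch_size=200)
```

With a batch size of 2, the false positive "s2" ends the first batch with nothing left to track, so "r3" and the real addition "r2" are never reported. With everything in one batch they all show up, which is why the symptom depends on where the batch boundaries happen to fall.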

So much is in the commit message. The bug as described above occurred because some branches forked off lp:bzr before the bash_completion_plugin dir was added, and were merged back into lp:bzr after that. At some point, they themselves merged from lp:bzr, and thus had a lot of additions compared to their own mainline. Those additions caused the log to report these merges, and also to miss later merges that I expected.

I guess that before addressing this, we should make certain that we agree on what we expect. To do so, I'd like to define two terms for the scope of this discussion here. I'd say a commit has an ORIGINAL modification to a file if none of its parents has the same file with the same content. In contrast to that, a modification is INHERITED if the left parent doesn't have the same file content, but some other parent does. Obviously, only merges can inherit modifications. The same terms can be applied to directories by comparing their whole recursive content.

Two important observations: if a file is edited in two branches, then merging them will cause a 3-way merge, resulting in a file different from the version in either branch. So the file content merge produces an ORIGINAL modification. On the other hand, if a branch never modifies a given file itself, then repeated merges from a single trunk will never cause an ORIGINAL modification for that file.
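The two terms and both observations can be checked against a small sketch. Again this is plain Python over a toy model and not anything from bzrlib: a file's content is just a string (None if the file is absent), parents are listed left parent first, and the "UNCHANGED" label is my own addition for the case where the left parent already has the same content.

```python
def classify(content, parent_contents):
    """Classify a commit's version of a file relative to its parents.

    content: the file content in this commit, or None if absent.
    parent_contents: the file content in each parent, left parent first.
    """
    left = parent_contents[0] if parent_contents else None
    if content == left:
        return "UNCHANGED"   # no modification relative to the left parent
    if content in parent_contents[1:]:
        return "INHERITED"   # some other parent already had this content
    return "ORIGINAL"        # no parent has the file with this content


# Plain commit editing the file.
plain_edit = classify("v2", ["v1"])
# Side branch that never touched the file merges trunk: content comes from trunk.
merge_from_trunk = classify("v2", ["v1", "v2"])
# Both branches edited the file: the 3-way merge result differs from both.
three_way_merge = classify("v1+v2", ["v1", "v2"])
# Merge that keeps the left parent's version.
kept_left = classify("v1", ["v1", "v2"])
```

The second case is the one that currently shows up as an "added" file in the delta, and the third case is why a content merge counts as ORIGINAL even though it is a merge.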

So what behaviour would I expect for the -n0 log restricted to a given file or directory? First of all, I'd like to see every ORIGINAL modification to that file. Secondly, I guess that for every ORIGINAL modification, I'd like to see a SINGLE line of (ORIGINAL or INHERITED) modifications leading from that modification to the main line I'm logging. That line should end at the earliest point integrating the ORIGINAL modification into the mainline, and should continue by recursively applying this criterion to the appropriate branch from which the modification originated.

Is this expectation understandable? Do you agree with it? Do you think it can be made to work?