[MERGE] show (possibly dotted) revnos in `bzr tags` (v2)

Fri Sep 21 17:46:48 BST 2007

Lukáš Lalinský wrote:
> On 9/18/07, Adeodato Simó <dato at net.com.org.es> wrote:
>> (BTW, I'm a bit puzzled by the diff lines generated by `bzr send` and
>> `bzr diff` in this case: if you check with care the diff for
>> test_tags.py, you'll note there are three lines which are deleted and
>> added, whereas /usr/bin/diff gets it right. The lines are:
>>
>>     out, err = self.run_bzr('tags -d branch1', encoding='utf-8')
>>     self.assertEquals(err, '')
>>     self.assertContainsRe(out,
>>
>> Is this some kind of bug?)
> 
> Depends. This is a feature of patiencediff, which doesn't consider
> lines that are not unique within one block as matching. For example, if
> there were one unique and unchanged line after the original four
> lines (which would break the change into two blocks), it would produce
> a result like this:
> 
>          self.assertContainsRe(out,
> -            u'^\u30d0zaar  *revid-1\n'.encode('utf-8'))
> +            u'^\u30d0zaar  *1\ntag2  *\\?\n'.encode('utf-8'))
>          # unique line
> 
> Excluding non-unique lines usually helps a lot for things like empty
> lines or { and } in C-like languages, but in this case removing
> this constraint would probably produce a better diff. I *think* the
> algorithm could be tweaked to produce the right diff in this case and
> still ignore most common lines, but I can't think of a particular
> solution right now.
> 
>> 0.7-2                2:stratus-20060507035335-aa563a97409e6b0a
>> 0.8.1-1              6:stratus-20060520153101-ee34dc54b2e64e8a
>> 0.9.0-1              9:stratus-20060825180616-ae66c43c7fd16a94
>> 0.10.0-2             10:stratus-20060919175512-167930889243d43d
>> 0.11.0-1             11:stratus-20061128173110-88924ff1ef22325a
>> 0.16.1-1             17:siretart at tauware.de-20070509095307-occwyxvg2cjhr4j5
>> 0.17.1-1             22.1.2:dato at net.com.org.es-20070615155205-zr6e54m6c9duwpmj
>> 0.18.0-1             29:dato at net.com.org.es-20070717161538-qy9wsewauusbn3nr
>> 0.90.0-1             33:dato at net.com.org.es-20070815100938-y2bhsavdw94vsd99
>> 0.91.0-1             ?:dato at net.com.org.es-20070914083051-4e38y28c6ckkm72o
> 
> I personally don't like this X:Y format, because you are displaying
> two different identifiers and this makes it look like one, but this is
> not why I was writing this mail...

At one point we had tweaked the code to fall back to regular difflib on
the innermost sections to handle this specific case.
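
As a rough illustration of that idea (a simplified sketch, not the
bzrlib code: the function names are made up for this mail, and the real
patiencediff also recurses between anchors, which this does not), you
anchor on lines that are unique in both files and then hand each gap
between anchors to difflib:

```python
import difflib
from bisect import bisect_left
from collections import Counter


def unique_common_lines(a, b):
    """Return (index_in_a, index_in_b) for lines occurring exactly once in each."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    return [(i, pos_b[line]) for i, line in enumerate(a)
            if count_a[line] == 1 and line in pos_b]


def longest_increasing(pairs):
    """Keep the longest run of pairs whose b-indices increase (patience sort)."""
    tails = []    # tails[k]: index into pairs ending the best run of length k + 1
    tail_b = []   # b-index of each entry in tails, kept sorted for bisect
    prev = [None] * len(pairs)
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tail_b, j)
        prev[n] = tails[k - 1] if k else None
        if k == len(tails):
            tails.append(n)
            tail_b.append(j)
        else:
            tails[k] = n
            tail_b[k] = j
    out = []
    n = tails[-1] if tails else None
    while n is not None:
        out.append(pairs[n])
        n = prev[n]
    return out[::-1]


def match_pairs(a, b, fallback=True):
    """Matched (i, j) line pairs: unique-line anchors, plus difflib in the gaps."""
    anchors = longest_increasing(unique_common_lines(a, b))
    if not fallback:
        return anchors
    result = []
    lo_a = lo_b = 0
    # The extra (len(a), len(b)) entry is a sentinel so the tail gap is handled too.
    for i, j in anchors + [(len(a), len(b))]:
        sm = difflib.SequenceMatcher(None, a[lo_a:i], b[lo_b:j], autojunk=False)
        for blk in sm.get_matching_blocks():
            result.extend((lo_a + blk.a + k, lo_b + blk.b + k)
                          for k in range(blk.size))
        if i < len(a):
            result.append((i, j))
        lo_a, lo_b = i + 1, j + 1
    return result


old = ["x", "dup", "dup", "old", "y"]
new = ["x", "dup", "dup", "new", "y"]
print(match_pairs(old, new, fallback=False))  # [(0, 0), (4, 4)]
print(match_pairs(old, new))  # [(0, 0), (1, 1), (2, 2), (4, 4)]
```

Without the fallback, the repeated "dup" lines are never matched, which
is the same kind of result Adeodato saw; with it, difflib picks them up
inside the gap.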

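The problem there is that you end up with "worst-of-both-worlds"
performance when a file has been completely changed: the patience pass
finds no unique common lines to anchor on, so the difflib fallback ends
up running over the whole file.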

One of the reasons we use PatienceDiff is that it copes with the
100,000-line file in which every line has changed. (Look in the bug
archives for cases where difflib nukes itself under those conditions. It
used to die completely because of the recursion limit; it got better
when Python switched to using a list as a stack in 2.4.3, but it still
performs very badly.)
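
To make the fully-changed case concrete, here is a small sketch (again
not bzrlib code; `count_unique_common` is a hypothetical helper) of the
anchoring step on its own. It only counts lines, so it makes a single
pass over each file, and when every line has changed it finds nothing to
anchor on:

```python
from collections import Counter


def count_unique_common(a, b):
    """Count lines that occur exactly once in a and exactly once in b."""
    count_a, count_b = Counter(a), Counter(b)
    return sum(1 for line in a
               if count_a[line] == 1 and count_b.get(line) == 1)


# Two 100,000-line files with no line in common.
old = ["old line %d\n" % n for n in range(100000)]
new = ["new line %d\n" % n for n in range(100000)]
print(count_unique_common(old, new))  # 0
```

With zero anchors there are no gaps to split the work into, so any
line-by-line fallback would be handed both files whole.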

So yes, we could tweak the algorithm, but it could have a fairly
significant effect on performance, as you are asking it to do another
line-by-line match. It might be a bit cheaper in the C version, because
the algorithm was changed slightly to keep track of matches, rather than
forgetting about them from time to time.

John
=:->