[MERGE] Test for bug #272444 (symlinks to Unicode file names)
Andrew Bennetts
andrew.bennetts at canonical.com
Wed Oct 22 08:20:23 BST 2008
bb:tweak
Daniel Clemente wrote:
> John Arbash Meinel <john at arbash-meinel.com> writes:
>
> > --- bzrlib/tests/branch_implementations/test_sprout.py 2007-11-01
> > 09:52:45 +0000
> > +++ bzrlib/tests/branch_implementations/test_sprout.py 2008-10-17
> > 09:15:25 +0000
> > @@ -1,4 +1,5 @@
> > # Copyright (C) 2007 Canonical Ltd
> > +# -*- coding: utf-8 -*-
> >
> > ^- I may be coming late to this, but we shouldn't need a line like this.
> > IIRC, we don't allow non-ascii characters in our source files (at least
> > as much as we can manage that.) Pretty much always we can get away with
> > ...
>
> I changed that, even when I personally don't like that restriction.
> I think it's unfortunate that Python 2.4 defaults to ASCII, but
> that's not a Bazaar issue.
Thanks. Our coding standard (the HACKING.txt file) doesn't have an
opinion on this (yet), but we mostly don't use this, so I think it's
best to stick to that convention.
(Although, some of our test files do already have that marker... grep
finds 19 files, all in bzrlib/tests, but it appears that only one
actually contains non-ASCII bytes: bzrlib/tests/EncodingAdapter.py)
> > Also, 'ó' is a combining character, so it is going to get us in trouble
> > on Mac, which will expand u'\xf3' to u'o\u0301'.
> >
>
> Isn't this also worth testing? A link to 'ó' should also work on
> Mac, and if it doesn't, a test should catch this error. Maybe you
> want a separate test for that which focuses only on combining
> characters?
> I changed the target in my test to Ω anyway.
I'm not an expert in unicode matters, but I know John has looked at
extensively at the issues with encodings on OS X, so I generally just
trust his judgement :)
That said, if ó and Ω have a signficant chance of working differently
(i.e. one case might pass while another fails) then I'd favour an
explicit test for each of those cases.
> > And just to mention *why* it isn't supported. Symlink targets are just
[...]
>
> I also think this. The other alternative (store targets' bytes
> exactly as they are, even if you get broken links at checkout time)
> wouldn't be useful to most users, because they want their stored
> data to work in any other computer, independently of the encoding.
> It is thus better to adapt to the user's encoding.
>
> In addition, symlinks have a fingerprint in dirstate.py which is
> equal to their target. Since the fingerprint is per definition „a
> nonempty utf8 sequence“, the target must also be stored in utf8.
>
> This discussion could follow in bug #272444.
For the test, I think a reference to a detailed discussed in bug #272444
is ok (in lieu of putting an extensive explanation of the failure in the
test).
> > + raise KnownFailure('there is no support for symlinks to
> > non-ASCII targets (bug 272444)')
> > +
> >
> > ^- This line is longer than 80 characters, just split it into something
> > like:
>
> My monitor has place for more than 80 characters, and my editor
> knows how to break lines automatically! I will never understand that
> convention. However, if that is what Bazaar mandates, I change it.
But *my* monitor, with my eyesight, only has room for two 80 column
editor panes side-by-side :)
Basically, it's a safe lowest-common denominator, and it's not overly
onerous to keep lines under 80 columns. And yes, we mandate it :)
> > === modified file 'bzrlib/tests/workingtree_implementations/test_parents.py'
> >
> > ^- I *do* think that we need a 'workingtree_implementations' test for
> > this, but I don't see why it is going in 'test_parents'. Could you
> > explain why you thought it should be there?
> >
>
> The reason to put it there is that set_parent_trees seems the
> function which makes the bug visible. I did that after Robert
> Collins' message:
> http://article.gmane.org/gmane.comp.version-control.bazaar-ng.general/47303
Makes sense to me, if set_parent_ids really is the trigger. (That seems
weird, but if that's the reality, then that's the reality. See also my
comments on the tests below.)
> > Especially considering that the tree is already *at* [revision] I don't
> > see why "set_parent_ids([revision])" would be anything other than a
> > no-op. (Which at a minimum would mean your test fails for the wrong reason.)
> >
>
> This was my fault; thanks for mentioning it. I'm still learning the API.
> I have changed it to simulate an uncommit; I hope it makes sense now.
Looks good to me.
> I attach a new patch with all your suggestions. Still to decide are:
> - which of 'ó' or 'Ω' is more useful to test Unicode
I'm inclined to trust John, so Ω.
> - location of the test (now at test_parents.py)
I think test_parents.py is ok.
[...]
> === modified file 'bzrlib/tests/branch_implementations/test_sprout.py'
[...]
> + def test_sprout_with_unicode_symlink(self):
> + # this tests bug #272444
> + # Also tested by TestSetParents.test_unicode_symlink at test_parents.py
> + self.requireFeature(SymlinkFeature)
> + self.requireFeature(UnicodeFilenameFeature)
> +
> + tree = self.make_branch_and_tree('tree1')
> +
> + # The link points to a file whose name is an omega
> + # U+03A9 GREEK CAPITAL LETTER OMEGA
> + # UTF-8: ce a9 UTF-16BE: 03a9 Decimal: Ω
> + os.symlink(u'\u03a9','tree1/link_name')
> + tree.add(['link_name'],['link-id'])
> + revision = tree.commit('added a link to a Unicode target')
> +
> + try:
> + tree.bzrdir.sprout('target')
> + except UnicodeEncodeError, e:
> + raise KnownFailure('there is no support for'
> + ' symlinks to non-ASCII targets (bug #272444)')
This test is a bit strange:
* that comment should mention that test_parents.py is in
workingtree_implementations, because it's not immediately obvious
* it seems unfortunate that it's possible to commit a revision that
can't then be sprouted. If we're going to fail, we probably should
fail sooner rather than later. That's not your fault of course, so
you don't have to fix it in this patch, but it is worth noting.
* it's odd to have a test that claims to duplicate the testing done by
another test. Is it really a duplicate? If so, what's the value in
having it? It seems to be a workingtree issue rather than a branch
issue, so a branch_implementations test does seem to be the wrong
place.
I'm currently leaning towards getting rid of this test. If there's a
good reason for it, then it needs a comment to explain what its
significant difference (and thus its value) is versus the
workingtree_implementations test.
> === modified file 'bzrlib/tests/workingtree_implementations/test_parents.py'
> --- bzrlib/tests/workingtree_implementations/test_parents.py 2008-08-12 13:07:00 +0000
> +++ bzrlib/tests/workingtree_implementations/test_parents.py 2008-10-18 11:32:13 +0000
> @@ -32,7 +32,7 @@
> InventoryLink,
> )
> from bzrlib.revision import Revision
> -from bzrlib.tests import SymlinkFeature, TestNotApplicable
> +from bzrlib.tests import KnownFailure, SymlinkFeature, TestNotApplicable, UnicodeFilenameFeature
This line is too long.
> + def test_unicode_symlink(self):
> + # this tests bug #272444
> + self.requireFeature(SymlinkFeature)
> + self.requireFeature(UnicodeFilenameFeature)
> +
> + tree = self.make_branch_and_tree('tree1')
> +
> + # The link points to a file whose name is an omega
> + # U+03A9 GREEK CAPITAL LETTER OMEGA
> + # UTF-8: ce a9 UTF-16BE: 03a9 Decimal: Ω
> + os.symlink(u'\u03a9','tree1/link_name')
> + tree.add(['link_name'],['link-id'])
> + revision1 = tree.commit('added a link to a Unicode target')
> + revision2 = tree.commit('this revision will be discarded')
> +
> + try:
> + tree.set_parent_ids([revision1])
> + except UnicodeEncodeError, e:
> + raise KnownFailure('there is no support for'
> + ' symlinks to non-ASCII targets (bug #272444)')
The weirdness about being able to commit a unicode symlink — but not do
other operations on a tree with a unicode symlink — still applies.
Something is fishy here! But I'd rather have this test than not.
-Andrew.
More information about the bazaar
mailing list