[MERGE] OSX test suite passing (well as of 2008/09/08 :)

Vincent Ladeuil v.ladeuil+lp at free.fr
Thu Sep 11 11:55:29 BST 2008


>>>>> "aaron" == Aaron Bentley <aaron at aaronbentley.com> writes:

<snip/>

    >> So I changed test_case_insensitive_clash into
    >> test_rollback_on_directory_clash.
    >> 
    >> Does that fit ?

    aaron> Yes, I think it's better to have a test that works everywhere.

Good.

    >> +    def test_rollback_on_directory_clash(self):
    >> +        wt = self.make_branch_and_tree('.')
    >> +        wt.add

    aaron> ^^^ I think this is a no-op.

Oops :) Note to self: stop relying on python to detect your
unfinished attempts.

    >> +        # It will fail the renaming because the target
    >> +        directory is not empty

    aaron> ^^^ "The rename will fail..."

Done and thanks.

    john> So at this point, we still have code that complains
    john> about adding non-NFC filenames, and a codebase which
    john> doesn't really handle it correctly in all cases. It
    john> took me a lot of effort to get it working the first
    john> time, and it just wasn't worth it (to me) to fight
    john> through all that again.

Ok, let's call that a work-still-in-progress then.

    >> 
    john> But until we can get there, we need to filter at add
    john> and rename time.
    >> 
    >> I read that as 'bzr filter at add and rename' *now*. So even when
    >> checking out on OSX (or on an HFS mounted fs) bzr never
    >> *produces* such files.
    >> 
    >> It comes that these tests can be safely skipped on OSX.
    >> 
    >> Or do we want to add a specific one that indeed, you can't create
    >> such a file ?

    john> It is supposed to check at add and rename, as that is
    john> the only way to inject new filenames into the
    john> system. I'm not 100% sure of what to do on Mac. People
    john> want to be able to add the files. I came to the point
    john> of deciding to just let go, and let people version any
    john> name they want. And just live with the fact that it
    john> breaks between Mac and any other platform. (Mostly
    john> because svn, hg, git, and all others break as well.
    john> Though I've seen some emails about svn trying to play
    john> with normalization.)


So, I'll summarize as follows:

- internally, bzr represent file names as NFC normalized, utf-8
  encoded strings,

- it checks, during add and rename that files are in a canonical
  form,

- using non-NFC encoding file systems requires transcoding at
  various points.

The last point being the one that needs an unknown amount of work
to be 100% correct.

    john> ...

    >> 
    >> So, sorry for that loong mail, here is the patch,
    >> 
    >> Vincent
    >> 

    >> +            if os.environ.get('LANG', None) is None:
    >> +                # If LANG is not set, we end up with 'ascii', which is bad
    >> +                # ('mac-roman' is more than ascii), so we set a default which
    >> +                # will give us UTF-8 (which appears to work in all cases on
    >> +                # OSX). Users are still free to override LANG of course, as
    >> +                # long as it give us something meaningful. This work-around
    >> +                # *may* not be needed with python 3k and/or OSX 10.5, but will
    >> +                # work with them too -- vila 20080908
    >> +                os.environ['LANG'] = 'en_US.UTF-8'

    john> ^- There are some possible negative side effects of this, but I'm
    john> guessing the positives out-weigh it.

    john> 1) Would it be better to just do that in the test suite?

The intent is to cover more users without requiring *them* to
specify the locale. The code above set a reasonable locale if the
user didn't attempt to set it itself via LANG.

As mentioned in another email, pqm set LANG to C which fails for
OSX, but that's part of make check where we run with LANG=C as a
way to test on *Linux* that a default install will work. If we
want to do the same for OSX, I'd say this step is not needed at
all.

    john> 2) There are the LC_* functions that can have similar
    john> effects, and *I* won't claim to be an expert in any of
    john> it.

Me neither, but googling the subject seems to provide only LANG=
examples.

    john> 3) Is the "en_US" locale always available on Mac?

en_US.UTF-8 is what the apple devs use and why I chose it, AFAIK
you can't *remove* it from an OSX install.

    john> I don't think it is under Linux, but that doesn't apply
    john> here.

Trying 'en.UTF-8' as you proposed in another mail leads to:

bzr: warning: unsupported locale setting
  Could not determine what text encoding to use.
  This error usually means your Python interpreter
  doesn't support the locale set by $LANG (en.UTF-8)
  Continuing with ascii encoding.

    john> I really don't have any idea about how Mac goes about
    john> with locale customized installations.

I don't know the specifics of OSX, but the problem here is how
python handles it on OSX and the above seems the best I can do
waiting for 2.6 or 3k.

    john> I'm probably about 50% on this. It solves a direct
    john> issue, but possibly introduces others. Certainly once
    john> we have proper i18n, this is going to become a more
    john> serious issue.

We'll see how it goes, but unaware users will not be *required*
to specify it and at least get proper encoding.

    john> In the short term it is a reasonable fix.

Thanks.

    john> Though I'll also mention my other comment about
    john> avoiding 'locale' entirely. It is a significant blip on
    john> startup time, and seems to be doing a lot of work that
    john> we only need to do once. (Then again, maybe setlocale
    john> is being called, and initializing a lot of state that
    john> 3rd-party code like 'svn' needs to work properly.)

I don't enough expertise to replace 'import locale' with some
code having the same effect, so for now, I'll prefer to use that
approach and see how it goes.

And, as mentioned above, we may not need to import locale at all
once python setup itself correctly on OSX.

<snip/>

    john> ^- These changes aren't correct. We *should* test that bundles can
    john> properly transmit non-ascii characters. We should just do it with a
    john> different character than "'\xe9'".

Sorry, fixed.

<snip/>

    john> BB:tweak

    john> (Though you may want to resubmit to look over how you
    john> handle unicode characters in bundles.)

Since the modifications were really minor, I'll submit to pqm, I
let you decide whether it's worth including in 1.7.

    Vincent

P.S.: Incremental diff below:

=== modified file 'bzrlib/tests/test_bundle.py'
--- bzrlib/tests/test_bundle.py	2008-09-08 15:31:26 +0000
+++ bzrlib/tests/test_bundle.py	2008-09-11 09:24:38 +0000
@@ -795,7 +795,7 @@
         # Handle international characters
         os.mkdir('b1')
         try:
-            f = open(u'b1/foo', 'wb')
+            f = open(u'b1/with Dod\N{Euro Sign}', 'wb')
         except UnicodeEncodeError:
             raise TestSkipped("Filesystem doesn't support unicode")
 
@@ -807,7 +807,7 @@
             u'William Dod\xe9\n').encode('utf-8'))
         f.close()
 
-        self.tree1.add([u'foo'], ['withdod-id'])
+        self.tree1.add([u'with Dod\N{Euro Sign}'], ['withdod-id'])
         self.tree1.commit(u'i18n commit from William Dod\xe9',
                           rev_id='i18n-1', committer=u'William Dod\xe9')
 
@@ -815,7 +815,7 @@
         bundle = self.get_valid_bundle('null:', 'i18n-1')
 
         # Modified
-        f = open(u'b1/foo', 'wb')
+        f = open(u'b1/with Dod\N{Euro Sign}', 'wb')
         f.write(u'Modified \xb5\n'.encode('utf8'))
         f.close()
         self.tree1.commit(u'modified', rev_id='i18n-2')
@@ -823,14 +823,14 @@
         bundle = self.get_valid_bundle('i18n-1', 'i18n-2')
 
         # Renamed
-        self.tree1.rename_one(u'foo', u'bar')
+        self.tree1.rename_one(u'with Dod\N{Euro Sign}', u'B\N{Euro Sign}gfors')
         self.tree1.commit(u'renamed, the new i18n man', rev_id='i18n-3',
                           committer=u'Erik B\xe5gfors')
 
         bundle = self.get_valid_bundle('i18n-2', 'i18n-3')
 
         # Removed
-        self.tree1.remove([u'bar'])
+        self.tree1.remove([u'B\N{Euro Sign}gfors'])
         self.tree1.commit(u'removed', rev_id='i18n-4')
 
         bundle = self.get_valid_bundle('i18n-3', 'i18n-4')

=== modified file 'bzrlib/tests/test_transform.py'
--- bzrlib/tests/test_transform.py	2008-09-08 14:50:15 +0000
+++ bzrlib/tests/test_transform.py	2008-09-11 08:56:03 +0000
@@ -1163,9 +1163,8 @@
         transform.finalize()
 
     def test_rollback_on_directory_clash(self):
-        wt = self.make_branch_and_tree('.')
-        wt.add
         def tt_helper():
+            wt = self.make_branch_and_tree('.')
             tt = TreeTransform(wt)  # TreeTransform obtains write lock
             try:
                 foo = tt.new_directory('foo', tt.root)
@@ -1179,8 +1178,8 @@
             except:
                 wt.unlock()
                 raise
-        # It will fail the renaming because the target directory is not empty
-        # (but raise FileExists anyway).
+        # The rename will fail because the target directory is not empty (but
+        # raises FileExists anyway).
         err = self.assertRaises(errors.FileExists, tt_helper)
         self.assertContainsRe(str(err),
             "^File exists: .+/baz")




More information about the bazaar mailing list