Bazaar-NG traffic #2

David Allouche david at allouche.net
Wed Oct 12 13:39:27 BST 2005


On Tue, 2005-10-11 at 08:33 -0500, John A Meinel wrote:
> David Allouche wrote:
> > To be honest, we have only once seen non-ASCII file names in source code
> > repositories across a few hundred mainline imports, but the numbers are
> > biased since we have been focusing on increasing the number of
> > successful imports, disregarding (numerous) import failures.
> 
> Do you have any of these directories/files available?

Not anymore, sorry.

>  I would be curious
> what this returns:
> 
> python -c "import os; print os.listdir(u'.')"
> versus
> python -c "import os; print os.listdir('.')"
> 
> The first should try and interpret the names and return unicode, the
> second should just do ascii names (possibly just byte-stream names).
> 
> I know for sure that on Windows, if you have a non-ASCII name, the
> former returns [u'\u07077\u71070'], while the latter returns ['???????'].
> I believe the Windows filesystem is all UTF-16.

I do not have access to any Windows system. But on my Ubuntu:

>>> import os
>>> os.environ['LANG']
'en_GB.UTF-8'
>>> open('\xe9', 'w')
... something that mail client won't paste ...
>>> os.listdir('.')
['\xe9']
>>> os.listdir(u'.')
['\xe9']
>>> open(u'\xe9', 'w')
<open file u'\xe9', mode 'w' at 0x401d6338>
>>> os.listdir(u'.')
['\xe9', u'\xe9']
>>> os.listdir('.')
['\xe9', '\xc3\xa9']

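In other words, you cannot assume that listdir will always give you
unicode strings, even when you pass it a unicode path: a name that
cannot be decoded in the locale's encoding (here the lone '\xe9' byte
under UTF-8) comes back as a raw byte string.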
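
A caller therefore has to be ready for a mixed result. The following is
only a minimal sketch of one way to cope, not anything bzr actually does:
split_names is a hypothetical helper, and UTF-8 as the default encoding is
an assumption taken from the locale in the session above.

```python
# Hypothetical helper, not part of bzr: separates the names that decode
# cleanly from those that have to stay raw byte strings.
text_type = type(u'')  # the unicode text type of the running interpreter


def split_names(names, encoding='utf-8'):
    """Return (decoded, undecodable) for a listdir-style result."""
    decoded = []
    undecodable = []
    for name in names:
        if isinstance(name, text_type):
            decoded.append(name)          # already unicode, keep as is
            continue
        try:
            decoded.append(name.decode(encoding))
        except UnicodeDecodeError:
            undecodable.append(name)      # keep the raw bytes untouched
    return decoded, undecodable


# The same two files as in the session above.
latin1_name = u'\xe9'.encode('latin-1')   # the single byte '\xe9'
utf8_name = u'\xe9'.encode('utf-8')       # the two bytes '\xc3\xa9'
decoded, undecodable = split_names([latin1_name, utf8_name])
```

Here the UTF-8 name ends up in decoded and the lone '\xe9' byte stays in
undecodable, so the caller can decide separately how to report or skip
such files.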

-- 
                                                            -- ddaa