[win32] non-ascii/non-english file names: internal usage of file names
Jan Hudec
bulb at ucw.cz
Wed Nov 30 15:08:00 GMT 2005
On Wed, Nov 30, 2005 at 14:33:38 +0100, David Allouche wrote:
> On Tue, 2005-11-29 at 08:23 -0600, John Arbash Meinel wrote:
> > I agree that it should use unicode filenames internally at all times.
> > Thanks for looking into this.
>
> Except when it's not possible. I can trivially create a plausible
> filename in unix that cannot be decoded to unicode in any meaningful
> way.
>
> For example:
>
> u'/Utilisateurs/Édouard/'.encode('latin-1') +
> u'docs/thèse.tex'.encode('utf-8')
That filename is not meaningful in any encoding though.
However, it *CAN* be meaningfully converted to unicode, by treating it
as latin-1 (which covers the whole byte range 0-255, so .decode('latin-1')
can never fail).
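A minimal sketch of the point above, written for Python 3 (where `bytes` is the byte string and `str` is unicode) rather than the Python 2 of this discussion. It rebuilds the example path from the quoted message, shows that strict UTF-8 decoding fails on it, and shows that latin-1 decoding always succeeds and round-trips back to the exact same bytes.

```python
# Build the mixed-encoding path from the quoted example:
# the directory part is latin-1, the file part is utf-8.
raw = '/Utilisateurs/Édouard/'.encode('latin-1') + 'docs/thèse.tex'.encode('utf-8')

# Strict UTF-8 decoding fails: b'\xc9' (É in latin-1) is a UTF-8 lead byte
# that must be followed by a continuation byte, but here it is followed by 'd'.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 maps every byte 0-255 to the code point with the same value,
# so decoding can never fail and encoding back restores the exact bytes.
as_text = raw.decode('latin-1')   # '/Utilisateurs/Édouard/docs/thÃ¨se.tex'
assert as_text.encode('latin-1') == raw

print(utf8_ok)  # False
```

Note that the UTF-8 part comes out as "thÃ¨se": the conversion is lossless, but it is not the name a user would type.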
What does not work, though, is the reverse direction. You can get a utf-8
name (in the repository) that can't be encoded using
sys.getfilesystemencoding() or bzrlib.user_encoding.
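The reverse direction can be sketched the same way, again in Python 3. Here `FS_ENCODING = 'latin-1'` is a hypothetical stand-in for whatever sys.getfilesystemencoding() or bzrlib.user_encoding returns on a given machine; the stored name decodes from UTF-8 without trouble but has no latin-1 representation.

```python
# Hypothetical stand-in for sys.getfilesystemencoding() on a latin-1 system.
FS_ENCODING = 'latin-1'

# A name as it might be stored in the repository: UTF-8 bytes.
stored = '日本語.txt'.encode('utf-8')

# Decoding the stored name to unicode always works (it is valid UTF-8)...
name = stored.decode('utf-8')

# ...but the filesystem encoding has no code for these characters.
try:
    name.encode(FS_ENCODING)
    representable = True
except UnicodeEncodeError:
    representable = False

print(representable)                   # False
print('thèse.tex'.encode(FS_ENCODING))  # b'th\xe8se.tex', latin-1 can hold this one
```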
> Some systems consider file names as character strings (Windows?); others
> consider file names as byte stream. You probably cannot get correct and
> reliable behaviour for both if you do not acknowledge the discrepancy.
>
> It's probably a reasonable requirement that the relative names of
> version controlled files should be stored (and treated internally) as
> unicode, but I do not think it's reasonable to require that all path
> handling be done on unicode strings.
The relative names of version controlled files need to be stored as
unicode, because it may not be possible to convert them to the filesystem
encoding.
However, doing all path handling in unicode has the advantage of
increased sanity (you don't have to think about what type a path is,
because it's always unicode). It does mean getting the unicode
equivalents from the store, though.
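A minimal sketch of the "unicode inside, bytes only at the edges" approach, in Python 3. `from_store`, `to_filesystem`, `UnrepresentableFilename`, and the `/home/edouard` tree root are hypothetical names chosen for illustration, not bzrlib API, and the store is assumed to hold UTF-8 names.

```python
import posixpath


class UnrepresentableFilename(Exception):
    """Hypothetical error: a unicode path has no form in the target encoding."""


def from_store(stored: bytes) -> str:
    """Decode a relative name from the (hypothetical) store, assumed UTF-8."""
    return stored.decode('utf-8')


def to_filesystem(path: str, encoding: str) -> bytes:
    """Encode an internal unicode path only when it is handed to the OS."""
    try:
        return path.encode(encoding)
    except UnicodeEncodeError as e:
        # Fail in one place, naming the offending path (ASCII-escaped via !a).
        raise UnrepresentableFilename(f'{path!a} cannot be encoded as {encoding}') from e


# All path manipulation happens on str; bytes appear only at the edges.
rel = from_store('docs/thèse.tex'.encode('utf-8'))
full = posixpath.join('/home/edouard', rel)  # hypothetical tree root

print(to_filesystem(full, 'utf-8'))    # b'/home/edouard/docs/th\xc3\xa8se.tex'
print(to_filesystem(full, 'latin-1'))  # b'/home/edouard/docs/th\xe8se.tex'

try:
    to_filesystem('日本語.txt', 'latin-1')
except UnrepresentableFilename as err:
    print('cannot materialize:', err)
```

Keeping the conversion in a single function means the unrepresentable-name case surfaces in exactly one place, with the offending path in the error, instead of as a stray encoding error deep inside path-joining code.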
--
Jan 'Bulb' Hudec <bulb at ucw.cz>
More information about the bazaar mailing list