format string should be unicode instead of byte string
John Arbash Meinel
john at arbash-meinel.com
Mon Sep 7 21:51:54 BST 2009
Martin Pool wrote:
> 2009/9/7 INADA Naoki <songofacandy at gmail.com>:
>> Related to: https://bugs.launchpad.net/bzr/+bug/404740
>> Human-readable format strings should be unicode even when they are pure ascii.
>>
>> When the code below is executed::
>>
>> "path: %s" % (path,)
>>
>> If path is a unicode string, it may cause UnicodeEncodeError.
>> But the next code::
>>
>> u"path: %s" % (path,)
>>
>> works fine whether path is unicode or bytes.
>
> That seems to make sense. There may be some cases where we're using
> format strings to produce something for a file required to be in a
> particular encoding and we would prefer to get a UnicodeError, but
> they seem rare.
>
Actually, I think he has it backwards. If you do:
"path: %s" % (path,)
then if 'path' is unicode, the format string is upcast to unicode and the
result is unicode. If 'path' is bytes and contains non-ascii characters,
the result stays bytes.
However, if you do:
u"path: %s" % (path,)
then if 'path' is unicode, things are fine, and if 'path' is an ascii byte
string, things are fine (auto-upcasting ascii => unicode). However, if
'path' is a byte string containing non-ascii characters, you get a
UnicodeDecodeError.
>>> 'path: %s' % ('ascii-path',)
'path: ascii-path'
>>> 'path: %s' % (u'unicode-path',)
u'path: unicode-path'
>>> 'path: %s' % ('nonascii-\xb5path',)
'path: nonascii-\xb5path'
>>> u'path: %s' % ('ascii-path',)
u'path: ascii-path'
>>> u'path: %s' % (u'unicode-path',)
u'path: unicode-path'
>>> u'path: %s' % ('nonascii-\xb5path',)
UnicodeDecodeError
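
The session above is Python 2. As a rough sketch only, the same coercion rules can be spelled out explicitly in Python 3, where `bytes` plays the role of the Python 2 byte string and `str` the role of `unicode`. The helper name `py2_style_format` is hypothetical and is not part of bzr or Python; it only mimics the implicit ASCII decode described above.

```python
def py2_style_format(template, value):
    """Emulate how Python 2 combined a format string with one argument.

    Hypothetical helper for illustration only. In this sketch, ``bytes``
    stands in for the Python 2 byte string and ``str`` for ``unicode``.
    """
    if isinstance(template, bytes) and isinstance(value, bytes):
        # bytes % bytes: nothing is decoded, so non-ASCII bytes pass through.
        return template % (value,)
    # Otherwise the result is text, and any bytes operand is implicitly
    # decoded as ASCII. Non-ASCII bytes raise UnicodeDecodeError here.
    if isinstance(template, bytes):
        template = template.decode("ascii")
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return template % (value,)


if __name__ == "__main__":
    print(repr(py2_style_format(b"path: %s", b"nonascii-\xb5path")))
    try:
        py2_style_format("path: %s", b"nonascii-\xb5path")
    except UnicodeDecodeError as exc:
        print("failed:", exc.reason)
```

The only failing combination is a text format string with a non-ascii byte-string argument, which matches the last case in the session.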
John
=:->