Format strings should be unicode instead of byte strings
INADA Naoki
songofacandy at gmail.com
Mon Sep 7 06:19:17 BST 2009
Related to: https://bugs.launchpad.net/bzr/+bug/404740
Human-readable format strings should be unicode, even when they contain only ASCII.
When the code below is executed::

    "path: %s" % (path,)

If path is a unicode string, it may cause a UnicodeEncodeError.
But the following code::

    u"path: %s" % (path,)

works fine whether path is unicode or bytes.
Next example::

    class Foo:
        def __init__(self, path):
            self.path = unicode(path)

        def __str__(self):
            return self.path.encode('utf-8')

        def __unicode__(self):
            return self.path

    foo = Foo(path)  # path may be unicode.
    b = "foo: %s" % (foo,)   # byte format string
    u = u"foo: %s" % (foo,)  # unicode format string

With a byte format string, __str__() is called, so any chance to
encode the result with a suitable encoding is lost.
With a unicode format string, __unicode__() is called.
Best practice is:
* Use unicode literals for all human-readable strings.
* Do encoding/decoding at the I/O boundary and use unicode internally.
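
A minimal sketch of these two points, assuming UTF-8 at both the input
and the output boundary. The helpers decode_path, format_message and
write_message are hypothetical names used only for illustration, not
bzr APIs. The sketch is written to run unchanged on Python 2.6+ and
on Python 3.3+.

```python
# -*- coding: utf-8 -*-
import io


def decode_path(raw, encoding="utf-8"):
    """Decode at the input boundary (hypothetical helper)."""
    # Byte strings become unicode here; unicode passes through untouched.
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


def format_message(path):
    """Build the human-readable message with a unicode format string."""
    return u"path: %s" % (path,)


def write_message(stream, message, encoding="utf-8"):
    """Encode at the output boundary, where the target encoding is known."""
    stream.write(message.encode(encoding))


raw = b"caf\xc3\xa9.txt"          # bytes as they might arrive from the OS
path = decode_path(raw)           # unicode from here on
message = format_message(path)    # still unicode

out = io.BytesIO()                # stands in for a file or terminal
write_message(out, message)
```

Because the message stays unicode until write_message(), the caller
can pick the encoding that suits the terminal or log file at the
last moment.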
--
Naoki INADA <songofacandy at gmail.com>