format string should be unicode instead of byte string

INADA Naoki songofacandy at gmail.com
Mon Sep 7 06:19:17 BST 2009


Related to: https://bugs.launchpad.net/bzr/+bug/404740
Human-readable format strings should be unicode, even when they contain
only ASCII.

When the following code is executed::

  "path: %s" % (path,)

If path is a unicode string, this may cause a UnicodeEncodeError.
But the following code::

  u"path: %s" % (path,)

works fine whether path is unicode or bytes.

Next example::

  class Foo:
      def __init__(self, path):
          self.path = unicode(path)
      def __str__(self):
          return self.path.encode('utf-8')
      def __unicode__(self):
          return self.path

  foo = Foo(path)  # path may be unicode.
  b = "foo: %s" % (foo,)
  u = u"foo: %s" % (foo,)

When a byte format string is used, __str__() is called, and any chance to
encode with a suitable encoding is lost.
When a unicode format string is used, __unicode__() is called.
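
For readers on Python 3, a rough analogue of the same dispatch can be
sketched as follows. This is not the Python 2 code discussed in this mail:
it assumes Python 3.5 or later, where ``str`` plays the role of ``unicode``,
``__str__`` plays the role of ``__unicode__``, and ``__bytes__`` stands in
for the old byte-returning ``__str__``. The sample value is made up for
illustration.

```python
class Foo:
    def __init__(self, path):
        self.path = str(path)  # keep text internally

    def __str__(self):
        # Called by text formatting: "..." % (foo,)
        return self.path

    def __bytes__(self):
        # Called by bytes formatting: b"..." % (foo,)
        # The encoding is fixed here, so the caller cannot choose one.
        return self.path.encode("utf-8")


foo = Foo("caf\u00e9")  # hypothetical non-ASCII path
text_result = "foo: %s" % (foo,)    # uses __str__, stays text
bytes_result = b"foo: %s" % (foo,)  # uses __bytes__, already encoded
```

With the text format string the result is still text, so the encoding can
be chosen later, at output time.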

Best practice is:
* Use unicode literals for all human-readable strings.
* Do encoding/decoding at the I/O boundary and use unicode internally.
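
A minimal sketch of these two points, assuming UTF-8 at both ends. The
helper names (read_path, describe_path, write_message) and the sample bytes
are hypothetical and only for illustration; they are not bzr APIs. The
sketch is written so that it should run on Python 2.6+ as well as Python
3.3+.

```python
import io


def read_path(raw_bytes, encoding="utf-8"):
    """Input boundary: decode bytes to unicode as early as possible."""
    return raw_bytes.decode(encoding)


def describe_path(path):
    """Internal code: format with a unicode literal, no encoding here."""
    return u"path: %s" % (path,)


def write_message(stream, message, encoding="utf-8"):
    """Output boundary: encode only when writing to a byte stream."""
    stream.write(message.encode(encoding))


raw = b"caf\xc3\xa9/readme.txt"   # bytes as they might arrive from I/O
path = read_path(raw)             # unicode from here on
message = describe_path(path)     # still unicode
out = io.BytesIO()                # stands in for a file or terminal
write_message(out, message)       # bytes again only at the edge
```

Because the message stays unicode until write_message() is called, the
output encoding can be picked per stream instead of being baked in at
formatting time.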

-- 
Naoki INADA  <songofacandy at gmail.com>


