version-info --include-history UnicodeDecodeError (518609)

Robert Collins robertc at robertcollins.net
Thu Apr 8 23:09:00 BST 2010


On Fri, 2010-04-09 at 07:52 +1000, Robert Collins wrote:
> > This particular bug is a regression of sorts, the operation used to
> > give mojibake, and now throws (attached,
> > bzr_version_info_failure.log), note also log behaves as (un?)expected.
> 
> That looks like bzrlib.rio.Stanza.to_lines is incorrectly returning
> 'unicode' rather than 'str' line objects, it should (per the docstring)
> be returning 'str' lines encoded in utf8. Looking at the code it appears
> to me that the 'tag' variable is the most likely culprit: a unicode tag
> would cause implicit upcasting of individual lines.

Here is a trivial diagnostic patch, on the "code speaks louder than words"
theory: apply it, then run a version-info invocation that breaks, and you
should get a clear report of the issue. My bet is that many or all of the
'tag' values are unicode objects that happen to encode trivially to ascii,
but their presence as unicode is causing the stream to get implicitly
encoded and decoded - and thus the boom.
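
To make the suspected mechanism concrete, here is a rough sketch. It is
written for Python 3 and only emulates the implicit str/unicode coercion
that Python 2 performs; py2_concat and the sample values are hypothetical
and are not bzrlib code. It assumes the failure comes from a unicode line
later being combined with non-ascii utf8 bytes.

```python
# Sketch only: Python 3 emulation of Python 2's implicit str/unicode coercion.
# py2_concat is a hypothetical helper, not part of bzrlib.

def py2_concat(left, right):
    """Concatenate the way Python 2 did: if either operand is text
    (unicode), a bytes operand is implicitly decoded as ASCII first."""
    if isinstance(left, bytes) and isinstance(right, bytes):
        return left + right
    if isinstance(left, bytes):
        left = left.decode('ascii')
    if isinstance(right, bytes):
        right = right.decode('ascii')
    return left + right

# Non-ASCII UTF-8 bytes, such as an encoded stanza value might contain.
utf8_value = 'caf\u00e9'.encode('utf-8')

# Bytes tag: everything stays bytes and nothing is decoded.
ok = py2_concat(py2_concat(b'committer', b': '), utf8_value)

# Text tag that is pure ASCII: the line is silently promoted to text...
promoted = py2_concat('committer', b': ')

# ...and adding UTF-8 bytes then forces an implicit ASCII decode, which fails.
try:
    py2_concat(promoted, utf8_value)
    failed = False
except UnicodeDecodeError:
    failed = True
```

The tag itself never fails to encode; it only changes the type of the
line, and the error surfaces later when non-ascii bytes meet that line.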

Doing

    if type(tag) is unicode:
        tag = tag.encode('utf-8')

instead of the assertion below will probably fix it.
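
As a sketch of what that normalisation buys, assuming a much simplified
stand-in for the serializer (to_utf8_lines is hypothetical, written for
Python 3, and handles only single-line text values):

```python
# Sketch only: simplified stand-in for a to_lines-style serializer.
# to_utf8_lines and its (tag, value) input are hypothetical, not bzrlib's API.

def to_utf8_lines(items):
    """Return one UTF-8 encoded bytes line per (tag, value) pair.

    Assumes values are single-line text; multi-line values, which the
    real code handles separately, are left out of this sketch.
    """
    result = []
    for tag, value in items:
        # The proposed fix: normalise a text tag to UTF-8 bytes up front,
        # so every piece joined below is already bytes.
        if isinstance(tag, str):
            tag = tag.encode('utf-8')
        result.append(tag + b': ' + value.encode('utf-8') + b'\n')
    return result

# One text tag and one bytes tag; both come out as bytes lines.
lines = to_utf8_lines([('committer', 'Ren\u00e9'), (b'revno', '42')])
```

With the tag encoded first, the output lines are bytes regardless of how
the tag arrived, which is what the docstring promises.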

=== modified file 'bzrlib/rio.py'
--- bzrlib/rio.py	2009-09-11 06:36:50 +0000
+++ bzrlib/rio.py	2010-04-08 22:05:27 +0000
@@ -172,6 +172,8 @@
             return []
         result = []
         for tag, value in self.items:
+            if type(tag) is unicode:
+                raise AssertionError("tag is %r" % tag)
             if value == '':
                 result.append(tag + ': \n')
             elif '\n' in value:

