version-info --include-history UnicodeDecodeError (518609)
Robert Collins
robertc at robertcollins.net
Thu Apr 8 23:09:00 BST 2010
On Fri, 2010-04-09 at 07:52 +1000, Robert Collins wrote:
> > This particular bug is a regression of sorts, the operation used to
> > give mojibake, and now throws (attached,
> > bzr_version_info_failure.log), note also log behaves as (un?)expected.
>
> That looks like bzrlib.rio.Stanza.to_lines is incorrectly returning
> 'unicode' rather than 'str' line objects, it should (per the docstring)
> be returning 'str' lines encoded in utf8. Looking at the code it appears
> to me that the 'tag' variable is the most likely culprit: a unicode tag
> would cause implicit upcasting of individual lines.
Here is a trivial diagnostic patch, on the "code speaks louder than
words" theory: apply it, then run a failing version-info invocation, and
you should get a clear report of the issue. My bet is that many or all
of the 'tag' values are unicode objects that happen to encode trivially
to ascii, but their presence as unicode is causing the stream to get
implicitly encoded and decoded, and thus the boom.
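To make the suspected failure mode concrete, here is a small sketch. It
is written for Python 3, where the coercion has to be spelled out by
hand, and it assumes that Python 2 evaluates `unicode + str` by decoding
the `str` operand with the default ascii codec. The helper name
`py2_style_concat` and the sample values are made up for illustration
and are not part of bzrlib.

```python
def py2_style_concat(text, raw):
    """Mimic Python 2's `unicode + str`: the byte string is decoded as ascii first."""
    return text + raw.decode('ascii')


# An ascii-only value survives the implicit decode, which hides the problem.
ok = py2_style_concat(u'revno', b': 42\n')

# A utf-8 encoded non-ascii value (for example a committer name) does not.
try:
    py2_style_concat(u'committer', b': Ren\xc3\xa9\n')
    failed = False
except UnicodeDecodeError:
    failed = True
```

If the bet is right, a single unicode tag is enough to turn the whole
line into unicode, and the failure only shows up once some value
contains non-ascii bytes.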
Doing
    if type(tag) is unicode:
        tag = tag.encode('utf-8')
instead of the assertion below, will probably fix it.
=== modified file 'bzrlib/rio.py'
--- bzrlib/rio.py 2009-09-11 06:36:50 +0000
+++ bzrlib/rio.py 2010-04-08 22:05:27 +0000
@@ -172,6 +172,8 @@
             return []
         result = []
         for tag, value in self.items:
+            if type(tag) is unicode:
+                raise AssertionError("tag is %r" % tag)
             if value == '':
                 result.append(tag + ': \n')
             elif '\n' in value:
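The encode-instead-of-assert variant mentioned above might look roughly
like the following sketch. It is written for Python 3, where `str` plays
the role of Python 2's `unicode` and `bytes` the role of Python 2's
`str`. The function `to_utf8_lines` is a hypothetical stand-in for the
loop in Stanza.to_lines, not the bzrlib code itself: it covers only
single-line values, and encoding the value as well as the tag is an
assumption of this sketch.

```python
def to_utf8_lines(items):
    """Return one utf-8 encoded line per (tag, value) pair.

    Hypothetical stand-in for the loop in Stanza.to_lines; only the
    single-line value case is sketched here.
    """
    result = []
    for tag, value in items:
        # The suggested fix: normalise a text tag to utf-8 bytes up front,
        # so it can no longer drag the whole line into text.
        if isinstance(tag, str):
            tag = tag.encode('utf-8')
        # Assumption of this sketch: text values are encoded the same way.
        if isinstance(value, str):
            value = value.encode('utf-8')
        result.append(tag + b': ' + value + b'\n')
    return result


# Mixed text and byte inputs all come out as utf-8 encoded byte lines.
lines = to_utf8_lines([(u'revno', b'42'), (b'committer', u'Ren\xe9')])
```

Normalising at the top of the loop keeps every later concatenation
bytes-only, which is what the to_lines docstring promises.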