[PATCH] Commit and log commands: strings encodings
Martin Pool
mbp at sourcefrog.net
Thu Dec 8 07:33:19 GMT 2005
On 24 Nov 2005, Alexander Belchenko <bialix at ukr.net> wrote:
> Here is a patch that provides extended support for string encodings used in
> the commit and log commands. This patch fixes 2 bugs.
>
> 1) Bug in the log command: bzr uses bzrlib.user_encoding instead of
> sys.stdout.encoding for encoding messages. As a result, on a Russian Windows
> machine (user_encoding == cp1251, but the console encoding and
> sys.stdout.encoding == cp866) Russian words show up as garbled characters.
>
> 2) Bug in the commit command: bzr does not decode the message from the external
> editor to unicode. As a result, the message is stored "as is", ampersand-escaped.
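Bug 1 is a plain encoder/decoder mismatch and can be reproduced without a Windows console. A minimal sketch using the encodings from the report; `as_seen_on_console` is a hypothetical helper that simulates the console by decoding the bytes with cp866:

```python
# Encodings from the report: bzr's user_encoding vs. the console's encoding.
USER_ENCODING = 'cp1251'    # what bzr used to encode log output
CONSOLE_ENCODING = 'cp866'  # what the Russian Windows console decodes with


def as_seen_on_console(text, writer_encoding, console_encoding=CONSOLE_ENCODING):
    """Encode text as the log command would, then decode it as the console does."""
    raw = text.encode(writer_encoding, 'replace')
    return raw.decode(console_encoding, 'replace')


message = 'привет'  # Russian for "hello"
garbled = as_seen_on_console(message, USER_ENCODING)     # mismatched: mojibake
correct = as_seen_on_console(message, CONSOLE_ENCODING)  # matches sys.stdout.encoding
```

Both codecs are single-byte, so the garbled string has the same length as the original; every byte is simply looked up in the wrong table.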
>
> This patch provides 2 new global options:
> --input-encoding or -i
> --output-encoding or -o
Thanks for the patch.
I agree with John that these probably don't need the -i and -o short
option names, at least for now. I would hope that we could set the
encoding from an environment variable or configuration option and have
it be right most of the time. I know people might need to override it
in some cases.
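An environment-variable default can be layered onto the fallback chain from the patch (`output_encoding or sys.stdout.encoding or bzrlib.user_encoding`). A minimal sketch, assuming a hypothetical `BZR_OUTPUT_ENCODING` variable (not an existing bzr setting) and using `locale.getpreferredencoding()` as a stand-in for `bzrlib.user_encoding`. Under Python 2, `sys.stdout.encoding` is generally `None` when output is piped to another program, so the final fallback matters for the inter-program case. Placing the environment variable above the stream's own encoding is a judgement call:

```python
import codecs
import locale
import os
import sys

# Hypothetical variable name for illustration; not an actual bzr setting.
ENV_VAR = 'BZR_OUTPUT_ENCODING'


def resolve_output_encoding(option=None, environ=None, stream=None):
    """Pick an output encoding: explicit option, then environment,
    then the stream's own encoding, then the locale's preferred encoding.

    Returns a normalized codec name such as 'cp866' or 'utf-8'.
    """
    if environ is None:
        environ = os.environ
    if stream is None:
        stream = sys.stdout
    candidates = [
        option,                              # --output-encoding
        environ.get(ENV_VAR),                # hypothetical user default
        getattr(stream, 'encoding', None),   # may be None when piped
        locale.getpreferredencoding(False),  # stand-in for bzrlib.user_encoding
    ]
    for name in candidates:
        if not name:
            continue
        try:
            # Normalizes aliases ('CP866' -> 'cp866') and rejects unknown names.
            return codecs.lookup(name).name
        except LookupError:
            continue  # skip a bogus setting instead of failing at write time
    return 'ascii'
```

For example, `resolve_output_encoding('CP866')` returns `'cp866'` regardless of the environment, because an explicit option always wins.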
>
> The input encoding is used to decode the commit message from the external
> editor AND from a file. Note: a commit message given on the command line
> (-m option) is always decoded to unicode with bzrlib.user_encoding.
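These decoding rules amount to a small dispatch on where the message came from. This is a sketch, not the patch's code: the `decode_commit_message` name and the `source` labels are hypothetical, and falling back to `user_encoding` when no `--input-encoding` is given is an assumption the patch does not spell out:

```python
import locale


def decode_commit_message(raw, source, input_encoding=None, user_encoding=None):
    """Return the commit message as text.

    source is one of the hypothetical labels 'editor', 'file' or 'command-line'.
    """
    if user_encoding is None:
        user_encoding = locale.getpreferredencoding(False)
    if isinstance(raw, str):
        return raw  # already text (e.g. sys.argv under Python 3)
    if source == 'command-line':
        # As described above: -m text is always decoded with user_encoding.
        encoding = user_encoding
    elif source in ('editor', 'file'):
        # Assumption: fall back to user_encoding if no input encoding is set.
        encoding = input_encoding or user_encoding
    else:
        raise ValueError('unknown message source: %r' % (source,))
    return raw.decode(encoding)
```

Under Python 3, `sys.argv` is already text, so the `-m` case usually arrives as `str` and passes through unchanged.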
>
> The output encoding is used for encoding log messages.
>
> The ability to customize string encodings is important for communication
> between programs.
>
> Please review my work. I know this patch lacks test cases, but testing
> this is very non-trivial; any hints are very much appreciated.
>
> Alexander
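One way to make the output path testable is to replace `sys.stdout` with an in-memory byte sink that advertises a console encoding, then compare the exact bytes written. A sketch, assuming a hypothetical `write_log` helper that stands in for the output step of the `run` method in the diff (the real method takes many more arguments):

```python
import codecs
import io
import unittest


def write_log(messages, stream, encoding):
    """Hypothetical stand-in for the log command's output step."""
    # Same construction as the patch: encode once, replace unencodable chars.
    outf = codecs.getwriter(encoding)(stream, errors='replace')
    for msg in messages:
        outf.write(msg + '\n')


class FakeConsole(io.BytesIO):
    """In-memory byte stream that advertises an encoding, like a console."""

    def __init__(self, encoding):
        super().__init__()
        self.encoding = encoding


class LogEncodingTests(unittest.TestCase):

    def test_log_uses_console_encoding(self):
        console = FakeConsole('cp866')
        write_log(['привет'], console, console.encoding)
        self.assertEqual(console.getvalue(), 'привет\n'.encode('cp866'))

    def test_unencodable_characters_are_replaced(self):
        console = FakeConsole('cp866')
        write_log(['snowman \u2603'], console, console.encoding)
        self.assertEqual(console.getvalue(), b'snowman ?\n')
```

Saved as a module, this runs under `python -m unittest`; no real terminal or Windows code page is needed.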
> === modified file 'bzrlib\\builtins.py'
> --- bzrlib\builtins.py
> +++ bzrlib\builtins.py
> @@ -831,6 +831,7 @@
> help='show revisions whose message matches this regexp',
> type=str),
> Option('short', help='use moderately short format'),
> + 'output-encoding',
> ]
> @display_command
> def run(self, filename=None, timezone='original',
> @@ -841,7 +842,8 @@
> message=None,
> long=False,
> short=False,
> - line=False):
> + line=False,
> + output_encoding=None):
> from bzrlib.log import log_formatter, show_log
> import codecs
> assert message is None or isinstance(message, basestring), \
> @@ -878,11 +880,12 @@
> if rev2 == 0:
> rev2 = None
>
> - mutter('encoding log as %r' % bzrlib.user_encoding)
> + output_encoding = output_encoding or sys.stdout.encoding or bzrlib.user_encoding
> + mutter('encoding log as %r' % output_encoding)
>
> # use 'replace' so that we don't abort if trying to write out
> # in e.g. the default C locale.
> - outf = codecs.getwriter(bzrlib.user_encoding)(sys.stdout, errors='replace')
> + outf = codecs.getwriter(output_encoding)(sys.stdout, errors='replace')
It seems like there is some danger that sys.stdout will end up
doubly-encoded, if it already has an encoding and then we create another
object wrapping it. Maybe not?
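Under Python 2, which this patch targets, `sys.stdout` is a plain byte file whose `write` does not re-encode a byte string, which suggests the wrapper in the patch encodes only once. Under Python 3's layered I/O, `sys.stdout` is a text stream, and handing it the bytes produced by a `codecs` writer fails with `TypeError`. A sketch that writes to the underlying byte buffer when there is one; `wrap_for_log` is a hypothetical name, not part of bzrlib:

```python
import codecs
import io


def wrap_for_log(stream, encoding):
    """Return a writer that encodes text with `encoding` exactly once.

    If `stream` is a text stream with an underlying byte buffer (as
    sys.stdout is under Python 3), write to the buffer instead, so the
    text layer never receives bytes it would reject.
    """
    byte_stream = getattr(stream, 'buffer', None)
    if byte_stream is None:
        byte_stream = stream  # already a byte stream, e.g. io.BytesIO
    else:
        stream.flush()  # push any pending text out before writing raw bytes
    return codecs.getwriter(encoding)(byte_stream, errors='replace')


raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='utf-8')  # simulated Python 3 stdout
out = wrap_for_log(console, 'cp866')
out.write('привет\n')  # encoded once, as cp866, straight into `raw`
```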
--
Martin
More information about the bazaar mailing list