[PATCH] Commit and log commands: string encodings

Thu Dec 8 07:33:19 GMT 2005

On 24 Nov 2005, Alexander Belchenko <bialix at ukr.net> wrote:
> Here is a patch that provides extended support for the string encodings used
> in the commit and log commands. It fixes 2 bugs.
> 
> 1) Bug in the log command: bzr uses bzrlib.user_encoding instead of
> sys.stdout.encoding to encode messages. As a result, on a Russian Windows
> machine (user_encoding == cp1251, but the console encoding and
> sys.stdout.encoding == cp866), Russian words are displayed as gibberish.
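A short Python 3 illustration of the mismatch described in bug 1 (this is not bzr code): the same text is encoded with cp1251, the Windows ANSI code page that `user_encoding` reports on such a machine, and then decoded the way a cp866 console would read those bytes. The helper name `show_mismatch` is made up for this sketch.

```python
def show_mismatch(text, writer_encoding, console_encoding):
    """Encode text as the writer does, then decode it as the console would."""
    raw = text.encode(writer_encoding)
    # errors="replace" so an unmapped byte cannot raise an exception here
    return raw.decode(console_encoding, errors="replace")


message = "привет"
garbled = show_mismatch(message, "cp1251", "cp866")  # what the bug produces
correct = show_mismatch(message, "cp866", "cp866")   # writer matches the console
print(garbled)
print(correct)
```

Only the second call round-trips, which is why the patch prefers sys.stdout.encoding over user_encoding for log output.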
> 
> 2) Bug in the commit command: bzr does not decode the message from the
> external editor to unicode. As a result, the message is stored as-is, with
> ampersand escaping.
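A minimal sketch of the missing decoding step in bug 2, written in Python 3 (where `str` is unicode; bzr at the time used Python 2 `unicode` objects). The function `read_editor_message` is hypothetical, not part of bzrlib; a temporary file stands in for the file the external editor saves, and cp1251 is assumed as the editor's encoding.

```python
import os
import tempfile


def read_editor_message(path, input_encoding):
    """Read the raw bytes an external editor saved and decode them to unicode."""
    with open(path, "rb") as f:
        raw = f.read()
    # This decode is the step the bug report says is missing.
    return raw.decode(input_encoding)


# Simulate an editor on a Russian Windows machine saving its file in cp1251.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("исправлена ошибка".encode("cp1251"))
try:
    msg = read_editor_message(path, "cp1251")
finally:
    os.remove(path)
print(msg)
```

Decoding with the wrong encoding here would not fail loudly for most single-byte code pages; it would silently produce the wrong characters, so the choice of input encoding matters.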
> 
> This patch provide 2 new global options:
> --input-encoding or -i
> --output-encoding or -o

Thanks for the patch.

I agree with John that these probably don't need the -i and -o short
option names, at least for now.  I would hope that we could set the
encoding from an environment variable or configuration option and have
it be right most of the time.  I know people might need to override it
for some cases.
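One way the precedence described above could look, as a sketch only: an explicit command-line value wins, then an environment variable, then the platform's preferred encoding from `locale.getpreferredencoding`. The variable name `BZR_ENCODING` and the function `resolve_encoding` are hypothetical, and a configuration-file lookup is left out.

```python
import locale


def resolve_encoding(option_value=None, environ=None, var_name="BZR_ENCODING"):
    """Pick an encoding: explicit option, then environment variable, then locale.

    BZR_ENCODING is a hypothetical variable name used only for illustration.
    """
    if option_value:
        return option_value  # per-command override wins
    if environ and environ.get(var_name):
        return environ[var_name]  # user or site default
    return locale.getpreferredencoding(False)  # platform guess


env = {"BZR_ENCODING": "cp866"}
print(resolve_encoding(environ=env))           # → cp866
print(resolve_encoding("utf-8", environ=env))  # → utf-8
```

Passing the environment in as a mapping (rather than reading `os.environ` inside the function) keeps the lookup easy to test.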

> 
> The input encoding is used to decode the commit message from the external
> editor AND from a file. Note: a commit message given on the command line
> (-m option) is always decoded to unicode with bzrlib.user_encoding.
> 
> The output encoding is used to encode log messages.
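The encoding choices described above, restated as two small Python functions. The output chain copies the expression from the patch (`output_encoding or sys.stdout.encoding or bzrlib.user_encoding`); falling back to `user_encoding` for editor and file input when no `--input-encoding` is given is an assumption, since the commit-side part of the patch is not quoted here. Both function names are hypothetical.

```python
def input_encoding_for(source, input_encoding, user_encoding):
    """Encoding used to decode a commit message, as the patch describes it."""
    if source == "command-line":
        # Text given with -m is always decoded with bzrlib.user_encoding.
        return user_encoding
    # Editor or file input uses --input-encoding; falling back to
    # user_encoding when it is not given is an assumption of this sketch.
    return input_encoding or user_encoding


def output_encoding_for(output_encoding, stdout_encoding, user_encoding):
    """Fallback chain used by the patched log command."""
    # Mirrors: output_encoding or sys.stdout.encoding or bzrlib.user_encoding
    return output_encoding or stdout_encoding or user_encoding


# The Russian Windows case from the report.
print(input_encoding_for("editor", None, "cp1251"))        # → cp1251
print(output_encoding_for(None, "cp866", "cp1251"))        # → cp866
```

Because `sys.stdout.encoding` is `None` when output is redirected to a file or pipe, the final `user_encoding` fallback is what keeps the chain from ending in `None`.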
> 
> The ability to customize string encodings is important for communication
> between programs.
> 
> Please review my work. I know this patch lacks test cases, but tests for
> this are very non-trivial; any hints are much appreciated.
> 
> Alexander

> === modified file 'bzrlib\\builtins.py'
> --- bzrlib\builtins.py
> +++ bzrlib\builtins.py
> @@ -831,6 +831,7 @@
>                              help='show revisions whose message matches this regexp',
>                              type=str),
>                       Option('short', help='use moderately short format'),
> +                     'output-encoding',
>                       ]
>      @display_command
>      def run(self, filename=None, timezone='original',
> @@ -841,7 +842,8 @@
>              message=None,
>              long=False,
>              short=False,
> -            line=False):
> +            line=False,
> +            output_encoding=None):
>          from bzrlib.log import log_formatter, show_log
>          import codecs
>          assert message is None or isinstance(message, basestring), \
> @@ -878,11 +880,12 @@
>          if rev2 == 0:
>              rev2 = None
>  
> -        mutter('encoding log as %r' % bzrlib.user_encoding)
> +        output_encoding = output_encoding or sys.stdout.encoding or bzrlib.user_encoding
> +        mutter('encoding log as %r' % output_encoding)
>  
>          # use 'replace' so that we don't abort if trying to write out
>          # in e.g. the default C locale.
> -        outf = codecs.getwriter(bzrlib.user_encoding)(sys.stdout, errors='replace')
> +        outf = codecs.getwriter(output_encoding)(sys.stdout, errors='replace')

It seems like there is some danger that sys.stdout will end up doubly
encoded if it already has an encoding and we then wrap it in another
encoding writer.  Maybe not?
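A small Python 3 experiment bearing on this question: a StreamWriter from `codecs.getwriter(...)` encodes each unicode string once and passes the resulting bytes to the stream it wraps. With a byte stream underneath (`io.BytesIO` here, standing in for stdout), those bytes arrive unchanged, so no second encoding happens at this layer. In Python 3, `sys.stdout` is itself a text stream, so a writer like this would need to wrap `sys.stdout.buffer`; under Python 2, which bzr used, byte strings written to `sys.stdout` pass through as-is.

```python
import codecs
import io

raw = io.BytesIO()  # stands in for the byte stream underneath stdout
outf = codecs.getwriter("cp866")(raw, errors="replace")
outf.write("журнал")
outf.write(" \u2713")  # a check mark has no cp866 mapping
data = raw.getvalue()
print(data)
```

As in the patch, `errors='replace'` turns the unmappable check mark into `?` instead of raising `UnicodeEncodeError`, and the stored bytes decode back with cp866 alone.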

-- 
Martin