Unicode (UTF-16) files on Windows

Martin Pool mbp at canonical.com
Thu Aug 20 09:57:58 BST 2009


2009/8/20 Philippe Lhoste <PhiLho at gmx.net>:

> How come Bazaar doesn't properly handle UTF-16 with a BOM? Maybe you could add
> detection of the BOM to the binary-file detection heuristic?

The change should be very small if you want to try it; the check is in textfile.py.
Just comment out the checks and see whether it works better in your case.
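To show why a BOM check would help, here is a minimal sketch. It assumes that the binary test works like a common heuristic that treats any NUL byte in the first 1024 bytes as a sign of binary content. The actual code in textfile.py is not quoted here and may differ. UTF-16 text that is mostly ASCII has a NUL in every other byte, so it trips such a check. Skipping the check when the data starts with a UTF-16 BOM avoids this. The function name `looks_binary` is hypothetical.

```python
import codecs

# Byte order marks that identify UTF-16 data (little- and big-endian).
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_binary(data, sample_size=1024):
    """Hypothetical binary-file heuristic, not the real textfile.py code.

    Treats `data` (bytes) as binary if a NUL byte appears in the first
    `sample_size` bytes, unless the data starts with a UTF-16 BOM.
    """
    if data.startswith(UTF16_BOMS):
        # UTF-16 text legitimately contains NUL bytes, so trust the BOM.
        return False
    return b"\x00" in data[:sample_size]


# Python's "utf-16" codec prepends a BOM, so this is treated as text.
print(looks_binary("hello\n".encode("utf-16")))     # False
# Same text without a BOM: the NUL bytes make it look binary.
print(looks_binary("hello\n".encode("utf-16-le")))  # True
print(looks_binary(b"plain ascii\n"))               # False
```

Trusting the BOM is a trade-off. A genuinely binary file that happens to start with `FF FE` or `FE FF` would then be treated as text.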

> Of course, this means other commands (like cat) would need to understand UTF-16
> as well, so it might be more work than it seems.

I think cat will probably be OK, but the internal diff might need
some help to read the file as UTF-16 and parse it into lines properly.

I think we'd need not just to recognize that this is a text file, but
to key off the BOM to know that it should be decoded as UTF-16.
textfile.py should probably provide an interface that returns a sequence of
Unicode lines.
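The following is a minimal sketch of what such an interface might look like. `unicode_lines` is a hypothetical name, not an existing textfile.py function, and it only handles UTF-16 BOMs. Anything without a BOM is decoded with an assumed default encoding. The file is split on newline characters with the line endings kept, so Windows `\r\n` endings stay attached to each line. For the usage example, Python's `difflib.unified_diff` stands in for Bazaar's internal diff.

```python
import codecs
import difflib

# Byte order marks that identify UTF-16 data (little- and big-endian).
_UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def unicode_lines(data, default_encoding="utf-8"):
    """Hypothetical helper, not an existing textfile.py function.

    Decode `data` (bytes) as UTF-16 if it starts with a UTF-16 BOM,
    otherwise with `default_encoding`. Return a list of Unicode lines
    split on newline characters, with line endings kept.
    """
    if data.startswith(_UTF16_BOMS):
        # The "utf-16" codec reads the BOM to pick the byte order and
        # strips it from the decoded text.
        text = data.decode("utf-16")
    else:
        text = data.decode(default_encoding)
    pieces = text.split("\n")
    lines = [piece + "\n" for piece in pieces[:-1]]
    if pieces[-1]:
        # Final line without a trailing newline.
        lines.append(pieces[-1])
    return lines


# Usage: diff two UTF-16 revisions line by line, using difflib as a
# stand-in for Bazaar's own diff code.
old = "line one\nline two\n".encode("utf-16")
new = "line one\nline 2\n".encode("utf-16")
diff = list(difflib.unified_diff(unicode_lines(old), unicode_lines(new),
                                 "a/file.txt", "b/file.txt"))
print("".join(diff), end="")
```

Returning decoded lines means the diff code no longer needs to know about encodings. The cost is that the output must be re-encoded (for example, back to UTF-16) if it is written into a patch that should match the file's original bytes.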

-- 
Martin <http://launchpad.net/~mbp/>



More information about the bazaar mailing list