Unicode (UTF-16) files on Windows
Martin Pool
mbp at canonical.com
Thu Aug 20 09:57:58 BST 2009
2009/8/20 Philippe Lhoste <PhiLho at gmx.net>:
> How come Bazaar doesn't properly handle UTF-16 with a BOM? Maybe you can add
> detection of the BOM to the binary file detection heuristic?
It should be a very small change if you want to try this - the check is in textfile.py.
Just comment out the checks and see if it works better in your case.
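
As a rough sketch of the BOM-aware check being suggested (this is not
the actual textfile.py code; looks_binary is a hypothetical helper, and
it assumes the existing heuristic is a NUL-byte test on the first chunk
of the file):

```python
import codecs

# Byte sequences that mark the start of a UTF-16 file.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_binary(head):
    """Return True if the leading bytes of a file look like binary data.

    Hypothetical helper: 'head' is the first chunk of the file as bytes.
    """
    # UTF-16 text is full of NUL bytes, so trust the BOM before testing
    # for them. (UTF-32 is ignored here; its little-endian BOM also
    # starts with the UTF-16 LE BOM.)
    if head.startswith(UTF16_BOMS):
        return False
    # Assumed heuristic: any NUL byte means the file is binary.
    return b"\x00" in head
```

With this, a file starting with FF FE or FE FF is treated as text even
though every ASCII character in it is followed or preceded by a NUL byte.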
> Of course, it means other commands (like cat) should understand UTF-16 as well,
> so it might imply more work than it seems.
I think cat will probably be OK, but the internal diff might need
some help to read the file as UTF-16 and properly parse it into lines.
I think we'd need to not just recognize that this is a text file, but
key off the BOM to know that it should be decoded as UTF-16. Probably
textfile.py should provide an interface that gives back a sequence of
Unicode lines.
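
A minimal sketch of such an interface, assuming the whole file is
available as bytes (unicode_lines and default_encoding are hypothetical
names, not an existing Bazaar API):

```python
import codecs


def unicode_lines(data, default_encoding="utf-8"):
    """Decode raw file bytes into a list of Unicode lines.

    Hypothetical helper: the encoding is keyed off a UTF-16 BOM, with
    'default_encoding' used when no BOM is present.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the BOM and uses it to pick the
        # byte order.
        text = data.decode("utf-16")
    else:
        text = data.decode(default_encoding)
    # Keep the line endings so the original text can be reassembled.
    # Note that str.splitlines also breaks on some other Unicode line
    # boundaries, not only "\n" and "\r\n".
    return text.splitlines(keepends=True)
```

Decoding before splitting matters because a newline in UTF-16 is two
bytes (0A 00 or 00 0A), so splitting the raw bytes on a single "\n" byte
leaves stray NUL bytes at the line boundaries.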
--
Martin <http://launchpad.net/~mbp/>