[RFC] more encodings tests [was: bzr handles unicode]

Jan Hudec bulb at ucw.cz
Sun Jan 8 18:31:53 GMT 2006


On Sun, Jan 08, 2006 at 09:10:26 -0600, John Arbash Meinel wrote:
> Alexander Belchenko wrote:
> > John Arbash Meinel wrote:
> > 
> >>> Sure. Here is list of not 'OK' tests in blackbox (for r1539):
> >>
> >>
> >> Wow, it looks like they all fail. Can you give me a single traceback, so
> >> I can figure out where it is failing?
> > 
> > 
> > Here is a zip archive with the test.log I get when I run:
> > 
> > python bzr --no-plugins selftest blackbox -v > test.log
> > 
> > -- 
> > Alexander
> 
> Well, it looks like the ones that fail are the ones which expect your
> bzrlib.user_encoding to be able to handle european characters, which we
> already know it won't (since you can't handle Erik's name.)
> 
> I'm trying to figure out what the best solution is.
> I could try a few character sets (right now I have Swedish, Arabic,
> Kanji, and Russian).
> And do a couple different tests to evaluate what the current encoding is
> able to handle, and then just use those characters in the rest of the test.
>
> Does that seem like it is still a valid test? On platforms which support
> more (like a utf-8 platform), it could try to use all of the different
> character sets.

I think the tests should always use all of the different character sets.
Python can always do the recoding itself, so it should be possible to force
user_encoding to an encoding that can represent each test sample. The only
problem would be the filesystem encoding. On Windows it is fixed to mbcs, so
test with that. On Unix, if you make sure the base path is ASCII-only, you
could probably force it to any ASCII-compatible encoding (which should be all
of them except utf-16).
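
A minimal sketch of that idea, written for Python 3 and not taken from the bzr test suite. The `fake_bzrlib` object is a stand-in for the real module (only a `user_encoding` attribute is modelled), and the `SAMPLES` table, `forced_user_encoding`, `round_trips` and `is_ascii_transparent` are hypothetical names introduced here for illustration:

```python
import contextlib
import types

# Stand-in for the bzrlib module; only the user_encoding attribute is modelled.
fake_bzrlib = types.SimpleNamespace(user_encoding='ascii')

# Hypothetical sample table: encoding name -> text that encoding can represent.
SAMPLES = {
    'iso-8859-1': u'R\xe4ksm\xf6rg\xe5s',
    'iso-8859-2': u'k\u016f\u0148',
    'koi8-r': u'\u041f\u0440\u0438\u0432\u0435\u0442',
}


@contextlib.contextmanager
def forced_user_encoding(module, encoding):
    """Temporarily set module.user_encoding, restoring the old value afterwards."""
    old = module.user_encoding
    module.user_encoding = encoding
    try:
        yield
    finally:
        module.user_encoding = old


def round_trips(text, encoding):
    """Return True if text survives an encode/decode cycle in the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


def is_ascii_transparent(path, encoding):
    """Return True if an ASCII-only path has the same bytes as in plain ASCII."""
    return path.encode(encoding) == path.encode('ascii')


# Run every sample under the encoding it was chosen for.
results = {}
for encoding, text in SAMPLES.items():
    with forced_user_encoding(fake_bzrlib, encoding):
        results[encoding] = round_trips(text, fake_bzrlib.user_encoding)
```

The `is_ascii_transparent` helper illustrates the base-path point: an ASCII-only path encodes to the same bytes in koi8-r or iso-8859-2 as in ASCII, while utf-16 produces different bytes.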

Btw, here is a sentence in Czech; it should be encodable in iso-8859-2:
u'\u017dlu\u0165ou\u010dk\xfd k\u016f\u0148 \xfap\u011bl \u010f\xe1belsk\xe9 k\xf3dy'
('Žluťoučký kůň úpěl ďábelské kódy')
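
One way to check the sample, as a sketch in Python 3 (the `can_encode` helper is a hypothetical name, not part of any library): encode the sentence in iso-8859-2, decode it back, and confirm that narrower encodings reject it.

```python
sentence = (u'\u017dlu\u0165ou\u010dk\xfd k\u016f\u0148 '
            u'\xfap\u011bl \u010f\xe1belsk\xe9 k\xf3dy')

# iso-8859-2 is a single-byte encoding, so each character becomes one byte.
encoded = sentence.encode('iso-8859-2')
single_byte = len(encoded) == len(sentence)
restored = encoded.decode('iso-8859-2')


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

Characters such as U+017D fall outside iso-8859-1 and ASCII, so a test using this sentence only makes sense when the forced encoding is one that covers Czech.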

-- 
						 Jan 'Bulb' Hudec <bulb at ucw.cz>