unicode symlink_target handling
John Arbash Meinel
john at arbash-meinel.com
Thu Jun 5 22:41:20 BST 2008
Just a quick note...
It seems we are very loose with our 'symlink_target' handling. Specifically, we
tend to treat it as an 8-bit string, except that we store it in XML, which would
break if the target were non-ascii.
If you grep the code base for "\<readlink\>" you'll find a few places that are
using it:
  DirState._read_link() returns the raw os.readlink()
  _PreviewTree.get_symlink_target() returns the raw os.readlink()
  bzrlib.transform._content_match() also uses the raw os.readlink()
  WorkingTree.path_content_summary() uses the raw os.readlink()
  WT.get_symlink_target() does as well.
In fact, I didn't find any places that encode the symlink target.
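For illustration only, decoding at the readlink boundary might look like the minimal sketch below. It is written for modern Python 3, not taken from bzrlib: the helper name read_symlink_target_unicode and its fs_encoding parameter are hypothetical, and it assumes the target bytes on disk really are in the given encoding.

```python
import os
import sys


def read_symlink_target_unicode(path, fs_encoding=None):
    """Return the target of the symlink at `path` as a text string.

    Hypothetical helper, not part of bzrlib.
    """
    if fs_encoding is None:
        # Assumption: targets are stored in the filesystem encoding.
        fs_encoding = sys.getfilesystemencoding()
    # Passing a bytes path makes os.readlink() return the raw target bytes.
    raw_target = os.readlink(os.fsencode(path))
    # Decode explicitly so callers always get text, never raw bytes.
    return raw_target.decode(fs_encoding)
```

A decode error here would surface a badly encoded target at the point where it is read, rather than later when the inventory is serialized.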
Now WT4._generate_inventory() does do:
  inv_entry.symlink_target = utf8_decode(fingerprint)[0]
But DirStateRevisionTree.get_symlink_target() just does:
  # At present, none of the tree implementations supports non-ascii
  # symlink targets. So we will just assume that the dirstate path is
  # correct.
  return entry[1][parent_index][1]
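The mismatch between these two code paths can be shown in isolation. This is a sketch in Python 3 under stated assumptions: the standard codecs.getdecoder('utf-8') stands in for bzrlib's utf8_decode (it returns a (text, bytes_consumed) pair, hence the [0]), and the fingerprint value is made up.

```python
import codecs

# Assumption: stands in for bzrlib's utf8_decode.
utf8_decode = codecs.getdecoder('utf-8')

# Made-up dirstate value: the UTF-8 bytes of a non-ascii symlink target.
fingerprint = b'caf\xc3\xa9'

# Roughly what _generate_inventory() stores: decoded text.
decoded_target = utf8_decode(fingerprint)[0]

# Roughly what DirStateRevisionTree.get_symlink_target() returns: raw bytes.
raw_target = fingerprint

# Same symlink, two different types depending on the code path.
print(type(decoded_target).__name__, type(raw_target).__name__)
```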
xml8.write_inventory uses:
  append('<symlink file_id="%s name="%s%s%s revision="%s '
         'symlink_target="%s />\n' % (
      _encode_and_escape(ie.file_id),
      _encode_and_escape(ie.name),
      parent_str, parent_id,
      _encode_and_escape(ie.revision),
      _encode_and_escape(ie.symlink_target)))
_encode_and_escape uses XML character references rather than something like
UTF-8 (&#229; instead of u'\xe5'). So cElementTree would read those back as
Unicode objects if they contain non-ascii characters, or plain 8-bit strings if
they don't.
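The escaping and the read-back can be sketched with the standard library alone. This is a Python 3 illustration, not the bzrlib implementation: escape_attribute_ascii is a hypothetical stand-in for _encode_and_escape. Note that Python 3's ElementTree always returns text, so the str/unicode split described above is specific to Python 2.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_attribute_ascii(text):
    """Escape `text` for a double-quoted XML attribute using only ASCII bytes.

    Hypothetical stand-in for _encode_and_escape, not the bzrlib code.
    """
    # Escape &, <, > and the double quote first.
    escaped = escape(text, {'"': '&quot;'})
    # Non-ascii characters become numeric character references (e.g. &#229;).
    return escaped.encode('ascii', 'xmlcharrefreplace')


attr = escape_attribute_ascii('\xe5/target')
document = b'<symlink symlink_target="' + attr + b'" />'

# Parsing resolves the character reference back to the original character.
element = ET.fromstring(document)
round_tripped = element.get('symlink_target')
```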
I suppose the bug is that we just don't support non-ascii symlink targets. I'm
just trying to work out the right solution for:
  https://bugs.launchpad.net/bzr/+bug/135320
Because *sometimes* the symlink_target is a Unicode object, and sometimes it is
a plain 'str'. I suppose I'll do the easy thing and just str(fingerprint) since
the rest of the code doesn't support non-ascii symlink targets.
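As a rough Python 3 analogue of that "easy thing", the sketch below coerces either type to ASCII-only bytes and fails loudly on anything else. The name ascii_symlink_target is hypothetical, and this is not the change that went into bzrlib.

```python
def ascii_symlink_target(target):
    """Coerce `target` (bytes or text) to ASCII-only bytes.

    Hypothetical helper. Raises UnicodeError for non-ascii input, which
    matches the assumption that non-ascii targets are unsupported.
    """
    if isinstance(target, bytes):
        # Validation only: raises UnicodeDecodeError on non-ascii bytes.
        target.decode('ascii')
        return target
    # Raises UnicodeEncodeError on non-ascii text.
    return target.encode('ascii')
```

Callers then always see one type, and a non-ascii target is reported as an error instead of silently changing type between code paths.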
John
=:->