New format checklist
Aaron Bentley
aaron.bentley at utoronto.ca
Tue Jan 10 01:53:06 GMT 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Pool wrote:
| Frankly I would still prefer we simply constrain the characters that can
| be used in file-ids to something believed safe in relevant contexts, and
| so avoid a whole encoding/decoding stage. Are other people still
| strongly opposed?
|
| As I recall the arguments are:
|
| * we might guess wrong, and allow a character which turns out not to be
| permitted in some relevant context, so we need to add escaping after all
|
| * There is existing data from baz2bzr that uses ':' in file-ids.
It's not merely the existing data. It's also about presenting a useful
API. If the API is constrained to a particular subset of characters, then
any client that wants to store other characters must define its own
escaping mechanism. Since that escaping would be external to bzr, tools
like log --show-ids would print the escaped form instead of something
meaningful, when they otherwise could.
It's also about the colon in particular, because it is probably the most
common namespace separator.
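For comparison, the "constrain the characters" option could be as small as a whitelist check when an id is created. A minimal sketch in Python, assuming a hypothetical whitelist of ASCII letters, digits and '._@-' (not bzr's actual rule) and hypothetical helper names. Note that it rejects exactly the colon-separated ids mentioned above:

```python
import re

# Hypothetical whitelist: ASCII letters, digits and '._@-'.
# This is NOT bzr's real rule; it only illustrates the "constrain the
# characters" option. Any id containing ':' is rejected.
SAFE_FILE_ID = re.compile(r'[A-Za-z0-9._@-]+\Z')


def is_safe_file_id(file_id):
    """Return True if file_id is non-empty and every character is whitelisted."""
    return SAFE_FILE_ID.match(file_id) is not None


def check_file_id(file_id):
    """Return file_id unchanged, or raise ValueError if it is outside the whitelist."""
    if not is_safe_file_id(file_id):
        raise ValueError('unsafe character in file id: %r' % (file_id,))
    return file_id
```

A client that needs ':' would then have to invent its own escaping on top of this, which is the objection raised above.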
| If this is done, it's probably a storage format thing: the store
| specifies that an id is actually written to the transport escaped in a
| particular way.
Yes, exactly.
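A minimal sketch of escaping as a storage-format detail, in Python. The helper names and the in-memory dict standing in for a transport are hypothetical. The escaping uses urllib.parse.quote with an empty safe set, so only unreserved ASCII (letters, digits, '_.-~') passes through unchanged. Callers, and anything like log --show-ids, only ever see the raw id:

```python
from urllib.parse import quote, unquote


def id_to_storage_name(file_id):
    # Escape everything except unreserved ASCII; non-ASCII is UTF-8 encoded first.
    return quote(file_id, safe='')


def storage_name_to_id(name):
    # Inverse of id_to_storage_name (decodes %XX sequences as UTF-8).
    return unquote(name)


class EscapingStore:
    """Hypothetical store: the API takes raw ids, only stored names are escaped."""

    def __init__(self):
        self._files = {}  # stands in for a Transport: escaped name -> bytes

    def put(self, file_id, data):
        self._files[id_to_storage_name(file_id)] = data

    def get(self, file_id):
        return self._files[id_to_storage_name(file_id)]

    def ids(self):
        # Ids are unescaped on the way out, so callers never see '%3A'.
        return sorted(storage_name_to_id(name) for name in self._files)
```

The design point is that the escaping lives in one place (the store format), so a later format could change it without touching the API.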
| If we did use %-escaping then those characters will need to be
| doubly-escaped when sent over http. So a Unicode character can expand
| to 3 utf-8 bytes, each of which is 3 quoted bytes '%ab'. Then to send
| that over http requires the % characters to be quoted again, expanding
| each to 3 bytes. So each unicode character can expand to 15 bytes,
| which is faintly ridiculous.
Agreed, it is ridiculous. I think I can live with it, though.
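The expansion arithmetic can be checked with Python's urllib.parse.quote, assuming ids are %-escaped once by the store and the resulting '%' signs are escaped again (as %25) when placed in an HTTP URL. The helper name is hypothetical:

```python
from urllib.parse import quote


def escaped_lengths(ch):
    """Return (UTF-8 bytes, length after one %-escape, length after a second)."""
    once = quote(ch, safe='')     # e.g. '\u20ac' -> '%E2%82%AC'
    twice = quote(once, safe='')  # each '%' becomes '%25'
    return len(ch.encode('utf-8')), len(once), len(twice)


for ch in ['a', ':', '\u00e9', '\u20ac', '\U0001F600']:
    print(repr(ch), escaped_lengths(ch))
```

A 3-byte character such as U+20AC reaches 15 bytes, as described; a plain ':' goes from 1 byte to 5, and characters outside the Basic Multilingual Plane take 4 UTF-8 bytes and so reach 20.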
Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFDwxOC0F+nu1YWqI0RAt9OAJ475SZdyXNRfKSY0r1jVfuKmtmPCQCfQoT2
lXk1OWk8+jeCU1c/RJO/CLQ=
=XxtV
-----END PGP SIGNATURE-----
More information about the bazaar mailing list