New format checklist

Tue Jan 10 01:53:06 GMT 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
| Frankly I would still prefer we simply constrain the characters that can
| be used in file-ids to something believed safe in relevant contexts, and
| so avoid a whole encoding/decoding stage.  Are other people still
| strongly opposed?
|
| As I recall the arguments are:
|
|  * we might guess wrong, and allow a character which turns out not to be
| permitted in some relevant context, so we need to add escaping after all
|
|  * There is existing data from baz2bzr that uses ':' in file-ids.

It's not merely the existing data.  It's also about presenting a useful
API.  If the API is constrained to a particular subset, then any client
that wants to store other characters must define its own escaping
mechanism.  Since that mechanism is external, tools like log --show-ids
will show the escaped form rather than something meaningful, when they
otherwise could.

It's also about the colon in particular, because it is probably the most
common namespace separator.
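To make the trade-off concrete, here is a sketch of what a client would face under the constrain-the-characters proposal. It assumes a "believed safe" alphabet of [A-Za-z0-9._-], which the thread does not actually specify, and a hypothetical client-side escaping scheme (`client_escape` / `client_unescape` are illustrative names, not baz2bzr's real behaviour). The escaped id passes the core check, but any tool that doesn't know the client's scheme can only display the mangled form.

```python
import re

# ASSUMPTION: the thread names no concrete "safe" set; this one is illustrative.
SAFE_FILE_ID = re.compile(r"[A-Za-z0-9._-]+")


def check_file_id(file_id: str) -> None:
    """Core-side check under the constrain-the-characters proposal."""
    if not SAFE_FILE_ID.fullmatch(file_id):
        raise ValueError(f"unsafe file-id: {file_id!r}")


def client_escape(raw: str) -> str:
    """Hypothetical client scheme: '_' plus two hex digits per UTF-8 byte
    for anything outside [A-Za-z0-9.-].  '_' is itself escaped so that the
    mapping stays reversible."""
    out = []
    for ch in raw:
        if ch.isascii() and (ch.isalnum() or ch in ".-"):
            out.append(ch)
        else:
            out.extend(f"_{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


def client_unescape(escaped: str) -> str:
    """Invert client_escape by collecting bytes and decoding as UTF-8."""
    data = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "_":
            data.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            data += escaped[i].encode("ascii")
            i += 1
    return data.decode("utf-8")


raw = "arch:foo--bar"            # colon used as a namespace separator
stored = client_escape(raw)
check_file_id(stored)            # the escaped form passes the core check
print(stored)                    # arch_3afoo--bar  (all a scheme-unaware tool can show)
assert client_unescape(stored) == raw
```

Every client that needs a colon would have to reinvent something like this, and each scheme would be opaque to the others and to the core tools.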

| If this is done, it's probably a storage format thing: the store
| specifies that an id is actually written to the transport escaped in a
| particular way.

Yes, exactly.
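The storage-format approach keeps the API unconstrained and escapes only where ids meet the transport. Below is a minimal sketch, assuming %-escaping via Python's `urllib.parse.quote` (the candidate scheme raised next in the thread). `EscapingStore` and its methods are hypothetical, not bzrlib's API, and a plain dict stands in for the transport.

```python
from urllib.parse import quote, unquote


class EscapingStore:
    """Hypothetical store: arbitrary Unicode file-ids at the API level,
    %-escaped only when turned into transport names."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}   # stands in for the transport

    @staticmethod
    def _to_transport(file_id: str) -> str:
        # quote() encodes to UTF-8 first; safe="" escapes '/' as well.
        return quote(file_id, safe="")

    @staticmethod
    def _from_transport(name: str) -> str:
        return unquote(name)

    def put(self, file_id: str, text: bytes) -> None:
        self._files[self._to_transport(file_id)] = text

    def get(self, file_id: str) -> bytes:
        return self._files[self._to_transport(file_id)]

    def file_ids(self) -> list[str]:
        return [self._from_transport(n) for n in self._files]


store = EscapingStore()
store.put("arch:foo--bar", b"hello")
print(list(store._files))   # ['arch%3Afoo--bar']  on the "transport"
print(store.file_ids())     # ['arch:foo--bar']    through the API
```

Because the escaping belongs to the store format, callers (and tools like log --show-ids) only ever see the original id.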

| If we did use %-escaping then those characters would need to be
| doubly escaped when sent over HTTP.  So a Unicode character can expand
| to 3 UTF-8 bytes, each of which becomes 3 quoted bytes '%ab'.  Then to send
| that over HTTP requires the % characters to be quoted again, expanding
| each to 3 bytes.  So each Unicode character can expand to 15 bytes,
| which is faintly ridiculous.

Agreed, it is ridiculous.  I think I can live with it, though.
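The expansion arithmetic quoted above can be checked directly. This sketch assumes standard %-escaping with Python's `urllib.parse.quote` at both the storage and the HTTP layer; the `expansion` helper is hypothetical.

```python
from urllib.parse import quote


def expansion(ch: str) -> tuple[int, int, int]:
    """Return (UTF-8 bytes, after one %-escape, after escaping the '%' again)."""
    utf8 = ch.encode("utf-8")
    once = quote(ch, safe="")      # e.g. '%E2%82%AC'
    twice = quote(once, safe="")   # '%' -> '%25', e.g. '%25E2%2582%25AC'
    return len(utf8), len(once), len(twice)


print(expansion("\u20ac"))       # (3, 9, 15)   EURO SIGN, 3 UTF-8 bytes
print(expansion("\U0001d11e"))   # (4, 12, 20)  MUSICAL SYMBOL G CLEF, 4 UTF-8 bytes
```

Each UTF-8 byte becomes 3 characters after the first escape and 5 after the second ('%ab' becomes '%25ab'), so the 3-byte case gives 15. Characters outside the Basic Multilingual Plane take 4 UTF-8 bytes, so the worst case is actually 20.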

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDwxOC0F+nu1YWqI0RAt9OAJ475SZdyXNRfKSY0r1jVfuKmtmPCQCfQoT2
lXk1OWk8+jeCU1c/RJO/CLQ=
=XxtV
-----END PGP SIGNATURE-----