Solution for baz2bzr revision ids on windows

John Arbash Meinel john at arbash-meinel.com
Wed Jan 11 15:25:31 GMT 2006


Aaron Bentley wrote:
> John Arbash Meinel wrote:
> 
>>>I have a somewhat pressing need to have the revision-id problem solved.
> 
> 
>>>If I have to, I'll just hack baz2bzr to use # instead of : or somesuch.
>>>(I can use Arch-2# instead of Arch-1: if preferred)
>>>
>>>This page has the forbidden list:
>>>http://www.grouplogic.com/Knowledge/index.cfm?fuseaction=view&docID=111
>>>/ ? < > \ : * | " ^
>>>
>>>^ is only illegal on FAT
> 
> 
> Okay, I think we should make this Arch-2.  It's going to hurt anyhow, so
> we might as well pull the bandaid off quickly.  Since all we're changing
> is the namespace, I'd like an option to generate Arch-1 style revision-ids.
> 

I would have preferred to have bzr handle some of this internally. But
depending on how we do it, bzr might end up doing the same thing, so we
can just stop translating in baz2bzr once we know that bzr is going to
do the translation itself.
I just need it done sooner rather than later, and fixing bzrlib is more
difficult, since it requires a new branch format. That seems to be
taking forever to land, because Robert wants the test suite upgraded to
test all supported branch types.


> I would like to avoid this situation in the future, so I think we should
> make all ids escaped utf8-encoded strings that cannot include any
> forbidden characters.

So if I'm understanding you correctly, you are willing to create
'Arch-2', a namespace in which the revision-id is
urllib.quote(x.encode('utf-8')).

We have to use a slightly custom quoter, since urllib.quote treats '/'
as safe by default and so does not escape it to '%2F'.
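
To illustrate the default behaviour, here is a minimal sketch. It assumes
Python 3, where the function lives at urllib.parse.quote and keeps the same
default of safe='/'. The sample id is only an illustration; this is not
baz2bzr code.

```python
from urllib.parse import quote

arch_id = 'Arch-2:john@arbash-meinel.com--2005/mifar--dev--0.6'

# With the default safe='/', the path separator is left untouched.
default_quoted = quote(arch_id)

# Passing safe='' makes quote() escape '/' as well.
fully_quoted = quote(arch_id, safe='')

print(default_quoted)
print(fully_quoted)
```

Only the second form is free of '/', which is why a plain call to the
quoter is not enough here.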

> 
> This would permit using any character as the namespace separator, but
> for the sake of aesthetics, I think we should use a different character.
>  Semicolon and comma are my leading candidates.

I'm up for either one.
But honestly, we can stick with ':'; it just becomes:

Arch-2%3Ajohn%40arbash-meinel.com--2005%2Fmifar--dev--0.6

This isn't the most aesthetically pleasing entry. But it means it would
be compatible with what bzr might do in the future.
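
As a rough check of that claim, the sketch below escapes an id and confirms
that none of the forbidden characters listed earlier in this thread remain,
and that the original id can be recovered. It assumes Python 3's
urllib.parse; escape_revision_id is a hypothetical helper name, not
something that exists in baz2bzr or bzrlib.

```python
from urllib.parse import quote, unquote

# Forbidden list quoted earlier in this thread: / ? < > \ : * | " ^
FORBIDDEN = set('/?<>\\:*|"^')


def escape_revision_id(revision_id):
    """Hypothetical helper: percent-escape everything outside quote()'s always-safe set."""
    return quote(revision_id, safe='')


original = 'Arch-2:john@arbash-meinel.com--2005/mifar--dev--0.6'
escaped = escape_revision_id(original)

# Forbidden characters that survive escaping (expected to be none).
leftover = FORBIDDEN.intersection(escaped)

# The escaping is reversible, so the original id is not lost.
round_tripped = unquote(escaped)

print(escaped, sorted(leftover), round_tripped == original)
```

Note that '%' itself is not on the forbidden list, so the escaped form should
be usable as a file name on the filesystems discussed here.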

> 
> If we use URL escaping, we should also change the Arch path separator,
> since % would be %25.  If it's going to be escaped anyhow, we should use
> %2f.  Otherwise, suggestions for an alternate character are welcome.
> 
> Do you have the time to do this work?
> 
> Aaron

import urllib

def quote(arch_id):
  # safe='' also escapes '/', which urllib.quote leaves alone by default
  return urllib.quote(arch_id, safe='')

quote(u'Arch-2:erik@Bågfors.com/test--proj--0.6'.encode('utf-8'))
'Arch-2%3Aerik%40B%C3%A5gfors.com%2Ftest--proj--0.6'

I didn't think arch-ids could contain non-ASCII characters. I thought
that was in the plan, but arch never got that far.

If I have to, I can make the time to do this. I need it.

John
=:->

