[RFC] "short" revision id spec.

John Arbash Meinel john at arbash-meinel.com
Wed Apr 11 15:48:03 BST 2007


Martin Pool wrote:
> On 4/11/07, John Arbash Meinel <john at arbash-meinel.com> wrote:
>> For bzr.dev there seems to be enough entropy that you only really need
>> to supply 4 characters (often 2 or 3 is enough).
> 
> That looks nice.
> 
>> So I have a few questions about the spec:
>>
>> 1) Is it reasonable to merge this into bzr.dev? I think it is generally
>> useful, but I wanted to play with it a bit first.
> 
> I haven't looked at the code but the feature sounds reasonable.

Yeah, I was mostly talking about whether the concept was reasonable.

> 
>> 2) I don't really like 'srevid:' and grabbing 'r:' seemed a little bit
>> greedy. I'm okay with it in a plugin, because plugins are allowed to be
>> greedy: they don't have to work for everyone everywhere.
>> Any thoughts on what the prefix should be?
> 
> I think using a one letter prefix is ok.
> 
>>
>> It might be possible to implement this, such that we don't even need a
>> prefix (it would work as a catch-all). But I would rather not do that
>> yet.
>>
>> 3) r:x--y versus r:x-y or other forms. I chose -- because I prefer the
>> look of it. The single dash looked like I was actually trying to match
>> part of the revision id, rather than only trying to match the prefix and
>> suffix.
>>
>> Obviously the real matching power is only in the suffix. The reason I
>> write them as prefix--suffix is that it gives me, the human being, a
>> bit more of a handle on the object. It works better in graphing, because
>> I can associate 'that revision was committed by Aaron'. Even though the
>> committer doesn't really matter, that works better than "ae948".
> 
>> Actually, my favorite "short" form is: user-date-hash. So my graphs use:
>>
>>   john-20050918-c6498
>>   robertc-20050919-51340
> 
> So does this have to match the start of the hash part, or the end of
> it, or anything within?

Right now it is set to match just the tail of the revision-id, partly
because I didn't want to assume all revision ids have the form
user@host-date-entropy.
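
A minimal sketch of that tail matching, to make the behaviour concrete.
This is not the plugin's actual code: the function names are hypothetical,
the 'r:' prefix is assumed to be stripped already, and the revision ids are
assumed to be available as a plain list of strings.

```python
def match_short_revid(spec, revision_ids):
    """Return every revision id that matches a short spec.

    `spec` is either 'suffix' or 'prefix--suffix' (hypothetical helper;
    the 'r:' prefix is assumed to have been removed by the caller).
    """
    prefix, sep, suffix = spec.partition('--')
    if not sep:
        # No '--' in the spec: treat the whole thing as a tail to match.
        prefix, suffix = '', spec
    return [rid for rid in revision_ids
            if rid.startswith(prefix) and rid.endswith(suffix)]


def resolve_short_revid(spec, revision_ids):
    """Return the single matching revision id, or raise if not unique."""
    matches = match_short_revid(spec, revision_ids)
    if len(matches) != 1:
        raise ValueError('%r matched %d revisions' % (spec, len(matches)))
    return matches[0]
```

Because only startswith/endswith are used, nothing here depends on the
revision id having a particular internal layout.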

> 
> ... I wonder if we should in fact allocate ids with just the username
> component of the committer, leaving out the domain...
> 

Well, we are pretty much relying on the entropy and date to give us
uniqueness. The username alone is probably sufficient, and would make for
shorter revision ids. Heck, we could make the entropy longer without any
real sacrifice.

If we change:
aaron.bentley@utoronto.ca-20070411063909-t630ktlwrss64yyk

to
aaron.bentley-20070411063909-t630ktlwrss64yyk

or even
aaron.bentley-20070411063909-t630ktlwrss64yykaabbccdd

It helps more with ones like:
abentley@panoramicfeedback.com-20070410173827-hvguwu1iaxxk3olo
to
abentley-20070410173827-hvguwu1iaxxk3olo

So I don't have a strong preference here. I like shorter revision ids
just because they take up less space, and when you have 100,000 of them
it does start to add up. But with a reasonable compression algorithm, you
can probably fix that anyway.
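
For illustration, the shortening above could be sketched roughly like this.
It is only a sketch under stated assumptions: strip_domain is a hypothetical
helper, the ids are assumed to follow the user@host-date-entropy layout (the
archive's " at " standing in for "@"), and neither the date nor the entropy
is assumed to contain a '-'.

```python
def strip_domain(revision_id):
    """Drop the '@host' part of a user@host-date-entropy revision id.

    Hypothetical helper. Ids that don't fit the assumed layout are
    returned unchanged.
    """
    # Split from the right, since the host itself may contain '-'.
    parts = revision_id.rsplit('-', 2)
    if len(parts) != 3:
        return revision_id
    committer, date, entropy = parts
    # Keep only the username; a committer without '@' is left alone.
    user = committer.split('@', 1)[0]
    return '-'.join([user, date, entropy])


long_id = 'abentley@panoramicfeedback.com-20070410173827-hvguwu1iaxxk3olo'
short_id = strip_domain(long_id)
# Characters saved on this one id; multiply by the number of revisions.
saved = len(long_id) - len(short_id)
```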

John
=:->

PS> One of the ideas would be to allow things like 'bzr annotate' to
show these short forms, but at the moment it isn't well fleshed out.


