Email in bazaar repository---not replacing, but redirecting

Joseph Wakeling joseph.wakeling at webdrake.net
Thu Mar 9 01:06:21 GMT 2006


Aaron Bentley wrote:
> Revision data is write-once.  It is a very, very bad thing to change
> write-once data in a distributed system.

I've been doing some thinking about this.  After some browsing of the
.bzr directory contents, and some thinking of, "Hmm, can I really edit
this ... ?", I came to the conclusion that it was far, far, *far* too
dangerous to actually change the data.

Then I remembered an algorithm from physics.  There is a simple concept,
percolation theory, which consists of the following model: take a
lattice of size LxL, and for each site on the lattice, randomly set it
to be 1 ("occupied") with probability p, 0 ("empty") with prob. (1-p). 
Each site's value is determined independently.  You get "clusters" of
neighbouring sites which are all 1.  See e.g.
http://www.krl.caltech.edu/~adami/CD1/Percolation/percolation.html to
check out the pretty patterns that result.

The question is, when you're simulating something like this, how do you
check how many clusters there are and how big they are in a
computationally efficient way?  Maybe you want to label the different
clusters, 1, 2, 3, 4, and then you have a vector size[k] which gives you
the size of cluster k.  The problem is that in going through the lattice
to check which sites are connected, you may initially think that two
clusters are separate, but then find they are linked---i.e. they are two
different bits of the same cluster.  How do you get the computer to
"know" this without relabelling all the sites in one of the parts?

The most efficient algorithm does something along the following lines: if
you suddenly find that clusters m and n are connected, where m is lower
than n, you set size[m] = size[m] + size[n], and size[n] = -m.  The
negative sign marks the entry as a pointer rather than a size.  What
that means is that size[n] now points to cluster m as its "parent" or
"root", which enables future additions to the cluster to be directed
appropriately.
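
To make the size[] trick concrete, here is a minimal sketch in Python.
The function names find_root and merge are my own, and I am assuming
cluster labels start at 1 (index 0 is left unused, since -0 could not be
told apart from a size of 0).  It is only an illustration of the idea
described above, not a tuned implementation.

```python
def find_root(size, k):
    """Follow negative 'pointer' entries until we reach a root label."""
    while size[k] < 0:
        k = -size[k]
    return k


def merge(size, a, b):
    """Join the clusters containing labels a and b; return the root label."""
    m, n = find_root(size, a), find_root(size, b)
    if m == n:
        return m              # already the same cluster, nothing to do
    if m > n:
        m, n = n, m           # keep the lower label as the root
    size[m] += size[n]        # root accumulates the combined size
    size[n] = -m              # old root now points at its new parent
    return m


# Three clusters labelled 1, 2, 3 (index 0 unused).
size = [0, 3, 2, 5]
merge(size, 1, 3)
print(size)                   # [0, 8, 2, -1]
print(find_root(size, 3))     # 1
```

Note that no site on the lattice gets relabelled: a site still carrying
label 3 is resolved to cluster 1 by following the pointer.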

I'm wondering if something like that could be put into bzr which points
old user IDs to new ones, so that if a revision is listed in the .bzr
repository as by "myname at computer", bzr log and other functions can work
out that they should display instead "myemailaddress at gmail.com" or whatever.

The rough idea would be, for each committer in a repository, a file named
committer at email.com would be created, which might contain either,
   displayID=self
or
   displayID=newemail at address.com

Now, when bzr log goes through the repository, it checks the id of a
committer against that id's file.  If the file contains displayID=self
then the log displays the committer's info as is.  If it contains
displayID=alternative at id.com then the log gets the identification
information from that new ID.
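
A rough sketch of that lookup in Python might look as follows.  Nothing
here is real bzr code: resolve_display_id is a made-up name, and a plain
dict stands in for the per-committer files, mapping each committer ID to
the text its file would contain.  I am also assuming that a committer
with no file is treated as displayID=self, and that a chain which loops
back on itself falls back to the ID recorded in the revision.

```python
def resolve_display_id(committer, redirects):
    """Return the ID to display for `committer`.

    `redirects` is a hypothetical stand-in for the per-committer files:
    it maps a committer ID to a line such as "displayID=self" or
    "displayID=new@address.com".
    """
    seen = set()
    current = committer
    while current not in seen:
        seen.add(current)
        line = redirects.get(current, "displayID=self")
        target = line.partition("=")[2].strip()
        if target == "self" or not target:
            return current        # reached an ID that displays as itself
        current = target          # follow the redirect, as with size[n] = -m
    return committer              # redirect loop: show the recorded ID


redirects = {
    "myname@computer": "displayID=myemailaddress@gmail.com",
    "myemailaddress@gmail.com": "displayID=self",
}
print(resolve_display_id("myname@computer", redirects))
# myemailaddress@gmail.com
```

The stored revision data is never touched; only what gets displayed
changes.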

Probably easier than actually changing files. :-)

I don't yet know Python, and my general programming skills are even less
up to bzr development, but does anyone feel like running with this idea?

        -- Joe


