reintroducing root ids

Sun Feb 12 22:08:19 GMT 2006

On Wed, 2006-02-08 at 14:39 +1100, Robert Collins wrote:
> So, two concepts now - repo scaling and nested trees have +1s on using
> root ids.
> 
> 
> Now we need an algorithm to sanely introduce them to existing trees, and
> generate them in new trees.
> 
> I kinda like Aaron's 'add them when you commit with the parent being the
> NULL_REVISION' concept.
> 
> 
> Related to this is the root id to use in arch conversions - if you
> convert two related branches independently we should end up with the
> same root id.
> 
> 
> I don't have a proposal to make, other than that we figure this out :)

Ok, based on discussions with Martin (on the phone) and other comments
on list, how about the following proposal:

All ideas are other people's, all mistakes are mine ;)

 * Revisions get a root_id property.
 * There is a branch format change to introduce this, and you cannot
pull any data from a newer-format branch that has root ids into an older
branch that does not. [This trapdoor is needed to avoid data-losing
loops.]
 * For existing revisions, as they are written into the new format
branch, or on the fly if needed, we follow the leftmost ancestor all
the way up.
 * For baz conversions we can use the log files in the branch to help.
 * When we detect the results from differing conversions, we take the
value of the conversion that had more history, and rewrite the
repository as needed. This should happen extremely rarely, and by taking
the long view each time this will result in convergence.
 * We accept that we are creating inventories and revisions that have
the same id and different value in different repositories *in this
specific case*, but as we can detect it and correct it, we do so. That
is, if during a pull operation we see a revision we don't have but do
have a common ancestor for, that has a root id which is the same as a
revision that that repository does have, then we know that it's a
converted revision from before root ids existed, and it should have the
same root id as we did for the common ancestor. We can then review the
graphs and pick which common ancestor should have won and either
translate as we read from that repository, or rewrite ours to take their
value.
 * We write a 'reconcile' command to trigger manual reconciliation (i.e.
if you don't want to merge or pull from a repo, how do you get this to
converge?). This would replace reweave and fetch-missing as a single
repair tool.
 * We special case 'diff' and 'status' with the following heuristic for
all directories [well, maybe just the root, but hey]. When diffing A and
B, if there is a directory path with id X in A, and X is missing from B,
and a directory path with id Y in B, and Y is missing from A, treat X
and Y as aliases to each other. Note that this is a variation on
inventory id aliasing which is a more general solution, but post 1.0
IMO.
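
To make the "follow the leftmost ancestor" and "more history wins"
points concrete, here is a rough sketch. It is not bzrlib code: the
parents map, the "root-" naming scheme, the (root_id, history_length)
pairs and the tie-break rule are all assumptions made up for
illustration.

```python
def leftmost_origin(revision_id, parents):
    """Follow the first (leftmost) parent until a revision has none.

    `parents` is a hypothetical dict: revision id -> list of parent ids.
    A revision missing from the map is treated as having no parents.
    """
    seen = set()
    current = revision_id
    while parents.get(current):
        if current in seen:
            raise ValueError("cycle in ancestry at %r" % (current,))
        seen.add(current)
        current = parents[current][0]
    return current


def root_id_for(revision_id, parents):
    """Derive a root id from the leftmost origin (naming is an assumption)."""
    return "root-" + leftmost_origin(revision_id, parents)


def pick_root_id(ours, theirs):
    """Choose between two (root_id, history_length) candidates.

    The conversion that saw more history wins. On a tie we take the
    smaller id so that both sides make the same choice (an assumption;
    the proposal does not say what to do on a tie).
    """
    (our_id, our_len), (their_id, their_len) = ours, theirs
    if our_len != their_len:
        return our_id if our_len > their_len else their_id
    return min(our_id, their_id)
```

With parents = {"r1": [], "r2": ["r1"], "m1": [], "r3": ["r2", "m1"]},
root_id_for("r3", parents) walks r3, r2, r1 and ignores the merged-in
m1. Because pick_root_id gives the same answer whichever side calls it,
two repositories that compare notes end up agreeing.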
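
The diff/status heuristic could look something like the sketch below.
It assumes an inventory can be flattened to a dict of path ->
(file_id, kind), and it reads "a directory path" as meaning the same
path on both sides; both are my assumptions, and directory_aliases is a
hypothetical helper, not an existing bzrlib function.

```python
def directory_aliases(inv_a, inv_b):
    """Return {id_in_a: id_in_b} for directories to treat as aliases.

    inv_a and inv_b are hypothetical flattened inventories:
    path -> (file_id, kind). A directory at the same path in both trees
    is aliased only when its id in A is absent from B and its id in B
    is absent from A.
    """
    ids_a = {file_id for file_id, _kind in inv_a.values()}
    ids_b = {file_id for file_id, _kind in inv_b.values()}
    aliases = {}
    for path, (id_a, kind_a) in inv_a.items():
        entry_b = inv_b.get(path)
        if entry_b is None:
            continue
        id_b, kind_b = entry_b
        if kind_a != "directory" or kind_b != "directory":
            continue
        # Same path, different ids, and neither id is known on the other
        # side: this is the case the heuristic is meant to paper over.
        if id_a != id_b and id_a not in ids_b and id_b not in ids_a:
            aliases[id_a] = id_b
    return aliases
```

A directory that was merely renamed keeps its id on the other side, so
it fails the "missing from B" test and is reported as a rename as usual
rather than being aliased.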

I can't think of any robust solution that does not involve either a
forced, one time pull of ALL the data into a single repository for
upgrading, or accepting that variation between these old revisions can
occur. Manually specifying an id is not robust - mistakes lead to
variation between revision representation; forced upgrades are not
robust and are extremely unfriendly to the user.

Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.