[RFC] __BIG__ speed improvement in clone/branch on http transport

Martin Pool mbp at sourcefrog.net
Mon Dec 5 13:08:10 GMT 2005


On  5 Dec 2005, Goffredo Baroncelli <kreijack at alice.it> wrote:
> On Sunday 04 December 2005 23:28, you (Robert Collins) wrote:
> > On Thu, 2005-12-01 at 22:48 +0100, Goffredo Baroncelli wrote:
> > > Hi all,
> > > 
> > > the patch below introduces a new function to the branch class. The function
> > > is named file_involved(rev1,rev2 ) and returns the file_id 
> > > involved in changes between the revision1 and the revisions2. Moreover
> > > this function can be called with a set of revisions as argument: in this case
> > > file_involved() returns the file_id(s) involved in the revisions set.
> > > 
> > > This function is __very__ useful during the clone/branch function because
> > > with this is very easy know which weave we need to update/download.

That's very nice, thank you.

> However I will select some test case; but one question: what is the definition 
> of a 'ghost revisions' ? I know their existence but it isn't clear, to me, what they are.
> On the basis of the bzrlib.check.check_one_rev( ), it seems that a ghost revision
> is a revision referred by another revision via its parent_ids, without
> existing in the revision store: it is correct ?

That's correct.

The particular difficulty they cause is that weaves represent ancestry
using only the revisions present in the weave, and so when we get a new
parent of a revision that's in the weave we have to reweave it.  This is
not entirely satisfactory.

Ghosts should rarely or never occur in bzr itself, but can happen when
you've done an import from Arch, because Arch archives commonly have
ghosts (references to revisions whose text isn't known).
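
To make the definition concrete, here is a minimal sketch of how ghosts
could be detected.  It is not bzrlib code: find_ghosts and the plain
dict standing in for the revision store are hypothetical, and the only
assumption is that each stored revision lists its parent ids.

```python
def find_ghosts(parent_map):
    """Return the ids that are named as a parent but are not in the store.

    parent_map is a hypothetical stand-in for a revision store:
    it maps each revision id that is present to its list of parent ids.
    """
    present = set(parent_map)
    ghosts = set()
    for parents in parent_map.values():
        for parent_id in parents:
            if parent_id not in present:
                ghosts.add(parent_id)
    return ghosts


# R2 names R1 as a parent, but R1 itself was never stored.
store = {"R0": [], "R2": ["R0", "R1"]}
print(sorted(find_ghosts(store)))
```

In this example R1 is the only ghost: it is referred to by R2 but has
no entry of its own.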

> But what are the difference between a ghost revision and a missing
> revision ?

I don't think 'missing' has a well-defined meaning.  As used by the
'missing' command, it tells you which revisions are present in one
branch that aren't in the other.
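
In that sense 'missing' is just a set difference between two branches.
A rough sketch, assuming each branch can be reduced to a list of
revision ids (missing_revisions is a made-up helper, not the
implementation of the command):

```python
def missing_revisions(this_branch, other_branch):
    """Return revision ids present in other_branch but not in this_branch.

    Both arguments are plain iterables of revision ids; the order of
    other_branch is preserved in the result.
    """
    have = set(this_branch)
    return [rev_id for rev_id in other_branch if rev_id not in have]


mine = ["R0", "R1"]
theirs = ["R0", "R1", "R2", "R3"]
print(missing_revisions(mine, theirs))
```

Note the contrast with a ghost: a ghost is referred to by a revision
you do have, whereas a 'missing' revision is only defined relative to
some other branch.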

> However, the weave.join code can be faster: it is possible to check the existence
> of a revision without expanding its contents: the gain should be another 2x...
> My problem is  to understand why sometime it is raised a WeaveParentMismatch
> exception, and why, if it happens, instead the weave.join( ) the
> function reweave( ) is called. The comment refers to the problem of the ghost revisions
> ... but it isn't very clear

Suppose there is a revision R2 with two parents R0 and R1.  This is
globally true.  A particular branch storage didn't get to see R1, so
its weave for R2 has only R0 as a parent.  (The revision file knows
about both.)  If it does find a copy of R1, we observe that a new parent
has been found and so we need to rebuild the weave.  

The reweave is a bit simpleminded, regenerating things even when they
probably haven't changed.
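
The R0/R1/R2 case can be sketched roughly as below.  This is only an
illustration, not the weave code: each weave is reduced to a dict from
revision id to parent ids, the function names are invented, and I am
assuming that rebuilding amounts to taking the union of the parents
recorded on either side.

```python
def mismatched_revisions(local, incoming):
    """Return revisions present in both weaves whose parents differ."""
    shared = set(local) & set(incoming)
    return sorted(r for r in shared if set(local[r]) != set(incoming[r]))


def combine_ancestry(local, incoming):
    """Decide between a plain join and a rebuild, and merge the parents.

    Returns ("join", merged) when every shared revision agrees on its
    parents, and ("reweave", merged) when a new parent has turned up.
    """
    action = "reweave" if mismatched_revisions(local, incoming) else "join"
    merged = {}
    for rev_id in set(local) | set(incoming):
        parents = set(local.get(rev_id, ())) | set(incoming.get(rev_id, ()))
        merged[rev_id] = sorted(parents)
    return action, merged


# The local weave never saw R1, so it records only R0 as a parent of R2.
local = {"R0": [], "R2": ["R0"]}
incoming = {"R0": [], "R1": [], "R2": ["R0", "R1"]}
action, merged = combine_ancestry(local, incoming)
print(action, merged["R2"])
```

Here the two sides disagree about the parents of R2, which is the
situation that leads to a rebuild rather than a simple join.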

-- 
Martin