[RFC] Alternative to current push/pull semantics [use weave.join]

Fri Dec 16 22:07:57 GMT 2005

On Friday 16 December 2005 20:56, you (John Arbash Meinel) wrote:
> Goffredo Baroncelli wrote:
> > On Friday 16 December 2005 18:59, John Arbash Meinel wrote:
> > [...]
> > 
> >>What I was thinking is that we probably could cheat, and instead of just
> >>adding each text, just do a weave.join().
> > 
> > 
> > weave.join( ) does so: it extracts every text not present, then adds it to the
> > target repository.
> > However I think that for non merge revision ( == revision with only one parent )
> > it is possible to merge to a weave without extraction then addition...
> > I need some time to write the code.
> > 
> > [...]
> 
> Yes it does extract of every text. But it does it at 1 time, rather than
> doing it once, writing out the file. Then reading it back in later and
> doing it again.

No, when a merge between two weaves is performed, the merge is made against all
the revisions of both weaves. So if a merge happens, it is sufficient for all
the revisions. Moreover, the resulting revision lists are cached in order
to avoid unnecessary merges.
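
As a rough illustration of the point above, here is a toy model. It is not
bzrlib's Weave class: ToyWeave, its dict of texts and the _joined cache are
made-up names, and real weaves store interleaved lines rather than full texts.
It only shows that one join covers every revision of the other weave, and that
a cached revision list lets a repeated join be skipped.

```python
class ToyWeave:
    """Hypothetical stand-in for a weave: maps revision id -> list of lines."""

    def __init__(self, texts=None):
        self.texts = dict(texts or {})
        # Cache of revision lists (as frozensets) already merged into this weave.
        self._joined = set()

    def join(self, other):
        """Copy every revision of `other` that is missing here.

        Returns the sorted list of revision ids that were added.
        """
        key = frozenset(other.texts)
        if key in self._joined:
            # Same revision list as a previous join: nothing to redo.
            return []
        added = sorted(set(other.texts) - set(self.texts))
        for rev_id in added:
            self.texts[rev_id] = list(other.texts[rev_id])
        self._joined.add(key)
        return added


local = ToyWeave({"r1": ["a\n"]})
remote = ToyWeave({"r1": ["a\n"], "r2": ["a\n", "b\n"], "r3": ["b\n"]})

first = local.join(remote)    # brings over every missing revision at once
second = local.join(remote)   # cache hit, no work done
```

In this sketch the first join adds both r2 and r3 in one pass, and the second
call returns an empty list without comparing texts again.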

> I also agree that weave.join() should be optimizable to not require
> re-doing the diffs. But it probably isn't worth the effort if we are
> switching to something like knits.

I hope so; but until the knit format arrives, the weave can (and has to) be
optimized.
> 
> > 
> > 
> >>But it would change the current semantics to:
> >>	download the remote weave header
> >>	see that it is missing the revision we want
> >>	download the full remote weave
> >>	read the local weave
> >>	weave.join()
> >>	save the remote weave
> > 
> > 
> > 
> >>After that, even if we don't cache what weaves have what revisions,
> >>future steps would just be:
> >>	download the remote weave header
> >>	see it has the revision we care about
> >>	no upload needed
> >>
> >>Now right now we can't just download the header. I think we really can,
> >>but since our buffer size (32k) is greater than the average file size
> >>(8k), it doesn't gain us much. 
> > 
> > 
> > In order to know which revision are in the weave it should be sufficient to inspect
> > the history weave: in this file are recorded both the file id and the
> > revision id.
> > If you want to know which revision id of the README file are in the repository:
> > 
> 
> My whole point is that we would add extra revisions inside a weave file
> that may not be represented in the remote inventory or revision-store yet.
> The point is that when we get the chance, add all the revisions for a
> given file-id in the belief that more likely than not, we are going to
> want to do it in the (near) future.

I disagree: the risk is that you add revisions which you won't use.
This is a waste of space.

> The constraint we are working under is that *if* a revision is present
> in the revision-store, then its inventory, and all associated texts are
> also present. There is no constraint that there isn't extra information
> in the weaves themselves.

No, the constraint is that if a revision is in the inventory, then it also has
to be in the inventory store and in the weave.
Storing extra information in the weave or in the revision store can blow up the
use of disk space.
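
To make the two positions concrete, here is a small sketch over a toy model of
a repository. Everything in it is an assumption for illustration: inventories
are plain dicts mapping file id -> text revision id, weaves are lists of stored
revision ids, and both function names are invented. One function checks the
constraint (every text named by an inventory is in its weave), the other lists
the extra revisions that no inventory refers to.

```python
def missing_texts(inventories, weaves):
    """Return (revision, file_id, text_rev) triples that break the constraint.

    A triple is reported when an inventory names a text revision that is
    not stored in the weave for that file id.
    """
    return [
        (rev_id, file_id, text_rev)
        for rev_id, inventory in sorted(inventories.items())
        for file_id, text_rev in sorted(inventory.items())
        if text_rev not in weaves.get(file_id, ())
    ]


def unreferenced_texts(inventories, weaves):
    """Return {file_id: [rev ids stored in the weave but named by no inventory]}."""
    referenced = {}
    for inventory in inventories.values():
        for file_id, text_rev in inventory.items():
            referenced.setdefault(file_id, set()).add(text_rev)
    extra = {}
    for file_id, stored in weaves.items():
        unused = sorted(set(stored) - referenced.get(file_id, set()))
        if unused:
            extra[file_id] = unused
    return extra


# Hypothetical repository: two revisions are known, but the README weave
# also holds a third text that was pulled in speculatively.
inventories = {
    "rev-1": {"readme-id": "rev-1"},
    "rev-2": {"readme-id": "rev-2"},
}
weaves = {"readme-id": ["rev-1", "rev-2", "rev-3"]}

broken = missing_texts(inventories, weaves)
extra = unreferenced_texts(inventories, weaves)
```

Here the constraint holds (nothing is missing), while rev-3 is the kind of
extra text that takes disk space without being used by any inventory.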

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack at inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87  87C0 BB86 505C 6B2A CFF9