Making diff fast (was Re: Some notes on distributed SCM)
Robert Collins
robertc at robertcollins.net
Sun Apr 10 23:59:10 BST 2005
On Sun, 2005-04-10 at 18:24 -0400, Daniel Phillips wrote:
> On Sunday 10 April 2005 18:17, Aaron Bentley wrote:
> > Benno wrote:
> > > 1/ Caching the working tree stat data, and then being able
> > > to simply stat each file and compare the stat information.
> > >
> > > Pros: Portable, simple to use.
> > > Cons: Still requires a full search of the tree, which is slow.
> >
> > This is what Arch does, and it's quite slow on large trees. Robert
> > Collins has recently improved this in Baz, but it doesn't change the
> > fact that it's an O(versioned files) operation, rather than O(changed
> > files).
>
> But statting the full working copy kernel tree takes less than .1 second if
> the dentries are in cache, and it takes less than 5 seconds to get them in
> cache. What is wrong with that?
The tla codebase is somewhat dumber than that: if you have 20K files
under revision control, tla currently requires another 20K 'id files',
so you are statting 40K files. On top of that, a model limitation (ids
are in id files, not centralised) requires cross-checking the id's stat
(even if foo.c hasn't changed, the id for it may have), leading to a
stat of 60K files without fancy programming.
Oh, and it can take more than 5 seconds to get them in cache, depending
on various factors like current IO load.
That said, I don't see any problem in statting every file in the tree
once, as it's not a heavy load for the cache, and a few seconds shouldn't
break anyone's patience.
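
To make the "stat every file once" idea concrete, here is a minimal
sketch in Python. It is not how tla or baz actually do it; the function
names snapshot and changed_paths are made up for illustration, and it
assumes that a file's size plus modification time is a good enough
change signal. It walks the tree once, records stat data per file, and
compares that against a previously cached snapshot.

```python
import os


def snapshot(root):
    """Map each file path (relative to root) to its (size, mtime_ns) pair.

    Hypothetical helper: one lstat() per file, no file contents are read.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_paths(cached, current):
    """Return sorted paths that were added, removed, or have new stat data."""
    return sorted(
        path
        for path in cached.keys() | current.keys()
        if cached.get(path) != current.get(path)
    )
```

Only the paths this returns would need a real content diff. The walk
itself is still O(versioned files), which is the cost being argued
about above; the sketch just shows that it is one stat per file rather
than three.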
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.