Weave Insert Time
John A Meinel
john at arbash-meinel.com
Wed Oct 5 19:03:05 BST 2005
I was doing some tests on weaves (with my converting revision-store to
weaves stuff).
Basically, I just timed how long each insert took, and compared that to
how many revisions were already in the weave.
You can see the resultant graph here:
http://bzr.arbash-meinel.com/other/RevisionInsertTime.png
Basically, it looks like the time to insert one revision is very close
to O(N) where N is the number of revisions already in the weave. I think
that makes it O(N^2) when you are creating the weave for the first time.
I actually fit a power-law trend, but it seems to be N^0.9 which for me
is just N. (Linear fit did poorly, with R^2 ~.7-.8, not .95)
The 3 lines on there are:
Newformat- The latest bzr.newformat
SAX- This is my change to bzr which prints directly to the file, rather
than using ElementTree, so I put each attribute on a line by itself,
which is much friendlier to the weaves
Newformat-rev Reformatted bzr.newformat. Almost identical to SAX, but
only modifies the <revision> tag, not the <revision_ref> etc.
To give a filesize comparision
1001k revs.weave-newformat
914k revs.weave-newformat-reformed
888k revs.weave-sax
This compares to a 'du --apparent' size of about 879k (a 'du' of 7.1M).
So the revision compresses just enough to be the same size. After real
compression, they do get smaller, and the differences seen above pretty
much go away:
303k revs.weave-newformat-reformed.gz
298k revs.weave-newformat.gz
299k revs.weave-sax.gz
223k revs.weave-newformat-reformed.bz2
224k revs.weave-newformat.bz2
221k revs.weave-sax.bz2
Mostly I wanted to point out that the current Weave format really is
about O(N) for inserting texts. And you can get better compression for
Revision XML, at the expense of being slightly slower. (I think there is
just more work for diff to do, since there are more lines).
I would really like to do a comparison of inventory weaves, especially
extraction time vs number of revisions, and then compare that with my
reformatted SAX version, and the original version. But that is a lot
more scripting. So maybe I'll get that done.
So at least for the revision text, WeaveInsert is approximately O(N),
which I was worried it was O(N^2), so this is good to see.
Naturally, O(1) or O(num_ancestors) would be much nicer, I'm not sure
what would be possible, but I think O(num_ancestors) is feasible. We
just have to find a way to not look at non-ancestors.
Or maybe something O(aN + b num_ancestors) where a is really small
compared with b, so isn't as important in the N < 100k where we expect
bzr to work.
John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 253 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20051005/0a2910e1/attachment.pgp
More information about the bazaar
mailing list