B+Tree indices: ongoing progress
Robert Collins
robertc at robertcollins.net
Wed Jul 2 08:44:06 BST 2008
On Wed, 2008-07-02 at 17:12 +1000, Robert Collins wrote:
> On Tue, 2008-07-01 at 22:46 -0500, John Arbash Meinel wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Robert Collins wrote:
> > | On Wed, 2008-07-02 at 13:19 +1000, Robert Collins wrote:
> > |
> > |
> > |> And some good news:
> > |> Baseline overhead: Read 92994 keys in 7.075
> > |> GraphIndex: Wrote 10147567 bytes in 10.949
> > |> GraphIndex: iter_all_entries in 9.671
> > |> GraphIndex: iter_random_one in 11.915
> > |> GraphIndex: get_components_positions(2000) in 20.535
> > |> GraphIndex: search_from_revid in 6.516
> > |> GraphIndex: miss torture in 343.138
> > |> GraphIndex: -------Done---------
> > |> Baseline overhead: Read 92994 keys in 7.072
> > |> BTreeIndex: Wrote 4104336 bytes in 20.299
> > |> BTreeIndex: iter_all_entries in 1.850
> > |> BTreeIndex: iter_random_one in 115.365
> > |> BTreeIndex: get_components_positions(2000) in 33.332
> > |> BTreeIndex: search_from_revid in 4.581
> > |> BTreeIndex: miss torture in 41.803
> > |> BTreeIndex: -------Done---------
> > |>
>
> On the same repository, with bloom filters added:
> Baseline overhead: Read 92994 keys in 6.734
> BTreeIndex: Wrote 4108432 bytes in 28.415
> BTreeIndex: iter_all_entries in 1.955
> BTreeIndex: iter_random_one in 120.980
> BTreeIndex: get_components_positions(2000) in 40.829, bloom(144839)
> BTreeIndex: search_from_revid in 6.911, bloom(32233)
> BTreeIndex: miss torture in 53.916, bloom(413765)
> BTreeIndex: -------Done---------
>
> Which is to say - we avoid 100's of thousands of leaf lookups but the
> cost of the (sha1 based) bloom lookups exceeds the cost of checking
> locally. I think the easiest way to test this is to drop disk caches
> before running the test loop, and I'm about to do that.
with cache dropping:
Baseline overhead: Read 92994 keys in 11.003
GraphIndex: Wrote 10147567 bytes in 11.606
GraphIndex: iter_all_entries in 12.029
GraphIndex: iter_random_one in 14.782
GraphIndex: get_components_positions(2000) in 24.927, bloom(0)
GraphIndex: search_from_revid in 10.226, bloom(0)
GraphIndex: miss torture in 348.412, bloom(0)
GraphIndex: -------Done---------
Baseline overhead: Read 92994 keys in 11.336
BTreeIndex: Wrote 4108432 bytes in 27.952
BTreeIndex: iter_all_entries in 2.877
BTreeIndex: iter_random_one in 124.600
BTreeIndex: get_components_positions(2000) in 48.122, bloom(143304)
BTreeIndex: search_from_revid in 7.577, bloom(32233)
BTreeIndex: miss torture in 56.051, bloom(413765)
BTreeIndex: -------Done---------
Which I find useful; but its not conclusive. More news tomorrow I think,
modulo an update with a better test script and so forth before I retire.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080702/d1167680/attachment.pgp
More information about the bazaar
mailing list