Prototype "improved_chk_index"

John Arbash Meinel john at arbash-meinel.com
Thu Oct 29 04:42:50 GMT 2009



John Arbash Meinel wrote:
> A while back I put together a draft of a few ideas I had for improving
> our index code to handle CHK pages. The draft is a bit dense with ideas,
> and certainly not polished. It is in the bzr source as
> 'doc/developers/improved_chk_index.txt', or you can read it here:
> 
>   http://doc.bazaar-vcs.org/latest/developers/improved_chk_index.html
> 
> 
> I've certainly wanted to work on it for a while, so I decided to take a
> "slack day" away from memory work, etc, and poke around at it. I was
> surprised to find that I have a basic proof-of-concept finished in about
> 1 day.
> 
>   lp:~jameinel/bzr/chk-index
> 
> The code is currently only python2.5 compatible because I use
> 'struct.Struct'. At the moment, I've only implemented the 'first half'
> of the spec. Notably:

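(As an aside on the struct.Struct mention above: the win is pre-compiling
the format string once, and the class only exists from python2.5 on. A
minimal sketch, with a made-up 28-byte entry layout, not the prototype's
actual record format:

    import struct

    # Hypothetical layout: 20-byte binary sha1, 4-byte offset, 4-byte
    # length. Compiling once avoids re-parsing the format per entry.
    _ENTRY = struct.Struct('>20sII')

    def pack_entry(sha1_bin, offset, length):
        return _ENTRY.pack(sha1_bin, offset, length)

    def unpack_entry(data):
        return _ENTRY.unpack(data)  # => (sha1_bin, offset, length)
)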

On IRC Robert asked me why I thought this was worthwhile. When I
originally wrote this spec it was before 2.0.0 was actually out. The
specific motivation was a few things:

1) BTreeGraphIndex repeatedly came up on lsprof results

2) We knew that for a lot of operations we could thrash the btree.
   'bzr branch old standalone' specifically was a hard case (I don't
   think we've fully 'fixed' it, either.)

3) Disk size: a fully packed launchpad repository is a 104MB .pack file,
   but then you add another 16MB for the .cix.

4) Just general observations about ways that the existing code was
   sub-optimal for a random-key index.

Not all of the original reasons still apply:

1) I now think this was probably 'gc' overhead commingling with lsprof
   overhead. The fact that btrees were generating a lot of objects
   while lsprof was tracking caused more-than-average gc runs.

2) We've ameliorated this a bit with the 'cache everything' flag and by
   tweaking some of the spill-to-disk behavior. Note that I don't think
   this is 100% done, as I think 'bzr branch .. standalone' with a
   --1.9 repository is still faster than with a 2a repo.

3) Still stands. All told, we have a 16MB .cix, a 10MB .tix, 2.3MB of
   .iix and .rix, and a 1.4MB .six, for a total of 31MB of indexes.
   31MB of index for 100MB of content seems a bit skewed.

   If we could move the .tix in with the .cix, and then shrink that
   down to 1/3rd the current size, you'd end up chopping ~17MB out of
   your repo, turning your 30% index overhead into just 14%.

4) zlib on our current indexes works pretty well, and we get a rather
   good compression rate. 'bzr dump-btree --raw *.tix' gives me 86MB of
   index compressed down to 9.8MB on disk, *almost* 10:1.
   Doing the same for '.cix' gives me 34MB => 16MB, ~2:1.
   zlib decompression is pretty fast. Fast enough that for cold-cache
   performance it is probably faster to decompress than to read the
   expanded size (especially at 10:1 compression).
   However, when everything is in your disk buffers, zlib.decompress is
   pure overhead. And when your compression is only 2:1, and you can
   get 2:1 compression just by switching from hex sha1 to binary sha1,
   that seemed worth looking into.

   Even better was when I stumbled on the idea of reducing the number
   of bits stored in the index, and suddenly I could break the
   '28 byte' lower bound. (A sketch of both ideas follows this list.)
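
To make (4) concrete, here are the two size levers, binary-vs-hex sha1
and keeping only a prefix of the bits. The 8-byte prefix is an
arbitrary choice for illustration, not what the spec settles on:

    import binascii

    hex_sha = '2aae6c35c94fcfb415dbe95f408b9ce91ee846ed'
    bin_sha = binascii.unhexlify(hex_sha)
    assert len(hex_sha) == 40 and len(bin_sha) == 20   # 2:1 for free

    # With sorted entries, a short prefix usually disambiguates within
    # a page; a rare collision just means reading one extra record.
    prefix = bin_sha[:8]   # 8 bytes stored instead of 20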


Some performance benchmarks for the same workload (91MB of the peak
memory is just loading the request tuples):

             time (s)  peak memory
 bzr2.0.0     12.315   184MB
 bzr2.0.x     12.292   184MB
 bzr2.1.0b1   12.415   184MB
 bzr.dev      10.067   164MB  (requests are tuples)
 bzr.dev       9.827   174MB  (requests are StaticTuples)
 bzr.dev       9.535   130MB  (requests are StaticTuple.intern())

 chk_index     1.744   134MB  (requests are StaticTuple.intern())


Note that StaticTuples improved this by about 30%, mostly from reduced
GC overhead I would assume (and a 40% memory reduction).
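
For anyone not following the StaticTuple work, here is a plain-python
approximation of what intern() buys. The real StaticTuple is a C type
with other benefits; this only shows the deduplication idea:

    _interned = {}

    def intern_key(key):
        # Return the canonical instance for this key; repeated keys
        # then share a single object instead of one allocation each.
        return _interned.setdefault(key, key)

    k1 = intern_key(('sha1:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed',))
    k2 = intern_key(('sha1:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed',))
    assert k1 is k2   # same object => shared memory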

I also think there is still room for speed improvements. For example,
inlining the 'key => sha_hash' code and avoiding the assertions I put
in drops the time to 1.5s. Not to mention a C extension, possibly using
mmap, etc.
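
The 'key => sha_hash' step is essentially just hex-to-binary
conversion, something like the sketch below (the name and the assertion
are mine, not the prototype's exact code). It is exactly this kind of
per-key assertion and call overhead that the 1.5s figure comes from
avoiding:

    import binascii

    def key_to_sha(key):
        # chk keys look like ('sha1:<40 hex digits>',); the index
        # wants the raw 20 bytes.
        name = key[0]
        assert name.startswith('sha1:')
        return binascii.unhexlify(name[5:])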

Certainly there is still the question of 'good enough'. It currently
takes ~2m15s to extract all of the ancestry of launchpad from my test
repository. The above numbers hint that at least 10s of that is just
processing the chk index. Not huge, but it would be an ~8% performance
improvement for an actual operation.

I don't know about the time to *build* an index, but there could be
some significant gains there as well. (Traditionally compress() is
significantly slower than decompress(), especially with our current
sync-and-try-again implementation.)
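
For reference, the sync-and-try-again pattern looks roughly like this.
The page size and give-up policy are illustrative, not the actual chunk
writer's logic:

    import zlib

    PAGE_SIZE = 4096

    def fill_page(lines):
        # Compress lines one at a time, using Z_SYNC_FLUSH so the
        # output size so far is measurable. The repeated flushes are a
        # big part of why building is slower than reading.
        comp = zlib.compressobj()
        packed, used, accepted = [], 0, []
        for line in lines:
            chunk = comp.compress(line) + comp.flush(zlib.Z_SYNC_FLUSH)
            if used + len(chunk) > PAGE_SIZE:
                break   # caller repacks the rest into the next page
            packed.append(chunk)
            used += len(chunk)
            accepted.append(line)
        # (A real page would also end with comp.flush() to finish the
        # zlib stream.)
        return accepted, ''.join(packed)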

John
=:->


