B+Tree indices: ongoing progress

Robert Collins robertc at robertcollins.net
Wed Jul 2 03:22:58 BST 2008


On Wed, 2008-07-02 at 12:04 +1000, Jonathan Lange wrote:
> On Wed, Jul 2, 2008 at 11:58 AM, Robert Collins
> <robertc at robertcollins.net> wrote:
> > Also, regarding zlib object copying:
> >>>> import zlib
> >>>> c = zlib.compressobj()
> >>>> c.copy()
> > <zlib.Compress object at 0x7f21b6ee2dd8>
> >
> > so my python has it - if yours doesn't, then that backs up my theory
> > that it would be a hassle to use it :). Though we could try and
> > fallback...
> >
> 
> I'm missing some context here, but I'll barge in anyway.
> 
> As a data point, the object returned by zlib.compressobj() doesn't
> have a copy attribute in Python 2.4. In Python 2.5, it works as you
> say.

Righto - I didn't check the zlibmodule.c code in 2.4 :).

The context is that we want to pack a lot of keys into a 4K page and
have that page compressed, so that we can read page-aligned nodes rather
than having to store individual byte pointers. But determining how many
keys will fit has a cost. Ideally we'd have the compressor's current
bit-length available, and know what it would take to flush, but it
doesn't work quite that way.
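To illustrate why the running size isn't available, here is a small
Python 3 sketch. The key-NNNNN keys are made up for the example. zlib's
compress() usually hands back little beyond the stream header until
flush() is called, so summing its return values says almost nothing
about how full the page is.

```python
import zlib

# Feed 200 short, made-up keys (about 2 KB of input) into a compressor
# and total up what compress() hands back along the way.
comp = zlib.compressobj()
emitted = 0
for i in range(200):
    emitted += len(comp.compress(b"key-%05d\n" % i))

# deflate buffers input internally, so "emitted" is typically just the
# 2-byte zlib header at this point, not the size of the page so far.
tail = len(comp.flush())  # Z_FINISH: only now does the real size appear
print("returned while feeding:", emitted, "bytes; returned by flush():", tail, "bytes")
```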

We can flush, which reduces compression efficiency (it outputs a new
compression record including 4 bytes of padding), or we could copy the
internal state of the compressor, try adding the key, and if it fits keep
going; otherwise restore the copy, finalise the page and start a new one.
This might be cheaper, though copying the compression stream might itself
be slow.
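A minimal Python 3 sketch of the copy/try/restore approach, assuming
bytes keys, a 4096-byte page and newline-separated keys (pack_page and
pack_pages are hypothetical names, not bzr code). Learning the finished
size still means finishing a throwaway copy of the compressor, so each
key costs two copy() calls here, which is exactly the copying cost in
question.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size (the 4K pages discussed above)


def pack_page(keys, page_size=PAGE_SIZE):
    """Compress as many leading keys as fit into one page.

    Returns (page_bytes, keys_used). Back up the compressor, add the key,
    finish a throwaway copy to learn the final size, and restore the
    backup if the key pushed the page over the limit.
    """
    comp = zlib.compressobj()
    chunks = []        # compressed bytes already returned by comp
    chunks_len = 0
    used = 0
    for key in keys:
        backup = comp.copy()                  # state before this key
        out = comp.compress(key + b"\n")
        tail = comp.copy().flush()            # finish a copy, not comp itself
        if chunks_len + len(out) + len(tail) > page_size:
            comp = backup                     # restore: drop the key that overflowed
            break
        chunks.append(out)
        chunks_len += len(out)
        used += 1
    chunks.append(comp.flush())               # finalise this page
    return b"".join(chunks), used


def pack_pages(keys, page_size=PAGE_SIZE):
    """Split keys into consecutive pages, starting a new compressor per page."""
    pages = []
    while keys:
        page, used = pack_page(keys, page_size)
        if used == 0:
            raise ValueError("a single key does not fit in one page")
        pages.append(page)
        keys = keys[used:]
    return pages
```

For example, pack_pages(sorted_keys) returns a list of independent zlib
streams, each no larger than the page, that decompress back to
consecutive runs of the input keys.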

So if copy() is faster, we could use it where available and fall back to
a slower method elsewhere.
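Choosing between the two at runtime could look like this sketch (again
Python 3, with hypothetical names; on Python 3 the hasattr() check is
always true, so it only matters on interpreters as old as 2.4). Unlike
the restore approach described above, it probes each key on a throwaway
copy and only feeds the real compressor once the key is known to fit.
The fallback recompresses everything accepted so far for every probe,
which is the slow but portable option.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size

# Python 2.4's compress objects lack copy(); 2.5 and later have it.
HAVE_COPY = hasattr(zlib.compressobj(), "copy")


def count_keys_that_fit(keys, page_size=PAGE_SIZE, use_copy=HAVE_COPY):
    """Return how many leading keys compress into one page.

    With copy(), each candidate key is probed on a throwaway copy of a
    running compressor. Without it, this hypothetical fallback
    recompresses all accepted input from scratch for every probe,
    which costs quadratic work per page.
    """
    comp = zlib.compressobj()
    emitted = 0          # bytes already returned by comp (copy path)
    accepted = []        # raw input accepted so far (fallback path)
    for used, key in enumerate(keys):
        data = key + b"\n"
        if use_copy:
            probe = comp.copy()
            # Z_FINISH on the probe only; comp itself stays open.
            size = emitted + len(probe.compress(data)) + len(probe.flush())
        else:
            size = len(zlib.compress(b"".join(accepted) + data))
        if size > page_size:
            return used
        if use_copy:
            emitted += len(comp.compress(data))
        else:
            accepted.append(data)
    return len(keys)
```

Passing use_copy=False forces the fallback, which is handy for timing
the two paths against each other on a modern Python.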

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.

More information about the bazaar mailing list