[RFC] Reduce memory with Key and Keys (and improve performance)

Fri Oct 2 23:11:07 BST 2009

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> Robert Collins wrote:
>> On Thu, 2009-09-10 at 16:01 -0500, John Arbash Meinel wrote:
>>
>>> Given the 10%-ish memory reduction, and the fairly impressive speed
>>> improvement, I'm thinking it is probably worth investigating this further.
>> I'm very much in favour of investigating this further.
> 
> Me too!  These are extremely promising results.  And as Robert says further
> down, having real objects that we can add methods and attributes to is an added
> bonus.
> 
> -Andrew.

To give a little bit more update about where I'm at.
The new branch is at:
  lp:~jameinel/bzr/2.1-static-tuple

With a basic design that falls into 2 parts:
 1) A 'StaticTuple' type which for all intents and purposes is a tuple
    with some small variations

    a) It can only reference other StaticTuples or PyStrings
    b) It has a '.intern()' attribute which canonicalizes it (similar to
       intern() for strings)
    c) intern() also is designed not to create immortal objects (so when
       the last refcount is deleted, it looks if it needs to remove
       itself from the interned group.)

 2) A 'StaticTupleIntern' object, which is essentially a PySet, only it
    *doesn't* cache the hash value associated with a reference.

    a) This means that it uses 1/2 the memory of a PySet, and 1/3rd the
       memory of a PyDict (which is used by string's intern)
    b) It also exposes some a Get function which PySet doesn't, and its
       "Add" function returns the object at that location.
    c) Which generally means you only have to do 1 lookup to get the
       canonical form rather than one for "Is there one already" and
       another for "Now put this one there".
    d) It also isn't put into the python GC because Pyrex doesn't know
       how to handle my raw-pointer (it Pyrex/Cython doesn't handle
       arrays of objects without going to a List object.)
       On the plus side, I didn't *want* GC to walk this structure,
       because it has 100-500k pointers that don't need to be evaluated.

Results look promising, though it isn't where I'd like it to be in all
cases.

For "load all the parent maps across all versionedfiles" it is quite
good. For my launchpad branch I'm at:
  327MB, 53.4s for bzr.dev
  244MB, 26.5s for my branch

Time for 'bzr log -r -1 -n1' (load the whole graph but only display the
last revision) is pretty much unchanged at around 700ms.

Time for 'bzr log -n1 -v' is also mostly unchanged, and memory
consumption has actually gone up slightly.

Time for 'bzr branch bzr.dev standalone' is:
  407MB, 2m04s for bzr.dev
  332MB, 1m42s for my branch

So a 1.23:1 savings in memory and 1.22:1 savings in speed. And from what
I can tell, this should only increase with something like a Launchpad tree.

Right now it requires Cython to build, because there are a few
constructs that looked nice. It probably isn't a huge difference, so I
may backport to pyrex, but I figured while exploring the space, I'd keep
it easy on myself.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrGensACgkQJdeBCYSNAAMjswCgmyJIghlYCuyxioBtMqlAsbMu
6ZYAoMcKnYH4iQZkC4ix+lvVchrVL3sn
=m3BF
-----END PGP SIGNATURE-----