StaticTuple... naming, maintenance, ...
John Arbash Meinel
john at arbash-meinel.com
Mon Oct 5 23:58:02 BST 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
So I mentioned in an old thread, but I have a branch here:
lp:~jameinel/bzr/2.1-static-tuple
In it, I worked on re-implementing a 'tuple-like' structure. The basics are:
1) It is limited in what it can point to, so it doesn't participate in
the python garbage collector. (It cannot reference things which would
cause a cycle.)
2) It also is 'static' in that you can intern it based on the hash.
3) I also implemented a custom 'Interner' class, that works basically
like a PySet, but allows lookup and uses half the memory. (1/3rd
the memory of a dict.)
(In loading all of launchpad, 24MB was used for just the dict holding
the interned keys. So I save 16MB just with that.)
4) The next effect for 'bzr branch launchpad' was 17% less peak memory
consumed and *40%* faster (11min => 8min).
So overall, I'm quite happy, and I'd like to look into what it will take
to land this. I have quite a few open questions.
1) Name... it is currently called "StaticTuple", but if it is an object
we are going to make use of, that is a fairly 'heavy' name.
Compare: "key = (file_id, revision_id)"
versus: "key = StaticTuple(file_id, revision_id)"
or even: "key = StaticTuple((file_id, revision_id))"
I think lowercase is reasonable, though I wonder about
"key = stuple(file_id, revision_id)"
2) Constructor args:
"tuple(X)" takes a sequence X, however if you want to create a
3-tuple you have the (a, b, c) short form
As such, I was planning on making "StaticTuple(*args)", so that you
can just change:
foo = (a, b, c)
into:
foo = StaticTuple(a, b, c)
I would probably have a separate "StaticTuple.from_sequence()" for
the other form. You certainly can do StaticTuple(*t), however the
main loss is that "tuple(tuple(t)) is t", while
StaticTuple(*st) would have to be a new object.
3) C/Cython/Pyrex
The #1 memory benefit is removing the python GC header from all of
the objects. (16 bytes / object.)
I can easily define such a type in C, and have done so.
However, as you get into doing more with these objects, (like
creating a C level api to share with other code), there is a *lot*
more maintenance overhead in doing it from C.
You have to do all the exception handling manually, *and* write all
of the boilerplate for exposing the dynamic loading of functions.
In Pyrex/Cython doing so is:
cdef object myfunc(object):
becomes
cdef api object myfunc(object):
Doing so in C is about 4 lines of boilerplate per function, type
checking, etc. Plus another 20+ lines that you have to write to
describe that you *have* a C api that should be loaded.
In the end, I wrote "StaticTuple" in C, and "StaticTupleInterner" in
Cython, and the latter took a day, and the former took a week. It is
a "sunk cost", but ongoing maintenance is not.
The main issue here is that Pyrex will not generate objects without
the HAVE_GC flag set. Cython >= 0.11 can (as long as you don't have
'object' attributes, which is true here, because I have to use
PyObject** because neither Pyrex nor Cython support C arrays of
objects)
The difficulty is that would be a hard jump to go from Python 0.8 or
so (doesn't even support +=) to Cython 0.11 (it is in Karmic, but
Jaunty only have Cython 0.10).
I would *really* like to switch to Cython 0.11+, as I have specific
benefits. One could argue that we could try to be compatible, and
people can compile using Pyrex, and just wouldn't get the memory and
speed improvement of avoiding the GC...
I'm also using stuff like 'cpdef' and 'inline', but I can work
around those things easily enough. I can't hack the 'HAVE_GC' flag
easily.
I'd like to get some feedback, so I have a feel what I need to do to
finish this off and get it landed. I think this is a net win, and we
just need to decide some of the finer details and balance points.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkrKefoACgkQJdeBCYSNAAP8EACdFVN6zY0+1V6yEHr02oYkAPsM
RF8An22YD1mNSUpiHq6aEwEHgR5FDH5V
=f4Xf
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list