Optimizing tree building

Aaron Bentley aaron.bentley at utoronto.ca
Fri Jun 8 00:01:35 BST 2007



Hi all,

Attacking tree-building from the other angle, I did a profiling run with
kcachegrind.

We spend 72% of our time in workingtree_4.get_file_lines.
We spend 26% of our time in _KnitIndex._load_data.
We spend 49% of our time in _get_content_maps, reading the various
records for knit data.

It seems essential to me that we avoid parsing entire indices, because
the size of these indices scales linearly with the length of history.

I haven't got a clear idea how to do that.  One option would be to sort
the indices, so that we can bisect into them.  It would help if they had
fixed-length records.
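As a sketch of that idea (the record layout and all names here are hypothetical, not bzr's actual knit format): with sorted, fixed-length records, a lookup can bisect into the raw bytes and touch only O(log n) records instead of parsing the whole index:

```python
# Hypothetical fixed-length index layout: 32-byte null-padded key
# followed by a 16-byte value, so every record is exactly 48 bytes.
RECORD_LEN = 48
KEY_LEN = 32

class FixedRecordIndex:
    """Binary-search a sorted index of fixed-length records without
    loading or parsing the entire file's contents."""

    def __init__(self, data):
        assert len(data) % RECORD_LEN == 0
        self._data = data
        self._count = len(data) // RECORD_LEN

    def _record(self, i):
        # In a real implementation this would be a seek+read, so only
        # the records on the bisection path are ever touched.
        return self._data[i * RECORD_LEN:(i + 1) * RECORD_LEN]

    def lookup(self, key):
        lo, hi = 0, self._count
        while lo < hi:
            mid = (lo + hi) // 2
            rec_key = self._record(mid)[:KEY_LEN].rstrip(b'\0')
            if rec_key < key:
                lo = mid + 1
            elif rec_key > key:
                hi = mid
            else:
                return self._record(mid)[KEY_LEN:]
        return None

def make_index(sorted_entries):
    """Build the index bytes from (key, value) pairs, already sorted."""
    data = b''.join(k.ljust(KEY_LEN, b'\0') + v.ljust(16, b'\0')
                    for k, v in sorted_entries)
    return FixedRecordIndex(data)
```

The catch, of course, is the next paragraph: a sorted file can't simply be appended to.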

But then we couldn't append to indices, and rewriting indices would have
the same scaling problem.  So perhaps we'd have a series of indices?
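One way a series of indices could work (a sketch, not anything bzr implements): keep a small, appendable "current" index plus a stack of frozen, sorted ones, and have lookups consult the newest tier first. Appends then only ever touch the small index:

```python
class TieredIndex:
    """Hypothetical series-of-indices scheme: new entries go into the
    newest tier; older tiers are frozen and could be written out as
    sorted, bisectable files."""

    def __init__(self):
        self._tiers = [{}]  # newest tier is last

    def add(self, key, value):
        self._tiers[-1][key] = value

    def start_new_tier(self):
        # Freeze the current tier; a real implementation would flush it
        # to disk in sorted, fixed-record form at this point.
        self._tiers.append({})

    def lookup(self, key):
        # Newer tiers shadow older ones.
        for tier in reversed(self._tiers):
            if key in tier:
                return tier[key]
        return None
```

This trades one lookup for one lookup per tier, which is what motivates the next question: how do we avoid opening tiers that can't contain the key?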

But then how would we know which index held the information for a given
revision?  Maybe we put a bloom filter in the filenames?
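The appeal of that idea is that a Bloom filter answers "definitely not here" with no false negatives, so a reader could rule out most index files from their names alone, without opening them. A minimal sketch (sizes and the hex-in-filename encoding are assumptions for illustration):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: if __contains__ returns False the key is
    definitely absent; True means only 'maybe present'."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from slices of one SHA-1.
        digest = hashlib.sha1(key).digest()
        for i in range(self.num_hashes):
            chunk = digest[i * 4:(i + 1) * 4]
            yield int.from_bytes(chunk, 'big') % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all(self.bits & (1 << pos) for pos in self._positions(key))

    def as_hex(self):
        # 256 bits -> 64 hex digits, compact enough to embed in a
        # filename so readers can skip the file entirely.
        return '%064x' % self.bits
```

The filter would have to be sized for the number of keys in each index, or the false-positive rate erodes the benefit.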

I really don't know where this will lead.  But even if we don't move
to a blob format for storing revision data, the index problem has to be
solved.

Anyone?

Aaron


