MySQL in chk-inventory

John Arbash Meinel john at arbash-meinel.com
Wed Dec 10 16:59:44 GMT 2008


I finally managed to get a full conversion of my mysql repository into a
split-inventory repository. The results are pretty much what we've been
expecting, but I figure it is nice to have actual results:

Commits: 63546
                      Raw    %    Compressed    %  Objects
Revisions:     127574 KiB   0%     50804 KiB   5%    63546
Inventories:  1012790 KiB   4%    567565 KiB  59%  1263035
Texts:       20555561 KiB  94%    331055 KiB  34%   259395
Signatures:         0 KiB   0%         0 KiB   0%        0
Total:       21695927 KiB 100%    949425 KiB 100%  1585976

Extra Info:           count    total  avg stddev  min  max
internal node refs   858037  8011413    9    8.3    2   29
internal p_id refs    60426   414721    6    8.0    2   29
inv depth            269751  1757736    6    2.8    1   17
leaf node items      269751  1734108    6    4.6    1   18
leaf p_id items       11275   113518   10    8.6    1   38
p_id depth            11275   120390   10    5.1    1   23

The average depth of the inventory is 6, but the average depth of the
parent_id,basename => file_id map is 10, with max depths of 17 and 23,
respectively.

We end up with an average of 19.7 inventory nodes per revision, which is
8.9kB in compressed form (approx 450B per node, 15.9kB uncompressed =
800B per node). This is pretty far off the 4kB we were originally
thinking for each node.
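
As a quick check, the per-revision and per-node figures can be recomputed
from the Inventories row of the table. This is only a sketch: the variable
names are mine, and it assumes the sizes in the table are KiB. From the table
totals the node count comes out nearer 19.9 per revision than 19.7, and the
per-node sizes land close to the approximate 450B and 800B quoted.

```python
# Figures copied from the commit count and the "Inventories" row above.
commits = 63546
inv_objects = 1263035          # inventory nodes stored
inv_raw_kib = 1012790          # uncompressed size, assumed KiB
inv_compressed_kib = 567565    # compressed size, assumed KiB

# Per-revision averages.
nodes_per_rev = inv_objects / commits
raw_kib_per_rev = inv_raw_kib / commits
compressed_kib_per_rev = inv_compressed_kib / commits

# Per-node averages, converted from KiB to bytes.
raw_bytes_per_node = inv_raw_kib * 1024 / inv_objects
compressed_bytes_per_node = inv_compressed_kib * 1024 / inv_objects

print(f"nodes per revision:        {nodes_per_rev:.1f}")
print(f"compressed KiB per rev:    {compressed_kib_per_rev:.1f}")
print(f"raw KiB per rev:           {raw_kib_per_rev:.1f}")
print(f"compressed bytes per node: {compressed_bytes_per_node:.0f}")
print(f"raw bytes per node:        {raw_bytes_per_node:.0f}")
```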

For the file_id=>inventory_entry map, we have 269k leaf nodes versus 858k
internal nodes, or about 3:1 internal versus leaf.

For the parent_id,basename=>file_id map, we have 11.3k leaf versus 60k
internal nodes, or about 5:1, which is comparable to the average depth of
10 versus 6.
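
The two ratios, and the comparison with the depth figures, can be checked
directly from the "Extra Info" counts. A rough sketch, with my own variable
names; it takes the "count" column as the number of nodes of each kind, which
is how the paragraphs above read it.

```python
# "count" column from the Extra Info table.
inv_internal = 858037   # internal node refs
inv_leaf = 269751       # leaf node items
p_id_internal = 60426   # internal p_id refs
p_id_leaf = 11275       # leaf p_id items

# Internal-to-leaf ratio for each map.
inv_ratio = inv_internal / inv_leaf
p_id_ratio = p_id_internal / p_id_leaf

# Average depths quoted in the table (p_id map versus inventory map).
depth_ratio = 10 / 6

print(f"inventory map internal:leaf = {inv_ratio:.2f}")
print(f"p_id map internal:leaf      = {p_id_ratio:.2f}")
print(f"ratio of the two ratios     = {p_id_ratio / inv_ratio:.2f}")
print(f"ratio of average depths     = {depth_ratio:.2f}")
```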

It is interesting to me to see how infrequently the tree-shape map
changes versus the inventory content map: a total of 113k leaf items
versus 1.7M.

Anyway, time for me to get back to actually improving things. For those
watching, I used a fairly hacked-up version of bzr to cache inventory
objects during extraction, etc. But it took approx 15 hours to convert
everything. (4 hours for the first 30k revs, and 11 hours for the last,
but I think my machine started hitting swap, as it had a peak memory
consumption of around 1GB.)
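
For a sense of the slowdown, here is the conversion rate for each phase. This
is a back-of-the-envelope sketch that assumes "the first 30k revs" means
exactly 30000 revisions and that the remaining 11 hours covered all the rest;
both are my assumptions, and the timings are approximate.

```python
total_revs = 63546
first_revs = 30000              # assumed: "the first 30k revs"
first_hours = 4
rest_revs = total_revs - first_revs
rest_hours = 11

# Revisions converted per hour in each phase.
first_rate = first_revs / first_hours
rest_rate = rest_revs / rest_hours

print(f"first phase:  {first_rate:.0f} revs/hour")
print(f"second phase: {rest_rate:.0f} revs/hour")
print(f"slowdown:     {first_rate / rest_rate:.1f}x")
```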

John
=:->


