Prototype "improved_chk_index"
Ian Clatworthy
ian.clatworthy at canonical.com
Thu Oct 29 23:52:58 GMT 2009
John Arbash Meinel wrote:
> So I think we *could* do better about size if we want to put a fairly
> significant amount of effort into it. The "easy" fixes would be:
>
> 1) Move the text content into .cix, and then only have the per-file
> graph available in .tix. (Allowing us to remove the 'value' field),
> saving about 1.25:1
> 2) Fix the per-file graph for root nodes to not require a node for every
> revision that came from a non-rich-root source. That saves another
> 1.28:1 for a total 1.6:1 space savings in .tix
> 3) Think about some way to combine .rix and .iix. Possibly just dropping
> the inventory records entirely. We talked about doing that in the
> past. The most significant issue is stacked branches needing the
> 'parent inventories but *not* the parent revisions'. Though we could
> do that with a simple flag in the index that said "this revision not
> considered 'present'"...
> This is 2.3MB of the 30MB in indexes for LP, so <10% total space. But
> becoming a more significant fraction if we shrink .cix and .tix.
As another data point, the Firefox 3.5 import shows:
* 123M pack file
* 13M indices
* 11M checkout/dirstate
* 4.1M checkout/merge-hashes
The index sizes are:
6.1M .bzr/repository/indices/43a941041bdf68b431bc9c73b9004fd1.cix
1.2M .bzr/repository/indices/43a941041bdf68b431bc9c73b9004fd1.iix
1.2M .bzr/repository/indices/43a941041bdf68b431bc9c73b9004fd1.rix
4.0K .bzr/repository/indices/43a941041bdf68b431bc9c73b9004fd1.six
4.3M .bzr/repository/indices/43a941041bdf68b431bc9c73b9004fd1.tix
Looking inside the matching .git import (after git repack -adf --window=250):
4.0K .git/branches
4.0K .git/COMMIT_EDITMSG
4.0K .git/config
4.0K .git/description
4.0K .git/HEAD
48K .git/hooks
4.1M .git/index
16K .git/info
88K .git/logs
123M .git/objects
332K .git/refs
.git/objects holds the matching pack file.
So head-to-head, both tools have a 123M pack file. Beyond the pack file,
git's overhead is about 4.1M (the index) and ours is 28.1M (13M indices +
11M dirstate + 4.1M merge-hashes). That certainly suggests we have room
for improvement in this area.
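For anyone who wants to check the arithmetic, here is a minimal sketch in
Python. The sizes are copied from the listings above; the variable names
are just illustrative. It counts only .git/index on the git side and
leaves out the small entries (hooks, logs, refs and so on), which add
roughly another 0.5M.

```python
# Sizes in MB, taken from the du-style listings in this message.
bzr_overhead = {
    "indices": 13.0,
    "checkout/dirstate": 11.0,
    "checkout/merge-hashes": 4.1,
}
# Only the index is counted for git; the other small entries are ignored.
git_overhead = {
    "index": 4.1,
}

bzr_total = round(sum(bzr_overhead.values()), 1)
git_total = round(sum(git_overhead.values()), 1)
ratio = bzr_total / git_total

print(f"bzr overhead: {bzr_total}M")
print(f"git overhead: {git_total}M")
print(f"bzr/git ratio: {ratio:.1f}x")
```

On these figures our non-pack overhead is a bit under 7 times git's.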
Ian C.