compressed weaves, and revision.weave

John A Meinel john at arbash-meinel.com
Tue Oct 25 09:43:07 BST 2005


John A Meinel wrote:
> Martin Pool wrote:
>
>>On 25/10/05, John Arbash Meinel <john at arbash-meinel.com> wrote:
>>
>>I think compressing the storage is a good idea, but I'd like to switch
>>to an append-only indexed weave-like format at some time in the
>>future.  I have the start of some code for that here.
>>
>>  http://people.ubuntu.com/~mbp/bzr.mbp.knit/
>>
>>There is some tension between such a format and compression; I suppose
>>we could just compress each appended record of the file independently.
>> The ratio might not be as good but it would eliminate some text
>>redundancy, and I suppose we can rely on the delta compression to get
>>some more.
>>
>>I think I'd like compression to be optional; for local access the CPU
>>cost may be more than people wish to pay.
>
>
> I believe I have implemented it, I was able to upgrade the bzr.dev tree.
> But I didn't write any more tests.

I just fixed the tree so that all tests pass again. This is now revno=1357.

This tree does 3 major things.

1) Switches from using cElementTree to write out elements to writing
   them out directly. At the same time I updated the xml format version.
   That isn't strictly necessary, but I chose to do it because the
   inventory_sha1 no longer matches.
2) Compresses the revision-store into a revision.weave file.
3) Pays attention to a ".bzr/compressed" file. If it exists, and it
   contains the value "true\n", then it expects all weaves to be in
   compressed form. You can change the state of the current tree using
   "bzr upgrade --compress" or "bzr upgrade --uncompress".

   The default when creating a new tree is compressed.
   The default upgrade is uncompressed.

   It won't work with an rsync push, but hopefully in the future a
   standard "bzr push" will be able to have the remote branch be
   compressed and the local branch be uncompressed. (This already
   works with "bzr pull".)
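
The ".bzr/compressed" check in (3) can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the code in
the branch: the helper names weaves_compressed and open_weave are
hypothetical, and it assumes each weave sits directly under .bzr/ and
gains a ".gz" suffix when compressed.

```python
import gzip
import os


def weaves_compressed(bzr_dir):
    """Return True only if <bzr_dir>/compressed exists and holds b"true\\n"."""
    flag_path = os.path.join(bzr_dir, "compressed")
    try:
        with open(flag_path, "rb") as flag_file:
            return flag_file.read() == b"true\n"
    except FileNotFoundError:
        # No flag file: treat the tree as uncompressed.
        return False


def open_weave(bzr_dir, name):
    """Open the named weave for reading, honouring the compressed flag.

    Assumed layout (hypothetical): <bzr_dir>/<name>.weave when
    uncompressed, <bzr_dir>/<name>.weave.gz when compressed.
    """
    base = os.path.join(bzr_dir, name + ".weave")
    if weaves_compressed(bzr_dir):
        return gzip.open(base + ".gz", "rb")
    return open(base, "rb")
```

With a layout like that, the caller never needs to know which form is
on disk; open_weave(".bzr", "inventory") returns a readable binary
file object either way.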

I don't know if you want to try and merge this before you go to Canada.
It certainly should be tested some more.

To show some of the benefits of this branch, here are the size
statistics (human-readable size, then bytes).

bzr.dev, 2369 revisions
    inventory.weave      1.8M    1843774
    revision-store       1.0M    1064510
    .bzr/                8.0M    8346945

bzr.dev upgraded, uncompressed
    inventory.weave      1.1M    1176233
    revision.weave       1.1M    1132526
    .bzr/                7.4M    7748158

bzr.dev upgraded, compressed
    inventory.weave.gz   361.3K  369922
    revision.weave.gz    327.6K  335477
    .bzr/                2.1M    2189365

Yes, that last number is accurate. With all of the weaves compressed, I
get a "du -ksh *" of 2.4M and a "du -ksh .bzr" of 2.9M.
So with compressed meta-data we are actually very close to a 1:1 ratio
of meta-data to working tree.

I haven't done a lot of performance testing with the now-compressed
files, though.
So we might play with it, and see what the real tradeoffs are. But
considering the current code has to download all of the
"inventory.weave" file before much can be done (because that is how it
gets the ancestry), it seems worthwhile to compress it down to
only 1/5th of its size.
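
The 1/5th figure can be rechecked from the byte counts in the
statistics above. A minimal Python sketch (the dictionary layout and
the names SIZES and ratio are just for illustration; the numbers are
the ones listed in this mail):

```python
# Byte counts copied from the statistics in this mail:
# (original tree, upgraded and compressed tree)
SIZES = {
    "inventory": (1843774, 369922),   # inventory.weave -> inventory.weave.gz
    "revisions": (1064510, 335477),   # revision-store  -> revision.weave.gz
    ".bzr":      (8346945, 2189365),  # whole .bzr/ directory
}


def ratio(before, after):
    """Return the compressed size as a fraction of the original size."""
    return after / before


for name, (before, after) in SIZES.items():
    print(f"{name:10s} {ratio(before, after):.1%} of original size")
```

The inventory comes out at roughly a fifth of its original size, and
the whole .bzr/ directory at a bit over a quarter.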

John
=:->