compressed weaves, and revision.weave
John A Meinel
john at arbash-meinel.com
Tue Oct 25 09:43:07 BST 2005
John A Meinel wrote:
> Martin Pool wrote:
>
>>On 25/10/05, John Arbash Meinel <john at arbash-meinel.com> wrote:
>>
>>I think compressing the storage is a good idea, but I'd like to switch
>>to an append-only indexed weave-like format at some time in the
>>future. I have the start of some code for that here.
>>
>> http://people.ubuntu.com/~mbp/bzr.mbp.knit/
>>
>>There is some tension between such a format and compression; I suppose
>>we could just compress each appended record of the file independently.
>> The ratio might not be as good but it would eliminate some text
>>redundancy, and I suppose we can rely on the delta compression to get
>>some more.
>>
>>I think I'd like compression to be optional; for local access the CPU
>>cost may be more than people wish to pay.
>
>
> I believe I have implemented it, I was able to upgrade the bzr.dev tree.
> But I didn't write any more tests.
I just fixed the tree so that all tests pass again. This is now revno=1357.
This tree does 3 major things.
1) Switches from using cElementTree to write out elements to writing
them out directly. At the same time I updated the xml format version.
That isn't strictly necessary, but I chose to do it because the
inventory_sha1 no longer matches.
2) Compresses the revision-store into a revision.weave file.
3) Pays attention to a ".bzr/compressed" file. If it exists, and it
contains the value "true\n", then it expects all weaves to be in
compressed form. You can change the state of the current tree using
"bzr upgrade --compress" or "bzr upgrade --uncompress".
The default when creating a new tree is compressed.
The default upgrade is uncompressed.
It won't work with an rsync push, but hopefully in the future a
standard "bzr push" will be able to leave the remote branch
compressed and the local branch uncompressed. (This already works
with "bzr pull".)
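As a rough illustration of point 3, here is a minimal sketch of how a reader could honour the ".bzr/compressed" flag. The function names (weaves_compressed, open_weave) are hypothetical and are not taken from the branch; the sketch only assumes what is described above, namely that the flag file must contain exactly "true\n" and that compressed weaves carry a ".gz" suffix.

```python
import gzip
import os


def weaves_compressed(bzr_dir):
    """Return True if the control directory is flagged as compressed.

    Hypothetical helper: the flag counts only when the file exists
    and holds exactly b"true\n".
    """
    flag_path = os.path.join(bzr_dir, "compressed")
    if not os.path.exists(flag_path):
        return False
    with open(flag_path, "rb") as f:
        return f.read() == b"true\n"


def open_weave(bzr_dir, name):
    """Open a weave for reading, picking the on-disk form from the flag.

    Hypothetical helper: "inventory" maps to inventory.weave, or to
    inventory.weave.gz when the tree is flagged as compressed.
    """
    path = os.path.join(bzr_dir, name + ".weave")
    if weaves_compressed(bzr_dir):
        return gzip.open(path + ".gz", "rb")
    return open(path, "rb")
```

Keeping the decision in one place means callers never need to know which form is on disk; they just read bytes from whatever open_weave returns.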
I don't know if you want to try and merge this before you go to Canada.
It certainly should be tested some more.
To show some of the benefits of this branch, here are the statistics:

bzr.dev, 2369 revisions
  inventory.weave       1.8M     1843774
  revision-store        1.0M     1064510
  .bzr/                 8.0M     8346945

bzr.dev upgraded, uncompressed
  inventory.weave       1.1M     1176233
  revision.weave        1.1M     1132526
  .bzr/                 7.4M     7748158

bzr.dev upgraded, compressed
  inventory.weave.gz    361.3K    369922
  revision.weave.gz     327.6K    335477
  .bzr/                 2.1M     2189365

Yes, that last number is accurate. With all of the weaves compressed,
"du -ksh *" reports 2.4M and "du -ksh .bzr" reports 2.9M.
So with compressed meta-data we are actually very close to a 1:1 ratio
between the working tree and the .bzr directory.
I haven't done a lot of performance testing with the now-compressed
files, though.
So we might play with it and see what the real tradeoffs are. But
considering that the current code has to download all of the
"inventory.weave" file before much can be done (because that is how it
gets the ancestry), it seems worthwhile to compress it down to
only 1/5th of its size.
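For anyone who wants to check the arithmetic, the byte counts from the statistics above can be turned into ratios with a few lines of Python. This is just a sketch; the dictionary layout and the ratio helper are made up for illustration, and only the numbers come from the measurements.

```python
# Byte counts copied from the statistics above. "revisions" means the
# revision-store directory before the upgrade and revision.weave
# (or revision.weave.gz) after it.
sizes = {
    "original": {"inventory": 1843774, "revisions": 1064510, ".bzr": 8346945},
    "uncompressed": {"inventory": 1176233, "revisions": 1132526, ".bzr": 7748158},
    "compressed": {"inventory": 369922, "revisions": 335477, ".bzr": 2189365},
}


def ratio(layout, item, baseline="original"):
    """Size of `item` in `layout` as a fraction of its size in `baseline`."""
    return sizes[layout][item] / sizes[baseline][item]


for item in ("inventory", "revisions", ".bzr"):
    print(f"{item}: {ratio('compressed', item):.1%} of original")
```

The compressed inventory comes out at roughly 20% of the original inventory.weave, which is where the 1/5th figure comes from.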
John
=:->