Rev 4804: Some small doc updates to chk_index. in http://bazaar.launchpad.net/~jameinel/bzr/chk-index
John Arbash Meinel
john at arbash-meinel.com
Fri Oct 30 14:29:40 GMT 2009
At http://bazaar.launchpad.net/~jameinel/bzr/chk-index
------------------------------------------------------------
revno: 4804
revision-id: john at arbash-meinel.com-20091030142922-5iipnhlg49r3rgi9
parent: john at arbash-meinel.com-20091028204625-b0owje7tzg60y96o
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: chk-index
timestamp: Fri 2009-10-30 09:29:22 -0500
message:
Some small doc updates to chk_index.
=== modified file 'doc/developers/improved_chk_index.txt'
--- a/doc/developers/improved_chk_index.txt 2009-03-24 16:35:22 +0000
+++ b/doc/developers/improved_chk_index.txt 2009-10-30 14:29:22 +0000
@@ -13,10 +13,11 @@
Btree indexes also rely on zlib compression, in order to get their compact
size, and further has to try hard to fit things into a compressed 4k page.
When the key is a sha1 hash, we would not expect to get better than 20bytes
-per key, which is the same size as the binary representation of the hash. This
-means we could write an index format that gets approximately the same on-disk
-size, without having the overhead of ``zlib.decompress``. Some thought would
-still need to be put into how to efficiently access these records from remote.
+per key, which is the same size as the binary representation of the hash (zlib
+compressing a sorted list of 10M hashes shrank it to only 97%). This means we
+could write an index format that gets approximately the same on-disk size,
+without having the overhead of ``zlib.decompress``. Some thought would still
+need to be put into how to efficiently access these records from remote.
Required information
@@ -112,7 +113,7 @@
small keys, low chance of collision, this is *not* redundant with the
value stored in (a)) This should then dereference into a location in
the index. This should probably be a 4-byte reference. It is unlikely,
- but possible, to have an index >16MB. With an 10-byte entry, it only
+ but possible, to have an index >16MB. With a 10-byte entry, it only
takes 1.6M chk nodes to do so. At the smallest end, this will probably
be a 256-way (8-bits) fan out, at the high end it could go up to
64k-way (16-bits) or maybe even 1M-way (20-bits). (64k-way should
@@ -385,6 +386,9 @@
64k records. And our groups are currently scaled that we require at least
1-2MB before they can be considered 'full'.
+However, there are also extremely pessimistic cases that can exist. So a
+variable number of bytes per group offset is probably the best answer.
+
variable length index entries
-----------------------------
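
The zlib figure quoted in the first hunk (a sorted list of 10M hashes compressing to only 97%) can be checked informally with a sketch like the one below. It is not part of this revision. The function name ``sorted_sha1_compression_ratio``, the use of sha1 over counter strings as stand-in keys, and the 100,000-key sample size are all assumptions made for illustration, so the ratio it reports need not match the 10M-hash measurement.

```python
import hashlib
import zlib


def sorted_sha1_compression_ratio(num_hashes):
    """Return compressed_size / raw_size for a sorted run of binary sha1s.

    Hypothetical helper: the keys are sha1 digests of counter strings,
    which only approximates the distribution of real chk keys.
    """
    # Each digest is 20 bytes, the binary form of a sha1 hash.
    digests = sorted(
        hashlib.sha1(str(i).encode("ascii")).digest()
        for i in range(num_hashes)
    )
    raw = b"".join(digests)
    compressed = zlib.compress(raw)
    return len(compressed) / len(raw)


if __name__ == "__main__":
    # Assumed sample size; much smaller than the 10M hashes in the doc.
    ratio = sorted_sha1_compression_ratio(100_000)
    print("compressed to %.1f%% of the raw size" % (ratio * 100))
```

Sorting makes neighbouring keys share a short prefix, which is the only redundancy zlib can exploit here, so the ratio should stay close to 1 and improve only slowly as the number of keys grows.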