Rev 4180: The mini-index is the key to shrinking the key width and still being able to 'scale'. in http://bzr.arbash-meinel.com/branches/bzr/jam-integration
John Arbash Meinel
john at arbash-meinel.com
Sat Mar 21 02:01:17 GMT 2009
At http://bzr.arbash-meinel.com/branches/bzr/jam-integration
------------------------------------------------------------
revno: 4180
revision-id: john at arbash-meinel.com-20090321020106-507m1ij4mk38cwh3
parent: john at arbash-meinel.com-20090321013631-wbiiaca5hpfcnlo6
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: jam-integration
timestamp: Fri 2009-03-20 21:01:06 -0500
message:
The mini-index is the key to shrinking the key width and still being able to 'scale'.
As the index gets longer, we add log(N) bits to the mini-index, which is
exactly what we need to preserve the same level of collision avoidance.
-------------- next part --------------
=== modified file 'doc/developers/improved_chk_index.txt'
--- a/doc/developers/improved_chk_index.txt 2009-03-21 01:36:31 +0000
+++ b/doc/developers/improved_chk_index.txt 2009-03-21 02:01:06 +0000
@@ -100,7 +100,7 @@
5. So for 1M keys, an ideal chk+group index would be:
- a. 8-byte hash prefix
+ a. 6-byte hash prefix
b. 2-byte group number
@@ -108,22 +108,22 @@
d. a separate lookup of 12-byte group number to offset + length
- e. a variable width mini-index that splits X bits of the key. (for the
- smallest keys, and lowest chance of collision, this could *not* be
- redundant with the value stored in (a)) This should then dereference
- into a location in the index. This should probably be a 4-byte
- reference. It is unlikely, but possible, to have an index >16MB. With
- an 8-byte hash, it takes 1.3M chk nodes to do so.
- At the smallest end, this will probably be a 256-way (8-bits) fan out,
- at the high end it could go up to 64k-way (16-bits) or maybe even
- 1M-way (20-bits). (64k-way should handle up to 5-16M nodes and still
- allow a cheap <4k read to find the final entry.)
+ e. a variable-width mini-index that splits X bits of the key. (to keep
+ keys small with a low chance of collision, this is *not* redundant with
+ the value stored in (a)) This should then dereference into a location in
+ the index. This should probably be a 4-byte reference. It is unlikely,
+ but possible, to have an index >16MB. With a 10-byte entry, it only
+ takes 1.6M chk nodes to do so. At the smallest end, this will probably
+ be a 256-way (8-bits) fan out, at the high end it could go up to
+ 64k-way (16-bits) or maybe even 1M-way (20-bits). (64k-way should
+ handle up to 5-16M nodes and still allow a cheap <4k read to find the
+ final entry.)
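As an illustration of (e), a minimal sketch of the mini-index lookup; the function names and the flat 4-byte big-endian table layout are assumptions for illustration, not bzr's actual on-disk format:

```python
def mini_index_offset(mini_index, key_hash, bits=8):
    """Look up the 4-byte index offset for key_hash's bucket.

    mini_index is assumed to be a flat byte string of 2**bits
    big-endian 4-byte offsets; the bucket is chosen by the top
    `bits` bits of the hash.
    """
    bucket = int.from_bytes(key_hash[:4], 'big') >> (32 - bits)
    start = 4 * bucket
    return int.from_bytes(mini_index[start:start + 4], 'big')

# A 256-way (8-bit) fan out where bucket i points at offset i * 16:
mini = b''.join((i * 16).to_bytes(4, 'big') for i in range(256))
offset = mini_index_offset(mini, bytes.fromhex('ab12cd34ef56'))  # bucket 0xab
```

A 16-bit (64k-way) fan-out is the same structure with ``bits=16`` and a 256KiB table.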
So the max size for the optimal groupcompress+chk index with 10M entries would be::
- 12 * 10M (entries) + 64k * 12 (group) + 64k * 4 (mini index) = 121 MiB
+ 10 * 10M (entries) + 64k * 12 (group) + 64k * 4 (mini index) = 101 MiB
-So 121MiB which breaks down as 120MiB for the actual entries, 0.75MiB for the
+So 101MiB which breaks down as 100MiB for the actual entries, 0.75MiB for the
group records, and 0.25MiB for the mini index.
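Checking that arithmetic (note the text rounds between MB and MiB; the raw total is about 101 million bytes, or roughly 96.4 MiB strictly):

```python
entries = 10 * 10_000_000      # 10 bytes per entry, 10M entries
groups = 64 * 1024 * 12        # 64k group records, 12 bytes each
mini = 64 * 1024 * 4           # 64k mini-index slots, 4 bytes each
total = entries + groups + mini
print(total)                   # 101048576 bytes, i.e. ~101 MB
```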
1. Looking up a key would involve:
@@ -158,20 +158,30 @@
f. If the size of these mini headers becomes critical (8 bytes per record
is 8% overhead for 100 byte records), we could also compress this mini
- header. Alternatively we could set the 'width' of records. So instead of
- always using 4-byte wide offset + length, we could switch to 2-byte
- wide. Though groups are likely to always be >64KiB long (uncompressed).
+ header. Changing the number of bytes per entry is unlikely to be
+ efficient: groups standardize on 4MiB, which is far beyond what a
+ 2-byte offset can address (64KiB); 3 bytes would be enough, as long as
+ we never store something like an ISO as a single entry. Variable width
+ also isn't a big win, since base-128 already needs 4 bytes at just 2MiB.
+
For minimum size without compression, we could only store the 4-byte
length of each node. Then to compute the offset, you have to sum all
- previous nodes. This should still be very cheap in compiled code, and
- would also have the advantage that it would be highly compressible
- itself. (Most nodes are going to have a length that fits 1-2 bytes.)
- An alternative form would be to use the base-128 encoding. (If the MSB
+ previous nodes. We require <64k nodes in a group, so this header is at
+ most 256KiB, but we would lose the ability to read only part of it.
+ This should still be cheap in compiled code (needs benchmarking, since
+ partial reads are no longer possible), and would also have the
+ advantage that the fixed-width form would be highly compressible
+ itself. (Most nodes are going to have a length that fits in 1-2 bytes.)
+
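The sum-of-lengths scheme above can be sketched as follows (hypothetical helper names):

```python
from itertools import accumulate

def offsets_from_lengths(lengths):
    """Starting offset of each node: running sum of all prior lengths."""
    return [0] + list(accumulate(lengths))[:-1]

def offset_of(lengths, i):
    """Offset of node i alone, summing only the lengths before it."""
    return sum(lengths[:i])

lengths = [100, 50, 200, 25]
print(offsets_from_lengths(lengths))   # [0, 100, 150, 350]
```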
+ An alternative form would be to use the base-128 encoding. (If the MSB
is set, then the next byte needs to be added to the current value
shifted by 7*n bits.) This encodes 4GiB in 5 bytes, but stores 127B in 1
- byte, and 2MiB in 3 bytes. If we only stored 64k entries in a group, we
- would expect this header to be <128KiB long, which should be fast to
- process.
+ byte, and 2MiB in 3 bytes. If we only stored 64k entries in a 4 MiB
+ group, the average size can only be 64B, which fits in a single byte
+ length, so 64KiB for this header, or only 1.5% overhead. We also don't
+ have to compute the offset of *all* nodes, just the ones before the one
+ we want, which is similar to what we have to do to get the actual
+ content out.
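A generic base-128 encoder/decoder of the kind described above (a sketch, not bzr's exact serializer):

```python
def encode_base128(n):
    """Encode a non-negative int, 7 bits per byte, MSB = 'more follows'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_base128(data, pos=0):
    """Decode one value starting at pos; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

# 127 fits in one byte; 2MiB - 1 still fits in three; exactly 2MiB
# tips into four; 4GiB needs five.
assert len(encode_base128(127)) == 1
assert len(encode_base128(2**21 - 1)) == 3
assert len(encode_base128(2**21)) == 4
assert len(encode_base128(2**32)) == 5
```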
Partial Hash
@@ -255,12 +265,17 @@
certainly acceptable. (note that this isn't a 1-in-1000 chance that each of
those keys collides, but a 1-in-1000 chance of a *single* collision). Using a
collision chance of 10^-3, and number of keys 10^6, means we need (12+3)*0.4 =
-6 bytes. For 10M keys, you need (14+3)*0.4 = 6.8 aka 7.
+6 bytes. For 10M keys, you need (14+3)*0.4 = 6.8 aka 7. We get that extra byte
+from the ``mini-index``. In an index with a lot of keys, you want a bigger
+fan-out up front anyway, which consumes more bits of the key and extends
+your effective key width.
Also taking one more look at ``H ~ 0.4 (2N + E)``, you can rearrange and
consider that for every order of magnitude more keys you insert, your chance
for collision goes up by 2 orders of magnitude. But for 100M keys, 8 bytes
-gives you a 1 in 10,000 chance of collision.
+gives you a 1 in 10,000 chance of collision, and that assumes a 16-bit
+fan-out (64k-way); for 100M keys, we would likely want at least a 20-bit
+fan-out.
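The ``0.4 * (2N + E)`` rule of thumb is just the birthday bound solved for hash width; 0.4 approximates ``1 / (8 * log10(2)) ~= 0.415``, which is why exact values land slightly above the rounded ones. A sketch:

```python
import math

def hash_bytes_needed(n_keys, p_collision):
    """Bytes of hash so that P(any collision among n_keys) stays near
    p_collision, via the birthday approximation P ~= n**2 / 2**(b+1)."""
    bits = 2 * math.log2(n_keys) + math.log2(1 / p_collision) - 1
    return bits / 8

print(round(hash_bytes_needed(10**6, 1e-3), 1))   # 6.1
print(round(hash_bytes_needed(10**7, 1e-3), 1))   # 6.9
```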
You can also see this from the original equation with a bit of rearranging::
More information about the bazaar-commits
mailing list