pack index performance

John Arbash Meinel john at arbash-meinel.com
Wed Jun 11 17:25:44 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
| Adding to the theories for why index performance is improved with less
| indices I think it is the linear access pattern.
|
| Specifically, say we have N indices: [0, 1, ... N-1]
|
| If we look up a key k we do the following IO:
| bisect_in(indices[0])
| ...
| bisect_in(indice_with_k)
|
| Looking up K keys we do the same IO pattern, reducing K by found keys at
| each step until K is empty.
|
| If we assume K == N and each k in K is present in one index only, that
| means we look for:
| K keys in indices[0]
| K-1 keys in indices[1]
| ...
| 1 key in indices[K-1]
|
| looking for more keys in a single index corresponds to larger readv's
| and more processing in the index layer.
|
| If we altered our search, to instead look in all indices in parallel, we
| would have less locality of reference, but likely perform all IO in
| parallel. That is to find a key k:
| while k not found:
|   bisect_step(indices[0])
|   ...
|   bisect_step(indices[N-1])
|
| This would trim the bisection required in all indices as soon as any k
| in K is found.
|
| (I can be more rigourous about this, but this is a quick note to prevent
| the idea being lost). I'll note that I had this in mind at the start of
| the index layer's work, its just slightly hidden from view at the
| moment.
|
| -Rob
|

That would be an interesting way to do it. It might also help the case when the
indexes already have some of the data loaded.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhP/IgACgkQJdeBCYSNAAPimgCggrstRZCRSord787Knnxf3hIS
K4IAn34yJotpXfKK9aKgZYx7MMtBCSBG
=dEk7
-----END PGP SIGNATURE-----



More information about the bazaar mailing list