[MERGE] KVF.get_record_stream('unordered') uses I/O order
John Arbash Meinel
john at arbash-meinel.com
Fri Dec 5 13:11:24 GMT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Vincent Ladeuil wrote:
>>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:
>
> <snip/>
>
> jam> + def _sort_keys_by_io(self, keys, positions):
> jam> + """Figure out an optimal order to read the records for the given keys.
> jam> +
> jam> + Sort keys, grouped by index and sorted by position.
> jam> +
> jam> + :param keys: A list of keys whose records we want to read. This will be
> jam> + sorted 'in-place'.
> jam> + :param positions: A dict, such as the one returned by
> jam> + _get_components_positions()
> jam> + :return: None
> jam> + """
> jam> + def get_index_memo(key):
> jam> + index_memo = positions[key][1]
> jam> + # Group by prefix and position. index_memo[0] is the key, so it is
> jam> + # (file_id, revision_id) and we don't want to sort on revision_id,
> jam> + # index_memo[1] is the position, and index_memo[2] is the size,
> jam> + # which doesn't matter for the sort
> jam> + return index_memo[0][:-1], index_memo[1]
> jam> + return keys.sort(key=get_index_memo)
> jam> +
>
> What a nice comment...
>
> jam> def _split_key(self, key):
> jam> """Split key into a prefix and suffix."""
> jam> return key[:-1], key[-1]
> jam> @@ -2380,6 +2407,21 @@
> jam> bits = node[2][1:].split(' ')
> jam> return node[0], int(bits[0]), int(bits[1])
>
> jam> + def _sort_keys_by_io(self, keys, positions):
> jam> + """Figure out an optimal order to read the records for the given keys.
> jam> +
> jam> + Sort keys, grouped by index and sorted by position.
> jam> +
> jam> + :param keys: A list of keys whose records we want to read. This will be
> jam> + sorted 'in-place'.
> jam> + :param positions: A dict, such as the one returned by
> jam> + _get_components_positions()
> jam> + :return: None
> jam> + """
> jam> + def get_index_memo(key):
> jam> + return positions[key][1]
>
> .. which makes the lack of it here a bit... surprising :)
>
>
> BB:approve
>
> Vincent
>
Well, in the second case we use it as-is, but I added this for you:
# index_memo is at offset [1]. It is made up of (GraphIndex,
# position, size). GI is an object, which will be unique for each
# pack file. This causes us to group by pack file, then sort by
# position. Size doesn't matter, but it isn't worth breaking up the
# tuple.
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkk5KHwACgkQJdeBCYSNAAMqMgCeKLYuEDIpxBiHBfOddawDh2bl
dKoAoLmgANHn8b46QHMUHyDux9fraT65
=0srW
-----END PGP SIGNATURE-----
More information about the bazaar
mailing list