[MERGE] KVF.get_record_stream('unordered') uses I/O order

Fri Dec 5 13:11:24 GMT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Ladeuil wrote:
>>>>>> "jam" == John Arbash Meinel <john at arbash-meinel.com> writes:
> 
> <snip/>
>  
>     jam> +    def _sort_keys_by_io(self, keys, positions):
>     jam> +        """Figure out an optimal order to read the records for the given keys.
>     jam> +
>     jam> +        Sort keys, grouped by index and sorted by position.
>     jam> +
>     jam> +        :param keys: A list of keys whose records we want to read. This will be
>     jam> +            sorted 'in-place'.
>     jam> +        :param positions: A dict, such as the one returned by
>     jam> +            _get_components_positions()
>     jam> +        :return: None
>     jam> +        """
>     jam> +        def get_index_memo(key):
>     jam> +            index_memo = positions[key][1]
>     jam> +            # Group by prefix and position. index_memo[0] is the key, so it is
>     jam> +            # (file_id, revision_id) and we don't want to sort on revision_id,
>     jam> +            # index_memo[1] is the position, and index_memo[2] is the size,
>     jam> +            # which doesn't matter for the sort
>     jam> +            return index_memo[0][:-1], index_memo[1]
>     jam> +        return keys.sort(key=get_index_memo)
>     jam> +
> 
> What a nice comment...
> 
>     jam>      def _split_key(self, key):
>     jam>          """Split key into a prefix and suffix."""
>     jam>          return key[:-1], key[-1]
>     jam> @@ -2380,6 +2407,21 @@
>     jam>          bits = node[2][1:].split(' ')
>     jam>          return node[0], int(bits[0]), int(bits[1])
>  
>     jam> +    def _sort_keys_by_io(self, keys, positions):
>     jam> +        """Figure out an optimal order to read the records for the given keys.
>     jam> +
>     jam> +        Sort keys, grouped by index and sorted by position.
>     jam> +
>     jam> +        :param keys: A list of keys whose records we want to read. This will be
>     jam> +            sorted 'in-place'.
>     jam> +        :param positions: A dict, such as the one returned by
>     jam> +            _get_components_positions()
>     jam> +        :return: None
>     jam> +        """
>     jam> +        def get_index_memo(key):
>     jam> +            return positions[key][1]
> 
> .. which makes the lack of it here a bit... surprising :)
> 
> 
> BB:approve
> 
>         Vincent
> 

Well, in the second case we use the index_memo as-is, but I added this comment for you:
# index_memo is at offset [1]. It is made up of (GraphIndex,
# position, size). GI is an object, which will be unique for each
# pack file. This causes us to group by pack file, then sort by
# position. Size doesn't matter, but it isn't worth breaking up the
# tuple.
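
For anyone reading along, here is a minimal standalone sketch of the two
sort keys discussed above. It is not the bzrlib code itself: the function
names sort_keys_by_prefix_io and sort_keys_by_pack_io, the sample keys, and
the use of plain strings in place of GraphIndex objects are all assumptions
made for illustration. The only thing taken from the patch is that
positions[key][1] holds the index_memo. (Strings stand in for the index
objects because Python 3 will not order arbitrary objects inside a tuple.)

```python
def sort_keys_by_prefix_io(keys, positions):
    """Sort keys in place, grouped by key prefix, then by byte position."""
    def get_index_memo(key):
        index_memo = positions[key][1]
        # Here index_memo is (key, position, size). Drop the last element of
        # the key (the revision id) so only the prefix and the position
        # decide the order; size is ignored.
        return index_memo[0][:-1], index_memo[1]
    keys.sort(key=get_index_memo)


def sort_keys_by_pack_io(keys, positions):
    """Sort keys in place, grouped by pack index, then by byte position."""
    def get_index_memo(key):
        # Here index_memo is (index, position, size) and is used as-is: the
        # first element groups by pack, the second orders within the pack.
        return positions[key][1]
    keys.sort(key=get_index_memo)


# Hypothetical sample data: the first tuple slot is a placeholder, the
# second slot is the index_memo.
prefix_positions = {
    ("file-a", "rev-1"): (None, (("file-a", "rev-1"), 300, 10)),
    ("file-b", "rev-1"): (None, (("file-b", "rev-1"), 0, 10)),
    ("file-a", "rev-2"): (None, (("file-a", "rev-2"), 100, 10)),
}
pack_positions = {
    ("rev-1",): (None, ("pack-b", 50, 10)),
    ("rev-2",): (None, ("pack-a", 200, 10)),
    ("rev-3",): (None, ("pack-a", 10, 10)),
}

prefix_keys = list(prefix_positions)
sort_keys_by_prefix_io(prefix_keys, prefix_positions)

pack_keys = list(pack_positions)
sort_keys_by_pack_io(pack_keys, pack_positions)
```

In both cases the list is reordered in place and nothing is returned, so
the caller keeps using the same list object afterwards.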

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkk5KHwACgkQJdeBCYSNAAMqMgCeKLYuEDIpxBiHBfOddawDh2bl
dKoAoLmgANHn8b46QHMUHyDux9fraT65
=0srW
-----END PGP SIGNATURE-----