Rev 3819: (jam) break up RemoteTransport.readv() requests into smaller chunks. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Mon Nov 3 19:50:32 GMT 2008


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 3819
revision-id: pqm at pqm.ubuntu.com-20081103195028-0cojtot2le6ph52m
parent: pqm at pqm.ubuntu.com-20081103045341-70776n3uqq0b5dmb
parent: john at arbash-meinel.com-20081103190723-bi0071ibl3i1ir9q
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Mon 2008-11-03 19:50:28 +0000
message:
  (jam) break up RemoteTransport.readv() requests into smaller chunks.
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
  bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
    ------------------------------------------------------------
    revno: 3805.4.10
    revision-id: john at arbash-meinel.com-20081103190723-bi0071ibl3i1ir9q
    parent: john at arbash-meinel.com-20081103190506-pdy5po5fj483yqz0
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Mon 2008-11-03 13:07:23 -0600
    message:
      NEWS entry
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
    ------------------------------------------------------------
    revno: 3805.4.9
    revision-id: john at arbash-meinel.com-20081103190506-pdy5po5fj483yqz0
    parent: john at arbash-meinel.com-20081029193544-6vhuhlahdor8ebr3
    parent: pqm at pqm.ubuntu.com-20081103045341-70776n3uqq0b5dmb
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Mon 2008-11-03 13:05:06 -0600
    message:
      Merge in bzr.dev, resolve conflict in RemoteTransport.readv()
    added:
      bzrlib/smart/packrepository.py packrepository.py-20080527041253-a16a8qp4vy8qh8y6-1
      contrib/bzr_ssh_path_limiter   bzr_ssh_path_limiter-20081030010544-xjhl0y2i6wyloz8q-1
    modified:
      NEWS                           NEWS-20050323055033-4e00b5db738777ff
      bzr                            bzr.py-20050313053754-5485f144c7006fa6
      bzrlib/__init__.py             __init__.py-20050309040759-33e65acf91bbcd5d
      bzrlib/branch.py               branch.py-20050309040759-e4baf4e0d046576e
      bzrlib/builtins.py             builtins.py-20050830033751-fc01482b9ca23183
      bzrlib/bundle/serializer/v4.py v10.py-20070611062757-5ggj7k18s9dej0fr-1
      bzrlib/bzrdir.py               bzrdir.py-20060131065624-156dfea39c4387cb
      bzrlib/commit.py               commit.py-20050511101309-79ec1a0168e0e825
      bzrlib/errors.py               errors.py-20050309040759-20512168c4e14fbd
      bzrlib/graph.py                graph_walker.py-20070525030359-y852guab65d4wtn0-1
      bzrlib/help_topics/en/hooks.txt hooks.txt-20070830033044-xxu2rced13f72dka-1
      bzrlib/mutabletree.py          mutabletree.py-20060906023413-4wlkalbdpsxi2r4y-2
      bzrlib/plugin.py               plugin.py-20050622060424-829b654519533d69
      bzrlib/remote.py               remote.py-20060720103555-yeeg2x51vn0rbtdp-1
      bzrlib/repofmt/pack_repo.py    pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
      bzrlib/repository.py           rev_storage.py-20051111201905-119e9401e46257e3
      bzrlib/smart/request.py        request.py-20061108095550-gunadhxmzkdjfeek-1
      bzrlib/smart/vfs.py            vfs.py-20061108095550-gunadhxmzkdjfeek-2
      bzrlib/tests/blackbox/test_bound_branches.py test_bound_branches.py-20051109215527-2373188ad566c205
      bzrlib/tests/blackbox/test_info.py test_info.py-20060215045507-bbdd2d34efab9e0a
      bzrlib/tests/blackbox/test_nick.py test_nick.py-20061105141046-p7zovcsit44uj4w9-1
      bzrlib/tests/blackbox/test_send.py test_bundle.py-20060616222707-c21c8b7ea5ef57b1
      bzrlib/tests/blackbox/test_switch.py test_switch.py-20071122111948-0c5en6uz92bwl76h-1
      bzrlib/tests/branch_implementations/test_stacking.py test_stacking.py-20080214020755-msjlkb7urobwly0f-1
      bzrlib/tests/test_graph.py     test_graph_walker.py-20070525030405-enq4r60hhi9xrujc-1
      bzrlib/tests/test_info.py      test_info.py-20070320150933-m0xxm1g7xi9v6noe-1
      bzrlib/tests/test_pack_repository.py test_pack_repository-20080801043947-eaw0e6h2gu75kwmy-1
      bzrlib/tests/test_remote.py    test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
      bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
      bzrlib/tests/test_smart.py     test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
      bzrlib/tests/test_smart_transport.py test_ssh_transport.py-20060608202016-c25gvf1ob7ypbus6-2
      bzrlib/trace.py                trace.py-20050309040759-c8ed824bdcd4748a
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
      doc/developers/development-repo.txt developmentrepo.txt-20080102200205-raj42k61dch8pjmj-1
      doc/developers/releasing.txt   releasing.txt-20080502015919-fnrcav8fwy8ccibu-1
      doc/en/user-guide/hooks.txt    hooks.txt-20070829200551-7nr6e5a1io6x78uf-1
    ------------------------------------------------------------
    revno: 3805.4.8
    revision-id: john at arbash-meinel.com-20081029193544-6vhuhlahdor8ebr3
    parent: john at arbash-meinel.com-20081029192753-g8yqvt6pgqefi0ir
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 14:35:44 -0500
    message:
      When starting a new list, we need to start it with the current entry
    modified:
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
    ------------------------------------------------------------
    revno: 3805.4.7
    revision-id: john at arbash-meinel.com-20081029192753-g8yqvt6pgqefi0ir
    parent: john at arbash-meinel.com-20081029192401-uaawypax2idzlxnl
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 14:27:53 -0500
    message:
      Fix a typo.
    modified:
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
    ------------------------------------------------------------
    revno: 3805.4.6
    revision-id: john at arbash-meinel.com-20081029192401-uaawypax2idzlxnl
    parent: john at arbash-meinel.com-20081029192344-inemp5hyyi61n39h
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 14:24:01 -0500
    message:
      refactor for clarity.
    modified:
      bzrlib/btree_index.py          index.py-20080624222253-p0x5f92uyh5hw734-7
    ------------------------------------------------------------
    revno: 3805.4.5
    revision-id: john at arbash-meinel.com-20081029192344-inemp5hyyi61n39h
    parent: john at arbash-meinel.com-20081029185313-ftm8kk89jhg452tk
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 14:23:44 -0500
    message:
      when splitting a readv, we need to start at the last offset,
      we were accidentally calling .next() twice without yielding the result.
    modified:
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
    ------------------------------------------------------------
    revno: 3805.4.4
    revision-id: john at arbash-meinel.com-20081029185313-ftm8kk89jhg452tk
    parent: john at arbash-meinel.com-20081029185002-pq7pky4jat7gtcyx
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 13:53:13 -0500
    message:
      Change to a 5MB request chunk, and add a bit more debug logging.
    modified:
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
    ------------------------------------------------------------
    revno: 3805.4.3
    revision-id: john at arbash-meinel.com-20081029185002-pq7pky4jat7gtcyx
    parent: john at arbash-meinel.com-20081029184645-5kkxpg13cpyewc0w
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 13:50:02 -0500
    message:
      Reorganize the new code, add some debugging.
    modified:
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
    ------------------------------------------------------------
    revno: 3805.4.2
    revision-id: john at arbash-meinel.com-20081029184645-5kkxpg13cpyewc0w
    parent: john at arbash-meinel.com-20081029180237-0ny6gh1punn3dcck
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 13:46:45 -0500
    message:
      Revert the changes to _sort_expand_and_combine
      It isn't used for .pack files. Nor is it used for btree indexes.
    modified:
      bzrlib/transport/__init__.py   transport.py-20050711165921-4978aa7ce1285ad5
    ------------------------------------------------------------
    revno: 3805.4.1
    revision-id: john at arbash-meinel.com-20081029180237-0ny6gh1punn3dcck
    parent: pqm at pqm.ubuntu.com-20081028202057-u3csau9zvf0hapya
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: remote_readv_sections
    timestamp: Wed 2008-10-29 13:02:37 -0500
    message:
      Change the default readv() packing rules.
    modified:
      bzrlib/transport/__init__.py   transport.py-20050711165921-4978aa7ce1285ad5
      bzrlib/transport/remote.py     ssh.py-20060608202016-c25gvf1ob7ypbus6-1
=== modified file 'NEWS'
--- a/NEWS	2008-11-03 04:12:11 +0000
+++ b/NEWS	2008-11-03 19:07:23 +0000
@@ -14,6 +14,11 @@
 
   IMPROVEMENTS:
 
+    * When making a large readv() request over ``bzr+ssh``, break up the
+      request into more manageable chunks. Because the RPC is not yet able
+      to stream, this helps keep us from buffering too much information at
+      once. (John Arbash Meinel)
+
   BUG FIXES:
 
   DOCUMENTATION:

=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py	2008-10-28 19:39:57 +0000
+++ b/bzrlib/btree_index.py	2008-10-29 19:24:01 +0000
@@ -848,17 +848,21 @@
         """
         return self._get_nodes(self._internal_node_cache, node_indexes)
 
-    def _get_leaf_nodes(self, node_indexes):
-        """Get a bunch of nodes, from cache or disk."""
-        found = self._get_nodes(self._leaf_node_cache, node_indexes)
+    def _cache_leaf_values(self, nodes):
+        """Cache directly from key => value, skipping the btree."""
         if self._leaf_value_cache is not None:
-            for node in found.itervalues():
+            for node in nodes.itervalues():
                 for key, value in node.keys.iteritems():
                     if key in self._leaf_value_cache:
                         # Don't add the rest of the keys, we've seen this node
                         # before.
                         break
                     self._leaf_value_cache[key] = value
+
+    def _get_leaf_nodes(self, node_indexes):
+        """Get a bunch of nodes, from cache or disk."""
+        found = self._get_nodes(self._leaf_node_cache, node_indexes)
+        self._cache_leaf_values(found)
         return found
 
     def iter_all_entries(self):

=== modified file 'bzrlib/transport/remote.py'
--- a/bzrlib/transport/remote.py	2008-10-17 06:01:10 +0000
+++ b/bzrlib/transport/remote.py	2008-11-03 19:05:06 +0000
@@ -61,6 +61,9 @@
     RemoteTCPTransport, etc.
     """
 
+    # When making a readv request, cap it at requesting 5MB of data
+    _max_readv_bytes = 5*1024*1024
+
     # IMPORTANT FOR IMPLEMENTORS: RemoteTransport MUST NOT be given encoding
     # responsibilities: Put those on SmartClient or similar. This is vital for
     # the ability to support multiple versions of the smart protocol over time:
@@ -312,29 +315,56 @@
                                limit=self._max_readv_combine,
                                fudge_factor=self._bytes_to_read_before_seek))
 
-        try:
-            result = self._client.call_with_body_readv_array(
-                ('readv', self._remote_path(relpath),),
-                [(c.start, c.length) for c in coalesced])
-            resp, response_handler = result
-        except errors.ErrorFromSmartServer, err:
-            self._translate_error(err)
-
-        if resp[0] != 'readv':
-            # This should raise an exception
-            response_handler.cancel_read_body()
-            raise errors.UnexpectedSmartServerResponse(resp)
-
-        return self._handle_response(offsets, coalesced, response_handler)
-
-    def _handle_response(self, offsets, coalesced, response_handler):
-        # turn the list of offsets into a stack
+        # now that we've coallesced things, avoid making enormous requests
+        requests = []
+        cur_request = []
+        cur_len = 0
+        for c in coalesced:
+            if c.length + cur_len > self._max_readv_bytes:
+                requests.append(cur_request)
+                cur_request = [c]
+                cur_len = c.length
+                continue
+            cur_request.append(c)
+            cur_len += c.length
+        if cur_request:
+            requests.append(cur_request)
+        if 'hpss' in debug.debug_flags:
+            trace.mutter('%s.readv %s offsets => %s coalesced'
+                         ' => %s requests (%s)',
+                         self.__class__.__name__, len(offsets), len(coalesced),
+                         len(requests), sum(map(len, requests)))
+        # Cache the results, but only until they have been fulfilled
+        data_map = {}
+        # turn the list of offsets into a single stack to iterate
         offset_stack = iter(offsets)
-        cur_offset_and_size = offset_stack.next()
+        # using a list so it can be modified when passing down and coming back
+        next_offset = [offset_stack.next()]
+        for cur_request in requests:
+            try:
+                result = self._client.call_with_body_readv_array(
+                    ('readv', self._remote_path(relpath),),
+                    [(c.start, c.length) for c in cur_request])
+                resp, response_handler = result
+            except errors.ErrorFromSmartServer, err:
+                self._translate_error(err)
+
+            if resp[0] != 'readv':
+                # This should raise an exception
+                response_handler.cancel_read_body()
+                raise errors.UnexpectedSmartServerResponse(resp)
+
+            for res in self._handle_response(offset_stack, cur_request,
+                                             response_handler,
+                                             data_map,
+                                             next_offset):
+                yield res
+
+    def _handle_response(self, offset_stack, coalesced, response_handler,
+                         data_map, next_offset):
+        cur_offset_and_size = next_offset[0]
         # FIXME: this should know how many bytes are needed, for clarity.
         data = response_handler.read_body_bytes()
-        # Cache the results, but only until they have been fulfilled
-        data_map = {}
         data_offset = 0
         for c_offset in coalesced:
             if len(data) < c_offset.length:
@@ -353,7 +383,7 @@
                 #       not have a real string.
                 if key == cur_offset_and_size:
                     yield cur_offset_and_size[0], this_data
-                    cur_offset_and_size = offset_stack.next()
+                    cur_offset_and_size = next_offset[0] = offset_stack.next()
                 else:
                     data_map[key] = this_data
             data_offset += c_offset.length
@@ -362,7 +392,7 @@
             while cur_offset_and_size in data_map:
                 this_data = data_map.pop(cur_offset_and_size)
                 yield cur_offset_and_size[0], this_data
-                cur_offset_and_size = offset_stack.next()
+                cur_offset_and_size = next_offset[0] = offset_stack.next()
 
     def rename(self, rel_from, rel_to):
         self._call('rename',




More information about the bazaar-commits mailing list