Rev 3819: (jam) break up RemoteTransport.readv() requests into smaller chunks. in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Mon Nov 3 19:50:32 GMT 2008
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 3819
revision-id: pqm at pqm.ubuntu.com-20081103195028-0cojtot2le6ph52m
parent: pqm at pqm.ubuntu.com-20081103045341-70776n3uqq0b5dmb
parent: john at arbash-meinel.com-20081103190723-bi0071ibl3i1ir9q
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Mon 2008-11-03 19:50:28 +0000
message:
(jam) break up RemoteTransport.readv() requests into smaller chunks.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
------------------------------------------------------------
revno: 3805.4.10
revision-id: john at arbash-meinel.com-20081103190723-bi0071ibl3i1ir9q
parent: john at arbash-meinel.com-20081103190506-pdy5po5fj483yqz0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Mon 2008-11-03 13:07:23 -0600
message:
NEWS entry
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
------------------------------------------------------------
revno: 3805.4.9
revision-id: john at arbash-meinel.com-20081103190506-pdy5po5fj483yqz0
parent: john at arbash-meinel.com-20081029193544-6vhuhlahdor8ebr3
parent: pqm at pqm.ubuntu.com-20081103045341-70776n3uqq0b5dmb
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Mon 2008-11-03 13:05:06 -0600
message:
Merge in bzr.dev, resolve conflict in RemoteTransport.readv()
added:
bzrlib/smart/packrepository.py packrepository.py-20080527041253-a16a8qp4vy8qh8y6-1
contrib/bzr_ssh_path_limiter bzr_ssh_path_limiter-20081030010544-xjhl0y2i6wyloz8q-1
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzr bzr.py-20050313053754-5485f144c7006fa6
bzrlib/__init__.py __init__.py-20050309040759-33e65acf91bbcd5d
bzrlib/branch.py branch.py-20050309040759-e4baf4e0d046576e
bzrlib/builtins.py builtins.py-20050830033751-fc01482b9ca23183
bzrlib/bundle/serializer/v4.py v10.py-20070611062757-5ggj7k18s9dej0fr-1
bzrlib/bzrdir.py bzrdir.py-20060131065624-156dfea39c4387cb
bzrlib/commit.py commit.py-20050511101309-79ec1a0168e0e825
bzrlib/errors.py errors.py-20050309040759-20512168c4e14fbd
bzrlib/graph.py graph_walker.py-20070525030359-y852guab65d4wtn0-1
bzrlib/help_topics/en/hooks.txt hooks.txt-20070830033044-xxu2rced13f72dka-1
bzrlib/mutabletree.py mutabletree.py-20060906023413-4wlkalbdpsxi2r4y-2
bzrlib/plugin.py plugin.py-20050622060424-829b654519533d69
bzrlib/remote.py remote.py-20060720103555-yeeg2x51vn0rbtdp-1
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
bzrlib/repository.py rev_storage.py-20051111201905-119e9401e46257e3
bzrlib/smart/request.py request.py-20061108095550-gunadhxmzkdjfeek-1
bzrlib/smart/vfs.py vfs.py-20061108095550-gunadhxmzkdjfeek-2
bzrlib/tests/blackbox/test_bound_branches.py test_bound_branches.py-20051109215527-2373188ad566c205
bzrlib/tests/blackbox/test_info.py test_info.py-20060215045507-bbdd2d34efab9e0a
bzrlib/tests/blackbox/test_nick.py test_nick.py-20061105141046-p7zovcsit44uj4w9-1
bzrlib/tests/blackbox/test_send.py test_bundle.py-20060616222707-c21c8b7ea5ef57b1
bzrlib/tests/blackbox/test_switch.py test_switch.py-20071122111948-0c5en6uz92bwl76h-1
bzrlib/tests/branch_implementations/test_stacking.py test_stacking.py-20080214020755-msjlkb7urobwly0f-1
bzrlib/tests/test_graph.py test_graph_walker.py-20070525030405-enq4r60hhi9xrujc-1
bzrlib/tests/test_info.py test_info.py-20070320150933-m0xxm1g7xi9v6noe-1
bzrlib/tests/test_pack_repository.py test_pack_repository-20080801043947-eaw0e6h2gu75kwmy-1
bzrlib/tests/test_remote.py test_remote.py-20060720103555-yeeg2x51vn0rbtdp-2
bzrlib/tests/test_repository.py test_repository.py-20060131075918-65c555b881612f4d
bzrlib/tests/test_smart.py test_smart.py-20061122024551-ol0l0o0oofsu9b3t-2
bzrlib/tests/test_smart_transport.py test_ssh_transport.py-20060608202016-c25gvf1ob7ypbus6-2
bzrlib/trace.py trace.py-20050309040759-c8ed824bdcd4748a
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
doc/developers/development-repo.txt developmentrepo.txt-20080102200205-raj42k61dch8pjmj-1
doc/developers/releasing.txt releasing.txt-20080502015919-fnrcav8fwy8ccibu-1
doc/en/user-guide/hooks.txt hooks.txt-20070829200551-7nr6e5a1io6x78uf-1
------------------------------------------------------------
revno: 3805.4.8
revision-id: john at arbash-meinel.com-20081029193544-6vhuhlahdor8ebr3
parent: john at arbash-meinel.com-20081029192753-g8yqvt6pgqefi0ir
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 14:35:44 -0500
message:
When starting a new list, we need to start it with the current entry
modified:
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
------------------------------------------------------------
revno: 3805.4.7
revision-id: john at arbash-meinel.com-20081029192753-g8yqvt6pgqefi0ir
parent: john at arbash-meinel.com-20081029192401-uaawypax2idzlxnl
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 14:27:53 -0500
message:
Fix a typo.
modified:
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
------------------------------------------------------------
revno: 3805.4.6
revision-id: john at arbash-meinel.com-20081029192401-uaawypax2idzlxnl
parent: john at arbash-meinel.com-20081029192344-inemp5hyyi61n39h
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 14:24:01 -0500
message:
refactor for clarity.
modified:
bzrlib/btree_index.py index.py-20080624222253-p0x5f92uyh5hw734-7
------------------------------------------------------------
revno: 3805.4.5
revision-id: john at arbash-meinel.com-20081029192344-inemp5hyyi61n39h
parent: john at arbash-meinel.com-20081029185313-ftm8kk89jhg452tk
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 14:23:44 -0500
message:
when splitting a readv, we need to start at the last offset,
we were accidentally calling .next() twice without yielding the result.
modified:
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
------------------------------------------------------------
revno: 3805.4.4
revision-id: john at arbash-meinel.com-20081029185313-ftm8kk89jhg452tk
parent: john at arbash-meinel.com-20081029185002-pq7pky4jat7gtcyx
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 13:53:13 -0500
message:
Change to a 5MB request chunk, and add a bit more debug logging.
modified:
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
------------------------------------------------------------
revno: 3805.4.3
revision-id: john at arbash-meinel.com-20081029185002-pq7pky4jat7gtcyx
parent: john at arbash-meinel.com-20081029184645-5kkxpg13cpyewc0w
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 13:50:02 -0500
message:
Reorganize the new code, add some debugging.
modified:
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
------------------------------------------------------------
revno: 3805.4.2
revision-id: john at arbash-meinel.com-20081029184645-5kkxpg13cpyewc0w
parent: john at arbash-meinel.com-20081029180237-0ny6gh1punn3dcck
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 13:46:45 -0500
message:
Revert the changes to _sort_expand_and_combine
It isn't used for .pack files. Nor is it used for btree indexes.
modified:
bzrlib/transport/__init__.py transport.py-20050711165921-4978aa7ce1285ad5
------------------------------------------------------------
revno: 3805.4.1
revision-id: john at arbash-meinel.com-20081029180237-0ny6gh1punn3dcck
parent: pqm at pqm.ubuntu.com-20081028202057-u3csau9zvf0hapya
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: remote_readv_sections
timestamp: Wed 2008-10-29 13:02:37 -0500
message:
Change the default readv() packing rules.
modified:
bzrlib/transport/__init__.py transport.py-20050711165921-4978aa7ce1285ad5
bzrlib/transport/remote.py ssh.py-20060608202016-c25gvf1ob7ypbus6-1
=== modified file 'NEWS'
--- a/NEWS 2008-11-03 04:12:11 +0000
+++ b/NEWS 2008-11-03 19:07:23 +0000
@@ -14,6 +14,11 @@
IMPROVEMENTS:
+ * When making a large readv() request over ``bzr+ssh``, break up the
+ request into more manageable chunks. Because the RPC is not yet able
+ to stream, this helps keep us from buffering too much information at
+ once. (John Arbash Meinel)
+
BUG FIXES:
DOCUMENTATION:
=== modified file 'bzrlib/btree_index.py'
--- a/bzrlib/btree_index.py 2008-10-28 19:39:57 +0000
+++ b/bzrlib/btree_index.py 2008-10-29 19:24:01 +0000
@@ -848,17 +848,21 @@
"""
return self._get_nodes(self._internal_node_cache, node_indexes)
- def _get_leaf_nodes(self, node_indexes):
- """Get a bunch of nodes, from cache or disk."""
- found = self._get_nodes(self._leaf_node_cache, node_indexes)
+ def _cache_leaf_values(self, nodes):
+ """Cache directly from key => value, skipping the btree."""
if self._leaf_value_cache is not None:
- for node in found.itervalues():
+ for node in nodes.itervalues():
for key, value in node.keys.iteritems():
if key in self._leaf_value_cache:
# Don't add the rest of the keys, we've seen this node
# before.
break
self._leaf_value_cache[key] = value
+
+ def _get_leaf_nodes(self, node_indexes):
+ """Get a bunch of nodes, from cache or disk."""
+ found = self._get_nodes(self._leaf_node_cache, node_indexes)
+ self._cache_leaf_values(found)
return found
def iter_all_entries(self):
=== modified file 'bzrlib/transport/remote.py'
--- a/bzrlib/transport/remote.py 2008-10-17 06:01:10 +0000
+++ b/bzrlib/transport/remote.py 2008-11-03 19:05:06 +0000
@@ -61,6 +61,9 @@
RemoteTCPTransport, etc.
"""
+ # When making a readv request, cap it at requesting 5MB of data
+ _max_readv_bytes = 5*1024*1024
+
# IMPORTANT FOR IMPLEMENTORS: RemoteTransport MUST NOT be given encoding
# responsibilities: Put those on SmartClient or similar. This is vital for
# the ability to support multiple versions of the smart protocol over time:
@@ -312,29 +315,56 @@
limit=self._max_readv_combine,
fudge_factor=self._bytes_to_read_before_seek))
- try:
- result = self._client.call_with_body_readv_array(
- ('readv', self._remote_path(relpath),),
- [(c.start, c.length) for c in coalesced])
- resp, response_handler = result
- except errors.ErrorFromSmartServer, err:
- self._translate_error(err)
-
- if resp[0] != 'readv':
- # This should raise an exception
- response_handler.cancel_read_body()
- raise errors.UnexpectedSmartServerResponse(resp)
-
- return self._handle_response(offsets, coalesced, response_handler)
-
- def _handle_response(self, offsets, coalesced, response_handler):
- # turn the list of offsets into a stack
+ # now that we've coallesced things, avoid making enormous requests
+ requests = []
+ cur_request = []
+ cur_len = 0
+ for c in coalesced:
+ if c.length + cur_len > self._max_readv_bytes:
+ requests.append(cur_request)
+ cur_request = [c]
+ cur_len = c.length
+ continue
+ cur_request.append(c)
+ cur_len += c.length
+ if cur_request:
+ requests.append(cur_request)
+ if 'hpss' in debug.debug_flags:
+ trace.mutter('%s.readv %s offsets => %s coalesced'
+ ' => %s requests (%s)',
+ self.__class__.__name__, len(offsets), len(coalesced),
+ len(requests), sum(map(len, requests)))
+ # Cache the results, but only until they have been fulfilled
+ data_map = {}
+ # turn the list of offsets into a single stack to iterate
offset_stack = iter(offsets)
- cur_offset_and_size = offset_stack.next()
+ # using a list so it can be modified when passing down and coming back
+ next_offset = [offset_stack.next()]
+ for cur_request in requests:
+ try:
+ result = self._client.call_with_body_readv_array(
+ ('readv', self._remote_path(relpath),),
+ [(c.start, c.length) for c in cur_request])
+ resp, response_handler = result
+ except errors.ErrorFromSmartServer, err:
+ self._translate_error(err)
+
+ if resp[0] != 'readv':
+ # This should raise an exception
+ response_handler.cancel_read_body()
+ raise errors.UnexpectedSmartServerResponse(resp)
+
+ for res in self._handle_response(offset_stack, cur_request,
+ response_handler,
+ data_map,
+ next_offset):
+ yield res
+
+ def _handle_response(self, offset_stack, coalesced, response_handler,
+ data_map, next_offset):
+ cur_offset_and_size = next_offset[0]
# FIXME: this should know how many bytes are needed, for clarity.
data = response_handler.read_body_bytes()
- # Cache the results, but only until they have been fulfilled
- data_map = {}
data_offset = 0
for c_offset in coalesced:
if len(data) < c_offset.length:
@@ -353,7 +383,7 @@
# not have a real string.
if key == cur_offset_and_size:
yield cur_offset_and_size[0], this_data
- cur_offset_and_size = offset_stack.next()
+ cur_offset_and_size = next_offset[0] = offset_stack.next()
else:
data_map[key] = this_data
data_offset += c_offset.length
@@ -362,7 +392,7 @@
while cur_offset_and_size in data_map:
this_data = data_map.pop(cur_offset_and_size)
yield cur_offset_and_size[0], this_data
- cur_offset_and_size = offset_stack.next()
+ cur_offset_and_size = next_offset[0] = offset_stack.next()
def rename(self, rel_from, rel_to):
self._call('rename',
More information about the bazaar-commits
mailing list