Rev 4072: (mbp) hpss streaming design docs in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Tue Mar 3 04:24:14 GMT 2009

At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 4072
revision-id: pqm at pqm.ubuntu.com-20090303042409-qa96pox029nf2zus
parent: pqm at pqm.ubuntu.com-20090303034049-faaink61hujui1sy
parent: mbp at sourcefrog.net-20090119101414-tj4r8rhmnzofs2lz
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2009-03-03 04:24:09 +0000
message:
  (mbp) hpss streaming design docs
modified:
  doc/developers/network-protocol.txt networkprotocol.txt-20070903044232-woustorrjbmg5zol-1
    ------------------------------------------------------------
    revno: 3944.1.1
    revision-id: mbp at sourcefrog.net-20090119101414-tj4r8rhmnzofs2lz
    parent: pqm at pqm.ubuntu.com-20090119030630-3xdyyi4xj69md8e4
    committer: Martin Pool <mbp at sourcefrog.net>
    branch nick: doc-hpss
    timestamp: Mon 2009-01-19 21:14:14 +1100
    message:
      Notes with Andrew about hpss streaming
    modified:
      doc/developers/network-protocol.txt networkprotocol.txt-20070903044232-woustorrjbmg5zol-1
=== modified file 'doc/developers/network-protocol.txt'

--- a/doc/developers/network-protocol.txt	2008-05-16 07:15:57 +0000
+++ b/doc/developers/network-protocol.txt	2009-01-19 10:14:14 +0000
@@ -2,7 +2,7 @@
 Network Protocol
 ================
 
-:Date: 2007-09-03
+:Date: 2009-01-07
 
 
 .. contents::
@@ -221,19 +221,24 @@
 
 The underlying message format is::
 
-  MESSAGE := "bzr message 3 (bzr 1.6)" NEWLINE HEADERS MESSAGE_PARTS
+  MESSAGE := MAGIC NEWLINE HEADERS CONTENTS END_MESSAGE
+  MAGIC := "bzr message 3 (bzr 1.6)"
   HEADERS := LENGTH_PREFIX bencoded_dict
-  MESSAGE_PARTS := MESSAGE_PART [MORE_MESSAGE_PARTS]
-  MORE_MESSAGE_PARTS := END_MESSAGE_PARTS | MESSAGE_PARTS
-  END_MESSAGE_PARTS := "e"
+  END_MESSAGE := "e"
 
+  CONTENTS := MESSAGE_PART+
   MESSAGE_PART := ONE_BYTE | STRUCTURE | BYTES
   ONE_BYTE := "o" byte
   STRUCTURE := "s" LENGTH_PREFIX bencoded_structure
   BYTES := "b" LENGTH_PREFIX bytes
 
+(Where ``+`` indicates one or more.)
+
 This format allows an arbitrary sequence of message parts to be encoded
-in a single message.
+in a single message.  The CONTENTS of a MESSAGE carry a higher-level
+structure, but this framing alone is enough to deserialize and consume a
+message, so implementations can still respond to messages sent by later
+versions.
 
 Headers
 ~~~~~~~
@@ -254,36 +259,54 @@
 describes how such messages are encoded.  All requests and responses
 defined by earlier protocol versions must be encoded in this way.
 
-Conventional requests will send a sequence of:
-
-* Arguments (a STRUCTURE of a tuple)
-
-* (Optional) body
-
-  * Single body (BYTES), or
-
-  * Streamed body (multiple BYTES parts), followed by a status (ONE_BYTE)
-
-    * if status is "E", followed by an Error (STRUCTURE)
-
-Conventional responses will send a sequence of:
-
-* Status (ONE_BYTE)
-
-* Arguments (a STRUCTURE of a tuple)
-
-* (Optional) body
-
-  * Single body (BYTES), or
-
-  * Streamed body (multiple BYTES parts), followed by a status (ONE_BYTE)
-
-    * if status is "E", followed by an Error (STRUCTURE)
-
-In all cases, the ONE_BYTE status is either "S" for Success or "E" for
-Error.  Note that the streamed body from version two is now just multiple
+Conventional requests will send CONTENTS of (where ``?`` indicates optional)::
+
+  CONV_REQ := ARGS SINGLE_OR_STREAMED_BODY?
+  SINGLE_OR_STREAMED_BODY := BYTES 
+        | BYTES+ TRAILER
+         
+  ARGS := STRUCTURE(argument_tuple) 
+  TRAILER := SUCCESS_STATUS | ERROR
+  SUCCESS_STATUS := ONE_BYTE("S")
+  ERROR := ONE_BYTE("E") STRUCTURE(argument_tuple)
+
+Conventional responses will send CONTENTS of ::
+
+  CONV_RESP := RESP_STATUS ARGS SINGLE_OR_STREAMED_BODY?
+  RESP_STATUS := ONE_BYTE("S") | ONE_BYTE("E")
+
+If the RESP_STATUS is success ("S"), the arguments are the
+method-dependent result.  
+
+For errors (where the Status byte of a response or a streamed body is
+"E"), the situation is analogous to requests.  The first item in the
+encoded sequence must be a string of the error name.  The other arguments
+supply details about the error, and their number and types will depend on
+the type of error (as identified by the error name).
+
+Note that the streamed body from version two is now just multiple
 BYTES parts.
 
+The end of the request or response is indicated by the lower-level 
+END_MESSAGE.  If there's only one BYTES element in the body, the TRAILER
+may or may not be present, depending on whether it was sent as a single
+chunk or as a stream that happens to have one element.
+
+  *(Discussion)* The success marker at the end of a streamed body seems
+  redundant; it doesn't have space for any arguments, and the end of the
+  body is marked anyhow by the end of the message.  Recipients shouldn't
+  take any action on it, though they should map an error into raising an
+  error locally.
+
+  1.10 clients don't assert that they get a status byte at the end of the
+  message.  They will complain (in
+  ``ConventionalResponseHandler.byte_part_received``) if they get an
+  initial success and then another byte part with no intervening bytes.
+  If we stop sending the final success message and only flag errors
+  they'll only get one if the error is detected after streaming starts but
+  before any bytes are actually sent.  Possibly we should wait until at 
+  least the first chunk is ready before declaring success.
+
 For new methods, these sequences are just a convention and may be varied
 if appropriate for a particular request or response.  However, each
 request should at least start with a STRUCTURE encoding the arguments
@@ -292,11 +315,105 @@
 bencoded.  As a result, unlike previous protocol versions, arguments in
 this version are 8-bit clean.)
 
-For errors (where the Status byte of a response or a streamed body is
-"E"), the situation is analagous to requests.  The first item in the
-encoded sequence must be a string of the error name.  The other arguments
-supply details about the error, and their number and types will depend on
-the type of error (as identified by the error name).
+  *(Discussion)* We're discussing having the byte segments be not just a
+  method for sending a stream across the network, but actually having them
+  be preserved in the rpc from end to end.  This may be useful when
+  there's an iterator on one side feeding in to an iterator on the other,
+  if it avoids doing chunking and byte-counting at two levels, and if
+  those iterators are a natural place to get good granularity.  Also, for 
+  cases like ``insert_record_stream`` the server can't do much with the
+  data until it gets a whole chunk, and so it'll be natural and efficient
+  for it to be called with one chunk at a time.
+
+  On the other hand, there may be times when we've got some bytes from the 
+  network but not a full chunk, and it might be worthwhile to pass it up.
+  If we promise to preserve chunks, then to do this we'd need two separate
+  streaming interfaces: "we got a chunk" and "we got some bytes but not
+  yet a full chunk".  For ``insert_record_stream`` the second might not be
+  useful, but it might be good when writing to a file where any number of
+  bytes can be processed.
+
+  If we promise to preserve chunks, it'll tend to make some RPCs work only
+  in chunks, and others just on whole blocks, and we can't so easily
+  migrate RPCs from one to the other transparently to older
+  implementations.
+
+  The data inside those chunks will be serialized anyhow, and possibly it
+  can already be split apart without relying on the chunk
+  boundaries.  Also, we might want to use these formats e.g.
+  for pack files or in bundles, and so they don't particularly need
+  lower-level chunking.  So the current (unmerged, unstable) record stream
+  serialization turns each record into a bencoded tuple and it'd be
+  feasible to parse one tuple at a time from a byte stream that contains a
+  sequence of them.
+
+  So we've decided that the chunks won't be semantic, and code should not
+  count on them being preserved from client to server.
+
+Early error returns
+~~~~~~~~~~~~~~~~~~~
+
+  *(Discussion)* It would be nice if the server could notify the client of
+  errors even before a streaming request has finished.  This could cover
+  situations such as the server not understanding the request, it being
+  unable to open the requested location, or it finding that some of the
+  revisions being sent are not actually needed.
+
+  Especially in the last case, we'd like to be able to gracefully notice
+  the condition while the client is writing, and then have it adapt its
+  behaviour.  In any case, we don't want to have to drop and restart the
+  network stream.
+
+  It should be possible for the client to finish its current chunk and
+  then its message, possibly with an error to cancel what's already been
+  sent.
+
+  This relies on the client being able to read back from the server while
+  it's writing.  This is technically difficult for http but feasible over
+  a socket or ssh.
+
+  We'd need a clean way to pass this back to the request method, even
+  though it's presumably in the middle of doing its body iterator.
+  Possibly the body iterator could be manually given a reference to the
+  request object, and it can poll it to see if there's a response.
+
+  Perhaps we need to distinguish error conditions, which should turn into
+  a client-side error regardless of the request code, from early success,
+  which should be handled only if the request code specifically wants to
+  do it.
+
+Full-duplex operation
+~~~~~~~~~~~~~~~~~~~~~
+
+  The code is not geared to do pipelined requests, and this might require
+  doing asynchrony within bzrlib.  We might want to go fully pipelined
+  and asynchronous, but there might be a profitable middle ground.
+
+  The particular case where duplex communication would be good is in
+  working towards the common points in the graphs between the client and
+  server: we want to send speculatively, but detect as soon as they've
+  matched up.
+
+  So we could for instance have a synchronous core, but rely on the OS
+  network buffering to allow us to work on batches of say 64kB.  We can
+  also pipeline requests and responses, without allowing for them
+  happening out of order, or mixed requests happening at the same time.
+
+  Wonder how our network performance would have turned out now if we'd
+  done full-duplex from the start, and ignored hpss over http.  We have
+  pretty good (readonly) http support just over dumb http, and that may be
+  better for many users.
+
+
+
+APIs
+====
+
+On the client, the bzrlib code is "in charge": when it makes a request, or
+asks for data from the network, that causes network IO.  The server is
+event driven: the network code tells the response handler when data has
+been received, and it takes back a Response object from the request
+handler that is then polled for body stream data.
 
 Paths
 =====
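
As a rough illustration of the revised MESSAGE grammar in this patch (and not part of the commit itself), the following Python sketch frames a conventional streamed response and then walks its parts using only the low-level framing, without interpreting the higher-level contents. It assumes LENGTH_PREFIX is a 4-byte big-endian unsigned integer, which the text above does not spell out. The helper names (``bencode``, ``length_prefixed``, ``encode_message``, ``split_message``) and the ``example`` header key are hypothetical and are not taken from bzrlib.

```python
import struct

MAGIC = b"bzr message 3 (bzr 1.6)"


def bencode(value):
    """Minimal bencoder for ints, bytes, lists/tuples and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % (value,))


def length_prefixed(payload):
    # Assumption: LENGTH_PREFIX is a 4-byte big-endian unsigned length.
    return struct.pack("!L", len(payload)) + payload


def encode_message(headers, parts):
    """Frame MAGIC NEWLINE HEADERS MESSAGE_PART+ END_MESSAGE.

    parts is a list of ("o", one_byte), ("s", structure) or ("b", bytes).
    """
    out = [MAGIC, b"\n", length_prefixed(bencode(headers))]
    for kind, value in parts:
        if kind == "o":
            out.append(b"o" + value)
        elif kind == "s":
            out.append(b"s" + length_prefixed(bencode(value)))
        elif kind == "b":
            out.append(b"b" + length_prefixed(value))
        else:
            raise ValueError("unknown part kind %r" % (kind,))
    out.append(b"e")  # END_MESSAGE
    return b"".join(out)


def split_message(data):
    """Return (raw_headers, parts); STRUCTURE payloads stay bencoded."""
    prefix = MAGIC + b"\n"
    if not data.startswith(prefix):
        raise ValueError("bad magic")

    def read_prefixed(pos):
        (length,) = struct.unpack_from("!L", data, pos)
        start = pos + 4
        return data[start:start + length], start + length

    raw_headers, pos = read_prefixed(len(prefix))
    parts = []
    while True:
        kind = data[pos:pos + 1]
        pos += 1
        if kind == b"e":
            return raw_headers, parts
        if kind == b"o":
            parts.append(("o", data[pos:pos + 1]))
            pos += 1
        elif kind in (b"s", b"b"):
            payload, pos = read_prefixed(pos)
            parts.append((kind.decode("ascii"), payload))
        else:
            raise ValueError("unknown or missing part kind %r" % (kind,))


# CONV_RESP with a streamed body: RESP_STATUS ARGS BYTES+ TRAILER
message = encode_message(
    {b"example": b"1"},
    [("o", b"S"), ("s", (b"ok",)), ("b", b"chunk one"),
     ("b", b"chunk two"), ("o", b"S")],
)
raw_headers, parts = split_message(message)
```

Because ``split_message`` never looks inside a STRUCTURE or BYTES payload, a receiver written this way could skip over a message whose contents it does not understand, which is the property the message-format paragraph in the patch relies on.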