API work
roger peppe
rogpeppe at gmail.com
Wed Nov 6 16:31:36 UTC 2013
For a little further exposition of the issue, here's a branch
that implements the API server side of charm streaming
as I'm suggesting. It seems reasonably simple to me.
https://codereview.appspot.com/22100045/
To complete this, we'd need an implementation of State.PutCharmBundle
that streams the bytes into Mongo (probably using mgo.GridFile),
and a corresponding GET request to enable agents to get the charm
bundle from the state.
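To give a rough idea, here's a sketch of how that PutCharmBundle
implementation could stream from an io.Reader straight into GridFS.
This is not code from the branch; the standalone signature, the
"charmfs" prefix and the use of the charm URL string as the file name
are just illustrative assumptions.

    package state

    import (
        "io"

        "labix.org/v2/mgo"
    )

    // putCharmBundle copies a charm bundle from r into MongoDB via
    // GridFS without ever holding the whole bundle in memory.
    func putCharmBundle(db *mgo.Database, curl string, r io.Reader) error {
        f, err := db.GridFS("charmfs").Create(curl)
        if err != nil {
            return err
        }
        if _, err := io.Copy(f, r); err != nil {
            f.Close()
            return err
        }
        // Close flushes the final chunk and writes the file document.
        return f.Close()
    }

The GET side for agents would be the mirror image: open the GridFS
file by name and io.Copy it to the HTTP response writer.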
On 6 November 2013 15:27, roger peppe <rogpeppe at gmail.com> wrote:
> On 6 November 2013 14:29, John Arbash Meinel <john at arbash-meinel.com> wrote:
>>>> I would be perfectly happy with PUT if we were already a RESTful
>>>> API, but it seems a bit strange to just tack that on, and it will be
>>>> one more special case that we run into when trying to debug,
>>>> etc. (logs will likely be different, anyone working in the code will
>>>> have to think about multiple paths, etc.)
>>>
>>> The reason is that if you've got a large charm (and we'll probably
>>> end up uploading tools through this mechanism at some point), PUT
>>> streams the bytes nicely. We *really* don't want a single RPC
>>> containing the entire charm as an arbitrarily large blob, so we'd
>>> have to add quite a bit more mechanism to "stream" the data over
>>> RPC. Even then you have to work out how big your data packets
>>> are, and you incur round-trip latency for as many packets as you
>>> send - this would make charm upload quite a bit slower.
>>>
>>> I suspect that the amount of work outlined above is actually quite
>>> a bit less than would need to be done to implement charm streaming
>>> uploads over the RPC interface.
>>>
>>
>> The chunked implementation in golang just uses io.Copy, which reads and
>> writes everything in 32kB chunks. We could just as easily do the same
>> thing, or make them 1MB chunks or whatever. We can just as easily
>> pipeline the RPC requests, which is what is being done with
>> Transfer-Encoding: chunked.
>
> I'm not sure I understand this.
>
> How about I explain what would be necessary to stream charms over
> the RPC interface?
>
> The sequence of RPC operations might look a little like this:
>
> -> UploadCharm {RequestId: 1, URL: "cs:precise/wordpress-28", SHA256: "abcb4464b3b3d3f3de"}
> <- {RequestId: 1, StreamId: 1234}
> -> WriteData {RequestId: 2, StreamId: 1234, Data: base64encodeddata}
> <- {RequestId: 2}
> -> WriteData {RequestId: 3, StreamId: 1234, Data: base64encodeddata}
> <- {RequestId: 3}
> ... repeat for as many data blocks as are in the charm.
> -> CloseStream {RequestId: 99, StreamId: 1234}
> <- {RequestId: 99}
>
> To do this, we'd need a new "stream" entity in the API,
> and we'd need to implement the above operations on it.
>
> If we wanted to pipeline, we'd have to be more sophisticated:
> we could include the offset of each data block in the RPC requests.
>
> A pipelined streaming operation might look like this:
>
> -> UploadCharm {RequestId: 1, URL: "cs:precise/wordpress-28", SHA256: "abcb4464b3b3d3f3de"}
> <- {RequestId: 1, StreamId: 1234}
> -> WriteData {RequestId: 2, StreamId: 1234, Offset: 0, Data: base64encodeddata}
> -> WriteData {RequestId: 3, StreamId: 1234, Offset: 65536, Data: base64encodeddata}
> <- {RequestId: 2}
> -> WriteData {RequestId: 4, StreamId: 1234, Offset: 131072, Data: base64encodeddata}
> <- {RequestId: 3}
> ... repeat for as many data blocks as are in the charm.
> -> CloseStream {RequestId: 99, StreamId: 1234}
> <- {RequestId: 99}
>
> This is eminently doable (and I've implemented this kind of thing in the past),
> but it is considerably more complex than just using TCP streaming
> as nature intended.
>
> And it's still not great - there are parameters that need tuning,
> but the best values depend on the actual connection in use.
> For example: how big do you make each data chunk (that's the "packet
> size" I mentioned above)? How many outstanding concurrent requests do
> you allow in flight?
>
> TCP already does sliding windows - it's a much better fit for streaming data.
> That's what it was designed for. The chunk size that io.Copy uses isn't
> that important, as each packet doesn't entail a round-trip.
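To make the contrast above concrete: with a plain HTTP PUT the server
side of an upload reduces to a single copy from the request body, and
TCP's flow control does the rest. A sketch (the handler name, the "url"
query parameter and the putBundle hook are illustrative assumptions,
not what the branch does):

    package apiserver

    import (
        "io"
        "net/http"
    )

    // charmsHandler accepts PUT requests and streams the request body
    // (the charm bundle) into state storage via putBundle - for
    // example a closure around the putCharmBundle sketch earlier in
    // this message. There is no chunking protocol and no per-chunk
    // round trip: the body is just an io.Reader.
    type charmsHandler struct {
        putBundle func(curl string, r io.Reader) error
    }

    func (h *charmsHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
        if req.Method != "PUT" {
            http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
            return
        }
        curl := req.URL.Query().Get("url")
        if curl == "" {
            http.Error(w, "missing charm URL", http.StatusBadRequest)
            return
        }
        // TCP's sliding window, not an RPC layer, decides how fast
        // the bytes arrive here.
        if err := h.putBundle(curl, req.Body); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        w.WriteHeader(http.StatusOK)
    }

The client is just as simple: http.NewRequest("PUT", uploadURL, bundleFile)
with an open *os.File as the body streams the file without buffering it
all in memory.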