[MERGE] Streaming-friendly container APIs

Andrew Bennetts andrew at canonical.com
Thu Oct 25 15:31:25 BST 2007


John Arbash Meinel wrote:
> John Arbash Meinel has voted tweak.
> Status is now: Conditionally approved
> Comment:
> I'm a little surprised to see 0 direct tests for ContainerSerialiser

It emerged out of tidying other code, rather than something I wrote from
scratch.  But it probably deserves direct tests.  I'll add some and resubmit.

> +    def _consume_until_byte(self, byte):
> +        """Take all bytes up to the given out of the buffer, and return 
> it.
> +
> +        If the specified byte is not found in the buffer, the buffer is
> +        unchanged and this returns None instead.
> +        """
> +        newline_pos = self._buffer.find('\n')
> +        if newline_pos != -1:
> +            line = self._buffer[:newline_pos]
> +            self._buffer = self._buffer[newline_pos+1:]
> +            return line
> +        else:
> +            return None
> 
> ^- This seems to always assume 'byte' is '\n'. (Sounds like you need a 
> test here)

Ah drat.  I'd actually fixed that already, but hadn't committed it.  I don't
need a separate _consume_until_byte method, just _consume_line.  Anyway, fixed.

> ...
> 
> +        # XXX: This causes a memory copy of bytes in size, but is 
> usually
> +        # faster than two write calls (12 vs 13 seconds to output a gig 
> of
> +        # 1k records.) - results may differ on significantly larger 
> records
> +        # like .iso's but as they should be rare in any case and thus 
> not
> +        # likely to be the common case. The biggest issue is causing 
> extreme
> +        # memory pressure in that case. One possibly improvement here 
> is to
> +        # check the size of the content before deciding to join here vs 
> call
> +        # write twice.
> +        return ''.join(byte_sections)
> 
> You could change the api to return an iterator of bytes, so that you 
> could do:
> 
> if len(bytes) > XXX:
>   return (''.join(byte_sections), bytes)
> else:
>   byte_sections.append(bytes)
>   return (''.join(byte_sections),)
> 
> And then the caller would do:
>   for section in self._serializer.bytes_record(...):
>       self.write_func(section)
> 
> I'm not sure that this is performance critical code, bit it is something
> to think about.

It is performance critical in that it affects the speed the speed we can emit
bundles.

That isn't a new comment (or new code), just slightly refactored code, so I'm
inclined not to worry much about this in this patch.

> I feel like you did fairly well a testing the bytes on the wire. I
> wonder about marking these tests somehow so that it is clear that they
> should not be changed as long as we want to maintain protocol
> compatibility.

Good point.  I'll update the docstrings for the test cases at least to make that
clearer.

> +    def test_readline(self):
> +        """Test using readline() as ContainerReader does.
> +
> +        This is always within a readv hunk, never across it.
> +        """
> +        transport = self.get_transport()
> +        transport.put_bytes('sample', '0\n2\n4\n')
> +        f = pack.ReadVFile(transport.readv('sample', [(0,2), (2,4)]))
> +        results = []
> +        results.append(f.readline())
> +        results.append(f.readline())
> +        results.append(f.readline())
> +        self.assertEqual(['0\n', '2\n', '4\n'], results)
> 
> ^- It seems a little odd to read all of the file with your readv. What
> if instead you did:
> 
> transport.put_bytes('sample', '0\n2\n4\n6\n')
> f = pack.ReadVFile(transport.readv('sample', [(0, 2), (4, 4)]))
> results = []
> results.append(f.readline())
> results.append(f.readline())
> results.append(f.readline())
> self.assertEqual(['0\n', '4\n', '6\n'], results)
> 
> I'm not sure why you are adding tests for ReadVFile when you aren't
> implementing ReadVFile. Though the tests themselves seem useful.

That is odd.  I'm sure I didn't touch this, and this branch is a simple branch
off bzr.dev with no merges.  Ah, it appears I must have accidentally copied and
pasted the existing tests.  I've removed the duplicated readv tests.  So, you
ignore that bit.

Thanks for the review!

-Andrew.




More information about the bazaar mailing list