[MERGE] Developer doc: container format

Vincent Ladeuil v.ladeuil+lp at free.fr
Tue Jun 5 14:48:00 BST 2007


>>>>> "aaron" == Aaron Bentley <aaron.bentley at utoronto.ca> writes:

    aaron> Andrew Bennetts wrote:
    >> +The format is:
    >> +
    >> +  * a **container lead-in**, "``bzr pack format 1\n``",
    >> +  * followed by one or more **records**.
    >> +
    >> +A record is:
    >> +
    >> +  * a 3 byte **kind marker**.
    >> +  * 0 or more bytes of record content, depending on the record type.

    aaron> To me, the layering seems a bit strange.  Because the name and size
    aaron> fields are defined as part of the record type, you need to understand
    aaron> all the record types used in the container in order to know where the
    aaron> record named "foo" begins and ends.

I agree with Aaron: if the lowest layer can't decode a record
knowing only its kind marker, it will not be able to detect data
corruption (other kinds of corruption can still be detected at
higher levels, but that's a separate issue).

Length-prefixed is good. TLV (type, length, value) is the basis of
most of the reliable formats I know of.
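
To make the TLV idea concrete, here is a rough sketch of a lowest
layer that knows nothing about record types. The lead-in and the
3 byte kind marker come from the proposed doc; the 4 byte big-endian
length field, the "END" kind and the function names are only my
assumptions for illustration, not part of the proposal.

```python
import io
import struct

LEAD_IN = b"bzr pack format 1\n"  # lead-in quoted from the proposed doc
END_KIND = b"END"                 # hypothetical kind for the end marker


def write_record(out, kind, payload):
    """Write one record: 3 byte kind, 4 byte big-endian length, payload."""
    if len(kind) != 3:
        raise ValueError("kind marker must be exactly 3 bytes")
    out.write(kind + struct.pack(">I", len(payload)) + payload)


def read_exact(inp, n):
    """Read exactly n bytes or fail; a short read means truncation."""
    data = inp.read(n)
    if len(data) != n:
        raise ValueError("truncated container")
    return data


def iter_records(inp):
    """Yield (kind, payload) without interpreting any kind except END."""
    if read_exact(inp, len(LEAD_IN)) != LEAD_IN:
        raise ValueError("bad lead-in")
    while True:
        kind = read_exact(inp, 3)
        if kind == END_KIND:
            return
        (size,) = struct.unpack(">I", read_exact(inp, 4))
        yield kind, read_exact(inp, size)
```

Because the length sits in the same place in every record, this
layer can skip or copy records whose kind it has never seen, and it
notices truncation without help from the higher layers.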

    aaron> When implementing this, I would expect the lowest
    aaron> layer not to understand any record types, other than
    aaron> knowing that every content record has a name and size.
    aaron> That layer would simply provide access to bytes and
    aaron> record type, and higher layers would worry about the
    aaron> meaning of said bytes.

    aaron> So I think it makes sense to define size and name as
    aaron> being part of every record, except the end marker.

I understand the constraint: you don't want to force the producer
to know the size of the record before writing it.

In that case, use the following workaround: define a
'continuation' record whose content is appended to the previously
read record, and make the size a mandatory part of every record
(except the end record).

That way, you will still be able to write a record when you reach
the maximum size of your buffer and just issue additional
continuation records to cover the full size.

It adds a bit of complexity for reading and writing records but
still allows streaming to occur without size limitation.
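
A rough sketch of the continuation idea, to show how little it
costs. Everything here is an assumption made for illustration: the
"CNT" and "END" kinds, the 4 byte big-endian size field and the
function names are mine, and the container lead-in is left out for
brevity.

```python
import io
import struct

CONT_KIND = b"CNT"  # hypothetical kind for a continuation record
END_KIND = b"END"   # hypothetical kind for the end marker


def _emit(out, kind, payload):
    # Every record carries its size: 3 byte kind, 4 byte length, payload.
    out.write(kind + struct.pack(">I", len(payload)) + payload)


def write_streamed(out, kind, chunks, max_size):
    """Write content of unknown total size as one record plus continuations."""
    buf = b""
    next_kind = kind
    for chunk in chunks:
        buf += chunk
        while len(buf) >= max_size:
            # Buffer is full: flush it, then switch to continuation records.
            _emit(out, next_kind, buf[:max_size])
            buf = buf[max_size:]
            next_kind = CONT_KIND
    if buf or next_kind == kind:
        # Flush the remainder (or an empty record if nothing was written).
        _emit(out, next_kind, buf)


def read_merged(inp):
    """Yield (kind, content), appending continuations to the previous record."""
    current = None
    while True:
        kind = inp.read(3)
        if kind == END_KIND:
            break
        header = inp.read(4)
        if len(kind) != 3 or len(header) != 4:
            raise ValueError("truncated container")
        (size,) = struct.unpack(">I", header)
        payload = inp.read(size)
        if len(payload) != size:
            raise ValueError("truncated record")
        if kind == CONT_KIND:
            if current is None:
                raise ValueError("continuation without a preceding record")
            current[1] += payload
        else:
            if current is not None:
                yield current[0], current[1]
            current = [kind, payload]
    if current is not None:
        yield current[0], current[1]
```

The producer never buffers more than max_size bytes plus one incoming
chunk, and a reader that does not care about the full content can
still step over each piece using its size alone.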

      Vincent


