Bundles and --2a

John Arbash Meinel john at arbash-meinel.com
Wed Jul 15 23:15:59 BST 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So I spent some time working on:
  https://bugs.edge.launchpad.net/bzr/+bug/393349

To discover that bundles and CHK are pretty much incompatible at the moment.

I believe you can insert a bundle generated from 1.9/1.9-rich-root into
a --2a repository. But any bundle generated from a --2a repository is
not valid.

The code makes the assumption that:
  repository.inventories.make_mpdiffs()

Will return data that is sufficient to fully reproduce the tree shape
information. (By a later apply_mpdiffs()/insert_mpdiffs() call.)

So one possibility would be to redefine the meaning of 'make_mpdiffs()'
for CHKInventoryRepository.inventories. Instead of getting the text
stream of the 'inventory' lines and mpdiffing them, it would instead use
CHKSerializer.write_inventory_to_string() and then mpdiff that set of texts.

(effectively, upcast all inventories to an XML representation, compute
the mpdiff of that, and then write that into the bundle. On the other
side, it would do the reverse operation and insert_mpdiffs() on
'repo.inventories' would then also be writing data to repo.chk_bytes.)
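
As a rough illustration of the "diff the serialized texts" idea, here is
a minimal sketch. It is not bzrlib's multiparent code: serialize(),
make_diff() and apply_diff() are hypothetical names, the inventory is
just a dict of file_id -> path, and there is only a single parent. The
point is only that the diff is computed over one canonical text form,
whatever the repository stores internally.

```python
import difflib


def serialize(inventory):
    """Hypothetical stand-in for a serializer: one sorted line per entry."""
    return ["%s %s\n" % (file_id, path)
            for file_id, path in sorted(inventory.items())]


def make_diff(parent_lines, child_lines):
    """Describe the child text as copy/new hunks against the parent text."""
    hunks = []
    matcher = difflib.SequenceMatcher(None, parent_lines, child_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Lines shared with the parent are referenced, not stored.
            hunks.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            hunks.append(("new", child_lines[j1:j2]))
        # "delete" needs no hunk: the parent lines are simply not copied.
    return hunks


def apply_diff(parent_lines, hunks):
    """Rebuild the child text from the parent text plus the hunks."""
    out = []
    for hunk in hunks:
        if hunk[0] == "copy":
            out.extend(parent_lines[hunk[1]:hunk[2]])
        else:
            out.extend(hunk[1])
    return out
```

The receiving side would still have to parse the rebuilt text back into
whatever form its repository stores, which is the extra insertion path
discussed next.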


The main benefit of something like this is that it makes it a bit more
generic, in the case that you end up crossing serialization boundaries.
However, it also means that we end up re-writing a lot of the
'insert_stream' code because we now have yet another way to insert data
into the repository.


At one point, I thought we could just have a Bundle appear as a
StreamSink object, and then have it serialize into bytes the data that
it is given. I think we could make this work, but it would be
StreamSource specific as to what data would be present.

Then I remembered that Robert & Andrew were trying to get a StreamSource
that could work between repository formats, and do so via
InventoryDeltas. Which might fit just right for a bundle format, as an
InventoryDelta is an abstraction that is supposed to apply to any
Inventory to create a new Inventory of the appropriate type.

The main argument against this is that for pre-2a formats, you have to
deserialize the parent inventory xml into an Inventory, apply the delta,
and then serialize that back into another xml before inserting it into
your repository. (As opposed to the current fast path which says "if
source.serializer == target.serializer, insert the bytes directly into
the repo".)
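
To make the cost of that slow path concrete, here is a small sketch of
the deserialize / apply / reserialize round trip. Everything in it is
an assumption for illustration: the XML layout is made up (it is not
bzr's inventory xml), the inventory is a plain dict of file_id -> path,
and a delta item is reduced to (old_path, new_path, file_id).

```python
import xml.etree.ElementTree as ET


def from_xml(text):
    """Parse the toy XML form into a dict of file_id -> path."""
    root = ET.fromstring(text)
    return {e.get("file_id"): e.get("path") for e in root.findall("entry")}


def to_xml(inventory):
    """Serialize a dict of file_id -> path into the toy XML form."""
    root = ET.Element("inventory")
    for file_id, path in sorted(inventory.items()):
        ET.SubElement(root, "entry", file_id=file_id, path=path)
    return ET.tostring(root, encoding="unicode")


def add_inventory_by_delta(parent_xml, delta):
    """Slow path: parse the whole parent, apply the delta, write it all out."""
    inventory = from_xml(parent_xml)
    for old_path, new_path, file_id in delta:
        if old_path is not None:
            # Removal, or the first half of a rename/modify.
            inventory.pop(file_id)
        if new_path is not None:
            # Addition, or the second half of a rename/modify.
            inventory[file_id] = new_path
    return to_xml(inventory)
```

Note that the whole parent is parsed and the whole child is written even
when the delta touches a single entry.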

However, Bundles already have their own delta format. So we already had
to extract to a fulltext xml, we just didn't have to convert that xml
string into a bunch of InventoryEntry objects.

Also, for --2a inserting an InventoryDelta is much cheaper. (We only
have to extract the pages that contain the records that are being changed.)
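
A toy model of why that is cheaper, under heavy assumptions: here the
inventory is split into a fixed number of hash buckets ("pages"), which
is much simpler than a real CHK map (those are trees keyed by content
hash). PAGE_COUNT, page_of(), build_pages() and apply_delta() are all
hypothetical names. The sketch only shows that applying a delta copies
the pages holding changed records and shares the rest untouched.

```python
import zlib

PAGE_COUNT = 4  # hypothetical fixed fan-out, for illustration only


def page_of(file_id):
    """Pick the page that holds a given file_id."""
    return zlib.crc32(file_id.encode("utf-8")) % PAGE_COUNT


def build_pages(inventory):
    """Split a dict of file_id -> path into PAGE_COUNT page dicts."""
    pages = [dict() for _ in range(PAGE_COUNT)]
    for file_id, path in inventory.items():
        pages[page_of(file_id)][file_id] = path
    return pages


def apply_delta(pages, delta):
    """Return (new_pages, touched); untouched pages are shared, not copied."""
    new_pages = list(pages)
    touched = set()
    for old_path, new_path, file_id in delta:
        n = page_of(file_id)
        if n not in touched:
            # Copy-on-write: only pages with changed records are rewritten.
            new_pages[n] = dict(pages[n])
            touched.add(n)
        if old_path is not None:
            del new_pages[n][file_id]
        if new_path is not None:
            new_pages[n][file_id] = new_path
    return new_pages, touched
```

With a one-entry delta, only one page is copied and rewritten no matter
how large the inventory is.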


So the fundamental shift would be to move a Bundle away from being a
delta of serialized fulltexts, to being a serialization of deltas. (For
file content, we could use the same deltas we do today [mpdiffs].)
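
A sketch of what "a serialization of deltas" could look like on the
wire. This is not a proposed bundle format: JSON is only a convenient
stand-in, serialize_delta() and parse_delta() are hypothetical names,
and a delta item is simplified to (old_path, new_path, file_id) with
None marking "did not exist" / "removed".

```python
import json


def serialize_delta(revision_id, delta):
    """Encode one revision's inventory delta as a single line of bytes."""
    record = {
        "revision": revision_id,
        "delta": [list(item) for item in delta],
    }
    return (json.dumps(record, sort_keys=True) + "\n").encode("utf-8")


def parse_delta(line):
    """Decode a line written by serialize_delta()."""
    record = json.loads(line.decode("utf-8"))
    return record["revision"], [tuple(item) for item in record["delta"]]
```

The record says nothing about how either repository stores inventories,
so the reader is free to apply it in whatever way is cheapest for its
own format.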

I don't believe all of the streaming-via-inventory deltas work has
landed yet in bzr.dev. Can someone give me an update on its status?

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkpeVR8ACgkQJdeBCYSNAAMY2ACZARP/ni+iGY/mAu+9YsCQAFZT
h2MAoLE11Ssd2N4s+Fvb98oIPLRsqUIE
=fNiu
-----END PGP SIGNATURE-----


