[MERGE] Fetch tweaks

Robert Collins robertc at robertcollins.net
Tue Jul 29 01:06:37 BST 2008


On Mon, 2008-07-28 at 10:48 -0500, John Arbash Meinel wrote:
> Robert Collins wrote:
> | This allows repositories more control over their fetch operations in the
> | generic fetching code. Doing so allows the groupcompress format to avoid
> | having to figure out full text representations, rather getting
> | everything as fulltext in the first place; and eliminates an unnecessary
> | reconcile post-fetch.
> |
> | -Rob
> |
> 
> BB:approve
> 
> I like this patch as it stands, though with one caveat. Specifically
> during first branch, passing _fetch_uses_deltas = False will read the
> entire repository into memory. It will be somewhat efficient, in that it
> will share strings in the in-memory lists, up until the point that you
> actually fetch a bit of text.

Yes indeed.

> Then it does ''.join(lines) which doubles memory consumption for that
> text (while still caching the original lines). If the caller doesn't
> hang onto the text it will probably be ok.
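
A rough illustration of the doubling described above, as a sketch only: the
function name join_cost is made up for this example and is not bzrlib code.
It shows that the joined text is a new object as large as all the lines put
together, so both copies are live while the lines stay cached.

```python
def join_cost(lines):
    """Return (bytes held by the cached lines, bytes in the joined text).

    Hypothetical helper for illustration; it counts payload bytes only and
    ignores per-object overhead.
    """
    cached = sum(len(line) for line in lines)
    text = b"".join(lines)  # a new bytes object; the lines are still alive
    return cached, len(text)


lines = [b"line %d\n" % i for i in range(1000)]
cached, joined = join_cost(lines)
# While both `lines` and the joined text are referenced, roughly
# cached + joined bytes of payload are held for this one text.
print(cached, joined)
```
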

If we had the unpacked sizes in the index we could do something clever
and simple :). We don't though. (And I think it would be a loss overall
due to index size increasing.)

> To truly scale up, we need to change the 'get_record_stream()' code that
> blindly unpacks all of the requested keys so that we only unpack a few
> at a time. I don't have a good answer for that, as how do you decide how
> much to unpack for efficiency versus memory consumption.
> 
> Anyway, this is still better than what we have (as it lets us experiment
> with it), and it shouldn't change the behavior of anything *today*.

I plan to audit the versioned file code to make sure it will do
something nice for group compress - or can be tweaked to do so. Roughly
that's:
 - adding reverse-topological ordering
 - checking that knits group by fileid when sorting
 - making the full text assembly a little bit more lazy (I'm thinking
just 100-text batches). Excluding ISOs and so on, most texts are <
1MB, so that should be less than 100MB worst case.
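
The batching idea in the last item could look something like the sketch
below. This is not the bzrlib implementation: iter_batches, iter_fulltexts
and the build_fulltexts callable are hypothetical names used only for
illustration. The point is that at most batch_size texts are assembled at
once, so 100 texts of under 1MB each stay under roughly 100MB.

```python
from itertools import islice


def iter_batches(keys, batch_size=100):
    """Yield successive lists of at most batch_size keys."""
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def iter_fulltexts(keys, build_fulltexts, batch_size=100):
    """Lazily yield (key, text) pairs, assembling one batch at a time.

    build_fulltexts is an assumed callable: it takes a list of keys and
    returns a dict mapping each key to its full text.
    """
    for batch in iter_batches(keys, batch_size):
        texts = build_fulltexts(batch)
        for key in batch:
            yield key, texts[key]
        # `texts` is rebound on the next iteration, so the previous
        # batch can be freed once the caller drops what it was given.
```

A caller that consumes the generator one item at a time and does not keep
the texts should see memory bounded by one batch rather than by the whole
request.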

Another thing we should do in knits is discard the raw content once it's
not referenced anymore; but perhaps gc will make that irrelevant.
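
One way to picture discarding raw content once nothing needs it is a small
use-counted cache, sketched below. RawContentCache and its methods are
hypothetical and do not correspond to existing knit code; the sketch
assumes we know up front how many pending texts will need each raw record.

```python
class RawContentCache:
    """Hold raw records only while some pending text still needs them."""

    def __init__(self):
        self._content = {}
        self._pending = {}

    def add(self, key, raw, pending_uses):
        """Cache raw content that pending_uses later texts will need."""
        if pending_uses > 0:
            self._content[key] = raw
            self._pending[key] = pending_uses

    def use(self, key):
        """Return the raw content and drop it after its last use."""
        raw = self._content[key]
        self._pending[key] -= 1
        if self._pending[key] == 0:
            del self._content[key]
            del self._pending[key]
        return raw

    def __len__(self):
        return len(self._content)
```
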

-Rob

-- 
GPG key available at: <http://www.robertcollins.net/keys.txt>.