[MERGE] Fetch tweaks
Robert Collins
robertc at robertcollins.net
Tue Jul 29 01:06:37 BST 2008
On Mon, 2008-07-28 at 10:48 -0500, John Arbash Meinel wrote:
>
> Robert Collins wrote:
> | This allows repositories more control over their fetch operations in the
> | generic fetching code. Doing so allows the groupcompress format to avoid
> | having to figure out full text representations, instead getting
> | everything as fulltext in the first place; it also eliminates an
> | unnecessary reconcile post-fetch.
> |
> | -Rob
> |
>
> BB:approve
>
> I like this patch as it stands, though with one caveat. Specifically,
> during the first branch, passing _fetch_uses_deltas = False will read the
> entire repository into memory. It will be somewhat efficient, in that it
> will share strings in the in-memory lists, up until the point that you
> actually fetch a bit of text.
Yes indeed.
> Then it does ''.join(lines), which doubles memory consumption for that
> text (while still caching the original lines). If the caller doesn't
> hang onto the text, it will probably be OK.
If we had the unpacked sizes in the index, we could do something clever
and simple :). We don't, though. (And I think adding them would be a net
loss overall because of the larger index.)
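To make the "clever and simple" option concrete: if the index did record unpacked sizes (it does not; `unpacked_sizes` below is a hypothetical key-to-fulltext-size mapping), the requested keys could be cut into batches that stay under a byte budget. This is a minimal sketch under that assumption, not bzrlib code; `batches_by_size` is a made-up name.

```python
from typing import Dict, Hashable, Iterable, Iterator, List


def batches_by_size(keys: Iterable[Hashable],
                    unpacked_sizes: Dict[Hashable, int],
                    budget: int) -> Iterator[List[Hashable]]:
    """Yield lists of keys whose total unpacked size stays within budget.

    unpacked_sizes is a HYPOTHETICAL key -> fulltext-size mapping; the real
    index does not store it. A single text larger than the budget is
    yielded in a batch of its own rather than dropped.
    """
    batch: List[Hashable] = []
    total = 0
    for key in keys:
        size = unpacked_sizes[key]
        # Close the current batch if this text would push it over budget.
        if batch and total + size > budget:
            yield batch
            batch, total = [], 0
        batch.append(key)
        total += size
    if batch:
        yield batch
```

With a 1000-byte budget and sizes a=400, b=300, c=500, d=2000, e=10, this yields `[['a', 'b'], ['c'], ['d'], ['e']]`; the oversized `d` gets a batch to itself.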
> To truly scale up, we need to change the 'get_record_stream()' code that
> blindly unpacks all of the requested keys so that we only unpack a few
> at a time. I don't have a good answer for that, since it isn't clear how
> to decide how much to unpack at once when trading efficiency against
> memory consumption.
>
> Anyway, this is still better than what we have (as it lets us experiment
> with it), and it shouldn't change the behavior of anything *today*.
I plan to audit the versioned file code to make sure it will do
something nice for group compress, or can be tweaked to do so. Roughly,
that's:
- adding a reverse-topological ordering
- checking that knits group by file id when sorting
- making the full text assembly a little lazier (I'm thinking of just
100-text batches). Excluding ISOs and the like, most texts are <
1MB, so that should be under 100MB in the worst case.
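A rough sketch of how the first and third items could fit together, assuming keys are `(file_id, revision_id)` tuples and a `parent_map` of key to parent keys is available (both shapes are assumptions here, and the function names are hypothetical). Keys are grouped by file id and emitted children-before-parents within each file (a reverse-topological order, via Kahn's algorithm run from the tips), then handed out in fixed 100-text batches so only one batch of fulltexts needs assembling at a time.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str]  # ASSUMED (file_id, revision_id) shape


def reverse_topo_by_file(parent_map: Dict[Key, Tuple[Key, ...]]) -> List[Key]:
    """Group keys by file id; within a file, emit children before parents."""
    by_file: Dict[str, List[Key]] = defaultdict(list)
    for key in parent_map:
        by_file[key[0]].append(key)
    result: List[Key] = []
    for file_id in sorted(by_file):
        # Count how many children (within this file) each key has.
        child_count = {k: 0 for k in by_file[file_id]}
        for k in child_count:
            for p in parent_map[k]:
                if p in child_count:
                    child_count[p] += 1
        # Start from the tips (no children) and walk towards the roots.
        ready = sorted(k for k, n in child_count.items() if n == 0)
        while ready:
            k = ready.pop()
            result.append(k)
            for p in parent_map[k]:
                if p in child_count:
                    child_count[p] -= 1
                    if child_count[p] == 0:
                        ready.append(p)
    return result


def in_batches(keys: Iterable[Key], size: int = 100) -> Iterator[List[Key]]:
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

With texts under 1MB each, a 100-text batch bounds the assembled fulltexts at roughly 100 × 1MB = 100MB, before any temporary copy made by `''.join`.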
Another thing we should do in knits is discard the raw content once it's
not referenced anymore; but perhaps gc will make that irrelevant.
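One way to discard raw content once nothing needs it is to count, up front, how many pending texts will read each piece of content as a basis, and drop it after the last read. The class below is a hypothetical helper for illustration only, not part of bzrlib; it assumes the per-key usage counts are known before reconstruction starts.

```python
from typing import Dict, Hashable


class ContentCache:
    """Holds raw content only while some pending text still needs it.

    HYPOTHETICAL helper: `users` maps each key to the number of texts
    that will read its content as their basis.
    """

    def __init__(self, users: Dict[Hashable, int]) -> None:
        self._remaining = dict(users)
        self._content: Dict[Hashable, bytes] = {}

    def add(self, key: Hashable, data: bytes) -> None:
        # Content that no pending text will use is never retained.
        if self._remaining.get(key, 0) > 0:
            self._content[key] = data

    def use(self, key: Hashable) -> bytes:
        data = self._content[key]
        self._remaining[key] -= 1
        if self._remaining[key] == 0:
            # Last user: drop our reference so the memory can be reclaimed.
            del self._content[key]
        return data

    def __len__(self) -> int:
        return len(self._content)
```

A basis shared by two texts stays cached after the first `use()` and is released after the second.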
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
More information about the bazaar mailing list