[RFC] ancestry cache
John A Meinel
john at arbash-meinel.com
Sun Feb 19 15:32:37 GMT 2006
Aaron Bentley wrote:
> John A Meinel wrote:
> | Aaron Bentley wrote:
> |>I've been fiddling with progress bars for merge application, and it
> |>turns out we spend a lot of our time selecting a merge base. What
> |>surprised me is that the largest part of this is the time generating a
> |>graph of common ancestors.
> |>
> |>So I've implemented a prototype ancestry cache. Patch attached. Does
> |>this seem worth pursuing?
>
> | I realize this has the advantage that it handles ghost revisions. But is
> | this genuinely faster than reading the Weave prefix of inventory.weave?
> | It actually seems like the weave prefix should be faster because you
> | have fewer bytes to read.
>
> I haven't benched against that. I didn't want to change the behavior of
> revision_graph. In this case, I'm fairly sure that direct descendants
> of ghosts would be converted into descendants of the null revision,
> which would be very bad. revision_graph also has callers that are
> looking for ghosts.
Well, actually, ghosts don't show up in weaves (yet), so you just
wouldn't see them. I'm not sure what you would prefer at this point. (I
guess a revision whose only parent is a ghost would look like it
descended from 'null:'.)
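For illustration only (this is not actual bzrlib code), here's roughly
what building the graph from per-version parent pointers like the weave
prefix would look like; 'weave_parents' is a hypothetical
{revision_id: [parent_ids]} mapping standing in for whatever accessor
exposes that data:

    NULL_REVISION = 'null:'  # bzr's marker for the null revision

    def graph_from_weave(weave_parents):
        """Build an ancestry graph from per-version parent lists."""
        graph = {}
        for revision_id, parents in weave_parents.items():
            if not parents:
                # A revision whose only parents were ghosts would end up
                # here with no recorded parents at all, i.e. apparently
                # descended from the null revision -- the problem above.
                parents = [NULL_REVISION]
            graph[revision_id] = list(parents)
        return graph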
>
> In any case, the cache is not a bottleneck.
>
> | Also, in my store-escaping branch, I was thinking to allow '/' as a
> | valid character in revision ids (then we wouldn't have to escape Arch
> | fully qualified patch names); it is easy enough to escape them when
> | writing them to disk. But I don't really care.
>
> Using '/' is a hack. I originally wanted to use rio format, but I had
> trouble using it. For one thing, it doesn't accept arbitrary bytes for
> keys, so I couldn't use straight revision-ids as keys. For another, I
> wasn't getting the same data out that I put in.
The former I knew about, and it is kind of a pain. (I don't think it
even lets you have whitespace, or anything other than a-zA-Z0-9.-; I
forget the specific restriction.)
I'm a little worried about the latter. Not getting the same data out
seems serious. Do you know what was being translated?
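Purely as an illustration (not rio or bzrlib code), this is the kind of
reversible escaping I have in mind for squeezing an arbitrary
byte-oriented revision-id into a restricted key alphabet like
a-zA-Z0-9.-; here '.' doubles as the escape marker, so it is always
escaped itself:

    import re

    # Characters that may pass through untouched; everything else
    # (including '.', the escape marker) becomes '.XX' hex.
    _SAFE = re.compile(r'[a-zA-Z0-9\-]')

    def escape_key(revision_id):
        out = []
        for ch in revision_id:
            if _SAFE.match(ch):
                out.append(ch)
            else:
                out.append('.%02x' % ord(ch))
        return ''.join(out)

    def unescape_key(key):
        return re.sub(r'\.([0-9a-f]{2})',
                      lambda m: chr(int(m.group(1), 16)), key)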
>
> This won't be a merge request until the format is tidied up, if it
> becomes a merge request at all.
>
> Also, it doesn't handle non-Repository revision_sources, and that will
> need to be fixed.
>
> | +    try:
> | +        parents = a_cache.get_parent_ids(line)
> | +        if len(parents) == 0:
> | +            parents = [NULL_REVISION]
> | +    except bzrlib.errors.NoSuchRevision:
> | +        if line == revision:
> | +            raise
> | +        parents = None
> |
> |
> | It seems like this makes a_cache actually important, since it doesn't
> | look anywhere else for the parents.
> |
> | Otherwise it would seem that you would use:
> |
> | try:
> |     parents = a_cache.get_parent_ids(line)
> | except NoSuchRevision:
> |     rev = repository.get_revision(line)
> |     parents = rev.parent_ids
>
> You're right; it's basically memoization of
> repository.get_revision().parent_ids. It's only a cache in the sense
> that operations that destroy revisions might want to destroy it.
>
> Aaron
Sure. Mostly what I was remarking on is that it expects to *find*
everything in the cache, and has no fallback if the cache is missing
something. (Which is an issue for either memoization or a plain cache.)
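Roughly the shape I have in mind, as a sketch only -- the
get_parent_ids()/add() methods on a_cache here are assumptions about
the prototype's interface, not its actual API:

    import bzrlib.errors

    NULL_REVISION = 'null:'  # bzr's marker for the null revision

    def cached_parent_ids(a_cache, repository, revision_id):
        """Memoized parent lookup that falls back to the repository."""
        try:
            return a_cache.get_parent_ids(revision_id)
        except bzrlib.errors.NoSuchRevision:
            pass
        # Cache miss: ask the repository. A genuine ghost still raises
        # NoSuchRevision here, since we only caught the cache's miss.
        parents = repository.get_revision(revision_id).parent_ids
        if len(parents) == 0:
            parents = [NULL_REVISION]
        a_cache.add(revision_id, parents)
        return parents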
John
=:->