[MERGE] annotation policies

Tue Jun 17 11:32:53 BST 2008

John Arbash Meinel wrote:

> This is just an extension of the earlier patch, which now implements a
> couple
> more types, which might show what I was thinking about earlier.

Here's a partial review. Only tweaks and comments so far. In general
though, the code is a pleasure to read. Thanks in particular for taking
the time to clean up and clarify the method parameter names as you've
refactored the code from one module to another.

> @@ -173,174 +172,7 @@
>          If None, then any ancestry disputes will be resolved with
>          new_revision_id
>      """

Hmm. Is this docstring comment re None still correct now?

> +class AnnotationPolicy(object):
> +    """This class represents a specific method for performing annotations.
> +
> +    Mostly, this class determines how to resolve ambiguous lines. (Lines which
> +    both sides claim different last-modified revisions.)
> +    """
> +
> +    def __init__(self, heads_provider):
> +        """Create a new AnnotationPolicy

The simple tweak first: the 1st line of __init__'s docstring needs a
trailing '.'.

I must say that I'm not sold on the AnnotationPolicy class name. The
class itself is arguably an "Annotator" that feels like it ought to
be parameterised with a policy class. In other words, my gut feel from
the code I've read so far is that the 'policy' bit is really a tiny bit,
not the top level thing. Having said that, it looks like the no-merge
policy benefits from your subclassing approach and I'm not sure that
it's necessarily worth refactoring things more than you have. So I'm
OK if you'd really prefer to keep things as you've got them but if you're
sitting on the fence at all re this issue, consider my grumble enough to
push you onto one side of it. :-)

> +        :param heads_provider: An object which provids a .heads() call to
> +            resolve if any revision ids are children of others.
> +            Should also provide a 'cache(keys, head)' function.
> +            Can be None.

On the top line, s/provids/provides/.
I wonder if it's worth be more explicit about what a value of None means?

> +        else:
> +            return self._do_annotate_many_parents(
> +                annotated_parents_lines, left_matching_blocks,
> +                plain_child_lines, child_revision_id)
> +        return lines

You don't need this final line - all parts of the if/elif/else use
return.

> +    def _process_unmatched_lines(self, output_lines, plain_child_lines,
> +                                annotated_child_lines, start_child, end_child,
> +                                annotated_right_lines, start_right, end_right,
> +                                revision_id):
> +        """Find lines in plain right_lines that match plain_child_lines

Again, need a trailing '.'. on the docstring top line.

> +                left = annotated_child_lines[start_child+child_idx+offset]
> +                right = annotated_right_lines[start_right+right_idx+offset]

Spaces around the +'s please. I realise this is simply moved code but let's
clean it up while we're at it.

> +        # TODO: Would it be cleaner to just pass in the revision ids, rather
> +        #       than (revision_id, text)?

As the code currently stands, I think your approach is fine. *If* you do go with
separate policy classes that aren't subclasses, then I'd pass just the
revision-ids in and out, not tuples. I guess unpacking and repacking the tuple
*might* have a performance impact but I wonder if that would be minor in the
overall scheme of things? From the performance figures you posted, it looks
like the policy choice dominates so I'd lean towards making the policy
subclasses as clean as possible myself. Of course, if your benchmarking
indicates otherwise, ignore that suggestion. :-)

> +            # Both claim different origins
> +            # We know that revision_id is the head for
> +            # left and right, so cache it
> +            self._heads_provider.cache(
> +                (revision_id, left[0]),
> +                (revision_id,))
> +            self._heads_provider.cache(
> +                (revision_id, right[0]),
> +                (revision_id,))
> +            return (revision_id, left[1])

I'm hoping this will become clearer to me as I grok the code more but why
do you only update the heads_provider cache for this policy and not others?

> +class NoMergeAnnotationPolicy(annotation_policy.AnnotationPolicy):
> +    """Assign all lines based on the primary parent"""
> +
> +    def _do_annotate_two_parents(self, first_parent_lines,
> +                                 first_matching_blocks,
> +                                 second_parent_lines,
> +                                 plain_child_lines,
> +                                 child_revision_id):
> +        """Annotate versus two parents,

s/,/./ on the last line.

> === modified file 'bzrlib/tests/test_annotate.py'
> --- bzrlib/tests/test_annotate.py	2008-02-18 22:19:41 +0000
> +++ bzrlib/tests/test_annotate.py	2008-06-11 18:53:36 +0000
> @@ -23,6 +23,8 @@
>      annotate,
>      conflicts,
>      errors,
> +    graph,
> +    revision,

These last 2 new imports aren't needed as best I can tell.

> @@ -522,3 +414,5 @@
>          blocks = [(0, 1, 1), (1, 2, 0)]
>          self.annotateEqual([('rev2', 'a\n'), ('rev1', 'a\n')], [parent],
>                             new_text, 'rev2', blocks)
> +
> +

These new trailing newlines aren't a problem but don't add anything either.

I still need to review test_annotation_policy.py and read through your
emails on the trade-offs of the various algorithms before commenting on
my preferred default. I'm hoping to do that tomorrow.

Ian C.