[MERGE] Improvements to find_unique_ancestors()

Ian Clatworthy ian.clatworthy at internode.on.net
Wed May 21 02:12:11 BST 2008


John Arbash Meinel wrote:
> Ian Clatworthy wrote:

> | 1. find_unique_ancestors() is simply too long - it covers 4+ pages
> |    when printed - making it unmaintainable by people like me with
> |    smaller brains. I'm aware of the performance emphasis here but
> |    still feel it could be broken up into a master method that calls
> |    two or three helper methods. Those helper methods can contain the
> |    loops, rather than being called inside the loops, so performance
> |    shouldn't be impacted by a noticeable amount.
> 
> I did this. It *might* be slightly faster with the current refactoring.
> I don't
> see any specific reason why it would be, but in my testing I see 128ms
> => 118ms.
> The only thing I can think of might be memory pressure. Or maybe an
> "if()" check
> is cheaper than a no-op _mutter() call.

Wow - what a difference in clarity after refactoring. Thanks! I found
a bug in the refactoring that changes the end algorithm: it now
(accidentally) steps the all_unique_searcher 9 times out of 10 instead
of 1 time in 10. I wouldn't expect that to speed things up, though?

bb: tweak

> +        return newly_seen_common, newly_seen_unique,

The trailing comma can be dropped here.
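
It's purely cosmetic: the comma-separated expression is already a
tuple, so the trailing comma doesn't change what gets returned.

```python
def with_comma():
    return 1, 2,   # trailing comma is redundant

def without_comma():
    return 1, 2

# Both return the same 2-tuple.
assert with_comma() == without_comma() == (1, 2)
```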

> +    def _find_nodes_common_to_all_unique(self, unique_tip_searchers,
> +                                         all_unique_searcher,
> +                                         newly_seen_unique, step_all_unique):
> +        """Find nodes that are common to all unique_tip_searchers.
> +
> +        If it is time, step the all_unique_searcher, and add its nodes to the
> +        result.
> +        """
> +        unique_are_common_nodes = newly_seen_unique.copy()
> +        for searcher in unique_tip_searchers:
> +            unique_are_common_nodes.intersection_update(searcher.seen)
> +        unique_are_common_nodes.intersection_update(
> +                                    all_unique_searcher.seen)

I'd really prefer it if unique_are_common_nodes was renamed to
common_to_all_unique_nodes, at least within this particular method.

> +        # Step all-unique less frequently than the other searchers.
> +        # In the common case, we don't need to spider out far here, so
> +        # avoid doing extra work.
> +        if not step_all_unique:

You've changed step_all_unique from a counter (outside this method) to
a boolean (inside this method). That's what I was expecting, which is
good. You need to invert this test now, though.

> +            tstart = time.clock()
> +            nodes = all_unique_searcher.step()
> +            unique_are_common_nodes.update(nodes)
> +            tdelta = time.clock() - tstart
> +            if 'graph' in debug.debug_flags:
> +                trace.mutter('all_unique_searcher step() took %.3fs'
> +                             'for %d nodes (%d total), iteration: %s',
> +                             tdelta, len(nodes), len(all_unique_searcher.seen),
> +                             all_unique_searcher._iterations)
> +        return unique_are_common_nodes

The calculation of tdelta can move inside the if block.
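
That is, tstart still has to be taken before step(), but the
subtraction is only needed when the flag is set. A sketch with the
bzrlib trace/debug machinery stubbed out:

```python
import time

debug_flags = set()  # stand-in for bzrlib's debug.debug_flags

class FakeSearcher:
    def step(self):
        return [1, 2, 3]

def timed_step(searcher):
    tstart = time.time()
    nodes = searcher.step()
    if 'graph' in debug_flags:
        # tdelta computed only when it will actually be logged
        tdelta = time.time() - tstart
        print('step() took %.3fs for %d nodes' % (tdelta, len(nodes)))
    return nodes

nodes = timed_step(FakeSearcher())
```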


> | As a minor note, several of the _mutter calls probably ought to be
> | using %d instead of %s. It's pretty minor compared to the issues above
> | though.
> 
> I didn't do this, since "%d" % (int) == "%s" % (int). The reasons for %d
> are to
> handle "%2.2d" or to round from a float. %s "always works" which means I
> won't
> accidentally get an error if I change an int into an object. I'll do it
> if you
> think it adds clarity, though.

I think it does read better when integers are given as %d in format
strings.
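
Both claims from the thread check out, for what it's worth: %d and %s
render a plain int identically, %d adds padding/truncation behaviour,
and %d raises where %s silently stringifies.

```python
assert "%d" % 42 == "%s" % 42 == "42"
assert "%2.2d" % 7 == "07"   # zero-padding: the case %d exists for
assert "%d" % 3.9 == "3"     # %d truncates floats toward zero

try:
    "%d" % object()
    raised = False
except TypeError:
    raised = True
assert raised                # "%s" % object() would have worked here
```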

Well done on this overall. The improvements in speed and code clarity in
this area are really welcome.

Ian C.


