[MERGE] Improvements to find_unique_ancestors()
Ian Clatworthy
ian.clatworthy at internode.on.net
Wed May 21 02:12:11 BST 2008
John Arbash Meinel wrote:
> Ian Clatworthy wrote:
> | 1. find_unique_ancestors() is simply too long - it covers 4+ pages
> | when printed - making it unmaintainable by people like me with
> | smaller brains. I'm aware of the performance emphasis here but
> | still feel it could be broken up into a master method that calls
> | two or three helper methods. Those helper methods can contain the
> | loops, rather than being called inside the loops, so performance
> | shouldn't be impacted by a noticeable amount.
>
> I did this. It *might* be slightly faster with the current refactoring.
> I don't
> see any specific reason why it would be, but in my testing I see 128ms
> => 118ms.
> The only thing I can think of might be memory pressure. Or maybe an
> "if()" check
> is cheaper than a no-op _mutter() call.
Wow - what a difference in clarity after refactoring. Thanks! I found
a bug in the refactoring such that the end algorithm differs - it now
(accidentally) steps the all_unique_searcher 9 out of 10 times instead of
1 out of 10 times. Wouldn't expect that to speed things up though?
bb: tweak
> + return newly_seen_common, newly_seen_unique,
Trailing comma can go here.
> + def _find_nodes_common_to_all_unique(self, unique_tip_searchers,
> + all_unique_searcher,
> + newly_seen_unique, step_all_unique):
> + """Find nodes that are common to all unique_tip_searchers.
> +
> + If it is time, step the all_unique_searcher, and add its nodes to the
> + result.
> + """
> + unique_are_common_nodes = newly_seen_unique.copy()
> + for searcher in unique_tip_searchers:
> + unique_are_common_nodes.intersection_update(searcher.seen)
> + unique_are_common_nodes.intersection_update(
> + all_unique_searcher.seen)
I'd really prefer it if unique_are_common_nodes was renamed to
common_to_all_unique_nodes, at least within this particular method.
> + # Step all-unique less frequently than the other searchers.
> + # In the common case, we don't need to spider out far here, so
> + # avoid doing extra work.
> + if not step_all_unique:
You've changed step_all_unique from a counter (outside this method) to
a boolean (inside this method). That's what I was expecting which is
good. You need to invert this test now though.
> + tstart = time.clock()
> + nodes = all_unique_searcher.step()
> + unique_are_common_nodes.update(nodes)
> + tdelta = time.clock() - tstart
> + if 'graph' in debug.debug_flags:
> + trace.mutter('all_unique_searcher step() took %.3fs'
> + 'for %d nodes (%d total), iteration: %s',
> + tdelta, len(nodes), len(all_unique_searcher.seen),
> + all_unique_searcher._iterations)
> + return unique_are_common_nodes
The calculation of tdelta can move inside the if block.
> | As a minor note, several of the _mutter calls probably ought to be
> | using %d instead of %s. It's pretty minor compared to the issues above
> | though.
>
> I didn't do this, since "%d" % (int) == "%s" % (int). The reasons for %d
> are to
> handle "%2.2d" or to round from a float. %s "always works" which means I
> won't
> accidentally get an error if I change an int into an object. I'll do it
> if you
> think it adds clarity, though.
I think it does read better when integers are given as %d in format
strings.
Well done on this overall. The improvements in speed and code clarity in
this area are really welcome.
Ian C.
More information about the bazaar
mailing list