Improve performance of tsort.topo_sort
Maarten Bosmans
mkbosmans at gmail.com
Wed Jul 29 02:17:51 BST 2009
Hi there,
I've been looking into the code in tsort.py. With the modified
algorithm show below the time to topologically sort the graph for the
bzr.dev branch (~26000 revisions) went down from 680 to 260 ms.
Note that I didn't modify the TopoSorter class. It's of course
possible to alter the code there instead of in the topo_sort function,
but by not using the self._var variables the topo_sort function is a
bit faster.
With this change there is one testcase (test_tsort_partial) failing.
But if I understand the purpose of topo_sort correctly, this should
not be a fail, because the exact ordering of the result list is not
specified, so both the old and the new behaviour is correct.
Maarten Bosmans
=== modified file 'bzrlib/tsort.py'
--- bzrlib/tsort.py 2009-07-27 15:20:32 +0000
+++ bzrlib/tsort.py 2009-07-29 00:57:50 +0000
@@ -35,7 +35,39 @@
node identifiers can be any hashable object, and are typically strings.
"""
- return TopoSorter(graph).sorted()
+ graph = dict(graph)
+ node_name_stack = []
+
+ # count the number of children for every parent in the graph
+ nchildren = dict.fromkeys(graph.iterkeys(), 0)
+ for parents in graph.itervalues():
+ for parent in parents:
+ if parent in nchildren:
+ nchildren[parent] += 1
+ # keep track of nodes without children
+ nochildnodes = [node for (node, n) in nchildren.iteritems() if n == 0]
+
+ while nochildnodes:
+ # pick a node without a child and add it to the stack.
+ node_name = nochildnodes.pop()
+ node_name_stack.append(node_name)
+
+ # the parents of the node loose it as a child; if it was the last
+ # child, add the parent to the list of childless nodes.
+ parents = graph.pop(node_name)
+ for parent in parents:
+ if parent in nchildren:
+ nchildren[parent] -= 1
+ if nchildren[parent] == 0:
+ nochildnodes.append(parent)
+
+ # if there are still nodes left in the graph,
+ # that means that there is a cycle
+ if graph:
+ raise errors.GraphCycleError(graph)
+
+ node_name_stack.reverse()
+ return node_name_stack
class TopoSorter(object):
More information about the bazaar
mailing list