Improve performance of tsort.topo_sort

Maarten Bosmans mkbosmans at gmail.com
Wed Jul 29 02:17:51 BST 2009


Hi there,

I've been looking into the code in tsort.py. With the modified
algorithm show below the time to topologically sort the graph for the
bzr.dev branch (~26000 revisions) went down from 680 to 260 ms.

Note that I didn't modify the TopoSorter class. It's of course
possible to alter the code there instead of in the topo_sort function,
but by not using the self._var variables the topo_sort function is a
bit faster.

With this change there is one testcase (test_tsort_partial) failing.
But if I understand the purpose of topo_sort correctly, this should
not be a fail, because the exact ordering of the result list is not
specified, so both the old and the new behaviour is correct.

Maarten Bosmans


=== modified file 'bzrlib/tsort.py'
--- bzrlib/tsort.py     2009-07-27 15:20:32 +0000
+++ bzrlib/tsort.py     2009-07-29 00:57:50 +0000
@@ -35,7 +35,39 @@

     node identifiers can be any hashable object, and are typically strings.
     """
-    return TopoSorter(graph).sorted()
+    graph = dict(graph)
+    node_name_stack = []
+
+    # count the number of children for every parent in the graph
+    nchildren = dict.fromkeys(graph.iterkeys(), 0)
+    for parents in graph.itervalues():
+        for parent in parents:
+            if parent in nchildren:
+                nchildren[parent] += 1
+    # keep track of nodes without children
+    nochildnodes = [node for (node, n) in nchildren.iteritems() if n == 0]
+
+    while nochildnodes:
+        # pick a node without a child and add it to the stack.
+        node_name = nochildnodes.pop()
+        node_name_stack.append(node_name)
+
+        # the parents of the node loose it as a child; if it was the last
+        # child, add the parent to the list of childless nodes.
+        parents = graph.pop(node_name)
+        for parent in parents:
+            if parent in nchildren:
+                nchildren[parent] -= 1
+                if nchildren[parent] == 0:
+                    nochildnodes.append(parent)
+
+    # if there are still nodes left in the graph,
+    # that means that there is a cycle
+    if graph:
+        raise errors.GraphCycleError(graph)
+
+    node_name_stack.reverse()
+    return node_name_stack


 class TopoSorter(object):



More information about the bazaar mailing list