Rev 3910: Simple fix to avoid using small.difference_update(large) in http://bazaar.launchpad.net/%7Ebzr/bzr/brisbane-core

John Arbash Meinel john at arbash-meinel.com
Thu Mar 26 17:56:22 GMT 2009


At http://bazaar.launchpad.net/%7Ebzr/bzr/brisbane-core

------------------------------------------------------------
revno: 3910
revision-id: john at arbash-meinel.com-20090326175542-qmb46mw1d8zt5k1l
parent: john at arbash-meinel.com-20090326163500-os7lvdpsdxnxstd0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Thu 2009-03-26 12:55:42 -0500
message:
  Simple fix to avoid using small.difference_update(large)
  It seems the obvious thing to do, but Python's implementation scales poorly.
  small = small.difference(large) scales much better [O(small) rather than O(large)].
-------------- next part --------------
=== modified file 'bzrlib/chk_map.py'
--- a/bzrlib/chk_map.py	2009-03-24 19:36:34 +0000
+++ b/bzrlib/chk_map.py	2009-03-26 17:55:42 +0000
@@ -1436,8 +1436,12 @@
             # only care about external references.
             node = _deserialise(bytes, record.key, search_key_func=None)
             if isinstance(node, InternalNode):
-                chks = set(node.refs())
-                chks.difference_update(all_uninteresting_chks)
+                # all_uninteresting_chks grows large, as it lists all nodes we
+                # don't want to process (including already seen interesting
+                # nodes).
+                # small.difference_update(large) scales O(large), but
+                # small.difference(large) scales O(small).
+                chks = set(node.refs()).difference(all_uninteresting_chks)
                 # Is set() and .difference_update better than:
                 # chks = [chk for chk in node.refs()
                 #              if chk not in all_uninteresting_chks]



More information about the bazaar-commits mailing list