Rev 3910: Simple fix to avoid using small.difference_update(large) in http://bazaar.launchpad.net/%7Ebzr/bzr/brisbane-core
John Arbash Meinel
john at arbash-meinel.com
Thu Mar 26 17:56:22 GMT 2009
At http://bazaar.launchpad.net/%7Ebzr/bzr/brisbane-core
------------------------------------------------------------
revno: 3910
revision-id: john at arbash-meinel.com-20090326175542-qmb46mw1d8zt5k1l
parent: john at arbash-meinel.com-20090326163500-os7lvdpsdxnxstd0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: brisbane-core
timestamp: Thu 2009-03-26 12:55:42 -0500
message:
Simple fix to avoid using small.difference_update(large)
It seems the obvious thing to do, but Python's implementation scales poorly.
small = small.difference(large) scales much better [O(small) rather than O(large)].
-------------- next part --------------
=== modified file 'bzrlib/chk_map.py'
--- a/bzrlib/chk_map.py 2009-03-24 19:36:34 +0000
+++ b/bzrlib/chk_map.py 2009-03-26 17:55:42 +0000
@@ -1436,8 +1436,12 @@
# only care about external references.
node = _deserialise(bytes, record.key, search_key_func=None)
if isinstance(node, InternalNode):
- chks = set(node.refs())
- chks.difference_update(all_uninteresting_chks)
+ # all_uninteresting_chks grows large, as it lists all nodes we
+ # don't want to process (including already seen interesting
+ # nodes).
+ # small.difference_update(large) scales O(large), but
+ # small.difference(large) scales O(small).
+ chks = set(node.refs()).difference(all_uninteresting_chks)
# Is set() and .difference_update better than:
# chks = [chk for chk in node.refs()
# if chk not in all_uninteresting_chks]
More information about the bazaar-commits
mailing list