Rev 3807: Getting rid of Inv.copy() and changing the LRU into a FIFO drops the time in http://bzr.arbash-meinel.com/branches/bzr/brisbane/merge_dev
John Arbash Meinel
john at arbash-meinel.com
Tue Dec 9 04:27:12 GMT 2008
At http://bzr.arbash-meinel.com/branches/bzr/brisbane/merge_dev
------------------------------------------------------------
revno: 3807
revision-id: john at arbash-meinel.com-20081209042646-2ncxnd5hxio03et8
parent: john at arbash-meinel.com-20081207182513-00r47hr8fevohotb
parent: john at arbash-meinel.com-20081208183149-0m60m72x4pw378ib
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: merge_dev
timestamp: Mon 2008-12-08 22:26:46 -0600
message:
Getting rid of Inv.copy() and changing the LRU into a FIFO drops the
time to 2m19s. (Down from 4m originally.)
Also hacked up Inventory.add() to be minimal.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/inventory.py inventory.py-20050309040759-6648b84ca2005b37
bzrlib/lru_cache.py lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
bzrlib/xml5.py xml5.py-20080328030717-t9guwinq8hom0ar3-1
bzrlib/xml8.py xml5.py-20050907032657-aac8f960815b66b1
------------------------------------------------------------
revno: 3801.1.3
revision-id: john at arbash-meinel.com-20081208183149-0m60m72x4pw378ib
parent: john at arbash-meinel.com-20081207184233-fu65des8q4jhr8ig
parent: john at arbash-meinel.com-20081208183041-9r88rfoms0y8cr5b
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: debug_hacks
timestamp: Mon 2008-12-08 12:31:49 -0600
message:
Merge the XML entry cache.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/lru_cache.py lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
bzrlib/xml8.py xml5.py-20050907032657-aac8f960815b66b1
------------------------------------------------------------
revno: 3735.28.110
revision-id: john at arbash-meinel.com-20081208183041-9r88rfoms0y8cr5b
parent: john at arbash-meinel.com-20081208183004-gaxfel9xzzwrc093
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: xml_cache
timestamp: Mon 2008-12-08 12:30:41 -0600
message:
If we are going to thrash the inventory entry cache, increase its size.
modified:
bzrlib/xml8.py xml5.py-20050907032657-aac8f960815b66b1
------------------------------------------------------------
revno: 3735.28.109
revision-id: john at arbash-meinel.com-20081208183004-gaxfel9xzzwrc093
parent: john at arbash-meinel.com-20081208182757-2rls8q1ri36ub6e9
parent: john at arbash-meinel.com-20081208182300-u1qmnxafwt2rr5dz
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: xml_cache
timestamp: Mon 2008-12-08 12:30:04 -0600
message:
Merge the lru cache changes.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/lru_cache.py lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
------------------------------------------------------------
revno: 3735.137.1
revision-id: john at arbash-meinel.com-20081208182300-u1qmnxafwt2rr5dz
parent: pqm at pqm.ubuntu.com-20081205181554-ofrdnafloc43bxkh
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: lru_cache
timestamp: Mon 2008-12-08 12:23:00 -0600
message:
Add LRUCache.resize(), and change the init arguments for LRUCache.
The old name was confusing, and caused LRUSizeCache to re-use LRUCache's
variables in a confusing way.
Also, this changes the default cleanup size to 80% of max_size. This should
be better: it gives us a little bit of room when adding keys, so rather
than having to clean up after every add, we can do it in batches.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/lru_cache.py lru_cache.py-20070119165515-tlw203kuwh0id5gv-1
bzrlib/tests/test_lru_cache.py test_lru_cache.py-20070119165535-hph6rk4h9rzy4180-1
------------------------------------------------------------
revno: 3735.28.108
revision-id: john at arbash-meinel.com-20081208182757-2rls8q1ri36ub6e9
parent: pqm at pqm.ubuntu.com-20081205181554-ofrdnafloc43bxkh
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: xml_cache
timestamp: Mon 2008-12-08 12:27:57 -0600
message:
Add an InventoryEntry cache to the xml deserializer.
modified:
bzrlib/xml8.py xml5.py-20050907032657-aac8f960815b66b1
------------------------------------------------------------
revno: 3801.1.2
revision-id: john at arbash-meinel.com-20081207184233-fu65des8q4jhr8ig
parent: john at arbash-meinel.com-20081207174338-vx3sm4yzedmk338j
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: debug_hacks
timestamp: Sun 2008-12-07 12:42:33 -0600
message:
Don't execute pack ops twice.
modified:
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
------------------------------------------------------------
revno: 3801.1.1
revision-id: john at arbash-meinel.com-20081207174338-vx3sm4yzedmk338j
parent: john at arbash-meinel.com-20081207174043-xjxckelplsqdey73
parent: john at arbash-meinel.com-20081203043230-f0riipwg6wkjr6ae
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: merge_dev
timestamp: Sun 2008-12-07 11:43:38 -0600
message:
Merge in the debug_hacks.
modified:
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
------------------------------------------------------------
revno: 3791.1.16
revision-id: john at arbash-meinel.com-20081203043230-f0riipwg6wkjr6ae
parent: john at arbash-meinel.com-20081203041138-ecssp5m0mqxjgzhu
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: chk_map
timestamp: Tue 2008-12-02 22:32:30 -0600
message:
Hack in some other code, so we can determine how much compression we get.
This just tracks the 'old size' of all the packs that are getting combined versus the
'new size' of the newly created pack file.
modified:
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
------------------------------------------------------------
revno: 3791.1.15
revision-id: john at arbash-meinel.com-20081203041138-ecssp5m0mqxjgzhu
parent: john at arbash-meinel.com-20081203035643-0x4npc4s8mh8nqlh
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: chk_map
timestamp: Tue 2008-12-02 22:11:38 -0600
message:
Add size information to the mutter when -Dpack is used.
Also fix a bug in -Dpack when the repository doesn't support chk_bytes.
modified:
bzrlib/repofmt/pack_repo.py pack_repo.py-20070813041115-gjv5ma7ktfqwsjgn-1
-------------- next part --------------
=== modified file 'NEWS'
--- a/NEWS 2008-12-07 17:40:43 +0000
+++ b/NEWS 2008-12-08 18:31:49 +0000
@@ -9,6 +9,11 @@
--------------
CHANGES:
+
+ * ``LRUCache(after_cleanup_size)`` was renamed to
+ ``after_cleanup_count`` and the old name deprecated. The new name is
+ used for clarity, and to avoid confusion with
+ ``LRUSizeCache(after_cleanup_size)``. (John Arbash Meinel)
NEW FEATURES:
=== modified file 'bzrlib/inventory.py'
--- a/bzrlib/inventory.py 2008-12-07 17:40:43 +0000
+++ b/bzrlib/inventory.py 2008-12-09 04:26:46 +0000
@@ -1137,6 +1137,16 @@
parent.children[entry.name] = entry
return self._add_child(entry)
+ def _add_one_entry_no_checks(self, entry, last_parent):
+ assert last_parent is not None
+ if last_parent.file_id == entry.parent_id:
+ parent = last_parent
+ else:
+ parent = self._byid[entry.parent_id]
+ parent.children[entry.name] = entry
+ self._byid[entry.file_id] = entry
+ return parent
+
def add_path(self, relpath, kind, file_id=None, parent_id=None):
"""Add entry from a path.
=== modified file 'bzrlib/lru_cache.py'
--- a/bzrlib/lru_cache.py 2008-10-14 20:19:06 +0000
+++ b/bzrlib/lru_cache.py 2008-12-08 18:23:00 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2006 Canonical Ltd
+# Copyright (C) 2006, 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -17,25 +17,26 @@
"""A simple least-recently-used (LRU) cache."""
from collections import deque
-import gc
+
+from bzrlib import symbol_versioning
class LRUCache(object):
"""A class which manages a cache of entries, removing unused ones."""
- def __init__(self, max_cache=100, after_cleanup_size=None):
- self._max_cache = max_cache
- if after_cleanup_size is None:
- self._after_cleanup_size = self._max_cache
- else:
- self._after_cleanup_size = min(after_cleanup_size, self._max_cache)
-
- self._compact_queue_length = 4*self._max_cache
-
+ def __init__(self, max_cache=100, after_cleanup_count=None,
+ after_cleanup_size=symbol_versioning.DEPRECATED_PARAMETER):
+ if symbol_versioning.deprecated_passed(after_cleanup_size):
+ symbol_versioning.warn('LRUCache.__init__(after_cleanup_size) was'
+ ' deprecated in 1.11. Use'
+ ' after_cleanup_count instead.',
+ DeprecationWarning)
+ after_cleanup_count = after_cleanup_size
self._cache = {}
self._cleanup = {}
self._queue = deque() # Track when things are accessed
self._refcount = {} # number of entries in self._queue for each key
+ self._update_max_cache(max_cache, after_cleanup_count)
def __contains__(self, key):
return key in self._cache
@@ -89,11 +90,13 @@
"""Clear the cache until it shrinks to the requested size.
This does not completely wipe the cache, just makes sure it is under
- the after_cleanup_size.
+ the after_cleanup_count.
"""
# Make sure the cache is shrunk to the correct size
- while len(self._cache) > self._after_cleanup_size:
+ while len(self._cache) > self._after_cleanup_count:
self._remove_lru()
+ # No need to compact the queue at this point, because the code that
+ # calls this would have already triggered it based on queue length
def __setitem__(self, key, value):
"""Add a value to the cache, there will be no cleanup function."""
@@ -150,6 +153,23 @@
while self._cache:
self._remove_lru()
+ def resize(self, max_cache, after_cleanup_count=None):
+ """Change the number of entries that will be cached."""
+ self._update_max_cache(max_cache,
+ after_cleanup_count=after_cleanup_count)
+
+ def _update_max_cache(self, max_cache, after_cleanup_count=None):
+ self._max_cache = max_cache
+ if after_cleanup_count is None:
+ self._after_cleanup_count = self._max_cache * 8 / 10
+ else:
+ self._after_cleanup_count = min(after_cleanup_count, self._max_cache)
+
+ self._compact_queue_length = 4*self._max_cache
+ if len(self._queue) > self._compact_queue_length:
+ self._compact_queue()
+ self.cleanup()
+
class LRUSizeCache(LRUCache):
"""An LRUCache that removes things based on the size of the values.
@@ -175,20 +195,15 @@
The function should take the form "compute_size(value) => integer".
If not supplied, it defaults to 'len()'
"""
- # This approximates that texts are > 0.5k in size. It only really
- # effects when we clean up the queue, so we don't want it to be too
- # large.
- LRUCache.__init__(self, max_cache=int(max_size/512))
- self._max_size = max_size
- if after_cleanup_size is None:
- self._after_cleanup_size = self._max_size
- else:
- self._after_cleanup_size = min(after_cleanup_size, self._max_size)
-
self._value_size = 0
self._compute_size = compute_size
if compute_size is None:
self._compute_size = len
+ # This approximates that texts are > 0.5k in size. It only really
+ # affects when we clean up the queue, so we don't want it to be too
+ # large.
+ self._update_max_size(max_size, after_cleanup_size=after_cleanup_size)
+ LRUCache.__init__(self, max_cache=max(int(max_size/512), 1))
def add(self, key, value, cleanup=None):
"""Add a new value to the cache.
@@ -229,3 +244,16 @@
"""Remove an entry, making sure to maintain the invariants."""
val = LRUCache._remove(self, key)
self._value_size -= self._compute_size(val)
+
+ def resize(self, max_size, after_cleanup_size=None):
+ """Change the number of bytes that will be cached."""
+ self._update_max_size(max_size, after_cleanup_size=after_cleanup_size)
+ max_cache = max(int(max_size/512), 1)
+ self._update_max_cache(max_cache)
+
+ def _update_max_size(self, max_size, after_cleanup_size=None):
+ self._max_size = max_size
+ if after_cleanup_size is None:
+ self._after_cleanup_size = self._max_size * 8 / 10
+ else:
+ self._after_cleanup_size = min(after_cleanup_size, self._max_size)
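
To make the new semantics concrete, a usage sketch (assumes a bzrlib tree
with this revision applied; behaviour follows the code above):

    from bzrlib import lru_cache

    # after_cleanup_count defaults to 80% of max_cache, i.e. 8 here.
    cache = lru_cache.LRUCache(max_cache=10)
    for i in range(11):
        cache[i] = i
    # Adding the 11th entry overflowed max_cache, so the cache was
    # trimmed back to after_cleanup_count in one batch.
    assert len(cache) == 8

    # resize() updates the limits and cleans up immediately:
    # 80% of 4 truncates to 3 entries.
    cache.resize(max_cache=4)
    assert len(cache) == 3

The 80% default is the batching described in the commit message: each
overflow pays for one cleanup pass that frees 20% of the capacity, rather
than evicting exactly one entry on every add.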
=== modified file 'bzrlib/repofmt/pack_repo.py'
--- a/bzrlib/repofmt/pack_repo.py 2008-12-07 18:25:13 +0000
+++ b/bzrlib/repofmt/pack_repo.py 2008-12-09 04:26:46 +0000
@@ -418,9 +418,15 @@
'../packs/' + self.name + '.pack')
self._state = 'finished'
if 'pack' in debug.debug_flags:
+ try:
+ size = self.pack_transport.stat(self.name + '.pack').st_size
+ size /= 1024.*1024
+ except errors.TransportNotPossible:
+ size = -1
# XXX: size might be interesting?
- mutter('%s: create_pack: pack renamed into place: %s%s->%s%s t+%6.3fs',
- time.ctime(), self.upload_transport.base, self.random_name,
+ mutter('%s: create_pack: pack renamed into place (%.3fMB): %s%s->%s%s'
+ ' t+%6.3fs',
+ time.ctime(), size, self.upload_transport.base, self.random_name,
self.pack_transport, self.name,
time.time() - self.start_time)
@@ -815,10 +821,17 @@
rev_count = len(self.revision_ids)
else:
rev_count = 'all'
- mutter('%s: create_pack: creating pack from source packs: '
+ size = 0
+ for a_pack in self.packs:
+ try:
+ size += a_pack.pack_transport.stat(a_pack.name + '.pack').st_size
+ except errors.TransportNotPossible:
+ pass
+ size /= 1024.*1024
+ mutter('%s: create_pack: creating pack from source packs (%.3fMB): '
'%s%s %s revisions wanted %s t=0',
- time.ctime(), self._pack_collection._upload_transport.base, new_pack.random_name,
- plain_pack_list, rev_count)
+ time.ctime(), size, self._pack_collection._upload_transport.base,
+ new_pack.random_name, plain_pack_list, rev_count)
self._copy_revision_texts()
self._copy_inventory_texts()
self._copy_text_texts()
@@ -1376,8 +1389,17 @@
'containing %d revisions. Packing %d files into %d affecting %d'
' revisions', self, total_packs, total_revisions, num_old_packs,
num_new_packs, num_revs_affected)
- self._execute_pack_operations(pack_operations)
- mutter('Auto-packing repository %s completed', self)
+ old_size, new_size = self._execute_pack_operations(pack_operations)
+ if old_size is None:
+ old_size = -1
+ else:
+ old_size /= (1024.0*1024)
+ if new_size is None:
+ new_size = -1
+ else:
+ new_size /= (1024.0*1024)
+ mutter('Auto-packing repository %s completed %.3fMB => %.3fMB',
+ self.transport.base, old_size, new_size)
return True
def _execute_pack_operations(self, pack_operations, _packer_class=Packer):
@@ -1387,19 +1409,32 @@
:param _packer_class: The class of packer to use (default: Packer).
- :return: None.
+ :return: (old_size, new_size) totals in bytes for the packs removed
+ and created, or None when a transport cannot report sizes.
"""
+ new_size = 0
for revision_count, packs in pack_operations:
# we may have no-ops from the setup logic
if len(packs) == 0:
continue
- _packer_class(self, packs, '.autopack').pack()
+ new_pack = _packer_class(self, packs, '.autopack').pack()
+ try:
+ new_size += new_pack.pack_transport.stat(new_pack.name + '.pack').st_size
+ except errors.TransportNotPossible:
+ new_size = None
for pack in packs:
self._remove_pack_from_memory(pack)
# record the newly available packs and stop advertising the old
# packs
+ if new_size is None:
+ old_size = None
+ else:
+ old_size = 0
+ for revision_count, packs in pack_operations:
+ for a_pack in packs:
+ old_size += a_pack.pack_transport.stat(a_pack.name + '.pack').st_size
self._save_pack_names(clear_obsolete_packs=True)
# Move the old packs out of the way now they are no longer referenced.
for revision_count, packs in pack_operations:
self._obsolete_packs(packs)
+ return old_size, new_size
def lock_names(self):
"""Acquire the mutex around the pack-names index.
=== modified file 'bzrlib/tests/test_lru_cache.py'
--- a/bzrlib/tests/test_lru_cache.py 2008-10-14 20:19:06 +0000
+++ b/bzrlib/tests/test_lru_cache.py 2008-12-08 18:23:00 +0000
@@ -1,4 +1,4 @@
-# Copyright (C) 2006 Canonical Ltd
+# Copyright (C) 2006, 2008 Canonical Ltd
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
@@ -38,7 +38,7 @@
def test_overflow(self):
"""Adding extra entries will pop out old ones."""
- cache = lru_cache.LRUCache(max_cache=1)
+ cache = lru_cache.LRUCache(max_cache=1, after_cleanup_count=1)
cache['foo'] = 'bar'
# With a max cache of 1, adding 'baz' should pop out 'foo'
@@ -113,7 +113,7 @@
self.assertEqual([(2, 20), (2, 25)], cleanup_called)
def test_len(self):
- cache = lru_cache.LRUCache(max_cache=10)
+ cache = lru_cache.LRUCache(max_cache=10, after_cleanup_count=10)
cache[1] = 10
cache[2] = 20
@@ -140,8 +140,8 @@
# We hit the max
self.assertEqual(10, len(cache))
- def test_cleanup_shrinks_to_after_clean_size(self):
- cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=3)
+ def test_cleanup_shrinks_to_after_clean_count(self):
+ cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=3)
cache.add(1, 10)
cache.add(2, 20)
@@ -156,15 +156,16 @@
self.assertEqual(3, len(cache))
def test_after_cleanup_larger_than_max(self):
- cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=10)
- self.assertEqual(5, cache._after_cleanup_size)
+ cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=10)
+ self.assertEqual(5, cache._after_cleanup_count)
def test_after_cleanup_none(self):
- cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=None)
- self.assertEqual(5, cache._after_cleanup_size)
+ cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=None)
+ # By default _after_cleanup_count is 80% of max_cache
+ self.assertEqual(4, cache._after_cleanup_count)
def test_cleanup(self):
- cache = lru_cache.LRUCache(max_cache=5, after_cleanup_size=2)
+ cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=2)
# Add these in order
cache.add(1, 10)
@@ -214,7 +215,7 @@
self.assertIs(obj, cache.get(3, obj))
def test_keys(self):
- cache = lru_cache.LRUCache(max_cache=5)
+ cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=5)
cache[1] = 2
cache[2] = 3
@@ -225,6 +226,52 @@
cache[6] = 7
self.assertEqual([2, 3, 4, 5, 6], sorted(cache.keys()))
+ def test_after_cleanup_size_deprecated(self):
+ obj = self.callDeprecated([
+ 'LRUCache.__init__(after_cleanup_size) was deprecated in 1.11.'
+ ' Use after_cleanup_count instead.'],
+ lru_cache.LRUCache, 50, after_cleanup_size=25)
+ self.assertEqual(obj._after_cleanup_count, 25)
+
+ def test_resize_smaller(self):
+ cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=4)
+ cache[1] = 2
+ cache[2] = 3
+ cache[3] = 4
+ cache[4] = 5
+ cache[5] = 6
+ self.assertEqual([1, 2, 3, 4, 5], sorted(cache.keys()))
+ cache[6] = 7
+ self.assertEqual([3, 4, 5, 6], sorted(cache.keys()))
+ # Now resize to something smaller, which triggers a cleanup
+ cache.resize(max_cache=3, after_cleanup_count=2)
+ self.assertEqual([5, 6], sorted(cache.keys()))
+ # Adding something will use the new size
+ cache[7] = 8
+ self.assertEqual([5, 6, 7], sorted(cache.keys()))
+ cache[8] = 9
+ self.assertEqual([7, 8], sorted(cache.keys()))
+
+ def test_resize_larger(self):
+ cache = lru_cache.LRUCache(max_cache=5, after_cleanup_count=4)
+ cache[1] = 2
+ cache[2] = 3
+ cache[3] = 4
+ cache[4] = 5
+ cache[5] = 6
+ self.assertEqual([1, 2, 3, 4, 5], sorted(cache.keys()))
+ cache[6] = 7
+ self.assertEqual([3, 4, 5, 6], sorted(cache.keys()))
+ cache.resize(max_cache=8, after_cleanup_count=6)
+ self.assertEqual([3, 4, 5, 6], sorted(cache.keys()))
+ cache[7] = 8
+ cache[8] = 9
+ cache[9] = 10
+ cache[10] = 11
+ self.assertEqual([3, 4, 5, 6, 7, 8, 9, 10], sorted(cache.keys()))
+ cache[11] = 12 # triggers cleanup back to new after_cleanup_count
+ self.assertEqual([6, 7, 8, 9, 10, 11], sorted(cache.keys()))
+
class TestLRUSizeCache(tests.TestCase):
@@ -232,7 +279,7 @@
cache = lru_cache.LRUSizeCache()
self.assertEqual(2048, cache._max_cache)
self.assertEqual(4*2048, cache._compact_queue_length)
- self.assertEqual(cache._max_size, cache._after_cleanup_size)
+ self.assertEqual(int(cache._max_size*0.8), cache._after_cleanup_size)
self.assertEqual(0, cache._value_size)
def test_add_tracks_size(self):
@@ -332,3 +379,37 @@
cache[2] = 'b'
cache[3] = 'cdef'
self.assertEqual([1, 2, 3], sorted(cache.keys()))
+
+ def test_resize_smaller(self):
+ cache = lru_cache.LRUSizeCache(max_size=10, after_cleanup_size=9)
+ cache[1] = 'abc'
+ cache[2] = 'def'
+ cache[3] = 'ghi'
+ cache[4] = 'jkl'
+ # Triggers a cleanup
+ self.assertEqual([2, 3, 4], sorted(cache.keys()))
+ # Resize should also cleanup again
+ cache.resize(max_size=6, after_cleanup_size=4)
+ self.assertEqual([4], sorted(cache.keys()))
+ # Adding should use the new max size
+ cache[5] = 'mno'
+ self.assertEqual([4, 5], sorted(cache.keys()))
+ cache[6] = 'pqr'
+ self.assertEqual([6], sorted(cache.keys()))
+
+ def test_resize_larger(self):
+ cache = lru_cache.LRUSizeCache(max_size=10, after_cleanup_size=9)
+ cache[1] = 'abc'
+ cache[2] = 'def'
+ cache[3] = 'ghi'
+ cache[4] = 'jkl'
+ # Triggers a cleanup
+ self.assertEqual([2, 3, 4], sorted(cache.keys()))
+ cache.resize(max_size=15, after_cleanup_size=12)
+ self.assertEqual([2, 3, 4], sorted(cache.keys()))
+ cache[5] = 'mno'
+ cache[6] = 'pqr'
+ self.assertEqual([2, 3, 4, 5, 6], sorted(cache.keys()))
+ cache[7] = 'stu'
+ self.assertEqual([4, 5, 6, 7], sorted(cache.keys()))
+
=== modified file 'bzrlib/xml5.py'
--- a/bzrlib/xml5.py 2008-04-24 07:22:53 +0000
+++ b/bzrlib/xml5.py 2008-12-09 04:26:46 +0000
@@ -44,13 +44,28 @@
if data_revision_id is not None:
revision_id = cache_utf8.encode(data_revision_id)
inv = inventory.Inventory(root_id, revision_id=revision_id)
+ last = (inv.root.file_id, inv.root, inv.root.children)
+ byid = inv._byid
+ unpack_entry = self._unpack_entry
for e in elt:
- ie = self._unpack_entry(e)
- if ie.parent_id is None:
- ie.parent_id = root_id
- inv.add(ie)
+ ie = unpack_entry(e)
+ parent_id = ie.parent_id
+ if parent_id is None:
+ ie.parent_id = parent_id = root_id
+ if last[0] == parent_id:
+ next = last
+ else:
+ parent = byid[parent_id]
+ next = (parent_id, parent, parent.children)
+ next[2][ie.name] = ie
+ byid[ie.file_id] = ie
+ last = next
if revision_id is not None:
inv.root.revision = revision_id
+ if len(inv) > xml8._entry_cache._max_cache:
+ new_len = len(inv) * 1.2
+ trace.note('Resizing inventory cache to %s', new_len)
+ xml8._entry_cache.resize(new_len)
return inv
def _check_revisions(self, inv):
=== modified file 'bzrlib/xml8.py'
--- a/bzrlib/xml8.py 2008-04-24 07:22:53 +0000
+++ b/bzrlib/xml8.py 2008-12-09 04:26:46 +0000
@@ -21,7 +21,9 @@
cache_utf8,
errors,
inventory,
+ lru_cache,
revision as _mod_revision,
+ trace,
)
from bzrlib.xml_serializer import SubElement, Element, Serializer
from bzrlib.inventory import ROOT_ID, Inventory, InventoryEntry
@@ -38,6 +40,8 @@
"<":"<",
">":">",
}
+# A cache of InventoryEntry objects
+_entry_cache = lru_cache.LRUCache(10*1024)
def _ensure_utf8_re():
@@ -352,44 +356,65 @@
for e in elt:
ie = self._unpack_entry(e)
inv.add(ie)
+ if len(inv) > _entry_cache._max_cache:
+ new_len = len(inv) * 1.2
+ trace.note('Resizing inventory cache to %s', new_len)
+ _entry_cache.resize(new_len)
return inv
- def _unpack_entry(self, elt):
+ def _unpack_entry(self, elt, _entry_cache=_entry_cache,
+ _raw_cache=_entry_cache._cache):
+ elt_get = elt.get
+ file_id = elt_get('file_id')
+ revision = elt_get('revision')
+ # Check and see if we have already unpacked this exact entry
+ key = (file_id, revision)
+ try:
+ # Using the raw cache (basically FIFO instead of LRU) saves 30s
+ raw_ie = _raw_cache[key]
+ except KeyError:
+ pass
+ else:
+ # calling .copy() only on directories saves 15s
+ if raw_ie.kind == 'directory':
+ return raw_ie.copy()
+ return raw_ie
+
kind = elt.tag
if not InventoryEntry.versionable_kind(kind):
raise AssertionError('unsupported entry kind %s' % kind)
get_cached = _get_utf8_or_ascii
- parent_id = elt.get('parent_id')
+ file_id = get_cached(file_id)
+ if revision is not None:
+ revision = get_cached(revision)
+ parent_id = elt_get('parent_id')
if parent_id is not None:
parent_id = get_cached(parent_id)
- file_id = get_cached(elt.get('file_id'))
if kind == 'directory':
ie = inventory.InventoryDirectory(file_id,
- elt.get('name'),
+ elt_get('name'),
parent_id)
elif kind == 'file':
ie = inventory.InventoryFile(file_id,
- elt.get('name'),
+ elt_get('name'),
parent_id)
- ie.text_sha1 = elt.get('text_sha1')
- if elt.get('executable') == 'yes':
+ ie.text_sha1 = elt_get('text_sha1')
+ if elt_get('executable') == 'yes':
ie.executable = True
- v = elt.get('text_size')
+ v = elt_get('text_size')
ie.text_size = v and int(v)
elif kind == 'symlink':
ie = inventory.InventoryLink(file_id,
- elt.get('name'),
+ elt_get('name'),
parent_id)
- ie.symlink_target = elt.get('symlink_target')
+ ie.symlink_target = elt_get('symlink_target')
else:
raise errors.UnsupportedInventoryKind(kind)
- revision = elt.get('revision')
- if revision is not None:
- revision = get_cached(revision)
ie.revision = revision
+ _entry_cache[key] = ie
return ie
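
Two tricks in this hunk deserve a note. Reading _raw_cache (the plain dict
behind the LRUCache) skips the recency bookkeeping on every hit, so the
eviction order degrades from LRU to FIFO; per the comment, that alone saved
30s. And only directory entries are copied on a hit, because their
children dict is mutated as each inventory is rebuilt, while file and
symlink entries can be shared safely. A standalone sketch of the lookup
shape (build_entry is a hypothetical stand-in for the unpacking code):

    from bzrlib import lru_cache

    _entry_cache = lru_cache.LRUCache(10*1024)
    _raw_cache = _entry_cache._cache  # plain dict: hits skip the LRU queue

    def cached_unpack(file_id, revision, build_entry):
        key = (file_id, revision)
        try:
            ie = _raw_cache[key]      # FIFO-style hit, no queue append
        except KeyError:
            ie = build_entry(file_id, revision)
            _entry_cache[key] = ie    # misses use the real LRU interface
            return ie
        if ie.kind == 'directory':
            return ie.copy()          # children will be mutated per-tree
        return ie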