Rev 5378: Cache the lines as extracted if they are used right now. in http://bazaar.launchpad.net/~jameinel/bzr/2.3-send-mem-614576

Tue Aug 10 20:02:37 BST 2010

At http://bazaar.launchpad.net/~jameinel/bzr/2.3-send-mem-614576

------------------------------------------------------------
revno: 5378
revision-id: john at arbash-meinel.com-20100810190229-uzdp8oqsa6a67i0n
parent: john at arbash-meinel.com-20100810185520-2xswt7vw2y43f0j5
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 2.3-send-mem-614576
timestamp: Tue 2010-08-10 14:02:29 -0500
message:
  Cache the lines as extracted if they are used right now.
  
  While digging into the code, it seems that the real peak memory usage
  actually occurs in _add_inventory_mpdiffs_from_serializer, which is
  another mpdiff generator that extracts *all* content texts.
  We should be able to do better there.
=== modified file 'bzrlib/versionedfile.py'

--- a/bzrlib/versionedfile.py	2010-08-10 18:55:20 +0000
+++ b/bzrlib/versionedfile.py	2010-08-10 19:02:29 +0000
@@ -306,17 +306,17 @@
                 else:
                     self.refcounts[p] = refcount - 1
                     parent_chunks = self.chunks[p]
+                p_lines = osutils.chunks_to_lines(parent_chunks)
                 # TODO: Should we cache the line form? We did the
                 #       computation to get it, but storing it this way will
                 #       be less memory efficient...
-                parent_lines.append(osutils.chunks_to_lines(parent_chunks))
+                parent_lines.append(p_lines)
+                del p_lines
             lines = osutils.chunks_to_lines(this_chunks)
-            # TODO: Should we be caching lines instead of chunks?
-            #       Higher-memory, but avoids double extracting.
-            #       If we have good topological sorting, we shouldn't have
-            #       much pending stuff cached...
-            ## this_chunks = lines
+            # Since we needed the lines, we'll go ahead and cache them this way
+            this_chunks = lines
             self._compute_diff(record.key, parent_lines, lines)
+            del lines
         # Is this content required for any more children?
         if record.key in self.refcounts:
             self.chunks[record.key] = this_chunks
@@ -1183,7 +1183,6 @@
 
     def make_mpdiffs(self, keys):
         """Create multiparent diffs for specified keys."""
-        import pdb; pdb.set_trace()
         generator = _MPDiffGenerator(self, keys)
         return generator.compute_diffs()
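
The first hunk above keeps the already-split lines in the cache instead of the raw chunks, so a parent text that is needed again does not have to be re-split. A minimal sketch of that refcounted caching idea follows. It is not the bzrlib implementation: the class RefcountedLineCache and its methods are hypothetical names used only for illustration, and chunks_to_lines here is a simplified stand-in for bzrlib.osutils.chunks_to_lines that assumes byte-string chunks split on "\n" only.

```python
import io


def chunks_to_lines(chunks):
    """Simplified stand-in (assumption): re-split byte chunks on "\\n" only."""
    return io.BytesIO(b"".join(chunks)).readlines()


class RefcountedLineCache:
    """Hypothetical cache that stores texts as lines, dropped on last use."""

    def __init__(self, refcounts):
        # refcounts maps key -> number of children that still need the text.
        self.refcounts = dict(refcounts)
        self.texts = {}

    def add(self, key, chunks):
        """Split once, and cache the line form only if a child needs it."""
        lines = chunks_to_lines(chunks)
        if key in self.refcounts:
            self.texts[key] = lines
        return lines

    def take(self, key):
        """Return the cached lines for a parent, releasing them on last use."""
        refcount = self.refcounts[key]
        if refcount == 1:
            # Last child: drop both the count and the cached text.
            del self.refcounts[key]
            return self.texts.pop(key)
        self.refcounts[key] = refcount - 1
        return self.texts[key]


cache = RefcountedLineCache({"parent": 2})
cache.add("parent", [b"a\nb", b"\nc\n"])
first = cache.take("parent")   # still cached, one reference left
second = cache.take("parent")  # last reference, entry is removed
```

The trade-off is the one noted in the removed TODO comments: the line form takes more memory per cached text than the chunk form, but each text is split only once, and the refcounts keep the number of pending cached texts small when keys are processed in a good topological order.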