Rev 2860: Joining of annotated and plain knits (Ian Clatworthy) in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm@pqm.ubuntu.com
Tue Sep 25 09:01:11 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2860
revision-id: pqm@pqm.ubuntu.com-20070925080109-vqlnacer5iwwmxm8
parent: pqm@pqm.ubuntu.com-20070925072846-g54nzuhu1b5n3xyn
parent: ian.clatworthy@internode.on.net-20070925064345-o8jx2jhis3zh0x9s
committer: Canonical.com Patch Queue Manager <pqm@pqm.ubuntu.com>
branch nick: +trunk
timestamp: Tue 2007-09-25 09:01:09 +0100
message:
  Joining of annotated and plain knits (Ian Clatworthy)
modified:
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/knit.py                 knit.py-20051212171256-f056ac8f0fbe1bd9
  bzrlib/tests/test_knit.py      test_knit.py-20051212171302-95d4c00dd5f11f2b
  bzrlib/versionedfile.py        versionedfile.py-20060222045106-5039c71ee3b65490
    ------------------------------------------------------------
    revno: 2858.1.1
    merged: ian.clatworthy@internode.on.net-20070925064345-o8jx2jhis3zh0x9s
    parent: pqm@pqm.ubuntu.com-20070925041614-j2r43hi8rhw9ci4k
    parent: ian.clatworthy@internode.on.net-20070925064104-gdj0iea73g9iy24i
    committer: Ian Clatworthy <ian.clatworthy@internode.on.net>
    branch nick: ianc-integration2
    timestamp: Tue 2007-09-25 16:43:45 +1000
    message:
      Joining of annotated and plain knits (Ian Clatworthy)
    ------------------------------------------------------------
    revno: 2851.4.6
    merged: ian.clatworthy@internode.on.net-20070925064104-gdj0iea73g9iy24i
    parent: ian.clatworthy@internode.on.net-20070925053912-o5wat48zhxc3q5r9
    committer: Ian Clatworthy <ian.clatworthy@internode.on.net>
    branch nick: bzr.knit-pack-joins
    timestamp: Tue 2007-09-25 16:41:04 +1000
    message:
      review tweaks
    ------------------------------------------------------------
    revno: 2851.4.5
    merged: ian.clatworthy@internode.on.net-20070925053912-o5wat48zhxc3q5r9
    parent: ian.clatworthy@internode.on.net-20070925053538-efa084hz0ejvgh21
    committer: Ian Clatworthy <ian.clatworthy@internode.on.net>
    branch nick: bzr.knit-pack-joins
    timestamp: Tue 2007-09-25 15:39:12 +1000
    message:
      Update NEWS
    ------------------------------------------------------------
    revno: 2851.4.4
    merged: ian.clatworthy@internode.on.net-20070925053538-efa084hz0ejvgh21
    parent: ian.clatworthy@internode.on.net-20070925053507-pcgw7jrgdggp2mq8
    parent: pqm@pqm.ubuntu.com-20070925020712-sf3qg1j3wh0l0hz8
    committer: Ian Clatworthy <ian.clatworthy@internode.on.net>
    branch nick: bzr.knit-pack-joins
    timestamp: Tue 2007-09-25 15:35:38 +1000
    message:
      merge bzr.dev
    ------------------------------------------------------------
    revno: 2851.4.3
    merged: ian.clatworthy@internode.on.net-20070925053507-pcgw7jrgdggp2mq8
    parent: ian.clatworthy@internode.on.net-20070925045420-33ld2lbxqqwq87fi
    committer: Ian Clatworthy <ian.clatworthy@internode.on.net>
    branch nick: bzr.knit-pack-joins
    timestamp: Tue 2007-09-25 15:35:07 +1000
    message:
      fix up plain-to-annotated knit conversion
    ------------------------------------------------------------
    revno: 2851.4.2
    merged: ian.clatworthy@internode.on.net-20070925045420-33ld2lbxqqwq87fi
    parent: ian.clatworthy@internode.on.net-20070924131621-19n00l12199eklzp
    committer: Ian Clatworthy <ian.clatworthy@internode.on.net>
    branch nick: bzr.knit-pack-joins
    timestamp: Tue 2007-09-25 14:54:20 +1000
    message:
      use factory methods in annotated-to-plain conversion instead of duplicating format knowledge
    ------------------------------------------------------------
    revno: 2851.4.1
    merged: ian.clatworthy@internode.on.net-20070924131621-19n00l12199eklzp
    parent: pqm@pqm.ubuntu.com-20070924042807-nfjwj1voh6a8zddf
    committer: Ian Clatworthy <ian.clatworthy@internode.on.net>
    branch nick: bzr.knit-pack-joins
    timestamp: Mon 2007-09-24 23:16:21 +1000
    message:
      Support joining plain knits to annotated knits and vice versa
=== modified file 'NEWS'
--- a/NEWS	2007-09-25 03:21:08 +0000
+++ b/NEWS	2007-09-25 06:43:45 +0000
@@ -139,6 +139,9 @@
      paths from the root down to each element of selected_file_ids are
      returned. (Robert Collins)
 
+   * Knit joining has been enhanced to support plain to annotated conversion
+     and annotated to plain conversion. (Ian Clatworthy)
+
   TESTING:
 
 

=== modified file 'bzrlib/knit.py'
--- a/bzrlib/knit.py	2007-09-24 02:29:44 +0000
+++ b/bzrlib/knit.py	2007-09-25 05:35:07 +0000
@@ -254,7 +254,7 @@
     def parse_line_delta_iter(self, lines):
         return iter(self.parse_line_delta(lines))
 
-    def parse_line_delta(self, lines, version_id):
+    def parse_line_delta(self, lines, version_id, plain=False):
         """Convert a line based delta into internal representation.
 
         line delta is in the form of:
@@ -263,6 +263,10 @@
         revid(utf8) newline\n
         internal representation is
         (start, end, count, [1..count tuples (revid, newline)])
+
+        :param plain: If True, the lines are returned as a plain
+            list, not as a list of tuples, i.e.
+            (start, end, count, [1..count newline])
         """
         result = []
         lines = iter(lines)
@@ -274,10 +278,18 @@
             return cache.setdefault(origin, origin), text
 
         # walk through the lines parsing.
-        for header in lines:
-            start, end, count = [int(n) for n in header.split(',')]
-            contents = [tuple(next().split(' ', 1)) for i in xrange(count)]
-            result.append((start, end, count, contents))
+        # Note that the plain test is explicitly pulled out of the
+        # loop to minimise any performance impact
+        if plain:
+            for header in lines:
+                start, end, count = [int(n) for n in header.split(',')]
+                contents = [next().split(' ', 1)[1] for i in xrange(count)]
+                result.append((start, end, count, contents))
+        else:
+            for header in lines:
+                start, end, count = [int(n) for n in header.split(',')]
+                contents = [tuple(next().split(' ', 1)) for i in xrange(count)]
+                result.append((start, end, count, contents))
         return result
 
     def get_fulltext_content(self, lines):
@@ -2171,8 +2183,20 @@
         assert isinstance(self.source, KnitVersionedFile)
         assert isinstance(self.target, KnitVersionedFile)
 
+        # If the source and target are mismatched w.r.t. annotations vs
+        # plain, the data needs to be converted accordingly
+        if self.source.factory.annotated == self.target.factory.annotated:
+            converter = None
+        elif self.source.factory.annotated:
+            converter = self._anno_to_plain_converter
+        else:
+            # We're converting from a plain to an annotated knit. This requires
+            # building the annotations from scratch. The generic join code
+            # handles this implicitly so we delegate to it.
+            return super(InterKnit, self).join(pb, msg, version_ids,
+                ignore_missing)
+
         version_ids = self._get_source_version_ids(version_ids, ignore_missing)
-
         if not version_ids:
             return 0
 
@@ -2230,13 +2254,31 @@
                 assert version_id == version_id2, 'logic error, inconsistent results'
                 count = count + 1
                 pb.update("Joining knit", count, total)
-                raw_records.append((version_id, options, parents, len(raw_data)))
+                if converter:
+                    size, raw_data = converter(raw_data, version_id, options,
+                        parents)
+                else:
+                    size = len(raw_data)
+                raw_records.append((version_id, options, parents, size))
                 raw_datum.append(raw_data)
             self.target._add_raw_records(raw_records, ''.join(raw_datum))
             return count
         finally:
             pb.finished()
 
+    def _anno_to_plain_converter(self, raw_data, version_id, options,
+                                 parents):
+        """Convert annotated content to plain content."""
+        data, digest = self.source._data._parse_record(version_id, raw_data)
+        if 'fulltext' in options:
+            content = self.source.factory.parse_fulltext(data, version_id)
+            lines = self.target.factory.lower_fulltext(content)
+        else:
+            delta = self.source.factory.parse_line_delta(data, version_id,
+                plain=True)
+            lines = self.target.factory.lower_line_delta(delta)
+        return self.target._data._record_to_data(version_id, digest, lines)
+
 
 InterVersionedFile.register_optimiser(InterKnit)
 

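Editor's note: the new plain=True mode of parse_line_delta in the knit.py hunk above can be illustrated outside bzrlib. The following is a minimal standalone sketch in Python 3, not the bzrlib code itself: it is a free function rather than a factory method, it drops the version_id argument and the origin cache, and it selects a per-line converter once before the loop instead of duplicating the loop as the patch does. The sample delta and revision names are made up for illustration.

```python
def parse_line_delta(lines, plain=False):
    """Parse an annotated line delta into a list of hunks.

    Each hunk starts with a "start,end,count" header line, followed by
    `count` content lines of the form "revid text". The result is a list
    of (start, end, count, contents) tuples, where contents holds
    (revid, text) tuples, or bare text strings when plain is True.
    """
    # Decide once how each content line is converted, so the flag is
    # not re-tested for every line.
    if plain:
        def convert(line):
            return line.split(' ', 1)[1]
    else:
        def convert(line):
            return tuple(line.split(' ', 1))

    result = []
    it = iter(lines)
    for header in it:
        start, end, count = (int(n) for n in header.split(','))
        contents = [convert(next(it)) for _ in range(count)]
        result.append((start, end, count, contents))
    return result


# Hypothetical delta: replace line 1 with two lines from two revisions.
delta = ["1,2,2\n", "rev-a first\n", "rev-b second line\n"]
annotated = parse_line_delta(delta)
stripped = parse_line_delta(delta, plain=True)
```

Here `annotated` keeps the originating revision id next to each line, while `stripped` holds only the text. The text-only form is what _anno_to_plain_converter passes to the target factory's lower_line_delta.
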
=== modified file 'bzrlib/tests/test_knit.py'
--- a/bzrlib/tests/test_knit.py	2007-09-24 02:29:44 +0000
+++ b/bzrlib/tests/test_knit.py	2007-09-25 06:41:04 +0000
@@ -1315,22 +1315,38 @@
         self.assertEquals(origins[1], ('text-1', 'b\n'))
         self.assertEquals(origins[2], ('text-1', 'c\n'))
 
-    def test_knit_join(self):
-        """Store in knit with parents"""
-        k1 = KnitVersionedFile('test1', get_transport('.'), factory=KnitPlainFactory(), create=True)
-        k1.add_lines('text-a', [], split_lines(TEXT_1))
-        k1.add_lines('text-b', ['text-a'], split_lines(TEXT_1))
-
-        k1.add_lines('text-c', [], split_lines(TEXT_1))
-        k1.add_lines('text-d', ['text-c'], split_lines(TEXT_1))
-
-        k1.add_lines('text-m', ['text-b', 'text-d'], split_lines(TEXT_1))
-
-        k2 = KnitVersionedFile('test2', get_transport('.'), factory=KnitPlainFactory(), create=True)
+    def _test_join_with_factories(self, k1_factory, k2_factory):
+        k1 = KnitVersionedFile('test1', get_transport('.'), factory=k1_factory, create=True)
+        k1.add_lines('text-a', [], ['a1\n', 'a2\n', 'a3\n'])
+        k1.add_lines('text-b', ['text-a'], ['a1\n', 'b2\n', 'a3\n'])
+        k1.add_lines('text-c', [], ['c1\n', 'c2\n', 'c3\n'])
+        k1.add_lines('text-d', ['text-c'], ['c1\n', 'd2\n', 'd3\n'])
+        k1.add_lines('text-m', ['text-b', 'text-d'], ['a1\n', 'b2\n', 'd3\n'])
+        k2 = KnitVersionedFile('test2', get_transport('.'), factory=k2_factory, create=True)
         count = k2.join(k1, version_ids=['text-m'])
         self.assertEquals(count, 5)
         self.assertTrue(k2.has_version('text-a'))
         self.assertTrue(k2.has_version('text-c'))
+        origins = k2.annotate('text-m')
+        self.assertEquals(origins[0], ('text-a', 'a1\n'))
+        self.assertEquals(origins[1], ('text-b', 'b2\n'))
+        self.assertEquals(origins[2], ('text-d', 'd3\n'))
+
+    def test_knit_join_plain_to_plain(self):
+        """Test joining a plain knit with a plain knit."""
+        self._test_join_with_factories(KnitPlainFactory(), KnitPlainFactory())
+
+    def test_knit_join_anno_to_anno(self):
+        """Test joining an annotated knit with an annotated knit."""
+        self._test_join_with_factories(None, None)
+
+    def test_knit_join_anno_to_plain(self):
+        """Test joining an annotated knit with a plain knit."""
+        self._test_join_with_factories(None, KnitPlainFactory())
+
+    def test_knit_join_plain_to_anno(self):
+        """Test joining a plain knit with an annotated knit."""
+        self._test_join_with_factories(KnitPlainFactory(), None)
 
     def test_reannotate(self):
         k1 = KnitVersionedFile('knit1', get_transport('.'),

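Editor's note: the four factory pairings exercised by the tests above map onto three code paths in InterKnit.join. The sketch below restates that dispatch as a standalone Python 3 function. The function name and the returned labels are hypothetical and exist only for illustration; in the patch the same decision either sets the `converter` variable or delegates to the superclass join.

```python
def choose_join_path(source_annotated, target_annotated):
    """Return a label for the path InterKnit.join takes in this patch.

    The labels are illustrative names, not bzrlib identifiers.
    """
    if source_annotated == target_annotated:
        # Same storage format on both sides: raw records are copied as-is.
        return "copy-raw"
    if source_annotated:
        # Annotated source, plain target: annotations are stripped from
        # each record before it is written.
        return "anno-to-plain"
    # Plain source, annotated target: annotations must be built from
    # scratch, so the generic join is used.
    return "generic-join"


# The four combinations covered by the test methods.
paths = {
    (src, tgt): choose_join_path(src, tgt)
    for src in (False, True)
    for tgt in (False, True)
}
```

Only the annotated-to-plain pairing needs the per-record converter. The plain-to-annotated pairing falls back to the slower generic join because annotations cannot be derived from a single raw record.
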
=== modified file 'bzrlib/versionedfile.py'
--- a/bzrlib/versionedfile.py	2007-09-20 06:12:51 +0000
+++ b/bzrlib/versionedfile.py	2007-09-25 05:35:07 +0000
@@ -625,8 +625,9 @@
             # TODO: remove parent texts when they are not relevant any more for 
             # memory pressure reduction. RBC 20060313
             # pb.update('Converting versioned data', 0, len(order))
+            total = len(order)
             for index, version in enumerate(order):
-                pb.update('Converting versioned data', index, len(order))
+                pb.update('Converting versioned data', index, total)
                 _, _, parent_text = target.add_lines(version,
                                                self.source.get_parents(version),
                                                self.source.get_lines(version),
@@ -640,6 +641,8 @@
                                         msg,
                                         version_ids,
                                         ignore_missing)
+            else:
+                return total
         finally:
             pb.finished()
 



