Rev 4301: (jam) Tweaks to the pure-python group compressor, in file:///home/pqm/archives/thelove/bzr/%2Btrunk/
Canonical.com Patch Queue Manager
pqm at pqm.ubuntu.com
Thu Apr 23 03:08:23 BST 2009
At file:///home/pqm/archives/thelove/bzr/%2Btrunk/
------------------------------------------------------------
revno: 4301
revision-id: pqm at pqm.ubuntu.com-20090423015537-xfgqsbjj9ctpcd3o
parent: pqm at pqm.ubuntu.com-20090420092748-tm2cofylpjauo1nw
parent: john at arbash-meinel.com-20090423005830-kkdc31tqjetbj2f0
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Thu 2009-04-23 02:55:37 +0100
message:
(jam) Tweaks to the pure-python group compressor,
shrinks the time from 30min => 4min in some circumstances.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
bzrlib/tests/test__groupcompress.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 4300.1.7
revision-id: john at arbash-meinel.com-20090423005830-kkdc31tqjetbj2f0
parent: john at arbash-meinel.com-20090422231241-rb3imoltcpzeghfe
parent: john at arbash-meinel.com-20090421235416-f0cz6ilf5cufbugi
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_info
timestamp: Wed 2009-04-22 19:58:30 -0500
message:
Bring in the other test cases.
Also, remove the assert statements.
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
bzrlib/tests/test__groupcompress.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 4300.2.1
revision-id: john at arbash-meinel.com-20090421235416-f0cz6ilf5cufbugi
parent: pqm at pqm.ubuntu.com-20090420092748-tm2cofylpjauo1nw
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: 1.15-gc-python
timestamp: Tue 2009-04-21 18:54:16 -0500
message:
Fix bug #364900: properly remove the 64kB that was just encoded from the remaining copy length.
Also, stop supporting None as a copy length in 'encode_copy_instruction'.
It was only used by the test suite, and it is good to pull that sort of thing out of
production code. (Besides, passing a length of 64kB produces the same encoding;
a sketch of that encoding follows this entry.)
modified:
NEWS NEWS-20050323055033-4e00b5db738777ff
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
bzrlib/tests/test__groupcompress.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
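A minimal standalone sketch of that copy-instruction encoding, reconstructed from the
encode_copy_instruction hunk further down and the test vectors in test__groupcompress.py;
treat it as an approximation rather than the exact bzrlib implementation. The command byte
has its high bit set, bits 0-3 flag which little-endian offset bytes follow, and bits 4-5
flag which length bytes follow; a length of exactly 64kB is implied by emitting no length
bytes at all, which is why passing 64*1024 replaces the old None spelling.

def encode_copy_instruction(offset, length):
    # Sketch only: approximates the pure-python encoder in _groupcompress_py.py.
    copy_command = 0x80
    copy_bytes = []
    for copy_bit in (0x01, 0x02, 0x04, 0x08):
        base_byte = offset & 0xff
        if base_byte:
            copy_command |= copy_bit
            copy_bytes.append(chr(base_byte))
        offset >>= 8
    if length is None:
        raise ValueError("cannot supply a length of None")
    if length > 0x10000:
        raise ValueError("we don't emit copy records for lengths > 64KiB")
    if length == 0:
        raise ValueError("we don't emit copy records for lengths == 0")
    if length != 0x10000:
        # A 64kB copy is implied by having no length bytes at all.
        for copy_bit in (0x10, 0x20):
            base_byte = length & 0xff
            if base_byte:
                copy_command |= copy_bit
                copy_bytes.append(chr(base_byte))
            length >>= 8
    return chr(copy_command) + ''.join(copy_bytes)

# encode_copy_instruction(0, 64*1024) == '\x80'
# encode_copy_instruction(257, 64*1024) == '\x83\x01\x01'
# encode_copy_instruction(0, 1) == '\x90\x01'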
------------------------------------------------------------
revno: 4300.1.6
revision-id: john at arbash-meinel.com-20090422231241-rb3imoltcpzeghfe
parent: john at arbash-meinel.com-20090422225955-xkcuonztuijyxec2
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_info
timestamp: Wed 2009-04-22 18:12:41 -0500
message:
Remove a couple TODOs that don't matter.
modified:
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
------------------------------------------------------------
revno: 4300.1.5
revision-id: john at arbash-meinel.com-20090422225955-xkcuonztuijyxec2
parent: john at arbash-meinel.com-20090422221458-wg8pwibhdvgvvths
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_info
timestamp: Wed 2009-04-22 17:59:55 -0500
message:
A couple more cleanups of the pure-python implementation.
This drops the time for 'bzr pack' from 30min+ down to 4min.
1) Keep the matching entries as a set, rather than keeping a list and
casting it into a set all the time.
2) Delay the +1 increment until we actually compare against the next line's
matches, and then only increment the small set rather than the large one:
'prev' has gone through a set intersection in most code paths, so it will be
a lot smaller than the raw 'locations'.
A sketch of the resulting matching loop follows this entry.
modified:
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
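That sketch, condensed (the function name is simplified; the real method is
LinesDeltaIndex._get_longest_match in the hunk further down): 'matching' maps each indexed
line to the set of positions where it occurs in the source text, and a run is extended by
intersecting the current line's position set with the previous, already-intersected
(hence small) set shifted by one.

def longest_match(matching, new_lines, pos):
    # Sketch: returns ((old_start, new_start, length), next_pos), or
    # (None, next_pos) when new_lines[pos] matches nothing in the source.
    range_start, range_len, prev_locations = pos, 0, None
    max_pos = len(new_lines)
    while pos < max_pos:
        try:
            locations = matching[new_lines[pos]]
        except KeyError:
            pos += 1
            break
        if prev_locations is None:
            # First line of a new matching range
            prev_locations = locations
            range_len = 1
        else:
            # Only the small, already-intersected set gets the +1 here.
            next_locations = locations.intersection(
                [loc + 1 for loc in prev_locations])
            if not next_locations:
                break   # no region continues to match
            prev_locations = next_locations
            range_len += 1
        pos += 1
    if prev_locations is None:
        return None, pos
    return (min(prev_locations) - range_len + 1, range_start, range_len), pos

Building 'matching' is roughly: for idx, line in enumerate(source_lines):
matching.setdefault(line, set()).add(idx), which is what the set-based
_update_matching_lines change below now does.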
------------------------------------------------------------
revno: 4300.1.4
revision-id: john at arbash-meinel.com-20090422221458-wg8pwibhdvgvvths
parent: john at arbash-meinel.com-20090422205425-ujz47ris3ekak1h4
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_info
timestamp: Wed 2009-04-22 17:14:58 -0500
message:
Change self._matching_lines to use a set rather than a list.
We still need to consider memory consumption, etc., but it means we don't
have to cast into a set() to do the intersection check.
We might consider redoing the copy_ends code of _get_longest_match.
modified:
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
------------------------------------------------------------
revno: 4300.1.3
revision-id: john at arbash-meinel.com-20090422205425-ujz47ris3ekak1h4
parent: john at arbash-meinel.com-20090422204951-xykrubpy1zehhr9p
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_info
timestamp: Wed 2009-04-22 15:54:25 -0500
message:
The assertion is <= 127, not < 127
modified:
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
------------------------------------------------------------
revno: 4300.1.2
revision-id: john at arbash-meinel.com-20090422204951-xykrubpy1zehhr9p
parent: john at arbash-meinel.com-20090422171845-5dmqokv8ygf3cvs5
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_info
timestamp: Wed 2009-04-22 15:49:51 -0500
message:
Change the pure-python compressor a bit.
Specifically, change how we encode insertions, and factor that code out into
another class.
The primary change is trying to get better line-based alignment for inserts,
subject to the 127-byte insert limit.
The old code would take a long insert, split it into 127-byte chunks, and then
split those chunks into lines.
However, that tends to leave hunks that can't be indexed, because they aren't
complete lines.
So now we iterate over the lines, fitting them into 127-byte insertions where
possible, so we get proper indexing; a short sketch of the packing follows
this entry.
Note that this means any line > 127 bytes will never be matched, which is
a fairly serious issue in the pure-python matcher, but not worth fixing,
because you can just use the compiled matcher instead.
modified:
bzrlib/_groupcompress_py.py _groupcompress_py.py-20090324110021-j63s399f4icrgw4p-1
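A short sketch of that packing strategy (pack_insert_lines is a hypothetical standalone
helper, not bzrlib API; the real logic lives in the _OutputHandler class added in the hunk
further down, which additionally skips indexing inserts shorter than a minimum length):
whole lines are grouped into inserts of at most 127 bytes, and any single line longer than
127 bytes is split into raw 127-byte pieces that are never indexed.

def pack_insert_lines(lines, max_insert=127):
    # Sketch: returns a list of (chunks, indexable) groups, one group per
    # insert instruction.  'indexable' is False for the pieces of an
    # over-long line, since a partial line can never be matched later.
    groups = []
    cur, cur_len = [], 0
    for line in lines:
        if len(line) > max_insert:
            if cur:
                groups.append((cur, True))
                cur, cur_len = [], 0
            for start in xrange(0, len(line), max_insert):
                groups.append(([line[start:start + max_insert]], False))
        elif cur_len + len(line) > max_insert:
            # Adding this line would overflow, so flush and start over.
            groups.append((cur, True))
            cur, cur_len = [line], len(line)
        else:
            cur.append(line)
            cur_len += len(line)
    if cur:
        groups.append((cur, True))
    return groups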
------------------------------------------------------------
revno: 4300.1.1
revision-id: john at arbash-meinel.com-20090422171845-5dmqokv8ygf3cvs5
parent: pqm at pqm.ubuntu.com-20090420092748-tm2cofylpjauo1nw
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_info
timestamp: Wed 2009-04-22 12:18:45 -0500
message:
Add the ability to convert a gc block into 'human readable' form.
modified:
bzrlib/groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
bzrlib/tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
=== modified file 'NEWS'
--- a/NEWS 2009-04-20 08:37:32 +0000
+++ b/NEWS 2009-04-21 23:54:16 +0000
@@ -44,6 +44,9 @@
* Non-recursive ``bzr ls`` now works properly when a path is specified.
(Jelmer Vernooij, #357863)
+* Fix a bug in the pure-python ``GroupCompress`` code when handling copies
+ longer than 64KiB. (John Arbash Meinel, #364900)
+
Documentation
*************
=== modified file 'bzrlib/_groupcompress_py.py'
--- a/bzrlib/_groupcompress_py.py 2009-04-09 20:23:07 +0000
+++ b/bzrlib/_groupcompress_py.py 2009-04-23 00:58:30 +0000
@@ -23,6 +23,74 @@
from bzrlib import osutils
+class _OutputHandler(object):
+ """A simple class which just tracks how to split up an insert request."""
+
+ def __init__(self, out_lines, index_lines, min_len_to_index):
+ self.out_lines = out_lines
+ self.index_lines = index_lines
+ self.min_len_to_index = min_len_to_index
+ self.cur_insert_lines = []
+ self.cur_insert_len = 0
+
+ def add_copy(self, start_byte, end_byte):
+ # The data stream allows >64kB in a copy, but to match the compiled
+ # code, we will also limit it to a 64kB copy
+ for start_byte in xrange(start_byte, end_byte, 64*1024):
+ num_bytes = min(64*1024, end_byte - start_byte)
+ copy_bytes = encode_copy_instruction(start_byte, num_bytes)
+ self.out_lines.append(copy_bytes)
+ self.index_lines.append(False)
+
+ def _flush_insert(self):
+ if not self.cur_insert_lines:
+ return
+ if self.cur_insert_len > 127:
+ raise AssertionError('We cannot insert more than 127 bytes'
+ ' at a time.')
+ self.out_lines.append(chr(self.cur_insert_len))
+ self.index_lines.append(False)
+ self.out_lines.extend(self.cur_insert_lines)
+ if self.cur_insert_len < self.min_len_to_index:
+ self.index_lines.extend([False]*len(self.cur_insert_lines))
+ else:
+ self.index_lines.extend([True]*len(self.cur_insert_lines))
+ self.cur_insert_lines = []
+ self.cur_insert_len = 0
+
+ def _insert_long_line(self, line):
+ # Flush out anything pending
+ self._flush_insert()
+ line_len = len(line)
+ for start_index in xrange(0, line_len, 127):
+ next_len = min(127, line_len - start_index)
+ self.out_lines.append(chr(next_len))
+ self.index_lines.append(False)
+ self.out_lines.append(line[start_index:start_index+next_len])
+ # We don't index long lines, because we won't be able to match
+ # a line split across multiple inserts anyway
+ self.index_lines.append(False)
+
+ def add_insert(self, lines):
+ if self.cur_insert_lines != []:
+ raise AssertionError('self.cur_insert_lines must be empty when'
+ ' adding a new insert')
+ for line in lines:
+ if len(line) > 127:
+ self._insert_long_line(line)
+ else:
+ next_len = len(line) + self.cur_insert_len
+ if next_len > 127:
+ # Adding this line would overflow, so flush, and start over
+ self._flush_insert()
+ self.cur_insert_lines = [line]
+ self.cur_insert_len = len(line)
+ else:
+ self.cur_insert_lines.append(line)
+ self.cur_insert_len = next_len
+ self._flush_insert()
+
+
class LinesDeltaIndex(object):
"""This class indexes matches between strings.
@@ -33,6 +101,9 @@
:ivar endpoint: The total number of bytes in self.line_offsets
"""
+ _MIN_MATCH_BYTES = 10
+ _SOFT_MIN_MATCH_BYTES = 200
+
def __init__(self, lines):
self.lines = []
self.line_offsets = []
@@ -50,7 +121,11 @@
for idx, do_index in enumerate(index):
if not do_index:
continue
- matches.setdefault(new_lines[idx], []).append(start_idx + idx)
+ line = new_lines[idx]
+ try:
+ matches[line].add(start_idx + idx)
+ except KeyError:
+ matches[line] = set([start_idx + idx])
def get_matches(self, line):
"""Return the lines which match the line in right."""
@@ -59,7 +134,7 @@
except KeyError:
return None
- def _get_longest_match(self, lines, pos, locations):
+ def _get_longest_match(self, lines, pos):
"""Look at all matches for the current line, return the longest.
:param lines: The lines we are matching against
@@ -74,48 +149,45 @@
"""
range_start = pos
range_len = 0
- copy_ends = None
+ prev_locations = None
max_pos = len(lines)
+ matching = self._matching_lines
while pos < max_pos:
- if locations is None:
- # TODO: is try/except better than get(..., None)?
- try:
- locations = self._matching_lines[lines[pos]]
- except KeyError:
- locations = None
- if locations is None:
+ try:
+ locations = matching[lines[pos]]
+ except KeyError:
# No more matches, just return whatever we have, but we know
# that this last position is not going to match anything
pos += 1
break
+ # We have a match
+ if prev_locations is None:
+ # This is the first match in a range
+ prev_locations = locations
+ range_len = 1
+ locations = None # Consumed
else:
- # We have a match
- if copy_ends is None:
- # This is the first match in a range
- copy_ends = [loc + 1 for loc in locations]
- range_len = 1
+ # We have a match started, compare to see if any of the
+ # current matches can be continued
+ next_locations = locations.intersection([loc + 1 for loc
+ in prev_locations])
+ if next_locations:
+ # At least one of the regions continues to match
+ prev_locations = set(next_locations)
+ range_len += 1
locations = None # Consumed
else:
- # We have a match started, compare to see if any of the
- # current matches can be continued
- next_locations = set(copy_ends).intersection(locations)
- if next_locations:
- # At least one of the regions continues to match
- copy_ends = [loc + 1 for loc in next_locations]
- range_len += 1
- locations = None # Consumed
- else:
- # All current regions no longer match.
- # This line does still match something, just not at the
- # end of the previous matches. We will return locations
- # so that we can avoid another _matching_lines lookup.
- break
+ # All current regions no longer match.
+ # This line does still match something, just not at the
+ # end of the previous matches. We will return locations
+ # so that we can avoid another _matching_lines lookup.
+ break
pos += 1
- if copy_ends is None:
+ if prev_locations is None:
# We have no matches, this is a pure insert
- return None, pos, locations
- return (((min(copy_ends) - range_len, range_start, range_len)),
- pos, locations)
+ return None, pos
+ smallest = min(prev_locations)
+ return (smallest - range_len + 1, range_start, range_len), pos
def get_matching_blocks(self, lines, soft=False):
"""Return the ranges in lines which match self.lines.
@@ -133,15 +205,13 @@
# instructions.
result = []
pos = 0
- locations = None
max_pos = len(lines)
result_append = result.append
- min_match_bytes = 10
+ min_match_bytes = self._MIN_MATCH_BYTES
if soft:
- min_match_bytes = 200
+ min_match_bytes = self._SOFT_MIN_MATCH_BYTES
while pos < max_pos:
- block, pos, locations = self._get_longest_match(lines, pos,
- locations)
+ block, pos = self._get_longest_match(lines, pos)
if block is not None:
# Check to see if we match fewer than min_match_bytes. As we
# will turn this into a pure 'insert', rather than a copy.
@@ -178,38 +248,6 @@
' got out of sync with the line counter.')
self.endpoint = endpoint
- def _flush_insert(self, start_linenum, end_linenum,
- new_lines, out_lines, index_lines):
- """Add an 'insert' request to the data stream."""
- bytes_to_insert = ''.join(new_lines[start_linenum:end_linenum])
- insert_length = len(bytes_to_insert)
- # Each insert instruction is at most 127 bytes long
- for start_byte in xrange(0, insert_length, 127):
- insert_count = min(insert_length - start_byte, 127)
- out_lines.append(chr(insert_count))
- # Don't index the 'insert' instruction
- index_lines.append(False)
- insert = bytes_to_insert[start_byte:start_byte+insert_count]
- as_lines = osutils.split_lines(insert)
- out_lines.extend(as_lines)
- index_lines.extend([True]*len(as_lines))
-
- def _flush_copy(self, old_start_linenum, num_lines,
- out_lines, index_lines):
- if old_start_linenum == 0:
- first_byte = 0
- else:
- first_byte = self.line_offsets[old_start_linenum - 1]
- stop_byte = self.line_offsets[old_start_linenum + num_lines - 1]
- num_bytes = stop_byte - first_byte
- # The data stream allows >64kB in a copy, but to match the compiled
- # code, we will also limit it to a 64kB copy
- for start_byte in xrange(first_byte, stop_byte, 64*1024):
- num_bytes = min(64*1024, stop_byte - first_byte)
- copy_bytes = encode_copy_instruction(start_byte, num_bytes)
- out_lines.append(copy_bytes)
- index_lines.append(False)
-
def make_delta(self, new_lines, bytes_length=None, soft=False):
"""Compute the delta for this content versus the original content."""
if bytes_length is None:
@@ -217,6 +255,8 @@
# reserved for content type, content length
out_lines = ['', '', encode_base128_int(bytes_length)]
index_lines = [False, False, False]
+ output_handler = _OutputHandler(out_lines, index_lines,
+ self._MIN_MATCH_BYTES)
blocks = self.get_matching_blocks(new_lines, soft=soft)
current_line_num = 0
# We either copy a range (while there are reusable lines) or we
@@ -224,11 +264,16 @@
for old_start, new_start, range_len in blocks:
if new_start != current_line_num:
# non-matching region, insert the content
- self._flush_insert(current_line_num, new_start,
- new_lines, out_lines, index_lines)
+ output_handler.add_insert(new_lines[current_line_num:new_start])
current_line_num = new_start + range_len
if range_len:
- self._flush_copy(old_start, range_len, out_lines, index_lines)
+ # Convert the line based offsets into byte based offsets
+ if old_start == 0:
+ first_byte = 0
+ else:
+ first_byte = self.line_offsets[old_start - 1]
+ last_byte = self.line_offsets[old_start + range_len - 1]
+ output_handler.add_copy(first_byte, last_byte)
return out_lines, index_lines
@@ -271,9 +316,7 @@
copy_bytes.append(chr(base_byte))
offset >>= 8
if length is None:
- # None is used by the test suite
- copy_bytes[0] = chr(copy_command)
- return ''.join(copy_bytes)
+ raise ValueError("cannot supply a length of None")
if length > 0x10000:
raise ValueError("we don't emit copy records for lengths > 64KiB")
if length == 0:
@@ -337,7 +380,6 @@
def make_delta(source_bytes, target_bytes):
"""Create a delta from source to target."""
- # TODO: The checks below may not be a the right place yet.
if type(source_bytes) is not str:
raise TypeError('source is not a str')
if type(target_bytes) is not str:
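The delta header written by make_delta stores the uncompressed length via
encode_base128_int. Judging from the test vector further down (the '\xdc\x86\x0a' prefix
in test_make_delta_with_large_copies, which under this scheme decodes to 164700 and is
consistent with the 64kB + 64kB + 33628-byte copies that follow it), this is the usual
little-endian base-128 varint; a hedged sketch of the pair of helpers, assuming that
format:

def encode_base128_int(val):
    # Sketch: little-endian 7-bit groups, high bit set on all but the last.
    out = []
    while val >= 0x80:
        out.append(chr((val & 0x7f) | 0x80))
        val >>= 7
    out.append(chr(val))
    return ''.join(out)

def decode_base128_int(data):
    # Sketch: returns (value, number_of_bytes_consumed).
    val, shift, offset = 0, 0, 0
    while True:
        byte = ord(data[offset])
        offset += 1
        val |= (byte & 0x7f) << shift
        shift += 7
        if not byte & 0x80:
            return val, offset

# decode_base128_int('\xdc\x86\x0a') == (164700, 3)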
=== modified file 'bzrlib/groupcompress.py'
--- a/bzrlib/groupcompress.py 2009-04-20 08:37:32 +0000
+++ b/bzrlib/groupcompress.py 2009-04-22 17:18:45 +0000
@@ -299,6 +299,66 @@
]
return ''.join(chunks)
+ def _dump(self, include_text=False):
+ """Take this block, and spit out a human-readable structure.
+
+ :param include_text: Inserts also include text bits; choose whether you
+ want this displayed in the dump or not.
+ :return: A dump of the given block. The layout is something like:
+ [('f', length), ('d', delta_length, text_length, [delta_info])]
+ delta_info := [('i', num_bytes, text), ('c', offset, num_bytes),
+ ...]
+ """
+ self._ensure_content()
+ result = []
+ pos = 0
+ while pos < self._content_length:
+ kind = self._content[pos]
+ pos += 1
+ if kind not in ('f', 'd'):
+ raise ValueError('invalid kind character: %r' % (kind,))
+ content_len, len_len = decode_base128_int(
+ self._content[pos:pos + 5])
+ pos += len_len
+ if content_len + pos > self._content_length:
+ raise ValueError('invalid content_len %d for record @ pos %d'
+ % (content_len, pos - len_len - 1))
+ if kind == 'f': # Fulltext
+ result.append(('f', content_len))
+ elif kind == 'd': # Delta
+ delta_content = self._content[pos:pos+content_len]
+ delta_info = []
+ # The first entry in a delta is the decompressed length
+ decomp_len, delta_pos = decode_base128_int(delta_content)
+ result.append(('d', content_len, decomp_len, delta_info))
+ measured_len = 0
+ while delta_pos < content_len:
+ c = ord(delta_content[delta_pos])
+ delta_pos += 1
+ if c & 0x80: # Copy
+ (offset, length,
+ delta_pos) = decode_copy_instruction(delta_content, c,
+ delta_pos)
+ delta_info.append(('c', offset, length))
+ measured_len += length
+ else: # Insert
+ if include_text:
+ txt = delta_content[delta_pos:delta_pos+c]
+ else:
+ txt = ''
+ delta_info.append(('i', c, txt))
+ measured_len += c
+ delta_pos += c
+ if delta_pos != content_len:
+ raise ValueError('Delta consumed a bad number of bytes:'
+ ' %d != %d' % (delta_pos, content_len))
+ if measured_len != decomp_len:
+ raise ValueError('Delta claimed fulltext was %d bytes, but'
+ ' extraction resulted in %d bytes'
+ % (decomp_len, measured_len))
+ pos += content_len
+ return result
+
class _LazyGroupCompressFactory(object):
"""Yield content from a GroupCompressBlock on demand."""
@@ -1661,6 +1721,7 @@
apply_delta_to_source,
encode_base128_int,
decode_base128_int,
+ decode_copy_instruction,
LinesDeltaIndex,
)
try:
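A hedged usage sketch for the new _dump() method, assuming 'block' is a
GroupCompressBlock built the way the tests build one (their make_block() helper); it
simply walks the structure described in the docstring above:

# 'block' is assumed to be a GroupCompressBlock, e.g. from the make_block()
# helper used in test_groupcompress.py.
for record in block._dump(include_text=True):
    if record[0] == 'f':
        # ('f', length): a fulltext record
        print 'fulltext: %d bytes' % record[1]
    else:
        # ('d', delta_length, text_length, [delta_info])
        print 'delta: %d bytes expanding to %d bytes' % (record[1], record[2])
        for instruction in record[3]:
            if instruction[0] == 'c':
                # ('c', offset, num_bytes)
                print '  copy   %d bytes from offset %d' % (
                    instruction[2], instruction[1])
            else:
                # ('i', num_bytes, text)
                print '  insert %d bytes: %r' % (instruction[1], instruction[2])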
=== modified file 'bzrlib/tests/test__groupcompress.py'
--- a/bzrlib/tests/test__groupcompress.py 2009-04-09 20:23:07 +0000
+++ b/bzrlib/tests/test__groupcompress.py 2009-04-21 23:54:16 +0000
@@ -186,6 +186,19 @@
'N\x90\x1d\x1ewhich is meant to differ from\n\x91:\x13',
delta)
+ def test_make_delta_with_large_copies(self):
+ # We want to have a copy that is larger than 64kB, which forces us to
+ # issue multiple copy instructions.
+ big_text = _text3 * 1220
+ delta = self.make_delta(big_text, big_text)
+ self.assertDeltaIn(
+ '\xdc\x86\x0a' # Encoding the length of the uncompressed text
+ '\x80' # Copy 64kB, starting at byte 0
+ '\x84\x01' # and another 64kB starting at 64kB
+ '\xb4\x02\x5c\x83', # And the bit of tail.
+ None, # Both implementations should be identical
+ delta)
+
def test_apply_delta_is_typesafe(self):
self.apply_delta(_text1, 'M\x90M')
self.assertRaises(TypeError, self.apply_delta, object(), 'M\x90M')
@@ -358,18 +371,18 @@
self.assertEqual((exp_offset, exp_length, exp_newpos), out)
def test_encode_no_length(self):
- self.assertEncode('\x80', 0, None)
- self.assertEncode('\x81\x01', 1, None)
- self.assertEncode('\x81\x0a', 10, None)
- self.assertEncode('\x81\xff', 255, None)
- self.assertEncode('\x82\x01', 256, None)
- self.assertEncode('\x83\x01\x01', 257, None)
- self.assertEncode('\x8F\xff\xff\xff\xff', 0xFFFFFFFF, None)
- self.assertEncode('\x8E\xff\xff\xff', 0xFFFFFF00, None)
- self.assertEncode('\x8D\xff\xff\xff', 0xFFFF00FF, None)
- self.assertEncode('\x8B\xff\xff\xff', 0xFF00FFFF, None)
- self.assertEncode('\x87\xff\xff\xff', 0x00FFFFFF, None)
- self.assertEncode('\x8F\x04\x03\x02\x01', 0x01020304, None)
+ self.assertEncode('\x80', 0, 64*1024)
+ self.assertEncode('\x81\x01', 1, 64*1024)
+ self.assertEncode('\x81\x0a', 10, 64*1024)
+ self.assertEncode('\x81\xff', 255, 64*1024)
+ self.assertEncode('\x82\x01', 256, 64*1024)
+ self.assertEncode('\x83\x01\x01', 257, 64*1024)
+ self.assertEncode('\x8F\xff\xff\xff\xff', 0xFFFFFFFF, 64*1024)
+ self.assertEncode('\x8E\xff\xff\xff', 0xFFFFFF00, 64*1024)
+ self.assertEncode('\x8D\xff\xff\xff', 0xFFFF00FF, 64*1024)
+ self.assertEncode('\x8B\xff\xff\xff', 0xFF00FFFF, 64*1024)
+ self.assertEncode('\x87\xff\xff\xff', 0x00FFFFFF, 64*1024)
+ self.assertEncode('\x8F\x04\x03\x02\x01', 0x01020304, 64*1024)
def test_encode_no_offset(self):
self.assertEncode('\x90\x01', 0, 1)
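For the large-copy test above, the expected bytes can be reproduced from the 64kB
splitting loop in _OutputHandler.add_copy; a sketch reusing the encode_copy_instruction
sketch from earlier, and assuming the 164700-byte total implied by the '\xdc\x86\x0a'
length prefix:

out = []
start_byte, end_byte = 0, 164700   # total length implied by the base128 prefix
for start in xrange(start_byte, end_byte, 64 * 1024):
    num_bytes = min(64 * 1024, end_byte - start)
    out.append(encode_copy_instruction(start, num_bytes))
# out should be ['\x80', '\x84\x01', '\xb4\x02\x5c\x83'], matching the copy
# instructions asserted in test_make_delta_with_large_copies.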
=== modified file 'bzrlib/tests/test_groupcompress.py'
--- a/bzrlib/tests/test_groupcompress.py 2009-04-20 08:37:32 +0000
+++ b/bzrlib/tests/test_groupcompress.py 2009-04-22 17:18:45 +0000
@@ -447,6 +447,18 @@
# And the decompressor is finalized
self.assertIs(None, block._z_content_decompressor)
+ def test__dump(self):
+ dup_content = 'some duplicate content\nwhich is sufficiently long\n'
+ key_to_text = {('1',): dup_content + '1 unique\n',
+ ('2',): dup_content + '2 extra special\n'}
+ locs, block = self.make_block(key_to_text)
+ self.assertEqual([('f', len(key_to_text[('1',)])),
+ ('d', 21, len(key_to_text[('2',)]),
+ [('c', 2, len(dup_content)),
+ ('i', len('2 extra special\n'), '')
+ ]),
+ ], block._dump())
+
class TestCaseWithGroupCompressVersionedFiles(tests.TestCaseWithTransport):