Rev 31: Bring in the 'rabin' experiment. in http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk
John Arbash Meinel
john at arbash-meinel.com
Wed Mar 4 16:05:47 GMT 2009
At http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk
------------------------------------------------------------
revno: 31
revision-id: john at arbash-meinel.com-20090304160155-66iy2jorb5h39n6d
parent: robertc at robertcollins.net-20090302205544-kmcaa6d3stdbddda
parent: john at arbash-meinel.com-20090304153824-86p8mekizpx70bkr
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: trunk
timestamp: Wed 2009-03-04 10:01:55 -0600
message:
Bring in the 'rabin' experiment.
Change the names and disk-strings for the various repository formats.
Make the CHK format repositories all 'rich-root'; we can introduce non-rich-root later.
Make a couple other small tweaks, like copyright statements, etc.
Remove patch-delta.c, at this point, it was only a reference implementation,
as we have fully integrated the patching into pyrex, to allow nicer exception
handling.
added:
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
renamed:
_groupcompress_c.pyx => _groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
tests/test__groupcompress_c.py => tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
modified:
.bzrignore bzrignore-20080724041812-7jbgn9euewwtns1u-1
TODO todo-20080705181503-ccbxd6xuy1bdnrpu-5
__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
setup.py setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
tests/__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-11
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.59
revision-id: john at arbash-meinel.com-20090304153824-86p8mekizpx70bkr
parent: john at arbash-meinel.com-20090304152748-iqp4zqlzvnq5pm23
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Wed 2009-03-04 09:38:24 -0600
message:
TODO entry.
modified:
TODO todo-20080705181503-ccbxd6xuy1bdnrpu-5
------------------------------------------------------------
revno: 28.4.58
revision-id: john at arbash-meinel.com-20090304152748-iqp4zqlzvnq5pm23
parent: john at arbash-meinel.com-20090304150015-b6o2fru8grx5ubpm
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Wed 2009-03-04 09:27:48 -0600
message:
fix up the failing tests.
The new delta code needs a 16-byte window to match, so to *know* that there will
be a match, you need ~32-bytes in common. (guarantees that 16-bytes somewhere in
that 32-byte range will match.)
Also, when setting 'max_delta', it is possible that we run out of bytes before
we actually find the last match, a match that would have made things compress better.
This is rare in practice, because texts are longer than 40 bytes. But it happens
in testing.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
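
The "~32 bytes in common" figure above can be checked with a small sketch. It assumes (this is not stated in the log) that the source is indexed only at offsets that are multiples of the 16-byte window, as in git-style diff-delta code. The helper names are hypothetical and exist only for illustration.

```python
WINDOW = 16  # bytes hashed per index entry (the 16-byte window from the message)


def aligned_blocks(start, length, window=WINDOW):
    """Return offsets of window-aligned blocks lying wholly inside
    the shared region [start, start + length)."""
    # Round start up to the next multiple of window.
    first = -(-start // window) * window
    return list(range(first, start + length - window + 1, window))


def min_guaranteed_length(window=WINDOW):
    """Smallest shared-region length that contains an aligned block
    for every possible starting offset (offsets repeat modulo window)."""
    length = window
    while not all(aligned_blocks(s, length, window) for s in range(window)):
        length += 1
    return length
```

Under that assumption the worst case is a shared region starting one byte past an aligned offset, which needs 2 * 16 - 1 = 31 bytes before a whole indexed block fits inside it.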
------------------------------------------------------------
revno: 28.4.57
revision-id: john at arbash-meinel.com-20090304150015-b6o2fru8grx5ubpm
parent: john at arbash-meinel.com-20090304042506-zaf29b1u9jnajp2u
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Wed 2009-03-04 09:00:15 -0600
message:
Change the formatting, replace \t with spaces to be consistent with bzr coding.
modified:
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.56
revision-id: john at arbash-meinel.com-20090304042506-zaf29b1u9jnajp2u
parent: john at arbash-meinel.com-20090303225027-dd26kj3xasgfi7bv
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 22:25:06 -0600
message:
update TODO a little bit.
modified:
TODO todo-20080705181503-ccbxd6xuy1bdnrpu-5
------------------------------------------------------------
revno: 28.4.55
revision-id: john at arbash-meinel.com-20090303225027-dd26kj3xasgfi7bv
parent: john at arbash-meinel.com-20090303222649-n917r5v7ti7szu5r
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 16:50:27 -0600
message:
Make sure the default is _FAST=False for now.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.54
revision-id: john at arbash-meinel.com-20090303222649-n917r5v7ti7szu5r
parent: john at arbash-meinel.com-20090303221259-ghe53xhqu8igvz03
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 16:26:49 -0600
message:
'bzr pack' _FAST during compress() now is 32s versus 25s.
However, I'm extending _FAST to also stop checking the sha1 sums,
with that change, _FAST is 20s versus 32s.
It is a bit dangerous without the sha1 checking, but it is nice
to see as a 'how fast can we make it', once we are sure about
correctness.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.53
revision-id: john at arbash-meinel.com-20090303221259-ghe53xhqu8igvz03
parent: john at arbash-meinel.com-20090303220215-1luhz4zfr9vrdmud
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 16:12:59 -0600
message:
Remove the temporary adjustment for handling multiple formats of labels.
Update the maximum size source array.
I was hitting 16k sources in a single group, and I didn't want to write the code
that resizes sources and then adjusts the existing index pointers.
That should be done, though.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.52
revision-id: john at arbash-meinel.com-20090303220215-1luhz4zfr9vrdmud
parent: john at arbash-meinel.com-20090303214221-ea1e84bkmi22yfgk
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 16:02:15 -0600
message:
Use the max_delta flag.
Prefer to extract and compress bytes rather than chunks/lines.
This has a fairly positive impact on the 'bzr pack' times.
We still do a ''.join([bytes]), but we know that doesn't have
to do any memory copying.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.51
revision-id: john at arbash-meinel.com-20090303214221-ea1e84bkmi22yfgk
parent: john at arbash-meinel.com-20090303212302-lemyfgzfyq0l7ojl
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 15:42:21 -0600
message:
Remove the debug printing.
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.50
revision-id: john at arbash-meinel.com-20090303212302-lemyfgzfyq0l7ojl
parent: john at arbash-meinel.com-20090303210721-m25wehoeo3jxsz11
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Tue 2009-03-03 15:23:02 -0600
message:
Change the code to do the copies in bigger chunks.
We should be able to get a small number of memcopies, rather than having to copy
each record individually, or copy each hash range individually.
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.49
revision-id: john at arbash-meinel.com-20090303210721-m25wehoeo3jxsz11
parent: john at arbash-meinel.com-20090303203526-o9xw0n70j2g622e0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Tue 2009-03-03 15:07:21 -0600
message:
When adding new entries to the delta index, use memcpy
rather than copying them one by one.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.48
revision-id: john at arbash-meinel.com-20090303203526-o9xw0n70j2g622e0
parent: john at arbash-meinel.com-20090303200908-hjdzbzj0cs6zua2v
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Tue 2009-03-03 14:35:26 -0600
message:
Remove bogus line.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.47
revision-id: john at arbash-meinel.com-20090303200908-hjdzbzj0cs6zua2v
parent: john at arbash-meinel.com-20090303200711-qc4qoqyrnpyla6iz
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Tue 2009-03-03 14:09:08 -0600
message:
Use the new add_delta_source.
It shaves off a small amount of time, and improves the compression slightly.
Next step is to work on optimizing the code.
It feels like the include_entries_from_index is wasting a lot of time
double copying all of the previous matches.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.46
revision-id: john at arbash-meinel.com-20090303200711-qc4qoqyrnpyla6iz
parent: john at arbash-meinel.com-20090303195329-epc5tn11m2jmo7rm
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Tue 2009-03-03 14:07:11 -0600
message:
Fix a bug in create_delta_index_from_delta when inserting into an already filled hash location.
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.45
revision-id: john at arbash-meinel.com-20090303195329-epc5tn11m2jmo7rm
parent: john at arbash-meinel.com-20090303181057-i1239vipqi27fxbs
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Tue 2009-03-03 13:53:29 -0600
message:
Add a function that updates the index for delta bytes.
This avoids indexing control bytes, and helps to align the actual index pointers
to the real data.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
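
A minimal sketch of what "avoids indexing control bytes" could look like. It assumes a git-style delta instruction stream (high bit set means copy, otherwise the byte is a literal length) with no size header. The function name is hypothetical and this is not the code in diff-delta.c.

```python
def insert_ranges(delta, pos=0):
    """Yield (start, end) byte ranges of literal insert data in a
    git-style delta instruction stream starting at `pos`."""
    end = len(delta)
    while pos < end:
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy instruction: each set bit in the low 7 bits means one
            # offset or size byte follows. None of these are real data.
            pos += bin(cmd & 0x7F).count("1")
        elif cmd:
            # Insert instruction: `cmd` literal bytes follow. Only these
            # bytes would be worth adding to the index.
            yield (pos, pos + cmd)
            pos += cmd
        else:
            raise ValueError("reserved instruction byte 0x00")
```

Indexing only the yielded ranges keeps the copy opcodes and their operands out of the hash table, and each index pointer then refers to bytes that actually appear in the text.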
------------------------------------------------------------
revno: 28.4.44
revision-id: john at arbash-meinel.com-20090303181057-i1239vipqi27fxbs
parent: john at arbash-meinel.com-20090303180544-mfgw9jsndwiwj047
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 12:10:57 -0600
message:
Remove the multi-index handling now that we have index combining instead.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.43
revision-id: john at arbash-meinel.com-20090303180544-mfgw9jsndwiwj047
parent: john at arbash-meinel.com-20090303163107-l4j0114btw2efmjp
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 12:05:44 -0600
message:
Change the internals to allow delta indexes to be expanded with new source data.
Now when adding a new source, the old index entries are included in the new structure.
This generally seems to be better than having multiple indexes, as it improves the
efficiency of the internal hash map, and avoids extra iterating.
Bring back the _FAST flag. At the moment, with _FAST=True, doing bzr pack is about
37s rather than 1min, and gives 9.7MB texts, rather than 8.2MB or so.
So at the moment, it is still a useful flag to have.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.42
revision-id: john at arbash-meinel.com-20090303163107-l4j0114btw2efmjp
parent: john at arbash-meinel.com-20090303160222-4bkou2s65s60h75a
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 10:31:07 -0600
message:
Change the code around again.
This time, the information about sources is maintained in the DeltaIndex object.
And we pass that info down into create_delta_index, et al.
Next step is to actually combine the delta indexes.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.41
revision-id: john at arbash-meinel.com-20090303160222-4bkou2s65s60h75a
parent: john at arbash-meinel.com-20090303150939-93yexh0v5hmvkwdo
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 10:02:22 -0600
message:
Start moving the information about source buffers into the actual index_entry.
This leads the way for combining indexes for multiple sources together.
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.40
revision-id: john at arbash-meinel.com-20090303150939-93yexh0v5hmvkwdo
parent: john at arbash-meinel.com-20090303150400-3il0kyvau1ho5vww
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 09:09:39 -0600
message:
Add a comment why we aren't using the list type for _sources
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
------------------------------------------------------------
revno: 28.4.39
revision-id: john at arbash-meinel.com-20090303150400-3il0kyvau1ho5vww
parent: john at arbash-meinel.com-20090303145931-5ahrrw6hycii49xj
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 09:04:00 -0600
message:
Merge the setup.py changes so that it actually fails if an extension fails to build.
modified:
setup.py setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
------------------------------------------------------------
revno: 28.4.38
revision-id: john at arbash-meinel.com-20090303145931-5ahrrw6hycii49xj
parent: john at arbash-meinel.com-20090303144815-zdo0ak0vjclvx6y3
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 08:59:31 -0600
message:
fix the local offset problem in a slightly different way.
Leave moff in local offsets until encoding, and then convert.
This allows us to skip the extra local variable, and just looks a bit cleaner, IMO.
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.37
revision-id: john at arbash-meinel.com-20090303144815-zdo0ak0vjclvx6y3
parent: john at arbash-meinel.com-20090303141551-qhokyhnloc1qsznh
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 08:48:15 -0600
message:
If you are going to join the bytes anyway, use sha_string instead of sha_strings.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.36
revision-id: john at arbash-meinel.com-20090303141551-qhokyhnloc1qsznh
parent: john at arbash-meinel.com-20090303021815-dlqfgperty1bwnv1
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Tue 2009-03-03 08:15:51 -0600
message:
Track down a memory leak in the refactored diff-delta.c code.
We weren't deallocating the unpacked hash array in all code paths.
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.35
revision-id: john at arbash-meinel.com-20090303021815-dlqfgperty1bwnv1
parent: john at arbash-meinel.com-20090303021638-20p6dywzjesch07v
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Mon 2009-03-02 20:18:15 -0600
message:
Add a rich-root compatible gcr+chk255+rich-root format.
modified:
__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.4.34
revision-id: john at arbash-meinel.com-20090303021638-20p6dywzjesch07v
parent: john at arbash-meinel.com-20090302223828-hyb4crn4w28sgvmc
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Mon 2009-03-02 20:16:38 -0600
message:
Update groupcompress to allow it to read older conversions.
This will be removed, but I needed it for testing.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.33
revision-id: john at arbash-meinel.com-20090302223828-hyb4crn4w28sgvmc
parent: john at arbash-meinel.com-20090302210223-9ixutqay7sx8c1n3
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Mon 2009-03-02 16:38:28 -0600
message:
Fix a bug when handling multiple large-range copies.
We were adjusting moff multiple times, without adjusting it back.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.32
revision-id: john at arbash-meinel.com-20090302210223-9ixutqay7sx8c1n3
parent: john at arbash-meinel.com-20090302202718-c7ojzhft35boi1kn
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Mon 2009-03-02 15:02:23 -0600
message:
Refactor the code a bit, so that I can re-use bits for a create_delta_index_from_delta.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.31
revision-id: john at arbash-meinel.com-20090302202718-c7ojzhft35boi1kn
parent: john at arbash-meinel.com-20090302201609-k275n1rspptl2ve3
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: rabin
timestamp: Mon 2009-03-02 14:27:18 -0600
message:
Add a bit of comments about things to do.
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.30
revision-id: john at arbash-meinel.com-20090302201609-k275n1rspptl2ve3
parent: john at arbash-meinel.com-20090302200018-si0py093o7esxzyd
parent: john at arbash-meinel.com-20090302200837-l2v96rd0e6u68479
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 14:16:09 -0600
message:
Merge in Ian's groupcompress trunk updates
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.4.29
revision-id: john at arbash-meinel.com-20090302200018-si0py093o7esxzyd
parent: john at arbash-meinel.com-20090302195421-5j3s3xzr2r8y80bw
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 14:00:18 -0600
message:
Forgot to add the delta bytes to the index objects.
Also add an assertion to make sure things like that don't get missed.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.28
revision-id: john at arbash-meinel.com-20090302195421-5j3s3xzr2r8y80bw
parent: john at arbash-meinel.com-20090302194337-f0x1quasnm4p7x9m
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 13:54:21 -0600
message:
Gotta import 'trace' if you want to use trace.mutter()
modified:
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.4.27
revision-id: john at arbash-meinel.com-20090302194337-f0x1quasnm4p7x9m
parent: john at arbash-meinel.com-20090302193629-51hqsvh1rhh71gku
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 13:43:37 -0600
message:
Fix up some failing tests.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.4.26
revision-id: john at arbash-meinel.com-20090302193629-51hqsvh1rhh71gku
parent: john at arbash-meinel.com-20090302191537-7mvjwk2042fvj9gg
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 13:36:29 -0600
message:
We now start to make use of the ability to extend the delta index
with new sources. Next step is to understand the delta encoding, so as to
avoid linking up with lines in the deltas.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.25
revision-id: john at arbash-meinel.com-20090302191537-7mvjwk2042fvj9gg
parent: john at arbash-meinel.com-20090302185236-gm5ckgaic13q6vvs
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 13:15:37 -0600
message:
We are now able to add multiple sources to the delta generator.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.24
revision-id: john at arbash-meinel.com-20090302185236-gm5ckgaic13q6vvs
parent: john at arbash-meinel.com-20090302180420-8m229eh99p2bp2r5
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 12:52:36 -0600
message:
Change the code so that we can pass in multiple sources to match against.
At the moment, we only use a single source, but that will soon change.
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
------------------------------------------------------------
revno: 28.4.23
revision-id: john at arbash-meinel.com-20090302180420-8m229eh99p2bp2r5
parent: john at arbash-meinel.com-20090302180323-cx4qz36qnmd0dnki
parent: john at arbash-meinel.com-20090302160108-9pl56rebxcd23w35
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 12:04:20 -0600
message:
Merge the gc for pyrex 0.9.6.4 updates
modified:
_groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
------------------------------------------------------------
revno: 28.5.1
revision-id: john at arbash-meinel.com-20090302160108-9pl56rebxcd23w35
parent: john at arbash-meinel.com-20090228050444-38soix727ge8yhvn
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 10:01:08 -0600
message:
Make the groupcompress pyrex extension compatible with pyrex 0.9.6.4
Also fix a bug in processing the offsets.
modified:
_groupcompress_c.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
------------------------------------------------------------
revno: 28.4.22
revision-id: john at arbash-meinel.com-20090302180323-cx4qz36qnmd0dnki
parent: john at arbash-meinel.com-20090302170533-v13igzvtt0hf7y2z
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 12:03:23 -0600
message:
Add a mutter() while repacking, so that we log progress as we go along.
modified:
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 28.4.21
revision-id: john at arbash-meinel.com-20090302170533-v13igzvtt0hf7y2z
parent: john at arbash-meinel.com-20090228050444-38soix727ge8yhvn
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Mon 2009-03-02 11:05:33 -0600
message:
Rename the extension to _pyx, since Robert prefers that form
renamed:
_groupcompress_c.pyx => _groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
tests/test__groupcompress_c.py => tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
modified:
.bzrignore bzrignore-20080724041812-7jbgn9euewwtns1u-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
setup.py setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
tests/__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-11
tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.20
revision-id: john at arbash-meinel.com-20090228050444-38soix727ge8yhvn
parent: john at arbash-meinel.com-20090228050349-5b5fljgovy1ylokx
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 23:04:44 -0600
message:
For now, use _FAST=True
This could be a reasonable 'autopack' configuration, if DeltaIndex.extend()
ends up being too difficult to implement.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.19
revision-id: john at arbash-meinel.com-20090228050349-5b5fljgovy1ylokx
parent: john at arbash-meinel.com-20090228044639-zhrn3p7ykngc0zs4
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 23:03:49 -0600
message:
Implement a 'FAST' mode.
If we insert a text and get a 'decent' delta, then we just keep using
that delta_index until we get a bad insert. (delta > 1/2 size).
In this mode 'bzr pack' drops from 2m41s => 53s. Inventory pages
are barely affected in size, while Text pages go from 8.2MB => 9.6MB.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
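
The 'FAST' policy described above might be sketched as follows. The class and the two callables are hypothetical stand-ins, not the groupcompress.py implementation: `build_index` turns a list of texts into some index object, and `make_delta` produces delta bytes for a text against that index.

```python
class FastCompressor:
    """Reuse one delta index until an insert compresses badly."""

    def __init__(self, build_index, make_delta):
        self._build_index = build_index  # hypothetical: list of texts -> index
        self._make_delta = make_delta    # hypothetical: (index, text) -> bytes
        self._texts = []
        self._index = None
        self.rebuilds = 0

    def _rebuild(self):
        # Index everything inserted so far.
        self._index = self._build_index(list(self._texts))
        self.rebuilds += 1

    def add(self, text):
        if self._index is None:
            self._rebuild()
        delta = self._make_delta(self._index, text)
        self._texts.append(text)
        if len(delta) * 2 > len(text):
            # Bad insert (delta > 1/2 size): drop the stale index so the
            # next insert sees all texts added since the last rebuild.
            self._index = None
        return delta
```

The trade-off is the one reported in the message: fewer index rebuilds make packing faster, at the cost of missing matches against texts that were added after the index was last built.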
------------------------------------------------------------
revno: 28.4.18
revision-id: john at arbash-meinel.com-20090228044639-zhrn3p7ykngc0zs4
parent: john at arbash-meinel.com-20090228044347-vjb5fzj5s9cd8a7c
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 22:46:39 -0600
message:
Add some profiling comments.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.17
revision-id: john at arbash-meinel.com-20090228044347-vjb5fzj5s9cd8a7c
parent: john at arbash-meinel.com-20090228042933-zdoupq6lka7lyvg9
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 22:43:47 -0600
message:
Create a wrapper function, so that lsprof will properly attribute time spent.
modified:
_groupcompress_c.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.16
revision-id: john at arbash-meinel.com-20090228042933-zdoupq6lka7lyvg9
parent: john at arbash-meinel.com-20090228042802-joang5uih4qcf45p
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 22:29:33 -0600
message:
Properly restore the label functionality.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.15
revision-id: john at arbash-meinel.com-20090228042802-joang5uih4qcf45p
parent: john at arbash-meinel.com-20090228042448-nfhhzpjuqic78bfr
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 22:28:02 -0600
message:
Handle when self._index is NULL, mostly because the source text was the empty string.
Start using DeltaIndex as part of the standard compressing.
modified:
_groupcompress_c.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.14
revision-id: john at arbash-meinel.com-20090228042448-nfhhzpjuqic78bfr
parent: john at arbash-meinel.com-20090228040012-lbkwky6vtdmhjepx
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 22:24:48 -0600
message:
Implement a DeltaIndex wrapper.
This splits out the create_delta_index from the create_delta code.
Which should also help for profiling purposes.
modified:
_groupcompress_c.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
tests/test__groupcompress_c.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.13
revision-id: john at arbash-meinel.com-20090228040012-lbkwky6vtdmhjepx
parent: john at arbash-meinel.com-20090228032304-13o0os3ho1nqq4ze
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 22:00:12 -0600
message:
Factor out the ability to have/not have labels.
It turns out that labels now cost an overall 10% increase in repo size. A rather
large 40% increase for inventory pages.
Perhaps since label == sha1 we could get away doing something differently.
Note also that repository-details doesn't take into account the indexes.
The .cix index for a conversion is approx 380kB, which starts to be an
important factor when you consider the total content for all chk pages
is less than 1.5MB.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 28.4.12
revision-id: john at arbash-meinel.com-20090228032304-13o0os3ho1nqq4ze
parent: john at arbash-meinel.com-20090227204002-fdzk52zc3frd4ddi
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 21:23:04 -0600
message:
Add a 'len:' field to the data.
With this field, we can now fully populate an index from expanding
the group-compress pages.
There might be an issue with expanding the zlib pages, though if
we switched to using gzip pages that would certainly go away.
(Perhaps zlib exposes the 'trailing bytes', though, which would
make it ok.)
Checking to see how much this impacts final compressed size.
Next step is to try removing all labels, and see what that
final size becomes.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
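On the 'trailing bytes' question in revno 28.4.12: Python's zlib module reports any bytes that follow the end of a compressed stream through the `unused_data` attribute of a decompression object. The sketch below shows only that behaviour. The helper name `split_zlib_stream` is hypothetical, and whether this covers the concern raised in the message is an assumption, not something the log confirms.

```python
import zlib


def split_zlib_stream(data):
    """Decompress one zlib stream and return (content, trailing_bytes).

    Hypothetical helper: 'data' is assumed to hold a complete zlib stream,
    optionally followed by unrelated bytes.
    """
    decompressor = zlib.decompressobj()
    content = decompressor.decompress(data)
    content += decompressor.flush()
    # Anything found after the end of the compressed stream is left here.
    return content, decompressor.unused_data


page = zlib.compress(b"group contents") + b"next record"
content, rest = split_zlib_stream(page)
```

After this call, `content` holds the expanded page and `rest` holds the bytes that followed it.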
------------------------------------------------------------
revno: 28.4.11
revision-id: john at arbash-meinel.com-20090227204002-fdzk52zc3frd4ddi
parent: john at arbash-meinel.com-20090227201847-181ruulj0worz3ra
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 14:40:02 -0600
message:
Insert a fulltext if the delta is more than half the total size.
Also, gcr deltas are more pithy; they are probably approx the same after
compression, but decrease the range limits since the copy instructions are
effectively pre-compressed.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
setup.py setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
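The fulltext fallback from revno 28.4.11 can be sketched as a single comparison. This is a minimal illustration, not the code in groupcompress.py: the function name `choose_record` is hypothetical, and "total size" is assumed here to mean the length of the text being inserted.

```python
def choose_record(fulltext, delta):
    """Return ('fulltext', bytes) or ('delta', bytes) for storage.

    Assumption: the delta is rejected when it is more than half the
    length of the text it reproduces.
    """
    if len(delta) * 2 > len(fulltext):
        # The delta saves too little, so store the text itself.
        return ("fulltext", fulltext)
    return ("delta", delta)


kind_large, _ = choose_record(b"x" * 100, b"d" * 51)
kind_small, _ = choose_record(b"x" * 100, b"d" * 50)
```

With a 100-byte text, a 51-byte delta is replaced by the fulltext, while a 50-byte delta is kept.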
------------------------------------------------------------
revno: 28.4.10
revision-id: john at arbash-meinel.com-20090227201847-181ruulj0worz3ra
parent: john at arbash-meinel.com-20090227195427-5rw3pjlgkssido0d
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 14:18:47 -0600
message:
Allow the source bytes to be longer than expected.
This makes a huge difference for extraction speed:
10s versus 45s, and versus 17s for the original groupcompress code.
Also, the compiled version in _groupcompress_c seems to be about the same
speed as the patch-delta.c version.
At the very least, the extra memory copy overhead negates any benefit.
modified:
_groupcompress_c.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
patch-delta.c patchdelta.c-20090226042143-l9wzxynyuxnb5hus-2
------------------------------------------------------------
revno: 28.4.9
revision-id: john at arbash-meinel.com-20090227195427-5rw3pjlgkssido0d
parent: john at arbash-meinel.com-20090227184307-h8zgtnf217omdw1h
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 13:54:27 -0600
message:
We now basically have full support for using diff-delta as the compressor.
Will still need some tuning/tweaking to see how we want to proceed.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.4.8
revision-id: john at arbash-meinel.com-20090227184307-h8zgtnf217omdw1h
parent: john at arbash-meinel.com-20090227182104-ogr8fu5548ewpzx3
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 12:43:07 -0600
message:
Add another test text.
modified:
tests/test__groupcompress_c.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.7
revision-id: john at arbash-meinel.com-20090227182104-ogr8fu5548ewpzx3
parent: john at arbash-meinel.com-20090227173623-wbwvxgznqacu6u48
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 12:21:04 -0600
message:
Add an apply_delta2 function, just in case it matters.
modified:
_groupcompress_c.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
------------------------------------------------------------
revno: 28.4.6
revision-id: john at arbash-meinel.com-20090227173623-wbwvxgznqacu6u48
parent: john at arbash-meinel.com-20090227173204-ce7djs6xbflluut1
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 11:36:23 -0600
message:
Start stripping out the actual GroupCompressor
in preparation for using the diff-delta code.
Add some tests that we can generate and apply diff deltas.
We need to start adding some exceptions, and consider moving the
core of the patch-delta loop back into a pure C function, as the
generated code is very messy.
modified:
.bzrignore bzrignore-20080724041812-7jbgn9euewwtns1u-1
_groupcompress_c.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
setup.py setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
tests/__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-11
tests/test__groupcompress_c.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
------------------------------------------------------------
revno: 28.4.5
revision-id: john at arbash-meinel.com-20090227173204-ce7djs6xbflluut1
parent: john at arbash-meinel.com-20090227160746-1gt1m20vqk7i273c
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 11:32:04 -0600
message:
Minor changes to get diff-delta.c and patch-delta.c to compile.
This includes bringing in 'delta.h'
added:
delta.h delta.h-20090227173129-qsu3u43vowf1q3ay-1
modified:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
patch-delta.c patchdelta.c-20090226042143-l9wzxynyuxnb5hus-2
------------------------------------------------------------
revno: 28.4.4
revision-id: john at arbash-meinel.com-20090227160746-1gt1m20vqk7i273c
parent: john at arbash-meinel.com-20090227160650-iv1rpvxsqejydxj7
parent: john at arbash-meinel.com-20090227051839-841q6ss4z8zm1353
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 10:07:46 -0600
message:
Merge in the latest updates to the gc trunk.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 28.4.3
revision-id: john at arbash-meinel.com-20090227160650-iv1rpvxsqejydxj7
parent: john at arbash-meinel.com-20090226042229-qk6u230fwyxbmhd7
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Fri 2009-02-27 10:06:50 -0600
message:
Fix a couple more locations.
modified:
__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
------------------------------------------------------------
revno: 28.4.2
revision-id: john at arbash-meinel.com-20090226042229-qk6u230fwyxbmhd7
parent: john at arbash-meinel.com-20090226041719-oi3d5putp8s2r233
author: Nicolas Pitre <nico at cam.org>
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Wed 2009-02-25 22:22:29 -0600
message:
Add the diff-delta.c and patch-delta.c files.
added:
diff-delta.c diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
patch-delta.c patchdelta.c-20090226042143-l9wzxynyuxnb5hus-2
------------------------------------------------------------
revno: 28.4.1
revision-id: john at arbash-meinel.com-20090226041719-oi3d5putp8s2r233
parent: john at arbash-meinel.com-20090225230422-4oigw03k7fq62eyb
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress_rabin
timestamp: Wed 2009-02-25 22:17:19 -0600
message:
Start a quick experiment with a different 'diff' algorithm.
modified:
__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
-------------- next part --------------
Diff too large for email (3617 lines, the limit is 1000).