[MERGE] Re: merging in versioned-file
John A Meinel
john at arbash-meinel.com
Tue Feb 21 16:26:30 GMT 2006
Robert Collins wrote:
> On Mon, 2006-02-20 at 12:28 -0600, John A Meinel wrote:
>> Robert Collins wrote:
>>> So I've started on the merge conflict resolution for this, and Martin
>>> and I [hes camping at my place for 'net until his new adsl is installed]
>>> looked at some of the structural changes involved.
>>>
>>> A couple of obvious things:
>>>
>>> The transport changes are not backwards compatible, so we're going to
>>> deprecate the existing names and provide new names like 'open_append'
>>> rather than replacing the meaning of 'append'.
>>>
>> Sounds reasonable. I assume you are completely switching over to the
>> "Transport returns a file object" version. So I would expect you to add
>> the functions "open_read", "open_write", and "open_append".
>> I think the current code has "put()" returning a file-like object to
>> write to. But that doesn't seem very nice, so I would like to see that
>> changed to 'open_write'.
>
> Yes, that's the plan.
>
>> Further, we need to make it clear that the file-like objects that are
>> returned aren't garbage collected properly, so they must be used in a
>> 'try/finally' to make sure that they are fully uploaded, and then put
>> into place atomically. (The old put() interface did that for you; now
>> all calling code needs to call close() in a finally block).
>
> Well, they can't offer any stronger gc than python does. I wonder if
> keeping the convenience methods is useful? I.e. keep Transport.put() at
> the top level, undeprecated, and implemented in terms of try: stream =
> self.open_write(); stream.write(content); finally: stream.close()
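Concretely, something like this is what I'd picture (untested, and assuming
open_write() takes a relative path like the other Transport methods). Note
that the open_write() call has to happen before the try block, otherwise a
failed open leaves 'stream' unbound when the finally clause runs:

    def put(self, relpath, content):
        """Convenience wrapper: write content to relpath via open_write()."""
        stream = self.open_write(relpath)
        try:
            stream.write(content)
        finally:
            stream.close()

That keeps the old one-call interface for simple callers, while the streaming
methods stay available underneath for code that wants them.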
>
>> Also, if we are going to this form, I would really like to see the
>> file-like objects have a readv() function for just reading part of the
>> file. Rather than using seek+read. I think it will be a lot nicer for
>> knits where you could easily end up with a disjoint set of ranges that
>> you want to read. And seek+read falls down once you do it more than once
>> (unless you translate all read() requests into partial reads, rather than
>> being able to optimize for the fact that readv() is a partial read while
>> read(1000) is trying to read the whole thing, just 1000 bytes at a time).
>
> I think readv() is a useful optimisation to implement, but it's important
> to note that we do not know the required ranges a priori much of the
> time, so having seek + read as well is fine IMO. Or are you saying 'only
> provide readv, and if someone wants a stream interface, they build it on
> top of readv' ?
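For reference, the readv() shape I have in mind is roughly this (untested
sketch, not an actual bzrlib API yet): take an iterable of (offset, length)
pairs and yield the data for each, so a smart transport can batch them into
range requests while the plain-file fallback just does seek+read per range:

    def readv(fileobj, offsets):
        """Yield (offset, data) for each requested (offset, length) pair.

        Naive fallback in terms of seek+read; an sftp or http transport
        could override this with a single coalesced range request instead.
        """
        for offset, length in offsets:
            fileobj.seek(offset)
            data = fileobj.read(length)
            if len(data) != length:
                raise IOError('short read at offset %d' % offset)
            yield offset, data

So a caller that wants two disjoint hunks would do
'for offset, data in readv(f, [(0, 100), (4096, 512)]): ...' rather than
issuing the seek+read pairs itself, and seek+read stays available for
callers that really are streaming.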
>
> ..
>> I like the idea of fast code paths based on the specific objects of
>> concern (weaves vs knits vs v4 storage) rather than based on the branch
>> format. So +1 to that concept.
>
> Well, it's not necessarily down at that level, though as the concept
> spreads I imagine we will get to that point, with InterStore etc. etc.
>
> For now, it collapses the multiple format checks at the repository level
> into a single check, which can be extended at runtime by plugins (either
> derive from InterWeaveRepo and implement a new is_compatible, or write a
> brand new InterXRepo class as needed). These are also tested against a
> set of conformance tests, so we can find bugs. One interesting one is
> that all the repository slow-code paths were completely untested: we
> have only one repository layout that is writable, and the slow code
> paths were never triggering. So I've added a FIXME to reinstate testing
> of them as soon as we have an incompatible format to use them against.
>
> Seeking a +1 on the InterRepository implementation attached.
>
> inter-repo-api$ wc -l inter-repo.patch
> 1731 inter-repo.patch
>
> inter-repo-api$ diffstat inter-repo.patch
> NEWS | 8
> bzrlib/branch.py | 46 -
> bzrlib/fetch.py | 123 --
> bzrlib/merge.py | 8
> bzrlib/repository.py | 439 +++++++---
> bzrlib/revisionspec.py | 3
> bzrlib/tests/__init__.py | 4
> bzrlib/tests/branch_implementations/test_branch.py | 8
> bzrlib/tests/bzrdir_implementations/test_bzrdir.py | 90 +-
> bzrlib/tests/interrepository_implementations/__init__.py | 54 +
> bzrlib/tests/interrepository_implementations/test_interrepository.py | 158 +++
> bzrlib/tests/repository_implementations/test_fileid_involved.py | 4
> bzrlib/tests/repository_implementations/test_repository.py | 73 -
> bzrlib/tests/test_bzrdir.py | 2
> bzrlib/tests/test_commit_merge.py | 7
> bzrlib/tests/test_fetch.py | 31
> bzrlib/tests/test_merge.py | 3
> bzrlib/tests/test_repository.py | 87 +
> bzrlib/tests/test_revision.py | 13
> bzrlib/tests/test_selftest.py | 28
> 20 files changed, 846 insertions(+), 343 deletions(-)
>
> Rob
>
>
>
...
> +
> +class InterRepository(object):
...
> + @needs_write_lock
> + def copy_content(self, revision_id=None, basis=None):
> + """Make a complete copy of the content in self into destination.
> +
> + This is a destructive operation! Do not use it on existing
> + repositories.
> +
> + :param revision_id: Only copy the content needed to construct
> + revision_id and its parents.
> + :param basis: Copy the needed data preferentially from basis.
> + """
> + try:
> + self.target.set_make_working_trees(self.source.make_working_trees())
> + except NotImplementedError:
> + pass
> + # grab the basis available data
> + if basis is not None:
> + self.target.fetch(basis, revision_id=revision_id)
> + # but dont both fetching if we have the needed data now.
> + if (revision_id not in (None, NULL_REVISION) and
> + self.target.has_revision(revision_id)):
> + return
> + self.target.fetch(self.source, revision_id=revision_id)
Why is copy_content() marked as a 'destructive operation!'? I don't see
anything here that actually removes information. I'm guessing this is
old documentation, since fetch() shouldn't be destructive (at least in
my mind).
> +
> + def _double_lock(self, lock_source, lock_target):
> + """Take out too locks, rolling back the first if the second throws."""
s/too/two/
> + lock_source()
> + try:
> + lock_target()
> + except Exception:
> + # we want to ensure that we don't leave source locked by mistake.
> + # and any error on target should not confuse source.
> + self.source.unlock()
> + raise
> +
> + @needs_write_lock
> + def fetch(self, revision_id=None, pb=None):
> + """Fetch the content required to construct revision_id.
> +
> + The content is copied from source to target.
> +
> + :param revision_id: if None all content is copied, if NULL_REVISION no
> + content is copied.
> + :param pb: optional progress bar to use for progress reports. If not
> + provided a default one will be created.
> +
> + Returns the copied revision count and the failed revisions in a tuple:
> + (copied, failures).
> + """
> + from bzrlib.fetch import RepoFetcher
> + mutter("Using fetch logic to copy between %s(%s) and %s(%s)",
> + self.source, self.source._format, self.target, self.target._format)
> + f = RepoFetcher(to_repository=self.target,
> + from_repository=self.source,
> + last_revision=revision_id,
> + pb=pb)
> + return f.count_copied, f.failed_revisions
> +
> + @classmethod
> + def get(klass, repository_source, repository_target):
> + """Retrieve a InterRepository worker object for these repositories.
> +
> + :param repository_source: the repository to be the 'source' member of
> + the InterRepository instance.
> + :param repository_target: the repository to be the 'target' member of
> + the InterRepository instance.
> + If an optimised InterRepository worker exists it will be used otherwise
> + a default InterRepository instance will be created.
> + """
> + for provider in klass._optimisers:
> + if provider.is_compatible(repository_source, repository_target):
> + return provider(repository_source, repository_target)
> + return InterRepository(repository_source, repository_target)
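The get()/register_optimiser() pair here looks easy to hook from a plugin.
If I'm reading it right, a plugin fast path would be roughly the following
(InterXRepo and the 'supports_x' format attribute are invented for
illustration, assuming this patch lands so InterRepository is importable
from bzrlib.repository):

    from bzrlib.repository import InterRepository

    class InterXRepo(InterRepository):
        """Optimised operations between two repositories in a made-up X format."""

        @staticmethod
        def is_compatible(source, target):
            # only claim repository pairs this class knows how to handle
            return (getattr(source._format, 'supports_x', False) and
                    getattr(target._format, 'supports_x', False))

    InterRepository.register_optimiser(InterXRepo)
    # InterRepository.get(source, target) now returns an InterXRepo when both
    # sides are the X format, and a plain InterRepository otherwise.

Which matches the plugin story you describe above, with the generic
InterRepository as the fallback when no registered optimiser claims the pair.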
> +
> + def lock_read(self):
> + """Take out a logical read lock.
> +
> + This will lock the source branch and the target branch. The source gets
> + a read lock and the target a read lock.
> + """
> + self._double_lock(self.source.lock_read, self.target.lock_read)
> +
> + def lock_write(self):
> + """Take out a logical write lock.
> +
> + This will lock the source branch and the target branch. The source gets
> + a read lock and the target a write lock.
> + """
> + self._double_lock(self.source.lock_read, self.target.lock_write)
> +
> + @needs_read_lock
> + def missing_revision_ids(self, revision_id=None):
> + """Return the revision ids that source has that target does not.
> +
> + These are returned in topological order.
> +
> + :param revision_id: only return revision ids included by this
> + revision_id.
> + """
> + # generic, possibly worst case, slow code path.
> + target_ids = set(self.source.all_revision_ids())
> + if revision_id is not None:
> + source_ids = self.target.get_ancestry(revision_id)
> + assert source_ids.pop(0) == None
> + else:
> + source_ids = self.target.all_revision_ids()
> + result_set = set(source_ids).difference(target_ids)
> + # this may look like a no-op: its not. It preserves the ordering
> + # other_ids had while only returning the members from other_ids
> + # that we've decided we need.
> + return [rev_id for rev_id in other_ids if rev_id in result_set]
> +
> + @classmethod
> + def register_optimiser(klass, optimiser):
> + """Register an InterRepository optimiser."""
> + klass._optimisers.add(optimiser)
> +
> + def unlock(self):
> + """Release the locks on source and target."""
> + try:
> + self.target.unlock()
> + finally:
> + self.source.unlock()
> +
> + @classmethod
> + def unregister_optimiser(klass, optimiser):
> + """Unregister an InterRepository optimiser."""
> + klass._optimisers.remove(optimiser)
> +
I didn't go through the rest with a fine-toothed comb, but it looks
pretty good.
John
=:->