[MERGE] Re: merging in versioned-file

John A Meinel john at arbash-meinel.com
Tue Feb 21 16:26:30 GMT 2006


Robert Collins wrote:
> On Mon, 2006-02-20 at 12:28 -0600, John A Meinel wrote:
>> Robert Collins wrote:
>>> So I've started on the merge conflict resolution for this, and Martin
>>> and I [he's camping at my place for 'net until his new ADSL is installed]
>>> looked at some of the structural changes involved.
>>>
>>> A couple of obvious things:
>>>
>>> The transport changes are not backwards compatible, so we're going to
>>> deprecate the existing names and provide new names like 'open_append'
>>> rather than replacing the meaning of 'append'. 
>>>
>> Sounds reasonable. I assume you are completely switching over to the
>> "Transport returns a file object" version. So I would expect you to add
>> the functions "open_read", "open_write", and "open_append".
>> I think the current code has "put()" returning a file-like object to
>> write to. But that doesn't seem very nice, so I would like to see that
>> changed to 'open_write'.
> 
> Yes, that's the plan.
> 
>> Further, we need to make it clear that the returned file-like objects
>> aren't reliably cleaned up by garbage collection, so they must be used
>> in a 'try/finally' to make sure that they are fully uploaded and then
>> put into place atomically. (The old put() interface did that for you;
>> now all calling code needs to call close() in a finally block.)
> 
> Well, they can't offer any stronger GC than Python does. I wonder if
> keeping the convenience methods is useful, i.e. keep Transport.put() at
> the top level, undeprecated, and implemented in terms of try: stream =
> self.open_write(); stream.write(content); finally: stream.close()
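
Spelled out, that convenience wrapper might look like the following -- a
minimal sketch, assuming open_write() takes a relative path and returns a
closeable file-like object (the exact signature is still being designed).
Note the assignment is moved outside the try block so close() is never
called on a stream that failed to open:

    def put(self, relpath, content):
        """Write content to relpath atomically, via open_write()."""
        stream = self.open_write(relpath)
        try:
            stream.write(content)
        finally:
            # close() completes the upload and moves the file into
            # place atomically, so it must run even on error.
            stream.close()
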
> 
>> Also, if we are going to this form, I would really like to see the
>> file-like objects have a readv() function for reading just part of the
>> file, rather than using seek+read. I think it will be a lot nicer for
>> knits, where you could easily end up with a disjoint set of ranges that
>> you want to read. And seek+read falls down as soon as you do it more
>> than once (unless you translate every read() request into a partial
>> read, rather than being able to optimise for the fact that a readv() is
>> deliberately partial, while read(1000) is trying to read the whole
>> thing, just 1000 bytes at a time).
> 
> I think readv() is a useful optimisation to implement, but it's
> important to note that we do not know the required ranges a priori much
> of the time, so having seek + read as well is fine IMO. Or are you
> saying 'only provide readv, and if someone wants a stream interface,
> they build it on top of readv' ?
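
For concreteness, a readv() on the returned file-like object might look
like the following -- a hypothetical sketch only: the (offset, length)
tuple format and the naive seek+read fallback are assumptions, not an
agreed interface:

    def readv(self, offsets):
        """Read several byte ranges, yielding (offset, data) pairs.

        offsets is a list of (offset, length) tuples. This fallback
        just does seek+read per range; a smarter file object could
        coalesce adjacent ranges, or issue one HTTP multi-range
        request instead of N round trips.
        """
        for offset, length in offsets:
            self.seek(offset)
            yield offset, self.read(length)
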
> 
> ..
>> I like the idea of fast code paths based on the specific objects of
>> concern (weaves vs knits vs v4 storage) rather than based on the branch
>> format. So +1 to that concept.
> 
> Well, it's not necessarily down at that level, though as the concept
> spreads I imagine we will get to that point, with InterStore etc.
> 
> For now, it collapses the multiple format checks at the repository level
> into a single check, which can be extended at runtime by plugins (either
> derive from InterWeaveRepo and implement a new is_compatible, or write a
> brand new InterXRepo class as needed). These are also tested against a
> set of conformance tests, so we can find bugs. One interesting finding
> is that all the repository slow-code paths were completely untested: we
> have only one repository layout that is writable, and the slow code
> paths were never triggering. So I've added a FIXME to reinstate testing
> of them as soon as we have an incompatible format to use them against.
> 
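To make the extension point concrete, a plugin could hook in roughly like
this. This is a hypothetical sketch: InterXRepo and XRepoFormat are
invented names, and only register_optimiser() and the is_compatible()
hook come from the patch below:

    from bzrlib.repository import InterRepository

    class InterXRepo(InterRepository):
        """Optimised code paths between two X-format repositories."""

        @staticmethod
        def is_compatible(source, target):
            # Only claim repository pairs this class can handle.
            # XRepoFormat is a hypothetical repository format class.
            return (isinstance(source._format, XRepoFormat) and
                    isinstance(target._format, XRepoFormat))

        # ... override fetch(), missing_revision_ids(), etc. with
        # format-specific fast paths ...

    InterRepository.register_optimiser(InterXRepo)
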
> Seeking a +1 on the InterRepository implementation attached.
> 
> inter-repo-api$ wc -l inter-repo.patch 
> 1731 inter-repo.patch
> 
> inter-repo-api$ diffstat inter-repo.patch 
>  NEWS                                                                 |    8 
>  bzrlib/branch.py                                                     |   46 -
>  bzrlib/fetch.py                                                      |  123 --
>  bzrlib/merge.py                                                      |    8 
>  bzrlib/repository.py                                                 |  439 +++++++---
>  bzrlib/revisionspec.py                                               |    3 
>  bzrlib/tests/__init__.py                                             |    4 
>  bzrlib/tests/branch_implementations/test_branch.py                   |    8 
>  bzrlib/tests/bzrdir_implementations/test_bzrdir.py                   |   90 +-
>  bzrlib/tests/interrepository_implementations/__init__.py             |   54 +
>  bzrlib/tests/interrepository_implementations/test_interrepository.py |  158 +++
>  bzrlib/tests/repository_implementations/test_fileid_involved.py      |    4 
>  bzrlib/tests/repository_implementations/test_repository.py           |   73 -
>  bzrlib/tests/test_bzrdir.py                                          |    2 
>  bzrlib/tests/test_commit_merge.py                                    |    7 
>  bzrlib/tests/test_fetch.py                                           |   31 
>  bzrlib/tests/test_merge.py                                           |    3 
>  bzrlib/tests/test_repository.py                                      |   87 +
>  bzrlib/tests/test_revision.py                                        |   13 
>  bzrlib/tests/test_selftest.py                                        |   28 
>  20 files changed, 846 insertions(+), 343 deletions(-)
> 
> Rob
> 
> 
> 

...

> +
> +class InterRepository(object):

...

> +    @needs_write_lock
> +    def copy_content(self, revision_id=None, basis=None):
> +        """Make a complete copy of the content in self into destination.
> +        
> +        This is a destructive operation! Do not use it on existing 
> +        repositories.
> +
> +        :param revision_id: Only copy the content needed to construct
> +                            revision_id and its parents.
> +        :param basis: Copy the needed data preferentially from basis.
> +        """
> +        try:
> +            self.target.set_make_working_trees(self.source.make_working_trees())
> +        except NotImplementedError:
> +            pass
> +        # grab the basis available data
> +        if basis is not None:
> +            self.target.fetch(basis, revision_id=revision_id)
> +        # but don't bother fetching if we already have the needed data.
> +        if (revision_id not in (None, NULL_REVISION) and 
> +            self.target.has_revision(revision_id)):
> +            return
> +        self.target.fetch(self.source, revision_id=revision_id)

Why is copy_content() marked as a 'destructive operation!'? I don't see
anything here that actually removes information, so I'm guessing this is
old documentation, since fetch() shouldn't be destructive (at least in
my mind).

> +
> +    def _double_lock(self, lock_source, lock_target):
> +        """Take out too locks, rolling back the first if the second throws."""

s/too/two/

> +        lock_source()
> +        try:
> +            lock_target()
> +        except Exception:
> +            # We want to ensure that we don't leave source locked by
> +            # mistake, and any error on target should not confuse source.
> +            self.source.unlock()
> +            raise
> +
> +    @needs_write_lock
> +    def fetch(self, revision_id=None, pb=None):
> +        """Fetch the content required to construct revision_id.
> +
> +        The content is copied from source to target.
> +
> +        :param revision_id: if None all content is copied, if NULL_REVISION no
> +                            content is copied.
> +        :param pb: optional progress bar to use for progress reports. If not
> +                   provided a default one will be created.
> +
> +        Returns the copied revision count and the failed revisions in a tuple:
> +        (copied, failures).
> +        """
> +        from bzrlib.fetch import RepoFetcher
> +        mutter("Using fetch logic to copy between %s(%s) and %s(%s)",
> +               self.source, self.source._format, self.target, self.target._format)
> +        f = RepoFetcher(to_repository=self.target,
> +                        from_repository=self.source,
> +                        last_revision=revision_id,
> +                        pb=pb)
> +        return f.count_copied, f.failed_revisions
> +
> +    @classmethod
> +    def get(klass, repository_source, repository_target):
> +        """Retrieve an InterRepository worker object for these repositories.
> +
> +        :param repository_source: the repository to be the 'source' member of
> +                                  the InterRepository instance.
> +        :param repository_target: the repository to be the 'target' member of
> +                                the InterRepository instance.
> +        If an optimised InterRepository worker exists it will be used;
> +        otherwise a default InterRepository instance will be created.
> +        """
> +        for provider in klass._optimisers:
> +            if provider.is_compatible(repository_source, repository_target):
> +                return provider(repository_source, repository_target)
> +        return InterRepository(repository_source, repository_target)
> +
> +    def lock_read(self):
> +        """Take out a logical read lock.
> +
> +        This will lock the source branch and the target branch. The source gets
> +        a read lock and the target a read lock.
> +        """
> +        self._double_lock(self.source.lock_read, self.target.lock_read)
> +
> +    def lock_write(self):
> +        """Take out a logical write lock.
> +
> +        This will lock the source branch and the target branch. The source gets
> +        a read lock and the target a write lock.
> +        """
> +        self._double_lock(self.source.lock_read, self.target.lock_write)
> +
> +    @needs_read_lock
> +    def missing_revision_ids(self, revision_id=None):
> +        """Return the revision ids that source has that target does not.
> +        
> +        These are returned in topological order.
> +
> +        :param revision_id: only return revision ids included by this
> +                            revision_id.
> +        """
> +        # generic, possibly worst case, slow code path.
> +        target_ids = set(self.target.all_revision_ids())
> +        if revision_id is not None:
> +            source_ids = self.source.get_ancestry(revision_id)
> +            # get_ancestry starts with a None sentinel; drop it.
> +            assert source_ids[0] is None
> +            source_ids.pop(0)
> +        else:
> +            source_ids = self.source.all_revision_ids()
> +        result_set = set(source_ids).difference(target_ids)
> +        # this may look like a no-op: it's not. It preserves the
> +        # ordering source_ids had, while only returning the members of
> +        # source_ids that we've decided we need.
> +        return [rev_id for rev_id in source_ids if rev_id in result_set]
> +
> +    @classmethod
> +    def register_optimiser(klass, optimiser):
> +        """Register an InterRepository optimiser."""
> +        klass._optimisers.add(optimiser)
> +
> +    def unlock(self):
> +        """Release the locks on source and target."""
> +        try:
> +            self.target.unlock()
> +        finally:
> +            self.source.unlock()
> +
> +    @classmethod
> +    def unregister_optimiser(klass, optimiser):
> +        """Unregister an InterRepository optimiser."""
> +        klass._optimisers.remove(optimiser)
> +
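
As a usage sketch (source_repo and target_repo are hypothetical open
repository objects; get(), missing_revision_ids() and fetch() are from
the patch above):

    # get() picks a registered optimiser if one claims this pair of
    # repositories, else falls back to the generic InterRepository.
    inter = InterRepository.get(source_repo, target_repo)
    # Both methods are self-locking via the @needs_read_lock /
    # @needs_write_lock decorators, so no explicit lock calls here.
    needed = inter.missing_revision_ids()
    copied, failures = inter.fetch()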

I didn't go through the rest with a fine-toothed comb. But it looks
pretty good.

John
=:->
