[rfc] [patch] speeding up bzr log with a tree delta iterator

Wed Feb 15 19:14:45 GMT 2006

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Denys Duchier wrote:
> === modified file 'bzrlib/branch.py'
> --- bzrlib/branch.py	
> +++ bzrlib/branch.py	
> +    def revid_iter(self, rev1, rev2=None, reverse=False, pad=False):
> +        """returns an iterator over revids (both bounds included).
> +
> +        rev1
> +            revno or revid or None.  If it is a revid not in this branch's
> +            revision history, then rev2 must be None or equal to rev1; in this
> +            special case we are only interested in rev1 and possibly its delta
> +            with its left-most parent (mainline predecessor).  If rev1 is None,
> +            the iterator is empty.  rev1 denotes one bound of the iteration.

I can't recommend this as a public Branch method.  None is a valid
revision ID (unfortunately), so callers have a right to expect it to
work.  As bzrlib.log._revid_iter, it would be okay, because none of the
callers would ever supply the null revision as rev1 or rev2.

> +        def revno2id(revno):
> +            if revno:
> +                return rh[revno-1]
> +            else:
> +                return None

It looks like you could use Branch.get_rev_id here.  Is there a reason
not to?

> +        if isinstance(rev1, int):
> +            if not (1 <= rev1 <= rh_len):
> +                raise InvalidRevisionNumber(rev1)

isinstance is a yellow flag to me.  It suggests that something funky's
going on.  In this case, it's because your method takes revnos
(user-level identifiers) or revids (API-level identifiers.)  I wouldn't
- -1 on this basis, but I'd almost be happier if you took RevisionSpecs,
because at least that would be consistent.

> +                    elif self.has_revision(rev1):
> +                        yield rev1
> +                        return
> +                    else:
> +                        raise InvalidRevisionId(rev1, self)

I think NoSuchRevision would be a better error for this case

> +                else:
> +                    raise InvalidRevisionId(rev1, self)

Though not for this one.

> +        rev2no
> +            None or a function from revid to revno.  This function should
> +            also accept 0 and None and return None in those cases.

I don't understand the purpose of this.  Why not just Branch.get_rev_id?

> +        if delta:
> +            def rev2tree(revid):
> +                if revid is None:
> +                    return EmptyTree()
> +                else:
> +                    return self.revision_tree(revid)

revision_tree already returns EmptyTree if the input is None

> === modified file 'bzrlib/revisionspec.py'
> --- bzrlib/revisionspec.py	
> +++ bzrlib/revisionspec.py	
> @@ -206,6 +206,12 @@
>          except ValueError:
>              return RevisionInfo(branch, None, self.spec)
>  
> +    def in_store(self, branch):
> +        if branch.has_revision(self.spec):
> +            return RevisionInfo(branch, None, rev_id=self.spec)
> +        else:
> +            raise NoSuchRevision(branch, self.prefix + self.spec)

I don't understand the motivation for adding this.  Any function that
uses a revision-id already needs to handle cases where the revision
isn't in the repository.

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFD832l0F+nu1YWqI0RAskqAJ9ERHKHMvu2oU7/ukU92uQC/JGGEgCffhJA
mk3RsISaogc8dGvcZhvS8ME=
=VFxP
-----END PGP SIGNATURE-----