[MERGE] Update to bzr.dev.

Wed Jun 18 08:57:55 BST 2008

On Tue, Jun 17, 2008 at 5:05 PM, Andrew Bennetts <andrew at canonical.com> wrote:
> Here's a review of knit.py.  Lots of deleted and moved code, which makes it a
> large and awkward diff, but that's life.  Deleted code is a good thing, of
> course :)

I've done most of these but need answers from Robert on some of them:
>> +    def annotate(self, knit, key):
>> +        content = knit._get_content(key)
>> +        # adjust for the fact that serialised annotations are only key suffixes
>> +        # for this factory.
>> +        if type(key) == tuple:
>> +            prefix = key[:-1]
>> +            origins = content.annotate()
>> +            result = []
>> +            for origin, line in origins:
>> +                result.append((prefix + (origin,), line))
>> +            return result
>> +        else:
>> +            return content.annotate()
>
> This smells a bit.  Why would key ever be a non-tuple here?  Aren't keys defined
> to be tuples?
>

>> +def cleanup_pack_knit(versioned_files):
>> +    versioned_files.stream.close()
>> +    versioned_files.writer.end()
>
> Docstring please.  (And again, mention that it is a test helper).
>
> Or just move make_file_factory, make_pack_factory and cleanup_pack_knit into a
> separate module... (bzrlib.tests.knit_helpers maybe?)

In general I would like if the code intended only for use in testing
was only inside the test module.  I think Robert may disagree.

>
>> +class KnitVersionedFiles(VersionedFiles):
> [...]
>> +    def add_lines(self, key, parents, lines, parent_texts=None,
>> +        left_matching_blocks=None, nostore_sha=None, random_id=False,
>> +        check_content=True):
>> +        """See VersionedFiles.add_lines()."""
>> +        self._index._check_write_ok()
>> +        self._check_add(key, lines, random_id, check_content)
>> +        if parents is None:
>> +            # For no-graph knits, have the public interface use None for
>> +            # parents.
>> +            parents = ()
>
> I don't understand that comment.  What does "have the public interface use
> None" mean?  If we're here, then None was already passed, so the future tense
> doesn't make sense to me.

??

>> +        # Technically this could be avoided if we are happy to allow duplicate
>> +        # id insertion when other things than bzr core insert texts, but it
>> +        # seems useful for folk using the knit api directly to have some safety
>> +        # blanket that we can disable.
>> +        ##if not random_id and self.has_version(key):
>> +        ##    raise RevisionAlreadyPresent(key, self)
>
> What's the status here?  Are you going to delete the commented out code (and the
> comment), or reinstate the check?

??

>> -            delta_parents = self._index.get_parent_map([parent])[parent]
>> +            # No exception here because we stop at first fulltext anyway, an
>> +            # absent parent indicates a corrupt knit anyway.
>> +            # TODO: This should be asking for compression parent, not graph
>> +            # parent.
>> +            parent = self._index.get_parent_map([parent])[parent][0]
>
> No exception where?

I think he means he's not explicitly going to check that the parent is present?

> I'm not sure why the cache dict might be invalid... but this is strictly better
> than the previous code (which did the same sort of thing without any comment),
> so I'm not complaining :)
>
> Hmm, there is one small change that might help, though: you've replaced
> "has_version(foo)" calls with "get_parent_map([foo])".  I think the code would
> be a little easier to read in various places if we had an equivalent of
> has_version, maybe "has_graph_key(foo)".  I know you don't want to encourage
> people to write inefficient code that uses one key at a time by not providing
> APIs that work on single keys rather than collections of keys.  But it seems
> there's enough legitimate reason for checking for existence of a single key's
> existence at a time that this might be worth it?

I know Robert and Andrew talked about this a bit but I'm not sure what
the final result was.  I'd be ok to do that as a later cleanup.

>> +            for key in keys:
>> +                # the delta chain
>> +                try:
>> +                    chain = [key, positions[key][2]]
>
> For those of us that don't have the layout of positions tuples memorised, a
> comment in this function to remind the reader what [2] in a positions tuple is
> would be helpful.  Especially seeing as it's not the only sequence being indexed
> in this function.  Or perhaps it would be even better to immediately generate a
> dict of dict((key, next_key) for (method, index_memo, next_key, parents) in
> positions.iteritems()) before this loop.  After all, you only want the one
> element from items of the position map if include_delta_closure == True.
>
>> +                        try:
>> +                            chain.append(positions[chain[-1]][2])
>> +                        except KeyError:
>
> This would be clearer as:
>
>                        try:
>                            position = positions[chain[-1]]
>                        except KeyError:
>                            ...
>                        else:
>                            chain.append(position[2])
>
> But if take my earlier suggestion to trim the position dict down to something
> more appropriate then this wouldn't matter.

This seems like a good idea but the code is a bit hard for me to
reinterpret.  Maybe that means your comment is very good but I should
call it a day :-)

>
> [...]
>> +        if include_delta_closure:
>> +            # XXX: get_content_maps performs its own index queries; allow state
>> +            # to be passed in.
>> +            text_map, _ = self._get_content_maps(present_keys)
>> +            for key in present_keys:
>> +                yield FulltextContentFactory(key, parent_map[key], None,
>> +                    ''.join(text_map[key]))
>
> Huh.  So get_record_stream(..., include_delta_closure=True) will return only
> FulltextContentFactories (well, and AbsentContentFactories)?  That seems odd to
> me.  Is that right?

???

>> +    def iter_lines_added_or_present_in_keys(self, keys, pb=None):
> [...]
>> +        for key in keys:
>> +            key_records.append((key, build_details[key][0]))
>
> Again, give the [0] a name to help the reader:
>
>        for key in keys:
>            index_memo = build_details[key][0]

> [...]
>> +            prefix = key[:-1]
>> +            try:
>> +                suffix_parents = self._kndx_cache[prefix][0][key[-1]][4]
>> +            except KeyError:
>> +                pass
>
> Ow.  Which of the 5 separate getitem operators on that line is that KeyError
> meant to be guarding?
>
> Also, what the heck is the structure of _kndx_cache?  I tried to find it, but
> the best I could find was:
>
>    def _reset_cache(self):
>        # Possibly this should be a LRU cache. A dictionary from key_prefix to
>        # (cache_dict, history_vector) for parsed kndx files.
>        self._kndx_cache = {}
>        ...
>
> Unfortunately, that doesn't tell me anything about the structure or meaning of
> the cache_dict or history_vector.  The KndxIndex docstring talks about the
> _cache and _history instance variables, which I assume are probably related in
> some way, but I don't know how.  Please update the KndxIndex docstring.

robert, halp! :)

Here is my incremental patch addressing the other issues from Andrew and myself.

-- 
Martin <http://launchpad.net/~mbp/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080618-versionedfiles-cleanups.diff
Type: text/x-diff
Size: 106269 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080618/a9274ea5/attachment-0001.bin