[MERGE] Update to bzr.dev.
Martin Pool
mbp at canonical.com
Wed Jun 18 08:57:55 BST 2008
On Tue, Jun 17, 2008 at 5:05 PM, Andrew Bennetts <andrew at canonical.com> wrote:
> Here's a review of knit.py. Lots of deleted and moved code, which makes it a
> large and awkward diff, but that's life. Deleted code is a good thing, of
> course :)
I've done most of these but need answers from Robert on some of them:
>> + def annotate(self, knit, key):
>> + content = knit._get_content(key)
>> + # adjust for the fact that serialised annotations are only key suffixes
>> + # for this factory.
>> + if type(key) == tuple:
>> + prefix = key[:-1]
>> + origins = content.annotate()
>> + result = []
>> + for origin, line in origins:
>> + result.append((prefix + (origin,), line))
>> + return result
>> + else:
>> + return content.annotate()
>
> This smells a bit. Why would key ever be a non-tuple here? Aren't keys defined
> to be tuples?
>
>> +def cleanup_pack_knit(versioned_files):
>> + versioned_files.stream.close()
>> + versioned_files.writer.end()
>
> Docstring please. (And again, mention that it is a test helper).
>
> Or just move make_file_factory, make_pack_factory and cleanup_pack_knit into a
> separate module... (bzrlib.tests.knit_helpers maybe?)
In general I would like if the code intended only for use in testing
was only inside the test module. I think Robert may disagree.
>
>> +class KnitVersionedFiles(VersionedFiles):
> [...]
>> + def add_lines(self, key, parents, lines, parent_texts=None,
>> + left_matching_blocks=None, nostore_sha=None, random_id=False,
>> + check_content=True):
>> + """See VersionedFiles.add_lines()."""
>> + self._index._check_write_ok()
>> + self._check_add(key, lines, random_id, check_content)
>> + if parents is None:
>> + # For no-graph knits, have the public interface use None for
>> + # parents.
>> + parents = ()
>
> I don't understand that comment. What does "have the public interface use
> None" mean? If we're here, then None was already passed, so the future tense
> doesn't make sense to me.
??
>> + # Technically this could be avoided if we are happy to allow duplicate
>> + # id insertion when other things than bzr core insert texts, but it
>> + # seems useful for folk using the knit api directly to have some safety
>> + # blanket that we can disable.
>> + ##if not random_id and self.has_version(key):
>> + ## raise RevisionAlreadyPresent(key, self)
>
> What's the status here? Are you going to delete the commented out code (and the
> comment), or reinstate the check?
??
>> - delta_parents = self._index.get_parent_map([parent])[parent]
>> + # No exception here because we stop at first fulltext anyway, an
>> + # absent parent indicates a corrupt knit anyway.
>> + # TODO: This should be asking for compression parent, not graph
>> + # parent.
>> + parent = self._index.get_parent_map([parent])[parent][0]
>
> No exception where?
I think he means he's not explicitly going to check that the parent is present?
> I'm not sure why the cache dict might be invalid... but this is strictly better
> than the previous code (which did the same sort of thing without any comment),
> so I'm not complaining :)
>
> Hmm, there is one small change that might help, though: you've replaced
> "has_version(foo)" calls with "get_parent_map([foo])". I think the code would
> be a little easier to read in various places if we had an equivalent of
> has_version, maybe "has_graph_key(foo)". I know you don't want to encourage
> people to write inefficient code that uses one key at a time by not providing
> APIs that work on single keys rather than collections of keys. But it seems
> there's enough legitimate reason for checking for existence of a single key's
> existence at a time that this might be worth it?
I know Robert and Andrew talked about this a bit but I'm not sure what
the final result was. I'd be ok to do that as a later cleanup.
>> + for key in keys:
>> + # the delta chain
>> + try:
>> + chain = [key, positions[key][2]]
>
> For those of us that don't have the layout of positions tuples memorised, a
> comment in this function to remind the reader what [2] in a positions tuple is
> would be helpful. Especially seeing as it's not the only sequence being indexed
> in this function. Or perhaps it would be even better to immediately generate a
> dict of dict((key, next_key) for (method, index_memo, next_key, parents) in
> positions.iteritems()) before this loop. After all, you only want the one
> element from items of the position map if include_delta_closure == True.
>
>> + try:
>> + chain.append(positions[chain[-1]][2])
>> + except KeyError:
>
> This would be clearer as:
>
> try:
> position = positions[chain[-1]]
> except KeyError:
> ...
> else:
> chain.append(position[2])
>
> But if take my earlier suggestion to trim the position dict down to something
> more appropriate then this wouldn't matter.
This seems like a good idea but the code is a bit hard for me to
reinterpret. Maybe that means your comment is very good but I should
call it a day :-)
>
> [...]
>> + if include_delta_closure:
>> + # XXX: get_content_maps performs its own index queries; allow state
>> + # to be passed in.
>> + text_map, _ = self._get_content_maps(present_keys)
>> + for key in present_keys:
>> + yield FulltextContentFactory(key, parent_map[key], None,
>> + ''.join(text_map[key]))
>
> Huh. So get_record_stream(..., include_delta_closure=True) will return only
> FulltextContentFactories (well, and AbsentContentFactories)? That seems odd to
> me. Is that right?
???
>> + def iter_lines_added_or_present_in_keys(self, keys, pb=None):
> [...]
>> + for key in keys:
>> + key_records.append((key, build_details[key][0]))
>
> Again, give the [0] a name to help the reader:
>
> for key in keys:
> index_memo = build_details[key][0]
> [...]
>> + prefix = key[:-1]
>> + try:
>> + suffix_parents = self._kndx_cache[prefix][0][key[-1]][4]
>> + except KeyError:
>> + pass
>
> Ow. Which of the 5 separate getitem operators on that line is that KeyError
> meant to be guarding?
>
> Also, what the heck is the structure of _kndx_cache? I tried to find it, but
> the best I could find was:
>
> def _reset_cache(self):
> # Possibly this should be a LRU cache. A dictionary from key_prefix to
> # (cache_dict, history_vector) for parsed kndx files.
> self._kndx_cache = {}
> ...
>
> Unfortunately, that doesn't tell me anything about the structure or meaning of
> the cache_dict or history_vector. The KndxIndex docstring talks about the
> _cache and _history instance variables, which I assume are probably related in
> some way, but I don't know how. Please update the KndxIndex docstring.
robert, halp! :)
Here is my incremental patch addressing the other issues from Andrew and myself.
--
Martin <http://launchpad.net/~mbp/>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080618-versionedfiles-cleanups.diff
Type: text/x-diff
Size: 106269 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080618/a9274ea5/attachment-0001.bin
More information about the bazaar
mailing list