[RFC] question about different behavior of WorkingTree.iter_changes

Sat Apr 12 12:05:33 BST 2008

Aaron Bentley wrote:
> Alexander Belchenko wrote:
>> Aaron Bentley wrote:
>>> Alexander Belchenko wrote:
>>>> It's inevitable for me to use id2path in these cases because I need
>>>> to know the file's basename, or at least its file extension.
>>> I don't see why that follows; the basename is an attribute of inventory
>>> entries.  You don't need id2path to find out the basename.
>> *blink*
> 
>> Can you point me to the right place, please: what should I use instead of
>> id2path if I only need the basename?
> 
> InventoryEntry.name
> 
> It's also available on dirstate entries as "entry[0][1]".

Sorry, Aaron, for the stupid questions, but I feel I don't understand something.
According to dirstate.py, entry[0][1] is the basename of the file in UTF-8 encoding.
(I need the basename in unicode, not in UTF-8, but that's not important right now.)
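
To make the unicode point concrete, here is a minimal sketch of getting the
basename and extension from an entry key. It assumes the key has the shape
(dirname, basename, file_id) with UTF-8 byte strings; the third slot, the
sample entry and the helper name basename_and_ext are my own assumptions for
illustration, not bzrlib code. Python 3 bytes stand in for the byte strings.

```python
import posixpath


def basename_and_ext(entry):
    """Return (basename, extension) as unicode text for a dirstate-like entry.

    Assumption: entry[0] is a key tuple whose second item is the basename
    encoded as UTF-8 bytes.
    """
    basename = entry[0][1].decode('utf8')
    ext = posixpath.splitext(basename)[1]
    return basename, ext


# Hypothetical entry of the form (key, tree_details); only the key is used.
sample_entry = (
    (b'docs', 'r\u00e9sum\u00e9.txt'.encode('utf8'), b'file-id-1'),
    [],
)
name, ext = basename_and_ext(sample_entry)
```

No path join is needed here, because only entry[0][1] is read.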

You said before that tree.id2path is an expensive operation, and if I understand
you correctly, using entry[0] is faster. For iter_changes I have already done this,
and it works perfectly.
But in other places I don't have a dirstate entry to work with, only a file_id.

I looked at the implementation of the WorkingTree4.id2path method and I see that
entry[0] is actually used in that method:

     @needs_read_lock
     def id2path(self, file_id):
         "Convert a file-id to a path."
         state = self.current_dirstate()
         entry = self._get_entry(file_id=file_id)
         if entry == (None, None):
             raise errors.NoSuchId(tree=self, file_id=file_id)
         path_utf8 = osutils.pathjoin(entry[0][0], entry[0][1])
         return path_utf8.decode('utf8')

So I'm totally confused. Can you shed some light on why you said:

"id2path is a surprisingly expensive operation, and if you're doing it
repeatedly, it's best to do something else instead."

Is this because searching the dirstate for a file_id is slow?
Or because self.current_dirstate() is called every time?

> 
>> But what about path2id? Actually it's not a big deal if path2id is slow,
>> because right now it's used only in WT2&3 supporting code (in HashCache).
> 
> Also preferable to avoid path2id.  If paths2ids is good enough for you,
> use that.  Generally, the multiple-result operations are more efficient
> than the single-result operations.
> 
> Aaron
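
As a toy illustration of the single-result versus multiple-result point, the
sketch below compares one lookup per call against one pass that answers
several ids at once. It is not how bzrlib implements id2path or paths2ids:
the entry shape (dirname, basename, file_id), the linear scan, and the names
id2path_single and ids2paths are all assumptions made up for this example.

```python
def _join(dirname, basename):
    """Join UTF-8 byte components and return the path as unicode text."""
    path = basename if not dirname else dirname + b'/' + basename
    return path.decode('utf8')


def id2path_single(entries, file_id):
    """Toy single-result lookup: scans the entries again on every call."""
    for (dirname, basename, entry_id), _details in entries:
        if entry_id == file_id:
            return _join(dirname, basename)
    raise KeyError(file_id)


def ids2paths(entries, file_ids):
    """Toy multiple-result lookup: one scan answers all requested ids."""
    wanted = set(file_ids)
    found = {}
    for (dirname, basename, entry_id), _details in entries:
        if entry_id in wanted:
            found[entry_id] = _join(dirname, basename)
    missing = wanted - set(found)
    if missing:
        raise KeyError(sorted(missing)[0])
    return found


# Hypothetical entries of the form (key, tree_details).
entries = [
    ((b'', b'README', b'id-1'), []),
    ((b'src', b'main.py', b'id-2'), []),
]
paths = ids2paths(entries, [b'id-1', b'id-2'])
```

With N ids, the single-result helper walks the entries N times, while the
batch helper walks them once. Whether the real cost in id2path comes from the
search or from something else is exactly the open question above.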