[plugin] dirstate experiments

John Arbash Meinel john at arbash-meinel.com
Wed Jun 14 23:41:18 BST 2006


John Arbash Meinel wrote:
> Robert Collins wrote:
> 
> ...
> 
>> Well, there are two cases here. The stat cache is just a validator for
>> the fs, so unpacking it does not make sense. However, the sha1 is stored
>> in the inventory in commit, and code expects to get at it, so I think
>> having the sha be accessible is a good idea - I'd rather pay a little
>> bit more in size for a format that is directly usable in the inventory
>> logic, as streaming reads are fast - it's seeks that are slow.
>>
>>> It most certainly is not editable in that form, but it is readable.
>>>
>>> Also, it turns out that the major overhead at this point is the giant
>>> 'text.decode()', plus the overhead of operating on unicode objects
>>> rather than operating on str() objects.
>> I was wondering when that would start to bite :(.
> 
> 

...

> 
> This is the breakdown based on number of parent entries:
> parents  str    unicode  file_size
> 0        100ms  150ms    3.1MB
> 1        177ms  268ms    5.7MB
> 2        275ms  352ms    5.9MB
> 3        370ms  470ms    6.1MB
> 

I have good news and bad news about the above numbers, and they are the
same thing: I was unable to beat these numbers with a C++ extension.
str.split() is just really, really fast, and slicing the resulting list
is also super fast.
Maybe with an alternate format we could do better in C++, but lots of
calls to string.find (both C++ std::string and Python str) are just much
worse than building up the list with str.split() and then slicing into
that list.
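
To make the comparison concrete, here is a rough Python sketch of the two
styles. The record layout (a fixed number of newline-terminated fields per
entry), the function names and the sample data are assumptions made up for
illustration; this is not the actual dirstate format or the code that was
benchmarked.

```python
# Hypothetical layout: each entry is FIELDS_PER_ENTRY newline-terminated fields.
FIELDS_PER_ENTRY = 3


def parse_split(text):
    """Split once on newlines, then slice the flat list into entries."""
    fields = text.split('\n')
    # A trailing newline leaves one empty string at the end; drop it.
    if fields and fields[-1] == '':
        fields.pop()
    return [fields[i:i + FIELDS_PER_ENTRY]
            for i in range(0, len(fields), FIELDS_PER_ENTRY)]


def parse_find(text):
    """Walk the text with repeated find() calls, one per field."""
    entries = []
    entry = []
    pos = 0
    while True:
        end = text.find('\n', pos)
        if end == -1:
            break
        entry.append(text[pos:end])
        pos = end + 1
        if len(entry) == FIELDS_PER_ENTRY:
            entries.append(entry)
            entry = []
    return entries


sample = "a.txt\nfile\nsha-a\nb.txt\nfile\nsha-b\n"
entries = parse_split(sample)
```

Both functions return the same list of entries; the first does almost all of
its work inside str.split() and list slicing, while the second pays for a
find() call and a string slice per field.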

I tried quite a few methods, but a list comprehension with slicing is
pretty much impossible to beat.
Maybe if we weren't splitting on newline characters we could use some
sort of inner loop in C++ that would handle length-prefixed strings (like
the blob format I worked on).

But split-plus-slicing is just so much better than anything else I was
able to create.
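
For reference, the length-prefixed idea mentioned above could look roughly
like the following, shown in Python rather than C++ for brevity. The 4-byte
big-endian length prefix and the helper names are assumptions for
illustration only; the blob format itself is not described in this message.

```python
import struct

# Assumed framing: a 4-byte big-endian unsigned length, then that many bytes.
LENGTH_PREFIX = '>I'
PREFIX_SIZE = struct.calcsize(LENGTH_PREFIX)


def pack_fields(fields):
    """Serialise a list of byte strings as length-prefixed records."""
    return b''.join(struct.pack(LENGTH_PREFIX, len(f)) + f for f in fields)


def unpack_fields(data):
    """Read length-prefixed records back; no delimiter scanning is needed."""
    fields = []
    pos = 0
    end = len(data)
    while pos < end:
        (length,) = struct.unpack_from(LENGTH_PREFIX, data, pos)
        pos += PREFIX_SIZE
        fields.append(data[pos:pos + length])
        pos += length
    return fields


packed = pack_fields([b'a\nb', b'', b'xyz'])
fields = unpack_fields(packed)
```

Because each field carries its own length, a field may contain newlines, and
the reader jumps straight from one field to the next instead of searching for
a separator.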

John
=:->

