bzr struggling with large trees

John Arbash Meinel john at arbash-meinel.com
Sun Oct 16 23:17:28 BST 2005


Rob Holland wrote:
> Hi,
> 
> I've tried importing a tree with around 79,000 files total. The import
> didn't go too badly, but the add/commit times are really painful.

Well, I can say from the start that the part that is killing you is the
parsing of the inventory XML. Do you have cElementTree installed? That
would probably give the biggest performance boost for what you are seeing.
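
A quick way to check is to try the import yourself. This is only a rough
sketch: the fallback module and the tiny XML snippet are my assumptions for
illustration, not what bzr itself does or what the real inventory looks like.

```python
# Sketch: prefer the C-accelerated parser when it can be imported.
try:
    import cElementTree as ElementTree  # standalone C extension, if installed
    accelerated = True
except ImportError:
    # Assumed fallback: the ElementTree shipped in the standard library.
    from xml.etree import ElementTree
    accelerated = False

# Hypothetical two-entry document, just to show the API is the same either way.
root = ElementTree.fromstring(
    '<inventory><entry name="a"/><entry name="b"/></inventory>'
)
entry_names = [entry.get("name") for entry in root]
print("accelerated parser:", accelerated)
print("entries:", entry_names)
```

If the first import fails, you are on the slower parser.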

As for a long-term solution, I'm not really sure, because right now
we store the entire inventory in a single file. So if you add a file,
the entire inventory needs to be read, modified, and then written out again.
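
To make the cost concrete, here is a minimal sketch of that
read-modify-write cycle. The element and attribute names and the
add_entry() helper are made up for the example; they are not the real
inventory format or bzr's code.

```python
from xml.etree import ElementTree


def add_entry(inventory_xml, name):
    """Add one entry to a single-document inventory (hypothetical format)."""
    root = ElementTree.fromstring(inventory_xml)     # parses every entry
    ElementTree.SubElement(root, "file", name=name)  # the one real change
    return ElementTree.tostring(root, encoding="unicode")  # rewrites every entry


# Build a stand-in inventory with 1000 entries.
inventory_xml = (
    "<inventory>"
    + "".join('<file name="f%d"/>' % i for i in range(1000))
    + "</inventory>"
)

updated = add_entry(inventory_xml, "testing-bzr2")
entry_count = len(ElementTree.fromstring(updated))
print(entry_count)
```

Adding a single name still parses and serializes all the existing entries,
so the work grows with the size of the tree rather than the size of the change.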

> 
> tigger at xahn % echo > testing-bzr2
> tigger at xahn % bzr --profile add testing-bzr2
> added testing-bzr2
>          5129837 function calls (4790586 primitive calls) in 8.295 CPU seconds
> 
>    Ordered by: cumulative time
>    List reduced from 182 to 20 due to restriction <20>
> 
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000    8.295    8.295 commands.py:258(run_argv)
>         1    0.000    0.000    8.295    8.295 builtins.py:184(run)
>         1    0.366    0.366    8.295    8.295 add.py:60(smart_add)
>         1    0.002    0.002    7.928    7.928 add.py:73(smart_add_branch)
>         2    0.000    0.000    3.820    1.910 branch.py:536(read_working_inventory)
>         2    0.093    0.047    3.819    1.910 xml.py:54(read_inventory)
>         2    0.362    0.181    3.726    1.863 xml5.py:105(_unpack_inventory)
>         1    0.000    0.000    3.577    3.577 branch.py:548(_write_inventory)
>         1    0.000    0.000    3.576    3.576 xml.py:43(write_inventory)
>    197884    1.195    0.000    3.079    0.000 xml5.py:124(_unpack_entry)
>         1    0.000    0.000    2.655    2.655 xml.py:69(_write_element)
>         1    0.000    0.000    2.655    2.655 ElementTree.py:652(write)
>   98944/1    0.748    0.000    2.655    2.655 ElementTree.py:662(_write)
>         1    0.000    0.000    2.588    2.588 branch.py:1055(working_tree)
>    296681    1.080    0.000    1.462    0.000 ElementTree.py:812(_escape_attrib)
>    158059    0.594    0.000    1.126    0.000 inventory.py:526(__init__)
>         1    0.109    0.109    0.921    0.921 xml5.py:36(_pack_inventory)
>    197885    0.660    0.000    0.660    0.000 inventory.py:219(__init__)
>     39826    0.525    0.000    0.654    0.000 inventory.py:443(__init__)
> 339247/98944    0.481    0.000    0.629    0.000 inventory.py:749(iter_entries)
> 
> tigger at xahn % bzr --profile commit -m "another test" testing-bzr2
>          20134289 function calls (18783355 primitive calls) in 41.949 CPU seconds
> 
>    Ordered by: cumulative time
>    List reduced from 339 to 20 due to restriction <20>

Here you are running into the overhead of "mutter()", with its 197912
calls. I'm not sure why you have that many calls, though considering it
accounts for only 9.5 sec of your total time, each call isn't really expensive.
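
As a back-of-the-envelope check, using only the numbers from the debug()
line of your profile (the variable names are just for this sketch):

```python
# Figures copied from the profile output for __init__.py:935(debug).
debug_calls = 197912
debug_cumtime_s = 9.521
total_time_s = 41.949

# Average cost of one call, in microseconds.
per_call_us = debug_cumtime_s / debug_calls * 1e6
# Share of the whole commit spent under debug(), in percent.
share_pct = debug_cumtime_s / total_time_s * 100

print("%.1f us per call, %.1f%% of the commit" % (per_call_us, share_pct))
```

So each call is cheap on its own; it is the number of calls that adds up.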

> 
>    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>         1    0.000    0.000   41.949   41.949 commands.py:258(run_argv)
>         1    0.000    0.000   41.949   41.949 builtins.py:996(run)
>         1    0.691    0.691   41.947   41.947 branch.py:1017(commit)
>         1    0.003    0.003   41.254   41.254 commit.py:154(commit)
>         1    0.449    0.449   12.527   12.527 commit.py:361(_populate_new_inv)
>    197912    0.449    0.000    9.521    0.000 __init__.py:935(debug)
>    197912    0.588    0.000    8.588    0.000 __init__.py:1055(_log)
>         1    0.752    0.752    8.394    8.394 commit.py:338(_store_snapshot)
>         3    0.565    0.188    7.574    2.525 xml5.py:105(_unpack_inventory)
>         1    0.033    0.033    6.746    6.746 commit.py:245(_record_inventory)
>    296827    3.187    0.000    6.580    0.000 xml5.py:124(_unpack_entry)
>         2    0.034    0.017    6.227    3.114 branch.py:837(get_inventory)
>         2    0.184    0.092    6.014    3.007 xml.py:51(read_inventory_from_string)
>         1    0.176    0.176    5.333    5.333 xml.py:48(write_inventory_to_string)
>     98945    0.108    0.000    5.134    0.000 weave.py:92(get_weave_or_empty)
>     98947    1.119    0.000    5.026    0.000 weave.py:75(get_weave)
>    197912    0.423    0.000    4.618    0.000 __init__.py:1070(handle)
>                                                                          
> Anything more I can do to help try and solve the performance issues, let
> me know.

How does this compare with other trees that you have used? 79k files
seems like a lot, and I certainly think tla/baz would have done very
poorly too. bk would do a lot better, since it requires an explicit
"bk edit" before you change a file, so it already knows which files to
look at. But have you tried git?
> 
> I can be found as "tigger^" on irc.freenode.org if that helps, I sit in
> #bzr.

Hopefully we can help you a little bit,
John
=:->

> 
> Cheers
> 
