[MERGE] Various hackery to reduce startup time.

Andrew Bennetts andrew at canonical.com
Sun Sep 14 12:57:21 BST 2008


John Arbash Meinel wrote:
[...]
> 
> To start with, this is the sort of cleanup that might be best when
> broken up into smaller chunks. As it makes it easier to review 1 file at

I agree.  This branch was originally just an experiment.  I intended to polish
it a bit more before dumping it on the list, but eventually decided it was more
useful to have it up for review than sitting on my disk.  I'm happy to extract
pieces of this and land just those.

> a time. Also, a lot of what you've done won't actually help, because
> we'll still end up loading osutils.get_user_encoding() on every 'bzr'
> invocation (because we use it to process the command line.) But I think
> it might put us in a direction to actually get there.

Right.  The thing that originally motivated me to look at that was wondering how
easy it would be to get the time for “import bzrlib” down.  As you say that's
not so useful to the 'bzr' tool today, but it might perhaps be helpful for other
clients of bzrlib.

> For now, I'd rather leave "bzrlib.user_encoding" alone, as we haven't
> deprecated it, and I think 3rd-party tools will use it. I *really* wish
> we had a way to issue a proper deprecation warning for accessing a
> module member. If it was a class we can use a property. Any ideas of
> what we could do here? (I worked on the ones for a DeprecatedList, etc.
> The problem is we don't use an *attribute* of user_encoding, we just use
> it as a string.)

Maybe we can make it be an object other than a str that warns when its __str__
is invoked?

> Lazy importing are changes that break things which the test suite cannot
> find. Because it imports *everything*. So I'd like to get something like
> this in bzr.dev as early in the next cycle as possible, so we can have
> real-world testing.

Agreed.


> +have_pywin32 = False
[...]
> +have_ctypes_win32 = False
[...]
> 
> I think it would be clearer to combine these. Also I notice that
> 'msvcrt' is repeated, which isn't really necessary. So instead:
> 
> have_pywin32 = False
> have_ctypes_win32 = False
> 
> if sys.platform == 'win32':
>   import msvcrt
>   try:
>     import win32con...
>     have_pywin32 = True
>   ...
> 
>   try:
>     import ctypes
>     have_ctypes_win32 = True
>   ...
> 
> 
> That gets us away from having imports spread throughout the code, and
> also always imports msvcrt since we need that anyway.

Good idea.  Done.

I've also moved these imports to the top of the file, rather than leaving them
buried half-way down.

(I wonder if actually bzr on win32 really needs all of these modules to be
imported all the time, or if some of these could in principle be lazy?
Something for a developer with Windows to play with, maybe...)

> 
> ...
> === modified file 'bzrlib/msgeditor.py'
> - --- bzrlib/msgeditor.py	2008-04-22 09:58:15 +0000
> +++ bzrlib/msgeditor.py	2008-09-08 12:59:00 +0000
> @@ -23,7 +23,6 @@
>  from subprocess import call
>  import sys
> 
> - -import bzrlib
>  import bzrlib.config as config
>  from bzrlib import osutils
> 
> ^- while we are here lets change it to:
> 
> from bzrlib import (
>   config,
>   osutils,
>   )

Done.

> === modified file 'bzrlib/option.py'
> ...
> +    @property
> +    def registry(self):
> +        if self._registry is not None:
> +            return self._registry
> +        else:
> +            return self._lazy_registry.get_obj()
> +
> 
> ^- Shouldn't this do:
> 
> if self._registry is None:
>   self._registry = self._lazy_registry.get_obj()
> return self._registry

Good idea.

I also added a basic test for lazy_registry to test_options.

>  def _win32_abspath(path):
> +    global _win32_abspath
> +    if win32utils.winver == 'Windows 98':
> +        _win32_abspath = _win98_abspath
> +    else:
> +        _win32_abspath = _real_win32_abspath
> +    return _win32_abspath(path)
> +
> +
> 
> self modifying functions seems a bit distasteful. I won't block it, as
> it may be a route we really want to go. It just seems a bit icky.
> 
> Oh, and in this particular case, it is wrong. In that we do:
> 
> if sys.platform == 'win32':
>   abspath = _win32_abspath
> 
> And your trickery will basically get invoked all the time anyway. (Since
> we access it via the 'abspath' variable, not the _win32_abspath.

Yeah.  In this case there's a better solution now that I look again:

 if sys.platform == 'win32':
-    abspath = _win32_abspath
+    if win32utils.winver == 'Windows 98':
+        abspath = _win98_abspath
+    else:
+        abspath = _win32_abspath

Easy :)

> However, I feel it is better than what you did here:
> 
> +    global sha_strings
> +    def sha_strings(strings, _factory=sha.new):
> 
> 
> ^- I would feel better about doing
> 
> global sha_strings
> def _sha_strings...
> 
> sha_strings = _sha_strings
> 
> I realize this works, but it seems brittle that 'def foo' will work with
> a 'global foo' statement.

I don't think it's brittle at all (def is just another form of assignment
statement in my brain), but I'm happy either way, so I've changed it.

> === modified file 'bzrlib/util/configobj/configobj.py'
> - --- bzrlib/util/configobj/configobj.py	2008-02-27 21:36:16 +0000
> +++ bzrlib/util/configobj/configobj.py	2008-03-14 17:07:55 +0000
> @@ -25,11 +25,6 @@
> 
>  import os, re
>  compiler = None
> - -try:
> - -    import compiler
> - -except ImportError:
> - -    # for IronPython
> - -    pass
> 
> ^- As we are modifying a 3rd-party library, I think we should actually
> comment it out, and include a clear comment as to why we got rid of it.
> 
> That way if we upgrade to a newer version, we will have good
> documentation about why that was removed.

Good idea.  Done.

> === modified file 'bzrlib/weave.py'
> - --- bzrlib/weave.py	2008-07-16 18:14:23 +0000
> +++ bzrlib/weave.py	2008-09-08 12:59:00 +0000
> @@ -75,6 +75,10 @@
>  import time
>  import warnings
> 
> +from bzrlib.lazy_import import lazy_import
> +lazy_import(globals(), """
> +from bzrlib import tsort
> +""")
>  from bzrlib import (
>      progress,
>      )
> 
> ^- You added a lazy "import tsort" but you didn't remove the "from
> bzrlib.tsort import topo_sort".

Oops.  Fixed.  (There's probably other improvements that could be made to the
imports in this module, but then bzrlib/weave.py isn't really that important.)

> So the biggest thing is bzrlib.user_encoding. I don't think we can get
> rid of it yet, and in general there isn't a win here for startup time,
> because we use it for command line processing. So I'd rather defer on
> that for now.

Sure.  I'll submit everything here minus the user_encoding changes to PQM.
Thanks for the review!

> PS> Any chance that you have some nice "bzr --no-plugin st" times to
> show the benefits of all this work?

I do indeed!

All times are best of 5 runs (using “for i in `seq 5` ; do time $command ;
done”.  I set my laptop cpufreq governor to “performance” as well.

bzr st of bzr.dev, one file in the tree modified:
-------------------------------------------------

With bzr.dev:

real    0m0.274s
user    0m0.252s
sys     0m0.020s

With faster-startup:

real    0m0.242s
user    0m0.232s
sys     0m0.008s

242/274 == 88%

(Times for a clean tree are essentially the same)


bzr log --line -r -1 of bzr.dev
-------------------------------

bzr.dev:

real    0m0.200s
user    0m0.176s
sys     0m0.020s

faster-startup:

real    0m0.170s
user    0m0.140s
sys     0m0.032s

170/200 == 85%


bzr st of empty standalone branch
---------------------------------

bzr.dev:

real    0m0.190s
user    0m0.156s
sys     0m0.020s

faster-startup:

real    0m0.157s
user    0m0.128s
sys     0m0.024s

157/190 == 83%

I'd expect there to be a noticeable improvement for cold-cache times too, but I
haven't checked.

-Andrew.




More information about the bazaar mailing list