[RFC] Faster commit - more progress and call for input

John Arbash Meinel john at arbash-meinel.com
Fri Jun 1 16:41:46 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> For those interested, my latest code for speeding up commit is provided
> in the attached bundle. It has a noticeable impact: 50% quicker for
> small commits and up to the same for the initial commit.

Another thing you might look at is "bzr commit -q". It is doing
something really weird with the progress bar. It seems to be clearing it
 way too often (it flickers).

If we are getting flickering, that means we are writing out a string of
space characters to clear the prompt. Which may also be an indication of
a problem. (Especially since stderr is unbuffered).

> 
> While not 100% ready for merging, I think this is getting close now to
> what I hope to see in 0.17. I fully expect some rigorous
> debate/discussion on some of the proposed changes though so this is
> marked as 'RFC', not 'MERGE'.
> 
> In summary, the changes do the following:
> 
> 1. buffer the report of commit messages until the end

git doesn't show them at all, and neither does hg. Actually, I just did
"hg init; hg add a b; hg commit -m foo" and it didn't say anything. At
'add' or 'commit' time. (hg 0.9.3). (Note that if you say 'hg add' it
will report what it adds).

And I have seen that "bzr add -q" and "bzr commit -q" are quite a bit
faster.

I bring this up to say that reporting what is happening is not a strict
requirement. I think we used "added foo" as a way to let you know commit
was working, but if we finish fast enough, we don't need to do it.

> 2. remove calls to dirstate._validate that shouldn't be in production
>    code

+2 from me :)

> 3. remove locking around some low level dirstate calls, namely
>    id2path, kind and is_executable

I don't think you mean "dirstate" calls. I think you mean WorkingTree calls.

The downside to this is that it makes testing WorkingTree a bit harder.
Because we require that you have a read-lock before it will keep the
inventory (or dirstate) in memory, so not having those means callers
need to lock them directly. I think we need to look at what functions
callers really need, and consider how we want our api to look.

We could:

a) Require callers to lock things rather than having convenience locking
at all. It is a bit of a pain, but really only the caller knows how long
it is going to be using this tree before the lock should be released.
Users *should* have been doing this already.

b) Have obviously high-level commands that auto-lock, and make it clear
what commands are low-level and should be locked by the user. This might
be the best place to do it.

c) Have alternative "_no_lock()" forms which can be used by internal
code, rather than the simple calls for the rest of the api.

> 4. minor code clean-ups in commit.py
> 
> In discussions with various people, the first of these appears the most
> contentious. I had been blaming progressive writes to ~/.bzr.log for
> slowing down the initial commit so much. It now seems that
> gnome-terminal is at least as guilty. Here are some times for checking
> in a Mozilla tree of 55k files:
> 
> Progressive writes to stdout: 25m52s
> Progressive writes to stdout > /dev/null: 15m33s
> All writes to stdout but at the end: 16m41s
> All writes to stdout but at the end > /dev/null: 16m24s
> 
> Writing to ~/.bzr.log adds 30 seconds to these times: bad but nothing
> compared to gnome-terminal doing its thing. As shown above, the fastest
> possible performance is progressive stdout writing being redirected to
> /dev/null. Even so, I remain cool on progressive writing. As well as
> making the progress bar a lot less friendly, something still feels wrong
> to me about reporting things are being 'added' during logically the data
> collection phase of commit. I'm putting on my fire-retardant suit in
> expectation of the flames expected in saying 'reporting at the end is
> the right thing because commit is atomic'. Fire away!!! :-)
> 
> Regardless of the outcome of that debate, the more important bit is the
> speeding up of small commits. The bundle has some read locks commented
> out. The production code to be merged probably needs those comments
> removed and explicit tests/asserts that a read lock has been taken. Can
> someone please confirm that and point me at the preferred pattern for
> that if any?

I'm not opposed to removing the commit message by default, and enabling
it with "bzr commit --verbose". We could just have the progress bar, or
something like that. But again, what we would really like is to be fast
enough that we don't need it.

> 
> I'd like to thanks Aaron, John and Robert for some off list emails and
> discussions explaining some of the complexity behind commit I didn't
> understand. We'll get that information consolidated and available so
> more people can understand the design tradeoffs and challenges that
> commit needs to make. The good news is that the large improvements
> delivered by this patch are just the low hanging fruit. Commit can be
> quite a bit faster still, though we won't have time for more advanced
> restructuring of commit in 0.17 in all likelihood.
> 
> Looking forward to everyone's comments,
> Ian C.

That would indeed be a nice document to have cleaned up and included in
our source.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGYD45JdeBCYSNAAMRAp/jAJ4mx8CM4p5VYoTX1vH4Je9OGKVmVwCfTNBE
97tnxVcYEUlMt+IXgLJFK0M=
=3CcT
-----END PGP SIGNATURE-----




More information about the bazaar mailing list