Wrote a small black-box test suite for bazaar

Wed Aug 27 11:26:41 BST 2008

Robert Collins <robertc <at> robertcollins.net> writes:

> 
> On Tue, 2008-08-26 at 21:07 +0200, Geoff Bache wrote:
> > 
> > Hi all,
> > 
> > I attach this in case anyone thinks it might be useful. I don't mean
> > to imply that Bazaar is inadequately tested, but I thought this might
> > be a useful addition to its current testing. It didn't take me long to
> > produce it and I think you could easily produce a lot more of this kind
> > of test quite quickly.
> 
> Are you aware of our black box tests in bzrlib.tests.blackbox.test_*?
> Comprehensive tests for all our command-layer handling live there
> currently.

Yes, I even referred to this in my original posting :)

> 
> From your description of your tests they will be extremely fragile to
> changes in any defaults: E.g. the transition from knits to packs would
> have changed the output data radically. 

Well yes, but surely this kind of thing doesn't happen every day? And in any
case, it doesn't compare every file unless you tell it to, so all that would be
required in this case is for somebody to go through the tests that change the
repository, check that the new output is correct and save it as the expected
results.
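
To illustrate the "compare with saved expected results" idea, here is a rough
sketch in Python. It is not how TextTest is implemented; the function name
run_and_compare, the update flag and the file name are made up for the example,
and the Python interpreter stands in for the program under test.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_compare(command, expected_file, update=False):
    """Run `command` in a child process and compare its stdout with a saved file.

    Returns True when the output matches the saved expected result.
    With update=True the new output is saved as the expected result instead,
    which is what you would do after checking a deliberate change by hand.
    (Hypothetical helper, for illustration only.)
    """
    result = subprocess.run(command, capture_output=True, text=True)
    expected_path = Path(expected_file)
    if update:
        expected_path.write_text(result.stdout)
        return True
    return result.stdout == expected_path.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        expected = Path(tmp) / "output.expected"
        # Stand-in for the real command line being tested.
        command = [sys.executable, "-c", "print('hello')"]
        run_and_compare(command, expected, update=True)  # save the baseline
        print(run_and_compare(command, expected))        # later runs compare
```

When a default changes on purpose, the saved files are regenerated with
update=True once somebody has looked at the differences.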

In the case of stored repository data (a) it's useful to test back-compatibility
and (b) it is easy to upgrade the test repositories because it has to be easy to
upgrade them for your users in any case.

> Fork+exec is also slow when done
> thousands of times, which is why we only fork & exec sufficient to test
> that the test harness is robust and beyond that do it all in-process.

OK. I'm not claiming that this kind of test can replace your entire test suite,
or that that is even practical at this stage of development. I doubt whether the
extra time taken by fork+exec is going to have that much of an impact as a
percentage of total time, but you'd need to write a lot of tests to find out.
TextTest has support for parallelising the tests on several machines if slowness
becomes an issue. With a test-suite, the important thing is whether it can be
run (a) "while I watch" (b) "while I get a coffee" or (c) "only at night" so
performance changes of the order of (say) 10% don't make much difference in
practice.
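
One way to put a number on the fork+exec cost without writing a lot of tests
first is to time the process startup on its own. The sketch below is only an
illustration: the helper names are made up, and starting the Python interpreter
with an empty script stands in for starting the real command.

```python
import subprocess
import sys
import time


def average_launch_seconds(command, runs=5):
    """Average wall-clock time to start `command` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True, capture_output=True)
    return (time.perf_counter() - start) / runs


def overhead_fraction(launch_seconds, test_seconds):
    """Share of one test's total time that goes on process startup."""
    return launch_seconds / (launch_seconds + test_seconds)


if __name__ == "__main__":
    # Stand-in command: an interpreter that starts and exits immediately.
    launch = average_launch_seconds([sys.executable, "-c", "pass"])
    # Assumed figure for the useful work done by one test, in seconds.
    assumed_test_seconds = 1.0
    print("startup per test: %.3fs" % launch)
    print("share of total: %.1f%%" % (100 * overhead_fraction(launch, assumed_test_seconds)))
```

The measured startup time multiplied by the number of tests gives a lower bound
on what running everything out-of-process would add.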

> 
> That said though, what sort of coverage would the thing you have put
> together *add* to our current testing?

As it is, not much. It's an example only so far and there isn't any point
increasing the size of it unless there is some interest from the community in
taking it further. But testing in this way could in my estimation go all the way
to 100% statement coverage if you wanted it to (although covering every except
clause is usually not economic), as well as the advantages mentioned in the
original mail.

Regards,
Geoff Bache