Replacing an expensive proprietary CM system with bzr.
Robert Collins
robertc at robertcollins.net
Tue Feb 26 20:05:48 GMT 2008
On Tue, 2008-02-26 at 20:40 +0100, Jurgen Defurne wrote:
> I am currently doing an investigation to see if it would be possible
> (long term) to replace a commercial CM solution with a combination of
> Bazaar and mySQL. I only need to focus on Bazaar, since a whole lot of
> the bug tracking is currently supported by mySQL for caching and
> speed.
Cool
> * Necessary features for switching from the other system to bzr
> ** Graphical interface in Windows : context menu like TortoiseSVN
There is a rough TortoiseBZR at the moment, and I believe it is being
actively improved. Certainly we consider this an important feature.
> ** Integration of bug tracker and VCS via graphical interface
Bazaar supports metadata in commits; the bzr command line client has a
'--fixes' option that will record a bug as fixed. I'm fairly sure this
is also supported in the main GUI clients today, but if not it should be
quite trivial to add. ('bzr help bugs' gives more details about this
feature.)
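As a sketch (the bug id and message here are purely illustrative; 'lp:'
refers to the Launchpad tracker, and other trackers can be configured -
see 'bzr help bugs'):

```shell
# Record that this commit fixes bug 123456 in the Launchpad tracker.
# The bug id and commit message are hypothetical examples.
bzr commit --fixes lp:123456 -m "Fix overflow in parser"
```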
> ** Good graphical history view of objects
I think 'bzr viz' is quite good - certainly it has been reused by other
modern VCS tools to provide their graphical history view.
> ** NestedTreeSupport
This is currently immature; the bulk of the development work has been
completed, but the developers involved have prioritised performance
improvements over nested trees in the short term. We have a group
get-together next week and this is on the agenda. Nested trees are an
important feature :).
> ** How does Bazaar handle databases of 80 Gb and more ?
> The main question here is, how can you improve on the speed of the
> central repository when several developers at once are doing
> updates or checkouts ? When the build manager is tagging 211
> different checkouts in one tree ?
Well, I can't speak for how it handles that today, as I don't have the
facilities to realistically test at that scale, but I can give you a few
thoughts about where we are now and what we are doing in the future, in
two parts: working tree and repository.

For the repository: it is part of the specification for nested trees
that you can have nested trees while still keeping a separate history
database for each tree. This lets you partition the IO workload across
your servers. Each repository has a set of read-only data files (packs);
commits create new files, and from time to time existing files are
combined to prevent huge seek activity when accessing your database. The
combining operation gives increased locality of reference for related
data and helps with scaling up. Our indices follow a similar scheme,
with each index keyed to a specific pack file. Currently we buffer in
memory the region of each index that is accessed during an operation,
for performance; in extremely large databases this could lead to memory
pressure, but we can reduce or eliminate that buffering, and we already
have plans to tweak the index format to reduce the need for it. Updates
and checkouts only perform reads on the central repository, and the same
disk blocks will be read for developers working on the same region of
your project, allowing OS disk cache hits. There are developers working
on better delta logic too, which should approximately halve the size of
your historical database compared with the current bzr storage.
For the working tree - the checkout on disk that you commit to - the
primary indicator of performance is not the number of bytes in the
tree, but rather the total number of paths, plus the number of lines in
any modified files. Tagging any number of subtrees could mean two
things: a commit in the top-level tree followed by making a new branch,
or just making a new branch from a previously known point in time. For
the former, bzr will first perform essentially 'bzr status' on each
tree to detect changes, and then record a new inventory for the
top-level tree, which from the sound of it should be fairly small.
Making a new branch in bzr requires writing only a few KB of data when
a shared repository is in use (and you'll likely want one :)), so it
should be nearly instant.
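A sketch of that setup (all paths and the revision number are
hypothetical): with a shared repository, every branch under it shares
one history database, so cutting a branch from a known revision writes
only branch metadata.

```shell
# Create a shared repository; branches created below it share history.
bzr init-repo /srv/bzr/project

# Create the mainline branch inside it (or push an existing one there).
bzr init /srv/bzr/project/trunk

# Branch from a previously known point in time: only a few KB of
# metadata are written, since the revisions already live in the
# shared repository.
bzr branch -r 1000 /srv/bzr/project/trunk /srv/bzr/project/release-1.0
```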
> ** Daily build and acceptance work-flow description
>
> Developers check out their private checkout, and use it for
> development.
>
> At the end of the day a program starts an update and a build. In
> the morning, the test team checks the build
> results and gives later in the day the go ahead to say that the
> particular revision is all right. However, as part of the build,
> there may be libraries and executables taken in, which need to be
> committed. Does this mean that prior to daily acceptance no one may
> commit, or is there another solution possible with bzr ?
Branches are a very useful way to represent concurrency. I would say
here that your acceptance tool should have its own branch. When it runs
it will:
 - pull --overwrite the mainline into its branch
 - test
 - commit the tree if changes were made
Then whoever is responsible for checking acceptance does:
 - merge from the acceptance tool's branch
 - commit
At this point developers will get the acceptance tool's changes and
libraries when they next update from the mainline.
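A minimal sketch of that acceptance loop (the URLs, paths and build
commands are placeholders, not real infrastructure):

```shell
#!/bin/sh
# Runs inside the acceptance tool's own branch checkout.
cd /build/acceptance-branch

# 1. Bring in the current mainline, discarding any local divergence.
bzr pull --overwrite http://server/bzr/project/trunk

# 2. Build and test; generated libraries land in the working tree.
make && make check || exit 1

# 3. Record any generated artifacts as a new revision.
#    ('|| true' because commit fails harmlessly when nothing changed.)
bzr commit -m "Nightly build artifacts" || true

# Whoever checks acceptance then runs, in a mainline checkout:
#   bzr merge http://server/bzr/project/acceptance
#   bzr commit -m "Accept nightly build"
```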
> After daily acceptance, it should be possible to say to the
> developers to which revision they may update their checkout.
>
> Developers are responsible for merging their work against the
> latest accepted (promoted) status.
> ** Multi-site work flow description
> Daily check-ins are not sent to each repository. Rather, all
> projects get assigned responsibility to one subsystem, and these
> are developed in the context of other subsystems. There is a weekly
> release. This weekly release should be taken into the other
> repositories. The biggest problem with this part is the deletion of
> objects.
>
> I suppose that in the case of daily-build and multi-site work flow,
> the solutions are lying in the distributed development model of
> bzr. In this case, I should probably first do a comparison of terms.
>
> | Proprietary | bzr |
> |-------------+-----------------------|
> | database | repository |
> | project | light-weight checkout |
> | version | revision |
>
> The other system has an export mechanism which makes it possible to
> pack a certain revision of a project into a package which can be
> transferred to another database and recreated there. This mechanism
> can be used differentially, in which case it is possible to send
> only deltas between the originating and the receiving databases. I
> suppose the only available mechanism in lieu of this is probably
> diff and patch.
Why not push and pull?
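A sketch, with hypothetical site URLs: each site publishes its own
subsystem branch and mirrors the others with pull, which transfers only
the missing revisions - effectively the differential export/import you
describe, with no separate packaging step.

```shell
# At the subsystem's home site: publish the weekly release.
bzr push sftp://siteB.example.com/bzr/subsystem-a

# At a receiving site: refresh the mirror; only new revisions since the
# last pull are fetched over the wire.
cd /bzr/mirrors/subsystem-a
bzr pull http://siteA.example.com/bzr/subsystem-a
```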
> I should be running checks to see which transport mechanism is the
> fastest. I have already established that the file:// protocol, over
> a CIFS share is very slow.
What latency do you have involved between your workstation and the CIFS
share?
How many files in the checkout?
How many revisions?
(bzr info -v will answer some of this).
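To gather those numbers, run this inside the checkout (the path is a
placeholder):

```shell
cd /path/to/checkout
# Verbose info reports the branch/repository formats, number of files
# in the working tree, and revision counts - the figures asked for
# above.
bzr info -v
```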
You may be running into scaling issues with your tree size - in which
case I'd be delighted to help you track that down sufficiently that I
can file a bug report (which can then be fixed).
> ** The way access rights have to be assigned to a repository is not
> clear
> Especially since I am now doing experiments in a Cygwin environment,
> I got twice problems with locks which could not be removed from the
> repository after doing a checkout.
Cygwin is, in general, very slow (I spent some time as a Cygwin
developer - this is not a criticism). I strongly recommend using the
native Windows Python version of bzr; we have done the porting work in
bzr itself, which means less overhead and better performance.
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
More information about the bazaar mailing list