Why I prefer rebase to merge. Is there a better alternative?

Mon Oct 19 05:11:47 BST 2009

Andrew Bennetts writes:

 > If you rewrite history so that it always looks like the code was
 > like that, you now have a branch that always had that bug.  You've
 > lost information about when it really happened.

The reason I prefer rebase is that I see histories like this all too
often (bare numbers are the public mainline, A and B denote
developers' personal workspaces):

  A1 ---A2 -------- A3
 /   \     \       /   \
0 --- 1 --- 2 --- 3 --- 4 --- ...
 \           \   /
  B1 -------- B2

Now where did that bug come in?  How do you do a bisect on that?
Also, there's no way to distinguish Dev A's "junk" branch from a
targeted "feature" branch like Dev B's.  In the XEmacs repo there are
several cases where a single developer has up to 10 branches pending
merge (as you can see in "hg view").  Then they have a merge party, and
what could have been ten simple patches in a row becomes first a
Bill the Cat-scale hairball, then a long series of "automatic merge"
logs as the DAG collapses back into the mainline.
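To make "how do you bisect that?" concrete, here is a rough Python sketch.  It encodes one possible reading of the diagram above as a parent map (which stroke joins which commit is an interpretation of the ASCII art).  It then computes the range a bisection would have to search, taken as the commits reachable from the bad revision but not from the good one, which is the rule git bisect uses.  Finally, it flags merges that are bad even though every parent is good.  The names PARENTS, ancestors, bisect_range and merge_only_bugs are hypothetical, made up for illustration.

```python
# One reading of the diagram: each commit maps to its list of parents.
# Commits with two parents are merges.
PARENTS = {
    "0": [],
    "A1": ["0"], "1": ["0", "A1"],
    "A2": ["A1"], "2": ["1", "A2"],
    "B1": ["0"], "B2": ["B1", "2"],
    "3": ["2", "B2"],
    "A3": ["A2", "3"], "4": ["3", "A3"],
}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def bisect_range(good, bad, parents):
    """Commits that could have introduced the bug: reachable from bad, not from good."""
    return ancestors(bad, parents) - ancestors(good, parents)


def merge_only_bugs(parents, is_bad):
    """Merge commits that are bad although every one of their parents is good."""
    return [c for c, ps in parents.items()
            if len(ps) > 1 and is_bad(c) and not any(is_bad(p) for p in ps)]


if __name__ == "__main__":
    print(sorted(bisect_range("0", "4", PARENTS)))
    print(merge_only_bugs(PARENTS, lambda c: c in {"3", "A3", "4"}))
```

With the mainline endpoints 0 (good) and 4 (bad), bisect_range returns nine commits spread over three lines of development rather than one line.  If the bug shows up only in 3 and its descendants, merge_only_bugs points at the merge 3 itself, whose parents 2 and B2 are both clean, so tracing either branch alone finds nothing.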

 > Note that even if you do a rebase where you confirm the results are
 > good by running the full test suite for every newly synthesised
 > revision you can still have this problem because you may have
 > introduced a bug that the current test suite doesn't notice.

But with a merge you would still have the problem of not easily
finding the bug, and there is no way to locate it by simple
bisection: you have to trace each merged branch separately, and if
it's a true merge bug, you won't find it in either one.  I have also
found that "true" merge bugs (those that manifest at parallel merge
points) tend to be N1xN2 complex (where Ni is the number of lines
committed in each branch since divergence), while those that manifest
in a series merge (rebase) tend to be N1xP* complex (where P* is the
line count of the offending patch in the second series).  That is
because you find the offending patch in the later series by manual
bisection (the worst case, where the recipe to elicit the bug is hard
to script), and then you only need to compare it to the whole change
introduced by the earlier series of commits.
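For contrast, bisecting a linear (rebased) series is an ordinary binary search.  The sketch below is a minimal illustration, not any tool's implementation.  It assumes the first commit is known good, the last is known bad, and that once the bug appears every later commit also has it.  The function first_bad and its is_bad predicate (a stand-in for running the recipe that elicits the bug) are hypothetical.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and badness is
    monotonic along the series. Returns (first bad commit, tests run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1  # one run of the (possibly manual) recipe
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests


if __name__ == "__main__":
    print(first_bad(list(range(1024)), lambda c: c >= 700))
```

With 1024 commits and the bug introduced at commit 700, this returns (700, 10): ten runs of the recipe, about log2 of the series length.  That is what keeps manual bisection of a series tolerable even when the recipe can't be scripted.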

This may be due to the crazy-quilt workflow that I am stuck with
in non-rebasing projects.  Bazaar, with its more disciplined workflow,
may not see the problem at all.

 > Actually, I think that's good data to preserve.

In my git projects, I've always done that: the first thing I do
before rebasing is tag the branch head.

 > It's often interesting and useful to know how long branches take to
 > develop, so you can try to improve bottlenecks in your development
 > process.  (Is there a long gap between branch finished and branch
 > landed?  etc.)

That information is present in a proper rebase, though, because the
original author dates are kept on the rebased commits (in git, only
the committer date is updated).

 > I agree that often this extra will be basically ignored, but at the
 > same time the cost of preserving it is minimal.  Note that how you
 > present changes to upstream for review is a separate issue to what
 > the branch history looks like!

But you usually do not have access to the "presentation for review"
after the branch has been landed.  And in projects where the main
developers are unpaid volunteers, it's hard to enforce formal review
processes unless your project is really sexy.

I'm not arguing that rebase is necessarily a "good" workflow, but it
does have advantages in my view over an undisciplined (parallel)
merge-based workflow.

Regards,
Steve