Call for testing: cvs2bzr

Greg Ward greg at gerg.ca
Wed Aug 19 20:06:16 BST 2009


On Wed, Aug 19, 2009 at 12:30 AM, Ian
Clatworthy <ian.clatworthy at canonical.com> wrote:
> That's pretty well it. We *could* handle a separate blobs file but it's
> nicer w.r.t. memory consumption for us to go the inline blob path à la
> hg. Unlike hg though, bzr has no limitations w.r.t. merge parent count.
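For readers unfamiliar with the two "blob paths" being discussed: in the git fast-import format, file content can either be sent up front as a separate `blob` command carrying a mark that a later commit references, or sent inline right after the commit's `M` (filemodify) command. The sketch below writes both forms for a one-file commit. The helpers `data_cmd` and `emit_commit` are hypothetical, and the committer name, email, and timestamp are placeholders.

```python
def data_cmd(payload: bytes) -> bytes:
    """fast-import 'data' command: exact byte count, then the raw bytes."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def emit_commit(path: str, content: bytes, inline: bool) -> bytes:
    """Build a one-commit fast-import stream (hypothetical helper)."""
    out = []
    if not inline:
        # Separate blob: sent before the commit, referenced later by mark :1.
        out.append(b"blob\nmark :1\n" + data_cmd(content))
    out.append(b"commit refs/heads/master\nmark :2\n")
    out.append(b"committer A U Thor <author@example.com> 1250708776 +0100\n")
    out.append(data_cmd(b"initial import"))
    if inline:
        # Inline blob: the content follows the filemodify command directly.
        out.append(b"M 644 inline %s\n" % path.encode() + data_cmd(content))
    else:
        # Reference the previously sent blob by its mark.
        out.append(b"M 644 :1 %s\n" % path.encode())
    return b"".join(out)


print(emit_commit("README", b"hello\n", inline=True).decode())
```

The memory trade-off follows from the format: with inline blobs the importer can write each file's content as soon as it reads the commit, whereas with marked blobs it must keep every blob available until the end of the stream, since any later commit may refer to it.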

But keep in mind that inline blobs make the dump file much, much
larger.  That'll be troublesome for large conversions.  I implemented
a rather vile hack in hg-fastimport to make it handle separate blobs:
write each blob to .hg/blobs/<blobmark>.  Then rm -rf .hg/blobs at the
end of conversion.  It's slow and doubles the disk space overhead, but
at least it doesn't suck up RAM.  And it's still less disk space than
inline blobs.
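The hg-fastimport hack described above (one file per blob mark, deleted wholesale at the end) might look roughly like this. This is a minimal sketch, not the actual hg-fastimport code: the class name `DiskBlobStore` is hypothetical, a private temporary directory stands in for `.hg/blobs`, and marks are assumed to be the integer part of `:<idnum>`.

```python
import os
import shutil
import tempfile


class DiskBlobStore:
    """Keep blobs on disk, one file per mark, instead of in RAM (hypothetical)."""

    def __init__(self, parent_dir=None):
        # Stand-in for .hg/blobs: a private scratch directory.
        self.root = tempfile.mkdtemp(prefix="blobs-", dir=parent_dir)

    def _path(self, mark):
        # One file per blob, named after its integer mark.
        return os.path.join(self.root, str(int(mark)))

    def put(self, mark, data):
        with open(self._path(mark), "wb") as f:
            f.write(data)

    def get(self, mark):
        with open(self._path(mark), "rb") as f:
            return f.read()

    def cleanup(self):
        # Equivalent of "rm -rf .hg/blobs" at the end of the conversion.
        shutil.rmtree(self.root, ignore_errors=True)
```

Each blob costs one write and one read, and the scratch directory roughly duplicates the blob data on disk until `cleanup()` runs, which matches the "slow, doubles disk space, but spares RAM" trade-off described above.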

(My "clever" idea for handling blobs: keep a dict mapping blob mark to
file offset.  Then when we need a blob, seek to that offset and read
the required number of bytes.  Never got around to implementing this,
and I'm not sure if it would save much I/O.  Fewer writes, I suppose.)
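The unimplemented offset idea could be sketched as a single scan of the dump file that records where each marked blob's bytes start, followed by on-demand seeks. This is a sketch under stated assumptions: `index_blobs` and `read_blob` are hypothetical names, only exact-count `data <n>` commands are handled (the delimited `data <<EOF` form is not), and the stream is assumed to be a seekable binary file.

```python
import io


def index_blobs(f):
    """Map blob mark -> (offset, length) by scanning a fast-import stream once.

    Assumes exact-count 'data <n>' commands; delimited 'data <<EOF' is not handled.
    """
    index = {}
    in_blob = False
    mark = None
    while True:
        line = f.readline()
        if not line:
            break
        if line == b"blob\n":
            in_blob, mark = True, None
        elif in_blob and line.startswith(b"mark :"):
            mark = int(line[len(b"mark :"):])
        elif line.startswith(b"data "):
            count = int(line[len(b"data "):])
            offset = f.tell()
            if in_blob and mark is not None:
                index[mark] = (offset, count)
            # Skip every payload (blobs, commit messages, inline content) so
            # that bytes inside it are never mistaken for commands.
            f.seek(count, io.SEEK_CUR)
            in_blob = False
    return index


def read_blob(f, index, mark):
    """Fetch one blob's bytes on demand by seeking back into the dump file."""
    offset, count = index[mark]
    f.seek(offset)
    return f.read(count)
```

Compared with spilling blobs to separate files, this writes nothing extra; the cost is one seek plus one read per blob lookup, and the dump file must stay available (and seekable) for the whole conversion.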

>> When Greg Ward is further along on cvs2hg [1], we should think about
>> refactoring the common code a little better, maybe s/Git/FastImport/ and
>> make git, bzr, and hg all peers that derive from the shared code.
>
> That would be good. I've copied Greg on this email as my changes may
> help him get cvs2hg in place sooner.

Keep in mind that the cvs2hg I'm working on (patches coming soon) does
*not* use fastimport at all, because I figured I could do it faster
and better writing directly to a Mercurial repository from inside
cvs2svn.  (So far it looks like I was right.  Phew!)  So I don't
*think* your changes will help me much.  But there is risk of conflict
if you touched git_run_options.py or git_output_option.py: I
refactored a bunch of stuff that is common to the git and hg backends
out of those files.  If you start feeling like refactoring the heck
out of one of those files, it'll cause pain for one of us.  Guess I
should get my patches in soon!

> While it has limitations and can't express some of the things Bazaar
> supports (like empty directories, multiple authors and ghost revisions),
> the fast-import format is the best we have today for interchanging
> metadata among the various VCS tools. High quality fast-exporters and
> fast-importers benefit us all, I believe.

Absolutely!  But this whole idea of making conversion "user friendly"
... sheesh.  What are you *thinking*?!?  If the code was hard to
write, it should bloody well be hard to use!  ;-)

Greg
