Advice/help wanted on bzr fast-export-from-cvs
Ian Clatworthy
ian.clatworthy at canonical.com
Wed Aug 12 05:50:25 BST 2009
Michael Haggerty wrote:
Michael,
Thanks for the fast response!
> First of all, I suggest basing your work on the trunk version of cvs2svn.
> It is long overdue that I make a new release, and trunk contains a few
> improvements that would be important for exporting to git-fast-import
> format.
Let me strongly encourage you to do that. :-) Ideally, I would like your
best work in karmic (if we don't bundle it). Right now, cvs2svn is
version 2.1.1 in jaunty and karmic ought to have 2.2.0, though Launchpad
reports that 2.2.0 is failing to build. That can't be good. :-( See
https://launchpad.net/ubuntu/+source/cvs2svn/2.2.0-1.
> For example, the cvs2git script creates separate git-fast-import
> dumpfiles for the blobs and for the commits, whereas until recently "hg
> fastimport" only supported inline blobs. If "bzr fastimport" requires
> inline blobs, then you would have to use the (much slower) output option
> using GitRevisionInlineWriter as done in the cvs2hg-example.options file.
bzr-fastimport will support either, though it wants a single import
file, not two of them. Keep in mind that bzr's data store doesn't work
the way git's does, so loading 100s of blobs in one step followed by
trees after that in another step just won't work.
Within that single import stream, it is typically better if blobs are
introduced just before they are used and, ideally, not reused. If a tree
references a blob introduced much earlier in the import stream, then I
need to keep those blobs around (in memory currently) and I need smart
tracking of how often each blob is referenced for garbage collection
purposes. Right now, that implies an extra step beforehand
- running bzr fast-import-info.
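
To make the reference-tracking idea concrete, here is a minimal two-pass sketch in Python. It is not the bzr-fastimport implementation: count_blob_references and BlobCache are hypothetical names, and the scan is simplified in that it only recognises filemodify lines of the form "M <mode> :<mark> <path>" and does not skip over "data" payloads, which a real parser would have to do.

```python
import re
from collections import Counter

# Matches a filemodify command that refers to a blob by mark, e.g.
#   M 100644 :12 path/to/file
# Inline blobs and SHA-1 datarefs are deliberately not matched.
_FILEMODIFY = re.compile(rb"^M \d+ (:\d+) ")


def count_blob_references(lines):
    """First pass: count how often each blob mark is used by a filemodify."""
    counts = Counter()
    for line in lines:
        match = _FILEMODIFY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


class BlobCache:
    """Second pass: keep blob data only until its last reference is served."""

    def __init__(self, ref_counts):
        self._remaining = dict(ref_counts)
        self._data = {}

    def store(self, mark, data):
        # A blob that is never referenced need not be kept at all.
        if self._remaining.get(mark, 0) > 0:
            self._data[mark] = data

    def fetch(self, mark):
        data = self._data[mark]
        self._remaining[mark] -= 1
        if self._remaining[mark] == 0:
            del self._data[mark]  # last use: free the memory
        return data

    def __len__(self):
        return len(self._data)
```

With counts gathered up front, a blob that is used once, immediately after it is introduced, is released straight away, which is why streams that introduce blobs just before use are the cheap case.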
FWIW though, I'm looking to just always run fast-import-info implicitly
when fast-import starts. After all, I need the information from
fast-import-info for good progress monitoring and I need good progress
monitoring before I can wrap fast-import in a QBzr-style GUI dialog. The
end goal is a nice wizard in Bazaar Explorer
(http://doc.bazaar-vcs.org/explorer/en/) that helps users migrate
existing projects to Bazaar with a minimum of fuss. The current release
of Bazaar Explorer has an "Import" button on the Welcome page but it's
disabled. That's a bug I'm keen to fix this weekend. :-)
> Another difference between cvs2git and cvs2hg is that hg only supports
> 0, 1, or 2 parents per commit, whereas git allows an unlimited number.
> This can also be adjusted in cvs2svn, using GitOutputOption's max_merges
> parameter.
Bazaar has no limits here.
> But anyway, there is a serious question as to what parents to record for
> branch-creation commits and commits that involve adding new files to a
> branch. Currently, cvs2git records all branches that contribute files
> to a branch as parents, but (having gained more experience with git) I
> am skeptical whether that behavior is correct. I think it would be more
> in the spirit of DAG-based VCSs to only consider the "best" source
> branch to be a parent of a new branch. Greg Ward, who has recently done
> some work on an improved cvs2svn-based cvs2hg, plans to do the latter.
Sounds good. I think a consistent policy is certainly the right thing,
though I don't personally have any deep opinions yet on the best policy.
> Even faster than those two options, by a significant factor, is a
> --use-internal-co option that I have prototyped on my hard disk but not
> yet released to the wild. The analogous option is the default for
> cvs2svn and would probably be the best default for the other converters,
> if it can get released in time.
Go for it. :-)
>> 4. Is it worth bundling the necessary pieces in bzr-fastimport itself
>> rather than asking users to separately install it? (A separate
>> install is a minor thing for Ubuntu/Debian users, say, but a PITA
>> for Windows users IIUIC.)(#)
>
> That's a nice idea. Here are some considerations:
>
> * As you can probably imagine, I am not anxious to have to support
> multiple cvs2xxx variants as distributed (and perhaps even modified) by
> different downstream projects. But I suppose if you would distribute an
> unmodified, defined version of cvs2svn and make it easy for users to see
> the cvs2svn command line that was used and to report problems upstream
> in a usable form, it wouldn't be so terrible.
Right. We'd certainly keep the original script in place and easy to run.
When a fast-export-from-xxx wrapper runs, the first thing it does is
tell the user exactly what command line it is executing so I think you'd
still get bug reports filed upstream, even if they were originally filed
against bzr-fastimport. Keep in mind that Launchpad is particularly cool
in this regard: most bugs in Ubuntu are actually upstream bugs so it
provides lots of features to make that upstream collaboration
easier. I'm happy to ensure any patches we receive are forwarded
upstream as well.
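
As a rough illustration of the "announce, then execute" behaviour described above, a wrapper could look something like the following. This is only a sketch: run_announced is a hypothetical helper, not the actual bzr-fastimport code, and the command passed to it is whatever the wrapper has assembled.

```python
import shlex
import subprocess
import sys


def run_announced(argv, stream=sys.stderr):
    """Print the exact command line, then run it and return its exit code."""
    # shlex.join (Python 3.8+) quotes each argument, so the printed line
    # can be pasted into a shell or a bug report unchanged.
    print("Executing: " + shlex.join(argv), file=stream)
    return subprocess.call(argv)
```

Because the printed line is exactly what was run, a user can re-run the underlying tool directly and quote the same command when reporting a problem upstream.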
> * cvs2svn is currently under a CollabNet license which, as far as I can
> figure out, is not GPL-compatible. This might or might not present a
> problem, depending on how you want to connect your code to cvs2svn. It
> is conceivable that CollabNet would agree to change the license, but
> that is unfortunately not my decision.
Hmm. We could restrict ourselves to calling it via the command line
interface if that helped.
>> 5. Does the gnu 'sort' dependency still hold? Is there a good reason for
>> needing that versus doing the sorting in Python, say?
>
> Yes, we still require GNU sort. The sorting could definitely be done in
> Python, but not in-memory because we often have to sort enormous files.
> Patches would be welcome :-)
Indeed. The more users, the more likely those will be. :-)
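
For anyone tempted to write that patch, the usual way to sort a file too large for memory is an external merge sort: sort bounded chunks, spill each to a temporary file, then merge the sorted chunks lazily. The sketch below is not cvs2svn code; external_sort and max_lines are hypothetical names, it assumes every line ends with a newline, and it compares raw bytes (similar in spirit to running sort under LC_ALL=C). It also opens one file per chunk during the merge, so a real implementation would need to cap that.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path, dst_path, max_lines=100_000):
    """Sort the lines of src_path into dst_path using bounded memory."""
    chunk_paths = []
    try:
        # Phase 1: read at most max_lines at a time, sort, spill to disk.
        with open(src_path, "rb") as src:
            while True:
                chunk = list(islice(src, max_lines))
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".chunk")
                chunk_paths.append(path)
                with os.fdopen(fd, "wb") as out:
                    out.writelines(chunk)

        # Phase 2: lazily merge the sorted chunk files into the output.
        files = [open(p, "rb") for p in chunk_paths]
        try:
            with open(dst_path, "wb") as dst:
                dst.writelines(heapq.merge(*files))
        finally:
            for f in files:
                f.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

Only one chunk is held in memory during the first phase, and heapq.merge keeps just one pending line per chunk during the second.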
So in summary:
1. I'd love to see a new official release if trunk is clearly better
than cvs2svn 2.2.0.
2. We need to work out the best way of getting that in the hands of
testers/users so that the following will Just Work:
    bzr fast-export-from-cvs source-repo project.fi
    bzr fast-import project.fi destination-repo
3. I'm planning to package the latest version of bzr-fastimport next
week and then beg the relevant people to package it for karmic,
bundle it in our Windows and OS X installers, etc. :-)
As soon as I'm happy enough with the new code in bzr-fastimport, I'm
planning to whip up a better Bazaar Migration Guide (won't be hard with
the nicer wrapper scripts) and to ask the Bazaar community for migration
testing. *If* we can get fast-export-from-cvs working together with a
recommended release of cvs2svn before 2.0 ships, then I'm sure you'll
get some useful feedback from the process above.
Ian C.