There are many tools for converting repositories between version-control systems out there. This file explains why reposurgeon is the best of breed by comparing it to the competition.

The problems other repository-translation tools have come from ontological mismatches between their source and target systems - models of changesets, branching and tagging can differ in complicated ways. While these gaps can often be bridged by careful analysis, the techniques for doing so are algorithmically complex, difficult to test, and have ugly edge cases.

Furthermore, doing a really high-quality translation often requires human judgment about how to move artifacts - and what to discard. But most lifting tools are, unlike reposurgeon, designed as run-it-once batch processors that can only implement simple and mechanical rules.

Consequently, most repository-translation tools evade the harder problems. They produce a sort of pidgin rendering that crudely and partially copies the history from the source system to the target without fully translating it into native idioms, leaving behind metadata that would take more effort to move over or leaving it in the native format for the source system.

But pidgin repository translations are a kind of friction drag on future development, and are just plain unpleasant to use. So instead of evading the hard problems, reposurgeon tackles them head-on.

Here are some specific symptoms of evasion that are common enough to deserve tags for later reference.

LINEAR: One very common form of evasion is only handling linear histories.

NO_IGNORES: There are many different mechanisms for ignoring files - .cvsignore, Subversion svn:ignore properties, .gitignore and their analogues. Many older Subversion repositories still have .cvsignore files in them as relics of CVS prehistory that weren’t translated when the repository was lifted. Reposurgeon, on the other hand, knows these can be changed to .gitignore files and does it.

NO_TAGS: Many repository translators cannot generate annotated tags (or their non-git equivalents) even when that would be the right abstraction in the target system.

CONFIGURATION: Another common failing is for repository-translation tools to require a lot of configuration and ceremony before they can operate. Often, for example, tools that translate from Subversion repositories require you to declare the repository’s branch structure every time even though sensible defaults and a bit of autodetection could have avoided this.

MIXEDBRANCH: Yet another case usually handled poorly (in translators that handle branching) is mixed-branch commits. In Subversion it is possible (though a bad idea) to commit a changeset that modifies multiple branches at once. All sufficiently old Subversion repositories have these, often by accident. The proper thing to do is split these up; the usual thing is to assign them to one branch and leave them omitted from the others.

Version references in commit comments. It is not uncommon to see a lot of references that are no longer usable embedded in translated repositories like fossils in geological strata - file-version numbers like 1.2 in Subversion repos that had a former life in CVS, Subversion references like r1234 in git repositories, and so forth. There’s no tag for this because tools other than reposurgeon generally have no support at all for lifting these.

Problems with specific tools

To avoid repetitive text in these descriptions, I use the following additional bug tags:

CONFIGURATION: Requires elaborate configuration even for cases that ought to be simple.

ABANDONED: Effectively abandoned by its maintainer. Some tools with this tag are still nominally maintained but have not been updated or released in years.

NO_DOCUMENTATION: Poorly (if at all) documented.

!FOO means the tool is known not to have problem FOO.

?FOO means I have not tried the tool but have strong reason to suspect the problem is present based on other things I know about it.

You should assume that none of these tools do reference-lifting.

cvs2svn

Just after the turn of the 21st century, when Subversion was the new thing in version control, most projects that were using version control were using CVS, and cvs2svn was about the only migration path.

Early cvs2svn had problems on every level, only some of which have been fixed by more recent releases. It tended to spew junk commits into the translated history, and produced strange combinations of Subversion internal operations that most later translation tools would cope with only poorly. Sometimes the resulting translations are actually malformed; more often they contains noisy commits or commit duplications that made little sense under Subversion and make even less under the new target system.

!LINEAR, ?MIXEDBRANCH, DOCUMENTATION

cvs-fast-export

Formerly named parsecvs. Originally written by Keith Packard to port the X.org repositories, which it did a good job on. Later maintained by Bart Massey, now maintained by me. reposurgeon uses it to read CVS repositories.

git-svn

git-svn, the Subversion converter in the git distribution, is really designed to be a two-way live gateway enabling git users to push and pull commits from a Subversion server. It operates by creating a git repository that is effectively a local mirror of the Subversion history, then performing Subversion client commands to synchronize the two in a git-like way.

This choice of mission means that git-svn’s translation of history into git uses a compromise between Subversion idioms and git ones that is more designed to make transactions back to the Subversion server easy and safe to generate than it is to make full use of the git capabilities that Subversion doesn’t have. This is pidgin translation for a reason better than laziness or failure of nerve, but it’s still pidgin.

For a straight linear history with no tags or branches, the difference between git-svn’s Subversion-emulating behavior and the way a git repository would most naturally be structured is minimal. But for conformability with Subversion, git-svn cannot (practically speaking) use git’s annotated-tag facility in the local mirror; instead, Subversion tags have to be represented in the local mirror as git branches even if they have no changes after the initial branch copy.

Another thing the live-gatewaying use case prevents is reference-lifting. Subversion references like "r1234" in commit comments have to be left as-is to avoid creating pain for users of the same Subversion remote not going through git-svn.

!ABANDONED, MIXEDBRANCH, NO_TAGS, NO_IGNORES.

git-svnimport

Formerly part of the git suite; what they had before git-svn, and inferior to it. Among other problems, it can only handle Subversion repos with a "standard" trunk/tags/branches layout. Now deprecated.

MIXEDBRANCH, NO_TAGS, NO_IGNORES, ABANDONED.

git-svn-import

A trivial wrapper around git-svn.

MIXEDBRANCH, NO_TAGS, NO_IGNORES, !ABANDONED.

svn-fe

svn-fe was a valiant effort to produce a tool that would dump a Subversion repository history as a git fast-import stream. It made it into the git contrib directory and lingers there still.

LINEAR, NO_TAGS, NO_IGNORES, ABANDONED.

Tailor

Tailor aimed to be an any-to-any repository translator.

LINEAR, ?NO_IGNORES, ABANDONED.

agito

This is a Subversion-to-git tool that was written to handle some cases that git-svn barfs on (but reposurgeon doesn’t - the reposurgeon test suite contains a case sent by agito’s author to check this). It even handles mixed-branch commits correctly.

!LINEAR, !NO_TAGS, !MIXEDBRANCH, CONFIGURATION.

If you cannot use reposurgeon for some reason, this is one of the best alternatives.

svn2git (jcoglan/nirvdrum version)

A batch-conversion wrapper around git-svn that creates real tag objects. This is the one written in Ruby.

!ABANDONED, !NO_TAGS, NO_IGNORES.

If you cannot use reposurgeon for some reason, this is another alternative that is not too horrible.

svn2git (Schemenauer version)

Native Python. More a proof of concept than a production tool.

LINEAR, NO_TAGS, NO_IGNORES, NO_DOCUMENTATION, ABANDONED.

svn2git (Nyblom version)

Written in C++. Says it’s based on svn-fast-export by Chris Lee. Not easy to figure out what it actually does, as there is no documentation at all and no test cases. May be genetically related to svn-all-fast-export, but if so they diverged in 2008.

CONFIGURATION, NO_DOCUMENTATION.

svn-fast-export

Written in C. More a proof of concept than a production tool.

LINEAR, NO_TAGS, NO_IGNORES, NO_DOCUMENTATION, ABANDONED.

svn-dump-fast-export

Written in C. Documentation is so lacking that there isn’t even a README. However, it’s possible to deduce what isn’t there by reading the code.

LINEAR, NO_TAGS, NO_IGNORES, NO_DOCUMENTATION.

svn-all-fast-export

May be genetically related to the Nyblom svn2git, but if so they diverged in 2008.

LINEAR, NO_TAGS, NO_IGNORES, NO_DOCUMENTATION, ABANDONED.

reposurgeon success stories

reposurgeon has been used for successful conversion on projects including but not limited to the following. Thse are in rough chronological order.

Hercules (IBM mainframe emulator)

I did this one, Subversion to hg. About ten years of history at the time, not too horribly messy.

NUT (Network UPS Tools)

I did this one, Subversion to git. The trial by fire - it was when the Subversion dump analyzer got built. Very large old repository with lots of pathologies (there was a CVS stratum).

Battle For Wesnoth

I did this one, Subversion to git. Very large repo, moderately complex.

Roundup (issue tracker)

I did this one, Subversion to git (they later switched to hg). Moderate-sized Subversion repo with some very strange malformations.

robotfindskitten

I did this one, CVS to git. Simple history, pretty easy.

Blender

Two guys at Blender did this one with help from me, Subversion to git. Huge repository with a lot of nasty pathologies. The tool needed some serious optimization and feature upgrades to handle it.

groff

I did this one, CVS to git. Rather easy as the project history was almost linear and, though very old, not huge.

Nethack

CVS to git. This conversion has not yet been publicly released at time of writing (late October 2014) for complicated political reasons.

Emacs

In progress, converting to git. A record three layers, Bazaar over CVS over RCS. Malformations not too bad except for some unique challenges created by the RCS-to-CVS conversion, but the sheer size of the history and number of layers makes it the most complex conversion yet.