reposurgeon is an extremely algorithmically complex program. It may still have bugs when dealing with strange corner cases in older repositories.
It is often extremely difficult or impossible to reproduce those bugs without a copy of the history on which they occurred. When you find a bug, please send me:
(a) An exact description of how observed behavior differed from expected behavior. If reposurgeon died, include the backtrace.
(b) A git fast-import or Subversion dump file of the repository you were operating on, or a pointer to where I can pull it from.
(c) A script containing the sequence of reposurgeon commands that revealed the bug.
Test case reduction
If you know how to reproduce the error, the best possible test case is a hand-crafted repository of minimal size with content that explains how it breaks the tool. I turn these into regression tests instantly.
When you don’t know the cause of the error, ship me a dump file derived from the real repository that triggered it. To speed up my debugging process so you can get an answer more quickly, there are some tactics you can use to reduce the bulk of the test case you send me. Also, a well-reduced dump can become a regression test to ensure the bug does not recur.
The commands you will use for test-case reduction are reposurgeon and, on Subversion dumps, repocutter.
Replace the content blobs in the dump with stubs
The subcommand in both tools is strip; it will usually cut the size of the dump by more than a factor of 10. Check that the bug still reproduces on the stripped dump; if it doesn’t, that would be unprecedented and interesting in itself.
If you are trying to maintain confidentiality about your code, sending me a stripped repo has the additional advantage that I won’t see the code! The command preserves structure and metadata but replaces each content blob with a unique magic cookie.
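To make the idea of stubbing concrete, here is a minimal Python sketch of what blob replacement means for a git fast-import stream. It is not the tool's own implementation: the function name `strip_blobs` and the stub text are hypothetical, and the sketch handles only the exact-byte-count form of the `data` command, not the delimited form. In practice, use the strip subcommand itself.

```python
def strip_blobs(stream: bytes) -> bytes:
    """Replace every blob body in a fast-import stream with a unique stub.

    Hypothetical illustration only: understands 'data <count>' but not
    the delimited 'data <<DELIM' form.
    """
    out = bytearray()
    pos = 0
    in_blob = False   # True between a 'blob' line and its 'data' line
    counter = 0       # makes each stub unique
    while pos < len(stream):
        eol = stream.find(b"\n", pos)
        end = len(stream) if eol == -1 else eol + 1
        line = stream[pos:end]
        pos = end
        if line.startswith(b"data "):
            size = int(line[5:].decode("ascii"))
            body = stream[pos:pos + size]
            pos += size   # skip the payload so its contents are never parsed
            if in_blob:
                counter += 1
                body = b"stub blob %d\n" % counter
                in_blob = False
            out += b"data %d\n" % len(body)
            out += body
        else:
            if line == b"blob\n":
                in_blob = True
            out += line   # marks, commits, and file ops pass through unchanged
    return bytes(out)
```

Commit comments also travel in `data` sections, but they are copied through untouched because they do not follow a `blob` line; only file content is replaced, which is why structure and metadata survive.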
Truncate the dump as much as possible
Try to truncate the dump to the shortest leading section that reproduces the bug.
A reposurgeon error message will normally include a mark, event number, or (when reading a Subversion dump) a Subversion revision number. Use a selection-set argument to reposurgeon’s write command, or the select subcommand of repocutter, to pare down the dump so that it ends just after the error is triggered. Again, check to ensure that the bug reproduces on the truncated dump.
If the error message doesn’t tell you where the problem occurred, try a bisection process. Throw out the last half of the events in the dump and check whether the bug reproduces. If it does, repeat; if it does not, throw out the last quarter, then the last eighth, and so forth. Keep this up until you can no longer drop events without making the bug go away.
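The bisection can be sketched as an ordinary binary search over prefix lengths, which is a slight tidying-up of the halve-then-quarter procedure described above. The sketch assumes two things that are not guaranteed in practice: that the full dump reproduces the bug, and that once a prefix reproduces it, every longer prefix does too. The `reproduces` callback is hypothetical; it stands for whatever you do by hand to truncate the dump to its first n events and rerun your script.

```python
from typing import Callable


def shortest_failing_prefix(total_events: int,
                            reproduces: Callable[[int], bool]) -> int:
    """Return the smallest n such that the first n events trigger the bug.

    Assumes reproduces(total_events) is True and that reproduction is
    monotonic in the prefix length.
    """
    lo, hi = 1, total_events   # invariant: the prefix of length hi reproduces
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(mid):
            hi = mid           # bug still present: try an even shorter prefix
        else:
            lo = mid + 1       # bug vanished: the trigger lies after mid
    return hi


# Toy stand-in for a real check: pretend event 37 of 100 triggers the bug.
print(shortest_failing_prefix(100, lambda n: n >= 37))  # prints 37
```

Each step halves the remaining range, so even a dump with a million events needs only about twenty reproduction attempts.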
Bisection is more effective than you might expect, because the kinds of repository malformations that trigger misbehavior from reposurgeon tend to rise in frequency as you go back in time. The largest single category of them has been ancient cruft produced by cvs2svn conversions.
Topologically reduce the dump
Next, topologically reduce the dump, throwing out boring commits that are unlikely to be related to your problem.
If a commit consists entirely of file modifications (no additions, deletions, copies, or renames) and has exactly one ancestor and one descendant, then it may be boring. In a Subversion dump it must also have no property changes; in a git dump it must not be referred to by any tag or reset. A commit is interesting if it is not boring, or if it has a non-boring parent or a non-boring child.
Try using the reduce subcommand of repocutter to strip boring commits out of a Subversion dump. For a git dump, look at "strip reduce".
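The boring/interesting rule above can be written down as a small Python sketch over a toy commit model. The `Commit` class and its field names are hypothetical and are not the data structures of either tool; the `pinned` flag stands in for "has property changes" in a Subversion dump or "is referred to by a tag or reset" in a git dump.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Commit:
    ident: str
    ops: List[str]   # one letter per file op: M, A, D, C, or R
    parents: List[str] = field(default_factory=list)
    children: List[str] = field(default_factory=list)
    pinned: bool = False   # property changes (SVN) or tag/reset reference (git)


def is_boring(c: Commit) -> bool:
    """Only modifications, exactly one parent and one child, nothing pinned."""
    return (all(op == "M" for op in c.ops)
            and len(c.parents) == 1
            and len(c.children) == 1
            and not c.pinned)


def interesting(commits: Dict[str, Commit]) -> Set[str]:
    """Identifiers of commits that a topological reduction would keep."""
    keep = set()
    for c in commits.values():
        neighbours = [commits[i] for i in c.parents + c.children]
        if not is_boring(c) or any(not is_boring(n) for n in neighbours):
            keep.add(c.ident)
    return keep
```

Keeping the immediate neighbours of every non-boring commit preserves the context around each structurally significant event while the long runs of plain modifications between them drop out.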
Prune irrelevant branches
Try to throw away branches that are not relevant to your problem. The expunge operation on repocutter or the branch delete command in reposurgeon may be helpful.
This is the simplification most likely to make your bug vanish, so check that it still reproduces after each branch deletion.
Know how to spot possible importer bugs
If your target VCS’s importer dies during a rebuild, try writing the repository content to a stream instead and importing the stream by hand. If the latter does not fail, the target VCS’s importer may be slightly buggy - but you have a workaround.
(This has been observed under git 2.5.0 with the result of a unite operation on two repositories. The cause is unknown, as git dies suddenly enough to not leave a crash report.)
If you need emergency help, go to the #reposurgeon IRC channel listed on the project web page. Be aware, however, that I am too busy to babysit difficult repository conversions unless I have explicitly volunteered for one or someone is paying me to care about it. For explanation, see "Your money or your spec".