NAME

cvs-fast-export — fast-export history from a CVS repository or RCS collection.

SYNOPSIS

cvs-fast-export [-h] [-C] [-F] [-a] [-w fuzz] [-k] [-g] [-v] [-A authormap] [-R revmap] [-V] [-T] [--reposurgeon] [-e remote] [-s stripprefix] [-i date] [-t threadcount] [-p] [-P]

DESCRIPTION

cvs-fast-export tries to group the per-file commits and tags in an RCS file collection or CVS project repository into per-project changeset commits with common metadata, in the style of Subversion and later version-control systems.

This tool is best used in conjunction with reposurgeon(1). Plain cvs-fast-export conversions contain various sorts of fossils that reposurgeon is good for cleaning up. See the DVCS Migration HOWTO to learn about the sanity-checking and polishing steps required for a really high-quality conversion, including reference lifting and various sorts of artifact cleanup.

If arguments are supplied, the program assumes that all those ending with the extension ",v" are master files and reads them in. If no arguments are supplied, the program reads filenames from stdin, one per line. Directories and files not ending in ",v" are skipped. (But see the description of the -P option for how to change this behavior.)

Files from either Unix CVS or CVS-NT are handled. If a collection of files has commitid fields, changesets will be constructed reliably using those.

In the default mode, which generates a git-style fast-export stream to standard output:

  • The prefix given using the -s option or, if the option is omitted, the longest common prefix of the paths is discarded from each path.
  • Files in CVS Attic and RCS directories are treated as though the "Attic/" or "RCS/" portion of the path were absent. This usually restores the history of files that were deleted.
  • Permissions on all fileops related to a particular file will be controlled by the permissions on the corresponding master. If the executable bit on the master is on, all its fileops will have 100755 permissions; otherwise 100644.
  • A set of file operations is coalesced into a changeset if either (a) they all share the same commitid, or (b) all have no commitid but identical change comments, authors, and modification dates within the window defined by the time-fuzz parameter. Unlike some other exporters, no attempt is made to derive changesets from shared tags.
  • Commits are issued in time order unless cvs-fast-export detects that some parent is younger than its child (this is unlikely but possible in cases of severe clock skew). In that case you will see a warning on standard error, and the emission order is guaranteed to be topologically correct but is otherwise unspecified (and is subject to change in future versions of this program).
  • CVS tags become git lightweight tags when they can be unambiguously associated with a changeset. If the same tag is attached to file deltas that resolve to multiple changesets, it is reported as if attached to the last of them.
  • The HEAD branch is renamed to master.
  • Other tag and branch names are sanitized to be legal for git; the characters ~^\*? are removed.
  • Since .cvsignore files have a syntax upward-compatible with that of .gitignore files, they’re renamed. In order to simulate the default ignore behavior of CVS, those defaults are prepended to root .cvsignore blobs renamed to .gitignore, and a root .gitignore containing the defaults is generated if no such blobs exist.
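The coalescence rule in the list above can be sketched in Python. This is a simplified illustration, not the program's actual algorithm: the FileOp record and its field names are hypothetical, and comparing each delta against the earliest member of a group is an assumption about how the time window is anchored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileOp:
    """One per-file delta; field names are hypothetical, for illustration."""
    path: str
    author: str
    comment: str
    date: int                       # Unix time in seconds
    commitid: Optional[str] = None  # None when the master has no commitid


def same_changeset(a: FileOp, b: FileOp, fuzz: int = 300) -> bool:
    # Rule (a): where a commitid is present, only commitids decide.
    if a.commitid is not None or b.commitid is not None:
        return a.commitid == b.commitid
    # Rule (b): no commitids, so require identical comment and author
    # and dates that fall within the time-fuzz window.
    return (a.author == b.author
            and a.comment == b.comment
            and abs(a.date - b.date) <= fuzz)


def coalesce(ops: List[FileOp], fuzz: int = 300) -> List[List[FileOp]]:
    groups: List[List[FileOp]] = []
    for op in sorted(ops, key=lambda o: o.date):
        for group in groups:
            # Assumption: compare against the earliest member of the group.
            if same_changeset(group[0], op, fuzz):
                group.append(op)
                break
        else:
            groups.append([op])
    return groups
```

Note that two deltas sharing a commitid are grouped regardless of how far apart their dates are, which matches the statement that the fuzz window is irrelevant for changesets with commitids.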

See the later section on RCS/CVS LIMITATIONS for more information on edge cases and conversion problems.

This program does not depend on any of the CVS metadata held outside the individual content files (e.g. under CVSROOT).

The variable TMPDIR is honored and used when generating a temporary directory in which to store file content during processing.

OPTIONS

-h
Display usage summary.
-w fuzz
Set the timestamp fuzz factor, in seconds, used for identifying patch sets. The default is 300 seconds. This option is irrelevant for changesets with commitids.
-k
Enable RCS/CVS keyword expansion. No effect unless the default keyword expansion field has been set in the RCS/CVS masters. Not recommended; tends to produce spurious diffs.
-g
Generate a picture of the commit graph in the DOT markup language used by the graphviz tools, rather than fast-exporting.
-a
Dump a list of author IDs found in the repository, rather than fast-exporting.
-C
Force canonical order (same as git-fast-export’s) in the emitted stream. Blobs are emitted as late as possible before the commits that require them. This option is forced if you specify -i (incremental) dumping. It reduces throughput by about a factor of two.
-F
Force fast order. Blobs are emitted first, then commits. This is faster, but incompatible with incremental dumping.
-A authormap

Apply an author-map file to the attribution lines. Each line must be of the form

ferd = Ferd J. Foonly <foonly@foo.com> America/Chicago

and will be applied to map the Unix username ferd to the DVCS-style user identity specified after the equals sign. The timezone field (after > and whitespace) is optional and (if present) is used to set the timezone offset to be attached to the date; acceptable formats for the timezone field are anything that can be in the TZ environment variable, including a [+-]hhmm offset. Whitespace around the equals sign is stripped. Lines beginning with a # or not containing an equals sign are silently ignored.
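As an illustration of the line format just described, here is a minimal Python sketch of a parser for one author-map line. It is not the program's own parser; the AuthorEntry type and the function name are hypothetical, and skipping lines whose identity lacks an angle-bracketed address is an assumption.

```python
import re
from typing import NamedTuple, Optional


class AuthorEntry(NamedTuple):
    username: str
    fullname: str
    email: str
    timezone: Optional[str]  # None when the optional field is absent


# Full name, then <email>, then an optional timezone after whitespace.
_IDENTITY = re.compile(r"^(.*?)\s*<([^>]*)>\s*(.*)$")


def parse_authormap_line(line: str) -> Optional[AuthorEntry]:
    line = line.rstrip("\n")
    # Comment lines and lines without an equals sign are ignored.
    if line.startswith("#") or "=" not in line:
        return None
    # Whitespace around the equals sign is stripped.
    username, identity = (part.strip() for part in line.split("=", 1))
    match = _IDENTITY.match(identity)
    if match is None:
        return None  # assumption: malformed identities are skipped here
    fullname, email, tz = match.groups()
    return AuthorEntry(username, fullname, email, tz or None)
```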

-R revmap
Write a revision map to the specified argument filename. Each line of the revision map consists of three whitespace-separated fields: a filename, an RCS revision number, and the mark of the commit to which that filename-revision pair was assigned. Doesn’t work with -g.
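A revision map in this three-field layout could be read back with a short Python sketch like the following. The function name is hypothetical, and splitting from the right (so that a filename containing spaces stays intact) is an assumption about how such names would appear in the file.

```python
from typing import Dict, Iterable, List, Tuple


def parse_revmap(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (filename, revision) pairs by the commit mark they map to."""
    by_mark: Dict[str, List[Tuple[str, str]]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The last two fields are the revision and the mark; everything
        # before them is taken to be the filename.
        filename, revision, mark = line.rsplit(None, 2)
        by_mark.setdefault(mark, []).append((filename, revision))
    return by_mark
```

Keying on the mark makes it easy to list every file revision that was folded into a given changeset.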
-v
Show verbose progress messages mainly of interest to developers.
-T
Force deterministic dates for regression testing. Each patchset will have a monotonically increasing attributed date computed from its mark in the output stream: the mark value times the commit time window times two.
--reposurgeon
Emit for each commit a list of the CVS file:revision pairs composing it as a bzr-style commit property named "cvs-revisions". From version 2.12 onward, reposurgeon(1) can interpret these and use them as hints for reference-lifting.
--embed-id
Append to each commit comment identification of the CVS commits that contributed to it.
--expand
A more general expansion option than -k. You can specify any of the CVS keyword expansion types: "kv", "kvl", "k", "v", "o", or "b". Consult the CVS documentation for an explanation.
-V
Emit the program version and exit.
-e remote
Exported branch names are prefixed with refs/remotes/remote instead of refs/heads, making the import appear to come from the named remote.
-s stripprefix
Strip the given prefix instead of the longest common prefix.
-t threadcount
Instead of running the analysis sequentially, run it in threadcount parallel threads (a value less than 2 has no effect). This increases the program’s memory footprint proportionally to the number of threads, but means the conversion may run in less total time because an I/O operation involving one master file will not block compute-intensive processing of others. Throughput gain is negligible on a dual-core machine running two threads but may be higher when more processors are available.
-p
Enable progress reporting. This also dumps statistics (elapsed time and size of maximum resident set) for several points in the conversion run.
-P
Normally cvs-fast-export will skip any filename presented as an argument or on stdin that does not end with the RCS/CVS extension ",v". A strict reading of RCS allows masters without this extension. This option sets promiscuous mode, disabling the check.
-i date
Enable incremental-dump mode. Only commits with a date after that specified by the argument are emitted. Each branch root in the incremental dump is decorated with git-stream magic which, when interpreted in the context of a live repository, will connect that branch to any branch of the same name. The date is expected to be RFC3339 conformant (e.g. yyyy-mm-ddThh:mm:ssZ) or else an integer Unix time in seconds.
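The two date forms accepted by -i can be told apart as in the Python sketch below. It is an illustration only: the function name is hypothetical, and it handles just the UTC "Z" form of RFC 3339 with a four-digit year, not every variant the standard permits.

```python
from datetime import datetime, timezone


def parse_incremental_date(arg: str) -> int:
    """Return the cutoff as integer Unix seconds."""
    # A run of digits is taken to be a Unix time already.
    if arg.isdigit():
        return int(arg)
    # Otherwise expect a UTC timestamp such as 2014-05-13T16:53:20Z.
    stamp = datetime.strptime(arg, "%Y-%m-%dT%H:%M:%SZ")
    return int(stamp.replace(tzinfo=timezone.utc).timestamp())
```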

If neither -F nor -C is specified, cvs-fast-export will choose a mode based on the repository size: canonical order for small repositories, fast order for large ones. Tools that consume fast-import streams should not care; this behavior is for backward compatibility.

EXAMPLE

A very typical invocation would look like this:

find . | cvs-fast-export >stream.fi

Your cvs-fast-export distribution should also supply cvssync(1), a tool for fetching CVS masters from a remote repository. Using them together will look something like this:

cvssync anonymous@cvs.savannah.gnu.org:/sources/groff groff
find groff | cvs-fast-export >groff.fi

Progress reporting can be reassuring if you expect a conversion to run for some time. It will animate completion percentages as the conversion proceeds and display timings when done.
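When a conversion is driven from a script rather than typed at a shell, the pipeline above might be assembled as in this Python sketch. The helper names are hypothetical; only options documented on this page (-p, -t, -A, -R) are used, and the master list is meant to be fed to the program on standard input, one filename per line.

```python
import os
from typing import List, Optional


def find_masters(root: str) -> List[str]:
    """Collect paths under root that end in ",v", like find piped to the tool."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(",v"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def build_command(authormap: Optional[str] = None,
                  revmap: Optional[str] = None,
                  threads: Optional[int] = None,
                  progress: bool = False) -> List[str]:
    """Build an argument vector using only options described in OPTIONS."""
    argv = ["cvs-fast-export"]
    if progress:
        argv.append("-p")
    if threads is not None:
        argv += ["-t", str(threads)]
    if authormap:
        argv += ["-A", authormap]
    if revmap:
        argv += ["-R", revmap]
    return argv
```

The resulting list can be passed to subprocess.run with the newline-joined output of find_masters as its input and a file opened for writing as its stdout.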

RCS/CVS LIMITATIONS

Translating RCS/CVS repositories to the generic DVCS model expressed by import streams is not merely difficult and messy; there are weird RCS/CVS cases that cannot be correctly translated at all. cvs-fast-export will try to warn you about these cases rather than silently producing broken or incomplete translations, but there be dragons. We recommend some precautions under SANITY CHECKING.

CVS-NT and versions of GNU CVS after 1.12 (2004) added a changeset commit-id to file metadata. Older sections of CVS history without these are vulnerable to various problems caused by clock skew between clients; this used to be relatively common for multiple reasons, including less pervasive use of NTP clock synchronization. cvs-fast-export will warn you ("commits before this date lack commitids") when it sees such a section in your history. When it does, these caveats apply:

  • If timestamps of commits in the CVS repository were not stable enough to be used for ordering commits, changes may be reported in the wrong order.
  • If the timestamp order of different files crosses the revision order within the commit-matching time window, the order of commits reported may be wrong.

One more property affected by commitids is the stability of old changesets under incremental dumping. Under a CVS implementation issuing commitids, new CVS commits are guaranteed not to change cvs-fast-export’s changeset derivation from a previous history; thus, updating a target DVCS repository with incremental dumps from a live CVS installation will work. Even if older portions of the history do not have commitids, conversions will be stable. This stability guarantee is lost if you are using a version of CVS that does not issue commitids.

Also note that a CVS repository has to be completely reanalyzed even for incremental dumps; thus, processing time and memory requirements will rise with the total repository size even when the requested reporting interval of the incremental dump is small.

These problems cannot be fixed in cvs-fast-export; they are inherent to CVS.

CVS-FAST-EXPORT REQUIREMENTS AND LIMITATIONS

Branches occurring in only a subset of the analyzed masters are not correctly resolved; instead, an entirely disjoint history will be created containing the branch revisions and all parents back to the root.

CVS vendor branches are a source of trouble. Sufficiently strange combinations of imports and local modifications will translate badly, producing incorrect content on master and elsewhere.

Some other CVS exporters try, or have tried, to deduce changesets from shared tags even when comment metadata doesn’t match perfectly. This one does not; the designers judge that to trip over too many pathological CVS tagging cases.

cvs-fast-export is designed to do translation with all its intermediate structures in memory, in one pass. This contrasts with cvs2git(1), which uses multiple passes and journals intermediate structures to disk. The tradeoffs are that cvs-fast-export is much faster than cvs2git, but will fail with an out-of-memory error on very large CVS repositories that cvs2git can successfully process.

On stock PC hardware in 2014, cvs-fast-export achieves processing speeds upwards of 10K CVS commits per minute on real repositories. For this it requires memory about equal to the textual size of all RCS commit metadata.

The program’s transient storage requirements can be quite a bit larger; it must slurp in each entire master file once in order to do delta assembly and generate the version snapshots. Using the -t option multiplies the expected amount of transient storage required by the number of threads; use with care, as it is easy to push memory usage so high that swap overhead overwhelms the gains from not constantly blocking on I/O.

It also requires temporary disk space equivalent to the sum of the sizes of all revisions in all files. Thus, large conversions will transiently require lots of space, quite a bit more than the on-disk size of the CVS repository.

When running multithreaded, there is an edge case in which the program’s behavior is nondeterministic. If the same tag looks like it should be assigned to two different gitspace commits with the same timestamp, which tag it actually lands on will be random.

SANITY CHECKING

After conversion, it is good practice to do the following verification steps:

  1. Use diff(1) with the -r option to compare a CVS head checkout with a checkout of the converted repository. The only differences you should see are those due to RCS keyword expansion and .cvsignore lifting. If this is not true, you have found a serious bug in cvs-fast-export; please report it with a copy of the CVS repo.
  2. Examine the translated repository with reposurgeon(1) looking (in particular) for misplaced tags or branch joins. Often these can be manually repaired with little effort. These flaws do not necessarily imply bugs in cvs-fast-export; they may simply indicate previously undetected malformations in the CVS history. However, reporting them may help improve cvs-fast-export.

The above is an abbreviated version of part of the DVCS Migration HOWTO; browse it for more.
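The recursive comparison in step 1 can also be scripted. The Python sketch below is one possible approach, not a replacement for diff(1): the function name is hypothetical, and skipping the CVS and .git administrative directories and the ignore files is an assumption made so that expected differences do not clutter the report. Differences caused by RCS keyword expansion will still be reported and must be judged by eye.

```python
import filecmp
import os
from typing import List

# Assumption: administrative directories and ignore files are expected to differ.
IGNORED = ["CVS", ".git", ".cvsignore", ".gitignore"]


def tree_differences(cvs_checkout: str, converted_checkout: str) -> List[str]:
    """List paths that differ between a CVS head checkout and the conversion."""
    diffs: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        for name in cmp.left_only:
            diffs.append("only in CVS checkout: " + os.path.join(prefix, name))
        for name in cmp.right_only:
            diffs.append("only in conversion: " + os.path.join(prefix, name))
        # Compare file contents byte by byte rather than by stat signature.
        _same, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False)
        for name in mismatch + errors:
            diffs.append("differs: " + os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(cvs_checkout, converted_checkout, ignore=IGNORED), "")
    return sorted(diffs)
```

An empty result means the two trees agree on every file that was compared.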

RETURN VALUE

0 if all files were found and successfully converted, 1 otherwise.

ERROR MESSAGES

Most of the messages cvs-fast-export emits are self-explanatory. Here are a few that aren’t. Where it says "check head", be sure to sanity-check against the head revision.

tag could not be assigned to a commit
RCS/CVS tags are per-file, not per revision. If developers are not careful in their use of tagging, it can be impossible to associate a tag with any of the changesets that cvs-fast-export resolves. When this happens, cvs-fast-export will issue this warning and the tag named will be discarded.
warning - unnamed branch
A CVS branch lacks a head label. A label with "-UNNAMED-BRANCH" suffixed to the name of the parent branch will be generated.
warning - no master branch generated
cvs-fast-export could not identify the default (HEAD) branch and therefore there is no "master" in the conversion; this will seriously confuse git and probably other VCSes when they try to import the output stream. You may be able to identify and rename a master branch using reposurgeon(1).
warning - xxx newer than yyy
Early in analysis of a CVS master file, time sort order of its deltas doesn’t match the topological order defined by the revision numbers. The most likely cause of this is clock skew between clients in very old CVS versions. The program will attempt to correct for this by tweaking the revision date of the out-of-order commit to be that of its parent, but this may not prevent other time-skew errors later in analysis.
too late date through branch
A similar problem to "newer than" being reported at a later stage, when file branches are being knit into changeset branches. Could lead to incorrect branch join assignments. Can also result in an invalid stream output that will crash git-fast-import.
some parent commits are younger than children
May indicate that cvs-fast-export aggregated some changesets in the wrong order; probably harmless, but check head.
warning - branch point later than branch
Late in the analysis, when connecting branches to their parents in the changeset DAG, the commit date of the root commit of a branch is earlier than the date of the parent it gets connected to. Could be yet another clock-skew symptom, or might point to an error in the program’s topological analysis. Examine commits near the join with reposurgeon(1); the branch may need to be reparented by hand.
more than one delta with number X.Y.Z
The CVS history contained duplicate file delta numbers. Should never happen, and may indicate a corrupted CVS archive if it does; check head.
{revision|patch} with odd depth
Should never happen; only branch numbers are supposed to have odd depth, not file delta or patch numbers. May indicate a corrupted CVS archive; check head.
duplicate tag in CVS master, ignoring
A CVS master has multiple instances of the same tag pointing at different file deltas. Probably a CVS operator error and relatively harmless, but check that the tag’s referent in the conversion makes sense.
tag or branch name was empty after sanitization
Fatal error: tag name was empty after all characters illegal for git were removed. Probably indicates a corrupted RCS file.
revision number too long, increase CVS_MAX_DEPTH
Fatal error: internal buffers are too short to handle a CVS revision in a repo. Increase this constant in cvs.h and rebuild. Warning: this will increase memory usage and slow down the tests a lot.
snapshot sequence number too large, widen serial_t
Fatal error: the number of file snapshots in the CVS repo overruns an internal counter. Rebuild cvs-fast-export from source with a wider serial_t patched into cvs.h. Warning: this will significantly increase the working-set size.
too many branches, widen branchcount_t
Fatal error: the number of branches descended from some single commit overruns an internal counter. Rebuild cvs-fast-export from source with a wider branchcount_t patched into cvs.h. Warning: this will significantly increase the working-set size.
internal error - branch cycle
cvs-fast-export found a cycle while topologically sorting commits by parent link. This should never happen and probably indicates a serious internal error: please file a bug report.
internal error - lost tag
Late in analysis (after changeset coalescence) a tag lost its commit reference. This should never happen and probably indicates an internal error: please file a bug report.

REPORTING BUGS

Report bugs to Eric S. Raymond <esr@thyrsus.com>. The project page is at http://catb.org/~esr/cvs-fast-export

SEE ALSO

rcs(1), cvs(1), cvssync(1), reposurgeon(1), cvs2git(1).