SYNOPSIS

cvs-fast-export [-h] [-a] [-c] [-w 'fuzz'] [-g] [-N] [-l] [-v] [-q] [-V] [-T] [-p] [-P] [-i 'date'] [-A 'authormap'] [-R 'revmap'] [-r|--reposurgeon] [-e 'remote'] [file-or-dir…​]

DESCRIPTION

cvs-fast-export tries to group the per-file commits and tags in an RCS file collection or CVS project repository into per-project changeset commits with common metadata. It emits a Git fast-import stream describing these changesets to standard output.

This tool is best used in conjunction with reposurgeon(1). Plain cvs-fast-export conversions contain various sorts of fossils that reposurgeon is good for cleaning up. See Repository Editing and Conversion With Reposurgeon to learn about the sanity-checking and polishing steps required for a really high-quality conversion, including reference lifting and various sorts of artifact cleanup.

If arguments are supplied, the program assumes that all file arguments ending with the extension ",v" are master files and reads them in; for a directory argument, it reads in all paths ending in ",v" beneath that directory. If no arguments are supplied, the program reads filenames from stdin, one per line. Directories and files not ending in ",v" are skipped. (But see the description of the -P option for how to change this behavior.)

Files from either Unix CVS or CVS-NT are handled. If a collection of files has commitid fields, changesets will be constructed reliably using those.

In the default mode, which generates a git-style fast-export stream to standard output:

  • The longest common prefix of the input paths is discarded from each path.

  • Files in CVS Attic and RCS directories are treated as though the "Attic/" or "RCS/" portion of the path were absent. This usually restores the history of files that were deleted.

  • Permissions on all fileops related to a particular file will be controlled by the permissions on the corresponding master. If the executable bit on the master is on, all its fileops will have 100755 permissions; otherwise 100644. GNU RCS and CVS preserve the executable bit but do not guarantee the full mode bits, which depend on filesystem defaults and the umask. The "permissions" field in CVS-NT masters is not interpreted.

  • A set of file operations is coalesced into a changeset if either (a) they all share the same commitid, or (b) all have no commitid but identical change comments, authors, and modification dates within the window defined by the time-fuzz parameter. Unlike some other exporters, no attempt is made to derive changesets from shared tags.

  • Commits are emitted in a topologically correct order that is biased toward increasing commit time, but strict time order is not guaranteed. Severe clock skew can force the ordering to deviate from time order.

  • CVS tags turn into git lightweight tags. Resolution is done in two passes. In the first pass, any tag that can be unambiguously associated with a single changeset (that is, all its files were modified in the same changeset) is quietly attached to it. In the second pass, partial matches are accepted with a warning, with later attachment points favored over earlier ones.

  • The HEAD branch is renamed to 'master'.

  • Other tag and branch names are sanitized to be legal for git; the characters ~^\*? are removed.

  • Since .cvsignore files have a syntax (mostly) upward-compatible with that of .gitignore files, they’re renamed. In order to simulate the default ignore behavior of CVS, those defaults are prepended to root .cvsignore blobs renamed to .gitignore, and a root .gitignore containing the defaults is generated if no such blobs exist. Leading # characters on .cvsignore lines are escaped so git won’t misunderstand them as comment leaders, and spaces in these lines are mapped to line feeds because CVS treats spaces as pattern separators.

  • The CVS-NT extension keywords "owner", "group", "deltatype", "kopt", "permissions", "mergepoint", "filename", "hardlinks", and "username" are all ignored. So is the "access" keyword. (The CVSNT "mergepoint" field records merge tracking metadata (the last merge base) and does not correspond to an actual merge commit, so it is not translated into Git history.)
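
The coalescing rule in the list above can be illustrated with a short Python sketch. This is not the program's actual implementation; the FileOp record, the function names, and the reading of the fuzz window as "within fuzz seconds of the first operation in the group" are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileOp:
    """Hypothetical per-file commit record; not a cvs-fast-export type."""
    path: str
    author: str
    comment: str
    date: int                      # Unix time in seconds
    commitid: Optional[str] = None


def same_changeset(first: FileOp, op: FileOp, fuzz: int) -> bool:
    """Rule (a): shared commitid. Rule (b): no commitids, same metadata."""
    if first.commitid is not None or op.commitid is not None:
        return first.commitid == op.commitid
    return (first.author == op.author
            and first.comment == op.comment
            and abs(op.date - first.date) <= fuzz)


def coalesce(ops: List[FileOp], fuzz: int = 300) -> List[List[FileOp]]:
    """Group file operations into changesets, oldest first."""
    groups: List[List[FileOp]] = []
    for op in sorted(ops, key=lambda o: o.date):
        for group in groups:
            if same_changeset(group[0], op, fuzz):
                group.append(op)
                break
        else:
            # No existing group accepted this operation; start a new one.
            groups.append([op])
    return groups
```

In this sketch, operations that carry a commitid are grouped by it regardless of date, while operations without one must also fall inside the time window.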

See the later section on RCS/CVS LIMITATIONS for more information on edge cases and conversion problems.

This program does not depend on any of the CVS metadata held outside the individual content files (e.g. under CVSROOT).

This program treats all RCS/CVS metadata and file contents of the source CVS or RCS repository, and their filenames, as uninterpreted byte sequences to be passed through to the git conversion without re-encoding. In particular, it makes no attempt to fix up line endings (Unix \n vs. Windows \r\n vs. Macintosh \r), nor does it know which repository filenames might collide with special filenames on any given platform.

CVS $-keywords in the masters are not interpreted or expanded; this prevents corruption of binary content.

This program treats change comments as uninterpreted byte sequences to be passed through to the git conversion without change or re-encoding. If you need to re-encode (e.g., from Latin-1 to UTF-8) or remap CVS version IDs to something useful, use cvs-fast-export in conjunction with reposurgeon(1).

OPTIONS

-h

Display usage summary.

-w 'fuzz'

Set the timestamp fuzz factor for identifying patch sets in seconds. The default is 300 seconds. This option is irrelevant for changesets with commitids.

-c

Don’t trust commit-IDs; match by ordinary metadata instead. This is useful if you have something like a CVS-NT repository in which per-file commits were made in such a way that the cliques don’t have matching IDs.

-g

Generate a picture of the commit graph in the DOT markup language used by the graphviz tools, rather than fast-exporting. With two -g options, tag each report on a CVS component commit with a prefix character; + for an added file, | for a changed one, - for a deleted one. With the -N option, don’t collate - graph the CVS forest instead.

-l

Warnings normally go to standard error. This option, which takes a filename, allows you to redirect them to a file. Convenient with the -p option.

-a

Dump a list of author IDs found in the repository, rather than fast-exporting.

-A 'authormap'

Apply an author-map file to the attribution lines. Each line must be of the form

ferd = Ferd J. Foonly <foonly@foo.com> America/Chicago

and will be applied to map the Unix username 'ferd' to the DVCS-style user identity specified after the equals sign. The timezone field (after > and whitespace) is optional and (if present) is used to set the timezone offset to be attached to the date; acceptable formats for the timezone field are anything that can be in the TZ environment variable, including a [+-]hhmm offset. Whitespace around the equals sign is stripped. Lines beginning with a # or not containing an equals sign are silently ignored.
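
As a rough illustration of the format just described, the following Python sketch parses author-map text into a dictionary. It is not taken from the cvs-fast-export sources; the Identity type, the parse_authormap name, and the choice to skip lines whose right-hand side lacks an angle-bracketed address are assumptions.

```python
import re
from typing import Dict, NamedTuple, Optional


class Identity(NamedTuple):
    """Hypothetical holder for one mapped identity."""
    name: str
    email: str
    timezone: Optional[str]


# Full name, then <email>, then an optional timezone field.
_IDENTITY = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^>]*)>\s*(?P<tz>\S.*)?$")


def parse_authormap(text: str) -> Dict[str, Identity]:
    authors: Dict[str, Identity] = {}
    for line in text.splitlines():
        # Comment lines and lines without an equals sign are ignored.
        if line.startswith("#") or "=" not in line:
            continue
        user, _, rest = line.partition("=")
        match = _IDENTITY.match(rest.strip())
        if match is None:
            continue  # assumption: malformed identities are skipped
        tz = match.group("tz")
        authors[user.strip()] = Identity(
            match.group("name"), match.group("email"),
            tz.strip() if tz else None)
    return authors
```

The timezone is kept as an opaque string here, since the text above allows anything that is valid in the TZ environment variable.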

-R 'revmap'

Write a revision map to the specified argument filename. Each line of the revision map consists of three whitespace-separated fields: a filename, an RCS revision number, and the mark of the commit to which that filename-revision pair was assigned. Doesn’t work with -g.
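
A revision map in this three-field layout can be loaded with a few lines of Python. The sketch below is illustrative only: the read_revmap name is hypothetical, the mark is kept as an uninterpreted string, and splitting from the right (so that a filename containing spaces stays intact) is an assumption rather than documented behavior.

```python
from typing import Dict, Tuple


def read_revmap(text: str) -> Dict[Tuple[str, str], str]:
    """Map (filename, RCS revision) pairs to commit marks."""
    mapping: Dict[Tuple[str, str], str] = {}
    for line in text.splitlines():
        # Split off the last two fields; the remainder is the filename.
        fields = line.rsplit(None, 2)
        if len(fields) != 3:
            continue  # assumption: short or blank lines are skipped
        filename, revision, mark = fields
        mapping[(filename, revision)] = mark
    return mapping
```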

-v

Show verbose progress messages mainly of interest to developers.

-q

Run quietly, suppressing warning messages (including those about absence of commitids). Meant to be used with cvsconvert(1), which does its own correctness checking.

-T

Force deterministic dates for regression testing. Each patchset will have a monotonically increasing attributed date computed from its mark in the output stream: the mark value times the commit time window times two.
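
The arithmetic can be written out as a one-line Python function. This sketch assumes that the "commit time window" is the -w fuzz value (300 seconds by default) and that the result is a Unix time in seconds; neither detail is confirmed by the text above, and the function name is hypothetical.

```python
def deterministic_date(mark: int, window: int = 300) -> int:
    """Attributed date for a patchset: mark * window * 2."""
    return mark * window * 2
```

Under those assumptions, consecutive marks yield dates spaced two windows apart, which would keep adjacent patchsets from falling into the same coalescing window.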

-r, --reposurgeon

Ship a header comment for reposurgeon to use, declaring "cvs" if a .cvsignore master or CVSROOT file has been seen and "rcs" otherwise. Emit for each commit a list of the CVS file:revision pairs composing it as a bzr-style commit property named "cvs-revisions". From version 2.12 onward, reposurgeon(1) can interpret these and use them as hints for reference-lifting.

-V

Emit the program version and exit.

-e 'remote'

Exported branch names are prefixed with refs/remotes/'remote' instead of refs/heads, making the import appear to come from the named remote.

-p

Enable progress reporting. This also dumps statistics (elapsed time at several points in the conversion run).

-P

Normally cvs-fast-export will skip any filename presented as an argument or on stdin that does not end with the RCS/CVS extension ",v", and will also ignore a pathname containing the string CVSROOT (this avoids annoyances when running from or above a top-level CVS directory). A strict reading of RCS allows masters without the ,v extension. This option sets promiscuous mode, disabling both checks.
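
The two checks amount to a simple filter, sketched here in Python. The is_accepted name is hypothetical, and the sketch models only the filename tests described above, not how directories are traversed.

```python
def is_accepted(path: str, promiscuous: bool = False) -> bool:
    """Return True if a pathname would be read as a master file."""
    if promiscuous:
        # -P disables both the ",v" check and the CVSROOT check.
        return True
    return path.endswith(",v") and "CVSROOT" not in path
```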

-i 'date'

Enable incremental-dump mode. Only commits with a date after that specified by the argument are emitted. Disables inclusion of default ignores. Each branch root in the incremental dump is decorated with git-stream magic which, when interpreted in context of a live repository, will connect that branch to any branch of the same name. The date is expected to be RFC3339 conformant (e.g. yyyy-mm-ddThh:mm:ssZ) or else an integer Unix time in seconds.
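
The two accepted date forms can be normalized to Unix seconds as in the Python sketch below. It handles only the UTC "Z" form with a four-digit year, which is one RFC 3339 layout; whether the program accepts other RFC 3339 variants (numeric offsets, fractional seconds) is not stated here, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_incremental_date(arg: str) -> int:
    """Convert a -i style argument to integer Unix seconds."""
    if arg.isdigit():
        return int(arg)          # already an integer Unix time
    stamp = datetime.strptime(arg, "%Y-%m-%dT%H:%M:%SZ")
    return int(stamp.replace(tzinfo=timezone.utc).timestamp())
```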

EXAMPLE

Typical invocations look like this:

cvs-fast-export . >stream.fi
find . | cvs-fast-export >stream.fi

Your cvs-fast-export distribution should also supply cvssync(1), a tool for fetching CVS masters from a remote repository. Using them together will look something like this:

cvssync anonymous@cvs.savannah.gnu.org:/sources/groff groff
find groff | cvs-fast-export >groff.fi

Progress reporting can be reassuring if you expect a conversion to run for some time. It will animate completion percentages as the conversion proceeds and display timings when done.

The cvs-fast-export suite contains a wrapper script called 'cvsconvert' that is useful for running a conversion and automatically checking its content against the CVS original.

RCS/CVS LIMITATIONS

Translating RCS/CVS repositories to the generic DVCS model expressed by import streams is not merely difficult and messy; there are weird RCS/CVS cases that cannot be correctly translated at all. cvs-fast-export will try to warn you about these cases rather than silently producing broken or incomplete translations, but there be dragons. We recommend some precautions under SANITY CHECKING.

CVS-NT and versions of GNU CVS after 1.12 (2004) added a changeset commit-id to file metadata. Older sections of CVS history without these are vulnerable to various problems caused by clock skew between clients; this used to be relatively common for multiple reasons, including less pervasive use of NTP clock synchronization. cvs-fast-export will warn you ("commits before this date lack commitids") when it sees such a section in your history. When it does, these caveats apply:

  • If timestamps of commits in the CVS repository were not stable enough to be used for ordering commits, the emitted history may be ordered incorrectly despite being topologically consistent.

  • If the timestamp order of different files crosses the revision order within the commit-matching time window, the collation and ordering of commits may be wrong.

One more property affected by commitids is the stability of old changesets under incremental dumping. Under a CVS implementation issuing commitids, new CVS commits are guaranteed not to change cvs-fast-export’s changeset derivation from a previous history; thus, updating a target DVCS repository with incremental dumps from a live CVS installation will work. Even if older portions of the history do not have commitids, conversions will be stable. This stability guarantee is lost if you are using a version of CVS that does not issue commitids.

Also note that a CVS repository has to be completely reanalyzed even for incremental dumps; thus, processing time and memory requirements will rise with the total repository size even when the requested reporting interval of the incremental dump is small.

These problems cannot be fixed in cvs-fast-export; they are inherent to CVS.

REQUIREMENTS AND PERFORMANCE

The program’s transient RAM requirement is proportional to the total volume of all attributions, comments, and blobs in the repository.

On stock PC hardware in 2020, cvs-fast-export achieves processing speeds upwards of 64K CVS commits per minute on real repositories.

LIMITATIONS

Branches occurring in only a subset of the analyzed masters are not correctly resolved; instead, an entirely disjoint history will be created containing the branch revisions and all parents back to the root.

This program does the equivalent of cvs -kb when checking out masters, not performing any $-keyword expansion at all. This differs from CVS’s default behavior on checkout but has the advantage that binary files will never be clobbered. It has the disadvantage that the data in $-headers is not reliable; at best you’ll get the unexpanded version of the $-cookie, at worst you might get the committer/timestamp information for when the master was originally checked in, rather than when it was last checked out. It’s good practice to remove all dollar cookies as part of post-conversion cleanup.

CVS vendor branches are a source of trouble, and this program will ship a warning when it sees them. Sufficiently strange combinations of imports and local modifications will translate badly, producing incorrect content on master and elsewhere. Some of these problems can be prevented by ensuring that the last (latest) commit in your repository is on trunk, rather than a branch.

Some other CVS exporters try, or have tried, to deduce changesets from shared tags even when comment metadata doesn’t match perfectly. This one does not; the designers judge that to trip over too many pathological CVS tagging cases.

CVSNT is supported, but CVSNT extension fields are ignored.

SANITY CHECKING

After conversion, it is good practice to do the following verification steps:

  1. If you ran the conversion directly with cvs-fast-export rather than using cvsconvert, use diff(1) with the -r option to compare a CVS head checkout with a checkout of the converted repository. The only differences you should see are those due to RCS keyword expansion, .cvsignore lifting, and manifest mismatches due to CVS not tracking file deaths quite correctly. If this is not true, you may have found a bug in cvs-fast-export; please report it with a copy of the CVS repo.

  2. Examine the translated repository with gitk(1) looking (in particular) for misplaced tags or branch joins. Often these can be manually repaired with little effort using reposurgeon(1). These flaws do 'not' necessarily imply bugs in cvs-fast-export; they may simply indicate previously undetected malformations in the CVS history. However, reporting them may help improve cvs-fast-export.
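
The comparison in step 1 can also be scripted. The following Python sketch is a rough stand-in for diff -r, not a replacement for cvsconvert: it reports files present on only one side and files whose contents differ, while ignoring CVS and .git administrative directories and the ignore files that are expected to differ. The function name and the ignore list are assumptions.

```python
import filecmp
import os
from typing import List

# Names skipped at every level of both trees (an assumed list).
IGNORED = ["CVS", ".git", ".gitignore", ".cvsignore"]


def tree_differences(cvs_dir: str, git_dir: str) -> List[str]:
    """Recursively compare a CVS checkout with a converted checkout."""
    report: List[str] = []

    def walk(cmp: filecmp.dircmp, prefix: str) -> None:
        for name in cmp.left_only:
            report.append("only in CVS checkout: " + os.path.join(prefix, name))
        for name in cmp.right_only:
            report.append("only in converted checkout: " + os.path.join(prefix, name))
        # dircmp itself compares stat data only, so recheck byte by byte.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False)
        for name in mismatch + errors:
            report.append("differs: " + os.path.join(prefix, name))
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(cvs_dir, git_dir, ignore=IGNORED), "")
    return sorted(report)
```

Files reported as differing still need to be inspected by hand, since keyword expansion alone will produce mismatches.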

A more comprehensive sanity check is described in Repository Editing and Conversion With Reposurgeon; browse it for more.

RETURN VALUE

0 if all files were found and successfully converted, 1 otherwise.

WARNING AND ERROR MESSAGES

Most of the messages cvs-fast-export emits are self-explanatory. Here are a few that aren’t. Where it says "check head", be sure to sanity-check against the head revision.

vendor branch detected

Source tree contains vendored commits. Check head carefully; branch content might have landed on trunk. If this happens, you may be able to prevent it by adding a dummy commit to trunk.

null branch name, probably from a damaged Attic file

The code was unable to deduce a name for a branch and tried to export a null pointer as a name. The branch is given the name "null". It is likely this history will need repair.

discarding dead untagged branch

Analysis found a CVS branch with no tag consisting entirely of dead revisions. These cannot have been visible in the archival state of the CVS at conversion time; it is possible they may have been visible as branch content at some point in the repository’s past, but without an identifying tag that state is impossible to reconstruct.

warning - putting xxx rev yyy on unnamed branch zzz off www

A CVS branch with a live revision lacks a head label. A label of the form "UNNAMED-<file>-<branch>" will be generated for the exported branch.

ignoring empty branch

An explicitly named CVS branch has no live revisions; it is dropped from the conversion output.

warning - xxx newer than yyy

Early in analysis of a CVS master file, time sort order of its deltas doesn’t match the topological order defined by the revision numbers. The most likely cause of this is clock skew between clients in very old CVS versions. The program will attempt to correct for this by tweaking the revision date of the out-of-order commit to be that of its parent, but this may not prevent other time-skew errors later in analysis.

no commitids before zzz

Emitted when CVS history predates the introduction of commit IDs; earlier sections can be vulnerable to clock skew affecting collation.

warning - tag <name> requires synthetic commit (partial match)

A tag spans mixed base revisions and no existing commit matches the full checkout state, so a synthetic commit was emitted to materialize the tag state. This is only done for pathological cases.

tag <name> attached by partial match
tag <name> attached by fallback

These messages are a result of incomplete tagging - a tag that is not present in all masters. Tags attached this way should be considered dubious.

warning - invalid branch header "<text>"

The RCS admin header contains a malformed "branch" value; it is ignored.

warning - branch header <rev> is not a vendor branch

The RCS admin header specifies a branch that is not a vendor branch (not 1.1.x with odd x). Vendor-branch heuristics may be applied instead.
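
The "1.1.x with odd x" test can be stated precisely in a few lines of Python. This is only a sketch of the rule as worded above; the function name is hypothetical and the program's own check may differ in detail.

```python
def is_vendor_branch(rev: str) -> bool:
    """True for branch numbers of the form 1.1.x where x is odd."""
    parts = rev.split(".")
    return (len(parts) == 3
            and parts[:2] == ["1", "1"]
            and parts[2].isdigit()
            and int(parts[2]) % 2 == 1)
```

By this rule 1.1.1 and 1.1.3 qualify, while 1.1.2, 1.2.1, and the revision number 1.1.1.1 do not.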

warning - vendor branch <rev> not found; falling back to heuristic
warning - no usable vendor branch hint; falling back to heuristic
warning - multiple vendor branches with no branch header; falling back to heuristic

CVS does not record enough information to identify which vendor branch should be treated as HEAD in these cases. The converter chooses a best-effort vendor branch and emits this warning.

warning - non-standard vendor branch <rev> with trunk commits is ambiguous

Vendor branches other than 1.1.1 combined with trunk commits are not unambiguously interpretable from the RCS data. Conversion output may diverge from historical CVS behavior.

REPORTING BUGS

Report bugs to Eric S. Raymond <esr@thyrsus.com>. Please read "Reporting bugs in cvs-fast-export" before shipping a report. The project page itself is at http://catb.org/~esr/cvs-fast-export

SEE ALSO

rcs(1), cvs(1), cvssync(1), cvsconvert(1), reposurgeon(1), cvs2git(1).