cvsps — fast export the contents of a CVS repository
cvsps [-h] [-z fuzz] [-g] [-s patchset] [-a author] [-f file] [-d date1 [-d date2]] [-l text] [-b branch] [-n] [-r tag [-r tag]] [-p directory] [-A authormap] [-R revmap] [-v] [-t] [--summary-first] [--diff-opts option string] [--debuglvl bitmask] [-Z compression] [--root cvsroot] [-q] [--fast-export] [--convert-ignores] [--reposurgeon] [-i] [-k] [-T] [-V] [module-path]
cvsps tries to group the per-file commits and tags in a CVS project repository into per-project changeset commits with common metadata, in the style of Subversion and later version-control systems.
The older, default reporting mode is designed for humans to look at and only includes commit metadata, not file contents. The newer --fast-export mode emits a git-style fast-import stream which can be consumed by the importers for git and other version-control systems.
There are several different ways to invoke cvsps:
In fast-export mode:
cvsps does not parse the $HOME/.cvsrc file. If you have a $HOME/.cvspass file (such as is normally created by cvs login) it will be searched for a password.
Apply an author-map file to the attribution lines. Each line must be of the form
ferd = Ferd J. Foonly <foonly@foo.com> America/Chicago
and will be applied to map the Unix username ferd to the DVCS-style user identity specified after the equals sign. The timezone field (after > and whitespace) is optional and (if present) is used to set the timezone offset to be attached to the date; acceptable formats for the timezone field are anything that can be in the TZ environment variable, including a [+-]hhmm offset. Whitespace around the equals sign is stripped. Lines beginning with a # or not containing an equals sign are silently ignored. -R revmap:: Write a revision map to the specified argument filename. Each line of the revision map consists of three whitespace-separated fields: a filename, an RCS revision number, and the mark of the commit to which that filename-revision pair was assigned. -v:: show very verbose parsing messages. -t:: show some brief memory usage statistics. --summary-first:: when multiple patchset diffs are being generated, put the patchset summary for all patchsets at the beginning of the output. --diffs-opts option string:: send a custom set of options to diff, for example to increase the number of context lines, or change the diff format. --debuglvl bitmask:: enable various debug output channels. -Z compression:: A value 1-9 which specifies amount of compression. A value of 0 disables compression. --root cvsroot:: Override the setting of CVSROOT (overrides working directory and environment). -i:: Incremental export. Each commit with no ancestor gets a from pointer name. When importing to an existing repository, this will attach each such commit as a child of the last commit on $BRANCH in the existing repository. -k:: Kill keywords: will extract files with -kk from the CVS archive to avoid noisy changesets. -T:: Force deterministic dates for regression testing. Each patchset will have a monotonic-increasing attributed date computed from its patchset ID. --fast-export:: Emit the report as a git import stream. --convert-ignores:: Convert ..cvsignore files to .gitignore files. --reposurgeon:: Emit for each commit a list of the CVS file:revision pairs composing it as a bzr-style commit property named "cvs-revisions". From version 2.12 onward, reposurgeon can interpret these and use them as hints for reference-lifting. -V:: Emit the program version and exit. module-path:: Operate on the specified module. If this option is not given, either the CVSROOT environment variable must be set to point directly at the module or cvsps must be run in a checkout directory or repository module subdirectory.
The old --cvs-direct option, and the -u and -x options having to do with local caching are gone; cvsps now always runs in direct mode and without caching. However, the -u command-line switch has been left in place and tied to an informative termination message so that users of older versions of calling scripts (such as git-cvsimport) will get an error message rather than silent misbehavior.
The ancestor-branch tracking enabled by the -A option in some previous versions of cvsps never worked properly and has been removed. The new -A option does author-name mapping, following the normal convention for CVS and SVN translation tools.
Tags are fundamentally file at a time in CVS, but like everything else, it would be nice to imagine that they are repository at a time. The approach cvsps takes is that a tag is assigned to a patchset. The meaning of this is that after this patchset, every revision of every file is after the tag (and conversely, before this patchset, at least one file is still before the tag).
However, there are two kinds of inconsistent (or funky) tags that can be created with older versions of CVS, even when following best practices for CVS. Newer version do an up-to-dateness check that prevents these.
The first is what is called a funky tag. A funky tag is one where there are patchsets which are chronologically (and thus by patchset id) earlier than the tag, but are tagwise after. These tags will be marked as FUNKY in the Tag: section of the cvsps output. When a funky tag is specified as one of the -r arguments, there are some number of patchsets which need to be considered out of sequence. In this case, the patchsets themselves will be labeled FUNKY and will be processed correctly.
The second is called an invalid tag. An invalid tag is a tag where there are patchsets which are chronologically (and thus by patchset id) earlier than the tag, but which have members which are tagwise both before, and after the tag, in the same patchset. If an INVALID tag is specified as one of the -r arguments, cvsps will flag each member of the affected patchsets as before or after the tag and the patchset summary will indicate which members are which, and diffs will be generated accordingly.
These may be better explained by examples. The easiest test case for this is two developers, starting in a consistent state. (a):: developer 1 changes file A(sub1) to A(sub2) and commits, creating patchset P(sub1) chronologically earlier, thus with a lower patchset id. (b):: developer 2 changes file B(sub1) to B(sub2) and commits, creating patchset P(sub2) chronologically later, thus higher patchset id. (c):: developer 2 B does fInotfR do "cvs update", so does not get A(sub2) in working directory and creates a "tag" T(sub1)
A checkout of T(sub1) should have A(sub1) and B(sub2) and there is no "patchset" that represents this. In other words, if we label patchset P(sub2) with the tag there are earlier patchsets which need to be disregarded.
An "invalid" tag can be generated with a similar testcase, except:
In other words, if we label patchset P(sub2) with the tag there are earlier patchsets which need to be partially disregarded.
cvsps parses an rc file at startup. This file should be located in ~/.cvsps/cvspsrc. The file should contain arguments, in the exact syntax as the command line, one per line. If an argument takes a parameter, the parameter should be on the same line as the argument.
Dates are reported in localtime, except that fast-export mode reports UTC. Three formats are accepted for the -d option:
As a contrived example:
$ cvsps -d '2004-05-01T00:00:00' -d '2004/07/07 12:00:00'
Translating CVS repositories to the generic DVCS model expressed by import streams is not merely difficult and messy, there are weird CVS cases that cannot be correctly translated at all. cvsps will try to warn you about these cases rather than silently producing broken or incomplete translations.
CVS tags are per-file, not per revision. If developers are not careful in their use of tagging, it can be impossible to associate a tag with any of the changesets that cvsps resolves.
CVS-NT and versions of GNU CVS after 1.12 (2004) added a changeset commit-id to file metadata. Older sections of CVS history without these are vulnerable to various problems caused by clock skew between clients; this used to be relatively common for multiple reasons, including less pervasive use of NTP clock synchronization. cvsps will warn you when it sees such a section in your history. When it does, these caveats apply:
These problems cannot be fixed in cvsps; they are inherent to CVS.
cvsps may be unable to communicate with some extremely ancient CVS server versions (this was a sacrifice for much faster performance). It is unlikely any of these are still in service; if you trip over one, mirror the repository locally with rsync or cvssuck and proceed.
CVS branches that were created but never got any file commits will not be reported in the translated import stream.
If any files were ever imported more than once (e.g., import of more than one vendor release), the head revision might contain incorrect content. cvsps issues a warning when this might occur.
If a CVS branch symbol could not be resolved to a translated commit, cvsps will issue a warning to that effect.
Report bugs to Eric S. Raymond <esr@thyrsus.com>. The project page is at http://catb.org/~esr/cvsps