I maintain a man-page-to-DocBook converter, doclifter. A side effect of this program is that
it serves as a validator for the correctness and portability of the
markup used on Unix manual pages. I test it by running it against all
the manual pages in a full Xubuntu 14.10 installation with some extras; there are 11394 of these on my development
machine, of which 962 already have DocBook masters. It converts 10063
(96.46%) of the remaining 10432 into valid XML-DocBook.
Most of the failures in the remaining 3.54% happen because groff(1)
and its kin have weak-to-nonexistent validity checking. Often,
doclifter fails because of outright errors in macro usage that groff
does not catch. Sometimes it fails on constructions that are legal but
perverse. Very occasionally it throws an error because a man page is
correct but has a structure that cannot be translated to DocBook. I
keep a database of patches for such problems, and periodically
try to push fix patches out to the manual-page maintainers.
Even if you do not care about DocBook, this cleanup work benefits
all third-party manual page viewers, including the GNOME and KDE
documentation browsers; groff constructions that confuse doclifter
are very likely to produce visible problems on these.
The table below is a listing of the 357 (3.42%) pages on which
doclifter fails, but the failure can be prevented with a fix patch to
the manual page source. 12 pages (0.12%) remain intractable,
generally due to markup problems more severe than a point patch can
address. I am working with the individual projects responsible to get
those cleaned up.
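
The counts and percentages quoted above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the numbers are the ones given in this report, while the variable names and the `pct` helper are made up for the example and are not part of doclifter.

```python
# Figures quoted in this report.
total_pages = 11394           # manual pages on the test machine
with_docbook_masters = 962    # pages that already have DocBook masters
converted = 10063             # pages lifted to valid XML-DocBook
patchable = 357               # failures preventable with a fix patch
intractable = 12              # failures a point patch cannot address

# Pages doclifter actually has to convert.
candidates = total_pages - with_docbook_masters
failures = candidates - converted


def pct(count):
    """Percentage of the candidate pages, rounded to two decimal places."""
    return round(100 * count / candidates, 2)


# The two failure categories should account for every failure.
assert patchable + intractable == failures

print(candidates, failures)
print(pct(converted), pct(failures), pct(patchable), pct(intractable))
```

The percentages in this report are all taken relative to the pages that lack DocBook masters, not to the full page count.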
It is likely that you are reading this because you have received
email telling you that patches are associated with your name or list
address. Please consider incorporating them, or equivalents, in your
next release. Also, please write back and tell me what you plan to do
so I can keep my database up-to-date.
If you are not already considering it, please think about moving
the documentation masters of your project to DocBook (or some format
from which you can generate DocBook). If everybody moved to using
DocBook as a common exchange format, it would become much easier to
support unified browsing of all system documentation with Web-like
hypertext capabilities, automatic indexing, and rich search
facilities.
Tools to generate man pages, HTML, and PostScript from DocBook
files are open-source and generally available. My program, doclifter,
should make moving your manual-page masters to DocBook a fairly
painless process.
Many major open source projects (including the Linux kernel, the
Linux Documentation Project, X.org, GNOME, KDE, and FreeBSD) have
already moved to DocBook or are in the process of doing so.
(Individual entries for accepted patches are no longer shown.)
Summary: 288 patches pending, 548 accepted, 0 rejected.
Status codes are as follows:
| n | No response yet. |
| p | Maintainer has informed me that this is fixed in the masters, but I have not seen the fix yet. |
| y | Accepted |
| r | Rejected |
| s | Superseded (page lifts correctly without the patch) |
| [0-9]+ | Number of mailings sent |
| b | Address is blocked |
Problem codes are explained after the table.
Error codes:
- 0
- Function declarations had to be modified in order to fit into
the DocBook DTD. This is not an error in troff usage, but it
reduces the quality of the HTML that can be generated from this page
through the DocBook toolchain.
- 1
- .MT was not properly closed by .ME.
- 2
- Removed unnecessary \c that confused the doclifter parser.
- 3
- Use of .RS/.RE or man/mandoc list markup to produce indentation in
examples and screenshots makes structural translation
impossible. This bug is also likely to confuse third-party
man-page browsers.
- 4
- \c is an obscure feature; third-party viewers sometimes don't
interpret it. Plain \ is safer.
- 5
- Two-digit year in .Dd macro.
- 6
- Presentation-level use of .SS could not be structurally
translated. I changed lower-level instances to .TP.
- 7
- This page wins an award for exceptionally creative and perverse
abuse of list syntax.
- 8
- C function syntax has extra paren.
- 9
- I replaced '-->' with a troff right arrow, which doclifter will
translate properly to an XML/HTML arrow glyph.
- A
- Dot or single-quote at start of line turns it into a garbage command.
This is a serious error; some lines of your page get silently lost
when it is formatted.
- B
- ( ) notation for mandatory parts of command syntax should be { }.
- C
- Broken command synopsis syntax. This may mean you're using a
construction in the command synopsis other than the standard
[ ] | { }, or it may mean you have running text in the command synopsis
section (the latter is not technically an error, but most cases of it
are impossible to translate into DocBook markup), or it may mean the
command syntax fails to match the description.
- D
- Non-break space prevents doclifter from incorrectly interpreting
"Feature Test" as end of function synopsis.
- E
- My translator trips over a useless command in list markup.
- F
- This looks like a build intermediate that was included in the
shipped manual pages by mistake.
- G
- Spurious trailing .CE
- H
- Renaming SYNOPSIS because either (a) third-party viewers and
translators will try to interpret it as a command synopsis and become
confused, or (b) it actually needs to be named "SYNOPSIS" with no
modifier for function prototypes to be properly recognized.
- I
- Use of low-level troff hackery to set special indents or breaks can't
be translated. The page will have rendering faults in HTML, and
probably also under third-party man page browsers such as Xman,
Rosetta, and the KDE help browser. This patch eliminates .br, .ta, .ti,
.ce, .in, and \h in favor of requests like .RS/.RE that have
structural translations.
- J
- Ambiguous or invalid backslash. This doesn't cause groff a problem,
but it confuses doclifter and may confuse older troff implementations.
- K
- Renaming stock man macros throws warnings in doclifter and is likely
to cause failures on third-party manual browsers. Please redo this
page so it uses distinct names for the custom macros.
- L
- List syntax error. This means .IP, .TP or .RS/.RE markup is garbled.
Common causes include .TP just before a section header, .TP entries
with tags but no bodies, and mandoc lists with no trailing .El.
These confuse doclifter, and may also mess up stricter man-page
browsers like Xman and Rosetta.
- M
- Synopsis section name changed to avoid triggering command-synopsis
parsing.
- N
- Extraneous . at start of line.
- O
- Wrong order of arguments in .Dd macro.
- Q
- Spelling error or typo.
- R
- .ce markup can't be structurally translated, and is likely
to cause rendering flaws in generated HTML.
- S
- DEPRECATED: in function syntax cannot be translated. Also, the
code and examples need to be marked up better.
- T
- Junk at the beginning of the manual page.
- U
- Unbalanced group in command synopsis. You probably forgot
to open or close a [ ] or { } group properly.
- V
- Missing body content in list trips up doclifter and is likely to
cause rendering problems in other viewers. I have been able to fill
in what was missing except for what should be under TAR_LONGLINK_100.
- W
- Missing or garbled name section. The most common form of garbling
is a missing - or extra -. Or your manual page may have been generated
by a tool that doesn't emit a NAME section as it should. Or your page
may add running text such as a version or authorship banner. These
problems make it impossible to lift the page to DocBook. They
can also confuse third-party manpage browsers and some implementations
of man -k.
- X
- Unknown or invalid macro. That is, one that does not fit in the
macro set that the man page seems to be using. This is a serious
error; it often means part of your text is being lost or rendered
incorrectly.
- Y
- I have been unable to identify an upstream maintainer for this
Ubuntu/Debian package, and am notifying the generic "Maintainer"
address in the package. Please forward appropriately. Also
fix the package metadata so it identifies the upstream maintainers.
- Z
- Your Synopsis is exceptionally creative. Unfortunately, that means
it cannot be translated to structural markup even when things like
running-text inclusions have been moved elsewhere.
- b
- Attempt to interpolate unknown string.
- d
- .eo/.ec and complex tab-stop hackery can't be translated to XML/HTML
and are almost certain to confuse third-party readers such as
Rosetta and Xman.
- e
- Macro definitions in the NAME section confuse doclifter and are
likely to screw up third-party man viewers with their own parsers.
- g
- Use of a double quote for inch measurements often confuses people
who aren't from the Anglosphere.
- h
- .in arguments were swapped.
- i
- Non-ASCII character in document synopsis can't be parsed.
- j
- Parenthesized comments in command synopsis. This is impossible
to translate to DocBook.
- m
- Contains a request or escape that is outside the portable subset that
can be rendered by non-groff viewers such as the KDE and GNOME help
browsers.
- o
- TBL markup not used where it should be. Tables stitched together
with .ta or list requests can't be lifted to DocBook and will often
choke third-party viewers such as TKMan, Xman, Rosetta, etc.
- p
- Synopsis was incomplete and somewhat garbled.
- q
- Unused macro causes parsing problems.
- r
- I supplied a missing mail address. Without it, the .TP at the end of the
authors list was ill-formed.
- s
- Changed page to use the .URL macro now preferred on man(7).
- t
- Synopsis has to be immediately after NAME section for DocBook
translation to work.
- u
- Use local definitions of .EX/.EE or .DS/.DE to avoid low-level troff
requests in the page body. There are plans to add these to groff man;
in the interim, this patch adds a compatible definition to your page.
- w
- .SS markup in name section seriously confuses parsing, and sections
don't follow standard naming conventions.
- x
- Syntax had to be rearranged because of an options callout.
This is still excessively complicated; third-party man-page
viewers are likely to choke on it.
- y
- This page was generated from some sort of non-man markup. Please
fix the upstream markup so that it generates a well-formed
manual page with the indicated corrections.
- z
- Garbled or missing text near .SS tags. It's not clear to me what's
going on here, but .SS tags on adjacent lines defeat any attempt
to parse the markup. I have inserted text lines indicating that
something needs to be written here.
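
To give a feel for what a few of these codes mean in practice, here is a minimal Python sketch of three simple checks: a two-digit year in .Dd (code 5), a .TP placed directly before a section header (one of the causes listed under code L), and an unbalanced [ ] or { } group in a synopsis (code U). This is not how doclifter detects these problems. The function names, the regular expressions, and the assumed ".Dd Month day, year" date shape are all illustrative assumptions.

```python
import re

# Assumed date shape: ".Dd Month day, year" ending in exactly two digits.
DD_TWO_DIGIT_YEAR = re.compile(r"^\.Dd\s+\w+\s+\d{1,2},\s*\d{2}\s*$")
SECTION_HEADER = re.compile(r"^\.(SH|SS)\b")
TAGGED_PARAGRAPH = re.compile(r"^\.TP\b")


def has_unbalanced_groups(synopsis):
    """Code U: True if [ ] or { } groups are not properly opened and closed."""
    closers = {"]": "[", "}": "{"}
    stack = []
    for ch in synopsis:
        if ch in "[{":
            stack.append(ch)
        elif ch in closers:
            # A closer must match the most recently opened group.
            if not stack or stack.pop() != closers[ch]:
                return True
    return bool(stack)  # anything left on the stack was never closed


def lint_page(text):
    """Return (line_number, code) pairs for the two line-level checks."""
    findings = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if DD_TWO_DIGIT_YEAR.match(line):
            findings.append((i + 1, "5"))
        # Code L: a .TP immediately followed by a section header has
        # neither a tag nor a body.
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if TAGGED_PARAGRAPH.match(line) and SECTION_HEADER.match(next_line):
            findings.append((i + 1, "L"))
    return findings


sample = ".Dd August 7, 98\n.Sh NAME\n.TP\n.SH OPTIONS\n"
print(lint_page(sample))
print(has_unbalanced_groups("foo [-a] [-b {x|y}"))
```

The bracket check works on plain text only. A real synopsis usually carries font escapes and macro calls around the brackets, so a practical checker would have to strip or interpret that markup first.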