I maintain a man-page-to-DocBook converter, doclifter. A side effect of this program is that it serves as a validator for the correctness and portability of the markup used on Unix manual pages. I test it by running it against all the manual pages in a full Xubuntu 16.10 installation with some extras; there are 14203 of these on my development machine, of which 876 already have DocBook masters. It converts 12944 (97.13%) of the remaining 13327 into valid XML-DocBook.

Most of the remaining 2.87% of errors happen because groff(1) and its kin have weak-to-nonexistent validity checking. Often, doclifter fails because of outright errors in macro usage that groff does not catch. Sometimes it fails on constructions that are legal but perverse. Very occasionally it throws an error because a man page is correct but has a structure that cannot be translated to DocBook. I keep a database of patches for such problems, and periodically try to push fix patches out to the manual-page maintainers.

(These are lower numbers and a higher error rate than in some previous reports because I now use i3 rather than GNOME or KDE. Many of the userland manuals that I used to check are no longer installed where my test procedure can see them. Because bad markup tends to be concentrated in the older manual pages of core tools, a larger random sample pulls down the error rate.)

Even if you do not care about DocBook, this cleanup work benefits all third-party manual page viewers, including the GNOME and KDE documentation browsers; groff constructions that confuse doclifter are very likely to produce visible problems on these.

The table below is a listing of the 310 (2.33%) pages on which doclifter fails, but the failure can be prevented with a fix patch to the manual page source. 73 pages (0.55%) remain intractable, generally due to markup problems more severe than a point patch can address. I am working with the individual projects responsible to get those cleaned up.
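
The percentages quoted so far follow directly from the page counts. As a quick cross-check, here is a minimal Python sketch; the variable names and the `pct` helper are illustrative and are not part of doclifter.

```python
# Page counts as reported in the text above.
total_pages = 14203       # all manual pages on the test machine
docbook_masters = 876     # pages that already have DocBook masters
converted = 12944         # pages lifted to valid XML-DocBook
patchable = 310           # failures preventable with a fix patch
intractable = 73          # failures a point patch cannot address

# Pages that doclifter actually has to convert.
candidates = total_pages - docbook_masters


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimal places."""
    return round(100.0 * part / whole, 2)


print(candidates)                    # 13327
print(pct(converted, candidates))    # 97.13
print(pct(patchable, candidates))    # 2.33
print(pct(intractable, candidates))  # 0.55

# The two failure classes together account for every unconverted page.
print(patchable + intractable == candidates - converted)  # True
```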

It is likely that you are reading this because you have received email telling you that patches are associated with your name or list address. Please consider incorporating them, or equivalents, in your next release. Also, please write back and tell me what you plan to do so I can keep my database up-to-date.

If you are not already considering it, please think about moving the documentation masters of your project to DocBook (or some format from which you can generate DocBook). If everybody moved to using DocBook as a common exchange format, it would become much easier to support unified browsing of all system documentation with Web-like hypertext capabilities, automatic indexing, and rich search facilities.

Tools to generate man pages, HTML, and PostScript from DocBook files are open-source and generally available. My program, doclifter, should make moving your manual-page masters to DocBook a fairly painless process.

Many major open source projects (including the Linux kernel, the Linux Documentation Project, X.org, GNOME, KDE, and FreeBSD) have already moved to DocBook or are in the process of doing so.

(Individual entries for accepted patches are no longer shown.)

Summary: 279 patches pending, 596 accepted, 0 rejected.

Status codes are as follows:


n No response yet.
p Maintainer has informed me that this is fixed in the masters, but I have not seen the fix yet.
y Accepted
r Rejected
s Superseded (page lifts correctly without the patch)
[0-9]+ number of mailings sent
b Address is blocked

Problem codes are explained after the table.
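
In the table, the problem codes and the status appear fused into one field, for example `I F W M1n`. The sketch below shows one way such a field might be split in Python. It is a heuristic built on assumptions, not the format's definition: it assumes problem codes are single characters separated by spaces, that the last problem code runs directly into the status, and that a status is an optional mailing count followed by one status letter. Some entries end in `A`, which the legend does not explain; the sketch keeps it verbatim as a suffix. The names `STATUS_RE` and `split_field` are made up for this illustration.

```python
import re

# Assumed status grammar: optional mailing count, one status letter from
# the legend, and an optional trailing "A" (unexplained, kept as-is).
STATUS_RE = re.compile(r"(?P<mailings>[0-9]*)(?P<status>[npyrsb])(?P<suffix>A?)")


def split_field(field):
    """Heuristically split a fused 'problem codes + status' field.

    Returns (codes, status), where status is None or a tuple
    (mailings, letter, suffix).
    """
    tokens = field.split()
    if not tokens:
        return [], None
    codes, last = tokens[:-1], tokens[-1]
    match = STATUS_RE.fullmatch(last)
    if match is None and len(last) > 1:
        # Assume the first character is a problem code fused to the status.
        codes.append(last[0])
        match = STATUS_RE.fullmatch(last[1:])
    if match is None:
        return codes, None
    mailings = int(match["mailings"]) if match["mailings"] else None
    return codes, (mailings, match["status"], match["suffix"])


print(split_field("I F W M1n"))
print(split_field("CbA"))
print(split_field("2n"))
```

A field consisting of a single letter such as `s` or `b` is ambiguous, because those letters are both status codes and problem codes. The sketch reads it as a status, which is a guess.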


Patch:Problem code:Status:
_build_buildd_libcaca-0.99.beta18_ruby_.3caca
I F W M1n
_build_buildd_libcaca-0.99.beta18_caca_codec_.3caca
I F W M1n
_build_buildd_libcaca-0.99.beta18_caca_driver_.3caca
I F W M1n
_build_libcaca-8li34L_libcaca-0.99.beta19_caca_.3caca
I F W M1n
_build_libcaca-8li34L_libcaca-0.99.beta19_ruby_.3caca
I F W M1n
_build_libcaca-8li34L_libcaca-0.99.beta19_caca_codec_.3caca
I F W M1n
_build_libcaca-8li34L_libcaca-0.99.beta19_caca_driver_.3caca
I F W M1n
acl.5
Ib
admin.1posix
C2n
afmtodit.1
ks
american.5
english.5
b
analog.1
C Z2n
AnyEvent::FAQ.3pm
W1n
arp.7
p7n
as.1
Z y2n
audtool.1
Un
B::Hooks::EndOfScope::PP.3pm
B::Hooks::EndOfScope::XS.3pm
W1n
bash.1
L1n
btcflash.8
J2n
bzfs.6
ob
bzr.1
J X2n
cdparanoia.1
L2n
chmoddic.1
B CbA
chroot.2
E L2n
claws-mail.1
L1n
cmake.1
Bn
CPAN::Meta::History::Meta_1_2.3pm
CPAN::Meta::History::Meta_1_3.3pm
CPAN::Meta::History::Meta_1_4.3pm
tn
co.1
ident.1
ob
codepage.1
CbA
compose.1
edit.1
* y2n
CURLOPT_PROXY_SERVICE_NAME.3
CURLOPT_SERVICE_NAME.3
Ap
cvs.1
L4n
dash.1
sh.1
sh.distrib.1
J2n
dcut.1
R2n
Parse::DebControl::Error.3pm
W yb
devnag.1
Js
dh_install.1
i yb
dhcp-options.5
l1n
dkms.8
X J2n
dmcs.1
mcs.1
gmcs.1
L A1n
dosbox.1
Lb
dump-acct.8
U2n
dv2dt.1
C2n
dvipdf.1
font2c.1
R3n
dvipdfm.1
dvipdfmx.1
k1n
edgepaint.1
W1n
editres.1
Ip
e2fsck.8
o3n
e2image.8
J2n
efax.1
J u g3n
eqn.1
geqn.1
s
erb.1
erb2.1.1
irb.1
irb2.1.1
ruby2.1.1
ri2.1.1
a1n
ethtool.8
P2n
exiv2.1
Lp
extractres.1
R3n
f2py.1
f2py2.7.1
C2n
faked-sysv.1
faked-tcp.1
faked.1
fakeroot-sysv.1
fakeroot-tcp.1
fakeroot.1
r2n
fig2ps2tex.1
R3n
formail.1
lockfile.1
procmail.1
procmailex.5
procmailrc.5
procmailsc.5
K2n
fsck.ext2.8
fsck.ext3.8
fsck.ext4.8
fsck.ext4dev.8
op
ftm.7
D2n
fuzzyflakes.6x
C2n
gacutil.1
cli-gacutil.1
N2n
gdiffmk.1
W ks
genisoimage.1
o2n
getafm.1
R1n
gftodvi.1
IbA
gource.1
Cn
gpm-types.7
J C2n
grap.1
Qp
grn.1
Jp
groff.1
s
groff_char.7
J ks
groff_man.7
Js
groff_tmac.5
s
groffer.1
s
grog.1
s
gropdf.1
ks
gs.1
ghostscript.1
CbA
gtbl.1
tbl.1
*s
gthumb.1
Lb
gvcolor.1
Cb
gvpack.1
C *1n
hddtemp.8
L *1n
hfsutils.1
H J7n
hosts_access.5
hosts.allow.5
hosts.deny.5
hosts_options.5
Ib
html2text.1
Cb
html2textrc.5
Xb
hypertorus.6x
C2n
icclink.1
E3n
icctrans.1
L2n
tifficc.1
E3n
icedax.1
A I2n
ilbmtoppm.1
L2n
includeres.1
R3n
init-d-script.5
Lp
intel_panel_fitter.1
E2n
inxi.1
C ln
IO::WrapTie.3pm
W C2n
ipppd.8
L3n
iptables-extensions.8
J L U1n
ipv6calc.8
ipv6logconv.8
ipv6logstats.8
Ls
irda.7
c2n
ispell.1
buildhash.1
munchlist.1
findaffix.1
tryaffix.1
icombine.1
ijoin.1
C2n
ispell-wrapper.1
C2n
lamd.1
2n
lam.7
LAM.7
L2n
lam-helpfile.5
I2n
lastcomm.1
Ib
lftp.1
I2n
libcaca-authors.3caca
W2n
libcaca-canvas.3caca
W J2n
libcaca-env.3caca
W L2n
libcaca-font.3caca
W J2n
libcaca-ruby.3caca
W2n
libcaca-tutorial.3caca
W2n
libpng.3
S Jp
libtiff.3tiff
I2n
linkicc.1
transicc.1
Ln
list_audio_tracks.1
W2n
ln.1
j3n
locate.findutils.1
U2n
lpr.1
U7n
lvm.8
lvmconfig.8
lvm-config.8
lvm-dumpconfig.8
lvreduce.8
lvresize.8
lvcreate.8
lvconvert.8
U1n
vgchange.8
v1n
makeindex.1
J2n
mathspic.1
J W tp
mawk.1
R2n
mdoc.7
X1n
mke2fs.8
mkfs.ext2.8
mkfs.ext3.8
mkfs.ext4.8
mkfs.ext4dev.8
C1n
mkjobtexmf.1
L y2n
mlocate.db.5
Jp
mmcli.8
X1n
mono.1
cli.1
J X2n
mono-config.5
X2n
mpirun.1
mpirun.lam.1
L2n
mtools.5
mtools.conf.5
X2n
mtr.8
Jb
namespaces.7
I1n
nautilus.1
L1n
nautilus-connect-server.1
L2n
netpbm.1
J2n
netstat.8
C zb
nfsmount.conf.5
C Yn
nmcli.1
C L f1n
nsenter.1
L1n
nsgmls.1
C I2n
ntfs-3g.secaudit.8
C2n
ntfs-3g.usermap.8
C2n
nvidia-settings.1
I x Y2n
nvidia-smi.1
I f Y2n
ocsp.1ssl
U1n
ode.1
e2n
omfonts.1
W2n
openvt.1
open.1
L3n
orbd.1
W y Y2n
orca.1
s2n
pam_systemd.8
I1n
pandoc.1
Jb
pax.1posix
W J L2n
pbmclean.1
pnmcomp.1
pnmnorm.1
pnmpad.1
pnmquant.1
pnmremap.1
pnmtotiff.1
pgmnorm.1
ppmcolors.1
ppmnorm.1
ppmntsc.1
ppmquant.1
ppmrainbow.1
ppmtogif.1
ppmtoxpm.1
tifftopnm.1
C2n
pbget.1
pbput.1
pbputs.1
W2n
pbmtextps.1
C2n
pcap-filter.7
I2n
pdfroff.1
Xp
pdcp.1
Jn
pdsh.1
pdsh.bin.1
Jn
pidgin.1
T2n
plot.1
plotfont.1
W2n
pnmhisteq.1
ppmcie.1
ppmlabel.1
sbigtopgm.1
R2n
pnmpaste.1
X2n
pnmtotiffcmyk.1
C2n
pnmtofiasco.1
e2n
policytool.1
W y2n
ppdcfile.5
l1n
preconv.1
s
prlimit.1
U1n
proc.5
o h2n
procfs.5
I2n
pstree.1
pstree.x11.1
Cb
pstops.1
RbA
ps2epsi.1
j7n
ps2pdfwr.1
R2n
psnup.1
J3n
ptx.1
j7n
pylint.1
J1n
pytest.1
C2n
qsub.1posix
I2n
queue.3
LIST_EMPTY.3
LIST_ENTRY.3
LIST_FIRST.3
LIST_FOREACH.3
LIST_HEAD.3
LIST_HEAD_INITIALIZER.3
LIST_INIT.3
LIST_INSERT_BEFORE.3
LIST_INSERT_AFTER.3
LIST_INSERT_HEAD.3
LIST_REMOVE.3
LIST_NEXT.3
SLIST_EMPTY.3
SLIST_ENTRY.3
SLIST_FIRST.3
SLIST_FOREACH.3
SLIST_HEAD.3
SLIST_HEAD_INITIALIZER.3
SLIST_INIT.3
SLIST_INSERT_AFTER.3
SLIST_INSERT_HEAD.3
SLIST_NEXT.3
SLIST_REMOVE_HEAD.3
SLIST_REMOVE.3
STAILQ_CONCAT.3
STAILQ_EMPTY.3
STAILQ_ENTRY.3
STAILQ_FIRST.3
STAILQ_FOREACH.3
STAILQ_HEAD.3
STAILQ_HEAD_INITIALIZER.3
STAILQ_INIT.3
STAILQ_INSERT_AFTER.3
STAILQ_INSERT_HEAD.3
STAILQ_INSERT_TAIL.3
STAILQ_NEXT.3
STAILQ_REMOVE_HEAD.3
STAILQ_REMOVE.3
TAILQ_CONCAT.3
TAILQ_EMPTY.3
TAILQ_ENTRY.3
TAILQ_FIRST.3
TAILQ_FOREACH.3
TAILQ_FOREACH_REVERSE.3
TAILQ_HEAD.3
TAILQ_HEAD_INITIALIZER.3
TAILQ_INIT.3
TAILQ_INSERT_AFTER.3
TAILQ_INSERT_BEFORE.3
TAILQ_INSERT_HEAD.3
TAILQ_INSERT_TAIL.3
TAILQ_LAST.3
TAILQ_NEXT.3
TAILQ_PREV.3
TAILQ_REMOVE.3
TAILQ_SWAP.3
Xs
rake2.1.1
L1n
rcsfile.5
d2n
refer.1
s
regulatory.bin.5
w2n
request-key.8
q1n
request-key.conf.5
q1n
rhythmbox-client.1
L2n
ri.1
L J1n
rlog.1
L2n
rlwrap.1
readline-editor.1
J2n
rmid.1
W y2n
rmiregistry.1
W y2n
roff.7
s
rotatelogs.8
L * <b
ruby.1
ruby1.9.1.1
Lb
s3.4
Ip
sane-apple.5
Lp
sane-lexmark.5
L op
sane-mustek_pp.5
L op
scapy.1
ln
screen.1
L Ip
SDL_Init.3
L3n
SDL_CDPlayTracks.3
n2n
see.1
run-mailcap.1
print.1
C2n
semanage-user.8
semanage-boolean.8
semanage-module.8
semanage-permissive.8
B1n
semanage-fcontext.8
B U1n
setcap.8
C2n
sg_xcopy.8
l1n
sgmlspl.1
L2n
slapd.conf.5
L IbA
slapd-config.5
L IbA
slapo-constraint.5
LbA
software-properties-gtk.1
WbA
spam.1
C2n
rb.1
rx.1
rz.1
sb.1
sx.1
sz.1
e7n
tar.1
C Vp
tc-prio.8
tc-htb.8
tc-cbq.8
tc-cbq-details.8
C2n
tcpd.8
I3n
tcpdmatch.8
I2n
tcpdump.8
ln
tek2plot.1
W2n
test.1
[.1
C O2n
TIFFGetField.3tiff
I2n
TIFFmemory.3tiff
b2n
tnameserv.1
W y2n
tidy.1
W m1bA
ttf2afm.1
*p
tune2fs.8
C7n
unrar.1
unrar-nonfree.1
C1n
upstart-events.7
Ib
usb-creator-gtk.8
Wb
xz.1
xzcat.1
unxz.1
unlzma.1
lzcat.1
lzma.1
C2n
updatedb.conf.5
Jp
uuencode.1posix
I2n
winedbg.1
msiexec.1
Jp
winemaker.1
U1n
xdvipdfmx.1
kp
xlogo.1
Ip
XML::LibXML::Pattern.3pm
W1n
XML::LibXML::Reader.3pm
W1n
XML::LibXML::RegExp.3pm
W1n
XML::LibXML::XPathExpression.3pm
W1n
Xserver.1
I2n
xterm.1
L I3n
zic.8
I2n
zip.1
Jp
zipinfo.1
*2n
zipcloak.1
zipnote.1
zipsplit.1
Ip
zlib.3
C2n

Problem codes:

A
Dot or single-quote at start of line turns it into a garbage command. This is a serious error; some lines of your page get silently lost when it is formatted.
B
( ) notation for mandatory parts of command syntax should be { }.
C
Broken command synopsis syntax. This may mean you're using a construction in the command synopsis other than the standard [ ] | { }, or it may mean you have running text in the command synopsis section (the latter is not technically an error, but most cases of it are impossible to translate into DocBook markup), or it may mean the command syntax fails to match the description.
D
Non-break space prevents doclifter from incorrectly interpreting "Feature Test" as end of function synopsis.
E
My translator trips over a useless command in list markup.
F
This looks like a build intermediate that was included in the shipped manual pages by mistake.
H
Renaming SYNOPSIS because either (a) third-party viewers and translators will try to interpret it as a command synopsis and become confused, or (b) it actually needs to be named "SYNOPSIS" with no modifier for function prototypes to be properly recognized.
I
Use of low-level troff hackery to set special indents or breaks can't be translated. The page will have rendering faults in HTML, and probably also under third-party man page browsers such as Xman, Rosetta, and the KDE help browser. This patch eliminates .br, .ta, .ti, .ce, .in, and \h in favor of requests like .RS/.RE that have structural translations.
J
Ambiguous or invalid backslash. This doesn't cause groff a problem, but it confuses doclifter and may confuse older troff implementations.
K
Renaming stock man macros throws warnings in doclifter and is likely to cause failures on third-party manual browsers. Please redo this page so it uses distinct names for the custom macros.
L
List syntax error. This means .IP, .TP or .RS/.RE markup is garbled. Common causes include .TP just before a section header, .TP entries with tags but no bodies, and mandoc lists with no trailing .El. These confuse doclifter, and may also mess up stricter man-page browsers like Xman and Rosetta.
M
Synopsis section name changed to avoid triggering command-synopsis parsing.
N
Extraneous . at start of line.
O
Command-line options described are not actually implemented.
P
Removed unnecessary \c that confused the doclifter parser.
Q
Spelling error or typo.
R
.ce markup can't be structurally translated, and is likely to cause rendering flaws in generated HTML.
S
DEPRECATED: in function syntax cannot be translated. Also, the code and examples need to be marked up better.
T
Junk at the beginning of the manual page.
U
Unbalanced group in command synopsis. You probably forgot to open or close a [ ] or { } group properly.
V
Missing body content in list trips up doclifter and is likely to cause rendering problems in other viewers. I have been able to fill in what was missing except for what should be under TAR_LONGLINK_100.
W
Missing or garbled name section. The most common form of garbling is a missing - or extra -. Or your manual page may have been generated by a tool that doesn't emit a NAME section as it should. Or your page may add running text such as a version or authorship banner. These problems make it impossible to lift the page to DocBook. They can also confuse third-party manpage browsers and some implementations of man -k.
X
Unknown or invalid macro. That is, one that does not fit in the macro set that the man page seems to be using. This is a serious error; it often means part of your text is being lost or rendered incorrectly.
Y
I have been unable to identify an upstream maintainer for this Ubuntu/Debian package, and am notifying the generic "Maintainer" address in the package. Please forward appropriately. Also fix the package metadata so it identifies the upstream maintainers.
Z
Your Synopsis is exceptionally creative. Unfortunately, that means it cannot be translated to structural markup even when things like running-text inclusions have been moved elsewhere.
a
Incorrect use of BSD list syntax confused doclifter's parser.
b
\c is an obscure feature; third-party viewers sometimes don't interpret it. Plain \ is safer.
c
Function declarations had to be modified in order to fit into the DocBook DTD. This is not an error in troff usage, but it reduces the quality of the HTML that can be generated from this page through the DocBook toolchain.
d
.eo/.ec and complex tab-stop hackery can't be translated to XML/HTML and are almost certain to confuse third-party readers such as Rosetta and Xman.
e
Macro definitions in the NAME section confuse doclifter and are likely to screw up third-party man viewers with their own parsers.
f
Presentation-level use of SS could not be structurally translated. I changed lower-level instances to .TP or .B.
g
Use of a double quote for inch measurements often confuses people who aren't from the Anglosphere.
h
.in arguments were swapped.
i
Non-ASCII character in document synopsis can't be parsed.
j
Parenthesized comments in command synopsis. This is impossible to translate to DocBook.
k
Misspelled macro name.
l
Invalid font escape.
m
Contains a request or escape that is outside the portable subset that can be rendered by non-groff viewers such as the KDE and GNOME help browsers.
n
C function syntax has extra paren.
o
TBL markup not used where it should be. Tables stitched together with .ta or list requests can't be lifted to DocBook and will often choke third-party viewers such as TKMan, XMan, Rosetta, etc.
p
Synopsis was incomplete and somewhat garbled.
q
The .ul request used here can't be translated into document structure. I put these files in a hanging list, which can be.
r
I supplied a missing mail address. Without it, the .TP at the end of the authors list was ill-formed.
s
Changed page to use the .URL macro now preferred on man(7).
t
Synopsis has to be immediately after NAME section for DocBook translation to work.
u
Use local definitions of .EX/.EE or .DS/.DE to avoid low-level troff requests in the page body. There are plans to add these to groff man; in the interim, this patch adds a compatible definition to your page.
v
Invalid option format - cannot have optional prefix in token, it confuses anything trying to do syntactic parsing.
w
.SS markup in name section seriously confuses parsing, and sections don't follow standard naming conventions.
x
Syntax had to be rearranged because of an options callout. This is still excessively complicated; third-party man-page viewers are likely to choke on it.
y
This page was generated from some sort of non-man markup. Please fix the upstream markup so that it generates a well-formed manual page with the indicated corrections.
z
Garbled or missing text near .SS tags. It's not clear to me what's going on here, but .SS tags on adjacent lines defeat any attempt to parse the markup. I have inserted text lines indicating that something needs to be written here.
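
Two of the codes above, U (unbalanced group in command synopsis) and W (missing or garbled name section), describe checks that are mechanical enough to sketch. The Python below is only an illustration of the idea and is not how doclifter itself works. The function names, the regular expression, and the `frob` example page are all hypothetical. The NAME check assumes the line has the shape "name[, name...] - description", with either a plain `-` or the roff escape `\-`.

```python
import re

# Closing bracket -> the opening bracket it must match.
PAIRS = {"]": "[", "}": "{"}


def unbalanced_groups(synopsis):
    """Return True if [ ] or { } groups in a synopsis do not nest properly.

    Simplification: every bracket is treated as grouping syntax, so a
    command whose name is literally "[" would be misjudged.
    """
    stack = []
    for ch in synopsis:
        if ch in "[{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    return bool(stack)


# Assumed NAME-line shape: comma-separated names, exactly one "-" or "\-",
# then a non-empty description.
NAME_LINE = re.compile(r"^\S+(,\s*\S+)*\s+\\?-\s+\S.*$")


def name_line_ok(line):
    """Return True if the line looks like a well-formed NAME entry."""
    return NAME_LINE.match(line) is not None


# "frob" is a made-up command used only for illustration.
print(unbalanced_groups("frob [-a] [-b {x|y}"))    # True: a "[" is never closed
print(unbalanced_groups("frob [-a] {x|y}"))        # False
print(name_line_ok(r"frob \- frobnicate files"))   # True
print(name_line_ok("frob -- frobnicate files"))    # False: extra "-"
print(name_line_ok("frob frobnicate files"))       # False: missing "-"
```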