The Art of Unix Programming

This book and its on-line version are distributed under the terms of the Creative Commons Attribution-NoDerivs 1.0 license, with the additional proviso that the right to publish it on paper for sale or other for-profit use is reserved to Pearson Education, Inc. A reference copy of this license may be found at http://creativecommons.org/licenses/by-nd/1.0/legalcode.

AIX, AS/400, DB/2, OS/2, System/360, MVS, VM/CMS, and IBM PC are trademarks of IBM. Alpha, DEC, VAX, HP-UX, PDP, TOPS-10, TOPS-20, VMS, and VT-100 are trademarks of Compaq. Amiga and AmigaOS are trademarks of Amiga, Inc. Apple, Macintosh, MacOS, Newton, OpenDoc, and OpenStep are trademarks of Apple Computers, Inc. ClearCase is a trademark of Rational Software, Inc. Ethernet is a trademark of 3COM, Inc. Excel, MS-DOS, Microsoft Windows and PowerPoint are trademarks of Microsoft, Inc. Java, J2EE, JavaScript, NeWS, and Solaris are trademarks of Sun Microsystems. SPARC is a trademark of SPARC International. Informix is a trademark of Informix Software. Itanium is a trademark of Intel. Linux is a trademark of Linus Torvalds. Netscape is a trademark of AOL. PDF and PostScript are trademarks of Adobe, Inc. UNIX is a trademark of The Open Group.

The photograph of Ken and Dennis in Chapter 2 appears courtesy of Bell Labs/Lucent Technologies.

The epigraph on the Portability chapter is from the Bell System Technical Journal, v57 #6 part 2 (July-Aug. 1978) pp. 2021-2048 and is reproduced with the permission of Bell Labs/Lucent Technologies.

Revision History
Revision 1.0, 19 September 2003, esr
This is the content that went to Addison-Wesley's printers.
Revision 0.4, 5 February 2003, esr
Release for public review.
Revision 0.3, 22 January 2003, esr
First eighteen-chapter draft. Manuscript walkthrough at Chapter 12. Limited release for early reviewers.
Revision 0.2, 2 January 2003, esr
First manuscript walkthrough at Chapter 7. Released to Dmitry Kirsanov at AW production.
Revision 0.1, 16 November 2002, esr
First DocBook draft, fifteen chapters. Languages rewritten to incorporate lots of feedback. Transparency, Modularity, Multiprogramming, Configuration, Interfaces, Documentation, and Open Source chapters released. Shipped to Mark Taub at AW.
Revision 0.0, 1999, esr
Public HTML draft, first four chapters only.

To Ken Thompson and Dennis Ritchie, because you inspired me.

Table of Contents

Preface
Who Should Read This Book
How to Use This Book
Related References
Conventions Used in This Book
Our Case Studies
Author's Acknowledgements
I. Context
1. Philosophy
Culture? What Culture?
The Durability of Unix
The Case against Learning Unix Culture
What Unix Gets Wrong
What Unix Gets Right
Open-Source Software
Cross-Platform Portability and Open Standards
The Internet and the World Wide Web
The Open-Source Community
Flexibility All the Way Down
Unix Is Fun to Hack
The Lessons of Unix Can Be Applied Elsewhere
Basics of the Unix Philosophy
Rule of Modularity: Write simple parts connected by clean interfaces.
Rule of Clarity: Clarity is better than cleverness.
Rule of Composition: Design programs to be connected with other programs.
Rule of Separation: Separate policy from mechanism; separate interfaces from engines.
Rule of Simplicity: Design for simplicity; add complexity only where you must.
Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do.
Rule of Transparency: Design for visibility to make inspection and debugging easier.
Rule of Robustness: Robustness is the child of transparency and simplicity.
Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust.
Rule of Least Surprise: In interface design, always do the least surprising thing.
Rule of Silence: When a program has nothing surprising to say, it should say nothing.
Rule of Repair: Repair what you can — but when you must fail, fail noisily and as soon as possible.
Rule of Economy: Programmer time is expensive; conserve it in preference to machine time.
Rule of Generation: Avoid hand-hacking; write programs to write programs when you can.
Rule of Optimization: Prototype before polishing. Get it working before you optimize it.
Rule of Diversity: Distrust all claims for one true way.
Rule of Extensibility: Design for the future, because it will be here sooner than you think.
The Unix Philosophy in One Lesson
Applying the Unix Philosophy
Attitude Matters Too
2. History
Origins and History of Unix, 1969-1995
Genesis: 1969-1971
Exodus: 1971-1980
TCP/IP and the Unix Wars: 1980-1990
Blows against the Empire: 1991-1995
Origins and History of the Hackers, 1961-1995
At Play in the Groves of Academe: 1961-1980
Internet Fusion and the Free Software Movement: 1981-1991
Linux and the Pragmatist Reaction: 1991-1998
The Open-Source Movement: 1998 and Onward
The Lessons of Unix History
3. Contrasts
The Elements of Operating-System Style
What Is the Operating System's Unifying Idea?
Multitasking Capability
Cooperating Processes
Internal Boundaries
File Attributes and Record Structures
Binary File Formats
Preferred User Interface Style
Intended Audience
Entry Barriers to Development
Operating-System Comparisons
VMS
MacOS
OS/2
Windows NT
BeOS
MVS
VM/CMS
Linux
What Goes Around, Comes Around
II. Design
4. Modularity
Encapsulation and Optimal Module Size
Compactness and Orthogonality
Compactness
Orthogonality
The SPOT Rule
Compactness and the Strong Single Center
The Value of Detachment
Software Is a Many-Layered Thing
Top-Down versus Bottom-Up
Glue Layers
Case Study: C Considered as Thin Glue
Libraries
Case Study: GIMP Plugins
Unix and Object-Oriented Languages
Coding for Modularity
5. Textuality
The Importance of Being Textual
Case Study: Unix Password File Format
Case Study: .newsrc Format
Case Study: The PNG Graphics File Format
Data File Metaformats
DSV Style
RFC 822 Format
Cookie-Jar Format
Record-Jar Format
XML
Windows INI Format
Unix Textual File Format Conventions
The Pros and Cons of File Compression
Application Protocol Design
Case Study: SMTP, the Simple Mail Transfer Protocol
Case Study: POP3, the Post Office Protocol
Case Study: IMAP, the Internet Message Access Protocol
Application Protocol Metaformats
The Classical Internet Application Metaprotocol
HTTP as a Universal Application Protocol
BEEP: Blocks Extensible Exchange Protocol
XML-RPC, SOAP, and Jabber
6. Transparency
Studying Cases
Case Study: audacity
Case Study: fetchmail's -v option
Case Study: GCC
Case Study: kmail
Case Study: SNG
Case Study: The Terminfo Database
Case Study: Freeciv Data Files
Designing for Transparency and Discoverability
The Zen of Transparency
Coding for Transparency and Discoverability
Transparency and Avoiding Overprotectiveness
Transparency and Editable Representations
Transparency, Fault Diagnosis, and Fault Recovery
Designing for Maintainability
7. Multiprogramming
Separating Complexity Control from Performance Tuning
Taxonomy of Unix IPC Methods
Handing off Tasks to Specialist Programs
Pipes, Redirection, and Filters
Wrappers
Security Wrappers and Bernstein Chaining
Slave Processes
Peer-to-Peer Inter-Process Communication
Problems and Methods to Avoid
Obsolescent Unix IPC Methods
Remote Procedure Calls
Threads — Threat or Menace?
Process Partitioning at the Design Level
8. Minilanguages
Understanding the Taxonomy of Languages
Applying Minilanguages
Case Study: sng
Case Study: Regular Expressions
Case Study: Glade
Case Study: m4
Case Study: XSLT
Case Study: The Documenter's Workbench Tools
Case Study: fetchmail Run-Control Syntax
Case Study: awk
Case Study: PostScript
Case Study: bc and dc
Case Study: Emacs Lisp
Case Study: JavaScript
Designing Minilanguages
Choosing the Right Complexity Level
Extending and Embedding Languages
Writing a Custom Grammar
Macros — Beware!
Language or Application Protocol?
9. Generation
Data-Driven Programming
Case Study: ascii
Case Study: Statistical Spam Filtering
Case Study: Metaclass Hacking in fetchmailconf
Ad-hoc Code Generation
Case Study: Generating Code for the ascii Displays
Case Study: Generating HTML Code for a Tabular List
10. Configuration
What Should Be Configurable?
Where Configurations Live
Run-Control Files
Case Study: The .netrc File
Portability to Other Operating Systems
Environment Variables
System Environment Variables
User Environment Variables
When to Use Environment Variables
Portability to Other Operating Systems
Command-Line Options
The -a to -z of Command-Line Options
Portability to Other Operating Systems
How to Choose among the Methods
Case Study: fetchmail
Case Study: The XFree86 Server
On Breaking These Rules
11. Interfaces
Applying the Rule of Least Surprise
History of Interface Design on Unix
Evaluating Interface Designs
Tradeoffs between CLI and Visual Interfaces
Case Study: Two Ways to Write a Calculator Program
Transparency, Expressiveness, and Configurability
Unix Interface Design Patterns
The Filter Pattern
The Cantrip Pattern
The Source Pattern
The Sink Pattern
The Compiler Pattern
The ed Pattern
The Roguelike Pattern
The ‘Separated Engine and Interface’ Pattern
The CLI Server Pattern
Language-Based Interface Patterns
Applying Unix Interface-Design Patterns
The Polyvalent-Program Pattern
The Web Browser as a Universal Front End
Silence Is Golden
12. Optimization
Don't Just Do Something, Stand There!
Measure before Optimizing
Nonlocality Considered Harmful
Throughput vs. Latency
Batching Operations
Overlapping Operations
Caching Operation Results
13. Complexity
Speaking of Complexity
The Three Sources of Complexity
Tradeoffs between Interface and Implementation Complexity
Essential, Optional, and Accidental Complexity
Mapping Complexity
When Simplicity Is Not Enough
A Tale of Five Editors
ed
vi
Sam
Emacs
Wily
The Right Size for an Editor
Identifying the Complexity Problems
Compromise Doesn't Work
Is Emacs an Argument against the Unix Tradition?
The Right Size of Software
III. Implementation
14. Languages
Unix's Cornucopia of Languages
Why Not C?
Interpreted Languages and Mixed Strategies
Language Evaluations
C
C++
Shell
Perl
Tcl
Python
Java
Emacs Lisp
Trends for the Future
Choosing an X Toolkit
15. Tools
A Developer-Friendly Operating System
Choosing an Editor
Useful Things to Know about vi
Useful Things to Know about Emacs
The Antireligious Choice: Using Both
Special-Purpose Code Generators
yacc and lex
Case Study: Glade
make: Automating Your Recipes
Basic Theory of make
make in Non-C/C++ Development
Utility Productions
Generating Makefiles
Version-Control Systems
Why Version Control?
Version Control by Hand
Automated Version Control
Unix Tools for Version Control
Runtime Debugging
Profiling
Combining Tools with Emacs
Emacs and make
Emacs and Runtime Debugging
Emacs and Version Control
Emacs and Profiling
Like an IDE, Only Better
16. Reuse
The Tale of J. Random Newbie
Transparency as the Key to Reuse
From Reuse to Open Source
The Best Things in Life Are Open
Where to Look?
Issues in Using Open-Source Software
Licensing Issues
What Qualifies as Open Source
Standard Open-Source Licenses
When You Need a Lawyer
IV. Community
17. Portability
Evolution of C
Early History of C
C Standards
Unix Standards
Standards and the Unix Wars
The Ghost at the Victory Banquet
Unix Standards in the Open-Source World
IETF and the RFC Standards Process
Specifications as DNA, Code as RNA
Programming for Portability
Portability and Choice of Language
Avoiding System Dependencies
Tools for Portability
Internationalization
Portability, Open Standards, and Open Source
18. Documentation
Documentation Concepts
The Unix Style
The Large-Document Bias
Cultural Style
The Zoo of Unix Documentation Formats
troff and the Documenter's Workbench Tools
TeX
Texinfo
POD
HTML
DocBook
The Present Chaos and a Possible Way Out
DocBook
Document Type Definitions
Other DTDs
The DocBook Toolchain
Migration Tools
Editing Tools
Related Standards and Practices
SGML
XML-DocBook References
Best Practices for Writing Unix Documentation
19. Open Source
Unix and Open Source
Best Practices for Working with Open-Source Developers
Good Patching Practice
Good Project- and Archive-Naming Practice
Good Development Practice
Good Distribution-Making Practice
Good Communication Practice
The Logic of Licenses: How to Pick One
Why You Should Use a Standard License
Varieties of Open-Source Licensing
MIT or X Consortium License
BSD Classic License
Artistic License
General Public License
Mozilla Public License
20. Futures
Essence and Accident in Unix Tradition
Plan 9: The Way the Future Was
Problems in the Design of Unix
A Unix File Is Just a Big Bag of Bytes
Unix Support for GUIs Is Weak
File Deletion Is Forever
Unix Assumes a Static File System
The Design of Job Control Was Badly Botched
The Unix API Doesn't Use Exceptions
ioctl(2) and fcntl(2) Are an Embarrassment
The Unix Security Model May Be Too Primitive
Unix Has Too Many Different Kinds of Names
File Systems Might Be Considered Harmful
Towards a Global Internet Address Space
Problems in the Environment of Unix
Problems in the Culture of Unix
Reasons to Believe
A. Glossary of Abbreviations
B. References
C. Contributors
D. Rootless Root
Editor's Introduction
Master Foo and the Ten Thousand Lines
Master Foo and the Script Kiddie
Master Foo Discourses on the Two Paths
Master Foo and the Methodologist
Master Foo Discourses on the Graphical User Interface
Master Foo and the Unix Zealot
Master Foo Discourses on the Unix-Nature
Master Foo and the End User

List of Figures

2.1. The PDP-7.
3.1. Schematic history of timesharing.
4.1. Qualitative plot of defect count and density vs. module size.
4.2. Caller/callee relationships in GIMP with a plugin loaded.
6.1. Screen shot of audacity.
6.2. Screen shot of kmail.
6.3. Main window of a Freeciv game.
8.1. Taxonomy of languages.
11.1. The xcalc GUI.
11.2. Screen shot of the original Rogue game.
11.3. The Xcdroast GUI.
11.4. Caller/callee relationships in a polyvalent program.
13.1. Sources and kinds of complexity.
18.1. Processing structural documents.
18.2. Present-day XML-DocBook toolchain.
18.3. Future XML-DocBook toolchain with FOP.

List of Tables

8.1. Regular-expression examples.
8.2. Introduction to regular-expression operations.
14.1. Language choices.
14.2. Summary of X Toolkits.

List of Examples

5.1. Password file example.
5.2. A .newsrc example.
5.3. A fortune file example.
5.4. Basic data for three planets in a record-jar format.
5.5. An XML example.
5.6. A .INI file example.
5.7. An SMTP session example.
5.8. A POP3 example session.
5.9. An IMAP session example.
6.1. An example fetchmail -v transcript.
6.2. An SNG Example.
7.1. The pic2graph pipeline.
8.1. Glade Hello, World.
8.2. A sample m4 macro.
8.3. A sample XSLT program.
8.4. Taxonomy of languages — the pic source.
8.5. Synthetic example of a fetchmailrc.
8.6. RSA implementation using dc.
9.1. Example of fetchmailrc syntax.
9.2. Python structure dump of a fetchmail configuration.
9.3. copy_instance metaclass code.
9.4. Calling context for copy_instance.
9.5. ascii usage screen.
9.6. Desired output format for the star table.
9.7. Master form of the star table.
10.1. A .netrc example.
10.2. X configuration example.
18.1. groff(1) markup example.
18.2. man markup example.
19.1. tar archive maker production.