Re: I found your name in Linux's /usr/lib/magic

Daniel Quinlan (quinlan@proton.pathname.com)
Sat, 19 Oct 96 14:51 PDT

Eric S Raymond <esr@snark.thyrsus.com> writes:

> Let's do it!

Christos, I assume you want to help. Right? You're the one with the
real experience with file(1).

We might also want to bring in Greg Roelofs <newt@uchicago.edu>. He
designed the magic for PNG and might have a few good ideas.

> If you'll ship me a copy of your magic-number selection guidelines,
> I'll take a shot at drafting the RFC, then bounce it back to you for
> edits and expansion.

I couldn't find the darned file (I just moved all of my files to
pathname.com and I am still sorting them out), but I remember most of
the guidelines, so here they are.

The tricky thing is that we have to store both the offset and the
magic in any registry. And we have to take that into account for an
RFC.
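
To make that concrete, here is a rough sketch in Python of what one
registry entry might carry. The class name, field names, and the two
sample entries are only my assumptions for illustration, not a
proposed format; the PNG signature and the "ustar" string at offset
257 are the real values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MagicEntry:
    """One hypothetical registry entry: where to look and what to expect."""
    offset: int       # byte offset from the beginning of the data stream
    magic: bytes      # the single contiguous block of bytes expected there
    description: str  # human-readable name of the data type

    def matches(self, data: bytes) -> bool:
        # Compare the block at the recorded offset against the magic.
        # A stream that is too short yields a shorter slice and fails.
        end = self.offset + len(self.magic)
        return data[self.offset:end] == self.magic


# Illustrative registry contents only.
REGISTRY = [
    MagicEntry(0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    MagicEntry(257, b"ustar", "POSIX tar archive"),
]


def identify(data: bytes) -> Optional[str]:
    """Return the description of the first matching entry, or None."""
    for entry in REGISTRY:
        if entry.matches(data):
            return entry.description
    return None
```

The tar entry shows why the offset has to be stored alongside the
magic: the same five bytes at offset 0 would mean nothing.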

We also need to determine the relationship between the registry and
the RFC, guidelines for acceptance into the registry, and stuff we'll
recommend in the RFC, but not require for RFC conformance.

I'm going to walk through /etc/magic and see if there are any other
things we want to include. I'll also pick out a few bad magics and
tell you why I think they are bad.

Dan

------- start of cut text --------------
GLOSSARY

Primary magic
magic numbers used to recognize the type of data stream or file
Secondary magic
magic numbers used to determine characteristics of the data stream
or file, such as version, sub-type, etc.
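
A small sketch of the distinction, in Python. Everything here is
hypothetical: the 8 byte primary magic is made up, and placing a
one-byte version field right after it is just one possible layout.

```python
from typing import Optional

PRIMARY_OFFSET = 0
PRIMARY_MAGIC = b"\x8aXf\xd1\x1b\x0ez\xc4"  # made-up 8 byte primary magic
VERSION_OFFSET = 8                          # made-up secondary magic location


def describe(data: bytes) -> Optional[str]:
    """Recognize the type from primary magic, then refine it with secondary."""
    end = PRIMARY_OFFSET + len(PRIMARY_MAGIC)
    if data[PRIMARY_OFFSET:end] != PRIMARY_MAGIC:
        return None  # primary magic absent: not our data type at all
    if len(data) <= VERSION_OFFSET:
        return "example data"  # type known, no secondary magic present
    # Secondary magic only adds detail; it never decides the type.
    return f"example data, version {data[VERSION_OFFSET]}"
```

The primary magic stays the same for every version, so an older
file(1) that knows nothing about the version byte still recognizes
the type.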

REQUIREMENTS

These are required for compatibility with both traditional and POSIX
versions of file(1).

1. Magic must be located at a predictable offset from the beginning
of the data stream.

2. Primary magic must be located in a single block of bytes and should
be byte-aligned.

RECOMMENDATIONS

1. Length of magic. 4 bytes is enough, but I think we should consider
jumping to 8 bytes for future entries.

Better yet, reserve a CLEAR range of magic (at offset 0) for 8 byte
magic. I'll look at /etc/magic and figure out what range would be
best.

2. Magic should not be composed solely of text characters, especially
English text. It should be a randomish mix of text and non-text
characters, preferably including characters in each of these ranges:
ASCII, Latin-1 extensions to ASCII, and everything else.

A program to generate random hex numbers is useful. (hpa wrote one)

3. Don't use null characters in magic.

4. Offsets close to the beginning of the file are better. 0 is best.
Having magic located in the first disk block (if on a fixed disk)
is ideal, meaning it should be in the first 512 bytes.

5. The primary magic should be constant for a particular application
or data type. Magic should not change depending on the version of
an application or data type. Secondary magic should be used for that.
------- end ----------------------------
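
For recommendations 1 through 3, something along these lines could
generate and sanity-check candidate magic. This is only a sketch in
Python, not hpa's program. The function names are mine, and so is the
reading of the three ranges: printable ASCII as 0x20-0x7E, Latin-1
extensions as 0xA0-0xFF, and "everything else" as the remaining
non-null bytes.

```python
import random

PRINTABLE = range(0x20, 0x7F)   # printable ASCII
LATIN1 = range(0xA0, 0x100)     # Latin-1 extensions to ASCII
# Everything else, minus the null byte: 0x01-0x1F and 0x7F-0x9F.
OTHER = [b for b in range(0x01, 0xA0) if b not in PRINTABLE]


def random_magic(length=8, rng=random):
    """Return `length` non-null bytes with at least one from each range."""
    if length < 3:
        raise ValueError("need at least 3 bytes to cover all three ranges")
    picks = [rng.choice(PRINTABLE), rng.choice(LATIN1), rng.choice(OTHER)]
    picks += [rng.randrange(1, 256) for _ in range(length - 3)]
    rng.shuffle(picks)  # avoid a fixed position for each range
    return bytes(picks)


def magic_problems(magic):
    """List the recommendations that a candidate magic violates."""
    problems = []
    if len(magic) < 4:
        problems.append("shorter than 4 bytes")
    if 0 in magic:
        problems.append("contains a null byte")
    if all(b in PRINTABLE for b in magic):
        problems.append("solely printable text")
    return problems


if __name__ == "__main__":
    candidate = random_magic()
    print(candidate.hex(), magic_problems(candidate))
```

As a check, the 8 byte PNG signature (89 50 4E 47 0D 0A 1A 0A) comes
back with no problems, while a plain-text magic such as b"GIF" is
flagged as both too short and solely text.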