This commit was manufactured by cvs2svn to create tag 'FreeBSD-release/1.0'.
.TH FILE 1 "Copyright but distributable"
.\# file.1,v 1.3 1993/06/10 00:38:06 jtc Exp
.SH NAME
.I file
\- determine file type
.SH SYNOPSIS
.B file
[
.B \-c
]
[
.B \-z
]
[
.B \-L
]
[
.B \-f
namefile ]
[
.B \-m
magicfile ]
file ...
.SH DESCRIPTION
.I File
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.I first
test that succeeds causes the file type to be printed.
.PP
The type printed will usually contain one of the words
.B text
(the file contains only ASCII characters and is
probably safe to read on an ASCII terminal),
.B executable
(the file contains the result of compiling a program
in a form understandable to some \s-1UNIX\s0 kernel or another),
or
.B data
meaning anything else (data is usually `binary' or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.I /etc/magic
or the program itself,
.BR "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word ``text'' printed.
Don't do as Berkeley did \- change ``shell commands text''
to ``shell script''.
.PP
The filesystem tests are based on examining the return from a
.IR stat (2)
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.BR sys/stat.h .
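.PP
As a rough illustration of the filesystem tests, the sketch below classifies a path from its stat(2) result alone.
It is written in Python rather than taken from the program's C source; the function name filesystem_test, its follow_symlinks parameter and the returned strings are hypothetical, not the program's actual output.

```python
import os
import stat


def filesystem_test(path, follow_symlinks=False):
    """Return a description if the stat result alone decides the type.

    Returns None when the file is an ordinary non-empty file, meaning the
    later (magic number and language) tests would have to decide.
    """
    # lstat reports on a symbolic link itself; stat follows it.
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None
```

Passing follow_symlinks=True in this sketch plays the role that the description of the symlink-following option suggests: the link target is examined instead of the link.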
.PP
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.B a.out
file, whose format is defined in
.B a.out.h
and possibly
.B exec.h
in the standard include directory.
These files have a `magic number' stored in a particular place
near the beginning of the file that tells the \s-1UNIX\s0 operating system
that the file is a binary executable, and which of several types thereof.
The concept of `magic number' has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the magic file
.IR /etc/magic .
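.PP
The idea of an invariant identifier at a fixed offset can be sketched as a table lookup.
The Python below is an illustration only: MAGIC_TABLE and magic_test are hypothetical names, and the three entries are well-known signatures chosen for the example, not a copy of the real magic file.

```python
# Each entry: (offset into the file, bytes expected there, description).
# Order matters: the first entry that matches wins.
MAGIC_TABLE = [
    (0, b"\x1f\x9d", "compressed data"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"%!", "PostScript document text"),
]


def magic_test(data, table=MAGIC_TABLE):
    """Return the description of the first matching entry, or None."""
    for offset, pattern, description in table:
        # Slicing past the end of data yields a short slice, which
        # simply fails to compare equal to the pattern.
        if data[offset:offset + len(pattern)] == pattern:
            return description
    return None
```

A caller would read the first few blocks of the file into data and fall through to the language tests when the function returns None.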
.PP
If an argument appears to be an
.SM ASCII
file,
.I file
attempts to guess its language.
The language tests look for particular strings (cf. \fInames.h\fP)
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.B .br
indicates that the file is most likely a troff input file,
just as the keyword
.B struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.I tar
archives) and determine whether an unknown file should be
labelled as `ascii text' or `data'.
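.PP
A minimal sketch of a keyword-based language test follows, in Python.
The two keywords come from the paragraph above; the KEYWORDS table, the language_test name, the description strings other than `ascii text' and `data', and the rule used to decide what counts as ASCII are assumptions made for the example.

```python
# Keyword -> description, checked in order (hypothetical table).
KEYWORDS = [
    (b".br", "troff input text"),
    (b"struct", "c program text"),
]


def language_test(data):
    """Guess a language from whitespace-separated tokens in data."""
    # Assumption: any NUL byte or byte with the high bit set means
    # the buffer is not plain ASCII.
    if b"\0" in data or any(byte >= 0x80 for byte in data):
        return "data"
    tokens = data.split()
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return "ascii text"
```

Because a single keyword decides the answer, a letter that happens to mention the word struct would be reported as C in this sketch, which is the kind of unreliability the paragraph above warns about.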
.PP
Use
.B \-m
.I file
to specify an alternate file of magic numbers.
.PP
The
.B \-z
option tries to look inside compressed files.
.PP
The
.B \-c
option causes a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.B \-m
to debug a new magic file before installing it.
.PP
The
.B \-f
.I namefile
option specifies that the names of the files to be examined
are to be read (one per line) from
.I namefile
before the argument list.
Either
.I namefile
or at least one filename argument must be present;
to test the standard input, use ``-'' as a filename argument.
.PP
The
.B \-L
option causes symlinks to be followed, as does the like-named option in
.IR ls (1).
.SH FILES
.I /etc/magic
\- default list of magic numbers
.SH SEE ALSO
.IR magic (5)
\- description of magic file format.
.br
.IR strings "(1), " od (1)
\- tools for examining non-text files.
.SH STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.PP
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.br
>10 string language impress\ (imPRESS data)
.br
in an existing magic file would have to be changed to
.br
>10 string language\e impress (imPRESS data)
.br
In addition, in this version, if a pattern string contains a backslash,
it must be escaped. For example,
.br
0 string \ebegindata Andrew Toolkit document
.br
in an existing magic file would have to be changed to
.br
0 string \e\ebegindata Andrew Toolkit document
.br
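.PP
The two escaping rules above can be sketched as a small field splitter.
This Python is an illustration under simplifying assumptions, not the program's parser: split_magic_line is a hypothetical name, a backslash simply makes the next character literal, and everything after the third field is treated as the description.

```python
def split_magic_line(line):
    """Split a magic entry into [offset, type, pattern, description].

    Any white space separates the first three fields, so a space or a
    backslash inside the pattern has to be preceded by a backslash.
    """
    fields = []
    current = []
    i = 0
    n = len(line)
    while i < n and len(fields) < 3:
        ch = line[i]
        if ch == "\\" and i + 1 < n:
            # Escaped character: keep it literally, drop the backslash.
            current.append(line[i + 1])
            i += 2
        elif ch.isspace():
            if current:
                fields.append("".join(current))
                current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if current:
        fields.append("".join(current))
    # Whatever remains is the message printed on a match.
    return fields + [line[i:].strip()]
```

With the escaped forms shown above, the pattern fields come out as the strings `language impress' and a single backslash followed by `begindata'.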
.PP
SunOS releases 3.2 and later from Sun Microsystems include a
.IR file (1)
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the `&' operator, used as,
for example,
.br
>16 long&0x7fffffff >0 not stripped
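.PP
Read as `mask the long at offset 16 with 0x7fffffff, then test whether the result is greater than 0', the entry above can be sketched in Python as follows.
The function name masked_long_gt is hypothetical, and the sketch assumes a 4-byte long in the native byte order of the machine running it.

```python
import struct


def masked_long_gt(data, offset, mask, threshold):
    """Return True if (long at offset & mask) > threshold."""
    if len(data) < offset + 4:
        # Not enough bytes to hold a long at this offset: no match.
        return False
    # "=l" is a 4-byte signed integer in native byte order.
    (value,) = struct.unpack_from("=l", data, offset)
    return (value & mask) > threshold
```

A call such as masked_long_gt(data, 16, 0x7fffffff, 0) then corresponds to the example entry, with the message `not stripped' printed when it returns true.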
.SH MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Ian Darwin (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.PP
The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
If your old
.I file
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.IR /etc/magic.orig ).
.SH HISTORY
There has been a
.I file
command in every UNIX since at least Research Version 6
(man page dated January, 1975).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.PP
This program, based on the System V version,
was written by Ian Darwin without looking at anybody else's source code.
.PP
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
The program has undergone continued evolution since.
.SH AUTHOR
Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian,
Internet address ian@sq.com,
postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8.
.PP
Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator
from simple `x&y != 0' to `x&y op z'.
.PP
Altered by Guy Harris, guy@auspex.com, 1993, to:
.RS
.PP
put the ``old-style'' `&'
operator back the way it was, because 1) Rob McMahon's change broke the
previous style of usage, 2) the SunOS ``new-style'' `&' operator,
which this version of
.I file
supports, also handles `x&y op z', and 3) Rob's change wasn't documented
in any case;
.PP
put in multiple levels of `>';
.PP
put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the
file in a specific byte order, rather than in the native byte order of
the process running
.IR file .
.RE
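.PP
The difference between a fixed byte order and the native one can be sketched in Python as below.
The helper read_short is hypothetical, and treating the value as an unsigned 2-byte quantity is an assumption made for the example.

```python
import struct

# Keyword -> struct format: ">" is big-endian, "<" is little-endian,
# "=" is the native byte order of the running process.
SHORT_FORMATS = {"beshort": ">H", "leshort": "<H", "short": "=H"}


def read_short(data, offset, kind):
    """Read an unsigned 2-byte value at offset in the given byte order."""
    (value,) = struct.unpack_from(SHORT_FORMATS[kind], data, offset)
    return value
```

With the same two bytes 0x1f 0x8b, read_short returns 0x1f8b for ``beshort'' and 0x8b1f for ``leshort'' on every machine, while the result for plain ``short'' depends on the host.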
.PP
Changes by Ian Darwin and various authors including
Christos Zoulas (christos@ee.cornell.edu), 1990-1992.
.SH LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada,
1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993.
.PP
This software is not subject to and may not be made subject to any
license of the American Telephone and Telegraph Company, Sun
Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the
Regents of the University of California, The X Consortium or MIT, or
The Free Software Foundation.
.PP
This software is not subject to any export provision of the United States
Department of Commerce, and may be exported to any country or planet.
.PP
Permission is granted to anyone to use this software for any purpose on
any computer system, and to alter it and redistribute it freely, subject
to the following restrictions:
.PP
1. The author is not responsible for the consequences of use of this
software, no matter how awful, even if they arise from flaws in it.
.PP
2. The origin of this software must not be misrepresented, either by
explicit claim or by omission. Since few users ever read sources,
credits must appear in the documentation.
.PP
3. Altered versions must be plainly marked as such, and must not be
misrepresented as being the original software. Since few users
ever read sources, credits must appear in the documentation.
.PP
4. This notice may not be removed or altered.
.PP
A few support files (\fIgetopt\fP, \fIstrtok\fP)
distributed with this package
are by Henry Spencer and are subject to the same terms as above.
.PP
A few simple support files (\fIstrtol\fP, \fIstrchr\fP)
distributed with this package
are in the public domain; they are so marked.
.PP
The files
.I tar.h
and
.I is_tar.c
were written by John Gilmore from his public-domain
.I tar
program, and are not covered by the above restrictions.
.SH BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir. What is it?
Better yet, the magic file should be compiled into binary (say,
.IR ndbm (3)
or, better yet, fixed-length ASCII strings
for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.PP
.I File
uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents of ASCII files.
.PP
The support for ASCII files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.PP
There should be an ``else'' clause to follow a series of continuation lines.
.PP
The magic file and keywords should have regular expression support.
Their use of ASCII TAB as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.PP
It might be advisable to allow upper-case letters in keywords
for, e.g., troff commands vs man page macros.
Regular expression support would make this easy.
.PP
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of line.
Regular expression support would make this easy.
.PP
The list of keywords in
.I ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like `*' for the offset value.
.PP
Another optimisation would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc, once we
have fetched it. Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.PP
The program should provide a way to give an estimate
of ``how good'' a guess is.
We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
they are not as good as other guesses (e.g. ``Newsgroups:'' versus
``Return-Path:''). Still, if the others don't pan out, it should be
possible to use the first guess.
.PP
This program is slower than some vendors' file commands.
.PP
This manual page, and particularly this section, is too long.
.SH AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.B ftp.cs.toronto.edu
in the directory
.BR /pub/darwin/file .