Commit | Line | Data |
---|---|---|
15637ed4 | 1 | .TH FILE 1 "Copyright but distributable" |
78ed81a3 | 2 | .\# file.1,v 1.3 1993/06/10 00:38:06 jtc Exp |
15637ed4 RG |
3 | .SH NAME |
4 | .I file | |
5 | \- determine file type | |
6 | .SH SYNOPSIS | |
7 | .B file | |
8 | [ | |
78ed81a3 | 9 | .B \-c |
15637ed4 RG |
10 | ] |
11 | [ | |
78ed81a3 | 12 | .B \-z |
13 | ] | |
14 | [ | |
15 | .B \-L | |
16 | ] | |
17 | [ | |
18 | .B \-f | |
15637ed4 RG |
19 | namefile ] |
20 | [ | |
78ed81a3 | 21 | .B \-m |
15637ed4 RG |
22 | magicfile ] |
23 | file ... | |
24 | .SH DESCRIPTION | |
25 | .I File | |
26 | tests each argument in an attempt to classify it. | |
27 | There are three sets of tests, performed in this order: | |
28 | filesystem tests, magic number tests, and language tests. | |
29 | The | |
30 | .I first | |
31 | test that succeeds causes the file type to be printed. | |
32 | .PP | |
33 | The type printed will usually contain one of the words | |
34 | .B text | |
35 | (the file contains only ASCII characters and is | |
36 | probably safe to read on an ASCII terminal), | |
37 | .B executable | |
38 | (the file contains the result of compiling a program | |
39 | in a form understandable to some \s-1UNIX\s0 kernel or another), | |
40 | or | |
41 | .B data | |
42 | meaning anything else (data is usually `binary' or non-printable). | |
43 | Exceptions are well-known file formats (core files, tar archives) | |
44 | that are known to contain binary data. | |
45 | When modifying the file | |
46 | .I /etc/magic | |
47 | or the program itself, | |
48 | .B "preserve these keywords" . | |
49 | People depend on knowing that all the readable files in a directory | |
50 | have the word ``text'' printed. | |
78ed81a3 | 51 | Don't do as Berkeley did \- change ``shell commands text'' |
15637ed4 RG |
52 | to ``shell script''. |
53 | .PP | |
54 | The filesystem tests are based on examining the return from a | |
78ed81a3 | 55 | .IR stat (2) |
15637ed4 RG |
56 | system call. |
57 | The program checks to see if the file is empty, | |
58 | or if it's some sort of special file. | |
59 | Any known file types appropriate to the system you are running on | |
78ed81a3 | 60 | (sockets, symbolic links, or named pipes (FIFOs) on those systems that |
61 | implement them) | |
15637ed4 RG |
62 | are intuited if they are defined in |
63 | the system header file | |
78ed81a3 | 64 | .BR sys/stat.h . |
15637ed4 RG |
65 | .PP |
66 | The magic number tests are used to check for files with data in | |
67 | particular fixed formats. | |
68 | The canonical example of this is a binary executable (compiled program) | |
78ed81a3 | 69 | .B a.out |
15637ed4 | 70 | file, whose format is defined in |
78ed81a3 | 71 | .B a.out.h |
15637ed4 | 72 | and possibly |
78ed81a3 | 73 | .B exec.h |
15637ed4 RG |
74 | in the standard include directory. |
75 | These files have a `magic number' stored in a particular place | |
76 | near the beginning of the file that tells the \s-1UNIX\s0 operating system | |
77 | that the file is a binary executable, and which of several types thereof. | |
78 | The concept of `magic number' has been applied by extension to data files. | |
79 | Any file with some invariant identifier at a small fixed | |
80 | offset into the file can usually be described in this way. | |
81 | The information in these files is read from the magic file | |
78ed81a3 | 82 | .IR /etc/magic . |
15637ed4 RG |
83 | .PP |
84 | If an argument appears to be an | |
85 | .SM ASCII | |
86 | file, | |
87 | .I file | |
88 | attempts to guess its language. | |
89 | The language tests look for particular strings (cf. \fInames.h\fP) | |
90 | that can appear anywhere in the first few blocks of a file. | |
91 | For example, the keyword | |
78ed81a3 | 92 | .B .br |
15637ed4 RG |
93 | indicates that the file is most likely a troff input file, |
94 | just as the keyword | |
78ed81a3 | 95 | .B struct |
15637ed4 RG |
96 | indicates a C program. |
97 | These tests are less reliable than the previous | |
98 | two groups, so they are performed last. | |
99 | The language test routines also test for some miscellany | |
100 | (such as | |
101 | .I tar | |
102 | archives) and determine whether an unknown file should be | |
103 | labelled as `ascii text' or `data'. | |
104 | .PP | |
105 | Use | |
78ed81a3 | 106 | .B \-m |
15637ed4 RG |
107 | .I file |
108 | to specify an alternate file of magic numbers. | |
109 | .PP | |
110 | The | |
78ed81a3 | 111 | .B \-z |
112 | option tries to look inside compressed files. | |
113 | .PP | |
114 | The | |
115 | .B \-c | |
15637ed4 RG |
116 | option causes a checking printout of the parsed form of the magic file. |
117 | This is usually used in conjunction with | |
78ed81a3 | 118 | .B \-m |
15637ed4 RG |
119 | to debug a new magic file before installing it. |
120 | .PP | |
121 | The | |
78ed81a3 | 122 | .B \-f |
15637ed4 RG |
123 | .I namefile |
124 | option specifies that the names of the files to be examined | |
125 | are to be read (one per line) from | |
126 | .I namefile | |
127 | before the argument list. | |
128 | Either | |
129 | .I namefile | |
130 | or at least one filename argument must be present; | |
131 | to test the standard input, use ``-'' as a filename argument. | |
78ed81a3 | 132 | .PP |
133 | The | |
134 | .B \-L | |
135 | option causes symlinks to be followed, as does the like-named option in | |
136 | .IR ls (1). | |
15637ed4 RG |
137 | .SH FILES |
138 | .I /etc/magic | |
139 | \- default list of magic numbers | |
140 | .SH SEE ALSO | |
78ed81a3 | 141 | .IR magic (5) |
15637ed4 RG |
142 | \- description of magic file format. |
143 | .br | |
144 | .IR strings (1), " od" (1) | |
145 | \- tools for examining non-textfiles. | |
146 | .SH STANDARDS CONFORMANCE | |
147 | This program is believed to exceed the System V Interface Definition | |
148 | of FILE(CMD), as near as one can determine from the vague language | |
149 | contained therein. | |
150 | Its behaviour is mostly compatible with the System V program of the same name. | |
151 | This version knows more magic, however, so it will produce | |
152 | different (albeit more accurate) output in many cases. | |
153 | .PP | |
154 | The one significant difference | |
155 | between this version and System V | |
156 | is that this version treats any white space | |
157 | as a delimiter, so that spaces in pattern strings must be escaped. | |
158 | For example, | |
159 | .br | |
160 | >10 string language impress\ (imPRESS data) | |
161 | .br | |
162 | in an existing magic file would have to be changed to | |
163 | .br | |
164 | >10 string language\e impress (imPRESS data) | |
78ed81a3 | 165 | .br |
166 | In addition, in this version, if a pattern string contains a backslash, | |
167 | it must be escaped. For example | |
168 | .br | |
169 | 0 string \ebegindata Andrew Toolkit document | |
170 | .br | |
171 | in an existing magic file would have to be changed to | |
172 | .br | |
173 | 0 string \e\ebegindata Andrew Toolkit document | |
174 | .br | |
15637ed4 | 175 | .PP |
78ed81a3 | 176 | SunOS releases 3.2 and later from Sun Microsystems include a |
177 | .IR file (1) | |
178 | command derived from the System V one, but with some extensions. | |
15637ed4 | 179 | My version differs from Sun's only in minor ways. |
78ed81a3 | 180 | It includes the extension of the `&' operator, used as, |
15637ed4 RG |
181 | for example, |
182 | .br | |
183 | >16 long&0x7fffffff >0 not stripped | |
15637ed4 RG |
184 | .SH MAGIC DIRECTORY |
185 | The magic file entries have been collected from various sources, | |
186 | mainly USENET, and contributed by various authors. | |
187 | Ian Darwin (address below) will collect additional | |
188 | or corrected magic file entries. | |
189 | A consolidation of magic file entries | |
190 | will be distributed periodically. | |
191 | .PP | |
192 | The order of entries in the magic file is significant. | |
193 | Depending on what system you are using, the order that | |
194 | they are put together may be incorrect. | |
195 | If your old | |
196 | .I file | |
197 | command uses a magic file, | |
198 | keep the old magic file around for comparison purposes | |
199 | (rename it to | |
200 | .IR /etc/magic.orig ). | |
201 | .SH HISTORY | |
202 | There has been a | |
203 | .I file | |
204 | command in every UNIX since at least Research Version 6 | |
205 | (man page dated January, 1975). | |
206 | The System V version introduced one significant change: | |
207 | the external list of magic number types. | |
208 | This slowed the program down slightly but made it a lot more flexible. | |
209 | .PP | |
210 | This program, based on the System V version, | |
211 | was written by Ian Darwin without looking at anybody else's source code. | |
212 | .PP | |
213 | John Gilmore revised the code extensively, making it better than | |
214 | the first version. | |
215 | Geoff Collyer found several inadequacies | |
216 | and provided some magic file entries. | |
217 | The program has undergone continued evolution since. | |
78ed81a3 | 218 | .SH AUTHOR |
15637ed4 RG |
219 | Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian, |
220 | Internet address ian@sq.com, | |
221 | postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8. | |
222 | .PP | |
78ed81a3 | 223 | Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator |
224 | from simple `x&y != 0' to `x&y op z'. | |
225 | .PP | |
226 | Altered by Guy Harris, guy@auspex.com, 1993, to: | |
227 | .RS | |
228 | .PP | |
229 | put the ``old-style'' `&' | |
230 | operator back the way it was, because 1) Rob McMahon's change broke the | |
231 | previous style of usage, 2) the SunOS ``new-style'' `&' operator, | |
232 | which this version of | |
233 | .I file | |
234 | supports, also handles `x&y op z', and 3) Rob's change wasn't documented | |
235 | in any case; | |
236 | .PP | |
237 | put in multiple levels of `>'; | |
15637ed4 | 238 | .PP |
78ed81a3 | 239 | put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the |
240 | file in a specific byte order, rather than in the native byte order of | |
241 | the process running | |
242 | .IR file . | |
243 | .RE | |
244 | .PP | |
245 | Changes by Ian Darwin and various authors including | |
246 | Christos Zoulas (christos@ee.cornell.edu), 1990-1992. | |
247 | .SH LEGAL NOTICE | |
248 | Copyright (c) Ian F. Darwin, Toronto, Canada, | |
249 | 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993. | |
250 | .PP | |
251 | This software is not subject to and may not be made subject to any | |
252 | license of the American Telephone and Telegraph Company, Sun | |
253 | Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the | |
254 | Regents of the University of California, The X Consortium or MIT, or | |
255 | The Free Software Foundation. | |
256 | .PP | |
257 | This software is not subject to any export provision of the United States | |
258 | Department of Commerce, and may be exported to any country or planet. | |
15637ed4 RG |
259 | .PP |
260 | Permission is granted to anyone to use this software for any purpose on | |
261 | any computer system, and to alter it and redistribute it freely, subject | |
262 | to the following restrictions: | |
263 | .PP | |
264 | 1. The author is not responsible for the consequences of use of this | |
265 | software, no matter how awful, even if they arise from flaws in it. | |
266 | .PP | |
267 | 2. The origin of this software must not be misrepresented, either by | |
268 | explicit claim or by omission. Since few users ever read sources, | |
269 | credits must appear in the documentation. | |
270 | .PP | |
271 | 3. Altered versions must be plainly marked as such, and must not be | |
272 | misrepresented as being the original software. Since few users | |
273 | ever read sources, credits must appear in the documentation. | |
274 | .PP | |
275 | 4. This notice may not be removed or altered. | |
276 | .PP | |
277 | A few support files (\fIgetopt\fP, \fIstrtok\fP) | |
278 | distributed with this package | |
279 | are by Henry Spencer and are subject to the same terms as above. | |
280 | .PP | |
281 | A few simple support files (\fIstrtol\fP, \fIstrchr\fP) | |
282 | distributed with this package | |
283 | are in the public domain; they are so marked. | |
284 | .PP | |
285 | The files | |
286 | .I tar.h | |
287 | and | |
288 | .I is_tar.c | |
289 | were written by John Gilmore from his public-domain | |
290 | .I tar | |
291 | program, and are not covered by the above restrictions. | |
292 | .SH BUGS | |
78ed81a3 | 293 | There must be a better way to automate the construction of the Magic |
294 | file from all the glop in Magdir. What is it? | |
295 | Better yet, the magic file should be compiled into binary (say, | |
296 | .IR ndbm (3) | |
297 | or, better yet, fixed-length ASCII strings | |
298 | for use in heterogeneous network environments) for faster startup. | |
299 | Then the program would run as fast as the Version 7 program of the same name, | |
300 | with the flexibility of the System V version. | |
15637ed4 RG |
301 | .PP |
302 | .I File | |
303 | uses several algorithms that favor speed over accuracy; | |
304 | thus it can be misled about the contents of ASCII files. | |
305 | .PP | |
306 | The support for ASCII files (primarily for programming languages) | |
307 | is simplistic, inefficient and requires recompilation to update. | |
308 | .PP | |
78ed81a3 | 309 | There should be an ``else'' clause to follow a series of continuation lines. |
15637ed4 RG |
310 | .PP |
311 | The magic file and keywords should have regular expression support. | |
78ed81a3 | 312 | Their use of ASCII TAB as a field delimiter is ugly and makes |
313 | it hard to edit the files, but is entrenched. | |
15637ed4 RG |
314 | .PP |
315 | It might be advisable to allow upper-case letters in keywords | |
316 | to distinguish, e.g., troff commands from man page macros. | |
317 | Regular expression support would make this easy. | |
318 | .PP | |
319 | The program doesn't grok \s-2FORTRAN\s0. | |
320 | It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which | |
321 | appear indented at the start of line. | |
322 | Regular expression support would make this easy. | |
323 | .PP | |
324 | The list of keywords in | |
325 | .I ascmagic | |
326 | probably belongs in the Magic file. | |
327 | This could be done by using some keyword like `*' for the offset value. | |
328 | .PP | |
15637ed4 RG |
329 | Another optimisation would be to sort |
330 | the magic file so that we can just run down all the | |
331 | tests for the first byte, first word, first long, etc., once we | |
332 | have fetched it. Complain about conflicts in the magic file entries. | |
333 | Make a rule that the magic entries sort based on file offset rather | |
334 | than position within the magic file? | |
335 | .PP | |
336 | The program should provide a way to give an estimate | |
337 | of ``how good'' a guess is. | |
338 | We end up removing guesses (e.g. ``From '' as first 5 chars of file) because | |
339 | they are not as good as other guesses (e.g. ``Newsgroups:'' versus | |
340 | ``Return-Path:''). Still, if the others don't pan out, it should be | |
341 | possible to use the first guess. | |
342 | .PP | |
78ed81a3 | 343 | This program is slower than some vendors' file commands. |
15637ed4 RG |
344 | .PP |
345 | This manual page, and particularly this section, is too long. | |
78ed81a3 | 346 | .SH AVAILABILITY |
347 | You can obtain the original author's latest version by anonymous FTP | |
348 | on | |
349 | .B ftp.cs.toronto.edu | |
350 | in the directory | |
351 | .BR /pub/darwin/file . |