Commit | Line | Data |
---|---|---|
18a61723 | 1 | .\" @(#)lint.ms 5.1 (Berkeley) %G% |
795f68a3 KM |
2 | .\" |
3 | .RP | |
4 | .ND "July 26, 1978" | |
5 | .OK | |
6 | Program Portability | |
7 | Strong Type Checking | |
8 | .TL | |
9 | Lint, a C Program Checker | |
10 | .AU "MH 2C-559" 3968 | |
11 | S. C. Johnson | |
12 | .AI | |
13 | .MH | |
14 | .AB | |
15 | .PP | |
16 | .I Lint | |
17 | is a command which examines C source programs, | |
18 | detecting | |
19 | a number of bugs and obscurities. | |
20 | It enforces the type rules of C more strictly than | |
21 | the C compilers. | |
22 | It may also be used to enforce a number of portability | |
23 | restrictions involved in moving | |
24 | programs between different machines and/or operating systems. | |
25 | Another option detects a number of wasteful, or error prone, constructions | |
26 | which nevertheless are, strictly speaking, legal. | |
27 | .PP | |
28 | .I Lint | |
29 | accepts multiple input files and library specifications, and checks them for consistency. | |
30 | .PP | |
31 | The separation of function between | |
32 | .I lint | |
33 | and the C compilers has both historical and practical | |
34 | rationale. | |
35 | The compilers turn C programs into executable files rapidly | |
36 | and efficiently. | |
37 | This is possible in part because the | |
38 | compilers do not do sophisticated | |
39 | type checking, especially between | |
40 | separately compiled programs. | |
41 | .I Lint | |
42 | takes a more global, leisurely view of the program, | |
43 | looking much more carefully at the compatibilities. | |
44 | .PP | |
45 | This document discusses the use of | |
46 | .I lint , | |
47 | gives an overview of the implementation, and gives some hints on the | |
48 | writing of machine independent C code. | |
49 | .AE | |
50 | .CS 10 2 12 0 0 5 | |
51 | .SH | |
52 | Introduction and Usage | |
53 | .PP | |
54 | Suppose there are two C | |
55 | .[ | |
56 | Kernighan Ritchie Programming Prentice 1978 | |
57 | .] | |
58 | source files, | |
59 | .I file1.c | |
60 | and | |
61 | .I file2.c , | |
62 | which are ordinarily compiled and loaded together. | |
63 | Then the command | |
64 | .DS | |
65 | lint file1.c file2.c | |
66 | .DE | |
67 | produces messages describing inconsistencies and inefficiencies | |
68 | in the programs. | |
69 | The program enforces the typing rules of C | |
70 | more strictly than the C compilers | |
71 | (for both historical and practical reasons) | |
72 | enforce them. | |
73 | The command | |
74 | .DS | |
75 | lint \-p file1.c file2.c | |
76 | .DE | |
77 | will produce, in addition to the above messages, further messages | |
78 | which relate to the portability of the programs to other operating | |
79 | systems and machines. | |
80 | Replacing the | |
81 | .B \-p | |
82 | by | |
83 | .B \-h | |
84 | will produce messages about various error-prone or wasteful constructions | |
85 | which, strictly speaking, are not bugs. | |
86 | Saying | |
87 | .B \-hp | |
88 | gets the whole works. | |
89 | .PP | |
90 | The next several sections describe the major messages; | |
91 | the document closes with sections | |
92 | discussing the implementation and giving suggestions | |
93 | for writing portable C. | |
94 | An appendix gives a summary of the | |
95 | .I lint | |
96 | options. | |
97 | .SH | |
98 | A Word About Philosophy | |
99 | .PP | |
100 | Many of the facts which | |
101 | .I lint | |
102 | needs may be impossible to | |
103 | discover. | |
104 | For example, whether a given function in a program ever gets called | |
105 | may depend on the input data. | |
106 | Deciding whether | |
107 | .I exit | |
108 | is ever called is equivalent to solving the famous ``halting problem,'' known to be | |
109 | recursively undecidable. | |
110 | .PP | |
111 | Thus, most of the | |
112 | .I lint | |
113 | algorithms are a compromise. | |
114 | If a function is never mentioned, it can never be called. | |
115 | If a function is mentioned, | |
116 | .I lint | |
117 | assumes it can be called; this is not necessarily so, but in practice is quite reasonable. | |
118 | .PP | |
119 | .I Lint | |
120 | tries to give information with a high degree of relevance. | |
121 | Messages of the form ``\fIxxx\fR might be a bug'' | |
122 | are easy to generate, but are acceptable only in proportion | |
123 | to the fraction of real bugs they uncover. | |
124 | If this fraction of real bugs is too small, the messages lose their credibility | |
125 | and serve merely to clutter up the output, | |
126 | obscuring the more important messages. | |
127 | .PP | |
128 | Keeping these issues in mind, we now consider in more detail | |
129 | the classes of messages which | |
130 | .I lint | |
131 | produces. | |
132 | .SH | |
133 | Unused Variables and Functions | |
134 | .PP | |
135 | As sets of programs evolve and develop, | |
136 | previously used variables and arguments to | |
137 | functions may become unused; | |
138 | it is not uncommon for external variables, or even entire | |
139 | functions, to become unnecessary, and yet | |
140 | not be removed from the source. | |
141 | These ``errors of commission'' rarely cause working programs to fail, but they are a source | |
142 | of inefficiency, and make programs harder to understand | |
143 | and change. | |
144 | Moreover, information about such unused variables and functions can occasionally | |
145 | serve to discover bugs; if a function does a necessary job, and | |
146 | is never called, something is wrong! | |
147 | .PP | |
148 | .I Lint | |
149 | complains about variables and functions which are defined but not otherwise | |
150 | mentioned. | |
151 | An exception is variables which are declared through explicit | |
152 | .B extern | |
153 | statements but are never referenced; thus the statement | |
154 | .DS | |
155 | extern float sin(\|); | |
156 | .DE | |
157 | will evoke no comment if | |
158 | .I sin | |
159 | is never used. | |
160 | Note that this agrees with the semantics of the C compiler. | |
161 | In some cases, these unused external declarations might be of some interest; they | |
162 | can be discovered by adding the | |
163 | .B \-x | |
164 | flag to the | |
165 | .I lint | |
166 | invocation. | |
167 | .PP | |
168 | Certain styles of programming | |
169 | require many functions to be written with similar interfaces; | |
170 | frequently, some of the arguments may be unused | |
171 | in many of the calls. | |
172 | The | |
173 | .B \-v | |
174 | option is available to suppress the printing of | |
175 | complaints about unused arguments. | |
176 | When | |
177 | .B \-v | |
178 | is in effect, no messages are produced about unused | |
179 | arguments except for those | |
180 | arguments which are unused and also declared as | |
181 | register arguments; this can be considered | |
182 | an active (and preventable) waste of the register | |
183 | resources of the machine. | |
184 | .PP | |
185 | There is one case where information about unused, or | |
186 | undefined, variables is more distracting | |
187 | than helpful. | |
188 | This is when | |
189 | .I lint | |
190 | is applied to some, but not all, files out of a collection | |
191 | which are to be loaded together. | |
192 | In this case, many of the functions and variables defined | |
193 | may not be used, and, conversely, | |
194 | many functions and variables defined elsewhere may be used. | |
195 | The | |
196 | .B \-u | |
197 | flag may be used to suppress the spurious messages which might otherwise appear. | |
198 | .SH | |
199 | Set/Used Information | |
200 | .PP | |
201 | .I Lint | |
202 | attempts to detect cases where a variable is used before it is set. | |
203 | This is very difficult to do well; | |
204 | many algorithms take a good deal of time and space, | |
205 | and still produce messages about perfectly valid programs. | |
206 | .I Lint | |
207 | detects local variables (automatic and register storage classes) | |
208 | whose first use appears physically earlier in the input file than the first assignment to the variable. | |
209 | It assumes that taking the address of a variable constitutes a ``use,'' since the actual use | |
210 | may occur at any later time, in a data dependent fashion. | |
211 | .PP | |
212 | The restriction to the physical appearance of variables in the file makes the | |
213 | algorithm very simple and quick to implement, | |
214 | since the true flow of control need not be discovered. | |
215 | It does mean that | |
216 | .I lint | |
217 | can complain about some programs which are legal, | |
218 | but these programs would probably be considered bad on stylistic grounds (e.g., they might | |
219 | contain at least two \fBgoto\fR's). | |
220 | Because static and external variables are initialized to 0, | |
221 | no meaningful information can be discovered about their uses. | |
222 | The algorithm deals correctly, however, with initialized automatic variables, and variables | |
223 | which are used in the expression which first sets them. | |
224 | .PP | |
225 | The set/used information also permits recognition of those local variables which are set | |
226 | and never used; these form a frequent source of inefficiencies, and may also be symptomatic of bugs. | |
227 | .SH | |
228 | Flow of Control | |
229 | .PP | |
230 | .I Lint | |
231 | attempts to detect unreachable portions of the programs which it processes. | |
232 | It will complain about unlabeled statements immediately following | |
233 | \fBgoto\fR, \fBbreak\fR, \fBcontinue\fR, or \fBreturn\fR statements. | |
234 | An attempt is made to detect loops which can never be left at the bottom, detecting the | |
235 | special cases | |
236 | \fBwhile\fR( 1 ) and \fBfor\fR(;;) as infinite loops. | |
237 | .I Lint | |
238 | also complains about loops which cannot be entered at the top; | |
239 | some valid programs may have such loops, but at best they are bad style, | |
240 | at worst bugs. | |
241 | .PP | |
242 | .I Lint | |
243 | has an important area of blindness in the flow of control algorithm: | |
244 | it has no way of detecting functions which are called and never return. | |
245 | Thus, a call to | |
246 | .I exit | |
247 | may cause unreachable code which | |
248 | .I lint | |
249 | does not detect; the most serious effects of this are in the | |
250 | determination of returned function values (see the next section). | |
251 | .PP | |
252 | One form of unreachable statement is not usually complained about by | |
253 | .I lint ; | |
254 | a | |
255 | .B break | |
256 | statement that cannot be reached causes no message. | |
257 | Programs generated by | |
258 | .I yacc , | |
259 | .[ | |
260 | Johnson Yacc 1975 | |
261 | .] | |
262 | and especially | |
263 | .I lex , | |
264 | .[ | |
265 | Lesk Lex | |
266 | .] | |
267 | may have literally hundreds of unreachable | |
268 | .B break | |
269 | statements. | |
270 | The | |
271 | .B \-O | |
272 | flag in the C compiler will often eliminate the resulting object code inefficiency. | |
273 | Thus, these unreached statements are of little importance, | |
274 | there is typically nothing the user can do about them, and the | |
275 | resulting messages would clutter up the | |
276 | .I lint | |
277 | output. | |
278 | If these messages are desired, | |
279 | .I lint | |
280 | can be invoked with the | |
281 | .B \-b | |
282 | option. | |
283 | .SH | |
284 | Function Values | |
285 | .PP | |
286 | Sometimes functions return values which are never used; | |
287 | sometimes programs incorrectly use function ``values'' | |
288 | which have never been returned. | |
289 | .I Lint | |
290 | addresses this problem in a number of ways. | |
291 | .PP | |
292 | Locally, within a function definition, | |
293 | the appearance of both | |
294 | .DS | |
295 | return( \fIexpr\fR ); | |
296 | .DE | |
297 | and | |
298 | .DS | |
299 | return ; | |
300 | .DE | |
301 | statements is cause for alarm; | |
302 | .I lint | |
303 | will give the message | |
304 | .DS | |
305 | function \fIname\fR contains return(e) and return | |
306 | .DE | |
307 | The most serious difficulty with this is detecting when a function return is implied | |
308 | by flow of control reaching the end of the function. | |
309 | This can be seen with a simple example: | |
310 | .DS | |
311 | .ta .5i 1i 1.5i | |
312 | \fRf ( a ) { | |
313 | if ( a ) return ( 3 ); | |
314 | g (\|); | |
315 | } | |
316 | .DE | |
317 | Notice that, if \fIa\fR tests false, \fIf\fR will call \fIg\fR and then return | |
318 | with no defined return value; this will trigger a complaint from | |
319 | .I lint . | |
320 | If \fIg\fR, like \fIexit\fR, never returns, | |
321 | the message will still be produced when in fact nothing is wrong. | |
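As a hedged illustration in present-day C, the sketch below gives every path through the function an explicit return value, which is the kind of change that removes the complaint. The explicit types, the empty stand-in for g, and the names f_fixed and check_f_fixed are assumptions added here; they are not part of the original example.

```c
#include <assert.h>

/* Stand-in for g; assumed here to be an ordinary function that returns. */
static void g(void)
{
}

/* Same shape as f above, but every path returns a defined value. */
int f_fixed(int a)
{
    if (a)
        return 3;
    g();
    return 0;             /* explicit value instead of falling off the end */
}

/* Both paths can now be used safely in an expression. */
void check_f_fixed(void)
{
    assert(f_fixed(1) == 3);
    assert(f_fixed(0) == 0);
}
```

The value 0 on the second path is arbitrary; the point is only that a caller never sees an undefined result.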
322 | .PP | |
323 | In practice, some potentially serious bugs have been discovered by this feature; | |
324 | it also accounts for a substantial fraction of the ``noise'' messages produced | |
325 | by | |
326 | .I lint . | |
327 | .PP | |
328 | On a global scale, | |
329 | .I lint | |
330 | detects cases where a function returns a value, but this value is sometimes, | |
331 | or always, unused. | |
332 | When the value is always unused, it may constitute an inefficiency in the function definition. | |
333 | When the value is sometimes unused, it may represent bad style (e.g., not testing for | |
334 | error conditions). | |
335 | .PP | |
336 | The dual problem, using a function value when the function does not return one, | |
337 | is also detected. | |
338 | This is a serious problem. | |
339 | Amazingly, this bug has been observed on a couple of occasions | |
340 | in ``working'' programs; the desired function value just happened to have been computed | |
341 | in the function return register! | |
342 | .SH | |
343 | Type Checking | |
344 | .PP | |
345 | .I Lint | |
346 | enforces the type checking rules of C more strictly than the compilers do. | |
347 | The additional checking is in four major areas: | |
348 | across certain binary operators and implied assignments, | |
349 | at the structure selection operators, | |
350 | between the definition and uses of functions, | |
351 | and in the use of enumerations. | |
352 | .PP | |
353 | There are a number of operators which have an implied balancing between types of the operands. | |
354 | The assignment, conditional ( ?\|: ), and relational operators | |
355 | have this property; the argument | |
356 | of a \fBreturn\fR statement, | |
357 | and expressions used in initialization also suffer similar conversions. | |
358 | In these operations, | |
359 | \fBchar\fR, \fBshort\fR, \fBint\fR, \fBlong\fR, \fBunsigned\fR, \fBfloat\fR, and \fBdouble\fR types may be freely intermixed. | |
360 | The types of pointers must agree exactly, | |
361 | except that arrays of \fIx\fR's can, of course, be intermixed with pointers to \fIx\fR's. | |
362 | .PP | |
363 | The type checking rules also require that, in structure references, the | |
364 | left operand of the \(em> be a pointer to structure, the left operand of the \fB.\fR | |
365 | be a structure, and the right operand of these operators be a member | |
366 | of the structure implied by the left operand. | |
367 | Similar checking is done for references to unions. | |
368 | .PP | |
369 | Strict rules apply to function argument and return value | |
370 | matching. | |
371 | The types \fBfloat\fR and \fBdouble\fR may be freely matched, | |
372 | as may the types \fBchar\fR, \fBshort\fR, \fBint\fR, and \fBunsigned\fR. | |
373 | Also, pointers can be matched with the associated arrays. | |
374 | Aside from this, all actual arguments must agree in type with their declared counterparts. | |
375 | .PP | |
376 | With enumerations, checks are made that enumeration variables or members are not mixed | |
377 | with other types, or other enumerations, | |
378 | and that the only operations applied are =, initialization, ==, !=, and function arguments and return values. | |
379 | .SH | |
380 | Type Casts | |
381 | .PP | |
382 | The type cast feature in C was introduced largely as an aid | |
383 | to producing more portable programs. | |
384 | Consider the assignment | |
385 | .DS | |
386 | p = 1 ; | |
387 | .DE | |
388 | where | |
389 | .I p | |
390 | is a character pointer. | |
391 | .I Lint | |
392 | will quite rightly complain. | |
393 | Now, consider the assignment | |
394 | .DS | |
395 | p = (char \(**)1 ; | |
396 | .DE | |
397 | in which a cast has been used to | |
398 | convert the integer to a character pointer. | |
399 | The programmer obviously had a strong motivation | |
400 | for doing this, and has clearly signaled his intentions. | |
401 | It seems harsh for | |
402 | .I lint | |
403 | to continue to complain about this. | |
404 | On the other hand, if this code is moved to another | |
405 | machine, such code should be looked at carefully. | |
406 | The | |
407 | .B \-c | |
408 | flag controls the printing of comments about casts. | |
409 | When | |
410 | .B \-c | |
411 | is in effect, casts are treated as though they were assignments | |
412 | subject to complaint; otherwise, all legal casts are passed without comment, | |
413 | no matter how strange the type mixing seems to be. | |
414 | .SH | |
415 | Nonportable Character Use | |
416 | .PP | |
417 | On the PDP-11, characters are signed quantities, with a range | |
418 | from \-128 to 127. | |
419 | On most of the other C implementations, characters take on only positive | |
420 | values. | |
421 | Thus, | |
422 | .I lint | |
423 | will flag certain comparisons and assignments as being | |
424 | illegal or nonportable. | |
425 | For example, the fragment | |
426 | .DS | |
427 | char c; | |
428 | ... | |
429 | if( (c = getchar(\|)) < 0 ) .... | |
430 | .DE | |
431 | works on the PDP-11, but | |
432 | will fail on machines where characters always take | |
433 | on positive values. | |
434 | The real solution is to declare | |
435 | .I c | |
436 | an integer, since | |
437 | .I getchar | |
438 | is actually returning | |
439 | integer values. | |
440 | In any case, | |
441 | .I lint | |
442 | will say | |
443 | ``nonportable character comparison''. | |
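A minimal sketch of the suggested fix, assuming a present-day hosted C library: the variable that receives each character is an int, so the end-of-file value stays distinct from every character value regardless of whether char is signed. The function name count_chars and the use of getc on an arbitrary stream (rather than getchar on the standard input) are choices made for this illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Count the characters remaining on fp. */
long count_chars(FILE *fp)
{
    int c;                /* int, not char: getc returns int values */
    long n = 0;

    assert(fp != NULL);
    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

Because the comparison is against EOF rather than against 0, the loop does not depend on the sign of character values.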
444 | .PP | |
445 | A similar issue arises with bitfields; when assignments | |
446 | of constant values are made to bitfields, the field may | |
447 | be too small to hold the value. | |
448 | This is especially true because | |
449 | on some machines bitfields are considered as signed | |
450 | quantities. | |
451 | While it may seem unintuitive to consider | |
452 | that a two bit field declared of type | |
453 | .B int | |
454 | cannot hold the value 3, the problem disappears | |
455 | if the bitfield is declared to have type | |
456 | .B unsigned . | |
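The unsigned case can be sketched as follows. The structure tag flags, the member name two_bits, and the function store_two_bits are hypothetical names introduced for this example.

```c
#include <assert.h>

struct flags {
    unsigned int two_bits : 2;   /* unsigned two-bit field: holds 0 through 3 */
};

/* Store v in the two-bit unsigned field and read it back. */
unsigned int store_two_bits(unsigned int v)
{
    struct flags f;

    assert(v <= 3);              /* larger values would be reduced modulo 4 */
    f.two_bits = v;
    return f.two_bits;
}
```

A field declared with a signed type gives up one of its two bits to the sign on machines that treat bitfields as signed, which is why the value 3 may not survive there.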
457 | .SH | |
458 | Assignments of longs to ints | |
459 | .PP | |
460 | Bugs may arise from the assignment of | |
461 | .B long | |
462 | to | |
463 | an | |
464 | .B int , | |
465 | which loses accuracy. | |
466 | This may happen in programs | |
467 | which have been incompletely converted to use | |
468 | .B typedefs . | |
469 | When a | |
470 | .B typedef | |
471 | variable | |
472 | is changed from \fBint\fR to \fBlong\fR, | |
473 | the program can stop working because | |
474 | some intermediate results may be assigned | |
475 | to \fBints\fR, losing accuracy. | |
476 | Since there are a number of legitimate reasons for | |
477 | assigning \fBlongs\fR to \fBints\fR, the detection | |
478 | of these assignments is enabled | |
479 | by the | |
480 | .B \-a | |
481 | flag. | |
482 | .SH | |
483 | Strange Constructions | |
484 | .PP | |
485 | Several perfectly legal, but somewhat strange, constructions | |
486 | are flagged by | |
487 | .I lint ; | |
488 | the messages hopefully encourage better code quality and clearer style, and | |
489 | may even point out bugs. | |
490 | The | |
491 | .B \-h | |
492 | flag is used to enable these checks. | |
493 | For example, in the statement | |
494 | .DS | |
495 | \(**p++ ; | |
496 | .DE | |
497 | the \(** does nothing; this provokes the message ``null effect'' from | |
498 | .I lint . | |
499 | The program fragment | |
500 | .DS | |
501 | unsigned x ; | |
502 | if( x < 0 ) ... | |
503 | .DE | |
504 | is clearly somewhat strange; the | |
505 | test will never succeed. | |
506 | Similarly, the test | |
507 | .DS | |
508 | if( x > 0 ) ... | |
509 | .DE | |
510 | is equivalent to | |
511 | .DS | |
512 | if( x != 0 ) | |
513 | .DE | |
514 | which may not be the intended action. | |
515 | .I Lint | |
516 | will say ``degenerate unsigned comparison'' in these cases. | |
517 | If one says | |
518 | .DS | |
519 | if( 1 != 0 ) .... | |
520 | .DE | |
521 | .I lint | |
522 | will report | |
523 | ``constant in conditional context'', since the comparison | |
524 | of 1 with 0 gives a constant result. | |
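The degenerate unsigned comparisons discussed above can be checked directly. This is only a sketch; the function names are invented for the illustration, and is_negative shows one guess at what a programmer testing x < 0 may have wanted, namely a test on a signed quantity.

```c
#include <assert.h>

/* For unsigned x, "x > 0" and "x != 0" always agree. */
int is_positive(unsigned int x)
{
    return x > 0;
}

int is_nonzero(unsigned int x)
{
    return x != 0;
}

/* A test for negative values only makes sense on a signed operand. */
int is_negative(int v)
{
    return v < 0;
}

void check_unsigned_tests(void)
{
    assert(is_positive(0u) == is_nonzero(0u));
    assert(is_positive(1u) == is_nonzero(1u));
    assert(is_positive(~0u) == is_nonzero(~0u));
    assert(is_negative(-1) && !is_negative(0));
}
```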
525 | .PP | |
526 | Another construction | |
527 | detected by | |
528 | .I lint | |
529 | involves | |
530 | operator precedence. | |
531 | Bugs which arise from misunderstandings about the precedence | |
532 | of operators can be accentuated by spacing and formatting, | |
533 | making such bugs extremely hard to find. | |
534 | For example, the statements | |
535 | .DS | |
536 | if( x&077 == 0 ) ... | |
537 | .DE | |
538 | or | |
539 | .DS | |
540 | x<\h'-.3m'<2 + 40 | |
541 | .DE | |
542 | probably do not do what was intended. | |
543 | The best solution is to parenthesize such expressions, | |
544 | and | |
545 | .I lint | |
546 | encourages this by an appropriate message. | |
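To make the grouping concrete, the sketch below writes out, with explicit parentheses, both how C groups such expressions and what was probably intended. The function names are hypothetical, and the shift example uses 2 + 3 rather than 2 + 40 so that the shift count stays below the width of an unsigned int on common machines.

```c
#include <assert.h>

/* How  x&077 == 0  is grouped: == binds tighter than &. */
int mask_as_parsed(int x)
{
    return x & (077 == 0);       /* 077 == 0 is 0, so the result is always 0 */
}

/* The probable intent, made explicit with parentheses. */
int mask_as_intended(int x)
{
    return (x & 077) == 0;       /* true when the low six bits are clear */
}

/* How  x<<2 + 3  is grouped: + binds tighter than <<. */
unsigned int shift_as_parsed(unsigned int x)
{
    return x << (2 + 3);
}

unsigned int shift_as_intended(unsigned int x)
{
    return (x << 2) + 3;
}

void check_precedence(void)
{
    assert(mask_as_parsed(0100) == 0);
    assert(mask_as_intended(0100) == 1);
    assert(mask_as_intended(0101) == 0);
    assert(shift_as_parsed(1u) == 32u);
    assert(shift_as_intended(1u) == 7u);
}
```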
547 | .PP | |
548 | Finally, when the | |
549 | .B \-h | |
550 | flag is in force | |
551 | .I lint | |
552 | complains about variables which are redeclared in inner blocks | |
553 | in a way that conflicts with their use in outer blocks. | |
554 | This is legal, but is considered by many (including the author) to | |
555 | be bad style, usually unnecessary, and frequently a bug. | |
556 | .SH | |
557 | Ancient History | |
558 | .PP | |
559 | There are several forms of older syntax which are being officially | |
560 | discouraged. | |
561 | These fall into two classes, assignment operators and initialization. | |
562 | .PP | |
563 | The older forms of assignment operators (e.g., =+, =\-, . . . ) | |
564 | could cause ambiguous expressions, such as | |
565 | .DS | |
566 | a =\-1 ; | |
567 | .DE | |
568 | which could be taken as either | |
569 | .DS | |
570 | a =\- 1 ; | |
571 | .DE | |
572 | or | |
573 | .DS | |
574 | a = \-1 ; | |
575 | .DE | |
576 | The situation is especially perplexing if this | |
577 | kind of ambiguity arises as the result of a macro substitution. | |
578 | The newer, and preferred operators (+=, \-=, etc. ) | |
579 | have no such ambiguities. | |
580 | To spur the abandonment of the older forms, | |
581 | .I lint | |
582 | complains about these old fashioned operators. | |
583 | .PP | |
584 | A similar issue arises with initialization. | |
585 | The older language allowed | |
586 | .DS | |
587 | int x \fR1 ; | |
588 | .DE | |
589 | to initialize | |
590 | .I x | |
591 | to 1. | |
592 | This also caused syntactic difficulties: for example, | |
593 | .DS | |
594 | int x ( \-1 ) ; | |
595 | .DE | |
596 | looks somewhat like the beginning of a function declaration: | |
597 | .DS | |
598 | int x ( y ) { . . . | |
599 | .DE | |
600 | and the compiler must read a fair ways past | |
601 | .I x | |
602 | in order to be sure what the declaration really is. | |
603 | Again, the problem is even more perplexing when the | |
604 | initializer involves a macro. | |
605 | The current syntax places an equals sign between the | |
606 | variable and the initializer: | |
607 | .DS | |
608 | int x = \-1 ; | |
609 | .DE | |
610 | This is free of any possible syntactic ambiguity. | |
611 | .SH | |
612 | Pointer Alignment | |
613 | .PP | |
614 | Certain pointer assignments may be reasonable on some machines, | |
615 | and illegal on others, due entirely to | |
616 | alignment restrictions. | |
617 | For example, on the PDP-11, it is reasonable | |
618 | to assign integer pointers to double pointers, since | |
619 | double precision values may begin on any integer boundary. | |
620 | On the Honeywell 6000, double precision values must begin | |
621 | on even word boundaries; | |
622 | thus, not all such assignments make sense. | |
623 | .I Lint | |
624 | tries to detect cases where pointers are assigned to other | |
625 | pointers, and such alignment problems might arise. | |
626 | The message ``possible pointer alignment problem'' | |
627 | results from this situation whenever either the | |
628 | .B \-p | |
629 | or | |
630 | .B \-h | |
631 | flag is in effect. | |
632 | .SH | |
633 | Multiple Uses and Side Effects | |
634 | .PP | |
635 | In complicated expressions, the best order in which to evaluate | |
636 | subexpressions may be highly machine dependent. | |
637 | For example, on machines (like the PDP-11) in which the stack | |
638 | runs backwards, function arguments will probably be best evaluated | |
639 | from right-to-left; on machines with a stack running forward, | |
640 | left-to-right seems most attractive. | |
641 | Function calls embedded as arguments of other functions | |
642 | may or may not be treated similarly to ordinary arguments. | |
643 | Similar issues arise with other operators which have side effects, | |
644 | such as the assignment operators and the increment and decrement operators. | |
645 | .PP | |
646 | In order that the efficiency of C on a particular machine not be | |
647 | unduly compromised, the C language leaves the order | |
648 | of evaluation of complicated expressions up to the | |
649 | local compiler, and, in fact, the various C compilers have considerable | |
650 | differences in the order in which they will evaluate complicated | |
651 | expressions. | |
652 | In particular, if any variable is changed by a side effect, and | |
653 | also used elsewhere in the same expression, the result is explicitly undefined. | |
654 | .PP | |
655 | .I Lint | |
656 | checks for the important special case where | |
657 | a simple scalar variable is affected. | |
658 | For example, the statement | |
659 | .DS | |
660 | \fIa\fR[\fIi\|\fR] = \fIb\fR[\fIi\fR++] ; | |
661 | .DE | |
662 | will draw the complaint: | |
663 | .DS | |
664 | warning: \fIi\fR evaluation order undefined | |
665 | .DE | |
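One way to avoid the complaint is to move the side effect into a statement of its own, so that i is only read in the assignment. The sketch below assumes the intent was to copy element i and then advance; the function name copy_elements and the use of size_t are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n elements of b into a, stepping i in a statement of its own. */
void copy_elements(int *a, const int *b, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL);
    while (i < n) {
        a[i] = b[i];     /* i is only read in this expression */
        i++;             /* the side effect happens separately */
    }
}
```

If the other reading was intended (store into the element after the one fetched), the two statements would be written differently; the undefined original does not say which was meant.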
666 | .SH | |
667 | Implementation | |
668 | .PP | |
669 | .I Lint | |
670 | consists of two programs and a driver. | |
671 | The first program is a version of the | |
672 | Portable C Compiler | |
673 | .[ | |
674 | Johnson Ritchie BSTJ Portability Programs System | |
675 | .] | |
676 | .[ | |
677 | Johnson portable compiler 1978 | |
678 | .] | |
679 | which is the basis of the | |
680 | IBM 370, Honeywell 6000, and Interdata 8/32 C compilers. | |
681 | This compiler does lexical and syntax analysis on the input text, | |
682 | constructs and maintains symbol tables, and builds trees for expressions. | |
683 | Instead of writing an intermediate file which is passed to | |
684 | a code generator, as the other compilers | |
685 | do, | |
686 | .I lint | |
687 | produces an intermediate file which consists of lines of ASCII text. | |
688 | Each line contains an external variable name, | |
689 | an encoding of the context in which it was seen (use, definition, declaration, etc.), | |
690 | a type specifier, and a source file name and line number. | |
691 | The information about variables local to a function or file | |
692 | is collected | |
693 | by accessing the symbol table, and examining the expression trees. | |
694 | .PP | |
695 | Comments about local problems are produced as detected. | |
696 | The information about external names is collected | |
697 | onto an intermediate file. | |
698 | After all the source files and library descriptions have | |
699 | been collected, the intermediate file is sorted | |
700 | to bring all information collected about a given external | |
701 | name together. | |
702 | The second, rather small, program then reads the lines | |
703 | from the intermediate file and compares all of the | |
704 | definitions, declarations, and uses for consistency. | |
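The sort-then-compare idea of the second pass can be sketched in a few lines. This is not the actual lint code: the record layout, the representation of a type as a string, and the names ext_record, by_name, and count_type_conflicts are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; the real intermediate file format is not shown here. */
struct ext_record {
    const char *name;    /* external name */
    const char *type;    /* encoded type, here simply a string */
    int line;            /* source line number */
};

static int by_name(const void *a, const void *b)
{
    const struct ext_record *ra = a;
    const struct ext_record *rb = b;

    return strcmp(ra->name, rb->name);
}

/* Sort by name, then count neighbors that share a name but not a type. */
int count_type_conflicts(struct ext_record *recs, size_t n)
{
    int conflicts = 0;
    size_t i;

    assert(recs != NULL || n == 0);
    if (n < 2)
        return 0;
    qsort(recs, n, sizeof recs[0], by_name);
    for (i = 1; i < n; i++)
        if (strcmp(recs[i].name, recs[i - 1].name) == 0 &&
            strcmp(recs[i].type, recs[i - 1].type) != 0)
            conflicts++;
    return conflicts;
}
```

Sorting first means each name's records are adjacent, so a single linear scan is enough to compare them.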
705 | .PP | |
706 | The driver controls this | |
707 | process, and is also responsible for making the options available | |
708 | to both passes of | |
709 | .I lint . | |
710 | .SH | |
711 | Portability | |
712 | .PP | |
713 | C on the Honeywell and IBM systems is used, in part, to write system code for the host operating system. | |
714 | This means that the implementation of C tends to follow local conventions rather than | |
715 | adhere strictly to | |
716 | .UX | |
717 | system conventions. | |
718 | Despite these differences, many C programs have been successfully moved to GCOS and the various IBM | |
719 | installations with little effort. | |
720 | This section describes some of the differences between the implementations, and | |
721 | discusses the | |
722 | .I lint | |
723 | features which encourage portability. | |
724 | .PP | |
725 | Uninitialized external variables are treated differently in different | |
726 | implementations of C. | |
727 | Suppose two files both contain a declaration without initialization, such as | |
728 | .DS | |
729 | int a ; | |
730 | .DE | |
731 | outside of any function. | |
732 | The | |
733 | .UX | |
734 | loader will resolve these declarations, and cause only a single word of storage | |
735 | to be set aside for \fIa\fR. | |
736 | Under the GCOS and IBM implementations, this is not feasible (for various stupid reasons!) | |
737 | so each such declaration causes a word of storage to be set aside and called \fIa\fR. | |
738 | When loading or library editing takes place, this causes fatal conflicts which prevent | |
739 | the proper operation of the program. | |
740 | If | |
741 | .I lint | |
742 | is invoked with the \fB\-p\fR flag, | |
743 | it will detect such multiple definitions. | |
744 | .PP | |
745 | A related difficulty comes from the amount of information retained about external names during the | |
746 | loading process. | |
747 | On the | |
748 | .UX | |
749 | system, externally known names have seven significant characters, with the upper/lower | |
750 | case distinction kept. | |
751 | On the IBM systems, there are eight significant characters, but the case distinction | |
752 | is lost. | |
753 | On GCOS, there are only six characters, of a single case. | |
754 | This leads to situations where programs run on the | |
755 | .UX | |
756 | system, but encounter loader | |
757 | problems on the IBM or GCOS systems. | |
758 | .I Lint | |
759 | .B \-p | |
760 | causes all external symbols to be mapped to one case and truncated to six characters, | |
761 | providing a worst-case analysis. | |
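The worst-case mapping can be imitated with a short routine. This is a sketch of the idea only, not the code used by lint; the names WORST_CASE_LEN, worst_case_name, and names_collide are hypothetical, and folding to lower case (rather than upper) is an arbitrary choice.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define WORST_CASE_LEN 6   /* six characters, the GCOS limit cited above */

/* Fold name to one case and truncate it; out must hold WORST_CASE_LEN + 1 bytes. */
void worst_case_name(const char *name, char *out)
{
    size_t i;

    assert(name != NULL && out != NULL);
    for (i = 0; i < WORST_CASE_LEN && name[i] != '\0'; i++)
        out[i] = (char)tolower((unsigned char)name[i]);
    out[i] = '\0';
}

/* Nonzero when two distinct names become indistinguishable in the worst case. */
int names_collide(const char *a, const char *b)
{
    char fa[WORST_CASE_LEN + 1], fb[WORST_CASE_LEN + 1];

    worst_case_name(a, fa);
    worst_case_name(b, fb);
    return strcmp(a, b) != 0 && strcmp(fa, fb) == 0;
}
```

For instance, the invented names ReadBuffer and readbuflen both reduce to readbu and would be reported as a collision by this sketch.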
762 | .PP | |
763 | A number of differences arise in the area of character handling: characters in the | |
764 | .UX | |
765 | system are eight bit ASCII, while they are eight bit EBCDIC on the IBM, and | |
766 | nine bit ASCII on GCOS. | |
767 | Moreover, character strings go from high to low bit positions (``left to right'') | |
768 | on GCOS and IBM, and low to high (``right to left'') on the PDP-11. | |
769 | This means that code attempting to construct strings | |
770 | out of character constants, or attempting to use characters as indices | |
771 | into arrays, must be looked at with great suspicion. | |
772 | .I Lint | |
773 | is of little help here, except to flag multi-character character constants. | |
774 | .PP | |
775 | Of course, the word sizes are different! | |
776 | This causes less trouble than might be expected, at least when | |
777 | moving from the | |
778 | .UX | |
779 | system (16 bit words) to the IBM (32 bits) or GCOS (36 bits). | |
780 | The main problems are likely to arise in shifting or masking. | |
781 | C now supports a bit-field facility, which can be used to write much of | |
782 | this code in a reasonably portable way. | |
783 | Frequently, portability of such code can be enhanced by | |
784 | slight rearrangements in coding style. | |
785 | Many of the incompatibilities seem to have the flavor of writing | |
786 | .DS | |
787 | x &= 0177700 ; | |
788 | .DE | |
789 | to clear the low order six bits of \fIx\fR. | |
790 | This suffices on the PDP-11, but fails badly on GCOS and IBM. | |
791 | If the bit field feature cannot be used, the same effect can be obtained by | |
792 | writing | |
793 | .DS | |
794 | x &= \(ap 077 ; | |
795 | .DE | |
796 | which will work on all these machines. | |
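The two masks can be compared side by side. The sketch below is illustrative only: the function names are hypothetical, and unsigned operands are used (an assumption not made in the text) so that the bit operations are fully defined in present-day C.

```c
/* Clear the low-order six bits of x, whatever the width of
   unsigned int: the complement of 077 has every higher bit set. */
unsigned int clear_low_six(unsigned int x)
{
    return x & ~077u;
}

/* The word-size-dependent form.  The constant 0177700 covers only
   16 bits, so on a wider word it also clears every bit above bit 15. */
unsigned int clear_low_six_16bit_only(unsigned int x)
{
    return x & 0177700u;
}
```

On a 16-bit word the two functions agree; on a wider word only the first preserves the high-order bits.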
797 | .PP | |
798 | The right shift operator is arithmetic shift on the PDP-11, and logical shift on most | |
799 | other machines. | |
800 | To obtain a logical shift on all machines, the left operand can be | |
801 | typed \fBunsigned\fR. | |
802 | Characters are considered signed integers on the PDP-11, and unsigned on the other machines. | |
803 | This persistence of the sign bit may be reasonably considered a bug in the PDP-11 hardware | |
804 | which has infiltrated itself into the C language. | |
805 | If there were a good way to discover the programs which would be affected, C could be changed; | |
806 | in any case, | |
807 | .I lint | |
808 | is no help here. | |
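Both points can be handled defensively in the program itself. The following is a minimal sketch with hypothetical function names; the cast to unsigned char is one common way to sidestep the signedness of characters and is offered here as an assumption, not as something the text prescribes.

```c
/* Logical right shift: converting the left operand to unsigned
   makes the vacated high-order bits zero on every machine.
   n must be less than the width of unsigned int. */
unsigned int logical_shift_right(int x, unsigned int n)
{
    return (unsigned int)x >> n;
}

/* Value of a character as a non-negative integer, whether or not
   plain char is signed on this machine. */
int char_value(char c)
{
    return (unsigned char)c;
}
```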
809 | .PP | |
810 | The above discussion may have made the problem of portability seem | |
811 | bigger than it in fact is. | |
812 | The issues involved here are rarely subtle or mysterious, at least to the | |
813 | implementor of the program, although they can involve some work to straighten out. | |
814 | The most serious bar to the portability of | |
815 | .UX | |
816 | system utilities has been the inability to mimic | |
817 | essential | |
818 | .UX | |
819 | system functions on the other systems. | |
820 | The inability to seek to a random character position in a text file, or to establish a pipe | |
821 | between processes, has involved far more rewriting | |
822 | and debugging than any of the differences in C compilers. | |
823 | On the other hand, | |
824 | .I lint | |
825 | has been very helpful | |
826 | in moving the | |
827 | .UX | |
828 | operating system and associated | |
829 | utility programs to other machines. | |
830 | .SH | |
831 | Shutting Lint Up | |
832 | .PP | |
833 | There are occasions when | |
834 | the programmer is smarter than | |
835 | .I lint . | |
836 | There may be valid reasons for ``illegal'' type casts, | |
837 | functions with a variable number of arguments, etc. | |
838 | Moreover, as noted above, the flow of control information | |
839 | produced by | |
840 | .I lint | |
841 | often has blind spots, causing occasional spurious | |
842 | messages about perfectly reasonable programs. | |
843 | Thus, some way of communicating with | |
844 | .I lint , | |
845 | typically to shut it up, is desirable. | |
846 | .PP | |
847 | The form which this mechanism should take is not at all clear. | |
848 | New keywords would require current and old compilers to | |
849 | recognize these keywords, if only to ignore them. | |
850 | This has both philosophical and practical problems. | |
851 | New preprocessor syntax suffers from similar problems. | |
852 | .PP | |
853 | What was finally done was to cause a number of words | |
854 | to be recognized by | |
855 | .I lint | |
856 | when they were embedded in comments. | |
857 | This required minimal preprocessor changes; | |
858 | the preprocessor just had to agree to pass comments | |
859 | through to its output, instead of deleting them | |
860 | as had been previously done. | |
861 | Thus, | |
862 | .I lint | |
863 | directives are invisible to the compilers, and | |
864 | the effect on systems with the older preprocessors | |
865 | is merely that the | |
866 | .I lint | |
867 | directives don't work. | |
868 | .PP | |
869 | The first directive is concerned with flow of control information; | |
870 | if a particular place in the program cannot be reached, | |
871 | but this is not apparent to | |
872 | .I lint , | |
873 | this can be asserted by the directive | |
874 | .DS | |
875 | /* NOTREACHED */ | |
876 | .DE | |
877 | at the appropriate spot in the program. | |
878 | Similarly, if it is desired to turn off | |
879 | strict type checking for | |
880 | the next expression, the directive | |
881 | .DS | |
882 | /* NOSTRICT */ | |
883 | .DE | |
884 | can be used; the situation reverts to the | |
885 | previous default after the next expression. | |
886 | The | |
887 | .B \-v | |
888 | flag can be turned on for one function by the directive | |
889 | .DS | |
890 | /* ARGSUSED */ | |
891 | .DE | |
892 | Complaints about a variable number of arguments in calls to a function | |
893 | can be turned off by the directive | |
894 | .DS | |
895 | /* VARARGS */ | |
896 | .DE | |
897 | preceding the function definition. | |
898 | In some cases, it is desirable to check the | |
899 | first several arguments, and leave the later arguments unchecked. | |
900 | This can be done by following the VARARGS keyword immediately | |
901 | with a digit giving the number of arguments which should be checked; thus, | |
902 | .DS | |
903 | /* VARARGS2 */ | |
904 | .DE | |
905 | will cause the first two arguments to be checked and the others left unchecked. | |
906 | Finally, the directive | |
907 | .DS | |
908 | /* LINTLIBRARY */ | |
909 | .DE | |
910 | at the head of a file identifies this file as | |
911 | a library declaration file; this topic is worth a | |
912 | section by itself. | |
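The directives above might appear in a source file as follows. This is a sketch only: the functions fatal, checked_div, on_signal and sum_after are hypothetical, and the variable argument list is written with the <stdarg.h> macros of later C, which are an assumption of this example and not part of the C described in this paper. To the compiler each directive is an ordinary comment.

```c
#include <stdarg.h>
#include <stdlib.h>

/* Hypothetical helper that never returns. */
void fatal(int code)
{
    exit(code);
}

int checked_div(int a, int b)
{
    if (b == 0) {
        fatal(2);
        /* NOTREACHED */
    }
    return a / b;
}

/* The second argument is deliberately ignored. */
/* ARGSUSED */
int on_signal(int signo, int unused_context)
{
    return signo;
}

/* Only base and count are to be checked; the rest vary per call. */
/* VARARGS2 */
int sum_after(int base, int count, ...)
{
    va_list ap;
    int total = base;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* each extra argument is an int */
    va_end(ap);
    return total;
}
```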
913 | .SH | |
914 | Library Declaration Files | |
915 | .PP | |
916 | .I Lint | |
917 | accepts certain library directives, such as | |
918 | .DS | |
919 | \-ly | |
920 | .DE | |
921 | and tests the source files for compatibility with these libraries. | |
922 | This is done by accessing library description files whose | |
923 | names are constructed from the library directives. | |
924 | These files all begin with the directive | |
925 | .DS | |
926 | /* LINTLIBRARY */ | |
927 | .DE | |
928 | which is followed by a series of dummy function | |
929 | definitions. | |
930 | The critical parts of these definitions | |
931 | are the declaration of the function return type, | |
932 | whether the dummy function returns a value, and | |
933 | the number and types of arguments to the function. | |
934 | The VARARGS and ARGSUSED directives can | |
935 | be used to specify features of the library functions. | |
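A library description file of this kind might look like the sketch below. The routine names xsqrt, xprintf and xexit are hypothetical stand-ins for real library entries, and the bodies are dummies; only the return types, the presence of a returned value, and the argument lists are meant to carry information.

```c
/* LINTLIBRARY */

/* Dummy definitions: the bodies do nothing useful. */

/* ARGSUSED */
double xsqrt(double x) { return 0.0; }

/* Only the format argument is checked. */
/* VARARGS1 */
/* ARGSUSED */
int xprintf(const char *fmt, ...) { return 0; }

/* Returns no value. */
/* ARGSUSED */
void xexit(int status) { }
```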
936 | .PP | |
937 | .I Lint | |
938 | library files are processed almost exactly like ordinary | |
939 | source files. | |
940 | The only difference is that functions which are defined in a library file, | |
941 | but are not used in a source file, draw no complaints. | |
942 | .I Lint | |
943 | does not simulate a full library search algorithm, | |
944 | and complains if the source files contain a redefinition of | |
945 | a library routine (this is a feature!). | |
946 | .PP | |
947 | By default, | |
948 | .I lint | |
949 | checks the programs it is given against a standard library | |
950 | file, which contains descriptions of the programs which | |
951 | are normally loaded when | |
952 | a C program | |
953 | is run. | |
954 | When the | |
955 | .B \-p | |
956 | flag is in effect, another file is checked containing | |
957 | descriptions of the standard I/O library routines | |
958 | which are expected to be portable across various machines. | |
959 | The | |
960 | .B \-n | |
961 | flag can be used to suppress all library checking. | |
962 | .SH | |
963 | Bugs, etc. | |
964 | .PP | |
965 | .I Lint | |
966 | was a difficult program to write, partially | |
967 | because it is closely connected with matters of programming style, | |
968 | and partially because users usually don't notice bugs which cause | |
969 | .I lint | |
970 | to miss errors which it should have caught. | |
971 | (By contrast, if | |
972 | .I lint | |
973 | incorrectly complains about something that is correct, the | |
974 | programmer reports that immediately!) | |
975 | .PP | |
976 | A number of areas remain to be further developed. | |
977 | The checking of structures and arrays is rather inadequate; | |
978 | size | |
979 | incompatibilities go unchecked, | |
980 | and no attempt is made to match up structure and union | |
981 | declarations across files. | |
982 | Some stricter checking of the use of the | |
983 | .B typedef | |
984 | is clearly desirable, but what checking is appropriate, and how | |
985 | to carry it out, is still to be determined. | |
986 | .PP | |
987 | .I Lint | |
988 | shares the preprocessor with the C compiler. | |
989 | At some point it may be appropriate for a | |
990 | special version of the preprocessor to be constructed | |
991 | which checks for things such as unused macro definitions, | |
992 | macro arguments with side effects which are | |
993 | not expanded at all, or are expanded more than once, etc. | |
994 | .PP | |
995 | The central problem with | |
996 | .I lint | |
997 | is the packaging of the information which it collects. | |
998 | There are many options which | |
999 | serve only to turn off, or slightly modify, | |
1000 | certain features. | |
1001 | There are pressures to add even more of these options. | |
1002 | .PP | |
1003 | In conclusion, it appears that the general notion of having two | |
1004 | programs is a good one. | |
1005 | The compiler concentrates on quickly and accurately turning the | |
1006 | program text into bits which can be run; | |
1007 | .I lint | |
1008 | concentrates on issues | |
1009 | of portability, style, and efficiency. | |
1010 | .I Lint | |
1011 | can afford to be wrong, since incorrectness and over-conservatism | |
1012 | are merely annoying, not fatal. | |
1013 | The compiler can be fast since it knows that | |
1014 | .I lint | |
1015 | will cover its flanks. | |
1016 | Finally, the programmer can | |
1017 | concentrate at one stage | |
1018 | of the programming process solely on the algorithms, | |
1019 | data structures, and correctness of the | |
1020 | program, and then later retrofit, | |
1021 | with the aid of | |
1022 | .I lint , | |
1023 | the desirable properties of universality and portability. | |
1024 | .SG MH-1273-SCJ-unix | |
1025 | .bp | |
1026 | .[ | |
1027 | $LIST$ | |
1028 | .] | |
1029 | .bp | |
1030 | .SH | |
1031 | Appendix: Current Lint Options | |
1032 | .PP | |
1033 | The command currently has the form | |
1034 | .DS | |
1035 | \fBlint\fR [\fB\-\fRoptions ] files... library-descriptors... | |
1036 | .DE | |
1037 | The options are | |
1038 | .IP \fBh\fR | |
1039 | Perform heuristic checks | |
1040 | .IP \fBp\fR | |
1041 | Perform portability checks | |
1042 | .IP \fBv\fR | |
1043 | Don't report unused arguments | |
1044 | .IP \fBu\fR | |
1045 | Don't report unused or undefined externals | |
1046 | .IP \fBb\fR | |
1047 | Report unreachable | |
1048 | .B break | |
1049 | statements. | |
1050 | .IP \fBx\fR | |
1051 | Report unused external declarations | |
1052 | .IP \fBa\fR | |
1053 | Report assignments of | |
1054 | .B long | |
1055 | to | |
1056 | .B int | |
1057 | or shorter. | |
1058 | .IP \fBc\fR | |
1059 | Complain about questionable casts | |
1060 | .IP \fBn\fR | |
1061 | No library checking is done | |
1062 | .IP \fBs\fR | |
1063 | Same as | |
1064 | .B h | |
1065 | (for historical reasons) |