Commit | Line | Data |
---|---|---|
2074ceed BJ |
1 | .RP |
2 | .ND "July 26, 1978" | |
3 | .OK | |
4 | Program Portability | |
5 | Strong Type Checking | |
6 | .TL | |
7 | Lint, a C Program Checker | |
8 | .AU "MH 2C-559" 3968 | |
9 | S. C. Johnson | |
10 | .AI | |
11 | .MH | |
12 | .AB | |
13 | .PP | |
14 | .I Lint | |
15 | is a command which examines C source programs, | |
16 | detecting | |
17 | a number of bugs and obscurities. | |
18 | It enforces the type rules of C more strictly than | |
19 | the C compilers. | |
20 | It may also be used to enforce a number of portability | |
21 | restrictions involved in moving | |
22 | programs between different machines and/or operating systems. | |
23 | Another option detects a number of wasteful, or error prone, constructions | |
24 | which nevertheless are, strictly speaking, legal. | |
25 | .PP | |
26 | .I Lint | |
27 | accepts multiple input files and library specifications, and checks them for consistency. | |
28 | .PP | |
29 | The separation of function between | |
30 | .I lint | |
31 | and the C compilers has both historical and practical | |
32 | rationale. | |
33 | The compilers turn C programs into executable files rapidly | |
34 | and efficiently. | |
35 | This is possible in part because the | |
36 | compilers do not do sophisticated | |
37 | type checking, especially between | |
38 | separately compiled programs. | |
39 | .I Lint | |
40 | takes a more global, leisurely view of the program, | |
41 | looking much more carefully at the compatibilities. | |
42 | .PP | |
43 | This document discusses the use of | |
44 | .I lint , | |
45 | gives an overview of the implementation, and gives some hints on the | |
46 | writing of machine independent C code. | |
47 | .AE | |
48 | .CS 10 2 12 0 0 5 | |
49 | .SH | |
50 | Introduction and Usage | |
51 | .PP | |
52 | Suppose there are two C | |
53 | .[ | |
54 | Kernighan Ritchie Programming Prentice 1978 | |
55 | .] | |
56 | source files, | |
57 | .I file1.c | |
58 | and | |
59 | .I file2.c , | |
60 | which are ordinarily compiled and loaded together. | |
61 | Then the command | |
62 | .DS | |
63 | lint file1.c file2.c | |
64 | .DE | |
65 | produces messages describing inconsistencies and inefficiencies | |
66 | in the programs. | |
67 | The program enforces the typing rules of C | |
68 | more strictly than the C compilers | |
69 | (for both historical and practical reasons) | |
70 | enforce them. | |
71 | The command | |
72 | .DS | |
73 | lint \-p file1.c file2.c | |
74 | .DE | |
75 | will produce, in addition to the above messages, further messages | |
76 | which relate to the portability of the programs to other operating | |
77 | systems and machines. | |
78 | Replacing the | |
79 | .B \-p | |
80 | by | |
81 | .B \-h | |
82 | will produce messages about various error-prone or wasteful constructions | |
83 | which, strictly speaking, are not bugs. | |
84 | Saying | |
85 | .B \-hp | |
86 | gets the whole works. | |
87 | .PP | |
88 | The next several sections describe the major messages; | |
89 | the document closes with sections | |
90 | discussing the implementation and giving suggestions | |
91 | for writing portable C. | |
92 | An appendix gives a summary of the | |
93 | .I lint | |
94 | options. | |
95 | .SH | |
96 | A Word About Philosophy | |
97 | .PP | |
98 | Many of the facts which | |
99 | .I lint | |
100 | needs may be impossible to | |
101 | discover. | |
102 | For example, whether a given function in a program ever gets called | |
103 | may depend on the input data. | |
104 | Deciding whether | |
105 | .I exit | |
106 | is ever called is equivalent to solving the famous ``halting problem,'' known to be | |
107 | recursively undecidable. | |
108 | .PP | |
109 | Thus, most of the | |
110 | .I lint | |
111 | algorithms are a compromise. | |
112 | If a function is never mentioned, it can never be called. | |
113 | If a function is mentioned, | |
114 | .I lint | |
115 | assumes it can be called; this is not necessarily so, but in practice is quite reasonable. | |
116 | .PP | |
117 | .I Lint | |
118 | tries to give information with a high degree of relevance. | |
119 | Messages of the form ``\fIxxx\fR might be a bug'' | |
120 | are easy to generate, but are acceptable only in proportion | |
121 | to the fraction of real bugs they uncover. | |
122 | If this fraction of real bugs is too small, the messages lose their credibility | |
123 | and serve merely to clutter up the output, | |
124 | obscuring the more important messages. | |
125 | .PP | |
126 | Keeping these issues in mind, we now consider in more detail | |
127 | the classes of messages which | |
128 | .I lint | |
129 | produces. | |
130 | .SH | |
131 | Unused Variables and Functions | |
132 | .PP | |
133 | As sets of programs evolve and develop, | |
134 | previously used variables and arguments to | |
135 | functions may become unused; | |
136 | it is not uncommon for external variables, or even entire | |
137 | functions, to become unnecessary, and yet | |
138 | not be removed from the source. | |
139 | These ``errors of commission'' rarely cause working programs to fail, but they are a source | |
140 | of inefficiency, and make programs harder to understand | |
141 | and change. | |
142 | Moreover, information about such unused variables and functions can occasionally | |
143 | serve to discover bugs; if a function does a necessary job, and | |
144 | is never called, something is wrong! | |
145 | .PP | |
146 | .I Lint | |
147 | complains about variables and functions which are defined but not otherwise | |
148 | mentioned. | |
149 | An exception is variables which are declared through explicit | |
150 | .B extern | |
151 | statements but are never referenced; thus the statement | |
152 | .DS | |
153 | extern float sin(\|); | |
154 | .DE | |
155 | will evoke no comment if | |
156 | .I sin | |
157 | is never used. | |
158 | Note that this agrees with the semantics of the C compiler. | |
159 | In some cases, these unused external declarations might be of some interest; they | |
160 | can be discovered by adding the | |
161 | .B \-x | |
162 | flag to the | |
163 | .I lint | |
164 | invocation. | |
165 | .PP | |
166 | Certain styles of programming | |
167 | require many functions to be written with similar interfaces; | |
168 | frequently, some of the arguments may be unused | |
169 | in many of the calls. | |
170 | The | |
171 | .B \-v | |
172 | option is available to suppress the printing of | |
173 | complaints about unused arguments. | |
174 | When | |
175 | .B \-v | |
176 | is in effect, no messages are produced about unused | |
177 | arguments except for those | |
178 | arguments which are unused and also declared as | |
179 | register arguments; this can be considered | |
180 | an active (and preventable) waste of the register | |
181 | resources of the machine. | |
182 | .PP | |
183 | There is one case where information about unused, or | |
184 | undefined, variables is more distracting | |
185 | than helpful. | |
186 | This is when | |
187 | .I lint | |
188 | is applied to some, but not all, files out of a collection | |
189 | which are to be loaded together. | |
190 | In this case, many of the functions and variables defined | |
191 | may not be used, and, conversely, | |
192 | many functions and variables defined elsewhere may be used. | |
193 | The | |
194 | .B \-u | |
195 | flag may be used to suppress the spurious messages which might otherwise appear. | |
196 | .SH | |
197 | Set/Used Information | |
198 | .PP | |
199 | .I Lint | |
200 | attempts to detect cases where a variable is used before it is set. | |
201 | This is very difficult to do well; | |
202 | many algorithms take a good deal of time and space, | |
203 | and still produce messages about perfectly valid programs. | |
204 | .I Lint | |
205 | detects local variables (automatic and register storage classes) | |
206 | whose first use appears physically earlier in the input file than the first assignment to the variable. | |
207 | It assumes that taking the address of a variable constitutes a ``use,'' since the actual use | |
208 | may occur at any later time, in a data dependent fashion. | |
209 | .PP | |
210 | The restriction to the physical appearance of variables in the file makes the | |
211 | algorithm very simple and quick to implement, | |
212 | since the true flow of control need not be discovered. | |
213 | It does mean that | |
214 | .I lint | |
215 | can complain about some programs which are legal, | |
216 | but these programs would probably be considered bad on stylistic grounds (e.g. might | |
217 | contain at least two \fBgoto\fR's). | |
218 | Because static and external variables are initialized to 0, | |
219 | no meaningful information can be discovered about their uses. | |
220 | The algorithm deals correctly, however, with initialized automatic variables, and variables | |
221 | which are used in the expression which first sets them. | |
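The kind of legal program that trips this textual rule can be sketched as follows. This is an illustrative example only: the function name late_init is hypothetical, and the exact message lint would print for it is not reproduced here.

```c
/*
 * Legal C in which the first textual use of x comes before the first
 * textual assignment, even though the assignment always executes first.
 * A checker that looks only at physical order would complain here.
 */
int late_init(void)
{
    int x;

    goto set;
use:
    return x + 1;   /* first use of x, reading down the file */
set:
    x = 41;         /* first assignment, but executed before the use */
    goto use;
}
```

As the text suggests, it takes two gotos to build such a case, and most readers would want the function restructured anyway.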
222 | .PP | |
223 | The set/used information also permits recognition of those local variables which are set | |
224 | and never used; these form a frequent source of inefficiencies, and may also be symptomatic of bugs. | |
225 | .SH | |
226 | Flow of Control | |
227 | .PP | |
228 | .I Lint | |
229 | attempts to detect unreachable portions of the programs which it processes. | |
230 | It will complain about unlabeled statements immediately following | |
231 | \fBgoto\fR, \fBbreak\fR, \fBcontinue\fR, or \fBreturn\fR statements. | |
232 | An attempt is made to detect loops which can never be left at the bottom, detecting the | |
233 | special cases | |
234 | \fBwhile\fR( 1 ) and \fBfor\fR(;;) as infinite loops. | |
235 | .I Lint | |
236 | also complains about loops which cannot be entered at the top; | |
237 | some valid programs may have such loops, but at best they are bad style, | |
238 | at worst bugs. | |
239 | .PP | |
240 | .I Lint | |
241 | has an important area of blindness in the flow of control algorithm: | |
242 | it has no way of detecting functions which are called and never return. | |
243 | Thus, a call to | |
244 | .I exit | |
245 | may cause unreachable code which | |
246 | .I lint | |
247 | does not detect; the most serious effects of this are in the | |
248 | determination of returned function values (see the next section). | |
249 | .PP | |
250 | One form of unreachable statement is not usually complained about by | |
251 | .I lint ; | |
252 | a | |
253 | .B break | |
254 | statement that cannot be reached causes no message. | |
255 | Programs generated by | |
256 | .I yacc , | |
257 | .[ | |
258 | Johnson Yacc 1975 | |
259 | .] | |
260 | and especially | |
261 | .I lex , | |
262 | .[ | |
263 | Lesk Lex | |
264 | .] | |
265 | may have literally hundreds of unreachable | |
266 | .B break | |
267 | statements. | |
268 | The | |
269 | .B \-O | |
270 | flag in the C compiler will often eliminate the resulting object code inefficiency. | |
271 | Thus, these unreached statements are of little importance, | |
272 | there is typically nothing the user can do about them, and the | |
273 | resulting messages would clutter up the | |
274 | .I lint | |
275 | output. | |
276 | If these messages are desired, | |
277 | .I lint | |
278 | can be invoked with the | |
279 | .B \-b | |
280 | option. | |
281 | .SH | |
282 | Function Values | |
283 | .PP | |
284 | Sometimes functions return values which are never used; | |
285 | sometimes programs incorrectly use function ``values'' | |
286 | which have never been returned. | |
287 | .I Lint | |
288 | addresses this problem in a number of ways. | |
289 | .PP | |
290 | Locally, within a function definition, | |
291 | the appearance of both | |
292 | .DS | |
293 | return( \fIexpr\fR ); | |
294 | .DE | |
295 | and | |
296 | .DS | |
297 | return ; | |
298 | .DE | |
299 | statements is cause for alarm; | |
300 | .I lint | |
301 | will give the message | |
302 | .DS | |
303 | function \fIname\fR contains return(e) and return | |
304 | .DE | |
305 | The most serious difficulty with this is detecting when a function return is implied | |
306 | by flow of control reaching the end of the function. | |
307 | This can be seen with a simple example: | |
308 | .DS | |
309 | .ta .5i 1i 1.5i | |
310 | \fRf ( a ) { | |
311 | if ( a ) return ( 3 ); | |
312 | g (\|); | |
313 | } | |
314 | .DE | |
315 | Notice that, if \fIa\fR tests false, \fIf\fR will call \fIg\fR and then return | |
316 | with no defined return value; this will trigger a complaint from | |
317 | .I lint . | |
318 | If \fIg\fR, like \fIexit\fR, never returns, | |
319 | the message will still be produced when in fact nothing is wrong. | |
320 | .PP | |
321 | In practice, some potentially serious bugs have been discovered by this feature; | |
322 | it also accounts for a substantial fraction of the ``noise'' messages produced | |
323 | by | |
324 | .I lint . | |
325 | .PP | |
326 | On a global scale, | |
327 | .I lint | |
328 | detects cases where a function returns a value, but this value is sometimes, | |
329 | or always, unused. | |
330 | When the value is always unused, it may constitute an inefficiency in the function definition. | |
331 | When the value is sometimes unused, it may represent bad style (e.g., not testing for | |
332 | error conditions). | |
333 | .PP | |
334 | The dual problem, using a function value when the function does not return one, | |
335 | is also detected. | |
336 | This is a serious problem. | |
337 | Amazingly, this bug has been observed on a couple of occasions | |
338 | in ``working'' programs; the desired function value just happened to have been computed | |
339 | in the function return register! | |
340 | .SH | |
341 | Type Checking | |
342 | .PP | |
343 | .I Lint | |
344 | enforces the type checking rules of C more strictly than the compilers do. | |
345 | The additional checking is in four major areas: | |
346 | across certain binary operators and implied assignments, | |
347 | at the structure selection operators, | |
348 | between the definition and uses of functions, | |
349 | and in the use of enumerations. | |
350 | .PP | |
351 | There are a number of operators which have an implied balancing between types of the operands. | |
352 | The assignment, conditional ( ?\|: ), and relational operators | |
353 | have this property; the argument | |
354 | of a \fBreturn\fR statement, | |
355 | and expressions used in initialization also suffer similar conversions. | |
356 | In these operations, | |
357 | \fBchar\fR, \fBshort\fR, \fBint\fR, \fBlong\fR, \fBunsigned\fR, \fBfloat\fR, and \fBdouble\fR types may be freely intermixed. | |
358 | The types of pointers must agree exactly, | |
359 | except that arrays of \fIx\fR's can, of course, be intermixed with pointers to \fIx\fR's. | |
360 | .PP | |
361 | The type checking rules also require that, in structure references, the | |
362 | left operand of the \(em> be a pointer to structure, the left operand of the \fB.\fR | |
363 | be a structure, and the right operand of these operators be a member | |
364 | of the structure implied by the left operand. | |
365 | Similar checking is done for references to unions. | |
366 | .PP | |
367 | Strict rules apply to function argument and return value | |
368 | matching. | |
369 | The types \fBfloat\fR and \fBdouble\fR may be freely matched, | |
370 | as may the types \fBchar\fR, \fBshort\fR, \fBint\fR, and \fBunsigned\fR. | |
371 | Also, pointers can be matched with the associated arrays. | |
372 | Aside from this, all actual arguments must agree in type with their declared counterparts. | |
373 | .PP | |
374 | With enumerations, checks are made that enumeration variables or members are not mixed | |
375 | with other types, or other enumerations, | |
376 | and that the only operations applied are =, initialization, ==, !=, and function arguments and return values. | |
377 | .SH | |
378 | Type Casts | |
379 | .PP | |
380 | The type cast feature in C was introduced largely as an aid | |
381 | to producing more portable programs. | |
382 | Consider the assignment | |
383 | .DS | |
384 | p = 1 ; | |
385 | .DE | |
386 | where | |
387 | .I p | |
388 | is a character pointer. | |
389 | .I Lint | |
390 | will quite rightly complain. | |
391 | Now, consider the assignment | |
392 | .DS | |
393 | p = (char \(**)1 ; | |
394 | .DE | |
395 | in which a cast has been used to | |
396 | convert the integer to a character pointer. | |
397 | The programmer obviously had a strong motivation | |
398 | for doing this, and has clearly signaled his intentions. | |
399 | It seems harsh for | |
400 | .I lint | |
401 | to continue to complain about this. | |
402 | On the other hand, if this code is moved to another | |
403 | machine, such code should be looked at carefully. | |
404 | The | |
405 | .B \-c | |
406 | flag controls the printing of comments about casts. | |
407 | When | |
408 | .B \-c | |
409 | is in effect, casts are treated as though they were assignments | |
410 | subject to complaint; otherwise, all legal casts are passed without comment, | |
411 | no matter how strange the type mixing seems to be. | |
412 | .SH | |
413 | Nonportable Character Use | |
414 | .PP | |
415 | On the PDP-11, characters are signed quantities, with a range | |
416 | from \-128 to 127. | |
417 | On most of the other C implementations, characters take on only positive | |
418 | values. | |
419 | Thus, | |
420 | .I lint | |
421 | will flag certain comparisons and assignments as being | |
422 | illegal or nonportable. | |
423 | For example, the fragment | |
424 | .DS | |
425 | char c; | |
426 | ... | |
427 | if( (c = getchar(\|)) < 0 ) .... | |
428 | .DE | |
429 | works on the PDP-11, but | |
430 | will fail on machines where characters always take | |
431 | on positive values. | |
432 | The real solution is to declare | |
433 | .I c | |
434 | an integer, since | |
435 | .I getchar | |
436 | is actually returning | |
437 | integer values. | |
438 | In any case, | |
439 | .I lint | |
440 | will say | |
441 | ``nonportable character comparison''. | |
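The effect can be imitated on any machine by storing the end-of-file value in an unsigned char, which stands in here for a machine whose characters are never negative. This is a minimal sketch; the function names are hypothetical, and EOF is the usual negative value from <stdio.h>.

```c
#include <stdio.h>

/*
 * Returns 1 if EOF can still be recognized after being stored in an
 * unsigned char, else 0.  The stored value widens to a non-negative
 * int, so on ordinary implementations it no longer compares equal
 * to the negative EOF.
 */
int eof_survives_unsigned_char(void)
{
    unsigned char c = (unsigned char)EOF;
    int widened = c;

    return widened == EOF;
}

/* The same test with an int, as the text recommends. */
int eof_survives_int(void)
{
    int c = EOF;

    return c == EOF;
}
```

Declaring the variable as an int keeps the full range of values that getchar can return, so the end-of-file test behaves the same everywhere.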
442 | .PP | |
443 | A similar issue arises with bitfields; when assignments | |
444 | of constant values are made to bitfields, the field may | |
445 | be too small to hold the value. | |
446 | This is especially true because | |
447 | on some machines bitfields are considered as signed | |
448 | quantities. | |
449 | While it may seem unintuitive to consider | |
450 | that a two bit field declared of type | |
451 | .B int | |
452 | cannot hold the value 3, the problem disappears | |
453 | if the bitfield is declared to have type | |
454 | .B unsigned . | |
455 | .SH | |
456 | Assignments of longs to ints | |
457 | .PP | |
458 | Bugs may arise from the assignment of | |
459 | .B long | |
460 | to | |
461 | an | |
462 | .B int , | |
463 | which loses accuracy. | |
464 | This may happen in programs | |
465 | which have been incompletely converted to use | |
466 | .B typedefs . | |
467 | When a | |
468 | .B typedef | |
469 | variable | |
470 | is changed from \fBint\fR to \fBlong\fR, | |
471 | the program can stop working because | |
472 | some intermediate results may be assigned | |
473 | to \fBints\fR, losing accuracy. | |
474 | Since there are a number of legitimate reasons for | |
475 | assigning \fBlongs\fR to \fBints\fR, the detection | |
476 | of these assignments is enabled | |
477 | by the | |
478 | .B \-a | |
479 | flag. | |
480 | .SH | |
481 | Strange Constructions | |
482 | .PP | |
483 | Several perfectly legal, but somewhat strange, constructions | |
484 | are flagged by | |
485 | .I lint ; | |
486 | the messages are meant to encourage better code quality and clearer style, and | |
487 | may even point out bugs. | |
488 | The | |
489 | .B \-h | |
490 | flag is used to enable these checks. | |
491 | For example, in the statement | |
492 | .DS | |
493 | \(**p++ ; | |
494 | .DE | |
495 | the \(** does nothing; this provokes the message ``null effect'' from | |
496 | .I lint . | |
497 | The program fragment | |
498 | .DS | |
499 | unsigned x ; | |
500 | if( x < 0 ) ... | |
501 | .DE | |
502 | is clearly somewhat strange; the | |
503 | test will never succeed. | |
504 | Similarly, the test | |
505 | .DS | |
506 | if( x > 0 ) ... | |
507 | .DE | |
508 | is equivalent to | |
509 | .DS | |
510 | if( x != 0 ) | |
511 | .DE | |
512 | which may not be the intended action. | |
513 | .I Lint | |
514 | will say ``degenerate unsigned comparison'' in these cases. | |
515 | If one says | |
516 | .DS | |
517 | if( 1 != 0 ) .... | |
518 | .DE | |
519 | .I lint | |
520 | will report | |
521 | ``constant in conditional context'', since the comparison | |
522 | of 1 with 0 gives a constant result. | |
523 | .PP | |
524 | Another construction | |
525 | detected by | |
526 | .I lint | |
527 | involves | |
528 | operator precedence. | |
529 | Bugs which arise from misunderstandings about the precedence | |
530 | of operators can be accentuated by spacing and formatting, | |
531 | making such bugs extremely hard to find. | |
532 | For example, the statements | |
533 | .DS | |
534 | if( x&077 == 0 ) ... | |
535 | .DE | |
536 | or | |
537 | .DS | |
538 | x<\h'-.3m'<2 + 40 | |
539 | .DE | |
540 | probably do not do what was intended. | |
541 | The best solution is to parenthesize such expressions, | |
542 | and | |
543 | .I lint | |
544 | encourages this by an appropriate message. | |
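The difference between what is written and what is meant can be made explicit with parentheses. In the sketch below the function names are hypothetical, and the shift example uses a smaller constant than the one in the text so that the result fits in an int on any machine.

```c
/* x&077 == 0 groups as x & (077 == 0), since == binds tighter than &. */
int mask_as_parsed(int x)
{
    return x & (077 == 0);      /* 077 == 0 is 0, so this is always 0 */
}

/* What was almost certainly intended: are the low six bits all zero? */
int mask_as_intended(int x)
{
    return (x & 077) == 0;
}

/* x<<2 + 1 groups as x << (2 + 1), since + binds tighter than <<. */
int shift_as_parsed(int x)
{
    return x << (2 + 1);
}

/* The likely intent: shift first, then add. */
int shift_as_intended(int x)
{
    return (x << 2) + 1;
}
```

For x equal to 0100 the two mask tests give 0 and 1 respectively, and for x equal to 1 the two shifts give 8 and 5.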
545 | .PP | |
546 | Finally, when the | |
547 | .B \-h | |
548 | flag is in force, | |
549 | .I lint | |
550 | complains about variables which are redeclared in inner blocks | |
551 | in a way that conflicts with their use in outer blocks. | |
552 | This is legal, but is considered by many (including the author) to | |
553 | be bad style, usually unnecessary, and frequently a bug. | |
554 | .SH | |
555 | Ancient History | |
556 | .PP | |
557 | There are several forms of older syntax which are being officially | |
558 | discouraged. | |
559 | These fall into two classes, assignment operators and initialization. | |
560 | .PP | |
561 | The older forms of assignment operators (e.g., =+, =\-, . . . ) | |
562 | could cause ambiguous expressions, such as | |
563 | .DS | |
564 | a =\-1 ; | |
565 | .DE | |
566 | which could be taken as either | |
567 | .DS | |
568 | a =\- 1 ; | |
569 | .DE | |
570 | or | |
571 | .DS | |
572 | a = \-1 ; | |
573 | .DE | |
574 | The situation is especially perplexing if this | |
575 | kind of ambiguity arises as the result of a macro substitution. | |
576 | The newer, and preferred operators (+=, \-=, etc. ) | |
577 | have no such ambiguities. | |
578 | To spur the abandonment of the older forms, | |
579 | .I lint | |
580 | complains about these old fashioned operators. | |
581 | .PP | |
582 | A similar issue arises with initialization. | |
583 | The older language allowed | |
584 | .DS | |
585 | int x \fR1 ; | |
586 | .DE | |
587 | to initialize | |
588 | .I x | |
589 | to 1. | |
590 | This also caused syntactic difficulties: for example, | |
591 | .DS | |
592 | int x ( \-1 ) ; | |
593 | .DE | |
594 | looks somewhat like the beginning of a function declaration: | |
595 | .DS | |
596 | int x ( y ) { . . . | |
597 | .DE | |
598 | and the compiler must read a fair ways past | |
599 | .I x | |
600 | in order to be sure what the declaration really is. | |
601 | Again, the problem is even more perplexing when the | |
602 | initializer involves a macro. | |
603 | The current syntax places an equals sign between the | |
604 | variable and the initializer: | |
605 | .DS | |
606 | int x = \-1 ; | |
607 | .DE | |
608 | This is free of any possible syntactic ambiguity. | |
609 | .SH | |
610 | Pointer Alignment | |
611 | .PP | |
612 | Certain pointer assignments may be reasonable on some machines, | |
613 | and illegal on others, due entirely to | |
614 | alignment restrictions. | |
615 | For example, on the PDP-11, it is reasonable | |
616 | to assign integer pointers to double pointers, since | |
617 | double precision values may begin on any integer boundary. | |
618 | On the Honeywell 6000, double precision values must begin | |
619 | on even word boundaries; | |
620 | thus, not all such assignments make sense. | |
621 | .I Lint | |
622 | tries to detect cases where pointers are assigned to other | |
623 | pointers, and such alignment problems might arise. | |
624 | The message ``possible pointer alignment problem'' | |
625 | results from this situation whenever either the | |
626 | .B \-p | |
627 | or | |
628 | .B \-h | |
629 | flag is in effect. | |
630 | .SH | |
631 | Multiple Uses and Side Effects | |
632 | .PP | |
633 | In complicated expressions, the best order in which to evaluate | |
634 | subexpressions may be highly machine dependent. | |
635 | For example, on machines (like the PDP-11) in which the stack | |
636 | runs backwards, function arguments will probably be best evaluated | |
637 | from right-to-left; on machines with a stack running forward, | |
638 | left-to-right seems most attractive. | |
639 | Function calls embedded as arguments of other functions | |
640 | may or may not be treated similarly to ordinary arguments. | |
641 | Similar issues arise with other operators which have side effects, | |
642 | such as the assignment operators and the increment and decrement operators. | |
643 | .PP | |
644 | In order that the efficiency of C on a particular machine not be | |
645 | unduly compromised, the C language leaves the order | |
646 | of evaluation of complicated expressions up to the | |
647 | local compiler, and, in fact, the various C compilers have considerable | |
648 | differences in the order in which they will evaluate complicated | |
649 | expressions. | |
650 | In particular, if any variable is changed by a side effect, and | |
651 | also used elsewhere in the same expression, the result is explicitly undefined. | |
652 | .PP | |
653 | .I Lint | |
654 | checks for the important special case where | |
655 | a simple scalar variable is affected. | |
656 | For example, the statement | |
657 | .DS | |
658 | \fIa\fR[\fIi\|\fR] = \fIb\fR[\fIi\fR++] ; | |
659 | .DE | |
660 | will draw the complaint: | |
661 | .DS | |
662 | warning: \fIi\fR evaluation order undefined | |
663 | .DE | |
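One way to remove the complaint is to separate the side effect from the uses of the variable. The sketch below assumes the intended meaning was "copy element i, then advance i"; the opposite reading is equally possible from the original statement, which is exactly the problem. The function name is hypothetical.

```c
/*
 * Copies b[i] into a[i] and returns the advanced index.
 * The increment is a separate step, so the result no longer
 * depends on the compiler's choice of evaluation order.
 */
int copy_then_advance(int a[], const int b[], int i)
{
    a[i] = b[i];
    return i + 1;
}
```

A caller would write i = copy_then_advance(a, b, i); in place of the original statement.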
664 | .SH | |
665 | Implementation | |
666 | .PP | |
667 | .I Lint | |
668 | consists of two programs and a driver. | |
669 | The first program is a version of the | |
670 | Portable C Compiler | |
671 | .[ | |
672 | Johnson Ritchie BSTJ Portability Programs System | |
673 | .] | |
674 | .[ | |
675 | Johnson portable compiler 1978 | |
676 | .] | |
677 | which is the basis of the | |
678 | IBM 370, Honeywell 6000, and Interdata 8/32 C compilers. | |
679 | This compiler does lexical and syntax analysis on the input text, | |
680 | constructs and maintains symbol tables, and builds trees for expressions. | |
681 | Instead of writing an intermediate file which is passed to | |
682 | a code generator, as the other compilers | |
683 | do, | |
684 | .I lint | |
685 | produces an intermediate file which consists of lines of ascii text. | |
686 | Each line contains an external variable name, | |
687 | an encoding of the context in which it was seen (use, definition, declaration, etc.), | |
688 | a type specifier, and a source file name and line number. | |
689 | The information about variables local to a function or file | |
690 | is collected | |
691 | by accessing the symbol table, and examining the expression trees. | |
692 | .PP | |
693 | Comments about local problems are produced as detected. | |
694 | The information about external names is collected | |
695 | onto an intermediate file. | |
696 | After all the source files and library descriptions have | |
697 | been collected, the intermediate file is sorted | |
698 | to bring all information collected about a given external | |
699 | name together. | |
700 | The second, rather small, program then reads the lines | |
701 | from the intermediate file and compares all of the | |
702 | definitions, declarations, and uses for consistency. | |
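The sort-then-compare idea behind the second pass can be sketched in a few lines. Everything specific here is an assumption made for illustration: the record layout, the one-letter context codes, the textual type field, and the function names are hypothetical, and the real intermediate file format is not described in this document.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one line of the intermediate file. */
struct ext_record {
    const char *name;   /* external name */
    char context;       /* assumed codes: 'D' definition, 'C' declaration, 'U' use */
    const char *type;   /* type specifier, kept as text for simplicity */
    const char *file;   /* source file name */
    int line;           /* source line number */
};

/* qsort comparison: order records by external name. */
static int by_name(const void *a, const void *b)
{
    const struct ext_record *ra = a;
    const struct ext_record *rb = b;

    return strcmp(ra->name, rb->name);
}

/*
 * Sorts the records so that all information about a given name is
 * adjacent, then counts neighboring records that share a name but
 * disagree about its type.
 */
int count_type_conflicts(struct ext_record *recs, size_t n)
{
    size_t i;
    int conflicts = 0;

    qsort(recs, n, sizeof recs[0], by_name);
    for (i = 1; i < n; i++) {
        if (strcmp(recs[i].name, recs[i - 1].name) == 0 &&
            strcmp(recs[i].type, recs[i - 1].type) != 0)
            conflicts++;
    }
    return conflicts;
}
```

Because the sort brings every mention of a name together, the comparison needs only a single pass over neighboring records, which is consistent with the second program being rather small.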
703 | .PP | |
704 | The driver controls this | |
705 | process, and is also responsible for making the options available | |
706 | to both passes of | |
707 | .I lint . | |
708 | .SH | |
709 | Portability | |
710 | .PP | |
711 | C on the Honeywell and IBM systems is used, in part, to write system code for the host operating system. | |
712 | This means that the implementation of C tends to follow local conventions rather than | |
713 | adhere strictly to | |
714 | .UX | |
715 | system conventions. | |
716 | Despite these differences, many C programs have been successfully moved to GCOS and the various IBM | |
717 | installations with little effort. | |
718 | This section describes some of the differences between the implementations, and | |
719 | discusses the | |
720 | .I lint | |
721 | features which encourage portability. | |
722 | .PP | |
723 | Uninitialized external variables are treated differently in different | |
724 | implementations of C. | |
725 | Suppose two files both contain a declaration without initialization, such as | |
726 | .DS | |
727 | int a ; | |
728 | .DE | |
729 | outside of any function. | |
730 | The | |
731 | .UX | |
732 | loader will resolve these declarations, and cause only a single word of storage | |
733 | to be set aside for \fIa\fR. | |
734 | Under the GCOS and IBM implementations, this is not feasible (for various stupid reasons!) | |
735 | so each such declaration causes a word of storage to be set aside and called \fIa\fR. | |
736 | When loading or library editing takes place, this causes fatal conflicts which prevent | |
737 | the proper operation of the program. | |
738 | If | |
739 | .I lint | |
740 | is invoked with the \fB\-p\fR flag, | |
741 | it will detect such multiple definitions. | |
742 | .PP | |
743 | A related difficulty comes from the amount of information retained about external names during the | |
744 | loading process. | |
745 | On the | |
746 | .UX | |
747 | system, externally known names have seven significant characters, with the upper/lower | |
748 | case distinction kept. | |
749 | On the IBM systems, there are eight significant characters, but the case distinction | |
750 | is lost. | |
751 | On GCOS, there are only six characters, of a single case. | |
752 | This leads to situations where programs run on the | |
753 | .UX | |
754 | system, but encounter loader | |
755 | problems on the IBM or GCOS systems. | |
756 | .I Lint | |
757 | .B \-p | |
758 | causes all external symbols to be mapped to one case and truncated to six characters, | |
759 | providing a worst-case analysis. | |
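The worst-case mapping can be sketched as below. The text does not say which case the names are mapped to, so upper case is an assumption, and the function and macro names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

#define WORST_CASE_LEN 6    /* six significant characters, as on GCOS */

/* Writes the single-case, six-character form of name into out. */
void worst_case_name(const char *name, char out[WORST_CASE_LEN + 1])
{
    size_t i;

    for (i = 0; i < WORST_CASE_LEN && name[i] != '\0'; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    out[i] = '\0';
}

/* Returns 1 if two external names become identical after the mapping. */
int names_collide(const char *a, const char *b)
{
    char ma[WORST_CASE_LEN + 1];
    char mb[WORST_CASE_LEN + 1];

    worst_case_name(a, ma);
    worst_case_name(b, mb);
    return strcmp(ma, mb) == 0;
}
```

Under this mapping the names counter1 and Counter2 are indistinguishable, even though they differ both in case and in the eighth character.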
760 | .PP | |
761 | A number of differences arise in the area of character handling: characters in the | |
762 | .UX | |
763 | system are eight bit ascii, while they are eight bit ebcdic on the IBM, and | |
764 | nine bit ascii on GCOS. | |
765 | Moreover, character strings go from high to low bit positions (``left to right'') | |
766 | on GCOS and IBM, and low to high (``right to left'') on the PDP-11. | |
767 | This means that code attempting to construct strings | |
768 | out of character constants, or attempting to use characters as indices | |
769 | into arrays, must be looked at with great suspicion. | |
770 | .I Lint | |
771 | is of little help here, except to flag multi-character character constants. | |
772 | .PP | |
773 | Of course, the word sizes are different! | |
774 | This causes less trouble than might be expected, at least when | |
775 | moving from the | |
776 | .UX | |
777 | system (16 bit words) to the IBM (32 bits) or GCOS (36 bits). | |
778 | The main problems are likely to arise in shifting or masking. | |
779 | C now supports a bit-field facility, which can be used to write much of | |
780 | this code in a reasonably portable way. | |
781 | Frequently, portability of such code can be enhanced by | |
782 | slight rearrangements in coding style. | |
783 | Many of the incompatibilities seem to have the flavor of writing | |
784 | .DS | |
785 | x &= 0177700 ; | |
786 | .DE | |
787 | to clear the low order six bits of \fIx\fR. | |
788 | This suffices on the PDP-11, but fails badly on GCOS and IBM, where the constant also clears every bit above the low sixteen. | |
789 | If the bit field feature cannot be used, the same effect can be obtained by | |
790 | writing | |
791 | .DS | |
792 | x &= \(ap 077 ; | |
793 | .DE | |
794 | which will work on all these machines. | |
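The two masking styles can be put side by side as a small sketch. The function names are hypothetical, and the code is written in present-day C (unsigned operands, the u suffix) rather than the dialect of the paper.

```c
/* Clears only the low-order six bits when the word is 16 bits wide.
   On a wider word the octal constant has zeros above bit 15, so those
   bits of x are cleared as well. */
unsigned clear_low6_pdp11_only(unsigned x)
{
    return x & 0177700u;
}

/* Clears the low-order six bits at any word size: the complement of 077
   is all ones except the six low bits, however wide the word is. */
unsigned clear_low6_portable(unsigned x)
{
    return x & ~077u;
}
```

The second form states the intent (which bits to clear) instead of spelling out the bits to keep, so it does not depend on the word size.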
795 | .PP | |
796 | The right shift operator is arithmetic shift on the PDP-11, and logical shift on most | |
797 | other machines. | |
798 | To obtain a logical shift on all machines, the left operand can be | |
799 | typed \fBunsigned\fR. | |
800 | Characters are considered signed integers on the PDP-11, and unsigned on the other machines. | |
801 | This persistence of the sign bit may be reasonably considered a bug in the PDP-11 hardware | |
802 | which has infiltrated itself into the C language. | |
803 | If there were a good way to discover the programs which would be affected, C could be changed; | |
804 | in any case, | |
805 | .I lint | |
806 | is no help here. | |
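A minimal sketch of how both hazards can be avoided in source code. The function names are hypothetical, and the second function relies on the unsigned char type of standard C, which is an assumption beyond the language as described in this paper.

```c
/* Right shift with zero fill on every machine: the left operand is
   unsigned, so no sign bit is propagated. */
unsigned logical_shift_right(unsigned x, int n)
{
    return x >> n;
}

/* Value of a character as a non-negative integer, whether plain char
   is signed (as on the PDP-11) or unsigned (as on the other machines). */
int char_value(char c)
{
    return (unsigned char)c;
}
```

Code that uses characters as array indices can pass them through a conversion like char_value first, so that a character with the high bit set never yields a negative index.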
807 | .PP | |
808 | The above discussion may have made the problem of portability seem | |
809 | bigger than it in fact is. | |
810 | The issues involved here are rarely subtle or mysterious, at least to the | |
811 | implementor of the program, although they can involve some work to straighten out. | |
812 | The most serious bar to the portability of | |
813 | .UX | |
814 | system utilities has been the inability to mimic | |
815 | essential | |
816 | .UX | |
817 | system functions on the other systems. | |
818 | The inability to seek to a random character position in a text file, or to establish a pipe | |
819 | between processes, has involved far more rewriting | |
820 | and debugging than any of the differences in C compilers. | |
821 | On the other hand, | |
822 | .I lint | |
823 | has been very helpful | |
824 | in moving the | |
825 | .UX | |
826 | operating system and associated | |
827 | utility programs to other machines. | |
828 | .SH | |
829 | Shutting Lint Up | |
830 | .PP | |
831 | There are occasions when | |
832 | the programmer is smarter than | |
833 | .I lint . | |
834 | There may be valid reasons for ``illegal'' type casts, | |
835 | functions with a variable number of arguments, etc. | |
836 | Moreover, as noted above, the flow of control information | |
837 | produced by | |
838 | .I lint | |
839 | often has blind spots, causing occasional spurious | |
840 | messages about perfectly reasonable programs. | |
841 | Thus, some way of communicating with | |
842 | .I lint , | |
843 | typically to shut it up, is desirable. | |
844 | .PP | |
845 | The form which this mechanism should take is not at all clear. | |
846 | New keywords would require current and old compilers to | |
847 | recognize these keywords, if only to ignore them. | |
848 | This has both philosophical and practical problems. | |
849 | New preprocessor syntax suffers from similar problems. | |
850 | .PP | |
851 | What was finally done was to cause a number of words | |
852 | to be recognized by | |
853 | .I lint | |
854 | when they were embedded in comments. | |
855 | This required minimal preprocessor changes; | |
856 | the preprocessor just had to agree to pass comments | |
857 | through to its output, instead of deleting them | |
858 | as had been previously done. | |
859 | Thus, | |
860 | .I lint | |
861 | directives are invisible to the compilers, and | |
862 | the effect on systems with the older preprocessors | |
863 | is merely that the | |
864 | .I lint | |
865 | directives don't work. | |
866 | .PP | |
867 | The first directive is concerned with flow of control information; | |
868 | if a particular place in the program cannot be reached, | |
869 | but this is not apparent to | |
870 | .I lint , | |
871 | this can be asserted by the directive | |
872 | .DS | |
873 | /* NOTREACHED */ | |
874 | .DE | |
875 | at the appropriate spot in the program. | |
876 | Similarly, if it is desired to turn off | |
877 | strict type checking for | |
878 | the next expression, the directive | |
879 | .DS | |
880 | /* NOSTRICT */ | |
881 | .DE | |
882 | can be used; the situation reverts to the | |
883 | previous default after the next expression. | |
884 | The | |
885 | .B \-v | |
886 | flag can be turned on for one function by the directive | |
887 | .DS | |
888 | /* ARGSUSED */ | |
889 | .DE | |
890 | Complaints about variable number of arguments in calls to a function | |
891 | can be turned off by the directive | |
892 | .DS | |
893 | /* VARARGS */ | |
894 | .DE | |
895 | preceding the function definition. | |
896 | In some cases, it is desirable to check the | |
897 | first several arguments, and leave the later arguments unchecked. | |
898 | This can be done by following the VARARGS keyword immediately | |
899 | with a digit giving the number of arguments which should be checked; thus, | |
900 | .DS | |
901 | /* VARARGS2 */ | |
902 | .DE | |
903 | will cause the first two arguments to be checked, the others unchecked. | |
904 | Finally, the directive | |
905 | .DS | |
906 | /* LINTLIBRARY */ | |
907 | .DE | |
908 | at the head of a file identifies this file as | |
909 | a library declaration file; this topic is worth a | |
910 | section by itself. | |
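The placement of these directives can be sketched in a short source file. The function names are hypothetical, and the code uses present-day C (prototypes and stdarg.h) rather than the dialect of the paper. A compiler treats the directives as ordinary comments.

```c
#include <stdarg.h>
#include <stdlib.h>

/* ARGSUSED */
int handler(int code, int unused_flags)
{
    /* unused_flags is deliberately ignored; the directive above
       suppresses the unused-argument complaint for this function only */
    return code * 2;
}

/* VARARGS1 */
int sum_ints(int count, ...)
{
    /* only the first argument (count) is checked at each call */
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

int checked_index(int i, int limit)
{
    if (i >= 0 && i < limit)
        return i;
    abort();
    /* NOTREACHED */
}
```

The NOTREACHED comment sits directly after the call that never returns, which is the spot where a checker without knowledge of abort would otherwise report a function falling off its end without returning a value.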
911 | .SH | |
912 | Library Declaration Files | |
913 | .PP | |
914 | .I Lint | |
915 | accepts certain library directives, such as | |
916 | .DS | |
917 | \-ly | |
918 | .DE | |
919 | and tests the source files for compatibility with these libraries. | |
920 | This is done by accessing library description files whose | |
921 | names are constructed from the library directives. | |
922 | These files all begin with the directive | |
923 | .DS | |
924 | /* LINTLIBRARY */ | |
925 | .DE | |
926 | which is followed by a series of dummy function | |
927 | definitions. | |
928 | The critical parts of these definitions | |
929 | are the declaration of the function return type, | |
930 | whether the dummy function returns a value, and | |
931 | the number and types of arguments to the function. | |
932 | The VARARGS and ARGSUSED directives can | |
933 | be used to specify features of the library functions. | |
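A library description file might look like the following sketch. The library and its function names are hypothetical, and the prototypes are written in present-day C. The bodies are never meant to run; only the return types, whether a value is returned, and the argument lists matter.

```c
/* LINTLIBRARY */

/* ARGSUSED */
int lib_open(const char *path, int mode)
{
    /* dummy body: records that a value of type int is returned */
    return 0;
}

/* VARARGS1 */
int lib_report(const char *format, ...)
{
    /* dummy body: only the first argument is to be checked at calls */
    return format != 0;
}
```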
934 | .PP | |
935 | .I Lint | |
936 | library files are processed almost exactly like ordinary | |
937 | source files. | |
938 | The only difference is that functions which are defined in a library file, | |
939 | but are not used in a source file, draw no complaints. | |
940 | .I Lint | |
941 | does not simulate a full library search algorithm, | |
942 | and complains if the source files contain a redefinition of | |
943 | a library routine (this is a feature!). | |
944 | .PP | |
945 | By default, | |
946 | .I lint | |
947 | checks the programs it is given against a standard library | |
948 | file, which contains descriptions of the programs which | |
949 | are normally loaded when | |
950 | a C program | |
951 | is run. | |
952 | When the | |
953 | .B \-p | |
954 | flag is in effect, another file is checked containing | |
955 | descriptions of the standard I/O library routines | |
956 | which are expected to be portable across various machines. | |
957 | The | |
958 | .B \-n | |
959 | flag can be used to suppress all library checking. | |
960 | .SH | |
961 | Bugs, etc. | |
962 | .PP | |
963 | .I Lint | |
964 | was a difficult program to write, partially | |
965 | because it is closely connected with matters of programming style, | |
966 | and partially because users usually don't notice bugs which cause | |
967 | .I lint | |
968 | to miss errors which it should have caught. | |
969 | (By contrast, if | |
970 | .I lint | |
971 | incorrectly complains about something that is correct, the | |
972 | programmer reports that immediately!) | |
973 | .PP | |
974 | A number of areas remain to be further developed. | |
975 | The checking of structures and arrays is rather inadequate; | |
976 | size | |
977 | incompatibilities go unchecked, | |
978 | and no attempt is made to match up structure and union | |
979 | declarations across files. | |
980 | Some stricter checking of the use of | |
981 | .B typedef | |
982 | is clearly desirable, but what checking is appropriate, and how | |
983 | to carry it out, is still to be determined. | |
984 | .PP | |
985 | .I Lint | |
986 | shares the preprocessor with the C compiler. | |
987 | At some point it may be appropriate for a | |
988 | special version of the preprocessor to be constructed | |
989 | which checks for things such as unused macro definitions, | |
990 | macro arguments with side effects that are | |
991 | never expanded, or are expanded more than once, etc. | |
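The macro hazards mentioned here can be shown in a small sketch. The macros and functions are hypothetical examples, not taken from any checked program.

```c
/* The argument appears twice in the expansion, so it is evaluated twice. */
#define SQUARE(x) ((x) * (x))

/* The second argument never appears in the expansion, so it is never
   evaluated. */
#define FIRST(a, b) (a)

static int calls;

/* Side effect: counts how many times it has been called. */
int next_value(void)
{
    return ++calls;
}

/* Number of calls made while evaluating SQUARE(next_value()). */
int calls_after_square(void)
{
    calls = 0;
    (void)SQUARE(next_value());
    return calls;
}

/* Number of calls made while evaluating FIRST(0, next_value()). */
int calls_after_first(void)
{
    calls = 0;
    (void)FIRST(0, next_value());
    return calls;
}
```

Here calls_after_square reports two calls and calls_after_first reports none, although each source line appears to call next_value exactly once.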
992 | .PP | |
993 | The central problem with | |
994 | .I lint | |
995 | is the packaging of the information which it collects. | |
996 | There are many options which | |
997 | serve only to turn off, or slightly modify, | |
998 | certain features. | |
999 | There are pressures to add even more of these options. | |
1000 | .PP | |
1001 | In conclusion, it appears that the general notion of having two | |
1002 | programs is a good one. | |
1003 | The compiler concentrates on quickly and accurately turning the | |
1004 | program text into bits which can be run; | |
1005 | .I lint | |
1006 | concentrates on issues | |
1007 | of portability, style, and efficiency. | |
1008 | .I Lint | |
1009 | can afford to be wrong, since incorrectness and over-conservatism | |
1010 | are merely annoying, not fatal. | |
1011 | The compiler can be fast since it knows that | |
1012 | .I lint | |
1013 | will cover its flanks. | |
1014 | Finally, the programmer can | |
1015 | concentrate at one stage | |
1016 | of the programming process solely on the algorithms, | |
1017 | data structures, and correctness of the | |
1018 | program, and then later retrofit, | |
1019 | with the aid of | |
1020 | .I lint , | |
1021 | the desirable properties of universality and portability. | |
1022 | .SG MH-1273-SCJ-unix | |
1023 | .bp | |
1024 | .[ | |
1025 | $LIST$ | |
1026 | .] | |
1027 | .bp | |
1028 | .SH | |
1029 | Appendix: Current Lint Options | |
1030 | .PP | |
1031 | The command currently has the form | |
1032 | .DS | |
1033 | lint\fR [\fB\-\fRoptions ] files... library-descriptors... | |
1034 | .DE | |
1035 | The options are | |
1036 | .IP \fBh\fR | |
1037 | Perform heuristic checks | |
1038 | .IP \fBp\fR | |
1039 | Perform portability checks | |
1040 | .IP \fBv\fR | |
1041 | Don't report unused arguments | |
1042 | .IP \fBu\fR | |
1043 | Don't report unused or undefined externals | |
1044 | .IP \fBb\fR | |
1045 | Report unreachable | |
1046 | .B break | |
1047 | statements. | |
1048 | .IP \fBx\fR | |
1049 | Report unused external declarations | |
1050 | .IP \fBa\fR | |
1051 | Report assignments of | |
1052 | .B long | |
1053 | to | |
1054 | .B int | |
1055 | or shorter. | |
1056 | .IP \fBc\fR | |
1057 | Complain about questionable casts | |
1058 | .IP \fBn\fR | |
1059 | No library checking is done | |
1060 | .IP \fBs\fR | |
1061 | Same as | |
1062 | .B h | |
1063 | (for historical reasons) |