Commit | Line | Data |
---|---|---|
7b089094 WJ |
1 | .\" Copyright (c) 1990 The Regents of the University of California. |
2 | .\" All rights reserved. | |
3 | .\" | |
4 | .\" Redistribution and use in source and binary forms, with or without | |
5 | .\" modification, are permitted provided that the following conditions | |
6 | .\" are met: | |
7 | .\" 1. Redistributions of source code must retain the above copyright | |
8 | .\" notice, this list of conditions and the following disclaimer. | |
9 | .\" 2. Redistributions in binary form must reproduce the above copyright | |
10 | .\" notice, this list of conditions and the following disclaimer in the | |
11 | .\" documentation and/or other materials provided with the distribution. | |
12 | .\" 3. All advertising materials mentioning features or use of this software | |
13 | .\" must display the following acknowledgement: | |
14 | .\" This product includes software developed by the University of | |
15 | .\" California, Berkeley and its contributors. | |
16 | .\" 4. Neither the name of the University nor the names of its contributors | |
17 | .\" may be used to endorse or promote products derived from this software | |
18 | .\" without specific prior written permission. | |
19 | .\" | |
20 | .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
21 | .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
22 | .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
23 | .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
24 | .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
25 | .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
26 | .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
27 | .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
28 | .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
29 | .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
30 | .\" SUCH DAMAGE. | |
31 | .\" | |
32 | .\" @(#)lex.1 5.13 (Berkeley) 7/24/91 | |
33 | .\" | |
34 | .Dd July 24, 1991 | |
35 | .Dt LEX 1 | |
36 | .Os | |
37 | .Sh NAME | |
38 | .Nm lex | |
39 | .Nd fast lexical analyzer generator | |
40 | .Sh SYNOPSIS | |
41 | .Nm lex | |
42 | .Oo | |
43 | .Op Fl bcdfinpstvFILT8 | |
44 | .Fl C Ns Ns Op Cm efmF | |
45 | .Fl S Ns Ns Ar skeleton | |
46 | .Oc | |
47 | .Op Ar | |
48 | .Sh DESCRIPTION | |
49 | .Nm Lex | |
50 | is a tool for generating | |
51 | .Ar scanners : | |
52 | programs which recognize lexical patterns in text. | |
53 | .Nm Lex | |
54 | reads | |
55 | the given input files, or its standard input if no file names are given, | |
56 | for a description of a scanner to generate. The description is in | |
57 | the form of pairs | |
58 | of regular expressions and C code, called | |
59 | .Em rules . | |
60 | .Nm Lex | |
61 | generates as output a C source file, | |
62 | .Pa lex.yy.c , | |
63 | which defines a routine | |
64 | .Fn yylex . | |
65 | This file is compiled and linked with the | |
66 | .Fl lfl | |
67 | library to produce an executable. When the executable is run, | |
68 | it analyzes its input for occurrences | |
69 | of the regular expressions. Whenever it finds one, it executes | |
70 | the corresponding C code. | |
71 | .Pp | |
72 | For full documentation, see | |
73 | .Em Lexdoc . | |
74 | This manual entry is intended for use as a quick reference. | |
75 | .Sh OPTIONS | |
76 | .Nm Lex | |
77 | has the following options: | |
78 | .Bl -tag -width Ds | |
79 | .It Fl b | |
80 | Generate backtracking information to | |
81 | .Va lex.backtrack . | |
82 | This is a list of scanner states which require backtracking | |
83 | and the input characters on which they do so. By adding rules one | |
84 | can remove backtracking states. If all backtracking states | |
85 | are eliminated and | |
86 | .Fl f | |
87 | or | |
88 | .Fl F | |
89 | is used, the generated scanner will run faster. | |
90 | .It Fl c | |
91 | Is a do-nothing, deprecated option included for | |
92 | .Tn POSIX | |
93 | compliance. | |
94 | .Pp | |
95 | .Ar NOTE : | |
96 | in previous releases of | |
97 | .Nm Lex | |
98 | .Op Fl c | |
99 | specified table-compression options. This functionality is | |
100 | now given by the | |
101 | .Fl C | |
102 | flag. To ease the impact of this change, when | |
103 | .Nm lex | |
104 | encounters | |
105 | .Fl c , | |
106 | it currently issues a warning message and assumes that | |
107 | .Fl C | |
108 | was desired instead. In the future this "promotion" of | |
109 | .Fl c | |
110 | to | |
111 | .Fl C | |
112 | will go away in the name of full | |
113 | .Tn POSIX | |
114 | compliance (unless | |
115 | the | |
116 | .Tn POSIX | |
117 | meaning is removed first). | |
118 | .It Fl d | |
119 | Makes the generated scanner run in | |
120 | .Ar debug | |
121 | mode. Whenever a pattern is recognized and the global | |
122 | .Va yy_flex_debug | |
123 | is non-zero (which is the default), the scanner will | |
124 | write to | |
125 | .Li stderr | |
126 | a line of the form: | |
127 | .Pp | |
128 | .Dl --accepting rule at line 53 ("the matched text") | |
129 | .Pp | |
130 | The line number refers to the location of the rule in the file | |
131 | defining the scanner (i.e., the file that was fed to lex). Messages | |
132 | are also generated when the scanner backtracks, accepts the | |
133 | default rule, reaches the end of its input buffer (or encounters | |
134 | a | |
135 | .Tn NUL ; | |
136 | the two look the same as far as the scanner's concerned), | |
137 | or reaches an end-of-file. | |
138 | .It Fl f | |
139 | Specifies (take your pick) | |
140 | .Em full table | |
141 | or | |
142 | .Em fast scanner . | |
143 | No table compression is done. The result is large but fast. | |
144 | This option is equivalent to | |
145 | .Fl Cf | |
146 | (see below). | |
147 | .It Fl i | |
148 | Instructs | |
149 | .Nm lex | |
150 | to generate a | |
151 | .Em case-insensitive | |
152 | scanner. The case of letters given in the | |
153 | .Nm lex | |
154 | input patterns will | |
155 | be ignored, and tokens in the input will be matched regardless of case. The | |
156 | matched text given in | |
157 | .Va yytext | |
158 | will have the preserved case (i.e., it will not be folded). | |
159 | .It Fl n | |
160 | Is another do-nothing, deprecated option included only for | |
161 | .Tn POSIX | |
162 | compliance. | |
163 | .It Fl p | |
164 | Generates a performance report to stderr. The report | |
165 | consists of comments regarding features of the | |
166 | .Nm lex | |
167 | input file which will cause a loss of performance in the resulting scanner. | |
168 | .It Fl s | |
169 | Causes the | |
170 | .Ar default rule | |
171 | (that unmatched scanner input is echoed to | |
172 | .Ar stdout ) | |
173 | to be suppressed. If the scanner encounters input that does not | |
174 | match any of its rules, it aborts with an error. | |
175 | .It Fl t | |
176 | Instructs | |
177 | .Nm lex | |
178 | to write the scanner it generates to standard output instead | |
179 | of | |
180 | .Pa lex.yy.c . | |
181 | .It Fl v | |
182 | Specifies that | |
183 | .Nm lex | |
184 | should write to | |
185 | .Li stderr | |
186 | a summary of statistics regarding the scanner it generates. | |
187 | .It Fl F | |
188 | Specifies that the | |
189 | .Em fast | |
190 | scanner table representation should be used. This representation is | |
191 | about as fast as the full table representation | |
192 | .Pq Fl f , | |
193 | and for some sets of patterns will be considerably smaller (and for | |
194 | others, larger). See | |
195 | .Em Lexdoc | |
196 | for details. | |
197 | .Pp | |
198 | This option is equivalent to | |
199 | .Fl CF | |
200 | (see below). | |
201 | .It Fl I | |
202 | Instructs | |
203 | .Nm lex | |
204 | to generate an | |
205 | .Em interactive | |
206 | scanner, that is, a scanner which stops immediately rather than | |
207 | looking ahead if it knows | |
208 | that the currently scanned text cannot be part of a longer rule's match. | |
209 | Again, see | |
210 | .Em Lexdoc | |
211 | for details. | |
212 | .Pp | |
213 | Note, | |
214 | .Fl I | |
215 | cannot be used in conjunction with | |
216 | .Em full | |
217 | or | |
218 | .Em fast tables , | |
219 | i.e., the | |
220 | .Fl f , F , Cf , | |
221 | or | |
222 | .Fl CF | |
223 | flags. | |
224 | .It Fl L | |
225 | Instructs | |
226 | .Nm lex | |
227 | not to generate | |
228 | .Li #line | |
229 | directives in | |
230 | .Pa lex.yy.c . | |
231 | The default is to generate such directives so error | |
232 | messages in the actions will be correctly | |
233 | located with respect to the original | |
234 | .Nm lex | |
235 | input file, and not to | |
236 | the fairly meaningless line numbers of | |
237 | .Pa lex.yy.c . | |
238 | .It Fl T | |
239 | Makes | |
240 | .Nm lex | |
241 | run in | |
242 | .Em trace | |
243 | mode. It will generate a lot of messages to | |
244 | .Li stdout | |
245 | concerning | |
246 | the form of the input and the resultant non-deterministic and deterministic | |
247 | finite automata. This option is mostly for use in maintaining | |
248 | .Nm lex . | |
249 | .It Fl 8 | |
250 | Instructs | |
251 | .Nm lex | |
252 | to generate an 8-bit scanner. | |
253 | On some sites, this is the default. On others, the default | |
254 | is 7-bit characters. To see which is the case, check the verbose | |
255 | .Pq Fl v | |
256 | output for "equivalence classes created". If the denominator of | |
257 | the number shown is 128, then by default | |
258 | .Nm lex | |
259 | is generating 7-bit characters. If it is 256, then the default is | |
260 | 8-bit characters. | |
261 | .It Fl C Ns Op Cm efmF | |
262 | Controls the degree of table compression. The default setting is | |
263 | .Fl Cem . | |
264 | .Pp | |
265 | .Bl -tag -width Ds | |
266 | .It Fl C | |
267 | A lone | |
268 | .Fl C | |
269 | specifies that the scanner tables should be compressed but neither | |
270 | equivalence classes nor meta-equivalence classes should be used. | |
271 | .It Fl \&Ce | |
272 | Directs | |
273 | .Nm lex | |
274 | to construct | |
275 | .Em equivalence classes , | |
276 | i.e., sets of characters | |
277 | which have identical lexical properties. | |
278 | Equivalence classes usually give | |
279 | dramatic reductions in the final table/object file sizes (typically | |
280 | a factor of 2-5) and are pretty cheap performance-wise (one array | |
281 | look-up per character scanned). | |
282 | .It Fl \&Cf | |
283 | Specifies that the | |
284 | .Em full | |
285 | scanner tables should be generated - | |
286 | .Nm lex | |
287 | should not compress the | |
288 | tables by taking advantage of similar transition functions for | |
289 | different states. | |
290 | .It Fl \&CF | |
291 | Specifies that the alternate fast scanner representation (described in | |
292 | .Em Lexdoc ) | |
293 | should be used. | |
294 | .It Fl \&Cm | |
295 | Directs | |
296 | .Nm lex | |
297 | to construct | |
298 | .Em meta-equivalence classes , | |
299 | which are sets of equivalence classes (or characters, if equivalence | |
300 | classes are not being used) that are commonly used together. Meta-equivalence | |
301 | classes are often a big win when using compressed tables, but they | |
302 | have a moderate performance impact (one or two "if" tests and one | |
303 | array look-up per character scanned). | |
304 | .It Fl Cem | |
305 | (Default) | |
306 | Generate both equivalence classes | |
307 | and meta-equivalence classes. This setting provides the highest | |
308 | degree of table compression. | |
309 | .El | |
310 | .Pp | |
311 | Faster-executing scanners can be had at the | |
312 | cost of larger tables, with | |
313 | the following ordering generally holding: | |
314 | .Bd -ragged -offset center | |
315 | slowest & smallest | |
316 | -Cem | |
317 | -Cm | |
318 | -Ce | |
319 | -C | |
320 | -C{f,F}e | |
321 | -C{f,F} | |
322 | fastest & largest | |
323 | .Ed | |
324 | .Pp | |
325 | .Fl C | |
326 | options are not cumulative; whenever the flag is encountered, the | |
327 | previous -C settings are forgotten. | |
328 | .Pp | |
329 | The options | |
330 | .Fl \&Cf | |
331 | or | |
332 | .Fl \&CF | |
333 | and | |
334 | .Fl \&Cm | |
335 | do not make sense together - there is no opportunity for meta-equivalence | |
336 | classes if the table is not being compressed. Otherwise the options | |
337 | may be freely mixed. | |
338 | .It Fl S Ns Ar skeleton_file | |
339 | Overrides the default skeleton file from which | |
340 | .Nm lex | |
341 | constructs its scanners. Useful for | |
342 | .Nm lex | |
343 | maintenance or development. | |
344 | .El | |
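The equivalence classes described under -Ce can be illustrated with a small hand-written sketch. This is not the code lex generates; the class names, state names, and the two patterns ([a-z]+ and [0-9]+) are assumptions chosen for illustration. It shows why the technique costs one extra array look-up per character while shrinking each row of the transition table from 256 columns to one column per class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classes for the patterns [a-z]+ and [0-9]+. */
enum { EC_OTHER, EC_LETTER, EC_DIGIT, EC_COUNT };
enum { S_START, S_WORD, S_NUMBER, S_DEAD, S_COUNT };

/* Maps each of the 256 byte values to its equivalence class. */
static unsigned char ec[256];

static void build_classes(void)
{
    int c;

    for (c = 0; c < 256; c++)
        ec[c] = EC_OTHER;
    for (c = 'a'; c <= 'z'; c++)    /* assumes ASCII: letters are contiguous */
        ec[c] = EC_LETTER;
    for (c = '0'; c <= '9'; c++)
        ec[c] = EC_DIGIT;
}

/* Transition table indexed by class: 3 columns per state instead of 256. */
static const unsigned char next_state[S_COUNT][EC_COUNT] = {
    /*            other   letter  digit    */
    /* START  */ { S_DEAD, S_WORD, S_NUMBER },
    /* WORD   */ { S_DEAD, S_WORD, S_DEAD   },
    /* NUMBER */ { S_DEAD, S_DEAD, S_NUMBER },
    /* DEAD   */ { S_DEAD, S_DEAD, S_DEAD   },
};

/* Returns the length of the longest prefix of s matching either pattern. */
size_t match_prefix(const char *s)
{
    int state = S_START;
    size_t i = 0;

    assert(s != NULL);
    build_classes();
    while (s[i] != '\0') {
        /* One class look-up, then one transition look-up, per character. */
        int next = next_state[state][ec[(unsigned char)s[i]]];

        if (next == S_DEAD)
            break;
        state = next;
        i++;
    }
    return i;
}
```

All lowercase letters behave identically in these two patterns, so they share a single column; the same holds for the digits and for every remaining byte value.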
345 | .Sh SUMMARY OF LEX REGULAR EXPRESSIONS | |
346 | The patterns in the input are written using an extended set of regular | |
347 | expressions. These are: | |
348 | .Pp | |
349 | .Bl -tag -width 10n -compact | |
350 | .It Li x | |
351 | Match the character 'x'. | |
352 | .It Li \&. | |
353 | Any character except newline. | |
354 | .It Op Li xyz | |
355 | A "character class"; in this case, the pattern | |
356 | matches either an 'x', a 'y', or a 'z'. | |
357 | .It Op Li abj-oZ | |
358 | A "character class" with a range in it; matches | |
359 | an 'a', a 'b', any letter from 'j' through 'o', | |
360 | or a 'Z'. | |
361 | .It Op Li ^A-Z | |
362 | A "negated character class", i.e., any character | |
363 | but those in the class. In this case, any | |
364 | character | |
365 | .Em except | |
366 | an uppercase letter. | |
367 | .It Op Li ^A-Z\en | |
368 | Any character | |
369 | .Em except | |
370 | an uppercase letter or | |
371 | a newline. | |
372 | .It Li r* | |
373 | Zero or more r's, where r is any regular expression. | |
374 | .It Li r+ | |
375 | One or more r's. | |
376 | .It Li r? | |
377 | Zero or one r's (that is, "an optional r"). | |
378 | .It Li r{2,5} | |
379 | Anywhere from two to five r's. | |
380 | .It Li r{2,} | |
381 | Two or more r's. | |
382 | .It Li r{4} | |
383 | Exactly 4 r's. | |
384 | .It Li {name} | |
385 | The expansion of the "name" definition | |
386 | (see above). | |
387 | .It Xo | |
388 | .Oo Li xyz Oc Ns Li "\e\&\*qfoo" | |
389 | .Xc | |
390 | The literal string: | |
391 | [xyz]\*qfoo. | |
392 | .It Li \&\eX | |
393 | If X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', | |
394 | then the | |
395 | .Tn ANSI-C | |
396 | interpretation of \ex. | |
397 | Otherwise, a literal 'X' (used to escape | |
398 | operators such as '*'). | |
399 | .It Li \&\e123 | |
400 | The character with octal value 123. | |
401 | .It Li \&\ex2a | |
402 | The character with hexadecimal value 2a. | |
403 | .It Li (r) | |
404 | Match an r; parentheses are used to override | |
405 | precedence (see below). | |
406 | .It Li rs | |
407 | The regular expression r followed by the | |
408 | regular expression s; called "concatenation". | |
409 | .It Li r\&|s | |
410 | Either an r or an s. | |
411 | .It Li r/s | |
412 | An r but only if it is followed by an s. The | |
413 | s is not part of the matched text. This type | |
414 | of pattern is called "trailing context". | |
415 | .It Li \&^r | |
416 | An r, but only at the beginning of a line. | |
417 | .It Li r$ | |
418 | An r, but only at the end of a line. Equivalent | |
419 | to "r/\en". | |
420 | .It Li <s>r | |
421 | An r, but only in start condition s (see | |
422 | below for discussion of start conditions). | |
423 | .It Li <s1,s2,s3>r | |
424 | Same, but in any of start conditions s1, | |
425 | s2, or s3. | |
426 | .It Li <<EOF>> | |
427 | An end-of-file. | |
428 | .It Li <s1,s2><<EOF>> | |
429 | An end-of-file when in start condition s1 or s2. | |
430 | .El | |
431 | The regular expressions listed above are grouped according to | |
432 | precedence, from highest precedence at the top to lowest at the bottom. | |
433 | Those grouped together have equal precedence. | |
434 | .Pp | |
435 | Some notes on patterns: | |
436 | .Pp | |
437 | Negated character classes | |
438 | .Ar match newlines | |
439 | unless "\en" (or an equivalent escape sequence) is one of the | |
440 | characters explicitly present in the negated character class | |
441 | (e.g., " [^A-Z\en] "). | |
442 | .Pp | |
443 | A rule can have at most one instance of trailing context (the '/' operator | |
444 | or the '$' operator). The start condition, '^', and "<<EOF>>" patterns | |
445 | can only occur at the beginning of a pattern and, like '/' and '$', | |
446 | cannot be grouped inside parentheses. The following are all illegal: | |
447 | .Pp | |
448 | .Bd -literal -offset indent | |
449 | foo/bar$ | |
450 | foo(bar$) | |
451 | foo^bar | |
452 | <sc1>foo<sc2>bar | |
453 | .Ed | |
454 | .Sh SUMMARY OF SPECIAL ACTIONS | |
455 | In addition to arbitrary C code, the following can appear in actions: | |
456 | .Bl -tag -width Fl | |
457 | .It Ic ECHO | |
458 | Copies | |
459 | .Va yytext | |
460 | to the scanner's output. | |
461 | .It Ic BEGIN | |
462 | Followed by the name of a start condition places the scanner in the | |
463 | corresponding start condition. | |
464 | .It Ic REJECT | |
465 | Directs the scanner to proceed on to the "second best" rule which matched the | |
466 | input (or a prefix of the input). | |
467 | .Va yytext | |
468 | and | |
469 | .Va yyleng | |
470 | are set up appropriately. Note that | |
471 | .Ic REJECT | |
472 | is a particularly expensive feature in terms of scanner performance; | |
473 | if it is used in | |
474 | .Em any | |
475 | of the scanner's actions it will slow down | |
476 | .Em all | |
477 | of the scanner's matching. Furthermore, | |
478 | .Ic REJECT | |
479 | cannot be used with the | |
480 | .Fl f | |
481 | or | |
482 | .Fl F | |
483 | options. | |
484 | .Pp | |
485 | Note also that unlike the other special actions, | |
486 | .Ic REJECT | |
487 | is a | |
488 | .Em branch ; | |
489 | code immediately following it in the action will | |
490 | .Em not | |
491 | be executed. | |
492 | .It Fn yymore | |
493 | tells the scanner that the next time it matches a rule, the corresponding | |
494 | token should be | |
495 | .Em appended | |
496 | onto the current value of | |
497 | .Va yytext | |
498 | rather than replacing it. | |
499 | .It Fn yyless \&n | |
500 | returns all but the first | |
501 | .Ar n | |
502 | characters of the current token back to the input stream, where they | |
503 | will be rescanned when the scanner looks for the next match. | |
504 | .Va yytext | |
505 | and | |
506 | .Va yyleng | |
507 | are adjusted appropriately (e.g., | |
508 | .Va yyleng | |
509 | will now be equal to | |
510 | .Ar n ) . | |
511 | .It Fn unput c | |
512 | puts the character | |
513 | .Ar c | |
514 | back onto the input stream. It will be the next character scanned. | |
515 | .It Fn input | |
516 | reads the next character from the input stream (this routine is called | |
517 | .Fn yyinput | |
518 | if the scanner is compiled using | |
519 | .Em C \&+\&+ ) . | |
520 | .It Fn yyterminate | |
521 | can be used in lieu of a return statement in an action. It terminates | |
522 | the scanner and returns a 0 to the scanner's caller, indicating "all done". | |
523 | .Pp | |
524 | By default, | |
525 | .Fn yyterminate | |
526 | is also called when an end-of-file is encountered. It is a macro and | |
527 | may be redefined. | |
528 | .It Ic YY_NEW_FILE | |
529 | is an action available only in <<EOF>> rules. It means "Okay, I've | |
530 | set up a new input file, continue scanning". | |
531 | .It Fn yy_create_buffer file size | |
532 | takes a | |
533 | .Ic FILE | |
534 | pointer and an integer | |
535 | .Ar size . | |
536 | It returns a YY_BUFFER_STATE | |
537 | handle to a new input buffer large enough to accommodate | |
538 | .Ar size | |
539 | characters and associated with the given file. When in doubt, use | |
540 | .Ar YY_BUF_SIZE | |
541 | for the size. | |
542 | .It Fn yy_switch_to_buffer new_buffer | |
543 | switches the scanner's processing to scan for tokens from | |
544 | the given buffer, which must be a YY_BUFFER_STATE. | |
545 | .It Fn yy_delete_buffer buffer | |
546 | deletes the given buffer. | |
547 | .El | |
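The bookkeeping behind yyless, as described above, can be sketched with a toy model. The struct and the toy_ functions below are hypothetical stand-ins, not part of the generated scanner; they only mirror the documented effect, namely that the first n characters stay in yytext, yyleng becomes n, and the remainder is rescanned on the next match.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's state; not the generated code. */
struct toy_scanner {
    const char *input;   /* whole input text */
    size_t pos;          /* index of the next unscanned character */
    char yytext[64];     /* text of the current token */
    int yyleng;          /* length of the current token */
};

/* Pretend that some rule matched the next len characters of the input. */
void toy_match(struct toy_scanner *s, int len)
{
    assert(len >= 0 && (size_t)len < sizeof s->yytext);
    assert(s->pos + (size_t)len <= strlen(s->input));
    memcpy(s->yytext, s->input + s->pos, (size_t)len);
    s->yytext[len] = '\0';
    s->yyleng = len;
    s->pos += (size_t)len;
}

/* Keep the first n characters of the token and push the rest back. */
void toy_yyless(struct toy_scanner *s, int n)
{
    assert(n >= 0 && n <= s->yyleng);
    s->pos -= (size_t)(s->yyleng - n);   /* returned text is rescanned later */
    s->yytext[n] = '\0';
    s->yyleng = n;
}
```

In this model, matching the six characters of "foobar" and then calling toy_yyless with n set to 3 leaves "foo" as the current token, and the next match starts at "bar".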
548 | .Sh VALUES AVAILABLE TO THE USER | |
549 | .Bl -tag -width Fl | |
550 | .It Va char \&*yytext | |
551 | holds the text of the current token. It may not be modified. | |
552 | .It Va int yyleng | |
553 | holds the length of the current token. It may not be modified. | |
554 | .It Va FILE \&*yyin | |
555 | is the file which by default | |
556 | .Nm lex | |
557 | reads from. It may be redefined but doing so only makes sense before | |
558 | scanning begins. Changing it in the middle of scanning will have | |
559 | unexpected results since | |
560 | .Nm lex | |
561 | buffers its input. Once scanning terminates because an end-of-file | |
562 | has been seen, | |
563 | .Fn "void yyrestart" "FILE *new_file" | |
564 | may be called to point | |
565 | .Va yyin | |
566 | at the new input file. | |
567 | .It Va FILE \&*yyout | |
568 | is the file to which | |
569 | .Ar ECHO | |
570 | actions are done. It can be reassigned by the user. | |
571 | .It Va YY_CURRENT_BUFFER | |
572 | returns a | |
573 | YY_BUFFER_STATE | |
574 | handle to the current buffer. | |
575 | .El | |
576 | .Sh MACROS THE USER CAN REDEFINE | |
577 | .Bl -tag -width Fl | |
578 | .It Va YY_DECL | |
579 | controls how the scanning routine is declared. | |
580 | By default, it is "int yylex()", or, if prototypes are being | |
581 | used, "int yylex(void)". This definition may be changed by redefining | |
582 | the "YY_DECL" macro. Note that | |
583 | if you give arguments to the scanning routine using a | |
584 | K&R-style/non-prototyped function declaration, you must terminate | |
585 | the definition with a semi-colon (;). | |
586 | .It Va YY_INPUT | |
587 | The nature of how the scanner | |
588 | gets its input can be controlled by redefining the | |
589 | YY_INPUT | |
590 | macro. | |
591 | YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its | |
592 | action is to place up to | |
593 | .Ar max_size | |
594 | characters in the character array | |
595 | .Ar buf | |
596 | and return in the integer variable | |
597 | .Ar result | |
598 | either the | |
599 | number of characters read or the constant YY_NULL (0 on Unix systems) | |
600 | to indicate EOF. The default YY_INPUT reads from the | |
601 | global file-pointer "yyin". | |
602 | A sample redefinition of YY_INPUT (in the definitions | |
603 | section of the input file): | |
604 | .Bd -literal -offset indent | |
605 | %{ | |
606 | #undef YY_INPUT | |
607 | #define YY_INPUT(buf,result,max_size) \e | |
608 | { int c = getchar(); result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); } | |
609 | %} | |
610 | .Ed | |
611 | .Pp | |
612 | When the scanner receives an end-of-file indication from YY_INPUT, | |
613 | it then checks the | |
614 | .Fn yywrap | |
615 | function. If | |
616 | .Fn yywrap | |
617 | returns false (zero), then it is assumed that the | |
618 | function has gone ahead and set up | |
619 | .Va yyin | |
620 | to point to another input file, and scanning continues. If it returns | |
621 | true (non-zero), then the scanner terminates, returning 0 to its | |
622 | caller. | |
623 | .It Va yywrap | |
624 | The default | |
625 | .Fn yywrap | |
626 | always returns 1. Presently, to redefine it you must first | |
627 | "#undef yywrap", as it is currently implemented as a macro. It is | |
628 | likely that | |
629 | .Fn yywrap | |
630 | will soon be defined to be a function rather than a macro. | |
631 | .It Va YY_USER_ACTION | |
632 | can be redefined to provide an action | |
633 | which is always executed prior to the matched rule's action. | |
634 | .It Va YY_USER_INIT | |
635 | The macro | |
636 | .Va YY_USER_INIT | |
637 | may be redefined to provide an action which is always executed before | |
638 | the first scan. | |
639 | .It Va YY_BREAK | |
640 | In the generated scanner, the actions are all gathered in one large | |
641 | switch statement and separated using | |
642 | .Va YY_BREAK , | |
643 | which may be redefined. By default, it is simply a "break", to separate | |
644 | each rule's action from the following rule's. | |
645 | .El | |
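The YY_INPUT calling sequence described above can be exercised outside a scanner. The following is a minimal sketch, assuming an in-memory string as the input source; toy_src, toy_left, and toy_drain are hypothetical names, and YY_NULL is defined here as 0 only because the text gives that value for Unix systems. The macro places up to max_size characters in buf and sets result to the count, or to YY_NULL once the source is exhausted.

```c
#include <assert.h>
#include <string.h>

#define YY_NULL 0   /* assumption: the value the text gives for Unix systems */

/* Hypothetical in-memory input source used in place of yyin. */
static const char *toy_src = "";
static size_t toy_left = 0;

/* Copy up to max_size bytes into buf; report the count, or YY_NULL at end. */
#define YY_INPUT(buf, result, max_size) \
    do { \
        size_t n_ = toy_left < (size_t)(max_size) ? toy_left : (size_t)(max_size); \
        memcpy((buf), toy_src, n_); \
        toy_src += n_; \
        toy_left -= n_; \
        (result) = n_ > 0 ? (int)n_ : YY_NULL; \
    } while (0)

/* Drain text through YY_INPUT in chunks; returns the total bytes delivered. */
size_t toy_drain(const char *text, size_t chunk)
{
    char buf[16];
    int result;
    size_t total = 0;

    assert(chunk > 0 && chunk <= sizeof buf);
    toy_src = text;
    toy_left = strlen(text);
    for (;;) {
        YY_INPUT(buf, result, chunk);
        if (result == YY_NULL)
            break;   /* a real scanner would consult yywrap() at this point */
        total += (size_t)result;
    }
    return total;
}
```

Draining "hello world" four bytes at a time takes three successful reads followed by one YY_NULL result, which is the point where the yywrap check described above would decide whether scanning continues.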
646 | .Sh FILES | |
647 | .Bl -tag -width lex.backtrack -compact | |
648 | .It Pa lex.skel | |
649 | skeleton scanner. | |
650 | .It Pa lex.yy.c | |
651 | generated scanner | |
652 | (called | |
653 | .Pa lexyy.c | |
654 | on some systems). | |
655 | .It Pa lex.backtrack | |
656 | backtracking information for | |
657 | .Fl b | |
658 | flag | |
659 | (called | |
660 | .Pa lex.bck | |
661 | on some systems). | |
662 | .El | |
663 | .Sh SEE ALSO | |
664 | .Xr lex 1 , | |
665 | .Xr yacc 1 , | |
666 | .Xr sed 1 , | |
667 | .Xr awk 1 . | |
668 | .Rs | |
669 | .%T "lexdoc" | |
670 | .Re | |
671 | .Rs | |
672 | .%A M. E. Lesk | |
673 | .%A E. Schmidt | |
674 | .%T "LEX \- Lexical Analyzer Generator" | |
675 | .Re | |
676 | .Sh DIAGNOSTICS | |
677 | .Bl -tag -width Fl | |
678 | .It Li reject_used_but_not_detected undefined | |
679 | or | |
680 | .It Li yymore_used_but_not_detected undefined | |
681 | These errors can occur at compile time. | |
682 | They indicate that the | |
683 | scanner uses | |
684 | .Ic REJECT | |
685 | or | |
686 | .Fn yymore | |
687 | but that | |
688 | .Nm lex | |
689 | failed to notice the fact, | |
690 | meaning that | |
691 | .Nm lex | |
692 | scanned the first two sections looking for occurrences of these actions | |
693 | and failed to find any, | |
694 | but somehow you snuck some in | |
695 | (via a #include file, | |
696 | for example). | |
697 | Make an explicit reference to the action in your | |
698 | .Nm lex | |
699 | input file. | |
700 | Note that previously | |
701 | .Nm lex | |
702 | supported a | |
703 | .Li %used/%unused | |
704 | mechanism for dealing with this problem; | |
705 | this feature is still supported | |
706 | but now deprecated, | |
707 | and will go away soon unless the author hears from | |
708 | people who can argue compellingly that they need it. | |
709 | .It Li lex scanner jammed | |
710 | a scanner compiled with | |
711 | .Fl s | |
712 | has encountered an input string which wasn't matched by | |
713 | any of its rules. | |
714 | .It Li lex input buffer overflowed | |
715 | a scanner rule matched a string long enough to overflow the | |
716 | scanner's internal input buffer (16K bytes, controlled by | |
717 | .Va YY_BUF_MAX | |
718 | in | |
719 | .Pa lex.skel ) . | |
720 | .It Li scanner requires \&\-8 flag | |
721 | Your scanner specification includes recognizing 8-bit characters, but | |
722 | you did not specify the -8 flag and your site has not installed lex | |
723 | with -8 as the default. | |
724 | .It Li too many \&%t classes! | |
725 | You managed to put every single character into its own %t class. | |
726 | .Nm Lex | |
727 | requires that at least one of the classes share characters. | |
728 | .El | |
729 | .Sh HISTORY | |
730 | A | |
731 | .Nm lex | |
732 | appeared in | |
733 | .At v6 . | |
734 | The version this man page describes is | |
735 | derived from code contributed by Vern Paxson. | |
736 | .Sh AUTHOR | |
737 | Vern Paxson, with the help of many ideas and much inspiration from | |
738 | Van Jacobson. Original version by Jef Poskanzer. | |
739 | .Pp | |
740 | See | |
741 | .%T "Lexdoc" | |
742 | for additional credits and the address to send comments to. | |
743 | .Sh BUGS | |
744 | .Pp | |
745 | Some trailing context | |
746 | patterns cannot be properly matched and generate | |
747 | warning messages ("Dangerous trailing context"). These are | |
748 | patterns where the ending of the | |
749 | first part of the rule matches the beginning of the second | |
750 | part, such as "zx*/xy*", where the 'x*' matches the 'x' at | |
751 | the beginning of the trailing context. (Note that the | |
752 | .Tn POSIX | |
753 | draft | |
754 | states that the text matched by such patterns is undefined.) | |
755 | .Pp | |
756 | For some trailing context rules, parts which are actually fixed-length are | |
757 | not recognized as such, so the slower variable-length matching is used. | |
758 | In particular, parts using '\&|' or {n} (such as "foo{3}") are always | |
759 | considered variable-length. | |
760 | .Pp | |
761 | Combining trailing context with the special '\&|' action can result in | |
762 | .Em fixed | |
763 | trailing context being turned into the more expensive | |
764 | .Em variable | |
765 | trailing context. This happens in the following example: | |
766 | .Bd -literal -offset indent | |
767 | %% | |
768 | abc \&| | |
769 | xyz/def | |
770 | .Ed | |
771 | .Pp | |
772 | Use of | |
773 | .Fn unput | |
774 | invalidates yytext and yyleng. | |
775 | .Pp | |
776 | Use of | |
777 | .Fn unput | |
778 | to push back more text than was matched can | |
779 | result in the pushed-back text matching a beginning-of-line ('^') | |
780 | rule even though it didn't come at the beginning of the line | |
781 | (though this is rare!). | |
782 | .Pp | |
783 | Pattern-matching of | |
784 | .Tn NUL Ns 's | |
785 | is substantially slower than matching other | |
786 | characters. | |
787 | .Pp | |
788 | .Nm Lex | |
789 | does not generate correct #line directives for code internal | |
790 | to the scanner; thus, bugs in | |
791 | .Pa lex.skel | |
792 | yield bogus line numbers. | |
793 | .Pp | |
794 | Due to both buffering of input and read-ahead, you cannot intermix | |
795 | calls to | |
796 | .Aq Pa stdio.h | |
797 | routines, such as, for example, | |
798 | .Fn getchar , | |
799 | with | |
800 | .Nm lex | |
801 | rules and expect it to work. Call | |
802 | .Fn input | |
803 | instead. | |
804 | .Pp | |
805 | The total table entries listed by the | |
806 | .Fl v | |
807 | flag excludes the number of table entries needed to determine | |
808 | what rule has been matched. The number of entries is equal | |
809 | to the number of | |
810 | .Tn DFA | |
811 | states if the scanner does not use | |
812 | .Ic REJECT , | |
813 | and somewhat greater than the number of states if it does. | |
814 | .Pp | |
815 | .Ic REJECT | |
816 | cannot be used with the | |
817 | .Fl f | |
818 | or | |
819 | .Fl F | |
820 | options. | |
821 | .Pp | |
822 | Some of the macros, such as | |
823 | .Fn yywrap , | |
824 | may in the future become functions which live in the | |
825 | .Fl lfl | |
826 | library. This will doubtless break a lot of code, but may be | |
827 | required for | |
828 | .Tn POSIX Ns \-compliance . | |
829 | .Pp | |
830 | The | |
831 | .Nm lex | |
832 | internal algorithms need documentation. |