Commit | Line | Data |
---|---|---|
1c15e888 C |
1 | .\" Copyright (c) 1990 The Regents of the University of California. |
2 | .\" All rights reserved. | |
3 | .\" | |
4 | .\" Redistribution and use in source and binary forms are permitted provided | |
5 | .\" that: (1) source distributions retain this entire copyright notice and | |
6 | .\" comment, and (2) distributions including binaries display the following | |
7 | .\" acknowledgement: ``This product includes software developed by the | |
8 | .\" University of California, Berkeley and its contributors'' in the | |
9 | .\" documentation or other materials provided with the distribution and in | |
10 | .\" all advertising materials mentioning features or use of this software. | |
11 | .\" Neither the name of the University nor the names of its contributors may | |
12 | .\" be used to endorse or promote products derived from this software without | |
13 | .\" specific prior written permission. | |
14 | .\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED | |
15 | .\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF | |
16 | .\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. | |
17 | .\" | |
18 | .\" @(#)lex.1 5.10 (Berkeley) 7/24/90 | |
19 | .\" | |
20 | .Dd July 24, 1990 | |
21 | .Dt LEX 1 | |
22 | .Sh NAME | |
23 | .Nm lex | |
24 | .Nd fast lexical analyzer generator | |
25 | .Sh SYNOPSIS | |
26 | .Nm lex | |
27 | .Ob | |
28 | .Op Fl bcdfinpstvFILT8 | |
29 | .Cx Fl C | |
30 | .Op efmF | |
31 | .Cx | |
32 | .Cx Fl S | |
33 | .Ar skeleton | |
34 | .Cx | |
35 | .Oe | |
36 | .Nm lex | |
37 | .Ar | |
38 | .Sh DESCRIPTION | |
39 | .Nm Lex | |
40 | is a tool for generating | |
41 | .Ar scanners : | |
42 | programs which recognize lexical patterns in text. | |
43 | .Nm Lex | |
44 | reads | |
45 | the given input files, or its standard input if no file names are given, | |
46 | for a description of a scanner to generate. The description is in | |
47 | the form of pairs | |
48 | of regular expressions and C code, called | |
49 | .Em rules . | |
50 | .Nm Lex | |
51 | generates as output a C source file, | |
52 | .Pa lex.yy.c , | |
53 | which defines a routine | |
54 | .Fn yylex . | |
55 | This file is compiled and linked with the | |
56 | .Fl lfl | |
57 | library to produce an executable. When the executable is run, | |
58 | it analyzes its input for occurrences | |
59 | of the regular expressions. Whenever it finds one, it executes | |
60 | the corresponding C code. | |
61 | .Pp | |
62 | For full documentation, see | |
63 | .Em Lexdoc . | |
64 | This manual entry is intended for use as a quick reference. | |
65 | .Sh OPTIONS | |
66 | .Nm Lex | |
67 | has the following options: | |
68 | .Tw Ds | |
69 | .Tp Fl b | |
70 | Generate backtracking information to | |
71 | .Va lex.backtrack . | |
72 | This is a list of scanner states which require backtracking | |
73 | and the input characters on which they do so. By adding rules one | |
74 | can remove backtracking states. If all backtracking states | |
75 | are eliminated and | |
76 | .Fl f | |
77 | or | |
78 | .Fl F | |
79 | is used, the generated scanner will run faster. | |
80 | .Tp Fl c | |
81 | is a do-nothing, deprecated option included for POSIX compliance. | |
82 | .Pp | |
83 | .Ar NOTE : | |
84 | in previous releases of | |
85 | .Nm Lex | |
86 | .Op Fl c | |
87 | specified table-compression options. This functionality is | |
88 | now given by the | |
89 | .Fl C | |
90 | flag. To ease the impact of this change, when | |
91 | .Nm lex | |
92 | encounters | |
93 | .Fl c , | |
94 | it currently issues a warning message and assumes that | |
95 | .Fl C | |
96 | was desired instead. In the future this "promotion" of | |
97 | .Fl c | |
98 | to | |
99 | .Fl C | |
100 | will go away in the name of full POSIX compliance (unless | |
101 | the POSIX meaning is removed first). | |
102 | .Tp Fl d | |
103 | makes the generated scanner run in | |
104 | .Ar debug | |
105 | mode. Whenever a pattern is recognized and the global | |
106 | .Va yy_Lex_debug | |
107 | is non-zero (which is the default), the scanner will | |
108 | write to | |
109 | .Li stderr | |
110 | a line of the form: | |
111 | .Pp | |
112 | .Dl --accepting rule at line 53 ("the matched text") | |
113 | .Pp | |
114 | The line number refers to the location of the rule in the file | |
115 | defining the scanner (i.e., the file that was fed to lex). Messages | |
116 | are also generated when the scanner backtracks, accepts the | |
117 | default rule, reaches the end of its input buffer (or encounters | |
118 | a NUL; the two look the same as far as the scanner's concerned), | |
119 | or reaches an end-of-file. | |
120 | .Tp Fl f | |
121 | specifies (take your pick) | |
122 | .Em full table | |
123 | or | |
124 | .Em fast scanner . | |
125 | No table compression is done. The result is large but fast. | |
126 | This option is equivalent to | |
127 | .Fl Cf | |
128 | (see below). | |
129 | .Tp Fl i | |
130 | instructs | |
131 | .Nm lex | |
132 | to generate a | |
133 | .Em case-insensitive | |
134 | scanner. The case of letters given in the | |
135 | .Nm lex | |
136 | input patterns will | |
137 | be ignored, and tokens in the input will be matched regardless of case. The | |
138 | matched text given in | |
139 | .Va yytext | |
140 | will have the preserved case (i.e., it will not be folded). | |
141 | .Tp Fl n | |
142 | is another do-nothing, deprecated option included only for | |
143 | POSIX compliance. | |
144 | .Tp Fl p | |
145 | generates a performance report to stderr. The report | |
146 | consists of comments regarding features of the | |
147 | .Nm lex | |
148 | input file which will cause a loss of performance in the resulting scanner. | |
149 | .Tp Fl s | |
150 | causes the | |
151 | .Ar default rule | |
152 | (that unmatched scanner input is echoed to | |
153 | .Ar stdout ) | |
154 | to be suppressed. If the scanner encounters input that does not | |
155 | match any of its rules, it aborts with an error. | |
156 | .Tp Fl t | |
157 | instructs | |
158 | .Nm lex | |
159 | to write the scanner it generates to standard output instead | |
160 | of | |
161 | .Pa lex.yy.c . | |
162 | .Tp Fl v | |
163 | specifies that | |
164 | .Nm lex | |
165 | should write to | |
166 | .Li stderr | |
167 | a summary of statistics regarding the scanner it generates. | |
168 | .Tp Fl F | |
169 | specifies that the | |
170 | .Em fast | |
171 | scanner table representation should be used. This representation is | |
172 | about as fast as the full table representation | |
173 | .Pq Fl f , | |
174 | and for some sets of patterns will be considerably smaller (and for | |
175 | others, larger). See | |
176 | .Em Lexdoc | |
177 | for details. | |
178 | .Pp | |
179 | This option is equivalent to | |
180 | .Fl CF | |
181 | (see below). | |
182 | .Tp Fl I | |
183 | instructs | |
184 | .Nm lex | |
185 | to generate an | |
186 | .Em interactive | |
187 | scanner, that is, a scanner which stops immediately rather than | |
188 | looking ahead if it knows | |
189 | that the currently scanned text cannot be part of a longer rule's match. | |
190 | Again, see | |
191 | .Em Lexdoc | |
192 | for details. | |
193 | .Pp | |
194 | Note, | |
195 | .Fl I | |
196 | cannot be used in conjunction with | |
197 | .Em full | |
198 | or | |
199 | .Em fast tables , | |
200 | i.e., the | |
201 | .Fl f , F , Cf , | |
202 | or | |
203 | .Fl CF | |
204 | flags. | |
205 | .Tp Fl L | |
206 | instructs | |
207 | .Nm lex | |
208 | not to generate | |
209 | .Li #line | |
210 | directives in | |
211 | .Pa lex.yy.c . | |
212 | The default is to generate such directives so error | |
213 | messages in the actions will be correctly | |
214 | located with respect to the original | |
215 | .Nm lex | |
216 | input file, and not to | |
217 | the fairly meaningless line numbers of | |
218 | .Pa lex.yy.c . | |
219 | .Tp Fl T | |
220 | makes | |
221 | .Nm lex | |
222 | run in | |
223 | .Em trace | |
224 | mode. It will generate a lot of messages to | |
225 | .Li stdout | |
226 | concerning | |
227 | the form of the input and the resultant non-deterministic and deterministic | |
228 | finite automata. This option is mostly for use in maintaining | |
229 | .Nm lex . | |
230 | .Tp Fl 8 | |
231 | instructs | |
232 | .Nm lex | |
233 | to generate an 8-bit scanner. | |
234 | On some sites, this is the default. On others, the default | |
235 | is 7-bit characters. To see which is the case, check the verbose | |
236 | .Pq Fl v | |
237 | output for "equivalence classes created". If the denominator of | |
238 | the number shown is 128, then by default | |
239 | .Nm lex | |
240 | is generating 7-bit characters. If it is 256, then the default is | |
241 | 8-bit characters. | |
242 | .Tc Fl C | |
243 | .Op Cm efmF | |
244 | .Cx | |
245 | controls the degree of table compression. The default setting is | |
246 | .Fl Cem . | |
247 | .Pp | |
248 | .Tw Ds | |
249 | .Tp Fl C | |
250 | A lone | |
251 | .Fl C | |
252 | specifies that the scanner tables should be compressed but neither | |
253 | equivalence classes nor meta-equivalence classes should be used. | |
254 | .Tp Fl \&Ce | |
255 | directs | |
256 | .Nm lex | |
257 | to construct | |
258 | .Em equivalence classes , | |
259 | i.e., sets of characters | |
260 | which have identical lexical properties. | |
261 | Equivalence classes usually give | |
262 | dramatic reductions in the final table/object file sizes (typically | |
263 | a factor of 2-5) and are pretty cheap performance-wise (one array | |
264 | look-up per character scanned). | |
265 | .Tp Fl \&Cf | |
266 | specifies that the | |
267 | .Em full | |
268 | scanner tables should be generated - | |
269 | .Nm lex | |
270 | should not compress the | |
271 | tables by taking advantage of similar transition functions for | |
272 | different states. | |
273 | .Tp Fl \&CF | |
274 | specifies that the alternate fast scanner representation (described in | |
275 | .Em Lexdoc ) | |
276 | should be used. | |
277 | .Tp Fl \&Cm | |
278 | directs | |
279 | .Nm lex | |
280 | to construct | |
281 | .Em meta-equivalence classes , | |
282 | which are sets of equivalence classes (or characters, if equivalence | |
283 | classes are not being used) that are commonly used together. Meta-equivalence | |
284 | classes are often a big win when using compressed tables, but they | |
285 | have a moderate performance impact (one or two "if" tests and one | |
286 | array look-up per character scanned). | |
287 | .Tp Fl Cem | |
288 | (default) | |
289 | Generate both equivalence classes | |
290 | and meta-equivalence classes. This setting provides the highest | |
291 | degree of table compression. | |
292 | .Tp | |
293 | .Pp | |
294 | Scanner speed can be traded off against table size, with | |
295 | the following generally being true: | |
296 | .Pp | |
297 | .Ds C | |
298 | slowest & smallest | |
299 | -Cem | |
300 | -Cm | |
301 | -Ce | |
302 | -C | |
303 | -C{f,F}e | |
304 | -C{f,F} | |
305 | fastest & largest | |
306 | .De | |
307 | .Pp | |
308 | .Fl C | |
309 | options are not cumulative; whenever the flag is encountered, the | |
310 | previous -C settings are forgotten. | |
311 | .Pp | |
312 | The options | |
313 | .Fl \&Cf | |
314 | or | |
315 | .Fl \&CF | |
316 | and | |
317 | .Fl \&Cm | |
318 | do not make sense together - there is no opportunity for meta-equivalence | |
319 | classes if the table is not being compressed. Otherwise the options | |
320 | may be freely mixed. | |
321 | .Tc Fl S | |
322 | .Ar skeleton_file | |
323 | .Cx | |
324 | overrides the default skeleton file from which | |
325 | .Nm lex | |
326 | constructs its scanners. Useful for | |
327 | .Nm lex | |
328 | maintenance or development. | |
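The "one array look-up per character scanned" cost quoted for equivalence classes can be pictured with a toy table-driven matcher. This is only an illustrative sketch under stated assumptions, not the tables or code that lex generates: the names ec, next_state, build_classes and match_identifier are hypothetical, and ASCII is assumed. It recognizes [a-z][a-z0-9]* with a transition table that has three columns (one per class) instead of 256.

```c
#include <stddef.h>

/* Toy illustration of equivalence classes; NOT the tables lex emits. */
enum { EC_OTHER = 0, EC_LETTER = 1, EC_DIGIT = 2, EC_COUNT = 3 };
enum { ST_START = 0, ST_IDENT = 1, ST_DEAD = 2, ST_COUNT = 3 };

/* Maps each character to its class: the extra look-up per character. */
static unsigned char ec[256];

/* Transition table indexed by [state][class]: 3 columns, not 256. */
static const unsigned char next_state[ST_COUNT][EC_COUNT] = {
    /* ST_START */ { ST_DEAD, ST_IDENT, ST_DEAD  },
    /* ST_IDENT */ { ST_DEAD, ST_IDENT, ST_IDENT },
    /* ST_DEAD  */ { ST_DEAD, ST_DEAD,  ST_DEAD  },
};

/* Fill the class map; must be called once before matching. */
void build_classes(void)
{
    int c;
    for (c = 0; c < 256; c++) ec[c] = EC_OTHER;
    for (c = 'a'; c <= 'z'; c++) ec[c] = EC_LETTER;   /* assumes ASCII */
    for (c = '0'; c <= '9'; c++) ec[c] = EC_DIGIT;
}

/* Return the length of the longest prefix of s matching [a-z][a-z0-9]*. */
size_t match_identifier(const char *s)
{
    int state = ST_START;
    size_t i = 0, last_accept = 0;

    while (s[i] != '\0') {
        state = next_state[state][ec[(unsigned char)s[i]]];
        if (state == ST_DEAD)
            break;
        i++;
        if (state == ST_IDENT)      /* ST_IDENT is the accepting state */
            last_accept = i;
    }
    return last_accept;
}
```

All lowercase letters behave identically in this pattern, so they share one column; that sharing is what shrinks the table.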
329 | .Sh SUMMARY OF Lex REGULAR EXPRESSIONS | |
330 | The patterns in the input are written using an extended set of regular | |
331 | expressions. These are: | |
332 | .Pp | |
333 | .Dw 8n | |
334 | .Di L | |
335 | .Dp Li x | |
336 | match the character 'x' | |
337 | .Dp Li \&. | |
338 | any character except newline | |
339 | .Dp Op Li xyz | |
340 | a "character class"; in this case, the pattern | |
341 | matches either an 'x', a 'y', or a 'z' | |
342 | .Dp Op Li abj-oZ | |
343 | a "character class" with a range in it; matches | |
344 | an 'a', a 'b', any letter from 'j' through 'o', | |
345 | or a 'Z' | |
346 | .Dp Op \&Li ^A-Z | |
347 | a "negated character class", i.e., any character | |
348 | but those in the class. In this case, any | |
349 | character EXCEPT an uppercase letter. | |
350 | .Dp Op \&Li ^A-Z\en | |
351 | any character EXCEPT an uppercase letter or | |
352 | a newline | |
353 | .Dp Li r* | |
354 | zero or more r's, where r is any regular expression | |
355 | .Dp Li r+ | |
356 | one or more r's | |
357 | .Dp Li r? | |
358 | zero or one r's (that is, "an optional r") | |
359 | .Dp Li r{2,5} | |
360 | anywhere from two to five r's | |
361 | .Dp Li r{2,} | |
362 | two or more r's | |
363 | .Dp Li r{4} | |
364 | exactly 4 r's | |
365 | .Dp Li {name} | |
366 | the expansion of the "name" definition | |
367 | (see above) | |
368 | .Dc Op Li xyz | |
369 | .Li \&\e"foo" | |
370 | .Cx | |
371 | the literal string: | |
372 | [xyz]"foo | |
373 | .Dp Li \&\eX | |
374 | if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', | |
375 | then the ANSI-C interpretation of \ex. | |
376 | Otherwise, a literal 'X' (used to escape | |
377 | operators such as '*') | |
378 | .Dp Li \&\e123 | |
379 | the character with octal value 123 | |
380 | .Dp Li \&\ex2a | |
381 | the character with hexadecimal value 2a | |
382 | .Dp Li (r) | |
383 | match an r; parentheses are used to override | |
384 | precedence (see below) | |
385 | .Dp Li rs | |
386 | the regular expression r followed by the | |
387 | regular expression s; called "concatenation" | |
388 | .Dp Li r\&|s | |
389 | either an r or an s | |
390 | .Dp Li r/s | |
391 | an r but only if it is followed by an s. The | |
392 | s is not part of the matched text. This type | |
393 | of pattern is called "trailing context". | |
394 | .Dp Li \&^r | |
395 | an r, but only at the beginning of a line | |
396 | .Dp Li r$ | |
397 | an r, but only at the end of a line. Equivalent | |
398 | to "r/\en". | |
399 | .Dp Li <s>r | |
400 | an r, but only in start condition s (see | |
401 | below for discussion of start conditions) | |
402 | .Dp Li <s1,s2,s3>r | |
403 | same, but in any of start conditions s1, | |
404 | s2, or s3 | |
405 | .Dp Li <<EOF>> | |
406 | an end-of-file | |
407 | .Dp Li <s1,s2><<EOF>> | |
408 | an end-of-file when in start condition s1 or s2 | |
409 | .Dp | |
410 | The regular expressions listed above are grouped according to | |
411 | precedence, from highest precedence at the top to lowest at the bottom. | |
412 | Those grouped together have equal precedence. | |
413 | .Pp | |
414 | Some notes on patterns: | |
415 | .Pp | |
416 | Negated character classes | |
417 | .Ar match newlines | |
418 | unless "\en" (or an equivalent escape sequence) is one of the | |
419 | characters explicitly present in the negated character class | |
420 | (e.g., " [^A-Z\en] "). | |
421 | .Pp | |
422 | A rule can have at most one instance of trailing context (the '/' operator | |
423 | or the '$' operator). The start condition, '^', and "<<EOF>>" patterns | |
424 | can only occur at the beginning of a pattern, and, as well as with '/' and '$', | |
425 | cannot be grouped inside parentheses. The following are all illegal: | |
426 | .Pp | |
427 | .Ds C | |
428 | foo/bar$ | |
429 | foo(bar$) | |
430 | foo^bar | |
431 | <sc1>foo<sc2>bar | |
432 | .De | |
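The character-class entries and the note about negated classes matching newlines can be restated as plain C predicates. This is a sketch for illustration only; the function names are hypothetical and the range tests assume ASCII.

```c
#include <stdbool.h>

/* [abj-oZ]: an 'a', a 'b', any letter from 'j' through 'o', or a 'Z'. */
bool in_abj_oZ(int c)
{
    return c == 'a' || c == 'b' || (c >= 'j' && c <= 'o') || c == 'Z';
}

/* [^A-Z]: anything but an uppercase letter; note that '\n' is accepted. */
bool in_not_upper(int c)
{
    return !(c >= 'A' && c <= 'Z');
}

/* [^A-Z\n]: as above, but the newline is excluded explicitly. */
bool in_not_upper_or_newline(int c)
{
    return in_not_upper(c) && c != '\n';
}
```

The second predicate shows why a pattern such as [^A-Z]+ can run across line boundaries unless the newline is listed in the class.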
433 | .Sh SUMMARY OF SPECIAL ACTIONS | |
434 | In addition to arbitrary C code, the following can appear in actions: | |
435 | .Tw Fl | |
436 | .Tp Ic ECHO | |
437 | Copies | |
438 | .Va yytext | |
439 | to the scanner's output. | |
440 | .Tp Ic BEGIN | |
441 | Followed by the name of a start condition places the scanner in the | |
442 | corresponding start condition. | |
443 | .Tp Ic REJECT | |
444 | Directs the scanner to proceed on to the "second best" rule which matched the | |
445 | input (or a prefix of the input). | |
446 | .Va yytext | |
447 | and | |
448 | .Va yyleng | |
449 | are set up appropriately. Note that | |
450 | .Ic REJECT | |
451 | is a particularly expensive feature in terms of scanner performance; | |
452 | if it is used in | |
453 | .Em any | |
454 | of the scanner's actions it will slow down | |
455 | .Em all | |
456 | of the scanner's matching. Furthermore, | |
457 | .Ic REJECT | |
458 | cannot be used with the | |
459 | .Fl f | |
460 | or | |
461 | .Fl F | |
462 | options. | |
463 | .Pp | |
464 | Note also that unlike the other special actions, | |
465 | .Ic REJECT | |
466 | is a | |
467 | .Em branch ; | |
468 | code immediately following it in the action will | |
469 | .Em not | |
470 | be executed. | |
471 | .Tp Fn yymore | |
472 | tells the scanner that the next time it matches a rule, the corresponding | |
473 | token should be | |
474 | .Em appended | |
475 | onto the current value of | |
476 | .Va yytext | |
477 | rather than replacing it. | |
478 | .Tp Fn yyless \&n | |
479 | returns all but the first | |
480 | .Ar n | |
481 | characters of the current token back to the input stream, where they | |
482 | will be rescanned when the scanner looks for the next match. | |
483 | .Va yytext | |
484 | and | |
485 | .Va yyleng | |
486 | are adjusted appropriately (e.g., | |
487 | .Va yyleng | |
488 | will now be equal to | |
489 | .Ar n ) . | |
490 | .Tp Fn unput c | |
491 | puts the character | |
492 | .Ar c | |
493 | back onto the input stream. It will be the next character scanned. | |
494 | .Tp Fn input | |
495 | reads the next character from the input stream (this routine is called | |
496 | .Fn yyinput | |
497 | if the scanner is compiled using | |
498 | .Em C \&+\&+ ) . | |
499 | .Tp Fn yyterminate | |
500 | can be used in lieu of a return statement in an action. It terminates | |
501 | the scanner and returns a 0 to the scanner's caller, indicating "all done". | |
502 | .Pp | |
503 | By default, | |
504 | .Fn yyterminate | |
505 | is also called when an end-of-file is encountered. It is a macro and | |
506 | may be redefined. | |
507 | .Tp Ic YY_NEW_FILE | |
508 | is an action available only in <<EOF>> rules. It means "Okay, I've | |
509 | set up a new input file, continue scanning". | |
510 | .Tp Fn yy_create_buffer file size | |
511 | takes a | |
512 | .Ic FILE | |
513 | pointer and an integer | |
514 | .Ar size . | |
515 | It returns a YY_BUFFER_STATE | |
516 | handle to a new input buffer large enough to accommodate | |
517 | .Ar size | |
518 | characters and associated with the given file. When in doubt, use | |
519 | .Ar YY_BUF_SIZE | |
520 | for the size. | |
521 | .Tp Fn yy_switch_to_buffer new_buffer | |
522 | switches the scanner's processing to scan for tokens from | |
523 | the given buffer, which must be a YY_BUFFER_STATE. | |
524 | .Tp Fn yy_delete_buffer buffer | |
525 | deletes the given buffer. | |
526 | .Tp | |
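The effect of yyless(n) on the token text, its length, and the input position can be modelled with a toy structure. This is a hypothetical sketch, far simpler than a real scanner: struct toy_scanner, toy_match and toy_yyless are invented names, and the text and leng fields merely play the roles of yytext and yyleng.

```c
#include <stddef.h>
#include <string.h>

/* Toy scanner state; hypothetical, not what lex generates. */
struct toy_scanner {
    const char *input;  /* whole input, NUL-terminated */
    size_t pos;         /* index of the next unscanned character */
    char text[64];      /* plays the role of yytext */
    size_t leng;        /* plays the role of yyleng */
};

/* Pretend a rule matched the next n characters of the input. */
void toy_match(struct toy_scanner *s, size_t n)
{
    size_t avail = strlen(s->input + s->pos);

    if (n > avail)
        n = avail;
    if (n > sizeof s->text - 1)
        n = sizeof s->text - 1;
    memcpy(s->text, s->input + s->pos, n);
    s->text[n] = '\0';
    s->leng = n;
    s->pos += n;
}

/* Model of yyless(n): keep the first n characters, rescan the rest. */
void toy_yyless(struct toy_scanner *s, size_t n)
{
    if (n > s->leng)
        return;                 /* nothing to give back */
    s->pos -= s->leng - n;      /* the tail returns to the input */
    s->text[n] = '\0';
    s->leng = n;
}
```

After matching "foobar" and calling toy_yyless with 3, the token is "foo" and the next match starts at "bar", which mirrors the rescanning described for yyless.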
527 | .Sh \&VALUES\ AVAILABLE\ TO THE USER | |
528 | .Tw Fl | |
529 | .Tp Va \&char \&*yytext | |
530 | holds the text of the current token. It may not be modified. | |
531 | .Tp Va \&int yyleng | |
532 | holds the length of the current token. It may not be modified. | |
533 | .Tp Va FILE \&*yyin | |
534 | is the file which by default | |
535 | .Nm lex | |
536 | reads from. It may be redefined but doing so only makes sense before | |
537 | scanning begins. Changing it in the middle of scanning will have | |
538 | unexpected results since | |
539 | .Nm lex | |
540 | buffers its input. Once scanning terminates because an end-of-file | |
541 | has been seen, | |
542 | .Fn void\ yyrestart FILE\ *new_file | |
543 | may be called to point | |
544 | .Va yyin | |
545 | at the new input file. | |
546 | .Tp Va FILE \&*yyout | |
547 | is the file to which | |
548 | .Ar ECHO | |
549 | actions are done. It can be reassigned by the user. | |
550 | .Tp Va YY_CURRENT_BUFFER | |
551 | returns a | |
552 | YY_BUFFER_STATE | |
553 | handle to the current buffer. | |
554 | .Tp | |
555 | .Sh MACROS THE USER CAN REDEFINE | |
556 | .Tw Fl | |
557 | .Tp Va YY_DECL | |
558 | controls how the scanning routine is declared. | |
559 | By default, it is "int yylex()", or, if prototypes are being | |
560 | used, "int yylex(void)". This definition may be changed by redefining | |
561 | the "YY_DECL" macro. Note that | |
562 | if you give arguments to the scanning routine using a | |
563 | K&R-style/non-prototyped function declaration, you must terminate | |
564 | the definition with a semi-colon (;). | |
565 | .Tp Va YY_INPUT | |
566 | The nature of how the scanner | |
567 | gets its input can be controlled by redefining the | |
568 | YY_INPUT | |
569 | macro. | |
570 | YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its | |
571 | action is to place up to | |
572 | .Ar max_size | |
573 | characters in the character array | |
574 | .Ar buf | |
575 | and return in the integer variable | |
576 | .Ar result | |
577 | either the | |
578 | number of characters read or the constant YY_NULL (0 on Unix systems) | |
579 | to indicate EOF. The default YY_INPUT reads from the | |
580 | global file-pointer "yyin". | |
581 | A sample redefinition of YY_INPUT (in the definitions | |
582 | section of the input file): | |
583 | .Pp | |
584 | .Ds I | |
585 | %{ | |
586 | #undef YY_INPUT | |
587 | #define YY_INPUT(buf,result,max_size) \\ | |
588 | result = ((buf[0] = getchar()) == EOF) ? YY_NULL : 1; | |
589 | %} | |
590 | .De | |
591 | .Tp Va YY_INPUT | |
592 | When the scanner receives an end-of-file indication from YY_INPUT, | |
593 | it then checks the | |
594 | .Fn yywrap | |
595 | function. If | |
596 | .Fn yywrap | |
597 | returns false (zero), then it is assumed that the | |
598 | function has gone ahead and set up | |
599 | .Va yyin | |
600 | to point to another input file, and scanning continues. If it returns | |
601 | true (non-zero), then the scanner terminates, returning 0 to its | |
602 | caller. | |
603 | .Tp Va yywrap | |
604 | The default | |
605 | .Fn yywrap | |
606 | always returns 1. Presently, to redefine it you must first | |
607 | "#undef yywrap", as it is currently implemented as a macro. It is | |
608 | likely that | |
609 | .Fn yywrap | |
610 | will soon be defined to be a function rather than a macro. | |
611 | .Tp Va YY_USER_ACTION | |
612 | can be redefined to provide an action | |
613 | which is always executed prior to the matched rule's action. | |
614 | .Tp Va YY_USER_INIT | |
615 | The macro | |
616 | .Va YY_USER_INIT | |
617 | may be redefined to provide an action which is always executed before | |
618 | the first scan. | |
619 | .Tp Va YY_BREAK | |
620 | In the generated scanner, the actions are all gathered in one large | |
621 | switch statement and separated using | |
622 | .Va YY_BREAK , | |
623 | which may be redefined. By default, it is simply a "break", to separate | |
624 | each rule's action from the following rule's. | |
625 | .Tp | |
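The YY_INPUT calling sequence described above (fill buf with up to max_size characters, store the count or YY_NULL in result) can be exercised outside a generated scanner. The following is a sketch under stated assumptions: YY_NULL is defined here as a stand-in, src, set_source and read_chunk are hypothetical helpers, and the macro is assumed to be invoked as a statement followed by a semicolon. It reads from an in-memory string rather than from yyin.

```c
#include <stddef.h>
#include <string.h>

#define YY_NULL 0   /* stand-in; a generated scanner supplies its own */

/* Hypothetical: the text that remains to be handed to the scanner. */
static const char *src = "";

void set_source(const char *text)
{
    src = text;
}

/* Copy up to max_size characters from src into buf; YY_NULL at the end. */
#define YY_INPUT(buf, result, max_size) \
    do { \
        size_t n_ = strlen(src); \
        if (n_ > (size_t)(max_size)) \
            n_ = (size_t)(max_size); \
        memcpy((buf), src, n_); \
        src += n_; \
        (result) = n_ ? (int)n_ : YY_NULL; \
    } while (0)

/* Wrapper so the macro can be tried without a generated scanner. */
int read_chunk(char *buf, int max_size)
{
    int result;

    YY_INPUT(buf, result, max_size);
    return result;
}
```

In a real lex input file the #undef and #define would sit inside %{ and %} in the definitions section, as in the sample redefinition shown for YY_INPUT.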
626 | .Sh FILES | |
627 | .Dw lex.backtrack | |
628 | .Di L | |
629 | .Dp Pa lex.skel | |
630 | skeleton scanner. | |
631 | .Dp Pa lex.yy.c | |
632 | generated scanner | |
633 | (called | |
634 | .Pa lexyy.c | |
635 | on some systems). | |
636 | .Dp Pa lex.backtrack | |
637 | backtracking information for | |
638 | .Fl b | |
639 | flag | |
640 | (called | |
641 | .Pa lex.bck | |
642 | on some systems). | |
643 | .Dp | |
644 | .Sh SEE ALSO | |
645 | .Xr lex 1 , | |
646 | .Xr yacc 1 , | |
647 | .Xr sed 1 , | |
648 | .Xr awk 1 . | |
649 | .br | |
650 | .Em lexdoc | |
651 | .br | |
652 | M. | |
653 | E. | |
654 | Lesk and E. | |
655 | Schmidt, | |
656 | .Em LEX \- Lexical Analyzer Generator | |
657 | .Sh DIAGNOSTICS | |
658 | .Tw Fl | |
659 | .Tp Li reject_used_but_not_detected undefined | |
660 | or | |
661 | .Tp Li yymore_used_but_not_detected undefined | |
662 | These errors can occur at compile time. | |
663 | They indicate that the | |
664 | scanner uses | |
665 | .Ic REJECT | |
666 | or | |
667 | .Fn yymore | |
668 | but that | |
669 | .Nm lex | |
670 | failed to notice the fact, | |
671 | meaning that | |
672 | .Nm lex | |
673 | scanned the first two sections looking for occurrences of these actions | |
674 | and failed to find any, | |
675 | but somehow you snuck some in via a #include | |
676 | file, | |
677 | for example. | |
678 | Make an explicit reference to the action in your | |
679 | .Nm lex | |
680 | input file. | |
681 | Note that previously | |
682 | .Nm lex | |
683 | supported a | |
684 | .Li %used/%unused | |
685 | mechanism for dealing with this problem; | |
686 | this feature is still supported | |
687 | but now deprecated, | |
688 | and will go away soon unless the author hears from | |
689 | people who can argue compellingly that they need it. | |
690 | .Tp Li lex scanner jammed | |
691 | a scanner compiled with | |
692 | .Fl s | |
693 | has encountered an input string which wasn't matched by | |
694 | any of its rules. | |
695 | .Tp Li lex input buffer overflowed | |
696 | a scanner rule matched a string long enough to overflow the | |
697 | scanner's internal input buffer (16K bytes, controlled by | |
698 | .Va YY_BUF_MAX | |
699 | in | |
700 | .Pa lex.skel ) . | |
701 | .Tp Li scanner requires \&\-8 flag | |
702 | Your scanner specification includes recognizing 8-bit characters and | |
703 | you did not specify the -8 flag and your site has not installed lex | |
704 | with -8 as the default. | |
705 | .Tp Li too many \&%t classes! | |
706 | You managed to put every single character into its own %t class. | |
707 | .Nm Lex | |
708 | requires that at least one of the classes share characters. | |
709 | .Tp | |
710 | .Sh HISTORY | |
711 | A | |
712 | .Nm lex | |
713 | appeared in Version 6 AT&T Unix. | |
714 | The version this man page describes is | |
715 | derived from code contributed by Vern Paxson. | |
716 | .Sh AUTHOR | |
717 | Vern Paxson, with the help of many ideas and much inspiration from | |
718 | Van Jacobson. Original version by Jef Poskanzer. | |
719 | .Pp | |
720 | See | |
721 | .Em Lexdoc | |
722 | for additional credits and the address to send comments to. | |
723 | .Sh BUGS | |
724 | .Pp | |
725 | Some trailing context | |
726 | patterns cannot be properly matched and generate | |
727 | warning messages ("Dangerous trailing context"). These are | |
728 | patterns where the ending of the | |
729 | first part of the rule matches the beginning of the second | |
730 | part, such as "zx*/xy*", where the 'x*' matches the 'x' at | |
731 | the beginning of the trailing context. (Note that the POSIX draft | |
732 | states that the text matched by such patterns is undefined.) | |
733 | .Pp | |
734 | For some trailing context rules, parts which are actually fixed-length are | |
735 | not recognized as such, leading to the performance loss of variable-length trailing context. | |
736 | In particular, parts using '\&|' or {n} (such as "foo{3}") are always | |
737 | considered variable-length. | |
738 | .Pp | |
739 | Combining trailing context with the special '\&|' action can result in | |
740 | .Em fixed | |
741 | trailing context being turned into the more expensive | |
742 | .Em variable | |
743 | trailing context. This happens in the following example: | |
744 | .Pp | |
745 | .Ds C | |
746 | %% | |
747 | abc \&| | |
748 | xyz/def | |
749 | .De | |
750 | .Pp | |
751 | Use of | |
752 | .Fn unput | |
753 | invalidates yytext and yyleng. | |
754 | .Pp | |
755 | Use of | |
756 | .Fn unput | |
757 | to push back more text than was matched can | |
758 | result in the pushed-back text matching a beginning-of-line ('^') | |
759 | rule even though it didn't come at the beginning of the line | |
760 | (though this is rare!). | |
761 | .Pp | |
762 | Pattern-matching of NUL's is substantially slower than matching other | |
763 | characters. | |
764 | .Pp | |
765 | .Nm Lex | |
766 | does not generate correct #line directives for code internal | |
767 | to the scanner; thus, bugs in | |
768 | .Pa lex.skel | |
769 | yield bogus line numbers. | |
770 | .Pp | |
771 | Due to both buffering of input and read-ahead, you cannot intermix | |
772 | calls to <stdio.h> routines, such as | |
773 | .Fn getchar , | |
774 | with | |
775 | .Nm lex | |
776 | rules and expect it to work. Call | |
777 | .Fn input | |
778 | instead. | |
779 | .Pp | |
780 | The total number of table entries listed by the | |
781 | .Fl v | |
782 | flag excludes the number of table entries needed to determine | |
783 | what rule has been matched. The number of entries is equal | |
784 | to the number of DFA states if the scanner does not use | |
785 | .Ic REJECT , | |
786 | and somewhat greater than the number of states if it does. | |
787 | .Pp | |
788 | .Ic REJECT | |
789 | cannot be used with the | |
790 | .Fl f | |
791 | or | |
792 | .Fl F | |
793 | options. | |
794 | .Pp | |
795 | Some of the macros, such as | |
796 | .Fn yywrap , | |
797 | may in the future become functions which live in the | |
798 | .Fl lfl | |
799 | library. This will doubtless break a lot of code, but may be | |
800 | required for POSIX-compliance. | |
801 | .Pp | |
802 | The | |
803 | .Nm lex | |
804 | internal algorithms need documentation. |