Commit | Line | Data |
---|---|---|
78ed81a3 | 1 | .TH FLEX 1 "26 May 1990" "Version 2.3" |
2 | .SH NAME | |
3 | flex - fast lexical analyzer generator | |
4 | .SH SYNOPSIS | |
5 | .B flex | |
6 | .B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton] | |
7 | .I [filename ...] | |
8 | .SH DESCRIPTION | |
9 | .I flex | |
15637ed4 | 10 | is a tool for generating |
78ed81a3 | 11 | .I scanners: |
15637ed4 | 12 | programs which recognize lexical patterns in text. |
78ed81a3 | 13 | .I flex |
15637ed4 RG |
14 | reads |
15 | the given input files, or its standard input if no file names are given, | |
16 | for a description of a scanner to generate. The description is in | |
17 | the form of pairs | |
18 | of regular expressions and C code, called | |
78ed81a3 | 19 | .I rules. flex |
15637ed4 | 20 | generates as output a C source file, |
78ed81a3 | 21 | .B lex.yy.c, |
15637ed4 | 22 | which defines a routine |
78ed81a3 | 23 | .B yylex(). |
15637ed4 | 24 | This file is compiled and linked with the |
78ed81a3 | 25 | .B -lfl |
15637ed4 RG |
26 | library to produce an executable. When the executable is run, |
27 | it analyzes its input for occurrences | |
28 | of the regular expressions. Whenever it finds one, it executes | |
29 | the corresponding C code. | |
78ed81a3 | 30 | .LP |
15637ed4 | 31 | For full documentation, see |
78ed81a3 | 32 | .B flexdoc(1). |
15637ed4 | 33 | This manual entry is intended for use as a quick reference. |
78ed81a3 | 34 | .SH OPTIONS |
35 | .I flex | |
15637ed4 | 36 | has the following options: |
78ed81a3 | 37 | .TP |
38 | .B -b | |
15637ed4 | 39 | Generate backtracking information to |
78ed81a3 | 40 | .I lex.backtrack. |
15637ed4 RG |
41 | This is a list of scanner states which require backtracking |
42 | and the input characters on which they do so. By adding rules one | |
43 | can remove backtracking states. If all backtracking states | |
44 | are eliminated and | |
78ed81a3 | 45 | .B -f |
15637ed4 | 46 | or |
78ed81a3 | 47 | .B -F |
15637ed4 | 48 | is used, the generated scanner will run faster. |
78ed81a3 | 49 | .TP |
50 | .B -c | |
51 | is a do-nothing, deprecated option included for POSIX compliance. | |
52 | .IP | |
53 | .B NOTE: | |
15637ed4 | 54 | in previous releases of |
78ed81a3 | 55 | .I flex |
56 | .B -c | |
15637ed4 RG |
57 | specified table-compression options. This functionality is |
58 | now given by the | |
78ed81a3 | 59 | .B -C |
15637ed4 | 60 | flag. To ease the impact of this change, when |
78ed81a3 | 61 | .I flex |
15637ed4 | 62 | encounters |
78ed81a3 | 63 | .B -c, |
15637ed4 | 64 | it currently issues a warning message and assumes that |
78ed81a3 | 65 | .B -C |
15637ed4 | 66 | was desired instead. In the future this "promotion" of |
78ed81a3 | 67 | .B -c |
15637ed4 | 68 | to |
78ed81a3 | 69 | .B -C |
70 | will go away in the name of full POSIX compliance (unless | |
71 | the POSIX meaning is removed first). | |
72 | .TP | |
73 | .B -d | |
74 | makes the generated scanner run in | |
75 | .I debug | |
15637ed4 | 76 | mode. Whenever a pattern is recognized and the global |
78ed81a3 | 77 | .B yy_flex_debug |
15637ed4 RG |
78 | is non-zero (which is the default), the scanner will |
79 | write to | |
78ed81a3 | 80 | .I stderr |
15637ed4 | 81 | a line of the form: |
78ed81a3 | 82 | .nf |
83 | ||
84 | --accepting rule at line 53 ("the matched text") | |
85 | ||
86 | .fi | |
15637ed4 | 87 | The line number refers to the location of the rule in the file |
78ed81a3 | 88 | defining the scanner (i.e., the file that was fed to flex). Messages |
15637ed4 RG |
89 | are also generated when the scanner backtracks, accepts the |
90 | default rule, reaches the end of its input buffer (or encounters | |
78ed81a3 | 91 | a NUL; the two look the same as far as the scanner's concerned), |
15637ed4 | 92 | or reaches an end-of-file. |
78ed81a3 | 93 | .TP |
94 | .B -f | |
95 | specifies (take your pick) | |
96 | .I full table | |
15637ed4 | 97 | or |
78ed81a3 | 98 | .I fast scanner. |
15637ed4 RG |
99 | No table compression is done. The result is large but fast. |
100 | This option is equivalent to | |
78ed81a3 | 101 | .B -Cf |
15637ed4 | 102 | (see below). |
78ed81a3 | 103 | .TP |
104 | .B -i | |
105 | instructs | |
106 | .I flex | |
15637ed4 | 107 | to generate a |
78ed81a3 | 108 | .I case-insensitive |
15637ed4 | 109 | scanner. The case of letters given in the |
78ed81a3 | 110 | .I flex |
15637ed4 RG |
111 | input patterns will |
112 | be ignored, and tokens in the input will be matched regardless of case. The | |
113 | matched text given in | |
78ed81a3 | 114 | .I yytext |
15637ed4 | 115 | will have the preserved case (i.e., it will not be folded). |
78ed81a3 | 116 | .TP |
117 | .B -n | |
118 | is another do-nothing, deprecated option included only for | |
119 | POSIX compliance. | |
120 | .TP | |
121 | .B -p | |
122 | generates a performance report to stderr. The report | |
15637ed4 | 123 | consists of comments regarding features of the |
78ed81a3 | 124 | .I flex |
15637ed4 | 125 | input file which will cause a loss of performance in the resulting scanner. |
78ed81a3 | 126 | .TP |
127 | .B -s | |
128 | causes the | |
129 | .I default rule | |
15637ed4 | 130 | (that unmatched scanner input is echoed to |
78ed81a3 | 131 | .I stdout) |
15637ed4 RG |
132 | to be suppressed. If the scanner encounters input that does not |
133 | match any of its rules, it aborts with an error. | |
78ed81a3 | 134 | .TP |
135 | .B -t | |
136 | instructs | |
137 | .I flex | |
15637ed4 RG |
138 | to write the scanner it generates to standard output instead |
139 | of | |
78ed81a3 | 140 | .B lex.yy.c. |
141 | .TP | |
142 | .B -v | |
143 | specifies that | |
144 | .I flex | |
15637ed4 | 145 | should write to |
78ed81a3 | 146 | .I stderr |
15637ed4 | 147 | a summary of statistics regarding the scanner it generates. |
78ed81a3 | 148 | .TP |
149 | .B -F | |
150 | specifies that the | |
151 | .ul | |
152 | fast | |
15637ed4 RG |
153 | scanner table representation should be used. This representation is |
154 | about as fast as the full table representation | |
78ed81a3 | 155 | .ul |
156 | (-f), | |
15637ed4 RG |
157 | and for some sets of patterns will be considerably smaller (and for |
158 | others, larger). See | |
78ed81a3 | 159 | .B flexdoc(1) |
15637ed4 | 160 | for details. |
78ed81a3 | 161 | .IP |
15637ed4 | 162 | This option is equivalent to |
78ed81a3 | 163 | .B -CF |
15637ed4 | 164 | (see below). |
78ed81a3 | 165 | .TP |
166 | .B -I | |
167 | instructs | |
168 | .I flex | |
15637ed4 | 169 | to generate an |
78ed81a3 | 170 | .I interactive |
15637ed4 RG |
171 | scanner, that is, a scanner which stops immediately rather than |
172 | looking ahead if it knows | |
173 | that the currently scanned text cannot be part of a longer rule's match. | |
174 | Again, see | |
78ed81a3 | 175 | .B flexdoc(1) |
15637ed4 | 176 | for details. |
78ed81a3 | 177 | .IP |
15637ed4 | 178 | Note, |
78ed81a3 | 179 | .B -I |
15637ed4 | 180 | cannot be used in conjunction with |
78ed81a3 | 181 | .I full |
15637ed4 | 182 | or |
78ed81a3 | 183 | .I fast tables, |
15637ed4 | 184 | i.e., the |
78ed81a3 | 185 | .B -f, -F, -Cf, |
15637ed4 | 186 | or |
78ed81a3 | 187 | .B -CF |
15637ed4 | 188 | flags. |
78ed81a3 | 189 | .TP |
190 | .B -L | |
191 | instructs | |
192 | .I flex | |
15637ed4 | 193 | not to generate |
78ed81a3 | 194 | .B #line |
15637ed4 | 195 | directives in |
78ed81a3 | 196 | .B lex.yy.c. |
15637ed4 RG |
197 | The default is to generate such directives so error |
198 | messages in the actions will be correctly | |
199 | located with respect to the original | |
78ed81a3 | 200 | .I flex |
15637ed4 RG |
201 | input file, and not to |
202 | the fairly meaningless line numbers of | |
78ed81a3 | 203 | .B lex.yy.c. |
204 | .TP | |
205 | .B -T | |
206 | makes | |
207 | .I flex | |
15637ed4 | 208 | run in |
78ed81a3 | 209 | .I trace |
15637ed4 | 210 | mode. It will generate a lot of messages to |
78ed81a3 | 211 | .I stdout |
15637ed4 RG |
212 | concerning |
213 | the form of the input and the resultant non-deterministic and deterministic | |
214 | finite automata. This option is mostly for use in maintaining | |
78ed81a3 | 215 | .I flex. |
216 | .TP | |
217 | .B -8 | |
218 | instructs | |
219 | .I flex | |
15637ed4 RG |
220 | to generate an 8-bit scanner. |
221 | On some sites, this is the default. On others, the default | |
222 | is 7-bit characters. To see which is the case, check the verbose | |
78ed81a3 | 223 | .B (-v) |
15637ed4 RG |
224 | output for "equivalence classes created". If the denominator of |
225 | the number shown is 128, then by default | |
78ed81a3 | 226 | .I flex |
15637ed4 RG |
227 | is generating 7-bit characters. If it is 256, then the default is |
228 | 8-bit characters. | |
78ed81a3 | 229 | .TP |
230 | .B -C[efmF] | |
231 | controls the degree of table compression. | |
232 | .IP | |
233 | .B -Ce | |
234 | directs | |
235 | .I flex | |
15637ed4 | 236 | to construct |
78ed81a3 | 237 | .I equivalence classes, |
15637ed4 RG |
238 | i.e., sets of characters |
239 | which have identical lexical properties. | |
240 | Equivalence classes usually give | |
241 | dramatic reductions in the final table/object file sizes (typically | |
242 | a factor of 2-5) and are pretty cheap performance-wise (one array | |
243 | look-up per character scanned). | |
78ed81a3 | 244 | .IP |
245 | .B -Cf | |
246 | specifies that the | |
247 | .I full | |
15637ed4 | 248 | scanner tables should be generated - |
78ed81a3 | 249 | .I flex |
15637ed4 RG |
250 | should not compress the |
251 | tables by taking advantage of similar transition functions for | |
252 | different states. | |
78ed81a3 | 253 | .IP |
254 | .B -CF | |
255 | specifies that the alternate fast scanner representation (described in | |
256 | .B flexdoc(1)) | |
15637ed4 | 257 | should be used. |
78ed81a3 | 258 | .IP |
259 | .B -Cm | |
260 | directs | |
261 | .I flex | |
15637ed4 | 262 | to construct |
78ed81a3 | 263 | .I meta-equivalence classes, |
15637ed4 RG |
264 | which are sets of equivalence classes (or characters, if equivalence |
265 | classes are not being used) that are commonly used together. Meta-equivalence | |
266 | classes are often a big win when using compressed tables, but they | |
267 | have a moderate performance impact (one or two "if" tests and one | |
268 | array look-up per character scanned). | |
78ed81a3 | 269 | .IP |
270 | A lone | |
271 | .B -C | |
272 | specifies that the scanner tables should be compressed but neither | |
273 | equivalence classes nor meta-equivalence classes should be used. | |
274 | .IP | |
15637ed4 | 275 | The options |
78ed81a3 | 276 | .B -Cf |
15637ed4 | 277 | or |
78ed81a3 | 278 | .B -CF |
15637ed4 | 279 | and |
78ed81a3 | 280 | .B -Cm |
15637ed4 RG |
281 | do not make sense together - there is no opportunity for meta-equivalence |
282 | classes if the table is not being compressed. Otherwise the options | |
283 | may be freely mixed. | |
78ed81a3 | 284 | .IP |
285 | The default setting is | |
286 | .B -Cem, | |
287 | which specifies that | |
288 | .I flex | |
289 | should generate equivalence classes | |
290 | and meta-equivalence classes. This setting provides the highest | |
291 | degree of table compression. You can obtain | |
292 | faster-executing scanners at the cost of larger tables, with | |
293 | the following generally being true: | |
294 | .nf | |
295 | ||
296 | slowest & smallest | |
297 | -Cem | |
298 | -Cm | |
299 | -Ce | |
300 | -C | |
301 | -C{f,F}e | |
302 | -C{f,F} | |
303 | fastest & largest | |
304 | ||
305 | .fi | |
306 | .IP | |
307 | .B -C | |
308 | options are not cumulative; whenever the flag is encountered, the | |
309 | previous -C settings are forgotten. | |
310 | .TP | |
311 | .B -Sskeleton_file | |
312 | overrides the default skeleton file from which | |
313 | .I flex | |
314 | constructs its scanners. You'll never need this option unless you are doing | |
315 | .I flex | |
15637ed4 | 316 | maintenance or development. |
78ed81a3 | 317 | .SH SUMMARY OF FLEX REGULAR EXPRESSIONS |
15637ed4 RG |
318 | The patterns in the input are written using an extended set of regular |
319 | expressions. These are: | |
78ed81a3 | 320 | .nf |
321 | ||
322 | x match the character 'x' | |
323 | . any character except newline | |
324 | [xyz] a "character class"; in this case, the pattern | |
325 | matches either an 'x', a 'y', or a 'z' | |
326 | [abj-oZ] a "character class" with a range in it; matches | |
327 | an 'a', a 'b', any letter from 'j' through 'o', | |
328 | or a 'Z' | |
329 | [^A-Z] a "negated character class", i.e., any character | |
330 | but those in the class. In this case, any | |
331 | character EXCEPT an uppercase letter. | |
332 | [^A-Z\\n] any character EXCEPT an uppercase letter or | |
333 | a newline | |
334 | r* zero or more r's, where r is any regular expression | |
335 | r+ one or more r's | |
336 | r? zero or one r's (that is, "an optional r") | |
337 | r{2,5} anywhere from two to five r's | |
338 | r{2,} two or more r's | |
339 | r{4} exactly 4 r's | |
340 | {name} the expansion of the "name" definition | |
341 | (see flexdoc(1)) | |
342 | "[xyz]\\"foo" | |
343 | the literal string: [xyz]"foo | |
344 | \\X if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v', | |
345 | then the ANSI-C interpretation of \\x. | |
346 | Otherwise, a literal 'X' (used to escape | |
347 | operators such as '*') | |
348 | \\123 the character with octal value 123 | |
349 | \\x2a the character with hexadecimal value 2a | |
350 | (r) match an r; parentheses are used to override | |
351 | precedence (see below) | |
352 | ||
353 | ||
354 | rs the regular expression r followed by the | |
355 | regular expression s; called "concatenation" | |
356 | ||
357 | ||
358 | r|s either an r or an s | |
359 | ||
360 | ||
361 | r/s an r but only if it is followed by an s. The | |
362 | s is not part of the matched text. This type | |
363 | of pattern is called "trailing context". | |
364 | ^r an r, but only at the beginning of a line | |
365 | r$ an r, but only at the end of a line. Equivalent | |
366 | to "r/\\n". | |
367 | ||
368 | ||
369 | <s>r an r, but only in start condition s (see | |
370 | flexdoc(1) for discussion of start conditions) | |
371 | <s1,s2,s3>r | |
372 | same, but in any of start conditions s1, | |
373 | s2, or s3 | |
374 | ||
375 | ||
376 | <<EOF>> an end-of-file | |
377 | <s1,s2><<EOF>> | |
378 | an end-of-file when in start condition s1 or s2 | |
379 | ||
380 | .fi | |
15637ed4 RG |
381 | The regular expressions listed above are grouped according to |
382 | precedence, from highest precedence at the top to lowest at the bottom. | |
383 | Those grouped together have equal precedence. | |
78ed81a3 | 384 | .LP |
15637ed4 | 385 | Some notes on patterns: |
78ed81a3 | 386 | .IP - |
15637ed4 | 387 | Negated character classes |
78ed81a3 | 388 | .I match newlines |
389 | unless "\\n" (or an equivalent escape sequence) is one of the | |
15637ed4 | 390 | characters explicitly present in the negated character class |
78ed81a3 | 391 | (e.g., "[^A-Z\\n]"). |
392 | .IP - | |
15637ed4 RG |
393 | A rule can have at most one instance of trailing context (the '/' operator |
394 | or the '$' operator). The start condition, '^', and "<<EOF>>" patterns | |
395 | can only occur at the beginning of a pattern and, like '/' and '$', | |
396 | cannot be grouped inside parentheses. The following are all illegal: | |
78ed81a3 | 397 | .nf |
398 | ||
399 | foo/bar$ | |
400 | foo|(bar$) | |
401 | foo|^bar | |
402 | <sc1>foo<sc2>bar | |
403 | ||
404 | .fi | |
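The backslash escape rules in the table above can be sketched in C. This is only an illustrative sketch, not flex's own code: decode_escape() is a hypothetical helper, it assumes an ASCII character set, and it does not limit how many octal or hexadecimal digits it consumes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): decode the text that follows
   a backslash in a pattern, following the rules in the table above. */
static int decode_escape(const char *s)
{
    int value = 0;

    assert(s != NULL && *s != '\0');

    /* The letters a, b, f, n, r, t, v take their ANSI-C meaning. */
    switch (*s) {
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    }

    /* Octal form: one or more octal digits. */
    if (*s >= '0' && *s <= '7') {
        for (; *s >= '0' && *s <= '7'; s++)
            value = value * 8 + (*s - '0');
        return value;
    }

    /* Hexadecimal form: 'x' followed by one or more hex digits. */
    if (*s == 'x' && isxdigit((unsigned char)s[1])) {
        for (s++; isxdigit((unsigned char)*s); s++) {
            int c = tolower((unsigned char)*s);
            value = value * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
        }
        return value;
    }

    /* Anything else is the literal character (e.g. an escaped '*'). */
    return (unsigned char)*s;
}
```

Under these assumptions, decode_escape("123") yields 83 (ASCII 'S') and decode_escape("x2a") yields 42 (ASCII '*'), matching the octal and hexadecimal rows of the table.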
405 | .SH SUMMARY OF SPECIAL ACTIONS | |
15637ed4 | 406 | In addition to arbitrary C code, the following can appear in actions: |
78ed81a3 | 407 | .IP - |
408 | .B ECHO | |
409 | copies yytext to the scanner's output. | |
410 | .IP - | |
411 | .B BEGIN | |
412 | followed by the name of a start condition places the scanner in the | |
15637ed4 | 413 | corresponding start condition. |
78ed81a3 | 414 | .IP - |
415 | .B REJECT | |
416 | directs the scanner to proceed on to the "second best" rule which matched the | |
15637ed4 | 417 | input (or a prefix of the input). |
78ed81a3 | 418 | .B yytext |
15637ed4 | 419 | and |
78ed81a3 | 420 | .B yyleng |
15637ed4 | 421 | are set up appropriately. Note that |
78ed81a3 | 422 | .B REJECT |
15637ed4 RG |
423 | is a particularly expensive feature in terms of scanner performance; | |
424 | if it is used in | |
78ed81a3 | 425 | .I any |
15637ed4 | 426 | of the scanner's actions it will slow down |
78ed81a3 | 427 | .I all |
15637ed4 | 428 | of the scanner's matching. Furthermore, |
78ed81a3 | 429 | .B REJECT |
15637ed4 | 430 | cannot be used with the |
78ed81a3 | 431 | .I -f |
15637ed4 | 432 | or |
78ed81a3 | 433 | .I -F |
15637ed4 | 434 | options. |
78ed81a3 | 435 | .IP |
15637ed4 | 436 | Note also that unlike the other special actions, |
78ed81a3 | 437 | .B REJECT |
15637ed4 | 438 | is a |
78ed81a3 | 439 | .I branch; |
15637ed4 | 440 | code immediately following it in the action will |
78ed81a3 | 441 | .I not |
15637ed4 | 442 | be executed. |
78ed81a3 | 443 | .IP - |
444 | .B yymore() | |
15637ed4 RG |
445 | tells the scanner that the next time it matches a rule, the corresponding |
446 | token should be | |
78ed81a3 | 447 | .I appended |
15637ed4 | 448 | onto the current value of |
78ed81a3 | 449 | .B yytext |
15637ed4 | 450 | rather than replacing it. |
78ed81a3 | 451 | .IP - |
452 | .B yyless(n) | |
15637ed4 | 453 | returns all but the first |
78ed81a3 | 454 | .I n |
15637ed4 RG |
455 | characters of the current token back to the input stream, where they |
456 | will be rescanned when the scanner looks for the next match. | |
78ed81a3 | 457 | .B yytext |
15637ed4 | 458 | and |
78ed81a3 | 459 | .B yyleng |
15637ed4 | 460 | are adjusted appropriately (e.g., |
78ed81a3 | 461 | .B yyleng |
15637ed4 | 462 | will now be equal to |
78ed81a3 | 463 | .I n |
464 | ). | |
465 | .IP - | |
466 | .B unput(c) | |
15637ed4 | 467 | puts the character |
78ed81a3 | 468 | .I c |
15637ed4 | 469 | back onto the input stream. It will be the next character scanned. |
78ed81a3 | 470 | .IP - |
471 | .B input() | |
15637ed4 | 472 | reads the next character from the input stream (this routine is called |
78ed81a3 | 473 | .B yyinput() |
15637ed4 | 474 | if the scanner is compiled using |
78ed81a3 | 475 | .B C++). |
476 | .IP - | |
477 | .B yyterminate() | |
15637ed4 RG |
478 | can be used in lieu of a return statement in an action. It terminates |
479 | the scanner and returns a 0 to the scanner's caller, indicating "all done". | |
78ed81a3 | 480 | .IP |
15637ed4 | 481 | By default, |
78ed81a3 | 482 | .B yyterminate() |
15637ed4 RG |
483 | is also called when an end-of-file is encountered. It is a macro and |
484 | may be redefined. | |
78ed81a3 | 485 | .IP - |
486 | .B YY_NEW_FILE | |
15637ed4 RG |
487 | is an action available only in <<EOF>> rules. It means "Okay, I've |
488 | set up a new input file, continue scanning". | |
78ed81a3 | 489 | .IP - |
490 | .B yy_create_buffer( file, size ) | |
15637ed4 | 491 | takes a |
78ed81a3 | 492 | .I FILE |
15637ed4 | 493 | pointer and an integer |
78ed81a3 | 494 | .I size. |
15637ed4 RG |
495 | It returns a YY_BUFFER_STATE |
496 | handle to a new input buffer large enough to accommodate | |
78ed81a3 | 497 | .I size |
15637ed4 | 498 | characters and associated with the given file. When in doubt, use |
78ed81a3 | 499 | .B YY_BUF_SIZE |
15637ed4 | 500 | for the size. |
78ed81a3 | 501 | .IP - |
502 | .B yy_switch_to_buffer( new_buffer ) | |
15637ed4 RG |
503 | switches the scanner's processing to scan for tokens from |
504 | the given buffer, which must be a YY_BUFFER_STATE. | |
78ed81a3 | 505 | .IP - |
506 | .B yy_delete_buffer( buffer ) | |
15637ed4 | 507 | deletes the given buffer. |
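The statement above that a character given to unput() "will be the next character scanned" implies that a whole string has to be pushed back last character first. The following toy model, written in plain C, illustrates that ordering. It is not flex's implementation: toy_unput(), toy_input(), toy_unput_string(), toy_src, and the fixed-size stack are all hypothetical names chosen for this sketch, and returning EOF at the end of the source is this model's own choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_PUSHBACK_MAX 64

static const char *toy_src = "";          /* remaining unread input */
static char toy_stack[TOY_PUSHBACK_MAX];  /* pushed-back characters */
static int toy_depth = 0;                 /* number of pushed-back characters */

/* Model of unput(c): c becomes the next character read. */
static void toy_unput(int c)
{
    assert(toy_depth < TOY_PUSHBACK_MAX);
    toy_stack[toy_depth++] = (char)c;
}

/* Model of input(): pushed-back characters come first, then the source. */
static int toy_input(void)
{
    if (toy_depth > 0)
        return (unsigned char)toy_stack[--toy_depth];
    if (*toy_src == '\0')
        return EOF;
    return (unsigned char)*toy_src++;
}

/* Push back a whole string so that it is reread in its original order:
   the characters are unput from last to first. */
static void toy_unput_string(const char *s)
{
    size_t i = strlen(s);

    while (i > 0)
        toy_unput(s[--i]);
}
```

In this model, reading 'x' from the source "xyz" and then calling toy_unput_string("ab") makes the following reads return 'a', 'b', 'y', and 'z' in that order.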
78ed81a3 | 508 | .SH VALUES AVAILABLE TO THE USER |
509 | .IP - | |
510 | .B char *yytext | |
15637ed4 | 511 | holds the text of the current token. It may not be modified. |
78ed81a3 | 512 | .IP - |
513 | .B int yyleng | |
15637ed4 | 514 | holds the length of the current token. It may not be modified. |
78ed81a3 | 515 | .IP - |
516 | .B FILE *yyin | |
15637ed4 | 517 | is the file which by default |
78ed81a3 | 518 | .I flex |
15637ed4 RG |
519 | reads from. It may be redefined but doing so only makes sense before |
520 | scanning begins. Changing it in the middle of scanning will have | |
521 | unexpected results since | |
78ed81a3 | 522 | .I flex |
15637ed4 RG |
523 | buffers its input. Once scanning terminates because an end-of-file |
524 | has been seen, | |
78ed81a3 | 525 | .B |
526 | void yyrestart( FILE *new_file ) | |
15637ed4 | 527 | may be called to point |
78ed81a3 | 528 | .I yyin |
15637ed4 | 529 | at the new input file. |
78ed81a3 | 530 | .IP - |
531 | .B FILE *yyout | |
15637ed4 | 532 | is the file to which |
78ed81a3 | 533 | .B ECHO |
15637ed4 | 534 | actions are done. It can be reassigned by the user. |
78ed81a3 | 535 | .IP - |
536 | .B YY_CURRENT_BUFFER | |
15637ed4 | 537 | returns a |
78ed81a3 | 538 | .B YY_BUFFER_STATE |
15637ed4 | 539 | handle to the current buffer. |
78ed81a3 | 540 | .SH MACROS THE USER CAN REDEFINE |
541 | .IP - | |
542 | .B YY_DECL | |
15637ed4 RG |
543 | controls how the scanning routine is declared. |
544 | By default, it is "int yylex()", or, if prototypes are being | |
545 | used, "int yylex(void)". This definition may be changed by redefining | |
546 | the "YY_DECL" macro. Note that | |
547 | if you give arguments to the scanning routine using a | |
548 | K&R-style/non-prototyped function declaration, you must terminate | |
549 | the definition with a semi-colon (;). | |
78ed81a3 | 550 | .IP - |
15637ed4 RG |
551 | The nature of how the scanner |
552 | gets its input can be controlled by redefining the | |
78ed81a3 | 553 | .B YY_INPUT |
15637ed4 RG |
554 | macro. |
555 | YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its | |
556 | action is to place up to | |
78ed81a3 | 557 | .I max_size |
15637ed4 | 558 | characters in the character array |
78ed81a3 | 559 | .I buf |
15637ed4 | 560 | and return in the integer variable |
78ed81a3 | 561 | .I result |
15637ed4 RG |
562 | either the |
563 | number of characters read or the constant YY_NULL (0 on Unix systems) | |
564 | to indicate EOF. The default YY_INPUT reads from the | |
565 | global file-pointer "yyin". | |
566 | A sample redefinition of YY_INPUT (in the definitions | |
567 | section of the input file): | |
78ed81a3 | 568 | .nf |
569 | ||
570 | %{ | |
571 | #undef YY_INPUT | |
572 | #define YY_INPUT(buf,result,max_size) \\ | |
573 | { \\ | |
574 | int c = getchar(); \\ | |
575 | result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\ | |
576 | } | |
577 | %} | |
578 | ||
579 | .fi | |
580 | .IP - | |
15637ed4 RG |
581 | When the scanner receives an end-of-file indication from YY_INPUT, |
582 | it then checks the | |
78ed81a3 | 583 | .B yywrap() |
15637ed4 | 584 | function. If |
78ed81a3 | 585 | .B yywrap() |
15637ed4 RG |
586 | returns false (zero), then it is assumed that the |
587 | function has gone ahead and set up | |
78ed81a3 | 588 | .I yyin |
15637ed4 RG |
589 | to point to another input file, and scanning continues. If it returns |
590 | true (non-zero), then the scanner terminates, returning 0 to its | |
591 | caller. | |
78ed81a3 | 592 | .IP |
15637ed4 | 593 | The default |
78ed81a3 | 594 | .B yywrap() |
15637ed4 RG |
595 | always returns 1. Presently, to redefine it you must first |
596 | "#undef yywrap", as it is currently implemented as a macro. It is | |
597 | likely that | |
78ed81a3 | 598 | .B yywrap() |
15637ed4 | 599 | will soon be defined to be a function rather than a macro. |
78ed81a3 | 600 | .IP - |
601 | YY_USER_ACTION | |
15637ed4 RG |
602 | can be redefined to provide an action |
603 | which is always executed prior to the matched rule's action. | |
78ed81a3 | 604 | .IP - |
15637ed4 | 605 | The macro |
78ed81a3 | 606 | .B YY_USER_INIT |
15637ed4 RG |
607 | may be redefined to provide an action which is always executed before |
608 | the first scan. | |
78ed81a3 | 609 | .IP - |
15637ed4 RG |
610 | In the generated scanner, the actions are all gathered in one large |
611 | switch statement and separated using | |
78ed81a3 | 612 | .B YY_BREAK, |
15637ed4 RG |
613 | which may be redefined. By default, it is simply a "break", to separate |
614 | each rule's action from the following rule's. | |
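As a further illustration of the YY_INPUT contract described above (place up to max_size characters in buf and set result to the count, or to YY_NULL at end of input), here is a sketch that feeds the scanner from an in-memory string. The names src_text, src_pos, and read_from_string() are hypothetical, and the fallback definition of YY_NULL exists only so that the fragment compiles on its own; in a real scanner the code would sit between %{ and %} in the definitions section and the generated scanner would supply YY_NULL.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the value the generated scanner supplies; the text above
   says YY_NULL is 0 on Unix systems. */
#ifndef YY_NULL
#define YY_NULL 0
#endif

/* Hypothetical in-memory input source (illustrative names only). */
static const char *src_text = "hello, flex";
static size_t src_pos = 0;

/* Copy up to max_size unread characters into buf and return the count,
   or YY_NULL once the source is exhausted. */
static int read_from_string(char *buf, int max_size)
{
    size_t left = strlen(src_text) - src_pos;
    size_t n;

    assert(max_size > 0);
    n = left < (size_t)max_size ? left : (size_t)max_size;
    if (n == 0)
        return YY_NULL;
    memcpy(buf, src_text + src_pos, n);
    src_pos += n;
    return (int)n;
}

/* Redefine YY_INPUT with the calling sequence given above. */
#undef YY_INPUT
#define YY_INPUT(buf,result,max_size) \
    { result = read_from_string(buf, max_size); }
```

Unlike the one-character getchar() sample, this version hands over as many characters per call as the scanner asks for, which is the general shape the calling sequence allows.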
78ed81a3 | 615 | .SH FILES |
616 | .TP | |
617 | .I lex.skel | |
15637ed4 | 618 | skeleton scanner. |
78ed81a3 | 619 | .TP |
620 | .I lex.yy.c | |
621 | generated scanner (called | |
622 | .I lexyy.c | |
15637ed4 | 623 | on some systems). |
78ed81a3 | 624 | .TP |
625 | .I lex.backtrack | |
15637ed4 | 626 | backtracking information for |
78ed81a3 | 627 | .B -b |
628 | flag (called | |
629 | .I lex.bck | |
15637ed4 | 630 | on some systems). |
78ed81a3 | 631 | .TP |
632 | .B -lfl | |
633 | library with which to link the scanners. | |
634 | .SH "SEE ALSO" | |
635 | .LP | |
636 | flexdoc(1), lex(1), yacc(1), sed(1), awk(1). | |
637 | .LP | |
638 | M. E. Lesk and E. Schmidt, | |
639 | .I LEX - Lexical Analyzer Generator | |
640 | .SH DIAGNOSTICS | |
641 | .I reject_used_but_not_detected undefined | |
15637ed4 | 642 | or |
78ed81a3 | 643 | .LP |
644 | .I yymore_used_but_not_detected undefined - | |
645 | These errors can occur at compile time. They indicate that the | |
15637ed4 | 646 | scanner uses |
78ed81a3 | 647 | .B REJECT |
15637ed4 | 648 | or |
78ed81a3 | 649 | .B yymore() |
15637ed4 | 650 | but that |
78ed81a3 | 651 | .I flex |
652 | failed to notice the fact, meaning that | |
653 | .I flex | |
15637ed4 | 654 | scanned the first two sections looking for occurrences of these actions |
78ed81a3 | 655 | and failed to find any, but somehow you snuck some in (via a #include |
656 | file, for example). Make an explicit reference to the action in your | |
657 | .I flex | |
658 | input file. (Note that previously | |
659 | .I flex | |
15637ed4 | 660 | supported a |
78ed81a3 | 661 | .B %used/%unused |
662 | mechanism for dealing with this problem; this feature is still supported | |
663 | but now deprecated, and will go away soon unless the author hears from | |
664 | people who can argue compellingly that they need it.) | |
665 | .LP | |
666 | .I flex scanner jammed - | |
15637ed4 | 667 | a scanner compiled with |
78ed81a3 | 668 | .B -s |
15637ed4 RG |
669 | has encountered an input string which wasn't matched by |
670 | any of its rules. | |
78ed81a3 | 671 | .LP |
672 | .I flex input buffer overflowed - | |
15637ed4 | 673 | a scanner rule matched a string long enough to overflow the |
78ed81a3 | 674 | scanner's internal input buffer (16K bytes - controlled by |
675 | .B YY_BUF_MAX | |
676 | in "lex.skel"). | |
677 | .LP | |
678 | .I scanner requires -8 flag - | |
15637ed4 | 679 | Your scanner specification includes recognizing 8-bit characters and |
78ed81a3 | 680 | you did not specify the -8 flag (and your site has not installed flex |
681 | with -8 as the default). | |
682 | .LP | |
683 | .I | |
684 | fatal flex scanner internal error--end of buffer missed - | |
685 | This can occur in a scanner which is reentered after a long-jump | |
686 | has jumped out of (or over) the scanner's activation frame. Before | |
687 | reentering the scanner, use: | |
688 | .nf | |
689 | ||
690 | yyrestart( yyin ); | |
691 | ||
692 | .fi | |
693 | .LP | |
694 | .I too many %t classes! - | |
15637ed4 | 695 | You managed to put every single character into its own %t class. |
78ed81a3 | 696 | .I flex |
15637ed4 | 697 | requires that at least one of the classes share characters. |
78ed81a3 | 698 | .SH AUTHOR |
15637ed4 RG |
699 | Vern Paxson, with the help of many ideas and much inspiration from |
700 | Van Jacobson. Original version by Jef Poskanzer. | |
78ed81a3 | 701 | .LP |
702 | See flexdoc(1) for additional credits and the address to send comments to. | |
703 | .SH DEFICIENCIES / BUGS | |
704 | .LP | |
15637ed4 RG |
705 | Some trailing context |
706 | patterns cannot be properly matched and generate | |
707 | warning messages ("Dangerous trailing context"). These are | |
708 | patterns where the ending of the | |
709 | first part of the rule matches the beginning of the second | |
710 | part, such as "zx*/xy*", where the 'x*' matches the 'x' at | |
78ed81a3 | 711 | the beginning of the trailing context. (Note that the POSIX draft |
15637ed4 | 712 | states that the text matched by such patterns is undefined.) |
78ed81a3 | 713 | .LP |
15637ed4 RG |
714 | For some trailing context rules, parts which are actually fixed-length are |
715 | not recognized as such, leading to the abovementioned performance loss. | |
78ed81a3 | 716 | In particular, parts using '|' or {n} (such as "foo{3}") are always |
15637ed4 | 717 | considered variable-length. |
78ed81a3 | 718 | .LP |
719 | Combining trailing context with the special '|' action can result in | |
720 | .I fixed | |
15637ed4 | 721 | trailing context being turned into the more expensive |
78ed81a3 | 722 | .I variable |
723 | trailing context. This happens, for example, in the following: | |
724 | .nf | |
725 | ||
726 | %% | |
727 | abc | | |
728 | xyz/def | |
729 | ||
730 | .fi | |
731 | .LP | |
732 | Use of unput() invalidates yytext and yyleng. | |
733 | .LP | |
734 | Use of unput() to push back more text than was matched can | |
15637ed4 RG |
735 | result in the pushed-back text matching a beginning-of-line ('^') |
736 | rule even though it didn't come at the beginning of the line | |
737 | (though this is rare!). | |
78ed81a3 | 738 | .LP |
739 | Pattern-matching of NUL's is substantially slower than matching other | |
15637ed4 | 740 | characters. |
78ed81a3 | 741 | .LP |
742 | .I flex | |
15637ed4 RG |
743 | does not generate correct #line directives for code internal |
744 | to the scanner; thus, bugs in | |
78ed81a3 | 745 | .I lex.skel |
15637ed4 | 746 | yield bogus line numbers. |
78ed81a3 | 747 | .LP |
15637ed4 | 748 | Due to both buffering of input and read-ahead, you cannot intermix |
78ed81a3 | 749 | calls to <stdio.h> routines, such as |
750 | .B getchar(), | |
15637ed4 | 751 | with |
78ed81a3 | 752 | .I flex |
15637ed4 | 753 | rules and expect it to work. Call |
78ed81a3 | 754 | .B input() |
15637ed4 | 755 | instead. |
78ed81a3 | 756 | .LP |
15637ed4 | 757 | The total table entries listed by the |
78ed81a3 | 758 | .B -v |
15637ed4 RG |
759 | flag exclude the number of table entries needed to determine | |
760 | what rule has been matched. The number of entries is equal | |
78ed81a3 | 761 | to the number of DFA states if the scanner does not use |
762 | .B REJECT, | |
15637ed4 | 763 | and somewhat greater than the number of states if it does. |
78ed81a3 | 764 | .LP |
765 | .B REJECT | |
15637ed4 | 766 | cannot be used with the |
78ed81a3 | 767 | .I -f |
15637ed4 | 768 | or |
78ed81a3 | 769 | .I -F |
15637ed4 | 770 | options. |
78ed81a3 | 771 | .LP |
15637ed4 | 772 | Some of the macros, such as |
78ed81a3 | 773 | .B yywrap(), |
15637ed4 | 774 | may in the future become functions which live in the |
78ed81a3 | 775 | .B -lfl |
15637ed4 | 776 | library. This will doubtless break a lot of code, but may be |
78ed81a3 | 777 | required for POSIX-compliance. |
778 | .LP | |
15637ed4 | 779 | The |
78ed81a3 | 780 | .I flex |
15637ed4 | 781 | internal algorithms need documentation. |