.\" This commit was manufactured by cvs2svn to create tag 'FreeBSD-release/1.0'.
.TH FLEX 1 "26 May 1990" "Version 2.3"
.SH NAME
flex - fast lexical analyzer generator
.SH SYNOPSIS
.B flex
.B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
.I [filename ...]
.SH DESCRIPTION
.I flex
is a tool for generating
.I scanners:
programs which recognize lexical patterns in text.
.I flex
reads
the given input files, or its standard input if no file names are given,
for a description of a scanner to generate. The description is in
the form of pairs
of regular expressions and C code, called
.I rules. flex
generates as output a C source file,
.B lex.yy.c,
which defines a routine
.B yylex().
This file is compiled and linked with the
.B -lfl
library to produce an executable. When the executable is run,
it analyzes its input for occurrences
of the regular expressions. Whenever it finds one, it executes
the corresponding C code.
.LP
For full documentation, see
.B flexdoc(1).
This manual entry is intended for use as a quick reference.
.SH OPTIONS
.I flex
has the following options:
.TP
.B -b
Generate backtracking information to
.I lex.backtrack.
This is a list of scanner states which require backtracking
and the input characters on which they do so. By adding rules one
can remove backtracking states. If all backtracking states
are eliminated and
.B -f
or
.B -F
is used, the generated scanner will run faster.
.TP
.B -c
is a do-nothing, deprecated option included for POSIX compliance.
.IP
.B NOTE:
in previous releases of
.I flex
.B -c
specified table-compression options. This functionality is
now given by the
.B -C
flag. To ease the impact of this change, when
.I flex
encounters
.B -c,
it currently issues a warning message and assumes that
.B -C
was desired instead. In the future this "promotion" of
.B -c
to
.B -C
will go away in the name of full POSIX compliance (unless
the POSIX meaning is removed first).
.TP
.B -d
makes the generated scanner run in
.I debug
mode. Whenever a pattern is recognized and the global
.B yy_flex_debug
is non-zero (which is the default), the scanner will
write to
.I stderr
a line of the form:
.nf

    --accepting rule at line 53 ("the matched text")

.fi
The line number refers to the location of the rule in the file
defining the scanner (i.e., the file that was fed to flex). Messages
are also generated when the scanner backtracks, accepts the
default rule, reaches the end of its input buffer (or encounters
a NUL; the two look the same as far as the scanner's concerned),
or reaches an end-of-file.
.TP
.B -f
specifies (take your pick)
.I full table
or
.I fast scanner.
No table compression is done. The result is large but fast.
This option is equivalent to
.B -Cf
(see below).
.TP
.B -i
instructs
.I flex
to generate a
.I case-insensitive
scanner. The case of letters given in the
.I flex
input patterns will
be ignored, and tokens in the input will be matched regardless of case. The
matched text given in
.I yytext
will keep its original case (i.e., it will not be folded).
.TP
.B -n
is another do-nothing, deprecated option included only for
POSIX compliance.
.TP
.B -p
generates a performance report to stderr. The report
consists of comments regarding features of the
.I flex
input file which will cause a loss of performance in the resulting scanner.
.TP
.B -s
causes the
.I default rule
(that unmatched scanner input is echoed to
.I stdout)
to be suppressed. If the scanner encounters input that does not
match any of its rules, it aborts with an error.
.TP
.B -t
instructs
.I flex
to write the scanner it generates to standard output instead
of
.B lex.yy.c.
.TP
.B -v
specifies that
.I flex
should write to
.I stderr
a summary of statistics regarding the scanner it generates.
.TP
.B -F
specifies that the
.ul
fast
scanner table representation should be used. This representation is
about as fast as the full table representation
.ul
(-f),
and for some sets of patterns will be considerably smaller (and for
others, larger). See
.B flexdoc(1)
for details.
.IP
This option is equivalent to
.B -CF
(see below).
.TP
.B -I
instructs
.I flex
to generate an
.I interactive
scanner, that is, a scanner which stops immediately rather than
looking ahead if it knows
that the currently scanned text cannot be part of a longer rule's match.
Again, see
.B flexdoc(1)
for details.
.IP
Note,
.B -I
cannot be used in conjunction with
.I full
or
.I fast tables,
i.e., the
.B -f, -F, -Cf,
or
.B -CF
flags.
.TP
.B -L
instructs
.I flex
not to generate
.B #line
directives in
.B lex.yy.c.
The default is to generate such directives so error
messages in the actions will be correctly
located with respect to the original
.I flex
input file, and not to
the fairly meaningless line numbers of
.B lex.yy.c.
.TP
.B -T
makes
.I flex
run in
.I trace
mode. It will generate a lot of messages to
.I stdout
concerning
the form of the input and the resultant non-deterministic and deterministic
finite automata. This option is mostly for use in maintaining
.I flex.
.TP
.B -8
instructs
.I flex
to generate an 8-bit scanner.
On some sites, this is the default. On others, the default
is 7-bit characters. To see which is the case, check the verbose
.B (-v)
output for "equivalence classes created". If the denominator of
the number shown is 128, then by default
.I flex
is generating 7-bit characters. If it is 256, then the default is
8-bit characters.
.TP
.B -C[efmF]
controls the degree of table compression.
.IP
.B -Ce
directs
.I flex
to construct
.I equivalence classes,
i.e., sets of characters
which have identical lexical properties.
Equivalence classes usually give
dramatic reductions in the final table/object file sizes (typically
a factor of 2-5) and are pretty cheap performance-wise (one array
look-up per character scanned).
.IP
.B -Cf
specifies that the
.I full
scanner tables should be generated -
.I flex
should not compress the
tables by taking advantage of similar transition functions for
different states.
.IP
.B -CF
specifies that the alternate fast scanner representation (described in
.B flexdoc(1))
should be used.
.IP
.B -Cm
directs
.I flex
to construct
.I meta-equivalence classes,
which are sets of equivalence classes (or characters, if equivalence
classes are not being used) that are commonly used together. Meta-equivalence
classes are often a big win when using compressed tables, but they
have a moderate performance impact (one or two "if" tests and one
array look-up per character scanned).
.IP
A lone
.B -C
specifies that the scanner tables should be compressed but neither
equivalence classes nor meta-equivalence classes should be used.
.IP
The options
.B -Cf
or
.B -CF
and
.B -Cm
do not make sense together - there is no opportunity for meta-equivalence
classes if the table is not being compressed. Otherwise the options
may be freely mixed.
.IP
The default setting is
.B -Cem,
which specifies that
.I flex
should generate equivalence classes
and meta-equivalence classes. This setting provides the highest
degree of table compression. You can trade larger tables for
faster-executing scanners, with
the following generally being true:
.nf

    slowest & smallest
          -Cem
          -Cm
          -Ce
          -C
          -C{f,F}e
          -C{f,F}
    fastest & largest

.fi
.IP
.B -C
options are not cumulative; whenever the flag is encountered, the
previous -C settings are forgotten.
.TP
.B -Sskeleton_file
overrides the default skeleton file from which
.I flex
constructs its scanners. You'll never need this option unless you are doing
.I flex
maintenance or development.
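.LP
The equivalence classes described under
.B -Ce
can be illustrated with a small hand-written sketch. It is not the code
.I flex
generates; the names ec_table, next_state, build_classes and match_length
are hypothetical, and the toy scanner (numbers and identifiers only) is an
assumption chosen for brevity. The point is that the transition table is
indexed by class rather than by character, at the cost of one array
look-up per character scanned.
.nf

```c
#include <stddef.h>

/* Three classes stand in for all 256 character values. */
enum { EC_OTHER = 0, EC_DIGIT = 1, EC_LETTER = 2, NUM_CLASSES = 3 };

/* Class table: one entry per 8-bit character value. */
static unsigned char ec_table[256];

/* Every digit shares one class, every letter another. */
static void build_classes(void)
{
    const char *digits = "0123456789";
    const char *letters =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t i;

    for (i = 0; i < 256; i++)
        ec_table[i] = EC_OTHER;
    for (i = 0; digits[i] != 0; i++)
        ec_table[(unsigned char)digits[i]] = EC_DIGIT;
    for (i = 0; letters[i] != 0; i++)
        ec_table[(unsigned char)letters[i]] = EC_LETTER;
}

/*
 * Transitions indexed by [state][class]: 3 x 3 entries instead of
 * 3 x 256.  State 0 is the start state, state 1 is "in a number",
 * state 2 is "in an identifier"; -1 means no transition.
 */
static const signed char next_state[3][NUM_CLASSES] = {
    /* OTHER  DIGIT  LETTER */
    {  -1,     1,     2 },
    {  -1,     1,    -1 },
    {  -1,     2,     2 },
};

/* Length of the longest number or identifier at the start of s. */
static size_t match_length(const char *s)
{
    int state = 0;
    size_t n = 0;

    while (s[n] != 0) {
        int cls = ec_table[(unsigned char)s[n]];  /* the extra look-up */
        int next = next_state[state][cls];
        if (next < 0)
            break;
        state = next;
        n++;
    }
    return n;
}
```

.fi
.LP
After build_classes() has run, match_length() on the text "abc123 x"
stops at the space and reports a six-character identifier.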
.SH SUMMARY OF FLEX REGULAR EXPRESSIONS
The patterns in the input are written using an extended set of regular
expressions. These are:
.nf

    x          match the character 'x'
    .          any character except newline
    [xyz]      a "character class"; in this case, the pattern
                 matches either an 'x', a 'y', or a 'z'
    [abj-oZ]   a "character class" with a range in it; matches
                 an 'a', a 'b', any letter from 'j' through 'o',
                 or a 'Z'
    [^A-Z]     a "negated character class", i.e., any character
                 but those in the class. In this case, any
                 character EXCEPT an uppercase letter.
    [^A-Z\\n]  any character EXCEPT an uppercase letter or
                 a newline
    r*         zero or more r's, where r is any regular expression
    r+         one or more r's
    r?         zero or one r's (that is, "an optional r")
    r{2,5}     anywhere from two to five r's
    r{2,}      two or more r's
    r{4}       exactly 4 r's
    {name}     the expansion of the "name" definition
                 (see above)
    "[xyz]\\"foo"
               the literal string: [xyz]"foo
    \\X        if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
                 then the ANSI-C interpretation of \\x.
                 Otherwise, a literal 'X' (used to escape
                 operators such as '*')
    \\123      the character with octal value 123
    \\x2a      the character with hexadecimal value 2a
    (r)        match an r; parentheses are used to override
                 precedence (see below)


    rs         the regular expression r followed by the
                 regular expression s; called "concatenation"


    r|s        either an r or an s


    r/s        an r but only if it is followed by an s. The
                 s is not part of the matched text. This type
                 of pattern is called "trailing context".
    ^r         an r, but only at the beginning of a line
    r$         an r, but only at the end of a line. Equivalent
                 to "r/\\n".


    <s>r       an r, but only in start condition s (see
                 below for discussion of start conditions)
    <s1,s2,s3>r
               same, but in any of start conditions s1,
                 s2, or s3


    <<EOF>>    an end-of-file
    <s1,s2><<EOF>>
               an end-of-file when in start condition s1 or s2

.fi
The regular expressions listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.
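.LP
The "r/s" entry in the table can be made concrete with a small sketch.
It assumes, purely for illustration, that both r and s are literal
strings rather than general regular expressions, and the function name
match_with_trailing_context is hypothetical. It shows the one property
the table states: s must follow r, but only r counts as matched text.
.nf

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the matched text for the pattern r/s applied
 * at the start of input, or 0 if the pattern does not match.
 * r and s are treated as literal strings in this sketch.
 */
static size_t match_with_trailing_context(const char *input,
                                          const char *r,
                                          const char *s)
{
    size_t rlen = strlen(r);
    size_t slen = strlen(s);

    if (strncmp(input, r, rlen) != 0)
        return 0;                 /* r itself is not present */
    if (strncmp(input + rlen, s, slen) != 0)
        return 0;                 /* r is present but s does not follow */
    return rlen;                  /* s was checked but is not consumed */
}
```

.fi
.LP
For the input "foobar" and the pattern foo/bar the result is 3, so the
next match attempt starts at "bar". For the input "foobaz" the result
is 0.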
.LP
Some notes on patterns:
.IP -
Negated character classes
.I match newlines
unless "\\n" (or an equivalent escape sequence) is one of the
characters explicitly present in the negated character class
(e.g., "[^A-Z\\n]").
.IP -
A rule can have at most one instance of trailing context (the '/' operator
or the '$' operator). The start condition, '^', and "<<EOF>>" patterns
can only occur at the beginning of a pattern, and, like '/' and '$',
cannot be grouped inside parentheses. The following are all illegal:
.nf

    foo/bar$
    foo|(bar$)
    foo|^bar
    <sc1>foo<sc2>bar

.fi
.SH SUMMARY OF SPECIAL ACTIONS
In addition to arbitrary C code, the following can appear in actions:
.IP -
.B ECHO
copies yytext to the scanner's output.
.IP -
.B BEGIN
followed by the name of a start condition places the scanner in the
corresponding start condition.
.IP -
.B REJECT
directs the scanner to proceed on to the "second best" rule which matched the
input (or a prefix of the input).
.B yytext
and
.B yyleng
are set up appropriately. Note that
.B REJECT
is a particularly expensive feature in terms of scanner performance;
if it is used in
.I any
of the scanner's actions it will slow down
.I all
of the scanner's matching. Furthermore,
.B REJECT
cannot be used with the
.I -f
or
.I -F
options.
.IP
Note also that unlike the other special actions,
.B REJECT
is a
.I branch;
code immediately following it in the action will
.I not
be executed.
.IP -
.B yymore()
tells the scanner that the next time it matches a rule, the corresponding
token should be
.I appended
onto the current value of
.B yytext
rather than replacing it.
.IP -
.B yyless(n)
returns all but the first
.I n
characters of the current token back to the input stream, where they
will be rescanned when the scanner looks for the next match.
.B yytext
and
.B yyleng
are adjusted appropriately (e.g.,
.B yyleng
will now be equal to
.I n
).
.IP -
.B unput(c)
puts the character
.I c
back onto the input stream. It will be the next character scanned.
.IP -
.B input()
reads the next character from the input stream (this routine is called
.B yyinput()
if the scanner is compiled using
.B C++).
.IP -
.B yyterminate()
can be used in lieu of a return statement in an action. It terminates
the scanner and returns a 0 to the scanner's caller, indicating "all done".
.IP
By default,
.B yyterminate()
is also called when an end-of-file is encountered. It is a macro and
may be redefined.
.IP -
.B YY_NEW_FILE
is an action available only in <<EOF>> rules. It means "Okay, I've
set up a new input file, continue scanning".
.IP -
.B yy_create_buffer( file, size )
takes a
.I FILE
pointer and an integer
.I size.
It returns a YY_BUFFER_STATE
handle to a new input buffer large enough to accommodate
.I size
characters and associated with the given file. When in doubt, use
.B YY_BUF_SIZE
for the size.
.IP -
.B yy_switch_to_buffer( new_buffer )
switches the scanner's processing to scan for tokens from
the given buffer, which must be a YY_BUFFER_STATE.
.IP -
.B yy_delete_buffer( buffer )
deletes the given buffer.
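.LP
The effect of
.B yyless(n)
and
.B yymore()
on the current token can be pictured with a toy model. This is only a
bookkeeping sketch under stated assumptions, not the scanner's real
implementation: the struct scan_model and the functions model_match,
model_yyless and next_scan_offset are hypothetical names, and the token
is represented as an offset and a length into a fixed input string.
.nf

```c
#include <stddef.h>

/* Toy model of the scanner's view of the current token. */
struct scan_model {
    const char *input;  /* the whole input text */
    size_t start;       /* offset where the current token begins */
    size_t leng;        /* length of the current token (models yyleng) */
};

/*
 * Record a match of len characters.  If more_pending is non-zero
 * (yymore() was called in the previous action), the new text is
 * appended to the current token; otherwise it replaces the token.
 */
static void model_match(struct scan_model *m, size_t len, int more_pending)
{
    if (more_pending) {
        m->leng += len;
    } else {
        m->start += m->leng;
        m->leng = len;
    }
}

/* Model of yyless(n): keep the first n characters, return the rest. */
static void model_yyless(struct scan_model *m, size_t n)
{
    if (n < m->leng)
        m->leng = n;
}

/* Offset at which the next match attempt will begin. */
static size_t next_scan_offset(const struct scan_model *m)
{
    return m->start + m->leng;
}
```

.fi
.LP
With the input "foobar" matched as one six-character token,
model_yyless(&m, 3) leaves a token of length 3 and the next match
attempt begins at offset 3, so "bar" is rescanned.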
.SH VALUES AVAILABLE TO THE USER
.IP -
.B char *yytext
holds the text of the current token. It may not be modified.
.IP -
.B int yyleng
holds the length of the current token. It may not be modified.
.IP -
.B FILE *yyin
is the file which by default
.I flex
reads from. It may be redefined but doing so only makes sense before
scanning begins. Changing it in the middle of scanning will have
unexpected results since
.I flex
buffers its input. Once scanning terminates because an end-of-file
has been seen,
.B
void yyrestart( FILE *new_file )
may be called to point
.I yyin
at the new input file.
.IP -
.B FILE *yyout
is the file to which
.B ECHO
actions are done. It can be reassigned by the user.
.IP -
.B YY_CURRENT_BUFFER
returns a
.B YY_BUFFER_STATE
handle to the current buffer.
.SH MACROS THE USER CAN REDEFINE
.IP -
.B YY_DECL
controls how the scanning routine is declared.
By default, it is "int yylex()", or, if prototypes are being
used, "int yylex(void)". This definition may be changed by redefining
the "YY_DECL" macro. Note that
if you give arguments to the scanning routine using a
K&R-style/non-prototyped function declaration, you must terminate
the definition with a semi-colon (;).
.IP -
How the scanner
gets its input can be controlled by redefining the
.B YY_INPUT
macro.
YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
action is to place up to
.I max_size
characters in the character array
.I buf
and return in the integer variable
.I result
either the
number of characters read or the constant YY_NULL (0 on Unix systems)
to indicate EOF. The default YY_INPUT reads from the
global file-pointer "yyin".
A sample redefinition of YY_INPUT (in the definitions
section of the input file):
.nf

    %{
    #undef YY_INPUT
    #define YY_INPUT(buf,result,max_size) \\
        { \\
        int c = getchar(); \\
        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
        }
    %}

.fi
.IP -
When the scanner receives an end-of-file indication from YY_INPUT,
it then checks the
.B yywrap()
function. If
.B yywrap()
returns false (zero), then it is assumed that the
function has gone ahead and set up
.I yyin
to point to another input file, and scanning continues. If it returns
true (non-zero), then the scanner terminates, returning 0 to its
caller.
.IP
The default
.B yywrap()
always returns 1. Presently, to redefine it you must first
"#undef yywrap", as it is currently implemented as a macro. It is
likely that
.B yywrap()
will soon be defined to be a function rather than a macro.
.IP -
.B YY_USER_ACTION
can be redefined to provide an action
which is always executed prior to the matched rule's action.
.IP -
The macro
.B YY_USER_INIT
may be redefined to provide an action which is always executed before
the first scan.
.IP -
In the generated scanner, the actions are all gathered in one large
switch statement and separated using
.B YY_BREAK,
which may be redefined. By default, it is simply a "break", to separate
each rule's action from the following rule's.
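.LP
The
.B YY_INPUT
contract described above (place up to max_size characters in buf and
report the count, or 0 at end of input) can also be met by a plain C
function that reads from a string in memory. The following is a minimal
sketch under that assumption; set_source and fill_buffer are hypothetical
names, not part of
.I flex,
and max_size is assumed to be positive.
.nf

```c
#include <stddef.h>
#include <string.h>

/* Text to be scanned and the read position within it. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Point the reader at a new in-memory text. */
static void set_source(const char *text)
{
    src_text = text;
    src_pos = 0;
}

/*
 * Copy up to max_size characters into buf and return how many were
 * copied; a return of 0 signals end of input, matching the value the
 * manual gives for YY_NULL on Unix systems.
 */
static int fill_buffer(char *buf, int max_size)
{
    size_t left = strlen(src_text) - src_pos;
    size_t n = left < (size_t)max_size ? left : (size_t)max_size;

    memcpy(buf, src_text + src_pos, n);
    src_pos += n;
    return (int)n;
}
```

.fi
.LP
In a scanner specification such a function could be hooked in by
redefining YY_INPUT(buf,result,max_size) so that it assigns
fill_buffer(buf, max_size) to result. Unlike the getchar() sample,
this reads a whole buffer per call rather than one character.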
.SH FILES
.TP
.I lex.skel
skeleton scanner.
.TP
.I lex.yy.c
generated scanner (called
.I lexyy.c
on some systems).
.TP
.I lex.backtrack
backtracking information for
.B -b
flag (called
.I lex.bck
on some systems).
.TP
.B -lfl
library with which to link the scanners.
.SH "SEE ALSO"
.LP
flexdoc(1), lex(1), yacc(1), sed(1), awk(1).
.LP
M. E. Lesk and E. Schmidt,
.I LEX - Lexical Analyzer Generator
.SH DIAGNOSTICS
.I reject_used_but_not_detected undefined
or
.LP
.I yymore_used_but_not_detected undefined -
These errors can occur at compile time. They indicate that the
scanner uses
.B REJECT
or
.B yymore()
but that
.I flex
failed to notice the fact, meaning that
.I flex
scanned the first two sections looking for occurrences of these actions
and failed to find any, but somehow you snuck some in (via a #include
file, for example). Make an explicit reference to the action in your
.I flex
input file. (Note that previously
.I flex
supported a
.B %used/%unused
mechanism for dealing with this problem; this feature is still supported
but now deprecated, and will go away soon unless the author hears from
people who can argue compellingly that they need it.)
.LP
.I flex scanner jammed -
a scanner compiled with
.B -s
has encountered an input string which wasn't matched by
any of its rules.
.LP
.I flex input buffer overflowed -
a scanner rule matched a string long enough to overflow the
scanner's internal input buffer (16K bytes - controlled by
.B YY_BUF_MAX
in "lex.skel").
.LP
.I scanner requires -8 flag -
Your scanner specification includes recognizing 8-bit characters and
you did not specify the -8 flag (and your site has not installed flex
with -8 as the default).
.LP
.I
fatal flex scanner internal error--end of buffer missed -
This can occur in a scanner which is reentered after a long-jump
has jumped out (or over) the scanner's activation frame. Before
reentering the scanner, use:
.nf

    yyrestart( yyin );

.fi
.LP
.I too many %t classes! -
You managed to put every single character into its own %t class.
.I flex
requires that at least one of the classes share characters.
.SH AUTHOR
Vern Paxson, with the help of many ideas and much inspiration from
Van Jacobson. Original version by Jef Poskanzer.
.LP
See flexdoc(1) for additional credits and the address to send comments to.
.SH DEFICIENCIES / BUGS
.LP
Some trailing context
patterns cannot be properly matched and generate
warning messages ("Dangerous trailing context"). These are
patterns where the ending of the
first part of the rule matches the beginning of the second
part, such as "zx*/xy*", where the 'x*' matches the 'x' at
the beginning of the trailing context. (Note that the POSIX draft
states that the text matched by such patterns is undefined.)
.LP
For some trailing context rules, parts which are actually fixed-length are
not recognized as such, leading to the performance loss associated
with variable-length trailing context.
In particular, parts using '|' or {n} (such as "foo{3}") are always
considered variable-length.
.LP
Combining trailing context with the special '|' action can result in
.I fixed
trailing context being turned into the more expensive
.I variable
trailing context. This happens, for example, in the following:
.nf

    %%
    abc      |
    xyz/def

.fi
.LP
Use of unput() invalidates yytext and yyleng.
.LP
Use of unput() to push back more text than was matched can
result in the pushed-back text matching a beginning-of-line ('^')
rule even though it didn't come at the beginning of the line
(though this is rare!).
.LP
Pattern-matching of NUL's is substantially slower than matching other
characters.
.LP
.I flex
does not generate correct #line directives for code internal
to the scanner; thus, bugs in
.I lex.skel
yield bogus line numbers.
.LP
Due to both buffering of input and read-ahead, you cannot intermix
calls to <stdio.h> routines, such as, for example,
.B getchar(),
with
.I flex
rules and expect it to work. Call
.B input()
instead.
.LP
The total table entries listed by the
.B -v
flag exclude the number of table entries needed to determine
what rule has been matched. The number of entries is equal
to the number of DFA states if the scanner does not use
.B REJECT,
and somewhat greater than the number of states if it does.
.LP
.B REJECT
cannot be used with the
.I -f
or
.I -F
options.
.LP
Some of the macros, such as
.B yywrap(),
may in the future become functions which live in the
.B -lfl
library. This will doubtless break a lot of code, but may be
required for POSIX-compliance.
.LP
The
.I flex
internal algorithms need documentation.