| 1 | .SH |
| 2 | 0: Introduction |
| 3 | .PP |
| 4 | Yacc provides a general tool for imposing structure on the input to a computer program. |
| 5 | The Yacc user prepares a |
| 6 | specification of the input process; this includes rules |
| 7 | describing the input structure, code to be invoked when these |
| 8 | rules are recognized, and a low-level routine to do the |
| 9 | basic input. |
| 10 | Yacc then generates a function to control the input process. |
| 11 | This function, called a |
| 12 | .I parser , |
| 13 | calls the user-supplied low-level input routine |
| 14 | (the |
| 15 | .I "lexical analyzer" ) |
| 16 | to pick up the basic items |
| 17 | (called |
| 18 | .I tokens ) |
| 19 | from the input stream. |
| 20 | These tokens are organized according to the input structure rules, |
| 21 | called |
| 22 | .I "grammar rules" \|; |
| 23 | when one of these rules has been recognized, |
| 24 | then user code supplied for this rule, an |
| 25 | .I action , |
| 26 | is invoked; actions have the ability to return values and |
| 27 | make use of the values of other actions. |
| 28 | .PP |
| 29 | Yacc is written in a portable dialect of C |
| 30 | .[ |
| 31 | Ritchie Kernighan Language Prentice |
| 32 | .] |
| 33 | and the actions, and output subroutine, are in C as well. |
| 34 | Moreover, many of the syntactic conventions of Yacc follow C. |
| 35 | .PP |
| 36 | The heart of the input specification is a collection of grammar rules. |
| 37 | Each rule describes an allowable structure and gives it a name. |
| 38 | For example, one grammar rule might be |
| 39 | .DS |
| 40 | date : month\_name day \',\' year ; |
| 41 | .DE |
| 42 | Here, |
| 43 | .I date , |
| 44 | .I month\_name , |
| 45 | .I day , |
| 46 | and |
| 47 | .I year |
| 48 | represent structures of interest in the input process; |
| 49 | presumably, |
| 50 | .I month\_name , |
| 51 | .I day , |
| 52 | and |
| 53 | .I year |
| 54 | are defined elsewhere. |
| 55 | The comma ``,'' is enclosed in single quotes; this implies that the |
| 56 | comma is to appear literally in the input. |
| 57 | The colon and semicolon merely serve as punctuation in the rule, and have |
| 58 | no significance in controlling the input. |
| 59 | Thus, with proper definitions, the input |
| 60 | .DS |
| 61 | July 4, 1776 |
| 62 | .DE |
| 63 | might be matched by the above rule. |
| 64 | .PP |
| 65 | An important part of the input process is carried out by the |
| 66 | lexical analyzer. |
| 67 | This user routine reads the input stream, recognizing the lower level structures, |
| 68 | and communicates these tokens |
| 69 | to the parser. |
| 70 | For historical reasons, a structure recognized by the lexical analyzer is called a |
| 71 | .I "terminal symbol" , |
| 72 | while the structure recognized by the parser is called a |
| 73 | .I "nonterminal symbol" . |
| 74 | To avoid confusion, terminal symbols will usually be referred to as |
| 75 | .I tokens . |
| 76 | .PP |
| 77 | There is considerable leeway in deciding whether to recognize structures using the lexical |
| 78 | analyzer or grammar rules. |
| 79 | For example, the rules |
| 80 | .DS |
| 81 | month\_name : \'J\' \'a\' \'n\' ; |
| 82 | month\_name : \'F\' \'e\' \'b\' ; |
| 83 | |
| 84 | . . . |
| 85 | |
| 86 | month\_name : \'D\' \'e\' \'c\' ; |
| 87 | .DE |
| 88 | might be used in the above example. |
| 89 | The lexical analyzer would only need to recognize individual letters, and |
| 90 | .I month\_name |
| 91 | would be a nonterminal symbol. |
| 92 | Such low-level rules tend to waste time and space, and may |
| 93 | complicate the specification beyond Yacc's ability to deal with it. |
| 94 | Usually, the lexical analyzer would |
| 95 | recognize the month names, |
| 96 | and return an indication that a |
| 97 | .I month\_name |
| 98 | was seen; in this case, |
| 99 | .I month\_name |
| 100 | would be a token. |
| 101 | .PP |
| 102 | Literal characters such as ``,'' must also be passed through the lexical |
| 103 | analyzer, and are also considered tokens. |
| 104 | .PP |
| 105 | Specification files are very flexible. |
| 106 | It is realively easy to add to the above example the rule |
| 107 | .DS |
| 108 | date : month \'/\' day \'/\' year ; |
| 109 | .DE |
| 110 | allowing |
| 111 | .DS |
| 112 | 7 / 4 / 1776 |
| 113 | .DE |
| 114 | as a synonym for |
| 115 | .DS |
| 116 | July 4, 1776 |
| 117 | .DE |
| 118 | In most cases, this new rule could be ``slipped in'' to a working system with minimal effort, |
| 119 | and little danger of disrupting existing input. |
| 120 | .PP |
| 121 | The input being read may not conform to the |
| 122 | specifications. |
| 123 | These input errors are detected as early as is theoretically possible with a |
| 124 | left-to-right scan; |
| 125 | thus, not only is the chance of reading and computing with bad |
| 126 | input data substantially reduced, but the bad data can usually be quickly found. |
| 127 | Error handling, |
| 128 | provided as part of the input specifications, |
| 129 | permits the reentry of bad data, |
| 130 | or the continuation of the input process after skipping over the bad data. |
| 131 | .PP |
| 132 | In some cases, Yacc fails to produce a parser when given a set of |
| 133 | specifications. |
| 134 | For example, the specifications may be self contradictory, or they may |
| 135 | require a more powerful recognition mechanism than that available to Yacc. |
| 136 | The former cases represent design errors; |
| 137 | the latter cases |
| 138 | can often be corrected |
| 139 | by making |
| 140 | the lexical analyzer |
| 141 | more powerful, or by rewriting some of the grammar rules. |
| 142 | While Yacc cannot handle all possible specifications, its power |
| 143 | compares favorably with similar systems; |
| 144 | moreover, the |
| 145 | constructions which are difficult for Yacc to handle are |
| 146 | also frequently difficult for human beings to handle. |
| 147 | Some users have reported that the discipline of formulating valid |
| 148 | Yacc specifications for their input revealed errors of |
| 149 | conception or design early in the program development. |
| 150 | .PP |
| 151 | The theory underlying Yacc has been described elsewhere. |
| 152 | .[ |
| 153 | Aho Johnson Surveys LR Parsing |
| 154 | .] |
| 155 | .[ |
| 156 | Aho Johnson Ullman Ambiguous Grammars |
| 157 | .] |
| 158 | .[ |
| 159 | Aho Ullman Principles Compiler Design |
| 160 | .] |
| 161 | Yacc has been extensively used in numerous practical applications, |
| 162 | including |
| 163 | .I lint , |
| 164 | .[ |
| 165 | Johnson Lint |
| 166 | .] |
| 167 | the Portable C Compiler, |
| 168 | .[ |
| 169 | Johnson Portable Compiler Theory |
| 170 | .] |
| 171 | and a system for typesetting mathematics. |
| 172 | .[ |
| 173 | Kernighan Cherry typesetting system CACM |
| 174 | .] |
| 175 | .PP |
| 176 | The next several sections describe the |
| 177 | basic process of preparing a Yacc specification; |
| 178 | Section 1 describes the preparation of grammar rules, |
| 179 | Section 2 the preparation of the user supplied actions associated with these rules, |
| 180 | and Section 3 the preparation of lexical analyzers. |
| 181 | Section 4 describes the operation of the parser. |
| 182 | Section 5 discusses various reasons why Yacc may be unable to produce a |
| 183 | parser from a specification, and what to do about it. |
| 184 | Section 6 describes a simple mechanism for |
| 185 | handling operator precedences in arithmetic expressions. |
| 186 | Section 7 discusses error detection and recovery. |
| 187 | Section 8 discusses the operating environment and special features |
| 188 | of the parsers Yacc produces. |
| 189 | Section 9 gives some suggestions which should improve the |
| 190 | style and efficiency of the specifications. |
| 191 | Section 10 discusses some advanced topics, and Section 11 gives |
| 192 | acknowledgements. |
| 193 | Appendix A has a brief example, and Appendix B gives a |
| 194 | summary of the Yacc input syntax. |
| 195 | Appendix C gives an example using some of the more advanced |
| 196 | features of Yacc, and, finally, |
| 197 | Appendix D describes mechanisms and syntax |
| 198 | no longer actively supported, but |
| 199 | provided for historical continuity with older versions of Yacc. |