Commit | Line | Data |
---|---|---|
a7e60862 WJ |
1 | .TH MAWK 1 "Jan 22 1992" "Version 1.1" "USER COMMANDS" |
2 | .\" strings | |
3 | .ds ex \fIexpr\fR | |
4 | .SH NAME | |
5 | mawk \- pattern scanning and text processing language | |
6 | ||
7 | .SH SYNOPSIS | |
8 | .B mawk | |
9 | [\-\fBW | |
10 | .IR option ] | |
11 | [\-\fBF | |
12 | .IR value ] | |
13 | [\-\fBv | |
14 | .IR var=value ] | |
15 | [\-\|\-] 'program text' [file ...] | |
16 | .br | |
17 | .B mawk | |
18 | [\-\fBW | |
19 | .IR option ] | |
20 | [\-\fBF | |
21 | .IR value ] | |
22 | [\-\fBv | |
23 | .IR var=value ] | |
24 | [\-\fBf | |
25 | .IR program-file ] | |
26 | [\-\|\-] [file ...] | |
27 | ||
28 | .SH DESCRIPTION | |
29 | .B mawk | |
30 | is an interpreter for the AWK Programming Language. | |
31 | The AWK language | |
32 | is useful for manipulation of data files, | |
33 | text retrieval and processing, | |
34 | and for prototyping and experimenting with algorithms. | |
35 | .B mawk | |
36 | is a \fInew awk\fR meaning it implements the AWK language as | |
37 | defined in Aho, Kernighan and Weinberger, | |
38 | .I "The AWK Programming Language," | |
39 | Addison-Wesley Publishing, 1988. (Hereafter referred to as | |
40 | the AWK book.) | |
41 | .B mawk | |
42 | conforms to the POSIX 1003.2 | |
43 | (draft 11.2) | |
44 | definition of the AWK language | |
45 | which contains a few features not described in the AWK | |
46 | book, and | |
47 | .B mawk | |
48 | provides a small number of extensions. | |
49 | ||
50 | An AWK program is a sequence of \fIpattern {action}\fR pairs and | |
51 | function definitions. | |
52 | Short programs are entered on the command line | |
53 | usually enclosed in ' ' to avoid shell | |
54 | interpretation. | |
55 | Longer programs can be read in from a | |
56 | file with the \-f option. | |
57 | Data input is read from the list of files on | |
58 | the command line or from standard input when the list is empty. | |
59 | The input is broken into records as determined by the | |
60 | record separator variable, \fBRS\fR. Initially, | |
61 | .B RS | |
62 | = "\\n" and records are synonymous with lines. | |
63 | Each record is compared against each | |
64 | .I pattern | |
65 | and if it matches, the program text for | |
66 | .I "{action}" | |
67 | is executed. | |
68 | ||
69 | .SH OPTIONS | |
70 | ||
71 | .TP \w'\-\fBv'+\w'\fIvar=value'u+2n | |
72 | \-\fBF \fIvalue | |
73 | sets the field separator, \fBFS\fR, to | |
74 | .IR value . | |
75 | ||
76 | .IP "\-\fBf \fIfile" | |
77 | Program text is read from \fIfile\fR instead of from the | |
78 | command line. Multiple \-f options are allowed. | |
79 | ||
80 | .IP "\-\fBv \fIvar=value" | |
81 | assigns | |
82 | .I value | |
83 | to program variable | |
84 | .IR var . | |
85 | ||
86 | .IP "\-\|\-" | |
87 | indicates the unambiguous end of options. | |
88 | .PP | |
89 | The above options will be available with any POSIX-compatible | |
90 | implementation of AWK, and implementation specific options are | |
91 | prefaced with \-W. | |
92 | .B mawk | |
93 | provides four: | |
94 | ||
95 | .TP \w'\-\fBv'+\w'\fIvar=value'u+2n | |
96 | \-\fBW \fRversion | |
97 | .B mawk | |
98 | writes its version and copyright | |
99 | to stdout and compiled limits to | |
100 | stderr and exits 0. | |
101 | .TP | |
102 | \-\fBW \fRdump | |
103 | writes an assembler like listing of the internal | |
104 | representation of the program to stderr. | |
105 | .TP | |
106 | \-\fBW \fRsprintf=\fInum | |
107 | adjusts the size of | |
108 | .B mawk's | |
109 | internal sprintf buffer to | |
110 | .I num | |
111 | bytes. More than rare use of this option indicates | |
112 | .B mawk | |
113 | should be recompiled. | |
114 | .TP | |
115 | \-\fBW \fRposix_space | |
116 | forces | |
117 | .B mawk | |
118 | not to consider '\\n' to be space. | |
119 | ||
120 | .SH "THE AWK LANGUAGE" | |
121 | .SS "\fB1. Program structure" | |
122 | An AWK program is a sequence of | |
123 | .I "pattern {action}" | |
124 | pairs and user | |
125 | function definitions. | |
126 | .PP | |
127 | A pattern can be: | |
128 | .nf | |
129 | .RS | |
130 | \fBBEGIN | |
131 | END\fR | |
132 | expression | |
133 | expression , expression | |
134 | .sp | |
135 | .RE | |
136 | .fi | |
137 | One, but not both, | |
138 | of \fIpattern {action}\fR can be omitted. If | |
139 | .I {action} | |
140 | is omitted it is implicitly { print }. If | |
141 | .I pattern | |
142 | is omitted, then it is implicitly matched. | |
143 | .B BEGIN | |
144 | and | |
145 | .B END | |
146 | patterns require an action. | |
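As a rough illustration of the pattern forms listed above, the following shell sketch runs a small program over six input lines. It assumes a POSIX awk is installed as awk (mawk can be substituted); the input words and the counter n are made up for the example.

```shell
# Feed six lines to a program that uses BEGIN, a range pattern,
# a pattern with no action, and END.
out=$(printf '%s\n' a START b c STOP d | awk '
BEGIN          { print "begin" }   # runs before any input is read
/START/,/STOP/ { n++ }             # range pattern: START through STOP inclusive
/^d$/                              # no action: implicitly { print }
END            { print n }         # runs after the last record
')
printf '%s\n' "$out"
```

The range pattern covers four records (START, b, c, STOP), so the END action reports 4.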
147 | .PP | |
148 | Statements are terminated by newlines, semi-colons or both. | |
149 | Groups of statements such as | |
150 | actions or loop bodies are blocked via { ... } as in C. The | |
151 | last statement in a block doesn't need a terminator. Blank lines | |
152 | have no meaning; an empty statement is terminated with a | |
153 | semi-colon. Long statements | |
154 | can be continued with a backslash, \\\|. A statement can be broken | |
155 | without a backslash after a comma, left brace, &&, ||, | |
156 | .BR do , | |
157 | .BR else , | |
158 | the right parenthesis of an | |
159 | .BR if , | |
160 | .B while | |
161 | or | |
162 | .B for | |
163 | statement, and the | |
164 | right parenthesis of a function definition. | |
165 | A comment starts with # and extends to, but does not include | |
166 | the end of line. | |
167 | .PP | |
168 | The following statements control program flow inside blocks. | |
169 | .RS | |
170 | .PP | |
171 | .B if | |
172 | ( \*(ex ) | |
173 | .I statement | |
174 | .PP | |
175 | .B if | |
176 | ( \*(ex ) | |
177 | .I statement | |
178 | .B else | |
179 | .I statement | |
180 | .PP | |
181 | .B while | |
182 | ( \*(ex ) | |
183 | .I statement | |
184 | .PP | |
185 | .B do | |
186 | .I statement | |
187 | .B while | |
188 | ( \*(ex ) | |
189 | .PP | |
190 | .B for | |
191 | ( | |
192 | \fIopt_expr\fR ; | |
193 | \fIopt_expr\fR ; | |
194 | \fIopt_expr\fR | |
195 | ) | |
196 | .I statement | |
197 | .PP | |
198 | .B for | |
199 | ( \fIvar \fBin \fIarray\fR ) | |
200 | .I statement | |
201 | .PP | |
202 | .B continue | |
203 | .PP | |
204 | .B break | |
205 | .RE | |
206 | .\" | |
207 | .SS "\fB2. Data types, conversion and comparison" | |
208 | There are two basic data types, numeric and string. | |
209 | Numeric constants can be integer like \-2, | |
210 | decimal like 1.08, or in scientific notation like | |
211 | \-1.1e4 or .28E\-3. All numbers are represented internally and all | |
212 | computations are done in floating point arithmetic. | |
213 | So for example, the expression | |
214 | 0.2e2 == 20 | |
215 | is true and true is represented as 1.0. | |
216 | .PP | |
217 | String constants are enclosed in double quotes. | |
218 | .sp | |
219 | .ce | |
220 | "This is a string with a newline at the end.\\n" | |
221 | .sp | |
222 | Strings can be continued across a line by escaping (\\) the newline. | |
223 | The following escape sequences are recognized. | |
224 | .nf | |
225 | .sp | |
226 | \\\\ \\ | |
227 | \\" " | |
228 | \\a alert, ascii 7 | |
229 | \\b backspace, ascii 8 | |
230 | \\t tab, ascii 9 | |
231 | \\n newline, ascii 10 | |
232 | \\v vertical tab, ascii 11 | |
233 | \\f formfeed, ascii 12 | |
234 | \\r carriage return, ascii 13 | |
235 | \\ddd 1, 2 or 3 octal digits for ascii ddd | |
236 | \\xhh 1 or 2 hex digits for ascii hh | |
237 | .sp | |
238 | .fi | |
239 | If you escape any other character \\c, you get \\c, i.e., | |
240 | .B mawk | |
241 | ignores the escape. | |
242 | .PP | |
243 | There are really three basic data types; the third is | |
244 | .I "number and string" | |
245 | which has both a numeric value and a string value | |
246 | at the same time. | |
247 | User defined variables come into existence when first referenced | |
248 | and are initialized to | |
249 | .IR null , | |
250 | a number and string value which has numeric value 0 and string value | |
251 | "". | |
252 | Non-trivial number and string typed data come from input | |
253 | and are typically stored in fields. (See section 4). | |
254 | .PP | |
255 | The type of an expression is determined by its context and automatic | |
256 | type conversion occurs if needed. For example, consider the | |
257 | statements | |
258 | .nf | |
259 | .sp | |
260 | y = x + 2 ; z = x "hello" | |
261 | .sp | |
262 | .fi | |
263 | The value stored in variable y will be typed numeric. | |
264 | If x is not numeric, | |
265 | the value taken from x is converted to numeric before it is added to | |
266 | 2 and stored in y. The value stored in variable z will be typed | |
267 | string, and the value of x will be converted to string if necessary | |
268 | and concatenated with "hello". (Of course, the value and type | |
269 | stored in x is not changed by any conversions.) | |
270 | A string expression is converted to numeric using its longest | |
271 | numeric prefix as with | |
272 | .IR atof (3). | |
273 | A numeric expression is converted to string by replacing | |
274 | .I expr | |
275 | with | |
276 | .BR sprintf(CONVFMT , | |
277 | .IR expr ), | |
278 | unless | |
279 | .I expr | |
280 | can be represented on the host machine as an exact integer, in which case | |
281 | it is converted to \fBsprintf\fR("%d", \*(ex). | |
282 | .B Sprintf() | |
283 | is an AWK built-in that duplicates the functionality of | |
284 | .IR sprintf (3), | |
285 | and | |
286 | .B CONVFMT | |
287 | is a built-in variable used for internal conversion | |
288 | from number to string and initialized to "%.6g". | |
289 | Explicit type conversions can be forced, | |
290 | \*(ex "" | |
291 | is string and | |
292 | .IR expr +0 | |
293 | is numeric. | |
294 | .PP | |
295 | To evaluate, | |
296 | \*(ex\d1\u \fBrel-op \*(ex\d2\u, | |
297 | if both operands are numeric or number and string then the comparison | |
298 | is numeric; if both operands are string the comparison is string. | |
299 | If exactly one operand is string and after trimming spaces and | |
300 | tabs from the front and back the remaining string is entirely | |
301 | numeric in form, then the string is converted to number and the | |
302 | comparison is numeric; otherwise, the numeric operand is converted | |
303 | to string and the comparison is string. | |
304 | The result of a comparison is numeric, 0 or 1. | |
305 | .PP | |
306 | In boolean contexts such as, | |
307 | \fBif\fR ( \*(ex ) \fIstatement\fR, | |
308 | a string expression evaluates true if and only if it is not the | |
309 | empty string ""; | |
310 | numeric values if and only if not numerically zero. | |
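The conversion rules in this section can be tried out with a short sketch like the one below. It assumes a POSIX awk installed as awk (mawk can be substituted) and the default CONVFMT of "%.6g"; the variable names are illustrative only.

```shell
# Show context-driven conversion between number and string.
out=$(awk 'BEGIN {
    x = "3 apples"
    y = x + 2            # numeric context: the numeric prefix "3" is used
    z = x "hello"        # string context: plain concatenation
    s = (0.1 + 0.2) ""   # non-integer, so converted with CONVFMT ("%.6g")
    t = (10 / 2) ""      # exact integer, so converted as with "%d"
    print y
    print z
    print s
    print t
    print (0.2e2 == 20)  # numeric comparison yields 1 for true
}')
printf '%s\n' "$out"
```

Appending "" forces a string and adding 0 forces a number, which is the explicit conversion idiom described above.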
311 | .\" | |
312 | .SS "\fB3. Regular expressions" | |
313 | In the AWK language, records, fields and strings are often | |
314 | tested for matching a | |
315 | .IR "regular expression" . | |
316 | Regular expressions are enclosed in slashes, and | |
317 | .nf | |
318 | .sp | |
319 | \*(ex ~ /\fIr\fR/ | |
320 | .sp | |
321 | .fi | |
322 | is an AWK expression that evaluates to 1 if \*(ex "matches" | |
323 | .IR r , | |
324 | which means a substring of \*(ex is in the set of strings | |
325 | defined by | |
326 | .IR r . | |
327 | With no match the expression evaluates to 0; replacing | |
328 | ~ with the "not match" operator, !~ , reverses the meaning. | |
329 | As pattern-action pairs, | |
330 | .nf | |
331 | .sp | |
332 | /\fIr\fR/ { \fIaction\fR } and\ | |
333 | \fB$0\fR ~ /\fIr\fR/ { \fIaction\fR } | |
334 | .sp | |
335 | .fi | |
336 | are the same, | |
337 | and for each input record that matches | |
338 | .IR r , | |
339 | .I action | |
340 | is executed. | |
341 | In fact, /\fIr\fR/ is an AWK expression that is | |
342 | equivalent to (\fB$0\fR ~ /\fIr\fR/) anywhere except when on the | |
343 | right side of a match operator or passed as an argument to | |
344 | a built-in function that expects a regular expression | |
345 | argument. | |
346 | .PP | |
347 | AWK uses extended regular expressions as with | |
348 | .IR egrep (1). | |
349 | The regular expression metacharacters, i.e., those with special | |
350 | meaning in regular expressions are | |
351 | .nf | |
352 | .sp | |
353 | \\ ^ $ . [ ] | ( ) * + ? | |
354 | .sp | |
355 | .fi | |
356 | Regular expressions are built up from characters as follows: | |
357 | .RS | |
358 | .TP \w'[^c\d1\uc\d2\uc\d3\u...]'u+1n | |
359 | \fIc\fR | |
360 | matches any non-metacharacter | |
361 | .IR c . | |
362 | .IP "\e\fIc\fR" | |
363 | matches a character defined by the same escape sequences used | |
364 | in string constants or the literal | |
365 | character | |
366 | .I c | |
367 | if | |
368 | \\\fIc\fR | |
369 | is not an escape sequence. | |
370 | .IP \. | |
371 | matches any character (including newline). | |
372 | .TP | |
373 | ^ | |
374 | matches the front of a string. | |
375 | .TP | |
376 | $ | |
377 | matches the back of a string. | |
378 | .TP | |
379 | [c\d1\uc\d2\uc\d3\u...] | |
380 | matches any character in the class | |
381 | c\d1\uc\d2\uc\d3\u... . An interval of characters is denoted | |
382 | c\d1\u\-c\d2\u inside a class [...]. | |
383 | .TP | |
384 | [^c\d1\uc\d2\uc\d3\u...] | |
385 | matches any character not in the class | |
386 | c\d1\uc\d2\uc\d3\u... | |
387 | .RE | |
388 | .sp | |
389 | Regular expressions are built up from other regular expressions | |
390 | as follows: | |
391 | .RS | |
392 | .TP | |
393 | \fIr\fR\d1\u\fIr\fR\d2\u | |
394 | matches | |
395 | \fIr\fR\d1\u | |
396 | followed immediately by | |
397 | \fIr\fR\d2\u | |
398 | (concatenation). | |
399 | .TP | |
400 | \fIr\fR\d1\u | \fIr\fR\d2\u | |
401 | matches | |
402 | \fIr\fR\d1\u or | |
403 | \fIr\fR\d2\u | |
404 | (alternation). | |
405 | .TP | |
406 | \fIr\fR* | |
407 | matches \fIr\fR repeated zero or more times. | |
408 | .TP | |
409 | \fIr\fR+ | |
410 | matches \fIr\fR repeated one or more times. | |
411 | .TP | |
412 | \fIr\fR? | |
413 | matches \fIr\fR zero or once. | |
414 | .TP | |
415 | (\fIr\fR) | |
416 | matches \fIr\fR, providing grouping. | |
417 | .RE | |
418 | .sp | |
419 | The increasing precedence of operators is alternation, | |
420 | concatenation and | |
421 | unary (*, + or ?). | |
422 | .PP | |
423 | For example, | |
424 | .nf | |
425 | .sp | |
426 | /^[_a\-zA\-Z][_a\-zA\-Z0\-9]*$/ and | |
427 | /^[\-+]?([0\-9]+\\\|.?|\\\|.[0\-9])[0\-9]*([eE][\-+]?[0\-9]+)?$/ | |
428 | .sp | |
429 | .fi | |
430 | are matched by AWK identifiers and AWK numeric constants | |
431 | respectively. Note that . has to be escaped to be | |
432 | recognized as a decimal point, and that metacharacters are not | |
433 | special inside character classes. | |
434 | .PP | |
435 | Any expression can be used on the right hand side of the ~ or !~ | |
436 | operators or | |
437 | passed to a built-in that expects | |
438 | a regular expression. | |
439 | If needed, it is converted to string, and then interpreted | |
440 | as a regular expression. For example, | |
441 | .nf | |
442 | .sp | |
443 | BEGIN { identifier = "[_a\-zA\-Z][_a\-zA\-Z0\-9]*" } | |
444 | ||
445 | $0 ~ "^" identifier | |
446 | .sp | |
447 | .fi | |
448 | prints all lines that start with an AWK identifier. | |
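A variation on the example above, anchoring the string-built regular expression at both ends so that only whole-line identifiers match. This is a sketch that assumes a POSIX awk installed as awk (mawk can be substituted); the three input lines are invented for illustration.

```shell
# Build a regular expression from strings at run time and match with ~.
out=$(printf '%s\n' 'count = 1' '9lives' '_tmp' | awk '
BEGIN { identifier = "[_a-zA-Z][_a-zA-Z0-9]*" }
$0 ~ "^" identifier "$" { print "identifier:", $0; next }
                        { print "other:", $0 }
')
printf '%s\n' "$out"
```

Because the right-hand side of ~ is an ordinary expression, the concatenation is evaluated first and the resulting string is then interpreted as a regular expression.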
449 | .PP | |
450 | .B mawk | |
451 | recognizes the empty regular expression, //\|, which matches the | |
452 | empty string and hence is matched by any string at the front, | |
453 | back and between every character. For example, | |
454 | .nf | |
455 | .sp | |
456 | echo abc | mawk '{ gsub(//, "X") ; print }' | |
457 | XaXbXcX | |
458 | .sp | |
459 | .fi | |
460 | .\" | |
461 | .SS "\fB4. Records and fields" | |
462 | Records are read in one at a time, and stored in the | |
463 | .I field | |
464 | variable | |
465 | .BR $0 . | |
466 | The record is split into | |
467 | .I fields | |
468 | which are stored in | |
469 | .BR $1 , | |
470 | .BR $2 ", ...," | |
471 | .BR $NF . | |
472 | The built-in variable | |
473 | .B NF | |
474 | is set to the number of fields, | |
475 | and | |
476 | .B NR | |
477 | and | |
478 | .B FNR | |
479 | are incremented by 1. | |
480 | Fields above | |
481 | .B $NF | |
482 | are set to "". | |
483 | .PP | |
484 | Assignment to | |
485 | .B $0 | |
486 | causes the fields and | |
487 | .B NF | |
488 | to be recomputed. | |
489 | Assignment to | |
490 | .B NF | |
491 | or to a field | |
492 | causes | |
493 | .B $0 | |
494 | to be reconstructed by | |
495 | concatenating the | |
496 | .B $i's | |
497 | separated by | |
498 | .BR OFS . | |
499 | Assignment to a field with index greater than | |
500 | .BR NF , | |
501 | increases | |
502 | .B NF | |
503 | and causes | |
504 | .B $0 | |
505 | to be reconstructed. | |
506 | .PP | |
507 | Data input stored in fields | |
508 | is string, unless the entire field has numeric | |
509 | form, in which case the type is number and string. | |
510 | For example, | |
511 | .sp | |
512 | .nf | |
513 | echo 24 24E | | |
514 | mawk '{ print($1>100, $1>"100", $2>100, $2>"100") }' | |
515 | 0 0 1 1 | |
516 | .fi | |
517 | .sp | |
518 | .B $0 | |
519 | and | |
520 | .B $2 | |
521 | are string and | |
522 | .B $1 | |
523 | is number and string. The first | |
524 | and second comparisons are numeric and the last | |
525 | two are string. In the second "100" is | |
526 | converted to 100, and in the third 100 is | |
527 | converted to "100". | |
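The effect of assigning to fields and to $0 can be seen in a small sketch such as the following. It assumes a POSIX awk installed as awk (mawk can be substituted); the input record and the choice of "-" for OFS are arbitrary.

```shell
# Demonstrate how field and $0 assignments recompute each other.
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } {
    $2 = "X"     # assigning a field rebuilds $0 using OFS
    print $0
    $5 = "e"     # index above NF: NF becomes 5 and $4 is created empty
    print NF
    print $0
    $0 = "p q"   # assigning $0 re-splits the record
    print NF, $2
}')
printf '%s\n' "$out"
```

Note that the empty $4 still contributes a separator, which is why two adjacent "-" characters appear in the rebuilt record.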
528 | .\" | |
529 | .SS "\fB5. Expressions and operators" | |
530 | .PP | |
531 | The expression syntax is | |
532 | similar to C. Primary expressions are numeric constants, | |
533 | string constants, variables, fields, arrays and functions. | |
534 | The identifier | |
535 | for a variable, array or function can be a sequence of | |
536 | letters, digits and underscores, that does | |
537 | not start with a digit. | |
538 | Variables are not declared; they exist when first referenced and | |
539 | are initialized to | |
540 | .IR null . | |
541 | .PP | |
542 | New | |
543 | expressions are composed with the following operators in | |
544 | order of increasing precedence. | |
545 | .PP | |
546 | .RS | |
547 | .nf | |
548 | .vs +2p \" open up a little | |
549 | \fIassignment\fR = += \-= *= /= %= ^= | |
550 | \fIconditional\fR ? : | |
551 | \fIlogical or\fR || | |
552 | \fIlogical and\fR && | |
553 | \fIarray membership\fR \fBin | |
554 | \fImatching\fR ~ !~ | |
555 | \fIrelational\fR < > <= >= == != | |
556 | \fIconcatenation\fR (no explicit operator) | |
557 | \fIadd ops\fR + \- | |
558 | \fImul ops\fR * / % | |
559 | \fIunary\fR + \- | |
560 | \fIlogical not\fR ! | |
561 | \fIexponentiation\fR ^ | |
562 | \fIinc and dec\fR ++ \-\|\- (both post and pre) | |
563 | \fIfield\fR $ | |
564 | .vs | |
565 | .RE | |
566 | .PP | |
567 | .fi | |
568 | Assignment, conditional and exponentiation associate right to | |
569 | left; the other operators associate left to right. Any | |
570 | expression can be parenthesized. | |
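A few consequences of the precedence and associativity rules above, as a sketch. It assumes a POSIX awk installed as awk (mawk can be substituted); the expressions are chosen only to make the grouping visible.

```shell
# Each print shows how the operator table groups an expression.
out=$(awk 'BEGIN {
    print 2 ^ 3 ^ 2      # right associative: 2 ^ (3 ^ 2)
    print -2 ^ 2         # ^ binds tighter than unary minus: -(2 ^ 2)
    print 1 " " 2 + 3    # + binds tighter than concatenation: 1 " " (2 + 3)
    x = y = 4            # assignment associates right to left
    print x + y
}')
printf '%s\n' "$out"
```

When in doubt, parenthesizing is the safer choice, particularly around concatenation, which has no visible operator.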
571 | .\" | |
572 | .SS "\fB6. Arrays" | |
573 | .ds ae \fIarray\fR[\fIexpr\fR] | |
574 | Awk provides one-dimensional arrays. Array elements are expressed | |
575 | as \*(ae. | |
576 | .I Expr | |
577 | is internally converted to string type, so, for example, | |
578 | A[1] and A["1"] are the same element and the actual | |
579 | index is "1". | |
580 | Arrays indexed by strings are called associative arrays. | |
581 | Initially an array is empty; elements exist when first accessed. | |
582 | An expression, | |
583 | \fIexpr\fB in\fI array\fR | |
584 | evaluates to 1 if | |
585 | \*(ae | |
586 | exists, else to 0. | |
587 | .PP | |
588 | There is a form of the | |
589 | .B for | |
590 | statement that loops over each index of an array. | |
591 | .nf | |
592 | .sp | |
593 | \fBfor\fR ( \fIvar\fB in \fIarray \fR) \fIstatement\fR | |
594 | .sp | |
595 | .fi | |
596 | sets | |
597 | .I var | |
598 | to each index of | |
599 | .I array | |
600 | and executes | |
601 | .IR statement . | |
602 | The order that | |
603 | .I var | |
604 | traverses the indices of | |
605 | .I array | |
606 | is not defined. | |
607 | .PP | |
608 | The statement, | |
609 | .B delete | |
610 | \*(ae, | |
611 | causes | |
612 | \*(ae | |
613 | not to exist. | |
614 | .PP | |
615 | Multidimensional arrays are synthesized with concatenation using | |
616 | the built-in variable | |
617 | .BR SUBSEP . | |
618 | \fIarray\fR[\fIexpr\fR\d1\u,\|\fIexpr\fR\d2\u] | |
619 | is equivalent to | |
620 | \fIarray\fR[\fIexpr\fR\d1\u \fBSUBSEP \fIexpr\fR\d2\u]. | |
621 | Testing for a multidimensional element uses a parenthesized index, | |
622 | such as | |
623 | .sp | |
624 | .nf | |
625 | if ( (i, j) in A ) print A[i, j] | |
626 | .fi | |
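The array behavior described in this section can be checked with a sketch like this one. It assumes a POSIX awk installed as awk (mawk can be substituted); the array name A and its contents are illustrative.

```shell
# Exercise string indexing, the in operator, SUBSEP and delete.
out=$(awk 'BEGIN {
    A[1] = "one"
    print (("1" in A) ? "same element" : "different")  # index 1 is stored as "1"
    A["x", "y"] = "pair"
    print ((("x", "y") in A) ? "found" : "missing")    # parenthesized index test
    print (("x" SUBSEP "y") in A)                      # same element, spelled out
    delete A[1]
    print (1 in A)                                     # 0 after the delete
    n = 0
    for (k in A) n++                                   # count remaining indices
    print n
}')
printf '%s\n' "$out"
```

Testing with in does not create the element, whereas merely referencing A[1] in an expression would.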
627 | .sp | |
628 | .\" | |
629 | .SS "\fB7. Built-in variables\fR" | |
630 | .PP | |
631 | The following variables are built-in and initialized before program | |
632 | execution. | |
633 | .RS | |
634 | .TP \w'FILENAME'u+2n | |
635 | .B ARGC | |
636 | number of command line arguments. | |
637 | .TP | |
638 | .B ARGV | |
639 | array of command line arguments, 0..ARGC\-1. | |
640 | .TP | |
641 | .B CONVFMT | |
642 | format for internal conversion of numbers to string, | |
643 | initially = "%.6g". | |
644 | .TP | |
645 | .B ENVIRON | |
646 | array indexed by environment variables. An environment string, | |
647 | \fIvar=value\fR is stored as | |
648 | \fBENVIRON\fR[\fIvar\fR] = | |
649 | .IR value . | |
650 | .TP | |
651 | .B FILENAME | |
652 | name of the current input file. | |
653 | .TP | |
654 | .B FNR | |
655 | current record number in | |
656 | .BR FILENAME . | |
657 | .TP | |
658 | .B FS | |
659 | splits records into fields as a regular expression. | |
660 | .TP | |
661 | .B NF | |
662 | number of fields in the current record. | |
663 | .TP | |
664 | .B NR | |
665 | current record number in the total input stream. | |
666 | .TP | |
667 | .B OFMT | |
668 | format for printing numbers; initially = "%.6g". | |
669 | .TP | |
670 | .B OFS | |
671 | inserted between fields on output, initially = " ". | |
672 | .TP | |
673 | .B ORS | |
674 | terminates each record on output, initially = "\\n". | |
675 | .TP | |
676 | .B RLENGTH | |
677 | length set by the last call to the built-in function, | |
678 | .BR match() . | |
679 | .TP | |
680 | .B RS | |
681 | input record separator, initially = "\\n". | |
682 | .TP | |
683 | .B RSTART | |
684 | index set by the last call to | |
685 | .BR match() . | |
686 | .TP | |
687 | .B SUBSEP | |
688 | used to build multiple array subscripts, initially = "\\034". | |
689 | .RE | |
690 | .\" | |
691 | .SS "\fB8. Built-in functions" | |
692 | String functions | |
693 | .RS | |
694 | .TP | |
695 | gsub(\fIr,s,t\fR) gsub(\fIr,s\fR) | |
696 | Global substitution, every match of regular expression | |
697 | .I r | |
698 | in variable | |
699 | .I t | |
700 | is replaced by string | |
701 | .IR s . | |
702 | The number of replacements is returned. | |
703 | If | |
704 | .I t | |
705 | is omitted, | |
706 | .B $0 | |
707 | is used. An & in the replacement string | |
708 | .I s | |
709 | is replaced by the matched substring of | |
710 | .IR t . | |
711 | \\& puts a literal & in the replacement string. | |
712 | .TP | |
713 | index(\fIs,t\fR) | |
714 | If | |
715 | .I t | |
716 | is a substring of | |
717 | .IR s , | |
718 | then the position where | |
719 | .I t | |
720 | starts is returned, else 0 is returned. | |
721 | The first character of | |
722 | .I s | |
723 | is in position 1. | |
724 | .TP | |
725 | length(\fIs\fR) length() | |
726 | Returns the length of string | |
727 | .IR s ; | |
728 | without an argument, returns the length of | |
729 | .BR $0 . | |
730 | .TP | |
731 | match(\fIs,r\fR) | |
732 | Returns the index of the leftmost longest match of regular expression | |
733 | .I r | |
734 | in string | |
735 | .IR s . | |
736 | Returns 0 if no match. | |
737 | As a side effect, | |
738 | .B RSTART | |
739 | is set to the return value. | |
740 | .B RLENGTH | |
741 | is set to the length of the match or \-1 if no match. If the | |
742 | empty string is matched, | |
743 | .B RLENGTH | |
744 | is set to 0, and 1 is returned if the match is at the front, and | |
745 | length(\fIs\fR)+1 is returned if the match is at the back. | |
746 | .TP | |
747 | split(\fIs,A,r\fR) split(\fIs,A\fR) | |
748 | String | |
749 | .I s | |
750 | is split into fields by regular expression | |
751 | .I r | |
752 | and the fields are loaded into array | |
753 | .IR A . | |
754 | The number of fields | |
755 | is returned. See section 11 below for more detail. | |
756 | If | |
757 | .I r | |
758 | is omitted, | |
759 | .B FS | |
760 | is used. | |
761 | .TP | |
762 | sprintf(\fIformat,expr-list\fR) | |
763 | Returns a string constructed from | |
764 | .I expr-list | |
765 | according to | |
766 | .IR format . | |
767 | See the description of printf() below. | |
768 | .TP | |
769 | sub(\fIr,s,t\fR) sub(\fIr,s\fR) | |
770 | Single substitution, same as gsub() except at most one substitution. | |
771 | .TP | |
772 | substr(\fIs,i,n\fR) substr(\fIs,i\fR) | |
773 | Returns the substring of string | |
774 | .IR s , | |
775 | starting at index | |
776 | .IR i , | |
777 | of length | |
778 | .IR n . | |
779 | If | |
780 | .I n | |
781 | is omitted, the suffix of | |
782 | .IR s , | |
783 | starting at | |
784 | .I i | |
785 | is returned. | |
786 | .TP | |
787 | tolower(\fIs\fR) | |
788 | Returns a copy of | |
789 | .I s | |
790 | with all upper case characters converted to lower case. | |
791 | .TP | |
792 | toupper(\fIs\fR) | |
793 | Returns a copy of | |
794 | .I s | |
795 | with all lower case characters converted to upper case. | |
796 | .RE | |
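A short sketch exercising several of the string functions listed above. It assumes a POSIX awk installed as awk (mawk can be substituted); the sample string and variable names are made up.

```shell
# Try index, substr, match, gsub and toupper on one sample string.
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor")          # position of the substring, counting from 1
    print substr(s, 7)             # suffix starting at index 7
    print substr(s, 1, 5)          # five characters starting at index 1
    m = match(s, /o+/)             # also sets RSTART and RLENGTH
    print m, RSTART, RLENGTH
    n = gsub(/o/, "[&]", s)        # & stands for the matched text
    print n, s
    print toupper(substr(s, 1, 4))
}')
printf '%s\n' "$out"
```

The call to match() is stored in a variable first so that RSTART and RLENGTH are certain to be set before they are printed.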
797 | .PP | |
798 | Arithmetic functions | |
799 | .RS | |
800 | .PP | |
801 | .nf | |
802 | atan2(\fIy,x\fR) Arctan of \fIy\fR/\fIx\fR between \-\(*p and \(*p. | |
803 | .PP | |
804 | cos(\fIx\fR) Cosine function, \fIx\fR in radians. | |
805 | .PP | |
806 | exp(\fIx\fR) Exponential function. | |
807 | .PP | |
808 | int(\fIx\fR) Returns \fIx\fR truncated towards zero. | |
809 | .PP | |
810 | log(\fIx\fR) Natural logarithm. | |
811 | .PP | |
812 | rand() Returns a random number between zero and one. | |
813 | .PP | |
814 | sin(\fIx\fR) Sine function, \fIx\fR in radians. | |
815 | .PP | |
816 | sqrt(\fIx\fR) Returns square root of \fIx\fR. | |
817 | .fi | |
818 | .TP | |
819 | srand(\fIexpr\fR) srand() | |
820 | Seeds the random number generator, using the clock if | |
821 | .I expr | |
822 | is omitted, and returns the value of the previous seed. | |
823 | .B mawk | |
824 | seeds the random number generator from the clock at startup | |
825 | so there is no real need to call srand(). Srand(\fIexpr\fR) | |
826 | is useful for repeating pseudo random sequences. | |
827 | .RE | |
828 | .\" | |
829 | .SS "\fB9. Input and output" | |
830 | There are two output statements, | |
831 | .B print | |
832 | and | |
833 | .BR printf . | |
834 | .RS | |
835 | .TP | |
836 | print | |
837 | writes | |
838 | .B "$0 ORS" | |
839 | to standard output. | |
840 | .TP | |
841 | print \*(ex\d1\u, \*(ex\d2\u, ..., \*(ex\dn\u | |
842 | writes | |
843 | \*(ex\d1\u \fBOFS \*(ex\d2\u \fBOFS\fR ... \*(ex\dn\u | |
844 | .B ORS | |
845 | to standard output. Numeric expressions are converted to | |
846 | string with | |
847 | .BR OFMT . | |
848 | .TP | |
849 | printf \fIformat, expr-list\fR | |
850 | duplicates the printf C library function writing to standard output. | |
851 | The complete ANSI C format specifications are recognized with | |
852 | conversions %c, %d, %e, %E, %f, %g, %G, | |
853 | %i, %o, %s, %u, %x, %X and %%, | |
854 | and conversion qualifiers h and l. | |
855 | .RE | |
856 | .PP | |
857 | The argument list to print or printf can optionally be enclosed in | |
858 | parentheses. | |
859 | Print formats numbers using | |
860 | .B OFMT | |
861 | or "%d" for exact integers. | |
862 | "%c" with a numeric argument prints the corresponding 8 bit | |
863 | character, with a string argument it prints the first character of | |
864 | the string. | |
865 | The output of print and printf can be redirected to a file or | |
866 | command by appending > | |
867 | .IR file , | |
868 | >> | |
869 | .I file | |
870 | or | |
871 | | | |
872 | .I command | |
873 | to the end of the print statement. | |
874 | Redirection opens | |
875 | .I file | |
876 | or | |
877 | .I command | |
878 | only once; subsequent redirections append to the already open stream. | |
879 | By convention, | |
880 | .B mawk | |
881 | associates the filename "/dev/stderr" with stderr which allows | |
882 | print and printf to be redirected to stderr. | |
883 | .PP | |
884 | The input function | |
885 | .B getline | |
886 | has the following variations. | |
887 | .RS | |
888 | .TP | |
889 | getline | |
890 | reads into | |
891 | .BR $0 , | |
892 | updates the fields, | |
893 | .BR NF , | |
894 | .B NR | |
895 | and | |
896 | .BR FNR . | |
897 | .TP | |
898 | getline < \fIfile\fR | |
899 | reads into | |
900 | .B $0 | |
901 | from \fIfile\fR, | |
902 | updates the fields and | |
903 | .BR NF . | |
904 | .TP | |
905 | getline \fIvar | |
906 | reads the next record into | |
907 | .IR var , | |
908 | updates | |
909 | .B NR | |
910 | and | |
911 | .BR FNR . | |
912 | .TP | |
913 | getline \fIvar\fR < \fIfile | |
914 | reads the next record of | |
915 | .I file | |
916 | into | |
917 | .IR var . | |
918 | .TP | |
919 | \fI command\fR | getline | |
920 | pipes a record from | |
921 | .I command | |
922 | into | |
923 | .B $0 | |
924 | and updates the fields and | |
925 | .BR NF . | |
926 | .TP | |
927 | \fI command\fR | getline \fIvar | |
928 | pipes a record from | |
929 | .I command | |
930 | into | |
931 | .IR var . | |
932 | .RE | |
933 | .PP | |
934 | Getline returns 0 on end-of-file, \-1 on error, otherwise 1. | |
935 | .PP | |
936 | Commands on the end of pipes are executed by /bin/sh. | |
937 | .PP | |
938 | The function \fBclose\fR(\*(ex) closes the file or pipe | |
939 | associated with | |
940 | .IR expr . | |
941 | Close returns 0 if | |
942 | .I expr | |
943 | is an open file, | |
944 | the exit status if | |
945 | .I expr | |
946 | is a piped command, and \-1 otherwise. | |
947 | Close() is used to reread a file or command, make sure the other | |
948 | end of an output pipe is finished or conserve file resources. | |
949 | .PP | |
950 | The function | |
951 | \fBsystem\fR(\fIexpr\fR) | |
952 | uses | |
953 | /bin/sh | |
954 | to execute | |
955 | .I expr | |
956 | and returns the exit status of the command | |
957 | .IR expr . | |
958 | Changes made to the | |
959 | .B ENVIRON | |
960 | array are not passed to commands executed with | |
961 | .B system | |
962 | or pipes. | |
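The interaction of getline with close() might be sketched as follows. The example assumes a POSIX awk installed as awk (mawk can be substituted) and a working /bin/sh; the command string and the path /no/such/file are hypothetical and chosen only for illustration.

```shell
# Read a command through a pipe, close it, then read it again.
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"
    n = 0
    while ((cmd | getline line) > 0) n++   # returns 1 per record, 0 at end-of-file
    print n
    close(cmd)                             # without this, the next read would hit EOF
    cmd | getline first                    # the command is run again from the start
    print first
    close(cmd)
    r = (getline line < "/no/such/file")   # assumed not to exist: error gives -1
    print r
}')
printf '%s\n' "$out"
```

Parenthesizing the getline expression before comparing with 0 avoids any ambiguity between the redirection and the relational operator.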
963 | .SS "\fB10. User defined functions" | |
964 | The syntax for a user defined function is | |
965 | .nf | |
966 | .sp | |
967 | \fBfunction\fR name( \fIargs\fR ) { \fIstatements\fR } | |
968 | .sp | |
969 | .fi | |
970 | The function body can contain a return statement | |
971 | .nf | |
972 | .sp | |
973 | \fBreturn\fI opt_expr\fR | |
974 | .sp | |
975 | .fi | |
976 | A return statement is not required. | |
977 | Function calls may be nested or recursive. | |
978 | Functions are passed expressions by value | |
979 | and arrays by reference. | |
980 | Extra arguments serve as local variables | |
981 | and are initialized to | |
982 | .IR null . | |
983 | For example, csplit(\fIs,\|A\fR) puts each character of | |
984 | .I s | |
985 | into array | |
986 | .I A | |
987 | and returns the length of | |
988 | .IR s . | |
989 | .nf | |
990 | .sp | |
991 | function csplit(s, A, n, i) | |
992 | { | |
993 | n = length(s) | |
994 | for( i = 1 ; i <= n ; i++ ) A[i] = substr(s, i, 1) | |
995 | return n | |
996 | } | |
997 | .sp | |
998 | .fi | |
999 | Putting extra space between passed arguments and local | |
1000 | variables is conventional. | |
1001 | Functions can be referenced before they are defined, but the | |
1002 | function name and the '(' of the arguments must touch to | |
1003 | avoid confusion with concatenation. | |
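The calling conventions above (recursion, arrays passed by reference, extra arguments as locals) can be seen together in a sketch like this. It assumes a POSIX awk installed as awk (mawk can be substituted); the functions fact and fill are hypothetical helpers written for the example.

```shell
# fact() is recursive; fill() modifies the caller array and keeps i local.
out=$(awk '
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }
function fill(A, n,    i) {          # i is an extra argument, so it is local
    for (i = 1; i <= n; i++) A[i] = i * i
}
BEGIN {
    fill(sq, 3)                      # sq is passed by reference and gets filled
    print fact(5), sq[3]
    print (i == "")                  # the global i was never touched: still null
}')
printf '%s\n' "$out"
```

Had i not been listed as an extra parameter, the loop in fill() would have assigned to a global variable of the same name.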
1004 | .\" | |
1005 | .SS "\fB11. Splitting strings, records and files" | |
1006 | Awk programs use the same algorithm to | |
1007 | split strings into arrays with split(), and records into fields | |
1008 | on | |
1009 | .BR FS . | |
1010 | .B mawk | |
1011 | uses essentially the same algorithm to split files into | |
1012 | records on | |
1013 | .BR RS . | |
1014 | .PP | |
1015 | Split(\fIexpr,\|A,\|sep\fR) works as follows: | |
1016 | .RS | |
1017 | .TP | |
1018 | (1) | |
1019 | If | |
1020 | .I sep | |
1021 | is omitted, it is replaced by | |
1022 | .BR FS . | |
1023 | .I Sep | |
1024 | can be an expression or regular expression. If it is an | |
1025 | expression of non-string type, it is converted to string. | |
1026 | .TP | |
1027 | (2) | |
1028 | If | |
1029 | .I sep | |
1030 | = " " (a single space), | |
1031 | then <SPACE> is trimmed from the front and back of | |
1032 | .IR expr , | |
1033 | and | |
1034 | .I sep | |
1035 | becomes <SPACE>. | |
1036 | .B mawk | |
1037 | defines <SPACE> as the regular expression | |
1038 | /[\ \\t\\n]+/. | |
1039 | Otherwise | |
1040 | .I sep | |
1041 | is treated as a regular expression, except that meta-characters | |
1042 | are ignored for a string of length 1, | |
1043 | e.g., | |
1044 | split(x, A, "*") and split(x, A, /\\*/) are the same. | |
1045 | .TP | |
1046 | (3) | |
1047 | If \*(ex is not a string, it is converted to string. | |
1048 | If \*(ex is then the empty string "", split() returns 0 | |
1049 | and | |
1050 | .I A | |
1051 | is unchanged. | |
1052 | Otherwise, | |
1053 | all non-overlapping, non-null and longest matches of | |
1054 | .I sep | |
1055 | in | |
1056 | .IR expr , | |
1057 | separate | |
1058 | .I expr | |
1059 | into fields which are loaded into | |
1060 | .IR A . | |
1061 | The fields are placed in | |
1062 | A[1], A[2], ..., A[n] and split() returns n, the number | |
1063 | of fields which is the number | |
1064 | of matches plus one. | |
1065 | Data placed in | |
1066 | .I A | |
1067 | that looks numeric is typed number and string. | |
1068 | .RE | |
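.PP
The rules above can be tried with a short sketch that prints only the value returned by split() for each kind of separator. The function name split_demo, the array names and the command name awk are assumptions for illustration.
.nf
```shell
split_demo() {
    awk 'BEGIN {
        print split("a::b:", A, ":+")     # regular expression: fields a, b and ""
        print split("  a b  ", B, " ")    # single space: trimmed, split on <SPACE>
        print split("a*b", C, "*")        # string of length 1: taken literally
        print split("", D, ":")           # empty string: no fields
    }'
}
```
.fi
Calling split_demo should print 3, 2, 2 and 0 on separate lines.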
1069 | .PP | |
1070 | Splitting records into fields works the same except the | |
1071 | pieces are loaded into | |
1072 | .BR $1 , | |
1073 | \fB$2\fR,..., | |
1074 | .BR $NF . | |
1075 | If | |
1076 | .B $0 | |
1077 | is empty, | |
1078 | .B NF | |
1079 | is set to 0 and all | |
1080 | .B $i | |
1081 | to "". | |
1082 | .PP | |
1083 | .B mawk | |
1084 | splits files into records by the same algorithm, but with the | |
1085 | slight difference that | |
1086 | .B RS | |
1087 | is really a terminator instead of a separator. | |
1088 | (\fBORS\fR is really a terminator too.) | |
1089 | .RS | |
1090 | .PP | |
1091 | E.g., if | |
1092 | .B FS | |
1093 | = ":+" and | |
1094 | .B $0 | |
1095 | = "a::b:" , then | |
1096 | .B NF | |
1097 | = 3 and | |
1098 | .B $1 | |
1099 | = "a", | |
1100 | .B $2 | |
1101 | = "b" and | |
1102 | .B $3 | |
1103 | = "", but | |
1104 | if "a::b:" is the contents of an input file and | |
1105 | .B RS | |
1106 | = ":+", then | |
1107 | there are two records "a" and "b". | |
1108 | .RE | |
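.PP
The example above can be reproduced with the two shell functions sketched below. The names field_demo and record_demo and the command name awk are assumptions for illustration. The second function relies on RS being treated as a regular expression, as the text describes for mawk; an awk that uses only the first character of RS will behave differently.
.nf
```shell
# FS as separator: the trailing ":" leaves an empty third field
field_demo() {
    echo "a::b:" | awk -F ":+" '{ print NF, "[" $1 "]", "[" $2 "]", "[" $3 "]" }'
}

# RS as terminator: the trailing ":" ends the second record
record_demo() {
    printf "a::b:" | awk 'BEGIN { RS = ":+" } { print NR, "[" $0 "]" }'
}
```
.fi
Calling field_demo should print "3 [a] [b] []". By the rule above, record_demo run with mawk should report two records, "a" and "b".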
1109 | .PP | |
1110 | .B RS | |
1111 | = " " is not special. | |
1112 | .\" | |
1113 | .SS "\fB12. Multi-line records" | |
1114 | Since | |
1115 | .B mawk | |
1116 | interprets | |
1117 | .B RS | |
1118 | as a regular expression, multi-line | |
1119 | records are easy. Setting | |
1120 | .B RS | |
1121 | = "\\n\\n+", makes one or more blank | |
1122 | lines separate records. If | |
1123 | .B FS | |
1124 | = " " (the default), then single | |
1125 | newlines, by the rules for <SPACE> above, count as space, so | |
1126 | a single newline acts as a field separator. | |
1127 | .RS | |
1128 | .PP | |
1129 | For example, if a file is "a\ b\\nc\\n\\n", | |
1130 | .B RS | |
1131 | = "\\n\\n+" and | |
1132 | .B FS | |
1133 | = "\ ", then there is one record "a\ b\\nc" with three | |
1134 | fields "a", "b" and "c". Changing | |
1135 | .B FS | |
1136 | = "\\n", gives two | |
1137 | fields "a b" and "c"; changing | |
1138 | .B FS | |
1139 | = "", gives one field | |
1140 | identical to the record. | |
1141 | .RE | |
1142 | .PP | |
1143 | If you want lines with spaces or tabs to be considered blank, | |
1144 | set | |
1145 | .B RS | |
1146 | = "\\n([\ \\t]*\\n)+". | |
1147 | For compatibility with other awks, setting | |
1148 | .B RS | |
1149 | = "" has the same | |
1150 | effect as if blank lines are stripped from the | |
1151 | front and back of files and then records are determined as if | |
1152 | .B RS | |
1153 | = "\\n\\n+". | |
1154 | Posix requires that "\\n" always separates records when | |
1155 | .B RS | |
1156 | = "" regardless of the value of | |
1157 | .BR FS . | |
1158 | .B mawk | |
1159 | does not support this convention, because defining | |
1160 | "\\n" as <SPACE> makes it unnecessary. | |
1161 | .\" | |
1162 | .PP | |
1163 | Most of the time when you change | |
1164 | .B RS | |
1165 | for multi-line records, you | |
1166 | will also want to change | |
1167 | .B ORS | |
1168 | to "\\n\\n" so the record spacing is preserved on output. | |
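.PP
A minimal sketch of multi-line records follows, using RS = "" with the default FS and setting ORS as suggested above. The function name paragraph_demo and the command name awk are assumptions for illustration.
.nf
```shell
paragraph_demo() {
    # two records separated by one blank line
    { echo "a b"; echo c; echo; echo "d e"; } |
    awk 'BEGIN { RS = ""; ORS = "\n\n" }
         { print NR ": " NF " fields" }'
}
```
.fi
Calling paragraph_demo should print "1: 3 fields" and "2: 2 fields", each followed by a blank line. The first record holds three fields because the newline inside it counts as space.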
1169 | .\" | |
1170 | .SS "\fB13. Program execution" | |
1171 | This section describes the order of program execution. | |
1172 | First | |
1173 | .B ARGC | |
1174 | is set to the total number of command line arguments passed to | |
1175 | the execution phase of the program. | |
1176 | .B ARGV[0] | |
1177 | is set to the name of the AWK interpreter and | |
1178 | \fBARGV[1]\fR ... | |
1179 | .B ARGV[ARGC-1] | |
1180 | holds the remaining command line arguments exclusive of | |
1181 | options and program source. | |
1182 | For example with | |
1183 | .nf | |
1184 | .sp | |
1185 | mawk \-f prog v=1 A t=hello B | |
1186 | .sp | |
1187 | .fi | |
1188 | .B ARGC | |
1189 | = 5 with | |
1190 | .B ARGV[0] | |
1191 | = "mawk", | |
1192 | .B ARGV[1] | |
1193 | = "v=1", | |
1194 | .B ARGV[2] | |
1195 | = "A", | |
1196 | .B ARGV[3] | |
1197 | = "t=hello" and | |
1198 | .B ARGV[4] | |
1199 | = "B". | |
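.PP
The same argument list can be inspected from a BEGIN block, as in the sketch below. It uses an inline program in place of \-f prog, so the files A and B need not exist: a program made up entirely of BEGIN blocks never opens input. The function name argv_demo and the command name awk are assumptions for illustration.
.nf
```shell
argv_demo() {
    awk 'BEGIN {
        for (i = 1; i < ARGC; i++) printf "ARGV[%d]=%s ", i, ARGV[i]
        print ARGC
    }' v=1 A t=hello B
}
```
.fi
Calling argv_demo should print "ARGV[1]=v=1 ARGV[2]=A ARGV[3]=t=hello ARGV[4]=B 5".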
1200 | ||
1201 | Next, each | |
1202 | .B BEGIN | |
1203 | block is executed in order. | |
1204 | If the program consists | |
1205 | entirely of | |
1206 | .B BEGIN | |
1207 | blocks, then execution terminates, else | |
1208 | an input stream is opened and execution continues. | |
1209 | If | |
1210 | .B ARGC | |
1211 | equals 1, | |
1212 | the input stream is set to stdin, | |
1213 | else the command line arguments | |
1214 | .BR ARGV[1] " ..." | |
1215 | .B ARGV[ARGC-1] | |
1216 | are examined for a file argument. | |
1217 | .PP | |
1218 | The command line arguments divide into three sets: | |
1219 | file arguments, assignment arguments and empty strings "". | |
1220 | An assignment has the form | |
1221 | \fIvar\fR=\fIstring\fR. | |
1222 | When an | |
1223 | .B ARGV[i] | |
1224 | is examined as a possible file argument, | |
1225 | if it is empty it is skipped; | |
1226 | if it is an assignment argument, the assignment to | |
1227 | .I var | |
1228 | takes place and | |
1229 | .B i | |
1230 | skips to the next argument; | |
1231 | else | |
1232 | .B ARGV[i] | |
1233 | is opened for input. | |
1234 | If it fails to open, execution terminates with exit code 1. | |
1235 | If no command line argument is a file argument, then input | |
1236 | comes from stdin. | |
1237 | Getline in a | |
1238 | .B BEGIN | |
1239 | action opens input. "\-" as a file argument denotes stdin. | |
1240 | .PP | |
1241 | Once an input stream is open, each input record is tested | |
1242 | against each | |
1243 | .IR pattern , | |
1244 | and if it matches, the associated | |
1245 | .I action | |
1246 | is executed. | |
1247 | An expression pattern matches if it is boolean true (see | |
1248 | the end of section 2). | |
1249 | A | |
1250 | .B BEGIN | |
1251 | pattern matches before any input has been read, and | |
1252 | an | |
1253 | .B END | |
1254 | pattern matches after all input has been read. | |
1255 | A range pattern, | |
1256 | \fIexpr\fR1,\|\fIexpr\fR2 , | |
1257 | matches every record between the match of | |
1258 | .IR expr 1 | |
1259 | and the match of | |
1260 | .IR expr 2 | |
1261 | inclusively. | |
1262 | .PP | |
1263 | When end of file occurs on the input stream, the remaining | |
1264 | command line arguments are examined for a file argument, and | |
1265 | if there is one it is opened, else the | |
1266 | .B END | |
1267 | .I pattern | |
1268 | is considered matched | |
1269 | and all | |
1270 | .B END | |
1271 | .I actions | |
1272 | are executed. | |
1273 | .PP | |
1274 | In the example, the assignment | |
1275 | v=1 | |
1276 | takes place after the | |
1277 | .B BEGIN | |
1278 | .I actions | |
1279 | are executed, and | |
1280 | the data placed in | |
1281 | v | |
1282 | is typed number and string. | |
1283 | Input is then read from file A. | |
1284 | On end of file A, | |
1285 | t | |
1286 | is set to the string "hello", | |
1287 | and B is opened for input. | |
1288 | On end of file B, the | |
1289 | .B END | |
1290 | .I actions | |
1291 | are executed. | |
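.PP
The timing of assignment arguments can be observed with the sketch below, which reads one record from stdin (the "\-" argument) between two assignments. The function name assign_demo and the command name awk are assumptions for illustration.
.nf
```shell
assign_demo() {
    echo data |
    awk 'BEGIN { print "begin:", v }
         { print "main:", v }
         END { print "end:", t }' v=1 - t=hello
}
```
.fi
Calling assign_demo should print "begin: " with v still empty, then "main: 1", then "end: hello", because t is assigned when the arguments after stdin are examined, before the END actions run.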
1292 | .PP | |
1293 | Program flow at the | |
1294 | .I pattern | |
1295 | .I {action} | |
1296 | level can be changed with the | |
1297 | .nf | |
1298 | .sp | |
1299 | \fBnext\fR and | |
1300 | \fBexit \fIopt_expr\fR | |
1301 | .sp | |
1302 | .fi | |
1303 | statements. | |
1304 | A | |
1305 | .B next | |
1306 | statement | |
1307 | causes the next input record to be read and pattern testing | |
1308 | to restart with the first | |
1309 | .I "pattern {action}" | |
1310 | pair in the program. | |
1311 | An | |
1312 | .B exit | |
1313 | statement | |
1314 | causes immediate execution of the | |
1315 | .B END | |
1316 | actions or program termination if there are none or | |
1317 | if the | |
1318 | .B exit | |
1319 | occurs in an | |
1320 | .B END | |
1321 | action. | |
1322 | The | |
1323 | .I opt_expr | |
1324 | sets the exit value of the program unless overridden by | |
1325 | a later | |
1326 | .B exit | |
1327 | or subsequent error. | |
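.PP
A short sketch of both statements follows: next drops the first record, and exit stops reading at the third record, runs the END action and sets the exit value. The function name flow_demo and the command name awk are assumptions for illustration.
.nf
```shell
flow_demo() {
    { echo skip; echo keep; echo stop; echo late; } |
    awk '/skip/ { next }
         /stop/ { exit 3 }
         { print "seen", $0 }
         END { print "END after", NR, "records" }'
    echo "exit status $?"
}
```
.fi
Calling flow_demo should print "seen keep", "END after 3 records" and "exit status 3"; the record "late" is never read.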
1328 | ||
1329 | .SH EXAMPLES | |
1330 | .nf | |
1331 | 1. emulate cat. | |
1332 | ||
1333 | { print } | |
1334 | ||
1335 | 2. emulate wc. | |
1336 | ||
1337 | { chars += length($0) + 1 # add one for the \\n | |
1338 | words += NF | |
1339 | } | |
1340 | ||
1341 | END{ print NR, words, chars } | |
1342 | ||
1343 | 3. count the number of unique "real words". | |
1344 | ||
1345 | BEGIN { FS = "[^A-Za-z]+" } | |
1346 | ||
1347 | { for(i = 1 ; i <= NF ; i++) word[$i] = "" } | |
1348 | ||
1349 | END { delete word[""] | |
1350 | for ( i in word ) cnt++ | |
1351 | print cnt | |
1352 | } | |
1353 | ||
1354 | .fi | |
1355 | 4. sum the second field of | |
1356 | every record based on the first field. | |
1357 | .nf | |
1358 | ||
1359 | $1 ~ /credit\||\|gain/ { sum += $2 } | |
1360 | $1 ~ /debit\||\|loss/ { sum \-= $2 } | |
1361 | ||
1362 | END { print sum } | |
1363 | ||
1364 | 5. sort a file, comparing as string | |
1365 | ||
1366 | { line[NR] = $0 "" } # make sure of comparison type | |
1367 | # in case some lines look numeric | |
1368 | ||
1369 | END { isort(line, NR) | |
1370 | for(i = 1 ; i <= NR ; i++) print line[i] | |
1371 | } | |
1372 | ||
1373 | #insertion sort of A[1..n] | |
1374 | function isort( A, n, i, j, hold) | |
1375 | { | |
1376 | for( i = 2 ; i <= n ; i++) | |
1377 | { | |
1378 | hold = A[j = i] | |
1379 | while ( A[j\-1] > hold ) | |
1380 | { j\-\|\- ; A[j+1] = A[j] } | |
1381 | A[j] = hold | |
1382 | } | |
1383 | # sentinel A[0] = "" will be created if needed | |
1384 | } | |
1385 | ||
1386 | .fi | |
1387 | ||
1388 | .SH "COMPATIBILITY ISSUES" | |
1389 | The Posix 1003.2 (draft 11.2) definition of the AWK language | |
1390 | is AWK as described in the AWK book with a few extensions | |
1391 | that appeared in SystemVR4 nawk. The extensions are: | |
1392 | .sp | |
1393 | .RS | |
1394 | New functions: toupper() and tolower(). | |
1395 | ||
1396 | New variables: ENVIRON[\|] and CONVFMT. | |
1397 | ||
1398 | ANSI C conversion specifications for printf() and sprintf(). | |
1399 | ||
1400 | New command options: \-v var=value, multiple \-f options and | |
1401 | implementation options as arguments to \-W. | |
1402 | .RE | |
1403 | .sp | |
1404 | Posix AWK is oriented to operate on files a line at | |
1405 | a time. | |
1406 | .B RS | |
1407 | can be changed from "\\n" to another single character, | |
1408 | but it | |
1409 | is hard to find any use for this \(em there are no | |
1410 | examples in the AWK book. | |
1411 | By convention, \fBRS\fR = "", makes one or more blank lines | |
1412 | separate records, allowing multi-line records. When | |
1413 | \fBRS\fR = "", "\\n" is always a field separator | |
1414 | regardless of the value in | |
1415 | .BR FS . | |
1416 | .PP | |
1417 | .BR mawk , | |
1418 | on the other hand, | |
1419 | allows | |
1420 | .B RS | |
1421 | to be a regular expression. | |
1422 | When "\\n" appears in records, it is treated as space, and | |
1423 | .B FS | |
1424 | always determines fields. | |
1425 | .PP | |
1426 | Removing the line-at-a-time paradigm can make some programs | |
1427 | simpler and can | |
1428 | often improve performance. For example, | |
1429 | redoing example 3 from above, | |
1430 | .nf | |
1431 | .sp | |
1432 | BEGIN { RS = "[^A-Za-z]+" } | |
1433 | ||
1434 | { word[ $0 ] = "" } | |
1435 | ||
1436 | END { delete word[ "" ] | |
1437 | for( i in word ) cnt++ | |
1438 | print cnt | |
1439 | } | |
1440 | .sp | |
1441 | .fi | |
1442 | counts the number of unique words by making each word a record. | |
1443 | On moderate-size files, | |
1444 | .B mawk | |
1445 | runs this version twice as fast as example 3, because of the simplified inner loop. | |
1446 | .PP | |
1447 | The following program replaces each comment by a single space in | |
1448 | a C program file, | |
1449 | .nf | |
1450 | .sp | |
1451 | BEGIN { | |
1452 | RS = "/\|\\*([^*]\||\|\\*+[^/*])*\\*+/" | |
1453 | # comment is record separator | |
1454 | ORS = " " | |
1455 | getline hold | |
1456 | } | |
1457 | ||
1458 | { print hold ; hold = $0 } | |
1459 | ||
1460 | END { printf "%s" , hold } | |
1461 | .sp | |
1462 | .fi | |
1463 | Buffering one record is needed to avoid terminating the last | |
1464 | record with a space. | |
1465 | .PP | |
1466 | With | |
1467 | .BR mawk , | |
1468 | the following are all equivalent, | |
1469 | .nf | |
1470 | .sp | |
1471 | x ~ /a\\+b/ x ~ "a\\+b" x ~ "a\\\\+b" | |
1472 | .sp | |
1473 | .fi | |
1474 | The strings get scanned twice, once as string and once as | |
1475 | regular expression. On the string scan, | |
1476 | .B mawk | |
1477 | ignores the escape on non-escape characters while the AWK | |
1478 | book advocates | |
1479 | .I \ec | |
1480 | be recognized as | |
1481 | .I c | |
1482 | which necessitates the double escaping of meta-characters in | |
1483 | strings. | |
1484 | Posix explicitly declines to define the behavior, which passively | |
1485 | forces programs that must run under a variety of awks to use | |
1486 | the more portable but less readable, double escape. | |
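.PP
The portable double-escaped form can be sketched as below. The string "a\\+b" is scanned once as a string, leaving a\+b, and then once as a regular expression that matches a literal plus sign. The function name regex_demo, the variable names and the command name awk are assumptions for illustration.
.nf
```shell
regex_demo() {
    awk 'BEGIN {
        literal  = ("a+b" ~ "a\\+b")    # the plus sign is matched literally
        repeated = ("aab" ~ "a\\+b")    # no literal plus sign, so no match
        print literal, repeated
    }'
}
```
.fi
Calling regex_demo should print "1 0".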
1487 | .PP | |
1488 | Posix AWK does not recognize "/dev/stderr" or \\x hex escape | |
1489 | sequences in strings. Unlike ANSI C, | |
1490 | .B mawk | |
1491 | limits the number of digits that follow \\x to two. | |
1492 | .PP | |
1493 | Finally, here is how | |
1494 | .B mawk | |
1495 | handles exceptional cases not discussed in the | |
1496 | AWK book or the Posix draft. It is unsafe to assume | |
1497 | consistency across awks and safe to skip to | |
1498 | the next section. | |
1499 | .PP | |
1500 | .RS | |
1501 | substr(s, i, n) returns the characters of s in the intersection | |
1502 | of the closed interval [1, length(s)] and the half-open interval | |
1503 | [i, i+n). When this intersection is empty, the empty string is | |
1504 | returned; so substr("ABC", 1, 0) = "" and | |
1505 | substr("ABC", \-4, 6) = "A". | |
1506 | ||
1507 | Every string, including the empty string, matches the empty string | |
1508 | at the | |
1509 | front so, s ~ // and s ~ "", are always 1 as is match(s, //) and | |
1510 | match(s, ""). The last two set | |
1511 | .B RLENGTH | |
1512 | to 0. | |
1513 | ||
1514 | index(s, t) is always the same as match(s, t1) where t1 is the | |
1515 | same as t with metacharacters escaped. Hence consistency | |
1516 | with match requires that | |
1517 | index(s, "") always returns 1. | |
1518 | Also the condition, index(s,t) != 0 if and only if t is a substring | |
1519 | of s, requires index("","") = 1. | |
1520 | ||
1521 | If getline encounters end of file, getline var, leaves var | |
1522 | unchanged. Similarly, on entry to the | |
1523 | .B END | |
1524 | actions, | |
1525 | .BR $0 , | |
1526 | the fields and | |
1527 | .B NF | |
1528 | have their value unaltered from the last record. | |
1529 | ||
1530 | .SH SEE ALSO | |
1531 | .I egrep | |
1532 | (1) | |
1533 | .PP | |
1534 | Aho, Kernighan and Weinberger, | |
1535 | .IR "The AWK Programming Language" , | |
1536 | Addison-Wesley Publishing, 1988, (the AWK book), | |
1537 | defines the language, opening with a tutorial | |
1538 | and advancing to many interesting programs that delve into | |
1539 | issues of software design and analysis relevant to programming | |
1540 | in any language. | |
1541 | .PP | |
1542 | .IR "The GAWK Manual" , | |
1543 | The Free Software Foundation, 1991, is a tutorial | |
1544 | and language reference | |
1545 | that does not attempt the depth of the AWK book | |
1546 | and assumes the reader may be a novice programmer. | |
1547 | The section on AWK arrays is excellent. It also | |
1548 | discusses Posix requirements for AWK. | |
1549 | ||
1550 | ||
1551 | .SH BUGS | |
1552 | .B mawk | |
1553 | cannot handle ASCII NUL \\0 in the source or data files. You | |
1554 | can output NUL using printf with %c, and any other 8 bit | |
1555 | character is acceptable input. | |
1556 | ||
1557 | .B mawk | |
1558 | implements printf() and sprintf() using the C library functions, | |
1559 | printf and sprintf, so full ANSI compatibility requires an ANSI | |
1560 | C library. In practice this means the h conversion qualifier may | |
1561 | not be available. Also | |
1562 | .B mawk | |
1563 | inherits any bugs or limitations of the library functions. | |
1564 | ||
1565 | Implementors of the AWK language have shown a consistent lack | |
1566 | of imagination when naming their programs. | |
1567 | ||
1568 | .SH AUTHOR | |
1569 | Mike Brennan (brennan@boeing.com). |