Commit | Line | Data |
---|---|---|
2f9c14bd C |
1 | This is Info file gawk.info, produced by Makeinfo-1.54 from the input |
2 | file gawk.texi. | |
3 | ||
4 | This file documents `awk', a program that you can use to select | |
5 | particular records in a file and perform operations upon them. | |
6 | ||
7 | This is Edition 0.15 of `The GAWK Manual', | |
8 | for the 2.15 version of the GNU implementation | |
9 | of AWK. | |
10 | ||
11 | Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc. | |
12 | ||
13 | Permission is granted to make and distribute verbatim copies of this | |
14 | manual provided the copyright notice and this permission notice are | |
15 | preserved on all copies. | |
16 | ||
17 | Permission is granted to copy and distribute modified versions of | |
18 | this manual under the conditions for verbatim copying, provided that | |
19 | the entire resulting derived work is distributed under the terms of a | |
20 | permission notice identical to this one. | |
21 | ||
22 | Permission is granted to copy and distribute translations of this | |
23 | manual into another language, under the above conditions for modified | |
24 | versions, except that this permission notice may be stated in a | |
25 | translation approved by the Foundation. | |
26 | ||
27 | \1f | |
28 | File: gawk.info, Node: Actions, Next: Expressions, Prev: Patterns, Up: Top | |
29 | ||
30 | Overview of Actions | |
31 | ******************* | |
32 | ||
33 | An `awk' program or script consists of a series of rules and | |
34 | function definitions, interspersed. (Functions are described later. | |
35 | *Note User-defined Functions: User-defined.) | |
36 | ||
37 | A rule contains a pattern and an action, either of which may be | |
38 | omitted. The purpose of the "action" is to tell `awk' what to do once | |
39 | a match for the pattern is found. Thus, the entire program looks | |
40 | somewhat like this: | |
41 | ||
42 | [PATTERN] [{ ACTION }] | |
43 | [PATTERN] [{ ACTION }] | |
44 | ... | |
45 | function NAME (ARGS) { ... } | |
46 | ... | |
47 | ||
48 | An action consists of one or more `awk' "statements", enclosed in | |
49 | curly braces (`{' and `}'). Each statement specifies one thing to be | |
50 | done. The statements are separated by newlines or semicolons. | |
51 | ||
52 | The curly braces around an action must be used even if the action | |
53 | contains only one statement, or even if it contains no statements at | |
54 | all. However, if you omit the action entirely, omit the curly braces as | |
55 | well. (An omitted action is equivalent to `{ print $0 }'.) | |
56 | ||
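As a rough illustration of the two defaults just described, the sketch below runs one rule with the action omitted and one with the pattern omitted. The sample input and the shell variable names (`input', `no_action', `no_pattern') are invented for this example and assume a POSIX shell with an `awk' on the search path.

```shell
# Hypothetical three-line input, used only for this illustration.
input='apple 1
banana 2
cherry 3'

# Pattern with no action: matching records are printed, as with { print $0 }.
no_action=$(printf '%s\n' "$input" | awk '/an/')

# Action with no pattern: the action runs for every input record.
no_pattern=$(printf '%s\n' "$input" | awk '{ print $1 }')

echo "$no_action"
echo "$no_pattern"
```

Only the `banana' record contains `an', so the first command prints that one record whole, while the second prints the first field of all three records.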
57 | Here are the kinds of statements supported in `awk': | |
58 | ||
59 | * Expressions, which can call functions or assign values to variables | |
60 | (*note Expressions as Action Statements: Expressions.). Executing | |
61 | this kind of statement simply computes the value of the expression | |
62 | and then ignores it. This is useful when the expression has side | |
63 | effects (*note Assignment Expressions: Assignment Ops.). | |
64 | ||
65 | * Control statements, which specify the control flow of `awk' | |
66 | programs. The `awk' language gives you C-like constructs (`if', | |
67 | `for', `while', and so on) as well as a few special ones (*note | |
68 | Control Statements in Actions: Statements.). | |
69 | ||
70 | * Compound statements, which consist of one or more statements | |
71 | enclosed in curly braces. A compound statement is used in order | |
72 | to put several statements together in the body of an `if', | |
73 | `while', `do' or `for' statement. | |
74 | ||
75 | * Input control, using the `getline' command (*note Explicit Input | |
76 | with `getline': Getline.), and the `next' statement (*note The | |
77 | `next' Statement: Next Statement.). | |
78 | ||
79 | * Output statements, `print' and `printf'. *Note Printing Output: | |
80 | Printing. | |
81 | ||
82 | * Deletion statements, for deleting array elements. *Note The | |
83 | `delete' Statement: Delete. | |
84 | ||
85 | \1f | |
86 | File: gawk.info, Node: Expressions, Next: Statements, Prev: Actions, Up: Top | |
87 | ||
88 | Expressions as Action Statements | |
89 | ******************************** | |
90 | ||
91 | Expressions are the basic building block of `awk' actions. An | |
92 | expression evaluates to a value, which you can print, test, store in a | |
93 | variable or pass to a function. But beyond that, an expression can | |
94 | assign a new value to a variable or a field, with an assignment | |
95 | operator. | |
96 | ||
97 | An expression can serve as a statement on its own. Most other kinds | |
98 | of statements contain one or more expressions which specify data to be | |
99 | operated on. As in other languages, expressions in `awk' include | |
100 | variables, array references, constants, and function calls, as well as | |
101 | combinations of these with various operators. | |
102 | ||
103 | * Menu: | |
104 | ||
105 | * Constants:: String, numeric, and regexp constants. | |
106 | * Variables:: Variables give names to values for later use. | |
107 | * Arithmetic Ops:: Arithmetic operations (`+', `-', etc.) | |
108 | * Concatenation:: Concatenating strings. | |
109 | * Comparison Ops:: Comparison of numbers and strings | |
110 | with `<', etc. | |
111 | * Boolean Ops:: Combining comparison expressions | |
112 | using boolean operators | |
113 | `||' ("or"), `&&' ("and") and `!' ("not"). | |
114 | ||
115 | * Assignment Ops:: Changing the value of a variable or a field. | |
116 | * Increment Ops:: Incrementing the numeric value of a variable. | |
117 | ||
118 | * Conversion:: The conversion of strings to numbers | |
119 | and vice versa. | |
120 | * Values:: The whole truth about numbers and strings. | |
121 | * Conditional Exp:: Conditional expressions select | |
122 | between two subexpressions under control | |
123 | of a third subexpression. | |
124 | * Function Calls:: A function call is an expression. | |
125 | * Precedence:: How various operators nest. | |
126 | ||
127 | \1f | |
128 | File: gawk.info, Node: Constants, Next: Variables, Prev: Expressions, Up: Expressions | |
129 | ||
130 | Constant Expressions | |
131 | ==================== | |
132 | ||
133 | The simplest type of expression is the "constant", which always has | |
134 | the same value. There are three types of constants: numeric constants, | |
135 | string constants, and regular expression constants. | |
136 | ||
137 | A "numeric constant" stands for a number. This number can be an | |
138 | integer, a decimal fraction, or a number in scientific (exponential) | |
139 | notation. Note that all numeric values are represented within `awk' in | |
140 | double-precision floating point. Here are some examples of numeric | |
141 | constants, which all have the same value: | |
142 | ||
143 | 105 | |
144 | 1.05e+2 | |
145 | 1050e-1 | |
146 | ||
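One way to check that the three spellings above denote the same number is to compare them inside a `BEGIN' rule. This is only a sketch; the shell variable `same' is an assumed name.

```shell
# All three constants stand for the same double-precision value.
same=$(awk 'BEGIN {
    if (105 == 1.05e+2 && 105 == 1050e-1)
        print "equal"
    else
        print "different"
}')
echo "$same"
```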
147 | A string constant consists of a sequence of characters enclosed in | |
148 | double-quote marks. For example: | |
149 | ||
150 | "parrot" | |
151 | ||
152 | represents the string whose contents are `parrot'. Strings in `gawk' | |
153 | can be of any length and they can contain all the possible 8-bit | |
154 | characters including ASCII NUL. Other `awk' implementations may have | |
155 | difficulty with some character codes. | |
156 | ||
157 | Some characters cannot be included literally in a string constant. | |
158 | You represent them instead with "escape sequences", which are character | |
159 | sequences beginning with a backslash (`\'). | |
160 | ||
161 | One use of an escape sequence is to include a double-quote character | |
162 | in a string constant. Since a plain double-quote would end the string, | |
163 | you must use `\"' to represent a single double-quote character as a | |
164 | part of the string. The backslash character itself is another | |
165 | character that cannot be included normally; you write `\\' to put one | |
166 | backslash in the string. Thus, the string whose contents are the two | |
167 | characters `"\' must be written `"\"\\"'. | |
168 | ||
169 | Another use of backslash is to represent unprintable characters such | |
170 | as newline. While there is nothing to stop you from writing most of | |
171 | these characters directly in a string constant, they may look ugly. | |
172 | ||
173 | Here is a table of all the escape sequences used in `awk': | |
174 | ||
175 | `\\' | |
176 | Represents a literal backslash, `\'. | |
177 | ||
178 | `\a' | |
179 | Represents the "alert" character, control-g, ASCII code 7. | |
180 | ||
181 | `\b' | |
182 | Represents a backspace, control-h, ASCII code 8. | |
183 | ||
184 | `\f' | |
185 | Represents a formfeed, control-l, ASCII code 12. | |
186 | ||
187 | `\n' | |
188 | Represents a newline, control-j, ASCII code 10. | |
189 | ||
190 | `\r' | |
191 | Represents a carriage return, control-m, ASCII code 13. | |
192 | ||
193 | `\t' | |
194 | Represents a horizontal tab, control-i, ASCII code 9. | |
195 | ||
196 | `\v' | |
197 | Represents a vertical tab, control-k, ASCII code 11. | |
198 | ||
199 | `\NNN' | |
200 | Represents the octal value NNN, where NNN are one to three digits | |
201 | between 0 and 7. For example, the code for the ASCII ESC (escape) | |
202 | character is `\033'. | |
203 | ||
204 | `\xHH...' | |
205 | Represents the hexadecimal value HH, where HH are hexadecimal | |
206 | digits (`0' through `9' and either `A' through `F' or `a' through | |
207 | `f'). Like the same construct in ANSI C, the escape sequence | |
208 | continues until the first non-hexadecimal digit is seen. However, | |
209 | using more than two hexadecimal digits produces undefined results. | |
210 | (The `\x' escape sequence is not allowed in POSIX `awk'.) | |
211 | ||
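The following sketch exercises a few of the escape sequences from the table. It avoids `\x', since that sequence is not allowed in POSIX `awk'. The octal code `\101' (the ASCII letter `A') and the shell variable names are choices made for this illustration, not part of the manual.

```shell
# The string whose contents are the two characters "\ is written "\"\\".
quoted=$(awk 'BEGIN { print "\"\\" }')

# \t is a single character, so this string has length 3.
tab_len=$(awk 'BEGIN { print length("a\tb") }')

# \101 is the octal code for the ASCII letter A.
octal=$(awk 'BEGIN { print "\101" }')

echo "$quoted $tab_len $octal"
```

The program text is enclosed in single quotes so that the shell passes the backslashes through to `awk' untouched.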
212 | A "constant regexp" is a regular expression description enclosed in | |
213 | slashes, such as `/^beginning and end$/'. Most regexps used in `awk' | |
214 | programs are constant, but the `~' and `!~' operators can also match | |
215 | computed or "dynamic" regexps (*note How to Use Regular Expressions: | |
216 | Regexp Usage.). | |
217 | ||
218 | Constant regexps may be used like simple expressions. When a | |
219 | constant regexp is not on the right hand side of the `~' or `!~' | |
220 | operators, it has the same meaning as if it appeared in a pattern, i.e. | |
221 | `($0 ~ /foo/)' (*note Expressions as Patterns: Expression Patterns.). | |
222 | This means that the two code segments, | |
223 | ||
224 | if ($0 ~ /barfly/ || $0 ~ /camelot/) | |
225 | print "found" | |
226 | ||
227 | and | |
228 | ||
229 | if (/barfly/ || /camelot/) | |
230 | print "found" | |
231 | ||
232 | are exactly equivalent. One rather bizarre consequence of this rule is | |
233 | that the following boolean expression is legal, but does not do what | |
234 | the user intended: | |
235 | ||
236 | if (/foo/ ~ $1) print "found foo" | |
237 | ||
238 | This code is "obviously" testing `$1' for a match against the regexp | |
239 | `/foo/'. But in fact, the expression `(/foo/ ~ $1)' actually means | |
240 | `(($0 ~ /foo/) ~ $1)'. In other words, first match the input record | |
241 | against the regexp `/foo/'. The result will be either a 1 or a 0, | |
242 | depending upon the success or failure of the match. Then match that | |
243 | result against the first field in the record. | |
244 | ||
245 | Since it is unlikely that you would ever really wish to make this | |
246 | kind of test, `gawk' will issue a warning when it sees this construct in | |
247 | a program. | |
248 | ||
249 | Another consequence of this rule is that the assignment statement | |
250 | ||
251 | matches = /foo/ | |
252 | ||
253 | will assign either 0 or 1 to the variable `matches', depending upon the | |
254 | contents of the current input record. | |
255 | ||
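A small sketch of the assignment above, run against two made-up input records. The shell variable names are assumptions for the example.

```shell
# The record contains foo, so the regexp expression has the value 1.
matched=$(echo "a foo here" | awk '{ matches = /foo/; print matches }')

# The record does not contain foo, so the value is 0.
unmatched=$(echo "nothing here" | awk '{ matches = /foo/; print matches }')

echo "$matched $unmatched"
```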
256 | Constant regular expressions are also used as the first argument for | |
257 | the `sub' and `gsub' functions (*note Built-in Functions for String | |
258 | Manipulation: String Functions.). | |
259 | ||
260 | This feature of the language was never well documented until the | |
261 | POSIX specification. | |
262 | ||
263 | You may be wondering, when is | |
264 | ||
265 | $1 ~ /foo/ { ... } | |
266 | ||
267 | preferable to | |
268 | ||
269 | $1 ~ "foo" { ... } | |
270 | ||
271 | Since the right-hand sides of both `~' operators are constants, it | |
272 | is more efficient to use the `/foo/' form: `awk' can note that you have | |
273 | supplied a regexp and store it internally in a form that makes pattern | |
274 | matching more efficient. In the second form, `awk' must first convert | |
275 | the string into this internal form, and then perform the pattern | |
276 | matching. The first form is also better style; it shows clearly that | |
277 | you intend a regexp match. | |
278 | ||
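Both forms select the same records; the difference is in efficiency and style only. A quick sketch, using an invented one-word input:

```shell
# Constant regexp on the right-hand side of ~.
constant_form=$(echo foobar | awk '$1 ~ /foo/ { print "match" }')

# String constant on the right-hand side, used as a dynamic regexp.
string_form=$(echo foobar | awk '$1 ~ "foo" { print "match" }')

echo "$constant_form $string_form"
```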
279 | \1f | |
280 | File: gawk.info, Node: Variables, Next: Arithmetic Ops, Prev: Constants, Up: Expressions | |
281 | ||
282 | Variables | |
283 | ========= | |
284 | ||
285 | Variables let you give names to values and refer to them later. You | |
286 | have already seen variables in many of the examples. The name of a | |
287 | variable must be a sequence of letters, digits and underscores, but it | |
288 | may not begin with a digit. Case is significant in variable names; `a' | |
289 | and `A' are distinct variables. | |
290 | ||
291 | A variable name is a valid expression by itself; it represents the | |
292 | variable's current value. Variables are given new values with | |
293 | "assignment operators" and "increment operators". *Note Assignment | |
294 | Expressions: Assignment Ops. | |
295 | ||
296 | A few variables have special built-in meanings, such as `FS', the | |
297 | field separator, and `NF', the number of fields in the current input | |
298 | record. *Note Built-in Variables::, for a list of them. These | |
299 | built-in variables can be used and assigned just like all other | |
300 | variables, but their values are also used or changed automatically by | |
301 | `awk'. Each built-in variable's name is made entirely of upper case | |
302 | letters. | |
303 | ||
304 | Variables in `awk' can be assigned either numeric or string values. | |
305 | By default, variables are initialized to the null string, which is | |
306 | effectively zero if converted to a number. There is no need to | |
307 | "initialize" each variable explicitly in `awk', the way you would in C | |
308 | or most other traditional languages. | |
309 | ||
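The sketch below illustrates two of the points above: a variable that was never assigned acts as the null string and as zero, and names differing only in case are distinct. The names `never_set', `a' and `A' are arbitrary.

```shell
# An unassigned variable has length 0 as a string and value 0 as a number.
unset_values=$(awk 'BEGIN {
    len = length(never_set)
    num = never_set + 0
    print len, num
}')

# a and A are two separate variables.
case_values=$(awk 'BEGIN { a = 1; A = 2; print a, A }')

echo "$unset_values"
echo "$case_values"
```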
310 | * Menu: | |
311 | ||
312 | * Assignment Options:: Setting variables on the command line | |
313 | and a summary of command line syntax. | |
314 | This is an advanced method of input. | |
315 | ||
316 | \1f | |
317 | File: gawk.info, Node: Assignment Options, Prev: Variables, Up: Variables | |
318 | ||
319 | Assigning Variables on the Command Line | |
320 | --------------------------------------- | |
321 | ||
322 | You can set any `awk' variable by including a "variable assignment" | |
323 | among the arguments on the command line when you invoke `awk' (*note | |
324 | Invoking `awk': Command Line.). Such an assignment has this form: | |
325 | ||
326 | VARIABLE=TEXT | |
327 | ||
328 | With it, you can set a variable either at the beginning of the `awk' | |
329 | run or in between input files. | |
330 | ||
331 | If you precede the assignment with the `-v' option, like this: | |
332 | ||
333 | -v VARIABLE=TEXT | |
334 | ||
335 | then the variable is set at the very beginning, before even the `BEGIN' | |
336 | rules are run. The `-v' option and its assignment must precede all the | |
337 | file name arguments, as well as the program text. | |
338 | ||
339 | Otherwise, the variable assignment is performed at a time determined | |
340 | by its position among the input file arguments: after the processing of | |
341 | the preceding input file argument. For example: | |
342 | ||
343 | awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list | |
344 | ||
345 | prints the value of field number `n' for all input records. Before the | |
346 | first file is read, the command line sets the variable `n' equal to 4. | |
347 | This causes the fourth field to be printed in lines from the file | |
348 | `inventory-shipped'. After the first file has finished, but before the | |
349 | second file is started, `n' is set to 2, so that the second field is | |
350 | printed in lines from `BBS-list'. | |
351 | ||
352 | Command line arguments are made available for explicit examination by | |
353 | the `awk' program in an array named `ARGV' (*note Built-in | |
354 | Variables::.). | |
355 | ||
356 | `awk' processes the values of command line assignments for escape | |
357 | sequences (*note Constant Expressions: Constants.). | |
358 | ||
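To try the two kinds of command line assignment without the manual's sample files, the following sketch builds two throwaway files in a temporary directory. The file names `first' and `second', their contents, and the variable `greeting' are assumptions made for this example; `mktemp -d' is assumed to be available.

```shell
tmpdir=$(mktemp -d)
printf 'a b c d\n' > "$tmpdir/first"
printf 'w x y z\n' > "$tmpdir/second"

# n is 4 while the first file is read and 2 while the second is read.
fields=$(awk '{ print $n }' n=4 "$tmpdir/first" n=2 "$tmpdir/second")

# With -v the variable is already set when the BEGIN rule runs.
early=$(awk -v greeting=hello 'BEGIN { print greeting }')

rm -rf "$tmpdir"
echo "$fields"
echo "$early"
```

The first command prints the fourth field of `first' and then the second field of `second'; the second command shows that a `-v' assignment is visible in the `BEGIN' rule.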
359 | \1f | |
360 | File: gawk.info, Node: Arithmetic Ops, Next: Concatenation, Prev: Variables, Up: Expressions | |
361 | ||
362 | Arithmetic Operators | |
363 | ==================== | |
364 | ||
365 | The `awk' language uses the common arithmetic operators when | |
366 | evaluating expressions. All of these arithmetic operators follow normal | |
367 | precedence rules, and work as you would expect them to. This example | |
368 | divides field three by field four, adds field two, stores the result | |
369 | into field one, and prints the resulting altered input record: | |
370 | ||
371 | awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped | |
372 | ||
373 | The arithmetic operators in `awk' are: | |
374 | ||
375 | `X + Y' | |
376 | Addition. | |
377 | ||
378 | `X - Y' | |
379 | Subtraction. | |
380 | ||
381 | `- X' | |
382 | Negation. | |
383 | ||
384 | `+ X' | |
385 | Unary plus. No real effect on the expression. | |
386 | ||
387 | `X * Y' | |
388 | Multiplication. | |
389 | ||
390 | `X / Y' | |
391 | Division. Since all numbers in `awk' are double-precision | |
392 | floating point, the result is not rounded to an integer: `3 / 4' | |
393 | has the value 0.75. | |
394 | ||
395 | `X % Y' | |
396 | Remainder. The quotient is rounded toward zero to an integer, | |
397 | multiplied by Y and this result is subtracted from X. This | |
398 | operation is sometimes known as "trunc-mod." The following | |
399 | relation always holds: | |
400 | ||
401 | b * int(a / b) + (a % b) == a | |
402 | ||
403 | One possibly undesirable effect of this definition of remainder is | |
404 | that `X % Y' is negative if X is negative. Thus, | |
405 | ||
406 | -17 % 8 = -1 | |
407 | ||
408 | In other `awk' implementations, the signedness of the remainder | |
409 | may be machine dependent. | |
410 | ||
411 | `X ^ Y' | |
412 | `X ** Y' | |
413 | Exponentiation: X raised to the Y power. `2 ^ 3' has the value 8. | |
414 | The character sequence `**' is equivalent to `^'. (The POSIX | |
415 | standard only specifies the use of `^' for exponentiation.) | |
416 | ||
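The values quoted in the table can be reproduced with a few one-line programs. This is a sketch only; the shell variable names are invented, and `**' is left out because POSIX specifies only `^'.

```shell
quotient=$(awk 'BEGIN { print 3 / 4 }')       # division is not rounded: 0.75
remainder=$(awk 'BEGIN { print -17 % 8 }')    # remainder takes the sign of X: -1
power=$(awk 'BEGIN { print 2 ^ 3 }')          # exponentiation: 8

# The relation b * int(a / b) + (a % b) == a, checked for a = -17 and b = 8.
identity=$(awk 'BEGIN { a = -17; b = 8; print (b * int(a / b) + (a % b) == a) }')

echo "$quotient $remainder $power $identity"
```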
417 | \1f | |
418 | File: gawk.info, Node: Concatenation, Next: Comparison Ops, Prev: Arithmetic Ops, Up: Expressions | |
419 | ||
420 | String Concatenation | |
421 | ==================== | |
422 | ||
423 | There is only one string operation: concatenation. It does not have | |
424 | a specific operator to represent it. Instead, concatenation is | |
425 | performed by writing expressions next to one another, with no operator. | |
426 | For example: | |
427 | ||
428 | awk '{ print "Field number one: " $1 }' BBS-list | |
429 | ||
430 | produces, for the first record in `BBS-list': | |
431 | ||
432 | Field number one: aardvark | |
433 | ||
434 | Without the space in the string constant after the `:', the line | |
435 | would run together. For example: | |
436 | ||
437 | awk '{ print "Field number one:" $1 }' BBS-list | |
438 | ||
439 | produces, for the first record in `BBS-list': | |
440 | ||
441 | Field number one:aardvark | |
442 | ||
443 | Since string concatenation does not have an explicit operator, it is | |
444 | often necessary to ensure that it happens where you want it to by | |
445 | enclosing the items to be concatenated in parentheses. For example, the | |
446 | following code fragment does not concatenate `file' and `name' as you | |
447 | might expect: | |
448 | ||
449 | file = "file" | |
450 | name = "name" | |
451 | print "something meaningful" > file name | |
452 | ||
453 | It is necessary to use the following: | |
454 | ||
455 | print "something meaningful" > (file name) | |
456 | ||
457 | We recommend you use parentheses around concatenation in all but the | |
458 | most common contexts (such as in the right-hand operand of `='). | |
459 | ||
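Here is a sketch of the parenthesized form in use. So that nothing is written into the current directory, it prepends a temporary directory passed in with `-v'; the variable `dir' and the use of `mktemp -d' are additions for this example and are not part of the fragment above.

```shell
tmpdir=$(mktemp -d)

# The parentheses make the whole concatenation the target of the redirection.
awk -v dir="$tmpdir" 'BEGIN {
    file = "file"
    name = "name"
    print "something meaningful" > (dir "/" file name)
}'

written=$(cat "$tmpdir/filename")
rm -rf "$tmpdir"
echo "$written"
```

The output lands in a file called `filename', the concatenation of the two variables.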
460 | \1f | |
461 | File: gawk.info, Node: Comparison Ops, Next: Boolean Ops, Prev: Concatenation, Up: Expressions | |
462 | ||
463 | Comparison Expressions | |
464 | ====================== | |
465 | ||
466 | "Comparison expressions" compare strings or numbers for | |
467 | relationships such as equality. They are written using "relational | |
468 | operators", which are a superset of those in C. Here is a table of | |
469 | them: | |
470 | ||
471 | `X < Y' | |
472 | True if X is less than Y. | |
473 | ||
474 | `X <= Y' | |
475 | True if X is less than or equal to Y. | |
476 | ||
477 | `X > Y' | |
478 | True if X is greater than Y. | |
479 | ||
480 | `X >= Y' | |
481 | True if X is greater than or equal to Y. | |
482 | ||
483 | `X == Y' | |
484 | True if X is equal to Y. | |
485 | ||
486 | `X != Y' | |
487 | True if X is not equal to Y. | |
488 | ||
489 | `X ~ Y' | |
490 | True if the string X matches the regexp denoted by Y. | |
491 | ||
492 | `X !~ Y' | |
493 | True if the string X does not match the regexp denoted by Y. | |
494 | ||
495 | `SUBSCRIPT in ARRAY' | |
496 | True if array ARRAY has an element with the subscript SUBSCRIPT. | |
497 | ||
498 | Comparison expressions have the value 1 if true and 0 if false. | |
499 | ||
500 | The rules `gawk' uses for performing comparisons are based on those | |
501 | in draft 11.2 of the POSIX standard. The POSIX standard introduced the | |
502 | concept of a "numeric string", which is simply a string that looks like | |
503 | a number, for example, `" +2"'. | |
504 | ||
505 | When performing a relational operation, `gawk' considers the type of | |
506 | an operand to be the type it received on its last *assignment*, rather | |
507 | than the type of its last *use* (*note Numeric and String Values: | |
508 | Values.). This type is *unknown* when the operand is from an | |
509 | "external" source: field variables, command line arguments, array | |
510 | elements resulting from a `split' operation, and the value of an | |
511 | `ENVIRON' element. In this case only, if the operand is a numeric | |
512 | string, then it is considered to be of both string type and numeric | |
513 | type. If at least one operand of a comparison is of string type only, | |
514 | then a string comparison is performed. Any numeric operand will be | |
515 | converted to a string using the value of `CONVFMT' (*note Conversion of | |
516 | Strings and Numbers: Conversion.). If one operand of a comparison is | |
517 | numeric, and the other operand is either numeric or both numeric and | |
518 | string, then `gawk' does a numeric comparison. If both operands have | |
519 | both types, then the comparison is numeric. Strings are compared by | |
520 | comparing the first character of each, then the second character of | |
521 | each, and so on. Thus `"10"' is less than `"9"'. If there are two | |
522 | strings where one is a prefix of the other, the shorter string is less | |
523 | than the longer one. Thus `"abc"' is less than `"abcd"'. | |
524 | ||
525 | Here are some sample expressions, how `gawk' compares them, and what | |
526 | the result of the comparison is. | |
527 | ||
528 | `1.5 <= 2.0' | |
529 | numeric comparison (true) | |
530 | ||
531 | `"abc" >= "xyz"' | |
532 | string comparison (false) | |
533 | ||
534 | `1.5 != " +2"' | |
535 | string comparison (true) | |
536 | ||
537 | `"1e2" < "3"' | |
538 | string comparison (true) | |
539 | ||
540 | `a = 2; b = "2"' | |
541 | `a == b' | |
542 | string comparison (true) | |
543 | ||
544 | echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }' | |
545 | ||
546 | prints `false' since both `$1' and `$2' are numeric strings and thus | |
547 | have both string and numeric types, thus dictating a numeric comparison. | |
548 | ||
549 | The purpose of the comparison rules and the use of numeric strings is | |
550 | to attempt to produce the behavior that is "least surprising," while | |
551 | still "doing the right thing." | |
552 | ||
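The difference between string constants, numeric constants and numeric strings read from input can be seen side by side in the following sketch. The input `10 9' and the shell variable names are assumptions for the illustration.

```shell
# String constants are of string type only, so they compare character by character.
as_strings=$(awk 'BEGIN { result = ("10" < "9"); print result }')

# Numeric constants compare numerically.
as_numbers=$(awk 'BEGIN { result = (10 < 9); print result }')

# Fields that look like numbers are numeric strings, so the comparison is numeric.
as_fields=$(echo "10 9" | awk '{ result = ($1 < $2); print result }')

echo "$as_strings $as_numbers $as_fields"
```

Only the comparison of the two string constants is true, because `1' sorts before `9'.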
553 | String comparisons and regular expression comparisons are very | |
554 | different. For example, | |
555 | ||
556 | $1 == "foo" | |
557 | ||
558 | has the value of 1, or is true, if the first field of the current input | |
559 | record is precisely `foo'. By contrast, | |
560 | ||
561 | $1 ~ /foo/ | |
562 | ||
563 | has the value 1 if the first field contains `foo', such as `foobar'. | |
564 | ||
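A brief sketch of the two tests applied to the same field, using `foobar' as a made-up input:

```shell
# String comparison: the field must be precisely foo.
exact=$(echo foobar | awk '{ r = ($1 == "foo"); print r }')

# Regexp match: the field need only contain foo.
contains=$(echo foobar | awk '{ r = ($1 ~ /foo/); print r }')

echo "$exact $contains"
```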
565 | The right hand operand of the `~' and `!~' operators may be either a | |
566 | constant regexp (`/.../'), or it may be an ordinary expression, in | |
567 | which case the value of the expression as a string is a dynamic regexp | |
568 | (*note How to Use Regular Expressions: Regexp Usage.). | |
569 | ||
570 | In very recent implementations of `awk', a constant regular | |
571 | expression in slashes by itself is also an expression. The regexp | |
572 | `/REGEXP/' is an abbreviation for this comparison expression: | |
573 | ||
574 | $0 ~ /REGEXP/ | |
575 | ||
576 | In some contexts it may be necessary to write parentheses around the | |
577 | regexp to avoid confusing the `gawk' parser. For example, `(/x/ - /y/) | |
578 | > threshold' is not allowed, but `((/x/) - (/y/)) > threshold' parses | |
579 | properly. | |
580 | ||
581 | One special place where `/foo/' is *not* an abbreviation for `$0 ~ | |
582 | /foo/' is when it is the right-hand operand of `~' or `!~'! *Note | |
583 | Constant Expressions: Constants, where this is discussed in more detail. | |
584 | ||
585 | \1f | |
586 | File: gawk.info, Node: Boolean Ops, Next: Assignment Ops, Prev: Comparison Ops, Up: Expressions | |
587 | ||
588 | Boolean Expressions | |
589 | =================== | |
590 | ||
591 | A "boolean expression" is a combination of comparison expressions or | |
592 | matching expressions, using the boolean operators "or" (`||'), "and" | |
593 | (`&&'), and "not" (`!'), along with parentheses to control nesting. | |
594 | The truth of the boolean expression is computed by combining the truth | |
595 | values of the component expressions. | |
596 | ||
597 | Boolean expressions can be used wherever comparison and matching | |
598 | expressions can be used. They can be used in `if', `while', `do' and | |
599 | `for' statements. They have numeric values (1 if true, 0 if false), | |
600 | which come into play if the result of the boolean expression is stored | |
601 | in a variable, or used in arithmetic. | |
602 | ||
603 | In addition, every boolean expression is also a valid boolean | |
604 | pattern, so you can use it as a pattern to control the execution of | |
605 | rules. | |
606 | ||
607 | Here are descriptions of the three boolean operators, with an | |
608 | example of each. It may be instructive to compare these examples with | |
609 | the analogous examples of boolean patterns (*note Boolean Operators and | |
610 | Patterns: Boolean Patterns.), which use the same boolean operators in | |
611 | patterns instead of expressions. | |
612 | ||
613 | `BOOLEAN1 && BOOLEAN2' | |
614 | True if both BOOLEAN1 and BOOLEAN2 are true. For example, the | |
615 | following statement prints the current input record if it contains | |
616 | both `2400' and `foo'. | |
617 | ||
618 | if ($0 ~ /2400/ && $0 ~ /foo/) print | |
619 | ||
620 | The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is true. | |
621 | This can make a difference when BOOLEAN2 contains expressions that | |
622 | have side effects: in the case of `$0 ~ /foo/ && ($2 == bar++)', | |
623 | the variable `bar' is not incremented if there is no `foo' in the | |
624 | record. | |
625 | ||
626 | `BOOLEAN1 || BOOLEAN2' | |
627 | True if at least one of BOOLEAN1 or BOOLEAN2 is true. For | |
628 | example, the following command prints all records in the input | |
629 | file `BBS-list' that contain *either* `2400' or `foo', or both. | |
630 | ||
631 | awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list | |
632 | ||
633 | The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is false. | |
634 | This can make a difference when BOOLEAN2 contains expressions | |
635 | that have side effects. | |
636 | ||
637 | `!BOOLEAN' | |
638 | True if BOOLEAN is false. For example, the following program | |
639 | prints all records in the input file `BBS-list' that do *not* | |
640 | contain the string `foo'. | |
641 | ||
642 | awk '{ if (! ($0 ~ /foo/)) print }' BBS-list | |
643 | ||
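The rule that BOOLEAN2 is evaluated only when needed can be observed through a side effect. In this sketch the variable `bar' is incremented by the right-hand operand, and its value is printed after the result of the whole expression. The constants 0 and 1 stand in for a false and a true left-hand operand.

```shell
# bar++ is never evaluated, because the left operand of && is already false.
and_result=$(awk 'BEGIN { bar = 0; r = (0 && bar++); print r, bar }')

# bar++ is never evaluated, because the left operand of || is already true.
or_result=$(awk 'BEGIN { bar = 0; r = (1 || bar++); print r, bar }')

# Here the left operand is true, so bar++ is evaluated and bar becomes 1.
evaluated=$(awk 'BEGIN { bar = 0; r = (1 && bar++); print r, bar }')

echo "$and_result / $or_result / $evaluated"
```

In the last case the result is 0 because `bar++' yields the old value of `bar', which is 0, but the increment has still taken place.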
644 | \1f | |
645 | File: gawk.info, Node: Assignment Ops, Next: Increment Ops, Prev: Boolean Ops, Up: Expressions | |
646 | ||
647 | Assignment Expressions | |
648 | ====================== | |
649 | ||
650 | An "assignment" is an expression that stores a new value into a | |
651 | variable. For example, let's assign the value 1 to the variable `z': | |
652 | ||
653 | z = 1 | |
654 | ||
655 | After this expression is executed, the variable `z' has the value 1. | |
656 | Whatever old value `z' had before the assignment is forgotten. | |
657 | ||
658 | Assignments can store string values also. For example, this would | |
659 | store the value `"this food is good"' in the variable `message': | |
660 | ||
661 | thing = "food" | |
662 | predicate = "good" | |
663 | message = "this " thing " is " predicate | |
664 | ||
665 | (This also illustrates concatenation of strings.) | |
666 | ||
667 | The `=' sign is called an "assignment operator". It is the simplest | |
668 | assignment operator because the value of the right-hand operand is | |
669 | stored unchanged. | |
670 | ||
671 | Most operators (addition, concatenation, and so on) have no effect | |
672 | except to compute a value. If you ignore the value, you might as well | |
673 | not use the operator. An assignment operator is different; it does | |
674 | produce a value, but even if you ignore the value, the assignment still | |
675 | makes itself felt through the alteration of the variable. We call this | |
676 | a "side effect". | |
677 | ||
678 | The left-hand operand of an assignment need not be a variable (*note | |
679 | Variables::.); it can also be a field (*note Changing the Contents of a | |
680 | Field: Changing Fields.) or an array element (*note Arrays in `awk': | |
681 | Arrays.). These are all called "lvalues", which means they can appear | |
682 | on the left-hand side of an assignment operator. The right-hand | |
683 | operand may be any expression; it produces the new value which the | |
684 | assignment stores in the specified variable, field or array element. | |
685 | ||
686 | It is important to note that variables do *not* have permanent types. | |
687 | The type of a variable is simply the type of whatever value it happens | |
688 | to hold at the moment. In the following program fragment, the variable | |
689 | `foo' has a numeric value at first, and a string value later on: | |
690 | ||
691 | foo = 1 | |
692 | print foo | |
693 | foo = "bar" | |
694 | print foo | |
695 | ||
696 | When the second assignment gives `foo' a string value, the fact that it | |
697 | previously had a numeric value is forgotten. | |
698 | ||
699 | An assignment is an expression, so it has a value: the same value | |
700 | that is assigned. Thus, `z = 1' as an expression has the value 1. One | |
701 | consequence of this is that you can write multiple assignments together: | |
702 | ||
703 | x = y = z = 0 | |
704 | ||
705 | stores the value 0 in all three variables. It does this because the | |
706 | value of `z = 0', which is 0, is stored into `y', and then the value of | |
707 | `y = z = 0', which is 0, is stored into `x'. | |
708 | ||
709 | You can use an assignment anywhere an expression is called for. For | |
710 | example, it is valid to write `x != (y = 1)' to set `y' to 1 and then | |
711 | test whether `x' equals 1. But this style tends to make programs hard | |
712 | to read; except in a one-shot program, you should rewrite it to get rid | |
713 | of such nesting of assignments. This is never very hard. | |
714 | ||
715 | Aside from `=', there are several other assignment operators that do | |
716 | arithmetic with the old value of the variable. For example, the | |
717 | operator `+=' computes a new value by adding the right-hand value to | |
718 | the old value of the variable. Thus, the following assignment adds 5 | |
719 | to the value of `foo': | |
720 | ||
721 | foo += 5 | |
722 | ||
723 | This is precisely equivalent to the following: | |
724 | ||
725 | foo = foo + 5 | |
726 | ||
727 | Use whichever one makes the meaning of your program clearer. | |
728 | ||
729 | Here is a table of the arithmetic assignment operators. In each | |
730 | case, the right-hand operand is an expression whose value is converted | |
731 | to a number. | |
732 | ||
733 | `LVALUE += INCREMENT' | |
734 | Adds INCREMENT to the value of LVALUE to make the new value of | |
735 | LVALUE. | |
736 | ||
737 | `LVALUE -= DECREMENT' | |
738 | Subtracts DECREMENT from the value of LVALUE. | |
739 | ||
740 | `LVALUE *= COEFFICIENT' | |
741 | Multiplies the value of LVALUE by COEFFICIENT. | |
742 | ||
743 | `LVALUE /= QUOTIENT' | |
744 | Divides the value of LVALUE by QUOTIENT. | |
745 | ||
746 | `LVALUE %= MODULUS' | |
747 | Sets LVALUE to its remainder by MODULUS. | |
748 | ||
749 | `LVALUE ^= POWER' | |
750 | `LVALUE **= POWER' | |
751 | Raises LVALUE to the power POWER. (Only the `^=' operator is | |
752 | specified by POSIX.) | |
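
The operators in the table can be tried out with a short sketch such as the following. The variable names `n' and `assign_demo' are illustrative only; they do not come from the manual. Each step applies one operator to the value left by the step before it.

```shell
# Apply each arithmetic assignment operator in turn to the variable n.
# (n and assign_demo are names chosen for this sketch.)
assign_demo=$(awk 'BEGIN {
    n = 10
    n += 5;  printf "%d ", n     # 10 + 5
    n -= 3;  printf "%d ", n     # 15 - 3
    n *= 2;  printf "%d ", n     # 12 * 2
    n /= 4;  printf "%d ", n     # 24 / 4
    n %= 4;  printf "%d ", n     # remainder of 6 by 4
    n ^= 3;  printf "%d\n", n    # 2 to the power 3
    x = y = z = 0                # chained assignment groups right-to-left
    print x, y, z
}')
printf '%s\n' "$assign_demo"
```

The first output line shows the value of `n' after each step; the second shows that the chained assignment stored 0 in all three variables.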
753 | ||
754 | \1f | |
755 | File: gawk.info, Node: Increment Ops, Next: Conversion, Prev: Assignment Ops, Up: Expressions | |
756 | ||
757 | Increment Operators | |
758 | =================== | |
759 | ||
760 | "Increment operators" increase or decrease the value of a variable | |
761 | by 1. You could do the same thing with an assignment operator, so the | |
762 | increment operators add no power to the `awk' language; but they are | |
763 | convenient abbreviations for something very common. | |
764 | ||
765 | The operator to add 1 is written `++'. It can be used to increment | |
766 | a variable either before or after taking its value. | |
767 | ||
768 | To pre-increment a variable V, write `++V'. This adds 1 to the | |
769 | value of V and that new value is also the value of this expression. | |
770 | The assignment expression `V += 1' is completely equivalent. | |
771 | ||
772 | Writing the `++' after the variable specifies post-increment. This | |
773 | increments the variable value just the same; the difference is that the | |
774 | value of the increment expression itself is the variable's *old* value. | |
775 | Thus, if `foo' has the value 4, then the expression `foo++' has the | |
776 | value 4, but it changes the value of `foo' to 5. | |
777 | ||
778 | The post-increment `foo++' is nearly equivalent to writing `(foo += | |
779 | 1) - 1'. It is not perfectly equivalent because all numbers in `awk' | |
780 | are floating point: in floating point, `foo + 1 - 1' does not | |
781 | necessarily equal `foo'. But the difference is minute as long as you | |
782 | stick to numbers that are fairly small (less than a trillion). | |
783 | ||
784 | Any lvalue can be incremented. Fields and array elements are | |
785 | incremented just like variables. (Use `$(i++)' when you wish to do a | |
786 | field reference and a variable increment at the same time. The | |
787 | parentheses are necessary because of the precedence of the field | |
788 | reference operator, `$'.) | |
789 | ||
790 | The decrement operator `--' works just like `++' except that it | |
791 | subtracts 1 instead of adding. Like `++', it can be used before the | |
792 | lvalue to pre-decrement or after it to post-decrement. | |
793 | ||
794 | Here is a summary of increment and decrement expressions. | |
795 | ||
796 | `++LVALUE' | |
797 | This expression increments LVALUE and the new value becomes the | |
798 | value of this expression. | |
799 | ||
800 | `LVALUE++' | |
801 | This expression causes the contents of LVALUE to be incremented. | |
802 | The value of the expression is the *old* value of LVALUE. | |
803 | ||
804 | `--LVALUE' | |
805 | Like `++LVALUE', but instead of adding, it subtracts. It | |
806 | decrements LVALUE and delivers the value that results. | |
807 | ||
808 | `LVALUE--' | |
809 | Like `LVALUE++', but instead of adding, it subtracts. It | |
810 | decrements LVALUE. The value of the expression is the *old* value | |
811 | of LVALUE. | |
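
As a small sketch of the summary above, the following fragment prints the value of each kind of increment and decrement expression in turn. The variable `incr_demo' is a name chosen here to hold the output.

```shell
# Show the value delivered by each increment and decrement form.
incr_demo=$(awk 'BEGIN {
    foo = 4
    print foo++      # post-increment delivers the old value
    print foo        # foo has been incremented
    print ++foo      # pre-increment delivers the new value
    print foo--      # post-decrement delivers the old value
    print --foo      # pre-decrement delivers the new value
}')
printf '%s\n' "$incr_demo"
```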
812 | ||
813 | \1f | |
814 | File: gawk.info, Node: Conversion, Next: Values, Prev: Increment Ops, Up: Expressions | |
815 | ||
816 | Conversion of Strings and Numbers | |
817 | ================================= | |
818 | ||
819 | Strings are converted to numbers, and numbers to strings, if the | |
820 | context of the `awk' program demands it. For example, if the value of | |
821 | either `foo' or `bar' in the expression `foo + bar' happens to be a | |
822 | string, it is converted to a number before the addition is performed. | |
823 | If numeric values appear in string concatenation, they are converted to | |
824 | strings. Consider this: | |
825 | ||
826 | two = 2; three = 3 | |
827 | print (two three) + 4 | |
828 | ||
829 | This eventually prints the (numeric) value 27. The numeric values of | |
830 | the variables `two' and `three' are converted to strings and | |
831 | concatenated together, and the resulting string is converted back to the | |
832 | number 23, to which 4 is then added. | |
833 | ||
834 | If, for some reason, you need to force a number to be converted to a | |
835 | string, concatenate the null string with that number. To force a string | |
836 | to be converted to a number, add zero to that string. | |
837 | ||
838 | A string is converted to a number by interpreting a numeric prefix | |
839 | of the string as numerals: `"2.5"' converts to 2.5, `"1e3"' converts to | |
840 | 1000, and `"25fix"' has a numeric value of 25. Strings that can't be | |
841 | interpreted as valid numbers are converted to zero. | |
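
The conversions described so far can be checked with a fragment like this one. It is only a sketch; `conv_demo' and `n' are names chosen for the illustration. Adding zero forces each string to be treated as a number.

```shell
# String-to-number conversion: a numeric prefix is used, otherwise zero.
conv_demo=$(awk 'BEGIN {
    two = 2; three = 3
    n = (two three) + 4    # concatenation gives "23", then 23 + 4
    print n
    print "2.5" + 0
    print "1e3" + 0
    print "25fix" + 0      # only the numeric prefix counts
    print "fix" + 0        # no numeric prefix at all
}')
printf '%s\n' "$conv_demo"
```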
842 | ||
843 | The exact manner in which numbers are converted into strings is | |
844 | controlled by the `awk' built-in variable `CONVFMT' (*note Built-in | |
845 | Variables::.). Numbers are converted using a special version of the | |
846 | `sprintf' function (*note Built-in Functions: Built-in.) with `CONVFMT' | |
847 | as the format specifier. | |
848 | ||
849 | `CONVFMT''s default value is `"%.6g"', which prints a value with at | |
850 | most six significant digits. For some applications you will want to | |
851 | change it to specify more precision. Double precision on most modern | |
852 | machines gives you 16 or 17 decimal digits of precision. | |
853 | ||
854 | Strange results can happen if you set `CONVFMT' to a string that | |
855 | doesn't tell `sprintf' how to format floating point numbers in a useful | |
856 | way. For example, if you forget the `%' in the format, all numbers | |
857 | will be converted to the same constant string. | |
858 | ||
859 | As a special case, if a number is an integer, then the result of | |
860 | converting it to a string is *always* an integer, no matter what the | |
861 | value of `CONVFMT' may be. Given the following code fragment: | |
862 | ||
863 | CONVFMT = "%2.2f" | |
864 | a = 12 | |
865 | b = a "" | |
866 | ||
867 | `b' has the value `"12"', not `"12.00"'. | |
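
To see the special case next to the ordinary one, the following sketch converts an integer and a non-integer under the same `CONVFMT'. The format `"%.2f"' and the variable names are assumptions made for the illustration.

```shell
# CONVFMT applies to non-integer values only.
convfmt_demo=$(awk 'BEGIN {
    CONVFMT = "%.2f"
    a = 12
    b = a ""          # integer value: CONVFMT is not applied
    pi = 3.14159
    c = pi ""         # non-integer value: formatted with CONVFMT
    print b
    print c
}')
printf '%s\n' "$convfmt_demo"
```

Because `b' and `c' are already strings when they are printed, `print' shows them exactly as the conversion produced them.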
868 | ||
869 | Prior to the POSIX standard, `awk' specified that the value of | |
870 | `OFMT' was used for converting numbers to strings. `OFMT' specifies | |
871 | the output format to use when printing numbers with `print'. `CONVFMT' | |
872 | was introduced in order to separate the semantics of conversions from | |
873 | the semantics of printing. Both `CONVFMT' and `OFMT' have the same | |
874 | default value: `"%.6g"'. In the vast majority of cases, old `awk' | |
875 | programs will not change their behavior. However, this use of `OFMT' | |
876 | is something to keep in mind if you must port your program to other | |
877 | implementations of `awk'; we recommend that instead of changing your | |
878 | programs, you just port `gawk' itself! | |
879 | ||
880 | \1f | |
881 | File: gawk.info, Node: Values, Next: Conditional Exp, Prev: Conversion, Up: Expressions | |
882 | ||
883 | Numeric and String Values | |
884 | ========================= | |
885 | ||
886 | Through most of this manual, we present `awk' values (such as | |
887 | constants, fields, or variables) as *either* numbers *or* strings. | |
888 | This is a convenient way to think about them, since typically they are | |
889 | used in only one way, or the other. | |
890 | ||
891 | In truth though, `awk' values can be *both* string and numeric, at | |
892 | the same time. Internally, `awk' represents values with a string, a | |
893 | (floating point) number, and an indication that one, the other, or both | |
894 | representations of the value are valid. | |
895 | ||
896 | Keeping track of both kinds of values is important for execution | |
897 | efficiency: a variable can acquire a string value the first time it is | |
898 | used as a string, and then that string value can be used until the | |
899 | variable is assigned a new value. Thus, if a variable with only a | |
900 | numeric value is used in several concatenations in a row, it only has | |
901 | to be given a string representation once. The numeric value remains | |
902 | valid, so that no conversion back to a number is necessary if the | |
903 | variable is later used in an arithmetic expression. | |
904 | ||
905 | Tracking both kinds of values is also important for precise numerical | |
906 | calculations. Consider the following: | |
907 | ||
908 | a = 123.321 | |
909 | CONVFMT = "%3.1f" | |
910 | b = a " is a number" | |
911 | c = a + 1.654 | |
912 | ||
913 | The variable `a' receives a string value in the concatenation and | |
914 | assignment to `b'. The string value of `a' is `"123.3"'. If the | |
915 | numeric value were lost when it was converted to a string, then the | |
916 | numeric use of `a' in the last statement would lose information. `c' | |
917 | would be assigned the value 124.954 instead of 124.975. Such errors | |
918 | accumulate rapidly, and very adversely affect numeric computations. | |
919 | ||
920 | Once a numeric value acquires a corresponding string value, it stays | |
921 | valid until a new assignment is made. If `CONVFMT' (*note Conversion | |
922 | of Strings and Numbers: Conversion.) changes in the meantime, the old | |
923 | string value will still be used. For example: | |
924 | ||
925 | BEGIN { | |
926 | CONVFMT = "%2.2f" | |
927 | a = 123.456 | |
928 | b = a "" # force `a' to have string value too | |
929 | printf "a = %s\n", a | |
930 | CONVFMT = "%.6g" | |
931 | printf "a = %s\n", a | |
932 | a += 0 # make `a' numeric only again | |
933 | printf "a = %s\n", a # use `a' as string | |
934 | } | |
935 | ||
936 | This program prints `a = 123.46' twice, and then prints `a = 123.456'. | |
937 | ||
938 | *Note Conversion of Strings and Numbers: Conversion, for the rules | |
939 | that specify how string values are made from numeric values. | |
940 | ||
941 | \1f | |
942 | File: gawk.info, Node: Conditional Exp, Next: Function Calls, Prev: Values, Up: Expressions | |
943 | ||
944 | Conditional Expressions | |
945 | ======================= | |
946 | ||
947 | A "conditional expression" is a special kind of expression with | |
948 | three operands. It allows you to use one expression's value to select | |
949 | one of two other expressions. | |
950 | ||
951 | The conditional expression looks the same as in the C language: | |
952 | ||
953 | SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP | |
954 | ||
955 | There are three subexpressions. The first, SELECTOR, is always | |
956 | computed first. If it is "true" (not zero and not null) then | |
957 | IF-TRUE-EXP is computed next and its value becomes the value of the | |
958 | whole expression. Otherwise, IF-FALSE-EXP is computed next and its | |
959 | value becomes the value of the whole expression. | |
960 | ||
961 | For example, this expression produces the absolute value of `x': | |
962 | ||
963 | x > 0 ? x : -x | |
964 | ||
965 | Each time the conditional expression is computed, exactly one of | |
966 | IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. This | |
967 | is important when the expressions contain side effects. For example, | |
968 | this conditional expression examines element `i' of either array `a' or | |
969 | array `b', and increments `i'. | |
970 | ||
971 | x == y ? a[i++] : b[i++] | |
972 | ||
973 | This is guaranteed to increment `i' exactly once, because each time one | |
974 | or the other of the two increment expressions is executed, and the | |
975 | other is not. | |
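
Both examples can be run together as a sketch. The array contents and the values given to `x' and `y' are invented for the illustration; `v' holds the element that was selected.

```shell
# Only one of the two branches of ?: is evaluated.
cond_demo=$(awk 'BEGIN {
    x = -7
    print (x > 0 ? x : -x)           # absolute value of x
    a[1] = "from a"; b[1] = "from b"
    i = 1; y = 3
    v = (x == y ? a[i++] : b[i++])   # x differs from y, so only b[i++] runs
    print v, i                       # i has been incremented exactly once
}')
printf '%s\n' "$cond_demo"
```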
976 | ||
977 | \1f | |
978 | File: gawk.info, Node: Function Calls, Next: Precedence, Prev: Conditional Exp, Up: Expressions | |
979 | ||
980 | Function Calls | |
981 | ============== | |
982 | ||
983 | A "function" is a name for a particular calculation. Because it has | |
984 | a name, you can ask for it by name at any point in the program. For | |
985 | example, the function `sqrt' computes the square root of a number. | |
986 | ||
987 | A fixed set of functions are "built-in", which means they are | |
988 | available in every `awk' program. The `sqrt' function is one of these. | |
989 | *Note Built-in Functions: Built-in, for a list of built-in functions | |
990 | and their descriptions. In addition, you can define your own functions | |
991 | in the program for use elsewhere in the same program. *Note | |
992 | User-defined Functions: User-defined, for how to do this. | |
993 | ||
994 | The way to use a function is with a "function call" expression, | |
995 | which consists of the function name followed by a list of "arguments" | |
996 | in parentheses. The arguments are expressions which give the raw | |
997 | materials for the calculation that the function will do. When there is | |
998 | more than one argument, they are separated by commas. If there are no | |
999 | arguments, write just `()' after the function name. Here are some | |
1000 | examples: | |
1001 | ||
1002 | sqrt(x^2 + y^2) # One argument | |
1003 | atan2(y, x) # Two arguments | |
1004 | rand() # No arguments | |
1005 | ||
1006 | *Do not put any space between the function name and the | |
1007 | open-parenthesis!* A user-defined function name looks just like the | |
1008 | name of a variable, and space would make the expression look like | |
1009 | concatenation of a variable with an expression inside parentheses. | |
1010 | Space before the parenthesis is harmless with built-in functions, but | |
1011 | to avoid mistakes with user-defined functions, it is best not to get | |
1012 | into the habit of using space there. | |
1013 | ||
1014 | Each function expects a particular number of arguments. For | |
1015 | example, the `sqrt' function must be called with a single argument, the | |
1016 | number to take the square root of: | |
1017 | ||
1018 | sqrt(ARGUMENT) | |
1019 | ||
1020 | Some of the built-in functions allow you to omit the final argument. | |
1021 | If you do so, they use a reasonable default. *Note Built-in Functions: | |
1022 | Built-in, for full details. If arguments are omitted in calls to | |
1023 | user-defined functions, then those arguments are treated as local | |
1024 | variables, initialized to the null string (*note User-defined | |
1025 | Functions: User-defined.). | |
1026 | ||
1027 | Like every other expression, the function call has a value, which is | |
1028 | computed by the function based on the arguments you give it. In this | |
1029 | example, the value of `sqrt(ARGUMENT)' is the square root of the | |
1030 | argument. A function can also have side effects, such as assigning the | |
1031 | values of certain variables or doing I/O. | |
1032 | ||
1033 | Here is a command to read numbers, one number per line, and print the | |
1034 | square root of each one: | |
1035 | ||
1036 | awk '{ print "The square root of", $1, "is", sqrt($1) }' | |
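
The remark about omitted arguments can be illustrated with a user-defined function. This is a minimal sketch: `hypot2' is a hypothetical function written for the example, and its third parameter `tmp' is deliberately left out of the call so that it serves as a local variable.

```shell
# An omitted argument behaves as a local variable of the function.
func_demo=$(awk '
# hypot2 is a hypothetical helper; "tmp" is declared as an extra
# parameter and left out of the call, so it acts as a local variable.
function hypot2(a, b,    tmp) {
    tmp = a^2 + b^2
    return sqrt(tmp)
}
BEGIN {
    tmp = "untouched"
    print hypot2(3, 4)
    print tmp            # the global tmp is not affected by the call
}')
printf '%s\n' "$func_demo"
```

Note that there is no space between `hypot2' and the open-parenthesis in the call.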
1037 | ||
1038 | \1f | |
1039 | File: gawk.info, Node: Precedence, Prev: Function Calls, Up: Expressions | |
1040 | ||
1041 | Operator Precedence (How Operators Nest) | |
1042 | ======================================== | |
1043 | ||
1044 | "Operator precedence" determines how operators are grouped, when | |
1045 | different operators appear close by in one expression. For example, | |
1046 | `*' has higher precedence than `+'; thus, `a + b * c' means to multiply | |
1047 | `b' and `c', and then add `a' to the product (i.e., `a + (b * c)'). | |
1048 | ||
1049 | You can overrule the precedence of the operators by using | |
1050 | parentheses. You can think of the precedence rules as saying where the | |
1051 | parentheses are assumed if you do not write parentheses yourself. In | |
1052 | fact, it is wise to always use parentheses whenever you have an unusual | |
1053 | combination of operators, because other people who read the program may | |
1054 | not remember what the precedence is in this case. You might forget, | |
1055 | too; then you could make a mistake. Explicit parentheses will help | |
1056 | prevent any such mistake. | |
1057 | ||
1058 | When operators of equal precedence are used together, the leftmost | |
1059 | operator groups first, except for the assignment, conditional and | |
1060 | exponentiation operators, which group in the opposite order. Thus, `a | |
1061 | - b + c' groups as `(a - b) + c'; `a = b = c' groups as `a = (b = c)'. | |
1062 | ||
1063 | The precedence of prefix unary operators does not matter as long as | |
1064 | only unary operators are involved, because there is only one way to | |
1065 | parse them--innermost first. Thus, `$++i' means `$(++i)' and `++$x' | |
1066 | means `++($x)'. However, when another operator follows the operand, | |
1067 | then the precedence of the unary operators can matter. Thus, `$x^2' | |
1068 | means `($x)^2', but `-x^2' means `-(x^2)', because `-' has lower | |
1069 | precedence than `^' while `$' has higher precedence. | |
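
A quick way to confirm these grouping rules is to print a few such expressions. The input record `3 7' and the value of `x' are chosen arbitrarily for this sketch.

```shell
# Each line prints an unparenthesized expression; the comment shows
# the grouping that the precedence rules imply.
prec_demo=$(echo "3 7" | awk '{
    x = 2
    print $x^2          # ($x)^2, the square of the second field
    print -x^2          # -(x^2)
    print 2^3^2         # 2^(3^2), exponentiation groups right-to-left
    print 10 - 4 + 1    # (10 - 4) + 1, subtraction groups left-to-right
}')
printf '%s\n' "$prec_demo"
```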
1070 | ||
1071 | Here is a table of the operators of `awk', in order of increasing | |
1072 | precedence: | |
1073 | ||
1074 | assignment | |
1075 | `=', `+=', `-=', `*=', `/=', `%=', `^=', `**='. These operators | |
1076 | group right-to-left. (The `**=' operator is not specified by | |
1077 | POSIX.) | |
1078 | ||
1079 | conditional | |
1080 | `?:'. This operator groups right-to-left. | |
1081 | ||
1082 | logical "or" | |
1083 | `||'. | |
1084 | ||
1085 | logical "and" | |
1086 | `&&'. | |
1086 | `&&'. | |
1087 | ||
1088 | array membership | |
1089 | `in'. | |
1090 | ||
1091 | matching | |
1092 | `~', `!~'. | |
1093 | ||
1094 | relational, and redirection | |
1095 | The relational operators and the redirections have the same | |
1096 | precedence level. Characters such as `>' serve both as | |
1097 | relationals and as redirections; the context distinguishes between | |
1098 | the two meanings. | |
1099 | ||
1100 | The relational operators are `<', `<=', `==', `!=', `>=' and `>'. | |
1101 | ||
1102 | The I/O redirection operators are `<', `>', `>>' and `|'. | |
1103 | ||
1104 | Note that I/O redirection operators in `print' and `printf' | |
1105 | statements belong to the statement level, not to expressions. The | |
1106 | redirection does not produce an expression which could be the | |
1107 | operand of another operator. As a result, it does not make sense | |
1108 | to use a redirection operator near another operator of lower | |
1109 | precedence, without parentheses. Such combinations, for example | |
1110 | `print foo > a ? b : c', result in syntax errors. | |
1111 | ||
1112 | concatenation | |
1113 | No special token is used to indicate concatenation. The operands | |
1114 | are simply written side by side. | |
1115 | ||
1116 | add, subtract | |
1117 | `+', `-'. | |
1118 | ||
1119 | multiply, divide, mod | |
1120 | `*', `/', `%'. | |
1121 | ||
1122 | unary plus, minus, "not" | |
1123 | `+', `-', `!'. | |
1124 | ||
1125 | exponentiation | |
1126 | `^', `**'. These operators group right-to-left. (The `**' | |
1127 | operator is not specified by POSIX.) | |
1128 | ||
1129 | increment, decrement | |
1130 | `++', `--'. | |
1131 | ||
1132 | field | |
1133 | `$'. | |
1134 | ||
1135 | \1f | |
1136 | File: gawk.info, Node: Statements, Next: Arrays, Prev: Expressions, Up: Top | |
1137 | ||
1138 | Control Statements in Actions | |
1139 | ***************************** | |
1140 | ||
1141 | "Control statements" such as `if', `while', and so on control the | |
1142 | flow of execution in `awk' programs. Most of the control statements in | |
1143 | `awk' are patterned on similar statements in C. | |
1144 | ||
1145 | All the control statements start with special keywords such as `if' | |
1146 | and `while', to distinguish them from simple expressions. | |
1147 | ||
1148 | Many control statements contain other statements; for example, the | |
1149 | `if' statement contains another statement which may or may not be | |
1150 | executed. The contained statement is called the "body". If you want | |
1151 | to include more than one statement in the body, group them into a | |
1152 | single compound statement with curly braces, separating them with | |
1153 | newlines or semicolons. | |
1154 | ||
1155 | * Menu: | |
1156 | ||
1157 | * If Statement:: Conditionally execute | |
1158 | some `awk' statements. | |
1159 | * While Statement:: Loop until some condition is satisfied. | |
1160 | * Do Statement:: Do specified action while looping until some | |
1161 | condition is satisfied. | |
1162 | * For Statement:: Another looping statement, that provides | |
1163 | initialization and increment clauses. | |
1164 | * Break Statement:: Immediately exit the innermost enclosing loop. | |
1165 | * Continue Statement:: Skip to the end of the innermost | |
1166 | enclosing loop. | |
1167 | * Next Statement:: Stop processing the current input record. | |
1168 | * Next File Statement:: Stop processing the current file. | |
1169 | * Exit Statement:: Stop execution of `awk'. | |
1170 | ||
1171 | \1f | |
1172 | File: gawk.info, Node: If Statement, Next: While Statement, Prev: Statements, Up: Statements | |
1173 | ||
1174 | The `if' Statement | |
1175 | ================== | |
1176 | ||
1177 | The `if'-`else' statement is `awk''s decision-making statement. It | |
1178 | looks like this: | |
1179 | ||
1180 | if (CONDITION) THEN-BODY [else ELSE-BODY] | |
1181 | ||
1182 | CONDITION is an expression that controls what the rest of the statement | |
1183 | will do. If CONDITION is true, THEN-BODY is executed; otherwise, | |
1184 | ELSE-BODY is executed (assuming that the `else' clause is present). | |
1185 | The `else' part of the statement is optional. The condition is | |
1186 | considered false if its value is zero or the null string, and true | |
1187 | otherwise. | |
1188 | ||
1189 | Here is an example: | |
1190 | ||
1191 | if (x % 2 == 0) | |
1192 | print "x is even" | |
1193 | else | |
1194 | print "x is odd" | |
1195 | ||
1196 | In this example, if the expression `x % 2 == 0' is true (that is, | |
1197 | the value of `x' is divisible by 2), then the first `print' statement | |
1198 | is executed; otherwise, the second `print' statement is executed. | |
1199 | ||
1200 | If the `else' appears on the same line as THEN-BODY, and THEN-BODY | |
1201 | is not a compound statement (i.e., not surrounded by curly braces), | |
1202 | then a semicolon must separate THEN-BODY from `else'. To illustrate | |
1203 | this, let's rewrite the previous example: | |
1204 | ||
1205 | awk '{ if (x % 2 == 0) print "x is even"; else | |
1206 | print "x is odd" }' | |
1207 | ||
1208 | If you forget the `;', `awk' won't be able to parse the statement, and | |
1209 | you will get a syntax error. | |
1210 | ||
1211 | We would not actually write this example this way, because a human | |
1212 | reader might fail to see the `else' if it were not the first thing on | |
1213 | its line. | |
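
For comparison, here is a runnable variant of the one-line form. As an assumption made for this sketch, it tests the first field of each input record rather than a variable `x', so that it can be fed sample data.

```shell
# The semicolon before else is required because the then-body
# is not enclosed in curly braces.
if_demo=$(printf '4\n7\n' | awk '{ if ($1 % 2 == 0) print $1, "is even"; else print $1, "is odd" }')
printf '%s\n' "$if_demo"
```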
1214 | ||
1215 | \1f | |
1216 | File: gawk.info, Node: While Statement, Next: Do Statement, Prev: If Statement, Up: Statements | |
1217 | ||
1218 | The `while' Statement | |
1219 | ===================== | |
1220 | ||
1221 | In programming, a "loop" means a part of a program that is (or at | |
1222 | least can be) executed two or more times in succession. | |
1223 | ||
1224 | The `while' statement is the simplest looping statement in `awk'. | |
1225 | It repeatedly executes a statement as long as a condition is true. It | |
1226 | looks like this: | |
1227 | ||
1228 | while (CONDITION) | |
1229 | BODY | |
1230 | ||
1231 | Here BODY is a statement that we call the "body" of the loop, and | |
1232 | CONDITION is an expression that controls how long the loop keeps | |
1233 | running. | |
1234 | ||
1235 | The first thing the `while' statement does is test CONDITION. If | |
1236 | CONDITION is true, it executes the statement BODY. (CONDITION is true | |
1237 | when the value is not zero and not a null string.) After BODY has been | |
1238 | executed, CONDITION is tested again, and if it is still true, BODY is | |
1239 | executed again. This process repeats until CONDITION is no longer | |
1240 | true. If CONDITION is initially false, the body of the loop is never | |
1241 | executed. | |
1242 | ||
1243 | This example prints the first three fields of each record, one per | |
1244 | line. | |
1245 | ||
1246 | awk '{ i = 1 | |
1247 | while (i <= 3) { | |
1248 | print $i | |
1249 | i++ | |
1250 | } | |
1251 | }' | |
1252 | ||
1253 | Here the body of the loop is a compound statement enclosed in braces, | |
1254 | containing two statements. | |
1255 | ||
1256 | The loop works like this: first, the value of `i' is set to 1. | |
1257 | Then, the `while' tests whether `i' is less than or equal to three. | |
1258 | This is the case when `i' equals one, so the `i'-th field is printed. | |
1259 | Then the `i++' increments the value of `i' and the loop repeats. The | |
1260 | loop terminates when `i' reaches 4. | |
1261 | ||
1262 | As you can see, a newline is not required between the condition and | |
1263 | the body; but using one makes the program clearer unless the body is a | |
1264 | compound statement or is very simple. The newline after the open-brace | |
1265 | that begins the compound statement is not required either, but the | |
1266 | program would be hard to read without it. | |
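
A common variation, sketched here with made-up input, loops over every field of the record by testing against the built-in variable `NF' instead of a fixed count.

```shell
# Print every field of the record with its position.
while_demo=$(echo "alpha beta gamma delta" | awk '{
    i = 1
    while (i <= NF) {    # NF is the number of fields in the record
        print i ": " $i
        i++
    }
}')
printf '%s\n' "$while_demo"
```

For a record with no fields, the condition is false the first time it is tested, so the body is never executed.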
1267 | ||
1268 | \1f | |
1269 | File: gawk.info, Node: Do Statement, Next: For Statement, Prev: While Statement, Up: Statements | |
1270 | ||
1271 | The `do'-`while' Statement | |
1272 | ========================== | |
1273 | ||
1274 | The `do' loop is a variation of the `while' looping statement. The | |
1275 | `do' loop executes the BODY once, then repeats BODY as long as | |
1276 | CONDITION is true. It looks like this: | |
1277 | ||
1278 | do | |
1279 | BODY | |
1280 | while (CONDITION) | |
1281 | ||
1282 | Even if CONDITION is false at the start, BODY is executed at least | |
1283 | once (and only once, unless executing BODY makes CONDITION true). | |
1284 | Contrast this with the corresponding `while' statement: | |
1285 | ||
1286 | while (CONDITION) | |
1287 | BODY | |
1288 | ||
1289 | This statement does not execute BODY even once if CONDITION is false to | |
1290 | begin with. | |
1291 | ||
1292 | Here is an example of a `do' statement: | |
1293 | ||
1294 | awk '{ i = 1 | |
1295 | do { | |
1296 | print $0 | |
1297 | i++ | |
1298 | } while (i <= 10) | |
1299 | }' | |
1300 | ||
1301 | This prints each input record ten times. It isn't a very realistic example, | |
1302 | since in this case an ordinary `while' would do just as well. But this | |
1303 | reflects actual experience; there is only occasionally a real use for a | |
1304 | `do' statement. | |
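
The difference between the two loops shows up only when the condition is false at the start. In this sketch (the starting value 5 and the messages are arbitrary), the `do' body runs once and the `while' body does not run at all.

```shell
# With a condition that is false from the start, only the do loop
# executes its body.
do_demo=$(awk 'BEGIN {
    i = 5
    do {
        print "do body ran with i = " i
        i++
    } while (i <= 3)
    i = 5
    while (i <= 3) {
        print "while body ran with i = " i
        i++
    }
}')
printf '%s\n' "$do_demo"
```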
1305 |