1This is Info file gawk.info, produced by Makeinfo-1.54 from the input
2file gawk.texi.
3
4 This file documents `awk', a program that you can use to select
5particular records in a file and perform operations upon them.
6
7 This is Edition 0.15 of `The GAWK Manual',
8for the 2.15 version of the GNU implementation
9of AWK.
10
11 Copyright (C) 1989, 1991, 1992, 1993 Free Software Foundation, Inc.
12
13 Permission is granted to make and distribute verbatim copies of this
14manual provided the copyright notice and this permission notice are
15preserved on all copies.
16
17 Permission is granted to copy and distribute modified versions of
18this manual under the conditions for verbatim copying, provided that
19the entire resulting derived work is distributed under the terms of a
20permission notice identical to this one.
21
22 Permission is granted to copy and distribute translations of this
23manual into another language, under the above conditions for modified
24versions, except that this permission notice may be stated in a
25translation approved by the Foundation.
26
27\1f
28File: gawk.info, Node: Actions, Next: Expressions, Prev: Patterns, Up: Top
29
30Overview of Actions
31*******************
32
33 An `awk' program or script consists of a series of rules and
34function definitions, interspersed. (Functions are described later.
35*Note User-defined Functions: User-defined.)
36
37 A rule contains a pattern and an action, either of which may be
38omitted. The purpose of the "action" is to tell `awk' what to do once
39a match for the pattern is found. Thus, the entire program looks
40somewhat like this:
41
     [PATTERN] [{ ACTION }]
     [PATTERN] [{ ACTION }]
     ...
     function NAME (ARGS) { ... }
     ...
47
48 An action consists of one or more `awk' "statements", enclosed in
49curly braces (`{' and `}'). Each statement specifies one thing to be
50done. The statements are separated by newlines or semicolons.
51
52 The curly braces around an action must be used even if the action
53contains only one statement, or even if it contains no statements at
54all. However, if you omit the action entirely, omit the curly braces as
55well. (An omitted action is equivalent to `{ print $0 }'.)
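
   The defaults described above can be checked with a short experiment.
This is only a sketch: the three sample records and the shell variable
names are made up for illustration, and it assumes a POSIX shell with
an `awk' on the search path.

```shell
# Hypothetical sample input: three records, two fields each.
input='alpha 1
beta 2
gamma 3'

# Pattern only: the omitted action defaults to { print $0 }.
pattern_only=$(printf '%s\n' "$input" | awk '/beta/')

# The same rule with the default action written out.
explicit=$(printf '%s\n' "$input" | awk '/beta/ { print $0 }')

# Action only: the omitted pattern matches every record.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

# Empty braces are an empty action, so nothing is printed.
empty_action=$(printf '%s\n' "$input" | awk '/beta/ { }')

printf '%s\n' "$pattern_only" "$explicit" "$action_only" "[$empty_action]"
```

   The first two programs select the same record, the third prints the
second field of every record, and the last one matches `beta 2' but
prints nothing.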
56
57 Here are the kinds of statements supported in `awk':
58
59 * Expressions, which can call functions or assign values to variables
60 (*note Expressions as Action Statements: Expressions.). Executing
61 this kind of statement simply computes the value of the expression
62 and then ignores it. This is useful when the expression has side
63 effects (*note Assignment Expressions: Assignment Ops.).
64
65 * Control statements, which specify the control flow of `awk'
66 programs. The `awk' language gives you C-like constructs (`if',
67 `for', `while', and so on) as well as a few special ones (*note
68 Control Statements in Actions: Statements.).
69
70 * Compound statements, which consist of one or more statements
71 enclosed in curly braces. A compound statement is used in order
72 to put several statements together in the body of an `if',
73 `while', `do' or `for' statement.
74
75 * Input control, using the `getline' command (*note Explicit Input
76 with `getline': Getline.), and the `next' statement (*note The
77 `next' Statement: Next Statement.).
78
79 * Output statements, `print' and `printf'. *Note Printing Output:
80 Printing.
81
82 * Deletion statements, for deleting array elements. *Note The
83 `delete' Statement: Delete.
84
85\1f
86File: gawk.info, Node: Expressions, Next: Statements, Prev: Actions, Up: Top
87
88Expressions as Action Statements
89********************************
90
91 Expressions are the basic building block of `awk' actions. An
92expression evaluates to a value, which you can print, test, store in a
93variable or pass to a function. But beyond that, an expression can
94assign a new value to a variable or a field, with an assignment
95operator.
96
97 An expression can serve as a statement on its own. Most other kinds
98of statements contain one or more expressions which specify data to be
99operated on. As in other languages, expressions in `awk' include
100variables, array references, constants, and function calls, as well as
101combinations of these with various operators.
102
103* Menu:
104
105* Constants:: String, numeric, and regexp constants.
106* Variables:: Variables give names to values for later use.
107* Arithmetic Ops:: Arithmetic operations (`+', `-', etc.)
108* Concatenation:: Concatenating strings.
109* Comparison Ops:: Comparison of numbers and strings
110 with `<', etc.
111* Boolean Ops:: Combining comparison expressions
112 using boolean operators
113 `||' ("or"), `&&' ("and") and `!' ("not").
114
115* Assignment Ops:: Changing the value of a variable or a field.
116* Increment Ops:: Incrementing the numeric value of a variable.
117
118* Conversion:: The conversion of strings to numbers
119 and vice versa.
120* Values:: The whole truth about numbers and strings.
121* Conditional Exp:: Conditional expressions select
122 between two subexpressions under control
123 of a third subexpression.
124* Function Calls:: A function call is an expression.
125* Precedence:: How various operators nest.
126
127\1f
128File: gawk.info, Node: Constants, Next: Variables, Prev: Expressions, Up: Expressions
129
130Constant Expressions
131====================
132
133 The simplest type of expression is the "constant", which always has
134the same value. There are three types of constants: numeric constants,
135string constants, and regular expression constants.
136
137 A "numeric constant" stands for a number. This number can be an
138integer, a decimal fraction, or a number in scientific (exponential)
139notation. Note that all numeric values are represented within `awk' in
140double-precision floating point. Here are some examples of numeric
141constants, which all have the same value:
142
     105
     1.05e+2
     1050e-1
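
   As a quick check, the three spellings can be printed side by side.
The shell variable `same' is just an illustrative name.

```shell
# All three constants denote the same double-precision value.
same=$(awk 'BEGIN { print 105, 1.05e+2, 1050e-1 }')
printf '%s\n' "$same"
```

   Each of the three values prints as `105'.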
146
147 A string constant consists of a sequence of characters enclosed in
148double-quote marks. For example:
149
150 "parrot"
151
represents the string whose contents are `parrot'. Strings in `gawk'
can be of any length and they can contain all the possible 8-bit
character values, including ASCII NUL. Other `awk' implementations may
have difficulty with some character codes.
156
157 Some characters cannot be included literally in a string constant.
158You represent them instead with "escape sequences", which are character
159sequences beginning with a backslash (`\').
160
161 One use of an escape sequence is to include a double-quote character
162in a string constant. Since a plain double-quote would end the string,
163you must use `\"' to represent a single double-quote character as a
164part of the string. The backslash character itself is another
165character that cannot be included normally; you write `\\' to put one
166backslash in the string. Thus, the string whose contents are the two
167characters `"\' must be written `"\"\\"'.
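
   The two-character example can be tried from the shell. In this
sketch the program is placed in single quotes so that the shell passes
the backslashes to `awk' untouched; the variable names are
illustrative.

```shell
# Inside the single-quoted shell argument, awk sees the constant "\"\\".
contents=$(awk 'BEGIN { printf "%s", "\"\\" }')
len=$(awk 'BEGIN { print length("\"\\") }')
printf '%s has %s characters\n' "$contents" "$len"
```

   The string holds a double-quote followed by a backslash, so its
length is 2.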
168
169 Another use of backslash is to represent unprintable characters such
170as newline. While there is nothing to stop you from writing most of
171these characters directly in a string constant, they may look ugly.
172
173 Here is a table of all the escape sequences used in `awk':
174
175`\\'
176 Represents a literal backslash, `\'.
177
178`\a'
179 Represents the "alert" character, control-g, ASCII code 7.
180
181`\b'
182 Represents a backspace, control-h, ASCII code 8.
183
184`\f'
185 Represents a formfeed, control-l, ASCII code 12.
186
187`\n'
188 Represents a newline, control-j, ASCII code 10.
189
190`\r'
191 Represents a carriage return, control-m, ASCII code 13.
192
193`\t'
194 Represents a horizontal tab, control-i, ASCII code 9.
195
196`\v'
197 Represents a vertical tab, control-k, ASCII code 11.
198
199`\NNN'
200 Represents the octal value NNN, where NNN are one to three digits
201 between 0 and 7. For example, the code for the ASCII ESC (escape)
202 character is `\033'.
203
204`\xHH...'
205 Represents the hexadecimal value HH, where HH are hexadecimal
206 digits (`0' through `9' and either `A' through `F' or `a' through
207 `f'). Like the same construct in ANSI C, the escape sequence
208 continues until the first non-hexadecimal digit is seen. However,
209 using more than two hexadecimal digits produces undefined results.
210 (The `\x' escape sequence is not allowed in POSIX `awk'.)
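
   A few of the escape sequences from the table can be exercised as
follows. This is a small sketch that sticks to the portable octal form
rather than `\x'; the variable names are illustrative.

```shell
# \t is a horizontal tab; \101 is octal for ASCII code 65, the letter A.
tabbed=$(awk 'BEGIN { printf "a\tb" }')
octal=$(awk 'BEGIN { printf "\101\102\103" }')

# \033 is ESC (ASCII code 27); length() counts it as a single character.
esc_len=$(awk 'BEGIN { print length("\033[0m") }')
printf '%s\n%s\n%s\n' "$tabbed" "$octal" "$esc_len"
```

   The octal escapes spell `ABC', and the string `"\033[0m"' has four
characters: ESC, `[', `0' and `m'.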
211
212 A "constant regexp" is a regular expression description enclosed in
213slashes, such as `/^beginning and end$/'. Most regexps used in `awk'
214programs are constant, but the `~' and `!~' operators can also match
215computed or "dynamic" regexps (*note How to Use Regular Expressions:
216Regexp Usage.).
217
218 Constant regexps may be used like simple expressions. When a
219constant regexp is not on the right hand side of the `~' or `!~'
220operators, it has the same meaning as if it appeared in a pattern, i.e.
221`($0 ~ /foo/)' (*note Expressions as Patterns: Expression Patterns.).
222This means that the two code segments,
223
224 if ($0 ~ /barfly/ || $0 ~ /camelot/)
225 print "found"
226
227and
228
229 if (/barfly/ || /camelot/)
230 print "found"
231
232are exactly equivalent. One rather bizarre consequence of this rule is
233that the following boolean expression is legal, but does not do what
234the user intended:
235
236 if (/foo/ ~ $1) print "found foo"
237
238 This code is "obviously" testing `$1' for a match against the regexp
239`/foo/'. But in fact, the expression `(/foo/ ~ $1)' actually means
`(($0 ~ /foo/) ~ $1)'. In other words, first match the input record
against the regexp `/foo/'. The result will be either a 1 or a 0,
depending upon the success or failure of the match. Then match that
result against the first field in the record.
244
245 Since it is unlikely that you would ever really wish to make this
246kind of test, `gawk' will issue a warning when it sees this construct in
247a program.
248
249 Another consequence of this rule is that the assignment statement
250
251 matches = /foo/
252
253will assign either 0 or 1 to the variable `matches', depending upon the
254contents of the current input record.
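
   The behavior of a constant regexp used as an expression can be
observed directly. The two input records below are made up for this
sketch.

```shell
# Each record is matched against /foo/; the 0-or-1 result is stored.
flags=$(printf 'foo bar\nbaz\n' | awk '{ matches = /foo/; print matches }')

# A bare /foo/ and an explicit $0 ~ /foo/ select the same records.
bare=$(printf 'foo bar\nbaz\n' | awk '{ if (/foo/) print "found" }')
full=$(printf 'foo bar\nbaz\n' | awk '{ if ($0 ~ /foo/) print "found" }')
printf '%s\n' "$flags" "$bare" "$full"
```

   The first program prints 1 for the record containing `foo' and 0 for
the other; the last two programs print `found' once each.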
255
256 Constant regular expressions are also used as the first argument for
257the `sub' and `gsub' functions (*note Built-in Functions for String
258Manipulation: String Functions.).
259
   The use of a constant regexp as an ordinary expression was never well
documented until the POSIX specification.
262
263 You may be wondering, when is
264
265 $1 ~ /foo/ { ... }
266
267preferable to
268
269 $1 ~ "foo" { ... }
270
271 Since the right-hand sides of both `~' operators are constants, it
272is more efficient to use the `/foo/' form: `awk' can note that you have
273supplied a regexp and store it internally in a form that makes pattern
274matching more efficient. In the second form, `awk' must first convert
275the string into this internal form, and then perform the pattern
276matching. The first form is also better style; it shows clearly that
277you intend a regexp match.
278
279\1f
280File: gawk.info, Node: Variables, Next: Arithmetic Ops, Prev: Constants, Up: Expressions
281
282Variables
283=========
284
285 Variables let you give names to values and refer to them later. You
286have already seen variables in many of the examples. The name of a
287variable must be a sequence of letters, digits and underscores, but it
288may not begin with a digit. Case is significant in variable names; `a'
289and `A' are distinct variables.
290
291 A variable name is a valid expression by itself; it represents the
292variable's current value. Variables are given new values with
293"assignment operators" and "increment operators". *Note Assignment
294Expressions: Assignment Ops.
295
296 A few variables have special built-in meanings, such as `FS', the
297field separator, and `NF', the number of fields in the current input
298record. *Note Built-in Variables::, for a list of them. These
299built-in variables can be used and assigned just like all other
300variables, but their values are also used or changed automatically by
301`awk'. Each built-in variable's name is made entirely of upper case
302letters.
303
304 Variables in `awk' can be assigned either numeric or string values.
305By default, variables are initialized to the null string, which is
306effectively zero if converted to a number. There is no need to
307"initialize" each variable explicitly in `awk', the way you would in C
308or most other traditional languages.
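
   A variable that has never been assigned can be inspected like this.
The name `x' is arbitrary; any unused variable name behaves the same
way.

```shell
# x is never assigned: it is the null string, and zero when used as a number.
unset_demo=$(awk 'BEGIN { print "[" x "]", x + 0, length(x) }')
printf '%s\n' "$unset_demo"
```

   The brackets enclose nothing, the numeric value is 0, and the length
is 0.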
309
310* Menu:
311
312* Assignment Options:: Setting variables on the command line
313 and a summary of command line syntax.
314 This is an advanced method of input.
315
316\1f
317File: gawk.info, Node: Assignment Options, Prev: Variables, Up: Variables
318
319Assigning Variables on the Command Line
320---------------------------------------
321
322 You can set any `awk' variable by including a "variable assignment"
323among the arguments on the command line when you invoke `awk' (*note
324Invoking `awk': Command Line.). Such an assignment has this form:
325
326 VARIABLE=TEXT
327
328With it, you can set a variable either at the beginning of the `awk'
329run or in between input files.
330
331 If you precede the assignment with the `-v' option, like this:
332
333 -v VARIABLE=TEXT
334
335then the variable is set at the very beginning, before even the `BEGIN'
336rules are run. The `-v' option and its assignment must precede all the
337file name arguments, as well as the program text.
338
339 Otherwise, the variable assignment is performed at a time determined
340by its position among the input file arguments: after the processing of
341the preceding input file argument. For example:
342
343 awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list
344
345prints the value of field number `n' for all input records. Before the
346first file is read, the command line sets the variable `n' equal to 4.
347This causes the fourth field to be printed in lines from the file
348`inventory-shipped'. After the first file has finished, but before the
349second file is started, `n' is set to 2, so that the second field is
350printed in lines from `BBS-list'.
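
   The timing of these assignments can be reproduced without the sample
data files. In this sketch two small scratch files stand in for
`inventory-shipped' and `BBS-list'; their names and contents are
invented, and `mktemp -d' is assumed to be available.

```shell
# Hypothetical stand-ins for the two data files.
dir=$(mktemp -d)
printf 'a b c d\n' > "$dir/first"
printf 'w x y z\n' > "$dir/second"

# n is 4 while the first file is read, then 2 for the second file.
by_position=$(awk '{ print $n }' n=4 "$dir/first" n=2 "$dir/second")

# With -v the value is already visible inside BEGIN.
with_v=$(awk -v n=4 'BEGIN { print "[" n "]" }')

# Without -v, a BEGIN rule runs before the assignment is performed.
without_v=$(awk 'BEGIN { print "[" n "]" }' n=4)

rm -r "$dir"
printf '%s\n' "$by_position" "$with_v" "$without_v"
```

   The first command prints the fourth field of the first file and the
second field of the second file. The last two commands show that only
the `-v' form makes the value available to a `BEGIN' rule.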
351
352 Command line arguments are made available for explicit examination by
353the `awk' program in an array named `ARGV' (*note Built-in
354Variables::.).
355
356 `awk' processes the values of command line assignments for escape
357sequences (*note Constant Expressions: Constants.).
358
359\1f
360File: gawk.info, Node: Arithmetic Ops, Next: Concatenation, Prev: Variables, Up: Expressions
361
362Arithmetic Operators
363====================
364
365 The `awk' language uses the common arithmetic operators when
366evaluating expressions. All of these arithmetic operators follow normal
367precedence rules, and work as you would expect them to. This example
368divides field three by field four, adds field two, stores the result
369into field one, and prints the resulting altered input record:
370
371 awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped
372
373 The arithmetic operators in `awk' are:
374
375`X + Y'
376 Addition.
377
378`X - Y'
379 Subtraction.
380
381`- X'
382 Negation.
383
384`+ X'
385 Unary plus. No real effect on the expression.
386
387`X * Y'
388 Multiplication.
389
390`X / Y'
391 Division. Since all numbers in `awk' are double-precision
392 floating point, the result is not rounded to an integer: `3 / 4'
393 has the value 0.75.
394
395`X % Y'
396 Remainder. The quotient is rounded toward zero to an integer,
397 multiplied by Y and this result is subtracted from X. This
398 operation is sometimes known as "trunc-mod." The following
399 relation always holds:
400
          b * int(a / b) + (a % b) == a
402
403 One possibly undesirable effect of this definition of remainder is
404 that `X % Y' is negative if X is negative. Thus,
405
          -17 % 8 = -1
407
408 In other `awk' implementations, the signedness of the remainder
409 may be machine dependent.
410
411`X ^ Y'
412`X ** Y'
413 Exponentiation: X raised to the Y power. `2 ^ 3' has the value 8.
414 The character sequence `**' is equivalent to `^'. (The POSIX
415 standard only specifies the use of `^' for exponentiation.)
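
   The operators in this table can be tried on a one-line input. The
record `x 10 6 3' is invented for this sketch, and only the portable
`^' spelling of exponentiation is used.

```shell
# Division binds tighter than addition: $1 becomes 10 + (6 / 3) = 12.
record=$(echo 'x 10 6 3' | awk '{ $1 = $2 + $3 / $4; print }')

# Floating point division, trunc-mod remainder, and exponentiation.
ops=$(awk 'BEGIN { print 3 / 4, -17 % 8, 2 ^ 3 }')

# The remainder identity from the table, with a = -17 and b = 8.
identity=$(awk 'BEGIN { a = -17; b = 8; print b * int(a / b) + (a % b) }')
printf '%s\n' "$record" "$ops" "$identity"
```

   Here `int(-17 / 8)' is -2, so the identity evaluates to
8 * -2 + -1, which is -17 again.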
416
417\1f
418File: gawk.info, Node: Concatenation, Next: Comparison Ops, Prev: Arithmetic Ops, Up: Expressions
419
420String Concatenation
421====================
422
423 There is only one string operation: concatenation. It does not have
424a specific operator to represent it. Instead, concatenation is
425performed by writing expressions next to one another, with no operator.
426For example:
427
428 awk '{ print "Field number one: " $1 }' BBS-list
429
430produces, for the first record in `BBS-list':
431
432 Field number one: aardvark
433
434 Without the space in the string constant after the `:', the line
435would run together. For example:
436
437 awk '{ print "Field number one:" $1 }' BBS-list
438
439produces, for the first record in `BBS-list':
440
441 Field number one:aardvark
442
   Since string concatenation does not have an explicit operator, it is
often necessary to ensure that it happens where you want it to by
enclosing the items to be concatenated in parentheses. For example, the
following code fragment does not concatenate `file' and `name' as you
might expect:
448
449 file = "file"
450 name = "name"
451 print "something meaningful" > file name
452
453It is necessary to use the following:
454
455 print "something meaningful" > (file name)
456
457 We recommend you use parentheses around concatenation in all but the
458most common contexts (such as in the right-hand operand of `=').
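
   The parenthesized form can be tested safely in a scratch directory.
This is a sketch under a few assumptions: `mktemp -d' is available, the
directory name it returns contains no backslashes (values given with
`-v' are processed for escape sequences), and the variable `dir' is
our own addition to the example.

```shell
dir=$(mktemp -d)    # scratch directory, so no existing file is touched

# The parentheses make the redirection target the concatenation of
# file and name, that is, a file called "filename" inside dir.
awk -v dir="$dir" 'BEGIN {
    file = dir "/file"
    name = "name"
    print "something meaningful" > (file name)
}'

created=$(ls "$dir")
written=$(cat "$dir/filename")
rm -r "$dir"
printf '%s: %s\n' "$created" "$written"
```

   Exactly one file, `filename', is created, and it holds the printed
line.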
459
460\1f
461File: gawk.info, Node: Comparison Ops, Next: Boolean Ops, Prev: Concatenation, Up: Expressions
462
463Comparison Expressions
464======================
465
466 "Comparison expressions" compare strings or numbers for
467relationships such as equality. They are written using "relational
468operators", which are a superset of those in C. Here is a table of
469them:
470
471`X < Y'
472 True if X is less than Y.
473
474`X <= Y'
475 True if X is less than or equal to Y.
476
477`X > Y'
478 True if X is greater than Y.
479
480`X >= Y'
481 True if X is greater than or equal to Y.
482
483`X == Y'
484 True if X is equal to Y.
485
486`X != Y'
487 True if X is not equal to Y.
488
489`X ~ Y'
490 True if the string X matches the regexp denoted by Y.
491
492`X !~ Y'
493 True if the string X does not match the regexp denoted by Y.
494
495`SUBSCRIPT in ARRAY'
496 True if array ARRAY has an element with the subscript SUBSCRIPT.
497
498 Comparison expressions have the value 1 if true and 0 if false.
499
500 The rules `gawk' uses for performing comparisons are based on those
501in draft 11.2 of the POSIX standard. The POSIX standard introduced the
502concept of a "numeric string", which is simply a string that looks like
503a number, for example, `" +2"'.
504
   When performing a relational operation, `gawk' considers the type of
an operand to be the type it received on its last *assignment*, rather
than the type of its last *use* (*note Numeric and String Values:
Values.). This type is *unknown* when the operand is from an
"external" source: field variables, command line arguments, array
elements resulting from a `split' operation, and the value of an
`ENVIRON' element. In this case only, if the operand is a numeric
string, then it is considered to be of both string type and numeric
type.

   If at least one operand of a comparison is of string type only,
then a string comparison is performed. Any numeric operand will be
converted to a string using the value of `CONVFMT' (*note Conversion of
Strings and Numbers: Conversion.). If one operand of a comparison is
numeric, and the other operand is either numeric or both numeric and
string, then `gawk' does a numeric comparison. If both operands have
both types, then the comparison is numeric.

   Strings are compared by comparing the first character of each, then
the second character of each, and so on. Thus `"10"' is less than
`"9"'. If there are two strings where one is a prefix of the other,
the shorter string is less than the longer one. Thus `"abc"' is less
than `"abcd"'.
524
525 Here are some sample expressions, how `gawk' compares them, and what
526the result of the comparison is.
527
528`1.5 <= 2.0'
529 numeric comparison (true)
530
531`"abc" >= "xyz"'
532 string comparison (false)
533
534`1.5 != " +2"'
535 string comparison (true)
536
537`"1e2" < "3"'
538 string comparison (true)
539
540`a = 2; b = "2"'
541`a == b'
542 string comparison (true)
543
544 echo 1e2 3 | awk '{ print ($1 < $2) ? "true" : "false" }'
545
546prints `false' since both `$1' and `$2' are numeric strings and thus
547have both string and numeric types, thus dictating a numeric comparison.
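
   The contrast between numeric strings read from input and string
constants written in the program can be seen side by side. In this
sketch the conditional expression is wrapped in an extra pair of
parentheses, which is a cautious choice on our part so that every
`awk' parses the `print' statement the same way.

```shell
# Fields are "external" input: 1e2 and 3 are numeric strings, so the
# comparison is numeric (100 < 3).
from_fields=$(echo '1e2 3' | awk '{ print (($1 < $2) ? "true" : "false") }')

# String constants have string type only, so "1" is compared with "3".
from_constants=$(awk 'BEGIN { print (("1e2" < "3") ? "true" : "false") }')

# Character-by-character comparison puts "10" before "9".
ten_nine=$(awk 'BEGIN { print (("10" < "9") ? "true" : "false") }')
printf '%s\n' "$from_fields" "$from_constants" "$ten_nine"
```

   The same characters give `false' when they arrive as fields and
`true' when they are written as string constants.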
548
549 The purpose of the comparison rules and the use of numeric strings is
550to attempt to produce the behavior that is "least surprising," while
551still "doing the right thing."
552
553 String comparisons and regular expression comparisons are very
554different. For example,
555
556 $1 == "foo"
557
558has the value of 1, or is true, if the first field of the current input
559record is precisely `foo'. By contrast,
560
561 $1 ~ /foo/
562
563has the value 1 if the first field contains `foo', such as `foobar'.
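
   The difference shows up clearly when both tests are applied to the
same fields. The three input records are invented for this sketch.

```shell
# Columns: the field, the result of == "foo", the result of ~ /foo/.
table=$(printf 'foo\nfoobar\nbar\n' |
    awk '{ eq = ($1 == "foo"); re = ($1 ~ /foo/); print $1, eq, re }')
printf '%s\n' "$table"
```

   Only `foo' passes the equality test, while both `foo' and `foobar'
match the regexp.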
564
565 The right hand operand of the `~' and `!~' operators may be either a
566constant regexp (`/.../'), or it may be an ordinary expression, in
567which case the value of the expression as a string is a dynamic regexp
568(*note How to Use Regular Expressions: Regexp Usage.).
569
570 In very recent implementations of `awk', a constant regular
571expression in slashes by itself is also an expression. The regexp
572`/REGEXP/' is an abbreviation for this comparison expression:
573
574 $0 ~ /REGEXP/
575
576 In some contexts it may be necessary to write parentheses around the
577regexp to avoid confusing the `gawk' parser. For example, `(/x/ - /y/)
578> threshold' is not allowed, but `((/x/) - (/y/)) > threshold' parses
579properly.
580
581 One special place where `/foo/' is *not* an abbreviation for `$0 ~
582/foo/' is when it is the right-hand operand of `~' or `!~'! *Note
583Constant Expressions: Constants, where this is discussed in more detail.
584
585\1f
586File: gawk.info, Node: Boolean Ops, Next: Assignment Ops, Prev: Comparison Ops, Up: Expressions
587
588Boolean Expressions
589===================
590
591 A "boolean expression" is a combination of comparison expressions or
592matching expressions, using the boolean operators "or" (`||'), "and"
593(`&&'), and "not" (`!'), along with parentheses to control nesting.
594The truth of the boolean expression is computed by combining the truth
595values of the component expressions.
596
   Boolean expressions can be used wherever comparison and matching
expressions can be used. They can be used in `if', `while', `do' and
`for' statements. They have numeric values (1 if true, 0 if false),
which come into play if the result of the boolean expression is stored
in a variable, or used in arithmetic.
602
603 In addition, every boolean expression is also a valid boolean
604pattern, so you can use it as a pattern to control the execution of
605rules.
606
607 Here are descriptions of the three boolean operators, with an
608example of each. It may be instructive to compare these examples with
609the analogous examples of boolean patterns (*note Boolean Operators and
610Patterns: Boolean Patterns.), which use the same boolean operators in
611patterns instead of expressions.
612
613`BOOLEAN1 && BOOLEAN2'
614 True if both BOOLEAN1 and BOOLEAN2 are true. For example, the
615 following statement prints the current input record if it contains
616 both `2400' and `foo'.
617
618 if ($0 ~ /2400/ && $0 ~ /foo/) print
619
620 The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is true.
621 This can make a difference when BOOLEAN2 contains expressions that
622 have side effects: in the case of `$0 ~ /foo/ && ($2 == bar++)',
623 the variable `bar' is not incremented if there is no `foo' in the
624 record.
625
626`BOOLEAN1 || BOOLEAN2'
627 True if at least one of BOOLEAN1 or BOOLEAN2 is true. For
628 example, the following command prints all records in the input
629 file `BBS-list' that contain *either* `2400' or `foo', or both.
630
631 awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list
632
633 The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is false.
634 This can make a difference when BOOLEAN2 contains expressions
635 that have side effects.
636
637`!BOOLEAN'
638 True if BOOLEAN is false. For example, the following program
639 prints all records in the input file `BBS-list' that do *not*
640 contain the string `foo'.
641
642 awk '{ if (! ($0 ~ /foo/)) print }' BBS-list
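
   The short-circuit behavior of `&&' and `||' can be made visible by
counting how often the right-hand operand is evaluated. The sample
records and the counters `seen', `other' and `hits' are hypothetical
names chosen for this sketch.

```shell
input='2400 foo
2400 bar
1200 foo'

# seen++ is evaluated only for records that contain foo (two of the three).
and_demo=$(printf '%s\n' "$input" |
    awk '{ if (($0 ~ /foo/) && (seen++ >= 0)) hits++ }
         END { print seen, hits }')

# other++ is evaluated only when the left-hand test fails (one record).
or_demo=$(printf '%s\n' "$input" |
    awk '{ if (($0 ~ /foo/) || (other++ >= 0)) hits++ }
         END { print other, hits }')
printf '%s\n' "$and_demo" "$or_demo"
```

   With `&&' the counter reaches 2, once per record containing `foo'.
With `||' the counter reaches only 1, although all three records
satisfy the condition.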
643
644\1f
645File: gawk.info, Node: Assignment Ops, Next: Increment Ops, Prev: Boolean Ops, Up: Expressions
646
647Assignment Expressions
648======================
649
650 An "assignment" is an expression that stores a new value into a
651variable. For example, let's assign the value 1 to the variable `z':
652
653 z = 1
654
655 After this expression is executed, the variable `z' has the value 1.
656Whatever old value `z' had before the assignment is forgotten.
657
658 Assignments can store string values also. For example, this would
659store the value `"this food is good"' in the variable `message':
660
661 thing = "food"
662 predicate = "good"
663 message = "this " thing " is " predicate
664
665(This also illustrates concatenation of strings.)
666
667 The `=' sign is called an "assignment operator". It is the simplest
668assignment operator because the value of the right-hand operand is
669stored unchanged.
670
671 Most operators (addition, concatenation, and so on) have no effect
672except to compute a value. If you ignore the value, you might as well
673not use the operator. An assignment operator is different; it does
674produce a value, but even if you ignore the value, the assignment still
675makes itself felt through the alteration of the variable. We call this
676a "side effect".
677
678 The left-hand operand of an assignment need not be a variable (*note
679Variables::.); it can also be a field (*note Changing the Contents of a
680Field: Changing Fields.) or an array element (*note Arrays in `awk':
681Arrays.). These are all called "lvalues", which means they can appear
682on the left-hand side of an assignment operator. The right-hand
683operand may be any expression; it produces the new value which the
684assignment stores in the specified variable, field or array element.
685
686 It is important to note that variables do *not* have permanent types.
687The type of a variable is simply the type of whatever value it happens
688to hold at the moment. In the following program fragment, the variable
689`foo' has a numeric value at first, and a string value later on:
690
691 foo = 1
692 print foo
693 foo = "bar"
694 print foo
695
696When the second assignment gives `foo' a string value, the fact that it
697previously had a numeric value is forgotten.
698
699 An assignment is an expression, so it has a value: the same value
700that is assigned. Thus, `z = 1' as an expression has the value 1. One
701consequence of this is that you can write multiple assignments together:
702
703 x = y = z = 0
704
705stores the value 0 in all three variables. It does this because the
706value of `z = 0', which is 0, is stored into `y', and then the value of
707`y = z = 0', which is 0, is stored into `x'.
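
   Both points, that assignments chain and that an assignment has a
value, can be checked in a `BEGIN' rule. The variables `v' and `w' are
extra names introduced for this sketch.

```shell
# Assignment associates right to left, and each assignment yields its value.
chained=$(awk 'BEGIN { x = y = z = 0; print x, y, z }')

# The value of (v = 3) is 3, so w becomes 4.
nested=$(awk 'BEGIN { w = (v = 3) + 1; print v, w }')
printf '%s\n' "$chained" "$nested"
```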
708
709 You can use an assignment anywhere an expression is called for. For
710example, it is valid to write `x != (y = 1)' to set `y' to 1 and then
711test whether `x' equals 1. But this style tends to make programs hard
712to read; except in a one-shot program, you should rewrite it to get rid
713of such nesting of assignments. This is never very hard.
714
715 Aside from `=', there are several other assignment operators that do
716arithmetic with the old value of the variable. For example, the
717operator `+=' computes a new value by adding the right-hand value to
718the old value of the variable. Thus, the following assignment adds 5
719to the value of `foo':
720
721 foo += 5
722
723This is precisely equivalent to the following:
724
725 foo = foo + 5
726
727Use whichever one makes the meaning of your program clearer.
728
729 Here is a table of the arithmetic assignment operators. In each
730case, the right-hand operand is an expression whose value is converted
731to a number.
732
733`LVALUE += INCREMENT'
734 Adds INCREMENT to the value of LVALUE to make the new value of
735 LVALUE.
736
737`LVALUE -= DECREMENT'
738 Subtracts DECREMENT from the value of LVALUE.
739
740`LVALUE *= COEFFICIENT'
741 Multiplies the value of LVALUE by COEFFICIENT.
742
743`LVALUE /= QUOTIENT'
744 Divides the value of LVALUE by QUOTIENT.
745
746`LVALUE %= MODULUS'
747 Sets LVALUE to its remainder by MODULUS.
748
749`LVALUE ^= POWER'
750`LVALUE **= POWER'
751 Raises LVALUE to the power POWER. (Only the `^=' operator is
752 specified by POSIX.)
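
   The operators in the table can be applied one after another to a
single variable. This sketch starts `foo' at 10, an arbitrary choice,
and uses only the `^=' spelling that POSIX specifies.

```shell
# Each operator updates foo in place; the value is printed after each step.
steps=$(awk 'BEGIN {
    foo = 10
    foo += 5;  printf "%d ", foo    # 10 + 5
    foo -= 3;  printf "%d ", foo    # 15 - 3
    foo *= 2;  printf "%d ", foo    # 12 * 2
    foo /= 4;  printf "%d ", foo    # 24 / 4
    foo %= 4;  printf "%d ", foo    # remainder of 6 by 4
    foo ^= 3;  printf "%d\n", foo   # 2 cubed
}')
printf '%s\n' "$steps"
```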
753
754\1f
755File: gawk.info, Node: Increment Ops, Next: Conversion, Prev: Assignment Ops, Up: Expressions
756
757Increment Operators
758===================
759
760 "Increment operators" increase or decrease the value of a variable
761by 1. You could do the same thing with an assignment operator, so the
762increment operators add no power to the `awk' language; but they are
763convenient abbreviations for something very common.
764
765 The operator to add 1 is written `++'. It can be used to increment
766a variable either before or after taking its value.
767
768 To pre-increment a variable V, write `++V'. This adds 1 to the
769value of V and that new value is also the value of this expression.
770The assignment expression `V += 1' is completely equivalent.
771
772 Writing the `++' after the variable specifies post-increment. This
773increments the variable value just the same; the difference is that the
774value of the increment expression itself is the variable's *old* value.
775Thus, if `foo' has the value 4, then the expression `foo++' has the
776value 4, but it changes the value of `foo' to 5.
777
778 The post-increment `foo++' is nearly equivalent to writing `(foo +=
7791) - 1'. It is not perfectly equivalent because all numbers in `awk'
780are floating point: in floating point, `foo + 1 - 1' does not
781necessarily equal `foo'. But the difference is minute as long as you
782stick to numbers that are fairly small (less than a trillion).
783
784 Any lvalue can be incremented. Fields and array elements are
785incremented just like variables. (Use `$(i++)' when you wish to do a
786field reference and a variable increment at the same time. The
787parentheses are necessary because of the precedence of the field
788reference operator, `$'.)
789
790 The decrement operator `--' works just like `++' except that it
791subtracts 1 instead of adding. Like `++', it can be used before the
792lvalue to pre-decrement or after it to post-decrement.
793
794 Here is a summary of increment and decrement expressions.
795
796`++LVALUE'
797 This expression increments LVALUE and the new value becomes the
798 value of this expression.
799
800`LVALUE++'
801 This expression causes the contents of LVALUE to be incremented.
802 The value of the expression is the *old* value of LVALUE.
803
804`--LVALUE'
805 Like `++LVALUE', but instead of adding, it subtracts. It
806 decrements LVALUE and delivers the value that results.
807
808`LVALUE--'
809 Like `LVALUE++', but instead of adding, it subtracts. It
810 decrements LVALUE. The value of the expression is the *old* value
811 of LVALUE.
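
   The four forms can be compared by saving the value of each
expression. In this sketch `a' through `d', `first' and the input
record `a b c' are illustrative names and data.

```shell
# foo starts at 4; each step records the value of the expression itself.
values=$(awk 'BEGIN {
    foo = 4
    a = foo++    # a is the old value 4; foo becomes 5
    b = ++foo    # foo becomes 6; b is the new value 6
    c = foo--    # c is the old value 6; foo becomes 5
    d = --foo    # foo becomes 4; d is the new value 4
    print a, b, c, d, foo
}')

# $(i++) reads field i, then increments i.
field=$(echo 'a b c' | awk '{ i = 1; first = $(i++); print first, i }')
printf '%s\n' "$values" "$field"
```

   After two increments and two decrements `foo' is back at 4, and the
field reference returns the first field while leaving `i' at 2.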
812
813\1f
814File: gawk.info, Node: Conversion, Next: Values, Prev: Increment Ops, Up: Expressions
815
816Conversion of Strings and Numbers
817=================================
818
819 Strings are converted to numbers, and numbers to strings, if the
820context of the `awk' program demands it. For example, if the value of
821either `foo' or `bar' in the expression `foo + bar' happens to be a
822string, it is converted to a number before the addition is performed.
823If numeric values appear in string concatenation, they are converted to
824strings. Consider this:
825
826 two = 2; three = 3
827 print (two three) + 4
828
This prints the (numeric) value 27. The numeric values of
the variables `two' and `three' are converted to strings and
concatenated together, and the resulting string is converted back to the
number 23, to which 4 is then added.
833
834 If, for some reason, you need to force a number to be converted to a
835string, concatenate the null string with that number. To force a string
836to be converted to a number, add zero to that string.
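
 One place where forcing a conversion matters is comparison. The sketch below assumes two variables that hold string constants; the names `as_strings' and `as_numbers' are made up for the example.

```shell
out=$(awk 'BEGIN {
    a = "10"; b = "9"
    as_strings = (a < b)            # string comparison: "1" sorts before "9"
    as_numbers = (a + 0 < b + 0)    # numeric comparison: 10 is not less than 9
    n = 7
    s = n ""                        # force the number 7 to the string "7"
    print as_strings, as_numbers, length(s)
}')
printf '%s\n' "$out"
```

 This should print `1 0 1': adding zero makes the comparison numeric, and concatenating the null string yields a one-character string.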
837
838 A string is converted to a number by interpreting a numeric prefix
839of the string as numerals: `"2.5"' converts to 2.5, `"1e3"' converts to
8401000, and `"25fix"' has a numeric value of 25. Strings that can't be
841interpreted as valid numbers are converted to zero.
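
 The prefix rule can be tried directly by adding zero to each of the strings mentioned above, plus one string (`"fix"') that has no numeric prefix at all.

```shell
out=$(awk 'BEGIN {
    # Each string is forced to a number by adding zero.
    print "2.5" + 0, "1e3" + 0, "25fix" + 0, "fix" + 0
}')
printf '%s\n' "$out"
```

 The expected output is `2.5 1000 25 0'.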
842
843 The exact manner in which numbers are converted into strings is
844controlled by the `awk' built-in variable `CONVFMT' (*note Built-in
845Variables::.). Numbers are converted using a special version of the
846`sprintf' function (*note Built-in Functions: Built-in.) with `CONVFMT'
847as the format specifier.
848
 `CONVFMT''s default value is `"%.6g"', which prints a value with at
most six significant digits. For some applications you will want to
change it to specify more precision. Double precision on most modern
machines gives you 16 or 17 decimal digits of precision.
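
 A rough sketch of the effect of raising the precision follows. It uses two separate variables, `x' and `y', so that the result does not depend on whether an implementation reuses an earlier string value; the value 1234567.891 is arbitrary.

```shell
out=$(awk 'BEGIN {
    x = 1234567.891
    s1 = x ""             # converted with the default "%.6g"
    CONVFMT = "%.10g"
    y = 1234567.891
    s2 = y ""             # converted with ten significant digits
    print s1
    print s2
}')
printf '%s\n' "$out"
```

 With a C library that writes exponents in the usual `e+06' style, this should print `1.23457e+06' and then `1234567.891'.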
853
854 Strange results can happen if you set `CONVFMT' to a string that
855doesn't tell `sprintf' how to format floating point numbers in a useful
856way. For example, if you forget the `%' in the format, all numbers
857will be converted to the same constant string.
858
859 As a special case, if a number is an integer, then the result of
860converting it to a string is *always* an integer, no matter what the
861value of `CONVFMT' may be. Given the following code fragment:
862
863 CONVFMT = "%2.2f"
864 a = 12
865 b = a ""
866
867`b' has the value `"12"', not `"12.00"'.
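
 A runnable variant of the fragment above, offered as a sketch, adds a non-integer value for contrast. The variables `c' and `d' are additions for the example.

```shell
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""    # integer value: CONVFMT is not applied
    c = 12.5; d = c ""    # non-integer value: CONVFMT is applied
    print b, d
}')
printf '%s\n' "$out"
```

 This should print `12 12.50'.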
868
 Prior to the POSIX standard, `awk' used the value of `OFMT' for
converting numbers to strings. `OFMT' specifies
871the output format to use when printing numbers with `print'. `CONVFMT'
872was introduced in order to separate the semantics of conversions from
873the semantics of printing. Both `CONVFMT' and `OFMT' have the same
874default value: `"%.6g"'. In the vast majority of cases, old `awk'
875programs will not change their behavior. However, this use of `OFMT'
876is something to keep in mind if you must port your program to other
877implementations of `awk'; we recommend that instead of changing your
878programs, you just port `gawk' itself!
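
 The separation of the two variables can be observed with a short sketch, assuming an `awk' that follows the POSIX rules. The two variables `x' and `y' hold the same arbitrary value, so that neither use can influence the other.

```shell
out=$(awk 'BEGIN {
    OFMT = "%.2f"
    CONVFMT = "%.4f"
    x = 3.14159265
    y = 3.14159265
    print x          # a number handed to print: formatted with OFMT
    print y ""       # concatenation makes a string first: CONVFMT
}')
printf '%s\n' "$out"
```

 Under that assumption the output is `3.14' followed by `3.1416'. An older `awk' that uses `OFMT' for both purposes would print `3.14' twice.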
879
880\1f
881File: gawk.info, Node: Values, Next: Conditional Exp, Prev: Conversion, Up: Expressions
882
883Numeric and String Values
884=========================
885
886 Through most of this manual, we present `awk' values (such as
887constants, fields, or variables) as *either* numbers *or* strings.
888This is a convenient way to think about them, since typically they are
889used in only one way, or the other.
890
891 In truth though, `awk' values can be *both* string and numeric, at
892the same time. Internally, `awk' represents values with a string, a
893(floating point) number, and an indication that one, the other, or both
894representations of the value are valid.
895
896 Keeping track of both kinds of values is important for execution
897efficiency: a variable can acquire a string value the first time it is
898used as a string, and then that string value can be used until the
899variable is assigned a new value. Thus, if a variable with only a
900numeric value is used in several concatenations in a row, it only has
901to be given a string representation once. The numeric value remains
902valid, so that no conversion back to a number is necessary if the
903variable is later used in an arithmetic expression.
904
905 Tracking both kinds of values is also important for precise numerical
906calculations. Consider the following:
907
908 a = 123.321
909 CONVFMT = "%3.1f"
910 b = a " is a number"
911 c = a + 1.654
912
The variable `a' receives a string value in the concatenation and
assignment to `b'. The string value of `a' is `"123.3"'. If the
numeric value were lost when it was converted to a string, then the
numeric use of `a' in the last statement would lose information. `c'
would be assigned the value 124.954 instead of 124.975. Such errors
accumulate rapidly, and very adversely affect numeric computations.
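
 The fragment above can be run as is. In this sketch the extra variable `lossy' is an addition that computes what `c' would be if only the string `"123.3"' had survived.

```shell
out=$(awk 'BEGIN {
    a = 123.321
    CONVFMT = "%3.1f"
    b = a " is a number"        # a gains the string value "123.3"
    c = a + 1.654               # arithmetic still uses the number 123.321
    lossy = ("123.3" + 1.654)   # the result if only the string were kept
    print b
    print c, lossy
}')
printf '%s\n' "$out"
```

 This should print `123.3 is a number' and then `124.975 124.954'.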
919
920 Once a numeric value acquires a corresponding string value, it stays
921valid until a new assignment is made. If `CONVFMT' (*note Conversion
922of Strings and Numbers: Conversion.) changes in the meantime, the old
923string value will still be used. For example:
924
925 BEGIN {
926 CONVFMT = "%2.2f"
927 a = 123.456
928 b = a "" # force `a' to have string value too
929 printf "a = %s\n", a
930 CONVFMT = "%.6g"
931 printf "a = %s\n", a
932 a += 0 # make `a' numeric only again
933 printf "a = %s\n", a # use `a' as string
934 }
935
936This program prints `a = 123.46' twice, and then prints `a = 123.456'.
937
938 *Note Conversion of Strings and Numbers: Conversion, for the rules
939that specify how string values are made from numeric values.
940
941\1f
942File: gawk.info, Node: Conditional Exp, Next: Function Calls, Prev: Values, Up: Expressions
943
944Conditional Expressions
945=======================
946
947 A "conditional expression" is a special kind of expression with
948three operands. It allows you to use one expression's value to select
949one of two other expressions.
950
951 The conditional expression looks the same as in the C language:
952
953 SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP
954
955There are three subexpressions. The first, SELECTOR, is always
956computed first. If it is "true" (not zero and not null) then
957IF-TRUE-EXP is computed next and its value becomes the value of the
958whole expression. Otherwise, IF-FALSE-EXP is computed next and its
959value becomes the value of the whole expression.
960
961 For example, this expression produces the absolute value of `x':
962
963 x > 0 ? x : -x
964
965 Each time the conditional expression is computed, exactly one of
966IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. This
967is important when the expressions contain side effects. For example,
968this conditional expression examines element `i' of either array `a' or
969array `b', and increments `i'.
970
971 x == y ? a[i++] : b[i++]
972
973This is guaranteed to increment `i' exactly once, because each time one
974or the other of the two increment expressions is executed, and the
975other is not.
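
 Both examples can be tried together in a small sketch. The array contents and the variables `p', `q', `abs' and `picked' are made up for the illustration.

```shell
out=$(awk 'BEGIN {
    x = -7
    abs = (x > 0 ? x : -x)

    a[1] = "from a"; b[1] = "from b"
    i = 1; p = 3; q = 4
    picked = (p == q ? a[i++] : b[i++])   # only the b[i++] branch runs

    print abs
    print picked, i
}')
printf '%s\n' "$out"
```

 This should print `7' and then `from b 2', showing that `i' was incremented exactly once.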
976
977\1f
978File: gawk.info, Node: Function Calls, Next: Precedence, Prev: Conditional Exp, Up: Expressions
979
980Function Calls
981==============
982
983 A "function" is a name for a particular calculation. Because it has
984a name, you can ask for it by name at any point in the program. For
985example, the function `sqrt' computes the square root of a number.
986
987 A fixed set of functions are "built-in", which means they are
988available in every `awk' program. The `sqrt' function is one of these.
989*Note Built-in Functions: Built-in, for a list of built-in functions
990and their descriptions. In addition, you can define your own functions
991in the program for use elsewhere in the same program. *Note
992User-defined Functions: User-defined, for how to do this.
993
994 The way to use a function is with a "function call" expression,
995which consists of the function name followed by a list of "arguments"
996in parentheses. The arguments are expressions which give the raw
997materials for the calculation that the function will do. When there is
998more than one argument, they are separated by commas. If there are no
999arguments, write just `()' after the function name. Here are some
1000examples:
1001
1002 sqrt(x^2 + y^2) # One argument
1003 atan2(y, x) # Two arguments
1004 rand() # No arguments
1005
1006 *Do not put any space between the function name and the
1007open-parenthesis!* A user-defined function name looks just like the
1008name of a variable, and space would make the expression look like
1009concatenation of a variable with an expression inside parentheses.
Space before the parenthesis is harmless with built-in functions, but
to avoid mistakes with user-defined functions it is best not to get
into the habit of using space.
1013
1014 Each function expects a particular number of arguments. For
1015example, the `sqrt' function must be called with a single argument, the
1016number to take the square root of:
1017
1018 sqrt(ARGUMENT)
1019
1020 Some of the built-in functions allow you to omit the final argument.
1021If you do so, they use a reasonable default. *Note Built-in Functions:
1022Built-in, for full details. If arguments are omitted in calls to
1023user-defined functions, then those arguments are treated as local
1024variables, initialized to the null string (*note User-defined
1025Functions: User-defined.).
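
 Omitted arguments can be illustrated with a sketch. The function `show' is a hypothetical helper written for this example; `substr' is the built-in function, called here without its final (length) argument.

```shell
out=$(awk '
# show() is a hypothetical helper, not part of awk.
function show(first, second) {
    return "[" first "|" second "]"
}
BEGIN {
    print show("x", "y")        # both arguments supplied
    print show("x")             # second is omitted, so it is the null string
    print substr("hello", 2)    # built-in: length argument omitted
}')
printf '%s\n' "$out"
```

 The expected output is `[x|y]', `[x|]' and `ello', one per line.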
1026
1027 Like every other expression, the function call has a value, which is
1028computed by the function based on the arguments you give it. In this
1029example, the value of `sqrt(ARGUMENT)' is the square root of the
1030argument. A function can also have side effects, such as assigning the
1031values of certain variables or doing I/O.
1032
1033 Here is a command to read numbers, one number per line, and print the
1034square root of each one:
1035
1036 awk '{ print "The square root of", $1, "is", sqrt($1) }'
1037
1038\1f
1039File: gawk.info, Node: Precedence, Prev: Function Calls, Up: Expressions
1040
1041Operator Precedence (How Operators Nest)
1042========================================
1043
1044 "Operator precedence" determines how operators are grouped, when
1045different operators appear close by in one expression. For example,
1046`*' has higher precedence than `+'; thus, `a + b * c' means to multiply
1047`b' and `c', and then add `a' to the product (i.e., `a + (b * c)').
1048
1049 You can overrule the precedence of the operators by using
1050parentheses. You can think of the precedence rules as saying where the
1051parentheses are assumed if you do not write parentheses yourself. In
1052fact, it is wise to always use parentheses whenever you have an unusual
1053combination of operators, because other people who read the program may
1054not remember what the precedence is in this case. You might forget,
1055too; then you could make a mistake. Explicit parentheses will help
1056prevent any such mistake.
1057
1058 When operators of equal precedence are used together, the leftmost
1059operator groups first, except for the assignment, conditional and
1060exponentiation operators, which group in the opposite order. Thus, `a
1061- b + c' groups as `(a - b) + c'; `a = b = c' groups as `a = (b = c)'.
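
 The grouping rules can be confirmed with a few constants. This sketch uses arbitrary numbers chosen so that the two possible groupings give different answers.

```shell
out=$(awk 'BEGIN {
    print 10 - 4 + 3      # groups as (10 - 4) + 3
    print 2 ^ 3 ^ 2       # groups as 2 ^ (3 ^ 2)
    a = b = 5             # groups as a = (b = 5)
    print a, b
}')
printf '%s\n' "$out"
```

 This should print `9', `512' and `5 5'. Grouping the other way would give 3 for the first line and 64 for the second.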
1062
1063 The precedence of prefix unary operators does not matter as long as
1064only unary operators are involved, because there is only one way to
1065parse them--innermost first. Thus, `$++i' means `$(++i)' and `++$x'
1066means `++($x)'. However, when another operator follows the operand,
1067then the precedence of the unary operators can matter. Thus, `$x^2'
1068means `($x)^2', but `-x^2' means `-(x^2)', because `-' has lower
1069precedence than `^' while `$' has higher precedence.
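
 A quick check of the two cases follows, as a sketch. It assumes a single input line containing `3'; the variables `x' and `y' are illustrative.

```shell
out=$(echo "3" | awk '{
    x = 1
    print $x^2     # ($x)^2: field 1 squared
    y = 3
    print -y^2     # -(y^2)
}')
printf '%s\n' "$out"
```

 This should print `9' and then `-9'.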
1070
1071 Here is a table of the operators of `awk', in order of increasing
1072precedence:
1073
1074assignment
1075 `=', `+=', `-=', `*=', `/=', `%=', `^=', `**='. These operators
1076 group right-to-left. (The `**=' operator is not specified by
1077 POSIX.)
1078
1079conditional
1080 `?:'. This operator groups right-to-left.
1081
1082logical "or".
1083 `||'.
1084
1085logical "and".
1086 `&&'.
1087
1088array membership
1089 `in'.
1090
1091matching
1092 `~', `!~'.
1093
relational and redirection
1095 The relational operators and the redirections have the same
1096 precedence level. Characters such as `>' serve both as
1097 relationals and as redirections; the context distinguishes between
1098 the two meanings.
1099
1100 The relational operators are `<', `<=', `==', `!=', `>=' and `>'.
1101
1102 The I/O redirection operators are `<', `>', `>>' and `|'.
1103
1104 Note that I/O redirection operators in `print' and `printf'
1105 statements belong to the statement level, not to expressions. The
1106 redirection does not produce an expression which could be the
1107 operand of another operator. As a result, it does not make sense
1108 to use a redirection operator near another operator of lower
1109 precedence, without parentheses. Such combinations, for example
1110 `print foo > a ? b : c', result in syntax errors.
1111
1112concatenation
1113 No special token is used to indicate concatenation. The operands
1114 are simply written side by side.
1115
1116add, subtract
1117 `+', `-'.
1118
1119multiply, divide, mod
1120 `*', `/', `%'.
1121
1122unary plus, minus, "not"
1123 `+', `-', `!'.
1124
1125exponentiation
1126 `^', `**'. These operators group right-to-left. (The `**'
1127 operator is not specified by POSIX.)
1128
1129increment, decrement
1130 `++', `--'.
1131
1132field
1133 `$'.
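
 A few rows of the table can be exercised in one sketch. The expressions are arbitrary; each one mixes two or three levels from the table without parentheses around the inner operators.

```shell
out=$(awk 'BEGIN {
    s = 1 " " 2 + 3          # + binds tighter than concatenation
    t = 2 + 3 * 4            # * binds tighter than +
    u = (1 < 2 && 0 || 1)    # relational first, then &&, then ||
    print s
    print t, u
}')
printf '%s\n' "$out"
```

 This should print `1 5' and then `14 1'.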
1134
1135\1f
1136File: gawk.info, Node: Statements, Next: Arrays, Prev: Expressions, Up: Top
1137
1138Control Statements in Actions
1139*****************************
1140
1141 "Control statements" such as `if', `while', and so on control the
1142flow of execution in `awk' programs. Most of the control statements in
1143`awk' are patterned on similar statements in C.
1144
1145 All the control statements start with special keywords such as `if'
1146and `while', to distinguish them from simple expressions.
1147
1148 Many control statements contain other statements; for example, the
1149`if' statement contains another statement which may or may not be
1150executed. The contained statement is called the "body". If you want
1151to include more than one statement in the body, group them into a
1152single compound statement with curly braces, separating them with
1153newlines or semicolons.
1154
1155* Menu:
1156
1157* If Statement:: Conditionally execute
1158 some `awk' statements.
1159* While Statement:: Loop until some condition is satisfied.
1160* Do Statement:: Do specified action while looping until some
1161 condition is satisfied.
1162* For Statement:: Another looping statement, that provides
1163 initialization and increment clauses.
1164* Break Statement:: Immediately exit the innermost enclosing loop.
1165* Continue Statement:: Skip to the end of the innermost
1166 enclosing loop.
1167* Next Statement:: Stop processing the current input record.
1168* Next File Statement:: Stop processing the current file.
1169* Exit Statement:: Stop execution of `awk'.
1170
1171\1f
1172File: gawk.info, Node: If Statement, Next: While Statement, Prev: Statements, Up: Statements
1173
1174The `if' Statement
1175==================
1176
1177 The `if'-`else' statement is `awk''s decision-making statement. It
1178looks like this:
1179
1180 if (CONDITION) THEN-BODY [else ELSE-BODY]
1181
1182CONDITION is an expression that controls what the rest of the statement
1183will do. If CONDITION is true, THEN-BODY is executed; otherwise,
1184ELSE-BODY is executed (assuming that the `else' clause is present).
1185The `else' part of the statement is optional. The condition is
1186considered false if its value is zero or the null string, and true
1187otherwise.
1188
1189 Here is an example:
1190
1191 if (x % 2 == 0)
1192 print "x is even"
1193 else
1194 print "x is odd"
1195
 In this example, if the expression `x % 2 == 0' is true (that is,
the value of `x' is divisible by 2), then the first `print' statement
is executed; otherwise, the second `print' statement is executed.
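
 To try the statement on real input, one possible sketch takes `x' from the first field of each line. The two input numbers are arbitrary.

```shell
out=$(printf '4\n7\n' | awk '{
    x = $1
    if (x % 2 == 0)
        print x, "is even"
    else
        print x, "is odd"
}')
printf '%s\n' "$out"
```

 This should print `4 is even' and then `7 is odd'.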
1199
1200 If the `else' appears on the same line as THEN-BODY, and THEN-BODY
1201is not a compound statement (i.e., not surrounded by curly braces),
1202then a semicolon must separate THEN-BODY from `else'. To illustrate
1203this, let's rewrite the previous example:
1204
1205 awk '{ if (x % 2 == 0) print "x is even"; else
1206 print "x is odd" }'
1207
1208If you forget the `;', `awk' won't be able to parse the statement, and
1209you will get a syntax error.
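
 For completeness, here is a sketch of the single-line form with the semicolon in place. Taking `x' from the first input field is an assumption made so that the command has something to test.

```shell
out=$(echo 6 | awk '{ x = $1; if (x % 2 == 0) print "x is even"; else print "x is odd" }')
printf '%s\n' "$out"
```

 With the input `6' this should print `x is even'.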
1210
1211 We would not actually write this example this way, because a human
1212reader might fail to see the `else' if it were not the first thing on
1213its line.
1214
1215\1f
1216File: gawk.info, Node: While Statement, Next: Do Statement, Prev: If Statement, Up: Statements
1217
1218The `while' Statement
1219=====================
1220
1221 In programming, a "loop" means a part of a program that is (or at
1222least can be) executed two or more times in succession.
1223
1224 The `while' statement is the simplest looping statement in `awk'.
1225It repeatedly executes a statement as long as a condition is true. It
1226looks like this:
1227
1228 while (CONDITION)
1229 BODY
1230
1231Here BODY is a statement that we call the "body" of the loop, and
1232CONDITION is an expression that controls how long the loop keeps
1233running.
1234
1235 The first thing the `while' statement does is test CONDITION. If
1236CONDITION is true, it executes the statement BODY. (CONDITION is true
1237when the value is not zero and not a null string.) After BODY has been
1238executed, CONDITION is tested again, and if it is still true, BODY is
1239executed again. This process repeats until CONDITION is no longer
1240true. If CONDITION is initially false, the body of the loop is never
1241executed.
1242
1243 This example prints the first three fields of each record, one per
1244line.
1245
1246 awk '{ i = 1
1247 while (i <= 3) {
1248 print $i
1249 i++
1250 }
1251 }'
1252
1253Here the body of the loop is a compound statement enclosed in braces,
1254containing two statements.
1255
1256 The loop works like this: first, the value of `i' is set to 1.
1257Then, the `while' tests whether `i' is less than or equal to three.
1258This is the case when `i' equals one, so the `i'-th field is printed.
1259Then the `i++' increments the value of `i' and the loop repeats. The
1260loop terminates when `i' reaches 4.
1261
1262 As you can see, a newline is not required between the condition and
1263the body; but using one makes the program clearer unless the body is a
1264compound statement or is very simple. The newline after the open-brace
1265that begins the compound statement is not required either, but the
1266program would be hard to read without it.
1267
1268\1f
1269File: gawk.info, Node: Do Statement, Next: For Statement, Prev: While Statement, Up: Statements
1270
1271The `do'-`while' Statement
1272==========================
1273
1274 The `do' loop is a variation of the `while' looping statement. The
1275`do' loop executes the BODY once, then repeats BODY as long as
1276CONDITION is true. It looks like this:
1277
1278 do
1279 BODY
1280 while (CONDITION)
1281
1282 Even if CONDITION is false at the start, BODY is executed at least
1283once (and only once, unless executing BODY makes CONDITION true).
1284Contrast this with the corresponding `while' statement:
1285
1286 while (CONDITION)
1287 BODY
1288
1289This statement does not execute BODY even once if CONDITION is false to
1290begin with.
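
 The difference can be seen by counting how often each body runs when the condition is false from the start. In this sketch the counters `do_runs' and `while_runs' are illustrative names.

```shell
out=$(awk 'BEGIN {
    i = 5
    do_runs = 0
    do {
        do_runs++
    } while (i < 3)       # false from the start, checked after the body

    while_runs = 0
    while (i < 3) {       # false from the start, checked before the body
        while_runs++
    }
    print do_runs, while_runs
}')
printf '%s\n' "$out"
```

 This should print `1 0'.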
1291
1292 Here is an example of a `do' statement:
1293
1294 awk '{ i = 1
1295 do {
1296 print $0
1297 i++
1298 } while (i <= 10)
1299 }'
1300
This example prints each input record ten times. It isn't a very
realistic example, since in this case an ordinary `while' would do just
as well. But this reflects actual experience; there is only
occasionally a real use for a `do' statement.
1305