.if n .ls 2
.tr _\(em
.tr *\(**
.de UC
\&\\$3\s-1\\$1\\s0\&\\$2
..
.de IT
.if n .ul
\&\\$3\f2\\$1\fP\&\\$2
..
.de UL
.if n .ul
\&\\$3\f3\\$1\fP\&\\$2
..
.de P1
.DS I 3n
.if n .ls 2
.nf
.if n .ta 5 10 15 20 25 30 35 40 45 50 55 60
.if t .ta .4i .8i 1.2i 1.6i 2i 2.4i 2.8i 3.2i 3.6i 4i 4.4i 4.8i 5.2i 5.6i
.if t .tr -\(mi|\(bv'\(fm^\(no*\(**
.tr `\(ga'\(aa
.if t .tr _\(ul
.ft 3
.lg 0
..
.de P2
.ps \\n(PS
.vs \\n(VSp
.ft R
.if n .ls 2
.tr --||''^^!!
.if t .tr _\(em
.fi
.lg
.DE
.if t .tr _\(em
..
.hw semi-colon
.hw estab-lished
.hy 14
. \"2=not last lines; 4= no -xx; 8=no xx-
. \"special chars in programs
. \" start of text
.RP
....TR 59
....TM 77-1273-6 39199 39199-11
.ND "July 1, 1977"
.TL
The M4 Macro Processor
.AU "MH 2C-518" 6021
Brian W. Kernighan
.AU "MH 2C-517" 3770
Dennis M. Ritchie
.AI
.MH
.AB
.PP
M4 is a macro processor available on
.UX
and
.UC GCOS .
Its primary use has been as a
front end for Ratfor for those
cases where parameterless macros
are not adequately powerful.
It has also been used for languages as disparate as C and Cobol.
M4 is particularly suited to languages like Fortran, PL/I and C,
which use function-call notation,
since M4 macros are specified in the same functional notation.
.PP
M4 provides features seldom found even in much larger
macro processors,
including
.IP " \(bu"
arguments
.IP " \(bu"
condition testing
.IP " \(bu"
arithmetic capabilities
.IP " \(bu"
string and substring functions
.IP " \(bu"
file manipulation
.LP
.PP
This paper is a user's manual for M4.
.AE
.CS 6 0 6 0 0 1
.if t .2C
.SH
Introduction
.PP
A macro processor is a useful way to enhance a programming language,
to make it more palatable
or more readable,
or to tailor it to a particular application.
The
.UL #define
statement in C
and the analogous
.UL define
in Ratfor
are examples of the basic facility provided by
any macro processor _
replacement of text by other text.
.PP
The M4 macro processor is an extension of a macro processor called M3
which was written by D. M. Ritchie
for the AP-3 minicomputer;
M3 was in turn based on the macro processor described in [1].
Readers unfamiliar with the basic ideas of macro processing
may wish to read some of the discussion there.
.PP
M4 is a suitable front end for Ratfor and C,
and has also been used successfully with Cobol.
Besides the straightforward replacement of one string of text by another,
it provides
macros with arguments,
conditional macro expansion,
arithmetic,
file manipulation,
and some specialized string processing functions.
.PP
The basic operation of M4
is to copy its input to its output.
As the input is read, however, each alphanumeric ``token''
(that is, string of letters and digits) is checked.
If it is the name of a macro,
then the name of the macro is replaced by its defining text,
and the resulting string is pushed back onto the
input to be rescanned.
Macros may be called with arguments, in which case the arguments are collected
and substituted into the right places in the defining text
before it is rescanned.
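.PP
The copy, replace and rescan cycle just described can be sketched in Python.
This is a simplified model under stated assumptions, not M4's own code:
it handles only macros without arguments,
ignores quoting,
assumes a name is a letter or underscore followed by letters, digits or underscores,
and the names
.UL expand
and
.UL macros
are hypothetical.
The defining text of a macro is pushed back in front of the unread input,
so it is scanned again.
.P1
```python
import re

# Assumed name syntax: a letter or underscore, then letters, digits or underscores.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand(text, macros, limit=10000):
    """Copy text to the output, replacing macro names by their definitions.

    Simplified model (not M4's code): parameterless macros only, no quoting.
    Each replacement is pushed back onto the front of the unread input, so
    it is rescanned and any macro names it contains are expanded in turn.
    """
    out = []
    pending = text  # unread input
    replacements = 0
    while pending:
        m = NAME.match(pending)
        if m is None:
            # Not the start of a name: copy one character unchanged.
            out.append(pending[0])
            pending = pending[1:]
            continue
        word = m.group()
        if word in macros:
            replacements += 1
            if replacements > limit:
                raise RecursionError("macro expansion does not terminate")
            # Push the defining text back so that it is rescanned.
            pending = macros[word] + pending[m.end():]
        else:
            # A whole token that is not a macro name is copied as is,
            # so NNN is never mistaken for three N's.
            out.append(word)
            pending = pending[m.end():]
    return "".join(out)
```
.P2
With
.UL M
defined as
.UL N
and
.UL N
as 100, expanding
.UL M
first produces
.UL N ,
which is rescanned and becomes 100.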
.PP
M4 provides a collection of about twenty built-in
macros
which perform various useful operations;
in addition, the user can define new macros.
Built-ins and user-defined macros work exactly the same way, except that
some of the built-in macros have side effects
on the state of the process.
.SH
Usage
.PP
On
.UC UNIX ,
use
.P1
m4 [files]
.P2
Each argument file is processed in order;
if there are no arguments, or if an argument
is `\-',
the standard input is read at that point.
The processed text is written on the standard output,
which may be captured for subsequent processing with
.P1
m4 [files] >outputfile
.P2
On
.UC GCOS ,
usage is identical, but the program is called
.UL \&./m4 .
.SH
Defining Macros
.PP
The primary built-in function of M4
is
.UL define ,
which is used to define new macros.
The input
.P1
define(name, stuff)
.P2
causes the string
.UL name
to be defined as
.UL stuff .
All subsequent occurrences of
.UL name
will be replaced by
.UL stuff .
.UL name
must be alphanumeric and must begin with a letter
(the underscore \(ul counts as a letter).
.UL stuff
is any text that contains balanced parentheses;
it may stretch over multiple lines.
.PP
Thus, as a typical example,
.P1
define(N, 100)
 ...
if (i > N)
.P2
defines
.UL N
to be 100, and uses this ``symbolic constant'' in a later
.UL if
statement.
.PP
The left parenthesis must immediately follow the word
.UL define ,
to signal that
.UL define
has arguments.
If a macro or built-in name is not followed immediately by `(',
it is assumed to have no arguments.
This is the situation for
.UL N
above;
it is actually a macro with no arguments,
and thus when it is used there need be no (...) following it.
.PP
You should also notice that a macro name is only recognized as such
if it appears surrounded by non-alphanumerics.
For example, in
.P1
define(N, 100)
 ...
if (NNN > 100)
.P2
the variable
.UL NNN
is absolutely unrelated to the defined macro
.UL N ,
even though it contains a lot of
.UL N 's.
.PP
Things may be defined in terms of other things.
For example,
.P1
define(N, 100)
define(M, N)
.P2
defines both M and N to be 100.
.PP
What happens if
.UL N
is redefined?
Or, to say it another way, is
.UL M
defined as
.UL N
or as 100?
In M4,
the latter is true _
.UL M
is 100, so even if
.UL N
subsequently changes,
.UL M
does not.
.PP
This behavior arises because
M4 expands macro names into their defining text as soon as it possibly can.
Here, that means that when the string
.UL N
is seen as the arguments of
.UL define
are being collected, it is immediately replaced by 100;
it's just as if you had said
.P1
define(M, 100)
.P2
in the first place.
.PP
If this isn't what you really want, there are two ways out of it.
The first, which is specific to this situation,
is to interchange the order of the definitions:
.P1
define(M, N)
define(N, 100)
.P2
Now
.UL M
is defined to be the string
.UL N ,
so when you ask for
.UL M
later, you'll always get the value of
.UL N
at that time
(because the
.UL M
will be replaced by
.UL N
which will be replaced by 100).
.SH
Quoting
.PP
The more general solution is to delay the expansion of
the arguments of
.UL define
by
.ul
quoting
them.
Any text surrounded by the single quotes \(ga and \(aa
is not expanded immediately, but has the quotes stripped off.
If you say
.P1
define(N, 100)
define(M, `N')
.P2
the quotes around the
.UL N
are stripped off as the argument is being collected,
but they have served their purpose, and
.UL M
is defined as
the string
.UL N ,
not 100.
The general rule is that M4 always strips off
one level of single quotes whenever it evaluates
something.
This is true even outside of
macros.
If you want the word
.UL define
to appear in the output,
you have to quote it in the input,
as in
.P1
 `define' = 1;
.P2
.PP
As another instance of the same thing, which is a bit more surprising,
consider redefining
.UL N :
.P1
define(N, 100)
 ...
define(N, 200)
.P2
Perhaps regrettably, the
.UL N
in the second definition is
evaluated as soon as it's seen;
that is, it is
replaced by
100, so it's as if you had written
.P1
define(100, 200)
.P2
This statement is ignored by M4, since you can only define things that look
like names, but it obviously doesn't have the effect you wanted.
To really redefine
.UL N ,
you must delay the evaluation by quoting:
.P1
define(N, 100)
 ...
define(`N', 200)
.P2
In M4,
it is often wise to quote the first argument of a macro.
.PP
If \` and \' are not convenient for some reason,
the quote characters can be changed with the built-in
.UL changequote :
.P1
changequote([, ])
.P2
makes the new quote characters the left and right brackets.
You can restore the original characters with just
.P1
changequote
.P2
.PP
There are two additional built-ins related to
.UL define .
.UL undefine
removes the definition of some macro or built-in:
.P1
undefine(`N')
.P2
removes the definition of
.UL N .
(Why are the quotes absolutely necessary?)
Built-ins can be removed with
.UL undefine ,
as in
.P1
undefine(`define')
.P2
but once you remove one, you can never get it back.
.PP
The built-in
.UL ifdef
provides a way to determine if a macro is currently defined.
In particular, M4 has pre-defined the names
.UL unix
and
.UL gcos
on the corresponding systems, so you can
tell which one you're using:
.P1
ifdef(`unix', `define(wordsize,16)' )
ifdef(`gcos', `define(wordsize,36)' )
.P2
makes a definition appropriate for the particular machine.
Don't forget the quotes!
.PP
.UL ifdef
actually permits three arguments;
if the name is undefined, the value of
.UL ifdef
is then the third argument, as in
.P1
ifdef(`unix', on UNIX, not on UNIX)
.P2
.SH
Arguments
.PP
So far we have discussed the simplest form of macro processing _
replacing one string by another (fixed) string.
User-defined macros may also have arguments, so different invocations
can have different results.
Within the replacement text for a macro
(the second argument of its
.UL define )
any occurrence of
.UL $n
will be replaced by the
.UL n th
argument when the macro
is actually used.
Thus, the macro
.UL bump ,
defined as
.P1
define(bump, $1 = $1 + 1)
.P2
generates code to increment its argument by 1:
.P1
bump(x)
.P2
is
.P1
x = x + 1
.P2
.PP
A macro can have as many arguments as you want,
but only the first nine are accessible,
through
.UL $1
to
.UL $9 .
(The macro name itself is
.UL $0 ,
although that is less commonly used.)
Arguments that are not supplied are replaced by null strings,
so
we can define a macro
.UL cat
which simply concatenates its arguments, like this:
.P1
define(cat, $1$2$3$4$5$6$7$8$9)
.P2
Thus
.P1
cat(x, y, z)
.P2
is equivalent to
.P1
xyz
.P2
.UL $4
through
.UL $9
are null, since no corresponding arguments were provided.
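.PP
Argument substitution can be sketched as follows.
This Python fragment is an illustrative model, not M4's code;
the function name
.UL substitute
is hypothetical, and it assumes the arguments have already been collected.
A dollar sign followed by a single digit is replaced:
$0 by the macro name, $1 through $9 by the arguments,
with missing arguments becoming null strings.
In M4 the result would then be rescanned.
.P1
```python
DIGITS = "0123456789"


def substitute(body, name, args):
    """Hypothetical model: replace $0..$9 in a macro's defining text.

    $0 is the macro name and $1..$9 are the arguments; a reference to an
    argument that was not supplied becomes the null string. In this model
    only one digit is read after '$', so $10 means $1 followed by 0.
    """
    out, i = [], 0
    while i < len(body):
        if body[i] == "$" and i + 1 < len(body) and body[i + 1] in DIGITS:
            n = int(body[i + 1])
            if n == 0:
                out.append(name)
            elif n <= len(args):
                out.append(args[n - 1])
            # else: missing argument, contributes nothing
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```
.P2
Applied to the definitions of
.UL bump
and
.UL cat
above, this gives
.UL "x = x + 1"
and
.UL xyz
respectively.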
.PP
Leading unquoted blanks, tabs, or newlines that occur during argument collection
are discarded.
All other white space is retained.
Thus
.P1
define(a,   b   c)
.P2
defines
.UL a
to be
.UL b\ \ \ c .
.PP
Arguments are separated by commas, but parentheses are counted properly,
so a comma ``protected'' by parentheses does not terminate an argument.
That is, in
.P1
define(a, (b,c))
.P2
there are only two arguments;
the second is literally
.UL (b,c) .
And of course a bare comma or parenthesis can be inserted by quoting it.
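.PP
The argument-collection rules above can be sketched in Python.
This is a hypothetical helper,
.UL collect_args ,
not part of M4;
it takes the text between a macro's opening parenthesis and the matching
closing parenthesis.
It assumes the default quote characters and leaves quotes in place,
since they are stripped only when an argument is evaluated.
.P1
```python
def collect_args(text):
    """Hypothetical model: split the text inside a macro call's
    parentheses into arguments.

    - A comma ends an argument only when it is not inside parentheses.
    - Quoted text is opaque: a quoted comma or parenthesis neither splits
      an argument nor changes the nesting depth.
    - Leading unquoted blanks, tabs and newlines of an argument are
      dropped; all other white space is kept.
    """
    args, current = [], []
    depth = quote = 0
    for c in text:
        if quote:                      # inside `...'
            if c == "`":
                quote += 1
            elif c == "'":
                quote -= 1
            current.append(c)
        elif c == "`":
            quote = 1
            current.append(c)
        elif c == "," and depth == 0:
            args.append("".join(current))
            current = []
        elif c in " \t\n" and not current:
            continue                   # leading white space is discarded
        else:
            if c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
            current.append(c)
    args.append("".join(current))
    return args
```
.P2
For the two examples above, the sketch yields the arguments
.UL a
and
.UL b\ \ \ c ,
and then
.UL a
and
.UL (b,c) .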
.SH
Arithmetic Built-ins
.PP
M4 provides two built-in functions for doing arithmetic
on integers (only).
The simplest is
.UL incr ,
which increments its numeric argument by 1.
Thus to handle the common programming situation
where you want a variable to be defined as ``one more than N'',
write
.P1
define(N, 100)
define(N1, `incr(N)')
.P2
Then
.UL N1
is defined as one more than the current value of
.UL N .
.PP
The more general mechanism for arithmetic is a built-in
called
.UL eval ,
which is capable of arbitrary arithmetic on integers.
It provides the operators
(in decreasing order of precedence)
.DS
unary + and \(mi
** or ^ (exponentiation)
* / % (modulus)
+ \(mi
== != < <= > >=
! (not)
& or && (logical and)
\(or or \(or\(or (logical or)
.DE
Parentheses may be used to group operations where needed.
All the operands of
an expression given to
.UL eval
must ultimately be numeric.
The numeric value of a true relation
(like 1>0)
is 1, and false is 0.
The precision in
.UL eval
is
32 bits on
.UC UNIX
and 36 bits on
.UC GCOS .
.PP
As a simple example, suppose we want
.UL M
to be
.UL 2**N+1 .
Then
.P1
define(N, 3)
define(M, `eval(2**N+1)')
.P2
As a matter of principle, it is advisable
to quote the defining text for a macro
unless it is very simple indeed
(say just a number);
it usually gives the result you want,
and is a good habit to get into.
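.PP
The precedence table for
.UL eval
can be made concrete with a small recursive-descent evaluator in Python.
It is a sketch written from the table above, not M4's own code,
and the names
.UL eval_m4 ,
.UL tokenize
and
.UL c_divide
are hypothetical.
Several details the text leaves open are assumptions:
exponentiation groups to the right,
division and modulus truncate toward zero as in C,
negative exponents are rejected,
and results are not reduced to 32 or 36 bits.
Macro names such as
.UL N
are assumed to have been expanded already,
so the expression contains only numbers, operators and parentheses.
.P1
```python
import operator
import re

# Two-character operators come before their one-character prefixes.
TOKEN = re.compile(r"\s*(\d+|\*\*|==|!=|<=|>=|&&|\|\||[-+*/%^<>!&|()])")
RELATIONS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
             "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def c_divide(a, b, op):
    """Integer / or %, truncating toward zero as C does (an assumption)."""
    q = abs(a) // abs(b)               # ZeroDivisionError if b == 0
    if (a < 0) != (b < 0):
        q = -q
    return q if op == "/" else a - b * q


def tokenize(expr):
    tokens, pos, end = [], 0, len(expr.rstrip())
    while pos < end:
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"bad input at {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def eval_m4(expr):
    """Sketch: evaluate an integer expression using the precedence table."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def left_assoc(operand, ops, combine):
        # Parse: operand (op operand)*, grouping to the left.
        v = operand()
        while peek() in ops:
            op = take()
            v = combine(op, v, operand())
        return v

    def logical_or():                  # | or ||   (lowest precedence)
        return left_assoc(logical_and, ("|", "||"),
                          lambda op, a, b: int(bool(a) or bool(b)))

    def logical_and():                 # & or &&
        return left_assoc(logical_not, ("&", "&&"),
                          lambda op, a, b: int(bool(a) and bool(b)))

    def logical_not():                 # ! binds less tightly than ==
        if peek() == "!":
            take()
            return int(not logical_not())
        return relation()

    def relation():                    # == != < <= > >=
        return left_assoc(additive, RELATIONS,
                          lambda op, a, b: int(RELATIONS[op](a, b)))

    def additive():                    # + -
        return left_assoc(multiplicative, ("+", "-"),
                          lambda op, a, b: a + b if op == "+" else a - b)

    def multiplicative():              # * / %
        return left_assoc(power, ("*", "/", "%"),
                          lambda op, a, b: a * b if op == "*" else c_divide(a, b, op))

    def power():                       # ** or ^, assumed to group right
        base = unary()
        if peek() in ("**", "^"):
            take()
            exponent = power()
            if exponent < 0:
                raise ValueError("negative exponent")
            return base ** exponent
        return base

    def unary():                       # unary + and -   (highest)
        if peek() in ("+", "-"):
            sign = take()
            v = unary()
            return -v if sign == "-" else v
        return primary()

    def primary():
        tok = take() if peek() is not None else None
        if tok == "(":
            v = logical_or()
            if peek() != ")":
                raise ValueError("missing )")
            take()
            return v
        if tok is not None and tok.isdigit():
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    result = logical_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return result
```
.P2
With
.UL N
defined as 3, the argument of
.UL eval
in the example above becomes
.UL 2**3+1 ,
which this sketch evaluates to 9.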
.SH
File Manipulation
.PP
You can include a new file in the input at any time by
the built-in function
.UL include :
.P1
include(filename)
.P2
inserts the contents of
.UL filename
in place of the
.UL include
command.
The included file often consists of a set of definitions.
The value
of
.UL include
(that is, its replacement text)
is the contents of the file;
this can be captured in definitions, etc.
.PP
It is a fatal error if the file named in
.UL include
cannot be accessed.
To get some control over this situation, the alternate form
.UL sinclude
can be used;
.UL sinclude
(``silent include'')
says nothing and continues if it can't access the file.
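.PP
The difference between
.UL include
and
.UL sinclude
can be modeled in a few lines of Python.
This is an illustrative sketch, not M4's implementation:
the function name and its
.UL silent
flag are hypothetical,
and the fatal error is represented by raising
.UL SystemExit .
In M4 the returned text would then be read as input and scanned for macros.
.P1
```python
def include(path, silent=False):
    """Hypothetical model: return the contents of the named file, which
    replace the call.

    include (silent=False): an unreadable file is a fatal error.
    sinclude (silent=True): say nothing and yield the null string.
    """
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        if silent:
            return ""
        raise SystemExit(f"m4: can't open {path}")
```
.P2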
.PP
It is also possible to divert the output of M4 to temporary files during processing,
and output the collected material upon command.
M4 maintains nine of these diversions, numbered 1 through 9.
If you say
.P1
divert(n)
.P2
all subsequent output is put onto the end of a temporary file
referred to as
.UL n .
Diverting to this file is stopped by another
.UL divert
command;
in particular,
.UL divert
or
.UL divert(0)
resumes the normal output process.
.PP
Diverted text is normally output all at once
at the end of processing,
with the diversions output in numeric order.
It is possible, however, to bring back diversions
at any time,
that is, to append them to the current diversion.
.P1
undivert
.P2
brings back all diversions in numeric order, and
.UL undivert
with arguments brings back the selected diversions
in the order given.
The act of undiverting discards the diverted stuff,
as does diverting into a diversion
whose number is not between 0 and 9 inclusive.
.PP
The value of
.UL undivert
is
.ul
not
the diverted stuff.
Furthermore, the diverted material is
.ul
not
rescanned for macros.
.PP
The built-in
.UL divnum
returns the number of the currently active diversion.
This is zero during normal processing.
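.PP
The bookkeeping behind
.UL divert ,
.UL undivert
and
.UL divnum
can be sketched as a small Python class.
It is a model of the behavior described above, not M4's code;
the class and method names are hypothetical,
diversions are kept in memory rather than in temporary files,
and undiverting a diversion into itself is not treated specially.
.P1
```python
class Diversions:
    """Hypothetical model of M4 output diversions.

    Stream 0 is the normal output; diversions 1..9 collect text for later.
    Text sent to any other diversion number is thrown away.
    """

    def __init__(self):
        self.current = 0
        self.output = []                              # stream 0
        self.saved = {n: [] for n in range(1, 10)}    # diversions 1..9

    def divert(self, n=0):            # divert, or divert(0), resumes output
        self.current = n

    def divnum(self):
        return self.current

    def emit(self, text):
        if self.current == 0:
            self.output.append(text)
        elif self.current in self.saved:
            self.saved[self.current].append(text)
        # any other number: the text is discarded

    def undivert(self, *numbers):
        # No arguments: all diversions in numeric order; otherwise the
        # order given. The text is appended to the current stream as is
        # (not rescanned), and the diversion is emptied.
        for n in numbers or range(1, 10):
            if n in self.saved:
                text = "".join(self.saved[n])
                self.saved[n] = []
                self.emit(text)

    def finish(self):
        # At the end of processing, remaining diversions are output
        # in numeric order.
        self.divert(0)
        self.undivert()
        return "".join(self.output)
```
.P2
In this model, diverting to \-1 and back,
as in Weythman's trick for discarding unwanted newlines,
simply throws away everything emitted in between.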
.SH
System Command
.PP
You can run any program in the local operating system
with the
.UL syscmd
built-in.
For example,
.P1
syscmd(date)
.P2
on
.UC UNIX
runs the
.UL date
command.
Normally
.UL syscmd
would be used to create a file
for a subsequent
.UL include .
.PP
To facilitate making unique file names, the built-in
.UL maketemp
is provided, with specifications identical to the system function
.ul
mktemp:
a string of XXXXX in the argument is replaced
by the process id of the current process.
.SH
Conditionals
.PP
There is a built-in called
.UL ifelse
which enables you to perform arbitrary conditional testing.
In the simplest form,
.P1
ifelse(a, b, c, d)
.P2
compares the two strings
.UL a
and
.UL b .
If these are identical,
.UL ifelse
returns
the string
.UL c ;
otherwise it returns
.UL d .
Thus we might define a macro called
.UL compare
which compares two strings and returns ``yes'' if they are the same
and ``no'' if they are different.
.P1
define(compare, `ifelse($1, $2, yes, no)')
.P2
Note the quotes,
which prevent too-early evaluation of
.UL ifelse .
.PP
If the fourth argument is missing, it is treated as empty.
.PP
.UL ifelse
can actually have any number of arguments,
and thus provides a limited form of multi-way decision capability.
In the input
.P1
ifelse(a, b, c, d, e, f, g)
.P2
if the string
.UL a
matches the string
.UL b ,
the result is
.UL c .
Otherwise, if
.UL d
is the same as
.UL e ,
the result is
.UL f .
Otherwise the result is
.UL g .
If the final argument
is omitted, the result is null,
so
.P1
ifelse(a, b, c)
.P2
is
.UL c
if
.UL a
matches
.UL b ,
and null otherwise.
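.PP
The argument-scanning rule for
.UL ifelse
can be stated as a short Python function.
This is a hedged model, not M4's implementation;
the function name is hypothetical,
it assumes the arguments have already been collected and evaluated,
and returning the null string when exactly two arguments remain
is an assumption the text does not settle.
.P1
```python
def ifelse(*args):
    """Hypothetical model of ifelse(a, b, c, d, e, f, g, ...).

    Arguments are examined in groups of three: if the first two strings
    are identical, the result is the third; otherwise the rule is applied
    again to the remaining arguments. A single leftover argument is the
    default; if there is none (or, by assumption, two), the result is null.
    """
    while len(args) >= 3:
        a, b, c = args[:3]
        if a == b:
            return c
        args = args[3:]
    return args[0] if len(args) == 1 else ""
```
.P2
The
.UL compare
macro above corresponds to calling this function with its two
arguments followed by
.UL yes
and
.UL no .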
.SH
String Manipulation
.PP
The built-in
.UL len
returns the length of the string that makes up its argument.
Thus
.P1
len(abcdef)
.P2
is 6, and
.UL len((a,b))
is 5.
.PP
The built-in
.UL substr
can be used to produce substrings of strings.
.UL substr(s,\ i,\ n)
returns the substring of
.UL s
that starts at the
.UL i th
position
(origin zero),
and is
.UL n
characters long.
If
.UL n
is omitted, the rest of the string is returned,
so
.P1
substr(`now is the time', 1)
.P2
is
.P1
ow is the time
.P2
If
.UL i
or
.UL n
is out of range, various sensible things happen.
.PP
.UL index(s1,\ s2)
returns the index (position) in
.UL s1
where the string
.UL s2
occurs, or \-1
if it doesn't occur.
As with
.UL substr ,
the origin for strings is 0.
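.PP
Python's own string operations express
.UL len ,
.UL substr
and
.UL index
directly, since they also count from zero.
The sketch below is an illustration rather than M4's code;
the function names are hypothetical,
and clamping out-of-range values is only one guess at the
``sensible things'' mentioned above.
.P1
```python
def m4_len(s):
    """Length of the argument string."""
    return len(s)


def m4_substr(s, i, n=None):
    """n characters of s starting at position i (origin zero); the rest
    of s if n is omitted. Out-of-range values are clamped (an assumption)."""
    i = max(i, 0)
    if n is None:
        return s[i:]
    return s[i:i + max(n, 0)]


def m4_index(s1, s2):
    """Position (origin zero) where s2 occurs in s1, or -1 if it does not."""
    return s1.find(s2)
```
.P2
Note that the argument of
.UL len((a,b))
is the five-character string
.UL (a,b) ,
because the parentheses protect the comma.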
.PP
The built-in
.UL translit
performs character transliteration.
.P1
translit(s, f, t)
.P2
modifies
.UL s
by replacing any character found in
.UL f
by the corresponding character of
.UL t .
That is,
.P1
translit(s, aeiou, 12345)
.P2
replaces the vowels by the corresponding digits.
If
.UL t
is shorter than
.UL f ,
characters which don't have an entry in
.UL t
are deleted; as a limiting case,
if
.UL t
is not present at all,
characters from
.UL f
are deleted from
.UL s .
So
.P1
translit(s, aeiou)
.P2
deletes vowels from
.UL s .
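.PP
A character-by-character model of
.UL translit
follows.
It is a sketch, not M4's code;
the function name is hypothetical,
and the rule that a character listed twice in
.UL f
uses its first position is an assumption.
.P1
```python
def translit(s, f, t=""):
    """Hypothetical model: replace each character of s found in f by the
    character at the same position in t; characters of f with no
    counterpart in t are deleted."""
    out = []
    for c in s:
        k = f.find(c)          # first occurrence in f (an assumption)
        if k < 0:
            out.append(c)      # not in f: copied unchanged
        elif k < len(t):
            out.append(t[k])   # mapped to the corresponding character of t
        # else: no entry in t, so the character is dropped
    return "".join(out)
```
.P2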
.PP
There is also a built-in called
.UL dnl
which deletes all characters that follow it up to
and including the next newline;
it is useful mainly for throwing away
empty lines that otherwise tend to clutter up M4 output.
For example, if you say
.P1
define(N, 100)
define(M, 200)
define(L, 300)
.P2
the newline at the end of each line is not part of the definition,
so it is copied into the output, where it may not be wanted.
If you add
.UL dnl
to each of these lines, the newlines will disappear.
.PP
Another way to achieve this, due to J. E. Weythman,
is
.P1
divert(-1)
 define(...)
 ...
divert
.P2
.SH
Printing
.PP
The built-in
.UL errprint
writes its arguments out on the standard error file.
Thus you can say
.P1
errprint(`fatal error')
.P2
.PP
.UL dumpdef
is a debugging aid which
dumps the current definitions of defined terms.
If there are no arguments, you get everything;
otherwise you get the ones you name as arguments.
Don't forget to quote the names!
.SH
Summary of Built-ins
.PP
Each entry is preceded by the
page number where it is described.
.DS
.tr '\'`\`
.ta .25i
3 changequote(L, R)
1 define(name, replacement)
4 divert(number)
4 divnum
5 dnl
5 dumpdef(`name', `name', ...)
5 errprint(s, s, ...)
4 eval(numeric expression)
3 ifdef(`name', this if true, this if false)
5 ifelse(a, b, c, d)
4 include(file)
3 incr(number)
5 index(s1, s2)
5 len(string)
4 maketemp(...XXXXX...)
4 sinclude(file)
5 substr(string, position, number)
4 syscmd(s)
5 translit(str, from, to)
3 undefine(`name')
4 undivert(number,number,...)
.DE
.SH
Acknowledgements
.PP
We are indebted to Rick Becker, John Chambers,
Doug McIlroy,
and especially Jim Weythman,
whose pioneering use of M4 has led to several valuable improvements.
We are also deeply grateful to Weythman for several substantial contributions
to the code.
.SG
.SH
References
.LP
.IP [1]
B. W. Kernighan and P. J. Plauger,
.ul
Software Tools,
Addison-Wesley, Inc., 1976.