.if n .ta 5 10 15 20 25 30 35 40 45 50 55 60
.if t .ta .4i .8i 1.2i 1.6i 2i 2.4i 2.8i 3.2i 3.6i 4i 4.4i 4.8i 5.2i 5.6i
.if t .tr -\(mi|\(bv'\(fm^\(no*\(**
. \"2=not last lines; 4= no -xx; 8=no xx-
. \"special chars in programs
.....TM 77-1273-6 39199 39199-11
M4 is a macro processor available on UNIX and GCOS.
Its primary use has been as a
front end for Ratfor for those
cases where parameterless macros
are not adequately powerful.
It has also been used for languages as disparate as C and Cobol.
M4 is particularly suited for functional languages like Fortran, PL/I and C
since macros are specified in a functional notation.
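M4 provides features seldom found even in much larger
macro processors, including
arguments,
condition testing,
arithmetic capabilities,
string and substring functions,
and file manipulation.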
This paper is a user's manual for M4.
A macro processor is a useful way to enhance a programming language,
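to make it more palatable or more readable,
or to tailor it to a particular application.
The #define statement in C and the analogous define in Ratfor
are examples of the basic facility provided by
any macro processor:
replacement of text by other text.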
The M4 macro processor is an extension of a macro processor called M3
which was written by D. M. Ritchie
for the AP-3 minicomputer;
M3 was in turn based on a macro processor implemented for [1].
Readers unfamiliar with the basic ideas of macro processing
may wish to read some of the discussion there.
M4 is a suitable front end for Ratfor and C,
and has also been used successfully with Cobol.
Besides the straightforward replacement of one string of text by another,
it provides
macros with arguments,
conditional macro expansion,
arithmetic,
file manipulation,
and some specialized string processing functions.
The basic operation of M4
is to copy its input to its output.
As the input is read, however, each alphanumeric ``token''
(that is, string of letters and digits) is checked.
If it is the name of a macro,
then the name of the macro is replaced by its defining text,
and the resulting string is pushed back onto the
input to be rescanned.
Macros may be called with arguments, in which case the arguments are collected
and substituted into the right places in the defining text
before it is rescanned.
M4 provides a collection of about twenty built-in macros
which perform various useful operations;
in addition, the user can define new macros.
Built-ins and user-defined macros work exactly the same way, except that
some of the built-in macros have side effects
on the state of the process.
On UNIX, use
m4 [files]
Each argument file is processed in order;
if there are no arguments, or if an argument
is `-',
the standard input is read at that point.
The processed text is written on the standard output,
which may be captured for subsequent processing with
m4 [files] >outputfile
On GCOS, usage is identical, but the program is called ./m4.
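The primary built-in function of M4
is define,
which is used to define new macros.
The input
define(name, replacement)
causes the string name to be defined as the replacement text.
All subsequent occurrences of
name will be replaced by that text.
The name
must be alphanumeric and must begin with a letter
(the underscore \(ul counts as a letter).
The replacement
is any text that contains balanced parentheses;
it may stretch over multiple lines.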
Thus, as a typical example,
define(N, 100)
 ...
if (i > N)
defines N
to be 100, and uses this ``symbolic constant'' in a later
if statement.
The left parenthesis must immediately follow the word
define,
to signal that define has arguments.
If a macro or built-in name is not followed immediately by `(',
it is assumed to have no arguments.
This is the situation for N above;
it is actually a macro with no arguments,
and thus when it is used there need be no (...) following it.
You should also notice that a macro name is only recognized as such
if it appears surrounded by non-alphanumerics.
For example, in
define(N, 100)
 ...
if (NNN > 100)
the variable NNN
is absolutely unrelated to the defined macro N,
even though it contains a lot of N's.
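Things may be defined in terms of other things.
For example,
define(N, 100)
define(M, N)
defines both M and N to be 100.
What happens if N is redefined?
Or, to say it another way, is
M defined as N or as 100?
In M4, the latter is true:
M is 100, so even if N subsequently changes, M does not.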
This behavior arises because
M4 expands macro names into their defining text as soon as it possibly can.
Here, that means that when the string N
is seen as the arguments of define
are being collected, it is immediately replaced by 100;
it's just as if you had said
define(M, 100)
in the first place.
If this isn't what you really want, there are two ways out of it.
The first, which is specific to this situation,
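is to interchange the order of the definitions:
define(M, N)
define(N, 100)
Now M
is defined to be the string N,
so when you ask for M
later, you'll always get the value of N
at that time
(because the M will be replaced by N,
which will be replaced by 100).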
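The more general solution is to delay the expansion of
the arguments of define by quoting them.
Any text surrounded by the single quotes \(ga and \(aa
is not expanded immediately, but has the quotes stripped off.
If you say
define(N, 100)
define(M, `N')
the quotes around the N
are stripped off as the argument is being collected,
but they have served their purpose, and
M is defined as the string N, not 100.
The general rule is that M4 always strips off
one level of single quotes whenever it evaluates
something.
This is true even outside of macros.
If you want the word define to appear in the output,
you have to quote it in the input,
as in
`define' = 1;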
As another instance of the same thing, which is a bit more surprising,
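consider redefining N:
define(N, 100)
 ...
define(N, 200)
Perhaps regrettably, the N
in the second definition is
evaluated as soon as it's seen;
that is, it is replaced by
100, so it's as if you had written
define(100, 200)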
This statement is ignored by M4, since you can only define things that look
like names, but it obviously doesn't have the effect you wanted.
To really redefine N,
you must delay the evaluation by quoting:
define(N, 100)
 ...
define(`N', 200)
In M4,
it is often wise to quote the first argument of a macro.
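The evaluation order just described can be tried out on a small model.
What follows is a minimal Python sketch, not M4's actual implementation:
the names scan, read_quoted and m4 are made up for the illustration,
define is the only built-in modeled, user macros take no arguments,
and quotes and parentheses are assumed to be balanced.

```python
import re

TOKEN = re.compile(r"[A-Za-z0-9_]+")          # a string of letters and digits
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # a name must begin with a letter

def read_quoted(buf):
    """buf starts with `; return (text minus one quote level, rest of buf)."""
    depth, i = 1, 1
    while depth:                      # assumes the quotes are balanced
        if buf[i] == "`":
            depth += 1
        elif buf[i] == "'":
            depth -= 1
        i += 1
    return buf[1:i - 1], buf[i:]

def scan(buf, macros, in_call=False):
    """At top level, return (output, ""). Inside a call, where buf begins
    just after the opening parenthesis, return (arguments, rest of buf)."""
    args, cur, depth = [], "", 0
    if in_call:
        buf = buf.lstrip()            # leading white space is discarded
    while buf:
        m = TOKEN.match(buf)
        if buf[0] == "`":
            text, buf = read_quoted(buf)
            cur += text               # quoted text is not expanded
        elif m:
            word, buf = m.group(), buf[m.end():]
            if word == "define" and buf.startswith("("):
                call, buf = scan(buf[1:], macros, in_call=True)
                if len(call) >= 2 and NAME.fullmatch(call[0]):
                    macros[call[0]] = call[1]
                # anything else, such as define(100, 200), is ignored
            elif word in macros:
                buf = macros[word] + buf   # push back to be rescanned
            else:
                cur += word
        else:
            ch, buf = buf[0], buf[1:]
            if in_call and depth == 0 and ch == ")":
                return args + [cur], buf
            if in_call and depth == 0 and ch == ",":
                args.append(cur)
                cur, buf = "", buf.lstrip()
                continue
            if in_call:
                depth += (ch == "(") - (ch == ")")
            cur += ch
    return cur, ""

def m4(text):
    return scan(text, {})[0]

print(m4("define(N, 100)define(M, N)define(`N', 200)M N"))    # 100 200
print(m4("define(N, 100)define(M, `N')define(`N', 200)M N"))  # 200 200
print(m4("define(N, 100)define(N, 200)N"))                    # 100
```

In the first line M captured 100 when it was defined;
in the second the quotes delayed the expansion, so M follows N;
in the third the unquoted redefinition is ignored and N stays 100.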
If \` and \' are not convenient for some reason,
the quote characters can be changed with the built-in changequote:
changequote([, ])
makes the new quote characters the left and right brackets.
You can restore the original characters with just
changequote
There are two additional built-ins related to define.
The built-in undefine
removes the definition of some macro or built-in:
undefine(`N')
removes the definition of N.
(Why are the quotes absolutely necessary?)
Built-ins can be removed with undefine, as in
undefine(`define')
but once you remove one, you can never get it back.
The built-in ifdef
provides a way to determine if a macro is currently defined.
In particular, M4 has pre-defined the names unix and gcos
on the corresponding systems, so you can
tell which one you're using:
ifdef(`unix', `define(wordsize,16)' )
ifdef(`gcos', `define(wordsize,36)' )
makes a definition appropriate for the particular machine.
The built-in ifdef
actually permits three arguments;
if the name is undefined, the value of ifdef
is then the third argument, as in
ifdef(`unix', on UNIX, not on UNIX)
So far we have discussed the simplest form of macro processing:
replacing one string by another (fixed) string.
User-defined macros may also have arguments, so different invocations
can have different results.
Within the replacement text for a macro
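(the second argument of its define)
any occurrence of $n
will be replaced by the nth argument when the macro
is actually used.
Thus, the macro bump, defined as
define(bump, $1 = $1 + 1)
generates code to increment its argument by 1:
bump(x)
is
x = x + 1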
A macro can have as many arguments as you want,
but only the first nine are accessible,
through $1 to $9.
(The macro name itself is $0,
although that is less commonly used.)
Arguments that are not supplied are replaced by null strings,
so we can define a macro cat
which simply concatenates its arguments, like this:
define(cat, $1$2$3$4$5$6$7$8$9)
Thus
cat(x, y, z)
is equivalent to
xyz
Here $4 through $9
are null, since no corresponding arguments were provided.
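The substitution rule for $0 through $9 is easy to model.
This is a small Python sketch under the rules stated above;
the helper name substitute is made up for the illustration and is not part of M4.

```python
import re

def substitute(body, name, args):
    """Replace $0..$9 in body: $0 is the macro name,
    and arguments that were not supplied become null strings."""
    def repl(m):
        n = int(m.group(1))
        if n == 0:
            return name
        return args[n - 1] if n <= len(args) else ""
    return re.sub(r"\$([0-9])", repl, body)

print(substitute("$1 = $1 + 1", "bump", ["x"]))                  # x = x + 1
print(substitute("$1$2$3$4$5$6$7$8$9", "cat", ["x", "y", "z"]))  # xyz
```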
Leading unquoted blanks, tabs, or newlines that occur during argument collection
are discarded.
All other white space is retained.
Thus
define(a, b c)
defines a to be b c.
Arguments are separated by commas, but parentheses are counted properly,
so a comma ``protected'' by parentheses does not terminate an argument.
That is, in
define(a, (b,c))
there are only two arguments;
the second is literally (b,c).
And of course a bare comma or parenthesis can be inserted by quoting it.
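The rules for collecting arguments can be sketched in a few lines of Python.
The function split_args is a hypothetical helper, not an M4 built-in;
it takes the text between the outer parentheses of a call,
does no macro expansion, and assumes quotes and parentheses are balanced.

```python
def split_args(text):
    """Split on commas that are neither quoted nor inside parentheses."""
    args, cur, depth, quote = [], "", 0, 0
    skipping = True                   # at the start of an argument
    for ch in text:
        if quote:                     # inside `...': copy literally
            if ch == "`":
                quote += 1
            elif ch == "'":
                quote -= 1
            if quote:                 # the outermost closing quote is dropped
                cur += ch
            continue
        if skipping and ch in " \t\n":
            continue                  # leading unquoted white space is discarded
        skipping = False
        if ch == "`":
            quote = 1                 # the outermost opening quote is dropped
        elif ch == "," and depth == 0:
            args.append(cur)
            cur, skipping = "", True
        else:
            depth += (ch == "(") - (ch == ")")
            cur += ch
    return args + [cur]

print(split_args("a, (b,c)"))      # ['a', '(b,c)']
print(split_args("a, b c"))        # ['a', 'b c']
print(split_args("x, `,', `)'"))   # ['x', ',', ')']
```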
M4 provides two built-in functions for doing arithmetic
on integers (only).
The simplest is incr,
which increments its numeric argument by 1.
Thus to handle the common programming situation
where you want a variable to be defined as ``one more than N'',
write
define(N, 100)
define(N1, `incr(N)')
Then N1
is defined as one more than the current value of N.
The more general mechanism for arithmetic is a built-in
called eval,
which is capable of arbitrary arithmetic on integers.
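It provides the operators
(in decreasing order of precedence)
unary + and -
** or ^ (exponentiation)
* / % (modulus)
+ -
== != < <= > >=
! (not)
& or && (logical and)
\(or or \(or\(or (logical or)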
Parentheses may be used to group operations where needed.
All the operands of
an expression given to eval
must ultimately be numeric.
The numeric value of a true relation
(like 1>0)
is 1, and false is 0.
As a simple example, suppose we want M to be 2**N+1.
Then
define(N, 3)
define(M, `eval(2**N+1)')
As a matter of principle, it is advisable
to quote the defining text for a macro
unless it is very simple indeed
(say just a number);
it usually gives the result you want,
and is a good habit to get into.
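You can include a new file in the input at any time by
the built-in function include:
include(filename)
inserts the contents of filename
in place of the include command.
The contents of the file are often a set of definitions.
The value of include
(that is, its replacement text)
is the contents of the file;
this can be captured in definitions, etc.
It is a fatal error if the file named in include
cannot be accessed.
To get some control over this situation, the alternate form sinclude
can be used; sinclude
(``silent include'')
says nothing and continues if it can't access the file.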
It is also possible to divert the output of M4 to temporary files during processing,
and output the collected material upon command.
M4 maintains nine of these diversions, numbered 1 through 9.
If you say
divert(n)
all subsequent output is put onto the end of a temporary file
referred to as n.
Diverting to this file is stopped by another divert command;
in particular, divert or divert(0)
resumes the normal output process.
Diverted text is normally output all at once
at the end of processing,
with the diversions output in numeric order.
It is possible, however, to bring back diversions
at any time,
that is, to append them to the current diversion.
The built-in undivert
brings back all diversions in numeric order, and undivert
with arguments brings back the selected diversions
in the order given.
The act of undiverting discards the diverted stuff,
as does diverting into a diversion
whose number is not between 0 and 9 inclusive.
The value of undivert is not the diverted stuff.
Furthermore, the diverted material is
not rescanned for macros.
The built-in divnum
returns the number of the currently active diversion.
This is zero during normal processing.
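The bookkeeping behind diversions can be modeled as follows.
This is a Python sketch only: the class Diversions and its method names
are made up for the illustration, and in-memory lists stand in for
the temporary files that M4 uses.

```python
class Diversions:
    def __init__(self):
        self.out = []                               # normal output (diversion 0)
        self.held = {n: [] for n in range(1, 10)}   # diversions 1 through 9
        self.divnum = 0                             # currently active diversion

    def divert(self, n=0):
        self.divnum = n

    def write(self, text):
        if self.divnum == 0:
            self.out.append(text)
        elif self.divnum in self.held:
            self.held[self.divnum].append(text)
        # any other number: the text is discarded

    def undivert(self, *numbers):
        """Append the named diversions (default: all, in numeric order)
        to the current diversion, and empty them."""
        for n in numbers or sorted(self.held):
            if n in self.held and n != self.divnum:
                pieces, self.held[n] = self.held[n], []
                for piece in pieces:
                    self.write(piece)

    def finish(self):
        """End of processing: whatever is still diverted comes out in order."""
        self.divert(0)
        self.undivert()
        return "".join(self.out)

d = Diversions()
d.write("a\n")
d.divert(2); d.write("two\n")
d.divert(1); d.write("one\n")
d.divert(-1); d.write("lost\n")
d.divert(); d.write("b\n")
print(d.finish(), end="")   # a, b, one, two on separate lines
```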
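You can run any program in the local operating system
with the syscmd built-in.
For example,
syscmd(date)
on UNIX runs the date command.
Normally syscmd
would be used to create a file
for a subsequent include.
To facilitate making unique file names, the built-in maketemp
is provided, with specifications identical to the system function mktemp: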
a string of XXXXX in the argument is replaced
by the process id of the current process.
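There is a built-in called ifelse
which enables you to perform arbitrary conditional testing.
In the simplest form,
ifelse(a, b, c, d)
compares the two strings a and b.
If these are identical, ifelse
returns the string c;
otherwise it returns d.
Thus we might define a macro called compare
which compares two strings and returns ``yes'' if they are the same
and ``no'' if they are different.
define(compare, `ifelse($1, $2, yes, no)')
Note the quotes,
which prevent too-early evaluation of ifelse.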
If the fourth argument is missing, it is treated as empty.
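The built-in ifelse
can actually have any number of arguments,
and thus provides a limited form of multi-way decision capability.
In the input
ifelse(a, b, c, d, e, f, g)
if the string a matches the string b,
the result is c.
Otherwise, if d is the same as e,
the result is f.
Otherwise the result is g.
If the final argument
is omitted, the result is null,
so
ifelse(a, b, c)
is c if a matches b,
and null otherwise.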
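The multi-way decision can be written out as a short loop.
This Python sketch assumes the arguments have already been collected
and expanded; how leftover arguments are treated when the count
fits neither pattern is an assumption of the sketch, not something stated here.

```python
def ifelse(*args):
    """Compare args pairwise: (a, b, c, d, e, f, g) gives c if a == b,
    else f if d == e, else g. A missing final argument gives ""."""
    while len(args) >= 3:
        if args[0] == args[1]:
            return args[2]
        if len(args) == 4:
            return args[3]        # the final "otherwise" value
        args = args[3:]           # move on to the next pair
    return ""                     # nothing matched and no final argument

def compare(x, y):
    return ifelse(x, y, "yes", "no")

print(compare("abc", "abc"))                        # yes
print(compare("abc", "abd"))                        # no
print(ifelse("a", "b", "c", "d", "d", "f", "g"))    # f
```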
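The built-in len
returns the length of the string that makes up its argument.
Thus
len(abcdef)
is 6, and
len((a,b))
is 5.
The built-in substr
can be used to produce substrings of strings.
substr(s, i, n)
returns the substring of s
that starts at the ith position
(origin zero),
and is n characters long.
If n
is omitted, the rest of the string is returned,
so
substr(`now is the time', 1)
is
ow is the time
If i or n
are out of range, various sensible things happen.
index(s1, s2)
returns the index (position) in s1
where the string s2 occurs, or -1
if it doesn't occur.
As with substr,
the origin for strings is 0.
The built-in translit
performs character transliteration.
translit(s, f, t)
modifies s
by replacing any character found in f
by the corresponding character of t.
That is,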
translit(s, aeiou, 12345)
replaces the vowels by the corresponding digits.
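If the third argument is shorter than the second,
characters which don't have an entry in
the third argument
are deleted; as a limiting case,
if the third argument is not present at all,
characters from the second argument are deleted from s.
So
translit(s, aeiou)
deletes vowels from s.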
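The string built-ins correspond closely to ordinary string operations.
Here is a Python sketch of the behavior described above;
the function names mirror the built-ins for readability only,
and the handling of out-of-range positions is simply whatever
Python slicing does, which is an assumption rather than M4's rule.

```python
def substr(s, i, n=None):
    """Substring of s starting at position i (origin 0), n characters long;
    if n is omitted, the rest of the string."""
    return s[i:] if n is None else s[i:i + n]

def index(s1, s2):
    """Position in s1 where s2 occurs, or -1 if it doesn't occur."""
    return s1.find(s2)

def translit(s, f, t=""):
    """Replace each character of s found in f by the corresponding
    character of t; characters with no entry in t are deleted."""
    table = {}
    for k, ch in enumerate(f):
        table.setdefault(ch, t[k] if k < len(t) else "")
    return "".join(table.get(ch, ch) for ch in s)

s = "now is the time"
print(len("abcdef"), len("(a,b)"))     # 6 5
print(substr(s, 1))                    # ow is the time
print(index(s, "is"))                  # 4
print(translit(s, "aeiou", "12345"))   # n4w 3s th2 t3m2
print(translit(s, "aeiou"))            # nw s th tm
```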
There is also a built-in called dnl
which deletes all characters that follow it up to
and including the next newline;
it is useful mainly for throwing away
empty lines that otherwise tend to clutter up M4 output.
For example, if you say
define(N, 100)
define(M, 200)
define(L, 300)
the newline at the end of each line is not part of the definition,
so it is copied into the output, where it may not be wanted.
If you add dnl
to each of these lines, the newlines will disappear.
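The effect of dnl on the text can be approximated with a single substitution.
This Python sketch uses a made-up helper, strip_dnl;
it ignores quoting and does no macro expansion,
so it only shows which characters are thrown away.

```python
import re

def strip_dnl(text):
    """Delete each dnl token and everything after it,
    up to and including the next newline."""
    return re.sub(r"\bdnl\b[^\n]*\n?", "", text)

src = "define(N, 100)dnl\ndefine(M, 200)dnl\nN M\n"
print(repr(strip_dnl(src)))   # 'define(N, 100)define(M, 200)N M\n'
```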
Another way to achieve this, due to J. E. Weythman,
is
divert(-1)
define(...)
 ...
divert
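The built-in errprint
writes its arguments out on the standard error file.
Thus you can say
errprint(`fatal error')
The built-in dumpdef
is a debugging aid which
dumps the current definitions of defined terms.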
If there are no arguments, you get everything;
otherwise you get the ones you name as arguments.
Don't forget to quote the names!
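Summary of built-ins:
changequote(L, R)
define(name, replacement)
divert(number)
divnum
dnl
dumpdef(`name', `name', ...)
errprint(s, s, ...)
eval(numeric expression)
ifdef(`name', this if true, this if false)
ifelse(a, b, c, d)
include(file)
incr(number)
index(s1, s2)
len(string)
maketemp(...XXXXX...)
sinclude(file)
substr(string, position, number)
syscmd(s)
translit(str, from, to)
undefine(`name')
undivert(number,number,...)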
We are indebted to Rick Becker, John Chambers,
and especially Jim Weythman,
whose pioneering use of M4 has led to several valuable improvements.
We are also deeply grateful to Weythman for several substantial contributions
to the code.
[1]
B. W. Kernighan and P. J. Plauger,
Software Tools,
Addison-Wesley, Inc., 1976.