.if !\n(xx .so tmac.e
.SH
Substitute replacement patterns
.PP
There are several metacharacters which may be used in substitute
replacement patterns.
As is the case for the regular expression metacharacters,
there are fewer replacement pattern metacharacters if
.I nomagic
is set.
This is discussed more below.
In fact,
with
.I nomagic
the only replacement pattern metacharacter is the escaping
`\e' (this is the default for \fIedit\fR).
.PP
The basic metacharacters for the replacement pattern are
`&' and `~'.
These are given as `\e&' and `\e~' when
.I nomagic
is set.
The metacharacter `&' is by far the most important of these.
Each instance of this metacharacter is replaced by the characters
which the regular expression matched.
Thus the substitute command
.DS
\fBsubstitute\fR/some/& other/
.DE
will replace the string `some' with the string `some other'
the first time it occurs on the current line.
The metacharacter `~' stands, in the replacement pattern,
as it does in a regular expression,
for the defining text of the previous replacement pattern.
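.PP
As a brief illustration (the words are arbitrary, and
.I magic
is assumed to be set),
suppose the following two commands are given in turn:
.DS
\fBsubstitute\fR/some/any/
\fBsubstitute\fR/other/~ such/
.DE
In the second command `~' stands for `any',
the replacement text of the first,
so that `other' would be replaced by `any such'.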
.PP
Other metasequences are possible in the replacement pattern,
and are always introduced by the escaping character `\e'.
The sequence `\e\fIn\fR' is replaced by the text matched
by the \fIn\fR-th regular subexpression enclosed between
`\e(' and `\e)'.\u\s-2\(dg\s0\d
.FS
\(dg When nested parenthesized subexpressions are present,
\fIn\fR is determined by counting occurrences of `\e(' starting from the left.
.FE
The metasequences
`\eu', `\el', `\eU', `\eL', `\eE', and `\ee'
are used to perform systematic case conversion of letters.
The sequences `\eu' and `\el' cause the immediately following character in
the replacement to be converted to upper- or lower-case respectively
if this character is a letter.
The sequences `\eU' and `\eL' turn such conversion on, either until
`\eE' or `\ee' is encountered, or until the end of the replacement pattern.
By bracketing selected portions of a regular expression with `\e('
and `\e)' and using `\eU' or `\eL' it is possible to systematically
capitalize entire words or phrases.
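.PP
As a minimal sketch, assuming a line that contains the words `hello world'
(an arbitrary example) and that
.I magic
is set, the command
.DS
\fBsubstitute\fR/\e(hello\e) \e(world\e)/\e2 \eu\e1/
.DE
would be expected to produce `world Hello':
`\e2' and `\e1' bring back the two bracketed pieces in reverse order,
and `\eu' converts only the first letter of the text that follows it.
Similarly,
.DS
\fBsubstitute\fR/\e(hello\e) world/\eU\e1\eE world/
.DE
should leave `HELLO world',
since `\eE' ends the conversion started by `\eU'.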
.SH
Regular expressions
.PP
.I Ex
supports a form of regular expression notation.
A regular expression specifies a set of strings of characters.
A member of this set of strings is said to be
.I matched
by the regular expression.
Regular expressions may be used in locating or selecting lines
by their content,
in
.I open
and
.I visual
modes to position the cursor within the file,
and in the
.I substitute
command to select the portion of a line to be substituted.
.PP
.I Ex
remembers two previous regular expressions:
the previous regular expression used in a
.I substitute
command
and the previous regular expression used elsewhere
(referred to as the previous \fIscanning\fR regular expression).
The previous regular expression
can always be referred to by a null \fIre\fR, e.g. `//' or `??'.
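.PP
As a sketch of how a null regular expression might be used
(the words here are arbitrary), the pair of commands
.DS
/some/
\fBsubstitute\fR//other/
.DE
first scans for the next line containing `some'
and then replaces `some' on that line with `other',
the empty pattern `//' standing for the regular expression just used.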
.SH
Magic and nomagic
.PP
The regular expressions allowed by
.I ex
are constructed in one of two ways depending on the setting of
the
.I magic
option.
The
.I ex
default setting of
.I magic
gives quick access to a powerful set of regular expression
metacharacters.
The disadvantage of
.I magic
is that the user must remember that these metacharacters are
.I magic
and precede them with the character `\e'
to use them as ``ordinary'' characters.
With
.I nomagic ,
the default for
.I edit ,
regular expressions are much simpler,
there being only two metacharacters (`\(ua' and `$') apart from the escaping `\e'.
The power of the other metacharacters is still available by preceding
the (now) ordinary character with a `\e'.
Note that `\e' is thus always a metacharacter.
.PP
The remainder of the discussion of regular expressions assumes
that the setting of this option is
.I magic .
To discern what is true with
.I nomagic
it suffices to remember that the only
special characters in this case will be `\(ua' at the beginning
of a regular expression,
`$' at the end of a regular expression,
and `\e'.\u\s-2\(dd\s0\d
.FS
\(dd With
.I nomagic
the characters `\s+2~\s0' and `&' also lose their special meanings
related to the replacement pattern of a substitute.
.FE
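.PP
To illustrate the difference (the text `a.b' is an arbitrary example),
with
.I magic
set a search for the literal string `a.b' has to be written
.DS
/a\e.b/
.DE
since an unescaped `.' would match any character.
With
.I nomagic
the same search is simply
.DS
/a.b/
.DE
and it is now in `a\e.b' that the period matches any single character.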
.SH
Basic regular expression summary
.PP
The following basic constructs are used to construct
.I magic
mode regular expressions.
.TS H
allbox center tab(@);
c s
c l
c aw(65).
Basic regular expression forms
Form@Meaning
.TH
char@T{
An ordinary character which matches itself.
The character `\(ua' (`^') at the beginning of a line,
`$' at the end of line,
`*' as any character other than the first,
`.', `\e', `[', and `\s+2~\s0' are not ordinary characters and
must be escaped (preceded) by `\e' to be treated as such.
T}
\(ua@T{
Up-arrow (or circumflex `^') at the beginning of a pattern
forces the match to succeed only at the beginning of a line.
T}
$@T{
At the end of a regular expression forces the match to
succeed only at the end of the line.
T}
\&\fB.\fR@T{
A period character matches any single character except
the new-line character.
T}
\e<@T{
This sequence in a regular expression forces the match
to occur only at the beginning of a ``variable'' or ``word'';
that is, either at the beginning of a line, or just before
a letter, digit, or underline and after a character not one of
these.
T}
\e>@T{
Similar to `\e<', but matching the end of a ``variable''
or ``word'', i.e. either the end of the line or before a character
which is neither a letter, nor a digit, nor the underline character.
T}
[\fIstring\fR]@T{
A string of characters enclosed in square brackets
matches any (single) character in the class defined by
.I string .
Most characters in
.I string
define themselves.
A pair of characters separated by `\-' in
.I string
defines the set of characters collating between the specified lower and upper
bounds, thus `[a\-z]' as a regular expression matches
any (single) lower-case letter.
If the first character of
.I string
is an `\(ua' or `^' then the construct
matches those characters which it otherwise would not;
thus `[^a\-z]' matches anything but a lower-case letter (and of course a
newline).
To place any of the characters
`\(ua', `^', `[', or `\-' in
.I string
you may escape them by preceding them with a `\e'.
T}
.TE
.PP
More complicated regular expressions are built by putting these simple pieces
together.
The concatenation of two regular expressions matches the leftmost and then longest string
which can be divided with the first piece matching the first regular
expression and the second piece matching the second.
Thus the regular expression `\fB..\fRe' will match any three characters
ending in the character `e',
while `^[aeiou]' matches any lower-case vowel which appears at the beginning of a line.
.PP
Any of the (single character matching) regular expressions mentioned
above may be followed by the character `*' to form a regular expression
which matches any number of adjacent occurrences (including 0) of characters
matched by the regular expression it follows.
The character `\s+2~\s0' may be used in a regular expression,
and matches the text which defined the replacement part
of the last
.I substitute
command.
A regular expression may be enclosed between the sequences
`\e(' and `\e)' with side effects in the
.I substitute
command,
and an escaped digit, e.g. `\e1',
matches the text which was matched by the corresponding previous
`\e(' and `\e)' bracketed expression,
numbered in order of occurrence of the `\e(' delimiters.
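.PP
A few patterns built from these pieces may make this concrete;
they are illustrative only and assume that
.I magic
is set.
.DS
/[0\-9][0\-9]*/
/\e<the\e>/
/\e(ab\e)\e1/
.DE
The first matches one or more adjacent digits,
the second matches `the' only where it stands as a whole word,
and the third matches `abab',
since `\e1' must match the same text that the bracketed `ab' matched.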
.bp