This commit was manufactured by cvs2svn to create tag 'FreeBSD-release/1.0'.
.Go 4 "REGULAR EXPRESSIONS"

.PP
\*E uses regular expressions for searching and substitutions.
A regular expression is a text string in which some characters have
special meanings.
This is much more powerful than simple text matching.
.SH
Syntax
.PP
\*E's regexp package treats the following one- or two-character
strings (called metacharacters) in special ways:
.IP "\e(\fIsubexpression\fP\e)" 0.8i
The \e( and \e) metacharacters are used to delimit subexpressions.
When the regular expression matches a particular chunk of text,
\*E will remember which portion of that chunk matched the \fIsubexpression\fP.
The :s/regexp/newtext/ command makes use of this feature.
.IP "^" 0.8i
The ^ metacharacter matches the beginning of a line.
If, for example, you wanted to find "foo" at the beginning of a line,
you would use a regular expression such as /^foo/.
Note that ^ is only a metacharacter if it occurs
at the beginning of a regular expression;
anyplace else, it is treated as a normal character.
.IP "$" 0.8i
The $ metacharacter matches the end of a line.
It is only a metacharacter when it occurs at the end of a regular expression;
elsewhere, it is treated as a normal character.
For example, the regular expression /$$/ will search for a dollar sign at
the end of a line.
.IP "\e<" 0.8i
The \e< metacharacter matches a zero-length string at the beginning of
a word.
A word is considered to be a string of 1 or more letters and digits.
A word can begin at the beginning of a line
or after 1 or more non-alphanumeric characters.
.IP "\e>" 0.8i
The \e> metacharacter matches a zero-length string at the end of a word.
A word can end at the end of the line
or before 1 or more non-alphanumeric characters.
For example, /\e<end\e>/ would find any instance of the word "end",
but would ignore any instances of e-n-d inside another word
such as "calendar".
.IP "\&." 0.8i
The . metacharacter matches any single character.
.IP "[\fIcharacter-list\fP]" 0.8i
This matches any single character from the \fIcharacter-list\fP.
Inside the \fIcharacter-list\fP, you can denote a span of characters
by writing only the first and last characters, with a hyphen between
them.
If the \fIcharacter-list\fP is preceded by a ^ character, then the
list is inverted -- it will match any character that \fIisn't\fP mentioned
in the list.
For example, /[a-zA-Z]/ matches any letter, and /[^ ]/ matches anything
other than a blank.
.IP "\e{\fIn\fP\e}" 0.8i
This is a closure operator,
which means that it can only be placed after something that matches a
single character.
It controls the number of times that the single-character expression
should be repeated.
.IP "" 0.8i
The \e{\fIn\fP\e} operator, in particular, means that the preceding
expression should be repeated exactly \fIn\fP times.
For example, /^-\e{80\e}$/ matches a line of eighty hyphens, and
/\e<[a-zA-Z]\e{4\e}\e>/ matches any four-letter word.
.IP "\e{\fIn\fP,\fIm\fP\e}" 0.8i
This is a closure operator which means that the preceding single-character
expression should be repeated between \fIn\fP and \fIm\fP times, inclusive.
If the \fIm\fP is omitted (but the comma is present) then \fIm\fP is
taken to be infinity.
For example, /"[^"]\e{3,5\e}"/ matches any pair of quotes which contains
three, four, or five non-quote characters.
.IP "*" 0.8i
The * metacharacter is a closure operator which means that the preceding
single-character expression can be repeated zero or more times.
It is equivalent to \e{0,\e}.
For example, /.*/ matches a whole line.
.IP "\e+" 0.8i
The \e+ metacharacter is a closure operator which means that the preceding
single-character expression can be repeated one or more times.
It is equivalent to \e{1,\e}.
For example, /.\e+/ matches a whole line, but only if the line contains
at least one character.
It doesn't match empty lines.
.IP "\e?" 0.8i
The \e? metacharacter is a closure operator which indicates that the
preceding single-character expression is optional -- that is, that it
can occur 0 or 1 times.
It is equivalent to \e{0,1\e}.
For example, /no[ -]\e?one/ matches "no one", "no-one", or "noone".
.PP
Anything else is treated as a normal character which must exactly match
a character from the scanned text.
The special strings may all be preceded by a backslash to
force them to be treated normally.
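.PP
As a rough cross-check of the notation above, the sketch below restates a few
of the example patterns using Python's re module.
The right-hand spellings are assumptions of this sketch, not part of \*E:
Python writes the closures as {n,m} and ? without a backslash,
uses \eb for both word anchors,
and also counts the underscore as a word character,
so results can differ at the edges.
TRANSLATIONS and find are hypothetical names.

```python
import re

# Editor pattern (as typed after "/") -> assumed Python `re` spelling.
# The right-hand column is this sketch's translation, not part of the editor.
TRANSLATIONS = {
    r"\<end\>": r"\bend\b",
    r"^-\{80\}$": r"^-{80}$",
    r"\<[a-zA-Z]\{4\}\>": r"\b[a-zA-Z]{4}\b",
    r'"[^"]\{3,5\}"': r'"[^"]{3,5}"',
    r"no[ -]\?one": r"no[ -]?one",
}

def find(editor_pattern, line):
    """Return the first text matched in `line`, or None if nothing matches."""
    m = re.search(TRANSLATIONS[editor_pattern], line)
    return m.group(0) if m else None

print(find(r"\<end\>", "the end is near"))          # end
print(find(r"\<end\>", "mark the calendar"))        # None
print(find(r"no[ -]\?one", "there was noone home")) # noone
```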
.SH
Substitutions
.PP
The :s command has at least two arguments: a regular expression,
and a substitution string.
The text that matched the regular expression is replaced by text
which is derived from the substitution string.
.br
.ne 15 \" so we don't mess up the table
.PP
Most characters in the substitution string are copied into the
text literally but a few have special meaning:
.LD
.ta 0.75i 1.3i
	&	Insert a copy of the original text
	~	Insert a copy of the previous replacement text
	\e1	Insert a copy of that portion of the original text which
		matched the first set of \e( \e) parentheses
	\e2-\e9	Do the same for the second (etc.) pair of \e( \e)
	\eU	Convert all chars of any later & or \e# to uppercase
	\eL	Convert all chars of any later & or \e# to lowercase
	\eE	End the effect of \eU or \eL
	\eu	Convert the first char of the next & or \e# to uppercase
	\el	Convert the first char of the next & or \e# to lowercase
.TA
.DE
.PP
These may be preceded by a backslash to force them to be treated normally.
If "nomagic" mode is in effect,
then & and ~ will be treated normally,
and you must write them as \e& and \e~ for them to have special meaning.
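.PP
To make the table concrete, here is a minimal Python sketch of how a
substitution string might be expanded in magic mode.
It is an illustration built only from the table above, not \*E's actual code:
the function name expand and its arguments are hypothetical,
case conversion is applied only to text inserted by & or a digit escape
(as the table words it), and nomagic mode is not modelled.

```python
import re

def expand(template, match, previous=""):
    """Expand an editor-style substitution string against a Python match."""
    out = []
    case_mode = None   # "U" or "L" while \U or \L is in effect
    one_shot = None    # "u" or "l" until the next & or \digit is inserted

    def insert(text):
        # Apply any pending case conversion to text copied from the match.
        nonlocal one_shot
        if case_mode == "U":
            text = text.upper()
        elif case_mode == "L":
            text = text.lower()
        if one_shot and text:
            first = text[0].upper() if one_shot == "u" else text[0].lower()
            text = first + text[1:]
            one_shot = None
        out.append(text)

    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "&":
            insert(match.group(0))            # the whole matched text
        elif ch == "~":
            out.append(previous)              # previous replacement text
        elif ch == "\\" and i + 1 < len(template):
            i += 1
            nxt = template[i]
            if nxt in "123456789":
                insert(match.group(int(nxt)) or "")
            elif nxt in "UL":
                case_mode = nxt
            elif nxt == "E":
                case_mode = None
            elif nxt in "ul":
                one_shot = nxt
            else:
                out.append(nxt)               # e.g. \& or \~ taken literally
        else:
            out.append(ch)
        i += 1
    return "".join(out)

m = re.search(r"(\w+) (\w+)", "hello world")
print(expand(r"\2, \u\1!", m))    # world, Hello!
print(expand(r"\U&\E (&)", m))    # HELLO WORLD (hello world)
```

The caller is expected to pass the previous replacement text explicitly;
the sketch keeps no state between calls.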
.SH
Options
.PP
\*E has two options which affect the way regular expressions are used.
These options may be examined or set via the :set command.
.PP
The first option is called "[no]magic".
This is a boolean option, and it is "magic" (TRUE) by default.
While in magic mode, all of the metacharacters behave as described above.
In nomagic mode, only ^ and $ retain their special meaning.
.PP
The second option is called "[no]ignorecase".
This is a boolean option, and it is "noignorecase" (FALSE) by default.
While in ignorecase mode, the searching mechanism will not distinguish between
an uppercase letter and its lowercase form.
In noignorecase mode, uppercase and lowercase are treated as being different.
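.PP
The effect of ignorecase can be sketched with Python's re module,
whose IGNORECASE flag plays a comparable role.
This is only an analogy: the helper name search and its ignorecase
argument are hypothetical, chosen to mirror the option name.

```python
import re

def search(pattern, line, ignorecase=False):
    """Return True if `pattern` is found in `line`.

    `ignorecase` mirrors the editor option of the same name;
    it is off by default, like "noignorecase".
    """
    flags = re.IGNORECASE if ignorecase else 0
    return re.search(pattern, line, flags) is not None

print(search("foo", "Some FOO here"))                   # False
print(search("foo", "Some FOO here", ignorecase=True))  # True
```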
.PP
Also, the "[no]wrapscan" option affects searches:
it controls whether a search that reaches the end of the file
continues from the beginning.
.SH
Examples
.PP
This example changes every occurrence of "utilize" to "use":
.sp
.ti +1i
:%s/utilize/use/g
.PP
This example deletes all whitespace that occurs at the end of a line anywhere
in the file
(the brackets contain a single space and a single tab):
.sp
.ti +1i
:%s/[ 	]\e+$//
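.PP
For comparison, the same clean-up can be sketched in Python.
This is an assumed equivalent, not something \*E runs:
the function name is hypothetical, the tab is written as an escape
inside the brackets, and the MULTILINE flag stands in for the % range
by letting $ match at the end of every line.

```python
import re

def strip_trailing_whitespace(text):
    """Delete runs of spaces and tabs at the end of every line of `text`."""
    # re.MULTILINE makes $ match at the end of each line, not just the last.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

print(repr(strip_trailing_whitespace("one  \ntwo\t \nthree")))
# 'one\ntwo\nthree'
```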
.PP
This example converts the current line to uppercase:
.sp
.ti +1i
:s/.*/\eU&/
.PP
This example underlines each letter in the current line,
by changing it into an "underscore backspace letter" sequence
(the ^H is entered as "control-V backspace"):
.sp
.ti +1i
:s/[a-zA-Z]/_^H&/g
.PP
This example locates the last colon in a line,
and swaps the text before the colon with the text after the colon.
The first \e( \e) pair is used to delimit the stuff before the colon,
and the second pair delimits the stuff after.
In the substitution text, \e1 and \e2 are given in reverse order
to perform the swap:
.sp
.ti +1i
:s/\e(.*\e):\e(.*\e)/\e2:\e1/
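.PP
The swap can be tried outside the editor with a short Python sketch.
The function name is hypothetical and Python writes the grouping
parentheses without backslashes;
the point it illustrates is that the first .* is greedy,
which is why the pattern settles on the last colon rather than the first.

```python
import re

def swap_around_last_colon(line):
    """Swap the text before the last colon with the text after it."""
    # The first (.*) is greedy, so it extends up to the last colon.
    return re.sub(r"(.*):(.*)", r"\2:\1", line, count=1)

print(swap_around_last_colon("a:b:c"))        # c:a:b
print(swap_around_last_colon("key:value"))    # value:key
```

A line without a colon does not match and is returned unchanged.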