.Go 4 "REGULAR EXPRESSIONS"

.PP
\*E uses regular expressions for searching and substitutions.
A regular expression is a text string in which some characters have
special meanings.
This makes it much more powerful than simple text matching,
because a single pattern can describe a whole family of strings.
.SH
Syntax
.PP
\*E' regexp package treats the following one- or two-character
strings (called metacharacters) in special ways:
.IP "\e(\fIsubexpression\fP\e)" 0.8i
The \e( and \e) metacharacters are used to delimit subexpressions.
When the regular expression matches a particular chunk of text,
\*E will remember which portion of that chunk matched the \fIsubexpression\fP.
The :s/regexp/newtext/ command makes use of this feature.
.IP "^" 0.8i
The ^ metacharacter matches the beginning of a line.
If, for example, you wanted to find "foo" at the beginning of a line,
you would use a regular expression such as /^foo/.
Note that ^ is only a metacharacter if it occurs
at the beginning of a regular expression;
anywhere else, it is treated as a normal character.
.IP "$" 0.8i
The $ metacharacter matches the end of a line.
It is only a metacharacter when it occurs at the end of a regular expression;
elsewhere, it is treated as a normal character.
For example, the regular expression /$$/ will search for a dollar sign at
the end of a line.
.IP "\e<" 0.8i
The \e< metacharacter matches a zero-length string at the beginning of
a word.
A word is considered to be a string of 1 or more letters and digits.
A word can begin at the beginning of a line
or after 1 or more non-alphanumeric characters.
.IP "\e>" 0.8i
The \e> metacharacter matches a zero-length string at the end of a word.
A word can end at the end of the line
or before 1 or more non-alphanumeric characters.
For example, /\e<end\e>/ would find any instance of the word "end",
but would ignore any instances of e-n-d inside another word
such as "calendar".
.IP "\&." 0.8i
The . metacharacter matches any single character.
.IP "[\fIcharacter-list\fP]" 0.8i
This matches any single character from the \fIcharacter-list\fP.
Inside the \fIcharacter-list\fP, you can denote a span of characters
by writing only the first and last characters, with a hyphen between
them.
If the \fIcharacter-list\fP is preceded by a ^ character, then the
list is inverted: it will match any character that \fIisn't\fP mentioned
in the list.
For example, /[a-zA-Z]/ matches any letter, and /[^ ]/ matches anything
other than a blank.
.IP "\e{\fIn\fP\e}" 0.8i
This is a closure operator,
which means that it can only be placed after something that matches a
single character.
It controls the number of times that the single-character expression
should be repeated.
.IP "" 0.8i
The \e{\fIn\fP\e} operator, in particular, means that the preceding
expression should be repeated exactly \fIn\fP times.
For example, /^-\e{80\e}$/ matches a line of eighty hyphens, and
/\e<[a-zA-Z]\e{4\e}\e>/ matches any four-letter word.
67 | .IP "\e{\fIn\fP,\fIm\fP\e}" 0.8i | |
15637ed4 RG |
68 | This is a closure operator which means that the preceding single-character |
69 | expression should be repeated between \fIn\fP and \fIm\fP times, inclusive. | |
70 | If the \fIm\fP is omitted (but the comma is present) then \fIm\fP is | |
71 | taken to be inifinity. | |
78ed81a3 | 72 | For example, /"[^"]\e{3,5\e}"/ matches any pair of quotes which contains |
15637ed4 RG |
73 | three, four, or five non-quote characters. |
.IP "*" 0.8i
The * metacharacter is a closure operator which means that the preceding
single-character expression can be repeated zero or more times.
It is equivalent to \e{0,\e}.
For example, /.*/ matches a whole line, even an empty one.
.IP "\e+" 0.8i
The \e+ metacharacter is a closure operator which means that the preceding
single-character expression can be repeated one or more times.
It is equivalent to \e{1,\e}.
For example, /.\e+/ matches a whole line, but only if the line contains
at least one character.
It doesn't match empty lines.
.IP "\e?" 0.8i
The \e? metacharacter is a closure operator which indicates that the
preceding single-character expression is optional; that is, it
can occur 0 or 1 times.
It is equivalent to \e{0,1\e}.
For example, /no[ -]\e?one/ matches "no one", "no-one", or "noone".
.PP
Anything else is treated as a normal character which must exactly match
a character from the scanned text.
The special strings may all be preceded by a backslash to
force them to be treated normally.
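.PP
Python's re module uses a different, Perl-like syntax, so the following hypothetical sketch translates the metacharacters described above into Python patterns and lets you try this section's examples. The function names elvis_to_python and matches are made up for illustration; the sketch assumes magic mode, Python 3.7 or later, and well-formed patterns, and it is not how \*E itself implements its regexp package:

```python
import re

# Assumption: words are runs of letters and digits (as described above), so
# \< and \> become lookarounds over this class rather than Python's word escape.
WORD = "[A-Za-z0-9]"


def elvis_to_python(pattern):
    """Translate a magic-mode elvis regexp into Python re syntax (sketch)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            i += 2
            if nxt in "(){}+?":
                out.append(nxt)  # \( \) \{ \} \+ \? are written bare in Python
            elif nxt == "<":
                out.append("(?<!" + WORD + ")(?=" + WORD + ")")  # start of a word
            elif nxt == ">":
                out.append("(?<=" + WORD + ")(?!" + WORD + ")")  # end of a word
            else:
                out.append(re.escape(nxt))  # a backslash makes it a normal char
        elif c == "[":
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1  # inverted list
            if j < n and pattern[j] == "]":
                j += 1  # assumption: a leading ] is a member of the list
            while j < n and pattern[j] != "]":
                j += 1
            if j == n:
                raise ValueError("unterminated character list")
            # copy the list, doubling backslashes so Python treats them literally
            out.append("[" + pattern[i + 1:j].replace("\\", "\\\\") + "]")
            i = j + 1
        elif (c == "^" and i == 0) or (c == "$" and i == n - 1) or c in ".*":
            out.append(c)  # same meaning in both syntaxes
            i += 1
        else:
            out.append(re.escape(c))  # everything else matches literally
            i += 1
    return "".join(out)


def matches(elvis_pattern, line):
    return re.search(elvis_to_python(elvis_pattern), line) is not None


print(matches("$$", "costs 5$"))            # True
print(matches(r"\<end\>", "calendar"))      # False
print(matches(r"no[ -]\?one", "no-one"))    # True
print(elvis_to_python(r'"[^"]\{3,5\}"'))    # "[^"]{3,5}"
```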
.SH
Substitutions
.PP
The :s command has at least two arguments: a regular expression
and a substitution string.
The text that matched the regular expression is replaced by text
which is derived from the substitution string.
.br
.ne 15 \" so we don't mess up the table
.PP
Most characters in the substitution string are copied into the
text literally, but a few have special meanings:
.LD
.ta 0.75i 1.3i
	&	Insert a copy of the original text
	~	Insert a copy of the previous replacement text
	\e1	Insert a copy of that portion of the original text which
		matched the first set of \e( \e) parentheses
	\e2-\e9	Do the same for the second (etc.) pair of \e( \e)
	\eU	Convert all chars of any later & or \e# to uppercase
	\eL	Convert all chars of any later & or \e# to lowercase
	\eE	End the effect of \eU or \eL
	\eu	Convert the first char of the next & or \e# to uppercase
	\el	Convert the first char of the next & or \e# to lowercase
.TA
.DE
.PP
These may be preceded by a backslash to force them to be treated normally.
If "nomagic" mode is in effect,
then & and ~ will be treated normally,
and you must write them as \e& and \e~ for them to have special meaning.
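.PP
The table above can be sketched as a Python function that builds the replacement text for one match found by Python's re module. The name expand_replacement and its parameters are hypothetical. The sketch assumes that ~ inserts the previous replacement text verbatim, that a reference to a missing subexpression inserts nothing, and, following the table literally, that \eU, \eL, \eu and \el affect only text inserted by & or \e1 through \e9, not literal characters:

```python
import re


def expand_replacement(template, match, previous="", magic=True):
    """Build the replacement text for one re.Match from an elvis-style template."""
    out = []
    mode = None   # "U" or "L" while \U or \L is in effect
    first = None  # "u" or "l" waiting for the next & or \1-\9

    def convert(text):
        nonlocal first
        if mode == "U":
            text = text.upper()
        elif mode == "L":
            text = text.lower()
        if first == "u":
            text = text[:1].upper() + text[1:]
        elif first == "l":
            text = text[:1].lower() + text[1:]
        first = None  # \u and \l apply to a single insertion only
        return text

    i = 0
    while i < len(template):
        c = template[i]
        escaped = c == "\\" and i + 1 < len(template)
        if escaped:
            i += 1
            c = template[i]
        i += 1
        # & and ~ are special when bare in magic mode, when escaped in nomagic mode
        if c in "&~" and escaped != magic:
            out.append(convert(match.group(0)) if c == "&" else previous)
        elif escaped and c in "123456789":
            n = int(c)
            group = match.group(n) if n <= match.re.groups else None
            out.append(convert(group or ""))
        elif escaped and c in "ULE":
            mode = None if c == "E" else c
        elif escaped and c in "ul":
            first = c
        else:
            out.append(c)  # ordinary character, or an escaped one taken literally
    return "".join(out)


m = re.search("([a-z]+) ([a-z]+)", "hello world")
print(expand_replacement(r"\u\2 \U\1", m))  # World HELLO
```

To apply it to every match on a line, the function can be passed to re.sub as the replacement callback, for example re.sub(pattern, lambda m: expand_replacement(template, m), line).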
.SH
Options
.PP
\*E has two options which affect the way regular expressions are used.
These options may be examined or set via the :set command.
.PP
The first option is called "[no]magic".
This is a boolean option, and it is "magic" (TRUE) by default.
While in magic mode, all of the metacharacters behave as described above.
In nomagic mode, only ^ and $ retain their special meaning.
.PP
The second option is called "[no]ignorecase".
This is a boolean option, and it is "noignorecase" (FALSE) by default.
While in ignorecase mode, the searching mechanism will not distinguish between
an uppercase letter and its lowercase form.
In noignorecase mode, uppercase and lowercase are treated as being different.
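.PP
In Python's re module, the closest counterpart of ignorecase is the re.IGNORECASE flag. This small sketch (the helper name search_lines is hypothetical) returns the numbers of the lines that match, with and without the flag:

```python
import re


def search_lines(pattern, lines, ignorecase=False):
    """Return 1-based numbers of the lines that contain a match for pattern."""
    flags = re.IGNORECASE if ignorecase else 0
    return [num for num, line in enumerate(lines, 1) if re.search(pattern, line, flags)]


text = ["Elvis", "elvis", "ELVIS!", "vi"]
print(search_lines("elvis", text))                   # [2]
print(search_lines("elvis", text, ignorecase=True))  # [1, 2, 3]
```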
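.PP
Also, the "[no]wrapscan" option affects searches:
when it is set, a search that reaches the end of the file
(or the beginning, for a backward search) wraps around and continues
from the other end; when it is not set, the search stops there.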
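.PP
The effect of wrapscan on a forward search can be sketched at line granularity (a real search also starts from the cursor's column within the line). The function find_next and its parameters are hypothetical:

```python
import re


def find_next(lines, pattern, current, wrapscan=True):
    """Return the index of the next line after `current` that matches pattern."""
    order = list(range(current + 1, len(lines)))  # from the next line to the end
    if wrapscan:
        order += range(0, current + 1)  # then wrap to the top, ending at current
    for idx in order:
        if re.search(pattern, lines[idx]):
            return idx
    return None  # no match, or the end was reached without wrapscan


lines = ["alpha", "beta", "gamma", "alpha two"]
print(find_next(lines, "alpha", 0))                  # 3
print(find_next(lines, "alpha", 3))                  # 0
print(find_next(lines, "alpha", 3, wrapscan=False))  # None
```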
.SH
Examples
.PP
This example changes every occurrence of "utilize" to "use":
.sp
.ti +1i
:%s/utilize/use/g
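.PP
Applied to a list of lines with Python's re.sub, the same substitution looks roughly like this; the sample lines are made up, and the comparison shows what the g flag changes:

```python
import re

lines = ["We utilize it.", "utilize, then utilize again", "nothing here"]

# :%s/utilize/use/g -- % applies the command to every line,
# and g replaces every match on a line instead of only the first
every = [re.sub("utilize", "use", line) for line in lines]
# without g, only the first match on each line would be replaced
first_only = [re.sub("utilize", "use", line, count=1) for line in lines]

print(every)       # ['We use it.', 'use, then use again', 'nothing here']
print(first_only)  # ['We use it.', 'use, then utilize again', 'nothing here']
```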
.PP
This example deletes all whitespace that occurs at the end of a line,
anywhere in the file
(the brackets contain a single space and a single tab):
.sp
.ti +1i
:%s/[ 	]\e+$//
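.PP
A Python equivalent, assuming each line is held without its trailing newline, uses the same bracket expression with \e+ written as a bare +:

```python
import re

lines = ["code   ", "more \t \t", "clean", "  indented"]

# [ \t]+$ : one or more spaces or tabs, anchored to the end of the line
stripped = [re.sub("[ \t]+$", "", line) for line in lines]
print(stripped)  # ['code', 'more', 'clean', '  indented']
```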
.PP
This example converts the current line to uppercase:
.sp
.ti +1i
:s/.*/\eU&/
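.PP
Python's re.sub has no \eU escape, so in this sketch str.upper plays that role through a replacement function; count=1 mirrors :s without the g flag:

```python
import re

line = "Hello, world 42"
# :s/.*/\U&/ -- .* matches the whole line, and \U& inserts it in uppercase
upper = re.sub(".*", lambda m: m.group(0).upper(), line, count=1)
print(upper)  # HELLO, WORLD 42
```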
.PP
This example underlines each letter in the current line
by changing it into an "underscore backspace letter" sequence
(the ^H is entered as "control-V backspace"):
.sp
.ti +1i
:s/[a-zA-Z]/_^H&/g
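.PP
The same transformation written with Python's re.sub; the helper name underline is hypothetical, and the result typically only looks underlined on a terminal or printer that overstrikes:

```python
import re

BACKSPACE = chr(8)  # the ^H character, typed as control-V backspace


def underline(line):
    # :s/[a-zA-Z]/_^H&/g -- each letter becomes underscore, backspace, letter
    return re.sub("[a-zA-Z]", lambda m: "_" + BACKSPACE + m.group(0), line)


print(repr(underline("Hi 2")))  # '_\x08H_\x08i 2'
```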
.PP
This example locates the last colon in a line
and swaps the text before the colon with the text after the colon.
The first \e( \e) pair is used to delimit the text before the colon,
and the second pair delimits the text after it.
Because .* matches as much text as possible,
the first pair extends all the way to the last colon.
In the substitution text, \e1 and \e2 are given in reverse order
to perform the swap:
.sp
.ti +1i
:s/\e(.*\e):\e(.*\e)/\e2:\e1/
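.PP
In Python, the two subexpressions become ordinary groups; the helper name swap_around_last_colon is hypothetical:

```python
import re


def swap_around_last_colon(line):
    # :s/\(.*\):\(.*\)/\2:\1/ -- the greedy first group runs to the LAST colon
    return re.sub("(.*):(.*)", lambda m: m.group(2) + ":" + m.group(1), line, count=1)


print(swap_around_last_colon("key:value"))  # value:key
print(swap_around_last_colon("a:b:c"))      # c:a:b
print(swap_around_last_colon("no colon"))   # no colon
```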