| 1 | .Go 4 "REGULAR EXPRESSIONS" |
| 2 | |
| 3 | .PP |
| 4 | \*E uses regular expressions for searching and substitutions. |
| 5 | A regular expression is a text string in which some characters have |
| 6 | special meanings. |
| 7 | This is much more powerful than simple text matching. |
| 8 | .SH |
| 9 | Syntax |
| 10 | .PP |
| 11 | \*E's regexp package treats the following one- or two-character |
| 12 | strings (called metacharacters) in special ways: |
| 13 | .IP "\e(\fIsubexpression\fP\e)" 0.8i |
| 14 | The \e( and \e) metacharacters are used to delimit subexpressions. |
| 15 | When the regular expression matches a particular chunk of text, |
| 16 | \*E will remember which portion of that chunk matched the \fIsubexpression\fP. |
| 17 | The :s/regexp/newtext/ command makes use of this feature. |
| 18 | .IP "^" 0.8i |
| 19 | The ^ metacharacter matches the beginning of a line. |
| 20 | If, for example, you wanted to find "foo" at the beginning of a line, |
| 21 | you would use a regular expression such as /^foo/. |
| 22 | Note that ^ is only a metacharacter if it occurs |
| 23 | at the beginning of a regular expression; |
| 24 | anyplace else, it is treated as a normal character. |
| 25 | .IP "$" 0.8i |
| 26 | The $ metacharacter matches the end of a line. |
| 27 | It is only a metacharacter when it occurs at the end of a regular expression; |
| 28 | elsewhere, it is treated as a normal character. |
| 29 | For example, the regular expression /$$/ will search for a dollar sign at |
| 30 | the end of a line. |
| 31 | .IP "\e<" 0.8i |
| 32 | The \e< metacharacter matches a zero-length string at the beginning of |
| 33 | a word. |
| 34 | A word is considered to be a string of 1 or more letters and digits. |
| 35 | A word can begin at the beginning of a line |
| 36 | or after 1 or more non-alphanumeric characters. |
| 37 | .IP "\e>" 0.8i |
| 38 | The \e> metacharacter matches a zero-length string at the end of a word. |
| 39 | A word can end at the end of the line |
| 40 | or before 1 or more non-alphanumeric characters. |
| 41 | For example, /\e<end\e>/ would find any instance of the word "end", |
| 42 | but would ignore any instances of e-n-d inside another word |
| 43 | such as "calendar". |
| 44 | .IP "\&." 0.8i |
| 45 | The . metacharacter matches any single character. |
| 46 | .IP "[\fIcharacter-list\fP]" 0.8i |
| 47 | This matches any single character from the \fIcharacter-list\fP. |
| 48 | Inside the \fIcharacter-list\fP, you can denote a span of characters |
| 49 | by writing only the first and last characters, with a hyphen between |
| 50 | them. |
| 51 | If the \fIcharacter-list\fP is preceded by a ^ character, then the |
| 52 | list is inverted -- it will match any character that \fIisn't\fP mentioned |
| 53 | in the list. |
| 54 | For example, /[a-zA-Z]/ matches any letter, and /[^ ]/ matches anything |
| 55 | other than a blank. |
| 56 | .IP "\e{\fIn\fP\e}" 0.8i |
| 57 | This is a closure operator, |
| 58 | which means that it can only be placed after something that matches a |
| 59 | single character. |
| 60 | It controls the number of times that the single-character expression |
| 61 | should be repeated. |
| 62 | .IP "" 0.8i |
| 63 | The \e{\fIn\fP\e} operator, in particular, means that the preceding |
| 64 | expression should be repeated exactly \fIn\fP times. |
| 65 | For example, /^-\e{80\e}$/ matches a line of eighty hyphens, and |
| 66 | /\e<[a-zA-Z]\e{4\e}\e>/ matches any four-letter word. |
| 67 | .IP "\e{\fIn\fP,\fIm\fP\e}" 0.8i |
| 68 | This is a closure operator which means that the preceding single-character |
| 69 | expression should be repeated between \fIn\fP and \fIm\fP times, inclusive. |
| 70 | If the \fIm\fP is omitted (but the comma is present) then \fIm\fP is |
| 71 | taken to be infinity. |
| 72 | For example, /"[^"]\e{3,5\e}"/ matches any pair of quotes which contains |
| 73 | three, four, or five non-quote characters. |
| 74 | .IP "*" 0.8i |
| 75 | The * metacharacter is a closure operator which means that the preceding |
| 76 | single-character expression can be repeated zero or more times. |
| 77 | It is equivalent to \e{0,\e}. |
| 78 | For example, /.*/ matches a whole line. |
| 79 | .IP "\e+" 0.8i |
| 80 | The \e+ metacharacter is a closure operator which means that the preceding |
| 81 | single-character expression can be repeated one or more times. |
| 82 | It is equivalent to \e{1,\e}. |
| 83 | For example, /.\e+/ matches a whole line, but only if the line contains |
| 84 | at least one character. |
| 85 | It doesn't match empty lines. |
| 86 | .IP "\e?" 0.8i |
| 87 | The \e? metacharacter is a closure operator which indicates that the |
| 88 | preceding single-character expression is optional -- that is, that it |
| 89 | can occur 0 or 1 times. |
| 90 | It is equivalent to \e{0,1\e}. |
| 91 | For example, /no[ -]\e?one/ matches "no one", "no-one", or "noone". |
| 92 | .PP |
| 93 | Anything else is treated as a normal character which must exactly match |
| 94 | a character from the scanned text. |
| 95 | The special strings may all be preceded by a backslash to |
| 96 | force them to be treated normally. |
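.PP
The closure examples above can also be tried outside \*E.
The following is only a minimal sketch in Python, whose re module uses a different dialect:
it assumes that \e{\fIn\fP,\fIm\fP\e} corresponds to {n,m}, that \e+ and \e? correspond to + and ?,
and that \e< and \e> can be approximated by \eb.
The helper name first_match is hypothetical, and Python's \eb also counts the underscore
as a word character, so the word-boundary behavior is not identical.

```python
import re

# Assumed Python translations of the examples given in the list above.
EIGHTY_HYPHENS = re.compile(r"^-{80}$")            # /^-\{80\}$/
FOUR_LETTER_WORD = re.compile(r"\b[a-zA-Z]{4}\b")  # /\<[a-zA-Z]\{4\}\>/
QUOTED_3_TO_5 = re.compile(r'"[^"]{3,5}"')         # /"[^"]\{3,5\}"/
NO_ONE = re.compile(r"no[ -]?one")                 # /no[ -]\?one/


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ("no one", "no-one", "noone", "no  one"):
        print(repr(sample), "->", first_match(NO_ONE, sample))
```

Each closure applies only to the single-character expression immediately before it,
which is why "no  one" (two blanks) is not matched by the last pattern.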
| 97 | .SH |
| 98 | Substitutions |
| 99 | .PP |
| 100 | The :s command has at least two arguments: a regular expression, |
| 101 | and a substitution string. |
| 102 | The text that matched the regular expression is replaced by text |
| 103 | which is derived from the substitution string. |
| 104 | .br |
| 105 | .ne 15 \" so we don't mess up the table |
| 106 | .PP |
| 107 | Most characters in the substitution string are copied into the |
| 108 | text literally but a few have special meaning: |
| 109 | .LD |
| 110 | .ta 0.75i 1.3i |
| 111 | 	&	Insert a copy of the original text |
| 112 | 	~	Insert a copy of the previous replacement text |
| 113 | 	\e1	Insert a copy of that portion of the original text which |
| 114 | 		matched the first set of \e( \e) parentheses |
| 115 | 	\e2-\e9	Do the same for the second (etc.) pair of \e( \e) |
| 116 | 	\eU	Convert all chars of any later & or \e# to uppercase |
| 117 | 	\eL	Convert all chars of any later & or \e# to lowercase |
| 118 | 	\eE	End the effect of \eU or \eL |
| 119 | 	\eu	Convert the first char of the next & or \e# to uppercase |
| 120 | 	\el	Convert the first char of the next & or \e# to lowercase |
| 121 | .TA |
| 122 | .DE |
| 123 | .PP |
| 124 | These may be preceded by a backslash to force them to be treated normally. |
| 125 | If "nomagic" mode is in effect, |
| 126 | then & and ~ will be treated normally, |
| 127 | and you must write them as \e& and \e~ for them to have special meaning. |
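.PP
The rules in the table can be made concrete with a small model.
The following Python sketch is an illustration only, not \*E's implementation:
the functions expand and substitute are hypothetical names, the regular expression is written
in Python's dialect rather than \*E's, only "magic" mode is modeled, and ~ is not handled
(it is copied literally).
As in the table, case conversion is applied only to text inserted by & or \e1 through \e9.

```python
import re


def expand(template, match):
    """Expand a substitution template against one match (illustrative model)."""
    out = []
    case_all = None   # "U" or "L" while \U or \L is in effect
    case_next = None  # "u" or "l" pending for the next & or \1-\9

    def emit(text, from_match):
        nonlocal case_next
        if from_match:
            if case_all == "U":
                text = text.upper()
            elif case_all == "L":
                text = text.lower()
            if case_next and text:
                head = text[0].upper() if case_next == "u" else text[0].lower()
                text = head + text[1:]
                case_next = None
        out.append(text)

    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "&":
            emit(match.group(0), True)           # whole matched text
        elif ch == "\\" and i + 1 < len(template):
            i += 1
            nxt = template[i]
            if nxt in "123456789":
                emit(match.group(int(nxt)) or "", True)
            elif nxt in "UL":
                case_all = nxt
            elif nxt == "E":
                case_all = None
            elif nxt in "ul":
                case_next = nxt
            else:
                emit(nxt, False)                 # e.g. \& inserts a literal &
        else:
            emit(ch, False)                      # ordinary character
        i += 1
    return "".join(out)


def substitute(pattern, template, line):
    """Replace the first match of pattern in line, like :s without the g flag."""
    return re.sub(pattern, lambda m: expand(template, m), line, count=1)
```

For instance, substitute(r"(\ew+) (\ew+)", r"\eu\e1 \eU\e2\eE!", "hello world") capitalizes
the first word and uppercases the second under this model.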
| 128 | .SH |
| 129 | Options |
| 130 | .PP |
| 131 | \*E has two options which affect the way regular expressions are used. |
| 132 | These options may be examined or set via the :set command. |
| 133 | .PP |
| 134 | The first option is called "[no]magic". |
| 135 | This is a boolean option, and it is "magic" (TRUE) by default. |
| 136 | While in magic mode, all of the metacharacters behave as described above. |
| 137 | In nomagic mode, only ^ and $ retain their special meaning. |
| 138 | .PP |
| 139 | The second option is called "[no]ignorecase". |
| 140 | This is a boolean option, and it is "noignorecase" (FALSE) by default. |
| 141 | While in ignorecase mode, the searching mechanism will not distinguish between |
| 142 | an uppercase letter and its lowercase form. |
| 143 | In noignorecase mode, uppercase and lowercase are treated as being different. |
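.PP
The effect of "ignorecase" can be illustrated with a short Python sketch.
It assumes that Python's re.IGNORECASE flag is a fair stand-in for the option;
the function name find and its ignorecase parameter are hypothetical.

```python
import re


def find(pattern, text, ignorecase=False):
    """Return the first match of pattern in text, or None if there is none."""
    flags = re.IGNORECASE if ignorecase else 0
    m = re.search(pattern, text, flags)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(find("elvis", "The Elvis editor"))                   # noignorecase
    print(find("elvis", "The Elvis editor", ignorecase=True))  # ignorecase
```

The matched text is returned as it appears in the scanned text, not as it was typed in the pattern.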
| 144 | .PP |
| 145 | Also, the "[no]wrapscan" option controls whether a search that reaches one end of the file wraps around to the other. |
| 146 | .SH |
| 147 | Examples |
| 148 | .PP |
| 149 | This example changes every occurrence of "utilize" to "use": |
| 150 | .sp |
| 151 | .ti +1i |
| 152 | :%s/utilize/use/g |
| 153 | .PP |
| 154 | This example deletes all whitespace that occurs at the end of a line anywhere |
| 155 | in the file |
| 156 | (the brackets contain a single space and a single tab): |
| 157 | .sp |
| 158 | .ti +1i |
| 159 | :%s/[ 	]\e+$// |
| 160 | .PP |
| 161 | This example converts the current line to uppercase: |
| 162 | .sp |
| 163 | .ti +1i |
| 164 | :s/.*/\eU&/ |
| 165 | .PP |
| 166 | This example underlines each letter in the current line |
| 167 | by changing it into an "underscore backspace letter" sequence |
| 168 | (the ^H is entered as "control-V backspace"): |
| 169 | .sp |
| 170 | .ti +1i |
| 171 | :s/[a-zA-Z]/_^H&/g |
| 172 | .PP |
| 173 | This example locates the last colon in a line, |
| 174 | and swaps the text before the colon with the text after the colon. |
| 175 | The first \e( \e) pair is used to delimit the stuff before the colon, |
| 176 | and the second pair delimits the stuff after. |
| 177 | In the substitution text, \e1 and \e2 are given in reverse order |
| 178 | to perform the swap: |
| 179 | .sp |
| 180 | .ti +1i |
| 181 | :s/\e(.*\e):\e(.*\e)/\e2:\e1/ |
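.PP
The last example finds the \fIlast\fP colon because the first .* is greedy:
it takes as much text as it can while still leaving a colon for the rest of the expression.
A minimal Python sketch of the same idea follows; it assumes that Python's ( ) groups and
\e1, \e2 references behave like \*E's \e( \e) here, and the function name swap_at_last_colon
is hypothetical.

```python
import re


def swap_at_last_colon(line):
    """Swap the text before and after the last colon in line."""
    # The greedy (.*) before the colon consumes every earlier colon,
    # so the colon matched literally is the last one on the line.
    return re.sub(r"(.*):(.*)", r"\2:\1", line, count=1)


if __name__ == "__main__":
    print(swap_at_last_colon("a:b:c"))
    print(swap_at_last_colon("no colon here"))
```

A line without a colon is left unchanged, since the expression cannot match it.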