.\" Automatically generated by Pod::Man v1.37, Pod::Parser v1.32
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sh \" Subsection heading
.br
.if t .Sp
.ne 5
.PP
\fB\\$1\fR
.PP
..
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings. \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote. | will give a
.\" real vertical bar. \*(C+ will give a nicer C++. Capital omega is used to
.\" do unbreakable dashes and therefore won't be available. \*(C` and \*(C'
.\" expand to `' in nroff, nothing in troff, for use with C<>.
.tr \(*W-|\(bv\*(Tr
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
. ds -- \(*W-
. ds PI pi
. if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
. if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch
. ds L" ""
. ds R" ""
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds -- \|\(em\|
. ds PI \(*p
. ds L" ``
. ds R" ''
'br\}
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.Sh), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. nr % 0
. rr F
.\}
.\"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.hy 0
.if n .na
.\"
.\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2).
.\" Fear. Run. Save yourself. No user-serviceable parts.
. \" fudge factors for nroff and troff
.if n \{\
. ds #H 0
. ds #V .8m
. ds #F .3m
. ds #[ \f1
. ds #] \fP
.\}
.if t \{\
. ds #H ((1u-(\\\\n(.fu%2u))*.13m)
. ds #V .6m
. ds #F 0
. ds #[ \&
. ds #] \&
.\}
. \" simple accents for nroff and troff
.if n \{\
. ds ' \&
. ds ` \&
. ds ^ \&
. ds , \&
. ds ~ ~
. ds /
.\}
.if t \{\
. ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u"
. ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u'
. ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u'
. ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u'
. ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u'
. ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u'
.\}
. \" troff and (daisy-wheel) nroff accents
.ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V'
.ds 8 \h'\*(#H'\(*b\h'-\*(#H'
.ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#]
.ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H'
.ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u'
.ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#]
.ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#]
.ds ae a\h'-(\w'a'u*4/10)'e
.ds Ae A\h'-(\w'A'u*4/10)'E
. \" corrections for vroff
.if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u'
.if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u'
. \" for low resolution devices (crt and lpr)
.if \n(.H>23 .if \n(.V>19 \
\{\
. ds : e
. ds 8 ss
. ds o a
. ds d- d\h'-1'\(ga
. ds D- D\h'-1'\(hy
. ds th \o'bp'
. ds Th \o'LP'
. ds ae ae
. ds Ae AE
.\}
.rm #[ #] #H #V #F C
.\" ========================================================================
.\"
.IX Title "PERLREQUICK 1"
.TH PERLREQUICK 1 "2006-01-07" "perl v5.8.8" "Perl Programmers Reference Guide"
.SH "NAME"
perlrequick \- Perl regular expressions quick start
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
This page covers the very basics of understanding, creating and
using regular expressions ('regexes') in Perl.
.SH "The Guide"
.IX Header "The Guide"
.Sh "Simple word matching"
.IX Subsection "Simple word matching"
The simplest regex is simply a word, or more generally, a string of
characters. A regex consisting of a word matches any string that
contains that word:
.PP
.Vb 1
\& "Hello World" =~ /World/; # matches
.Ve
.PP
In this statement, \f(CW\*(C`World\*(C'\fR is a regex and the \f(CW\*(C`//\*(C'\fR enclosing
\&\f(CW\*(C`/World/\*(C'\fR tells perl to search a string for a match. The operator
\&\f(CW\*(C`=~\*(C'\fR associates the string with the regex match and produces a true
value if the regex matched, or false if the regex did not match. In
our case, \f(CW\*(C`World\*(C'\fR matches the second word in \f(CW"Hello World"\fR, so the
expression is true. This idea has several variations.
.PP
Expressions like this are useful in conditionals:
.PP
.Vb 1
\& print "It matches\en" if "Hello World" =~ /World/;
.Ve
.PP
The sense of the match can be reversed by using the \f(CW\*(C`!~\*(C'\fR operator:
.PP
.Vb 1
\& print "It doesn't match\en" if "Hello World" !~ /World/;
.Ve
.PP
The literal string in the regex can be replaced by a variable:
.PP
.Vb 2
\& $greeting = "World";
\& print "It matches\en" if "Hello World" =~ /$greeting/;
.Ve
.PP
If you're matching against \f(CW$_\fR, the \f(CW\*(C`$_ =~\*(C'\fR part can be omitted:
.PP
.Vb 2
\& $_ = "Hello World";
\& print "It matches\en" if /World/;
.Ve
.PP
Finally, the \f(CW\*(C`//\*(C'\fR default delimiters for a match can be changed to
arbitrary delimiters by putting an \f(CW'm'\fR out front:
.PP
.Vb 4
\& "Hello World" =~ m!World!; # matches, delimited by '!'
\& "Hello World" =~ m{World}; # matches, note the matching '{}'
\& "/usr/bin/perl" =~ m"/perl"; # matches after '/usr/bin',
\& # '/' becomes an ordinary char
.Ve
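.PP
The short script below gathers the forms shown so far into a single runnable sketch. It is an illustration rather than a template. The names \f(CW$greeting\fR and \f(CW@results\fR are arbitrary, and the \f(CW\*(C`use strict\*(C'\fR and \f(CW\*(C`use warnings\*(C'\fR lines are included only as common good practice.
.PP
.Vb 21
\& use strict;
\& use warnings;
\&
\& my $greeting = "World";
\& my @results;
\&
\& # =~ binds a string to a match; !~ reverses the sense of the test
\& push @results, "match"    if "Hello World" =~ /World/;
\& push @results, "no match" if "Hello World" !~ /world/;  # case differs
\&
\& # a variable interpolated into the regex
\& push @results, "variable" if "Hello World" =~ /$greeting/;
\&
\& # with no binding operator, the match runs against $_
\& $_ = "Hello World";
\& push @results, "default" if /World/;
\&
\& # m{} picks another delimiter, so '/' needs no escape
\& push @results, "delimiter" if "/usr/bin/perl" =~ m{/perl};
\&
\& print "$_\en" for @results;
.Ve
.PP
Run on its own, it should print \f(CW\*(C`match\*(C'\fR, \f(CW\*(C`no match\*(C'\fR, \f(CW\*(C`variable\*(C'\fR, \f(CW\*(C`default\*(C'\fR, and \f(CW\*(C`delimiter\*(C'\fR, one per line.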
.PP
Regexes must match a part of the string \fIexactly\fR in order for the
statement to be true:
.PP
.Vb 3
\& "Hello World" =~ /world/; # doesn't match, case sensitive
\& "Hello World" =~ /o W/; # matches, ' ' is an ordinary char
\& "Hello World" =~ /World /; # doesn't match, no ' ' at end
.Ve
.PP
Perl will always match at the earliest possible point in the string:
.PP
.Vb 2
\& "Hello World" =~ /o/; # matches 'o' in 'Hello'
\& "That hat is red" =~ /hat/; # matches 'hat' in 'That'
.Ve
.PP
Not all characters can be used 'as is' in a match. Some characters,
called \fBmetacharacters\fR, are reserved for use in regex notation.
The metacharacters are:
.PP
.Vb 1
\& {}[]()^$.|*+?\e
.Ve
.PP
A metacharacter can be matched by putting a backslash before it:
.PP
.Vb 4
\& "2+2=4" =~ /2+2/; # doesn't match, + is a metacharacter
\& "2+2=4" =~ /2\e+2/; # matches, \e+ is treated like an ordinary +
\& 'C:\eWIN32' =~ /C:\e\eWIN/; # matches
\& "/usr/bin/perl" =~ /\e/usr\e/bin\e/perl/; # matches
.Ve
.PP
In the last regex, the forward slash \f(CW'/'\fR is also backslashed,
because it is used to delimit the regex.
.PP
Non-printable \s-1ASCII\s0 characters are represented by \fBescape sequences\fR.
Common examples are \f(CW\*(C`\et\*(C'\fR for a tab, \f(CW\*(C`\en\*(C'\fR for a newline, and \f(CW\*(C`\er\*(C'\fR
for a carriage return. Arbitrary bytes are represented by octal
escape sequences, e.g., \f(CW\*(C`\e033\*(C'\fR, or hexadecimal escape sequences,
e.g., \f(CW\*(C`\ex1B\*(C'\fR:
.PP
.Vb 2
\& "1000\et2000" =~ m(0\et2); # matches
\& "cat" =~ /\e143\ex61\ex74/; # matches, but a weird way to spell cat
.Ve
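.PP
One way to check the escaping rules and escape sequences above is to evaluate each example and record 1 for a match and 0 for a failure. The following is a minimal sketch of that idea. The array name \f(CW@got\fR is arbitrary.
.PP
.Vb 13
\& use strict;
\& use warnings;
\&
\& # 1 if the match succeeds, 0 if it fails
\& my @got = (
\&     ("2+2=4"         =~ /2+2/)              ? 1 : 0, # + is a quantifier
\&     ("2+2=4"         =~ /2\e+2/)             ? 1 : 0, # \e+ is a literal +
\&     ('C:\eWIN32'      =~ /C:\e\eWIN/)          ? 1 : 0, # \e\e is a backslash
\&     ("/usr/bin/perl" =~ /\e/usr\e/bin\e/perl/) ? 1 : 0, # \e/ is a literal /
\&     ("1000\et2000"    =~ /0\et2/)             ? 1 : 0, # \et is a tab
\&     ("cat"           =~ /\e143\ex61\ex74/)     ? 1 : 0, # octal and hex
\& );
\& print "@got\en";    # prints 0 1 1 1 1 1
.Ve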
.PP
Regexes are treated mostly as double quoted strings, so variable
substitution works:
.PP
.Vb 3
\& $foo = 'house';
\& 'cathouse' =~ /cat$foo/; # matches
\& 'housecat' =~ /${foo}cat/; # matches
.Ve
.PP
With all of the regexes above, if the regex matched anywhere in the
string, it was considered a match. To specify \fIwhere\fR it should
match, we would use the \fBanchor\fR metacharacters \f(CW\*(C`^\*(C'\fR and \f(CW\*(C`$\*(C'\fR. The
anchor \f(CW\*(C`^\*(C'\fR means match at the beginning of the string and the anchor
\&\f(CW\*(C`$\*(C'\fR means match at the end of the string, or before a newline at the
end of the string. Some examples:
.PP
.Vb 5
\& "housekeeper" =~ /keeper/; # matches
\& "housekeeper" =~ /^keeper/; # doesn't match
\& "housekeeper" =~ /keeper$/; # matches
\& "housekeeper\en" =~ /keeper$/; # matches
\& "housekeeper" =~ /^housekeeper$/; # matches
.Ve
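.PP
A small sketch makes the end-of-string rule concrete, including a case the examples above do not show: \f(CW\*(C`$\*(C'\fR matches before \fIone\fR trailing newline, but not before two. Each entry of \f(CW@report\fR (an arbitrary name) lists, in order, whether \f(CW\*(C`/^keeper/\*(C'\fR, \f(CW\*(C`/keeper$/\*(C'\fR, and \f(CW\*(C`/^housekeeper$/\*(C'\fR matched.
.PP
.Vb 12
\& use strict;
\& use warnings;
\&
\& my @strings = ("housekeeper", "housekeeper\en", "housekeeper\en\en", "keeper");
\& my @report;
\& for my $s (@strings) {
\&     push @report, join ",",
\&         ($s =~ /^keeper/       ? 1 : 0),  # anchored at the start
\&         ($s =~ /keeper$/       ? 1 : 0),  # anchored at the end
\&         ($s =~ /^housekeeper$/ ? 1 : 0);  # anchored at both ends
\& }
\& print "@report\en";    # prints 0,1,1 0,1,1 0,0,0 1,1,0
.Ve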
.Sh "Using character classes"
.IX Subsection "Using character classes"
A \fBcharacter class\fR allows a set of possible characters, rather than
just a single character, to match at a particular point in a regex.
Character classes are denoted by brackets \f(CW\*(C`[...]\*(C'\fR, with the set of
characters to be possibly matched inside. Here are some examples:
.PP
.Vb 3
\& /cat/; # matches 'cat'
\& /[bcr]at/; # matches 'bat', 'cat', or 'rat'
\& "abc" =~ /[cab]/; # matches 'a'
.Ve
.PP
In the last statement, even though \f(CW'c'\fR is the first character in
the class, the earliest point at which the regex can match is \f(CW'a'\fR.
.PP
.Vb 3
\& /[yY][eE][sS]/; # match 'yes' in a case-insensitive way
\& # 'yes', 'Yes', 'YES', etc.
\& /yes/i; # also match 'yes' in a case-insensitive way
.Ve
.PP
The last example shows a match with an \f(CW'i'\fR \fBmodifier\fR, which makes
the match case\-insensitive.
.PP
Character classes also have ordinary and special characters, but the
sets of ordinary and special characters inside a character class are
different from those outside a character class. The special
characters for a character class are \f(CW\*(C`\-]\e^$\*(C'\fR and are matched using an
escape:
.PP
.Vb 5
\& /[\e]c]def/; # matches ']def' or 'cdef'
\& $x = 'bcr';
\& /[$x]at/; # matches 'bat', 'cat', or 'rat'
\& /[\e$x]at/; # matches '$at' or 'xat'
\& /[\e\e$x]at/; # matches '\eat', 'bat', 'cat', or 'rat'
.Ve
.PP
The special character \f(CW'\-'\fR acts as a range operator within character
classes, so that the unwieldy \f(CW\*(C`[0123456789]\*(C'\fR and \f(CW\*(C`[abc...xyz]\*(C'\fR
become the svelte \f(CW\*(C`[0\-9]\*(C'\fR and \f(CW\*(C`[a\-z]\*(C'\fR:
.PP
.Vb 2
\& /item[0-9]/; # matches 'item0' or ... or 'item9'
\& /[0-9a-fA-F]/; # matches a hexadecimal digit
.Ve
.PP
If \f(CW'\-'\fR is the first or last character in a character class, it is
treated as an ordinary character.
.PP
The special character \f(CW\*(C`^\*(C'\fR in the first position of a character class
denotes a \fBnegated character class\fR, which matches any character but
those in the brackets. Both \f(CW\*(C`[...]\*(C'\fR and \f(CW\*(C`[^...]\*(C'\fR must match exactly one
character, or the match fails. For example:
.PP
.Vb 4
\& /[^a]at/; # doesn't match 'aat' or 'at', but matches
\& # all others, e.g. 'bat', 'cat', '0at', '%at'
\& /[^0-9]/; # matches a non-numeric character
\& /[a^]at/; # matches 'aat' or '^at'; here '^' is ordinary
.Ve
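.PP
The sketch below tries a range, an interpolated class, and a negated class against a few strings. Each pattern is anchored with \f(CW\*(C`^\*(C'\fR and \f(CW\*(C`$\*(C'\fR so that the whole string must fit. The helper \f(CW\*(C`is_hex\*(C'\fR is a hypothetical name chosen for the example, not a built-in.
.PP
.Vb 14
\& use strict;
\& use warnings;
\&
\& # hypothetical helper: 1 if the whole string is hexadecimal digits
\& sub is_hex { return $_[0] =~ /^[0-9a-fA-F]+$/ ? 1 : 0 }
\&
\& my @hex = map { is_hex($_) } ("1F", "ff00", "0x1F", "");
\&
\& my $x = 'bcr';
\& my @at = map { /^[$x]at$/ ? 1 : 0 } ("bat", "cat", "rat", "hat");
\&
\& my @neg = map { /^[^a]at$/ ? 1 : 0 } ("aat", "bat", "%at", "at");
\&
\& print "@hex | @at | @neg\en";   # prints 1 1 0 0 | 1 1 1 0 | 0 1 1 0
.Ve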
.PP
Perl has several abbreviations for common character classes:
.IP "\(bu" 4
\&\ed is a digit and represents
.Sp
.Vb 1
\& [0-9]
.Ve
.IP "\(bu" 4
\&\es is a whitespace character and represents
.Sp
.Vb 1
\& [\e \et\er\en\ef]
.Ve
.IP "\(bu" 4
\&\ew is a word character (alphanumeric or _) and represents
.Sp
.Vb 1
\& [0-9a-zA-Z_]
.Ve
.IP "\(bu" 4
\&\eD is a negated \ed; it represents any character but a digit
.Sp
.Vb 1
\& [^0-9]
.Ve
.IP "\(bu" 4
\&\eS is a negated \es; it represents any non-whitespace character
.Sp
.Vb 1
\& [^\es]
.Ve
.IP "\(bu" 4
\&\eW is a negated \ew; it represents any non-word character
.Sp
.Vb 1
\& [^\ew]
.Ve
.IP "\(bu" 4
The period '.' matches any character but \*(L"\en\*(R".
.PP
The \f(CW\*(C`\ed\es\ew\eD\eS\eW\*(C'\fR abbreviations can be used both inside and outside
of character classes. Here are some in use:
.PP
.Vb 7
\& /\ed\ed:\ed\ed:\ed\ed/; # matches a hh:mm:ss time format
\& /[\ed\es]/; # matches any digit or whitespace character
\& /\ew\eW\ew/; # matches a word char, followed by a
\& # non-word char, followed by a word char
\& /..rt/; # matches any two chars, followed by 'rt'
\& /end\e./; # matches 'end.'
\& /end[.]/; # same thing, matches 'end.'
.Ve
.PP
The \fBword\ anchor\fR\ \f(CW\*(C`\eb\*(C'\fR matches a boundary between a word
character and a non-word character, in either order (\f(CW\*(C`\ew\eW\*(C'\fR or \f(CW\*(C`\eW\ew\*(C'\fR):
.PP
.Vb 4
\& $x = "Housecat catenates house and cat";
\& $x =~ /\ebcat/; # matches cat in 'catenates'
\& $x =~ /cat\eb/; # matches cat in 'Housecat'
\& $x =~ /\ebcat\eb/; # matches 'cat' at end of string
.Ve
.PP
In the last example, the end of the string is considered a word
boundary.
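.PP
The sketch below exercises \f(CW\*(C`\ed\*(C'\fR, the period, and \f(CW\*(C`\eb\*(C'\fR. To report \fIwhere\fR each word-anchored regex matched, it reads \f(CW$\-[0]\fR, the offset at which the last successful match began. That element of the \f(CW\*(C`@\-\*(C'\fR array is documented in perlvar and is not otherwise covered in this guide.
.PP
.Vb 15
\& use strict;
\& use warnings;
\&
\& my $is_time = "12:34:56" =~ /^\ed\ed:\ed\ed:\ed\ed$/ ? 1 : 0;
\&
\& # '.' matches any character, '\e.' only a literal period
\& my $dot = ("end!" =~ /end./ ? 1 : 0) . ("end!" =~ /end\e./ ? 1 : 0);
\&
\& my $x = "Housecat catenates house and cat";
\& my @offsets;
\& push @offsets, $x =~ /\ebcat/   ? $-[0] : -1;  # 'cat' in 'catenates'
\& push @offsets, $x =~ /cat\eb/   ? $-[0] : -1;  # 'cat' in 'Housecat'
\& push @offsets, $x =~ /\ebcat\eb/ ? $-[0] : -1;  # the final 'cat'
\&
\& print "$is_time $dot @offsets\en";   # prints 1 10 9 5 29
.Ve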
.Sh "Matching this or that"
.IX Subsection "Matching this or that"
We can match different character strings with the \fBalternation\fR
metacharacter \f(CW'|'\fR. To match \f(CW\*(C`dog\*(C'\fR or \f(CW\*(C`cat\*(C'\fR, we form the regex
\&\f(CW\*(C`dog|cat\*(C'\fR. As before, perl will try to match the regex at the
earliest possible point in the string. At each character position,
perl will first try to match the first alternative, \f(CW\*(C`dog\*(C'\fR. If
\&\f(CW\*(C`dog\*(C'\fR doesn't match, perl will then try the next alternative, \f(CW\*(C`cat\*(C'\fR.
If \f(CW\*(C`cat\*(C'\fR doesn't match either, the attempt at that position fails and perl
moves on to the next position in the string. Some examples:
.PP
.Vb 2
\& "cats and dogs" =~ /cat|dog|bird/; # matches "cat"
\& "cats and dogs" =~ /dog|cat|bird/; # matches "cat"
.Ve
.PP
Even though \f(CW\*(C`dog\*(C'\fR is the first alternative in the second regex,
\&\f(CW\*(C`cat\*(C'\fR is able to match earlier in the string.
.PP
.Vb 2
\& "cats" =~ /c|ca|cat|cats/; # matches "c"
\& "cats" =~ /cats|cat|ca|c/; # matches "cats"
.Ve
.PP
At a given character position, the first alternative that allows the
regex match to succeed will be the one that matches. Here, all the
alternatives match at the first string position, so the first matches.
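.PP
To see which alternative wins, the following sketch wraps each alternation in parentheses and reports what \f(CW$1\fR captured. Capturing with parentheses is explained in the next two sections. The string/regex pairs are the examples above, and \f(CW@found\fR is an arbitrary name.
.PP
.Vb 12
\& use strict;
\& use warnings;
\&
\& my @found;
\& for my $pair (["cats and dogs", "cat|dog|bird"],
\&               ["cats and dogs", "dog|cat|bird"],
\&               ["cats",          "c|ca|cat|cats"],
\&               ["cats",          "cats|cat|ca|c"]) {
\&     my ($string, $alternatives) = @$pair;
\&     push @found, $string =~ /($alternatives)/ ? $1 : "(none)";
\& }
\& print "@found\en";    # prints cat cat c cats
.Ve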
.Sh "Grouping things and hierarchical matching"
.IX Subsection "Grouping things and hierarchical matching"
The \fBgrouping\fR metacharacters \f(CW\*(C`()\*(C'\fR allow a part of a regex to be
treated as a single unit. Parts of a regex are grouped by enclosing
them in parentheses. The regex \f(CW\*(C`house(cat|keeper)\*(C'\fR means match
\&\f(CW\*(C`house\*(C'\fR followed by either \f(CW\*(C`cat\*(C'\fR or \f(CW\*(C`keeper\*(C'\fR. Some more examples
are
.PP
.Vb 2
\& /(a|b)b/; # matches 'ab' or 'bb'
\& /(^a|b)c/; # matches 'ac' at start of string or 'bc' anywhere
.Ve
.PP
.Vb 3
\& /house(cat|)/; # matches either 'housecat' or 'house'
\& /house(cat(s|)|)/; # matches either 'housecats' or 'housecat' or
\& # 'house'. Note groups can be nested.
.Ve
.PP
.Vb 2
\& "20" =~ /(19|20|)\ed\ed/; # matches the null alternative '()\ed\ed',
\& # because '20\ed\ed' can't match
.Ve
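.PP
Anchoring grouped alternatives with \f(CW\*(C`^\*(C'\fR and \f(CW\*(C`$\*(C'\fR forces one alternative to account for the whole string, so the effect of the null alternative is easy to check. Here is a minimal sketch that reuses the patterns above with made-up test strings:
.PP
.Vb 10
\& use strict;
\& use warnings;
\&
\& my @year  = map { /^(19|20|)\ed\ed$/ ? 1 : 0 }
\&                 ("1999", "2024", "99", "1850", "199");
\& my @house = map { /^house(cat|keeper|)$/ ? 1 : 0 }
\&                 ("house", "housecat", "housekeeper", "houseboat");
\&
\& print "@year\en";     # prints 1 1 1 0 0
\& print "@house\en";    # prints 1 1 1 0
.Ve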
.Sh "Extracting matches"
.IX Subsection "Extracting matches"
The grouping metacharacters \f(CW\*(C`()\*(C'\fR also allow the extraction of the
parts of a string that matched. For each grouping, the part that
matched inside goes into the special variables \f(CW$1\fR, \f(CW$2\fR, etc.
They can be used just like ordinary variables:
.PP
.Vb 5
\& # extract hours, minutes, seconds
\& $time =~ /(\ed\ed):(\ed\ed):(\ed\ed)/; # match hh:mm:ss format
\& $hours = $1;
\& $minutes = $2;
\& $seconds = $3;
.Ve
.PP
In list context, a match \f(CW\*(C`/regex/\*(C'\fR with groupings will return the
list of matched values \f(CW\*(C`($1,$2,...)\*(C'\fR. So we could rewrite it as
.PP
.Vb 1
\& ($hours, $minutes, $seconds) = ($time =~ /(\ed\ed):(\ed\ed):(\ed\ed)/);
.Ve
.PP
If the groupings in a regex are nested, \f(CW$1\fR gets the group with the
leftmost opening parenthesis, \f(CW$2\fR the next opening parenthesis,
etc. For example, here is a complex regex and the matching variables
indicated below it:
.PP
.Vb 2
\& /(ab(cd|ef)((gi)|j))/;
\&  1  2      34
.Ve
.PP
Associated with the matching variables \f(CW$1\fR, \f(CW$2\fR, ... are
the \fBbackreferences\fR \f(CW\*(C`\e1\*(C'\fR, \f(CW\*(C`\e2\*(C'\fR, ... Backreferences are
matching variables that can be used \fIinside\fR a regex:
.PP
.Vb 1
\& /(\ew\ew\ew)\es\e1/; # find sequences like 'the the' in string
.Ve
.PP
\&\f(CW$1\fR, \f(CW$2\fR, ... should only be used outside of a regex, and \f(CW\*(C`\e1\*(C'\fR,
\&\f(CW\*(C`\e2\*(C'\fR, ... only inside a regex.
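.PP
The sketch below exercises the three ideas of this section: list-context extraction, the numbering of nested groups, and a backreference. The sample strings are invented for the example. The doubled-word regex adds \f(CW\*(C`\eb\*(C'\fR anchors so that it does not fire on text such as \*(L"the theory\*(R".
.PP
.Vb 16
\& use strict;
\& use warnings;
\&
\& # list context returns ($1, $2, $3), or an empty list on failure
\& my ($hours, $minutes, $seconds) =
\&     ("Lunch at 12:34:56 sharp" =~ /(\ed\ed):(\ed\ed):(\ed\ed)/);
\&
\& # nested groups are numbered by their opening parentheses
\& my @nested = ("abefgi" =~ /(ab(cd|ef)((gi)|j))/);
\&
\& # \e1 inside the regex refers back to whatever group 1 matched
\& my ($doubled) = ("it was the the end" =~ /\eb(\ew+)\es+\e1\eb/);
\&
\& print "$hours:$minutes:$seconds\en";   # prints 12:34:56
\& print "@nested\en";                    # prints abefgi ef gi gi
\& print "$doubled\en";                   # prints the
.Ve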
.Sh "Matching repetitions"
.IX Subsection "Matching repetitions"
The \fBquantifier\fR metacharacters \f(CW\*(C`?\*(C'\fR, \f(CW\*(C`*\*(C'\fR, \f(CW\*(C`+\*(C'\fR, and \f(CW\*(C`{}\*(C'\fR allow us
to determine the number of repeats of a portion of a regex we
consider to be a match. Quantifiers are put immediately after the
character, character class, or grouping that we want to specify. They
have the following meanings:
.IP "\(bu" 4
\&\f(CW\*(C`a?\*(C'\fR = match 'a' 1 or 0 times
.IP "\(bu" 4
\&\f(CW\*(C`a*\*(C'\fR = match 'a' 0 or more times, i.e., any number of times
.IP "\(bu" 4
\&\f(CW\*(C`a+\*(C'\fR = match 'a' 1 or more times, i.e., at least once
.IP "\(bu" 4
\&\f(CW\*(C`a{n,m}\*(C'\fR = match at least \f(CW\*(C`n\*(C'\fR times, but not more than \f(CW\*(C`m\*(C'\fR
times.
.IP "\(bu" 4
\&\f(CW\*(C`a{n,}\*(C'\fR = match at least \f(CW\*(C`n\*(C'\fR times
.IP "\(bu" 4
\&\f(CW\*(C`a{n}\*(C'\fR = match exactly \f(CW\*(C`n\*(C'\fR times
.PP
Here are some examples:
.PP
.Vb 6
\& /[a-z]+\es+\ed*/; # match a lowercase word, at least one whitespace
\& # char, and any number of digits
\& /(\ew+)\es+\e1/; # match doubled words of arbitrary length
\& $year =~ /^\ed{2,4}$/; # make sure year is at least 2 but not more
\& # than 4 digits
\& $year =~ /^\ed{4}$|^\ed{2}$/; # better match; throw out 3-digit dates
.Ve
.PP
These quantifiers will try to match as much of the string as possible,
while still allowing the regex to match. So we have
.PP
.Vb 5
\& $x = 'the cat in the hat';
\& $x =~ /^(.*)(at)(.*)$/; # matches,
\& # $1 = 'the cat in the h'
\& # $2 = 'at'
\& # $3 = '' (0 matches)
.Ve
.PP
The first quantifier \f(CW\*(C`.*\*(C'\fR grabs as much of the string as possible
while still having the regex match. The second quantifier \f(CW\*(C`.*\*(C'\fR has
no string left to it, so it matches 0 times.
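.PP
A short sketch like the following confirms the greedy behavior and the effect of anchoring the year checks. The strings are illustrative only.
.PP
.Vb 11
\& use strict;
\& use warnings;
\&
\& # the first .* is greedy: it gives back only enough for 'at' to match
\& my $x = 'the cat in the hat';
\& my @parts = ($x =~ /^(.*)(at)(.*)$/);
\& print join("|", @parts), "\en";   # prints the cat in the h|at|
\&
\& # anchored, so a 3 or 5 digit string cannot match part of itself
\& my @ok = map { /^\ed{4}$|^\ed{2}$/ ? 1 : 0 } ("2006", "06", "206", "20066");
\& print "@ok\en";                   # prints 1 1 0 0
.Ve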
.Sh "More matching"
.IX Subsection "More matching"
There are a few more things you might want to know about matching
operators. In the code
.PP
.Vb 4
\& $pattern = 'Seuss';
\& while (<>) {
\&     print if /$pattern/;
\& }
.Ve
.PP
Perl has to re-evaluate \f(CW$pattern\fR each time through the loop. If
\&\f(CW$pattern\fR won't be changing, use the \f(CW\*(C`//o\*(C'\fR modifier to perform
variable substitutions only once. If you don't want any
substitutions at all, use the special delimiter \f(CW\*(C`m''\*(C'\fR:
.PP
.Vb 3
\& @pattern = ('Seuss');
\& m/@pattern/; # matches 'Seuss'
\& m'@pattern'; # matches the literal string '@pattern'
.Ve
.PP
The global modifier \f(CW\*(C`//g\*(C'\fR allows the matching operator to match
within a string as many times as possible. In scalar context,
successive matches against a string will have \f(CW\*(C`//g\*(C'\fR jump from match
to match, keeping track of position in the string as it goes along.
You can get or set the position with the \f(CW\*(C`pos()\*(C'\fR function.
For example,
.PP
.Vb 4
\& $x = "cat dog house"; # 3 words
\& while ($x =~ /(\ew+)/g) {
\&     print "Word is $1, ends at position ", pos $x, "\en";
\& }
.Ve
.PP
prints
.PP
.Vb 3
\& Word is cat, ends at position 3
\& Word is dog, ends at position 7
\& Word is house, ends at position 13
.Ve
.PP
A failed match or changing the target string resets the position. If
you don't want the position reset after failure to match, add the
\&\f(CW\*(C`//c\*(C'\fR modifier, as in \f(CW\*(C`/regex/gc\*(C'\fR.
.PP
In list context, \f(CW\*(C`//g\*(C'\fR returns a list of matched groupings, or if
there are no groupings, a list of matches to the whole regex. So
.PP
.Vb 4
\& @words = ($x =~ /(\ew+)/g); # matches,
\& # $words[0] = 'cat'
\& # $words[1] = 'dog'
\& # $words[2] = 'house'
.Ve
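.PP
The sketch below runs the \f(CW\*(C`//g\*(C'\fR examples above and adds a check of \f(CW\*(C`//c\*(C'\fR. After the failed \f(CW\*(C`/zebra/gc\*(C'\fR match, \f(CW\*(C`pos\*(C'\fR still reports the position left by the previous match. Variable names are arbitrary.
.PP
.Vb 19
\& use strict;
\& use warnings;
\&
\& my $x = "cat dog house";
\&
\& # scalar context: each match resumes where the previous one ended
\& my @ends;
\& while ($x =~ /(\ew+)/g) {
\&     push @ends, "$1:" . pos($x);
\& }
\& print "@ends\en";          # prints cat:3 dog:7 house:13
\&
\& # list context: all the captured groups at once
\& my @words = ($x =~ /(\ew+)/g);
\&
\& # with /c, a failed match leaves pos() alone instead of resetting it
\& my $first = ($x =~ /\ew+/g);      # matches 'cat'; pos($x) is now 3
\& my $miss  = ($x =~ /zebra/gc);   # fails, but pos($x) stays 3
\& print scalar(@words), " ", pos($x), "\en";   # prints 3 3
.Ve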
.Sh "Search and replace"
.IX Subsection "Search and replace"
Search and replace is performed using \f(CW\*(C`s/regex/replacement/modifiers\*(C'\fR.
The \f(CW\*(C`replacement\*(C'\fR is a Perl double-quoted string that replaces whatever
the \f(CW\*(C`regex\*(C'\fR matches in the string. The operator \f(CW\*(C`=~\*(C'\fR is
also used here to associate a string with \f(CW\*(C`s///\*(C'\fR. If matching
against \f(CW$_\fR, the \f(CW\*(C`$_\ =~\*(C'\fR\ can be dropped. If there is a match,
\&\f(CW\*(C`s///\*(C'\fR returns the number of substitutions made; otherwise it returns
false. Here are a few examples:
.PP
.Vb 5
\& $x = "Time to feed the cat!";
\& $x =~ s/cat/hacker/; # $x contains "Time to feed the hacker!"
\& $y = "'quoted words'";
\& $y =~ s/^'(.*)'$/$1/; # strip single quotes,
\& # $y contains "quoted words"
.Ve
.PP
With the \f(CW\*(C`s///\*(C'\fR operator, the matched variables \f(CW$1\fR, \f(CW$2\fR, etc.
are immediately available for use in the replacement expression. With
the global modifier, \f(CW\*(C`s///g\*(C'\fR will search and replace all occurrences
of the regex in the string:
.PP
.Vb 4
\& $x = "I batted 4 for 4";
\& $x =~ s/4/four/; # $x contains "I batted four for 4"
\& $x = "I batted 4 for 4";
\& $x =~ s/4/four/g; # $x contains "I batted four for four"
.Ve
.PP
The evaluation modifier \f(CW\*(C`s///e\*(C'\fR wraps an \f(CW\*(C`eval{...}\*(C'\fR around the
replacement string, and the result of the evaluation is substituted for the
matched substring. Some examples:
.PP
.Vb 3
\& # reverse all the words in a string
\& $x = "the cat in the hat";
\& $x =~ s/(\ew+)/reverse $1/ge; # $x contains "eht tac ni eht tah"
.Ve
.PP
.Vb 3
\& # convert percentage to decimal
\& $x = "A 39% hit rate";
\& $x =~ s!(\ed+)%!$1/100!e; # $x contains "A 0.39 hit rate"
.Ve
.PP
The last example shows that \f(CW\*(C`s///\*(C'\fR can use other delimiters, such as
\&\f(CW\*(C`s!!!\*(C'\fR and \f(CW\*(C`s{}{}\*(C'\fR, and even \f(CW\*(C`s{}//\*(C'\fR. If single quotes are used
(\f(CW\*(C`s'''\*(C'\fR), then the regex and replacement are treated as single-quoted
strings.
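.PP
Collected into one script, the substitutions above behave as their comments say. This sketch also keeps the return value of \f(CW\*(C`s///g\*(C'\fR, which is the number of substitutions made. The variable names are arbitrary.
.PP
.Vb 19
\& use strict;
\& use warnings;
\&
\& my $x = "Time to feed the cat!";
\& $x =~ s/cat/hacker/;             # "Time to feed the hacker!"
\&
\& my $y = "'quoted words'";
\& $y =~ s/^'(.*)'$/$1/;            # "quoted words"
\&
\& my $z = "I batted 4 for 4";
\& my $count = ($z =~ s/4/four/g);  # 2 replacements
\&
\& my $w = "the cat in the hat";
\& $w =~ s/(\ew+)/reverse $1/ge;     # "eht tac ni eht tah"
\&
\& my $p = "A 39% hit rate";
\& $p =~ s!(\ed+)%!$1/100!e;         # "A 0.39 hit rate"
\&
\& print "$x\en$y\en$z ($count)\en$w\en$p\en";
.Ve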
.Sh "The split operator"
.IX Subsection "The split operator"
\&\f(CW\*(C`split /regex/, string\*(C'\fR splits \f(CW\*(C`string\*(C'\fR into a list of substrings
and returns that list. The regex describes the separator, that is, the
character sequences at which \f(CW\*(C`string\*(C'\fR is split. For example, to split a
string into words, use
.PP
.Vb 4
\& $x = "Calvin and Hobbes";
\& @word = split /\es+/, $x; # $word[0] = 'Calvin'
\& # $word[1] = 'and'
\& # $word[2] = 'Hobbes'
.Ve
.PP
To extract a comma-delimited list of numbers, use
.PP
.Vb 4
\& $x = "1.618,2.718, 3.142";
\& @const = split /,\es*/, $x; # $const[0] = '1.618'
\& # $const[1] = '2.718'
\& # $const[2] = '3.142'
.Ve
.PP
If the empty regex \f(CW\*(C`//\*(C'\fR is used, the string is split into individual
characters. If the regex has groupings, then the list produced contains
the matched substrings from the groupings as well:
.PP
.Vb 6
\& $x = "/usr/bin";
\& @parts = split m!(/)!, $x; # $parts[0] = ''
\& # $parts[1] = '/'
\& # $parts[2] = 'usr'
\& # $parts[3] = '/'
\& # $parts[4] = 'bin'
.Ve
.PP
Since the first character of \f(CW$x\fR matched the regex, \f(CW\*(C`split\*(C'\fR prepended
an empty initial element to the list.
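.PP
The split examples above, plus the empty-regex case, fit in a few lines. Joining the last result with \f(CW'|'\fR makes the leading empty field visible. This is a sketch for experimentation, and the array names simply follow the examples above.
.PP
.Vb 9
\& use strict;
\& use warnings;
\&
\& my @word  = split /\es+/,  "Calvin and Hobbes";  # 'Calvin', 'and', 'Hobbes'
\& my @const = split /,\es*/, "1.618,2.718, 3.142"; # '1.618', '2.718', '3.142'
\& my @chars = split //,     "cat";                # 'c', 'a', 't'
\& my @parts = split m!(/)!, "/usr/bin";           # '', '/', 'usr', '/', 'bin'
\&
\& print join("|", @parts), "\en";                 # prints |/|usr|/|bin
.Ve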
.SH "BUGS"
.IX Header "BUGS"
None.
.SH "SEE ALSO"
.IX Header "SEE ALSO"
This is just a quick start guide. For a more in-depth tutorial on
regexes, see perlretut, and for the reference page, see perlre.
.SH "AUTHOR AND COPYRIGHT"
.IX Header "AUTHOR AND COPYRIGHT"
Copyright (c) 2000 Mark Kvale
All rights reserved.
.PP
This document may be distributed under the same terms as Perl itself.
.Sh "Acknowledgments"
.IX Subsection "Acknowledgments"
The author would like to thank Mark-Jason Dominus, Tom Christiansen,
Ilya Zakharevich, Brad Hughes, and Mike Giroux for all their helpful
comments.