Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | =head1 NAME |
2 | ||
3 | perlintro -- a brief introduction and overview of Perl | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | This document is intended to give you a quick overview of the Perl | |
8 | programming language, along with pointers to further documentation. It | |
9 | is intended as a "bootstrap" guide for those who are new to the | |
10 | language, and provides just enough information for you to be able to | |
11 | read other people's Perl and understand roughly what it's doing, or | |
12 | write your own simple scripts. | |
13 | ||
14 | This introductory document does not aim to be complete. It does not | |
15 | even aim to be entirely accurate. In some cases perfection has been | |
16 | sacrificed for the sake of getting the general idea across. You are | |
17 | I<strongly> advised to follow this introduction with more information | |
18 | from the full Perl manual, the table of contents to which can be found | |
19 | in L<perltoc>. | |
20 | ||
21 | Throughout this document you'll see references to other parts of the | |
22 | Perl documentation. You can read that documentation using the C<perldoc> | |
23 | command or whatever method you're using to read this document. | |
24 | ||
25 | =head2 What is Perl? | |
26 | ||
27 | Perl is a general-purpose programming language originally developed for | |
28 | text manipulation and now used for a wide range of tasks including | |
29 | system administration, web development, network programming, GUI | |
30 | development, and more. | |
31 | ||
32 | The language is intended to be practical (easy to use, efficient, | |
33 | complete) rather than beautiful (tiny, elegant, minimal). Its major | |
34 | features are that it's easy to use, supports both procedural and | |
35 | object-oriented (OO) programming, has powerful built-in support for text | |
36 | processing, and has one of the world's most impressive collections of | |
37 | third-party modules. | |
38 | ||
39 | Different definitions of Perl are given in L<perl>, L<perlfaq1> and | |
40 | no doubt other places. From this we can determine that Perl is different | |
41 | things to different people, but that lots of people think it's at least | |
42 | worth writing about. | |
43 | ||
44 | =head2 Running Perl programs | |
45 | ||
46 | To run a Perl program from the Unix command line: | |
47 | ||
48 | perl progname.pl | |
49 | ||
50 | Alternatively, put this as the first line of your script: | |
51 | ||
52 | #!/usr/bin/env perl | |
53 | ||
54 | ... and run the script as C</path/to/script.pl>. Of course, it'll need | |
55 | to be executable first, so C<chmod 755 script.pl> (under Unix). | |
56 | ||
57 | For more information, including instructions for other platforms such as | |
58 | Windows and Mac OS, read L<perlrun>. | |
59 | ||
60 | =head2 Basic syntax overview | |
61 | ||
62 | A Perl script or program consists of one or more statements. These | |
63 | statements are simply written in the script in a straightforward | |
64 | fashion. There is no need to have a C<main()> function or anything of | |
65 | that kind. | |
66 | ||
67 | Perl statements end in a semi-colon: | |
68 | ||
69 | print "Hello, world"; | |
70 | ||
71 | Comments start with a hash symbol and run to the end of the line: | |
72 | ||
73 | # This is a comment | |
74 | ||
75 | Whitespace is irrelevant: | |
76 | ||
77 | print | |
78 | "Hello, world" | |
79 | ; | |
80 | ||
81 | ... except inside quoted strings: | |
82 | ||
83 | # this would print with a linebreak in the middle | |
84 | print "Hello | |
85 | world"; | |
86 | ||
87 | Double quotes or single quotes may be used around literal strings: | |
88 | ||
89 | print "Hello, world"; | |
90 | print 'Hello, world'; | |
91 | ||
92 | However, only double quotes "interpolate" variables and special | |
93 | characters such as newlines (C<\n>): | |
94 | ||
95 | print "Hello, $name\n"; # works fine | |
96 | print 'Hello, $name\n'; # prints $name\n literally | |
97 | ||
98 | Numbers don't need quotes around them: | |
99 | ||
100 | print 42; | |
101 | ||
102 | You can use parentheses for functions' arguments or omit them | |
103 | according to your personal taste. They are only required | |
104 | occasionally to clarify issues of precedence. | |
105 | ||
106 | print("Hello, world\n"); | |
107 | print "Hello, world\n"; | |
108 | ||
109 | More detailed information about Perl syntax can be found in L<perlsyn>. | |
110 | ||
111 | =head2 Perl variable types | |
112 | ||
113 | Perl has three main variable types: scalars, arrays, and hashes. | |
114 | ||
115 | =over 4 | |
116 | ||
117 | =item Scalars | |
118 | ||
119 | A scalar represents a single value: | |
120 | ||
121 | my $animal = "camel"; | |
122 | my $answer = 42; | |
123 | ||
124 | Scalar values can be strings, integers or floating point numbers, and Perl | |
125 | will automatically convert between them as required. There is no need | |
126 | to pre-declare your variable types. | |
127 | ||
128 | Scalar values can be used in various ways: | |
129 | ||
130 | print $animal; | |
131 | print "The animal is $animal\n"; | |
132 | print "The square of $answer is ", $answer * $answer, "\n"; | |
133 | ||
134 | There are a number of "magic" scalars with names that look like | |
135 | punctuation or line noise. These special variables are used for all | |
136 | kinds of purposes, and are documented in L<perlvar>. The only one you | |
137 | need to know about for now is C<$_> which is the "default variable". | |
138 | It's used as the default argument to a number of functions in Perl, and | |
139 | it's set implicitly by certain looping constructs. | |
140 | ||
141 | print; # prints contents of $_ by default | |
142 | ||
143 | =item Arrays | |
144 | ||
145 | An array represents a list of values: | |
146 | ||
147 | my @animals = ("camel", "llama", "owl"); | |
148 | my @numbers = (23, 42, 69); | |
149 | my @mixed = ("camel", 42, 1.23); | |
150 | ||
151 | Arrays are zero-indexed. Here's how you get at elements in an array: | |
152 | ||
153 | print $animals[0]; # prints "camel" | |
154 | print $animals[1]; # prints "llama" | |
155 | ||
156 | The special variable C<$#array> tells you the index of the last element | |
157 | of an array: | |
158 | ||
159 | print $mixed[$#mixed]; # last element, prints 1.23 | |
160 | ||
161 | You might be tempted to use C<$#array + 1> to tell you how many items there | |
162 | are in an array. Don't bother. As it happens, using C<@array> where Perl | |
163 | expects to find a scalar value ("in scalar context") will give you the number | |
164 | of elements in the array: | |
165 | ||
166 | if (@animals < 5) { ... } | |
167 | ||
168 | The elements we're getting from the array start with a C<$> because | |
169 | we're getting just a single value out of the array -- you ask for a scalar, | |
170 | you get a scalar. | |
171 | ||
172 | To get multiple values from an array: | |
173 | ||
174 | @animals[0,1]; # gives ("camel", "llama"); | |
175 | @animals[0..2]; # gives ("camel", "llama", "owl"); | |
176 | @animals[1..$#animals]; # gives all except the first element | |
177 | ||
178 | This is called an "array slice". | |
179 | ||
180 | You can do various useful things to lists: | |
181 | ||
182 | my @sorted = sort @animals; | |
183 | my @backwards = reverse @numbers; | |
184 | ||
185 | There are a couple of special arrays too, such as C<@ARGV> (the command | |
186 | line arguments to your script) and C<@_> (the arguments passed to a | |
187 | subroutine). These are documented in L<perlvar>. | |
188 | ||
189 | =item Hashes | |
190 | ||
191 | A hash represents a set of key/value pairs: | |
192 | ||
193 | my %fruit_color = ("apple", "red", "banana", "yellow"); | |
194 | ||
195 | You can use whitespace and the C<< => >> operator to lay them out more | |
196 | nicely: | |
197 | ||
198 | my %fruit_color = ( | |
199 | apple => "red", | |
200 | banana => "yellow", | |
201 | ); | |
202 | ||
203 | To get at hash elements: | |
204 | ||
205 | $fruit_color{"apple"}; # gives "red" | |
206 | ||
207 | You can get at lists of keys and values with C<keys()> and | |
208 | C<values()>. | |
209 | ||
210 | my @fruits = keys %fruit_color; | |
211 | my @colors = values %fruit_color; | |
212 | ||
213 | Hashes have no particular internal order, though you can sort the keys | |
214 | and loop through them. | |
215 | ||
216 | Just like special scalars and arrays, there are also special hashes. | |
217 | The best known of these is C<%ENV> which contains environment | |
218 | variables. Read all about it (and other special variables) in | |
219 | L<perlvar>. | |
220 | ||
221 | =back | |
222 | ||
223 | Scalars, arrays and hashes are documented more fully in L<perldata>. | |
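The three variable types can be exercised together in a short script. The following is only a sketch: it reuses the C<@animals> and C<%fruit_color> examples from above, while C<$count>, C<$last>, C<@rest> and C<@report> are names made up for illustration, and C<push> is a builtin not otherwise covered in this introduction.

```perl
use strict;
use warnings;

my @animals     = ("camel", "llama", "owl");
my %fruit_color = (apple => "red", banana => "yellow");

# An array used in scalar context gives its number of elements.
my $count = @animals;

# $#animals is the index of the last element.
my $last = $animals[$#animals];

# An array slice returns a list, so it is written with the @ sigil.
my @rest = @animals[1 .. $#animals];

# Hashes have no particular order, so sort the keys for stable output.
my @report;
foreach my $fruit (sort keys %fruit_color) {
    push @report, "$fruit is $fruit_color{$fruit}";
}

print "$count animals, the last one is $last\n";
print "all but the first: @rest\n";
print "$_\n" foreach @report;
```

Sorting the keys before looping is a design choice, not a requirement; without it the two fruit lines may come out in either order.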
224 | ||
225 | More complex data types can be constructed using references, which allow | |
226 | you to build lists and hashes within lists and hashes. | |
227 | ||
228 | A reference is a scalar value and can refer to any other Perl data | |
229 | type. So by storing a reference as the value of an array or hash | |
230 | element, you can easily create lists and hashes within lists and | |
231 | hashes. The following example shows a two-level hash-of-hashes | |
232 | structure using anonymous hash references. | |
233 | ||
234 | my $variables = { | |
235 | scalar => { | |
236 | description => "single item", | |
237 | sigil => '$', | |
238 | }, | |
239 | array => { | |
240 | description => "ordered list of items", | |
241 | sigil => '@', | |
242 | }, | |
243 | hash => { | |
244 | description => "key/value pairs", | |
245 | sigil => '%', | |
246 | }, | |
247 | }; | |
248 | ||
249 | print "Scalars begin with a $variables->{'scalar'}->{'sigil'}\n"; | |
250 | ||
251 | Exhaustive information on the topic of references can be found in | |
252 | L<perlreftut>, L<perllol>, L<perlref> and L<perldsc>. | |
253 | ||
254 | =head2 Variable scoping | |
255 | ||
256 | Throughout the previous section all the examples have used the syntax: | |
257 | ||
258 | my $var = "value"; | |
259 | ||
260 | The C<my> is actually not required; you could just use: | |
261 | ||
262 | $var = "value"; | |
263 | ||
264 | However, the above usage will create global variables throughout your | |
265 | program, which is bad programming practice. C<my> creates lexically | |
266 | scoped variables instead. The variables are scoped to the block | |
267 | (i.e. a bunch of statements surrounded by curly-braces) in which they | |
268 | are defined. | |
269 | ||
270 | my $a = "foo"; | |
271 | if ($some_condition) { | |
272 | my $b = "bar"; | |
273 | print $a; # prints "foo" | |
274 | print $b; # prints "bar" | |
275 | } | |
276 | print $a; # prints "foo" | |
277 | print $b; # prints nothing; $b has fallen out of scope | |
278 | ||
279 | Using C<my> in combination with a C<use strict;> at the top of | |
280 | your Perl scripts means that the interpreter will pick up certain common | |
281 | programming errors. For instance, in the example above, the final | |
282 | C<print $b> would cause a compile-time error and prevent you from | |
283 | running the program. Using C<strict> is highly recommended. | |
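To see the compile-time check without stopping a whole program, the out-of-scope access can be compiled separately. This is a rough sketch only: the variable names C<$outer>, C<$inner>, C<$ok> and C<$error> are invented here, and the string form of C<eval> is used purely so that the failure can be caught and printed.

```perl
use strict;
use warnings;

my $outer = "foo";
{
    my $inner = "bar";
    print "$outer $inner\n";    # both variables are visible here
}

# $inner is out of scope now. Compile the offending statement on its
# own, inside a string eval, so that the rest of this script still runs.
my $ok    = eval q{ my $copy = $inner; 1 };
my $error = $@;

print "strict refused to compile it: $error" unless $ok;
```

In an ordinary script the same mistake would stop the program before any of it ran, which is the behaviour described above.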
284 | ||
285 | =head2 Conditional and looping constructs | |
286 | ||
287 | Perl has most of the usual conditional and looping constructs except for | |
288 | case/switch (but if you really want it, there is a Switch module in Perl | |
289 | 5.8 and newer, and on CPAN. See the section on modules, below, for more | |
290 | information about modules and CPAN). | |
291 | ||
292 | The conditions can be any Perl expression. See the list of operators in | |
293 | the next section for information on comparison and boolean logic operators, | |
294 | which are commonly used in conditional statements. | |
295 | ||
296 | =over 4 | |
297 | ||
298 | =item if | |
299 | ||
300 | if ( condition ) { | |
301 | ... | |
302 | } elsif ( other condition ) { | |
303 | ... | |
304 | } else { | |
305 | ... | |
306 | } | |
307 | ||
308 | There's also a negated version of it: | |
309 | ||
310 | unless ( condition ) { | |
311 | ... | |
312 | } | |
313 | ||
314 | This is provided as a more readable version of C<if (!I<condition>)>. | |
315 | ||
316 | Note that the braces are required in Perl, even if you've only got one | |
317 | line in the block. However, there is a clever way of making your one-line | |
318 | conditional blocks more English-like: | |
319 | ||
320 | # the traditional way | |
321 | if ($zippy) { | |
322 | print "Yow!"; | |
323 | } | |
324 | ||
325 | # the Perlish post-condition way | |
326 | print "Yow!" if $zippy; | |
327 | print "We have no bananas" unless $bananas; | |
328 | ||
329 | =item while | |
330 | ||
331 | while ( condition ) { | |
332 | ... | |
333 | } | |
334 | ||
335 | There's also a negated version, for the same reason we have C<unless>: | |
336 | ||
337 | until ( condition ) { | |
338 | ... | |
339 | } | |
340 | ||
341 | You can also use C<while> in a post-condition: | |
342 | ||
343 | print "LA LA LA\n" while 1; # loops forever | |
344 | ||
345 | =item for | |
346 | ||
347 | Exactly like C: | |
348 | ||
349 | for ($i=0; $i <= $max; $i++) { | |
350 | ... | |
351 | } | |
352 | ||
353 | The C-style for loop is rarely needed in Perl since Perl provides | |
354 | the more friendly list scanning C<foreach> loop. | |
355 | ||
356 | =item foreach | |
357 | ||
358 | foreach (@array) { | |
359 | print "This element is $_\n"; | |
360 | } | |
361 | ||
362 | # you don't have to use the default $_ either... | |
363 | foreach my $key (keys %hash) { | |
364 | print "The value of $key is $hash{$key}\n"; | |
365 | } | |
366 | ||
367 | =back | |
368 | ||
369 | For more detail on looping constructs (and some that weren't mentioned in | |
370 | this overview) see L<perlsyn>. | |
371 | ||
372 | =head2 Builtin operators and functions | |
373 | ||
374 | Perl comes with a wide selection of builtin functions. Some of the ones | |
375 | we've already seen include C<print>, C<sort> and C<reverse>. A list of | |
376 | them is given at the start of L<perlfunc> and you can easily read | |
377 | about any given function by using C<perldoc -f I<functionname>>. | |
378 | ||
379 | Perl operators are documented in full in L<perlop>, but here are a few | |
380 | of the most common ones: | |
381 | ||
382 | =over 4 | |
383 | ||
384 | =item Arithmetic | |
385 | ||
386 | + addition | |
387 | - subtraction | |
388 | * multiplication | |
389 | / division | |
390 | ||
391 | =item Numeric comparison | |
392 | ||
393 | == equality | |
394 | != inequality | |
395 | < less than | |
396 | > greater than | |
397 | <= less than or equal | |
398 | >= greater than or equal | |
399 | ||
400 | =item String comparison | |
401 | ||
402 | eq equality | |
403 | ne inequality | |
404 | lt less than | |
405 | gt greater than | |
406 | le less than or equal | |
407 | ge greater than or equal | |
408 | ||
409 | (Why do we have separate numeric and string comparisons? Because we don't | |
410 | have special variable types, and Perl needs to know whether to sort | |
411 | numerically (where 99 is less than 100) or alphabetically (where 100 comes | |
412 | before 99).) | |
413 | ||
414 | =item Boolean logic | |
415 | ||
416 | && and | |
417 | || or | |
418 | ! not | |
419 | ||
420 | (C<and>, C<or> and C<not> aren't just in the above table as descriptions | |
421 | of the operators -- they're also supported as operators in their own | |
422 | right. They're more readable than the C-style operators, but have | |
423 | different precedence to C<&&> and friends. Check L<perlop> for more | |
424 | detail.) | |
425 | ||
426 | =item Miscellaneous | |
427 | ||
428 | = assignment | |
429 | . string concatenation | |
430 | x string multiplication | |
431 | .. range operator (creates a list of numbers) | |
432 | ||
433 | =back | |
434 | ||
435 | Many operators can be combined with a C<=> as follows: | |
436 | ||
437 | $a += 1; # same as $a = $a + 1 | |
438 | $a -= 1; # same as $a = $a - 1 | |
439 | $a .= "\n"; # same as $a = $a . "\n"; | |
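The difference between the numeric and string comparison operators is easiest to see by applying both to the same pair of values. A small sketch, assuming nothing beyond the operators listed above plus the C<?:> conditional operator from L<perlop>; the variable names are illustrative only.

```perl
use strict;
use warnings;

my ($x, $y) = (99, 100);

# < compares the values as numbers: 99 is less than 100.
my $numeric = ($x < $y) ? "less" : "not less";

# lt compares them as strings: "99" sorts after "100",
# because "9" comes after "1".
my $string = ($x lt $y) ? "less" : "not less";

print "numerically, $x is $numeric than $y\n";
print "as strings,  $x is $string than $y\n";

my $rule = "-" x 10;      # string multiplication
$rule .= "\n";            # same as $rule = $rule . "\n"
my @range = (1 .. 5);     # range operator creates a list of numbers

print $rule;
print "@range\n";
```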
440 | ||
441 | =head2 Files and I/O | |
442 | ||
443 | You can open a file for input or output using the C<open()> function. | |
444 | It's documented in extravagant detail in L<perlfunc> and L<perlopentut>, | |
445 | but in short: | |
446 | ||
447 | open(INFILE, "input.txt") or die "Can't open input.txt: $!"; | |
448 | open(OUTFILE, ">output.txt") or die "Can't open output.txt: $!"; | |
449 | open(LOGFILE, ">>my.log") or die "Can't open logfile: $!"; | |
450 | ||
451 | You can read from an open filehandle using the C<< <> >> operator. In | |
452 | scalar context it reads a single line from the filehandle, and in list | |
453 | context it reads the whole file in, assigning each line to an element of | |
454 | the list: | |
455 | ||
456 | my $line = <INFILE>; | |
457 | my @lines = <INFILE>; | |
458 | ||
459 | Reading in the whole file at one time is called slurping. It can | |
460 | be useful but it may be a memory hog. Most text file processing | |
461 | can be done a line at a time with Perl's looping constructs. | |
462 | ||
463 | The C<< <> >> operator is most often seen in a C<while> loop: | |
464 | ||
465 | while (<INFILE>) { # assigns each line in turn to $_ | |
466 | print "Just read in this line: $_"; | |
467 | } | |
468 | ||
469 | We've already seen how to print to standard output using C<print()>. | |
470 | However, C<print()> can also take an optional first argument specifying | |
471 | which filehandle to print to: | |
472 | ||
473 | print STDERR "This is your final warning.\n"; | |
474 | print OUTFILE $record; | |
475 | print LOGFILE $logmessage; | |
476 | ||
477 | When you're done with your filehandles, you should C<close()> them | |
478 | (though to be honest, Perl will clean up after you if you forget): | |
479 | ||
480 | close INFILE; | |
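The examples above use bareword filehandles such as C<INFILE>. As an alternative sketch, C<open()> also accepts a lexical variable as the filehandle and a separate mode argument, both of which are described in L<perlopentut>. The scratch file comes from the core C<File::Temp> module so that the example is self-contained; C<$path>, C<$out>, C<$in> and C<@kept> are illustrative names.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Open for output and write three lines, one of them blank.
open(my $out, ">", $path) or die "Can't open $path for writing: $!";
print $out "first line\n", "\n", "last line\n";
close $out or die "Can't close $path: $!";

# Open for input and read the file back a line at a time.
open(my $in, "<", $path) or die "Can't open $path for reading: $!";
my @kept;
while (my $line = <$in>) {
    next if $line =~ /^$/;    # skip blank lines
    push @kept, $line;
}
close $in;

print "Just read in this line: $_" foreach @kept;
```

A lexical filehandle follows the same scoping rules as any other C<my> variable, which is the main reason to prefer it in larger programs.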
481 | ||
482 | =head2 Regular expressions | |
483 | ||
484 | Perl's regular expression support is both broad and deep, and is the | |
485 | subject of lengthy documentation in L<perlrequick>, L<perlretut>, and | |
486 | elsewhere. However, in short: | |
487 | ||
488 | =over 4 | |
489 | ||
490 | =item Simple matching | |
491 | ||
492 | if (/foo/) { ... } # true if $_ contains "foo" | |
493 | if ($a =~ /foo/) { ... } # true if $a contains "foo" | |
494 | ||
495 | The C<//> matching operator is documented in L<perlop>. It operates on | |
496 | C<$_> by default, or can be bound to another variable using the C<=~> | |
497 | binding operator (also documented in L<perlop>). | |
498 | ||
499 | =item Simple substitution | |
500 | ||
501 | s/foo/bar/; # replaces foo with bar in $_ | |
502 | $a =~ s/foo/bar/; # replaces foo with bar in $a | |
503 | $a =~ s/foo/bar/g; # replaces ALL INSTANCES of foo with bar in $a | |
504 | ||
505 | The C<s///> substitution operator is documented in L<perlop>. | |
506 | ||
507 | =item More complex regular expressions | |
508 | ||
509 | You don't just have to match on fixed strings. In fact, you can match | |
510 | on just about anything you could dream of by using more complex regular | |
511 | expressions. These are documented at great length in L<perlre>, but for | |
512 | the meantime, here's a quick cheat sheet: | |
513 | ||
514 | . a single character | |
515 | \s a whitespace character (space, tab, newline) | |
516 | \S non-whitespace character | |
517 | \d a digit (0-9) | |
518 | \D a non-digit | |
519 | \w a word character (a-z, A-Z, 0-9, _) | |
520 | \W a non-word character | |
521 | [aeiou] matches a single character in the given set | |
522 | [^aeiou] matches a single character outside the given set | |
523 | (foo|bar|baz) matches any of the alternatives specified | |
524 | ||
525 | ^ start of string | |
526 | $ end of string | |
527 | ||
528 | Quantifiers can be used to specify how many of the previous thing you | |
529 | want to match on, where "thing" means either a literal character, one | |
530 | of the metacharacters listed above, or a group of characters or | |
531 | metacharacters in parentheses. | |
532 | ||
533 | * zero or more of the previous thing | |
534 | + one or more of the previous thing | |
535 | ? zero or one of the previous thing | |
536 | {3} matches exactly 3 of the previous thing | |
537 | {3,6} matches between 3 and 6 of the previous thing | |
538 | {3,} matches 3 or more of the previous thing | |
539 | ||
540 | Some brief examples: | |
541 | ||
542 | /^\d+/ string starts with one or more digits | |
543 | /^$/ nothing in the string (start and end are adjacent) | |
544 | /(\d\s){3}/ three digits, each followed by a whitespace | |
545 | character (eg "3 4 5 ") | |
546 | /(a.)+/ matches a string in which every odd-numbered letter | |
547 | is a (eg "abacadaf") | |
548 | ||
549 | # This loop reads from STDIN, and prints non-blank lines: | |
550 | while (<>) { | |
551 | next if /^$/; | |
552 | print; | |
553 | } | |
554 | ||
555 | =item Parentheses for capturing | |
556 | ||
557 | As well as grouping, parentheses serve a second purpose. They can be | |
558 | used to capture the results of parts of the regexp match for later use. | |
559 | The results end up in C<$1>, C<$2> and so on. | |
560 | ||
561 | # a cheap and nasty way to break an email address up into parts | |
562 | ||
563 | if ($email =~ /([^@]+)@(.+)/) { | |
564 | print "Username is $1\n"; | |
565 | print "Hostname is $2\n"; | |
566 | } | |
567 | ||
568 | =item Other regexp features | |
569 | ||
570 | Perl regexps also support backreferences, lookaheads, and all kinds of | |
571 | other complex details. Read all about them in L<perlrequick>, | |
572 | L<perlretut>, and L<perlre>. | |
573 | ||
574 | =back | |
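The matching, substitution and capturing operations above can be combined in one short script. This is a minimal sketch: the address and all variable names are hypothetical, and copying a string before substituting with C<(my $copy = $text) =~ s/.../.../> is one common idiom rather than the only way to do it.

```perl
use strict;
use warnings;

# Capturing: split a (hypothetical) email address into two parts.
my $email = 'skud@example.org';
my ($user, $host);
if ($email =~ /([^@]+)@(.+)/) {
    ($user, $host) = ($1, $2);
}
print "Username is $user\n";
print "Hostname is $host\n";

# Substitution: with and without the /g modifier.
my $text = "foo foo foo";
(my $once = $text) =~ s/foo/bar/;     # replaces the first foo only
(my $all  = $text) =~ s/foo/bar/g;    # replaces every foo
print "$once\n$all\n";

# Quantifiers: three digits, each followed by a whitespace character.
my $matched = ("3 4 5 " =~ /(\d\s){3}/) ? "yes" : "no";
print "matched: $matched\n";
```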
575 | ||
576 | =head2 Writing subroutines | |
577 | ||
578 | Writing subroutines is easy: | |
579 | ||
580 | sub logger { | |
581 | my $logmessage = shift; | |
582 | print LOGFILE $logmessage; | |
583 | } | |
584 | ||
585 | What's that C<shift>? Well, the arguments to a subroutine are available | |
586 | to us as a special array called C<@_> (see L<perlvar> for more on that). | |
587 | The default argument to the C<shift> function just happens to be C<@_>. | |
588 | So C<my $logmessage = shift;> shifts the first item off the list of | |
589 | arguments and assigns it to C<$logmessage>. | |
590 | ||
591 | We can manipulate C<@_> in other ways too: | |
592 | ||
593 | my ($logmessage, $priority) = @_; # common | |
594 | my $logmessage = $_[0]; # uncommon, and ugly | |
595 | ||
596 | Subroutines can also return values: | |
597 | ||
598 | sub square { | |
599 | my $num = shift; | |
600 | my $result = $num * $num; | |
601 | return $result; | |
602 | } | |
603 | ||
604 | For more information on writing subroutines, see L<perlsub>. | |
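Putting argument handling and return values together, a sketch along the following lines shows both ways of reading C<@_>. The C<format_log> subroutine and its default priority of C<"info"> are invented for this example and are not part of any module.

```perl
use strict;
use warnings;

# Takes one argument using shift and returns a value.
sub square {
    my $num = shift;
    return $num * $num;
}

# Hypothetical helper: unpacks two arguments from @_ in one go.
sub format_log {
    my ($logmessage, $priority) = @_;
    $priority = "info" unless defined $priority;   # assumed default
    return "[$priority] $logmessage\n";
}

print format_log("the square of 7 is " . square(7));
print format_log("disk almost full", "warn");
```

Returning the formatted string, rather than printing inside the subroutine, leaves the caller free to send it to whichever filehandle is appropriate.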
605 | ||
606 | =head2 OO Perl | |
607 | ||
608 | OO Perl is relatively simple and is implemented using references which | |
609 | know what sort of object they are, based on Perl's concept of packages. | |
610 | However, OO Perl is largely beyond the scope of this document. | |
611 | Read L<perlboot>, L<perltoot>, L<perltooc> and L<perlobj>. | |
612 | ||
613 | As a beginning Perl programmer, your most common use of OO Perl will be | |
614 | in using third-party modules, which are documented below. | |
615 | ||
616 | =head2 Using Perl modules | |
617 | ||
618 | Perl modules provide a range of features to help you avoid reinventing | |
619 | the wheel, and can be downloaded from CPAN ( http://www.cpan.org/ ). A | |
620 | number of popular modules are included with the Perl distribution | |
621 | itself. | |
622 | ||
623 | Categories of modules range from text manipulation to network protocols | |
624 | to database integration to graphics. A categorized list of modules is | |
625 | also available from CPAN. | |
626 | ||
627 | To learn how to install modules you download from CPAN, read | |
628 | L<perlmodinstall>. | |
629 | ||
630 | To learn how to use a particular module, use C<perldoc I<Module::Name>>. | |
631 | Typically you will want to C<use I<Module::Name>>, which will then give | |
632 | you access to exported functions or an OO interface to the module. | |
633 | ||
634 | L<perlfaq> contains questions and answers related to many common | |
635 | tasks, and often provides suggestions for good CPAN modules to use. | |
636 | ||
637 | L<perlmod> describes Perl modules in general. L<perlmodlib> lists the | |
638 | modules which came with your Perl installation. | |
639 | ||
640 | If you feel the urge to write Perl modules, L<perlnewmod> will give you | |
641 | good advice. | |
642 | ||
643 | =head1 AUTHOR | |
644 | ||
645 | Kirrily "Skud" Robert <skud@cpan.org> |