Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | =head1 NAME |
2 | X<subroutine> X<function> | |
3 | ||
4 | perlsub - Perl subroutines | |
5 | ||
6 | =head1 SYNOPSIS | |
7 | ||
8 | To declare subroutines: | |
9 | X<subroutine, declaration> X<sub> | |
10 | ||
11 | sub NAME; # A "forward" declaration. | |
12 | sub NAME(PROTO); # ditto, but with prototypes | |
13 | sub NAME : ATTRS; # with attributes | |
14 | sub NAME(PROTO) : ATTRS; # with attributes and prototypes | |
15 | ||
16 | sub NAME BLOCK # A declaration and a definition. | |
17 | sub NAME(PROTO) BLOCK # ditto, but with prototypes | |
18 | sub NAME : ATTRS BLOCK # with attributes | |
19 | sub NAME(PROTO) : ATTRS BLOCK # with prototypes and attributes | |
20 | ||
21 | To define an anonymous subroutine at runtime: | |
22 | X<subroutine, anonymous> | |
23 | ||
24 | $subref = sub BLOCK; # no proto | |
25 | $subref = sub (PROTO) BLOCK; # with proto | |
26 | $subref = sub : ATTRS BLOCK; # with attributes | |
27 | $subref = sub (PROTO) : ATTRS BLOCK; # with proto and attributes | |
28 | ||
29 | To import subroutines: | |
30 | X<import> | |
31 | ||
32 | use MODULE qw(NAME1 NAME2 NAME3); | |
33 | ||
34 | To call subroutines: | |
35 | X<subroutine, call> X<call> | |
36 | ||
37 | NAME(LIST); # & is optional with parentheses. | |
38 | NAME LIST; # Parentheses optional if predeclared/imported. | |
39 | &NAME(LIST); # Circumvent prototypes. | |
40 | &NAME; # Makes current @_ visible to called subroutine. | |
41 | ||
42 | =head1 DESCRIPTION | |
43 | ||
44 | Like many languages, Perl provides for user-defined subroutines. | |
45 | These may be located anywhere in the main program, loaded in from | |
46 | other files via the C<do>, C<require>, or C<use> keywords, or | |
47 | generated on the fly using C<eval> or anonymous subroutines. | |
48 | You can even call a function indirectly using a variable containing | |
49 | its name or a CODE reference. | |
50 | ||
51 | The Perl model for function call and return values is simple: all | |
52 | functions are passed as parameters one single flat list of scalars, and | |
53 | all functions likewise return to their caller one single flat list of | |
54 | scalars. Any arrays or hashes in these call and return lists will | |
55 | collapse, losing their identities--but you may always use | |
56 | pass-by-reference instead to avoid this. Both call and return lists may | |
57 | contain as many or as few scalar elements as you'd like. (Often a | |
58 | function without an explicit return statement is called a subroutine, but | |
59 | there's really no difference from Perl's perspective.) | |
60 | X<subroutine, parameter> X<parameter> | |
61 | ||
62 | Any arguments passed in show up in the array C<@_>. Therefore, if | |
63 | you called a function with two arguments, those would be stored in | |
64 | C<$_[0]> and C<$_[1]>. The array C<@_> is a local array, but its | |
65 | elements are aliases for the actual scalar parameters. In particular, | |
66 | if an element C<$_[0]> is updated, the corresponding argument is | |
67 | updated (or an error occurs if it is not updatable). If an argument | |
68 | is an array or hash element which did not exist when the function | |
69 | was called, that element is created only when (and if) it is modified | |
70 | or a reference to it is taken. (Some earlier versions of Perl | |
71 | created the element whether or not the element was assigned to.) | |
72 | Assigning to the whole array C<@_> removes that aliasing, and does | |
73 | not update any arguments. | |
74 | X<subroutine, argument> X<argument> X<@_> | |
75 | ||
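A minimal sketch of the aliasing described above. The subroutine names C<double_in_place> and C<no_effect> are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: modifies its first argument through the @_ alias.
sub double_in_place {
    $_[0] *= 2;          # $_[0] is an alias for the caller's variable
}

# Hypothetical helper: assigning to the whole of @_ removes the aliasing.
sub no_effect {
    @_ = (99);           # @_ now holds new scalars; the caller is untouched
}

my $n = 21;
double_in_place($n);     # $n is now 42

my $m = 5;
no_effect($m);           # $m is still 5

print "$n $m\n";
```

Only element-wise updates such as C<$_[0] *= 2> reach the caller; replacing C<@_> wholesale does not.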
76 | A C<return> statement may be used to exit a subroutine, optionally | |
77 | specifying the returned value, which will be evaluated in the | |
78 | appropriate context (list, scalar, or void) depending on the context of | |
79 | the subroutine call. If you specify no return value, the subroutine | |
80 | returns an empty list in list context, the undefined value in scalar | |
81 | context, or nothing in void context. If you return one or more | |
82 | aggregates (arrays and hashes), these will be flattened together into | |
83 | one large indistinguishable list. | |
84 | ||
85 | If no C<return> is found and if the last statement is an expression, its | |
86 | value is returned. If the last statement is a loop control structure | |
87 | like a C<foreach> or a C<while>, the returned value is unspecified. The | |
88 | empty sub returns the empty list. | |
89 | X<subroutine, return value> X<return value> X<return> | |
90 | ||
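The return rules above can be sketched as follows. The subroutines C<nothing> and C<sum2> are hypothetical names used only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical sub with a bare return: the result depends on the
# context of the call.
sub nothing { return }

# Hypothetical sub with no explicit return: the value of the last
# expression evaluated is returned.
sub sum2 { $_[0] + $_[1] }

my @list   = nothing();   # empty list in list context
my $scalar = nothing();   # undefined value in scalar context
my $total  = sum2(2, 3);  # value of the last expression

printf "%d element(s), scalar is %s, total %d\n",
    scalar(@list), defined $scalar ? "defined" : "undef", $total;
```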
91 | Perl does not have named formal parameters. In practice all you | |
92 | do is assign to a C<my()> list of these. Variables that aren't | |
93 | declared to be private are global variables. For gory details | |
94 | on creating private variables, see L<"Private Variables via my()"> | |
95 | and L<"Temporary Values via local()">. To create protected | |
96 | environments for a set of functions in a separate package (and | |
97 | probably a separate file), see L<perlmod/"Packages">. | |
98 | X<formal parameter> X<parameter, formal> | |
99 | ||
100 | Example: | |
101 | ||
102 | sub max { | |
103 | my $max = shift(@_); | |
104 | foreach $foo (@_) { | |
105 | $max = $foo if $max < $foo; | |
106 | } | |
107 | return $max; | |
108 | } | |
109 | $bestday = max($mon,$tue,$wed,$thu,$fri); | |
110 | ||
111 | Example: | |
112 | ||
113 | # get a line, combining continuation lines | |
114 | # that start with whitespace | |
115 | ||
116 | sub get_line { | |
117 | $thisline = $lookahead; # global variables! | |
118 | LINE: while (defined($lookahead = <STDIN>)) { | |
119 | if ($lookahead =~ /^[ \t]/) { | |
120 | $thisline .= $lookahead; | |
121 | } | |
122 | else { | |
123 | last LINE; | |
124 | } | |
125 | } | |
126 | return $thisline; | |
127 | } | |
128 | ||
129 | $lookahead = <STDIN>; # get first line | |
130 | while (defined($line = get_line())) { | |
131 | ... | |
132 | } | |
133 | ||
134 | Assigning to a list of private variables to name your arguments: | |
135 | ||
136 | sub maybeset { | |
137 | my($key, $value) = @_; | |
138 | $Foo{$key} = $value unless $Foo{$key}; | |
139 | } | |
140 | ||
141 | Because the assignment copies the values, this also has the effect | |
142 | of turning call-by-reference into call-by-value. Otherwise a | |
143 | function is free to do in-place modifications of C<@_> and change | |
144 | its caller's values. | |
145 | X<call-by-reference> X<call-by-value> | |
146 | ||
147 | upcase_in($v1, $v2); # this changes $v1 and $v2 | |
148 | sub upcase_in { | |
149 | for (@_) { tr/a-z/A-Z/ } | |
150 | } | |
151 | ||
152 | You aren't allowed to modify constants in this way, of course. If an | |
153 | argument were actually literal and you tried to change it, you'd take a | |
154 | (presumably fatal) exception. For example, this won't work: | |
155 | X<call-by-reference> X<call-by-value> | |
156 | ||
157 | upcase_in("frederick"); | |
158 | ||
159 | It would be much safer if the C<upcase_in()> function | |
160 | were written to return a copy of its parameters instead | |
161 | of changing them in place: | |
162 | ||
163 | ($v3, $v4) = upcase($v1, $v2); # this doesn't change $v1 and $v2 | |
164 | sub upcase { | |
165 | return unless defined wantarray; # void context, do nothing | |
166 | my @parms = @_; | |
167 | for (@parms) { tr/a-z/A-Z/ } | |
168 | return wantarray ? @parms : $parms[0]; | |
169 | } | |
170 | ||
171 | Notice how this (unprototyped) function doesn't care whether it was | |
172 | passed real scalars or arrays. Perl sees all arguments as one big, | |
173 | long, flat parameter list in C<@_>. This is one area where | |
174 | Perl's simple argument-passing style shines. The C<upcase()> | |
175 | function would work perfectly well without changing the C<upcase()> | |
176 | definition even if we fed it things like this: | |
177 | ||
178 | @newlist = upcase(@list1, @list2); | |
179 | @newlist = upcase( split /:/, $var ); | |
180 | ||
181 | Do not, however, be tempted to do this: | |
182 | ||
183 | (@a, @b) = upcase(@list1, @list2); | |
184 | ||
185 | Like the flattened incoming parameter list, the return list is also | |
186 | flattened on return. So all you have managed to do here is store | |
187 | everything in C<@a> and make C<@b> empty. See | |
188 | L<Pass by Reference> for alternatives. | |
189 | ||
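One possible alternative, sketched here under the assumption that the caller is willing to pass and receive array references. C<upcase_lists> is a hypothetical variant of C<upcase()>, and it uses C<uc> rather than C<tr/a-z/A-Z/> for brevity:

```perl
use strict;
use warnings;

# Hypothetical variant of upcase(): takes array references and returns
# one new array reference per argument, so each list keeps its identity.
sub upcase_lists {
    my @results;
    for my $aref (@_) {
        push @results, [ map { uc } @$aref ];
    }
    return @results;      # a flat list, but of array references
}

my @list1 = qw(a b);
my @list2 = qw(c d e);
my ($first, $second) = upcase_lists(\@list1, \@list2);

print "@$first | @$second\n";
```

The return list is still flattened, but because each element is a reference, the two result lists remain distinguishable.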
190 | A subroutine may be called using an explicit C<&> prefix. The | |
191 | C<&> is optional in modern Perl, as are parentheses if the | |
192 | subroutine has been predeclared. The C<&> is I<not> optional | |
193 | when just naming the subroutine, such as when it's used as | |
194 | an argument to defined() or undef(). Nor is it optional when you | |
195 | want to do an indirect subroutine call with a subroutine name or | |
196 | reference using the C<&$subref()> or C<&{$subref}()> constructs, | |
197 | although the C<< $subref->() >> notation solves that problem. | |
198 | See L<perlref> for more about all that. | |
199 | X<&> | |
200 | ||
201 | Subroutines may be called recursively. If a subroutine is called | |
202 | using the C<&> form, the argument list is optional, and if omitted, | |
203 | no C<@_> array is set up for the subroutine: the C<@_> array at the | |
204 | time of the call is visible to the subroutine instead. This is an | |
205 | efficiency mechanism that new users may wish to avoid. | |
206 | X<recursion> | |
207 | ||
208 | &foo(1,2,3); # pass three arguments | |
209 | foo(1,2,3); # the same | |
210 | ||
211 | foo(); # pass a null list | |
212 | &foo(); # the same | |
213 | ||
214 | &foo; # foo() gets current args, like foo(@_) !! | |
215 | foo; # like foo() IFF sub foo predeclared, else "foo" | |
216 | ||
217 | Not only does the C<&> form make the argument list optional, it also | |
218 | disables any prototype checking on arguments you do provide. This | |
219 | is partly for historical reasons, and partly for having a convenient way | |
220 | to cheat if you know what you're doing. See L<Prototypes> below. | |
221 | X<&> | |
222 | ||
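A short sketch of the difference between C<&NAME;> and C<&NAME();>. The names C<count_args> and C<outer> are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical inner sub that reports how many arguments it sees.
sub count_args { return scalar @_ }

sub outer {
    my $shared = &count_args;    # no parentheses: sees outer's @_ as-is
    my $empty  = &count_args();  # parentheses: a new, empty @_
    return ($shared, $empty);
}

my ($shared, $empty) = outer(1, 2, 3);
print "$shared $empty\n";
```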
223 | Subroutines whose names are in all upper case are reserved to the Perl | |
224 | core, as are modules whose names are in all lower case. A subroutine in | |
225 | all capitals is a loosely-held convention meaning it will be called | |
226 | indirectly by the run-time system itself, usually due to a triggered event. | |
227 | Subroutines that do special, pre-defined things include C<AUTOLOAD>, C<CLONE>, | |
228 | C<DESTROY> plus all functions mentioned in L<perltie> and L<PerlIO::via>. | |
229 | ||
230 | The C<BEGIN>, C<CHECK>, C<INIT> and C<END> subroutines are not so much | |
231 | subroutines as named special code blocks, of which you can have more | |
232 | than one in a package, and which you can B<not> call explicitly. See | |
233 | L<perlmod/"BEGIN, CHECK, INIT and END">. | |
234 | ||
235 | =head2 Private Variables via my() | |
236 | X<my> X<variable, lexical> X<lexical> X<lexical variable> X<scope, lexical> | |
237 | X<lexical scope> X<attributes, my> | |
238 | ||
239 | Synopsis: | |
240 | ||
241 | my $foo; # declare $foo lexically local | |
242 | my (@wid, %get); # declare list of variables local | |
243 | my $foo = "flurp"; # declare $foo lexical, and init it | |
244 | my @oof = @bar; # declare @oof lexical, and init it | |
245 | my $x : Foo = $y; # similar, with an attribute applied | |
246 | ||
247 | B<WARNING>: The use of attribute lists on C<my> declarations is still | |
248 | evolving. The current semantics and interface are subject to change. | |
249 | See L<attributes> and L<Attribute::Handlers>. | |
250 | ||
251 | The C<my> operator declares the listed variables to be lexically | |
252 | confined to the enclosing block, conditional (C<if/unless/elsif/else>), | |
253 | loop (C<for/foreach/while/until/continue>), subroutine, C<eval>, | |
254 | or C<do/require/use>'d file. If more than one value is listed, the | |
255 | list must be placed in parentheses. All listed elements must be | |
256 | legal lvalues. Only alphanumeric identifiers may be lexically | |
257 | scoped--magical built-ins like C<$/> must currently be C<local>ized | |
258 | with C<local> instead. | |
259 | ||
260 | Unlike dynamic variables created by the C<local> operator, lexical | |
261 | variables declared with C<my> are totally hidden from the outside | |
262 | world, including any called subroutines. This is true if it's the | |
263 | same subroutine called from itself or elsewhere--every call gets | |
264 | its own copy. | |
265 | X<local> | |
266 | ||
267 | This doesn't mean that a C<my> variable declared in a statically | |
268 | enclosing lexical scope would be invisible. Only dynamic scopes | |
269 | are cut off. For example, the C<bumpx()> function below has access | |
270 | to the lexical $x variable because both the C<my> and the C<sub> | |
271 | occurred at the same scope, presumably file scope. | |
272 | ||
273 | my $x = 10; | |
274 | sub bumpx { $x++ } | |
275 | ||
276 | An C<eval()>, however, can see lexical variables of the scope it is | |
277 | being evaluated in, so long as the names aren't hidden by declarations within | |
278 | the C<eval()> itself. See L<perlref>. | |
279 | X<eval, scope of> | |
280 | ||
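As a small illustration, assuming a string C<eval>, the evaluated code can read an enclosing lexical unless it declares its own variable of the same name:

```perl
use strict;
use warnings;

my $x = 10;

# The string eval sees the enclosing lexical $x.
my $seen = eval '$x + 1';

# A declaration inside the eval hides the outer $x.
my $hidden = eval 'my $x = 1; $x + 1';

print "$seen $hidden\n";
```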
281 | The parameter list to my() may be assigned to if desired, which allows you | |
282 | to initialize your variables. (If no initializer is given for a | |
283 | particular variable, it is created with the undefined value.) Commonly | |
284 | this is used to name input parameters to a subroutine. Examples: | |
285 | ||
286 | $arg = "fred"; # "global" variable | |
287 | $n = cube_root(27); | |
288 | print "$arg thinks the root is $n\n"; | |
289 | fred thinks the root is 3 | |
290 | ||
291 | sub cube_root { | |
292 | my $arg = shift; # name doesn't matter | |
293 | $arg **= 1/3; | |
294 | return $arg; | |
295 | } | |
296 | ||
297 | The C<my> is simply a modifier on something you might assign to. So when | |
298 | you do assign to variables in its argument list, C<my> doesn't | |
299 | change whether those variables are viewed as a scalar or an array. So | |
300 | ||
301 | my ($foo) = <STDIN>; # WRONG? | |
302 | my @FOO = <STDIN>; | |
303 | ||
304 | both supply a list context to the right-hand side, while | |
305 | ||
306 | my $foo = <STDIN>; | |
307 | ||
308 | supplies a scalar context. But the following declares only one variable: | |
309 | ||
310 | my $foo, $bar = 1; # WRONG | |
311 | ||
312 | That has the same effect as | |
313 | ||
314 | my $foo; | |
315 | $bar = 1; | |
316 | ||
317 | The declared variable is not introduced (is not visible) until after | |
318 | the current statement. Thus, | |
319 | ||
320 | my $x = $x; | |
321 | ||
322 | can be used to initialize a new $x with the value of the old $x, and | |
323 | the expression | |
324 | ||
325 | my $x = 123 and $x == 123 | |
326 | ||
327 | is false unless the old $x happened to have the value C<123>. | |
328 | ||
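A brief sketch of this visibility rule, using a nested block so that the two variables named C<$x> are distinct lexicals:

```perl
use strict;
use warnings;

my $x = 10;
my $inner;
{
    # On the right-hand side the new $x is not yet visible,
    # so this reads the outer $x.
    my $x = $x + 1;
    $inner = $x;
}

print "$inner $x\n";
```

The outer C<$x> is unchanged once the block exits.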
329 | Lexical scopes of control structures are not bounded precisely by the | |
330 | braces that delimit their controlled blocks; control expressions are | |
331 | part of that scope, too. Thus in the loop | |
332 | ||
333 | while (my $line = <>) { | |
334 | $line = lc $line; | |
335 | } continue { | |
336 | print $line; | |
337 | } | |
338 | ||
339 | the scope of $line extends from its declaration throughout the rest of | |
340 | the loop construct (including the C<continue> clause), but not beyond | |
341 | it. Similarly, in the conditional | |
342 | ||
343 | if ((my $answer = <STDIN>) =~ /^yes$/i) { | |
344 | user_agrees(); | |
345 | } elsif ($answer =~ /^no$/i) { | |
346 | user_disagrees(); | |
347 | } else { | |
348 | chomp $answer; | |
349 | die "'$answer' is neither 'yes' nor 'no'"; | |
350 | } | |
351 | ||
352 | the scope of $answer extends from its declaration through the rest | |
353 | of that conditional, including any C<elsif> and C<else> clauses, | |
354 | but not beyond it. See L<perlsyn/"Simple statements"> for information | |
355 | on the scope of variables in statements with modifiers. | |
356 | ||
357 | The C<foreach> loop defaults to scoping its index variable dynamically | |
358 | in the manner of C<local>. However, if the index variable is | |
359 | prefixed with the keyword C<my>, or if there is already a lexical | |
360 | by that name in scope, then a new lexical is created instead. Thus | |
361 | in the loop | |
362 | X<foreach> X<for> | |
363 | ||
364 | for my $i (1, 2, 3) { | |
365 | some_function(); | |
366 | } | |
367 | ||
368 | the scope of $i extends to the end of the loop, but not beyond it, | |
369 | rendering the value of $i inaccessible within C<some_function()>. | |
370 | X<foreach> X<for> | |
371 | ||
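To make this concrete, here is a sketch in which a hypothetical C<peek()> function stands in for C<some_function()>. It reads the package variable C<$main::i>, which is separate from the lexical loop variable:

```perl
use strict;
use warnings;

$main::i = "package";

# Hypothetical helper: can only see the package variable $main::i.
sub peek { return $main::i }

my @seen;
for my $i (1, 2) {         # a new lexical $i, invisible inside peek()
    push @seen, peek();
}

print "@seen\n";
```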
372 | Some users may wish to encourage the use of lexically scoped variables. | |
373 | As an aid to catching implicit uses of package variables, | |
374 | which are always global, if you say | |
375 | ||
376 | use strict 'vars'; | |
377 | ||
378 | then any variable mentioned from there to the end of the enclosing | |
379 | block must either refer to a lexical variable, be predeclared via | |
380 | C<our> or C<use vars>, or else must be fully qualified with the package name. | |
381 | A compilation error results otherwise. An inner block may countermand | |
382 | this with C<no strict 'vars'>. | |
383 | ||
384 | A C<my> has both a compile-time and a run-time effect. At compile | |
385 | time, the compiler takes notice of it. The principal usefulness | |
386 | of this is to quiet C<use strict 'vars'>, but it is also essential | |
387 | for generation of closures as detailed in L<perlref>. Actual | |
388 | initialization is delayed until run time, though, so it gets executed | |
389 | at the appropriate time, such as each time through a loop, for | |
390 | example. | |
391 | ||
392 | Variables declared with C<my> are not part of any package and are therefore | |
393 | never fully qualified with the package name. In particular, you're not | |
394 | allowed to try to make a package variable (or other global) lexical: | |
395 | ||
396 | my $pack::var; # ERROR! Illegal syntax | |
397 | my $_; # also illegal (currently) | |
398 | ||
399 | In fact, dynamic variables (also known as package or global variables) | |
400 | are still accessible using the fully qualified C<::> notation even while a | |
401 | lexical of the same name is also visible: | |
402 | ||
403 | package main; | |
404 | local $x = 10; | |
405 | my $x = 20; | |
406 | print "$x and $::x\n"; | |
407 | ||
408 | That will print out C<20> and C<10>. | |
409 | ||
410 | You may declare C<my> variables at the outermost scope of a file | |
411 | to hide any such identifiers from the world outside that file. This | |
412 | is similar in spirit to C's static variables when they are used at | |
413 | the file level. To do this with a subroutine requires the use of | |
414 | a closure (an anonymous function that accesses enclosing lexicals). | |
415 | If you want to create a private subroutine that cannot be called | |
416 | from outside that block, you can declare a lexical variable containing | |
417 | an anonymous sub reference: | |
418 | ||
419 | my $secret_version = '1.001-beta'; | |
420 | my $secret_sub = sub { print $secret_version }; | |
421 | &$secret_sub(); | |
422 | ||
423 | As long as the reference is never returned by any function within the | |
424 | module, no outside module can see the subroutine, because its name is not in | |
425 | any package's symbol table. Remember that it's not I<REALLY> called | |
426 | C<$some_pack::secret_version> or anything; it's just $secret_version, | |
427 | unqualified and unqualifiable. | |
428 | ||
429 | This does not work with object methods, however; all object methods | |
430 | have to be in the symbol table of some package to be found. See | |
431 | L<perlref/"Function Templates"> for something of a work-around to | |
432 | this. | |
433 | ||
434 | =head2 Persistent Private Variables | |
435 | X<static> X<variable, persistent> X<variable, static> X<closure> | |
436 | ||
437 | Just because a lexical variable is lexically (also called statically) | |
438 | scoped to its enclosing block, C<eval>, or C<do> FILE, this doesn't mean that | |
439 | within a function it works like a C static. It normally works more | |
440 | like a C auto, but with implicit garbage collection. | |
441 | ||
442 | Unlike local variables in C or C++, Perl's lexical variables don't | |
443 | necessarily get recycled just because their scope has exited. | |
444 | If something more permanent is still aware of the lexical, it will | |
445 | stick around. So long as something else references a lexical, that | |
446 | lexical won't be freed--which is as it should be. You wouldn't want | |
447 | memory being freed until you were done using it, or kept around once you | |
448 | were done. Automatic garbage collection takes care of this for you. | |
449 | ||
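A minimal sketch of a lexical outliving its scope. C<make_counter> is a hypothetical name; each call creates a new C<$count> that stays alive for as long as the returned closure refers to it:

```perl
use strict;
use warnings;

# Hypothetical generator: returns a closure over its own private $count.
sub make_counter {
    my $count = @_ ? shift : 0;
    return sub { return $count++ };
}

my $c1 = make_counter(5);
my $c2 = make_counter();

# Each closure keeps its own $count alive after make_counter has returned.
my @got = ($c1->(), $c1->(), $c2->());
print "@got\n";
```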
450 | This means that you can pass back or save away references to lexical | |
451 | variables, whereas to return a pointer to a C auto is a grave error. | |
452 | It also gives us a way to simulate C's function statics. Here's a | |
453 | mechanism for giving a function private variables with both lexical | |
454 | scoping and a static lifetime. If you do want to create something like | |
455 | C's static variables, just enclose the whole function in an extra block, | |
456 | and put the static variable outside the function but in the block. | |
457 | ||
458 | { | |
459 | my $secret_val = 0; | |
460 | sub gimme_another { | |
461 | return ++$secret_val; | |
462 | } | |
463 | } | |
464 | # $secret_val now becomes unreachable by the outside | |
465 | # world, but retains its value between calls to gimme_another | |
466 | ||
467 | If this function is being sourced in from a separate file | |
468 | via C<require> or C<use>, then this is probably just fine. If it's | |
469 | all in the main program, you'll need to arrange for the C<my> | |
470 | to be executed early, either by putting the whole block above | |
471 | your main program, or more likely, placing merely a C<BEGIN> | |
472 | code block around it to make sure it gets executed before your program | |
473 | starts to run: | |
474 | ||
475 | BEGIN { | |
476 | my $secret_val = 0; | |
477 | sub gimme_another { | |
478 | return ++$secret_val; | |
479 | } | |
480 | } | |
481 | ||
482 | See L<perlmod/"BEGIN, CHECK, INIT and END"> about the | |
483 | special triggered code blocks, C<BEGIN>, C<CHECK>, C<INIT> and C<END>. | |
484 | ||
485 | If declared at the outermost scope (the file scope), then lexicals | |
486 | work somewhat like C's file statics. They are available to all | |
487 | functions in that same file declared below them, but are inaccessible | |
488 | from outside that file. This strategy is sometimes used in modules | |
489 | to create private variables that the whole module can see. | |
490 | ||
491 | =head2 Temporary Values via local() | |
492 | X<local> X<scope, dynamic> X<dynamic scope> X<variable, local> | |
493 | X<variable, temporary> | |
494 | ||
495 | B<WARNING>: In general, you should be using C<my> instead of C<local>, because | |
496 | it's faster and safer. Exceptions to this include the global punctuation | |
497 | variables, global filehandles and formats, and direct manipulation of the | |
498 | Perl symbol table itself. C<local> is mostly used when the current value | |
499 | of a variable must be visible to called subroutines. | |
500 | ||
501 | Synopsis: | |
502 | ||
503 | # localization of values | |
504 | ||
505 | local $foo; # make $foo dynamically local | |
506 | local (@wid, %get); # make list of variables local | |
507 | local $foo = "flurp"; # make $foo dynamic, and init it | |
508 | local @oof = @bar; # make @oof dynamic, and init it | |
509 | ||
510 | local $hash{key} = "val"; # sets a local value for this hash entry | |
511 | local ($cond ? $v1 : $v2); # several types of lvalues support | |
512 | # localization | |
513 | ||
514 | # localization of symbols | |
515 | ||
516 | local *FH; # localize $FH, @FH, %FH, &FH ... | |
517 | local *merlyn = *randal; # now $merlyn is really $randal, plus | |
518 | # @merlyn is really @randal, etc | |
519 | local *merlyn = 'randal'; # SAME THING: promote 'randal' to *randal | |
520 | local *merlyn = \$randal; # just alias $merlyn, not @merlyn etc | |
521 | ||
522 | A C<local> modifies its listed variables to be "local" to the | |
523 | enclosing block, C<eval>, or C<do FILE>--and to I<any subroutine | |
524 | called from within that block>. A C<local> just gives temporary | |
525 | values to global (meaning package) variables. It does I<not> create | |
526 | a local variable. This is known as dynamic scoping. Lexical scoping | |
527 | is done with C<my>, which works more like C's auto declarations. | |
528 | ||
529 | Some types of lvalues can be localized as well: hash and array elements | |
530 | and slices, conditionals (provided that their result is always | |
531 | localizable), and symbolic references. As for simple variables, this | |
532 | creates new, dynamically scoped values. | |
533 | ||
534 | If more than one variable or expression is given to C<local>, they must be | |
535 | placed in parentheses. This operator works | |
536 | by saving the current values of those variables in its argument list on a | |
537 | hidden stack and restoring them upon exiting the block, subroutine, or | |
538 | eval. This means that called subroutines can also reference the local | |
539 | variable, but not the global one. The argument list may be assigned to if | |
540 | desired, which allows you to initialize your local variables. (If no | |
541 | initializer is given for a particular variable, it is created with an | |
542 | undefined value.) | |
543 | ||
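The save-and-restore behaviour can be sketched like this. The names C<$level>, C<show_level> and C<with_local> are hypothetical:

```perl
use strict;
use warnings;

our $level = "global";

# Hypothetical helper: reads the package variable, whatever its
# current (possibly temporary) value is.
sub show_level { return $level }

sub with_local {
    local $level = "temporary";   # old value is restored when this sub exits
    return show_level();          # the called subroutine sees "temporary"
}

my $during = with_local();
my $after  = show_level();

print "$during $after\n";
```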
544 | Because C<local> is a run-time operator, it gets executed each time | |
545 | through a loop. Consequently, it's more efficient to localize your | |
546 | variables outside the loop. | |
547 | ||
548 | =head3 Grammatical note on local() | |
549 | X<local, context> | |
550 | ||
551 | A C<local> is simply a modifier on an lvalue expression. When you assign to | |
552 | a C<local>ized variable, the C<local> doesn't change whether its list is viewed | |
553 | as a scalar or an array. So | |
554 | ||
555 | local($foo) = <STDIN>; | |
556 | local @FOO = <STDIN>; | |
557 | ||
558 | both supply a list context to the right-hand side, while | |
559 | ||
560 | local $foo = <STDIN>; | |
561 | ||
562 | supplies a scalar context. | |
563 | ||
564 | =head3 Localization of special variables | |
565 | X<local, special variable> | |
566 | ||
567 | If you localize a special variable, you'll be giving a new value to it, | |
568 | but its magic won't go away. That means that all side-effects related | |
569 | to this magic still work with the localized value. | |
570 | ||
571 | This feature allows code like this to work: | |
572 | ||
573 | # Read the whole contents of FILE in $slurp | |
574 | { local $/ = undef; $slurp = <FILE>; } | |
575 | ||
576 | Note, however, that this restricts localization of some values; for | |
577 | example, the following statement dies, as of perl 5.9.0, with an error | |
578 | I<Modification of a read-only value attempted>, because the $1 variable is | |
579 | magical and read-only: | |
580 | ||
581 | local $1 = 2; | |
582 | ||
583 | Similarly, but in a way more difficult to spot, the following snippet will | |
584 | die in perl 5.9.0: | |
585 | ||
586 | sub f { local $_ = "foo"; print } | |
587 | for ($1) { | |
588 | # now $_ is aliased to $1, thus is magic and readonly | |
589 | f(); | |
590 | } | |
591 | ||
592 | See next section for an alternative to this situation. | |
593 | ||
594 | B<WARNING>: Localization of tied arrays and hashes does not currently | |
595 | work as described. | |
596 | This will be fixed in a future release of Perl; in the meantime, avoid | |
597 | code that relies on any particular behaviour of localising tied arrays | |
598 | or hashes (localising individual elements is still okay). | |
599 | See L<perl58delta/"Localising Tied Arrays and Hashes Is Broken"> for more | |
600 | details. | |
601 | X<local, tie> | |
602 | ||
603 | =head3 Localization of globs | |
604 | X<local, glob> X<glob> | |
605 | ||
606 | The construct | |
607 | ||
608 | local *name; | |
609 | ||
610 | creates a whole new symbol table entry for the glob C<name> in the | |
611 | current package. That means that all variables in its glob slot ($name, | |
612 | @name, %name, &name, and the C<name> filehandle) are dynamically reset. | |
613 | ||
614 | This implies, among other things, that any magic possibly carried by | |
615 | those variables is locally lost. In other words, saying C<local */> | |
616 | will not have any effect on the internal value of the input record | |
617 | separator. | |
618 | ||
619 | Notably, if you want to work with a brand new value of the default scalar | |
620 | $_, and avoid the potential problem listed above about $_ previously | |
621 | carrying a magic value, you should use C<local *_> instead of C<local $_>. | |
622 | ||
623 | =head3 Localization of elements of composite types | |
624 | X<local, composite type element> X<local, array element> X<local, hash element> | |
625 | ||
626 | It's also worth taking a moment to explain what happens when you | |
627 | C<local>ize a member of a composite type (i.e. an array or hash element). | |
628 | In this case, the element is C<local>ized I<by name>. This means that | |
629 | when the scope of the C<local()> ends, the saved value will be | |
630 | restored to the hash element whose key was named in the C<local()>, or | |
631 | the array element whose index was named in the C<local()>. If that | |
632 | element was deleted while the C<local()> was in effect (e.g. by a | |
633 | C<delete()> from a hash or a C<shift()> of an array), it will spring | |
634 | back into existence, possibly extending an array and filling in the | |
635 | skipped elements with C<undef>. For instance, if you say | |
636 | ||
637 | %hash = ( 'This' => 'is', 'a' => 'test' ); | |
638 | @ary = ( 0..5 ); | |
639 | { | |
640 | local($ary[5]) = 6; | |
641 | local($hash{'a'}) = 'drill'; | |
642 | while (my $e = pop(@ary)) { | |
643 | print "$e . . .\n"; | |
644 | last unless $e > 3; | |
645 | } | |
646 | if (@ary) { | |
647 | $hash{'only a'} = 'test'; | |
648 | delete $hash{'a'}; | |
649 | } | |
650 | } | |
651 | print join(' ', map { "$_ $hash{$_}" } sort keys %hash),".\n"; | |
652 | print "The array has ",scalar(@ary)," elements: ", | |
653 | join(', ', map { defined $_ ? $_ : 'undef' } @ary),"\n"; | |
654 | ||
655 | Perl will print | |
656 | ||
657 | 6 . . . | |
658 | 4 . . . | |
659 | 3 . . . | |
660 | This is a test only a test. | |
661 | The array has 6 elements: 0, 1, 2, undef, undef, 5 | |
662 | ||
663 | The behavior of local() on non-existent members of composite | |
664 | types is subject to change in future. | |
665 | ||
666 | =head2 Lvalue subroutines | |
667 | X<lvalue> X<subroutine, lvalue> | |
668 | ||
669 | B<WARNING>: Lvalue subroutines are still experimental and the | |
670 | implementation may change in future versions of Perl. | |
671 | ||
672 | It is possible to return a modifiable value from a subroutine. | |
673 | To do this, you have to declare the subroutine to return an lvalue. | |
674 | ||
675 | my $val; | |
676 | sub canmod : lvalue { | |
677 | # return $val; this doesn't work, don't say "return" | |
678 | $val; | |
679 | } | |
680 | sub nomod { | |
681 | $val; | |
682 | } | |
683 | ||
684 | canmod() = 5; # assigns to $val | |
685 | nomod() = 5; # ERROR | |
686 | ||
687 | The scalar/list context for the subroutine and for the right-hand | |
688 | side of assignment is determined as if the subroutine call is replaced | |
689 | by a scalar. For example, consider: | |
690 | ||
691 | data(2,3) = get_data(3,4); | |
692 | ||
693 | Both subroutines here are called in a scalar context, while in: | |
694 | ||
695 | (data(2,3)) = get_data(3,4); | |
696 | ||
697 | and in: | |
698 | ||
699 | (data(2),data(3)) = get_data(3,4); | |
700 | ||
701 | all the subroutines are called in a list context. | |
702 | ||
703 | =over 4 | |
704 | ||
705 | =item Lvalue subroutines are EXPERIMENTAL | |
706 | ||
707 | They appear to be convenient, but there are several reasons to be | |
708 | circumspect. | |
709 | ||
710 | You can't use the C<return> keyword; you must pass out the value before | |
711 | falling out of subroutine scope (see the comment in the example above). This | |
712 | is usually not a problem, but it disallows an explicit return out of a | |
713 | deeply nested loop, which is sometimes a nice way out. | |
714 | ||
715 | They violate encapsulation. A normal mutator can check the supplied | |
716 | argument before setting the attribute it is protecting; an lvalue | |
717 | subroutine never gets that chance. Consider: | |
718 | ||
719 | my $some_array_ref = []; # protected by mutators ?? | |
720 | ||
721 | sub set_arr { # normal mutator | |
722 | my $val = shift; | |
723 | die("expected array, you supplied ", ref $val) | |
724 | unless ref $val eq 'ARRAY'; | |
725 | $some_array_ref = $val; | |
726 | } | |
727 | sub set_arr_lv : lvalue { # lvalue mutator | |
728 | $some_array_ref; | |
729 | } | |
730 | ||
731 | # set_arr_lv cannot stop this ! | |
732 | set_arr_lv() = { a => 1 }; | |
733 | ||
734 | =back | |
735 | ||
736 | =head2 Passing Symbol Table Entries (typeglobs) | |
737 | X<typeglob> X<*> | |
738 | ||
739 | B<WARNING>: The mechanism described in this section was originally | |
740 | the only way to simulate pass-by-reference in older versions of | |
741 | Perl. While it still works fine in modern versions, the new reference | |
742 | mechanism is generally easier to work with. See below. | |
743 | ||
744 | Sometimes you don't want to pass the value of an array to a subroutine | |
745 | but rather the name of it, so that the subroutine can modify the global | |
746 | copy of it rather than working with a local copy. In Perl you can | |
747 | refer to all objects of a particular name by prefixing the name | |
748 | with a star: C<*foo>. This is often known as a "typeglob", because the | |
749 | star on the front can be thought of as a wildcard match for all the | |
750 | funny prefix characters on variables and subroutines and such. | |
751 | ||
752 | When evaluated, the typeglob produces a scalar value that represents | |
753 | all the objects of that name, including any filehandle, format, or | |
754 | subroutine. When assigned to, it causes the name mentioned to refer to | |
755 | whatever C<*> value was assigned to it. Example: | |
756 | ||
757 | sub doubleary { | |
758 | local(*someary) = @_; | |
759 | foreach $elem (@someary) { | |
760 | $elem *= 2; | |
761 | } | |
762 | } | |
763 | doubleary(*foo); | |
764 | doubleary(*bar); | |
765 | ||
766 | Scalars are already passed by reference, so you can modify | |
767 | scalar arguments without using this mechanism by referring explicitly | |
768 | to C<$_[0]> etc. You can modify all the elements of an array by passing | |
769 | all the elements as scalars, but you have to use the C<*> mechanism (or | |
770 | the equivalent reference mechanism) to C<push>, C<pop>, or change the size of | |
771 | an array. It will certainly be faster to pass the typeglob (or reference). | |
772 | ||
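The distinction drawn above can be sketched with three small subroutines. Their names (C<bump_first>, C<double_all>, C<append_to>) are invented for this illustration, and the last one uses a reference rather than a typeglob as its handle on the array.

```perl
use strict;
use warnings;

# Each element of @_ is an alias for the caller's scalar.
sub bump_first { $_[0]++ }

# Aliasing also reaches the individual elements of a passed array...
sub double_all { $_ *= 2 for @_ }

# ...but changing the array's size needs a handle on the array itself;
# here that handle is a reference.
sub append_to {
    my ($aref, @new) = @_;
    push @$aref, @new;
}

my $n = 1;
bump_first($n);

my @list = (1, 2, 3);
double_all(@list);
append_to(\@list, 8, 10);

print "$n: @list\n";
```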
773 | Even if you don't want to modify an array, this mechanism is useful for | |
774 | passing multiple arrays in a single LIST, because normally the LIST | |
775 | mechanism will merge all the array values so that you can't extract out | |
776 | the individual arrays. For more on typeglobs, see | |
777 | L<perldata/"Typeglobs and Filehandles">. | |
778 | ||
779 | =head2 When to Still Use local() | |
780 | X<local> X<variable, local> | |
781 | ||
782 | Despite the existence of C<my>, there are three places where the | |
783 | C<local> operator still shines. In fact, in these three places, you | |
784 | I<must> use C<local> instead of C<my>. | |
785 | ||
786 | =over 4 | |
787 | ||
788 | =item 1. | |
789 | ||
790 | You need to give a global variable a temporary value, especially $_. | |
791 | ||
792 | The global variables, like C<@ARGV> or the punctuation variables, must be | |
793 | C<local>ized with C<local()>. This block reads in F</etc/motd>, and splits | |
794 | it up into chunks separated by lines of equal signs, which are placed | |
795 | in C<@Fields>. | |
796 | ||
797 | { | |
798 | local @ARGV = ("/etc/motd"); | |
799 | local $/ = undef; | |
800 | local $_ = <>; | |
801 | @Fields = split /^\s*=+\s*$/; | |
802 | } | |
803 | ||
804 | In particular, it's important to C<local>ize $_ in any routine that assigns | |
805 | to it. Look out for implicit assignments in C<while> conditionals. | |
806 | ||
807 | =item 2. | |
808 | ||
809 | You need to create a local file or directory handle or a local function. | |
810 | ||
811 | A function that needs a filehandle of its own must use | |
812 | C<local()> on a complete typeglob. This can be used to create new symbol | |
813 | table entries: | |
814 | ||
815 | sub ioqueue { | |
816 | local (*READER, *WRITER); # not my! | |
817 | pipe (READER, WRITER) or die "pipe: $!"; | |
818 | return (*READER, *WRITER); | |
819 | } | |
820 | ($head, $tail) = ioqueue(); | |
821 | ||
822 | See the Symbol module for a way to create anonymous symbol table | |
823 | entries. | |
824 | ||
825 | Because assignment of a reference to a typeglob creates an alias, this | |
826 | can be used to create what is effectively a local function, or at least, | |
827 | a local alias. | |
828 | ||
829 | { | |
830 | local *grow = \&shrink; # only until this block exits | |
831 | grow(); # really calls shrink() | |
832 | move(); # if move() grow()s, it shrink()s too | |
833 | } | |
834 | grow(); # get the real grow() again | |
835 | ||
836 | See L<perlref/"Function Templates"> for more about manipulating | |
837 | functions by name in this way. | |
838 | ||
839 | =item 3. | |
840 | ||
841 | You want to temporarily change just one element of an array or hash. | |
842 | ||
843 | You can C<local>ize just one element of an aggregate. Usually this | |
844 | is done on dynamics: | |
845 | ||
846 | { | |
847 | local $SIG{INT} = 'IGNORE'; | |
848 | funct(); # uninterruptible | |
849 | } | |
850 | # interruptibility automatically restored here | |
851 | ||
852 | But it also works on lexically declared aggregates. Prior to 5.005, | |
853 | this operation could on occasion misbehave. | |
854 | ||
855 | =back | |
856 | ||
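The third case can be watched in action with a small sketch. The hash C<%config> and the subroutine C<report> are hypothetical; the point is only that a subroutine called from inside the block sees the temporary element, and that the old value comes back when the block exits.

```perl
use strict;
use warnings;

our %config = (verbose => 0, depth => 1);

# Reads the global hash, so it sees whatever value is current
# at the time of the call.
sub report { return "verbose=$config{verbose} depth=$config{depth}" }

my ($inside, $outside);
{
    local $config{verbose} = 1;   # only this one element is localized
    $inside = report();
}
$outside = report();              # original element restored here

print "$inside\n$outside\n";
```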
857 | =head2 Pass by Reference | |
858 | X<pass by reference> X<pass-by-reference> X<reference> | |
859 | ||
860 | If you want to pass more than one array or hash into a function--or | |
861 | return them from it--and have them maintain their integrity, then | |
862 | you're going to have to use an explicit pass-by-reference. Before you | |
863 | do that, you need to understand references as detailed in L<perlref>. | |
864 | This section may not make much sense to you otherwise. | |
865 | ||
866 | Here are a few simple examples. First, let's pass in several arrays | |
867 | to a function and have it C<pop> all of them, returning a new list | |
868 | of all their former last elements: | |
869 | ||
870 | @tailings = popmany ( \@a, \@b, \@c, \@d ); | |
871 | ||
872 | sub popmany { | |
873 | my $aref; | |
874 | my @retlist = (); | |
875 | foreach $aref ( @_ ) { | |
876 | push @retlist, pop @$aref; | |
877 | } | |
878 | return @retlist; | |
879 | } | |
880 | ||
881 | Here's how you might write a function that returns a | |
882 | list of keys occurring in all the hashes passed to it: | |
883 | ||
884 | @common = inter( \%foo, \%bar, \%joe ); | |
885 | sub inter { | |
886 | my ($k, $href, %seen); # locals | |
887 | foreach $href (@_) { | |
888 | while ( $k = each %$href ) { | |
889 | $seen{$k}++; | |
890 | } | |
891 | } | |
892 | return grep { $seen{$_} == @_ } keys %seen; | |
893 | } | |
894 | ||
895 | So far, we're using just the normal list return mechanism. | |
896 | What happens if you want to pass or return a hash? Well, | |
897 | if you're using only one of them, or you don't mind them | |
898 | concatenating, then the normal calling convention is ok, although | |
899 | a little expensive. | |
900 | ||
901 | Where people get into trouble is here: | |
902 | ||
903 | (@a, @b) = func(@c, @d); | |
904 | or | |
905 | (%a, %b) = func(%c, %d); | |
906 | ||
907 | That syntax simply won't work. It sets just C<@a> or C<%a> and | |
908 | clears the C<@b> or C<%b>. Plus the function didn't get passed | |
909 | into two separate arrays or hashes: it got one long list in C<@_>, | |
910 | as always. | |
911 | ||
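A short sketch makes the flattening visible. The helper C<count_args> is made up for the demonstration, and the list assignment uses the arrays directly instead of a function call so that both effects can be seen side by side.

```perl
use strict;
use warnings;

# Reports how many separate arguments arrived in @_.
sub count_args { return scalar @_ }

my @c = (1, 2);
my @d = (3, 4, 5);

my $seen = count_args(@c, @d);   # one flat list, not two arrays

my (@a, @b) = (@c, @d);          # @a swallows everything, @b stays empty

printf "%d arguments; \@a has %d elements, \@b has %d\n",
    $seen, scalar @a, scalar @b;
```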
912 | If you can arrange for everyone to deal with this through references, it's | |
913 | cleaner code, although not so nice to look at. Here's a function that | |
914 | takes two array references as arguments and returns them ordered by | |
915 | how many elements the arrays hold, larger first: | |
916 | ||
917 | ($aref, $bref) = func(\@c, \@d); | |
918 | print "@$aref has more than @$bref\n"; | |
919 | sub func { | |
920 | my ($cref, $dref) = @_; | |
921 | if (@$cref > @$dref) { | |
922 | return ($cref, $dref); | |
923 | } else { | |
924 | return ($dref, $cref); | |
925 | } | |
926 | } | |
927 | ||
928 | It turns out that you can actually do this also: | |
929 | ||
930 | (*a, *b) = func(\@c, \@d); | |
931 | print "@a has more than @b\n"; | |
932 | sub func { | |
933 | local (*c, *d) = @_; | |
934 | if (@c > @d) { | |
935 | return (\@c, \@d); | |
936 | } else { | |
937 | return (\@d, \@c); | |
938 | } | |
939 | } | |
940 | ||
941 | Here we're using the typeglobs to do symbol table aliasing. It's | |
942 | a tad subtle, though, and also won't work if you're using C<my> | |
943 | variables, because only globals (even in disguise as C<local>s) | |
944 | are in the symbol table. | |
945 | ||
946 | If you're passing around filehandles, you could usually just use the bare | |
947 | typeglob, like C<*STDOUT>, but typeglob references work, too. | |
948 | For example: | |
949 | ||
950 | splutter(\*STDOUT); | |
951 | sub splutter { | |
952 | my $fh = shift; | |
953 | print $fh "her um well a hmmm\n"; | |
954 | } | |
955 | ||
956 | $rec = get_rec(\*STDIN); | |
957 | sub get_rec { | |
958 | my $fh = shift; | |
959 | return scalar <$fh>; | |
960 | } | |
961 | ||
962 | If you're planning on generating new filehandles, you could do this. | |
963 | Note that it passes back just the bare *FH, not a reference to it. | |
964 | ||
965 | sub openit { | |
966 | my $path = shift; | |
967 | local *FH; | |
968 | return open (FH, $path) ? *FH : undef; | |
969 | } | |
970 | ||
971 | =head2 Prototypes | |
972 | X<prototype> X<subroutine, prototype> | |
973 | ||
974 | Perl supports a very limited kind of compile-time argument checking | |
975 | using function prototyping. If you declare | |
976 | ||
977 | sub mypush (\@@) | |
978 | ||
979 | then C<mypush()> takes arguments exactly like C<push()> does. The | |
980 | function declaration must be visible at compile time. The prototype | |
981 | affects only interpretation of new-style calls to the function, | |
982 | where new-style is defined as not using the C<&> character. In | |
983 | other words, if you call it like a built-in function, then it behaves | |
984 | like a built-in function. If you call it like an old-fashioned | |
985 | subroutine, then it behaves like an old-fashioned subroutine. It | |
986 | naturally falls out from this rule that prototypes have no influence | |
987 | on subroutine references like C<\&foo> or on indirect subroutine | |
988 | calls like C<&{$subref}> or C<< $subref->() >>. | |
989 | ||
990 | Method calls are not influenced by prototypes either, because the | |
991 | function to be called is indeterminate at compile time, since | |
992 | the exact code called depends on inheritance. | |
993 | ||
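Here is one way the C<mypush> declaration above might be filled in, as a sketch rather than a definitive implementation. The body is an assumption; what matters is the difference between the three call styles.

```perl
use strict;
use warnings;

# With the (\@@) prototype, a new-style call passes a reference to the
# first array automatically, followed by the remaining list.
sub mypush (\@@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
    return scalar @$aref;
}

my @stack = (1);

mypush @stack, 2, 3;       # new-style call: prototype supplies \@stack

&mypush(\@stack, 4);       # old-style call: prototype ignored, so the
                           # reference must be passed explicitly

my $code = \&mypush;
$code->(\@stack, 5);       # indirect call: prototype ignored as well

print "@stack\n";
```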
994 | Because the intent of this feature is primarily to let you define | |
995 | subroutines that work like built-in functions, here are prototypes | |
996 | for some other functions that parse almost exactly like the | |
997 | corresponding built-in. | |
998 | ||
999 | Declared as Called as | |
1000 | ||
1001 | sub mylink ($$) mylink $old, $new | |
1002 | sub myvec ($$$) myvec $var, $offset, 1 | |
1003 | sub myindex ($$;$) myindex &getstring, "substr" | |
1004 | sub mysyswrite ($$$;$) mysyswrite $buf, 0, length($buf) - $off, $off | |
1005 | sub myreverse (@) myreverse $a, $b, $c | |
1006 | sub myjoin ($@) myjoin ":", $a, $b, $c | |
1007 | sub mypop (\@) mypop @array | |
1008 | sub mysplice (\@$$@) mysplice @array, @array, 0, @pushme | |
1009 | sub mykeys (\%) mykeys %{$hashref} | |
1010 | sub myopen (*;$) myopen HANDLE, $name | |
1011 | sub mypipe (**) mypipe READHANDLE, WRITEHANDLE | |
1012 | sub mygrep (&@) mygrep { /foo/ } $a, $b, $c | |
1013 | sub myrand ($) myrand 42 | |
1014 | sub mytime () mytime | |
1015 | ||
1016 | Any backslashed prototype character represents an actual argument | |
1017 | that absolutely must start with that character. The value passed | |
1018 | as part of C<@_> will be a reference to the actual argument given | |
1019 | in the subroutine call, obtained by applying C<\> to that argument. | |
1020 | ||
1021 | You can also backslash several argument types simultaneously by using | |
1022 | the C<\[]> notation: | |
1023 | ||
1024 | sub myref (\[$@%&*]) | |
1025 | ||
1026 | will allow calling myref() as | |
1027 | ||
1028 | myref $var | |
1029 | myref @array | |
1030 | myref %hash | |
1031 | myref &sub | |
1032 | myref *glob | |
1033 | ||
1034 | and the first argument of myref() will be a reference to | |
1035 | a scalar, an array, a hash, a code, or a glob. | |
1036 | ||
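A minimal sketch of such a C<myref>, assuming all it does is report the kind of reference it was handed (the C<helper> subroutine exists only to have something to pass):

```perl
use strict;
use warnings;

# The prototype turns the first argument into a reference,
# whichever of the five listed types it is.
sub myref (\[$@%&*]) { return ref $_[0] }

sub helper { }

my ($var, @array, %hash);

my @kinds = (
    myref($var),
    myref(@array),
    myref(%hash),
    myref(&helper),
    myref(*STDOUT),
);

print "@kinds\n";
```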
1037 | Unbackslashed prototype characters have special meanings. Any | |
1038 | unbackslashed C<@> or C<%> eats all remaining arguments, and forces | |
1039 | list context. An argument represented by C<$> forces scalar context. An | |
1040 | C<&> requires an anonymous subroutine, which, if passed as the first | |
1041 | argument, does not require the C<sub> keyword or a subsequent comma. | |
1042 | ||
1043 | A C<*> allows the subroutine to accept a bareword, constant, scalar expression, | |
1044 | typeglob, or a reference to a typeglob in that slot. The value will be | |
1045 | available to the subroutine either as a simple scalar, or (in the latter | |
1046 | two cases) as a reference to the typeglob. If you wish to always convert | |
1047 | such arguments to a typeglob reference, use Symbol::qualify_to_ref() as | |
1048 | follows: | |
1049 | ||
1050 | use Symbol 'qualify_to_ref'; | |
1051 | ||
1052 | sub foo (*) { | |
1053 | my $fh = qualify_to_ref(shift, caller); | |
1054 | ... | |
1055 | } | |
1056 | ||
1057 | A semicolon separates mandatory arguments from optional arguments. | |
1058 | It is redundant before C<@> or C<%>, which gobble up everything else. | |
1059 | ||
1060 | Note how the last three examples in the table above are treated | |
1061 | specially by the parser. C<mygrep()> is parsed as a true list | |
1062 | operator, C<myrand()> is parsed as a true unary operator with unary | |
1063 | precedence the same as C<rand()>, and C<mytime()> is truly without | |
1064 | arguments, just like C<time()>. That is, if you say | |
1065 | ||
1066 | mytime +2; | |
1067 | ||
1068 | you'll get C<mytime() + 2>, not C<mytime(2)>, which is how it would be parsed | |
1069 | without a prototype. | |
1070 | ||
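The difference in parsing can be checked with a deterministic stand-in for C<mytime()>. Both subroutine names below are hypothetical: C<fixed_time> has an empty prototype, while C<count_args> has none.

```perl
use strict;
use warnings;

sub fixed_time () { 40 }                 # empty prototype: takes no arguments
sub count_args    { return scalar @_ }   # no prototype: a list operator

my $sum   = fixed_time +2;   # parsed as fixed_time() + 2
my $count = count_args +2;   # parsed as count_args(+2)

print "$sum $count\n";
```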
1071 | The interesting thing about C<&> is that you can generate new syntax with it, | |
1072 | provided it's in the initial position: | |
1073 | X<&> | |
1074 | ||
1075 | sub try (&@) { | |
1076 | my($try,$catch) = @_; | |
1077 | eval { &$try }; | |
1078 | if ($@) { | |
1079 | local $_ = $@; | |
1080 | &$catch; | |
1081 | } | |
1082 | } | |
1083 | sub catch (&) { $_[0] } | |
1084 | ||
1085 | try { | |
1086 | die "phooey"; | |
1087 | } catch { | |
1088 | /phooey/ and print "unphooey\n"; | |
1089 | }; | |
1090 | ||
1091 | That prints C<"unphooey">. (Yes, there are still unresolved | |
1092 | issues having to do with visibility of C<@_>. I'm ignoring that | |
1093 | question for the moment. (But note that if we make C<@_> lexically | |
1094 | scoped, those anonymous subroutines can act like closures... (Gee, | |
1095 | is this sounding a little Lispish? (Never mind.)))) | |
1096 | ||
1097 | And here's a reimplementation of the Perl C<grep> operator: | |
1098 | X<grep> | |
1099 | ||
1100 | sub mygrep (&@) { | |
1101 | my $code = shift; | |
1102 | my @result; | |
1103 | foreach $_ (@_) { | |
1104 | push(@result, $_) if &$code; | |
1105 | } | |
1106 | @result; | |
1107 | } | |
1108 | ||
1109 | Some folks would prefer full alphanumeric prototypes. Alphanumerics have | |
1110 | been intentionally left out of prototypes for the express purpose of | |
1111 | someday in the future adding named, formal parameters. The current | |
1112 | mechanism's main goal is to let module writers provide better diagnostics | |
1113 | for module users. Larry feels the notation quite understandable to Perl | |
1114 | programmers, and that it will not intrude greatly upon the meat of the | |
1115 | module, nor make it harder to read. The line noise is visually | |
1116 | encapsulated into a small pill that's easy to swallow. | |
1117 | ||
1118 | If you try to use an alphanumeric sequence in a prototype you will | |
1119 | generate an optional warning - "Illegal character in prototype...". | |
1120 | Unfortunately earlier versions of Perl allowed the prototype to be | |
1121 | used as long as its prefix was a valid prototype. The warning may be | |
1122 | upgraded to a fatal error in a future version of Perl once the | |
1123 | majority of offending code is fixed. | |
1124 | ||
1125 | It's probably best to prototype new functions, not retrofit prototyping | |
1126 | into older ones. That's because you must be especially careful about | |
1127 | silent impositions of differing list versus scalar contexts. For example, | |
1128 | if you decide that a function should take just one parameter, like this: | |
1129 | ||
1130 | sub func ($) { | |
1131 | my $n = shift; | |
1132 | print "you gave me $n\n"; | |
1133 | } | |
1134 | ||
1135 | and someone has been calling it with an array or expression | |
1136 | returning a list: | |
1137 | ||
1138 | func(@foo); | |
1139 | func( split /:/ ); | |
1140 | ||
1141 | Then you've just supplied an automatic C<scalar> in front of their | |
1142 | argument, which can be more than a bit surprising. The old C<@foo> | |
1143 | which used to hold one thing doesn't get passed in. Instead, | |
1144 | C<func()> now gets passed in a C<1>; that is, the number of elements | |
1145 | in C<@foo>. And the C<split> gets called in scalar context so it | |
1146 | starts scribbling on your C<@_> parameter list. Ouch! | |
1147 | ||
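The first of those surprises is easy to reproduce. In this sketch the two subroutines (names invented for the example) have identical bodies and differ only in the C<($)> prototype.

```perl
use strict;
use warnings;

sub first_arg           { return $_[0] }
sub first_arg_proto ($) { return $_[0] }

my @foo = ('only');

my $plain = first_arg(@foo);         # list context: receives 'only'
my $proto = first_arg_proto(@foo);   # scalar context: receives the count

print "$plain $proto\n";
```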
1148 | This is all very powerful, of course, and should be used only in moderation | |
1149 | to make the world a better place. | |
1150 | ||
1151 | =head2 Constant Functions | |
1152 | X<constant> | |
1153 | ||
1154 | Functions with a prototype of C<()> are potential candidates for | |
1155 | inlining. If the result after optimization and constant folding | |
1156 | is either a constant or a lexically-scoped scalar which has no other | |
1157 | references, then it will be used in place of function calls made | |
1158 | without C<&>. Calls made using C<&> are never inlined. (See | |
1159 | F<constant.pm> for an easy way to declare most constants.) | |
1160 | ||
1161 | The following functions would all be inlined: | |
1162 | ||
1163 | sub pi () { 3.14159 } # Not exact, but close. | |
1164 | sub PI () { 4 * atan2 1, 1 } # As good as it gets, | |
1165 | # and it's inlined, too! | |
1166 | sub ST_DEV () { 0 } | |
1167 | sub ST_INO () { 1 } | |
1168 | ||
1169 | sub FLAG_FOO () { 1 << 8 } | |
1170 | sub FLAG_BAR () { 1 << 9 } | |
1171 | sub FLAG_MASK () { FLAG_FOO | FLAG_BAR } | |
1172 | ||
1173 | sub OPT_BAZ () { not (0x1B58 & FLAG_MASK) } | |
1174 | ||
1175 | sub N () { int(OPT_BAZ) / 3 } | |
1176 | ||
1177 | sub FOO_SET () { 1 if FLAG_MASK & FLAG_FOO } | |
1178 | ||
1179 | Be aware that these will not be inlined; as they contain inner scopes, | |
1180 | the constant folding doesn't reduce them to a single constant: | |
1181 | ||
1182 | sub foo_set () { if (FLAG_MASK & FLAG_FOO) { 1 } } | |
1183 | ||
1184 | sub baz_val () { | |
1185 | if (OPT_BAZ) { | |
1186 | return 23; | |
1187 | } | |
1188 | else { | |
1189 | return 42; | |
1190 | } | |
1191 | } | |
1192 | ||
1193 | If you redefine a subroutine that was eligible for inlining, you'll get | |
1194 | a mandatory warning. (You can use this warning to tell whether or not a | |
1195 | particular subroutine is considered constant.) The warning is | |
1196 | considered severe enough not to be optional because previously compiled | |
1197 | invocations of the function will still be using the old value of the | |
1198 | function. If you need to be able to redefine the subroutine, you need to | |
1199 | ensure that it isn't inlined, either by dropping the C<()> prototype | |
1200 | (which changes calling semantics, so beware) or by thwarting the | |
1201 | inlining mechanism in some other way, such as | |
1202 | ||
1203 | sub not_inlined () { | |
1204 | 23 if $]; | |
1205 | } | |
1206 | ||
1207 | =head2 Overriding Built-in Functions | |
1208 | X<built-in> X<override> X<CORE> X<CORE::GLOBAL> | |
1209 | ||
1210 | Many built-in functions may be overridden, though this should be tried | |
1211 | only occasionally and for good reason. Typically this might be | |
1212 | done by a package attempting to emulate missing built-in functionality | |
1213 | on a non-Unix system. | |
1214 | ||
1215 | Overriding may be done only by importing the name from a module at | |
1216 | compile time--ordinary predeclaration isn't good enough. However, the | |
1217 | C<use subs> pragma lets you, in effect, predeclare subs | |
1218 | via the import syntax, and these names may then override built-in ones: | |
1219 | ||
1220 | use subs 'chdir', 'chroot', 'chmod', 'chown'; | |
1221 | chdir $somewhere; | |
1222 | sub chdir { ... } | |
1223 | ||
1224 | To unambiguously refer to the built-in form, precede the | |
1225 | built-in name with the special package qualifier C<CORE::>. For example, | |
1226 | saying C<CORE::open()> always refers to the built-in C<open()>, even | |
1227 | if the current package has imported some other subroutine called | |
1228 | C<&open()> from elsewhere. Even though it looks like a regular | |
1229 | function call, it isn't: you can't take a reference to it, such as | |
1230 | the incorrect C<\&CORE::open> might appear to produce. | |
1231 | ||
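As a small sketch of both halves of this, the following overrides C<hex> in the current package via C<use subs> and then reaches the original through C<CORE::>. The replacement body is deliberately silly and is only there to show which version was called.

```perl
use strict;
use warnings;
use subs 'hex';    # import-style predeclaration, so the override takes effect

# Hypothetical replacement that just labels its arguments.
sub hex { return "override(@_)" }

my $mine = hex("ff");         # calls the replacement above
my $real = CORE::hex("ff");   # always the built-in

print "$mine $real\n";
```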
1232 | Library modules should not in general export built-in names like C<open> | |
1233 | or C<chdir> as part of their default C<@EXPORT> list, because these may | |
1234 | sneak into someone else's namespace and change the semantics unexpectedly. | |
1235 | Instead, if the module adds that name to C<@EXPORT_OK>, then it's | |
1236 | possible for a user to import the name explicitly, but not implicitly. | |
1237 | That is, they could say | |
1238 | ||
1239 | use Module 'open'; | |
1240 | ||
1241 | and it would import the C<open> override. But if they said | |
1242 | ||
1243 | use Module; | |
1244 | ||
1245 | they would get the default imports without overrides. | |
1246 | ||
1247 | The foregoing mechanism for overriding built-ins is restricted, quite | |
1248 | deliberately, to the package that requests the import. There is a second | |
1249 | method that is sometimes applicable when you wish to override a built-in | |
1250 | everywhere, without regard to namespace boundaries. This is achieved by | |
1251 | importing a sub into the special namespace C<CORE::GLOBAL::>. Here is an | |
1252 | example that quite brazenly replaces the C<glob> operator with something | |
1253 | that understands regular expressions. | |
1254 | ||
1255 | package REGlob; | |
1256 | require Exporter; | |
1257 | @ISA = 'Exporter'; | |
1258 | @EXPORT_OK = 'glob'; | |
1259 | ||
1260 | sub import { | |
1261 | my $pkg = shift; | |
1262 | return unless @_; | |
1263 | my $sym = shift; | |
1264 | my $where = ($sym =~ s/^GLOBAL_// ? 'CORE::GLOBAL' : caller(0)); | |
1265 | $pkg->export($where, $sym, @_); | |
1266 | } | |
1267 | ||
1268 | sub glob { | |
1269 | my $pat = shift; | |
1270 | my @got; | |
1271 | local *D; | |
1272 | if (opendir D, '.') { | |
1273 | @got = grep /$pat/, readdir D; | |
1274 | closedir D; | |
1275 | } | |
1276 | return @got; | |
1277 | } | |
1278 | 1; | |
1279 | ||
1280 | And here's how it could be (ab)used: | |
1281 | ||
1282 | #use REGlob 'GLOBAL_glob'; # override glob() in ALL namespaces | |
1283 | package Foo; | |
1284 | use REGlob 'glob'; # override glob() in Foo:: only | |
1285 | print for <^[a-z_]+\.pm\$>; # show all pragmatic modules | |
1286 | ||
1287 | The initial comment shows a contrived, even dangerous example. | |
1288 | By overriding C<glob> globally, you would be forcing the new (and | |
1289 | subversive) behavior for the C<glob> operator for I<every> namespace, | |
1290 | without the complete cognizance or cooperation of the modules that own | |
1291 | those namespaces. Naturally, this should be done with extreme caution--if | |
1292 | it must be done at all. | |
1293 | ||
1294 | The C<REGlob> example above does not implement all the support needed to | |
1295 | cleanly override perl's C<glob> operator. The built-in C<glob> has | |
1296 | different behaviors depending on whether it appears in a scalar or list | |
1297 | context, but our C<REGlob> doesn't. Indeed, many Perl built-ins have such | |
1298 | context-sensitive behaviors, and these must be adequately supported by | |
1299 | a properly written override. For a fully functional example of overriding | |
1300 | C<glob>, study the implementation of C<File::DosGlob> in the standard | |
1301 | library. | |
1302 | ||
1303 | When you override a built-in, your replacement should be consistent (if | |
1304 | possible) with the built-in native syntax. You can achieve this by using | |
1305 | a suitable prototype. To get the prototype of an overridable built-in, | |
1306 | use the C<prototype> function with an argument of C<"CORE::builtin_name"> | |
1307 | (see L<perlfunc/prototype>). | |
1308 | ||
1309 | Note however that some built-ins can't have their syntax expressed by a | |
1310 | prototype (such as C<system> or C<chomp>). If you override them you won't | |
1311 | be able to fully mimic their original syntax. | |
1312 | ||
1313 | The built-ins C<do>, C<require> and C<glob> can also be overridden, but due | |
1314 | to special magic, their original syntax is preserved, and you don't have | |
1315 | to define a prototype for their replacements. (You can't override the | |
1316 | C<do BLOCK> syntax, though.) | |
1317 | ||
1318 | C<require> has special additional dark magic: if you invoke your | |
1319 | C<require> replacement as C<require Foo::Bar>, it will actually receive | |
1320 | the argument C<"Foo/Bar.pm"> in @_. See L<perlfunc/require>. | |
1321 | ||
1322 | And, as you'll have noticed from the previous example, if you override | |
1323 | C<glob>, the C<E<lt>*E<gt>> glob operator is overridden as well. | |
1324 | ||
1325 | In a similar fashion, overriding the C<readline> function also overrides | |
1326 | the equivalent I/O operator C<< <FILEHANDLE> >>. | |
1327 | ||
1328 | Finally, some built-ins (e.g. C<exists> or C<grep>) can't be overridden. | |
1329 | ||
1330 | =head2 Autoloading | |
1331 | X<autoloading> X<AUTOLOAD> | |
1332 | ||
1333 | If you call a subroutine that is undefined, you would ordinarily | |
1334 | get an immediate, fatal error complaining that the subroutine doesn't | |
1335 | exist. (Likewise for subroutines being used as methods, when the | |
1336 | method doesn't exist in any base class of the class's package.) | |
1337 | However, if an C<AUTOLOAD> subroutine is defined in the package or | |
1338 | packages used to locate the original subroutine, then that | |
1339 | C<AUTOLOAD> subroutine is called with the arguments that would have | |
1340 | been passed to the original subroutine. The fully qualified name | |
1341 | of the original subroutine magically appears in the global $AUTOLOAD | |
1342 | variable of the same package as the C<AUTOLOAD> routine. The name | |
1343 | is not passed as an ordinary argument because, er, well, just | |
1344 | because, that's why... | |
1345 | ||
1346 | Many C<AUTOLOAD> routines load in a definition for the requested | |
1347 | subroutine using eval(), then execute that subroutine using a special | |
1348 | form of goto() that erases the stack frame of the C<AUTOLOAD> routine | |
1349 | without a trace. (See the source to the standard module documented | |
1350 | in L<AutoLoader>, for example.) But an C<AUTOLOAD> routine can | |
1351 | also just emulate the routine and never define it. For example, | |
1352 | let's pretend that a function that wasn't defined should just invoke | |
1353 | C<system> with those arguments. All you'd do is: | |
1354 | ||
1355 | sub AUTOLOAD { | |
1356 | my $program = $AUTOLOAD; | |
1357 | $program =~ s/.*:://; | |
1358 | system($program, @_); | |
1359 | } | |
1360 | date(); | |
1361 | who('am', 'i'); | |
1362 | ls('-l'); | |
1363 | ||
1364 | In fact, if you predeclare functions you want to call that way, you don't | |
1365 | even need parentheses: | |
1366 | ||
1367 | use subs qw(date who ls); | |
1368 | date; | |
1369 | who "am", "i"; | |
1370 | ls -l; | |
1371 | ||
1372 | A more complete example of this is the standard Shell module, which | |
1373 | can treat undefined subroutine calls as calls to external programs. | |
1374 | ||
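The define-then-C<goto> pattern mentioned earlier in this section can be sketched as follows. This version installs the new subroutine by assigning a closure to a typeglob instead of using eval(), and the package name C<Settings> and its data are assumptions made for the example.

```perl
package Settings;
use strict;
use warnings;

our $AUTOLOAD;
my %value = (name => 'demo', depth => 3);

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package qualifier
    return if $name eq 'DESTROY';       # never autoload destructors
    die "No such setting: $name\n" unless exists $value{$name};

    no strict 'refs';
    *{$AUTOLOAD} = sub { return $value{$name} };   # define it for next time
    goto &{$AUTOLOAD};                  # run it, replacing this stack frame
}

package main;

my $first  = Settings::name();    # goes through AUTOLOAD
my $second = Settings::name();    # now calls the installed subroutine
my $depth  = Settings::depth();

print "$first $second $depth\n";
```

After the first call, C<Settings::name> is an ordinary defined subroutine, so later calls no longer pay for the C<AUTOLOAD> lookup.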
1375 | Mechanisms are available to help module writers split their modules | |
1376 | into autoloadable files. See the standard AutoLoader module | |
1377 | described in L<AutoLoader> and in L<AutoSplit>, the standard | |
1378 | SelfLoader module in L<SelfLoader>, and the document on adding C | |
1379 | functions to Perl code in L<perlxs>. | |
1380 | ||
1381 | =head2 Subroutine Attributes | |
1382 | X<attribute> X<subroutine, attribute> X<attrs> | |
1383 | ||
1384 | A subroutine declaration or definition may have a list of attributes | |
1385 | associated with it. If such an attribute list is present, it is | |
1386 | broken up at space or colon boundaries and treated as though a | |
1387 | C<use attributes> had been seen. See L<attributes> for details | |
1388 | about what attributes are currently supported. | |
1389 | Unlike the limitation with the obsolescent C<use attrs>, the | |
1390 | C<sub : ATTRLIST> syntax works to associate the attributes with | |
1391 | a pre-declaration, and not just with a subroutine definition. | |
1392 | ||
1393 | The attributes must be valid as simple identifier names (without any | |
1394 | punctuation other than the '_' character). They may have a parameter | |
1395 | list appended, which is only checked for whether its parentheses ('(',')') | |
1396 | nest properly. | |
1397 | ||
1398 | Examples of valid syntax (even though the attributes are unknown): | |
1399 | ||
1400 | sub fnord (&\%) : switch(10,foo(7,3)) : expensive; | |
1401 | sub plugh () : Ugly('\(") :Bad; | |
1402 | sub xyzzy : _5x5 { ... } | |
1403 | ||
1404 | Examples of invalid syntax: | |
1405 | ||
1406 | sub fnord : switch(10,foo(); # ()-string not balanced | |
1407 | sub snoid : Ugly('('); # ()-string not balanced | |
1408 | sub xyzzy : 5x5; # "5x5" not a valid identifier | |
1409 | sub plugh : Y2::north; # "Y2::north" not a simple identifier | |
1410 | sub snurt : foo + bar; # "+" not a colon or space | |
1411 | ||
1412 | The attribute list is passed as a list of constant strings to the code | |
1413 | which associates them with the subroutine. In particular, the second example | |
1414 | of valid syntax above currently looks like this in terms of how it's | |
1415 | parsed and invoked: | |
1416 | ||
1417 | use attributes __PACKAGE__, \&plugh, q[Ugly('\(")], 'Bad'; | |
1418 | ||
1419 | For further details on attribute lists and their manipulation, | |
1420 | see L<attributes> and L<Attribute::Handlers>. | |
1421 | ||
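As a brief sketch of inspecting attributes after the fact, the C<attributes::get> function from L<attributes> returns the list of attributes attached to a subroutine reference. The two subroutines here are illustrative; C<lvalue> is used because it is a built-in attribute.

```perl
use strict;
use warnings;
use attributes;

my $val = 0;

sub canmod : lvalue { $val }
sub plain           { $val }

# attributes::get takes a reference and returns a (possibly empty) list.
my @with    = attributes::get(\&canmod);
my @without = attributes::get(\&plain);

print "canmod: @with\n";
print "plain has ", scalar(@without), " attributes\n";
```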
1422 | =head1 SEE ALSO | |
1423 | ||
1424 | See L<perlref/"Function Templates"> for more about references and closures. | |
1425 | See L<perlxs> if you'd like to learn about calling C subroutines from Perl. | |
1426 | See L<perlembed> if you'd like to learn about calling Perl subroutines from C. | |
1427 | See L<perlmod> to learn about bundling up your functions in separate files. | |
1428 | See L<perlmodlib> to learn what library modules come standard on your system. | |
1429 | See L<perltoot> to learn how to make object method calls. |