Commit | Line | Data |
---|---|---|
86530b38 AT |
1 | =head1 NAME |
2 | ||
3 | perlref - Perl references and nested data structures | |
4 | ||
5 | =head1 NOTE | |
6 | ||
7 | This is complete documentation about all aspects of references. | |
8 | For a shorter, tutorial introduction to just the essential features, | |
9 | see L<perlreftut>. | |
10 | ||
11 | =head1 DESCRIPTION | |
12 | ||
13 | Before release 5 of Perl it was difficult to represent complex data | |
14 | structures, because all references had to be symbolic--and even then | |
15 | it was difficult to refer to a variable instead of a symbol table entry. | |
16 | Perl now not only makes it easier to use symbolic references to variables, | |
17 | but also lets you have "hard" references to any piece of data or code. | |
18 | Any scalar may hold a hard reference. Because arrays and hashes contain | |
19 | scalars, you can now easily build arrays of arrays, arrays of hashes, | |
20 | hashes of arrays, arrays of hashes of functions, and so on. | |
21 | ||
22 | Hard references are smart--they keep track of reference counts for you, | |
23 | automatically freeing the thing referred to when its reference count goes | |
24 | to zero. (Reference counts for values in self-referential or | |
25 | cyclic data structures may not go to zero without a little help; see | |
26 | L<perlobj/"Two-Phased Garbage Collection"> for a detailed explanation.) | |
27 | If that thing happens to be an object, the object is destructed. See | |
28 | L<perlobj> for more about objects. (In a sense, everything in Perl is an | |
29 | object, but we usually reserve the word for references to objects that | |
30 | have been officially "blessed" into a class package.) | |
31 | ||
32 | Symbolic references are names of variables or other objects, just as a | |
33 | symbolic link in a Unix filesystem contains merely the name of a file. | |
34 | The C<*glob> notation is something of a symbolic reference. (Symbolic | |
35 | references are sometimes called "soft references", but please don't call | |
36 | them that; references are confusing enough without useless synonyms.) | |
37 | ||
38 | In contrast, hard references are more like hard links in a Unix file | |
39 | system: They are used to access an underlying object without concern for | |
40 | what its (other) name is. When the word "reference" is used without an | |
41 | adjective, as in the following paragraph, it is usually talking about a | |
42 | hard reference. | |
43 | ||
44 | References are easy to use in Perl. There is just one overriding | |
45 | principle: Perl does no implicit referencing or dereferencing. When a | |
46 | scalar is holding a reference, it always behaves as a simple scalar. It | |
47 | doesn't magically start being an array or hash or subroutine; you have to | |
48 | tell it explicitly to do so, by dereferencing it. | |
49 | ||
50 | =head2 Making References | |
51 | ||
52 | References can be created in several ways. | |
53 | ||
54 | =over 4 | |
55 | ||
56 | =item 1. | |
57 | ||
58 | By using the backslash operator on a variable, subroutine, or value. | |
59 | (This works much like the & (address-of) operator in C.) | |
60 | This typically creates I<another> reference to a variable, because | |
61 | there's already a reference to the variable in the symbol table. But | |
62 | the symbol table reference might go away, and you'll still have the | |
63 | reference that the backslash returned. Here are some examples: | |
64 | ||
65 | $scalarref = \$foo; | |
66 | $arrayref = \@ARGV; | |
67 | $hashref = \%ENV; | |
68 | $coderef = \&handler; | |
69 | $globref = \*foo; | |
70 | ||
71 | It isn't possible to create a true reference to an IO handle (filehandle | |
72 | or dirhandle) using the backslash operator. The most you can get is a | |
73 | reference to a typeglob, which is actually a complete symbol table entry. | |
74 | But see the explanation of the C<*foo{THING}> syntax below. However, | |
75 | you can still use type globs and globrefs as though they were IO handles. | |
76 | ||
77 | =item 2. | |
78 | ||
79 | A reference to an anonymous array can be created using square | |
80 | brackets: | |
81 | ||
82 | $arrayref = [1, 2, ['a', 'b', 'c']]; | |
83 | ||
84 | Here we've created a reference to an anonymous array of three elements | |
85 | whose final element is itself a reference to another anonymous array of three | |
86 | elements. (The multidimensional syntax described later can be used to | |
87 | access this. For example, after the above, C<< $arrayref->[2][1] >> would have | |
88 | the value "b".) | |
89 | ||
90 | Taking a reference to an enumerated list is not the same | |
91 | as using square brackets--instead it's the same as creating | |
92 | a list of references! | |
93 | ||
94 | @list = (\$a, \@b, \%c); | |
95 | @list = \($a, @b, %c); # same thing! | |
96 | ||
97 | As a special case, C<\(@foo)> returns a list of references to the contents | |
98 | of C<@foo>, not a reference to C<@foo> itself. Likewise for C<%foo>, | |
99 | except that the key references are to copies (since the keys are just | |
100 | strings rather than full-fledged scalars). | |
101 | ||
102 | =item 3. | |
103 | ||
104 | A reference to an anonymous hash can be created using curly | |
105 | brackets: | |
106 | ||
107 | $hashref = { | |
108 | 'Adam' => 'Eve', | |
109 | 'Clyde' => 'Bonnie', | |
110 | }; | |
111 | ||
112 | Anonymous hash and array composers like these can be intermixed freely to | |
113 | produce as complicated a structure as you want. The multidimensional | |
114 | syntax described below works for these too. The values above are | |
115 | literals, but variables and expressions would work just as well, because | |
116 | assignment operators in Perl (even within local() or my()) are executable | |
117 | statements, not compile-time declarations. | |
118 | ||
119 | Because curly brackets (braces) are used for several other things | |
120 | including BLOCKs, you may occasionally have to disambiguate braces at the | |
121 | beginning of a statement by putting a C<+> or a C<return> in front so | |
122 | that Perl realizes the opening brace isn't starting a BLOCK. The economy and | |
123 | mnemonic value of using curlies is deemed worth this occasional extra | |
124 | hassle. | |
125 | ||
126 | For example, if you wanted a function to make a new hash and return a | |
127 | reference to it, you have these options: | |
128 | ||
129 | sub hashem { { @_ } } # silently wrong | |
130 | sub hashem { +{ @_ } } # ok | |
131 | sub hashem { return { @_ } } # ok | |
132 | ||
133 | On the other hand, if you want the other meaning, you can do this: | |
134 | ||
135 | sub showem { { @_ } } # ambiguous (currently ok, but may change) | |
136 | sub showem { {; @_ } } # ok | |
137 | sub showem { { return @_ } } # ok | |
138 | ||
139 | The leading C<+{> and C<{;> always serve to disambiguate | |
140 | the expression to mean either the HASH reference, or the BLOCK. | |
141 | ||
142 | =item 4. | |
143 | ||
144 | A reference to an anonymous subroutine can be created by using | |
145 | C<sub> without a subname: | |
146 | ||
147 | $coderef = sub { print "Boink!\n" }; | |
148 | ||
149 | Note the semicolon. Except for the code | |
150 | inside not being immediately executed, a C<sub {}> is not so much a | |
151 | declaration as it is an operator, like C<do{}> or C<eval{}>. (However, no | |
152 | matter how many times you execute that particular line (unless you're in an | |
153 | C<eval("...")>), $coderef will still have a reference to the I<same> | |
154 | anonymous subroutine.) | |
155 | ||
156 | Anonymous subroutines act as closures with respect to my() variables, | |
157 | that is, variables lexically visible within the current scope. Closure | |
158 | is a notion out of the Lisp world that says if you define an anonymous | |
159 | function in a particular lexical context, it pretends to run in that | |
160 | context even when it's called outside the context. | |
161 | ||
162 | In human terms, it's a funny way of passing arguments to a subroutine when | |
163 | you define it as well as when you call it. It's useful for setting up | |
164 | little bits of code to run later, such as callbacks. You can even | |
165 | do object-oriented stuff with it, though Perl already provides a different | |
166 | mechanism to do that--see L<perlobj>. | |
167 | ||
168 | You might also think of closure as a way to write a subroutine | |
169 | template without using eval(). Here's a small example of how | |
170 | closures work: | |
171 | ||
172 | sub newprint { | |
173 | my $x = shift; | |
174 | return sub { my $y = shift; print "$x, $y!\n"; }; | |
175 | } | |
176 | $h = newprint("Howdy"); | |
177 | $g = newprint("Greetings"); | |
178 | ||
179 | # Time passes... | |
180 | ||
181 | &$h("world"); | |
182 | &$g("earthlings"); | |
183 | ||
184 | This prints | |
185 | ||
186 | Howdy, world! | |
187 | Greetings, earthlings! | |
188 | ||
189 | Note particularly that $x continues to refer to the value passed | |
190 | into newprint() I<despite> "my $x" having gone out of scope by the | |
191 | time the anonymous subroutine runs. That's what a closure is all | |
192 | about. | |
193 | ||
194 | This applies only to lexical variables, by the way. Dynamic variables | |
195 | continue to work as they have always worked. Closure is not something | |
196 | that most Perl programmers need trouble themselves about to begin with. | |
197 | ||
198 | =item 5. | |
199 | ||
200 | References are often returned by special subroutines called constructors. | |
201 | Perl objects are just references to a special type of object that happens to know | |
202 | which package it's associated with. Constructors are just special | |
203 | subroutines that know how to create that association. They do so by | |
204 | starting with an ordinary reference, and it remains an ordinary reference | |
205 | even while it's also being an object. Constructors are often | |
206 | named new() and called indirectly: | |
207 | ||
208 | $objref = new Doggie (Tail => 'short', Ears => 'long'); | |
209 | ||
210 | But they don't have to be: | |
211 | ||
212 | $objref = Doggie->new(Tail => 'short', Ears => 'long'); | |
213 | ||
214 | use Term::Cap; | |
215 | $terminal = Term::Cap->Tgetent( { OSPEED => 9600 }); | |
216 | ||
217 | use Tk; | |
218 | $main = MainWindow->new(); | |
219 | $menubar = $main->Frame(-relief => "raised", | |
220 | -borderwidth => 2); | |
221 | ||
222 | =item 6. | |
223 | ||
224 | References of the appropriate type can spring into existence if you | |
225 | dereference them in a context that assumes they exist. Because we haven't | |
226 | talked about dereferencing yet, we can't show you any examples yet. | |
227 | ||
228 | =item 7. | |
229 | ||
230 | A reference can be created by using a special syntax, lovingly known as | |
231 | the *foo{THING} syntax. *foo{THING} returns a reference to the THING | |
232 | slot in *foo (which is the symbol table entry which holds everything | |
233 | known as foo). | |
234 | ||
235 | $scalarref = *foo{SCALAR}; | |
236 | $arrayref = *ARGV{ARRAY}; | |
237 | $hashref = *ENV{HASH}; | |
238 | $coderef = *handler{CODE}; | |
239 | $ioref = *STDIN{IO}; | |
240 | $globref = *foo{GLOB}; | |
241 | ||
242 | All of these are self-explanatory except for C<*foo{IO}>. It returns | |
243 | the IO handle, used for file handles (L<perlfunc/open>), sockets | |
244 | (L<perlfunc/socket> and L<perlfunc/socketpair>), and directory | |
245 | handles (L<perlfunc/opendir>). For compatibility with previous | |
246 | versions of Perl, C<*foo{FILEHANDLE}> is a synonym for C<*foo{IO}>, though it | |
247 | is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn | |
248 | of its use. | |
249 | ||
250 | C<*foo{THING}> returns undef if that particular THING hasn't been used yet, | |
251 | except in the case of scalars. C<*foo{SCALAR}> returns a reference to an | |
252 | anonymous scalar if $foo hasn't been used yet. This might change in a | |
253 | future release. | |
254 | ||
255 | C<*foo{IO}> is an alternative to the C<*HANDLE> mechanism given in | |
256 | L<perldata/"Typeglobs and Filehandles"> for passing filehandles | |
257 | into or out of subroutines, or storing into larger data structures. | |
258 | Its disadvantage is that it won't create a new filehandle for you. | |
259 | Its advantage is that you have less risk of clobbering more than | |
260 | you want to with a typeglob assignment. (It still conflates file | |
261 | and directory handles, though.) However, if you assign the incoming | |
262 | value to a scalar instead of a typeglob as we do in the examples | |
263 | below, there's no risk of that happening. | |
264 | ||
265 | splutter(*STDOUT); # pass the whole glob | |
266 | splutter(*STDOUT{IO}); # pass both file and dir handles | |
267 | ||
268 | sub splutter { | |
269 | my $fh = shift; | |
270 | print $fh "her um well a hmmm\n"; | |
271 | } | |
272 | ||
273 | $rec = get_rec(*STDIN); # pass the whole glob | |
274 | $rec = get_rec(*STDIN{IO}); # pass both file and dir handles | |
275 | ||
276 | sub get_rec { | |
277 | my $fh = shift; | |
278 | return scalar <$fh>; | |
279 | } | |
280 | ||
281 | =back | |
282 | ||
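The backslash operator, the anonymous composers, and closures from the list above can be combined in one short sketch. The variable names and data here (@names, %spouse, make_counter) are hypothetical and chosen only for illustration; the arrow syntax used to read the values back is described in the next section.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @names  = ('Adam', 'Clyde');
my %spouse = (Adam => 'Eve', Clyde => 'Bonnie');

my $arrayref = \@names;                                     # backslash operator
my $nested   = [1, 2, ['a', 'b', 'c']];                     # anonymous array composer
my $record   = { names => $arrayref, spouse => \%spouse };  # anonymous hash composer

# \(@names) yields one reference per element, not a reference to @names.
my @elemrefs = \(@names);

# A closure: each generated sub keeps its own private $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}
my $from_five = make_counter(5);
$from_five->();                    # returns 5
my $next = $from_five->();         # returns 6

my $count_refs = scalar @elemrefs;
print "$nested->[2][1] $record->{spouse}{Clyde} $count_refs $next\n";
# prints: b Bonnie 2 6
```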
283 | =head2 Using References | |
284 | ||
285 | That's it for creating references. By now you're probably dying to | |
286 | know how to use references to get back to your long-lost data. There | |
287 | are several basic methods. | |
288 | ||
289 | =over 4 | |
290 | ||
291 | =item 1. | |
292 | ||
293 | Anywhere you'd put an identifier (or chain of identifiers) as part | |
294 | of a variable or subroutine name, you can replace the identifier with | |
295 | a simple scalar variable containing a reference of the correct type: | |
296 | ||
297 | $bar = $$scalarref; | |
298 | push(@$arrayref, $filename); | |
299 | $$arrayref[0] = "January"; | |
300 | $$hashref{"KEY"} = "VALUE"; | |
301 | &$coderef(1,2,3); | |
302 | print $globref "output\n"; | |
303 | ||
304 | It's important to understand that we are specifically I<not> dereferencing | |
305 | C<$arrayref[0]> or C<$hashref{"KEY"}> there. The dereference of the | |
306 | scalar variable happens I<before> it does any key lookups. Anything more | |
307 | complicated than a simple scalar variable must use methods 2 or 3 below. | |
308 | However, a "simple scalar" includes an identifier that itself uses method | |
309 | 1 recursively. Therefore, the following prints "howdy". | |
310 | ||
311 | $refrefref = \\\"howdy"; | |
312 | print $$$$refrefref; | |
313 | ||
314 | =item 2. | |
315 | ||
316 | Anywhere you'd put an identifier (or chain of identifiers) as part of a | |
317 | variable or subroutine name, you can replace the identifier with a | |
318 | BLOCK returning a reference of the correct type. In other words, the | |
319 | previous examples could be written like this: | |
320 | ||
321 | $bar = ${$scalarref}; | |
322 | push(@{$arrayref}, $filename); | |
323 | ${$arrayref}[0] = "January"; | |
324 | ${$hashref}{"KEY"} = "VALUE"; | |
325 | &{$coderef}(1,2,3); | |
326 | $globref->print("output\n"); # iff IO::Handle is loaded | |
327 | ||
328 | Admittedly, it's a little silly to use the curlies in this case, but | |
329 | the BLOCK can contain any arbitrary expression, in particular, | |
330 | subscripted expressions: | |
331 | ||
332 | &{ $dispatch{$index} }(1,2,3); # call correct routine | |
333 | ||
334 | Because the curlies can be omitted in the simple case of C<$$x>, | |
335 | people often make the mistake of viewing the dereferencing symbols as | |
336 | proper operators, and wonder about their precedence. If they were, | |
337 | though, you could use parentheses instead of braces. That's not the case. | |
338 | Consider the difference below; case 0 is a short-hand version of case 1, | |
339 | I<not> case 2: | |
340 | ||
341 | $$hashref{"KEY"} = "VALUE"; # CASE 0 | |
342 | ${$hashref}{"KEY"} = "VALUE"; # CASE 1 | |
343 | ${$hashref{"KEY"}} = "VALUE"; # CASE 2 | |
344 | ${$hashref->{"KEY"}} = "VALUE"; # CASE 3 | |
345 | ||
346 | Case 2 is also deceptive in that you're accessing a variable | |
347 | called %hashref, not dereferencing through $hashref to the hash | |
348 | it's presumably referencing. That would be case 3. | |
349 | ||
350 | =item 3. | |
351 | ||
352 | Subroutine calls and lookups of individual array elements arise often | |
353 | enough that it gets cumbersome to use method 2. As a form of | |
354 | syntactic sugar, the examples for method 2 may be written: | |
355 | ||
356 | $arrayref->[0] = "January"; # Array element | |
357 | $hashref->{"KEY"} = "VALUE"; # Hash element | |
358 | $coderef->(1,2,3); # Subroutine call | |
359 | ||
360 | The left side of the arrow can be any expression returning a reference, | |
361 | including a previous dereference. Note that C<$array[$x]> is I<not> the | |
362 | same thing as C<< $array->[$x] >> here: | |
363 | ||
364 | $array[$x]->{"foo"}->[0] = "January"; | |
365 | ||
366 | This is one of the cases we mentioned earlier in which references could | |
367 | spring into existence when in an lvalue context. Before this | |
368 | statement, C<$array[$x]> may have been undefined. If so, it's | |
369 | automatically defined with a hash reference so that we can look up | |
370 | C<{"foo"}> in it. Likewise C<< $array[$x]->{"foo"} >> will automatically get | |
371 | defined with an array reference so that we can look up C<[0]> in it. | |
372 | This process is called I<autovivification>. | |
373 | ||
374 | One more thing here. The arrow is optional I<between> bracket | |
375 | subscripts, so you can shrink the above down to | |
376 | ||
377 | $array[$x]{"foo"}[0] = "January"; | |
378 | ||
379 | Which, in the degenerate case of using only ordinary arrays, gives you | |
380 | multidimensional arrays just like C's: | |
381 | ||
382 | $score[$x][$y][$z] += 42; | |
383 | ||
384 | Well, okay, not entirely like C's arrays, actually. C doesn't know how | |
385 | to grow its arrays on demand. Perl does. | |
386 | ||
387 | =item 4. | |
388 | ||
389 | If a reference happens to be a reference to an object, then there are | |
390 | probably methods to access the things referred to, and you should probably | |
391 | stick to those methods unless you're in the class package that defines the | |
392 | object's methods. In other words, be nice, and don't violate the object's | |
393 | encapsulation without a very good reason. Perl does not enforce | |
394 | encapsulation. We are not totalitarians here. We do expect some basic | |
395 | civility though. | |
396 | ||
397 | =back | |
398 | ||
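Autovivification, as described under method 3 above, can be observed directly. This is a minimal sketch; @array and the index $x are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

my @array;                       # starts out empty
my $x = 3;                       # hypothetical index

# Assigning through undefined slots creates the intermediate
# hash and array references on demand.
$array[$x]{"foo"}[0] = "January";

my $outer_type = ref $array[$x];            # "HASH"
my $inner_type = ref $array[$x]{"foo"};     # "ARRAY"
my $length     = scalar @array;             # 4, since indices 0..3 now exist

print "$outer_type $inner_type $length\n";  # prints: HASH ARRAY 4
```

Elements 0 through 2 of the array remain undefined; only the path that was actually assigned through is created.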
399 | Using a string or number as a reference produces a symbolic reference, | |
400 | as explained above. Using a reference as a number produces an | |
401 | integer representing its storage location in memory. The only | |
402 | useful thing to be done with this is to compare two references | |
403 | numerically to see whether they refer to the same location. | |
404 | ||
405 | if ($ref1 == $ref2) { # cheap numeric compare of references | |
406 | print "refs 1 and 2 refer to the same thing\n"; | |
407 | } | |
408 | ||
409 | Using a reference as a string produces both its referent's type, | |
410 | including any package blessing as described in L<perlobj>, as well | |
411 | as the numeric address expressed in hex. The ref() operator returns | |
412 | just the type of thing the reference is pointing to, without the | |
413 | address. See L<perlfunc/ref> for details and examples of its use. | |
414 | ||
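The numeric, string, and ref() views of a reference can be compared side by side. The following is a small sketch with hypothetical variables; the hexadecimal address in the string form differs from run to run, so the example only checks its shape.

```perl
use strict;
use warnings;

my @data  = (1, 2, 3);           # hypothetical data
my $ref1  = \@data;
my $ref2  = \@data;              # second reference to the same array
my $other = [1, 2, 3];           # equal contents, but a different array

print "same\n"      if $ref1 == $ref2;     # same storage location
print "different\n" if $ref1 != $other;    # distinct arrays

my $type = ref $ref1;                      # "ARRAY", without the address
print "$type\n";

# The string form looks like ARRAY(0x...); the address varies per run.
print "string form ok\n" if "$ref1" =~ /^ARRAY\(0x[0-9a-f]+\)$/;
```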
415 | The bless() operator may be used to associate the object a reference | |
416 | points to with a package functioning as an object class. See L<perlobj>. | |
417 | ||
418 | A typeglob may be dereferenced the same way a reference can, because | |
419 | the dereference syntax always indicates the type of reference desired. | |
420 | So C<${*foo}> and C<${\$foo}> both indicate the same scalar variable. | |
421 | ||
422 | Here's a trick for interpolating a subroutine call into a string: | |
423 | ||
424 | print "My sub returned @{[mysub(1,2,3)]} that time.\n"; | |
425 | ||
426 | The way it works is that when the C<@{...}> is seen in the double-quoted | |
427 | string, it's evaluated as a block. The block creates a reference to an | |
428 | anonymous array containing the results of the call to C<mysub(1,2,3)>. So | |
429 | the whole block returns a reference to an array, which is then | |
430 | dereferenced by C<@{...}> and stuck into the double-quoted string. This | |
431 | chicanery is also useful for arbitrary expressions: | |
432 | ||
433 | print "That yields @{[$n + 5]} widgets\n"; | |
434 | ||
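Here is a runnable version of the interpolation trick. The body of mysub() is a hypothetical stand-in that simply doubles its arguments, and $n is an arbitrary value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for mysub(): returns a list of doubled arguments.
sub mysub { return map { $_ * 2 } @_ }

my $n = 37;                      # arbitrary value for illustration

my $line1 = "My sub returned @{[ mysub(1,2,3) ]} that time.";
my $line2 = "That yields @{[ $n + 5 ]} widgets";

print "$line1\n";    # My sub returned 2 4 6 that time.
print "$line2\n";    # That yields 42 widgets
```

The list elements are joined with a space because that is the default value of the list separator variable used when an array is interpolated.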
435 | =head2 Symbolic references | |
436 | ||
437 | We said that references spring into existence as necessary if they are | |
438 | undefined, but we didn't say what happens if a value used as a | |
439 | reference is already defined, but I<isn't> a hard reference. If you | |
440 | use it as a reference, it'll be treated as a symbolic | |
441 | reference. That is, the value of the scalar is taken to be the I<name> | |
442 | of a variable, rather than a direct link to a (possibly) anonymous | |
443 | value. | |
444 | ||
445 | People frequently expect it to work like this. So it does. | |
446 | ||
447 | $name = "foo"; | |
448 | $$name = 1; # Sets $foo | |
449 | ${$name} = 2; # Sets $foo | |
450 | ${$name x 2} = 3; # Sets $foofoo | |
451 | $name->[0] = 4; # Sets $foo[0] | |
452 | @$name = (); # Clears @foo | |
453 | &$name(); # Calls &foo() (as in Perl 4) | |
454 | $pack = "THAT"; | |
455 | ${"${pack}::$name"} = 5; # Sets $THAT::foo without eval | |
456 | ||
457 | This is powerful, and slightly dangerous, in that it's possible | |
458 | to intend (with the utmost sincerity) to use a hard reference, and | |
459 | accidentally use a symbolic reference instead. To protect against | |
460 | that, you can say | |
461 | ||
462 | use strict 'refs'; | |
463 | ||
464 | and then only hard references will be allowed for the rest of the enclosing | |
465 | block. An inner block may countermand that with | |
466 | ||
467 | no strict 'refs'; | |
468 | ||
469 | Only package variables (globals, even if localized) are visible to | |
470 | symbolic references. Lexical variables (declared with my()) aren't in | |
471 | a symbol table, and thus are invisible to this mechanism. For example: | |
472 | ||
473 | local $value = 10; | |
474 | $ref = "value"; | |
475 | { | |
476 | my $value = 20; | |
477 | print $$ref; | |
478 | } | |
479 | ||
480 | This will still print 10, not 20. Remember that local() affects package | |
481 | variables, which are all "global" to the package. | |
482 | ||
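The same point can be sketched in a form that runs under the strict pragma. This variant is an assumption-laden adaptation rather than the example above: it declares the package variable with our instead of local, and it switches off strict refs only inside the block that performs the symbolic dereference.

```perl
use strict;
use warnings;

our $value = 10;                 # package variable, lives in the symbol table
my  $name  = "value";            # holds a variable name, not a hard reference
my  $seen;

{
    my $value = 20;              # lexical: invisible to symbolic references
    no strict 'refs';            # permit the symbolic dereference in this block
    $seen = ${$name};            # looks up the package variable $value
}

print "$seen\n";                 # prints 10, not 20
```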
483 | =head2 Not-so-symbolic references | |
484 | ||
485 | A new feature contributing to readability in perl version 5.001 is that the | |
486 | brackets around a symbolic reference behave more like quotes, just as they | |
487 | always have within a string. That is, | |
488 | ||
489 | $push = "pop on "; | |
490 | print "${push}over"; | |
491 | ||
492 | has always meant to print "pop on over", even though push is | |
493 | a reserved word. This has been generalized to work the same outside | |
494 | of quotes, so that | |
495 | ||
496 | print ${push} . "over"; | |
497 | ||
498 | and even | |
499 | ||
500 | print ${ push } . "over"; | |
501 | ||
502 | will have the same effect. (This would have been a syntax error in | |
503 | Perl 5.000, though Perl 4 allowed it in the spaceless form.) This | |
504 | construct is I<not> considered to be a symbolic reference when you're | |
505 | using strict refs: | |
506 | ||
507 | use strict 'refs'; | |
508 | ${ bareword }; # Okay, means $bareword. | |
509 | ${ "bareword" }; # Error, symbolic reference. | |
510 | ||
511 | Similarly, because of all the subscripting that is done using single | |
512 | words, we've applied the same rule to any bareword that is used for | |
513 | subscripting a hash. So now, instead of writing | |
514 | ||
515 | $array{ "aaa" }{ "bbb" }{ "ccc" } | |
516 | ||
517 | you can write just | |
518 | ||
519 | $array{ aaa }{ bbb }{ ccc } | |
520 | ||
521 | and not worry about whether the subscripts are reserved words. In the | |
522 | rare event that you do wish to do something like | |
523 | ||
524 | $array{ shift } | |
525 | ||
526 | you can force interpretation as a reserved word by adding anything that | |
527 | makes it more than a bareword: | |
528 | ||
529 | $array{ shift() } | |
530 | $array{ +shift } | |
531 | $array{ shift @_ } | |
532 | ||
533 | The C<use warnings> pragma or the B<-w> switch will warn you if it | |
534 | interprets a reserved word as a string. | |
535 | But it will no longer warn you about using lowercase words, because the | |
536 | string is effectively quoted. | |
537 | ||
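The difference between a bareword subscript and a function call in a subscript can be seen in a short sketch. The hash %table, its contents, and the lookup() subroutine are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical data; the first key is the literal string "shift".
my %table = (shift => "bareword key", first => "function result");

sub lookup {
    my $as_string = $table{shift};        # the literal key "shift"
    my $as_call   = $table{ shift() };    # shift(@_) supplies the key
    return ($as_string, $as_call);
}

my @got = lookup("first");
print "$got[0] / $got[1]\n";     # prints: bareword key / function result
```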
538 | =head2 Pseudo-hashes: Using an array as a hash | |
539 | ||
540 | B<WARNING>: This section describes an experimental feature. Details may | |
541 | change without notice in future versions. | |
542 | ||
543 | B<NOTE>: The current user-visible implementation of pseudo-hashes | |
544 | (the weird use of the first array element) is deprecated starting from | |
545 | Perl 5.8.0 and will be removed in Perl 5.10.0, and the feature will be | |
546 | implemented differently. Not only is the current interface rather ugly, | |
547 | but the current implementation slows down normal array and hash use quite | |
548 | noticeably. The 'fields' pragma interface will remain available. | |
549 | ||
550 | Beginning with release 5.005 of Perl, you may use an array reference | |
551 | in some contexts that would normally require a hash reference. This | |
552 | allows you to access array elements using symbolic names, as if they | |
553 | were fields in a structure. | |
554 | ||
555 | For this to work, the array must contain extra information. The first | |
556 | element of the array has to be a hash reference that maps field names | |
557 | to array indices. Here is an example: | |
558 | ||
559 | $struct = [{foo => 1, bar => 2}, "FOO", "BAR"]; | |
560 | ||
561 | $struct->{foo}; # same as $struct->[1], i.e. "FOO" | |
562 | $struct->{bar}; # same as $struct->[2], i.e. "BAR" | |
563 | ||
564 | keys %$struct; # will return ("foo", "bar") in some order | |
565 | values %$struct; # will return ("FOO", "BAR") in the same order | |
566 | ||
567 | while (my($k,$v) = each %$struct) { | |
568 | print "$k => $v\n"; | |
569 | } | |
570 | ||
571 | Perl will raise an exception if you try to access nonexistent fields. | |
572 | To avoid inconsistencies, always use the fields::phash() function | |
573 | provided by the C<fields> pragma. | |
574 | ||
575 | use fields; | |
576 | $pseudohash = fields::phash(foo => "FOO", bar => "BAR"); | |
577 | ||
578 | For better performance, Perl can also do the translation from field | |
579 | names to array indices at compile time for typed object references. | |
580 | See L<fields>. | |
581 | ||
582 | There are two ways to check for the existence of a key in a | |
583 | pseudo-hash. The first is to use exists(). This checks to see if the | |
584 | given field has ever been set. It acts this way to match the behavior | |
585 | of a regular hash. For instance: | |
586 | ||
587 | use fields; | |
588 | $phash = fields::phash([qw(foo bar pants)], ['FOO']); | |
589 | $phash->{pants} = undef; | |
590 | ||
591 | print exists $phash->{foo}; # true, 'foo' was set in the declaration | |
592 | print exists $phash->{bar}; # false, 'bar' has not been used. | |
593 | print exists $phash->{pants}; # true, your 'pants' have been touched | |
594 | ||
595 | The second is to use exists() on the hash reference sitting in the | |
596 | first array element. This checks to see if the given key is a valid | |
597 | field in the pseudo-hash. | |
598 | ||
599 | print exists $phash->[0]{bar}; # true, 'bar' is a valid field | |
600 | print exists $phash->[0]{shoes}; # false, 'shoes' can't be used | |
601 | ||
602 | delete() on a pseudo-hash element only deletes the value corresponding | |
603 | to the key, not the key itself. To delete the key, you'll have to | |
604 | explicitly delete it from the first hash element. | |
605 | ||
606 | print delete $phash->{foo}; # prints $phash->[1], "FOO" | |
607 | print exists $phash->{foo}; # false | |
608 | print exists $phash->[0]{foo}; # true, key still exists | |
609 | print delete $phash->[0]{foo}; # now key is gone | |
610 | print $phash->{foo}; # runtime exception | |
611 | ||
612 | =head2 Function Templates | |
613 | ||
614 | As explained above, a closure is an anonymous function with access to the | |
615 | lexical variables visible when that function was compiled. It retains | |
616 | access to those variables even though it doesn't get run until later, | |
617 | such as in a signal handler or a Tk callback. | |
618 | ||
619 | Using a closure as a function template allows us to generate many functions | |
620 | that act similarly. Suppose you wanted functions named after the colors | |
621 | that generated HTML font changes for the various colors: | |
622 | ||
623 | print "Be ", red("careful"), " with that ", green("light"); | |
624 | ||
625 | The red() and green() functions would be similar. To create these, | |
626 | we'll assign a closure to a typeglob of the name of the function we're | |
627 | trying to build. | |
628 | ||
629 | @colors = qw(red blue green yellow orange purple violet); | |
630 | for my $name (@colors) { | |
631 | no strict 'refs'; # allow symbol table manipulation | |
632 | *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" }; | |
633 | } | |
634 | ||
635 | Now all those different functions appear to exist independently. You can | |
636 | call red(), RED(), blue(), BLUE(), green(), etc. This technique saves on | |
637 | both compile time and memory use, and is less error-prone as well, since | |
638 | syntax checks happen at compile time. It's critical that any variables in | |
639 | the anonymous subroutine be lexicals in order to create a proper closure. | |
640 | That's the reason for the C<my> on the loop iteration variable. | |
641 | ||
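A complete, runnable version of the template loop might look like the following sketch. It uses a shortened, hypothetical color list and calls the generated functions with parentheses so that they need not be known at compile time.

```perl
use strict;
use warnings;

my @colors = qw(red green);      # hypothetical subset of the color list

for my $name (@colors) {
    no strict 'refs';            # allow symbol table manipulation
    # Install the same closure under both the lowercase and uppercase name.
    *$name = *{uc $name} = sub { "<FONT COLOR='$name'>@_</FONT>" };
}

print red("careful"), "\n";      # <FONT COLOR='red'>careful</FONT>
print GREEN("light"), "\n";      # <FONT COLOR='green'>light</FONT>
```

Each generated subroutine remembers the value that $name had on its own pass through the loop, which is why the iteration variable has to be a lexical.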
642 | This is one of the only places where giving a prototype to a closure makes | |
643 | much sense. If you wanted to impose scalar context on the arguments of | |
644 | these functions (probably not a wise idea for this particular example), | |
645 | you could have written it this way instead: | |
646 | ||
647 | *$name = sub ($) { "<FONT COLOR='$name'>$_[0]</FONT>" }; | |
648 | ||
649 | However, since prototype checking happens at compile time, the assignment | |
650 | above happens too late to be of much use. You could address this by | |
651 | putting the whole loop of assignments within a BEGIN block, forcing it | |
652 | to occur during compilation. | |
653 | ||
654 | Access to lexicals that change over time--like those in the C<for> loop | |
655 | above--only works with closures, not general subroutines. In the general | |
656 | case, then, named subroutines do not nest properly, although anonymous | |
657 | ones do. If you are accustomed to using nested subroutines in other | |
658 | programming languages with their own private variables, you'll have to | |
659 | work at it a bit in Perl. The intuitive coding of this type of thing | |
660 | incurs mysterious warnings about "will not stay shared". For example, | |
661 | this won't work: | |
662 | ||
663 | sub outer { | |
664 | my $x = $_[0] + 35; | |
665 | sub inner { return $x * 19 } # WRONG | |
666 | return $x + inner(); | |
667 | } | |
668 | ||
669 | A work-around is the following: | |
670 | ||
671 | sub outer { | |
672 | my $x = $_[0] + 35; | |
673 | local *inner = sub { return $x * 19 }; | |
674 | return $x + inner(); | |
675 | } | |
676 | ||
677 | Now inner() can only be called from within outer(), because of the | |
678 | temporary assignment of the closure (anonymous subroutine). But when | |
679 | it is called, it has normal access to the lexical variable $x from the scope | |
680 | of outer(). | |
681 | ||
682 | This has the interesting effect of creating a function local to another | |
683 | function, something not normally supported in Perl. | |
684 | ||
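If the nested function does not need a name at all, a lexical code reference gives a similar result. This is an alternative sketch, not the technique described above: it keeps the closure in a my variable (here the hypothetical $inner) and calls it through the arrow syntax, so no typeglob is touched.

```perl
use strict;
use warnings;

sub outer {
    my $x = $_[0] + 35;
    my $inner = sub { return $x * 19 };   # private to this call of outer()
    return $x + $inner->();
}

my $result = outer(5);           # $x is 40, so 40 + 40 * 19
print "$result\n";               # prints 800
```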
685 | =head1 WARNING | |
686 | ||
687 | You may not (usefully) use a reference as the key to a hash. It will be | |
688 | converted into a string: | |
689 | ||
690 | $x{ \$a } = $a; | |
691 | ||
692 | If you try to dereference the key, it won't do a hard dereference, and | |
693 | you won't accomplish what you're attempting. You might want to do something | |
694 | more like | |
695 | ||
696 | $r = \@a; | |
697 | $x{ $r } = $r; | |
698 | ||
699 | And then at least you can use the values(), which will be | |
700 | real refs, instead of the keys(), which won't. | |
701 | ||
702 | The standard Tie::RefHash module provides a convenient workaround to this. | |
703 | ||
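The following sketch contrasts an ordinary hash with one tied to Tie::RefHash. The variable names are hypothetical; the point is only that keys() on the tied hash hands back usable references, while the ordinary hash hands back strings.

```perl
use strict;
use warnings;
use Tie::RefHash;

my @a = (1, 2, 3);               # hypothetical data
my $r = \@a;

my %plain;
$plain{$r} = "plain";            # key is stringified, e.g. "ARRAY(0x...)"

tie my %byref, 'Tie::RefHash';
$byref{$r} = "tied";             # key remains a real reference

my ($plain_key) = keys %plain;
my ($tied_key)  = keys %byref;

my $plain_kind = ref($plain_key) ? "ref" : "string";
my $tied_kind  = ref($tied_key)  ? "ref" : "string";
my $elements   = scalar @$tied_key;

print "$plain_kind $tied_kind $elements\n";   # prints: string ref 3
```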
704 | =head1 SEE ALSO | |
705 | ||
706 | Besides the obvious documents, source code can be instructive. | |
707 | Some pathological examples of the use of references can be found | |
708 | in the F<t/op/ref.t> regression test in the Perl source directory. | |
709 | ||
710 | See also L<perldsc> and L<perllol> for how to use references to create | |
711 | complex data structures, and L<perltoot>, L<perlobj>, and L<perlbot> | |
712 | for how to use them to create objects. |