1 | |
2 | =head1 NAME | |
3 | ||
4 | perlreftut - Mark's very short tutorial about references | |
5 | ||
6 | =head1 DESCRIPTION | |
7 | ||
8 | One of the most important new features in Perl 5 was the capability to | |
9 | manage complicated data structures like multidimensional arrays and | |
10 | nested hashes. To enable these, Perl 5 introduced a feature called | |
11 | `references', and using references is the key to managing complicated, | |
12 | structured data in Perl. Unfortunately, there's a lot of funny syntax | |
13 | to learn, and the main manual page can be hard to follow. The manual | |
14 | is quite complete, and sometimes people find that a problem, because | |
15 | it can be hard to tell what is important and what isn't. | |
16 | ||
17 | Fortunately, you only need to know 10% of what's in the main page to get | |
18 | 90% of the benefit. This page will show you that 10%. | |
19 | ||
20 | =head1 Who Needs Complicated Data Structures? | |
21 | ||
22 | One problem that came up all the time in Perl 4 was how to represent a | |
23 | hash whose values were lists. Perl 4 had hashes, of course, but the | |
24 | values had to be scalars; they couldn't be lists. | |
25 | ||
26 | Why would you want a hash of lists? Let's take a simple example: You | |
27 | have a file of city and country names, like this: | |
28 | ||
29 | Chicago, USA | |
30 | Frankfurt, Germany | |
31 | Berlin, Germany | |
32 | Washington, USA | |
33 | Helsinki, Finland | |
34 | New York, USA | |
35 | ||
36 | and you want to produce an output like this, with each country mentioned | |
37 | once, and then an alphabetical list of the cities in that country: | |
38 | ||
39 | Finland: Helsinki. | |
40 | Germany: Berlin, Frankfurt. | |
41 | USA: Chicago, New York, Washington. | |
42 | ||
43 | The natural way to do this is to have a hash whose keys are country | |
44 | names. Associated with each country name key is a list of the cities in | |
45 | that country. Each time you read a line of input, split it into a country | |
46 | and a city, look up the list of cities already known to be in that | |
47 | country, and append the new city to the list. When you're done reading | |
48 | the input, iterate over the hash as usual, sorting each list of cities | |
49 | before you print it out. | |
50 | ||
51 | If hash values can't be lists, you lose. In Perl 4, hash values can't | |
52 | be lists; they can only be strings. You lose. You'd probably have to | |
53 | combine all the cities into a single string somehow, and then when | |
54 | time came to write the output, you'd have to break the string into a | |
55 | list, sort the list, and turn it back into a string. This is messy | |
56 | and error-prone. And it's frustrating, because Perl already has | |
57 | perfectly good lists that would solve the problem if only you could | |
58 | use them. | |
59 | ||
60 | =head1 The Solution | |
61 | ||
62 | By the time Perl 5 rolled around, we were already stuck with this | |
63 | design: Hash values must be scalars. The solution to this is | |
64 | references. | |
65 | ||
66 | A reference is a scalar value that I<refers to> an entire array or an | |
67 | entire hash (or to just about anything else). Names are one kind of | |
68 | reference that you're already familiar with. Think of the President | |
69 | of the United States: a messy, inconvenient bag of blood and bones. | |
70 | But to talk about him, or to represent him in a computer program, all | |
71 | you need is the easy, convenient scalar string "George Bush". | |
72 | ||
73 | References in Perl are like names for arrays and hashes. They're | |
74 | Perl's private, internal names, so you can be sure they're | |
75 | unambiguous. Unlike "George Bush", a reference only refers to one | |
76 | thing, and you always know what it refers to. If you have a reference | |
77 | to an array, you can recover the entire array from it. If you have a | |
78 | reference to a hash, you can recover the entire hash. But the | |
79 | reference is still an easy, compact scalar value. | |
80 | ||
81 | You can't have a hash whose values are arrays; hash values can only be | |
82 | scalars. We're stuck with that. But a single reference can refer to | |
83 | an entire array, and references are scalars, so you can have a hash of | |
84 | references to arrays, and it'll act a lot like a hash of arrays, and | |
85 | it'll be just as useful as a hash of arrays. | |
86 | ||
87 | We'll come back to this city-country problem later, after we've seen | |
88 | some syntax for managing references. | |
89 | ||
90 | ||
91 | =head1 Syntax | |
92 | ||
93 | There are just two ways to make a reference, and just two ways to use | |
94 | it once you have it. | |
95 | ||
96 | =head2 Making References | |
97 | ||
98 | B<Make Rule 1> | |
99 | ||
100 | If you put a C<\> in front of a variable, you get a | |
101 | reference to that variable. | |
102 | ||
103 | $aref = \@array; # $aref now holds a reference to @array | |
104 | $href = \%hash; # $href now holds a reference to %hash | |
105 | ||
106 | Once the reference is stored in a variable like $aref or $href, you | |
107 | can copy it or store it just the same as any other scalar value: | |
108 | ||
109 | $xy = $aref; # $xy now holds a reference to @array | |
110 | $p[3] = $href; # $p[3] now holds a reference to %hash | |
111 | $z = $p[3]; # $z now holds a reference to %hash | |
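
A minimal sketch of what B<Make Rule 1> implies: copying a reference copies only the scalar, so every copy still refers to the one original array. The variable contents here are made up for illustration, and the C<@{$aref}> and C<${$xy}[3]> forms are the ones explained under "Using References".

```perl
use strict;
use warnings;

my @array = (10, 20, 30);
my $aref  = \@array;    # Make Rule 1: a reference to the named array
my $xy    = $aref;      # copies the reference, not the array

push @array, 40;        # change the array through its own name

# Both references see the same, updated array.
my $count_via_aref = scalar @{$aref};
my $last_via_xy    = ${$xy}[3];

print "elements: $count_via_aref, last: $last_via_xy\n";
```

If the copy had duplicated the array instead, the C<push> would not be visible through C<$xy>.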
112 | ||
113 | ||
114 | These examples show how to make references to variables with names. | |
115 | Sometimes you want to make an array or a hash that doesn't have a | |
116 | name. This is analogous to the way you like to be able to use the | |
117 | string C<"\n"> or the number 80 without having to store it in a named | |
118 | variable first. | |
119 | ||
120 | B<Make Rule 2> | |
121 | ||
122 | C<[ ITEMS ]> makes a new, anonymous array, and returns a reference to | |
123 | that array. C<{ ITEMS }> makes a new, anonymous hash, and returns a | |
124 | reference to that hash. | |
125 | ||
126 | $aref = [ 1, "foo", undef, 13 ]; | |
127 | # $aref now holds a reference to an array | |
128 | ||
129 | $href = { APR => 4, AUG => 8 }; | |
130 | # $href now holds a reference to a hash | |
131 | ||
132 | ||
133 | The references you get from rule 2 are the same kind of | |
134 | references that you get from rule 1: | |
135 | ||
136 | # This: | |
137 | $aref = [ 1, 2, 3 ]; | |
138 | ||
139 | # Does the same as this: | |
140 | @array = (1, 2, 3); | |
141 | $aref = \@array; | |
142 | ||
143 | ||
144 | The first line is an abbreviation for the following two lines, except | |
145 | that it doesn't create the superfluous array variable C<@array>. | |
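
Because C<[ ITEMS ]> and C<{ ITEMS }> each return an ordinary scalar, they can appear anywhere a scalar can, including inside one another. The sketch below assumes a small made-up table of cities and uses the C<ref> function (described under "The Rest") only to show what kind of thing each reference points to.

```perl
use strict;
use warnings;

# Each [ ... ] yields a scalar reference, so it can sit in a hash value slot.
my $cities_by_country = {
    Germany => [ 'Berlin', 'Frankfurt' ],
    Finland => [ 'Helsinki' ],
};

my $outer_kind = ref($cities_by_country);
my $inner_kind = ref(${$cities_by_country}{Germany});

print "$outer_kind of $inner_kind\n";
```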
146 | ||
147 | ||
148 | =head2 Using References | |
149 | ||
150 | What can you do with a reference once you have it? It's a scalar | |
151 | value, and we've seen that you can store it as a scalar and get it back | |
152 | again just like any scalar. There are just two more ways to use it: | |
153 | ||
154 | B<Use Rule 1> | |
155 | ||
156 | If C<$aref> contains a reference to an array, then you | |
157 | can put C<{$aref}> anywhere you would normally put the name of an | |
158 | array. For example, C<@{$aref}> instead of C<@array>. | |
159 | ||
160 | Here are some examples of that: | |
161 | ||
162 | Arrays: | |
163 | ||
164 | ||
165 |     @a                @{$aref}              An array | |
166 |     reverse @a        reverse @{$aref}      Reverse the array | |
167 |     $a[3]             ${$aref}[3]           An element of the array | |
168 |     $a[3] = 17        ${$aref}[3] = 17      Assigning an element | |
169 | ||
170 | ||
171 | On each line are two expressions that do the same thing. The | |
172 | left-hand versions operate on the array C<@a>, and the right-hand | |
173 | versions operate on the array that is referred to by C<$aref>, but | |
174 | once they find the array they're operating on, they do the same things | |
175 | to the arrays. | |
176 | ||
177 | Using a hash reference is I<exactly> the same: | |
178 | ||
179 |     %h                %{$href}              A hash | |
180 |     keys %h           keys %{$href}         Get the keys from the hash | |
181 |     $h{'red'}         ${$href}{'red'}       An element of the hash | |
182 |     $h{'red'} = 17    ${$href}{'red'} = 17  Assigning an element | |
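
The rows above can be tried out directly. This is only a sketch with invented data: it builds one anonymous array and one anonymous hash, then applies B<Use Rule 1> to each of them.

```perl
use strict;
use warnings;

my $aref = [ 3, 1, 2 ];
my $href = { red => 1, blue => 2 };

my @sorted = sort { $a <=> $b } @{$aref};   # the whole array
my $size   = scalar @{$aref};               # number of elements
${$aref}[0] = 17;                           # assign one element

my @names = sort keys %{$href};             # keys of the hash
${$href}{green} = 3;                        # add a new key

print "sorted: @sorted; size: $size; first: ${$aref}[0]\n";
print "keys: @names; green: ${$href}{green}\n";
```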
183 | ||
184 | ||
185 | B<Use Rule 2> | |
186 | ||
187 | C<${$aref}[3]> is too hard to read, so you can write C<< $aref->[3] >> | |
188 | instead. | |
189 | ||
190 | C<${$href}{red}> is too hard to read, so you can write | |
191 | C<< $href->{red} >> instead. | |
192 | ||
193 | Most often, when you have an array or a hash, you want to get or set a | |
194 | single element from it. C<${$aref}[3]> and C<${$href}{'red'}> have | |
195 | too much punctuation, and Perl lets you abbreviate. | |
196 | ||
197 | If C<$aref> holds a reference to an array, then C<< $aref->[3] >> is | |
198 | the fourth element of the array. Don't confuse this with C<$aref[3]>, | |
199 | which is the fourth element of a totally different array, one | |
200 | deceptively named C<@aref>. C<$aref> and C<@aref> are unrelated the | |
201 | same way that C<$item> and C<@item> are. | |
202 | ||
203 | Similarly, C<< $href->{'red'} >> is part of the hash referred to by | |
204 | the scalar variable C<$href>, perhaps even one with no name. | |
205 | C<$href{'red'}> is part of the deceptively named C<%href> hash. It's | |
206 | easy to leave out the C<< -> >> by accident, and if you do, you'll get | |
207 | bizarre results when your program gets array and hash elements out of | |
208 | totally unexpected hashes and arrays that weren't the ones you wanted | |
209 | to use. | |
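
To make the pitfall concrete, here is a small sketch that deliberately declares both a scalar C<$aref> and an unrelated array C<@aref>; the contents are invented for illustration.

```perl
use strict;
use warnings;

my $aref = [ 'a', 'b', 'c', 'd' ];   # scalar holding an array reference
my @aref = ( 'W', 'X', 'Y', 'Z' );   # unrelated array that shares the name

my $via_reference = $aref->[3];      # element of the referred-to array
my $via_array     = $aref[3];        # element of @aref

print "$via_reference $via_array\n";
```

The two expressions pick elements from different arrays. One hedge worth knowing: if no C<@aref> had been declared, C<use strict> would reject C<$aref[3]> at compile time, which catches many of these slips early.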
210 | ||
211 | ||
212 | =head1 An Example | |
213 | ||
214 | Let's see a quick example of how all this is useful. | |
215 | ||
216 | First, remember that C<[1, 2, 3]> makes an anonymous array containing | |
217 | C<(1, 2, 3)>, and gives you a reference to that array. | |
218 | ||
219 | Now think about | |
220 | ||
221 | @a = ( [1, 2, 3], | |
222 | [4, 5, 6], | |
223 | [7, 8, 9] | |
224 | ); | |
225 | ||
226 | @a is an array with three elements, and each one is a reference to | |
227 | another array. | |
228 | ||
229 | C<$a[1]> is one of these references. It refers to an array, the array | |
230 | containing C<(4, 5, 6)>, and because it is a reference to an array, | |
231 | B<Use Rule 2> says that we can write C<< $a[1]->[2] >> to get the | |
232 | third element from that array. C<< $a[1]->[2] >> is the 6. | |
233 | Similarly, C<< $a[0]->[1] >> is the 2. What we have here is like a | |
234 | two-dimensional array; you can write C<< $a[ROW]->[COLUMN] >> to get | |
235 | or set the element in any row and any column of the array. | |
236 | ||
237 | The notation still looks a little cumbersome, so there's one more | |
238 | abbreviation: | |
239 | ||
240 | =head1 Arrow Rule | |
241 | ||
242 | In between two B<subscripts>, the arrow is optional. | |
243 | ||
244 | Instead of C<< $a[1]->[2] >>, we can write C<$a[1][2]>; it means the | |
245 | same thing. Instead of C<< $a[0]->[1] >>, we can write C<$a[0][1]>; | |
246 | it means the same thing. | |
247 | ||
248 | Now it really looks like two-dimensional arrays! | |
249 | ||
250 | You can see why the arrows are important. Without them, we would have | |
251 | had to write C<${$a[1]}[2]> instead of C<$a[1][2]>. For | |
252 | three-dimensional arrays, they let us write C<$x[2][3][5]> instead of | |
253 | the unreadable C<${${$x[2]}[3]}[5]>. | |
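
A short sketch of the arrow rule at work, reusing the C<@a> from the example above. It reads and sets elements with the abbreviated form, then walks the rows with B<Use Rule 1>.

```perl
use strict;
use warnings;

my @a = ( [1, 2, 3],
          [4, 5, 6],
          [7, 8, 9],
        );

my $six   = $a[1][2];        # same element as $a[1]->[2]
my $agree = ($a[1]->[2] == ${$a[1]}[2]) ? 'yes' : 'no';

$a[2][0] = 70;               # set row 2, column 0

# Each element of @a is a reference to one row.
my @lines;
for my $row (@a) {
    push @lines, join ' ', @{$row};
}
print "$_\n" for @lines;
```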
254 | ||
255 | ||
256 | =head1 Solution | |
257 | ||
258 | Here's the answer to the problem I posed earlier, of reformatting a | |
259 | file of city and country names. | |
260 | ||
261 |     1   while (<>) { | |
262 |     2     chomp; | |
263 |     3     my ($city, $country) = split /, /; | |
264 |     4     push @{$table{$country}}, $city; | |
265 |     5   } | |
266 |     6 | |
267 |     7   foreach $country (sort keys %table) { | |
268 |     8     print "$country: "; | |
269 |     9     my @cities = @{$table{$country}}; | |
270 |    10     print join ', ', sort @cities; | |
271 |    11     print ".\n"; | |
272 |    12   } | |
273 | ||
274 | ||
275 | The program has two pieces: Lines 1--5 read the input and build a | |
276 | data structure, and lines 7--12 analyze the data and print out the | |
277 | report. | |
278 | ||
279 | In the first part, line 4 is the important one. We're going to have a | |
280 | hash, C<%table>, whose keys are country names, and whose values are | |
281 | (references to) arrays of city names. After acquiring a city and | |
282 | country name, the program looks up C<$table{$country}>, which holds (a | |
283 | reference to) the list of cities seen in that country so far. Line 4 is | |
284 | totally analogous to | |
285 | ||
286 | push @array, $city; | |
287 | ||
288 | except that the name C<array> has been replaced by the reference | |
289 | C<{$table{$country}}>. The C<push> adds a city name to the end of the | |
290 | referred-to array. | |
291 | ||
292 | In the second part, line 9 is the important one. Again, | |
293 | C<$table{$country}> is (a reference to) the list of cities in the country, so | |
294 | we can recover the original list, and copy it into the array C<@cities>, | |
295 | by using C<@{$table{$country}}>. Line 9 is totally analogous to | |
296 | ||
297 | @cities = @array; | |
298 | ||
299 | except that the name C<array> has been replaced by the reference | |
300 | C<{$table{$country}}>. The C<@> tells Perl to get the entire array. | |
301 | ||
302 | The rest of the program is just familiar uses of C<chomp>, C<split>, C<sort>, | |
303 | and C<print>, and doesn't involve references at all. | |
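
For experimenting without an input file, here is a variant sketch of the same program. It is not the listing above: it assumes the sample data is held in an array named C<@input>, collects the report in a string named C<$report> (both names are mine), and declares its variables so that it runs under C<use strict>.

```perl
use strict;
use warnings;

# Sample input from the start of this article, held in memory.
my @input = (
    'Chicago, USA',
    'Frankfurt, Germany',
    'Berlin, Germany',
    'Washington, USA',
    'Helsinki, Finland',
    'New York, USA',
);

my %table;    # country name => reference to an array of city names
for my $line (@input) {
    my ($city, $country) = split /, /, $line;
    push @{ $table{$country} }, $city;
}

my $report = '';
for my $country (sort keys %table) {
    my @cities = sort @{ $table{$country} };
    $report .= "$country: " . join(', ', @cities) . ".\n";
}
print $report;
```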
304 | ||
305 | There's one fine point I skipped. Suppose the program has just read | |
306 | the first line in its input that happens to mention Greece. | |
307 | Control is at line 4, C<$country> is C<'Greece'>, and C<$city> is | |
308 | C<'Athens'>. Since this is the first city in Greece, | |
309 | C<$table{$country}> is undefined; in fact there isn't a C<'Greece'> key | |
310 | in C<%table> at all. What does line 4 do here? | |
311 | ||
312 | 4 push @{$table{$country}}, $city; | |
313 | ||
314 | ||
315 | This is Perl, so it does the exact right thing. It sees that you want | |
316 | to push C<Athens> onto an array that doesn't exist, so it helpfully | |
317 | makes a new, empty, anonymous array for you, installs it in the table, | |
318 | and then pushes C<Athens> onto it. This is called `autovivification'. | |
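
You can watch this happen. The sketch below starts from an empty C<%table>, checks for the key before and after the C<push>, and then inspects what was installed; the variable names other than C<%table> are mine.

```perl
use strict;
use warnings;

my %table;
my $before = exists $table{Greece} ? 'yes' : 'no';   # no such key yet

push @{ $table{Greece} }, 'Athens';    # the array is created on demand

my $after = exists $table{Greece} ? 'yes' : 'no';
my $kind  = ref($table{Greece});       # what kind of reference was installed
my $first = $table{Greece}[0];

print "before=$before after=$after kind=$kind first=$first\n";
```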
319 | ||
320 | ||
321 | =head1 The Rest | |
322 | ||
323 | I promised to give you 90% of the benefit with 10% of the details, and | |
324 | that means I left out 90% of the details. Now that you have an | |
325 | overview of the important parts, it should be easier to read the | |
326 | L<perlref> manual page, which discusses 100% of the details. | |
327 | ||
328 | Some of the highlights of L<perlref>: | |
329 | ||
330 | =over 4 | |
331 | ||
332 | =item * | |
333 | ||
334 | You can make references to anything, including scalars, functions, and | |
335 | other references. | |
336 | ||
337 | =item * | |
338 | ||
339 | In B<Use Rule 1>, you can omit the curly brackets whenever the thing | |
340 | inside them is an atomic scalar variable like C<$aref>. For example, | |
341 | C<@$aref> is the same as C<@{$aref}>, and C<$$aref[1]> is the same as | |
342 | C<${$aref}[1]>. If you're just starting out, you may want to adopt | |
343 | the habit of always including the curly brackets. | |
344 | ||
345 | =item * | |
346 | ||
347 | To see if a variable contains a reference, use the C<ref> function. | |
348 | It returns true if its argument is a reference. Actually it's a | |
349 | little better than that: It returns HASH for hash references and | |
350 | ARRAY for array references. | |
351 | ||
352 | =item * | |
353 | ||
354 | If you try to use a reference like a string, you get strings like | |
355 | ||
356 | ARRAY(0x80f5dec) or HASH(0x826afc0) | |
357 | ||
358 | If you ever see a string that looks like this, you'll know you | |
359 | printed out a reference by mistake. | |
360 | ||
361 | A side effect of this representation is that you can use C<eq> to see | |
362 | if two references refer to the same thing. (But you should usually use | |
363 | C<==> instead because it's much faster.) | |
364 | ||
365 | =item * | |
366 | ||
367 | You can use a string as if it were a reference. If you use the string | |
368 | C<"foo"> as an array reference, it's taken to be a reference to the | |
369 | array C<@foo>. This is called a I<soft reference> or I<symbolic reference>. | |
370 | ||
371 | =back | |
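
Several of these highlights fit in one short sketch. Everything named here is an assumption made for illustration, including the helper function C<double>. Symbolic references are left out because C<use strict> forbids them.

```perl
use strict;
use warnings;

sub double { return $_[0] * 2 }    # hypothetical helper for this sketch

my @array = (1, 2, 3);
my $text  = 'hello';
my $href  = { APR => 4 };

my $aref  = \@array;      # reference to an array
my $other = \@array;      # a second reference to the same array
my $copy  = [ @array ];   # reference to a new array with the same contents
my $sref  = \$text;       # reference to a scalar
my $cref  = \&double;     # reference to a function

# Curly brackets omitted around an atomic scalar variable.
my $count  = @$aref;      # array in scalar context: number of elements
my $second = $$aref[1];   # same as ${$aref}[1]

# ref() names the kind of thing referred to; it is false for a plain string.
my @kinds = map { ref($_) } ($aref, $href, $sref, $cref);
my $plain = ref('just a string');

# == compares references by identity, not by contents.
my $same_array = ($aref == $other) ? 'yes' : 'no';
my $same_copy  = ($aref == $copy)  ? 'yes' : 'no';

print "kinds: @kinds\n";
print "count=$count second=$second doubled=", $cref->(21), "\n";
print "same array: $same_array, copy: $same_copy\n";
```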
372 | ||
373 | You might prefer to go on to L<perllol> instead of L<perlref>; it | |
374 | discusses lists of lists and multidimensional arrays in detail. After | |
375 | that, you should move on to L<perldsc>; it's a Data Structure Cookbook | |
376 | that shows recipes for using and printing out arrays of hashes, hashes | |
377 | of arrays, and other kinds of data. | |
378 | ||
379 | =head1 Summary | |
380 | ||
381 | Everyone needs compound data structures, and in Perl the way you get | |
382 | them is with references. There are four important rules for managing | |
383 | references: Two for making references and two for using them. Once | |
384 | you know these rules you can do most of the important things you need | |
385 | to do with references. | |
386 | ||
387 | =head1 Credits | |
388 | ||
389 | Author: Mark-Jason Dominus, Plover Systems (C<mjd-perl-ref+@plover.com>) | |
390 | ||
391 | This article originally appeared in I<The Perl Journal> | |
392 | ( http://www.tpj.com/ ) volume 3, #2. Reprinted with permission. | |
393 | ||
394 | The original title was I<Understand References Today>. | |
395 | ||
396 | =head2 Distribution Conditions | |
397 | ||
398 | Copyright 1998 The Perl Journal. | |
399 | ||
400 | When included as part of the Standard Version of Perl, or as part of | |
401 | its complete documentation whether printed or otherwise, this work may | |
402 | be distributed only under the terms of Perl's Artistic License. Any | |
403 | distribution of this file or derivatives thereof outside of that | |
404 | package requires that special arrangements be made with the copyright | |
405 | holder. | |
406 | ||
407 | Irrespective of its distribution, all code examples in these files are | |
408 | hereby placed into the public domain. You are permitted and | |
409 | encouraged to use this code in your own programs for fun or for profit | |
410 | as you see fit. A simple comment in the code giving credit would be | |
411 | courteous but is not required. | |
412 | ||
413 | ||
414 | ||
415 | ||
416 | =cut |