Commit | Line | Data |
---|---|---|
920dae64 AT |
1 | =head1 NAME |
2 | X<data structure> X<complex data structure> X<struct> | |
3 | ||
4 | perldsc - Perl Data Structures Cookbook | |
5 | ||
6 | =head1 DESCRIPTION | |
7 | ||
8 | The single feature most sorely lacking in the Perl programming language | |
9 | prior to its 5.0 release was complex data structures. Even without direct | |
10 | language support, some valiant programmers did manage to emulate them, but | |
11 | it was hard work and not for the faint of heart. You could occasionally | |
12 | get away with the C<$m{$a,$b}> notation borrowed from B<awk> in which the | |
13 | keys are actually more like a single concatenated string C<"$a$;$b">, but | |
14 | traversal and sorting were difficult. More desperate programmers even | |
15 | hacked Perl's internal symbol table directly, a strategy that proved hard | |
16 | to develop and maintain--to put it mildly. | |
17 | ||
18 | The 5.0 release of Perl let us have complex data structures. You | |
19 | can now write something like this and, all of a sudden, you have an array | |
20 | with three dimensions! | |
21 | ||
22 | for $x (1 .. 10) { | |
23 | for $y (1 .. 10) { | |
24 | for $z (1 .. 10) { | |
25 | $AoA[$x][$y][$z] = | |
26 | $x ** $y + $z; | |
27 | } | |
28 | } | |
29 | } | |
30 | ||
31 | Alas, however simple this may appear, underneath it's a much more | |
32 | elaborate construct than meets the eye! | |
33 | ||
34 | How do you print it out? Why can't you say just C<print @AoA>? How do | |
35 | you sort it? How can you pass it to a function or get one of these back | |
36 | from a function? Is it an object? Can you save it to disk to read | |
37 | back later? How do you access whole rows or columns of that matrix? Do | |
38 | all the values have to be numeric? | |
39 | ||
40 | As you see, it's quite easy to become confused. While some small portion | |
41 | of the blame for this can be attributed to the reference-based | |
42 | implementation, it's really more due to a lack of existing documentation with | |
43 | examples designed for the beginner. | |
44 | ||
45 | This document is meant to be a detailed but understandable treatment of the | |
46 | many different sorts of data structures you might want to develop. It | |
47 | should also serve as a cookbook of examples. That way, when you need to | |
48 | create one of these complex data structures, you can just pinch, pilfer, or | |
49 | purloin a drop-in example from here. | |
50 | ||
51 | Let's look at each of these possible constructs in detail. There are separate | |
52 | sections on each of the following: | |
53 | ||
54 | =over 5 | |
55 | ||
56 | =item * arrays of arrays | |
57 | ||
58 | =item * hashes of arrays | |
59 | ||
60 | =item * arrays of hashes | |
61 | ||
62 | =item * hashes of hashes | |
63 | ||
64 | =item * more elaborate constructs | |
65 | ||
66 | =back | |
67 | ||
68 | But for now, let's look at general issues common to all | |
69 | these types of data structures. | |
70 | ||
71 | =head1 REFERENCES | |
72 | X<reference> X<dereference> X<dereferencing> X<pointer> | |
73 | ||
74 | The most important thing to understand about all data structures in Perl | |
75 | --including multidimensional arrays--is that even though they might | |
76 | appear otherwise, Perl C<@ARRAY>s and C<%HASH>es are all internally | |
77 | one-dimensional. They can hold only scalar values (meaning a string, | |
78 | number, or a reference). They cannot directly contain other arrays or | |
79 | hashes, but instead contain I<references> to other arrays or hashes. | |
80 | X<multidimensional array> X<array, multidimensional> | |
81 | ||
82 | You can't use a reference to an array or hash in quite the same way that you | |
83 | would a real array or hash. For C or C++ programmers unused to | |
84 | distinguishing between arrays and pointers to the same, this can be | |
85 | confusing. If so, just think of it as the difference between a structure | |
86 | and a pointer to a structure. | |
87 | ||
88 | You can (and should) read more about references in the perlref(1) man | |
89 | page. Briefly, references are rather like pointers that know what they | |
90 | point to. (Objects are also a kind of reference, but we won't be needing | |
91 | them right away--if ever.) This means that when you have something which | |
92 | looks to you like an access to a two-or-more-dimensional array and/or hash, | |
93 | what's really going on is that the base type is | |
94 | merely a one-dimensional entity that contains references to the next | |
95 | level. It's just that you can I<use> it as though it were a | |
96 | two-dimensional one. This is much the way C arrays of pointers, such as | |
97 | C<argv>, work as well. | |
98 | ||
99 | $array[7][12] # array of arrays | |
100 | $array[7]{string} # array of hashes | |
101 | $hash{string}[7] # hash of arrays | |
102 | $hash{string}{'another string'} # hash of hashes | |
103 | ||
104 | Now, because the top level contains only references, if you try to print | |
105 | out your array with a simple print() function, you'll get something | |
106 | that doesn't look very nice, like this: | |
107 | ||
108 | @AoA = ( [2, 3], [4, 5, 7], [0] ); | |
109 | print $AoA[1][2]; | |
110 | 7 | |
111 | print @AoA; | |
112 | ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0) | |
113 | ||
114 | ||
115 | That's because Perl doesn't (ever) implicitly dereference your variables. | |
116 | If you want to get at the thing a reference is referring to, then you have | |
117 | to do this yourself using either prefix typing indicators, like | |
118 | C<${$blah}>, C<@{$blah}>, C<@{$blah[$i]}>, or else postfix pointer arrows, | |
119 | like C<$a-E<gt>[3]>, C<$h-E<gt>{fred}>, or even C<$ob-E<gt>method()-E<gt>[3]>. | |
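As a minimal sketch of both dereferencing styles, the fragment below reuses the sample data from above. The names C<$row>, C<@copy>, C<$last>, and C<$same> are only illustrative and do not appear elsewhere in this document.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

my $row  = $AoA[1];        # a reference to the second inner array
my @copy = @{$row};        # prefix form: dereference the whole array
my $last = $row->[2];      # arrow form: one element through the reference
my $same = ${$row}[2];     # prefix form of the same element access

# Each row has to be dereferenced explicitly before it is printed.
print "@{$_}\n" for @AoA;
```

Printing C<"@{$_}"> rather than C<$_> is what turns each C<ARRAY(0x...)> string back into the values it refers to.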
120 | ||
121 | =head1 COMMON MISTAKES | |
122 | ||
123 | The two most common mistakes made in constructing something like | |
124 | an array of arrays are either accidentally counting the number of | |
125 | elements or else taking a reference to the same memory location | |
126 | repeatedly. Here's the case where you just get the count instead | |
127 | of a nested array: | |
128 | ||
129 | for $i (1..10) { | |
130 | @array = somefunc($i); | |
131 | $AoA[$i] = @array; # WRONG! | |
132 | } | |
133 | ||
134 | That's just the simple case of assigning an array to a scalar and getting | |
135 | its element count. If that's what you really and truly want, then you | |
136 | might do well to consider being a tad more explicit about it, like this: | |
137 | ||
138 | for $i (1..10) { | |
139 | @array = somefunc($i); | |
140 | $counts[$i] = scalar @array; | |
141 | } | |
142 | ||
143 | Here's the case of taking a reference to the same memory location | |
144 | again and again: | |
145 | ||
146 | for $i (1..10) { | |
147 | @array = somefunc($i); | |
148 | $AoA[$i] = \@array; # WRONG! | |
149 | } | |
150 | ||
151 | So, what's the big problem with that? It looks right, doesn't it? | |
152 | After all, I just told you that you need an array of references, so by | |
153 | golly, you've made me one! | |
154 | ||
155 | Unfortunately, while this is true, it's still broken. All the references | |
156 | in @AoA refer to the I<very same place>, and they will therefore all hold | |
157 | whatever was last in @array! It's similar to the problem demonstrated in | |
158 | the following C program: | |
159 | ||
160 | #include <stdio.h> |
161 | #include <pwd.h> |
162 | int main(void) { |
163 | struct passwd *rp, *dp; |
164 | rp = getpwnam("root"); |
165 | dp = getpwnam("daemon"); |
166 | printf("daemon name is %s\nroot name is %s\n", |
167 | dp->pw_name, rp->pw_name); |
168 | } |
169 | ||
170 | Which will print | |
171 | ||
172 | daemon name is daemon | |
173 | root name is daemon | |
174 | ||
175 | The problem is that both C<rp> and C<dp> are pointers to the same location | |
176 | in memory! In C, you'd have to remember to malloc() yourself some new | |
177 | memory. In Perl, you'll want to use the array constructor C<[]> or the | |
178 | hash constructor C<{}> instead. Here's the right way to do the preceding | |
179 | broken code fragments: | |
180 | X<[]> X<{}> | |
181 | ||
182 | for $i (1..10) { | |
183 | @array = somefunc($i); | |
184 | $AoA[$i] = [ @array ]; | |
185 | } | |
186 | ||
187 | The square brackets make a reference to a new array with a I<copy> | |
188 | of what's in @array at the time of the assignment. This is what | |
189 | you want. | |
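The difference between the broken and the correct loop can be seen side by side in a small sketch. Here a literal list stands in for somefunc(), and the array names C<@shared> and C<@copied> are assumptions made only for this illustration.

```perl
use strict;
use warnings;

my (@shared, @copied);
my @array;                       # one array, reused on every pass

for my $i (1 .. 3) {
    @array = ($i, $i * 10);      # stand-in for somefunc($i)
    $shared[$i] = \@array;       # every slot refers to the same @array
    $copied[$i] = [ @array ];    # each slot gets its own fresh copy
}

print "shared: @{ $shared[1] }\n";   # shared: 3 30
print "copied: @{ $copied[1] }\n";   # copied: 1 10
```

Every element of C<@shared> shows the values from the last pass, while C<@copied> keeps what was in C<@array> at the time of each assignment.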
190 | ||
191 | Note that this will produce something similar, but it's | |
192 | much harder to read: | |
193 | ||
194 | for $i (1..10) { | |
195 | @array = 0 .. $i; | |
196 | @{$AoA[$i]} = @array; | |
197 | } | |
198 | ||
199 | Is it the same? Well, maybe so--and maybe not. The subtle difference | |
200 | is that when you assign something in square brackets, you know for sure | |
201 | it's always a brand new reference with a new I<copy> of the data. | |
202 | Something else could be going on in this new case with the C<@{$AoA[$i]}> | |
203 | dereference on the left-hand-side of the assignment. It all depends on | |
204 | whether C<$AoA[$i]> had been undefined to start with, or whether it | |
205 | already contained a reference. If you had already populated @AoA with | |
206 | references, as in | |
207 | ||
208 | $AoA[3] = \@another_array; | |
209 | ||
210 | then the assignment with the indirection on the left-hand-side would | |
211 | use the existing reference that was already there: | |
212 | ||
213 | @{$AoA[3]} = @array; | |
214 | ||
215 | Of course, this I<would> have the "interesting" effect of clobbering | |
216 | @another_array. (Have you ever noticed how when a programmer says | |
217 | something is "interesting", that rather than meaning "intriguing", | |
218 | they're disturbingly more apt to mean that it's "annoying", | |
219 | "difficult", or both? :-) | |
220 | ||
221 | So just remember always to use the array or hash constructors with C<[]> | |
222 | or C<{}>, and you'll be fine, although it's not always optimally | |
223 | efficient. | |
224 | ||
225 | Surprisingly, the following dangerous-looking construct will | |
226 | actually work out fine: | |
227 | ||
228 | for $i (1..10) { | |
229 | my @array = somefunc($i); | |
230 | $AoA[$i] = \@array; | |
231 | } | |
232 | ||
233 | That's because my() is more of a run-time statement than it is a | |
234 | compile-time declaration I<per se>. This means that the my() variable is | |
235 | remade afresh each time through the loop. So even though it I<looks> as | |
236 | though you stored the same variable reference each time, you actually did | |
237 | not! This is a subtle distinction that can produce more efficient code at | |
238 | the risk of misleading all but the most experienced of programmers. So I | |
239 | usually advise against teaching it to beginners. In fact, except for | |
240 | passing arguments to functions, I seldom like to see the gimme-a-reference | |
241 | operator (backslash) used much at all in code. Instead, I advise | |
242 | beginners that they (and most of the rest of us) should try to use the | |
243 | much more easily understood constructors C<[]> and C<{}> instead of | |
244 | relying upon lexical (or dynamic) scoping and hidden reference-counting to | |
245 | do the right thing behind the scenes. | |
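A short sketch, again with a literal list standing in for somefunc(), shows that a my() array declared inside the loop really is a different array on each pass:

```perl
use strict;
use warnings;

my @AoA;
for my $i (1 .. 3) {
    my @array = ($i, $i * 10);   # a new lexical array on each iteration
    $AoA[$i] = \@array;          # so each reference points somewhere new
}

print "@{ $AoA[1] }\n";          # 1 10
print "@{ $AoA[3] }\n";          # 3 30
```

Moving the C<my @array> declaration outside the loop would bring back the shared-reference problem described earlier, which is why this form is easy to get wrong.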
246 | ||
247 | In summary: | |
248 | ||
249 | $AoA[$i] = [ @array ]; # usually best | |
250 | $AoA[$i] = \@array; # perilous; just how my() was that array? | |
251 | @{ $AoA[$i] } = @array; # way too tricky for most programmers | |
252 | ||
253 | ||
254 | =head1 CAVEAT ON PRECEDENCE | |
255 | X<dereference, precedence> X<dereferencing, precedence> | |
256 | ||
257 | Speaking of things like C<@{$AoA[$i]}>, the following are actually the | |
258 | same thing: | |
259 | X<< -> >> | |
260 | ||
261 | $aref->[2][2] # clear | |
262 | $$aref[2][2] # confusing | |
263 | ||
264 | That's because Perl's precedence rules on its five prefix dereferencers | |
265 | (which look like someone swearing: C<$ @ * % &>) make them bind more | |
266 | tightly than the postfix subscripting brackets or braces! This will no | |
267 | doubt come as a great shock to the C or C++ programmer, who is quite | |
268 | accustomed to using C<*a[i]> to mean what's pointed to by the I<i'th> | |
269 | element of C<a>. That is, they first take the subscript, and only then | |
270 | dereference the thing at that subscript. That's fine in C, but this isn't C. | |
271 | ||
272 | The seemingly equivalent construct in Perl, C<$$aref[$i]>, first | |
273 | dereferences $aref, treating it as a reference to an array, and | |
274 | then gives you the I<i'th> value of the array pointed to by $aref. | |
275 | If you wanted the C notion, you'd have to | |
276 | write C<${$AoA[$i]}> to force the C<$AoA[$i]> to get evaluated first | |
277 | before the leading C<$> dereferencer. | |
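The precedence rule can be checked with a small sketch. The 3x3 data and the variable names C<$arrow>, C<$prefix>, C<$braced>, and C<$c_like> are made up for this illustration.

```perl
use strict;
use warnings;

my $aref = [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ];
my @AoA  = @{$aref};               # same inner rows, held in a real array

my $arrow  = $aref->[2][2];        # 9
my $prefix = $$aref[2][2];         # also 9: $aref is dereferenced first
my $braced = ${$aref}[2][2];       # the same thing with explicit braces

# Subscript first, then dereference: element 0 of the row stored in $AoA[2].
my $c_like = ${ $AoA[2] }[0];      # 7
```

The braces in the last line are what force C<$AoA[2]> to be evaluated before the leading C<$> is applied.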
278 | ||
279 | =head1 WHY YOU SHOULD ALWAYS C<use strict> | |
280 | ||
281 | If this is starting to sound scarier than it's worth, relax. Perl has | |
282 | some features to help you avoid its most common pitfalls. The best | |
283 | way to avoid getting confused is to start every program like this: | |
284 | ||
285 | #!/usr/bin/perl -w | |
286 | use strict; | |
287 | ||
288 | This way, you'll be forced to declare all your variables with my(), and | |
289 | accidental "symbolic dereferencing" will be disallowed. Therefore if you'd done | |
290 | this: | |
291 | ||
292 | my $aref = [ | |
293 | [ "fred", "barney", "pebbles", "bambam", "dino", ], | |
294 | [ "homer", "bart", "marge", "maggie", ], | |
295 | [ "george", "jane", "elroy", "judy", ], | |
296 | ]; | |
297 | ||
298 | print $aref[2][2]; | |
299 | ||
300 | The compiler would immediately flag that as an error I<at compile time>, | |
301 | because you were accidentally accessing C<@aref>, an undeclared | |
302 | variable, and it would thereby remind you to write instead: | |
303 | ||
304 | print $aref->[2][2]; |
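One way to watch the strict check fire without stopping your program is to compile the faulty snippet with a string eval and look at C<$@>. This is only a sketch for experimentation; the shortened data and the names C<$ok>, C<$oops>, and C<$error> are assumptions made for the example.

```perl
use strict;
use warnings;

# String eval compiles the snippet at run time, so the compile-time
# error can be captured in $@ instead of stopping this program.
my $ok = eval q{
    my $aref = [ [ "fred", "barney" ], [ "homer", "bart" ] ];
    my $oops = $aref[1][1];    # wrong: refers to the undeclared @aref
    1;
};
my $error = $@;

print "strict caught it: $error" unless $ok;
```

In an ordinary script the same mistake is reported when the file is compiled, before any statement runs.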
305 | ||
306 | =head1 DEBUGGING | |
307 | X<data structure, debugging> X<complex data structure, debugging> | |
308 | X<AoA, debugging> X<HoA, debugging> X<AoH, debugging> X<HoH, debugging> | |
309 | X<array of arrays, debugging> X<hash of arrays, debugging> | |
310 | X<array of hashes, debugging> X<hash of hashes, debugging> | |
311 | ||
312 | Before version 5.002, the standard Perl debugger didn't do a very nice job of | |
313 | printing out complex data structures. With 5.002 or above, the | |
314 | debugger includes several new features, including command line editing as | |
315 | well as the C<x> command to dump out complex data structures. For | |
316 | example, given the same nested list as above, but assigned to $AoA, here's the debugger output: | |
317 | ||
318 | DB<1> x $AoA | |
319 | $AoA = ARRAY(0x13b5a0) | |
320 | 0 ARRAY(0x1f0a24) | |
321 | 0 'fred' | |
322 | 1 'barney' | |
323 | 2 'pebbles' | |
324 | 3 'bambam' | |
325 | 4 'dino' | |
326 | 1 ARRAY(0x13b558) | |
327 | 0 'homer' | |
328 | 1 'bart' | |
329 | 2 'marge' | |
330 | 3 'maggie' | |
331 | 2 ARRAY(0x13b540) | |
332 | 0 'george' | |
333 | 1 'jane' | |
334 | 2 'elroy' | |
335 | 3 'judy' | |
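Outside the debugger, a comparable dump can be produced with the Data::Dumper module that ships with Perl. The sketch below is an alternative to the C<x> command, not part of the debugger itself, and the variable C<$dump> is named only for this example.

```perl
use strict;
use warnings;
use Data::Dumper;

my $AoA = [
    [ "fred", "barney", "pebbles", "bambam", "dino" ],
    [ "homer", "bart", "marge", "maggie" ],
    [ "george", "jane", "elroy", "judy" ],
];

local $Data::Dumper::Indent = 1;   # compact, fixed indentation
my $dump = Dumper($AoA);           # string form of the whole structure
print $dump;
```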
336 | ||
337 | =head1 CODE EXAMPLES | |
338 | ||
339 | Presented with little comment (these will get their own manpages someday), | |
340 | here are short code examples illustrating access of various | |
341 | types of data structures. | |
342 | ||
343 | =head1 ARRAYS OF ARRAYS | |
344 | X<array of arrays> X<AoA> | |
345 | ||
346 | =head2 Declaration of an ARRAY OF ARRAYS | |
347 | ||
348 | @AoA = ( | |
349 | [ "fred", "barney" ], | |
350 | [ "george", "jane", "elroy" ], | |
351 | [ "homer", "marge", "bart" ], | |
352 | ); | |
353 | ||
354 | =head2 Generation of an ARRAY OF ARRAYS | |
355 | ||
356 | # reading from file | |
357 | while ( <> ) { | |
358 | push @AoA, [ split ]; | |
359 | } | |
360 | ||
361 | # calling a function | |
362 | for $i ( 1 .. 10 ) { | |
363 | $AoA[$i] = [ somefunc($i) ]; | |
364 | } | |
365 | ||
366 | # using temp vars | |
367 | for $i ( 1 .. 10 ) { | |
368 | @tmp = somefunc($i); | |
369 | $AoA[$i] = [ @tmp ]; | |
370 | } | |
371 | ||
372 | # add to an existing row | |
373 | push @{ $AoA[0] }, "wilma", "betty"; | |
374 | ||
375 | =head2 Access and Printing of an ARRAY OF ARRAYS | |
376 | ||
377 | # one element | |
378 | $AoA[0][0] = "Fred"; | |
379 | ||
380 | # another element | |
381 | $AoA[1][1] =~ s/(\w)/\u$1/; | |
382 | ||
383 | # print the whole thing with refs | |
384 | for $aref ( @AoA ) { | |
385 | print "\t [ @$aref ],\n"; | |
386 | } | |
387 | ||
388 | # print the whole thing with indices | |
389 | for $i ( 0 .. $#AoA ) { | |
390 | print "\t [ @{$AoA[$i]} ],\n"; | |
391 | } | |
392 | ||
393 | # print the whole thing one at a time | |
394 | for $i ( 0 .. $#AoA ) { | |
395 | for $j ( 0 .. $#{ $AoA[$i] } ) { | |
396 | print "elt $i $j is $AoA[$i][$j]\n"; | |
397 | } | |
398 | } | |
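Whole rows and columns, mentioned in the introduction, can be pulled out of an array of arrays along these lines. This is a sketch using the declaration shown earlier; C<@row>, C<@column>, and C<@slice> are illustrative names.

```perl
use strict;
use warnings;

my @AoA = (
    [ "fred",   "barney" ],
    [ "george", "jane",  "elroy" ],
    [ "homer",  "marge", "bart"  ],
);

my @row    = @{ $AoA[1] };              # one whole row, copied out
my @column = map { $_->[1] } @AoA;      # second element of every row
my @slice  = @{ $AoA[2] }[0, 1];        # first two elements of one row

print "row:    @row\n";       # row:    george jane elroy
print "column: @column\n";    # column: barney jane marge
print "slice:  @slice\n";     # slice:  homer marge
```

A row is simply one inner array, whereas a column has to be gathered by visiting every row in turn.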
399 | ||
400 | =head1 HASHES OF ARRAYS | |
401 | X<hash of arrays> X<HoA> | |
402 | ||
403 | =head2 Declaration of a HASH OF ARRAYS | |
404 | ||
405 | %HoA = ( | |
406 | flintstones => [ "fred", "barney" ], | |
407 | jetsons => [ "george", "jane", "elroy" ], | |
408 | simpsons => [ "homer", "marge", "bart" ], | |
409 | ); | |
410 | ||
411 | =head2 Generation of a HASH OF ARRAYS | |
412 | ||
413 | # reading from file | |
414 | # flintstones: fred barney wilma dino | |
415 | while ( <> ) { | |
416 | next unless s/^(.*?):\s*//; | |
417 | $HoA{$1} = [ split ]; | |
418 | } | |
419 | ||
420 | # reading from file; more temps | |
421 | # flintstones: fred barney wilma dino | |
422 | while ( $line = <> ) { | |
423 | ($who, $rest) = split /:\s*/, $line, 2; | |
424 | @fields = split ' ', $rest; | |
425 | $HoA{$who} = [ @fields ]; | |
426 | } | |
427 | ||
428 | # calling a function that returns a list | |
429 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
430 | $HoA{$group} = [ get_family($group) ]; | |
431 | } | |
432 | ||
433 | # likewise, but using temps | |
434 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
435 | @members = get_family($group); | |
436 | $HoA{$group} = [ @members ]; | |
437 | } | |
438 | ||
439 | # append new members to an existing family | |
440 | push @{ $HoA{"flintstones"} }, "wilma", "betty"; | |
441 | ||
442 | =head2 Access and Printing of a HASH OF ARRAYS | |
443 | ||
444 | # one element | |
445 | $HoA{flintstones}[0] = "Fred"; | |
446 | ||
447 | # another element | |
448 | $HoA{simpsons}[1] =~ s/(\w)/\u$1/; | |
449 | ||
450 | # print the whole thing | |
451 | foreach $family ( keys %HoA ) { | |
452 | print "$family: @{ $HoA{$family} }\n" | |
453 | } | |
454 | ||
455 | # print the whole thing with indices | |
456 | foreach $family ( keys %HoA ) { | |
457 | print "$family: "; |
458 | foreach $i ( 0 .. $#{ $HoA{$family} } ) { | |
459 | print " $i = $HoA{$family}[$i]"; | |
460 | } | |
461 | print "\n"; | |
462 | } | |
463 | ||
464 | # print the whole thing sorted by number of members | |
465 | foreach $family ( sort { @{$HoA{$b}} <=> @{$HoA{$a}} } keys %HoA ) { | |
466 | print "$family: @{ $HoA{$family} }\n" | |
467 | } | |
468 | ||
469 | # print the whole thing sorted by number of members and name | |
470 | foreach $family ( sort { | |
471 | @{$HoA{$b}} <=> @{$HoA{$a}} | |
472 | || | |
473 | $a cmp $b | |
474 | } keys %HoA ) | |
475 | { | |
476 | print "$family: ", join(", ", sort @{ $HoA{$family} }), "\n"; | |
477 | } | |
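The file-reading loops above can be tried without an input file by feeding them an in-memory list. In this sketch the C<@lines> array and its contents are hypothetical sample data, and lines without a colon are skipped just as in the examples above.

```perl
use strict;
use warnings;

# Hypothetical sample input; in the examples above it would come from <>.
my @lines = (
    "flintstones: fred barney wilma dino",
    "jetsons: george jane elroy",
    "not a record",
);

my %HoA;
for my $line (@lines) {
    next unless $line =~ /^(.*?):\s*(.*)$/;   # skip lines with no "name:"
    my ($who, $rest) = ($1, $2);
    $HoA{$who} = [ split ' ', $rest ];        # new anonymous array per key
}

for my $family ( sort keys %HoA ) {
    printf "%s has %d members: %s\n",
        $family, scalar @{ $HoA{$family} }, "@{ $HoA{$family} }";
}
```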
478 | ||
479 | =head1 ARRAYS OF HASHES | |
480 | X<array of hashes> X<AoH> | |
481 | ||
482 | =head2 Declaration of an ARRAY OF HASHES | |
483 | ||
484 | @AoH = ( | |
485 | { | |
486 | Lead => "fred", | |
487 | Friend => "barney", | |
488 | }, | |
489 | { | |
490 | Lead => "george", | |
491 | Wife => "jane", | |
492 | Son => "elroy", | |
493 | }, | |
494 | { | |
495 | Lead => "homer", | |
496 | Wife => "marge", | |
497 | Son => "bart", | |
498 | } | |
499 | ); | |
500 | ||
501 | =head2 Generation of an ARRAY OF HASHES | |
502 | ||
503 | # reading from file | |
504 | # format: LEAD=fred FRIEND=barney | |
505 | while ( <> ) { | |
506 | $rec = {}; | |
507 | for $field ( split ) { | |
508 | ($key, $value) = split /=/, $field; | |
509 | $rec->{$key} = $value; | |
510 | } | |
511 | push @AoH, $rec; | |
512 | } | |
513 | ||
514 | ||
515 | # reading from file | |
516 | # format: LEAD=fred FRIEND=barney | |
517 | # no temp | |
518 | while ( <> ) { | |
519 | push @AoH, { split /[\s=]+/ }; |
520 | } | |
521 | ||
522 | # calling a function that returns a key/value pair list, like | |
523 | # "lead","fred","daughter","pebbles" | |
524 | while ( %fields = getnextpairset() ) { | |
525 | push @AoH, { %fields }; | |
526 | } | |
527 | ||
528 | # likewise, but using no temp vars | |
529 | while (<>) { | |
530 | push @AoH, { parsepairs($_) }; | |
531 | } | |
532 | ||
533 | # add key/value to an element | |
534 | $AoH[0]{pet} = "dino"; | |
535 | $AoH[2]{pet} = "santa's little helper"; | |
536 | ||
537 | =head2 Access and Printing of an ARRAY OF HASHES | |
538 | ||
539 | # one element | |
540 | $AoH[0]{Lead} = "fred"; |
541 | ||
542 | # another element |
543 | $AoH[1]{Lead} =~ s/(\w)/\u$1/; |
544 | ||
545 | # print the whole thing with refs | |
546 | for $href ( @AoH ) { | |
547 | print "{ "; | |
548 | for $role ( keys %$href ) { | |
549 | print "$role=$href->{$role} "; | |
550 | } | |
551 | print "}\n"; | |
552 | } | |
553 | ||
554 | # print the whole thing with indices | |
555 | for $i ( 0 .. $#AoH ) { | |
556 | print "$i is { "; | |
557 | for $role ( keys %{ $AoH[$i] } ) { | |
558 | print "$role=$AoH[$i]{$role} "; | |
559 | } | |
560 | print "}\n"; | |
561 | } | |
562 | ||
563 | # print the whole thing one at a time | |
564 | for $i ( 0 .. $#AoH ) { | |
565 | for $role ( keys %{ $AoH[$i] } ) { | |
566 | print "elt $i $role is $AoH[$i]{$role}\n"; | |
567 | } | |
568 | } | |
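Here is a self-contained sketch of building and printing an array of hashes from key=value text. The C<@lines> sample data is hypothetical, and keys are sorted only so that the output order is predictable.

```perl
use strict;
use warnings;

# Hypothetical sample records in key=value form.
my @lines = ( "lead=fred friend=barney", "lead=homer wife=marge son=bart" );

my @AoH;
for my $line (@lines) {
    my %rec;
    for my $field ( split ' ', $line ) {
        my ($key, $value) = split /=/, $field, 2;
        $rec{$key} = $value;
    }
    push @AoH, { %rec };       # store a copy as a new anonymous hash
}

$AoH[0]{pet} = "dino";         # add a field to one record

for my $i ( 0 .. $#AoH ) {
    my @pairs;
    for my $key ( sort keys %{ $AoH[$i] } ) {
        push @pairs, "$key=$AoH[$i]{$key}";
    }
    print "$i: @pairs\n";
}
```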
569 | ||
570 | =head1 HASHES OF HASHES | |
571 | X<hash of hashes> X<HoH> |
572 | ||
573 | =head2 Declaration of a HASH OF HASHES | |
574 | ||
575 | %HoH = ( | |
576 | flintstones => { | |
577 | lead => "fred", | |
578 | pal => "barney", | |
579 | }, | |
580 | jetsons => { | |
581 | lead => "george", | |
582 | wife => "jane", | |
583 | "his boy" => "elroy", | |
584 | }, | |
585 | simpsons => { | |
586 | lead => "homer", | |
587 | wife => "marge", | |
588 | kid => "bart", | |
589 | }, | |
590 | ); | |
591 | ||
592 | =head2 Generation of a HASH OF HASHES | |
593 | ||
594 | # reading from file | |
595 | # flintstones: lead=fred pal=barney wife=wilma pet=dino | |
596 | while ( <> ) { | |
597 | next unless s/^(.*?):\s*//; | |
598 | $who = $1; | |
599 | for $field ( split ) { | |
600 | ($key, $value) = split /=/, $field; | |
601 | $HoH{$who}{$key} = $value; | |
602 | } | |
603 | } |
604 | ||
605 | # reading from file; more temps | |
606 | while ( <> ) { | |
607 | next unless s/^(.*?):\s*//; | |
608 | $who = $1; | |
609 | $rec = {}; | |
610 | $HoH{$who} = $rec; | |
611 | for $field ( split ) { | |
612 | ($key, $value) = split /=/, $field; | |
613 | $rec->{$key} = $value; | |
614 | } | |
615 | } | |
616 | ||
617 | # calling a function that returns a key,value hash | |
618 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
619 | $HoH{$group} = { get_family($group) }; | |
620 | } | |
621 | ||
622 | # likewise, but using temps | |
623 | for $group ( "simpsons", "jetsons", "flintstones" ) { | |
624 | %members = get_family($group); | |
625 | $HoH{$group} = { %members }; | |
626 | } | |
627 | ||
628 | # append new members to an existing family | |
629 | %new_folks = ( | |
630 | wife => "wilma", | |
631 | pet => "dino", | |
632 | ); | |
633 | ||
634 | for $what (keys %new_folks) { | |
635 | $HoH{flintstones}{$what} = $new_folks{$what}; | |
636 | } | |
637 | ||
638 | =head2 Access and Printing of a HASH OF HASHES | |
639 | ||
640 | # one element | |
641 | $HoH{flintstones}{wife} = "wilma"; | |
642 | ||
643 | # another element | |
644 | $HoH{simpsons}{lead} =~ s/(\w)/\u$1/; | |
645 | ||
646 | # print the whole thing | |
647 | foreach $family ( keys %HoH ) { | |
648 | print "$family: { "; | |
649 | for $role ( keys %{ $HoH{$family} } ) { | |
650 | print "$role=$HoH{$family}{$role} "; | |
651 | } | |
652 | print "}\n"; | |
653 | } | |
654 | ||
655 | # print the whole thing somewhat sorted | |
656 | foreach $family ( sort keys %HoH ) { | |
657 | print "$family: { "; | |
658 | for $role ( sort keys %{ $HoH{$family} } ) { | |
659 | print "$role=$HoH{$family}{$role} "; | |
660 | } | |
661 | print "}\n"; | |
662 | } | |
663 | ||
664 | ||
665 | # print the whole thing sorted by number of members | |
666 | foreach $family ( sort { keys %{$HoH{$b}} <=> keys %{$HoH{$a}} } keys %HoH ) { | |
667 | print "$family: { "; | |
668 | for $role ( sort keys %{ $HoH{$family} } ) { | |
669 | print "$role=$HoH{$family}{$role} "; | |
670 | } | |
671 | print "}\n"; | |
672 | } | |
673 | ||
674 | # establish a sort order (rank) for each role | |
675 | $i = 0; | |
676 | for ( qw(lead wife son daughter pal pet) ) { $rank{$_} = ++$i } | |
677 | ||
678 | # now print the whole thing sorted by number of members | |
679 | foreach $family ( sort { keys %{ $HoH{$b} } <=> keys %{ $HoH{$a} } } keys %HoH ) { | |
680 | print "$family: { "; | |
681 | # and print these according to rank order | |
682 | for $role ( sort { $rank{$a} <=> $rank{$b} } keys %{ $HoH{$family} } ) { | |
683 | print "$role=$HoH{$family}{$role} "; | |
684 | } | |
685 | print "}\n"; | |
686 | } | |
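The size-based ordering used above can be checked on a small hash of hashes. This sketch uses shortened sample data, and the tie-break on family name and the C<@by_size> array are additions made only for the illustration.

```perl
use strict;
use warnings;

my %HoH = (
    flintstones => { lead => "fred",   pal  => "barney" },
    jetsons     => { lead => "george", wife => "jane", son => "elroy" },
);

# Add members: the inner hash is reached through the outer key.
$HoH{flintstones}{pet}  = "dino";
$HoH{flintstones}{wife} = "wilma";

# Families ordered from most to fewest members, ties broken by name.
my @by_size = sort {
    scalar( keys %{ $HoH{$b} } ) <=> scalar( keys %{ $HoH{$a} } )
        or $a cmp $b
} keys %HoH;

print "@by_size\n";    # flintstones jetsons
```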
687 | ||
688 | ||
689 | =head1 MORE ELABORATE RECORDS | |
690 | X<record> X<structure> X<struct> | |
691 | ||
692 | =head2 Declaration of MORE ELABORATE RECORDS | |
693 | ||
694 | Here's a sample showing how to create and use a record whose fields are of | |
695 | many different sorts: | |
696 | ||
697 | $rec = { | |
698 | TEXT => $string, | |
699 | SEQUENCE => [ @old_values ], | |
700 | LOOKUP => { %some_table }, | |
701 | THATCODE => \&some_function, | |
702 | THISCODE => sub { $_[0] ** $_[1] }, | |
703 | HANDLE => \*STDOUT, | |
704 | }; | |
705 | ||
706 | print $rec->{TEXT}; | |
707 | ||
708 | print $rec->{SEQUENCE}[0]; | |
709 | $last = pop @{ $rec->{SEQUENCE} }; |
710 | ||
711 | print $rec->{LOOKUP}{"key"}; | |
712 | ($first_k, $first_v) = each %{ $rec->{LOOKUP} }; | |
713 | ||
714 | $answer = $rec->{THATCODE}->($arg); | |
715 | $answer = $rec->{THISCODE}->($arg1, $arg2); | |
716 | ||
717 | # careful of extra block braces on fh ref | |
718 | print { $rec->{HANDLE} } "a string\n"; | |
719 | ||
720 | use FileHandle; | |
721 | $rec->{HANDLE}->autoflush(1); | |
722 | $rec->{HANDLE}->print(" a string\n"); | |
723 | ||
724 | =head2 Declaration of a HASH OF COMPLEX RECORDS | |
725 | ||
726 | %TV = ( | |
727 | flintstones => { | |
728 | series => "flintstones", | |
729 | nights => [ qw(monday thursday friday) ], | |
730 | members => [ | |
731 | { name => "fred", role => "lead", age => 36, }, | |
732 | { name => "wilma", role => "wife", age => 31, }, | |
733 | { name => "pebbles", role => "kid", age => 4, }, | |
734 | ], | |
735 | }, | |
736 | ||
737 | jetsons => { | |
738 | series => "jetsons", | |
739 | nights => [ qw(wednesday saturday) ], | |
740 | members => [ | |
741 | { name => "george", role => "lead", age => 41, }, | |
742 | { name => "jane", role => "wife", age => 39, }, | |
743 | { name => "elroy", role => "kid", age => 9, }, | |
744 | ], | |
745 | }, | |
746 | ||
747 | simpsons => { | |
748 | series => "simpsons", | |
749 | nights => [ qw(monday) ], | |
750 | members => [ | |
751 | { name => "homer", role => "lead", age => 34, }, | |
752 | { name => "marge", role => "wife", age => 37, }, | |
753 | { name => "bart", role => "kid", age => 11, }, | |
754 | ], | |
755 | }, | |
756 | ); | |
757 | ||
758 | =head2 Generation of a HASH OF COMPLEX RECORDS | |
759 | ||
760 | # reading from file | |
761 | # this is most easily done by having the file itself be | |
762 | # in the raw data format as shown above. perl is happy | |
763 | # to parse complex data structures if declared as data, so | |
764 | # sometimes it's easiest to do that | |
765 | ||
766 | # here's a piece by piece build up | |
767 | $rec = {}; | |
768 | $rec->{series} = "flintstones"; | |
769 | $rec->{nights} = [ find_days() ]; | |
770 | ||
771 | @members = (); | |
772 | # assume this file in field=value syntax | |
773 | while (<>) { | |
774 | %fields = split /[\s=]+/; | |
775 | push @members, { %fields }; | |
776 | } | |
777 | $rec->{members} = [ @members ]; | |
778 | ||
779 | # now remember the whole thing | |
780 | $TV{ $rec->{series} } = $rec; | |
781 | ||
782 | ########################################################### | |
783 | # now, you might want to make interesting extra fields that | |
784 | # include pointers back into the same data structure so if you |
785 | # change one piece, it changes everywhere, like for example |
786 | # if you wanted a {kids} field that was a reference | |
787 | # to an array of the kids' records without having duplicate | |
788 | # records and thus update problems. | |
789 | ########################################################### | |
790 | foreach $family (keys %TV) { | |
791 | $rec = $TV{$family}; # temp pointer | |
792 | @kids = (); | |
793 | for $person ( @{ $rec->{members} } ) { | |
794 | if ($person->{role} =~ /kid|son|daughter/) { | |
795 | push @kids, $person; | |
796 | } | |
797 | } | |
798 | # REMEMBER: $rec and $TV{$family} point to same data!! | |
799 | $rec->{kids} = [ @kids ]; | |
800 | } | |
801 | ||
802 | # you copied the array, but the array itself contains pointers | |
803 | # to uncopied objects. this means that if you make bart get | |
804 | # older via | |
805 | ||
806 | $TV{simpsons}{kids}[0]{age}++; | |
807 | ||
808 | # then this would also change in | |
809 | print $TV{simpsons}{members}[2]{age}; | |
810 | ||
811 | # because $TV{simpsons}{kids}[0] and $TV{simpsons}{members}[2] | |
812 | # both point to the same underlying anonymous hash table | |
813 | ||
814 | # print the whole thing | |
815 | foreach $family ( keys %TV ) { | |
816 | print "the $family"; | |
817 | print " is on during @{ $TV{$family}{nights} }\n"; | |
818 | print "its members are:\n"; | |
819 | for $who ( @{ $TV{$family}{members} } ) { | |
820 | print " $who->{name} ($who->{role}), age $who->{age}\n"; | |
821 | } | |
822 | print "it turns out that ", (map { $_->{name} } grep { $_->{role} eq "lead" } @{ $TV{$family}{members} }), " has "; |
823 | print scalar ( @{ $TV{$family}{kids} } ), " kids named "; | |
824 | print join (", ", map { $_->{name} } @{ $TV{$family}{kids} } ); | |
825 | print "\n"; | |
826 | } | |
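The sharing between the C<kids> and C<members> fields described above can be verified with a cut-down record. This is a sketch with only two members, so the kid sits at index 1 here rather than 2.

```perl
use strict;
use warnings;

my %TV = (
    simpsons => {
        members => [
            { name => "homer", role => "lead", age => 34 },
            { name => "bart",  role => "kid",  age => 11 },
        ],
    },
);

# Collect references to the existing records; no hash is copied here.
my @kids = grep { $_->{role} eq "kid" } @{ $TV{simpsons}{members} };
$TV{simpsons}{kids} = [ @kids ];     # new array, same underlying hashes

$TV{simpsons}{kids}[0]{age}++;       # bart gets older ...
print $TV{simpsons}{members}[1]{age}, "\n";   # ... and it shows here: 12
```

Copying C<@kids> with C<[ @kids ]> duplicates only the list of references, so both fields still lead to the same anonymous hash.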
827 | ||
828 | =head1 DATABASE TIES |
829 | ||
830 | You cannot easily tie a multilevel data structure (such as a hash of | |
831 | hashes) to a dbm file. The first problem is that all but GDBM and | |
832 | Berkeley DB have size limitations, but beyond that, you also have problems | |
833 | with how references are to be represented on disk. One experimental | |
834 | module that does partially attempt to address this need is the MLDBM | |
835 | module. Check your nearest CPAN site as described in L<perlmodlib> for | |
836 | source code to MLDBM. | |
837 | ||
838 | =head1 SEE ALSO | |
839 | ||
840 | perlref(1), perllol(1), perldata(1), perlobj(1) | |
841 | ||
842 | =head1 AUTHOR | |
843 | ||
844 | Tom Christiansen <F<tchrist@perl.com>> | |
845 | ||
846 | Last update: | |
847 | Wed Oct 23 04:57:50 MET DST 1996 |