Commit | Line | Data |
---|---|---|
86530b38 AT |
1 | =head1 NAME |
2 | ||
3 | perlmod - Perl modules (packages and symbol tables) | |
4 | ||
5 | =head1 DESCRIPTION | |
6 | ||
7 | =head2 Packages | |
8 | ||
9 | Perl provides a mechanism for alternative namespaces to protect | |
10 | packages from stomping on each other's variables. In fact, there's | |
11 | really no such thing as a global variable in Perl. The package | |
12 | statement declares the compilation unit as being in the given | |
13 | namespace. The scope of the package declaration is from the | |
14 | declaration itself through the end of the enclosing block, C<eval>, | |
15 | or file, whichever comes first (the same scope as the my() and | |
16 | local() operators). Unqualified dynamic identifiers will be in | |
17 | this namespace, except for those few identifiers that, if unqualified, | |
18 | default to the main package instead of the current one as described | |
19 | below. A package statement affects only dynamic variables--including | |
20 | those you've used local() on--but I<not> lexical variables created | |
21 | with my(). Typically it would be the first declaration in a file | |
22 | included by the C<do>, C<require>, or C<use> operators. You can | |
23 | switch into a package in more than one place; it merely influences | |
24 | which symbol table is used by the compiler for the rest of that | |
25 | block. You can refer to variables and filehandles in other packages | |
26 | by prefixing the identifier with the package name and a double | |
27 | colon: C<$Package::Variable>. If the package name is null, the | |
28 | C<main> package is assumed. That is, C<$::sail> is equivalent to | |
29 | C<$main::sail>. | |
30 | ||
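As a minimal sketch of the scoping rules described above, the following script declares a package inside a block and then refers to variables by their fully qualified names. The package name C<Boat>, the subroutine C<current>, and the values are made up for illustration; C<$sail> echoes the variable used in the text.

```perl
use strict;
use warnings;

$main::sail = "mainsail";          # same variable as $::sail

{
    package Boat;                  # in effect until the closing brace
    $Boat::sail = "jib";           # fully qualified, so allowed under strict
    sub current { return __PACKAGE__ }   # compiled in package Boat
}

# Back in package main: the block above did not change our namespace.
print "here: ", __PACKAGE__, "\n";       # main
print "main: $::sail\n";                 # mainsail
print "Boat: $Boat::sail\n";             # jib
print "sub:  ", Boat::current(), "\n";   # Boat
```

Because the package statement sits inside a block, its effect ends at the closing brace, which is why the last four statements are compiled in C<main> again.
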
31 | The old package delimiter was a single quote, but double colon is now the | |
32 | preferred delimiter, in part because it's more readable to humans, and | |
33 | in part because it's more readable to B<emacs> macros. It also makes C++ | |
34 | programmers feel like they know what's going on--as opposed to using the | |
35 | single quote as separator, which was there to make Ada programmers feel | |
36 | like they knew what was going on. Because the old-fashioned syntax is still | |
37 | supported for backwards compatibility, if you try to use a string like | |
38 | C<"This is $owner's house">, you'll be accessing C<$owner::s>; that is, | |
39 | the $s variable in package C<owner>, which is probably not what you meant. | |
40 | Use braces to disambiguate, as in C<"This is ${owner}'s house">. | |
41 | ||
42 | Packages may themselves contain package separators, as in | |
43 | C<$OUTER::INNER::var>. This implies nothing about the order of | |
44 | name lookups, however. There are no relative packages: all symbols | |
45 | are either local to the current package, or must be fully qualified | |
46 | from the outer package name down. For instance, there is nowhere | |
47 | within package C<OUTER> that C<$INNER::var> refers to | |
48 | C<$OUTER::INNER::var>. It would treat package C<INNER> as a totally | |
49 | separate global package. | |
50 | ||
51 | Only identifiers starting with letters (or underscore) are stored | |
52 | in a package's symbol table. All other symbols are kept in package | |
53 | C<main>, including all punctuation variables, like $_. In addition, | |
54 | when unqualified, the identifiers STDIN, STDOUT, STDERR, ARGV, | |
55 | ARGVOUT, ENV, INC, and SIG are forced to be in package C<main>, | |
56 | even when used for other purposes than their built-in one. If you | |
57 | have a package called C<m>, C<s>, or C<y>, then you can't use the | |
58 | qualified form of an identifier because it would be instead interpreted | |
59 | as a pattern match, a substitution, or a transliteration. | |
60 | ||
61 | Variables beginning with underscore used to be forced into package | |
62 | main, but we decided it was more useful for package writers to be able | |
63 | to use leading underscore to indicate private variables and method names. | |
64 | $_ is still global though. See also | |
65 | L<perlvar/"Technical Note on the Syntax of Variable Names">. | |
66 | ||
67 | C<eval>ed strings are compiled in the package in which the eval() was | |
68 | compiled. (Assignments to C<$SIG{}>, however, assume the signal | |
69 | handler specified is in the C<main> package. Qualify the signal handler | |
70 | name if you wish to have a signal handler in a package.) For an | |
71 | example, examine F<perl5db.pl> in the Perl library. It initially switches | |
72 | to the C<DB> package so that the debugger doesn't interfere with variables | |
73 | in the program you are trying to debug. At various points, however, it | |
74 | temporarily switches back to the C<main> package to evaluate various | |
75 | expressions in the context of the C<main> package (or wherever you came | |
76 | from). See L<perldebug>. | |
77 | ||
78 | The special symbol C<__PACKAGE__> contains the current package, but cannot | |
79 | (easily) be used to construct variable names. | |
80 | ||
81 | See L<perlsub> for other scoping issues related to my() and local(), | |
82 | and L<perlref> regarding closures. | |
83 | ||
84 | =head2 Symbol Tables | |
85 | ||
86 | The symbol table for a package happens to be stored in the hash of that | |
87 | name with two colons appended. The main symbol table's name is thus | |
88 | C<%main::>, or C<%::> for short. Likewise the symbol table for the nested | |
89 | package mentioned earlier is named C<%OUTER::INNER::>. | |
90 | ||
91 | The value in each entry of the hash is what you are referring to when you | |
92 | use the C<*name> typeglob notation. In fact, the following have the same | |
93 | effect, though the first is more efficient because it does the symbol | |
94 | table lookups at compile time: | |
95 | ||
96 | local *main::foo = *main::bar; | |
97 | local $main::{foo} = $main::{bar}; | |
98 | ||
99 | (Be sure to note the B<vast> difference between the second line above | |
100 | and C<local $main::foo = $main::bar>. The former is accessing the hash | |
101 | C<%main::>, which is the symbol table of package C<main>. The latter is | |
102 | simply assigning scalar C<$bar> in package C<main> to scalar C<$foo> of | |
103 | the same package.) | |
104 | ||
105 | You can use this to print out all the variables in a package, for | |
106 | instance. The standard but antiquated F<dumpvar.pl> library and | |
107 | the CPAN module Devel::Symdump make use of this. | |
108 | ||
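To illustrate listing the contents of a package through its symbol table hash, here is a small sketch. The package C<Demo>, its variables, and the helper C<symbol_names> are hypothetical names chosen for this example; it only lists the names and does not try to reproduce what F<dumpvar.pl> or Devel::Symdump do.

```perl
use strict;
use warnings;

{
    package Demo;                 # hypothetical package for illustration
    our $count = 3;
    our @list  = (1, 2);
    sub hello { return "hi" }
}

# Return the sorted names held in a package's symbol table,
# skipping entries for nested packages (their keys end in "::").
sub symbol_names {
    my ($pkg) = @_;
    no strict 'refs';             # %{"Demo::"} is a symbolic reference
    my @names = grep { !/::\z/ } keys %{"${pkg}::"};
    return sort @names;
}

print "$_\n" for symbol_names('Demo');
```

Each key is an identifier without its sigil, so C<$count>, C<@list>, and C<&hello> appear as C<count>, C<list>, and C<hello>.
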
109 | Assignment to a typeglob performs an aliasing operation, i.e., | |
110 | ||
111 | *dick = *richard; | |
112 | ||
113 | causes variables, subroutines, formats, and file and directory handles | |
114 | accessible via the identifier C<richard> also to be accessible via the | |
115 | identifier C<dick>. If you want to alias only a particular variable or | |
116 | subroutine, assign a reference instead: | |
117 | ||
118 | *dick = \$richard; | |
119 | ||
120 | Which makes $richard and $dick the same variable, but leaves | |
121 | @richard and @dick as separate arrays. Tricky, eh? | |
122 | ||
123 | There is one subtle difference between the following statements: | |
124 | ||
125 | *foo = *bar; | |
126 | *foo = \$bar; | |
127 | ||
128 | C<*foo = *bar> makes the typeglobs themselves synonymous while | |
129 | C<*foo = \$bar> makes the SCALAR portions of two distinct typeglobs | |
130 | refer to the same scalar value. This means that the following code: | |
131 | ||
132 | $bar = 1; | |
133 | *foo = \$bar; # Make $foo an alias for $bar | |
134 | ||
135 | { | |
136 | local $bar = 2; # Restrict changes to block | |
137 | print $foo; # Prints '1'! | |
138 | } | |
139 | ||
140 | Would print '1', because C<$foo> holds a reference to the I<original> | |
141 | C<$bar> -- the one that was stuffed away by C<local()> and which will be | |
142 | restored when the block ends. Because variables are accessed through the | |
143 | typeglob, you can use C<*foo = *bar> to create an alias which can be | |
144 | localized. (But be aware that this means you can't have a separate | |
145 | C<@foo> and C<@bar>, etc.) | |
146 | ||
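The difference between the two kinds of alias can be seen side by side in the following sketch. The extra variable C<$baz> and the helper C<snapshot> are assumptions added for the comparison; C<$foo> and C<$bar> are the names used in the text.

```perl
use strict;
use warnings;

our ($bar, $foo, $baz);   # package variables; typeglobs do not apply to my()

$bar = 1;
*foo = \$bar;             # alias only the scalar slot
*baz = *bar;              # alias the whole typeglob

sub snapshot { return "foo=$foo baz=$baz" }

my ($inside, $after);
{
    local $bar = 2;       # temporarily swaps the scalar held in *bar
    $inside = snapshot();
}
$after = snapshot();

print "inside block: $inside\n";   # foo=1 baz=2
print "after block:  $after\n";    # foo=1 baz=1
```

Only C<$baz>, which shares the whole typeglob with C<$bar>, follows the localized value.
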
147 | What makes all of this important is that the Exporter module uses glob | |
148 | aliasing as the import/export mechanism. Whether or not you can properly | |
149 | localize a variable that has been exported from a module depends on how | |
150 | it was exported: | |
151 | ||
152 | @EXPORT = qw($FOO); # Usual form, can't be localized | |
153 | @EXPORT = qw(*FOO); # Can be localized | |
154 | ||
155 | You can work around the first case by using the fully qualified name | |
156 | (C<$Package::FOO>) where you need a local value, or by overriding it | |
157 | by saying C<*FOO = *Package::FOO> in your script. | |
158 | ||
159 | The C<*x = \$y> mechanism may be used to pass and return cheap references | |
160 | into or from subroutines if you don't want to copy the whole | |
161 | thing. It only works when assigning to dynamic variables, not | |
162 | lexicals. | |
163 | ||
164 | %some_hash = (); # can't be my() | |
165 | *some_hash = fn( \%another_hash ); | |
166 | sub fn { | |
167 | local *hashsym = shift; | |
168 | # now use %hashsym normally, and you | |
169 | # will affect the caller's %another_hash | |
170 | my %nhash = (); # do what you want | |
171 | return \%nhash; | |
172 | } | |
173 | ||
174 | On return, the reference will overwrite the hash slot in the | |
175 | symbol table specified by the *some_hash typeglob. This | |
176 | is a somewhat tricky way of passing around references cheaply | |
177 | when you don't want to have to remember to dereference variables | |
178 | explicitly. | |
179 | ||
180 | Another use of symbol tables is for making "constant" scalars. | |
181 | ||
182 | *PI = \3.14159265358979; | |
183 | ||
184 | Now you cannot alter C<$PI>, which is probably a good thing all in all. | |
185 | This isn't the same as a constant subroutine, which is subject to | |
186 | optimization at compile-time. A constant subroutine is one prototyped | |
187 | to take no arguments and to return a constant expression. See | |
188 | L<perlsub> for details on these. The C<use constant> pragma is a | |
189 | convenient shorthand for these. | |
190 | ||
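A short sketch of both approaches follows. The constant name C<E> and its value are made up for the comparison, and the C<eval> is only there so the script can report the failed assignment instead of dying.

```perl
use strict;
use warnings;
use constant E => 2.718281828459045;   # constant subroutine, for comparison

our $PI;
*PI = \3.14159265358979;               # scalar slot now refers to a read-only literal

my $ok = eval { $PI = 3; 1 };          # attempt to overwrite it
print "assignment refused: $@" unless $ok;
printf "PI is still %.5f, E is %.5f\n", $PI, E;
```

The assignment fails at run time, whereas C<E> is a subroutine call that the compiler is free to inline.
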
191 | You can say C<*foo{PACKAGE}> and C<*foo{NAME}> to find out what name and | |
192 | package the *foo symbol table entry comes from. This may be useful | |
193 | in a subroutine that gets passed typeglobs as arguments: | |
194 | ||
195 | sub identify_typeglob { | |
196 | my $glob = shift; | |
197 | print 'You gave me ', *{$glob}{PACKAGE}, '::', *{$glob}{NAME}, "\n"; | |
198 | } | |
199 | identify_typeglob *foo; | |
200 | identify_typeglob *bar::baz; | |
201 | ||
202 | This prints | |
203 | ||
204 | You gave me main::foo | |
205 | You gave me bar::baz | |
206 | ||
207 | The C<*foo{THING}> notation can also be used to obtain references to the | |
208 | individual elements of *foo. See L<perlref>. | |
209 | ||
210 | Subroutine definitions (and declarations, for that matter) need | |
211 | not necessarily be situated in the package whose symbol table they | |
212 | occupy. You can define a subroutine outside its package by | |
213 | explicitly qualifying the name of the subroutine: | |
214 | ||
215 | package main; | |
216 | sub Some_package::foo { ... } # &foo defined in Some_package | |
217 | ||
218 | This is just a shorthand for a typeglob assignment at compile time: | |
219 | ||
220 | BEGIN { *Some_package::foo = sub { ... } } | |
221 | ||
222 | and is I<not> the same as writing: | |
223 | ||
224 | { | |
225 | package Some_package; | |
226 | sub foo { ... } | |
227 | } | |
228 | ||
229 | In the first two versions, the body of the subroutine is | |
230 | lexically in the main package, I<not> in Some_package. So | |
231 | something like this: | |
232 | ||
233 | package main; | |
234 | ||
235 | $Some_package::name = "fred"; | |
236 | $main::name = "barney"; | |
237 | ||
238 | sub Some_package::foo { | |
239 | print "in ", __PACKAGE__, ": \$name is '$name'\n"; | |
240 | } | |
241 | ||
242 | Some_package::foo(); | |
243 | ||
244 | prints: | |
245 | ||
246 | in main: $name is 'barney' | |
247 | ||
248 | rather than: | |
249 | ||
250 | in Some_package: $name is 'fred' | |
251 | ||
252 | This also has implications for the use of the SUPER:: qualifier | |
253 | (see L<perlobj>). | |
254 | ||
255 | =head2 Package Constructors and Destructors | |
256 | ||
257 | Four special subroutines act as package constructors and destructors. | |
258 | These are the C<BEGIN>, C<CHECK>, C<INIT>, and C<END> routines. The | |
259 | C<sub> is optional for these routines. | |
260 | ||
261 | A C<BEGIN> subroutine is executed as soon as possible, that is, the moment | |
262 | it is completely defined, even before the rest of the containing file | |
263 | is parsed. You may have multiple C<BEGIN> blocks within a file--they | |
264 | will execute in order of definition. Because a C<BEGIN> block executes | |
265 | immediately, it can pull in definitions of subroutines and such from other | |
266 | files in time to be visible to the rest of the file. Once a C<BEGIN> | |
267 | has run, it is immediately undefined and any code it used is returned to | |
268 | Perl's memory pool. This means you can't ever explicitly call a C<BEGIN>. | |
269 | ||
270 | An C<END> subroutine is executed as late as possible, that is, after | |
271 | perl has finished running the program and just before the interpreter | |
272 | exits, even if it is exiting as a result of a die() function. | |
273 | (But not if it's polymorphing into another program via C<exec>, or | |
274 | being blown out of the water by a signal--you have to trap that yourself | |
275 | (if you can).) You may have multiple C<END> blocks within a file--they | |
276 | will execute in reverse order of definition; that is: last in, first | |
277 | out (LIFO). C<END> blocks are not executed when you run perl with the | |
278 | C<-c> switch, or if compilation fails. | |
279 | ||
280 | Inside an C<END> subroutine, C<$?> contains the value that the program is | |
281 | going to pass to C<exit()>. You can modify C<$?> to change the exit | |
282 | value of the program. Beware of changing C<$?> by accident (e.g. by | |
283 | running something via C<system>). | |
284 | ||
285 | Similar to C<BEGIN> blocks, C<INIT> blocks are run just before the | |
286 | Perl runtime begins execution, in "first in, first out" (FIFO) order. | |
287 | For example, the code generators documented in L<perlcc> make use of | |
288 | C<INIT> blocks to initialize and resolve pointers to XSUBs. | |
289 | ||
290 | Similar to C<END> blocks, C<CHECK> blocks are run just after the | |
291 | Perl compile phase ends and before the run time begins, in | |
292 | LIFO order. C<CHECK> blocks are again useful in the Perl compiler | |
293 | suite to save the compiled state of the program. | |
294 | ||
295 | When you use the B<-n> and B<-p> switches to Perl, C<BEGIN> and | |
296 | C<END> work just as they do in B<awk>, as a degenerate case. | |
297 | Both C<BEGIN> and C<CHECK> blocks are run when you use the B<-c> | |
298 | switch for a compile-only syntax check, although your main code | |
299 | is not. | |
300 | ||
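The ordering rules for the four kinds of blocks can be observed with a sketch like the one that follows. It assumes the code is run as an ordinary script file (C<CHECK> and C<INIT> blocks compiled later, for example inside a string C<eval>, are not run); the array C<@order> and the labels are made up for illustration.

```perl
use strict;
use warnings;

our @order;   # package variable, so every phase block sees the same array

BEGIN { push @order, 'BEGIN 1' }
CHECK { push @order, 'CHECK 1' }
INIT  { push @order, 'INIT 1'  }
END   { print "END 1\n" }

BEGIN { push @order, 'BEGIN 2' }
CHECK { push @order, 'CHECK 2' }
INIT  { push @order, 'INIT 2'  }
END   { print "END 2\n" }

push @order, 'main';
print join(' -> ', @order), "\n";
# BEGIN 1 -> BEGIN 2 -> CHECK 2 -> CHECK 1 -> INIT 1 -> INIT 2 -> main
```

When the script exits, the C<END> blocks then print C<END 2> followed by C<END 1>, reflecting their LIFO order.
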
301 | =head2 Perl Classes | |
302 | ||
303 | There is no special class syntax in Perl, but a package may act | |
304 | as a class if it provides subroutines to act as methods. Such a | |
305 | package may also derive some of its methods from another class (package) | |
306 | by listing the other package name(s) in its global @ISA array (which | |
307 | must be a package global, not a lexical). | |
308 | ||
309 | For more on this, see L<perltoot> and L<perlobj>. | |
310 | ||
311 | =head2 Perl Modules | |
312 | ||
313 | A module is just a set of related functions in a library file, i.e., | |
314 | a Perl package with the same name as the file. It is specifically | |
315 | designed to be reusable by other modules or programs. It may do this | |
316 | by providing a mechanism for exporting some of its symbols into the | |
317 | symbol table of any package using it. Or it may function as a class | |
318 | definition and make its semantics available implicitly through | |
319 | method calls on the class and its objects, without explicitly | |
320 | exporting anything. Or it can do a little of both. | |
321 | ||
322 | For example, to start a traditional, non-OO module called Some::Module, | |
323 | create a file called F<Some/Module.pm> and start with this template: | |
324 | ||
325 | package Some::Module; # assumes Some/Module.pm | |
326 | ||
327 | use strict; | |
328 | use warnings; | |
329 | ||
330 | BEGIN { | |
331 | use Exporter (); | |
332 | our ($VERSION, @ISA, @EXPORT, @EXPORT_OK, %EXPORT_TAGS); | |
333 | ||
334 | # set the version for version checking | |
335 | $VERSION = 1.00; | |
336 | # if using RCS/CVS, this may be preferred | |
337 | $VERSION = do { my @r = (q$Revision: 2.21 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r }; # must be all one line, for MakeMaker | |
338 | ||
339 | @ISA = qw(Exporter); | |
340 | @EXPORT = qw(&func1 &func2 &func4); | |
341 | %EXPORT_TAGS = ( ); # eg: TAG => [ qw!name1 name2! ], | |
342 | ||
343 | # your exported package globals go here, | |
344 | # as well as any optionally exported functions | |
345 | @EXPORT_OK = qw($Var1 %Hashit &func3); | |
346 | } | |
347 | our @EXPORT_OK; | |
348 | ||
349 | # exported package globals go here | |
350 | our $Var1; | |
351 | our %Hashit; | |
352 | ||
353 | # non-exported package globals go here | |
354 | our @more; | |
355 | our $stuff; | |
356 | ||
357 | # initialize package globals, first exported ones | |
358 | $Var1 = ''; | |
359 | %Hashit = (); | |
360 | ||
361 | # then the others (which are still accessible as $Some::Module::stuff) | |
362 | $stuff = ''; | |
363 | @more = (); | |
364 | ||
365 | # all file-scoped lexicals must be created before | |
366 | # the functions below that use them. | |
367 | ||
368 | # file-private lexicals go here | |
369 | my $priv_var = ''; | |
370 | my %secret_hash = (); | |
371 | ||
372 | # here's a file-private function as a closure, | |
373 | # callable as &$priv_func; it cannot be prototyped. | |
374 | my $priv_func = sub { | |
375 | # stuff goes here. | |
376 | }; | |
377 | ||
378 | # make all your functions, whether exported or not; | |
379 | # remember to put something interesting in the {} stubs | |
380 | sub func1 {} # no prototype | |
381 | sub func2() {} # proto'd void | |
382 | sub func3($$) {} # proto'd to 2 scalars | |
383 | ||
384 | # func4 is listed in @EXPORT above, so it is exported by default | |
385 | sub func4(\%) {} # proto'd to 1 hash ref | |
386 | ||
387 | END { } # module clean-up code here (global destructor) | |
388 | ||
389 | ## YOUR CODE GOES HERE | |
390 | ||
391 | 1; # don't forget to return a true value from the file | |
392 | ||
393 | Then go on to declare and use your variables in functions without | |
394 | any qualifications. See L<Exporter> and L<perlmodlib> for | |
395 | details on mechanics and style issues in module creation. | |
396 | ||
397 | Perl modules are included into your program by saying | |
398 | ||
399 | use Module; | |
400 | ||
401 | or | |
402 | ||
403 | use Module LIST; | |
404 | ||
405 | This is exactly equivalent to | |
406 | ||
407 | BEGIN { require Module; import Module; } | |
408 | ||
409 | or | |
410 | ||
411 | BEGIN { require Module; import Module LIST; } | |
412 | ||
413 | As a special case | |
414 | ||
415 | use Module (); | |
416 | ||
417 | is exactly equivalent to | |
418 | ||
419 | BEGIN { require Module; } | |
420 | ||
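As a sketch of these equivalences using two modules from the standard distribution, the following spells out by hand what C<use> would do. It writes the import as a method call, C<< List::Util->import(...) >>, rather than the indirect object form shown above; the choice of modules and arguments is only for illustration.

```perl
use strict;
use warnings;

# Spelled-out form of:  use List::Util qw(max);
BEGIN { require List::Util; List::Util->import(qw(max)); }

# Spelled-out form of:  use File::Basename ();
BEGIN { require File::Basename; }

print max(3, 9, 4), "\n";                             # imported name; prints 9
print File::Basename::basename("/tmp/x.txt"), "\n";   # nothing imported, so qualify
```

Because nothing was imported from File::Basename, its function has to be called by its fully qualified name.
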
421 | All Perl module files have the extension F<.pm>. The C<use> operator | |
422 | assumes this so you don't have to spell out "F<Module.pm>" in quotes. | |
423 | This also helps to differentiate new modules from old F<.pl> and | |
424 | F<.ph> files. Module names are also capitalized unless they're | |
425 | functioning as pragmas; pragmas are in effect compiler directives, | |
426 | and are sometimes called "pragmatic modules" (or even "pragmata" | |
427 | if you're a classicist). | |
428 | ||
429 | The two statements: | |
430 | ||
431 | require SomeModule; | |
432 | require "SomeModule.pm"; | |
433 | ||
434 | differ from each other in two ways. In the first case, any double | |
435 | colons in the module name, such as C<Some::Module>, are translated | |
436 | into your system's directory separator, usually "/". The second | |
437 | case does not, and would have to be specified literally. The other | |
438 | difference is that seeing the first C<require> clues in the compiler | |
439 | that uses of indirect object notation involving "SomeModule", as | |
440 | in C<$ob = purge SomeModule>, are method calls, not function calls. | |
441 | (Yes, this really can make a difference.) | |
442 | ||
443 | Because the C<use> statement implies a C<BEGIN> block, the importing | |
444 | of semantics happens as soon as the C<use> statement is compiled, | |
445 | before the rest of the file is compiled. This is how it is able | |
446 | to function as a pragma mechanism, and also how modules are able to | |
447 | declare subroutines that are then visible as list or unary operators for | |
448 | the rest of the current file. This will not work if you use C<require> | |
449 | instead of C<use>. With C<require> you can get into this problem: | |
450 | ||
451 | require Cwd; # make Cwd:: accessible | |
452 | $here = Cwd::getcwd(); | |
453 | ||
454 | use Cwd; # import names from Cwd:: | |
455 | $here = getcwd(); | |
456 | ||
457 | require Cwd; # make Cwd:: accessible | |
458 | $here = getcwd(); # oops! no main::getcwd() | |
459 | ||
460 | In general, C<use Module ()> is recommended over C<require Module>, | |
461 | because it determines module availability at compile time, not in the | |
462 | middle of your program's execution. An exception would be if two modules | |
463 | each tried to C<use> each other, and each also called a function from | |
464 | that other module. In that case, it's easy to use C<require>s instead. | |
465 | ||
466 | Perl packages may be nested inside other package names, so we can have | |
467 | package names containing C<::>. But if we used that package name | |
468 | directly as a filename it would make for unwieldy or impossible | |
469 | filenames on some systems. Therefore, if a module's name is, say, | |
470 | C<Text::Soundex>, then its definition is actually found in the library | |
471 | file F<Text/Soundex.pm>. | |
472 | ||
473 | Perl modules always have a F<.pm> file, but there may also be | |
474 | dynamically linked executables (often ending in F<.so>) or autoloaded | |
475 | subroutine definitions (often ending in F<.al>) associated with the | |
476 | module. If so, these will be entirely transparent to the user of | |
477 | the module. It is the responsibility of the F<.pm> file to load | |
478 | (or arrange to autoload) any additional functionality. For example, | |
479 | although the POSIX module happens to do both dynamic loading and | |
480 | autoloading, the user can say just C<use POSIX> to get it all. | |
481 | ||
482 | =head2 Making your module threadsafe | |
483 | ||
484 | Since 5.6.0, Perl has supported a new type of thread called interpreter | |
485 | threads (ithreads). These threads can be used explicitly and implicitly. | |
486 | ||
487 | Ithreads work by cloning the data tree so that no data is shared | |
488 | between different threads. These threads can be used via the C<threads> | |
489 | module or by calling fork() on win32 (fake fork() support). When a | |
490 | thread is cloned, all Perl data is cloned; non-Perl data, however, cannot | |
491 | be cloned automatically. Perl after 5.7.2 has support for the C<CLONE> | |
492 | special subroutine. In C<CLONE> you can do whatever you need to do, | |
493 | such as handling the cloning of non-Perl data, if necessary. | |
494 | C<CLONE> will be executed once for every package that has it defined | |
495 | (or inherits it). It will be called in the context of the new thread, | |
496 | so all modifications are made in the new area. | |
497 | ||
498 | If you want to CLONE all objects you will need to keep track of them per | |
499 | package. This is simply done using a hash and Scalar::Util::weaken(). | |
500 | ||
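One way to keep track of objects with a hash and Scalar::Util::weaken() is sketched here. The class name C<Tracked> and the helpers C<_remember> and C<live_count> are hypothetical, and the script calls C<CLONE> by hand instead of starting a thread, so it shows only the bookkeeping, not real thread cloning.

```perl
use strict;
use warnings;

{
    package Tracked;                       # hypothetical class name
    use Scalar::Util qw(weaken refaddr);

    my %registry;                          # address => weak reference to object

    sub _remember {
        my ($obj) = @_;
        my $key = refaddr $obj;
        $registry{$key} = $obj;
        weaken $registry{$key};            # do not keep the object alive
    }

    sub new {
        my ($class, %args) = @_;
        my $self = bless {%args}, $class;
        _remember($self);
        return $self;
    }

    sub DESTROY { delete $registry{ refaddr $_[0] } }

    sub live_count { return scalar grep { defined } values %registry }

    # Perl calls this as a class method in each newly created thread.
    sub CLONE {
        my @objects = grep { defined } values %registry;
        %registry = ();                    # addresses differ in the new thread
        for my $obj (@objects) {
            # ... fix up any non-Perl data attached to $obj here ...
            _remember($obj);
        }
    }
}

my $first  = Tracked->new(name => 'a');
my $second = Tracked->new(name => 'b');
undef $second;                             # destroyed; its registry entry is removed
Tracked->CLONE;                            # called by hand here; no thread is started
print Tracked::live_count(), "\n";         # 1
```

The weak references let objects be destroyed normally, and re-keying inside C<CLONE> accounts for the cloned objects living at new addresses.
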
501 | =head1 SEE ALSO | |
502 | ||
503 | See L<perlmodlib> for general style issues related to building Perl | |
504 | modules and classes, as well as descriptions of the standard library | |
505 | and CPAN, L<Exporter> for how Perl's standard import/export mechanism | |
506 | works, L<perltoot> and L<perltooc> for an in-depth tutorial on | |
507 | creating classes, L<perlobj> for a hard-core reference document on | |
508 | objects, L<perlsub> for an explanation of functions and scoping, | |
509 | and L<perlxstut> and L<perlguts> for more information on writing | |
510 | extension modules. |