| 1 | =head1 NAME |
| 2 | |
| 3 | perlcompile - Introduction to the Perl Compiler-Translator |
| 4 | |
| 5 | =head1 DESCRIPTION |
| 6 | |
| 7 | Perl has always had a compiler: your source is compiled into an |
| 8 | internal form (a parse tree) which is then optimized before being |
| 9 | run. Since version 5.005, Perl has shipped with a module |
| 10 | capable of inspecting the optimized parse tree (C<B>), and this has |
| 11 | been used to write many useful utilities, including a module that lets |
| 12 | you turn your Perl into C source code that can be compiled into a |
| 13 | native executable. |
| 14 | |
| 15 | The C<B> module provides access to the parse tree, and other modules |
| 16 | ("back ends") do things with the tree. Some write it out as |
| 17 | bytecode, C source code, or a semi-human-readable text. Another |
| 18 | traverses the parse tree to build a cross-reference of which |
| 19 | subroutines, formats, and variables are used where. Another checks |
| 20 | your code for dubious constructs. Yet another back end dumps the |
| 21 | parse tree back out as Perl source, acting as a source code beautifier |
| 22 | or deobfuscator. |
| 23 | |
| 24 | Because its original purpose was to be a way to produce C code |
| 25 | corresponding to a Perl program, and in turn a native executable, the |
| 26 | C<B> module and its associated back ends are known as "the |
| 27 | compiler", even though they don't really compile anything. |
| 28 | Different parts of the compiler are more accurately a "translator", |
| 29 | or an "inspector", but people want Perl to have a "compiler |
| 30 | option" not an "inspector gadget". What can you do? |
| 31 | |
| 32 | This document covers the use of the Perl compiler: which modules |
| 33 | it comprises, how to use the most important of the back end modules, |
| 34 | what problems there are, and how to work around them. |
| 35 | |
| 36 | =head2 Layout |
| 37 | |
| 38 | The compiler back ends are in the C<B::> hierarchy, and the front-end |
| 39 | (the module that you, the user of the compiler, will sometimes |
| 40 | interact with) is the O module. Some back ends (e.g., C<B::C>) have |
| 41 | programs (e.g., I<perlcc>) to hide the modules' complexity. |
| 42 | |
| 43 | Here are the important back ends to know about, with their status |
| 44 | expressed as a number from 0 (outline for later implementation) to |
| 45 | 10 (if there's a bug in it, we're very surprised): |
| 46 | |
| 47 | =over 4 |
| 48 | |
| 49 | =item B::Bytecode |
| 50 | |
| 51 | Stores the parse tree in a machine-independent format, suitable |
| 52 | for later reloading through the ByteLoader module. Status: 5 (some |
| 53 | things work, some things don't, some things are untested). |
| 54 | |
| 55 | =item B::C |
| 56 | |
| 57 | Creates a C source file containing code to rebuild the parse tree |
| 58 | and resume the interpreter. Status: 6 (many things work adequately, |
| 59 | including programs using Tk). |
| 60 | |
| 61 | =item B::CC |
| 62 | |
| 63 | Creates a C source file corresponding to the run time code path in |
| 64 | the parse tree. This is the closest to a Perl-to-C translator there |
| 65 | is, but the code it generates is almost incomprehensible because it |
| 66 | translates the parse tree into a giant switch structure that |
| 67 | manipulates Perl structures. The eventual goal is to reduce (given |
| 68 | sufficient type information in the Perl program) some of the |
| 69 | Perl data structure manipulations into manipulations of C-level |
| 70 | ints, floats, etc. Status: 5 (some things work, including |
| 71 | uncomplicated Tk examples). |
| 72 | |
| 73 | =item B::Lint |
| 74 | |
| 75 | Complains if it finds dubious constructs in your source code. Status: |
| 76 | 6 (it works adequately, but only has a very limited number of areas |
| 77 | that it checks). |
| 78 | |
| 79 | =item B::Deparse |
| 80 | |
| 81 | Recreates the Perl source, making an attempt to format it coherently. |
| 82 | Status: 8 (it works nicely, but a few obscure things are missing). |
| 83 | |
| 84 | =item B::Xref |
| 85 | |
| 86 | Reports on the declaration and use of subroutines and variables. |
| 87 | Status: 8 (it works nicely, but still has a few lingering bugs). |
| 88 | |
| 89 | =back |
| 90 | |
| 91 | =head1 Using The Back Ends |
| 92 | |
| 93 | The following sections describe how to use the various compiler back |
| 94 | ends. They're presented roughly in order of maturity, so that the |
| 95 | most stable and proven back ends are described first, and the most |
| 96 | experimental and incomplete back ends are described last. |
| 97 | |
| 98 | The O module automatically enables the B<-c> flag to Perl, which |
| 99 | prevents Perl from executing your code once it has been compiled. |
| 100 | This is why all the back ends print: |
| 101 | |
| 102 | myperlprogram syntax OK |
| 103 | |
| 104 | before producing any other output. |
| 105 | |
| 106 | =head2 The Cross Referencing Back End |
| 107 | |
| 108 | The cross referencing back end (B::Xref) produces a report on your program, |
| 109 | breaking down declarations and uses of subroutines and variables (and |
| 110 | formats) by file and subroutine. For instance, here's part of the |
| 111 | report from the I<pod2man> program that comes with Perl: |
| 112 | |
| 113 | Subroutine clear_noremap |
| 114 | Package (lexical) |
| 115 | $ready_to_print i1069, 1079 |
| 116 | Package main |
| 117 | $& 1086 |
| 118 | $. 1086 |
| 119 | $0 1086 |
| 120 | $1 1087 |
| 121 | $2 1085, 1085 |
| 122 | $3 1085, 1085 |
| 123 | $ARGV 1086 |
| 124 | %HTML_Escapes 1085, 1085 |
| 125 | |
| 126 | This shows the variables used in the subroutine C<clear_noremap>. The |
| 127 | variable C<$ready_to_print> is a my() (lexical) variable, |
| 128 | B<i>ntroduced (first declared with my()) on line 1069, and used on |
| 129 | line 1079. The variable C<$&> from the main package is used on 1086, |
| 130 | and so on. |
| 131 | |
| 132 | A line number may be prefixed by a single letter: |
| 133 | |
| 134 | =over 4 |
| 135 | |
| 136 | =item i |
| 137 | |
| 138 | Lexical variable introduced (declared with my()) for the first time. |
| 139 | |
| 140 | =item & |
| 141 | |
| 142 | Subroutine or method call. |
| 143 | |
| 144 | =item s |
| 145 | |
| 146 | Subroutine defined. |
| 147 | |
| 148 | =item r |
| 149 | |
| 150 | Format defined. |
| 151 | |
| 152 | =back |
| 153 | |
| 154 | The most useful option the cross referencer has is to save the report |
| 155 | to a separate file. For instance, to save the report on |
| 156 | I<myperlprogram> to the file I<report>: |
| 157 | |
| 158 | $ perl -MO=Xref,-oreport myperlprogram |
| 159 | |
| 160 | =head2 The Decompiling Back End |
| 161 | |
| 162 | The Deparse back end turns your Perl source back into Perl source. It |
| 163 | can reformat along the way, making it useful as a de-obfuscator. The |
| 164 | most basic way to use it is: |
| 165 | |
| 166 | $ perl -MO=Deparse myperlprogram |
| 167 | |
| 168 | You'll notice immediately that Perl has no idea of how to paragraph |
| 169 | your code. You'll have to separate chunks of code from each other |
| 170 | with newlines by hand. However, watch what it will do with |
| 171 | one-liners: |
| 172 | |
| 173 | $ perl -MO=Deparse -e '$op=shift||die "usage: $0 |
| 174 | code [...]";chomp(@ARGV=<>)unless@ARGV; for(@ARGV){$was=$_;eval$op; |
| 175 | die$@ if$@; rename$was,$_ unless$was eq $_}' |
| 176 | -e syntax OK |
| 177 | $op = shift @ARGV || die("usage: $0 code [...]"); |
| 178 | chomp(@ARGV = <ARGV>) unless @ARGV; |
| 179 | foreach $_ (@ARGV) { |
| 180 | $was = $_; |
| 181 | eval $op; |
| 182 | die $@ if $@; |
| 183 | rename $was, $_ unless $was eq $_; |
| 184 | } |
| 185 | |
| 186 | The decompiler has several options for the code it generates. For |
| 187 | instance, you can set the size of each indent from 4 (as above) to |
| 188 | 2 with: |
| 189 | |
| 190 | $ perl -MO=Deparse,-si2 myperlprogram |
| 191 | |
| 192 | The B<-p> option adds parentheses where normally they are omitted: |
| 193 | |
| 194 | $ perl -MO=Deparse -e 'print "Hello, world\n"' |
| 195 | -e syntax OK |
| 196 | print "Hello, world\n"; |
| 197 | $ perl -MO=Deparse,-p -e 'print "Hello, world\n"' |
| 198 | -e syntax OK |
| 199 | print("Hello, world\n"); |
| 200 | |
| 201 | See L<B::Deparse> for more information on the formatting options. |
| 202 | |
| 203 | =head2 The Lint Back End |
| 204 | |
| 205 | The lint back end (B::Lint) inspects programs for poor style. One |
| 206 | programmer's bad style is another programmer's useful tool, so options |
| 207 | let you select what is complained about. |
| 208 | |
| 209 | To run the style checker across your source code: |
| 210 | |
| 211 | $ perl -MO=Lint myperlprogram |
| 212 | |
| 213 | To disable the context checks and the undefined-subroutine checks: |
| 214 | |
| 215 | $ perl -MO=Lint,-context,-undefined-subs myperlprogram |
| 216 | |
| 217 | See L<B::Lint> for information on the options. |
| 218 | |
| 219 | =head2 The Simple C Back End |
| 220 | |
| 221 | This module saves the internal compiled state of your Perl program |
| 222 | to a C source file, which can be turned into a native executable |
| 223 | for that particular platform using a C compiler. The resulting |
| 224 | program links against the Perl interpreter library, so it |
| 225 | will not save you disk space (unless you build Perl with a shared |
| 226 | library) or program size. It may, however, save you startup time. |
| 227 | |
| 228 | The C<perlcc> tool generates such executables by default. |
| 229 | |
| 230 | perlcc myperlprogram.pl |
| 231 | |
| 232 | =head2 The Bytecode Back End |
| 233 | |
| 234 | This back end is only useful if you also have a way to load and |
| 235 | execute the bytecode that it produces. The ByteLoader module provides |
| 236 | this functionality. |
| 237 | |
| 238 | To turn a Perl program into executable bytecode, you can use C<perlcc> |
| 239 | with the C<-B> switch: |
| 240 | |
| 241 | perlcc -B myperlprogram.pl |
| 242 | |
| 243 | The bytecode is machine independent, so once you have a compiled |
| 244 | module or program, it is as portable as Perl source (assuming that |
| 245 | the user of the module or program has a modern-enough Perl interpreter |
| 246 | to decode the bytecode). |
| 247 | |
| 248 | See L<B::Bytecode> for information on options to control the |
| 249 | optimization and nature of the code generated by the Bytecode module. |
| 250 | |
| 251 | =head2 The Optimized C Back End |
| 252 | |
| 253 | The optimized C back end will turn your Perl program's run time |
| 254 | code-path into an equivalent (but optimized) C program that manipulates |
| 255 | the Perl data structures directly. The program will still link against |
| 256 | the Perl interpreter library, to allow for eval(), C<s///e>, |
| 257 | C<require>, etc. |
| 258 | |
| 259 | The C<perlcc> tool generates such executables when using the C<-O> |
| 260 | switch. To compile a Perl program (ending in C<.pl> |
| 261 | or C<.p>): |
| 262 | |
| 263 | perlcc -O myperlprogram.pl |
| 264 | |
| 265 | To produce a shared library from a Perl module (ending in C<.pm>): |
| 266 | |
| 267 | perlcc -O Myperlmodule.pm |
| 268 | |
| 269 | For more information, see L<perlcc> and L<B::CC>. |
| 270 | |
| 271 | =head1 Module List for the Compiler Suite |
| 272 | |
| 273 | =over 4 |
| 274 | |
| 275 | =item B |
| 276 | |
| 277 | This module is the introspective ("reflective" in Java terms) |
| 278 | module, which allows a Perl program to inspect its innards. The |
| 279 | back end modules all use this module to gain access to the compiled |
| 280 | parse tree. You, the user of a back end module, will not need to |
| 281 | interact with B. |
| 282 | |
| 283 | =item O |
| 284 | |
| 285 | This module is the front-end to the compiler's back ends. Normally |
| 286 | called something like this: |
| 287 | |
| 288 | $ perl -MO=Deparse myperlprogram |
| 289 | |
| 290 | This is like saying C<use O 'Deparse'> in your Perl program. |
| 291 | |
| 292 | =item B::Asmdata |
| 293 | |
| 294 | This module is used by the B::Assembler module, which is in turn used |
| 295 | by the B::Bytecode module, which stores a parse-tree as |
| 296 | bytecode for later loading. It's not a back end itself, but rather a |
| 297 | component of a back end. |
| 298 | |
| 299 | =item B::Assembler |
| 300 | |
| 301 | This module turns a parse-tree into data suitable for storing |
| 302 | and later decoding back into a parse-tree. It's not a back end |
| 303 | itself, but rather a component of a back end. It's used by the |
| 304 | I<assemble> program that produces bytecode. |
| 305 | |
| 306 | =item B::Bblock |
| 307 | |
| 308 | This module is used by the B::CC back end. It walks "basic blocks". |
| 309 | A basic block is a series of operations which is known to execute from |
| 310 | start to finish, with no possibility of branching or halting. |
| 311 | |
| 312 | =item B::Bytecode |
| 313 | |
| 314 | This module is a back end that generates bytecode from a |
| 315 | program's parse tree. This bytecode is written to a file, from where |
| 316 | it can later be reconstructed back into a parse tree. The goal is to |
| 317 | do the expensive program compilation once, save the interpreter's |
| 318 | state into a file, and then restore the state from the file when the |
| 319 | program is to be executed. See L</"The Bytecode Back End"> |
| 320 | for details about usage. |
| 321 | |
| 322 | =item B::C |
| 323 | |
| 324 | This module writes out C code corresponding to the parse tree and |
| 325 | other interpreter internal structures. You compile the corresponding |
| 326 | C file, and get an executable file that will restore the internal |
| 327 | structures and the Perl interpreter will begin running the |
| 328 | program. See L</"The Simple C Back End"> for details about usage. |
| 329 | |
| 330 | =item B::CC |
| 331 | |
| 332 | This module writes out C code corresponding to your program's |
| 333 | operations. Unlike the B::C module, which merely stores the |
| 334 | interpreter and its state in a C program, the B::CC module makes a |
| 335 | C program that carries out the operations directly, although it still |
| 336 | links against the Perl interpreter library. As a consequence, programs |
| 337 | translated into C by B::CC can execute faster than normal interpreted |
| 338 | programs. See L</"The Optimized C Back End"> for details about usage. |
| 339 | |
| 340 | =item B::Concise |
| 341 | |
| 342 | This module prints a concise (but complete) version of the Perl parse |
| 343 | tree. Its output is more customizable than that of B::Terse or |
| 344 | B::Debug (and it can emulate them). This module is useful for people who |
| 345 | are writing their own back end, or who are learning about the Perl |
| 346 | internals. It's not useful to the average programmer. |
| 347 | |
| 348 | =item B::Debug |
| 349 | |
| 350 | This module dumps the Perl parse tree in verbose detail to STDOUT. |
| 351 | It's useful for people who are writing their own back end, or who |
| 352 | are learning about the Perl internals. It's not useful to the |
| 353 | average programmer. |
| 354 | |
| 355 | =item B::Deparse |
| 356 | |
| 357 | This module produces Perl source code from the compiled parse tree. |
| 358 | It is useful for debugging and deconstructing other people's code, |
| 359 | and also as a pretty-printer for your own source. See |
| 360 | L</"The Decompiling Back End"> for details about usage. |
| 361 | |
| 362 | =item B::Disassembler |
| 363 | |
| 364 | This module turns bytecode back into a parse tree. It's not a back |
| 365 | end itself, but rather a component of a back end. It's used by the |
| 366 | I<disassemble> program that comes with the bytecode. |
| 367 | |
| 368 | =item B::Lint |
| 369 | |
| 370 | This module inspects the compiled form of your source code for things |
| 371 | which, while some people frown on them, aren't necessarily bad enough |
| 372 | to justify a warning. For instance, use of an array in scalar context |
| 373 | without explicitly saying C<scalar(@array)> is something that Lint |
| 374 | can identify. See L</"The Lint Back End"> for details about usage. |
| 375 | |
| 376 | =item B::Showlex |
| 377 | |
| 378 | This module prints out the my() variables used in a function or a |
| 379 | file. To get a list of the my() variables used in the subroutine |
| 380 | mysub() defined in the file myperlprogram: |
| 381 | |
| 382 | $ perl -MO=Showlex,mysub myperlprogram |
| 383 | |
| 384 | To get a list of the my() variables used in the file myperlprogram: |
| 385 | |
| 386 | $ perl -MO=Showlex myperlprogram |
| 387 | |
| 388 | [BROKEN] |
| 389 | |
| 390 | =item B::Stackobj |
| 391 | |
| 392 | This module is used by the B::CC module. It's not a back end itself, |
| 393 | but rather a component of a back end. |
| 394 | |
| 395 | =item B::Stash |
| 396 | |
| 397 | This module is used by the L<perlcc> program, which compiles a module |
| 398 | into an executable. B::Stash prints the symbol tables in use by a |
| 399 | program, and is used to prevent B::CC from producing C code for the |
| 400 | B::* and O modules. It's not a back end itself, but rather a |
| 401 | component of a back end. |
| 402 | |
| 403 | =item B::Terse |
| 404 | |
| 405 | This module prints the contents of the parse tree, but without as much |
| 406 | information as B::Debug. For comparison, C<print "Hello, world."> |
| 407 | produced 96 lines of output from B::Debug, but only 6 from B::Terse. |
| 408 | |
| 409 | This module is useful for people who are writing their own back end, |
| 410 | or who are learning about the Perl internals. It's not useful to the |
| 411 | average programmer. |
| 412 | |
| 413 | =item B::Xref |
| 414 | |
| 415 | This module prints a report on where the variables, subroutines, and |
| 416 | formats are defined and used within a program and the modules it |
| 417 | loads. See L</"The Cross Referencing Back End"> for details about |
| 418 | usage. |
| 419 | |
| 420 | =back |
| 421 | |
| 422 | =head1 KNOWN PROBLEMS |
| 423 | |
| 424 | The simple C back end currently only saves typeglobs with alphanumeric |
| 425 | names. |
| 426 | |
| 427 | The optimized C back end outputs code for more modules than it should |
| 428 | (e.g., DirHandle). It also has little hope of properly handling |
| 429 | C<goto LABEL> outside the running subroutine (C<goto &sub> is okay). |
| 430 | C<goto LABEL> currently does not work at all in this back end. |
| 431 | It also creates a huge initialization function that gives |
| 432 | C compilers headaches. Splitting the initialization function gives |
| 433 | better results. Other problems include: unsigned math does not |
| 434 | work correctly; some opcodes are handled incorrectly by the default |
| 435 | opcode handling mechanism. |
| 436 | |
| 437 | BEGIN{} blocks are executed while compiling your code. Any external |
| 438 | state that is initialized in BEGIN{}, such as open files or |
| 439 | database connections, does not behave properly. To work around |
| 440 | this, Perl has an INIT{} block that corresponds to code being executed |
| 441 | before your program begins running but after your program has finished |
| 442 | being compiled. Execution order: BEGIN{}, (possible save of state |
| 443 | through compiler back-end), INIT{}, program runs, END{}. |
| 444 | |
| 445 | =head1 AUTHOR |
| 446 | |
| 447 | This document was originally written by Nathan Torkington, and is now |
| 448 | maintained by the perl5-porters mailing list |
| 449 | I<perl5-porters@perl.org>. |
| 450 | |
| 451 | =cut |