| 1 | =head1 NAME |
| 2 | |
| 3 | perlhack - How to hack at the Perl internals |
| 4 | |
| 5 | =head1 DESCRIPTION |
| 6 | |
| 7 | This document attempts to explain how Perl development takes place, |
| 8 | and ends with some suggestions for people wanting to become bona fide |
| 9 | porters. |
| 10 | |
| 11 | The perl5-porters mailing list is where the Perl standard distribution |
| 12 | is maintained and developed. The list can get anywhere from 10 to 150 |
| 13 | messages a day, depending on the heatedness of the debate. Most days |
| 14 | there are two or three patches, extensions, features, or bugs being |
| 15 | discussed at a time. |
| 16 | |
| 17 | A searchable archive of the list is at either: |
| 18 | |
| 19 | http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/ |
| 20 | |
| 21 | or |
| 22 | |
| 23 | http://archive.develooper.com/perl5-porters@perl.org/ |
| 24 | |
| 25 | List subscribers (the porters themselves) come in several flavours. |
| 26 | Some are quiet curious lurkers, who rarely pitch in and instead watch |
| 27 | the ongoing development to ensure they're forewarned of new changes or |
| 28 | features in Perl. Some are representatives of vendors, who are there |
| 29 | to make sure that Perl continues to compile and work on their |
| 30 | platforms. Some patch any reported bug that they know how to fix; |
| 31 | some are actively patching their pet area (threads, Win32, the regexp |
| 32 | engine), while others seem to do nothing but complain. In other |
| 33 | words, it's your usual mix of technical people. |
| 34 | |
| 35 | Over this group of porters presides Larry Wall. He has the final word |
| 36 | in what does and does not change in the Perl language. Various |
| 37 | releases of Perl are shepherded by a "pumpking", a porter |
| 38 | responsible for gathering patches and deciding, on a patch-by-patch, |
| 39 | feature-by-feature basis what will and will not go into the release. |
| 40 | For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of |
| 41 | Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and |
| 42 | Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release. |
| 43 | |
| 44 | In addition, various people are pumpkings for different things. For |
| 45 | instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the |
| 46 | I<Configure> pumpkin up till the 5.8 release. For the 5.10 release |
| 47 | H.Merijn Brand took over. |
| 48 | |
| 49 | Larry sees Perl development along the lines of the US government: |
| 50 | there's the Legislature (the porters), the Executive branch (the |
| 51 | pumpkings), and the Supreme Court (Larry). The legislature can |
| 52 | discuss and submit patches to the executive branch all they like, but |
| 53 | the executive branch is free to veto them. Rarely, the Supreme Court |
| 54 | will side with the executive branch over the legislature, or the |
| 55 | legislature over the executive branch. Mostly, however, the |
| 56 | legislature and the executive branch are supposed to get along and |
| 57 | work out their differences without impeachment or court cases. |
| 58 | |
| 59 | You might sometimes see reference to Rule 1 and Rule 2. Larry's power |
| 60 | as Supreme Court is expressed in The Rules: |
| 61 | |
| 62 | =over 4 |
| 63 | |
| 64 | =item 1 |
| 65 | |
| 66 | Larry is always by definition right about how Perl should behave. |
| 67 | This means he has final veto power on the core functionality. |
| 68 | |
| 69 | =item 2 |
| 70 | |
| 71 | Larry is allowed to change his mind about any matter at a later date, |
| 72 | regardless of whether he previously invoked Rule 1. |
| 73 | |
| 74 | =back |
| 75 | |
| 76 | Got that? Larry is always right, even when he was wrong. It's rare |
| 77 | to see either Rule exercised, but they are often alluded to. |
| 78 | |
| 79 | New features and extensions to the language are contentious, because |
| 80 | the criteria used by the pumpkings, Larry, and other porters to decide |
| 81 | which features should be implemented and incorporated are not codified |
| 82 | in a few small design goals as with some other languages. Instead, |
| 83 | the heuristics are flexible and often difficult to fathom. Here is |
| 84 | one person's list, roughly in decreasing order of importance, of |
| 85 | heuristics that new features have to be weighed against: |
| 86 | |
| 87 | =over 4 |
| 88 | |
| 89 | =item Does the concept match the general goals of Perl? |
| 90 | |
| 91 | These haven't been written anywhere in stone, but one approximation |
| 92 | is: |
| 93 | |
| 94 | 1. Keep it fast, simple, and useful. |
| 95 | 2. Keep features/concepts as orthogonal as possible. |
| 96 | 3. No arbitrary limits (platforms, data sizes, cultures). |
| 97 | 4. Keep it open and exciting to use/patch/advocate Perl everywhere. |
| 98 | 5. Either assimilate new technologies, or build bridges to them. |
| 99 | |
| 100 | =item Where is the implementation? |
| 101 | |
| 102 | All the talk in the world is useless without an implementation. In |
| 103 | almost every case, the person or people who argue for a new feature |
| 104 | will be expected to be the ones who implement it. Porters capable |
| 105 | of coding new features have their own agendas, and are not available |
| 106 | to implement your (possibly good) idea. |
| 107 | |
| 108 | =item Backwards compatibility |
| 109 | |
| 110 | It's a cardinal sin to break existing Perl programs. New warnings are |
| 111 | contentious--some say that a program that emits warnings is not |
| 112 | broken, while others say it is. Adding keywords has the potential to |
| 113 | break programs, as does changing the meaning of existing token |
| 114 | sequences or functions. |
| 115 | |
| 116 | =item Could it be a module instead? |
| 117 | |
| 118 | Perl 5 has extension mechanisms, modules and XS, specifically to avoid |
| 119 | the need to keep changing the Perl interpreter. You can write modules |
| 120 | that export functions, you can give those functions prototypes so they |
| 121 | can be called like built-in functions, you can even write XS code to |
| 122 | mess with the runtime data structures of the Perl interpreter if you |
| 123 | want to implement really complicated things. If it can be done in a |
| 124 | module instead of in the core, it's highly unlikely to be added. |
| 125 | |
| 126 | =item Is the feature generic enough? |
| 127 | |
| 128 | Is this something that only the submitter wants added to the language, |
| 129 | or would it be broadly useful? Sometimes, instead of adding a feature |
| 130 | with a tight focus, the porters might decide to wait until someone |
| 131 | implements the more generalized feature. For instance, instead of |
| 132 | implementing a "delayed evaluation" feature, the porters are waiting |
| 133 | for a macro system that would permit delayed evaluation and much more. |
| 134 | |
| 135 | =item Does it potentially introduce new bugs? |
| 136 | |
| 137 | Radical rewrites of large chunks of the Perl interpreter have the |
| 138 | potential to introduce new bugs. The smaller and more localized the |
| 139 | change, the better. |
| 140 | |
| 141 | =item Does it preclude other desirable features? |
| 142 | |
| 143 | A patch is likely to be rejected if it closes off future avenues of |
| 144 | development. For instance, a patch that placed a true and final |
| 145 | interpretation on prototypes is likely to be rejected because there |
| 146 | are still options for the future of prototypes that haven't been |
| 147 | addressed. |
| 148 | |
| 149 | =item Is the implementation robust? |
| 150 | |
| 151 | Good patches (tight code, complete, correct) stand more chance of |
| 152 | going in. Sloppy or incorrect patches might be placed on the back |
| 153 | burner until the pumpking has time to fix them, or might be discarded |
| 154 | altogether without further notice. |
| 155 | |
| 156 | =item Is the implementation generic enough to be portable? |
| 157 | |
| 158 | The worst patches make use of system-specific features. It's highly |
| 159 | unlikely that nonportable additions to the Perl language will be |
| 160 | accepted. |
| 161 | |
| 162 | =item Is the implementation tested? |
| 163 | |
| 164 | Patches which change behaviour (fixing bugs or introducing new features) |
| 165 | must include regression tests to verify that everything works as expected. |
| 166 | Without tests provided by the original author, how can anyone else changing |
| 167 | perl in the future be sure that they haven't unwittingly broken the behaviour |
| 168 | the patch implements? And without tests, how can the patch's author be |
| 169 | confident that his/her hard work put into the patch won't be accidentally |
| 170 | thrown away by someone in the future? |
| 171 | |
| 172 | =item Is there enough documentation? |
| 173 | |
| 174 | Patches without documentation are probably ill-thought-out or |
| 175 | incomplete. Nothing can be added without documentation, so submitting |
| 176 | a patch for the appropriate manpages as well as the source code is |
| 177 | always a good idea. |
| 178 | |
| 179 | =item Is there another way to do it? |
| 180 | |
| 181 | Larry said "Although the Perl Slogan is I<There's More Than One Way |
| 182 | to Do It>, I hesitate to make 10 ways to do something". This is a |
| 183 | tricky heuristic to navigate, though--one man's essential addition is |
| 184 | another man's pointless cruft. |
| 185 | |
| 186 | =item Does it create too much work? |
| 187 | |
| 188 | Work for the pumpking, work for Perl programmers, work for module |
| 189 | authors, ... Perl is supposed to be easy. |
| 190 | |
| 191 | =item Patches speak louder than words |
| 192 | |
| 193 | Working code is always preferred to pie-in-the-sky ideas. A patch to |
| 194 | add a feature stands a much higher chance of making it to the language |
| 195 | than does a random feature request, no matter how fervently argued the |
| 196 | request might be. This ties into "Is the feature generic enough?", as |
| 197 | the fact that someone took the time to make the patch demonstrates a |
| 198 | strong desire for the feature. |
| 199 | |
| 200 | =back |
| 201 | |
| 202 | If you're on the list, you might hear the word "core" bandied |
| 203 | around. It refers to the standard distribution. "Hacking on the |
| 204 | core" means you're changing the C source code to the Perl |
| 205 | interpreter. "A core module" is one that ships with Perl. |
| 206 | |
| 207 | =head2 Keeping in sync |
| 208 | |
| 209 | The source code to the Perl interpreter, in its different versions, is |
| 210 | kept in a repository managed by a revision control system (which is |
| 211 | currently the Perforce program, see http://perforce.com/). The |
| 212 | pumpkings and a few others have access to the repository to check in |
| 213 | changes. Periodically the pumpking for the development version of Perl |
| 214 | will release a new version, so the rest of the porters can see what's |
| 215 | changed. The current state of the main trunk of the repository, and patches |
| 216 | that describe the individual changes that have happened since the last |
| 217 | public release, are available at this location: |
| 218 | |
| 219 | http://public.activestate.com/pub/apc/ |
| 220 | ftp://public.activestate.com/pub/apc/ |
| 221 | |
| 222 | If you're looking for a particular change, or a change that affected |
| 223 | a particular set of files, you may find the B<Perl Repository Browser> |
| 224 | useful: |
| 225 | |
| 226 | http://public.activestate.com/cgi-bin/perlbrowse |
| 227 | |
| 228 | You may also want to subscribe to the perl5-changes mailing list to |
| 229 | receive a copy of each patch that gets submitted to the maintenance |
| 230 | and development "branches" of the perl repository. See |
| 231 | http://lists.perl.org/ for subscription information. |
| 232 | |
| 233 | If you are a member of the perl5-porters mailing list, it is a good |
| 234 | idea to keep in touch with the most recent changes, if only to |
| 235 | verify that what you would have posted as a bug report isn't already |
| 236 | solved in the most recent available perl development branch, also |
| 237 | known as perl-current, bleading edge perl, bleedperl or bleadperl. |
| 238 | |
| 239 | Needless to say, the source code in perl-current is in a perpetual |
| 240 | state of evolution. You should expect it to be very buggy. Do B<not> use |
| 241 | it for any purpose other than testing and development. |
| 242 | |
| 243 | Keeping in sync with the most recent branch can be done in several ways, |
| 244 | but the most convenient and reliable way is using B<rsync>, available at |
| 245 | ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent |
| 246 | branch by FTP.) |
| 247 | |
| 248 | If you choose to keep in sync using rsync, there are two approaches |
| 249 | to doing so: |
| 250 | |
| 251 | =over 4 |
| 252 | |
| 253 | =item rsync'ing the source tree |
| 254 | |
| 255 | Presuming you are in the directory where your perl source resides |
| 256 | and you have rsync installed and available, you can "upgrade" to |
| 257 | the bleadperl using: |
| 258 | |
| 259 | # rsync -avz rsync://public.activestate.com/perl-current/ . |
| 260 | |
| 261 | This takes care of updating every single item in the source tree to |
| 262 | the latest applied patch level, creating files that are new (to your |
| 263 | distribution) and setting date/time stamps of existing files to |
| 264 | reflect the bleadperl status. |
| 265 | |
| 266 | Note that this will not delete any files that were in '.' before |
| 267 | the rsync. Once you are sure that the rsync is running correctly, |
| 268 | run it with the --delete and the --dry-run options like this: |
| 269 | |
| 270 | # rsync -avz --delete --dry-run rsync://public.activestate.com/perl-current/ . |
| 271 | |
| 272 | This will I<simulate> an rsync run that also deletes files not |
| 273 | present in the bleadperl master copy. Observe the results from |
| 274 | this run closely. If you are sure that the actual run would delete |
| 275 | no files precious to you, you could remove the '--dry-run' option. |
| 276 | |
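When the dry run reports many changes, it can help to pull out only the
files that would be removed. The following is a minimal sketch, not part
of rsync itself: C<would_delete> is a hypothetical helper name, and it
assumes that rsync's verbose output marks each removal with a line that
starts with "deleting ".

```shell
# List the paths that a --delete run would remove, changing nothing.
# would_delete is a hypothetical helper name. It assumes that rsync -v
# reports each removal on a line starting with "deleting ".
would_delete() {
    rsync -avz --delete --dry-run "$1" "$2" | sed -n 's/^deleting //p'
}

# Example use from inside the perl source directory:
#   would_delete rsync://public.activestate.com/perl-current/ .
```

If the list is empty, or contains nothing precious to you, the real run
without '--dry-run' should be safe.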
| 277 | You can then check what patch was the latest that was applied by |
| 278 | looking in the file B<.patch>, which will show the number of the |
| 279 | latest patch. |
| 280 | |
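Reading that file can be wrapped in a small shell function. This is only
a sketch: C<patch_level> is a hypothetical helper name, and it assumes
that B<.patch> holds the patch number on its first line.

```shell
# Print the patch level recorded in a source tree's .patch file.
# patch_level is a hypothetical helper name. It assumes that .patch
# holds the patch number on its first line.
patch_level() {
    tree=${1:-.}                      # default to the current directory
    if [ -r "$tree/.patch" ]; then
        head -n 1 "$tree/.patch"
    else
        echo "no .patch file in $tree" >&2
        return 1
    fi
}
```

Calling C<patch_level> with no argument inspects the current directory;
passing a path inspects another source tree.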
| 281 | If you have more than one machine to keep in sync, and not all of |
| 282 | them have access to the WAN (so you are not able to rsync all the |
| 283 | source trees to the real source), there are some ways to get around |
| 284 | this problem. |
| 285 | |
| 286 | =over 4 |
| 287 | |
| 288 | =item Using rsync over the LAN |
| 289 | |
| 290 | Set up a local rsync server which makes the rsynced source tree |
| 291 | available to the LAN and sync the other machines against this |
| 292 | directory. |
| 293 | |
| 294 | From http://rsync.samba.org/README.html : |
| 295 | |
| 296 | "Rsync uses rsh or ssh for communication. It does not need to be |
| 297 | setuid and requires no special privileges for installation. It |
| 298 | does not require an inetd entry or a daemon. You must, however, |
| 299 | have a working rsh or ssh system. Using ssh is recommended for |
| 300 | its security features." |
| 301 | |
| 302 | =item Pushing over NFS |
| 303 | |
| 304 | With the other systems mounted over NFS, you can take an |
| 305 | active pushing approach by checking the just-updated tree against |
| 306 | the other not-yet-synced trees. An example would be |
| 307 | |
| 308 |     #!/usr/bin/perl -w |
| 309 | |
| 310 |     use strict; |
| 311 |     use File::Copy; |
| 312 | |
| 313 |     my %MF = map {                      # local file => [mode, size, mtime] |
| 314 |         m/(\S+)/; |
| 315 |         $1 => [ (stat $1)[2, 7, 9] ]; |
| 316 |     } `cat MANIFEST`; |
| 317 | |
| 318 |     my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2); |
| 319 | |
| 320 |     foreach my $host (keys %remote) { |
| 321 |         unless (-d $remote{$host}) {    # remote tree is not mounted |
| 322 |             print STDERR "Cannot Xsync for host $host\n"; |
| 323 |             next; |
| 324 |         } |
| 325 |         foreach my $file (keys %MF) { |
| 326 |             my $rfile = "$remote{$host}/$file"; |
| 327 |             my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9]; |
| 328 |             defined $size or ($mode, $size, $mtime) = (0, 0, 0); |
| 329 |             $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;  # in sync |
| 330 |             printf "%4s %-34s %8d %9d %8d %9d\n", |
| 331 |                 $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime; |
| 332 |             unlink $rfile; |
| 333 |             copy($file, $rfile) or warn "Cannot copy $file to $rfile: $!\n"; |
| 334 |             utime time, $MF{$file}[2], $rfile;      # keep the local mtime |
| 335 |             chmod $MF{$file}[0] & 07777, $rfile;    # permission bits only |
| 336 |         } |
| 337 |     } |
| 338 | |
| 339 | This is not perfect, though. It could be improved by checking |
| 340 | file checksums before updating, and not all NFS systems provide |
| 341 | reliable utime support. |
| 342 | |
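A checksum comparison of this kind might look as follows. This is a
sketch in shell rather than Perl, for brevity: C<needs_update> is a
hypothetical helper name, and it relies on the POSIX C<cksum> utility
instead of size and mtime.

```shell
# Succeed (exit status 0) when $2 is missing or its contents differ
# from $1. needs_update is a hypothetical helper name.
needs_update() {
    src=$1 dst=$2
    [ -f "$dst" ] || return 0
    # Reading from stdin makes cksum print only "checksum size",
    # with no file name, so the two outputs compare directly.
    [ "$(cksum < "$src")" != "$(cksum < "$dst")" ]
}

# Example use:
#   needs_update perl.c /host1/pro/3gl/CPAN/perl-5.7.1/perl.c && echo "copy it"
```

Checksumming reads every file in full on both sides, so over NFS it is
slower than comparing size and mtime, but it does not depend on
reliable utime support.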
| 343 | =back |
| 344 | |
| 345 | =item rsync'ing the patches |
| 346 | |
| 347 | The source tree is maintained by the pumpking who applies patches to |
| 348 | the files in the tree. These patches are either created by the |
| 349 | pumpking himself, using C<diff -c> after updating the file manually, or |
| 350 | sent in by posters on the perl5-porters list. |
| 351 | These patches are also saved and rsync'able, so you can apply them |
| 352 | yourself to the source files. |
| 353 | |
| 354 | Presuming you are in a directory where your patches reside, you can |
| 355 | get them in sync with |
| 356 | |
| 357 | # rsync -avz rsync://public.activestate.com/perl-current-diffs/ . |
| 358 | |
| 359 | This makes sure the latest available patch is downloaded to your |
| 360 | patch directory. |
| 361 | |
| 362 | It's then up to you to apply these patches, using something like |
| 363 | |
| 364 | # last=`ls -t *.gz | sed q` |
| 365 | # rsync -avz rsync://public.activestate.com/perl-current-diffs/ . |
| 366 | # find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch |
| 367 | # cd ../perl-current |
| 368 | # patch -p1 -N <../perl-current-diffs/blead.patch |
| 369 | |
| 370 | or, since this is only a hint towards how it works, use CPAN-patchaperl |
| 371 |