.\" This highly condensed manual page was prepared from perl.man.
.TH PERL 1 "June 30, 1993"
perl \- practical extraction and report language
\fIPerl\fR is an interpreted language optimized for scanning arbitrary text files,
extracting information from those text files, and printing reports based
on that information.
It's also a good language for many system management tasks.
The language is intended to be practical (easy to use, efficient, complete)
rather than beautiful (tiny, elegant, minimal).
It combines (in the author's opinion, anyway) some of the best features of C,
\fIsed\fR, \fIawk\fR, and \fIsh\fR,
so people familiar with those languages should have little difficulty with it.
(Language historians will also note some vestiges of \fIcsh\fR, Pascal, and
even BASIC-PLUS.)
Expression syntax corresponds quite closely to C expression syntax.
Unlike most Unix utilities,
\fIperl\fR does not arbitrarily limit the size of your data\(emif you've got
the memory, \fIperl\fR can slurp in your whole file as a single string.
Recursion is of unlimited depth.
And the hash tables used by associative arrays grow as necessary to prevent
degraded performance.
\fIPerl\fR uses sophisticated pattern matching techniques to scan large amounts of
data very quickly.
Although optimized for scanning text,
\fIperl\fR can also deal with binary data, and can make dbm files look like associative
arrays (where dbm is available).
Setuid \fIperl\fR scripts are safer than C programs
through a dataflow tracing mechanism which prevents many stupid security holes.
If you have a problem that would ordinarily use \fIsed\fR
or \fIawk\fR or \fIsh\fR, but it
exceeds their capabilities or must run a little faster,
and you don't want to write the silly thing in C, then
\fIperl\fR may be for you.
There are also translators to turn your
\fIsed\fR and \fIawk\fR scripts into \fIperl\fR scripts.
Upon startup, \fIperl\fR looks for your script in one of the following places:
Specified line by line via \-e
switches on the command line.
Contained in the file specified by the first filename on the command line.
(Note that systems supporting the #! notation invoke interpreters this way.)
Passed in implicitly via standard input.
This only works if there are no filename arguments\(emto pass
arguments to a \fIstdin\fR script you must explicitly specify a \- for the script name.
After locating your script,
\fIperl\fR compiles it to an internal form.
If the script is syntactically correct, it is executed.
A single-character option may be combined with the following option, if any.
This is particularly useful when invoking a script using the #! construct which
only allows one argument. Example:
#!/usr/bin/perl \-spi.bak # same as \-s \-p \-i.bak
\fB\-0\fIdigits\fR specifies the record separator ($/) as an octal number.
If there are no digits, the null character is the separator.
Other switches may precede or follow the digits.
For example, if you have a version of \fIfind\fR
which can print filenames terminated by the null character, you can say this:
find . \-name '*.bak' \-print0 | perl \-n0e unlink
The special value 00 will cause Perl to slurp files in paragraph mode.
The value 0777 will cause Perl to slurp files whole since there is no
legal character with that value.
\fB\-a\fR turns on autosplit mode when used with a \-n or \-p.
An implicit split command to the @F array
is done as the first thing inside the implicit while loop produced by
the \-n or \-p. For example:
perl \-ane \'print pop(@F), "\en";\'
\fB\-c\fR causes \fIperl\fR to check the syntax of the script and then exit without executing it.
\fB\-d\fR runs the script under the perl debugger.
See the section on Debugging.
\fB\-D\fInumber\fR sets debugging flags.
To watch how it executes your script, use \-D14.
(This only works if debugging is compiled into your \fIperl\fR.)
Another nice value is \-D1024, which lists your compiled syntax tree.
And \-D512 displays compiled regular expressions.
\fB\-e\fI commandline\fR may be used to enter one line of script.
Multiple \-e commands may be given to build up a multi-line script.
If \-e is given, \fIperl\fR will not look for a script filename in the argument list.
\fB\-i\fIextension\fR specifies that files processed by the <> construct are to be edited
in-place.
It does this by renaming the input file, opening the output file by the
same name, and selecting that output file as the default for print statements.
The extension, if supplied, is added to the name of the
old file to make a backup copy.
If no extension is supplied, no backup is made.
Saying \*(L"perl \-p \-i.bak \-e "s/foo/bar/;" .\|.\|. \*(R" is the same as using
rename($ARGV, $ARGV . \'.bak\');
print; # this prints to original filename
The \-i form doesn't need to compare $ARGV to $oldargv to know when
the filename has changed.
It does, however, use ARGVOUT for the selected filehandle.
Note that STDOUT is restored as the default output filehandle after the loop.
You can use eof to locate the end of each input file, in case you want
to append to each file, or reset line numbering (see example under eof).
\fB\-I\fIdirectory\fR may be used in conjunction with \-P
to tell the C preprocessor where to look for include files.
By default /usr/include and /usr/lib/perl are searched.
\fB\-l\fIoctnum\fR enables automatic line-ending processing. It has two effects:
first, it automatically chops the line terminator when used with
\-n or \-p,
and second, it assigns $\e to have the value of
\fIoctnum\fR
so that any print statements will have that line terminator added back on. If
\fIoctnum\fR is omitted, sets $\e to the current value of $/.
For instance, to trim lines to 80 columns:
perl -lpe \'substr($_, 80) = ""\'
Note that the assignment $\e = $/ is done when the switch is processed,
so the input record separator can be different than the output record
separator if the \-l switch is followed by a \-0 switch:
gnufind / -print0 | perl -ln0e 'print "found $_" if -p'
This sets $\e to newline and then sets $/ to the null character.
\fB\-n\fR causes \fIperl\fR to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \*(L"sed \-n\*(R" or \fIawk\fR:
while (<>) { .\|.\|. } # your script goes in place of the dots
Note that the lines are not printed by default.
Here is an efficient way to delete all files older than a week:
find . \-mtime +7 \-print | perl \-nle \'unlink;\'
This is faster than using the \-exec switch of find because you don't have to
start a process on every filename found.
\fB\-p\fR causes \fIperl\fR to assume the following loop around your script, which makes it iterate
over filename arguments somewhat like \fIsed\fR:
while (<>) { .\|.\|. } continue { print; } # your script goes in place of the dots
Note that the lines are printed automatically.
To suppress printing use the \-n switch.
\fB\-P\fR causes your script to be run through the C preprocessor before
compilation by \fIperl\fR.
(Since both comments and cpp directives begin with the # character,
you should avoid starting comments with any words recognized
by the C preprocessor such as \*(L"if\*(R", \*(L"else\*(R" or \*(L"define\*(R".)
\fB\-s\fR enables some rudimentary switch parsing for switches on the command line
after the script name but before any filename arguments (or before a \-\|\-).
Any switch found there is removed from @ARGV and sets the corresponding variable in the
\fIperl\fR script.
The following script prints \*(L"true\*(R" if and only if the script is
invoked with a \-xyz switch.
if ($xyz) { print "true\en"; }
\fB\-S\fR makes \fIperl\fR use the PATH environment variable to search for the script
(unless the name of the script starts with a slash).
Typically this is used to emulate #! startup on machines that don't
support #!, in the following manner:
eval "exec /usr/bin/perl \-S $0 $*"
if $running_under_some_shell;
The system ignores the first line and feeds the script to /bin/sh,
which proceeds to try to execute the
script as a shell script.
The shell executes the second line as a normal shell command, and thus
starts up the \fIperl\fR interpreter.
On some systems $0 doesn't always contain the full pathname,
so the \-S tells \fIperl\fR to search for the script if necessary.
After \fIperl\fR locates the script, it parses the lines and ignores them because
the variable $running_under_some_shell is never true.
A better construct than $* would be ${1+"$@"}, which handles embedded spaces
and such in the filenames, but doesn't work if the script is being interpreted
by \fIcsh\fR.
In order to start up sh rather than csh, some systems may have to replace the
#! line with a line containing just
a colon, which will be politely ignored by perl.
Other systems can't control that, and need a totally devious construct that
will work under any of csh, sh or perl, such as the following:
eval '(exit $?0)' && eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
& eval 'exec /usr/bin/perl -S $0 $argv:q'
\fB\-u\fR causes \fIperl\fR to dump core after compiling your script.
You can then take this core dump and turn it into an executable file
by using the undump program (not supplied).
This speeds startup at the expense of some disk space (which you can
minimize by stripping the executable).
(Still, a "hello world" executable comes out to about 200K on my machine.)
If you are going to run your executable as a set-id program then you
should probably compile it using taintperl rather than normal perl.
If you want to execute a portion of your script before dumping, use the
dump operator instead.
Note: availability of undump is platform specific and may not be available
for a specific port of perl.
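\fB\-U\fR allows \fIperl\fR to do unsafe operations.
Currently the only \*(L"unsafe\*(R" operations are the unlinking of directories while
running as superuser, and running setuid programs with fatal taint checks
turned into warnings.
\fB\-v\fR prints the version and patchlevel of your \fIperl\fR executable.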
\fB\-w\fR prints warnings about identifiers that are mentioned only once, and scalar
variables that are used before being set.
Also warns about redefined subroutines, and references to undefined
filehandles or filehandles opened readonly that you are attempting to
write on.
Also warns you if you use == on values that don't look like numbers, and if
your subroutines recurse more than 100 deep.
\fB\-x\fIdirectory\fR tells \fIperl\fR that the script is embedded in a message.
Leading garbage will be discarded until the first line that starts
with #! and contains the string "perl".
Any meaningful switches on that line will be applied (but only one
group of switches, as with normal #! processing).
If a directory name is specified, Perl will switch to that directory
before running the script.
The \-x switch only controls the disposal of leading garbage.
The script must be terminated with _\|_END_\|_ if there is trailing garbage
to be ignored (the script can process any or all of the trailing garbage
via the DATA filehandle if desired).
HOME        Used if chdir has no argument.
LOGDIR      Used if chdir has no argument and HOME is not set.
PATH        Used in executing subprocesses, and in finding the script if \-S
is used.
PERLLIB     A colon-separated list of directories in which to look for Perl library
files before looking in the standard library and the current directory.
PERLDB      The command used to get the debugger code. If unset, uses
require \'perldb.pl\'.
Apart from these, \fIperl\fR uses no other environment variables, except to make them available
to the script being executed, and to child processes.
However, scripts running setuid would do well to execute the following lines
before doing anything else, just to keep people honest:
$ENV{\'PATH\'} = \'/bin:/usr/bin\'; # or whatever you need
$ENV{\'SHELL\'} = \'/bin/sh\' if $ENV{\'SHELL\'} ne \'\';
$ENV{\'IFS\'} = \'\' if $ENV{\'IFS\'} ne \'\';
/tmp/perl\-eXXXXXX temporary file for \-e commands.
The complete perl documentation can be found in the
UNIX System manager's Manual (SMM:19).
a2p awk to perl translator
s2p sed to perl translator
Compilation errors will tell you the line number of the error, with an
indication of the next token or token type that was to be examined.
(In the case of a script passed to \fIperl\fR via \-e switches, each \-e is counted as one line.)
Setuid scripts have additional constraints that can produce error messages
such as \*(L"Insecure dependency\*(R".
See the section on setuid scripts.
Accustomed \fIawk\fR users should take special note of the following:
Semicolons are required after all simple statements in \fIperl\fR
(except at the end of a block).
Newline is not a statement delimiter.
Curly brackets are required on ifs and whiles.
Variables begin with $ or @ in \fIperl\fR.
Arrays index from 0 unless you set $[.
Likewise string positions in substr() and index().
You have to decide whether your array has numeric or string indices.
Associative array values do not spring into existence upon mere reference.
You have to decide whether you want to use string or numeric comparisons.
Reading an input line does not split it for you. You get to split it yourself
to an array.
And the \fIsplit\fR operator has different arguments.
The current input line is normally in $_, not $0.
It generally does not have the newline stripped.
($0 is the name of the program executed.)
$<digit> does not refer to fields\(emit refers to substrings matched by the last
match pattern.
The \fIprint\fR statement does not add field and record separators unless you set
$, and $\e.
You must open your files before you print to them.
The range operator is \*(L".\|.\*(R", not comma.
(The comma operator works as in C.)
The match operator is \*(L"=~\*(R", not \*(L"~\*(R".
(\*(L"~\*(R" is the one's complement operator, as in C.)
The exponentiation operator is \*(L"**\*(R", not \*(L"^\*(R".
(\*(L"^\*(R" is the XOR operator, as in C.)
The concatenation operator is \*(L".\*(R", not the null string.
(Using the null string would render \*(L"/pat/ /pat/\*(R" unparsable,
since the third slash would be interpreted as a division operator\(emthe
tokener is in fact slightly context sensitive for operators like /, ?, and <.
And in fact, . itself can be the beginning of a number.)
The following variables work differently
FNR \h'|2.5i'$. \- something
FS \h'|2.5i'(whatever you like)
NF \h'|2.5i'$#Fld, or some such
RLENGTH \h'|2.5i'length($&)
RSTART \h'|2.5i'length($\`)
When in doubt, run the \fIawk\fR construct through a2p and see what it gives you.
Cerebral C programmers should take note of the following:
Curly brackets are required on ifs and whiles.
You should use \*(L"elsif\*(R" rather than \*(L"else if\*(R"
There's no switch statement.
Variables begin with $ or @ in \fIperl\fR.
Printf does not implement *.
Comments begin with #, not /*.
You can't take the address of anything.
ARGV must be capitalized.
The \*(L"system\*(R" calls link, unlink, rename, etc. return nonzero for success, not 0.
Signal handlers deal with signal names, not numbers.
Seasoned \fIsed\fR programmers should take note of the following:
Backreferences in substitutions use $ rather than \e.
The pattern matching metacharacters (, ), and | do not have backslashes in front.
The range operator is .\|. rather than comma.
Sharp shell programmers should take note of the following:
The backtick operator does variable interpretation without regard to the
presence of single quotes in the command.
The backtick operator does no translation of the return value, unlike csh.
Shells (especially csh) do several levels of substitution on each command line.
\fIPerl\fR does substitution only in certain constructs such as double quotes,
backticks, angle brackets and search patterns.
Shells interpret scripts a little bit at a time.
\fIPerl\fR compiles the whole program before executing it.
The arguments are available via @ARGV, not $1, $2, etc.
The environment is not automatically made available as variables.
\fIPerl\fR is at the mercy of your machine's definitions of various operations
such as type casting, atof() and sprintf().
If your stdio requires a seek or eof between reads and writes on a particular
stream, so does \fIperl\fR.
(This doesn't apply to sysread() and syswrite().)
While none of the built-in data types have any arbitrary size limits (apart
from memory size), there are still a few arbitrary limits:
a given identifier may not be longer than 255 characters,
and no component of your PATH may be longer than 255 if you use \-S.
A regular expression may not compile to more than 32767 bytes internally.
\fIPerl\fR actually stands for Pathologically Eclectic Rubbish Lister, but don't tell
anyone I said that.
Larry Wall <lwall@netlabs.com>
MS-DOS port by Diomidis Spinellis <dds@cc.ic.ac.uk>