package Text::ParseWords;
use vars qw($VERSION @ISA @EXPORT $PERL_SINGLE_QUOTE);
@EXPORT = qw(shellwords quotewords nested_quotewords parse_line);
@EXPORT_OK = qw(old_shellwords);
$lines[$#lines] =~ s/\s+$//;
return(quotewords('\s+', 0, @lines));
my($delim, $keep, @lines) = @_;
my($line, @words, @allwords);
@words = parse_line($delim, $keep, $line);
return() unless (@words || !length($line));
my($delim, $keep, @lines) = @_;
for ($i = 0; $i < @lines; $i++) {
@{$allwords[$i]} = parse_line($delim, $keep, $lines[$i]);
return() unless (@{$allwords[$i]} || !length($lines[$i]));
my($delimiter, $keep, $line) = @_;
no warnings 'uninitialized'; # we will be testing undef strings
$line =~ s/^(["'])                 # a $quote
    ((?:\\.|(?!\1)[^\\])*)        # and $quoted text
    \1                            # followed by the same quote
   |                              # --OR--
   ^((?:\\.|[^\\"'])*?)           # an $unquoted text
    (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                  # plus EOL, delimiter, or quote
  //xs or return;                 # extended layout
my($quote, $quoted, $unquoted, $delim) = ($1, $2, $3, $4);
return() unless( defined($quote) || length($unquoted) || length($delim));
$quoted = "$quote$quoted$quote";
$unquoted =~ s/\\(.)/$1/sg;
$quoted =~ s/\\(.)/$1/sg if ($quote eq '"');
$quoted =~ s/\\([\\'])/$1/g if ( $PERL_SINGLE_QUOTE && $quote eq "'");
$word .= substr($line, 0, 0); # leave results tainted
$word .= defined $quote ? $quoted : $unquoted;
push(@pieces, $delim) if ($keep eq 'delimiters');
# @words = old_shellwords($line);
# @words = old_shellwords(@lines);
# @words = old_shellwords(); # defaults to $_ (and clobbers it)
no warnings 'uninitialized'; # we will be testing undef strings
local *_ = \join('', @_) if @_;
my $field = substr($_, 0, 0); # leave results tainted
if (s/\A"(([^"\\]|\\.)*)"//s) {
($snippet = $1) =~ s#\\(.)#$1#sg;
Carp::carp("Unmatched double quote: $_");
elsif (s/\A'(([^'\\]|\\.)*)'//s) {
($snippet = $1) =~ s#\\(.)#$1#sg;
Carp::carp("Unmatched single quote: $_");
elsif (s/\A([^\s\\'"]+)//) {
Text::ParseWords - parse text into an array of tokens or array of arrays
@lists = &nested_quotewords($delim, $keep, @lines);
@words = &quotewords($delim, $keep, @lines);
@words = &shellwords(@lines);
@words = &parse_line($delim, $keep, $line);
@words = &old_shellwords(@lines); # DEPRECATED!
The &nested_quotewords() and &quotewords() functions accept a delimiter
(which can be a regular expression)
and a list of lines and then break those lines up into a list of
words, ignoring delimiters that appear inside quotes. &quotewords()
returns all of the tokens in a single long list, while &nested_quotewords()
returns a list of token lists corresponding to the elements of @lines.
&parse_line() does tokenizing on a single string. The &*quotewords()
functions simply call &parse_line(), so if you're only splitting
one line you can call &parse_line() directly and save a function
call.
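As a rough illustration of how the three entry points relate, the sketch below splits the same comma-separated input with each of them. The sample data and the variable names (@lines, @flat, @nested, @single) are made up for this example; only the function names and the argument order come from the description above.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords nested_quotewords parse_line);

# Hypothetical sample input: two lines, the first with a quoted comma.
my @lines = ('alpha,"beta,gamma"', 'delta,epsilon');

# One flat list of tokens across all lines.
my @flat = quotewords(',', 0, @lines);

# One array reference of tokens per input line.
my @nested = nested_quotewords(',', 0, @lines);

# A single string, tokenized directly.
my @single = parse_line(',', 0, $lines[0]);

print join('|', @flat), "\n";       # alpha|beta,gamma|delta|epsilon
print join(' / ', map { join('|', @$_) } @nested), "\n";
                                    # alpha|beta,gamma / delta|epsilon
print join('|', @single), "\n";     # alpha|beta,gamma
```

The comma inside the double quotes is not treated as a delimiter, which is why "beta,gamma" comes back as one token in all three cases.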
The $keep argument is a boolean flag. If true, then the tokens are
split on the specified delimiter, but all other characters (quotes,
backslashes, etc.) are kept in the tokens. If $keep is false then the
&*quotewords() functions remove all quotes and backslashes that are
not themselves backslash-escaped or inside of single quotes (i.e.,
&quotewords() tries to interpret these characters just like the Bourne
shell). NB: these semantics are significantly different from the
original version of this module shipped with Perl 5.000 through 5.004.
As an additional feature, $keep may be the keyword "delimiters" which
causes the functions to preserve the delimiters in each string as
tokens in the token lists, in addition to preserving quote and
backslash characters.
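The three $keep modes can be compared side by side with a short sketch like the one below. The input string and the variable names (@stripped, @kept, @with_delims) are invented for illustration; each token is wrapped in angle brackets so that delimiter tokens made of whitespace remain visible.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical input: a quoted phrase and a backslash-escaped space.
my $line = q{one "two three" four\ five};

# $keep false: quotes and backslashes are removed.
my @stripped = parse_line('\s+', 0, $line);

# $keep true: quotes and backslashes stay in the tokens.
my @kept = parse_line('\s+', 1, $line);

# $keep eq 'delimiters': as above, plus each delimiter as its own token.
my @with_delims = parse_line('\s+', 'delimiters', $line);

print join('', map { "<$_>" } @stripped), "\n";
# <one><two three><four five>
print join('', map { "<$_>" } @kept), "\n";
# <one><"two three"><four\ five>
print join('', map { "<$_>" } @with_delims), "\n";
# <one>< ><"two three">< ><four\ five>
```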
&shellwords() is written as a special case of &quotewords(), and it
does token parsing with whitespace as a delimiter-- similar to most
Unix shells.
@words = &quotewords('\s+', 0, q{this   is "a test" of\ quotewords \"for you});
multiple spaces are skipped because of our $delim
use of quotes to include a space in a word
use of a backslash to include a space in a word
use of a backslash to remove the special meaning of a double-quote
another simple word (note the lack of effect of the
backslashed double-quote)
Replacing C<&quotewords('\s+', 0, q{this is...})>
with C<&shellwords(q{this is...})>
is a simpler way to accomplish the same thing.
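A small runnable sketch of that equivalence follows. It assumes the sample string contains a run of several spaces between "this" and "is"; the variable names ($text, @via_quotewords, @via_shellwords) are chosen here for illustration only.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords shellwords);

# Sample string with multiple spaces, quotes, and backslash escapes.
my $text = q{this   is "a test" of\ quotewords \"for you};

my @via_quotewords = quotewords('\s+', 0, $text);
my @via_shellwords = shellwords($text);

# Print each token with its index.
my $i = 0;
foreach my $word (@via_quotewords) {
    print "$i: <$word>\n";
    $i++;
}
# 0: <this>
# 1: <is>
# 2: <a test>
# 3: <of quotewords>
# 4: <"for>
# 5: <you>

# Both calls are expected to yield the same token list for this input.
print "same\n" if "@via_quotewords" eq "@via_shellwords";
```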
Maintainer is Hal Pomeranz <pomeranz@netcom.com>, 1994-1997 (Original
author unknown). Much of the code for &parse_line() (including the
primary regexp) is from Joerk Behrends <jbehrends@multimediaproduzenten.de>.
Examples section and other documentation provided by John Heidemann
Bug reports, patches, and nagging provided by lots of folks-- thanks
everybody! Special thanks to Michael Schwern <schwern@envirolink.org>
for assuring me that a &nested_quotewords() would be useful, and to
Jeff Friedl <jfriedl@yahoo-inc.com> for telling me not to worry about
error-checking (sort of-- you had to be there).