Initial commit of OpenSPARC T2 design and verification files.
[OpenSPARC-T2-DV] / tools / perl-5.8.0 / bin / psed
CommitLineData
86530b38
AT
1#!/import/bw/tools/local/perl-5.8.0/bin/perl
2 eval 'exec /import/bw/tools/local/perl-5.8.0/bin/perl -S $0 ${1+"$@"}'
3 if $running_under_some_shell;
4my $startperl;
5my $perlpath;
6($startperl = <<'/../') =~ s/\s*\z//;
7#!/import/bw/tools/local/perl-5.8.0/bin/perl
8/../
9($perlpath = <<'/../') =~ s/\s*\z//;
10/import/bw/tools/local/perl-5.8.0/bin/perl
11/../
12
13$0 =~ s/^.*?(\w+)[\.\w]*$/$1/;
14
15# (p)sed - a stream editor
16# History: Aug 12 2000: Original version.
17# Mar 25 2002: Rearrange generated Perl program.
18
19use strict;
20use integer;
21use Symbol;
22
23=head1 NAME
24
25psed - a stream editor
26
27=head1 SYNOPSIS
28
29 psed [-an] script [file ...]
30 psed [-an] [-e script] [-f script-file] [file ...]
31
32 s2p [-an] [-e script] [-f script-file]
33
34=head1 DESCRIPTION
35
36A stream editor reads the input stream consisting of the specified files
37(or standard input, if none are given), processes is line by line by
38applying a script consisting of edit commands, and writes resulting lines
39to standard output. The filename `C<->' may be used to read standard input.
40
41The edit script is composed from arguments of B<-e> options and
42script-files, in the given order. A single script argument may be specified
43as the first parameter.
44
45If this program is invoked with the name F<s2p>, it will act as a
46sed-to-Perl translator. See L<"sed Script Translation">.
47
48B<sed> returns an exit code of 0 on success or >0 if an error occurred.
49
50=head1 OPTIONS
51
52=over 4
53
54=item B<-a>
55
56A file specified as argument to the B<w> edit command is by default
57opened before input processing starts. Using B<-a>, opening of such
58files is delayed until the first line is actually written to the file.
59
60=item B<-e> I<script>
61
62The editing commands defined by I<script> are appended to the script.
63Multiple commands must be separated by newlines.
64
65=item B<-f> I<script-file>
66
67Editing commands from the specified I<script-file> are read and appended
68to the script.
69
70=item B<-n>
71
72By default, a line is written to standard output after the editing script
73has been applied to it. The B<-n> option suppresses automatic printing.
74
75=back
76
77=head1 COMMANDS
78
79B<sed> command syntax is defined as
80
81Z<> Z<> Z<> Z<>[I<address>[B<,>I<address>]][B<!>]I<function>[I<argument>]
82
83with whitespace being permitted before or after addresses, and between
84the function character and the argument. The I<address>es and the
85address inverter (C<!>) are used to restrict the application of a
86command to the selected line(s) of input.
87
88Each command must be on a line of its own, except where noted in
89the synopses below.
90
91The edit cycle performed on each input line consist of reading the line
92(without its trailing newline character) into the I<pattern space>,
93applying the applicable commands of the edit script, writing the final
94contents of the pattern space and a newline to the standard output.
95A I<hold space> is provided for saving the contents of the
96pattern space for later use.
97
98=head2 Addresses
99
100A sed address is either a line number or a pattern, which may be combined
101arbitrarily to construct ranges. Lines are numbered across all input files.
102
103Any address may be followed by an exclamation mark (`C<!>'), selecting
104all lines not matching that address.
105
106=over 4
107
108=item I<number>
109
110The line with the given number is selected.
111
112=item B<$>
113
114A dollar sign (C<$>) is the line number of the last line of the input stream.
115
116=item B</>I<regular expression>B</>
117
118A pattern address is a basic regular expression (see
119L<"Basic Regular Expressions">), between the delimiting character C</>.
120Any other character except C<\> or newline may be used to delimit a
121pattern address when the initial delimiter is prefixed with a
122backslash (`C<\>').
123
124=back
125
126If no address is given, the command selects every line.
127
128If one address is given, it selects the line (or lines) matching the
129address.
130
131Two addresses select a range that begins whenever the first address
132matches, and ends (including that line) when the second address matches.
133If the first (second) address is a matching pattern, the second
134address is not applied to the very same line to determine the end of
135the range. Likewise, if the second address is a matching pattern, the
136first address is not applied to the very same line to determine the
137begin of another range. If both addresses are line numbers,
138and the second line number is less than the first line number, then
139only the first line is selected.
140
141
142=head2 Functions
143
144The maximum permitted number of addresses is indicated with each
145function synopsis below.
146
147The argument I<text> consists of one or more lines following the command.
148Embedded newlines in I<text> must be preceded with a backslash. Other
149backslashes in I<text> are deleted and the following character is taken
150literally.
151
152=over 4
153
154=cut
155
156my %ComTab;
157my %GenKey;
158#--------------------------------------------------------------------------
159$ComTab{'a'}=[ 1, 'txt', \&Emit, '{ push( @Q, <<'."'TheEnd' ) }\n" ]; #ok
160
161=item [1addr]B<a\> I<text>
162
163Write I<text> (which must start on the line following the command)
164to standard output immediately before reading the next line
165of input, either by executing the B<N> function or by beginning a new cycle.
166
167=cut
168
169#--------------------------------------------------------------------------
170$ComTab{'b'}=[ 2, 'str', \&Branch, '{ goto XXX; }' ]; #ok
171
172=item [2addr]B<b> [I<label>]
173
174Branch to the B<:> function with the specified I<label>. If no label
175is given, branch to the end of the script.
176
177=cut
178
179#--------------------------------------------------------------------------
180$ComTab{'c'}=[ 2, 'txt', \&Change, <<'-X-' ]; #ok
181{ print <<'TheEnd'; } $doPrint = 0; goto EOS;
182-X-
183### continue OK => next CYCLE;
184
185=item [2addr]B<c\> I<text>
186
187The line, or range of lines, selected by the address is deleted.
188The I<text> (which must start on the line following the command)
189is written to standard output. With an address range, this occurs at
190the end of the range.
191
192=cut
193
194#--------------------------------------------------------------------------
195$ComTab{'d'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
196{ $doPrint = 0;
197 goto EOS;
198}
199-X-
200### continue OK => next CYCLE;
201
202=item [2addr]B<d>
203
204Deletes the pattern space and starts the next cycle.
205
206=cut
207
208#--------------------------------------------------------------------------
209$ComTab{'D'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
210{ s/^.*\n?//;
211 if(length($_)){ goto BOS } else { goto EOS }
212}
213-X-
214### continue OK => next CYCLE;
215
216=item [2addr]B<D>
217
218Deletes the pattern space through the first embedded newline or to the end.
219If the pattern space becomes empty, a new cycle is started, otherwise
220execution of the script is restarted.
221
222=cut
223
224#--------------------------------------------------------------------------
225$ComTab{'g'}=[ 2, '', \&Emit, '{ $_ = $Hold };' ]; #ok
226
227=item [2addr]B<g>
228
229Replace the contents of the pattern space with the hold space.
230
231=cut
232
233#--------------------------------------------------------------------------
234$ComTab{'G'}=[ 2, '', \&Emit, '{ $_ .= "\n"; $_ .= $Hold };' ]; #ok
235
236=item [2addr]B<G>
237
238Append a newline and the contents of the hold space to the pattern space.
239
240=cut
241
242#--------------------------------------------------------------------------
243$ComTab{'h'}=[ 2, '', \&Emit, '{ $Hold = $_ }' ]; #ok
244
245=item [2addr]B<h>
246
247Replace the contents of the hold space with the pattern space.
248
249=cut
250
251#--------------------------------------------------------------------------
252$ComTab{'H'}=[ 2, '', \&Emit, '{ $Hold .= "\n"; $Hold .= $_; }' ]; #ok
253
254=item [2addr]B<H>
255
256Append a newline and the contents of the pattern space to the hold space.
257
258=cut
259
260#--------------------------------------------------------------------------
261$ComTab{'i'}=[ 1, 'txt', \&Emit, '{ print <<'."'TheEnd' }\n" ]; #ok
262
263=item [1addr]B<i\> I<text>
264
265Write the I<text> (which must start on the line following the command)
266to standard output.
267
268=cut
269
270#--------------------------------------------------------------------------
271$ComTab{'l'}=[ 2, '', \&Emit, '{ _l() }' ]; #okUTF8
272
273=item [2addr]B<l>
274
275Print the contents of the pattern space: non-printable characters are
276shown in C-style escaped form; long lines are split and have a trailing
277`C<\>' at the point of the split; the true end of a line is marked with
278a `C<$>'. Escapes are: `\a', `\t', `\n', `\f', `\r', `\e' for
279BEL, HT, LF, FF, CR, ESC, respectively, and `\' followed by a three-digit
280octal number for all other non-printable characters.
281
282=cut
283
284#--------------------------------------------------------------------------
285$ComTab{'n'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
286{ print $_, "\n" if $doPrint;
287 printQ() if @Q;
288 $CondReg = 0;
289 last CYCLE unless getsARGV();
290 chomp();
291}
292-X-
293
294=item [2addr]B<n>
295
296If automatic printing is enabled, write the pattern space to the standard
297output. Replace the pattern space with the next line of input. If
298there is no more input, processing is terminated.
299
300=cut
301
302#--------------------------------------------------------------------------
303$ComTab{'N'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
304{ printQ() if @Q;
305 $CondReg = 0;
306 last CYCLE unless getsARGV( $h );
307 chomp( $h );
308 $_ .= "\n$h";
309}
310-X-
311
312=item [2addr]B<N>
313
314Append a newline and the next line of input to the pattern space. If
315there is no more input, processing is terminated.
316
317=cut
318
319#--------------------------------------------------------------------------
320$ComTab{'p'}=[ 2, '', \&Emit, '{ print $_, "\n"; }' ]; #ok
321
322=item [2addr]B<p>
323
324Print the pattern space to the standard output. (Use the B<-n> option
325to suppress automatic printing at the end of a cycle if you want to
326avoid double printing of lines.)
327
328=cut
329
330#--------------------------------------------------------------------------
331$ComTab{'P'}=[ 2, '', \&Emit, <<'-X-' ]; #ok
332{ if( /^(.*)/ ){ print $1, "\n"; } }
333-X-
334
335=item [2addr]B<P>
336
337Prints the pattern space through the first embedded newline or to the end.
338
339=cut
340
341#--------------------------------------------------------------------------
342$ComTab{'q'}=[ 1, '', \&Emit, <<'-X-' ]; #ok
343{ print $_, "\n" if $doPrint;
344 last CYCLE;
345}
346-X-
347
348=item [1addr]B<q>
349
350Branch to the end of the script and quit without starting a new cycle.
351
352=cut
353
354#--------------------------------------------------------------------------
355$ComTab{'r'}=[ 1, 'str', \&Emit, "{ _r( '-X-' ) }" ]; #ok
356
357=item [1addr]B<r> I<file>
358
359Copy the contents of the I<file> to standard output immediately before
360the next attempt to read a line of input. Any error encountered while
361reading I<file> is silently ignored.
362
363=cut
364
365#--------------------------------------------------------------------------
366$ComTab{'s'}=[ 2, 'sub', \&Emit, '' ]; #ok
367
368=item [2addr]B<s/>I<regular expression>B</>I<replacement>B</>I<flags>
369
370Substitute the I<replacement> string for the first substring in
371the pattern space that matches the I<regular expression>.
372Any character other than backslash or newline can be used instead of a
373slash to delimit the regular expression and the replacement.
374To use the delimiter as a literal character within the regular expression
375and the replacement, precede the character by a backslash (`C<\>').
376
377Literal newlines may be embedded in the replacement string by
378preceding a newline with a backslash.
379
380Within the replacement, an ampersand (`C<&>') is replaced by the string
381matching the regular expression. The strings `C<\1>' through `C<\9>' are
382replaced by the corresponding subpattern (see L<"Basic Regular Expressions">).
383To get a literal `C<&>' or `C<\>' in the replacement text, precede it
384by a backslash.
385
386The following I<flags> modify the behaviour of the B<s> command:
387
388=over 8
389
390=item B<g>
391
392The replacement is performed for all matching, non-overlapping substrings
393of the pattern space.
394
395=item B<1>..B<9>
396
397Replace only the n-th matching substring of the pattern space.
398
399=item B<p>
400
401If the substitution was made, print the new value of the pattern space.
402
403=item B<w> I<file>
404
405If the substitution was made, write the new value of the pattern space
406to the specified file.
407
408=back
409
410=cut
411
412#--------------------------------------------------------------------------
413$ComTab{'t'}=[ 2, 'str', \&Branch, '{ goto XXX if _t() }' ]; #ok
414
415=item [2addr]B<t> [I<label>]
416
417Branch to the B<:> function with the specified I<label> if any B<s>
418substitutions have been made since the most recent reading of an input line
419or execution of a B<t> function. If no label is given, branch to the end of
420the script.
421
422
423=cut
424
425#--------------------------------------------------------------------------
426$ComTab{'w'}=[ 2, 'str', \&Write, "{ _w( '-X-' ) }" ]; #ok
427
428=item [2addr]B<w> I<file>
429
430The contents of the pattern space are written to the I<file>.
431
432=cut
433
434#--------------------------------------------------------------------------
435$ComTab{'x'}=[ 2, '', \&Emit, '{ ($Hold, $_) = ($_, $Hold) }' ]; #ok
436
437=item [2addr]B<x>
438
439Swap the contents of the pattern space and the hold space.
440
441=cut
442
443#--------------------------------------------------------------------------
444$ComTab{'y'}=[ 2, 'tra', \&Emit, '' ]; #ok
445=item [2addr]B<y>B</>I<string1>B</>I<string2>B</>
446
447In the pattern space, replace all characters occuring in I<string1> by the
448character at the corresponding position in I<string2>. It is possible
449to use any character (other than a backslash or newline) instead of a
450slash to delimit the strings. Within I<string1> and I<string2>, a
451backslash followed by any character other than a newline is that literal
452character, and a backslash followed by an `n' is replaced by a newline
453character.
454
455=cut
456
457#--------------------------------------------------------------------------
458$ComTab{'='}=[ 1, '', \&Emit, '{ print "$.\n" }' ]; #ok
459
460=item [1addr]B<=>
461
462Prints the current line number on the standard output.
463
464=cut
465
466#--------------------------------------------------------------------------
467$ComTab{':'}=[ 0, 'str', \&Label, '' ]; #ok
468
469=item [0addr]B<:> [I<label>]
470
471The command specifies the position of the I<label>. It has no other effect.
472
473=cut
474
475#--------------------------------------------------------------------------
476$ComTab{'{'}=[ 2, '', \&BeginBlock, '{' ]; #ok
477$ComTab{'}'}=[ 0, '', \&EndBlock, ';}' ]; #ok
478# ';' to avoid warning on empty {}-block
479
480=item [2addr]B<{> [I<command>]
481
482=item [0addr]B<}>
483
484These two commands begin and end a command list. The first command may
485be given on the same line as the opening B<{> command. The commands
486within the list are jointly selected by the address(es) given on the
487B<{> command (but may still have individual addresses).
488
489=cut
490
491#--------------------------------------------------------------------------
492$ComTab{'#'}=[ 0, 'str', \&Comment, '' ]; #ok
493
494=item [0addr]B<#> [I<comment>]
495
496The entire line is ignored (treated as a comment). If, however, the first
497two characters in the script are `C<#n>', automatic printing of output is
498suppressed, as if the B<-n> option were given on the command line.
499
500=back
501
502=cut
503
504use vars qw{ $isEOF $Hold %wFiles @Q $CondReg $doPrint };
505
506my $useDEBUG = exists( $ENV{PSEDDEBUG} );
507my $useEXTBRE = $ENV{PSEDEXTBRE} || '';
508$useEXTBRE =~ s/[^<>wWyB]//g; # gawk RE's handle these
509
510my $doAutoPrint = 1; # automatic printing of pattern space (-n => 0)
511my $doOpenWrite = 1; # open w command output files at start (-a => 0)
512my $svOpenWrite = 0; # save $doOpenWrite
513my $doGenerate = $0 eq 's2p';
514
515# Collected and compiled script
516#
517my( @Commands, %Defined, @BlockStack, %Label, $labNum, $Code, $Func );
518$Code = '';
519
520##################
521# Compile Time
522#
523# Labels
524#
525# Error handling
526#
527sub Warn($;$){
528 my( $msg, $loc ) = @_;
529 $loc ||= '';
530 $loc .= ': ' if length( $loc );
531 warn( "$0: $loc$msg\n" );
532}
533
534$labNum = 0;
535sub newLabel(){
536 return 'L_'.++$labNum;
537}
538
539# safeHere: create safe here delimiter and modify opcode and argument
540#
541sub safeHere($$){
542 my( $codref, $argref ) = @_;
543 my $eod = 'EOD000';
544 while( $$argref =~ /^$eod$/m ){
545 $eod++;
546 }
547 $$codref =~ s/TheEnd/$eod/e;
548 $$argref .= "$eod\n";
549}
550
551# Emit: create address logic and emit command
552#
553sub Emit($$$$$$){
554 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
555 my $cond = '';
556 if( defined( $addr1 ) ){
557 if( defined( $addr2 ) ){
558 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
559 } else {
560 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
561 }
562 $cond = $negated ? "unless( $addr1 )\n" : "if( $addr1 )\n";
563 }
564
565 if( $opcode eq '' ){
566 $Code .= "$cond$arg\n";
567
568 } elsif( $opcode =~ s/-X-/$arg/e ){
569 $Code .= "$cond$opcode\n";
570
571 } elsif( $opcode =~ /TheEnd/ ){
572 safeHere( \$opcode, \$arg );
573 $Code .= "$cond$opcode$arg";
574
575 } else {
576 $Code .= "$cond$opcode\n";
577 }
578 0;
579}
580
581# Write (w command, w flag): store pathname
582#
583sub Write($$$$$$){
584 my( $addr1, $addr2, $negated, $opcode, $path, $fl ) = @_;
585 $wFiles{$path} = '';
586 Emit( $addr1, $addr2, $negated, $opcode, $path, $fl );
587}
588
589
590# Label (: command): label definition
591#
592sub Label($$$$$$){
593 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
594 my $rc = 0;
595 $lab =~ s/\s+//;
596 if( length( $lab ) ){
597 my $h;
598 if( ! exists( $Label{$lab} ) ){
599 $h = $Label{$lab}{name} = newLabel();
600 } else {
601 $h = $Label{$lab}{name};
602 if( exists( $Label{$lab}{defined} ) ){
603 my $dl = $Label{$lab}{defined};
604 Warn( "duplicate label $lab (first defined at $dl)", $fl );
605 $rc = 1;
606 }
607 }
608 $Label{$lab}{defined} = $fl;
609 $Code .= "$h:;\n";
610 }
611 $rc;
612}
613
614# BeginBlock ({ command): push block start
615#
616sub BeginBlock($$$$$$){
617 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
618 push( @BlockStack, [ $fl, $addr1, $addr2, $negated ] );
619 Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
620}
621
622# EndBlock (} command): check proper nesting
623#
624sub EndBlock($$$$$$){
625 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
626 my $rc;
627 my $jcom = pop( @BlockStack );
628 if( defined( $jcom ) ){
629 $rc = Emit( $addr1, $addr2, $negated, $opcode, $arg, $fl );
630 } else {
631 Warn( "unexpected `}'", $fl );
632 $rc = 1;
633 }
634 $rc;
635}
636
637# Branch (t, b commands): check or create label, substitute default
638#
639sub Branch($$$$$$){
640 my( $addr1, $addr2, $negated, $opcode, $lab, $fl ) = @_;
641 $lab =~ s/\s+//; # no spaces at end
642 my $h;
643 if( length( $lab ) ){
644 if( ! exists( $Label{$lab} ) ){
645 $h = $Label{$lab}{name} = newLabel();
646 } else {
647 $h = $Label{$lab}{name};
648 }
649 push( @{$Label{$lab}{used}}, $fl );
650 } else {
651 $h = 'EOS';
652 }
653 $opcode =~ s/XXX/$h/e;
654 Emit( $addr1, $addr2, $negated, $opcode, '', $fl );
655}
656
657# Change (c command): is special due to range end watching
658#
659sub Change($$$$$$){
660 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
661 my $kwd = $negated ? 'unless' : 'if';
662 if( defined( $addr2 ) ){
663 $addr1 .= $addr2 =~ /^\d+$/ ? "..$addr2" : "...$addr2";
664 if( ! $negated ){
665 $addr1 = '$icnt = ('.$addr1.')';
666 $opcode = 'if( $icnt =~ /E0$/ )' . $opcode;
667 }
668 } else {
669 $addr1 .= ' == $.' if $addr1 =~ /^\d+$/;
670 }
671 safeHere( \$opcode, \$arg );
672 $Code .= "$kwd( $addr1 ){\n $opcode$arg}\n";
673 0;
674}
675
676
677# Comment (# command): A no-op. Who would've thought that!
678#
679sub Comment($$$$$$){
680 my( $addr1, $addr2, $negated, $opcode, $arg, $fl ) = @_;
681### $Code .= "# $arg\n";
682 0;
683}
684
685
686sub stripRegex($$){
687 my( $del, $sref ) = @_;
688 my $regex = $del;
689 print "stripRegex:$del:$$sref:\n" if $useDEBUG;
690 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E(\s*)}{}s ){
691 my $sl = $2;
692 $regex .= $1.$sl.$del;
693 if( length( $sl ) % 2 == 0 ){
694 return $regex;
695 }
696 $regex .= $3;
697 }
698 undef();
699}
700
701# stripTrans: take a <del> terminated string from y command
702# honoring and cleaning up of \-escaped <del>'s
703#
704sub stripTrans($$){
705 my( $del, $sref ) = @_;
706 my $t = '';
707 print "stripTrans:$del:$$sref:\n" if $useDEBUG;
708 while( $$sref =~ s{^(.*?)(\\*)\Q$del\E}{}s ){
709 my $sl = $2;
710 $t .= $1;
711 if( length( $sl ) % 2 == 0 ){
712 $t .= $sl;
713 $t =~ s/\\\\/\\/g;
714 return $t;
715 }
716 chop( $sl );
717 $t .= $sl.$del.$3;
718 }
719 undef();
720}
721
722# makey - construct Perl y/// from sed y///
723#
724sub makey($$$){
725 my( $fr, $to, $fl ) = @_;
726 my $error = 0;
727
728 # Ensure that any '-' is up front.
729 # Diagnose duplicate contradicting mappings
730 my %tr;
731 for( my $i = 0; $i < length($fr); $i++ ){
732 my $fc = substr($fr,$i,1);
733 my $tc = substr($to,$i,1);
734 if( exists( $tr{$fc} ) && $tr{$fc} ne $tc ){
735 Warn( "ambiguos translation for character `$fc' in `y' command",
736 $fl );
737 $error++;
738 }
739 $tr{$fc} = $tc;
740 }
741 $fr = $to = '';
742 if( exists( $tr{'-'} ) ){
743 ( $fr, $to ) = ( '-', $tr{'-'} );
744 delete( $tr{'-'} );
745 } else {
746 $fr = $to = '';
747 }
748 # might just as well sort it...
749 for my $fc ( sort keys( %tr ) ){
750 $fr .= $fc;
751 $to .= $tr{$fc};
752 }
753 # make embedded delimiters and newlines safe
754 $fr =~ s/([{}])/\$1/g;
755 $to =~ s/([{}])/\$1/g;
756 $fr =~ s/\n/\\n/g;
757 $to =~ s/\n/\\n/g;
758 return $error ? undef() : "{ y{$fr}{$to}; }";
759}
760
761######
762# makes - construct Perl s/// from sed s///
763#
764sub makes($$$$$$$){
765 my( $regex, $subst, $path, $global, $print, $nmatch, $fl ) = @_;
766
767 # make embedded newlines safe
768 $regex =~ s/\n/\\n/g;
769 $subst =~ s/\n/\\n/g;
770
771 my $code;
772 # n-th occurrence
773 #
774 if( length( $nmatch ) ){
775 $code = <<TheEnd;
776{ \$n = $nmatch;
777 while( --\$n && ( \$s = m ${regex}g ) ){}
778 \$s = ( substr( \$_, pos() ) =~ s ${regex}${subst}s ) if \$s;
779 \$CondReg ||= \$s;
780TheEnd
781 } else {
782 $code = <<TheEnd;
783{ \$s = s ${regex}${subst}s${global};
784 \$CondReg ||= \$s;
785TheEnd
786 }
787 if( $print ){
788 $code .= ' print $_, "\n" if $s;'."\n";
789 }
790 if( defined( $path ) ){
791 $wFiles{$path} = '';
792 $code .= " _w( '$path' ) if \$s;\n";
793 $GenKey{'w'} = 1;
794 }
795 $code .= "}";
796}
797
798=head1 BASIC REGULAR EXPRESSIONS
799
800A I<Basic Regular Expression> (BRE), as defined in POSIX 1003.2, consists
801of I<atoms>, for matching parts of a string, and I<bounds>, specifying
802repetitions of a preceding atom.
803
804=head2 Atoms
805
806The possible atoms of a BRE are: B<.>, matching any single character;
807B<^> and B<$>, matching the null string at the beginning or end
808of a string, respectively; a I<bracket expressions>, enclosed
809in B<[> and B<]> (see below); and any single character with no
810other significance (matching that character). A B<\> before one
811of: B<.>, B<^>, B<$>, B<[>, B<*>, B<\>, matching the character
812after the backslash. A sequence of atoms enclosed in B<\(> and B<\)>
813becomes an atom and establishes the target for a I<backreference>,
814consisting of the substring that actually matches the enclosed atoms.
815Finally, B<\> followed by one of the digits B<0> through B<9> is a
816backreference.
817
818A B<^> that is not first, or a B<$> that is not last does not have
819a special significance and need not be preceded by a backslash to
820become literal. The same is true for a B<]>, that does not terminate
821a bracket expression.
822
823An unescaped backslash cannot be last in a BRE.
824
825=head2 Bounds
826
827The BRE bounds are: B<*>, specifying 0 or more matches of the preceding
828atom; B<\{>I<count>B<\}>, specifying that many repetitions;
829B<\{>I<minimum>B<,\}>, giving a lower limit; and
830B<\{>I<minimum>B<,>I<maximum>B<\}> finally defines a lower and upper
831bound.
832
833A bound appearing as the first item in a BRE is taken literally.
834
835=head2 Bracket Expressions
836
837A I<bracket expression> is a list of characters, character ranges
838and character classes enclosed in B<[> and B<]> and matches any
839single character from the represented set of characters.
840
841A character range is written as two characters separated by B<-> and
842represents all characters (according to the character collating sequence)
843that are not less than the first and not greater than the second.
844(Ranges are very collating-sequence-dependent, and portable programs
845should avoid relying on them.)
846
847A character class is one of the class names
848
849 alnum digit punct
850 alpha graph space
851 blank lower upper
852 cntrl print xdigit
853
854enclosed in B<[:> and B<:]> and represents the set of characters
855as defined in ctype(3).
856
857If the first character after B<[> is B<^>, the sense of matching is
858inverted.
859
860To include a literal `C<^>', place it anywhere else but first. To
861include a literal 'C<]>' place it first or immediately after an
862initial B<^>. To include a literal `C<->' make it the first (or
863second after B<^>) or last character, or the second endpoint of
864a range.
865
866The special bracket expression constructs C<[[:E<lt>:]]> and C<[[:E<gt>:]]>
867match the null string at the beginning and end of a word respectively.
868(Note that neither is identical to Perl's `\b' atom.)
869
870=head2 Additional Atoms
871
872Since some sed implementations provide additional regular expression
873atoms (not defined in POSIX 1003.2), B<psed> is capable of translating
874the following backslash escapes:
875
876=over 4
877
878=item B<\E<lt>> This is the same as C<[[:E<gt>:]]>.
879
880=item B<\E<gt>> This is the same as C<[[:E<lt>:]]>.
881
882=item B<\w> This is an abbreviation for C<[[:alnum:]_]>.
883
884=item B<\W> This is an abbreviation for C<[^[:alnum:]_]>.
885
886=item B<\y> Match the empty string at a word boundary.
887
888=item B<\B> Match the empty string between any two either word or non-word characters.
889
890=back
891
892To enable this feature, the environment variable PSEDEXTBRE must be set
893to a string containing the requested characters, e.g.:
894C<PSEDEXTBRE='E<lt>E<gt>wW'>.
895
896=cut
897
898#####
899# bre2p - convert BRE to Perl RE
900#
901sub peek(\$$){
902 my( $pref, $ic ) = @_;
903 $ic < length($$pref)-1 ? substr( $$pref, $ic+1, 1 ) : '';
904}
905
906sub bre2p($$$){
907 my( $del, $pat, $fl ) = @_;
908 my $led = $del;
909 $led =~ tr/{([</})]>/;
910 $led = '' if $led eq $del;
911
912 $pat = substr( $pat, 1, length($pat) - 2 );
913 my $res = '';
914 my $bracklev = 0;
915 my $backref = 0;
916 my $parlev = 0;
917 for( my $ic = 0; $ic < length( $pat ); $ic++ ){
918 my $c = substr( $pat, $ic, 1 );
919 if( $c eq '\\' ){
920 ### backslash escapes
921 my $nc = peek($pat,$ic);
922 if( $nc eq '' ){
923 Warn( "`\\' cannot be last in pattern", $fl );
924 return undef();
925 }
926 $ic++;
927 if( $nc eq $del ){ ## \<pattern del> => \<pattern del>
928 $res .= "\\$del";
929
930 } elsif( $nc =~ /([[.*\\n])/ ){
931 ## check for \-escaped magics and \n:
932 ## \[ \. \* \\ \n stay as they are
933 $res .= '\\'.$nc;
934
935 } elsif( $nc eq '(' ){ ## \( => (
936 $parlev++;
937 $res .= '(';
938
939 } elsif( $nc eq ')' ){ ## \) => )
940 $parlev--;
941 $backref++;
942 if( $parlev < 0 ){
943 Warn( "unmatched `\\)'", $fl );
944 return undef();
945 }
946 $res .= ')';
947
948 } elsif( $nc eq '{' ){ ## repetition factor \{<i>[,[<j>]]\}
949 my $endpos = index( $pat, '\\}', $ic );
950 if( $endpos < 0 ){
951 Warn( "unmatched `\\{'", $fl );
952 return undef();
953 }
954 my $rep = substr( $pat, $ic+1, $endpos-($ic+1) );
955 $ic = $endpos + 1;
956
957 if( $res =~ /^\^?$/ ){
958 $res .= "\\{$rep\}";
959 } elsif( $rep =~ /^(\d+)(,?)(\d*)?$/ ){
960 my $min = $1;
961 my $com = $2 || '';
962 my $max = $3;
963 if( length( $max ) ){
964 if( $max < $min ){
965 Warn( "maximum less than minimum in `\\{$rep\\}'",
966 $fl );
967 return undef();
968 }
969 } else {
970 $max = '';
971 }
972 # simplify some
973 if( $min == 0 && $max eq '1' ){
974 $res .= '?';
975 } elsif( $min == 1 && "$com$max" eq ',' ){
976 $res .= '+';
977 } elsif( $min == 0 && "$com$max" eq ',' ){
978 $res .= '*';
979 } else {
980 $res .= "{$min$com$max}";
981 }
982 } else {
983 Warn( "invalid repeat clause `\\{$rep\\}'", $fl );
984 return undef();
985 }
986
987 } elsif( $nc =~ /^[1-9]$/ ){
988 ## \1 .. \9 => \1 .. \9, but check for a following digit
989 if( $nc > $backref ){
990 Warn( "invalid backreference ($nc)", $fl );
991 return undef();
992 }
993 $res .= "\\$nc";
994 if( peek($pat,$ic) =~ /[0-9]/ ){
995 $res .= '(?:)';
996 }
997
998 } elsif( $useEXTBRE && ( $nc =~ /[$useEXTBRE]/ ) ){
999 ## extensions - at most <>wWyB - not in POSIX
1000 if( $nc eq '<' ){ ## \< => \b(?=\w), be precise
1001 $res .= '\\b(?<=\\W)';
1002 } elsif( $nc eq '>' ){ ## \> => \b(?=\W), be precise
1003 $res .= '\\b(?=\\W)';
1004 } elsif( $nc eq 'y' ){ ## \y => \b
1005 $res .= '\\b';
1006 } else { ## \B, \w, \W remain the same
1007 $res .= "\\$nc";
1008 }
1009 } elsif( $nc eq $led ){
1010 ## \<closing bracketing-delimiter> - keep '\'
1011 $res .= "\\$nc";
1012
1013 } else { ## \ <char> => <char> ("as if `\' were not present")
1014 $res .= $nc;
1015 }
1016
1017 } elsif( $c eq '.' ){ ## . => .
1018 $res .= $c;
1019
1020 } elsif( $c eq '*' ){ ## * => * but \* if there's nothing preceding it
1021 if( $res =~ /^\^?$/ ){
1022 $res .= '\\*';
1023 } elsif( substr( $res, -1, 1 ) ne '*' ){
1024 $res .= $c;
1025 }
1026
1027 } elsif( $c eq '[' ){
1028 ## parse []: [^...] [^]...] [-...]
1029 my $add = '[';
1030 if( peek($pat,$ic) eq '^' ){
1031 $ic++;
1032 $add .= '^';
1033 }
1034 my $nc = peek($pat,$ic);
1035 if( $nc eq ']' || $nc eq '-' ){
1036 $add .= $nc;
1037 $ic++;
1038 }
1039 # check that [ is not trailing
1040 if( $ic >= length( $pat ) - 1 ){
1041 Warn( "unmatched `['", $fl );
1042 return undef();
1043 }
1044 # look for [:...:] and x-y
1045 my $rstr = substr( $pat, $ic+1 );
1046 if( $rstr =~ /^((?:\[:\(\w+|[><]\):\]|[^]-](?:-[^]])?)*)/ ){
1047 my $cnt = $1;
1048 $ic += length( $cnt );
1049 $cnt =~ s/([\\\$])/\\$1/g; # `\', `$' are magic in Perl []
1050 # try some simplifications
1051 my $red = $cnt;
1052 if( $red =~ s/0-9// ){
1053 $cnt = $red.'\d';
1054 if( $red =~ s/A-Z// && $red =~ s/a-z// && $red =~ s/_// ){
1055 $cnt = $red.'\w';
1056 }
1057 }
1058 $add .= $cnt;
1059
1060 # POSIX 1003.2 has this (optional) for begin/end word
1061 $add = '\\b(?=\\W)' if $add eq '[[:<:]]';
1062 $add = '\\b(?<=\\W)' if $add eq '[[:>:]]';
1063
1064 }
1065
1066 ## may have a trailing `-' before `]'
1067 if( $ic < length($pat) - 1 &&
1068 substr( $pat, $ic+1 ) =~ /^(-?])/ ){
1069 $ic += length( $1 );
1070 $add .= $1;
1071 # another simplification
1072 $add =~ s/^\[(\^?)(\\[dw])]$/ $1 eq '^' ? uc($2) : $2 /e;
1073 $res .= $add;
1074 } else {
1075 Warn( "unmatched `['", $fl );
1076 return undef();
1077 }
1078
1079 } elsif( $c eq $led ){ ## unescaped <closing bracketing-delimiter>
1080 $res .= "\\$c";
1081
1082 } elsif( $c eq ']' ){ ## unmatched ] is not magic
1083 $res .= ']';
1084
1085 } elsif( $c =~ /[|+?{}()]/ ){ ## not magic in BRE, but in Perl: \-quote
1086 $res .= "\\$c";
1087
1088 } elsif( $c eq '^' ){ ## not magic unless 1st, but in Perl: \-quote
1089 $res .= length( $res ) ? '\\^' : '^';
1090
1091 } elsif( $c eq '$' ){ ## not magic unless last, but in Perl: \-quote
1092 $res .= $ic == length( $pat ) - 1 ? '$' : '\\$';
1093
1094 } else {
1095 $res .= $c;
1096 }
1097 }
1098
1099 if( $parlev ){
1100 Warn( "unmatched `\\('", $fl );
1101 return undef();
1102 }
1103
1104 # final cleanup: eliminate raw HTs
1105 $res =~ s/\t/\\t/g;
1106 return $del . $res . ( $led ? $led : $del );
1107}
1108
1109
1110#####
1111# sub2p - convert sed substitution to Perl substitution
1112#
1113sub sub2p($$$){
1114 my( $del, $subst, $fl ) = @_;
1115 my $led = $del;
1116 $led =~ tr/{([</})]>/;
1117 $led = '' if $led eq $del;
1118
1119 $subst = substr( $subst, 1, length($subst) - 2 );
1120 my $res = '';
1121
1122 for( my $ic = 0; $ic < length( $subst ); $ic++ ){
1123 my $c = substr( $subst, $ic, 1 );
1124 if( $c eq '\\' ){
1125 ### backslash escapes
1126 my $nc = peek($subst,$ic);
1127 if( $nc eq '' ){
1128 Warn( "`\\' cannot be last in substitution", $fl );
1129 return undef();
1130 }
1131 $ic++;
1132 if( $nc =~ /[\\$del$led]/ ){ ## \ and delimiter
1133 $res .= '\\' . $nc;
1134 } elsif( $nc =~ /[1-9]/ ){ ## \1 - \9 => ${1} - ${9}
1135 $res .= '${' . $nc . '}';
1136 } else { ## everything else (includes &): omit \
1137 $res .= $nc;
1138 }
1139 } elsif( $c eq '&' ){ ## & => $&
1140 $res .= '$&';
1141 } elsif( $c =~ /[\$\@$led]/ ){ ## magic in Perl's substitution string
1142 $res .= '\\' . $c;
1143 } else {
1144 $res .= $c;
1145 }
1146 }
1147
1148 # final cleanup: eliminate raw HTs
1149 $res =~ s/\t/\\t/g;
1150 return ( $led ? $del : $led ) . $res . ( $led ? $led : $del );
1151}
1152
1153
1154sub Parse(){
1155 my $error = 0;
1156 my( $pdef, $pfil, $plin );
1157 for( my $icom = 0; $icom < @Commands; $icom++ ){
1158 my $cmd = $Commands[$icom];
1159 print "Parse:$cmd:\n" if $useDEBUG;
1160 $cmd =~ s/^\s+//;
1161 next unless length( $cmd );
1162 my $scom = $icom;
1163 if( exists( $Defined{$icom} ) ){
1164 $pdef = $Defined{$icom};
1165 if( $pdef =~ /^ #(\d+)/ ){
1166 $pfil = 'expression #';
1167 $plin = $1;
1168 } else {
1169 $pfil = "$pdef l.";
1170 $plin = 1;
1171 }
1172 } else {
1173 $plin++;
1174 }
1175 my $fl = "$pfil$plin";
1176
1177 # insert command as comment in gnerated code
1178 #
1179 $Code .= "# $cmd\n" if $doGenerate;
1180
1181 # The Address(es)
1182 #
1183 my( $negated, $naddr, $addr1, $addr2 );
1184 $naddr = 0;
1185 if( $cmd =~ s/^(\d+)\s*// ){
1186 $addr1 = "$1"; $naddr++;
1187 } elsif( $cmd =~ s/^\$\s*// ){
1188 $addr1 = 'eofARGV()'; $naddr++;
1189 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1190 my $del = $1;
1191 my $regex = stripRegex( $del, \$cmd );
1192 if( defined( $regex ) ){
1193 $addr1 = 'm '.bre2p( $del, $regex, $fl ).'s';
1194 $naddr++;
1195 } else {
1196 Warn( "malformed regex, 1st address", $fl );
1197 $error++;
1198 next;
1199 }
1200 }
1201 if( defined( $addr1 ) && $cmd =~ s/,\s*// ){
1202 if( $cmd =~ s/^(\d+)\s*// ){
1203 $addr2 = "$1"; $naddr++;
1204 } elsif( $cmd =~ s/^\$\s*// ){
1205 $addr2 = 'eofARGV()'; $naddr++;
1206 } elsif( $cmd =~ s{^(/)}{} || $cmd =~ s{^\\(.)}{} ){
1207 my $del = $1;
1208 my $regex = stripRegex( $del, \$cmd );
1209 if( defined( $regex ) ){
1210 $addr2 = 'm '. bre2p( $del, $regex, $fl ).'s';
1211 $naddr++;
1212 } else {
1213 Warn( "malformed regex, 2nd address", $fl );
1214 $error++;
1215 next;
1216 }
1217 } else {
1218 Warn( "invalid address after `,'", $fl );
1219 $error++;
1220 next;
1221 }
1222 }
1223
1224 # address modifier `!'
1225 #
1226 $negated = $cmd =~ s/^!\s*//;
1227 if( defined( $addr1 ) ){
1228 print "Parse: addr1=$addr1" if $useDEBUG;
1229 if( defined( $addr2 ) ){
1230 print ", addr2=$addr2 " if $useDEBUG;
1231 # both numeric and addr1 > addr2 => eliminate addr2
1232 undef( $addr2 ) if $addr1 =~ /^\d+$/ &&
1233 $addr2 =~ /^\d+$/ && $addr1 > $addr2;
1234 }
1235 }
1236 print 'negated' if $useDEBUG && $negated;
1237 print " command:$cmd\n" if $useDEBUG;
1238
1239 # The Command
1240 #
1241 if( $cmd !~ s/^([:#={}abcdDgGhHilnNpPqrstwxy])\s*// ){
1242 my $h = substr( $cmd, 0, 1 );
1243 Warn( "unknown command `$h'", $fl );
1244 $error++;
1245 next;
1246 }
1247 my $key = $1;
1248
1249 my $tabref = $ComTab{$key};
1250 $GenKey{$key} = 1;
1251 if( $naddr > $tabref->[0] ){
1252 Warn( "excess address(es)", $fl );
1253 $error++;
1254 next;
1255 }
1256
1257 my $arg = '';
1258 if( $tabref->[1] eq 'str' ){
1259 # take remainder - don't care if it is empty
1260 $arg = $cmd;
1261 $cmd = '';
1262
1263 } elsif( $tabref->[1] eq 'txt' ){
1264 # multi-line text
1265 my $goon = $cmd =~ /(.*)\\$/;
1266 if( length( $1 ) ){
1267 Warn( "extra characters after command ($cmd)", $fl );
1268 $error++;
1269 }
1270 while( $goon ){
1271 $icom++;
1272 if( $icom > $#Commands ){
1273 Warn( "unexpected end of script", $fl );
1274 $error++;
1275 last;
1276 }
1277 $cmd = $Commands[$icom];
1278 $Code .= "# $cmd\n" if $doGenerate;
1279 $goon = $cmd =~ s/\\$//;
1280 $cmd =~ s/\\(.)/$1/g;
1281 $arg .= "\n" if length( $arg );
1282 $arg .= $cmd;
1283 }
1284 $arg .= "\n" if length( $arg );
1285 $cmd = '';
1286
1287 } elsif( $tabref->[1] eq 'sub' ){
1288 # s///
1289 if( ! length( $cmd ) ){
1290 Warn( "`s' command requires argument", $fl );
1291 $error++;
1292 next;
1293 }
1294 if( $cmd =~ s{^([^\\\n])}{} ){
1295 my $del = $1;
1296 my $regex = stripRegex( $del, \$cmd );
1297 if( ! defined( $regex ) ){
1298 Warn( "malformed regular expression", $fl );
1299 $error++;
1300 next;
1301 }
1302 $regex = bre2p( $del, $regex, $fl );
1303
1304 # a trailing \ indicates embedded NL (in replacement string)
1305 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1306 $icom++;
1307 if( $icom > $#Commands ){
1308 Warn( "unexpected end of script", $fl );
1309 $error++;
1310 last;
1311 }
1312 $cmd .= $Commands[$icom];
1313 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1314 }
1315
1316 my $subst = stripRegex( $del, \$cmd );
1317 if( ! defined( $regex ) ){
1318 Warn( "malformed substitution expression", $fl );
1319 $error++;
1320 next;
1321 }
1322 $subst = sub2p( $del, $subst, $fl );
1323
1324 # parse s/// modifier: g|p|0-9|w <file>
1325 my( $global, $nmatch, $print, $write ) =
1326 ( '', '', 0, undef );
1327 while( $cmd =~ s/^([gp0-9])// ){
1328 $1 eq 'g' ? ( $global = 'g' ) :
1329 $1 eq 'p' ? ( $print = $1 ) : ( $nmatch .= $1 );
1330 }
1331 $write = $1 if $cmd =~ s/w\s*(.*)$//;
1332 ### $nmatch =~ s/^(\d)\1*$/$1/; ### may be dangerous?
1333 if( $global && length( $nmatch ) || length( $nmatch ) > 1 ){
1334 Warn( "conflicting flags `$global$nmatch'", $fl );
1335 $error++;
1336 next;
1337 }
1338
1339 $arg = makes( $regex, $subst,
1340 $write, $global, $print, $nmatch, $fl );
1341 if( ! defined( $arg ) ){
1342 $error++;
1343 next;
1344 }
1345
1346 } else {
1347 Warn( "improper delimiter in s command", $fl );
1348 $error++;
1349 next;
1350 }
1351
1352 } elsif( $tabref->[1] eq 'tra' ){
1353 # y///
1354 # a trailing \ indicates embedded newline
1355 while( $cmd =~ s/(?<!\\)\\$/\n/ ){
1356 $icom++;
1357 if( $icom > $#Commands ){
1358 Warn( "unexpected end of script", $fl );
1359 $error++;
1360 last;
1361 }
1362 $cmd .= $Commands[$icom];
1363 $Code .= "# $Commands[$icom]\n" if $doGenerate;
1364 }
1365 if( ! length( $cmd ) ){
1366 Warn( "`y' command requires argument", $fl );
1367 $error++;
1368 next;
1369 }
1370 my $d = substr( $cmd, 0, 1 ); $cmd = substr( $cmd, 1 );
1371 if( $d eq '\\' ){
1372 Warn( "`\\' not valid as delimiter in `y' command", $fl );
1373 $error++;
1374 next;
1375 }
1376 my $fr = stripTrans( $d, \$cmd );
1377 if( ! defined( $fr ) || ! length( $cmd ) ){
1378 Warn( "malformed `y' command argument", $fl );
1379 $error++;
1380 next;
1381 }
1382 my $to = stripTrans( $d, \$cmd );
1383 if( ! defined( $to ) ){
1384 Warn( "malformed `y' command argument", $fl );
1385 $error++;
1386 next;
1387 }
1388 if( length($fr) != length($to) ){
1389 Warn( "string lengths in `y' command differ", $fl );
1390 $error++;
1391 next;
1392 }
1393 if( ! defined( $arg = makey( $fr, $to, $fl ) ) ){
1394 $error++;
1395 next;
1396 }
1397
1398 }
1399
1400 # $cmd must be now empty - exception is {
1401 if( $cmd !~ /^\s*$/ ){
1402 if( $key eq '{' ){
1403 # dirty hack to process command on '{' line
1404 $Commands[$icom--] = $cmd;
1405 } else {
1406 Warn( "extra characters after command ($cmd)", $fl );
1407 $error++;
1408 next;
1409 }
1410 }
1411
1412 # Make Code
1413 #
1414 if( &{$tabref->[2]}( $addr1, $addr2, $negated,
1415 $tabref->[3], $arg, $fl ) ){
1416 $error++;
1417 }
1418 }
1419
1420 while( @BlockStack ){
1421 my $bl = pop( @BlockStack );
1422 Warn( "start of unterminated `{'", $bl );
1423 $error++;
1424 }
1425
1426 for my $lab ( keys( %Label ) ){
1427 if( ! exists( $Label{$lab}{defined} ) ){
1428 for my $used ( @{$Label{$lab}{used}} ){
1429 Warn( "undefined label `$lab'", $used );
1430 $error++;
1431 }
1432 }
1433 }
1434
1435 exit( 1 ) if $error;
1436}
1437
1438
1439##############
1440#### MAIN ####
1441##############
1442
1443sub usage(){
1444 print STDERR "Usage: sed [-an] command [file...]\n";
1445 print STDERR " [-an] [-e command] [-f script-file] [file...]\n";
1446}
1447
1448###################
1449# Here we go again...
1450#
1451my $expr = 0;
1452while( @ARGV && $ARGV[0] =~ /^-(.)(.*)$/ ){
1453 my $opt = $1;
1454 my $arg = $2;
1455 shift( @ARGV );
1456 if( $opt eq 'e' ){
1457 if( length( $arg ) ){
1458 push( @Commands, split( "\n", $arg ) );
1459 } elsif( @ARGV ){
1460 push( @Commands, shift( @ARGV ) );
1461 } else {
1462 Warn( "option -e requires an argument" );
1463 usage();
1464 exit( 1 );
1465 }
1466 $expr++;
1467 $Defined{$#Commands} = " #$expr";
1468 next;
1469 }
1470 if( $opt eq 'f' ){
1471 my $path;
1472 if( length( $arg ) ){
1473 $path = $arg;
1474 } elsif( @ARGV ){
1475 $path = shift( @ARGV );
1476 } else {
1477 Warn( "option -f requires an argument" );
1478 usage();
1479 exit( 1 );
1480 }
1481 my $fst = $#Commands + 1;
1482 open( SCRIPT, "<$path" ) || die( "$0: $path: could not open ($!)\n" );
1483 my $cmd;
1484 while( defined( $cmd = <SCRIPT> ) ){
1485 chomp( $cmd );
1486 push( @Commands, $cmd );
1487 }
1488 close( SCRIPT );
1489 if( $#Commands >= $fst ){
1490 $Defined{$fst} = "$path";
1491 }
1492 next;
1493 }
1494 if( $opt eq '-' && $arg eq '' ){
1495 last;
1496 }
1497 if( $opt eq 'h' || $opt eq '?' ){
1498 usage();
1499 exit( 0 );
1500 }
1501 if( $opt eq 'n' ){
1502 $doAutoPrint = 0;
1503 } elsif( $opt eq 'a' ){
1504 $doOpenWrite = 0;
1505 } else {
1506 Warn( "illegal option `$opt'" );
1507 usage();
1508 exit( 1 );
1509 }
1510 if( length( $arg ) ){
1511 unshift( @ARGV, "-$arg" );
1512 }
1513}
1514
1515# A singleton command may be the 1st argument when there are no options.
1516#
1517if( @Commands == 0 ){
1518 if( @ARGV == 0 ){
1519 Warn( "no script command given" );
1520 usage();
1521 exit( 1 );
1522 }
1523 push( @Commands, split( "\n", shift( @ARGV ) ) );
1524 $Defined{0} = ' #1';
1525}
1526
1527print STDERR "Files: @ARGV\n" if $useDEBUG;
1528
1529# generate leading code
1530#
1531$Func = <<'[TheEnd]';
1532
1533# openARGV: open 1st input file
1534#
1535sub openARGV(){
1536 unshift( @ARGV, '-' ) unless @ARGV;
1537 my $file = shift( @ARGV );
1538 open( ARG, "<$file" )
1539 || die( "$0: can't open $file for reading ($!)\n" );
1540 $isEOF = 0;
1541}
1542
1543# getsARGV: Read another input line into argument (default: $_).
1544# Move on to next input file, and reset EOF flag $isEOF.
1545sub getsARGV(;\$){
1546 my $argref = @_ ? shift() : \$_;
1547 while( $isEOF || ! defined( $$argref = <ARG> ) ){
1548 close( ARG );
1549 return 0 unless @ARGV;
1550 my $file = shift( @ARGV );
1551 open( ARG, "<$file" )
1552 || die( "$0: can't open $file for reading ($!)\n" );
1553 $isEOF = 0;
1554 }
1555 1;
1556}
1557
1558# eofARGV: end-of-file test
1559#
1560sub eofARGV(){
1561 return @ARGV == 0 && ( $isEOF = eof( ARG ) );
1562}
1563
1564# makeHandle: Generates another file handle for some file (given by its path)
1565# to be written due to a w command or an s command's w flag.
1566sub makeHandle($){
1567 my( $path ) = @_;
1568 my $handle;
1569 if( ! exists( $wFiles{$path} ) || $wFiles{$path} eq '' ){
1570 $handle = $wFiles{$path} = gensym();
1571 if( $doOpenWrite ){
1572 if( ! open( $handle, ">$path" ) ){
1573 die( "$0: can't open $path for writing: ($!)\n" );
1574 }
1575 }
1576 } else {
1577 $handle = $wFiles{$path};
1578 }
1579 return $handle;
1580}
1581
1582# printQ: Print queued output which is either a string or a reference
1583# to a pathname.
1584sub printQ(){
1585 for my $q ( @Q ){
1586 if( ref( $q ) ){
1587 # flush open w files so that reading this file gets it all
1588 if( exists( $wFiles{$$q} ) && $wFiles{$$q} ne '' ){
1589 open( $wFiles{$$q}, ">>$$q" );
1590 }
1591 # copy file to stdout: slow, but safe
1592 if( open( RF, "<$$q" ) ){
1593 while( defined( my $line = <RF> ) ){
1594 print $line;
1595 }
1596 close( RF );
1597 }
1598 } else {
1599 print $q;
1600 }
1601 }
1602 undef( @Q );
1603}
1604
1605[TheEnd]
1606
1607# generate the sed loop
1608#
1609$Code .= <<'[TheEnd]';
1610sub openARGV();
1611sub getsARGV(;\$);
1612sub eofARGV();
1613sub printQ();
1614
1615# Run: the sed loop reading input and applying the script
1616#
1617sub Run(){
1618 my( $h, $icnt, $s, $n );
1619 # hack (not unbreakable :-/) to avoid // matching an empty string
1620 my $z = "\000"; $z =~ /$z/;
1621 # Initialize.
1622 openARGV();
1623 $Hold = '';
1624 $CondReg = 0;
1625 $doPrint = $doAutoPrint;
1626CYCLE:
1627 while( getsARGV() ){
1628 chomp();
1629 $CondReg = 0; # cleared on t
1630BOS:;
1631[TheEnd]
1632
1633 # parse - avoid opening files when doing s2p
1634 #
1635 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1636 if $doGenerate;
1637 Parse();
1638 ( $svOpenWrite, $doOpenWrite ) = ( $doOpenWrite, $svOpenWrite )
1639 if $doGenerate;
1640
1641 # append trailing code
1642 #
1643 $Code .= <<'[TheEnd]';
1644EOS: if( $doPrint ){
1645 print $_, "\n";
1646 } else {
1647 $doPrint = $doAutoPrint;
1648 }
1649 printQ() if @Q;
1650 }
1651
1652 exit( 0 );
1653}
1654[TheEnd]
1655
1656
1657# append optional functions, prepend prototypes
1658#
1659my $Proto = "# prototypes\n";
1660if( $GenKey{'l'} ){
1661 $Proto .= "sub _l();\n";
1662 $Func .= <<'[TheEnd]';
1663# _l: l command processing
1664#
1665sub _l(){
1666 my $h = $_;
1667 my $mcpl = 70;
1668 # transform non printing chars into escape notation
1669 $h =~ s/\\/\\\\/g;
1670 if( $h =~ /[^[:print:]]/ ){
1671 $h =~ s/\a/\\a/g;
1672 $h =~ s/\f/\\f/g;
1673 $h =~ s/\n/\\n/g;
1674 $h =~ s/\t/\\t/g;
1675 $h =~ s/\r/\\r/g;
1676 $h =~ s/\e/\\e/g;
1677 $h =~ s/([^[:print:]])/sprintf("\\%03o", ord($1))/ge;
1678 }
1679 # split into lines of length $mcpl
1680 while( length( $h ) > $mcpl ){
1681 my $l = substr( $h, 0, $mcpl-1 );
1682 $h = substr( $h, $mcpl );
1683 # remove incomplete \-escape from end of line
1684 if( $l =~ s/(?<!\\)(\\[0-7]{0,2})$// ){
1685 $h = $1 . $h;
1686 }
1687 print $l, "\\\n";
1688 }
1689 print "$h\$\n";
1690}
1691
1692[TheEnd]
1693}
1694
1695if( $GenKey{'r'} ){
1696 $Proto .= "sub _r(\$);\n";
1697 $Func .= <<'[TheEnd]';
1698# _r: r command processing: Save a reference to the pathname.
1699#
1700sub _r($){
1701 my $path = shift();
1702 push( @Q, \$path );
1703}
1704
1705[TheEnd]
1706}
1707
1708if( $GenKey{'t'} ){
1709 $Proto .= "sub _t();\n";
1710 $Func .= <<'[TheEnd]';
1711# _t: t command - condition register test/reset
1712#
1713sub _t(){
1714 my $res = $CondReg;
1715 $CondReg = 0;
1716 $res;
1717}
1718
1719[TheEnd]
1720}
1721
1722if( $GenKey{'w'} ){
1723 $Proto .= "sub _w(\$);\n";
1724 $Func .= <<'[TheEnd]';
1725# _w: w command and s command's w flag - write to file
1726#
1727sub _w($){
1728 my $path = shift();
1729 my $handle = $wFiles{$path};
1730 if( ! $doOpenWrite && ! defined( fileno( $handle ) ) ){
1731 open( $handle, ">$path" )
1732 || die( "$0: $path: cannot open ($!)\n" );
1733 }
1734 print $handle $_, "\n";
1735}
1736
1737[TheEnd]
1738}
1739
1740$Code = $Proto . $Code;
1741
1742# magic "#n" - same as -n option
1743#
1744$doAutoPrint = 0 if substr( $Commands[0], 0, 2 ) eq '#n';
1745
1746# eval code - check for errors
1747#
1748print "Code:\n$Code$Func" if $useDEBUG;
1749eval $Code . $Func;
1750if( $@ ){
1751 print "Code:\n$Code$Func";
1752 die( "$0: internal error - generated incorrect Perl code: $@\n" );
1753}
1754
1755if( $doGenerate ){
1756
1757 # write full Perl program
1758 #
1759
1760 # bang line, declarations, prototypes
1761 print <<TheEnd;
1762#!$perlpath -w
1763eval 'exec $perlpath -S \$0 \${1+"\$@"}'
1764 if 0;
1765\$0 =~ s/^.*?(\\w+)\[\\.\\w+\]*\$/\$1/;
1766
1767use strict;
1768use Symbol;
1769use vars qw{ \$isEOF \$Hold \%wFiles \@Q \$CondReg
1770 \$doAutoPrint \$doOpenWrite \$doPrint };
1771\$doAutoPrint = $doAutoPrint;
1772\$doOpenWrite = $doOpenWrite;
1773TheEnd
1774
1775 my $wf = "'" . join( "', '", keys( %wFiles ) ) . "'";
1776 if( $wf ne "''" ){
1777 print <<TheEnd;
1778sub makeHandle(\$);
1779for my \$p ( $wf ){
1780 exit( 1 ) unless makeHandle( \$p );
1781}
1782TheEnd
1783 }
1784
1785 print $Code;
1786 print "Run();\n";
1787 print $Func;
1788 exit( 0 );
1789
1790} else {
1791
1792 # execute: make handles (and optionally open) all w files; run!
1793 for my $p ( keys( %wFiles ) ){
1794 exit( 1 ) unless makeHandle( $p );
1795 }
1796 Run();
1797}
1798
1799
1800=head1 ENVIRONMENT
1801
1802The environment variable C<PSEDEXTBRE> may be set to extend BREs.
1803See L<"Additional Atoms">.
1804
1805=head1 DIAGNOSTICS
1806
1807=over 4
1808
1809=item ambiguos translation for character `%s' in `y' command
1810
1811The indicated character appears twice, with different translations.
1812
1813=item `[' cannot be last in pattern
1814
1815A `[' in a BRE indicates the beginning of a I<bracket expression>.
1816
1817=item `\' cannot be last in pattern
1818
1819A `\' in a BRE is used to make the subsequent character literal.
1820
1821=item `\' cannot be last in substitution
1822
1823A `\' in a subsitution string is used to make the subsequent character literal.
1824
1825=item conflicting flags `%s'
1826
1827In an B<s> command, either the `g' flag and an n-th occurrence flag, or
1828multiple n-th occurrence flags are specified. Note that only the digits
1829`1' through `9' are permitted.
1830
1831=item duplicate label %s (first defined at %s)
1832
1833=item excess address(es)
1834
1835The command has more than the permitted number of addresses.
1836
1837=item extra characters after command (%s)
1838
1839=item illegal option `%s'
1840
1841=item improper delimiter in s command
1842
1843The BRE and substitution may not be delimited with `\' or newline.
1844
1845=item invalid address after `,'
1846
1847=item invalid backreference (%s)
1848
1849The specified backreference number exceeds the number of backreferences
1850in the BRE.
1851
1852=item invalid repeat clause `\{%s\}'
1853
1854The repeat clause does not contain a valid integer value, or pair of
1855values.
1856
1857=item malformed regex, 1st address
1858
1859=item malformed regex, 2nd address
1860
1861=item malformed regular expression
1862
1863=item malformed substitution expression
1864
1865=item malformed `y' command argument
1866
1867The first or second string of a B<y> command is syntactically incorrect.
1868
1869=item maximum less than minimum in `\{%s\}'
1870
1871=item no script command given
1872
1873There must be at least one B<-e> or one B<-f> option specifying a
1874script or script file.
1875
1876=item `\' not valid as delimiter in `y' command
1877
1878=item option -e requires an argument
1879
1880=item option -f requires an argument
1881
1882=item `s' command requires argument
1883
1884=item start of unterminated `{'
1885
1886=item string lengths in `y' command differ
1887
1888The translation table strings in a B<y> commanf must have equal lengths.
1889
1890=item undefined label `%s'
1891
1892=item unexpected `}'
1893
1894A B<}> command without a preceding B<{> command was encountered.
1895
1896=item unexpected end of script
1897
1898The end of the script was reached although a text line after a
1899B<a>, B<c> or B<i> command indicated another line.
1900
1901=item unknown command `%s'
1902
1903=item unterminated `['
1904
1905A BRE contains an unterminated bracket expression.
1906
1907=item unterminated `\('
1908
1909A BRE contains an unterminated backreference.
1910
1911=item `\{' without closing `\}'
1912
1913A BRE contains an unterminated bounds specification.
1914
1915=item `\)' without preceding `\('
1916
1917=item `y' command requires argument
1918
1919=back
1920
1921=head1 EXAMPLE
1922
1923The basic material for the preceding section was generated by running
1924the sed script
1925
1926 #no autoprint
1927 s/^.*Warn( *"\([^"]*\)".*$/\1/
1928 t process
1929 b
1930 :process
1931 s/$!/%s/g
1932 s/$[_[:alnum:]]\{1,\}/%s/g
1933 s/\\\\/\\/g
1934 s/^/=item /
1935 p
1936
1937on the program's own text, and piping the output into C<sort -u>.
1938
1939
1940=head1 SED SCRIPT TRANSLATION
1941
1942If this program is invoked with the name F<s2p> it will act as a
1943sed-to-Perl translator. After option processing (all other
1944arguments are ignored), a Perl program is printed on standard
1945output, which will process the input stream (as read from all
1946arguments) in the way defined by the sed script and the option setting
1947used for the translation.
1948
1949=head1 SEE ALSO
1950
1951perl(1), re_format(7)
1952
1953=head1 BUGS
1954
1955The B<l> command will show escape characters (ESC) as `C<\e>', but
1956a vertical tab (VT) in octal.
1957
1958Trailing spaces are truncated from labels in B<:>, B<t> and B<b> commands.
1959
1960The meaning of an empty regular expression (`C<//>'), as defined by B<sed>,
1961is "the last pattern used, at run time". This deviates from the Perl
1962interpretation, which will re-use the "last last successfully executed
1963regular expression". Since keeping track of pattern usage would create
1964terribly cluttered code, and differences would only appear in obscure
1965context (where other B<sed> implementations appear to deviate, too),
1966the Perl semantics was adopted. Note that common usage of this feature,
1967such as in C</abc/s//xyz/>, will work as expected.
1968
1969Collating elements (of bracket expressions in BREs) are not implemented.
1970
1971=head1 STANDARDS
1972
1973This B<sed> implementation conforms to the IEEE Std1003.2-1992 ("POSIX.2")
1974definition of B<sed>, and is compatible with the I<OpenBSD>
1975implementation, except where otherwise noted (see L<"BUGS">).
1976
1977=head1 AUTHOR
1978
1979This Perl implementation of I<sed> was written by Wolfgang Laun,
1980I<Wolfgang.Laun@alcatel.at>.
1981
1982=head1 COPYRIGHT and LICENSE
1983
1984This program is free and open software. You may use, modify,
1985distribute, and sell this program (and any modified variants) in any
1986way you wish, provided you do not restrict others from doing the same.
1987
1988=cut
1989