draft 2, sent to Diomidis
# @(#)POSIX 5.2 (Berkeley) %G%
Comments on the IEEE P1003.2 Draft 12
Part 2: Shell and Utilities
Section 4.55: sed - Stream editor
Diomidis Spinellis <dds@doc.ic.ac.uk>
In the following paragraphs, `wrong' means `inconsistent with historic
practice'. Many of the comments refer to undocumented inconsistencies
between the historical versions of sed and the POSIX standard. All the
comments are notes taken while implementing a POSIX-compatible version
of sed, and should not be interpreted as official opinions or criticism
towards the POSIX committee. Some are insignificant, pedantic and even
wrong.
1. For the text argument of the a command it is not specified if
lines are stripped of their initial blanks or not. Historical
practice, followed in this implementation, is to strip the
blanks, i.e.:
#!/bin/sed -f
a\
foo\
bar
produces:
foo
bar
2. Historical versions of sed required that the w flag be the
last flag to an s command as it takes an additional argument.
This is not specified in the standard.
3. Historical versions of sed required that whitespace follow a w
flag to an s command. This is not specified in the standard.
This implementation permits whitespace but does not require
it.
4. Historical versions of sed permitted any number of whitespace
characters to follow the w command. This is not specified in
the standard. This implementation permits whitespace but does
not require it.
5. The specification of the a command is wrong. With the current
specification both of these scripts should produce the same
output:
#!/bin/sed -f
d
a\
hello
#!/bin/sed -f
a\
hello
d
TK -- Diomidis, the current implementation looks wrong on this case.
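The two scripts can be compared directly. A minimal sketch, assuming a
POSIX shell and a current sed (GNU or BSD); the one-line input and the
variable names are made up for illustration:

```shell
# Script 1: d starts the next cycle before the a command is reached.
first=$(printf 'x\n' | sed 'd
a\
hello')

# Script 2: a queues its text first, then d deletes the pattern space.
second=$(printf 'x\n' | sed 'a\
hello
d')

printf 'd then a: [%s]\n' "$first"
printf 'a then d: [%s]\n' "$second"
```

On a sed that flushes queued text at the end of every cycle, only the
second script is expected to print hello.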
6. The specification of the c command in conjunction with the
specification of the default operation (D2 11293-11299) is
wrong. The default operation specifies that a newline is
printed after the pattern space. This is not the case when
the pattern space has been deleted by a c command.
TK Diomidis, the spec seems right to me -- the language in 11293
TK talks about copying the pattern space to stdout -- if the pattern space
TK is deleted, it can't be copied.
7. The rule for the l command differs from historic practice.
Table 2-15 includes the various ANSI C escape sequences,
including \\ for backslash. Some historical versions of
sed displayed two digit octal numbers. The POSIX
specification is a cleanup, and this implementation follows
it.
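A small sketch of the escape-sequence form of l output, assuming a
POSIX shell and a sed that follows the POSIX rule; the input string is
made up for illustration:

```shell
# The input line is: a, backslash, b, TAB, c.
out=$(printf 'a\\b\tc\n' | sed -n 'l')

# Backslash is shown as \\, TAB as \t, and the end of line as $.
printf '%s\n' "$out"    # a\\b\tc$
```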
8. The specification for ! does not specify that for a single
command the command must not contain an address specification
whereas the command list can contain address specifications.
TK I think this is wrong: the script:
TK
TK 3!p
TK
TK works fine. Am I misunderstanding your point?
9. The standard does not specify what happens with consecutive
! commands (e.g. /foo/!!!p). Historic implementations
allow any number of !'s without changing behaviour. (It
seems logical that each one should reverse the default
behaviour.) This implementation follows historic practice.
10. Historic versions of sed permitted commands to be separated
by semi-colons, e.g. sed -ne '1p;2p;3q' prints the first two
lines of a file (under -n, q exits without printing line 3).
This is not specified by POSIX.
Note, the ; command separator is not allowed for the commands
a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
command. This implementation follows historic practice.
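A minimal sketch of the semi-colon separator, assuming a POSIX shell
and a sed that accepts it (GNU and BSD seds do); the three-line input
is made up for illustration:

```shell
# Two p commands on one line, separated by a semi-colon.
out=$(printf 'a\nb\nc\n' | sed -n '1p;3p')

printf '%s\n' "$out"    # prints a, then c
```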
11. The standard does not specify that if EOF is reached during
the execution of the n command the program terminates, e.g.
sed -e '
n
i\
hello
' </dev/null
will not produce any output. This implementation follows
historic practice.
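The same effect can be seen with non-empty input. A hedged sketch,
assuming a POSIX shell; the three-line input is made up for
illustration:

```shell
# n is followed by a substitution that marks the line it applies to.
out=$(printf 'a\nb\nc\n' | sed -e 'n' -e 's/^/X/')

# On the last line, n finds no further input, so the substitution is
# never applied to c. Whether the untouched c is still printed may
# differ between implementations.
printf '%s\n' "$out"
```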
12. The standard does not specify that the q command causes all
lines that have been appended to be output and that the pattern
space is printed before exiting. This implementation follows
historic practice.
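A minimal sketch of q flushing both the pattern space and the appended
text, assuming a POSIX shell and a current sed (GNU or BSD); the input
is made up for illustration:

```shell
# On line 1, queue some text with a, then quit.
out=$(printf 'one\ntwo\n' | sed '1{
a\
appended
q
}')

# The pattern space is printed, then the queued text; line 2 is never read.
printf '%s\n' "$out"    # one, then appended
```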
13. Historic implementations ignore comments in the text of the i
and a commands. This implementation follows historic practice.
14. Historic implementations do not consider the last line of a
file to match $ if an empty file follows, e.g.
sed -n -e '$p' /usr/dict/words /dev/null
will not print anything. This is not mentioned in the POSIX
specification and is almost certainly a bug. This implementation
follows the POSIX specification.
TK Diomidis, I think we need to fix this, can you do it?
DDS We follow POSIX. You don't mean to do it buggy?
TK I see... (I didn't understand that problem until now.) I think
TK that we *should* print out the last line of the dictionary, in
TK the above example, but I can see how it would be hard. What do
TK you think?
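A small test for the case under discussion, assuming a POSIX shell and
the mktemp utility (not part of POSIX); the two-line file stands in
for /usr/dict/words:

```shell
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"

# The last file on the command line is empty.
out=$(sed -n -e '$p' "$tmp" /dev/null)
rm -f "$tmp"

# A sed that looks ahead past empty files prints b;
# the historic behaviour described above prints nothing.
printf 'last line: [%s]\n' "$out"
```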
15. Historical implementations do not output the change text
of a c command in the case of an address range whose second
line number is less than the first (e.g. 3,1). The POSIX
standard requires that the text be output. Since the historic
behavior doesn't seem to have any particular purpose, this
implementation follows the POSIX behavior.
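A hedged sketch of the POSIX behavior for such a range, assuming a
POSIX shell and a sed that follows the standard here; the four-line
input is made up for illustration:

```shell
# 3,1 selects only line 3, and the change text replaces it.
out=$(printf '1\n2\n3\n4\n' | sed '3,1c\
changed')

printf '%s\n' "$out"    # 1, 2, changed, 4
```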
16. Historical implementations output the c text on EVERY line not
included in the two address range in the case of a negation '!'.
TK Diomidis, this seems reasonable, I don't see where the standard
TK conflicts with this.
17. The standard does not specify that the p flag of the s command will
write the pattern space plus a newline on the standard output.
TK I think this is covered by the general language around 11293
TK that says that the pattern space is always followed by a newline
TK when output.
18. The standard does not specify whether address ranges are
checked and reset if a command is not executed due to a
jump. The following program can behave in two different
ways depending on whether the range operator is reset at
line 6 or not. This is important in the case of pattern
matches.
sed -n -e '
4,8b
s/^/XXX/p
1,6 {
p
}'
TK I don't understand this -- can you explain further?
DDS The 1,6 operator will not be executed on line 6 (due to the 4,8b
DDS line) and thus it will not clear. In this case you can check for
DDS line > 6 in apply, but what if the 1,6 was /BEGIN/,/END/
TK OK, I understand, now. Well, I think I do, anyhow. It seems to
TK me that applies() will never see the 1,6 line under any circumstances
TK (even if it was /BEGIN/,/END/) for lines 4 through 8.
TK A nastier example, as you point out, is:
TK 2,4b
TK /one/,/three/c\
TK append some text
TK
TK The BSD sed appends the text after the "branch" no longer applies,
TK i.e. with the input: one\ntwo\nthree\nfour\nfive\nsix it displays
TK two\nthree\nfour\nappend some text BUT THEN IT STOPS!
TK Our sed, of course, simply never outputs "append some text". It
TK seems to me that our current approach is "right", because it would
TK be possible to have:
TK 1,4b
TK /one/,/five/c\
TK message
TK
TK where you only want to see "message" if the patterns "one" ... "five"
TK occur, but not in lines 1 to 4. What do you think?
18. Historical implementations allow an output suppressing #n at the
beginning of -e arguments as well. This implementation follows
historical practice.
19. POSIX does not specify whether more than one numeric flag is
allowed on the s command.
TK What's historic practice? Currently we don't report an error or
TK do all of the flags.
20. The standard does not specify whether a script is mandatory.
Historic sed implementations behave differently with ls | sed
(no output) and ls | sed - e'' (behaves like cat).
TK I don't understand what 'sed - e' does (it should be illegal,
TK right?) It seems to me that a script should be mandatory,
TK and sed should fail with an error if not given one.
21. The requirement to open all wfiles from the beginning makes sed
behave nonintuitively when the w commands are preceded by addresses
or are within conditional blocks. This implementation follows
historic practice, by default, and provides a flag for more
reasonable behavior.
TK I'll put it on my TODO list... ;-}
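The effect of opening every wfile up front can be seen with a w
command that never fires. A minimal sketch, assuming a POSIX shell,
mktemp -d (not part of POSIX), and a sed in its default mode; the file
name out and the pattern /zzz/ are made up for illustration:

```shell
dir=$(mktemp -d)

# No input line matches /zzz/, so the w command itself never runs.
printf 'a\nb\n' | sed -n "/zzz/w $dir/out"

if [ -e "$dir/out" ]; then created=yes; else created=no; fi
echo "wfile created without any match: $created"
rm -rf "$dir"
```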
22. The rule specified in lines 11412-11413 of the standard does
not seem consistent with existing practice. Historic sed
implementations I tested copied the rfile on standard output
every time the r command was executed and not before reading
a line of input. The wording should be changed to be
consistent with the 'a' command i.e.
TK Something got dropped, here... Can you explain further what
TK historic versions did, what they should do, what we do?
23. The standard does not specify how escape sequences other
than \n and \D (where D is the delimiter character) are to
be treated. A strict interpretation would be that they
should be treated literally. In the sed implementations I
have tried the \ is simply ignored.
TK I don't understand what you're saying, here. Can you explain?
24. The standard specifies that an address can be "empty". This
implies that constructs like ,d or 1,d and ,5d are allowed.
This is not true for historic implementations of sed. This
implementation follows historic practice.
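A quick check of how a given sed treats these constructs, assuming a
POSIX shell; the counter name rejected is made up for illustration:

```shell
rejected=0
for script in ',d' '1,d' ',5d'; do
    # An empty input is enough: the script is parsed before any line is read.
    if sed "$script" </dev/null >/dev/null 2>&1; then
        echo "accepted: $script"
    else
        echo "rejected: $script"
        rejected=$((rejected + 1))
    fi
done
```

A sed that follows historic practice reports all three scripts as
rejected.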
25. The b t and : commands ignore leading white space, but not
trailing white space. This is not specified in the standard.
TK I think that line 11347 points out that the synopsis shows
TK which are valid.
Although the standard specifies that reading from files that
do not exist from within the script must not terminate the
script, it does not specify what happens if a write command
fails. Historic practice is to fail immediately if the file
cannot be opened or written. This implementation follows that
practice.
26. Historic practice is that the \n construct can be used for
either string1 or string2 of the y command. This is not
specified by the standard. This implementation follows
historic practice.
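A minimal sketch of \n in string2 of the y command, assuming a POSIX
shell and a sed that follows historic practice; the input is made up
for illustration:

```shell
# Translate every space into a newline.
out=$(printf 'a b c\n' | sed 'y/ /\n/')

printf '%s\n' "$out"    # a, b and c on separate lines
```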
29. The standard does not specify if the "nth occurrence" of a
regular expression in a substitute command is an overlapping
or a non-overlapping one, e.g. what is the result of s/a*/A/2
on the pattern "aaaaa aaaaa". Historical practice is to drop
core or do non-overlapping expressions. This implementation
follows historic practice.
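Non-overlapping counting can be seen with a simpler expression. This
sketch assumes a POSIX shell and deliberately uses aa rather than a*,
since a* can also match the empty string and results for it may vary:

```shell
# Non-overlapping matches of aa in aaaaa start at offsets 0 and 2,
# so the second occurrence is the one at offset 2.
out=$(printf 'aaaaa\n' | sed 's/aa/X/2')

printf '%s\n' "$out"    # aaXa (overlapping counting would give aXaa)
```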
30. Historic implementations of sed ignore the regular expression
delimiter characters within character classes. This is not
specified in the standard. This implementation follows historic
practice.