-# @(#)POSIX 5.1 (Berkeley) %G%
+# @(#)POSIX 5.2 (Berkeley) %G%
- Comments on the IEEE P1003.2 Draft 11.2 September 1991
+ Comments on the IEEE P1003.2 Draft 12
- Part 2: Shell and Utilities
- Section 4.55: sed - Stream editor
+ Part 2: Shell and Utilities
+ Section 4.55: sed - Stream editor
+
+ Diomidis Spinellis <dds@doc.ic.ac.uk>
In the following paragraphs, `wrong' means `inconsistent with historic
practice'. Many of the comments refer to undocumented inconsistencies
between the historical versions of sed and the POSIX standard. All the
comments are notes taken while implementing a POSIX-compatible version
of sed, and should not be interpreted as official opinions or criticism
-towards the POSIX committee. Many are insignificant, pedantic and even
+towards the POSIX committee. Some are insignificant, pedantic and even
wrong.
- Diomidis Spinellis <dds@doc.ic.ac.uk>
-
-[Some are significant and right, too. -- Keith Bostic]
-1. For the text argument of the a command it is not specified if lines are
- stripped from their initial blanks or not. There are some hints in D2
- 11335-11337 and in D2 11512-11514, but nothing concrete. Historical
- practice is to strip the blanks, i.e.:
+ 1. For the text argument of the a command it is not specified if
+ lines are stripped of their initial blanks or not. Historical
+ practice, followed in this implementation, is to strip the
+ blanks, i.e.:
#!/bin/sed -f
a\
foo\
bar
- produces:
+ produces:
foo
bar
-2. In the s command we assume that the w file is the last flag. This is
- historical practice, but not specified in the standard.
+ 2. Historical versions of sed required that the w flag be the
+ last flag to an s command, as it takes an additional argument.
+ This is not specified in the standard.
+
+ 3. Historical versions of sed required that whitespace follow a w
+ flag to an s command. This is not specified in the standard.
+ This implementation permits whitespace but does not require
+ it.
-3. In the s command the standard does not specify that a space must follow
- w. Also the standard does not specify that any number of spaces after
- the w command are allowed and removed.
+ 4. Historical versions of sed permitted any number of whitespace
+ characters to follow the w command. This is not specified in
+ the standard. This implementation permits whitespace but does
+ not require it.
-4. The specification of the a command is wrong. With the current
- specification both of these scripts should produce the same output:
+ 5. The specification of the a command is wrong. With the current
+ specification both of these scripts should produce the same
+ output:
#!/bin/sed -f
d
hello
d
-5. The specification of the c command in conjunction with the specification
- of the default operation (D2 11293-11299) is wrong. The default operation
- specifies that a newline is printed after the pattern space. This is not
- the case when the pattern space has been deleted by a c command.
-
-6. The rule for the l command differs from historic practice. Table 2-15
- includes the various escape sequences including \\. Is this meant by
- the standard? Furthermore some versions of sed print two digit octal
- numbers. Why does the standard require a three digit octal number?
- Normally the pattern space does not end with a newline. Will an implict
- \n be printed? Finaly the standard does not specify that a newline must
- follow the '$' sign (it seems logical to me).
-
-7. The specification for ! does not specify that for a single command the
- command must not contain an address specification whereas the command
- list can contain address specifications.
-
-8. The standard does not specify what happens with consequitive ! commands
- (e.g. /foo/!!!p) Current implementations allow any number of !'s without
- changing behaviour. It seems logical that each one should reverse the
- default behaviour.
-
-9. The ; command separator is not allowed for the commands a c i w r : b t
- # and at the end of a w flag in the s command.
-
-10. The standard does not specify that if an end of file occurs on the
- execution of the n command the program terminates (e.g.
+TK -- Diomidis, the current implementation looks wrong on this case.
+
+ 6. The specification of the c command in conjunction with the
+ specification of the default operation (D2 11293-11299) is
+ wrong. The default operation specifies that a newline is
+ printed after the pattern space. This is not the case when
+ the pattern space has been deleted by a c command.
+
+TK Diomidis, the spec seems right to me -- the language in 11293
+TK talks about copying the pattern space to stdout -- if the pattern space
+TK is deleted, it can't be copied.
+
+ 7. The rule for the l command differs from historic practice.
+ Table 2-15 includes the various ANSI C escape sequences,
+ including \\ for backslash. Some historical versions of
+ sed displayed two digit octal numbers. The POSIX
+ specification is a cleanup, and this implementation follows
+ it.
+
+ 8. The specification for ! does not specify that for a single
+ command the command must not contain an address specification
+ whereas the command list can contain address specifications.
+
+TK I think this is wrong: the script:
+TK
+TK 3!p
+TK
+TK works fine. Am I misunderstanding your point?
+
+ 9. The standard does not specify what happens with consecutive
+ ! commands (e.g. /foo/!!!p). Historic implementations
+ allow any number of !'s without changing behaviour. (It
+ seems logical that each one should reverse the default
+ behaviour.) This implementation follows historic practice.
+
+10. Historic versions of sed permitted commands to be separated
+ by semi-colons, e.g. sed -ne '1p;2p;3q' prints the first two
+ lines of a file and quits on reading the third. This is not
+ specified by POSIX. Note, the ; command separator is not
+ allowed for the commands a, c, i, w, r, :, b, t, # and at the
+ end of a w flag in the s command. This implementation follows
+ historic practice.
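A small sketch of the semicolon separator from item 10, comparing it
with the equivalent multiple -e form. The sample input and the variable
names are invented for this example; both forms are expected to give
the same output on a sed that accepts semicolons.

```shell
# Hypothetical four-line input; any text would do.
input=$(printf 'a\nb\nc\nd')

# Commands separated by semicolons (historic practice).
# Under -n, q exits at line 3 without printing it, so "a" and "b" appear.
out=$(printf '%s\n' "$input" | sed -ne '1p;2p;3q')

# The same commands given as separate -e scripts.
out_e=$(printf '%s\n' "$input" | sed -n -e '1p' -e '2p' -e '3q')

printf '%s\n' "$out"
```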
+
+11. The standard does not specify that if EOF is reached during
+ the execution of the n command the program terminates (e.g.
sed -e '
n
hello
' </dev/null
- will not produce any output.
+ will not produce any output. This implementation follows
+ historic practice.
-11. The standard does not specify that the q command causes all lines that
- have been appended to be output and that the pattern space is printed
- before exiting.
+12. The standard does not specify that the q command causes all
+ lines that have been appended to be output and that the pattern
+ space is printed before exiting. This implementation follows
+ historic practice.
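The q behaviour in item 12 can be observed with a short script. This is
only a sketch: the input and the appended text are made up, and the
result shown in the comments assumes a sed that follows the historic
practice described above.

```shell
# On line 1, queue the text "appended" with the a command, then quit.
# q is expected to print the pattern space ("x") and flush the queued
# text before sed exits; the second input line is never printed.
out=$(printf 'x\ny\n' | sed -e '1a\
appended
1q')

printf '%s\n' "$out"
```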
-12. Historic implementations ignore comments in the text of the i and a
- commands.
+13. Historic implementations ignore comments in the text of the i
+ and a commands. This implementation follows historic practice.
-13. The historic implementation does not consider the last line of a file
- to match $ if a null file follows:
+14. Historic implementations do not consider the last line of a
+ file to match $ if an empty file follows, e.g.
sed -n -e '$p' /usr/dict/words /dev/null
- will not print anything.
+ will not print anything. This is not mentioned in the POSIX
+ specification and is almost certainly a bug. This implementation
+ follows the POSIX specification.
+
+TK Diomidis, I think we need to fix this, can you do it?
+DDS We follow POSIX. You don't mean to do it buggy?
+TK I see... (I didn't understand that problem until now.) I think
+TK that we *should* print out the last line of the dictionary, in
+TK the above example, but I can see how it would be hard. What do
+TK you think?
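To make the case in item 14 easy to reproduce without /usr/dict/words,
here is a sketch using a temporary file (the file contents are
hypothetical). On a sed that follows the POSIX reading, the last line
of the non-empty file should be printed; a historic sed would print
nothing.

```shell
# Build a two-line stand-in for the dictionary file.
tmp=$(mktemp)
printf 'one\ntwo\n' > "$tmp"

# An empty file follows the real input; $ should still match "two".
out=$(sed -n -e '$p' "$tmp" /dev/null)
rm -f "$tmp"

printf '%s\n' "$out"
```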
-14. Historical implementations do not output the change text of a c command
- in the case of an address range whose second line number is greater than
- the first (e.g. 3,1). The standard seems to imply otherwise.
+15. Historical implementations do not output the change text
+ of a c command in the case of an address range whose second
+ line number is less than the first (e.g. 3,1). The POSIX
+ standard requires that the text be output. Since the historic
+ behavior doesn't seem to have any particular purpose, this
+ implementation follows the POSIX behavior.
-15. Historical implementations output the c text on EVERY line not included
- in the two address range in the case of a negation '!'.
+16. Historical implementations output the c text on EVERY line not
+ included in the two address range in the case of a negation '!'.
-16. The standard does not specify that the p flag at the s command will
- write the pattern space plus a newline on the standard output
+TK Diomidis, this seems reasonable, I don't see where the standard
+TK conflicts with this.
-17. The standard does not specify whether address ranges are checked and
- reset if a command is not executed due to a jump. The following
- program can behave in two different ways depending on whether the range
- operator is reset at line 6 or not. This is important in the case of
- pattern matches.
+17. The standard does not specify that the p flag of the s command will
+ write the pattern space plus a newline to the standard output.
+
+TK I think this is covered by the general language around 11293
+TK that says that the pattern space is always followed by a newline
+TK when output.
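A quick check of the p flag discussed in item 17, as a sketch with
invented input. The line count is included only to show that the
output is newline-terminated.

```shell
# -n suppresses the default output, so only the p flag writes anything.
out=$(printf 'foo\nbaz\n' | sed -n -e 's/foo/bar/p')

# wc -l counts newline characters; one line written means one newline.
count=$(printf 'foo\nbaz\n' | sed -n -e 's/foo/bar/p' | wc -l | tr -d ' ')

printf '%s (%s line)\n' "$out" "$count"
```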
+
+18. The standard does not specify whether address ranges are
+ checked and reset if a command is not executed due to a
+ jump. The following program can behave in two different
+ ways depending on whether the range operator is reset at
+ line 6 or not. This is important in the case of pattern
+ matches.
sed -n -e '
4,8b
p
}'
-18. Historical implementations allow an output suppressing #n at the
- beginning of -e arguments as well.
-
-19. POSIX does not specify whether more than one numeric flag is
- allowed on the s command
-
-20. Existing versions of sed have the undocumented feature of allowing
- a semicolon to delimit commands. It is not specified in the standard.
-
-21. The standard does not specify whether a script is mandatory. The
- sed implementations I tested behave differently with ls | sed (no
- output) and ls | sed - e'' (behaves like cat).
-
-22. The requirement to open all wfiles from the beginning makes sed behave
- nonintuitively when the w commands are preceded by addresses or are
- within conditional blocks.
-
-23. The rule specified in lines 11412-11413 of the standard does not
- seem consistent with existing practice. The sed implementations I
- tested copied the rfile on standard output every time the r command was
- executed and not before reading a line of input. The wording should be
- changed to be consistent with the 'a' command i.e.
-
-24. The standard does not specify how excape sequences other than \n
- and \D (where D is the delimiter character) are to be treated. A
- strict interpretation would be that they should be treated literaly.
- In the sed implementations I have tried the \ is simply ingored.
-
-25. The standard specifies in line 11304 that an address can be empty.
- This is wrong since it implied that constructs like ,d or 1,d or ,5d
- are allowed. The sed implementation I tested do not allow them.
-
-26. The b t and : commands ignore leading white space, but not trailing
- white space. This is not specified in the standard.
-
-27. Although the standard specifies that reading from files that do not
- exist from within the script must not terminate the script; it does not
- specify what happens if a write command fails.
-
-28. In the sed implementation I tested the \n construct for newlines
- works on both strings of a y command. This is not specified in the
- standard.
-
-29. The standard does not specify if the "nth occurrence" of a regular
- expression in a substitute command is an overlapping or a
- non-overlappoin one. I.e. what is the result of s/a*/A/2 on the
- pattern "aaaaa aaaaa". (It crashes the implementation of sed I
- tested.)
-
-30. Existing implementations of sed ignore the regular expression
- delimiter characters within character classes. This is not specified
- in the standard.
+TK I don't understand this -- can you explain further?
+DDS The 1,6 operator will not be executed on line 6 (due to the 4,8b
+DDS line) and thus it will not clear. In this case you can check for
+DDS line > 6 in apply, but what if the 1,6 was /BEGIN/,/END/
+TK OK, I understand, now. Well, I think I do, anyhow. It seems to
+TK me that applies() will never see the 1,6 line under any circumstances
+TK (even if it was /BEGIN/,/END/) for lines 4 through 8, because of the branch.
+TK A nastier example, as you point out, is:
+TK 2,4b
+TK /one/,/three/c\
+TK append some text
+TK
+TK The BSD sed appends the text after the "branch" no longer applies,
+TK i.e. with the input: one\ntwo\nthree\nfour\nfive\nsix it displays
+TK two\nthree\nfour\nappend some text BUT THEN IT STOPS!
+TK Our sed, of course, simply never outputs "append some text". It
+TK seems to me that our current approach is "right", because it would
+TK be possible to have:
+TK 1,4b
+TK /one/,/five/c\
+TK message
+TK
+TK where you only want to see "message" if the patterns "one" ... "five"
+TK occur, but not in lines 1 to 4. What do you think?
+
+18. Historical implementations allow an output suppressing #n at the
+ beginning of -e arguments as well. This implementation follows
+ historical practice.
+
+19. POSIX does not specify whether more than one numeric flag is
+ allowed on the s command.
+
+TK What's historic practice? Currently we don't report an error or
+TK do all of the flags.
+
+20. The standard does not specify whether a script is mandatory.
+ Historic sed implementations behave differently with ls | sed
+ (no output) and ls | sed - e'' (behaves like cat).
+
+TK I don't understand what 'sed - e' does (it should be illegal,
+TK right?) It seems to me that a script should be mandatory,
+TK and sed should fail with an error if not given one.
+
+21. The requirement to open all wfiles from the beginning makes sed
+ behave nonintuitively when the w commands are preceded by addresses
+ or are within conditional blocks. This implementation follows
+ historic practice, by default, and provides a flag for more
+ reasonable behavior.
+
+TK I'll put it on my TODO list... ;-}
+
+22. The rule specified in lines 11412-11413 of the standard does
+ not seem consistent with existing practice. Historic sed
+ implementations I tested copied the rfile on standard output
+ every time the r command was executed and not before reading
+ a line of input. The wording should be changed to be
+ consistent with the 'a' command i.e.
+
+TK Something got dropped, here... Can you explain further what
+TK historic versions did, what they should do, what we do?
+
+23. The standard does not specify how escape sequences other
+ than \n and \D (where D is the delimiter character) are to
+ be treated. A strict interpretation would be that they
+ should be treated literally. In the sed implementations I
+ have tried the \ is simply ignored.
+
+TK I don't understand what you're saying, here. Can you explain?
+
+24. The standard specifies that an address can be "empty". This
+ implies that constructs like ,d or 1,d and ,5d are allowed.
+ This is not true for historic implementations of sed. This
+ implementation follows historic practice.
+
+25. The b t and : commands ignore leading white space, but not
+ trailing white space. This is not specified in the standard.
+
+TK I think that line 11347 points out that the synopsis shows
+TK which are valid.
+
+ Although the standard specifies that reading from files that
+ do not exist from within the script must not terminate the
+ script, it does not specify what happens if a write command
+ fails. Historic practice is to fail immediately if the file
+ cannot be opened or written. This implementation follows that
+ practice.
+
+26. Historic practice is that the \n construct can be used for
+ either string1 or string2 of the y command. This is not
+ specified by the standard. This implementation follows
+ historic practice.
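The \n construct in the y command (item 26) can be tried as follows.
This is a sketch with made-up input, assuming a sed that follows the
historic practice described above.

```shell
# Translate every space into a newline; \n appears in string2 here.
out=$(printf 'a b c\n' | sed -e 'y/ /\n/')

# Each word should now be on its own line.
printf '%s\n' "$out"
```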
+
+29. The standard does not specify if the "nth occurrence" of a
+ regular expression in a substitute command is an overlapping
+ or a non-overlapping one, e.g. what is the result of s/a*/A/2
+ on the pattern "aaaaa aaaaa". Historical practice is to drop
+ core or to count non-overlapping matches. This implementation
+ follows historic practice.
+
+30. Historic implementations of sed ignore the regular expression
+ delimiter characters within character classes. This is not
+ specified in the standard. This implementation follows historic
+ practice.
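A sketch of the character-class case from item 30, with an invented
input string. The / inside the brackets is expected to be taken as an
ordinary character rather than as the end of the regular expression.

```shell
# The delimiter / appears inside [ ] and should not terminate the RE.
out=$(printf 'usr/local\n' | sed -e 's/[/]/:/')

printf '%s\n' "$out"
```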