Commit | Line | Data |
---|---|---|
ac7e13d8 | 1 | # @(#)POSIX 5.6 (Berkeley) %G% |
c8efee25 | 2 | |
9ff62da6 KB |
3 | Comments on the IEEE P1003.2 Draft 12 |
4 | Part 2: Shell and Utilities | |
5 | Section 4.55: sed - Stream editor | |
c8efee25 | 6 | |
9ff62da6 KB |
7 | Diomidis Spinellis <dds@doc.ic.ac.uk> |
8 | Keith Bostic <bostic@cs.berkeley.edu> | |
86cf068c | 9 | |
9ff62da6 KB |
10 | In the following paragraphs, "wrong" usually means "inconsistent with |
11 | historic practice", as most of the following comments refer to | |
12 | undocumented inconsistencies between the historical versions of sed and | |
13 | the POSIX 1003.2 standard. All the comments are notes taken while | |
14 | implementing a POSIX-compatible version of sed, and should not be | |
15 | interpreted as official opinions or criticism towards the POSIX committee. | |
16 | All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2. | |
c8efee25 | 17 | |
9ff62da6 KB |
18 | 1. Historic implementations of sed strip the text arguments of the |
19 | a, c and i commands of their initial blanks, i.e. | |
c8efee25 KB |
20 | |
21 | #!/bin/sed -f | |
22 | a\ | |
23 | foo\ | |
24 | bar | |
25 | ||
86cf068c | 26 | produces: |
c8efee25 KB |
27 | |
28 | foo | |
29 | bar | |
30 | ||
9ff62da6 KB |
31 | POSIX does not specify this behavior. This implementation follows |
32 | historic practice. | |
c8efee25 | 33 | |
cc266a68 | 34 | 2. Historical versions of sed required that the w flag be the last |
9ff62da6 KB |
35 | flag to an s command as it takes an additional argument. This |
36 | is obvious, but not specified in POSIX. | |
86cf068c | 37 | |
cc266a68 | 38 | 3. Historical versions of sed required that whitespace follow a w |
9ff62da6 KB |
39 | flag to an s command. This is not specified in POSIX. This |
40 | implementation permits whitespace but does not require it. | |
86cf068c | 41 | |
cc266a68 | 42 | 4. Historical versions of sed permitted any number of whitespace |
9ff62da6 KB |
43 | characters to follow the w command. This is not specified in |
44 | POSIX. This implementation permits whitespace but does not | |
45 | require it. | |
46 | ||
cc266a68 | 47 | 5. The rule for the l command differs from historic practice. Table |
9ff62da6 KB |
48 | 2-15 includes the various ANSI C escape sequences, including \\ |
49 | for backslash. Some historical versions of sed displayed two | |
cc266a68 KB |
50 | digit octal numbers, too, not three as specified by POSIX. POSIX |
51 | is a cleanup, and is followed by this implementation. | |
86cf068c | 52 | |
cc266a68 | 53 | 6. The POSIX specification for ! does not state that a single |
86cf068c | 54 | command following ! must not contain an address specification, |
cc266a68 KB |
55 | whereas a command list can contain address specifications. The |
56 | specification for ! implies that "3!/hello/p" works, and it never | |
ac7e13d8 KB |
57 | has, historically. Note, |
58 | ||
59 | 3!{ | |
60 | /hello/p | |
61 | } | |
62 | ||
63 | does work. | |
cc266a68 KB |
64 | |
65 | 7. POSIX does not specify what happens with consecutive ! commands | |
9ff62da6 KB |
66 | (e.g. /foo/!!!p). Historic implementations allow any number of |
67 | !'s without changing the behaviour. (It seems logical that each | |
68 | one might reverse the behaviour.) This implementation follows | |
69 | historic practice. | |
86cf068c | 70 | |
cc266a68 | 71 | 8. Historic versions of sed permitted commands to be separated |
9ff62da6 | 72 | by semi-colons, e.g. sed -ne '1p;2p;3q' printed the first |
86cf068c KB |
73 | three lines of a file. This is not specified by POSIX. |
74 | Note, the ; command separator is not allowed for the commands | |
75 | a, c, i, w, r, :, b, t, # and at the end of a w flag in the s | |
9ff62da6 KB |
76 | command. This implementation follows historic practice and |
77 | implements the ; separator. | |
86cf068c | 78 | |
cc266a68 | 79 | 9. Historic versions of sed terminated the script if EOF was reached |
9ff62da6 | 80 | during the execution of the 'n' command, i.e.: |
c8efee25 KB |
81 | |
82 | sed -e ' | |
83 | n | |
84 | i\ | |
85 | hello | |
86 | ' </dev/null | |
87 | ||
9ff62da6 KB |
88 | did not produce any output. POSIX does not specify this behavior. |
89 | This implementation follows historic practice. | |
c8efee25 | 90 | |
cc266a68 | 91 | 10. POSIX does not specify that the q command causes all lines that |
9ff62da6 KB |
92 | have been appended to be output and that the pattern space is |
93 | printed before exiting. This implementation follows historic | |
94 | practice. | |
86cf068c | 95 | |
cc266a68 | 96 | 11. Historical implementations do not output the change text of a c |
ac7e13d8 KB |
97 | command in the case of an address range whose first line number |
98 | is greater than the second (e.g. 3,1). POSIX requires that the | |
9ff62da6 KB |
99 | text be output. Since the historic behavior doesn't seem to have |
100 | any particular purpose, this implementation follows the POSIX | |
101 | behavior. | |
102 | ||
cc266a68 | 103 | 12. POSIX does not specify whether address ranges are checked and |
9ff62da6 KB |
104 | reset if a command is not executed due to a jump. The following |
105 | program, with the input "one\ntwo\nthree\nfour\nfive" can behave | |
106 | in different ways depending on whether the /one/,/three/c | |
107 | command is triggered at the third line. | |
108 | ||
109 | 2,4b | |
110 | /one/,/three/c\ | |
111 | append some text | |
112 | ||
113 | Historic implementations of sed, for the above example, would | |
114 | output the text after the "branch" no longer applied, but would | |
115 | then quit without further processing. This implementation has | |
116 | the more intuitive behavior of never outputting the text at all. | |
117 | This is based on the belief that it would be reasonable to want | |
118 | to output some text if the pattern /one/,/three/ occurs but only | |
119 | if it occurs outside of the range of lines 2 to 4. | |
120 | ||
cc266a68 | 121 | 13. Historical implementations allow an output suppressing #n at the |
9ff62da6 KB |
122 | beginning of -e arguments as well as in a script file. POSIX |
123 | does not specify this. This implementation follows historical | |
124 | practice. | |
86cf068c | 125 | |
cc266a68 | 126 | 14. POSIX does not explicitly specify how sed behaves if no script is |
9ff62da6 KB |
127 | specified. Since the sed Synopsis permits this form of the command, |
128 | and the language in the Description section states that the input | |
129 | is output, it seems reasonable that it behave like the cat(1) | |
130 | command. Historic sed implementations behave differently for "ls | | |
cc266a68 KB |
131 | sed", where they produce no output, and "ls | sed -e#", where they |
132 | behave like cat. This implementation behaves like cat in both cases. | |
9ff62da6 | 133 | |
cc266a68 | 134 | 15. The POSIX requirement to open all wfiles from the beginning makes |
9ff62da6 KB |
135 | sed behave nonintuitively when the w commands are preceded by |
136 | addresses or are within conditional blocks. This implementation | |
137 | follows historic practice and POSIX, by default, and provides the | |
cc266a68 | 138 | -a option which opens the files only when they are needed. |
9ff62da6 | 139 | |
cc266a68 | 140 | 16. POSIX does not specify how escape sequences other than \n and \D |
9ff62da6 | 141 | (where D is the delimiter character) are to be treated. This is |
cc266a68 KB |
142 | reasonable; however, it also doesn't state that the backslash is |
143 | to be discarded from the output regardless. A strict reading of | |
144 | POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz". | |
145 | As historic sed implementations always discarded the backslash, | |
146 | this implementation does as well. | |
147 | ||
148 | 17. POSIX specifies that an address can be "empty". This implies | |
149 | that constructs like ",d" or "1,d" and ",5d" are allowed. This | |
150 | is not true for historic implementations or this implementation | |
151 | of sed. | |
152 | ||
153 | 18. The b, t and : commands are documented in POSIX to ignore leading | |
9ff62da6 KB |
154 | white space, but no mention is made of trailing white space. |
155 | Historic implementations of sed assigned different locations to | |
156 | the labels "x" and "x ". This is not useful, and leads to subtle | |
cc266a68 | 157 | programming errors, but it is historic practice and changing it |
ac7e13d8 KB |
158 | could theoretically break working scripts. This implementation |
159 | follows historic practice. | |
cc266a68 KB |
160 | |
161 | 19. Although POSIX specifies that reading from files that do not exist | |
9ff62da6 KB |
162 | from within the script must not terminate the script, it does not |
163 | specify what happens if a write command fails. Historic practice | |
cc266a68 KB |
164 | is to fail immediately if the file cannot be opened or written. |
165 | This implementation follows historic practice. | |
86cf068c | 166 | |
cc266a68 | 167 | 20. Historic practice is that the \n construct can be used for either |
9ff62da6 KB |
168 | string1 or string2 of the y command. This is not specified by |
169 | POSIX. This implementation follows historic practice. | |
86cf068c | 170 | |
cc266a68 | 171 | 21. POSIX does not specify if the "Nth occurrence" of an RE in a |
9ff62da6 KB |
172 | substitute command is an overlapping or a non-overlapping one, |
173 | i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa". | |
174 | Historical practice is to drop core or only do non-overlapping | |
ac7e13d8 | 175 | RE's. This implementation only does non-overlapping RE's. |
86cf068c | 176 | |
cc266a68 KB |
177 | 22. Historic implementations of sed ignore the RE delimiter characters |
178 | within character classes. This is not specified in POSIX. This | |
179 | implementation follows historic practice. | |
8ab54be7 KB |
180 | |
181 | 23. Historic implementations handle empty RE's in a special way: the | |
182 | empty RE is interpreted as if it were the last RE encountered, | |
183 | whether in an address or elsewhere. POSIX does not document this | |
184 | behavior. For example the command: | |
185 | ||
186 | sed -e /abc/s//XXX/ | |
187 | ||
188 | substitutes XXX for the pattern abc. The semantics of "the last | |
189 | RE" can be defined in two different ways: | |
190 | ||
191 | 1. The last RE encountered when compiling (lexical/static scope). | |
192 | 2. The last RE encountered while running (dynamic scope). | |
193 | ||
194 | While many historical implementations fail on programs depending | |
195 | on scope differences, the SunOS version exhibited dynamic scope | |
196 | behaviour. This implementation also uses dynamic scoping, as | |
ac7e13d8 KB |
197 | this seems the most useful and in order to remain consistent with |
198 | historical practice. |
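
The empty-RE behavior described in item 23 can be tried with a short shell sketch. The function names empty_re_demo and scope_probe are hypothetical, chosen only for this illustration. The first function repeats the /abc/s//XXX/ example from the text. The second is one possible probe for the scoping question: what it prints depends on the sed being tested, so no result is asserted for it here.

```shell
#!/bin/sh
# Sketch for item 23; both function names are hypothetical.

# The empty RE in s//XXX/ reuses the address RE /abc/, so "abc"
# is replaced on the matching line and other lines pass through.
empty_re_demo() {
    printf 'abc\nxyz\n' | sed -e '/abc/s//XXX/'
}

# Scope probe: the line "ab" matches /a/, so the script branches
# past s/b/X/ and that command is compiled but never executed.
# With dynamic scope the empty RE in s//Y/ means /a/ (last RE run);
# with lexical scope it means /b/ (last RE compiled before it).
scope_probe() {
    printf 'ab\n' | sed -e '/a/b skip
s/b/X/
:skip
s//Y/'
}

empty_re_demo
scope_probe
```

A sed with dynamic scoping, as this implementation is described to use, would be expected to replace the "a" in the probe, while a lexically scoped one would replace the "b".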