Commit | Line | Data |
---|---|---|
9ff62da6 | 1 | # @(#)POSIX 5.3 (Berkeley) %G% |
c8efee25 | 2 | |
9ff62da6 KB |
3 | Comments on the IEEE P1003.2 Draft 12 |
4 | Part 2: Shell and Utilities | |
5 | Section 4.55: sed - Stream editor | |
c8efee25 | 6 | |
9ff62da6 KB |
7 | Diomidis Spinellis <dds@doc.ic.ac.uk> |
8 | Keith Bostic <bostic@cs.berkeley.edu> | |
86cf068c | 9 | |
9ff62da6 KB |
10 | In the following paragraphs, "wrong" usually means "inconsistent with |
11 | historic practice", as most of the following comments refer to | |
12 | undocumented inconsistencies between the historical versions of sed and | |
13 | the POSIX 1003.2 standard. All the comments are notes taken while | |
14 | implementing a POSIX-compatible version of sed, and should not be | |
15 | interpreted as official opinions or criticism towards the POSIX committee. | |
16 | All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2. | |
c8efee25 | 17 | |
9ff62da6 KB |
18 | 1. Historic implementations of sed strip the text arguments of the |
19 | a, c and i commands of their initial blanks, i.e. | |
c8efee25 KB |
20 | |
21 | #!/bin/sed -f | |
22 | a\ | |
23 | foo\ | |
24 | bar | |
25 | ||
86cf068c | 26 | produces: |
c8efee25 KB |
27 | |
28 | foo | |
29 | bar | |
30 | ||
9ff62da6 KB |
31 | POSIX does not specify this behavior. This implementation follows |
32 | historic practice. | |
c8efee25 | 33 | |
9ff62da6 KB |
34 | 2. Historic implementations ignore comments in the text of the i |
35 | and a commands. This implementation follows historic practice. | |
c8efee25 | 36 | |
9ff62da6 KB |
37 | TK I can't duplicate this -- the BSD version of sed doesn't, i.e. |
38 | TK i\ | |
39 | TK foo\ | |
40 | TK #comment\ | |
41 | TK bar | |
42 | TK prints | |
43 | TK | |
44 | TK foo | |
45 | TK #comment | |
46 | TK bar | |
86cf068c | 47 | |
9ff62da6 KB |
48 | 3. Historical versions of sed required that the w flag be the last |
49 | flag to an s command as it takes an additional argument. This | |
50 | is obvious, but not specified in POSIX. | |
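
A minimal sketch of why the w flag has to come last: everything after the w, up to the end of the line, is taken as the file name. The scratch directory and the name out.txt are assumptions made for this illustration only.

```shell
# Hypothetical scratch directory and file name, used only for this example.
tmpdir=$(mktemp -d)

# The g flag precedes w; the rest of the line after w is the file name.
# Only lines on which a substitution was made are written to the file.
printf 'aaa\nbbb\n' | sed -n "s/a/X/gw $tmpdir/out.txt"

w_result=$(cat "$tmpdir/out.txt")
printf '%s\n' "$w_result"

rm -rf "$tmpdir"
```

Writing the flags in the other order (w before g) would make "g" part of the file name rather than a flag.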
86cf068c | 51 | |
9ff62da6 KB |
52 | 4. Historical versions of sed required that whitespace follow a w |
53 | flag to an s command. This is not specified in POSIX. This | |
54 | implementation permits whitespace but does not require it. | |
86cf068c | 55 | |
9ff62da6 KB |
56 | 5. Historical versions of sed permitted any number of whitespace |
57 | characters to follow the w command. This is not specified in | |
58 | POSIX. This implementation permits whitespace but does not | |
59 | require it. | |
60 | ||
61 | 6. The rule for the l command differs from historic practice. Table | |
62 | 2-15 includes the various ANSI C escape sequences, including \\ | |
63 | for backslash. Some historical versions of sed displayed two | |
64 | digit octal numbers, too, not three as specified by POSIX. The | |
65 | POSIX specification is a cleanup, and this implementation follows | |
66 | it. | |
86cf068c | 67 | |
9ff62da6 | 68 | 7. The specification for ! does not state that a single command |
86cf068c KB |
69 | following it must not contain an address specification, whereas |
70 | a command list can contain address specifications. | |
71 | ||
9ff62da6 | 72 | TK I think this is wrong: the script: |
86cf068c KB |
73 | TK |
74 | TK 3!p | |
75 | TK | |
9ff62da6 KB |
76 | TK works fine. Am I misunderstanding your point? |
77 | DDS Yes. By the definition of command by POSIX 3!/hello/p should work | |
78 | DDS just as 3!{/hello/p} does. The current implementation follows | |
79 | DDS historic practice and does not implement it. | |
80 | TK I *still* don't understand.... Would you please try to explain | |
81 | TK it one more time? Thanks... | |
82 | ||
83 | 8. POSIX does not specify what happens with consecutive ! commands | |
84 | (e.g. /foo/!!!p). Historic implementations allow any number of | |
85 | !'s without changing the behaviour. (It seems logical that each | |
86 | one might reverse the behaviour.) This implementation follows | |
87 | historic practice. | |
86cf068c | 88 | |
9ff62da6 KB |
89 | 9. Historic versions of sed permitted commands to be separated |
90 | by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first | |
86cf068c KB |
91 | two lines of a file. This is not specified by POSIX. |
92 | Note, the ; command separator is not allowed for the commands | |
93 | a, c, i, w, r, :, b, t, # and at the end of a w flag in the s | |
9ff62da6 KB |
94 | command. This implementation follows historic practice and |
95 | implements the ; separator. | |
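
As an illustration of the ; separator, the sketch below puts two p commands in one -e style script. It assumes a sed that accepts semicolons between these commands, as the historic versions described here do; the sample input is invented.

```shell
# Two p commands separated by a semicolon; -n suppresses the default output,
# so only the explicitly printed lines (1 and 3) appear.
semi_result=$(printf 'one\ntwo\nthree\nfour\n' | sed -n '1p;3p')
printf '%s\n' "$semi_result"
```

The same script written with one command per line, or with two -e arguments, should behave identically.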
86cf068c | 96 | |
9ff62da6 KB |
97 | 10. Historic versions of sed terminated the script if EOF was reached |
98 | during the execution of the 'n' command, i.e.: | |
c8efee25 KB |
99 | |
100 | sed -e ' | |
101 | n | |
102 | i\ | |
103 | hello | |
104 | ' </dev/null | |
105 | ||
9ff62da6 KB |
106 | did not produce any output. POSIX does not specify this behavior. |
107 | This implementation follows historic practice. | |
c8efee25 | 108 | |
9ff62da6 KB |
109 | 11. POSIX does not specify that the q command causes all lines that |
110 | have been appended to be output and that the pattern space is | |
111 | printed before exiting. This implementation follows historic | |
112 | practice. | |
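
The historic q behavior can be sketched as follows, assuming a sed that follows the practice described in this item. The input lines and the appended text "added" are made up for the example.

```shell
# q at line 2: the pattern space ("two") is printed before sed exits,
# and line 3 is never read.
q_result=$(printf 'one\ntwo\nthree\n' | sed 2q)
printf '%s\n' "$q_result"

# Text queued by the a command on line 2 is flushed before q exits.
qa_result=$(printf 'one\ntwo\nthree\n' | sed '2a\
added
2q')
printf '%s\n' "$qa_result"
```

Under -n the pattern space is not printed by q, so the first command would then print only what explicit p commands produce.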
86cf068c | 113 | |
9ff62da6 KB |
114 | 12. Historical implementations do not output the change text of a c |
115 | command in the case of an address range whose second line number | |
116 | is less than the first (e.g. 3,1). POSIX requires that the | |
117 | text be output. Since the historic behavior doesn't seem to have | |
118 | any particular purpose, this implementation follows the POSIX | |
119 | behavior. | |
120 | ||
121 | 13. POSIX does not specify whether address ranges are checked and | |
122 | reset if a command is not executed due to a jump. The following | |
123 | program, with the input "one\ntwo\nthree\nfour\nfive" can behave | |
124 | in different ways depending on whether the /one/,/three/c | |
125 | command is triggered at the third line. | |
126 | ||
127 | 2,4b | |
128 | /one/,/three/c\ | |
129 | append some text | |
130 | ||
131 | Historic implementations of sed, for the above example, would | |
132 | output the text after the "branch" no longer applied, but would | |
133 | then quit without further processing. This implementation has | |
134 | the more intuitive behavior of never outputting the text at all. | |
135 | This is based on the belief that it would be reasonable to want | |
136 | to output some text if the pattern /one/,/three/ occurs but only | |
137 | if it occurs outside of the range of lines 2 to 4. | |
138 | ||
139 | 14. Historical implementations allow an output suppressing #n at the | |
140 | beginning of -e arguments as well as in a script file. POSIX | |
141 | does not specify this. This implementation follows historical | |
142 | practice. | |
86cf068c | 143 | |
9ff62da6 KB |
144 | 15. POSIX does not specify whether more than one numeric flag is |
145 | allowed on the s command. Historic practice is to specify only | |
146 | a single flag. | |
86cf068c KB |
147 | |
148 | TK What's historic practice? Currently we don't report an error or | |
9ff62da6 KB |
149 | TK do all of the flags. |
150 | DDS Historic practice is a single flag. We follow it. POSIX | |
151 | DDS should be more precise. | |
152 | TK It actually seems reasonable to do multiple flags, i.e. display | |
153 | TK two or more of the matched patterns. Since it's unambiguous (only | |
154 | TK 1-9 are allowed, so /19 *has* to be 1 and 9, not nineteen), we can't | |
155 | TK break any existing scripts. | |
156 | ||
157 | 16. POSIX does not explicitly specify how sed behaves if no script is | |
158 | specified. Since the sed Synopsis permits this form of the command, | |
159 | and the language in the Description section states that the input | |
160 | is output, it seems reasonable that it behave like the cat(1) | |
161 | command. Historic sed implementations behave differently for "ls | | |
162 | sed" (no output) and "ls | sed -e#" (like cat). This implementation | |
163 | behaves like cat in both cases. | |
164 | ||
165 | 17. The POSIX requirement to open all wfiles from the beginning makes | |
166 | sed behave nonintuitively when the w commands are preceded by | |
167 | addresses or are within conditional blocks. This implementation | |
168 | follows historic practice and POSIX, by default, and provides the | |
169 | -a option for more reasonable behavior. | |
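
A small sketch of the default behavior: the wfile is created before any input is processed, even though the w command never runs. The scratch directory and the name never.txt are assumptions for this example, and the -a option mentioned above is not used here.

```shell
# Hypothetical scratch directory and file name, used only for this example.
tmpdir=$(mktemp -d)

# The address /nomatch/ never matches, so the w command is never executed.
printf 'one\ntwo\n' | sed -n "/nomatch/w $tmpdir/never.txt"

# With wfiles opened at startup, the (empty) file exists anyway.
if [ -e "$tmpdir/never.txt" ]; then
    wfile_state=created
else
    wfile_state=absent
fi
echo "$wfile_state"

rm -rf "$tmpdir"
```

A script that tests for the existence of the wfile afterwards cannot therefore tell whether the w command ever matched; it has to check whether the file is non-empty.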
170 | ||
171 | 18. POSIX does not specify how escape sequences other than \n and \D | |
172 | (where D is the delimiter character) are to be treated. This is | |
173 | reasonable; however, it doesn't state that the backslash is to be | |
174 | discarded from the output regardless. A strict reading of POSIX | |
175 | would be that "echo xyz | sed 's/./\a/'" would display "\ayz". As | |
176 | historic sed implementations always discarded the backslash, this | |
177 | implementation does as well. | |
178 | ||
179 | 19. POSIX specifies that an address can be "empty". This implies that | |
180 | constructs like ,d or 1,d and ,5d are allowed. This is not true | |
181 | for historic implementations of sed. This implementation follows | |
182 | historic practice. | |
86cf068c | 183 | |
9ff62da6 KB |
184 | 20. The b t and : commands are documented in POSIX to ignore leading |
185 | white space, but no mention is made of trailing white space. | |
186 | Historic implementations of sed assigned different locations to | |
187 | the labels "x" and "x ". This is not useful, and leads to subtle | |
188 | programming errors. This implementation ignores trailing whitespace. | |
86cf068c KB |
189 | |
190 | TK I think that line 11347 points out the synopsis shows | |
191 | TK which are valid. | |
9ff62da6 KB |
192 | DDS I am talking about _trailing_ white space. In our implementation |
193 | DDS and historic implementation the label can contain _significant_ | |
194 | DDS white space at its end. This is obscure and not explained in | |
195 | DDS POSIX. | |
196 | TK I think we should delete trailing white space for the above | |
197 | TK reason. | |
198 | ||
199 | 21. Although POSIX specifies that reading from files that do not exist | |
200 | from within the script must not terminate the script, it does not | |
201 | specify what happens if a write command fails. Historic practice | |
202 | is to fail immediately if the file cannot be opened or written. This | |
203 | implementation follows historic practice. | |
86cf068c | 204 | |
9ff62da6 KB |
205 | 22. Historic practice is that the \n construct can be used for either |
206 | string1 or string2 of the y command. This is not specified by | |
207 | POSIX. This implementation follows historic practice. | |
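
For illustration, the sketch below uses \n in string2 of a y command to turn each space into a newline. It assumes a sed that follows the historic practice described in this item; the input is invented.

```shell
# y maps each space to a newline, so the three words end up on separate lines.
y_result=$(printf 'a b c\n' | sed 'y/ /\n/')
printf '%s\n' "$y_result"
# prints a, b and c on three separate lines
```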
86cf068c | 208 | |
9ff62da6 KB |
209 | 23. POSIX does not specify if the "Nth occurrence" of an RE in a |
210 | substitute command is an overlapping or a non-overlapping one, | |
211 | i.e. what is the result of s/a*/A/2 on the pattern "aaaaa aaaaa". | |
212 | Historical practice is to drop core or only do non-overlapping | |
213 | expressions. This implementation follows historic practice. | |
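
The difference between the two readings is easier to see with an RE that cannot match the empty string. In the sketch below (the input and RE are chosen only for illustration), non-overlapping matches of aa in "aaaaa" start at offsets 0 and 2, so replacing the second one gives "aaXa". An overlapping reading would start the second match at offset 1 and give "aXaa".

```shell
# Replace the 2nd non-overlapping occurrence of "aa" in "aaaaa".
nth_result=$(echo aaaaa | sed 's/aa/X/2')
printf '%s\n' "$nth_result"
# prints aaXa
```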
86cf068c | 214 | |
9ff62da6 | 215 | 24. Historic implementations of sed ignore the regular expression |
86cf068c | 216 | delimiter characters within character classes. This is not |
9ff62da6 | 217 | specified in POSIX. This implementation follows historic practice. |