# @(#)POSIX 5.9 (Berkeley) 8/28/92

Comments on the IEEE P1003.2 Draft 12
Part 2: Shell and Utilities
Section 4.55: sed - Stream editor

Diomidis Spinellis <dds@doc.ic.ac.uk>
Keith Bostic <bostic@cs.berkeley.edu>

In the following paragraphs, "wrong" usually means "inconsistent with
historic practice", as most of the following comments refer to
undocumented inconsistencies between the historical versions of sed and
the POSIX 1003.2 standard. All the comments are notes taken while
implementing a POSIX-compatible version of sed, and should not be
interpreted as official opinions or as criticism of the POSIX committee.
All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.

1. 32V- and BSD-derived implementations of sed strip initial blanks
from the text arguments of the a, c and i commands; for example,

    #!/bin/sed -f
    a\
        foo\
        \ indent\
        bar

produces:

    foo
     indent
    bar

POSIX does not specify this behavior, as the System V versions of
sed do not do this stripping. The argument against stripping is
that it is difficult to write sed scripts that have leading blanks
if they are stripped. The argument for stripping is that it is
difficult to write readable sed scripts unless indentation is allowed
and ignored, and leading whitespace can still be obtained by entering
a backslash in front of it. This implementation follows historic BSD
practice.

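A minimal Python sketch of the BSD stripping rule, for illustration only: it is not the C code of this implementation, and the name text_arg_lines is hypothetical. It assumes the lines that follow a\, c\ or i\ are passed in already split; a trailing backslash continues the text onto the next line, and a backslash before any other character makes that character literal, which is how a leading blank survives the stripping.

```python
def text_arg_lines(raw_lines):
    r"""Hypothetical model of 32V/BSD handling of a, c and i text.

    Initial blanks are stripped from every line; a backslash makes the
    next character literal (so "\ " keeps a leading blank), and a
    backslash at the end of a line continues the text on the next line.
    """
    result = []
    for raw in raw_lines:
        s = raw.lstrip(" \t")          # historic BSD: drop initial blanks
        buf = []
        continued = False
        i = 0
        while i < len(s):
            if s[i] == "\\":
                if i + 1 == len(s):    # trailing backslash: more text follows
                    continued = True
                    break
                buf.append(s[i + 1])   # escaped character is kept literally
                i += 2
            else:
                buf.append(s[i])
                i += 1
        result.append("".join(buf))
        if not continued:
            break                      # the text argument ends here
    return result


# The text lines of the example, indented by four blanks.
print(text_arg_lines(["    foo\\", "    \\ indent\\", "    bar"]))
# ['foo', ' indent', 'bar']
```

The only blank that survives is the one protected by the backslash, which is the trade-off the two arguments above are about.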
2. Historical versions of sed required that the w flag be the last
flag to an s command, because it takes the rest of the line as a file
name argument. This is obvious, but it is not specified in POSIX.

3. Historical versions of sed required that whitespace follow a w
flag to an s command. This is not specified in POSIX. This
implementation permits whitespace but does not require it.

4. Historical versions of sed permitted any number of whitespace
characters to follow the w command. This is not specified in
POSIX. This implementation permits whitespace but does not
require it.

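The flag rules in items 2 to 4 can be sketched as a small parser. This is a hypothetical Python model, not this implementation's parser: parse_s_flags is an invented name, it accepts only numeric, g, p and w flags, and it assumes the text after the closing delimiter of the s command is passed in as one string. The file name for w runs to the end of the line, which is why w must come last; blanks before the file name are skipped but not required.

```python
def parse_s_flags(flags):
    """Hypothetical parser for the flags that follow s/RE/replacement/.

    Returns (number, g, p, wfile). Everything after w is the file name,
    so w is necessarily the last flag; blanks after w are optional.
    """
    number, g, p, wfile = None, False, False, None
    i = 0
    while i < len(flags):
        ch = flags[i]
        if ch.isdigit():
            j = i
            while j < len(flags) and flags[j].isdigit():
                j += 1
            number = int(flags[i:j])   # "Nth occurrence" flag
            i = j
        elif ch == "g":
            g, i = True, i + 1
        elif ch == "p":
            p, i = True, i + 1
        elif ch == "w":
            wfile = flags[i + 1:].lstrip(" \t")
            if not wfile:
                raise ValueError("w flag requires a file name")
            break                      # nothing after w is parsed as a flag
        else:
            raise ValueError(f"unknown flag {ch!r}")
    return number, g, p, wfile


print(parse_s_flags("w out g"))  # (None, False, False, 'out g')
```

Both "2gw out" and "2gwout" yield (2, True, False, "out"), while in "w out g" the trailing g silently becomes part of the file name.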
5. The rule for the l command differs from historic practice. Table
2-15 includes the various ANSI C escape sequences, including \\
for backslash. Some historical versions of sed also displayed
two-digit octal numbers, rather than the three digits specified by
POSIX. The POSIX rule is a cleanup, and this implementation follows it.

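A Python sketch of the POSIX l rule for one line of output, for illustration. Assumptions: the pattern space is given as bytes, "printable" means printable ASCII (as in the C locale), and the folding of long lines is left out; the names ESCAPES and l_format are hypothetical.

```python
# Characters that l shows as C-style escapes (cf. Table 2-15).
ESCAPES = {
    "\\": "\\\\", "\a": "\\a", "\b": "\\b", "\f": "\\f",
    "\n": "\\n", "\r": "\\r", "\t": "\\t", "\v": "\\v",
}


def l_format(data: bytes) -> str:
    """Render one pattern space as the POSIX l command would
    (line folding is not modelled)."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in ESCAPES:
            out.append(ESCAPES[ch])
        elif 0x20 <= byte < 0x7F:
            out.append(ch)                 # printable ASCII as-is
        else:
            out.append(f"\\{byte:03o}")    # always three octal digits
    return "".join(out) + "$"              # $ marks the end of the line


print(l_format(b"tab\there\x7f"))  # tab\there\177$
```

A historical two-digit form would have shown the DEL character as \77 rather than \177, which is ambiguous when a digit follows.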
6. The POSIX specification for ! does not state that a single
command following ! must not contain an address, whereas commands
within a braced command list may have addresses. The specification
for ! therefore implies that "3!/hello/p" works, which historically
it never has. Note that

    3!{
    /hello/p
    }

does work.

7. POSIX does not specify what happens with consecutive ! commands
(e.g. /foo/!!!p). Historic implementations allow any number of
!'s without changing the behaviour, although it would seem logical
for each additional one to reverse the sense again. This
implementation follows historic practice.

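Items 6 and 7 both concern what follows the ! after an address. The Python sketch below is a hypothetical model, not this implementation's parser: parse_negation and ADDRESS_START are invented names. It folds any run of !'s into a single negation, and rejects a single command that begins like an address (a digit, $, / or a \ introducing another delimiter), while accepting a { block.

```python
# Characters that can begin an address (assumed set, for illustration).
ADDRESS_START = "0123456789$/\\"


def parse_negation(text):
    """Hypothetical parser for the text that follows an address.

    Returns (negated, command). Any run of '!' counts as one '!'
    (historic practice), and a single command after '!' may not carry
    an address of its own; a '{' block is allowed.
    """
    i = 0
    while i < len(text) and text[i] == "!":
        i += 1                      # extra '!'s do not toggle again
    negated = i > 0
    command = text[i:]
    if negated and command[:1] and command[0] in ADDRESS_START:
        raise ValueError("an address is not allowed after '!'")
    return negated, command


print(parse_negation("!!!p"))  # (True, 'p')
```

With the address 3 already consumed, "!/hello/p" raises an error while "!{" is accepted, matching the two forms in item 6.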
8. Historic versions of sed permitted commands to be separated
by semicolons, e.g. sed -ne '1p;2p;3q' printed the first
three lines of a file. This is not specified by POSIX.
Note that the ; separator cannot follow the commands a, c, i, w,
r, :, b, t and #, nor a w flag of the s command, because each of
these takes the rest of the line as its argument. This
implementation follows historic practice and implements the ;
separator.

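A deliberately simplified Python sketch of splitting one script line on ";". It is a hypothetical model (split_commands and REST_OF_LINE are invented names), and it assumes addresses consist only of line numbers, $, commas and !, and that every other command is a single letter; s, y and braces are not handled. Commands that take the rest of the line keep any ";" as part of their argument.

```python
# Commands whose argument runs to the end of the line, so a ';'
# after them is part of the argument rather than a separator.
REST_OF_LINE = set("aciwr:bt#")


def split_commands(line):
    """Split one script line on ';' (simplified, hypothetical model)."""
    commands = []
    i = 0
    while i < len(line):
        while i < len(line) and line[i] in " \t;":
            i += 1                          # skip separators and blanks
        if i == len(line):
            break
        start = i
        while i < len(line) and (line[i].isdigit() or line[i] in "$,!"):
            i += 1                          # skip the (numeric) address part
        if i < len(line) and line[i] in REST_OF_LINE:
            commands.append(line[start:])   # ';' belongs to the argument
            break
        end = line.find(";", i)
        end = len(line) if end == -1 else end
        commands.append(line[start:end].strip())
        i = end
    return commands


print(split_commands("1p;2w out;3q"))  # ['1p', '2w out;3q']
```

So "1p;2p;3q" splits into three commands, but after w the text "out;3q" is taken as the file name.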
9. Historic versions of sed terminated the script if EOF was reached
during the execution of the 'n' command; for example,

    sed -e '
    n
    i\
    hello
    ' </dev/null

did not produce any output. POSIX does not specify this behavior.
This implementation follows historic practice.

10. POSIX does not specify that the q command causes any appended
text to be output, and the pattern space to be printed, before sed
exits. This implementation follows historic practice.

11. Historical implementations do not output the change text of a c
command in the case of an address range whose first line number
is greater than the second (e.g. 3,1). POSIX requires that the
text be output. Since the historic behavior doesn't seem to have
any particular purpose, this implementation follows the POSIX
behavior.

12. POSIX does not specify whether address ranges are checked and
reset if a command is not executed due to a jump. The following
program behaves differently depending on whether the range of the
'c' command is considered to end at the third line, that is, whether
the text is output even though line 3 of the input never logically
reaches that command.

    2,4b
    1,3c\
    text

Historic implementations, and this implementation, do not output
the text in the above example. The general rule, therefore, is
that a range whose second address is never matched extends to the
end of the input.

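Items 11 and 12 can be illustrated with a small Python model of line-number ranges. It is a hypothetical sketch of the rule stated above (a range closes only when its second address is actually tested and matches), not this implementation's code; LineRange and run are invented names, and run hard-codes the two-command example program.

```python
class LineRange:
    """Hypothetical model of a line-number range such as 1,3.

    The range opens when its first line is seen and closes only when
    the second address is tested and matches; if a branch skips that
    test, the range stays open to the end of the input.
    """

    def __init__(self, first, last):
        self.first, self.last = first, last
        self.active = False

    def applies(self, lineno):
        """Return (selected, closes_range) for input line lineno."""
        if not self.active:
            if lineno != self.first:
                return False, False
            if self.last <= lineno:     # e.g. 3,1: a one-line range
                return True, True
            self.active = True
            return True, False
        if lineno == self.last:
            self.active = False
            return True, True
        return True, False              # still waiting for the second address


def run(lines):
    """Model of the script '2,4b' followed by '1,3c' with the text 'text'."""
    branch, change = LineRange(2, 4), LineRange(1, 3)
    out = []
    for lineno, line in enumerate(lines, 1):
        if branch.applies(lineno)[0]:
            out.append(line)            # b: jump to the end, line is printed
            continue
        selected, closes = change.applies(lineno)
        if selected:
            if closes:
                out.append("text")      # c prints its text at the range end
            continue                    # c deletes the pattern space
        out.append(line)
    return out


print(run(["1", "2", "3", "4", "5"]))  # ['2', '3', '4']
```

Line 5 is still inside the open 1,3 range, so it is deleted and "text" never appears. LineRange(3, 1) selects only line 3 and closes there, which is what lets c print its text under the POSIX reading of item 11.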
13. Historical implementations allow an output-suppressing #n at the
beginning of -e arguments as well as in a script file. POSIX
does not specify this. This implementation follows historical
practice.

14. POSIX does not explicitly specify how sed behaves if no script is
specified. Since the sed Synopsis permits this form of the command,
and the language in the Description section states that the input
is output, it seems reasonable that it behave like the cat(1)
command. Historic sed implementations behave differently for
"ls | sed", where they produce no output, and "ls | sed -e#", where
they behave like cat. This implementation behaves like cat in both
cases.

15. The POSIX requirement to open all w files at the beginning makes
sed behave nonintuitively when the w commands are preceded by
addresses or are within conditional blocks: their files are created
(or truncated) even if those commands are never executed. By default
this implementation follows historic practice and POSIX; it also
provides the -a option, which opens the files only when they are
needed.

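The difference between the default and -a can be sketched in Python. WFile is a hypothetical class, not part of this implementation: by default it creates (or truncates) its file as soon as it is constructed, standing in for script compilation, while with lazy=True it waits until the first line is written.

```python
import os
import tempfile


class WFile:
    """Hypothetical model of a file named by a w command or flag.

    By default the file is created (or truncated) when the script is
    compiled; with lazy=True (the effect of -a) it is opened only when
    the first line is written to it.
    """

    def __init__(self, path, lazy=False):
        self.path = path
        self.fh = None if lazy else open(path, "w")

    def write(self, line):
        if self.fh is None:
            self.fh = open(self.path, "w")   # first use: open now
        self.fh.write(line + "\n")

    def close(self):
        if self.fh is not None:
            self.fh.close()


with tempfile.TemporaryDirectory() as d:
    eager = WFile(os.path.join(d, "eager"))
    lazy = WFile(os.path.join(d, "lazy"), lazy=True)
    # Neither w command has run yet, but the eager file already exists.
    print(os.path.exists(eager.path), os.path.exists(lazy.path))  # True False
    eager.close()
    lazy.close()
```

If the addressed w command never fires, the eager file is left empty (and any previous contents lost), while the lazy one is never touched.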
16. POSIX does not specify how escape sequences other than \n and \D
(where D is the delimiter character) are to be treated. This is
reasonable; however, it also doesn't state that the backslash is
discarded from the output in any case. A strict reading of POSIX
would be that "echo xyz | sed 's/./\a/'" would display "\ayz".
As historic sed implementations always discarded the backslash,
this implementation does as well.

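A Python sketch of building the replacement text with the historic rule that the backslash of an unrecognized escape is discarded. It is a hypothetical model (expand_replacement and substitute_first are invented names), and Python's re syntax stands in for POSIX basic REs.

```python
import re


def expand_replacement(repl, m):
    r"""Hypothetical model of an s replacement string for match m.

    & stands for the whole match and \1 to \9 for back-references; for
    any other escaped character (\&, \\, the delimiter, or something
    like \a) the backslash is simply discarded, as historic sed did.
    """
    out = []
    i = 0
    while i < len(repl):
        ch = repl[i]
        if ch == "\\" and i + 1 < len(repl):
            nxt = repl[i + 1]
            if nxt in "123456789":
                out.append(m.group(int(nxt)) or "")
            else:
                out.append(nxt)             # backslash discarded
            i += 2
        elif ch == "&":
            out.append(m.group(0))
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def substitute_first(pattern, repl, text):
    """s/pattern/repl/ on text, first match only."""
    m = re.search(pattern, text)
    if m is None:
        return text
    return text[:m.start()] + expand_replacement(repl, m) + text[m.end():]


print(substitute_first(".", r"\a", "xyz"))  # ayz
```

Under the strict reading described above, the same call would instead produce "\ayz".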
17. POSIX specifies that an address can be "empty". This implies
that constructs like ",d", "1,d" and ",5d" are allowed. This
is not true for historic implementations or this implementation
of sed.

18. The b, t and : commands are documented in POSIX to ignore leading
white space, but no mention is made of trailing white space.
Historic implementations of sed assigned different locations to
the labels "x" and "x ". This is not useful, and leads to subtle
programming errors, but it is historic practice, and changing it
could theoretically break working scripts. This implementation
follows historic practice.

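The pitfall is easy to reproduce in a Python model of label collection. collect_labels is a hypothetical function, not this implementation's code: it skips blanks before the label name but keeps trailing ones, so a branch to "x" cannot reach a label written ": x " with a trailing blank.

```python
def collect_labels(script_lines):
    """Hypothetical model of historic label handling for ':' lines.

    Leading blanks before the label name are skipped, but trailing
    blanks are kept, so ":x" and ": x " define two different labels.
    """
    labels = {}
    for index, line in enumerate(script_lines):
        stripped = line.lstrip(" \t")
        if stripped.startswith(":"):
            labels[stripped[1:].lstrip(" \t")] = index
    return labels


labels = collect_labels([":x", "p", ": x ", "d"])
print(sorted(labels))  # ['x', 'x ']
```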
19. Although POSIX specifies that reading from nonexistent files from
within the script (with the r command) must not terminate the script,
it does not specify what happens if a write command fails. Historic
practice is to fail immediately if the file cannot be opened or
written. This implementation follows historic practice.

20. Historic practice is that the \n construct (a newline) can be
used in either string1 or string2 of the y command. This is not
specified by POSIX. This implementation follows historic practice.

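A Python sketch of the y command with \n accepted in either string. It is a hypothetical model (unescape_y and y_command are invented names); it assumes that any other escaped character, such as \\ or the delimiter, simply stands for itself.

```python
def unescape_y(s):
    r"""Decode one y string (hypothetical model): \n is a newline, and any
    other escaped character, such as \\ or the delimiter, stands for itself."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append("\n" if s[i + 1] == "n" else s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def y_command(pattern_space, string1, string2):
    """Transliterate pattern_space character by character."""
    src, dst = unescape_y(string1), unescape_y(string2)
    if len(src) != len(dst):
        raise ValueError("y strings must have the same length")
    return pattern_space.translate(str.maketrans(src, dst))


# Like y/\n/ /: join the lines of a multi-line pattern space.
print(y_command("one\ntwo", r"\n", " "))  # one two
```

The reverse, y/ /\n/, splits a line at every blank.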
21. POSIX does not specify whether the "Nth occurrence" of an RE in a
substitute command is counted among overlapping or non-overlapping
matches, i.e. what the result of s/a*/A/2 is on the pattern space
"aaaaa aaaaa". Historical practice is either to dump core or to
count only non-overlapping matches. This implementation counts only
non-overlapping matches.

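A Python sketch of counting non-overlapping matches for the Nth-occurrence flag. It is a hypothetical model (nth_match and substitute_nth are invented names) using Python's re module in place of POSIX REs. Each search resumes where the previous match ended, and an empty match immediately after the previous match is skipped; that last rule is an assumption about how sed's repeated matching is usually done.

```python
import re


def nth_match(pattern, text, n):
    """Return the nth non-overlapping match of pattern in text, or None."""
    rx = re.compile(pattern)
    pos, count, prev_end = 0, 0, -1
    while pos <= len(text):
        m = rx.search(text, pos)
        if m is None:
            return None
        if m.start() == m.end() == prev_end:
            pos = m.start() + 1      # skip an empty match right after the last one
            continue
        count += 1
        if count == n:
            return m
        prev_end = m.end()
        # Resume after this match; step past an empty match to avoid looping.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return None


def substitute_nth(pattern, repl, text, n):
    """s/pattern/repl/n with a literal replacement string."""
    m = nth_match(pattern, text, n)
    if m is None:
        return text
    return text[:m.start()] + repl + text[m.end():]


print(substitute_nth("a*", "A", "aaaaa aaaaa", 2))  # aaaaa A
```

Under an overlapping interpretation the second occurrence would instead be the "aaaa" starting at the second character, giving "aA aaaaa".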
22. Historic implementations of sed ignore the RE delimiter characters
within character classes, so that in s/[/]/X/ the / inside the
brackets does not end the RE. This is not specified in POSIX. This
implementation follows historic practice.

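A Python sketch of finding the end of an RE while ignoring delimiters inside bracket expressions. It is a hypothetical model (find_re_end is an invented name) that handles backslash escapes outside brackets, a leading ] or ^] as a literal member, and items such as [:alpha:] inside brackets.

```python
def find_re_end(text, delim):
    r"""Return the index of the delimiter that ends an RE (hypothetical model).

    text starts just after the opening delimiter. A backslash outside
    brackets escapes the next character (so \/ does not end the RE), and
    delimiters inside a bracket expression such as [/] are ignored.
    """
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            i += 2                                  # escaped character
            continue
        if ch == "[":
            i += 1
            if i < n and text[i] == "^":
                i += 1
            if i < n and text[i] == "]":
                i += 1                              # leading ] is a literal member
            while i < n and text[i] != "]":
                if text[i] == "[" and i + 1 < n and text[i + 1] in ":.=":
                    close = text.find(text[i + 1] + "]", i + 2)
                    if close == -1:
                        raise ValueError("unterminated [: :], [. .] or [= =]")
                    i = close + 2                   # skip e.g. [:alpha:]
                else:
                    i += 1
            if i >= n:
                raise ValueError("unterminated bracket expression")
            i += 1                                  # the closing ]
            continue
        if ch == delim:
            return i
        i += 1
    raise ValueError("unterminated regular expression")


body = "[/]/X/"                                     # text after "s/"
print(body[:find_re_end(body, "/")])                # [/]
```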
23. Historic implementations handle empty RE's in a special way: the
empty RE is interpreted as if it were the last RE encountered,
whether in an address or elsewhere. POSIX does not document this
behavior. For example, the command:

    sed -e /abc/s//XXX/

substitutes XXX for the pattern abc. The semantics of "the last
RE" can be defined in two different ways:

    1. The last RE encountered when compiling (lexical/static scope).
    2. The last RE encountered while running (dynamic scope).

While many historical implementations fail on programs depending
on scope differences, the SunOS version exhibited dynamic scope
behaviour. This implementation uses dynamic scoping, both because
this seems the most useful and to remain consistent with historical
practice.
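Dynamic scoping can be sketched in Python by recording every RE at the moment it is used. LastRE and run_line are hypothetical names, and run_line hard-codes a four-line example script whose result depends on the scoping rule. A lexically scoped implementation would bind s//Z/ to /y/, the RE written just before it.

```python
import re


class LastRE:
    """Hypothetical model of dynamically scoped empty REs.

    Every RE actually used (in an address or an s command) becomes
    "the last RE"; an empty RE means whichever RE was used most
    recently at run time, not the one written earlier in the script.
    """

    def __init__(self):
        self.last = None

    def use(self, pattern):
        if pattern == "":
            if self.last is None:
                raise ValueError("no previous regular expression")
            return self.last
        self.last = re.compile(pattern)
        return self.last


def run_line(line, res):
    """Model of the script (the output of p is not modelled):
         /x/bL
         /y/p
         :L
         s//Z/
    """
    if not res.use("x").search(line):
        res.use("y").search(line)          # /y/ is tested only when /x/ fails
    return res.use("").sub("Z", line, count=1)   # s//Z/


res = LastRE()
print(run_line("xy", res))  # Zy  (the empty RE means /x/; lexical scope gives xZ)
print(run_line("y", res))   # Z   (the empty RE now means /y/)
```

The doc's own example, /abc/s//XXX/, is the simple case: the address is always tested immediately before the s command, so both scoping rules agree.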