Commit | Line | Data |
---|---|---|
e74acc0a | 1 | # @(#)POSIX 5.11 (Berkeley) %G% |
c8efee25 | 2 | |
9ff62da6 KB |
3 | Comments on the IEEE P1003.2 Draft 12 |
4 | Part 2: Shell and Utilities | |
5 | Section 4.55: sed - Stream editor | |
c8efee25 | 6 | |
9ff62da6 KB |
7 | Diomidis Spinellis <dds@doc.ic.ac.uk> |
8 | Keith Bostic <bostic@cs.berkeley.edu> | |
86cf068c | 9 | |
9ff62da6 KB |
10 | In the following paragraphs, "wrong" usually means "inconsistent with |
11 | historic practice", as most of the following comments refer to | |
12 | undocumented inconsistencies between the historical versions of sed and | |
13 | the POSIX 1003.2 standard. All the comments are notes taken while | |
14 | implementing a POSIX-compatible version of sed, and should not be | |
15 | interpreted as official opinions or criticism towards the POSIX committee. | |
16 | All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2. | |
c8efee25 | 17 | |
35a5a157 KB |
18 | 1. 32V and BSD derived implementations of sed strip the text |
19 | arguments of the a, c and i commands of their initial blanks, | |
20 | i.e. | |
c8efee25 KB |
21 | |
22 | #!/bin/sed -f | |
23 | a\ | |
24 | foo\ | |
35a5a157 | 25 | \ indent\ |
c8efee25 KB |
26 | bar |
27 | ||
86cf068c | 28 | produces: |
c8efee25 KB |
29 | |
30 | foo | |
35a5a157 | 31 | indent |
c8efee25 KB |
32 | bar |
33 | ||
35a5a157 KB |
34 | POSIX does not specify this behavior as the System V versions of |
35 | sed do not do this stripping. The argument against stripping is | |
36 | that it is difficult to write sed scripts whose text has leading blanks | |
37 | if they are stripped. The argument for stripping is that it is | |
38 | difficult to write readable sed scripts unless indentation is allowed | |
39 | and ignored, and leading whitespace is obtainable by entering a | |
40 | backslash in front of it. This implementation follows the BSD | |
9ff62da6 | 41 | historic practice. |
c8efee25 | 42 | |
cc266a68 | 43 | 2. Historical versions of sed required that the w flag be the last |
9ff62da6 KB |
44 | flag to an s command as it takes an additional argument. This |
45 | is obvious, but not specified in POSIX. | |
86cf068c | 46 | |
cc266a68 | 47 | 3. Historical versions of sed required that whitespace follow a w |
9ff62da6 KB |
48 | flag to an s command. This is not specified in POSIX. This |
49 | implementation permits whitespace but does not require it. | |
86cf068c | 50 | |
cc266a68 | 51 | 4. Historical versions of sed permitted any number of whitespace |
9ff62da6 KB |
52 | characters to follow the w command. This is not specified in |
53 | POSIX. This implementation permits whitespace but does not | |
54 | require it. | |
55 | ||
cc266a68 | 56 | 5. The rule for the l command differs from historic practice. Table |
9ff62da6 KB |
57 | 2-15 includes the various ANSI C escape sequences, including \\ |
58 | for backslash. Some historical versions of sed also displayed | |
cc266a68 KB |
59 | two-digit octal numbers, not the three digits specified by POSIX. POSIX
60 | is a cleanup, and is followed by this implementation. | |
86cf068c | 61 | |
cc266a68 | 62 | 6. The POSIX specification for ! does not state that a single
86cf068c | 63 | command following ! must not contain an address specification,
cc266a68 KB |
64 | whereas a command list can contain address specifications. The
65 | specification for ! implies that "3!/hello/p" works, but it never | |
ac7e13d8 KB |
66 | has, historically. Note that | |
67 | ||
68 | 3!{ | |
69 | /hello/p | |
70 | } | |
71 | ||
72 | does work. | |
cc266a68 KB |
73 | |
74 | 7. POSIX does not specify what happens with consecutive ! commands | |
9ff62da6 KB |
75 | (e.g. /foo/!!!p). Historic implementations allow any number of |
76 | !'s without changing the behaviour. (It seems logical that each | |
77 | one might reverse the behaviour.) This implementation follows | |
78 | historic practice. | |
86cf068c | 79 | |
cc266a68 | 80 | 8. Historic versions of sed permitted commands to be separated |
9ff62da6 | 81 | by semi-colons, e.g. "sed -ne '1p;2p;3q'" printed the first
86cf068c KB |
82 | two lines of a file. This is not specified by POSIX.
83 | Note, the ; command separator is not allowed for the commands | |
84 | a, c, i, w, r, :, b, t, # and at the end of a w flag in the s | |
9ff62da6 KB |
85 | command. This implementation follows historic practice and |
86 | implements the ; separator. | |
86cf068c | 87 | |
cc266a68 | 88 | 9. Historic versions of sed terminated the script if EOF was reached |
9ff62da6 | 89 | during the execution of the 'n' command, i.e.: |
c8efee25 KB |
90 | |
91 | sed -e ' | |
92 | n | |
93 | i\ | |
94 | hello | |
95 | ' </dev/null | |
96 | ||
9ff62da6 KB |
97 | did not produce any output. POSIX does not specify this behavior. |
98 | This implementation follows historic practice. | |
c8efee25 | 99 | |
de5af743 | 100 | 10. Deleted. |
86cf068c | 101 | |
cc266a68 | 102 | 11. Historical implementations do not output the change text of a c |
ac7e13d8 KB |
103 | command in the case of an address range whose first line number |
104 | is greater than the second (e.g. 3,1). POSIX requires that the | |
9ff62da6 KB |
105 | text be output. Since the historic behavior doesn't seem to have |
106 | any particular purpose, this implementation follows the POSIX | |
107 | behavior. | |
108 | ||
cc266a68 | 109 | 12. POSIX does not specify whether address ranges are checked and |
9ff62da6 | 110 | reset if a command is not executed due to a jump. The following |
430af964 KB |
111 | program will behave in different ways depending on whether the |
112 | 'c' command is triggered at the third line, i.e. whether the text | |
56223abb KB |
113 | is output even though line 3 of the input never logically
114 | reaches that command. | |
9ff62da6 KB |
115 | |
116 | 2,4b | |
430af964 KB |
117 | 1,3c\ |
118 | text | |
9ff62da6 | 119 | |
56223abb KB |
120 | Historic implementations, and this implementation, do not output |
121 | the text in the above example. The general rule, therefore, | |
122 | is that a range whose second address is never matched extends to | |
123 | the end of the input. | |
9ff62da6 | 124 | |
cc266a68 | 125 | 13. Historical implementations allow an output suppressing #n at the |
9ff62da6 KB |
126 | beginning of -e arguments as well as in a script file. POSIX |
127 | does not specify this. This implementation follows historical | |
128 | practice. | |
86cf068c | 129 | |
cc266a68 | 130 | 14. POSIX does not explicitly specify how sed behaves if no script is |
9ff62da6 KB |
131 | specified. Since the sed Synopsis permits this form of the command, |
132 | and the language in the Description section states that the input | |
133 | is output, it seems reasonable that it behave like the cat(1) | |
134 | command. Historic sed implementations behave differently for "ls | | |
cc266a68 KB |
135 | sed", where they produce no output, and "ls | sed -e#", where they |
136 | behave like cat. This implementation behaves like cat in both cases. | |
9ff62da6 | 137 | |
35a5a157 | 138 | 15. The POSIX requirement to open all w files at the beginning makes |
9ff62da6 KB |
139 | sed behave nonintuitively when the w commands are preceded by |
140 | addresses or are within conditional blocks. This implementation | |
141 | follows historic practice and POSIX, by default, and provides the | |
cc266a68 | 142 | -a option which opens the files only when they are needed. |
9ff62da6 | 143 | |
cc266a68 | 144 | 16. POSIX does not specify how escape sequences other than \n and \D |
9ff62da6 | 145 | (where D is the delimiter character) are to be treated. This is |
cc266a68 KB |
146 | reasonable; however, it also doesn't state that the backslash is
147 | to be discarded from the output regardless. A strict reading of | |
148 | POSIX would be that "echo xyz | sed 's/./\a/'" would display "\ayz". | |
149 | As historic sed implementations always discarded the backslash, | |
150 | this implementation does as well. | |
151 | ||
152 | 17. POSIX specifies that an address can be "empty". This implies | |
153 | that constructs like ",d", "1,d" and ",5d" are allowed. This | |
154 | is not true for historic implementations or this implementation | |
155 | of sed. | |
156 | ||
157 | 18. The b, t and : commands are documented in POSIX to ignore leading | |
9ff62da6 KB |
158 | white space, but no mention is made of trailing white space. |
159 | Historic implementations of sed assigned different locations to | |
160 | the labels "x" and "x ". This is not useful, and leads to subtle | |
cc266a68 | 161 | programming errors, but it is historic practice and changing it |
ac7e13d8 KB |
162 | could theoretically break working scripts. This implementation |
163 | follows historic practice. | |
cc266a68 KB |
164 | |
165 | 19. Although POSIX specifies that reading from files that do not exist | |
9ff62da6 KB |
166 | from within the script must not terminate the script, it does not |
167 | specify what happens if a write command fails. Historic practice | |
cc266a68 KB |
168 | is to fail immediately if the file cannot be opened or written. |
169 | This implementation follows historic practice. | |
86cf068c | 170 | |
cc266a68 | 171 | 20. Historic practice is that the \n construct can be used for either |
9ff62da6 KB |
172 | string1 or string2 of the y command. This is not specified by |
173 | POSIX. This implementation follows historic practice. | |
86cf068c | 174 | |
e74acc0a | 175 | 21. Deleted. |
86cf068c | 176 | |
cc266a68 KB |
177 | 22. Historic implementations of sed ignore the RE delimiter characters |
178 | within character classes. This is not specified in POSIX. This | |
179 | implementation follows historic practice. | |
8ab54be7 KB |
180 | |
181 | 23. Historic implementations handle empty RE's in a special way: the | |
182 | empty RE is interpreted as if it were the last RE encountered, | |
183 | whether in an address or elsewhere. POSIX does not document this | |
184 | behavior. For example the command: | |
185 | ||
186 | sed -e /abc/s//XXX/ | |
187 | ||
188 | substitutes XXX for the pattern abc. The semantics of "the last | |
189 | RE" can be defined in two different ways: | |
190 | ||
191 | 1. The last RE encountered when compiling (lexical/static scope). | |
192 | 2. The last RE encountered while running (dynamic scope). | |
193 | ||
194 | While many historical implementations fail on programs depending | |
195 | on scope differences, the SunOS version exhibited dynamic scope | |
35a5a157 KB |
196 | behaviour. This implementation uses dynamic scoping, as this seems
197 | the most useful choice and remains consistent with historical | |
198 | practice. |
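
The empty-RE behaviour described in comment 23 can be tried from a shell. The following is a minimal sketch, not part of the original notes: the input strings, the `out` variable and the `end` label are illustrative assumptions, and the second command is only a probe whose result depends on the sed being tested.

```shell
# Empty RE reuses the last RE: the address /abc/ supplies the pattern
# for the following s//XXX/, so "abc" is replaced and "def" is untouched.
out=$(printf 'abc\ndef\n' | sed -e '/abc/s//XXX/')
printf '%s\n' "$out"

# Probe for static versus dynamic scope (hypothetical test input).
# On the line "a" the branch skips s/b/X/, so at run time the last RE
# used is /a/, while lexically the last RE before s//Y/p is "b".
# A dynamically scoped sed should print "Y" for the first line; a
# statically scoped one should print nothing.
printf 'a\nb\n' | sed -n -e '/a/b end' -e 's/b/X/' -e ':end' -e 's//Y/p'
```

The first command relies only on the behaviour the notes describe for all historic implementations. The second separates the two definitions of "the last RE" because the branch makes the run-time order of RE evaluation differ from the order in the script text.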