.NH
SPECIAL CHARACTERS
.PP
The editor
.UL ed
is the primary interface to the system
for many people, so
it is worthwhile to know
how to get the most out of
.UL ed
for the least effort.
.PP
The next few sections will discuss
shortcuts
and labor-saving devices.
Not all of these will be instantly useful
to any one person, of course,
but a few will be,
and the others should give you ideas to store
away for future use.
And as always,
until you try these things,
they will remain theoretical knowledge,
not something you have confidence in.
.SH
The List command `l'
.PP
.UL ed
provides two commands for printing the contents of the lines
you're editing.
Most people are familiar with
.UL p ,
in combinations like
.P1
1,$p
.P2
to print all the lines you're editing,
or
.P1
s/abc/def/p
.P2
to change
`abc'
to
`def'
on the current line and print the result.
Less familiar is the
.ul
list
command
.UL l
(the letter `\fIl\|\fR'),
which gives slightly more information than
.UL p .
In particular,
.UL l
makes visible characters that are normally invisible,
such as tabs and backspaces.
If you list a line that contains some of these,
.UL l
will print each tab as
.UL \z\(mi>
and each backspace as
.UL \z\(mi< .
This makes it much easier to correct the sort of typing mistake
that inserts extra spaces adjacent to tabs,
or inserts a backspace followed by a space.
.PP
The
.UL l
command
also `folds' long lines for printing:
any line that exceeds 72 characters is printed on multiple lines;
each printed line except the last is terminated by a backslash
.UL \*e ,
so you can tell it was folded.
This is useful for printing long lines on short terminals.
.PP
Occasionally the
.UL l
command will print in a line a string of numbers preceded by a backslash,
such as \*e07 or \*e16.
These combinations are used to make visible characters that normally don't print,
like form feed or vertical tab or bell.
Each such combination is a single character.
When you see such characters, be wary:
they may have surprising meanings when printed on some terminals.
Often their presence means that your finger slipped while you were typing;
you almost never want them.
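.PP
The behavior described above can be imitated in a few lines of Python.
This is only a rough sketch, not the editor's own code:
the name `list_line', the plain two-character stand-ins for the
overstruck tab and backspace marks, and the two-digit octal form
are all assumptions made for the illustration.

```python
def list_line(line, width=72):
    """Render one line roughly the way the l command is described above."""
    out = []
    for ch in line:
        if ch == "\t":
            out.append("->")                 # stand-in for the overstruck tab mark
        elif ch == "\b":
            out.append("-<")                 # stand-in for the overstruck backspace mark
        elif ch < " " or ch == "\x7f":
            out.append("\\%02o" % ord(ch))   # other non-printing characters, in octal
        else:
            out.append(ch)
    text = "".join(out)
    # Fold long output: every piece except the last ends with a backslash.
    pieces = [text[i:i + width] for i in range(0, len(text), width)] or [""]
    return "\\\n".join(pieces)


print(list_line("th\x07is"))   # the bell character becomes visible as \07
```

The sketch folds purely by length, so it may split an octal escape
across two pieces; a real implementation would presumably be more careful.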
.SH
The Substitute Command `s'
.PP
Most of the next few sections will be taken up with a discussion
of the
substitute
command
.UL s .
Since this is the command for changing the contents of individual
lines,
it probably has the most complexity of any
.UL ed
command,
and the most potential for effective use.
.PP
As the simplest place to begin,
recall the meaning of a trailing
.UL g
after a substitute command.
With
.P1
s/this/that/
.P2
and
.P1
s/this/that/g
.P2
the
first form
replaces only the
.ul
first
`this' on the line
with `that'.
If there is more than one `this' on the line,
the second form
with the trailing
.UL g
changes
.ul
all
of them.
.PP
Either form of the
.UL s
command can be followed by
.UL p
or
.UL l
to `print' or `list' (as described in the previous section)
the contents of the line:
.P1
s/this/that/p
s/this/that/l
s/this/that/gp
s/this/that/gl
.P2
are all legal, and mean slightly different things.
Make sure you know what the differences are.
.PP
Of course, any
.UL s
command can be preceded by one or two `line numbers'
to specify that the substitution is to take place
on a group of lines.
Thus
.P1
1,$s/mispell/misspell/
.P2
changes the
.ul
first
occurrence of
`mispell' to `misspell' on every line of the file.
But
.P1
1,$s/mispell/misspell/g
.P2
changes
.ul
every
occurrence in every line
(and this is more likely to be what you wanted in this
particular case).
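.PP
The difference between the two forms can be tried outside the editor.
The sketch below uses Python's `re' module as a stand-in;
it is an analogy only, and the variable names are invented for the example.

```python
import re

line = "this and this"

# Like s/this/that/ : only the first occurrence on the line changes.
first_only = re.sub("this", "that", line, count=1)

# Like s/this/that/g : every occurrence on the line changes.
every = re.sub("this", "that", line)

print(first_only)   # that and this
print(every)        # that and that
```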
.PP
You should also notice that if you add a
.UL p
or
.UL l
to the end of any of these substitute commands,
only the last line that got changed will be printed,
not all the lines.
We will talk later about how to print all the lines
that were modified.
.SH
The Undo Command `u'
.PP
Occasionally you will make a substitution in a line,
only to realize too late that it was a ghastly mistake.
The `undo' command
.UL u
lets you `undo' the last substitution:
the last line that was substituted can be restored to
its previous state by typing the command
.P1
u
.P2
.SH
The Metacharacter `\*.'
.PP
As you have undoubtedly noticed
when you use
.UL ed ,
certain characters have unexpected meanings
when they occur in the left side of a substitute command,
or in a search for a particular line.
In the next several sections, we will talk about
these special characters,
which are often called `metacharacters'.
.PP
The first one is the period `\*.'.
On the left side of a substitute command,
or in a search with `/.../',
`\*.' stands for
.ul
any
single character.
Thus the search
.P1
/x\*.y/
.P2
finds any line where `x' and `y' occur separated by
a single character, as in
.P1
x+y
x\-y
x\*By
x\*.y
.P2
and so on.
(We will use \*B to stand for a space whenever we need to
make it visible.)
.PP
Since `\*.' matches a single character,
that gives you a way to deal with funny characters
printed by
.UL l .
Suppose you have a line that, when printed with the
.UL l
command, appears as
.P1
 .... th\*e07is ....
.P2
and you want to get rid of the
\*e07
(which represents the bell character, by the way).
.PP
The most obvious solution is to try
.P1
s/\*e07//
.P2
but this will fail. (Try it.)
The brute force solution, which most people would now take,
is to re-type the entire line.
This is guaranteed to work, and is actually quite a reasonable tactic
if the line in question isn't too big,
but for a very long line,
re-typing is a bore.
This is where the metacharacter `\*.' comes in handy.
Since `\*e07' really represents a single character,
if we say
.P1
s/th\*.is/this/
.P2
the job is done.
The `\*.' matches the mysterious character between the `h' and the `i',
.ul
whatever it is.
.PP
Bear in mind that since `\*.' matches any single character,
the command
.P1
s/\*./,/
.P2
converts the first character on a line into a `,',
which very often is not what you intended.
.PP
As is true of many characters in
.UL ed ,
the `\*.' has several meanings, depending
on its context.
This line shows all three:
.P1
\&\*.s/\*./\*./
.P2
The first `\*.' is a line number,
the number of
the line we are editing,
which is called `line dot'.
(We will discuss line dot more in Section 3.)
The second `\*.' is a metacharacter
that matches any single character on that line.
The third `\*.' is the only one that really is
an honest literal period.
On the
.ul
right
side of a substitution, `\*.'
is not special.
If you apply this command to the line
.P1
Now is the time\*.
.P2
the result will
be
.P1
\&\*.ow is the time\*.
.P2
which is probably not what you intended.
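.PP
Both effects of the period can be reproduced with Python's `re' module,
used here only as an approximation of the editor's pattern matching.
The variable names are made up for the sketch.

```python
import re

# Like s/th.is/this/ : the dot matches the bell character, whatever it is.
fixed = re.sub(r"th.is", "this", "th\x07is", count=1)

# Like s/././ : on the left the dot matches the first character of the line;
# on the right it is an ordinary period.
surprise = re.sub(r".", ".", "Now is the time.", count=1)

print(fixed)      # this
print(surprise)   # .ow is the time.
```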
.SH
The Backslash `\*e'
.PP
Since a period means `any character',
the question naturally arises of what to do
when you really want a period.
For example, how do you convert the line
.P1
Now is the time\*.
.P2
into
.P1
Now is the time?
.P2
The backslash `\*e' does the job.
A backslash turns off any special meaning that the next character
might have; in particular,
`\*e\*.' converts the `\*.' from a `match anything'
into a period, so
you can use it to replace
the period in
.P1
Now is the time\*.
.P2
like this:
.P1
s/\*e\*./?/
.P2
The pair of characters `\*e\*.' is considered by
.UL ed
to be a single real period.
.PP
The backslash can also be used when searching for lines
that contain a special character.
Suppose you are looking for a line that contains
.P1
\&\*.PP
.P2
The search
.P1
/\*.PP/
.P2
isn't adequate, for it will find
a line like
.P1
THE APPLICATION OF ...
.P2
because the `\*.' matches the letter `A'.
But if you say
.P1
/\*e\*.PP/
.P2
you will find only lines that contain `\*.PP'.
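.PP
The two searches can be compared on a small sample.
The following is a sketch in Python, whose `re' module treats an
unescaped and an escaped period in the same way for this purpose;
the sample lines and names are invented.

```python
import re

lines = ["THE APPLICATION OF ...", ".PP", "x.PPy"]

# Like /.PP/ : the dot matches any character, so `APP' qualifies too.
loose = [s for s in lines if re.search(r".PP", s)]

# Like /\.PP/ : only a real period followed by PP qualifies.
exact = [s for s in lines if re.search(r"\.PP", s)]

print(loose)   # all three lines
print(exact)   # only the two lines that contain a literal .PP
```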
.PP
The backslash can also be used to turn off special meanings for
characters other than `\*.'.
For example, consider finding a line that contains a backslash.
The search
.P1
/\*e/
.P2
won't work,
because the `\*e' isn't a literal `\*e', but instead means that the second `/'
no longer \%delimits the search.
But by preceding a backslash with another one,
you can search for a literal backslash.
Thus
.P1
/\*e\*e/
.P2
does work.
Similarly, you can search for a forward slash `/' with
.P1
/\*e//
.P2
The backslash turns off the meaning of the immediately following `/' so that
it doesn't terminate the /.../ construction prematurely.
.PP
As an exercise, before reading further, find two substitute commands each of which will
convert the line
.P1
\*ex\*e\*.\*ey
.P2
into the line
.P1
\*ex\*ey
.P2
.PP
Here are several solutions;
verify that each works as advertised.
.P1
s/\*e\*e\*e\*.//
s/x\*.\*./x/
s/\*.\*.y/y/
.P2
.PP
Here are a couple of miscellaneous notes about
backslashes and special characters.
First, you can use any character to delimit the pieces
of an
.UL s
command: there is nothing sacred about slashes.
(But you must use slashes for context searching.)
For instance, in a line that contains a lot of slashes already, like
.P1
//exec //sys.fort.go // etc...
.P2
you could use a colon as the delimiter;
to delete all the slashes, type
.P1
s:/::g
.P2
.PP
Second, if # and @ are your character erase and line kill characters,
you have to type \*e# and \*e@ to get a literal # or @;
this is true whether you're talking to
.UL ed
or any other program.
.PP
When you are adding text with
.UL a
or
.UL i
or
.UL c ,
backslash is not special, and you should only put in
one backslash for each one you really want.
.SH
The Dollar Sign `$'
.PP
The next metacharacter, the `$', stands for `the end of the line'.
As its most obvious use, suppose you have the line
.P1
Now is the
.P2
and you wish to add the word `time' to the end.
Use the $ like this:
.P1
s/$/\*Btime/
.P2
to get
.P1
Now is the time
.P2
Notice that a space is needed before `time' in
the substitute command,
or you will get
.P1
Now is thetime
.P2
.PP
As another example, replace the second comma in
the following line with a period without altering the first:
.P1
Now is the time, for all good men,
.P2
The command needed is
.P1
s/,$/\*./
.P2
The $ here supplies the context that specifies which comma we mean.
Without it, of course, the
.UL s
command would operate on the first comma to produce
.P1
Now is the time\*. for all good men,
.P2
.PP
As another example, to convert
.P1
Now is the time\*.
.P2
into
.P1
Now is the time?
.P2
as we did earlier, we can use
.P1
s/\*.$/?/
.P2
.PP
Like `\*.', the `$'
has multiple meanings depending on context.
In the line
.P1
$s/$/$/
.P2
the first `$' refers to the
last line of the file,
the second refers to the end of that line,
and the third is a literal dollar sign,
to be added to that line.
.SH
The Circumflex `^'
.PP
The circumflex (or hat or caret)
`^' stands for the beginning of the line.
For example, suppose you are looking for a line that begins
with `the'.
If you simply say
.P1
/the/
.P2
you will in all likelihood find several lines that contain `the' in the middle before
arriving at the one you want.
But with
.P1
/^the/
.P2
you narrow the context, and thus arrive at the desired one
more easily.
.PP
The other use of `^' is of course to enable you to insert
something at the beginning of a line:
.P1
s/^/\*B/
.P2
places a space at the beginning of the current line.
.PP
Metacharacters can be combined. To search for a
line that contains
.ul
only
the characters
.P1
\&\*.PP
.P2
you can use the command
.P1
/^\*e\*.PP$/
.P2
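.PP
The anchoring examples of the last two sections can be checked
with a short Python sketch.
Python's `re' module stands in for the editor here, and the variable
names and sample lines are assumptions made for the illustration.

```python
import re

# Like s/,$/./ : only the comma at the end of the line is changed.
last_comma = re.sub(r",$", ".", "Now is the time, for all good men,", count=1)

# Like s/$/ time/ : text is added at the end of the line.
appended = re.sub(r"$", " time", "Now is the", count=1)

# Like s/^/ / : a space is placed at the beginning of the line.
indented = re.sub(r"^", " ", "Now is the time", count=1)

# Like /^\.PP$/ : only a line consisting of exactly .PP is found.
only_pp = [s for s in [".PP", ".PP now", "x.PP"] if re.search(r"^\.PP$", s)]

print(last_comma)   # Now is the time, for all good men.
print(appended)     # Now is the time
print(only_pp)
```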
.SH
The Star `*'
.PP
Suppose you have a line that looks like this:
.P1
\fItext \fR x          y \fI text \fR
.P2
where
.ul
text
stands
for lots of text,
and there is an indeterminate number of spaces between the
.UL x
and the
.UL y .
Suppose the job is to replace all the spaces between
.UL x
and
.UL y
by a single space.
The line is too long to retype, and there are too many spaces
to count.
What now?
.PP
This is where the metacharacter `*'
comes in handy.
A character followed by a star
stands for as many consecutive occurrences of that
character as possible.
To refer to all the spaces at once, say
.P1
s/x\*B*y/x\*By/
.P2
The construction
`\*B*'
means
`as many spaces as possible'.
Thus `x\*B*y' means `an x, as many spaces as possible, then a y'.
.PP
The star can be used with any character, not just space.
If the original example was instead
.P1
\fItext \fR x--------y \fI text \fR
.P2
then all `\-' signs can be replaced by a single space
with the command
.P1
s/x-*y/x\*By/
.P2
.PP
Finally, suppose that the line was
.P1
\fItext \fR x\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.y \fI text \fR
.P2
Can you see what trap lies in wait for the unwary?
If you blindly type
.P1
s/x\*.*y/x\*By/
.P2
what will happen?
The answer, naturally, is that it depends.
If there are no other x's or y's on the line,
then everything works, but it's blind luck, not good management.
Remember that `\*.' matches
.ul
any
single character?
Then `\*.*' matches as many single characters as possible,
and unless you're careful, it can eat up a lot more of the line
than you expected.
If the line was, for example, like this:
.P1
\fItext \fRx\fI text \fR x\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.y \fI text \fRy\fI text \fR
.P2
then saying
.P1
s/x\*.*y/x\*By/
.P2
will take everything from the
.ul
first
`x' to the
.ul
last
`y',
which, in this example, is undoubtedly more than you wanted.
.PP
The solution, of course, is to turn off the special meaning of
`\*.' with
`\*e\*.':
.P1
s/x\*e\*.*y/x\*By/
.P2
Now everything works, for `\*e\*.*' means `as many
.ul
periods
as possible'.
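.PP
The trap is easy to see on a short line.
The sketch below uses Python's `re' module, whose star is also
`as many as possible' in this respect;
the sample line and the variable names are invented for the example.

```python
import re

line = "a x b x....y c y d"

# Like s/x.*y/x y/ : runs from the first x to the last y.
greedy = re.sub(r"x.*y", "x y", line, count=1)

# Like s/x\.*y/x y/ : only an x, periods, and a y can match.
periods = re.sub(r"x\.*y", "x y", line, count=1)

print(greedy)    # a x y d
print(periods)   # a x b x y c y d
```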
.PP
There are times when the pattern `\*.*' is exactly what you want.
For example, to change
.P1
Now is the time for all good men ....
.P2
into
.P1
Now is the time\*.
.P2
use `\*.*' to eat up the `for' and everything after it:
.P1
s/\*Bfor\*.*/\*./
.P2
.PP
There are a couple of additional pitfalls associated with `*' that you should be aware of.
Most notable is the fact that `as many as possible' means
.ul
zero
or more.
The fact that zero is a legitimate possibility is
sometimes rather surprising.
For example, if our line contained
.P1
\fItext \fR xy \fI text \fR x          y \fI text \fR
.P2
and we said
.P1
s/x\*B*y/x\*By/
.P2
the
.ul
first
`xy' matches this pattern, for it consists of an `x',
zero spaces, and a `y'.
The result is that the substitute acts on the first `xy',
and does not touch the later one that actually contains some intervening spaces.
.PP
The way around this, if it matters, is to specify a pattern like
.P1
/x\*B\*B*y/
.P2
which says `an x, a space, then as many more spaces as possible, then a y',
in other words, one or more spaces.
.PP
The other startling behavior of `*' is again related to the fact
that zero is a legitimate number of occurrences of something
followed by a star. The command
.P1
s/x*/y/g
.P2
when applied to the line
.P1
abcdef
.P2
produces
.P1
yaybycydyeyfy
.P2
which is almost certainly not what was intended.
The reason for this behavior is that zero is a legal number
of matches,
and there are no x's at the beginning of the line
(so that gets converted into a `y'),
nor between the `a' and the `b'
(so that gets converted into a `y'), nor ...
and so on.
Make sure you really want zero matches;
if not, in this case write
.P1
s/xx*/y/g
.P2
`xx*' is one or more x's.
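.PP
Both zero-match surprises can be reproduced in a few lines.
This is a sketch in Python, used as a stand-in for the editor;
the sample line and names are assumptions, and the spaces inside
the patterns are ordinary blanks.

```python
import re

line = "a xy b x   y c"

# Like s/x *y/x y/ : zero spaces is a match, so the first xy is changed.
zero_ok = re.sub(r"x *y", "x y", line, count=1)

# Like s/x  *y/x y/ : one or more spaces, so the later pair is changed.
one_plus = re.sub(r"x  *y", "x y", line, count=1)

# Like s/x*/y/g on abcdef : a y appears at every position.
everywhere = re.sub(r"x*", "y", "abcdef")

# Like s/xx*/y/g : one or more x's, so nothing happens.
untouched = re.sub(r"xx*", "y", "abcdef")

print(zero_ok)      # a x y b x   y c
print(one_plus)     # a xy b x y c
print(everywhere)   # yaybycydyeyfy
```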
.SH
The Brackets `[ ]'
.PP
Suppose that you want to delete any numbers
that appear
at the beginning of all lines of a file.
You might first think of trying a series of commands like
.P1
1,$s/^1*//
1,$s/^2*//
1,$s/^3*//
.P2
and so on,
but this is clearly going to take forever if the numbers are at all long.
Unless you want to repeat the commands over and over until
finally all numbers are gone,
you must get all the digits on one pass.
This is the purpose of the brackets [ and ].
.PP
The construction
.P1
[0123456789]
.P2
matches any single digit;
the whole thing is called a `character class'.
With a character class, the job is easy.
The pattern `[0123456789]*' matches zero or more digits (an entire number), so
.P1
1,$s/^[0123456789]*//
.P2
deletes all digits from the beginning of all lines.
.PP
Any characters can appear within a character class,
and just to confuse the issue there are essentially no special characters
inside the brackets;
even the backslash doesn't have a special meaning.
To search for special characters, for example, you can say
.P1
/[\*.\*e$^[]/
.P2
Within [...], the `[' is not special.
To get a `]' into a character class,
make it the first character.
.PP
It's a nuisance to have to spell out the digits,
so you can abbreviate them as
[0\-9];
similarly, [a\-z] stands for the lower case letters,
and
[A\-Z] for upper case.
.PP
As a final frill on character classes, you can specify a class
that means `none of the following characters'.
This is done by beginning the class with a `^':
.P1
[^0-9]
.P2
stands for `any character
.ul
except
a digit'.
Thus you might find the first line that doesn't begin with a tab or space
by a search like
.P1
/^[^(space)(tab)]/
.P2
.PP
Within a character class,
the circumflex has a special meaning
only if it occurs at the beginning.
Just to convince yourself, verify that
.P1
/^[^^]/
.P2
finds a line that doesn't begin with a circumflex.
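.PP
Character classes behave much the same way in Python's `re' module,
so the examples above can be tried there.
One difference to keep in mind is that in Python the backslash
remains special inside brackets, which is why the sketch can write a tab as \et.
The sample lines and names are invented for the illustration.

```python
import re

# Like 1,$s/^[0-9]*// : leading digits are removed from every line.
stripped = [re.sub(r"^[0-9]*", "", s, count=1) for s in ["123 abc", "7up", "none"]]

# Like /^[^(space)(tab)]/ : lines that do not begin with a space or tab.
flush = [s for s in [" indented", "\ttabbed", "flush"] if re.search(r"^[^ \t]", s)]

# Like /^[^^]/ : lines that do not begin with a circumflex.
no_hat = [s for s in ["^x", "x^"] if re.search(r"^[^^]", s)]

print(stripped)
print(flush)
print(no_hat)
```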
.SH
The Ampersand `&'
.PP
The ampersand `&' is used primarily to save typing.
Suppose you have the line
.P1
Now is the time
.P2
and you want to make it
.P1
Now is the best time
.P2
Of course you can always say
.P1
s/the/the best/
.P2
but it seems silly to have to repeat the `the'.
The `&' is used to eliminate the repetition.
On the
.ul
right
side of a substitute, the ampersand means `whatever
was just matched', so you can say
.P1
s/the/& best/
.P2
and the `&' will stand for `the'.
Of course this isn't much of a saving if the thing
matched is just `the', but if it is something truly long or awful,
or if it is something like `\*.*'
which matches a lot of text,
you can save some tedious typing.
There is also much less chance of making a typing error
in the replacement text.
For example, to parenthesize a line,
regardless of its length, say
.P1
s/\*.*/(&)/
.P2
.PP
The ampersand can occur more than once on the right side:
.P1
s/the/& best and & worst/
.P2
makes
.P1
Now is the best and the worst time
.P2
and
.P1
s/\*.*/&? &!!/
.P2
converts the original line into
.P1
Now is the time? Now is the time!!
.P2
.PP
To get a literal ampersand, naturally the backslash is used to turn off the special meaning:
.P1
s/ampersand/\*e&/
.P2
converts the word into the symbol.
Notice that `&' is not special on the left side
of a substitute, only on the
.ul
right
side.
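.PP
Python's `re' module has the same idea under a different spelling:
in a replacement the whole match is written \eg<0> rather than `&',
and a plain `&' is an ordinary character.
With that substitution assumed, the examples above can be sketched as follows;
the variable names are made up.

```python
import re

line = "Now is the time"

# Like s/the/& best/
best = re.sub(r"the", r"\g<0> best", line, count=1)

# Like s/.*/(&)/
parens = re.sub(r".*", r"(\g<0>)", line, count=1)

# Like s/.*/&? &!!/
twice = re.sub(r".*", r"\g<0>? \g<0>!!", line, count=1)

# Like s/ampersand/\&/ : in Python no backslash is needed for a literal &.
symbol = re.sub(r"ampersand", "&", "ampersand", count=1)

print(best)     # Now is the best time
print(parens)   # (Now is the time)
print(twice)    # Now is the time? Now is the time!!
```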
.SH
Substituting Newlines
.PP
.UL ed
provides a facility for splitting a single line into two or more shorter lines by `substituting in a newline'.
As the simplest example, suppose a line has gotten unmanageably long
because of editing (or merely because it was unwisely typed).
If it looks like
.P1
\fItext \fR xy \fI text \fR
.P2
you can break it between the `x' and the `y' like this:
.P1
s/xy/x\*e
y/
.P2
This is actually a single command,
although it is typed on two lines.
Bearing in mind that `\*e' turns off special meanings,
it seems relatively intuitive that a `\*e' at the end of
a line would make the newline there
no longer special.
.PP
You can in fact make a single line into several lines
with this same mechanism.
As a large example, consider underlining the word `very'
in a long line
by splitting `very' onto a separate line,
and preceding it by the
.UL roff
or
.UL nroff
formatting command `.ul'.
Suppose the line is
.P1
\fItext \fR a very big \fI text \fR
.P2
The command
.P1
s/\*Bvery\*B/\*e
\&.ul\*e
very\*e
/
.P2
converts the line into four shorter lines,
preceding the word `very' by the
line
`.ul',
and eliminating the spaces around the `very',
all at the same time.
.PP
When a newline is substituted
in, dot is left pointing at the last line created.
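.PP
The effect of the larger example can be pictured with a short Python sketch.
It only models the result on a single string, not the editor's buffer or
the position of dot, and the names are invented for the illustration.

```python
import re

line = "text a very big text"

# Like the four-line s command above: the spaces around `very' are
# replaced by newlines, and a .ul line is placed in front of the word.
pieces = re.sub(r" very ", "\n.ul\nvery\n", line, count=1).split("\n")

for piece in pieces:
    print(piece)
```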
.SH
Joining Lines
.PP
Lines may also be joined together,
but this is done with the
.UL j
command
instead of
.UL s .
Given the lines
.P1
Now is
\*Bthe time
.P2
and supposing that dot is set to the first of them,
then the command
.P1
j
.P2
joins them together.
No blanks are added,
which is why we carefully showed a blank
at the beginning of the second line.
.PP
All by itself,
a
.UL j
command
joins line dot to line dot+1,
but any contiguous set of lines can be joined.
Just specify the starting and ending line numbers.
For example,
.P1
1,$jp
.P2
joins all the lines into one big one
and prints it.
(More on line numbers in Section 3.)
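.PP
Joining can be modeled as plain concatenation of a range of lines.
The function below is a hypothetical sketch in Python, with the buffer
held as a list and line numbers counted from 1 as in the editor;
it is not how the editor itself is written.

```python
def join_lines(buf, first, last):
    """Join lines first..last (1-based, inclusive), adding no blanks."""
    i, j = first - 1, last
    return buf[:i] + ["".join(buf[i:j])] + buf[j:]


buf = ["Now is", " the time", "for all"]

print(join_lines(buf, 1, 2))   # the first two lines become one
print(join_lines(buf, 1, 3))   # like 1,$j : everything becomes one line
```

Because no blank is supplied, the last call runs `time' and `for' together.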
.SH
Rearranging a Line with \*e( ... \*e)
.PP
(This section should be skipped on first reading.)
Recall that `&' is a shorthand that stands for whatever
was matched by the left side of an
.UL s
command.
In much the same way you can capture separate pieces
of what was matched;
the only difference is that you have to specify
on the left side just what pieces you're interested in.
.PP
Suppose, for instance, that
you have a file of lines that consist of names in the form
.P1
Smith, A. B.
Jones, C.
.P2
and so on,
and you want the initials to precede the name, as in
.P1
A. B. Smith
C. Jones
.P2
It is possible to do this with a series of editing commands,
but it is tedious and error-prone.
(It is instructive to figure out how it is done, though.)
.PP
The alternative
is to `tag' the pieces of the pattern (in this case,
the last name, and the initials),
and then rearrange the pieces.
On the left side of a substitution,
if part of the pattern is enclosed between
\*e( and \*e),
whatever matched that part is remembered,
and available for use on the right side.
On the right side,
the symbol `\*e1' refers to whatever
matched the first \*e(...\*e) pair,
`\*e2' to the second \*e(...\*e),
and so on.
.PP
The command
.P1
1,$s/^\*e([^,]*\*e),\*B*\*e(\*.*\*e)/\*e2\*B\*e1/
.P2
although hard to read, does the job.
The first \*e(...\*e) matches the last name,
which is any string up to the comma;
this is referred to on the right side with `\*e1'.
The second \*e(...\*e) is whatever follows
the comma and any spaces,
and is referred to as `\*e2'.
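.PP
The same rearrangement can be tried in Python, with one change of spelling:
in Python's `re' module the tagging parentheses are written without backslashes.
The rest of the pattern carries over as it stands;
the list of names is the sample from above and the variable names are invented.

```python
import re

names = ["Smith, A. B.", "Jones, C."]

# Group 1 is everything up to the comma; group 2 is whatever follows
# the comma and any spaces. The replacement swaps them.
swapped = [re.sub(r"^([^,]*), *(.*)", r"\2 \1", n, count=1) for n in names]

for name in swapped:
    print(name)
```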
.PP
Of course, with any editing sequence this complicated,
it's foolhardy to simply run it and hope.
The global commands
.UL g
and
.UL v
discussed in Section 4
provide a way for you to print exactly those
lines which were affected by the
substitute command,
and thus verify that it did what you wanted
in all cases.