.NH
SPECIAL CHARACTERS
.PP
The editor
.UL ed
is the primary interface to the system
for many people, so
it is worthwhile to know
how to get the most out of
.UL ed
for the least effort.
.PP
The next few sections will discuss
shortcuts
and labor-saving devices.
Not all of these will be instantly useful
to any one person, of course,
but a few will be,
and the others should give you ideas to store
away for future use.
And as always,
until you try these things,
they will remain theoretical knowledge,
not something you have confidence in.
.SH
The List Command `l'
.PP
.UL ed
provides two commands for printing the contents of the lines
you're editing.
Most people are familiar with
.UL p ,
in combinations like
.P1
1,$p
.P2
to print all the lines you're editing,
or
.P1
s/abc/def/p
.P2
to change
`abc'
to
`def'
on the current line and print the result.
Less familiar is the
.ul
list
command
.UL l
(the letter `\fIl\|\fR'),
which gives slightly more information than
.UL p .
In particular,
.UL l
makes visible characters that are normally invisible,
such as tabs and backspaces.
If you list a line that contains some of these,
.UL l
will print each tab as
.UL \z\(mi>
and each backspace as
.UL \z\(mi< .
This makes it much easier to correct the sort of typing mistake
that inserts extra spaces adjacent to tabs,
or inserts a backspace followed by a space.
.PP
The
.UL l
command
also `folds' long lines for printing:
any line that exceeds 72 characters is printed on multiple lines,
and each printed line except the last is terminated by a backslash
.UL \*e ,
so you can tell it was folded.
This is useful for printing long lines on narrow terminals.
.PP
Occasionally the
.UL l
command will print a backslash followed by a string of digits,
such as \*e07 or \*e16.
Each such combination is the octal code of a single character
that normally doesn't print,
like form feed or vertical tab or bell.
When you see such characters, be wary:
they may have surprising meanings when printed on some terminals.
Often their presence means that your finger slipped while you were typing;
you almost never want them.
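.PP
To make these rules concrete, here is a rough model of
.UL l
in Python.
It is only a sketch, not how
.UL ed
is written:
the function name list_line is hypothetical,
the ASCII pairs `->' and `-<' stand in for the overstruck tab and backspace symbols,
octal codes are padded to two digits to match the examples above,
and the fold width of 72 counts only the text before the trailing backslash.

```python
def list_line(line: str, width: int = 72) -> list[str]:
    """Hypothetical model of how `l' displays one line of text.

    Tabs and backspaces become ASCII stand-ins, other non-printing
    characters become a backslash and an octal code, and the result is
    folded into rows holding at most `width` characters of text, every
    row except the last ending in a backslash.
    """
    tokens = []
    for ch in line:
        if ch == "\t":
            tokens.append("->")                 # stand-in for the tab symbol
        elif ch == "\b":
            tokens.append("-<")                 # stand-in for the backspace symbol
        elif not ch.isprintable():
            tokens.append("\\%02o" % ord(ch))   # bell (7) -> \07, SO (14) -> \16
        else:
            tokens.append(ch)

    rows, current = [], ""
    for tok in tokens:
        # Start a new row instead of splitting an escape across two rows.
        if current and len(current) + len(tok) > width:
            rows.append(current + "\\")         # the backslash marks a fold
            current = ""
        current += tok
    rows.append(current)
    return rows


for row in list_line("a\tb\bc th\x07is"):
    print(row)                                  # a->b-<c th\07is
```

Called on a line of 100 x's, list_line returns two rows:
72 x's followed by a backslash, then the remaining 28.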
.SH
The Substitute Command `s'
.PP
Most of the next few sections will be taken up with a discussion
of the
substitute
command
.UL s .
Since this is the command for changing the contents of individual
lines,
it probably has the most complexity of any
.UL ed
command,
and the most potential for effective use.
.PP
As the simplest place to begin,
recall the meaning of a trailing
.UL g
after a substitute command.
With
.P1
s/this/that/
.P2
and
.P1
s/this/that/g
.P2
the
first
one replaces the
.ul
first
`this' on the line
with `that'.
If there is more than one `this' on the line,
the second form
with the trailing
.UL g
changes
.ul
all
of them.
.PP
Either form of the
.UL s
command can be followed by
.UL p
or
.UL l
to `print' or `list' (as described in the previous section)
the contents of the line:
.P1
s/this/that/p
s/this/that/l
s/this/that/gp
s/this/that/gl
.P2
are all legal, and mean slightly different things.
Make sure you know what the differences are.
.PP
Of course, any
.UL s
command can be preceded by one or two `line numbers'
to specify that the substitution is to take place
on a group of lines.
Thus
.P1
1,$s/mispell/misspell/
.P2
changes the
.ul
first
occurrence of
`mispell' to `misspell' on every line of the file.
But
.P1
1,$s/mispell/misspell/g
.P2
changes
.ul
every
occurrence in every line
(and this is more likely to be what you wanted in this
particular case).
.PP
You should also notice that if you add a
.UL p
or
.UL l
to the end of any of these substitute commands,
only the last line that got changed will be printed,
not all the lines.
We will talk later about how to print all the lines
that were modified.
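.PP
Here is a rough Python model of how a range, a trailing
.UL g
and a trailing
.UL p
fit together.
The function substitute is hypothetical, lines are numbered from 1 as in
.UL ed ,
and the pattern is a Python regular expression, which for plain words like these matches exactly what
.UL ed
would.
A count of 1 in re.subn plays the part of a plain
.UL s ,
and 0 (`no limit') the part of a trailing
.UL g .

```python
import re


def substitute(lines, first, last, pattern, replacement, global_flag=False):
    """Hypothetical model of `first,last s/pattern/replacement/' (with `g'
    when global_flag is true).  Lines are numbered from 1, as in ed.

    Returns the number of the last line that changed, or None if nothing
    matched; a trailing p or l would print only that line.
    """
    count = 0 if global_flag else 1    # re.subn: 0 means "every match"
    last_changed = None
    for n in range(first, last + 1):
        new, hits = re.subn(pattern, replacement, lines[n - 1], count=count)
        if hits:
            lines[n - 1] = new
            last_changed = n
    return last_changed


text = ["a mispell and a mispell", "no typo here", "mispell again"]
print(substitute(text, 1, len(text), "mispell", "misspell"))   # 3
print(text[0])                        # a misspell and a mispell
```

Passing global_flag=True on the same input would also fix the second `mispell' on the first line.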
.SH
The Undo Command `u'
.PP
Occasionally you will make a substitution in a line,
only to realize too late that it was a ghastly mistake.
The `undo' command
.UL u
lets you `undo' the last substitution:
the last line that was substituted can be restored to
its previous state by typing the command
.P1
u
.P2
.SH
The Metacharacter `\*.'
.PP
As you have undoubtedly noticed
when you use
.UL ed ,
certain characters have unexpected meanings
when they occur in the left side of a substitute command,
or in a search for a particular line.
In the next several sections, we will talk about
these special characters,
which are often called `metacharacters'.
.PP
The first one is the period `\*.'.
On the left side of a substitute command,
or in a search with `/.../',
`\*.' stands for
.ul
any
single character.
Thus the search
.P1
/x\*.y/
.P2
finds any line where `x' and `y' occur separated by
a single character, as in
.P1
x+y
x\-y
x\*By
x\*.y
.P2
and so on.
(We will use \*B to stand for a space whenever we need to
make it visible.)
.PP
Since `\*.' matches a single character,
that gives you a way to deal with funny characters
printed by
.UL l .
Suppose you have a line that, when printed with the
.UL l
command, appears as
.P1
\&.... th\*e07is ....
.P2
and you want to get rid of the
\*e07
(which represents the bell character, by the way).
.PP
The most obvious solution is to try
.P1
s/\*e07//
.P2
but this will fail. (Try it.)
The brute force solution, which most people would now take,
is to re-type the entire line.
This is guaranteed to work, and is actually quite a reasonable tactic
if the line in question isn't too big,
but for a very long line,
re-typing is a bore.
This is where the metacharacter `\*.' comes in handy.
Since `\*e07' really represents a single character,
if we say
.P1
s/th\*.is/this/
.P2
the job is done.
The `\*.' matches the mysterious character between the `h' and the `i',
.ul
whatever it is.
.PP
Bear in mind that since `\*.' matches any single character,
the command
.P1
s/\*./,/
.P2
converts the first character on a line into a `,',
which very often is not what you intended.
.PP
As is true of many characters in
.UL ed ,
the `\*.' has several meanings, depending
on its context.
This line shows all three:
.P1
\&\*.s/\*./\*./
.P2
The first `\*.' is a line number,
the number of
the line we are editing,
which is called `line dot'.
(We will discuss line dot more in Section 3.)
The second `\*.' is a metacharacter
that matches any single character on that line.
The third `\*.' is the only one that really is
an honest literal period.
On the
.ul
right
side of a substitution, `\*.'
is not special.
If you apply this command to the line
.P1
Now is the time\*.
.P2
the result will
be
.P1
\&\*.ow is the time\*.
.P2
which is probably not what you intended.
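.PP
Python's re module gives `\*.' the same meaning (any single character other than a newline, and a line of text never contains one),
so the examples in this section can be checked directly.
The sample strings are made up for illustration,
and re.sub with count=1 stands in for an
.UL s
command without a trailing
.UL g .

```python
import re

# /x.y/ : an x, any one character, then a y.
samples = ["x+y", "x-y", "x y", "x.y", "xy", "x--y"]
print([s for s in samples if re.search(r"x.y", s)])   # ['x+y', 'x-y', 'x y', 'x.y']

# s/th.is/this/ removes the bell (\x07), whatever it looks like on screen.
print(re.sub(r"th.is", "this", "a th\x07is b", count=1))   # a this b

# On the left "." matches anything; on the right it is a plain period.
print(re.sub(r".", ",", "Now is the time.", count=1))   # ,ow is the time.
print(re.sub(r".", ".", "Now is the time.", count=1))   # .ow is the time.
```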
.SH
The Backslash `\*e'
.PP
Since a period means `any character',
the question naturally arises of what to do
when you really want a period.
For example, how do you convert the line
.P1
Now is the time\*.
.P2
into
.P1
Now is the time?
.P2
The backslash `\*e' does the job.
A backslash turns off any special meaning that the next character
might have; in particular,
`\*e\*.' converts the `\*.' from a `match anything'
into a period, so
you can use it to replace
the period in
.P1
Now is the time\*.
.P2
like this:
.P1
s/\*e\*./?/
.P2
The pair of characters `\*e\*.' is considered by
.UL ed
to be a single real period.
.PP
The backslash can also be used when searching for lines
that contain a special character.
Suppose you are looking for a line that contains
.P1
\&\*.PP
.P2
The search
.P1
/\*.PP/
.P2
isn't adequate, for it will find
a line like
.P1
THE APPLICATION OF ...
.P2
because the `\*.' matches the letter `A'.
But if you say
.P1
/\*e\*.PP/
.P2
you will find only lines that contain `\*.PP'.
.PP
The backslash can also be used to turn off special meanings for
characters other than `\*.'.
For example, consider finding a line that contains a backslash.
The search
.P1
/\*e/
.P2
won't work,
because the `\*e' isn't a literal `\*e', but instead means that the second `/'
no longer \%delimits the search.
But by preceding a backslash with another one,
you can search for a literal backslash.
Thus
.P1
/\*e\*e/
.P2
does work.
Similarly, you can search for a forward slash `/' with
.P1
/\*e//
.P2
The backslash turns off the meaning of the immediately following `/' so that
it doesn't terminate the /.../ construction prematurely.
.PP
As an exercise, before reading further, find two substitute commands each of which will
convert the line
.P1
\*ex\*e\*.\*ey
.P2
into the line
.P1
\*ex\*ey
.P2
.PP
Here are several solutions;
verify that each works as advertised.
.P1
s/\*e\*e\*e\*.//
s/x\*.\*./x/
s/\*.\*.y/y/
.P2
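.PP
The three solutions can be checked mechanically with Python's re module,
which reads these particular patterns exactly as
.UL ed
does.
They are written as raw strings (r"...") so that Python itself leaves the backslashes alone;
the last two lines repeat the `\*.PP' example from earlier in this section.

```python
import re

line = r"\x\.\y"        # the line from the exercise
want = r"\x\y"          # the line we want to end up with

# Left and right sides of the three solutions, as Python strings.
solutions = [r"\\\.", r"x..", r"..y"]
replacements = ["", "x", "y"]
for pat, rep in zip(solutions, replacements):
    result = re.sub(pat, rep, line, count=1)
    print(pat, "->", result, result == want)   # True for each solution

# A bare "." is too permissive; "\." matches only a real period.
print(bool(re.search(r".PP", "THE APPLICATION OF ...")))    # True
print(bool(re.search(r"\.PP", "THE APPLICATION OF ...")))   # False
```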
.PP
Here are a couple of miscellaneous notes about
backslashes and special characters.
First, you can use any character to delimit the pieces
of an
.UL s
command: there is nothing sacred about slashes.
(But you must use slashes for context searching.)
For instance, in a line that contains a lot of slashes already, like
.P1
//exec //sys.fort.go // etc...
.P2
you could use a colon as the delimiter.
To delete all the slashes, type
.P1
s:/::g
.P2
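.PP
Why can any character serve as the delimiter?
A reader of the command only has to take the character right after the
.UL s
as the delimiter and split on it,
treating a delimiter that follows a backslash as an ordinary character.
The Python sketch below shows that idea;
split_s_command is a hypothetical name,
it expects all three delimiters to be present,
and it leaves any other backslash pairs in place for the pattern matcher to interpret later.

```python
def split_s_command(cmd: str) -> tuple[str, str, str]:
    """Split "s<d>pattern<d>replacement<d>flags" into its three parts.

    Hypothetical helper: <d> is whatever character follows the "s".
    A backslash before <d> makes it an ordinary character; any other
    backslash pair is kept unchanged.
    """
    if len(cmd) < 2 or cmd[0] != "s":
        raise ValueError("not a substitute command: %r" % cmd)
    delim = cmd[1]
    fields, current = [], []
    i = 2
    while i < len(cmd) and len(fields) < 2:
        ch = cmd[i]
        if ch == "\\" and i + 1 < len(cmd):
            nxt = cmd[i + 1]
            current.append(nxt if nxt == delim else ch + nxt)
            i += 2
        elif ch == delim:
            fields.append("".join(current))    # end of pattern or replacement
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    if len(fields) < 2:
        raise ValueError("missing delimiter in %r" % cmd)
    return fields[0], fields[1], cmd[i:]       # whatever remains is the flags


pat, rep, flags = split_s_command("s:/::g")
print(pat, repr(rep), flags)                   # / '' g
line = "//exec //sys.fort.go // etc..."
# str.replace is enough here because "/" is not a metacharacter.
print(line.replace(pat, rep) if "g" in flags else line.replace(pat, rep, 1))
# exec sys.fort.go  etc...
```

With the usual slash delimiter, the same function turns s/\*e//x/ into the pattern `/' and the replacement `x'.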
.PP
Second, if # and @ are your character erase and line kill characters,
you have to type \*e# and \*e@ to enter them as ordinary characters;
this is true whether you're talking to
.UL ed
or any other program.
.PP
When you are adding text with
.UL a
or
.UL i
or
.UL c ,
backslash is not special, and you should only put in
one backslash for each one you really want.
.SH
The Dollar Sign `$'
.PP
The next metacharacter, the `$', stands for `the end of the line'.
As its most obvious use, suppose you have the line
.P1
Now is the
.P2
and you wish to add the word `time' to the end.
Use the $ like this:
.P1
s/$/\*Btime/
.P2
to get
.P1
Now is the time
.P2
Notice that a space is needed before `time' in
the substitute command,
or you will get
.P1
Now is thetime
.P2
.PP
As another example, replace the second comma in
the following line with a period without altering the first:
.P1
Now is the time, for all good men,
.P2
The command needed is
.P1
s/,$/\*./
.P2
The $ sign here provides context to specify which comma we mean.
Without it, of course, the
.UL s
command would operate on the first comma to produce
.P1
Now is the time\*. for all good men,
.P2
.PP
Similarly, to convert
.P1
Now is the time\*.
.P2
into
.P1
Now is the time?
.P2
as we did earlier, we can use
.P1
s/\*.$/?/
.P2
.PP
Like `\*.', the `$'
has multiple meanings depending on context.
In the line
.P1
$s/$/$/
.P2
the first `$' refers to the
last line of the file,
the second refers to the end of that line,
and the third is a literal dollar sign,
to be added to that line.
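.PP
In Python's re module `$' also anchors a match at the end of the string
(it would also match just before a final newline, but a line of text held by
.UL ed
has none), so the examples above can be replayed directly.
As before, re.sub with count=1 stands in for an
.UL s
without
.UL g .

```python
import re

print(re.sub(r"$", " time", "Now is the", count=1))      # Now is the time

line = "Now is the time, for all good men,"
print(re.sub(r",$", ".", line, count=1))   # Now is the time, for all good men.
print(re.sub(r",", ".", line, count=1))    # Now is the time. for all good men,

# ".$" is the last character on the line, here the period.
print(re.sub(r".$", "?", "Now is the time.", count=1))   # Now is the time?

# As in ed, "$" on the right-hand side is just a dollar sign.
print(re.sub(r"$", "$", "cost", count=1))                # cost$
```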
.SH
The Circumflex `^'
.PP
The circumflex (or hat or caret)
`^' stands for the beginning of the line.
For example, suppose you are looking for a line that begins
with `the'.
If you simply say
.P1
/the/
.P2
you will in all likelihood find several lines that contain `the' in the middle before
arriving at the one you want.
But with
.P1
/^the/
.P2
you narrow the context, and thus arrive at the desired one
more easily.
.PP
The other use of `^' is of course to enable you to insert
something at the beginning of a line:
.P1
s/^/\*B/
.P2
places a space at the beginning of the current line.
.PP
Metacharacters can be combined. To search for a
line that contains
.ul
only
the characters
.P1
\&\*.PP
.P2
you can use the command
.P1
/^\*e\*.PP$/
.P2
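.PP
Python's re module treats `^' the same way outside brackets,
so the searches above can be tried on a few made-up lines.
In this sketch next() returns the first line that matches,
much as a context search stops at the first matching line
(though
.UL ed
starts looking after line dot and wraps around, which the sketch ignores).

```python
import re

lines = ["in the middle of the line", "the start of a line", ".PP", " .PP", ".PPX"]

# /the/ stops at the first line containing "the" anywhere; /^the/ does not.
print(next(s for s in lines if re.search(r"the", s)))    # in the middle of the line
print(next(s for s in lines if re.search(r"^the", s)))   # the start of a line

# s/^/ / puts a space at the front of the line.
print(repr(re.sub(r"^", " ", "indent me", count=1)))     # ' indent me'

# /^\.PP$/ accepts a line that is exactly ".PP" and nothing else.
print([s for s in lines if re.search(r"^\.PP$", s)])     # ['.PP']
```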
.SH
The Star `*'
.PP
Suppose you have a line that looks like this:
.P1
\fItext \fR x y \fI text \fR
.P2
where
.ul
text
stands
for lots of text,
and there is some indeterminate number of spaces between the
.UL x
and the
.UL y .
Suppose the job is to replace all the spaces between
.UL x
and
.UL y
by a single space.
The line is too long to retype, and there are too many spaces
to count.
What now?
.PP
This is where the metacharacter `*'
comes in handy.
A character followed by a star
stands for as many consecutive occurrences of that
character as possible.
To refer to all the spaces at once, say
.P1
s/x\*B*y/x\*By/
.P2
The construction
`\*B*'
means
`as many spaces as possible'.
Thus `x\*B*y' means `an x, as many spaces as possible, then a y'.
.PP
The star can be used with any character, not just space.
If the original example was instead
.P1
\fItext \fR x--------y \fI text \fR
.P2
then all `\-' signs can be replaced by a single space
with the command
.P1
s/x-*y/x\*By/
.P2
.PP
Finally, suppose that the line was
.P1
\fItext \fR x\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.y \fI text \fR
.P2
Can you see what trap lies in wait for the unwary?
If you blindly type
.P1
s/x\*.*y/x\*By/
.P2
what will happen?
The answer, naturally, is that it depends.
If there are no other x's or y's on the line,
then everything works, but it's blind luck, not good management.
Remember that `\*.' matches
.ul
any
single character?
Then `\*.*' matches as many single characters as possible,
and unless you're careful, it can eat up a lot more of the line
than you expected.
If the line was, for example, like this:
.P1
\fItext \fRx\fI text \fR x\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.\*.y \fI text \fRy\fI text \fR
.P2
then saying
.P1
s/x\*.*y/x\*By/
.P2
will take everything from the
.ul
first
`x' to the
.ul
last
`y',
which, in this example, is undoubtedly more than you wanted.
.PP
The solution, of course, is to turn off the special meaning of
`\*.' with
`\*e\*.':
.P1
s/x\*e\*.*y/x\*By/
.P2
Now everything works, for `\*e\*.*' means `as many
.ul
periods
as possible'.
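.PP
Python's `*' and `\*.*' behave the same way, so the trap can be reproduced on a short made-up line.
In the line used below, the first `x' and the last `y' lie outside the run of periods,
just as in the example above.

```python
import re

# "x *y": an x, as many spaces as possible, then a y.
print(re.sub(r"x *y", "x y", "text x     y text", count=1))     # text x y text
print(re.sub(r"x-*y", "x y", "text x--------y text", count=1))  # text x y text

# ".*" is greedy: it runs from the first x to the last y on the line.
line = "ax b x....y c yd"
print(re.sub(r"x.*y", "x y", line, count=1))    # ax yd
# "\.*" matches only periods, so just the intended span is replaced.
print(re.sub(r"x\.*y", "x y", line, count=1))   # ax b x y c yd

# Sometimes ".*" is exactly right: drop " for" and everything after it.
print(re.sub(r" for.*", ".", "Now is the time for all good men ....", count=1))
# Now is the time.
```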
.PP
There are times when the pattern `\*.*' is exactly what you want.
For example, to change
.P1
Now is the time for all good men ....
.P2
into
.P1
Now is the time\*.
.P2
use `\*.*' to eat up the `\*Bfor' and everything after it:
.P1
s/\*Bfor\*.*/\*./
.P2
.PP
There are a couple of additional pitfalls associated with `*' that you should be aware of.
Most notable is the fact that `as many as possible' means
.ul
zero
or more.
The fact that zero is a legitimate possibility is
sometimes rather surprising.
For example, if our line contained
.P1
\fItext \fR xy \fI text \fR x y \fI text \fR
.P2
and we said
.P1
s/x\*B*y/x\*By/
.P2
the
.ul
first
`xy' matches this pattern, for it consists of an `x',
zero spaces, and a `y'.
The result is that the substitute acts on the first `xy',
and does not touch the later one that actually contains some intervening spaces.
.PP
The way around this, if it matters, is to specify a pattern like
.P1
/x\*B\*B*y/
.P2
which says `an x, a space, then as many more spaces as possible, then a y',
in other words, one or more spaces.
.PP
The other startling behavior of `*' is again related to the fact
that zero is a legitimate number of occurrences of something
followed by a star. The command
.P1
s/x*/y/g
.P2
when applied to the line
.P1
abcdef
.P2
produces
.P1
yaybycydyeyfy
.P2
which is almost certainly not what was intended.
The reason for this behavior is that zero is a legal number
of matches,
and there are no x's at the beginning of the line
(so that gets converted into a `y'),
nor between the `a' and the `b'
(so that gets converted into a `y'), nor ...
and so on.
Make sure you really want zero matches;
if not, in this case write
.P1
s/xx*/y/g
.P2
`xx*' is one or more x's.
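.PP
Both surprises can be reproduced with Python's re module,
which also counts zero occurrences as a match.
Here re.sub without a count replaces every match, like a trailing
.UL g ;
the sample line is made up for illustration.

```python
import re

# Zero spaces is a match, so the first "xy" is the one that changes.
line = "text xy text x   y text"
print(re.sub(r"x *y", "x y", line, count=1))    # text x y text x   y text
# "x  *y" demands at least one space.
print(re.sub(r"x  *y", "x y", line, count=1))   # text xy text x y text

# With "x*", the empty string matches at every position in the line.
print(re.sub(r"x*", "y", "abcdef"))    # yaybycydyeyfy
print(re.sub(r"xx*", "y", "abcdef"))   # abcdef  (no x, so nothing changes)
```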
.SH
The Brackets `[ ]'
.PP
Suppose that you want to delete any numbers
that appear
at the beginning of all lines of a file.
You might first think of trying a series of commands like
.P1
1,$s/^1*//
1,$s/^2*//
1,$s/^3*//
.P2
and so on,
but this is clearly going to take forever if the numbers are at all long.
Unless you want to repeat the commands over and over until
finally all numbers are gone,
you must get all the digits on one pass.
This is the purpose of the brackets [ and ].
.PP
The construction
.P1
[0123456789]
.P2
matches any single digit;
the whole thing is called a `character class'.
With a character class, the job is easy.
The pattern `[0123456789]*' matches zero or more digits (an entire number), so
.P1
1,$s/^[0123456789]*//
.P2
deletes all digits from the beginning of all lines.
.PP
Any characters can appear within a character class,
and just to confuse the issue there are essentially no special characters
inside the brackets;
even the backslash doesn't have a special meaning.
To search for special characters, for example, you can say
.P1
/[\*.\*e$^[]/
.P2
Within [...], the `[' is not special.
To get a `]' into a character class,
make it the first character.
.PP
It's a nuisance to have to spell out the digits,
so you can abbreviate them as
[0\-9];
similarly, [a\-z] stands for the lower case letters,
and
[A\-Z] for upper case.
.PP
As a final frill on character classes, you can specify a class
that means `none of the following characters'.
This is done by beginning the class with a `^':
.P1
[^0\-9]
.P2
stands for `any character
.ul
except
a digit'.
Thus you might find the first line that doesn't begin with a tab or space
by a search like
.P1
/^[^(space)(tab)]/
.P2
.PP
Within a character class,
the circumflex has a special meaning
only if it occurs at the beginning.
Just to convince yourself, verify that
.P1
/^[^^]/
.P2
finds a line that doesn't begin with a circumflex.
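.PP
Python's re module supports the same character classes, with one difference worth remembering:
inside Python's brackets the backslash is special, so `\*et' below means a tab,
whereas in
.UL ed
a backslash inside a class is an ordinary character.
The sample lines are made up for illustration.

```python
import re

# s/^[0-9]*// strips a leading number of any length in a single pass.
numbered = ["123 first", "42nd street", "no digits"]
print([re.sub(r"^[0-9]*", "", s, count=1) for s in numbered])
# [' first', 'nd street', 'no digits']

lines = ["\tindented", "  spaced", "^caret", "flush left"]

# /^[^ \t]/ : the first line that does not begin with a space or a tab.
print(next(s for s in lines if re.search(r"^[^ \t]", s)))   # ^caret

# /^[^^]/ : the first "^" anchors, the second negates, the third is literal.
print([s for s in lines if re.search(r"^[^^]", s)])
# ['\tindented', '  spaced', 'flush left']

# A "]" placed first inside the brackets is an ordinary character.
print(bool(re.search(r"[]x]", "a]b")))                     # True
```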
.SH
The Ampersand `&'
.PP
The ampersand `&' is used primarily to save typing.
Suppose you have the line
.P1
Now is the time
.P2
and you want to make it
.P1
Now is the best time
.P2
Of course you can always say
.P1
s/the/the best/
.P2
but it seems silly to have to repeat the `the'.
The `&' is used to eliminate the repetition.
On the
.ul
right
side of a substitute, the ampersand means `whatever
was just matched', so you can say
.P1
s/the/& best/
.P2
and the `&' will stand for `the'.
Of course this isn't much of a saving if the thing
matched is just `the', but if it is something truly long or awful,
or if it is something like `\*.*'
which matches a lot of text,
you can save some tedious typing.
There is also much less chance of making a typing error
in the replacement text.
For example, to parenthesize a line,
regardless of its length, use
.P1
s/\*.*/(&)/
.P2
.PP
The ampersand can occur more than once on the right side:
.P1
s/the/& best and & worst/
.P2
makes
.P1
Now is the best and the worst time
.P2
and
.P1
s/\*.*/&? &!!/
.P2
converts the original line into
.P1
Now is the time? Now is the time!!
.P2
.PP
To get a literal ampersand, naturally the backslash is used to turn off the special meaning:
.P1
s/ampersand/\*e&/
.P2
converts the word into the symbol.
Notice that `&' is not special on the left side
of a substitute, only on the
.ul
right
side.
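.PP
Python's re.sub writes `whatever was matched' as \*eg<0> rather than `&',
so the ampersand convention has to be translated.
The function below is a hypothetical helper, not part of any library:
it handles `&' and backslash-escaped characters such as `\*e&'
(but not \*e1, \*e2, ...), and takes its pattern in Python's syntax.
Handing re.sub a function instead of a string keeps Python's own backslash rules out of the way.

```python
import re


def ed_replace(pattern: str, repl: str, line: str, global_flag: bool = False) -> str:
    """Substitute on one line, reading repl with ed's conventions:
    "&" is the text that matched, and a backslash makes the next
    character ordinary (so "\\&" is a literal ampersand).

    Hypothetical helper; ed's \\1 ... \\9 are not handled here.
    """
    def expand(m: re.Match) -> str:
        out, i = [], 0
        while i < len(repl):
            ch = repl[i]
            if ch == "\\" and i + 1 < len(repl):
                out.append(repl[i + 1])   # \& -> &, \\ -> \
                i += 2
            elif ch == "&":
                out.append(m.group(0))    # the text matched by the pattern
                i += 1
            else:
                out.append(ch)
                i += 1
        return "".join(out)

    return re.sub(pattern, expand, line, count=0 if global_flag else 1)


line = "Now is the time"
print(ed_replace("the", "& best", line))              # Now is the best time
print(ed_replace(".*", "(&)", line))                  # (Now is the time)
print(ed_replace("the", "& best and & worst", line))  # Now is the best and the worst time
print(ed_replace(".*", "&? &!!", line))               # Now is the time? Now is the time!!
print(ed_replace("ampersand", r"\&", "an ampersand")) # an &
```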
.SH
Substituting Newlines
.PP
.UL ed
provides a facility for splitting a single line into two or more shorter lines by `substituting in a newline'.
As the simplest example, suppose a line has gotten unmanageably long
because of editing (or merely because it was unwisely typed).
If it looks like
.P1
\fItext \fR xy \fI text \fR
.P2
you can break it between the `x' and the `y' like this:
.P1
s/xy/x\*e
y/
.P2
This is actually a single command,
although it is typed on two lines.
Bearing in mind that `\*e' turns off special meanings,
it seems relatively intuitive that a `\*e' at the end of
a line would make the newline there
no longer special.
.PP
You can in fact make a single line into several lines
with this same mechanism.
As a larger example, consider underlining the word `very'
in a long line
by splitting `very' onto a separate line,
and preceding it by the
.UL roff
or
.UL nroff
formatting command `.ul'.
Given the line
.P1
\fItext \fR a very big \fI text \fR
.P2
the command
.P1
s/\*Bvery\*B/\*e
\&.ul\*e
very\*e
/
.P2
converts the line into four shorter lines,
preceding the word `very' by the
line
`.ul',
and eliminating the spaces around the `very',
all at the same time.
.PP
When a newline is substituted
in, dot is left pointing at the last line created.
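.PP
The effect on the buffer can be modeled in Python by doing the substitution
and then splitting the result at each newline.
In this sketch the function name is hypothetical, lines are numbered from 1,
and a newline in the replacement is written as \*en
instead of being typed as a backslash followed by a real line break.

```python
import re


def substitute_with_newlines(lines, n, pattern, replacement):
    """Hypothetical model of s on line n (numbered from 1) when the
    replacement contains newlines.  The changed line is split at each
    newline, the pieces take its place in the buffer, and the number of
    the last line created (the new dot) is returned."""
    new_text = re.sub(pattern, replacement, lines[n - 1], count=1)
    pieces = new_text.split("\n")
    lines[n - 1:n] = pieces
    return n + len(pieces) - 1


buffer = ["text xy text"]
print(substitute_with_newlines(buffer, 1, "xy", "x\ny"), buffer)
# 2 ['text x', 'y text']

buffer = ["a very big line"]
dot = substitute_with_newlines(buffer, 1, " very ", "\n.ul\nvery\n")
print(dot, buffer)   # 4 ['a', '.ul', 'very', 'big line']
```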
.SH
Joining Lines
.PP
Lines may also be joined together,
but this is done with the
.UL j
command
instead of
.UL s .
Given the lines
.P1
Now is
\*Bthe time
.P2
and supposing that dot is set to the first of them,
then the command
.P1
j
.P2
joins them together.
No blanks are added,
which is why we carefully showed a blank
at the beginning of the second line.
.PP
All by itself,
a
.UL j
command
joins line dot to line dot+1,
but any contiguous set of lines can be joined.
Just specify the starting and ending line numbers.
For example,
.P1
1,$jp
.P2
joins all the lines into one big one
and prints it.
(More on line numbers in Section 3.)
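.PP
A Python model of
.UL j
is correspondingly short:
the selected lines are concatenated with nothing between them and replace the originals,
and in this model dot moves to the joined line.
The function name join_lines is hypothetical, and lines are numbered from 1.

```python
def join_lines(lines, first, last):
    """Hypothetical model of `first,last j' (numbered from 1, inclusive).
    No blanks are inserted; the joined line replaces the originals and
    its number, the new dot, is returned."""
    lines[first - 1:last] = ["".join(lines[first - 1:last])]
    return first


buffer = ["Now is", " the time", "for all"]
dot = 1
dot = join_lines(buffer, dot, dot + 1)    # a bare j joins dot and dot+1
print(dot, buffer)                        # 1 ['Now is the time', 'for all']

join_lines(buffer, 1, len(buffer))        # like 1,$j
print(buffer)                             # ['Now is the timefor all']
```

The missing blank in the last line is the same effect the text warns about: j adds nothing between the pieces.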
.SH
Rearranging a Line with \*e( ... \*e)
.PP
(This section should be skipped on first reading.)
Recall that `&' is a shorthand that stands for whatever
was matched by the left side of an
.UL s
command.
In much the same way you can capture separate pieces
of what was matched;
the only difference is that you have to specify
on the left side just what pieces you're interested in.
.PP
Suppose, for instance, that
you have a file of lines that consist of names in the form
.P1
Smith, A. B.
Jones, C.
.P2
and so on,
and you want the initials to precede the name, as in
.P1
A. B. Smith
C. Jones
.P2
It is possible to do this with a series of editing commands,
but it is tedious and error-prone.
(It is instructive to figure out how it is done, though.)
.PP
The alternative
is to `tag' the pieces of the pattern (in this case,
the last name, and the initials),
and then rearrange the pieces.
On the left side of a substitution,
if part of the pattern is enclosed between
\*e( and \*e),
whatever matched that part is remembered,
and available for use on the right side.
On the right side,
the symbol `\*e1' refers to whatever
matched the first \*e(...\*e) pair,
`\*e2' to the second \*e(...\*e),
and so on.
.PP
The command
.P1
1,$s/^\*e([^,]*\*e),\*B*\*e(\*.*\*e)/\*e2\*B\*e1/
.P2
although hard to read, does the job.
The first \*e(...\*e) matches the last name,
which is any string up to the comma;
this is referred to on the right side with `\*e1'.
The second \*e(...\*e) is whatever follows
the comma and any spaces,
and is referred to as `\*e2'.
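.PP
The same rearrangement can be tried in Python.
Its re module writes the tagging parentheses without backslashes, as plain ( and ),
but refers to the pieces on the right side as \*e1 and \*e2 just as
.UL ed
does.
The two sample lines are the ones shown above.

```python
import re

names = ["Smith, A. B.", "Jones, C."]

# ed: 1,$s/^\([^,]*\), *\(.*\)/\2 \1/
pattern = r"^([^,]*), *(.*)"
print([re.sub(pattern, r"\2 \1", s, count=1) for s in names])
# ['A. B. Smith', 'C. Jones']
```

A line without a comma does not match the pattern at all, so re.sub leaves it unchanged.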
.PP
Of course, with any editing sequence this complicated,
it's foolhardy to simply run it and hope.
The global commands
.UL g
and
.UL v
discussed in Section 4
provide a way for you to print exactly those
lines that were affected by the
substitute command,
and thus verify that it did what you wanted
in all cases.