Commit | Line | Data |
---|---|---|
5b5469b3 KB |
1 | This is a nearly-public-domain reimplementation of the V8 regexp(3) package. |
2 | It gives C programs the ability to use egrep-style regular expressions, and | |
3 | does it in a much cleaner fashion than the analogous routines in SysV. | |
4 | ||
5 | Copyright (c) 1986 by University of Toronto. | |
6 | Written by Henry Spencer. Not derived from licensed software. | |
7 | ||
8 | Permission is granted to anyone to use this software for any | |
9 | purpose on any computer system, and to redistribute it freely, | |
10 | subject to the following restrictions: | |
11 | ||
12 | 1. The author is not responsible for the consequences of use of | |
13 | this software, no matter how awful, even if they arise | |
14 | from defects in it. | |
15 | ||
16 | 2. The origin of this software must not be misrepresented, either | |
17 | by explicit claim or by omission. | |
18 | ||
19 | 3. Altered versions must be plainly marked as such, and must not | |
20 | be misrepresented as being the original software. | |
21 | ||
22 | Barring a couple of small items in the BUGS list, this implementation is | |
23 | believed 100% compatible with V8. It should even be binary-compatible, | |
24 | sort of, since the only fields in a "struct regexp" that other people have | |
25 | any business touching are declared in exactly the same way at the same | |
26 | location in the struct (the beginning). | |
27 | ||
28 | This implementation is *NOT* AT&T/Bell code, and is not derived from licensed | |
29 | software, even though U of T is a V8 licensee. This software is based on | |
30 | a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed | |
31 | here is a complete rewrite and hence is not covered by AT&T copyright). | |
32 | The software was nearly complete at the time of arrival of our V8 tape. | |
33 | I haven't even looked at V8 yet, although a friend elsewhere at U of T has | |
34 | been kind enough to run a few test programs using the V8 regexp(3) to resolve | |
35 | a few fine points. I admit to some familiarity with regular-expression | |
36 | implementations of the past, but the only one that this code traces any | |
37 | ancestry to is the one published in Kernighan & Plauger (from which this | |
38 | one draws ideas but not code). | |
39 | ||
40 | Simplistically: put this stuff into a source directory, copy regexp.h into | |
41 | /usr/include, inspect Makefile for compilation options that need changing | |
42 | to suit your local environment, and then do "make r". This compiles the | |
43 | regexp(3) functions, compiles a test program, and runs a large set of | |
44 | regression tests. If there are no complaints, then put regexp.o, regsub.o, | |
45 | and regerror.o into your C library, and regexp.3 into your manual-pages | |
46 | directory. | |
47 | ||
48 | Note that if you don't put regexp.h into /usr/include *before* compiling, | |
49 | you'll have to add "-I." to CFLAGS before compiling. | |
50 | ||
51 | The files are: | |
52 | ||
53 | Makefile    instructions to make everything | |
54 | regexp.3    manual page | |
55 | regexp.h    header file, for /usr/include | |
56 | regexp.c    source for regcomp() and regexec() | |
57 | regsub.c    source for regsub() | |
58 | regerror.c  source for default regerror() | |
59 | regmagic.h  internal header file | |
60 | try.c       source for test program | |
61 | timer.c     source for timing program | |
62 | tests       test list for try and timer | |
63 | ||
64 | This implementation uses nondeterministic automata rather than the | |
65 | deterministic ones found in some other implementations, which makes it | |
66 | simpler, smaller, and faster at compiling regular expressions, but slower | |
67 | at executing them. In theory, anyway. This implementation does employ | |
68 | some special-case optimizations to make the simpler cases (which do make | |
69 | up the bulk of regular expressions actually used) run quickly. In general, | |
70 | if you want blazing speed you're in the wrong place. Replacing the insides | |
71 | of egrep with this stuff is probably a mistake; if you want your own egrep | |
72 | you're going to have to do a lot more work. But if you want to use regular | |
73 | expressions a little bit in something else, you're in luck. Note that many | |
74 | existing text editors use nondeterministic regular-expression implementations, | |
75 | so you're in good company. | |
76 | ||
77 | This stuff should be pretty portable, given appropriate option settings. | |
78 | If your chars have fewer than 8 bits, you're going to have to change the | |
79 | internal representation of the automaton, although knowledge of the details | |
80 | of this is fairly localized. There are no "reserved" char values except for | |
81 | NUL, and no special significance is attached to the top bit of chars. | |
82 | The string(3) functions are used a fair bit, on the grounds that they are | |
83 | probably faster than coding the operations inline. Some attempts at code | |
84 | tuning have been made, but this is invariably a bit machine-specific. |
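
To make the nondeterministic-versus-deterministic remark above a little more concrete, here is a minimal sketch of backtracking, one common way of executing a nondeterministic matcher. It is NOT code from this package and does not use its API: the names sketch_match, match_here, and match_star are hypothetical, and the sketch handles only literals, ".", "*", "^", and "$" rather than full egrep-style syntax. The assumption being illustrated is simply that the matcher tries one alternative, and backs up to try the next when the rest of the pattern fails.

```c
/* Backtracking matcher sketch. All names here are hypothetical and
 * are not part of the regexp(3) package described in this README. */

static int match_here(const char *re, const char *text);

/* Match "c*" followed by the rest of the pattern (re) at text.
 * Tries zero repetitions first, then one, then two, and so on;
 * each failed attempt backs up and retries with one more. */
static int match_star(int c, const char *re, const char *text)
{
    do {
        if (match_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* Match the pattern re at exactly the start of text. */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                       /* pattern exhausted: success */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* "$" matches only at the end */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* Return 1 if re matches anywhere in text, 0 otherwise. */
int sketch_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text); /* anchored: one attempt only */
    do {                                 /* unanchored: try each offset */
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

For instance, sketch_match("a*b", "xaaab") returns 1 and sketch_match("^ab", "cab") returns 0. The repeated retrying in match_star and in the unanchored loop is where the execution-time cost of this style comes from; in exchange there is no automaton to build ahead of time, which is why compiling is cheap.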