Commit | Line | Data |
---|---|---|
a937b5f0 | 1 | # @(#)TOUR 8.1 (Berkeley) %G% |
2 | |
3 | NOTE -- This is the original TOUR paper distributed with ash and | |
4 | does not represent the current state of the shell. It is provided anyway | |
5 | since it provides helpful information for how the shell is structured, | |
6 | but be warned that things have changed -- the current shell is | |
7 | still under development. | |
8 | ||
9 | ================================================================ | |
10 | |
11 | A Tour through Ash | |
12 | ||
13 | Copyright 1989 by Kenneth Almquist. | |
14 | ||
15 | ||
16 | DIRECTORIES: The subdirectory bltin contains commands which can | |
17 | be compiled stand-alone. The rest of the source is in the main | |
18 | ash directory. | |
19 | ||
20 | SOURCE CODE GENERATORS: Files whose names begin with "mk" are | |
21 | programs that generate source code. A complete list of these | |
22 | programs is: | |
23 | ||
24 | program        input files          generates | |
25 | -------        -----------          --------- | |
26 | mkbuiltins     builtins             builtins.h builtins.c | |
27 | mkinit         *.c                  init.c | |
28 | mknodes        nodetypes            nodes.h nodes.c | |
29 | mksignames     -                    signames.h signames.c | |
30 | mksyntax       -                    syntax.h syntax.c | |
31 | mktokens       -                    token.def | |
32 | bltin/mkexpr   unary_op binary_op   operators.h operators.c | |
33 | ||
34 | There are undoubtedly too many of these. Mkinit searches all the | |
35 | C source files for entries looking like: | |
36 | ||
37 | INIT { | |
38 | x = 1; /* executed during initialization */ | |
39 | } | |
40 | ||
41 | RESET { | |
42 | x = 2; /* executed when the shell does a longjmp | |
43 | back to the main command loop */ | |
44 | } | |
45 | ||
46 | SHELLPROC { | |
47 | x = 3; /* executed when the shell runs a shell procedure */ | |
48 | } | |
49 | ||
50 | It pulls this code out into routines which are called when particular | |
51 | events occur. The intent is to improve modularity by isolating | |
52 | the information about which modules need to be explicitly | |
53 | initialized/reset within the modules themselves. | |
54 | ||
55 | Mkinit recognizes several constructs for placing declarations in | |
56 | the init.c file. | |
57 | INCLUDE "file.h" | |
58 | includes a file. The storage class MKINIT makes a declaration | |
59 | available in the init.c file, for example: | |
60 | MKINIT int funcnest; /* depth of function calls */ | |
61 | MKINIT alone on a line introduces a structure or union declara- | |
62 | tion: | |
63 | MKINIT | |
64 | struct redirtab { | |
65 | short renamed[10]; | |
66 | }; | |
67 | Preprocessor #define statements are copied to init.c without any | |
68 | special action to request this. | |
69 | ||
70 | INDENTATION: The ash source is indented in multiples of six | |
71 | spaces. The only study that I have heard of on the subject con- | |
72 | cluded that the optimal amount to indent is in the range of four | |
73 | to six spaces. I use six spaces since it is not too big a jump | |
74 | from the widely used eight spaces. If you really hate six space | |
75 | indentation, use the adjind (source included) program to change | |
76 | it to something else. | |
77 | ||
78 | EXCEPTIONS: Code for dealing with exceptions appears in | |
79 | exceptions.c. The C language doesn't include exception handling, | |
80 | so I implement it using setjmp and longjmp. The global variable | |
81 | exception contains the type of exception. EXERROR is raised by | |
82 | calling error. EXINT is an interrupt. EXSHELLPROC is an excep- | |
83 | tion which is raised when a shell procedure is invoked. The pur- | |
84 | pose of EXSHELLPROC is to perform the cleanup actions associated | |
85 | with other exceptions. After these cleanup actions, the shell | |
86 | can interpret a shell procedure itself without exec'ing a new | |
87 | copy of the shell. | |
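
The setjmp/longjmp scheme described above can be sketched roughly as follows. This is only an illustration, not the code in the ash source: the names handler, raise_exception, run_protected, demo_ok and demo_error are hypothetical, and the numeric values given to EXINT, EXERROR and EXSHELLPROC are assumptions.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Exception types named in the text; the numeric values are assumptions. */
enum { EXINT = 0, EXERROR = 1, EXSHELLPROC = 2 };

static jmp_buf *handler;    /* innermost active handler (hypothetical name) */
static int exception;       /* type of the exception being raised */

/* Raise an exception: record its type and jump to the active handler. */
static void raise_exception(int type)
{
    assert(handler != NULL);
    exception = type;
    longjmp(*handler, 1);
}

/*
 * Run fn under a fresh handler.  Returns -1 if fn finished normally,
 * otherwise the type of the exception that fn raised.
 */
static int run_protected(void (*fn)(void))
{
    jmp_buf here;
    jmp_buf *saved = handler;
    int result = -1;

    if (setjmp(here) == 0) {
        handler = &here;
        fn();
    } else {
        result = exception;
    }
    handler = saved;        /* restore the outer handler */
    return result;
}

/* Two small routines to exercise the sketch. */
static void demo_ok(void) { }
static void demo_error(void) { raise_exception(EXERROR); }
```

Saving and restoring the handler pointer around each protected region is what lets regions nest: an inner region catches its own exceptions and leaves the outer handler as it found it.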
88 | ||
89 | INTERRUPTS: In an interactive shell, an interrupt will cause an | |
90 | EXINT exception to return to the main command loop. (Exception: | |
91 | EXINT is not raised if the user traps interrupts using the trap | |
92 | command.) The INTOFF and INTON macros (defined in exception.h) | |
93 | provide uninterruptible critical sections. Between the execution | |
94 | of INTOFF and the execution of INTON, interrupt signals will be | |
95 | held for later delivery. INTOFF and INTON can be nested. | |
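
One plausible way to get nestable critical sections is a depth counter plus a pending flag, sketched below. The variable names suppressint and intpending, the helper routines, and the counter delivered are assumptions made for this sketch; the real macros in the source may be written differently.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t suppressint;   /* INTOFF nesting depth (assumed name) */
static volatile sig_atomic_t intpending;    /* interrupt arrived while held off */
static int delivered;                       /* interrupts actually acted upon */

/* Stand-in for raising EXINT; here it only counts deliveries. */
static void deliver(void)
{
    intpending = 0;
    delivered++;
}

/* Leave one level of critical section; deliver a held interrupt at depth 0. */
static void inton(void)
{
    assert(suppressint > 0);
    if (--suppressint == 0 && intpending)
        deliver();
}

#define INTOFF (suppressint++)
#define INTON  inton()

/* What the SIGINT handler would do in this sketch. */
static void on_interrupt(void)
{
    if (suppressint > 0)
        intpending = 1;     /* hold for later delivery */
    else
        deliver();
}
```

Because INTOFF only increments a counter, an inner INTOFF/INTON pair cannot accidentally re-enable interrupts for an enclosing critical section.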
96 | ||
97 | MEMALLOC.C: Memalloc.c defines versions of malloc and realloc | |
98 | which call error when there is no memory left. It also defines a | |
99 | stack oriented memory allocation scheme. Allocating off a stack | |
100 | is probably more efficient than allocation using malloc, but the | |
101 | big advantage is that when an exception occurs all we have to do | |
102 | to free up the memory in use at the time of the exception is to | |
103 | restore the stack pointer. The stack is implemented using a | |
104 | linked list of blocks. | |
105 | ||
106 | STPUTC: If the stack were contiguous, it would be easy to store | |
107 | strings on the stack without knowing in advance how long the | |
108 | string was going to be: | |
109 | p = stackptr; | |
110 | *p++ = c; /* repeated as many times as needed */ | |
111 | stackptr = p; | |
112 | The following three macros (defined in memalloc.h) perform these | |
113 | operations, but grow the stack if you run off the end: | |
114 | STARTSTACKSTR(p); | |
115 | STPUTC(c, p); /* repeated as many times as needed */ | |
116 | grabstackstr(p); | |
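
The idea of appending characters and growing the storage on demand can be illustrated with the sketch below. It is a simplification under stated assumptions: it uses one realloc'ed buffer rather than the linked list of stack blocks, it terminates the string when grabbing it, and the names strbuf, strbuf_start, strbuf_putc and strbuf_grab are hypothetical stand-ins for STARTSTACKSTR, STPUTC and grabstackstr.

```c
#include <assert.h>
#include <stdlib.h>

struct strbuf {
    char *base;     /* start of the string being built */
    size_t len;     /* characters stored so far */
    size_t cap;     /* bytes available at base */
};

/* Begin a new string (plays the role of STARTSTACKSTR). */
static void strbuf_start(struct strbuf *sb)
{
    sb->base = NULL;
    sb->len = 0;
    sb->cap = 0;
}

/* Append one character, growing the buffer if it is full (cf. STPUTC). */
static void strbuf_putc(struct strbuf *sb, int c)
{
    if (sb->len == sb->cap) {
        size_t ncap = sb->cap ? sb->cap * 2 : 16;
        char *p = realloc(sb->base, ncap);

        if (p == NULL)
            abort();        /* ash would call error here */
        sb->base = p;
        sb->cap = ncap;
    }
    assert(sb->len < sb->cap);
    sb->base[sb->len++] = (char)c;
}

/* Finish the string and hand it to the caller (cf. grabstackstr). */
static char *strbuf_grab(struct strbuf *sb)
{
    strbuf_putc(sb, '\0');
    return sb->base;
}
```

The caller never needs to know the final length in advance; the growth check is hidden inside the append step, just as it is hidden inside the macros.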
117 | ||
118 | We now start a top-down look at the code: | |
119 | ||
120 | MAIN.C: The main routine performs some initialization, executes | |
121 | the user's profile if necessary, and calls cmdloop. Cmdloop | |
122 | repeatedly parses and executes commands. | |
123 | ||
124 | OPTIONS.C: This file contains the option processing code. It is | |
125 | called from main to parse the shell arguments when the shell is | |
126 | invoked, and it also contains the set builtin. The -i and -j op- | |
127 | tions (the latter turns on job control) require changes in signal | |
128 | handling. The routines setjobctl (in jobs.c) and setinteractive | |
129 | (in trap.c) are called to handle changes to these options. | |
130 | ||
131 | PARSING: The parser code is all in parser.c. A recursive des- | |
132 | cent parser is used. Syntax tables (generated by mksyntax) are | |
133 | used to classify characters during lexical analysis. There are | |
134 | three tables: one for normal use, one for use when inside single | |
135 | quotes, and one for use when inside double quotes. The tables | |
136 | are machine dependent because they are indexed by character vari- | |
137 | ables and the range of a char varies from machine to machine. | |
138 | ||
139 | PARSE OUTPUT: The output of the parser consists of a tree of | |
140 | nodes. The various types of nodes are defined in the file node- | |
141 | types. | |
142 | ||
143 | Nodes of type NARG are used to represent both words and the con- | |
144 | tents of here documents. An early version of ash kept the con- | |
145 | tents of here documents in temporary files, but keeping here do- | |
146 | cuments in memory typically results in significantly better per- | |
147 | formance. It would have been nice to make it an option to use | |
148 | temporary files for here documents, for the benefit of small | |
149 | machines, but the code to keep track of when to delete the tem- | |
150 | porary files was complex and I never fixed all the bugs in it. | |
151 | (AT&T has been maintaining the Bourne shell for more than ten | |
152 | years, and to the best of my knowledge they still haven't gotten | |
153 | it to handle temporary files correctly in obscure cases.) | |
154 | ||
155 | The text field of a NARG structure points to the text of the | |
156 | word. The text consists of ordinary characters and a number of | |
157 | special codes defined in parser.h. The special codes are: | |
158 | ||
159 | CTLVAR Variable substitution | |
160 | CTLENDVAR End of variable substitution | |
161 | CTLBACKQ Command substitution | |
162 | CTLBACKQ|CTLQUOTE Command substitution inside double quotes | |
163 | CTLESC Escape next character | |
164 | ||
165 | A variable substitution contains the following elements: | |
166 | ||
167 | CTLVAR type name '=' [ alternative-text CTLENDVAR ] | |
168 | ||
169 | The type field is a single character specifying the type of sub- | |
170 | stitution. The possible types are: | |
171 | ||
172 | VSNORMAL $var | |
173 | VSMINUS ${var-text} | |
174 | VSMINUS|VSNUL ${var:-text} | |
175 | VSPLUS ${var+text} | |
176 | VSPLUS|VSNUL ${var:+text} | |
177 | VSQUESTION ${var?text} | |
178 | VSQUESTION|VSNUL ${var:?text} | |
179 | VSASSIGN ${var=text} | |
180 | VSASSIGN|VSNUL ${var:=text} | |
181 | ||
182 | In addition, the type field will have the VSQUOTE flag set if the | |
183 | variable is enclosed in double quotes. The name of the variable | |
184 | comes next, terminated by an equals sign. If the type is not | |
185 | VSNORMAL, then the text field in the substitution follows, ter- | |
186 | minated by a CTLENDVAR byte. | |
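
To make the layout concrete, the sketch below builds the byte sequence for a substitution such as ${HOME:-/tmp}. The numeric values of CTLVAR, CTLENDVAR, VSNORMAL, VSMINUS and VSNUL are assumptions chosen for illustration (the real ones are in parser.h), encode_varsub is a hypothetical helper, and the VSQUOTE flag is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values for illustration only; see parser.h for the real ones. */
#define CTLVAR    0x82
#define CTLENDVAR 0x83
#define VSNORMAL  1
#define VSMINUS   2
#define VSNUL     0x10

/*
 * Write "CTLVAR type name '=' [ alt CTLENDVAR ]" into out and return the
 * number of bytes written.  The caller must supply a large enough buffer.
 */
static size_t encode_varsub(unsigned char *out, int type,
                            const char *name, const char *alt)
{
    size_t n = 0;

    assert(name != NULL);
    out[n++] = CTLVAR;
    out[n++] = (unsigned char)type;
    memcpy(out + n, name, strlen(name));
    n += strlen(name);
    out[n++] = '=';
    if ((type & ~VSNUL) != VSNORMAL) {
        /* Only the non-VSNORMAL forms carry alternative text. */
        assert(alt != NULL);
        memcpy(out + n, alt, strlen(alt));
        n += strlen(alt);
        out[n++] = CTLENDVAR;
    }
    return n;
}
```

For example, encode_varsub(buf, VSMINUS|VSNUL, "HOME", "/tmp") yields CTLVAR, the type byte, the five characters "HOME=", the four characters "/tmp", and a closing CTLENDVAR.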
187 | ||
188 | Commands in back quotes are parsed and stored in a linked list. | |
189 | The locations of these commands in the string are indicated by | |
190 | CTLBACKQ and CTLBACKQ+CTLQUOTE characters, depending upon whether | |
191 | the back quotes were enclosed in double quotes. | |
192 | ||
193 | The character CTLESC escapes the next character, so that in case | |
194 | any of the CTL characters mentioned above appear in the input, | |
195 | they can be passed through transparently. CTLESC is also used to | |
196 | escape '*', '?', '[', and '!' characters which were quoted by the | |
197 | user and thus should not be used for file name generation. | |
198 | ||
199 | CTLESC characters have proved to be particularly tricky to get | |
200 | right. In the case of here documents which are not subject to | |
201 | variable and command substitution, the parser doesn't insert any | |
202 | CTLESC characters to begin with (so the contents of the text | |
203 | field can be written without any processing). Other here docu- | |
204 | ments, and words which are not subject to splitting and file name | |
205 | generation, have the CTLESC characters removed during the variable | |
206 | and command substitution phase. Words which are subject to | |
207 | splitting and file name generation have the CTLESC characters | |
208 | removed as part of the file name phase. | |
209 | ||
210 | EXECUTION: Command execution is handled by the following files: | |
211 | eval.c The top level routines. | |
212 | redir.c Code to handle redirection of input and output. | |
213 | jobs.c Code to handle forking, waiting, and job control. | |
214 | exec.c Code to do path searches and the actual exec sys call. | |
215 | expand.c Code to evaluate arguments. | |
216 | var.c Maintains the variable symbol table. Called from expand.c. | |
217 | ||
218 | EVAL.C: Evaltree recursively executes a parse tree. The exit | |
219 | status is returned in the global variable exitstatus. The alter- | |
220 | native entry evalbackcmd is called to evaluate commands in back | |
221 | quotes. It saves the result in memory if the command is a buil- | |
222 | tin; otherwise it forks off a child to execute the command and | |
223 | connects the standard output of the child to a pipe. | |
224 | ||
225 | JOBS.C: To create a process, you call makejob to return a job | |
226 | structure, and then call forkshell (passing the job structure as | |
227 | an argument) to create the process. Waitforjob waits for a job | |
228 | to complete. These routines take care of process groups if job | |
229 | control is defined. | |
230 | ||
231 | REDIR.C: Ash allows file descriptors to be redirected and then | |
232 | restored without forking off a child process. This is accomplished | |
233 | by duplicating the original file descriptors. The redirtab | |
234 | structure records where the file descriptors have been duplicated | |
235 | to. | |
236 | ||
237 | EXEC.C: The routine find_command locates a command, and enters | |
238 | the command in the hash table if it is not already there. The | |
239 | third argument specifies whether it is to print an error message | |
240 | if the command is not found. (When a pipeline is set up, | |
241 | find_command is called for all the commands in the pipeline before | |
242 | any forking is done, so as to get the commands into the hash | |
243 | table of the parent process. But to make command hashing as | |
244 | transparent as possible, we silently ignore errors at that point | |
245 | and only print error messages if the command cannot be found | |
246 | later.) | |
247 | ||
248 | The routine shellexec is the interface to the exec system call. | |
249 | ||
250 | EXPAND.C: Arguments are processed in three passes. The first | |
251 | (performed by the routine argstr) performs variable and command | |
252 | substitution. The second (ifsbreakup) performs word splitting | |
253 | and the third (expandmeta) performs file name generation. If the | |
254 | "/u" directory is simulated, then when "/u/username" is replaced | |
255 | by the user's home directory, the flag "didudir" is set. This | |
256 | tells the cd command that it should print out the directory name, | |
257 | just as it would if the "/u" directory were implemented using | |
258 | symbolic links. | |
259 | ||
260 | VAR.C: Variables are stored in a hash table. Probably we should | |
261 | switch to extensible hashing. The variable name is stored in the | |
262 | same string as the value (using the format "name=value") so that | |
263 | no string copying is needed to create the environment of a com- | |
264 | mand. Variables which the shell references internally are preal- | |
265 | located so that the shell can reference the values of these vari- | |
266 | ables without doing a lookup. | |
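
The benefit of storing each variable as a single "name=value" string can be seen in the sketch below: building an environment vector is just a matter of collecting pointers. The table size, the hash function, and the exact signatures are assumptions made for this sketch and need not match var.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VTABSIZE 39         /* table size is an assumption */

struct var {
    struct var *next;
    char *text;             /* "name=value" */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part of s, stopping at '=' or the end of the string. */
static unsigned hashvar(const char *s)
{
    unsigned h = 0;

    while (*s != '\0' && *s != '=')
        h = h * 31 + (unsigned char)*s++;
    return h % VTABSIZE;
}

/* Nonzero if a and b have the same name part (up to '=' or the end). */
static int varequal(const char *a, const char *b)
{
    while (*a == *b) {
        if (*a == '\0' || *a == '=')
            return 1;
        a++;
        b++;
    }
    return (*a == '=' || *a == '\0') && (*b == '=' || *b == '\0');
}

/* Enter text ("name=value") in the table, replacing any old value. */
static void setvar(char *text)
{
    struct var **vpp = &vartab[hashvar(text)];
    struct var *vp;

    for (vp = *vpp; vp != NULL; vp = vp->next) {
        if (varequal(vp->text, text)) {
            vp->text = text;
            return;
        }
    }
    vp = malloc(sizeof *vp);
    if (vp == NULL)
        abort();            /* ash would call error here */
    vp->text = text;
    vp->next = *vpp;
    *vpp = vp;
}

/* Return a pointer to the value of name, or NULL if it is not set. */
static const char *lookupvar(const char *name)
{
    struct var *vp;

    for (vp = vartab[hashvar(name)]; vp != NULL; vp = vp->next) {
        if (varequal(vp->text, name)) {
            const char *eq = strchr(vp->text, '=');
            return eq != NULL ? eq + 1 : NULL;
        }
    }
    return NULL;
}

/* Fill envp with up to max-1 entries plus a NULL; no strings are copied. */
static size_t environment(char **envp, size_t max)
{
    size_t n = 0;
    int i;
    struct var *vp;

    assert(max > 0);
    for (i = 0; i < VTABSIZE; i++)
        for (vp = vartab[i]; vp != NULL; vp = vp->next)
            if (n + 1 < max)
                envp[n++] = vp->text;
    envp[n] = NULL;
    return n;
}
```

Since the table holds the exact strings that exec expects, an assignment placed in the table immediately before a command automatically replaces any older entry of the same name.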
267 | ||
268 | When a program is run, the code in eval.c sticks any environment | |
269 | variables which precede the command (as in "PATH=xxx command") in | |
270 | the variable table as the simplest way to strip duplicates, and | |
271 | then calls "environment" to get the value of the environment. | |
272 | There are two consequences of this. First, if an assignment to | |
273 | PATH precedes the command, the value of PATH before the assign- | |
274 | ment must be remembered and passed to shellexec. Second, if the | |
275 | program turns out to be a shell procedure, the strings from the | |
276 | environment variables which preceded the command must be pulled | |
277 | out of the table and replaced with strings obtained from malloc, | |
278 | since the former will automatically be freed when the stack (see | |
279 | the entry on memalloc.c) is emptied. | |
280 | ||
281 | BUILTIN COMMANDS: The procedures for handling these are scat- | |
282 | tered throughout the code, depending on which location appears | |
283 | most appropriate. They can be recognized because their names al- | |
284 | ways end in "cmd". The mapping from names to procedures is | |
285 | specified in the file builtins, which is processed by the mkbuil- | |
286 | tins command. | |
287 | ||
288 | A builtin command is invoked with argc and argv set up like a | |
289 | normal program. A builtin command is allowed to overwrite its | |
290 | arguments. Builtin routines can call nextopt to do option pars- | |
291 | ing. This is kind of like getopt, but you don't pass argc and | |
292 | argv to it. Builtin routines can also call error. This routine | |
293 | normally terminates the shell (or returns to the main command | |
294 | loop if the shell is interactive), but when called from a builtin | |
295 | command it causes the builtin command to terminate with an exit | |
296 | status of 2. | |
297 | ||
298 | The directory bltin contains commands which can be compiled | |
299 | independently but can also be built into the shell for efficiency | |
300 | reasons. The makefile in this directory compiles these programs | |
301 | in the normal fashion (so that they can be run regardless of | |
302 | whether the invoker is ash), but also creates a library named | |
303 | bltinlib.a which can be linked with ash. The header file bltin.h | |
304 | takes care of most of the differences between the ash and the | |
305 | stand-alone environment. The user should call the main routine | |
306 | "main", and #define main to be the name of the routine to use | |
307 | when the program is linked into ash. This #define should appear | |
308 | before bltin.h is included; bltin.h will #undef main if the pro- | |
309 | gram is to be compiled stand-alone. | |
310 | ||
311 | CD.C: This file defines the cd and pwd builtins. The pwd com- | |
312 | mand runs /bin/pwd the first time it is invoked (unless the user | |
313 | has already done a cd to an absolute pathname), but then | |
314 | remembers the current directory and updates it when the cd com- | |
315 | mand is run, so subsequent pwd commands run very fast. The main | |
316 | complication in the cd command is in the docd command, which | |
317 | resolves symbolic links into actual names and informs the user | |
318 | where the user ended up if he crossed a symbolic link. | |
319 | ||
320 | SIGNALS: Trap.c implements the trap command. The routine set- | |
321 | signal figures out what action should be taken when a signal is | |
322 | received and invokes the signal system call to set the signal ac- | |
323 | tion appropriately. When a signal that a user has set a trap for | |
324 | is caught, the routine "onsig" sets a flag. The routine dotrap | |
325 | is called at appropriate points to actually handle the signal. | |
326 | When an interrupt is caught and no trap has been set for that | |
327 | signal, the routine "onint" in error.c is called. | |
328 | ||
329 | OUTPUT: Ash uses its own output routines. There are three output | |
330 | structures allocated. "Output" represents the standard output, | |
331 | "errout" the standard error, and "memout" contains output | |
332 | which is to be stored in memory. This last is used when a buil- | |
333 | tin command appears in backquotes, to allow its output to be col- | |
334 | lected without doing any I/O through the UNIX operating system. | |
335 | The variables out1 and out2 normally point to output and errout, | |
336 | respectively, but they are set to point to memout when appropri- | |
337 | ate inside backquotes. | |
338 | ||
339 | INPUT: The basic input routine is pgetc, which reads from the | |
340 | current input file. There is a stack of input files; the current | |
341 | input file is the top file on this stack. The code allows the | |
342 | input to come from a string rather than a file. (This is for the | |
343 | -c option and the "." and eval builtin commands.) The global | |
344 | variable plinno is saved and restored when files are pushed and | |
345 | popped from the stack. The parser routines store the number of | |
346 | the current line in this variable. | |
347 | ||
348 | DEBUGGING: If DEBUG is defined in shell.h, then the shell will | |
349 | write debugging information to the file $HOME/trace. Most of | |
350 | this is done using the TRACE macro, which takes a set of printf | |
351 | arguments inside two sets of parentheses. Example: | |
352 | "TRACE(("n=%d\n", n))". The double parentheses are necessary because | |
353 | the preprocessor can't handle macros with a variable | |
354 | number of arguments. Defining DEBUG also causes the shell to | |
355 | generate a core dump if it is sent a quit signal. The tracing | |
356 | code is in show.c. |
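
The double-parenthesis trick can be sketched as follows. The whole parenthesized argument list is passed to the macro as a single argument and then pasted after a function name. In this illustration the routine trace writes into a memory buffer (tracebuf, a hypothetical stand-in for the trace file), and DEBUG is defined at the top of the block rather than in shell.h.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1             /* defined here only for the sake of the sketch */

static char tracebuf[256];  /* stands in for the trace file */

/* printf-style routine that the TRACE macro expands into a call to. */
static void trace(const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vsnprintf(tracebuf, sizeof tracebuf, fmt, ap);
    va_end(ap);
}

#ifdef DEBUG
#define TRACE(args) trace args      /* TRACE((x, y)) becomes trace (x, y) */
#else
#define TRACE(args) ((void)0)       /* compiled out when DEBUG is not set */
#endif
```

A call such as TRACE(("n=%d\n", n)) therefore costs nothing when DEBUG is undefined, because the entire argument list disappears along with the macro.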