Commit | Line | Data |
---|---|---|
f2a0d81d C |
1 | .... refer | tbl | nroff -ms |
2 | .de UC | |
3 | \\s-2\\$1\\s0\\$2 | |
4 | .. | |
5 | .ds . \&\s+2.\s0 | |
6 | .if t .ds -- \(em | |
7 | .if n .ds -- -- | |
8 | .TR 69 | |
9 | .TM 77-1274-17 39199 39199-11 | |
10 | .ND October 27, 1977 | |
11 | .ND June 21, 1978 | |
12 | .TL | |
13 | Some Applications of Inverted Indexes on the UNIX System | |
14 | .AU "MH 2C-572" 6377 | |
15 | M. E. Lesk | |
16 | .AI | |
17 | .MH | |
18 | .AB | |
19 | .LP | |
20 | .ft B | |
21 | I. Some Applications of Inverted Indexes \- Overview | |
22 | .ft R | |
23 | .PP | |
24 | This memorandum describes a set of programs which | |
25 | make inverted indexes to | |
26 | UNIX* | |
27 | text files, and their | |
28 | application to | |
29 | retrieving and formatting citations for documents prepared using | |
30 | .I troff. | |
31 | .PP | |
32 | The indexing and searching programs make keyword | |
33 | indexes to volumes of material too large for linear searching. | |
34 | Searches for combinations of single words can be performed quickly. | |
35 | The programs for general searching are divided into | |
36 | two phases. The first makes an index from the original | |
37 | data; the second searches the index and retrieves | |
38 | items. | |
39 | Both of these phases are further divided into two parts | |
40 | to separate the data-dependent and algorithm dependent | |
41 | code. | |
42 | .PP | |
43 | The major current application of these programs is | |
44 | the | |
45 | .I troff | |
46 | preprocessor | |
47 | .I refer. | |
48 | A list of 4300 references is maintained on line, | |
49 | containing primarily papers written and cited by | |
50 | local authors. | |
51 | Whenever one of these references is required | |
52 | in a paper, a few words from the title or author list | |
53 | will retrieve it, and the user need not bother to re-enter | |
54 | the exact citation. | |
55 | Alternatively, authors can use their own lists of papers. | |
56 | .PP | |
57 | This memorandum is of interest to | |
58 | those who are interested in facilities for searching large | |
59 | but relatively unchanging text files on | |
60 | the | |
61 | UNIX | |
62 | system, | |
63 | and those who are interested in handling bibliographic | |
64 | citations with | |
65 | UNIX | |
66 | .I troff. | |
67 | .LP | |
68 | .ft B | |
69 | II. Updating Publication Lists | |
70 | .PP | |
71 | This section is a brief note describing the | |
72 | auxiliary programs for managing the updating | |
73 | processing. | |
74 | It is written to aid clerical users in | |
75 | maintaining lists of references. | |
76 | Primarily, the programs described permit a large | |
77 | amount of individual control over the content | |
78 | of publication lists while retaining the | |
79 | usefulness of the files to other users. | |
80 | .LP | |
81 | .ft B | |
82 | III. Manual Pages | |
83 | .PP | |
84 | This section contains the pages from the | |
85 | UNIX programmer's manual | |
86 | dealing with these commands. | |
87 | It is useful for reference. | |
88 | .sp | |
89 | \l'3i' | |
90 | .br | |
91 | * UNIX is a Trademark of Bell Laboratories. | |
92 | .AE | |
93 | .CS 10 4 14 0 0 4 | |
94 | .NH | |
95 | Introduction. | |
96 | .PP | |
97 | The | |
98 | .UX | |
99 | system | |
100 | has many utilities | |
101 | (e.g. \fIgrep, awk, lex, egrep, fgrep, ...\fR) | |
102 | to search through files of text, | |
103 | but most of them are based on a linear scan through the | |
104 | entire file, using some deterministic automaton. | |
105 | .ev 1 | |
106 | .ps 8 | |
107 | .vs 10p | |
108 | .ev | |
109 | This memorandum discusses a program which uses inverted | |
110 | indexes | |
111 | .[ | |
112 | %A D. Knuth | |
113 | %T The Art of Computer Programming: Vol. 3, Sorting and Searching | |
114 | %I Addison-Wesley | |
115 | %C Reading, Mass. | |
116 | %D 1977 | |
117 | %O See section 6.5. | |
118 | .] | |
119 | and can thus be used on much larger data bases. | |
120 | .PP | |
121 | As with any indexing system, of course, there are some disadvantages; | |
122 | once an index is made, the files that have been indexed can not be changed | |
123 | without remaking the index. | |
124 | Thus applications are restricted | |
125 | to those making many searches | |
126 | of relatively stable data. | |
127 | Furthermore, these programs depend on hashing, and can only | |
128 | search for exact matches of whole keywords. | |
129 | It is not possible to look for | |
130 | arithmetic or logical expressions (e.g. ``date greater than 1970'') or | |
131 | for regular expression searching such as that in | |
132 | .I lex. | |
133 | .[ | |
134 | lex lesk cstr | |
135 | .] | |
136 | .PP | |
137 | Currently there are two uses of this software, | |
138 | the | |
139 | .I refer | |
140 | preprocessor to format references, | |
141 | and the | |
142 | .I lookall | |
143 | command to search through all text files on | |
144 | the | |
145 | .UX | |
146 | system. | |
147 | .PP | |
148 | The remaining sections of this memorandum discuss | |
149 | the searching programs and their uses. | |
150 | Section 2 explains the operation of the searching algorithm and describes | |
151 | the data collected for use with the | |
152 | .I lookall | |
153 | command. | |
154 | The more important application, | |
155 | .I refer | |
156 | has a user's description in section 3. | |
157 | Section 4 goes into more detail on | |
158 | reference files | |
159 | for the benefit of those who | |
160 | wish to add references to data bases or | |
161 | write new | |
162 | .I troff | |
163 | macros for use with | |
164 | .I refer. | |
165 | The options to make | |
166 | .I refer | |
167 | collect identical citations, or otherwise relocate and adjust references, | |
168 | are described in section 5. | |
169 | The | |
170 | .UX | |
171 | manual sections for | |
172 | .I "refer, lookall," | |
173 | and associated commands are attached as appendices. | |
174 | .NH | |
175 | Searching. | |
176 | .PP | |
177 | The indexing and searching process is divided into two phases, | |
178 | each made of two parts. | |
179 | These are | |
180 | shown below. | |
181 | .IP A. | |
182 | Construct the index. | |
183 | .RS | |
184 | .IP (1) | |
185 | Find keys \*(-- turn the input files into a sequence of tags and keys, | |
186 | where each tag identifies a distinct item in the input | |
187 | and the keys for each such item are the strings under which it is | |
188 | to be indexed. | |
189 | .IP (2) | |
190 | Hash and sort \*(-- | |
191 | prepare a set of inverted indexes from which, given a set of keys, | |
192 | the appropriate item tags can be found quickly. | |
193 | .RE | |
194 | .IP B. | |
195 | Retrieve an item in response to a query. | |
196 | .RS | |
197 | .IP (3) | |
198 | Search \*(-- | |
199 | Given some keys, look through the files prepared by the hashing | |
200 | and sorting facility and derive the appropriate tags. | |
201 | .IP (4) | |
202 | Deliver \*(-- | |
203 | Given the tags, find the original items. This completes the | |
204 | searching process. | |
205 | .RE | |
206 | .LP | |
207 | The first phase, making the index, is presumably done relatively infrequently. | |
208 | It should, of course, be done whenever the data being | |
209 | indexed change. | |
210 | In contrast, the second phase, retrieving items, | |
211 | is presumably done often, and must be rapid. | |
212 | .PP | |
213 | An effort is made to separate code which depends on the data | |
214 | being handled from code which depends on the searching procedure. | |
215 | The search algorithm is involved only in programs | |
216 | (2) and (3), while knowledge of the actual data files is | |
217 | needed only by programs (1) and (4). | |
218 | Thus it is easy to adapt to different data files or different | |
219 | search algorithms. | |
220 | .PP | |
221 | To start with, it is necessary to have some way of selecting | |
222 | or generating keys from input files. | |
223 | For dealing with files that are basically English, we have | |
224 | a key-making program which automatically selects words | |
225 | and passes them to the hashing and sorting program (step 2). | |
226 | The format used has one line for each input item, | |
227 | arranged | |
228 | as follows: | |
229 | .DS | |
230 | name:start,length (tab) key1 key2 key3 ... | |
231 | .DE | |
232 | where | |
233 | .I name | |
234 | is the file name, | |
235 | .I start | |
236 | is the starting byte number, | |
237 | and | |
238 | .I length | |
239 | is the number of bytes in the entry. | |
240 | .PP | |
241 | These lines are the only input used to make the | |
242 | index. | |
243 | The first field (the file name, byte position, and byte count) | |
244 | is the tag of the item | |
245 | and can be used to retrieve it quickly. | |
246 | Normally, an item is either a whole file or a section of a file | |
247 | delimited by blank lines. | |
248 | After the tab, the second field contains the keys. | |
249 | The keys, if selected by the automatic program, are | |
250 | any alphanumeric strings which | |
251 | are not among the 100 most frequent words in English | |
252 | and which are not entirely numeric (except for four-digit | |
253 | numbers beginning 19, which are accepted as dates). | |
254 | Keys are truncated to six characters and converted to lower case. | |
255 | Some selection is needed if the original items are very large. | |
256 | We normally just take the first | |
257 | .I n | |
258 | keys, with | |
259 | .I n | |
260 | less than 100 or so; this replaces any attempt at intelligent selection. | |
261 | One file in our system is | |
262 | a complete English dictionary; it would presumably be retrieved for all queries. | |
263 | .PP | |
264 | To generate an inverted index to the list of record tags and keys, | |
265 | the keys | |
266 | are hashed | |
267 | and sorted to produce an index. | |
268 | What is wanted, ideally, is a series of lists showing the tags associated | |
269 | with each key. | |
270 | To condense this, | |
271 | what is actually produced is a list showing the tags associated | |
272 | with each hash code, and thus with some set of keys. | |
273 | To speed up access and further save space, | |
274 | a set of three or possibly four files is produced. | |
275 | These files are: | |
276 | .KS | |
277 | .bd 2 2 | |
278 | .TS | |
279 | center; | |
280 | c c | |
281 | lI l. | |
282 | File Contents | |
283 | entry Pointers to posting file | |
284 | for each hash code | |
285 | posting Lists of tag pointers for | |
286 | each hash code | |
287 | tag Tags for each item | |
288 | key Keys for each item | |
289 | (optional) | |
290 | .TE | |
291 | .bd 2 | |
292 | .KE | |
293 | The posting file comprises the real data: it contains a sequence of lists | |
294 | of items posted under each hash code. To speed up searching, | |
295 | the entry file is an array of pointers into the posting file, one per potential | |
296 | hash code. | |
297 | Furthermore, the items in the lists in the posting file are not referred to by their | |
298 | complete tag, but just by an address in the tag file, which | |
299 | gives the complete tags. | |
300 | The key file is optional and contains a copy of the keys | |
301 | used in the indexing. | |
302 | .PP | |
303 | The searching process starts with a query, containing several keys. | |
304 | The goal is to obtain all items which were indexed under these keys. | |
305 | The query keys are hashed, and the pointers in the entry file used | |
306 | to access the lists in the posting file. These lists | |
307 | are addresses in the tag file of documents posted under the | |
308 | hash codes derived from the query. | |
309 | The common items from all lists are determined; | |
310 | this must include the items indexed by every key, but may also | |
311 | contain some items which are false drops, since items referenced by | |
312 | the correct hash codes need not actually have contained the correct keys. | |
313 | Normally, if there are several keys in the query, there are not | |
314 | likely to be many false drops in the final combined list even though | |
315 | each hash code is somewhat ambiguous. | |
316 | The actual tags are then obtained from the tag file, and to guard against | |
317 | the possibility that an item has false-dropped on some hash code | |
318 | in the query, the original items are normally obtained from the delivery | |
319 | program (4) and the query keys checked against them | |
320 | by string comparison. | |
321 | .PP | |
322 | Usually, therefore, the check for bad drops is made against the original file. | |
323 | However, if the key derivation procedure is complex, it may be preferable | |
324 | to check against the keys fed to program (2). | |
325 | In this case the optional key file which contains the | |
326 | keys associated with each item is generated, and the item tag is supplemented | |
327 | by a string | |
328 | .DS | |
329 | ;start,length | |
330 | .DE | |
331 | which indicates the starting byte number in the key file and the length of | |
332 | the string of keys for each item. | |
333 | This file is not usually necessary with the present | |
334 | key-selection program, since the keys | |
335 | always appear in the original document. | |
336 | .PP | |
337 | There is also an option | |
338 | (\f3-C\f2n\|\f1) | |
339 | for coordination level searching. | |
340 | This retrieves items which match all but | |
341 | .I n | |
342 | of the query keys. | |
343 | The items are retrieved in the order of the number | |
344 | of keys that they match. | |
345 | Of course, | |
346 | .I n | |
347 | must be less than the number of query keys (nothing is | |
348 | retrieved unless it matches at least one key). | |
349 | .PP | |
350 | As an example, consider one set of 4377 references, comprising | |
351 | 660,000 bytes. | |
352 | This included 51,000 keys, of which 5,900 were distinct | |
353 | keys. | |
354 | The hash table is kept full to save space (at the expense of time); | |
355 | 995 of 997 possible hash codes were used. | |
356 | The total set of index files (no key file) included 171,000 bytes, | |
357 | about 26% of the original file size. | |
358 | It took 8 minutes of processor time to | |
359 | hash, sort, and write the index. | |
360 | To search for a single query with the resulting index took 1.9 seconds | |
361 | of processor time, | |
362 | while to find the same paper | |
363 | with a sequential linear search | |
364 | using | |
365 | .I grep | |
366 | (reading all of the tags and keys) | |
367 | took 12.3 seconds of processor time. | |
368 | .PP | |
369 | We have also used this software to index all of the English stored on our | |
370 | .UX | |
371 | system. | |
372 | This is the index searched by the | |
373 | .I lookall | |
374 | command. | |
375 | On a typical day there were | |
376 | 29,000 files in our user file system, containing about 152,000,000 | |
377 | bytes. | |
378 | Of these | |
379 | 5,300 files, containing 32,000,000 bytes (about 21%) | |
380 | were English text. | |
381 | The total number of `words' (determined mechanically) | |
382 | was 5,100,000. | |
383 | Of these 227,000 were selected as keys; | |
384 | 19,000 were distinct, hashing to 4,900 (of 5,000 possible) different hash codes. | |
385 | The | |
386 | resulting inverted file indexes used 845,000 bytes, or about | |
387 | 2.6% of the size of the original files. | |
388 | The particularly small indexes are caused by the | |
389 | fact that keys are taken from only the first 50 non-common words of | |
390 | some very long input files. | |
391 | .PP | |
392 | Even this large \f2lookall\f1 index can be searched quickly. | |
393 | For example, to find this document | |
394 | by looking for the keys | |
395 | ``lesk inverted indexes'' | |
396 | required | |
397 | 1.7 seconds of processor time | |
398 | and system time. | |
399 | By comparison, just to search the 800,000 byte dictionary (smaller than even | |
400 | the inverted indexes, let alone the 27,000,000 bytes of text files) with | |
401 | .I grep | |
402 | takes 29 seconds of processor time. | |
403 | The | |
404 | .I lookall | |
405 | program is thus useful when looking for a document which you believe | |
406 | is stored on-line, but do not know where. For example, many memos | |
407 | from our center are in the file system, but it is often | |
408 | difficult to guess where a particular memo might be (it might have several | |
409 | authors, each with many directories, and have been worked on by | |
410 | a secretary with yet more directories). | |
411 | Instructions for the use of the | |
412 | .I lookall | |
413 | command are given in the manual section, shown | |
414 | in the appendix to this memorandum. | |
415 | .PP | |
416 | The only indexes maintained routinely are those of publication lists and | |
417 | all English files. | |
418 | To make other indexes, the programs for making keys, sorting them, | |
419 | searching the indexes, and delivering answers must be used. | |
420 | Since they are usually invoked as parts of higher-level commands, | |
421 | they are not in the default command | |
422 | directory, but are available to any user in the directory | |
423 | .I /usr/lib/refer . | |
424 | Three programs are of interest: | |
425 | .I mkey , | |
426 | which isolates keys from input files; | |
427 | .I inv , | |
428 | which makes an index from a set of keys; | |
429 | and | |
430 | .I hunt , | |
431 | which searches the index and delivers the items. | |
432 | Note that the two parts of the retrieval phase are combined into | |
433 | one program, to avoid the excessive system work and delay which | |
434 | would result from running these as separate processes. | |
435 | .PP | |
436 | These three commands have a large number of options to adapt to different | |
437 | kinds of input. | |
438 | The user not interested in the detailed description that now follows may | |
439 | skip to section 3, which describes the | |
440 | .I refer | |
441 | program, a packaged-up version of these tools specifically | |
442 | oriented towards formatting references. | |
443 | .PP | |
444 | .B | |
445 | Make Keys. | |
446 | .R | |
447 | The program | |
448 | .I mkey | |
449 | is the key-making program corresponding to step (1) in phase A. | |
450 | Normally, it reads its input from the file names given as arguments, | |
451 | and if there are no arguments it reads from the standard input. | |
452 | It assumes that blank lines in the input delimit | |
453 | separate items, for each of which a different line of | |
454 | keys should be generated. | |
455 | The lines of keys are written on the standard output. | |
456 | Keys are any alphanumeric string in the input not | |
457 | among the most frequent words in English and not entirely numeric | |
458 | (except that all-numeric strings are acceptable if they | |
459 | are between 1900 and 1999). | |
460 | In the output, keys are translated to lower case, and truncated | |
461 | to six characters in length; any associated punctuation is removed. | |
462 | The following flag arguments are recognized by | |
463 | .I mkey: | |
464 | .TS | |
465 | center; | |
466 | lB lw(4i). | |
467 | \-c \f2name T{ | |
468 | Name of file of common words; | |
469 | default is | |
470 | .I /usr/lib/eign. | |
471 | T} | |
472 | \-f \f2name T{ | |
473 | Read a list of files from | |
474 | .I name | |
475 | and take each as an input argument. | |
476 | T} | |
477 | \-i \f2chars T{ | |
478 | Ignore all lines which begin with `%' followed by any character | |
479 | in | |
480 | .I chars . | |
481 | T} | |
482 | \-k\f2n T{ | |
483 | Use at most | |
484 | .I n | |
485 | keys per input item. | |
486 | T} | |
487 | \-l\f2n T{ | |
488 | Ignore items shorter than | |
489 | .I n | |
490 | letters long. | |
491 | T} | |
492 | \-n\f2m T{ | |
493 | Ignore as a key any word in the first | |
494 | .I m | |
495 | words of the list of common English words. | |
496 | The default is 100. | |
497 | T} | |
498 | \-s T{ | |
499 | Remove the labels | |
500 | .I (file:start,length) | |
501 | from the output; just give the keys. | |
502 | Used when searching rather than indexing. | |
503 | T} | |
504 | \-w T{ | |
505 | Each whole file is a separate item; | |
506 | blank lines in files are irrelevant. | |
507 | T} | |
508 | .TE | |
509 | .PP | |
510 | The normal arguments for indexing references are | |
511 | the defaults, which are | |
512 | .I "\-c /usr/lib/eign" , | |
513 | .I \-n100 , | |
514 | and | |
515 | .I \-l3 . | |
516 | For searching, the | |
517 | .I \-s | |
518 | option is also needed. | |
519 | When the big | |
520 | .I lookall | |
521 | index of all English files is run, | |
522 | the options are | |
523 | .I \-w , | |
524 | .I \-k50 , | |
525 | and | |
526 | .I "\-f (filelist)" . | |
527 | When running on textual input, | |
528 | the | |
529 | .I mkey | |
530 | program processes about 1000 English words per processor second. | |
531 | Unless the | |
532 | .I \-k | |
533 | option is used (and the input files are long enough for | |
534 | it to take effect) | |
535 | the output of | |
536 | .I mkey | |
537 | is comparable in size to its input. | |
538 | .PP | |
539 | .B | |
540 | Hash and invert. | |
541 | .R | |
542 | The | |
543 | .I inv | |
544 | program computes the hash codes and writes | |
545 | the inverted files. | |
546 | It reads the output of | |
547 | .I mkey | |
548 | and writes the set of files described earlier | |
549 | in this section. | |
550 | It expects one argument, which is used as the base name for | |
551 | the three (or four) files to be written. | |
552 | Assuming an argument of | |
553 | .I Index | |
554 | (the default) | |
555 | the entry file is named | |
556 | .I Index.ia , | |
557 | the posting file | |
558 | .I Index.ib , | |
559 | the tag file | |
560 | .I Index.ic , | |
561 | and the key file (if present) | |
562 | .I Index.id . | |
563 | The | |
564 | .I inv | |
565 | program recognizes the following options: | |
566 | .TS | |
567 | center; | |
568 | lB lw(4i). | |
569 | \-a T{ | |
570 | Append the new keys to a previous set of inverted files, | |
571 | making new files if there is no old set using the same base name. | |
572 | T} | |
573 | \-d T{ | |
574 | Write the optional key file. | |
575 | This is needed when you can not check for false drops by looking | |
576 | for the keys in the original inputs, i.e. when the key derivation | |
577 | procedure is complicated and | |
578 | the output keys are not words from the input files. | |
579 | T} | |
580 | \-h\f2n T{ | |
581 | The hash table size is | |
582 | .I n | |
583 | (default 997); | |
584 | .I n | |
585 | should be prime. | |
586 | Making \f2n\f1 bigger saves search time and spends disk space. | |
587 | T} | |
588 | \-i[u] \f2name T{ | |
589 | Take input from file | |
590 | .I name , | |
591 | instead of the standard input; | |
592 | if | |
593 | .B u | |
594 | is present | |
595 | .I name | |
596 | is unlinked when the sort is started. | |
597 | Using this option permits the sort scratch space | |
598 | to overlap the disk space used for input keys. | |
599 | T} | |
600 | \-n T{ | |
601 | Make a completely new set of inverted files, ignoring | |
602 | previous files. | |
603 | T} | |
604 | \-p T{ | |
605 | Pipe into the sort program, rather than writing a temporary | |
606 | input file. | |
607 | This saves disk space and spends processor time. | |
608 | T} | |
609 | \-v T{ | |
610 | Verbose mode; print a summary of the number of keys which | |
611 | finished indexing. | |
612 | T} | |
613 | .TE | |
614 | .PP | |
615 | About half the time used in | |
616 | .I inv | |
617 | is in the contained sort. | |
618 | Assuming the sort is roughly linear, however, | |
619 | a guess at the total timing for | |
620 | .I inv | |
621 | is 250 keys per second. | |
622 | The space used is usually of more importance: | |
623 | the entry file uses four bytes per possible hash (note | |
624 | the | |
625 | .B \-h | |
626 | option), | |
627 | and the tag file around 15-20 bytes per item indexed. | |
628 | Roughly, the posting file contains one item for each key instance | |
629 | and one item for each possible hash code; the items are two bytes | |
630 | long if the tag file is less than 65336 bytes long, and the | |
631 | items are four bytes wide if the tag file is greater than | |
632 | 65536 bytes long. | |
633 | Note that to minimize storage, the hash tables should be | |
634 | over-full; | |
635 | for most of the files indexed in this way, there is no | |
636 | other real choice, since the | |
637 | .I entry | |
638 | file must fit in memory. | |
639 | .PP | |
640 | .B | |
641 | Searching and Retrieving. | |
642 | .R | |
643 | The | |
644 | .I hunt | |
645 | program retrieves items from an index. | |
646 | It combines, as mentioned above, the two parts of phase (B): | |
647 | search and delivery. | |
648 | The reason why it is efficient to combine delivery and search | |
649 | is partly to avoid starting unnecessary processes, and partly | |
650 | because the delivery operation must be a part of the search | |
651 | operation in any case. | |
652 | Because of the hashing, the search part takes place in two stages: | |
653 | first items are retrieved which have the right hash codes associated with them, | |
654 | and then the actual items are inspected to determine false drops, i.e. | |
655 | to determine if anything with the right hash codes doesn't really have the right | |
656 | keys. | |
657 | Since the original item is retrieved to check on false drops, | |
658 | it is efficient to present it immediately, rather than only | |
659 | giving the tag as output and later retrieving the | |
660 | item again. | |
661 | If there were a separate key file, this argument would not apply, | |
662 | but separate key files are not common. | |
663 | .PP | |
664 | Input to | |
665 | .I hunt | |
666 | is taken from the standard input, | |
667 | one query per line. | |
668 | Each query should be in | |
669 | .I "mkey \-s" | |
670 | output format; | |
671 | all lower case, no punctuation. | |
672 | The | |
673 | .I hunt | |
674 | program takes one argument which specifies the base name of the index | |
675 | files to be searched. | |
676 | Only one set of index files can be searched at a time, | |
677 | although many text files may be indexed as a group, of course. | |
678 | If one of the text files has been changed since the index, that file | |
679 | is searched with | |
680 | .I fgrep; | |
681 | this may occasionally slow down the searching, and care should be taken to | |
682 | avoid having many out of date files. | |
683 | The following option arguments are recognized by | |
684 | .I hunt: | |
685 | .TS | |
686 | center; | |
687 | lB lw(4i). | |
688 | \-a T{ | |
689 | Give all output; ignore checking for false drops. | |
690 | T} | |
691 | \-C\f2n T{ | |
692 | Coordination level | |
693 | .I n; | |
694 | retrieve items with not more than | |
695 | .I n | |
696 | terms of the input missing; | |
697 | default | |
698 | .I C0 , | |
699 | implying that each search term must be in the output items. | |
700 | T} | |
701 | \-F[yn\f2d\f3\|] T{ | |
702 | ``\-Fy'' gives the text of all the items found; | |
703 | ``\-Fn'' suppresses them. | |
704 | ``\-F\f2d\|\f1'' where \f2d\f1\| is an integer | |
705 | gives the text of the first \f2d\f1 items. | |
706 | The default is | |
707 | .I \-Fy. | |
708 | T} | |
709 | \-g T{ | |
710 | Do not use | |
711 | .I fgrep | |
712 | to search files changed since the index was made; | |
713 | print an error comment instead. | |
714 | T} | |
715 | \-i \f2string T{ | |
716 | Take | |
717 | .I string | |
718 | as input, instead of reading the standard input. | |
719 | T} | |
720 | \-l \f2n T{ | |
721 | The maximum length of internal lists of candidate | |
722 | items is | |
723 | .I n; | |
724 | default 1000. | |
725 | T} | |
726 | \-o \f2string T{ | |
727 | Put text output (``\-Fy'') in | |
728 | .I string; | |
729 | of use | |
730 | .I only | |
731 | when | |
732 | invoked from another program. | |
733 | T} | |
734 | \-p T{ | |
735 | Print hash code frequencies; mostly | |
736 | for use in optimizing hash table sizes. | |
737 | T} | |
738 | \-T[yn\f2d\|\f3] T{ | |
739 | ``\-Ty'' gives the tags of the items found; | |
740 | ``\-Tn'' suppresses them. | |
741 | ``\-T\f2d\f1\|'' where \f2d\f1\| is an integer | |
742 | gives the first \f2d\f1 tags. | |
743 | The default is | |
744 | .I \-Tn . | |
745 | T} | |
746 | \-t \f2string T{ | |
747 | Put tag output (``\-Ty'') in | |
748 | .I string; | |
749 | of use | |
750 | .I only | |
751 | when invoked from another program. | |
752 | T} | |
753 | .TE | |
754 | .PP | |
755 | The timing of | |
756 | .I hunt | |
757 | is complex. | |
758 | Normally the hash table is overfull, so that there will | |
759 | be many false drops on any single term; | |
760 | but a multi-term query will have few false drops on | |
761 | all terms. | |
762 | Thus if a query is underspecified (one search term) | |
763 | many potential items will be examined and discarded as false | |
764 | drops, wasting time. | |
765 | If the query is overspecified (a dozen search terms) | |
766 | many keys will be examined only to verify that | |
767 | the single item under consideration has that key posted. | |
768 | The variation of search time with number of keys is | |
769 | shown in the table below. | |
770 | Queries of varying length were constructed to retrieve | |
771 | a particular document from the file of references. | |
772 | In the sequence to the left, search terms were chosen so as | |
773 | to select the desired paper as quickly as possible. | |
774 | In the sequence on the right, terms were chosen inefficiently, | |
775 | so that the query did not uniquely select the desired document | |
776 | until four keys had been used. | |
777 | The same document was the target in each case, | |
778 | and the final set of eight keys are also identical; the differences | |
779 | at five, six and seven keys are produced by measurement error, not | |
780 | by the slightly different key lists. | |
781 | .TS | |
782 | center; | |
783 | c s s s5 | c s s s | |
784 | cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8 | |
785 | cp8 cp8 cp8 cp8 | cp8 cp8 cp8 cp8 | |
786 | n n n n | n n n n . | |
787 | Efficient Keys Inefficient Keys | |
788 | No. keys Total drops Retrieved Search time No. keys Total drops Retrieved Search time | |
789 | (incl. false) Documents (seconds) (incl. false) Documents (seconds) | |
790 | 1 15 3 1.27 1 68 55 5.96 | |
791 | 2 1 1 0.11 2 29 29 2.72 | |
792 | 3 1 1 0.14 3 8 8 0.95 | |
793 | 4 1 1 0.17 4 1 1 0.18 | |
794 | 5 1 1 0.19 5 1 1 0.21 | |
795 | 6 1 1 0.23 6 1 1 0.22 | |
796 | 7 1 1 0.27 7 1 1 0.26 | |
797 | 8 1 1 0.29 8 1 1 0.29 | |
798 | .TE | |
799 | As would be expected, the optimal search is achieved | |
800 | when the query just specifies the answer; however, | |
801 | overspecification is quite cheap. | |
802 | Roughly, the time required by | |
803 | .I hunt | |
804 | can be approximated as | |
805 | 30 milliseconds per search key plus 75 milliseconds | |
806 | per dropped document (whether it is a false drop or | |
807 | a real answer). | |
808 | In general, overspecification can be recommended; | |
809 | it protects the user against additions to the data base | |
810 | which turn previously uniquely-answered queries | |
811 | into ambiguous queries. | |
812 | .PP | |
813 | The careful reader will have noted an enormous discrepancy between these times | |
814 | and the earlier quoted time of around 1.9 seconds for a search. The times | |
815 | here are purely for the search and retrieval: they are measured by | |
816 | running many searches through a single invocation of the | |
817 | .I hunt | |
818 | program alone. | |
819 | The normal retrieval operation involves using the shell to | |
820 | set up a pipeline through | |
821 | .I mkey | |
822 | to | |
823 | .I hunt | |
824 | and starting both processes; this adds a fixed overhead of about 1.7 seconds | |
825 | of processor time | |
826 | to any single search. | |
827 | Furthermore, remember that all these times are processor times: | |
828 | on a typical morning on our \s-2PDP\s0 11/70 system, with about one dozen | |
829 | people logged on, | |
830 | to obtain 1 second of processor time for the search program | |
831 | took between 2 and 12 seconds of real time, with a median of | |
832 | 3.9 seconds and a mean of 4.8 seconds. | |
833 | Thus, although the work involved in a single search may be only | |
834 | 200 milliseconds, after you add the 1.7 seconds of startup processor | |
835 | time | |
836 | and then assume a 4:1 elapsed/processor time | |
837 | ratio, it will be 8 seconds before any response is printed. | |
838 | .NH | |
839 | Selecting and Formatting References for T\s-2ROFF\s0 | |
840 | .PP | |
841 | The major application of the retrieval software | |
842 | is | |
843 | .I refer, | |
844 | which is a | |
845 | .I troff | |
846 | preprocessor | |
847 | like | |
848 | .I eqn . | |
849 | .[ | |
850 | kernighan cherry acm 1975 | |
851 | .] | |
852 | It scans its input looking for items of the form | |
853 | .DS | |
854 | \*.[ | |
855 | imprecise citation | |
856 | \*.\^] | |
857 | .DE | |
858 | where an imprecise citation is merely a string | |
859 | of words found in the relevant bibliographic citation. | |
860 | This is translated into a properly formatted reference. | |
861 | If the imprecise citation does not correctly identify | |
862 | a single paper | |
863 | (either | |
864 | selecting no papers or too many) a message is given. | |
865 | The data base of citations searched may be tailored to each | |
866 | system, and individual users may specify their own | |
867 | citation | |
868 | files. | |
869 | On our system, the default data base is accumulated from | |
870 | the publication lists of the members of our organization, plus | |
871 | about half a dozen personal bibliographies that were collected. | |
872 | The present total is about 4300 citations, but this increases steadily. | |
873 | Even now, | |
874 | the data base covers a large fraction of local citations. | |
875 | .PP | |
876 | For example, the reference for the | |
877 | .I eqn | |
878 | paper above was specified as | |
879 | .DS | |
880 | \&\*.\*.\*. | |
881 | \&preprocessor like | |
882 | \&.I eqn. | |
883 | \&.[ | |
884 | \&kernighan cherry acm 1975 | |
885 | \&.] | |
886 | \&It scans its input looking for items | |
887 | \&\*.\*.\*. | |
888 | .DE | |
889 | This paper was itself printed using | |
890 | .I refer. | |
891 | The above input text was processed by | |
892 | .I refer | |
893 | as well as | |
894 | .I tbl | |
895 | and | |
896 | .I troff | |
897 | by the command | |
898 | .DS | |
899 | .ft I | |
900 | refer memo-file | tbl | troff \-ms | |
901 | .ft R | |
902 | .DE | |
903 | and the reference was automatically translated into a correct | |
904 | citation to the ACM paper on mathematical typesetting. | |
905 | .PP | |
906 | The procedure to use to place a reference in a paper | |
907 | using | |
908 | .I refer | |
909 | is as follows. | |
910 | First, use the | |
911 | .I lookbib | |
912 | command to check that the paper is in the data base | |
913 | and to find out what keys are necessary to retrieve it. | |
914 | This is done by typing | |
915 | .I lookbib | |
916 | and then typing some potential queries until | |
917 | a suitable query is found. | |
918 | For example, had one started to find | |
919 | the | |
920 | .I eqn | |
921 | paper shown above by presenting the query | |
922 | .DS | |
923 | $ lookbib | |
924 | kernighan cherry | |
925 | (EOT) | |
926 | .DE | |
927 | .I lookbib | |
928 | would have found several items; experimentation would quickly | |
929 | have shown that the query given above is adequate. | |
930 | Overspecifying the query is of course harmless. | |
931 | A particularly careful reader may have noticed that ``acm'' does not | |
932 | appear in the printed citation; | |
933 | we have supplemented some of the data base items with common | |
934 | extra keywords, such as common abbreviations for journals | |
935 | or other sources, to aid in searching. | |
936 | .PP | |
937 | If the reference is in the data base, the query | |
938 | that retrieved it can be inserted in the text, | |
939 | between | |
940 | .B \*.[ | |
941 | and | |
942 | .B \*.\^] | |
943 | brackets. | |
944 | If it is not in the data base, it can be typed | |
945 | into a private file of references, using the format | |
946 | discussed in the next section, and then | |
947 | the | |
948 | .B \-p | |
949 | option | |
950 | used to search this private file. | |
951 | Such a command might read | |
952 | (if the private references are called | |
953 | .I myfile ) | |
954 | .DS | |
955 | .ft 2 | |
956 | refer \-p myfile document | tbl | eqn | troff \-ms \*. \*. \*. | |
957 | .ft 1 | |
958 | .DE | |
959 | where | |
960 | .I tbl | |
961 | and/or | |
962 | .I eqn | |
963 | could be omitted if not needed. | |
964 | The use | |
965 | of the | |
966 | .I \-ms | |
967 | macros | |
968 | .[ | |
969 | lesk typing documents unix gcos | |
970 | .] | |
971 | or some other macro package, however, | |
972 | is essential. | |
973 | .I Refer | |
974 | only generates the data for the references; exact formatting | |
975 | is done by some macro package, and if none is supplied the | |
976 | references will not be printed. | |
977 | .PP | |
978 | By default, | |
979 | the references are numbered sequentially, | |
980 | and | |
981 | the | |
982 | .I \-ms | |
983 | macros format references as footnotes at the bottom of the page. | |
984 | This memorandum is an example of that style. | |
985 | Other possibilities are discussed in section 5 below. | |
986 | .NH | |
987 | Reference Files. | |
988 | .PP | |
989 | A reference file is a set of bibliographic references usable with | |
990 | .I refer. | |
991 | It can be indexed using the software described in section 2 | |
992 | for fast searching. | |
993 | What | |
994 | .I refer | |
995 | does is to read the input document stream, | |
996 | looking for imprecise citation references. | |
997 | It then searches through reference files to find | |
998 | the full citations, and inserts them into the | |
999 | document. | |
1000 | The format of the full citation is arranged to make it | |
1001 | convenient for a macro package, such as the | |
1002 | .I \-ms | |
1003 | macros, to format the reference | |
1004 | for printing. | |
1005 | Since | |
1006 | the format of the final reference is determined | |
1007 | by the desired style of output, | |
1008 | which is determined by the macros used, | |
1009 | .I refer | |
1010 | avoids forcing any kind of reference appearance. | |
1011 | All it does is define a set of string registers which | |
1012 | contain the basic information about the reference; | |
1013 | and provide a macro call which is expanded by the macro | |
1014 | package to format the reference. | |
1015 | It is the responsibility of the final macro package | |
1016 | to see that the reference is actually printed; if no | |
1017 | macros are used, and the output of | |
1018 | .I refer | |
1019 | fed untranslated to | |
1020 | .I troff, | |
1021 | nothing at all will be printed. | |
1022 | .PP | |
1023 | The strings defined by | |
1024 | .I refer | |
1025 | are taken directly from the files of references, which | |
1026 | are in the following format. | |
1027 | The references should be separated | |
1028 | by blank lines. | |
1029 | Each reference is a sequence of lines beginning with | |
1030 | .B % | |
1031 | and followed | |
1032 | by a key-letter. | |
1033 | The remainder of that line, and successive lines until the next line beginning | |
1034 | with | |
1035 | .B % , | |
1036 | contain the information specified by the key-letter. | |
1037 | In general, | |
1038 | .I refer | |
1039 | does not interpret the information, but merely presents | |
1040 | it to the macro package for final formatting. | |
1041 | A user with a separate macro package, for example, | |
1042 | can add new key-letters or use the existing ones for other purposes | |
1043 | without bothering | |
1044 | .I refer. | |
1045 | .PP | |
1046 | The meaning of the key-letters given below, in particular, | |
1047 | is that assigned by the | |
1048 | .I \-ms | |
1049 | macros. | |
1050 | Not all information, obviously, is used with each citation. | |
1051 | For example, if a document is both an internal memorandum and a journal article, | |
1052 | the macros ignore the memorandum version and cite only the journal article. | |
1053 | Some kinds of information are not used at all in printing the reference; | |
1054 | if a user does not like finding references by specifying title | |
1055 | or author keywords, and prefers to add specific keywords to the | |
1056 | citation, a field is available which is searched but not | |
1057 | printed (\f3K\f1). | |
1058 | .PP | |
1059 | The key letters currently recognized by | |
1060 | .I refer | |
1061 | and | |
1062 | .I \-ms, | |
1063 | with the kind of information implied, are: | |
1064 | .KS | |
1065 | .TS | |
1066 | center; | |
1067 | c c6 c c | |
1068 | c l c l. | |
1069 | Key Information specified Key Information specified | |
1070 | A Author's name N Issue number | |
1071 | B Title of book containing item O Other information | |
1072 | C City of publication P Page(s) of article | |
1073 | D Date R Technical report reference | |
1074 | E Editor of book containing item T Title | |
1075 | G Government (NTIS) ordering number V Volume number | |
1076 | I Issuer (publisher) | |
1077 | J Journal name | |
1078 | K Keys (for searching) X or | |
1079 | L Label Y or | |
1080 | M Memorandum label Z Information not used by \f2refer\f1 | |
1081 | .TE | |
1082 | .KE | |
1083 | For example, a sample reference could be | |
1084 | typed as: | |
1085 | .DS | |
1086 | %T Bounds on the Complexity of the Maximal | |
1087 | Common Subsequence Problem | |
1088 | %Z ctr127 | |
1089 | %A A. V. Aho | |
1090 | %A D. S. Hirschberg | |
1091 | %A J. D. Ullman | |
1092 | %J J. ACM | |
1093 | %V 23 | |
1094 | %N 1 | |
1095 | %P 1-12 | |
1096 | .\"%M TM 75-1271-7 | |
1097 | %M abcd-78 | |
1098 | %D Jan. 1976 | |
1099 | .DE | |
1100 | Order is irrelevant, except that authors are shown in the order | |
1101 | given. The output of | |
1102 | .I refer | |
1103 | is a stream of string definitions, one | |
1104 | for each of the fields of each reference, as | |
1105 | shown below. | |
1106 | .DS | |
1107 | \*.]- | |
1108 | \*.ds [A authors' names \*.\*.\*. | |
1109 | \*.ds [T title \*.\*.\*. | |
1110 | \*.ds [J journal \*.\*.\*. | |
1111 | \*.\*.\*. | |
1112 | \*.]\|[ type-number | |
1113 | .DE | |
1114 | The special macro | |
1115 | .B \&\*.]\- | |
1116 | precedes the string definitions | |
1117 | and the special macro | |
1118 | .B \*.]\|[ | |
1119 | follows. | |
1120 | These are changed from the input | |
1121 | .B \*.[ | |
1122 | and | |
1123 | .B \*.\^] | |
1124 | so that running the same file through | |
1125 | .I refer | |
1126 | again is harmless. | |
1127 | The | |
1128 | .B \*.]\- | |
1129 | macro can be used by the macro package to | |
1130 | initialize. | |
1131 | The | |
1132 | .B \*.]\|[ | |
1133 | macro, which should be used | |
1134 | to print the reference, is given an | |
1135 | argument | |
1136 | .I type-number | |
1137 | to indicate the kind of reference, as follows: | |
1138 | .KS | |
1139 | .TS | |
1140 | center; | |
1141 | c c | |
1142 | n l. | |
1143 | Value Kind of reference | |
1144 | 1 Journal article | |
1145 | 2 Book | |
1146 | 3 Article within book | |
1147 | 4 Technical report | |
1148 | 5 Bell Labs technical memorandum | |
1149 | 0 Other | |
1150 | .TE | |
1151 | .KE | |
1152 | The reference is flagged in the text | |
1153 | with the sequence | |
1154 | .DS | |
1155 | \e*\|([\*.number\e*\|(\*.\^] | |
1156 | .DE | |
1157 | where | |
1158 | .I number | |
1159 | is the footnote number. | |
1160 | The strings | |
1161 | .B [\*. | |
1162 | and | |
1163 | .B \*.\^] | |
1164 | should be used by the macro package | |
1165 | to format the reference flag in the text. | |
1166 | These strings can be replaced for a particular | |
1167 | footnote, as described in section 5. | |
1168 | The footnote number (or other signal) is available | |
1169 | to the reference macro | |
1170 | .B \*.]\|[ | |
1171 | as the | |
1172 | string register | |
1173 | .B [F . | |
1174 | .PP | |
1175 | In some cases users wish to suspend the searching, and merely | |
1176 | use the reference macro formatting. | |
1177 | That is, the user doesn't want to provide a search key | |
1178 | between | |
1179 | .B \*.[ | |
1180 | and | |
1181 | .B \*.\^] | |
1182 | brackets, but merely | |
1183 | the reference lines for the appropriate document. | |
1184 | Alternatively, the user | |
1185 | can wish | |
1186 | to add a few fields to those in the reference | |
1187 | as in the standard file, or | |
1188 | override some fields. | |
1189 | Altering or replacing fields, or supplying whole references, is easily done | |
1190 | by inserting lines beginning | |
1191 | with | |
1192 | .B % ; | |
1193 | any such line is taken as | |
1194 | direct input to the reference | |
1195 | processor rather than keys to be searched. | |
1196 | Thus | |
1197 | .DS | |
1198 | \*.[ | |
1199 | key1 key2 key3 \*.\*.\*. | |
1200 | %Q New format item | |
1201 | %R Override report name | |
1202 | \*.\^] | |
1203 | .DE | |
1204 | makes the indicates changes to the result of searching for | |
1205 | the keys. | |
1206 | All of the search keys must be given before the first | |
1207 | \f3%\f1 line. | |
1208 | .PP | |
1209 | If no search keys are provided, an entire citation can | |
1210 | be provided in-line in the text. | |
1211 | For example, if the | |
1212 | .I eqn | |
1213 | paper citation were to be inserted in | |
1214 | this way, rather than by searching for it in the data base, | |
1215 | the input would read | |
1216 | .DS | |
1217 | \&\*.\*.\*. | |
1218 | \&preprocessor like | |
1219 | \&.I eqn. | |
1220 | \&.[ | |
1221 | \&%A B. W. Kernighan | |
1222 | \&%A L. L. Cherry | |
1223 | \&%T A System for Typesetting Mathematics | |
1224 | \&%J Comm. ACM | |
1225 | \&%V 18 | |
1226 | \&%N 3 | |
1227 | \&%P 151-157 | |
1228 | \&%D March 1975 | |
1229 | \&.] | |
1230 | \&It scans its input looking for items | |
1231 | \&\*.\*.\*. | |
1232 | .DE | |
1233 | This would produce a citation of the same appearance as that | |
1234 | resulting from the file search. | |
1235 | .PP | |
1236 | As shown, fields are normally turned into | |
1237 | .I troff | |
1238 | strings. | |
1239 | Sometimes users would rather have them defined as macros, | |
1240 | so that other | |
1241 | .I troff | |
1242 | commands can be placed into the data. | |
1243 | When this is necessary, simply double the control character | |
1244 | .B % | |
1245 | in the data. | |
1246 | Thus the input | |
1247 | .DS | |
1248 | \&.[ | |
1249 | %V 23 | |
1250 | %%M | |
1251 | Bell Laboratories, | |
1252 | Murray Hill, N.J. 07974 | |
1253 | \&.] | |
1254 | .DE | |
1255 | is processed by | |
1256 | .I refer | |
1257 | into | |
1258 | .DS | |
1259 | \&.ds [V 23 | |
1260 | \&.de [M | |
1261 | Bell Laboratories, | |
1262 | Murray Hill, N.J. 07974 | |
1263 | \&.. | |
1264 | .DE | |
1265 | The information after | |
1266 | .B %%M | |
1267 | is defined as a macro to be invoked by | |
1268 | .B .[M | |
1269 | while the information after | |
1270 | .B %V | |
1271 | is turned into a string to be invoked by | |
1272 | .B \e\(**([V . | |
1273 | At present | |
1274 | .I \-ms | |
1275 | expects all information as strings. | |
1276 | .NH | |
1277 | Collecting References and other Refer Options | |
1278 | .PP | |
1279 | Normally, the combination of | |
1280 | .I refer | |
1281 | and | |
1282 | .I \-ms | |
1283 | formats output as | |
1284 | .I troff | |
1285 | footnotes which are consecutively numbered and placed | |
1286 | at the bottom of the page. However, | |
1287 | options exist to | |
1288 | place the references at the end; to arrange references alphabetically | |
1289 | by senior author; and to indicate references by strings in the text of the form | |
1290 | [Name1975a] | |
1291 | rather than by number. | |
1292 | Whenever references are not placed at the bottom of a page | |
1293 | identical references are coalesced. | |
1294 | .PP | |
1295 | For example, the | |
1296 | .B \-e | |
1297 | option to | |
1298 | .I refer | |
1299 | specifies that references are to be collected; in this case | |
1300 | they are output whenever the sequence | |
1301 | .DS | |
1302 | \*.[ | |
1303 | $LIST$ | |
1304 | \*.\^] | |
1305 | .DE | |
1306 | is encountered. | |
1307 | Thus, to place references at the end of a paper, the user would run | |
1308 | .I refer | |
1309 | with the | |
1310 | .I \-e | |
1311 | option and place the above $LIST$ commands after the last | |
1312 | line of the text. | |
1313 | .I Refer | |
1314 | will then move all the references to that point. | |
1315 | To aid in formatting the collected references, | |
1316 | .I refer | |
1317 | writes the references preceded by the line | |
1318 | .DS | |
1319 | .B .]< | |
1320 | .DE | |
1321 | and | |
1322 | followed by the line | |
1323 | .DS | |
1324 | .B .]> | |
1325 | .DE | |
1326 | to invoke special macros before and after the references. | |
1327 | .PP | |
1328 | Another possible option to | |
1329 | .I refer | |
1330 | is the | |
1331 | .B \-s | |
1332 | option to specify | |
1333 | sorting of references. The default, | |
1334 | of course, is to list references in the order presented. | |
1335 | The | |
1336 | .B \-s | |
1337 | option implies the | |
1338 | .B \-e | |
1339 | option, and thus requires | |
1340 | a | |
1341 | .DS | |
1342 | \*.[ | |
1343 | $LIST$ | |
1344 | \*.\^] | |
1345 | .DE | |
1346 | entry to call out the reference list. | |
1347 | The | |
1348 | .B \-s | |
1349 | option may be followed by a string of letters, numbers, and `+' signs indicating how | |
1350 | the references are to be sorted. | |
1351 | The sort is done using the fields whose key-letters are | |
1352 | in the string as sorting keys; the numbers indicate how many | |
1353 | of the fields are to be considered, with `+' | |
1354 | taken as a large number. | |
1355 | Thus the default is | |
1356 | .B \-sAD | |
1357 | meaning ``Sort on senior author, then date.'' To | |
1358 | sort on all authors and then title, specify | |
1359 | .B \-sA+T . | |
1360 | And to sort on two authors and then the journal, | |
1361 | write | |
1362 | .B \-sA2J . | |
1363 | .PP | |
1364 | Other options to | |
1365 | .I refer | |
1366 | change the signal or label inserted in the text for each reference. | |
1367 | Normally these are just sequential numbers, | |
1368 | and their exact placement (within brackets, as superscripts, etc.) is determined | |
1369 | by the macro package. | |
1370 | The | |
1371 | .B \-l | |
1372 | option replaces reference numbers by | |
1373 | strings composed of the senior author's last name, the date, | |
1374 | and a disambiguating letter. | |
1375 | If a number follows the | |
1376 | .B l | |
1377 | as in | |
1378 | .B \-l3 | |
1379 | only that many letters of the last name are used | |
1380 | in the label string. | |
1381 | To abbreviate the date as well the form | |
1382 | \f3-l\f2m,n\f1 | |
1383 | shortens the last name to the | |
1384 | first | |
1385 | .I m | |
1386 | letters and the date to the | |
1387 | last | |
1388 | .I n | |
1389 | digits. | |
1390 | For example, the option | |
1391 | .B \-l3,2 | |
1392 | would refer to the | |
1393 | .I eqn | |
1394 | paper (reference 3) by the signal | |
1395 | .I Ker75a , | |
1396 | since it is the first cited reference by Kernighan in 1975. | |
1397 | .PP | |
1398 | A user wishing to specify particular labels for | |
1399 | a private bibliography may use the | |
1400 | .B \-k | |
1401 | option. | |
1402 | Specifying | |
1403 | \f3\-k\f2x\f1 | |
1404 | causes the field \f2x\f1 to be used as a label. | |
1405 | The default is \f3L\f1. | |
1406 | If this field ends in \f3\-\f1, that character | |
1407 | is replaced by a sequence letter; otherwise the field | |
1408 | is used exactly as given. | |
1409 | .PP | |
1410 | If none of the | |
1411 | .I refer -produced | |
1412 | signals are desired, | |
1413 | the | |
1414 | .B \-b | |
1415 | option entirely suppresses automatic text signals. | |
1416 | .PP | |
1417 | If the user wishes to override the | |
1418 | .I \-ms | |
1419 | treatment of the reference signal (which is normally to | |
1420 | enclose the number in brackets in | |
1421 | .I nroff | |
1422 | and make it a superscript in | |
1423 | .I troff\\| ) | |
1424 | this can be done easily. | |
1425 | If the lines | |
1426 | .B \&.[ | |
1427 | or | |
1428 | .B \&.] | |
1429 | contain anything following these characters, | |
1430 | the remainders of these lines are used to surround | |
1431 | the reference signal, instead of the default. | |
1432 | Thus, for example, to say ``See reference (2).'' | |
1433 | and avoid | |
1434 | ``See reference.\s-3\u2\d\s+3'' the | |
1435 | input might appear | |
1436 | .DS | |
1437 | \&See reference | |
1438 | \&\*.[ ( | |
1439 | imprecise citation ... | |
1440 | \&\*.\^])\*. | |
1441 | .DE | |
1442 | Note that blanks are significant in this construction. | |
1443 | If a permanent change is desired in the style of reference | |
1444 | signals, however, it is probably easier to redefine the strings | |
1445 | .B \&[. | |
1446 | and | |
1447 | .B \&.] | |
1448 | (which are used to bracket each signal) | |
1449 | than to change each citation. | |
1450 | .PP | |
1451 | Although normally | |
1452 | .I refer | |
1453 | limits itself to retrieving the data for the reference, | |
1454 | and leaves to a macro package the job of arranging that | |
1455 | data as required by the local format, there are two | |
1456 | special options for rearrangements that can not be | |
1457 | done by macro packages. | |
1458 | The | |
1459 | .B \-c | |
1460 | option puts fields into all upper case | |
1461 | (C\s-2APS\s+2-S\s-2MALL\s+2 C\s-2APS\s+2 | |
1462 | in | |
1463 | .I troff | |
1464 | output). | |
1465 | The key-letters indicated what information is to be translated | |
1466 | to upper case follow the | |
1467 | .B c , | |
1468 | so that | |
1469 | .B \-cAJ | |
1470 | means that authors' names and journals are to be in caps. | |
1471 | The | |
1472 | .B \-a | |
1473 | option writes the names of authors last name first, that is | |
1474 | .I "A. D. Hall, Jr." | |
1475 | is written as | |
1476 | .I "Hall, A. D. Jr" . | |
1477 | The citation form of | |
1478 | the | |
1479 | .I "Journal of the ACM" , | |
1480 | for example, would require | |
1481 | both | |
1482 | .B \-cA | |
1483 | and | |
1484 | .B \-a | |
1485 | options. | |
1486 | This produces authors' names in the style | |
1487 | .I | |
1488 | K\s-2ERNIGHAN\s0, B. W. \s-2AND\s0 C\s-2HERRY\s0, L. L.\& | |
1489 | .R | |
1490 | for the previous example. | |
1491 | The | |
1492 | .B \-a | |
1493 | option may be followed by a number to indicate how many | |
1494 | author names should be reversed; | |
1495 | .B \-a1 | |
1496 | (without any | |
1497 | .B \-c | |
1498 | option) | |
1499 | would produce | |
1500 | .I | |
1501 | Kernighan, B. W. and L. L. Cherry, | |
1502 | .R | |
1503 | for example. | |
1504 | .PP | |
1505 | Finally, there is also the previously-mentioned | |
1506 | .B \-p | |
1507 | option to let the user specify | |
1508 | a private file of references to be searched before the public files. | |
1509 | Note that | |
1510 | .I refer | |
1511 | does not insist on a previously made index for these files. | |
1512 | If a file is named which contains reference | |
1513 | data but is not indexed, it will be searched | |
1514 | (more slowly) | |
1515 | by | |
1516 | .I refer | |
1517 | using | |
1518 | .I fgrep. | |
1519 | In this way | |
1520 | it is easy for users to keep small files of | |
1521 | new references, which can later be added to the | |
1522 | public data bases. | |
1523 | .SG MH-1274-MEL-\s8UNIX\s0 |