From: Bill Joy Date: Wed, 15 Oct 1980 20:03:55 +0000 (-0800) Subject: BSD 4 development X-Git-Tag: BSD-4~196 X-Git-Url: https://git.subgeniuskitty.com/unix-history/.git/commitdiff_plain/c4aafe303f423cf8f320095f6f6d7ca2f9fddb46 BSD 4 development Work on file usr/doc/REFS Work on file usr/doc/README Work on file usr/doc/assembler Work on file usr/doc/awk Work on file usr/doc/bc Work on file usr/doc/cman Work on file usr/doc/dc Synthesized-from: CSRG//cd1/4.0 --- diff --git a/usr/doc/README b/usr/doc/README new file mode 100644 index 0000000000..98a97e4194 --- /dev/null +++ b/usr/doc/README @@ -0,0 +1,9 @@ +This directory contains source for most of the +documents contained in Volume 2 of The UNIX +Programmer's Manual, 7th Edition. +Most use -ms to format; many also use +refer, tbl and eqn. Precise incantations +are engraved in ./run. + +The citations for those papers that use +refer are taken from /usr/dict/papers/Rv7man. diff --git a/usr/doc/REFS b/usr/doc/REFS new file mode 100644 index 0000000000..e5da45212e --- /dev/null +++ b/usr/doc/REFS @@ -0,0 +1,405 @@ +%A L. P. Deutsch +%A B. W. Lampson +%T An online editor +%J Comm. Assoc. Comp. Mach. +%V 10 +%N 12 +%D December 1967 +%P 793-799, 803 +%K qed + +.[ +%r 17 +%K cstr +%R Comp. Sci. Tech. Rep. No. 17 +%I Bell Laboratories +%C Murray Hill, New Jersey +%A B. W. Kernighan +%A L. L. Cherry +%T A System for Typesetting Mathematics +%d May 1974, revised April 1977 +%J Comm. Assoc. Comp. Mach. +%K acm cacm +%V 18 +%P 151-157 +%D March 1975 +.] + +%T U\s-2NIX\s0 Time-Sharing System: Document Preparation +%K unix bstj +%A B. W. Kernighan +%A M. E. Lesk +%A J. F. Ossanna +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 2115-2135 +%D 1978 + +%A T. A. Dolotta +%A J. R. Mashey +%T An Introduction to the Programmer's Workbench +%J Proc. 2nd Int. Conf. on Software Engineering +%D October 13-15, 1976 +%P 164-168 + +%T U\s-2NIX\s0 Time-Sharing System: The Programmer's Workbench +%A T. A. Dolotta +%A R. C. Haight +%A J. R. Mashey +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 2177-2200 +%D 1978 +%K unix bstj + +%T U\s-2NIX\s0 Time-Sharing System: U\s-2NIX\s0 on a Microprocessor +%K unix bstj +%A H. Lycklama +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 2087-2101 +%D 1978 + +%T The C Programming Language +%A B. W. Kernighan +%A D. M. Ritchie +%I Prentice-Hall +%C Englewood Cliffs, New Jersey +%D 1978 + +%T Computer Recreations +%A Aleph-null +%J Software Practice and Experience +%V 1 +%N 2 +%D April-June 1971 +%P 201-204 + +%T U\s-2NIX\s0 Time-Sharing System: The U\s-2NIX\s0 Shell +%A S. R. Bourne +%K unix bstj +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 1971-1990 +%D 1978 + +%A L. P. Deutsch +%A B. W. Lampson +%T \*sSDS\*n 930 time-sharing system preliminary reference manual +%R Doc. 30.10.10, Project \*sGENIE\*n +%C Univ. Cal. at Berkeley +%D April 1965 + +%A R. J. Feiertag +%A E. I. Organick +%T The Multics input-output system +%J Proc. Third Symposium on Operating Systems Principles +%D October 18-20, 1971 +%P 35-41 + +%A D. G. Bobrow +%A J. D. Burchfiel +%A D. L. Murphy +%A R. S. Tomlinson +%T \*sTENEX\*n, a Paged Time Sharing System for the \*sPDP\*n-10 +%J Comm. Assoc. Comp. Mach. +%V 15 +%N 3 +%D March 1972 +%K tenex +%P 135-143 + +%A R. E. Griswold +%A D. R. Hanson +%T An Overview of SL5 +%J SIGPLAN Notices +%V 12 +%N 4 +%D April 1977 +%P 40-50 + +%A E. W. Dijkstra +%T Cooperating Sequential Processes +%B Programming Languages +%E F. Genuys +%I Academic Press +%C New York +%D 1968 +%P 43-112 + +%A J. A. Hawley +%A W. B. Meyer +%T M\s-2UNIX\s0, A Multiprocessing Version of U\s-2NIX\s0 +%K munix unix +%R M.S. Thesis +%I Naval Postgraduate School +%C Monterey, Cal. +%D 1975 + +%T The U\s-2NIX\s0 Time-Sharing System +%K unix bstj +%A D. M. Ritchie +%A K. Thompson +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 1905-1929 +%D 1978 + +%A E. I. Organick +%T The M\s-2ULTICS\s0 System +%K multics +%I M.I.T. Press +%C Cambridge, Mass. +%D 1972 + +%T UNIX for Beginners +%A B. W. Kernighan +%D 1978 + +%T U\s-2NIX\s0 Programmer's Man\&ual +%A K. Thompson +%A D. M. Ritchie +%K unix +%I Bell Laboratories +%O Seventh Edition. +%D 1978 + +%A K. Thompson +%T The U\s-2NIX\s0 Command Language +%B Structured Programming\(emInfotech State of the Art Report +%I Infotech International Ltd. +%C Nicholson House, Maidenhead, Berkshire, England +%D March 1975 +%P 375-384 +%K unix +%X pwb +Brief description of shell syntax and semantics, without much +detail on implementation. +Much on pipes and convenience of hooking programs together. +Includes SERMONETTE: +"Many familiar computing `concepts' are missing from UNIX. +Files have no records. There are no access methods. +There are no file types. These concepts fill a much-needed gap. +I sincerely hope that when future systems are designed by +manufacturers the value of some of these ingrained notions is re-examined. +Like the politician and his `common man', manufacturers have +their `average user'. + +%A J. R. Mashey +%T PWB/UNIX Shell Tutorial +%D September 30, 1977 + +%A D. F. Hartley (Ed.) +%T The Cambridge Multiple Access System \- Users Reference Manual +%I University Mathematical Laboratory +%C Cambridge, England +%D 1968 + +%A P. A. Crisman (Ed.) +%T The Compatible Time-Sharing System +%I M.I.T. Press +%K whole ctss book +%C Cambridge, Mass. +%D 1965 + +%T LR Parsing +%A A. V. Aho +%A S. C. Johnson +%J Comp. Surveys +%V 6 +%N 2 +%P 99-124 +%D June 1974 + +%T Deterministic Parsing of Ambiguous Grammars +%A A. V. Aho +%A S. C. Johnson +%A J. D. Ullman +%J Comm. Assoc. Comp. Mach. +%K acm cacm +%V 18 +%N 8 +%P 441-452 +%D August 1975 + +%A A. V. Aho +%A J. D. Ullman +%T Principles of Compiler Design +%I Addison-Wesley +%C Reading, Mass. +%D 1977 + +.[ +%r 65 +%R Comp. Sci. Tech. Rep. No. 65 +%K CSTR +%A S. C. Johnson +%T Lint, a C Program Checker +%D December 1977 +%O updated version TM 78-1273-3 +%D 1978 +.] + +%T A Portable Compiler: Theory and Practice +%A S. C. Johnson +%J Proc. 5th ACM Symp. on Principles of Programming Languages +%P 97-104 +%D January 1978 + +.[ +%r 39 +%K CSTR +%R Comp. Sci. Tech. Rep. No. 39 +%I Bell Laboratories +%C Murray Hill, New Jersey +%A M. E. Lesk +%T Lex \(em A Lexical Analyzer Generator +%D October 1975 +.] + +.[ +%r 32 +%K CSTR +%R Comp. Sci. Tech. Rep. No. 32 +%I Bell Laboratories +%C Murray Hill, New Jersey +%A S. C. Johnson +%T Yacc \(em Yet Another Compiler-Compiler +%D July 1975 +.] + +%T U\s-2NIX\s0 Time-Sharing System: Portability of C Programs and the U\s-2NIX\s0 System +%K unix bstj +%A S. C. Johnson +%A D. M. Ritchie +%J Bell Sys. Tech. J. +%V 57 +%N 6 +%P 2021-2048 +%D 1978 + +%T Typing Documents on UNIX and GCOS: The -ms Macros for Troff +%A M. E. Lesk +%D 1977 + +%A K. Thompson +%A D. M. Ritchie +%T U\s-2NIX\s0 Programmer's Manual +%K unix +%I Bell Laboratories +%O Sixth Edition +%D May 1975 + +%T The Network U\s-2NIX\s0 System +%K unix +%A G. L. Chesson +%J Operating Systems Review +%V 9 +%N 5 +%P 60-66 +%D 1975 +%O Also in \f2Proc. 5th Symp. on Operating Systems Principles.\f1 + +%T Spider \(em An Experimental Data Communications System +%Z ctr127 +%A A. G. Fraser +%J Proc. IEEE Conf. on Communications +%P 21F +%O IEEE Cat. No. 74CH0859-9-CSCB. +%D June 1974 + +%T A Virtual Channel Network +%A A. G. Fraser +%J Datamation +%P 51-56 +%D February 1975 + +.[ +%r 41 +%K CSTR +%R Comp. Sci. Tech. Rep. No. 41 +%I Bell Laboratories +%C Murray Hill, New Jersey +%A J. W. Hunt +%A M. D. McIlroy +%T An Algorithm for Differential File Comparison +%D June 1976 +.] + +%A F. P. Brooks, Jr. +%T The Mythical Man-Month +%I Addison-Wesley +%C Reading, Mass. +%D 1975 +%X pwb +Readable, classic reference on software engineering and +problems of large projects, from someone with experience in them. +Required reading for any software engineer, even if conclusions may not +always be agreed with. +%br +"The second is the most dangerous system a man every designs." p.55. +%br +"Hence plan to throw one away; you will, anyhow." p.116. +%br +"Cosgrove has perceptively pointed out that the programmer delivers +satisfaction of a user need rather than any tangible product. +And both the actual need and the user's perception of that need +will change as programs are built, tested, and used." p.117. +%br +"The total cost of maintaining a widely used program is typically 40 percent +or more of the cost of developing it." p.121. +%br +"As shown above, amalgamating prose and program reduces the total +number of characters to be stored." p.175. + +%T A Portable Compiler for the Language C +%A A. Snyder +%I Master's Thesis, M.I.T. +%C Cambridge, Mass. +%D 1974 + +%T The C Language Calling Sequence +%A M. E. Lesk +%A S. C. Johnson +%A D. M. Ritchie +%D 1977 + +%T Optimal Code Generation for Expression Trees +%A A. V. Aho +%A S. C. Johnson +%D 1975 +%J J. Assoc. Comp. Mach. +%K acm jacm +%V 23 +%N 3 +%P 488-501 +%O Also in \f2Proc. ACM Symp. on Theory of Computing,\f1 pp. 207-217, 1975. + +%A R. Sethi +%A J. D. Ullman +%T The Generation of Optimal Code for Arithmetic Expressions +%J J. Assoc. Comp. Mach. +%K acm jacm +%V 17 +%N 4 +%D October 1970 +%P 715-728 +%O Reprinted as pp. 229-247 in \fICompiler Techniques\fR, ed. B. W. Pollack, Auerbach, Princeton NJ (1972). +%X pwb +Optimal approach for straight-line, fixed +number of regs. + +%T Code Generation for Machines with Multiregister +Operations +%A A. V. Aho +%A S. C. Johnson +%A J. D. Ullman +%J Proc. 4th ACM Symp. on Principles of Programming Languages +%P 21-28 +%D January 1977 + diff --git a/usr/doc/assembler b/usr/doc/assembler new file mode 100644 index 0000000000..6e1435b2e4 --- /dev/null +++ b/usr/doc/assembler @@ -0,0 +1,942 @@ +.TL +Assembler Reference Manual +.AU +John F. Reiser +.AI +.HO +.AU +Robert R. Henry\s-2\u*\d\s+2 +.FS +\&\s-2\u*\d\s+2 +Preparation of this paper supported in part +by the National Science Foundation under grant MCS # 78-07291. +.FE +.AI +Electronics Research Laboratory +University of California +Berkeley, CA 94720 +.ND November 5, 1979 +.NH +Introduction +.PP +This document describes the usage and input syntax +of the \s8UNIX VAX\s10-11 assembler \fIas\fP. +\fIAs\fP is designed for assembling the code produced by the +C compiler; certain concessions have been made to handle code written +directly by people, but in general little sympathy has been extended. +This document is intended only for the writer of a compiler or a maintainer +of the assembler. +.NH +Usage +.PP +\fIas\fP is used as follows: +.in +5 +as +[ \fB\-LVWJR\fR ] +[ \fB\-d\fIn\fR ] +[ \fB\-DTC\fR ] +[ \fB\-t \fIdirectory\fR ] +[ \fB\-o \fIoutput\fR ] +[ \fIname\d\s-2\&1\s+2\u1\fP ] +[ \fIname\d\s-2\&2\s+2\u ... \fP ] +.br +.in -5 +.PP +The \fB\-L\fP flag instructs the assembler to save labels beginning with +a 'L' in the symbol table portion output file. +Labels are not saved by default, as the default action of the link +editor \fIld\fP is to discard them anyway. +.PP +The \fB\-V\fP flag tells the assembler to place its interpass temporary +file into virtual memory. +In normal circumstances, the system manager +will decide where the temporary file should lie. +Our experiments +with very large temporary files show that placing the temporary +file into virtual memory will save about 13% of the assembly time, +where the size of the temporary file is about 350K bytes. +Most assembler sources will not be this long. +.PP +The \fB\-W\fP turns off reporting all errors. +.PP +The \fB\-J\fP flag forces \s-2UNIX\s+2 style pseudo\-branch +instructions with destinations further away than a +byte displacement to be +turned into jump instructions with 4 byte offsets. +The \fB\-J\fP flag buys you nothing if \fB\-d2\fP is set. +(See \(sc 9.4) +.PP +The \fB\-R\fP flag effectively turns \fB.data\fP\fI n\fP +segment changing directives into \fB.text\fP\fI n\fP directives. +This obviates the need to run editor scripts on assembler source +to ``read\-only'' fix initialized data segments. +Uninitialized data (via \fB.lcomm\fP and \fP.comm\fP directives) +is still assembled into the data or bss segments. +.PP +The \fB\-d\fP flag specifies the number of bytes +which the assembler should allow for a displacement when the value of the +displacement expression is undefined in the first pass. +The possible values of \fIn\fP are 1, 2, or 4; the assembler uses 4 bytes +if \fB-d\fP is not specified. +See \(sc 9.2. +.PP +Provided the \fB\-V\fP flag is not set, +the \fB\-t\fP flag causes the assembler to place its single temporary file +in the \fIdirectory\fP instead of in \fI/tmp\fP. +.PP +The \fB\-o\fP flag causes the output to be placed on the named file. +The output of the assembler is by default placed on +the file \fIa.out\fR in the current directory. +.PP +The input to the assembler is normally taken from the standard input. +If file arguments occurs, then the input is taken +sequentially from the files \fIname\d\s-2\&1\s+2\u\fP, +\fIname\d\s-2\&2\s+2\u\fP... +This is not to say that the files are assembled seperately; +\fIname\d\s-2\&2\s+2\u\fP +is effectively concatenated to \fIname\d\s-2\&1\s+2\u\fP, +so multiple definitions cannot occur amongst the input sources. +.PP +The \fB\-D\fP flag enables debugging information, provided that the +assembler has been compiled to have debugging information available. +.PP +The \fB\-T\fP flag enables a trace to be generate of each token read +by \fIas\fP to be printed. This is long and boring, but useful when +debugging the assembler. +.NH +Lexical conventions +.PP +Assembler tokens include identifiers (alternatively, ``symbols'' or ``names''), +constants, and operators. +.NH +Identifiers +.PP +An identifier consists of a sequence of alphanumeric characters (including +period ``\|\fB.\fR\|'', +underscore ``\(ul'' and +dollar ``\|$\|'') +of which the first may not be numeric. +If the assembler has been compiled to support flexible length symbols, +identifiers may be (practically) arbitrarily long with all +characters significant; +otherwise, only the first NCPS +(a symbol defined in \fI/usr/include/a.out.h\fP, and normally 8) +characters are significant. +.NH 2 +Constants +.NH 3 +Simple constants +.PP +All integer constants are 64 bits wide and interpreted as two's +complement numbers. +64 bit wide integer constants (quads) are only partially supported +by the \s-2VAX\s+2 hardware, and are supported only to provide +immediate constants to \s-2VAX\s+2 instructions with quad operands. +Floating-point constants are 64 bits wide. +The digits are ``0123456789abcdefABCDEF'' with the obvious values. +.PP +An octal constant consists of a sequence of digits with a leading zero. +.PP +A decimal constant consists of a sequence of digits without a leading zero. +.PP +A hexadecimal constant consists of the characters ``0x'' (or ``0X'') +followed by a sequence of digits. +.PP +A single-character constant consists of a single quote ``\|\(fm\|'' +followed by an \s8ASCII\s10 character, including \s8ASCII\s10 newline. +The constant's value is the code for the +given character. +.PP +A floating-point constant consists of the characters ``0f'', ``0d'', +``0F'', or ``0D'' followed by a sequence of characters which \fIatof\fP +will recognize as a floating-point number; +either ``e'', ``E'', ``d''or ``D'' +may be used to designate the exponent field. +.NH 3 +String Constants +.PP +A string constant is defined using the same syntax and semantics as ``C'' +beginning and ending with a ``"'' (double quote). +The \s8DEC\s10 assembler conventions for flexible string quoting is +not implemented. +All ``C'' backslash conventions are observed; the backslash conventions +peculiar to the \s-2PDP\-11\s+2 assembler are not observed. +Strings are known by their value and their length; the assembler +does not implicitly end strings with a null byte. +.NH 2 +Operators +.PP +There are several single-character +operators; see \(sc7. +.NH 2 +Blanks +.PP +Blank and tab characters +may be interspersed freely between tokens, but may +not be used within tokens (except character constants). +A blank or tab is required to separate adjacent +identifiers or constants not otherwise separated. +.NH 2 +Comments +.NH 3 +Decadent Comments +.PP +The character ``\|#\|'' introduces a comment, which extends +through the end of the line on which it appears. +Comments starting in column 1, +of the format ``\|# \fIexpression string\fP\|" +are interpreted as an indication that the assembler is now assembling +file \fIstring\fP at line \fIexpression\fP. +Thus, one can use the C preprocessor on an assembly language source file, +and use the \fI#include\fP and \fI#define\fP +preprocessor directives. +(Note that their may not be an assembler comment starting in column +1 if the assembler source is given to the C preprocessor, as it will +be intrepreted by the preprocessor in a way not intended.) +Comments are otherwise ignored by the assembler. +.NH 3 +C Style Comments +.PP +The assembler will recognize C style comments, introduced with +the prologue \fB/*\fP and ending with the epilogue \fB*/\fP. +C style comments may extend across multiple lines, and are the preferred +comment style to use if one chooses to use the C preprocessor. +.NH 1 +Segments and Location Counters +.PP +Assembled code and data fall into three segments: the text segment, +the data segment, and the bss segment. The operating system makes +some assumptions about the content of these segments; the assembler +does not. Within the text and data segments there are a number of +sub-segments, distinguished by number (``text 0'', ``text 1'', .\|.\|. +``data 0'', ``data 1'', .\|.\|.\|). +Currently there are four subsegments each in text and data. +The subsegments are for programming convenience only. Before writing the +output file, the assembler zero-pads each text subsegment to a multiple of four +bytes and then concatenates the subsegments in order to form the text segment; +an analogous operation is done for the data segment. +Requesting that the loader define symbols and storage regions is the only +action allowed by the assembler with respect to the bss segment. +Assembly begins in ``text 0''. +.PP +Associated with each (sub)segment is an implicit location counter which +begins at zero and is incremented by 1 for each byte assembled into the +(sub)segment. There is no way to explicitly reference a location counter. +Note that the location counters of subsegments other than ``text 0'' +and ``data 0'' behave peculiarly due to the concatenation used to form +the text and data segments. +.NH 1 +Statements +.PP +A source program is composed of a sequence of +\fIstatements\fP. +Statements are separated either by new-lines +or by semicolons. +There are two kinds of statements: null statements +and keyword statements. +Either kind of statement may be preceded by +one or more labels. +.NH 2 +Labels +.NH 3 +Name (Global) Labels +.PP +A global label consists of a name followed +by a colon ``\|:\|''. +The effect of a name label is to assign the current +value and type of the location counter +to the name. +An error is indicated in pass 1 if the +name is already defined; +an error is indicated in pass 2 if the +value assigned changes the definition +of the label. +.PP +A global label is referenced by its name. +.PP +Global labels beginning with a ``\|L\|'' +are discarded unless the \fB-L\fP option +is in effect. +.NH 3 +Numeric (Local) Labels +.PP +A numeric label consists of a digit \fI0\fP to \fI9\fP followed by a +colon (``\|:\|''). +Such a label serves to define temporary symbols of the form +``\fIn\fPb'' and ``\fIn\fPf'', +where \fIn\fP is the digit of the label. +As in the case of name labels, a numeric label assigns +the current value and type of the location counter +to the temporary symbol. +However, several numeric labels with the same digit +may be used within the same assembly. +References to symbols of the form +``\fIn\fPb'' +refer to the first numeric label ``\fIn\|:\fP'' +\fIb\fP\|ackwards from the reference; +``\fIn\fPf'' +symbols refer to the first numeric label ``\fIn\|:\fP'' +\fIf\fP\P\|orwards from the reference. +Such numeric labels tend to conserve the inventive powers of +the programmer. +.NH 2 +Null statements +.PP +A null statement is an empty statement (which may, however, +have labels). +A null statement is ignored by the assembler. +Common examples of null statements are empty +lines or lines containing only a label. +.NH 2 +Keyword statements +.PP +A keyword statement begins with one of the many predefined +keywords of the assembler; +the syntax of the remainder depends +on the keyword. +All instruction opcodes are keywords. +The remaining keywords are assembler pseudo-operations, +also called directives. +The pseudo-operations are listed below with the syntax they require. +.NH 1 +Expressions +.PP +An expression is a sequence of symbols representing a value. +Its constituents are identifiers, constants, +operators, and parentheses. +Each expression has a type. +.PP +All operators in expressions are fundamentally binary in +nature. +Arithmetic is two's complement and has 32 bits of precision. +There are four levels of precedence, listed here from +lowest precedence level to highest: +.IP (binary) 16 +\|+\|, -\| +.IP (binary) 16 +\||\|, \|&\|, \|^\|, \|!\| +.IP (binary) 16 +\|*\|, \|/\|, \|%\|, \|!\| +.IP (unary) 16 +\|-\|, \|!\| +.PP +All operators of the same precedence are evaluated strictly left to right, +except for evaluation order enforced by parenthesis. +.NH 2 +Expression operators +.PP +The operators are: +.IP + 16 +addition +.IP \- 16 +subtraction +.IP * 16 +multiplication +.IP / 16 +division +.IP % +modulo +.IP & 16 +bitwise and +.IP \(bv 16 +bitwise or +.IP ^ 16 +bitwise exclusive or +.IP "> (or >>)" 16 +logical right shift +.IP "< (or <<)" 16 +logical left shift +.hc +.IP ! 8 +\fIa\fR\|!\|\fIb\fR is \fIa \fBor \fR(\|\fBnot \fIb\fR\|); +i.e., the \fBor\fR of the first operand and +the one's complement of the second; most common use is +as a unary operator. +.PP +Expressions may be grouped by use of parentheses ``\|(\|\|)\|''. +.NH 2 +Types +.PP +The assembler deals with a number of types +of expressions. Most types +are attached to keywords and used to select the +routine which treats that keyword. The types likely +to be met explicitly are: +.IP undefined 8 +.br +Upon first encounter, each symbol is undefined. +It may become undefined if it is assigned an undefined expression. +It is an error to attempt to assemble an undefined +expression in pass 2; in pass 1, it is not (except that +certain keywords require operands which are not undefined). +.IP "undefined external" 8 +.br +A symbol which is declared \fB.globl\fR but not defined +in the current assembly is an undefined +external. +If such a symbol is declared, the link editor \fIld\fR +must be used to load the assembler's output with +another routine that defines the undefined reference. +.IP absolute 8 +.br +An absolute symbol is defined ultimately from a constant. +Its value is unaffected by any possible future applications +of the link-editor to the output file. +.IP text 8 +.br +The value of a text symbol is measured +with respect to the beginning of the text segment of the program. +If the assembler output is link-edited, its text +symbols may change in value +since the program need +not be the first in the link editor's output. +Most text symbols are defined by appearing as labels. +At the start of an assembly, the value of ``\|\fB.\fP\|'' is text 0. +.IP data 8 +.br +The value of a data symbol is measured +with respect to the origin of the data segment of a program. +Like text symbols, the value of a data symbol may change +during a subsequent link-editor run since previously +loaded programs may have data segments. +After the first \fB.data\fR statement, the value of ``\|\fB.\fP\|'' +is data 0. +.IP bss 8 +.br +The value of a bss symbol is measured from +the beginning of the bss segment of a program. +Like text and data symbols, the value of a bss symbol +may change during a subsequent link-editor +run, since previously loaded programs may have bss segments. +.IP "external absolute, text, data, or bss" 8 +.br +symbols declared \fB.globl\fR +but defined within an assembly as absolute, text, data, or bss +symbols may be used exactly as if they were not +declared \fB.globl\fR; however, their value and type are available +to the link editor so that the program may be loaded with others +that reference these symbols. +.IP register 8 +.br +The symbols +.DS +\fBr0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15\fP +\fBap fp sp pc\fP +.DE +are predefined +as register symbols. +In addition, the % operator converts an absolute value to type register. +.IP "other types" 8 +.br +Each keyword known to the assembler has a type which +is used to select the routine which processes +the associated keyword statement. +The behavior of such symbols +when not used as keywords is the same as if they were absolute. +.NH 2 +Type propagation in expressions +.PP +When operands are combined by expression operators, +the result has a type which depends on the types +of the operands and on the operator. +The rules involved are complex to state but +were intended to be sensible and predictable. +For purposes of expression evaluation the +important types are +.DS +undefined +absolute +text +data +bss +undefined external +other +.DE +The combination rules are then: +If one of the operands +is undefined, the result is undefined. +If both operands are absolute, the result is absolute. +If an absolute is combined with one of the ``other types'' +mentioned above, +the result has the other type. +An ``other type'' combined with an explicitly +discussed type other than absolute +it acts like an absolute. +.PP +Further rules applying to particular operators +are: +.IP + +If one operand is text-, data-, or bss-segment +relocatable, or is an undefined external, +the result has the postulated type and the other operand +must be absolute. +.IP \- +If the first operand is a relocatable +text-, data-, or bss-segment symbol, the second operand +may be absolute (in which case the result has the +type of the first operand); +or the second operand may have the same type +as the first (in which case the result is absolute). +If the first operand is external undefined, the second must be +absolute. +All other combinations are illegal. +.PP +.IP others +.br +It is illegal to apply these operators to any but absolute +symbols. +.NH 1 +Pseudo-operations (Directives) +.PP +The keywords listed below introduce +influence the later operations of the assembler. +The metanotation +.DS +[ stuff ] .\|.\|. +.DE +means that 0 or more instances of the given stuff may appear. +The metatnotation +.DS +( stuff )\|*\|\|\fIn\fP\| +.DE +means that exactly \fIn\fP occurances of stuff must occur. +.PP +Boldface tokens are literals, italic words +are substitutable. +.PP +The pseudo\-operations listed below are grouped into functional +categories, and not alphabetically. +.NH 2 +Interface to a Previous Pass +.in +5m +.NH 3 +\&.ABORT +.PP +As soon as the assembler sees this directive, it ignores all further +input (but it does read to the end of file), and aborts the assembly. +No files are created. +It is anticipated that this would be used in a pipe interconnected +version of a compiler, where the first major syntax error would +cause the compiler to issue this directive, saving unnecessary +work in assembling code that would have to be discarded anyway. +.NH 3 +\&.file \fIstring\fP +.PP +This directive causes the assembler to think it is in file \fIstring\fP +so error messages reflect the proper source file. +.NH 3 +\&.line \fIexpression\fP +.PP +This directive causes the assembler to think it is on line \fIexpression\fP +so error messages reflect the proper source file. +.PP +The only effect of assembling multiple files specified in the command string +is to insert the +\fIfile\fP and \fIline\fP directives, with the appropriate values, +at the beginning of the source from each file. +.NH 3 +Preprocessor Interface +.DS +\fI# expression string\fP +\fI# expression\fP +.DE +.PP +This is the only instance where a comment is meaningful to the assembler. +The ``\|#\|'' +.ul 1 +must +be in the first column. +This meta comment causes the assembler +to believe it is on line \fIexpression\fP. +The second argument, if included, causes the assembler to believe it is in +file \fIstring\fP, otherwise the current file name does not change. +.in -5m +.NH 2 +Location Counter Control +.in +5m +.NH 3 +\&\fB.align\fP \fIexpression\fP +.PP +The location counter is adjusted (by assembling bytes containing +zeroes, if necessary) so that the \fIexpression\fP lowest bits +become zero. +Thus ``.align 2'' makes the location counter evenly divisible by 4. +The expression must be defined, absolute, nonnegative, +and less than 16. +(Note that the subsegment concatenation convention +and the current loader conventions may not preserve attempts at aligning +to more than 2 low-order zero bits.) +.NH 3 +Subsegment switching +.DS + \fB.data\fP [ \fIexpression\fP ] + \fB.text\fP [ \fIexpression\fP ] +.DE +.PP +These two pseudo-operations cause the +assembler to begin assembling into the indicated text or data +subsegment. If specified, the expression must be defined and absolute; +an omitted expression is treated as zero. +The effect of a \fB.data\fP directive is treated +as a \fB.text\fP directive if the \fB\-R\fP assembly flag is set. +Assembly starts in the text 0 subsegment. +.NH 3 +\&\fB.org\fP \fIexpression\fP +.PP +The location counter is set equal to the value of the expression. +The expression must be defined. +The value of the expression must be greater than the current value +of the location counter. +.NH 3 +\&\fB.space\fP \fIexpression\fP +.PP +\&\fIexpression\fP bytes of zeroes are assembled. +.in -5m +.NH 2 +Initialized Data +.in +5m +.NH 3 +Expression Initialized Data +.DS + \fB.byte \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|. + \fB.word \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|. + \fB.int \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|. + \fB.long \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|. + + \fB.quad \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|. + + \fB.float \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|. + \fB.double \fIexpression \fR[ \fB, \fIexpression \fR] .\|.\|. +.DE +.PP +The \fIexpression\fP\|s in the comma-separated +list are truncated to the indicated size +(byte=8 bits, +word=16, +int=32, +long=32, +quad=64, +float=32, +double=64) +and +assembled in successive locations. +The expressions must be absolute. +The value assembled in bits 32-63 for \fB.double\fP is zero +if the expression is not of type double. +.PP +Except for \fB.quad\fP, \fB.float\fP and \fB.double\fP, +each expression may optionally be of the form +.DS + \fIexpression\d\s-21\&\s+2\u\fP \fB:\fP \fIexpression\d\s-2\&2\s+2\u\fP. +.DE +In this case the value of \fIexpression\d\s-2\&2\s+2\u\fP +is truncated to \fIexpression\d\s-2\&1\s+2\u\fP +bits and assembled in the next \fIexpr\d\s-2\&1\s+2\u\fP-bit +field which fits in +the natural data size being assembled. +Bits which are skipped because +a field does not fit are made zero. +Thus "\fB.byte\fP 123" is equivalent to +"\fB.byte\fP 8:123" and "\fB.byte\fP 3:1,2:1,5:1" +assembles two bytes, containing the values 9 and 1. +.br +\fBNB:\fP Since no \s-2VAX\s+2 compilers currently use bit fields, +these bit field constructs are liable to disappear in the future. +.NH 3 +String Initialized Data +.DS + \fB.ascii\fP \fIstring\fP [ \fB,\fP \fIstring\fP ] + \fB.asciz\fP \fIstring\fP [ \fB,\fP \fIstring\fP ] +.DE +.PP +Each \fIstring\fP in the list is assembled into successive locations, +with the first letter in the string being placed +into the first location, etc. +The \fB.ascii\fP directive will not null pad the string; +the \fB.asciz\fP directive will null pad the string. +(Recall that strings are known by their length, and need not be terminated +with a null, and that the C conventions for escaping are understood.) +The \fB.ascii\fP directive is identical to: +.DS +\&\fB.byte\fP \fIstring\d\s-2\&0\s+2\u\fP\fB,\fP \fIstring\d\s-2\&1\s+2\u\fP\fB,\fP ... +.DE +.NH 3 +Zero Filled Data +.DS +\fB.space\fP \fIexpression\fP +.DE +.PP +(See \(sc 8.2.4) +\&\fIexpression\fP bytes of zeroes are assembled. +\&\fIexpression\fP must be absolute. +.NH 3 +Arbitrarily Filled Data +.DS +\fB.fill\fP +\fIrep_expr\fP\fB, \fP +\fIsize_expr\fP\fB, \fP +\fIvalue_expr\fP\fR +.DE +.PP +All three expressions must be absolute. +\fIvalue_expr\fP, +treated as an expression of size \fIsize_expr\fP bytes, +is assembled and replicated \fIrep_expr\fP times. +The effect is to advance the current location counter +\fIrep_expr\fP \(** \fIsize_expr\fP bytes. +\fIsize_expr\fP must be between 1 and 8. +.in -5m +.NH 2 +Symbol Definition +.in +5m +.NH 3 +General +.in +5m +.NH 4 +\&\fB.comm\fI name \fB, \fIexpression\fR +.PP +Provided the \fIname\fR is not defined elsewhere, +its type is made ``undefined external'', and its value is \fIexpression\fR. +In fact the \fIname\fR behaves +in the current assembly just like an +undefined external. +However, the link editor \fIld\fR has been special-cased +so that all external symbols which are not +otherwise defined, and which have a non-zero +value, are defined to lie in the bss +segment, and enough space is left after the +symbol to hold \fIexpression\fR +bytes. +.NH 4 +\&\fB.lcomm\fI name \fB, \fIexpression\fR +.PP +\fIexpression\fP bytes will be allocated in the bss segment and \fIname\fP +assigned the location of the first byte, but the \fIname\fP is not declared +as global and hence will be unknown to the link editor. +.NH 4 +\&\fB.globl\fP \fIname\fP +.PP +This statement makes the \fIname\fR external. +If it is otherwise defined (by \fB.set\fP or by +appearance as a label) +it acts within the assembly exactly as if +the \fB.globl\fR statement were not given; +however, the link editor may be used +to combine this routine with other routines that refer +to this symbol. +.PP +Conversely, if the given symbol is not defined +within the current assembly, the link editor +can combine the output of this assembly +with that of others which define the symbol. +The assembler makes all otherwise +undefined symbols external. +.NH 4 +\&\fB.set\fP \fIname\fP \fB,\fP \fIexpression\fP +.PP +The (\fIname\fP, \fIexpression\fP) pair is entered into the symbol table. +Multiple \fB.set\fP statements with the same name are legal; +the most recent value replaces all previous values. +.in -5m +.NH 3 +Debugger Support +.in +5m +.NH 4 +\&\fB.lsym\fP \fIname\fP \fB,\fP \fIexpression\fP +.PP +A unique and otherwise unreferenceable instance of the +(\fIname\fP, \fIexpression\fP) +pair is created in the symbol table. +The Fortran 77 compiler uses this mechanism to pass local symbol definitions +to the link editor and debugger. +.NH 4 +Special Symbol Table entries +.DS +\&\fB.stab\fP (\fIexpr\d\s-2i\s+2\u \fB,\fR)\|*NCPS\| \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP expr\d\s-2\&4\s+2\u\fR +.in +5m +\fR(normal \fBs\fPymbol \fBtab\fPle entry)\fR +.in -5m +\&\fB.stabs\fP \fIstring, expr\d\s-2\&1\s+2\u, expr\d\s-2\&2\s+2\u, expr\d\s-2\&3\s+2\u, expr\d\s-2\&4\s+2\u\fR +.in +5m +\fR(\fBstab s\fPtring)\fR +.in -5m +\&\fB.stabn\fP \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP expr\d\s-2\&4\s+2\u\fR +.in +5m +\fR(\fBstab n\fPone)\fR +.in -5m +\&\fB.stabd\fP \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fR +.in +5m +\fR(\fBstab d\fPot)\fR +.in -5m +.DE +.PP +The \fIstab\fP directives place symbols in the symbol table for the symbolic +debugger, \fIsdb\fP\s-2\u*\d\s+2. +.FS +.in +5 +.ti -5 +\s-2\u*\d\s+2Katseff, H.P. \fISdb: A Symbol Debugger\fP. +Bell Laboratories, Holmdel, +NJ. April 12, 1979. +.br +.ti -5 +\&Katseff, H.P. \fISymbol Table Format for Sdb\fP. File 39394, +Bell Laboratores, Holmdel, NJ. March 14, 1979. +.in -5 +.FE +In the \fB.stab\fP directive, +the first NCPS expressions are used for the +symbol name, which may be zero. +The \fB.stab\fP directive makes no sense if +the assembler recognizes arbitrary length symbols; +if so, the assembler complains. +The \fIstring\fP in the \fB.stabs\fP +directive more generally serves the same purpose as the NCPS expressions. +If the symbol name is zero, the +\&\fB.stabn\fP directive may be used instead. +.PP +The other expressions are stored in the name list structure in the symbol +table and preserved by the loader for reference by \fIsdb\fP\fR; +the value of the expressions are peculiar to formats required by \fIsdb\fP\fR. +.in +5m +.ti -5 +\&\fIexpr\d\s-2\&1\s+2\u\fP is used as a symbol table tag +(nlist field \fIn_type\fP). +.br +.ti -5 +\&\fIexpr\d\s-2\&2\s+2\u\fP seems to always be zero +(nlist field \fIn_other\fP). +.br +.ti -5 +\&\fIexpr\d\s-2\&3\s+2\u\fP is used for either the +source line number, or for a nesting level +(nlist field \fIn_desc\fP). +.br +.ti -5 +\fIexpr\d\s-2\&4\s+2\u\fR is used as tag specific information +(nlist field \fIn_value\fP). +In the +case of the \fB.stabd\fP directive, this expression is nonexistant, and +is taken to be the value of the location counter at the following instruction. +Since there is no associated name for a \fB.stabd\fP directive, it can +only be used in circumstances where the name is zero. +The effect of a \fB.stabd\fP directive can be achieved by one of the other +\&\fB.stab\fPx directives in the following manner: +.in -5m +.DS +\& \fB.stabs\fP \fIstring\fB,\fP expr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP \fP LL\fIn\fP +LL\fIn\fP\fB:\fP +.DE +The \fB.stabd\fP directive is prefered, because it does not clog the symbol +table with labels used only for the stab symbol entries. +.in -5m +.in -5m +.NH 1 +Machine instructions +.PP +The syntax of machine instruction statements accepted by \fIas\fP +is generally similar to the syntax of \s8DEC MACRO\s10-32. There are +differences, however. +.NH 2 +Character set +.PP +\fIas\fP uses the character `$' instead of `#', +and the character `*' instead of `@'. Opcodes and register names +are spelled with lower-case rather than upper-case letters. +.NH 2 +Lengths +.PP +Under certain circumstances, the following constructs are (optionallly) +recognized by \&\fIas\fP to indicate the number of bytes to allocate +for unresolved expressions used to specify displacement or indirect +displacement addressing modes: +.DS +\&\fBB^\fP or \fBB\`\fP to indicate byte lengths (1 byte) +\&\fBW^\fP or \fBW\`\fP to indicate word lengths (2 bytes) +\&\fBL^\fP or \fBL\`\fP to indicate long word lengths (3 bytes) +.DE +One can also use lower case \fBb\fP, \fBw\fP or \fBl\fP instead of the upper +case letters. +There must be no space between the size specifier letter and the \fB^\fP or +\&\fB\`\fP. +The constructs \fBS^\fP and \fBG^\fP are not recognized +by \fIas\fP as they are by the \s-2DEC\s+2 assembler. +It is preferred to use the "\`" displacement specifier, +so that the ``^'' is not +misinterpreted as the \fBxor\fP operator. +.PP +Literal values (including floating-point literals used where the +hardware expects a floating-point operand) are assembled as short +literals if possible, hence not needing the \fBS^\fP \s-2DEC\s+2 +directive. If the value of the displacement is known exactly in the +first pass \fIas\fP determines the length automatically, assembling it +in the shortest possible way, ignoring (if present) the length +expression. If the value of the displacement is not known in the first +pass, \&\fI\fP will use the value of the displacement given by the +optional length specifier, or will use the value specified by the +\fB\-d\fP argument, or will default to 4 bytes. +.NH 2 +CASE instructions +.PP +\fIas\fP considers the instructions \fBcaseb\fP, \fBcasel\fP, \fBcasew\fP +to have three operands (namely: selector, base, limit). +The displacements must be explicitly assembled using one +or more \fB.word\fP statements. +.NH 2 +Extended branch instructions +.PP +These opcodes (formed in general +by substituting a ``j'' for the initial ``b'' +of the standard opcodes) +take as branch destinations the name of a label in the current +subsegment. If the destination is close enough then the corresponding +``b'' instruction is assembled. Otherwise the assembler choses a sequence +of one or more instructions which together have the same effect as if the +``b'' instruction had a larger span. In general, \fIas\fP chooses the +inverse branch followed by a \fBbrw\fP, but a \fBbrw\fP +is sometimes pooled among several ``j'' instructions with the same +destination. +If the \fB\-J\fP assembler option is given, +a \fBjmp\fP instruction is used instead of a \fBbrw\fP instruction +for \fBALL\fP (!!) ``j'' instructions with distant destinations. +This makes assembly of large (>32K bytes) assembly programs (inefficiently) +possible. +The current assembler does not try to use clever combinations of \fBbrb\fP, +\fBbrw\fP and \fBjmp\fP instructions. +The \fBjmp\fP instructions use PC relative addressing, with +the length of the offset given by the ``\fB\-d\fP'' assembler +option. +.KS +.DS +.ft B +.ta 1.0i 2.0i 3.0i +jeql jeqlu jneq jnequ +jgeq jgequ jgtr jgtru +jleq jlequ jlss jlssu +jbcc jbsc jbcs jbss +jlbc jlbs +jcc jcs +jvc jvs +jbc jbs +jbr +.DE +.KE +\fBjbr\fR turns into \fBbrb\fR +if its target is close enough; else a \fBbrw\fP is used. +.NH 1 +Diagnostics +.PP +Diagnostics are intended to be self explanatory and appear on +the standard output. +.NH 1 +Limits +.DS +.ta 2.0i +Arbitrary Files to assemble +Arbitrary Significant characters per name +127 Characters per input line +127 Characters per string +Arbitrary Symbols +4 Text segments +4 Data segments +.DE diff --git a/usr/doc/awk b/usr/doc/awk new file mode 100644 index 0000000000..758da7dc76 --- /dev/null +++ b/usr/doc/awk @@ -0,0 +1,1401 @@ +.fp 3 G +....TM "78-1271-12, 78-1273-6" 39199 39199-11 +.ND "September 1, 1978" +....TR 68 +.RP +. \" macros here +.tr _\(em +.if t .tr ~\(ap +.tr |\(or +.tr *\(** +.de UC +\&\\$3\s-1\\$1\\s0\&\\$2 +.. +.de IT +.if n .ul +\&\\$3\f2\\$1\fP\|\\$2 +.. +.de UL +.if n .ul +\&\\$3\f3\\$1\fP\&\\$2 +.. +.de P1 +.DS I 3n +.nf +.if n .ta 5 10 15 20 25 30 35 40 45 50 55 60 +.if t .ta .3i .6i .9i 1.2i +.if t .tr -\-'\(fm*\(** +.if t .tr _\(ul +.ft 3 +.lg 0 +.ss 18 +. \"use first argument as indent if present +.. +.de P2 +.ps \\n(PS +.vs \\n(VSp +.ft R +.ss 12 +.if n .ls 2 +.tr --''``^^!! +.if t .tr _\(em +.fi +.lg +.DE +.. +.hw semi-colon +.hy 14 +. \"2=not last lines; 4= no -xx; 8=no xx- +. \"special chars in programs +.de WS +.sp \\$1 +.. +. \" end of macros +.TL +Awk \(em A Pattern Scanning and Processing Language +.br +(Second Edition) +.AU "MH 2C-522" 4862 +Alfred V. Aho +.AU "MH 2C-518" 6021 +Brian W. Kernighan +.AU "MH 2C-514" 7214 +Peter J. Weinberger +.AI +.MH +.AB +.IT Awk +is a programming language whose +basic operation +is to search a set of files +for patterns, and to perform specified actions upon lines or fields of lines which +contain instances of those patterns. +.IT Awk +makes certain data selection and transformation operations easy to express; +for example, the +.IT awk +program +.sp +.ce +.ft 3 +length > 72 +.ft +.sp +prints all input lines whose length exceeds 72 characters; +the program +.ce +.sp +.ft 3 +NF % 2 == 0 +.ft R +.sp +prints all lines with an even number of fields; +and the program +.ce +.sp +.ft 3 +{ $1 = log($1); print } +.ft R +.sp +replaces the first field of each line by its logarithm. +.PP +.IT Awk +patterns may include arbitrary boolean combinations of regular expressions +and of relational operators on strings, numbers, fields, variables, and array elements. +Actions may include the same pattern-matching constructions as in patterns, +as well as +arithmetic and string expressions and assignments, +.UL if-else , +.UL while , +.UL for +statements, +and multiple output streams. +.PP +This report contains a user's guide, a discussion of the design and implementation of +.IT awk , +and some timing statistics. +....It supersedes TM-77-1271-5, dated September 8, 1977. +.AE +.CS 6 1 7 0 1 4 +.if n .ls 2 +.nr PS 9 +.nr VS 11 +.NH +Introduction +.if t .2C +.PP +.IT Awk +is a programming language designed to make +many common +information retrieval and text manipulation tasks +easy to state and to perform. +.PP +The basic operation of +.IT awk +is to scan a set of input lines in order, +searching for lines which match any of a set of patterns +which the user has specified. +For each pattern, an action can be specified; +this action will be performed on each line that matches the pattern. +.PP +Readers familiar with the +.UX +program +.IT grep\| +.[ +unix program manual +.] +will recognize +the approach, although in +.IT awk +the patterns may be more +general than in +.IT grep , +and the actions allowed are more involved than merely +printing the matching line. +For example, the +.IT awk +program +.P1 +{print $3, $2} +.P2 +prints the third and second columns of a table +in that order. +The program +.P1 +$2 ~ /A\||B\||C/ +.P2 +prints all input lines with an A, B, or C in the second field. +The program +.P1 +$1 != prev { print; prev = $1 } +.P2 +prints all lines in which the first field is different +from the previous first field. +.NH 2 +Usage +.PP +The command +.P1 +awk program [files] +.P2 +executes the +.IT awk +commands in +the string +.UL program +on the set of named files, +or on the standard input if there are no files. +The statements can also be placed in a file +.UL pfile , +and executed by the command +.P1 +awk -f pfile [files] +.P2 +.NH 2 +Program Structure +.PP +An +.IT awk +program is a sequence of statements of the form: +.P1 +.ft I + pattern { action } + pattern { action } + ... +.ft 3 +.P2 +Each line of input +is matched against +each of the patterns in turn. +For each pattern that matches, the associated action +is executed. +When all the patterns have been tested, the next line +is fetched and the matching starts over. +.PP +Either the pattern or the action may be left out, +but not both. +If there is no action for a pattern, +the matching line is simply +copied to the output. +(Thus a line which matches several patterns can be printed several times.) +If there is no pattern for an action, +then the action is performed for every input line. +A line which matches no pattern is ignored. +.PP +Since patterns and actions are both optional, +actions must be enclosed in braces +to distinguish them from patterns. +.NH 2 +Records and Fields +.PP +.IT Awk +input is divided into +``records'' terminated by a record separator. +The default record separator is a newline, +so by default +.IT awk +processes its input a line at a time. +The number of the current record is available in a variable +named +.UL NR . +.PP +Each input record +is considered to be divided into ``fields.'' +Fields are normally separated by +white space \(em blanks or tabs \(em +but the input field separator may be changed, as described below. +Fields are referred to as +.UL "$1, $2," +and so forth, +where +.UL $1 +is the first field, +and +.UL $0 +is the whole input record itself. +Fields may be assigned to. +The number of fields in the current record +is available in a variable named +.UL NF . +.PP +The variables +.UL FS +and +.UL RS +refer to the input field and record separators; +they may be changed at any time to any single character. +The optional command-line argument +\f3\-F\fIc\fR +may also be used to set +.UL FS +to the character +.IT c . +.PP +If the record separator is empty, +an empty input line is taken as the record separator, +and blanks, tabs and newlines are treated as field separators. +.PP +The variable +.UL FILENAME +contains the name of the current input file. +.NH 2 +Printing +.PP +An action may have no pattern, +in which case the action is executed for +all +lines. +The simplest action is to print some or all of a record; +this is accomplished by the +.IT awk +command +.UL print . +The +.IT awk +program +.P1 +{ print } +.P2 +prints each record, thus copying the input to the output intact. +More useful is to print a field or fields from each record. +For instance, +.P1 +print $2, $1 +.P2 +prints the first two fields in reverse order. +Items separated by a comma in the print statement will be separated by the current output field separator +when output. +Items not separated by commas will be concatenated, +so +.P1 +print $1 $2 +.P2 +runs the first and second fields together. +.PP +The predefined variables +.UL NF +and +.UL NR +can be used; +for example +.P1 +{ print NR, NF, $0 } +.P2 +prints each record preceded by the record number and the number of fields. +.PP +Output may be diverted to multiple files; +the program +.P1 +{ print $1 >"foo1"; print $2 >"foo2" } +.P2 +writes the first field, +.UL $1 , +on the file +.UL foo1 , +and the second field on file +.UL foo2 . +The +.UL >> +notation can also be used: +.P1 +print $1 >>"foo" +.P2 +appends the output to the file +.UL foo . +(In each case, +the output files are +created if necessary.) +The file name can be a variable or a field as well as a constant; +for example, +.P1 +print $1 >$2 +.P2 +uses the contents of field 2 as a file name. +.PP +Naturally there is a limit on the number of output files; +currently it is 10. +.PP +Similarly, output can be piped into another process +(on +.UC UNIX +only); for instance, +.P1 +print | "mail bwk" +.P2 +mails the output to +.UL bwk . +.PP +The variables +.UL OFS +and +.UL ORS +may be used to change the current +output field separator and output +record separator. +The output record separator is +appended to the output of the +.UL print +statement. +.PP +.IT Awk +also provides the +.UL printf +statement for output formatting: +.P1 +printf format expr, expr, ... +.P2 +formats the expressions in the list +according to the specification +in +.UL format +and prints them. +For example, +.P1 +printf "%8.2f %10ld\en", $1, $2 +.P2 +prints +.UL $1 +as a floating point number 8 digits wide, +with two after the decimal point, +and +.UL $2 +as a 10-digit long decimal number, +followed by a newline. +No output separators are produced automatically; +you must add them yourself, +as in this example. +The version of +.UL printf +is identical to that used with C. +.[ +C programm language prentice hall 1978 +.] +.NH 1 +Patterns +.PP +A pattern in front of an action acts as a selector +that determines whether the action is to be executed. +A variety of expressions may be used as patterns: +regular expressions, +arithmetic relational expressions, +string-valued expressions, +and arbitrary boolean +combinations of these. +.NH 2 +BEGIN and END +.PP +The special pattern +.UL BEGIN +matches the beginning of the input, +before the first record is read. +The pattern +.UL END +matches the end of the input, +after the last record has been processed. +.UL BEGIN +and +.UL END +thus provide a way to gain control before and after processing, +for initialization and wrapup. +.PP +As an example, the field separator +can be set to a colon by +.P1 +BEGIN { FS = ":" } +.ft I +\&... rest of program ... +.ft 3 +.P2 +Or the input lines may be counted by +.P1 +END { print NR } +.P2 +If +.UL BEGIN +is present, it must be the first pattern; +.UL END +must be the last if used. +.NH 2 +Regular Expressions +.PP +The simplest regular expression is a literal string of characters +enclosed in slashes, +like +.P1 +/smith/ +.P2 +This +is actually a complete +.IT awk +program which +will print all lines which contain any occurrence +of the name ``smith''. +If a line contains ``smith'' +as part of a larger word, +it will also be printed, as in +.P1 +blacksmithing +.P2 +.PP +.IT Awk +regular expressions include the regular expression +forms found in +the +.UC UNIX +text editor +.IT ed\| +.[ +unix program manual +.] +and +.IT grep +(without back-referencing). +In addition, +.IT awk +allows +parentheses for grouping, | for alternatives, +.UL + +for ``one or more'', and +.UL ? +for ``zero or one'', +all as in +.IT lex . +Character classes +may be abbreviated: +.UL [a\-zA\-Z0\-9] +is the set of all letters and digits. +As an example, +the +.IT awk +program +.P1 +/[Aa]ho\||[Ww]einberger\||[Kk]ernighan/ +.P2 +will print all lines which contain any of the names +``Aho,'' ``Weinberger'' or ``Kernighan,'' +whether capitalized or not. +.PP +Regular expressions +(with the extensions listed above) +must be enclosed in slashes, +just as in +.IT ed +and +.IT sed . +Within a regular expression, +blanks and the regular expression +metacharacters are significant. +To turn of the magic meaning +of one of the regular expression characters, +precede it with a backslash. +An example is the pattern +.P1 +/\|\e/\^.\^*\e// +.P2 +which matches any string of characters +enclosed in slashes. +.PP +One can also specify that any field or variable +matches +a regular expression (or does not match it) with the operators +.UL ~ +and +.UL !~ . +The program +.P1 +$1 ~ /[jJ]ohn/ +.P2 +prints all lines where the first field matches ``john'' or ``John.'' +Notice that this will also match ``Johnson'', ``St. Johnsbury'', and so on. +To restrict it to exactly +.UL [jJ]ohn , +use +.P1 +$1 ~ /^[jJ]ohn$/ +.P2 +The caret ^ refers to the beginning +of a line or field; +the dollar sign +.UL $ +refers to the end. +.NH 2 +Relational Expressions +.PP +An +.IT awk +pattern can be a relational expression +involving the usual relational operators +.UL < , +.UL <= , +.UL == , +.UL != , +.UL >= , +and +.UL > . +An example is +.P1 +$2 > $1 + 100 +.P2 +which selects lines where the second field +is at least 100 greater than the first field. +Similarly, +.P1 +NF % 2 == 0 +.P2 +prints lines with an even number of fields. +.PP +In relational tests, if neither operand is numeric, +a string comparison is made; +otherwise it is numeric. +Thus, +.P1 +$1 >= "s" +.P2 +selects lines that begin with an +.UL s , +.UL t , +.UL u , +etc. +In the absence of any other information, +fields are treated as strings, so +the program +.P1 +$1 > $2 +.P2 +will perform a string comparison. +.NH 2 +Combinations of Patterns +.PP +A pattern can be any boolean combination of patterns, +using the operators +.UL \||\|| +(or), +.UL && +(and), and +.UL ! +(not). +For example, +.P1 +$1 >= "s" && $1 < "t" && $1 != "smith" +.P2 +selects lines where the first field begins with ``s'', but is not ``smith''. +.UL && +and +.UL \||\|| +guarantee that their operands +will be evaluated +from left to right; +evaluation stops as soon as the truth or falsehood +is determined. +.NH 2 +Pattern Ranges +.PP +The ``pattern'' that selects an action may also +consist of two patterns separated by a comma, as in +.P1 +pat1, pat2 { ... } +.P2 +In this case, the action is performed for each line between +an occurrence of +.UL pat1 +and the next occurrence of +.UL pat2 +(inclusive). +For example, +.P1 +/start/, /stop/ +.P2 +prints all lines between +.UL start +and +.UL stop , +while +.P1 +NR == 100, NR == 200 { ... } +.P2 +does the action for lines 100 through 200 +of the input. +.NH 1 +Actions +.PP +An +.IT awk +action is a sequence of action statements +terminated by newlines or semicolons. +These action statements can be used to do a variety of +bookkeeping and string manipulating tasks. +.NH 2 +Built-in Functions +.PP +.IT Awk +provides a ``length'' function +to compute the length of a string of characters. +This program prints each record, +preceded by its length: +.P1 +{print length, $0} +.P2 +.UL length +by itself is a ``pseudo-variable'' which +yields the length of the current record; +.UL length(argument) +is a function which yields the length of its argument, +as in +the equivalent +.P1 +{print length($0), $0} +.P2 +The argument may be any expression. +.PP +.IT Awk +also +provides the arithmetic functions +.UL sqrt , +.UL log , +.UL exp , +and +.UL int , +for +square root, +base +.IT e +logarithm, +exponential, +and integer part of their respective arguments. +.PP +The name of one of these built-in functions, +without argument or parentheses, +stands for the value of the function on the +whole record. +The program +.P1 +length < 10 || length > 20 +.P2 +prints lines whose length +is less than 10 or greater +than 20. +.PP +The function +.UL substr(s,\ m,\ n) +produces the substring of +.UL s +that begins at position +.UL m +(origin 1) +and is at most +.UL n +characters long. +If +.UL n +is omitted, the substring goes to the end of +.UL s . +The function +.UL index(s1,\ s2) +returns the position where the string +.UL s2 +occurs in +.UL s1 , +or zero if it does not. +.PP +The function +.UL sprintf(f,\ e1,\ e2,\ ...) +produces the value of the expressions +.UL e1 , +.UL e2 , +etc., +in the +.UL printf +format specified by +.UL f . +Thus, for example, +.P1 +x = sprintf("%8.2f %10ld", $1, $2) +.P2 +sets +.UL x +to the string produced by formatting +the values of +.UL $1 +and +.UL $2 . +.NH 2 +Variables, Expressions, and Assignments +.PP +.IT Awk +variables take on numeric (floating point) +or string values according to context. +For example, in +.P1 +x = 1 +.P2 +.UL x +is clearly a number, while in +.P1 +x = "smith" +.P2 +it is clearly a string. +Strings are converted to numbers and +vice versa whenever context demands it. +For instance, +.P1 +x = "3" + "4" +.P2 +assigns 7 to +.UL x . +Strings which cannot be interpreted +as numbers in a numerical context +will generally have numeric value zero, +but it is unwise to count on this behavior. +.PP +By default, variables (other than built-ins) are initialized to the null string, +which has numerical value zero; +this eliminates the need for most +.UL BEGIN +sections. +For example, the sums of the first two fields can be computed by +.P1 + { s1 += $1; s2 += $2 } +END { print s1, s2 } +.P2 +.PP +Arithmetic is done internally in floating point. +The arithmetic operators are +.UL + , +.UL \- , +.UL \(** , +.UL / , +and +.UL % +(mod). +The C increment +.UL ++ +and +decrement +.UL \-\- +operators are also available, +and so are the assignment operators +.UL += , +.UL \-= , +.UL *= , +.UL /= , +and +.UL %= . +These operators may all be used in expressions. +.NH 2 +Field Variables +.PP +Fields in +.IT awk +share essentially all of the properties of variables _ +they may be used in arithmetic or string operations, +and may be assigned to. +Thus one can +replace the first field with a sequence number like this: +.P1 +{ $1 = NR; print } +.P2 +or +accumulate two fields into a third, like this: +.P1 +{ $1 = $2 + $3; print $0 } +.P2 +or assign a string to a field: +.P1 +{ if ($3 > 1000) + $3 = "too big" + print +} +.P2 +which replaces the third field by ``too big'' when it is, +and in any case prints the record. +.PP +Field references may be numerical expressions, +as in +.P1 +{ print $i, $(i+1), $(i+n) } +.P2 +Whether a field is deemed numeric or string depends on context; +in ambiguous cases like +.P1 +if ($1 == $2) ... +.P2 +fields are treated as strings. +.PP +Each input line is split into fields automatically as necessary. +It is also possible to split any variable or string +into fields: +.P1 +n = split(s, array, sep) +.P2 +splits the +the string +.UL s +into +.UL array[1] , +\&..., +.UL array[n] . +The number of elements found is returned. +If the +.UL sep +argument is provided, it is used as the field separator; +otherwise +.UL FS +is used as the separator. +.NH 2 +String Concatenation +.PP +Strings may be concatenated. +For example +.P1 +length($1 $2 $3) +.P2 +returns the length of the first three fields. +Or in a +.UL print +statement, +.P1 +print $1 " is " $2 +.P2 +prints +the two fields separated by `` is ''. +Variables and numeric expressions may also appear in concatenations. +.NH 2 +Arrays +.PP +Array elements are not declared; +they spring into existence by being mentioned. +Subscripts may have +.ul +any +non-null +value, including non-numeric strings. +As an example of a conventional numeric subscript, +the statement +.P1 +x[NR] = $0 +.P2 +assigns the current input record to +the +.UL NR -th +element of the array +.UL x . +In fact, it is possible in principle (though perhaps slow) +to process the entire input in a random order with the +.IT awk +program +.P1 + { x[NR] = $0 } +END { \fI... program ...\fP } +.P2 +The first action merely records each input line in +the array +.UL x . +.PP +Array elements may be named by non-numeric values, +which gives +.IT awk +a capability rather like the associative memory of +Snobol tables. +Suppose the input contains fields with values like +.UL apple , +.UL orange , +etc. +Then the program +.P1 +/apple/ { x["apple"]++ } +/orange/ { x["orange"]++ } +END { print x["apple"], x["orange"] } +.P2 +increments counts for the named array elements, +and prints them at the end of the input. +.NH 2 +Flow-of-Control Statements +.PP +.IT Awk +provides the basic flow-of-control statements +.UL if-else , +.UL while , +.UL for , +and statement grouping with braces, as in C. +We showed the +.UL if +statement in section 3.3 without describing it. +The condition in parentheses is evaluated; +if it is true, the statement following the +.UL if +is done. +The +.UL else +part is optional. +.PP +The +.UL while +statement is exactly like that of C. +For example, to print all input fields one per line, +.P1 +i = 1 +while (i <= NF) { + print $i + ++i +} +.P2 +.PP +The +.UL for +statement is also exactly that of C: +.P1 +for (i = 1; i <= NF; i++) + print $i +.P2 +does the same job as the +.UL while +statement above. +.PP +There is an alternate form of the +.UL for +statement which is suited for accessing the +elements of an associative array: +.P1 +for (i in array) + \fIstatement\f3 +.P2 +does +.ul +statement +with +.UL i +set in turn to each element of +.UL array . +The elements are accessed in an apparently random order. +Chaos will ensue if +.UL i +is altered, or if any new elements are +accessed during the loop. +.PP +The expression in the condition part of an +.UL if , +.UL while +or +.UL for +can include relational operators like +.UL < , +.UL <= , +.UL > , +.UL >= , +.UL == +(``is equal to''), +and +.UL != +(``not equal to''); +regular expression matches with the match operators +.UL ~ +and +.UL !~ ; +the logical operators +.UL \||\|| , +.UL && , +and +.UL ! ; +and of course parentheses for grouping. +.PP +The +.UL break +statement causes an immediate exit +from an enclosing +.UL while +or +.UL for ; +the +.UL continue +statement +causes the next iteration to begin. +.PP +The statement +.UL next +causes +.IT awk +to skip immediately to +the next record and begin scanning the patterns from the top. +The statement +.UL exit +causes the program to behave as if the end of the input +had occurred. +.PP +Comments may be placed in +.IT awk +programs: +they begin with the character +.UL # +and end with the end of the line, +as in +.P1 +print x, y # this is a comment +.P2 +.NH +Design +.PP +The +.UX +system +already provides several programs that +operate by passing input through a +selection mechanism. +.IT Grep , +the first and simplest, merely prints all lines which +match a single specified pattern. +.IT Egrep +provides more general patterns, i.e., regular expressions +in full generality; +.IT fgrep +searches for a set of keywords with a particularly fast algorithm. +.IT Sed\| +.[ +unix programm manual +.] +provides most of the editing facilities of +the editor +.IT ed , +applied to a stream of input. +None of these programs provides +numeric capabilities, +logical relations, +or variables. +.PP +.IT Lex\| +.[ +lesk lexical analyzer cstr +.] +provides general regular expression recognition capabilities, +and, by serving as a C program generator, +is essentially open-ended in its capabilities. +The use of +.IT lex , +however, requires a knowledge of C programming, +and a +.IT lex +program must be compiled and loaded before use, +which discourages its use for one-shot applications. +.PP +.IT Awk +is an attempt +to fill in another part of the matrix of possibilities. +It +provides general regular expression capabilities +and an implicit input/output loop. +But it also provides convenient numeric processing, +variables, +more general selection, +and control flow in the actions. +It +does not require compilation or a knowledge of C. +Finally, +.IT awk +provides +a convenient way to access fields within lines; +it is unique in this respect. +.PP +.IT Awk +also tries to integrate strings and numbers +completely, +by treating all quantities as both string and numeric, +deciding which representation is appropriate +as late as possible. +In most cases the user can simply ignore the differences. +.PP +Most of the effort in developing +.I awk +went into deciding what +.I awk +should or should not do +(for instance, it doesn't do string substitution) +and what the syntax should be +(no explicit operator for concatenation) +rather +than on writing or debugging the code. +We have tried +to make the syntax powerful +but easy to use and well adapted +to scanning files. +For example, +the absence of declarations and implicit initializations, +while probably a bad idea for a general-purpose programming language, +is desirable in a language +that is meant to be used for tiny programs +that may even be composed on the command line. +.PP +In practice, +.IT awk +usage seems to fall into two broad categories. +One is what might be called ``report generation'' \(em +processing an input to extract counts, +sums, sub-totals, etc. +This also includes the writing of trivial +data validation programs, +such as verifying that a field contains only numeric information +or that certain delimiters are properly balanced. +The combination of textual and numeric processing is invaluable here. +.PP +A second area of use is as a data transformer, +converting data from the form produced by one program +into that expected by another. +The simplest examples merely select fields, perhaps with rearrangements. +.NH +Implementation +.PP +The actual implementation of +.IT awk +uses the language development tools available +on the +.UC UNIX +operating system. +The grammar is specified with +.IT yacc ; +.[ +yacc johnson cstr +.] +the lexical analysis is done by +.IT lex ; +the regular expression recognizers are +deterministic finite automata +constructed directly from the expressions. +An +.IT awk +program is translated into a +parse tree which is then directly executed +by a simple interpreter. +.PP +.IT Awk +was designed for ease of use rather than processing speed; +the delayed evaluation of variable types +and the necessity to break input +into fields makes high speed difficult to achieve in any case. +Nonetheless, +the program has not proven to be unworkably slow. +.PP +Table I below shows the execution (user + system) time +on a PDP-11/70 of +the +.UC UNIX +programs +.IT wc , +.IT grep , +.IT egrep , +.IT fgrep , +.IT sed , +.IT lex , +and +.IT awk +on the following simple tasks: +.IP "\ \ 1." +count the number of lines. +.IP "\ \ 2." +print all lines containing ``doug''. +.IP "\ \ 3." +print all lines containing ``doug'', ``ken'' or ``dmr''. +.IP "\ \ 4." +print the third field of each line. +.IP "\ \ 5." +print the third and second fields of each line, in that order. +.IP "\ \ 6." +append all lines containing ``doug'', ``ken'', and ``dmr'' +to files ``jdoug'', ``jken'', and ``jdmr'', respectively. +.IP "\ \ 7." +print each line prefixed by ``line-number\ :\ ''. +.IP "\ \ 8." +sum the fourth column of a table. +.LP +The program +.IT wc +merely counts words, lines and characters in its input; +we have already mentioned the others. +In all cases the input was a file containing +10,000 lines +as created by the +command +.IT "ls \-l" ; +each line has the form +.P1 +-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx +.P2 +The total length of this input is +452,960 characters. +Times for +.IT lex +do not include compile or load. +.PP +As might be expected, +.IT awk +is not as fast as the specialized tools +.IT wc , +.IT sed , +or the programs in the +.IT grep +family, +but +is faster than the more general tool +.IT lex . +In all cases, the tasks were +about as easy to express as +.IT awk +programs +as programs in these other languages; +tasks involving fields were +considerably easier to express as +.IT awk +programs. +Some of the test programs are shown in +.IT awk , +.IT sed +and +.IT lex . +.[ +$LIST$ +.] +.1C +.TS +center; +c c c c c c c c c +c c c c c c c c c +c|n|n|n|n|n|n|n|n|. + Task +Program 1 2 3 4 5 6 7 8 +_ +\fIwc\fR 8.6 +\fIgrep\fR 11.7 13.1 +\fIegrep\fR 6.2 11.5 11.6 +\fIfgrep\fR 7.7 13.8 16.1 +\fIsed\fR 10.2 11.6 15.8 29.0 30.5 16.1 +\fIlex\fR 65.1 150.1 144.2 67.7 70.3 104.0 81.7 92.8 +\fIawk\fR 15.0 25.6 29.9 33.3 38.9 46.4 71.4 31.1 +_ +.TE +.sp +.ce +\fBTable I.\fR Execution Times of Programs. (Times are in sec.) +.sp 2 +.2C +.PP +The programs for some of these jobs are shown below. +The +.IT lex +programs are generally too long to show. +.LP +AWK: +.LP +.P1 +1. END {print NR} +.P2 +.P1 +2. /doug/ +.P2 +.P1 +3. /ken|doug|dmr/ +.P2 +.P1 +4. {print $3} +.P2 +.P1 +5. {print $3, $2} +.P2 +.P1 +6. /ken/ {print >"jken"} + /doug/ {print >"jdoug"} + /dmr/ {print >"jdmr"} +.P2 +.P1 +7. {print NR ": " $0} +.P2 +.P1 +8. {sum = sum + $4} + END {print sum} +.P2 +.LP +SED: +.LP +.P1 +1. $= +.P2 +.P1 +2. /doug/p +.P2 +.P1 +3. /doug/p + /doug/d + /ken/p + /ken/d + /dmr/p + /dmr/d +.P2 +.P1 +4. /[^ ]* [ ]*[^ ]* [ ]*\e([^ ]*\e) .*/s//\e1/p +.P2 +.P1 +5. /[^ ]* [ ]*\e([^ ]*\e) [ ]*\e([^ ]*\e) .*/s//\e2 \e1/p +.P2 +.P1 +6. /ken/w jken + /doug/w jdoug + /dmr/w jdmr +.P2 +.LP +LEX: +.LP +.P1 +1. %{ + int i; + %} + %% + \en i++; + . ; + %% + yywrap() { + printf("%d\en", i); + } +.P2 +.P1 +2. %% + ^.*doug.*$ printf("%s\en", yytext); + . ; + \en ; +.P2 diff --git a/usr/doc/bc b/usr/doc/bc new file mode 100644 index 0000000000..58d839c945 --- /dev/null +++ b/usr/doc/bc @@ -0,0 +1,1059 @@ +.RP +.TL +BC \- An Arbitrary Precision Desk-Calculator Language +.AU +Lorinda Cherry +.AU +Robert Morris +.AI +.MH +.AB +BC is a language and a compiler for doing arbitrary precision arithmetic +on the PDP-11 under the +.UX +time-sharing +system. The output of the compiler is interpreted and executed by +a collection of routines which can input, output, and do +arithmetic on indefinitely large integers and on scaled fixed-point +numbers. +.PP +These routines are themselves based on a dynamic storage allocator. +Overflow does not occur until all available core storage +is exhausted. +.PP +The language has a complete control structure as well as immediate-mode +operation. Functions can be defined and saved for later execution. +.PP +Two five hundred-digit numbers can be multiplied to give a +thousand digit result in about ten seconds. +.PP +A small collection of library functions is also available, +including sin, cos, arctan, log, exponential, and Bessel functions of +integer order. +.PP +Some of the uses of this compiler are +.IP \- +to do computation with large integers, +.IP \- +to do computation accurate to many decimal places, +.IP \- +conversion of numbers from one base to another base. +.AE +.PP +.SH +Introduction +.PP +BC is a language and a compiler for doing arbitrary precision +arithmetic on the +.UX +time-sharing system [1]. +The compiler was written to make conveniently available a +collection of routines (called DC [5]) which are capable of doing +arithmetic on integers of arbitrary size. The compiler +is by no means intended to provide a complete programming +language. +It is a minimal language facility. +.PP +There is a scaling provision that permits the +use of decimal point notation. +Provision is made for input and output in bases other than +decimal. Numbers can be converted from decimal to octal by +simply setting the output base to equal 8. +.PP +The actual limit on the number of digits that can +be handled depends on the amount of storage available on the machine. +Manipulation of numbers with many hundreds of digits +is possible even on the smallest versions of +.UX . +.PP +The syntax of BC has been deliberately selected to agree +substantially with the C language [2]. Those who +are familiar with C will find few surprises in this language. +.SH +Simple Computations with Integers +.PP +The simplest kind of statement is an arithmetic expression +on a line by itself. +For instance, if you type in the line: +.DS +142857 + 285714 +.DE +the program responds immediately with the line +.DS +428571 +.DE +The operators \-, *, /, %, and ^ can also be used; they +indicate subtraction, multiplication, division, remaindering, and +exponentiation, respectively. Division of integers produces an +integer result truncated toward zero. +Division by zero produces an error +comment. +.PP +Any term in an expression may be prefixed by a minus sign to +indicate that it is to be negated (the `unary' minus sign). +The expression +.DS +7+\-3 +.DE +is interpreted to mean that \-3 is to be added to 7. +.PP +More complex expressions with several operators and with +parentheses are interpreted just as in +Fortran, with ^ having the greatest binding +power, then * and % and /, and finally + and \-. +Contents of parentheses are evaluated before material +outside the parentheses. +Exponentiations are +performed from right to left and the other operators +from left to right. +The two expressions +.DS +a^b^c and a^(b^c) +.DE +are equivalent, as are the two expressions +.DS +a*b*c and (a*b)*c +.DE +BC shares with Fortran and C the undesirable convention that +.DS +a/b*c is equivalent to (a/b)*c +.DE +.PP +Internal storage registers to hold numbers have single lower-case +letter names. The value of an expression can be assigned to +a register in the usual way. The statement +.DS +x = x + 3 +.DE +has the effect of increasing by three the value of the contents of the +register named x. +When, as in this case, the outermost operator is an =, the +assignment is performed but the result is not printed. +Only 26 of these named storage registers are available. +.PP +There is a built-in square root function whose +result is truncated to an integer (but see scaling below). +The lines +.DS +x = sqrt(191) +x +.DE +produce the printed result +.DS +13 +.DE +.SH +Bases +.PP +There are special internal quantities, called `ibase' and `obase'. +The contents of `ibase', initially set to 10, +determines the base used for interpreting numbers read in. +For example, the lines +.DS +ibase = 8 +11 +.DE +will produce the output line +.DS +9 +.DE +and you are all set up to do octal to decimal conversions. +Beware, however of trying to change the input base back +to decimal by typing +.DS +ibase = 10 +.DE +Because the number 10 is interpreted as octal, this statement will +have no effect. +For those who deal in hexadecimal notation, +the characters A\-F are permitted in numbers +(no matter what base is in effect) +and are +interpreted as digits having values 10\-15 respectively. +The statement +.DS +ibase = A +.DE +will change you back to decimal input base no matter what the +current input base is. +Negative and large positive input bases are +permitted but useless. +No mechanism has been provided for the input of arbitrary +numbers in bases less than 1 and greater than 16. +.PP +The contents of `obase', initially set to 10, are used as the base for output +numbers. The lines +.DS +obase = 16 +1000 +.DE +will produce the output line +.DS +3E8 +.DE +which is to be interpreted as a 3-digit hexadecimal number. +Very large output bases are permitted, and they are sometimes useful. +For example, large numbers can be output in groups of five digits +by setting `obase' to 100000. +Strange (i.e. 1, 0, or negative) output bases are +handled appropriately. +.PP +Very large numbers are split across lines with 70 characters per line. +Lines which are continued end with \\. +Decimal output conversion is practically instantaneous, but output +of very large numbers (i.e., more than 100 digits) with other bases +is rather slow. +Non-decimal output conversion of +a one hundred digit number takes about +three seconds. +.PP +It is best to remember that `ibase' and `obase' have no effect +whatever on the course of internal computation or +on the evaluation of expressions, but only affect input and +output conversion, respectively. +.SH +Scaling +.PP +A third special internal quantity called `scale' is +used to determine the scale of calculated +quantities. +Numbers may have +up to 99 decimal digits after the decimal point. +This fractional part is retained in further computations. +We refer to the number of digits after the decimal point of +a number as its scale. +.PP +When two scaled numbers are combined by +means of one of the arithmetic operations, the result +has a scale determined by the following rules. For +addition and subtraction, the scale of the result is the larger +of the scales of the two operands. In this case, +there is never any truncation of the result. +For multiplications, the scale of the result is never +less than the maximum of the two scales of the operands, +never more than the sum of the scales of the operands +and, subject to those two restrictions, +the scale of the result is set equal to the contents of the internal +quantity `scale'. +The scale of a quotient is the contents of the internal +quantity `scale'. The scale of a remainder is +the sum of the scales of the quotient and the divisor. +The result of an exponentiation is scaled as if +the implied multiplications were performed. +An exponent must be an integer. +The scale of a square root is set to the maximum of the scale +of the argument and the contents of `scale'. +.PP +All of the internal operations are actually carried out in terms +of integers, with digits being discarded when necessary. +In every case where digits are discarded, truncation and +not rounding is performed. +.PP +The contents of +`scale' must be no greater than +99 and no less than 0. It is initially set to 0. +In case you need more than 99 fraction digits, you may arrange +your own scaling. +.PP +The internal quantities `scale', `ibase', and `obase' can be +used in expressions just like other variables. +The line +.DS +scale = scale + 1 +.DE +increases the value of `scale' by one, and the line +.DS +scale +.DE +causes the current value of `scale' to be printed. +.PP +The value of `scale' retains its meaning as a +number of decimal digits to be retained in internal +computation even when `ibase' or `obase' are not equal to 10. +The internal computations (which are still conducted in decimal, +regardless of the bases) are performed to the specified number +of decimal digits, never hexadecimal or octal or any +other kind of digits. +.SH +Functions +.PP +The name of a function is a single lower-case letter. +Function names are permitted to collide with simple +variable names. +Twenty-six different defined functions are permitted +in addition to the twenty-six variable names. +The line +.DS + define a(x){ +.DE +begins the definition of a function with one argument. +This line must be followed by one or more statements, +which make up the body of the function, ending +with a right brace }. +Return of control from a function occurs when a return +statement is executed or when the end of the function is reached. +The return statement can take either +of the two forms +.DS +return +return(x) +.DE +In the first case, the value of the function is 0, and in +the second, the value of the expression in parentheses. +.PP +Variables used in the function can be declared as automatic +by a statement of the form +.DS +auto x,y,z +.DE +There can be only one `auto' statement in a function and it must +be the first statement in the definition. +These automatic variables are allocated space and initialized +to zero on entry to the function and thrown away on return. The +values of any variables with the same names outside the function +are not disturbed. +Functions may be called recursively and the automatic variables +at each level of call are protected. +The parameters named in a function definition are treated in +the same way as the automatic variables of that function +with the single exception that they are given a value +on entry to the function. +An example of a function definition is +.DS + define a(x,y){ + auto z + z = x*y + return(z) + } +.DE +The value of this function, when called, will be the +product of its +two arguments. +.PP +A function is called by the appearance of its name +followed by a string of arguments enclosed in +parentheses and separated by commas. +The result +is unpredictable if the wrong number of arguments is used. +.PP +Functions with no arguments are defined and called using +parentheses with nothing between them: b(). +.PP +If the function +.ft I +a +.ft +above has been defined, then the line +.DS +a(7,3.14) +.DE +would cause the result 21.98 to be printed and the line +.DS +x = a(a(3,4),5) +.DE +would cause the value of x to become 60. +.SH +Subscripted Variables +.PP +A single lower-case letter variable name +followed by an expression in brackets is called a subscripted +variable (an array element). +The variable name is called the array name and the expression +in brackets is called the subscript. +Only one-dimensional arrays are +permitted. The names of arrays are permitted to +collide with the names of simple variables and function names. +Any fractional +part of a subscript is discarded before use. +Subscripts must be greater than or equal to zero and +less than or equal to 2047. +.PP +Subscripted variables may be freely used in expressions, in +function calls, and in return statements. +.PP +An array name may be used as an argument to a function, +or may be declared as automatic in +a function definition by the use of empty brackets: +.DS +f(a[\|]) +define f(a[\|]) +auto a[\|] +.DE +When an array name is so used, the whole contents of the array +are copied for the use of the function, and thrown away on exit +from the function. +Array names which refer to whole arrays cannot be used +in any other contexts. +.SH +Control Statements +.PP +The `if', the `while', and the `for' statements +may be used to alter the flow within programs or to cause iteration. +The range of each of them is a statement or +a compound statement consisting of a collection of +statements enclosed in braces. +They are written in the following way +.DS +if(relation) statement +while(relation) statement +for(expression1; relation; expression2) statement +.DE +or +.DS +if(relation) {statements} +while(relation) {statements} +for(expression1; relation; expression2) {statements} +.DE +.PP +A relation in one of the control statements is an expression of the form +.DS +x>y +.DE +where two expressions are related by one of the six relational +operators <, >, <=, >=, ==, or !=. +The relation == +stands for `equal to' and != stands for `not equal to'. +The meaning of the remaining relational operators is +clear. +.PP +BEWARE of using = instead of == in a relational. Unfortunately, +both of them are legal, so you will not get a diagnostic +message, but = really will not do a comparison. +.PP +The `if' statement causes execution of its range +if and only if the relation is true. +Then control passes to the next statement in sequence. +.PP +The `while' statement causes execution of its range +repeatedly as long as the relation +is true. The relation is tested before each execution +of its range and if the relation +is false, control passes to the next statement beyond the range +of the while. +.PP +The `for' statement begins +by executing `expression1'. Then the relation is tested +and, if true, the statements in the range of the `for' are executed. +Then `expression2' is executed. The relation is tested, and so on. +The typical use of the `for' statement is for a controlled iteration, +as in the statement +.DS +for(i=1; i<=10; i=i+1) i +.DE +which will print the integers from 1 to 10. +Here are some examples of the use of the control statements. +.DS +define f(n){ +auto i, x +x=1 +for(i=1; i<=n; i=i+1) x=x*i +return(x) +} +.DE +The line +.DS + f(a) +.DE +will print +.ft I +a +.ft +factorial if +.ft I +a +.ft +is a positive integer. +Here is the definition of a function which will +compute values of the binomial coefficient +(m and n are assumed to be positive integers). +.DS +define b(n,m){ +auto x, j +x=1 +for(j=1; j<=m; j=j+1) x=x*(n\-j+1)/j +return(x) +} +.DE +The following function computes values of the exponential function +by summing the appropriate series +without regard for possible truncation errors: +.DS +scale = 20 +define e(x){ + auto a, b, c, d, n + a = 1 + b = 1 + c = 1 + d = 0 + n = 1 + while(1==1){ + a = a*x + b = b*n + c = c + a/b + n = n + 1 + if(c==d) return(c) + d = c + } +} +.DE +.SH +Some Details +.PP +There are some language features that every user should know +about even if he will not use them. +.PP +Normally statements are typed one to a line. It is also permissible +to type several statements on a line separated by semicolons. +.PP +If an assignment statement is parenthesized, it then has +a value and it can be used anywhere that an expression can. +For example, the line +.DS +(x=y+17) +.DE +not only makes the indicated assignment, but also prints the +resulting value. +.PP +Here is an example of a use of the value of an +assignment statement even when it is not parenthesized. +.DS +x = a[i=i+1] +.DE +causes a value to be assigned to x and also increments i +before it is used as a subscript. +.PP +The following constructs work in BC in exactly the same manner +as they do in the C language. Consult the appendix or the +C manuals [2] for their exact workings. +.DS +.ta 2i +x=y=z is the same as x=(y=z) +x =+ y x = x+y +x =\- y x = x\-y +x =* y x = x*y +x =/ y x = x/y +x =% y x = x%y +x =^ y x = x^y +x++ (x=x+1)\-1 +x\-\- (x=x\-1)+1 +++x x = x+1 +\-\-x x = x\-1 +.DE +Even if you don't intend to use the constructs, +if you type one inadvertently, something correct but unexpected +may happen. +.PP +WARNING! In some of these constructions, spaces are +significant. +There is a real difference between +x=\-y and x= \-y. +The first replaces x by x\-y and the second by \-y. +.SH +Three Important Things +.PP +1. To exit a BC program, type `quit'. +.PP +2. There is a comment convention identical to that of C and +of PL/I. Comments begin with `/*' and end with `*/'. +.PP +3. There is a library of math functions which may be obtained by +typing at command level +.DS +bc \-l +.DE +This command will load a set of library functions +which, at the time of writing, consists of sine (named `s'), +cosine (`c'), arctangent (`a'), natural logarithm (`l'), +exponential (`e') and Bessel functions of integer order (`j(n,x)'). Doubtless more functions will be added +in time. +The library sets the scale to 20. You can reset it to something +else if you like. +The design of these mathematical library routines +is discussed elsewhere [3]. +.PP +If you type +.DS +bc file ... +.DE +BC will read and execute the named file or files before accepting +commands from the keyboard. In this way, you may load your +favorite programs and function definitions. +.SH +Acknowledgement +.PP +The compiler is written in YACC [4]; its original +version was written by S. C. Johnson. +.SH +References +.IP [1] +K. Thompson and D. M. Ritchie, +.ft I +UNIX Programmer's Manual, +.ft +Bell Laboratories, +1978. +.IP [2] +B. W. Kernighan and +D. M. Ritchie, +.ft I +The C Programming Language, +.ft +Prentice-Hall, 1978. +.IP [3] +R. Morris, +.ft I +A Library of Reference Standard Mathematical Subroutines, +.ft +Bell Laboratories internal memorandum, 1975. +.IP [4] +S. C. Johnson, +.ft I +YACC \(em Yet Another Compiler-Compiler. +.ft +Bell Laboratories Computing Science Technical Report #32, 1978. +.IP [5] +R. Morris and L. L. Cherry, +.ft I +DC \- An Interactive Desk Calculator. +.ft +.LP +.bp +.ft B +.DS C +Appendix +.DE +.ft +.NH +Notation +.PP +In the following pages syntactic categories are in \fIitalics\fP; +literals are in \fBbold\fP; material in brackets [\|] is optional. +.NH +Tokens +.PP +Tokens consist of keywords, identifiers, constants, operators, +and separators. +Token separators may be blanks, tabs or comments. +Newline characters or semicolons separate statements. +.NH 2 +Comments +.PP +Comments are introduced by the characters /* and terminated by +*/. +.NH 2 +Identifiers +.PP +There are three kinds of identifiers \- ordinary identifiers, array identifiers +and function identifiers. +All three types consist of single lower-case letters. +Array identifiers are followed by square brackets, possibly +enclosing an expression describing a subscript. +Arrays are singly dimensioned and may contain up to 2048 +elements. +Indexing begins at zero so an array may be indexed from 0 to 2047. +Subscripts are truncated to integers. +Function identifiers are followed by parentheses, possibly enclosing arguments. +The three types of identifiers do not conflict; +a program can have a variable named \fBx\fP, +an array named \fBx\fP and a function named \fBx\fP, all of which are separate and +distinct. +.NH 2 +Keywords +.PP +The following are reserved keywords: +.ft B +.ta .5i 1.0i +.nf + ibase if + obase break + scale define + sqrt auto + length return + while quit + for +.fi +.ft +.NH 2 +Constants +.PP +Constants consist of arbitrarily long numbers +with an optional decimal point. +The hexadecimal digits \fBA\fP\-\fBF\fP are also recognized as digits with +values 10\-15, respectively. +.NH 1 +Expressions +.PP +The value of an expression is printed unless the main +operator is an assignment. +Precedence is the same as the order +of presentation here, with highest appearing first. +Left or right associativity, where applicable, is +discussed with each operator. +.bp +.NH 2 +Primitive expressions +.NH 3 +Named expressions +.PP +Named expressions are +places where values are stored. +Simply stated, +named expressions are legal on the left +side of an assignment. +The value of a named expression is the value stored in the place named. +.NH 4 +\fIidentifiers\fR +.PP +Simple identifiers are named expressions. +They have an initial value of zero. +.NH 4 +\fIarray-name\fP\|[\|\fIexpression\fP\|] +.PP +Array elements are named expressions. +They have an initial value of zero. +.NH 4 +\fBscale\fR, \fBibase\fR and \fBobase\fR +.PP +The internal registers +\fBscale\fP, \fBibase\fP and \fBobase\fP are all named expressions. +\fBscale\fP is the number of digits after the decimal point to be +retained in arithmetic operations. +\fBscale\fR has an initial value of zero. +\fBibase\fP and \fBobase\fP are the input and output number +radix respectively. +Both \fBibase\fR and \fBobase\fR have initial values of 10. +.NH 3 +Function calls +.NH 4 +\fIfunction-name\fB\|(\fR[\fIexpression\fR\|[\fB,\|\fIexpression\|\fR.\|.\|.\|]\|]\fB) +.PP +A function call consists of a function name followed by parentheses +containing a comma-separated list of +expressions, which are the function arguments. +A whole array passed as an argument is specified by the +array name followed by empty square brackets. +All function arguments are passed by +value. +As a result, changes made to the formal parameters have +no effect on the actual arguments. +If the function terminates by executing a return +statement, the value of the function is +the value of the expression in the parentheses of the return +statement or is zero if no expression is provided +or if there is no return statement. +.NH 4 +sqrt\|(\|\fIexpression\fP\|) +.PP +The result is the square root of the expression. +The result is truncated in the least significant decimal place. +The scale of the result is +the scale of the expression or the +value of +.ft B +scale, +.ft +whichever is larger. +.NH 4 +length\|(\|\fIexpression\fP\|) +.PP +The result is the total number of significant decimal digits in the expression. +The scale of the result is zero. +.NH 4 +scale\|(\|\fIexpression\fP\|) +.PP +The result is the scale of the expression. +The scale of the result is zero. +.NH 3 +Constants +.PP +Constants are primitive expressions. +.NH 3 +Parentheses +.PP +An expression surrounded by parentheses is +a primitive expression. +The parentheses are used to alter the +normal precedence. +.NH 2 +Unary operators +.PP +The unary operators +bind right to left. +.NH 3 +\-\|\fIexpression\fP +.PP +The result is the negative of the expression. +.NH 3 +++\|\fInamed-expression\fP +.PP +The named expression is +incremented by one. +The result is the value of the named expression after +incrementing. +.NH 3 +\-\-\|\fInamed-expression\fP +.PP +The named expression is +decremented by one. +The result is the value of the named expression after +decrementing. +.NH 3 +\fInamed-expression\fP\|++ +.PP +The named expression is +incremented by one. +The result is the value of the named expression before +incrementing. +.NH 3 +\fInamed-expression\fP\|\-\- +.PP +The named expression is +decremented by one. +The result is the value of the named expression before +decrementing. +.NH 2 +Exponentiation operator +.PP +The exponentiation operator binds right to left. +.NH 3 +\fIexpression\fP ^ \fIexpression\fP +.PP +The result is the first +expression raised to the power of the +second expression. +The second expression must be an integer. +If \fIa\fP +is the scale of the left expression +and \fIb\fP is the absolute value +of the right expression, +then the scale of the result is: +.PP +min\|(\|\fIa\(mub\fP,\|max\|(\|\fBscale\fP,\|\fIa\fP\|)\|) +.NH 2 +Multiplicative operators +.PP +The operators *, /, % bind left to right. +.NH 3 +\fIexpression\fP * \fIexpression\fP +.PP +The result is the product +of the two expressions. +If \fIa\fP and \fIb\fP are the +scales of the two expressions, +then the scale of the result is: +.PP +min\|(\|\fIa+b\fP,\|max\|(\|\fBscale\fP,\|\fIa\fP,\|\fIb\fP\|)\|) +.NH 3 +\fIexpression\fP / \fIexpression\fP +.PP +The result is the quotient of the two expressions. +The scale of the result is the value of \fBscale\fR. +.NH 3 +\fIexpression\fP % \fIexpression\fP +.PP +The % operator produces the remainder of the division +of the two expressions. +More precisely, +\fIa\fP%\fIb\fP is \fIa\fP\-\fIa\fP/\fIb\fP*\fIb\fP. +.PP +The scale of the result is the sum of the scale of +the divisor and the value of +.ft B +scale +.ft +.NH 2 +Additive operators +.PP +The additive operators bind left to right. +.NH 3 +\fIexpression\fP + \fIexpression\fP +.PP +The result is the sum of the two expressions. +The scale of the result is +the maximun of the scales of the expressions. +.NH 3 +\fIexpression\fP \- \fIexpression\fP +.PP +The result is the difference of the two expressions. +The scale of the result is the +maximum of the scales of the expressions. +.NH 2 +assignment operators +.PP +The assignment operators bind right to left. +.NH 3 +\fInamed-expression\fP = \fIexpression\fP +.PP +This expression results in assigning the value of the expression +on the right +to the named expression on the left. +.NH 3 +\fInamed-expression\fP =+ \fIexpression\fP +.NH 3 +\fInamed-expression\fP =\- \fIexpression\fP +.NH 3 +\fInamed-expression\fP =* \fIexpression\fP +.NH 3 +\fInamed-expression\fP =/ \fIexpression\fP +.NH 3 +\fInamed-expression\fP =% \fIexpression\fP +.NH 3 +\fInamed-expression\fP =^ \fIexpression\fP +.PP +The result of the above expressions is equivalent +to ``named expression = named expression OP expression'', +where OP is the operator after the = sign. +.NH 1 +Relations +.PP +Unlike all other operators, the relational operators +are only valid as the object of an \fBif\fP, \fBwhile\fP, +or inside a \fBfor\fP statement. +.NH 2 +\fIexpression\fP < \fIexpression\fP +.NH 2 +\fIexpression\fP > \fIexpression\fP +.NH 2 +\fIexpression\fP <= \fIexpression\fP +.NH 2 +\fIexpression\fP >= \fIexpression\fP +.NH 2 +\fIexpression\fP == \fIexpression\fP +.NH 2 +\fIexpression\fP != \fIexpression\fP +.NH 1 +Storage classes +.PP +There are only two storage classes in BC, global and automatic +(local). +Only identifiers that are to be local to a function need be +declared with the \fBauto\fP command. +The arguments to a function +are local to the function. +All other identifiers are assumed to be global +and available to all functions. +All identifiers, global and local, have initial values +of zero. +Identifiers declared as \fBauto\fP are allocated on entry to the function +and released on returning from the function. +They therefore do not retain values between function calls. +\fBauto\fP arrays are specified by the array name followed by empty square brackets. +.PP +Automatic variables in BC do not work in exactly the same way +as in either C or PL/I. On entry to a function, the old values of +the names that appear as parameters and as automatic +variables are pushed onto a stack. +Until return is made from the function, reference to these +names refers only to the new values. +.NH 1 +Statements +.PP +Statements must be separated by semicolon or newline. +Except where altered by control statements, execution +is sequential. +.NH 2 +Expression statements +.PP +When a statement is an expression, unless +the main operator is an assignment, the value +of the expression is printed, followed by a newline character. +.NH 2 +Compound statements +.PP +Statements may be grouped together and used when one statement is expected +by surrounding them with { }. +.NH 2 +Quoted string statements +.PP +"any string" +.sp .5 +This statement prints the string inside the quotes. +.NH 2 +If statements +.sp .5 +\fBif\|(\|\fIrelation\fB\|)\|\fIstatement\fR +.PP +The substatement is executed if the relation is true. +.NH 2 +While statements +.sp .5 +\fBwhile\|(\|\fIrelation\fB\|)\|\fIstatement\fR +.PP +The statement is executed while the relation +is true. +The test occurs before each execution of the statement. +.NH 2 +For statements +.sp .5 +\fBfor\|(\|\fIexpression\fB; \fIrelation\fB; \fIexpression\fB\|)\|\fIstatement\fR +.PP +The for statement is the same as +.nf +.ft I + first-expression + \fBwhile\|(\fPrelation\|\fB) {\fP + statement + last-expression + } +.ft R +.fi +.PP +All three expressions must be present. +.NH 2 +Break statements +.sp .5 +\fBbreak\fP +.PP +\fBbreak\fP causes termination of a \fBfor\fP or \fBwhile\fP statement. +.NH 2 +Auto statements +.sp .5 +\fBauto \fIidentifier\fR\|[\|\fB,\fIidentifier\fR\|] +.PP +The auto statement causes the values of the identifiers to be pushed down. +The identifiers can be ordinary identifiers or array identifiers. +Array identifiers are specified by following the array name by empty square +brackets. +The auto statement must be the first statement +in a function definition. +.NH 2 +Define statements +.sp .5 +.nf +\fBdefine(\|\fR[\fIparameter\|\fR[\fB\|,\|\fIparameter\|.\|.\|.\|\fR]\|]\|\fB)\|{\fI + statements\|\fB}\fR +.fi +.PP +The define statement defines a function. +The parameters may +be ordinary identifiers or array names. +Array names must be followed by empty square brackets. +.NH 2 +Return statements +.sp .5 +\fBreturn\fP +.sp .5 +\fBreturn(\fI\|expression\|\fB)\fR +.PP +The return statement causes termination of a function, +popping of its auto variables, and +specifies the result of the function. +The first form is equivalent to \fBreturn(0)\fR. +The result of the function is the result of the expression +in parentheses. +.NH 2 +Quit +.PP +The quit statement stops execution of a BC program and returns +control to UNIX when it is first encountered. +Because it is not treated as an executable statement, +it cannot be used +in a function definition or in an +.ft B +if, for, +.ft +or +.ft B +while +.ft +statement. diff --git a/usr/doc/cman b/usr/doc/cman new file mode 100644 index 0000000000..7c8fbf5ef9 --- /dev/null +++ b/usr/doc/cman @@ -0,0 +1,2 @@ +Sorry, but for copyright reasons, the source +for the C Reference Manual is not distributed. diff --git a/usr/doc/dc b/usr/doc/dc new file mode 100644 index 0000000000..f77522ee99 --- /dev/null +++ b/usr/doc/dc @@ -0,0 +1,709 @@ +.RP +....TM 75-1271-8 39199 39199-11 +.TL +DC \- An Interactive Desk Calculator +.AU "MH 2C-524" 3878 +Robert Morris +.AU +Lorinda Cherry +.AI +.MH +.AB +DC is an interactive desk calculator program implemented +on the +.UX +time-sharing system to do arbitrary-precision +integer arithmetic. +It has provision for manipulating scaled fixed-point numbers and +for input and output in bases other than decimal. +.PP +The size of numbers that can be manipulated is limited +only by available core storage. +On typical implementations of +.UX , +the size of numbers that +can be handled varies from several hundred digits on the smallest +systems to several thousand on the largest. +.AE +.PP +.SH +.ND +.PP +DC is an arbitrary precision arithmetic package implemented +on the +.UX +time-sharing system +in the form of an interactive desk calculator. +It works like a stacking calculator using reverse Polish notation. +Ordinarily DC operates on decimal integers, but one may +specify an input base, output base, and a number of fractional +digits to be maintained. +.PP +A language called BC [1] has been developed which accepts +programs written in the familiar style of higher-level +programming languages and compiles output which is +interpreted by DC. +Some of the commands described below were designed +for the compiler interface and are not easy for a human user +to manipulate. +.PP +Numbers that are typed into DC are put on a push-down +stack. +DC commands work by taking the top number or two +off the stack, performing the desired operation, and pushing the result +on the stack. +If an argument is given, +input is taken from that file until its end, +then from the standard input. +.SH +SYNOPTIC DESCRIPTION +.PP +Here we describe the DC commands that are intended +for use by people. The additional commands that are +intended to be invoked by compiled output are +described in the detailed description. +.PP +Any number of commands are permitted on a line. +Blanks and new-line characters are ignored except within numbers +and in places where a register name is expected. +.PP +The following constructions are recognized: +.SH +number +.IP +The value of the number is pushed onto the main stack. +A number is an unbroken string of the digits 0-9 +and the capital letters A\-F which are treated as digits +with values 10\-15 respectively. +The number may be preceded by an underscore \*_ to input a +negative number. +Numbers may contain decimal points. +.SH ++ \- * % ^ +.IP +The +top two values on the stack are added +(\fB+\fP), +subtracted +(\fB\-\fP), +multiplied (\fB*\fP), +divided (\fB/\fP), +remaindered (\fB%\fP), +or exponentiated (^). +The two entries are popped off the stack; +the result is pushed on the stack in their place. +The result of a division is an integer truncated toward zero. +See the detailed description below for the treatment of +numbers with decimal points. +An exponent must not have any digits after the decimal point. +.SH +s\fIx\fP +.IP +The +top of the main stack is popped and stored into +a register named \fIx\fP, where \fIx\fP may be any character. +If +the +.ft B +s +.ft +is capitalized, +.ft I +x +.ft +is treated as a stack and the value is pushed onto it. +Any character, even blank or new-line, is a valid register name. +.SH +l\fIx\fP +.IP +The +value in register +.ft I +x +.ft +is pushed onto the stack. +The register +.ft I +x +.ft +is not altered. +If the +.ft B +l +.ft +is capitalized, +register +.ft I +x +.ft +is treated as a stack and its top value is popped onto the main stack. +.LP +All registers start with empty value which is treated as a zero +by the command \fBl\fP and is treated as an error by the command \fBL\fP. +.SH +.SH +d +.IP +The +top value on the stack is duplicated. +.SH +p +.IP +The top value on the stack is printed. +The top value remains unchanged. +.SH +f +.IP +All values on the stack and in registers are printed. +.SH +x +.IP +treats the top element of the stack as a character string, +removes it from the stack, and +executes it as a string of DC commands. +.SH +[ ... ] +.IP +puts the bracketed character string onto the top of the stack. +.SH +q +.IP +exits the program. +If executing a string, the recursion level is +popped by two. +If +.ft B +q +.ft +is capitalized, +the top value on the stack is popped and the string execution level is popped +by that value. +.SH +<\fIx\fP >\fIx\fP =\fIx\fP !<\fIx\fP !>\fIx\fP !=\fIx\fP +.IP +The +top two elements of the stack are popped and compared. +Register +.ft I +x +.ft +is executed if they obey the stated +relation. +Exclamation point is negation. +.SH +v +.IP +replaces the top element on the stack by its square root. +The square root of an integer is truncated to an integer. +For the treatment of numbers with decimal points, see +the detailed description below. +.SH +! +.IP +interprets the rest of the line as a +.UX +command. +Control returns to DC when the +.UX +command terminates. +.SH +c +.IP +All values on the stack are popped; the stack becomes empty. +.SH +i +.IP +The top value on the stack is popped and used as the +number radix for further input. +If \fBi\fP is capitalized, the value of +the input base is pushed onto the stack. +No mechanism has been provided for the input of arbitrary +numbers in bases less than 1 or greater than 16. +.SH +o +.IP +The top value on the stack is popped and used as the +number radix for further output. +If \fBo\fP is capitalized, the value of the output +base is pushed onto the stack. +.SH +k +.IP +The top of the stack is popped, and that value is used as +a scale factor +that influences the number of decimal places +that are maintained during multiplication, division, and exponentiation. +The scale factor must be greater than or equal to zero and +less than 100. +If \fBk\fP is capitalized, the value of the scale factor +is pushed onto the stack. +.SH +z +.IP +The value of the stack level is pushed onto the stack. +.SH +? +.IP +A line of input is taken from the input source (usually the console) +and executed. +.SH +DETAILED DESCRIPTION +.SH +Internal Representation of Numbers +.PP +Numbers are stored internally using a dynamic storage allocator. +Numbers are kept in the form of a string +of digits to the base 100 stored one digit per byte +(centennial digits). +The string is stored with the low-order digit at the +beginning of the string. +For example, the representation of 157 +is 57,1. +After any arithmetic operation on a number, care is taken +that all digits are in the range 0\-99 and that +the number has no leading zeros. +The number zero is represented by the empty string. +.PP +Negative numbers are represented in the 100's complement +notation, which is analogous to two's complement notation for binary +numbers. +The high order digit of a negative number is always \-1 +and all other digits are in the range 0\-99. +The digit preceding the high order \-1 digit is never a 99. +The representation of \-157 is 43,98,\-1. +We shall call this the canonical form of a number. +The advantage of this kind of representation of negative +numbers is ease of addition. When addition is performed digit +by digit, the result is formally correct. The result need only +be modified, if necessary, to put it into canonical form. +.PP +Because the largest valid digit is 99 and the byte can +hold numbers twice that large, addition can be carried out +and the handling of carries done later when +that is convenient, as it sometimes is. +.PP +An additional byte is stored with each number beyond +the high order digit to indicate the number of +assumed decimal digits after the decimal point. The representation +of .001 is 1,\fI3\fP +where the scale has been italicized to emphasize the fact that it +is not the high order digit. +The value of this extra byte is called the +.ft B +scale factor +.ft +of the number. +.SH +The Allocator +.PP +DC uses a dynamic string storage allocator +for all of its internal storage. +All reading and writing of numbers internally is done through +the allocator. +Associated with each string in the allocator is a four-word header containing pointers +to the beginning of the string, the end of the string, +the next place to write, and the next place to read. +Communication between the allocator and DC +is done via pointers to these headers. +.PP +The allocator initially has one large string on a list +of free strings. All headers except the one pointing +to this string are on a list of free headers. +Requests for strings are made by size. +The size of the string actually supplied is the next higher +power of 2. +When a request for a string is made, the allocator +first checks the free list to see if there is +a string of the desired size. +If none is found, the allocator finds the next larger free string and splits it repeatedly until +it has a string of the right size. +Left-over strings are put on the free list. +If there are no larger strings, +the allocator tries to coalesce smaller free strings into +larger ones. +Since all strings are the result +of splitting large strings, +each string has a neighbor that is next to it in core +and, if free, can be combined with it to make a string twice as long. +This is an implementation of the `buddy system' of allocation +described in [2]. +.PP +Failing to find a string of the proper length after coalescing, +the allocator asks the system for more space. +The amount of space on the system is the only limitation +on the size and number of strings in DC. +If at any time in the process of trying to allocate a string, the allocator runs out of +headers, it also asks the system for more space. +.PP +There are routines in the allocator for reading, writing, copying, rewinding, +forward-spacing, and backspacing strings. +All string manipulation is done using these routines. +.PP +The reading and writing routines +increment the read pointer or write pointer so that +the characters of a string are read or written in +succession by a series of read or write calls. +The write pointer is interpreted as the end of the +information-containing portion of a string and a call +to read beyond that point returns an end-of-string indication. +An attempt to write beyond the end of a string +causes the allocator to +allocate a larger space and then copy +the old string into the larger block. +.SH +Internal Arithmetic +.PP +All arithmetic operations are done on integers. +The operands (or operand) needed for the operation are popped +from the main stack and their scale factors stripped off. +Zeros are added or digits removed as necessary to get +a properly scaled result from the internal arithmetic routine. +For example, if the scale of the operands is different and decimal +alignment is required, as it is for +addition, zeros are appended to the operand with the smaller +scale. +After performing the required arithmetic operation, +the proper scale factor is appended to the end of the number before +it is pushed on the stack. +.PP +A register called \fBscale\fP plays a part +in the results of most arithmetic operations. +\fBscale\fP is the bound on the number of decimal places retained in +arithmetic computations. +\fBscale\fP may be set to the number on the top of the stack +truncated to an integer with the \fBk\fP command. +\fBK\fP may be used to push the value of \fBscale\fP on the stack. +\fBscale\fP must be greater than or equal to 0 and less than 100. +The descriptions of the individual arithmetic operations will +include the exact effect of \fBscale\fP on the computations. +.SH +Addition and Subtraction +.PP +The scales of the two numbers are compared and trailing +zeros are supplied to the number with the lower scale to give both +numbers the same scale. The number with the smaller scale is +multiplied by 10 if the difference of the scales is odd. +The scale of the result is then set to the larger of the scales +of the two operands. +.PP +Subtraction is performed by negating the number +to be subtracted and proceeding as in addition. +.PP +Finally, the addition is performed digit by digit from the +low order end of the number. The carries are propagated +in the usual way. +The resulting number is brought into canonical form, which may +require stripping of leading zeros, or for negative numbers +replacing the high-order configuration 99,\-1 by the digit \-1. +In any case, digits which are not in the range 0\-99 must +be brought into that range, propagating any carries or borrows +that result. +.SH +Multiplication +.PP +The scales are removed from the two operands and saved. +The operands are both made positive. +Then multiplication is performed in +a digit by digit manner that exactly mimics the hand method +of multiplying. +The first number is multiplied by each digit of the second +number, beginning with its low order digit. The intermediate +products are accumulated into a partial sum which becomes the +final product. +The product is put into the canonical form and its sign is +computed from the signs of the original operands. +.PP +The scale of the result is set equal to the sum +of the scales of the two operands. +If that scale is larger than the internal register +.ft B +scale +.ft +and also larger than both of the scales of the two operands, +then the scale of the result is set equal to the largest +of these three last quantities. +.SH +Division +.PP +The scales are removed from the two operands. +Zeros are appended or digits removed from the dividend to make +the scale of the result of the integer division equal to +the internal quantity +\fBscale\fP. +The signs are removed and saved. +.PP +Division is performed much as it would be done by hand. +The difference of the lengths of the two numbers +is computed. +If the divisor is longer than the dividend, +zero is returned. +Otherwise the top digit of the divisor is divided into the top +two digits of the dividend. +The result is used as the first (high-order) digit of the +quotient. +It may turn out be one unit too low, but if it is, the next +trial quotient will be larger than 99 and this will be +adjusted at the end of the process. +The trial digit is multiplied by the divisor and the result subtracted +from the dividend and the process is repeated to get +additional quotient digits until the remaining +dividend is smaller than the divisor. +At the end, the digits of the quotient are put into +the canonical form, with propagation of carry as needed. +The sign is set from the sign of the operands. +.SH +Remainder +.PP +The division routine is called and division is performed +exactly as described. The quantity returned is the remains of the +dividend at the end of the divide process. +Since division truncates toward zero, remainders have the same +sign as the dividend. +The scale of the remainder is set to +the maximum of the scale of the dividend and +the scale of the quotient plus the scale of the divisor. +.SH +Square Root +.PP +The scale is stripped from the operand. +Zeros are added if necessary to make the +integer result have a scale that is the larger of +the internal quantity +\fBscale\fP +and the scale of the operand. +.PP +The method used to compute sqrt(y) is Newton's method +with successive approximations by the rule +.EQ +x sub {n+1} ~=~ half ( x sub n + y over x sub n ) +.EN +The initial guess is found by taking the integer square root +of the top two digits. +.SH +Exponentiation +.PP +Only exponents with zero scale factor are handled. If the exponent is +zero, then the result is 1. If the exponent is negative, then +it is made positive and the base is divided into one. The scale +of the base is removed. +.PP +The integer exponent is viewed as a binary number. +The base is repeatedly squared and the result is +obtained as a product of those powers of the base that +correspond to the positions of the one-bits in the binary +representation of the exponent. +Enough digits of the result +are removed to make the scale of the result the same as if the +indicated multiplication had been performed. +.SH +Input Conversion and Base +.PP +Numbers are converted to the internal representation as they are read +in. +The scale stored with a number is simply the number of fractional digits input. +Negative numbers are indicated by preceding the number with a \fB\_\fP. +The hexadecimal digits A\-F correspond to the numbers 10\-15 regardless of input base. +The \fBi\fP command can be used to change the base of the input numbers. +This command pops the stack, truncates the resulting number to an integer, +and uses it as the input base for all further input. +The input base is initialized to 10 but may, for example be changed to +8 or 16 to do octal or hexadecimal to decimal conversions. +The command \fBI\fP will push the value of the input base on the stack. +.SH +Output Commands +.PP +The command \fBp\fP causes the top of the stack to be printed. +It does not remove the top of the stack. +All of the stack and internal registers can be output +by typing the command \fBf\fP. +The \fBo\fP command can be used to change the output base. +This command uses the top of the stack, truncated to an integer as +the base for all further output. +The output base in initialized to 10. +It will work correctly for any base. +The command \fBO\fP pushes the value of the output base on the stack. +.SH +Output Format and Base +.PP +The input and output bases only affect +the interpretation of numbers on input and output; they have no +effect on arithmetic computations. +Large numbers are output with 70 characters per line; +a \\ indicates a continued line. +All choices of input and output bases work correctly, although not all are +useful. +A particularly useful output base is 100000, which has the effect of +grouping digits in fives. +Bases of 8 and 16 can be used for decimal-octal or decimal-hexadecimal +conversions. +.SH +Internal Registers +.PP +Numbers or strings may be stored in internal registers or loaded on the stack +from registers with the commands \fBs\fP and \fBl\fP. +The command \fBs\fIx\fR pops the top of the stack and +stores the result in register \fBx\fP. +\fIx\fP can be any character. +\fBl\fIx\fR puts the contents of register \fBx\fP on the top of the stack. +The \fBl\fP command has no effect on the contents of register \fIx\fP. +The \fBs\fP command, however, is destructive. +.SH +Stack Commands +.PP +The command \fBc\fP clears the stack. +The command \fBd\fP pushes a duplicate of the number on the top of the stack +on the stack. +The command \fBz\fP pushes the stack size on the stack. +The command \fBX\fP replaces the number on the top of the stack +with its scale factor. +The command \fBZ\fP replaces the top of the stack +with its length. +.SH +Subroutine Definitions and Calls +.PP +Enclosing a string in \fB[]\fP pushes the ascii string on the stack. +The \fBq\fP command quits or in executing a string, pops the recursion levels by two. +.SH +Internal Registers \- Programming DC +.PP +The load and store +commands together with \fB[]\fP to store strings, \fBx\fP to execute +and the testing commands `<', `>', `=', `!<', `!>', `!=' can be used to program +DC. +The \fBx\fP command assumes the top of the stack is an string of DC commands +and executes it. +The testing commands compare the top two elements on the stack and if the relation holds, execute the register +that follows the relation. +For example, to print the numbers 0-9, +.DS +[lip1+ si li10>a]sa +0si lax +.DE +.SH +Push-Down Registers and Arrays +.PP +These commands were designed for used by a compiler, not by +people. +They involve push-down registers and arrays. +In addition to the stack that commands work on, DC can be thought +of as having individual stacks for each register. +These registers are operated on by the commands \fBS\fP and \fBL\fP. +\fBS\fIx\fR pushes the top value of the main stack onto the stack for +the register \fIx\fP. +\fBL\fIx\fR pops the stack for register \fIx\fP and puts the result on the main +stack. +The commands \fBs\fP and \fBl\fP also work on registers but not as push-down +stacks. +\fBl\fP doesn't effect the top of the +register stack, and \fBs\fP destroys what was there before. +.PP +The commands to work on arrays are \fB:\fP and \fB;\fP. +\fB:\fIx\fR pops the stack and uses this value as an index into +the array \fIx\fP. +The next element on the stack is stored at this index in \fIx\fP. +An index must be greater than or equal to 0 and +less than 2048. +\fB;\fIx\fR is the command to load the main stack from the array \fIx\fP. +The value on the top of the stack is the index +into the array \fIx\fP of the value to be loaded. +.SH +Miscellaneous Commands +.PP +The command \fB!\fP interprets the rest of the line as a +.UX + command and passes +it to +.UX +to execute. +One other compiler command is \fBQ\fP. +This command uses the top of the stack as the number of levels of recursion to skip. +.SH +DESIGN CHOICES +.PP +The real reason for the use of a dynamic storage allocator was +that a general purpose program could be (and in fact has been) +used for a variety of other tasks. +The allocator has some value for input and for compiling (i.e. +the bracket [...] commands) where it cannot be known in advance +how long a string will be. +The result was that at a modest +cost in execution time, all considerations of string allocation +and sizes of strings were removed from the remainder of the program +and debugging was made easier. The allocation method +used wastes approximately 25% of available space. +.PP +The choice of 100 as a base for internal arithmetic +seemingly has no compelling advantage. Yet the base cannot +exceed 127 because of hardware limitations and at the cost +of 5% in space, debugging was made a great deal easier and +decimal output was made much faster. +.PP +The reason for a stack-type arithmetic design was +to permit all DC commands from addition to subroutine execution +to be implemented in essentially the same way. The result +was a considerable degree of logical separation of the final +program into modules with very little communication between +modules. +.PP +The rationale for the lack of interaction between the scale and the bases +was to provide an understandable means of proceeding after +a change of base or scale when numbers had already been entered. +An earlier implementation which had global notions of +scale and base did not work out well. +If the value of +.ft B +scale +.ft +were to be interpreted in the current +input or output base, +then a change of base or scale in the midst of a +computation would cause great confusion in the interpretation +of the results. +The current scheme has the advantage that the value of +the input and output bases +are only used for input and output, respectively, and they +are ignored in all other operations. +The value of +scale +is not used for any essential purpose by any part of the program +and it is used only to prevent the number of +decimal places resulting from the arithmetic operations from +growing beyond all bounds. +.PP +The design rationale for the choices for the scales of +the results of arithmetic were that in no case should +any significant digits be thrown away if, on appearances, the +user actually wanted them. Thus, if the user wants +to add the numbers 1.5 and 3.517, it seemed reasonable to give +him the result 5.017 without requiring him to unnecessarily +specify his rather obvious requirements for precision. +.PP +On the other hand, multiplication and exponentiation produce +results with many more digits than their operands and it +seemed reasonable to give as a minimum the number of decimal +places in the operands but not to give more than that +number of digits +unless the user asked for them by specifying a value for \fBscale\fP. +Square root can be handled in just the same way as multiplication. +The operation of division gives arbitrarily many decimal places +and there is simply no way to guess how many places the user +wants. +In this case only, the user must +specify a \fBscale\fP to get any decimal places at all. +.PP +The scale of remainder was chosen to make it possible +to recreate the dividend from the quotient and remainder. +This is easy to implement; no digits are thrown away. +.SH +References +.IP [1] +L. L. Cherry, R. Morris, +.ft I +BC \- An Arbitrary Precision Desk-Calculator Language. +.ft +.IP [2] +K. C. Knowlton, +.ft I +A Fast Storage Allocator, +.ft +Comm. ACM \fB8\fP, pp. 623-625 (Oct. 1965).