BSD 4 development

author Bill Joy <wnj@ucbvax.Berkeley.EDU>

Wed, 15 Oct 1980 20:03:55 +0000 (12:03 -0800)

committer Bill Joy <wnj@ucbvax.Berkeley.EDU>

Wed, 15 Oct 1980 20:03:55 +0000 (12:03 -0800)
author Bill Joy <wnj@ucbvax.Berkeley.EDU>
Wed, 15 Oct 1980 20:03:55 +0000 (12:03 -0800)
committer Bill Joy <wnj@ucbvax.Berkeley.EDU>
Wed, 15 Oct 1980 20:03:55 +0000 (12:03 -0800)
diff --git a/usr/doc/README b/usr/doc/README

new file mode 100644 (file)

index 0000000..98a97e4
--- /dev/null
+++ b/usr/doc/README
@@ -0,0 +1,9 @@
+This directory contains source for most of the
+documents contained in Volume 2 of The UNIX
+Programmer's Manual, 7th Edition.
+Most use -ms to format; many also use
+refer, tbl and eqn. Precise incantations
+are engraved in ./run.
+
+The citations for those papers that use
+refer are taken from /usr/dict/papers/Rv7man.
diff --git a/usr/doc/REFS b/usr/doc/REFS

new file mode 100644 (file)

index 0000000..e5da452
--- /dev/null
+++ b/usr/doc/REFS
@@ -0,0 +1,405 @@
+%A L. P. Deutsch
+%A B. W. Lampson
+%T An online editor
+%J Comm. Assoc. Comp. Mach.
+%V 10
+%N 12
+%D December 1967
+%P 793-799, 803
+%K qed
+
+.[
+%r 17
+%K cstr
+%R Comp. Sci. Tech. Rep. No. 17
+%I Bell Laboratories
+%C Murray Hill, New Jersey
+%A B. W. Kernighan
+%A L. L. Cherry
+%T A System for Typesetting Mathematics
+%d May 1974, revised April 1977
+%J Comm. Assoc. Comp. Mach.
+%K acm cacm
+%V 18
+%P 151-157
+%D March 1975
+.]
+
+%T U\s-2NIX\s0 Time-Sharing System: Document Preparation
+%K unix bstj
+%A B. W. Kernighan
+%A M. E. Lesk
+%A J. F. Ossanna
+%J Bell Sys. Tech. J.
+%V 57
+%N 6
+%P 2115-2135
+%D 1978
+
+%A T. A. Dolotta
+%A J. R. Mashey
+%T An Introduction to the Programmer's Workbench
+%J Proc. 2nd Int. Conf. on Software Engineering
+%D October 13-15, 1976
+%P 164-168
+
+%T U\s-2NIX\s0 Time-Sharing System: The Programmer's Workbench
+%A T. A. Dolotta
+%A R. C. Haight
+%A J. R. Mashey
+%J Bell Sys. Tech. J.
+%V 57
+%N 6
+%P 2177-2200
+%D 1978
+%K unix bstj
+
+%T U\s-2NIX\s0 Time-Sharing System: U\s-2NIX\s0 on a Microprocessor
+%K unix bstj
+%A H. Lycklama
+%J Bell Sys. Tech. J.
+%V 57
+%N 6
+%P 2087-2101
+%D 1978
+
+%T The C Programming Language
+%A B. W. Kernighan
+%A D. M. Ritchie
+%I Prentice-Hall
+%C Englewood Cliffs, New Jersey
+%D 1978
+
+%T Computer Recreations
+%A Aleph-null
+%J Software Practice and Experience
+%V 1
+%N 2
+%D April-June 1971
+%P 201-204
+
+%T U\s-2NIX\s0 Time-Sharing System: The U\s-2NIX\s0 Shell
+%A S. R. Bourne
+%K unix bstj
+%J Bell Sys. Tech. J.
+%V 57
+%N 6
+%P 1971-1990
+%D 1978
+
+%A L. P. Deutsch
+%A B. W. Lampson
+%T \*sSDS\*n 930 time-sharing system preliminary reference manual
+%R Doc. 30.10.10, Project \*sGENIE\*n
+%C Univ. Cal. at Berkeley
+%D April 1965
+
+%A R. J. Feiertag
+%A E. I. Organick
+%T The Multics input-output system
+%J Proc. Third Symposium on Operating Systems Principles
+%D October 18-20, 1971
+%P 35-41
+
+%A D. G. Bobrow
+%A J. D. Burchfiel
+%A D. L. Murphy
+%A R. S. Tomlinson
+%T \*sTENEX\*n, a Paged Time Sharing System for the \*sPDP\*n-10
+%J Comm. Assoc. Comp. Mach.
+%V 15
+%N 3
+%D March 1972
+%K tenex
+%P 135-143
+
+%A R. E. Griswold
+%A D. R. Hanson
+%T An Overview of SL5
+%J SIGPLAN Notices
+%V 12
+%N 4
+%D April 1977
+%P 40-50
+
+%A E. W. Dijkstra
+%T Cooperating Sequential Processes
+%B Programming Languages
+%E F. Genuys
+%I Academic Press
+%C New York
+%D 1968
+%P 43-112
+
+%A J. A. Hawley
+%A W. B. Meyer
+%T M\s-2UNIX\s0, A Multiprocessing Version of U\s-2NIX\s0
+%K munix unix
+%R M.S. Thesis
+%I Naval Postgraduate School
+%C Monterey, Cal.
+%D 1975
+
+%T The U\s-2NIX\s0 Time-Sharing System
+%K unix bstj
+%A D. M. Ritchie
+%A K. Thompson
+%J Bell Sys. Tech. J.
+%V 57
+%N 6
+%P 1905-1929
+%D 1978
+
+%A E. I. Organick
+%T The M\s-2ULTICS\s0 System
+%K multics
+%I M.I.T. Press
+%C Cambridge, Mass.
+%D 1972
+
+%T UNIX for Beginners
+%A B. W. Kernighan
+%D 1978
+
+%T U\s-2NIX\s0 Programmer's Man\&ual
+%A K. Thompson
+%A D. M. Ritchie
+%K unix
+%I Bell Laboratories
+%O Seventh Edition.
+%D 1978
+
+%A K. Thompson
+%T The U\s-2NIX\s0 Command Language
+%B Structured Programming\(emInfotech State of the Art Report
+%I Infotech International Ltd.
+%C Nicholson House, Maidenhead, Berkshire, England
+%D March 1975
+%P 375-384
+%K unix
+%X pwb
+Brief description of shell syntax and semantics, without much
+detail on implementation.
+Much on pipes and convenience of hooking programs together.
+Includes SERMONETTE:
+"Many familiar computing `concepts' are missing from UNIX.
+Files have no records. There are no access methods.
+There are no file types.  These concepts fill a much-needed gap.
+I sincerely hope that when future systems are designed by
+manufacturers the value of some of these ingrained notions is re-examined.
+Like the politician and his `common man', manufacturers have
+their `average user'.
+
+%A J. R. Mashey
+%T PWB/UNIX Shell Tutorial
+%D September 30, 1977
+
+%A D. F. Hartley (Ed.)
+%T The Cambridge Multiple Access System \- Users Reference Manual
+%I University Mathematical Laboratory
+%C Cambridge, England
+%D 1968
+
+%A P. A. Crisman (Ed.)
+%T The Compatible Time-Sharing System
+%I M.I.T. Press
+%K whole ctss book
+%C Cambridge, Mass.
+%D 1965
+
+%T LR Parsing
+%A A. V. Aho
+%A S. C. Johnson
+%J Comp. Surveys
+%V 6
+%N 2
+%P 99-124
+%D June 1974
+
+%T Deterministic Parsing of Ambiguous Grammars
+%A A. V. Aho
+%A S. C. Johnson
+%A J. D. Ullman
+%J Comm. Assoc. Comp. Mach.
+%K acm cacm
+%V 18
+%N 8
+%P 441-452
+%D August 1975
+
+%A A. V. Aho
+%A J. D. Ullman
+%T Principles of Compiler Design
+%I Addison-Wesley
+%C Reading, Mass.
+%D 1977
+
+.[
+%r 65
+%R Comp. Sci. Tech. Rep. No. 65
+%K CSTR
+%A S. C. Johnson
+%T Lint, a C Program Checker
+%D December 1977
+%O updated version TM 78-1273-3
+%D 1978
+.]
+
+%T A Portable Compiler: Theory and Practice
+%A S. C. Johnson
+%J Proc. 5th ACM Symp. on Principles of Programming Languages
+%P 97-104
+%D January 1978
+
+.[
+%r 39
+%K CSTR
+%R Comp. Sci. Tech. Rep. No. 39
+%I Bell Laboratories
+%C Murray Hill, New Jersey
+%A M. E. Lesk
+%T Lex \(em A Lexical Analyzer Generator
+%D October 1975
+.]
+
+.[
+%r 32
+%K CSTR
+%R Comp. Sci. Tech. Rep. No. 32
+%I Bell Laboratories
+%C Murray Hill, New Jersey
+%A S. C. Johnson
+%T Yacc \(em  Yet Another Compiler-Compiler
+%D July 1975
+.]
+
+%T U\s-2NIX\s0 Time-Sharing System: Portability of C Programs and the U\s-2NIX\s0 System
+%K unix bstj
+%A S. C. Johnson
+%A D. M. Ritchie
+%J Bell Sys. Tech. J.
+%V 57
+%N 6
+%P 2021-2048
+%D 1978
+
+%T Typing Documents on UNIX and GCOS:  The -ms Macros for Troff
+%A M. E. Lesk
+%D 1977
+
+%A K. Thompson
+%A D. M. Ritchie
+%T U\s-2NIX\s0 Programmer's Manual
+%K unix
+%I Bell Laboratories
+%O Sixth Edition
+%D May 1975
+
+%T The Network U\s-2NIX\s0 System
+%K unix
+%A G. L. Chesson
+%J Operating Systems Review
+%V 9
+%N 5
+%P 60-66
+%D 1975
+%O Also in \f2Proc. 5th Symp. on Operating Systems Principles.\f1
+
+%T Spider \(em An Experimental Data Communications System
+%Z ctr127
+%A A. G. Fraser
+%J Proc. IEEE Conf. on Communications
+%P 21F
+%O IEEE Cat. No. 74CH0859-9-CSCB.
+%D June 1974
+
+%T A Virtual Channel Network
+%A A. G. Fraser
+%J Datamation
+%P 51-56
+%D February 1975
+
+.[
+%r 41
+%K CSTR
+%R Comp. Sci. Tech. Rep. No. 41
+%I Bell Laboratories
+%C Murray Hill, New Jersey
+%A J. W. Hunt
+%A M. D. McIlroy
+%T An Algorithm for Differential File Comparison
+%D June 1976
+.]
+
+%A F. P. Brooks, Jr.
+%T The Mythical Man-Month
+%I Addison-Wesley
+%C Reading, Mass.
+%D 1975
+%X pwb
+Readable, classic reference on software engineering and
+problems of large projects, from someone with experience in them.
+Required reading for any software engineer, even if conclusions may not
+always be agreed with.
+%br
+"The second is the most dangerous system a man every designs." p.55.
+%br
+"Hence plan to throw one away; you will, anyhow." p.116.
+%br
+"Cosgrove has perceptively pointed out that the programmer delivers
+satisfaction of a user need rather than any tangible product.
+And both the actual need and the user's perception of that need
+will change as programs are built, tested, and used." p.117.
+%br
+"The total cost of maintaining a widely used program is typically 40 percent
+or more of the cost of developing it." p.121.
+%br
+"As shown above, amalgamating prose and program reduces the total
+number of characters to be stored." p.175.
+
+%T A Portable Compiler for the Language C
+%A A. Snyder
+%I Master's Thesis, M.I.T.
+%C Cambridge, Mass.
+%D 1974
+
+%T The C Language Calling Sequence
+%A M. E. Lesk
+%A S. C. Johnson
+%A D. M. Ritchie
+%D 1977
+
+%T Optimal Code Generation for Expression Trees
+%A A. V. Aho
+%A S. C. Johnson
+%D 1975
+%J J. Assoc. Comp. Mach.
+%K acm jacm
+%V 23
+%N 3
+%P 488-501
+%O Also in \f2Proc. ACM Symp. on Theory of Computing,\f1 pp. 207-217, 1975.
+
+%A R. Sethi
+%A J. D. Ullman
+%T The Generation of Optimal Code for Arithmetic Expressions
+%J J. Assoc. Comp. Mach.
+%K acm jacm
+%V 17
+%N 4
+%D October 1970
+%P 715-728
+%O Reprinted as pp. 229-247 in \fICompiler Techniques\fR, ed. B. W. Pollack, Auerbach, Princeton NJ (1972).
+%X pwb
+Optimal approach for straight-line, fixed
+number of regs.
+
+%T Code Generation for Machines with Multiregister
+Operations
+%A A. V. Aho
+%A S. C. Johnson
+%A J. D. Ullman
+%J Proc. 4th ACM Symp. on Principles of Programming Languages
+%P 21-28
+%D January 1977
+
diff --git a/usr/doc/assembler b/usr/doc/assembler

new file mode 100644 (file)

index 0000000..6e1435b
--- /dev/null
+++ b/usr/doc/assembler
@@ -0,0 +1,942 @@
+.TL
+Assembler Reference Manual
+.AU
+John F. Reiser
+.AI
+.HO
+.AU
+Robert R. Henry\s-2\u*\d\s+2
+.FS
+\&\s-2\u*\d\s+2
+Preparation of this paper supported in part
+by the National Science Foundation under grant MCS # 78-07291.
+.FE
+.AI
+Electronics Research Laboratory
+University of California
+Berkeley, CA  94720
+.ND November 5, 1979
+.NH
+Introduction
+.PP
+This document describes the usage and input syntax
+of the \s8UNIX VAX\s10-11 assembler \fIas\fP.
+\fIAs\fP is designed for assembling the code produced by the
+C compiler; certain concessions have been made to handle code written
+directly by people, but in general little sympathy has been extended.
+This document is intended only for the writer of a compiler or a maintainer
+of the assembler.
+.NH
+Usage
+.PP
+\fIas\fP is used as follows:
+.in +5
+as
+[ \fB\-LVWJR\fR ]
+[ \fB\-d\fIn\fR ]
+[ \fB\-DTC\fR ]
+[ \fB\-t \fIdirectory\fR ]
+[ \fB\-o \fIoutput\fR ]
+[ \fIname\d\s-2\&1\s+2\u1\fP ]
+[ \fIname\d\s-2\&2\s+2\u ... \fP ]
+.br
+.in -5
+.PP
+The \fB\-L\fP flag instructs the assembler to save labels beginning with
+a 'L' in the symbol table portion output file.
+Labels are not saved by default, as the default action of the link
+editor \fIld\fP is to discard them anyway.
+.PP
+The \fB\-V\fP flag tells the assembler to place its interpass temporary
+file into virtual memory.
+In normal circumstances, the system manager
+will decide where the temporary file should lie.
+Our experiments
+with very large temporary files show that placing the temporary
+file into virtual memory will save about 13% of the assembly time,
+where the size of the temporary file is about 350K bytes.
+Most assembler sources will not be this long.
+.PP
+The \fB\-W\fP turns off reporting all errors.
+.PP
+The \fB\-J\fP flag forces \s-2UNIX\s+2 style pseudo\-branch
+instructions with destinations further away than a
+byte displacement to be
+turned into jump instructions with 4 byte offsets.
+The \fB\-J\fP flag buys you nothing if \fB\-d2\fP is set.
+(See \(sc 9.4)
+.PP
+The \fB\-R\fP flag effectively turns \fB.data\fP\fI n\fP
+segment changing directives into \fB.text\fP\fI n\fP directives.
+This obviates the need to run editor scripts on assembler source
+to ``read\-only'' fix initialized data segments.
+Uninitialized data (via \fB.lcomm\fP and \fP.comm\fP directives)
+is still assembled into the data or bss segments.
+.PP
+The \fB\-d\fP flag specifies the number of bytes
+which the assembler should allow for a displacement when the value of the
+displacement expression is undefined in the first pass.
+The possible values of \fIn\fP are 1, 2, or 4; the assembler uses 4 bytes
+if \fB-d\fP is not specified.
+See \(sc 9.2.
+.PP
+Provided the \fB\-V\fP flag is not set,
+the \fB\-t\fP flag causes the assembler to place its single temporary file
+in the \fIdirectory\fP instead of in \fI/tmp\fP.
+.PP
+The \fB\-o\fP flag causes the output to be placed on the named file.
+The output of the assembler is by default placed on
+the file \fIa.out\fR in the current directory.
+.PP
+The input to the assembler is normally taken from the standard input.
+If file arguments occurs, then the input is taken
+sequentially from the files \fIname\d\s-2\&1\s+2\u\fP,
+\fIname\d\s-2\&2\s+2\u\fP...
+This is not to say that the files are assembled seperately;
+\fIname\d\s-2\&2\s+2\u\fP
+is effectively concatenated to \fIname\d\s-2\&1\s+2\u\fP,
+so multiple definitions cannot occur amongst the input sources.
+.PP
+The \fB\-D\fP flag enables debugging information, provided that the
+assembler has been compiled to have debugging information available.
+.PP
+The \fB\-T\fP flag enables a trace to be generate of each token read
+by \fIas\fP to be printed.  This is long and boring, but useful when
+debugging the assembler.
+.NH
+Lexical conventions
+.PP
+Assembler tokens include identifiers (alternatively, ``symbols'' or ``names''),
+constants, and operators.
+.NH
+Identifiers
+.PP
+An identifier consists of a sequence of alphanumeric characters (including
+period ``\|\fB.\fR\|'',
+underscore ``\(ul'' and
+dollar ``\|$\|'')
+of which the first may not be numeric.
+If the assembler has been compiled to support flexible length symbols,
+identifiers may be (practically) arbitrarily long with all
+characters significant;
+otherwise, only the first NCPS
+(a symbol defined in \fI/usr/include/a.out.h\fP, and normally 8)
+characters are significant.
+.NH 2
+Constants
+.NH 3
+Simple constants
+.PP
+All integer constants are 64 bits wide and interpreted as two's
+complement numbers.
+64 bit wide integer constants (quads) are only partially supported
+by the \s-2VAX\s+2 hardware, and are supported only to provide
+immediate constants to \s-2VAX\s+2 instructions with quad operands.
+Floating-point constants are 64 bits wide.
+The digits are ``0123456789abcdefABCDEF'' with the obvious values.
+.PP
+An octal constant consists of a sequence of digits with a leading zero.
+.PP
+A decimal constant consists of a sequence of digits without a leading zero.
+.PP
+A hexadecimal constant consists of the characters ``0x'' (or ``0X'')
+followed by a sequence of digits.
+.PP
+A single-character constant consists of a single quote ``\|\(fm\|''
+followed by an \s8ASCII\s10 character, including \s8ASCII\s10 newline.
+The constant's value is the code for the
+given character.
+.PP
+A floating-point constant consists of the characters ``0f'', ``0d'',
+``0F'', or ``0D'' followed by a sequence of characters which \fIatof\fP
+will recognize as a floating-point number;
+either ``e'', ``E'', ``d''or ``D''
+may be used to designate the exponent field.
+.NH 3
+String Constants
+.PP
+A string constant is defined using the same syntax and semantics as ``C''
+beginning and ending with a ``"'' (double quote).
+The \s8DEC\s10 assembler conventions for flexible string quoting is
+not implemented.
+All ``C'' backslash conventions are observed; the backslash conventions
+peculiar to the \s-2PDP\-11\s+2 assembler are not observed.
+Strings are known by their value and their length; the assembler
+does not implicitly end strings with a null byte.
+.NH 2
+Operators
+.PP
+There are several single-character
+operators; see \(sc7.
+.NH 2
+Blanks
+.PP
+Blank and tab characters
+may be interspersed freely between tokens, but may
+not be used within tokens (except character constants).
+A blank or tab is required to separate adjacent
+identifiers or constants not otherwise separated.
+.NH 2
+Comments
+.NH 3
+Decadent Comments
+.PP
+The character ``\|#\|'' introduces a comment, which extends
+through the end of the line on which it appears.
+Comments starting in column 1,
+of the format ``\|# \fIexpression string\fP\|"
+are interpreted as an indication that the assembler is now assembling
+file \fIstring\fP at line \fIexpression\fP.
+Thus, one can use the C preprocessor on an assembly language source file,
+and use the \fI#include\fP and \fI#define\fP
+preprocessor directives.
+(Note that their may not be an assembler comment starting in column
+1 if the assembler source is given to the C preprocessor, as it will
+be intrepreted by the preprocessor in a way not intended.)
+Comments are otherwise ignored by the assembler.
+.NH 3
+C Style Comments
+.PP
+The assembler will recognize C style comments, introduced with
+the prologue \fB/*\fP and ending with the epilogue \fB*/\fP.
+C style comments may extend across multiple lines, and are the preferred
+comment style to use if one chooses to use the C preprocessor.
+.NH 1
+Segments and Location Counters
+.PP
+Assembled code and data fall into three segments:  the text segment,
+the data segment, and the bss segment.  The operating system makes
+some assumptions about the content of these segments;  the assembler
+does not.  Within the text and data segments there are a number of
+sub-segments, distinguished by number (``text 0'', ``text 1'', .\|.\|.
+``data 0'', ``data 1'', .\|.\|.\|).
+Currently there are four subsegments each in text and data.
+The subsegments are for programming convenience only.  Before writing the
+output file, the assembler zero-pads each text subsegment to a multiple of four
+bytes and then concatenates the subsegments in order to form the text segment;
+an analogous operation is done for the data segment.
+Requesting that the loader define symbols and storage regions is the only
+action allowed by the assembler with respect to the bss segment.
+Assembly begins in ``text 0''.
+.PP
+Associated with each (sub)segment is an implicit location counter which
+begins at zero and is incremented by 1 for each byte assembled into the
+(sub)segment.  There is no way to explicitly reference a location counter.
+Note that the location counters of subsegments other than ``text 0''
+and ``data 0'' behave peculiarly due to the concatenation used to form
+the text and data segments.
+.NH 1
+Statements
+.PP
+A source program is composed of a sequence of
+\fIstatements\fP.
+Statements are separated either by new-lines
+or by semicolons.
+There are two kinds of statements: null statements
+and keyword statements.
+Either kind of statement may be preceded by
+one or more labels.
+.NH 2
+Labels
+.NH 3
+Name (Global) Labels
+.PP
+A global label consists of a name followed
+by a colon ``\|:\|''.
+The effect of a name label is to assign the current
+value and type of the location counter
+to the name.
+An error is indicated in pass 1 if the
+name is already defined;
+an error is indicated in pass 2 if the
+value assigned changes the definition
+of the label.
+.PP
+A global label is referenced by its name.
+.PP
+Global labels beginning with a ``\|L\|''
+are discarded unless the \fB-L\fP option
+is in effect.
+.NH 3
+Numeric (Local) Labels
+.PP
+A numeric label consists of a digit \fI0\fP to \fI9\fP followed by a
+colon (``\|:\|'').
+Such a label serves to define temporary symbols of the form
+``\fIn\fPb'' and ``\fIn\fPf'',
+where \fIn\fP is the digit of the label.
+As in the case of name labels, a numeric label assigns
+the current value and type of the location counter
+to the temporary symbol.
+However, several numeric labels with the same digit
+may be used within the same assembly.
+References to symbols of the form
+``\fIn\fPb''
+refer to the first numeric label ``\fIn\|:\fP''
+\fIb\fP\|ackwards from the reference;
+``\fIn\fPf''
+symbols refer to the first numeric label ``\fIn\|:\fP''
+\fIf\fP\P\|orwards from the reference.
+Such numeric labels tend to conserve the inventive powers of
+the programmer.
+.NH 2
+Null statements
+.PP
+A null statement is an empty statement (which may, however,
+have labels).
+A null statement is ignored by the assembler.
+Common examples of null statements are empty
+lines or lines containing only a label.
+.NH 2
+Keyword statements
+.PP
+A keyword statement begins with one of the many predefined
+keywords of the assembler;
+the syntax of the remainder depends
+on the keyword.
+All instruction opcodes are keywords.
+The remaining keywords are assembler pseudo-operations,
+also called directives.
+The pseudo-operations are listed below with the syntax they require.
+.NH 1
+Expressions
+.PP
+An expression is a sequence of symbols representing a value.
+Its constituents are identifiers, constants,
+operators, and parentheses.
+Each expression has a type.
+.PP
+All operators in expressions are fundamentally binary in
+nature.
+Arithmetic is two's complement and has 32 bits of precision.
+There are four levels of precedence, listed here from
+lowest precedence level to highest:
+.IP (binary) 16
+\|+\|, -\|
+.IP (binary) 16
+\||\|, \|&\|, \|^\|, \|!\|
+.IP (binary) 16
+\|*\|, \|/\|, \|%\|, \|!\|
+.IP (unary) 16
+\|-\|, \|!\|
+.PP
+All operators of the same precedence are evaluated strictly left to right,
+except for evaluation order enforced by parenthesis.
+.NH 2
+Expression operators
+.PP
+The operators are:
+.IP + 16
+addition
+.IP \- 16
+subtraction
+.IP * 16
+multiplication
+.IP / 16
+division
+.IP % 
+modulo
+.IP & 16
+bitwise and
+.IP \(bv 16
+bitwise or
+.IP ^ 16
+bitwise exclusive or
+.IP "> (or >>)" 16
+logical right shift
+.IP "< (or <<)" 16
+logical left shift
+.hc
+.IP ! 8
+\fIa\fR\|!\|\fIb\fR is \fIa \fBor \fR(\|\fBnot \fIb\fR\|);
+i.e., the \fBor\fR of the first operand and
+the one's complement of the second; most common use is
+as a unary operator.
+.PP
+Expressions may be grouped by use of parentheses ``\|(\|\|)\|''.
+.NH 2
+Types
+.PP
+The assembler deals with a number of types
+of expressions.  Most types
+are attached to keywords and used to select the
+routine which treats that keyword.  The types likely
+to be met explicitly are:
+.IP undefined 8
+.br
+Upon first encounter, each symbol is undefined.
+It may become undefined if it is assigned an undefined expression.
+It is an error to attempt to assemble an undefined
+expression in pass 2; in pass 1, it is not (except that
+certain keywords require operands which are not undefined).
+.IP "undefined external" 8
+.br
+A symbol which is declared \fB.globl\fR but not defined
+in the current assembly is an undefined
+external.
+If such a symbol is declared, the link editor \fIld\fR
+must be used to load the assembler's output with
+another routine that defines the undefined reference.
+.IP absolute 8
+.br
+An absolute symbol is defined ultimately from a constant.
+Its value is unaffected by any possible future applications
+of the link-editor to the output file.
+.IP text 8
+.br
+The value of a text symbol is measured
+with respect to the beginning of the text segment of the program.
+If the assembler output is link-edited, its text
+symbols may change in value
+since the program need
+not be the first in the link editor's output.
+Most text symbols are defined by appearing as labels.
+At the start of an assembly, the value of ``\|\fB.\fP\|'' is text 0.
+.IP data 8
+.br
+The value of a data symbol is measured
+with respect to the origin of the data segment of a program.
+Like text symbols, the value of a data symbol may change
+during a subsequent link-editor run since previously
+loaded programs may have data segments.
+After the first \fB.data\fR statement, the value of ``\|\fB.\fP\|''
+is data 0.
+.IP bss 8
+.br
+The value of a bss symbol is measured from
+the beginning of the bss segment of a program.
+Like text and data symbols, the value of a bss symbol
+may change during a subsequent link-editor
+run, since previously loaded programs may have bss segments.
+.IP "external absolute, text, data, or bss" 8
+.br
+symbols declared \fB.globl\fR
+but defined within an assembly as absolute, text, data, or bss
+symbols may be used exactly as if they were not
+declared \fB.globl\fR; however, their value and type are available
+to the link editor so that the program may be loaded with others
+that reference these symbols.
+.IP register 8
+.br
+The symbols
+.DS
+\fBr0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15\fP
+\fBap fp sp pc\fP
+.DE
+are predefined
+as register symbols.
+In addition, the % operator converts an absolute value to type register.
+.IP "other types" 8
+.br
+Each keyword known to the assembler has a type which
+is used to select the routine which processes
+the associated keyword statement.
+The behavior of such symbols
+when not used as keywords is the same as if they were absolute.
+.NH 2
+Type propagation in expressions
+.PP
+When operands are combined by expression operators,
+the result has a type which depends on the types
+of the operands and on the operator.
+The rules involved are complex to state but
+were intended to be sensible and predictable.
+For purposes of expression evaluation the
+important types are
+.DS
+undefined
+absolute
+text
+data
+bss
+undefined external
+other
+.DE
+The combination rules are then:
+If one of the operands
+is undefined, the result is undefined.
+If both operands are absolute, the result is absolute.
+If an absolute is combined with one of the ``other types''
+mentioned above,
+the result has the other type.
+An ``other type'' combined with an explicitly
+discussed type other than absolute
+it acts like an absolute.
+.PP
+Further rules applying to particular operators
+are:
+.IP +
+If one operand is text-, data-, or bss-segment
+relocatable, or is an undefined external,
+the result has the postulated type and the other operand
+must be absolute.
+.IP \-
+If the first operand is a relocatable
+text-, data-, or bss-segment symbol, the second operand
+may be absolute (in which case the result has the
+type of the first operand);
+or the second operand may have the same type
+as the first (in which case the result is absolute).
+If the first operand is external undefined, the second must be
+absolute.
+All other combinations are illegal.
+.PP
+.IP others
+.br
+It is illegal to apply these operators to any but absolute
+symbols.
+.NH 1
+Pseudo-operations (Directives)
+.PP
+The keywords listed below introduce
+influence the later operations of the assembler.
+The metanotation
+.DS
+[ stuff ] .\|.\|.
+.DE
+means that 0 or more instances of the given stuff may appear.
+The metatnotation
+.DS
+( stuff )\|*\|\|\fIn\fP\|
+.DE
+means that exactly \fIn\fP occurances of stuff must occur.
+.PP
+Boldface tokens are literals, italic words
+are substitutable.
+.PP
+The pseudo\-operations listed below are grouped into functional
+categories, and not alphabetically.
+.NH 2
+Interface to a Previous Pass
+.in +5m
+.NH 3
+\&.ABORT
+.PP
+As soon as the assembler sees this directive, it ignores all further
+input (but it does read to the end of file), and aborts the assembly.
+No files are created.
+It is anticipated that this would be used in a pipe interconnected
+version of a compiler, where the first major syntax error would
+cause the compiler to issue this directive, saving unnecessary
+work in assembling code that would have to be discarded anyway.
+.NH 3
+\&.file \fIstring\fP
+.PP
+This directive causes the assembler to think it is in file \fIstring\fP
+so error messages reflect the proper source file.
+.NH 3
+\&.line \fIexpression\fP
+.PP
+This directive causes the assembler to think it is on line \fIexpression\fP
+so error messages reflect the proper source file.
+.PP
+The only effect of assembling multiple files specified in the command string
+is to insert the
+\fIfile\fP and \fIline\fP directives, with the appropriate values,
+at the beginning of the source from each file.
+.NH 3
+Preprocessor Interface
+.DS
+\fI# expression string\fP
+\fI# expression\fP
+.DE
+.PP
+This is the only instance where a comment is meaningful to the assembler.
+The ``\|#\|''
+.ul 1
+must
+be in the first column.
+This meta comment causes the assembler
+to believe it is on line \fIexpression\fP.
+The second argument, if included, causes the assembler to believe it is in
+file \fIstring\fP, otherwise the current file name does not change.
+.in -5m
+.NH 2
+Location Counter Control
+.in +5m
+.NH 3
+\&\fB.align\fP  \fIexpression\fP
+.PP
+The location counter is adjusted (by assembling bytes containing
+zeroes, if necessary) so that the \fIexpression\fP lowest bits
+become zero.
+Thus ``.align 2'' makes the location counter evenly divisible by 4.
+The expression must be defined, absolute, nonnegative,
+and less than 16.
+(Note that the subsegment concatenation convention
+and the current loader conventions may not preserve attempts at aligning
+to more than 2 low-order zero bits.)
+.NH 3
+Subsegment switching
+.DS
+  \fB.data\fP [ \fIexpression\fP ]
+  \fB.text\fP  [ \fIexpression\fP ]
+.DE
+.PP
+These two pseudo-operations cause the
+assembler to begin assembling into the indicated text or data
+subsegment.  If specified, the expression must be defined and absolute;
+an omitted expression is treated as zero.
+The effect of a \fB.data\fP directive is treated
+as a \fB.text\fP directive if the \fB\-R\fP assembly flag is set.
+Assembly starts in the text 0 subsegment.
+.NH 3
+\&\fB.org\fP  \fIexpression\fP
+.PP
+The location counter is set equal to the value of the expression.
+The expression must be defined.
+The value of the expression must be greater than the current value
+of the location counter.
+.NH 3
+\&\fB.space\fP  \fIexpression\fP
+.PP
+\&\fIexpression\fP bytes of zeroes are assembled.
+.in -5m
+.NH 2
+Initialized Data
+.in +5m
+.NH 3
+Expression Initialized Data
+.DS
+  \fB.byte             \fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
+  \fB.word             \fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
+  \fB.int              \fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
+  \fB.long             \fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
+
+  \fB.quad     \fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
+
+  \fB.float            \fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
+  \fB.double   \fIexpression  \fR[  \fB, \fIexpression \fR]  .\|.\|.
+.DE
+.PP
+The \fIexpression\fP\|s in the comma-separated
+list are truncated to the indicated size
+(byte=8 bits,
+word=16,
+int=32,
+long=32,
+quad=64,
+float=32,
+double=64)
+and
+assembled in successive locations.
+The expressions must be absolute.
+The value assembled in bits 32-63 for \fB.double\fP is zero
+if the expression is not of type double.
+.PP
+Except for \fB.quad\fP, \fB.float\fP and \fB.double\fP,
+each expression may optionally be of the form
+.DS
+  \fIexpression\d\s-21\&\s+2\u\fP  \fB:\fP  \fIexpression\d\s-2\&2\s+2\u\fP.
+.DE
+In this case the value of \fIexpression\d\s-2\&2\s+2\u\fP
+is truncated to \fIexpression\d\s-2\&1\s+2\u\fP
+bits and assembled in the next \fIexpr\d\s-2\&1\s+2\u\fP-bit
+field which fits in
+the natural data size being assembled.
+Bits which are skipped because
+a field does not fit are made zero.
+Thus "\fB.byte\fP 123" is equivalent to
+"\fB.byte\fP 8:123" and "\fB.byte\fP 3:1,2:1,5:1"
+assembles two bytes, containing the values 9 and 1.
+.br
+\fBNB:\fP Since no \s-2VAX\s+2 compilers currently use bit fields,
+these bit field constructs are liable to disappear in the future.
+.NH 3
+String Initialized Data
+.DS
+ \fB.ascii\fP \fIstring\fP [ \fB,\fP \fIstring\fP ]
+ \fB.asciz\fP \fIstring\fP [ \fB,\fP \fIstring\fP ]
+.DE
+.PP
+Each \fIstring\fP in the list is assembled into successive locations,
+with the first letter in the string being placed
+into the first location, etc.
+The \fB.ascii\fP directive will not null pad the string;
+the \fB.asciz\fP directive will null pad the string.
+(Recall that strings are known by their length, and need not be terminated
+with a null, and that the C conventions for escaping are understood.)
+The \fB.ascii\fP directive is identical to:
+.DS
+\&\fB.byte\fP \fIstring\d\s-2\&0\s+2\u\fP\fB,\fP \fIstring\d\s-2\&1\s+2\u\fP\fB,\fP ...
+.DE
+.NH 3
+Zero Filled Data
+.DS
+\fB.space\fP \fIexpression\fP
+.DE
+.PP
+(See \(sc 8.2.4)
+\&\fIexpression\fP bytes of zeroes are assembled.
+\&\fIexpression\fP must be absolute.
+.NH 3
+Arbitrarily Filled Data
+.DS
+\fB.fill\fP
+\fIrep_expr\fP\fB, \fP
+\fIsize_expr\fP\fB, \fP
+\fIvalue_expr\fP\fR
+.DE
+.PP
+All three expressions must be absolute.
+\fIvalue_expr\fP,
+treated as an expression of size \fIsize_expr\fP bytes,
+is assembled and replicated \fIrep_expr\fP times.
+The effect is to advance the current location counter
+\fIrep_expr\fP \(** \fIsize_expr\fP bytes.
+\fIsize_expr\fP must be between 1 and 8.
+.in -5m
+.NH 2
+Symbol Definition
+.in +5m
+.NH 3
+General
+.in +5m
+.NH 4
+\&\fB.comm\fI  name  \fB,  \fIexpression\fR
+.PP
+Provided the \fIname\fR is not defined elsewhere,
+its type is made ``undefined external'', and its value is \fIexpression\fR.
+In fact the \fIname\fR behaves
+in the current assembly just like an
+undefined external.
+However, the link editor \fIld\fR has been special-cased
+so that all external symbols which are not
+otherwise defined, and which have a non-zero
+value, are defined to lie in the bss
+segment, and enough space is left after the
+symbol to hold \fIexpression\fR
+bytes.
+.NH 4
+\&\fB.lcomm\fI  name  \fB,  \fIexpression\fR
+.PP
+\fIexpression\fP bytes will be allocated in the bss segment and \fIname\fP
+assigned the location of the first byte, but the \fIname\fP is not declared
+as global and hence will be unknown to the link editor.
+.NH 4
+\&\fB.globl\fP  \fIname\fP
+.PP
+This statement makes the \fIname\fR external.
+If it is otherwise defined (by \fB.set\fP or by
+appearance as a label)
+it acts within the assembly exactly as if
+the \fB.globl\fR statement were not given;
+however, the link editor may be used
+to combine this routine with other routines that refer
+to this symbol.
+.PP
+Conversely, if the given symbol is not defined
+within the current assembly, the link editor
+can combine the output of this assembly
+with that of others which define the symbol.
+The assembler makes all otherwise
+undefined symbols external.
+.NH 4
+\&\fB.set\fP  \fIname\fP \fB,\fP \fIexpression\fP
+.PP
+The (\fIname\fP, \fIexpression\fP) pair is entered into the symbol table.
+Multiple \fB.set\fP statements with the same name are legal;
+the most recent value replaces all previous values.
+.in -5m
+.NH 3
+Debugger Support
+.in +5m
+.NH 4
+\&\fB.lsym\fP  \fIname\fP \fB,\fP \fIexpression\fP
+.PP
+A unique and otherwise unreferenceable instance of the
+(\fIname\fP, \fIexpression\fP)
+pair is created in the symbol table.
+The Fortran 77 compiler uses this mechanism to pass local symbol definitions
+to the link editor and debugger.
+.NH 4
+Special Symbol Table entries
+.DS
+\&\fB.stab\fP (\fIexpr\d\s-2i\s+2\u \fB,\fR)\|*NCPS\| \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP expr\d\s-2\&4\s+2\u\fR
+.in +5m
+\fR(normal \fBs\fPymbol \fBtab\fPle entry)\fR
+.in -5m
+\&\fB.stabs\fP \fIstring, expr\d\s-2\&1\s+2\u, expr\d\s-2\&2\s+2\u, expr\d\s-2\&3\s+2\u, expr\d\s-2\&4\s+2\u\fR
+.in +5m
+\fR(\fBstab s\fPtring)\fR
+.in -5m
+\&\fB.stabn\fP \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP expr\d\s-2\&4\s+2\u\fR
+.in +5m
+\fR(\fBstab n\fPone)\fR
+.in -5m
+\&\fB.stabd\fP \fIexpr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fR
+.in +5m
+\fR(\fBstab d\fPot)\fR
+.in -5m
+.DE
+.PP
+The \fIstab\fP directives place symbols in the symbol table for the symbolic
+debugger, \fIsdb\fP\s-2\u*\d\s+2.
+.FS
+.in +5
+.ti -5
+\s-2\u*\d\s+2Katseff, H.P. \fISdb: A Symbol Debugger\fP.
+Bell Laboratories, Holmdel,
+NJ.  April 12, 1979.
+.br
+.ti -5
+\&Katseff, H.P. \fISymbol Table Format for Sdb\fP. File 39394,
+Bell Laboratores, Holmdel, NJ. March 14, 1979.
+.in -5
+.FE
+In the \fB.stab\fP directive,
+the first NCPS expressions are used for the
+symbol name, which may be zero.
+The \fB.stab\fP directive makes no sense if
+the assembler recognizes arbitrary length symbols;
+if so, the assembler complains.
+The \fIstring\fP in the \fB.stabs\fP
+directive more generally serves the same purpose as the NCPS expressions.
+If the symbol name is zero, the
+\&\fB.stabn\fP directive may be used instead.
+.PP
+The other expressions are stored in the name list structure in the symbol
+table and preserved by the loader for reference by \fIsdb\fP\fR;
+the value of the expressions are peculiar to formats required by \fIsdb\fP\fR.
+.in +5m
+.ti -5
+\&\fIexpr\d\s-2\&1\s+2\u\fP is used as a symbol table tag
+(nlist field \fIn_type\fP).
+.br
+.ti -5
+\&\fIexpr\d\s-2\&2\s+2\u\fP seems to always be zero
+(nlist field \fIn_other\fP).
+.br
+.ti -5
+\&\fIexpr\d\s-2\&3\s+2\u\fP is used for either the
+source line number, or for a nesting level
+(nlist field \fIn_desc\fP).
+.br
+.ti -5
+\fIexpr\d\s-2\&4\s+2\u\fR is used as tag specific information
+(nlist field \fIn_value\fP).
+In the
+case of the \fB.stabd\fP directive, this expression is nonexistant, and
+is taken to be the value of the location counter at the following instruction.
+Since there is no associated name for a \fB.stabd\fP directive, it can
+only be used in circumstances where the name is zero.
+The effect of a \fB.stabd\fP directive can be achieved by one of the other
+\&\fB.stab\fPx directives in the following manner:
+.in -5m
+.DS
+\&     \fB.stabs\fP \fIstring\fB,\fP expr\d\s-2\&1\s+2\u\fB,\fP expr\d\s-2\&2\s+2\u\fB,\fP expr\d\s-2\&3\s+2\u\fB,\fP \fP LL\fIn\fP
+LL\fIn\fP\fB:\fP
+.DE
+The \fB.stabd\fP directive is prefered, because it does not clog the symbol
+table with labels used only for the stab symbol entries.
+.in -5m
+.in -5m
+.NH 1
+Machine instructions
+.PP
+The syntax of machine instruction statements accepted by \fIas\fP
+is generally similar to the syntax of \s8DEC MACRO\s10-32.  There are
+differences, however.
+.NH 2
+Character set
+.PP
+\fIas\fP uses the character `$' instead of `#',
+and the character `*' instead of `@'.  Opcodes and register names
+are spelled with lower-case rather than upper-case letters.
+.NH 2
+Lengths
+.PP
+Under certain circumstances, the following constructs are (optionallly)
+recognized by \&\fIas\fP to indicate the number of bytes to allocate
+for unresolved expressions used to specify displacement or indirect
+displacement addressing modes:
+.DS
+\&\fBB^\fP or  \fBB\`\fP       to indicate byte lengths                (1 byte)
+\&\fBW^\fP or  \fBW\`\fP       to indicate word lengths                (2 bytes)
+\&\fBL^\fP or  \fBL\`\fP       to indicate long word lengths   (3 bytes)
+.DE
+One can also use lower case \fBb\fP, \fBw\fP or \fBl\fP instead of the upper
+case letters.
+There must be no space between the size specifier letter and the \fB^\fP or
+\&\fB\`\fP.
+The constructs \fBS^\fP and \fBG^\fP are not recognized
+by \fIas\fP as they are by the \s-2DEC\s+2 assembler.
+It is preferred to use the "\`" displacement specifier,
+so that the ``^'' is not
+misinterpreted as the \fBxor\fP operator.
+.PP
+Literal values (including floating-point literals used where the
+hardware expects a floating-point operand) are assembled as short
+literals if possible, hence not needing the \fBS^\fP \s-2DEC\s+2
+directive.  If the value of the displacement is known exactly in the
+first pass \fIas\fP determines the length automatically, assembling it
+in the shortest possible way, ignoring (if present) the length
+expression.  If the value of the displacement is not known in the first
+pass, \&\fI\fP will use the value of the displacement given by the
+optional length specifier, or will use the value specified by the
+\fB\-d\fP argument, or will default to 4 bytes.
+.NH 2
+CASE instructions
+.PP
+\fIas\fP considers the instructions \fBcaseb\fP, \fBcasel\fP, \fBcasew\fP
+to have three operands (namely: selector, base, limit).
+The displacements must be explicitly assembled using one
+or more \fB.word\fP statements.
+.NH 2
+Extended branch instructions
+.PP
+These opcodes (formed in general
+by substituting a ``j'' for the initial ``b''
+of the standard opcodes)
+take as branch destinations the name of a label in the current
+subsegment.  If the destination is close enough then the corresponding
+``b'' instruction is assembled.  Otherwise the assembler choses a sequence
+of one or more instructions which together have the same effect as if the
+``b'' instruction had a larger span.  In general, \fIas\fP chooses the
+inverse branch followed by a \fBbrw\fP, but a \fBbrw\fP
+is sometimes pooled among several ``j'' instructions with the same
+destination.
+If the \fB\-J\fP assembler option is given,
+a \fBjmp\fP instruction is used instead of a \fBbrw\fP instruction
+for \fBALL\fP (!!) ``j'' instructions with distant destinations.
+This makes assembly of large (>32K bytes) assembly programs (inefficiently)
+possible.
+The current assembler does not try to use clever combinations of \fBbrb\fP,
+\fBbrw\fP and \fBjmp\fP instructions.
+The \fBjmp\fP instructions use PC relative addressing, with
+the length of the offset given by the ``\fB\-d\fP'' assembler
+option.
+.KS
+.DS
+.ft B
+.ta 1.0i 2.0i 3.0i
+jeql   jeqlu   jneq    jnequ
+jgeq   jgequ   jgtr    jgtru
+jleq   jlequ   jlss    jlssu
+jbcc   jbsc    jbcs    jbss
+jlbc   jlbs
+jcc    jcs     
+jvc    jvs
+jbc    jbs
+jbr
+.DE
+.KE
+\fBjbr\fR turns into \fBbrb\fR
+if its target is close enough; else a \fBbrw\fP is used.
+.NH 1
+Diagnostics
+.PP
+Diagnostics are intended to be self explanatory and appear on
+the standard output.
+.NH 1
+Limits
+.DS
+.ta    2.0i
+Arbitrary      Files to assemble
+Arbitrary      Significant characters per name
+127    Characters per input line
+127    Characters per string
+Arbitrary      Symbols
+4      Text segments
+4      Data segments
+.DE
diff --git a/usr/doc/awk b/usr/doc/awk

new file mode 100644 (file)

index 0000000..758da7d
--- /dev/null
+++ b/usr/doc/awk
@@ -0,0 +1,1401 @@
+.fp 3 G
+....TM "78-1271-12, 78-1273-6" 39199 39199-11
+.ND "September 1, 1978"
+....TR 68
+.RP
+.      \" macros here
+.tr _\(em
+.if t .tr ~\(ap
+.tr |\(or
+.tr *\(**
+.de UC
+\&\\$3\s-1\\$1\\s0\&\\$2
+..
+.de IT
+.if n .ul
+\&\\$3\f2\\$1\fP\|\\$2
+..
+.de UL
+.if n .ul
+\&\\$3\f3\\$1\fP\&\\$2
+..
+.de P1
+.DS I 3n
+.nf
+.if n .ta 5 10 15 20 25 30 35 40 45 50 55 60
+.if t .ta .3i .6i .9i 1.2i
+.if t .tr -\-'\(fm*\(**
+.if t .tr _\(ul
+.ft 3
+.lg 0
+.ss 18
+.              \"use first argument as indent if present
+..
+.de P2
+.ps \\n(PS
+.vs \\n(VSp
+.ft R
+.ss 12
+.if n .ls 2
+.tr --''``^^!!
+.if t .tr _\(em
+.fi
+.lg
+.DE
+..
+.hw semi-colon
+.hy 14
+.              \"2=not last lines; 4= no -xx; 8=no xx-
+.              \"special chars in programs
+.de WS
+.sp \\$1
+..
+.      \" end of macros
+.TL
+Awk \(em A Pattern Scanning and Processing Language
+.br
+(Second Edition)
+.AU "MH 2C-522" 4862
+Alfred V. Aho
+.AU "MH 2C-518" 6021
+Brian W. Kernighan
+.AU "MH 2C-514" 7214
+Peter J. Weinberger
+.AI
+.MH
+.AB
+.IT Awk
+is a programming language whose
+basic operation
+is to search a set of files
+for patterns, and to perform specified actions upon lines or fields of lines which
+contain instances of those patterns.
+.IT Awk
+makes certain data selection and transformation operations easy to express;
+for example, the
+.IT awk
+program
+.sp
+.ce
+.ft 3
+length > 72
+.ft
+.sp
+prints all input lines whose length exceeds 72 characters;
+the program
+.ce
+.sp
+.ft 3
+NF % 2 == 0
+.ft R
+.sp
+prints all lines with an even number of fields;
+and the program
+.ce
+.sp
+.ft 3
+{ $1 = log($1); print }
+.ft R
+.sp
+replaces the first field of each line by its logarithm.
+.PP
+.IT Awk
+patterns may include arbitrary boolean combinations of regular expressions
+and of relational operators on strings, numbers, fields, variables, and array elements.
+Actions may include the same pattern-matching constructions as in patterns,
+as well as
+arithmetic and string expressions and assignments,
+.UL if-else ,
+.UL while ,
+.UL for
+statements,
+and multiple output streams.
+.PP
+This report contains a user's guide, a discussion of the design and implementation of
+.IT awk ,
+and some timing statistics.
+....It supersedes TM-77-1271-5, dated September 8, 1977.
+.AE
+.CS 6 1 7 0 1 4
+.if n .ls 2
+.nr PS 9
+.nr VS 11
+.NH
+Introduction
+.if t .2C
+.PP
+.IT Awk
+is a programming language designed to make
+many common
+information retrieval and text manipulation tasks
+easy to state and to perform.
+.PP
+The basic operation of
+.IT awk
+is to scan a set of input lines in order,
+searching for lines which match any of a set of patterns
+which the user has specified.
+For each pattern, an action can be specified;
+this action will be performed on each line that matches the pattern.
+.PP
+Readers familiar with the
+.UX
+program
+.IT grep\|
+.[
+unix program manual
+.]
+will recognize
+the approach, although in
+.IT awk
+the patterns may be more
+general than in
+.IT grep ,
+and the actions allowed are more involved than merely
+printing the matching line.
+For example, the
+.IT awk
+program
+.P1
+{print $3, $2}
+.P2
+prints the third and second columns of a table
+in that order.
+The program
+.P1
+$2 ~ /A\||B\||C/
+.P2
+prints all input lines with an A, B, or C in the second field.
+The program
+.P1
+$1 != prev     { print; prev = $1 }
+.P2
+prints all lines in which the first field is different
+from the previous first field.
+.NH 2
+Usage
+.PP
+The command
+.P1
+awk  program  [files]
+.P2
+executes the
+.IT awk
+commands in
+the string
+.UL program
+on the set of named files,
+or on the standard input if there are no files.
+The statements can also be placed in a file
+.UL pfile ,
+and executed by the command
+.P1
+awk  -f pfile  [files]
+.P2
+.NH 2
+Program Structure
+.PP
+An
+.IT awk
+program is a sequence of statements of the form:
+.P1
+.ft I
+       pattern { action }
+       pattern { action }
+       ...
+.ft 3
+.P2
+Each line of input
+is matched against
+each of the patterns in turn.
+For each pattern that matches, the associated action
+is executed.
+When all the patterns have been tested, the next line
+is fetched and the matching starts over.
+.PP
+Either the pattern or the action may be left out,
+but not both.
+If there is no action for a pattern,
+the matching line is simply
+copied to the output.
+(Thus a line which matches several patterns can be printed several times.)
+If there is no pattern for an action,
+then the action is performed for every input line.
+A line which matches no pattern is ignored.
+.PP
+Since patterns and actions are both optional,
+actions must be enclosed in braces
+to distinguish them from patterns.
+.NH 2
+Records and Fields
+.PP
+.IT Awk
+input is divided into
+``records'' terminated by a record separator.
+The default record separator is a newline,
+so by default
+.IT awk
+processes its input a line at a time.
+The number of the current record is available in a variable
+named
+.UL NR .
+.PP
+Each input record
+is considered to be divided into ``fields.''
+Fields are normally separated by
+white space \(em blanks or tabs \(em
+but the input field separator may be changed, as described below.
+Fields are referred to as
+.UL "$1, $2,"
+and so forth,
+where
+.UL $1
+is the first field,
+and
+.UL $0
+is the whole input record itself.
+Fields may be assigned to.
+The number of fields in the current record
+is available in a variable named
+.UL NF .
+.PP
+The variables
+.UL FS
+and
+.UL RS
+refer to the input field and record separators;
+they may be changed at any time to any single character.
+The optional command-line argument
+\f3\-F\fIc\fR
+may also be used to set
+.UL FS
+to the character
+.IT c .
+.PP
+If the record separator is empty,
+an empty input line is taken as the record separator,
+and blanks, tabs and newlines are treated as field separators.
+.PP
+The variable
+.UL FILENAME
+contains the name of the current input file.
+.NH 2
+Printing
+.PP
+An action may have no pattern,
+in which case the action is executed for
+all
+lines.
+The simplest action is to print some or all of a record;
+this is accomplished by the
+.IT awk
+command
+.UL print .
+The
+.IT awk
+program
+.P1
+{ print }
+.P2
+prints each record, thus copying the input to the output intact.
+More useful is to print a field or fields from each record.
+For instance, 
+.P1
+print $2, $1
+.P2
+prints the first two fields in reverse order.
+Items separated by a comma in the print statement will be separated by the current output field separator
+when output.
+Items not separated by commas will be concatenated,
+so
+.P1
+print $1 $2
+.P2
+runs the first and second fields together.
+.PP
+The predefined variables
+.UL NF
+and
+.UL NR
+can be used;
+for example
+.P1
+{ print NR, NF, $0 }
+.P2
+prints each record preceded by the record number and the number of fields.
+.PP
+Output may be diverted to multiple files;
+the program
+.P1
+{ print $1 >"foo1"; print $2 >"foo2" }
+.P2
+writes the first field,
+.UL $1 ,
+on the file
+.UL foo1 ,
+and the second field on file
+.UL foo2 .
+The
+.UL >>
+notation can also be used:
+.P1
+print $1 >>"foo"
+.P2
+appends the output to the file
+.UL foo .
+(In each case,
+the output files are
+created if necessary.)
+The file name can be a variable or a field as well as a constant;
+for example,
+.P1
+print $1 >$2
+.P2
+uses the contents of field 2 as a file name.
+.PP
+Naturally there is a limit on the number of output files;
+currently it is 10.
+.PP
+Similarly, output can be piped into another process
+(on
+.UC UNIX
+only); for instance,
+.P1
+print | "mail bwk"
+.P2
+mails the output to
+.UL bwk .
+.PP
+The variables
+.UL OFS
+and
+.UL ORS
+may be used to change the current
+output field separator and output
+record separator.
+The output record separator is
+appended to the output of the
+.UL print
+statement.
+.PP
+.IT Awk
+also provides the
+.UL printf
+statement for output formatting:
+.P1
+printf format expr, expr, ...
+.P2
+formats the expressions in the list
+according to the specification
+in
+.UL format
+and prints them.
+For example,
+.P1
+printf "%8.2f  %10ld\en", $1, $2
+.P2
+prints 
+.UL $1
+as a floating point number 8 digits wide,
+with two after the decimal point,
+and
+.UL $2
+as a 10-digit long decimal number,
+followed by a newline.
+No output separators are produced automatically;
+you must add them yourself,
+as in this example.
+The version of
+.UL printf
+is identical to that used with C.
+.[
+C programm language prentice hall 1978
+.]
+.NH 1
+Patterns
+.PP
+A pattern in front of an action acts as a selector
+that determines whether the action is to be executed.
+A variety of expressions may be used as patterns:
+regular expressions,
+arithmetic relational expressions,
+string-valued expressions,
+and arbitrary boolean
+combinations of these.
+.NH 2
+BEGIN and END
+.PP
+The special pattern
+.UL BEGIN
+matches the beginning of the input,
+before the first record is read.
+The pattern
+.UL END
+matches the end of the input,
+after the last record has been processed.
+.UL BEGIN
+and
+.UL END
+thus provide a way to gain control before and after processing,
+for initialization and wrapup.
+.PP
+As an example, the field separator
+can be set to a colon by
+.P1
+BEGIN  { FS = ":" }
+.ft I
+\&... rest of program ...
+.ft 3
+.P2
+Or the input lines may be counted by
+.P1
+END  { print NR }
+.P2
+If
+.UL BEGIN
+is present, it must be the first pattern;
+.UL END
+must be the last if used.
+.NH 2
+Regular Expressions
+.PP
+The simplest regular expression is a literal string of characters
+enclosed in slashes,
+like
+.P1
+/smith/
+.P2
+This
+is actually a complete
+.IT awk
+program which
+will print all lines which contain any occurrence
+of the name ``smith''.
+If a line contains ``smith''
+as part of a larger word,
+it will also be printed, as in
+.P1
+blacksmithing
+.P2
+.PP
+.IT Awk
+regular expressions include the regular expression
+forms found in
+the
+.UC UNIX
+text editor
+.IT ed\|
+.[
+unix program manual
+.]
+and
+.IT grep
+(without back-referencing).
+In addition,
+.IT awk
+allows
+parentheses for grouping, | for alternatives,
+.UL +
+for ``one or more'', and
+.UL ?
+for ``zero or one'',
+all as in
+.IT lex .
+Character classes
+may be abbreviated:
+.UL [a\-zA\-Z0\-9]
+is the set of all letters and digits.
+As an example,
+the
+.IT awk
+program
+.P1
+/[Aa]ho\||[Ww]einberger\||[Kk]ernighan/
+.P2
+will print all lines which contain any of the names
+``Aho,'' ``Weinberger'' or ``Kernighan,''
+whether capitalized or not.
+.PP
+Regular expressions
+(with the extensions listed above)
+must be enclosed in slashes,
+just as in
+.IT ed
+and
+.IT sed .
+Within a regular expression,
+blanks and the regular expression
+metacharacters are significant.
+To turn of the magic meaning
+of one of the regular expression characters,
+precede it with a backslash.
+An example is the pattern
+.P1
+/\|\e/\^.\^*\e//
+.P2
+which matches any string of characters
+enclosed in slashes.
+.PP
+One can also specify that any field or variable
+matches
+a regular expression (or does not match it) with the operators
+.UL ~
+and
+.UL !~ .
+The program
+.P1
+$1 ~ /[jJ]ohn/
+.P2
+prints all lines where the first field matches ``john'' or ``John.''
+Notice that this will also match ``Johnson'', ``St. Johnsbury'', and so on.
+To restrict it to exactly
+.UL [jJ]ohn ,
+use
+.P1
+$1 ~ /^[jJ]ohn$/
+.P2
+The caret ^ refers to the beginning
+of a line or field;
+the dollar sign
+.UL $
+refers to the end.
+.NH 2
+Relational Expressions
+.PP
+An
+.IT awk
+pattern can be a relational expression
+involving the usual relational operators
+.UL < ,
+.UL <= ,
+.UL == ,
+.UL != ,
+.UL >= ,
+and
+.UL > .
+An example is
+.P1
+$2 > $1 + 100
+.P2
+which selects lines where the second field
+is at least 100 greater than the first field.
+Similarly,
+.P1
+NF % 2 == 0
+.P2
+prints lines with an even number of fields.
+.PP
+In relational tests, if neither operand is numeric,
+a string comparison is made;
+otherwise it is numeric.
+Thus,
+.P1
+$1 >= "s"
+.P2
+selects lines that begin with an
+.UL s ,
+.UL t ,
+.UL u ,
+etc.
+In the absence of any other information,
+fields are treated as strings, so
+the program
+.P1
+$1 > $2
+.P2
+will perform a string comparison.
+.NH 2
+Combinations of Patterns
+.PP
+A pattern can be any boolean combination of patterns,
+using the operators
+.UL \||\||
+(or),
+.UL &&
+(and), and
+.UL !
+(not).
+For example,
+.P1
+$1 >= "s" && $1 < "t" && $1 != "smith"
+.P2
+selects lines where the first field begins with ``s'', but is not ``smith''.
+.UL &&
+and
+.UL \||\||
+guarantee that their operands
+will be evaluated
+from left to right;
+evaluation stops as soon as the truth or falsehood
+is determined.
+.NH 2
+Pattern Ranges
+.PP
+The ``pattern'' that selects an action may also
+consist of two patterns separated by a comma, as in
+.P1
+pat1, pat2     { ... }
+.P2
+In this case, the action is performed for each line between
+an occurrence of
+.UL pat1
+and the next occurrence of
+.UL pat2
+(inclusive).
+For example,
+.P1
+/start/, /stop/
+.P2
+prints all lines between
+.UL start
+and
+.UL stop ,
+while
+.P1
+NR == 100, NR == 200 { ... }
+.P2
+does the action for lines 100 through 200
+of the input.
+.NH 1
+Actions
+.PP
+An
+.IT awk
+action is a sequence of action statements
+terminated by newlines or semicolons.
+These action statements can be used to do a variety of
+bookkeeping and string manipulating tasks.
+.NH 2
+Built-in Functions
+.PP
+.IT Awk
+provides a ``length'' function
+to compute the length of a string of characters.
+This program prints each record,
+preceded by its length:
+.P1
+{print length, $0}
+.P2
+.UL length
+by itself is a ``pseudo-variable'' which
+yields the length of the current record;
+.UL length(argument)
+is a function which yields the length of its argument,
+as in
+the equivalent
+.P1
+{print length($0), $0}
+.P2
+The argument may be any expression.
+.PP
+.IT Awk
+also
+provides the arithmetic functions
+.UL sqrt ,
+.UL log ,
+.UL exp ,
+and
+.UL int ,
+for
+square root,
+base
+.IT e
+logarithm,
+exponential,
+and integer part of their respective arguments.
+.PP
+The name of one of these built-in functions,
+without argument or parentheses,
+stands for the value of the function on the
+whole record.
+The program
+.P1
+length < 10 || length > 20
+.P2
+prints lines whose length
+is less than 10 or greater
+than 20.
+.PP
+The function
+.UL substr(s,\ m,\ n)
+produces the substring of
+.UL s
+that begins at position
+.UL m
+(origin 1)
+and is at most
+.UL n
+characters long.
+If
+.UL n
+is omitted, the substring goes to the end of
+.UL s .
+The function
+.UL index(s1,\ s2)
+returns the position where the string
+.UL s2
+occurs in
+.UL s1 ,
+or zero if it does not.
+.PP
+The function
+.UL sprintf(f,\ e1,\ e2,\ ...)
+produces the value of the expressions
+.UL e1 ,
+.UL e2 ,
+etc.,
+in the
+.UL printf
+format specified by
+.UL f .
+Thus, for example,
+.P1
+x = sprintf("%8.2f %10ld", $1, $2)
+.P2
+sets
+.UL x
+to the string produced by formatting
+the values of
+.UL $1
+and
+.UL $2 .
+.NH 2
+Variables, Expressions, and Assignments
+.PP
+.IT Awk
+variables take on numeric (floating point)
+or string values according to context.
+For example, in
+.P1
+x = 1
+.P2
+.UL x
+is clearly a number, while in
+.P1
+x = "smith"
+.P2
+it is clearly a string.
+Strings are converted to numbers and
+vice versa whenever context demands it.
+For instance,
+.P1
+x = "3" + "4"
+.P2
+assigns 7 to
+.UL x .
+Strings which cannot be interpreted
+as numbers in a numerical context
+will generally have numeric value zero,
+but it is unwise to count on this behavior.
+.PP
+By default, variables (other than built-ins) are initialized to the null string,
+which has numerical value zero;
+this eliminates the need for most
+.UL BEGIN
+sections.
+For example, the sums of the first two fields can be computed by
+.P1
+       { s1 += $1; s2 += $2 }
+END    { print s1, s2 }
+.P2
+.PP
+Arithmetic is done internally in floating point.
+The arithmetic operators are
+.UL + ,
+.UL \- ,
+.UL \(** ,
+.UL / ,
+and
+.UL %
+(mod).
+The C increment
+.UL ++
+and
+decrement
+.UL \-\-
+operators are also available,
+and so are the assignment operators
+.UL += ,
+.UL \-= ,
+.UL *= ,
+.UL /= ,
+and
+.UL %= .
+These operators may all be used in expressions.
+.NH 2
+Field Variables
+.PP
+Fields in
+.IT awk
+share essentially all of the properties of variables _
+they may be used in arithmetic or string operations,
+and may be assigned to.
+Thus one can
+replace the first field with a sequence number like this:
+.P1
+{ $1 = NR; print }
+.P2
+or
+accumulate two fields into a third, like this:
+.P1
+{ $1 = $2 + $3; print $0 }
+.P2
+or assign a string to a field:
+.P1
+{ if ($3 > 1000)
+       $3 = "too big"
+  print
+}
+.P2
+which replaces the third field by ``too big'' when it is,
+and in any case prints the record.
+.PP
+Field references may be numerical expressions,
+as in
+.P1
+{ print $i, $(i+1), $(i+n) }
+.P2
+Whether a field is deemed numeric or string depends on context;
+in ambiguous cases like
+.P1
+if ($1 == $2) ...
+.P2
+fields are treated as strings.
+.PP
+Each input line is split into fields automatically as necessary.
+It is also possible to split any variable or string
+into fields:
+.P1
+n = split(s, array, sep)
+.P2
+splits the
+the string
+.UL s
+into
+.UL array[1] ,
+\&...,
+.UL array[n] .
+The number of elements found is returned.
+If the
+.UL sep
+argument is provided, it is used as the field separator;
+otherwise
+.UL FS
+is used as the separator.
+.NH 2
+String Concatenation
+.PP
+Strings may be concatenated.
+For example
+.P1
+length($1 $2 $3)
+.P2
+returns the length of the first three fields.
+Or in a
+.UL print
+statement,
+.P1
+print $1 " is " $2
+.P2
+prints
+the two fields separated by `` is ''.
+Variables and numeric expressions may also appear in concatenations.
+.NH 2
+Arrays
+.PP
+Array elements are not declared;
+they spring into existence by being mentioned.
+Subscripts may have
+.ul
+any
+non-null
+value, including non-numeric strings.
+As an example of a conventional numeric subscript,
+the statement
+.P1
+x[NR] = $0
+.P2
+assigns the current input record to
+the
+.UL NR -th
+element of the array
+.UL x .
+In fact, it is possible in principle (though perhaps slow)
+to process the entire input in a random order with the
+.IT awk
+program
+.P1
+       { x[NR] = $0 }
+END    { \fI... program ...\fP }
+.P2
+The first action merely records each input line in
+the array
+.UL x .
+.PP
+Array elements may be named by non-numeric values,
+which gives
+.IT awk
+a capability rather like the associative memory of
+Snobol tables.
+Suppose the input contains fields with values like
+.UL apple ,
+.UL orange ,
+etc.
+Then the program
+.P1
+/apple/        { x["apple"]++ }
+/orange/       { x["orange"]++ }
+END            { print x["apple"], x["orange"] }
+.P2
+increments counts for the named array elements,
+and prints them at the end of the input.
+.NH 2
+Flow-of-Control Statements
+.PP
+.IT Awk
+provides the basic flow-of-control statements
+.UL if-else ,
+.UL while ,
+.UL for ,
+and statement grouping with braces, as in C.
+We showed the
+.UL if
+statement in section 3.3 without describing it.
+The condition in parentheses is evaluated;
+if it is true, the statement following the
+.UL if
+is done.
+The
+.UL else
+part is optional.
+.PP
+The
+.UL while
+statement is exactly like that of C.
+For example, to print all input fields one per line,
+.P1
+i = 1
+while (i <= NF) {
+       print $i
+       ++i
+}
+.P2
+.PP
+The
+.UL for
+statement is also exactly that of C:
+.P1
+for (i = 1; i <= NF; i++)
+       print $i
+.P2
+does the same job as the
+.UL while
+statement above.
+.PP
+There is an alternate form of the
+.UL for
+statement which is suited for accessing the
+elements of an associative array:
+.P1
+for (i in array)
+       \fIstatement\f3
+.P2
+does
+.ul
+statement
+with 
+.UL i
+set in turn to each element of
+.UL array .
+The elements are accessed in an apparently random order.
+Chaos will ensue if 
+.UL i
+is altered, or if any new elements are
+accessed during the loop.
+.PP
+The expression in the condition part of an
+.UL if ,
+.UL while
+or
+.UL for
+can include relational operators like
+.UL < ,
+.UL <= ,
+.UL > ,
+.UL >= ,
+.UL ==
+(``is equal to''),
+and
+.UL !=
+(``not equal to'');
+regular expression matches with the match operators
+.UL ~
+and
+.UL !~ ;
+the logical operators
+.UL \||\|| ,
+.UL && ,
+and
+.UL ! ;
+and of course parentheses for grouping.
+.PP
+The
+.UL break
+statement causes an immediate exit
+from an enclosing
+.UL while
+or
+.UL for ;
+the
+.UL continue
+statement
+causes the next iteration to begin.
+.PP
+The statement
+.UL next
+causes
+.IT awk
+to skip immediately to
+the next record and begin scanning the patterns from the top.
+The statement
+.UL exit
+causes the program to behave as if the end of the input
+had occurred.
+.PP
+Comments may be placed in
+.IT awk
+programs:
+they begin with the character
+.UL #
+and end with the end of the line,
+as in
+.P1
+print x, y     # this is a comment
+.P2
+.NH
+Design
+.PP
+The
+.UX
+system
+already provides several programs that
+operate by passing input through a
+selection mechanism.
+.IT Grep ,
+the first and simplest, merely prints all lines which
+match a single specified pattern.
+.IT Egrep
+provides more general patterns, i.e., regular expressions
+in full generality;
+.IT fgrep
+searches for a set of keywords with a particularly fast algorithm.
+.IT Sed\|
+.[
+unix programm manual
+.]
+provides most of the editing facilities of
+the editor
+.IT ed  ,
+applied to a stream of input.
+None of these programs provides
+numeric capabilities,
+logical relations,
+or variables.
+.PP
+.IT Lex\|
+.[
+lesk lexical analyzer cstr
+.]
+provides general regular expression recognition capabilities,
+and, by serving as a C program generator,
+is essentially open-ended in its capabilities.
+The use of
+.IT lex ,
+however, requires a knowledge of C programming,
+and a
+.IT lex
+program must be compiled and loaded before use,
+which discourages its use for one-shot applications.
+.PP
+.IT Awk
+is an attempt
+to fill in another part of the matrix of possibilities.
+It
+provides general regular expression capabilities
+and an implicit input/output loop.
+But it also provides convenient numeric processing,
+variables,
+more general selection,
+and control flow in the actions.
+It
+does not require compilation or a knowledge of C.
+Finally,
+.IT awk
+provides
+a convenient way to access fields within lines;
+it is unique in this respect.
+.PP
+.IT Awk
+also tries to integrate strings and numbers
+completely,
+by treating all quantities as both string and numeric,
+deciding which representation is appropriate
+as late as possible.
+In most cases the user can simply ignore the differences.
+.PP
+Most of the effort in developing
+.I awk
+went into deciding what
+.I awk
+should or should not do
+(for instance, it doesn't do string substitution)
+and what the syntax should be
+(no explicit operator for concatenation)
+rather
+than on writing or debugging the code.
+We have tried
+to make the syntax powerful
+but easy to use and well adapted
+to scanning files.
+For example,
+the absence of declarations and implicit initializations,
+while probably a bad idea for a general-purpose programming language,
+is desirable in a language
+that is meant to be used for tiny programs
+that may even be composed on the command line.
+.PP
+In practice,
+.IT awk
+usage seems to fall into two broad categories.
+One is what might be called ``report generation'' \(em
+processing an input to extract counts,
+sums, sub-totals, etc.
+This also includes the writing of trivial
+data validation programs,
+such as verifying that a field contains only numeric information
+or that certain delimiters are properly balanced.
+The combination of textual and numeric processing is invaluable here.
+.PP
+A second area of use is as a data transformer,
+converting data from the form produced by one program
+into that expected by another.
+The simplest examples merely select fields, perhaps with rearrangements.
+.NH
+Implementation
+.PP
+The actual implementation of
+.IT awk
+uses the language development tools available
+on the
+.UC UNIX
+operating system.
+The grammar is specified with
+.IT yacc ;
+.[
+yacc johnson cstr
+.]
+the lexical analysis is done by
+.IT lex ;
+the regular expression recognizers are
+deterministic finite automata
+constructed directly from the expressions.
+An
+.IT awk
+program is translated into a 
+parse tree which is then directly executed
+by a simple interpreter.
+.PP
+.IT Awk
+was designed for ease of use rather than processing speed;
+the delayed evaluation of variable types
+and the necessity to break input
+into fields makes high speed difficult to achieve in any case.
+Nonetheless,
+the program has not proven to be unworkably slow.
+.PP
+Table I below shows the execution (user + system) time
+on a PDP-11/70 of
+the
+.UC UNIX
+programs
+.IT wc ,
+.IT grep ,
+.IT egrep ,
+.IT fgrep ,
+.IT sed ,
+.IT lex ,
+and
+.IT awk
+on the following simple tasks:
+.IP "\ \ 1."
+count the number of lines.
+.IP "\ \ 2."
+print all lines containing ``doug''.
+.IP "\ \ 3."
+print all lines containing ``doug'', ``ken'' or ``dmr''.
+.IP "\ \ 4."
+print the third field of each line.
+.IP "\ \ 5."
+print the third and second fields of each line, in that order.
+.IP "\ \ 6."
+append all lines containing ``doug'', ``ken'', and ``dmr''
+to files ``jdoug'', ``jken'', and ``jdmr'', respectively.
+.IP "\ \ 7."
+print each line prefixed by ``line-number\ :\ ''.
+.IP "\ \ 8."
+sum the fourth column of a table.
+.LP
+The program
+.IT wc
+merely counts words, lines and characters in its input;
+we have already mentioned the others.
+In all cases the input was a file containing
+10,000 lines
+as created by the
+command
+.IT "ls \-l" ;
+each line has the form
+.P1
+-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx
+.P2
+The total length of this input is
+452,960 characters.
+Times for
+.IT lex
+do not include compile or load.
+.PP
+As might be expected,
+.IT awk
+is not as fast as the specialized tools
+.IT wc ,
+.IT sed ,
+or the programs in the
+.IT grep
+family,
+but
+is faster than the more general tool
+.IT lex .
+In all cases, the tasks were
+about as easy to express as
+.IT awk
+programs
+as programs in these other languages;
+tasks involving fields were
+considerably easier to express as
+.IT awk
+programs.
+Some of the test programs are shown in
+.IT awk ,
+.IT sed
+and
+.IT lex .
+.[
+$LIST$
+.]
+.1C
+.TS
+center;
+c c c c c c c c c
+c c c c c c c c c
+c|n|n|n|n|n|n|n|n|.
+                               Task
+Program        1       2       3       4       5       6       7       8
+_
+\fIwc\fR       8.6
+\fIgrep\fR     11.7    13.1
+\fIegrep\fR    6.2     11.5    11.6
+\fIfgrep\fR    7.7     13.8    16.1
+\fIsed\fR      10.2    11.6    15.8    29.0    30.5    16.1
+\fIlex\fR      65.1    150.1   144.2   67.7    70.3    104.0   81.7    92.8
+\fIawk\fR      15.0    25.6    29.9    33.3    38.9    46.4    71.4    31.1
+_
+.TE
+.sp
+.ce
+\fBTable I.\fR  Execution Times of Programs. (Times are in sec.)
+.sp 2
+.2C
+.PP
+The programs for some of these jobs are shown below.
+The
+.IT lex
+programs are generally too long to show.
+.LP
+AWK:
+.LP
+.P1
+1.     END     {print NR}
+.P2
+.P1
+2.     /doug/
+.P2
+.P1
+3.     /ken|doug|dmr/
+.P2
+.P1
+4.     {print $3}
+.P2
+.P1
+5.     {print $3, $2}
+.P2
+.P1
+6.     /ken/   {print >"jken"}
+       /doug/  {print >"jdoug"}
+       /dmr/   {print >"jdmr"}
+.P2
+.P1
+7.     {print NR ": " $0}
+.P2
+.P1
+8.             {sum = sum + $4}
+       END     {print sum}
+.P2
+.LP
+SED:
+.LP
+.P1
+1.     $=
+.P2
+.P1
+2.     /doug/p
+.P2
+.P1
+3.     /doug/p
+       /doug/d
+       /ken/p
+       /ken/d
+       /dmr/p
+       /dmr/d
+.P2
+.P1
+4.     /[^ ]* [ ]*[^ ]* [ ]*\e([^ ]*\e) .*/s//\e1/p
+.P2
+.P1
+5.     /[^ ]* [ ]*\e([^ ]*\e) [ ]*\e([^ ]*\e) .*/s//\e2 \e1/p
+.P2
+.P1
+6.     /ken/w jken
+       /doug/w jdoug
+       /dmr/w jdmr
+.P2
+.LP
+LEX:
+.LP
+.P1
+1.     %{
+       int i;
+       %}
+       %%
+       \en     i++;
+       .       ;
+       %%
+       yywrap() {
+               printf("%d\en", i);
+       }
+.P2
+.P1
+2.     %%
+       ^.*doug.*$      printf("%s\en", yytext);
+       .       ;
+       \en     ;
+.P2
diff --git a/usr/doc/bc b/usr/doc/bc

new file mode 100644 (file)

index 0000000..58d839c
--- /dev/null
+++ b/usr/doc/bc
@@ -0,0 +1,1059 @@
+.RP
+.TL
+BC \- An Arbitrary Precision Desk-Calculator Language
+.AU
+Lorinda Cherry
+.AU
+Robert Morris
+.AI
+.MH
+.AB
+BC is a language and a compiler for doing arbitrary precision arithmetic
+on the PDP-11 under the
+.UX
+time-sharing
+system.  The output of the compiler is interpreted and executed by
+a collection of routines which can input, output, and do
+arithmetic on indefinitely large integers and on scaled fixed-point
+numbers.
+.PP
+These routines are themselves based on a dynamic storage allocator.
+Overflow does not occur until all available core storage
+is exhausted.
+.PP
+The language has a complete control structure as well as immediate-mode
+operation.  Functions can be defined and saved for later execution.
+.PP
+Two five hundred-digit numbers can be multiplied to give a
+thousand digit result in about ten seconds.
+.PP
+A small collection of library functions is also available,
+including sin, cos, arctan, log, exponential, and Bessel functions of
+integer order.
+.PP
+Some of the uses of this compiler are
+.IP \-
+to do computation with large integers,
+.IP \-
+to do computation accurate to many decimal places,
+.IP \-
+conversion of numbers from one base to another base.
+.AE
+.PP
+.SH
+Introduction
+.PP
+BC is a language and a compiler for doing arbitrary precision
+arithmetic on the
+.UX
+time-sharing system [1].
+The compiler was written to make conveniently available a
+collection of routines (called DC [5]) which are capable of doing
+arithmetic on integers of arbitrary size.  The compiler
+is by no means intended to provide a complete programming
+language.
+It is a minimal language facility.
+.PP
+There is a scaling provision that permits the
+use of decimal point notation.
+Provision is made for input and output in bases other than
+decimal.  Numbers can be converted from decimal to octal by
+simply setting the output base to equal 8.
+.PP
+The actual limit on the number of digits that can
+be handled depends on the amount of storage available on the machine.
+Manipulation of numbers with many hundreds of digits
+is possible even on the smallest versions of
+.UX .
+.PP
+The syntax of BC has been deliberately selected to agree
+substantially with the C language [2].  Those who
+are familiar with C will find few surprises in this language.
+.SH
+Simple Computations with Integers
+.PP
+The simplest kind of statement is an arithmetic expression
+on a line by itself.
+For instance, if you type in the line:
+.DS
+142857 + 285714
+.DE
+the program responds immediately with the line
+.DS
+428571
+.DE
+The operators \-, *, /, %, and ^ can also be used; they
+indicate subtraction, multiplication, division, remaindering, and
+exponentiation, respectively.  Division of integers produces an
+integer result truncated toward zero.
+Division by zero produces an error
+comment.
+.PP
+Any term in an expression may be prefixed by a minus sign to
+indicate that it is to be negated (the `unary' minus sign).
+The expression
+.DS
+7+\-3
+.DE
+is interpreted to mean that \-3 is to be added to 7.
+.PP
+More complex expressions with several operators and with
+parentheses are interpreted just as in
+Fortran, with ^ having the greatest binding
+power, then * and % and /, and finally + and \-.
+Contents of parentheses are evaluated before material
+outside the parentheses.
+Exponentiations are
+performed from right to left and the other operators
+from left to right.
+The two expressions
+.DS
+a^b^c  and  a^(b^c)
+.DE
+are equivalent, as are the two expressions
+.DS
+a*b*c  and  (a*b)*c
+.DE
+BC shares with Fortran and C the undesirable convention that
+.DS
+a/b*c  is equivalent to  (a/b)*c
+.DE
+.PP
+Internal storage registers to hold numbers have single lower-case
+letter names.  The value of an expression can be assigned to
+a register in the usual way.  The statement
+.DS
+x = x + 3
+.DE
+has the effect of increasing by three the value of the contents of the
+register named x.
+When, as in this case, the outermost operator is an =, the
+assignment is performed but the result is not printed.
+Only 26 of these named storage registers are available.
+.PP
+There is a built-in square root function whose
+result is truncated to an integer (but see scaling below).
+The lines
+.DS
+x = sqrt(191)
+x
+.DE
+produce the printed result
+.DS
+13
+.DE
+.SH
+Bases
+.PP
+There are special internal quantities, called `ibase' and `obase'.
+The contents of `ibase', initially set to 10,
+determines the base used for interpreting numbers read in.
+For example, the lines
+.DS
+ibase = 8
+11
+.DE
+will produce the output line
+.DS
+9
+.DE
+and you are all set up to do octal to decimal conversions.
+Beware, however of trying to change the input base back
+to decimal by typing
+.DS
+ibase = 10
+.DE
+Because the number 10 is interpreted as octal, this statement will
+have no effect.
+For those who deal in hexadecimal notation,
+the characters A\-F are permitted in numbers
+(no matter what base is in effect)
+and are
+interpreted as digits having values 10\-15 respectively.
+The statement
+.DS
+ibase = A
+.DE
+will change you back to decimal input base no matter what the
+current input base is.
+Negative and large positive input bases are
+permitted but useless.
+No mechanism has been provided for the input of arbitrary
+numbers in bases less than 1 and greater than 16.
+.PP
+The contents of `obase', initially set to 10, are used as the base for output
+numbers.  The lines
+.DS
+obase = 16
+1000
+.DE
+will produce the output line
+.DS
+3E8
+.DE
+which is to be interpreted as a 3-digit hexadecimal number.
+Very large output bases are permitted, and they are sometimes useful.
+For example, large numbers can be output in groups of five digits
+by setting `obase' to 100000.
+Strange (i.e. 1, 0, or negative) output bases are
+handled appropriately.
+.PP
+Very large numbers are split across lines with 70 characters per line.
+Lines which are continued end with \\.
+Decimal output conversion is practically instantaneous, but output
+of very large numbers (i.e., more than 100 digits) with other bases
+is rather slow.
+Non-decimal output conversion of
+a one hundred digit number takes about
+three seconds.
+.PP
+It is best to remember that `ibase' and `obase' have no effect
+whatever on the course of internal computation or
+on the evaluation of expressions, but only affect input and
+output conversion, respectively.
+.SH
+Scaling
+.PP
+A third special internal quantity called `scale' is
+used to determine the scale of calculated
+quantities.
+Numbers may have
+up to 99 decimal digits after the decimal point.
+This fractional part is retained in further computations.
+We refer to the number of digits after the decimal point of
+a number as its scale.
+.PP
+When two scaled numbers are combined by
+means of one of the arithmetic operations, the result
+has a scale determined by the following rules.  For
+addition and subtraction, the scale of the result is the larger
+of the scales of the two operands.  In this case,
+there is never any truncation of the result.
+For multiplications, the scale of the result is never
+less than the maximum of the two scales of the operands,
+never more than the sum of the scales of the operands
+and, subject to those two restrictions,
+the scale of the result is set equal to the contents of the internal
+quantity `scale'.
+The scale of a quotient is the contents of the internal
+quantity `scale'.  The scale of a remainder is
+the sum of the scales of the quotient and the divisor.
+The result of an exponentiation is scaled as if
+the implied multiplications were performed.
+An exponent must be an integer.
+The scale of a square root is set to the maximum of the scale
+of the argument and the contents of `scale'.
+.PP
+All of the internal operations are actually carried out in terms
+of integers, with digits being discarded when necessary.
+In every case where digits are discarded, truncation and
+not rounding is performed.
+.PP
+The contents of
+`scale' must be no greater than
+99 and no less than 0.  It is initially set to 0.
+In case you need more than 99 fraction digits, you may arrange
+your own scaling.
+.PP
+The internal quantities `scale', `ibase', and `obase' can be
+used in expressions just like other variables.
+The line
+.DS
+scale = scale + 1
+.DE
+increases the value of `scale' by one, and the line
+.DS
+scale
+.DE
+causes the current value of `scale' to be printed.
+.PP
+The value of `scale' retains its meaning as a
+number of decimal digits to be retained in internal
+computation even when `ibase' or `obase' are not equal to 10.
+The internal computations (which are still conducted in decimal,
+regardless of the bases) are performed to the specified number
+of decimal digits, never hexadecimal or octal or any
+other kind of digits.
+.SH
+Functions
+.PP
+The name of a function is a single lower-case letter.
+Function names are permitted to collide with simple
+variable names.
+Twenty-six different defined functions are permitted
+in addition to the twenty-six variable names.
+The line
+.DS
+       define a(x){
+.DE
+begins the definition of a function with one argument.
+This line must be followed by one or more statements,
+which make up the body of the function, ending
+with a right brace }.
+Return of control from a function occurs when a return
+statement is executed or when the end of the function is reached.
+The return statement can take either
+of the two forms
+.DS
+return
+return(x)
+.DE
+In the first case, the value of the function is 0, and in
+the second, the value of the expression in parentheses.
+.PP
+Variables used in the function can be declared as automatic
+by a statement of the form
+.DS
+auto x,y,z
+.DE
+There can be only one `auto' statement in a function and it must
+be the first statement in the definition.
+These automatic variables are allocated space and initialized
+to zero on entry to the function and thrown away on return.  The
+values of any variables with the same names outside the function
+are not disturbed.
+Functions may be called recursively and the automatic variables
+at each level of call are protected.
+The parameters named in a function definition are treated in
+the same way as the automatic variables of that function
+with the single exception that they are given a value
+on entry to the function.
+An example of a function definition is
+.DS
+       define a(x,y){
+               auto z
+               z = x*y
+               return(z)
+       }
+.DE
+The value of this function, when called, will be the
+product of its
+two arguments.
+.PP
+A function is called by the appearance of its name
+followed by a string of arguments enclosed in
+parentheses and separated by commas.
+The result
+is unpredictable if the wrong number of arguments is used.
+.PP
+Functions with no arguments are defined and called using
+parentheses with nothing between them: b().
+.PP
+If the function
+.ft I
+a
+.ft
+above has been defined, then the line
+.DS
+a(7,3.14)
+.DE
+would cause the result 21.98 to be printed and the line
+.DS
+x = a(a(3,4),5)
+.DE
+would cause the value of x to become 60.
+.SH
+Subscripted Variables
+.PP
+A single lower-case letter variable name
+followed by an expression in brackets is called a subscripted
+variable (an array element).
+The variable name is called the array name and the expression
+in brackets is called the subscript.
+Only one-dimensional arrays are
+permitted.  The names of arrays are permitted to
+collide with the names of simple variables and function names.
+Any fractional
+part of a subscript is discarded before use.
+Subscripts must be greater than or equal to zero and 
+less than or equal to 2047.
+.PP
+Subscripted variables may be freely used in expressions, in
+function calls, and in return statements.
+.PP
+An array name may be used as an argument to a function,
+or may be declared as automatic in
+a function definition by the use of empty brackets:
+.DS
+f(a[\|])
+define f(a[\|])
+auto a[\|]
+.DE
+When an array name is so used, the whole contents of the array
+are copied for the use of the function, and thrown away on exit
+from the function.
+Array names which refer to whole arrays cannot be used
+in any other contexts.
+.SH
+Control Statements
+.PP
+The `if', the `while', and the `for' statements
+may be used to alter the flow within programs or to cause iteration.
+The range of each of them is a statement or
+a compound statement consisting of a collection of
+statements enclosed in braces.
+They are written in the following way
+.DS
+if(relation) statement
+while(relation) statement
+for(expression1; relation; expression2) statement
+.DE
+or
+.DS
+if(relation) {statements}
+while(relation) {statements}
+for(expression1; relation; expression2) {statements}
+.DE
+.PP
+A relation in one of the control statements is an expression of the form
+.DS
+x>y
+.DE
+where  two expressions are related by one of the six relational
+operators <, >, <=, >=, ==, or !=.
+The relation ==
+stands for `equal to' and != stands for `not equal to'.
+The meaning of the remaining relational operators is
+clear.
+.PP
+BEWARE of using = instead of == in a relational.  Unfortunately,
+both of them are legal, so you will not get a diagnostic
+message, but = really will not do a comparison.
+.PP
+The `if' statement causes execution of its range
+if and only if the relation is true.
+Then control passes to the next statement in sequence.
+.PP
+The `while' statement causes execution of its range
+repeatedly as long as the relation
+is true.  The relation is tested before each execution
+of its range and if the relation
+is false, control passes to the next statement beyond the range
+of the while.
+.PP
+The `for' statement begins
+by executing `expression1'.  Then the relation is tested
+and, if true, the statements in the range of the `for' are executed.
+Then `expression2' is executed.  The relation is tested, and so on.
+The typical use of the `for' statement is for a controlled iteration,
+as in the statement
+.DS
+for(i=1; i<=10; i=i+1) i
+.DE
+which will print the integers from 1 to 10.
+Here are some examples of the use of the control statements.
+.DS
+define f(n){
+auto i, x
+x=1
+for(i=1; i<=n; i=i+1) x=x*i
+return(x)
+}
+.DE
+The line
+.DS
+       f(a)
+.DE
+will print
+.ft I
+a
+.ft
+factorial if
+.ft I
+a
+.ft
+is a positive integer.
+Here is the definition of a function which will
+compute values of the binomial coefficient
+(m and n are assumed to be positive integers).
+.DS
+define b(n,m){
+auto x, j
+x=1
+for(j=1; j<=m; j=j+1) x=x*(n\-j+1)/j
+return(x)
+}
+.DE
+The following function computes values of the exponential function
+by summing the appropriate series
+without regard for possible truncation errors:
+.DS
+scale = 20
+define e(x){
+       auto a, b, c, d, n
+       a = 1
+       b = 1
+       c = 1
+       d = 0
+       n = 1
+       while(1==1){
+               a = a*x
+               b = b*n
+               c = c + a/b
+               n = n + 1
+               if(c==d) return(c)
+               d = c
+       }
+}
+.DE
+.SH
+Some Details
+.PP
+There are some language features that every user should know
+about even if he will not use them.
+.PP
+Normally statements are typed one to a line.  It is also permissible
+to type several statements on a line separated by semicolons.
+.PP
+If an assignment statement is parenthesized, it then has
+a value and it can be used anywhere that an expression can.
+For example, the line
+.DS
+(x=y+17)
+.DE
+not only makes the indicated assignment, but also prints the
+resulting value.
+.PP
+Here is an example of a use of the value of an
+assignment statement even when it is not parenthesized.
+.DS
+x = a[i=i+1]
+.DE
+causes a value to be assigned to x and also increments i
+before it is used as a subscript.
+.PP
+The following constructs work in BC in exactly the same manner
+as they do in the C language.  Consult the appendix or the
+C manuals [2] for their exact workings.
+.DS
+.ta 2i
+x=y=z  is the same as  x=(y=z)
+x =+ y x = x+y
+x =\- y        x = x\-y
+x =* y x = x*y
+x =/ y x = x/y
+x =% y x = x%y
+x =^ y x = x^y
+x++    (x=x+1)\-1
+x\-\-  (x=x\-1)+1
+++x    x = x+1
+\-\-x  x = x\-1
+.DE
+Even if you don't intend to use the constructs,
+if you type one inadvertently, something correct but unexpected
+may happen.
+.PP
+WARNING!  In some of these constructions, spaces are
+significant.
+There is a real difference between
+x=\-y and x= \-y.
+The first replaces x by x\-y and the second by \-y.
+.SH
+Three Important Things
+.PP
+1.  To exit a BC program, type `quit'.
+.PP
+2. There is a comment convention identical to that of C and
+of PL/I.  Comments begin with `/*' and end with `*/'.
+.PP
+3. There is a library of math functions which may be obtained by
+typing at command level
+.DS
+bc \-l
+.DE
+This command will load a set of library functions
+which, at the time of writing, consists of sine (named `s'),
+cosine (`c'), arctangent (`a'), natural logarithm (`l'),
+exponential (`e') and Bessel functions of integer order (`j(n,x)').  Doubtless more functions will be added
+in time.
+The library sets the scale to 20.  You can reset it to something
+else if you like.
+The design of these mathematical library routines
+is discussed elsewhere [3].
+.PP
+If you type
+.DS
+bc file ...
+.DE
+BC will read and execute the named file or files before accepting
+commands from the keyboard.  In this way, you may load your
+favorite programs and function definitions.
+.SH
+Acknowledgement
+.PP
+The compiler is written in YACC [4]; its original
+version  was written by S. C. Johnson.
+.SH
+References
+.IP [1]
+K. Thompson and D. M. Ritchie,
+.ft I
+UNIX Programmer's Manual,
+.ft
+Bell Laboratories,
+1978.
+.IP [2]
+B. W. Kernighan and
+D. M. Ritchie,
+.ft I
+The C Programming Language,
+.ft
+Prentice-Hall, 1978.
+.IP [3]
+R. Morris,
+.ft I
+A Library of Reference Standard Mathematical Subroutines,
+.ft
+Bell Laboratories internal memorandum, 1975.
+.IP [4]
+S. C. Johnson,
+.ft I
+YACC \(em Yet Another Compiler-Compiler.
+.ft
+Bell Laboratories Computing Science Technical Report #32, 1978.
+.IP [5]
+R. Morris and L. L. Cherry,
+.ft I
+DC \- An Interactive Desk Calculator.
+.ft
+.LP
+.bp
+.ft B
+.DS C
+Appendix
+.DE
+.ft
+.NH
+Notation
+.PP
+In the following pages syntactic categories are in \fIitalics\fP;
+literals are in \fBbold\fP; material in brackets [\|] is optional.
+.NH
+Tokens
+.PP
+Tokens consist of keywords, identifiers, constants, operators,
+and separators.
+Token separators may be blanks, tabs or comments.
+Newline characters or semicolons separate statements.
+.NH 2
+Comments
+.PP
+Comments are introduced by the characters /* and terminated by
+*/.
+.NH 2
+Identifiers
+.PP
+There are three kinds of identifiers \- ordinary identifiers, array identifiers
+and function identifiers.
+All three types consist of single lower-case letters.
+Array identifiers are followed by square brackets, possibly
+enclosing an expression describing a subscript.
+Arrays are singly dimensioned and may contain up to 2048
+elements.
+Indexing begins at zero so an array may be indexed from 0 to 2047.
+Subscripts are truncated to integers.
+Function identifiers are followed by parentheses, possibly enclosing arguments.
+The three types of identifiers do not conflict;
+a program can have a variable named \fBx\fP,
+an array named \fBx\fP and a function named \fBx\fP, all of which are separate and
+distinct.
+.NH 2
+Keywords
+.PP
+The following are reserved keywords:
+.ft B
+.ta .5i 1.0i
+.nf
+       ibase   if
+       obase   break
+       scale   define
+       sqrt    auto
+       length  return
+       while   quit
+       for
+.fi
+.ft
+.NH 2
+Constants
+.PP
+Constants consist of arbitrarily long numbers
+with an optional decimal point.
+The hexadecimal digits \fBA\fP\-\fBF\fP are also recognized as digits with
+values 10\-15, respectively.
+.NH 1
+Expressions
+.PP
+The value of an expression is printed unless the main
+operator is an assignment.
+Precedence is the same as the order
+of presentation here, with highest appearing first.
+Left or right associativity, where applicable, is
+discussed with each operator.
+.bp
+.NH 2
+Primitive expressions
+.NH 3
+Named expressions
+.PP
+Named expressions are
+places where values are stored.
+Simply stated,
+named expressions are legal on the left
+side of an assignment.
+The value of a named expression is the value stored in the place named.
+.NH 4
+\fIidentifiers\fR
+.PP
+Simple identifiers are named expressions.
+They have an initial value of zero.
+.NH 4
+\fIarray-name\fP\|[\|\fIexpression\fP\|]
+.PP
+Array elements are named expressions.
+They have an initial value of zero.
+.NH 4
+\fBscale\fR, \fBibase\fR and \fBobase\fR
+.PP
+The internal registers
+\fBscale\fP, \fBibase\fP and \fBobase\fP are all named expressions.
+\fBscale\fP is the number of digits after the decimal point to be
+retained in arithmetic operations.
+\fBscale\fR has an initial value of zero.
+\fBibase\fP and \fBobase\fP are the input and output number
+radix respectively.
+Both \fBibase\fR and \fBobase\fR have initial values of 10.
+.NH 3
+Function calls
+.NH 4
+\fIfunction-name\fB\|(\fR[\fIexpression\fR\|[\fB,\|\fIexpression\|\fR.\|.\|.\|]\|]\fB)
+.PP
+A function call consists of a function name followed by parentheses
+containing a comma-separated list of
+expressions, which are the function arguments.
+A whole array passed as an argument is specified by the
+array name followed by empty square brackets.
+All function arguments are passed by
+value.
+As a result, changes made to the formal parameters have
+no effect on the actual arguments.
+If the function terminates by executing a return
+statement, the value of the function is
+the value of the expression in the parentheses of the return
+statement or is zero if no expression is provided
+or if there is no return statement.
+.NH 4
+sqrt\|(\|\fIexpression\fP\|)
+.PP
+The result is the square root of the expression.
+The result is truncated in the least significant decimal place.
+The scale of the result is
+the scale of the expression or the
+value of
+.ft B
+scale,
+.ft
+whichever is larger.
+.NH 4
+length\|(\|\fIexpression\fP\|)
+.PP
+The result is the total number of significant decimal digits in the expression.
+The scale of the result is zero.
+.NH 4
+scale\|(\|\fIexpression\fP\|)
+.PP
+The result is the scale of the expression.
+The scale of the result is zero.
+.NH 3
+Constants
+.PP
+Constants are primitive expressions.
+.NH 3
+Parentheses
+.PP
+An expression surrounded by parentheses is
+a primitive expression.
+The parentheses are used to alter the
+normal precedence.
+.NH 2
+Unary operators
+.PP
+The unary operators
+bind right to left.
+.NH 3
+\-\|\fIexpression\fP
+.PP
+The result is the negative of the expression.
+.NH 3
+++\|\fInamed-expression\fP
+.PP
+The named expression is
+incremented by one.
+The result is the value of the named expression after
+incrementing.
+.NH 3
+\-\-\|\fInamed-expression\fP
+.PP
+The named expression is
+decremented by one.
+The result is the value of the named expression after
+decrementing.
+.NH 3
+\fInamed-expression\fP\|++
+.PP
+The named expression is
+incremented by one.
+The result is the value of the named expression before
+incrementing.
+.NH 3
+\fInamed-expression\fP\|\-\-
+.PP
+The named expression is
+decremented by one.
+The result is the value of the named expression before
+decrementing.
+.NH 2
+Exponentiation operator
+.PP
+The exponentiation operator binds right to left.
+.NH 3
+\fIexpression\fP ^ \fIexpression\fP
+.PP
+The result is the first
+expression raised to the power of the
+second expression.
+The second expression must be an integer.
+If \fIa\fP
+is the scale of the left expression
+and \fIb\fP is the absolute value
+of the right expression,
+then the scale of the result is:
+.PP
+min\|(\|\fIa\(mub\fP,\|max\|(\|\fBscale\fP,\|\fIa\fP\|)\|)
+.NH 2
+Multiplicative operators
+.PP
+The operators *, /, % bind left to right.
+.NH 3
+\fIexpression\fP * \fIexpression\fP
+.PP
+The result is the product
+of the two expressions.
+If \fIa\fP and \fIb\fP are the
+scales of the two expressions,
+then the scale of the result is:
+.PP
+min\|(\|\fIa+b\fP,\|max\|(\|\fBscale\fP,\|\fIa\fP,\|\fIb\fP\|)\|)
+.NH 3
+\fIexpression\fP / \fIexpression\fP
+.PP
+The result is the quotient of the two expressions.
+The scale of the result is the value of \fBscale\fR.
+.NH 3
+\fIexpression\fP % \fIexpression\fP
+.PP
+The % operator produces the remainder of the division
+of the two expressions.
+More precisely,
+\fIa\fP%\fIb\fP is \fIa\fP\-\fIa\fP/\fIb\fP*\fIb\fP.
+.PP
+The scale of the result is the sum of the scale of
+the divisor and the value of
+.ft B
+scale
+.ft
+.NH 2
+Additive operators
+.PP
+The additive operators bind left to right.
+.NH 3
+\fIexpression\fP + \fIexpression\fP
+.PP
+The result is the sum of the two expressions.
+The scale of the result is
+the maximun of the scales of the expressions.
+.NH 3
+\fIexpression\fP \- \fIexpression\fP
+.PP
+The result is the difference of the two expressions.
+The scale of the result is the
+maximum of the scales of the expressions.
+.NH 2
+assignment operators
+.PP
+The assignment operators bind right to left.
+.NH 3
+\fInamed-expression\fP = \fIexpression\fP
+.PP
+This expression results in assigning the value of the expression
+on the right
+to the named expression on the left.
+.NH 3
+\fInamed-expression\fP =+ \fIexpression\fP
+.NH 3
+\fInamed-expression\fP =\- \fIexpression\fP
+.NH 3
+\fInamed-expression\fP =* \fIexpression\fP
+.NH 3
+\fInamed-expression\fP =/ \fIexpression\fP
+.NH 3
+\fInamed-expression\fP =% \fIexpression\fP
+.NH 3
+\fInamed-expression\fP =^ \fIexpression\fP
+.PP
+The result of the above expressions is equivalent
+to ``named expression = named expression OP expression'',
+where OP is the operator after the = sign.
+.NH 1
+Relations
+.PP
+Unlike all other operators, the relational operators
+are only valid as the object of an \fBif\fP, \fBwhile\fP,
+or inside a \fBfor\fP statement.
+.NH 2
+\fIexpression\fP < \fIexpression\fP
+.NH 2
+\fIexpression\fP > \fIexpression\fP
+.NH 2
+\fIexpression\fP <= \fIexpression\fP
+.NH 2
+\fIexpression\fP >= \fIexpression\fP
+.NH 2
+\fIexpression\fP == \fIexpression\fP
+.NH 2
+\fIexpression\fP != \fIexpression\fP
+.NH 1
+Storage classes
+.PP
+There are only two storage classes in BC, global and automatic
+(local).
+Only identifiers that are to be local to a function need be 
+declared with the \fBauto\fP command.
+The arguments to a function
+are local to the function.
+All other identifiers are assumed to be global
+and available to all functions.
+All identifiers, global and local, have initial values
+of zero.
+Identifiers declared as \fBauto\fP are allocated on entry to the function 
+and released on returning from the function.
+They therefore do not retain values between function calls.
+\fBauto\fP arrays are specified by the array name followed by empty square brackets.
+.PP
+Automatic variables in BC do not work in exactly the same way
+as in either C or PL/I.  On entry to a function, the old values of
+the names that appear as parameters and as automatic
+variables are pushed onto a stack.  
+Until return is made from the function, reference to these
+names refers only to the new values.
+.NH 1
+Statements
+.PP
+Statements must be separated by semicolon or newline.
+Except where altered by control statements, execution
+is sequential.
+.NH 2
+Expression statements
+.PP
+When a statement is an expression, unless
+the main operator is an assignment, the value
+of the expression is printed, followed by a newline character.
+.NH 2
+Compound statements
+.PP
+Statements may be grouped together and used when one statement is expected
+by surrounding them with { }.
+.NH 2
+Quoted string statements
+.PP
+"any string"
+.sp .5
+This statement prints the string inside the quotes.
+.NH 2
+If statements
+.sp .5
+\fBif\|(\|\fIrelation\fB\|)\|\fIstatement\fR
+.PP
+The substatement is executed if the relation is true.
+.NH 2
+While statements
+.sp .5
+\fBwhile\|(\|\fIrelation\fB\|)\|\fIstatement\fR
+.PP
+The statement is executed while the relation
+is true.
+The test occurs before each execution of the statement.
+.NH 2
+For statements
+.sp .5
+\fBfor\|(\|\fIexpression\fB; \fIrelation\fB; \fIexpression\fB\|)\|\fIstatement\fR
+.PP
+The for statement is the same as
+.nf
+.ft I
+       first-expression
+       \fBwhile\|(\fPrelation\|\fB) {\fP
+               statement
+               last-expression
+       }
+.ft R
+.fi
+.PP
+All three expressions must be present.
+.NH 2
+Break statements
+.sp .5
+\fBbreak\fP
+.PP
+\fBbreak\fP causes termination of a \fBfor\fP or \fBwhile\fP statement.
+.NH 2
+Auto statements
+.sp .5
+\fBauto \fIidentifier\fR\|[\|\fB,\fIidentifier\fR\|]
+.PP
+The auto statement causes the values of the identifiers to be pushed down.
+The identifiers can be ordinary identifiers or array identifiers.
+Array identifiers are specified by following the array name by empty square
+brackets.
+The auto statement must be the first statement
+in a function definition.
+.NH 2
+Define statements
+.sp .5
+.nf
+\fBdefine(\|\fR[\fIparameter\|\fR[\fB\|,\|\fIparameter\|.\|.\|.\|\fR]\|]\|\fB)\|{\fI
+       statements\|\fB}\fR
+.fi
+.PP
+The define statement defines a function.
+The parameters may
+be ordinary identifiers or array names.
+Array names must be followed by empty square brackets.
+.NH 2
+Return statements
+.sp .5
+\fBreturn\fP
+.sp .5
+\fBreturn(\fI\|expression\|\fB)\fR
+.PP
+The return statement causes termination of a function,
+popping of its auto variables, and
+specifies the result of the function.
+The first form is equivalent to \fBreturn(0)\fR.
+The result of the function is the result of the expression
+in parentheses.
+.NH 2
+Quit
+.PP
+The quit statement stops execution of a BC program and returns
+control to UNIX when it is first encountered.
+Because it is not treated as an executable statement,
+it cannot be used
+in a function definition or in an 
+.ft B
+if, for,
+.ft
+or
+.ft B
+while
+.ft
+statement.
diff --git a/usr/doc/cman b/usr/doc/cman

new file mode 100644 (file)

index 0000000..7c8fbf5
--- /dev/null
+++ b/usr/doc/cman
@@ -0,0 +1,2 @@
+Sorry, but for copyright reasons, the source
+for the C Reference Manual is not distributed.
diff --git a/usr/doc/dc b/usr/doc/dc

new file mode 100644 (file)

index 0000000..f77522e
--- /dev/null
+++ b/usr/doc/dc
@@ -0,0 +1,709 @@
+.RP
+....TM 75-1271-8 39199 39199-11
+.TL
+DC \- An Interactive Desk Calculator
+.AU "MH 2C-524" 3878
+Robert Morris
+.AU
+Lorinda Cherry
+.AI
+.MH
+.AB
+DC is an interactive desk calculator program implemented
+on the
+.UX
+time-sharing system to do arbitrary-precision
+integer arithmetic.
+It has provision for manipulating scaled fixed-point numbers and
+for input and output in bases other than decimal.
+.PP
+The size of numbers that can be manipulated is limited
+only by available core storage.
+On typical implementations of
+.UX ,
+the size of numbers that
+can be handled varies from several hundred digits on the smallest
+systems to several thousand on the largest.
+.AE
+.PP
+.SH
+.ND
+.PP
+DC is an arbitrary precision arithmetic package implemented
+on the
+.UX
+time-sharing system
+in the form of an interactive desk calculator.
+It works like a stacking calculator using reverse Polish notation.
+Ordinarily DC operates on decimal integers, but one may
+specify an input base, output base, and a number of fractional
+digits to be maintained.
+.PP
+A language called BC [1] has been developed which accepts
+programs written in the familiar style of higher-level
+programming languages and compiles output which is
+interpreted by DC.
+Some of the commands described below were designed
+for the compiler interface and are not easy for a human user
+to manipulate.
+.PP
+Numbers that are typed into DC are put on a push-down
+stack.
+DC commands work by taking the top number or two
+off the stack, performing the desired operation, and pushing the result
+on the stack.
+If an argument is given,
+input is taken from that file until its end,
+then from the standard input.
+.SH
+SYNOPTIC DESCRIPTION
+.PP
+Here we describe the DC commands that are intended
+for use by people.  The additional commands that are
+intended to be invoked by compiled output are
+described in the detailed description.
+.PP
+Any number of commands are permitted on a line.
+Blanks and new-line characters are ignored except within numbers
+and in places where a register name is expected.
+.PP
+The following constructions are recognized:
+.SH
+number
+.IP
+The value of the number is pushed onto the main stack.
+A number is an unbroken string of the digits 0-9
+and the capital letters A\-F which are treated as digits
+with values 10\-15 respectively.
+The number may be preceded by an underscore \*_ to input a
+negative number.
+Numbers may contain decimal points.
+.SH
++  \-  *  %  ^
+.IP
+The
+top two values on the stack are added
+(\fB+\fP),
+subtracted
+(\fB\-\fP),
+multiplied (\fB*\fP),
+divided (\fB/\fP),
+remaindered (\fB%\fP),
+or exponentiated (^).
+The two entries are popped off the stack;
+the result is pushed on the stack in their place.
+The result of a division is an integer truncated toward zero.
+See the detailed description below for the treatment of
+numbers with decimal points.
+An exponent must not have any digits after the decimal point.
+.SH
+s\fIx\fP
+.IP
+The
+top of the main stack is popped and stored into
+a register named \fIx\fP, where \fIx\fP may be any character.
+If
+the
+.ft B
+s
+.ft
+is capitalized,
+.ft I
+x
+.ft
+is treated as a stack and the value is pushed onto it.
+Any character, even blank or new-line, is a valid register name.
+.SH
+l\fIx\fP
+.IP
+The
+value in register
+.ft I
+x
+.ft
+is pushed onto the stack.
+The register
+.ft I
+x
+.ft
+is not altered.
+If the
+.ft B
+l
+.ft
+is capitalized,
+register
+.ft I
+x
+.ft
+is treated as a stack and its top value is popped onto the main stack.
+.LP
+All registers start with empty value which is treated as a zero
+by the command \fBl\fP and is treated as an error by the command \fBL\fP.
+.SH
+.SH
+d
+.IP
+The
+top value on the stack is duplicated.
+.SH
+p
+.IP
+The top value on the stack is printed.
+The top value remains unchanged.
+.SH
+f
+.IP
+All values on the stack and in registers are printed.
+.SH
+x
+.IP
+treats the top element of the stack as a character string,
+removes it from the stack, and
+executes it as a string of DC commands.
+.SH
+[ ... ]
+.IP
+puts the bracketed character string onto the top of the stack.
+.SH
+q
+.IP
+exits the program.
+If executing a string, the recursion level is
+popped by two.
+If
+.ft B
+q
+.ft
+is capitalized,
+the top value on the stack is popped and the string execution level is popped
+by that value.
+.SH
+<\fIx\fP  >\fIx\fP  =\fIx\fP  !<\fIx\fP  !>\fIx\fP  !=\fIx\fP
+.IP
+The
+top two elements of the stack are popped and compared.
+Register
+.ft I
+x
+.ft
+is executed if they obey the stated
+relation.
+Exclamation point is negation.
+.SH
+v
+.IP
+replaces the top element on the stack by its square root.
+The square root of an integer is truncated to an integer.
+For the treatment of numbers with decimal points, see
+the detailed description below.
+.SH
+!
+.IP
+interprets the rest of the line as a
+.UX
+command.
+Control returns to DC when the
+.UX
+command terminates.
+.SH
+c
+.IP
+All values on the stack are popped; the stack becomes empty.
+.SH
+i
+.IP
+The top value on the stack is popped and used as the
+number radix for further input.
+If \fBi\fP is capitalized, the value of
+the input base is pushed onto the stack.
+No mechanism has been provided for the input of arbitrary
+numbers in bases less than 1 or greater than 16.
+.SH
+o
+.IP
+The top value on the stack is popped and used as the
+number radix for further output.
+If \fBo\fP is capitalized, the value of the output
+base is pushed onto the stack.
+.SH
+k
+.IP
+The top of the stack is popped, and that value is used as
+a scale factor
+that influences the number of decimal places
+that are maintained during multiplication, division, and exponentiation.
+The scale factor must be greater than or equal to zero and
+less than 100.
+If \fBk\fP is capitalized, the value of the scale factor
+is pushed onto the stack.
+.SH
+z
+.IP
+The value of the stack level is pushed onto the stack.
+.SH
+?
+.IP
+A line of input is taken from the input source (usually the console)
+and executed.
+.SH
+DETAILED DESCRIPTION
+.SH
+Internal Representation of Numbers
+.PP
+Numbers are stored internally using a dynamic storage allocator.
+Numbers are kept in the form of a string
+of digits to the base 100 stored one digit per byte
+(centennial digits).
+The string is stored with the low-order digit at the
+beginning of the string.
+For example, the representation of 157
+is 57,1.
+After any arithmetic operation on a number, care is taken
+that all digits are in the range 0\-99 and that
+the number has no leading zeros.
+The number zero is represented by the empty string.
+.PP
+Negative numbers are represented in the 100's complement
+notation, which is analogous to two's complement notation for binary
+numbers.
+The high order digit of a negative number is always \-1
+and all other digits are in the range 0\-99.
+The digit preceding the high order \-1 digit is never a 99.
+The representation of \-157 is 43,98,\-1.
+We shall call this the canonical form of a number.
+The advantage of this kind of representation of negative
+numbers is ease of addition.  When addition is performed digit
+by digit, the result is formally correct.  The result need only
+be modified, if necessary, to put it into canonical form.
+.PP
+Because the largest valid digit is 99 and the byte can
+hold numbers twice that large, addition can be carried out
+and the handling of carries done later when
+that is convenient, as it sometimes is.
+.PP
+An additional byte is stored with each number beyond
+the high order digit to indicate the number of
+assumed decimal digits after the decimal point.  The representation
+of .001 is 1,\fI3\fP
+where the scale has been italicized to emphasize the fact that it
+is not the high order digit.
+The value of this extra byte is called the
+.ft B
+scale factor
+.ft
+of the number.
+.SH
+The Allocator
+.PP
+DC uses a dynamic string storage allocator
+for all of its internal storage.
+All reading and writing of numbers internally is done through
+the allocator.
+Associated with each string in the allocator is a four-word header containing pointers
+to the beginning of the string, the end of the string,
+the next place to write, and the next place to read.
+Communication between the allocator and DC
+is done via pointers to these headers.
+.PP
+The allocator initially has one large string on a list
+of free strings.  All headers except the one pointing
+to this string are on a list of free headers.
+Requests for strings are made by size.
+The size of the string actually supplied is the next higher
+power of 2.
+When a request for a string is made, the allocator
+first checks the free list to see if there is
+a string of the desired size.
+If none is found, the allocator finds the next larger free string and splits it repeatedly until
+it has a string of the right size.
+Left-over strings are put on the free list.
+If there are no larger strings,
+the allocator tries to coalesce smaller free strings into
+larger ones.
+Since all strings are the result
+of splitting large strings,
+each string has a neighbor that is next to it in core
+and, if free, can be combined with it to make a string twice as long.
+This is an implementation of the `buddy system' of allocation
+described in [2].
+.PP
+Failing to find a string of the proper length after coalescing,
+the allocator asks the system for more space.
+The amount of space on the system is the only limitation
+on the size and number of strings in DC.
+If at any time in the process of trying to allocate a string, the allocator runs out of
+headers, it also asks the system for more space.
+.PP
+There are routines in the allocator for reading, writing, copying, rewinding,
+forward-spacing, and backspacing strings.
+All string manipulation is done using these routines.
+.PP
+The reading and writing routines
+increment the read pointer or write pointer so that
+the characters of a string are read or written in
+succession by a series of read or write calls.
+The write pointer is interpreted as the end of the
+information-containing portion of a string and a call
+to read beyond that point returns an end-of-string indication.
+An attempt to write beyond the end of a string
+causes the allocator to
+allocate a larger space and then copy
+the old string into the larger block.
+.SH
+Internal Arithmetic
+.PP
+All arithmetic operations are done on integers.
+The operands (or operand) needed for the operation are popped
+from the main stack and their scale factors stripped off.
+Zeros are added or digits removed as necessary to get
+a properly scaled result from the internal arithmetic routine.
+For example, if the scale of the operands is different and decimal
+alignment is required, as it is for
+addition, zeros are appended to the operand with the smaller
+scale.
+After performing the required arithmetic operation,
+the proper scale factor is appended to the end of the number before
+it is pushed on the stack.
+.PP
+A register called \fBscale\fP plays a part
+in the results of most arithmetic operations.
+\fBscale\fP is the bound on the number of decimal places retained in
+arithmetic computations.
+\fBscale\fP may be set to the number on the top of the stack
+truncated to an integer with the \fBk\fP command.
+\fBK\fP may be used to push the value of \fBscale\fP on the stack.
+\fBscale\fP must be greater than or equal to 0 and less than 100.
+The descriptions of the individual arithmetic operations will
+include the exact effect of \fBscale\fP on the computations.
+.SH
+Addition and Subtraction
+.PP
+The scales of the two numbers are compared and trailing
+zeros are supplied to the number with the lower scale to give both
+numbers the same scale.  The number with the smaller scale is
+multiplied by 10 if the difference of the scales is odd.
+The scale of the result is then set to the larger of the scales
+of the two operands.
+.PP
+Subtraction is performed by negating the number
+to be subtracted and proceeding as in addition.
+.PP
+Finally, the addition is performed digit by digit from the
+low order end of the number.  The carries are propagated
+in the usual way.
+The resulting number is brought into canonical form, which may
+require stripping of leading zeros, or for negative numbers
+replacing the high-order configuration 99,\-1 by the digit \-1.
+In any case, digits which are not in the range 0\-99 must
+be brought into that range, propagating any carries or borrows
+that result.
+.SH
+Multiplication
+.PP
+The scales are removed from the two operands and saved.
+The operands are both made positive.
+Then multiplication is performed in
+a digit by digit manner that exactly mimics the hand method
+of multiplying.
+The first number is multiplied by each digit of the second
+number, beginning with its low order digit.  The intermediate
+products are accumulated into a partial sum which becomes the
+final product.
+The product is put into the canonical form and its sign is
+computed from the signs of the original operands.
+.PP
+The scale of the result is set equal to the sum
+of the scales of the two operands.
+If that scale is larger than the internal register
+.ft B
+scale
+.ft
+and also larger than both of the scales of the two operands,
+then the scale of the result is set equal to the largest
+of these three last quantities.
+.SH
+Division
+.PP
+The scales are removed from the two operands.
+Zeros are appended or digits removed from the dividend to make
+the scale of the result of the integer division equal to
+the internal quantity
+\fBscale\fP.
+The signs are removed and saved.
+.PP
+Division is performed much as it would be done by hand.
+The difference of the lengths of the two numbers
+is computed.
+If the divisor is longer than the dividend,
+zero is returned.
+Otherwise the top digit of the divisor is divided into the top
+two digits of the dividend.
+The result is used as the first (high-order) digit of the
+quotient.
+It may turn out be one unit too low, but if it is, the next
+trial quotient will be larger than 99 and this will be
+adjusted at the end of the process.
+The trial digit is multiplied by the divisor and the result subtracted
+from the dividend and the process is repeated to get
+additional quotient digits until the remaining
+dividend is smaller than the divisor.
+At the end, the digits of the quotient are put into
+the canonical form, with propagation of carry as needed.
+The sign is set from the sign of the operands.
+.SH
+Remainder
+.PP
+The division routine is called and division is performed
+exactly as described.  The quantity returned is the remains of the
+dividend at the end of the divide process.
+Since division truncates toward zero, remainders have the same
+sign as the dividend.
+The scale of the remainder is set to 
+the maximum of the scale of the dividend and
+the scale of the quotient plus the scale of the divisor.
+.SH
+Square Root
+.PP
+The scale is stripped from the operand.
+Zeros are added if necessary to make the
+integer result have a scale that is the larger of
+the internal quantity
+\fBscale\fP
+and the scale of the operand.
+.PP
+The method used to compute sqrt(y) is Newton's method
+with successive approximations by the rule
+.EQ
+x sub {n+1} ~=~ half ( x sub n + y over x sub n )
+.EN
+The initial guess is found by taking the integer square root
+of the top two digits.
+.SH
+Exponentiation
+.PP
+Only exponents with zero scale factor are handled.  If the exponent is
+zero, then the result is 1.  If the exponent is negative, then
+it is made positive and the base is divided into one.  The scale
+of the base is removed.
+.PP
+The integer exponent is viewed as a binary number.
+The base is repeatedly squared and the result is
+obtained as a product of those powers of the base that
+correspond to the positions of the one-bits in the binary
+representation of the exponent.
+Enough digits of the result
+are removed to make the scale of the result the same as if the
+indicated multiplication had been performed.
+.SH
+Input Conversion and Base
+.PP
+Numbers are converted to the internal representation as they are read
+in.
+The scale stored with a number is simply the number of fractional digits input.
+Negative numbers are indicated by preceding the number with a \fB\_\fP.
+The hexadecimal digits A\-F correspond to the numbers 10\-15 regardless of input base.
+The \fBi\fP command can be used to change the base of the input numbers.
+This command pops the stack, truncates the resulting number to an integer,
+and uses it as the input base for all further input.
+The input base is initialized to 10 but may, for example be changed to
+8 or 16 to do octal or hexadecimal to decimal conversions.
+The command \fBI\fP will push the value of the input base on the stack.
+.SH
+Output Commands
+.PP
+The command \fBp\fP causes the top of the stack to be printed.
+It does not remove the top of the stack.
+All of the stack and internal registers can be output
+by typing the command \fBf\fP.
+The \fBo\fP command can be used to change the output base.
+This command uses the top of the stack, truncated to an integer as
+the base for all further output.
+The output base in initialized to 10.
+It will work correctly for any base.
+The command \fBO\fP pushes the value of the output base on the stack.
+.SH
+Output Format and Base
+.PP
+The input and output bases only affect
+the interpretation of numbers on input and output; they have no
+effect on arithmetic computations.
+Large numbers are output with 70 characters per line;
+a \\ indicates a continued line.
+All choices of input and output bases work correctly, although not all are
+useful.
+A particularly useful output base is 100000, which has the effect of
+grouping digits in fives.
+Bases of 8 and 16 can be used for decimal-octal or decimal-hexadecimal
+conversions.
+.SH
+Internal Registers
+.PP
+Numbers or strings may be stored in internal registers or loaded on the stack
+from registers with the commands \fBs\fP and \fBl\fP.
+The command \fBs\fIx\fR pops the top of the stack and
+stores the result in register \fBx\fP.
+\fIx\fP can be any character.
+\fBl\fIx\fR puts the contents of register \fBx\fP on the top of the stack.
+The \fBl\fP command has no effect on the contents of register \fIx\fP.
+The \fBs\fP command, however, is destructive.
+.SH
+Stack Commands
+.PP
+The command \fBc\fP clears the stack.
+The command \fBd\fP pushes a duplicate of the number on the top of the stack
+on the stack.
+The command \fBz\fP pushes the stack size on the stack.
+The command \fBX\fP replaces the number on the top of the stack
+with its scale factor.
+The command \fBZ\fP replaces the top of the stack
+with its length.
+.SH
+Subroutine Definitions and Calls
+.PP
+Enclosing a string in \fB[]\fP pushes the ascii string on the stack.
+The \fBq\fP command quits or in executing a string, pops the recursion levels by two.
+.SH
+Internal Registers \- Programming DC
+.PP
+The load and store
+commands together with \fB[]\fP to store strings, \fBx\fP to execute
+and the testing commands `<', `>', `=', `!<', `!>', `!=' can be used to program
+DC.
+The \fBx\fP command assumes the top of the stack is an string of DC commands
+and executes it.
+The testing commands compare the top two elements on the stack and if the relation holds, execute the register
+that follows the relation.
+For example, to print the numbers 0-9,
+.DS
+[lip1+  si  li10>a]sa
+0si  lax
+.DE
+.SH
+Push-Down Registers and Arrays
+.PP
+These commands were designed for used by a compiler, not by
+people.
+They involve push-down registers and arrays.
+In addition to the stack that commands work on, DC can be thought
+of as having individual stacks for each register.
+These registers are operated on by the commands \fBS\fP and \fBL\fP.
+\fBS\fIx\fR pushes the top value of the main stack onto the stack for
+the register \fIx\fP.
+\fBL\fIx\fR pops the stack for register \fIx\fP and puts the result on the main
+stack.
+The commands \fBs\fP and \fBl\fP also work on registers but not as push-down
+stacks.
+\fBl\fP doesn't effect the top of the
+register stack, and \fBs\fP destroys what was there before.
+.PP
+The commands to work on arrays are \fB:\fP and \fB;\fP.
+\fB:\fIx\fR pops the stack and uses this value as an index into
+the array \fIx\fP.
+The next element on the stack is stored at this index in \fIx\fP.
+An index must be greater than or equal to 0 and
+less than 2048.
+\fB;\fIx\fR is the command to load the main stack from the array \fIx\fP.
+The value on the top of the stack is the index
+into the array \fIx\fP of the value to be loaded.
+.SH
+Miscellaneous Commands
+.PP
+The command \fB!\fP interprets the rest of the line as a 
+.UX
+ command and passes
+it to 
+.UX
+to execute.
+One other compiler command is \fBQ\fP.
+This command uses the top of the stack as the number of levels of recursion to skip.
+.SH
+DESIGN CHOICES
+.PP
+The real reason for the use of a dynamic storage allocator was
+that a general purpose program could be (and in fact has been)
+used for a variety of other tasks.
+The allocator has some value for input and for compiling (i.e.
+the bracket [...] commands) where it cannot be known in advance
+how long a string will be.
+The result was that at a modest
+cost in execution time, all considerations of string allocation
+and sizes of strings were removed from the remainder of the program
+and debugging was made easier.  The allocation method
+used wastes approximately 25% of available space.
+.PP
+The choice of 100 as a base for internal arithmetic
+seemingly has no compelling advantage.  Yet the base cannot
+exceed 127 because of hardware limitations and at the cost
+of 5% in space, debugging was made a great deal easier and
+decimal output was made much faster.
+.PP
+The reason for a stack-type arithmetic design was
+to permit all DC commands from addition to subroutine execution
+to be implemented in essentially the same way.  The result
+was a considerable degree of logical separation of the final
+program into modules with very little communication between
+modules.
+.PP
+The rationale for the lack of interaction between the scale and the bases
+was to provide an understandable means of proceeding after
+a change of base or scale when numbers had already been entered.
+An earlier implementation which had global notions of
+scale and base did not work out well.
+If the value of
+.ft B
+scale
+.ft
+were to be interpreted in the current
+input or output base,
+then a change of base or scale in the midst of a
+computation would cause great confusion in the interpretation
+of the results.
+The current scheme has the advantage that the value of
+the input and output bases
+are only used for input and output, respectively, and they
+are ignored in all other operations.
+The value of
+scale
+is not used for any essential purpose by any part of the program
+and it is used only to prevent the number of
+decimal places resulting from the arithmetic operations from
+growing beyond all bounds.
+.PP
+The design rationale for the choices for the scales of
+the results of arithmetic were that in no case should
+any significant digits be thrown away if, on appearances, the
+user actually wanted them.  Thus, if the user wants
+to add the numbers 1.5 and 3.517, it seemed reasonable to give
+him the result 5.017 without requiring him to unnecessarily
+specify his rather obvious requirements for precision.
+.PP
+On the other hand, multiplication and exponentiation produce
+results with many more digits than their operands and it
+seemed reasonable to give as a minimum the number of decimal
+places in the operands but not to give more than that
+number of digits
+unless the user asked for them by specifying a value for \fBscale\fP.
+Square root can be handled in just the same way as multiplication.
+The operation of division gives arbitrarily many decimal places
+and there is simply no way to guess how many places the user
+wants.
+In this case only, the user must
+specify a \fBscale\fP to get any decimal places at all.
+.PP
+The scale of remainder was chosen to make it possible
+to recreate the dividend from the quotient and remainder.
+This is easy to implement; no digits are thrown away.
+.SH
+References
+.IP [1]
+L. L. Cherry, R. Morris,
+.ft I
+BC \- An Arbitrary Precision Desk-Calculator Language.
+.ft
+.IP [2]
+K. C. Knowlton,
+.ft I
+A Fast Storage Allocator,
+.ft
+Comm. ACM \fB8\fP, pp. 623-625 (Oct. 1965).
author	Bill Joy <wnj@ucbvax.Berkeley.EDU>
	Wed, 15 Oct 1980 20:03:55 +0000 (12:03 -0800)
committer	Bill Joy <wnj@ucbvax.Berkeley.EDU>
	Wed, 15 Oct 1980 20:03:55 +0000 (12:03 -0800)
usr/doc/README	[new file with mode: 0644]	patch \| blob
usr/doc/REFS	[new file with mode: 0644]	patch \| blob
usr/doc/assembler	[new file with mode: 0644]	patch \| blob
usr/doc/awk	[new file with mode: 0644]	patch \| blob
usr/doc/bc	[new file with mode: 0644]	patch \| blob
usr/doc/cman	[new file with mode: 0644]	patch \| blob
usr/doc/dc	[new file with mode: 0644]	patch \| blob