.na
.ce
C Changes

1.  Long integers

The compiler implements 32-bit integers.
The associated type keyword is `long'.
The word can act rather like an adjective in that
`long int' means a 32-bit integer and `long float'
means the same as `double.'
But plain `long' is a long integer.
Essentially all operations on longs are implemented except that
assignment-type operators do not have values, so
l1+(l2=+l3) won't work.
Neither will l1 = l2 = 0.

Long constants are written with a terminating `l' or `L'.
E.g. "123L" or "0177777777L" or "0X56789abcdL".
The last is a hex constant, which could also have been short;
it is marked by starting with "0X".
Every fixed decimal constant larger than 32767 is taken to
be long, and so are octal or hex constants larger than
0177777 (0Xffff, or 0xFFFF if you like).
A warning is given in such a case since this is actually
an incompatibility with the older compiler.
Where the constant is just used as an initializer or
assigned to something it doesn't matter.
If it is passed to a subroutine
then the routine will not get what it expected.

When a short and a long integer are
operands of an arithmetic operator,
the short is converted to long (with sign extension).
This is true also when a short is assigned to a long.
When a long is assigned to a short integer it
is truncated at the high order end with no notice
of possible loss of significant digits.
This is true as well when a long is added to a pointer
(which includes its usage as a subscript).
The conversion rules for expressions involving
doubles and floats mixed with longs
are the same as those for short integers,
.ul
mutatis mutandis.
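
The widening and truncation rules above can be sketched in present-day C.
This is only an illustration: the fixed-width types int16_t and int32_t
are assumed stand-ins for the 16-bit int and 32-bit long described here,
and the function names are hypothetical.

```c
#include <stdint.h>

/* int16_t and int32_t stand in for the PDP-11 int and long (assumption). */

/* Short to long: the value is preserved, so the sign bit is extended. */
int32_t widen(int16_t s)
{
    return (int32_t)s;
}

/* Long to short: only the low-order 16 bits survive; no diagnostic. */
uint16_t low_half(int32_t l)
{
    return (uint16_t)((uint32_t)l & 0xFFFFu);
}
```

Here low_half returns the surviving bit pattern as an unsigned quantity,
so that the sketch does not depend on how a particular compiler
converts out-of-range values to a signed type.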

A point to note is that constant expressions involving
longs are not evaluated at compile time,
and may not be used where constants are expected.
Thus

	long x {5000L*5000L};

is illegal;

	long x {5000*5000};

is legal but wrong because the high-order part is lost;
but both

	long x 25000000L;

and

	long x 25.e6;

are correct
and have the same meaning
because the double constant is converted to long at compile time.
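
As a rough illustration of why 5000*5000 gives the wrong answer,
the following sketch in present-day C models the 16-bit product by
masking a 32-bit result.
The fixed-width types and the function names are assumptions made
for the example, not part of the compiler described here.

```c
#include <stdint.h>

/* The product as 16-bit arithmetic would keep it: low-order bits only. */
uint16_t product_16bit(void)
{
    uint32_t full = (uint32_t)5000 * 5000;   /* 25000000 */
    return (uint16_t)(full & 0xFFFFu);       /* high-order part lost */
}

/* The intended 32-bit value, written as a long constant. */
int32_t product_long(void)
{
    return 25000000L;
}
```

The low-order half of 25000000 is 30784, which is what the first
function returns.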

2.  Unsigned integers

A new fundamental data type with keyword `unsigned' is
available.  It may be used alone:

        unsigned u;

or as an adjective with `int'

        unsigned int u;

with the same meaning.  There are not yet (or possibly ever)
unsigned longs or chars.  The meaning of an unsigned variable is
that of an integer modulo 2^n, where n is 16 on the PDP-11.  All
operators whose operands are unsigned produce results consistent
with this interpretation except division and remainder where the
divisor is larger than 32767; then the result is incorrect.  The
dividend in an unsigned division may however have any value (i.e.
up to 65535) with correct results.  Right shifts of unsigned
quantities are guaranteed to be logical shifts.

When an ordinary integer and an unsigned integer are combined
then the ordinary integer is mapped into an integer mod 2^16 and
the result is unsigned.  Thus, for example `u = -1' results in
assigning 65535 to u.  This is mathematically reasonable, and
also happens to involve no run-time overhead.
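
The modulo 2^16 interpretation can be tried out with a short sketch in
present-day C.
It assumes that uint16_t is an acceptable stand-in for the 16-bit
unsigned described here; the function names are hypothetical.

```c
#include <stdint.h>

/* uint16_t stands in for the 16-bit PDP-11 unsigned (assumption). */

/* Assigning -1 maps it into the integers mod 2^16. */
uint16_t minus_one(void)
{
    uint16_t u = -1;
    return u;
}

/* Arithmetic wraps modulo 2^16. */
uint16_t add_wrap(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* Right shift of an unsigned quantity is logical: zeros enter at the top. */
uint16_t shift_right(uint16_t u, int n)
{
    return (uint16_t)(u >> n);
}
```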

When an unsigned integer is assigned to a plain integer, an
(undiagnosed) overflow occurs when the unsigned integer exceeds
2^15-1.

It is intended that unsigned integers be used in contexts where
previously character pointers were used (artificially and
nonportably) to represent unsigned integers.

3.  Block structure

A sequence of declarations may now appear at the beginning of any
compound statement in {}.  The variables declared thereby are
local to the compound statement.  Any declarations of the same
name existing before the block was entered are pushed down for
the duration of the block.  Just as in functions, as before, auto
variables disappear and lose their values when the block is left;
static variables retain their values.  Also according to the same
rules as for the declarations previously allowed at the start of
functions, if no storage class is mentioned in a declaration the
default is automatic.

Implementation of inner-block declarations is such that there is
no run-time cost associated with using them.
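
A small sketch of the scope rule, written in present-day C syntax
(initializers with `=', which is an assumption relative to the compiler
described here); the function and variable names are hypothetical.

```c
int shadow_demo(void)
{
    int x = 1;
    int seen_inside;

    {
        int x = 2;          /* pushes down the outer x for this block */
        seen_inside = x;    /* refers to the inner x */
    }
    /* the outer x is visible again, unchanged */
    return x * 10 + seen_inside;
}
```

The function returns 12: the outer x still holds 1 after the block,
and the inner x held 2 while the block was active.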

4.  Initialization (part 1)

This compiler properly handles initialization of structures
so the construction

 	struct { char name[8]; char type; float val; } x
 		{ "abc", 'a', 123.4 };

compiles correctly.
In particular it is recognized that the string is supposed
to fill an 8-character array, the `a' goes into a character,
and that the 123.4 must be rounded and placed in a single-precision
cell.
Structures of arrays, arrays of structures, and the like all work;
a more formal description of what is done follows.

<initializer> ::= <element>

<element> ::= <expression> | <element> , <element> |
 		{ <element> } | { <element> , }

An element is an expression or a comma-separated sequence of
elements possibly enclosed in braces.
In a brace-enclosed
sequence, a comma is optional after the last element.
This very
ambiguous definition is parsed as described below.
"Expression"
must of course be a constant expression within the previous
meaning of the Act.

An initializer for a non-structured scalar is an element with
exactly one expression in it.

An "aggregate" is a structure or an array.
If the initializer
for an aggregate begins with a left brace, then the succeeding
comma-separated sequence of elements initializes the members of
the aggregate.
It is erroneous for the number of members in the
sequence to exceed the number of elements in the aggregate.
If
the sequence has too few members the aggregate is padded.

If the initializer for an aggregate does not begin with a left
brace, then the members of the aggregate are initialized with
successive elements from the succeeding comma-separated sequence.
If the sequence terminates before the aggregate is filled the
aggregate is padded.

The "top level" initializer is the object which initializes an
external object itself, as opposed to one of its members.
The
top level initializer for an aggregate must begin with a left
brace.

If the top-level object being initialized is an array and if its
size is omitted in the declaration, e.g. "int a[]", then the size
is calculated from the number of elements which initialized it.

Short of complete assimilation of this description, there are two
simple approaches to the initialization of complicated objects.
First, observe that it is always legal to initialize any object
with a comma-separated sequence of expressions.
The members of
every structure and array are stored in a specified order, so the
expressions which initialize these members may if desired be laid
out in a row to successively, and recursively, initialize the
members.

Alternatively, the sequences of expressions which initialize
arrays or structures may uniformly be enclosed in braces.
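
The two approaches can be compared in a sketch using present-day C
syntax (with `=' before the initializer, an assumption relative to the
compiler described here).
The structure and variable names are hypothetical, and some current
compilers may warn about the unbraced form.

```c
struct point { int x; int y; };

/* Every aggregate enclosed in braces. */
struct point braced[2] = { { 1, 2 }, { 3, 4 } };

/* One row of expressions under the required top-level brace. */
struct point flat[2] = { 1, 2, 3, 4 };

/* Size omitted: taken from the number of initializers. */
int sized[] = { 10, 20, 30 };

/* Too few initializers: the rest is padded (with zeros in present-day C). */
int padded[4] = { 7 };

int sized_len(void)
{
    return (int)(sizeof sized / sizeof sized[0]);
}
```

Both braced and flat end up with the same contents, since the members
are filled in storage order either way.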

5.  Initialization (part 2)

Declarations, whether external, at the head of functions, or
in inner blocks may have initializations whose syntax is the same
as previous external declarations with initializations.  The only
restrictions are that automatic structures and arrays may not be
initialized (they can't be assigned either); nor, for the moment
at least, may external variables when declared inside a function.

The declarations and initializations should be thought of as
occurring in lexical order so that forward references in
initializations are unlikely to work.  E.g.,

        { int a a;
          int b c;
          int c 5;
          ...
        }

Here a is initialized by itself (and its value is thus
undefined); b is initialized with the old value of c (which is
either undefined or any c declared in an outer block).
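
The second case, where c is declared in an outer block, might look
like this in present-day C syntax.
The sketch is illustrative only; the function name and the values are
hypothetical.

```c
int lexical_order(void)
{
    int c = 7;              /* c of the outer block */
    int seen;

    {
        int b = c;          /* inner c not yet declared: this is the outer c */
        int c = 5;          /* from here on, c names the inner variable */
        seen = b * 10 + c;
    }
    return seen;
}
```

The function returns 75, showing that b received the outer value 7
rather than the 5 declared after it.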

6.  Bit fields

A declarator inside a structure may have the form

	<declarator> : <constant>

which specifies that the object declared is stored in a field
the number of bits in which is specified by the constant.
If several such things are stacked up next to each other
then the compiler allocates the fields from right to left,
going to the next word
when the new field will not fit.
The declarator may also have the form

	: <constant>

which allocates an unnamed field to simplify accurate
modelling of things like hardware formats where there are unused
fields.
Finally,

	: 0

means to force the next field to start on a word boundary.

The types of bit fields can be only "int" or "char".
The only difference between the two
is in the alignment and length restrictions:
no int field can be longer than 16 bits, nor any char longer
than 8 bits.
If a char field will not fit into the current character,
then it is moved up to the next character boundary.

Both int and char fields
are taken to be unsigned (non-negative)
integers.

Bit-field variables are not quite full-class citizens.
Although most operators can be applied to them,
including assignment operators,
they do not have addresses (i.e. there are no bit pointers)
so the unary & operator cannot be applied to them.
For essentially this reason there are no arrays of bit field
variables.
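
A sketch of the three declarator forms in present-day C follows.
It assumes the type `unsigned' for the fields, where the compiler
described here accepts only int and char, and it leaves the layout of
the fields to the compiler; the structure and function names are
hypothetical.
The fields are assigned rather than initialized.

```c
struct status {
    unsigned ready : 1;     /* one-bit flag */
    unsigned       : 3;     /* unnamed field: unused bits */
    unsigned mode  : 4;     /* four-bit field */
    unsigned       : 0;     /* next field starts on a fresh boundary */
    unsigned count : 8;
};

/* Store v in the 4-bit field and read it back; only the low 4 bits fit. */
unsigned store_mode(unsigned v)
{
    struct status s;

    s.ready = 0;
    s.count = 0;
    s.mode = v;
    return s.mode;
}
```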

There are two woes in the implementation:
addition (=+) applied to fields
can result in an overflow into the next field;
it is not possible to initialize bit fields.

7.  Macro preprocessor

The preprocessor handles `define' statements with formal arguments.
The line

	#define macro(a1,...,an) ...a1...an...

is recognized by the presence of a left parenthesis
following the defined name.
When the form

	macro(b1,...,bn)

is recognized in normal C program text,
it is replaced by the definition, with the corresponding
.ul
bi
actual argument string substituted for the corresponding
.ul
ai
formal arguments.
Both actual and formal arguments are separated by
commas not included in parentheses; the formal arguments
have the syntax of names.

Macro expansions are no longer surrounded by spaces.
Lines in which a replacement has taken place are rescanned until
no macros remain.
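
The following sketch shows macros with arguments, rescanning, and an
actual argument containing a parenthesized comma.
The macro and function names are hypothetical, and the results assume
a present-day preprocessor, whose details may differ from the one
described here.

```c
#define SQUARE(x) ((x) * (x))
#define QUAD(x)   SQUARE(SQUARE(x))   /* the expansion is rescanned */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int quad(int n)
{
    return QUAD(n);
}

int sum2(int a, int b)
{
    return a + b;
}

/* The comma inside sum2(...) is in parentheses, so MAX sees two arguments. */
int max_demo(void)
{
    return MAX(sum2(1, 2), 2);
}
```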

The preprocessor has a rudimentary conditional facility.
A line of the form

	#ifdef name

is ignored if
`name' is defined to the preprocessor
(i.e. was the subject of a `define' line).
If name is not defined then all lines through
a line of the form

	#endif

are ignored.
A corresponding
form is

	#ifndef name
 	...
 	#endif

which ignores the intervening lines if `name' is defined.
The name `unix' is predefined and replaced by itself
to aid writers of C programs which are expected to be transported
to other machines with C compilers.

In connection with this, there is a new option to the cc command:

	cc -Dname

which causes `name' to be defined to the preprocessor (and replaced by
itself).
This can be used together with conditional preprocessor
statements to select variant versions of a program at compile time.
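
As an illustration, the sketch below selects one of two variants
depending on whether a name is defined.
The name VERBOSE and the identifiers are hypothetical, chosen only for
the example.

```c
#ifdef VERBOSE
static const char build_kind[] = "verbose";
#endif

#ifndef VERBOSE
static const char build_kind[] = "quiet";
#endif

const char *get_build_kind(void)
{
    return build_kind;
}
```

Compiled without options, the second variant is kept; compiling with
`cc -DVERBOSE' would be expected to select the first.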

The previous two facilities (macros with arguments, conditional
compilation)
were actually available in the 6th Edition system, but
undocumented.
New in this release of the cc command is the ability to
nest `include' files.
Preprocessor include lines may have the new form

	#include <file>

where the angle brackets replace double quotes.
In this case, the file name is prepended with a standard prefix,
namely `/usr/include'.
It is intended that commonly-used include files be placed
in this directory;
the convention reduces the dependence on system-specific
naming conventions.
The standard prefix can be replaced by
the cc command option `-I':

	cc -Iotherdirectory

8.  Registers

A formal argument may be given the storage class `register.'
When this occurs the save sequence copies it
from the place
the caller left it into a fast register;
all usual restrictions on its use are the same
as for ordinary register variables.

Now any variable inside a function may be declared `register;'
if the type is unsuitable, or if
there are more than three register declarations,
then the compiler makes it `auto' instead.
The restriction that the & operator may not be applied
to a register remains.
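
A register formal argument alongside an ordinary register variable
might be written as follows.
The sketch uses present-day C syntax (a prototype-style argument list
and `+=' rather than `=+'), which is an assumption relative to the
compiler described here; the function name is hypothetical.

```c
int sum_to(register int n)
{
    register int total = 0;

    while (n > 0)
        total += n--;
    return total;
}
```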

9.  Mode declarations

A declaration of the form

	typedef type-specifier declarator ;

makes the name given in the declarator into the equivalent
of a keyword specifying the type which the name would have
in an ordinary declaration.
Thus

	typedef int *iptr;

makes `iptr' usable in declarations of pointers to integers;
subsequently the declarations

	iptr ip;
.br
	int *ip;

would mean the same thing.
Type names introduced in this way
obey the same scope rules as ordinary variables.
The facility is new, experimental, and probably buggy.
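
A short sketch of the equivalence, in present-day C syntax; the
function and variable names other than iptr are hypothetical.

```c
typedef int *iptr;

int through_typedef(void)
{
    int value = 41;
    iptr ip = &value;   /* same as: int *ip = &value; */
    int *plain = ip;    /* the two types are identical */

    *plain += 1;
    return *ip;
}
```

Both pointers refer to the same integer, so the function returns 42.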

10. Restrictions

The compiler is somewhat stickier about
some constructions that used to be accepted.

One difference is that external declarations made inside
functions are remembered to the end of the file,
that is even past the end of the function.
The most frequent problem that this causes is that
implicit declaration of a function as an integer in one
routine,
and subsequent explicit declaration
of it as another type,
is not allowed.
This turned out to affect
several source programs
distributed with the system.

It is now required that all forward references to labels
inside a function be the subject of a `goto.'
This has turned out to affect mainly people who
pass a label to the routine `setexit.'
In fact a routine is supposed to be passed here,
and why a label worked I do not know.

In general this compiler makes it more difficult
to use label variables.
Think of this as a contribution to structured programming.

The compiler now checks multiple declarations of the same name
more carefully for consistency.
It used to be possible to declare the same name to
be a pointer to different structures;
this is caught.
So too are declarations of the same array as having different
sizes.
The exception is that array declarations with empty brackets
may be used in conjunction with a declaration with a specified size.
Thus

	int a[];
	int a[50];

is acceptable (in either order).
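
The pairing can be checked with a sketch like the one below, which
assumes a present-day C compiler; the names table and table_len are
hypothetical.

```c
int table[];        /* size left open */
int table[50];      /* size supplied by a second declaration */

int table_len(void)
{
    return (int)(sizeof table / sizeof table[0]);
}
```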

An external array all of whose definitions
involve empty brackets is diagnosed as `undefined'
by the loader;
it used to be taken as having 1 element.