.\" Copyright (c) 1983, 1986 The Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)a.t	6.5 (Berkeley) 4/17/91
.\"
.nr H2 1
.\".ds RH "Gateways and routing
.br
.ne 2i
.NH
\s+2Gateways and routing issues\s0
.PP
The system has been designed with the expectation that it will
be used in an internetwork environment.  The ``canonical''
environment was envisioned to be a collection of local area
networks connected at one or more points through hosts with
multiple network interfaces (one on each local area network),
and possibly a connection to a long haul network (for example,
the ARPANET).  In such an environment, issues of
gatewaying and packet routing become very important.  Certain
of these issues, such as congestion
control, have been handled in a simplistic manner or specifically
not addressed.
Instead, where possible, the network system
attempts to provide simple mechanisms upon which more involved
policies may be implemented.  As some of these problems become
better understood, the solutions developed will be incorporated
into the system.
.PP
This section will describe the facilities provided for packet
routing.  The simplistic mechanisms provided for congestion
control are described in chapter 12.
.NH 2
Routing tables
.PP
The network system maintains a set of routing tables for
selecting a network interface to use in delivering a 
packet to its destination.  These tables are of the form:
.DS
.ta \w'struct   'u +\w'u_long   'u +\w'sockaddr rt_gateway;    'u
struct rtentry {
	u_long	rt_hash;		/* hash key for lookups */
	struct	sockaddr rt_dst;	/* destination net or host */
	struct	sockaddr rt_gateway;	/* forwarding agent */
	short	rt_flags;		/* see below */
	short	rt_refcnt;		/* no. of references to structure */
	u_long	rt_use;			/* packets sent using route */
	struct	ifnet *rt_ifp;		/* interface to give packet to */
};
.DE
.PP
The routing information is organized in two separate tables, one
for routes to a host and one for routes to a network.  The
distinction between hosts and networks is necessary so
that a single mechanism may be used
for both broadcast and multi-drop type networks, and
also for networks built from point-to-point links (e.g.,
DECnet [DEC80]).
.PP
Each table is organized as a hashed set of linked lists.
Two 32-bit hash values are calculated by routines defined for
each address family; one based on the destination being
a host, and one assuming the target is the network portion
of the address.  Each hash value is used to
locate a hash chain to search (by taking the value modulo the
hash table size) and the entire 32-bit value is then
used as a key in scanning the list of routes.  Lookups are
applied first to the routing
table for hosts, then to the routing table for networks.
If both lookups fail, a final lookup is made for a ``wildcard''
route (by convention, network 0).
The first appropriate route discovered is used.
By doing this, routes to a specific host on a network may be
present as well as routes to the network.  This also allows a
``fall back'' network route to be defined to a ``smart'' gateway
which may then perform more intelligent routing.
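.PP
The lookup order described above can be sketched roughly as follows.
This is an illustration only, not the kernel's code: 32-bit
integers stand in for socket addresses, the table size, the
24-bit network mask, and every name ending in ``_sketch''
(along with the helper routines) are assumptions made for
the example.

```c
#include <stddef.h>
#include <stdint.h>

#define RTHASHSIZ_SKETCH 8	/* assumed table size, for illustration */

/* Simplified route entry; a 32-bit value stands in for struct sockaddr. */
struct rt_sketch {
	uint32_t hash;			/* full 32-bit hash key */
	uint32_t dst;			/* destination host or network */
	uint32_t gateway;		/* forwarding agent */
	struct rt_sketch *next;		/* next entry on the hash chain */
};

struct rt_sketch *rthost_sketch[RTHASHSIZ_SKETCH];	/* routes to hosts */
struct rt_sketch *rtnet_sketch[RTHASHSIZ_SKETCH];	/* routes to networks */

/* Hypothetical per-family routines; a 24-bit network part is assumed. */
uint32_t net_of(uint32_t addr) { return addr & 0xffffff00u; }
uint32_t hash_host(uint32_t addr) { return addr; }
uint32_t hash_net(uint32_t addr) { return net_of(addr); }

/* Pick a chain with hash modulo table size, then compare the full key. */
struct rt_sketch *
rt_scan(struct rt_sketch **table, uint32_t hash, uint32_t key)
{
	struct rt_sketch *rt;

	for (rt = table[hash % RTHASHSIZ_SKETCH]; rt != NULL; rt = rt->next)
		if (rt->hash == hash && rt->dst == key)
			return rt;
	return NULL;
}

/* Insert at the head of the chain, so the newest entry is found first. */
void
rt_insert(struct rt_sketch **table, struct rt_sketch *rt, uint32_t hash)
{
	rt->hash = hash;
	rt->next = table[hash % RTHASHSIZ_SKETCH];
	table[hash % RTHASHSIZ_SKETCH] = rt;
}

/* Host table first, then network table, then the wildcard (network 0). */
struct rt_sketch *
rt_lookup(uint32_t dst)
{
	struct rt_sketch *rt;

	if ((rt = rt_scan(rthost_sketch, hash_host(dst), dst)) != NULL)
		return rt;
	if ((rt = rt_scan(rtnet_sketch, hash_net(dst), net_of(dst))) != NULL)
		return rt;
	return rt_scan(rtnet_sketch, hash_net(0), 0);
}
```

In this sketch a host route shadows the network route for the same
network, and a destination matching neither falls through to the
wildcard entry, if one has been installed.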
.PP
Each routing table entry contains a destination (the desired final destination),
a gateway to which to send the packet,
and various flags which indicate the route's status and type (host or
network).  A count
of the number of packets sent using the route is kept, along
with a count of ``held references'' to the dynamically
allocated structure to ensure that memory reclamation
occurs only when the route is not in use.  Finally, a pointer to
a network interface is kept; packets sent using
the route should be handed to this interface.
.PP
Routes are typed in two ways: either as host or network, and as
``direct'' or ``indirect''.  The host/network
distinction determines how to compare the \fIrt_dst\fP field
during lookup.  If the route is to a network, only a packet's
destination network is compared to the \fIrt_dst\fP entry stored
in the table.  If the route is to a host, the addresses must
match bit for bit.
.PP
The distinction between ``direct'' and ``indirect'' routes indicates
whether the destination is directly connected to the source.
This is needed when performing local network encapsulation.  If
a packet is destined for a peer at a host or network which is
not directly connected to the source, the internetwork packet
header will
contain the address of the eventual destination, while
the local network header will address the intervening
gateway.  Should the destination be directly connected, these addresses
are likely to be identical, or a mapping between the two exists.
The RTF_GATEWAY flag indicates that the route is to an ``indirect''
gateway agent, and that the local network header should be filled in
from the \fIrt_gateway\fP field instead of
from the final internetwork destination address.
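.PP
The choice of address for the local network header might be
sketched as below.  The structure, the function name, and the
numeric flag value are assumptions for illustration; only the
RTF_GATEWAY flag and the \fIrt_gateway\fP field are taken from
the text.

```c
#include <stdint.h>

#define RTF_GATEWAY_SKETCH 0x2	/* illustrative value for RTF_GATEWAY */

/* Reduced route entry; 32-bit values stand in for struct sockaddr. */
struct rt_hop_sketch {
	uint32_t rt_dst;	/* destination net or host */
	uint32_t rt_gateway;	/* forwarding agent */
	short	 rt_flags;	/* RTF_GATEWAY_SKETCH set if indirect */
};

/*
 * Return the address that the local network header should carry.
 * The internetwork header always carries final_dst; only the local
 * network header differs for an indirect route.
 */
uint32_t
link_level_target(const struct rt_hop_sketch *rt, uint32_t final_dst)
{
	if (rt->rt_flags & RTF_GATEWAY_SKETCH)
		return rt->rt_gateway;	/* indirect: hand to the gateway */
	return final_dst;		/* direct: hand to the destination */
}
```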
.PP
It is assumed that multiple routes to the same destination will not
be present; if several are installed, only the most recently installed
one will be used.
.PP
Routing redirect control messages are used to dynamically
modify existing routing table entries as well as dynamically
create new routing table entries.  On hosts where exhaustive
routing information is too expensive to maintain (e.g. workstations), the
combination of wildcard routing entries and routing redirect
messages can be used to provide a simple routing management
scheme without the use of a higher level policy process. 
Current connections may be rerouted after notification of the protocols
by means of their \fIpr_ctlinput\fP entries.
Statistics are kept by the routing table routines
on the use of routing redirect messages and their
effect on the routing tables.  These statistics may be viewed using
.I netstat (1).
.PP
Status information other than routing redirect control messages
may be used in the future, but at present it is ignored.
Likewise, more intelligent ``metrics'' may be used to describe
routes in the future, possibly based on bandwidth and monetary
costs.
.NH 2
Routing table interface
.PP
A protocol accesses the routing tables through
three routines,
one to allocate a route, one to free a route, and one
to process a routing redirect control message.
The routine \fIrtalloc\fP performs route allocation; it is
called with a pointer to the following structure containing
the desired destination:
.DS
._f
struct route {
	struct	rtentry *ro_rt;
	struct	sockaddr ro_dst;
};
.DE
The route returned is assumed ``held'' by the caller until
released with an \fIrtfree\fP call.  Protocols which implement
virtual circuits, such as TCP, hold onto routes for the duration
of the circuit's lifetime, while connection-less protocols,
such as UDP, allocate and free routes whenever their destination address
changes.
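.PP
The ``held'' convention can be illustrated with a small
reference counting sketch.  It is not the kernel's
\fIrtalloc\fP or \fIrtfree\fP: the table lookup is replaced by
a caller-supplied entry, and the \fIin_table\fP and
\fIreclaimed\fP members are assumptions added so that the
example can show when reclamation would be permitted.

```c
#include <stddef.h>

struct rtentry_sketch {
	short	rt_refcnt;	/* no. of held references */
	int	in_table;	/* assumption: nonzero while installed */
	int	reclaimed;	/* set where memory would be released */
};

struct route_sketch {
	struct rtentry_sketch *ro_rt;	/* held route, or NULL */
};

/* Take a reference on the entry a lookup would have found. */
void
rtalloc_sketch(struct route_sketch *ro, struct rtentry_sketch *found)
{
	ro->ro_rt = found;
	if (found != NULL)
		found->rt_refcnt++;
}

/* Drop the reference; reclaim only if unreferenced and uninstalled. */
void
rtfree_sketch(struct route_sketch *ro)
{
	struct rtentry_sketch *rt = ro->ro_rt;

	if (rt == NULL)
		return;
	ro->ro_rt = NULL;
	rt->rt_refcnt--;
	if (rt->rt_refcnt == 0 && !rt->in_table)
		rt->reclaimed = 1;
}
```

Under this model a virtual circuit protocol would call
\fIrtalloc_sketch\fP once when the circuit is set up and
\fIrtfree_sketch\fP once when it is torn down, while a
connection-less protocol would repeat the pair each time its
destination changes.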
.PP
The routine \fIrtredirect\fP is called to process a routing redirect
control message.  It is called with a destination address,
the new gateway to that destination, and the source of the redirect.
Redirects are accepted only from the current router for the destination.
If a non-wildcard route
exists to the destination, the gateway entry in the route is modified 
to point at the new gateway supplied.  Otherwise, a new routing
table entry is inserted reflecting the information supplied.  Routes
to interfaces and routes to gateways which are not directly accessible
from the host are ignored.
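.PP
The decisions made when a redirect arrives might be sketched
as follows.  This is a simplified model under stated
assumptions, not the \fIrtredirect\fP routine itself: routes
live in a small fixed array rather than hash chains, addresses
are 32-bit integers, and ``directly accessible'' is approximated
by membership in a single assumed local network.  All names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXRT_SKETCH 8

struct redir_route {
	uint32_t dst;		/* destination (unused for the wildcard) */
	uint32_t gateway;	/* current forwarding agent */
	int	 wildcard;	/* nonzero for the wildcard route */
	int	 used;		/* nonzero if this slot holds a route */
};

struct redir_route rtab_sketch[MAXRT_SKETCH];
uint32_t local_net_sketch = 0x0a000100;	/* assumed attached network */

/* Assumption: a gateway is reachable if it is on the local network. */
int
directly_reachable(uint32_t gw)
{
	return (gw & 0xffffff00u) == local_net_sketch;
}

/* Returns 0 if the redirect was applied, -1 if it was ignored. */
int
redirect_sketch(uint32_t dst, uint32_t new_gw, uint32_t src)
{
	struct redir_route *rt = NULL, *wild = NULL, *slot = NULL, *cur;
	int i;

	for (i = 0; i < MAXRT_SKETCH; i++) {
		if (!rtab_sketch[i].used) {
			if (slot == NULL)
				slot = &rtab_sketch[i];
		} else if (rtab_sketch[i].wildcard)
			wild = &rtab_sketch[i];
		else if (rtab_sketch[i].dst == dst)
			rt = &rtab_sketch[i];
	}
	/* Accept only from the router currently used for dst. */
	cur = (rt != NULL) ? rt : wild;
	if (cur == NULL || cur->gateway != src)
		return -1;
	if (!directly_reachable(new_gw))
		return -1;
	if (rt != NULL) {		/* non-wildcard route: modify it */
		rt->gateway = new_gw;
		return 0;
	}
	if (slot == NULL)		/* no room for a new entry */
		return -1;
	slot->used = 1;			/* otherwise insert a new route */
	slot->wildcard = 0;
	slot->dst = dst;
	slot->gateway = new_gw;
	return 0;
}
```

Note that the wildcard entry is never rewritten in this sketch;
a redirect for a destination reached through it produces a new,
more specific entry instead.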
.NH 2
User level routing policies
.PP
Routing policies implemented in user processes manipulate the
kernel routing tables through two \fIioctl\fP calls.  The
commands SIOCADDRT and SIOCDELRT add and delete routing entries,
respectively; the tables are read through the /dev/kmem device.
The decision to place policy decisions in a user process implies
that routing table updates may lag a bit behind the identification of
new routes, or the failure of existing routes, but this period
of instability is normally very small with proper implementation
of the routing process.  Advisory information, such as ICMP
error messages and IMP diagnostic messages, may be read from
raw sockets (described in the next section).
.PP
Several routing policy processes have already been implemented.  The
system standard
``routing daemon'' uses a variant of the Xerox NS Routing Information
Protocol [Xerox82] to maintain up-to-date routing tables in our local
environment.  Interaction with other existing routing protocols,
such as the Internet EGP (Exterior Gateway Protocol), has been
accomplished using a similar process.