# Overview #

TODO: Write introduction. Goal is to build a cross compiler targeting pdp11-aout.

TODO: What kind of joint header do I want across all the articles in a set, linking them together?

This document guides you through building a cross compiler using GCC on
FreeBSD. This cross compiler will run on a modern AMD64 machine but emit code
which runs on a DEC PDP-11. In addition to the compiler, these instructions
also build associated tooling like an assembler, linker, etc.

In this manner, modern programming tools like `make`, `git`, `vi`, and more can
be used to write modern C in your usual style while targeting the PDP-11.


# Installation #

These instructions were tested on FreeBSD 12 with GCC 7.3.0 from ports as the
host compiler. The cross compiler was built from the GCC 10.2.0 and Binutils
2.35.1 source code.

Building GCC requires GNU Make. On FreeBSD either install via `pkg install
gmake` or build from ports under `devel/gmake`. On Linux your `make` command is
probably `gmake` in disguise. Run `make --version` and see if the first line is
something like `GNU Make 4.2.1`.

In addition to GCC, we will also need to compile GNU Binutils since it contains
the assembler, linker, and other necessary tools.

Obtain suitable source code tarballs from these links.

 - <https://www.gnu.org/software/binutils/>

 - <https://www.gnu.org/software/gcc/>

I like to build all my cross compilers under one folder in my home directory,
each with a version specific sub-folder.

    setenv PREFIX "$HOME/cross-compiler/pdp11-gcc10.2.0"

Remember to make any `$PATH` changes permanent. For `tcsh` on FreeBSD, this
means editing `~/.cshrc`. To set the `$PATH` for this session, execute the
following.

    setenv PATH "$PREFIX/bin:$PATH"

The `$TARGET` environment variable is critical as it tells GCC what kind of
cross compiler we desire. In our case, this [target
triplet](https://wiki.osdev.org/Target_Triplet) is requesting code for the
PDP-11 architecture, wrapped in an `a.out` container, with no hosted
environment. That means this is a bare-metal target. There will be no C
standard library, only the C language itself.

    setenv TARGET pdp11-aout

Both GCC and binutils are best built from outside the source tree. Make two
directories to hold the build detritus. Use a clean build directory each time
you reconfigure or rebuild.

    cd $HOME/cross-compiler/pdp11-gcc10.2.0
    mkdir workdir-binutils
    mkdir workdir-gcc

Build binutils first. Assuming you saved the source code in
`~/cross-compiler/pdp11-gcc10.2.0/`, simply do the following.

    cd $HOME/cross-compiler/pdp11-gcc10.2.0
    tar xzf binutils-2.35.1.tar.gz
    cd workdir-binutils

Now configure, build and install binutils.

    ../binutils-2.35.1/configure --target=$TARGET --prefix="$PREFIX" \
        --with-sysroot --disable-nls --disable-werror
    gmake
    gmake install

Verify that you can access a series of files in your `$PATH` named
`pdp11-aout-*` (e.g. `pdp11-aout-as`), and that checking their version with
`pdp11-aout-as --version` results in something like `GNU Binutils 2.35.1`.

With binutils built and installed, now it's time to build GCC.

Follow a similar process to unpack the source code, but note the new
requirement to download dependencies. In older versions of GCC this command was
`./contrib/download-dependencies` instead of
`./contrib/download_prerequisites`.

    cd $HOME/cross-compiler/pdp11-gcc10.2.0
    tar xzf gcc-10.2.0.tar.gz
    cd gcc-10.2.0
    ./contrib/download_prerequisites
    cd ../workdir-gcc

Configuring GCC proceeds similarly to binutils. Both GNU `as` and GNU `ld` are
part of binutils, hence the `--with-gnu-as` and `--with-gnu-ld` flags telling
GCC to use them.

    ../gcc-10.2.0/configure --target=$TARGET --prefix="$PREFIX" \
        --disable-nls --enable-languages=c --without-headers \
        --with-gnu-as --with-gnu-ld --disable-libssp
    gmake all-gcc
    gmake install-gcc

Verify that `pdp11-aout-gcc --version` from your `$PATH` reports something like
`pdp11-aout-gcc 10.2.0`.

That's it, you're done. You now have a cross compiler that will run on your
workstation and output PDP-11 compatible binaries in `a.out` format.

At this point you can [skip ahead to the next section](TODO) or continue
reading about some potential pitfalls of the cross compiler we've just built.


# Potential Pitfalls #

Below are a few problems I ran into while using my cross compiler, some of
which may apply when compiling your own code for the PDP-11. I hope that by
mentioning the problems here, along with symptoms and workarounds, you might be
saved some time when encountering them.

## Compiling libgcc ##

Our newly built cross compiler expects `libgcc` to exist at link time, but we
didn't build it. So what is `libgcc` anyway? Quoting from the [GCC
manual](https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html):

    GCC provides a low-level runtime library, libgcc.a or libgcc_s.so.1 on some
    platforms. GCC generates calls to routines in this library automatically,
    whenever it needs to perform some operation that is too complicated to emit
    inline code for.

    Most of the routines in libgcc handle arithmetic operations that the target
    processor cannot perform directly. This includes integer multiply and divide on
    some machines, and all floating-point and fixed-point operations on other
    machines. libgcc also includes routines for exception handling, and a handful
    of miscellaneous operations.

    Some of these routines can be defined in mostly machine-independent C. Others
    must be hand-written in assembly language for each processor that needs them.

Why didn't we build `libgcc`? Because we encountered this [error
message](./pdp11-cross-compiler-libgcc-errormsg.txt).


### Problem ###

Consider the following C code which performs division and modulus operations on
16-bit unsigned integers.

    #include "pdp11.h"
    #include <stdint.h>

    uint16_t a=8, b=64;
    printf("b %% a = %o\n", b % a);
    printf("b / a = %o\n", b / a);

If we try to compile this code, we receive two errors from the linker.

    pdp11-aout-ld: example.o:example.o:(.text+0x8e): undefined reference to `__umodhi3'
    pdp11-aout-ld: example.o:example.o:(.text+0xac): undefined reference to `__udivhi3'

The two functions referenced, `__umodhi3` and `__udivhi3`, are part of `libgcc`.
The names reference the **u**nsigned **mod**ulo or **div**ision on
**h**alf-**i**nteger types. Per the [GCC
manual](https://gcc.gnu.org/onlinedocs/gccint/Machine-Modes.html#Machine-Modes),
the half-integer mode uses a two-byte integer.


### Solution ###

There are two ways around this problem.

The first (and superior) option is figuring out how to build `libgcc`. The
command to initiate the build is `gmake all-target-libgcc`, executed under the
same environment in which `gmake all-gcc` was executed earlier in this guide.
If you figure out what I'm doing wrong, let me know.

The second option is to implement your own functions for `__umodhi3()`,
`__udivhi3()`, and whatever else might come up. It's not hard to make something
functional, though catching all the edge cases could be challenging.

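As a rough starting point for the second option, the sketch below implements
unsigned 16-bit division and modulus with the classic shift-and-subtract
method. The names `udivmod16`, `udiv16`, and `umod16` are placeholders chosen
for illustration. On the target you would presumably expose the same logic
under the names the linker asks for (`__udivhi3` and `__umodhi3`). Division by
zero is not handled and is left to the caller.

```c
#include <stdint.h>

/* Shift-and-subtract division of two 16-bit unsigned values.
   Returns the quotient and stores the remainder through rem.
   The caller must ensure den != 0. */
uint16_t udivmod16(uint16_t num, uint16_t den, uint16_t *rem)
{
    uint16_t quot = 0;
    uint16_t r = 0;
    int i;

    for (i = 0; i < 16; i++) {
        /* Move the top bit of num into the bottom of the running remainder. */
        r = (uint16_t)((r << 1) | ((num & 0x8000u) ? 1u : 0u));
        num = (uint16_t)(num << 1);
        quot = (uint16_t)(quot << 1);

        /* If the divisor fits, subtract it and record a 1 bit in the quotient. */
        if (r >= den) {
            r = (uint16_t)(r - den);
            quot |= 1u;
        }
    }

    *rem = r;
    return quot;
}

/* Placeholder for what __udivhi3 would compute. */
uint16_t udiv16(uint16_t num, uint16_t den)
{
    uint16_t rem;
    return udivmod16(num, den, &rem);
}

/* Placeholder for what __umodhi3 would compute. */
uint16_t umod16(uint16_t num, uint16_t den)
{
    uint16_t rem;
    (void)udivmod16(num, den, &rem);
    return rem;
}
```

The loop uses only constant one-bit shifts, comparisons, and subtraction, with
the intent of avoiding further calls into `libgcc`. I have not checked what the
PDP-11 backend actually emits for it.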

## Using uint32 ##

Although the PDP-11 utilizes a 16-bit word, GCC is clever enough to allow
operations on 32-bit words by breaking them up into smaller operations. For
example, in the following assembly code generated by GCC, note how the 32-bit
word (left) is pushed onto the stack as two separate words. The 16-bit
equivalent is shown on the right for comparison.

    uint32_t a=0710004010;          uint16_t a=010;

    add $-4, sp                     add $-2, sp
    mov $3440, (sp)                 mov $10, (sp)
    mov $4010, 2(sp)

196
197### Problem ###
198
199Whenever I try to make real use of code with `uint32_t`, I encounter internal
200compiler errors like the following.
201
202 memtest.c:119:1: error: insn does not satisfy its constraints:
203 }
204 ^
205 (insn 95 44 45 (set (reg:HI 1 r1)
206 (reg/f:HI 16 virtual-incoming-args)) "memtest.c":114 14 {movhi}
207 (nil))
208 memtest.c:119:1: internal compiler error: in extract_constrain_insn_cached, at recog.c:2225
209 no stack trace because unwind library not available
210 Please submit a full bug report,
211 with preprocessed source if appropriate.
212 See <https://gcc.gnu.org/bugs/> for instructions.
213 *** Error code 1
214
215In each case, adding a single `uint32_t` operation in one spot in the code
216resulted in a compiler error in a completely different part of the code.
217Removing the offending `uint32_t` line caused the program to again compile and
218execute normally. In each case, I already had `uint32_t` related code working
219elsewhere in the program.
220
221
### Solution ###

Until I track down the bug causing these errors, I've been using structs
containing pairs of `uint16_t` words and writing helper functions to perform
operations on them.

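A minimal sketch of that workaround might look like the following. The type
name `u32_pair` and the helper names are hypothetical and not taken from my
actual code. Only addition and comparison are shown, and the struct is passed
by value for brevity.

```c
#include <stdint.h>

/* A 32-bit quantity stored as two 16-bit words (hypothetical type name). */
typedef struct {
    uint16_t hi;  /* most significant word */
    uint16_t lo;  /* least significant word */
} u32_pair;

/* Add two pairs, propagating the carry out of the low word.
   The result wraps around on overflow, like uint32_t addition. */
u32_pair u32_pair_add(u32_pair a, u32_pair b)
{
    u32_pair sum;
    uint16_t carry;

    sum.lo = (uint16_t)(a.lo + b.lo);
    carry = (sum.lo < a.lo) ? 1u : 0u;  /* the low word wrapped around */
    sum.hi = (uint16_t)(a.hi + b.hi + carry);
    return sum;
}

/* Compare two pairs: returns -1, 0, or 1 for a < b, a == b, a > b. */
int u32_pair_cmp(u32_pair a, u32_pair b)
{
    if (a.hi != b.hi)
        return (a.hi < b.hi) ? -1 : 1;
    if (a.lo != b.lo)
        return (a.lo < b.lo) ? -1 : 1;
    return 0;
}
```

Every operation here works on `uint16_t` values only, which is the point of the
workaround: the compiler never sees a 32-bit type.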

## GNU Assembler Bug ##

If you're stuck using an older version of GNU binutils, as I was while cross
compiling from a SPARCstation 20, there is a bug in the GNU assembler that
crops up whenever double-indirection is used in GCC. It was present until at
least GNU Binutils 2.28 but appears to be fixed no later than 2.32 per the
following code snippet in `binutils-2.32/gas/config/tc-pdp11.c`.

    if (*str == '@' || *str == '*')
      {
        /* @(Rn) == @0(Rn): Mode 7, Indexed deferred.
           Check for auto-increment deferred. */
        if ( ...


### Problem ###

One of the addressing modes supported by the PDP-11 is 'index deferred',
represented by `@X(Rn)`. This operand indicates that `X` is added to the
contents of `Rn` to form the address of a pointer, and that pointer is then
dereferenced to reach the final location. For example, consider the following
five values, one stored in a register and the other four in memory. Then
`@2(R1)` is the value `222`.

    R1:   1000
    1000: 2000
    1002: 2002
    2000: 111
    2002: 222

Similarly, `@0(R1)` is the value `111`. In most PDP-11 assemblers, including
DEC's MACRO-11 assembler, the string `@(Rn)` is an alias to `@0(Rn)`. But when
the GNU assembler encounters `@(Rn)` it assembles it as though it were `(Rn)`,
a single level of indirection instead of two levels!

If we're only writing assembly then we can work around this bug by always using
the form `@0(Rn)`. But what if we're writing C and using GCC to compile it?
Consider the following C code example, taken directly from some stack-based
debugger code written for the PDP-11.

    uint16_t ** csp = (uint16_t **) 070000;
    *csp = (uint16_t *) 060000;
    **csp = 0;

When GCC compiles this to assembly it generates code of the form `@(Rn)` when
assigning a value to `**csp`, thus causing the value `0` to overwrite the value
`060000` at `*csp` if GNU `as` is used to assemble the code.

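The effect of the bug can be modelled on the host with a small simulation. This
is only a sketch under stated assumptions: the memory array, the helper names,
and the `buggy` flag are all invented for illustration, and the marker value
`0177777` is written to `060000` purely so that the final store is visible.

```c
#include <stdint.h>

/* Simulated 64 KiB of byte-addressed memory, stored as 32768 16-bit words. */
uint16_t mem[32768];

/* Read or write the word at an even byte address. */
uint16_t read_word(uint16_t addr)              { return mem[addr >> 1]; }
void     write_word(uint16_t addr, uint16_t v) { mem[addr >> 1] = v; }

/* (Rn), register deferred: Rn itself is the operand address. */
uint16_t ea_register_deferred(uint16_t rn)
{
    return rn;
}

/* @X(Rn), index deferred: the word at Rn+X holds the operand address. */
uint16_t ea_index_deferred(uint16_t rn, uint16_t x)
{
    return read_word((uint16_t)(rn + x));
}

/* Replay `**csp = 0` with csp (070000) held in a register.
   A nonzero buggy flag models the assembler treating @(Rn) as (Rn). */
void store_zero_through_csp(int buggy)
{
    uint16_t r0 = 070000;
    uint16_t ea;

    write_word(r0, 060000);       /* *csp = (uint16_t *) 060000; */
    write_word(060000, 0177777);  /* marker so the store is visible */

    ea = buggy ? ea_register_deferred(r0) : ea_index_deferred(r0, 0);
    write_word(ea, 0);            /* **csp = 0; */
}
```

With `buggy` set to zero the store lands at `060000` and the pointer at
`070000` survives. With `buggy` nonzero the pointer itself is overwritten with
zero and `060000` keeps its marker value.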

### Solution ###

The following patch, tested on GNU binutils 2.28, fixes the bug. It's a little
hacky since it overloads the `operand->code` variable to pass unrelated state
information to `parse_reg()`.

    --- tc-pdp11.c  2017-06-24 22:33:00.260210000 -0700
    +++ tc-pdp11.c.fixed    2017-06-24 22:32:12.455205000 -0700
    @@ -431,6 +431,9 @@
     {
       LITTLENUM_TYPE literal_float[2];
     
    +  /* Store the value (if any) passed by parse_op_noreg() before parse_reg() overwrites it. */
    +  int deferred = operand->code;
    +
       str = skip_whitespace (str);
     
       switch (*str)
    @@ -451,6 +454,15 @@
               operand->code |= 020;
               str++;
             }
    +      /*
    +       * This catches the case where @(Rn) is interpreted as (Rn) rather than @0(Rn)
    +       */
    +      else if (deferred)
    +        {
    +          operand->additional = 1;
    +          operand->word = 0;
    +          operand->code |= 060;
    +        }
           else
             {
               operand->code |= 010;
    @@ -581,6 +593,12 @@
     
       if (*str == '@' || *str == '*')
         {
    +      /*
    +       * operand->code is overwritten by parse_reg() inside parse_op_no_deferred()
    +       * We use it to temporarily catch the alias @(Rn) -> @0(Rn) since
    +       * parse_op_no_deferred() starts at str+1 and thus misses the '@'.
    +       */
    +      operand->code |= 010;
           str = parse_op_no_deferred (str + 1, operand);
           if (operand->error)
             return str;
