| 1 | # Overview # |
| 2 | |
| 3 | TODO: Write introduction. Goal is to build a cross compiler targeting pdp11-aout. |
| 4 | |
| 5 | TODO: What kind of joint header do I want across all the articles in a set, linking them together? |
| 6 | |
| 7 | This document guides you through building a cross compiler using GCC on |
| 8 | FreeBSD. This cross compiler will run on a modern AMD64 machine but emit code |
| 9 | which runs on a DEC PDP-11. In addition to the compiler, these instructions |
| 10 | also build associated tooling like an assembler, linker, etc. |
| 11 | |
| 12 | In this manner, modern programming tools like `make`, `git`, `vi`, and more can |
| 13 | be used to write modern C in your usual style while targeting the PDP-11. |
| 14 | |
| 15 | |
| 16 | # Installation # |
| 17 | |
| 18 | These instructions were tested on FreeBSD 12 with GCC 7.3.0 from ports as the |
| 19 | host compiler. The cross compiler was built from the GCC 10.2.0 and Binutils |
| 20 | 2.35.1 source code. |
| 21 | |
| 22 | Building GCC requires GNU Make. On FreeBSD either install via `pkg install |
| 23 | gmake` or build from ports under `devel/gmake`. On Linux your `make` command is |
| 24 | probably `gmake` in disguise. Run `make --version` and see if the first line is |
| 25 | something like `GNU Make 4.2.1`. |
| 26 | |
| 27 | In addition to GCC, we will also need to compile GNU Binutils since it contains |
| 28 | the assembler, linker, and other necessary tools. |
| 29 | |
| 30 | Obtain suitable source code tarballs from these links. |
| 31 | |
| 32 | - <https://www.gnu.org/software/binutils/> |
| 33 | |
| 34 | - <https://www.gnu.org/software/gcc/> |
| 35 | |
| 36 | I like to build all my cross compilers under one folder in my home directory, |
| 37 | each with a version specific sub-folder. |
| 38 | |
| 39 | setenv PREFIX "$HOME/cross-compiler/pdp11-gcc10.2.0" |
| 40 | |
| 41 | Remember to make any `$PATH` changes permanent. For `tcsh` on FreeBSD, this |
| 42 | means editing `~/.cshrc`. To set the `$PATH` for this session, execute the |
| 43 | following. |
| 44 | |
| 45 | setenv PATH "$PREFIX/bin:$PATH" |
| 46 | |
| 47 | The `$TARGET` environment variable is critical as it tells GCC what kind of |
| 48 | cross compiler we desire. In our case, this [target |
| 49 | triplet](https://wiki.osdev.org/Target_Triplet) is requesting code for the |
| 50 | PDP-11 architecture, wrapped in an `a.out` container, with no hosted |
| 51 | environment. That means this is a bare-metal target. There will be no C |
| 52 | standard library, only the C language itself. |
| 53 | |
| 54 | setenv TARGET pdp11-aout |
| 55 | |
| 56 | Both GCC and binutils are best built from outside the source tree. Make two |
| 57 | directories to hold the build detritus. Use a clean build directory each time |
| 58 | you reconfigure or rebuild. |
| 59 | |
| 60 | cd $HOME/cross-compiler/pdp11-gcc10.2.0 |
| 61 | mkdir workdir-binutils |
| 62 | mkdir workdir-gcc |
| 63 | |
| 64 | Build binutils first. Assuming you saved the source code in |
| 65 | `~/cross-compiler/pdp11-gcc10.2.0/`, simply do the following. |
| 66 | |
| 67 | cd $HOME/cross-compiler/pdp11-gcc10.2.0 |
| 68 | tar xzf binutils-2.35.1.tar.gz |
| 69 | cd workdir-binutils |
| 70 | |
| 71 | Now configure, build and install binutils. |
| 72 | |
| 73 | ../binutils-2.35.1/configure --target=$TARGET --prefix="$PREFIX" \ |
| 74 | --with-sysroot --disable-nls --disable-werror |
| 75 | gmake |
| 76 | gmake install |
| 77 | |
| 78 | Verify that you can access a series of files in your `$PATH` named |
| 79 | `pdp11-aout-*` (e.g. `pdp11-aout-as`), and that checking their version with |
| 80 | `pdp11-aout-as --version` results in something like `GNU assembler (GNU Binutils) 2.35.1`. |
| 81 | |
| 82 | With binutils built and installed, now it's time to build GCC. |
| 83 | |
| 84 | Follow a similar process to unpack the source code, but note the new |
| 85 | requirement to download dependencies. In older versions of GCC this command was |
| 86 | `./contrib/download-dependencies` instead of |
| 87 | `./contrib/download_prerequisites`. |
| 88 | |
| 89 | cd $HOME/cross-compiler/pdp11-gcc10.2.0 |
| 90 | tar xzf gcc-10.2.0.tar.gz |
| 91 | cd gcc-10.2.0 |
| 92 | ./contrib/download_prerequisites |
| 93 | cd ../workdir-gcc |
| 94 | |
| 95 | Configuring GCC proceeds similarly to binutils. Both GNU `as` and GNU `ld` are |
| 96 | part of binutils, hence the directive informing GCC to use them. |
| 97 | |
| 98 | ../gcc-10.2.0/configure --target=$TARGET --prefix="$PREFIX" \ |
| 99 | --disable-nls --enable-languages=c --without-headers \ |
| 100 | --with-gnu-as --with-gnu-ld --disable-libssp |
| 101 | gmake all-gcc |
| 102 | gmake install-gcc |
| 103 | |
| 104 | Verify that `pdp11-aout-gcc --version` from your `$PATH` reports something like |
| 105 | `pdp11-aout-gcc (GCC) 10.2.0`. |
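
For a quick smoke test beyond the version check, a tiny freestanding file is enough. The sketch below is only an illustration: the file name `smoke.c` and the function `sum_words()` are hypothetical. It deliberately uses no headers, no library calls, and no multiplication or division, so it should not depend on anything we have not built.

```c
/* smoke.c: hypothetical freestanding test file.
 * No #include lines and no C library calls are needed. */

/* Add up `count` words starting at `words`, wrapping on overflow. */
unsigned int sum_words(const unsigned int *words, unsigned int count)
{
    unsigned int total = 0;

    while (count--)
        total += *words++;

    return total;
}
```

Compiling only, with `pdp11-aout-gcc -c smoke.c`, should leave a `smoke.o` object file behind without involving the linker.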
| 106 | |
| 107 | That's it, you're done. You now have a cross compiler that will run on your |
| 108 | workstation and output PDP-11 compatible binaries in `a.out` format. |
| 109 | |
| 110 | At this point you can [skip ahead to the next section](TODO) or continue |
| 111 | reading about some potential pitfalls of the cross compiler we've just built. |
| 112 | |
| 113 | |
| 114 | # Potential Pitfalls # |
| 115 | |
| 116 | Below are a few problems I ran into while using my cross compiler, some of |
| 117 | which may apply when compiling your own code for the PDP-11. I hope that by |
| 118 | mentioning the problems here, along with symptoms and workarounds, you might be |
| 119 | saved some time when encountering them. |
| 120 | |
| 121 | ## Compiling libgcc ## |
| 122 | |
| 123 | Our newly built cross compiler expects `libgcc` to exist at link time, but we |
| 124 | didn't build it. So what is `libgcc` anyway? Quoting from the [GCC |
| 125 | manual](https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html): |
| 126 | |
| 127 | GCC provides a low-level runtime library, libgcc.a or libgcc_s.so.1 on some |
| 128 | platforms. GCC generates calls to routines in this library automatically, |
| 129 | whenever it needs to perform some operation that is too complicated to emit |
| 130 | inline code for. |
| 131 | |
| 132 | Most of the routines in libgcc handle arithmetic operations that the target |
| 133 | processor cannot perform directly. This includes integer multiply and divide on |
| 134 | some machines, and all floating-point and fixed-point operations on other |
| 135 | machines. libgcc also includes routines for exception handling, and a handful |
| 136 | of miscellaneous operations. |
| 137 | |
| 138 | Some of these routines can be defined in mostly machine-independent C. Others |
| 139 | must be hand-written in assembly language for each processor that needs them. |
| 140 | |
| 141 | Why didn't we build `libgcc`? Because we encountered this [error |
| 142 | message](./pdp11-cross-compiler-libgcc-errormsg.txt). |
| 143 | |
| 144 | |
| 145 | ### Problem ### |
| 146 | |
| 147 | Consider the following C code which performs division and modulus operations on |
| 148 | 16-bit unsigned integers. |
| 149 | |
| 150 | #include "pdp11.h" |
| 151 | #include <stdint.h> |
| 152 | |
| 153 | uint16_t a=8, b=64; |
| 154 | printf("b %% a = %o\n", b % a); |
| 155 | printf("b / a = %o\n", b / a); |
| 156 | |
| 157 | If we try to compile this code, we receive two errors from the linker. |
| 158 | |
| 159 | pdp11-aout-ld: example.o:example.o:(.text+0x8e): undefined reference to `__umodhi3' |
| 160 | pdp11-aout-ld: example.o:example.o:(.text+0xac): undefined reference to `__udivhi3' |
| 161 | |
| 162 | The two functions referenced, `__umodhi3` and `__udivhi3`, are part of `libgcc`. |
| 163 | The names reference the **u**nsigned **mod**ulo or **div**ision on |
| 164 | **h**alf-**i**nteger types. Per the [GCC |
| 165 | manual](https://gcc.gnu.org/onlinedocs/gccint/Machine-Modes.html#Machine-Modes), |
| 166 | the half-integer mode uses a two-byte integer. |
| 167 | |
| 168 | |
| 169 | ### Solution ### |
| 170 | |
| 171 | There are two ways around this problem. |
| 172 | |
| 173 | The first (and superior) option is figuring out how to build `libgcc`. The |
| 174 | command to initiate the build is `gmake all-target-libgcc`, executed under the |
| 175 | same environment in which `gmake all-gcc` was executed earlier in this guide. |
| 176 | If you figure out what I'm doing wrong, let me know. |
| 177 | |
| 178 | The second option is to implement your own functions for `__umodhi3()`, |
| 179 | `__udivhi3()`, and whatever else might come up. It's not hard to make something |
| 180 | functional, though catching all the edge cases could be challenging. |
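
As a starting point for the second option, here is a minimal sketch using classic shift-and-subtract (restoring) division. It is not the `libgcc` implementation. The helper name `udivmod16()` is hypothetical, and I am assuming that `uint16_t` matches the argument and return types the compiler expects for these two routines on this target. Division by zero is not handled.

```c
#include <stdint.h>

/* Hypothetical shared helper: returns num / den and stores num % den
 * through rem. Only shifts by one, compares, and subtraction are used,
 * so the helper does not itself need a divide routine.
 * Behaviour for den == 0 is not meaningful. */
static uint16_t udivmod16(uint16_t num, uint16_t den, uint16_t *rem)
{
    uint16_t quot = 0;
    uint16_t r = 0;
    int i;

    for (i = 0; i < 16; i++) {
        /* Shift the next numerator bit (MSB first) into the remainder. */
        r = (uint16_t)(r << 1);
        if (num & 0x8000u)
            r |= 1u;
        num = (uint16_t)(num << 1);

        /* Make room for the next quotient bit. */
        quot = (uint16_t)(quot << 1);
        if (r >= den) {
            r = (uint16_t)(r - den);
            quot |= 1u;
        }
    }

    *rem = r;
    return quot;
}

/* Unsigned 16-bit division, named to satisfy the linker. */
uint16_t __udivhi3(uint16_t num, uint16_t den)
{
    uint16_t rem;

    return udivmod16(num, den, &rem);
}

/* Unsigned 16-bit modulus, named to satisfy the linker. */
uint16_t __umodhi3(uint16_t num, uint16_t den)
{
    uint16_t rem;

    (void)udivmod16(num, den, &rem);
    return rem;
}
```

Shifting by exactly one bit per step is a deliberate choice: a variable shift count might itself be turned into a helper call on some PDP-11 models, which would defeat the purpose.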
| 181 | |
| 182 | |
| 183 | ## uint32 ## |
| 184 | |
| 185 | Although the PDP-11 utilizes a 16-bit word, GCC is clever enough to allow |
| 186 | operations on 32-bit words by breaking them up into smaller operations. For |
| 187 | example, in the following assembly code generated by GCC, note how the 32-bit |
| 188 | word is pushed onto the stack as two separate words. |
| 189 | |
| 190 | uint32_t a=0710004010;      uint16_t a=010; |
| 191 | |
| 192 | add $-4, sp                 add $-2, sp |
| 193 | mov $3440, (sp)             mov $10, (sp) |
| 194 | mov $4010, 2(sp) |
| 195 | |
| 196 | |
| 197 | ### Problem ### |
| 198 | |
| 199 | Whenever I try to make real use of code with `uint32_t`, I encounter internal |
| 200 | compiler errors like the following. |
| 201 | |
| 202 | memtest.c:119:1: error: insn does not satisfy its constraints: |
| 203 | } |
| 204 | ^ |
| 205 | (insn 95 44 45 (set (reg:HI 1 r1) |
| 206 | (reg/f:HI 16 virtual-incoming-args)) "memtest.c":114 14 {movhi} |
| 207 | (nil)) |
| 208 | memtest.c:119:1: internal compiler error: in extract_constrain_insn_cached, at recog.c:2225 |
| 209 | no stack trace because unwind library not available |
| 210 | Please submit a full bug report, |
| 211 | with preprocessed source if appropriate. |
| 212 | See <https://gcc.gnu.org/bugs/> for instructions. |
| 213 | *** Error code 1 |
| 214 | |
| 215 | In each case, adding a single `uint32_t` operation in one spot in the code |
| 216 | resulted in a compiler error in a completely different part of the code. |
| 217 | Removing the offending `uint32_t` line caused the program to again compile and |
| 218 | execute normally. In each case, I already had `uint32_t` related code working |
| 219 | elsewhere in the program. |
| 220 | |
| 221 | |
| 222 | ### Solution ### |
| 223 | |
| 224 | Until I track down the bug causing these errors, I've been using structs |
| 225 | containing pairs of `uint16_t` words and writing helper functions to perform |
| 226 | operations on them. |
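
To make the workaround concrete, here is a minimal sketch of such a struct with two helpers. The names `struct u32pair`, `u32pair_add()`, and `u32pair_cmp()` are hypothetical and chosen only for illustration; this is not the exact code I use. Arguments are passed by pointer so that only 16-bit values ever travel through registers.

```c
#include <stdint.h>

/* Hypothetical 32-bit value stored as two 16-bit words. */
struct u32pair {
    uint16_t hi;  /* most significant word */
    uint16_t lo;  /* least significant word */
};

/* out = a + b, wrapping modulo 2^32. out may alias a or b. */
void u32pair_add(const struct u32pair *a, const struct u32pair *b,
                 struct u32pair *out)
{
    uint16_t lo = (uint16_t)(a->lo + b->lo);
    /* If the low word wrapped around, the sum is smaller than either input. */
    uint16_t carry = (lo < a->lo) ? 1u : 0u;
    uint16_t hi = (uint16_t)(a->hi + b->hi + carry);

    out->hi = hi;
    out->lo = lo;
}

/* Returns -1, 0 or 1 when a is less than, equal to or greater than b. */
int u32pair_cmp(const struct u32pair *a, const struct u32pair *b)
{
    if (a->hi != b->hi)
        return (a->hi < b->hi) ? -1 : 1;
    if (a->lo != b->lo)
        return (a->lo < b->lo) ? -1 : 1;
    return 0;
}
```

With this layout the value `0710004010` from the example above is stored as `hi = 03440` and `lo = 04010`, the same two words GCC pushed onto the stack.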
| 227 | |
| 228 | |
| 229 | ## GNU Assembler Bug ## |
| 230 | |
| 231 | If you're stuck using an older version of GNU binutils, as I was while cross |
| 232 | compiling from a SPARCstation 20, there is a bug in the GNU assembler that |
| 233 | crops up whenever double-indirection is used in GCC. It was present until at |
| 234 | least GNU Binutils 2.28 but appears to be fixed no later than 2.32 per the |
| 235 | following code snippet in `binutils-2.32/gas/config/tc-pdp11.c`. |
| 236 | |
| 237 | if (*str == '@' || *str == '*') |
| 238 | { |
| 239 | /* @(Rn) == @0(Rn): Mode 7, Indexed deferred. |
| 240 | Check for auto-increment deferred. */ |
| 241 | if ( ... |
| 242 | |
| 243 | |
| 244 | ### Problem ### |
| 245 | |
| 246 | One of the addressing modes supported by the PDP-11 is 'index deferred', |
| 247 | represented by `@X(Rn)`. The sum of `X` and the contents of `Rn` is the address |
| 248 | of a word in memory, and that word holds a pointer to the final location. For |
| 249 | example, consider the following five values, one stored in a register and the |
| 250 | other four in memory. Then `@2(R1)` is the value `222`, reached via `1002`. |
| 251 | |
| 252 | R1: 1000 |
| 253 | 1000: 2000 |
| 254 | 1002: 2002 |
| 255 | 2000: 111 |
| 256 | 2002: 222 |
| 257 | |
| 258 | Similarly, `@0(R1)` is the value `111`. In most PDP-11 assemblers, including |
| 259 | DEC's MACRO-11 assembler, the string `@(Rn)` is an alias to `@0(Rn)`. But when |
| 260 | the GNU assembler encounters `@(Rn)` it assembles it as though it were `(Rn)`, |
| 261 | a single level of indirection instead of two levels! |
| 262 | |
| 263 | If we're only writing assembly then we can work around this bug by always using |
| 264 | the form `@0(Rn)`. But what if we're writing C and using GCC to compile it? |
| 265 | Consider the following C code example, taken directly from some stack-based |
| 266 | debugger code written for the PDP-11. |
| 267 | |
| 268 | uint16_t ** csp = (uint16_t **) 070000; |
| 269 | *csp = (uint16_t *) 060000; |
| 270 | **csp = 0; |
| 271 | |
| 272 | When GCC compiles this to assembly it generates code of the form `@(Rn)` when |
| 273 | assigning a value to `**csp` thus causing the value `0` to overwrite the value |
| 274 | `060000` at `*csp` if GNU `as` is used to assemble the code. |
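
The effect is easy to reproduce on paper, or with a toy model of memory. The sketch below is only a simulation written for this explanation: the array `mem`, the helper functions, and the sentinel value `0177777` are all hypothetical, and nothing here comes from GCC or binutils. It performs the final store of the example once as `@0(Rn)` and once as `(Rn)`.

```c
#include <stdint.h>

/* Toy model: 64 KiB of memory, indexed by word (byte address / 2). */
static uint16_t mem[32768];

static uint16_t read_word(uint16_t addr)
{
    return mem[addr >> 1];
}

static void write_word(uint16_t addr, uint16_t value)
{
    mem[addr >> 1] = value;
}

/* (Rn): register deferred. The register holds the operand's address. */
static void store_reg_deferred(uint16_t rn, uint16_t value)
{
    write_word(rn, value);
}

/* @X(Rn): index deferred. The word at X+Rn holds the operand's address. */
static void store_index_deferred(uint16_t rn, uint16_t x, uint16_t value)
{
    write_word(read_word((uint16_t)(rn + x)), value);
}

/* Replays `**csp = 0` and returns what is left in *csp afterwards.
 * buggy_assembler != 0 models @(Rn) being assembled as (Rn). */
uint16_t simulate_store(int buggy_assembler)
{
    uint16_t csp = 070000;         /* register contents: the value of csp */

    write_word(csp, 060000);       /* *csp = 060000 */
    write_word(060000, 0177777);   /* sentinel at the intended target */

    if (buggy_assembler)
        store_reg_deferred(csp, 0);
    else
        store_index_deferred(csp, 0, 0);

    return read_word(csp);
}
```

With the correct encoding, `*csp` still holds `060000` and the sentinel at `060000` becomes `0`. With the buggy encoding, `*csp` itself becomes `0` and the sentinel is untouched.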
| 275 | |
| 276 | |
| 277 | ### Solution ### |
| 278 | |
| 279 | The following patch, tested on GNU binutils 2.28, fixes the bug. It's a little |
| 280 | hacky since it overloads the `operand->code` variable to pass unrelated state |
| 281 | information to `parse_reg()`. |
| 282 | |
| 283 | --- tc-pdp11.c 2017-06-24 22:33:00.260210000 -0700 |
| 284 | +++ tc-pdp11.c.fixed 2017-06-24 22:32:12.455205000 -0700 |
| 285 | @@ -431,6 +431,9 @@ |
| 286 | { |
| 287 | LITTLENUM_TYPE literal_float[2]; |
| 288 | |
| 289 | + /* Store the value (if any) passed by parse_op_noreg() before parse_reg() overwrites it. */ |
| 290 | + int deferred = operand->code; |
| 291 | + |
| 292 | str = skip_whitespace (str); |
| 293 | |
| 294 | switch (*str) |
| 295 | @@ -451,6 +454,15 @@ |
| 296 | operand->code |= 020; |
| 297 | str++; |
| 298 | } |
| 299 | + /* |
| 300 | + * This catches the case where @(Rn) is interpreted as (Rn) rather than @0(Rn) |
| 301 | + */ |
| 302 | + else if (deferred) |
| 303 | + { |
| 304 | + operand->additional = 1; |
| 305 | + operand->word = 0; |
| 306 | + operand->code |= 060; |
| 307 | + } |
| 308 | else |
| 309 | { |
| 310 | operand->code |= 010; |
| 311 | @@ -581,6 +593,12 @@ |
| 312 | |
| 313 | if (*str == '@' || *str == '*') |
| 314 | { |
| 315 | + /* |
| 316 | + * operand->code is overwritten by parse_reg() inside parse_op_no_deferred() |
| 317 | + * We use it to temporarily catch the alias @(Rn) -> @0(Rn) since |
| 318 | + * parse_op_no_deferred() starts at str+1 and thus misses the '@'. |
| 319 | + */ |
| 320 | + operand->code |= 010; |
| 321 | str = parse_op_no_deferred (str + 1, operand); |
| 322 | if (operand->error) |
| 323 | return str; |
| 324 | |