RECC, The Robert Elder Compiler (and emulator and microkernel) Collection
RECC is a collection of programming tools that includes a C compiler, a specification for a very simple and easy-to-emulate CPU, a toy microkernel, some assembly language specifications, and much more.
Here is a more detailed explanation with pictures.
This project is on hold until my financial situation improves enough for me to be able to work on it again. In the meantime, I plan to focus on continuing to create educational content related to computer programming, some of which intersects with this project. Consider signing up for my mailing list on my blog for updates on what I'm doing and links to new content.
The RECC compiler has a number of goals, including
- To develop a C compiler using source code that conforms to a very strict subset of ANSI C89. ANSI C is one of the few languages that is ubiquitously supported even in very old systems, and is almost certain to be supported for many years to come (because it is (almost) a proper subset of even the newest standards of C++).
- To develop a C compiler with relatively simple source code. RECC is coded to use as little of the C preprocessor as possible, its source code avoids typedefs and it will never include GNU extensions, C99 or C++ code (although it may be able to compile code with these features in the future).
- To develop a C compiler that has very few dependencies. More dependencies means less portability, faster bit rot, and less chance of building everywhere without modification. RECC is currently only dependent on stdarg.h, stdio.h, stdlib.h, string.h and assert.h.
- To create a simple and easy-to-emulate compile target for a large amount of C code. The compile target of RECC is a very simple 14-instruction machine language. The machine language is simple enough that you can write an emulator for it in a few hundred lines. This makes it possible to easily embed any C code cross-compiled with RECC into an emulator running in another language.
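To give a feel for why a small instruction set is easy to emulate, here is a minimal sketch of a fetch-decode-execute loop written in the same restricted C89 style. The instruction set, the `toy_` names and the encoding are all hypothetical and chosen only for illustration; they are not the One-Page CPU's real instructions or layout, which are defined in the specification.

```c
#include <stdio.h>

/* Hypothetical toy instruction set for illustration only.
   This is NOT the One-Page CPU's actual instruction set or encoding. */
enum toy_op { TOY_LOADI, TOY_ADD, TOY_SUB, TOY_BEQ, TOY_HALT };

struct toy_instruction {
    enum toy_op op;
    unsigned int a; /* destination register, or first compared register */
    unsigned int b; /* source register, or second compared register */
    int c;          /* immediate, second source register, or branch offset */
};

struct toy_machine {
    int registers[4];
    unsigned int pc;
};

/* Executes instructions until TOY_HALT, the end of the program, or
   max_steps is reached. Returns the number of instructions executed. */
unsigned int toy_run(struct toy_machine * m, const struct toy_instruction * program, unsigned int length, unsigned int max_steps){
    unsigned int steps = 0;
    while(steps < max_steps && m->pc < length){
        const struct toy_instruction * i = &program[m->pc];
        unsigned int ra = i->a & 3u; /* mask register numbers into range */
        unsigned int rb = i->b & 3u;
        unsigned int rc = (unsigned int)i->c & 3u;
        m->pc++; /* branch offsets are relative to the next instruction */
        steps++;
        switch(i->op){
            case TOY_LOADI: m->registers[ra] = i->c; break;
            case TOY_ADD: m->registers[ra] = m->registers[rb] + m->registers[rc]; break;
            case TOY_SUB: m->registers[ra] = m->registers[rb] - m->registers[rc]; break;
            case TOY_BEQ:
                if(m->registers[ra] == m->registers[rb]){
                    m->pc = (unsigned int)((int)m->pc + i->c);
                }
                break;
            case TOY_HALT: return steps;
        }
    }
    return steps;
}

/* Builds and runs a toy program that sums n + (n-1) + ... + 1.
   Assumes n is not negative. */
int toy_sum_to(int n){
    struct toy_instruction program[9] = {
        {TOY_LOADI, 0, 0, 0},  /* r0 = 0 (running sum) */
        {TOY_LOADI, 1, 0, 0},  /* r1 = n (patched below) */
        {TOY_LOADI, 2, 0, 1},  /* r2 = 1 */
        {TOY_LOADI, 3, 0, 0},  /* r3 = 0 */
        {TOY_BEQ, 1, 3, 3},    /* if r1 == 0, jump to the halt */
        {TOY_ADD, 0, 0, 1},    /* r0 = r0 + r1 */
        {TOY_SUB, 1, 1, 2},    /* r1 = r1 - r2 */
        {TOY_BEQ, 3, 3, -4},   /* always taken: back to the loop test */
        {TOY_HALT, 0, 0, 0}
    };
    struct toy_machine m = {{0, 0, 0, 0}, 0};
    program[1].c = n;
    toy_run(&m, program, 9, 10000);
    return m.registers[0];
}
```

Calling `toy_sum_to(4)` runs the loop four times and leaves the sum in register 0. A real emulator would add memory, interrupts and context switching on top of a loop like this, but the core dispatch stays a single `switch`.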
Highlights of the project include
- The One-Page CPU - It is called the 'One-Page CPU' because you can fit the specification on an 8.5 by 11 sheet of paper without margins using 8 point Consolas font, and still have lots of room to spare. It is intended to be an easy to emulate compile target that supports context switching and interrupts.
- A Programmer's Reference Manual for assembly language programming with the One-Page CPU. This discusses programming in the L1 and L2 languages.
- The One-Page CPU L1 Assembly Language Specification - RECC links multiple .l2 files into an .l1 file. An .l1 file could be considered the compiler toolchain equivalent of a binary, but it is just human readable assembly with all symbols resolved and a very strict syntax.
- The One-Page CPU L2 Assembly Language Specification - RECC compiles .c files to .l2 files, which are similar to object files, but they are human readable assembly files with possibly unresolved abstract symbols.
- A toy microkernel compiled with RECC running in an emulated One-Page CPU in the browser.
- An LL grammar for the C programming language - For writing recursive descent parsers. This was hand-converted from an LR grammar, and I haven't performed any rigorous analysis to verify its soundness, but I was able to write a fairly complete compiler with it.
- Static filesystem compilation - The RECC compiler currently supports the ability to statically compile files from your host computer into an object file that implements an ext-like filesystem that can be reached through stdio's fopen. Only reading the filesystem is currently supported.
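Because the statically compiled filesystem is reached through stdio's fopen, code that reads from it should look like ordinary C file reading. The sketch below uses only standard stdio calls; the helper name `count_bytes` is made up for this example, and the steps needed to build files into the object file (and any path conventions RECC expects) are not shown here.

```c
#include <stdio.h>

/* Hypothetical helper: returns the number of bytes in the file at 'path',
   or -1 if the file cannot be opened for reading. */
long count_bytes(const char * path){
    FILE * f = fopen(path, "r");
    long count = 0;
    if(f == NULL){
        return -1;
    }
    /* Read one character at a time until end of file. */
    while(fgetc(f) != EOF){
        count++;
    }
    fclose(f);
    return count;
}
```

Since only reading is currently supported, a sketch like this sticks to the "r" mode and never writes.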
The compiler is still quite far from being a production compiler, but it is able to compile a small microkernel which is demoed here. The compiler is not yet self-hosting, mainly due to missing support for the subset of the C standard library that it relies on. Structs are supported. Float types are not. Variadic functions are supported. It strives to one day be an ISO C89 compliant C compiler. The compiler is able to compile on my machine with gcc or clang without any errors or warnings using very strict compile flags. It also passes all test cases with a clean valgrind report.
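As a rough illustration of code that stays inside the feature set described above (structs and variadic functions, no floating point), here is a small C89 sketch. The names `int_range` and `range_of` are invented for this example, and it has not been checked against RECC itself; it only shows the kind of code the description suggests should be in scope.

```c
#include <stdarg.h>

/* Illustrative struct: the smallest and largest of a set of ints. */
struct int_range {
    int min;
    int max;
};

/* Variadic function: takes 'count' int arguments after 'count' and
   returns their range. Returns {0, 0} when count is 0. */
struct int_range range_of(unsigned int count, ...){
    struct int_range r;
    va_list args;
    unsigned int i;
    r.min = 0;
    r.max = 0;
    va_start(args, count);
    for(i = 0; i < count; i++){
        int v = va_arg(args, int);
        if(i == 0 || v < r.min){
            r.min = v;
        }
        if(i == 0 || v > r.max){
            r.max = v;
        }
    }
    va_end(args);
    return r;
}
```

The only header it needs is stdarg.h, which is also one of the few headers RECC's own source depends on.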
- Updated 2020-01-24: I have completely run out of time and money to work on this project. Hopefully, I can return to working on this project again in the future when things improve.
- Updated 2019-01-06: Pushed updates to canto-iii branch with new work on arbitrary precision integer math library. Next step will be using this to finish floating point number parsing.
- Updated 2018-10-27: Oh, hi there Hacker News. I'll be setting aside some time to work on RECC for the month of December (probably on floating point) and then sending out an update in Jan 2019.
- Updated 2018-01-01: Added a number of test cases and parser improvements. Parser is nearly fully complete with the exception of floating point numbers.
- Updated 2017-12-01: I'm still alive, I've just been busy for the last 8 months or so. Currently shifting my attention back to doing some work on RECC.
- Updated 2017-03-24: Continued work on 'canto-iii' branch. Added a number of adjustments for syntax model; Much more accurate model of struct, union and enum; Parser now handles more integer constants correctly.
- Updated 2017-01-01: Typesystem improvements have evolved into building an abstract syntax model; Most work is now taking place on the 'canto-iii' branch; This is a massive refactoring arc and it will be a long time before it can be merged into master.
- Updated 2016-10-09: Continued work on new typesystem; Ability to easily cross-build some of compiler against a previous version.
- Updated 2016-08-04: More automation in bootstrapping process; Use of typesafe ids for type system; Completed parser error messages.
- Updated 2016-06-05: Improvements to bootstrapping process; Easier generation of code for data structures; Work on type system.
- Updated 2016-04-02: Started work on refactoring type system.
- Updated 2016-02-06: Added a regex engine, and replaced the lexer code to use the regex engine API.
- Updated 2015-12-25: Added some support for virtual memory. Added open source license.
- Updated 2015-09-29: Ability to link multiple .l2 files into another .l2 file. Refactoring of project layout.
- Updated 2015-08-28: Preprocessor supports recursive function macros; Improvements to build process.
- Updated 2015-08-01: Compiler is 5 times faster when parsing and code generating its own parser; Added defined behaviour for division by zero in CPU spec.
- Updated 2015-07-01: Compiler can cross compile and run test cases with its own preprocessor, lexer, parser, and code generator. Emulated compiler is slow, but output is correct so far for small programs.
- Updated 2015-07-01: Added a demo showing the compiler compiling a small program using the cross-compiled version of itself.
- Updated 2015-06-14: Added support for emulation of an ext-like filesystem; Preprocessor supports most cases of function macros; Updates to CPU specification.
- Updated 2015-05-09: Refactor to properly support lvalues and rvalues; Proof of concept malloc support; sizeof support.
- Updated 2015-04-11: Added emulators for Java and Python; Added sw directive; Steps toward supporting malloc.
- Updated 2015-02-26: Added an emulator for the C programming language that can 'run' .l1 files (including the kernel).
- Updated 2015-02-10: Full support for typedef; Fixed some cases of complex declarators; anonymous tagged type instances; Some support for multi-dimensional constant array initializers. RECC is now able to successfully parse and code generate most of its own source code (although the generated code likely has many errors). It cannot link its own source code yet, due to missing standard library support.
- Updated 2015-01-07: Support most basic cases of typedef; Updates to CPU spec; Instruction re-writing in linker for large symbol values; Ternary operator support.
The full collection which includes the compiler, linker, a javascript emulator, microkernel and the above specifications is here:
I've decided that this project is fun enough to work on that I'll make a personal (non-legally binding) commitment to support this project until one of the following things happens:
- I'm dead.
- I'm no longer mentally capable of supporting this project.
- December 25, 2044 at 5:00pm GMT -4
I'll define support as "Giving at least an acknowledgement (and hopefully a helpful response) to at least 95% of inquiries within 7 days."
Direct inquiries to recc[at]robertelder.org