# Copyright 2016 Robert Elder Software Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you may not
# use this file except in compliance with the License. You may obtain a copy
# of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
# License for the specific language governing permissions and limitations
# under the License.

1. INTRODUCTION

1.1 This document is a guide to assembly language programming for the 'One Page CPU'. The One Page CPU is a specification for a very simple CPU. Its primary design goal is to be easy to emulate in software while still providing a practical environment for the programs that run on it. The CPU uses 32-bit addresses and registers. It supports context switching, timer interrupts, I/O interrupts and 14 different machine instructions.

Performance, in terms of either execution speed or memory consumption, has not been a primary consideration for this project. The justification is the observation that it is typically more difficult to go from a complicated, high-performance design to any kind of simple design than it is to go from a simple, inefficient design to a complicated, efficient one. Make it work, make it right, make it fast.

1.2 To explain assembly language programming for the One Page CPU, it helps to first describe how standard compilers compile and execute .c files. In a typical compiler (such as gcc or clang), the code in a .c file is first preprocessed into 'pure C' code, typically with a .i file extension. This step resolves #include directives by replacing them with the contents of the included files. Macros defined with #define are also expanded, and their definitions are substituted directly throughout the code.
After preprocessing, the pure C files are used to generate object files (.o files). One or more object files can then be linked together to produce an executable program or a library.

1.3 A new cross compiler, the RECC compiler, was created to cross compile C programs for the One Page CPU. The RECC compiler still preprocesses .c files into .i files. Instead of generating object code, however, it generates files with the extension '.l2'. These files are a lot like object files, but they contain human readable assembly and relative symbol offsets. L2 files may contain linking information, symbol information and comments, and their whitespace constraints are relaxed. The 'l2' refers to the 'One Page CPU L2 language' specification.

Once multiple .l2 files have been generated, they can be linked together to produce a single .l1 file. The 'l1' refers to the 'One Page CPU L1 language'. The L1 language is a proper subset of the L2 language. In L1, all symbols are resolved to fixed addresses, comments are not allowed, and strict whitespace constraints are enforced. An .l1 file can be thought of as a very portable executable file whose format is a human readable assembly language.

1.4 One of our primary goals is to run C programs inside emulators written in other programming languages. We therefore need a way to put L1 files into a format that can easily be run elsewhere. An additional step creates programming-language-specific L0 files, so that each emulator no longer has to parse and assemble L1 files itself. This step is called 'preloading' an .l1 file for a specific target programming language. It produces a file whose extension is .l0 followed by the target language's usual extension (for example program.l0.js, program.l0.c or program.l0.py). The file conforms to the syntax of the target language. It contains a numeric representation of the machine instructions, offsets and memory layout, which can be processed and emulated in a few hundred lines of code in the target language.
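To make the "few hundred lines" claim more concrete, here is a minimal Python sketch of the fetch-decode-execute loop that an L0 emulator might contain. It uses the register numbering from section 2.2.1 and the instruction layouts from section 3. It is illustrative only and rests on several assumptions that do not come from this document:

- The program is given as a plain Python list of 32-bit words, loaded at address 0. The real L0 file layout is not described here.
- Memory is a dictionary keyed by byte address.
- ll zero-extends its 16-bit immediate.
- Only add, sub, loa, sto, beq, blt and ll are implemented.
- Interrupts, timers and UART1 are not modelled.

The names `run` and `sign9` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
PC, SP, FP, ZR, FR, WR = range(6)  # register numbers from section 2.2.1


def sign9(value):
    """Interpret a 9-bit field as a two's complement integer (-256..255)."""
    return value - 512 if value & 0x100 else value


def run(program, max_steps=10_000):
    """Run a list of 32-bit instruction words loaded at address 0.

    Sketch only: supports add, sub, loa, sto, beq, blt and ll, and stops
    when bit 0 of FR is set or after max_steps instructions.
    """
    mem = {4 * i: word for i, word in enumerate(program)}  # byte address -> word
    regs = [0] * 512
    regs[WR], regs[FR] = 0x4, 0x200  # startup values from section 2.2.7
    for _ in range(max_steps):
        if regs[FR] & 1:  # bit 0 of FR halts the processor
            break
        word = mem.get(regs[PC], 0)
        regs[PC] = (regs[PC] + 4) & MASK32  # PC now points at the next instruction
        op = word >> 27
        x, y, z = (word >> 18) & 0x1FF, (word >> 9) & 0x1FF, word & 0x1FF
        if op == 0b00000:    # add
            regs[x] = (regs[y] + regs[z]) & MASK32
        elif op == 0b00001:  # sub
            regs[x] = (regs[y] - regs[z]) & MASK32
        elif op == 0b00111:  # loa: rX = memory[rY]
            regs[x] = mem.get(regs[y], 0)
        elif op == 0b01000:  # sto: memory[rX] = rY
            mem[regs[x]] = regs[y]
        elif op in (0b01011, 0b01100):  # beq / blt (unsigned compare)
            taken = regs[x] == regs[y] if op == 0b01011 else regs[x] < regs[y]
            if taken:  # distance is counted in words from the next instruction
                regs[PC] = (regs[PC] + 4 * sign9(z)) & MASK32
        elif op == 0b01101:  # ll (assumed to zero-extend the immediate)
            regs[x] = word & 0xFFFF
        else:
            raise NotImplementedError(f"opcode {op:05b} is not in this sketch")
    return regs, mem


# Count r1 (register 6) up to 5, then halt by setting bit 0 of FR.
program = [
    0x681C0005,  # ll r2 0x5
    0x68200001,  # ll r3 0x1
    0x00180C08,  # add r1 r1 r3
    0x60180FFE,  # blt r1 r2 -2   (back to the add)
    0x00100808,  # add FR FR r3   (0x200 + 1 sets bit 0: halt)
]
regs, _ = run(program)
print(regs[6])  # 5
```

Because the PC is incremented before the instruction executes, writing to PC through add or loa needs no special case. Reading PC as an operand automatically yields the instruction's address plus 4.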
1.5 Summary:

    main.c         code.c         foo.c             C code
      |              |              |
      v              v              v
    main.i         code.i         foo.i             Preprocessed
      |              |              |
      v              v              v
    main.l2        code.l2        foo.l2            'Object Files'
          \          |          /
           v         v         v
                program.l1                          'Portable Executable'
                     |
      +--------------+--------------+
      |              |              |
      v              v              v
program.l0.js  program.l0.c   program.l0.py         'Native Executable'

2. REGISTERS

2.1 Register Names

The One Page CPU instruction set uses 512 registers. Six of these registers have 'special' names (PC, SP, FP, ZR, FR, WR), and the other 506 registers are named r1, r2, ... r506. Each register is referred to by only one name. There are no restrictions on which registers can be used with a given instruction.

2.2 Special Registers

2.2.1 The PC register is the familiar program counter. It holds the address of the next instruction to be fetched. If the PC register is used as an argument to an instruction, it evaluates to the address of the instruction being executed plus 4. This is because the PC is incremented after each instruction fetch, in anticipation of fetching the next instruction to execute.

In the encoding of machine instructions, the special registers are given the following numeric values: PC = 0, SP = 1, FP = 2, ZR = 3, FR = 4 and WR = 5. The registers r1 to r506 are given the numeric values 6 to 511.

2.2.2 The SP register is the stack pointer, which is of particular importance to any compiler targeting the One Page CPU architecture. The CPU also uses the stack pointer automatically during an interrupt, or a return from an interrupt, to store or retrieve the value of the program counter.

2.2.3 The FP register is the frame pointer register. It matters only to the compiler, as a convenience that makes function calls easier. By convention, the frame pointer always points to the value of the previous frame pointer. If there is no previous frame pointer, it contains the value 0.

2.2.4 The ZR register is the zero register. It stores the value 0.
Changing the value in the zero register is not recommended.

2.2.5 The FR register is the flags register. It stores various flags that affect execution of the CPU:

    Bit 0:      Writing 1 to this bit halts the processor. No further instructions or interrupts execute.
    Bit 1:      Global interrupt enable. All interrupts are enabled when 1 and disabled otherwise.
    Bit 2:      When set to 1, atomically sets bit 1 of FR to 1, bit 2 of FR to 0, PC to [SP], and SP to SP + WR.
    Bit 3:      TIMER1 interrupt enable. See TIMER1_PERIOD.
    Bit 4:      TIMER1 interrupt asserted. CPU sets to 1. User must set to 0.
    Bit 5:      UART1_OUT interrupt enable. Used for detecting when the CPU has set bit 9 of FR.
    Bit 6:      UART1_OUT interrupt asserted. CPU sets to 1. User must set to 0.
    Bit 7:      UART1_IN interrupt enable. Used for detecting when the CPU has set bit 10 of FR.
    Bit 8:      UART1_IN interrupt asserted. CPU sets to 1. User must set to 0.
    Bit 9:      UART1_OUT ready. Indicates whether UART1_OUT is ready. CPU sets to 1. User must set to 0.
    Bit 10:     UART1_IN ready. Indicates whether UART1_IN contains input data. CPU sets to 1. User must set to 0.
    Bits 11-31: Reserved for future use.

2.2.6 The WR register is the word register. It stores the size in bytes of a word for the One Page CPU, which is 4.

2.2.7 At CPU startup, WR = 0x4 and FR = 0x200, so only bit 9 of FR (UART1_OUT ready) is set. All other registers are initialized to 0.

3. INSTRUCTIONS

3.1 The One Page CPU supports 14 machine instructions. Space for 2 additional instructions was intentionally left to provide room for future enhancement. All machine instructions are 32 bits wide.

Particular emphasis has been placed on making the instructions themselves as simple as possible. Modern instruction sets commonly have 10 different variations of an addition instruction, depending on word size, optional immediate values or register classes.
Each of the 14 One Page CPU instructions has only one valid syntax. The only variability is in the registers or immediate values, which are always part of the instruction.

3.2 READING THIS SECTION

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| foo rA rB rC  |  11111  | AAAAAAAAA | BBBBBBBBB | CCCCCCCCC |
+-------------------------------------------------------------+

This section uses diagrams like the one above to describe the layout of the One Page CPU machine instructions. The contrived example above describes a fictitious instruction 'foo'. When assembled to a 32-bit word, this machine instruction would have bits 31 to 27 set to the value '11111'. Bits 26 to 18 would hold the number identifying the register the user specified for rA. The register rA could be any of PC, SP, FP, ZR, FR, WR or r1 to r506. The same goes for registers rB and rC. If register rA was 'PC', bits 26 to 18 would have the value '000000000'. If register rC was 'r1', bits 8 to 0 would have the value '000000110'.

3.3 ADD

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| add rX rY rZ  |  00000  | XXXXXXXXX | YYYYYYYYY | ZZZZZZZZZ |
+-------------------------------------------------------------+

Performs a 32-bit unsigned addition of the contents of registers rY and rZ, and stores the result in register rX.

Examples:

    add r1 r2 r3;       Simple addition
    add SP SP WR;       Pop the stack without retrieving the value
    add PC r1 ZR;       Branch to the address in r1

3.4 SUB

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| sub rX rY rZ  |  00001  | XXXXXXXXX | YYYYYYYYY | ZZZZZZZZZ |
+-------------------------------------------------------------+

Performs a 32-bit unsigned subtraction rY - rZ, and stores the result in register rX.
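The three-register layout shared by add and sub (and by mul, div, and and or) can be illustrated with a small Python encoder. This is a sketch, not part of the RECC tool chain. The names `OPCODES`, `SPECIAL`, `reg_number` and `encode3` are hypothetical. The opcode values and register numbers come from the diagrams in this section and from section 2.2.1.

```python
# Opcodes (bits 31..27) of the three-register instructions.
OPCODES = {"add": 0b00000, "sub": 0b00001, "mul": 0b00010,
           "div": 0b00011, "and": 0b00100, "or": 0b00101}

# Special register numbers; r1..r506 are numbered 6..511.
SPECIAL = {"PC": 0, "SP": 1, "FP": 2, "ZR": 3, "FR": 4, "WR": 5}


def reg_number(name):
    """Return the 9-bit number that encodes register `name`."""
    if name in SPECIAL:
        return SPECIAL[name]
    if name.startswith("r") and name[1:].isdigit() and 1 <= int(name[1:]) <= 506:
        return int(name[1:]) + 5
    raise ValueError(f"unknown register: {name}")


def encode3(mnemonic, rx, ry, rz):
    """Assemble e.g. encode3('add', 'r1', 'r2', 'r3') into a 32-bit word."""
    return ((OPCODES[mnemonic] << 27)
            | (reg_number(rx) << 18)
            | (reg_number(ry) << 9)
            | reg_number(rz))


print(hex(encode3("sub", "SP", "SP", "WR")))  # 0x8040205
print(hex(encode3("add", "PC", "PC", "PC")))  # 0x0, the same word as 'dw 0x0'
```

Every field sits at a fixed position, so decoding is the reverse operation: shift the word right by 27, 18, 9 or 0 bits and mask the register fields with 0x1FF.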
Examples:

    sub r1 r2 r3;       Simple subtraction
    sub SP SP WR;       Push the stack without storing anything

3.5 MUL

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| mul rX rY rZ  |  00010  | XXXXXXXXX | YYYYYYYYY | ZZZZZZZZZ |
+-------------------------------------------------------------+

Performs a 32-bit unsigned multiplication of rY and rZ, and stores the result in register rX. Overflow is discarded.

Examples:

    mul r1 r1 r3;

3.6 DIV

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| div rX rY rZ  |  00011  | XXXXXXXXX | YYYYYYYYY | ZZZZZZZZZ |
+-------------------------------------------------------------+

Performs a 32-bit unsigned division of rY by rZ, and stores the quotient in register rX.

Examples:

    div r1 r2 r3;

3.7 AND

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| and rX rY rZ  |  00100  | XXXXXXXXX | YYYYYYYYY | ZZZZZZZZZ |
+-------------------------------------------------------------+

Computes the bitwise logical 'and' of the contents of rY and rZ, and stores the result in rX.

Examples:

    and r1 r2 r3;

3.8 OR

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| or rX rY rZ   |  00101  | XXXXXXXXX | YYYYYYYYY | ZZZZZZZZZ |
+-------------------------------------------------------------+

Computes the bitwise logical 'or' of the contents of rY and rZ, and stores the result in rX.

Examples:

    or r1 r2 r3;

3.9 NOT

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| not rX rY     |  00110  | XXXXXXXXX | YYYYYYYYY | Reserved  |
+-------------------------------------------------------------+

Computes the bitwise logical 'not' of the contents of rY, and stores the result in rX.
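All of the arithmetic and logical instructions operate on unsigned 32-bit register values. An emulator written in a language with unbounded integers, such as Python, therefore has to truncate results explicitly. Below is a minimal sketch of mul, div, and, or and not. It assumes the operands are already in the range 0 to 2^32 - 1. The function names are hypothetical; the trailing underscores only avoid Python keywords. This document does not define division by zero, so the sketch simply lets Python raise ZeroDivisionError.

```python
MASK32 = 0xFFFFFFFF  # registers hold unsigned 32-bit values


def mul(y, z):
    return (y * z) & MASK32  # overflow is discarded


def div(y, z):
    return y // z  # unsigned quotient; z == 0 raises ZeroDivisionError here


def and_(y, z):
    return y & z


def or_(y, z):
    return y | z


def not_(y):
    return ~y & MASK32  # Python's ~y is negative, so mask back to 32 bits


print(hex(mul(0x10000, 0x10000)))  # 0x0: only the low 32 bits survive
print(hex(not_(0)))                # 0xffffffff
```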
Examples:

    not r1 r2;

3.10 LOA

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| loa rX rY     |  00111  | XXXXXXXXX | YYYYYYYYY | Reserved  |
+-------------------------------------------------------------+

Loads the contents of the memory location given in rY into register rX.

Examples:

    dw 0x41;            The value 'A'
    ll r2 0x1234;       The address of the value 'A' (assuming the dw above is at 0x1234)
    loa r1 r2;          Load 'A' into r1
    loa PC PC;          Branch to the address in the next word
    dw 0x0DDFC;         The address of something interesting

3.11 STO

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| sto rX rY     |  01000  | XXXXXXXXX | YYYYYYYYY | Reserved  |
+-------------------------------------------------------------+

Stores the value in rY in the memory location given in rX.

Examples:

    ll r1 0x41;         Load 'A' into r1
    sto SP r1;          Store on stack
    sub SP SP WR;       Push stack pointer
    ll r1 0x42;         Load 'B' into r1
    sto SP r1;          Store again on stack

3.12 SHR

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| shr rX rY     |  01001  | XXXXXXXXX | YYYYYYYYY | Reserved  |
+-------------------------------------------------------------+

Shifts the contents of rX to the right by the number of bits given in rY (a logical shift). Zeros are shifted in on the left.

Examples:

    shr r1 r2;

3.13 SHL

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| shl rX rY     |  01010  | XXXXXXXXX | YYYYYYYYY | Reserved  |
+-------------------------------------------------------------+

Shifts the contents of rX to the left by the number of bits given in rY (a logical shift). Zeros are shifted in on the right.
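The following sketch shows the two shift instructions as an emulator might implement them. Each function returns the new value of rX. The document does not say what happens when rY is 32 or more. This sketch assumes the result is 0 in that case. shl checks for this explicitly so that Python does not build a huge intermediate integer. A C emulator needs the same check, because in C shifting a 32-bit value by 32 or more bits is undefined behaviour. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def shr(x, y):
    """Logical right shift: zeros enter on the left."""
    return x >> y  # x is non-negative, so Python's >> is already logical


def shl(x, y):
    """Logical left shift: zeros enter on the right, high bits are lost."""
    if y >= 32:  # assumption: shifting everything out leaves 0
        return 0
    return (x << y) & MASK32


print(hex(shl(0x80000001, 1)))   # 0x2
print(hex(shr(0x80000000, 31)))  # 0x1
```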
Examples:

    shl r1 r2;

3.14 BEQ

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| beq rX rY i   |  01011  | XXXXXXXXX | YYYYYYYYY | iiiiiiiii |
+-------------------------------------------------------------+

Performs a relative branch if the contents of register rX are equal to the contents of register rY. The branch distance is an immediate decimal number, positive or negative. It gives the number of instructions (words) to jump forward or backward, counted from the next instruction. A distance of 0 is effectively a no-op, since execution simply continues with the next instruction. The immediate value is encoded in bits 8 to 0 as a 9-bit two's complement integer, so it must lie in the range -256 to 255.

Examples:

    beq ZR ZR 0;        No-op; advance to next instruction
    beq ZR ZR -1;       Loop forever on this instruction
    beq r1 r2 39;       Conditional jump over the next 39 instructions
    beq ZR ZR -2;       Re-execute the previous instruction

3.15 BLT

                  31...27   26.....18   17......9   8.......0
+-------------------------------------------------------------+
| blt rX rY i   |  01100  | XXXXXXXXX | YYYYYYYYY | iiiiiiiii |
+-------------------------------------------------------------+

Behaves exactly like the beq instruction, except that the branch only occurs if rX is less than rY when both are treated as unsigned integers.

Examples:

    blt r1 r2 39;       Conditional jump over the next 39 instructions

3.16 LL

                  31...27   26.....18   17....16   15.............0
+-------------------------------------------------------------------+
| ll rX 0xN     |  01101  | XXXXXXXXX | Reserved | NNNNNNNNNNNNNNNN |
+-------------------------------------------------------------------+

Loads the immediate hexadecimal value 0xN into register rX. The value 0xN can be at most 16 bits in size.

Examples:

    ll r1 0x41;         Load 'A' into r1

4. LINKER AND LOADER DIRECTIVES

4.1 SW

The sw directive (skip words) is used while loading machine code to tell the loader program to 'skip words'.
For example, to define a large region of blank space, we could use the sw directive instead of defining a large number of zeros with the dw directive:

    ...
    0x0001 0000    add PC r1 r2;     Machine instructions take up 4 bytes...
    0x0001 0004    sw 0x100;         Skip 0x100 words = 0x400 bytes
    0x0001 0404    sub SP SP WR;     0x400 bytes of unused space above this instruction
    ...

4.2 DW

The dw directive (define word) tells the linker to emit a 32-bit word with an explicit value. This allows arbitrary data to be defined:

    add PC PC PC;       This instruction assembles to a 32-bit word of all 0s.
    dw 0x0;             This assembles to the same word as 'add PC PC PC'.
    dw 0x41;            The letter 'A'

4.3 OFFSET

L1 and L2 files use an offset declaration to state whether or not the code in that file is relocatable. If the code is not relocatable, the declaration gives the offset at which the code must be loaded. The OFFSET directive should be the first statement in a file. An offset declaration has one of three syntaxes:

4.3.1 'OFFSET RELOCATABLE'

The code in this file can be relocated anywhere. Any addresses in this file must be stated abstractly, using identifiers that a later linking step will replace.

4.3.2 'OFFSET END'

The code in this file must be placed at the end of the resulting file. At most one file can use this type of offset directive.

4.3.3 'OFFSET 0xNNNNNNNN'

The code in this file must be linked so that the resulting executable is loaded at address 0xNNNNNNNN. Attempting to link multiple non-relocatable L2 files to overlapping memory locations causes an error.

4.4 IMPLEMENTS, REQUIRES, EXTERNAL, INTERNAL

Linkage declarations specify information about relative symbols in L2 files. A symbol declared as INTERNAL will not be visible outside the current file.
A symbol declared as EXTERNAL either refers to a symbol implemented outside the current file, or is implemented in the current file and visible in other files. Besides specifying a symbol's visibility, you must also declare whether the symbol is implemented, required or both.

Examples:

    IMPLEMENTS INTERNAL foo;
        The symbol 'foo' is not visible outside this file. An internal symbol that is implemented but not required could safely be removed, since it is never used.
    IMPLEMENTS, REQUIRES EXTERNAL boo;
        The symbol 'boo' is implemented and used in this file, and is also visible externally.

4.5 Labels

Labels can be used in L2 files only. They abstractly represent addresses that will change after the linking step. A label consists of a line that begins with a valid C identifier and ends with a colon:

Example:

    ...
    add r1 r2 r3;
    label1:
    anotherlabel:
    sub r1 r2 r3;
    label2:
    sub r1 r2 r3;
    ...

In the above example, 'label1' and 'anotherlabel' both describe the same address.

In the L2 language only, labels can be used as arguments to instructions. Once the code has been linked to a fixed address, the labels are replaced with literal values. The following examples show instructions that can take labels as arguments:

    beq ZR ZR do_again;
    blt ZR ZR do_again;
    dw global_var;
    ll r1 foo;

Labels can represent addresses anywhere in the 32-bit address space. The linker therefore rewrites the beq, blt and ll instructions, expanding them into equivalent instruction sequences that accomplish the same thing. This matters because the expanded sequences rely on the assumption that the stack pointer always points to the top of the stack.

4.6 FUNCTION, VARIABLE, CONSTANT

These directives describe the start and end of regions of machine instructions. They determine the permissions assigned to these regions when they are loaded into memory, so that virtual memory can work properly.

4.7 START, END

Used together with several other directives to indicate start and end addresses.

4.8 IMPLEMENTED, REQUIRED

Used to indicate that certain symbols are either required or implemented. Intended for use when loading shared libraries. These directives may be removed in the future.

4.9 UNRESOLVED

Used to indicate that a symbol in a .l0 file is unresolved. The loader must then find the address of a symbol with the matching name in a shared library.

4.10 STRING

Specifies 4 bytes of a string, stored in little-endian byte order. Used to keep track of symbol names inside .l0 files, and helps resolve unresolved symbols.

4.11 REGION

Used to indicate that a certain region in the program has certain properties, such as permissions. Keeps symbols with different permissions grouped together.

4.12 PERMISSION

Used with the REGION directive to specify the read, write or execute permissions associated with a region of code.

The sha1sum of the lines above is 9926c429b7ed7ec68db6ced5919f79f4ae96750a