| 1 | +# Little Functional Language Compiler |
| 2 | + |
| 3 | +[](https://circleci.com/gh/andycraig/functional-compiler/tree/master) |
| 4 | + |
| 5 | +A compiler for LFL ('Little Functional Language'), an original functional programming language. LFL has Lisp-style syntax, first-class functions and closures, and support for recursion. The compiler is hand-written in C and Assembly. |
| 6 | + |
| 7 | +I created LFL as a learning exercise, after wondering how anonymous functions might be implemented. |
| 8 | + |
| 9 | +[Overview](#overview) | [Building](#building) | [Usage example](#usage-example) | [Feature showcase](#feature-showcase) | [Keywords](#keywords) | [Built-in functions](#built-in-functions) | [Limitations](#limitations) | [References](#references) | [Testing](#testing) |
| 10 | + |
| 11 | +## Overview |
| 12 | + |
| 13 | +The compiler takes in a file of LFL code, processes it and outputs the corresponding Assembly code. |
| 14 | + |
| 15 | +Here's an example of an LFL program, which creates an anonymous function that adds 1 to its argument, creates an alias `inc` for that function, and applies it to 1 to yield 2: |
| 16 | + |
| 17 | +``` |
| 18 | +(let inc (λ x (plus x 1)) |
| 19 | + (inc 1)) |
| 20 | +``` |
| 21 | + |
| 22 | +The compiler converts this to: |
| 23 | + |
| 24 | +``` |
| 25 | +; Assembly code generated by compiler |
| 26 | + global main |
| 27 | + extern printf, malloc ; C functions |
| 28 | + extern make_closure, call_closure ; built-in functions |
| 29 | + extern plus, minus, equals ; standard library functions |
| 30 | + |
| 31 | + section .text |
| 32 | +_f0: |
| 33 | + push rbp |
| 34 | + mov rbp, rsp |
| 35 | + sub rsp, 40 ; memory for local variables |
| 36 | + mov rax, rdi ; pointer to vector of arguments |
| 37 | + |
| 38 | +... ET CETERA |
| 39 | + |
| 40 | + call printf |
| 41 | + mov rax, 0 ; exit code 0 |
| 42 | + leave |
| 43 | + ret |
| 44 | + |
| 45 | + section .data |
| 46 | +message: db "%d", 10, 0 ; 10 is newline, 0 is end-of-string |
| 47 | +``` |
| 48 | + |
| 49 | +The steps the compiler goes through are: |
| 50 | + |
| 51 | +1. **Tokenising:** The text file of LFL code is read in and converted into a list of symbols. |
| 52 | +2. **Parsing:** An Abstract Syntax Tree (AST) of function calls, constants and variable names is generated from the list of symbols. |
| 53 | +3. **Processing special forms:** Nodes in the AST with keywords (e.g., `λ`/`lambda`, `if`) are converted into special AST nodes. |
| 54 | +4. **Processing lambdas:** The bodies of lambda expressions in the AST are pulled up to the global level and named, and the lambda expressions themselves are replaced with calls to a special function `make_closure`. |
| 55 | +5. **Emitting Assembly code:** The AST is traversed, and at each node the corresponding Assembly code is written to the output file. |
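
The closure handling in step 4 can be pictured with a small C sketch. This is an illustration under assumptions, not the compiler's actual runtime: the `closure` struct layout, the `lifted_fn` signature, and the names `sketch_make_closure`, `sketch_call_closure`, `lifted_inner` and `make_adder` are all hypothetical, and the real `make_closure` and `call_closure` may take different arguments. It shows the general idea of closure conversion: the lambda body becomes a named global function that receives its captured variables explicitly, and the lambda expression becomes a call that packages that function with the captured values.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed cap, chosen to mirror the "up to three free variables" limit. */
#define MAX_FREE_VARS 3

/* Hypothetical signature for a lambda body lifted to the global level:
   it receives the captured values (env) plus one ordinary argument. */
typedef long (*lifted_fn)(const long *env, long arg);

/* Hypothetical closure record: code pointer plus captured values. */
typedef struct closure {
    lifted_fn code;
    int n_free;
    long env[MAX_FREE_VARS];
} closure;

/* Package a lifted function with copies of its free variables.
   Returns NULL if allocation fails. */
closure *sketch_make_closure(lifted_fn code, int n_free, const long *free_vars) {
    assert(n_free >= 0 && n_free <= MAX_FREE_VARS);
    closure *c = malloc(sizeof *c);
    if (c == NULL) {
        return NULL;
    }
    c->code = code;
    c->n_free = n_free;
    for (int i = 0; i < n_free; i++) {
        c->env[i] = free_vars[i];
    }
    return c;
}

/* Apply a closure: pass its captured values along with the argument. */
long sketch_call_closure(const closure *c, long arg) {
    return c->code(c->env, arg);
}

/* Lifted body of (λ y (plus x y)); the free variable x arrives as env[0]. */
static long lifted_inner(const long *env, long y) {
    return env[0] + y;
}

/* Hand-converted (λ x (λ y (plus x y))): returns a closure over x. */
closure *make_adder(long x) {
    long free_vars[1] = { x };
    return sketch_make_closure(lifted_inner, 1, free_vars);
}
```

With these definitions, `sketch_call_closure(make_adder(5), 3)` evaluates to 8. The caller owns the returned closure and would release it with `free`.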
| 56 | + |
| 57 | +## Building |
| 58 | + |
| 59 | +Build with CMake (requires NASM to build `libstandard.a`): |
| 60 | + |
| 61 | +``` |
| 62 | +mkdir build |
| 63 | +cd build |
| 64 | +cmake .. |
| 65 | +make # Creates binary 'bin/compile' and static libraries 'lib/libclosure.a', 'lib/libstandard.a' |
| 66 | +``` |
| 67 | + |
| 68 | +## Usage example |
| 69 | + |
| 70 | +Let's use [`examples/example.code`](examples/example.code). Compile it to Assembly with: |
| 71 | + |
| 72 | +``` |
| 73 | +bin/compile examples/example.code example.asm |
| 74 | +``` |
| 75 | + |
| 76 | +This produces an Assembly file, `example.asm`. To turn it into something that can be run, assemble it with NASM and link the resulting object file against the two static libraries: |
| 77 | + |
| 78 | +``` |
| 79 | +nasm -f elf64 example.asm |
| 80 | +gcc -no-pie -o example example.o lib/libclosure.a lib/libstandard.a |
| 81 | +``` |
| 82 | + |
| 83 | +Now we can run it: |
| 84 | + |
| 85 | +``` |
| 86 | +./example # Should print '2' |
| 87 | +``` |
| 88 | + |
| 89 | +## Feature showcase |
| 90 | + |
| 91 | +Here's [an example program](examples/example_first_class.code) that shows closures and first-class functions in action: |
| 92 | + |
| 93 | +``` |
| 94 | +(let make-adder |
| 95 | + (λ x ; A function that returns a function |
| 96 | + (λ y (plus x y))) ; Creates a closure over x |
| 97 | + (let add5 (make-adder 5) |
| 98 | + (add5 3))) ; Prints '8' |
| 99 | +``` |
| 100 | + |
| 101 | +[This program](examples/example_mult.code) uses recursion to do integer multiplication: |
| 102 | + |
| 103 | +``` |
| 104 | +(defrec mult_aux ; Use 'defrec' to define a recursive function |
| 105 | + (lambda x y acc |
| 106 | + (if |
| 107 | + (equals x 1) |
| 108 | + acc |
| 109 | + (mult_aux (minus x 1) y (plus acc y))))) ; Recursion happens here |
| 110 | + |
| 111 | +(def mult ; Wrapper for recursive function |
| 112 | + (lambda x y (mult_aux x y y))) |
| 113 | + |
| 114 | +(mult 3 4) ; Prints '12' |
| 115 | +``` |
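
For readers more used to C, the accumulator pattern in the LFL program above corresponds roughly to the sketch below. The C names mirror the LFL ones, but the `long` type and the `assert` guard are assumptions added for illustration. Note that the recursion only bottoms out when `x` reaches exactly 1, so the sketch assumes `x` is at least 1.

```c
#include <assert.h>

/* Same shape as the LFL mult_aux: add y into acc until x counts down to 1. */
static long mult_aux(long x, long y, long acc) {
    if (x == 1) {
        return acc;
    }
    return mult_aux(x - 1, y, acc + y); /* recursion happens here */
}

/* Wrapper that seeds the accumulator with y, as the LFL mult does. */
long mult(long x, long y) {
    assert(x >= 1); /* assumed precondition: smaller x never reaches the base case */
    return mult_aux(x, y, y);
}
```

Here `mult(3, 4)` steps through accumulator values 4, 8 and 12 before returning 12, matching the LFL version.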
| 116 | + |
| 117 | +## Keywords |
| 118 | + |
| 119 | +### `λ` (aka `lambda`) |
| 120 | + |
| 121 | +Syntax is `(λ ARG_1 ... ARG_N RESULT)`. Example: |
| 122 | + |
| 123 | +``` |
| 124 | +(λ x y (plus x y)) ; Just adds x and y |
| 125 | +``` |
| 126 | + |
| 127 | +### `if` |
| 128 | + |
| 129 | +Syntax is `(if PREDICATE TRUE-CASE FALSE-CASE)`. Example: |
| 130 | + |
| 131 | +``` |
| 132 | +(if (equals x 0) 0 (plus x 1)) ; Adds 1 to x unless it was 0 |
| 133 | +``` |
| 134 | + |
| 135 | +### `let`/`letrec` |
| 136 | + |
| 137 | +Syntax is `(let ARG DEFINITION BODY)`. Example: |
| 138 | + |
| 139 | +``` |
| 140 | +(let x 7 |
| 141 | + (plus x 1)) ; Prints '8' |
| 142 | +``` |
| 143 | + |
| 144 | +`letrec` must be used when creating a recursive function. |
| 145 | + |
| 146 | +### `def`/`defrec` |
| 147 | + |
| 148 | +Syntactic sugar for a `let`/`letrec` wrapped around the last expression in the file. Example: |
| 149 | + |
| 150 | +``` |
| 151 | +(def double (λ x (plus x x))) |
| 152 | + |
| 153 | +(double 2) ; Prints '4' |
| 154 | +``` |
| 155 | + |
| 156 | +## Built-in functions |
| 157 | + |
| 158 | +[`standard.asm`](src/standard.asm) defines the following functions: |
| 159 | + |
| 160 | +- `plus` (e.g.: `(plus 3 2)`, which returns 5) |
| 161 | +- `minus` (e.g.: `(minus 3 2)`, which returns 1) |
| 162 | +- `equals` (e.g.: `(equals 3 3)`, which returns 1) |
| 163 | + |
| 164 | +... and that's it. |
| 165 | + |
| 166 | +## Limitations |
| 167 | + |
| 168 | +- Functions can have up to four arguments, and up to three of these can be free variables captured by closures. (These free variables include other functions produced by `let`/`letrec`/`def`/`defrec`.) The restriction exists because the compiler uses the fastcall calling convention, which passes the first few arguments in named registers and the rest on the stack, and I didn't implement passing arguments on the stack. |
| 169 | +- Integers (and functions) are the only data types. No floats, no strings, no lists ... You name it, it's not implemented. |
| 170 | +- Register use is about as inefficient as it could be: registers other than `rax` are almost unused, except when passing arguments. |
| 171 | +- No way of getting input from the user. |
| 172 | +- No form of output beyond the automatic printing of the last expression. |
| 173 | +- Rampant memory leaks (both the compiler and the Assembly code it outputs). |
| 174 | +- No run-time checking to verify that calls are made only on functions. |
| 175 | + |
| 176 | +## References |
| 177 | + |
| 178 | +I found these resources useful when learning how to implement closures: |
| 179 | + |
| 180 | +- [Closure conversion: How to compile lambda](http://matt.might.net/articles/closure-conversion/) |
| 181 | +- [Lecture 11: First-class Functions](https://course.ccs.neu.edu/cs4410/lec_lambdas_notes.html) of Northeastern University's course CS 4410/6410: Compiler Design |
| 182 | + |
| 183 | +## Testing |
| 184 | + |
| 185 | +To verify that the [code examples](examples) produce the expected results, install the [Bash Automated Testing System (BATS)](https://github.com/bats-core/bats-core) and run: |
| 186 | + |
| 187 | +``` |
| 188 | +bats test/test.bats |
| 189 | +``` |