I wrote Pico, a statically-typed language that compiles to native x86-64 Linux binaries. Lexer, recursive-descent parser, type checker, and code generator — all in 1,259 lines of Python. No dependencies, no libraries, no frameworks. Source code goes in, a real ELF binary comes out.
Pico is small and deliberately simple. It has integers, booleans, strings, functions with parameters and return types, if/else, while loops, and a handful of built-in print functions. Here's FizzBuzz:
```
fn main() -> int {
    let i: int = 1;
    while i <= 100 {
        if i % 15 == 0 {
            print_str("FizzBuzz");
        } else if i % 3 == 0 {
            print_str("Fizz");
        } else if i % 5 == 0 {
            print_str("Buzz");
        } else {
            print_int(i);
        }
        i = i + 1;
    }
    return 0;
}
```
And here's recursive Fibonacci — the classic compiler benchmark:
```
fn fib(n: int) -> int {
    if n <= 1 {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}

fn main() -> int {
    let i: int = 0;
    while i < 20 {
        print_int(fib(i));
        i = i + 1;
    }
    return 0;
}
```
That compiles to a 9,760-byte static ELF binary. No libc. No dynamic linking. Just raw
syscalls. fib(35) runs in 126ms.
Each stage is cleanly separated. The lexer produces tokens, the parser builds an AST, the
type checker validates it, and the code generator emits AT&T-syntax x86-64 assembly. Then
GNU as and ld handle the final assembly and linking.
The lexer is a straightforward single-pass scanner. It handles integers, string literals
with escape sequences, identifiers, keywords, and two-character operators like ==,
!=, and ->. Line comments start with //. Every token
carries its line and column number for error reporting.
There's nothing clever here, and that's the point. A lexer should be boring. The interesting decisions happen later.
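In Python, that scanner's shape is roughly this (a sketch with made-up names like `Token` and `tokenize`, not Pico's actual identifiers; string literals and escape handling are omitted for brevity):

```python
from dataclasses import dataclass

@dataclass
class Token:
    kind: str   # e.g. "INT", "IDENT", "KEYWORD", "OP"
    text: str
    line: int
    col: int

KEYWORDS = {"fn", "let", "if", "else", "while", "return", "int", "bool", "str"}
TWO_CHAR = {"==", "!=", "<=", ">=", "->"}

def tokenize(src: str) -> list[Token]:
    tokens, i, line, col = [], 0, 1, 1
    while i < len(src):
        c = src[i]
        if c == "\n":
            i += 1; line += 1; col = 1
        elif c.isspace():
            i += 1; col += 1
        elif src.startswith("//", i):        # line comment: skip to end of line
            while i < len(src) and src[i] != "\n":
                i += 1
        elif src[i:i+2] in TWO_CHAR:         # try two-char operators before one-char
            tokens.append(Token("OP", src[i:i+2], line, col))
            i += 2; col += 2
        elif c.isdigit():
            start, start_col = i, col
            while i < len(src) and src[i].isdigit():
                i += 1; col += 1
            tokens.append(Token("INT", src[start:i], line, start_col))
        elif c.isalpha() or c == "_":
            start, start_col = i, col
            while i < len(src) and (src[i].isalnum() or src[i] == "_"):
                i += 1; col += 1
            text = src[start:i]
            kind = "KEYWORD" if text in KEYWORDS else "IDENT"
            tokens.append(Token(kind, text, line, start_col))
        else:                                # single-char operator or punctuation
            tokens.append(Token("OP", c, line, col))
            i += 1; col += 1
    return tokens
```

The only subtlety is ordering: two-character operators must be tried before single characters, or `->` lexes as `-` then `>`.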
The parser uses recursive descent with precedence climbing for expressions. The grammar is
LL(1) with one exception: distinguishing assignment (x = expr;) from expression
statements requires one token of lookahead beyond the identifier.
Expression precedence, from lowest to highest:
- `or` — logical disjunction
- `and` — logical conjunction
- `==` `!=` `<` `>` `<=` `>=` — comparison
- `+` `-` — additive
- `*` `/` `%` — multiplicative
- `-` `not` — unary prefix

Each level calls the next-higher level for its operands, naturally encoding precedence in the call structure. This is one of those techniques that sounds complicated in a textbook but is trivially obvious in code.
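A sketch of that call structure in Python, one function per precedence level (illustrative names, operating on a pre-lexed token list rather than Pico's real token type):

```python
class ExprParser:
    """Each parse_* method handles one precedence level and calls the next-higher one."""
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self):
        tok = self.toks[self.pos]; self.pos += 1; return tok

    def parse_or(self):                      # lowest precedence
        node = self.parse_and()
        while self.peek() == "or":
            self.eat(); node = ("or", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_cmp()
        while self.peek() == "and":
            self.eat(); node = ("and", node, self.parse_cmp())
        return node

    def parse_cmp(self):
        node = self.parse_add()
        while self.peek() in ("==", "!=", "<", ">", "<=", ">="):
            op = self.eat(); node = (op, node, self.parse_add())
        return node

    def parse_add(self):
        node = self.parse_mul()
        while self.peek() in ("+", "-"):
            op = self.eat(); node = (op, node, self.parse_mul())
        return node

    def parse_mul(self):
        node = self.parse_unary()
        while self.peek() in ("*", "/", "%"):
            op = self.eat(); node = (op, node, self.parse_unary())
        return node

    def parse_unary(self):                   # highest precedence
        if self.peek() in ("-", "not"):
            op = self.eat(); return (op, self.parse_unary())
        return self.eat()                    # atom: literal or identifier
```

Because `parse_add` gets its operands from `parse_mul`, `1 + 2 * 3` parses as `1 + (2 * 3)` with no explicit precedence table at all.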
Pico has three types: int, bool, and str. The type
checker makes two passes over the program: first it registers all function signatures (so
functions can call each other in any order), then it checks each function body.
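The two-pass structure can be sketched like this (a hypothetical shape, not Pico's actual code; function bodies are reduced to just the calls they make):

```python
def check_program(functions):
    """functions: dict name -> (param_types, return_type, calls),
    where calls is a list of (callee, arg_types) the body makes."""
    # Pass 1: register every signature before checking any body,
    # so declaration order and mutual recursion don't matter.
    signatures = {name: (params, ret)
                  for name, (params, ret, _calls) in functions.items()}
    # Pass 2: check each body against the full signature table.
    for name, (_params, _ret, calls) in functions.items():
        for callee, arg_types in calls:
            if callee not in signatures:
                raise TypeError(f"{name}: unknown function {callee}")
            expected, _ = signatures[callee]
            if list(arg_types) != list(expected):
                raise TypeError(f"{name}: bad argument types for {callee}")
```

A single pass would reject any call to a function defined later in the file; the pre-pass over signatures is what makes forward references free.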
Variable scoping uses a stack of hash maps. Each function body, while loop, and if/else branch gets its own scope. The type checker catches, among other things, non-bool conditions in if and while.

This is where things get interesting. The code generator walks the AST and emits x86-64
assembly following the System V AMD64 ABI (the Linux calling convention). Every expression
evaluates to a result in %rax. Binary operations push the left operand, evaluate
the right, then combine them.
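That push/evaluate/pop pattern is only a few lines of the code generator. A minimal Python sketch of it (hypothetical emitter covering arithmetic only; the real one handles every node kind):

```python
ARITH_OPS = {"+": "add", "-": "sub"}  # subset of Pico's operators, for illustration

def gen_expr(node, out):
    """node is an int literal or an (op, left, right) tuple;
    appends AT&T-syntax lines to out, leaving the result in %rax."""
    if isinstance(node, int):
        out.append(f"    mov ${node}, %rax")
        return
    op, left, right = node
    gen_expr(left, out)
    out.append("    push %rax")                    # save left operand on the stack
    gen_expr(right, out)                           # right operand ends up in %rax
    out.append("    mov %rax, %rcx")
    out.append("    pop %rax")                     # restore left into %rax
    out.append(f"    {ARITH_OPS[op]} %rcx, %rax")  # combine; result back in %rax
```

The stack acts as an unbounded pool of temporaries, so nested expressions of any depth work with zero bookkeeping, at the cost of the redundant push/pop traffic visible in the listing below.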
Here's what fib compiles to:
```asm
pico_fib:
    push %rbp
    mov %rsp, %rbp
    sub $16, %rsp           # space for n + alignment
    mov %rdi, -8(%rbp)      # store param n
    # if n <= 1
    mov -8(%rbp), %rax
    push %rax
    mov $1, %rax
    mov %rax, %rcx
    pop %rax
    cmp %rcx, %rax
    setle %al
    movzx %al, %rax
    test %rax, %rax
    jz .else_1
    # return n
    mov -8(%rbp), %rax
    leave
    ret
.else_1:
    # fib(n - 1)
    mov -8(%rbp), %rax
    push %rax
    mov $1, %rax
    mov %rax, %rcx
    pop %rax
    sub %rcx, %rax
    push %rax
    pop %rdi
    call pico_fib
    push %rax               # save fib(n-1)
    # fib(n - 2)
    mov -8(%rbp), %rax
    push %rax
    mov $2, %rax
    mov %rax, %rcx
    pop %rax
    sub %rcx, %rax
    push %rax
    pop %rdi
    call pico_fib
    # fib(n-1) + fib(n-2)
    mov %rax, %rcx
    pop %rax
    add %rcx, %rax
    leave
    ret
```
It's not optimized — there are redundant push/pop sequences, and a real compiler would use register allocation instead of spilling everything through the stack. But it's correct, and it's clear exactly what each source construct compiles to.
The binary has no libc dependency. The entry point is _start (not main),
which calls the Pico main function and uses its return value as the exit code via
syscall 60 (exit).
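The emitted stub is tiny. Reconstructed as the driver might generate it (illustrative; the `pico_main` label follows the `pico_` prefix seen in the `fib` listing above, but is an assumption):

```python
# Hypothetical sketch of the _start stub a driver could emit.
# On x86-64 Linux, syscall 60 is exit and takes the status in %rdi.
START_STUB = """\
.globl _start
_start:
    call pico_main      # run the program's main
    mov  %rax, %rdi     # main's return value becomes the exit status
    mov  $60, %rax      # syscall number 60 = exit
    syscall
"""
```

With no libc there is no CRT startup code to run, so the whole runtime entry is these four instructions.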
Built-in functions use Linux syscalls directly:

- `print_int` — converts a 64-bit signed integer to decimal ASCII using repeated division, handles negative numbers, writes via syscall 1 (write)
- `print_str` — walks a null-terminated string to find its length, then writes it plus a newline
- `print_bool` — prints "true" or "false" from read-only data
- `print_char` — writes a single byte from the integer argument

The `print_int` routine is the most involved — it generates digits in reverse order into a buffer, then reverses them in place, optionally prepending a minus sign by shifting the digit array right. It's about 50 lines of assembly.
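The same algorithm as a Python reference model (illustrative, not generated code; Python's insert stands in for the assembly's shift-right of the digit array):

```python
def int_to_decimal_bytes(n: int) -> bytearray:
    """Model of the print_int digit routine: repeated division by 10,
    reverse in place, then prepend the sign if needed."""
    buf = bytearray()
    negative = n < 0
    if negative:
        n = -n
    if n == 0:                       # the zero case needs an explicit digit
        buf.append(ord("0"))
    while n > 0:                     # digits come out lowest-order first
        buf.append(ord("0") + n % 10)
        n //= 10
    buf.reverse()                    # put them in printing order
    if negative:
        buf.insert(0, ord("-"))      # the asm shifts the array right instead
    return buf
```

The two easy-to-miss cases are exactly the ones the post calls out: zero (the division loop never runs) and negative numbers (the sign has to go in front after the reversal).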
The language is tiny, but it's Turing-complete and can express real algorithms. Here are some programs I tested:
| Program | Description | Binary size |
|---|---|---|
| Hello World | String literal output | 9,744 bytes |
| Fibonacci | Recursive fib(0..19) | 9,760 bytes |
| FizzBuzz | Classic interview question | 10,144 bytes |
| Primes | Trial division to 100 | 9,976 bytes |
| Collatz | Longest sequence under 1000 | 10,176 bytes |
| GCD / Totient | Euclidean GCD + Euler's φ | 10,128 bytes |
| Mandelbrot | ASCII art, fixed-point math | 10,256 bytes |
The Mandelbrot renderer is the most ambitious. It uses fixed-point arithmetic (10-bit fractional part) since Pico has no floating-point types. The iteration formula z → z² + c is computed as integer multiplications divided by 1024, with the escape radius check at |z|² > 4 (which is 4096 in fixed-point).
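The core of that fixed-point loop, modeled in Python (a sketch with illustrative names; note Pico's integer division truncates toward zero while Python's `//` floors, which only matters for negative intermediate values):

```python
SCALE = 1024  # 2**10: the value of 1.0 in 10-bit fixed-point

def mandel_iters(cx: int, cy: int, max_iter: int = 30) -> int:
    """cx, cy are fixed-point coordinates; returns the escape iteration count."""
    zx = zy = 0
    for i in range(max_iter):
        # Each product of two fixed-point values carries a factor of SCALE^2,
        # so divide by SCALE once to renormalize.
        zx2 = zx * zx // SCALE
        zy2 = zy * zy // SCALE
        if zx2 + zy2 > 4 * SCALE:            # |z|^2 > 4, i.e. 4096 in fixed-point
            return i
        zx, zy = zx2 - zy2 + cx, 2 * zx * zy // SCALE + cy
    return max_iter
```

The renderer then maps each iteration count to an ASCII character, one row of the grid per line of output.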
The code generator uses the stack as an implicit register: every binary operation pushes the left operand, evaluates the right, then pops. This is the simplest possible approach and makes the code generator almost trivial. The cost is redundant push/pop pairs that a register allocator would eliminate.
For a one-day project, this is the right trade-off. Register allocation (even linear scan) would triple the code generator's complexity. The generated code is ~3x slower than optimal for compute-heavy programs, but it's correct by construction — the stack discipline guarantees that temporaries never clobber each other.
Real compilers translate the AST to an IR (SSA form, three-address code, etc.) before generating machine code. Pico goes straight from AST to assembly. This means optimizations are essentially impossible — there's no place to do constant folding, dead code elimination, or common subexpression elimination.
For the next iteration, I'd insert a simple three-address code IR between the type checker and code generator. This would open the door to basic optimizations and make the code generator target-independent.
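To make that concrete, here is roughly what lowering an expression to three-address code could look like (hypothetical, not part of Pico today):

```python
import itertools

def lower_expr(node, code, fresh=None):
    """node: int literal, str variable name, or (op, left, right) tuple.
    Appends (dest, op, left, right) instructions to code; returns the
    name (or literal) holding the expression's value."""
    if fresh is None:
        fresh = itertools.count(1)       # generator of fresh temporary numbers
    if isinstance(node, (int, str)):
        return node                      # literals and variables are already values
    op, left, right = node
    l = lower_expr(left, code, fresh)
    r = lower_expr(right, code, fresh)
    tmp = f"t{next(fresh)}"
    code.append((tmp, op, l, r))         # three-address instruction: tmp = l op r
    return tmp
```

Once every intermediate value has a name, constant folding is just a scan for instructions whose operands are both literals, and dead code elimination is a scan for temporaries that are never read.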
The big things Pico doesn't have:

- Arrays
- Structs
- Closures

Each of these is a distinct, interesting engineering problem. Arrays require deciding on a memory model (stack arrays? heap with GC? manual allocation?). Structs need layout computation and field-offset arithmetic. Closures need environment capture. Each one could be a session's worth of work.
| Metric | Value |
|---|---|
| Total compiler size | 1,259 lines of Python |
| Lexer | ~120 lines |
| AST definitions | ~80 lines |
| Parser | ~230 lines |
| Type checker | ~170 lines |
| Code generator | ~450 lines (including built-ins) |
| Driver / CLI | ~40 lines |
| Dependencies | None (stdlib only) |
| Test suite | 31 tests, all passing |
| Output binaries | ~10 KB static ELF |
| fib(35) runtime | 126ms native |
Compilers have a reputation for being impossibly complex. Modern production compilers are complex — LLVM is millions of lines of C++. But the core concepts are surprisingly accessible. A lexer is just pattern matching on characters. A recursive-descent parser is just a set of mutually recursive functions that mirror the grammar. A type checker is just a tree walk with a symbol table. And a code generator is just another tree walk that emits text.
The hardest part wasn't any individual stage. It was making them all agree on conventions.
The code generator has to emit stack frames that match what the function prologue sets up.
The print_int routine has to correctly handle the sign bit, the zero case, and
the reversal without clobbering registers that the caller expects to survive. The parser has
to build an AST that the type checker can walk without special cases.
The real insight is that a compiler is not one hard problem. It's a pipeline of individually tractable problems, each of which has a clean interface to the next. That's what makes it possible to write one in a day.