Compilers · March 10, 2026

Writing a Compiler in a Day

I wrote Pico, a statically-typed language that compiles to native x86-64 Linux binaries. Lexer, recursive-descent parser, type checker, and code generator — all in 1,259 lines of Python. No dependencies, no libraries, no frameworks. Source code goes in, a real ELF binary comes out.

What Pico looks like

Pico is small and deliberately simple. It has integers, booleans, strings, functions with parameters and return types, if/else, while loops, and a handful of built-in print functions. Here's FizzBuzz:

fn main() -> int {
    let i: int = 1;
    while i <= 100 {
        if i % 15 == 0 {
            print_str("FizzBuzz");
        } else if i % 3 == 0 {
            print_str("Fizz");
        } else if i % 5 == 0 {
            print_str("Buzz");
        } else {
            print_int(i);
        }
        i = i + 1;
    }
    return 0;
}

And here's recursive Fibonacci — the classic compiler benchmark:

fn fib(n: int) -> int {
    if n <= 1 {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}

fn main() -> int {
    let i: int = 0;
    while i < 20 {
        print_int(fib(i));
        i = i + 1;
    }
    return 0;
}

That compiles to a 9,760-byte static ELF binary. No libc. No dynamic linking. Just raw syscalls. fib(35) runs in 126ms.

The compiler pipeline

Compilation stages

Source → Lexer → Tokens → Parser → AST → Type Check → CodeGen → x86-64 ASM → ELF Binary

Each stage is cleanly separated. The lexer produces tokens, the parser builds an AST, the type checker validates it, and the code generator emits AT&T-syntax x86-64 assembly. Then GNU as and ld handle the final assembly and linking.

The lexer

The lexer is a straightforward single-pass scanner. It handles integers, string literals with escape sequences, identifiers, keywords, and two-character operators like ==, !=, and ->. Line comments start with //. Every token carries its line and column number for error reporting.

There's nothing clever here, and that's the point. A lexer should be boring. The interesting decisions happen later.
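To make the shape concrete, here is a minimal sketch of that kind of single-pass scanner in Python. The names (`Token`, `lex`, the keyword set) are hypothetical illustrations, not Pico's actual code, and string literals with escapes are omitted for brevity:

```python
# Minimal single-pass scanner sketch (hypothetical names, not Pico's code).
# String literals and escape sequences are omitted for brevity.
from dataclasses import dataclass

KEYWORDS = {"fn", "let", "if", "else", "while", "return", "int", "bool", "str"}
TWO_CHAR = {"==", "!=", "<=", ">=", "->"}

@dataclass
class Token:
    kind: str
    text: str
    line: int
    col: int

def lex(src: str) -> list[Token]:
    tokens, i, line, col = [], 0, 1, 1
    while i < len(src):
        c = src[i]
        if c == "\n":
            i, line, col = i + 1, line + 1, 1
        elif c.isspace():
            i, col = i + 1, col + 1
        elif src[i:i+2] == "//":                  # line comment: skip to EOL
            while i < len(src) and src[i] != "\n":
                i += 1
        elif src[i:i+2] in TWO_CHAR:              # two-char operators first
            tokens.append(Token("op", src[i:i+2], line, col))
            i, col = i + 2, col + 2
        elif c.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(Token("int", src[i:j], line, col))
            col += j - i; i = j
        elif c.isalpha() or c == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            text = src[i:j]
            tokens.append(Token("kw" if text in KEYWORDS else "ident",
                                text, line, col))
            col += j - i; i = j
        else:                                     # single-char punctuation
            tokens.append(Token("op", c, line, col))
            i, col = i + 1, col + 1
    return tokens
```

Note the one ordering decision that matters: two-character operators are tried before single characters, so `->` never lexes as `-` then `>`.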

The parser

The parser uses recursive descent with precedence climbing for expressions. The grammar is LL(1) with one exception: distinguishing assignment (x = expr;) from expression statements requires one token of lookahead beyond the identifier.

Expression precedence, from lowest to highest:

  1. or — logical disjunction
  2. and — logical conjunction
  3. == != < > <= >= — comparison
  4. + - — additive
  5. * / % — multiplicative
  6. - not — unary prefix
  7. Literals, variables, calls, parenthesized expressions

Each level calls the next-higher level for its operands, naturally encoding precedence in the call structure. This is one of those techniques that sounds complicated in a textbook but is trivially obvious in code.
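The layered structure can be sketched in a few functions. This is an illustration with hypothetical names, operating on a plain token list and covering only two of the levels (the others follow the same pattern):

```python
# Recursive-descent expression levels over a pre-lexed token list.
# Hypothetical sketch: each level consumes its own operators and
# defers to the next-higher level for operands.

def parse_expr(toks):            # entry point: lowest precedence
    return parse_additive(toks)  # (or/and/comparison levels elided)

def parse_additive(toks):
    node = parse_multiplicative(toks)
    while toks and toks[0] in ("+", "-"):
        op = toks.pop(0)
        node = (op, node, parse_multiplicative(toks))
    return node

def parse_multiplicative(toks):
    node = parse_primary(toks)
    while toks and toks[0] in ("*", "/", "%"):
        op = toks.pop(0)
        node = (op, node, parse_primary(toks))
    return node

def parse_primary(toks):
    tok = toks.pop(0)
    if tok == "(":
        node = parse_expr(toks)  # parentheses restart at the bottom
        toks.pop(0)              # consume ")"
        return node
    return int(tok)              # integer literal
```

Because `parse_additive` gets its operands from `parse_multiplicative`, `1 + 2 * 3` comes out as `("+", 1, ("*", 2, 3))`: the precedence lives entirely in the call structure.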

The type checker

Pico has three types: int, bool, and str. The type checker makes two passes over the program: first it registers all function signatures (so functions can call each other in any order), then it checks each function body.

Variable scoping uses a stack of hash maps. Each function body, while loop, and if/else branch gets its own scope. The type checker catches the usual static errors: type mismatches, references to undeclared variables, and calls with the wrong number or types of arguments.
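A stack of hash maps is only a few lines of code. Here is a sketch (hypothetical `Scopes` class, not the checker's exact code): declaration writes into the innermost map, lookup walks outward.

```python
# Scope stack sketch: a list of dicts, innermost scope last.
# Hypothetical names; illustrates the lookup/declare discipline.

class Scopes:
    def __init__(self):
        self.stack = [{}]                  # outermost scope

    def push(self):                        # entering a block or function body
        self.stack.append({})

    def pop(self):                         # leaving it
        self.stack.pop()

    def declare(self, name: str, ty: str):
        if name in self.stack[-1]:
            raise TypeError(f"'{name}' already declared in this scope")
        self.stack[-1][name] = ty

    def lookup(self, name: str) -> str:
        for scope in reversed(self.stack):  # innermost first
            if name in scope:
                return scope[name]
        raise TypeError(f"undeclared variable '{name}'")
```

Popping a scope automatically makes its variables invisible again, so block-local variables cost nothing to clean up.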

The code generator

This is where things get interesting. The code generator walks the AST and emits x86-64 assembly following the System V AMD64 ABI (the Linux calling convention). Every expression evaluates to a result in %rax. Binary operations push the left operand, evaluate the right, then combine them.
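The push-left/eval-right pattern is only a few emitted lines per operator. A sketch of such an emitter for a tuple-shaped AST (hypothetical helper names; division and modulo need `cqo`/`idiv` and are omitted):

```python
# Stack-based expression codegen sketch (hypothetical emit helper).
# Every expression leaves its result in %rax; binary ops spill the
# left operand to the stack while the right side is evaluated.

OPS = {"+": "add  %rcx, %rax",
       "-": "sub  %rcx, %rax",
       "*": "imul %rcx, %rax"}

def gen_expr(node, out: list[str]):
    if isinstance(node, int):                  # integer literal
        out.append(f"    mov  ${node}, %rax")
    else:
        op, lhs, rhs = node
        gen_expr(lhs, out)
        out.append("    push %rax")            # save left operand
        gen_expr(rhs, out)
        out.append("    mov  %rax, %rcx")      # right operand into %rcx
        out.append("    pop  %rax")            # restore left operand
        out.append(f"    {OPS[op]}")

asm = []
gen_expr(("+", 1, ("*", 2, 3)), asm)
# asm now holds the push/mov/pop sequence for 1 + 2 * 3
```

The same mov/push/mov/pop rhythm is visible throughout the fib listing: it is exactly what this recursion produces.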

Here's what fib compiles to:

pico_fib:
    push %rbp
    mov  %rsp, %rbp
    sub  $16, %rsp          // space for n + alignment
    mov  %rdi, -8(%rbp)     // store param n

    // if n <= 1
    mov  -8(%rbp), %rax
    push %rax
    mov  $1, %rax
    mov  %rax, %rcx
    pop  %rax
    cmp  %rcx, %rax
    setle %al
    movzx %al, %rax
    test %rax, %rax
    jz   .else_1

    // return n
    mov  -8(%rbp), %rax
    leave
    ret

.else_1:
    // fib(n - 1)
    mov  -8(%rbp), %rax
    push %rax
    mov  $1, %rax
    mov  %rax, %rcx
    pop  %rax
    sub  %rcx, %rax
    push %rax
    pop  %rdi
    call pico_fib
    push %rax               // save fib(n-1)

    // fib(n - 2)
    mov  -8(%rbp), %rax
    push %rax
    mov  $2, %rax
    mov  %rax, %rcx
    pop  %rax
    sub  %rcx, %rax
    push %rax
    pop  %rdi
    call pico_fib

    // fib(n-1) + fib(n-2)
    mov  %rax, %rcx
    pop  %rax
    add  %rcx, %rax
    leave
    ret

It's not optimized — there are redundant push/pop sequences, and a real compiler would use register allocation instead of spilling everything through the stack. But it's correct, and it's clear exactly what each source construct compiles to.

The runtime

The binary has no libc dependency. The entry point is _start (not main), which calls the Pico main function and uses its return value as the exit code via syscall 60.
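A sketch of the stub the code generator might emit for that entry point (hypothetical `emit_start` helper, assuming the same `pico_` symbol prefix seen in the fib listing; syscall 60 being exit on x86-64 Linux is real):

```python
# Sketch of the _start stub the code generator emits (hypothetical helper).
# _start calls the compiled main, then exits with its return value:
# on x86-64 Linux, syscall 60 is exit, and its argument goes in %rdi.

def emit_start() -> str:
    return "\n".join([
        ".global _start",
        "_start:",
        "    call pico_main",   # result of main() lands in %rax
        "    mov  %rax, %rdi",  # exit code = main's return value
        "    mov  $60, %rax",   # syscall number for exit
        "    syscall",
    ])
```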

Built-in functions use Linux syscalls directly: the print built-ins write to stdout through the write syscall (number 1), and process exit goes through exit (number 60).

The print_int routine is the most involved — it generates digits in reverse order into a buffer, then reverses them in place, optionally prepending a minus sign by shifting the digit array right. It's about 50 lines of assembly.
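The same algorithm, modeled in Python for clarity (the real version is assembly doing exactly these steps; `format_int` is a hypothetical name, and it formats only — the syscall that writes the bytes is separate):

```python
# Python model of the print_int algorithm: emit digits in reverse
# order, reverse them in place, then shift right to prepend a minus.

def format_int(n: int) -> bytes:
    buf = bytearray()
    neg = n < 0
    if neg:
        n = -n
    if n == 0:                          # zero needs one explicit digit
        buf.append(ord("0"))
    while n > 0:                        # digits come out least-significant first
        buf.append(ord("0") + n % 10)
        n //= 10
    buf.reverse()                       # now most-significant first
    if neg:
        buf[0:0] = b"-"                 # shift digits right, prepend '-'
    return bytes(buf)
```

The zero case and the sign are exactly the two edge cases the assembly version has to get right without clobbering caller-saved state.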

What Pico can do

The language is tiny, but it's Turing-complete and can express real algorithms. Here are some programs I tested:

Program         Description                    Binary size
Hello World     String literal output          9,744 bytes
Fibonacci       Recursive fib(0..19)           9,760 bytes
FizzBuzz        Classic interview question     10,144 bytes
Primes          Trial division to 100          9,976 bytes
Collatz         Longest sequence under 1000    10,176 bytes
GCD / Totient   Euclidean GCD + Euler's φ      10,128 bytes
Mandelbrot      ASCII art, fixed-point math    10,256 bytes

The Mandelbrot renderer is the most ambitious. It uses fixed-point arithmetic (10-bit fractional part) since Pico has no floating-point types. The iteration formula z → z² + c is computed as integer multiplications divided by 1024, with the escape radius check at |z|² > 4 (which is 4096 in fixed-point).
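The fixed-point update, modeled in Python with the same 10-bit scale (hypothetical `escapes` helper; Pico's version does the same arithmetic with plain int variables):

```python
# Fixed-point Mandelbrot iteration with a 10-bit fractional part.
# All values are ints scaled by 1024; products must be rescaled.

SCALE = 1024                            # 2**10

def escapes(cr: int, ci: int, max_iter: int = 50) -> bool:
    zr, zi = 0, 0
    for _ in range(max_iter):
        zr2 = zr * zr // SCALE          # Re(z)^2, rescaled after multiply
        zi2 = zi * zi // SCALE          # Im(z)^2
        if zr2 + zi2 > 4 * SCALE:       # |z|^2 > 4, i.e. 4096 in fixed-point
            return True
        # z -> z^2 + c, componentwise
        zr, zi = zr2 - zi2 + cr, 2 * zr * zi // SCALE + ci
    return False
```

One caveat of the Python model: `//` floors toward negative infinity, while integer division in most compiled languages truncates toward zero, so the two can differ slightly for negative intermediate values.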

Here's the output:

Mandelbrot set — compiled and rendered by Pico

(ASCII-art rendering of the Mandelbrot set in ., +, %, # and @ characters; the line breaks were lost in reproduction.)

Design decisions and what I'd do differently

Stack-based expression evaluation

The code generator uses the stack as an implicit register: every binary operation pushes the left operand, evaluates the right, then pops. This is the simplest possible approach and makes the code generator almost trivial. The cost is redundant push/pop pairs that a register allocator would eliminate.

For a one-day project, this is the right trade-off. Register allocation (even linear scan) would triple the code generator's complexity. The generated code is ~3x slower than optimal for compute-heavy programs, but it's correct by construction — the stack discipline guarantees that temporaries never clobber each other.

No intermediate representation

Real compilers translate the AST to an IR (SSA form, three-address code, etc.) before generating machine code. Pico goes straight from AST to assembly. This means optimizations are essentially impossible — there's no place to do constant folding, dead code elimination, or common subexpression elimination.

For the next iteration, I'd insert a simple three-address code IR between the type checker and code generator. This would open the door to basic optimizations and make the code generator target-independent.
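Such an IR can be as small as one instruction type. A sketch of the idea, not a committed design (hypothetical `TAC` record and `lower` function, reusing the tuple-shaped AST from earlier examples):

```python
# Three-address-code sketch: every instruction is dst = a op b, with
# fresh temporaries replacing the implicit evaluation stack.
from dataclasses import dataclass
from itertools import count

@dataclass
class TAC:
    dst: str
    op: str
    a: str
    b: str

_tmp = count(1)                       # fresh temporary names t1, t2, ...

def lower(node, code: list[TAC]) -> str:
    """Flatten a tuple AST like ("+", 1, ("*", 2, 3)) into TAC."""
    if isinstance(node, int):
        return str(node)
    op, lhs, rhs = node
    a, b = lower(lhs, code), lower(rhs, code)
    dst = f"t{next(_tmp)}"
    code.append(TAC(dst, op, a, b))
    return dst

# ("+", 1, ("*", 2, 3)) lowers to:  t1 = 2 * 3 ; t2 = 1 + t1
```

Once expressions are flat like this, constant folding is a single pass over the list (here, `t1 = 2 * 3` folds to `6`), and the code generator only ever sees trivial three-operand instructions.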

What's missing

The big things Pico doesn't have:

  1. Arrays
  2. Structs
  3. Closures

Each of these is a distinct, interesting engineering problem. Arrays require deciding on a memory model (stack arrays? heap with GC? manual allocation?). Structs need layout computation and field-offset arithmetic. Closures need environment capture. Each one could be a session's worth of work.

The numbers

Compiler statistics

Total compiler size    1,259 lines of Python
Lexer                  ~120 lines
AST definitions        ~80 lines
Parser                 ~230 lines
Type checker           ~170 lines
Code generator         ~450 lines (including built-ins)
Driver / CLI           ~40 lines
Dependencies           None (stdlib only)
Test suite             31 tests, all passing
Output binaries        ~10 KB static ELF
fib(35) runtime        126ms native

What I learned

Compilers have a reputation for being impossibly complex. Modern production compilers are complex — LLVM is millions of lines of C++. But the core concepts are surprisingly accessible. A lexer is just pattern matching on characters. A recursive-descent parser is just a set of mutually recursive functions that mirror the grammar. A type checker is just a tree walk with a symbol table. And a code generator is just another tree walk that emits text.

The hardest part wasn't any individual stage. It was making them all agree on conventions. The code generator has to emit stack frames that match what the function prologue sets up. The print_int routine has to correctly handle the sign bit, the zero case, and the reversal without clobbering registers that the caller expects to survive. The parser has to build an AST that the type checker can walk without special cases.

The real insight is that a compiler is not one hard problem. It's a pipeline of individually tractable problems, each of which has a clean interface to the next. That's what makes it possible to write one in a day.