This project aims to implement, from the ground up, a compiler for my own programming language, targeting x86_64 machines running a Linux-based distribution.
That includes implementing:
- the front end: lexical analysis (lexer), syntax analysis (parser), and semantic analysis
- the assembler
- the linker
My main goals for this project are:
- Deepening my understanding of the x86_64 architecture, including the instruction set (ISA), CPU registers, memory management, and low-level execution flow.
- Exploring Linux OS internals, specifically how the operating system handles system calls, manages memory, and interacts with compiled machine code.
- Understanding the Executable and Linkable Format (ELF) by generating valid executable headers, data/text sections, and segments entirely from scratch.
- Understanding the structure of object files, including how symbol tables are built, how relocation entries work, and the exact mechanics of linking multiple files together.
- Learning Rust in a low-level systems programming context, relying on its memory-safety guarantees (ownership and borrowing) to build the compiler infrastructure.
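The sketch below shows what "generating valid executable headers entirely from scratch" involves at the byte level. It is not this project's code. It writes a minimal 64-bit little-endian ELF executable with a single `PT_LOAD` segment and no section headers. The load address `0x400000`, the helper name `build_elf`, and the output file name `tiny` are illustrative assumptions. The field order and sizes follow the ELF64 layout: a 64-byte `Elf64_Ehdr` followed by a 56-byte `Elf64_Phdr`.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;

// Assumption: classic non-PIE load address for the single segment.
const BASE_VADDR: u64 = 0x40_0000;
const EHDR_SIZE: u16 = 64; // sizeof(Elf64_Ehdr)
const PHDR_SIZE: u16 = 56; // sizeof(Elf64_Phdr)

/// Builds a minimal ELF64 executable: header, one PT_LOAD program header, then `code`.
fn build_elf(code: &[u8]) -> Vec<u8> {
    let headers_len = (EHDR_SIZE + PHDR_SIZE) as u64;
    let entry = BASE_VADDR + headers_len; // code starts right after the headers
    let file_len = headers_len + code.len() as u64;
    let mut out = Vec::with_capacity(file_len as usize);

    // Elf64_Ehdr: e_ident (16 bytes)
    out.extend_from_slice(&[0x7F, b'E', b'L', b'F']); // magic
    out.push(2); // EI_CLASS: ELFCLASS64
    out.push(1); // EI_DATA: ELFDATA2LSB (little-endian)
    out.push(1); // EI_VERSION: EV_CURRENT
    out.push(0); // EI_OSABI: System V
    out.extend_from_slice(&[0u8; 8]); // EI_ABIVERSION + padding
    // Remaining Elf64_Ehdr fields
    out.extend_from_slice(&2u16.to_le_bytes()); // e_type: ET_EXEC
    out.extend_from_slice(&62u16.to_le_bytes()); // e_machine: EM_X86_64
    out.extend_from_slice(&1u32.to_le_bytes()); // e_version
    out.extend_from_slice(&entry.to_le_bytes()); // e_entry
    out.extend_from_slice(&(EHDR_SIZE as u64).to_le_bytes()); // e_phoff
    out.extend_from_slice(&0u64.to_le_bytes()); // e_shoff: no section headers
    out.extend_from_slice(&0u32.to_le_bytes()); // e_flags
    out.extend_from_slice(&EHDR_SIZE.to_le_bytes()); // e_ehsize
    out.extend_from_slice(&PHDR_SIZE.to_le_bytes()); // e_phentsize
    out.extend_from_slice(&1u16.to_le_bytes()); // e_phnum
    out.extend_from_slice(&0u16.to_le_bytes()); // e_shentsize
    out.extend_from_slice(&0u16.to_le_bytes()); // e_shnum
    out.extend_from_slice(&0u16.to_le_bytes()); // e_shstrndx

    // Elf64_Phdr: map the whole file read + execute at BASE_VADDR
    out.extend_from_slice(&1u32.to_le_bytes()); // p_type: PT_LOAD
    out.extend_from_slice(&5u32.to_le_bytes()); // p_flags: PF_R | PF_X
    out.extend_from_slice(&0u64.to_le_bytes()); // p_offset
    out.extend_from_slice(&BASE_VADDR.to_le_bytes()); // p_vaddr
    out.extend_from_slice(&BASE_VADDR.to_le_bytes()); // p_paddr
    out.extend_from_slice(&file_len.to_le_bytes()); // p_filesz
    out.extend_from_slice(&file_len.to_le_bytes()); // p_memsz
    out.extend_from_slice(&0x1000u64.to_le_bytes()); // p_align: 4 KiB pages

    out.extend_from_slice(code);
    out
}

fn main() {
    // mov eax, 60 ; xor edi, edi ; syscall  => exit(0)
    let code: [u8; 9] = [0xB8, 0x3C, 0x00, 0x00, 0x00, 0x31, 0xFF, 0x0F, 0x05];
    let elf = build_elf(&code);
    fs::write("tiny", &elf).expect("failed to write ./tiny");
    fs::set_permissions("tiny", fs::Permissions::from_mode(0o755)).expect("chmod failed");
    println!("wrote {} bytes to ./tiny", elf.len());
}
```

On x86_64 Linux, `./tiny` should exit with status 0. `mov eax, 60` selects the `exit` system call, and `xor edi, edi` zeroes its argument. Mapping the headers together with the code in one segment keeps the example small. A real linker would normally separate code and data into segments with different permissions.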
I chose to build the project in reverse order, from its lowest level up to the highest. Thus, the first step was building an assembler for x86_64 assembly in Intel syntax.
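The core job of such an assembler is turning each instruction into bytes. Here is a minimal sketch that encodes already-parsed Intel-syntax instructions (`mov r64, imm64`, `syscall`, `ret`) into machine code. `Reg`, `Instr`, and `encode` are hypothetical names, not this project's types, and parsing of the source text is omitted. The encodings themselves are the standard x86_64 ones: a REX.W prefix, the `B8+rd` opcode, `0F 05`, and `C3`.

```rust
/// Hypothetical register subset; each discriminant is the hardware register number.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
enum Reg { Rax = 0, Rcx = 1, Rdx = 2, Rbx = 3, Rsp = 4, Rbp = 5, Rsi = 6, Rdi = 7, R8 = 8, R9 = 9 }

/// Hypothetical parsed instructions (Intel syntax: destination first).
#[allow(dead_code)]
enum Instr {
    MovImm64(Reg, u64), // mov r64, imm64
    Syscall,            // syscall
    Ret,                // ret
}

/// Appends the machine-code bytes of `instr` to `out`.
fn encode(instr: &Instr, out: &mut Vec<u8>) {
    match *instr {
        Instr::MovImm64(reg, imm) => {
            let n = reg as u8;
            // REX prefix 0b0100_WRXB: W = 1 selects a 64-bit operand,
            // B supplies bit 3 of the register number (r8-r15).
            out.push(0x48 | (n >> 3));
            // Opcode B8+rd: the low 3 bits of the register live in the opcode byte.
            out.push(0xB8 + (n & 0b111));
            // imm64, little-endian
            out.extend_from_slice(&imm.to_le_bytes());
        }
        Instr::Syscall => out.extend_from_slice(&[0x0F, 0x05]),
        Instr::Ret => out.push(0xC3),
    }
}

fn main() {
    // mov rax, 60 ; mov rdi, 0 ; syscall   (Linux exit(0))
    let program = [
        Instr::MovImm64(Reg::Rax, 60),
        Instr::MovImm64(Reg::Rdi, 0),
        Instr::Syscall,
    ];
    let mut code = Vec::new();
    for instr in &program {
        encode(instr, &mut code);
    }
    println!("{:02X?}", code);
}
```

This sketch always uses the 10-byte `imm64` form. A real assembler would usually choose a shorter encoding, such as `mov eax, imm32`, when the immediate fits.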
The assembler aims to emit machine code wrapped in a custom object file format. I deliberately chose a simplified object format first, rather than jumping straight into the highly complex ELF standard. This approach lets me isolate and deeply understand the core mechanics of machine code generation and object file structure.
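The layout of the custom format isn't described here, so the following is only a hypothetical example of what a simplified object file could hold. It has one code section, a symbol table, and a list of relocations, all serialized as little-endian fields behind an invented `OBJ0` magic. The struct names, field names, and byte layout are illustrative assumptions.

```rust
/// Hypothetical symbol entry: `defined == false` means "to be resolved by the linker".
struct Symbol { name: String, defined: bool, offset: u64 }
/// Hypothetical relocation: patch 4 bytes at `offset` in the code section.
struct Reloc { offset: u64, sym_index: u32, addend: i64 }
struct ObjectFile { text: Vec<u8>, symbols: Vec<Symbol>, relocs: Vec<Reloc> }

impl ObjectFile {
    /// Serializes to an invented layout (all integers little-endian):
    ///   "OBJ0" | u32 text_len | text
    ///   u32 n_syms   | { u16 name_len | name | u8 defined | u64 offset }*
    ///   u32 n_relocs | { u64 offset | u32 sym_index | i64 addend }*
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(b"OBJ0");
        out.extend_from_slice(&(self.text.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.text);
        out.extend_from_slice(&(self.symbols.len() as u32).to_le_bytes());
        for s in &self.symbols {
            out.extend_from_slice(&(s.name.len() as u16).to_le_bytes());
            out.extend_from_slice(s.name.as_bytes());
            out.push(s.defined as u8);
            out.extend_from_slice(&s.offset.to_le_bytes());
        }
        out.extend_from_slice(&(self.relocs.len() as u32).to_le_bytes());
        for r in &self.relocs {
            out.extend_from_slice(&r.offset.to_le_bytes());
            out.extend_from_slice(&r.sym_index.to_le_bytes());
            out.extend_from_slice(&r.addend.to_le_bytes());
        }
        out
    }
}

/// `_start: call exit_program`, with the call target left as a zero placeholder.
fn sample_object() -> ObjectFile {
    ObjectFile {
        text: vec![0xE8, 0, 0, 0, 0], // call rel32, displacement patched at link time
        symbols: vec![
            Symbol { name: "_start".to_string(), defined: true, offset: 0 },
            Symbol { name: "exit_program".to_string(), defined: false, offset: 0 },
        ],
        // addend -4: rel32 is measured from the end of the 4-byte field
        relocs: vec![Reloc { offset: 1, sym_index: 1, addend: -4 }],
    }
}

fn main() {
    let bytes = sample_object().to_bytes();
    println!("{} bytes: {:02X?}", bytes.len(), bytes);
}
```

The zero `rel32` placeholder, together with a relocation entry, lets the assembler emit a call to a symbol whose address it cannot know yet.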
Once the assembler is fully functional, the next immediate milestone is building the linker. The linker will be responsible for:
- Parsing these custom object files.
- Resolving external and internal symbols.
- Performing all necessary memory relocations.
- Generating and writing the correct executable headers.
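Symbol resolution and relocation can be sketched, under heavy assumptions, over hypothetical in-memory object structures. The sketch runs three passes:

- Layout concatenates the code sections at a base address.
- Resolution assigns each defined symbol its final address and rejects duplicates.
- Relocation patches every 32-bit PC-relative field with `S + A - P`, the same formula x86_64 uses for `R_X86_64_PC32`.

Parsing and header generation are left out. `link`, `sample_objects`, and the struct names are illustrative, not this project's API.

```rust
use std::collections::HashMap;
use std::convert::TryFrom;

// Hypothetical in-memory object structures (parsing from bytes is omitted).
struct Symbol { name: String, defined: bool, offset: u64 }
struct Reloc { offset: u64, sym_index: u32, addend: i64 }
struct ObjectFile { text: Vec<u8>, symbols: Vec<Symbol>, relocs: Vec<Reloc> }

/// Links `objs` into one flat code image loaded at virtual address `base`.
/// Assumes every relocation is 32-bit PC-relative: value = S + A - P.
fn link(objs: &[ObjectFile], base: u64) -> Result<Vec<u8>, String> {
    // 1. Layout: concatenate code sections and record each one's start offset.
    let mut starts: Vec<u64> = Vec::new();
    let mut image: Vec<u8> = Vec::new();
    for obj in objs {
        starts.push(image.len() as u64);
        image.extend_from_slice(&obj.text);
    }

    // 2. Symbol resolution: give every defined symbol its final address.
    let mut globals: HashMap<&str, u64> = HashMap::new();
    for (obj, &start) in objs.iter().zip(&starts) {
        for sym in obj.symbols.iter().filter(|s| s.defined) {
            if globals.insert(sym.name.as_str(), base + start + sym.offset).is_some() {
                return Err(format!("duplicate symbol `{}`", sym.name));
            }
        }
    }

    // 3. Relocation: patch each 4-byte field with S + A - P.
    for (obj, &start) in objs.iter().zip(&starts) {
        for rel in &obj.relocs {
            let sym = &obj.symbols[rel.sym_index as usize];
            let s = *globals
                .get(sym.name.as_str())
                .ok_or_else(|| format!("undefined symbol `{}`", sym.name))?;
            let field = (start + rel.offset) as usize; // offset inside the image
            let p = base + field as u64; // address of the field being patched
            let value = s as i64 + rel.addend - p as i64;
            let rel32 = i32::try_from(value).map_err(|_| "rel32 overflow".to_string())?;
            image[field..field + 4].copy_from_slice(&rel32.to_le_bytes());
        }
    }
    Ok(image)
}

/// a.o: `_start: call exit_program ; ret`
/// b.o: `exit_program: mov eax, 60 ; xor edi, edi ; syscall`
fn sample_objects() -> Vec<ObjectFile> {
    let a = ObjectFile {
        text: vec![0xE8, 0, 0, 0, 0, 0xC3], // call rel32 (placeholder) ; ret
        symbols: vec![
            Symbol { name: "_start".to_string(), defined: true, offset: 0 },
            Symbol { name: "exit_program".to_string(), defined: false, offset: 0 },
        ],
        relocs: vec![Reloc { offset: 1, sym_index: 1, addend: -4 }],
    };
    let b = ObjectFile {
        text: vec![0xB8, 0x3C, 0, 0, 0, 0x31, 0xFF, 0x0F, 0x05],
        symbols: vec![Symbol { name: "exit_program".to_string(), defined: true, offset: 0 }],
        relocs: vec![],
    };
    vec![a, b]
}

fn main() {
    let image = link(&sample_objects(), 0x40_0000).expect("link failed");
    println!("{:02X?}", image);
}
```

Here the `call` sits at `0x400000` and `exit_program` lands at `0x400006`. The patched displacement is therefore 1: it is measured from the next instruction at `0x400005`, skips the `ret` byte, and reaches `exit_program`.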
The design of the source language is yet to be decided; for now, I am focused on getting the assembler and linker working.