A custom compiler built on the LLVM infrastructure. This project demonstrates the design and implementation of a complete compiler pipeline: lexical analysis, parsing, semantic analysis, intermediate representation (IR) generation, and native code generation.
The compiler targets a custom-designed programming language. It translates high-level source code into LLVM IR, which LLVM tools can then optimize and compile into machine code.
The purpose of this project is to:
- Understand compiler construction principles
- Implement a full compilation pipeline
- Integrate LLVM for backend code generation
- Produce executable machine code from custom language programs
The compiler follows a traditional multi-stage architecture:
Source Code
→ Lexical Analysis
→ Parsing
→ AST Construction
→ Semantic Analysis
→ LLVM IR Generation
→ Optimization
→ Machine Code Generation
Each stage is modular and separated logically in the codebase.
The lexer converts raw source code into tokens such as:
- Keywords
- Identifiers
- Literals
- Operators
- Delimiters
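The token categories above can be illustrated with a small hand-written lexer. This is a minimal sketch, not the project's actual lexer: the `Token` struct, the `tokenize` function, and the keyword set are assumptions made for illustration, with the keywords inferred from the types and statements this README mentions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

enum class TokenKind { Keyword, Identifier, Literal, Operator, Delimiter };

struct Token {
    TokenKind kind;
    std::string text;
    int line;  // 1-based source line, kept for error messages
};

// Assumed keyword set, based on the types and statements this README lists.
const std::unordered_set<std::string> kKeywords = {
    "int", "float", "bool", "void", "if", "else", "while", "for", "return"};

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    int line = 1;
    std::size_t i = 0;
    auto isIdentChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    while (i < src.size()) {
        char c = src[i];
        if (c == '\n') { ++line; ++i; continue; }
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }

        std::size_t start = i;
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            // Identifier or keyword: letters, digits, underscores.
            while (i < src.size() && isIdentChar(src[i])) ++i;
            std::string word = src.substr(start, i - start);
            TokenKind kind = kKeywords.count(word) ? TokenKind::Keyword
                                                   : TokenKind::Identifier;
            tokens.push_back({kind, word, line});
        } else if (std::isdigit(static_cast<unsigned char>(c))) {
            // Digits with optional '.' cover int and float literals (no validation).
            while (i < src.size() &&
                   (std::isdigit(static_cast<unsigned char>(src[i])) || src[i] == '.')) ++i;
            tokens.push_back({TokenKind::Literal, src.substr(start, i - start), line});
        } else {
            // Try two-character operators before single-character ones.
            std::string two = src.substr(i, 2);
            if (two == "==" || two == "!=" || two == "<=" || two == ">=" ||
                two == "&&" || two == "||") {
                tokens.push_back({TokenKind::Operator, two, line});
                i += 2;
            } else {
                // Anything that is not an operator is treated as a delimiter here;
                // a real lexer would reject unknown characters.
                bool isOp = std::string("+-*/<>=!").find(c) != std::string::npos;
                tokens.push_back({isOp ? TokenKind::Operator : TokenKind::Delimiter,
                                  std::string(1, c), line});
                ++i;
            }
        }
    }
    return tokens;
}
```

Calling `tokenize("int a = 5;")` yields a keyword, an identifier, an operator, a literal, and a delimiter, in that order. Recording the line number on every token is what later allows syntax errors to be reported with line numbers.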
The parser processes tokens and builds an Abstract Syntax Tree (AST) according to the language grammar.
Supported constructs typically include:
- Variable declarations
- Arithmetic expressions
- Conditional statements
- Loops
- Function definitions
- Function calls
- Return statements
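As an illustration of how a parser turns tokens into an AST, here is a minimal recursive-descent sketch covering only arithmetic expressions. It assumes tokens arrive as plain strings and that literals are integers; the `Expr`, `Parser`, and `toString` names are hypothetical and are not taken from the project's sources.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// One AST node: a number literal (op == 'n') or a binary operation.
struct Expr {
    char op = 'n';
    int value = 0;                   // used only when op == 'n'
    std::unique_ptr<Expr> lhs, rhs;  // used only for binary operations
};

class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    // expression := term (('+' | '-') term)*
    std::unique_ptr<Expr> parseExpression() {
        auto node = parseTerm();
        while (peek() == "+" || peek() == "-") {
            char op = next()[0];
            auto rhs = parseTerm();
            node = binary(op, std::move(node), std::move(rhs));
        }
        return node;
    }

private:
    // term := primary (('*' | '/') primary)*
    std::unique_ptr<Expr> parseTerm() {
        auto node = parsePrimary();
        while (peek() == "*" || peek() == "/") {
            char op = next()[0];
            auto rhs = parsePrimary();
            node = binary(op, std::move(node), std::move(rhs));
        }
        return node;
    }

    // primary := NUMBER | '(' expression ')'
    std::unique_ptr<Expr> parsePrimary() {
        std::string tok = next();
        if (tok == "(") {
            auto inner = parseExpression();
            if (next() != ")") throw std::runtime_error("expected ')'");
            return inner;
        }
        if (!std::isdigit(static_cast<unsigned char>(tok[0])))
            throw std::runtime_error("unexpected token: " + tok);
        auto lit = std::make_unique<Expr>();
        lit->value = std::stoi(tok);
        return lit;
    }

    static std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l,
                                        std::unique_ptr<Expr> r) {
        auto node = std::make_unique<Expr>();
        node->op = op;
        node->lhs = std::move(l);
        node->rhs = std::move(r);
        return node;
    }

    // Returns an empty string at end of input.
    std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : std::string(); }
    std::string next() {
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
        return toks_[pos_++];
    }

    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
};

// Renders the tree with explicit parentheses so precedence is visible.
std::string toString(const Expr& e) {
    if (e.op == 'n') return std::to_string(e.value);
    return "(" + toString(*e.lhs) + " " + e.op + " " + toString(*e.rhs) + ")";
}
```

Each grammar rule maps to one function, and operator precedence falls out of the call structure: `parseExpression` calls `parseTerm`, so `*` and `/` bind tighter than `+` and `-`. Statements, declarations, and function definitions would follow the same pattern with additional node kinds.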
The semantic analysis stage ensures:
- Type correctness
- Scope validation
- Symbol table consistency
- Function signature matching
- Proper variable usage
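Scope validation and symbol table consistency are commonly implemented with a stack of scopes. The sketch below shows that general technique under stated assumptions: the `SymbolTable` class and `Type` enum are hypothetical names, and the project's own semantic/ module may be organized differently.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum class Type { Int, Float, Bool, Void };

class SymbolTable {
public:
    SymbolTable() { enterScope(); }  // start with the global scope

    void enterScope() { scopes_.emplace_back(); }

    // The global scope is never popped.
    void exitScope() {
        if (scopes_.size() > 1) scopes_.pop_back();
    }

    // Returns false if the name is already declared in the innermost scope.
    bool declare(const std::string& name, Type type) {
        return scopes_.back().emplace(name, type).second;
    }

    // Searches from the innermost scope outward; empty if undeclared.
    std::optional<Type> lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;
    }

private:
    std::vector<std::unordered_map<std::string, Type>> scopes_;
};
```

A failed `declare` corresponds to a redeclaration error, an empty `lookup` to an undeclared variable error, and comparing the looked-up `Type` against the type of an expression is the basis for type mismatch detection.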
The AST is translated into LLVM IR using LLVM APIs such as:
- LLVMContext
- Module
- IRBuilder
- Function
- BasicBlock
The generated IR can be:
- Printed as a .ll file
- Executed with a JIT
- Compiled into object code
The custom language includes:
- Data types: int, float, bool, void
- Arithmetic operators: +, -, *, /
- Comparison operators: ==, !=, <, >, <=, >=
- Logical operators: &&, ||
- Control flow: if / else, while / for
- Functions: user-defined functions, parameters, return values
Example structure:
    src/
        lexer/
        parser/
        ast/
        semantic/
        codegen/
        main.cpp
    CMakeLists.txt
    README.md
Module descriptions:
- lexer/ → Token generation
- parser/ → Syntax analysis
- ast/ → AST node definitions
- semantic/ → Symbol tables and type checking
- codegen/ → LLVM IR generation
- main.cpp → Compiler entry point
Requirements:
- LLVM (version 14+ recommended)
- CMake
- C++17 compatible compiler (clang++ or g++)
Install dependencies (Debian/Ubuntu):
    sudo apt install llvm clang
Build the compiler:
    mkdir build
    cd build
    cmake ..
    make
Compile a source file:
    ./compiler input.my
Generate LLVM IR:
    ./compiler input.my -emit-llvm
Execute using LLVM interpreter:
    lli output.ll
Compile to native executable:
    llc output.ll -filetype=obj
    clang output.o -o program
    ./program
Example source program:
    int main() {
        int a = 5;
        int b = 10;
        return a + b;
    }
For this program, the compiler generates LLVM IR that includes:
- Function definition
- Alloca instructions
- Load/store instructions
- Add instruction
- Return instruction
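To make the instruction list above concrete, the sketch below builds the textual IR for the example program by hand. It is a stand-in written with the C++ standard library only, not the IRBuilder-based code generator the project uses; `IrWriter` and `emitExample` are hypothetical names. The text uses the opaque `ptr` type that LLVM 15 and later print by default (LLVM 14 prints `i32*` instead), and the real compiler's output may differ in register names and attributes.

```cpp
#include <sstream>
#include <string>

// Appends one instruction per call and hands back the register it defines.
class IrWriter {
public:
    std::string allocaI32(const std::string& name) {
        out_ << "  %" << name << " = alloca i32\n";
        return "%" + name;
    }
    void store(int value, const std::string& ptr) {
        out_ << "  store i32 " << value << ", ptr " << ptr << "\n";
    }
    std::string load(const std::string& ptr) {
        std::string reg = fresh();
        out_ << "  " << reg << " = load i32, ptr " << ptr << "\n";
        return reg;
    }
    std::string add(const std::string& lhs, const std::string& rhs) {
        std::string reg = fresh();
        out_ << "  " << reg << " = add i32 " << lhs << ", " << rhs << "\n";
        return reg;
    }
    void ret(const std::string& reg) { out_ << "  ret i32 " << reg << "\n"; }

    // Wraps the collected instructions in a function with a single entry block.
    std::string finish(const std::string& fnName) const {
        return "define i32 @" + fnName + "() {\nentry:\n" + out_.str() + "}\n";
    }

private:
    std::string fresh() { return "%t" + std::to_string(next_++); }
    std::ostringstream out_;
    int next_ = 0;
};

// Mirrors: int main() { int a = 5; int b = 10; return a + b; }
std::string emitExample() {
    IrWriter w;
    std::string a = w.allocaI32("a");
    std::string b = w.allocaI32("b");
    w.store(5, a);
    w.store(10, b);
    std::string av = w.load(a);
    std::string bv = w.load(b);
    std::string sum = w.add(av, bv);
    w.ret(sum);
    return w.finish("main");
}
```

Each local variable gets a stack slot (alloca), assignments become stores, uses become loads, and `a + b` becomes a single add whose result is returned. This load/store-heavy form is typical of unoptimized front-end output.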
LLVM allows multiple optimization passes such as:
- Constant propagation
- Dead code elimination
- Loop optimizations
- Instruction combining
Optimization example:
    opt -O2 -S output.ll -o optimized.ll
The compiler includes the following diagnostics:
- Syntax error reporting with line numbers
- Type mismatch detection
- Undeclared variable errors
- Function signature mismatch errors
These diagnostics help users debug source programs effectively.
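One way to collect and print such diagnostics is sketched below. The `DiagnosticEngine` class, the `DiagKind` values, and the `file:line: error: message` format are assumptions chosen for illustration; the compiler's actual output format is not specified in this README.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class DiagKind { Syntax, TypeMismatch, UndeclaredVariable, SignatureMismatch };

struct Diagnostic {
    DiagKind kind;
    int line;  // 1-based source line
    std::string message;
};

// Collects errors from all stages so several can be reported in one run.
class DiagnosticEngine {
public:
    explicit DiagnosticEngine(std::string file) : file_(std::move(file)) {}

    void report(DiagKind kind, int line, std::string message) {
        diags_.push_back({kind, line, std::move(message)});
    }

    bool hasErrors() const { return !diags_.empty(); }
    std::size_t count() const { return diags_.size(); }

    // One "file:line: error: message" entry per line of output.
    std::string render() const {
        std::string out;
        for (const auto& d : diags_)
            out += file_ + ":" + std::to_string(d.line) + ": error: " + d.message + "\n";
        return out;
    }

private:
    std::string file_;
    std::vector<Diagnostic> diags_;
};
```

Collecting diagnostics instead of stopping at the first one lets the driver skip IR generation when `hasErrors()` is true while still showing the user every problem found.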
Future improvements:
- Add arrays and structs
- Add object-oriented features
- Improve error recovery
- Add unit testing framework
- Implement REPL mode
- Improve optimization pipeline
- Add JIT execution mode
Mahan Baneshi - Compiler Design Project, implemented using the LLVM infrastructure.