Project Overview
Yanki is a compiler that implements the full compilation pipeline, from source code through lexing, parsing, and LLVM IR generation to optimized object files. Its modular architecture illustrates core compiler construction techniques and relies on LLVM for optimization and target-specific code generation.
Core Architecture
The compiler follows a traditional three-phase design:
- Frontend: Lexical analysis and syntax parsing
- Middle-end: AST construction and transformation
- Backend: LLVM IR generation and machine code optimization
Compiler Pipeline
Lexical Analysis
- Tokenization: Converts raw source code into meaningful tokens
- Token Recognition: Identifies keywords, operators, literals, and identifiers
- Error Handling: Provides detailed feedback for invalid tokens
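The lexing steps above can be illustrated with a minimal sketch. It is not taken from the Yanki source: the `TokenKind` names, the `Token` struct, and the `tokenize` function are hypothetical, and the keyword set (`show`, `exit`) is inferred from the syntax examples in this README.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds; the real Yanki lexer may name or split these differently.
enum class TokenKind { Keyword, Identifier, Number, Operator, Colon, Semicolon, LParen, RParen };

struct Token {
    TokenKind kind;
    std::string text;
};

static bool isIdentChar(char ch) {
    return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
}

// Converts source text into tokens; throws on a character it does not recognize.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { ++i; continue; }
        // Skip `//` line comments, as used in the syntax examples.
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
            while (i < src.size() && src[i] != '\n') ++i;
            continue;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            const std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            tokens.push_back({TokenKind::Number, src.substr(start, i - start)});
            continue;
        }
        if (isIdentChar(c)) {
            const std::size_t start = i;
            while (i < src.size() && isIdentChar(src[i])) ++i;
            const std::string word = src.substr(start, i - start);
            const bool keyword = (word == "show" || word == "exit");
            tokens.push_back({keyword ? TokenKind::Keyword : TokenKind::Identifier, word});
            continue;
        }
        const std::string text(1, c);
        switch (c) {
            case '+': case '-': case '*': case '/':
                tokens.push_back({TokenKind::Operator, text}); break;
            case ':': tokens.push_back({TokenKind::Colon, text}); break;
            case ';': tokens.push_back({TokenKind::Semicolon, text}); break;
            case '(': tokens.push_back({TokenKind::LParen, text}); break;
            case ')': tokens.push_back({TokenKind::RParen, text}); break;
            default:
                // Report the offending character and its offset.
                throw std::runtime_error("invalid token '" + text + "' at offset " + std::to_string(i));
        }
        ++i;
    }
    return tokens;
}
```

In this sketch, `tokenize("x: 42;")` yields four tokens (identifier, colon, number, semicolon), and an unrecognized character such as `@` raises an error that names the character and its offset.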
Syntax Analysis
- Recursive Descent Parser: Constructs Abstract Syntax Tree from token stream
- Grammar Validation: Ensures syntactic correctness of input programs
- AST Generation: Creates structured representation for further processing
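A recursive descent parser for the expression subset might look like the sketch below. The grammar, the `Expr` node layout, and the `Parser` and `toString` names are assumptions made for illustration, not the project's actual classes. To stay short, the sketch reads characters directly instead of consuming a token stream.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical AST node: op == 0 marks a leaf (number or identifier).
struct Expr {
    char op = 0;
    std::string leaf;
    std::unique_ptr<Expr> lhs, rhs;
};

class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // Parses one complete expression and rejects trailing input.
    std::unique_ptr<Expr> parse() {
        auto e = parseExpr();
        if (peek() != '\0') throw std::runtime_error("unexpected trailing input");
        return e;
    }

private:
    // expr := term (('+' | '-') term)*
    std::unique_ptr<Expr> parseExpr() {
        auto left = parseTerm();
        while (peek() == '+' || peek() == '-') {
            const char op = src_[pos_++];
            auto right = parseTerm();
            left = makeBinary(op, std::move(left), std::move(right));
        }
        return left;
    }

    // term := primary (('*' | '/') primary)*
    std::unique_ptr<Expr> parseTerm() {
        auto left = parsePrimary();
        while (peek() == '*' || peek() == '/') {
            const char op = src_[pos_++];
            auto right = parsePrimary();
            left = makeBinary(op, std::move(left), std::move(right));
        }
        return left;
    }

    // primary := NUMBER | IDENTIFIER | '(' expr ')'
    std::unique_ptr<Expr> parsePrimary() {
        const char c = peek();
        if (c == '(') {
            ++pos_;
            auto inner = parseExpr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return inner;
        }
        if (isWordChar(c)) {
            const std::size_t start = pos_;
            while (pos_ < src_.size() && isWordChar(src_[pos_])) ++pos_;
            auto node = std::make_unique<Expr>();
            node->leaf = src_.substr(start, pos_ - start);
            return node;
        }
        throw std::runtime_error("expected a number, identifier, or '('");
    }

    static bool isWordChar(char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    }

    static std::unique_ptr<Expr> makeBinary(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
        auto node = std::make_unique<Expr>();
        node->op = op;
        node->lhs = std::move(l);
        node->rhs = std::move(r);
        return node;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    std::string src_;
    std::size_t pos_ = 0;
};

// Renders the tree in prefix form so its structure is easy to inspect.
std::string toString(const Expr& e) {
    if (e.op == 0) return e.leaf;
    return std::string("(") + e.op + " " + toString(*e.lhs) + " " + toString(*e.rhs) + ")";
}
```

Each grammar rule maps to one function, and operator precedence falls out of the call order: `parseExpr` calls `parseTerm`, so `x + 2 * y` parses as `(+ x (* 2 y))`.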
Code Generation
- AST Transformation: Converts syntax tree into LLVM Intermediate Representation
- LLVM Backend: Leverages LLVM's optimization passes and target-specific code generation
- Machine Code Output: Produces optimized object files ready for linking
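To make the AST-to-IR step concrete, the sketch below lowers a small expression tree to textual LLVM IR. It writes the IR text by hand so that the example compiles without LLVM installed; Yanki itself presumably goes through LLVM's C++ API, so treat this as an illustration of the lowering, not the project's implementation. The `Expr`, `lit`, `bin`, `IrEmitter`, and `emitModule` names are hypothetical, and 32-bit integer values are an assumption.

```cpp
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical AST node: op == 0 marks an integer literal.
struct Expr {
    char op = 0;
    int value = 0;
    std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> lit(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

std::unique_ptr<Expr> bin(char op, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

class IrEmitter {
public:
    // Appends instructions for `e` and returns the operand that holds its value.
    std::string emit(const Expr& e) {
        if (e.op == 0) return std::to_string(e.value);
        const std::string l = emit(*e.lhs);
        const std::string r = emit(*e.rhs);
        const std::string name = "%t" + std::to_string(next_++);
        body_ << "  " << name << " = " << opcode(e.op) << " i32 " << l << ", " << r << "\n";
        return name;
    }

    std::string body() const { return body_.str(); }

private:
    // Maps a source operator to the LLVM integer instruction.
    static const char* opcode(char op) {
        switch (op) {
            case '+': return "add";
            case '-': return "sub";
            case '*': return "mul";
            case '/': return "sdiv";
            default: throw std::invalid_argument("unsupported operator");
        }
    }

    std::ostringstream body_;
    int next_ = 0;
};

// Wraps the expression in a function that returns its value.
std::string emitModule(const Expr& e) {
    IrEmitter emitter;
    const std::string result = emitter.emit(e);
    return "define i32 @main() {\nentry:\n" + emitter.body() + "  ret i32 " + result + "\n}\n";
}
```

For the tree `(42 + 3) * 2`, `emitModule` produces an `add` instruction that defines `%t0`, a `mul` that consumes it, and a `ret` of the final temporary. Each AST node becomes at most one instruction, emitted after its operands.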
Technology Stack
| Technology | Purpose |
|---|---|
| C++17 | Core compiler implementation and memory management |
| LLVM 14+ | Intermediate representation and backend optimization |
| CMake | Cross-platform build system and dependency management |
| Visitor Pattern | AST traversal and manipulation |
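As an illustration of how the Visitor Pattern separates AST traversal from the node classes, here is a minimal sketch. The node and visitor names (`NumberNode`, `BinaryNode`, `PrintVisitor`, `EvalVisitor`) are hypothetical and do not come from the Yanki source, and integer values are an assumption.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct NumberNode;
struct BinaryNode;

// One visit overload per node type; each traversal is a Visitor subclass.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const NumberNode& n) = 0;
    virtual void visit(const BinaryNode& n) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void accept(Visitor& v) const = 0;
};

struct NumberNode : Node {
    int value;
    explicit NumberNode(int v) : value(v) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct BinaryNode : Node {
    char op;
    std::unique_ptr<Node> lhs, rhs;
    BinaryNode(char o, std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

// Traversal 1: render the tree as a fully parenthesized string.
struct PrintVisitor : Visitor {
    std::string out;
    void visit(const NumberNode& n) override { out += std::to_string(n.value); }
    void visit(const BinaryNode& n) override {
        out += "(";
        n.lhs->accept(*this);
        out += " ";
        out += n.op;
        out += " ";
        n.rhs->accept(*this);
        out += ")";
    }
};

// Traversal 2: evaluate the tree with integer arithmetic.
struct EvalVisitor : Visitor {
    int result = 0;
    void visit(const NumberNode& n) override { result = n.value; }
    void visit(const BinaryNode& n) override {
        n.lhs->accept(*this);
        const int l = result;
        n.rhs->accept(*this);
        const int r = result;
        switch (n.op) {
            case '+': result = l + r; break;
            case '-': result = l - r; break;
            case '*': result = l * r; break;
            case '/':
                if (r == 0) throw std::domain_error("division by zero");
                result = l / r;
                break;
            default: throw std::invalid_argument("unsupported operator");
        }
    }
};
```

The benefit of this design is that adding a new traversal, such as an IR generator, means adding one more `Visitor` subclass while the node classes stay unchanged.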
Yanki Language Features
Syntax Examples
// Variable declarations and assignments
x: 42;
y: 3;
z: 10;
result: x + y;
// Complex expressions
complex: (x + y) * (z - 5);
nested: ((x + 2) * (y - 1)) / (z + 3);
// Output statements
show: result;
show: complex;
// Program termination
exit;
Supported Operations
- Arithmetic: Addition, subtraction, multiplication, division
- Variable Management: Declaration and assignment
- Expression Evaluation: Nested parenthetical expressions
- Output Control: Display variable values
- Program Flow: Structured termination
Development Tools & Features
Debugging Capabilities
- Token Inspection: `--tokens` flag to examine lexical analysis output
- AST Visualization: `--ast-print` to display syntax tree structure
- LLVM IR Output: `--llvm-ir-print` to inspect intermediate representation
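A compiler driver could recognize these three flags along the lines of the sketch below. Only the flag names come from this README; the `Options` struct, the `parseArgs` function, and the treatment of any other argument as the input path are assumptions.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical option set mirroring the three debugging flags.
struct Options {
    bool printTokens = false;
    bool printAst = false;
    bool printLlvmIr = false;
    std::string inputPath;
};

// Parses the arguments that follow the program name.
Options parseArgs(const std::vector<std::string>& args) {
    Options opts;
    for (const std::string& arg : args) {
        if (arg == "--tokens") {
            opts.printTokens = true;
        } else if (arg == "--ast-print") {
            opts.printAst = true;
        } else if (arg == "--llvm-ir-print") {
            opts.printLlvmIr = true;
        } else if (arg.rfind("--", 0) == 0) {
            // Any other argument starting with "--" is rejected.
            throw std::invalid_argument("unknown flag: " + arg);
        } else {
            // Assumption: a non-flag argument names the source file.
            opts.inputPath = arg;
        }
    }
    return opts;
}
```

From `main`, the arguments can be passed as `parseArgs(std::vector<std::string>(argv + 1, argv + argc))`, after which each enabled flag selects which pipeline stage to dump.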
Build System
- CMake Integration: Cross-platform build configuration
- Modular Structure: Clean separation of lexer, parser, and transformer components
- Extensible Design: Easy addition of new language features and optimizations
Repository & Links
- Source Code: Yanki Compiler on GitHub