Yanki

Jan 29, 2025

Project Overview

Yanki is a compiler for a small expression language that implements the full compilation pipeline, from source text to native machine code emitted as object files ready for linking. Its modular design illustrates core compiler-construction principles, while optimization and target-specific code generation are delegated to LLVM.

Core Architecture

The compiler follows a traditional three-phase design:

  1. Frontend: Lexical analysis and syntax parsing
  2. Middle-end: AST construction and transformation
  3. Backend: LLVM IR generation and machine code optimization
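To make the phase boundaries concrete, here is a toy C++17 pipeline for inputs such as 1 + 2 + 39, a deliberately tiny stand-in for Yanki's grammar. Everything in it is hypothetical, including the SumAst type, the three phase functions and the @yanki_main symbol. It sketches how data could flow between phases and is not Yanki's code. The frontend fuses lexing and parsing, the middle-end applies constant folding as an example AST transformation, and the backend prints textual LLVM IR.

```cpp
#include <cctype>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical AST for a toy language of integer sums such as "1 + 2 + 39".
struct SumAst {
    std::vector<long long> terms;
};

// Frontend: lexing and parsing fused for brevity; rejects malformed input.
SumAst frontend(const std::string& src) {
    SumAst ast;
    bool expectNumber = true;  // alternate between operands and '+'
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (expectNumber && std::isdigit(c)) {
            std::size_t start = i;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) ++i;
            ast.terms.push_back(std::stoll(src.substr(start, i - start)));
            expectNumber = false;
        } else if (!expectNumber && c == '+') {
            ++i;
            expectNumber = true;
        } else {
            throw std::runtime_error("syntax error at offset " + std::to_string(i));
        }
    }
    if (expectNumber) throw std::runtime_error("unexpected end of input");
    return ast;
}

// Middle-end: an AST-to-AST transformation (constant folding).
SumAst foldConstants(const SumAst& ast) {
    SumAst folded;
    folded.terms.push_back(std::accumulate(ast.terms.begin(), ast.terms.end(), 0LL));
    return folded;
}

// Backend: lower the AST to a textual LLVM IR function, one add per extra term.
std::string backend(const SumAst& ast) {
    if (ast.terms.empty()) throw std::invalid_argument("empty AST");
    std::string ir = "define i64 @yanki_main() {\nentry:\n";
    std::string acc = std::to_string(ast.terms[0]);
    for (std::size_t k = 1; k < ast.terms.size(); ++k) {
        std::string tmp = "%sum" + std::to_string(k);
        ir += "  " + tmp + " = add i64 " + acc + ", " + std::to_string(ast.terms[k]) + "\n";
        acc = tmp;
    }
    ir += "  ret i64 " + acc + "\n}\n";
    return ir;
}
```

Without foldConstants, backend(frontend("1 + 2 + 39")) emits a chain of two add instructions; with it, the function reduces to a single ret i64 42. In a compiler built on LLVM, folding like this is often left to LLVM's own passes; it appears here only to give the middle-end a visible job.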

Compiler Pipeline

Lexical Analysis

  • Tokenization: Converts raw source code into meaningful tokens
  • Token Recognition: Identifies keywords, operators, literals, and identifiers
  • Error Handling: Provides detailed feedback for invalid tokens
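A hand-written lexer for the constructs shown on this page could look like the C++17 sketch below. It assumes that show and exit are reserved words, that // starts a line comment (as in the syntax examples), and that numbers are unsigned integer literals; the TokenKind and Token names are hypothetical. Each token records its line and column, which is one way to make error messages for invalid characters point at the exact source position.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Token kinds for the constructs shown on this page (hypothetical names).
enum class TokenKind {
    Identifier, Number, KwShow, KwExit,
    Colon, Semicolon, Plus, Minus, Star, Slash, LParen, RParen, End
};

struct Token {
    TokenKind kind;
    std::string text;
    int line;    // 1-based position of the token's first character,
    int column;  // kept so errors can point at the source
};

std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    int line = 1, column = 1;
    auto advance = [&] {  // consume one character, tracking line/column
        if (src[i] == '\n') { ++line; column = 1; } else { ++column; }
        ++i;
    };
    auto isIdentChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };
    while (i < src.size()) {
        char c = src[i];
        if (std::isspace(static_cast<unsigned char>(c))) { advance(); continue; }
        if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {  // line comment
            while (i < src.size() && src[i] != '\n') advance();
            continue;
        }
        const int startLine = line, startColumn = column;
        const std::size_t start = i;
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
            while (i < src.size() && isIdentChar(src[i])) advance();
            std::string word = src.substr(start, i - start);
            TokenKind kind = word == "show" ? TokenKind::KwShow
                           : word == "exit" ? TokenKind::KwExit
                                            : TokenKind::Identifier;
            tokens.push_back({kind, word, startLine, startColumn});
            continue;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i]))) advance();
            tokens.push_back({TokenKind::Number, src.substr(start, i - start), startLine, startColumn});
            continue;
        }
        TokenKind kind;
        switch (c) {
        case ':': kind = TokenKind::Colon; break;
        case ';': kind = TokenKind::Semicolon; break;
        case '+': kind = TokenKind::Plus; break;
        case '-': kind = TokenKind::Minus; break;
        case '*': kind = TokenKind::Star; break;
        case '/': kind = TokenKind::Slash; break;
        case '(': kind = TokenKind::LParen; break;
        case ')': kind = TokenKind::RParen; break;
        default:
            throw std::runtime_error("invalid character '" + std::string(1, c) + "' at line " +
                                     std::to_string(startLine) + ", column " + std::to_string(startColumn));
        }
        advance();
        tokens.push_back({kind, std::string(1, c), startLine, startColumn});
    }
    tokens.push_back({TokenKind::End, "", line, column});
    return tokens;
}
```

Treating show and exit as reserved words is an inference from the examples; a parser could just as well recognize them as ordinary identifiers in statement position.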

Syntax Analysis

  • Recursive Descent Parser: Constructs an abstract syntax tree (AST) from the token stream
  • Grammar Validation: Ensures syntactic correctness of input programs
  • AST Generation: Creates a structured representation for the later compilation phases
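Recursive descent maps each grammar rule to one function. The C++17 sketch below assumes the grammar written in its comments, which is inferred from the syntax examples rather than taken from Yanki's source, and, to stay self-contained, consumes a pre-split vector of token strings instead of real tokens. Expr, Parser and toSExpr are hypothetical names. Splitting expr and term into separate rules gives * and / higher precedence than + and -, and the while loops make each precedence level left-associative.

```cpp
#include <cctype>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: a leaf (number or identifier) or a binary operation.
struct Expr {
    std::string op;    // "+", "-", "*" or "/" for binary nodes, empty for leaves
    std::string leaf;  // literal or variable name when op is empty
    std::unique_ptr<Expr> lhs, rhs;
};

// Renders the tree as an S-expression, e.g. "(+ x 2)".
std::string toSExpr(const Expr& e) {
    if (e.op.empty()) return e.leaf;
    return "(" + e.op + " " + toSExpr(*e.lhs) + " " + toSExpr(*e.rhs) + ")";
}

// Grammar (assumed from the examples on this page):
//   statement := IDENT ':' expr ';'
//   expr      := term (('+' | '-') term)*
//   term      := factor (('*' | '/') factor)*
//   factor    := NUMBER | IDENT | '(' expr ')'
class Parser {
public:
    explicit Parser(std::vector<std::string> tokens) : toks_(std::move(tokens)) {}

    std::pair<std::string, std::unique_ptr<Expr>> parseStatement() {
        std::string name = next();
        expect(":");
        auto value = parseExpr();
        expect(";");
        return {name, std::move(value)};
    }

    std::unique_ptr<Expr> parseExpr() {
        auto node = parseTerm();
        while (peek() == "+" || peek() == "-") {
            std::string op = next();  // consume the operator first,
            auto rhs = parseTerm();   // then parse the right operand
            node = binary(op, std::move(node), std::move(rhs));
        }
        return node;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;

    std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : std::string(); }
    std::string next() { std::string t = peek(); if (pos_ < toks_.size()) ++pos_; return t; }
    void expect(const std::string& t) {
        if (peek() != t) throw std::runtime_error("expected '" + t + "', found '" + peek() + "'");
        ++pos_;
    }

    std::unique_ptr<Expr> parseTerm() {
        auto node = parseFactor();
        while (peek() == "*" || peek() == "/") {
            std::string op = next();
            auto rhs = parseFactor();
            node = binary(op, std::move(node), std::move(rhs));
        }
        return node;
    }

    std::unique_ptr<Expr> parseFactor() {
        if (peek() == "(") {
            ++pos_;
            auto inner = parseExpr();
            expect(")");
            return inner;
        }
        std::string t = next();
        if (t.empty() || !(std::isalnum(static_cast<unsigned char>(t[0])) || t[0] == '_'))
            throw std::runtime_error("unexpected token '" + t + "'");
        auto leaf = std::make_unique<Expr>();
        leaf->leaf = t;
        return leaf;
    }

    static std::unique_ptr<Expr> binary(const std::string& op, std::unique_ptr<Expr> l,
                                        std::unique_ptr<Expr> r) {
        auto n = std::make_unique<Expr>();
        n->op = op;
        n->lhs = std::move(l);
        n->rhs = std::move(r);
        return n;
    }
};
```

Fed the tokens of nested: ((x + 2) * (y - 1)) / (z + 3);, parseStatement returns the name nested and a tree that toSExpr renders as (/ (* (+ x 2) (- y 1)) (+ z 3)).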

Code Generation

  • AST Transformation: Converts the syntax tree into LLVM intermediate representation (IR)
  • LLVM Backend: Leverages LLVM's optimization passes and target-specific code generation
  • Machine Code Output: Produces optimized object files ready for linking
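The page does not show how Yanki drives LLVM. To avoid an LLVM dependency, the sketch below prints textual LLVM IR for a single expression instead of calling LLVM's C++ API; a real backend would typically build the same instructions with llvm::IRBuilder. Assumptions: all values are 64-bit signed integers (i64), / maps to sdiv, and each variable already has a stack slot named %name.addr. The Node type, its num/var/bin helpers and the IREmitter class are hypothetical. The ptr spelling is LLVM's opaque-pointer syntax, which became the default in LLVM 15.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal hypothetical AST used only for this sketch.
struct Node {
    enum Kind { Num, Var, Bin } kind;
    std::string text;  // digits, variable name, or operator symbol
    std::shared_ptr<Node> lhs, rhs;
};

std::shared_ptr<Node> num(long long v) {
    return std::make_shared<Node>(Node{Node::Num, std::to_string(v), nullptr, nullptr});
}
std::shared_ptr<Node> var(const std::string& name) {
    return std::make_shared<Node>(Node{Node::Var, name, nullptr, nullptr});
}
std::shared_ptr<Node> bin(const std::string& op, std::shared_ptr<Node> l, std::shared_ptr<Node> r) {
    return std::make_shared<Node>(Node{Node::Bin, op, std::move(l), std::move(r)});
}

// Post-order walk that appends one SSA instruction per operation and
// returns the operand (constant or %temp) that holds the node's value.
class IREmitter {
public:
    std::vector<std::string> lines;

    std::string emit(const Node& e) {
        switch (e.kind) {
        case Node::Num:
            return e.text;  // integer constants are used inline as operands
        case Node::Var: {
            // Assumes each variable lives in a stack slot named %<name>.addr
            std::string tmp = fresh();
            lines.push_back(tmp + " = load i64, ptr %" + e.text + ".addr");
            return tmp;
        }
        case Node::Bin: {
            std::string l = emit(*e.lhs);  // left operand first,
            std::string r = emit(*e.rhs);  // then right operand
            std::string tmp = fresh();
            lines.push_back(tmp + " = " + opcode(e.text) + " i64 " + l + ", " + r);
            return tmp;
        }
        }
        return {};
    }

private:
    int counter_ = 0;
    std::string fresh() { return "%t" + std::to_string(counter_++); }
    static std::string opcode(const std::string& op) {
        if (op == "+") return "add";
        if (op == "-") return "sub";
        if (op == "*") return "mul";
        return "sdiv";  // '/' as signed integer division (assumed semantics)
    }
};
```

For (x + y) * (z - 5) the emitter produces six instructions (three loads, then one add, one sub and one mul) and returns %t5 as the result operand.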

Technology Stack

Technology        Purpose
C++17             Core compiler implementation and memory management
LLVM 14+          Intermediate representation and backend optimization
CMake             Cross-platform build system and dependency management
Visitor Pattern   AST traversal and manipulation
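The Visitor pattern separates AST traversal logic from the node classes. In the minimal C++17 sketch below, each node's accept() calls back into the matching visit() overload (double dispatch), so a new pass can be added as a new visitor without touching the node types. All class names are hypothetical, and the Evaluator pass assumes 64-bit integer arithmetic and treats division by zero as an error; the page specifies neither.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

struct NumberExpr;
struct VariableExpr;
struct BinaryExpr;

// One visit() overload per concrete node type.
struct ExprVisitor {
    virtual ~ExprVisitor() = default;
    virtual void visit(const NumberExpr& e) = 0;
    virtual void visit(const VariableExpr& e) = 0;
    virtual void visit(const BinaryExpr& e) = 0;
};

struct Expr {
    virtual ~Expr() = default;
    // Double dispatch: the node's dynamic type selects the visit() overload.
    virtual void accept(ExprVisitor& v) const = 0;
};

struct NumberExpr : Expr {
    long long value;
    explicit NumberExpr(long long v) : value(v) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
};

struct VariableExpr : Expr {
    std::string name;
    explicit VariableExpr(std::string n) : name(std::move(n)) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
};

struct BinaryExpr : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinaryExpr(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
};

// Example pass: evaluates an expression against known variable values.
class Evaluator : public ExprVisitor {
public:
    explicit Evaluator(const std::map<std::string, long long>& vars) : vars_(vars) {}
    long long result = 0;

    void visit(const NumberExpr& e) override { result = e.value; }
    void visit(const VariableExpr& e) override {
        auto it = vars_.find(e.name);
        if (it == vars_.end()) throw std::runtime_error("undefined variable: " + e.name);
        result = it->second;
    }
    void visit(const BinaryExpr& e) override {
        e.lhs->accept(*this);
        const long long l = result;
        e.rhs->accept(*this);
        const long long r = result;
        switch (e.op) {
        case '+': result = l + r; break;
        case '-': result = l - r; break;
        case '*': result = l * r; break;
        case '/':
            if (r == 0) throw std::runtime_error("division by zero");
            result = l / r;  // C++ integer division truncates toward zero
            break;
        default: throw std::runtime_error(std::string("unknown operator ") + e.op);
        }
    }

private:
    const std::map<std::string, long long>& vars_;
};
```

With x = 42, y = 3 and z = 10, evaluating (x + y) * (z - 5) with this visitor yields 225. An AST printer or an IR generator would be further ExprVisitor subclasses over the same nodes.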

Yanki Language Features

Syntax Examples

// Variable declarations and assignments
x: 42;
y: 3;
z: 10;
result: x + y;

// Complex expressions
complex: (x + y) * (z - 5);
nested: ((x + 2) * (y - 1)) / (z + 3);

// Output statements
show: result;
show: complex;

// Program termination
exit;

Supported Operations

  • Arithmetic: Addition, subtraction, multiplication, division
  • Variable Management: Declaration and assignment
  • Expression Evaluation: Nested, parenthesized expressions
  • Output Control: Display variable values
  • Program Flow: Explicit termination with exit

Development Tools & Features

Debugging Capabilities

  • Token Inspection: --tokens flag to examine lexical analysis output
  • AST Visualization: --ast-print to display syntax tree structure
  • LLVM IR Output: --llvm-ir-print to inspect intermediate representation
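The three debugging flags could be handled by a small option parser such as the C++17 sketch below. The flag names come from this page; the Options struct, the rule of exactly one positional input path, and the rejection of unknown flags are assumptions for illustration. A main function would call it as parseOptions(std::vector<std::string>(argv, argv + argc)).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Debug switches named on this page; the struct layout is hypothetical.
struct Options {
    bool printTokens = false;   // --tokens
    bool printAst = false;      // --ast-print
    bool printLlvmIr = false;   // --llvm-ir-print
    std::string inputPath;      // single positional argument (assumed)
};

Options parseOptions(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 1; i < args.size(); ++i) {  // args[0] is the program name
        const std::string& a = args[i];
        if (a == "--tokens") opts.printTokens = true;
        else if (a == "--ast-print") opts.printAst = true;
        else if (a == "--llvm-ir-print") opts.printLlvmIr = true;
        else if (a.rfind("--", 0) == 0) throw std::invalid_argument("unknown flag: " + a);
        else if (opts.inputPath.empty()) opts.inputPath = a;
        else throw std::invalid_argument("more than one input file: " + a);
    }
    if (opts.inputPath.empty()) throw std::invalid_argument("no input file given");
    return opts;
}
```

In a driver, each switch would gate a dump right after the matching phase: tokens after lexing, the AST after parsing, and the IR after code generation.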

Build System

  • CMake Integration: Cross-platform build configuration
  • Modular Structure: Clean separation of lexer, parser, and transformer components
  • Extensible Design: Easy addition of new language features and optimizations
Ismail Drissi