A modern C++20 implementation that compiles Pascal source code to bytecode and executes it via a stack-based virtual machine.
This project implements a complete Pascal compiler and a stack-based virtual machine. The compiler translates Pascal source code into custom bytecode, which the VM then executes. The implementation follows modern C++20 practices, with a focus on separation of concerns and educational value.
Current Version: 0.1.0
License: MIT License (see LICENSE.txt)
- Modular Architecture: Compiler pipeline divided into distinct phases (Lexing, Parsing, Semantics, Code Generation)
- Type Safety: Comprehensive type checking and semantic analysis before code generation
- Modern C++20: Leveraging latest C++ features for cleaner, safer code
- Educational Focus: Clear structure suitable for learning compiler design and implementation
- Testing: Comprehensive unit tests for both compiler and VM components
- Int: Integer type for arithmetic operations
- Real: Floating-point numbers for non-integer arithmetic
- Char: Single character values
- Bool: Boolean values (true/false)
- String: String literals for text processing
- Arrays: Multi-dimensional arrays with flexible indexing types
- Records: Structured data types with named fields
- Subranges: Bounded ranges based on ordinal types
- Enumerations: User-defined ordinal types with named values
- if/else: Conditional execution
- case statements: Multi-way branching with constant labels
- while loops: Pre-condition iteration
- repeat/until loops: Post-condition iteration
- for loops: Counted iteration (with `to` and `downto`)
- goto with labels: Unconditional jumps for advanced control flow
- Support for both functions (returning values) and procedures
- Parameters: by-value only at the moment; by-reference (var) parameters are not yet supported
- Nested function definitions
- Forward declarations
- read(): Input for variables of various types
- write(): Output for expressions and literals
- Program header with identifier
- Declaration blocks: labels, constants, types, variables
- Function and procedure definitions
- Main statement block
- Modern C++20 compilation with CMake build system
- Comprehensive type checking with detailed error messages
- Symbol table management for scoping and name resolution
- Visitor pattern for AST traversal and code generation
- Bytecode generation from AST
- Stack-based virtual machine with 45+ opcodes
- Unit testing with GoogleTest framework
- Error reporting with line and column information
pascal/
├── compiler/
│   ├── src/
│   │   ├── Lexer.cpp/hpp        # Tokenization and token definitions
│   │   ├── Parser.cpp/hpp       # Recursive descent parser
│   │   ├── Ast.cpp/hpp          # Abstract syntax tree nodes with validation
│   │   ├── Semantics.cpp/hpp    # Type checking and symbol management
│   │   ├── Generator.cpp/hpp    # Bytecode code generation
│   │   ├── ValidationUtils.hpp
│   │   ├── Visitor.hpp          # Visitor pattern interfaces
│   │   └── Main.cpp             # Compiler CLI entry point
│   └── tests/                   # Compiler unit tests
├── vm/
│   ├── src/
│   │   ├── vm.cpp/hpp           # Virtual machine implementation
│   │   └── Main.cpp             # VM CLI entry point
│   └── tests/                   # VM unit tests
├── tests/                       # Example Pascal programs
│   ├── hello.pas                # User input with record types
│   ├── calc.pas                 # Interactive calculator
│   ├── fib.pas                  # Recursive Fibonacci
│   └── adv_calc.pas             # Expression parser
├── CMakeLists.txt               # Build configuration
├── gram.txt                     # Pascal grammar reference
└── LICENSE.txt                  # MIT license
Component Descriptions:
- compiler/src/Lexer: Tokenizes source code into lexical tokens with position tracking
- compiler/src/Parser: Recursive descent parser that builds Abstract Syntax Tree from tokens
- compiler/src/Ast: Defines all AST node types representing Pascal expressions/statements
- compiler/src/Semantics: Performs type checking, symbol table construction, and semantic validation
- compiler/src/Generator: Transforms AST into bytecode instructions using visitor pattern
- compiler/src/Visitor: Visitor interfaces for AST traversal and code generation
- vm/src/vm: Stack-based virtual machine that executes bytecode instructions
- gram.txt: Complete Pascal grammar documentation for reference
- tests/: Example Pascal programs demonstrating various language features
┌──────────────────────────────────────────────┐
│              Pascal Source Code              │
│                (program.pas)                 │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│                    Lexer                     │
│  • Character stream → Token stream           │
│  • Token: type, value, position              │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│                    Parser                    │
│  • Recursive descent parser                  │
│  • Tokens → Abstract Syntax Tree (AST)       │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│              Semantics Analysis              │
│  • Symbol table construction                 │
│  • Type checking and validation              │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│                Code Generator                │
│  • AST → Bytecode instructions               │
│  • Instruction: opcode + immediate values    │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│                Bytecode File                 │
│                (.bin format)                 │
└──────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐
│               Bytecode (.bin)                │
└──────────────────────┬───────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│  Program Counter (PC)                        │
│  Frame Pointer (FP)                          │
│                                              │
│  ┌────────────────────────────────────┐      │
│  │           Runtime Stack            │      │
│  │  ┌──────────────────────────────┐  │      │
│  │  │ Return Address               │  │      │
│  │  ├──────────────────────────────┤  │      │
│  │  │ Old Frame Pointer            │  │      │
│  │  ├──────────────────────────────┤  │      │
│  │  │ Local Variables              │  │      │
│  │  │ (function parameters)        │  │      │
│  │  └──────────────────────────────┘  │      │
│  └────────────────────────────────────┘      │
│                                              │
│  ┌────────────────────────────────────┐      │
│  │     Opcode Decoder & Executor      │      │
│  └────────────────────────────────────┘      │
└──────────────────────────────────────────────┘
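The frame layout in the diagram (return address, old frame pointer, then locals) can be illustrated with a small sketch. This is not the project's `vm.cpp`: the `Machine` struct, its member names, and the choice to store the program counter and frame pointer as 64-bit stack slots are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical machine state; the real VM may organise its stack differently.
struct Machine {
    std::vector<std::int64_t> stack;
    std::size_t pc = 0;  // program counter
    std::size_t fp = 0;  // frame pointer: index of the first local of the current frame

    // Enter a function: save the caller's PC and FP, then reserve zeroed locals.
    void call(std::size_t target, std::size_t locals) {
        stack.push_back(static_cast<std::int64_t>(pc));  // return address
        stack.push_back(static_cast<std::int64_t>(fp));  // old frame pointer
        fp = stack.size();
        stack.resize(stack.size() + locals, 0);
        pc = target;
    }

    // Leave a function: drop the frame and restore the caller's PC and FP.
    void ret() {
        assert(fp >= 2 && fp <= stack.size());
        const auto old_fp = static_cast<std::size_t>(stack[fp - 1]);
        const auto return_pc = static_cast<std::size_t>(stack[fp - 2]);
        stack.resize(fp - 2);
        fp = old_fp;
        pc = return_pc;
    }

    // Locals are addressed relative to the frame pointer.
    std::int64_t& local(std::size_t i) { return stack[fp + i]; }
};
```

Because every frame stores the previous frame pointer, nested and recursive calls unwind in the reverse order of entry without any bookkeeping outside the stack itself.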
Opcode Categories:
├── Stack Operations: PUSH_Q, PUSH_B, POP_Q, POP_B
├── Load/Store: LOAD_Q, LOAD_B, STORE_Q, STORE_B
├── Arithmetic: ADD_I, MUL_I, DIV_I, ADD_R, MUL_R, DIV_R
├── Comparison: CMP_I, CMP_R, CMP_C, LE, LT, EQ, NE, GT, GE
├── Logic: AND, OR, NOT
├── Control: JMP, JMP_TRUE, JMP_FALSE, CALL, RET
├── I/O: READ_I, WRITE_I, WRITE_CONST_S
└── Utility: HALT, DUPL_Q, DUPL_B, MODSTK
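To make the opcode categories concrete, here is a minimal fetch-decode-execute loop for a handful of them. The opcode names come from the list above, but the numeric values, the helper names (`emit`, `emit_push`, `run`), and the use of host byte order for immediates are assumptions of this sketch rather than details of the actual VM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Opcode names follow the README; the numeric values are made up for this sketch.
enum class Op : std::uint8_t { PUSH_Q = 0, ADD_I = 1, MUL_I = 2, HALT = 3 };

// Appends an instruction that has no immediate operand.
void emit(std::vector<std::uint8_t>& code, Op op) {
    code.push_back(static_cast<std::uint8_t>(op));
}

// Appends PUSH_Q followed by its 8-byte immediate (host byte order).
void emit_push(std::vector<std::uint8_t>& code, std::int64_t value) {
    emit(code, Op::PUSH_Q);
    std::uint8_t raw[sizeof value];
    std::memcpy(raw, &value, sizeof value);
    code.insert(code.end(), raw, raw + sizeof value);
}

// Runs the program and returns the value on top of the stack at HALT.
std::int64_t run(const std::vector<std::uint8_t>& code) {
    std::vector<std::int64_t> stack;
    std::size_t pc = 0;
    auto pop = [&stack] {
        if (stack.empty()) throw std::runtime_error("stack underflow");
        std::int64_t v = stack.back();
        stack.pop_back();
        return v;
    };
    while (pc < code.size()) {
        const Op op = static_cast<Op>(code[pc++]);
        switch (op) {
        case Op::PUSH_Q: {
            if (pc + 8 > code.size()) throw std::runtime_error("truncated immediate");
            std::int64_t imm = 0;
            std::memcpy(&imm, &code[pc], 8);
            pc += 8;
            assert(pc <= code.size());
            stack.push_back(imm);
            break;
        }
        case Op::ADD_I: { auto b = pop(); auto a = pop(); stack.push_back(a + b); break; }
        case Op::MUL_I: { auto b = pop(); auto a = pop(); stack.push_back(a * b); break; }
        case Op::HALT: return pop();
        default: throw std::runtime_error("unknown opcode");
        }
    }
    throw std::runtime_error("missing HALT");
}
```

Emitting `PUSH_Q 2`, `PUSH_Q 3`, `ADD_I`, `PUSH_Q 4`, `MUL_I`, `HALT` and passing the buffer to `run` evaluates `(2 + 3) * 4` on the operand stack.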
- Visitor Pattern: Used for AST traversal, with separate Expression, Statement, and Selector visitors (selectors cover record fields and array elements)
- Recursive Descent Parser: Grammar rules directly mapped to parsing methods
- RAII: Extensive use of unique_ptr/shared_ptr for resource management (with const raw pointers for non-owning access)
- Single Responsibility: Each component (Lexer/Parser/Semantics/Generator) handles a specific phase
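The Visitor and RAII points above can be sketched together. All class names here (`Expr`, `IntLiteral`, `BinaryOp`, `ExprVisitor`, `ListingGenerator`) are hypothetical stand-ins, not the types defined in `Ast.hpp` or `Visitor.hpp`, and the generator emits a textual listing instead of real bytecode to keep the example short.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct IntLiteral;
struct BinaryOp;

// One visit overload per node type.
struct ExprVisitor {
    virtual ~ExprVisitor() = default;
    virtual void visit(const IntLiteral& node) = 0;
    virtual void visit(const BinaryOp& node) = 0;
};

struct Expr {
    virtual ~Expr() = default;
    virtual void accept(ExprVisitor& v) const = 0;
};

struct IntLiteral : Expr {
    long value;
    explicit IntLiteral(long v) : value(v) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
};

// Children are owned through unique_ptr, so the tree frees itself (RAII).
struct BinaryOp : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinaryOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(ExprVisitor& v) const override { v.visit(*this); }
};

// Walks the tree in post-order and records one listing line per instruction.
struct ListingGenerator : ExprVisitor {
    std::vector<std::string> out;
    void visit(const IntLiteral& n) override {
        out.push_back("PUSH_Q " + std::to_string(n.value));
    }
    void visit(const BinaryOp& n) override {
        n.lhs->accept(*this);
        n.rhs->accept(*this);
        out.push_back(n.op == '+' ? "ADD_I" : "MUL_I");  // only '+' and '*' in this sketch
    }
};

std::vector<std::string> generate(const Expr& root) {
    ListingGenerator g;
    root.accept(g);
    assert(!g.out.empty());  // every expression yields at least one instruction
    return g.out;
}
```

Adding a new pass (for example, type checking) would mean writing another `ExprVisitor` subclass without touching the node classes, which is the main reason to prefer this pattern for a multi-phase compiler.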
- Binary file format with 8-byte size header
- Sequential instruction encoding without alignment
- Opcode format:
  - Opcodes: 1 byte
  - Immediate values: 1 or 8 bytes appended inline
  - String literals: Embedded directly with null termination
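The format rules above can be sketched as a pair of encode and decode helpers. Several details are assumptions, since the list does not specify them: that the 8-byte header holds the byte length of the instruction stream, that multi-byte values use host byte order, and the numeric value chosen for `WRITE_CONST_S`. The helper names are hypothetical as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical opcode value; only the 1-byte width comes from the format description.
constexpr std::uint8_t WRITE_CONST_S = 0x30;

// Appends a 64-bit value in host byte order (an assumption of this sketch).
void put_u64(Bytes& out, std::uint64_t v) {
    std::uint8_t raw[8];
    std::memcpy(raw, &v, 8);
    out.insert(out.end(), raw, raw + 8);
}

// Encodes WRITE_CONST_S followed by the literal and its terminating '\0'.
void put_const_string(Bytes& code, const std::string& text) {
    code.push_back(WRITE_CONST_S);
    code.insert(code.end(), text.begin(), text.end());
    code.push_back(0);
}

// Prefixes the instruction stream with its size as an 8-byte header.
Bytes make_image(const Bytes& code) {
    Bytes image;
    put_u64(image, code.size());
    image.insert(image.end(), code.begin(), code.end());
    assert(image.size() == 8 + code.size());
    return image;
}

// Reads the header back and returns the instruction stream it describes.
Bytes read_image(const Bytes& image) {
    if (image.size() < 8) throw std::runtime_error("missing size header");
    std::uint64_t size = 0;
    std::memcpy(&size, image.data(), 8);
    if (size != image.size() - 8) throw std::runtime_error("size mismatch");
    return Bytes(image.begin() + 8, image.end());
}
```

Because instructions are packed without alignment, a decoder has to know each opcode's operand width to find the next instruction; the null terminator plays that role for string literals.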
- CMake: Version 3.10.0 or higher
- C++ Compiler: GCC 10+, Clang 12+, or MSVC 2019+ with C++20 support
- Git: For cloning the repository
# Clone the repository
git clone <repository-url>
cd pascal
# Create build directory
mkdir build
cd build
# Configure and build
cmake ..
cmake --build .
Build outputs:
- compiler: Main compiler executable for compiling Pascal source files
- vm: Virtual machine executable for running bytecode files
- unit_tests_compiler: Compiler unit tests (GoogleTest based)
- unit_tests_vm: VM unit tests (GoogleTest based)
- Basic compilation:
  ./compiler program.pas
  Creates `program.bin` by default
- Custom output file:
  ./compiler program.pas -o output.bin
- Error reporting: Compiler provides detailed errors with line and column information
- Execute bytecode:
  ./vm program.bin
- File naming: VM accepts the filename with or without the `.bin` extension
- Write Pascal code in `my_program.pas`
- Compile to bytecode:
  ./compiler my_program.pas
- Run the compiled program:
  ./vm my_program.bin
- Troubleshooting: Check compiler output for syntax/semantic errors
- Compiler:
  - Expects 1 or 3 arguments: source file (with optional `-o output_file`)
  - Provides usage hints on error
  - Auto-adds `.pas` extension if not provided
- VM:
  - Expects exactly 1 argument: bytecode file
  - Auto-adds `.bin` extension if not provided
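The compiler's argument rules could be implemented along these lines with `std::filesystem`. This is a sketch, not the code in `Main.cpp`: `CompilerArgs`, `with_default_extension`, and `parse_compiler_args` are hypothetical names, and it assumes that `-o` must be the second argument and that the default output name is the source name with its extension swapped to `.bin`.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

struct CompilerArgs {
    fs::path source;
    fs::path output;
};

// Appends `ext` (for example ".pas") only when the path has no extension at all.
fs::path with_default_extension(fs::path p, const std::string& ext) {
    assert(!ext.empty());
    if (!p.has_extension()) p.replace_extension(ext);
    return p;
}

// `args` excludes the program name. Returns nullopt on a usage error,
// at which point the caller would print a usage hint.
std::optional<CompilerArgs> parse_compiler_args(const std::vector<std::string>& args) {
    if (args.size() != 1 && args.size() != 3) return std::nullopt;
    if (args.size() == 3 && args[1] != "-o") return std::nullopt;

    CompilerArgs parsed;
    parsed.source = with_default_extension(args[0], ".pas");
    if (args.size() == 3) {
        parsed.output = args[2];
    } else {
        parsed.output = parsed.source;
        parsed.output.replace_extension(".bin");
    }
    return parsed;
}
```

The same `with_default_extension` helper covers the VM side, where the single argument would be completed with `.bin` instead of `.pas`.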
hello.pas: Demonstrates string input via read(), record types with fields, and validation loops with repeat/until.
Features:
- Program structure with `const` declarations
- `var` with `record` containing `array` and fields
- User I/O with prompts and data reading
- Input validation with boolean logic
Run:
./compiler tests/hello.pas
./vm tests/hello.bin

calc.pas: Interactive calculator supporting arithmetic operations: +, -, *, /.
Features:
- Function definitions with boolean returns
- `while` loops for user interaction
- `case` statements for operation selection
- Boolean logic with logical operators
Run:
./compiler tests/calc.pas
./vm tests/calc.bin

fib.pas: Recursive Fibonacci function accepting input from 0 to 20.
Features:
- Function recursion
- Integer validation with range checks
- Function calls within expressions
- Mathematical computation
Run:
./compiler tests/fib.pas
./vm tests/fib.bin

adv_calc.pas: Mathematical expression parser supporting integers, parentheses, and operator precedence.
Features:
- Arrays for character-to-integer mapping (type conversion may be supported in the future)
- Forward declarations for functions
- Nested functions and procedures
- Comprehensive expression evaluation with precedence
- Lexing and parsing of mathematical expressions
- Infix-to-postfix conversion implementation
Run:
./compiler tests/adv_calc.pas
./vm tests/adv_calc.bin

Each example demonstrates specific language constructs and patterns for Pascal programming.
- Compiler tests only:
  ./unit_tests_compiler
- VM tests only:
  ./unit_tests_vm
- Lexer: Token generation and error handling, identifier recognition, literal parsing
- Parser: AST construction validation, grammar rule compliance, error detection
- Semantics: Type checking and scope management, symbol table construction, error detection
- Opcode execution: Arithmetic, logic, comparison, stack operations
- Stack management: Push, pop, load, store operations
- Function call/return: Frame pointer handling, parameter passing
- I/O operations: Input/output for various types
Program Structure:
program := PROGRAM ID ';' block '.'
Declaration Sections:
- Labels: `LABEL (INT | ID) { ',' (INT | ID) } ';'`
- Constants: `CONST NAME '=' constant ';' { NAME '=' constant ';' }`
- Types: `TYPE NAME '=' type ';' { NAME '=' type ';' }`
- Variables: `VAR id_list ':' type ';' { id_list ':' type ';' }`
Statement Types:
- Simple: assignment, procedure call, goto
- Structured: compound, conditional, repetitive
- Compound: `BEGIN statement_sequence END`
Expression Syntax:
- Relational: `simple_expression [ relational_operator simple_expression ]`
- Simple: `[ '+' | '-' ] term { addition_operator term }`
- Terms: `factor { multiplication_operator factor }`
- Factors: literals, variables, function calls, parenthesized expressions, unary NOT
Type Categories:
- Basic types: integer types, char, boolean, real
- Structured types: arrays, records
- Subranges: `constant '..' constant`
- Enumerations: `'(' id_list ')'`
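The expression rules above map naturally onto one parsing method per rule, which is the essence of recursive descent. The sketch below is a simplified illustration and not the project's `Parser.cpp`: `ExprParser` is a hypothetical class, it handles only integer literals, parentheses, and the operators `+ - * /`, it treats `/` as integer division for brevity, and it evaluates the expression directly instead of building an AST.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

class ExprParser {
public:
    explicit ExprParser(std::string text) : src_(std::move(text)) {}

    long parse() {
        long v = simple_expression();
        skip_spaces();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected trailing input");
        return v;
    }

private:
    // simple_expression := [ '+' | '-' ] term { ('+' | '-') term }
    long simple_expression() {
        bool negate = false;
        if (peek() == '+' || peek() == '-') negate = (next() == '-');
        long value = term();
        if (negate) value = -value;
        while (peek() == '+' || peek() == '-') {
            char op = next();
            long rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term := factor { ('*' | '/') factor }
    long term() {
        long value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = next();
            long rhs = factor();
            if (op == '/' && rhs == 0) throw std::runtime_error("division by zero");
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor := integer literal | '(' simple_expression ')'
    long factor() {
        if (peek() == '(') {
            next();
            long value = simple_expression();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            next();
            return value;
        }
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected a factor");
        long value = 0;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            value = value * 10 + (src_[pos_++] - '0');
        return value;
    }

    void skip_spaces() { while (pos_ < src_.size() && src_[pos_] == ' ') ++pos_; }
    char peek() { skip_spaces(); return pos_ < src_.size() ? src_[pos_] : '\0'; }
    // Callers only consume a character they have already seen via peek().
    char next() { char c = peek(); assert(pos_ < src_.size()); ++pos_; return c; }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Operator precedence falls out of the call structure: `term` is called from `simple_expression`, so multiplication binds tighter than addition without any explicit precedence table.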
- Arithmetic:
  - Integer: ADD_I, SUB_I, MUL_I, DIV_I
  - Real: ADD_R, SUB_R, MUL_R, DIV_R
  - Character: ADD_C, SUB_C, MUL_C, DIV_C
- Stack Operations: PUSH_Q, PUSH_B, PUSH_FP, POP_Q, POP_B, DUPL_Q, DUPL_B, MODSTK
- Load/Store: LOAD_Q, LOAD_B, STORE_Q, STORE_B
- Comparison: CMP_I, CMP_R, CMP_C, LE, LT, EQ, NE, GT, GE
- Logic: AND, OR, NOT
- Control: JMP, JMP_TRUE, JMP_FALSE, CALL, RET
- I/O: READ_I, READ_R, READ_C, READ_B, READ_S, WRITE_I, WRITE_R, WRITE_C, WRITE_B, WRITE_S, WRITE_CONST_S
- Utility: C2I (char to int conversion), HALT
- Enable debug output in VM: Edit `vm/src/CMakeLists.txt` and set `DEBUG_VM=1`, then rebuild and run with verbose output
- Low-level bytecode inspection: Use a debugger (e.g. gdb) to examine bytecode execution flow and compare expected vs actual instruction sequences
- Validate parsing: Check `gram.txt` for the correct grammar rules and add debug print statements in the Parser for rule matching
To add new language features:
- Tokens: Add to `Lexer.hpp` in the `TOKEN_TYPE` enum
- Grammar: Implement parsing rules in `Parser.cpp`
- AST: Create node definitions in `Ast.hpp/cpp`
- Semantics: Add type checking in `Semantics.cpp`
- Generator: Emit bytecode in `Generator.cpp`
- Visitor: Implement visitor methods for AST node types
License: MIT License (c) 2025 Anass Serroukh
Full license text available in LICENSE.txt file.
This project implements a comprehensive Pascal compiler and virtual machine designed for educational purposes and demonstrating compiler construction principles using modern C++20 practices.