Baa Compiler Internals

Version: 0.4.4.1

Target Architecture: x86-64 (AMD64)
Targets: Windows x64 (COFF/PE) + Linux x86-64 (ELF)
Calling Conventions: Microsoft x64 ABI (Windows) + SystemV AMD64 ABI (Linux)

This document details the internal architecture, data structures, and algorithms used in the Baa compiler.

Pipeline Architecture
Component Boundaries & Size Guard
Lexical Analysis
Syntactic Analysis
Abstract Syntax Tree
Semantic Analysis
Intermediate Representation
IR Mem2Reg Pass
IR Out-of-SSA Pass
IR SSA Verification
IR Well-Formedness Verification
IR Canonicalization Pass
IR CFG Simplification Pass
IR Data Layout Module
IR Constant Folding Pass
IR Dead Code Elimination Pass
IR Copy Propagation Pass
IR Common Subexpression Elimination Pass
IR Global Value Numbering Pass
IR Loop Invariant Code Motion Pass
IR Inlining Pass
IR Loop Unrolling Pass
Instruction Selection
Register Allocation
Code Generation
Global Data Section
Naming & Entry Point
IR Developer Guide
Code Review Checklist

1. Pipeline Architecture

The compiler is orchestrated by the Driver (src/driver/main.c), which serves as the entry point and build manager. It parses command-line arguments to decide which compilation stages to run.

1.1. Compilation Stages

flowchart LR
    A[".baa Source"] --> B["Lexer + Preprocessor"]
    B -->|Tokens| C["Parser"]
    C -->|AST| D["Semantic Analysis"]
    D -->|Validated AST| E["IR Lowering"]
    E -->|IR| F["Optimizer"]
    F -->|Optimized IR| G["Code Generator"]
    G --> H[".s Assembly"]
    H -->|GCC -c| I[".o Object"]
    I -->|GCC -o| J[".exe Executable"]
    
    style A fill:#e1f5fe
    style J fill:#c8e6c9
    style E fill:#fff3e0
    style F fill:#fff3e0

Stage	Input	Output	Component	Description
1. Frontend	`.baa` Source	AST	`lexer.c`, `parser.c`	Tokenizes, handles macros, and builds the syntax tree.
2. Analysis	AST	Valid AST	`analysis.c`	Semantic Pass: Checks types, scopes, and resolves symbols.
3. IR Lowering	AST	IR	`ir_lower.c` (v0.3.0.3+) + `ir_builder.c`	Converts AST expressions/statements to SSA-form Intermediate Representation using the IR Builder.
4. Optimization	IR	Optimized IR	`ir_optimizer.c`, `ir_mem2reg.c`, `ir_sccp.c`, `ir_gvn.c`, etc.	Full middle-end: Inlining (O2), Mem2Reg, Canon, InstCombine, SCCP, ConstFold, CopyProp, GVN (O2), CSE (O2), DCE, CFGSimplify, LICM.
5. Backend	IR	`.s` Assembly	`isel.c`, `regalloc.c`, `emit.c`	Lowers IR to machine instructions, allocates registers, and emits x86-64 AT&T assembly.
6. Assemble	`.s` Assembly	`.o` Object	`gcc -c`	Invokes external assembler. On Windows (v0.4.4.1), toolchain calls run via ASCII staging paths, then outputs are copied back to requested UTF-8 paths.
7. Link	`.o` Object	`.exe` Executable	`gcc`	Links with C Runtime. On Windows (v0.4.4.1), link inputs/outputs are staged on ASCII paths for GCC compatibility.

Note (v0.3.2.4+): The compiler uses the full IR-based backend pipeline end-to-end: AST → IR → Optimizer → ISel → RegAlloc → Emit → Assembly.

1.1.1. Component Map

flowchart TB
    Driver["Driver / CLI\nsrc/driver/main.c"] --> Lexer["Lexer + Preprocessor\nsrc/frontend/lexer.c"]
    Lexer --> Parser["Parser\nsrc/frontend/parser.c"]
    Parser --> Analyzer["Semantic Analysis\nsrc/frontend/analysis.c"]
    Analyzer --> Lower["IR Lowering\nsrc/middleend/ir_lower.c (v0.3.0.3+)"]
    Lower --> IR["IR Module\nsrc/middleend/ir.c (v0.3.0+)"]
    IR --> Backend["Backend\nsrc/backend/isel.c + src/backend/regalloc.c + src/backend/emit.c"]
    Backend --> GCC["External Toolchain\nMinGW-w64 gcc"]

    Driver --> Diagnostics["Diagnostics\nsrc/support/error.c"]
    Driver --> Updater["Updater\nsrc/support/updater.c (Windows-only)"]
    
    style Lower fill:#fff3e0
    style IR fill:#fff3e0
    style Backend fill:#fff3e0

1.2. The Driver (CLI)

The driver in main.c (v0.2.0+) supports multi-file compilation and various modes:

Flag	Mode	Output	Action
(Default)	Compile & Link	`.exe`	Runs full pipeline. Deletes intermediate `.s` and `.o` files.
`-o <file>`	Custom Output	`.exe`	Sets the linked output filename (default: `out.exe`).
(Multiple Files)	Multi-File Build	`.exe`	Compiles each `.baa` to `.o` and links them.
`-S`, `-s`	Assembly Only	`.s`	Stops after code emission. Writes `<input>.s` (or `-o` when a single input file is used).
`-c`	Compile Only	`.o`	Stops after assembling. Writes `<input>.o` (or `-o` when a single input file is used).
`-v`	Verbose	-	Prints commands and compilation time; keeps intermediate `.s` files.
`--debug-info`	Debug Info	`.s/.o/.exe`	Emits source `.file/.loc` info and passes `-g` to toolchain.
`--asm-comments`	Assembly Comments	`.s`	Emits explanatory comments in generated assembly (prologue/epilogue/blocks).
`-O0` / `-O1` / `-O2`	Optimization Level	-	Selects optimizer aggressiveness (`-O1` is default).
`--dump-ir`	IR Dump	stdout	Prints Baa IR (Arabic) after semantic analysis (v0.3.0.6+).
`--emit-ir`	IR Emit	`<input>.ir`	Writes Baa IR (Arabic) to a `.ir` file after semantic analysis (v0.3.0.7).
`--dump-ir-opt`	Optimized IR Dump	stdout	Prints Baa IR (Arabic) after optimization (v0.3.2.6.5).
`--verify`	Verify (All)	stderr	Runs `--verify-ir` + `--verify-ssa` (requires `-O1`/`-O2`) (v0.3.2.9.1).
`--verify-ir`	IR Verification	stderr	Verifies IR well-formedness (operands/types/terminators/phi/calls) after optimization and before Out-of-SSA/backend (v0.3.2.6.5).
`--verify-ssa`	SSA Verification	stderr	Verifies SSA invariants after Mem2Reg and before Out-of-SSA (requires `-O1`/`-O2`) (v0.3.2.5.3).
`--verify-gate`	Verifier Gate (Debug)	stderr	Runs `--verify-ir`/`--verify-ssa` after each optimizer iteration (requires `-O1`/`-O2`) (v0.3.2.6.5).
`--time-phases`	Phase Timings	stderr	Prints per-phase timing and IR arena memory stats (`[TIME]`/`[MEM]`) (v0.3.2.9.2).
`--target=<t>`	Target Select	`.s/.o/.exe`	Selects backend target: `x86_64-windows` or `x86_64-linux`.
`-fPIC` / `-fPIE`	Code Model (ELF)	`.s/.o/.exe`	Enables PIC/PIE-friendly emission on Linux/ELF.
`-fno-pic` / `-fno-pie`	Disable PIC/PIE	`.s/.o/.exe`	Disables PIC/PIE modes.
`-mcmodel=small`	Code Model	`.s/.o/.exe`	Uses small code model (only supported model).
`-fstack-protector` / `-fstack-protector-all` / `-fno-stack-protector`	Stack Protector (ELF)	`.s/.o/.exe`	Controls stack-canary emission on Linux/ELF.
`-funroll-loops`	Loop Unrolling (Opt-in)	-	Conservatively fully-unrolls small constant-trip-count loops after Out-of-SSA (v0.3.2.7.1).
`--version`	Version Info	stdout	Displays compiler version and build date.
`--help`, `-h`	Help	stdout	Shows usage information.
`update`	Self-Update	-	Downloads and installs the latest version.
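
As a rough illustration of how the mode flags above could map to a stopping point in the pipeline, here is a minimal sketch. The names StopStage and select_stop_stage are hypothetical and do not come from the driver source, and the rule that -S/-s takes priority over -c is an assumption made for this example only.

```c
#include <string.h>

/* Hypothetical names: the driver's real types are not shown in this document. */
typedef enum {
    STOP_AFTER_EMIT,      /* -S / -s: stop once the .s file is written */
    STOP_AFTER_ASSEMBLE,  /* -c: stop once the .o file is written */
    STOP_AFTER_LINK       /* default: run the full pipeline */
} StopStage;

/* Scan argv for the mode flags from the table above.
 * Assumption: the earliest stopping point wins when flags are combined. */
StopStage select_stop_stage(int argc, char **argv) {
    StopStage stage = STOP_AFTER_LINK;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-S") == 0 || strcmp(argv[i], "-s") == 0) {
            return STOP_AFTER_EMIT;
        }
        if (strcmp(argv[i], "-c") == 0) {
            stage = STOP_AFTER_ASSEMBLE;
        }
    }
    return stage;
}
```

A driver built this way would run the shared front half of the pipeline unconditionally and consult the returned value only before invoking the assembler and linker.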

1.2.1. Component Boundaries & Size Guard (v0.5.0 sidecar)

The source tree now uses physical component directories under src/, and the build targets those component source files directly:

Component	Current scope
Frontend	source loading, lexing, preprocessing, parsing, AST construction
Middle-End	semantic analysis, IR construction, IR verification, IR optimization
Backend	target-aware IR lowering, register allocation, assembly emission
Driver	CLI orchestration, staging/toolchain execution, updater entry points
Support	shared diagnostics and shared declarations

The full ownership/dependency contract now lives in Component Ownership.

Component-local internal facades are also in place for the main implementation roots:

src/frontend/frontend_internal.h
src/middleend/middleend_internal.h
src/backend/backend_internal.h
src/driver/driver_internal.h
src/support/support_internal.h

These headers are not public API; they centralize implementation-facing includes during the transition.

Size governance for handwritten modules is also active:

scripts/check_module_sizes.py scans src/**/*.c and src/**/*.h.
Warning threshold: 700 physical lines per file.
Error threshold: 1000 physical lines per file.
scripts/qa_run.py --mode full|stress runs the guard before the expensive QA stages.
CI runs the same guard before full QA on both Windows and Linux.

This remains a transitional hardening step: implementation files and component-owned headers now live under component directories, and local internal facade headers reduce direct cross-component include leakage.

Current in-place split pattern (2026-03-06):

parser.c now delegates to parser_types.c, parser_expr.c, parser_stmt.c, and parser_decl.c.
analysis.c now delegates to analysis_scope.c, analysis_types.c, analysis_semantic_utils.c, analysis_builtins.c, analysis_format.c, analysis_infer_expr.c, and analysis_visit.c.
lexer.c, isel.c, regalloc.c, ir.c, ir_text.c, ir_verify_ir.c, ir_lower.c, and emit.c also use companion implementation files to shrink the original hotspots while preserving their exported entry points.
scripts/module_size_allowlist.txt is currently empty; the size guard has no active legacy exceptions.
driver*.h and process.h now live under src/driver/.
emit.h, isel.h, regalloc.h, target.h, and code_model.h now live under src/backend/.
all ir*.h headers now live under src/middleend/.
lexer.h, ast.h, parser.h, and analysis.h now live under src/frontend/ as the frontend-owned public surface.
version.h, read_file.h, diagnostics.h, updater.h, and target_contract.h now live under src/support/ as the support-owned public surface.
src/baa.h is now a compatibility umbrella over those component-owned headers.
support/diagnostics.h no longer pulls in frontend lexer declarations directly; error_report(...) is a compatibility macro over error_report_loc(...).
the build intentionally uses no project-wide include directories; source files must use same-directory includes or explicit relative component paths.
the src/ root now effectively contains only the baa.h compatibility umbrella and resource files.

1.3. Diagnostic Engine

1.3.1. Benchmarking (v0.3.2.9.2)

The repository includes a small benchmark suite under bench/ and a runner script:

Runner: scripts/bench.py
Bench programs: bench/runtime_*.baa (compile+run) and bench/compile_*.baa (compile-time/memory)

Examples:

# Run all benchmarks (compile + runtime)
python3 scripts/bench.py --mode all

# Compile-only benchmark (compiler only, no assembler/linker)
python3 scripts/bench.py --mode compile_s --opt O2

# Memory profiling on Linux (uses /usr/bin/time -v)
python3 scripts/bench.py --mode mem --opt O2

# Include verifier and per-phase stats
python3 scripts/bench.py --mode compile_s --opt O2 --verify --time-phases

Notes:

The runner uses repo-relative paths to avoid toolchain quoting issues when the repo path contains spaces.
--time-phases prints [TIME]/[MEM] lines to stderr for machine parsing.

1.3.2. Regression Testing (v0.3.8)

Primary runner: scripts/qa_run.py
- --mode quick: integration smoke (tests/integration/**/*.baa via tests/test.py)
- --mode full: integration + regression + verify smoke + multi-file smoke
- --mode stress: full + tests/stress/*.baa + seeded fuzz-lite (timeout-guarded)
Legacy runners remain valid:
- tests/test.py (integration)
- tests/regress.py (integration + corpus + negatives)
On all hosts: docs-derived v0.2.x corpus runs under tests/corpus_v2x_docs/.
Test metadata markers (recognized by tests/test.py and tests/regress.py):
- // RUN: execution contract (expect-pass, expect-fail, runtime, compile-only, skip)
- // FLAGS: per-test compiler flags
- // EXPECT: negative diagnostic anchor(s)
- // ARGS: runtime executable arguments
- // STDIN: stdin lines for runtime tests (may repeat; joined with \n + trailing newline)
- // EXPECT-ASM: assembly substring expectations (used with -S compile-only tests)
Stress programs live under tests/stress/.

Windows build:

cmake -B build -G "MinGW Makefiles"
cmake --build build

python scripts\qa_run.py --mode full

Linux build:

cmake -B build-linux -DCMAKE_BUILD_TYPE=Release
cmake --build build-linux -j

python3 scripts/qa_run.py --mode full

The compiler uses a centralized Diagnostic Module (src/support/error.c) to handle errors and warnings.

Note (v0.3.2.9.4): semantic analysis errors now use error_report(...) as well, so semantic diagnostics include file:line:col and source context like parser errors.

Error Features:

Source Context: Prints the actual line of code where the error occurred.
Pointers: Uses ^ to point exactly to the offending token.
Colored Output: Errors displayed in red (ANSI) when terminal supports it (v0.2.8+).
Panic Mode Recovery (v0.3.7): When a syntax error is found, the parser does not exit immediately. It reports the error, enters panic mode, then synchronizes by context:
1. Statement mode: sync on ., } and statement starters.
2. Declaration mode: sync on declaration starters (صحيح, نص, هيكل, اتحاد, ثابت, ساكن, ...).
3. Switch mode: sync on حالة, افتراضي, } and statement terminators.
4. Parsing resumes after the nearest valid anchor to reduce cascading diagnostics.

Warning Features (v0.2.8+):

Non-fatal: Warnings do not stop compilation by default.
Colored Output: Warnings displayed in yellow (ANSI) when terminal supports it.
Warning Names: Each warning shows its type in brackets: [-Wunused-variable].
Configurable: Enable with -Wall or specific -W<type> flags.
Errors Mode: Use -Werror to treat warnings as fatal errors.
Numeric Diagnostics (v0.3.5.5): -Wimplicit-narrowing and -Wsigned-unsigned-compare.

ANSI Color Support:

Windows 10+: Automatically enables Virtual Terminal Processing.
Unix/Linux: Detects TTY via isatty().
Override with -Wcolor (force on) or -Wno-color (force off).

2. Lexical Analysis

The Lexer (src/frontend/lexer.c) transforms raw bytes into Token structures.

2.1. Internal Structure

The Lexer supports nested includes (via a state stack) and macro definitions.

// Represents the state of a single file being parsed
typedef struct {
    char* source;       // Full source code buffer (owned by this state)
    char* cur_char;     // Current reading pointer
    const char* filename; // Name of the current file
    int line;
    int col;
} LexerState;

// Definition (Macro)
typedef struct {
    char* name;  // Macro name
    char* value; // Replacement value
} Macro;

// The main Lexer context
typedef struct {
    // Current state
    LexerState state;

    // Include stack
    LexerState stack[10]; // Maximum include depth: 10
    int stack_depth;

    // Preprocessor state
    Macro macros[100];    // Macro table (maximum 100)
    int macro_count;      // Number of defined macros
    bool skipping;        // Are we in skipping mode? (derived from the conditional stack)

    // Conditional stack (#إذا_عرف/#وإلا/#نهاية) to support correct nesting
    struct {
        unsigned char parent_active;
        unsigned char cond_true;
        unsigned char in_else;
    } if_stack[32];
    int if_depth;
} Lexer;

Lexer Limits:

Limit	Value	Description
Max Include Depth	10	`stack[10]` - maximum nested `#تضمين`
Max Macros	100	`macros[100]` - maximum `#تعريف` macros
Max Conditional Nesting	32	`if_stack[32]` - maximum nested `#إذا_عرف`

2.2. Preprocessor Logic

The preprocessor is integrated directly into the lexer_next_token function. It intercepts directives starting with # before tokenizing normal code.

2.2.1. Definitions (`#تعريف`)

When #تعريف NAME VALUE is encountered:

The name and value are parsed as strings.
They are stored in the macros table.
When the Lexer later encounters an IDENTIFIER:
- It checks the macro table
- If found, replaces the token's value with the macro value
- Updates the token type based on the value (INT if numeric, STRING if quoted, IDENTIFIER otherwise)
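
The lookup and retyping steps above can be sketched as follows. This is an illustration under stated assumptions, not the lexer's actual code: MacroDef, MacroTokKind, macro_lookup, and classify_macro_value are hypothetical names, and the sketch assumes digits have already been normalized to ASCII.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-ins for the lexer's macro entry and token kinds. */
typedef struct { const char *name; const char *value; } MacroDef;
typedef enum { MTOK_INT, MTOK_STRING, MTOK_IDENTIFIER } MacroTokKind;

/* Linear search of the macro table; returns the replacement text or NULL. */
const char *macro_lookup(const MacroDef *table, int count, const char *name) {
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].name, name) == 0) return table[i].value;
    }
    return NULL;
}

/* Decide the token type from the replacement text:
 * quoted text is a STRING, all-digit text is an INT, anything else an IDENTIFIER. */
MacroTokKind classify_macro_value(const char *value) {
    if (value[0] == '"') return MTOK_STRING;
    if (value[0] == '\0') return MTOK_IDENTIFIER;
    for (const char *p = value; *p; p++) {
        if (!isdigit((unsigned char)*p)) return MTOK_IDENTIFIER;
    }
    return MTOK_INT;
}
```

With a table capped at 100 entries, a linear scan like this stays cheap enough that no hashing is needed.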

2.2.2. Conditionals (`#إذا_عرف`)

When #إذا_عرف NAME is encountered:

The lexer checks if NAME exists in the macro table.
If it exists, normal parsing continues.
If not, the lexer enters Skipping Mode.
In Skipping Mode, all tokens are discarded until #وإلا or #نهاية is found.

2.2.3. Undefine (`#الغاء_تعريف`)

When #الغاء_تعريف NAME is encountered:

The lexer searches for NAME in the macro table.
If found, the entry is removed (by shifting subsequent entries).
If not found, the directive is ignored.
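
Removal by shifting can be sketched like this. The MacroEntry type and macro_undef function are hypothetical; the sketch also ignores ownership of the name and value strings, which a real implementation would presumably need to release.

```c
#include <string.h>

/* Hypothetical macro entry; string ownership is ignored in this sketch. */
typedef struct { const char *name; const char *value; } MacroEntry;

/* Remove `name` from the table by shifting later entries down one slot.
 * Returns 1 if an entry was removed, 0 if the name was not found. */
int macro_undef(MacroEntry *table, int *count, const char *name) {
    for (int i = 0; i < *count; i++) {
        if (strcmp(table[i].name, name) == 0) {
            size_t tail = (size_t)(*count - i - 1);
            memmove(&table[i], &table[i + 1], tail * sizeof table[0]);
            (*count)--;
            return 1;
        }
    }
    return 0; /* not found: the directive is silently ignored */
}
```

Shifting keeps the table dense, so later lookups can keep scanning indices 0 through count - 1 without checking for holes.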

2.2.4. Include (`#تضمين`)

When #تضمين "file" is encountered:

The filename is extracted from the quoted string.
Include resolution tries, in order:
- source-file directory (<source_dir>/<path>),
- exact path as written,
- {BAA_HOME}/<path> (for relative paths),
- CLI include paths from -I (in the user-provided order),
- for bare names: <source_dir>/stdlib/<name>, stdlib/<name>, {BAA_STDLIB}/<name>, {BAA_HOME}/stdlib/<name>.
The first successful candidate is normalized to a canonical active path.
The normalized path is checked against the current include stack to reject cycles early.
The selected file is read into memory.
The current lexer state is pushed onto the include stack.
The lexer state is updated to point to the new file's content.
When EOF is reached, the previous state is popped and restored.
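
The cycle check and depth limit in the steps above might look roughly like the following. IncludeChain, include_enter, and include_leave are hypothetical names, and the sketch assumes paths have already been normalized so that a plain string comparison is enough.

```c
#include <string.h>

#define MAX_INCLUDE_DEPTH 10  /* mirrors the stack[10] limit */

/* Hypothetical record of the files currently being lexed, outermost first. */
typedef struct {
    const char *active[MAX_INCLUDE_DEPTH];
    int depth;
} IncludeChain;

typedef enum { INCLUDE_OK, INCLUDE_CYCLE, INCLUDE_TOO_DEEP } IncludeResult;

/* Reject a file that is already active (a cycle) or that would exceed the
 * depth limit; otherwise record it as the innermost active file. */
IncludeResult include_enter(IncludeChain *c, const char *canonical_path) {
    for (int i = 0; i < c->depth; i++) {
        if (strcmp(c->active[i], canonical_path) == 0) return INCLUDE_CYCLE;
    }
    if (c->depth >= MAX_INCLUDE_DEPTH) return INCLUDE_TOO_DEEP;
    c->active[c->depth++] = canonical_path;
    return INCLUDE_OK;
}

/* Called at EOF of an included file to restore the previous state. */
void include_leave(IncludeChain *c) {
    if (c->depth > 0) c->depth--;
}
```

Because only the active chain is checked, the same header can still be included again later from a sibling file; only a file that includes itself, directly or indirectly, is rejected.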

2.2.5. Conditional Stack Implementation

The preprocessor supports nested conditionals via if_stack[32]:

Field	Purpose
`parent_active`	Was the parent conditional block active?
`cond_true`	Is the current condition (or branch) true?
`in_else`	Are we currently in the `#وإلا` (else) branch?

Nesting rules:

Maximum 32 nested conditional levels
#إذا_عرف pushes a new level onto if_stack
#وإلا toggles cond_true within the current level
#نهاية pops the current level
skipping mode is computed from the stack state
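
A minimal sketch of how skipping could be derived from the stack state, using the three fields described above. CondStack and the cond_* functions are hypothetical names; rejecting a second #وإلا at the same level is an assumption of this sketch.

```c
#include <stdbool.h>

#define IF_STACK_MAX 32

typedef struct {
    unsigned char parent_active;
    unsigned char cond_true;
    unsigned char in_else;
} IfFrame;

/* Hypothetical container; the real fields live directly in Lexer. */
typedef struct {
    IfFrame frames[IF_STACK_MAX];
    int depth;
    bool skipping;
} CondStack;

/* Skipping is derived: a level is active only if its parent was active
 * and its current branch is true. */
static void cond_recompute(CondStack *s) {
    if (s->depth == 0) { s->skipping = false; return; }
    const IfFrame *top = &s->frames[s->depth - 1];
    s->skipping = !(top->parent_active && top->cond_true);
}

/* #إذا_عرف: push a level. Returns false when the nesting limit is hit. */
bool cond_push(CondStack *s, bool name_defined) {
    if (s->depth >= IF_STACK_MAX) return false;
    IfFrame *f = &s->frames[s->depth++];
    f->parent_active = (unsigned char)!s->skipping; /* state before this push */
    f->cond_true = (unsigned char)name_defined;
    f->in_else = 0;
    cond_recompute(s);
    return true;
}

/* #وإلا: toggle the branch. Returns false if unmatched or repeated. */
bool cond_else(CondStack *s) {
    if (s->depth == 0 || s->frames[s->depth - 1].in_else) return false;
    IfFrame *f = &s->frames[s->depth - 1];
    f->cond_true = (unsigned char)!f->cond_true;
    f->in_else = 1;
    cond_recompute(s);
    return true;
}

/* #نهاية: pop a level. Returns false if there is nothing to pop. */
bool cond_end(CondStack *s) {
    if (s->depth == 0) return false;
    s->depth--;
    cond_recompute(s);
    return true;
}
```

Storing parent_active at push time is what makes nesting work: a block nested inside an inactive region stays skipped even when its own condition, or its else branch, is true.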

2.3. Key Features

Feature	Description
UTF-8 Handling	Full Unicode support for Arabic text
Strict UTF-8 Validation (v0.3.7)	Rejects invalid UTF-8 sequences in identifiers and string/char literals
BOM Detection	Skips `0xEF 0xBB 0xBF` if present
Arabic Numerals	Normalizes `٠`-`٩` → `0`-`9`
Arabic Punctuation	Handles `؛` (semicolon) `0xD8 0x9B`
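
Two of the features above are easy to show at the byte level. The sketch below skips a UTF-8 BOM and maps an Arabic-Indic digit (U+0660 to U+0669, encoded in UTF-8 as 0xD9 0xA0 to 0xD9 0xA9) to its ASCII equivalent. The function names are hypothetical and are not taken from the lexer source.

```c
#include <stddef.h>

/* Return a pointer past the UTF-8 BOM (0xEF 0xBB 0xBF) if one is present.
 * The short-circuit && never reads past the terminating NUL. */
const char *skip_utf8_bom(const char *src) {
    const unsigned char *p = (const unsigned char *)src;
    if (p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return src + 3;
    return src;
}

/* If `s` starts with an ASCII or Arabic-Indic digit, return the ASCII digit
 * and store the number of bytes consumed in *len. Otherwise return 0 with
 * *len set to 0. */
char normalize_digit(const char *s, size_t *len) {
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] >= '0' && p[0] <= '9') {
        *len = 1;
        return (char)p[0];
    }
    if (p[0] == 0xD9 && p[1] >= 0xA0 && p[1] <= 0xA9) {
        *len = 2;
        return (char)('0' + (p[1] - 0xA0));
    }
    *len = 0;
    return 0;
}
```

Normalizing at the lexer level means later stages only ever see ASCII digits in integer tokens.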

2.4. Token Types

Keywords:    صحيح, ص٨, ص١٦, ص٣٢, ص٦٤, ط٨, ط١٦, ط٣٢, ط٦٤, عشري, عشري٣٢, حرف, نص, منطقي, عدم, حجم, نوع, ثابت, ساكن, إذا, وإلا, طالما, لكل, اختر, حالة, افتراضي, اطبع, اقرأ, إرجع, توقف, استمر, تعداد, هيكل, اتحاد
Literals:    INTEGER, STRING, CHAR, TRUE, FALSE
Operators:   + - * / % ++ -- ! ~ && || & | ^ << >>
Comparison:  == != < > <= >=
Delimiters:  ( ) { } [ ] , . : ؛
Special:     IDENTIFIER, EOF

Note: ثابت (const) was added in v0.2.7. ساكن (static storage) was added in v0.3.7.5.

3. Syntactic Analysis

The Parser (src/frontend/parser.c) builds the AST using Recursive Descent with 2-token lookahead.

3.0. Parser Structure

typedef struct {
    Lexer* lexer;       // Reference to the lexer for token stream
    Token current;      // Current token (lookahead)
    Token next;         // Next token (2-token lookahead)
    bool panic_mode;    // Panic mode for error recovery
    bool had_error;     // Did an error occur during parsing?
} Parser;

Parser Type Alias Registry:

The parser maintains its own type alias registry (separate from semantic analysis):

#define PARSER_MAX_TYPE_ALIASES 256

typedef struct {
    char* name;
    DataType target_type;
    char* target_type_name;
    DataType target_ptr_base_type;
    char* target_ptr_base_type_name;
    int target_ptr_depth;
    FuncPtrSig* target_func_sig; // Owned (may be NULL)
} ParserTypeAlias;

Panic Mode Recovery (v0.3.7):

When a syntax error is detected, panic_mode is set to true
The parser synchronizes on statement terminators (., }) or declaration starters
This prevents cascading error messages from a single syntax error
After synchronization, normal parsing resumes

Synchronization Modes:

typedef enum {
    PARSER_SYNC_STATEMENT = 0,    // Statement-level sync
    PARSER_SYNC_DECLARATION = 1,  // Declaration-level sync
    PARSER_SYNC_SWITCH = 2        // Switch-case sync
} ParserSyncMode;

3.1. Grammar (BNF)

Program       ::= Declaration* EOF
Declaration   ::= FuncDecl | GlobalVarDecl | GlobalArrayDecl | EnumDecl | StructDecl | UnionDecl | TypeAliasDecl

FuncDecl      ::= Type ID "(" ParamList ")" Block
                | Type ID "(" ParamList ")" "."    // Prototype (v0.2.5+)
GlobalVarDecl ::= DeclMods TypeSpec ID ("=" Expr)? "."
GlobalArrayDecl ::= DeclMods "صحيح" ID "[" INT "]" ArrayInit? "."   // v0.3.3+
EnumDecl      ::= "تعداد" ID "{" EnumMembers? "}"                    // v0.3.4+
StructDecl    ::= "هيكل" ID "{" FieldDecl* "}"                      // v0.3.4+
UnionDecl     ::= "اتحاد" ID "{" FieldDecl* "}"                     // v0.3.4.5+
TypeAliasDecl ::= "نوع" ID "=" TypeSpec "."                         // v0.3.6.5+

DeclMod       ::= "ثابت" | "ساكن"
DeclMods      ::= DeclMod*
TypeSpec      ::= Type | EnumType | StructType | UnionType | AliasType
Type          ::= "صحيح" | "ص٨" | "ص١٦" | "ص٣٢" | "ص٦٤"
                | "ط٨" | "ط١٦" | "ط٣٢" | "ط٦٤"
                | "عشري" | "عشري٣٢" | "حرف" | "نص" | "منطقي" | "عدم"
EnumType      ::= "تعداد" ID
StructType    ::= "هيكل" ID
UnionType     ::= "اتحاد" ID
AliasType     ::= ID                                    // Resolved via parser alias registry

Block         ::= "{" Statement* "}"
Statement     ::= VarDecl | ArrayDecl | Assign | ArrayAssign | MemberAssign
                | If | Switch | While | For | Return | Print | Read | CallStmt
                | Break | Continue

VarDecl       ::= DeclMods TypeSpec ID ("=" Expr)? "."   // initializer optional for static storage
ArrayDecl     ::= DeclMods "صحيح" ID "[" INT "]" ArrayInit? "."  // v0.3.3+

EnumMembers   ::= ID (COMMA ID)* COMMA?
FieldDecl     ::= "ثابت"? TypeSpec ID "."

ArrayInit     ::= "=" "{" (Expr (COMMA Expr)* COMMA?)? "}"
COMMA         ::= "," | "،"
Assign        ::= ID "=" Expr "."
ArrayAssign   ::= ID "[" Expr "]" "=" Expr "."
MemberAssign  ::= MemberAccess "=" Expr "."

MemberAccess  ::= Primary ":" ID (":" ID)*

If            ::= "إذا" "(" Expr ")" Block ("وإلا" (Block | If))?
Switch        ::= "اختر" "(" Expr ")" "{" Case* Default? "}"
Case          ::= "حالة" (INT | CHAR) ":" Statement*
Default       ::= "افتراضي" ":" Statement*

While         ::= "طالما" "(" Expr ")" Block
For           ::= "لكل" "(" Init? "؛" Expr? "؛" Update? ")" Block
Break         ::= "توقف" "."
Continue      ::= "استمر" "."
Return        ::= "إرجع" Expr? "."
Print         ::= "اطبع" Expr "."
Read          ::= "اقرأ" ID "."

3.2. Expression Precedence

Implemented via precedence climbing:

Logical OR   ::= Logical AND { "||" Logical AND }
Logical AND  ::= Bitwise OR { "&&" Bitwise OR }
Bitwise OR   ::= Bitwise XOR { "|" Bitwise XOR }
Bitwise XOR  ::= Bitwise AND { "^" Bitwise AND }
Bitwise AND  ::= Equality { "&" Equality }
Equality     ::= Relational { ("==" | "!=") Relational }
Relational   ::= Shift { ("<" | ">" | "<=" | ">=") Shift }
Shift        ::= Additive { ("<<" | ">>") Additive }
Additive     ::= Multiplicative { ("+" | "-") Multiplicative }
Multiplicative ::= Unary { ("*" | "/" | "%") Unary }
Unary        ::= ("!" | "~" | "-" | "++" | "--") Unary | Postfix
Postfix      ::= Primary { "++" | "--" }
Primary      ::= INT | STRING | CHAR | ID | ArrayAccess | Call | "حجم" "(" (TypeSpec | Expr) ")" | "(" Expr ")"
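
To make the precedence-climbing idea concrete, here is a small self-contained evaluator over a subset of the operators above (|, ^, &, +, -, *, /, %), keeping the same relative binding order as the grammar. It is a sketch of the general technique, not the Baa parser: it evaluates integers directly instead of building AST nodes, and every name in it is hypothetical.

```c
#include <ctype.h>

typedef struct { const char *p; } Cursor;

static void skip_ws(Cursor *c) { while (*c->p == ' ') c->p++; }

/* Binding power per operator; 0 means "not a binary operator". */
static int prec_of(char op) {
    switch (op) {
    case '|': return 1;
    case '^': return 2;
    case '&': return 3;
    case '+': case '-': return 4;
    case '*': case '/': case '%': return 5;
    default:  return 0;
    }
}

static long parse_expr(Cursor *c, int min_prec);

/* Primary: a decimal integer or a parenthesized expression. */
static long parse_primary(Cursor *c) {
    skip_ws(c);
    if (*c->p == '(') {
        c->p++;
        long v = parse_expr(c, 1);
        skip_ws(c);
        if (*c->p == ')') c->p++;
        return v;
    }
    long v = 0;
    while (isdigit((unsigned char)*c->p)) v = v * 10 + (*c->p++ - '0');
    return v;
}

/* Consume operators whose precedence is at least min_prec. Recursing with
 * prec + 1 on the right-hand side makes every operator left-associative. */
static long parse_expr(Cursor *c, int min_prec) {
    long lhs = parse_primary(c);
    for (;;) {
        skip_ws(c);
        char op = *c->p;
        int prec = prec_of(op);
        if (prec == 0 || prec < min_prec) return lhs;
        c->p++;
        long rhs = parse_expr(c, prec + 1);
        switch (op) {
        case '|': lhs |= rhs; break;
        case '^': lhs ^= rhs; break;
        case '&': lhs &= rhs; break;
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        case '/': lhs = rhs ? lhs / rhs : 0; break; /* guard division by zero */
        case '%': lhs = rhs ? lhs % rhs : 0; break;
        }
    }
}

long eval_expr(const char *src) {
    Cursor c = { src };
    return parse_expr(&c, 1);
}
```

A single table-driven loop like this recognizes the same language as the layered grammar rules; adding a precedence level means adding a table row rather than a new function.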

3.3. Error Handling Strategy

The parser uses synchronize() to recover from errors.

Example Scenario:

صحيح س = ١٠  // Error: Missing dot
صحيح ص = ٢٠.

Parser expects . but finds صحيح.
report_error() is called.
synchronize() is called. It skips until it sees صحيح (start of next statement).
Parser continues parsing صحيح ص = ٢٠..
At the end, compiler exits with status 1 if any errors were found.
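
The statement-mode skip in the scenario above can be sketched over a flat token array. SyncTok and synchronize_statement are hypothetical, only one statement starter is modeled, and the choice to resume after a dot but at a closing brace or starter is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical token kinds, just enough for the scenario above. */
typedef enum {
    ST_DOT, ST_RBRACE, ST_KW_INT, ST_IDENT, ST_ASSIGN, ST_INT_LIT, ST_EOF
} SyncTok;

/* Only صحيح is modeled as a statement starter, for brevity. */
static int is_statement_starter(SyncTok k) { return k == ST_KW_INT; }

/* Skip tokens from `pos` until a synchronization anchor is reached and
 * return the index where parsing should resume. The array must end with
 * ST_EOF. */
size_t synchronize_statement(const SyncTok *toks, size_t pos) {
    while (toks[pos] != ST_EOF) {
        if (toks[pos] == ST_DOT) return pos + 1;  /* resume after the terminator */
        if (toks[pos] == ST_RBRACE || is_statement_starter(toks[pos])) {
            return pos;                           /* resume at the anchor itself */
        }
        pos++;
    }
    return pos;
}
```

In the scenario above the error is detected while the current token is already صحيح, so this sketch resumes immediately and the second declaration parses normally.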

Type alias parsing note (v0.3.6.5):

نوع is handled as a contextual keyword in parser declarations.
This preserves existing identifier/member usages like س:نوع while still supporting نوع اسم = ... at top-level.

4. Abstract Syntax Tree

The AST uses a tagged union structure for type-safe node representation.

4.1. Node Types

Category	Node Types
Structure	`NODE_PROGRAM`, `NODE_FUNC_DEF`, `NODE_BLOCK`, `NODE_TYPE_ALIAS`
Variables	`NODE_VAR_DECL`, `NODE_ASSIGN`, `NODE_VAR_REF`
Array Decls	`NODE_ARRAY_DECL`, `NODE_ARRAY_ACCESS`, `NODE_ARRAY_ASSIGN`
Member Access	`NODE_MEMBER_ACCESS`, `NODE_MEMBER_ASSIGN`, `NODE_DEREF_ASSIGN`
Control Flow	`NODE_IF`, `NODE_WHILE`, `NODE_FOR`, `NODE_RETURN`
Branching	`NODE_SWITCH`, `NODE_CASE`, `NODE_BREAK`, `NODE_CONTINUE`
Expressions	`NODE_BIN_OP`, `NODE_UNARY_OP`, `NODE_POSTFIX_OP`, `NODE_CALL_EXPR`
Literals	`NODE_INT`, `NODE_STRING`, `NODE_CHAR`, `NODE_BOOL`, `NODE_NULL`
Calls & I/O	`NODE_CALL_STMT`, `NODE_PRINT`, `NODE_READ`

4.2. Node Structure

typedef struct Node {
    NodeType type;      // Discriminator
    struct Node* next;  // Linked list for siblings
    union { ... } data; // Type-specific payload
} Node;
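
The tagged-union pattern is easiest to see on a cut-down example. The MiniNode type below is hypothetical and far smaller than the real Node; its payload fields are invented for illustration and do not reflect the contents of the elided union.

```c
/* Hypothetical two-variant node, following the same shape as Node:
 * a discriminator, a sibling link, and a union payload. */
typedef enum { MINI_INT, MINI_BIN_OP } MiniNodeType;

typedef struct MiniNode {
    MiniNodeType type;          /* discriminator: selects the valid union member */
    struct MiniNode *next;      /* sibling link */
    union {
        struct { long value; } integer;
        struct { char op; struct MiniNode *left, *right; } bin_op;
    } data;
} MiniNode;

/* Every consumer switches on the tag before touching the payload. */
long mini_eval(const MiniNode *n) {
    switch (n->type) {
    case MINI_INT:
        return n->data.integer.value;
    case MINI_BIN_OP: {
        long l = mini_eval(n->data.bin_op.left);
        long r = mini_eval(n->data.bin_op.right);
        return n->data.bin_op.op == '+' ? l + r : l * r;
    }
    }
    return 0;
}

/* Siblings (for example, statements in a block) are walked through next. */
int mini_count_siblings(const MiniNode *n) {
    int count = 0;
    for (; n; n = n->next) count++;
    return count;
}
```

The union keeps each node as small as its largest variant, at the cost that the compiler does not check that the tag matches the member being read; that discipline falls to the switch statements.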

5. Semantic Analysis

The Semantic Analyzer (src/frontend/analysis.c) performs a static check on the AST before code generation.

5.1. Responsibilities

Symbol Resolution: Verifies variables are declared before use.
Type Checking: Enforces compatibility across primitive/sized numeric types, نص/حرف/منطقي/عشري, and compound types (تعداد/هيكل/اتحاد) including array elements.
Scope Validation: Manages visibility rules.
Constant Checking (v0.2.7+): Prevents reassignment of immutable variables.
Static Storage Rules (v0.3.7.5): Validates ساكن declarations and enforces compile-time initializers for static-storage objects.
Control Flow Validation: Ensures break and continue are used only within loops/switches.
Function Validation: Checks function prototypes and definitions match.
Usage Tracking (v0.2.8+): Tracks variable usage for unused variable warnings.
Dead Code Detection (v0.2.8+): Detects unreachable code after return/break.
Type Alias Validation (v0.3.6.5): Registers aliases, validates alias targets, and enforces strict alias/symbol name collision diagnostics.
Array Shape Validation (v0.3.9): Tracks array rank/dimensions in symbols, validates index-count match, and performs compile-time out-of-bounds checks for constant indices.
Pointer Semantics (v0.3.10): Validates pointer arithmetic, comparisons, dereference, and address-of constraints.
Type Casting (v0.3.10.5): Enforces rules for explicit scalar and pointer conversions.
Function Pointers (v0.3.10.6): Validates assignment, comparison (EQ/NE only), and indirect calls matching exact signatures.
Variadic Functions (v0.4.0.5): Validates ... signatures, variadic builtin usage (بدء_معاملات/معامل_تالي/نهاية_معاملات), and fixed/extra argument checks for variadic direct calls.
Inline Assembly (v0.4.0.6): Validates مجمع { ... } blocks, enforces fixed-register constraint subset (=a/=c/=d, a/c/d), checks output lvalue requirements, and restricts operand types to integer/pointer forms.
Standard Library Modules + Float Extensions (v0.4.2): Validates Math/System/Time builtins (جذر_تربيعي/أس/جيب/جيب_تمام/ظل/مطلق/عشوائي/متغير_بيئة/نفذ_أمر/وقت_حالي/وقت_كنص), Arabic float format specs (%ع/%أ), and accepts عشري٣٢ as a float keyword alias.
Error Handling Builtins (v0.4.3): Validates تأكد/توقف_فوري/كود_خطأ_النظام/ضبط_كود_خطأ_النظام/نص_كود_خطأ for arity/type contracts with Arabic diagnostics.

5.1.1. Multi-File Symbol Visibility (v0.5.2)

The current cross-file linkage contract is intentionally small and explicit:

top-level functions are externally visible by default,
a prototype declaration in a .baahd header does not emit a body,
top-level ساكن globals and arrays are lowered with internal linkage at file scope,
ساكن on functions remains rejected in parsing/semantic validation,
shared global-variable APIs are not first-class yet because there is no separate extern-style variable declaration syntax.

This behavior is now locked by multi-file QA smoke in addition to the existing single-file semantic checks.

5.2. Constant Checking (v0.2.7+)

The analyzer tracks is_const / is_static and enforces immutability + static-storage constraints:

Error Condition	Error Message
Reassigning a constant	Arabic semantic error for modifying `ثابت`
Modifying constant array element	Arabic semantic error for constant array mutation
Automatic constant without initializer	Arabic semantic error (`الثابت ... يجب تهيئته`)
Static-storage non-constant initializer	Arabic semantic error for non-constant static initializer

5.3. Warning Generation (v0.2.8+)

The analyzer generates warnings for potential issues that don't prevent compilation:

Unused Variable Detection

Algorithm:

Each symbol has an is_used flag initialized to false.
When a variable is referenced (in expressions, assignments, etc.), the flag is set to true.
At end of function scope, all local variables with is_used == false generate a warning.
At end of program, all global variables with is_used == false generate a warning.

Exception: Function parameters are marked as "used" implicitly to avoid false positives.
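
The flag-based algorithm above can be sketched in a few lines. TrackedSym and the track_* functions are hypothetical names; the real analyzer stores is_used in its Symbol entries and reports through the warning machinery instead of returning a list.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical slimmed-down symbol: only what usage tracking needs. */
typedef struct {
    const char *name;
    bool is_used;
} TrackedSym;

/* Declare a symbol. Parameters start out "used" to avoid false positives. */
void track_declare(TrackedSym *s, const char *name, bool is_param) {
    s->name = name;
    s->is_used = is_param;
}

/* Mark every symbol with this name as used when it is referenced. */
void track_reference(TrackedSym *syms, int count, const char *name) {
    for (int i = 0; i < count; i++) {
        if (strcmp(syms[i].name, name) == 0) syms[i].is_used = true;
    }
}

/* At end of scope: collect the names that were never referenced.
 * `out` must have room for `count` entries. Returns how many were found. */
int track_collect_unused(const TrackedSym *syms, int count, const char **out) {
    int n = 0;
    for (int i = 0; i < count; i++) {
        if (!syms[i].is_used) out[n++] = syms[i].name;
    }
    return n;
}
```

Each name returned by track_collect_unused would correspond to one [-Wunused-variable] warning at the end of the enclosing scope.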

Dead Code Detection

Algorithm:

While analyzing a block, track if a "terminating" statement was encountered.
Terminating statements: NODE_RETURN, NODE_BREAK, NODE_CONTINUE.
If a terminating statement was found and there are more statements after it, generate a warning.

Implementation:

static void analyze_statements_with_dead_code_check(Node* statements, const char* context) {
    bool found_terminator = false;
    Node* stmt = statements;
    while (stmt) {
        if (found_terminator) {
            warning_report(WARN_DEAD_CODE, ...);
            found_terminator = false; // Avoid multiple warnings
        }
        analyze_node(stmt);
        if (is_terminating_statement(stmt)) {
            found_terminator = true;
        }
        stmt = stmt->next;
    }
}

Variable Shadowing

When a local variable is declared with the same name as a global variable, a WARN_SHADOW_VARIABLE warning is generated.

Implicit Narrowing Conversions (v0.3.5.5)

The analyzer emits WARN_IMPLICIT_NARROWING when an implicit numeric conversion may lose information.

Covered conversion sites:

Variable declaration initializers
Assignments
Return expressions
Function-call arguments
Array element assignments
Struct/union member assignments

The check is constant-aware: if the source expression is a compile-time constant that is provably representable in the destination type, the warning is suppressed.
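
A simplified sketch of a constant-aware narrowing check for integer types follows. It assumes widths between 8 and 64 bits, treats only a strictly smaller destination width as narrowing, and ignores floating-point and same-width signedness changes; the analyzer's actual rules may differ. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is `v` representable in an integer of the given width and signedness?
 * Assumes 8 <= bits <= 64. */
bool fits_in(int64_t v, int bits, bool is_unsigned) {
    if (is_unsigned) {
        if (v < 0) return false;
        if (bits >= 64) return true;
        return (uint64_t)v <= (UINT64_C(1) << bits) - 1;
    }
    if (bits >= 64) return true;
    int64_t max = (INT64_C(1) << (bits - 1)) - 1;
    return v >= -max - 1 && v <= max;
}

/* Warn when the destination is narrower than the source, unless the source
 * is a compile-time constant that provably fits in the destination. */
bool should_warn_narrowing(int src_bits, int dst_bits, bool dst_unsigned,
                           bool src_is_const, int64_t const_value) {
    if (dst_bits >= src_bits) return false;
    if (src_is_const && fits_in(const_value, dst_bits, dst_unsigned)) return false;
    return true;
}
```

Under these assumptions, initializing an 8-bit variable from the literal 100 produces no warning, while initializing it from 300 or from a 64-bit variable does.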

Signed/Unsigned Mixed Comparisons (v0.3.5.5)

The analyzer emits WARN_SIGNED_UNSIGNED_COMPARE for comparison operators (==, !=, <, >, <=, >=) when the integer-promotion result mixes signed and unsigned domains.

Low-Level Semantic Checks (v0.3.6)

Bitwise operators (&, |, ^, ~, <<, >>) are restricted to integer-like types.
Shift-count literals are range-checked (0..63) during semantic analysis.
NODE_SIZEOF is resolved to a compile-time integer value when size information is known.
عدم (void) rules are enforced:
- no variable declarations of type عدم (local/global),
- no function parameters of type عدم,
- return shape must match function type (إرجع. only in عدم functions, value required in non-void).

5.4. Isolation Note

Since v0.2.4, analysis.c maintains its own symbol table for isolation. This ensures validation logic is independent from the backend pipeline.

In v0.3.7, semantic lookups were optimized using hash-indexed chains for local/global symbol lookup while preserving deterministic semantics and existing symbol ownership.
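
One way to add hash-indexed chains on top of a flat symbol array, without changing who owns the symbols, is to keep a bucket-head table plus a per-symbol next index. The sketch below uses the 257-bucket and 100-symbol figures from the limits table in section 5.5; the FNV-1a hash, the newest-first chain order, and all names are assumptions of this illustration.

```c
#include <stdint.h>
#include <string.h>

#define IDX_MAX_SYMBOLS 100
#define IDX_BUCKETS 257
#define IDX_NAME_LEN 32

/* Symbols stay in a flat array; the index is just integers layered on top. */
typedef struct {
    char names[IDX_MAX_SYMBOLS][IDX_NAME_LEN];
    int next_in_bucket[IDX_MAX_SYMBOLS]; /* chain link, -1 ends the chain */
    int bucket_head[IDX_BUCKETS];        /* first symbol per bucket, or -1 */
    int count;
} SymIndex;

/* FNV-1a over the UTF-8 bytes of the name (an assumed hash choice). */
static uint32_t idx_hash(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h % IDX_BUCKETS;
}

void sym_index_init(SymIndex *t) {
    t->count = 0;
    for (int i = 0; i < IDX_BUCKETS; i++) t->bucket_head[i] = -1;
}

/* Append a symbol and link it at the head of its bucket chain.
 * Returns its index, or -1 if the table is full or the name is too long. */
int sym_index_add(SymIndex *t, const char *name) {
    if (t->count >= IDX_MAX_SYMBOLS || strlen(name) >= IDX_NAME_LEN) return -1;
    int idx = t->count++;
    strcpy(t->names[idx], name);
    uint32_t b = idx_hash(name);
    t->next_in_bucket[idx] = t->bucket_head[b];
    t->bucket_head[b] = idx;
    return idx;
}

/* Walk one chain only; the newest matching symbol is found first. */
int sym_index_find(const SymIndex *t, const char *name) {
    for (int i = t->bucket_head[idx_hash(name)]; i != -1; i = t->next_in_bucket[i]) {
        if (strcmp(t->names[i], name) == 0) return i;
    }
    return -1;
}
```

Because insertion order alone determines chain order, lookups stay deterministic from run to run, and the flat array can still be scanned linearly for passes such as unused-variable reporting.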

Future improvement: Unify symbol tables into a shared context object passed between phases.

The analyzer walks the AST recursively. It maintains a Symbol Table stack to track active variables in the current scope. If it encounters:

x = "text" (where x is int): Reports a type mismatch error.
print y (where y is undeclared): Reports an undefined symbol error.
x = 5 (where x is const): Reports a const reassignment error (v0.2.7+).

5.5. Analysis Limits and Constants

The semantic analyzer uses the following constants (defined in src/frontend/analysis.c):

Constant	Value	Description
`ANALYSIS_MAX_SYMBOLS`	100	Maximum symbols per scope (global/local)
`ANALYSIS_MAX_SCOPES`	64	Maximum nested scope depth
`ANALYSIS_MAX_FUNCS`	128	Maximum function declarations
`ANALYSIS_MAX_FUNC_PARAMS`	32	Maximum parameters per function
`ANALYSIS_SYMBOL_HASH_BUCKETS`	257	Hash table buckets for symbol lookup
`ANALYSIS_MAX_ENUMS`	128	Maximum enum definitions
`ANALYSIS_MAX_STRUCTS`	128	Maximum struct definitions
`ANALYSIS_MAX_UNIONS`	128	Maximum union definitions
`ANALYSIS_MAX_ENUM_MEMBERS`	128	Maximum members per enum
`ANALYSIS_MAX_STRUCT_FIELDS`	128	Maximum fields per struct/union
`ANALYSIS_MAX_TYPE_ALIASES`	256	Maximum type alias definitions

5.6. Symbol Table Structures

The semantic analyzer maintains several internal data structures for symbol management:

Symbol (Variable Symbol)

typedef struct {
    char name[32];              // Symbol name
    ScopeType scope;            // Scope (SCOPE_GLOBAL or SCOPE_LOCAL)
    DataType type;              // Data type (variable: its type; array: element type)
    char type_name[32];         // Type name when TYPE_ENUM/TYPE_STRUCT (empty otherwise)
    DataType ptr_base_type;     // Pointer base type when type == TYPE_POINTER
    char ptr_base_type_name[32];// Compound type name of the pointer base
    int ptr_depth;              // Pointer depth when type == TYPE_POINTER
    FuncPtrSig* func_sig;       // Function pointer signature when type == TYPE_FUNC_PTR
    bool is_array;              // Is the symbol an array?
    int array_rank;             // Number of dimensions
    int64_t array_total_elems;  // Product of the dimensions
    int* array_dims;            // Array dimensions (owned by the symbol table)
    int offset;                 // Stack offset or address
    bool is_const;              // Is it constant (immutable)?
    bool is_static;             // Does it have static storage?
    bool is_used;               // Has this variable been used? (for warnings)
    int decl_line;              // Declaration line (for warnings)
    int decl_col;               // Declaration column (for warnings)
    const char* decl_file;      // Declaration file (for warnings)
} Symbol;

FuncSymbol (Function Symbol)

typedef struct {
    char* name;                     // Function name (owned, strdup)
    DataType return_type;           // Return type
    DataType return_ptr_base_type;  // Base type of the returned pointer
    char* return_ptr_base_type_name;// Base type name of the returned pointer (owned, strdup)
    int return_ptr_depth;           // Depth of the returned pointer
    FuncPtrSig* return_func_sig;    // Signature of the returned function pointer (owned, clone)
    DataType* param_types;          // Parameter types (owned, malloc)
    DataType* param_ptr_base_types; // Base types of pointer parameters
    char** param_ptr_base_type_names;// Base type names of pointer parameters
    int* param_ptr_depths;          // Depths of pointer parameters
    FuncPtrSig** param_func_sigs;   // Signatures of function-pointer parameters
    int param_count;                // Number of parameters
    FuncPtrSig* ref_funcptr_sig;    // Signature of the "function reference" used as a value (owned, clone)
    bool is_defined;                // Has the function been defined (does it have a body)?
    const char* decl_file;          // Declaration file
    int decl_line;                  // Declaration line
    int decl_col;                   // Declaration column
} FuncSymbol;

Function Pointer Signature (`FuncPtrSig`)

typedef struct FuncPtrSig {
    DataType return_type;
    DataType return_ptr_base_type;
    char* return_ptr_base_type_name;  // Owned (may be NULL)
    int return_ptr_depth;
    int param_count;
    DataType* param_types;              // Owned (malloc)
    DataType* param_ptr_base_types;     // Owned (malloc)
    char** param_ptr_base_type_names;   // Owned (malloc); elements owned (strdup)
    int* param_ptr_depths;              // Owned (malloc)
} FuncPtrSig;

Compound Type Definitions

EnumDef (enum definition):

typedef struct {
    char* name;  // Owned (strdup)
    int member_count;
    struct {
        char* name;   // Owned (strdup)
        int64_t value;
    } members[ANALYSIS_MAX_ENUM_MEMBERS];
} EnumDef;

StructDef (struct definition):

typedef struct {
    char* name;  // Owned (strdup)
    int field_count;
    StructFieldDef fields[ANALYSIS_MAX_STRUCT_FIELDS];
    int size;
    int align;
    bool layout_done;
    bool layout_in_progress;
} StructDef;

UnionDef (union definition):

typedef struct {
    char* name;  // Owned (strdup)
    int field_count;
    StructFieldDef fields[ANALYSIS_MAX_STRUCT_FIELDS];
    int size;
    int align;
    bool layout_done;
    bool layout_in_progress;
} UnionDef;

StructFieldDef (struct/union field definition):

typedef struct {
    char* name;       // Owned (strdup)
    DataType type;
    char* type_name;  // Owned (strdup) when TYPE_ENUM/TYPE_STRUCT
    DataType ptr_base_type;
    char* ptr_base_type_name;
    int ptr_depth;
    bool is_const;
    int offset;
    int size;
    int align;
} StructFieldDef;

TypeAliasDef (type alias definition):

typedef struct {
    char* name;              // Owned (strdup)
    DataType target_type;    // Target type after resolving the alias
    char* target_type_name;  // Owned (strdup) when TYPE_ENUM/TYPE_STRUCT/TYPE_UNION
    DataType target_ptr_base_type;
    char* target_ptr_base_type_name;
    int target_ptr_depth;
    FuncPtrSig* target_func_sig; // Owned (clone) when TYPE_FUNC_PTR
} TypeAliasDef;

Hash-Indexed Symbol Lookup (v0.3.7+)

Semantic lookups use hash-indexed chains for O(1) average-case lookup:

// Symbol tables
static Symbol global_symbols[ANALYSIS_MAX_SYMBOLS];
static int global_count = 0;
static int global_symbol_hash_head[ANALYSIS_SYMBOL_HASH_BUCKETS];
static int global_symbol_hash_next[ANALYSIS_MAX_SYMBOLS];

static Symbol local_symbols[ANALYSIS_MAX_SYMBOLS];
static int local_count = 0;
static int local_symbol_hash_head[ANALYSIS_SYMBOL_HASH_BUCKETS];
static int local_symbol_hash_next[ANALYSIS_MAX_SYMBOLS];

// Scope stack
static int scope_stack[ANALYSIS_MAX_SCOPES];
static int scope_depth = 0;

The hash function used is FNV-1a 32-bit for fast string hashing.
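The head/next arrays above suggest an index-based chained hash table. The following is a minimal sketch of that idea, not the code in src/analysis.c: every `demo_*` / `Demo*` name is hypothetical, the symbol record is reduced to a name, and bucket selection by `hash % bucket_count` is an assumption. Only the FNV-1a constants (offset basis 2166136261, prime 16777619) are standard.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_SYMBOLS 100   /* mirrors ANALYSIS_MAX_SYMBOLS */
#define DEMO_HASH_BUCKETS 257  /* mirrors ANALYSIS_SYMBOL_HASH_BUCKETS */

/* Hypothetical, reduced symbol record: only the name matters for lookup. */
typedef struct {
    char name[32];
} DemoSymbol;

static DemoSymbol demo_symbols[DEMO_MAX_SYMBOLS];
static int demo_count = 0;
static int demo_hash_head[DEMO_HASH_BUCKETS]; /* bucket -> newest symbol index, -1 if empty */
static int demo_hash_next[DEMO_MAX_SYMBOLS];  /* symbol index -> next index in the same bucket */

/* FNV-1a, 32-bit: XOR each byte into the hash, then multiply by the FNV prime. */
static uint32_t demo_fnv1a32(const char *s) {
    uint32_t h = 2166136261u;
    for (const unsigned char *p = (const unsigned char *)s; *p; ++p) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Empties the table; must be called before the first insertion. */
static void demo_reset(void) {
    demo_count = 0;
    for (int i = 0; i < DEMO_HASH_BUCKETS; ++i) demo_hash_head[i] = -1;
}

/* Appends a symbol and links it at the front of its bucket chain.
   Returns the new index, or -1 if the table is full or the name is too long. */
static int demo_add(const char *name) {
    if (demo_count >= DEMO_MAX_SYMBOLS) return -1;
    if (strlen(name) >= sizeof demo_symbols[0].name) return -1;
    int idx = demo_count++;
    strcpy(demo_symbols[idx].name, name);
    uint32_t b = demo_fnv1a32(name) % DEMO_HASH_BUCKETS;
    demo_hash_next[idx] = demo_hash_head[b];
    demo_hash_head[b] = idx;
    return idx;
}

/* Walks only the chain of the matching bucket; returns the index or -1. */
static int demo_lookup(const char *name) {
    uint32_t b = demo_fnv1a32(name) % DEMO_HASH_BUCKETS;
    for (int i = demo_hash_head[b]; i != -1; i = demo_hash_next[i]) {
        if (strcmp(demo_symbols[i].name, name) == 0) return i;
    }
    return -1;
}
```

Because the chains store indices rather than pointers, the symbol array keeps its declaration order, which is one plausible way to preserve the deterministic semantics mentioned in 5.4.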

5.7. DataType and Operation Enums

The AST uses the following type enumeration (defined in src/frontend/ast.h):

typedef enum {
    TYPE_INT,           // صحيح / ص٦٤ (int64)

    // Sized integers (v0.3.5.5)
    TYPE_I8,            // ص٨
    TYPE_I16,           // ص١٦
    TYPE_I32,           // ص٣٢
    TYPE_U8,            // ط٨
    TYPE_U16,           // ط١٦
    TYPE_U32,           // ط٣٢
    TYPE_U64,           // ط٦٤

    TYPE_STRING,        // نص (حرف[])
    TYPE_POINTER,       // Generic pointer
    TYPE_FUNC_PTR,      // Function pointer: دالة(...) -> type
    TYPE_BOOL,          // منطقي (bool - stored as byte)
    TYPE_CHAR,          // حرف (UTF-8 sequence)
    TYPE_FLOAT,         // عشري (float64) + عشري٣٢ (alias in v0.4.2)
    TYPE_VOID,          // عدم (void)
    TYPE_ENUM,          // تعداد (stored as int64)
    TYPE_STRUCT,        // هيكل (not a first-class value)
    TYPE_UNION          // اتحاد (not a first-class value)
} DataType;

OpType Enum (Binary Operations):

typedef enum {
    // Arithmetic operations
    OP_ADD, OP_SUB, OP_MUL, OP_DIV, OP_MOD,
    // Bitwise operations
    OP_BIT_AND, OP_BIT_OR, OP_BIT_XOR, OP_SHL, OP_SHR,
    // Comparison operations
    OP_EQ, OP_NEQ, OP_LT, OP_GT, OP_LTE, OP_GTE,
    // Logical operations
    OP_AND, OP_OR
} OpType;

UnaryOpType Enum:

typedef enum {
    UOP_NEG,        // Negation (-)
    UOP_NOT,        // Logical NOT (!)
    UOP_BIT_NOT,    // Bitwise NOT (~)
    UOP_ADDR,       // Address-of (&)
    UOP_DEREF,      // Dereference (*)
    UOP_INC,        // Increment (++)
    UOP_DEC         // Decrement (--)
} UnaryOpType;

5.8. Memory Allocation

Type	C Type	Size	Notes
`صحيح`	`int64_t`	8 bytes	Signed integer
`نص`	`char*`	8 bytes	Pointer to read-only string (.rdata/.rodata)
`منطقي`	`bool` (stored as int)	8 bytes	Stored as 0/1 in 8-byte slots

I/O note: The backend dynamically resolves format strings (%lld, %llu, %g/%e) for integers/floats (Arabic %ع/%أ → C %f/%e in formatted builtins). Strings (نص) and Characters (حرف) are handled with a custom UTF-8 emission loop or packed format.

5.9. Constant Folding (Optimization)

The parser performs constant folding on arithmetic expressions. If both operands of a binary operation are integer literals, the compiler evaluates the result at compile-time.

Example:

Source: ٢ * ٣ + ٤
Before folding: BinOp(+, BinOp(*, 2, 3), 4)
After folding: Int(10)

Supported Operations: +, -, *, /, %

Note: Division/modulo by zero is detected and reported during folding.
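As an illustration of the folding rule, here is a minimal sketch on a toy expression tree. It is not the parser's code: `DemoNode` and `demo_fold` are hypothetical names, overflow is assumed to wrap, and `INT64_MIN / -1` is simply left unfolded because the section does not say how the parser treats it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { DEMO_INT, DEMO_BINOP } DemoKind;

/* Hypothetical toy AST node: an integer literal or a binary operation. */
typedef struct DemoNode {
    DemoKind kind;
    char op;                     /* '+', '-', '*', '/', '%' when kind == DEMO_BINOP */
    int64_t value;               /* valid when kind == DEMO_INT */
    struct DemoNode *lhs, *rhs;  /* valid when kind == DEMO_BINOP */
} DemoNode;

/* Folds bottom-up, in place. Returns false on division/modulo by zero. */
static bool demo_fold(DemoNode *n) {
    if (n->kind != DEMO_BINOP) return true;
    if (!demo_fold(n->lhs) || !demo_fold(n->rhs)) return false;
    if (n->lhs->kind != DEMO_INT || n->rhs->kind != DEMO_INT) return true;

    int64_t a = n->lhs->value, b = n->rhs->value;
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b, r;
    switch (n->op) {
    case '+': r = ua + ub; break;   /* unsigned math wraps instead of overflowing */
    case '-': r = ua - ub; break;
    case '*': r = ua * ub; break;
    case '/':
    case '%':
        if (b == 0) return false;                    /* caller reports the error */
        if (a == INT64_MIN && b == -1) return true;  /* assumption: leave unfolded */
        r = (uint64_t)(n->op == '/' ? a / b : a % b);
        break;
    default:
        return true;                                 /* unknown operator: leave as-is */
    }
    memcpy(&n->value, &r, sizeof r);  /* reinterpret the bits as int64_t */
    n->kind = DEMO_INT;
    n->lhs = n->rhs = NULL;
    return true;
}
```

Applied to the tree for ٢ * ٣ + ٤, the inner multiplication folds to 6 first, then the addition folds to 10, matching the example above.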

6. Intermediate Representation (v0.3.10.6)

The IR Module (src/ir.h, src/ir.c) provides an Arabic-first Intermediate Representation using SSA (Static Single Assignment) form.

Memory management (v0.3.2.6.1): IR objects are now allocated from a module-owned arena (src/ir_arena.c) and freed in bulk by ir_module_free(). IR passes should treat IR nodes as module-owned and avoid per-node frees.

IR serialization (v0.3.2.6.3): The compiler also includes a machine-readable IR text serializer/reader for round-trip tests (src/ir_text.c, src/ir_text.h). This format is separate from the Arabic-first debug printer (ir_module_print()).

6.1. Design Philosophy

Baa's IR is designed with three goals:

Arabic Identity: All opcodes, types, and predicates have Arabic names.
Technical Parity: Comparable to LLVM IR, GIMPLE, or WebAssembly in capabilities.
SSA Form: Each virtual register is assigned exactly once, enabling powerful optimizations.

6.2. IR Structure

IRModule
├── name: char*             // Module name (source file)
├── arena: IRArena          // IR memory arena (all IR objects allocated here)
├── cached_i8_ptr_type: IRType*  // Common type cache
├── globals: IRGlobal*      // Global variables
├── global_count: int
├── funcs: IRFunc*          // Functions
├── func_count: int
├── strings: IRStringEntry* // C string literal table
├── string_count: int
├── baa_strings: IRBaaStringEntry*  // Baa string table (حرف[])
└── baa_string_count: int

IRFunc
├── name: char*
├── ret_type: IRType*
├── params: IRParam[]
├── param_count: int        // Number of parameters
├── blocks: IRBlock*        // Linked list of basic blocks
├── block_count: int        // Number of blocks
├── entry: IRBlock*         // Entry block pointer
├── next_reg: int           // Virtual register counter (next available %م<n>)
├── next_inst_id: int       // Instruction ID counter
├── ir_epoch: uint32_t      // IR change counter (invalidates analyses)
├── def_use: IRDefUse*      // Def-Use analysis cache (heap allocated)
├── next_block_id: int      // Block ID counter
├── is_prototype: bool      // Is this a declaration without body?
└── next: IRFunc*           // Next function in module

IRBlock
├── label: char*            // Arabic label (e.g., "بداية", "حلقة")
├── id: int
├── parent: IRFunc*         // Function containing this block
├── first/last: IRInst*     // Instruction list
├── inst_count: int
├── succs[2]: IRBlock*      // Successors (0-2 for br/br_cond)
├── succ_count: int
├── preds: IRBlock**        // Predecessors (dynamic array)
├── pred_count: int
├── pred_capacity: int
├── idom: IRBlock*          // Immediate dominator
├── dom_frontier: IRBlock** // Dominance frontier
├── dom_frontier_count: int
└── next: IRBlock*          // Next block in function

IRInst
├── op: IROp                // Opcode
├── type: IRType*           // Result type
├── id: int                 // Instruction ID for diagnostics/tests
├── dest: int               // Destination register (-1 if none)
├── operands[4]: IRValue*   // Up to 4 operands
├── operand_count: int      // Number of operands used
├── cmp_pred: IRCmpPred     // For comparison instructions
├── phi_entries: IRPhiEntry* // Linked list of [value, block] pairs
├── call_target: char*      // Direct call target name (NULL for indirect)
├── call_callee: IRValue*   // Indirect call callee value (NULL for direct)
├── call_args: IRValue**    // Argument list
├── call_arg_count: int     // Number of arguments
├── src_file: const char*   // Source file (debug info)
├── src_line: int           // Source line (debug info)
├── src_col: int            // Source column (debug info)
├── dbg_name: const char*   // Optional symbol name for debugging
├── parent: IRBlock*        // Block containing this instruction
├── prev: IRInst*           // Previous instruction in block
└── next: IRInst*           // Next instruction in block

6.2.1. IR Memory Management (IR Arena) — v0.3.2.6.1

The IR system uses an arena allocator for efficient memory management. All IR objects (types, values, instructions, blocks, functions, globals) are allocated from the module-owned arena and freed in bulk when the module is destroyed.

Arena Structure:

typedef struct IRArenaChunk {
    struct IRArenaChunk* next;
    size_t used;
    size_t cap;
    unsigned char data[];  // Flexible array member
} IRArenaChunk;

typedef struct IRArena {
    IRArenaChunk* head;
    size_t default_chunk_size;
} IRArena;

typedef struct IRArenaStats {
    size_t chunks;       // Number of allocated chunks
    size_t used_bytes;   // Total used bytes
    size_t cap_bytes;    // Total capacity
} IRArenaStats;

Key Functions:

void ir_arena_init(IRArena* arena, size_t default_chunk_size);
void ir_arena_destroy(IRArena* arena);
void* ir_arena_alloc(IRArena* arena, size_t size, size_t align);
void* ir_arena_calloc(IRArena* arena, size_t count, size_t size, size_t align);
char* ir_arena_strdup(IRArena* arena, const char* s);
void ir_arena_get_stats(const IRArena* arena, IRArenaStats* out_stats);

Important Notes:

IR passes should treat IR nodes as module-owned and avoid per-node frees
Memory is freed in bulk by ir_module_free() via ir_arena_destroy()
The arena provides O(1) allocation with minimal overhead
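All IR objects are annotated in the source with the comment: ملاحظة: هذه البنية تُخصَّص داخل ساحة IR (Arena) وتُحرَّر دفعة واحدة. (English: "Note: this structure is allocated inside the IR arena and freed all at once.")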

Usage Pattern:

IRModule* module = ir_module_new("program.baa");
// All IR objects allocated via ir_module_get_current() use the arena
IRFunc* func = ir_func_new("الرئيسية", ret_type);
ir_module_add_func(module, func);
// ... build IR ...
ir_module_free(module);  // Bulk free all arena memory
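To make the chunked arena idea concrete, here is a minimal bump-allocator sketch. It is an illustration under assumptions, not the contents of src/ir_arena.c: the `Demo*` / `demo_*` names are hypothetical, only the newest chunk is tried before a new one is opened, and overflow checks on `size + align` are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for IRArenaChunk / IRArena. */
typedef struct DemoChunk {
    struct DemoChunk *next;
    size_t used;
    size_t cap;
    unsigned char data[];   /* flexible array member */
} DemoChunk;

typedef struct {
    DemoChunk *head;
    size_t default_chunk_size;
} DemoArena;

static void demo_arena_init(DemoArena *a, size_t default_chunk_size) {
    a->head = NULL;
    a->default_chunk_size = default_chunk_size;
}

/* Bump-allocates from one chunk; returns NULL if the request does not fit. */
static void *demo_chunk_bump(DemoChunk *c, size_t size, size_t align) {
    uintptr_t base = (uintptr_t)c->data;
    uintptr_t p = (base + c->used + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t end = (size_t)(p - base) + size;
    if (end > c->cap) return NULL;
    c->used = end;
    return (void *)p;
}

/* align must be a nonzero power of two. Returns NULL if malloc fails. */
static void *demo_arena_alloc(DemoArena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (a->head) {
        void *p = demo_chunk_bump(a->head, size, align);
        if (p) return p;
    }
    /* Open a new chunk big enough for the request plus worst-case padding. */
    size_t cap = a->default_chunk_size;
    if (cap < size + align) cap = size + align;
    DemoChunk *c = malloc(sizeof *c + cap);
    if (!c) return NULL;
    c->next = a->head;
    c->used = 0;
    c->cap = cap;
    a->head = c;
    return demo_chunk_bump(c, size, align);
}

static size_t demo_arena_chunk_count(const DemoArena *a) {
    size_t n = 0;
    for (const DemoChunk *c = a->head; c; c = c->next) ++n;
    return n;
}

/* Frees every chunk at once; individual objects are never freed. */
static void demo_arena_destroy(DemoArena *a) {
    DemoChunk *c = a->head;
    while (c) {
        DemoChunk *next = c->next;
        free(c);
        c = next;
    }
    a->head = NULL;
}
```

Allocation touches only the head chunk, which is what keeps it O(1), and destruction is a single walk over the chunk list. That is why passes are asked not to free IR nodes individually.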

6.2.2. IR Module Context

The IR system maintains a thread-local context for the current module to simplify allocation:

void ir_module_set_current(IRModule* module);
IRModule* ir_module_get_current(void);

This allows IR construction functions to allocate from the correct arena without passing the module explicitly.

Indirect Call Support (v0.3.10.6):

For indirect function calls through function pointers:

call_target is NULL
call_callee contains the IRValue (register) holding the function pointer
ISel lowers this to call *%reg instead of call @function

6.3. IR Opcodes (Arabic)

Category	Opcode	Arabic	Description
Arithmetic	`IR_OP_ADD`	جمع	Addition
	`IR_OP_SUB`	طرح	Subtraction
	`IR_OP_MUL`	ضرب	Multiplication
	`IR_OP_DIV`	قسم	Division
	`IR_OP_MOD`	باقي	Modulo
	`IR_OP_NEG`	سالب	Negation
Memory	`IR_OP_ALLOCA`	حجز	Stack allocation
	`IR_OP_LOAD`	حمل	Load from memory
	`IR_OP_STORE`	خزن	Store to memory
	`IR_OP_PTR_OFFSET`	إزاحة_مؤشر	Pointer offset: base + index * sizeof(pointee)
Comparison	`IR_OP_CMP`	قارن	Compare with predicate
Logical	`IR_OP_AND`	و	Bitwise AND
	`IR_OP_OR`	أو	Bitwise OR
	`IR_OP_XOR`	أو_حصري	Bitwise XOR
	`IR_OP_NOT`	نفي	Bitwise NOT
	`IR_OP_SHL`	ازاحة_يسار	Shift left
	`IR_OP_SHR`	ازاحة_يمين	Shift right (signed/unsigned-aware)
Control	`IR_OP_BR`	قفز	Unconditional branch
	`IR_OP_BR_COND`	قفز_شرط	Conditional branch
	`IR_OP_RET`	رجوع	Return
	`IR_OP_CALL`	نداء	Function call
SSA	`IR_OP_PHI`	فاي	Phi node
	`IR_OP_COPY`	نسخ	Copy value
	`IR_OP_NOP`	NOP	No operation
Conversion	`IR_OP_CAST`	تحويل	Type cast

6.4. IR Types (Arabic)

Type	Arabic	Bits	Description
`IR_TYPE_VOID`	فراغ	0	No value
`IR_TYPE_I1`	ص١	1	Boolean
`IR_TYPE_I8`	ص٨	8	Byte/Char (8-bit signed)
`IR_TYPE_I16`	ص١٦	16	Short (16-bit signed)
`IR_TYPE_I32`	ص٣٢	32	Int (32-bit signed)
`IR_TYPE_I64`	ص٦٤	64	Long (64-bit signed)
`IR_TYPE_U8`	ط٨	8	Unsigned byte
`IR_TYPE_U16`	ط١٦	16	Unsigned short
`IR_TYPE_U32`	ط٣٢	32	Unsigned int
`IR_TYPE_U64`	ط٦٤	64	Unsigned long
`IR_TYPE_CHAR`	حرف	8	UTF-8 char (packed into i64)
`IR_TYPE_F64`	ع٦٤	64	Float (double)
`IR_TYPE_PTR`	مؤشر	64	Pointer
`IR_TYPE_ARRAY`	مصفوفة	varies	Array
`IR_TYPE_FUNC`	دالة	64	Function pointer type (v0.3.10.6+)

Note (v0.3.10.6): IR_TYPE_FUNC values are used as function pointers (they can be stored, loaded, and compared with EQ/NE against 0) and are lowered on x86-64 as a 64-bit value, like an ordinary pointer. ir_builder_emit_call_indirect() is used for indirect calls.

6.5. Comparison Predicates

Predicate	Arabic	Description
`IR_CMP_EQ`	يساوي	Equal
`IR_CMP_NE`	لا_يساوي	Not Equal
`IR_CMP_GT`	أكبر	Greater Than (signed)
`IR_CMP_LT`	أصغر	Less Than (signed)
`IR_CMP_GE`	أكبر_أو_يساوي	Greater or Equal (signed)
`IR_CMP_LE`	أصغر_أو_يساوي	Less or Equal (signed)
`IR_CMP_UGT`	أكبر_بدون_إشارة	Greater Than (unsigned)
`IR_CMP_ULT`	أصغر_بدون_إشارة	Less Than (unsigned)
`IR_CMP_UGE`	أكبر_أو_يساوي_بدون_إشارة	Greater or Equal (unsigned)
`IR_CMP_ULE`	أصغر_أو_يساوي_بدون_إشارة	Less or Equal (unsigned)

6.6. Virtual Registers

Registers use Arabic naming with Arabic-Indic numerals:

Format: %م<n> where م = مؤقت (temporary)
Examples: %م٠, %م١, %م٢, ...

The int_to_arabic_numerals() function converts integers to Arabic-Indic digits (٠١٢٣٤٥٦٧٨٩).
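A minimal sketch of such a conversion follows. It is not the project's int_to_arabic_numerals(); the name `demo_to_arabic_numerals`, its signature, and the restriction to unsigned input are assumptions. The encoding itself is standard: Arabic-Indic digits are U+0660 to U+0669, which UTF-8 encodes as the byte 0xD9 followed by 0xA0 to 0xA9.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes n as Arabic-Indic digits in UTF-8, NUL-terminated.
   Returns the number of bytes written (excluding the NUL),
   or 0 if the output buffer is too small. */
static size_t demo_to_arabic_numerals(unsigned n, char *out, size_t out_size) {
    unsigned char digits[16];   /* least significant digit first */
    size_t len = 0;
    do {
        digits[len++] = (unsigned char)(n % 10);
        n /= 10;
    } while (n > 0);

    if (out_size < len * 2 + 1) return 0;   /* two bytes per digit plus NUL */

    size_t w = 0;
    while (len > 0) {
        unsigned char d = digits[--len];
        out[w++] = (char)0xD9;              /* lead byte shared by U+0660..U+0669 */
        out[w++] = (char)(0xA0 + d);        /* continuation byte selects the digit */
    }
    out[w] = '\0';
    return w;
}
```

With this helper, register 10 would print as %م١٠ once the prefix is prepended.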

6.7. Example IR Output

Baa Source:

صحيح الرئيسية() {
    صحيح س = ١٠.
    صحيح ص = ٢٠.
    إرجع س + ص.
}

Generated IR (Arabic mode):

دالة الرئيسية() -> ص٦٤ {
بداية:
    %م٠ = حجز ص٦٤
    خزن ص٦٤ ١٠, %م٠
    %م١ = حجز ص٦٤
    خزن ص٦٤ ٢٠, %م١
    %م٢ = حمل ص٦٤ %م٠
    %م٣ = حمل ص٦٤ %م١
    %م٤ = جمع ص٦٤ %م٢, %م٣
    رجوع ص٦٤ %م٤
}

6.8. IR Module API (Low-Level)

Key functions for building IR directly (without builder):

// Module
IRModule* ir_module_new(const char* name);
void ir_module_add_func(IRModule* module, IRFunc* func);
int ir_module_add_string(IRModule* module, const char* str);

// Function
IRFunc* ir_func_new(const char* name, IRType* ret_type);
int ir_func_alloc_reg(IRFunc* func);
IRBlock* ir_func_new_block(IRFunc* func, const char* label);

// Block
IRBlock* ir_block_new(const char* label, int id);
void ir_block_append(IRBlock* block, IRInst* inst);

// Instructions
IRInst* ir_inst_binary(IROp op, IRType* type, int dest, IRValue* lhs, IRValue* rhs);
IRInst* ir_inst_cmp(IRCmpPred pred, int dest, IRValue* lhs, IRValue* rhs);
IRInst* ir_inst_load(IRType* type, int dest, IRValue* ptr);
IRInst* ir_inst_store(IRValue* value, IRValue* ptr);
IRInst* ir_inst_br(IRBlock* target);
IRInst* ir_inst_br_cond(IRValue* cond, IRBlock* if_true, IRBlock* if_false);
IRInst* ir_inst_ret(IRValue* value);
IRInst* ir_inst_call(const char* target, IRType* ret_type, int dest, IRValue** args, int arg_count);
IRInst* ir_inst_call_indirect(IRValue* callee, IRType* ret_type, int dest, IRValue** args, int arg_count);
IRInst* ir_inst_phi(IRType* type, int dest);

// Printing
void ir_module_print(IRModule* module, FILE* out, int use_arabic);
void ir_module_dump(IRModule* module, const char* filename, int use_arabic);

6.9. IR Builder API (High-Level, v0.3.0.2+)

The IR Builder (src/ir_builder.h, src/ir_builder.c) provides a convenient builder pattern API:

// Builder lifecycle
IRBuilder* ir_builder_new(IRModule* module);
void ir_builder_free(IRBuilder* builder);

// Function/Block creation
IRFunc* ir_builder_create_func(IRBuilder* builder, const char* name, IRType* ret_type);
IRBlock* ir_builder_create_block(IRBuilder* builder, const char* label);
void ir_builder_set_insert_point(IRBuilder* builder, IRBlock* block);

// Register allocation
int ir_builder_alloc_reg(IRBuilder* builder);

// Emit instructions (auto-appends to current block)
int ir_builder_emit_add(IRBuilder* builder, IRType* type, IRValue* lhs, IRValue* rhs);
int ir_builder_emit_sub(IRBuilder* builder, IRType* type, IRValue* lhs, IRValue* rhs);
int ir_builder_emit_mul(IRBuilder* builder, IRType* type, IRValue* lhs, IRValue* rhs);
int ir_builder_emit_alloca(IRBuilder* builder, IRType* type);
int ir_builder_emit_load(IRBuilder* builder, IRType* type, IRValue* ptr);
void ir_builder_emit_store(IRBuilder* builder, IRValue* value, IRValue* ptr);
int ir_builder_emit_ptr_offset(IRBuilder* builder, IRType* type, IRValue* base, IRValue* index);
int ir_builder_emit_cast(IRBuilder* builder, IRType* from, IRValue* v, IRType* to);
void ir_builder_emit_br(IRBuilder* builder, IRBlock* target);
void ir_builder_emit_br_cond(IRBuilder* builder, IRValue* cond, IRBlock* if_true, IRBlock* if_false);
void ir_builder_emit_ret(IRBuilder* builder, IRValue* value);
int ir_builder_emit_call(IRBuilder* builder, const char* target, IRType* ret_type, IRValue** args, int arg_count);
int ir_builder_emit_call_indirect(IRBuilder* builder, IRValue* callee, IRType* ret_type, IRValue** args, int arg_count);

// Control flow structure helpers
void ir_builder_create_if_then(IRBuilder* builder, IRValue* cond,
                                const char* then_label, const char* merge_label,
                                IRBlock** then_block, IRBlock** merge_block);
void ir_builder_create_while(IRBuilder* builder,
                              const char* header_label, const char* body_label,
                              const char* exit_label,
                              IRBlock** header_block, IRBlock** body_block,
                              IRBlock** exit_block);

// Constants
IRValue* ir_builder_const_int(int64_t value);
IRValue* ir_builder_const_i64(int64_t value);
IRValue* ir_builder_const_bool(int value);

Benefits over low-level API:

Automatic register allocation
Automatic CFG edge management (successors/predecessors)
Source location propagation
Control flow structure helpers for if/else/while
Statistics tracking

6.10. AST → IR Lowering (Expressions, v0.3.0.3+)

Expression lowering lives in src/ir_lower.h and src/ir_lower.c and is built on top of the IR Builder (src/ir_builder.h, src/ir_builder.c).

Key concepts:

IRLowerCtx: Lowering context (builder + local bindings + control-flow stacks + debug bounds-check toggle).
ir_lower_bind_local(): Binds a variable name to its حجز pointer register. Statement lowering populates these bindings (since v0.3.0.4).
Local bindings carry array metadata (rank/dimensions/element type) to support multi-dimensional indexing.
lower_expr(): Lowers AST expressions into IR operands (IRValue*) and emits IR instructions via the builder.

Currently lowered expressions:

NODE_INT, NODE_STRING, NODE_CHAR, NODE_BOOL, NODE_FLOAT, NODE_NULL
NODE_VAR_REF (loads via حمل)
NODE_BIN_OP (arithmetic, comparisons, logical ops, pointer difference)
NODE_UNARY_OP (سالب, bitwise نفي, !, UOP_ADDR via pointers, UOP_DEREF via load)
NODE_POSTFIX_OP (++/-- postfix via load + add/sub + store; expression result is the old value)
NODE_SIZEOF -> compile-time constant size
NODE_CAST -> تحويل (cast)
NODE_CALL_EXPR -> نداء (supports direct calls, and indirect calls via IR_TYPE_FUNC pointers)
Builtin string calls in NODE_CALL_EXPR (v0.3.9):
- طول_نص: loop until terminator and return length
- قارن_نص: lexicographic compare over Baa حرف
- نسخ_نص / دمج_نص: heap allocation via malloc + copy loops
- حرر_نص: frees the memory via free
Builtin dynamic memory calls in NODE_CALL_EXPR (v0.3.11):
- حجز_ذاكرة: lowers to malloc
- تحرير_ذاكرة: lowers to free
- إعادة_حجز: lowers to realloc
- نسخ_ذاكرة: lowers to memcpy
- تعيين_ذاكرة: lowers to memset
Builtin file I/O calls in NODE_CALL_EXPR (v0.3.12):
- فتح_ملف: lowers to fopen (handle is عدم* representing FILE*)
- اغلق_ملف: lowers to fclose
- اقرأ_حرف: lowers to fgetc + UTF-8 packing into حرف
- اكتب_حرف: lowers to fputc
- اقرأ_ملف: lowers to fread
- اكتب_ملف: lowers to fwrite
- نهاية_ملف: lowers to feof
- موقع_ملف: lowers to ftello (Linux) / _ftelli64 (Windows)
- اذهب_لموقع: lowers to fseeko (Linux) / _fseeki64 (Windows)
- اقرأ_سطر: reads bytes until \n/EOF and returns nullable نص
- اكتب_سطر: lowers to fputs + fputc('\n')
Builtin variadic runtime calls in NODE_CALL_EXPR (v0.4.0.5):
- بدء_معاملات: initializes variadic cursor from hidden variadic base.
- معامل_تالي: reads next packed argument slot as requested type and advances cursor.
- نهاية_معاملات: clears variadic cursor.
Builtin standard-library module calls in NODE_CALL_EXPR (v0.4.2):
- Math: جذر_تربيعي -> sqrt, أس -> pow, جيب -> sin, جيب_تمام -> cos, ظل -> tan, مطلق -> llabs, عشوائي -> rand
- System: متغير_بيئة -> getenv (+ C-string → Baa string conversion), نفذ_أمر -> system
- Time: وقت_حالي -> time, وقت_كنص -> ctime (+ C-string → Baa string conversion)
Builtin error-handling calls in NODE_CALL_EXPR (v0.4.3):
- تأكد / توقف_فوري: fail-fast abort paths with message emission.
- كود_خطأ_النظام / ضبط_كود_خطأ_النظام: host errno bridge (__errno_location on Linux, _errno on Windows).
- نص_كود_خطأ: lowers to strerror + C-string → Baa string conversion.

6.11. AST → IR Lowering (Statements, v0.3.0.4+)

Statement lowering is implemented in the same module and currently supports:

NODE_VAR_DECL: emit حجز + خزن and bind the variable name via ir_lower_bind_local()
NODE_ASSIGN: emit خزن to an existing local binding
NODE_RETURN: emit رجوع
NODE_PRINT: emit نداء @اطبع(...) (builtin call)
NODE_READ: emit نداء @اقرأ(%ptr) (builtin call)
NODE_ARRAY_DECL / NODE_ARRAY_ASSIGN (including multi-dimensional index chains)
NODE_MEMBER_ASSIGN on indexed array elements and structs
NODE_DEREF_ASSIGN: store value through dereferenced pointer

6.12. AST → IR Lowering (Control Flow, v0.3.0.5+)

Control flow lowering extends statement lowering to produce a full CFG using:

قفز (unconditional branch)
قفز_شرط (conditional branch)

Currently lowered control-flow nodes:

NODE_IF: then/else/merge blocks with قفز_شرط
NODE_WHILE: header/body/exit blocks, back edge to header (قفز)
NODE_FOR: init + header/body/increment/exit blocks (استمر targets increment)
NODE_SWITCH: comparison-chain dispatch + case blocks + default + end (with fallthrough)
NODE_BREAK: branch to active loop/switch exit block
NODE_CONTINUE: branch to active loop header/increment block

For full specification, see BAA_IR_SPECIFICATION.md.

6.13. IR Printer (v0.3.0.6)

The IR printer provides a canonical, Arabic-first text format for debugging and tooling.

Core printer entry point: ir_module_print()
Instruction formatting: ir_inst_print()
Values / registers / immediates: ir_value_print()
Arabic-Indic numerals for registers: int_to_arabic_numerals()

The driver exposes the printer via the CLI flag --dump-ir implemented in src/main.c. This flag:

Parses + analyzes the source as usual.
Builds an IR module using IRBuilder and lowers AST statements using lower_stmt().
Prints IR to stdout.

Note: --dump-ir is a debug/inspection output mode. The default compilation pipeline is fully IR-based: AST → IR → Optimizer → ISel → RegAlloc → Emit.

Example invocation:

build\baa.exe --dump-ir program.baa

6.14. IR Analysis Infrastructure (v0.3.1.1)

The IR analysis layer provides foundational compiler analyses required by the upcoming optimizer pipeline:

CFG validation: ensure each block has a terminator (قفز / قفز_شرط / رجوع)
- ir_func_validate_cfg()
- ir_module_validate_cfg()
Predecessor rebuilding: recompute preds[] and succs[] from terminator instructions (useful after IR edits)
- ir_func_rebuild_preds()
- ir_module_rebuild_preds()
Dominator tree + dominance frontier: compute idom for each block and build dominance frontier sets
- ir_func_compute_dominators()
- ir_module_compute_dominators()
Loop detection (v0.3.2.7.1): natural loop discovery via back edges using dominance (src/ir_loop.c, src/ir_loop.h).
LICM (v0.3.2.7.1): conservative hoisting of pure loop-invariant computations to preheaders (src/ir_licm.c, src/ir_licm.h).
Strength reduction (v0.3.2.7.1): instruction selection reduces ضرب by power-of-two constants inside loops to shl.
Loop unrolling (v0.3.2.7.1): optional conservative full unroll for small constant trip-count loops (after Out-of-SSA) (src/ir_unroll.c, src/ir_unroll.h).
Inlining (v0.3.2.7.2): conservative inliner at -O2 for small internal functions with a single call site (src/ir_inline.c, src/ir_inline.h).

Implementation lives in src/ir_analysis.c.
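The link between dominators and loop detection can be sketched in a few lines. This is an illustration only: it represents blocks as array indices and the dominator tree as an `idom[]` array, whereas the real IR stores `idom` as a pointer on each IRBlock. The `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NO_IDOM (-1)

/* idom[b] is the immediate dominator of block b; the entry block has DEMO_NO_IDOM.
   Block a dominates block b when a appears on b's idom chain (b included). */
static bool demo_dominates(const int *idom, int a, int b) {
    for (int cur = b; cur != DEMO_NO_IDOM; cur = idom[cur]) {
        if (cur == a) return true;
    }
    return false;
}

/* An edge src -> dst is a back edge when dst dominates src.
   dst is then the header of a natural loop. */
static bool demo_is_back_edge(const int *idom, int src, int dst) {
    return demo_dominates(idom, dst, src);
}
```

For a CFG with entry 0, loop header 1, body 2, and exit 3 (edges 0→1, 1→2, 2→1, 1→3), the dominator array is `{-1, 0, 1, 1}` and only the edge 2→1 is reported as a back edge.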

6.14.5. IR Mem2Reg Pass (ترقية_الذاكرة_إلى_سجلات) — v0.3.2.5.2

Canonical Mem2Reg is a correctness-first SSA construction step that promotes a safe subset of local variables represented by حجز/خزن/حمل into SSA values:

Computes dominance + dominance frontiers (via ir_func_compute_dominators()).
Inserts فاي nodes at join points.
Performs SSA renaming to rewrite حمل/خزن into SSA register values (usually نسخ).

File: src/ir_mem2reg.c

Entry Point: ir_mem2reg_run()

Pass Descriptor: IR_PASS_MEM2REG (used with the optimizer pipeline).

Constraints (correctness-first):

No pointer escape (not passed to نداء, not used inside فاي, not stored as a value)
Alloca block must dominate all uses (ensures SSA correctness)
Must be definitely initialized before any load on all paths (must-def initialization). The initializing خزن may be in a different block as long as every path to a حمل has a prior store.

Pipeline position: Runs first inside each optimizer iteration (before Canon/InstCombine/SCCP/ConstFold/CopyProp/etc.) via ir_optimizer_run().

6.14.6. IR Out-of-SSA Pass (الخروج_من_SSA) — v0.3.2.5.2

Out-of-SSA eliminates فاي before the backend by inserting copies on CFG edges. When a predecessor has multiple successors (critical edge), the pass splits the edge to create an insertion block:

P -> B becomes P -> E -> B
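The edge split can be sketched on a toy CFG. This is not the code in src/ir_outssa.c: the `Demo*` types are hypothetical, blocks are array indices, and the sketch assumes the usual textbook definition of a critical edge (the source has more than one successor and the target has more than one predecessor).

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_BLOCKS 16

/* Hypothetical reduced block: successor indices plus a predecessor count. */
typedef struct {
    int succs[2];
    int succ_count;
    int pred_count;
} DemoBlock;

typedef struct {
    DemoBlock blocks[DEMO_MAX_BLOCKS];
    int block_count;
} DemoCfg;

/* Assumed definition: P has several successors and B has several predecessors. */
static bool demo_is_critical(const DemoCfg *g, int p, int b) {
    return g->blocks[p].succ_count > 1 && g->blocks[b].pred_count > 1;
}

/* Rewrites P -> B into P -> E -> B and returns E.
   Returns -1 if the edge does not exist or the block table is full. */
static int demo_split_edge(DemoCfg *g, int p, int b) {
    if (g->block_count >= DEMO_MAX_BLOCKS) return -1;
    for (int i = 0; i < g->blocks[p].succ_count; ++i) {
        if (g->blocks[p].succs[i] != b) continue;
        int e = g->block_count++;
        g->blocks[e].succs[0] = b;
        g->blocks[e].succ_count = 1;
        g->blocks[e].pred_count = 1;  /* E's only predecessor is P */
        g->blocks[p].succs[i] = e;    /* B's pred count is unchanged: E replaces P */
        return e;
    }
    return -1;
}
```

Copies for the phi inputs that flow along P→B can then be placed in E, where they run only when that particular edge is taken.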

File: src/ir_outssa.c

Entry Point: ir_outssa_run()

Driver integration: Executed in src/main.c before isel_run_ex() to ensure no IR_OP_PHI reaches ISel/RegAlloc/Emit.

6.14.7. IR SSA Verification (التحقق_من_SSA) — v0.3.2.5.3

SSA verification is an analysis step that validates IR invariants after Mem2Reg and before Out-of-SSA:

Single definition: each virtual register is defined exactly once (SSA property), including function parameter registers.
Dominance: every use is dominated by the register’s definition (with edge semantics for فاي).
Phi correctness (فاي): exactly one incoming value per predecessor block, no duplicates, and no non-predecessor entries.
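The single-definition check is the simplest of the three and can be sketched as one pass over the destination registers. This is an assumption-laden illustration, not the logic of src/ir_verify_ssa.c: `demo_first_redefined` is a hypothetical helper that takes a flat array of destination registers, with -1 standing for instructions that define nothing (matching the `dest` convention in 6.2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* dests[i] is the destination register of instruction i, or -1 if none.
   Returns the first register that is defined twice, -1 if every register
   is defined at most once, or -2 on an out-of-range register or failed allocation. */
static int demo_first_redefined(const int *dests, int inst_count, int reg_count) {
    assert(reg_count > 0);
    bool *seen = calloc((size_t)reg_count, sizeof *seen);
    if (!seen) return -2;

    int result = -1;
    for (int i = 0; i < inst_count; ++i) {
        int d = dests[i];
        if (d < 0) continue;                       /* store, branch, return, ... */
        if (d >= reg_count) { result = -2; break; }
        if (seen[d]) { result = d; break; }        /* second definition: SSA violated */
        seen[d] = true;
    }
    free(seen);
    return result;
}
```

The destination sequence of the example in 6.7 (`0, -1, 1, -1, 2, 3, 4, -1`) passes this check, since each of the five registers is written exactly once.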

This verifier is exposed via the CLI flag:

--verify-ssa — aborts compilation with diagnostics on the first violations (capped), and requires -O1/-O2 because Mem2Reg runs in the optimizer pipeline.

Files: src/ir_verify_ssa.c, header: src/ir_verify_ssa.h

6.14.8. IR Well-Formedness Verification (التحقق_من_سلامة_الـIR) — v0.3.2.6.5

IR well-formedness verification validates general IR invariants that should hold regardless of SSA state:

Operand counts and required fields per instruction
Type consistency between instruction results and operands
Terminator placement (must end blocks; no instructions after terminators)
Phi placement and incoming-edge shape (after rebuilding predecessors)
Intra-module call signature checks when the callee exists in the same IR module

This verifier is exposed via the CLI flag:

--verify-ir — aborts compilation with diagnostics on the first violations (capped).

Files: src/ir_verify_ir.c, header: src/ir_verify_ir.h

Pipeline position: Executed after optimization and before Out-of-SSA/backend in src/main.c.

6.14.9. IR Canonicalization Pass (توحيد_الـIR) — v0.3.2.6.5

Canonicalization normalizes instruction forms to increase matchability for later optimizations (CSE/DCE/constfold):

Commutative ops: constant placement and deterministic operand ordering
Comparisons: swap operands and predicate when the constant is on the left
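The comparison rule can be sketched as follows. The types and function names are hypothetical stand-ins for the real IR, and the sketch assumes that comparisons with two constant operands are left to constant folding. The predicate mapping itself follows from the table in 6.5: swapping operands mirrors every ordering predicate and leaves EQ/NE unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DEMO_CMP_EQ, DEMO_CMP_NE,
    DEMO_CMP_GT, DEMO_CMP_LT, DEMO_CMP_GE, DEMO_CMP_LE,
    DEMO_CMP_UGT, DEMO_CMP_ULT, DEMO_CMP_UGE, DEMO_CMP_ULE
} DemoCmpPred;

typedef struct {
    bool is_const;
    int64_t payload;   /* constant value, or register number when !is_const */
} DemoOperand;

typedef struct {
    DemoCmpPred pred;
    DemoOperand lhs, rhs;
} DemoCmp;

/* Predicate that keeps the meaning of a comparison after swapping its operands. */
static DemoCmpPred demo_swap_pred(DemoCmpPred p) {
    switch (p) {
    case DEMO_CMP_GT:  return DEMO_CMP_LT;
    case DEMO_CMP_LT:  return DEMO_CMP_GT;
    case DEMO_CMP_GE:  return DEMO_CMP_LE;
    case DEMO_CMP_LE:  return DEMO_CMP_GE;
    case DEMO_CMP_UGT: return DEMO_CMP_ULT;
    case DEMO_CMP_ULT: return DEMO_CMP_UGT;
    case DEMO_CMP_UGE: return DEMO_CMP_ULE;
    case DEMO_CMP_ULE: return DEMO_CMP_UGE;
    default:           return p;   /* EQ and NE are symmetric */
    }
}

/* Moves a left-hand constant to the right. Returns true if the instruction changed. */
static bool demo_canon_cmp(DemoCmp *c) {
    if (!c->lhs.is_const || c->rhs.is_const) return false;
    DemoOperand tmp = c->lhs;
    c->lhs = c->rhs;
    c->rhs = tmp;
    c->pred = demo_swap_pred(c->pred);
    return true;
}
```

For example, `5 < %r` becomes `%r > 5`, so a later pass such as CSE sees the same form regardless of how the source wrote the comparison.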

File: src/ir_canon.c

Entry Point: ir_canon_run()

Pass Descriptor: IR_PASS_CANON (used with the optimizer pipeline).

6.14.9.1. IR InstCombine Pass (دمج_التعليمات) — v0.3.2.8.6

InstCombine performs fast, local instruction simplifications to improve later passes (SCCP/constfold/copyprop/DCE). It rewrites eligible instructions into نسخ (IR_OP_COPY) or constants rather than deleting SSA definitions directly.

File: src/ir_instcombine.c

Entry Point: ir_instcombine_run()

Testing: Integration validation via scripts/qa_run.py --mode full and --mode stress.

6.14.9.2. IR SCCP Pass (نشر_الثوابت_المتناثر) — v0.3.2.8.6

SCCP (Sparse Conditional Constant Propagation) combines reachability with SSA constant propagation:

Tracks reachable blocks and feasible edges.
Propagates integer constants through SSA.
Folds قفز_شرط (IR_OP_BR_COND) into قفز (IR_OP_BR) when the condition becomes constant.

File: src/ir_sccp.c

Entry Point: ir_sccp_run()

Testing: Integration validation via scripts/qa_run.py --mode full and --mode stress.

6.14.10. IR CFG Simplification Pass (تبسيط_CFG) — v0.3.2.6.5

CFG simplification reduces unnecessary control-flow structure:

قفز_شرط cond, X, X becomes قفز X
Removes trivial قفز-only blocks conservatively, avoiding unsafe phi interactions
Provides a reusable critical-edge splitting helper for IR passes
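
As a rough illustration of two of the checks involved, the sketch below tests whether an edge is critical (the standard definition: the source has several successors and the destination has several predecessors) and folds a conditional branch whose two targets coincide. The `BlockInfo` and `Branch` types are hypothetical stand-ins, not the structures used by src/ir_cfg_simplify.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-block summary (illustration only). */
typedef struct {
    int succ_count;
    int pred_count;
} BlockInfo;

/* An edge from -> to is critical when `from` has more than one successor
 * and `to` has more than one predecessor. */
static bool is_critical_edge(const BlockInfo *from, const BlockInfo *to) {
    assert(from != NULL && to != NULL);
    return from->succ_count > 1 && to->pred_count > 1;
}

/* Hypothetical terminator model. */
typedef struct {
    bool conditional;
    int true_target;
    int false_target;
} Branch;

/* A conditional branch with identical targets becomes an unconditional branch. */
static bool fold_same_target(Branch *br) {
    if (br->conditional && br->true_target == br->false_target) {
        br->conditional = false;
        return true;
    }
    return false;
}
```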

File: src/ir_cfg_simplify.c

Entry Point: ir_cfg_simplify_run()

Helper: ir_cfg_split_critical_edge()

Pass Descriptor: IR_PASS_CFG_SIMPLIFY (used with the optimizer pipeline).

6.14.11. IR Data Layout Module (نموذج_تخطيط_البيانات) — v0.3.2.6.6

The Data Layout module provides a central source of truth for target-specific type information (size, alignment, store size). Currently hardcoded for Windows x86-64, but designed to support multiple backends in the future.

File: src/ir_data_layout.c / src/ir_data_layout.h

Key API:

ir_type_size_bytes(dl, type): Returns size in bytes (e.g., i32 → 4).
ir_type_alignment(dl, type): Returns required alignment (e.g., i32 → 4).
ir_type_store_size(dl, type): Returns memory size for storage.

Arithmetic Contract (v0.3.2.6.6):

Overflow: Two's complement wrap (no undefined behavior).
Safe Division: INT64_MIN / -1 → INT64_MIN (no trap).
Safe Modulo: INT64_MIN % -1 → 0 (no trap).
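
The contract can be expressed in portable C roughly as below. The helper names are hypothetical. Division by zero is not covered by the contract quoted above, so this sketch simply asserts against it.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement wrap: do the arithmetic on unsigned values, where overflow
 * is well defined, then convert back. The conversion back to int64_t is
 * implementation-defined before C23 but wraps on mainstream compilers. */
static int64_t wrap_add(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

static int64_t wrap_mul(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a * (uint64_t)b);
}

/* INT64_MIN / -1 overflows in C (and traps on x86-64 idiv); return INT64_MIN. */
static int64_t safe_div(int64_t a, int64_t b) {
    assert(b != 0); /* assumption: zero divisors are handled elsewhere */
    if (a == INT64_MIN && b == -1) return INT64_MIN;
    return a / b;
}

/* INT64_MIN % -1 has the same hazard; the contract defines the result as 0. */
static int64_t safe_mod(int64_t a, int64_t b) {
    assert(b != 0); /* assumption: zero divisors are handled elsewhere */
    if (a == INT64_MIN && b == -1) return 0;
    return a % b;
}
```

A constant folder that evaluates through helpers like these produces the same values as the generated code, so folding never changes observable behavior.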

6.15. IR Constant Folding Pass (طي_الثوابت) — v0.3.1.2

The IR constant folding pass optimizes Baa IR by evaluating arithmetic and comparison instructions at compile time when both operands are immediate constants. It replaces all uses of the folded register with the constant value and removes the instruction from its block.

File: src/ir_constfold.c

Entry Point: ir_constfold_run()

Pass Descriptor: IR_PASS_CONSTFOLD (used with the optimizer pipeline).

Supported Operations:

Arithmetic: جمع (add), طرح (sub), ضرب (mul), قسم (div), باقي (mod)
Comparisons: قارن (eq, ne, gt, lt, ge, le)

How it works:

Scans each function and block for foldable instructions.
If both operands are immediate integer constants, computes the result.
Replaces all uses of the destination register with a new constant IRValue.
Removes the folded instruction from its block.
Pass is function-local; virtual registers are scoped per function.

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

API: See docs/API_REFERENCE.md for function signatures.

6.16. IR Dead Code Elimination Pass (حذف_الميت) — v0.3.1.3

The IR dead code elimination pass removes useless IR after lowering/other optimizations:

Dead SSA instructions: any instruction that produces a destination register which is never used, and has no side effects.
Unreachable blocks: any basic block not reachable from the function entry block.

File: src/ir_dce.c

Entry Point: ir_dce_run()

Pass Descriptor: IR_PASS_DCE

Conservative correctness rules:

نداء (calls) are treated as side-effecting and are not removed even if the result is unused.
خزن (stores) are not removed.
Terminators (قفز, قفز_شرط, رجوع) are never removed.

CFG hygiene:

Unreachable-block removal uses ir_func_rebuild_preds() before/after pruning.
Phi nodes are pruned of incoming edges from removed predecessor blocks to avoid dangling references.
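
Unreachable-block detection is a plain graph traversal from the entry block. The sketch below is a simplified model with hypothetical types (blocks identified by index, block 0 as entry); it does not reproduce the data structures in src/ir_dce.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BLOCKS 64
#define MAX_SUCCS 2

/* Hypothetical block model: successors are stored as block indices. */
typedef struct {
    int succs[MAX_SUCCS];
    int succ_count;
} Block;

/* Marks every block reachable from block 0 (the entry block).
 * Blocks left unmarked are candidates for removal. */
static void mark_reachable(const Block *blocks, int n, bool *reachable) {
    int stack[MAX_BLOCKS]; /* each block is pushed at most once */
    int top = 0;
    assert(n > 0 && n <= MAX_BLOCKS);
    for (int i = 0; i < n; i++) reachable[i] = false;
    reachable[0] = true;
    stack[top++] = 0;
    while (top > 0) {
        const Block *b = &blocks[stack[--top]];
        for (int i = 0; i < b->succ_count; i++) {
            int s = b->succs[i];
            if (!reachable[s]) {
                reachable[s] = true;
                stack[top++] = s;
            }
        }
    }
}
```

After pruning, any phi in a surviving block must drop the incoming entries that named a removed predecessor, which is the phi hygiene step described above.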

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

API: See docs/API_REFERENCE.md for function signatures.

6.17. IR Copy Propagation Pass (نشر_النسخ) — v0.3.1.4

The IR copy propagation pass removes redundant SSA copy chains by replacing uses of registers defined by نسخ (IR_OP_COPY) with their original source values. This simplifies the IR and improves the effectiveness of later passes (like common subexpression elimination and dead code elimination).

File: src/ir_copyprop.c

Entry Point: ir_copyprop_run()

Pass Descriptor: IR_PASS_COPYPROP

Scope: Function-local (virtual registers are scoped per function in the current IR).

What it does:

Detects IR_OP_COPY instructions (نسخ) and builds an alias map (%مX → source value).
Canonicalizes copy chains (e.g. %م٣ = نسخ %م٢, %م٢ = نسخ %م١) so %م٣ is rewritten to %م١.
Rewrites operands in:
- normal instruction operands
- نداء call arguments
- فاي phi incoming values
Removes نسخ instructions after propagation.
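
Chain canonicalization amounts to following the alias map until a register that is not defined by a copy is reached. The sketch below handles register-to-register copies only and uses a hypothetical array-based alias map; the real pass also maps registers to non-register source values.

```c
#include <assert.h>

#define NO_ALIAS (-1)

/* alias[r] is the register that r was copied from, or NO_ALIAS when r is not
 * defined by a copy. Returns the root source register of `reg`. */
static int resolve_copy(const int *alias, int reg_count, int reg) {
    assert(reg >= 0 && reg < reg_count);
    /* SSA copies cannot form a cycle, but the walk is bounded anyway. */
    for (int steps = 0; steps < reg_count && alias[reg] != NO_ALIAS; steps++) {
        reg = alias[reg];
    }
    return reg;
}
```

For the chain in the example above (register 3 copied from 2, and 2 copied from 1), `resolve_copy` maps register 3 straight to register 1.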

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

6.18. IR Common Subexpression Elimination Pass (حذف_المكرر) — v0.3.1.5

The IR common subexpression elimination (CSE) pass detects duplicate computations with identical opcode and operands, replacing subsequent uses with the first computed result.

File: src/ir_cse.c

Entry Point: ir_cse_run()

Pass Descriptor: IR_PASS_CSE

Algorithm:

For each function and block, hash each pure expression (opcode + operand signatures).
If an earlier expression with the same hash and identical opcode and operands is found, replace all uses of the duplicate instruction's destination register with the original result.
Remove redundant instructions after propagation.
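
A minimal sketch of the lookup step, assuming a fixed-size open-addressing table and a hypothetical `ExprKey` made of an opcode and two operand ids. The hashing scheme in src/ir_cse.c may differ. A hash match is always confirmed by comparing the keys, so collisions never merge distinct expressions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SIZE 64

/* Hypothetical expression signature: opcode plus two operand ids. */
typedef struct { int opcode; int lhs; int rhs; } ExprKey;
typedef struct { ExprKey key; int dest; bool used; } Slot;

/* FNV-1a style mixing over the three fields. */
static uint32_t expr_hash(ExprKey k) {
    uint32_t h = 2166136261u;
    h = (h ^ (uint32_t)k.opcode) * 16777619u;
    h = (h ^ (uint32_t)k.lhs) * 16777619u;
    h = (h ^ (uint32_t)k.rhs) * 16777619u;
    return h;
}

static bool expr_equal(ExprKey a, ExprKey b) {
    return a.opcode == b.opcode && a.lhs == b.lhs && a.rhs == b.rhs;
}

/* Returns the destination register of an earlier identical expression,
 * or -1 after recording (key, dest) as the first occurrence. */
static int cse_lookup_or_insert(Slot *table, ExprKey key, int dest) {
    assert(dest >= 0);
    uint32_t i = expr_hash(key) % TABLE_SIZE;
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        Slot *s = &table[i];
        if (!s->used) {
            s->key = key; s->dest = dest; s->used = true;
            return -1;
        }
        if (expr_equal(s->key, key)) return s->dest;
        i = (i + 1) % TABLE_SIZE; /* linear probing */
    }
    return -1; /* table full: conservatively report no match */
}
```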

Eligible Operations (pure, no side effects):

Arithmetic: جمع (add), طرح (sub), ضرب (mul), قسم (div), باقي (mod)
Comparisons: قارن (compare)
Logical: و (and), أو (or), نفي (not)

NOT Eligible (side effects or non-deterministic):

Memory: حجز (alloca), حمل (load), خزن (store)
Control: نداء (call), فاي (phi), terminators (branches/returns)

Pipeline position: Enabled at -O2 after GVN.

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

API: See docs/API_REFERENCE.md for function signatures.

6.18.1. IR Global Value Numbering Pass (ترقيم_القيم_العالمية) — v0.3.2.8.6

GVN (Global Value Numbering) removes redundant pure expressions across dominator scopes, even when they use different SSA registers due to copies. Unlike CSE which relies on exact opcode/operand matching, GVN assigns value numbers to expressions based on their semantic equivalence.

File: src/ir_gvn.c

Entry Point: ir_gvn_run()

Pass Descriptor: IR_PASS_GVN

Algorithm:

Computes the dominator tree for each function.
Assigns value numbers to expressions based on opcode and operand value numbers.
Detects equivalent expressions even when they use different SSA registers.
Replaces redundant computations with the original value.

Pipeline position: Enabled at -O2 after copy propagation and before CSE.

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

6.18.2. IR Loop Invariant Code Motion Pass (حركة_التعليمات_غير_المتغيرة) — v0.3.2.7.1

LICM (Loop Invariant Code Motion) identifies pure instructions inside loops that depend only on values outside the loop and moves them to the loop preheader.

File: src/ir_licm.c

Entry Point: ir_licm_run()

Pass Descriptor: IR_PASS_LICM

Safety Constraints:

Does not move memory operations or calls.
Does not move division/remainder to avoid changing trap behavior when the loop is not entered.
Requires a single preheader for the loop header (otherwise skips that loop).
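
The first two constraints reduce to a per-instruction predicate, sketched below with hypothetical `Op` and `Inst` types. An instruction qualifies only if its opcode is pure and non-trapping and none of its operands is defined inside the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode subset (illustration only). */
typedef enum { OP_ADD, OP_MUL, OP_DIV, OP_MOD, OP_LOAD, OP_STORE, OP_CALL } Op;

typedef struct {
    Op op;
    int operands[2]; /* register numbers */
    int operand_count;
} Inst;

/* Memory operations, calls, and division/remainder are never hoisted. */
static bool op_is_hoistable(Op op) {
    switch (op) {
    case OP_ADD:
    case OP_MUL:
        return true;
    default:
        return false;
    }
}

/* defined_in_loop[r] is true when register r is defined inside the loop. */
static bool can_hoist(const Inst *inst, const bool *defined_in_loop) {
    assert(inst->operand_count >= 0 && inst->operand_count <= 2);
    if (!op_is_hoistable(inst->op)) return false;
    for (int i = 0; i < inst->operand_count; i++) {
        if (defined_in_loop[inst->operands[i]]) return false;
    }
    return true;
}
```

A full pass would typically re-run this check after each hoist, since moving one instruction out of the loop can make its users invariant as well.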

Pipeline position: Runs in both -O1 and -O2 after CFG simplification.

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

6.18.3. IR Inlining Pass (تضمين_الدوال) — v0.3.2.7.2

The inlining pass expands function calls directly at their call sites, enabling further optimizations by exposing the function body to the optimizer.

File: src/ir_inline.c

Entry Point: ir_inline_run()

Algorithm:

Conservatively inlines small internal functions with a single call site.
Applied before Mem2Reg (before SSA construction) to avoid phi complexity.
Relies on Mem2Reg + subsequent optimization passes for "cleanup after inlining".

Pipeline position: Enabled at -O2 only, runs before the main optimization loop.

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

6.18.4. IR Loop Unrolling Pass (فك_الحلقات) — v0.3.2.7.1

Loop unrolling replicates loop bodies to reduce loop overhead and enable further optimizations.

File: src/ir_unroll.c

Entry Point: ir_unroll_run()

Constraints:

Unrolls only when the trip count is constant and small.
Applies only to natural loops with a single preheader.
Runs after Out-of-SSA, because that pass makes loop values explicit through copies.

Pipeline position: Enabled with -funroll-loops flag after Out-of-SSA.

Testing: Covered by integration corpus and optimizer-enabled smoke in scripts/qa_run.py --mode full.

6.19. Instruction Selection (اختيار_التعليمات) — v0.3.2.1

The instruction selection pass converts Baa IR (SSA form) into an abstract machine representation (MachineModule) that closely mirrors x86-64 instructions while keeping virtual registers. Physical register assignment is deferred to the register allocation pass (v0.3.2.2).

Files: src/isel.h, src/isel.c

Entry Point: isel_run_ex() — takes an IRModule* plus a BaaTarget to select ABI/object-format behavior.

Multi-target note (v0.3.2.8.1): The backend is being refactored to accept a BaaTarget descriptor (src/target.h) so the same IR can be lowered for Windows x64 (COFF) or Linux x86-64 (ELF). The driver exposes this via --target=....

6.19.1. Architecture Overview

IRModule ──→ isel_run_ex() ──→ MachineModule
  IRFunc        │              MachineFunc
  IRBlock       │              MachineBlock
  IRInst        │              MachineInst (1:N expansion)
                ▼
         ISelCtx (internal context)
         - current function/block
         - vreg counter
         - stack size tracking

Each IR instruction is lowered to one or more MachineInst nodes. The expansion ratio is typically 1:1 to 1:4 depending on the IR opcode (e.g., IR_OP_DIV expands to MOV + CQO + IDIV).

6.19.2. Key Data Structures

Structure	Description
`MachineOp`	Enum of x86-64 opcodes: ADD, SUB, IMUL, SHL, SHR, SAR, IDIV, DIV, NEG, CQO, ADDSD, SUBSD, MULSD, DIVSD, UCOMISD, XORPD, CVTSI2SD, CVTTSD2SI, MOV, LEA, LOAD, STORE, CMP, TEST, SETcc (E, NE, G, L, GE, LE, A, B, AE, BE, P, NP), MOVZX, MOVSX, AND, OR, NOT, XOR, JMP, JE, JNE, CALL, TAILJMP, RET, PUSH, POP, NOP, LABEL, COMMENT
`MachineOperandKind`	NONE, VREG, IMM, MEM, LABEL, GLOBAL, FUNC, XMM
`MachineOperand`	Union: vreg number, immediate value, memory (base+offset), label id, global/func name, xmm register
`MachineInst`	Doubly-linked list node: op + dst/src1/src2 + ir_reg + comment + src_loc + dbg_name + sysv_al (for varargs)
`MachineBlock`	Label + instruction list + successors + linked-list next
`MachineFunc`	Name + block list + vreg counter + stack_size + param_count
`MachineModule`	Function list + globals (ref from IR) + strings (ref from IR) + baa_strings

MachineInst Structure:

typedef struct MachineInst {
    MachineOp op;               // opcode
    MachineOperand dst;         // destination operand
    MachineOperand src1;        // first source operand
    MachineOperand src2;        // second source operand (optional)

    // Source tracking information
    int ir_reg;                 // original IR register (links back to the IR)
    const char* comment;        // optional comment (for readability)

    // Debug info
    const char* src_file;
    int src_line;
    int src_col;
    int ir_inst_id;             // IR instruction id (if any)
    const char* dbg_name;       // optional variable/symbol name

    // SystemV AMD64 varargs: AL = number of XMM registers used to pass arguments.
    // -1 => AL setup is not explicitly requested (default 0).
    int sysv_al;

    // Doubly-linked list
    struct MachineInst* prev;
    struct MachineInst* next;
} MachineInst;

MachineBlock Structure:

typedef struct MachineBlock {
    char* label;                // block name
    int id;                     // block id
    MachineInst* first;         // first instruction
    MachineInst* last;          // last instruction
    int inst_count;             // number of instructions
    struct MachineBlock* succs[2];  // successors (0-2)
    int succ_count;
    struct MachineBlock* next;  // next block in the list
} MachineBlock;

MachineFunc Structure:

typedef struct MachineFunc {
    char* name;                 // function name
    MachineBlock* blocks;       // block list
    int block_count;
    int next_vreg;              // virtual register counter
    int stack_size;             // local stack size
    int param_count;            // number of parameters
    struct MachineFunc* next;   // next function
} MachineFunc;

MachineModule Structure:

typedef struct MachineModule {
    MachineFunc* funcs;         // function list
    int func_count;
    IRGlobal* globals;          // reference from IR (not owned)
    int global_count;
    IRStringEntry* strings;     // reference from IR (not owned)
    int string_count;
    IRBaaStringEntry* baa_strings;  // reference from IR (not owned)
    int baa_string_count;
} MachineModule;

6.19.3. Instruction Lowering Patterns

IR Opcode	Machine Pattern	Notes
`IR_OP_ADD` / `IR_OP_SUB` / `IR_OP_MUL`	`MOV dst, lhs; OP dst, rhs`	Two-address form. Immediates inlined as src2
`IR_OP_DIV` / `IR_OP_MOD`	`MOV RAX, lhs; CQO; IDIV rhs`	If rhs is immediate, temp vreg is allocated for MOV
`IR_OP_NEG`	`MOV dst, src; NEG dst`	Two-instruction pattern
`IR_OP_ALLOCA`	`LEA dst, [RBP - offset]`	Stack offset tracked in `ISelCtx.stack_size`
`IR_OP_LOAD`	`LOAD dst, [ptr]` or `LOAD dst, @global`	Global variables use MACH_OP_GLOBAL operand
`IR_OP_STORE`	`STORE [ptr], src`	Immediate values can be stored directly to memory
`IR_OP_PTR_OFFSET`	`MOV dst, base; (scale index); ADD dst, index_scaled`	Used for array indexing: computes element address using data layout element size
`IR_OP_CMP`	`CMP lhs, rhs; SETcc tmp; MOVZX dst, tmp`	SETcc selected by predicate (EQ/NE/GT/LT/GE/LE). If LHS is immediate, temp vreg is used
`IR_OP_AND` / `IR_OP_OR` / `IR_OP_XOR`	`MOV dst, lhs; OP dst, rhs`	Same two-address form as arithmetic
`IR_OP_SHL`	`MOV dst, lhs; SHL dst, rhs`	If rhs is non-immediate, count is moved to RCX/CL
`IR_OP_SHR`	`MOV dst, lhs; SHR/SAR dst, rhs`	`SHR` for unsigned types, `SAR` for signed types
`IR_OP_NOT`	`MOV dst, src; NOT dst`	Bitwise NOT
`IR_OP_BR`	`JMP label`	Unconditional jump
`IR_OP_BR_COND`	`TEST cond, cond; JNE true_label; JMP false_label`	Three-instruction pattern
`IR_OP_RET`	`MOV RAX, val; RET`	Uses special vreg -2 (= RAX)
`IR_OP_CALL`	`MOV param_regs, args...; (setup stack args); CALL @func/*reg; MOV dst, RAX`	Direct: `CALL @func`. Indirect: `CALL *reg` (callee value). ABI: Windows (shadow) / SysV (no shadow). In v0.4.0.5, variadic Baa calls pass packed extras via a hidden `__baa_va_base` pointer. In v0.4.0.6, inline asm is lowered as a pseudo-call (`__baa_inline_asm_v0406`) that ISel converts into raw assembly lines, moving register inputs/outputs into place.
`IR_OP_CALL` + `IR_OP_RET` (tail)	`MOV param_regs, args...; TAILJMP @func`	v0.3.2.7.3: enabled only at `-O2`, and conservatively (register args only)
`IR_OP_PHI`	`NOP`	Placeholder; copy insertion deferred to register allocation
`IR_OP_CAST`	`MOV dst, src` (larger/same size) or `MOVZX/MOVSX dst, src` (smaller to larger)	Size and sign dependent conversion (`تحويل`)

6.19.4. Special Virtual Register Conventions

The instruction selector uses negative vreg numbers to represent physical register constraints that will be resolved during register allocation:

Vreg	Physical Register	Purpose
-1	RBP	Memory base for stack accesses
-2	RAX	Return value register
-3	RSP	Stack pointer base for outgoing call frames
-4	R11	Reserved scratch register (spill-base fixups, mem-to-mem avoidance)
-5	RDX	Remainder register for `idiv` / backend fixed constraint
-6	RCX	Shift count register (`cl`) for variable shifts
-10..	ABI arg regs	Function arguments (target-dependent). Windows: -10..-13 → RCX/RDX/R8/R9. SysV: -10..-15 → RDI/RSI/RDX/RCX/R8/R9
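
The table can be read as a small lookup function. The sketch below is illustrative only: the `Abi` enum and `special_vreg_name` are hypothetical names, and the real backend resolves these vregs to `PhysReg` values through the target descriptor rather than to strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { ABI_WINDOWS_X64, ABI_SYSV_AMD64 } Abi; /* hypothetical */

/* Maps a negative "special" vreg to a physical register name, or NULL
 * when the number has no fixed meaning for the given ABI. */
static const char *special_vreg_name(int vreg, Abi abi) {
    static const char *const win_args[]  = { "rcx", "rdx", "r8", "r9" };
    static const char *const sysv_args[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

    switch (vreg) {
    case -1: return "rbp"; /* stack access base */
    case -2: return "rax"; /* return value */
    case -3: return "rsp"; /* outgoing call frames */
    case -4: return "r11"; /* reserved scratch */
    case -5: return "rdx"; /* idiv remainder */
    case -6: return "rcx"; /* shift count (cl) */
    default: break;
    }

    int index = -10 - vreg; /* -10 is argument 0, -11 is argument 1, ... */
    int count = (abi == ABI_WINDOWS_X64) ? 4 : 6;
    if (index >= 0 && index < count) {
        return (abi == ABI_WINDOWS_X64) ? win_args[index] : sysv_args[index];
    }
    return NULL;
}
```

Note that the same vreg number names different registers on the two targets: `-10` is RCX on Windows and RDI under SysV.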

6.19.5. Design Decisions

Virtual registers preserved: ISel keeps IR virtual register numbers intact. Physical register mapping is entirely deferred to v0.3.2.2 (register allocation).
Immediate inlining: Constants are embedded as MACH_OP_IMM wherever x86-64 encoding permits. Where not allowed (CMP first operand, IDIV divisor), a temp vreg + MOV is emitted.
Phi nodes as NOPs: Phi instructions become NOP placeholders. Actual copy insertion into predecessor blocks is deferred to SSA destruction during register allocation.
MachineModule references IR data: Global variables and string tables are referenced (not copied) from the IR module. Memory is freed by the IR module.
Stack size tracking: Each IR_OP_ALLOCA increases stack_size by the store size of the allocated pointee type (rounded up to its alignment via the target data layout). The LEA instruction uses the accumulated offset.

Testing: Backend behavior is validated by integration runtime tests under tests/integration/backend/.

6.20. Register Allocation (تخصيص_السجلات) — v0.3.2.2

The register allocator transforms virtual register references in machine instructions into physical x86-64 registers. It uses the Linear Scan algorithm for simplicity and fast compilation.

Source: src/regalloc.h / src/regalloc.c

6.20.1. Architecture Overview

MachineModule (vregs)
    │
    ├── 1. Number Instructions    ← Sequential numbering for position tracking
    ├── 2. Compute def/use        ← Per-block def/use bitsets
    ├── 3. Liveness Analysis      ← Iterative dataflow → live-in/live-out
    ├── 4. Build Live Intervals   ← vreg → [start, end] ranges
    ├── 5. Linear Scan            ← Assign physical registers, spill on pressure
    ├── 6. Insert Spill Code      ← Handle spilled vregs
    └── 7. Rewrite Operands       ← Replace VREG → physical reg / MEM
    │
    ▼
MachineModule (physical regs)

6.20.2. Key Data Structures

Structure	Purpose
`PhysReg`	Enum of 16 x86-64 physical registers (RAX=0 through R15=15)
`LiveInterval`	Per-vreg range: `{vreg, start, end, phys_reg, spilled, spill_offset}`
`BlockLiveness`	Per-block bitsets: `{def, use, live_in, live_out}` as `uint64_t*` arrays
`RegAllocCtx`	Full context: function, inst_map, block liveness, intervals, vreg→phys mapping, spill tracking

6.20.3. Allocation Order

Registers are allocated in a specific priority order to minimize callee-save overhead:

Caller-saved temporaries: R10 (free to use, no save/restore). R11 is reserved as a scratch register for spill/base fixups.
General purpose: RSI, RDI (callee-saved on Windows x64, so they also require save/restore there; caller-saved under SystemV AMD64)
Callee-saved: RBX, R12, R13, R14, R15 (require save/restore in prologue/epilogue)
ABI-reserved: RCX, RDX, R8, R9 (argument registers, allocated last). RAX is reserved for return value and backend scratch sequences.

Always reserved: RSP (stack pointer), RBP (frame pointer) — never allocated.

6.20.4. Special Virtual Register Conventions

ISel emits negative vregs for ABI-fixed locations. The register allocator resolves these during rewrite:

Virtual Reg	Physical Reg	Purpose
`-1`	RBP	Frame pointer (memory base)
`-2`	RAX	Return value
`-4`	R11	Scratch register for spilled memory bases
`-5`	RDX	Remainder register (`idiv`)
`-6`	RCX	Shift count register (`cl`)
`-10`	RCX	1st argument (Windows x64)
`-11`	RDX	2nd argument (Windows x64)
`-12`	R8	3rd argument (Windows x64)
`-13`	R9	4th argument (Windows x64)

6.20.5. Liveness Analysis

The liveness analysis uses iterative dataflow on bitsets:

def/use computation: Walk each block's instructions. For each instruction, if a vreg is used before being defined in the block, it goes into use. If defined, it goes into def. Two-address form (e.g., add dst, dst, src) records dst as both use and def.
Dataflow iteration: Iterate in reverse block order until fixpoint (max 100 iterations):
- live_out[B] = union(live_in[S]) for all successors S of B
- live_in[B] = use[B] union (live_out[B] - def[B])
Interval construction: Walk instructions sequentially, extending intervals for vregs in live_in/live_out sets at block boundaries.
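
The dataflow step can be sketched as below. For brevity the sketch assumes at most 64 vregs so each set fits in one `uint64_t` (the real `BlockLiveness` uses `uint64_t*` arrays), and the `BlockLive` type and `solve_liveness` name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCCS 2

/* Hypothetical single-word version of the per-block bitsets. */
typedef struct {
    uint64_t def, use, live_in, live_out;
    int succs[MAX_SUCCS]; /* successor block indices */
    int succ_count;
} BlockLive;

/* Iterates to a fixpoint in reverse block order.
 * Returns the number of iterations used, or -1 if max_iters was reached. */
static int solve_liveness(BlockLive *blocks, int n, int max_iters) {
    assert(n > 0 && max_iters > 0);
    for (int iter = 1; iter <= max_iters; iter++) {
        bool changed = false;
        for (int b = n - 1; b >= 0; b--) {
            uint64_t out = 0;
            for (int s = 0; s < blocks[b].succ_count; s++) {
                out |= blocks[blocks[b].succs[s]].live_in;   /* union over successors */
            }
            uint64_t in = blocks[b].use | (out & ~blocks[b].def);
            if (in != blocks[b].live_in || out != blocks[b].live_out) {
                blocks[b].live_in = in;
                blocks[b].live_out = out;
                changed = true;
            }
        }
        if (!changed) return iter;
    }
    return -1;
}
```

Visiting blocks in reverse order tends to reach the fixpoint in few iterations, because liveness information flows backwards from uses to definitions.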

6.20.6. Spilling

When register pressure exceeds available registers, the allocator spills the longest-lived interval (comparing current candidate vs active intervals). Spilled vregs are assigned stack offsets relative to RBP. During rewrite, spilled VREG operands are converted to MEM operands [RBP + offset], leveraging x86-64's ability to have one memory operand per instruction. Exception: if a spilled vreg is used as the base of a memory operand (e.g. MACH_LOAD/MACH_STORE through a spilled pointer), the allocator reloads the pointer base into a reserved scratch register (R11) immediately before the instruction.
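
A common way to implement the spill choice in linear scan is to compare interval end points, sketched below. This assumes that "longest-lived" means the interval ending furthest in the future, which may not match the exact comparison in src/regalloc.c; the `Interval` type is a reduced, hypothetical version of `LiveInterval`.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical form of a live interval. */
typedef struct {
    int vreg;
    int start;
    int end;
} Interval;

/* Returns the index in `active` of the interval to spill, or -1 when the
 * new candidate itself ends last and should be spilled instead. */
static int choose_spill(const Interval *active, int active_count,
                        const Interval *candidate) {
    assert(candidate != NULL && active_count >= 0);
    int victim = -1;
    int furthest_end = candidate->end;
    for (int i = 0; i < active_count; i++) {
        if (active[i].end > furthest_end) {
            furthest_end = active[i].end;
            victim = i;
        }
    }
    return victim;
}
```

When an active interval is chosen, its physical register is handed to the candidate and the victim receives a stack slot addressed relative to RBP.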

6.20.7. Design Decisions

Linear scan over graph coloring: Chosen for simplicity and O(n log n) compilation speed. Sufficient for the current optimization level.
Spill via rewrite (not explicit loads/stores): Spilled vregs become [RBP+offset] MEM operands directly, avoiding extra load/store insertion. Works because x86-64 allows one memory operand per instruction. Exception: spilled pointer bases used in MACH_OP_MEM.base_vreg are reloaded into R11 before MACH_LOAD/MACH_STORE.
RSP/RBP always reserved: Frame pointer is always maintained for simple stack access. No frame pointer omission.
Callee-saved tracking: RegAllocCtx.callee_saved_used[] tracks which callee-saved registers are allocated, informing prologue/epilogue generation in the code emission phase.

Testing: Register allocation behavior is validated by integration runtime tests under tests/integration/backend/.

6.20.8. Target Abstraction Layer (تجريد الهدف) — v0.3.2.8.1

The backend uses a target abstraction layer (src/target.h, src/target.c) to separate OS/object-format/calling-convention assumptions from the rest of the backend (isel/regalloc/emit). This enables support for multiple targets.

Target Kinds:

typedef enum {
    BAA_TARGET_X86_64_WINDOWS = 0,   // Windows x86-64 (COFF/PE)
    BAA_TARGET_X86_64_LINUX   = 1,   // Linux x86-64 (ELF)
} BaaTargetKind;

typedef enum {
    BAA_OBJFORMAT_COFF = 0,  // Windows PE/COFF
    BAA_OBJFORMAT_ELF  = 1,  // Linux ELF
} BaaObjectFormat;

Calling Convention Descriptor:

typedef struct BaaCallingConv {
    int int_arg_reg_count;            // number of integer argument registers
    int int_arg_phys_regs[8];         // PhysReg values (from regalloc.h)

    int ret_phys_reg;                 // PhysReg (usually RAX)

    unsigned int callee_saved_mask;   // bitmask over PhysReg
    unsigned int caller_saved_mask;   // bitmask over PhysReg

    int stack_align_bytes;            // stack alignment at call sites (usually 16)

    // ABI argument registers are represented in Machine IR as negative virtual registers
    // arg i -> (abi_arg_vreg0 - i)
    int abi_arg_vreg0;                // default: -10
    int abi_ret_vreg;                 // default: -2 (RAX)

    int shadow_space_bytes;           // Windows: 32, SysV: 0
    bool home_reg_args_on_call;       // Windows varargs: true, SysV: false
    bool sysv_set_al_zero_on_call;    // SysV varargs rule: true
} BaaCallingConv;

Target Descriptor:

typedef struct BaaTarget {
    BaaTargetKind kind;
    const char* name;                 // short name: x86_64-windows, x86_64-linux
    const char* triple;               // future: full triple

    BaaObjectFormat obj_format;
    const IRDataLayout* data_layout;
    const BaaCallingConv* cc;

    const char* default_exe_ext;      // ".exe" on Windows, "" on Linux
} BaaTarget;

Target Selection:

const BaaTarget* baa_target_builtin_windows_x86_64(void);
const BaaTarget* baa_target_builtin_linux_x86_64(void);
const BaaTarget* baa_target_host_default(void);
const BaaTarget* baa_target_parse(const char* s);  // "x86_64-windows" or "x86_64-linux"

ABI Differences:

Feature	Windows x64	SystemV AMD64 (Linux)
Integer args	RCX, RDX, R8, R9	RDI, RSI, RDX, RCX, R8, R9
Shadow space	32 bytes	None
Varargs	Home register args on stack	Set AL = number of XMM args
Callee-saved	RBX, RBP, RDI, RSI, R12-R15	RBX, RBP, R12-R15
Stack alignment	16 bytes	16 bytes

Backend Integration:

isel_run_ex() takes const BaaTarget* to select ABI/object-format behavior
regalloc_run_ex() accepts target for calling convention-aware allocation
emit_module_ex() uses target for code emission decisions (sections, symbols)

6.21. Code Emission (إصدار كود التجميع) — v0.3.2.3

The code emission pass is the final backend stage. It converts machine IR (after register allocation) into x86-64 assembly text in AT&T syntax, compatible with GAS (GNU Assembler) for both the COFF (Windows) and ELF (Linux) targets.

Source: src/emit.h / src/emit.c

6.21.1. Architecture Overview

MachineModule (physical regs)
    │
    ├── 1. Emit rodata section    ← Format strings (COFF: .rdata, ELF: .rodata)
    ├── 2. Emit .data section     ← Global variables with initializers
    ├── 3. Emit .text section     ← Functions:
    │   ├── Function prologue     ← Stack setup + callee-saved preservation
    │   ├── Instruction emission  ← Translate each MachineInst to AT&T
    │   └── Function epilogue     ← Callee-saved restoration + return
    └── 4. Emit string table      ← .Lstr_N labels for string literals
    │
    ▼
Assembly file (.s)

6.21.2. AT&T Syntax Conventions

Aspect	AT&T Syntax	Intel Syntax (for comparison)
Register prefix	`%rax`, `%rcx`	`rax`, `rcx`
Immediate prefix	`$10`	`10`
Operand order	`mov source, dest`	`mov dest, source`
Size suffix	`movq` (64-bit), `movl` (32-bit), `movb` (8-bit)	`mov qword`, `mov dword`, `mov byte`
Memory addressing	`offset(%base)`	`[base + offset]`
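
To make the conventions concrete, the sketch below formats a memory operand and an immediate move the AT&T way using `snprintf`. The function names are hypothetical and are not part of src/emit.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes "offset(%base)", or "(%base)" when the offset is zero. */
static int format_mem_operand(char *buf, size_t size, int offset, const char *base) {
    assert(buf != NULL && base != NULL);
    if (offset == 0) return snprintf(buf, size, "(%%%s)", base);
    return snprintf(buf, size, "%d(%%%s)", offset, base);
}

/* Writes a 64-bit immediate move: source first, destination last. */
static int format_mov_imm(char *buf, size_t size, long imm, const char *dst) {
    assert(buf != NULL && dst != NULL);
    return snprintf(buf, size, "movq $%ld, %%%s", imm, dst);
}
```

For example, a spill slot 8 bytes below the frame pointer is written `-8(%rbp)`, where Intel syntax would write `[rbp - 8]`.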

6.21.3. Function Prologue Generation

The prologue sets up the stack frame and preserves callee-saved registers:

push %rbp              # Save old frame pointer
mov %rsp, %rbp         # Set up new frame pointer
sub $N, %rsp           # Allocate stack space (N = local + shadow + callee-save, 16-byte aligned)
mov %rbx, -8(%rbp)     # Save callee-saved registers (if used)
mov %r12, -16(%rbp)
...

Stack frame layout:

High addresses
    ┌─────────────────┐
    │  Return address │ ← pushed by CALL
    ├─────────────────┤
    │  Old RBP        │ ← pushed by prologue
    ├─────────────────┤ ← RBP points here
    │  Local vars     │ (func->stack_size bytes)
    ├─────────────────┤
    │  Shadow space   │ (32 bytes for Windows x64)
    ├─────────────────┤
    │  Callee-saved   │ (RBX, R12-R15 if used)
    ├─────────────────┤ ← RSP points here (16-byte aligned)
Low addresses

Callee-saved register detection:

The emitter scans all instructions in the function to determine which callee-saved registers (RBX, RSI, RDI, R12-R15) are used as destinations. Only used registers are preserved in the prologue and restored in the epilogue.
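
The value `N` in the prologue's `sub $N, %rsp` can be computed along these lines. This is a sketch that assumes each saved register takes 8 bytes and that the sum is rounded up to a multiple of 16; `frame_size` is a hypothetical helper, not a function from src/emit.c.

```c
#include <assert.h>

/* N = locals + shadow space + callee-saved area, rounded up to 16 bytes. */
static int frame_size(int local_bytes, int shadow_bytes, int callee_saved_count) {
    assert(local_bytes >= 0 && shadow_bytes >= 0 && callee_saved_count >= 0);
    int raw = local_bytes + shadow_bytes + 8 * callee_saved_count;
    return (raw + 15) & ~15; /* round up to the next multiple of 16 */
}
```

On Windows x64, a function with 20 bytes of locals and two saved registers needs 20 + 32 + 16 = 68 bytes, which rounds up to 80. The same function on a SysV target, with no shadow space, needs 36 bytes rounded up to 48.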

6.21.4. Function Epilogue Generation

The epilogue restores callee-saved registers and tears down the stack frame:

mov -16(%rbp), %r12    # Restore callee-saved registers (reverse order)
mov -8(%rbp), %rbx
leave                  # Equivalent to: mov %rbp, %rsp; pop %rbp
ret                    # Return to caller

6.21.5. Instruction Emission

Each MachineInst is translated to one or more AT&T assembly instructions:

Machine Op	AT&T Output	Notes
`MACH_MOV`	`movq %src, %dst`	Skips redundant `mov %reg, %reg`
`MACH_ADD`	`addq %src2, %dst`	Two-address form (dst = dst + src2)
`MACH_SUB`	`subq %src2, %dst`	Two-address form
`MACH_IMUL`	`imulq %src2, %dst`	Two-address form
`MACH_NEG`	`negq %dst`	Unary negation
`MACH_CQO`	`cqo`	Sign-extend RAX into RDX:RAX
`MACH_IDIV`	`idivq %src1`	Signed division (RDX:RAX / src1)
`MACH_LEA`	`leaq offset(%base), %dst`	Load effective address
`MACH_LOAD`	`movq offset(%base), %dst`	Load from memory
`MACH_STORE`	`movq %src, offset(%base)`	Store to memory
`MACH_CMP`	`cmpq %src2, %src1`	Compare (sets flags)
`MACH_TEST`	`testq %src2, %src1`	Bitwise AND (sets flags)
`MACH_SETcc`	`sete %dst8`	Set byte if condition (6 variants: E, NE, G, L, GE, LE)
`MACH_MOVZX`	`movzbq %src8, %dst64`	Zero-extend byte to qword
`MACH_AND`	`andq %src2, %dst`	Bitwise AND
`MACH_OR`	`orq %src2, %dst`	Bitwise OR
`MACH_NOT`	`notq %dst`	Bitwise NOT
`MACH_XOR`	`xorq %src2, %dst`	Bitwise XOR
`MACH_JMP`	`jmp .LBB_N`	Unconditional jump
`MACH_JE`	`je .LBB_N`	Jump if equal
`MACH_JNE`	`jne .LBB_N`	Jump if not equal
`MACH_CALL`	`sub $32, %rsp; call <sym>; add $32, %rsp` / `sub $32, %rsp; call *%reg; add $32, %rsp`	Direct/indirect call. Shadow space on Windows only
`MACH_TAILJMP`	`restore callee-saved; leave; home args; jmp func`	Tail call optimization (no new return address)
`MACH_RET`	(triggers epilogue emission)	Return handled by epilogue
`MACH_PUSH`	`pushq %src`	Push to stack
`MACH_POP`	`popq %dst`	Pop from stack
`MACH_LABEL`	`.LBB_N:`	Block label
`MACH_NOP`	(skipped)	No operation

6.21.6. Data Section Emission

Format strings (rodata):

.section .rdata,"dr"  # COFF (Windows)
.section .rodata       # ELF (Linux)
fmt_int: .asciz "%d\n"
fmt_str: .asciz "%s\n"
fmt_scan_int: .asciz "%d"

Global variables (.data):

.data
global_var: .quad 42           # Integer initializer
global_str: .quad .Lbs_0       # Baa string pointer initializer (ptr<char>)
global_fp:  .quad جمع          # Function pointer initializer (func address)

Linkage note (v0.3.7.5):

Globals lowered from ساكن use internal linkage.
ELF emission prints .local <symbol> for internal globals.
Non-internal globals are exported with .globl <symbol>.

String tables (rodata):

.section .rdata,"dr"  # COFF (Windows)
.section .rodata       # ELF (Linux)

# C strings (i8*) for printf/scanf formats, etc.
.Lstr_0: .asciz "%d\n"
.Lstr_1: .asciz "%s\n"

# Baa strings (char[]) as packed .quad entries, null-terminated.
.Lbs_0:
    .quad <packed 'م'>
    .quad <packed 'ر'>
    .quad <packed 'ح'>
    .quad <packed 'ب'>
    .quad <packed 'ا'>
    .quad 0

6.21.7. Function Name Translation

The emitter translates Arabic function names to their C runtime equivalents:

Baa Name	Assembly Name	Purpose
`الرئيسية`	`main`	Program entry point
`اطبع`	`printf`	Print function
`اقرأ`	`scanf`	Input function

6.21.8. ABI Compliance (Windows x64 + SystemV AMD64)

This backend is being refactored to support multiple ABIs via BaaTarget (src/target.h).

Windows x64 ABI: shadow space (32 bytes) around calls; first 4 args in RCX/RDX/R8/R9; return in RAX.
SystemV AMD64 (Linux): no shadow space; first 6 args in RDI/RSI/RDX/RCX/R8/R9; return in RAX; varargs require AL=0 when no XMM args.

6.21.9. Design Decisions

AT&T syntax: Chosen for compatibility with GAS (GNU Assembler) which is the default on MinGW-w64.
Redundant move elimination: The emitter skips mov %reg, %reg instructions that may result from register allocation.
Callee-saved detection: Scans all instructions to determine which registers need preservation, minimizing prologue/epilogue overhead.
Call frame management: Allocates shadow space on Windows; emits SysV call sequence on ELF targets.
Size suffix inference: Determines instruction size suffix (q/l/w/b) from operand size_bits field.

Entry Points:

emit_module() — Top-level entry point for complete assembly file
emit_func() — Emits single function with prologue/epilogue
emit_inst() — Translates individual machine instruction

Testing: Integration testing via full compilation pipeline (no standalone unit tests yet).

8. Global Data Section

Section	Contents
`.data`	Global variables (mutable)
`.rdata` / `.rodata`	String literals (read-only)
`.text`	Executable code

String Table

Strings are collected during parsing and emitted with unique labels:

# COFF (Windows)
.section .rdata,"dr"
.LC0:
    .asciz "مرحباً"
.LC1:
    .asciz "العالم"

# ELF (Linux)
.section .rodata
.LC0:
    .asciz "مرحباً"
.LC1:
    .asciz "العالم"

9. Naming & Entry Point

Aspect	Details
Entry Point	`الرئيسية` → exported as `main`
Name Mangling	None - functions use their Arabic UTF-8 names as assembly labels
Special Case	`الرئيسية` is explicitly exported as `main` using `.globl main`
Main with args (v0.3.12.5)	If the user defines `صحيح الرئيسية(صحيح عدد، نص[] معاملات)`, the compiler lowers the user function as `__baa_user_main` and emits an ABI wrapper named `الرئيسية` (exported as `main`). The wrapper converts C `char** argv` into Baa `نص[]` (`حرف[]` packed UTF-8) before calling `__baa_user_main`.
Custom startup (v0.3.12.5)	`--startup=custom` selects a custom entry symbol `__baa_start` (driver injects a small startup stub and links with `-Wl,-e,__baa_start`). The stub delegates to CRT/libc startup (`mainCRTStartup` on Windows, `__libc_start_main` on Linux).
External Calls	C runtime (`printf`, etc.) via toolchain symbol resolution



6.14.8. IR Well-Formedness Verification (التحقق_من_سلامة_الـIR) — v0.3.2.6.5

6.14.9. IR Canonicalization Pass (توحيد_الـIR) — v0.3.2.6.5

6.14.9.1. IR InstCombine Pass (دمج_التعليمات) — v0.3.2.8.6

6.14.9.2. IR SCCP Pass (نشر_الثوابت_المتناثر) — v0.3.2.8.6

2.2.1. Definitions (`#تعريف`)

2.2.2. Conditionals (`#إذا_عرف`)

2.2.3. Undefine (`#الغاء_تعريف`)

2.2.4. Include (`#تضمين`)

Function Pointer Signature (`FuncPtrSig`)