Track the development progress of the Baa programming language. Current Status: Phase 4.5 - Bootstrap Readiness (v0.5.x) ← IN PROGRESS
Goal: Produce a Kernighan & Ritchie–style first book for Baa, in Arabic, serving as the definitive learning + reference resource.
- [~] Write the Arabic "Baa Book" — book-length guide in Arabic with exercises (draft exists in `docs/BAA_BOOK_AR.md`).
- Define terminology glossary — consistent Arabic technical vocabulary.
- Create example suite — verified, idiomatic examples that compile with v0.3.7.
- Add exercises and challenges — per chapter, with expected outputs.
- Add debugging and performance chapters — common pitfalls, diagnostics, optimization notes.
- Native technical review — review by Arabic-speaking engineers before release.
Goal: Decouple the language from x86 Assembly to enable optimizations and multiple backends.
Design Document: See BAA_IR_SPECIFICATION.md for full IR specification.
- Define `IROp` enum — All opcodes: `IR_OP_ADD`, `IR_OP_SUB`, `IR_OP_MUL`, etc.
- Define `IRType` enum — Types: `IR_TYPE_I64`, `IR_TYPE_I32`, `IR_TYPE_I8`, `IR_TYPE_I1`, `IR_TYPE_PTR`.
- Define `IRInst` struct — Instruction with opcode, type, dest register, operands.
- Define `IRBlock` struct — Basic block with label, instruction list, successors.
- Define `IRFunc` struct — Function with name, return type, entry block, register counter.
- Create `ir.h` — Header file with all IR definitions.
- Create `ir.c` — Implementation with helper functions and IR printing.
- `IRBuilder` context struct — Builder pattern with insertion point tracking.
- `ir_builder_create_func()` — Create a new IR function.
- `ir_builder_create_block()` — Create a new basic block with label.
- `ir_builder_set_insert_point()` — Set insertion point for new instructions.
- `ir_builder_alloc_reg()` — Allocate next virtual register `%م<n>`.
- `ir_builder_emit_*()` — Emit instructions (add, sub, mul, div, load, store, br, ret, call, etc.).
- Control flow helpers — `ir_builder_create_if_then()`, `ir_builder_create_while()`.
- Create `ir_builder.h` — Header file with builder API.
- Create `ir_builder.c` — Implementation of builder functions.
- `lower_expr()` — Main expression lowering dispatcher.
- Lower `NODE_INT` — Return immediate value.
- Lower `NODE_VAR_REF` — Generate `حمل` (load) instruction.
- Lower `NODE_BIN_OP` — Generate `جمع`/`طرح`/`ضرب`/`قسم` instructions.
- Lower `NODE_UNARY_OP` — Generate `سالب`/`نفي` instructions.
- Lower `NODE_CALL_EXPR` — Generate `نداء` (call) instruction.
- `lower_stmt()` — Main statement lowering dispatcher.
- Lower `NODE_VAR_DECL` — Generate `حجز` (alloca) + `خزن` (store).
- Lower `NODE_ASSIGN` — Generate `خزن` (store) instruction.
- Lower `NODE_RETURN` — Generate `رجوع` (return) instruction.
- Lower `NODE_PRINT` — Generate `نداء @اطبع()` call.
- Lower `NODE_READ` — Generate `نداء @اقرأ()` call.
- Lower `NODE_IF` — Create condition block + true/false blocks + merge block.
- Lower `NODE_WHILE` — Create header/body/exit blocks with back edge.
- Lower `NODE_FOR` — Create init/header/body/increment/exit blocks.
- Lower `NODE_SWITCH` — Create comparison chain + case blocks.
- Lower `NODE_BREAK` — Generate `قفز` to loop exit.
- Lower `NODE_CONTINUE` — Generate `قفز` to loop header/increment.
- `ir_print_func()` — Print function header and all blocks.
- `ir_print_block()` — Print block label and all instructions.
- `ir_print_inst()` — Print single instruction with Arabic opcodes.
- Arabic numeral output — Print register numbers in Arabic (٠١٢٣٤٥٦٧٨٩).
- `--dump-ir` CLI flag — Add command-line option to print IR.
- Integrate IR into pipeline — AST → IR (skip direct codegen).
- Create `ir_test.baa` — Simple test programs.
- Verify IR output — Check IR text matches specification.
- Update `main.c` — Add IR phase between analysis and codegen.
- Add `--emit-ir` flag — Write IR to `.ir` file.
- Fix global variable resolution — Proper lookup in `lower_expr()` and `lower_assign()`.
- CFG validation — Verify all blocks have terminators.
- Predecessor lists — Build predecessor list for each block.
- Dominator tree — Compute dominance relationships.
- Define `IRPass` interface — Function pointer for optimization passes.
- Detect constant operands — Both operands are immediate values.
- Fold arithmetic — `جمع ص٦٤ ٥، ٣` → `٨`.
- Fold comparisons — `قارن أكبر ص٦٤ ١٠، ٥` → `صواب`.
- Replace instruction — Remove op, use constant result.
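The folding steps listed above can be sketched in C. This is only an illustration under stated assumptions: `FoldOp` and `try_fold` are hypothetical names (the real pass works on `IR_OP_*` opcodes and `IRInst` values), and integer overflow is assumed to wrap in two's complement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcodes for this sketch only. */
typedef enum { FOLD_ADD, FOLD_SUB, FOLD_MUL, FOLD_CMP_GT } FoldOp;

/* Fold `lhs op rhs` when both operands are immediates.
 * Returns true and stores the constant in *out on success. */
static bool try_fold(FoldOp op, int64_t lhs, int64_t rhs, int64_t *out) {
    /* Compute in uint64_t so overflow wraps instead of being undefined.
     * Converting back to int64_t assumes a two's-complement target. */
    uint64_t a = (uint64_t)lhs, b = (uint64_t)rhs;
    switch (op) {
    case FOLD_ADD:    *out = (int64_t)(a + b); return true;
    case FOLD_SUB:    *out = (int64_t)(a - b); return true;
    case FOLD_MUL:    *out = (int64_t)(a * b); return true;
    case FOLD_CMP_GT: *out = (lhs > rhs) ? 1 : 0; return true; /* i1 result */
    }
    return false; /* unknown opcode: leave the instruction untouched */
}
```

When `try_fold` succeeds, the caller would replace the instruction's uses with the constant and delete the instruction, matching the "Replace instruction" step.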
- Mark used values — Walk from terminators backward.
- Identify dead instructions — Result never used.
- Remove dead instructions — Delete from block.
- Remove unreachable blocks — No predecessors (except entry).
- Detect copy instructions — `IR_OP_COPY` (`نسخ`) instruction pattern.
- Replace uses — Substitute original for copy in operands / call args / phi entries.
- Remove redundant copies — Delete `نسخ` instruction after propagation.
- Hash expressions — Create signature for each operation.
- Detect duplicates — Same op + same operands.
- Replace with existing result — Reuse previous computation.
- Pass ordering — Define optimal pass sequence (constfold → copyprop → CSE → DCE).
- Iteration — Run passes until no changes (fixpoint, max 10 iterations).
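The fixpoint rule above (repeat the pass sequence until nothing changes, capped at 10 rounds) could look roughly like this. `PassFn`, `run_to_fixpoint`, and `toy_pass` are hypothetical names for illustration, not the project's `IRPass` interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pass signature: returns true if the pass changed the function. */
typedef bool (*PassFn)(void *func);

/* Run all passes in order, repeating until a full round makes no change
 * or max_iters rounds have run. Returns the number of rounds executed. */
static int run_to_fixpoint(PassFn const *passes, size_t count,
                           void *func, int max_iters) {
    int rounds = 0;
    while (rounds < max_iters) {
        bool changed = false;
        for (size_t i = 0; i < count; i++) {
            if (passes[i](func)) changed = true;
        }
        rounds++;
        if (!changed) break; /* fixpoint reached */
    }
    return rounds;
}

/* Toy pass for demonstration: treats func as a counter of remaining work. */
static bool toy_pass(void *func) {
    int *remaining = func;
    if (*remaining == 0) return false;
    (*remaining)--;
    return true;
}
```

The cap matters because a pair of passes that undo each other's work would otherwise loop forever; with the cap, the optimizer simply stops after the last allowed round.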
- `-O0`, `-O1`, `-O2` flags — Control optimization level.
- `--dump-ir-opt` — Print IR after optimization.
- Define `MachineInst` — Abstract machine instruction.
- IR to Machine mapping — `جمع` → `ADD`, `حمل` → `MOV`, etc.
- Pattern matching — Select optimal instruction sequences.
- Handle immediates — Inline constants where possible.
- Liveness analysis — Compute live ranges for each virtual register.
- Linear scan allocator — Simple, fast allocation algorithm.
- Spilling — Handle register pressure by spilling to stack.
- Map to x64 registers — RAX, RBX, RCX, RDX, R8-R15.
- Emit function prologue — Stack setup, callee-saved registers.
- Emit instructions — Generate AT&T syntax assembly.
- Emit function epilogue — Stack teardown, return.
- Emit data section — Global variables and string literals.
- Replace old codegen — IR → Backend → Assembly.
- Verify output — Compare with old codegen results.
- Performance testing — Ensure no regression.
- Remove legacy codegen — Retire the legacy AST backend from the build.
- Fix ISel logical op size mismatch — `isel_lower_logical()` forced 64-bit operand size; widened 8-bit boolean vregs to prevent assembler errors.
- Fix function parameter ABI copies — `isel_lower_func()` prepends MOV from RCX/RDX/R8/R9 to parameter vregs at entry block.
- Fix IDIV RAX constraint — `isel_lower_div()` explicitly routes dividend through RAX (vreg -2) for correct division results.
- Comprehensive backend test — `tests/integration/backend/backend_test.baa`: 27 functions, 63 assertions, all PASS.
- Bundle MinGW-w64 GCC — Ship GCC toolchain in `gcc/` subfolder inside the installer.
- Auto-detect bundled GCC — `resolve_gcc_path()` in `main.c` finds `gcc.exe` relative to `baa.exe`.
- Update installer (`setup.iss`) — Add `gcc\*` files, dual PATH entries, post-install GCC verification.
- GCC bundle script — `scripts/prepare_gcc_bundle.ps1` downloads and prepares the minimal toolchain.
- Sync version metadata — `baa.rc` and `setup.iss` updated to `0.3.2.4`, publisher to "Omar Aglan".
Strategy (Canonical SSA):
- Mem2Reg (promoting memory to registers) in the standard Cytron/LLVM style:
  - Compute dominators and dominance frontiers.
  - Insert Phi nodes at join points.
  - Perform SSA renaming to build reaching definitions.
- This approach is the long-term foundation for later advanced optimizations such as GVN/CSE, LICM, PRE, and others.
- Identify promotable allocas — Single-block allocas with no escaping (correctness-first baseline; design stays compatible with full Mem2Reg).
- Replace loads/stores — Convert to direct register use.
- Remove dead allocas — Delete promoted `حجز` instructions.
- Compute dominance frontiers — Where Phi nodes are needed.
- Insert Phi placeholders — Add `فاي` at join points.
- Rename variables — SSA renaming pass with reaching definitions.
- Connect Phi operands — Link values from predecessor blocks.
- Verify SSA properties — Each register defined exactly once.
- Check dominance — Definition dominates all uses.
- Validate Phi nodes — One operand per predecessor.
- `--verify-ssa` flag — Debug option to run SSA checks.
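As a rough illustration of the single-definition check listed above, the sketch below scans a flat list of destination registers and reports a violation when any register is defined twice. The flat `defs` array and the name `ssa_defs_unique` are assumptions made for this example; the real verifier walks `IRFunc` blocks and also checks dominance and Phi operands.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* defs[i] is the destination vreg of instruction i, or -1 if it defines none.
 * Returns true when no vreg in [0, reg_count) is defined more than once. */
static bool ssa_defs_unique(const int *defs, size_t inst_count, size_t reg_count) {
    bool ok = true;
    unsigned char *seen = calloc(reg_count ? reg_count : 1, 1);
    if (!seen) return false; /* treat allocation failure as a failed check */
    for (size_t i = 0; i < inst_count && ok; i++) {
        int r = defs[i];
        if (r < 0) continue;                    /* instruction has no result */
        if ((size_t)r >= reg_count || seen[r]) ok = false; /* bad id or redefinition */
        else seen[r] = 1;
    }
    free(seen);
    return ok;
}
```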
- Enable compiler warnings (two-tier) — Default warnings on; optional `-Werror` hardening toggle.
- Fix unsafe string building in driver — Replace `sprintf`/`strcpy`/`strcat` command construction with bounded appends and overflow checks.
- Fix symbol-table name overflow — Guard `Symbol.name[32]` writes; reject/diagnose identifiers that exceed the limit.
- Harden updater version parsing — Support multi-part versions like `0.3.2.5.3`; replace `sprintf` with `snprintf`; check parse results.
- Audit `strncpy` usage — Ensure explicit NUL-termination and bounds safety in lexer and helpers.
- Replace `atoi` with checked parsing — Use `strtoll` + validation for integer literals and array sizes; produce safe diagnostics.
- Warning clean build — Zero warnings under default warning set; `-Werror` build passes.
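Two of the hardening items above (checked integer parsing and bounded appends) can be sketched with standard C library calls only. The helper names `parse_i64_checked` and `append_bounded` are hypothetical, and the parser assumes the literal has already been converted to ASCII digits.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parse a base-10 integer; reject empty input, trailing junk, and overflow. */
static bool parse_i64_checked(const char *text, long long *out) {
    char *end = NULL;
    errno = 0;
    long long value = strtoll(text, &end, 10);
    if (end == text || *end != '\0') return false; /* no digits, or trailing characters */
    if (errno == ERANGE) return false;             /* value does not fit in long long */
    *out = value;
    return true;
}

/* Append src to the NUL-terminated string in dst (total capacity cap bytes).
 * Fails without modifying dst instead of truncating. */
static bool append_bounded(char *dst, size_t cap, const char *src) {
    size_t used = strlen(dst);
    size_t need = strlen(src);
    if (used >= cap || need >= cap - used) return false; /* would overflow */
    memcpy(dst + used, src, need + 1); /* copy including the terminating NUL */
    return true;
}
```

Unlike `atoi`, the `strtoll` path distinguishes "not a number" from "number too large", so the compiler can emit a diagnostic for each case rather than silently using a wrong value.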
✅ COMPLETED (2026-02-11)
- Arena allocator for IR — Fast allocation, bulk deallocation.
- IR cloning — Deep copy of functions/blocks.
- IR destruction — Clean up all IR memory.
- Def-use chains for SSA regs — Build and maintain use lists to make IR passes fast and safe (avoid whole-function rescans).
- Instruction numbering / stable IDs — Deterministic per-function instruction IDs for analyses, debugging, and regression tests.
- IR mutation helpers — Central utilities to insert/remove instructions and update CFG metadata (pred/succ/dominance caches) consistently.
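The "fast allocation, bulk deallocation" idea behind an arena can be shown with a single-chunk bump allocator. This is a minimal sketch, not the project's allocator: the `Arena` type and function names are hypothetical, and a production arena would chain additional chunks instead of returning `NULL` when full.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-chunk bump arena. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} Arena;

static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->base != NULL;
}

/* Bump-allocate size bytes aligned to align (must be a power of two).
 * Returns NULL when the chunk is exhausted. */
static void *arena_alloc(Arena *a, size_t size, size_t align) {
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || size > a->cap - start) return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Bulk deallocation: every object in the arena is released at once. */
static void arena_destroy(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

Because individual objects are never freed, IR destruction reduces to one `arena_destroy` call per function or module.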
✅ COMPLETED (2026-02-11)
- Source location tracking — Map IR instructions to source lines.
- Variable name preservation — Keep original names for debugging.
- `--debug-info` flag — Emit debug metadata in assembly.
✅ COMPLETED (2026-02-11)
- Text IR writer — Output canonical IR text format.
- Text IR reader — Parse IR text back to data structures.
- Round-trip testing — Write → Read → Compare.
✅ COMPLETED (2026-02-12)
- Fix liveness across loop back-edges — Linear scan liveness analysis does not correctly propagate live ranges across loop back-edges when many variables are simultaneously live, causing register clobbering and segfaults.
- Extend live intervals to loop ends — Ensure variables used inside loops have their intervals extended to cover the entire loop body including back-edges.
- Add block-level scoping in semantic analyzer — Currently function-level only; for-loop variables cannot be redeclared in the same function, requiring unique names.
- Stress test with high register pressure — Validate fix with functions containing 8+ live variables across multiple nested loops.
✅ COMPLETED (2026-02-13)
- IR well-formedness verifier (`--verify-ir`) — Validate operand counts, type consistency, terminator rules, phi placement, and call signatures (separate from `--verify-ssa`).
- Verifier gate in optimizer (debug) — Optional mode to run `--verify-ir`/`--verify-ssa` after each pass iteration to catch pass bugs early.
- Canonicalization pass — Normalize commutative operands, constant placement, and comparison canonical forms to make CSE/DCE/constfold more effective.
- CFG simplification pass — Merge trivial blocks, remove redundant branches, and provide a reusable critical-edge splitting utility for IR passes.
- Define IR arithmetic semantics — Document and enforce overflow behavior (recommended: two's-complement wrap), and clarify `i1` truthiness and `div`/`mod` rules for negatives.
- Data layout helpers — Add size/alignment queries per `IRType` (incl. pointer size) as the foundation for future `Target` abstraction and correct aggregate lowering.
- Memory model contract — Specify and verify rules for `حجز`/`حمل`/`خزن` (typed pointers, aliasing assumptions, and what is/isn't legal for optimization).
✅ COMPLETED (2026-02-14)
- Fix SSA verification failure — Resolved dominance issue in CSE pass for `switch` statements with `default` cases.
✅ COMPLETED (2026-02-16)
- Loop detection — Identify natural loops via back edges.
- Loop invariant code motion — Hoist constant computations.
- Strength reduction — Replace expensive ops (mul → shift).
- Loop unrolling — Optional with `-funroll-loops`.
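The "mul → shift" strength reduction mentioned above depends on recognizing a power-of-two multiplier. A small sketch of that test follows; `mul_to_shift` is a hypothetical helper name and the code is independent of the actual pass.

```c
#include <stdbool.h>
#include <stdint.h>

/* If k is a power of two (k == 1 << s), store s in *shift and return true,
 * so that `x * k` can be rewritten as `x << s`. */
static bool mul_to_shift(uint64_t k, unsigned *shift) {
    if (k == 0 || (k & (k - 1)) != 0) return false; /* not a power of two */
    unsigned s = 0;
    while ((k >> s) != 1) s++; /* find the index of the single set bit */
    *shift = s;
    return true;
}
```

For example, a multiplier of 8 yields a shift of 3, while 12 is rejected and the multiplication is left as is.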
✅ COMPLETED (2026-02-16)
- Inline heuristics — Small functions, single call site.
- Inline expansion — Copy function body to call site.
- Post-inline cleanup — Re-run optimization passes.
✅ COMPLETED (2026-02-17)
- Detect tail calls — `call` immediately followed by `ret` (with no instructions in between).
- Convert to jump — Replace call+ret with a tail jump (no new return address).
- Stack reuse — Reuse caller's stack frame + caller shadow space (Windows x64 ABI).
- Limitation (for v0.3.2.7.3) — initial implementation supports only <= 4 arguments (register args); stack-args tail calls are scheduled in v0.3.2.8.5.
✅ COMPLETED (2026-02-17)
- Define `Target` interface — OS/object-format, data layout, calling convention, asm directives.
- x86-64 Windows target — Keep current behavior as the first concrete target.
- Host default target — Windows host defaults to `x86_64-windows`, Linux host defaults to `x86_64-linux`.
- Target selection — `--target=x86_64-windows|x86_64-linux` flag.
✅ COMPLETED (2026-02-17)
- Define `CallingConv` struct — arg regs order/count, return regs, caller/callee-saved masks.
- Stack rules — stack alignment + varargs rules are modeled; stack-arg placement is deferred (backend rejects stack args for now).
- Windows x64 ABI — RCX/RDX/R8/R9 + 32-byte shadow/home space.
- SystemV AMD64 ABI — RDI/RSI/RDX/RCX/R8/R9, no shadow space, sets `AL=0` for calls (conservative).
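A calling-convention descriptor along the lines described above might be modeled as plain data. The struct below is a simplified sketch with hypothetical field names (`CallingConvSketch` is not the project's `CallingConv`); it only captures the integer argument registers and shadow space stated in the list.

```c
#include <stddef.h>

/* Simplified, hypothetical calling-convention descriptor. */
typedef struct {
    const char *name;
    const char *int_arg_regs[6]; /* integer argument registers, in order */
    size_t int_arg_count;
    size_t shadow_space_bytes;   /* caller-reserved home space */
} CallingConvSketch;

static const CallingConvSketch WIN64_CC = {
    "windows-x64", { "rcx", "rdx", "r8", "r9" }, 4, 32
};

static const CallingConvSketch SYSV_CC = {
    "sysv-amd64", { "rdi", "rsi", "rdx", "rcx", "r8", "r9" }, 6, 0
};

/* Register holding the index-th integer argument, or NULL if it goes on the stack. */
static const char *int_arg_reg(const CallingConvSketch *cc, size_t index) {
    return index < cc->int_arg_count ? cc->int_arg_regs[index] : NULL;
}
```

Keeping the conventions as data lets instruction selection ask the same question (`int_arg_reg(cc, i)`) for both targets instead of branching on the OS.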
✅ COMPLETED (2026-02-17)
- Small code model — Default (only supported model in v0.3.2.8.3).
- PIC/PIE flags — `-fPIC`/`-fPIE` (Linux/ELF; initial support).
- Stack protection — stack canaries on ELF via `-fstack-protector*`.
✅ COMPLETED (2026-02-17)
- Native Linux build of compiler — build `baa` on Linux with GCC/Clang + CMake.
- SystemV AMD64 ABI implementation — different calling convention from Windows.
- ELF output support — `.rodata`/`.data`/`.text` directives compatible with ELF GAS.
- Link with host gcc (for now) — produce ELF executables via host toolchain; later reduce/remove GCC dependency.
- Cross-compilation (later) — optional `--target=x86_64-linux` from Windows once a cross toolchain story exists.
✅ COMPLETED (2026-02-17)
- Proper stack-arg calling — outgoing call frames support stack args on Windows x64 and SysV AMD64.
- Tail calls beyond reg-args — enable TCO with stack args conservatively (requires enough incoming stack-arg area).
- Varargs home space (Windows) — home/register args are written into shadow space before calls.
✅ COMPLETED (2026-02-17)
Goal: move toward compiler-grade IR optimizations in pragmatic steps.
- InstCombine — local simplification patterns (canonical `COPY` rewrites).
- SCCP — sparse conditional constant propagation + `br_cond` folding.
- GVN — dominator-scoped global value numbering for pure expressions.
- Mem2Reg promotability unlock — must-def initialization analysis (instead of "init store must be in alloca block").
- Pipeline wiring — run InstCombine+SCCP early; GVN at `-O2` before CSE.
- Well-formedness checks — All functions have entry blocks.
- Type consistency — Operand types match instruction requirements.
- CFG integrity — All branches point to valid blocks.
- SSA verification — Run `--verify-ssa` on all test programs.
- `baa --verify` mode — Run all verification passes.
- Compile-time benchmark — Compare compiler-only (`-S`) and end-to-end compile wall time.
- Runtime benchmark — Run deterministic `bench/runtime_*.baa` programs.
- Memory usage profiling — Track peak RSS on Linux via `/usr/bin/time -v` and IR arena stats via `--time-phases`.
- Benchmark suite — Collection of representative programs (`bench/*.baa`).
- Output comparison — Compare IR-based output against a reference corpus.
- Test all v0.2.x programs — Ensure backward compatibility.
- Edge case testing — Complex control flow, nested loops, recursion.
- Error case testing — Verify error messages unchanged.
- Update INTERNALS.md — Document new IR pipeline.
- IR Developer Guide — How to add new IR instructions.
- Driver split — Extract CLI parsing, toolchain execution, and per-file compile pipeline into dedicated `src/driver_*.c`/`.h` modules.
- Driver link safety — Remove fixed-size `argv_link` construction; build link argv dynamically based on object count.
- Shared file reader — Provide `read_file()` as a shared module (`src/read_file.c`) for the lexer include system.
- Line ending normalization — Add `.gitattributes` to keep the repo LF-normalized and reduce diff churn.
- Remove deprecated code — Remove legacy AST-based codegen paths and backend-compare mode.
- Code review checklist — Ensure code quality standards.
Goal: Add essential features to make Baa practical for real-world programs and ready for self-hosting.
Goal: Enable direct initialization of arrays with values.
- Array Literal Syntax – Initialize arrays with comma-separated values using `{}` (supports partial init + zero-fill like C).
Syntax:
صحيح قائمة[٥] = {١، ٢، ٣، ٤، ٥}.
// With Arabic comma (،) or regular comma (,)
صحيح أرقام[٣] = {١٠، ٢٠، ٣٠}.
- Parser: Handle `{}` initializer list after array declaration.
- Parser: Support both Arabic comma `،` (U+060C) and regular comma `,` as separators.
- Semantic Analysis: Allow partial init (`count <= size`) and zero-fill the remainder; reject overflow.
- Codegen: Emit `.data` initializers for globals and runtime stores for locals (including zero-fill).
- Multi-dimensional arrays: `صحيح مصفوفة[٣][٤].` ✅ COMPLETED (2026-02-25)
- Array length operator: `صحيح طول = حجم(قائمة) / حجم(صحيح).` ✅ COMPLETED (2026-02-25)
Goal: Add compound types for better code organization and type safety.
- Enum Declaration – Named integer constants with type safety.
- Struct Declaration – Group related data into composite types (supports nested structs + enum fields).
- Member Access – Use `:` (colon) operator for accessing members.
Goal: Memory-efficient variant types for parsers and data structures.
- Union Declaration:

      اتحاد قيمة {
          صحيح رقم.
          نص نص_قيمة.
          منطقي منطق.
      }

- Union Usage:

      اتحاد قيمة ق.
      ق:رقم = ٤٢.  // All members share same memory
      ق:نص_قيمة = "مرحبا".  // Overwrites previous value

- Tagged Union Pattern (manual):

      تعداد نوع_قيمة { رقم، نص_ق }
      هيكل قيمة_موسومة {
          تعداد نوع_قيمة نوع.
          اتحاد قيمة بيانات.
      }
- Token: Add `TOKEN_UNION` for `اتحاد` keyword.
- Parser: Parse union declaration similar to struct.
- Semantic: All members start at offset 0.
- Memory Layout: Size = max member size, align = max member align.
- Codegen: Generate union storage + member access code.
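The union layout rule above (all members at offset 0, size and alignment taken from the largest member) can be sketched as follows. `Layout` and `union_layout` are hypothetical names; rounding the final size up to the alignment is an assumption borrowed from C, not something the roadmap states.

```c
#include <stddef.h>

/* Hypothetical size/alignment pair for a type. */
typedef struct { size_t size; size_t align; } Layout;

static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) / align * align;
}

/* Every member starts at offset 0, so the union needs the largest member
 * size and the strictest member alignment. */
static Layout union_layout(const Layout *members, size_t count) {
    Layout u = { 0, 1 };
    for (size_t i = 0; i < count; i++) {
        if (members[i].size > u.size) u.size = members[i].size;
        if (members[i].align > u.align) u.align = members[i].align;
    }
    u.size = round_up(u.size, u.align); /* assumption: pad like C does */
    return u;
}
```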
Complete Example:
// ١. تعريف التعداد (Enumeration)
تعداد لون {
أحمر،
أزرق،
أسود،
أبيض
}
// ٢. تعريف الهيكل (Structure)
هيكل سيارة {
نص موديل.
صحيح سنة_الصنع.
تعداد لون لون_السيارة.
}
صحيح الرئيسية() {
هيكل سيارة س.
س:موديل = "تويوتا كورولا".
س:سنة_الصنع = ٢٠٢٤.
س:لون_السيارة = لون:أحمر.
اطبع س:موديل.
اطبع س:سنة_الصنع.
إذا (س:لون_السيارة == لون:أحمر) {
اطبع "تحذير: السيارات الحمراء سريعة!".
}
إرجع ٠.
}
Enumerations:
- Token: Add `TOKEN_ENUM` for `تعداد` keyword.
- Parser: Parse enum declaration: `تعداد <name> { <members> }`.
- Parser: Support Arabic comma `،` between enum members.
- Semantic: Auto-assign integer values (0, 1, 2...).
- Semantic: Enum values accessible via `<enum_name>:<value_name>`.
- Type System: Add `TYPE_ENUM` to `DataType`.
Structures:
- Token: Add `TOKEN_STRUCT` for `هيكل` keyword.
- Token: Add `TOKEN_COLON` for `:` (already exists, verify usage).
- Parser: Parse struct declaration: `هيكل <name> { <fields> }`.
- Parser: Parse struct instantiation: `هيكل <name> <var>.`
- Parser: Parse member access: `<var>:<member>`.
- Semantic: Track struct definitions in symbol table.
- Semantic: Validate member access against struct definition.
- Memory Layout: Calculate field offsets with padding/alignment (supports nested structs).
- Codegen: Emit struct storage + member access code.
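Field offsets with padding, as listed under Memory Layout above, follow the usual C-style rule: align each field, then pad the total size to the struct's alignment. The sketch below assumes that rule; `Layout` and `struct_layout` are hypothetical names, and a nested struct would simply appear as a field with its own computed size and alignment.

```c
#include <stddef.h>

/* Hypothetical size/alignment pair for a field type. */
typedef struct { size_t size; size_t align; } Layout;

static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) / align * align;
}

/* Compute each field's offset (written to offsets[i]) and the struct's
 * overall size and alignment. */
static Layout struct_layout(const Layout *fields, size_t count, size_t *offsets) {
    Layout s = { 0, 1 };
    for (size_t i = 0; i < count; i++) {
        s.size = round_up(s.size, fields[i].align); /* padding before the field */
        offsets[i] = s.size;
        s.size += fields[i].size;
        if (fields[i].align > s.align) s.align = fields[i].align;
    }
    s.size = round_up(s.size, s.align); /* tail padding */
    return s;
}
```

For fields of (size, align) = (1, 1), (8, 8), (4, 4), this yields offsets 0, 8, 16 and a total size of 24 with alignment 8.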
Goal: Add a proper character type with UTF-8 source support.
Note: The `عشري` type was introduced in v0.3.5 as literals/storage and completed in v0.3.5.5 (operations + ABI).
- Character Type (`حرف`) – UTF-8 character value (variable-length, 1..4 bytes) stored as a packed scalar.
- String-Char Relationship – Strings (`نص`) are represented as arrays of `حرف` (`حرف[]`) with indexing (`اسم[٠]`).
- Float Type (`عشري`) – Deferred from v0.3.4.5 (introduced as a basic type + literals in v0.3.5; completed in v0.3.5.5 with ops/ABI).
Syntax:
حرف ح = 'أ'.
نص اسم = "أحمد". // Equivalent to: حرف اسم[] = {'أ', 'ح', 'م', 'د', '\0'}.
- Token: Already have `TOKEN_CHAR` for literals.
- Token: Add `TOKEN_KEYWORD_CHAR` for `حرف` type keyword.
- Type System: Add `TYPE_CHAR` to `DataType` enum.
- Semantic: Distinguish between `char` and `int`.
- Codegen: Store `حرف` as packed `i64` (bytes + length) and support UTF-8 printing.
- String Representation: Update internal string handling to use `حرف[]`.
- String operations: `طول_نص()`, `دمج_نص()`, `قارن_نص()`
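Since a `حرف` value holds a 1..4 byte UTF-8 sequence, any code that reads one needs the sequence length, which is determined by the lead byte. The helper below shows that standard UTF-8 rule only; `utf8_seq_len` is a hypothetical name, it does not reject overlong encodings, and it says nothing about how Baa actually packs the bytes into an `i64`.

```c
#include <stddef.h>

/* Number of bytes in a UTF-8 sequence, judged from its lead byte.
 * Returns 0 for bytes that cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx: e.g. Arabic letters */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* continuation byte or invalid */
}
```

For instance, the letter `أ` (U+0623) is encoded as the bytes `0xD8 0xA3`, so its lead byte reports a length of 2.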
Goal: Make numeric types practical for systems programming: sized integers + usable عشري (f64).
- Sized Integer Types:

      ص٨ بايت_موقع = -١٢٨.  // int8_t: -128 to 127
      ص١٦ قصير = -٣٢٠٠٠.  // int16_t: -32768 to 32767
      ص٣٢ عادي = -٢٠٠٠٠٠٠٠٠٠.  // int32_t
      ص٦٤ طويل = ٩٠٠٠٠٠٠٠٠٠٠٠٠.  // int64_t (current صحيح)
      ط٨ بايت = ٢٥٥.  // uint8_t
      ط١٦ قصير٢ = ٦٥٠٠٠.  // uint16_t
      ط٣٢ عادي٢ = ٤٠٠٠٠٠٠٠٠٠.  // uint32_t
      ط٦٤ طويل٢ = ١٨٠٠٠٠٠٠٠٠٠٠٠٠٠٠٠٠٠٠.  // uint64_t

- C-like semantics:
  - integer promotions + usual arithmetic conversions
  - correct signed/unsigned comparisons
  - correct `div`/`mod` semantics for signed vs unsigned
- `عشري` usability:
  - arithmetic `+ - * /`
  - comparisons `== != < > <= >=`
  - `اطبع` supports `عشري`
  - ABI lowering on SysV AMD64 + Windows x64 (XMM regs + SysV varargs rules)
- Lexer: Tokenize `ص٨`, `ص١٦`, `ص٣٢`, `ص٦٤`, `ط٨`, `ط١٦`, `ط٣٢`, `ط٦٤`.
- Type System: Add size and signedness to integer types.
- IR: Add unsigned variants and allow `f64` ops + verifier updates.
- Optimizer: Ensure integer/float correctness (avoid invalid float folds; unsigned-aware folds).
- ISel/Emitter: Generate correct-sized integer ops and scalar SSE2 for `f64`.
- ABI (Windows + SystemV): Pass/return `عشري` in XMM registers; handle SysV varargs rules.
- Semantic: Warn on implicit narrowing conversions.
- Semantic: Handle signed/unsigned comparison warnings.
Goal: Add bitwise operations and low-level features needed for systems programming.
- Bitwise Operators:

      صحيح أ = ٥ & ٣.  // AND: 5 & 3 = 1
      صحيح ب = ٥ | ٣.  // OR: 5 | 3 = 7
      صحيح ج = ٥ ^ ٣.  // XOR: 5 ^ 3 = 6
      صحيح د = ~٥.  // NOT: ~5 = -6
      صحيح هـ = ١ << ٤.  // Left shift: 1 << 4 = 16
      صحيح و = ١٦ >> ٢.  // Right shift: 16 >> 2 = 4

- Sizeof Operator:

      صحيح حجم_صحيح = حجم(صحيح).  // Returns 8
      صحيح حجم_حرف = حجم(حرف).  // Returns 1
      صحيح حجم_مصفوفة = حجم(قائمة).  // Returns array size in bytes

- Void Type:

      عدم اطبع_رسالة() {
          اطبع "مرحباً".
          // No return needed
      }

- Escape Sequences:

      نص سطر = "سطر١\س سطر٢".  // Newline (\س)
      نص جدول = "عمود١\ت عمود٢".  // Tab (\ت)
      نص مسار = "C:\\ملفات".  // Backslash
      حرف صفر = '\٠'.  // Null character
- Lexer: Tokenize `&`, `|`, `^`, `~`, `<<`, `>>`.
- Parser: Add bitwise operators with correct precedence.
- Parser: Parse `حجم(type)` and `حجم(expr)` expressions.
- Lexer: Add `عدم` keyword for void type.
- Lexer: Handle escape sequences in string/char literals.
- Semantic: Type check bitwise operations (integers only).
- Codegen: Generate bitwise assembly instructions.
- Codegen: Calculate sizes for `حجم` operator.
Goal: Create custom type names for readability and abstraction.
- Simple Type Alias:

      نوع معرف = ط٦٤.
      نوع نتيجة = ص٣٢.
      معرف رقم_المستخدم = ١٢٣٤٥.
      نتيجة كود_خطأ = -١.

- Pointer Type Alias (deferred; requires pointer type grammar from v0.3.10):

      نوع نص_ثابت = ثابت حرف*.
      نوع مؤشر_بايت = ط٨*.
- Token/AST plumbing: Added `TOKEN_TYPE_ALIAS`/`NODE_TYPE_ALIAS` support in shared structures.
- Parser: Parse `نوع <name> = <type>.` at top-level with C-like declare-before-use semantics.
- Semantic: Resolve aliases during type checking and validate alias targets (`enum`/`struct`/`union`).
- Symbol Table: Store type aliases separately and enforce strict name-collision diagnostics.
Goal: Refine and enhance existing compiler systems.
- Error Messages – Improve clarity and helpfulness of diagnostic messages.
- Code Quality – Refactor complex functions, improve code organization.
- Memory Management – Fix memory leaks, improve buffer handling.
- Performance – Profile and optimize slow compilation paths.
- Documentation – Update all docs to reflect v0.3.3-0.3.7 changes.
- Edge Cases – Fix known bugs and handle corner cases.
- Improve panic mode recovery in parser.
- Better handling of UTF-8 edge cases in lexer.
- Optimize symbol table lookups (consider hash table).
- Add more comprehensive error recovery.
- Improve assembly output readability (comments in assembly).
Goal: Variables that persist between function calls.
- Static Local Syntax:

      صحيح عداد() {
          ساكن صحيح ع = ٠.  // Initialized once, persists
          ع = ع + ١.
          إرجع ع.
      }
      // First call returns 1, second returns 2, etc.
- Token: Add `TOKEN_STATIC` for `ساكن` keyword.
- Semantic: Static locals go in .data section, not stack.
- Codegen: Generate unique global label for static locals.
- Codegen: Initialize in .data section.
Goal: Establish robust testing infrastructure and fix accumulated issues.
- Test Framework – Create automated test runner.
  - Script to compile and run `.baa` test files.
  - Compare actual output vs expected output.
  - Report pass/fail with clear diagnostics.
- Test Categories:
  - Lexer Tests – Token generation, UTF-8 handling, preprocessor.
  - Parser Tests – Syntax validation, error recovery.
  - Semantic Tests – Type checking, scope validation.
  - Codegen Tests – Correct assembly output, execution results.
  - Integration Tests – Full programs with expected output.
- Test Coverage:
  - All language features (v0.0.1 - v0.3.7).
  - Edge cases and corner cases.
  - Error conditions (syntax errors, type mismatches, etc.).
  - Multi-file compilation scenarios.
  - Preprocessor directive combinations.
- GitHub Actions workflow:

      name: Baa CI
      on: [push, pull_request]
      jobs:
        build-and-test:
          runs-on: windows-latest
          steps:
            - uses: actions/checkout@v3
            - name: Build Baa
              run: gcc src/*.c -o baa.exe
            - name: Run Tests
              run: ./run_tests.bat
- Known Issues – Fix all open bugs from previous versions.
- Regression Testing – Ensure new features don't break old code.
- Stress Testing – Test with large files, deep nesting, many symbols.
- Arabic Text Edge Cases – Test various Arabic Unicode scenarios.
Goal: Complete array and string functionality.
- Multi-dimensional Arrays ✅ COMPLETED (2026-02-25):

      صحيح مصفوفة[٣][٤].
      مصفوفة[٠][٠] = ١٠.
      مصفوفة[١][٢] = ٢٠.

- Array Length Operator ✅ COMPLETED (2026-02-25):

      صحيح قائمة[١٠].
      صحيح الطول = حجم(قائمة) / حجم(صحيح).  // Returns 10

- Array Bounds Checking (Optional debug mode) ✅ COMPLETED (2026-02-25):
  - Runtime checks in debug-oriented lowering mode (`--debug-info` path).
  - Abort path on out-of-bounds access.
- String Length ✅ COMPLETED (2026-02-25): `صحيح الطول = طول_نص(اسم).`
- String Concatenation ✅ COMPLETED (2026-02-25): `نص كامل = دمج_نص(اسم, " علي").`
- String Comparison ✅ COMPLETED (2026-02-25): `صحيح نتيجة = قارن_نص(اسم, "محمد").`
- String Indexing (read-only) ✅ COMPLETED (2026-02-25): `حرف أول = اسم[٠].`
- String Copy ✅ COMPLETED (2026-02-25): `نص نسخة = نسخ_نص(اسم).`
- Parser: Parse multi-dimensional array declarations and access.
- Semantic: Track array dimensions in symbol table.
- Codegen: Calculate offsets for multi-dimensional arrays (row-major order).
- Standard Library: Create `baalib.baa` with string functions.
- UTF-8 Aware: Ensure functions handle multi-byte Arabic characters correctly.
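The row-major offset calculation mentioned in the Codegen item above reduces to one formula for a two-dimensional array. The function name `row_major_offset` is hypothetical, and the 8-byte element size in the example assumes `صحيح` is a 64-bit integer, as stated elsewhere in this roadmap.

```c
#include <stddef.h>

/* Byte offset of element [row][col] in an array with `cols` columns,
 * stored row by row (row-major order). */
static size_t row_major_offset(size_t row, size_t col,
                               size_t cols, size_t elem_size) {
    return (row * cols + col) * elem_size;
}
```

For `صحيح مصفوفة[٣][٤]` with 8-byte elements, `مصفوفة[١][٢]` sits at `(1 * 4 + 2) * 8 = 48` bytes from the start of the array.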
Goal: Add pointer types for manual memory management and data structures.
- Pointer Type Declaration ✅ COMPLETED (2026-02-25):

      صحيح* مؤشر.  // Pointer to integer
      حرف* نص_مؤشر.  // Pointer to character (C-string)
      هيكل سيارة* س_مؤشر.  // Pointer to struct

- Address-of Operator (`&`) ✅ COMPLETED (2026-02-25):

      صحيح س = ١٠.
      صحيح* م = &س.  // م points to س

- Dereference Operator (`*`) ✅ COMPLETED (2026-02-25):

      صحيح قيمة = *م.  // قيمة = 10
      *م = ٢٠.  // س now equals 20

- Null Pointer ✅ COMPLETED (2026-02-25):

      صحيح* م = عدم.  // Null pointer
      إذا (م == عدم) {
          اطبع "مؤشر فارغ".
      }

- Pointer Arithmetic ✅ COMPLETED (2026-02-25):

      صحيح قائمة[٥] = {١، ٢، ٣، ٤، ٥}.
      صحيح* م = &قائمة[٠].
      م = م + ١.  // Points to قائمة[١]
      اطبع *م.  // Prints 2
- Lexer/Parser integration: Handle `*` in type, dereference, and multiplication contexts, and `&` as bitwise AND / address-of.
- Parser: Parse pointer declarations, `عدم` as the null pointer, and the `*ptr = value.` statement.
- Type System: Add `TYPE_POINTER` with base type tracking in AST/symbol metadata.
- Semantic: Validate pointer operations (dereference/address-of/null/pointer compare/arithmetic).
- Codegen: Lower address-of/dereference/pointer assignment and pointer arithmetic safely.
Goal: Explicit type conversions for low-level programming.
- Cast Syntax:

      صحيح س = ٦٥.
      حرف ح = كـ<حرف>(س).

- Numeric Casts:

      ص٣٢ صغير = كـ<ص٣٢>(قيمة_كبيرة).  // Truncation
      ص٦٤ كبير = كـ<ص٦٤>(قيمة_صغيرة).  // Sign extension
      ط٦٤ بدون = كـ<ط٦٤>(موقع).  // Signed to unsigned

- Pointer Casts:

      ط٨* بايتات = كـ<ط٨*>(مؤشر_هيكل).  // Reinterpret
      عدم* عام = كـ<عدم*>(أي_مؤشر).  // To void pointer
      هيكل س* محدد = كـ<هيكل س*>(عام).  // From void pointer

- Pointer Difference (`pointer - pointer`):

      صحيح* أ = &قائمة[٠].
      صحيح* ب = أ + ٣.
      صحيح فرق = ب - أ.  // = 3 (difference in elements)
- Lexer: Tokenize `كـ` keyword and `<>` for type parameter.
- Parser: Parse `كـ<type>(expr)` form.
- Semantic: Validate cast safety, warn on dangerous casts.
- Codegen: Generate appropriate conversion instructions.
Goal: First-class function references for callbacks and dispatch tables.
- Function Pointer Type:

      // Pointer to function taking two صحيح, returning صحيح
      نوع دالة_ثنائية = دالة(صحيح، صحيح) -> صحيح.
      // Or inline
      دالة(صحيح، صحيح) -> صحيح مؤشر_دالة.

- Assign Function to Pointer:

      صحيح جمع(صحيح أ، صحيح ب) { إرجع أ + ب. }
      صحيح ضرب(صحيح أ، صحيح ب) { إرجع أ * ب. }
      دالة_ثنائية عملية = جمع.  // Points to جمع
      عملية = ضرب.  // Now points to ضرب

- Call Through Pointer:

      صحيح نتيجة = عملية(١٠، ٢٠).  // Calls ضرب(10, 20) = 200

- Function Pointer as Parameter:

      صحيح طبق(صحيح[] قائمة، صحيح حجم، دالة_ثنائية د) {
          صحيح نتيجة = قائمة[٠].
          لكل (صحيح ع = ١؛ ع < حجم؛ ع++) {
              نتيجة = د(نتيجة، قائمة[ع]).
          }
          إرجع نتيجة.
      }
      // Usage
      صحيح مجموع = طبق(أرقام، ١٠، جمع).

- Null Function Pointer:

      دالة_ثنائية فارغ = عدم.
      إذا (فارغ != عدم) {
          فارغ(١، ٢).
      }
- Parser: Parse function type syntax `دالة(...) -> نوع`.
- Type System: Add function pointer type with signature plumbing.
- Semantic: Type-check function pointer assignments.
- Semantic: Validate call through pointer matches signature.
- Codegen: Generate indirect call instructions.
- IR: Add function pointer type to IR.
✅ COMPLETED (2026-02-28)
Goal: Enable heap allocation for dynamic data structures.
- Memory Allocation:

      // Allocate memory for 10 integers
      صحيح* قائمة = حجز_ذاكرة(١٠ * حجم(صحيح)).
      // Allocate memory for a struct
      هيكل سيارة* س = حجز_ذاكرة(حجم(هيكل سيارة)).

- Memory Deallocation:

      تحرير_ذاكرة(قائمة).
      تحرير_ذاكرة(س).

- Memory Reallocation:

      // Resize array to 20 integers
      قائمة = إعادة_حجز(قائمة, ٢٠ * حجم(صحيح)).

- Memory Operations:

      // Copy memory
      نسخ_ذاكرة(وجهة, مصدر, حجم).
      // Set memory to value
      تعيين_ذاكرة(مؤشر, ٠, حجم).
- Runtime: Bind directly to libc (`malloc`/`free`/`realloc`/`memcpy`/`memset`).
- Built-in Functions: Add `حجز_ذاكرة`, `تحرير_ذاكرة`, `إعادة_حجز`, `نسخ_ذاكرة`, `تعيين_ذاكرة`.
- Semantic: Support `عدم*` (`void*`) as a generic pointer (implicit conversions with object pointers).
- Codegen: Lower the built-ins to standard C calls, with shadowing rules.
✅ COMPLETED (2026-03-01)
Goal: Enable reading and writing files for compiler self-hosting.
- File Opening:

      عدم* ملف = فتح_ملف("بيانات.txt", "قراءة").
      عدم* ملف_كتابة = فتح_ملف("ناتج.txt", "كتابة").
      عدم* ملف_إضافة = فتح_ملف("سجل.txt", "إضافة").

- File Reading:

      حرف حرف_واحد = اقرأ_حرف(ملف).
      نص سطر = اقرأ_سطر(ملف).
      صحيح بايتات = اقرأ_ملف(ملف, مخزن, حجم).

- File Writing:

      اكتب_حرف(ملف, 'أ').
      اكتب_سطر(ملف, "مرحباً").
      اكتب_ملف(ملف, بيانات, حجم).

- File Closing:

      اغلق_ملف(ملف).

- File Status:

      منطقي انتهى = نهاية_ملف(ملف).
      صحيح موقع = موقع_ملف(ملف).
      اذهب_لموقع(ملف, ٠).
- Runtime: Wrap C stdio functions (fopen, fread, fwrite, fclose, fgetc, fputc, fputs, feof, ftello/fseeko).
- Built-in Functions: Add file operation functions using `عدم*` as an opaque `FILE*` handle.
- Error Handling: Return error codes for failed operations (`فتح_ملف` returns `عدم`; `اكتب_سطر`/`اكتب_حرف` return -1 on failure).
- Codegen: Generate direct libc stdio calls with shadowing rules.
✅ COMPLETED (2026-03-02)
Goal: Access program arguments - essential for compiler self-hosting.
- Main with Arguments:
    صحيح الرئيسية(صحيح عدد، نص[] معاملات) {
        // عدد = argument count (like argc)
        // معاملات = argument array (like argv)
        إذا (عدد < ٢) {
            اطبع "الاستخدام: برنامج <ملف>".
            إرجع ١.
        }
        نص اسم_البرنامج = معاملات[٠].
        نص ملف_إدخال = معاملات[١].
        اطبع "تجميع: ".
        اطبع ملف_إدخال.
        إرجع ٠.
    }
- Parser: Allow parameters in الرئيسية function.
- Semantic: Validate main signature matches expected pattern.
- Codegen: Link with proper C runtime entry point.
- Codegen (Opt-in): --startup=custom — custom entrypoint symbol (__baa_start) while keeping CRT/libc init.
- Codegen (Full Independence): True _start without CRT/libc (deferred to Phase 8).
✅ COMPLETED (2026-03-02)
Goal: Make Baa production-ready with a comprehensive standard library.
Goal: Professional I/O capabilities.
- Formatted Output:
    اطبع_منسق("الاسم: %ن، العمر: %ص\س", اسم, عمر).
- String Formatting:
    نص رسالة = نسق("النتيجة: %ص", قيمة).
    حرر_نص(رسالة).
- Formatted Input:
    نص سطر = اقرأ_سطر().
    صحيح رقم = اقرأ_رقم().
    صحيح أ = ٠.
    عشري ب = ٠.
    نص س = عدم.
    // Note: on input, %ن requires a numeric width (e.g. %10ن).
    صحيح مقروء = اقرأ_منسق("%ص %ع %10ن", &أ, &ب, &س).
    إذا (مقروء == 3) { حرر_نص(س). }
✅ COMPLETED (2026-03-02)
Goal: Functions accepting variable number of arguments.
- Variadic Declaration:
    عدم اطبع_منسق(نص تنسيق، ...) {
        // Implementation using variadic access
    }
- Variadic Access Macros/Functions:
    عدم اطبع_أرقام(صحيح عدد، ...) {
        قائمة_معاملات معاملات.
        بدء_معاملات(معاملات، عدد).
        لكل (صحيح ع = ٠؛ ع < عدد؛ ع++) {
            صحيح قيمة = معامل_تالي(معاملات، صحيح).
            اطبع قيمة.
        }
        نهاية_معاملات(معاملات).
    }
    // Usage
    اطبع_أرقام(٣، ١٠، ٢٠، ٣٠).
- Lexer: Tokenize ... (ellipsis).
- Parser: Parse variadic function declarations (including دالة(...)->... signatures).
- Type System: Handle variadic function types and variadic call arity/type checks.
- Codegen (Windows x64): Variadic calls are lowered through a unified internal intermediary (__baa_va_base) that is compatible with the current backend path.
- Codegen (Linux x64): Uses the same internal argument intermediary to ensure uniform behavior on x86_64-linux.
- Built-ins: Implement بدء_معاملات, معامل_تالي, نهاية_معاملات.
✅ COMPLETED (2026-03-02)
Goal: Embed assembly code for low-level operations.
- Basic Inline Assembly:
    مجمع { "nop" }
- With Outputs and Inputs:
    صحيح قراءة_عداد() {
        ط٣٢ منخفض.
        ط٣٢ مرتفع.
        مجمع {
            "rdtsc"
            : "=a" (منخفض)، "=d" (مرتفع)
        }
        إرجع (كـ<ص٦٤>(مرتفع) << ٣٢) | كـ<ص٦٤>(منخفض).
    }
- Token: Add TOKEN_ASM for مجمع keyword.
- Parser: Parse inline assembly blocks.
- Codegen: Emit assembly directly with proper constraints.
- Semantic: Validate constraint syntax.
✅ COMPLETED (2026-03-02)
- Full constraint support (memory, register classes)
- Clobber lists
- Math Module — جذر_تربيعي(), أس(), مطلق(), عشوائي().
- String Module — Complete string manipulation.
- IO Module — File and console operations.
- System Module — Environment variables, command execution.
- Time Module — Date/time operations.
✅ COMPLETED (2026-03-02)
Goal: Extend floating point beyond the core عشري (completed in v0.3.5.5).
- Math functions — جذر_تربيعي(), أس(), جيب(), جيب_تمام(), ظل().
- Formatting — better float printing options (precision + scientific %أ).
- Additional float types — عشري٣٢ keyword (current lowering alias to عشري/f64 in v0.4.2).
✅ COMPLETED (2026-03-02)
Goal: Graceful error management.
- Assertions:
    تأكد(س > ٠, "س يجب أن يكون موجباً").
- Error Codes – Standardized error return values.
- Panic Function – توقف_فوري("رسالة خطأ").
✅ COMPLETED (2026-03-02)
- Complete Documentation — All features documented.
- Tutorial Series — Step-by-step learning materials.
- Example Programs — Comprehensive example collection.
- Performance Optimization — Profile and optimize compiler.
✅ COMPLETED (2026-03-02)
Goal: Prepare the codebase, toolchain, and language contracts for a low-risk self-hosting transition.
- Define canonical component boundaries — Frontend / Middle-End / Backend / Driver / Support.
- Set module-size policy — target <= 700 lines/file, hard cap 1000 lines for hand-written C modules.
- Split oversized modules first — analysis.c, emit.c, ir.c, ir_lower.c, ir_text.c, isel.c, lexer.c, parser.c, regalloc.c, and ir_verify_ir.c are now under the hard cap via companion implementation splits.
- Restructure source layout safely — component directories now exist under src/, and the build now targets those files directly.
- Add local module facades — frontend_internal.h, middleend_internal.h, backend_internal.h, driver_internal.h, and support_internal.h now define component-local include surfaces.
- Update build graph — CMakeLists.txt remains explicit/deterministic and the Windows build uses C-only include propagation while header wrappers remain transitional.
- Add size-regression guard — scripts/check_module_sizes.py enforces warn 700 / error 1000 and runs in qa_run.py + CI before full QA.
- Document ownership map — docs/COMPONENT_OWNERSHIP.md defines module responsibilities + dependency rules.
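The size-regression guard described above can be pictured with a short Python sketch. This is not the contents of scripts/check_module_sizes.py; the names `classify` and `check_tree`, the allowlist parameter, and the choice to treat exactly 700 or exactly 1000 lines as still within budget are assumptions made for illustration.

```python
from pathlib import Path

WARN_LINES = 700    # soft target from the module-size policy
ERROR_LINES = 1000  # hard cap from the module-size policy


def classify(line_count: int) -> str:
    """Map a line count to 'ok', 'warn', or 'error' (boundaries are assumed inclusive)."""
    if line_count > ERROR_LINES:
        return "error"
    if line_count > WARN_LINES:
        return "warn"
    return "ok"


def check_tree(root: str, allowlist: frozenset = frozenset()) -> dict:
    """Return {relative_path: status} for every .c file under root that is not 'ok'."""
    findings = {}
    for path in sorted(Path(root).rglob("*.c")):
        rel = path.relative_to(root).as_posix()
        if rel in allowlist:
            continue  # hypothetical escape hatch for files still being split
        with path.open(encoding="utf-8", errors="replace") as handle:
            count = sum(1 for _ in handle)
        status = classify(count)
        if status != "ok":
            findings[rel] = status
    return findings
```

A CI step built on this idea would fail the job when any value in the returned mapping is "error" and only print a notice for "warn".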
✅ COMPLETED (2026-03-06)
Partial status update (2026-03-06):
- policy/guard/documentation are in place for v0.5.0,
- all remaining hard-cap hotspots were split below the 1000-line limit,
- scripts/module_size_allowlist.txt is now empty,
- source files were reorganized under src/frontend, src/middleend, src/backend, src/driver, and src/support,
- CMakeLists.txt now builds directly from component sources, while root-level compatibility is limited to selected header wrappers during header migration.
- Freeze grammar surface — avoid syntax churn before bootstrap.
- Freeze stdlib signatures — lock callable contracts used by compiler-in-Baa.
- Freeze target ABI contracts — Windows x64 + SystemV AMD64 invariants.
- Freeze IR invariants — verifier-enforced guarantees documented.
✅ COMPLETED (2026-04-28)
- Deterministic include/import resolution — include resolution now canonicalizes successful paths before they become active lexer filenames, so equivalent relative spellings collapse to a single resolved path.
- Cycle diagnostics — #تضمين cycles are now rejected early with Arabic diagnostics that print the include chain instead of recursing until depth exhaustion.
- Symbol visibility rules — top-level functions keep external linkage by default, top-level ساكن globals/arrays keep file-local internal linkage, and multi-file QA now locks this contract with dedicated smoke coverage.
- Header/API hygiene — baa.h is now a compatibility umbrella only; shared declarations were split into component-owned public headers under src/frontend/ and src/support/, diagnostics no longer depend on frontend lexer headers, and the middle-end now consumes a small shared target contract instead of touching backend target layout directly.
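The combination of path canonicalization and early cycle rejection described above can be sketched in Python. This is only an illustration of the technique, not the compiler's C implementation: `canonical`, `walk_includes`, `IncludeCycleError`, and the `includes_of` callback are hypothetical names, and the English chain format stands in for the Arabic diagnostic.

```python
import os


class IncludeCycleError(Exception):
    """Raised when a file is included again while it is still being processed."""


def canonical(path: str) -> str:
    # Collapse equivalent spellings such as "x/../lib.baahd" and "./lib.baahd".
    return os.path.normpath(os.path.abspath(path))


def walk_includes(path, includes_of, active=None, order=None):
    """Depth-first include walk.

    `includes_of(canonical_path)` is a hypothetical callback returning the
    paths that file includes. `active` holds the chain currently being
    processed, so a repeat inside it is a cycle, not a harmless re-include.
    """
    active = [] if active is None else active
    order = [] if order is None else order
    key = canonical(path)
    if key in active:
        chain = active[active.index(key):] + [key]
        raise IncludeCycleError(" -> ".join(chain))
    active.append(key)
    for child in includes_of(key):
        walk_includes(child, includes_of, active, order)
    active.pop()
    if key not in order:
        order.append(key)  # record each resolved file once
    return order
```

Because the check runs on canonical paths, two different relative spellings of the same header are recognized as one file, and the error carries the full chain rather than a depth-limit failure.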
- Incremental compilation model — avoid full rebuilds on small edits.
- Dependency tracking — reliable invalidation for headers/includes.
- Reproducible outputs — stable artifacts for same inputs/toolchain.
- Build profile presets — dev/debug/release/verify presets.
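One common way to get reliable invalidation for headers and includes is to fingerprint each unit together with everything it transitively includes. The sketch below assumes that approach; the roadmap does not say which mechanism Baa uses, and `fingerprint`, `needs_rebuild`, and the in-memory `sources`/`deps` dictionaries are hypothetical.

```python
import hashlib


def fingerprint(sources: dict, unit: str, deps: dict) -> str:
    """Hash a unit together with all files it transitively includes.

    sources: {name: file contents as bytes}
    deps:    {name: list of directly included names}
    """
    seen, stack = set(), [unit]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(deps.get(name, ()))
    digest = hashlib.sha256()
    for name in sorted(seen):  # sorted so traversal order cannot change the hash
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(sources[name]).digest())
    return digest.hexdigest()


def needs_rebuild(sources: dict, unit: str, deps: dict, cache: dict) -> bool:
    """A unit is stale when its stored fingerprint no longer matches."""
    return cache.get(unit) != fingerprint(sources, unit, deps)
```

Editing a header then invalidates exactly the units that reach it through includes, while unrelated units keep their cached result.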
✅ COMPLETED (2026-04-28)
- Improve span accuracy — tighter line/column ranges.
- Add actionable fix hints — Arabic-first suggestions for common errors.
- Strengthen panic recovery — fewer cascading diagnostics.
- Negative test expansion — enforce diagnostic contracts.
✅ COMPLETED (2026-05-01)
- Assertion runtime contract — stable behavior in debug/release modes.
- Panic/error-code policy — consistent fatal/non-fatal paths.
- Safety toggles — explicit compile-time/runtime control flags.
- Document failure semantics — deterministic exit/status behavior.
- Cross-target parity suite — Linux/Windows behavior consistency.
- Fuzz + stress expansion — parser/semantic/IR robustness.
- IR/SSA regression locking — snapshots + verifier gating.
- Release gate checklist — mandatory pass criteria before Phase 5.
- Define minimal subset — features required to implement compiler slices in Baa.
- Ban unstable features in Baa0 — keep bootstrap surface conservative.
- Publish Baa0 compliance suite — tests for subset guarantees.
- Document migration plan — C modules to Baa equivalents by priority.
- Rewrite one compiler slice in Baa — start with lexer (or smallest high-value slice).
- Mixed pipeline build — C host compiler + Baa pilot module in one build.
- Behavioral equivalence checks — output parity vs C implementation.
- Go/No-Go report for Phase 5 — readiness decision based on objective gates.
- Cross-target QA green — quick/full pass on both x86_64-windows and x86_64-linux.
- Determinism checks green — stable IR text and stable diagnostics for identical inputs.
- File-size governance active — CI guard for module-size budget is enforced.
- Contracts frozen and published — grammar/ABI/IR/Baa0 docs tagged and versioned.
- Bootstrap handoff bundle ready — scripts + manifests + reproducible build notes archived.
- docs/COMPONENT_OWNERSHIP.md — boundaries + owners + allowed dependencies.
- docs/BOOTSTRAP_CONTRACT.md — frozen ABI/IR/language requirements for v0.9.
- docs/BAA0_SPEC.md — bootstrap subset definition and exclusions.
- scripts/qa_bootstrap_gate.py — one-command admission checks for Phase 5.
- tests/bootstrap/ — parity corpus dedicated to migration from C → Baa slices.
Goal: The ultimate proof of capability — Baa compiling itself.
- Feature freeze during migration — no new language features while porting compiler slices.
- Parity-first policy — semantic/diagnostic/IR behavior must match C baseline unless explicitly approved.
- Target parity policy — every migration milestone must pass on Windows + Linux.
- Rollback-ready checkpoints — each slice port keeps a reversible gate in CI.
- Use frozen contracts from v0.5 — language/ABI/IR/Baa0 are inputs, not redefined in v0.9.
- Pin Stage-0 toolchain manifest — inherit deterministic build inputs from v0.5 gates.
- Create bootstrap snapshot tag — lock the handoff commit before Baa rewrites.
- Enable mixed-unit builds — compile/link .c and .baa compiler units together.
- Define bridge boundaries — stable ABI/data-layout interfaces between C and Baa modules.
- Add parity harness — module-level equivalence checks (tokens/AST/IR/asm as applicable).
- Golden corpus harness — fixed corpus with expected outputs for every migration step.
- Diff tooling — normalized comparator for diagnostics and IR text.
- Port lexer slice to Baa — use split layout from v0.5 component organization.
- Build through mixed harness — keep remaining compiler units in C.
- Token-stream parity tests — compare lexer output against C baseline.
- No feature expansion in v0.9 — only correctness/portability fixes permitted.
- UTF-8 and preprocessor parity — nested includes/macros/conditionals must match C behavior.
- Stress corpus parity — large and malformed sources produce equivalent diagnostics.
- Port parser slice to Baa — AST construction with existing grammar contracts.
- Build through mixed harness — C/Baa hybrid remains supported.
- AST + diagnostics parity tests — match C baseline trees and key error anchors.
- Recursion/stack validation — ensure depth safety on both targets.
- Panic recovery parity — same synchronization behavior for statement/declaration/switch modes.
- Alias/type grammar parity — no regressions in parser-side alias resolution.
- Port semantic slice to Baa — preserve existing type/ABI rules.
- Port symbol/scope tables — with ownership semantics equivalent to C version.
- Negative-suite parity — verify same diagnostics and failure points.
- Warning parity — warning classes and trigger points match baseline behavior.
- Deterministic diagnostics — stable ordering/text for repeated runs.
- Port IR core + lowering slices to Baa — aligned with v0.5 split modules.
- IR verifier parity — preserve well-formedness and SSA contracts.
- IR text parity tests — compare canonical IR output vs C baseline.
- Optimizer gate parity — --verify-ir/--verify-ssa remain clean across corpus.
- Arena/ownership parity — preserve module-owned bulk-free behavior.
- Port backend slices to Baa — ISel/RegAlloc/Emit with unchanged ABI behavior.
- Target parity — Windows x64 and Linux x64 assembly equivalence gates.
- Runtime parity — integration tests must match C compiler behavior.
- ABI edge-case suite — variadics, float args/returns, pointer-heavy calls, stack alignment.
- Asm diff audit — differences explained and approved when semantically equivalent.
- Port driver/diagnostics slices to Baa — CLI orchestration + error reporting.
- Retire mixed mode — switch default build to full-Baa compiler pipeline.
- Full compiler in Baa — all core components ported and wired.
- CLI parity matrix — all important flags preserve behavior (--verify, --target, -S, -c, ...).
- Multi-file parity — compile/link workflows remain deterministic and stable.
- Compile Baa compiler with C-Baa — Produces baa₁.
- Test baa₁ — Run full test suite.
- Compile Baa compiler with baa₁ — Produces baa₂.
- Compile Baa compiler with baa₂ — Produces baa₃.
- Verify baa₂ == baa₃ — Reproducible builds!
- Release v1.0.0 — Historic milestone! 🎉
- Functional parity — all core QA suites pass using baa₁/baa₂ toolchains.
- Reproducibility parity — bootstrap stages produce stable outputs per target.
- Operational parity — driver flags + diagnostics match expected contracts.
- Performance budget — compile-time regression remains within agreed threshold.
- Documentation parity — bootstrap and recovery procedures are complete.
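The "normalized comparator for diagnostics and IR text" listed in the migration plan could look roughly like the following Python sketch. The normalization rules here (CRLF to LF, trailing whitespace removed, blank lines dropped, backslashes rewritten to forward slashes) are assumptions chosen to absorb Windows/Linux noise; the real parity rules are not specified in this roadmap, and `normalize` and `first_difference` are hypothetical names.

```python
from itertools import zip_longest


def normalize(text: str) -> list:
    """Reduce compiler output to lines where only meaningful differences remain."""
    lines = []
    for raw in text.replace("\r\n", "\n").split("\n"):
        # Assumed rule: path separators should not count as a parity failure.
        line = raw.rstrip().replace("\\", "/")
        if line:
            lines.append(line)
    return lines


def first_difference(baseline: str, candidate: str):
    """Return (index, baseline_line, candidate_line) for the first mismatch, or None.

    A missing line on either side is reported as None in that position.
    """
    pairs = zip_longest(normalize(baseline), normalize(candidate))
    for index, (left, right) in enumerate(pairs):
        if left != right:
            return index, left, right
    return None
```

Reporting only the first mismatch keeps the signal small when a ported slice drifts from the C baseline; a fuller tool would likely print surrounding context as well.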
#!/bin/bash
# verify_bootstrap.sh
set -euo pipefail  # stop at the first failed stage instead of comparing stale binaries
echo "Stage 0: Building with C compiler..."
./baa_c baa.baa -o baa1.exe
echo "Stage 1: Building with Baa (first generation)..."
./baa1.exe baa.baa -o baa2.exe
echo "Stage 2: Building with Baa (second generation)..."
./baa2.exe baa.baa -o baa3.exe
echo "Verifying reproducibility..."
if diff baa2.exe baa3.exe > /dev/null; then
echo "✅ SUCCESS: baa2 and baa3 are identical!"
echo "🎉 BAA IS SELF-HOSTING!"
else
echo "❌ FAILURE: baa2 and baa3 differ!"
exit 1
fi
Goal: Remove dependency on external assembler (GAS/MASM).
- Frontend — assembly lexer/parser with strict diagnostics.
- Core — instruction model + encoder + relocation planner.
- Writers — COFF/ELF object emitters with symbol/relocation tables.
- Integration — backend handoff + CLI controls + verification pipeline.
- Define instruction encoding tables — x86-64 opcode maps.
- Parse assembly text — Tokenize AT&T/Intel syntax.
- Build instruction IR — Internal representation of machine code.
- Handle labels — Track label addresses for jumps.
- MVP syntax decision — AT&T as canonical input first, Intel optional later.
- Source mapping — retain line/column mapping for assembler diagnostics.
- REX prefixes — 64-bit register encoding.
- ModR/M and SIB bytes — Addressing mode encoding.
- Immediate encoding — Handle different immediate sizes.
- Displacement encoding — Memory offset encoding.
- Instruction validation — Check valid operand combinations.
- Coverage matrix — define required instruction families for Baa backend output.
- Relocation-aware encoding — encode fixup placeholders for unresolved symbols.
- COFF format (Windows) — Generate .obj files.
- ELF format (Linux) — Generate .o files.
- Section handling — .text, .data, .bss, .rodata.
- Symbol table — Export/import symbols.
- Relocation entries — Handle address fixups.
- ABI metadata correctness — calling-convention relevant symbol attributes.
- Cross-platform conformance tests — validate produced .obj/.o with system linkers.
- Replace GAS calls — Use internal assembler.
- --use-internal-asm flag — Optional internal assembler.
- Verify output — Compare with GAS output.
- Performance test — Ensure acceptable speed.
- Fallback policy — clear fallback to external assembler on unsupported patterns.
- CI dual-mode runs — external/internal assembler comparison in regression pipeline.
- Error messages — Clear assembly error diagnostics.
- Debug info — Generate debug symbols.
- Listing output — Optional assembly listing with addresses.
- Documentation — Assembler internals guide.
- Default-on readiness — internal assembler can become default for supported targets.
- Stability gate — crash-free stress/fuzz-lite runs across assembler corpus.
- Parity signoff — approved diff report vs GAS on representative corpus.
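The REX prefix and ModR/M items in the encoder plan above can be made concrete with one register-to-register instruction. The Python sketch below encodes `add dst, src` for two 64-bit registers using the standard x86-64 form REX.W + 01 /r; the `REGS` table and `encode_add_rr64` are illustrative names, not part of the planned assembler, and operands are given in Intel order for brevity.

```python
# Register numbers as used in x86-64 encoding (low 3 bits go in ModR/M,
# the fourth bit goes in the REX prefix).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_add_rr64(dst: str, src: str) -> bytes:
    """Encode `add dst, src` (Intel operand order) for two 64-bit registers."""
    d, s = REGS.index(dst), REGS.index(src)
    rex = 0x48                 # 0100WRXB with W=1: 64-bit operand size
    rex |= (s >> 3) << 2       # REX.R extends ModR/M.reg (the source here)
    rex |= d >> 3              # REX.B extends ModR/M.rm (the destination here)
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)  # mod=11: register-direct
    return bytes([rex, 0x01, modrm])         # 0x01 = ADD r/m64, r64


print(encode_add_rr64("rax", "rbx").hex())  # prints 4801d8
```

Memory operands would add the SIB byte and displacement handling listed in the same plan; this sketch covers only the register-direct case.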
Goal: Remove dependency on external linker (ld/link.exe).
- Object ingestion layer — robust COFF/ELF readers with strict validation.
- Link graph core — symbol table, section graph, relocation plan, address assignment.
- Format writers — PE/ELF executable writers with deterministic layout rules.
- Runtime bridge layer — controlled integration with CRT/system runtime requirements.
- Verification layer — parity checks vs system linkers + deterministic output checks.
- Deterministic link ordering — stable output independent of host filesystem ordering.
- No silent symbol resolution — ambiguous/duplicate/undefined symbols must emit explicit diagnostics.
- Relocation correctness first — overflow and invalid relocation types are hard failures.
- Target isolation — Windows and Linux paths share abstractions, not target-specific hacks.
- Parse object files — Read COFF/ELF format.
- Symbol resolution — Match symbol references to definitions.
- Section merging — Combine sections from multiple objects.
- Memory layout — Assign virtual addresses to sections.
- Archive scanning strategy — deterministic one-pass/two-pass policy for .a/.lib.
- Input validation policy — reject malformed object metadata with Arabic diagnostics.
- Apply relocations — Fix up addresses in code/data.
- Handle relocation types — PC-relative, absolute, GOT, PLT.
- Overflow detection — Check address range limits.
- Relocation test matrix — per-target coverage for required relocation kinds.
- Late-binding audit logs — debug trace mode for relocation decisions.
- PE header — DOS stub, PE signature, file header.
- Optional header — Entry point, section alignment, subsystem.
- Section headers — .text, .data, .rdata, .bss.
- Import table — For C runtime and Windows API.
- Export table — If building DLLs (future).
- Generate .exe — Complete Windows executable.
- PE conformance checks — verify headers/alignments with tooling (dumpbin/llvm-readobj).
- Subsystem policies — console/gui/custom-startup rules documented and tested.
- ELF header — File identification, entry point.
- Program headers — Loadable segments.
- Section headers — .text, .data, .rodata, .bss.
- Dynamic linking info — For libc linkage.
- Generate executable — Complete Linux binary.
- ELF conformance checks — validate segments/sections with readelf/objdump.
- PIE/non-PIE policy — deterministic handling for code model and relocation mode.
- Static libraries — Link .a/.lib archives.
- Library search paths — -L flag support.
- Entry point selection — Custom entry point support.
- Strip symbols — Remove debug symbols for release.
- Map file — Generate link map for debugging.
- Section GC plan — optional dead-section elimination with safety checks.
- Weak symbol semantics — explicit target-specific behavior for weak/strong bindings.
- Replace ld/link calls — Use internal linker.
- --use-internal-linker flag — Optional internal linker.
- Verify output — Compare with system linker output.
- End-to-end test — Compile and link without external tools.
- Dual-link CI mode — run internal and system linker paths on same corpus.
- Fallback policy — controlled fallback with reasoned diagnostics when unsupported.
- Cross-target parity signoff — runtime and symbol behavior match system linker expectations.
- Determinism signoff — stable binary layout for identical inputs/toolchains.
- Stress signoff — large multi-object links and archive-heavy workloads remain stable.
- Default-on readiness — internal linker can be default for supported targets.
- Regression stability — quick/full/stress QA pass with internal linker enabled.
- Debuggability baseline — diagnostics + map output sufficient for failure triage.
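The "relocation correctness first" rule, where overflow is a hard failure, can be illustrated with a 32-bit PC-relative fixup. The sketch below uses the ELF-style calculation S + A - P (symbol address plus addend minus the address of the patched field); `apply_pc32` and `RelocationOverflow` are hypothetical names, and COFF relocations would need their own variant.

```python
import struct


class RelocationOverflow(Exception):
    """Raised when a computed displacement does not fit the relocated field."""


def apply_pc32(section: bytearray, offset: int, section_addr: int,
               symbol_addr: int, addend: int) -> None:
    """Patch a signed 32-bit PC-relative field in place: value = S + A - P."""
    place = section_addr + offset          # P: address of the field itself
    value = symbol_addr + addend - place   # S + A - P
    if not -(1 << 31) <= value < (1 << 31):
        raise RelocationOverflow(
            f"PC32 value {value:#x} at {place:#x} does not fit in 32 bits")
    struct.pack_into("<i", section, offset, value)


# A `call rel32` at 0x1000 targeting 0x2000; addend -4 accounts for the
# displacement being relative to the end of the 4-byte field.
code = bytearray(b"\xe8\x00\x00\x00\x00")
apply_pc32(code, 1, 0x1000, 0x2000, -4)
print(code.hex())  # prints e8fb0f0000
```

Raising instead of truncating matches the stated policy: a silently wrapped displacement would produce a binary that jumps to the wrong address.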
Goal: Zero external dependencies — Baa builds itself with no external tools.
- Runtime kernel layer — startup, syscall/API bridge, memory primitives, and process exit flow.
- Native stdlib layer — string/math/memory/io modules implemented in Baa with stable ABI contracts.
- Bootstrap provenance layer — staged compiler lineage (baa0 -> baa1 -> baa2) with hashable artifacts.
- Hermetic build layer — controlled inputs, pinned flags, deterministic packaging outputs.
- Operational verification layer — dependency scans, runtime smoke tests, and cross-target release gates.
- No hidden host dependencies — any required host tool must be explicitly listed in bootstrap docs.
- Reproducibility-first policy — build determinism regressions block release.
- Cross-target parity policy — Windows/Linux runtime features must ship together or stay gated.
- Fail-fast runtime diagnostics — startup/syscall/runtime failures must emit explicit Arabic-first errors.
- Recovery-path requirement — each independence milestone must define rollback and re-bootstrap steps.
Windows:
- Direct Windows API calls — Replace printf with WriteConsoleA.
- Implement اطبع natively — Direct syscall/API.
- Implement اقرأ natively — ReadConsoleA.
- Implement memory functions — HeapAlloc/HeapFree instead of malloc/free.
- Implement file I/O — CreateFile, ReadFile, WriteFile.
- Custom entry point — Replace C runtime startup.
- SEH-aware startup — preserve stable crash/exit semantics without CRT helpers.
- Win64 ABI compliance checks — stack alignment, shadow space, and return path verified.
Linux:
- Direct syscalls — write, read, mmap, exit.
- Implement اطبع natively — syscall to write(1, ...).
- Implement اقرأ natively — syscall to read(0, ...).
- Implement memory functions — mmap/munmap for allocation.
- Implement file I/O — open, read, write, close syscalls.
- Custom _start — No libc dependency.
- Syscall ABI validation — register/stack contracts verified on x86_64 SystemV.
- Signal/exit behavior checks — deterministic termination semantics across test corpus.
- Rewrite string functions in Baa — No C dependency.
- Rewrite math functions in Baa — Pure Baa implementation.
- Rewrite memory functions in Baa — Custom allocator.
- Full standard library in Baa — All library code in Baa.
- Module boundary contracts — document stable APIs for core, memory, string, math, io.
- Behavioral parity suite — compare stdlib behavior vs prior runtime expectations.
- Allocator policy gates — fragmentation/throughput baselines for long-running workloads.
- Single binary compiler — No external dependencies.
- Cross-compilation support — Build Linux binary on Windows and vice versa.
- Reproducible builds — Same source → identical binary.
- Bootstrap from source — Document minimal bootstrap path.
- Hermetic manifest — lock source inputs, flags, and artifact metadata per release.
- Stage replay tooling — deterministic scripts to replay bootstrap on clean machines.
- Provenance hashing — publish hashes for each stage artifact and final binaries.
- Full test suite passes — All tests without external tools.
- Benchmark comparison — Performance vs GCC toolchain.
- Security audit — Review for vulnerabilities.
- Documentation complete — Full toolchain documentation.
- Release v3.0.0 — Fully independent Baa! 🎉
- Supply-chain audit — verify release pipeline does not reintroduce external tool reliance.
- Disaster-recovery drill — validate clean-room bootstrap from tagged sources.
- Zero-dependency signoff — compile/assemble/link/run path requires only Baa-delivered artifacts.
- Deterministic bootstrap signoff — repeated stage builds are byte-stable per target.
- Cross-target signoff — Windows + Linux release candidates pass identical gate checklist.
- Operational signoff — upgrade/rollback/bootstrap recovery procedures validated and documented.
- Runtime independence — no libc/CRT dependency in default build+run workflow.
- Toolchain independence — compiler, assembler, linker, and runtime owned by Baa project.
- Reproducibility baseline — release artifacts are verifiable and reproducible from source.
- Sustainability baseline — maintenance docs and on-call triage playbooks are complete.
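The provenance-hashing and deterministic-bootstrap items above amount to publishing one digest per stage artifact and checking that consecutive self-built stages are byte-identical. The Python sketch below shows that idea on in-memory byte strings; SHA-256, the manifest line format, and the function names are assumptions, and the stage names follow the baa1/baa2/baa3 naming used earlier in this roadmap.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def provenance_manifest(stages: dict) -> list:
    """Return 'digest  name' lines, sorted by stage name for stable output.

    stages: {stage name: artifact bytes}
    """
    return [f"{sha256_hex(blob)}  {name}" for name, blob in sorted(stages.items())]


def is_fixed_point(stages: dict, earlier: str = "baa2", later: str = "baa3") -> bool:
    """True when two consecutive self-built stages are byte-identical."""
    return sha256_hex(stages[earlier]) == sha256_hex(stages[later])
```

In a release pipeline the byte strings would come from reading the stage binaries from disk, and the manifest lines would be archived next to the tagged sources so a clean-room rebuild can be checked against them.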
┌────────────────────────────────────────────────────────────────┐
│ Baa Toolchain Evolution │
├────────────────────────────────────────────────────────────────┤
│ │
│ v0.2.x (Current): │
│ ┌─────────┐ ┌───────────────────────────────────────────┐ │
│ │ Baa │ → │ GCC (assembler + linker + C runtime) │ │
│ │ Compiler│ │ │ │
│ └─────────┘ └───────────────────────────────────────────┘ │
│ │
│ v1.0.0 (Self-Hosting): │
│ ┌─────────┐ ┌───────────────────────────────────────────┐ │
│ │ Baa │ → │ GCC (assembler + linker + C runtime) │ │
│ │ in Baa! │ │ │ │
│ └─────────┘ └───────────────────────────────────────────┘ │
│ │
│ v1.5.0 (Own Assembler): │
│ ┌─────────┐ ┌─────────┐ ┌─────────────────────────────┐ │
│ │ Baa │ → │ Baa │ → │ GCC (linker + C runtime) │ │
│ │ Compiler│ │ Assembler│ │ │ │
│ └─────────┘ └─────────┘ └─────────────────────────────┘ │
│ │
│ v2.0.0 (Own Linker): │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌───────────────┐ │
│ │ Baa │ → │ Baa │ → │ Baa │ → │ C Runtime │ │
│ │ Compiler│ │ Assembler│ │ Linker │ │ (printf etc) │ │
│ └─────────┘ └─────────┘ └─────────┘ └───────────────┘ │
│ │
│ v3.0.0 (Full Independence): │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Baa Toolchain (100% Baa) │ │
│ │ Compiler → Assembler → Linker → Native Runtime │ │
│ │ │ │
│ │ No External Dependencies! │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────┘
v0.2.0 — The Driver (CLI & Build System)
- CLI Argument Parser — Implement a custom argument parser to handle flags manually.
- Input/Output Control (-o, -S, -c).
- Information Flags (--version, --help, -v).
- Build Pipeline — Orchestrate Lexer -> Parser -> Codegen -> GCC.
v0.2.1 — Polish & Branding
- Executable Icon — Embed .ico resource.
- Metadata — Version info, Copyright, Description in .exe.
v0.2.2 — The Diagnostic Engine Patch
- Source Tracking — Update Token and Node to store Filename, Line, and Column.
- Error Module — Create a dedicated error reporting system.
- Pretty Printing — Display errors with context (^ pointers).
- Panic Recovery — Continue parsing after errors.
v0.2.3 Distribution & Updater Patch
- Windows Installer — Create setup.exe using Inno Setup.
- PATH Integration — Add compiler to system environment variables.
- Self-Updater — Implement baa update command.
v0.2.4 The Semantic Pass (Type Checker)
- File Extension Migration — Change .b to .baa. Reserved .baahd for headers.
- Pass Separation — Completely separate Parsing from Code Generation.
  - parse() returns a raw AST.
  - analyze() walks the AST to check types and resolve symbols.
  - The backend pipeline consumes a validated AST.
- Symbol Resolution — Check for undefined variables before code generation starts.
- Scope Analysis — Implement scope stack to properly handle nested blocks and variable shadowing.
- Type Checking — Validate assignments (int = string now fails during semantic analysis).
v0.2.5 Multi-File & Include System
- File Extension Migration — Change .b to .baa. Reserved .baahd for headers.
- Include Directive — #تضمين "file.baahd" (C-style #include).
- Header Files — .baahd extension for declarations (function signatures, extern variables).
- Function Prototypes — Declarations without types صحيح دالة(). (Added).
- Multi-file CLI — Accept multiple inputs: baa main.baa lib.baa -o out.exe.
- Linker Integration — Compile each file to .o then link together.
v0.2.6 Preprocessor Directives
- Define — #تعريف اسم قيمة for compile-time constants.
- Conditional — #إذا_عرف, #إذا_لم_يعرف, #وإلا, #وإلا_إذا, #نهاية_إذا for conditional compilation.
- Undefine — #الغاء_تعريف to remove definitions.
v0.2.7 Constants & Immutability
- Constant Keyword — ثابت for immutable variables: ثابت صحيح حد = ١٠٠.
- Const Checking — Semantic error on reassignment of constants.
- Array Constants — Support constant arrays.
v0.2.8 Warnings & Diagnostics
- Warning System — Separate warnings from errors (non-fatal).
- Unused Variables — Warn if variable declared but never used.
- Dead Code — Warn about code after إرجع or توقف.
- -W Flags — -Wall, -Werror to control warning behavior.
v0.2.9 — Input & UX Polish
- Input Statement — اقرأ س. (scanf) for reading user input.
- Boolean Type — منطقي type with صواب/خطأ literals.
- Colored Output — ANSI colors for errors (red), warnings (yellow). (Implemented in v0.2.8)
- Compile Timing — Show compilation time with -v.
v0.1.3 — Control Flow & Optimizations
- Extended If — Support وإلا (Else) and وإلا إذا (Else If) blocks.
- Switch Statement — اختر (Switch), حالة (Case), افتراضي (Default)
- Constant Folding — Compile-time math (١ + ٢ → ٣)
v0.1.2 — Recursion & Strings
- Recursion — Stack alignment fix
- String Variables — نص type
- Loop Control — توقف (Break) & استمر (Continue)
v0.1.1 — Structured Data
- Arrays — Fixed-size stack arrays (صحيح قائمة[١٠])
- For Loop — لكل (..؛..؛..) syntax
- Logic Operators — &&, ||, ! with short-circuiting
- Postfix Operators — ++, --
v0.1.0 — Text & Unary
- Strings — String literal support ("...")
- Characters — Character literals ('...')
- Printing — Updated اطبع to handle multiple types
- Negative Numbers — Unary minus support
v0.0.9 — Advanced Math
- Math — Multiplication, Division, Modulo
- Comparisons — Greater/Less than logic (<, >, <=, >=)
- Parser — Operator Precedence Climbing (PEMDAS)
v0.0.8 — Functions
- Functions — Function definitions and calls
- Entry Point — Mandatory الرئيسية exported as main
- Scoping — Global vs Local variables
- Windows x64 ABI — Register passing, stack alignment, shadow space
v0.0.7 — Loops
- While Loop — طالما implementation
- Assignments — Update existing variables
v0.0.6 — Control Flow
- If Statement — إذا with blocks
- Comparisons — ==, !=
- Documentation — Comprehensive Internals & API docs
v0.0.5 — Type System
- Renamed رقم to صحيح (int)
- Single line comments (//)
v0.0.4 — Variables
- Variable declarations and stack offsets
- Basic symbol table
v0.0.3 — I/O
- اطبع (Print) via Windows printf
- Multiple statements support
v0.0.2 — Math
- Arabic numerals (٠-٩)
- Addition and subtraction
v0.0.1 — Foundation
- Basic pipeline: Lexer → Parser → Codegen → GCC
| Phase | Version | Milestone | Dependencies |
|---|---|---|---|
| Phase 3 | v0.3.x | IR Complete | GCC |
| Phase 3.5 | v0.3.3-v0.3.12 | Language Complete | GCC |
| Phase 4 | v0.4.x | Standard Library | GCC |
| Phase 5 | v1.0.0 | Self-Hosting 🏆 | GCC |
| Phase 6 | v1.5.0 | Own Assembler | GCC (linker only) |
| Phase 7 | v2.0.0 | Own Linker | C Runtime only |
| Phase 8 | v3.0.0 | Full Independence 🏆 | Nothing! |
For detailed changes, see the Changelog.