Tyrne boots in four stages: QEMU (or the board firmware) hands control to the ELF entry point, a short assembly stub sets up the runtime environment, a Rust entry function (kernel_entry) wires the BSP together and brings up every kernel subsystem, and finally start() transfers control to the cooperative scheduler. This document is the "how" for Phase 4c on bsp-qemu-virt; the "why" for each concrete choice lives in ADR-0012. Each future BSP will follow the same stage structure with its own addresses and peripherals.
The overall three-layer architecture is described in overview.md, and the HAL traits the kernel uses are in hal.md. This document focuses specifically on the boot path from reset to scheduler steady state, as implemented for the QEMU virt aarch64 target.
The four boot stages, each with a tightly bounded responsibility:
- Firmware / loader. QEMU's -kernel flag loads the ELF image at its load address (0x40080000 per ADR-0012; the image is linked high but loaded low, see §"High-half migration"), sets the PC to the ELF's entry point (_start_phys, the LOW physical address of _start, because the MMU is off at reset), and enters at EL1 (default QEMU virt) or EL2 (-machine virtualization=on, or most real-hardware boot stacks delivering at EL2). The device-tree blob address is placed in x0; v1 ignores it.
- Assembly stub (_start). Three phases: first, K3-12 (interrupts masked via MSR DAIFSet, #0xf) executes at the very head of the reset vector so a spurious interrupt cannot escape into an uninstalled vector table. Second, the EL drop (per ADR-0024) reads CurrentEL; on EL2 it configures HCR_EL2 / SPSR_EL2 / ELR_EL2 and erets to a post-drop label, on EL1 it falls through, on EL3 (or any unexpected EL) it halts in a named-label wfe-loop (halt_unsupported_el: wfe ; b halt_unsupported_el), because there is no Rust panic infrastructure pre-kernel_entry. Third, the conventional setup: load __stack_top into SP, enable FP/SIMD via CPACR_EL1, zero the BSS range (__bss_start..__bss_end) using 8-byte stores, and branch to kernel_entry. If kernel_entry ever returns (it shouldn't), the stub falls into a defensive wfe ; b 2b halt loop. After phase two, every later instruction runs at EL1; this is the precondition that T-009's UNSAFE-2026-0016 runtime check now relies on as a load-bearing invariant rather than a defensive guard.
- kernel_entry → kernel_main_high (Rust, in the BSP). The first Rust code to run, split across the high-half migration (T-022 / ADR-0033; see §"High-half migration" below for the mechanism):
  - kernel_entry (LOW physical alias, MMU off → low identity). Constructs a throwaway low-MMIO Pl011Uart for early diagnostics, installs the EL1 vector table (T-012, low vectors), activates the low-identity MMU via mmu_bootstrap (T-016 / ADR-0027: lands the v1 identity layout in TTBR0_EL1, flips SCTLR_EL1.{M,I,C} = 1; MMIO goes through device-nGnRnE attributes), then builds the high-half TTBR1_EL1 tables via high_half_activate (T-022 / ADR-0033: EPD1 1→0, both regimes now live) and branches the running kernel into the high half through the migration trampoline (MSR VBAR-high; rebase SP; br kernel_main_high). It never returns. Marked #[no_mangle] extern "C" so the assembly stub can find it.
  - kernel_main_high (HIGH half, TTBR1_EL1). Frees TTBR0_EL1 (null + EPD0 = 1 + TLBI VMALLE1), prints tyrne: high-half active, then runs the rest of bring-up at high-half addresses: constructs the persistent Pl011Uart + QemuVirtCpu at the HIGH device-MMIO alias, captures the boot-to-end timestamp, initialises the Physical Memory Manager (T-017 / ADR-0035: bitmap allocator over the 128 MiB RAM extent, two reserved ranges covering the QEMU firmware region and the kernel image / .bss / .boot_pt / boot stack), initialises the address-space arena (T-018 / ADR-0028: wraps the bootstrap L0 root as AddressSpaceArena<QemuVirtMmu> slot 0 + mints the bootstrap AS authority cap; no Mmu::create_address_space on the populated root per ADR-0028 §Simulation row 0), loads the embedded userspace placeholder image via task_loader::load_image (T-019 / ADR-0029: produces a LoadedImage for the embedded mov w0, #42; ret blob; does NOT execute, since runnability gates on B6 per phase-b §B4 §Revision-notes; first runtime exerciser of UNSAFE-2026-0025 post-bootstrap Mmu::map, UNSAFE-2026-0026 Pmm::alloc_frame zero-fill, and UNSAFE-2026-0027 loader byte-copy), initialises the GIC, unmasks DAIF.I, prints the timer banner, then sets up the kernel-object arenas + capability tables + IPC + scheduler before transferring control to start().
- Scheduler start (start). The final call in kernel_main_high is start(SCHED.as_mut_ptr(), cpu, activate_address_space), which hands control to the cooperative FIFO scheduler and never returns; the scheduler runs the first ready task and drives the cooperative IPC demo until the system halts (see scheduler.md). An early design intended a portable tyrne_kernel::run that a BSP would delegate to; the B-phase brought subsystem bring-up into the kernel-entry path instead, and start (defined in kernel/src/sched/mod.rs) is the actual handoff point. Consolidating the bring-up back into a portable kernel entry is a possible future refactor.
Since T-022 the kernel runs in the high half (TTBR1_EL1) so TTBR0_EL1 is free for per-task userspace — the ADR-0033 prerequisite that unblocks a real EL0 task's SVC vector fetch (B6). The kernel image is linked high (KBASE = 0xFFFF_FFFF_4008_0000) but loaded low (0x4008_0000); the ELF entry is forced to _start's low physical address (_start_phys, linker.ld) because the MMU is off at reset. A single linear high-half offset KERNEL_HIGH_HALF_OFFSET = 0xFFFF_FFFF_0000_0000 maps physical memory: kernel_VA = OFFSET + PA (tyrne_hal::phys_to_kernel_va). The boot-time transition (ADR-0033 §Simulation):
- kernel_entry (LOW). Runs at the low physical alias with the MMU off. Because the whole image is high-linked uniformly, PC-relative adrp / adr references resolve to LOW (load) addresses at runtime (the offset cancels between in-image symbols), so no separate identity-VMA section is needed. It enables the low-identity MMU (mmu_bootstrap), then builds the high-half TTBR1 tables and clears EPD1 (mmu_bootstrap::high_half_activate: DSB ISH → MSR TTBR1_EL1 → ISB → MSR TCR_EL1 with EPD1 = 0 → ISB). Both regimes are now live; the kernel still executes low.
- Migration trampoline (the crossing). A small inline-asm block: MSR VBAR_EL1, <high> + ISB (high vectors live before the branch) → add sp, sp, OFFSET (rebase SP to the high stack alias) → br <kernel_main_high high VA>, options(noreturn). The PC physically crosses low→high at the br; DAIF is masked and no StaticCell holds a low VA, so the crossing cannot brick.
- kernel_main_high (HIGH). Frees TTBR0_EL1 (MSR TTBR0_EL1, xzr + set EPD0 = 1 + TLBI VMALLE1 + DSB ISH), prints the new tyrne: high-half active boot marker, then constructs the console + GIC at their high device-MMIO aliases and runs the rest of the bring-up (§Stage 3) at high-half addresses. Function pointers / vtables (absolute, HIGH) are all taken here, so they stay reachable once TTBR0 is freed.
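The order in which the two translation regimes are enabled and disabled can be modelled on the host. The following is a minimal sketch, not the kernel's code: the bit positions of TCR_EL1.EPD0 (bit 7) and TCR_EL1.EPD1 (bit 23) come from the Arm architecture, while the function names and the idea of passing the register value around as a plain u64 are assumptions made for illustration.

```rust
/// TCR_EL1.EPD0: when set, translation-table walks through TTBR0_EL1 are disabled.
const TCR_EPD0: u64 = 1 << 7;
/// TCR_EL1.EPD1: when set, translation-table walks through TTBR1_EL1 are disabled.
const TCR_EPD1: u64 = 1 << 23;

fn ttbr0_walks_enabled(tcr: u64) -> bool {
    tcr & TCR_EPD0 == 0
}

fn ttbr1_walks_enabled(tcr: u64) -> bool {
    tcr & TCR_EPD1 == 0
}

/// Hypothetical model of the EPD1 1 -> 0 step in high_half_activate.
fn high_half_activate_model(tcr: u64) -> u64 {
    tcr & !TCR_EPD1
}

/// Hypothetical model of the EPD0 = 1 step when kernel_main_high frees TTBR0_EL1.
fn free_ttbr0_model(tcr: u64) -> u64 {
    tcr | TCR_EPD0
}

fn main() {
    // After the low-identity MMU is on: TTBR0 live, TTBR1 still disabled.
    let low_only = TCR_EPD1;
    assert!(ttbr0_walks_enabled(low_only) && !ttbr1_walks_enabled(low_only));

    // After high_half_activate: both regimes live, so the trampoline can branch high.
    let both = high_half_activate_model(low_only);
    assert!(ttbr0_walks_enabled(both) && ttbr1_walks_enabled(both));

    // After kernel_main_high frees TTBR0: only the high half remains.
    let high_only = free_ttbr0_model(both);
    assert!(!ttbr0_walks_enabled(high_only) && ttbr1_walks_enabled(high_only));
}
```

The middle state is the one that matters: the branch into the high half is only safe while both regimes translate, which is why the sketch keeps the two steps as separate functions.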
v1 maps the whole high-half RAM window PXN = 0 (RWX-equivalent, like the identity map it replaces; AP = 0b00 keeps EL0 with no access); the ADR-0033 layout's distinct PXN = 1 physmap region is per-section W^X hardening deferred to ADR-0034. The migration is fault-clean (-d int,unimp: exactly the 2 syscall-smoke SVC exceptions, zero new Translation/Permission faults). Audit: UNSAFE-2026-0031 + Amendments to 0022/0023/0024.
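To make the PXN and AP settings concrete, here is a small host-side sketch of how those two fields decide access for a stage-1 page or block descriptor. The bit positions (AP[2:1] at bits [7:6], PXN at bit 53) are from the Arm architecture; the helper names are hypothetical, and the sketch deliberately ignores every other descriptor field, so it is not the layout the Tyrne page-table code produces.

```rust
/// AP[2:1] occupies descriptor bits [7:6].
const DESC_AP_SHIFT: u32 = 6;
/// PXN (privileged execute-never) is descriptor bit 53.
const DESC_PXN: u64 = 1 << 53;

/// Extract the two-bit AP[2:1] field.
fn ap_field(desc: u64) -> u64 {
    (desc >> DESC_AP_SHIFT) & 0b11
}

/// EL1 may fetch instructions from the mapping only when PXN is clear.
fn el1_can_execute(desc: u64) -> bool {
    desc & DESC_PXN == 0
}

/// EL0 has access only when AP[1] (the low bit of the field) is set.
fn el0_can_access(desc: u64) -> bool {
    ap_field(desc) & 0b01 != 0
}

/// Hypothetical: the v1 RAM-window permission bits (PXN = 0, AP = 0b00).
fn v1_ram_window_bits() -> u64 {
    0
}

/// Hypothetical: the deferred hardening, which would set PXN on the physmap.
fn hardened_physmap_bits() -> u64 {
    v1_ram_window_bits() | DESC_PXN
}

fn main() {
    let v1 = v1_ram_window_bits();
    assert!(el1_can_execute(v1)); // RWX-equivalent for the kernel
    assert!(!el0_can_access(v1)); // AP = 0b00 keeps EL0 out

    let hardened = hardened_physmap_bits();
    assert!(!el1_can_execute(hardened)); // PXN = 1 forbids kernel execution
    assert!(!el0_can_access(hardened));
}
```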
Forward limit (Pi 4 / large images). KERNEL_HIGH_HALF_OFFSET = 0xFFFF_FFFF_0000_0000 bounds the direct map to the low 4 GiB of PA, and the migration mask (OFFSET | (addr & 0xFFFF_FFFF)) assumes the kernel image PA is below 4 GiB. A BSP with > 4 GiB RAM or peripherals above 4 GiB (e.g. the Raspberry Pi 4, Phase D) needs a different offset and a revisited mask before carrying this pattern over.
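The 4 GiB limit follows directly from the arithmetic. The sketch below works it through on the host; the constant and the two formulas (kernel_VA = OFFSET + PA and OFFSET | (addr & 0xFFFF_FFFF)) are taken from the text, while the function names are hypothetical and do not reproduce the signature of tyrne_hal::phys_to_kernel_va.

```rust
/// Linear high-half offset (value from the text).
const KERNEL_HIGH_HALF_OFFSET: u64 = 0xFFFF_FFFF_0000_0000;

/// Hypothetical stand-in for the direct map: kernel_VA = OFFSET + PA.
/// Returns None when the sum would overflow 64 bits, which happens for
/// every physical address at or above 4 GiB.
fn phys_to_kernel_va_sketch(pa: u64) -> Option<u64> {
    KERNEL_HIGH_HALF_OFFSET.checked_add(pa)
}

/// Hypothetical model of the migration mask: OFFSET | (addr & 0xFFFF_FFFF).
/// Only the low 32 bits of the address survive.
fn migrate_addr_sketch(addr: u64) -> u64 {
    KERNEL_HIGH_HALF_OFFSET | (addr & 0xFFFF_FFFF)
}

fn main() {
    // The QEMU virt image base maps to KBASE under both formulas.
    let image_pa: u64 = 0x4008_0000;
    assert_eq!(phys_to_kernel_va_sketch(image_pa), Some(0xFFFF_FFFF_4008_0000));
    assert_eq!(migrate_addr_sketch(image_pa), 0xFFFF_FFFF_4008_0000);

    // Applying the mask to an already-high address leaves it unchanged.
    assert_eq!(migrate_addr_sketch(0xFFFF_FFFF_4008_0000), 0xFFFF_FFFF_4008_0000);

    // A physical address above 4 GiB has no slot in the direct map, and the
    // mask silently aliases it onto a lower address.
    let high_pa: u64 = 0x1_4008_0000;
    assert_eq!(phys_to_kernel_va_sketch(high_pa), None);
    assert_eq!(migrate_addr_sketch(high_pa), migrate_addr_sketch(image_pa));
}
```

The last two assertions are the forward limit in miniature: the addition has no room left, and the mask does not fail but aliases, which is the more dangerous of the two behaviours.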
sequenceDiagram
participant QEMU as QEMU virt / firmware
participant Asm as _start (asm stub)
participant KE as kernel_entry (BSP, Rust)
participant U as PL011 UART
QEMU->>Asm: PC = _start, DTB in x0 (ignored), entry EL = 1 or 2
Note over Asm: Phase 1 — K3-12: msr daifset, #0xf<br/>(interrupts masked from very first instruction)
Note over Asm: Phase 2 — EL drop (per ADR-0024)<br/>read CurrentEL; mask bits[3:2]
alt CurrentEL == EL2
Asm->>Asm: configure HCR_EL2 (RW=1, E2H=0, TGE=0)
Asm->>Asm: SPSR_EL2 = EL1h | DAIF masked (0x3c5)
Asm->>Asm: ELR_EL2 = post_eret label; eret
Note over Asm: now at EL1, DAIF still masked
else CurrentEL == EL1
Note over Asm: fall through (no drop needed)
else CurrentEL == EL3 (unsupported)
Note over Asm: halt_unsupported_el: wfe ; b halt_unsupported_el
end
Note over Asm: Phase 3 — conventional setup<br/>SP ← __stack_top<br/>CPACR_EL1.FPEN ← 0b11; isb<br/>BSS zeroed (__bss_start..__bss_end)
Asm->>KE: bl kernel_entry (EL = 1, guaranteed)
Note over KE: T-009 / UNSAFE-2026-0016 asserts CurrentEL == 1<br/>as a load-bearing post-condition of Phase 2
Note over KE: ── kernel_entry (LOW physical alias; MMU off) ──
KE->>KE: early Pl011Uart at LOW 0x0900_0000 (identity)
KE->>U: write_bytes(b"tyrne: hello from kernel_main\n")
KE->>KE: install VBAR_EL1 (low vectors; T-012)
KE->>KE: mmu_bootstrap() — low-identity MMU on<br/>(T-016 / ADR-0027)
KE->>U: write_bytes(b"tyrne: mmu activated\n")
KE->>KE: high_half_activate() — build TTBR1 tables, EPD1 1→0<br/>(T-022 / ADR-0033; both regimes now live)
KE->>KE: migration trampoline — MSR VBAR-high; ISB;<br/>add sp,sp,OFFSET; br kernel_main_high (PC crosses low→high)
Note over KE: ── kernel_main_high (HIGH half, TTBR1_EL1) ──
KE->>KE: free TTBR0_EL1 (xzr + EPD0=1 + TLBI VMALLE1)
KE->>KE: Pl011Uart + QemuVirtCpu at HIGH device-MMIO alias
KE->>U: write_bytes(b"tyrne: high-half active\n")
KE->>KE: boot_ns = cpu.now_ns() snapshot (post-migration)
KE->>KE: Pmm::new — Physical Memory Manager init<br/>(T-017 / ADR-0035)
KE->>U: write_bytes(b"tyrne: pmm initialized (...)\n")
KE->>KE: AddressSpace arena init — wrap bootstrap L0<br/>(T-018 / ADR-0028; populated-but-uninstalled root post-T-022)
KE->>U: write_bytes(b"tyrne: address-space-arena ready (...)\n")
KE->>KE: task_loader::load_image — embedded raw-flat blob<br/>into a fresh AS (T-019 / ADR-0029; NOT executed)
KE->>U: write_bytes(b"tyrne: image loaded (...)\n")
KE->>KE: GIC init + DAIF.I unmask (T-012; high device-MMIO)
KE->>U: write_bytes(b"tyrne: timer ready (...)")
KE->>KE: kernel-object setup, IPC, scheduler
KE->>KE: start() — never returns
Note over KE: steady state — cooperative IPC demo (high half)
The kernel image is a single contiguous block starting at 0x40080000; RAM below that is reserved for QEMU's internal use. The initial stack is a 64 KiB region reserved at the image's tail.
0x4000_0000 ─── RAM start (reserved for QEMU firmware region)
...
0x4008_0000 ─── _start (.text.boot) ← ELF entry
.text
.rodata
.data
.bss (zeroed by _start)
[reserved 64 KiB] (initial stack region)
__stack_top ─── high end of stack
...
0x4800_0000 ─── end of 128 MiB RAM region
- Code and read-only data (.text, .rodata) are loaded at their physical load addresses (LMA), which are the linked high-half addresses minus KERNEL_HH_OFFSET.
- Initialized data (.data) is loaded from the ELF.
- BSS is zeroed in _start before Rust executes, so all static items in safe Rust see their declared initial values (zero for BSS-resident statics).
- Stack grows downward from __stack_top. Nothing enforces that it does not grow into .bss; stack overflow is undefined behaviour in v1. Guard pages are a follow-on task now that the MMU is active (T-016).
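The numbers in the memory map can be cross-checked with a few lines of host-side Rust. This is a sketch under stated assumptions: the four constants come from the map, but the end-of-.bss address used in main is a made-up value (the real one depends on the build), and the helper names are hypothetical.

```rust
// Physical layout constants taken from the memory map.
const RAM_BASE: u64 = 0x4000_0000;
const RAM_SIZE: u64 = 128 * 1024 * 1024; // 128 MiB
const IMAGE_BASE: u64 = 0x4008_0000;
const STACK_SIZE: u64 = 64 * 1024; // 64 KiB

/// One past the last byte of RAM.
const fn ram_end() -> u64 {
    RAM_BASE + RAM_SIZE
}

/// (bottom, top) of the boot stack, assuming it starts right at the end of .bss.
fn stack_bounds(bss_end: u64) -> (u64, u64) {
    (bss_end, bss_end + STACK_SIZE)
}

/// True while SP is still inside the reserved stack region.
fn sp_in_stack(sp: u64, bounds: (u64, u64)) -> bool {
    let (bottom, top) = bounds;
    sp >= bottom && sp <= top
}

fn main() {
    assert_eq!(ram_end(), 0x4800_0000);
    // The region below the image is 512 KiB.
    assert_eq!(IMAGE_BASE - RAM_BASE, 512 * 1024);

    let bss_end = 0x400A_0000; // hypothetical end of .bss
    let bounds = stack_bounds(bss_end);
    assert_eq!(bounds.1, 0x400B_0000); // would be __stack_top

    assert!(sp_in_stack(bounds.1, bounds)); // initial SP
    assert!(!sp_in_stack(bss_end - 16, bounds)); // SP has run into .bss
}
```

Nothing in v1 performs the sp_in_stack check at runtime; the sketch only shows which comparison a guard page would later enforce in hardware.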
.section .text.boot, "ax"
.global _start
_start:
/* (1) K3-12: mask DAIF before anything else. */
msr daifset, #0xf
/* (2) EL drop per ADR-0024. Read CurrentEL; mask bits[3:2]. */
mrs x0, CurrentEL
and x0, x0, #(3 << 2)
cmp x0, #(2 << 2)
b.eq el2_to_el1 // EL2 → drop to EL1
cmp x0, #(1 << 2)
b.eq post_eret // already at EL1 → skip drop
halt_unsupported_el: // EL3 (or anything else) → halt
wfe
b halt_unsupported_el
el2_to_el1:
mov x0, #(1 << 31) // HCR_EL2.RW = 1 (EL1 = aarch64); E2H/TGE = 0 (non-VHE)
msr hcr_el2, x0
mov x0, #0x3c5 // SPSR_EL2 = EL1h | DAIF masked
msr spsr_el2, x0
adrp x0, post_eret
add x0, x0, :lo12:post_eret
msr elr_el2, x0
eret
post_eret:
/* (3) Conventional setup. From here on, EL is guaranteed = 1. */
adrp x0, __stack_top // page-aligned base of the symbol
add x0, x0, :lo12:__stack_top // add the low 12 bits
mov sp, x0 // set SP
mov x0, #0x300000 // CPACR_EL1.FPEN = 0b11
msr cpacr_el1, x0
isb
adrp x0, __bss_start
add x0, x0, :lo12:__bss_start
adrp x1, __bss_end
add x1, x1, :lo12:__bss_end
0: cmp x0, x1
b.hs 1f
str xzr, [x0], #8
b 0b
1: bl kernel_entry // hand off to Rust
2: wfe // defensive halt if we return
b 2b

adrp + add with :lo12: is the standard aarch64 idiom for "address of symbol": it is PC-relative, so it handles any static layout the linker picks. str xzr, [x0], #8 stores the zero register with post-increment. eret consumes SPSR_EL2's mode + DAIF + register state and ELR_EL2's target address: after the instruction the CPU runs at EL1 with DAIF still masked (the K3-12 mask propagates via SPSR_EL2's DAIF bits, so no second msr daifset is needed at post_eret). The full safety argument lives in UNSAFE-2026-0017.
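The immediates used in the stub can be rebuilt from their bit fields, which is a quick way to review them. The sketch below does that on the host and also models the three-way EL dispatch. The field positions are from the Arm architecture (CurrentEL.EL in bits [3:2], SPSR D/A/I/F in bits [9:6], mode EL1h = 0b0101, HCR_EL2.RW at bit 31, CPACR_EL1.FPEN in bits [21:20]); the BootAction enum and classify_entry_el are hypothetical names introduced for illustration.

```rust
/// What the stub does for each entry exception level.
#[derive(Debug, PartialEq)]
enum BootAction {
    DropToEl1,   // entered at EL2: configure HCR/SPSR/ELR, then eret
    FallThrough, // entered at EL1: nothing to do
    Halt,        // EL3 or anything unexpected: wfe loop
}

/// Mirrors the stub's compare chain on the raw CurrentEL register value.
fn classify_entry_el(current_el: u64) -> BootAction {
    match current_el & (3 << 2) {
        x if x == 2 << 2 => BootAction::DropToEl1,
        x if x == 1 << 2 => BootAction::FallThrough,
        _ => BootAction::Halt,
    }
}

const SPSR_DAIF_MASKED: u64 = 0b1111 << 6; // D, A, I, F all set
const SPSR_MODE_EL1H: u64 = 0b0101; // EL1 using SP_EL1
const SPSR_EL2_VALUE: u64 = SPSR_DAIF_MASKED | SPSR_MODE_EL1H;
const HCR_EL2_RW: u64 = 1 << 31; // EL1 executes in AArch64
const CPACR_EL1_FPEN: u64 = 0b11 << 20; // no FP/SIMD trapping at EL0/EL1

fn main() {
    // The composed values match the immediates in the stub.
    assert_eq!(SPSR_EL2_VALUE, 0x3c5);
    assert_eq!(HCR_EL2_RW, 0x8000_0000);
    assert_eq!(CPACR_EL1_FPEN, 0x30_0000);

    // CurrentEL reads 0b1000 at EL2, 0b0100 at EL1, 0b1100 at EL3.
    assert_eq!(classify_entry_el(0b1000), BootAction::DropToEl1);
    assert_eq!(classify_entry_el(0b0100), BootAction::FallThrough);
    assert_eq!(classify_entry_el(0b1100), BootAction::Halt);
}
```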
bsp-qemu-virt/linker.ld pins the above memory map:
- ENTRY(_start_phys): the ELF's e_entry is set to _start_phys (= _start - KERNEL_HH_OFFSET), the LOW physical address of _start, so QEMU's reset PC is physical (the MMU is off at reset; the high VMA would translation-fault immediately). This matches the link-high/load-low migration described in §"High-half migration" and ADR-0033.
- Link-high / load-low (ADR-0033). Three constants pin the split: KERNEL_HH_OFFSET = 0xFFFF_FFFF_0000_0000, KERNEL_IMAGE_PHYS_BASE = 0x40080000, and KBASE = KERNEL_HH_OFFSET + KERNEL_IMAGE_PHYS_BASE (= 0xFFFF_FFFF_4008_0000). Virtual addresses start at . = KBASE; each section sets its load address low via AT(ADDR(.section) - KERNEL_HH_OFFSET), so the whole image is one uniform high-half alias of the physical image loaded at 0x40080000. (There is no MEMORY {} block; the single 128 MiB region is expressed directly with KBASE + AT().)
- .text starts with KEEP(*(.text.boot)) so _start is first (VMA KBASE, LMA 0x40080000, which is where QEMU loads it and where it runs with the MMU off), followed by the 2 KiB-aligned KEEP(*(.text.vectors)) exception-vector table (VBAR_EL1 requires 2 KiB alignment).
- .bss is 8-byte aligned at both ends so the BSS-zero loop can step by 8.
- A 64 KiB stack region is reserved after .bss; __stack_top names its high end.
- /DISCARD/ drops .comment, .note.*, .eh_frame*, and .gcc_except_table*; unwinding tables are dead weight under panic=abort.
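The requirement that .bss be 8-byte aligned at both ends can be demonstrated with a host-side model of the zeroing loop. This is an illustrative sketch, not kernel code: zero_bss_model is a hypothetical function that mirrors the stub's "compare, store 8 bytes, advance" loop over a byte buffer, with offsets standing in for __bss_start and __bss_end.

```rust
/// Model of the stub's loop: while p < end, store 8 zero bytes at p and
/// advance p by 8. Returns the final value of p, which is the first byte
/// the loop did not touch.
fn zero_bss_model(mem: &mut [u8], start: usize, end: usize) -> usize {
    let mut p = start;
    while p < end {
        mem[p..p + 8].fill(0);
        p += 8;
    }
    p
}

fn main() {
    // Aligned case: the range [8, 24) is a whole number of 8-byte words.
    let mut mem = [0xAAu8; 32];
    let stop = zero_bss_model(&mut mem, 8, 24);
    assert_eq!(stop, 24);
    assert!(mem[8..24].iter().all(|&b| b == 0));
    assert!(mem[24..].iter().all(|&b| b == 0xAA)); // nothing past the end touched

    // Misaligned end: [8, 20) is 12 bytes, so the second store spills over.
    let mut mem = [0xAAu8; 32];
    let stop = zero_bss_model(&mut mem, 8, 20);
    assert_eq!(stop, 24); // the loop overshot the end by 4 bytes
    assert!(mem[20..24].iter().all(|&b| b == 0)); // bytes outside the range were zeroed
}
```

In the real image the bytes after .bss belong to the stack region, so an overshoot would be harmless at boot, but the linker alignment removes the need to reason about that at all.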
When kernel_entry, the scheduler, or any later kernel code panics, control reaches the BSP's #[panic_handler] function. In Phase 4c, that handler:
- Reconstructs the Pl011Uart (the original instance may not be reachable from the panic context).
- Writes a short marker ("\n!! tyrne panic !!\n").
- Writes the panic message using FmtWriter adapted onto the Console.
- Halts in a spin_loop that never returns.
This is the minimum useful panic reporting. Future revisions will add core id, register state, and a backtrace — each requires additional infrastructure that is not in v1.
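The reporting steps can be sketched on the host with a mock console. The names Console, FmtWriter, and write_bytes appear in this document, but their real definitions are not shown here, so the trait shape, the adapter, MockConsole, and report_panic below are all assumptions made for illustration. The final halt is left out so the sketch can return.

```rust
use core::cell::RefCell;
use core::fmt::{self, Write};

/// Assumed shape of the console trait: a byte sink taking &self.
trait Console {
    fn write_bytes(&self, bytes: &[u8]);
}

/// Assumed adapter: lets core::fmt formatting target any Console.
struct FmtWriter<'a, C: Console>(&'a C);

impl<C: Console> Write for FmtWriter<'_, C> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.0.write_bytes(s.as_bytes());
        Ok(())
    }
}

/// Host-only stand-in for the UART: collects everything written to it.
struct MockConsole {
    out: RefCell<Vec<u8>>,
}

impl Console for MockConsole {
    fn write_bytes(&self, bytes: &[u8]) {
        self.out.borrow_mut().extend_from_slice(bytes);
    }
}

/// Hypothetical body of the handler, minus the final spin loop.
fn report_panic<C: Console>(console: &C, message: &dyn fmt::Display) {
    console.write_bytes(b"\n!! tyrne panic !!\n"); // marker first
    let mut writer = FmtWriter(console);
    // Formatting errors are ignored: there is nowhere left to report them.
    let _ = writeln!(writer, "{}", message);
}

fn main() {
    let console = MockConsole { out: RefCell::new(Vec::new()) };
    report_panic(&console, &"boom");
    let out = console.out.borrow();
    assert_eq!(out.as_slice(), &b"\n!! tyrne panic !!\nboom\n"[..]);
}
```

In a real handler the message argument would be the PanicInfo value, which implements Display, and the function would end in the halt loop instead of returning.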
Properties the boot flow maintains. These are the claims a reader can rely on and a test can exercise.
- Entry is deterministic. _start always runs the same sequence of instructions on the same input.
- Interrupts are masked from the very first instruction. K3-12: MSR DAIFSet, #0xf is the literal first instruction at _start. The mask carries through the EL drop via SPSR_EL2's DAIF bits, so it is still in effect at kernel_entry. Tasks unmask explicitly via Cpu::restore_irq_state(IrqState(0)) when they need interrupts.
- kernel_entry runs at EL1 unconditionally. Per ADR-0024: if the BSP is delivered at EL2, _start's drop sequence transitions to non-VHE EL1; if delivered at EL1, the drop is a no-op; if delivered at EL3 (no v1 hardware target does), _start halts loudly. T-009's UNSAFE-2026-0016 runtime check inside QemuVirtCpu::new is the post-condition that pins this.
- The stack is set before any Rust code runs. No Rust code executes with an undefined SP.
- BSS is zero when Rust sees it. All static items in safe Rust have their declared initial values.
- kernel_entry never runs more than once. There is only one boot CPU in v1; it calls kernel_entry once.
- kernel_entry never returns to the asm stub. It is -> !; a return would be a bug and is defensively halted by the stub.
- Hardware MMIO addresses are hardcoded. No runtime discovery. BSP-specific; justified because virt is a fixed platform.
- panic=abort, not unwind. No unwinding tables in the binary; panics halt.
- EL drop is boot.s-side, not kernel-side. ADR-0024 Option A: the kernel reasons about exactly one EL (EL1, non-VHE) and boot.s does the work of getting there. The alternative (multi-EL kernel code) was rejected because the maintenance tax compounds across every later HAL impl. The cost is ~30 lines of asm in _start.
- DTB ignored. Convenient now; will need explicit parsing when the first board with runtime topology (Pi 4) lands.
- Stack is a fixed 64 KiB with no guard page. Overflow is UB. Good enough for v1; per-task stacks with guards come with the scheduler.
- _start is hand-written assembly. Every BSP will have its own. A shared-boot library would force premature commonality; we accept the duplication to keep each BSP's boot transparent.
- Hardcoded UART base. 0x0900_0000 is QEMU virt specific. Each BSP carries its own constants; the trade is deliberate (see P6, HAL separation).
- EL3 → EL2 → EL1 chain. v1 hardware targets do not boot at EL3; if a future BSP requires it, a follow-up task adds the EL3→EL2 transition on top of the existing EL2→EL1 logic per ADR-0024 §Open questions.
- DTB parsing and BootInfo. The kernel's typed boot-info contract, probably introduced with Pi 4 support.
- Multi-core start. PSCI CPU_ON for secondary cores.
- High-half kernel migration. Resolved (T-022 / ADR-0033, 2026-05-30): the kernel now runs in TTBR1_EL1 and TTBR0_EL1 is freed for per-task userspace (see §"High-half migration" above). v1 keeps the whole high-half RAM window PXN = 0 (RWX-equivalent); per-section W^X hardening (a distinct PXN = 1 physmap) is deferred to ADR-0034.
- Guard-page stacks. With the MMU now active (T-016), guard-page stacks become reachable, pending a follow-on B-phase task that remaps a stack region's bottom page as invalid.
- Measured boot / attestation. Hardware-dependent; deferred per ADR-0012.
- ADR-0012: Boot flow and memory layout for bsp-qemu-virt.
- ADR-0024: EL drop to EL1 policy.
- ADR-0004: Target platforms.
- ADR-0006: Workspace layout.
- hal.md: the HAL traits the BSP implements.
- overview.md: three-layer architecture.
- docs/guides/run-under-qemu.md: how to actually run the kernel.
- QEMU virt machine documentation: https://qemu.readthedocs.io/en/latest/system/arm/virt.html
- ARM Architecture Reference Manual (ARMv8-A): adrp / ERET / EL semantics.
- PL011 UART documentation, for the console implementation.