0027 — Kernel virtual memory layout (B2 — identity-mapped MMU activation)

  • Status: Accepted
  • Date: 2026-05-08
  • Deciders: @cemililik

Context

Phase B2 activates the MMU. Until this milestone the kernel runs with translation off — every address is a physical address, the data and instruction caches operate in their post-reset (mostly disabled) state, and there is no way to differentiate device-MMIO accesses from normal-RAM accesses at the architectural level. The next milestones (B3 onwards) want a programmable address-space surface for capability-mediated grants, MMIO that obeys the device-attribute discipline, and (eventually, in B5) a per-task user half. None of that is reachable while SCTLR_EL1.M = 0.

The B2 milestone scope, per phase-b.md §B2, is "MMU activation (kernel-half mapping)". The phase-plan note explicitly enumerates three sub-decisions ADR-0027 must settle:

  1. Identity vs. high-half split. Where does the kernel image live in virtual address space the moment the MMU comes on? Identity-mapped at its physical load address, or relocated to a high-half base (ARM convention 0xFFFF_FFFF_8000_0000+)?
  2. Memory-type attributes. MAIR_EL1 carries up to eight named attribute encodings; the kernel must commit to which indices represent which memory types — at minimum normal cached for RAM and device-nGnRnE for MMIO — so that page-table entries can encode the right AttrIndx value.
  3. TLB-invalidation discipline. Every mapping mutation needs a matching TLB invalidate. Forgetting that step is a class-of-bug that produces stale-translation hazards which only surface under load. The HAL's Mmu trait currently returns Result<(), MmuError> from map / unmap — leaving the did-the-caller-flush? question to reviewer judgement. Should the trait surface make the responsibility unmissable?

The MMU-activation moment is itself a multi-step state-machine transition: the kernel runs at PA 0x4008_0000 with SCTLR_EL1.M = 0; we build page tables; we configure MAIR_EL1, TCR_EL1, TTBR0_EL1; we set SCTLR_EL1.M = 1; from the next instruction onwards the PC and every load go through the TLB. Getting any of those steps in the wrong order produces an instruction-fetch fault on the very next instruction — a class of hazard the 2026-05-06 B1 smoke regression taught the project to walk through with a §Simulation table before Accept. ADR-0027 is the first non-recovery-primitive state-machine ADR drafted under the write-adr skill §Simulation discipline — ADR-0026's table was the empirical retro-source for the rule (it landed alongside the rule's codification); ADR-0032's Propose commit was the first ADR drafted under the rule but its subject is a recovery primitive (the ipc_cancel_recv error-rollback path), so the rule's "multi-step state machine in the productive design" target lands here for the first time.

The decision is load-bearing for the next four ADRs the phase-b ledger reserves: ADR-0028 (address-space data structure) inherits the AddressSpace shape from this ADR's TTBR / page-table topology; ADR-0029 (initial userspace image format) inherits the kernel-vs-user VA boundary settled here; ADR-0030 (syscall ABI) inherits the page-fault / capability-grant story; ADR-0031 / future MMU follow-ups (ASID assignment, copy-on-write, huge pages) all build on the same layout.

Out of scope of ADR-0027 (deferred by reference, not relitigated): per-page flag updates (ADR-0009 §Open questions), huge-page block mappings as a first-class trait surface (block descriptors at L2 are used during bootstrap only), multi-core TLB shootdown, ASID assignment, copy-on-write, and translation-walk queries.

Decision drivers

  • Methodical pace. The project's standing rule (CLAUDE.md non-negotiable #6) is "minimum required surface per milestone". B2 needs MMU on with caching enabled and MMIO type-aware; it does not need userspace context-switch readiness. Choices that defer userspace-shape complexity are preferred when they are reversible without code surgery.
  • Future-userspace compatibility without user code today. B5 introduces userspace; that ADR (currently reserved as ADR-0030 for syscall ABI; the high-half migration would be a separate ADR slot) needs to swap TTBR0_EL1 per task. The B2 layout must not paint the project into a corner where adding the user half later requires rewriting the kernel layout.
  • Bounded unsafe surface. Page-table writes, system-register writes, and TLB invalidation all add to the audit-log. The choice of layout determines how many sites we audit; complex layouts (high-half + identity + transition + teardown) audit more sites than simple ones (identity-only).
  • Reproducible bootstrap. Boot-time page tables must live in a predictable, statically-sized region (ADR-0009's "frame allocation is the kernel's responsibility" principle, applied at the bootstrap moment when there is no PMM yet). The layout decision determines how many bootstrap frames are needed.
  • Standard practice across reference kernels. Linux, FreeBSD, NetBSD, and seL4 all enable the MMU early in boot and use a high-half kernel; Hubris (Cortex-M, no MMU) is not directly comparable. The "identity for now, high-half later" path is the Linux 0.x → 1.x evolution, also seen in NetBSD's evbarm port; it is well-trodden if we choose it.
  • Type-system safety for the mutation discipline. Rust gives us the #[must_use] attribute and unique-lifetime-bound types; we can compile-fail callers who forget to flush after a mapping mutation, instead of relying on reviewer attention. The cost is one new type per HAL trait return; the win is "TLB-invalidation forgotten" becoming a class of bug the type system rejects.
  • Compatibility with ADR-0009's Mmu trait. v1 trait surface: create_address_space / address_space_root / activate / map / unmap / invalidate_tlb_address / invalidate_tlb_all. ADR-0027 cannot break this surface; it can extend the return types of map / unmap and add helper types, but cannot retract or rename existing methods.

Considered options

  1. Option A — Identity-only mapping; no high-half migration in B2; Mmu::map / unmap keep their Result<(), MmuError> shape (no flush token). Smallest possible B2 surface. MMU is enabled; kernel image and MMIO are mapped at their physical addresses; TTBR1_EL1 left zero. TLB-invalidation discipline stays in code-comments and reviewer attention.
  2. Option B — Identity-only mapping; no high-half migration; Mmu::map / unmap return a typed MapperFlush token that the caller must .flush(mmu) or .ignore(). Same layout as Option A but adds the type-system-enforced flush discipline.
  3. Option C — Identity-mapped during bootstrap, jump to kernel high-half (0xFFFF_FFFF_8008_0000+), tear down identity in TTBR0_EL1; MapperFlush token included. The "Linux-on-aarch64" canonical shape, applied at B2.
  4. Option D — Identity-only mapping with the flush token; defer the high-half decision to a clearly-named future ADR (e.g., ADR-0033 "Kernel high-half migration"); commit to the layout shape that supports both today's identity model AND tomorrow's high-half migration without re-doing this ADR's body. Scoped middle ground: settles B2 at identity, settles the flush-token discipline (HAL surface change is permanent regardless of which way we go on identity-vs-high-half), and explicitly forward-flags the future migration as a known-and-named follow-up rather than an unknown-unknown. This is the chosen option (see Decision outcome below).

Decision outcome

Chosen option: Option D — Identity-only mapping for B2, MapperFlush typed flush-token discipline, high-half migration deferred to a clearly-named future ADR (ADR-0033 placeholder; opens when B5 userspace work surfaces the per-task TTBR0_EL1 swap requirement).

The decision splits the three sub-questions of the §Context:

(a) Layout — identity, with TTBR1 reserved

  • Translation regime. 4 KiB granule, 48-bit virtual addresses, four-level translation (L0 → L1 → L2 → L3). Locked by ADR-0009's PAGE_SIZE = 4096 constant and the QEMU virt Cortex-A72 target's 40-bit physical-address support (more than enough for 128 MiB of v1 RAM; future Pi 4 has 4–8 GiB and stays inside the 40-bit IPS range).
  • TTBR0_EL1 holds the bootstrap identity table. Identity-maps two regions:
    • 0x4000_0000 .. 0x4800_0000 (128 MiB RAM) as normal cached memory.
    • 0x0800_0000 .. 0x0920_0000 (18 MiB, 9 × 2 MiB block descriptors covering the GIC distributor + GIC CPU interface + PL011 UART; the block at 0x0900_0000..0x0920_0000 is an aligned superset of the UART's MMIO range, which actually ends near 0x0902_0000 — the extra slack falls in unmapped-but-decoded device space and is harmless under device-nGnRnE) as device-nGnRnE memory.
  • TTBR1_EL1 = 0 (effectively disabled by TCR_EL1.EPD1 = 1). Kernel runs at TTBR0_EL1-mapped identity addresses for the entire B2 milestone. The reservation is structural, not active.
  • MAIR_EL1. Two attribute encodings:
    • Index 0: device-nGnRnE (0x00) — non-Gathering, non-Reordering, non-Early-write-acknowledgement; the strictest device-memory mode, appropriate for GIC + UART control registers.
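    • Index 1: normal memory, inner and outer write-back, read- and write-allocate cacheable (0xFF); the mainline RAM attribute. Shareability is not part of the MAIR encoding; it comes from the descriptor's SH field (and TCR_EL1.SH0 for table walks).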
    • Indices 2–7 reserved (zero-initialised) for future ADR extension (e.g., index 2 = device-GRE for less-strict device regions; index 3 = normal-uncached for MMIO-DMA buffers).
  • TCR_EL1. T0SZ = 16 (48-bit VA), TG0 = 0b00 (4 KiB granule), IPS = 0b010 (40-bit IPA), SH0 = 0b11 (inner shareable), IRGN0 = ORGN0 = 0b01 (write-back write-allocate cacheable for page-table walks), EPD0 = 0 (translations enabled), EPD1 = 1 (TTBR1_EL1 walks disabled in v1; flipped in the future high-half ADR).
  • SCTLR_EL1. M = 1 (MMU on), C = 1 (D-cache enabled), I = 1 (I-cache enabled). All other bits left at their reset / pre-existing values.
  • ASID. Single global address space in v1; TCR_EL1.AS = 0 (the AS field selects the ASID size: 0 = 8-bit ASIDs, 1 = 16-bit; v1 selects 8-bit and leaves TTBR0_EL1.ASID = 0 as the only ASID value used). TCR_EL1.A1 = 0 (the A1 field selects which TTBR holds the ASID: 0 = TTBR0_EL1.ASID, 1 = TTBR1_EL1.ASID; v1 keeps A1 = 0 because only TTBR0_EL1 is active and the TTBR1_EL1-swap discipline does not apply yet). All bootstrap mappings are global (ARM nG bit clear → MappingFlags::GLOBAL semantics; MappingFlags::GLOBAL already exists in v1 per ADR-0009, so no HAL surface change is needed here). Per-task ASID value assignment lands with the future high-half ADR (ADR-0033), which will additionally need to flip A1 = 1 if the user half moves to a TTBR1_EL1-swap shape; that decision is the high-half ADR's, not this ADR's.
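
The MAIR_EL1 and TCR_EL1 values listed above can be composed from their individual fields. The following is a minimal sketch, assuming the standard AArch64 field positions for both registers; the function and constant names are hypothetical and are not taken from the Tyrne source.

```rust
// Illustrative sketch only: names are hypothetical; bit positions follow the
// architectural MAIR_EL1 / TCR_EL1 layouts.

/// Attribute byte for MAIR index 0 (device-nGnRnE).
const MAIR_DEVICE_NGNRNE: u64 = 0x00;
/// Attribute byte for MAIR index 1 (normal, write-back cacheable).
const MAIR_NORMAL_WB: u64 = 0xFF;

/// Packs one attribute byte per index; indices 2..=7 stay zero (reserved).
pub const fn mair_el1_value() -> u64 {
    MAIR_DEVICE_NGNRNE | (MAIR_NORMAL_WB << 8)
}

/// Builds the TCR_EL1 value for the v1 identity layout described above.
pub const fn tcr_el1_value() -> u64 {
    let t0sz: u64 = 16; // bits [5:0]: 64 - 16 = 48-bit VA under TTBR0_EL1
    let epd0: u64 = 0; // bit 7: TTBR0_EL1 walks enabled
    let irgn0: u64 = 0b01; // bits [9:8]: inner write-back write-allocate walks
    let orgn0: u64 = 0b01; // bits [11:10]: outer write-back write-allocate walks
    let sh0: u64 = 0b11; // bits [13:12]: inner shareable walks
    let tg0: u64 = 0b00; // bits [15:14]: 4 KiB granule
    let a1: u64 = 0; // bit 22: ASID taken from TTBR0_EL1
    let epd1: u64 = 1; // bit 23: TTBR1_EL1 walks disabled in v1
    let ips: u64 = 0b010; // bits [34:32]: 40-bit physical addresses
    let asid16: u64 = 0; // bit 36 (AS): 8-bit ASIDs

    t0sz | (epd0 << 7)
        | (irgn0 << 8)
        | (orgn0 << 10)
        | (sh0 << 12)
        | (tg0 << 14)
        | (a1 << 22)
        | (epd1 << 23)
        | (ips << 32)
        | (asid16 << 36)
}
```

Keeping each field as a named local makes the value reviewable against the bullet list field by field; the TTBR1-side fields (T1SZ, TG1, and so on) are left at zero in this sketch because EPD1 = 1 disables those walks.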

(b) Memory-type discipline — MAIR + MappingFlags

  • MappingFlags::DEVICE (already in ADR-0009) maps to MAIR index 0 (device-nGnRnE). The BSP's Mmu::map implementation is responsible for translating flags.contains(DEVICE) into the right AttrIndx value in the page-table entry.
  • Normal RAM is the implicit default when DEVICE is not set: MAIR index 1.
  • This is a one-bit discrimination today. When richer device modes (write-combining, non-cacheable RAM for DMA buffers) are needed, a future ADR introduces a MemoryType enum field on MappingFlags (or a new mapping_type parameter on Mmu::map) and adds the corresponding MAIR indices. Until then, the BSP's translation table is DEVICE → 0, !DEVICE → 1.
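
The one-bit DEVICE → 0, !DEVICE → 1 translation could look roughly like the sketch below. The `MappingFlags` type here is a hypothetical stand-in for the ADR-0009 type (the bit position of `DEVICE` is an assumption), and `attr_indx_bits` is an illustrative helper name.

```rust
/// Hypothetical stand-in for the ADR-0009 `MappingFlags` type; only the
/// `DEVICE` bit matters here, and its bit position is an assumption.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct MappingFlags(u32);

impl MappingFlags {
    pub const EMPTY: Self = Self(0);
    pub const DEVICE: Self = Self(1 << 0);

    /// True when every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

/// MAIR index for device-nGnRnE memory.
pub const MAIR_IDX_DEVICE: u64 = 0;
/// MAIR index for normal cached memory.
pub const MAIR_IDX_NORMAL: u64 = 1;

/// Returns the AttrIndx field already shifted into descriptor bits [4:2].
pub const fn attr_indx_bits(flags: MappingFlags) -> u64 {
    let index = if flags.contains(MappingFlags::DEVICE) {
        MAIR_IDX_DEVICE
    } else {
        MAIR_IDX_NORMAL
    };
    index << 2
}
```

A later `MemoryType` discriminant would replace the `if` with a `match` without disturbing indices 0 and 1.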

(c) Mutation discipline — MapperFlush typed token

  • Mmu::map changes its return type from Result<(), MmuError> to Result<MapperFlush, MmuError>. Mmu::unmap returns Result<(MapperFlush, PhysFrame), MmuError>, preserving the unmapped frame the current API returns.
  • MapperFlush is a #[must_use] newtype carrying a VirtAddr that is consumed by either flush(mmu: &impl Mmu) (which executes mmu.invalidate_tlb_address(va) on the held address) or ignore() (a documented no-op for callers performing bulk operations who will issue a single invalidate_tlb_all afterwards). Forgetting to handle the token is an unused_must_use lint failure; the project's workspace lint config promotes this to a deny in the kernel crate.
  • Escape hatches are deliberate, documented, and rare. mem::forget(token), ManuallyDrop::new(token), and let _ = expr silence the #[must_use] lint without invoking flush() or ignore(); they exist because Rust gives no way to make a token literally impossible to drop. The discipline is enforced at the type level (the lint catches the common-case "I forgot"), not at the language level (motivated escape is always available). Mirrors the x86_64::structures::paging::MapperFlush precedent. Reviewers should challenge any mem::forget / ManuallyDrop / let _ = ... against MapperFlush in code review; the project does not currently use them anywhere.
  • The token does not bind the minting Mmu instance. MapperFlush::flush(self, mmu: &impl Mmu) accepts any Mmu implementation; in v1 this is fine because there is exactly one QemuVirtMmu instance. When B3+ introduces multiple per-task AddressSpace values (and multi-CPU in Phase C introduces per-core Mmu instances) the token shape may need to grow a lifetime / instance-identity parameter — flagged here for the future trait extension, not a v1 concern. The B3 / B5 / Phase-C ADRs that introduce those topologies will revisit.
  • The surface is additive in the ADR-0017 sense: the existing Mmu::activate / invalidate_tlb_address / invalidate_tlb_all methods stay byte-stable; only map / unmap return-types grow. ADR-0009 §Revision notes records the additive change. No callers in v1 use these methods yet (B2 is the first consumer); the API breakage cost is zero callers today.
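
The token shape described in this section can be sketched as follows. This is a host-side illustration under stated assumptions: the `Mmu` trait is reduced to the two invalidation methods, their `&self` receivers are assumed, `VirtAddr` is a hypothetical newtype, and `RecordingMmu` is an invented test double that uses `std` (the kernel crate itself would not).

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for the HAL's virtual-address type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct VirtAddr(pub u64);

/// Reduced stand-in for the ADR-0009 trait; receivers are an assumption.
pub trait Mmu {
    fn invalidate_tlb_address(&self, va: VirtAddr);
    fn invalidate_tlb_all(&self);
}

/// Token returned by a mapping mutation; dropping it unhandled trips the
/// `unused_must_use` lint.
#[must_use = "a mapping changed: call .flush(mmu) or .ignore()"]
pub struct MapperFlush(VirtAddr);

impl MapperFlush {
    pub fn new(va: VirtAddr) -> Self {
        Self(va)
    }

    /// Invalidates the TLB entry for the held address.
    pub fn flush(self, mmu: &impl Mmu) {
        mmu.invalidate_tlb_address(self.0);
    }

    /// Deliberate no-op for bulk callers that follow up with
    /// `invalidate_tlb_all`.
    pub fn ignore(self) {}
}

/// Invented test double that records which invalidations were requested.
#[derive(Default)]
pub struct RecordingMmu {
    pub single: RefCell<Vec<VirtAddr>>,
    pub all: Cell<u32>,
}

impl Mmu for RecordingMmu {
    fn invalidate_tlb_address(&self, va: VirtAddr) {
        self.single.borrow_mut().push(va);
    }

    fn invalidate_tlb_all(&self) {
        self.all.set(self.all.get() + 1);
    }
}
```

Because `flush` and `ignore` both take `self` by value, a token can be discharged exactly once; the bulk path calls `ignore()` on each token and then issues one `invalidate_tlb_all`.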

Why Option D beats the alternatives

  • Beats Option A: Option A skips the flush token, leaving TLB-invalidation discipline to reviewer attention. The 2026-05-06 B1 smoke regression's "what we learned" lesson (codified in the §Simulation rule) is type-system-enforce the discipline where you can; the token is the type-system-side enforcement of the same discipline at the MMU surface. Option A is fast-to-write but gives up a free correctness win.
  • Beats Option B: Option B is almost this option — same layout, same flush token — but does not name the future high-half migration explicitly. Option D adds the named-future-ADR forward-flag so a B5 reader does not need to reverse-engineer "wait, where does kernel live in user-half VA?" by reading commit history. The cost is one paragraph of documentation (this section); the win is reader-affordance for the next year of the project.
  • Beats Option C: Option C lands the high-half migration now, in B2. The implementation cost is significant (linker-script AT > RAM discipline, two-stage early-boot stub, identity teardown after jump-to-high-half, and more unsafe audit entries — minimum 4 new entries vs Option D's minimum 2). The benefit (no future ADR) is real but premature: B2's userspace surface is empty; B3 / B4 work does not need the high-half. Per CLAUDE.md non-negotiable #6 ("methodical, phased progress"), B5 is the natural moment to introduce high-half because that is when TTBR0_EL1-swap becomes load-bearing. Doing it in B2 pre-pays the cost without obtaining the benefit until B5.

Simulation

The MMU-activation moment is the worst-case interaction. The table walks the kernel from "MMU off, running at PA 0x4008_0000" to "MMU on, running at the same identity-mapped VA, with caching active" under the chosen Option D shape:

| Step | State pre | Action | State post | Switch target / observable effect |
| --- | --- | --- | --- | --- |
| 0 | SCTLR_EL1.M = 0; PC at PA 0x4008_NNNN; caches off; TTBR0_EL1 undefined; MAIR_EL1 undefined; TCR_EL1 undefined | Reserved page-table frames at PAs __boot_pt_l0 .. __boot_pt_l2_high exist in .boot_pt (statically allocated, pre-zeroed by _start's BSS loop because .boot_pt is bracketed by __bss_start/__bss_end); kernel begins mmu_bootstrap Rust function | unchanged | none |
| 1 | as Step 0; bootstrap page-table frames are zero | Populate L0[0] = table-pointing-at-L1 (with Type=table, Valid=1); L1[0] = table-pointing-at-L2_low (for MMIO range); L1[1] = table-pointing-at-L2_high (for RAM range); L2_low[64..73] = 9 block-descriptors covering 0x0800_0000..0x0920_0000 (GIC distributor + GIC CPU interface across indices 64..72; PL011 UART at index 72) with AttrIndx=0 (device-nGnRnE), AP=00 (kernel R/W), SH=00 (non-shareable for device), AF=1, nG=0; L2_high[0..64] = 64 block-descriptors covering 0x4000_0000..0x4800_0000 with AttrIndx=1 (normal cached), AP=00, SH=11 (inner shareable), AF=1, nG=0 | All 4 bootstrap frames populated; PC still at PA; MMU still off | none (range notation [start..end] is half-open Rust-style throughout this ADR; [64..73] means indices 64, 65, …, 72, i.e. 9 entries) |
| 2 | bootstrap frames populated; MMU off | MSR MAIR_EL1, mair_value (encoding device + normal); MSR TCR_EL1, tcr_value (T0SZ=16, TG0=0, IPS=2, SH0=3, IRGN0/ORGN0=1, EPD0=0, EPD1=1); MSR TTBR0_EL1, &__boot_pt_l0; MSR TTBR1_EL1, 0; ISB | system regs configured; MMU still off (SCTLR.M=0) | ISB ensures the system-register writes are observed before any MMU enable |
| 3 | system regs configured; MMU off | TLB invalidate (TLBI VMALLE1) + DSB ISH (ensure invalidate completes) + IC IALLU (invalidate I-cache) + DSB ISH + ISB; then MRS x0, SCTLR_EL1; set bits M, I, C (and clear bits we explicitly want zero); MSR SCTLR_EL1, x0; ISB | SCTLR_EL1.M = 1; ISB drains the pipeline so the next instruction-fetch goes through the freshly-installed translation regime; PC still at PA 0x4008_NNNN (which is identity-mapped, so the translation walks succeed and yield the same PA) | Critical step: any error here (typo'd page-table entry, wrong attribute index, off-by-one VA range, missing AF access flag, EPD0 accidentally 1) produces a Translation Fault on the very next instruction-fetch, caught by either a CPU exception (if the vectors are installed and the fault is a kernel-mode synchronous exception) or by QEMU -d int,unimp,guest_errors reporting a fault. The §Simulation table is itself the list of things to triple-check before flipping the bit. |
| 4 | MMU on; PC at identity-mapped PA; caches on; bootstrap mappings live | mmu_bootstrap returns to its caller, kernel_entry; rest of kernel proceeds with MMU active | unchanged | The Rust-side kernel from this point onwards observes (a) memory accesses to the RAM range have the cache attributes for normal-cached, write-back, write-allocate; (b) accesses to GIC + UART have device-nGnRnE semantics, no speculative read, no merging; (c) any subsequent Mmu::map / unmap returns a MapperFlush token the caller must explicitly discharge. |
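
Step 1's descriptor arithmetic can be checked on the host with a sketch like the one below. It assumes the standard VMSAv8-64 stage-1 block-descriptor layout at L2 (2 MiB blocks); `block_descriptor` mirrors the helper signature named in this ADR, while `l1_index`, `l2_index`, and `fill_identity_blocks` are hypothetical names. Execute-never bits are not modelled because the table above does not specify them.

```rust
/// One L2 block descriptor covers 2 MiB with a 4 KiB granule.
const BLOCK_SIZE: u64 = 2 * 1024 * 1024;

/// Index into the L1 table (each entry covers 1 GiB).
pub const fn l1_index(va: u64) -> usize {
    ((va >> 30) & 0x1FF) as usize
}

/// Index into an L2 table (each entry covers 2 MiB).
pub const fn l2_index(va: u64) -> usize {
    ((va >> 21) & 0x1FF) as usize
}

/// Encodes an L2 block descriptor from its fields.
pub const fn block_descriptor(
    pa: u64,
    attr_index: u64,
    ap: u64,
    sh: u64,
    af: bool,
    ng: bool,
) -> u64 {
    let out_addr = pa & 0x0000_FFFF_FFE0_0000; // output address bits [47:21]
    out_addr
        | 0b01 // bits [1:0]: valid = 1, type = block
        | ((attr_index & 0b111) << 2) // AttrIndx, bits [4:2]
        | ((ap & 0b11) << 6) // AP[2:1], bits [7:6]
        | ((sh & 0b11) << 8) // SH, bits [9:8]
        | ((af as u64) << 10) // access flag
        | ((ng as u64) << 11) // not-global
}

/// Identity-maps the half-open range `start..end` with 2 MiB blocks,
/// using AP=00, AF=1, nG=0 as in Step 1.
pub fn fill_identity_blocks(
    table: &mut [u64; 512],
    start: u64,
    end: u64,
    attr_index: u64,
    sh: u64,
) {
    let mut pa = start;
    while pa < end {
        table[l2_index(pa)] = block_descriptor(pa, attr_index, 0b00, sh, true, false);
        pa += BLOCK_SIZE;
    }
}
```

Filling `0x0800_0000..0x0920_0000` with `attr_index = 0` touches L2_low indices 64 through 72, and filling `0x4000_0000..0x4800_0000` with `attr_index = 1` touches L2_high indices 0 through 63, matching the counts in the table.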

Why DSB ISH rather than DSB NSH in Step 3? ISH (Inner Shareable) drains across all cores in the inner-shareable domain; NSH (Non-Shareable) drains only the local core. v1 is single-core, so NSH would be functionally sufficient and marginally cheaper. The ISH choice is forward-compatible with the eventual SMP boot (ADR-0008 §Open questions; Phase C): when secondary cores come up, the same boot sequence runs on each core, and the TLBI VMALLE1 + DSB ISH + IC IALLU pattern naturally extends to cross-core invalidation without re-litigating the barrier scope. The cost of ISH over NSH on a single-core boot is sub-microsecond and unmeasurable; the win is "single barrier discipline across boot and SMP" rather than "v1 uses NSH, then we re-write to ISH for Phase C". Standard practice on Linux aarch64 (arch/arm64/mm/proc.S) for the same reason.

The 5-step shape is identical to the __cpu_setup / __primary_switch flow of Linux's aarch64 boot, modulo the high-half jump (which Option D omits in v1). Simulation rows 2 and 3 are the two failure-class moments; row 4 documents the steady-state contract that B3+ MMU work inherits.
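
The read-modify-write of SCTLR_EL1 in Step 3 reduces to a pure bit operation that can be tested off-target. A minimal sketch, assuming the architectural bit positions for M, C, and I; the function name is hypothetical, and the "bits we explicitly want zero" are left untouched because this ADR does not enumerate them.

```rust
/// SCTLR_EL1.M: stage-1 MMU enable.
pub const SCTLR_M: u64 = 1 << 0;
/// SCTLR_EL1.C: data cacheability enable.
pub const SCTLR_C: u64 = 1 << 2;
/// SCTLR_EL1.I: instruction cacheability enable.
pub const SCTLR_I: u64 = 1 << 12;

/// Computes the value to write back after `MRS x0, SCTLR_EL1`.
/// All bits other than M, C, and I keep their previous value.
pub const fn sctlr_with_mmu_enabled(current: u64) -> u64 {
    current | SCTLR_M | SCTLR_C | SCTLR_I
}
```

On target, the result would be written with MSR SCTLR_EL1 followed by the ISB from Step 3; only the value computation is shown here.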

Dependency chain

For this decision to be fully in effect:

1. Extend [`hal::mmu`](../../hal/src/mmu/mod.rs) with the `MapperFlush`
   typed flush token; change `Mmu::map` / `Mmu::unmap` return types
   to thread the token. Update the in-tree `test-hal` impl
   (currently `tyrne-test-hal::TestMmu`) to return tokens. — T-016
   (Draft, opens with this ADR)
2. Implement `QemuVirtMmu` in [`bsp-qemu-virt/src/mmu.rs`](../../bsp-qemu-virt/src/mmu.rs)
   covering `Mmu::create_address_space` / `address_space_root` /
   `activate` / `map` / `unmap` / `invalidate_tlb_*` for VMSAv8. — T-016
3. Reserve the four bootstrap page-table frames in `bsp-qemu-virt/linker.ld`
   as a new `.boot_pt` section (`PAGE_SIZE`-aligned, sized for L0 + L1
   + L2_low + L2_high; bracketed by `__boot_pt_start` / `__boot_pt_end`
   linker symbols; placed inside `.bss` so the existing BSS-zero loop
   pre-zeros them). — T-016
4. Implement `mmu_bootstrap` in `bsp-qemu-virt/src/main.rs` (or a
   dedicated `bsp-qemu-virt/src/mmu_bootstrap.rs` module): populate
   the four boot frames per the §Simulation §Step 1 layout, configure
   `MAIR_EL1` / `TCR_EL1` / `TTBR0_EL1` / `TTBR1_EL1`, perform the
   TLB + I-cache invalidate + barrier sequence of §Step 3, then flip
   `SCTLR_EL1.{M,I,C}`. Called once by `kernel_entry` immediately
   after the `cpu.now_ns()` boot-snapshot and before any MMIO-touching
   step (timer banner and GIC initialisation alike, so that both
   steps' UART / GIC-distributor writes go through the
   device-attribute mapping). The full kernel_entry order is:
   `cpu.now_ns()` → `mmu_bootstrap()` → "tyrne: mmu activated" print
   → GIC init → timer banner → demo. — T-016
5. Add audit-log entries: UNSAFE-2026-0022 (page-table frame writes
   in `mmu_bootstrap`), UNSAFE-2026-0023 (`MAIR_EL1` / `TCR_EL1` /
   `TTBR0_EL1` / `TTBR1_EL1` / `SCTLR_EL1` writes), UNSAFE-2026-0024
   (`TLBI` / `IC IALLU` / `DSB` / `ISB` asm), UNSAFE-2026-0025
   (per-call `Mmu::map` / `unmap` page-table entry writes inside
   `QemuVirtMmu`). Per [unsafe-policy.md §3](../standards/unsafe-policy.md),
   each entry includes Operation / Invariants / Rejected alternatives. — T-016
6. Update [`docs/architecture/memory-management.md`](../architecture/memory-management.md)
   (new file landing in this PR) with the layout diagram, MAIR
   table, page-table topology, and `MapperFlush` discipline.
   Cross-linked from [ADR-0009](0009-mmu-trait.md), [ADR-0027](0027-kernel-virtual-memory-layout.md),
   and [ADR-0012](0012-boot-flow-qemu-virt.md) §Open questions
   (where the "Boot-time MMU activation" entry now resolves to this
   ADR). — T-016
7. Add host tests for the page-table descriptor encoding helpers
   (`block_descriptor(pa, attr_index, ap, sh, af, ng)` and friends);
   the asm-level transition is QEMU-smoke verified. Tests live in
   `bsp-qemu-virt/src/mmu.rs::tests` under `#[cfg(test)]`. — T-016
8. Update `current.md` headline + `phase-b.md` ADR ledger row.
   Closure trio is **not** required for T-016 in isolation; T-016
   `Done` flips on (cargo gates + miri + smoke unchanged); the B2
   *milestone* closure trio runs when the milestone closes (after
   any follow-on B2 tasks land). — T-016

The first task (T-016) covers steps 1 through 8. No further task is opened by this ADR. T-016 is a single bundled task — the same shape as T-012 (which bundled GIC + IVT + asm trampolines + timer-IRQ in one task). The implementation may land across multiple commits within T-016's scope; splitting into T-016a / T-016b is permitted if the scope grows past one reviewable task per phase-b.md precedent.

T-016's Done flip gates only on its own DoD (host-tests + miri + clippy + kernel-build + smoke-trace-byte-for-byte-unchanged-pre-MMU plus a new trace line "tyrne: mmu activated" or equivalent confirming the post-MMU steady state); it does not require a closure trio.

ADR-0033 (high-half kernel migration) placeholder. A future ADR will introduce the high-half kernel mapping when B5 userspace work surfaces the per-task TTBR0_EL1 swap requirement. The placeholder slot is reserved (per ADR-0025 §Rule 1, no T-NNN is opened today because no implementation work depends on it before B5). When B5's first userspace-driven scheduling event arrives, ADR-0033 opens with a §Simulation table walking the high-half migration's own multi-step transition (build TTBR1_EL1 high-half tables → enable EPD1 → jump to high-half via absolute load → null TTBR0_EL1 / EPD0=1 → continue at high-half).

ADR-0034 (kernel-image section permissions) placeholder. A future ADR will introduce per-section permissions on the kernel image — .text mapped read+execute, .rodata mapped read-only, .bss / .data mapped read+write — by re-mapping the kernel-image region into 4 KiB pages with section-specific MappingFlags. v1 maps the entire 128 MiB RAM range (including the kernel image) as kernel R/W/X via 2 MiB blocks per §Decision outcome (a); finer-grained permissions are deferred because (i) v1 has no userspace that could observe .text writability, (ii) the linker-script section boundaries are not 2 MiB-aligned and the re-map requires block-split logic which T-016 explicitly leaves out of scope, (iii) the discipline win is real but the v1 attack surface that benefits is empty. The placeholder slot is reserved (no T-NNN today; opens with the first B-phase task whose threat model includes a kernel R/W of .text as a meaningful surface — likely paired with the B5+ first userspace destroy that introduces an attacker-controlled execution context). T-016 §Out of scope and memory-management.md §"v1 layout" already mention the deferral; this placeholder gives the deferral a named ADR slot the way ADR-0033 does for high-half migration.

Consequences

Positive

  • MMU on with caching enabled. B3+ kernel work benefits from D-cache + I-cache active, and from MAIR-attribute-aware MMIO accesses (no more relying on QEMU's MMU-off semantics being lenient about device-vs-RAM mixing).
  • Type-system-enforced TLB-invalidation discipline. The MapperFlush token converts "did you remember to flush?" from a reviewer-attention concern into an unused_must_use lint failure. Pattern-of-record for future HAL traits where mutation requires a follow-up step.
  • Userspace-readiness inherited for free. TTBR0_EL1 already holds the kernel mapping; the future high-half ADR moves kernel to TTBR1_EL1 and reuses TTBR0_EL1 for per-task user mappings. The boundary is documented today.
  • Bounded unsafe surface. Four new audit entries (UNSAFE-2026-0022 through 0025), all narrowly-scoped per unsafe-policy.md §1. The same audit pattern T-012 used (one entry per concern) applies.
  • Bounded bootstrap frame budget. Four 4 KiB frames (16 KiB total) for the entire boot-time mapping. Statically reserved in .boot_pt; no kernel allocator dependency for the bootstrap moment.
  • Smoke-trace continuity. The kernel image at PA 0x4008_0000 continues to run at the same address pre- and post-MMU; no PC relocation, no linker-script AT > RAM discipline, no two-stage boot. The QEMU smoke trace adds at most one new line ("tyrne: mmu activated") and is otherwise byte-identical to the post-T-015 baseline. Clean regression-detection.
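
The 16 KiB bootstrap frame budget can be pinned down at compile time. The sketch below is an assumption about how the four frames might be typed on the Rust side; `PageTable` and `BootPageTables` are hypothetical names, and placement into the `.boot_pt` section is left to the linker script as described in this ADR.

```rust
use core::mem::{align_of, size_of};

/// Page size from ADR-0009.
pub const PAGE_SIZE: usize = 4096;
/// Number of 8-byte descriptors in one 4 KiB table.
pub const ENTRIES: usize = PAGE_SIZE / size_of::<u64>();

/// One translation table, aligned to its own size.
#[repr(C, align(4096))]
pub struct PageTable(pub [u64; ENTRIES]);

/// The four bootstrap frames: L0, L1, and the two L2 tables.
#[repr(C)]
pub struct BootPageTables {
    pub l0: PageTable,
    pub l1: PageTable,
    pub l2_low: PageTable,
    pub l2_high: PageTable,
}

// Compile-time checks: four frames, 16 KiB total, page-aligned.
const _: () = assert!(ENTRIES == 512);
const _: () = assert!(size_of::<BootPageTables>() == 4 * PAGE_SIZE);
const _: () = assert!(align_of::<BootPageTables>() == PAGE_SIZE);
```

If a later layout needs a fifth frame, the size assertion fails at build time instead of silently outgrowing the reserved region.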

Negative

  • The high-half migration is deferred, not skipped. The future ADR-0033 will require linker-script changes, a brief identity-bootstrap-and-jump dance, and audit-log Amendments. Real cost; we accept it because the v1 useful work in B3 / B4 does not need the high-half today and the methodical-pace principle outranks the "do it once" gut instinct. Mitigation: ADR-0033 is named in this ADR's §Dependency chain explicitly, and the §Simulation table for that ADR will be drafted under the same write-adr §Simulation rule, so the migration's complexity gets the same scrutiny.
  • MappingFlags::USER is meaningful but unreachable in v1. ADR-0009 defines USER as one of the five flag bits. With EPD1 = 1 and only TTBR0_EL1 populated by kernel-only mappings, no user-permission entry exists in v1. The flag's translation to VMSAv8 AP[1] (unprivileged-access) bits is implemented in QemuVirtMmu::map (the BSP knows how to translate it) but never exercised. Mitigation: a host test in bsp-qemu-virt/src/mmu.rs::tests exercises the encoding for USER-bearing flags so the encoder is correct when B5 needs it.
  • Single MAIR attribute per memory class. Index 0 is locked to device-nGnRnE; index 1 is locked to normal-cached-WB-WA-IS. Future memory types (write-combining, normal-uncached, device-GRE) require either re-allocating MAIR indices (back-compat hazard) or extending MappingFlags with a MemoryType discriminant. Mitigation: the unused MAIR indices 2..7 are reserved by this ADR for that purpose; future ADR adds the encoding without touching indices 0 / 1.
  • Bootstrap page-tables use 2 MiB block descriptors at L2. The HAL trait promises 4 KiB granularity; the bootstrap takes a shortcut (block descriptors) for the boot-time identity mapping because subdividing 128 MiB of RAM into 32 768 4 KiB pages is wasteful and unnecessary. Mitigation: the shortcut is BSP-internal, not exposed via the trait; if any post-MMU code wants to remap a sub-2-MiB region inside an existing block, the BSP's Mmu::map implementation must split the block (a known follow-up; out of scope for T-016 since v1 has no caller exercising it).
  • Token discipline imposes a small ergonomic cost on every map/unmap caller. Each call now ends with flush.flush(mmu) or flush.ignore(). Mitigation: the discipline is the win — the cost is the readability of mandatory flushes. Future helper macros (map_and_flush!) can sugar the common path if the noise becomes excessive.

Neutral

  • No change to ADR-0017's IPC primitive set. The MMU surface is internal infrastructure; user-observable IPC primitives (send / recv / notify) are untouched. ADR-0017 §Revision notes does not need a rider.
  • No change to SchedError / IpcError taxonomies. MMU faults raise CPU exceptions handled by T-012's vector table; they do not surface as scheduler / IPC errors in v1. A future ADR (preemption / fault-handling ABI) defines how MMU faults from userspace map to capability-system errors.
  • Bootstrap mmu_bootstrap runs once per boot. It is not part of the Mmu trait. The trait's create_address_space / activate are for post-bootstrap address-space management (dynamic mappings, B3+); bootstrap is BSP-internal.
  • No new ADR governance burden. This ADR follows the write-adr skill §Simulation discipline (codified in commit 77a578a); the §Dependency chain section satisfies ADR-0025 §Rule 1 (every forward-reference is grounded in T-016 which opens with this ADR's Propose commit).

Pros and cons of the options

Option A — Identity-only, no flush token

  • Pro: Smallest possible B2 surface; the fewest moving parts.
  • Pro: Easy to review; no HAL surface change.
  • Con: Skips the type-system-enforced flush discipline; "did you remember to flush?" stays in reviewer attention.
  • Con: When B5 high-half migration lands, it has to add the flush token then, with all existing post-MMU callers needing their return-type-handling updated. Two waves of API churn instead of one.

Option B — Identity-only, with flush token

  • Pro: Everything Option A has, plus the flush discipline win.
  • Con: Does not name the future high-half migration explicitly; a B5 reader reverse-engineers the future from commit history.
  • Con: Marginally larger ADR scope (the flush-token discussion).

Option C — Identity + high-half + identity teardown, with flush token

  • Pro: One-shot B2 commitment; no future migration ADR.
  • Pro: Standard "Linux on aarch64" shape; reference-kernel parity.
  • Con: Implementation cost — linker-script AT > RAM, two-stage boot (early stub at low PA + main kernel at high VA), absolute-address jump after SCTLR.M=1, identity teardown after the jump. ~2× the asm and ~2× the audit-log entries of Option A/B/D.
  • Con: Premature optimisation — B3/B4 do not need the high-half; B5 is the natural moment.
  • Con: Pre-pays the cost without obtaining the benefit until B5.

Option D — Identity-only with flush token, named-future high-half ADR (chosen)

  • Pro: All of Option B's benefits.
  • Pro: B5-readiness without B2 cost: the layout supports future high-half (TTBR1 reservation; MAIR reservation; ASID-zero global mappings), and the future ADR's slot is named.
  • Pro: ADR-0033 placeholder gives a B5 reader a clear forward-pointer.
  • Con: Adds one named-but-not-yet-opened ADR slot to the project's mental load. Mitigation: the slot is named, not allocated; per ADR-0025 §Rule 1, no T-NNN is opened today, and the ADR-0033 file does not exist until B5 surfaces the requirement (mirrors the ADR-0023 placeholder pattern, which has the file but explicitly Deferred status).

Revision notes

  • 2026-05-22 — ADR-0030 / ADR-0031 §Context references are reserved slots, not yet-filed files. §Context names ADR-0030 (syscall ABI) and ADR-0031 (MMU follow-ups / ASID assignment) as future ADRs. These are reserved slot numbers — the phase-b.md §B5 ADR ledger formally reserves ADR-0030 (Syscall ABI) and ADR-0031 (Initial syscall set / MMU follow-ups) — not claims that files exist today. No docs/decisions/0030-*.md or 0031-*.md file exists yet; both open with B5 userspace work. This matches the §Dependency-chain treatment of ADR-0033 / ADR-0034 (named-but-unallocated placeholders). If the syscall ABI eventually lands under a different number, the §Context references here and in ADR-0017 / ADR-0028 / ADR-0029 are the ones to update.

References

  • ADR-0009 — Mmu HAL trait signature (v1) — the trait this ADR extends with the MapperFlush token return type.
  • ADR-0012 — Boot flow and memory layout for bsp-qemu-virt — §Open questions "Boot-time MMU activation" resolves here.
  • ADR-0024 — EL drop to EL1 policy — the kernel runs at EL1 when MMU activates; SCTLR / MAIR / TCR / TTBR0 are EL1 system registers.
  • ADR-0025 — ADR governance amendments — §Rule 1 (forward-reference contract) governs T-016's opening alongside this ADR's Propose commit.
  • ADR-0026 — Idle dispatch via separate fallback slot — §Simulation table is the empirical source of the §Simulation discipline this ADR applies forward.
  • ADR-0032 — Endpoint state rollback + ipc_cancel_recv — first ADR drafted under the §Simulation rule (recovery-primitive subject); this ADR is the first non-recovery-primitive state-machine ADR drafted under the same rule.
  • docs/architecture/memory-management.md — landing in this PR; synthesises the layout in narrative + diagram form.
  • docs/audits/unsafe-log.md — UNSAFE-2026-0022 through 0025 land with T-016.
  • docs/standards/unsafe-policy.md — the audit-discipline contract every new entry follows.
  • ARM Architecture Reference Manual (ARMv8-A), ARM DDI 0487 — §D5.2 (VMSAv8 translation), §D5.3 (page-table entry formats), §D5.5 (memory attributes), §D7 (system registers SCTLR_EL1, TCR_EL1, TTBRn_EL1, MAIR_EL1).
  • Linux aarch64 boot — arch/arm64/kernel/head.S + arch/arm64/mm/proc.S (__cpu_setup, __primary_switch) — prior art for the MMU-enable transition; the §Simulation rows 2 / 3 / 4 mirror Linux's __primary_switch shape modulo the high-half jump Option D defers.
  • seL4 — src/arch/arm/64/kernel/boot.c (init_freemem, init_kernel) — capability-aware kernel that uses identity-mapped early boot before transitioning; Tyrne adopts the identity-only steady state for B2.
  • x86_64::structures::paging::MapperFlush — Rust ecosystem prior art for the typed flush token; same shape adopted here for the aarch64 Mmu trait.