A self-contained RV32I/RV32F SoC with a UART-tethered boot flow, local instruction/data memories, and a pipelined floating-point unit. The core targets the Digilent PYNQ-Z1 (Zynq-7020) and ships with software, tests, and build scripts to take the design from simulation to FPGA.
- Four-stage in-order pipeline: fetch, decode, execute/memory, and writeback, with single-cycle forwarding and centralized stall/kill control.
- ISA support: RV32I + CSR (`tohost`) plus an RV32F subset (add/sub, mul, fused multiply-add, sign-inject, moves, int↔fp convert).
- Multi-stage FPU with explicit pipeline latency tracking; the integer pipeline stalls while FP results retire to preserve precise state.
- Memory-mapped UART console for boot, file loading, and debug plus MMIO performance counters (cycle, instruction, branch, branch-correct).
- On-chip memories only: BIOS ROM, IMEM, and DMEM inferred as block RAMs with byte-enable writes.
- PYNQ-Z1 top level with PLL-based clock generation, debounced buttons/switch synchronizers, and board pin constraints.
- `hardware/src/`: RTL for the core (`riscv_core`), FPU (`execute/fpu`), local memories, UART, clocking, and PYNQ top (`z1top.v`).
- `hardware/sim/`: iverilog/VCS testbenches for ISA, C/assembly suites, BIOS, UART parsing, MMIO counters, and small benchmarks.
- `hardware/scripts/`: Vivado Tcl for synth/impl, UART hex loader (`hex_to_serial`), CPI/FOM helpers, and FPGA control.
- `hardware/run_all_sims`: Python wrapper to run the full regression suite (ISA + C tests + directed benches) and capture logs.
- `software/`: BIOS, UART utilities, 151 library (MMIO helpers), ISA tests, C micro-tests, and benchmarks (`mmult`, `fpmmult`, `bsort`, `ssort`, `bdd`, `echo`, `uart_parse`, plus reduced-size `small/` variants).
- `docs/`: CPU datapath diagram (`fa25_ee151_cpu_diagram.drawio.png`) and checkpoint notes.
- Fetch: parameterized PC with BIOS vs IMEM muxing; synchronous instruction memories.
- Decode: instruction decode, immediate generation, hazard detection, and control-flow kill injection.
- Execute / Memory: ALU, branch comparator, forwarding network, memory address generation, DMEM/IMEM write masking, and FPU issue/retire logic.
- Writeback: load sign/zero extension, MMIO data muxing, CSR writes, and integer/FP register writeback.
A tohost CSR (0x51E) is implemented for ISA and C testbench completion signaling.
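The byte-enable write masking in the execute/memory stage and the load sign/zero extension in writeback can be illustrated with a small Python model. This is a minimal sketch of standard RV32I behaviour for naturally aligned accesses, not a transcription of the RTL; the function names `store_mask` and `load_extend` are hypothetical.

```python
def store_mask(funct3, addr):
    """4-bit byte-enable mask for an aligned sb/sh/sw (RV32I funct3 encoding)."""
    offset = addr & 0x3
    if funct3 == 0b000:            # sb: one byte lane
        return 0b0001 << offset
    if funct3 == 0b001:            # sh: two byte lanes (assumes offset is 0 or 2)
        return (0b0011 << offset) & 0xF
    if funct3 == 0b010:            # sw: all four lanes (assumes offset is 0)
        return 0b1111
    raise ValueError("not a store funct3")


def load_extend(funct3, addr, word):
    """Extract and extend the loaded value from a 32-bit memory word."""
    word &= 0xFFFFFFFF
    if funct3 == 0b010:            # lw: whole word, no extension needed
        return word
    shifted = word >> (8 * (addr & 0x3))
    if funct3 in (0b000, 0b100):   # lb / lbu
        value, width = shifted & 0xFF, 8
    elif funct3 in (0b001, 0b101): # lh / lhu
        value, width = shifted & 0xFFFF, 16
    else:
        raise ValueError("not a load funct3")
    signed = funct3 < 0b100        # lb and lh sign-extend; lbu and lhu zero-extend
    if signed and value & (1 << (width - 1)):
        value |= 0xFFFFFFFF & ~((1 << width) - 1)
    return value
```

For example, `store_mask(0b000, 0x1000_0002)` enables only byte lane 2, and `load_extend(0b000, 1, 0x80FF)` sign-extends the byte `0x80`.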
- Dedicated FP register file (3R/1W).
- Stage 1 combinational ops (mul, sign-inject, moves, int→fp convert).
- Pipelined add/sub alignment/normalize and FMA paths.
- FP operations assert pipeline stalls until completion to maintain correct forwarding and precise architectural state.
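As a reference for the sign-inject operations, the following Python sketch shows the RV32F `FSGNJ`, `FSGNJN`, and `FSGNJX` semantics on raw 32-bit patterns. It follows the RISC-V specification rather than this FPU's RTL, and it does not claim that all three variants are implemented here; the helper `to_bits` is hypothetical.

```python
import struct

SIGN = 0x8000_0000   # IEEE 754 single-precision sign bit
MAG = 0x7FFF_FFFF    # exponent and mantissa bits


def to_bits(x):
    """Raw 32-bit pattern of a Python float rounded to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fsgnj(rs1, rs2):
    """Magnitude of rs1 with the sign of rs2."""
    return (rs1 & MAG) | (rs2 & SIGN)


def fsgnjn(rs1, rs2):
    """Magnitude of rs1 with the inverted sign of rs2."""
    return (rs1 & MAG) | (~rs2 & SIGN)


def fsgnjx(rs1, rs2):
    """Magnitude of rs1 with sign(rs1) XOR sign(rs2)."""
    return rs1 ^ (rs2 & SIGN)
```

With both operands equal, `fsgnjn` acts as negation and `fsgnjx` as absolute value, which is how the `fneg.s` and `fabs.s` pseudo-instructions are defined.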
- Free-running cycle, instruction, branch, and branch-correct counters.
- Store to `0x8000_0018` resets all counters to zero.
- Instruction counter increments only on committed (non-bubble, non-killed) instructions.
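The counter behaviour listed above can be modelled in a few lines of Python. This is a behavioural sketch only: the class name, the `tick` interface, the 32-bit wrap-around, and the choice to count branches only when they commit are assumptions, not details taken from the RTL. The MMIO addresses match the memory map in this README.

```python
class PerfCounters:
    """Behavioural model of the MMIO performance counters (assumed 32-bit)."""

    CYCLE, INSTR, RESET = 0x8000_0010, 0x8000_0014, 0x8000_0018
    BRANCH, BRANCH_OK = 0x8000_001C, 0x8000_0020

    def __init__(self):
        self.regs = {self.CYCLE: 0, self.INSTR: 0,
                     self.BRANCH: 0, self.BRANCH_OK: 0}

    def _bump(self, addr):
        self.regs[addr] = (self.regs[addr] + 1) & 0xFFFFFFFF

    def tick(self, committed=False, is_branch=False, correct=False):
        """Advance one clock cycle; bubbles and killed slots pass committed=False."""
        self._bump(self.CYCLE)          # the cycle counter is free-running
        if not committed:
            return
        self._bump(self.INSTR)
        if is_branch:                   # assumption: only committed branches count
            self._bump(self.BRANCH)
            if correct:
                self._bump(self.BRANCH_OK)

    def load(self, addr):
        return self.regs[addr]

    def store(self, addr, value=0):
        if addr == self.RESET:          # any store to the reset address clears all
            for key in self.regs:
                self.regs[key] = 0
```

Feeding the model one committed instruction, one bubble, and one correctly predicted branch yields a cycle count of 3 and an instruction count of 2.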
| Address | Function |
|---|---|
| `0x4000_0000` | BIOS ROM (read-only, initialized from `software/bios/bios.hex`) |
| `0x1000_0000` | IMEM base (64 KiB window, word-addressed) |
| `0x1000_0000` | DMEM base (64 KiB window, separate RAM with byte enables) |
| `0x8000_0000` | UART ctrl (bit0: TX ready, bit1: RX valid) |
| `0x8000_0004` | UART RX data (low 8 bits) |
| `0x8000_0008` | UART TX data (store byte) |
| `0x8000_0010` | Cycle counter (load) |
| `0x8000_0014` | Instruction counter (load) |
| `0x8000_0018` | Counter reset (store) |
| `0x8000_001c` | Branch counter (load) |
| `0x8000_0020` | Branch-correct counter (load) |
Address partitioning note: For loads/stores, the top nibble of the address selects the target (DMEM, IMEM write-only, BIOS read-only, or MMIO). The same numeric address may refer to different physical memories depending on whether it is used as a PC (instruction fetch) or a data address.
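The top-nibble decode for data accesses can be sketched in Python as follows. The exact bit patterns are an assumption inferred from the memory map and from the `30000000` loader example, which writes IMEM and DMEM at once: `0b00x1` is taken to select DMEM, `0b001x` IMEM writes, `0b0100` BIOS reads, and `0b1000` MMIO. The RTL may decode differently, and the function name `data_targets` is hypothetical.

```python
def data_targets(addr, is_store):
    """Return the set of memories a load/store address selects (assumed decode)."""
    nibble = (addr >> 28) & 0xF
    targets = set()
    if nibble & 0b1101 == 0b0001:               # 0b00x1: DMEM, loads and stores
        targets.add("DMEM")
    if is_store and nibble & 0b1110 == 0b0010:  # 0b001x: IMEM, write-only
        targets.add("IMEM")
    if not is_store and nibble == 0b0100:       # 0b0100: BIOS, read-only
        targets.add("BIOS")
    if nibble == 0b1000:                        # 0b1000: UART and counters
        targets.add("MMIO")
    return targets
```

Under these assumptions a store to `0x3000_0000` reaches both IMEM and DMEM, while a load from the same address sees only DMEM.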
- `software/151_library`: UART helpers, MMIO counter accessors, ASCII/string utils, and type defs.
- `software/bios`: UART command shell (`file`, `jal`, `lw`/`lhu`/`lbu`, `sw`/`sh`/`sb`) used to load hex images and jump to user code.
- Benchmarks and tests (all build to `.hex` via `riscv64-unknown-elf-*`):
  - Integer: `mmult`, `bsort`, `ssort`, `bdd`
  - Floating point: `fpmmult`
  - `small/`: reduced-size mirrors of the above for faster simulation
  - `c_tests/`: micro-programs (fib, sum, strcmp, cachetest, vecadd, replace)
  - `asm/` directed assembly, `echo/`, `uart_parse/`, and the `riscv-isa-tests/` harness
Typical build pattern:
```sh
# Build BIOS and a sample workload
make -C software/bios
make -C software/mmult
make -C software/fpmmult
# Build reduced-size images for simulation
make -C software/small/mmult
make -C software/small/fpmmult
```

- Lint: `make -C hardware lint`
- Single testbench: `make -C hardware sim/cpu_tb.fst` (iverilog) or `make -C hardware sim/cpu_tb.vpd` (VCS). Logs land in `hardware/sim/`.
- Full regression: `cd hardware && ./run_all_sims --simulator iverilog` (pass `--simulator vcs` if available). Results are stored under `test_results/`.
- ISA tests only: `make -C hardware isa-tests`; C tests: `make -C hardware c-tests`.
- Synthesis/implementation: `make -C hardware synth` then `make -C hardware impl`. Generated bitstream: `hardware/build/impl/z1top.bit`.
- Program FPGA: `make -C hardware program` (uses `program.tcl` with the generated bitstream). `make -C hardware screen` opens a 115200 baud UART session.
- Clean: `make -C hardware clean-sim` or `make -C hardware clean-build`.
- Program the bitstream (`make -C hardware program`) so the BIOS ROM is baked in.
- Open the UART console (`make -C hardware screen`).
- Load a hex image over UART from the host. The helper script accepts a base address:
  ```sh
  # Example: write a program into both IMEM and DMEM, then run it from IMEM
  hardware/scripts/hex_to_serial software/mmult/mmult.hex 30000000
  ```
- At the BIOS prompt (`151>`), jump to your program: `jal 10000000`.
- Use BIOS commands to inspect memory (`lw`/`lhu`/`lbu`), write words/halfs/bytes (`sw`/`sh`/`sb`), or load additional images (`file <addr> <len>`).
- UART status/data are visible at `0x8000_0000/4/8`; performance counters at `0x8000_0010/14/1c/20` (store to `0x8000_0018` to clear).
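Values read back from these registers (for instance with the BIOS `lw` command) can be interpreted on the host with a short Python helper. This is a hypothetical sketch, not one of the repository's CPI/FOM scripts: `uart_status` decodes the two documented control bits, and `summarize` derives CPI and branch accuracy from the four counter readings.

```python
def uart_status(ctrl):
    """Decode the UART control word: bit0 = TX ready, bit1 = RX valid."""
    return {"tx_ready": bool(ctrl & 0x1), "rx_valid": bool(ctrl & 0x2)}


def summarize(cycles, instructions, branches, branches_correct):
    """Derive CPI and branch accuracy; a ratio is None when its divisor is zero."""
    cpi = cycles / instructions if instructions else None
    accuracy = branches_correct / branches if branches else None
    return {"cpi": cpi, "branch_accuracy": accuracy}
```

For example, 150 cycles over 100 committed instructions gives a CPI of 1.5, and 15 correct out of 20 branches gives an accuracy of 0.75.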
- RTL includes simulation stubs (`hardware/stubs`) and behavioral models (`hardware/sim_models`) for PLL/BUFG when running outside Vivado.
- Board-level constraints for the PYNQ-Z1 live in `hardware/src/z1top.xdc`.
- The BIOS hex is pulled in with `$readmemh`; keep `software/bios/bios.hex` up to date before synthesis.
- Default CPU clock is derived from `CLK_125MHZ_FPGA` via the PLL in `clocks.v`/`z1top.v`.