TeraNoC is a core-to-L1-memory Network-on-Chip (NoC) design aimed at area-efficiently scaling manycore clusters to thousands of cores that share multiple megabytes of L1 scratchpad memory. It is an open-source, hybrid mesh–crossbar on-chip interconnect that offers both scalability and low latency while maintaining very low routing overhead.
A key challenge in on-chip interconnect design is to scale up bandwidth while maintaining low latency and high area efficiency.
- 2D meshes scale with low wiring area and congestion overhead; however, their end-to-end latency increases with the number of hops, making them unsuitable for latency-sensitive core-to-L1-memory access.
- Crossbars offer low latency, but their routing complexity grows quadratically with the number of I/Os, requiring large physical routing resources and limiting area-efficient scalability.
This two-sided interconnect bottleneck hinders the scale-up of many-core, low-latency, tightly coupled shared-memory clusters — pushing designers toward using many smaller, loosely coupled clusters, which introduces both hardware and software overhead.
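As a rough back-of-the-envelope illustration of the two trends (the numbers below use the default TeraNoC configuration of 1024 cores and 4096 L1 banks purely for scale and are not taken from the TeraNoC paper's analysis):

$$
\underbrace{N_{\mathrm{cores}} \times N_{\mathrm{banks}}}_{\text{crosspoints of a single flat crossbar}} = 1024 \times 4096 \approx 4.2\times10^{6},
\qquad
\underbrace{2\sqrt{N_{\mathrm{cores}}}}_{\text{worst-case hop count on a 2D mesh}} \approx 64 .
$$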
TeraNoC adopts a hybrid Mesh–Crossbar topology, combining the scalability of 2D meshes with the low latency of crossbars. Its main features are:

- A hybrid Mesh–Crossbar topology that combines the low latency of fully combinational logarithmic crossbars with the scalability of 2D meshes. It supports low-latency, word-width, fine-grained multi-channel memory access — enabling efficient scale-up of shared-memory clusters, while remaining compatible with hierarchical physical design methodologies.
- A router remapper that redistributes traffic load across available channels to fully exploit multi-channel bandwidth.
- A configurable number of read/write request channels to maximize utilization of available physical wiring resources.
- A physical-design-aware architecture that simplifies multi-channel NoC implementation; channels in the same direction can be bundled for routing, easing both floorplanning and timing closure.
- This repository was originally branched from the MemPool project, which uses a crossbar-based hierarchical interconnect to scale up shared-memory clusters.
- TeraNoC integrates 2D mesh router designs based on the FlooNoC project.
- Group-to-group (intra-cluster) routers are designed as fine-grained 32-bit mesh routers.
- Cluster-to-main-memory routers use 512-bit AXI-compatible FlooNoC routers.
This repository includes both the hardware and software components of TeraNoC, along with infrastructure for compilation and simulation.
By default, TeraNoC is configured to support 1024 cores sharing 4096 banks of 1 KiB L1 memory (totaling 4 MiB).
Hardware configurations can be modified in config/config.mk.
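For illustration, a hypothetical excerpt of config/config.mk matching the defaults above is sketched below; the actual variable names and default values in the file may differ:

```make
# Hypothetical excerpt of config/config.mk (illustrative names only)
num_cores      ?= 1024  # total number of cores in the cluster
banking_factor ?= 4     # banks per core: 1024 cores x 4 = 4096 banks of 1 KiB
xpulpimg       ?= 1     # enable the Xpulpimg extension in the Snitch cores
l2_sim_type    ?= spm   # on-chip SRAM; set to 'dram' for DRAMSys co-simulation
```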
The repository is organized as follows:

- `config/` – Global configurations used by both software and hardware.
- `hardware/` – RTL source code and simulation scripts.
- `scripts/` – Utility scripts for tasks such as linting and formatting.
- `software/` – Example applications and the runtime.
- `toolchain/` – Third-party tools and packages:
  - `halide/` – Compiler infrastructure for the Halide language.
  - `llvm-project/` – LLVM compiler infrastructure.
  - `riscv-gnu-toolchain/` – RISC-V GCC compiler.
  - `riscv-isa-sim/` – Extended version of Spike, used as the golden model and for parsing simulation traces.
  - `riscv-opcodes/` – Extended version of riscv-opcodes, including custom image processing extensions.
  - `verilator/` – The open-source RTL simulator Verilator.
Make sure you clone this repository recursively to include all necessary submodules:
```bash
git submodule update --init --recursive
```

If any submodule repository paths change, sync your local configuration using:

```bash
git submodule sync --recursive
```

TeraNoC-based cluster design requires patching a few hardware dependencies. To update and patch them, run the following from the project root:

```bash
make update-deps
```

The TeraNoC-based manycore cluster requires at least the RISC-V GCC toolchain to compile applications. It also supports LLVM, which is a dependency for compiling Halide — a domain-specific language for image processing built on top of C++.
To build these toolchains, run:
```bash
# Build GCC, LLVM, and the Spike simulator
make toolchain

# Build only GCC
make tc-riscv-gcc

# Build only LLVM
make tc-llvm

# Build Halide
make halide

# Build only Spike (simulation tracing relies on the Spike simulator)
make riscv-isa-sim

# Build only Bender (used to generate simulation scripts)
make bender
```

TeraNoC also supports simulation with the open-source Verilator (optional). To build Verilator:

```bash
make verilator
```

ℹ️ Note: LLVM is required to build Verilator.
Example applications are located under software/apps. Halide-based applications can be found in software/apps/halide, and OpenMP-based applications in software/apps/omp.
To build an application (e.g., hello_world), use:
```bash
# Bare-metal applications
cd software/apps/baremetal/
make hello_world

# Halide applications
cd software/apps/halide
make matmul

# OpenMP applications
cd software/apps/omp
make omp_parallel
```

You can specify the compiler using the COMPILER variable. Supported options are gcc (default) and llvm:
```bash
# Compile using LLVM instead of GCC
make COMPILER=llvm hello_world
```

Applications using the Xpulpimg extension should be compiled using the gcc toolchain. If all Xpulpimg instructions implemented in Snitch are supported by the compiler, use:

```bash
# Compile with GCC including Xpulpimg support
make COMPILER=gcc XPULPIMG=1 hello_world
```

If the compiler does not support newly implemented Xpulpimg instructions, you must restrict their use to asm volatile blocks and compile with:

```bash
# Restrict Xpulpimg instructions to inline assembly
make COMPILER=gcc XPULPIMG=0 hello_world
```

If XPULPIMG is not explicitly set, it defaults to the value configured in config/config.mk.
This configuration also controls whether Xpulpimg is enabled in the Snitch core RTL and whether related unit tests should be executed.
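For reference, restricting an Xpulpimg instruction to an asm volatile block might look like the following minimal sketch. The p.mac mnemonic, operand constraints, and helper name are illustrative assumptions; consult the extended riscv-opcodes in toolchain/riscv-opcodes for the exact instruction set.

```c
#include <stdint.h>

// Hypothetical helper: multiply-accumulate via the Xpulpimg `p.mac` instruction.
// The instruction appears only inside this inline-assembly block, so the
// surrounding C code can still be built with COMPILER=gcc XPULPIMG=0.
static inline int32_t mac_xpulpimg(int32_t acc, int32_t a, int32_t b) {
  asm volatile("p.mac %0, %1, %2" : "+r"(acc) : "r"(a), "r"(b));
  return acc;
}
```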
TeraNoC includes an automated unit testing suite for system verification, located in software/riscv-tests/isa.
To launch all unit tests:
```bash
make riscv-tests
```

This process compiles the test suite, runs simulations using Spike, and then performs RTL simulation.
The test flow depends on the xpulpimg setting in config/config.mk. If xpulpimg=1, test cases involving Xpulpimg instructions will be included.
To add custom tests, modify the riscv-tests infrastructure. More details can be found in software/riscv-tests/README.md.
You can also compile the unit tests manually with:
```bash
cd software
make COMPILER=gcc riscv-tests
```

🔧 Note: Unit tests must be compiled using gcc. The same XPULPIMG configuration logic applies as with application builds.
TeraNoC follows the LLVM C/C++ coding standards.
Code formatting is enforced via clang-format.
To format your code before committing changes, run:
```bash
make format
```

For the higher levels of the memory hierarchy, the cluster supports both on-chip SRAM and off-chip DRAM co-simulation. For off-chip DRAM co-simulation, it incorporates the dram_rtl_sim tool as a submodule, built at hardware/deps/dram_rtl_sim. Leveraging DRAMSys5.0, it provides a co-simulation environment between RTL models and DRAMSys5.0 DRAM + controller models, covering contemporary off-chip DRAM technologies (e.g., LPDDR, DDR, HBM).
The DRAMSys co-simulation tooling is open source and can be found here: https://github.com/pulp-platform/dram_rtl_sim
To prepare for DRAMSys co-simulation, set l2_sim_type to dram in config/config.mk.
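A minimal sketch of the relevant setting (only this line is shown; the surrounding contents of config/config.mk and the exact assignment operator may differ):

```make
# config/config.mk (excerpt) -- select off-chip DRAM co-simulation for L2
l2_sim_type ?= dram
```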
Then, execute the following command in the project's root directory to set up the DRAMSys environment:

```bash
make setup-dram
```

This Makefile target automates several tasks:
- Cleans up the existing DRAMSys5.0 repository, if previously built.
- Rebuilds the DRAMSys5.0 repository and applies the necessary patches within hardware/deps/dramsys_rtl_sim/dramsys_lib/.
- Applies HBM2 DRAM configuration patches tailored for the cluster simulation.
- Compiles the DRAMSys dynamically linkable library located at hardware/deps/dramsys_rtl_sim/dramsys_lib/DRAMSys.
Important: This environment requires cmake version 3.28.1 or higher and GCC version 11.2.0 or above.
DRAMSys supports a range of contemporary off-chip DRAM technologies, including LPDDR, DDR, and HBM. Configuration files in .json format are located in hardware/deps/dramsys_rtl_sim/dramsys_lib/DRAMSys/configs. Additionally, we provide a recommended HBM2 configuration within hardware/deps/dramsys_rtl_sim/dramsys_lib/DRAMSys, which is applied automatically as the default when the DRAMSys environment is set up. You are encouraged to review and modify these configurations as needed for your simulation requirements.
To test DMA data transfers between the shared-memory cluster and the higher levels of the memory hierarchy, use the prepared example kernel located in software/tests/baremetal/memcpy. For more details on building applications and setting up RTL simulation, please refer to the sections above.
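As a sketch, assuming the memcpy kernel follows the same build and simulation flow as the applications above (the exact make target and app path passed to the simulation flow are assumptions):

```bash
# Build the DMA memcpy test kernel (assumed to follow the standard app flow)
cd software/tests/baremetal
make memcpy

# Run it in RTL simulation (with l2_sim_type=dram for DRAM co-simulation)
cd ../../../hardware
app=tests/baremetal/memcpy make sim
```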
Note: The simulation setup for off-chip DRAM co-simulation is currently not open-sourced; we use the QuestaSim simulator exclusively.
To simulate the TeraNoC system using ModelSim, navigate to the hardware directory:
```bash
cd hardware

# Compile only (no simulation)
make compile

# Run simulation with the 'hello_world' application
app=apps/baremetal/hello_world make sim

# Run Halide application (e.g., matmul)
app=apps/halide/matmul make sim

# Use full path to preload a binary
preload=/path/to/binary make sim

# Run simulation without GUI
app=apps/baremetal/hello_world make simc

# Generate human-readable traces
make trace

# Generate a visual trace view
app=apps/baremetal/hello_world make tracevis

# Full headless benchmarking: run, trace, and log
app=apps/baremetal/hello_world make benchmark
```

System configurations — such as total core count, tile size, and xpulpimg support — are set in config/config.mk.
To simulate using Verilator, use the same command format, replacing sim with verilate:
```bash
cd hardware
make verilate
```

If you encounter disk space issues during Verilator compilation, disable ccache:

```bash
export OBJCACHE=''
```
⚠️ Disabling object caching may slow down subsequent builds.
Trace files (from both ModelSim and Verilator) are generated under hardware/build/ when tracing is enabled.
Tracing can be controlled per-core via a trace CSR register (type: WARL, values: 0 or 1). For persistent debug tracing, set the environment variable:
```bash
export snitch_trace=1
```

To visualize traces, use the provided script scripts/tracevis.py. This generates a JSON file viewable in:
- Trace Viewer
- Google Chrome via about:tracing
TeraNoC includes Spyglass linting support.
Run the following in the hardware directory to perform lint checks:
```bash
make lint
```

This uses the lint_rtl target with the current configuration in config/config.mk.
If you use this repo, please cite:
Y. Zhang, Z. Fu, T. Fischer, Y. Li, M. Bertuletti, and L. Benini. TeraNoC: A Multi-Channel 32-bit Fine-Grained, Hybrid Mesh-Crossbar NoC for Efficient Scale-up of 1000+ Core Shared-L1-Memory Clusters. In 2025 IEEE 43rd International Conference on Computer Design (ICCD), Dallas, TX, USA, Nov. 2025. (To appear). arXiv:2508.02446
```bibtex
@inproceedings{zhang2025teranoc,
  author        = {Zhang, Yichao and Fu, Zexin and Fischer, Tim and Li, Yinrong and Bertuletti, Marco and Benini, Luca},
  title         = {TeraNoC: A Multi-Channel 32-bit Fine-Grained, Hybrid Mesh-Crossbar NoC for Efficient Scale-up of 1000+ Core Shared-{L1}-Memory Clusters},
  booktitle     = {2025 IEEE 43rd International Conference on Computer Design (ICCD)},
  year          = {2025},
  month         = nov,
  location      = {Dallas, TX, USA},
  publisher     = {IEEE},
  note          = {To appear},
  eprint        = {2508.02446},
  archiveprefix = {arXiv}
}
```

TeraNoC is released under permissive open-source licenses. Most of TeraNoC's source code is released under the Apache License 2.0 (Apache-2.0); see LICENSE. The code in hardware is released under Solderpad v0.51 (SHL-0.51); see hardware/LICENSE.
Note that TeraNoC includes several third-party packages with their own licenses:
- `software/runtime/printf.{c,h}` is licensed under the MIT license.
- `software/runtime/omp/libgomp.h` is licensed under the GPL license.
- `software/riscv-tests` is an extended version of RISC-V's riscv-tests repository, licensed under a BSD license. See `software/riscv-tests/LICENSE` for details.
The hardware folder is licensed under Solderpad v0.51; see hardware/LICENSE. We use the following exceptions:
- `hardware/tb/dpi/elfloader.cpp` is licensed under a BSD license.
- `hardware/tb/verilator/*` is licensed under the Apache License 2.0; see `LICENSE`.
- `hardware/tb/verilator/lowrisc_*` contain modified versions of lowRISC's helper libraries. They are licensed under the Apache License 2.0.
- `scripts/run_clang_format.py` is licensed under the MIT license.
The following compilers can be used to build applications:
- `toolchain/halide` is licensed under the MIT license. See Halide's license for details.
- `toolchain/llvm-project` is licensed under the Apache License v2.0 with LLVM Exceptions. See LLVM's DeveloperPolicy for more details.
- `toolchain/riscv-gnu-toolchain`'s licensing information is available here.
We use the following RISC-V tools to parse simulation traces and keep opcodes consistent throughout the project.
- `toolchain/riscv-isa-sim` is licensed under a BSD license. See riscv-isa-sim's license for details.
- `toolchain/riscv-opcodes` contains an extended version of riscv-opcodes, licensed under the BSD license. See `toolchain/riscv-opcodes/LICENSE` for details.
The open-source simulator Verilator can be used for RTL simulation.
- `toolchain/verilator` is licensed under GPL. See Verilator's license for more details.
- The `dram_rtl_sim` submodule, located at `hardware/deps/dram_rtl_sim`, is licensed under the Solderpad Hardware License 0.51. You can review the license here.
- DRAMSys5.0 is utilized for DRAM simulations. For details on its usage and licensing, please refer to the DRAMSys5.0 license information.