NX 10 Gidel and Quartus project for SpMM core

This is the Gidel and Quartus project for the Block Aggregation SpMM core. Use it to generate the bitstream for the SpMM core.

Prerequisites

Quartus 21.4
Gidel ProcWizard (only needed if you want to modify peripherals such as HBM)

Build

Using Gidel to generate Quartus project

Open the Gidel project in Gidel ProcWizard
Click "Generate" -> "Generate HDL code"
Move the generated directory into this repo and rename it as you prefer (e.g., quartus_proj_singlecore)

A pre-generated Quartus project is provided here for convenience.

Prepare src files

Specify the path to the SpMM core design that SpinalHDL generates in Block Aggregation SpMM core. If you cloned this repo as a submodule of Block Aggregation SpMM core, the path should be ../../src/generated_spmm_core_6x12x8/. SpinalHDL generates a file list (usually named generated_spmm_core_<core_config>tensor_core_array_wrapper.lst) that includes all generated Verilog source files. Make sure every file in the lst file is added to the Quartus project file src/ic_tcorearray.qsf, for example:

set_global_assignment -name SYSTEMVERILOG_FILE ../../src/main/sverilog/out_asym_fifo.sv
set_global_assignment -name SYSTEMVERILOG_FILE ../../src/main/sverilog/blk_delay_core.sv

set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/enumdefine.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_252.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_240.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_229.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_231.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_49.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/spram_megafunc_583.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/MergeSortRedundancyRemoverUnit.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/MergeSortRedundancyRemoverUnit_6.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/TensorCoreChainBf12.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/TensorCoreChainBf12_60.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_4.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/spram_megafunc.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/CasLoadBubbleInsert.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/AsymBufferN2One.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/FixedBfp12Converter.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/FixedBfp12Converter_1.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamOutAsymFifo.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/IndexGenerator.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_2.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_3.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/TensorCoreChainArray.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/tensor_core_array_wrapper.v
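Keeping the qsf in sync with the lst file by hand is error-prone, so the assignments above can also be produced mechanically. The following is a minimal Python sketch, not part of this repo. It assumes the lst file holds one source file per line, and the helper names lst_to_qsf_lines and missing_from_qsf are hypothetical.

```python
from pathlib import PurePosixPath

# Quartus assignment names by file extension (both appear in the listing above).
ASSIGNMENT_BY_EXT = {".v": "VERILOG_FILE", ".sv": "SYSTEMVERILOG_FILE"}


def lst_to_qsf_lines(lst_text, src_dir):
    """Turn the contents of a .lst file into qsf assignment lines.

    Assumption: one source file per line; blank lines are skipped.
    Only the file name of each entry is kept and re-rooted under src_dir.
    """
    prefix = src_dir.rstrip("/")
    lines = []
    for entry in lst_text.splitlines():
        entry = entry.strip()
        if not entry:
            continue
        name = PurePosixPath(entry).name
        kind = ASSIGNMENT_BY_EXT.get(PurePosixPath(name).suffix)
        if kind is None:
            continue  # not an HDL source type this sketch knows about
        lines.append(f"set_global_assignment -name {kind} {prefix}/{name}")
    return lines


def missing_from_qsf(lst_text, src_dir, qsf_text):
    """Return the assignment lines that the qsf text does not contain yet."""
    existing = {line.strip() for line in qsf_text.splitlines()}
    return [l for l in lst_to_qsf_lines(lst_text, src_dir) if l not in existing]
```

Printing the result of missing_from_qsf for the lst file and src/ic_tcorearray.qsf gives the lines that still need to be appended; an empty list means the qsf already covers every listed file.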

Then copy the qsf file and the top-level sv file to the Quartus project directory.

cp src/ic_tcorearray.qsf quartus_proj_singlecore/.
cp src/ic_tcorearray.sv quartus_proj_singlecore/.

Run Quartus compilation

Using quartus_sh is recommended since it supports both local compilation and sbatch submission. To run a full compilation:

cd quartus_proj_singlecore
quartus_sh --flow compile ic_tcorearray

For Slurm-based remote compilation, a sample sbatch script is provided below:

#!/bin/bash
#
#SBATCH --job-name spmm_core
#SBATCH -p <your_partition>
#SBATCH --nodelist=<your_node>
#SBATCH --output sbatch.out
#SBATCH --error sbatch.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G

# Note: running a full compilation involves IP generation, 
# which requires a quartus GUI process and X server support
quartus_sh --flow compile ic_tcorearray

# if you want to skip ip_generation phase, use this instead:
# quartus_sh --flow compile ic_tcorearray -start synthesis
# if you only want to rerun P&R, use this instead:
# quartus_sh --flow compile ic_tcorearray -start fitter

Run on-chip SpMM test

A testing script gen_onchip_file_and_runtest.sh is provided to generate the on-chip SpMM test files and run the test, using the sparse attention values generated by sparse attention analyzer.

A set of extracted sparse attention values in BFP format is provided to simplify the simulation process. It was extracted from chatglm2-6b-32k on LongBench's vcsum test. Download the partial BFP-format attention values from COMPAS NFS and extract them.

A copy of extracted data is located at /compas-old/projects/sparse-attention in COMPAS NFS.

To run the test:

Program FPGA using Gidel SofLoader:

SofLoader quartus_files/ic_tcorearray.sof

Reboot the host machine.

Set attn_dir and onchip_test_dir in gen_onchip_file_and_runtest.sh to point to the extracted data directory.

# specify the path to the sparse attention values
attn_dir="/compas-old/projects/sparse-attention/chatglm2-6b-32k-attn-bfp20-vcsum"
# intermediate result directory should follow this format:
# attn_dir/../onchip/<modelname-taskname>
onchip_test_dir="/compas-old/projects/sparse-attention/onchip/chatglm2-6b-32k-attn-bfp20-vcsum"
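The directory convention in the comment above (attn_dir/../onchip/<modelname-taskname>) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name onchip_dir_for is hypothetical, and it assumes that <modelname-taskname> is the last path component of attn_dir, as it is in the sample values.

```python
from pathlib import PurePosixPath


def onchip_dir_for(attn_dir):
    """Derive the intermediate result directory from attn_dir.

    Assumption: <modelname-taskname> equals the last component of attn_dir.
    """
    attn = PurePosixPath(attn_dir)
    return str(attn.parent / "onchip" / attn.name)


attn_dir = "/compas-old/projects/sparse-attention/chatglm2-6b-32k-attn-bfp20-vcsum"
print(onchip_dir_for(attn_dir))
```

For the sample attn_dir this reproduces the onchip_test_dir value shown above.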

Run the test script:

cd ../sw/scripts
./gen_onchip_file_and_runtest.sh

For each head, the result is written to ${onchip_test_dir}/{inst_id}/hw_config_{head_id}.json. The following is a sample of the result file:

{
    ...
    "mat a size": 27456,
    "mat a vec size": 208,
    "mat b size": 27456,
    "mat b vec size": 208,
    "total_lat_counter_res": 25048,     // total latency in number of clock cycles
    "compute_lat_counter_res": 24212,   // compute latency in number of clock cycles
    "mat_b_load_counter_res": 27456     // mat b load latency in number of clock cycles
}
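The counters are reported in clock cycles, so turning them into time requires the core clock frequency, which this README does not state. The sketch below is one possible way to post-process the result files in Python. The function names and the clock_mhz parameter are assumptions, and it expects the files on disk to be plain JSON without the // annotations shown in the sample.

```python
import json
from pathlib import Path


def summarize(result, clock_mhz):
    """Derive simple metrics from one hw_config_{head_id}.json payload.

    clock_mhz is supplied by the caller; it is not recorded in the result file.
    """
    total = result["total_lat_counter_res"]
    compute = result["compute_lat_counter_res"]
    return {
        # cycles divided by MHz gives microseconds
        "total_us": total / clock_mhz,
        # share of the total latency spent in compute
        "compute_fraction": compute / total,
    }


def summarize_instance(inst_dir, clock_mhz):
    """Summarize every hw_config_*.json file under one {inst_id} directory."""
    summary = {}
    for path in sorted(Path(inst_dir).glob("hw_config_*.json")):
        with path.open() as f:
            summary[path.name] = summarize(json.load(f), clock_mhz)
    return summary
```

Calling summarize_instance on ${onchip_test_dir}/{inst_id} with the clock frequency of your build returns one entry per head, keyed by file name.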
