COMPAS-Lab/sparsity-nx10-matmul-gidel-proj

NX 10 Gidel and Quartus project for SpMM core

This is the Gidel and Quartus project for the Block Aggregation SpMM core. Use it to generate the bitstream for the SpMM core.

Prerequisites

  • Quartus 21.4
  • Gidel ProcWizard (only needed if you want to modify peripherals such as HBM)

Build

Using Gidel to generate Quartus project

  1. Open the Gidel project in Gidel ProcWizard
  2. Click "Generate" -> "Generate HDL code"
  3. Move the generated directory into this repo and rename it as you prefer (e.g., quartus_proj_singlecore)

A pre-generated Quartus project is provided here for convenience.

Prepare src files

Specify the path to the SpMM core design generated by SpinalHDL in Block Aggregation SpMM core. If you cloned this repo as a submodule of Block Aggregation SpMM core, the path should be ../../src/generated_spmm_core_6x12x8/. SpinalHDL generates a list file (usually named generated_spmm_core_<core_config>tensor_core_array_wrapper.lst) that names every generated Verilog source file. Make sure every file listed in the .lst file is added to the Quartus settings file src/ic_tcorearray.qsf, for example:

set_global_assignment -name SYSTEMVERILOG_FILE ../../src/main/sverilog/out_asym_fifo.sv
set_global_assignment -name SYSTEMVERILOG_FILE ../../src/main/sverilog/blk_delay_core.sv

set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/enumdefine.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_252.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_240.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_229.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_231.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_49.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/spram_megafunc_583.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/MergeSortRedundancyRemoverUnit.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/MergeSortRedundancyRemoverUnit_6.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/TensorCoreChainBf12.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/TensorCoreChainBf12_60.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_4.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/spram_megafunc.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/CasLoadBubbleInsert.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/AsymBufferN2One.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/FixedBfp12Converter.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/FixedBfp12Converter_1.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamOutAsymFifo.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/IndexGenerator.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_2.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/StreamFifoIp_3.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/TensorCoreChainArray.v
set_global_assignment -name VERILOG_FILE ../../src/generated_spmm_core_6x12x8/tensor_core_array_wrapper.v
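Keeping the QSF in sync with the .lst file by hand is error-prone when the core configuration changes. The following is a minimal sketch of a helper that emits the assignment lines and reports files the QSF does not yet mention. It assumes the .lst file holds one source path per line (not confirmed by this repo); the function names `qsf_assignments`, `qsf_source_names`, and `missing_from_qsf` are hypothetical and not part of the project.

```python
from pathlib import PurePosixPath


def qsf_assignments(lst_text: str, prefix: str = "") -> list[str]:
    """Turn .lst contents into QSF source-file assignment lines.

    Assumption: the .lst file has one source path per line.
    """
    lines = []
    for raw in lst_text.splitlines():
        path = raw.strip()
        if not path:
            continue  # skip blank lines
        # .sv files are SystemVerilog; everything else is treated as Verilog.
        is_sv = PurePosixPath(path).suffix == ".sv"
        kind = "SYSTEMVERILOG_FILE" if is_sv else "VERILOG_FILE"
        lines.append(f"set_global_assignment -name {kind} {prefix}{path}")
    return lines


def qsf_source_names(qsf_text: str) -> set[str]:
    """Collect base names of files referenced by *_FILE assignments."""
    names = set()
    for line in qsf_text.splitlines():
        parts = line.split()
        if (len(parts) >= 4 and parts[0] == "set_global_assignment"
                and parts[2].endswith("_FILE")):
            names.add(PurePosixPath(parts[-1]).name)
    return names


def missing_from_qsf(lst_text: str, qsf_text: str) -> list[str]:
    """Return listed files (by base name) that the QSF does not reference."""
    present = qsf_source_names(qsf_text)
    listed = [PurePosixPath(p.strip()).name
              for p in lst_text.splitlines() if p.strip()]
    return [name for name in listed if name not in present]
```

For example, calling `missing_from_qsf` with the text of the .lst file and the text of src/ic_tcorearray.qsf returns the generated files that still need a `set_global_assignment` line. Comparison is by base name only, so it will not catch a file that is listed under the wrong directory.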

Then copy the QSF file and the top-level SystemVerilog file into the Quartus project directory:

cp src/ic_tcorearray.qsf quartus_proj_singlecore/.
cp src/ic_tcorearray.sv quartus_proj_singlecore/.

Run Quartus compilation

Using quartus_sh is recommended since it supports both local compilation and sbatch submission. To run a full compilation:

cd quartus_proj_singlecore
quartus_sh --flow compile ic_tcorearray

For Slurm-based remote compilation, a sample sbatch script is provided below:

#!/bin/bash
#
#SBATCH --job-name spmm_core
#SBATCH -p <your_partition>
#SBATCH --nodelist=<your_node>
#SBATCH --output sbatch.out
#SBATCH --error sbatch.err
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G

# Note: running a full compilation involves IP generation, 
# which requires a quartus GUI process and X server support
quartus_sh --flow compile ic_tcorearray

# if you want to skip ip_generation phase, use this instead:
# quartus_sh --flow compile ic_tcorearray -start synthesis
# if you only want to rerun P&R, use this instead:
# quartus_sh --flow compile ic_tcorearray -start fitter

Run on-chip SpMM test

A test script, gen_onchip_file_and_runtest.sh, generates the on-chip SpMM test files and runs the test, using the sparse attention values produced by the sparse attention analyzer.

A set of extracted sparse attention values in BFP format is provided to simplify the simulation process. The values were extracted from chatglm2-6b-32k on LongBench's vcsum test. Download the partial BFP-format attention values from COMPAS NFS and extract them.

A copy of the extracted data is located at /compas-old/projects/sparse-attention in COMPAS NFS.

To run the test:

  1. Program the FPGA using Gidel SofLoader:
    SofLoader quartus_files/ic_tcorearray.sof
  2. Reboot the host machine.
  3. Set attn_dir and onchip_test_dir in gen_onchip_file_and_runtest.sh to match the extracted data directory:
    # specify the path to the sparse attention values
    attn_dir="/compas-old/projects/sparse-attention/chatglm2-6b-32k-attn-bfp20-vcsum"
    # intermediate result directory should follow this format:
    # attn_dir/../onchip/<modelname-taskname>
    onchip_test_dir="/compas-old/projects/sparse-attention/onchip/chatglm2-6b-32k-attn-bfp20-vcsum"
  4. Run the test script:
    cd ../sw/scripts
    ./gen_onchip_file_and_runtest.sh
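The comment in step 3 says the intermediate result directory should follow the pattern attn_dir/../onchip/<modelname-taskname>. As a minimal sketch of that rule, the snippet below derives the second path from the first so the two stay consistent. The function name `onchip_dir_for` is hypothetical, and the sketch assumes <modelname-taskname> is simply the last component of attn_dir, which matches the example paths shown in step 3.

```python
from pathlib import PurePosixPath


def onchip_dir_for(attn_dir: str) -> str:
    """Derive onchip_test_dir from attn_dir.

    Assumption: <modelname-taskname> is the last path component of attn_dir,
    and the result lives in a sibling directory named "onchip".
    """
    attn = PurePosixPath(attn_dir)
    return str(attn.parent / "onchip" / attn.name)


attn_dir = "/compas-old/projects/sparse-attention/chatglm2-6b-32k-attn-bfp20-vcsum"
print(onchip_dir_for(attn_dir))
```

With the attn_dir from step 3, this yields the same onchip_test_dir value shown there.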

For each head, the result is written to ${onchip_test_dir}/{inst_id}/hw_config_{head_id}.json. A sample result file follows:

{
    ...
    "mat a size": 27456,
    "mat a vec size": 208,
    "mat b size": 27456,
    "mat b vec size": 208,
    "total_lat_counter_res": 25048,     // total latency in number of clock cycles
    "compute_lat_counter_res": 24212,   // compute latency in number of clock cycles
    "mat_b_load_counter_res": 27456     // mat b load latency in number of clock cycles
}
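The cycle counters can be compared directly once a result file is loaded. The following is a minimal sketch, assuming the files on disk are plain JSON (the `//` comments above are taken to be annotations in this README only) and using the three counter keys from the sample. The function name `summarize` and the derived field names are hypothetical; treating total minus compute as non-compute overhead is an interpretation, not something the project documents.

```python
import json


def summarize(result_text: str) -> dict:
    """Summarize the latency counters of one hw_config_{head_id}.json file.

    Assumption: the file is plain JSON containing the counter keys below.
    """
    result = json.loads(result_text)
    total = result["total_lat_counter_res"]
    compute = result["compute_lat_counter_res"]
    return {
        "total_cycles": total,
        "compute_cycles": compute,
        # Cycles in the total that are not counted as compute (interpretation).
        "non_compute_cycles": total - compute,
        # Share of the total latency counted as compute.
        "compute_fraction": compute / total,
    }


# Counter values copied from the sample above.
sample = """{
    "total_lat_counter_res": 25048,
    "compute_lat_counter_res": 24212,
    "mat_b_load_counter_res": 27456
}"""
print(summarize(sample))
```

All values are in clock cycles. Converting them to time requires the core clock frequency, which is not stated here.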
