* Add tma unittest
* add regular load to TMA benchmark
* make the regular load have the same access pattern as the TMA load
* avoid compiler optimization
* move cuda memcpy to before kernel launch
* add iteration count for tma ubench
* minor formatting
* move tma to ubench folder
* make setup script work with zsh
* fix the issue that ubench all returns 1 even when nothing failed
* add a sample test kernel for mbarrier PTX mapping to SASS
* update gitignore
* add gmma kernels for latency measurement
* increase iter to 1024
* add missed kernels
* add maxflops for gmma
* update block size
* update prints for MaxFlops_gmma
* fix a bug
* fix include after updating it
* fix for cpp and c source
* fix compile
* fix for pattern matching
* fix compilation for mbarrier
* Fix makefile for tma app
* generate SASS and PTX for TMA and GMMA workloads
* update makefile to force PTX to be embedded in final fat bin
* change naming
* comment out parboil as it is using python2
* Add GPU ubench to clean target
* Use dynamic linking by default for GPU apps
* Add test binaries for GMMA instruction
* Checkout CUTLASS during ci
* Use type to specify gmma ubench iteration count and update test code
* Fix typos
* Update Makefiles and setup_environment to use C++17 standard
* missed one rename
* Remove unused clean target and tma build steps from Makefile
---------
Co-authored-by: JRPAN <25518778+JRPan@users.noreply.github.com>
nv-nsight-cu-cli --metrics gpc__cycles_elapsed.avg,sm__cycles_elapsed.sum,smsp__inst_executed.sum,sm__warps_active.avg.pct_of_peak_sustained_active,l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_hit.sum,l1tex__t_sectors_pipe_lsu_mem_global_op_ld.sum,l1tex__t_sectors_pipe_lsu_mem_global_op_st_lookup_hit.sum,l1tex__t_sectors_pipe_lsu_mem_global_op_st.sum,lts__t_sectors_srcunit_tex_op_read.sum,lts__t_sectors_srcunit_tex_op_write.sum,lts__t_sectors_srcunit_tex_op_read_lookup_hit.sum,lts__t_sectors_srcunit_tex_op_write_lookup_hit.sum,lts__t_sector_op_read_hit_rate.pct,lts__t_sector_op_write_hit_rate.pct,lts__t_sectors_srcunit_tex_op_read.sum.per_second,dram__sectors_read.sum,dram__sectors_write.sum,dram__bytes_read.sum --csv --page raw ./$(EXE)| tee nsight.csv
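
The counters collected by the command above come in hit/total pairs, so cache hit rates can be derived by simple division. The sketch below is only an illustration: the metric names are copied from the command, but the `sample` dictionary and the `hit_rate` helper are hypothetical, and it assumes the values have already been parsed out of `nsight.csv` into a name-to-number mapping (the CSV layout itself is not shown here).

```python
# Metric names are copied verbatim from the profiler command above.
L1_LD_HIT = "l1tex__t_sectors_pipe_lsu_mem_global_op_ld_lookup_hit.sum"
L1_LD_ALL = "l1tex__t_sectors_pipe_lsu_mem_global_op_ld.sum"
L2_RD_HIT = "lts__t_sectors_srcunit_tex_op_read_lookup_hit.sum"
L2_RD_ALL = "lts__t_sectors_srcunit_tex_op_read.sum"


def hit_rate(metrics, hit_key, total_key):
    """Return hits / total as a fraction, or None if no sectors were counted.

    `metrics` is a hypothetical dict mapping metric name -> numeric value.
    """
    total = metrics.get(total_key, 0)
    if total == 0:
        return None  # avoid dividing by zero when the kernel issued no accesses
    return metrics[hit_key] / total


# Made-up numbers purely for illustration; real values come from nsight.csv.
sample = {L1_LD_HIT: 750, L1_LD_ALL: 1000, L2_RD_HIT: 200, L2_RD_ALL: 250}

l1_load_hit_rate = hit_rate(sample, L1_LD_HIT, L1_LD_ALL)
l2_read_hit_rate = hit_rate(sample, L2_RD_HIT, L2_RD_ALL)
print(f"L1 global load hit rate: {l1_load_hit_rate:.2%}")
print(f"L2 read hit rate: {l2_read_hit_rate:.2%}")
```

The same helper applies to the store-side pairs (`..._op_st_lookup_hit.sum` over `..._op_st.sum`) collected by the command.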