H100 new apps by Anunalla · Pull Request #86 · accel-sim/gpu-app-collection

Anunalla · 2026-03-22T19:38:15Z

Adding new HPC apps aimed at profiling Hopper.

H100 Benchmark Suite

The H100 suite contains 15 modern GPU workloads from H100 profiling and analysis:

cuFFT (2 apps): FFT operations using cuFFT library
cuSolver (2 apps): Linear algebra using cuSolver library
Image Processing (3 apps): Wavelet transform, Gaussian filter, FDTD3d
Graph Algorithms (2 apps): BFS and MST using cuGraph (git submodule)
Physics Simulation (3 apps): Newton physics engine benchmarks (git submodule)
Computer Vision (3 apps): VPI-based vision processing (requires VPI 4.0)

Copilot

Pull request overview

Adds a new “H100 Benchmark Suite” containing multiple CUDA/HPC workloads (cuFFT, cuSolver, imaging, graph, Newton, VPI) plus supporting build/data tooling and a small microbenchmark fix.

Changes:

Add H100 suite apps (CUDA samples + cuGraph/Newton/VPI integrations) with Makefiles/CMake and new README docs.
Add data-generation scripts for the new H100 workloads and hook them into top-level get_data.sh.
Update GPU microbenchmark bandwidth reporting and fix MEM clock unit handling.

Reviewed changes

Copilot reviewed 87 out of 87 changed files in this pull request and generated 11 comments.


File	Description
src/setup_environment	Detect VPI (optional) and export install/version variables
src/cuda/pytorch_examples	Bump submodule commit
src/cuda/cutlass-bench	Bump submodule commit
src/cuda/H100/vpi/vpi_stereo_disparity/main.cpp	Add VPI stereo disparity sample app
src/cuda/H100/vpi/vpi_stereo_disparity/CMakeLists.txt	CMake build for stereo disparity (optional VPI)
src/cuda/H100/vpi/vpi_orb_feature_detector/main.cpp	Add VPI ORB feature detector sample app
src/cuda/H100/vpi/vpi_orb_feature_detector/CMakeLists.txt	CMake build for ORB feature detector (optional VPI)
src/cuda/H100/vpi/vpi_background_subtractor/main.cpp	Add VPI background subtractor sample app
src/cuda/H100/vpi/vpi_background_subtractor/CMakeLists.txt	CMake build for background subtractor (optional VPI)
src/cuda/H100/setup_vpi.sh	Add helper script to install VPI via apt
src/cuda/H100/newton/setup_newton.sh	Add Newton venv setup + editable install
src/cuda/H100/newton/newton_robot_cartpole	Add Newton wrapper runner
src/cuda/H100/newton/newton_mpm_granular	Add Newton wrapper runner
src/cuda/H100/newton/newton_diffsim_ball	Add Newton wrapper runner
src/cuda/H100/image/recursiveGaussian/recursiveGaussian_kernel.cuh	Add recursive Gaussian CUDA sample kernel
src/cuda/H100/image/recursiveGaussian/recursiveGaussian_cuda.cu	Add recursive Gaussian CUDA sample driver
src/cuda/H100/image/recursiveGaussian/helper_gl.h	Add CUDA sample OpenGL helper header
src/cuda/H100/image/recursiveGaussian/helper_functions.h	Add CUDA sample helper-functions header
src/cuda/H100/image/recursiveGaussian/exception.h	Add CUDA sample exception helper header
src/cuda/H100/image/recursiveGaussian/Makefile	Build recursiveGaussian (no OpenGL)
src/cuda/H100/image/dwtHaar1D/helper_functions.h	Add DWT sample helper-functions header
src/cuda/H100/image/dwtHaar1D/exception.h	Add DWT sample exception helper header
src/cuda/H100/image/dwtHaar1D/dwtHaar1D_kernel.cuh	Add Haar DWT kernel header
src/cuda/H100/image/dwtHaar1D/Makefile	Build dwtHaar1D sample
src/cuda/H100/image/FDTD3d/helper_functions.h	Add FDTD sample helper-functions header
src/cuda/H100/image/FDTD3d/exception.h	Add FDTD sample exception helper header
src/cuda/H100/image/FDTD3d/Makefile	Build FDTD3d sample
src/cuda/H100/image/FDTD3d/FDTD3dReference.h	Add FDTD reference header
src/cuda/H100/image/FDTD3d/FDTD3dReference.cpp	Add CPU reference implementation
src/cuda/H100/image/FDTD3d/FDTD3dGPUKernel.cuh	Add GPU kernel implementation
src/cuda/H100/image/FDTD3d/FDTD3dGPU.h	Add GPU implementation header
src/cuda/H100/image/FDTD3d/FDTD3dGPU.cu	Add GPU implementation
src/cuda/H100/image/FDTD3d/FDTD3d.h	Add app configuration header
src/cuda/H100/image/FDTD3d/FDTD3d.cpp	Add main driver + validation
src/cuda/H100/graph/mst_standalone/mst_standalone.cu	Add cuGraph-based MST runner
src/cuda/H100/graph/mst_standalone/CMakeLists.txt	CMake build for MST runner
src/cuda/H100/graph/bfs_standalone/bfs_standalone.cu	Add cuGraph-based BFS runner
src/cuda/H100/graph/bfs_standalone/CMakeLists.txt	CMake build for BFS runner
src/cuda/H100/get_image_data.sh	Generate test images for image workloads
src/cuda/H100/get_graph_data.sh	Download/generate MTX graphs for graph workloads
src/cuda/H100/get_dwt_data.sh	Generate random signals for DWT workload
src/cuda/H100/generate_graph.py	Generate synthetic scale-free MTX graph
src/cuda/H100/external/newton	Add Newton submodule pointer
src/cuda/H100/external/cugraph	Add cuGraph submodule pointer
src/cuda/H100/cusolver/cusolver_ormqr/cusolver_utils.h	Add cuSolver helpers for ormqr sample
src/cuda/H100/cusolver/cusolver_ormqr/cusolver_ormqr_scalable.cu	Add scalable ormqr sample
src/cuda/H100/cusolver/cusolver_ormqr/Makefile	Build scalable ormqr sample
src/cuda/H100/cusolver/cusolver_Xgetrf/cusolver_utils.h	Add cuSolver helpers for Xgetrf sample
src/cuda/H100/cusolver/cusolver_Xgetrf/cusolver_Xgetrf_scalable.cu	Add scalable Xgetrf sample
src/cuda/H100/cusolver/cusolver_Xgetrf/Makefile	Build scalable Xgetrf sample
src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_reference.h	Add cuFFT reference header
src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_reference.cu	Add cuFFT reference implementation
src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_nvrtc_callback_example.cpp	Add NVRTC-based LTO callback sample
src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_callback_device.cu	Add LTO callback device code
src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_legacy_callback_example.cu	Add legacy callback sample
src/cuda/H100/cufft/cufft_lto_r2c_c2r/nvrtc_helper.h	Add NVRTC helper
src/cuda/H100/cufft/cufft_lto_r2c_c2r/cufft_lto_r2c_c2r_scalable.cpp	Add scalable cuFFT LTO app
src/cuda/H100/cufft/cufft_lto_r2c_c2r/common.h	Add shared cuFFT app helpers
src/cuda/H100/cufft/cufft_lto_r2c_c2r/common.cpp	Add input-signal generator
src/cuda/H100/cufft/cufft_lto_r2c_c2r/callback_params.h	Add callback parameter definitions
src/cuda/H100/cufft/cufft_lto_r2c_c2r/Makefile	Build scalable cuFFT LTO app
src/cuda/H100/cufft/cufft_3d_c2c/cufft_utils.h	Add cuFFT 3D sample helpers
src/cuda/H100/cufft/cufft_3d_c2c/cufft_3d_c2c_scalable.cu	Add scalable cuFFT 3D app
src/cuda/H100/cufft/cufft_3d_c2c/Makefile	Build scalable cuFFT 3D app
src/cuda/H100/README.md	Document H100 suite apps and usage
src/cuda/H100/Makefile	Add H100 suite build orchestration
src/cuda/GPU_Microbenchmark/ubench/mem/mem_bw/mem_bw.cu	Improve achieved BW computation using min/max clocks
src/cuda/GPU_Microbenchmark/hw_def/common/gpuConfig.h	Fix MEM clock unit conversion (keep MHz)
src/Makefile	Add H100 to top-level build + add clean target
get_data.sh	Run H100 data generation after downloading base dataset
README.md	Add H100 suite overview and link to suite README
.gitmodules	Add cuGraph and Newton submodules

Comments suppressed due to low confidence (3)

src/cuda/H100/vpi/vpi_background_subtractor/main.cpp:1
The second isOpened() check is validating outVideo again instead of bgimageVideo, so failures to create bgimageVideo won't be detected. Change the condition to check bgimageVideo.isOpened().

src/cuda/H100/graph/bfs_standalone/bfs_standalone.cu:1
This file uses std::optional, std::numeric_limits, and std::make_shared but does not include <optional>, <limits>, or <memory>. This will fail to compile with stricter toolchains. Add the missing standard library includes.

src/cuda/H100/setup_vpi.sh:1
apt-key is deprecated on modern Ubuntu releases and can fail depending on system policy. Prefer installing the key into /etc/apt/keyrings and referencing it with signed-by= in the .list entry (or use the vendor’s recommended repository setup instructions).
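
The missing-include comment on bfs_standalone.cu can be illustrated with a small standalone sketch. Each standard library facility is paired with the header that declares it, so the code does not rely on transitive includes. The struct and function names here are hypothetical and do not come from the PR.

```cpp
// Sketch only: illustrative names, not the contents of bfs_standalone.cu.
#include <cstdint>
#include <limits>    // std::numeric_limits
#include <memory>    // std::make_shared, std::shared_ptr
#include <optional>  // std::optional, std::nullopt

// Hypothetical stand-in for a per-run traversal configuration.
struct TraversalConfig {
    std::int32_t source_vertex;
    std::int32_t depth_limit;
};

// Returns the source vertex as a 32-bit index, or an empty optional when
// the request is negative, past the vertex count, or too large for int32.
std::optional<std::int32_t> checked_source(std::int64_t requested,
                                           std::int64_t num_vertices) {
    if (requested < 0 || requested >= num_vertices ||
        requested > std::numeric_limits<std::int32_t>::max()) {
        return std::nullopt;
    }
    return static_cast<std::int32_t>(requested);
}

// Builds a shared configuration whose depth limit is effectively unbounded.
std::shared_ptr<TraversalConfig> make_config(std::int32_t source) {
    return std::make_shared<TraversalConfig>(TraversalConfig{
        source, std::numeric_limits<std::int32_t>::max()});
}
```

Code like this may compile without the three headers on toolchains where another header happens to pull them in, which is why the omission can go unnoticed until a stricter compiler or standard library is used.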


src/cuda/H100/cufft/cufft_lto_r2c_c2r/Makefile

src/cuda/HPC/cufft/cufft_lto_r2c_c2r/cufft_lto_r2c_c2r_scalable.cpp

src/cuda/HPC/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_nvrtc_callback_example.cpp

src/cuda/HPC/cufft/cufft_lto_r2c_c2r/r2c_c2r_legacy_callback_example.cu

src/cuda/HPC/graph/bfs_standalone/bfs_standalone.cu

src/cuda/HPC/cusolver/cusolver_ormqr/cusolver_ormqr_scalable.cu

src/Makefile

src/cuda/HPC/README.md

src/cuda/H100/README.md

src/cuda/HPC/cufft/cufft_lto_r2c_c2r/nvrtc_helper.h


tgrogers · 2026-03-24T12:27:52Z

Hey @Anunalla - thanks for the commit.
Can we not call these H100 apps? They are not specific to H100, right? They are just optimized HPC apps built on NVIDIA libraries?
Also, did you get this code from somewhere else? If there is a samples repo somewhere, then we should do something similar to what we do for cutlass, unless we need to make changes to the code.

@JRPan - does this make sense to you?

Anunalla · 2026-03-24T12:43:44Z

I will change the name in the yml def.

And yes, some of these scripts have been modified.
Most of these apps (the MST, BFS, DWT, Gaussian, FDTD3d, cuFFT, and cuSolver examples) have been modified to accept different input sizes as user arguments.
The Newton sim apps have been modified to remove the graph replay feature.

The VPI apps are the only ones being used as is. The user needs to "apt install" libnvvpi.
I will remove the source code and have the build workflow build the apps at the installation location (i.e., /opt/nvidia/vpi/).
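
The comment above says the sample apps were changed to take their input size from the command line. The PR page does not show how that was done, so the following is only a minimal sketch of one way to do it, assuming the size is the first positional argument. `parse_problem_size` and its fallback behaviour are hypothetical.

```cpp
// Sketch only: a hypothetical helper, not code from the PR.
#include <cstddef>
#include <stdexcept>
#include <string>

// Reads a problem size from argv[1]. Falls back to default_size when the
// argument is missing, is not a plain decimal number, is zero, or overflows.
std::size_t parse_problem_size(int argc, const char* const* argv,
                               std::size_t default_size) {
    if (argc < 2) {
        return default_size;
    }
    const std::string arg = argv[1];
    if (arg.empty() ||
        arg.find_first_not_of("0123456789") != std::string::npos) {
        return default_size;
    }
    try {
        const unsigned long long value = std::stoull(arg);
        return value == 0 ? default_size : static_cast<std::size_t>(value);
    } catch (const std::out_of_range&) {
        return default_size;
    }
}
```

A driver could call `parse_problem_size(argc, argv, 1024)` at the top of `main` and size its host and device buffers from the result, so the same binary can be profiled at several input sizes.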

tgrogers · 2026-03-24T12:48:46Z

OK, great. Even for the ones with small diffs, I wonder if we are better off syncing to the main source and applying a diff file. I am not sure what the current best practice in SW is these days. Also, make sure you change the directory name from H100 to something representative of what these are, something like "hpc-with-cuda-libs".


H100 new apps

915fd8d

Anunalla requested review from JRPan and Copilot March 22, 2026 19:38

Copilot AI reviewed Mar 22, 2026


Anunalla and others added 4 commits March 22, 2026 16:12

Update src/cuda/H100/cufft/cufft_lto_r2c_c2r/Makefile

fb56a32

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

H100 new apps-copilot suggestions fix

3281c8b

Merge branch 'add-h100-benchmarks' of github.com:purdue-aalp/gpu-app-…

d2df703

…collection-public into add-h100-benchmarks add copilot suggestions

Corrected build errors

f7de66e

Anunalla and others added 6 commits March 25, 2026 12:24

data folder organization cleanup

fb90e99

Rename suite folder, remove VPI local copies, use symlink

6732954

modify input args

f34bd0e

VP- limit number of frames processed

e155818

Add 5090def, l2_bw ubench bug fix

b3b0f48

Cugraph make file bug fix

3b09605

Anunalla wants to merge 11 commits into accel-sim:dev from purdue-aalp:add-h100-benchmarks
