H100 new apps#86

Open
Anunalla wants to merge 11 commits into accel-sim:dev from purdue-aalp:add-h100-benchmarks

@Anunalla
Contributor

Adding new HPC apps aimed at profiling Hopper.

H100 Benchmark Suite

The H100 suite contains 15 modern GPU workloads from H100 profiling and analysis:

  • cuFFT (2 apps): FFT operations using cuFFT library
  • cuSolver (2 apps): Linear algebra using cuSolver library
  • Image Processing (3 apps): Wavelet transform, Gaussian filter, FDTD3d
  • Graph Algorithms (2 apps): BFS and MST using cuGraph (git submodule)
  • Physics Simulation (3 apps): Newton physics engine benchmarks (git submodule)
  • Computer Vision (3 apps): VPI-based vision processing (requires VPI 4.0)

@Anunalla Anunalla requested review from JRPan and Copilot March 22, 2026 19:38
Contributor

Copilot AI left a comment

Pull request overview

Adds a new “H100 Benchmark Suite” containing multiple CUDA/HPC workloads (cuFFT, cuSolver, imaging, graph, Newton, VPI) plus supporting build/data tooling and a small microbenchmark fix.

Changes:

  • Add H100 suite apps (CUDA samples + cuGraph/Newton/VPI integrations) with Makefiles/CMake and new README docs.
  • Add data-generation scripts for the new H100 workloads and hook them into top-level get_data.sh.
  • Update GPU microbenchmark bandwidth reporting and fix MEM clock unit handling.
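
The last change above concerns bandwidth reporting and memory-clock units. As a rough illustration of why the unit matters, here is a minimal sketch that assumes the usual "clock × transfers per clock × bus width" estimate of peak DRAM bandwidth. The function names, the default of two transfers per clock, and the example numbers are all hypothetical; none of them are taken from `mem_bw.cu` or `gpuConfig.h`.

```python
def peak_bandwidth_gbs(mem_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Estimate theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    The clock is kept in MHz at the interface and converted to Hz exactly
    once, so a caller cannot silently mix kHz and MHz values.
    transfers_per_clock=2 is an assumed double-data-rate default.
    """
    clock_hz = mem_clock_mhz * 1e6
    bytes_per_transfer = bus_width_bits / 8
    return clock_hz * transfers_per_clock * bytes_per_transfer / 1e9


def bandwidth_utilization(measured_gbs, peak_gbs):
    """Return measured bandwidth as a fraction of the theoretical peak."""
    if peak_gbs <= 0:
        raise ValueError("peak_gbs must be positive")
    return measured_gbs / peak_gbs


if __name__ == "__main__":
    # Illustrative numbers only; they do not describe any specific GPU.
    peak = peak_bandwidth_gbs(mem_clock_mhz=1000.0, bus_width_bits=512)
    print(f"peak: {peak:.1f} GB/s")
    print(f"utilization: {bandwidth_utilization(96.0, peak):.0%}")
```

If the clock were passed in kHz while the function expects MHz, the reported peak would be off by a factor of 1000, which is the kind of error a unit fix like this one guards against.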

Reviewed changes

Copilot reviewed 87 out of 87 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| src/setup_environment | Detect VPI (optional) and export install/version variables |
| src/cuda/pytorch_examples | Bump submodule commit |
| src/cuda/cutlass-bench | Bump submodule commit |
| src/cuda/H100/vpi/vpi_stereo_disparity/main.cpp | Add VPI stereo disparity sample app |
| src/cuda/H100/vpi/vpi_stereo_disparity/CMakeLists.txt | CMake build for stereo disparity (optional VPI) |
| src/cuda/H100/vpi/vpi_orb_feature_detector/main.cpp | Add VPI ORB feature detector sample app |
| src/cuda/H100/vpi/vpi_orb_feature_detector/CMakeLists.txt | CMake build for ORB feature detector (optional VPI) |
| src/cuda/H100/vpi/vpi_background_subtractor/main.cpp | Add VPI background subtractor sample app |
| src/cuda/H100/vpi/vpi_background_subtractor/CMakeLists.txt | CMake build for background subtractor (optional VPI) |
| src/cuda/H100/setup_vpi.sh | Add helper script to install VPI via apt |
| src/cuda/H100/newton/setup_newton.sh | Add Newton venv setup + editable install |
| src/cuda/H100/newton/newton_robot_cartpole | Add Newton wrapper runner |
| src/cuda/H100/newton/newton_mpm_granular | Add Newton wrapper runner |
| src/cuda/H100/newton/newton_diffsim_ball | Add Newton wrapper runner |
| src/cuda/H100/image/recursiveGaussian/recursiveGaussian_kernel.cuh | Add recursive Gaussian CUDA sample kernel |
| src/cuda/H100/image/recursiveGaussian/recursiveGaussian_cuda.cu | Add recursive Gaussian CUDA sample driver |
| src/cuda/H100/image/recursiveGaussian/helper_gl.h | Add CUDA sample OpenGL helper header |
| src/cuda/H100/image/recursiveGaussian/helper_functions.h | Add CUDA sample helper-functions header |
| src/cuda/H100/image/recursiveGaussian/exception.h | Add CUDA sample exception helper header |
| src/cuda/H100/image/recursiveGaussian/Makefile | Build recursiveGaussian (no OpenGL) |
| src/cuda/H100/image/dwtHaar1D/helper_functions.h | Add DWT sample helper-functions header |
| src/cuda/H100/image/dwtHaar1D/exception.h | Add DWT sample exception helper header |
| src/cuda/H100/image/dwtHaar1D/dwtHaar1D_kernel.cuh | Add Haar DWT kernel header |
| src/cuda/H100/image/dwtHaar1D/Makefile | Build dwtHaar1D sample |
| src/cuda/H100/image/FDTD3d/helper_functions.h | Add FDTD sample helper-functions header |
| src/cuda/H100/image/FDTD3d/exception.h | Add FDTD sample exception helper header |
| src/cuda/H100/image/FDTD3d/Makefile | Build FDTD3d sample |
| src/cuda/H100/image/FDTD3d/FDTD3dReference.h | Add FDTD reference header |
| src/cuda/H100/image/FDTD3d/FDTD3dReference.cpp | Add CPU reference implementation |
| src/cuda/H100/image/FDTD3d/FDTD3dGPUKernel.cuh | Add GPU kernel implementation |
| src/cuda/H100/image/FDTD3d/FDTD3dGPU.h | Add GPU implementation header |
| src/cuda/H100/image/FDTD3d/FDTD3dGPU.cu | Add GPU implementation |
| src/cuda/H100/image/FDTD3d/FDTD3d.h | Add app configuration header |
| src/cuda/H100/image/FDTD3d/FDTD3d.cpp | Add main driver + validation |
| src/cuda/H100/graph/mst_standalone/mst_standalone.cu | Add cuGraph-based MST runner |
| src/cuda/H100/graph/mst_standalone/CMakeLists.txt | CMake build for MST runner |
| src/cuda/H100/graph/bfs_standalone/bfs_standalone.cu | Add cuGraph-based BFS runner |
| src/cuda/H100/graph/bfs_standalone/CMakeLists.txt | CMake build for BFS runner |
| src/cuda/H100/get_image_data.sh | Generate test images for image workloads |
| src/cuda/H100/get_graph_data.sh | Download/generate MTX graphs for graph workloads |
| src/cuda/H100/get_dwt_data.sh | Generate random signals for DWT workload |
| src/cuda/H100/generate_graph.py | Generate synthetic scale-free MTX graph |
| src/cuda/H100/external/newton | Add Newton submodule pointer |
| src/cuda/H100/external/cugraph | Add cuGraph submodule pointer |
| src/cuda/H100/cusolver/cusolver_ormqr/cusolver_utils.h | Add cuSolver helpers for ormqr sample |
| src/cuda/H100/cusolver/cusolver_ormqr/cusolver_ormqr_scalable.cu | Add scalable ormqr sample |
| src/cuda/H100/cusolver/cusolver_ormqr/Makefile | Build scalable ormqr sample |
| src/cuda/H100/cusolver/cusolver_Xgetrf/cusolver_utils.h | Add cuSolver helpers for Xgetrf sample |
| src/cuda/H100/cusolver/cusolver_Xgetrf/cusolver_Xgetrf_scalable.cu | Add scalable Xgetrf sample |
| src/cuda/H100/cusolver/cusolver_Xgetrf/Makefile | Build scalable Xgetrf sample |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_reference.h | Add cuFFT reference header |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_reference.cu | Add cuFFT reference implementation |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_nvrtc_callback_example.cpp | Add NVRTC-based LTO callback sample |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_callback_device.cu | Add LTO callback device code |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_legacy_callback_example.cu | Add legacy callback sample |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/nvrtc_helper.h | Add NVRTC helper |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/cufft_lto_r2c_c2r_scalable.cpp | Add scalable cuFFT LTO app |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/common.h | Add shared cuFFT app helpers |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/common.cpp | Add input-signal generator |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/callback_params.h | Add callback parameter definitions |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/Makefile | Build scalable cuFFT LTO app |
| src/cuda/H100/cufft/cufft_3d_c2c/cufft_utils.h | Add cuFFT 3D sample helpers |
| src/cuda/H100/cufft/cufft_3d_c2c/cufft_3d_c2c_scalable.cu | Add scalable cuFFT 3D app |
| src/cuda/H100/cufft/cufft_3d_c2c/Makefile | Build scalable cuFFT 3D app |
| src/cuda/H100/README.md | Document H100 suite apps and usage |
| src/cuda/H100/Makefile | Add H100 suite build orchestration |
| src/cuda/GPU_Microbenchmark/ubench/mem/mem_bw/mem_bw.cu | Improve achieved BW computation using min/max clocks |
| src/cuda/GPU_Microbenchmark/hw_def/common/gpuConfig.h | Fix MEM clock unit conversion (keep MHz) |
| src/Makefile | Add H100 to top-level build + add clean target |
| get_data.sh | Run H100 data generation after downloading base dataset |
| README.md | Add H100 suite overview and link to suite README |
| .gitmodules | Add cuGraph and Newton submodules |

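
The summary above lists `generate_graph.py` as producing a synthetic scale-free graph in MTX (Matrix Market) format. The PR page does not show how that script works, so the following is only a sketch of one common approach: Barabási–Albert preferential attachment, written out as a symmetric pattern matrix. The function names, parameters, and seed handling are assumptions for illustration, not the script's actual interface.

```python
import io
import random


def barabasi_albert_edges(num_nodes, edges_per_node, seed=0):
    """Grow an undirected graph by preferential attachment.

    Returns a sorted list of edges (u, v) with u > v and 0-based node ids.
    """
    if not 1 <= edges_per_node < num_nodes:
        raise ValueError("need 1 <= edges_per_node < num_nodes")
    rng = random.Random(seed)
    edges = set()
    # Every node appears in `pool` once per incident edge (seed nodes start
    # with one entry each), so a uniform draw from `pool` picks a node with
    # probability roughly proportional to its degree.
    pool = list(range(edges_per_node))
    for new in range(edges_per_node, num_nodes):
        chosen = set()
        while len(chosen) < edges_per_node:
            chosen.add(rng.choice(pool))
        for old in chosen:
            edges.add((new, old))
            pool.extend((new, old))
    return sorted(edges)


def write_mtx(edges, num_nodes, out):
    """Write edges as a Matrix Market symmetric pattern matrix."""
    out.write("%%MatrixMarket matrix coordinate pattern symmetric\n")
    out.write(f"{num_nodes} {num_nodes} {len(edges)}\n")
    for u, v in edges:
        # MTX indices are 1-based; u > v keeps entries in the lower triangle.
        out.write(f"{u + 1} {v + 1}\n")


if __name__ == "__main__":
    graph_edges = barabasi_albert_edges(num_nodes=1000, edges_per_node=4, seed=42)
    buffer = io.StringIO()
    write_mtx(graph_edges, 1000, buffer)
    print(buffer.getvalue().splitlines()[1])  # the "rows cols nnz" size line
```

A fixed seed keeps the generated input reproducible across runs, which matters when the same graph is used for both hardware profiling and simulation.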
Comments suppressed due to low confidence (3)

src/cuda/H100/vpi/vpi_background_subtractor/main.cpp:1

  • The second isOpened() check is validating outVideo again instead of bgimageVideo, so failures to create bgimageVideo won't be detected. Change the condition to check bgimageVideo.isOpened().

src/cuda/H100/graph/bfs_standalone/bfs_standalone.cu:1

  • This file uses std::optional, std::numeric_limits, and std::make_shared but does not include <optional>, <limits>, or <memory>. This will fail to compile with stricter toolchains. Add the missing standard library includes.

src/cuda/H100/setup_vpi.sh:1

  • apt-key is deprecated on modern Ubuntu releases and can fail depending on system policy. Prefer installing the key into /etc/apt/keyrings and referencing it with signed-by= in the .list entry (or use the vendor’s recommended repository setup instructions).

Anunalla and others added 4 commits March 22, 2026 16:12
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…collection-public into add-h100-benchmarks add copilot suggestions
@tgrogers
Contributor

Hey @Anunalla - thanks for the commit.
Can we not call these H100 apps? They are not specific to H100, right? They are just optimized HPC apps built on NVIDIA libraries?
Also - did you get this code from somewhere else? If there are sample repos or something somewhere, then we should just do something similar to cutlass, unless we need to make changes to the code.

@JRPan - does this make sense to you?

@Anunalla
Contributor Author

I will change the name in the yml def.

And yes, some of these scripts have been modified.
Most of these apps (the MST, BFS, DWT, Gaussian, FDTD3d, cuFFT, and cuSolver examples) have been modified to accept different input sizes as user arguments.
The Newton sim apps have been modified to remove the graph replay feature.
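
To illustrate what "input sizes as user arguments" can look like, here is a minimal sketch of a size-scalable driver's argument handling. It is written in Python for brevity; the CUDA apps in this PR would do the equivalent in their own `main()`. The flag names `--size` and `--iterations` and their defaults are hypothetical and are not taken from the PR.

```python
import argparse


def parse_args(argv=None):
    """Parse a problem size and iteration count for a benchmark run."""
    parser = argparse.ArgumentParser(
        description="Hypothetical size-scalable benchmark driver"
    )
    parser.add_argument("--size", type=int, default=1024,
                        help="problem size, e.g. signal length or matrix dimension")
    parser.add_argument("--iterations", type=int, default=10,
                        help="number of timed repetitions")
    args = parser.parse_args(argv)
    # Reject sizes that would make the workload meaningless.
    if args.size <= 0:
        parser.error("--size must be positive")
    if args.iterations <= 0:
        parser.error("--iterations must be positive")
    return args


if __name__ == "__main__":
    config = parse_args()
    print(f"running with size={config.size}, iterations={config.iterations}")
```

Keeping small defaults means a run with no arguments finishes quickly, while larger sizes can be requested explicitly on the command line.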

The VPI apps are the only ones being used as is. The user needs to "apt install" libnvvpi.
I will remove their source code and have the build workflow build them at the installation location (i.e., /opt/nvidia/vpi/).

@tgrogers
Contributor

tgrogers commented Mar 24, 2026 via email
