Pull request overview
Adds a new “H100 Benchmark Suite” containing multiple CUDA/HPC workloads (cuFFT, cuSolver, imaging, graph, Newton, VPI) plus supporting build/data tooling and a small microbenchmark fix.
Changes:
- Add H100 suite apps (CUDA samples + cuGraph/Newton/VPI integrations) with Makefiles/CMake and new README docs.
- Add data-generation scripts for the new H100 workloads and hook them into top-level `get_data.sh`.
- Update GPU microbenchmark bandwidth reporting and fix MEM clock unit handling.
Reviewed changes
Copilot reviewed 87 out of 87 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/setup_environment | Detect VPI (optional) and export install/version variables |
| src/cuda/pytorch_examples | Bump submodule commit |
| src/cuda/cutlass-bench | Bump submodule commit |
| src/cuda/H100/vpi/vpi_stereo_disparity/main.cpp | Add VPI stereo disparity sample app |
| src/cuda/H100/vpi/vpi_stereo_disparity/CMakeLists.txt | CMake build for stereo disparity (optional VPI) |
| src/cuda/H100/vpi/vpi_orb_feature_detector/main.cpp | Add VPI ORB feature detector sample app |
| src/cuda/H100/vpi/vpi_orb_feature_detector/CMakeLists.txt | CMake build for ORB feature detector (optional VPI) |
| src/cuda/H100/vpi/vpi_background_subtractor/main.cpp | Add VPI background subtractor sample app |
| src/cuda/H100/vpi/vpi_background_subtractor/CMakeLists.txt | CMake build for background subtractor (optional VPI) |
| src/cuda/H100/setup_vpi.sh | Add helper script to install VPI via apt |
| src/cuda/H100/newton/setup_newton.sh | Add Newton venv setup + editable install |
| src/cuda/H100/newton/newton_robot_cartpole | Add Newton wrapper runner |
| src/cuda/H100/newton/newton_mpm_granular | Add Newton wrapper runner |
| src/cuda/H100/newton/newton_diffsim_ball | Add Newton wrapper runner |
| src/cuda/H100/image/recursiveGaussian/recursiveGaussian_kernel.cuh | Add recursive Gaussian CUDA sample kernel |
| src/cuda/H100/image/recursiveGaussian/recursiveGaussian_cuda.cu | Add recursive Gaussian CUDA sample driver |
| src/cuda/H100/image/recursiveGaussian/helper_gl.h | Add CUDA sample OpenGL helper header |
| src/cuda/H100/image/recursiveGaussian/helper_functions.h | Add CUDA sample helper-functions header |
| src/cuda/H100/image/recursiveGaussian/exception.h | Add CUDA sample exception helper header |
| src/cuda/H100/image/recursiveGaussian/Makefile | Build recursiveGaussian (no OpenGL) |
| src/cuda/H100/image/dwtHaar1D/helper_functions.h | Add DWT sample helper-functions header |
| src/cuda/H100/image/dwtHaar1D/exception.h | Add DWT sample exception helper header |
| src/cuda/H100/image/dwtHaar1D/dwtHaar1D_kernel.cuh | Add Haar DWT kernel header |
| src/cuda/H100/image/dwtHaar1D/Makefile | Build dwtHaar1D sample |
| src/cuda/H100/image/FDTD3d/helper_functions.h | Add FDTD sample helper-functions header |
| src/cuda/H100/image/FDTD3d/exception.h | Add FDTD sample exception helper header |
| src/cuda/H100/image/FDTD3d/Makefile | Build FDTD3d sample |
| src/cuda/H100/image/FDTD3d/FDTD3dReference.h | Add FDTD reference header |
| src/cuda/H100/image/FDTD3d/FDTD3dReference.cpp | Add CPU reference implementation |
| src/cuda/H100/image/FDTD3d/FDTD3dGPUKernel.cuh | Add GPU kernel implementation |
| src/cuda/H100/image/FDTD3d/FDTD3dGPU.h | Add GPU implementation header |
| src/cuda/H100/image/FDTD3d/FDTD3dGPU.cu | Add GPU implementation |
| src/cuda/H100/image/FDTD3d/FDTD3d.h | Add app configuration header |
| src/cuda/H100/image/FDTD3d/FDTD3d.cpp | Add main driver + validation |
| src/cuda/H100/graph/mst_standalone/mst_standalone.cu | Add cuGraph-based MST runner |
| src/cuda/H100/graph/mst_standalone/CMakeLists.txt | CMake build for MST runner |
| src/cuda/H100/graph/bfs_standalone/bfs_standalone.cu | Add cuGraph-based BFS runner |
| src/cuda/H100/graph/bfs_standalone/CMakeLists.txt | CMake build for BFS runner |
| src/cuda/H100/get_image_data.sh | Generate test images for image workloads |
| src/cuda/H100/get_graph_data.sh | Download/generate MTX graphs for graph workloads |
| src/cuda/H100/get_dwt_data.sh | Generate random signals for DWT workload |
| src/cuda/H100/generate_graph.py | Generate synthetic scale-free MTX graph |
| src/cuda/H100/external/newton | Add Newton submodule pointer |
| src/cuda/H100/external/cugraph | Add cuGraph submodule pointer |
| src/cuda/H100/cusolver/cusolver_ormqr/cusolver_utils.h | Add cuSolver helpers for ormqr sample |
| src/cuda/H100/cusolver/cusolver_ormqr/cusolver_ormqr_scalable.cu | Add scalable ormqr sample |
| src/cuda/H100/cusolver/cusolver_ormqr/Makefile | Build scalable ormqr sample |
| src/cuda/H100/cusolver/cusolver_Xgetrf/cusolver_utils.h | Add cuSolver helpers for Xgetrf sample |
| src/cuda/H100/cusolver/cusolver_Xgetrf/cusolver_Xgetrf_scalable.cu | Add scalable Xgetrf sample |
| src/cuda/H100/cusolver/cusolver_Xgetrf/Makefile | Build scalable Xgetrf sample |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_reference.h | Add cuFFT reference header |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_reference.cu | Add cuFFT reference implementation |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_nvrtc_callback_example.cpp | Add NVRTC-based LTO callback sample |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_callback_device.cu | Add LTO callback device code |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/r2c_c2r_legacy_callback_example.cu | Add legacy callback sample |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/nvrtc_helper.h | Add NVRTC helper |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/cufft_lto_r2c_c2r_scalable.cpp | Add scalable cuFFT LTO app |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/common.h | Add shared cuFFT app helpers |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/common.cpp | Add input-signal generator |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/callback_params.h | Add callback parameter definitions |
| src/cuda/H100/cufft/cufft_lto_r2c_c2r/Makefile | Build scalable cuFFT LTO app |
| src/cuda/H100/cufft/cufft_3d_c2c/cufft_utils.h | Add cuFFT 3D sample helpers |
| src/cuda/H100/cufft/cufft_3d_c2c/cufft_3d_c2c_scalable.cu | Add scalable cuFFT 3D app |
| src/cuda/H100/cufft/cufft_3d_c2c/Makefile | Build scalable cuFFT 3D app |
| src/cuda/H100/README.md | Document H100 suite apps and usage |
| src/cuda/H100/Makefile | Add H100 suite build orchestration |
| src/cuda/GPU_Microbenchmark/ubench/mem/mem_bw/mem_bw.cu | Improve achieved BW computation using min/max clocks |
| src/cuda/GPU_Microbenchmark/hw_def/common/gpuConfig.h | Fix MEM clock unit conversion (keep MHz) |
| src/Makefile | Add H100 to top-level build + add clean target |
| get_data.sh | Run H100 data generation after downloading base dataset |
| README.md | Add H100 suite overview and link to suite README |
| .gitmodules | Add cuGraph and Newton submodules |
Comments suppressed due to low confidence (3)
src/cuda/H100/vpi/vpi_background_subtractor/main.cpp:1
- The second `isOpened()` check is validating `outVideo` again instead of `bgimageVideo`, so failures to create `bgimageVideo` won't be detected. Change the condition to check `bgimageVideo.isOpened()`.
src/cuda/H100/graph/bfs_standalone/bfs_standalone.cu:1
- This file uses `std::optional`, `std::numeric_limits`, and `std::make_shared` but does not include `<optional>`, `<limits>`, or `<memory>`. This will fail to compile with stricter toolchains. Add the missing standard library includes.
src/cuda/H100/setup_vpi.sh:1
- `apt-key` is deprecated on modern Ubuntu releases and can fail depending on system policy. Prefer installing the key into `/etc/apt/keyrings` and referencing it with `signed-by=` in the `.list` entry (or use the vendor's recommended repository setup instructions).
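The missing-include comment on bfs_standalone.cu can be illustrated with a minimal sketch. It is not code from the PR: the helper names (`kUnreached`, `make_distances`, `reached_distance`) are hypothetical and only show each standard facility paired with the header that declares it, assuming a C++17 compiler.

```cpp
// Each std:: facility used below gets its own header instead of relying on
// transitive includes, which differ between toolchains.
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstdint>   // std::int32_t
#include <limits>    // std::numeric_limits
#include <memory>    // std::make_shared, std::shared_ptr
#include <optional>  // std::optional, std::nullopt
#include <vector>    // std::vector

// Hypothetical sentinel: distance value meaning "vertex not reached".
constexpr std::int32_t kUnreached = std::numeric_limits<std::int32_t>::max();

// Hypothetical helper: allocate a shared distance array of n entries,
// all initialised to the "unreached" sentinel.
std::shared_ptr<std::vector<std::int32_t>> make_distances(std::size_t n) {
    return std::make_shared<std::vector<std::int32_t>>(n, kUnreached);
}

// Hypothetical helper: return the distance of vertex v, or std::nullopt
// if the vertex still holds the sentinel value.
std::optional<std::int32_t> reached_distance(
    const std::vector<std::int32_t>& dist, std::size_t v) {
    assert(v < dist.size());
    if (dist[v] == kUnreached) {
        return std::nullopt;
    }
    return dist[v];
}
```

Listing the headers explicitly keeps the file compiling when a library or compiler update stops pulling them in indirectly.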
src/cuda/HPC/cufft/cufft_lto_r2c_c2r/cufft_lto_r2c_c2r_scalable.cpp (outdated, resolved)
src/cuda/HPC/cufft/cufft_lto_r2c_c2r/r2c_c2r_lto_nvrtc_callback_example.cpp (resolved)
src/cuda/HPC/cusolver/cusolver_ormqr/cusolver_ormqr_scalable.cu (outdated, resolved)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…collection-public into add-h100-benchmarks add copilot suggestions
Hey @Anunalla - thanks for the commit. @JRPan - does this make sense to you?

I will change the name in the yml def. And yes, some of these scripts have been modified. VPI apps are the only ones that are being used as is. The user needs to "apt install" libnvvpi.

ok - great
Even for the ones with small diffs, I wonder if we are better off syncing to the main source than applying a diff file. I am not sure what the current best practice in SW is these days.
Also, make sure you change the directory name from H100 to something representative of what these are, something like "hpc-with-cuda-libs".
…On Tue, Mar 24, 2026 at 8:44 AM Anusuya Nallathambi < ***@***.***> wrote:
*Anunalla* left a comment (accel-sim/gpu-app-collection#86)
I will change the name in the yml def.
And yes, some of these scripts have been modified.
Most of these apps (the MST, BFS, DWT, Gaussian, FDTD3d, cuFFT, and cuSOLVER examples) have been modified to accept different input sizes as a user argument.
The Newton sim apps have been modified to remove the graph replay feature.
VPI apps are the only ones that are being used as is. The user needs to "apt install" libnvvpi.
I will remove the source code and have the build workflow build it at the installation location (i.e., /opt/nvidia/vpi/).
Adding new HPC apps aimed at profiling Hopper
H100 Benchmark Suite
The H100 suite contains 15 modern GPU workloads from H100 profiling and analysis: