
[DO NOT MERGE] Add offline throughput benchmark script for multi-modal models #18154

Open
haojin2 wants to merge 1 commit into sgl-project:main from haojin2:offline_bench

Conversation

@haojin2
Contributor

@haojin2 haojin2 commented Feb 3, 2026

Motivation

Addresses part of step 1 for #18077

Modifications

  • Added bench_offline_throughput.py under multimodal_gen, mirroring the existing counterpart for LLMs (a rough sketch of the benchmark loop is shown below)
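
For orientation, here is a minimal sketch of the shape such an offline throughput benchmark takes. The `pipeline` object and its `generate()` signature are illustrative assumptions, not the actual sglang.multimodal_gen API:

```python
# Minimal sketch of an offline throughput benchmark loop for image generation.
# The `pipeline` object and its generate() signature are assumptions for
# illustration only, not the actual sglang.multimodal_gen API.
import time

def run_offline_benchmark(pipeline, prompts, height=512, width=512, steps=20):
    successes = 0
    start = time.perf_counter()
    for prompt in prompts:
        try:
            pipeline.generate(prompt, height=height, width=width,
                              num_inference_steps=steps)
            successes += 1
        except Exception:
            pass  # count as a failed request
    duration = time.perf_counter() - start
    frames = successes  # one frame per successful image request
    megapixels = frames * height * width / 1e6
    return {
        "total_duration_s": duration,
        "frame_throughput": frames / duration,
        "mp_throughput": megapixels / duration,
        "latency_per_request_s": duration / max(successes, 1),
    }
```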

Accuracy Tests

N/A

Benchmarking and Profiling

  • Sample single-GPU (RTX 6000 pro) run with GLM-Image + sglang backend + torch.compile:
    python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput --model-path zai-org/GLM-Image --height 512 --width 512 --num-inference-steps 20 --backend sglang --enable-torch-compile --num-prompts 20 --batch-size 1
    Resulting report:
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          20                            
---------------------------------------------------------------------------
Total Requests:                               20                            
Successful Requests:                          20                            
Failed Requests:                              0                             
Total Duration (seconds):                     233.38                        
---------------------------------------------------------------------------
Frames Generated:                             20                            
Megapixels Generated:                         5.24                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.0857                        
MP Throughput (MP/sec):                       0.0225                        
Requests Per Second:                          0.0857                        
Latency Per Request (sec):                    11.6688                       
Peak Memory (MB):                             0                             
==============================================================================================================
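
As a sanity check, the headline figures above are internally consistent:

```python
# Reproducing the headline numbers from the report above.
frames, height, width, duration = 20, 512, 512, 233.38
print(frames * height * width / 1e6)  # 5.24 megapixels generated
print(frames / duration)              # ~0.0857 frames/sec (== req/s at batch size 1)
print(duration / frames)              # ~11.67 s latency per request
```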
  • Sample single-GPU (RTX 6000 pro) run with GLM-Image + diffusers backend:
    python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput --model-path zai-org/GLM-Image --height 512 --width 512 --num-inference-steps 20 --backend diffusers --num-prompts 20 --batch-size 1
    Resulting report:
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          20                            
---------------------------------------------------------------------------
Total Requests:                               20                            
Successful Requests:                          20                            
Failed Requests:                              0                             
Total Duration (seconds):                     246.26                        
---------------------------------------------------------------------------
Frames Generated:                             20                            
Megapixels Generated:                         5.24                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.0812                        
MP Throughput (MP/sec):                       0.0213                        
Requests Per Second:                          0.0812                        
Latency Per Request (sec):                    12.3132                       
Peak Memory (MB):                             0                             
==============================================================================================================
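
Side by side, the sglang backend with torch.compile finishes the identical workload in 233.38 s versus 246.26 s for diffusers, i.e. roughly a 5.5% throughput advantage (0.0857 vs. 0.0812 frames/sec) on this GPU.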
  • Verification of refactored bench_serving.py (on RTX 6000 pro) with GLM-Image

server: sglang serve --model-path zai-org/GLM-Image --backend sglang
bench_serving: python3 -m sglang.multimodal_gen.benchmarks.bench_serving --dataset vbench --num-prompts 10 --width 512 --height 512 --model zai-org/GLM-Image

================= Serving Benchmark Result =================
Task:                                    text-to-image  
Model:                                   zai-org/GLM-Image
Dataset:                                 vbench         
--------------------------------------------------
Benchmark duration (s):                  131.30         
Request rate:                            inf            
Max request concurrency:                 1              
Successful requests:                     10/10             
--------------------------------------------------
Request throughput (req/s):              0.08           
Latency Mean (s):                        13.1293        
Latency Median (s):                      12.9035        
Latency P99 (s):                         14.9457        
--------------------------------------------------
Peak Memory Max (MB):                    35387.64       
Peak Memory Mean (MB):                   35387.45       
Peak Memory Median (MB):                 35387.64       
============================================================
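
For reference, the latency statistics in such a serving report can be aggregated from per-request latencies roughly as sketched below; the function is illustrative only, not the script's actual internals. (Note that 10 requests over 131.30 s gives 10 / 131.30 ≈ 0.076 req/s, shown rounded as 0.08.)

```python
# Aggregating per-request latencies into mean/median/P99 figures like those
# in the serving report above. Illustrative sketch only; the real script's
# internals may differ.
import statistics

def summarize_latencies(latencies_s, duration_s):
    ordered = sorted(latencies_s)
    n = len(ordered)
    # Nearest-rank approximation of the 99th percentile.
    p99_index = min(n - 1, round(0.99 * (n - 1)))
    return {
        "request_throughput": n / duration_s,
        "latency_mean_s": statistics.mean(ordered),
        "latency_median_s": statistics.median(ordered),
        "latency_p99_s": ordered[p99_index],
    }
```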
  • TODO: verify runnability on all currently supported models under multimodal_gen

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Feb 3, 2026
@haojin2 haojin2 force-pushed the offline_bench branch 2 times, most recently from 466c89f to b81f932 Compare February 3, 2026 06:25
@mickqian
Collaborator

mickqian commented Feb 3, 2026

Also, could you clean up the code a bit?

@haojin2
Contributor Author

haojin2 commented Feb 5, 2026

cc @zhaochenyang20: refactored as requested.
Also tested the new bench_serving script.

