
[DO NOT MERGE] Add offline throughput benchmark script for multi-modal models #18154

Open
haojin2 wants to merge 1 commit into sgl-project:main from haojin2:offline_bench

Conversation

@haojin2
Contributor

@haojin2 haojin2 commented Feb 3, 2026

Motivation

Addresses part of step 1 for #18077

Modifications

  • Added bench_offline_throughput.py under multimodal_gen, mirroring the existing counterpart for LLMs (a rough sketch of the benchmark loop is shown below)
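
For orientation, here is a minimal sketch of the shape such an offline throughput benchmark takes. The `pipeline` object and its `generate()` signature are illustrative assumptions, not the actual sglang.multimodal_gen API:

```python
# Minimal sketch of an offline throughput benchmark loop for image generation.
# The `pipeline` object and its generate() signature are assumptions for
# illustration only, not the actual sglang.multimodal_gen API.
import time

def run_offline_benchmark(pipeline, prompts, height=512, width=512, steps=20):
    successes = 0
    start = time.perf_counter()
    for prompt in prompts:
        try:
            pipeline.generate(prompt, height=height, width=width,
                              num_inference_steps=steps)
            successes += 1
        except Exception:
            pass  # count as a failed request
    duration = time.perf_counter() - start
    frames = successes  # one frame per successful image request
    megapixels = frames * height * width / 1e6
    return {
        "total_duration_s": duration,
        "frame_throughput": frames / duration,
        "mp_throughput": megapixels / duration,
        "latency_per_request_s": duration / max(successes, 1),
    }
```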

Accuracy Tests

N/A

Benchmarking and Profiling

  • Sample single-GPU (RTX 6000 pro) run with GLM-Image + sglang backend + torch.compile:
    python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput --model-path zai-org/GLM-Image --height 512 --width 512 --num-inference-steps 20 --backend sglang --enable-torch-compile --num-prompts 20 --batch-size 1
    Resulting report:
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          20                            
---------------------------------------------------------------------------
Total Requests:                               20                            
Successful Requests:                          20                            
Failed Requests:                              0                             
Total Duration (seconds):                     233.38                        
---------------------------------------------------------------------------
Frames Generated:                             20                            
Megapixels Generated:                         5.24                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.0857                        
MP Throughput (MP/sec):                       0.0225                        
Requests Per Second:                          0.0857                        
Latency Per Request (sec):                    11.6688                       
Peak Memory (MB):                             0                             
==============================================================================================================
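
As a sanity check, the headline figures above are internally consistent:

```python
# Reproducing the headline numbers from the report above.
frames, height, width, duration = 20, 512, 512, 233.38
print(frames * height * width / 1e6)  # 5.24 megapixels generated
print(frames / duration)              # ~0.0857 frames/sec (== req/s at batch size 1)
print(duration / frames)              # ~11.67 s latency per request
```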
  • Sample single-GPU (RTX 6000 pro) run with GLM-Image + diffusers backend:
    python3 -m sglang.multimodal_gen.benchmarks.bench_offline_throughput --model-path zai-org/GLM-Image --height 512 --width 512 --num-inference-steps 20 --backend diffusers --num-prompts 20 --batch-size 1
    Resulting report:
==================================== Offline Throughput Benchmark Result =====================================
Model:                                        zai-org/GLM-Image             
Dataset:                                      random                        
Resolution:                                   512x512x1                     
Num Inference Steps:                          20                            
---------------------------------------------------------------------------
Total Requests:                               20                            
Successful Requests:                          20                            
Failed Requests:                              0                             
Total Duration (seconds):                     246.26                        
---------------------------------------------------------------------------
Frames Generated:                             20                            
Megapixels Generated:                         5.24                          
---------------------------------------------------------------------------
Frame Throughput (frames/sec):                0.0812                        
MP Throughput (MP/sec):                       0.0213                        
Requests Per Second:                          0.0812                        
Latency Per Request (sec):                    12.3132                       
Peak Memory (MB):                             0                             
==============================================================================================================
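
Side by side, the sglang backend with torch.compile finishes the identical workload in 233.38 s versus 246.26 s for diffusers, i.e. roughly a 5.5% throughput advantage (0.0857 vs. 0.0812 frames/sec) on this GPU.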
  • Verification of refactored bench_serving.py (on RTX 6000 pro) with GLM-Image

server: sglang serve --model-path zai-org/GLM-Image --backend sglang
bench_serving: python3 -m sglang.multimodal_gen.benchmarks.bench_serving --dataset vbench --num-prompts 10 --width 512 --height 512 --model zai-org/GLM-Image

================= Serving Benchmark Result =================
Task:                                    text-to-image  
Model:                                   zai-org/GLM-Image
Dataset:                                 vbench         
--------------------------------------------------
Benchmark duration (s):                  131.30         
Request rate:                            inf            
Max request concurrency:                 1              
Successful requests:                     10/10             
--------------------------------------------------
Request throughput (req/s):              0.08           
Latency Mean (s):                        13.1293        
Latency Median (s):                      12.9035        
Latency P99 (s):                         14.9457        
--------------------------------------------------
Peak Memory Max (MB):                    35387.64       
Peak Memory Mean (MB):                   35387.45       
Peak Memory Median (MB):                 35387.64       
============================================================
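
For reference, the latency statistics in such a serving report can be aggregated from per-request latencies roughly as sketched below; the function is illustrative only, not the script's actual internals. (Note that 10 requests over 131.30 s gives 10 / 131.30 ≈ 0.076 req/s, shown rounded as 0.08.)

```python
# Aggregating per-request latencies into mean/median/P99 figures like those
# in the serving report above. Illustrative sketch only; the real script's
# internals may differ.
import statistics

def summarize_latencies(latencies_s, duration_s):
    ordered = sorted(latencies_s)
    n = len(ordered)
    # Nearest-rank approximation of the 99th percentile.
    p99_index = min(n - 1, round(0.99 * (n - 1)))
    return {
        "request_throughput": n / duration_s,
        "latency_mean_s": statistics.mean(ordered),
        "latency_median_s": statistics.median(ordered),
        "latency_p99_s": ordered[p99_index],
    }
```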
  • TODO: verify runnability on all currently supported models under multimodal_gen

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions github-actions bot added the diffusion SGLang Diffusion label Feb 3, 2026
@haojin2 haojin2 force-pushed the offline_bench branch 2 times, most recently from 466c89f to b81f932 Compare February 3, 2026 06:25
@mickqian
Collaborator

mickqian commented Feb 3, 2026

Also, could you clean up the code a bit?

@haojin2
Contributor Author

haojin2 commented Feb 5, 2026

cc @zhaochenyang20: refactored as requested.
Also tested the new bench_serving script.

