Conversation


@yenuo26 commented on Dec 18, 2025

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

vllm bench serve --omni --dataset-name random-mm \
    --port 40473 \
    --model /data/models/Qwen2.5-Omni-7B \
    --endpoint /v1/chat/completions \
    --backend openai-chat \
    --request-rate 1 \
    --num-prompts 1 \
    --random-input-len 10 \
    --random-range-ratio 0.0 \
    --random-mm-base-items-per-request 2 \
    --random-mm-num-mm-items-range-ratio 0 \
    --random-mm-limit-mm-per-prompt '{"image": 1, "video": 1, "audio": 1}' \
    --random-mm-bucket-config '{"(32, 32, 1)": 0.5, "(0, 1, 1)": 0.1, "(32, 32, 2)": 0.4}' \
    --ignore-eos \
    --random-output-len 2

Test Result

Successful requests:                     1
Request rate configured (RPS):           1.00
Benchmark duration (s):                  140.59
Total input tokens:                      181
Total text input tokens:                 10
Total generated tokens:                  64
Request throughput (req/s):              0.01
Audio throughput (num/s):                0.01
Output token throughput (tok/s):         0.46
Peak output token throughput (tok/s):    1.00
Peak concurrent requests:                1.00
Total Token throughput (tok/s):          1.74
-Time to First Token-
Mean TTFT (ms):                          139586.53
Median TTFT (ms):                        139586.53
P99 TTFT (ms):                           139586.53
-Time per Output Token (excl. 1st token)-
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
-Inter-token Latency-
Mean ITL (ms):                           0.00
Median ITL (ms):                         0.00
P99 ITL (ms):                            0.00
-End-to-end Latency-
Mean E2EL (ms):                          139586.53
Median E2EL (ms):                        139586.53
P99 E2EL (ms):                           139586.53


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

wangyu31577 and others added 29 commits December 11, 2025 16:22
# Conflicts:
#	docs/api/README.md
#	docs/user_guide/examples/offline_inference/text_to_image.md

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +563 to +564
print("{:<40} {:<10}".format("Total input tokens:", metrics.total_input))
print("{:<40} {:<10}".format("Total text input tokens:", metrics.total_text_input))

P1: Guard embedding metrics before printing text token counts

When benchmarking the embeddings endpoint (api_url ending with /v1/embeddings), metrics comes from calculate_metrics_for_embeddings and is an EmbedBenchmarkMetrics instance that only carries aggregate input/output timing fields. The unguarded print of metrics.total_text_input here raises an AttributeError before any results are reported, so embedding benchmarks always fail. This block should check the metric type or skip text-token reporting for embedding runs.
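
One way to implement the guard, sketched against the snippet above (a hasattr check avoids depending on the exact metrics class names, which are assumptions here):

```python
print("{:<40} {:<10}".format("Total input tokens:", metrics.total_input))
if hasattr(metrics, "total_text_input"):
    # EmbedBenchmarkMetrics only carries aggregate timing fields, so the
    # text-token breakdown is skipped for embedding runs.
    print("{:<40} {:<10}".format("Total text input tokens:", metrics.total_text_input))
```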

Collaborator

@ZJY0516 left a comment

I am worried about maintainability if we directly copy from vllm. cc @DarkLight1337

cmd_subparser.add_argument(
    "--omni",
    action="store_true",
    default=True,  # 对于 Omni 子命令,默认启用 (enabled by default for the Omni subcommand)
Collaborator

remove Chinese comments
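
For reference, the same flag with the comment in English (a sketch of the requested cleanup, not necessarily the final wording):

```python
cmd_subparser.add_argument(
    "--omni",
    action="store_true",
    default=True,  # enabled by default for the omni subcommand
)
```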





Collaborator

too many blank lines

@DarkLight1337
Member

I am worried about maintainability if we directly copy from vllm

Same. Can you summarize what you changed from main vLLM so perhaps we can upstream the changes that aren't specific to vllm-omni?

Collaborator

@hsliuustc0106 left a comment

It's better to reuse upstream files rather than copy them from upstream.
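
For example, a thin wrapper instead of a vendored copy could look roughly like this (a sketch only; the vllm.benchmarks.serve module path and its add_cli_args/main entry points are assumptions based on how vllm bench serve is wired upstream, not verified against this PR's vllm pin):

```python
# Hypothetical wrapper around upstream vllm's serve benchmark.
from vllm.benchmarks import serve as upstream_serve


def add_cli_args(parser):
    # Reuse the upstream flags, then layer omni-specific ones on top.
    upstream_serve.add_cli_args(parser)
    parser.add_argument("--omni", action="store_true", default=True)


def main(args):
    # Delegate the actual benchmark run to upstream.
    return upstream_serve.main(args)
```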

@@ -0,0 +1,1170 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator


is this file copied from upstream?

@@ -0,0 +1,1094 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

is this one copied as well?

@hsliuustc0106
Collaborator

-End-to-end Latency-
Mean E2EL (ms):                          139586.53
Median E2EL (ms):                        139586.53
P99 E2EL (ms):                           139586.53

Why is the E2EL so long?
