Conversation


@yenuo26 commented on Dec 18, 2025

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

vllm bench serve --omni --dataset-name random-mm \
    --port 40473 \
    --model /data/models/Qwen2.5-Omni-7B \
    --endpoint /v1/chat/completions \
    --backend openai-chat \
    --request-rate 1 \
    --num-prompts 1 \
    --random-input-len 10 \
    --random-range-ratio 0.0 \
    --random-mm-base-items-per-request 2 \
    --random-mm-num-mm-items-range-ratio 0 \
    --random-mm-limit-mm-per-prompt '{"image": 1, "video": 1, "audio": 1}' \
    --random-mm-bucket-config '{"(32, 32, 1)": 0.5, "(0, 1, 1)": 0.1, "(32, 32, 2)": 0.4}' \
    --ignore-eos \
    --random-output-len 2

Test Result

Successful requests:                     1
Request rate configured (RPS):           1.00
Benchmark duration (s):                  140.59
Total input tokens:                      181
Total text input tokens:                 10
Total generated tokens:                  64
Request throughput (req/s):              0.01
Audio throughput (num/s):                0.01
Output token throughput (tok/s):         0.46
Peak output token throughput (tok/s):    1.00
Peak concurrent requests:                1.00
Total Token throughput (tok/s):          1.74
-Time to First Token-
Mean TTFT (ms):                          139586.53
Median TTFT (ms):                        139586.53
P99 TTFT (ms):                           139586.53
-Time per Output Token (excl. 1st token)-
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
-Inter-token Latency-
Mean ITL (ms):                           0.00
Median ITL (ms):                         0.00
P99 ITL (ms):                            0.00
-End-to-end Latency-
Mean E2EL (ms):                          139586.53
Median E2EL (ms):                        139586.53
P99 E2EL (ms):                           139586.53


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

wangyu31577 and others added 29 commits December 11, 2025 16:22
# Conflicts:
#	docs/api/README.md
#	docs/user_guide/examples/offline_inference/text_to_image.md

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +563 to +564
print("{:<40} {:<10}".format("Total input tokens:", metrics.total_input))
print("{:<40} {:<10}".format("Total text input tokens:", metrics.total_text_input))

P1: Guard embedding metrics before printing text token counts

When benchmarking the embeddings endpoint (api_url ending with /v1/embeddings), metrics comes from calculate_metrics_for_embeddings and is an EmbedBenchmarkMetrics instance that only carries aggregate input/output timing fields. The unguarded print of metrics.total_text_input here raises an AttributeError before any results are reported, so embedding benchmarks always fail. This block should check the metric type or skip text-token reporting for embedding runs.
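
One way to implement the guard, sketched against the snippet above (a hasattr check avoids depending on the exact metrics class names, which are assumptions here):

```python
print("{:<40} {:<10}".format("Total input tokens:", metrics.total_input))
if hasattr(metrics, "total_text_input"):
    # EmbedBenchmarkMetrics only carries aggregate timing fields, so the
    # text-token breakdown is skipped for embedding runs.
    print("{:<40} {:<10}".format("Total text input tokens:", metrics.total_text_input))
```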

Collaborator

@ZJY0516 left a comment

I am worried about maintainability if we directly copy from vllm. cc @DarkLight1337

cmd_subparser.add_argument(
    "--omni",
    action="store_true",
    default=True,  # 对于 Omni 子命令,默认启用 (enabled by default for the Omni subcommand)
Collaborator

remove Chinese comments
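
For reference, the same flag with the comment in English (a sketch of the requested cleanup, not necessarily the final wording):

```python
cmd_subparser.add_argument(
    "--omni",
    action="store_true",
    default=True,  # enabled by default for the omni subcommand
)
```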





Collaborator

too many blank lines

@DarkLight1337
Member

I am worried about maintainability if we directly copy from vllm

Same. Can you summarize what you changed from main vLLM so perhaps we can upstream the changes that aren't specific to vllm-omni?

Collaborator

@hsliuustc0106 left a comment

It's better to reuse upstream files rather than copy them from upstream.
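
For example, a thin wrapper instead of a vendored copy could look roughly like this (a sketch only; the vllm.benchmarks.serve module path and its add_cli_args/main entry points are assumptions based on how vllm bench serve is wired upstream, not verified against this PR's vllm pin):

```python
# Hypothetical wrapper around upstream vllm's serve benchmark.
from vllm.benchmarks import serve as upstream_serve


def add_cli_args(parser):
    # Reuse the upstream flags, then layer omni-specific ones on top.
    upstream_serve.add_cli_args(parser)
    parser.add_argument("--omni", action="store_true", default=True)


def main(args):
    # Delegate the actual benchmark run to upstream.
    return upstream_serve.main(args)
```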

@@ -0,0 +1,1170 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator


is this file copied from upstream?

@@ -0,0 +1,1094 @@
# SPDX-License-Identifier: Apache-2.0
Collaborator

is this one copied as well?

@hsliuustc0106
Collaborator

-End-to-end Latency-
Mean E2EL (ms):                          139586.53
Median E2EL (ms):                        139586.53
P99 E2EL (ms):                           139586.53

Why is the E2EL so long?
