[Benchmark] Qwen-omni models Online Performance benchmark(mix2t+a) #364
base: main
Conversation
Signed-off-by: Alicia Wang <[email protected]>
# Conflicts: # docs/api/README.md # docs/user_guide/examples/offline_inference/text_to_image.md
# Conflicts: # pytest.ini
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
print("{:<40} {:<10}".format("Total input tokens:", metrics.total_input))
print("{:<40} {:<10}".format("Total text input tokens:", metrics.total_text_input))
```
Guard embedding metrics before printing text token counts
When benchmarking the embeddings endpoint (api_url ending with /v1/embeddings), metrics comes from calculate_metrics_for_embeddings and is an EmbedBenchmarkMetrics instance that only carries aggregate input/output timing fields. The unguarded print of metrics.total_text_input here raises an AttributeError before any results are reported, so embedding benchmarks always fail. This block should check the metric type or skip text-token reporting for embedding runs.
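A minimal sketch of such a guard, assuming the metric class and field names mentioned above (a full metrics object with `total_text_input` vs. `EmbedBenchmarkMetrics` without it) rather than the final code in this PR:

```python
def report_input_tokens(metrics) -> None:
    """Print input-token counts, skipping text-token details for embedding runs."""
    print("{:<40} {:<10}".format("Total input tokens:", metrics.total_input))
    # Per the review, EmbedBenchmarkMetrics carries no total_text_input field,
    # so only report it when the attribute is present.
    if hasattr(metrics, "total_text_input"):
        print("{:<40} {:<10}".format(
            "Total text input tokens:", metrics.total_text_input))
```

An `isinstance` check against the concrete metrics classes would work equally well if those types are importable at this point in the code.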
ZJY0516 left a comment:
I am worried about maintainability if we directly copy from vLLM. cc @DarkLight1337
```python
cmd_subparser.add_argument(
    "--omni",
    action="store_true",
    default=True,  # 对于 Omni 子命令，默认启用 (enabled by default for the Omni subcommand)
```
Remove the Chinese comments.
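For reference, an English-only version of that block; the translated comment is the only intended change, and the surrounding parser setup here is a hypothetical stand-in, not the PR's actual CLI wiring:

```python
import argparse

parser = argparse.ArgumentParser(prog="vllm")
subparsers = parser.add_subparsers(dest="command")
cmd_subparser = subparsers.add_parser("bench-omni")  # hypothetical subcommand name
cmd_subparser.add_argument(
    "--omni",
    action="store_true",
    default=True,  # enabled by default for the Omni subcommand
)

args = parser.parse_args(["bench-omni"])
print(args.omni)  # True: with store_true + default=True the flag is effectively always on
```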
vllm_omni/benchmarks/datasets.py (Outdated)
(The quoted diff consists of several consecutive blank lines.)
Too many blank lines here.
Same here. Can you summarize what you changed from main vLLM, so we can perhaps upstream the changes that aren't specific to vllm-omni?
hsliuustc0106 left a comment:
It's better to reuse the upstream files rather than copying them from upstream.
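One possible shape for that, sketched under the assumption that the pinned vLLM exposes its bench code as an importable package (vllm.benchmarks.datasets) with a RandomDataset class; both names should be verified against the version vllm-omni actually pins:

```python
# Subclass the upstream dataset and keep only the omni-specific delta here,
# instead of copying the whole file into vllm_omni/benchmarks/datasets.py.
from vllm.benchmarks.datasets import RandomDataset  # assumed import path


class RandomOmniDataset(RandomDataset):
    """Random dataset that layers audio/video items on top of upstream sampling."""

    def sample(self, *args, **kwargs):
        requests = super().sample(*args, **kwargs)
        # Attach omni-specific multimodal items (audio/video) to each request here.
        return requests
```

If the upstream module layout changes, only this thin wrapper needs updating rather than a full copied file.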
```
@@ -0,0 +1,1170 @@
# SPDX-License-Identifier: Apache-2.0
```
Is this file copied from upstream?
```
@@ -0,0 +1,1094 @@
# SPDX-License-Identifier: Apache-2.0
```
Is this one copied as well?
Why is the E2EL so long?
Purpose
Add an online serving performance benchmark for Qwen-omni models with mixed multimodal inputs (vllm bench serve --omni with the random-mm dataset).
Test Plan
```bash
vllm bench serve --omni --dataset-name random-mm \
    --port 40473 \
    --model /data/models/Qwen2.5-Omni-7B \
    --endpoint /v1/chat/completions \
    --backend openai-chat \
    --request-rate 1 \
    --num-prompts 1 \
    --random-input-len 10 \
    --random-range-ratio 0.0 \
    --random-mm-base-items-per-request 2 \
    --random-mm-num-mm-items-range-ratio 0 \
    --random-mm-limit-mm-per-prompt '{"image":1,"video":1, "audio": 1}' \
    --random-mm-bucket-config '{"(32, 32, 1)": 0.5, "(0, 1, 1)": 0.1, "(32, 32, 2)":0.4}' \
    --ignore-eos \
    --random-output-len 2
```
Test Result
```
Successful requests:                     1
Request rate configured (RPS):           1.00
Benchmark duration (s):                  140.59
Total input tokens:                      181
Total text input tokens:                 10
Total generated tokens:                  64
Request throughput (req/s):              0.01
Audio throughput (num/s):                0.01
Output token throughput (tok/s):         0.46
Peak output token throughput (tok/s):    1.00
Peak concurrent requests:                1.00
Total Token throughput (tok/s):          1.74
--------------- Time to First Token ---------------
Mean TTFT (ms):                          139586.53
Median TTFT (ms):                        139586.53
P99 TTFT (ms):                           139586.53
------ Time per Output Token (excl. 1st token) -----
Mean TPOT (ms):                          0.00
Median TPOT (ms):                        0.00
P99 TPOT (ms):                           0.00
---------------- Inter-token Latency ---------------
Mean ITL (ms):                           0.00
Median ITL (ms):                         0.00
P99 ITL (ms):                            0.00
---------------- End-to-end Latency ----------------
Mean E2EL (ms):                          139586.53
Median E2EL (ms):                        139586.53
P99 E2EL (ms):                           139586.53
```