Conversation

@gcanlin (Contributor) commented Dec 22, 2025

Purpose

`_decode_and_store_request_payloads` is already called in `execute_model`. This PR removes the duplicate call in `_preprocess`, because (IIUC) `GenerationModelRunner` does not actually need it there:

self._decode_and_store_request_payloads(scheduler_output)
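For context, here is a minimal sketch of the duplicated-call pattern being removed. Only the names `_decode_and_store_request_payloads`, `_preprocess`, `execute_model`, and `GenerationModelRunner` come from this PR; the class body and call ordering are assumed for illustration and are not the actual vllm-omni source.

```python
class GenerationModelRunner:
    # Illustrative stand-in, not the real vllm-omni class.

    def _preprocess(self, scheduler_output):
        # Before this PR, payloads were decoded here as well:
        #   self._decode_and_store_request_payloads(scheduler_output)
        # execute_model already does this, so the call was redundant
        # and has been removed.
        ...

    def execute_model(self, scheduler_output):
        # The single remaining decode site: payloads are decoded once
        # per step on this path before the model runs.
        self._decode_and_store_request_payloads(scheduler_output)
        self._preprocess(scheduler_output)
        ...

    def _decode_and_store_request_payloads(self, scheduler_output):
        # Placeholder for the real implementation.
        ...
```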

Test Plan

Ran Qwen/Qwen2.5-Omni locally; generation completed successfully (logs below).

Test Result

Before:

INFO 12-22 11:31:11 [omni_llm.py:500] [Orchestrator] Stage-2 reported ready
INFO 12-22 11:31:11 [omni_llm.py:534] [Orchestrator] All stages initialized successfully
[Stage-0] Max batch size: 1
--------------------------------
[Stage-0] Received batch size=1, request_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403']
--------------------------------
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
[Stage-0] Generate done: batch=1, req_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403'], gen_ms=28183.1
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403']
--------------------------------
[Stage-1] Generate done: batch=1, req_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403'], gen_ms=18986.5
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403']
--------------------------------
(EngineCore_DP0 pid=2844701) INFO 12-22 11:31:59 [qwen2_5_omni.py:947] Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
[Stage-2] Generate done: batch=1, req_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403'], gen_ms=7146.4
INFO 12-22 11:32:06 [omni_llm.py:480] [Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 55139.27698135376, 'e2e_sum_time_ms': 55138.81230354309, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 55138.81230354309, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 55139.27698135376, 'final_stage_id': {'0_8f6ec894-13ab-4c28-92f9-16ec55fc1403': 2}, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 53, 'total_time_ms': 28503.944158554077, 'avg_time_per_request_ms': 28503.944158554077, 'avg_tokens_per_s': 1.8593917987344435}, {'stage_id': 1, 'requests': 1, 'tokens': 935, 'total_time_ms': 19211.273193359375, 'avg_time_per_request_ms': 19211.273193359375, 'avg_tokens_per_s': 48.66934068290668}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 7179.709434509277, 'avg_time_per_request_ms': 7179.709434509277, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 98171852, 'total_time_ms': 173.34794998168945, 'tx_mbps': 4530.6265005323585, 'rx_samples': 1, 'rx_total_bytes': 98171852, 'rx_total_time_ms': 178.96413803100586, 'rx_mbps': 4388.448013332886, 'total_samples': 1, 'total_transfer_time_ms': 352.97417640686035, 'total_mbps': 2225.0206063083983}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 18349749, 'total_time_ms': 33.25676918029785, 'tx_mbps': 4414.078565604227, 'rx_samples': 1, 'rx_total_bytes': 18349749, 'rx_total_time_ms': 28.368234634399414, 'rx_mbps': 5174.731310985149, 'total_samples': 1, 'total_transfer_time_ms': 62.669992446899414, 'total_mbps': 2342.3968356846804}]}
[rank0]:[W1222 11:32:06.466625183 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 11:32:06.474396140 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 11:32:06.477647444 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Request ID: 0_8f6ec894-13ab-4c28-92f9-16ec55fc1403, Text saved to output_audio/0_8f6ec894-13ab-4c28-92f9-16ec55fc1403.txt
Request ID: 0_8f6ec894-13ab-4c28-92f9-16ec55fc1403, Saved audio to output_audio/output_0_8f6ec894-13ab-4c28-92f9-16ec55fc1403.wav

After:

INFO 12-22 10:46:37 [omni_llm.py:500] [Orchestrator] Stage-2 reported ready
INFO 12-22 10:46:37 [omni_llm.py:534] [Orchestrator] All stages initialized successfully
[Stage-0] Max batch size: 1
--------------------------------
[Stage-0] Received batch size=1, request_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e']
--------------------------------
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
[Stage-0] Generate done: batch=1, req_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e'], gen_ms=23372.6
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e']
--------------------------------
[Stage-1] Generate done: batch=1, req_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e'], gen_ms=20301.7
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e']
--------------------------------
(EngineCore_DP0 pid=2707065) INFO 12-22 10:47:21 [qwen2_5_omni.py:947] Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
[Stage-2] Generate done: batch=1, req_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e'], gen_ms=7153.7
INFO 12-22 10:47:29 [omni_llm.py:480] [Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 52018.84913444519, 'e2e_sum_time_ms': 52018.369913101196, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 52018.369913101196, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 52018.84913444519, 'final_stage_id': {'0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e': 2}, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 53, 'total_time_ms': 23742.552995681763, 'avg_time_per_request_ms': 23742.552995681763, 'avg_tokens_per_s': 2.2322788964455302}, {'stage_id': 1, 'requests': 1, 'tokens': 935, 'total_time_ms': 20697.94487953186, 'avg_time_per_request_ms': 20697.94487953186, 'avg_tokens_per_s': 45.173567010733464}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 7206.0651779174805, 'avg_time_per_request_ms': 7206.0651779174805, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 98171852, 'total_time_ms': 270.3440189361572, 'tx_mbps': 2905.0941059860074, 'rx_samples': 1, 'rx_total_bytes': 98171852, 'rx_total_time_ms': 317.4116611480713, 'rx_mbps': 2474.309901404743, 'total_samples': 1, 'total_transfer_time_ms': 589.0209674835205, 'total_mbps': 1333.3562968995209}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 18349749, 'total_time_ms': 59.831857681274414, 'tx_mbps': 2453.5088444352846, 'rx_samples': 1, 'rx_total_bytes': 18349749, 'rx_total_time_ms': 44.561147689819336, 'rx_mbps': 3294.304559250349, 'total_samples': 1, 'total_transfer_time_ms': 105.30877113342285, 'total_mbps': 1393.976877899483}]}
[rank0]:[W1222 10:47:29.758995453 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 10:47:29.778765127 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 10:47:29.787276983 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Request ID: 0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e, Text saved to output_audio/0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e.txt
Request ID: 0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e, Saved audio to output_audio/output_0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e.wav


@gcanlin gcanlin marked this pull request as ready for review December 22, 2025 11:00
@gcanlin (Contributor, Author) commented Dec 22, 2025

cc @tzhouam @R2-Y @hsliuustc0106 PTAL and add a ready tag to test all models. Thanks!

@hsliuustc0106 (Collaborator) commented:

could you please post the test result before and after this commit?

@gcanlin (Contributor, Author) commented Dec 22, 2025

> could you please post the test result before and after this commit?

Of course. Updated now.

@hsliuustc0106 (Collaborator) commented:

@natureofnature PTAL

@hsliuustc0106 added the `ready` label (label to trigger buildkite CI) on Dec 24, 2025