Conversation

@gcanlin (Contributor) commented Dec 22, 2025

Purpose

`_decode_and_store_request_payloads` is already called in `execute_model`. This PR removes the duplicate call in `_preprocess`, because (IIUC) `GenerationModelRunner` does not actually need it there:

self._decode_and_store_request_payloads(scheduler_output)
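For context, here is a minimal sketch of the duplicated-call pattern being removed. Only the names `_decode_and_store_request_payloads`, `_preprocess`, `execute_model`, and `GenerationModelRunner` come from this PR; the class body and call ordering are assumed for illustration and are not the actual vllm-omni source.

```python
class GenerationModelRunner:
    # Illustrative stand-in, not the real vllm-omni class.

    def _preprocess(self, scheduler_output):
        # Before this PR, payloads were decoded here as well:
        #   self._decode_and_store_request_payloads(scheduler_output)
        # execute_model already does this, so the call was redundant
        # and has been removed.
        ...

    def execute_model(self, scheduler_output):
        # The single remaining decode site: payloads are decoded once
        # per step on this path before the model runs.
        self._decode_and_store_request_payloads(scheduler_output)
        self._preprocess(scheduler_output)
        ...

    def _decode_and_store_request_payloads(self, scheduler_output):
        # Placeholder for the real implementation.
        ...
```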

Test Plan

Ran Qwen/Qwen2.5-Omni locally; generation completed successfully (logs below).

Test Result

Before:

INFO 12-22 11:31:11 [omni_llm.py:500] [Orchestrator] Stage-2 reported ready
INFO 12-22 11:31:11 [omni_llm.py:534] [Orchestrator] All stages initialized successfully
[Stage-0] Max batch size: 1
--------------------------------
[Stage-0] Received batch size=1, request_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403']
--------------------------------
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
[Stage-0] Generate done: batch=1, req_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403'], gen_ms=28183.1
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403']
--------------------------------
[Stage-1] Generate done: batch=1, req_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403'], gen_ms=18986.5
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403']
--------------------------------
(EngineCore_DP0 pid=2844701) INFO 12-22 11:31:59 [qwen2_5_omni.py:947] Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
[Stage-2] Generate done: batch=1, req_ids=['0_8f6ec894-13ab-4c28-92f9-16ec55fc1403'], gen_ms=7146.4
INFO 12-22 11:32:06 [omni_llm.py:480] [Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 55139.27698135376, 'e2e_sum_time_ms': 55138.81230354309, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 55138.81230354309, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 55139.27698135376, 'final_stage_id': {'0_8f6ec894-13ab-4c28-92f9-16ec55fc1403': 2}, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 53, 'total_time_ms': 28503.944158554077, 'avg_time_per_request_ms': 28503.944158554077, 'avg_tokens_per_s': 1.8593917987344435}, {'stage_id': 1, 'requests': 1, 'tokens': 935, 'total_time_ms': 19211.273193359375, 'avg_time_per_request_ms': 19211.273193359375, 'avg_tokens_per_s': 48.66934068290668}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 7179.709434509277, 'avg_time_per_request_ms': 7179.709434509277, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 98171852, 'total_time_ms': 173.34794998168945, 'tx_mbps': 4530.6265005323585, 'rx_samples': 1, 'rx_total_bytes': 98171852, 'rx_total_time_ms': 178.96413803100586, 'rx_mbps': 4388.448013332886, 'total_samples': 1, 'total_transfer_time_ms': 352.97417640686035, 'total_mbps': 2225.0206063083983}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 18349749, 'total_time_ms': 33.25676918029785, 'tx_mbps': 4414.078565604227, 'rx_samples': 1, 'rx_total_bytes': 18349749, 'rx_total_time_ms': 28.368234634399414, 'rx_mbps': 5174.731310985149, 'total_samples': 1, 'total_transfer_time_ms': 62.669992446899414, 'total_mbps': 2342.3968356846804}]}
[rank0]:[W1222 11:32:06.466625183 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 11:32:06.474396140 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 11:32:06.477647444 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Request ID: 0_8f6ec894-13ab-4c28-92f9-16ec55fc1403, Text saved to output_audio/0_8f6ec894-13ab-4c28-92f9-16ec55fc1403.txt
Request ID: 0_8f6ec894-13ab-4c28-92f9-16ec55fc1403, Saved audio to output_audio/output_0_8f6ec894-13ab-4c28-92f9-16ec55fc1403.wav

After:

INFO 12-22 10:46:37 [omni_llm.py:500] [Orchestrator] Stage-2 reported ready
INFO 12-22 10:46:37 [omni_llm.py:534] [Orchestrator] All stages initialized successfully
[Stage-0] Max batch size: 1
--------------------------------
[Stage-0] Received batch size=1, request_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e']
--------------------------------
The image processor of type `Qwen2VLImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`. Note that this behavior will be extended to all models in a future release.
[Stage-0] Generate done: batch=1, req_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e'], gen_ms=23372.6
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e']
--------------------------------
[Stage-1] Generate done: batch=1, req_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e'], gen_ms=20301.7
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e']
--------------------------------
(EngineCore_DP0 pid=2707065) INFO 12-22 10:47:21 [qwen2_5_omni.py:947] Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
[Stage-2] Generate done: batch=1, req_ids=['0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e'], gen_ms=7153.7
INFO 12-22 10:47:29 [omni_llm.py:480] [Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 52018.84913444519, 'e2e_sum_time_ms': 52018.369913101196, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 52018.369913101196, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 52018.84913444519, 'final_stage_id': {'0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e': 2}, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 53, 'total_time_ms': 23742.552995681763, 'avg_time_per_request_ms': 23742.552995681763, 'avg_tokens_per_s': 2.2322788964455302}, {'stage_id': 1, 'requests': 1, 'tokens': 935, 'total_time_ms': 20697.94487953186, 'avg_time_per_request_ms': 20697.94487953186, 'avg_tokens_per_s': 45.173567010733464}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 7206.0651779174805, 'avg_time_per_request_ms': 7206.0651779174805, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 98171852, 'total_time_ms': 270.3440189361572, 'tx_mbps': 2905.0941059860074, 'rx_samples': 1, 'rx_total_bytes': 98171852, 'rx_total_time_ms': 317.4116611480713, 'rx_mbps': 2474.309901404743, 'total_samples': 1, 'total_transfer_time_ms': 589.0209674835205, 'total_mbps': 1333.3562968995209}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 18349749, 'total_time_ms': 59.831857681274414, 'tx_mbps': 2453.5088444352846, 'rx_samples': 1, 'rx_total_bytes': 18349749, 'rx_total_time_ms': 44.561147689819336, 'rx_mbps': 3294.304559250349, 'total_samples': 1, 'total_transfer_time_ms': 105.30877113342285, 'total_mbps': 1393.976877899483}]}
[rank0]:[W1222 10:47:29.758995453 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 10:47:29.778765127 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1222 10:47:29.787276983 ProcessGroupNCCL.cpp:1524] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Request ID: 0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e, Text saved to output_audio/0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e.txt
Request ID: 0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e, Saved audio to output_audio/output_0_df8bb2f2-cd1f-498e-b9f8-61cb64501c1e.wav


@gcanlin gcanlin marked this pull request as ready for review December 22, 2025 11:00
@gcanlin (Contributor, Author) commented Dec 22, 2025

cc @tzhouam @R2-Y @hsliuustc0106 PTAL and add a ready tag to test all models. Thanks!

@hsliuustc0106 (Collaborator) commented:

could you please post the test result before and after this commit?

@gcanlin (Contributor, Author) commented Dec 22, 2025

> could you please post the test result before and after this commit?

Of course. Updated now.

@hsliuustc0106 (Collaborator) commented:

@natureofnature PTAL

@hsliuustc0106 added the `ready` label (label to trigger buildkite CI) on Dec 24, 2025