
Conversation


@ZeldaHuang ZeldaHuang commented Dec 23, 2025

Add continuous batching support for omni online serving, ref #410

Purpose

Changes:

  • Add an async function generation_single_request inside _stage_worker_async to handle generation for a single request (see the sketch after this list)
  • Add an async queue generation_out_q to collect request outputs
  • Rename test_video_to_audio in the e2e test test_qwen3_omni.py to test_video_to_audio_concurrent so it exercises concurrent batch generation
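
A minimal sketch of the per-request task plus output-queue pattern described above. This is illustrative only: the engine.generate call, the worker loop, and any names other than generation_single_request, generation_out_q, and _generation_tasks_by_rid are assumptions, not the actual vllm-omni code.

```python
import asyncio
import time

# Illustrative sketch only: engine.generate and the surrounding loop are
# assumed placeholders, not the actual vllm-omni implementation.
generation_out_q: asyncio.Queue = asyncio.Queue()
_generation_tasks_by_rid: dict = {}

async def generation_single_request(rid, request, engine):
    """Generate output for one request and push it onto the output queue."""
    _gen_t0 = time.time()
    gen_output = await engine.generate(request)  # hypothetical engine call
    _gen_ms = (time.time() - _gen_t0) * 1000.0
    await generation_out_q.put((rid, gen_output, _gen_ms))

async def _stage_worker_async(incoming_requests, engine):
    """Launch one task per request so generation is continuously batched."""
    for rid, request in incoming_requests:
        task = asyncio.create_task(generation_single_request(rid, request, engine))
        _generation_tasks_by_rid[rid] = task
    # Drain outputs in completion order rather than submission order.
    results = []
    for _ in range(len(incoming_requests)):
        results.append(await generation_out_q.get())
    return results
```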

Test Plan

python -m pytest -s -v test_qwen3_omni.py

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@ZeldaHuang ZeldaHuang marked this pull request as draft December 23, 2025 12:41
@ZeldaHuang ZeldaHuang marked this pull request as ready for review December 24, 2025 11:18

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@ZeldaHuang ZeldaHuang changed the title from "[WIP][Feature] Support Qwen Omni online batch inference" to "[Feature] Support Qwen Omni online batch inference" on Dec 24, 2025
@hsliuustc0106
Collaborator

@Bounty-hunter PTAL

    _gen_t1 = _time.time()
    _gen_ms = (_gen_t1 - _gen_t0) * 1000.0
    await generation_out_q.put((rid, gen_output, _gen_ms))
except Exception as e:
Contributor


It seems that _generation_tasks_by_rid[rid] is not cleaned up when an exception occurs.
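
One possible way to guarantee cleanup on both the success and the failure path is a try/except/finally around the per-request generation. This is only a sketch with assumed names (engine.generate, request), not the PR's actual code:

```python
try:
    _gen_t0 = time.time()
    gen_output = await engine.generate(request)  # hypothetical engine call
    _gen_ms = (time.time() - _gen_t0) * 1000.0
    await generation_out_q.put((rid, gen_output, _gen_ms))
except Exception as e:
    # Report the failure so downstream consumers are not left waiting.
    await generation_out_q.put((rid, e, 0.0))
finally:
    # Drop the bookkeeping entry whether the request succeeded or failed.
    _generation_tasks_by_rid.pop(rid, None)
```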

try:
    generation_task = _generation_tasks_by_rid.pop(rid, None)
    if generation_task is None or not generation_task.done():
        raise asyncio.InvalidStateError(f"[Stage-{stage_id}] generation task failed for request: {rid}")
Contributor


Is this check actually necessary here? Maybe we could use the following code to automatically clean up _generation_tasks_by_rid for both normal and failed requests:

task.add_done_callback(self._generation_tasks_by_rid.discard)
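
If _generation_tasks_by_rid is a dict keyed by request id (as the name suggests) rather than a set of tasks, the done callback would need to remove the entry by key. A possible sketch, with the task-creation line assumed for context:

```python
task = asyncio.create_task(generation_single_request(rid, request, engine))
_generation_tasks_by_rid[rid] = task
# add_done_callback fires on normal completion, exception, and cancellation,
# so the bookkeeping entry is always removed without an explicit check.
task.add_done_callback(lambda _t, rid=rid: _generation_tasks_by_rid.pop(rid, None))
```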

Contributor

@Bounty-hunter Bounty-hunter left a comment


Overall the changes look good.

_batch_seq += 1
if _stats_file:
    _avg_tokens_per_s = (
        (_agg_total_tokens * 1000.0 / _agg_total_gen_time_ms) if _agg_total_gen_time_ms > 0 else 0.0
Contributor

@Bounty-hunter Bounty-hunter Dec 24, 2025


In batch mode, _agg_total_gen_time_ms is overestimated because the _gen_ms intervals of individual requests may overlap.
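
One way to avoid the overestimate is to compute aggregate throughput against wall-clock batch time instead of the sum of per-request durations. A rough sketch with assumed variable names (_batch_t0, _batch_wall_ms):

```python
import time

_batch_t0 = time.time()
# ... launch all generation tasks and drain generation_out_q ...
_batch_wall_ms = (time.time() - _batch_t0) * 1000.0

# Per-request _gen_ms values overlap under concurrency, so keep them for
# per-request stats only and use wall-clock time for aggregate throughput.
_avg_tokens_per_s = (
    _agg_total_tokens * 1000.0 / _batch_wall_ms if _batch_wall_ms > 0 else 0.0
)
```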

@hsliuustc0106
Collaborator

Please fix the pre-commit checks and add the DCO sign-off.
