[WIP] Dev/debug qwen3 mix modality #431

tzhouam · 2025-12-23T08:59:37Z

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

This PR aims to solve the bug that current Qwen 3 Omni in our repo will output empty string for mixed modality inputs.

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

…ion 0.12.0 - Updated installation commands and version references in CUDA, ROCm, and NPU documentation to reflect the new vLLM v0.12.0 release. - Adjusted Docker image tags and version checks accordingly.

…points - Replaced instances of `AnyTokenizer` with `TokenizerLike` in `output_processor.py`, `async_omni.py`, and `omni_stage.py`. - Updated the tokenizer initialization function to `init_tokenizer_from_config` in `async_omni.py` for consistency.

- Introduced a new function `get_mixed_modalities_query` to handle queries involving audio, image, and video inputs. - Updated the default query type in argument parsing to `use_mixed_modalities` for enhanced functionality.

tzhouam added 4 commits December 19, 2025 11:11

[Doc] Update installation instructions for vLLM and vLLM-Omni to vers…

cc89958

…ion 0.12.0 - Updated installation commands and version references in CUDA, ROCm, and NPU documentation to reflect the new vLLM v0.12.0 release. - Adjusted Docker image tags and version checks accordingly.

Merge branch 'main' of https://github.com/tzhouam/vllm-omni into main

0d39365

[Feature] Add mixed modalities query support in end2end.py

a4eb658

- Introduced a new function `get_mixed_modalities_query` to handle queries involving audio, image, and video inputs. - Updated the default query type in argument parsing to `use_mixed_modalities` for enhanced functionality.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] Dev/debug qwen3 mix modality #431

[WIP] Dev/debug qwen3 mix modality #431

tzhouam commented Dec 23, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

[WIP] Dev/debug qwen3 mix modality #431

Are you sure you want to change the base?

[WIP] Dev/debug qwen3 mix modality #431

Conversation

tzhouam commented Dec 23, 2025

Purpose

Test Plan

Test Result

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant