2 changes: 2 additions & 0 deletions docs/api/README.md
@@ -32,6 +32,7 @@ Input data structures for multi-modal inputs.

Engine classes for offline and online inference.

- [vllm_omni.diffusion.diffusion_engine.BackgroundResources][]
- [vllm_omni.diffusion.diffusion_engine.DiffusionEngine][]
- [vllm_omni.engine.AdditionalInformationEntry][]
- [vllm_omni.engine.AdditionalInformationPayload][]
@@ -57,6 +58,7 @@ Core scheduling and caching components.

Model execution components.

- [vllm_omni.model_executor.custom_process_mixin.CustomProcessMixin][]
- [vllm_omni.model_executor.models.output_templates.OmniOutput][]
- [vllm_omni.model_executor.models.qwen2_5_omni.qwen2_5_omni.Qwen2_5OmniForConditionalGeneration][]
- [vllm_omni.model_executor.models.qwen2_5_omni.qwen2_5_omni_talker.Qwen2_5OmniTalkerForConditionalGeneration][]
79 changes: 7 additions & 72 deletions docs/user_guide/examples/online_serving/qwen2_5_omni.md
@@ -74,84 +74,19 @@ bash run_curl_multimodal_generation.sh mixed_modalities
```

## Modality control

You can specify which output modalities the model should generate. This is useful when you only need text output and want to skip the audio generation stages for better performance.

### Supported modalities

| Modalities | Output |
|------------|--------|
| `["text"]` | Text only |
| `["audio"]` | Text + Audio |
| `["text", "audio"]` | Text + Audio |
| Not specified | Text + Audio (default) |

### Using curl

#### Text only

```bash
curl http://localhost:8091/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-Omni-7B",
"messages": [{"role": "user", "content": "Describe vLLM in brief."}],
"modalities": ["text"]
}'
```

#### Text + Audio

```bash
curl http://localhost:8091/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen2.5-Omni-7B",
"messages": [{"role": "user", "content": "Describe vLLM in brief."}],
"modalities": ["audio"]
}'
```

### Using Python client

To control output modalities, for example to generate text only, run the command below:

```bash
python openai_chat_completion_client_for_multimodal_generation.py \
--query-type mixed_modalities \
--modalities text
```

### Using OpenAI Python SDK

#### Text only

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
model="Qwen/Qwen2.5-Omni-7B",
messages=[{"role": "user", "content": "Describe vLLM in brief."}],
modalities=["text"]
)
print(response.choices[0].message.content)
```

#### Text + Audio

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
model="Qwen/Qwen2.5-Omni-7B",
messages=[{"role": "user", "content": "Describe vLLM in brief."}],
modalities=["audio"]
)
# Response contains two choices: one with text, one with audio
print(response.choices[0].message.content) # Text response
print(response.choices[1].message.audio) # Audio response
```

## Streaming Output

To enable streaming output, pass the `--stream` argument as shown below. The final output of each stage is returned as soon as that stage finishes generating. Currently only text supports streaming output; other modalities are still returned as regular, non-streamed outputs.
```bash
python openai_chat_completion_client_for_multimodal_generation.py \
--query-type mixed_modalities \
--stream
```
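
For reference, here is a minimal sketch of consuming the text stream directly with the OpenAI Python SDK. It assumes the same local endpoint and model as in the earlier examples, and that the server accepts the standard `stream=True` flag; the exact chunk layout may differ across versions.

```python
from openai import OpenAI

# Assumed endpoint and model, matching the earlier examples in this guide.
client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

# Request text output with streaming enabled.
stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",
    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
    modalities=["text"],
    stream=True,
)

# Text deltas arrive incrementally; non-text modalities are not streamed.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```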

## Run Local Web UI Demo
79 changes: 7 additions & 72 deletions docs/user_guide/examples/online_serving/qwen3_omni.md
@@ -82,84 +82,19 @@ sudo apt install ffmpeg
```

## Modality control

You can specify which output modalities the model should generate. This is useful when you only need text output and want to skip the audio generation stages for better performance.

### Supported modalities

| Modalities | Output |
|------------|--------|
| `["text"]` | Text only |
| `["audio"]` | Text + Audio |
| `["text", "audio"]` | Text + Audio |
| Not specified | Text + Audio (default) |

### Using curl

#### Text only

```bash
curl http://localhost:8091/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
"messages": [{"role": "user", "content": "Describe vLLM in brief."}],
"modalities": ["text"]
}'
```

#### Text + Audio

```bash
curl http://localhost:8091/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
"messages": [{"role": "user", "content": "Describe vLLM in brief."}],
"modalities": ["audio"]
}'
```

### Using Python client

To control output modalities, for example to generate text only, run the command below:

```bash
python openai_chat_completion_client_for_multimodal_generation.py \
--query-type use_image \
--modalities text
```

### Using OpenAI Python SDK

#### Text only

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
messages=[{"role": "user", "content": "Describe vLLM in brief."}],
modalities=["text"]
)
print(response.choices[0].message.content)
```

#### Text + Audio

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

response = client.chat.completions.create(
model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
messages=[{"role": "user", "content": "Describe vLLM in brief."}],
modalities=["audio"]
)
# Response contains two choices: one with text, one with audio
print(response.choices[0].message.content) # Text response
print(response.choices[1].message.audio) # Audio response
```

## Streaming Output

To enable streaming output, pass the `--stream` argument as shown below. The final output of each stage is returned as soon as that stage finishes generating. Currently only text supports streaming output; other modalities are still returned as regular, non-streamed outputs.
```bash
python openai_chat_completion_client_for_multimodal_generation.py \
--query-type use_image \
--stream
```
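
For reference, here is a minimal sketch of consuming the text stream directly with the OpenAI Python SDK. It assumes the same local endpoint and model as in the earlier examples, and that the server accepts the standard `stream=True` flag; the exact chunk layout may differ across versions.

```python
from openai import OpenAI

# Assumed endpoint and model, matching the earlier examples in this guide.
client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

# Request text output with streaming enabled.
stream = client.chat.completions.create(
    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
    modalities=["text"],
    stream=True,
)

# Text deltas arrive incrementally; non-text modalities are not streamed.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```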

## Run Local Web UI Demo
4 changes: 2 additions & 2 deletions examples/offline_inference/qwen2_5_omni/end2end.py
@@ -377,12 +377,12 @@ def main(args):
for i, prompt in enumerate(prompts):
prompt["modalities"] = output_modalities

omni_outputs = omni_llm.generate(prompts, sampling_params_list)
omni_generator = omni_llm.generate(prompts, sampling_params_list)

# Determine output directory: prefer --output-dir; fallback to --output-wav
output_dir = args.output_dir if getattr(args, "output_dir", None) else args.output_wav
os.makedirs(output_dir, exist_ok=True)
for stage_outputs in omni_outputs:
for stage_outputs in omni_generator:
if stage_outputs.final_output_type == "text":
for output in stage_outputs.request_output:
request_id = output.request_id
4 changes: 2 additions & 2 deletions examples/offline_inference/qwen3_omni/end2end.py
@@ -233,12 +233,12 @@ def main(args):
for i, prompt in enumerate(prompts):
prompt["modalities"] = output_modalities

omni_outputs = omni_llm.generate(prompts, sampling_params_list)
omni_generator = omni_llm.generate(prompts, sampling_params_list)
# Determine output directory: prefer --output-dir; fallback to --output-wav
output_dir = args.output_dir if getattr(args, "output_dir", None) else args.output_wav
os.makedirs(output_dir, exist_ok=True)

for stage_outputs in omni_outputs:
for stage_outputs in omni_generator:
if stage_outputs.final_output_type == "text":
for output in stage_outputs.request_output:
request_id = output.request_id
8 changes: 8 additions & 0 deletions examples/online_serving/qwen2_5_omni/README.md
@@ -78,6 +78,14 @@ python openai_chat_completion_client_for_multimodal_generation.py \
--modalities text
```

## Streaming Output

To enable streaming output, pass the `--stream` argument as shown below. The final output of each stage is returned as soon as that stage finishes generating. Currently only text supports streaming output; other modalities are still returned as regular, non-streamed outputs.
```bash
python openai_chat_completion_client_for_multimodal_generation.py \
--query-type mixed_modalities \
--stream
```
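
For reference, here is a minimal sketch of consuming the text stream directly with the OpenAI Python SDK instead of the example client script. The base URL, port, and model name below are assumptions; adjust them to match your server launch command.

```python
from openai import OpenAI

# Assumed endpoint and model; replace with the values used to start your server.
client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")

# Request text output with streaming enabled.
stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",
    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
    modalities=["text"],
    stream=True,
)

# Text deltas arrive incrementally; non-text modalities are not streamed.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```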

## Run Local Web UI Demo

This Web UI demo allows users to interact with the model through a web browser.