Commit e26bdfa

author: wonjae.lee0 (committed)
[Docs] Add audio modality examples and clarify default behavior
- Add Text + Audio examples for curl and OpenAI Python SDK
- Update modalities table with all supported options
- Clarify that default (not specified) returns Text + Audio
- Remove max_tokens parameter (not working)
1 parent 38378ad commit e26bdfa

File tree: 2 files changed (+79, -14 lines)


docs/user_guide/examples/online_serving/qwen2_5_omni.md
Lines changed: 39 additions & 6 deletions

@@ -79,15 +79,17 @@ You can control output modalities to specify which types of output the model sho
 
 ### Supported modalities
 
-| Modality | Output |
-|----------|--------|
-| `text` | Text only |
-| `audio` | Text + Audio (audio generation requires text) |
-
-If not specified, the model uses its default output modalities.
+| Modalities | Output |
+|------------|--------|
+| `["text"]` | Text only |
+| `["audio"]` | Text + Audio |
+| `["text", "audio"]` | Text + Audio |
+| Not specified | Text + Audio (default) |
 
 ### Using curl
 
+#### Text only
+
 ```bash
 curl http://localhost:8091/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -98,6 +100,18 @@ curl http://localhost:8091/v1/chat/completions \
 }'
 ```
 
+#### Text + Audio
+
+```bash
+curl http://localhost:8091/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen2.5-Omni-7B",
+    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
+    "modalities": ["audio"]
+  }'
+```
+
 ### Using Python client
 
 ```bash
@@ -108,6 +122,8 @@ python openai_chat_completion_client_for_multimodal_generation.py \
 
 ### Using OpenAI Python SDK
 
+#### Text only
+
 ```python
 from openai import OpenAI
 
@@ -121,6 +137,23 @@ response = client.chat.completions.create(
 print(response.choices[0].message.content)
 ```
 
+#### Text + Audio
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-Omni-7B",
+    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
+    modalities=["audio"]
+)
+# Response contains two choices: one with text, one with audio
+print(response.choices[0].message.content)  # Text response
+print(response.choices[1].message.audio)    # Audio response
+```
+
 ## Run Local Web UI Demo
 
 This Web UI demo allows users to interact with the model through a web browser.
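
The `message.audio` object printed in the example above carries base64-encoded audio in the OpenAI SDK's response shape. A minimal sketch of saving that payload to disk — assuming, as an illustration only, that the audio object exposes a base64 `data` field (as in the OpenAI SDK's `ChatCompletionAudio`) and that the payload is a WAV file; check the actual response from your server:

```python
import base64


def save_audio(b64_data: str, path: str) -> int:
    """Decode a base64 audio payload and write the raw bytes to `path`.

    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)


# Hypothetical usage against a real Text + Audio response:
#   save_audio(response.choices[1].message.audio.data, "answer.wav")
```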

docs/user_guide/examples/online_serving/qwen3_omni.md
Lines changed: 40 additions & 8 deletions

@@ -87,15 +87,17 @@ You can control output modalities to specify which types of output the model sho
 
 ### Supported modalities
 
-| Modality | Output |
-|----------|--------|
-| `text` | Text only |
-| `audio` | Text + Audio (audio generation requires text) |
-
-If not specified, the model uses its default output modalities.
+| Modalities | Output |
+|------------|--------|
+| `["text"]` | Text only |
+| `["audio"]` | Text + Audio |
+| `["text", "audio"]` | Text + Audio |
+| Not specified | Text + Audio (default) |
 
 ### Using curl
 
+#### Text only
+
 ```bash
 curl http://localhost:8091/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -106,6 +108,18 @@ curl http://localhost:8091/v1/chat/completions \
 }'
 ```
 
+#### Text + Audio
+
+```bash
+curl http://localhost:8091/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
+    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
+    "modalities": ["audio"]
+  }'
+```
+
 ### Using Python client
 
 ```bash
@@ -116,6 +130,8 @@ python openai_chat_completion_client_for_multimodal_generation.py \
 
 ### Using OpenAI Python SDK
 
+#### Text only
+
 ```python
 from openai import OpenAI
 
@@ -124,12 +140,28 @@ client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")
 response = client.chat.completions.create(
     model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
     messages=[{"role": "user", "content": "Describe vLLM in brief."}],
-    modalities=["text"],
-    max_tokens=100,
+    modalities=["text"]
 )
 print(response.choices[0].message.content)
 ```
 
+#### Text + Audio
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
+    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
+    modalities=["audio"]
+)
+# Response contains two choices: one with text, one with audio
+print(response.choices[0].message.content)  # Text response
+print(response.choices[1].message.audio)    # Audio response
+```
+
 ## Run Local Web UI Demo
 
 This Web UI demo allows users to interact with the model through a web browser.
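
Both files' Text + Audio examples note that the response carries two choices, one with text and one with audio, and index them as `choices[0]` and `choices[1]`. A small sketch of picking the two apart by which fields are populated instead of by position — using plain dicts as hypothetical stand-ins for the SDK's message objects:

```python
def split_text_and_audio(choices):
    """Return (text_content, audio_obj) from a Text + Audio response.

    Keys off which message fields are populated rather than assuming
    choices[0] is text and choices[1] is audio.  `choices` here are
    plain dicts standing in for SDK objects (illustration only).
    """
    text = next(c["message"]["content"] for c in choices
                if c["message"].get("content"))
    audio = next(c["message"]["audio"] for c in choices
                 if c["message"].get("audio"))
    return text, audio


# Example with a mocked two-choice response:
mock_choices = [
    {"message": {"content": "vLLM is a fast inference engine.", "audio": None}},
    {"message": {"content": None, "audio": {"data": "<base64 payload>"}}},
]
text, audio = split_text_and_audio(mock_choices)
```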
