Commit e26bdfa

author: wonjae.lee0 (committed)
[Docs] Add audio modality examples and clarify default behavior
- Add Text + Audio examples for curl and OpenAI Python SDK
- Update modalities table with all supported options
- Clarify that default (not specified) returns Text + Audio
- Remove max_tokens parameter (not working)
1 parent 38378ad commit e26bdfa

File tree: 2 files changed (+79, -14 lines)


docs/user_guide/examples/online_serving/qwen2_5_omni.md
Lines changed: 39 additions & 6 deletions

@@ -79,15 +79,17 @@ You can control output modalities to specify which types of output the model sho
 
 ### Supported modalities
 
-| Modality | Output |
-|----------|--------|
-| `text` | Text only |
-| `audio` | Text + Audio (audio generation requires text) |
-
-If not specified, the model uses its default output modalities.
+| Modalities | Output |
+|------------|--------|
+| `["text"]` | Text only |
+| `["audio"]` | Text + Audio |
+| `["text", "audio"]` | Text + Audio |
+| Not specified | Text + Audio (default) |
 
 ### Using curl
 
+#### Text only
+
 ```bash
 curl http://localhost:8091/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -98,6 +100,18 @@ curl http://localhost:8091/v1/chat/completions \
 }'
 ```
 
+#### Text + Audio
+
+```bash
+curl http://localhost:8091/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen2.5-Omni-7B",
+    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
+    "modalities": ["audio"]
+  }'
+```
+
 ### Using Python client
 
 ```bash
@@ -108,6 +122,8 @@ python openai_chat_completion_client_for_multimodal_generation.py \
 
 ### Using OpenAI Python SDK
 
+#### Text only
+
 ```python
 from openai import OpenAI
 
@@ -121,6 +137,23 @@ response = client.chat.completions.create(
 print(response.choices[0].message.content)
 ```
 
+#### Text + Audio
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-Omni-7B",
+    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
+    modalities=["audio"]
+)
+# Response contains two choices: one with text, one with audio
+print(response.choices[0].message.content)  # Text response
+print(response.choices[1].message.audio)    # Audio response
+```
+
 ## Run Local Web UI Demo
 
 This Web UI demo allows users to interact with the model through a web browser.
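
The `message.audio` object printed in the example above carries base64-encoded audio in the OpenAI SDK's response shape. A minimal sketch of saving that payload to disk — assuming, as an illustration only, that the audio object exposes a base64 `data` field (as in the OpenAI SDK's `ChatCompletionAudio`) and that the payload is a WAV file; check the actual response from your server:

```python
import base64


def save_audio(b64_data: str, path: str) -> int:
    """Decode a base64 audio payload and write the raw bytes to `path`.

    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(audio_bytes)
    return len(audio_bytes)


# Hypothetical usage against a real Text + Audio response:
#   save_audio(response.choices[1].message.audio.data, "answer.wav")
```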

docs/user_guide/examples/online_serving/qwen3_omni.md
Lines changed: 40 additions & 8 deletions

@@ -87,15 +87,17 @@ You can control output modalities to specify which types of output the model sho
 
 ### Supported modalities
 
-| Modality | Output |
-|----------|--------|
-| `text` | Text only |
-| `audio` | Text + Audio (audio generation requires text) |
-
-If not specified, the model uses its default output modalities.
+| Modalities | Output |
+|------------|--------|
+| `["text"]` | Text only |
+| `["audio"]` | Text + Audio |
+| `["text", "audio"]` | Text + Audio |
+| Not specified | Text + Audio (default) |
 
 ### Using curl
 
+#### Text only
+
 ```bash
 curl http://localhost:8091/v1/chat/completions \
   -H "Content-Type: application/json" \
@@ -106,6 +108,18 @@ curl http://localhost:8091/v1/chat/completions \
 }'
 ```
 
+#### Text + Audio
+
+```bash
+curl http://localhost:8091/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
+    "messages": [{"role": "user", "content": "Describe vLLM in brief."}],
+    "modalities": ["audio"]
+  }'
+```
+
 ### Using Python client
 
 ```bash
@@ -116,6 +130,8 @@ python openai_chat_completion_client_for_multimodal_generation.py \
 
 ### Using OpenAI Python SDK
 
+#### Text only
+
 ```python
 from openai import OpenAI
 
@@ -124,12 +140,28 @@ client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")
 response = client.chat.completions.create(
     model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
     messages=[{"role": "user", "content": "Describe vLLM in brief."}],
-    modalities=["text"],
-    max_tokens=100,
+    modalities=["text"]
 )
 print(response.choices[0].message.content)
 ```
 
+#### Text + Audio
+
+```python
+from openai import OpenAI
+
+client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY")
+
+response = client.chat.completions.create(
+    model="Qwen/Qwen3-Omni-30B-A3B-Instruct",
+    messages=[{"role": "user", "content": "Describe vLLM in brief."}],
+    modalities=["audio"]
+)
+# Response contains two choices: one with text, one with audio
+print(response.choices[0].message.content)  # Text response
+print(response.choices[1].message.audio)    # Audio response
+```
+
 ## Run Local Web UI Demo
 
 This Web UI demo allows users to interact with the model through a web browser.
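
Both files' Text + Audio examples note that the response carries two choices, one with text and one with audio, and index them as `choices[0]` and `choices[1]`. A small sketch of picking the two apart by which fields are populated instead of by position — using plain dicts as hypothetical stand-ins for the SDK's message objects:

```python
def split_text_and_audio(choices):
    """Return (text_content, audio_obj) from a Text + Audio response.

    Keys off which message fields are populated rather than assuming
    choices[0] is text and choices[1] is audio.  `choices` here are
    plain dicts standing in for SDK objects (illustration only).
    """
    text = next(c["message"]["content"] for c in choices
                if c["message"].get("content"))
    audio = next(c["message"]["audio"] for c in choices
                 if c["message"].get("audio"))
    return text, audio


# Example with a mocked two-choice response:
mock_choices = [
    {"message": {"content": "vLLM is a fast inference engine.", "audio": None}},
    {"message": {"content": None, "audio": {"data": "<base64 payload>"}}},
]
text, audio = split_text_and_audio(mock_choices)
```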
