From 38378adc2d50e6219d5e42b8c453d56cec2f3ec5 Mon Sep 17 00:00:00 2001 From: "wonjae.lee0" Date: Tue, 23 Dec 2025 01:41:08 +0900 Subject: [PATCH 1/2] [Docs] Add API usage examples for modality control in online serving - Add curl example for modalities parameter - Add OpenAI Python SDK example - Document supported modality values (text, audio) - Clarify that audio output includes text (audio generation requires text) Signed-off-by: wonjae.lee0 --- .../examples/online_serving/qwen2_5_omni.md | 42 +++++++++++++++++- .../examples/online_serving/qwen3_omni.md | 43 ++++++++++++++++++- 2 files changed, 83 insertions(+), 2 deletions(-) diff --git a/docs/user_guide/examples/online_serving/qwen2_5_omni.md b/docs/user_guide/examples/online_serving/qwen2_5_omni.md index e54ffe6ff..5aace4fe4 100644 --- a/docs/user_guide/examples/online_serving/qwen2_5_omni.md +++ b/docs/user_guide/examples/online_serving/qwen2_5_omni.md @@ -74,13 +74,53 @@ bash run_curl_multimodal_generation.sh mixed_modalities ``` ## Modality control -If you want to control output modalities, e.g. only output text, you can run the command below: + +You can control output modalities to specify which types of output the model should generate. This is useful when you only need text output and want to skip audio generation stages for better performance. + +### Supported modalities + +| Modality | Output | +|----------|--------| +| `text` | Text only | +| `audio` | Text + Audio (audio generation requires text) | + +If not specified, the model uses its default output modalities. 
+ +### Using curl + +```bash +curl http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Qwen/Qwen2.5-Omni-7B", + "messages": [{"role": "user", "content": "Describe vLLM in brief."}], + "modalities": ["text"] + }' +``` + +### Using Python client + ```bash python openai_chat_completion_client_for_multimodal_generation.py \ --query-type mixed_modalities \ --modalities text ``` +### Using OpenAI Python SDK + +```python +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY") + +response = client.chat.completions.create( + model="Qwen/Qwen2.5-Omni-7B", + messages=[{"role": "user", "content": "Describe vLLM in brief."}], + modalities=["text"] +) +print(response.choices[0].message.content) +``` + ## Run Local Web UI Demo This Web UI demo allows users to interact with the model through a web browser. diff --git a/docs/user_guide/examples/online_serving/qwen3_omni.md b/docs/user_guide/examples/online_serving/qwen3_omni.md index 320271f89..414813325 100644 --- a/docs/user_guide/examples/online_serving/qwen3_omni.md +++ b/docs/user_guide/examples/online_serving/qwen3_omni.md @@ -82,13 +82,54 @@ sudo apt install ffmpeg ``` ## Modality control -If you want to control output modalities, e.g. only output text, you can run the command below: + +You can control output modalities to specify which types of output the model should generate. This is useful when you only need text output and want to skip audio generation stages for better performance. + +### Supported modalities + +| Modality | Output | +|----------|--------| +| `text` | Text only | +| `audio` | Text + Audio (audio generation requires text) | + +If not specified, the model uses its default output modalities. 
+ +### Using curl + +```bash +curl http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct", + "messages": [{"role": "user", "content": "Describe vLLM in brief."}], + "modalities": ["text"] + }' +``` + +### Using Python client + ```bash python openai_chat_completion_client_for_multimodal_generation.py \ --query-type use_image \ --modalities text ``` +### Using OpenAI Python SDK + +```python +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY") + +response = client.chat.completions.create( + model="Qwen/Qwen3-Omni-30B-A3B-Instruct", + messages=[{"role": "user", "content": "Describe vLLM in brief."}], + modalities=["text"], + max_tokens=100, +) +print(response.choices[0].message.content) +``` + ## Run Local Web UI Demo This Web UI demo allows users to interact with the model through a web browser. From bc069601cc5f465a21edefc9c1ab65cf2344afb7 Mon Sep 17 00:00:00 2001 From: "wonjae.lee0" Date: Wed, 24 Dec 2025 10:50:06 +0900 Subject: [PATCH 2/2] [Docs] Add audio modality examples and clarify default behavior - Add Text + Audio examples for curl and OpenAI Python SDK - Update modalities table with all supported options - Clarify that default (not specified) returns Text + Audio - Remove max_tokens parameter (not working) Signed-off-by: wonjae.lee0 --- .../examples/online_serving/qwen2_5_omni.md | 45 ++++++++++++++--- .../examples/online_serving/qwen3_omni.md | 48 +++++++++++++++---- 2 files changed, 79 insertions(+), 14 deletions(-) diff --git a/docs/user_guide/examples/online_serving/qwen2_5_omni.md b/docs/user_guide/examples/online_serving/qwen2_5_omni.md index 5aace4fe4..867d44e16 100644 --- a/docs/user_guide/examples/online_serving/qwen2_5_omni.md +++ b/docs/user_guide/examples/online_serving/qwen2_5_omni.md @@ -79,15 +79,17 @@ You can control output modalities to specify which types of output the model sho ### Supported modalities -| 
Modality | Output | -|----------|--------| -| `text` | Text only | -| `audio` | Text + Audio (audio generation requires text) | - -If not specified, the model uses its default output modalities. +| Modalities | Output | +|------------|--------| +| `["text"]` | Text only | +| `["audio"]` | Text + Audio | +| `["text", "audio"]` | Text + Audio | +| Not specified | Text + Audio (default) | ### Using curl +#### Text only + ```bash curl http://localhost:8091/v1/chat/completions \ -H "Content-Type: application/json" \ @@ -98,6 +100,18 @@ curl http://localhost:8091/v1/chat/completions \ }' ``` +#### Text + Audio + +```bash +curl http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Qwen/Qwen2.5-Omni-7B", + "messages": [{"role": "user", "content": "Describe vLLM in brief."}], + "modalities": ["audio"] + }' +``` + ### Using Python client ```bash @@ -108,6 +122,8 @@ python openai_chat_completion_client_for_multimodal_generation.py \ ### Using OpenAI Python SDK +#### Text only + ```python from openai import OpenAI @@ -121,6 +137,23 @@ response = client.chat.completions.create( print(response.choices[0].message.content) ``` +#### Text + Audio + +```python +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY") + +response = client.chat.completions.create( + model="Qwen/Qwen2.5-Omni-7B", + messages=[{"role": "user", "content": "Describe vLLM in brief."}], + modalities=["audio"] +) +# Response contains two choices: one with text, one with audio +print(response.choices[0].message.content) # Text response +print(response.choices[1].message.audio) # Audio response +``` + ## Run Local Web UI Demo This Web UI demo allows users to interact with the model through a web browser. 
diff --git a/docs/user_guide/examples/online_serving/qwen3_omni.md b/docs/user_guide/examples/online_serving/qwen3_omni.md index 414813325..5b6c3a7b2 100644 --- a/docs/user_guide/examples/online_serving/qwen3_omni.md +++ b/docs/user_guide/examples/online_serving/qwen3_omni.md @@ -87,15 +87,17 @@ You can control output modalities to specify which types of output the model sho ### Supported modalities -| Modality | Output | -|----------|--------| -| `text` | Text only | -| `audio` | Text + Audio (audio generation requires text) | - -If not specified, the model uses its default output modalities. +| Modalities | Output | +|------------|--------| +| `["text"]` | Text only | +| `["audio"]` | Text + Audio | +| `["text", "audio"]` | Text + Audio | +| Not specified | Text + Audio (default) | ### Using curl +#### Text only + ```bash curl http://localhost:8091/v1/chat/completions \ -H "Content-Type: application/json" \ @@ -106,6 +108,18 @@ curl http://localhost:8091/v1/chat/completions \ }' ``` +#### Text + Audio + +```bash +curl http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct", + "messages": [{"role": "user", "content": "Describe vLLM in brief."}], + "modalities": ["audio"] + }' +``` + ### Using Python client ```bash @@ -116,6 +130,8 @@ python openai_chat_completion_client_for_multimodal_generation.py \ ### Using OpenAI Python SDK +#### Text only + ```python from openai import OpenAI @@ -124,12 +140,28 @@ client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY") response = client.chat.completions.create( model="Qwen/Qwen3-Omni-30B-A3B-Instruct", messages=[{"role": "user", "content": "Describe vLLM in brief."}], - modalities=["text"], - max_tokens=100, + modalities=["text"] ) print(response.choices[0].message.content) ``` +#### Text + Audio + +```python +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8091/v1", api_key="EMPTY") + +response = 
client.chat.completions.create( + model="Qwen/Qwen3-Omni-30B-A3B-Instruct", + messages=[{"role": "user", "content": "Describe vLLM in brief."}], + modalities=["audio"] +) +# Response contains two choices: one with text, one with audio +print(response.choices[0].message.content) # Text response +print(response.choices[1].message.audio) # Audio response +``` + ## Run Local Web UI Demo This Web UI demo allows users to interact with the model through a web browser.
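
A possible follow-up for a later revision: both "Text + Audio" SDK examples stop at `print(response.choices[1].message.audio)`, which prints the audio payload object rather than something playable. A minimal sketch for saving that audio to disk — assuming the server mirrors the OpenAI SDK's audio-output shape, where `message.audio.data` carries base64-encoded WAV bytes (an assumption to verify against the actual vLLM response, not confirmed by this patch):

```python
import base64
from pathlib import Path


def save_audio(b64_data: str, path: str) -> int:
    """Decode base64-encoded audio (assumed WAV) and write it to disk.

    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(b64_data)
    return Path(path).write_bytes(audio_bytes)


# Demonstrated here with a dummy payload; in practice you would pass
# response.choices[1].message.audio.data (hypothetical field, mirroring
# the OpenAI SDK's audio type) from the Text + Audio examples above.
dummy = base64.b64encode(b"RIFF....WAVEfmt ").decode("ascii")
written = save_audio(dummy, "output.wav")
print(written)  # number of bytes written
```

The round-trip through `base64` is the only part that is certain; the exact attribute holding the encoded audio should be checked against the running server before documenting it.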