75 changes: 73 additions & 2 deletions docs/contributing/profiling.md
@@ -1,3 +1,74 @@
# Profiling vLLM-Omni (update soon)
# Profiling vLLM-Omni
This guide explains in detail how to use the logger system in vLLM-Omni.

Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.
In vLLM-Omni, there are two different scheduling paths:
• Single Diffusion (DiT) Pipeline [[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image) [[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image) [[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
> **Review comment (Collaborator):** add one blank line

> **Review comment (Collaborator):** add 1 blank line



• Multi-Stage Pipeline for Multimodal Understanding and Speech Generation [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)


What the logger system records under each scheduling path, and how to use it, is described below:
## Recording Content and Usage Instructions
### 1. vLLM features
vLLM configures a logger for the root module `vllm`, and submodules automatically inherit this parent logger. The `vllm_omni` module, however, lives outside the `vllm` namespace and does not inherit it automatically, so vLLM-Omni initializes its own root logger that mirrors the parent configuration. The vLLM config recorded in the logs covers communication methods, scheduling modes, parallelism, and runtime scale, as well as shared-memory pressure status, model size, and observed GPU memory usage during runtime. The recorded vLLM config content is the same for the Single Diffusion Pipeline and Multi-Stage Pipeline models.
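The inheritance issue can be illustrated with the standard library `logging` module (a minimal conceptual sketch, not the actual vLLM-Omni initialization code; only the logger names `vllm` and `vllm_omni` come from the text above):

```python
import logging

# vLLM configures the root "vllm" logger; dotted children such as
# "vllm.engine" pick up its handlers and level automatically.
vllm_logger = logging.getLogger("vllm")
vllm_logger.setLevel(logging.DEBUG)
vllm_logger.addHandler(logging.StreamHandler())

child = logging.getLogger("vllm.engine")
child.debug("inherits the DEBUG level and handler from 'vllm'")

# "vllm_omni" lives outside the "vllm" namespace, so it inherits nothing.
# A root logger for vllm_omni therefore has to be initialized explicitly,
# mirroring the parent configuration:
omni_logger = logging.getLogger("vllm_omni")
omni_logger.setLevel(vllm_logger.level)
for handler in vllm_logger.handlers:
    omni_logger.addHandler(handler)
```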
#### How to view vLLM features
Before running the example scripts, set the environment variable below so that the vLLM config appears in the logs printed to the terminal.
```bash
export VLLM_LOGGING_LEVEL=DEBUG
```
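`VLLM_LOGGING_LEVEL` accepts the standard Python logging level names (`DEBUG`, `INFO`, `WARNING`, `ERROR`); `DEBUG` is the most verbose and is what surfaces the config details described above.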
### 2. vLLM-Omni features
> **Review comment (Collaborator):** Suggested change: replace `### 2.VLLM-omni features` with `### 2.vLLM-Omni features`; apply for all.

The vLLM-Omni features provide multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis across the entire multimodal inference process. However, since the Diffusion Pipeline is not scheduled through the omni path, only the Multi-Stage Pipeline models expose these features. [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
#### How to view vLLM-Omni features
The Multi-Stage Pipeline invokes the omni features automatically while it runs, so you can view them by running one of the example scripts directly. [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
> **Review comment (Collaborator):** this is not a good way to tell others how to view xxx; you need to provide a specific example and explain what you can get from the example and what the results mean to the users.

```bash
sh run_multiple_prompts.sh
```
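The shell script above wraps the `end2end.py` example for the Multi-Stage Pipeline models. Equivalently, the stats can be switched on programmatically through the `Omni` constructor (a sketch using only the constructor arguments shown elsewhere in this guide; the model name and log path are placeholders):

```python
import os

from vllm_omni.entrypoints.omni import Omni

log_dir = "logs/omni"
os.makedirs(log_dir, exist_ok=True)

omni_llm = Omni(
    model="Qwen/Qwen2.5-Omni-7B",  # placeholder; use the model from the example
    log_stats=True,  # enable the multi-dimensional omni metrics
    log_file=os.path.join(log_dir, "omni_llm_pipeline.log"),
)
```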

### 3. Diffusion features
• The Multi-Stage Pipeline logs do not directly record the details of the diffusion algorithm. Instead, they abstract a complete diffusion process into a single Stage, indirectly reflecting the overall performance of diffusion through `stage_gen_time_ms`, and focus on recording IPC and scheduling characteristics across different Stages.

• The Diffusion Pipeline logs comprehensively cover the core macro characteristics of diffusion inference, including model loading, CFG, the number of inference steps, total diffusion time, average denoising step time, and other parameters; a worked example of the step-time arithmetic follows below.
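As a worked example of how these quantities relate, the average denoising step time is simply the total diffusion time divided by the number of inference steps (illustrative numbers only; the variable names are not the actual log field names):

```python
total_diffusion_time_ms = 12_500.0  # hypothetical total for one generation
num_inference_steps = 50            # e.g. --num_inference_steps 50
avg_step_time_ms = total_diffusion_time_ms / num_inference_steps
print(avg_step_time_ms)  # 250.0 ms per denoising step
```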



#### How to view Diffusion features
1. The Multi-Stage Pipeline
> **Review comment (Collaborator):** why multi-stage here?


##### Setting the log switch:

```python
omni_llm = Omni(
    model=model_name,
    log_stats=args.enable_stats,  # pass --enable-stats so this becomes True
    log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
)
```
or
```python
omni_llm = Omni(
    model=model_name,
    log_stats=True,
    log_file=os.path.join(log_dir, "omni_llm_pipeline.log"),
)

```
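The first form ties the log switch to the example's `--enable-stats` flag, while the second enables it unconditionally; in both cases `log_file` controls where the pipeline statistics are written.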
##### Run the script:

```bash
sh run_multiple_prompts.sh
```

2. The Diffusion Pipeline

Run the Diffusion Pipeline script directly to view the model's diffusion properties (taking image_to_image as the example; usage is the same for the other models) [[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image) [[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image) [[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video):

```bash
python image_edit.py \
    --image input.png \
    --prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
    --output output_image_edit.png \
    --num_inference_steps 50 \
    --cfg_scale 4.0

```
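Here `--num_inference_steps` sets how many denoising steps the pipeline runs and `--cfg_scale` sets the classifier-free guidance strength; both values appear in the diffusion log together with the total diffusion time and the average per-step time they directly influence.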
45 changes: 22 additions & 23 deletions examples/offline_inference/qwen2_5_omni/end2end.py
@@ -6,8 +6,8 @@
"""

import os
from typing import NamedTuple

from typing import NamedTuple, Optional
> **Review comment (@DarkLight1337, Member, Dec 17, 2025):** Should be using `| None` instead of `Optional` using Python 3.10+. Have you merged from main and run `pre-commit run --all-files`?

import time
import librosa
import numpy as np
import soundfile as sf
@@ -18,7 +18,7 @@
from vllm.multimodal.image import convert_image_mode
from vllm.sampling_params import SamplingParams
from vllm.utils import FlexibleArgumentParser

from datetime import datetime
from vllm_omni.entrypoints.omni import Omni

SEED = 42
@@ -58,9 +58,9 @@ def get_text_query(question: str = None) -> QueryResult:


def get_mixed_modalities_query(
video_path: str | None = None,
image_path: str | None = None,
audio_path: str | None = None,
video_path: Optional[str] = None,
image_path: Optional[str] = None,
audio_path: Optional[str] = None,
num_frames: int = 16,
sampling_rate: int = 16000,
) -> QueryResult:
@@ -114,7 +114,7 @@


def get_use_audio_in_video_query(
video_path: str | None = None, num_frames: int = 16, sampling_rate: int = 16000
video_path: Optional[str] = None, num_frames: int = 16, sampling_rate: int = 16000
) -> QueryResult:
question = "Describe the content of the video, then convert what the baby say into text."
prompt = (
@@ -151,7 +151,7 @@ def get_use_audio_in_video_query(
)


def get_multi_audios_query(audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_multi_audios_query(audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
question = "Are these two audio clips the same?"
prompt = (
f"<|im_start|>system\n{default_system}<|im_end|>\n"
@@ -190,7 +190,7 @@ def get_multi_audios_query(audio_path: str | None = None, sampling_rate: int = 1
)


def get_image_query(question: str = None, image_path: str | None = None) -> QueryResult:
def get_image_query(question: str = None, image_path: Optional[str] = None) -> QueryResult:
> **Review comment (Collaborator):** why change it back?

if question is None:
question = "What is the content of this image?"
prompt = (
@@ -219,7 +219,7 @@ def get_image_query(question: str = None, image_path: str | None = None) -> Quer
)


def get_video_query(question: str = None, video_path: str | None = None, num_frames: int = 16) -> QueryResult:
def get_video_query(question: str = None, video_path: Optional[str] = None, num_frames: int = 16) -> QueryResult:
if question is None:
question = "Why is this video funny?"
prompt = (
@@ -247,7 +247,7 @@ def get_audio_query(question: str = None, audio_path: str | None = None, sampli
)


def get_audio_query(question: str = None, audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_audio_query(question: str = None, audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
if question is None:
question = "What is the content of this audio?"
prompt = (
@@ -320,10 +320,17 @@ def main(args):
else:
query_result = query_func()

base_dir = os.path.dirname(os.path.abspath(__file__))
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
log_dir = os.path.join(base_dir, "logs", "omni", ts)
os.makedirs(log_dir, exist_ok=True)

print("Omni logs will be saved to:", log_dir)

omni_llm = Omni(
model=model_name,
log_stats=args.enable_stats,
log_file=("omni_llm_pipeline.log" if args.enable_stats else None),
log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
> **Review comment (Collaborator):** why use `os.path.join(log_dir, "omni_llm_pipeline.log")`, are there any other alternative methods?

init_sleep_seconds=args.init_sleep_seconds,
batch_timeout=args.batch_timeout,
init_timeout=args.init_timeout,
@@ -419,7 +426,7 @@ def parse_args():
parser.add_argument(
"--enable-stats",
action="store_true",
default=False,
default=True,
> **Review comment (Collaborator):** why default is true, false is better

help="Enable writing detailed statistics (default: disabled)",
)
parser.add_argument(
@@ -496,18 +503,10 @@ def parse_args():
default=16000,
help="Sampling rate for audio loading (default: 16000).",
)
> **Review comment (Collaborator):** why you need to delete there?

parser.add_argument(
"--worker-backend", type=str, default="multi_process", choices=["multi_process", "ray"], help="backend"
)
parser.add_argument(
"--ray-address",
type=str,
default=None,
help="Address of the Ray cluster.",
)

return parser.parse_args()


if __name__ == "__main__":
args = parse_args()
main(args)
main(args)
20 changes: 14 additions & 6 deletions examples/offline_inference/qwen3_omni/end2end.py
@@ -6,7 +6,7 @@
"""

import os
from typing import NamedTuple
from typing import NamedTuple, Optional
> **Review comment (Collaborator):** Optional is no longer used. Please take care


import librosa
import numpy as np
@@ -18,7 +18,7 @@
from vllm.assets.video import VideoAsset, video_to_ndarrays
from vllm.multimodal.image import convert_image_mode
from vllm.utils import FlexibleArgumentParser

from datetime import datetime
from vllm_omni.entrypoints.omni import Omni

SEED = 42
@@ -57,7 +57,7 @@ def get_text_query(question: str = None) -> QueryResult:
)


def get_video_query(question: str = None, video_path: str | None = None, num_frames: int = 16) -> QueryResult:
def get_video_query(question: str = None, video_path: Optional[str] = None, num_frames: int = 16) -> QueryResult:
if question is None:
question = "Why is this video funny?"
prompt = (
@@ -85,7 +85,7 @@ def get_video_query(question: str = None, video_path: str | None = None, num_fra
)


def get_image_query(question: str = None, image_path: str | None = None) -> QueryResult:
def get_image_query(question: str = None, image_path: Optional[str] = None) -> QueryResult:
if question is None:
question = "What is the content of this image?"
prompt = (
@@ -114,7 +114,7 @@ def get_image_query(question: str = None, image_path: str | None = None) -> Quer
)


def get_audio_query(question: str = None, audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_audio_query(question: str = None, audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
if question is None:
question = "What is the content of this audio?"
prompt = (
@@ -169,10 +169,18 @@ def main(args):
query_result = query_func(audio_path=audio_path, sampling_rate=getattr(args, "sampling_rate", 16000))
else:
query_result = query_func()
base_dir = os.path.dirname(os.path.abspath(__file__))
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
log_dir = os.path.join(base_dir, "logs", "omni", ts)
os.makedirs(log_dir, exist_ok=True)

print("Omni logs will be saved to:", log_dir)

omni_llm = Omni(
model=model_name,
stage_configs_path=args.stage_configs_path,
log_stats=True,
> **Review comment (Collaborator):** default should be false

log_file=os.path.join(log_dir, "omni_llm_pipeline.log")
)

thinker_sampling_params = SamplingParams(
@@ -365,4 +373,4 @@

if __name__ == "__main__":
args = parse_args()
main(args)
main(args)