75 changes: 73 additions & 2 deletions docs/contributing/profiling.md
@@ -1,3 +1,74 @@
# Profiling vLLM-Omni (update soon)
# Profiling vLLM-Omni
This guide explains in detail how to use the logger system in vLLM-Omni.

Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.
In vLLM-Omni, there are two different scheduling paths:
• Single Diffusion (DiT) Pipeline [[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image) [[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image) [[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
> **Review comment (Collaborator):** add one blank line

> **Review comment (Collaborator):** add 1 blank line



• Multi-Stage Pipeline for Multimodal Understanding and Speech Generation [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)


What the logger system records under each scheduling path, and how to use it, is described below:
## Recording Content and Usage Instructions
### 1. vLLM features
vLLM configures a logger for the root module `vllm`, and submodules automatically inherit this parent logger. The `vllm_omni` module, however, lives outside the `vllm` namespace and does not inherit it automatically, so vLLM-Omni initializes its own root logger that mirrors the parent configuration. The vLLM config recorded in the logs covers communication methods, scheduling modes, parallelism, and runtime scale, as well as shared-memory pressure status, model size, and observed GPU memory usage during runtime. The recorded vLLM config content is the same for the Single Diffusion Pipeline and Multi-Stage Pipeline models.
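The inheritance issue can be illustrated with the standard library `logging` module (a minimal conceptual sketch, not the actual vLLM-Omni initialization code; only the logger names `vllm` and `vllm_omni` come from the text above):

```python
import logging

# vLLM configures the root "vllm" logger; dotted children such as
# "vllm.engine" pick up its handlers and level automatically.
vllm_logger = logging.getLogger("vllm")
vllm_logger.setLevel(logging.DEBUG)
vllm_logger.addHandler(logging.StreamHandler())

child = logging.getLogger("vllm.engine")
child.debug("inherits the DEBUG level and handler from 'vllm'")

# "vllm_omni" lives outside the "vllm" namespace, so it inherits nothing.
# A root logger for vllm_omni therefore has to be initialized explicitly,
# mirroring the parent configuration:
omni_logger = logging.getLogger("vllm_omni")
omni_logger.setLevel(vllm_logger.level)
for handler in vllm_logger.handlers:
    omni_logger.addHandler(handler)
```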
#### How to view vLLM features
Before running the example scripts, set the environment variable below so that the vLLM config appears in the logs printed to the terminal.
```bash
export VLLM_LOGGING_LEVEL=DEBUG
```
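`VLLM_LOGGING_LEVEL` accepts the standard Python logging level names (`DEBUG`, `INFO`, `WARNING`, `ERROR`); `DEBUG` is the most verbose and is what surfaces the config details described above.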
### 2. vLLM-Omni features
> **Review comment (Collaborator):** Suggested change: replace `### 2.VLLM-omni features` with `### 2.vLLM-Omni features`; apply for all.

The vLLM-Omni features provide multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis across the entire multimodal inference process. However, since the Diffusion Pipeline is not scheduled through the omni path, only the Multi-Stage Pipeline models expose these features. [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
#### How to view vLLM-Omni features
The Multi-Stage Pipeline invokes the omni features automatically while it runs, so you can view them by running one of the example scripts directly. [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
> **Review comment (Collaborator):** this is not a good way to tell others how to view xxx; you need to provide a specific example and explain what you can get from the example and what the results mean to the users.

```bash
sh run_multiple_prompts.sh
```
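The shell script above wraps the `end2end.py` example for the Multi-Stage Pipeline models. Equivalently, the stats can be switched on programmatically through the `Omni` constructor (a sketch using only the constructor arguments shown elsewhere in this guide; the model name and log path are placeholders):

```python
import os

from vllm_omni.entrypoints.omni import Omni

log_dir = "logs/omni"
os.makedirs(log_dir, exist_ok=True)

omni_llm = Omni(
    model="Qwen/Qwen2.5-Omni-7B",  # placeholder; use the model from the example
    log_stats=True,  # enable the multi-dimensional omni metrics
    log_file=os.path.join(log_dir, "omni_llm_pipeline.log"),
)
```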

### 3. Diffusion features
• The Multi-Stage Pipeline logs do not directly record the details of the diffusion algorithm. Instead, they abstract a complete diffusion process into a single Stage, indirectly reflecting the overall performance of diffusion through `stage_gen_time_ms`, and focus on recording IPC and scheduling characteristics across different Stages.

• The Diffusion Pipeline logs comprehensively cover the core macro characteristics of diffusion inference, including model loading, CFG, the number of inference steps, total diffusion time, average denoising step time, and other parameters; a worked example of the step-time arithmetic follows below.
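As a worked example of how these quantities relate, the average denoising step time is simply the total diffusion time divided by the number of inference steps (illustrative numbers only; the variable names are not the actual log field names):

```python
total_diffusion_time_ms = 12_500.0  # hypothetical total for one generation
num_inference_steps = 50            # e.g. --num_inference_steps 50
avg_step_time_ms = total_diffusion_time_ms / num_inference_steps
print(avg_step_time_ms)  # 250.0 ms per denoising step
```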



#### How to view Diffusion features
1. The Multi-Stage Pipeline
> **Review comment (Collaborator):** why multi-stage here?


##### Setting the log switch:

```python
omni_llm = Omni(
    model=model_name,
    log_stats=args.enable_stats,  # pass --enable-stats so this becomes True
    log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
)
```
or
```python
omni_llm = Omni(
    model=model_name,
    log_stats=True,
    log_file=os.path.join(log_dir, "omni_llm_pipeline.log"),
)

```
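The first form ties the log switch to the example's `--enable-stats` flag, while the second enables it unconditionally; in both cases `log_file` controls where the pipeline statistics are written.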
##### Run the script:

```bash
sh run_multiple_prompts.sh
```

2. The Diffusion Pipeline

Run the Diffusion Pipeline script directly to view the model's diffusion properties (taking image_to_image as the example; usage is the same for the other models) [[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image) [[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image) [[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video):

```bash
python image_edit.py \
    --image input.png \
    --prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
    --output output_image_edit.png \
    --num_inference_steps 50 \
    --cfg_scale 4.0

```
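Here `--num_inference_steps` sets how many denoising steps the pipeline runs and `--cfg_scale` sets the classifier-free guidance strength; both values appear in the diffusion log together with the total diffusion time and the average per-step time they directly influence.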
45 changes: 22 additions & 23 deletions examples/offline_inference/qwen2_5_omni/end2end.py
@@ -6,8 +6,8 @@
"""

import os
from typing import NamedTuple

from typing import NamedTuple, Optional
> **Review comment (@DarkLight1337, Member, Dec 17, 2025):** Should be using `| None` instead of `Optional` using Python 3.10+. Have you merged from main and run `pre-commit run --all-files`?

import time
import librosa
import numpy as np
import soundfile as sf
@@ -18,7 +18,7 @@
from vllm.multimodal.image import convert_image_mode
from vllm.sampling_params import SamplingParams
from vllm.utils import FlexibleArgumentParser

from datetime import datetime
from vllm_omni.entrypoints.omni import Omni

SEED = 42
@@ -58,9 +58,9 @@ def get_text_query(question: str = None) -> QueryResult:


def get_mixed_modalities_query(
video_path: str | None = None,
image_path: str | None = None,
audio_path: str | None = None,
video_path: Optional[str] = None,
image_path: Optional[str] = None,
audio_path: Optional[str] = None,
num_frames: int = 16,
sampling_rate: int = 16000,
) -> QueryResult:
@@ -114,7 +114,7 @@


def get_use_audio_in_video_query(
video_path: str | None = None, num_frames: int = 16, sampling_rate: int = 16000
video_path: Optional[str] = None, num_frames: int = 16, sampling_rate: int = 16000
) -> QueryResult:
question = "Describe the content of the video, then convert what the baby say into text."
prompt = (
@@ -151,7 +151,7 @@ def get_use_audio_in_video_query(
)


def get_multi_audios_query(audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_multi_audios_query(audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
question = "Are these two audio clips the same?"
prompt = (
f"<|im_start|>system\n{default_system}<|im_end|>\n"
@@ -190,7 +190,7 @@ def get_multi_audios_query(audio_path: str | None = None, sampling_rate: int = 1
)


def get_image_query(question: str = None, image_path: str | None = None) -> QueryResult:
def get_image_query(question: str = None, image_path: Optional[str] = None) -> QueryResult:
> **Review comment (Collaborator):** why change it back?

if question is None:
question = "What is the content of this image?"
prompt = (
@@ -219,7 +219,7 @@ def get_image_query(question: str = None, image_path: str | None = None) -> Quer
)


def get_video_query(question: str = None, video_path: str | None = None, num_frames: int = 16) -> QueryResult:
def get_video_query(question: str = None, video_path: Optional[str] = None, num_frames: int = 16) -> QueryResult:
if question is None:
question = "Why is this video funny?"
prompt = (
@@ -247,7 +247,7 @@ def get_audio_query(question: str = None, audio_path: str | None = None, sampli
)


def get_audio_query(question: str = None, audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_audio_query(question: str = None, audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
if question is None:
question = "What is the content of this audio?"
prompt = (
@@ -320,10 +320,17 @@ def main(args):
else:
query_result = query_func()

base_dir = os.path.dirname(os.path.abspath(__file__))
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
log_dir = os.path.join(base_dir, "logs", "omni", ts)
os.makedirs(log_dir, exist_ok=True)

print("Omni logs will be saved to:", log_dir)

omni_llm = Omni(
model=model_name,
log_stats=args.enable_stats,
log_file=("omni_llm_pipeline.log" if args.enable_stats else None),
log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
> **Review comment (Collaborator):** why use `os.path.join(log_dir, "omni_llm_pipeline.log")`, are there any other alternative methods?

init_sleep_seconds=args.init_sleep_seconds,
batch_timeout=args.batch_timeout,
init_timeout=args.init_timeout,
@@ -419,7 +426,7 @@ def parse_args():
parser.add_argument(
"--enable-stats",
action="store_true",
default=False,
default=True,
> **Review comment (Collaborator):** why default is true, false is better

help="Enable writing detailed statistics (default: disabled)",
)
parser.add_argument(
@@ -496,18 +503,10 @@ def parse_args():
default=16000,
help="Sampling rate for audio loading (default: 16000).",
)
> **Review comment (Collaborator):** why you need to delete there?

parser.add_argument(
"--worker-backend", type=str, default="multi_process", choices=["multi_process", "ray"], help="backend"
)
parser.add_argument(
"--ray-address",
type=str,
default=None,
help="Address of the Ray cluster.",
)

return parser.parse_args()


if __name__ == "__main__":
args = parse_args()
main(args)
main(args)
20 changes: 14 additions & 6 deletions examples/offline_inference/qwen3_omni/end2end.py
@@ -6,7 +6,7 @@
"""

import os
from typing import NamedTuple
from typing import NamedTuple, Optional
> **Review comment (Collaborator):** Optional is no longer used. Please take care


import librosa
import numpy as np
@@ -18,7 +18,7 @@
from vllm.assets.video import VideoAsset, video_to_ndarrays
from vllm.multimodal.image import convert_image_mode
from vllm.utils import FlexibleArgumentParser

from datetime import datetime
from vllm_omni.entrypoints.omni import Omni

SEED = 42
@@ -57,7 +57,7 @@ def get_text_query(question: str = None) -> QueryResult:
)


def get_video_query(question: str = None, video_path: str | None = None, num_frames: int = 16) -> QueryResult:
def get_video_query(question: str = None, video_path: Optional[str] = None, num_frames: int = 16) -> QueryResult:
if question is None:
question = "Why is this video funny?"
prompt = (
@@ -85,7 +85,7 @@ def get_video_query(question: str = None, video_path: str | None = None, num_fra
)


def get_image_query(question: str = None, image_path: str | None = None) -> QueryResult:
def get_image_query(question: str = None, image_path: Optional[str] = None) -> QueryResult:
if question is None:
question = "What is the content of this image?"
prompt = (
@@ -114,7 +114,7 @@ def get_image_query(question: str = None, image_path: str | None = None) -> Quer
)


def get_audio_query(question: str = None, audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_audio_query(question: str = None, audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
if question is None:
question = "What is the content of this audio?"
prompt = (
@@ -169,10 +169,18 @@ def main(args):
query_result = query_func(audio_path=audio_path, sampling_rate=getattr(args, "sampling_rate", 16000))
else:
query_result = query_func()
base_dir = os.path.dirname(os.path.abspath(__file__))
ts = datetime.now().strftime("%Y%m%d_%H%M%S")
log_dir = os.path.join(base_dir, "logs", "omni", ts)
os.makedirs(log_dir, exist_ok=True)

print("Omni logs will be saved to:", log_dir)

omni_llm = Omni(
model=model_name,
stage_configs_path=args.stage_configs_path,
log_stats=True,
> **Review comment (Collaborator):** default should be false

log_file=os.path.join(log_dir, "omni_llm_pipeline.log")
)

thinker_sampling_params = SamplingParams(
@@ -365,4 +373,4 @@

if __name__ == "__main__":
args = parse_args()
main(args)
main(args)