[Profile] Adding profiling hooks for omni&vllm&diffusion pipeline #340
base: main
Changes from all commits
8ae6e64
1b1f2e5
8fb54c7
fe8bd16
da300a4
@@ -1,3 +1,74 @@
# Profiling vLLM-Omni

This guide provides detailed instructions on how to use the logger system in vllm-omni.

Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.

In vllm-omni, there are two different scheduling paths:

- Diffusion/DiT Single Diffusion Pipeline [[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image) [[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image) [[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
> **Collaborator:** add 1 blank line
- Multi-Stage Pipeline for Multimodal Understanding and Speech Generation [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)

The logging content and usage of the logger system under each scheduling path are described below.

## Recording Content and Usage Instructions

### 1. vLLM features
vLLM logs under the root module `vllm`, and its submodules automatically inherit the parent logger. The `vllm_omni` module, however, does not automatically inherit from `vllm`, so we initialize a `vllm_omni` root logger that inherits the parent logger's configuration. The vLLM config covers communication methods, scheduling modes, parallelism, and runtime scale, as well as shared-memory pressure status, model size, and observed GPU memory usage during runtime. The vLLM config content recorded by the Single Diffusion Pipeline model and the Multi-Stage Pipeline model is the same.
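For intuition, here is a minimal sketch of such a root-logger initialization using only the standard library `logging` module; the actual vllm-omni initialization code may differ:

```python
import logging

# Create a root logger for the vllm_omni package so that loggers obtained via
# logging.getLogger(__name__) inside vllm_omni.* submodules inherit its
# handlers and level, mirroring what the vllm root logger does for vllm.*.
omni_root = logging.getLogger("vllm_omni")
omni_root.setLevel(logging.DEBUG)

# Reuse any handlers already attached to the vllm root logger so that both
# packages write to the same destination.
for handler in logging.getLogger("vllm").handlers:
    omni_root.addHandler(handler)
```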
#### How to view vLLM features

Before running the scripts in the examples, set the following environment variable so that the vLLM config is printed in the terminal logs:

```bash
export VLLM_LOGGING_LEVEL=DEBUG
```
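The same effect can usually be achieved from Python by exporting the variable before vllm is imported; this is only a sketch and assumes the logging level is read at import time:

```python
import os

# Must be set before the first `import vllm`, since the logging level is
# typically read when the vllm logger is configured at import time.
os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"

import vllm  # noqa: E402  (imported after setting the environment variable)
```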
### 2. vLLM-Omni features

> **Collaborator:** (suggested change) apply for all
The vLLM-Omni features provide multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis throughout the entire multimodal inference process. However, since the Diffusion Pipeline model does not schedule the omni feature, only the Multi-Stage Pipeline models can access it. [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)

#### How to view vLLM-Omni features

When a Multi-Stage Pipeline model runs, the omni metrics are collected automatically; run the example script directly to view them. [[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni) [[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
> **Collaborator:** this is not a good way to tell others how to view xxx. You need to provide a specific example and explain what users can get from it and what the results mean.

```bash
sh run_multiple_prompts.sh
```
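As a rough sketch of what you can do with the resulting log file (the file name matches the `log_file` argument shown later in this guide; the exact line format depends on the vllm-omni version, so the filter string below is only an assumption):

```python
from pathlib import Path

# Hypothetical run directory; substitute the log_dir printed by your script.
log_path = Path("logs/omni") / "omni_llm_pipeline.log"

# Print only the per-stage timing entries. "stage_gen_time_ms" is the metric
# name mentioned in the Diffusion features section below; the actual log
# layout may differ between versions.
if log_path.exists():
    for line in log_path.read_text().splitlines():
        if "stage_gen_time_ms" in line:
            print(line)
```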
### 3. Diffusion features

- The Multi-Stage Pipeline logs do not directly record the details of the diffusion algorithm. Instead, they abstract a complete diffusion process into a single stage, indirectly reflecting the overall performance of diffusion through `stage_gen_time_ms`, and focus on recording IPC and scheduling characteristics across the different stages.

- The Diffusion Pipeline logs comprehensively cover the core macro characteristics of diffusion inference, including model loading, CFG, the number of inference steps, the total diffusion time, the average denoising step time, and other parameters.

#### How to view Diffusion features

1. The Multi-Stage Pipeline
> **Collaborator:** why multi-stage here?

##### Setting the log switch:
```python
omni_llm = Omni(
    model=model_name,
    log_stats=args.enable_stats,  # pass --enable-stats so this is True
    log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
)
```

or

```python
omni_llm = Omni(
    model=model_name,
    log_stats=True,
    log_file=os.path.join(log_dir, "omni_llm_pipeline.log"),
)
```
##### Running the script:

```bash
sh run_multiple_prompts.sh
```
2. The Diffusion Pipeline

Run the Diffusion Pipeline script directly to view the model's diffusion properties (taking image_to_image as an example; the usage for the other models is the same) [[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image) [[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image) [[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video):
```bash
python image_edit.py \
    --image input.png \
    --prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
    --output output_image_edit.png \
    --num_inference_steps 50 \
    --cfg_scale 4.0
```
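As a small worked example of how the recorded diffusion metrics relate to each other (the numbers below are hypothetical, not measured output):

```python
# Hypothetical values as they might be read from a Diffusion Pipeline log;
# the guide above says the log records the number of inference steps, the
# total diffusion time, and the average denoising step time.
total_diffusion_time_ms = 12_500.0  # assumed total, not a real measurement
num_inference_steps = 50            # matches --num_inference_steps above

avg_denoise_step_ms = total_diffusion_time_ms / num_inference_steps
print(f"average denoising step time: {avg_denoise_step_ms:.1f} ms")  # 250.0 ms
```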
---
@@ -6,8 +6,8 @@
"""

import os
from typing import NamedTuple
from typing import NamedTuple, Optional

> **Member:** Should be using

import time
import librosa
import numpy as np
import soundfile as sf
@@ -18,7 +18,7 @@
from vllm.multimodal.image import convert_image_mode
from vllm.sampling_params import SamplingParams
from vllm.utils import FlexibleArgumentParser

from datetime import datetime
from vllm_omni.entrypoints.omni import Omni

SEED = 42
@@ -58,9 +58,9 @@ def get_text_query(question: str = None) -> QueryResult:

def get_mixed_modalities_query(
    video_path: str | None = None,
    image_path: str | None = None,
    audio_path: str | None = None,
    video_path: Optional[str] = None,
    image_path: Optional[str] = None,
    audio_path: Optional[str] = None,
    num_frames: int = 16,
    sampling_rate: int = 16000,
) -> QueryResult:
@@ -114,7 +114,7 @@ def get_mixed_modalities_query(

def get_use_audio_in_video_query(
    video_path: str | None = None, num_frames: int = 16, sampling_rate: int = 16000
    video_path: Optional[str] = None, num_frames: int = 16, sampling_rate: int = 16000
) -> QueryResult:
    question = "Describe the content of the video, then convert what the baby say into text."
    prompt = (
@@ -151,7 +151,7 @@ def get_use_audio_in_video_query(
    )


def get_multi_audios_query(audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_multi_audios_query(audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
    question = "Are these two audio clips the same?"
    prompt = (
        f"<|im_start|>system\n{default_system}<|im_end|>\n"
@@ -190,7 +190,7 @@ def get_multi_audios_query(audio_path: str | None = None, sampling_rate: int = 1
    )


def get_image_query(question: str = None, image_path: str | None = None) -> QueryResult:
def get_image_query(question: str = None, image_path: Optional[str] = None) -> QueryResult:

> **Collaborator:** why change it back?

    if question is None:
        question = "What is the content of this image?"
    prompt = (
@@ -219,7 +219,7 @@ def get_image_query(question: str = None, image_path: str | None = None) -> Quer
    )


def get_video_query(question: str = None, video_path: str | None = None, num_frames: int = 16) -> QueryResult:
def get_video_query(question: str = None, video_path: Optional[str] = None, num_frames: int = 16) -> QueryResult:
    if question is None:
        question = "Why is this video funny?"
    prompt = (
@@ -247,7 +247,7 @@ def get_video_query(question: str = None, video_path: str | None = None, num_fra
    )


def get_audio_query(question: str = None, audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_audio_query(question: str = None, audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
    if question is None:
        question = "What is the content of this audio?"
    prompt = (
@@ -320,10 +320,17 @@ def main(args):
    else:
        query_result = query_func()

    base_dir = os.path.dirname(os.path.abspath(__file__))
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_dir = os.path.join(base_dir, "logs", "omni", ts)
    os.makedirs(log_dir, exist_ok=True)

    print("Omni logs will be saved to:", log_dir)

    omni_llm = Omni(
        model=model_name,
        log_stats=args.enable_stats,
        log_file=("omni_llm_pipeline.log" if args.enable_stats else None),
        log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),

> **Collaborator:** why use os.path.join(log_dir, "omni_llm_pipeline.log"), are there any other alternative methods?

        init_sleep_seconds=args.init_sleep_seconds,
        batch_timeout=args.batch_timeout,
        init_timeout=args.init_timeout,
@@ -419,7 +426,7 @@
    parser.add_argument(
        "--enable-stats",
        action="store_true",
        default=False,
        default=True,

> **Collaborator:** why default is true, false is better

        help="Enable writing detailed statistics (default: disabled)",
    )
    parser.add_argument(
@@ -496,18 +503,10 @@
        default=16000,
        help="Sampling rate for audio loading (default: 16000).",
    )
    parser.add_argument(

> **Collaborator:** why do you need to delete these?

        "--worker-backend", type=str, default="multi_process", choices=["multi_process", "ray"], help="backend"
    )
    parser.add_argument(
        "--ray-address",
        type=str,
        default=None,
        help="Address of the Ray cluster.",
    )

    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    main(args)
    main(args)
---
@@ -6,7 +6,7 @@
"""

import os
from typing import NamedTuple
from typing import NamedTuple, Optional

> **Collaborator:** Optional is no longer used. Please take care.

import librosa
import numpy as np
@@ -18,7 +18,7 @@
from vllm.assets.video import VideoAsset, video_to_ndarrays
from vllm.multimodal.image import convert_image_mode
from vllm.utils import FlexibleArgumentParser

from datetime import datetime
from vllm_omni.entrypoints.omni import Omni

SEED = 42
@@ -57,7 +57,7 @@ def get_text_query(question: str = None) -> QueryResult:
    )


def get_video_query(question: str = None, video_path: str | None = None, num_frames: int = 16) -> QueryResult:
def get_video_query(question: str = None, video_path: Optional[str] = None, num_frames: int = 16) -> QueryResult:
    if question is None:
        question = "Why is this video funny?"
    prompt = (
@@ -85,7 +85,7 @@ def get_video_query(question: str = None, video_path: str | None = None, num_fra
    )


def get_image_query(question: str = None, image_path: str | None = None) -> QueryResult:
def get_image_query(question: str = None, image_path: Optional[str] = None) -> QueryResult:
    if question is None:
        question = "What is the content of this image?"
    prompt = (
@@ -114,7 +114,7 @@ def get_image_query(question: str = None, image_path: str | None = None) -> Quer
    )


def get_audio_query(question: str = None, audio_path: str | None = None, sampling_rate: int = 16000) -> QueryResult:
def get_audio_query(question: str = None, audio_path: Optional[str] = None, sampling_rate: int = 16000) -> QueryResult:
    if question is None:
        question = "What is the content of this audio?"
    prompt = (
@@ -169,10 +169,18 @@ def main(args):
        query_result = query_func(audio_path=audio_path, sampling_rate=getattr(args, "sampling_rate", 16000))
    else:
        query_result = query_func()
    base_dir = os.path.dirname(os.path.abspath(__file__))
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_dir = os.path.join(base_dir, "logs", "omni", ts)
    os.makedirs(log_dir, exist_ok=True)

    print("Omni logs will be saved to:", log_dir)

    omni_llm = Omni(
        model=model_name,
        stage_configs_path=args.stage_configs_path,
        log_stats=True,

> **Collaborator:** default should be false

        log_file=os.path.join(log_dir, "omni_llm_pipeline.log")
    )

    thinker_sampling_params = SamplingParams(
@@ -365,4 +373,4 @@

if __name__ == "__main__":
    args = parse_args()
    main(args)
    main(args)
> add one blank line