[Profile] Adding profiling hooks for omni&vllm&diffusion pipeline #340
base: main
Conversation
Signed-off-by: Chen Yang <[email protected]>
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
  try:
      stage_req_counts[stage_id] += 1
      stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-     rid_key = str(req_id)
-     pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+     rid_int = int(req_id)
+     pr = per_request.setdefault(rid_int, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
```
Preserve per-request stage metrics for non-numeric IDs
When a stage emits a request ID that isn’t purely numeric (e.g., the orchestrator logs 0_5369… UUID-style IDs), record_stage_metrics now forces int(req_id) and silently swallows the resulting ValueError. Because the exception short-circuits the function, the per-request stages entry is never recorded, so per-request stats and the overall_request records will be missing stage timings/transfers for any non-numeric request IDs. This regression means the new profiling output is incomplete for the typical string IDs shown in the provided logs.
The pre-commit check failed; you can run pre-commit locally before submitting.
```diff
  import os
- from typing import NamedTuple
+ from typing import NamedTuple, Optional
```
Should be using `| None` instead of `Optional` on Python 3.10+. Have you merged from main and run `pre-commit run --all-files`?
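For reference, a minimal sketch of the two spellings, using the function from this PR's diff:

```python
import logging

# Preferred on Python 3.10+ (what the project's pre-commit enforces):
def configure_orchestrator_logger(logger: logging.Logger, log_file: str | None) -> None: ...

# Rather than the pre-3.10 spelling:
# from typing import Optional
# def configure_orchestrator_logger(logger: logging.Logger, log_file: Optional[str]) -> None: ...
```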
please add a docs page (profiling vllm-omni) under the developer guide :)
sure! thanks
Signed-off-by: Chen Yang <[email protected]>
Signed-off-by: Chen Yang <[email protected]>
docs/contributing/profiling.md
```diff
- # Profiling vLLM-Omni (update soon)
+ # \# Profiling vLLM-Omni
+ # \## profiling hooks for omni\&vllm\&diffusion pipeline
```
Add some background to tell users why we cannot directly use the vLLM profiling method and what the different scenarios are.
docs/contributing/profiling.md
```diff
+ #
+ # \## 1.Usage of Log Statistics for Single-Pipeline Diffusion Scheduling
```
I think you need to reorganize the sections with names, contents, and examples. Ask an LLM for help.
docs/contributing/profiling.md
```diff
+ #
+ # In this project, tasks such as text-to-image and text-to-video follow a single-pipeline diffusion scheduling paradigm.
```
The formatting is a mess; you can preview the docs locally. Check the other md docs for reference.
Signed-off-by: Chen Yang <[email protected]>
Signed-off-by: Chen Yang <[email protected]>
hsliuustc0106 left a comment
I think there are a lot of places requiring improvement in this PR.
````diff
  ```bash
  export VLLM_LOGGING_LEVEL=DEBUG
  ```
  ### 2.VLLM-omni features
````
```diff
- ### 2.VLLM-omni features
+ ### 2.vLLM-Omni features
```
Apply to all occurrences.
```diff
  ### 2.VLLM-omni features
  The vllm-omni feature provides multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis throughout the entire multimodal inference process. However, since the Diffusion Pipeline model does not schedule the omni feature, only the Multi-Stage Pipeline model can access the omni feature.[[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni)[[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
  #### How to view VLLM-omni features
  During the operation of the Multi-Stage Pipeline model, the Omni feature is automatically invoked. You can directly run the script to view the Omni feature of the model.[[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni)[[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
```
This is not a good way to tell others how to view it. You need to provide a specific example and explain what the user can get from it and what the results mean.
```diff
  #### How to view Diffusion features
  1.The Multi-Stage Pipeline
```
why multi-stage here?
```diff
  Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.
  In vllm-omni, there are two different scheduling paths:
  • Diffusion/DiT Single diffusion Pipeline[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image)[[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
```
add one blank line
add 1 blank line
```diff
  omni_llm = Omni(
      model=model_name,
      stage_configs_path=args.stage_configs_path,
      log_stats=True,
```
The default should be False.
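A minimal sketch of the requested default; the real `Omni` signature isn't shown in this thread, so the parameters here are assumptions:

```python
class Omni:
    # Sketch only: stats logging stays off unless explicitly enabled.
    def __init__(self, model: str, stage_configs_path: str | None = None,
                 log_stats: bool = False) -> None:
        self.model = model
        self.stage_configs_path = stage_configs_path
        self.log_stats = log_stats
```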
```diff
  def configure_orchestrator_logger(logger: logging.Logger, log_file: str | None) -> None:
      """
      Attach a FileHandler to the given logger, and also to the
      module-level orchestrator & diffusion loggers so they share
```
```diff
- module-level orchestrator & diffusion loggers so they share
+ multi-stage orchestrator & diffusion loggers so they share
```
```diff
  _STATS_PATH = _make_run_stats_path()

  _PRINT_DIFFUSION_METRICS = os.getenv("OMNI_DIFFUSION_PRINT", "1") == "1"
```
Too many blank lines.
```diff
      logger.info(f"Prepared {len(requests)} requests for generation.")
      return self._run_engine(requests)

  def _run_engine(self, requests: list[OmniDiffusionRequest]):
```
why delete this?
```diff
- You can pass either an `OmniDiffusionConfig` via `od_config`, or
- pass kwargs such as `model="Qwen/Qwen-Image"`,
- which will be forwarded to `OmniDiffusionConfig.from_kwargs`.
+ High-level entrypoint for vLLM-Omni diffusion models.
```
Why do you need to change this?
@david6666666 please help @erfgss to improve this PR

please fix the review comments and rebase the code

@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
      stats_dir = os.getenv("OMNI_DIFFUSION_STATS_DIR", "omni_diffusion_stats")
      os.makedirs(stats_dir, exist_ok=True)

      # Local time; include milliseconds + pid to avoid collisions
      now = datetime.now()
      ts = now.strftime("%Y%m%d_%H%M%S") + f"_{now.microsecond // 1000:03d}"
      pid = os.getpid()

      filename = f"omni_diffusion_{ts}_pid{pid}.jsonl"
      return os.path.join(stats_dir, filename)


  _STATS_PATH = _make_run_stats_path()
```
Avoid failing import on unwritable stats directory
This module now creates the stats directory at import time via _STATS_PATH = _make_run_stats_path(), which calls os.makedirs(...) without any error handling. In environments where the current working directory (or OMNI_DIFFUSION_STATS_DIR) is read-only, simply importing vllm_omni.entrypoints.omni_diffusion will raise and prevent the diffusion entrypoint from working at all. Profiling/logging should be best-effort; consider deferring path creation to runtime and handling filesystem errors so non-writable locations don’t break imports.
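A minimal sketch of the deferral Codex suggests, assuming a lazy helper; `_get_stats_path` is hypothetical, not the PR's actual fix:

```python
import os
from datetime import datetime
from functools import lru_cache

@lru_cache(maxsize=1)
def _get_stats_path() -> str | None:
    """Build the stats path on first use instead of at import time."""
    try:
        stats_dir = os.getenv("OMNI_DIFFUSION_STATS_DIR", "omni_diffusion_stats")
        os.makedirs(stats_dir, exist_ok=True)
        now = datetime.now()
        ts = now.strftime("%Y%m%d_%H%M%S") + f"_{now.microsecond // 1000:03d}"
        return os.path.join(stats_dir, f"omni_diffusion_{ts}_pid{os.getpid()}.jsonl")
    except OSError:
        # Best-effort profiling: an unwritable directory disables stats
        # instead of breaking the import of the diffusion entrypoint.
        return None
```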
```diff
  try:
      stage_req_counts[stage_id] += 1
      stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-     rid_key = str(req_id)
-     pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+     rid_int = int(req_id)
+     pr = per_request.setdefault(rid_int, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
      pr_stages = pr["stages"]  # type: ignore[index]
```
Preserve non-numeric request IDs in per-request metrics
record_stage_metrics now unconditionally casts req_id to int, and any failure is swallowed by the broad try/except, which skips per-request aggregation entirely. request_id is typed as str in OmniEngineCoreOutput, so non-numeric IDs will silently drop per-request metrics and transfer stats. This is a regression from the previous str(req_id) behavior; use a safe fallback (like the pattern in aggregate_rx_and_maybe_total) to avoid losing metrics for string IDs.
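A minimal sketch of the safe fallback described above; the helper name `_rid_key` is hypothetical, and the PR's `aggregate_rx_and_maybe_total` pattern isn't shown here:

```python
def _rid_key(req_id: object) -> int | str:
    """Keep numeric request IDs as ints, but fall back to the raw string
    (e.g. "0_5369..." style IDs) so per-request metrics are never dropped."""
    try:
        return int(req_id)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return str(req_id)
```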
```diff
      model=model_name,
      log_stats=args.enable_stats,
-     log_file=("omni_llm_pipeline.log" if args.enable_stats else None),
+     log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
```
Why use `os.path.join(log_dir, "omni_llm_pipeline.log")`? Are there any alternative methods?
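One common alternative, sketched here with a hypothetical `log_dir` value, is pathlib:

```python
from pathlib import Path

log_dir = "logs"  # hypothetical directory standing in for the PR's log_dir
# Equivalent to os.path.join(log_dir, "omni_llm_pipeline.log"):
log_file = str(Path(log_dir) / "omni_llm_pipeline.log")
```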
```diff
      stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-     rid_key = str(req_id)
-     pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+     rid_int = int(req_id)
```
Why change this to `int`?
```diff
      pid = os.getpid()

      filename = f"omni_diffusion_{ts}_pid{pid}.jsonl"
      return os.path.join(stats_dir, filename)
```
ditto
```diff
  def _record(event: str, **kv: Any) -> None:
```
Do we need this function, and what is it supposed to do?
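Given the `.jsonl` stats path above, `_record` presumably appends one JSON event per line; a sketch under that assumption (`_get_stats_path` is the hypothetical lazy helper from the earlier sketch, not the PR's code):

```python
import json
from typing import Any

def _record(event: str, **kv: Any) -> None:
    # Append one JSON object per line to the run's stats file (best-effort).
    path = _get_stats_path()  # hypothetical lazy helper sketched earlier
    if path is None:
        return
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"event": event, **kv}) + "\n")
```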
# Profiling vLLM-Omni
This guide provides detailed instructions on how to use the logger system in vllm-omni.
In vllm-omni, there are two different scheduling paths:
- Diffusion/DiT single-pipeline diffusion scheduling [image_to_image][text_to_image][text_to_video]
- Multi-Stage Pipeline for multimodal understanding and speech generation [qwen2_5_omni][qwen3_omni]

The logging content and usage of the logger system differ between the two scheduling paths, as described below.
## Recording Content and Usage Instructions
### 1. vLLM features
vLLM logs under the root module `vllm`, and submodules automatically inherit the parent logger. The `vllm_omni` module, however, does not automatically inherit from `vllm`, so we need to initialize a `vllm_omni` root logger, which inherits from the parent logger. The logged vLLM config includes communication methods, scheduling modes, parallelism, and runtime scale, as well as shared-memory pressure status, model size, and observed GPU memory usage during runtime. The vLLM config content recorded by the single Diffusion Pipeline model and the Multi-Stage Pipeline model is the same.
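A minimal sketch of that logger initialization, assuming standard `logging` semantics rather than the PR's exact code:

```python
import logging

# Give vllm_omni its own root logger and attach the same handlers and
# level as the vllm parent logger, since it does not inherit automatically.
vllm_logger = logging.getLogger("vllm")
omni_logger = logging.getLogger("vllm_omni")
for handler in vllm_logger.handlers:
    omni_logger.addHandler(handler)
omni_logger.setLevel(vllm_logger.level)
```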
#### How to view vLLM features
Before running the example scripts, set the environment variable below to see the vLLM config in the logs printed to the terminal.
```bash
export VLLM_LOGGING_LEVEL=DEBUG
```
### 2. vLLM-Omni features
The vLLM-Omni features provide multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis across the entire multimodal inference process. However, since the Diffusion Pipeline model does not schedule the omni features, only the Multi-Stage Pipeline model can access them. [qwen2_5_omni][qwen3_omni]
#### How to view vLLM-Omni features
The omni features are invoked automatically while a Multi-Stage Pipeline model runs, so you can run the example scripts directly to view them. [qwen2_5_omni][qwen3_omni]
### 3. Diffusion features
stage_gen_time_ms, and focus on recording IPC and scheduling characteristics across the different stages.
#### How to view Diffusion features
1. The Multi-Stage Pipeline
Set the log switch when constructing the engine (see the sketch below):
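A minimal sketch based on the `Omni(...)` snippet quoted in the review above; the import path, model name, and config path are example values:

```python
from vllm_omni import Omni  # import path assumed from the project name

# Enable per-stage statistics logging (reviewers asked for False by default).
omni_llm = Omni(
    model="Qwen/Qwen2.5-Omni-7B",             # example model from the repo's examples
    stage_configs_path="stage_configs.yaml",  # hypothetical config path
    log_stats=True,
)
```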
2. The Diffusion Pipeline
Run a Diffusion Pipeline script directly to view the model's diffusion properties (taking image_to_image as an example; usage is the same for the other models) [image_to_image][text_to_image][text_to_video]:
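For example (the environment variables come from the diffusion stats module in this PR; the script path is a placeholder for the linked example):

```bash
# Both variables are read by the diffusion stats module added in this PR.
export OMNI_DIFFUSION_PRINT=1                         # print diffusion metrics ("1" is the default)
export OMNI_DIFFUSION_STATS_DIR=omni_diffusion_stats  # directory for the per-run .jsonl stats files
python examples/offline_inference/image_to_image/<example_script>.py  # placeholder script name
```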