
Conversation

@erfgss commented Dec 17, 2025

Profiling vLLM-Omni

This guide provides detailed instructions on how to use the logging system in vLLM-Omni.

In vLLM-Omni, there are two different scheduling paths:

  • the Diffusion/DiT Single Diffusion Pipeline (e.g. text_to_image, image_to_image, text_to_video), and
  • the Multi-Stage Pipeline (e.g. qwen2_5_omni, qwen3_omni).

The logging content and usage of the logger system under these two scheduling paths are described below.

Recorded Content and Usage Instructions

1. vLLM features

vLLM logs under the root module `vllm`, and its submodules automatically inherit the parent logger. The `vllm_omni` module, however, does not automatically inherit from `vllm`, so we need to initialize a `vllm_omni` root logger that inherits the parent logger's configuration. The vLLM config log covers communication methods, scheduling modes, parallelism, and runtime scale, as well as shared-memory pressure status, model size, and observed GPU memory usage during runtime. The vLLM config content recorded by the Single Diffusion Pipeline model and the Multi-Stage Pipeline model is the same.
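The snippet below is a minimal, illustrative sketch of this idea using Python's standard logging module; it is not the actual vllm_omni initialization code, and sharing the parent's handlers and level is an assumption made only for the example:

```python
import logging

# Minimal sketch (not the actual vllm-omni code): make the "vllm_omni" root
# logger reuse the handlers and level configured for the "vllm" root logger,
# since "vllm_omni" is not a child of "vllm" and does not inherit from it.
vllm_logger = logging.getLogger("vllm")        # configured by vLLM at startup

omni_logger = logging.getLogger("vllm_omni")   # separate logger hierarchy
omni_logger.setLevel(vllm_logger.getEffectiveLevel())
for handler in vllm_logger.handlers:           # share handlers/formatters
    omni_logger.addHandler(handler)

omni_logger.info("vllm_omni root logger initialized")
```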

How to view vLLM features

Before running the example scripts, set the following environment variable so that the vLLM config appears in the logs printed to the terminal:

export VLLM_LOGGING_LEVEL=DEBUG

2. vLLM-Omni features

The vLLM-Omni logger provides multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis across the entire multimodal inference process. However, because the Single Diffusion Pipeline is not scheduled through the omni path, only Multi-Stage Pipeline models can access the omni metrics.[qwen2_5_omni][qwen3_omni]

How to view vLLM-Omni features

The omni metrics are collected automatically while a Multi-Stage Pipeline model is running, so you can simply run the example script to view them; the recorded statistics include, for example, per-stage request and output-token counts and per-request transfer times and sizes.[qwen2_5_omni][qwen3_omni]

sh run_multiple_prompts.sh

3. Diffusion features

  • The Multi-Stage Pipeline logs do not directly record the details of the diffusion algorithm. Instead, they abstract a complete diffusion process into a single Stage, indirectly reflecting the overall performance of diffusion through stage_gen_time_ms, and focus on recording IPC and scheduling characteristics across different Stages.
  • The Diffusion Pipeline logs comprehensively cover the core macro characteristics of diffusion inference, including model loading, CFG, number of inference steps, total diffusion time, average denoising step time, and other parameters.

How to view Diffusion features

1. The Multi-Stage Pipeline

Set the log switch when constructing `Omni`:
    omni_llm = Omni(
        model=model_name,
        log_stats=args.enable_stats,  # pass --enable_stats to set this to True
        log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
    )

or

    omni_llm = Omni(
        model=model_name,
        log_stats=True,
        log_file=os.path.join(log_dir, "omni_llm_pipeline.log"),
    )
Then run the script:
sh run_multiple_prompts.sh

2. The Diffusion Pipeline

Run the Diffusion Pipeline script directly to view the model's diffusion properties (image_to_image is used as an example here; usage is the same for the other models) [image_to_image][text_to_image][text_to_video]:

    python image_edit.py \
        --image input.png \
        --prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
        --output output_image_edit.png \
        --num_inference_steps 50 \
        --cfg_scale 4.0
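In addition to the terminal output, the diffusion entrypoint added in this PR appends per-run statistics as JSON lines to a file named like `omni_diffusion_<timestamp>_pid<pid>.jsonl` under the directory given by `OMNI_DIFFUSION_STATS_DIR` (default `omni_diffusion_stats`). The following is a minimal sketch of how such a file could be summarized; it assumes only that each line is a JSON object with an `event` field, so treat it as an illustration rather than part of the PR:

```python
import json
import sys
from collections import Counter

# Minimal sketch: tally the events recorded in an omni_diffusion_*.jsonl
# stats file. It only assumes that each line is a JSON object with an
# "event" field; all other keys are ignored.
def summarize(path: str) -> None:
    events: Counter[str] = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            events[record.get("event", "unknown")] += 1
    for event, count in events.most_common():
        print(f"{event}: {count}")

if __name__ == "__main__":
    summarize(sys.argv[1])  # path to a file under OMNI_DIFFUSION_STATS_DIR
```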

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 334 to +338
 try:
     stage_req_counts[stage_id] += 1
     stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-    rid_key = str(req_id)
-    pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+    rid_int = int(req_id)
+    pr = per_request.setdefault(rid_int, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})


P1: Preserve per-request stage metrics for non-numeric IDs

When a stage emits a request ID that isn’t purely numeric (e.g., the orchestrator logs 0_5369… UUID-style IDs), record_stage_metrics now forces int(req_id) and silently swallows the resulting ValueError. Because the exception short-circuits the function, the per-request stages entry is never recorded, so per-request stats and the overall_request records will be missing stage timings/transfers for any non-numeric request IDs. This regression means the new profiling output is incomplete for the typical string IDs shown in the provided logs.
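A minimal, illustrative sketch of the kind of fallback this finding describes (not the actual record_stage_metrics code; the helper name and sample IDs are made up for the example):

```python
# Illustrative sketch: keep per-request aggregation working for both numeric
# and string request IDs, instead of letting int(req_id) raise ValueError
# and silently skip the record.
def normalize_req_id(req_id: str) -> int | str:
    try:
        return int(req_id)
    except ValueError:
        return req_id

per_request: dict[int | str, dict] = {}
for req_id in ["42", "0_5369-example"]:   # numeric and UUID-style IDs (illustrative)
    key = normalize_req_id(req_id)
    per_request.setdefault(key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})

print(list(per_request))  # [42, '0_5369-example'] -- both IDs preserved
```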


@congw729 (Contributor)

The pre-commit test failed; you can run the pre-commit locally before submitting.

 import os
-from typing import NamedTuple
+from typing import NamedTuple, Optional
@DarkLight1337 (Member) commented Dec 17, 2025

Should be using `| None` instead of `Optional` since we're on Python 3.10+. Have you merged from main and run `pre-commit run --all-files`?
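For reference, a small illustration of the requested style (the function here is a hypothetical example, not code from the PR):

```python
# Python 3.10+ style requested in the review: use "| None" instead of Optional.
def load_config(path: str | None = None) -> dict[str, str]:
    # hypothetical helper, used only to illustrate the annotation style
    return {} if path is None else {"path": path}

# Older equivalent spelling that the review asks to avoid here:
# from typing import Optional
# def load_config(path: Optional[str] = None) -> dict[str, str]: ...
```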

@hsliuustc0106 (Collaborator)

Please add a doc (profiling vllm-omni) under the developer guide :)

@erfgss (Author) commented Dec 17, 2025

> Please add a doc (profiling vllm-omni) under the developer guide :)

Sure! Thanks.

# Profiling vLLM-Omni (update soon)
# Profiling vLLM-Omni

## profiling hooks for omni & vllm & diffusion pipeline
Collaborator

Add some background to tell users why we cannot directly use the vLLM profiling method and what the different scenarios are.


## 1. Usage of Log Statistics for Single-Pipeline Diffusion Scheduling
Collaborator

I think you need to reorganize the sections with names, contents, examples. Ask LLM for help


In this project, tasks such as text-to-image and text-to-video follow a single-pipeline diffusion scheduling paradigm.
Collaborator

The format is a mess; you can preview the docs locally. Check other md docs for reference.

@hsliuustc0106 (Collaborator) left a comment

I think there are a lot of places requiring improvement in this PR.

```bash
export VLLM_LOGGING_LEVEL=DEBUG
```
### 2.VLLM-omni features
Collaborator

Suggested change
### 2.VLLM-omni features
### 2.vLLM-Omni features

apply for all

### 2.VLLM-omni features
The vllm-omni feature provides multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis throughout the entire multimodal inference process. However, since the Diffusion Pipeline model does not schedule the omni feature, only the Multi-Stage Pipeline model can access the omni feature.[[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni)[[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
#### How to view VLLM-omni features
During the operation of the Multi-Stage Pipeline model, the Omni feature is automatically invoked. You can directly run the script to view the Omni feature of the model.[[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni)[[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
Collaborator

This is not a good way to tell others how to view this. You need to provide a specific example and explain what you can get from it and what the results mean to the users.



#### How to view Diffusion features
1.The Multi-Stage Pipeline
Collaborator

why multi-stage here?


Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.
In vllm-omni, there are two different scheduling paths:
• Diffusion/DiT Single diffusion Pipeline[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image)[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
Collaborator

add one blank line


Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.
In vllm-omni, there are two different scheduling paths:
• Diffusion/DiT Single diffusion Pipeline[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image)[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
Collaborator

add 1 blank line

omni_llm = Omni(
    model=model_name,
    stage_configs_path=args.stage_configs_path,
    log_stats=True,
Collaborator

default should be false

def configure_orchestrator_logger(logger: logging.Logger, log_file: str | None) -> None:
    """
    Attach a FileHandler to the given logger, and also to the
    module-level orchestrator & diffusion loggers so they share
Collaborator

Suggested change
module-level orchestrator & diffusion loggers so they share
multi-stage orchestrator & diffusion loggers so they share

_STATS_PATH = _make_run_stats_path()

_PRINT_DIFFUSION_METRICS = os.getenv("OMNI_DIFFUSION_PRINT", "1") == "1"

Collaborator

too many blanks

logger.info(f"Prepared {len(requests)} requests for generation.")
return self._run_engine(requests)

def _run_engine(self, requests: list[OmniDiffusionRequest]):
Collaborator

why delete this?

You can pass either an `OmniDiffusionConfig` via `od_config`, or
pass kwargs such as `model="Qwen/Qwen-Image"`,
which will be forwarded to `OmniDiffusionConfig.from_kwargs`.
High-level entrypoint for vLLM-Omni diffusion models.
Collaborator

Why do you need to change this?

@hsliuustc0106 (Collaborator)

@david6666666 please help @erfgss to improve this PR

@david6666666 (Collaborator)

Please fix the review comments and rebase the code.

@erfgss (Author) commented Dec 22, 2025

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +42 to +54
    stats_dir = os.getenv("OMNI_DIFFUSION_STATS_DIR", "omni_diffusion_stats")
    os.makedirs(stats_dir, exist_ok=True)

    # Local time; include milliseconds + pid to avoid collisions
    now = datetime.now()
    ts = now.strftime("%Y%m%d_%H%M%S") + f"_{now.microsecond // 1000:03d}"
    pid = os.getpid()

    filename = f"omni_diffusion_{ts}_pid{pid}.jsonl"
    return os.path.join(stats_dir, filename)


_STATS_PATH = _make_run_stats_path()


P2: Avoid failing import on unwritable stats directory

This module now creates the stats directory at import time via _STATS_PATH = _make_run_stats_path(), which calls os.makedirs(...) without any error handling. In environments where the current working directory (or OMNI_DIFFUSION_STATS_DIR) is read-only, simply importing vllm_omni.entrypoints.omni_diffusion will raise and prevent the diffusion entrypoint from working at all. Profiling/logging should be best-effort; consider deferring path creation to runtime and handling filesystem errors so non-writable locations don’t break imports.
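A rough sketch of the best-effort approach this finding suggests, i.e. lazy path creation plus tolerating filesystem errors; the function name and structure here are illustrative, not the PR's code:

```python
import os
from datetime import datetime

# Illustrative sketch: create the stats path lazily and tolerate read-only
# locations so that importing the module never fails because of profiling output.
_stats_path: str | None = None

def get_stats_path() -> str | None:
    global _stats_path
    if _stats_path is not None:
        return _stats_path
    stats_dir = os.getenv("OMNI_DIFFUSION_STATS_DIR", "omni_diffusion_stats")
    try:
        os.makedirs(stats_dir, exist_ok=True)
    except OSError:
        return None  # profiling is best-effort; skip writing stats
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    _stats_path = os.path.join(stats_dir, f"omni_diffusion_{ts}_pid{os.getpid()}.jsonl")
    return _stats_path
```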


Comment on lines 334 to 339
 try:
     stage_req_counts[stage_id] += 1
     stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-    rid_key = str(req_id)
-    pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+    rid_int = int(req_id)
+    pr = per_request.setdefault(rid_int, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
     pr_stages = pr["stages"]  # type: ignore[index]


P2: Preserve non-numeric request IDs in per-request metrics

record_stage_metrics now unconditionally casts req_id to int, and any failure is swallowed by the broad try/except, which skips per-request aggregation entirely. request_id is typed as str in OmniEngineCoreOutput, so non-numeric IDs will silently drop per-request metrics and transfer stats. This is a regression from the previous str(req_id) behavior; use a safe fallback (like the pattern in aggregate_rx_and_maybe_total) to avoid losing metrics for string IDs.


         model=model_name,
         log_stats=args.enable_stats,
-        log_file=("omni_llm_pipeline.log" if args.enable_stats else None),
+        log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
Collaborator

Why use os.path.join(log_dir, "omni_llm_pipeline.log")? Are there any alternative methods?

     stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-    rid_key = str(req_id)
-    pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+    rid_int = int(req_id)
Collaborator

Why change this to int?

    pid = os.getpid()

    filename = f"omni_diffusion_{ts}_pid{pid}.jsonl"
    return os.path.join(stats_dir, filename)
Collaborator

ditto




def _record(event: str, **kv: Any) -> None:
Collaborator

Do we need to use this func? What is it for?
