
Conversation

@erfgss commented Dec 17, 2025

Profiling vLLM-Omni

This guide provides detailed instructions on how to use the logging system in vLLM-Omni.

In vLLM-Omni, there are two different scheduling paths:

  • the Diffusion/DiT Single Diffusion Pipeline (e.g. text_to_image, image_to_image, text_to_video), and
  • the Multi-Stage Pipeline (e.g. qwen2_5_omni, qwen3_omni).

The logging content and usage of the logger system under these two scheduling paths are described below.

Recorded Content and Usage Instructions

1. vLLM features

vLLM logs under the root module `vllm`, and its submodules automatically inherit the parent logger. The `vllm_omni` module, however, does not automatically inherit from `vllm`, so we need to initialize a `vllm_omni` root logger that inherits the parent logger's configuration. The vLLM config log covers communication methods, scheduling modes, parallelism, and runtime scale, as well as shared-memory pressure status, model size, and observed GPU memory usage during runtime. The vLLM config content recorded by the Single Diffusion Pipeline model and the Multi-Stage Pipeline model is the same.
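The snippet below is a minimal, illustrative sketch of this idea using Python's standard logging module; it is not the actual vllm_omni initialization code, and sharing the parent's handlers and level is an assumption made only for the example:

```python
import logging

# Minimal sketch (not the actual vllm-omni code): make the "vllm_omni" root
# logger reuse the handlers and level configured for the "vllm" root logger,
# since "vllm_omni" is not a child of "vllm" and does not inherit from it.
vllm_logger = logging.getLogger("vllm")        # configured by vLLM at startup

omni_logger = logging.getLogger("vllm_omni")   # separate logger hierarchy
omni_logger.setLevel(vllm_logger.getEffectiveLevel())
for handler in vllm_logger.handlers:           # share handlers/formatters
    omni_logger.addHandler(handler)

omni_logger.info("vllm_omni root logger initialized")
```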

How to view vLLM features

Before running the example scripts, set the following environment variable so that the vLLM config appears in the logs printed to the terminal:

export VLLM_LOGGING_LEVEL=DEBUG

2. vLLM-Omni features

The vLLM-Omni logger provides multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis across the entire multimodal inference process. However, because the Single Diffusion Pipeline is not scheduled through the omni path, only Multi-Stage Pipeline models can access the omni metrics.[qwen2_5_omni][qwen3_omni]

How to view vLLM-Omni features

The omni metrics are collected automatically while a Multi-Stage Pipeline model is running, so you can simply run the example script to view them; the recorded statistics include, for example, per-stage request and output-token counts and per-request transfer times and sizes.[qwen2_5_omni][qwen3_omni]

sh run_multiple_prompts.sh

3. Diffusion features

  • The Multi-Stage Pipeline logs do not directly record the details of the diffusion algorithm. Instead, they abstract a complete diffusion process into a single Stage, indirectly reflecting the overall performance of diffusion through stage_gen_time_ms, and focus on recording IPC and scheduling characteristics across different Stages.
  • The Diffusion Pipeline logs comprehensively cover the core macro characteristics of diffusion inference, including model loading, CFG, number of inference steps, total diffusion time, average denoising step time, and other parameters.

How to view Diffusion features

1. The Multi-Stage Pipeline

Set the log switch when constructing `Omni`:
    omni_llm = Omni(
        model=model_name,
        log_stats=args.enable_stats,  # pass --enable_stats to set this to True
        log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
    )

or

    omni_llm = Omni(
        model=model_name,
        log_stats=True,
        log_file=os.path.join(log_dir, "omni_llm_pipeline.log"),
    )
Then run the script:
sh run_multiple_prompts.sh

2. The Diffusion Pipeline

Run the Diffusion Pipeline script directly to view the model's diffusion properties (image_to_image is used as an example here; usage is the same for the other models) [image_to_image][text_to_image][text_to_video]:

    python image_edit.py \
        --image input.png \
        --prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
        --output output_image_edit.png \
        --num_inference_steps 50 \
        --cfg_scale 4.0
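In addition to the terminal output, the diffusion entrypoint added in this PR appends per-run statistics as JSON lines to a file named like `omni_diffusion_<timestamp>_pid<pid>.jsonl` under the directory given by `OMNI_DIFFUSION_STATS_DIR` (default `omni_diffusion_stats`). The following is a minimal sketch of how such a file could be summarized; it assumes only that each line is a JSON object with an `event` field, so treat it as an illustration rather than part of the PR:

```python
import json
import sys
from collections import Counter

# Minimal sketch: tally the events recorded in an omni_diffusion_*.jsonl
# stats file. It only assumes that each line is a JSON object with an
# "event" field; all other keys are ignored.
def summarize(path: str) -> None:
    events: Counter[str] = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            events[record.get("event", "unknown")] += 1
    for event, count in events.most_common():
        print(f"{event}: {count}")

if __name__ == "__main__":
    summarize(sys.argv[1])  # path to a file under OMNI_DIFFUSION_STATS_DIR
```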

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 334 to +338
 try:
     stage_req_counts[stage_id] += 1
     stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-    rid_key = str(req_id)
-    pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+    rid_int = int(req_id)
+    pr = per_request.setdefault(rid_int, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})


P1: Preserve per-request stage metrics for non-numeric IDs

When a stage emits a request ID that isn’t purely numeric (e.g., the orchestrator logs 0_5369… UUID-style IDs), record_stage_metrics now forces int(req_id) and silently swallows the resulting ValueError. Because the exception short-circuits the function, the per-request stages entry is never recorded, so per-request stats and the overall_request records will be missing stage timings/transfers for any non-numeric request IDs. This regression means the new profiling output is incomplete for the typical string IDs shown in the provided logs.
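A minimal, illustrative sketch of the kind of fallback this finding describes (not the actual record_stage_metrics code; the helper name and sample IDs are made up for the example):

```python
# Illustrative sketch: keep per-request aggregation working for both numeric
# and string request IDs, instead of letting int(req_id) raise ValueError
# and silently skip the record.
def normalize_req_id(req_id: str) -> int | str:
    try:
        return int(req_id)
    except ValueError:
        return req_id

per_request: dict[int | str, dict] = {}
for req_id in ["42", "0_5369-example"]:   # numeric and UUID-style IDs (illustrative)
    key = normalize_req_id(req_id)
    per_request.setdefault(key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})

print(list(per_request))  # [42, '0_5369-example'] -- both IDs preserved
```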


@congw729 (Contributor)

The pre-commit test failed; you can run the pre-commit locally before submitting.

 import os
-from typing import NamedTuple
+from typing import NamedTuple, Optional
@DarkLight1337 (Member) commented Dec 17, 2025

Should be using `| None` instead of `Optional` since we're on Python 3.10+. Have you merged from main and run `pre-commit run --all-files`?
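For reference, a small illustration of the requested style (the function here is a hypothetical example, not code from the PR):

```python
# Python 3.10+ style requested in the review: use "| None" instead of Optional.
def load_config(path: str | None = None) -> dict[str, str]:
    # hypothetical helper, used only to illustrate the annotation style
    return {} if path is None else {"path": path}

# Older equivalent spelling that the review asks to avoid here:
# from typing import Optional
# def load_config(path: Optional[str] = None) -> dict[str, str]: ...
```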

@hsliuustc0106 (Collaborator)

Please add a doc (profiling vllm-omni) under the developer guide :)

@erfgss (Author) commented Dec 17, 2025

> Please add a doc (profiling vllm-omni) under the developer guide :)

Sure! Thanks.

# Profiling vLLM-Omni (update soon)
# Profiling vLLM-Omni

## profiling hooks for omni & vllm & diffusion pipeline
Collaborator

Add some background to tell users why we cannot directly use the vLLM profiling method and what the different scenarios are.


## 1. Usage of Log Statistics for Single-Pipeline Diffusion Scheduling
Collaborator

I think you need to reorganize the sections with names, contents, examples. Ask LLM for help


In this project, tasks such as text-to-image and text-to-video follow a single-pipeline diffusion scheduling paradigm.
Collaborator

The format is a mess; you can preview the docs locally. Check other md docs for reference.

@hsliuustc0106 (Collaborator) left a comment

I think there are a lot of places requiring improvement in this PR.

```bash
export VLLM_LOGGING_LEVEL=DEBUG
```
### 2.VLLM-omni features
Collaborator

Suggested change
### 2.VLLM-omni features
### 2.vLLM-Omni features

apply for all

### 2.VLLM-omni features
The vllm-omni feature provides multi-dimensional metrics such as end-to-end performance, IPC communication, pipeline scheduling, and engine passthrough, enabling full observability and detailed performance analysis throughout the entire multimodal inference process. However, since the Diffusion Pipeline model does not schedule the omni feature, only the Multi-Stage Pipeline model can access the omni feature.[[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni)[[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
#### How to view VLLM-omni features
During the operation of the Multi-Stage Pipeline model, the Omni feature is automatically invoked. You can directly run the script to view the Omni feature of the model.[[qwen2_5_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen2_5_omni)[[qwen3_omni]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/qwen3_omni)
Collaborator

This is not a good way to tell others how to view this. You need to provide a specific example and explain what you can get from it and what the results mean to the users.



#### How to view Diffusion features
1.The Multi-Stage Pipeline
Collaborator

why multi-stage here?


Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.
In vllm-omni, there are two different scheduling paths:
• Diffusion/DiT Single diffusion Pipeline[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image)[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
Collaborator

add one blank line


Profiling is only intended for vLLM-Omni developers and maintainers to understand the proportion of time spent in different parts of the codebase. **vLLM-Omni end-users should never turn on profiling** as it will significantly slow down the inference.
In vllm-omni, there are two different scheduling paths:
• Diffusion/DiT Single diffusion Pipeline[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_image)[[image_to_image]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image)[[text_to_video]](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/text_to_video)
Collaborator

add 1 blank line

omni_llm = Omni(
    model=model_name,
    stage_configs_path=args.stage_configs_path,
    log_stats=True,
Collaborator

default should be false

def configure_orchestrator_logger(logger: logging.Logger, log_file: str | None) -> None:
    """
    Attach a FileHandler to the given logger, and also to the
    module-level orchestrator & diffusion loggers so they share
Collaborator

Suggested change
module-level orchestrator & diffusion loggers so they share
multi-stage orchestrator & diffusion loggers so they share

_STATS_PATH = _make_run_stats_path()

_PRINT_DIFFUSION_METRICS = os.getenv("OMNI_DIFFUSION_PRINT", "1") == "1"

Collaborator

too many blanks

logger.info(f"Prepared {len(requests)} requests for generation.")
return self._run_engine(requests)

def _run_engine(self, requests: list[OmniDiffusionRequest]):
Collaborator

why delete this?

You can pass either an `OmniDiffusionConfig` via `od_config`, or
pass kwargs such as `model="Qwen/Qwen-Image"`,
which will be forwarded to `OmniDiffusionConfig.from_kwargs`.
High-level entrypoint for vLLM-Omni diffusion models.
Collaborator

Why do you need to change this?

@hsliuustc0106 (Collaborator)

@david6666666 please help @erfgss to improve this PR

@david6666666 (Collaborator)

Please fix the review comments and rebase the code.

@erfgss (Author) commented Dec 22, 2025

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +42 to +54
    stats_dir = os.getenv("OMNI_DIFFUSION_STATS_DIR", "omni_diffusion_stats")
    os.makedirs(stats_dir, exist_ok=True)

    # Local time; include milliseconds + pid to avoid collisions
    now = datetime.now()
    ts = now.strftime("%Y%m%d_%H%M%S") + f"_{now.microsecond // 1000:03d}"
    pid = os.getpid()

    filename = f"omni_diffusion_{ts}_pid{pid}.jsonl"
    return os.path.join(stats_dir, filename)


_STATS_PATH = _make_run_stats_path()


P2: Avoid failing import on unwritable stats directory

This module now creates the stats directory at import time via _STATS_PATH = _make_run_stats_path(), which calls os.makedirs(...) without any error handling. In environments where the current working directory (or OMNI_DIFFUSION_STATS_DIR) is read-only, simply importing vllm_omni.entrypoints.omni_diffusion will raise and prevent the diffusion entrypoint from working at all. Profiling/logging should be best-effort; consider deferring path creation to runtime and handling filesystem errors so non-writable locations don’t break imports.
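A rough sketch of the best-effort approach this finding suggests, i.e. lazy path creation plus tolerating filesystem errors; the function name and structure here are illustrative, not the PR's code:

```python
import os
from datetime import datetime

# Illustrative sketch: create the stats path lazily and tolerate read-only
# locations so that importing the module never fails because of profiling output.
_stats_path: str | None = None

def get_stats_path() -> str | None:
    global _stats_path
    if _stats_path is not None:
        return _stats_path
    stats_dir = os.getenv("OMNI_DIFFUSION_STATS_DIR", "omni_diffusion_stats")
    try:
        os.makedirs(stats_dir, exist_ok=True)
    except OSError:
        return None  # profiling is best-effort; skip writing stats
    ts = datetime.now().strftime("%Y%m%d_%H%M%S")
    _stats_path = os.path.join(stats_dir, f"omni_diffusion_{ts}_pid{os.getpid()}.jsonl")
    return _stats_path
```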


Comment on lines 334 to 339
 try:
     stage_req_counts[stage_id] += 1
     stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-    rid_key = str(req_id)
-    pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+    rid_int = int(req_id)
+    pr = per_request.setdefault(rid_int, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
     pr_stages = pr["stages"]  # type: ignore[index]


P2: Preserve non-numeric request IDs in per-request metrics

record_stage_metrics now unconditionally casts req_id to int, and any failure is swallowed by the broad try/except, which skips per-request aggregation entirely. request_id is typed as str in OmniEngineCoreOutput, so non-numeric IDs will silently drop per-request metrics and transfer stats. This is a regression from the previous str(req_id) behavior; use a safe fallback (like the pattern in aggregate_rx_and_maybe_total) to avoid losing metrics for string IDs.


         model=model_name,
         log_stats=args.enable_stats,
-        log_file=("omni_llm_pipeline.log" if args.enable_stats else None),
+        log_file=(os.path.join(log_dir, "omni_llm_pipeline.log") if args.enable_stats else None),
Collaborator

Why use os.path.join(log_dir, "omni_llm_pipeline.log")? Are there any alternative methods?

     stage_total_tokens[stage_id] += int(metrics.get("num_tokens_out", 0))
-    rid_key = str(req_id)
-    pr = per_request.setdefault(rid_key, {"stages": {}, "transfers_ms": 0.0, "transfers_bytes": 0})
+    rid_int = int(req_id)
Collaborator

Why change this to int?

    pid = os.getpid()

    filename = f"omni_diffusion_{ts}_pid{pid}.jsonl"
    return os.path.join(stats_dir, filename)
Collaborator

ditto




def _record(event: str, **kv: Any) -> None:
Collaborator

Do we need to use this func? What is it for?
