Skip to content

Conversation

@tzhouam
Copy link
Collaborator

@tzhouam tzhouam commented Dec 20, 2025

Purpose

This PR aims to control the timeout threshold in the stage init as in Issue #386, which is defined in PR #328 by --stage-init-timeout, and is renamed from --init-sleep-time.

Test Plan

Tested for Qwen2.5/3 Omni 3B for both online and offline

Test Result

Successfully controlled the stage init timeout for both online and offline.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

…ion 0.12.0

- Updated installation commands and version references in CUDA, ROCm, and NPU documentation to reflect the new vLLM v0.12.0 release.
- Adjusted Docker image tags and version checks accordingly.
…points

- Replaced instances of `AnyTokenizer` with `TokenizerLike` in `output_processor.py`, `async_omni.py`, and `omni_stage.py`.
- Updated the tokenizer initialization function to `init_tokenizer_from_config` in `async_omni.py` for consistency.
…ingle stage init timeout

Signed-off-by: Taichang Zhou <[email protected]>
Signed-off-by: Taichang Zhou <[email protected]>
Signed-off-by: Taichang Zhou <[email protected]>
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the stage initialization timeout mechanism by renaming init_sleep_seconds to stage_init_timeout and changing its behavior from a sleep duration between stage starts to a timeout threshold for device lock acquisition during stage initialization. The default value is updated from 20/30 seconds to 300 seconds (5 minutes).

Key changes:

  • Renamed parameter from init_sleep_seconds/init-sleep-seconds to stage_init_timeout/stage-init-timeout across all interfaces
  • Removed hardcoded 300-second timeout in favor of configurable parameter passed to worker functions
  • Removed sleep between stage starts in orchestration logic
  • Added informational logging for successful stage initialization with timing

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
vllm_omni/entrypoints/omni_stage.py Added stage_init_timeout parameter to __init__, removed hardcoded max_wait_time, propagated timeout to worker functions, added initialization success logging
vllm_omni/entrypoints/omni_llm.py Renamed parameter from init_sleep_seconds to stage_init_timeout, updated default from 20 to 300, removed sleep after stage start, updated error messages
vllm_omni/entrypoints/cli/serve.py Renamed CLI argument from --init-sleep-seconds to --stage-init-timeout, updated default from 30 to 300, improved help text
vllm_omni/entrypoints/async_omni.py Renamed parameter throughout async implementation, updated error messages, removed sleep after stage start
tests/e2e/online_serving/test_qwen3_omni.py Updated test fixture to use new --stage-init-timeout argument
tests/e2e/offline_inference/test_qwen3_omni.py Updated test to use new stage_init_timeout parameter with value 300
tests/e2e/offline_inference/conftest.py Renamed parameter in OmniRunner, updated default to 300, improved docstring
examples/offline_inference/qwen3_omni/run_single_prompt_tp.sh Updated script to use new --stage-init-timeout argument
examples/offline_inference/qwen3_omni/end2end.py Renamed argument from --init-sleep-seconds to --stage-init-timeout, updated default to 300, passed to Omni constructor
examples/offline_inference/qwen2_5_omni/end2end.py Renamed argument from --init-sleep-seconds to --stage-init-timeout, updated default to 300, passed to OmniLLM constructor

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

log_stats: bool = False,
log_file: str | None = None,
init_sleep_seconds: int = 20,
stage_init_timeout: int = 20,
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an inconsistency in the default value for 'stage_init_timeout'. The OmniLLM class uses a default of 20 seconds, while the CLI serve.py uses a default of 300 seconds. Given that this parameter now represents a timeout threshold (not a sleep duration), and considering the change mentioned in the PR description, 300 seconds appears to be the intended default. The default in OmniLLM should be updated to match.

Copilot uses AI. Check for mistakes.
try:
_os.close(lock_fd)
_logging.getLogger(__name__).debug("[Stage-%s] Released initialization lock (fd=%s)", stage_id, lock_fd)
_logging.getLogger(__name__).info(
Copy link

Copilot AI Dec 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable 'wait_start' is only defined inside the 'if device_type == "cuda"' block (line 556), but it's being referenced in the log message on line 630, which is outside that block. This will cause a NameError when running on non-CUDA devices. The variable should be initialized before the device_type check, or the log message should be moved inside the CUDA block.

Copilot uses AI. Check for mistakes.
@tzhouam tzhouam added the ready label to trigger buildkite CI label Dec 20, 2025
…as a new input parameter

Signed-off-by: Taichang Zhou <[email protected]>
@hsliuustc0106
Copy link
Collaborator

please also add your test result for Qwen3-Omni

python end2end.py --output-wav output_audio \
--query-type use_audio \
--init-sleep-seconds 90
--stage-init-timeout 90
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we choose 90 seconds

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not the default 300?

@tzhouam
Copy link
Collaborator Author

tzhouam commented Dec 23, 2025

please also add your test result for Qwen3-Omni

Tested on Qwen3 Omni and also works.

vLLM-Omni is built based on vLLM. Please install it with command below.
```bash
uv pip install vllm==0.11.0 --torch-backend=auto
uv pip install vllm==0.12.0 --torch-backend=auto
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need to change this? this will bring troubles since we have not release v0.12.0rc

-p 8091:8091 \
--ipc=host \
vllm/vllm-omni:v0.11.0rc1 \
vllm/vllm-omni:v0.12.0rc1 \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need to change this? this will bring troubles since we have not release v0.12.0rc

Signed-off-by: Taichang Zhou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants