-
Notifications
You must be signed in to change notification settings - Fork 207
[Feature] Control the stage init timeout threshold by --stage-init-timeout #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ion 0.12.0 - Updated installation commands and version references in CUDA, ROCm, and NPU documentation to reflect the new vLLM v0.12.0 release. - Adjusted Docker image tags and version checks accordingly.
…points - Replaced instances of `AnyTokenizer` with `TokenizerLike` in `output_processor.py`, `async_omni.py`, and `omni_stage.py`. - Updated the tokenizer initialization function to `init_tokenizer_from_config` in `async_omni.py` for consistency.
…ingle stage init timeout Signed-off-by: Taichang Zhou <[email protected]>
Signed-off-by: Taichang Zhou <[email protected]>
Signed-off-by: Taichang Zhou <[email protected]>
Signed-off-by: Taichang Zhou <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the stage initialization timeout mechanism by renaming init_sleep_seconds to stage_init_timeout and changing its behavior from a sleep duration between stage starts to a timeout threshold for device lock acquisition during stage initialization. The default value is updated from 20/30 seconds to 300 seconds (5 minutes).
Key changes:
- Renamed parameter from
init_sleep_seconds/init-sleep-secondstostage_init_timeout/stage-init-timeoutacross all interfaces - Removed hardcoded 300-second timeout in favor of configurable parameter passed to worker functions
- Removed sleep between stage starts in orchestration logic
- Added informational logging for successful stage initialization with timing
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| vllm_omni/entrypoints/omni_stage.py | Added stage_init_timeout parameter to __init__, removed hardcoded max_wait_time, propagated timeout to worker functions, added initialization success logging |
| vllm_omni/entrypoints/omni_llm.py | Renamed parameter from init_sleep_seconds to stage_init_timeout, updated default from 20 to 300, removed sleep after stage start, updated error messages |
| vllm_omni/entrypoints/cli/serve.py | Renamed CLI argument from --init-sleep-seconds to --stage-init-timeout, updated default from 30 to 300, improved help text |
| vllm_omni/entrypoints/async_omni.py | Renamed parameter throughout async implementation, updated error messages, removed sleep after stage start |
| tests/e2e/online_serving/test_qwen3_omni.py | Updated test fixture to use new --stage-init-timeout argument |
| tests/e2e/offline_inference/test_qwen3_omni.py | Updated test to use new stage_init_timeout parameter with value 300 |
| tests/e2e/offline_inference/conftest.py | Renamed parameter in OmniRunner, updated default to 300, improved docstring |
| examples/offline_inference/qwen3_omni/run_single_prompt_tp.sh | Updated script to use new --stage-init-timeout argument |
| examples/offline_inference/qwen3_omni/end2end.py | Renamed argument from --init-sleep-seconds to --stage-init-timeout, updated default to 300, passed to Omni constructor |
| examples/offline_inference/qwen2_5_omni/end2end.py | Renamed argument from --init-sleep-seconds to --stage-init-timeout, updated default to 300, passed to OmniLLM constructor |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
vllm_omni/entrypoints/omni_llm.py
Outdated
| log_stats: bool = False, | ||
| log_file: str | None = None, | ||
| init_sleep_seconds: int = 20, | ||
| stage_init_timeout: int = 20, |
Copilot
AI
Dec 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an inconsistency in the default value for 'stage_init_timeout'. The OmniLLM class uses a default of 20 seconds, while the CLI serve.py uses a default of 300 seconds. Given that this parameter now represents a timeout threshold (not a sleep duration), and considering the change mentioned in the PR description, 300 seconds appears to be the intended default. The default in OmniLLM should be updated to match.
vllm_omni/entrypoints/omni_stage.py
Outdated
| try: | ||
| _os.close(lock_fd) | ||
| _logging.getLogger(__name__).debug("[Stage-%s] Released initialization lock (fd=%s)", stage_id, lock_fd) | ||
| _logging.getLogger(__name__).info( |
Copilot
AI
Dec 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable 'wait_start' is only defined inside the 'if device_type == "cuda"' block (line 556), but it's being referenced in the log message on line 630, which is outside that block. This will cause a NameError when running on non-CUDA devices. The variable should be initialized before the device_type check, or the log message should be moved inside the CUDA block.
Co-authored-by: Copilot <[email protected]> Signed-off-by: Zhou Taichang <[email protected]>
Co-authored-by: Copilot <[email protected]> Signed-off-by: Zhou Taichang <[email protected]>
Co-authored-by: Copilot <[email protected]> Signed-off-by: Zhou Taichang <[email protected]>
Signed-off-by: Taichang Zhou <[email protected]>
…/vllm-omni into dev/control-init-timeout
Signed-off-by: Taichang Zhou <[email protected]>
…as a new input parameter Signed-off-by: Taichang Zhou <[email protected]>
|
please also add your test result for Qwen3-Omni |
| python end2end.py --output-wav output_audio \ | ||
| --query-type use_audio \ | ||
| --init-sleep-seconds 90 | ||
| --stage-init-timeout 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we choose 90 seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not the default 300?
Signed-off-by: tzhouam <[email protected]>
…e default value 300s. Signed-off-by: tzhouam <[email protected]>
Tested on Qwen3 Omni and also works. |
| vLLM-Omni is built based on vLLM. Please install it with command below. | ||
| ```bash | ||
| uv pip install vllm==0.11.0 --torch-backend=auto | ||
| uv pip install vllm==0.12.0 --torch-backend=auto |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need to change this? this will bring troubles since we have not release v0.12.0rc
| -p 8091:8091 \ | ||
| --ipc=host \ | ||
| vllm/vllm-omni:v0.11.0rc1 \ | ||
| vllm/vllm-omni:v0.12.0rc1 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need to change this? this will bring troubles since we have not release v0.12.0rc
Signed-off-by: Taichang Zhou <[email protected]>
Purpose
This PR aims to control the timeout threshold in the stage init as in Issue #386, which is defined in PR #328 by --stage-init-timeout, and is renamed from --init-sleep-time.
Test Plan
Tested for Qwen2.5/3 Omni 3B for both online and offline
Test Result
Successfully controlled the stage init timeout for both online and offline.
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)