Fix async_scheduling + batched prefill #741
Conversation
Signed-off-by: Tianmu Li <[email protected]>
Pull request overview
This PR fixes issues with async scheduling when combined with batched prefill by properly handling incomplete prompts and refactoring state management for structured output.
Key Changes:
- Refactored `structured_output` from a local variable to an instance variable `use_structured_output` for better state management
- Added logic to ensure at least 1 logit is generated for incomplete prompts when using async scheduling or structured output (see the sketch below)
- Moved `invalid_req_indices` initialization and tracking to instance-level scope for proper lifecycle management
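A minimal sketch of the at-least-one-logit rule, assuming illustrative names: `num_logits_for_request` and its parameters are not from this PR, only the arithmetic mirrors the diff quoted further down.

```python
# Illustrative sketch, not the PR's actual code: clamp the logit count so an
# incomplete (chunked-prefill) prompt still yields at least one logit under
# async scheduling or structured output.
def num_logits_for_request(
    num_computed_tokens: int,
    num_scheduled_tokens: int,
    num_prompt_tokens: int,
    use_async_scheduling: bool,
    use_structured_output: bool,
) -> int:
    # Positions produced past the last prompt token this step;
    # <= 0 while the prompt is still mid-prefill.
    num_logits = num_computed_tokens + num_scheduled_tokens - num_prompt_tokens + 1
    if num_logits < 1 and (use_async_scheduling or use_structured_output):
        # Align with gpu_model_runner: emit one logit anyway so the request
        # can be tracked (e.g. via invalid_req_indices) and the placeholder
        # token discarded later.
        return 1
    return max(num_logits, 0)
```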
```python
self.use_async_scheduling = self.scheduler_config.async_scheduling
self.use_structured_output: bool = False # Default to false. Set to true when needed during a run
```
Copilot AI commented on Dec 18, 2025:
Capitalized 'false' and 'true' to match Python boolean literals 'False' and 'True'.
```diff
- self.use_structured_output: bool = False # Default to false. Set to true when needed during a run
+ self.use_structured_output: bool = False # Default to False. Set to True when needed during a run
```
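For context on why the flag lives on the instance: a decision made while preparing inputs must stay visible to later phases of the same run. A hedged sketch follows; everything except `use_structured_output` and `use_async_scheduling` is an invented name, not vLLM's API.

```python
from dataclasses import dataclass

@dataclass
class SchedulerConfigStub:
    # Stand-in for the real scheduler config; only the field we need.
    async_scheduling: bool = False

class RunnerSketch:
    def __init__(self, scheduler_config: SchedulerConfigStub) -> None:
        self.use_async_scheduling = scheduler_config.async_scheduling
        self.use_structured_output: bool = False  # Set per run when needed

    def prepare_inputs(self, has_grammar_requests: bool) -> None:
        # Recorded on the instance so sampling/bookkeeping later in the run
        # can read it, instead of threading a local through every call.
        self.use_structured_output = has_grammar_requests
```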
```python
# NOTE(tianmu-li): align behavior of incomplete prompt with gpu_model_runner
# Always have at least 1 logit when using async scheduling
# or structured output
if seq_num_computed_tokens + seq_num_scheduled_tokens - num_prompt_tokens + 1 < 1:
```
Copilot AI commented on Dec 18, 2025:
The condition `seq_num_computed_tokens + seq_num_scheduled_tokens - num_prompt_tokens + 1 < 1` is complex and unclear. Consider simplifying to `seq_num_computed_tokens + seq_num_scheduled_tokens < num_prompt_tokens` for better readability.
```diff
- if seq_num_computed_tokens + seq_num_scheduled_tokens - num_prompt_tokens + 1 < 1:
+ if seq_num_computed_tokens + seq_num_scheduled_tokens < num_prompt_tokens:
```
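The two conditions are indeed equivalent: subtracting 1 from both sides of `x + 1 < 1` leaves `x < 0`, i.e. `computed + scheduled < prompt`. A throwaway brute-force check confirms it:

```python
# Sanity check that the original and suggested conditions always agree.
for computed in range(5):
    for scheduled in range(5):
        for prompt in range(5):
            original = computed + scheduled - prompt + 1 < 1
            simplified = computed + scheduled < prompt
            assert original == simplified
```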
✅ CI Passed. All checks passed successfully against the following vllm commit:
Cherry-pick of #740