Conversation

@tianmu-li (Contributor)

Cherry-pick of #740

Copilot AI review requested due to automatic review settings December 18, 2025 23:04

Copilot AI left a comment

Pull request overview

This PR fixes issues with async scheduling when combined with batched prefill by properly handling incomplete prompts and refactoring state management for structured output.

Key Changes:

  • Refactored structured_output from a local variable to an instance variable use_structured_output for better state management
  • Added logic to ensure at least 1 logit is generated for incomplete prompts when using async scheduling or structured output
  • Moved invalid_req_indices initialization and tracking to instance-level scope for proper lifecycle management (a sketch combining these changes follows this list)
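
A minimal sketch combining the three changes above, assuming a simplified runner class. The class and method names here are hypothetical; only the attribute names (use_async_scheduling, use_structured_output, invalid_req_indices) and the token counters follow the PR.

class RunnerSketch:
    def __init__(self, async_scheduling: bool) -> None:
        self.use_async_scheduling = async_scheduling
        # Default to False. Set to True when needed during a run.
        self.use_structured_output: bool = False
        # Instance-level so the indices persist across the helpers
        # involved in a single scheduling step.
        self.invalid_req_indices: list[int] = []

    def num_logits_for_seq(
        self,
        seq_num_computed_tokens: int,
        seq_num_scheduled_tokens: int,
        num_prompt_tokens: int,
    ) -> int:
        # Logits this step would normally produce: scheduled tokens that
        # extend past the prompt, plus one for the next-token position.
        num_logits = (
            seq_num_computed_tokens
            + seq_num_scheduled_tokens
            - num_prompt_tokens
            + 1
        )
        if num_logits < 1 and (
            self.use_async_scheduling or self.use_structured_output
        ):
            # Incomplete prompt: keep at least one logit so async
            # scheduling and structured output always have an output row.
            num_logits = 1
        return num_logits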

self.use_async_scheduling = self.scheduler_config.async_scheduling
self.use_structured_output: bool = False # Default to false. Set to true when needed during a run

Copilot AI Dec 18, 2025

Capitalized 'false' and 'true' to match Python boolean literals 'False' and 'True'.

Suggested change
self.use_structured_output: bool = False # Default to false. Set to true when needed during a run
self.use_structured_output: bool = False # Default to False. Set to True when needed during a run

# NOTE(tianmu-li): align behavior of incomplete prompt with gpu_model_runner
# Always have at least 1 logit when using async scheduling
# or structured output
if seq_num_computed_tokens + seq_num_scheduled_tokens - num_prompt_tokens + 1 < 1:

Copilot AI Dec 18, 2025

The condition seq_num_computed_tokens + seq_num_scheduled_tokens - num_prompt_tokens + 1 < 1 is complex and unclear. Consider simplifying to seq_num_computed_tokens + seq_num_scheduled_tokens < num_prompt_tokens for better readability.

Suggested change
if seq_num_computed_tokens + seq_num_scheduled_tokens - num_prompt_tokens + 1 < 1:
if seq_num_computed_tokens + seq_num_scheduled_tokens < num_prompt_tokens:
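
The two forms are equivalent: subtract 1 from both sides of the original inequality, then move num_prompt_tokens across it. A throwaway brute-force check (not part of the PR) over small integer ranges:

for computed in range(0, 6):
    for scheduled in range(0, 6):
        for prompt in range(0, 12):
            original = computed + scheduled - prompt + 1 < 1
            simplified = computed + scheduled < prompt
            assert original == simplified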

@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
4fd9d6a85c00ac0186aa9abbeff73fc2ac6c721e

@PatrykWo deleted the branch vllm-project:releases/v0.12.0 January 12, 2026 12:10
@PatrykWo closed this Jan 12, 2026