
Conversation

@wonjerry
Contributor

Purpose

Fix streaming chat completion error when using stream: true.

In vLLM 0.12.0, the parent class method OpenAIServingChat.chat_completion_stream_generator() no longer accepts enable_force_include_usage as a parameter; it reads the value from self.enable_force_include_usage internally.

Reference: vLLM v0.12.0 serving_chat.py

Fixes error:
OpenAIServingChat.chat_completion_stream_generator() got an unexpected keyword argument 'enable_force_include_usage'
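
For context, a minimal sketch of the change, assuming the vllm-omni override simply forwards the streaming call to the parent (the subclass and method names below are hypothetical, not the actual vllm-omni code; only the dropped keyword reflects the fix):

    # Hypothetical sketch; OmniServingChat and _stream are assumed names.
    # The point is dropping the keyword the vLLM 0.12.0 parent no longer takes.
    from vllm.entrypoints.openai.serving_chat import OpenAIServingChat

    class OmniServingChat(OpenAIServingChat):
        async def _stream(self, request, result_generator, request_id, **kwargs):
            # Before (raises TypeError on vLLM 0.12.0):
            #   return self.chat_completion_stream_generator(
            #       request, result_generator, request_id, **kwargs,
            #       enable_force_include_usage=self.enable_force_include_usage)
            #
            # After: drop the keyword; the parent accesses
            # self.enable_force_include_usage internally.
            return self.chat_completion_stream_generator(
                request, result_generator, request_id, **kwargs)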

Test Plan

  curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "Qwen3-Omni-30B-A3B-Instruct", "messages": [{"role": "user", "content": "Hello"}], "modalities": ["text"], "stream": true}'

Test Result

Before: 500 Internal Server Error (unexpected keyword argument)

{"error":{"message":"OpenAIServingChat.chat_completion_stream_generator() got an unexpected keyword argument 'enable_force_include_usage'","type":"Internal Server Error","param":null,"code":500}}

After: the unexpected keyword argument error is resolved, but a different error is returned:

 {"error": {"message": "'OmniRequestOutput' object has no attribute 'prompt_token_ids'", "type": "BadRequestError", "param": null, "code": 400}}


…generator

The parent class method OpenAIServingChat.chat_completion_stream_generator()
no longer accepts enable_force_include_usage as a parameter in vLLM 0.12.0.
The value is already accessible via self.enable_force_include_usage.

Signed-off-by: wonjae.lee0 <[email protected]>
@hsliuustc0106
Collaborator

@tzhouam PTAL

@wonjerry
Contributor Author

I've checked #350 and #367. I'll follow up on these PRs.

@hsliuustc0106
Collaborator

I've checked #350 and #367. I'll follow up on these PRs.

I will get PR #350 merged first and wait for #367 to be ready. Sorry for the misleading streaming error.

@wonjerry
Contributor Author

@hsliuustc0106 Thanks! I'm going to close this one.

@wonjerry closed this Dec 23, 2025
