[Model] Support Qwen3 models with enable_thinking field #686
### Overview

- Support Qwen3 models in `q0f16`, `q0f32`, `q4f16_1`, and `q4f32_1`, as well as {1.7B, 4B, 8B} x {`q4f16_1`, `q4f32_1`}
- Support an `extra_body` field with an `extra_body.enable_thinking` field to switch between thinking and non-thinking mode; to prevent Qwen3 from thinking, set `enable_thinking` to `false` (see `examples/qwen3`)
- Support the `/no_think` and `/think` soft switches in the prompt
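A sketch of a request that disables thinking, assuming the fields this PR adds (`extra_body` and `enable_thinking` come from this PR; the interface names and message contents below are illustrative stand-ins, and the actual engine call is omitted):

```typescript
// Illustrative stand-in types; the real ChatCompletionRequest lives in web-llm.
interface ExtraBody {
  enable_thinking?: boolean;
}

interface ChatCompletionRequestSketch {
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  extra_body?: ExtraBody;
}

// Ask Qwen3 a question with the thinking phase disabled.
const request: ChatCompletionRequestSketch = {
  messages: [{ role: "user", content: "What is the capital of France?" }],
  extra_body: { enable_thinking: false },
};
```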
### Internal notes
`enable_thinking` is achieved by:

- Adding the `extra_body` and `enable_thinking` fields to `ChatCompletionRequest`
- Adding an `enable_thinking` field to `GenerationConfig`; `engine.ts` forwards the value
- In `llm_chat.ts`, when `prefillStep()` runs with `enable_thinking` set to false, calling `conversation.appendEmptyThinkingReplyHeader()` instead of the normal `appendReplyHeader()`
- In `conversation.ts`, adjusting `getPromptArrayInternal()` to support a reply header with an empty thinking block, tracked by a new field `isLastMessageEmptyThinkingReplyHeader`

Tests are added in `tests/conversation.test.ts`.
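A minimal, self-contained sketch of the idea behind `appendEmptyThinkingReplyHeader()` (the real logic lives in `conversation.ts` and builds a prompt array from `Conversation` state; the function bodies and the `roleHeader` parameter here are illustrative assumptions — only the `emptyThinkingBlockStr` constant comes from this PR):

```typescript
// Hardcoded in this PR; pre-fills an already-closed, empty thinking block.
const emptyThinkingBlockStr = "<think>\n\n</think>\n\n";

// Illustrative stand-in for the normal reply header path.
function appendReplyHeader(prompt: string, roleHeader: string): string {
  return prompt + roleHeader;
}

// When enable_thinking is false, the reply header is followed by an empty
// <think> block, so the model skips its reasoning phase and generates the
// final answer directly.
function appendEmptyThinkingReplyHeader(
  prompt: string,
  roleHeader: string,
): string {
  return appendReplyHeader(prompt, roleHeader) + emptyThinkingBlockStr;
}
```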
### Future work

- We currently hardcode `const emptyThinkingBlockStr = "<think>\n\n</think>\n\n";`. This should be configurable per model in the future, perhaps as part of `ConvConfig`.
- Update `compareConversationObject()` in `engine.ts` to tolerate several missing trailing messages (in this case, the message without the thinking tokens), so that in longer conversations, clients that have already stripped the thinking tokens can still reuse the KV cache.
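One way the relaxed comparison could work — a hypothetical sketch, not the planned implementation of `compareConversationObject()`; the function name, `maxMissing` parameter, and matching rule below are all assumptions:

```typescript
type Message = { role: string; content: string };

// Hypothetical relaxed match for KV cache reuse: the incoming conversation
// may omit up to `maxMissing` trailing messages of the cached one (e.g. a
// reply whose thinking tokens were stripped client-side), but every message
// it does carry must match the cached conversation exactly.
function conversationsMatchForKVReuse(
  cached: Message[],
  incoming: Message[],
  maxMissing: number,
): boolean {
  if (incoming.length > cached.length) return false;
  if (cached.length - incoming.length > maxMissing) return false;
  return incoming.every(
    (m, i) => m.role === cached[i].role && m.content === cached[i].content,
  );
}
```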