feat: enhance thinking mode support for Kiro translator #32
Summary of Changes
Hello @Ravens2121, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request enhances the Kiro translator's handling of Claude's thinking mode. It adds a more flexible way to enable the feature: clients can activate it via HTTP headers in addition to the existing body-based methods. It also improves output consistency for OpenAI-format users by mapping the model's thinking output to a dedicated reasoning_content field.
Code Review
This pull request effectively enhances the Kiro translator to support Claude's thinking mode by detecting it from headers and converting the output to the reasoning_content field for OpenAI compatibility. The changes are well-structured and cover both Claude and OpenAI translation paths.
My review has identified a few areas for improvement:
- A critical logic bug where the duplicate injection prevention for thinking tags is implemented but not actually used.
- Some code duplication that can be refactored for better maintainability, specifically a redundant function and a helper function defined in two places.
Addressing these points will make the implementation more robust and maintainable. Overall, great work on adding this feature.
    // Kiro API doesn't accept max_tokens for thinking. Instead, thinking mode is enabled
    // by injecting <thinking_mode> and <max_thinking_length> tags into the system prompt.
    // We use a fixed max_thinking_length value since Kiro handles the actual budget internally.
    if thinkingEnabled {
The PR description mentions preventing duplicate injection of thinking tags, and the function hasThinkingTagInBody was added for this purpose. However, this check is not being used within BuildKiroPayload. As a result, the thinking hint will be injected even if the request body already contains <thinking_mode> or <max_thinking_length> tags.
Suggested change:
    - if thinkingEnabled {
    + if thinkingEnabled && !hasThinkingTagInBody(claudeBody) {
    // Kiro API doesn't accept max_tokens for thinking. Instead, thinking mode is enabled
    // by injecting <thinking_mode> and <max_thinking_length> tags into the system prompt.
    // We use a fixed max_thinking_length value since Kiro handles the actual budget internally.
    if thinkingEnabled {
Similar to the Claude translator, the duplicate injection prevention check hasThinkingTagInBody is implemented but not used in BuildKiroPayloadFromOpenAI. This will lead to duplicate thinking tags if the client already provides them.
Suggested change:
    - if thinkingEnabled {
    + if thinkingEnabled && !hasThinkingTagInBody(openaiBody) {
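The guard suggested in the two comments above can be sketched in isolation. This is a minimal illustration, not the PR's code: `injectThinkingHint` and the `thinkingHint` text are hypothetical (the exact hint the PR injects is not shown in this thread), and only `hasThinkingTagInBody` follows the helper quoted in the review.

```go
package main

import (
	"fmt"
	"strings"
)

// thinkingHint is a hypothetical hint; the real text injected by the PR
// is not shown in this thread.
const thinkingHint = "<thinking_mode>enabled</thinking_mode>\n" +
	"<max_thinking_length>200000</max_thinking_length>"

// hasThinkingTagInBody reports whether the raw request body already
// carries thinking configuration tags (same check as the quoted helper).
func hasThinkingTagInBody(body []byte) bool {
	s := string(body)
	return strings.Contains(s, "<thinking_mode>") ||
		strings.Contains(s, "<max_thinking_length>")
}

// injectThinkingHint (hypothetical name) prepends the hint to the system
// prompt only when thinking is enabled and the client has not already
// supplied the tags itself.
func injectThinkingHint(systemPrompt string, body []byte, thinkingEnabled bool) string {
	if !thinkingEnabled || hasThinkingTagInBody(body) {
		return systemPrompt
	}
	if systemPrompt == "" {
		return thinkingHint
	}
	return thinkingHint + "\n\n" + systemPrompt
}

func main() {
	plain := []byte(`{"system":"You are helpful."}`)
	tagged := []byte(`{"system":"<thinking_mode>enabled</thinking_mode> You are helpful."}`)

	fmt.Printf("%q\n", injectThinkingHint("You are helpful.", plain, true))
	fmt.Printf("%q\n", injectThinkingHint("You are helpful.", tagged, true))
}
```

A raw substring check like this assumes the client sends the tags unescaped. A body that encodes `<` as `\u003c` would not match, so checking the decoded system prompt may be more reliable.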
    // BuildClaudeThinkingBlockStopEvent creates a content_block_stop SSE event for thinking blocks.
    func BuildClaudeThinkingBlockStopEvent(index int) []byte {
    	event := map[string]interface{}{
    		"type":  "content_block_stop",
    		"index": index,
    	}
    	result, _ := json.Marshal(event)
    	return []byte("event: content_block_stop\ndata: " + string(result))
    }
The new function BuildClaudeThinkingBlockStopEvent is identical to the existing function BuildClaudeContentBlockStopEvent. This introduces unnecessary code duplication. Please remove BuildClaudeThinkingBlockStopEvent and use BuildClaudeContentBlockStopEvent in its place in internal/runtime/executor/kiro_executor.go.
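To illustrate why one builder is enough, here is a small sketch in which the same function closes both a thinking block and a text block, with only the index differing. `buildContentBlockStopEvent` is a hypothetical stand-in for the existing BuildClaudeContentBlockStopEvent, whose body is assumed to match the snippet quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildContentBlockStopEvent is a hypothetical stand-in for the existing
// BuildClaudeContentBlockStopEvent. A content_block_stop event carries no
// block-type information, so one builder serves every block kind.
func buildContentBlockStopEvent(index int) []byte {
	event := map[string]interface{}{
		"type":  "content_block_stop",
		"index": index,
	}
	// Marshalling a map of plain strings and ints does not fail in practice.
	data, _ := json.Marshal(event)
	return []byte("event: content_block_stop\ndata: " + string(data))
}

func main() {
	thinkingIndex, textIndex := 0, 1

	// The same call closes the thinking block and the text block.
	fmt.Printf("%s\n\n", buildContentBlockStopEvent(thinkingIndex))
	fmt.Printf("%s\n\n", buildContentBlockStopEvent(textIndex))
}
```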
    // hasThinkingTagInBody checks if the request body already contains thinking configuration tags.
    // This is used to prevent duplicate injection when client (e.g., AMP/Cursor) already includes thinking config.
    func hasThinkingTagInBody(body []byte) bool {
    	bodyStr := string(body)
    	return strings.Contains(bodyStr, "<thinking_mode>") || strings.Contains(bodyStr, "<max_thinking_length>")
    }
The function hasThinkingTagInBody is also defined in internal/translator/kiro/claude/kiro_claude_request.go. To improve maintainability and avoid code duplication, this utility function should be moved to a shared package, such as internal/translator/kiro/common, and then called from both kiro_claude_request.go and kiro_openai_request.go.
- Add generateThinkingSignature() function in kiro_claude_response.go
PR: Enhanced Thinking Mode Support - Header Detection & reasoning_content Output
Description
This PR enhances the Kiro translator's support for Claude thinking mode, including:
- Detecting the interleaved-thinking-2025-05-14 identifier in the Anthropic-Beta header to enable thinking mode
- Converting thinking content to the reasoning_content field instead of simply skipping it
- Using a fixed max_thinking_length: 200000
Changes
1. internal/runtime/executor/kiro_executor.go
- Changed the buildKiroPayloadForFormat() function signature to add a headers parameter
- Added the accumulatedThinkingContent variable for accumulating thinking content
2. internal/translator/kiro/claude/kiro_claude_request.go
New functions:
- IsThinkingEnabledFromHeader(headers http.Header) bool - Detects thinking mode from the Anthropic-Beta header
- IsThinkingEnabledWithHeaders(req *ClaudeRequest, headers http.Header) bool - Combined detection function that checks both the request body and the header
- hasThinkingTagInBody(req *ClaudeRequest) bool - Detects whether thinking tags already exist in the request body, to prevent duplicate injection
Modifications:
- Fixed max_thinking_length: 200000
3. internal/translator/kiro/claude/kiro_claude_stream.go
New functions:
- BuildClaudeThinkingBlockStopEvent() - Builds the thinking block stop event
4. internal/translator/kiro/openai/kiro_openai.go
Modifications:
- Converts thinking content to the reasoning_content field instead of skipping it
- Uses BuildOpenAIResponseWithReasoning() to build the response with reasoning content
5. internal/translator/kiro/openai/kiro_openai_request.go
New functions:
- checkThinkingModeFromOpenAIWithHeaders(req *OpenAIRequest, headers http.Header) bool - Supports detecting thinking mode from the header
Modifications:
6. internal/translator/kiro/openai/kiro_openai_response.go
New functions:
- BuildOpenAIResponseWithReasoning(content, reasoningContent, model string) *OpenAIResponse - Builds an OpenAI response with the reasoning_content field
7. internal/translator/kiro/claude/kiro_claude_response.go
New functions:
- generateThinkingSignature() - Generates a SHA256 signature for thinking content
Modifications:
- ExtractThinkingFromContent() - Adds the signature field to all thinking blocks
Bug Fixes
Cherry Studio Non-Streaming Mode ZodError Fix
Problem: Cherry Studio reported a ZodError validation error in non-streaming mode
Root Cause: Thinking blocks were missing the required signature field
Solution: Generate and add a signature field to all thinking blocks using a SHA256 hash
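A minimal sketch of the signature fix, under stated assumptions: the description gives only the name generateThinkingSignature() and the use of SHA256, so the string parameter, the hex encoding, and the `thinkingBlock` struct here are illustrative guesses rather than the PR's actual code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// thinkingBlock is a hypothetical struct for a Claude-style thinking
// content block that carries the signature field.
type thinkingBlock struct {
	Type      string `json:"type"`
	Thinking  string `json:"thinking"`
	Signature string `json:"signature"`
}

// generateThinkingSignature returns the hex-encoded SHA256 digest of the
// thinking text. The parameter and the hex encoding are assumptions.
func generateThinkingSignature(thinking string) string {
	sum := sha256.Sum256([]byte(thinking))
	return hex.EncodeToString(sum[:])
}

// newThinkingBlock builds a block that always includes a signature, so a
// client validating the schema does not reject it for a missing field.
func newThinkingBlock(thinking string) thinkingBlock {
	return thinkingBlock{
		Type:      "thinking",
		Thinking:  thinking,
		Signature: generateThinkingSignature(thinking),
	}
}

func main() {
	block := newThinkingBlock("Compare both options before answering.")
	out, err := json.MarshalIndent(block, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A locally computed hash like this only satisfies clients that require the field to be present. It is not a signature issued by the upstream API.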
Feature Description
Thinking Mode Detection
Two ways to enable thinking mode are supported:
- Via the request body (the existing detection path)
- Via header: add the Anthropic-Beta: interleaved-thinking-2025-05-14 header to the request
Thinking Instruction Injection
When thinking mode is detected as enabled, the system injects the <thinking_mode> and <max_thinking_length> tags at the beginning of the system prompt, with max_thinking_length fixed at 200000.
reasoning_content Output
For OpenAI format responses, thinking content is converted to the reasoning_content field:
    {
      "choices": [
        {
          "message": {
            "role": "assistant",
            "content": "Actual response content",
            "reasoning_content": "Thinking process content"
          }
        }
      ]
    }
Testing Instructions
1. Claude Format Request Test
Verification points:
2. OpenAI Format Request Test
Verification points:
- The response should include the reasoning_content field
- The content field should contain the actual response content
3. Duplicate Injection Prevention Test
Send a request that already contains thinking tags to verify no duplicate injection occurs.
4. Streaming Response Test
Test thinking content handling in streaming responses using the stream: true parameter.
Notes
- max_thinking_length: 200000 is a fixed value and may need adjustment based on actual usage
- The system checks whether <thinking_mode> or <max_thinking_length> tags already exist in the request body to avoid duplicate injection
Related Issues
Checklist