@Ravens2121 Ravens2121 commented Dec 15, 2025

PR: Enhanced Thinking Mode Support - Header Detection & reasoning_content Output

Description

This PR enhances the Kiro translator's support for Claude thinking mode:

  1. Header Detection: Enable thinking mode when the Anthropic-Beta header contains the interleaved-thinking-2025-05-14 identifier
  2. reasoning_content Output: Convert thinking content to the OpenAI-compatible reasoning_content field instead of dropping it
  3. System Prompt Optimization: Inject thinking instructions at the beginning of the system prompt with a fixed max_thinking_length of 200000
  4. Duplicate Injection Prevention: Skip injection when thinking tags already exist in the request body

Changes

1. internal/runtime/executor/kiro_executor.go

  • Modified buildKiroPayloadForFormat() function signature to add headers parameter
  • Support thinking mode detection, passing headers to translator
  • Added accumulatedThinkingContent variable for accumulating thinking content

2. internal/translator/kiro/claude/kiro_claude_request.go

New functions:

  • IsThinkingEnabledFromHeader(headers http.Header) bool - Detect thinking mode from Anthropic-Beta header
  • IsThinkingEnabledWithHeaders(req *ClaudeRequest, headers http.Header) bool - Combined detection function, integrating request body and header
  • hasThinkingTagInBody(req *ClaudeRequest) bool - Detect if thinking tags already exist in request body to prevent duplicate injection

Modifications:

  • Moved thinking prompt to the beginning of system prompt
  • Using fixed max_thinking_length: 200000

3. internal/translator/kiro/claude/kiro_claude_stream.go

New functions:

  • BuildClaudeThinkingBlockStopEvent() - Build thinking block stop event

4. internal/translator/kiro/openai/kiro_openai.go

Modifications:

  • Convert thinking block content to reasoning_content field instead of skipping
  • Use BuildOpenAIResponseWithReasoning() to build response with reasoning content

5. internal/translator/kiro/openai/kiro_openai_request.go

New functions:

  • checkThinkingModeFromOpenAIWithHeaders(req *OpenAIRequest, headers http.Header) bool - Support detecting thinking mode from header

Modifications:

  • Simplified thinking mode detection logic

6. internal/translator/kiro/openai/kiro_openai_response.go

New functions:

  • BuildOpenAIResponseWithReasoning(content, reasoningContent, model string) *OpenAIResponse - Build OpenAI response with reasoning_content field

7. internal/translator/kiro/claude/kiro_claude_response.go

New functions:

  • generateThinkingSignature() - Generate SHA256 signature for thinking content

Modifications:

  • ExtractThinkingFromContent() - Add signature field to all thinking blocks

Bug Fixes

Cherry Studio Non-Streaming Mode ZodError Fix

Problem: Cherry Studio reported a ZodError validation error in non-streaming mode
Root Cause: Thinking blocks were missing the required signature field
Solution: Generate a SHA256-based signature and add it to every thinking block
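
As a rough illustration of the fix, the sketch below attaches a SHA-256 digest to a thinking block. The struct shape, the newThinkingBlock helper, the string parameter, and the hex encoding are all assumptions made for this example; the PR only states that generateThinkingSignature() uses SHA256.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// thinkingBlock approximates a Claude thinking content block.
// The exact field set is an assumption for illustration.
type thinkingBlock struct {
	Type      string `json:"type"`
	Thinking  string `json:"thinking"`
	Signature string `json:"signature"`
}

// generateThinkingSignature returns the hex-encoded SHA-256 digest of the
// thinking text. Hex encoding is an assumption; the PR only says "SHA256".
func generateThinkingSignature(thinking string) string {
	sum := sha256.Sum256([]byte(thinking))
	return hex.EncodeToString(sum[:])
}

// newThinkingBlock (hypothetical helper) builds a block that always carries
// a signature, so schema validators requiring the field do not reject it.
func newThinkingBlock(thinking string) thinkingBlock {
	return thinkingBlock{
		Type:      "thinking",
		Thinking:  thinking,
		Signature: generateThinkingSignature(thinking),
	}
}
```

Because the digest is derived only from the thinking text, the same text always yields the same signature.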

Feature Description

Thinking Mode Detection

Two ways to enable thinking mode are supported:

  1. Header Method: Add Anthropic-Beta: interleaved-thinking-2025-05-14 header to the request
  2. Request Body Method: Include thinking-related configuration in the request body

Thinking Instruction Injection

When thinking mode is enabled, the translator injects the following content at the beginning of the system prompt:

<thinking_mode>interleaved</thinking_mode>
<max_thinking_length>200000</max_thinking_length>
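
A minimal sketch of the injection step, assuming the system prompt is handled as a plain string. The helper name prependThinkingHint and the blank-line separator are hypothetical; only the two tags and their values come from the PR.

```go
package main

// thinkingHint holds the two tags quoted above.
const thinkingHint = "<thinking_mode>interleaved</thinking_mode>\n" +
	"<max_thinking_length>200000</max_thinking_length>"

// prependThinkingHint (hypothetical) places the hint before any existing
// system prompt. The blank-line separator is an assumption.
func prependThinkingHint(systemPrompt string) string {
	if systemPrompt == "" {
		return thinkingHint
	}
	return thinkingHint + "\n\n" + systemPrompt
}
```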

reasoning_content Output

For OpenAI format responses, thinking content will be converted to the reasoning_content field:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Actual response content",
        "reasoning_content": "Thinking process content"
      }
    }
  ]
}

Testing Instructions

1. Claude Format Request Test

curl -X POST http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -H "Anthropic-Beta: interleaved-thinking-2025-05-14" \
  -d '{
    "model": "claude-3-opus",
    "messages": [{"role": "user", "content": "Please explain the basic principles of quantum computing"}]
  }'

Verification points:

  • Response should contain thinking blocks
  • System prompt should contain thinking instructions at the beginning

2. OpenAI Format Request Test

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Anthropic-Beta: interleaved-thinking-2025-05-14" \
  -d '{
    "model": "claude-3-opus",
    "messages": [{"role": "user", "content": "Please explain the basic principles of quantum computing"}]
  }'

Verification points:

  • Response should contain reasoning_content field
  • content field should contain actual response content

3. Duplicate Injection Prevention Test

Send a request that already contains thinking tags to verify no duplicate injection occurs.

4. Streaming Response Test

Test thinking content handling in streaming responses by sending a request with the stream: true parameter.

Notes

  1. Compatibility: This change is backward compatible and does not affect requests that do not enable thinking mode
  2. Performance: max_thinking_length is fixed at 200000 and may need adjustment based on actual usage
  3. Header Priority: Neither source overrides the other; thinking mode is enabled if either the header or the request body enables it
  4. Duplicate Prevention Mechanism: The system checks whether <thinking_mode> or <max_thinking_length> tags already exist in the request body and skips injection if they do

Related Issues

  • N/A

Checklist

  • Code has passed local testing
  • Code follows project coding standards
  • Necessary comments have been added
  • Related documentation has been updated

@gemini-code-assist

Summary of Changes

Hello @Ravens2121, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the Kiro translator's capability to handle Claude's thinking mode. It introduces a more robust and flexible mechanism for enabling this feature, allowing clients to activate it via HTTP headers in addition to existing body-based methods. Furthermore, it improves the output consistency for OpenAI format users by mapping the AI's internal thought processes to a dedicated reasoning_content field, making the reasoning accessible and standardized. The changes also refine how thinking instructions are injected into system prompts and include safeguards against duplicate injections, contributing to a more reliable and user-friendly experience when leveraging advanced AI reasoning features.

Highlights

  • Header-based Thinking Mode Detection: The Kiro translator now supports enabling Claude's thinking mode by detecting the 'interleaved-thinking-2025-05-14' identifier from the 'Anthropic-Beta' HTTP header, providing an alternative to body-based configuration.
  • OpenAI reasoning_content Output: Thinking content from Claude responses is now converted into an OpenAI-compatible reasoning_content field instead of being skipped, allowing OpenAI format users to access the AI's internal reasoning process.
  • System Prompt Optimization: Thinking instructions are now consistently injected at the beginning of the system prompt with a fixed max_thinking_length: 200000, streamlining the configuration for thinking mode.
  • Duplicate Injection Prevention: The system now detects if thinking tags already exist in the request body, preventing redundant injection of thinking instructions and ensuring clean prompt construction.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively enhances the Kiro translator to support Claude's thinking mode by detecting it from headers and converting the output to the reasoning_content field for OpenAI compatibility. The changes are well-structured and cover both Claude and OpenAI translation paths.

My review has identified a few areas for improvement:

  • A critical logic bug where the duplicate injection prevention for thinking tags is implemented but not actually used.
  • Some code duplication that can be refactored for better maintainability, specifically a redundant function and a helper function defined in two places.

Addressing these points will make the implementation more robust and maintainable. Overall, great work on adding this feature.

// Kiro API doesn't accept max_tokens for thinking. Instead, thinking mode is enabled
// by injecting <thinking_mode> and <max_thinking_length> tags into the system prompt.
// We use a fixed max_thinking_length value since Kiro handles the actual budget internally.
if thinkingEnabled {

high

The PR description mentions preventing duplicate injection of thinking tags, and the function hasThinkingTagInBody was added for this purpose. However, this check is not being used within BuildKiroPayload. As a result, the thinking hint will be injected even if the request body already contains <thinking_mode> or <max_thinking_length> tags.

Suggested change
- if thinkingEnabled {
+ if thinkingEnabled && !hasThinkingTagInBody(claudeBody) {

// Kiro API doesn't accept max_tokens for thinking. Instead, thinking mode is enabled
// by injecting <thinking_mode> and <max_thinking_length> tags into the system prompt.
// We use a fixed max_thinking_length value since Kiro handles the actual budget internally.
if thinkingEnabled {

high

Similar to the Claude translator, the duplicate injection prevention check hasThinkingTagInBody is implemented but not used in BuildKiroPayloadFromOpenAI. This will lead to duplicate thinking tags if the client already provides them.

Suggested change
- if thinkingEnabled {
+ if thinkingEnabled && !hasThinkingTagInBody(openaiBody) {

Comment on lines +102 to +110
// BuildClaudeThinkingBlockStopEvent creates a content_block_stop SSE event for thinking blocks.
func BuildClaudeThinkingBlockStopEvent(index int) []byte {
	event := map[string]interface{}{
		"type":  "content_block_stop",
		"index": index,
	}
	result, _ := json.Marshal(event)
	return []byte("event: content_block_stop\ndata: " + string(result))
}

medium

The new function BuildClaudeThinkingBlockStopEvent is identical to the existing function BuildClaudeContentBlockStopEvent. This introduces unnecessary code duplication. Please remove BuildClaudeThinkingBlockStopEvent and use BuildClaudeContentBlockStopEvent in its place in internal/runtime/executor/kiro_executor.go.

Comment on lines +750 to 755
// hasThinkingTagInBody checks if the request body already contains thinking configuration tags.
// This is used to prevent duplicate injection when client (e.g., AMP/Cursor) already includes thinking config.
func hasThinkingTagInBody(body []byte) bool {
	bodyStr := string(body)
	return strings.Contains(bodyStr, "<thinking_mode>") || strings.Contains(bodyStr, "<max_thinking_length>")
}

medium

The function hasThinkingTagInBody is also defined in internal/translator/kiro/claude/kiro_claude_request.go. To improve maintainability and avoid code duplication, this utility function should be moved to a shared package, such as internal/translator/kiro/common, and then called from both kiro_claude_request.go and kiro_openai_request.go.

- Add generateThinkingSignature() function in kiro_claude_response.go
@luispater luispater merged commit cd0b14d into router-for-me:main Dec 16, 2025
1 check failed