Releases: takltc/claude-code-chutes-proxy
Releases · takltc/claude-code-chutes-proxy
v0.0.1 Release
Claude-to-Chutes Proxy v0.0.1 Release Notes
Overview
Claude-to-Chutes Proxy v0.0.1 is the first stable release of this project, implementing a complete bridge between Anthropic Claude API format and Chutes/OpenAI API format. This version includes optimized support for multiple mainstream models, comprehensive error handling, and production-ready deployment solutions.
Version History
Base Commit: v0.0.1 tag is based on commit d54dc79, containing the complete development history from initial version to release.
Major Features
🚀 Core Architecture
1. Protocol Conversion
- ✅ Anthropic ↔ OpenAI Compatibility: Full support for Anthropic
v1/messagesto OpenAIv1/chat/completionsrequest/response conversion - ✅ Streaming Support: Native Server-Sent Events (SSE) streaming responses
- ✅ Tool Calling: Support for both non-streaming and streaming tool calls
2. Model Provider Support
- DeepSeek V3.1: Full THINKING mode support (
:THINKINGsuffix) - LongCat: Dedicated GPT-OSS style tool parser
- Moonshot/Kimi: Intelligent model name case correction
- Cloud Code MCP: Tool markup parsing support
3. Tool Processing System
- Multi-provider Tool Parsing: DeepSeek, LongCat, MCP, and various markup formats
- Streaming Tool Parsing: Real-time
sglang.FunctionCallParserintegration - Tool Name Normalization: Built-in synonyms and fuzzy matching
🚀 Performance Optimization
1. Intelligent Cache System
- Persistent Model Discovery:
/v1/modelsresults cached to disk - Memory + Disk Dual Cache: Disk fallback during network failures
- TTL Validation: Prevents stale cache issues
2. Network Optimization
- Shared HTTP/2 Client: Connection reuse and pooling
- 429 Rate Limiting: Intelligent retry strategies
- HTTP/2 Support: Enabled by default for performance
3. Error Handling System
- Streaming Session Management: Prevents connection termination
- Model ID Correction: Automatic model name case correction
- 404 Retry Mechanism: Smart retry when model discovery fails
🛠️ Deployment & Operations
1. Docker Support
- Docker Compose: Complete production environment configuration
- GHCR Prebuilt Images:
ghcr.io/takltc/claude-code-chutes-proxy:0.0.1 - Health Checks: Built-in health check endpoints
2. Configuration Management
- Environment Variables: Rich configuration options
- .env File Support: Easy development and production environments
- Admin Endpoints: Cache management and monitoring
Technical Specifications
🎯 Intelligent Model Recognition
# Supported model identification patterns
model = "deepseek-ai/DeepSeek-V3.1:THINKING" # THINKING mode
model = "deepseek-ai/DeepSeek-V3.1" # Standard mode
model = "longchat-longcat" # LongCat tools
model = "moonshot-v1" # Moonshot case correction🛠️ Tool Parser Architecture
- MCP Tool Parsing: Cloud Code
<|tool_calls|>markup - LongCat Parsing: GPT-OSS style parser
- DeepSeek Parsing: Native tool call support
- Universal Tool Parsing: Adaptive model-specific formats
🔄 Streaming Processing
- Thinking Block Processing: DeepSeek reasoning mode streaming
- Streaming Tool Arguments: Real-time input parameter parsing
- Session Management: Connection and state preservation
Benchmarking Environment
System Requirements
- Python: 3.11+ (Supports Python 3.10 - 3.13)
- Dependencies: See requirements.txt
- HTTP Client: httpx 0.27.2 (HTTP/2 enabled)
Performance Metrics
- Connection Latency: HTTP/2 optimized connection setup
- Cache Hit Rate: Model discovery cache reduces 95% API calls
- Error Recovery: Smart retry improves success rate
Usage Instructions
Quick Start
# Use prebuilt Docker image
docker pull ghcr.io/takltc/claude-code-chutes-proxy:0.0.1
docker run --rm -p 8090:8080 -e CHUTES_BASE_URL=https://llm.chutes.ai claude-chutes-proxy
# Local development
docker compose up --buildExample Request
# DeepSeek THINKING mode
curl -X POST http://localhost:8090/v1/messages \
-H 'Content-Type: application/json' \
-H 'x-api-key: YOUR_KEY' \
-d '{
"model": "deepseek-ai/DeepSeek-V3.1:THINKING",
"max_tokens": 128,
"messages": [{
"role": "user",
"content": [{
"type": "text",
"text": "Explain quantum computing basics"
}]
}]
}'Admin Functions
Cache Management
GET /_models_cache- Current cache statusPOST /_models_cache/refresh- Manually refresh cacheDELETE /_models_cache- Clear cache
Debugging
GET /_debug/last- Last request debug information
Configuration
Key Environment Variables
CHUTES_BASE_URL=https://llm.chutes.ai
CHUTES_API_KEY=your-api-key
MODEL_DISCOVERY_PERSIST=1
PROXY_HTTP2=1
AUTO_FIX_MODEL_CASE=1
ENABLE_STREAM_TOOL_PARSER=0
CHUTES_MAX_TOKENS=128000Changelog
Major Features
v0.0.1 (2025-09-20)
- ✅ DeepSeek THINKING Mode: Complete thinking/reasoning support
- ✅ LongCat Tool Handling: GPT-OSS style tool parsing
- ✅ Intelligent Cache: Persistent model discovery with disk fallback
- ✅ HTTP/2 Optimization: Connection pooling and performance
- ✅ MCP Tool Parsing: Cloud Code tool markup support
- ✅ Streaming: Complete session management and state preservation
- ✅ Context Compaction: Automatic token management with configurable limits
Bug Fixes
- 🔧 Model Discovery: Fixed Moonshot/Kimi model recognition
- 🔧 Streaming Tools: Fixed DeepSeek tool argument parsing
- 🔧 Session Management: Fixed premature connection termination
- 🔧 Tool Calls: Fixed Cloud Code tool markup parsing
Known Limitations
Current Version
⚠️ Multimedia: Limited image processing capabilities⚠️ Tool Calls: Some advanced tool features may be incomplete
Compatibility
- ✅ Chutes: Fully compatible
- ✅ OpenAI: Standard compatibility
- ✅ vLLM/SGLang: Basic functionality support
Project Info
Repository: https://github.com/takltc/claude-code-chutes-proxy
Current Maintainer
- tak (tak.ltc@ud.me)