This project is a hybrid AI agent server that combines the reasoning ability of the DeepSeek-R1 model with the corpus knowledge and multimodal capabilities of the Gemini model.
- Hybrid Model Architecture:
  - Use DeepSeek-R1 for front-end reasoning (supports parameter scales from 7B to 671B)
  - Use Gemini models for front-end information processing and discussion, including handling web search and multimodal input
  - Use Gemini as the main output model
  - Support automatic failover between models and a retry mechanism
- Multimodality Support:
  - Image recognition and understanding
  - Simultaneous processing of multiple images
  - Image descriptions are automatically passed to the text model for deeper understanding
- High Availability:
  - Automatic retry mechanism (up to 3 times)
  - Automatic switching to another model on failure
  - Streaming response support
- Smart search function:
  - AI autonomously determines whether an online search is needed
  - Multi-language, multi-keyword comprehensive search
  - Search results are automatically integrated into the conversation context
  - Support for real-time search and information updates
- Local HTTPS dynamic analysis
Prerequisites:
- Node.js and npm installed
- The necessary API keys:
  - DeepSeek-R1 API Key
  - Gemini-Beta API Key
- Note: this project requires the Gemini API to provide high concurrency, dynamic load balancing, and security filtering, so a single direct API key cannot be used. It is recommended to deploy a load-balancing gateway such as OneAPI.
- Because different providers return R1 output in different JSON formats, only SiliconFlow and the official API are compatible for the time being. Unexpected errors may occur with NIM and Azure.
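To make the provider-format caveat concrete, here is a minimal sketch of how differing R1 response shapes could be normalized. It assumes two shapes: reasoning delivered in a separate `reasoning_content` field, or reasoning embedded in `content` inside `<think>` tags. The function name `normalizeR1Message` is hypothetical, and this is not necessarily how the project handles it.

```javascript
// Hypothetical helper: normalize an R1 reply into { reasoning, answer }.
// Assumed shapes: a separate `reasoning_content` field, or reasoning
// embedded in `content` inside <think>...</think> tags.
function normalizeR1Message(message) {
  const content = typeof message.content === "string" ? message.content : "";
  if (typeof message.reasoning_content === "string") {
    return { reasoning: message.reasoning_content.trim(), answer: content.trim() };
  }
  const match = content.match(/<think>([\s\S]*?)<\/think>/);
  if (match) {
    // Remove the first <think> block from the visible answer.
    return { reasoning: match[1].trim(), answer: content.replace(match[0], "").trim() };
  }
  // Neither shape matched: treat everything as the answer.
  return { reasoning: "", answer: content.trim() };
}
```

A provider that uses a third shape would fall through to the last branch, which is one way the "unexpected errors" mentioned above could show up as missing reasoning text.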
- Clone the project:
  git clone <repository-url>
  cd <project-directory>
- Install dependencies:
  npm install
- Configure environment variables:
  Create a .env file and configure the following environment variables:
# Proxy server configuration
PROXY_URL=http://your-proxy-url:3000
PROXY_URL2=http://your-proxy-url:3000
PROXY_URL3=http://your-proxy-url:3000
PROXY_PORT=4120
# DeepSeek R1 Configuration
DEEPSEEK_R1_API_KEY=your-api-key
DEEPSEEK_R1_MODEL=deepseek-ai/DeepSeek-R1
DEEPSEEK_R1_MAX_TOKENS=7985
DEEPSEEK_R1_CONTEXT_WINDOW=2000000
DEEPSEEK_R1_TEMPERATURE=0.7
# Image recognition model configuration
Image_Model_API_KEY=your-api-key
Image_MODEL=gemini-exp-1206
Image_Model_MAX_TOKENS=7985
Image_Model_CONTEXT_WINDOW=2000000
Image_Model_TEMPERATURE=0.4
# API Key
OUTPUT_API_KEY=your-api-key
# Online Search model configuration
GoogleSearch_API_KEY=your-api-key
GoogleSearch_MODEL=gemini-2.0-flash-exp
GoogleSearch_Model_MAX_TOKENS=7985
GoogleSearch_Model_TEMPERATURE=0.4
# Search function prompt configuration
GoogleSearch_Determine_PROMPT=your-prompt
GoogleSearch_PROMPT=your-prompt
GoogleSearch_Send_PROMPT=your-prompt
- Start the server:
  node main.js
Endpoint: /v1/chat/completions
Method: POST
Request Header:
Content-Type: application/json
Authorization: Bearer your-api-key
Text chat request body example:
{
  "model": "Gemini1206MIXR1",
  "messages": [
    {
      "role": "user",
      "content": "Hello, please introduce yourself"
    }
  ],
  "stream": true
}
Image request body example:
{
  "model": "Gemini1206MIXR1",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is this picture about?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "..."
          }
        }
      ]
    }
  ],
  "stream": true
}
Multiple images can be sent in the same message:
{
  "model": "Gemini1206MIXR1",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Compare the differences between these two pictures"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "..."
          }
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "..."
          }
        }
      ]
    }
  ],
  "stream": true
}
DeepSeek-R1 supports models of various parameter sizes, ranging from 7B to 671B. Set the one you need in the environment variables:
DEEPSEEK_R1_MODEL=deepseek-ai/DeepSeek-R1-7B
# OR
DEEPSEEK_R1_MODEL=deepseek-ai/DeepSeek-R1-67B
# OR
DEEPSEEK_R1_MODEL=deepseek-ai/DeepSeek-R1-671B
- The system will automatically retry failed requests (up to 3 times)
- When the R1 model is unavailable, it will automatically switch to the Gemini model
- Detailed error information will be returned through the status code and response body
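The retry-then-switch behavior described above can be sketched as follows. This is an illustration only: `callWithRetryAndFallback`, `primary`, and `fallback` are hypothetical names, and "up to 3 times" is assumed here to mean three attempts in total against the primary model before falling back.

```javascript
// Sketch: try the primary model up to `maxAttempts` times, then fall back.
// `primary` and `fallback` are hypothetical async functions supplied by
// the caller (for example, an R1 call and a Gemini call).
async function callWithRetryAndFallback(primary, fallback, maxAttempts = 3, delayMs = 0) {
  let lastError = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { source: "primary", attempt, result: await primary() };
    } catch (err) {
      lastError = err;
      // Optional linear backoff between attempts (disabled when delayMs is 0).
      if (delayMs > 0 && attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  // Every primary attempt failed: switch to the fallback model.
  return { source: "fallback", attempt: maxAttempts, result: await fallback(), lastError };
}
```

If the fallback call also throws, the error propagates to the caller, which is where a status code and response body would be produced.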
The system integrates advanced AI-driven search capabilities with the following features:
- Autonomous judgment mechanism:
  - AI automatically analyzes the conversation to determine whether an online search is needed
  - Considers factors such as time-sensitive information, specialized data, and news events
  - Avoids unnecessary search requests
- Multi-dimensional search strategy:
  - Automatically generates multiple related search keywords
  - Searches in both Chinese and English
  - Considers both professional and common terms
  - Supports exact phrase matching
- Search result processing:
  - Automatically integrates information from multiple sources
  - Results are filtered and summarized by AI
  - Results are integrated seamlessly into the conversation context
  - Keeps information timely and accurate
- Usage scenarios:
  - The latest data or statistics are needed
  - Checking real-time news or events
  - Verifying professional knowledge or technical details
  - Cross-verifying information from multiple sources
Configure the search-related parameters in the .env file:
# Online search model configuration
GoogleSearch_API_KEY=your-api-key
GoogleSearch_MODEL=gemini-2.0-flash-exp
GoogleSearch_Model_MAX_TOKENS=7985
GoogleSearch_Model_TEMPERATURE=0.4
# Search function prompt configuration
GoogleSearch_Determine_PROMPT=your-prompt
GoogleSearch_PROMPT=your-prompt
GoogleSearch_Send_PROMPT=your-prompt
Basic conversation request:
{
  "model": "Gemini1206MIXR1",
  "messages": [
    {
      "role": "user",
      "content": "Please tell me about recent AI developments"
    }
  ],
  "stream": true
}
The system will:
- Automatically determine the need to search for the latest AI development information
- Generate multiple search keywords (such as "latest AI developments 2024", "latest progress in artificial intelligence", "AI breakthrough news", etc.)
- Perform the search and integrate the information
- Incorporate the search results into the answer
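The four steps above can be sketched as a small pipeline. Everything here is an assumption for illustration: `answerWithSearch` and the injected functions `needsSearch`, `generateKeywords`, `runSearch`, and `summarize` are hypothetical names, and placing the summary in a system message before the latest user turn is only one possible way to integrate results.

```javascript
// Sketch of the search flow. All four dependencies are hypothetical
// functions injected by the caller; none are the project's real names.
async function answerWithSearch(messages, deps) {
  const { needsSearch, generateKeywords, runSearch, summarize } = deps;
  if (!(await needsSearch(messages))) {
    return messages; // no search needed: context is left unchanged
  }
  const keywords = await generateKeywords(messages);
  // Run all keyword searches concurrently and flatten the result lists.
  const results = (await Promise.all(keywords.map((k) => runSearch(k)))).flat();
  const summary = await summarize(results);
  // Insert the summary as a system message just before the latest user turn.
  const context = { role: "system", content: `Search results:\n${summary}` };
  return [...messages.slice(0, -1), context, messages[messages.length - 1]];
}
```

Injecting the four functions keeps the flow testable with stubs and independent of which model or search backend performs each step.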
Notes:
- Image data needs to be encoded in Base64 format
- It is recommended to keep stream: true for a better response experience
- All API calls require a valid API key
- Image recognition results are automatically passed to the text model for deeper understanding and answering
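As an illustration of the Base64 requirement, the sketch below builds a request body shaped like the image examples in this document. The helper name `buildImageRequest` is hypothetical, and the `data:<mime>;base64,...` URL form is an assumption, since the examples elide the actual `url` value.

```javascript
// Sketch: build a chat request body with Base64-encoded images.
// The data-URL form is an assumption; the examples elide the "url" value.
function buildImageRequest(text, images, model = "Gemini1206MIXR1") {
  const imageParts = images.map(({ bytes, mimeType }) => ({
    type: "image_url",
    image_url: {
      url: `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`,
    },
  }));
  return {
    model,
    messages: [{ role: "user", content: [{ type: "text", text }, ...imageParts] }],
    stream: true, // recommended for a better response experience
  };
}
```

In practice `bytes` would typically come from reading an image file, and the resulting object would be serialized with `JSON.stringify` and sent with the `Authorization` header shown earlier.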
Error codes:
- 401: Unauthorized (invalid API key)
- 429: Too many requests
- 503: Service temporarily unavailable
- 504: Request timed out
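A client might classify these status codes as in the minimal sketch below. The names `STATUS_INFO` and `describeStatus` are hypothetical, and the choice of which codes are worth retrying is an assumption rather than documented behavior of this server.

```javascript
// Sketch: client-side classification of the documented status codes.
// Which codes are retryable is an assumption, not documented behavior.
const STATUS_INFO = {
  401: { message: "Unauthorized (invalid API key)", retryable: false },
  429: { message: "Too many requests", retryable: true },
  503: { message: "Service temporarily unavailable", retryable: true },
  504: { message: "Request timed out", retryable: true },
};

function describeStatus(status) {
  // Unknown codes are reported as-is and not retried.
  return STATUS_INFO[status] ?? { message: `Unexpected status ${status}`, retryable: false };
}
```

A caller could check `describeStatus(response.status).retryable` before resending a request, and surface `message` to the user otherwise.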
Planned improvements:
- Support more model combinations
- Add automatic model selection
- Optimize image recognition performance
- Add more error handling mechanisms
Original Repo: https://github.com/lioensky/GeminiMixSuper (Chinese)