Skip to content

TimmyOVO/mcp-smart-fetch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

mcp-smart-fetch

Rust License MCP

English | δΈ­ζ–‡

An intelligent document content extraction service built with Rust's official MCP SDK, supporting multiple document formats and integrated with LLM APIs for smart content extraction, running as a standard MCP server.

✨ Features

  • πŸš€ High-Performance Async Architecture - Built on Tokio async runtime
  • 🧠 Smart Content Extraction - Integrated with multiple LLM APIs
  • πŸ“„ Multi-Format Support - TXT, MD, JSON, YAML, TOML, XML, CSV
  • πŸ”§ MCP Server - Standard Model Context Protocol server
  • βš™οΈ Flexible Configuration - Support for config files and environment variables
  • 🐳 Container Support - Docker deployment ready
  • πŸ§ͺ Complete Testing - Unit and integration test coverage

πŸš€ Quick Start

Requirements

  • Rust 1.75+
  • LLM API key (OpenAI, Claude, Alibaba Cloud, etc.)

Installation

git clone https://github.com/yourusername/mcp-smart-fetch.git
cd mcp-smart-fetch
cargo build --release

Configuration

  1. Copy environment variable example:
cp .env.example .env
  1. Edit .env file with your API key:
LLM_API_KEY="your-api-key-here"
LLM_MODEL="gpt-4"
LLM_API_ENDPOINT="https://api.openai.com/v1/chat/completions"

Basic Usage

Extract from File

cargo run -- extract input.txt
cargo run -- extract --input document.pdf --prompt "Summarize key points"
cargo run -- extract -i data.json -o result.txt

Extract from Text

cargo run -- extract-text --text "This is text that needs analysis..."
cargo run -- extract-text -t "text content" -p "Extract key information"

Start MCP Server

# Start MCP server (stdio mode)
cargo run -- serve

# View detailed configuration
cargo run --verbose serve

πŸ”§ MCP Server

mcp-smart-fetch can run as a standard MCP server, providing the following tools:

Available Tools

  1. extract_from_file - Extract intelligent content from files
  2. extract_from_text - Extract intelligent content from text
  3. get_config - Get server configuration information
  4. list_supported_formats - List supported document formats

Client Configuration

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "smart-fetch": {
      "command": "cargo",
      "args": ["run", "--", "serve"],
      "env": {
        "LLM_API_KEY": "your-api-key"
      }
    }
  }
}

Docker Deployment

# Use environment variables
docker run --env-file .env -v $(pwd)/templates:/app/templates mcp-smart-fetch serve

# Use docker-compose
docker-compose up mcp-server

βš™οΈ Configuration

Environment Variables

The project supports complete configuration through environment variables, with environment variables taking precedence over configuration files.

LLM Configuration

  • LLM_API_KEY - LLM API key (required)
  • LLM_API_ENDPOINT - API endpoint URL
  • LLM_MODEL - Model name to use
  • LLM_MAX_TOKENS - Maximum tokens (u32)
  • LLM_TEMPERATURE - Temperature parameter (f64, 0.0-2.0)
  • LLM_TIMEOUT_SECONDS - Request timeout (u64, seconds)

Server Configuration

  • SERVER_HOST - Server listen address
  • SERVER_PORT - Server port (u16)
  • SERVER_MAX_CONNECTIONS - Maximum connections (u32)
  • SERVER_REQUEST_TIMEOUT_SECONDS - Request timeout (u64, seconds)

Processing Configuration

  • TEMPLATES_DIR - Template directory path
  • DEFAULT_TEMPLATE - Default template name
  • MAX_DOCUMENT_SIZE_MB - Maximum document size (f64, MB)
  • CHUNK_SIZE - Chunk size (usize)
  • ENABLE_PREPROCESSING - Enable preprocessing (bool)

Cleaning Configuration

  • ENABLE_CLEANING - Enable cleaning functionality (bool)
  • REMOVE_BASE64_IMAGES - Remove base64 images (bool)
  • REMOVE_BINARY_DATA - Remove binary data (bool)
  • REMOVE_HTML_TAGS - Remove HTML tags (bool)
  • NORMALIZE_WHITESPACE - Normalize whitespace (bool)
  • MAX_STRING_LENGTH - Maximum string length (usize)

Configuration File

Configuration file located at config/config.toml, supporting layered configuration:

[llm]
api_endpoint = "https://api.openai.com/v1/chat/completions"
model = "gpt-4"
max_tokens = 32768
temperature = 0.7

[server]
host = "127.0.0.1"
port = 8080

[processing]
max_document_size_mb = 10.0
chunk_size = 4000
supported_formats = ["txt", "md", "json", "yaml", "yml", "toml", "xml", "csv"]

View Configuration

# View all supported environment variables
cargo run -- env-vars

# View detailed configuration
cargo run --verbose extract-text --text "test"

πŸ§ͺ Testing

Run Tests

# Run all tests
cargo test

# Run unit tests
cargo test --lib

# Run integration tests (MCP server)
cargo test --test mcp_server_test

# Run specific tests
cargo test test_extract_from_text

Test Coverage

  • Unit tests: Independent functionality of each module
  • MCP server tests: Complete MCP protocol testing
  • Integration tests: End-to-end content extraction workflow

🐳 Docker Deployment

Build Image

docker build -t mcp-smart-fetch .

Run Container

# Use environment variables
docker run --env-file .env -v $(pwd)/templates:/app/templates mcp-smart-fetch

# Run as MCP server
docker run --env-file .env mcp-smart-fetch serve

Docker Compose

version: '3.8'
services:
  mcp-server:
    build: .
    command: ["serve"]
    environment:
      - LLM_API_KEY=${LLM_API_KEY}
      - LLM_MODEL=${LLM_MODEL}
    volumes:
      - ./templates:/app/templates

πŸ“ Project Structure

mcp-smart-fetch/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ main.rs              # Main program entry
β”‚   β”œβ”€β”€ lib.rs               # Library entry
β”‚   β”œβ”€β”€ config.rs            # Configuration management
β”‚   β”œβ”€β”€ mcp_server.rs        # MCP server implementation
β”‚   β”œβ”€β”€ llm_client.rs        # LLM client
β”‚   β”œβ”€β”€ document.rs          # Document processing
β”‚   β”œβ”€β”€ prompt_template.rs   # Prompt template system
β”‚   β”œβ”€β”€ cleaner.rs           # Content cleaning
β”‚   β”œβ”€β”€ progress.rs          # Progress display
β”‚   └── error.rs             # Error handling
β”œβ”€β”€ tests/
β”‚   β”œβ”€β”€ unit_test.rs         # Unit tests
β”‚   β”œβ”€β”€ integration_test.rs  # Integration tests
β”‚   β”œβ”€β”€ cleaning_test.rs     # Cleaning tests
β”‚   └── mcp_server_test.rs   # MCP server tests
β”œβ”€β”€ config/
β”‚   └── config.toml          # Configuration file
β”œβ”€β”€ templates/               # Template directory
β”œβ”€β”€ examples/                # Example files
β”œβ”€β”€ .env.example            # Environment variable example
β”œβ”€β”€ docker-compose.yml       # Docker Compose config
β”œβ”€β”€ Dockerfile              # Docker image config
└── README.md               # Project documentation

πŸ”§ Development

Development Environment Setup

# Clone project
git clone https://github.com/yourusername/mcp-smart-fetch.git
cd mcp-smart-fetch

# Install Rust toolchain
rustup install stable
rustup component add clippy rustfmt

# Install pre-commit hooks (optional)
cargo install pre-commit
pre-commit install

Development Commands

# Build project
cargo build

# Run development version
cargo run

# Run tests
cargo test

# Format code
cargo fmt

# Check code
cargo clippy

# Generate documentation
cargo doc --open

Adding New LLM Providers

  1. Add new request format in src/llm_client.rs
  2. Add new configuration example in config/config.toml
  3. Update environment variable documentation
  4. Add corresponding test cases

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

🀝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“ž Support

πŸ™ Acknowledgments


⭐ If this project helps you, please give it a star!

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors