Model Context Protocol (MCP) server for web content fetching and extraction.
This MCP server provides tools to fetch webpages, extract clean content using Trafilatura, and discover links for batch processing.
- Fetch Webpages: Extract clean markdown content from any URL
- Batch Fetching: Fetch up to 10 URLs in a single request
- Link Discovery: Find and filter links on any webpage
- llms.txt Support: Parse and fetch LLM-friendly documentation indexes
- Smart Extraction: Trafilatura removes boilerplate (navbars, ads, footers)
- Robots.txt Compliance: Respects robots.txt with graceful timeout handling
- Pagination Support: Handle large pages with the `start_index` parameter
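The robots.txt handling above can be illustrated with Python's standard library. The sketch below is illustrative only, not the server's actual implementation; in practice the robots.txt body would be fetched from the target host with a short timeout.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt body as it might be served from https://example.com/robots.txt
robots_body = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_body.splitlines())

# Allowed: the path is not covered by any Disallow rule.
print(parser.can_fetch("*", "https://example.com/docs/intro"))    # True
# Blocked: /private/ is disallowed for all user agents.
print(parser.can_fetch("*", "https://example.com/private/data"))  # False
```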
- Install `uv` from Astral
- Install Python 3.10 or newer using `uv python install 3.10`
One-click install is available for Cursor and VS Code, or configure manually in your MCP client:
```json
{
  "mcpServers": {
    "fetchv2": {
      "command": "uvx",
      "args": ["fetchv2-mcp-server@latest"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```

Config file locations:
- Claude Desktop (macOS): `~/Library/Application Support/Claude/claude_desktop_config.json`
- Claude Desktop (Windows): `%APPDATA%\Claude\claude_desktop_config.json`
- Windsurf: `~/.codeium/windsurf/mcp_config.json`
- Kiro: `.kiro/settings/mcp.json` in your project
```shell
# Using uv
uv add fetchv2-mcp-server

# Using pip
pip install fetchv2-mcp-server
```

Example prompts to try:
- "Fetch the documentation from
<URL>" - "Find all links on
<docs URL>that contain 'tutorial'" - "Read these three pages and summarize the differences:
[url1, url2, url3]"
Fetches a webpage and extracts its main content as clean markdown.
```python
fetch(url: str, max_length: int = 5000, start_index: int = 0) -> str
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | `str` | required | The webpage URL to fetch |
| `max_length` | `int` | 5000 | Maximum characters to return |
| `start_index` | `int` | 0 | Character offset for pagination |
| `get_raw_html` | `bool` | false | Skip extraction, return raw HTML |
| `include_metadata` | `bool` | true | Include title, author, date |
| `include_tables` | `bool` | true | Preserve tables in markdown |
| `include_links` | `bool` | false | Preserve hyperlinks |
| `bypass_robots_txt` | `bool` | false | Skip robots.txt check |
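The `start_index`/`max_length` pagination contract can be sketched in plain Python. Here `fake_fetch` is a hypothetical stand-in for the real tool, assuming only what the table above states: the tool returns at most `max_length` characters of the extracted text, starting at character offset `start_index`.

```python
# Pretend this is Trafilatura's extracted text for one long page.
DOCUMENT = "x" * 12000

def fake_fetch(url: str, max_length: int = 5000, start_index: int = 0) -> str:
    """Stand-in for the fetch tool: slice extracted text by character offset."""
    return DOCUMENT[start_index:start_index + max_length]

# Read the whole page in 5000-character chunks by advancing start_index.
chunks = []
offset = 0
while True:
    chunk = fake_fetch("https://example.com/long", max_length=5000, start_index=offset)
    if not chunk:  # an empty result means we are past the end
        break
    chunks.append(chunk)
    offset += len(chunk)

print(len(chunks))  # → 3 (chunks of 5000, 5000, and 2000 characters)
```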
Fetches multiple webpages in a single request.
```python
fetch_batch(urls: list[str], max_length_per_url: int = 2000) -> str
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `urls` | `list[str]` | required | List of URLs (max 10) |
| `max_length_per_url` | `int` | 2000 | Character limit per URL |
| `get_raw_html` | `bool` | false | Skip extraction for all URLs |
Discovers all links on a webpage with optional filtering.
```python
discover_links(url: str, filter_pattern: str = "") -> str
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | `str` | required | The webpage URL to scan |
| `filter_pattern` | `str` | `""` | Regex to filter links (e.g., `/docs/`) |
Fetch and parse an llms.txt file to discover LLM-friendly documentation.
```python
fetch_llms_txt(url: str, include_content: bool = False) -> str
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | `str` | required | URL to an llms.txt file |
| `include_content` | `bool` | false | Also fetch content of all linked pages |
| `max_length_per_url` | `int` | 2000 | When `include_content=True`, max chars per page |
⚠️ Important: By default, only the llms.txt index is fetched; the linked markdown files are NOT downloaded to context. Set `include_content=True` to explicitly fetch all linked pages.
Example:
```python
# DEFAULT: Only fetches the index (lightweight, ~1KB)
fetch_llms_txt(url="https://docs.example.com/llms.txt")
# Returns: title + list of links with descriptions

# EXPLICIT: Fetches index + all linked .md files (can be large)
fetch_llms_txt(url="https://docs.example.com/llms.txt", include_content=True)
# Returns: structure + content of all linked pages
```

Note: Relative URLs (e.g., `/docs/guide.md`) are automatically resolved to absolute URLs.
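This kind of relative-link resolution is what `urllib.parse.urljoin` does when given the llms.txt URL as the base. A sketch of the behavior (not the server's actual code):

```python
from urllib.parse import urljoin

llms_txt_url = "https://docs.example.com/llms.txt"

# Links as they might appear in an llms.txt index: root-relative,
# path-relative, and already absolute.
raw_links = ["/docs/guide.md", "reference/api.md", "https://other.example.com/page.md"]

resolved = [urljoin(llms_txt_url, link) for link in raw_links]
print(resolved)
# → ['https://docs.example.com/docs/guide.md',
#    'https://docs.example.com/reference/api.md',
#    'https://other.example.com/page.md']
```

Note that `urljoin` leaves already-absolute URLs untouched, so mixed link styles in one index resolve safely.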
Step 1: Discover relevant documentation pages
```python
discover_links(url="https://docs.example.com/", filter_pattern="/guide/")
```

Step 2: Batch fetch the pages you need

```python
fetch_batch(urls=["https://docs.example.com/guide/intro", "https://docs.example.com/guide/setup"])
```

- `fetch_manual` - User-initiated fetch that bypasses robots.txt
- `research_topic` - Research a topic by fetching multiple relevant URLs
```shell
# Clone and install
git clone https://github.com/praveenc/fetchv2-mcp-server.git
cd fetchv2-mcp-server
uv sync --dev
source .venv/bin/activate

# Run tests
uv run pytest

# Run with MCP Inspector
mcp dev src/fetchv2_mcp_server/server.py

# Linting and type checking
uv run ruff check .
uv run pyright
```

MIT - see LICENSE for details.
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
For issues and questions, use the GitHub issue tracker.