This repository packages the AIFARMS implementation of the Model Context Protocol (MCP) server stack.
It exposes curated datasets, pluggable machine-learning tools, and an optional LLM-powered semantic search layer to downstream agents and web clients.
- Core MCP server (`mcp_core_server.py`) provides protocol endpoints, registers tools, and exposes dataset/model metadata.
- Dataset registry dynamically discovers `*_mcp_data.json` files and surfaces their filterable attributes.
- Model registry keeps track of inference handlers and records execution history.
- Tool registry manages synchronous/async tools and wraps results in MCP-compliant envelopes.
- FastAPI web interface (`web_interface.py`) renders a search UI that calls the MCP server and highlights LLM-backed confidence scores.
- Config module centralizes base paths, MCP/HTTP settings, and optional LLM credentials.
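For context on the "MCP-compliant envelopes" mentioned above: MCP tool results are a list of typed content blocks. The helper below is a generic illustration of that wrapping step, not the exact code in `tool_registry.py`:

```python
import json
from typing import Any


def wrap_tool_result(result: Any, is_error: bool = False) -> dict:
    """Illustrative sketch: wrap a raw tool result in an MCP-style envelope.

    tool_registry.py produces the real envelope; this only shows the general
    MCP tool-result shape (a list of typed content blocks plus an error flag).
    """
    text = result if isinstance(result, str) else json.dumps(result, default=str)
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


if __name__ == "__main__":
    print(wrap_tool_result({"matches": 3, "dataset": "bobcat"}))
```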
- `config.py` – runtime configuration, environment variables, and directory bootstrap.
- `dataset_registry.py` – dataset discovery, filtering helpers, and caching with adapter support.
- `dataset_adapter.py` – extensible adapter system for different dataset types.
- `model_registry.py` – model definitions, registration API, and inference history.
- `tool_registry.py` – registration/execution wrapper producing MCP tool responses.
- `mcp_core_server.py` – FastAPI app exposing `/mcp`, `/api/datasets`, `/api/models`, `/api/inference`, etc.
- `web_interface.py` – FastAPI app that communicates with the MCP server using HTTPX; auto-generates Jinja templates under `/opt/mcp-data-server/templates`.
- `ADDING_DATASET_TYPES.md` – comprehensive guide for adding new dataset types.
- `example_custom_dataset.json` – example of a custom dataset format.
- `mcp_json/` – sample MCP JSON payloads for reference.
- `requirements.txt` – Python dependencies.
- `LICENSE` – Apache-2.0.
- Python 3.10+ recommended.
- Access to `/opt/mcp-data-server` (default `BASE_DIR`) with:
  - `images/` – flat image store used by the web interface.
  - `datasets/` and `plugins/` directories (created automatically).
- One or more `<dataset>_mcp_data.json` files containing `images`, metadata, and optional filters.
- (Optional) OpenAI-compatible API key if you want to enable LLM-powered query understanding.
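Before starting the servers, a quick layout check can save debugging time. This is an illustrative sketch (not part of the repository) that only touches the paths listed above:

```python
from pathlib import Path

# Default BASE_DIR; change this if you relocated the data directory.
BASE_DIR = Path("/opt/mcp-data-server")


def check_layout(base: Path = BASE_DIR) -> None:
    """Report whether the expected directories and dataset files are present."""
    for sub in ("images", "datasets", "plugins"):
        status = "ok" if (base / sub).is_dir() else "missing (datasets/ and plugins/ are auto-created)"
        print(f"{base / sub}: {status}")
    datasets = sorted(p.name for p in base.glob("*_mcp_data.json"))
    print(f"dataset files found: {datasets or 'none'}")


if __name__ == "__main__":
    check_layout()
```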
- Clone the repository:

  ```bash
  git clone https://github.com/AIFARMS/model-context-protocol.git
  cd model-context-protocol
  ```

- Create an environment and install dependencies:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

  Alternatively, use the project's Conda environment `myenv`:

  ```bash
  conda env create -f ../myenv.yml   # adjust path as needed
  conda activate myenv
  pip install -r requirements.txt
  ```

- Configure environment variables (optional values shown):

  ```bash
  export OPENAI_API_KEY=sk-...
  export LLM_MODEL=gpt-4o-mini
  export MCP_HOST=0.0.0.0
  export MCP_PORT=8188
  export WEB_HOST=0.0.0.0
  export WEB_PORT=8187
  export MCP_BASE_URL=http://127.0.0.1:8188
  export MCP_SERVER_URL=http://127.0.0.1:8188
  ```

  Add overrides for the data directories if `/opt/mcp-data-server` is not available.

- Seed datasets: copy or generate `<name>_mcp_data.json` into the base directory and ensure the referenced images exist in `images/` (a minimal seeding sketch follows this list).
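The snippet below is an illustrative way to seed a dataset file; the entry mirrors the documented format shown in the dataset section further down and is not a fixed schema.

```python
import json
from pathlib import Path

BASE_DIR = Path("/opt/mcp-data-server")  # adjust if you overrode the base directory

# Minimal dataset entry mirroring the documented format; adapt the fields to your data.
dataset = {
    "images": [
        {
            "id": "bobcat_008",
            "collection": "bobcat",
            "category": "wildlife",
            "metadata": {
                "species": "lynx rufus",
                "time": "night",
                "season": "winter",
                "description": "Bobcat stalking prey near a snowy log.",
            },
        }
    ]
}

# The registry discovers files matching <name>_mcp_data.json in BASE_DIR.
(BASE_DIR / "bobcat_mcp_data.json").write_text(json.dumps(dataset, indent=2))
print("Seeded dataset; place the referenced image files in", BASE_DIR / "images")
```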
Open two terminals (in the same virtual environment):
In the first terminal, start the MCP core server:

```bash
python mcp_core_server.py
```

The server logs discovered datasets, registered tools, and available models. It exposes:

- `GET /health` – service heartbeat.
- `GET /mcp` – protocol metadata.
- `GET /mcp/tools` & `POST /mcp/tools/{tool}` – tool enumeration/execution.
- `GET /mcp/resources` – resource listing.
- `GET /api/datasets`, `/api/datasets/{name}`, `/api/datasets/{name}/images`
- `GET /api/models`, `/api/models/{name}`
- `POST /api/inference`
- `POST /api/search` (placeholder that returns the incoming payload until implemented).
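The endpoints above can be smoke-tested directly from Python. This is a minimal sketch, assuming the default `MCP_BASE_URL` of `http://127.0.0.1:8188` and using HTTPX (already a dependency of the web interface); the commented-out tool call is illustrative only, since tool names and payloads depend on what the server registers:

```python
import httpx

BASE = "http://127.0.0.1:8188"  # default MCP_BASE_URL; adjust if MCP_HOST/MCP_PORT differ

with httpx.Client(base_url=BASE, timeout=10.0) as client:
    # Heartbeat and protocol metadata.
    print(client.get("/health").json())
    print(client.get("/mcp").json())

    # Enumerate registered tools, datasets, and models.
    print("tools:", client.get("/mcp/tools").json())
    print("datasets:", client.get("/api/datasets").json())
    print("models:", client.get("/api/models").json())

    # Executing a tool: POST /mcp/tools/{tool}. Tool names and payload shapes
    # depend on what mcp_core_server.py registers, so this call is illustrative.
    # print(client.post("/mcp/tools/<tool_name>", json={"query": "bobcat at night"}).json())
```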
In the second terminal, start the web interface:

```bash
python web_interface.py
```

Default UI endpoints:

- `GET /` – dataset-aware search form with classic + AI search tabs.
- `POST /search` – synchronous tool-based search.
- `POST /llm_search` – LLM-assisted search ranked by confidence.
- `GET /api/search`, `/api/datasets`, `/api/models`, `/api/inference` – proxied to the MCP server.
- Debug helpers: `/debug/search`, `/debug/files`, `/test/images`.
If the LLM service is unavailable, the interface still functions with rule-based filtering (logs will show a fallback message).
Each `*_mcp_data.json` file should contain an `images` array. Example entry:

```json
{
"id": "bobcat_008",
"collection": "bobcat",
"category": "wildlife",
"metadata": {
"species": "lynx rufus",
"time": "night",
"season": "winter",
"action": "hunting",
"scene": "forest edge",
"weather": "snow",
"description": "Bobcat stalking prey near a snowy log."
}
}
```

The dataset registry derives the available filters (times, seasons, actions, etc.) and caches the image list in memory for the search tools.
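Deriving filters amounts to collecting the distinct values seen under each metadata key across all entries. The function below is an illustrative sketch of that idea, not the actual implementation in `dataset_registry.py`:

```python
import json
from collections import defaultdict
from pathlib import Path


def derive_filters(dataset_path: Path) -> dict[str, list[str]]:
    """Collect the distinct values of each string-valued metadata field across all entries."""
    data = json.loads(dataset_path.read_text())
    values: dict[str, set[str]] = defaultdict(set)
    for entry in data.get("images", []):
        for key, value in entry.get("metadata", {}).items():
            if isinstance(value, str) and key != "description":
                values[key].add(value)
    return {key: sorted(vals) for key, vals in values.items()}


if __name__ == "__main__":
    # e.g. {'action': ['hunting'], 'season': ['winter'], 'species': ['lynx rufus'], ...}
    print(derive_filters(Path("/opt/mcp-data-server/bobcat_mcp_data.json")))
```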
The system supports extensible dataset types beyond species observation datasets:
- Simple Custom Datasets: drop a `<name>_mcp_data.json` with `dataset_type: "custom"` and a schema definition. The generic adapter will handle it automatically.
- Custom Dataset Adapters: for complex datasets, create a custom adapter class implementing the `DatasetAdapter` interface (a hedged adapter sketch follows the quick example below).

See `ADDING_DATASET_TYPES.md` for a complete guide with examples.
Quick Example - Custom dataset with schema:

```json
{
"dataset_type": "custom",
"description": "Weather observations",
"schema": {
"filter_fields": ["station", "condition", "temperature_range"]
},
"images": [
{
"id": "weather_001",
"collection": "station_a",
"station": "station_a",
"condition": "sunny",
"metadata": {"timestamp": "2024-01-01T12:00:00Z"}
}
]
}
```
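For the custom-adapter route above, the real `DatasetAdapter` interface is defined in `dataset_adapter.py` and documented in `ADDING_DATASET_TYPES.md`. The sketch below only illustrates the general shape; the method names (`matches`, `get_filter_fields`, `iter_entries`) are assumptions, not the project's actual API:

```python
# Hedged sketch only: the real interface lives in dataset_adapter.py and is
# documented in ADDING_DATASET_TYPES.md; the method names below are assumed.
from dataset_adapter import DatasetAdapter  # assumed import path


class WeatherStationAdapter(DatasetAdapter):
    """Hypothetical adapter for the weather-observation dataset shown above."""

    def matches(self, dataset: dict) -> bool:
        # Claim datasets whose schema declares a "station" filter field.
        schema = dataset.get("schema", {})
        return dataset.get("dataset_type") == "custom" and "station" in schema.get("filter_fields", [])

    def get_filter_fields(self, dataset: dict) -> list[str]:
        # Expose the schema-declared filter fields to the dataset registry.
        return dataset.get("schema", {}).get("filter_fields", [])

    def iter_entries(self, dataset: dict):
        # Normalize each raw record into the metadata-centric shape the registry expects.
        for entry in dataset.get("images", []):
            yield {
                "id": entry["id"],
                "collection": entry.get("collection"),
                "metadata": {
                    **entry.get("metadata", {}),
                    "station": entry.get("station"),
                    "condition": entry.get("condition"),
                },
            }
```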
Extending the stack:

- Models: call `ModelRegistry.register_model` with a handler that returns prediction payloads. Modify `_register_default_models` or create a plugin.
- Tools: use `ToolRegistry.register_tool` inside `mcp_core_server.py` or from an external module; handlers can be async or sync (see the sketch after this list).
- Web templates: override the generated HTML in `/opt/mcp-data-server/templates`.
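A registration sketch tying the Models and Tools bullets together is shown below. The exact signatures of `ModelRegistry.register_model` and `ToolRegistry.register_tool` live in `model_registry.py` and `tool_registry.py`; the keyword arguments used here are assumptions made for the sake of a concrete example:

```python
# Hedged sketch: the argument names are assumed. Check model_registry.py and
# tool_registry.py for the real signatures before dropping this into
# mcp_core_server.py or a plugin, where the registries already exist.
from model_registry import ModelRegistry  # assumed import paths
from tool_registry import ToolRegistry


def classify_image(image_path: str) -> dict:
    """Toy synchronous handler returning a prediction payload."""
    return {"label": "bobcat", "confidence": 0.91, "image": image_path}


async def search_night_images(query: str) -> list[dict]:
    """Toy async tool handler; the tool registry wraps results in MCP envelopes."""
    return [{"id": "bobcat_008", "score": 0.87, "query": query}]


models = ModelRegistry()
models.register_model(
    name="toy-classifier",            # assumed parameter names
    handler=classify_image,
    description="Example classifier returning a prediction payload",
)

tools = ToolRegistry()
tools.register_tool(
    name="search_night_images",
    handler=search_night_images,      # async handlers are supported alongside sync ones
    description="Example search tool",
)
```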
Development tips:

- Use `uvicorn mcp_core_server:mcp_server.app --reload` for hot reloads of the MCP server.
- Similarly, `uvicorn web_interface:web_interface.app --reload` for the web interface.
- Configure `.env` files or `.envaifarms` (project-specific naming) for persistent environment variables.
- Keep an eye on the logs; many helpers print informative messages about dataset discovery and LLM resolution.
Apache License 2.0 – see LICENSE for details.