A REST API for Retrieval-Augmented Generation (RAG) over Star Trek's MemoryAlpha database using Ollama and FastAPI.
This project provides a REST API that enables natural language queries over MemoryAlpha, the comprehensive Star Trek database. It uses the vectorized database from memoryalpha-vectordb and combines it with local LLMs served by Ollama to provide accurate, context-aware answers about Star Trek lore.
The system implements:
- Retrieval-Augmented Generation (RAG) for context-aware responses
- Streaming responses for real-time interaction
- Cross-encoder reranking for improved document relevance
- Conversation history for multi-turn dialogues
- Thinking modes (disabled/quiet/verbose) for different interaction styles
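These pieces form a single pipeline: retrieve candidate documents from the vector database, rerank them against the question with a cross-encoder, and pass the top matches to the LLM as grounding context. The sketch below illustrates that flow; it is not the project's actual code, and the cross-encoder model name and prompt format are placeholder assumptions:

```python
import requests
import chromadb
from sentence_transformers import CrossEncoder

def answer(question: str, top_k: int = 10) -> str:
    # 1. Retrieve candidate documents from the MemoryAlpha vector store
    client = chromadb.PersistentClient(path="/data/enmemoryalpha_db")
    docs = client.get_collection("memoryalpha").query(
        query_texts=[question], n_results=top_k
    )["documents"][0]

    # 2. Rerank candidates against the question with a cross-encoder
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # placeholder model
    scores = reranker.predict([(question, doc) for doc in docs])
    ranked = [doc for _, doc in sorted(zip(scores, docs), reverse=True)]

    # 3. Generate a grounded answer with Ollama, using the best matches as context
    context = "\n\n".join(ranked[:3])
    resp = requests.post("http://ollama:11434/api/generate", json={
        "model": "qwen3:0.5b",
        "prompt": f"Use this context to answer.\n\nContext:\n{context}\n\nQuestion: {question}",
        "stream": False,
    })
    return resp.json()["response"]
```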
Prerequisites:

- Docker and Docker Compose
- At least 8GB of available RAM for the models (no GPU needed)
- Clone and start the services:

  ```bash
  git clone https://github.com/aniongithub/memoryalpha-rag-api.git
  cd memoryalpha-rag-api
  docker-compose build
  docker-compose up
  ```

- Wait for initialization: the first startup will download the Ollama model and the ML models used for reranking. This may take several minutes.

- Start chatting:

  ```bash
  ./chat.sh
  ```
Example queries:

- Health Check: `GET /memoryalpha/health`
- Streaming Chat: `GET /memoryalpha/rag/stream`
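A quick way to confirm the service is up, using Python (assumes the default port mapping; the exact response body isn't documented here, so this only checks the status code):

```python
import requests

# A 200 status indicates the API has finished initializing
resp = requests.get("http://localhost:8000/memoryalpha/health")
print(resp.status_code)
```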
- Streaming API:

  ```bash
  curl -N -H "Accept: text/event-stream" \
    "http://localhost:8000/memoryalpha/rag/stream?question=What%20is%20the%20Enterprise?&thinkingmode=DISABLED&max_tokens=512&top_k=5"
  ```
- Synchronous API (returns a complete response, so no streaming flags are needed):

  ```bash
  curl "http://localhost:8000/memoryalpha/rag/ask?question=What%20is%20a%20Transporter?&thinkingmode=VERBOSE&max_tokens=512&top_k=5&top_p=0.8&temperature=0.3"
  ```
The system uses the following environment variables (set in `.env`):

```bash
# Ollama Configuration
OLLAMA_URL=http://ollama:11434
DEFAULT_MODEL=qwen3:0.5b

# Database Configuration
DB_PATH=/data/enmemoryalpha_db
COLLECTION_NAME=memoryalpha

# API Configuration
THINKING_MODE=DISABLED
MAX_TOKENS=2048
TOP_K=10
```
Query parameters:

- `question`: Your Star Trek question
- `thinkingmode`: `DISABLED`, `QUIET`, or `VERBOSE`
- `max_tokens`: Maximum response length (default: 2048)
- `top_k`: Number of documents to retrieve (default: 10)
- `top_p`: Sampling parameter (default: 0.8)
- `temperature`: Response creativity (default: 0.3)
This project includes a complete development environment using VS Code Dev Containers:
- Install prerequisites:
  - VS Code
  - Dev Containers extension
  - Docker Desktop

- Open in Dev Container:
  - Open the project in VS Code
  - Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on Mac)
  - Select "Dev Containers: Reopen in Container"
  - Wait for the container to build and start

- Development features:
  - Pre-configured Python environment with all dependencies
  - Jupyter notebook support for experimentation
  - Integrated terminal with access to all tools
  - Port forwarding for API testing
For more information on Dev Containers, see the VS Code Dev Containers Tutorial.
If you prefer local development without containers:
- Install Python 3.12+

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up Ollama locally:

  ```bash
  # Install Ollama (see https://ollama.ai)
  ollama pull qwen3:0.5b
  ```

- Download the MemoryAlpha database:

  ```bash
  wget https://github.com/aniongithub/memoryalpha-vectordb/releases/latest/download/enmemoryalpha_db.tar.gz
  tar -xzf enmemoryalpha_db.tar.gz
  ```
- Configure environment variables and start the API:

  ```bash
  export DB_PATH=./enmemoryalpha_db  # adjust the paths/URLs from .env for your local setup
  uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload
  ```
```mermaid
graph TD
    A[User Query] --> B[FastAPI + RAG Pipeline]
    B --> C[Document Retrieval]
    C --> D[ChromaDB Vector Database<br/>MemoryAlpha Data]
    B --> E[Cross-Encoder Reranking]
    B --> F[Ollama + LLM]
    F --> G[Streaming Response]

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style D fill:#e8f5e8
    style F fill:#fff3e0
    style G fill:#fce4ec
```
- FastAPI: REST API framework and OpenAPI spec generation
- ChromaDB: Vector database for document storage and retrieval
- Ollama: Local LLM inference server
- Cross-Encoder: Document reranking for improved relevance
- SentenceTransformers: Text embedding models
- Fork the repository
- Create a feature branch
- Make your changes in the Dev Container environment
- Test your changes with `./chat.sh`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- MemoryAlpha for the comprehensive Star Trek database
- Ollama for local LLM inference
- ChromaDB for vector database functionality