A Retrieval-Augmented Generation (RAG) system that combines document processing, vector storage, and large language models to provide accurate, context-aware responses to user queries.
This application processes various document types (PDFs, websites, text files) and creates a searchable knowledge base using vector embeddings. Users can then ask questions and receive answers backed by relevant source material from the processed documents.
- Multi-provider LLM support: Google Gemini, OpenAI GPT models, and local Ollama models
- Document processing: Automatic handling of PDFs, websites, and text files
- Vector storage: Chroma database for efficient similarity search
- REST API: Complete HTTP interface for integration with external applications
- Background processing: Non-blocking document ingestion
- Source attribution: All answers include references to original documents
The system consists of several core components (a minimal sketch of the retrieval flow follows the list):
- Document Processor: Extracts and chunks text from various sources
- Vector Database: Stores document embeddings for fast retrieval
- RAG Workflow: Orchestrates retrieval and generation processes
- API Server: Provides HTTP endpoints for system interaction
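As a rough illustration of how the Vector Database side works (this is not the project's internal code), here is a minimal, self-contained Chroma sketch: chunks are stored, then a query is matched against them by embedding similarity. All names are illustrative.

```python
import chromadb

# In-memory Chroma client; the application manages a persistent
# collection behind its Vector Database component.
client = chromadb.Client()
collection = client.get_or_create_collection("demo_docs")

# Store chunks; Chroma embeds them with its default model
# (all-MiniLM-L6-v2, the same model the database info endpoint reports).
collection.add(
    ids=["chunk-1", "chunk-2"],
    documents=[
        "PyTorch is an open-source machine learning framework.",
        "Chroma stores embeddings for fast similarity search.",
    ],
)

# Retrieve the best-matching chunk for a question; in the full RAG
# workflow this text would be passed to the LLM as context.
results = collection.query(query_texts=["What is PyTorch?"], n_results=1)
print(results["documents"][0][0])
```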
- Python 3.12+
- Optional: Ollama server for local model support
- Clone the repository:

```bash
git clone <repository-url>
cd rag
```

- Install dependencies (use uv for faster installs):

```bash
pip install -e .
```

- Set up environment variables:

```bash
export GOOGLE_API_KEY="your-gemini-api-key"
# and/or
export OPENAI_API_KEY="your-openai-api-key"
```

Note: Ollama models require no API key but need the Ollama server running locally.
```python
from rag.main import RAGApplication
from rag.conf import create_gemini_config

# Initialize with Gemini; collection_name names the Chroma collection
# where embeddings are stored and retrieved.
config = create_gemini_config(collection_name="my_docs")
app = RAGApplication(config=config)

# Process documents
app.document_processor.process_sources(
    pdf_files=['document.pdf'],
    text_files=['notes.txt'],
    websites=['https://pytorch.org/docs/stable/'],
)

# Ask questions
result = app.query("What are the main topics covered?")
print(f"Answer: {result.answer}")
for source in result.sources:
    print(f"Source: {source.title} - {source.origin}")
```

Start the REST API server:

```bash
python start_api.py
```

The server will be available at http://localhost:8000, with interactive documentation at http://localhost:8000/docs.
Returns system status and configuration information.
Response:
```json
{
  "status": "healthy",
  "rag_initialized": true,
  "provider": "gemini",
  "model": "gemini-2.5-flash"
}
```

Submit questions to the RAG system.
Request:
```json
{
  "question": "What is PyTorch?",
  "provider": "gemini",
  "max_sources": 3
}
```

Response:
```json
{
  "answer": "PyTorch is an open-source machine learning framework...",
  "sources": [
    {
      "title": "PyTorch Overview",
      "origin": "https://pytorch.org/docs/",
      "snippet": "PyTorch is a Python package that provides..."
    }
  ],
  "provider": "gemini"
}
```

Update the LLM provider and model without restarting the server.
Request:
```json
{
  "provider": "openai",
  "model": "gpt-4o-mini"
}
```

List all available LLM providers and their configurations.
Response:
```json
{
  "providers": {
    "gemini": {
      "models": ["gemini-2.5-flash", "gemini-1.5-pro"],
      "requirements": "GOOGLE_API_KEY environment variable"
    },
    "openai": {
      "models": ["gpt-4o-mini", "gpt-3.5-turbo"],
      "requirements": "OPENAI_API_KEY environment variable"
    },
    "ollama": {
      "models": ["llama3.2", "mixtral", "codellama"],
      "requirements": "Ollama server running locally"
    }
  }
}
```

Process and ingest documents into the knowledge base.
Request:
```json
{
  "pdf_files": ["document.pdf", "manual.pdf"],
  "text_files": ["notes.txt"],
  "websites": ["pytorch", "pandas"]
}
```

Process a specific website URL.
Request:
```json
{
  "url": "https://docs.python.org/3/",
  "max_pages": 100,
  "include_patterns": ["*.html"],
  "exclude_patterns": ["*download*"]
}
```
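The `include_patterns` and `exclude_patterns` fields look like glob-style filters. As a hypothetical illustration of how such filters commonly behave (using Python's standard `fnmatch`; the crawler's actual matching rules may differ):

```python
from fnmatch import fnmatch

include = ["*.html"]
exclude = ["*download*"]

def should_crawl(url: str) -> bool:
    # Exclusions win; otherwise a URL must match an include pattern.
    if any(fnmatch(url, pat) for pat in exclude):
        return False
    return any(fnmatch(url, pat) for pat in include)

print(should_crawl("https://docs.python.org/3/tutorial/index.html"))  # True
print(should_crawl("https://docs.python.org/3/download.html"))        # False
```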
Get information about the vector database.

Response:
```json
{
  "total_documents": 1250,
  "collections": ["api_rag_collection"],
  "embedding_model": "all-MiniLM-L6-v2"
}
```

Ask a question:
```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How do I initialize a neural network?",
    "provider": "gemini",
    "max_sources": 2
  }'
```

Switch to OpenAI:
```bash
curl -X POST "http://localhost:8000/config/update" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o-mini"
  }'
```

Process a website:
```bash
curl -X POST "http://localhost:8000/website/process" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scikit-learn.org/stable/",
    "max_pages": 50
  }'
```

```python
import requests

# Ask a question
response = requests.post(
    "http://localhost:8000/query",
    json={
        "question": "What are the best practices for model training?",
        "provider": "gemini",
        "max_sources": 3,
    },
)
result = response.json()
print(f"Answer: {result['answer']}")
```
**Gemini**

- Requires `GOOGLE_API_KEY` environment variable
- Supports structured output and fast responses
- Models: `gemini-2.5-flash`, `gemini-1.5-pro`
**OpenAI**

- Requires `OPENAI_API_KEY` environment variable
- High-quality responses with excellent reasoning
- Models: `gpt-4o-mini`, `gpt-3.5-turbo`, `gpt-4`
**Ollama**

- No API key required
- Complete privacy and offline operation
- Requires Ollama server: `ollama serve`
- Models: `llama3.2`, `mixtral`, `codellama`
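Providers can also be switched at runtime from Python by calling the `/config/update` endpoint shown earlier, for example to move a running server onto a local Ollama model:

```python
import requests

# Switch the live server to Ollama's llama3.2 without a restart.
response = requests.post(
    "http://localhost:8000/config/update",
    json={"provider": "ollama", "model": "llama3.2"},
)
print(response.json())
```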
Run the test suite:

```bash
pytest tests/
```

The project uses ruff for linting and formatting:

```bash
ruff check .
ruff format .
```

Extend the DocumentProcessor class to support additional file types or data sources.
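A hypothetical sketch of such an extension. The import path and the `ingest_text` hook are assumptions about the class's internals (check the actual module for the real names); only `DocumentProcessor` itself is confirmed by this README:

```python
from rag.document_processor import DocumentProcessor  # import path assumed


class MarkdownProcessor(DocumentProcessor):
    """Hypothetical extension adding Markdown file support."""

    def process_markdown_files(self, paths: list[str]) -> None:
        for path in paths:
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
            # Hand the raw text to whatever chunk-and-ingest hook the
            # base class exposes; `ingest_text` is illustrative only.
            self.ingest_text(text, origin=path)
```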
The FastAPI application can be deployed using standard Python web server setups:

- Local development: `uvicorn src.rag.api:app --reload`
- Production: use gunicorn, Docker, or cloud platform services (a minimal programmatic entry point is sketched below)
- Environment variables: ensure API keys are properly configured
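If you prefer a programmatic entry point (for example, inside a container image), uvicorn can also be launched from Python; this mirrors the development command above, minus auto-reload:

```python
import uvicorn

if __name__ == "__main__":
    # Same app path as the development command; binding to 0.0.0.0
    # makes the server reachable from outside a container.
    uvicorn.run("src.rag.api:app", host="0.0.0.0", port=8000)
```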
MIT License - see LICENSE file for details.