RAG system with REST API endpoints for chatting with common data science docs (PyTorch, scikit-learn, ...) built with LangChain.

RAG Application

A Retrieval-Augmented Generation (RAG) system that combines document processing, vector storage, and large language models to provide accurate, context-aware responses to user queries.

Overview

This application processes various document types (PDFs, websites, text files) and creates a searchable knowledge base using vector embeddings. Users can then ask questions and receive answers backed by relevant source material from the processed documents.

Key Features

  • Multi-provider LLM support: Google Gemini, OpenAI GPT models, and local Ollama models
  • Document processing: Automatic handling of PDFs, websites, and text files
  • Vector storage: Chroma database for efficient similarity search
  • REST API: Complete HTTP interface for integration with external applications
  • Background processing: Non-blocking document ingestion
  • Source attribution: All answers include references to original documents

Architecture

The system consists of several core components:

  • Document Processor: Extracts and chunks text from various sources
  • Vector Database: Stores document embeddings for fast retrieval
  • RAG Workflow: Orchestrates retrieval and generation processes
  • API Server: Provides HTTP endpoints for system interaction
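To make the Document Processor's role concrete, here is a minimal text-chunking sketch. The function name, chunk size, and overlap values are illustrative assumptions, not the package's actual API:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# A 500-character document yields four overlapping chunks:
# starts at 0, 150, 300, 450 with lengths 200, 200, 200, 50.
chunks = chunk_text("x" * 500)
print(len(chunks))  # 4
```

Overlapping chunks help preserve context that would otherwise be cut at chunk boundaries, which improves retrieval quality.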

Installation

Prerequisites

  • Python 3.12+
  • Optional: Ollama server for local model support

Setup

  1. Clone the repository:
git clone <repository-url>
cd rag
  2. Install dependencies (use uv for faster installs):
pip install -e .
  3. Set up environment variables:
export GOOGLE_API_KEY="your-gemini-api-key"
# and/or
export OPENAI_API_KEY="your-openai-api-key"

Note: Ollama models require no API key but need the Ollama server running locally.

Quick Start

Python Usage

from rag.main import RAGApplication
from rag.conf import create_gemini_config

# Initialize with Gemini
config = create_gemini_config(collection_name="my_docs")  # Chroma collection where embeddings are stored and retrieved
app = RAGApplication(config=config)

# Process documents
app.document_processor.process_sources(
    pdf_files=['document.pdf'],
    text_files=['notes.txt'],
    websites=['https://pytorch.org/docs/stable/']
)

# Ask questions
result = app.query("What are the main topics covered?")
print(f"Answer: {result.answer}")
for source in result.sources:
    print(f"Source: {source.title} - {source.origin}")

API Server

Start the REST API server:

python start_api.py

The server will be available at http://localhost:8000 with interactive documentation at http://localhost:8000/docs.

REST API Endpoints

Core Operations

GET /health

Returns system status and configuration information.

Response:

{
  "status": "healthy",
  "rag_initialized": true,
  "provider": "gemini",
  "model": "gemini-2.5-flash"
}

POST /query

Submit questions to the RAG system.

Request:

{
  "question": "What is PyTorch?",
  "provider": "gemini",
  "max_sources": 3
}

Response:

{
  "answer": "PyTorch is an open-source machine learning framework...",
  "sources": [
    {
      "title": "PyTorch Overview",
      "origin": "https://pytorch.org/docs/",
      "snippet": "PyTorch is a Python package that provides..."
    }
  ],
  "provider": "gemini"
}

Configuration Management

POST /config/update

Update the LLM provider and model without restarting the server.

Request:

{
  "provider": "openai",
  "model": "gpt-4o-mini"
}

GET /providers

List all available LLM providers and their configurations.

Response:

{
  "providers": {
    "gemini": {
      "models": ["gemini-2.5-flash", "gemini-1.5-pro"],
      "requirements": "GOOGLE_API_KEY environment variable"
    },
    "openai": {
      "models": ["gpt-4o-mini", "gpt-3.5-turbo"],
      "requirements": "OPENAI_API_KEY environment variable"
    },
    "ollama": {
      "models": ["llama3.2", "mixtral", "codellama"],
      "requirements": "Ollama server running locally"
    }
  }
}
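A response like this can drive runtime provider selection. The sketch below picks the first provider whose required API key is set, falling back to the local Ollama option; the selection logic is an illustration, not part of the package:

```python
import os

# The /providers response body, as a Python dict
providers = {
    "gemini": {"models": ["gemini-2.5-flash", "gemini-1.5-pro"],
               "requirements": "GOOGLE_API_KEY environment variable"},
    "openai": {"models": ["gpt-4o-mini", "gpt-3.5-turbo"],
               "requirements": "OPENAI_API_KEY environment variable"},
    "ollama": {"models": ["llama3.2", "mixtral", "codellama"],
               "requirements": "Ollama server running locally"},
}

def pick_provider(providers: dict, env: dict) -> tuple[str, str]:
    """Return (provider, default model) for the first provider whose
    required API key is present in env; fall back to ollama."""
    for name, info in providers.items():
        req = info["requirements"]
        if req.endswith("environment variable"):
            key = req.split()[0]  # e.g. "GOOGLE_API_KEY"
            if env.get(key):
                return name, info["models"][0]
    return "ollama", providers["ollama"]["models"][0]

provider, model = pick_provider(providers, dict(os.environ))
```

This pairs naturally with POST /config/update to switch the server to whichever provider is actually usable in the current environment.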

Document Management

POST /documents/process

Process and ingest documents into the knowledge base.

Request:

{
  "pdf_files": ["document.pdf", "manual.pdf"],
  "text_files": ["notes.txt"],
  "websites": ["pytorch", "pandas"]
}

POST /website/process

Process a specific website URL.

Request:

{
  "url": "https://docs.python.org/3/",
  "max_pages": 100,
  "include_patterns": ["*.html"],
  "exclude_patterns": ["*download*"]
}

GET /database/status

Get information about the vector database.

Response:

{
  "total_documents": 1250,
  "collections": ["api_rag_collection"],
  "embedding_model": "all-MiniLM-L6-v2"
}

API Usage Examples

Using curl

Ask a question:

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How do I initialize a neural network?",
    "provider": "gemini",
    "max_sources": 2
  }'

Switch to OpenAI:

curl -X POST "http://localhost:8000/config/update" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o-mini"
  }'

Process a website:

curl -X POST "http://localhost:8000/website/process" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scikit-learn.org/stable/",
    "max_pages": 50
  }'

Using Python requests

import requests

# Ask a question
response = requests.post(
    "http://localhost:8000/query",
    json={
        "question": "What are the best practices for model training?",
        "provider": "gemini",
        "max_sources": 3
    }
)
result = response.json()
print(f"Answer: {result['answer']}")

LLM Provider Configuration

Google Gemini (Default)

  • Requires GOOGLE_API_KEY environment variable
  • Structured output support and fast response times
  • Models: gemini-2.5-flash, gemini-1.5-pro

OpenAI

  • Requires OPENAI_API_KEY environment variable
  • High-quality responses with excellent reasoning
  • Models: gpt-4o-mini, gpt-3.5-turbo, gpt-4

Ollama (Local)

  • No API key required
  • Complete privacy and offline operation
  • Requires Ollama server: ollama serve
  • Models: llama3.2, mixtral, codellama

Development

Running Tests

pytest tests/

Code Quality

The project uses ruff for linting and formatting:

ruff check .
ruff format .

Adding New Document Sources

Extend the DocumentProcessor class to support additional file types or data sources.

Deployment

The FastAPI application can be deployed using standard Python web server setups:

  • Local development: uvicorn src.rag.api:app --reload
  • Production: Use gunicorn, Docker, or cloud platform services
  • Environment variables: Ensure API keys are properly configured

License

MIT License - see LICENSE file for details.
