A Retrieval-Augmented Generation (RAG) system that combines document processing, vector storage, and large language models to provide accurate, context-aware responses to user queries.
This application processes various document types (PDFs, websites, text files) and creates a searchable knowledge base using vector embeddings. Users can then ask questions and receive answers backed by relevant source material from the processed documents.
- Multi-provider LLM support: Google Gemini, OpenAI GPT models, and local Ollama models
- Document processing: Automatic handling of PDFs, websites, and text files
- Vector storage: Chroma database for efficient similarity search
- REST API: Complete HTTP interface for integration with external applications
- Background processing: Non-blocking document ingestion
- Source attribution: All answers include references to original documents
The system consists of several core components (a minimal sketch of the retrieval flow follows the list):
- Document Processor: Extracts and chunks text from various sources
- Vector Database: Stores document embeddings for fast retrieval
- RAG Workflow: Orchestrates retrieval and generation processes
- API Server: Provides HTTP endpoints for system interaction
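As a rough illustration of how the Vector Database side works (this is not the project's internal code), here is a minimal, self-contained Chroma sketch: chunks are stored, then a query is matched against them by embedding similarity. All names are illustrative.

```python
import chromadb

# In-memory Chroma client; the application manages a persistent
# collection behind its Vector Database component.
client = chromadb.Client()
collection = client.get_or_create_collection("demo_docs")

# Store chunks; Chroma embeds them with its default model
# (all-MiniLM-L6-v2, the same model the database info endpoint reports).
collection.add(
    ids=["chunk-1", "chunk-2"],
    documents=[
        "PyTorch is an open-source machine learning framework.",
        "Chroma stores embeddings for fast similarity search.",
    ],
)

# Retrieve the best-matching chunk for a question; in the full RAG
# workflow this text would be passed to the LLM as context.
results = collection.query(query_texts=["What is PyTorch?"], n_results=1)
print(results["documents"][0][0])
```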
- Python 3.12+
- Optional: Ollama server for local model support
- Clone the repository:

```bash
git clone <repository-url>
cd rag
```

- Install dependencies (use uv for faster installs):

```bash
pip install -e .
```

- Set up environment variables:

```bash
export GOOGLE_API_KEY="your-gemini-api-key"
# and/or
export OPENAI_API_KEY="your-openai-api-key"
```

Note: Ollama models require no API key but need the Ollama server running locally.
```python
from rag.main import RAGApplication
from rag.conf import create_gemini_config

# Initialize with Gemini; collection_name names the Chroma collection
# where embeddings are stored and retrieved.
config = create_gemini_config(collection_name="my_docs")
app = RAGApplication(config=config)

# Process documents
app.document_processor.process_sources(
    pdf_files=['document.pdf'],
    text_files=['notes.txt'],
    websites=['https://pytorch.org/docs/stable/'],
)

# Ask questions
result = app.query("What are the main topics covered?")
print(f"Answer: {result.answer}")
for source in result.sources:
    print(f"Source: {source.title} - {source.origin}")
```

Start the REST API server:

```bash
python start_api.py
```

The server will be available at http://localhost:8000, with interactive documentation at http://localhost:8000/docs.
Returns system status and configuration information.
Response:
```json
{
  "status": "healthy",
  "rag_initialized": true,
  "provider": "gemini",
  "model": "gemini-2.5-flash"
}
```

Submit questions to the RAG system.
Request:
```json
{
  "question": "What is PyTorch?",
  "provider": "gemini",
  "max_sources": 3
}
```

Response:
```json
{
  "answer": "PyTorch is an open-source machine learning framework...",
  "sources": [
    {
      "title": "PyTorch Overview",
      "origin": "https://pytorch.org/docs/",
      "snippet": "PyTorch is a Python package that provides..."
    }
  ],
  "provider": "gemini"
}
```

Update the LLM provider and model without restarting the server.
Request:
```json
{
  "provider": "openai",
  "model": "gpt-4o-mini"
}
```

List all available LLM providers and their configurations.
Response:
```json
{
  "providers": {
    "gemini": {
      "models": ["gemini-2.5-flash", "gemini-1.5-pro"],
      "requirements": "GOOGLE_API_KEY environment variable"
    },
    "openai": {
      "models": ["gpt-4o-mini", "gpt-3.5-turbo"],
      "requirements": "OPENAI_API_KEY environment variable"
    },
    "ollama": {
      "models": ["llama3.2", "mixtral", "codellama"],
      "requirements": "Ollama server running locally"
    }
  }
}
```

Process and ingest documents into the knowledge base.
Request:
```json
{
  "pdf_files": ["document.pdf", "manual.pdf"],
  "text_files": ["notes.txt"],
  "websites": ["pytorch", "pandas"]
}
```

Process a specific website URL.
Request:
```json
{
  "url": "https://docs.python.org/3/",
  "max_pages": 100,
  "include_patterns": ["*.html"],
  "exclude_patterns": ["*download*"]
}
```
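The `include_patterns` and `exclude_patterns` fields look like glob-style filters. As a hypothetical illustration of how such filters commonly behave (using Python's standard `fnmatch`; the crawler's actual matching rules may differ):

```python
from fnmatch import fnmatch

include = ["*.html"]
exclude = ["*download*"]

def should_crawl(url: str) -> bool:
    # Exclusions win; otherwise a URL must match an include pattern.
    if any(fnmatch(url, pat) for pat in exclude):
        return False
    return any(fnmatch(url, pat) for pat in include)

print(should_crawl("https://docs.python.org/3/tutorial/index.html"))  # True
print(should_crawl("https://docs.python.org/3/download.html"))        # False
```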
Get information about the vector database.

Response:
```json
{
  "total_documents": 1250,
  "collections": ["api_rag_collection"],
  "embedding_model": "all-MiniLM-L6-v2"
}
```

Ask a question:
```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "How do I initialize a neural network?",
    "provider": "gemini",
    "max_sources": 2
  }'
```

Switch to OpenAI:
```bash
curl -X POST "http://localhost:8000/config/update" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o-mini"
  }'
```

Process a website:
```bash
curl -X POST "http://localhost:8000/website/process" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://scikit-learn.org/stable/",
    "max_pages": 50
  }'
```

```python
import requests

# Ask a question
response = requests.post(
    "http://localhost:8000/query",
    json={
        "question": "What are the best practices for model training?",
        "provider": "gemini",
        "max_sources": 3,
    },
)
result = response.json()
print(f"Answer: {result['answer']}")
```
**Gemini**

- Requires `GOOGLE_API_KEY` environment variable
- Supports structured output and fast responses
- Models: `gemini-2.5-flash`, `gemini-1.5-pro`
**OpenAI**

- Requires `OPENAI_API_KEY` environment variable
- High-quality responses with excellent reasoning
- Models: `gpt-4o-mini`, `gpt-3.5-turbo`, `gpt-4`
**Ollama**

- No API key required
- Complete privacy and offline operation
- Requires Ollama server: `ollama serve`
- Models: `llama3.2`, `mixtral`, `codellama`
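Providers can also be switched at runtime from Python by calling the `/config/update` endpoint shown earlier, for example to move a running server onto a local Ollama model:

```python
import requests

# Switch the live server to Ollama's llama3.2 without a restart.
response = requests.post(
    "http://localhost:8000/config/update",
    json={"provider": "ollama", "model": "llama3.2"},
)
print(response.json())
```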
Run the test suite:

```bash
pytest tests/
```

The project uses ruff for linting and formatting:

```bash
ruff check .
ruff format .
```

Extend the DocumentProcessor class to support additional file types or data sources.
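A hypothetical sketch of such an extension. The import path and the `ingest_text` hook are assumptions about the class's internals (check the actual module for the real names); only `DocumentProcessor` itself is confirmed by this README:

```python
from rag.document_processor import DocumentProcessor  # import path assumed


class MarkdownProcessor(DocumentProcessor):
    """Hypothetical extension adding Markdown file support."""

    def process_markdown_files(self, paths: list[str]) -> None:
        for path in paths:
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
            # Hand the raw text to whatever chunk-and-ingest hook the
            # base class exposes; `ingest_text` is illustrative only.
            self.ingest_text(text, origin=path)
```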
The FastAPI application can be deployed using standard Python web server setups:

- Local development: `uvicorn src.rag.api:app --reload`
- Production: use gunicorn, Docker, or cloud platform services (a minimal programmatic entry point is sketched below)
- Environment variables: ensure API keys are properly configured
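If you prefer a programmatic entry point (for example, inside a container image), uvicorn can also be launched from Python; this mirrors the development command above, minus auto-reload:

```python
import uvicorn

if __name__ == "__main__":
    # Same app path as the development command; binding to 0.0.0.0
    # makes the server reachable from outside a container.
    uvicorn.run("src.rag.api:app", host="0.0.0.0", port=8000)
```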
MIT License - see LICENSE file for details.