RAGDiff

A flexible framework for comparing Retrieval-Augmented Generation (RAG) systems side-by-side, with support for subjective quality evaluation using LLMs.

Features

  • Multi-tool Support: Compare multiple RAG tools in parallel
  • Flexible Adapters: Easy-to-extend adapter pattern for adding new tools
  • Multiple Output Formats: Display, JSON, Markdown, and summary formats
  • Performance Metrics: Automatic latency measurement and result statistics
  • LLM Evaluation: Support for subjective quality assessment using Claude Opus 4.1
  • Rich CLI: Beautiful terminal output with tables and panels
  • Comprehensive Testing: 90+ tests ensuring reliability

Installation

Prerequisites

  • Python 3.9+
  • uv - Fast Python package installer and resolver

To install uv:

# On macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with Homebrew
brew install uv

# Or with pip
pip install uv

Setup

# Clone the repository
git clone https://github.com/ansari-project/ragdiff.git
cd ragdiff

# Install dependencies with uv
uv sync --all-extras  # Install all dependencies including dev tools

# Or install only core dependencies
uv sync

# Or install with goodmem support
uv sync --extra goodmem

# Copy environment template
cp .env.example .env
# Edit .env and add your API keys
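
A minimal .env might look like the following (values are placeholders; see Environment Variables below for which keys are required):

VECTARA_API_KEY=your-vectara-api-key
VECTARA_CORPUS_ID=your-corpus-id
# Optional; a mock adapter is used if GOODMEM_API_KEY is not set
GOODMEM_API_KEY=your-goodmem-api-key
# Optional; needed for LLM evaluation
ANTHROPIC_API_KEY=your-anthropic-api-key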

Configuration

Create a configs/tools.yaml file:

tools:
  mawsuah:
    api_key_env: VECTARA_API_KEY
    corpus_id: ${VECTARA_CORPUS_ID}
    base_url: https://api.vectara.io
    timeout: 30

  goodmem:
    api_key_env: GOODMEM_API_KEY
    base_url: https://api.goodmem.ai
    timeout: 30

llm:
  model: claude-opus-4-1-20250805
  api_key_env: ANTHROPIC_API_KEY

Usage

Basic Comparison

# Compare all configured tools
uv run python -m src.cli compare "What is Islamic inheritance law?"

# Compare specific tools
uv run python -m src.cli compare "Your query" --tool mawsuah --tool goodmem

# Adjust number of results
uv run python -m src.cli compare "Your query" --top-k 10

Output Formats

# Default display format (side-by-side)
uv run python -m src.cli compare "Your query"

# JSON output
uv run python -m src.cli compare "Your query" --format json

# Markdown output
uv run python -m src.cli compare "Your query" --format markdown

# Summary output
uv run python -m src.cli compare "Your query" --format summary

# Save to file
uv run python -m src.cli compare "Your query" --output results.json --format json

Batch Comparison with LLM Evaluation

Run multiple queries and get a comprehensive analysis:

# Basic batch comparison
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --top-k 10 \
  --format json

# With LLM evaluation (generates holistic summary)
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --evaluate \
  --top-k 10 \
  --format json

# Custom output directory
uv run python -m src.cli batch inputs/tafsir-test-queries.txt \
  --config configs/tafsir.yaml \
  --evaluate \
  --output-dir my-results \
  --format jsonl
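
The query file is assumed to be plain text with one query per line; for example, inputs/tafsir-test-queries.txt might contain:

What is the significance of Surah Al-Fatiha?
How do classical commentators interpret Ayat al-Kursi?
What does the Quran say about patience?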

The batch command with --evaluate generates:

  • Individual query results in JSON/JSONL/CSV format
  • Latency statistics (P50, P95, P99)
  • LLM evaluation summary showing wins and quality scores
  • Holistic summary (markdown file) with:
    • Query-by-query breakdown with winners and scores
    • Common themes: win distribution, recurring issues
    • Key differentiators: what makes the winner stronger and where the loser falls short
    • Overall verdict with production recommendation

Convert holistic summary to PDF:

# Generate PDF from markdown summary
python md2pdf.py outputs/holistic_summary_TIMESTAMP.md

Other Commands

# List available tools
uv run python -m src.cli list-tools

# Validate configuration
uv run python -m src.cli validate-config

# Run quick test
uv run python -m src.cli quick-test

# Get help
uv run python -m src.cli --help
uv run python -m src.cli compare --help
uv run python -m src.cli batch --help

Project Structure

ragdiff/
├── src/
│   ├── core/           # Core models and configuration
│   │   ├── models.py    # Data models (RagResult, ComparisonResult, etc.)
│   │   └── config.py    # Configuration management
│   ├── adapters/        # Tool adapters
│   │   ├── base.py      # Base adapter implementing SearchVectara interface
│   │   ├── mawsuah.py   # Vectara/Mawsuah adapter
│   │   ├── goodmem.py   # Goodmem adapter with mock fallback
│   │   └── factory.py   # Adapter factory
│   ├── comparison/      # Comparison engine
│   │   └── engine.py    # Parallel/sequential search execution
│   ├── display/         # Display formatters
│   │   └── formatter.py # Multiple output format support
│   └── cli.py          # Typer CLI implementation
├── tests/              # Comprehensive test suite
├── configs/            # Configuration files
└── requirements.txt    # Python dependencies

Architecture

The tool follows the SPIDER protocol for systematic development:

  1. Specification: Clear goals for subjective RAG comparison
  2. Planning: Phased implementation approach
  3. Implementation: Clean architecture with separation of concerns
  4. Defense: Comprehensive test coverage (90+ tests)
  5. Evaluation: Expert review and validation
  6. Commit: Version control with clear history

Key Components

  • BaseRagTool: Abstract base implementing SearchVectara interface
  • Adapters: Tool-specific implementations (Mawsuah, Goodmem)
  • ComparisonEngine: Orchestrates parallel/sequential searches
  • ComparisonFormatter: Handles multiple output formats
  • Config: Manages YAML configuration with environment variables
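
A minimal sketch of how these components are assumed to fit together programmatically; the constructor and method names below are illustrative rather than the project's actual API:

# Illustrative only: assumes these entry points exist with these names
from src.core.config import Config
from src.adapters.factory import create_adapter        # hypothetical factory helper
from src.comparison.engine import ComparisonEngine

config = Config.from_yaml("configs/tools.yaml")         # hypothetical loader
adapters = [create_adapter(name, config) for name in ("mawsuah", "goodmem")]
engine = ComparisonEngine(adapters)
result = engine.compare("What is Islamic inheritance law?", top_k=5)  # assumed to return a ComparisonResult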

Adding New RAG Tools

  1. Create a new adapter in src/adapters/:
from typing import List

from .base import BaseRagTool
from ..core.models import RagResult

class MyToolAdapter(BaseRagTool):
    def search(self, query: str, top_k: int = 5) -> List[RagResult]:
        # Implement the tool-specific search and map raw hits to RagResult objects
        results = self.client.search(query, limit=top_k)
        return [self._convert_to_rag_result(r) for r in results]
  2. Register it in src/adapters/factory.py:
ADAPTER_REGISTRY["mytool"] = MyToolAdapter
  3. Add its configuration in configs/tools.yaml:
tools:
  mytool:
    api_key_env: MYTOOL_API_KEY
    base_url: https://api.mytool.com
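
Once registered and configured, the new tool can be selected by name with the existing CLI:

# Compare the new tool against an existing one
uv run python -m src.cli compare "Your query" --tool mytool --tool mawsuah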

Development

Running Tests

# Run all tests
uv run pytest tests/

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest tests/ --cov=src
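
A hedged sketch of what a unit test for a new adapter might look like; MyToolAdapter comes from the "Adding New RAG Tools" example above, and the mocked client and assertions are illustrative rather than the project's actual test patterns:

from unittest.mock import MagicMock

from src.adapters.mytool import MyToolAdapter  # hypothetical adapter from the example above

def test_mytool_adapter_maps_client_hits():
    adapter = MyToolAdapter.__new__(MyToolAdapter)  # bypass __init__ so no real config is needed
    adapter.client = MagicMock()
    adapter.client.search.return_value = [{"text": "sample passage"}]
    adapter._convert_to_rag_result = lambda raw: raw  # stub out the conversion step

    results = adapter.search("test query", top_k=1)

    adapter.client.search.assert_called_once_with("test query", limit=1)
    assert results == [{"text": "sample passage"}]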

Code Style

The project uses:

  • Black for formatting
  • Ruff for linting
  • MyPy for type checking

# Format code with Black
uv run black src/ tests/

# Check linting with Ruff
uv run ruff check src/ tests/

# Type checking with MyPy
uv run mypy src/

Environment Variables

Required environment variables:

  • VECTARA_API_KEY: For Mawsuah/Vectara access
  • VECTARA_CORPUS_ID: Vectara corpus ID
  • GOODMEM_API_KEY: For Goodmem access (optional, uses mock if not set)
  • ANTHROPIC_API_KEY: For LLM evaluation (optional)

License

[Your License]

Contributing

Contributions welcome! Please follow the existing code style and add tests for new features.

Acknowledgments

Built following the SPIDER protocol for systematic development.
