
feat: upgrade evaluation pipeline with W&B and update docs#2

Merged
Western-1 merged 1 commit into main from feat/wandb-integration
Jan 1, 2026

@Western-1
Owner

📝 CHANGELOG

All notable changes to the Talk to Your Docs RAG System.

[3.1.0] - 2026-01-01 - W&B Evaluation Pipeline

✨ New Features

  • Weights & Biases Integration: Added full support for experiment tracking.
    • New script evaluation/track_experiment.py.
    • Logs Ragas metrics (Faithfulness, Precision, Recall) to W&B cloud.
    • Logs detailed pandas DataFrames with Q&A pairs for analysis.
  • Auto-Ingestion (Cold Start): The evaluation pipeline now detects if Qdrant is empty.
    • Automatically generates a synthetic PDF (test_data_autogen.pdf) using ReportLab.
    • Ingests the data, then cleans up automatically (zero-setup testing).
  • Robust Evaluator Class: Introduced RAGWandbEvaluator class for cleaner, modular evaluation logic.
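
A minimal sketch of the logging step described above, not the actual contents of evaluation/track_experiment.py. The helper names `build_log_payload` and `track_results`, the `ragas/` key prefix, and the project name `rag-eval-demo` are hypothetical. The `wandb` import is deferred so the payload helper can be used without the package installed.

```python
def build_log_payload(scores, qa_rows):
    """Flatten metric scores into a dict that can be passed to run.log()."""
    payload = {f"ragas/{name}": float(value) for name, value in scores.items()}
    payload["num_samples"] = len(qa_rows)
    return payload


def track_results(scores, qa_rows, project="rag-eval-demo"):
    """Send scores and Q&A pairs to W&B (project name is a placeholder)."""
    import wandb  # deferred import: only needed when actually tracking

    run = wandb.init(project=project)
    # A table of Q&A pairs; wandb.Table also accepts dataframe=<pandas DataFrame>.
    table = wandb.Table(
        columns=["question", "answer"],
        data=[[row["question"], row["answer"]] for row in qa_rows],
    )
    run.log({**build_log_payload(scores, qa_rows), "qa_pairs": table})
    run.finish()
```

Calling `track_results({"faithfulness": 0.9}, rows)` would require a configured W&B login; `build_log_payload` alone has no external dependencies.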

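The cold-start flow can be sketched as follows, assuming the point count, PDF generation, and ingestion are passed in as callables. `ensure_test_data` and `make_pdf_with_reportlab` are hypothetical names, and the PDF text is a placeholder. In the real pipeline the count would presumably come from something like `QdrantClient.count(collection_name=...).count`.

```python
import os


def ensure_test_data(count_points, make_pdf, ingest, pdf_path="test_data_autogen.pdf"):
    """Ingest a synthetic PDF only when the vector store is empty.

    Returns True if ingestion ran, False if data was already present.
    """
    if count_points() > 0:
        return False
    make_pdf(pdf_path)
    try:
        ingest(pdf_path)
    finally:
        # Remove the generated file even if ingestion fails.
        if os.path.exists(pdf_path):
            os.remove(pdf_path)
    return True


def make_pdf_with_reportlab(path):
    """Write a one-page PDF with ReportLab (content is a placeholder)."""
    from reportlab.pdfgen import canvas  # deferred import

    pdf = canvas.Canvas(path)
    pdf.drawString(72, 720, "Synthetic test document for RAG evaluation.")
    pdf.save()
```

Passing the steps in as callables keeps the emptiness check independent of Qdrant and ReportLab, so it can be exercised with stubs.
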
🔧 Improvements / Performance

  • Groq Rate Limit Handling: Monkey-patched ChatGroq to throttle requests.
    • Intercepts invoke calls to enforce a 10s delay between requests.
    • Prevents 429 Too Many Requests errors on Free Tier (8k TPM limit).
    • Uses object.__setattr__ to bypass Pydantic validation on LangChain objects.
  • Clean CLI Output: Silenced noisy loggers (httpx, groq, httpcore, qdrant_client) during evaluation.
  • Increased Resilience: Updated RunConfig with max_retries=10 and timeout=600 (seconds) for long-running evaluations.
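
The throttling patch can be illustrated roughly as below. `StrictModel` is a stand-in that imitates a Pydantic-backed object rejecting unknown attributes; it is not ChatGroq. `throttle_invoke` is a hypothetical helper, and the injectable `sleep` and `clock` parameters are an assumption added to make the delay easy to verify.

```python
import time


class StrictModel:
    """Stand-in for a validated model: plain attribute assignment is rejected."""

    def __setattr__(self, name, value):
        raise ValueError(f'"{type(self).__name__}" object has no field "{name}"')

    def invoke(self, prompt):
        return f"echo: {prompt}"


def throttle_invoke(llm, min_interval_s=10.0, sleep=time.sleep, clock=time.monotonic):
    """Wrap llm.invoke so consecutive calls are at least min_interval_s apart."""
    original = llm.invoke  # bound method captured before patching
    last_call = [None]     # mutable cell holding the previous call time

    def throttled(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval_s - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return original(*args, **kwargs)

    # object.__setattr__ skips the class's own __setattr__ and its validation.
    object.__setattr__(llm, "invoke", throttled)
    return llm
```

With this in place, `llm.invoke = fn` still raises `ValueError`, while `throttle_invoke(llm)` installs the wrapper on that one instance only.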

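Silencing the noisy loggers typically takes only a few lines with the standard `logging` module. This is a sketch: the `silence_loggers` helper name and the choice of `WARNING` as the threshold are assumptions, since the changelog does not state which level was used.

```python
import logging

# Logger names taken from the changelog entry above.
NOISY_LOGGERS = ("httpx", "groq", "httpcore", "qdrant_client")


def silence_loggers(names=NOISY_LOGGERS, level=logging.WARNING):
    """Raise the threshold of each named logger so INFO/DEBUG output is dropped."""
    for name in names:
        logging.getLogger(name).setLevel(level)
```
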
🐛 Bug Fixes

  • Pydantic Validation Bypass: Fixed ValueError: "ChatGroq" object has no field "invoke" by using direct attribute setting.
  • LangChain Prompt Handling: Fixed the "Chain invocation failed" error in src/rag.py.
    • Added check: if isinstance(lc_prompt, str) to convert string prompts from Langfuse into ChatPromptTemplate.
  • Git Hygiene: Updated .gitignore to strictly exclude wandb/ local directories and artifacts.
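
The prompt-handling fix can be sketched as a small normalizer. `to_chat_prompt` is a hypothetical name, and the actual code in src/rag.py may differ. The sketch assumes `langchain_core` is installed whenever a string prompt is passed.

```python
def to_chat_prompt(lc_prompt):
    """Return a prompt template, converting a raw string if necessary."""
    if isinstance(lc_prompt, str):
        # Deferred import: only needed when a plain string comes back from Langfuse.
        from langchain_core.prompts import ChatPromptTemplate

        return ChatPromptTemplate.from_template(lc_prompt)
    # Anything else is assumed to already be a usable prompt object.
    return lc_prompt
```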

📚 Documentation

  • README.md:
    • Added Evaluation & Tracking section.
    • Added comparison tables for "Baseline" vs "Tracked" results.
    • Added W&B integration screenshot.
    • Added make track command documentation.
  • New Assets: Added docs/rag-eval-metrics-wandb.png.

📂 Files Changed

  • Added:
    • evaluation/track_experiment.py
    • images/rag-eval-metrics-wandb.png
  • Modified:
    • src/rag.py (Fixed prompt template type error)
    • .gitignore (Added wandb rules)
    • README.md (Added evaluation docs)

Upgrade Steps for Evaluation

  1. Install new dependencies:
    pip install reportlab wandb
  2. Run tracked experiment:
    make track

@Western-1 Western-1 merged commit 3b320e1 into main Jan 1, 2026
1 check passed
@Western-1 Western-1 deleted the feat/wandb-integration branch January 1, 2026 16:47