YourBench: A Dynamic Benchmark Generation Framework

[GitHub] · [Dataset] · [Documentation]

YourBench is an open-source framework for generating domain-specific benchmarks in a zero-shot manner, inspired by modern software testing practices. It aims to keep the evaluation of your large language models current as data sources, domains, and knowledge demands change.

Highlights:

  • Dynamic Benchmark Generation: Produce diverse, up-to-date questions from real-world source documents (PDF, Word, HTML, even multimedia).
  • Scalable & Structured: Seamlessly handles ingestion, summarization, and multi-hop chunking for large or specialized datasets.
  • Zero-Shot Focus: Emulates real-world usage scenarios by creating fresh tasks that guard against memorized knowledge.
  • Extensible: Out-of-the-box pipeline stages (ingestion, summarization, question generation), plus an easy plugin mechanism to accommodate custom models or domain constraints.

Quick Start (Alpha)

# 1. Clone the repo
git clone https://github.com/huggingface/yourbench.git
cd yourbench

# 2. Use uv to install the dependencies
uv venv
source .venv/bin/activate
uv sync

# 3. Get a key from https://openrouter.ai/ and add it to the .env file (or make your own config with a different model!)
touch .env
echo "OPENROUTER_API_KEY=<your_openrouter_api_key>" >> .env

# 4. Run the pipeline with an example config
yourbench --config configs/example.yaml

You can also launch a minimal Gradio UI by adding --gui to the command. It lets you explore your pipeline stages interactively.

Note: The above instructions are a work-in-progress, and more comprehensive usage info will be provided soon.
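
Because step 3 writes a placeholder into .env, the pipeline may fail later with an authentication error if the key was never replaced. The following is a minimal sketch of a pre-flight check, not part of YourBench itself: `check_env_key` is a hypothetical helper name, and the only assumption taken from the steps above is that the key lives in .env under the name OPENROUTER_API_KEY.

```shell
# Sketch: fail early if .env lacks a usable OPENROUTER_API_KEY.
# check_env_key is a hypothetical helper, not a YourBench command.
check_env_key() {
    env_file="${1:-.env}"

    # The file itself must exist.
    if [ ! -f "$env_file" ]; then
        echo "missing $env_file" >&2
        return 1
    fi

    # Require a line of the form OPENROUTER_API_KEY=<non-empty value>.
    if ! grep -Eq '^OPENROUTER_API_KEY=.+' "$env_file"; then
        echo "OPENROUTER_API_KEY is not set in $env_file" >&2
        return 1
    fi

    # Reject the placeholder text from the Quick Start echo command.
    if grep -Fq 'OPENROUTER_API_KEY=<your_openrouter_api_key>' "$env_file"; then
        echo "OPENROUTER_API_KEY still holds the placeholder value" >&2
        return 1
    fi

    return 0
}
```

One way to use it is to chain it before step 4, for example `check_env_key .env && yourbench --config configs/example.yaml`, so the pipeline only starts when the key looks plausible. The check only inspects the file's text; it does not verify that the key is valid with OpenRouter.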


Repository Structure (Overview)

.
├── yourbench/                   # Core framework modules
│   ├── pipeline/                # Stages: ingestion, summarization, question generation, etc.
│   ├── utils/                   # Utility modules (loading, prompts, inference, etc.)
│   └── main.py                  # CLI and UI entry point
├── configs/                     # Example YAML configs
├── docs/                        # Documentation, assets
├── data/                        # Example data (raw, ingested, HF dataset)
├── pyproject.toml               # Project metadata
└── README.md                    # (This File)

About

Benchmark Large Language Models Reliably On Your Data
