YourBench is an open-source framework for generating domain-specific benchmarks in a zero-shot manner, inspired by modern software testing practices. It aims to keep your large language models on their toes—even as new data sources, domains, and knowledge demands evolve.
Highlights:
- Dynamic Benchmark Generation: Produce diverse, up-to-date questions from real-world source documents (PDF, Word, HTML, even multimedia).
- Scalable & Structured: Seamlessly handles ingestion, summarization, and multi-hop chunking for large or specialized datasets.
- Zero-Shot Focus: Emulates real-world usage scenarios by creating fresh tasks that guard against memorized knowledge.
- Extensible: Out-of-the-box pipeline stages (ingestion, summarization, question generation), plus an easy plugin mechanism to accommodate custom models or domain constraints.
```bash
# 1. Clone the repo
git clone https://github.com/huggingface/yourbench.git
cd yourbench

# 2. Install the dependencies with uv
uv venv
source .venv/bin/activate
uv sync

# 3. Get a key from https://openrouter.ai/ and add it to the .env file (or make your own config with a different model!)
touch .env
echo "OPENROUTER_API_KEY=<your_openrouter_api_key>" >> .env

# 4. Run the pipeline with an example config
yourbench --config configs/example.yaml
```
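The YAML config drives the whole run. As a rough illustration of what it might contain, here is a minimal sketch; every key and stage name below is an assumption inferred from the pipeline stages described above, and `configs/example.yaml` in the repository is the authoritative reference.

```yaml
# Illustrative sketch only -- see configs/example.yaml for the actual schema.
hf_configuration:
  hf_dataset_name: yourbench_example        # assumed key: name of the generated dataset

model_list:
  - model_name: qwen/qwen2.5-72b-instruct   # assumed: any model reachable via OpenRouter
    provider: openrouter                    # uses OPENROUTER_API_KEY from .env

pipeline:                                   # assumed: stages run in the order listed
  ingestion:
    source_documents_dir: data/raw          # raw PDFs, Word, HTML, etc.
  summarization: {}
  chunking: {}                              # produces single- and multi-hop chunks
  single_shot_question_generation: {}
  multi_hop_question_generation: {}
```

Under these assumptions, swapping the `model_list` entry (or adding several) is how you would point the pipeline at a different provider or a custom model, in line with the extensibility note above.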
You can also launch a minimal Gradio UI by adding `--gui` (for example, `yourbench --config configs/example.yaml --gui`), which lets you interactively explore the pipeline stages.
Note: The above instructions are a work in progress; more comprehensive usage info will be provided soon.
```
.
├── yourbench/          # Core framework modules
│   ├── pipeline/       # Stages: ingestion, summarization, question generation, etc.
│   ├── utils/          # Utility modules (loading, prompts, inference, etc.)
│   └── main.py         # CLI and UI entry point
├── configs/            # Example YAML configs
├── docs/               # Documentation and assets
├── data/               # Example data (raw, ingested, HF dataset)
├── pyproject.toml      # Project metadata
└── README.md           # This file
```