YourBench is an open-source framework for generating domain-specific benchmarks in a zero-shot manner, inspired by modern software testing practices. It aims to keep your large language models on their toes—even as new data sources, domains, and knowledge demands evolve.
Highlights:
- Dynamic Benchmark Generation: Produce diverse, up-to-date questions from real-world source documents (PDF, Word, HTML, even multimedia).
- Scalable & Structured: Seamlessly handles ingestion, summarization, and multi-hop chunking for large or specialized datasets.
- Zero-Shot Focus: Emulates real-world usage scenarios by creating fresh tasks that guard against memorized knowledge.
- Extensible: Out-of-the-box pipeline stages (ingestion, summarization, question generation), plus an easy plugin mechanism to accommodate custom models or domain constraints.
```bash
# 1. Clone the repo
git clone https://github.com/huggingface/yourbench.git
cd yourbench

# 2. Install the dependencies with uv
uv venv
source .venv/bin/activate
uv sync

# 3. Get a key from https://openrouter.ai/ and add it to the .env file (or make your own config with a different model!)
touch .env
echo "OPENROUTER_API_KEY=<your_openrouter_api_key>" >> .env

# 4. Run the pipeline with an example config
yourbench --config configs/example.yaml
```
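The YAML config drives the whole run. As a rough illustration of what it might contain, here is a minimal sketch; every key and stage name below is an assumption inferred from the pipeline stages described above, and `configs/example.yaml` in the repository is the authoritative reference.

```yaml
# Illustrative sketch only -- see configs/example.yaml for the actual schema.
hf_configuration:
  hf_dataset_name: yourbench_example        # assumed key: name of the generated dataset

model_list:
  - model_name: qwen/qwen2.5-72b-instruct   # assumed: any model reachable via OpenRouter
    provider: openrouter                    # uses OPENROUTER_API_KEY from .env

pipeline:                                   # assumed: stages run in the order listed
  ingestion:
    source_documents_dir: data/raw          # raw PDFs, Word, HTML, etc.
  summarization: {}
  chunking: {}                              # produces single- and multi-hop chunks
  single_shot_question_generation: {}
  multi_hop_question_generation: {}
```

Under these assumptions, swapping the `model_list` entry (or adding several) is how you would point the pipeline at a different provider or a custom model, in line with the extensibility note above.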
You can also launch a minimal Gradio UI by adding `--gui` (for example, `yourbench --config configs/example.yaml --gui`), which lets you interactively explore the pipeline stages.
Note: The above instructions are a work in progress; more comprehensive usage info will be provided soon.
```
.
├── yourbench/          # Core framework modules
│   ├── pipeline/       # Stages: ingestion, summarization, question generation, etc.
│   ├── utils/          # Utility modules (loading, prompts, inference, etc.)
│   └── main.py         # CLI and UI entry point
├── configs/            # Example YAML configs
├── docs/               # Documentation and assets
├── data/               # Example data (raw, ingested, HF dataset)
├── pyproject.toml      # Project metadata
└── README.md           # This file
```