Create a branch: git checkout -b feature/your-benchmark
Implement your benchmark following the approved proposal
Test thoroughly with multiple agents and tiers
Create pull request linking to your original issue
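The branching step above can be sketched in a throwaway repository so it is runnable anywhere; the branch name is illustrative, following the feature/your-benchmark convention from Step 5:

```shell
set -e
# Illustrative only: work in a scratch repo instead of a real checkout
tmp=$(mktemp -d)
cd "$tmp"
git init -q
# Create the feature branch for your benchmark (name is an example)
git checkout -b feature/your-benchmark
git branch --show-current   # prints: feature/your-benchmark
```

In a real checkout you would then implement the suite, commit, push the branch, and open a PR that references your proposal issue.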
PR Requirements
Link to issue: Reference your benchmark proposal issue
Complete implementation: All files from your proposal
Repository fixture builds successfully
Tested with at least 2 different agents
Tested with multiple prompt tiers
Validation commands work correctly
Documentation is clear and complete
Example PR Title
[BENCHMARK] Add framework-migration suite with react-to-solid scenario
Step 6: Review and Iteration
Review Process
Initial review (1-2 business days)
Feedback on implementation, testing, documentation
Address feedback and make requested changes
Final review and approval
Merge into main branch
Common Areas for Improvement
Testing: Add more comprehensive test coverage
Documentation: Improve clarity and examples
Performance: Optimize benchmark execution time
Edge cases: Handle more error conditions
Code quality: Improve organization and readability
Best Practices
Benchmark Design
Realistic: Use scenarios developers actually face
Challenging: Test agent capabilities appropriately
Complete: Include all necessary files and configurations
Tested: Validate with multiple agents and tiers
Documented: Clear documentation and examples
Testing Strategy
# Test with echo agent first (fastest)
pnpm bench your-suite your-scenario L1 echo

# Test with anthropic agent
pnpm bench your-suite your-scenario L1 anthropic
# Test all tiers
pnpm bench your-suite your-scenario --batch echo

# Test specific combinations
pnpm bench your-suite your-scenario L0,L1,L2 anthropic
Testing Strategy (Interactive)
# run this and follow instructions
pnpm bench
(Screen recording demo: Screen.Recording.2025-10-28.115901.mp4)
Quality Standards
Repository fixture: Minimal but complete, realistic structure
Prompts: Clear progression from minimal to detailed
Validation: Commands that actually test the requirements
Oracle answers: Comprehensive responses to common questions
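The quality standards above come together in the scenario configuration. As a rough sketch only, the field names below are hypothetical and are not ze-benchmarks' actual schema — use the scenario.yaml template referenced in the proposal instructions for the real shape:

```yaml
# Hypothetical scenario.yaml sketch -- field names are illustrative,
# not the actual ze-benchmarks schema.
suite: your-suite
scenario: your-scenario
description: Migrate a small React app to Solid
tiers:
  L0: prompts/L0.md   # minimal prompt
  L1: prompts/L1.md   # moderate detail
  L2: prompts/L2.md   # detailed prompt
fixture: fixture/     # repository fixture the agent works in
validation:
  - pnpm install
  - pnpm test         # commands that actually test the requirements
```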
How to Create New Benchmarks
Welcome, and thank you for deciding to contribute to ze-benchmarks. This page walks you through proposing, implementing, and submitting a new benchmark.
Overview
The benchmark creation process follows these steps:
Step 1: Plan Your Benchmark
Before creating anything, familiarize yourself with our documentation:
Essential Reading
Key Concepts
Step 2: Create a Benchmark Proposal
Use our GitHub issue template to propose your benchmark:
How to Create a Proposal
What the Template Asks For
A validation command (e.g., pnpm bench <suite> <scenario> L1 echo)
A scenario.yaml template
Example Proposal Structure
Step 3: Get Feedback on Your Proposal
After submitting your proposal:
Common Feedback Areas
Step 4: Implement Your Benchmark
Once your proposal is approved:
File Structure
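One plausible layout for a new suite is sketched below; the directory names are hypothetical illustrations, not the repository's actual convention:

```
suites/
  your-suite/
    your-scenario/
      scenario.yaml    # scenario configuration
      fixture/         # repository fixture the agent works in
      prompts/         # prompt tiers (L0, L1, L2)
      oracle/          # oracle answers to common questions
```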
Implementation Checklist
Create scenario.yaml with your configuration
Test with multiple agents (echo, anthropic)
Step 5: Submit a Pull Request
Getting Help
Resources
Support Channels
Timeline Expectations
Recognition
Ready to Get Started?
Thank you for contributing to ze-benchmarks! Your benchmarks help make AI agent evaluation more comprehensive and useful for the entire community.
Questions? Feel free to ask in the comments below or create a new discussion!