A command-line interface for interacting with the SWE-bench API. Use this tool to submit predictions, manage runs, and retrieve evaluation reports.
Read the full documentation here. For submission guidelines, see here.
Install the CLI from PyPI:

```bash
pip install sb-cli
```

Before using the CLI, you'll need to get an API key:
- Generate an API key:

```bash
sb-cli gen-api-key your.email@example.com
```

- Set your API key as an environment variable - and store it somewhere safe!
```bash
export SWEBENCH_API_KEY=your_api_key
# or add export SWEBENCH_API_KEY=your_api_key to your .*rc file
```

- You'll receive an email with a verification code. Verify your API key:
```bash
sb-cli verify-api-key YOUR_VERIFICATION_CODE
```

SWE-bench has different subsets and splits available:
- swe-bench-m: The SWE-bench Multimodal dataset
- swe-bench_verified: 500 verified problems from SWE-bench
- swe-bench_lite: A subset of the original SWE-bench for testing
- dev: Development/validation split
- test: Test split (currently only available for swe-bench_lite and swe-bench_verified)
You'll need to specify both a subset and split for most commands.
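For example, the subset and split are passed as positional arguments before any options; the following is an illustrative invocation using one of the valid combinations above:

```bash
# General shape: sb-cli <command> <subset> <split> [options]
sb-cli list-runs swe-bench_verified test
```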
Submit your model's predictions to SWE-bench:
```bash
sb-cli submit swe-bench-m test \
    --predictions_path predictions.json \
    --run_id my_run_id
```

Options (a combined example follows the list):
- --run_id: ID of the run to submit predictions for (optional, defaults to the name of the parent directory of the predictions file)
- --instance_ids: Comma-separated list of specific instance IDs to submit (optional)
- --output_dir: Directory to save report files (default: sb-cli-reports)
- --overwrite: Overwrite existing report (default: 0)
- --gen_report: Generate a report after evaluation is complete (default: 1)
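For illustration, here is a hypothetical submit invocation combining several of these options (the run ID and instance IDs are placeholders):

```bash
sb-cli submit swe-bench-m test \
    --predictions_path predictions.json \
    --run_id my_run_id \
    --instance_ids instance_id_1,instance_id_2 \
    --output_dir ./reports \
    --overwrite 1
```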
Retrieve evaluation results for a specific run:
```bash
sb-cli get-report swe-bench-m dev my_run_id -o ./reports
```

View all your existing run IDs for a specific subset and split:
```bash
sb-cli list-runs swe-bench-m dev
```

Your predictions file should be a JSON file in one of these formats:
```json
{
    "instance_id_1": {
        "model_patch": "...",
        "model_name_or_path": "..."
    },
    "instance_id_2": {
        "model_patch": "...",
        "model_name_or_path": "..."
    }
}
```

Or as a list:
```json
[
    {
        "instance_id": "instance_id_1",
        "model_patch": "...",
        "model_name_or_path": "..."
    },
    {
        "instance_id": "instance_id_2",
        "model_patch": "...",
        "model_name_or_path": "..."
    }
]
```
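As a quick sketch, here is one way to write a minimal predictions file in the first (mapping) format from the shell; the instance ID, patch, and model name are all placeholders:

```bash
# Write a one-entry predictions file (all values are placeholders)
cat > predictions.json <<'EOF'
{
    "instance_id_1": {
        "model_patch": "diff --git a/src/app.py b/src/app.py\n...",
        "model_name_or_path": "my-model"
    }
}
EOF
```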
To submit your system to the SWE-bench Multimodal leaderboard:
- Submit your predictions for the swe-bench-m test split using the CLI
- Fork the experiments repository
- Add your submission files under experiments/multimodal/YOUR_MODEL_NAME/
- Create a PR with your submission
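The steps above amount to a standard fork-and-PR workflow. A minimal sketch follows, assuming your fork lives at YOUR_USERNAME/experiments and that the directory layout matches the path above; both the URL and the layout are assumptions, so defer to the submission documentation:

```bash
# Hypothetical fork-and-PR sequence; check the submission docs for the
# authoritative repository URL and directory layout.
git clone https://github.com/YOUR_USERNAME/experiments.git
cd experiments
mkdir -p multimodal/YOUR_MODEL_NAME
cp /path/to/your/submission/* multimodal/YOUR_MODEL_NAME/
git checkout -b add-YOUR_MODEL_NAME
git add multimodal/YOUR_MODEL_NAME
git commit -m "Add YOUR_MODEL_NAME submission"
git push origin add-YOUR_MODEL_NAME
# then open a pull request from your fork
```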
See the detailed guide in our submission documentation.
Note: Check your test split quota using sb-cli quota swe-bench-m test before submitting.
