SWE-bench

Organization for maintaining SWE-bench and related projects

127 followers
https://swebench.com/

README.md

📣 New: Meet mini, the 100-line AI agent that still scores 65% on SWE-bench Verified!

Software engineering agents, benchmarks, and models.

Built and maintained by researchers from Stanford University and Princeton University.

This organization contains the source code for several projects in the SWE-* open source ecosystem, including:

- SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
- SWE-agent, a system that automatically solves GitHub issues using an LM agent.
- SWE-smith, a toolkit for generating SWE training data at scale.
- mini, an AI agent written in just 100 lines of code that scores 65% on SWE-bench Verified.

Also check out the supporting infrastructure for working with SWE-* projects:

- SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
- sb-cli, a command line interface for running evaluations on the cloud.

Mirror clones for the SWE-bench and SWE-smith repositories are available here and here.

Repositories

Showing 9 of 9 repositories

SWE-smith (Public)
Scaling Data for SWE-agents
Python · 329 stars · CC-BY-4.0 · 45 forks · 8 issues (3 need help) · 7 pull requests · Updated Jul 31, 2025

swe-bench.github.io (Public)
Landing page + leaderboard for SWE-Bench benchmark
HTML · 6 stars · 9 forks · 4 issues · 0 pull requests · Updated Jul 31, 2025

SWE-bench (Public)
SWE-bench [Multimodal]: Can Language Models Resolve Real-world Github Issues?
Python · 3,255 stars · MIT · 574 forks · 37 issues · 12 pull requests · Updated Jul 30, 2025

sb-cli (Public)
Run SWE-bench evaluations remotely
Python · 34 stars · MIT · 1 fork · 2 issues · 0 pull requests · Updated Jul 29, 2025

SWE-smith-envs (Public)
Artifacts for building environments (Docker images) for repositories represented in SWE-smith
Dockerfile · 0 stars · 0 forks · 0 issues · 0 pull requests · Updated Jul 29, 2025

.github (Public)
0 stars · MIT · 0 forks · 0 issues · 0 pull requests · Updated Jul 24, 2025

experiments (Public)
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
Shell · 198 stars · 219 forks · 5 issues · 9 pull requests · Updated Jul 11, 2025

reading-list (Public)
Academic papers and works related to SWE-bench and SWE-agents
5 stars · 0 forks · 0 issues · 0 pull requests · Updated Jun 27, 2025

humanevalfix-results (Public archive)
Evaluation data + results for SWE-agent inference on HumanEvalFix task
Jupyter Notebook · 1 star · 0 forks · 0 issues · 0 pull requests · Updated Jul 11, 2024

Top languages

Python Shell HTML Jupyter Notebook Dockerfile

Most used topics

language-model software-engineering
