issues Search Results · repo:openai/mle-bench language:Python

27 results (49 ms)

I would remove aerial cactus identification. The leaderboard is all ones, so placement becomes meaningless, and the simplest prompt on Gemini Flash achieves about 0.996 pretty consistently.
  • SappieKonig
  • Opened 13 days ago
  • #40

Thanks for your great work. When I reproduced the seq-seq task myself, I found that the experimental results reported in your paper were inconsistent. I would like to ask if there is a problem with my ...
  • OliverLeeXZ
  • Opened 14 days ago
  • #39

Thanks for your great work! I noticed that when running mlebench with the opendevin and mlagentbench workflows, the kwargs parameter is not passed into Docker when executing the run.py file. For example, claudev1 ...
  • OliverLeeXZ
  • Opened 14 days ago
  • #38

Hi, thanks for your great work. I have a question about this AIDE branch: https://github.com/thesofakillers/aideml, which is the AIDE repo used in mlebench, right? In this repo the function calling feature ...
  • jessyford
  • Opened 16 days ago
  • #37

Thanks for your great work! When using MLAgentBench, I am facing this error: File /home/agent/MLAgentBench/MLAgentBench/LLM.py, line 175, in complete_text_claude rsp = anthropic_client.completions.create( [2025-03-12 ...
  • OliverLeeXZ
  • Opened 21 days ago
  • #36

Hi MLE-Bench Developers, I'm currently copying data from your listed competitions and noticed something odd with the ml2021spring-hw2 competition. There appear to be only two submissions (kernels), but ...
  • Bihui-Jin
  • 2
  • Opened 25 days ago
  • #35

Hey, I noticed in the OpenAI GPT-4.5 System Card there are results for o1, o3-mini, deep research and gpt-4.5 on MLE-Bench. Would it be possible to get the results, logs and grading reports for these test ...
  • Fardeen786-eng
  • 1
  • Opened on Mar 2
  • #34

I am having problems installing NVIDIA GPU support in the Docker environment. If it is not installed, what impact will that have on the agent's results?
  • OliverLeeXZ
  • 1
  • Opened on Feb 28
  • #33

Thanks for your great work! When I'm running AIDE on the mlsp-2013-birds task, I'm facing a buggy node. [2025-02-25 18:27:00,356] INFO: Agent is parsing execution results for node a4b3c6613a194338b7f271fc1bbf1d52 ...
  • OliverLeeXZ
  • 1
  • Opened on Feb 27
  • #32