Replicate the scores for Table 2 #29

@finalily

Hello Authors of SmartPlay,

Thank you for providing this nice testbed.
I am trying to replicate the scores in Table 2, following the environment settings in your Git repo.
For example, for the RockPaperScissorBasic (RPS) game:
challenges:
  all:
    Error/Mistake Handling: 1
    Generalization: 2
    Instruction Following: 3
    Learning from Interactions: 3
    Long Text Understanding: 2
    Planning: 1
    Understanding the Odds: 3
    Reasoning: 1
    Spatial Reasoning: 1
recorded settings:
  RockPaperScissorBasic:
    iter: 20
    steps: 50
    human score: 43
    min score: 0

I ran this with GPT-4, but the scores I get for RPS and Hanoi are 0.70 and 0.30, which differ from the values reported in Table 2:

Model        RPS    Hanoi
GPT-4-0613   0.91   0.83
GPT-4-0314   0.98   0.90
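
To make sure I am comparing like with like, here is a minimal sketch of how I am assuming the raw reward is mapped to the normalized score in Table 2. The linear min/human scaling and the helper names `normalized_score` and `raw_from_normalized` are my own assumptions, not taken from the SmartPlay code; only the `min score` and `human score` values come from the RockPaperScissorBasic settings.

```python
# Assumed normalization (not confirmed by the authors):
# min score maps to 0.0 and human score maps to 1.0, linearly in between.

def normalized_score(raw: float, min_score: float, human_score: float) -> float:
    """Scale a raw reward so that min_score -> 0.0 and human_score -> 1.0."""
    if human_score == min_score:
        raise ValueError("human_score and min_score must differ")
    return (raw - min_score) / (human_score - min_score)


def raw_from_normalized(norm: float, min_score: float, human_score: float) -> float:
    """Invert the scaling to recover the raw reward behind a normalized score."""
    return min_score + norm * (human_score - min_score)


# Values from the RockPaperScissorBasic recorded settings.
RPS_MIN = 0
RPS_HUMAN = 43

# Raw reward implied by my run (0.70) versus the reported GPT-4-0613 score (0.91).
my_raw = raw_from_normalized(0.70, RPS_MIN, RPS_HUMAN)
paper_raw = raw_from_normalized(0.91, RPS_MIN, RPS_HUMAN)
print(f"my run: ~{my_raw:.1f} raw, Table 2: ~{paper_raw:.1f} raw")
```

If this assumption is right, the gap for RPS is roughly 9 raw points out of 43, which seems too large to come from normalization alone.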

Could you please share more details about the LLM inference parameters you used, in particular temperature, top_p, and frequency_penalty?

I hope to use them to reproduce the Table 2 scores from the paper.

Thank you.
