Wrong softmax temperature when choosing the action for the next step #1

@andste97

Dear maintainers, I have found that the softmax temperature used in the master branch when expanding the next state differs from both what is stated in your publication and the original code.

Instead of setting temperature = 0 to get the deterministic version of the softmax, you set the parameter to the same value as the one used by the planner. In my experiments this led to the policy not learning in the Atari environment, as random actions would get inserted into the replay buffer.

The issue can be fixed by changing the temp parameter in line 94 in file piIW_alphazero.py.
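
To illustrate the difference, here is a minimal sketch. It is not the repository's code: the `softmax` and `sample_action` helpers and the example scores are hypothetical, and I am assuming that temperature 0 is meant as the argmax limit of the softmax.

```python
import numpy as np

def softmax(scores, temp=1.0):
    """Softmax with temperature; temp == 0 is treated as the argmax limit (assumption)."""
    scores = np.asarray(scores, dtype=np.float64)
    if temp == 0:
        # Deterministic case: all probability mass on the best action(s).
        best = scores == scores.max()
        return best / best.sum()
    # Subtract the max before exponentiating for numerical stability.
    z = (scores - scores.max()) / temp
    e = np.exp(z)
    return e / e.sum()

def sample_action(scores, temp, rng):
    """Hypothetical helper: draw an action index from the tempered softmax."""
    probs = softmax(scores, temp)
    return int(rng.choice(len(probs), p=probs))

rng = np.random.default_rng(0)
scores = [1.0, 3.0, 2.0]  # made-up per-action scores, for illustration only

# temp = 0: the best action (index 1) is always chosen.
greedy = [sample_action(scores, 0, rng) for _ in range(10)]

# temp > 0: every action has nonzero probability, so suboptimal actions
# can be executed and end up in the replay buffer.
noisy = [sample_action(scores, 1.0, rng) for _ in range(10)]
```

With a positive temperature the action taken in the environment is sampled rather than greedy, which is the behaviour I believe caused the problem described above.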
