Return substeps for reward accumulation? #296

@aaprasad

Currently mjx_env.step(state, action, n_substeps) returns only the final state after substepping. As a result, the reward can only be computed from that final state, rather than being accumulated or averaged over the substeps as is normally done when frame skipping. In theory this looks like a simple feature: add an argument such as return_substeps=False and, when it is set, return the states from all substeps. Is there a more memory-efficient way to do this?
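One possible direction, sketched below, is to accumulate the reward inside the substep loop instead of returning every intermediate state. The sketch carries a running reward sum through jax.lax.scan, so memory stays constant in n_substeps. It does not use the real mjx_env API: physics_step, reward_fn, and step_with_accumulated_reward are hypothetical stand-ins, and the toy dynamics exist only to make the example runnable.

```python
import jax
import jax.numpy as jnp


def physics_step(state, action):
    # Hypothetical single-substep dynamics (a damped point mass),
    # standing in for one physics substep of the real environment.
    pos, vel = state
    vel = 0.9 * vel + 0.1 * action
    pos = pos + 0.1 * vel
    return (pos, vel)


def reward_fn(state, action):
    # Hypothetical per-substep reward: negative squared distance to the origin.
    pos, _ = state
    return -jnp.sum(pos ** 2)


def step_with_accumulated_reward(state, action, n_substeps):
    # n_substeps must be a concrete Python int (static under jax.jit).
    def body(carry, _):
        state, total = carry
        state = physics_step(state, action)
        total = total + reward_fn(state, action)
        # Returning None as the per-iteration output means nothing is stacked,
        # so only the latest state and the running sum are kept.
        return (state, total), None

    init = (state, jnp.zeros(()))
    (state, total), _ = jax.lax.scan(body, init, xs=None, length=n_substeps)
    # Average over substeps; return `total` instead for a summed reward.
    return state, total / n_substeps


state0 = (jnp.array([1.0, 0.0]), jnp.zeros(2))
action = jnp.array([0.5, -0.5])
final_state, mean_reward = step_with_accumulated_reward(state0, action, 4)
```

If the per-substep rewards themselves are needed, the scan body could return the scalar reward as its second output instead of None. That stacks n_substeps scalars rather than n_substeps full states, which is still far cheaper than a return_substeps option that materializes every state.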
