I was toying with DQN a while ago and also needed to select actions based on the weights of the Q-values. This is what I put together, based on https://github.com/pytorch/rl/blob/main/torchrl/modules/tensordict_module/exploration.py#L31:
Needs to be polished and also does not implement the |
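A minimal sketch of the general idea follows: sample actions from a softmax over the Q-values divided by a temperature, often called Boltzmann exploration. It is written in plain Python rather than against the torchrl API, and `boltzmann_probs` and `boltzmann_action` are hypothetical names, not functions from the linked file.

```python
import math
import random
from typing import Optional, Sequence


def boltzmann_probs(q_values: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Softmax of Q / temperature (hypothetical helper, not a torchrl function)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [q / temperature for q in q_values]
    # Subtract the max before exponentiating to avoid overflow.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(
    q_values: Sequence[float],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Draw one action index with probability given by boltzmann_probs."""
    if rng is None:
        rng = random.Random()
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

A low temperature concentrates almost all probability on the greedy action, while a very high temperature approaches a uniform choice, so the temperature interpolates between exploitation and random exploration.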
-
I have been going through the code of `QValueActor` and was hoping for an option to enable an exploration mode with a `temperature` parameter. Rather than always selecting the action with the maximum value, in exploration mode we could sample the action based on the relative weights of the Q-values.

Another comment: if a user provides an `action_mask`, the action values returned for the illegal actions will be the minimum value of the given data type. Maybe the user should be able to choose whether the log-probabilities are altered or not. It's easy to mask the logits afterwards, but impossible to recover the original logits from the masked ones.

I can help with the implementation if needed.
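As a rough illustration of both suggestions, here is a plain-Python sketch (not the `QValueActor` API). The function `select_action`, its `temperature` argument and its `(action, values)` return convention are hypothetical. The mask is used only to decide which actions may be chosen, and the caller gets back the original, unmasked Q-values.

```python
import math
import random
from typing import Optional, Sequence


def select_action(
    q_values: Sequence[float],
    action_mask: Optional[Sequence[bool]] = None,
    temperature: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> tuple[int, list[float]]:
    """Hypothetical selector: greedy if temperature is None, else softmax sampling."""
    if rng is None:
        rng = random.Random()
    values = list(q_values)  # returned untouched: no dtype-min written into it
    if action_mask is None:
        action_mask = [True] * len(values)
    legal = [i for i, ok in enumerate(action_mask) if ok]
    if not legal:
        raise ValueError("action_mask must allow at least one action")

    if temperature is None:
        # Greedy mode: argmax restricted to legal actions.
        action = max(legal, key=lambda i: values[i])
    else:
        if temperature <= 0:
            raise ValueError("temperature must be positive")
        # Exploration mode: softmax(Q / temperature) over legal actions only.
        # Illegal actions get zero probability, same as a -inf logit.
        scaled = [values[i] / temperature for i in legal]
        m = max(scaled)
        weights = [math.exp(s - m) for s in scaled]
        action = rng.choices(legal, weights=weights, k=1)[0]
    return action, values
```

Returning the unmasked values keeps both options open for the caller: anyone who wants masked values can apply the mask afterwards, whereas the original values could not be recovered from masked ones.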