SACD Discrete Soft Actor Critic #203
Conversation
Hello,
please don't forget that part (see the contributing guide).
Hello,
yes please =)
Do you have the performance results for this? I came across this PR looking for implementations of SACD. Thank you.
Unfortunately I never found the time to do the performance benchmark. However, I use this implementation in several of my projects with good results, so the implementation seems to be correct.
```python
def get_crit_params(self, n):
    # Return the parameters of the n-th Q-network.
    return self.q_networks[n].parameters()
```
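Presumably this accessor lets each Q-network be optimized separately; a hypothetical usage would be `th.optim.Adam(critic.get_crit_params(0), lr=3e-4)`.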
```python
def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
```
It seems the case where `self.features_extractor` is `None` is not handled in the `forward` method.
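A minimal sketch of one way to handle it, assuming the critic stores its Q-networks in `self.q_networks` (as in the snippet above) and uses SB3's `extract_features` helper; this is not the PR's actual fix:

```python
def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
    # Fall back to the raw observation when no features extractor is configured.
    features = obs if self.features_extractor is None else self.extract_features(obs)
    return tuple(q_net(features) for q_net in self.q_networks)
```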
Hi, I’d like to contribute to this PR by adding the rl-baselines3-zoo benchmarks. Plan:
I already tested the PR locally on CartPole and it trains perfectly (reaches ~250 reward very fast). Before I run the full 5-seed Atari experiments, could you confirm the environments and timestep budget look good, or would you prefer anything different? I can have the results ready within the next few days and either post the plots here or open a small follow-up PR in rl-baselines3-zoo if that's cleaner. Thanks!
Hello, sorry for the late reply.
That would be a good start. Please use the RL Zoo for that (and the wandb integration if possible).
250 is not very good for CartPole (it should reach ~500).
Hi @araffin, on a second run CartPole reached 500 reward after about 100k steps, so it seems promising. I'll follow the plan then: use the RL Zoo and the wandb integration for logging the experiment results, then compare the learning curves between the algorithms and against the original paper. If the performance of this PR does not reach the expected values, I'll debug it to understand why.
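For the learning-curve comparison, SB3's built-in results plotter can render Monitor logs; a small sketch (the log path and title are placeholders):

```python
from stable_baselines3.common import results_plotter

# Plot episode reward vs. timesteps from Monitor log directories
# produced by the training runs (the path below is a placeholder).
results_plotter.plot_results(
    ["./logs/sacd/CartPole-v1_1"], 1e5, results_plotter.X_TIMESTEPS, "SACD on CartPole-v1"
)
```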
This PR introduces the Soft Actor-Critic for discrete actions (SACD) algorithm.
Description
This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the implementation provided by the author of the issue that requested this feature in Stable Baselines (https://github.com/toshikwa/sac-discrete.pytorch).
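For reference, a minimal sketch of the discrete-action losses from the paper (function name and tensor layout are illustrative, not this PR's API): with a finite action set, the expectations over the policy are computed exactly from the categorical probabilities instead of via the reparameterization trick used in continuous SAC.

```python
import torch as th
import torch.nn.functional as F

def sacd_losses(logits, next_logits, q1, q2, q1_target, q2_target,
                actions, rewards, dones, alpha=0.2, gamma=0.99):
    """Hypothetical helper: logits/q* are (batch, n_actions); actions, rewards, dones are (batch,)."""
    # Critic target: exact expectation of the soft value over next actions.
    with th.no_grad():
        next_probs = F.softmax(next_logits, dim=1)
        next_log_probs = F.log_softmax(next_logits, dim=1)
        next_q = th.min(q1_target, q2_target)
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=1)
        td_target = rewards + gamma * (1.0 - dones) * next_v
    # Critic loss on the Q-values of the actions actually taken.
    q1_taken = q1.gather(1, actions.long().unsqueeze(1)).squeeze(1)
    q2_taken = q2.gather(1, actions.long().unsqueeze(1)).squeeze(1)
    critic_loss = F.mse_loss(q1_taken, td_target) + F.mse_loss(q2_taken, td_target)
    # Actor loss: exact expectation over current actions (critics detached).
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    min_q = th.min(q1, q2).detach()
    actor_loss = (probs * (alpha * log_probs - min_q)).sum(dim=1).mean()
    return critic_loss, actor_loss
```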
Context
Types of changes
Checklist:
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass. (required)

Note: we are using a maximum length of 127 characters per line