SACD Discrete Soft Actor Critic #203
Conversation
Hello,
please don't forget that part (see the contributing guide).
Hello,
yes please =)
Do you have the performance results for this? I came across this PR looking for implementations of SACD. Thank you.
Unfortunately I never found the time to do the performance benchmark. However, I use this implementation in several of my projects with good results, so the implementation seems to be correct.
```python
def get_crit_params(self, n):
    # Return the parameters of the n-th Q-network.
    return self.q_networks[n].parameters()
```
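Presumably this accessor lets each Q-network be optimized separately; a hypothetical usage would be `th.optim.Adam(critic.get_crit_params(0), lr=3e-4)`.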
```python
def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
```
It seems the case where `self.features_extractor` is `None` is not handled in the `forward` method.
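A minimal sketch of one way to handle it, assuming the critic stores its Q-networks in `self.q_networks` (as in the snippet above) and uses SB3's `extract_features` helper; this is not the PR's actual fix:

```python
def forward(self, obs: th.Tensor) -> Tuple[th.Tensor, ...]:
    # Fall back to the raw observation when no features extractor is configured.
    features = obs if self.features_extractor is None else self.extract_features(obs)
    return tuple(q_net(features) for q_net in self.q_networks)
```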
Hi, I’d like to contribute to this PR by adding the rl-baselines3-zoo benchmarks. Plan:
I already tested the PR locally on CartPole and it trains perfectly (reaches ~250 reward very fast). Before I run the full 5-seed Atari experiments, could you confirm the environments and timestep budget look good, or would you prefer anything different? I can have the results ready within the next few days and either post the plots here or open a small follow-up PR in rl-baselines3-zoo if that's cleaner. Thanks!
Hello, sorry for the late reply.
That would be a good start. Please use the RL Zoo for that (and the wandb integration if possible).
250 is not very good for CartPole (it should reach ~500).
Hi @araffin, on a second run CartPole reached 500 reward after about 100k steps, so it seems promising. I'll follow the plan then: use the RL Zoo and the wandb integration for logging the experiment results, then compare the learning curves between the algorithms and against the original paper. If the performance of this PR does not reach the expected values, I'll debug it to understand why.
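For the learning-curve comparison, SB3's built-in results plotter can render Monitor logs; a small sketch (the log path and title are placeholders):

```python
from stable_baselines3.common import results_plotter

# Plot episode reward vs. timesteps from Monitor log directories
# produced by the training runs (the path below is a placeholder).
results_plotter.plot_results(
    ["./logs/sacd/CartPole-v1_1"], 1e5, results_plotter.X_TIMESTEPS, "SACD on CartPole-v1"
)
```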
This PR introduces the Soft Actor-Critic for discrete actions (SACD) algorithm.
Description
This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the implementation provided by the author of the issue that requested this feature in Stable Baselines (https://github.com/toshikwa/sac-discrete.pytorch).
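For reference, a minimal sketch of the discrete-action losses from the paper (function name and tensor layout are illustrative, not this PR's API): with a finite action set, the expectations over the policy are computed exactly from the categorical probabilities instead of via the reparameterization trick used in continuous SAC.

```python
import torch as th
import torch.nn.functional as F

def sacd_losses(logits, next_logits, q1, q2, q1_target, q2_target,
                actions, rewards, dones, alpha=0.2, gamma=0.99):
    """Hypothetical helper: logits/q* are (batch, n_actions); actions, rewards, dones are (batch,)."""
    # Critic target: exact expectation of the soft value over next actions.
    with th.no_grad():
        next_probs = F.softmax(next_logits, dim=1)
        next_log_probs = F.log_softmax(next_logits, dim=1)
        next_q = th.min(q1_target, q2_target)
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=1)
        td_target = rewards + gamma * (1.0 - dones) * next_v
    # Critic loss on the Q-values of the actions actually taken.
    q1_taken = q1.gather(1, actions.long().unsqueeze(1)).squeeze(1)
    q2_taken = q2.gather(1, actions.long().unsqueeze(1)).squeeze(1)
    critic_loss = F.mse_loss(q1_taken, td_target) + F.mse_loss(q2_taken, td_target)
    # Actor loss: exact expectation over current actions (critics detached).
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    min_q = th.min(q1, q2).detach()
    actor_loss = (probs * (alpha * log_probs - min_q)).sum(dim=1).mean()
    return critic_loss, actor_loss
```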
Context
Types of changes
Checklist:
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass. (required)

Note: we are using a maximum length of 127 characters per line