This is our implementation of A3C and its synchronous variant A2C, based on the paper "Asynchronous Methods for Deep Reinforcement Learning" by Mnih et al. We also combined it with Generalized Advantage Estimation (GAE), as it has been shown to improve the performance of policy gradient methods.
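For reference, the sketch below illustrates how GAE advantages can be computed from a single rollout. The function name, the tensor-based interface, and the default values for gamma and tau are illustrative assumptions and do not necessarily match the code in this repository.

```python
import torch

def gae_advantages(rewards, values, gamma=0.99, tau=0.95):
    """Illustrative GAE computation for one rollout (not this repository's actual code).

    rewards: list of length T with the observed rewards r_t
    values:  list of length T + 1 with critic estimates V(s_t), including the
             bootstrap value for the state after the last step
    gamma:   discount factor (assumed default)
    tau:     GAE smoothing parameter lambda (assumed default)
    """
    advantages = torch.zeros(len(rewards))
    gae = 0.0
    # Walk backwards through the rollout and accumulate discounted TD residuals:
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t),  A_t = delta_t + gamma * tau * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * tau * gae
        advantages[t] = gae
    return advantages
```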
- models: Neural network models for actor and critic.
- optimizers: Optimizers with shared statistics for A3C (see the sketch after this list).
- util: Helper methods to make the main code more readable.
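The shared-statistics optimizer is the piece that differs most from a standard PyTorch training setup, so here is a minimal sketch of what an Adam optimizer with shared statistics can look like. The class name, defaults, and the exact state handling are assumptions; the implementation in the optimizers module may differ.

```python
import math
import torch

class SharedAdam(torch.optim.Adam):
    """Adam whose moment estimates live in shared memory, so every A3C worker
    process updates the same optimizer statistics (illustrative sketch only)."""

    def __init__(self, params, lr=1e-4, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, lr=lr, betas=betas, eps=eps)
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                state['step'] = torch.zeros(1)
                state['exp_avg'] = torch.zeros_like(p.data)
                state['exp_avg_sq'] = torch.zeros_like(p.data)
                # Move the statistics into shared memory before workers are forked.
                state['step'].share_memory_()
                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()

    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad.data
                state = self.state[p]
                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2 = group['betas']
                state['step'] += 1
                # Biased first/second moment estimates, as in standard Adam.
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                bias_corr1 = 1 - beta1 ** state['step'].item()
                bias_corr2 = 1 - beta2 ** state['step'].item()
                step_size = group['lr'] * math.sqrt(bias_corr2) / bias_corr1
                p.data.addcdiv_(exp_avg, exp_avg_sq.sqrt().add_(group['eps']), value=-step_size)
        return loss
```

In a typical A3C setup, each worker copies its local gradients into the shared model and then calls step() on an optimizer like this; because the moment estimates live in shared memory, updates from all processes accumulate in the same statistics.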
- Activate the Anaconda environment
source activate my_env
- Execute the a3c_runner script (the default environment is CartpoleStabShort-v0)
Start a training run from scratch:
python3 my/path/to/a3c_runner.py
Continue a training run from an existing policy:
python3 my/path/to/a3c_runner.py --path my_model_path
Additional command-line arguments (e.g. for hyperparameter changes) can be passed to the run; for details see
python3 my/path/to/a3c_runner.py --help
- (Optional) Start TensorBoard to monitor the training progress
tensorboard --logdir=./experiments/runs
- Activate the Anaconda environment
source activate my_env
- Execute the a3c_runner script in test mode with the path to a trained model
python3 my/path/to/a3c_runner.py --path my_model_path --test
For example, load the pretrained models in test mode:
python3 a3c_runner.py --env-name CartpoleStabShort-v0 --max-action 5 --test --path experiments/best_models/a3c/stabilization/simulation/model_split_T-53420290_global-7597.67863_test-9999.97380.pth.tar
python3 a3c_runner.py --env-name CartpoleSwingShort-v0 --max-action 10 --test --path experiments/best_models/a3c/swing_up/model_split_T-13881240_global-4532.753498284313_test-19520.67601316739.pth.tar
python3 a3c_runner.py --env-name Qube-v0 --max-action 5 --test --path experiments/best_models/a3c/qube/500Hz/model_split_T-164122000_global-3.66047_test-5.51714.pth.tar
python3 a3c_runner.py --env-name Qube-v0 --max-action 5 --test --path experiments/best_models/a3c/qube/50Hz/model_split_T-72839490_global-2.077353393893449_test-3.4406189782812775.pth.tar