A3C - Asynchronous Advantage Actor-Critic

This is our implementation of A3C and its synchronous counterpart A2C, based on the paper Asynchronous Methods for Deep Reinforcement Learning by Mnih et al. We also combined it with Generalized Advantage Estimation (GAE), which has been shown to improve the performance of policy gradient methods.
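
For readers unfamiliar with Generalized Advantage Estimation, the following is a minimal sketch of the standard backward recursion, delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and A_t = delta_t + gamma * lambda * A_{t+1}. It is not taken from this repository: the function name `gae_advantages`, the default values of `gamma` and `lam`, and the assumption that no episode ends inside the rollout segment are all illustrative.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    gamma: float = 0.99,  # discount factor (illustrative default)
    lam: float = 0.95,    # GAE lambda (illustrative default)
) -> List[float]:
    """Compute GAE advantages for one rollout segment.

    rewards[t] and values[t] belong to step t; bootstrap_value is the
    critic's estimate for the state reached after the last step.
    Assumes no episode terminates inside the segment.
    """
    advantages = [0.0] * len(rewards)
    next_value = bootstrap_value
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next one.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `lam=0` this reduces to the one-step TD residual, and with `lam=1` to the discounted return minus the value baseline, so `lam` trades bias against variance.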

Code structure

  • models: Neural network models for actor and critic.
  • optimizers: Optimizers with shared statistics for A3C.
  • util: Helper methods to make main code more readable.

Executing experiments

  1. Activate the anaconda environment
source activate my_env
  2. Execute the a3c_runner script (the default environment is CartpoleStabShort-v0)

Training run from scratch:

python3 my/path/to/a3c_runner.py

Continue training run from an existing policy:

python3 my/path/to/a3c_runner.py --path my_model_path

Further command-line arguments (e.g. hyperparameter changes) can be passed to the run. For details, see:

python3 my/path/to/a3c_runner.py --help
  3. (Optional) Start TensorBoard to monitor training progress
tensorboard --logdir=./experiments/runs 
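
To make the three run modes above concrete (train from scratch, resume from `--path`, evaluate with `--test`), here is a hypothetical sketch of how such flags could be parsed with `argparse`. It only mirrors the flags that appear in this README; the actual `a3c_runner.py` may define them differently, and the helper names `build_parser` and `describe_mode`, the argument types, and the `None` defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in this README."""
    parser = argparse.ArgumentParser(
        description="Sketch of an A3C runner command line (not the real one)."
    )
    # Default environment as stated in the README.
    parser.add_argument("--env-name", default="CartpoleStabShort-v0")
    # Assumed to be a float bound on the action magnitude.
    parser.add_argument("--max-action", type=float, default=None)
    # Path to an existing model; None means start from scratch.
    parser.add_argument("--path", default=None)
    # Evaluation mode instead of training.
    parser.add_argument("--test", action="store_true")
    return parser


def describe_mode(args: argparse.Namespace) -> str:
    """Map parsed flags to one of the run modes described above."""
    if args.test:
        return "evaluate"
    if args.path is not None:
        return "resume"
    return "train"


if __name__ == "__main__":
    print(describe_mode(build_parser().parse_args()))
```

The authoritative list of arguments remains the output of `--help` on the real script.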

Executing an evaluation run for an existing policy

  1. Activate the anaconda environment
source activate my_env
  2. Execute the a3c_runner script
python3 my/path/to/a3c_runner.py --path my_model_path --test

For example, to load the pretrained models in test mode:

CartpoleStabShort-v0 (500Hz)

python3 a3c_runner.py --env-name CartpoleStabShort-v0 --max-action 5 --test --path experiments/best_models/a3c/stabilization/simulation/model_split_T-53420290_global-7597.67863_test-9999.97380.pth.tar

CartpoleSwingShort-v0 (500Hz)

python3 a3c_runner.py --env-name CartpoleSwingShort-v0 --max-action 10 --test --path experiments/best_models/a3c/swing_up/model_split_T-13881240_global-4532.753498284313_test-19520.67601316739.pth.tar

Qube-v0 (500Hz)

python3 a3c_runner.py --env-name Qube-v0 --max-action 5 --test --path experiments/best_models/a3c/qube/500Hz/model_split_T-164122000_global-3.66047_test-5.51714.pth.tar

Qube-v0 (50Hz)

python3 a3c_runner.py --env-name Qube-v0 --max-action 5 --test --path experiments/best_models/a3c/qube/50Hz/model_split_T-72839490_global-2.077353393893449_test-3.4406189782812775.pth.tar
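
The pretrained model file names above follow a visible pattern, `..._T-<int>_global-<float>_test-<float>.pth.tar`. The sketch below parses that pattern, for instance to sort checkpoints. What the fields mean is an assumption on our part (a global step counter, a training reward and a test reward seem likely), and `CheckpointInfo` and `parse_checkpoint_name` are hypothetical names that do not exist in the repository.

```python
import re
from typing import NamedTuple, Optional


class CheckpointInfo(NamedTuple):
    steps: int            # number after "T-" (assumed: global step counter)
    global_reward: float  # number after "global-" (assumed meaning)
    test_reward: float    # number after "test-" (assumed meaning)


_NUMBER = r"-?\d+(?:\.\d+)?"
_PATTERN = re.compile(
    rf"_T-(?P<steps>\d+)"
    rf"_global-(?P<global_reward>{_NUMBER})"
    rf"_test-(?P<test_reward>{_NUMBER})\.pth\.tar$"
)


def parse_checkpoint_name(path: str) -> Optional[CheckpointInfo]:
    """Extract the numeric fields from a checkpoint path, or None if no match."""
    match = _PATTERN.search(path)
    if match is None:
        return None
    return CheckpointInfo(
        steps=int(match["steps"]),
        global_reward=float(match["global_reward"]),
        test_reward=float(match["test_reward"]),
    )
```

For example, `max(paths, key=lambda p: parse_checkpoint_name(p).test_reward)` would pick the checkpoint with the highest third field, provided every path matches the pattern.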