This repository provides the code for the experiments in the paper CM3 [1], published at ICLR 2020. It contains the main algorithm and baselines, as well as the three simulated Markov games on which the algorithms were evaluated.
- All experiments were run on Ubuntu 16.04
- Python 3.6
- TensorFlow 1.10
- SUMO
- pygame: `sudo apt-get install python-pygame`
- OpenAI Gym 0.12.1
- `alg`: Implementation of algorithms and config files. `config.json` is the main config file. `config_particle_*.json` specifies various instances of the cooperative navigation task. `config_sumo_stage{1,2}.json` specifies agent initial/goal lane configurations for SUMO. `config_checkers_stage{1,2}.json` specifies parameters of the Checkers game.
- `env`: Python wrappers/definitions of the simulation environments.
- `env_sumo`: XML files that define the road and traffic for the underlying SUMO simulator.
- `log`: Each experiment run will create a subfolder that contains the reward values logged during the training or test run.
- `saved`: Each experiment run will create a subfolder that contains trained TensorFlow models.
There are three simulations, selected by the `experiment` field in `alg/config.json` (a config sketch follows the list below).
- Cooperative navigation: particles must move to individual target locations while avoiding collisions.
  - Environment code located in `env/multiagent-particle-envs/`
- SUMO
  - Stage 1: single agent on empty road. Corresponds to setting `"stage" : 1`
  - Stage 2: two agents on empty road. Corresponds to setting `"stage" : 2`
  - Python wrappers located in `env/`. Entry point is `env/multicar_simple.py`
  - SUMO topology and traffic defined in `env_sumo/simple/`
- Checkers: two agents cooperate to collect rewards while avoiding penalties in a checkered map.
  - Implemented in `env/checkers.py`
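As a minimal illustration of how the simulation is selected, the relevant part of `alg/config.json` might look like the following. The `experiment` and `stage` field names come from this README; the surrounding structure (and treating them as top-level keys) is an assumption rather than the file's exact schema.

```json
{
    "experiment": "sumo",
    "stage": 1
}
```

Setting `experiment` to `"particle"`, `"sumo"`, or `"checkers"` selects the corresponding simulation described above; `stage` applies to the SUMO curriculum.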
- Cooperative navigation: run `pip install -e .` inside `env/multiagent-particle-envs/`
- SUMO: install SUMO and add the following to your `.bashrc`:
  - `export PYTHONPATH=$PYTHONPATH:path/to/sumo`
  - `export PYTHONPATH=$PYTHONPATH:path/to/sumo/tools`
  - `export SUMO_HOME="path/to/sumo"`
- Checkers: None required
Cooperative navigation
- In `config.json`, set `experiment: "particle"`. `particle_config` should be one of `config_particle_stage1.json`, `config_particle_stage2_antipodal.json`, `config_particle_stage2_cross.json`, `config_particle_stage2_merge.json` (an illustrative sketch follows below).
- Inside `alg/`, execute `python train_onpolicy.py`
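For illustration only, the relevant fields might look like this (the field names `experiment` and `particle_config` are taken from this README; the enclosing structure is assumed, not the file's exact schema):

```json
{
    "experiment": "particle",
    "particle_config": "config_particle_stage1.json"
}
```

Swapping `particle_config` to one of the Stage 2 files listed above selects the antipodal, cross, or merge task instead.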
SUMO
- In `config.json`, set `experiment: "sumo"`. `port`: if multiple SUMO experiments are run in parallel, each experiment must have its own unique port number (see the sketch after this list).
- Inside `alg/`, execute `python train_offpolicy.py --env ../env_sumo/simple/merge.sumocfg`
- Include the option `--gui` to show the SUMO GUI while training (at the cost of increased runtime)
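A rough sketch of the SUMO-related settings (again, `experiment`, `stage`, and `port` are the field names mentioned in this README; the structure and the example port value are assumptions):

```json
{
    "experiment": "sumo",
    "stage": 1,
    "port": 8813
}
```

The port value `8813` is only an example; any number works as long as each concurrently running SUMO experiment uses a distinct port.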
Checkers
- In `config.json`, set `experiment: "checkers"`
- Inside `alg/`, execute `python train_offpolicy.py`
Key fields in `config.json` (illustrative Stage 1 and Stage 2 sketches follow the list):
- `stage`: either 1 or 2
- `dir_restore`: for Stage 2 of CM3, this must equal the string used for `dir_name` when Stage 1 was run.
- `use_alg_credit`: 1 for CM3
- `use_Q_credit`: 1 for CM3; 0 for the ablation that uses a value function baseline.
- `train_from_nothing`: 1 for Stage 1 of CM3, or for the ablation that omits the curriculum; 0 to allow restoring a trained Stage 1 model.
- `model_name`: when training Stage 2 and restoring a Stage 1 model, this must be the name of the Stage 1 model.
- `prob_random`: 1.0 for Stage 1, 0.2 for Stage 2. Not applicable for Checkers.
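To make the two-stage curriculum concrete, hypothetical Stage 1 and Stage 2 settings could look like the sketches below. Only fields named above are shown; the `dir_name` and `model_name` values are placeholders, and the overall structure of `config.json` is an assumption.

Stage 1 (train from scratch, fully random exploration):

```json
{
    "stage": 1,
    "dir_name": "sumo_stage1",
    "train_from_nothing": 1,
    "use_alg_credit": 1,
    "use_Q_credit": 1,
    "prob_random": 1.0
}
```

Stage 2 (restore the trained Stage 1 model):

```json
{
    "stage": 2,
    "dir_name": "sumo_stage2",
    "dir_restore": "sumo_stage1",
    "model_name": "model_stage1",
    "train_from_nothing": 0,
    "use_alg_credit": 1,
    "use_Q_credit": 1,
    "prob_random": 0.2
}
```

Note that `dir_restore` in Stage 2 matches the `dir_name` used in Stage 1, as the curriculum requires.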
@inproceedings{yang2019cm3,
title={CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning},
author={Yang, Jiachen and Nakhaei, Alireza and Isele, David and Fujimura, Kikuo and Zha, Hongyuan},
booktitle={International Conference on Learning Representations},
year={2019}
}
See LICENSE.
SPDX-License-Identifier: MIT