This repository implements reinforcement learning algorithms for training wheel-legged robots in MuJoCo simulation environments. It supports multiple RL algorithms including PPO (via RSL-RL), DDPG, and TD3, with distributed training capabilities for both local machines and HPC clusters. The project includes custom MuJoCo environments, training scripts, visualization tools, and data analysis utilities for studying locomotion behaviors.
It is recommended to use mamba to create and manage the environment. The commands below download and install Miniforge3, which ships with mamba. Note: when prompted to initialize bash, choose yes.
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash Miniforge3-$(uname)-$(uname -m).sh
source ~/.bashrc
After installing and configuring mamba, run the following commands:
# create virtual environment
mamba create -n wl_learning python=3.12
mamba activate wl_learning
# install simulator
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install warp-lang
pip install jax[cuda12]
pip install playground --extra-index-url=https://py.mujoco.org --extra-index-url=https://pypi.nvidia.com/warp-lang/
pip install imageio imageio-ffmpeg
# install alg and logging libs
pip install rsl-rl-lib==2.3.3 # pin to 2.3.x for compatibility
pip install stable-baselines3[extra]
pip install wandb tensorboard
pip install seaborn scikit-learn
Note: If you are testing locally on a machine without an NVIDIA GPU, run the following commands instead of their counterparts above:
pip install torch torchvision
pip install jax
pip install playground
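Optionally, you can sanity-check the environment before moving on. The short Python snippet below is an illustrative sketch, not a script shipped with this repository; it confirms the core libraries import and reports whether a GPU is visible:
# illustrative sanity check -- not part of the repository
import torch
import jax
import mujoco

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("jax devices:", jax.devices())  # CPU-only installs report CpuDevice
print("mujoco", mujoco.__version__)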
With all the dependencies installed, install the wl_learning package in editable mode:
pip install -e .
For local training with PPO, use the train_rsl_rl.py script under the scripts folder. An example is provided below.
python scripts/train_rsl_rl.py \
--env_name WheelLegFlat \
--num_envs 4096 \
--use_wandb \
--wandb_entity RoboRambler \
--suffix PACE_Test \
--camera track
For local training with DDPG, use the train_wl_ddpg.py script under the scripts folder. An example is provided below.
python scripts/train_wl_ddpg.py \
--env_name WheelLegFlat \
--num_envs 200 \
--use_wandb \
--wandb_entity RoboRambler \
--suffix PACE_Test
For cluster training, set up the relevant directories on PACE, then use the push_job.sh script under the scripts folder. An example is provided below (replace <algo> with PPO or DDPG).
./scripts/push_job.sh <algo> \
--env_name WheelLegFlat \
--num_envs 8192 \
--use_wandb \
--wandb_entity RoboRambler \
--suffix PACE_Test \
--camera track \
--video --video_interval 100
To view a MuJoCo visualization of a model checkpoint from wandb for DDPG, use the play_wandb_ddpg.py script under the scripts folder. An example is provided below.
python scripts/play_wandb_ddpg.py \
--env_name WheelLegFlat \
--wandb_entity RoboRambler \
--wandb_project mjxrl_ddpg \
--wandb_runid y53v2yyc
You can find the run ID in the Overview section of a run's dashboard in WandB.
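Alternatively, run IDs can be listed programmatically through wandb's public API. The snippet below is a minimal sketch, assuming you are logged in (wandb login) and using the entity/project names from the example above:
# illustrative sketch using wandb's public API
import wandb

api = wandb.Api()
for run in api.runs("RoboRambler/mjxrl_ddpg"):
    print(run.id, run.name, run.state)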
Note: If you are on macOS, use mjpython instead of python to launch the visualization script; MuJoCo's interactive viewer requires it on macOS.