This repository provides a custom Gym-like reinforcement learning (RL) environment built using NVIDIA Warp, a highly efficient framework for GPU-accelerated simulations. The environment is designed for fast, scalable, and high-performance training of RL agents in physics-based tasks.
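The environments expose a familiar Gym-style, vectorized interface: a single `reset`/`step` pair drives many simulation instances in parallel on the GPU, with observations, rewards, and done flags returned as batched tensors. The snippet below is a minimal toy illustration of that pattern written for this README; the class, method signatures, and tensor shapes are assumptions for illustration, not WarpRL's actual API.

```python
import torch

# Toy stand-in for a vectorized, Gym-style environment.
# This class is a minimal illustration only -- it is NOT WarpRL's actual API.
class ToyVectorEnv:
    def __init__(self, num_envs: int, obs_dim: int = 8, act_dim: int = 2, device: str = "cpu"):
        self.num_envs = num_envs
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.device = device

    def reset(self) -> torch.Tensor:
        # One observation per parallel environment, batched in a single tensor.
        self.obs = torch.zeros(self.num_envs, self.obs_dim, device=self.device)
        return self.obs

    def step(self, actions: torch.Tensor):
        # A real Warp environment would advance a GPU physics simulation here.
        self.obs = self.obs + 0.01 * torch.randn_like(self.obs)
        reward = -actions.pow(2).sum(dim=-1)                                  # (num_envs,)
        done = torch.zeros(self.num_envs, dtype=torch.bool, device=self.device)
        return self.obs, reward, done, {}

device = "cuda" if torch.cuda.is_available() else "cpu"
env = ToyVectorEnv(num_envs=4096, device=device)
obs = env.reset()
for _ in range(100):
    actions = torch.rand(env.num_envs, env.act_dim, device=device) * 2.0 - 1.0
    obs, reward, done, info = env.step(actions)
```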
- Clone the WarpRL repository:

  ```bash
  git clone git@github.com:makolon/WarpRL.git
  ```
- Create a `.env` file under the `docker` directory with the following content:

  ```
  ###
  # General settings
  ###
  # WarpRL version
  WARPRL_VERSION=0.1.0
  # WarpRL default path
  WARPRL_PATH=/workspace/warprl
  # Docker user directory - by default this is the root user's home directory
  DOCKER_USER_HOME=/root
  ```
- Start the Docker container:

  ```bash
  docker-compose -p warprl_docker run warprl
  ```
- Install the required Python packages with uv. Inside the container, run:

  ```bash
  uv sync
  ```
- Install the `pwm` package in editable mode, still inside the container:

  ```bash
  uv pip install -e .
  ```
- Run a sample Warp example to verify the setup:

  ```bash
  uv run python -m warp.examples.optim.example_bounce
  ```
- Run the training script:

  ```bash
  uv run scripts/train_warp.py env=warp_ant alg=shac
  ```
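The training script is configured through `key=value` overrides such as `env` and `alg`. As an unverified illustration, selecting one of the other algorithms credited below might look like the commands that follow; the actual environment and algorithm names are defined by the repository's configuration files, so treat these values as assumptions and check the configs before running them.

```bash
# Hypothetical overrides -- verify the available env/alg names in the repository's config files.
uv run scripts/train_warp.py env=warp_ant alg=ahac
uv run scripts/train_warp.py env=warp_ant alg=pwm
```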
This software includes components derived from NVIDIA Warp, which are licensed under the NVIDIA Software License Agreement. Users must comply with NVIDIA's licensing terms for any redistribution or modification of this software.
WarpRL's development has been made possible thanks to these open-source projects:
- AHAC: An Adaptive Horizon Actor-Critic algorithm designed to optimize policies in contact-rich environments by dynamically adjusting model-based horizons, outperforming SHAC and PPO on high-dimensional locomotion tasks.
- SHAC: A GPU-accelerated policy learning method using differentiable simulation to solve robotic control tasks, enabling faster and more effective learning through parallelized simulation.
- PWM: A Model-Based RL algorithm leveraging large multi-task world models to efficiently learn continuous control policies with first-order gradients, achieving superior performance on complex locomotion tasks without relying on ground-truth simulation dynamics.
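These methods share one core idea: because the simulation is differentiable, the policy gradient can be obtained by backpropagating the rollout return directly through the simulator (a first-order gradient), rather than estimating it from sampled returns. The sketch below illustrates that idea with a toy differentiable point-mass "simulator" in PyTorch; it is a conceptual illustration only, not WarpRL's or SHAC's implementation, and it omits the learned critic that these algorithms use to bootstrap value beyond the short horizon.

```python
import torch
import torch.nn as nn

# Toy differentiable "simulator": a point mass pushed by the policy's action.
# Real WarpRL environments differentiate through a Warp physics simulation instead.
def sim_step(state, action, dt=0.05):
    pos, vel = state[..., :1], state[..., 1:]
    vel = vel + dt * action          # the action acts as a force
    pos = pos + dt * vel
    return torch.cat([pos, vel], dim=-1)

policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

num_envs, horizon = 64, 16           # short horizons keep the gradients well-behaved
for it in range(200):
    state = torch.randn(num_envs, 2)  # batched initial states
    total_reward = 0.0
    for t in range(horizon):
        action = policy(state)
        state = sim_step(state, action)
        # Reward: stay near the origin with small velocity.
        total_reward = total_reward - state.pow(2).sum(dim=-1).mean()
    loss = -total_reward              # maximize reward via a first-order gradient
    opt.zero_grad()
    loss.backward()                   # backpropagate through every sim_step of the rollout
    opt.step()
```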