This folder contains the source code for the Isaac Lab task. The environment is almost ready, but the training pipeline is not yet functional and still needs some tweaks.
We are using Isaac Lab (previously known as Isaac Orbit), a wrapper around Isaac Sim with RL-oriented features. The simulator is developed by NVIDIA and provides high-performance, high-fidelity simulation with real-time ray tracing, allowing us to run parallelized environments entirely on the GPU.
- NVIDIA RTX GPU with CUDA 11.0 or higher and at least 12GB of VRAM
- Ubuntu 22.04 or higher
- A reasonably high-end CPU
The simulator has native ROS 2 support and provides a set of sensors, including RGB-D cameras, IMUs, and LiDAR, to name a few. While learning how to use the simulator, we were able to create a simple environment with a GO2 robot controlled using a trained low-level policy and ROS 2 teleop, and we were also able to get sensor data such as LiDAR (note: the visible points are for debugging).
This is shown in the video below (click).
It is also possible to create more complex tasks, such as the one shown in the video below (click), where an agent has to navigate through rough, randomized terrain using force and laser sensors.
We started by creating a simple room that serves as the environment for the task. It has lighting and multiple mesh groups, for example ceiling, floor, wall_wood, wall_reg..., which allows for coherent randomization when we train (for example of materials and textures; a small sketch of this idea follows below).
We also created a model for the lab room.
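To make the mesh grouping above concrete, the sketch below shows (with hypothetical material paths, not actual assets from this repository) how keeping the meshes in named groups lets a reset-time randomization pick one material per group, so that, for example, all wall_wood meshes change texture together:

```python
import random

# Hypothetical example: mesh groups of the room and candidate materials for each.
# The group names mirror the ones used in the room asset; the material paths are
# placeholders for illustration only.
MATERIAL_POOL = {
    "ceiling":   ["materials/plaster_white", "materials/plaster_grey"],
    "floor":     ["materials/parquet_oak", "materials/concrete"],
    "wall_wood": ["materials/wood_light", "materials/wood_dark"],
    "wall_reg":  ["materials/paint_white", "materials/paint_beige"],
}

def sample_room_materials(rng: random.Random) -> dict[str, str]:
    """Pick one material per mesh group so each group is randomized coherently."""
    return {group: rng.choice(options) for group, options in MATERIAL_POOL.items()}

# Example: a different but internally consistent look on every environment reset.
print(sample_room_materials(random.Random(0)))
```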
We then spawn a model for the robot, represented as a cuboid with the same dimensions as the Go2 and an RGB-D camera attached to it, along with a red ball. Both spawn at random locations in the room each time the environment is reset.
(scene config)
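As a rough illustration of the reset randomization (not the actual scene config), the snippet below samples independent spawn positions for the robot proxy and the ball inside hypothetical room bounds; in the real setup this logic lives in the scene and event configuration.

```python
import torch

# Hypothetical room bounds in metres (x_min, x_max, y_min, y_max); the real
# values come from the room asset, these are placeholders for illustration.
ROOM_BOUNDS = (-3.0, 3.0, -2.0, 2.0)

def sample_spawn_positions(num_envs: int, z: float = 0.3) -> tuple[torch.Tensor, torch.Tensor]:
    """Sample random (x, y, z) spawn positions for the robot cuboid and the ball,
    one pair per parallel environment, uniformly inside the room bounds."""
    x_min, x_max, y_min, y_max = ROOM_BOUNDS

    def _sample() -> torch.Tensor:
        xy = torch.rand(num_envs, 2)
        xy[:, 0] = xy[:, 0] * (x_max - x_min) + x_min
        xy[:, 1] = xy[:, 1] * (y_max - y_min) + y_min
        return torch.cat([xy, torch.full((num_envs, 1), z)], dim=-1)

    return _sample(), _sample()  # robot positions, ball positions

robot_pos, ball_pos = sample_spawn_positions(num_envs=4)
print(robot_pos.shape, ball_pos.shape)  # torch.Size([4, 3]) torch.Size([4, 3])
```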
We were able to control our robot using both distance-based and velocity-based commands. The video below shows the robot moving using distance commands (with a KI controller); however, we opted for velocity commands in the end, as they are closer to the actual robot's API.
(actions config) (video)
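For intuition, distance commands of this kind can be converted into velocity commands by a simple proportional-integral loop on the remaining distance. The sketch below is a generic toy controller with made-up gains, not the exact controller used in the project:

```python
class PIVelocityFromDistance:
    """Toy PI controller: converts a remaining-distance error into a forward
    velocity command, clipped to the robot's velocity limit (illustrative gains)."""

    def __init__(self, kp: float = 1.0, ki: float = 0.1, v_max: float = 1.0, dt: float = 0.02):
        self.kp, self.ki, self.v_max, self.dt = kp, ki, v_max, dt
        self.integral = 0.0  # accumulated distance error

    def __call__(self, distance_error: float) -> float:
        self.integral += distance_error * self.dt
        v_cmd = self.kp * distance_error + self.ki * self.integral
        return max(-self.v_max, min(self.v_max, v_cmd))

controller = PIVelocityFromDistance()
print(controller(distance_error=0.5))  # forward velocity command in m/s
```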
The observations, rewards and terminations of the environment can be found in the base_env_setup file.
The input to the agent is an RGB-D image of shape (128, 128, 4) per environment (a 4D tensor once the parallel environments are batched), which is well suited to a CNN-based policy. The observation group 'sim' is used internally, for example for reward generation and terminations. (observations mdp)
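As a rough sketch (not the actual implementation), the observation term that assembles this input could look like the function below; the sensor name and the camera output keys "rgb" and "distance_to_image_plane" follow common Isaac Lab conventions and are assumptions here that may differ by version and from this codebase.

```python
import torch

def rgbd_image(env, sensor_name: str = "camera") -> torch.Tensor:
    """Stack RGB and depth into an RGB-D observation of shape (num_envs, H, W, 4).

    Assumes an Isaac Lab-style camera sensor whose data buffer exposes an "rgb"
    image and a "distance_to_image_plane" depth image; both names are assumptions.
    """
    camera = env.scene[sensor_name]
    rgb = camera.data.output["rgb"][..., :3].float() / 255.0   # (N, H, W, 3), normalized
    depth = camera.data.output["distance_to_image_plane"]      # (N, H, W, 1) or (N, H, W)
    if depth.dim() == 3:                                        # add channel dim if missing
        depth = depth.unsqueeze(-1)
    return torch.cat([rgb, depth], dim=-1)                      # (N, H, W, 4)
```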
The rewards consist of an intermediate reward for being close to the ball (L2 distance below a threshold) and a final reward for being oriented correctly. Negative rewards are also used to avoid certain situations, such as overly long runs and sliding along walls. A termination command has not yet been taken into account in the reward design, but it would be a good idea for real robots. (rewards mdp)
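A hedged sketch of the shaping terms described above; the threshold, weights, and function names are illustrative and not the values used in the repository:

```python
import torch

def ball_proximity_reward(robot_pos: torch.Tensor, ball_pos: torch.Tensor,
                          threshold: float = 0.5) -> torch.Tensor:
    """Intermediate reward: 1 for every env whose L2 distance to the ball is
    below the threshold, 0 otherwise (threshold is an illustrative value)."""
    dist = torch.linalg.norm(robot_pos - ball_pos, dim=-1)
    return (dist < threshold).float()

def facing_ball_reward(robot_heading: torch.Tensor, to_ball_dir: torch.Tensor) -> torch.Tensor:
    """Final reward: cosine similarity between the robot heading and the unit
    direction to the ball, so being oriented correctly scores close to 1."""
    cos_sim = torch.sum(robot_heading * to_ball_dir, dim=-1)
    return torch.clamp(cos_sim, min=0.0)

def episode_length_penalty(step_count: torch.Tensor, weight: float = -0.01) -> torch.Tensor:
    """Small negative reward per step to discourage overly long runs."""
    return weight * torch.ones_like(step_count, dtype=torch.float)
```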
The rest of the Managers look like the following:
Training in Isaac Lab is done by wrapping the environment in multiple wrappers, starting with registering it as a gymnasium environment (gymnasium was previously known as OpenAI Gym). This is done in the `__init__.py` file.
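A minimal sketch of what this registration typically looks like in a task package's `__init__.py`, assuming the Isaac Lab 1.x entry point; the task id and the config entry points below are hypothetical placeholders, not this project's actual names.

```python
import gymnasium as gym

# Hypothetical task id and entry points; the real ones live in this package's
# __init__.py and the agents folder.
gym.register(
    id="Isaac-Go2-BallChase-v0",
    entry_point="omni.isaac.lab.envs:ManagerBasedRLEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": "my_task.base_env_setup:BallChaseEnvCfg",
        "sb3_cfg_entry_point": "my_task.agents:sb3_ppo_cfg.yaml",
    },
)
```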
Afterward, the environment can be wrapped for video recording, logging, etc.
Finally, the environment is wrapped for a training framework such as Stable Baselines3
or rl_games. The configurations can be found in the agents folder, and
an example pipeline for sb3 is in the train.py file.
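The sb3 pipeline in train.py roughly follows the sequence sketched below. This is a hedged sketch: the task id, the parse_env_cfg helper, and the Sb3VecEnvWrapper import path are tied to a particular Isaac Lab release and may not match this repository exactly.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Isaac Lab helpers; these import paths match Isaac Lab 1.x and may differ in
# other releases (e.g. isaaclab_tasks / isaaclab_rl in newer ones).
from omni.isaac.lab_tasks.utils import parse_env_cfg
from omni.isaac.lab_tasks.utils.wrappers.sb3 import Sb3VecEnvWrapper

TASK = "Isaac-Go2-BallChase-v0"  # hypothetical task id from the registration above

# 1. Build the env config and create the registered environment. In the actual
#    train.py the simulation app is launched first (typically headless, with
#    thousands of parallel environments).
env_cfg = parse_env_cfg(TASK, num_envs=2048)
env = gym.make(TASK, cfg=env_cfg)

# 2. Optional wrappers, e.g. gymnasium's video recorder when rendering is enabled.
# env = gym.wrappers.RecordVideo(env, video_folder="videos")

# 3. Adapt the vectorized Isaac Lab environment to the SB3 VecEnv interface.
env = Sb3VecEnvWrapper(env)

# 4. Train PPO with a CNN policy on the RGB-D observations and save checkpoints.
agent = PPO("CnnPolicy", env, verbose=1, tensorboard_log="logs")
agent.learn(
    total_timesteps=5_000_000,
    callback=CheckpointCallback(save_freq=100_000, save_path="checkpoints"),
)
agent.save("ppo_go2_ballchase")
env.close()
```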
We can train many instances in parallel (on the order of thousands, depending on the amount of graphics memory and the environment) using headless mode, which disables full visual rendering and only keeps the required buffers in VRAM. Training then runs on the order of tens of thousands of FPS.
Below is a video example with 8 environments with rendering on.
We were not able to get a final model due to time constraints, but we tried to train a model using the PPO algorithm with both MLP and CNN architectures, whose configs are defined in the agents folder (an illustrative sketch of such a config follows below).
Using TensorBoard to visualize the training, we can see that the model is minimizing the loss, but the rewards were not increasing as expected; instead, they were decreasing over time, which needs further investigation.
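The agent configs essentially boil down to PPO hyperparameters plus a policy choice. The illustration below uses standard Stable Baselines3 argument names, but the values are placeholders and not the ones defined in the repository:

```python
# Illustrative PPO settings for the two architectures; hyperparameter names are
# standard Stable Baselines3 arguments, the values are placeholders.
PPO_COMMON = dict(
    n_steps=64,          # rollout length per env; total batch = n_steps * num_envs
    batch_size=4096,
    n_epochs=5,
    learning_rate=3e-4,
    gamma=0.99,
    clip_range=0.2,
    ent_coef=0.005,
)

AGENT_CONFIGS = {
    "mlp": {"policy": "MlpPolicy", **PPO_COMMON},
    "cnn": {"policy": "CnnPolicy", **PPO_COMMON},
}

# Usage with the training sketch above: PPO(env=env, verbose=1, **AGENT_CONFIGS["cnn"])
```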
We also faced last-minute problems with mesh collisions not working properly, which led to performance and scaling problems while training.