Isaac SIM task

This folder contains the source code for the Isaac Lab task. The environment is almost ready, but the training pipeline is not yet functional and still needs some tweaks.

Simulator

We are using Isaac Lab (previously known as Isaac Orbit), an RL-oriented wrapper around Isaac Sim. The simulator is developed by NVIDIA and provides high-performance, high-fidelity simulation with real-time ray tracing, allowing us to run parallelized environments entirely on the GPU.

Requirements

  • NVIDIA RTX GPU with CUDA 11.0 or higher and at least 12 GB of VRAM
  • Ubuntu 22.04 or higher
  • A reasonably high-end CPU

Examples of possible tasks

The simulator provides native ROS2 support and a set of sensors, including RGBD cameras, IMUs, and LiDAR, to mention a few. While learning how to use the simulator, we created a simple environment with a GO2 robot controlled by a trained low-level policy and ROS2 teleop, and we were also able to read sensor data such as LiDAR (note: the visible points are for debugging).

This is shown in the video below (click).
[video: lidar_go2]

It is also possible to create more complex tasks, such as the one shown in the video below (click), where an agent has to navigate rough randomized terrain using force and laser sensors.
[video: rough_terrain]

Our task

Scene

We started by creating a simple room to serve as the environment for the task. It has lighting and multiple groups of meshes (for example ceiling, floor, wall_wood, wall_reg, ...), which allows for coherent randomization when we train (for example of materials and textures).
[image: basic_room]

We also created a model for the lab room.


We can then spawn a model of the robot, represented as a cuboid with the same dimensions as the Go2 and an RGBD camera attached to it, together with a red ball. Both spawn at random locations in the room each time the environment is reset. (scene config)
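The randomized spawning on reset can be sketched as follows. This is a minimal illustration, not the actual scene config: the room extents, the minimum separation, and the helper name are all assumptions.

```python
# Sketch of randomized robot/ball spawning on environment reset.
# Room extents and min_separation are hypothetical values.
import numpy as np

ROOM_X = (-2.0, 2.0)   # assumed room extent along x, in metres
ROOM_Y = (-3.0, 3.0)   # assumed room extent along y, in metres

def sample_spawn(rng: np.random.Generator, min_separation: float = 0.5):
    """Sample robot and ball positions at least `min_separation` apart."""
    low = np.array([ROOM_X[0], ROOM_Y[0]])
    high = np.array([ROOM_X[1], ROOM_Y[1]])
    while True:
        robot = rng.uniform(low, high)
        ball = rng.uniform(low, high)
        # rejection sampling: avoid spawning the ball inside the robot
        if np.linalg.norm(robot - ball) >= min_separation:
            return robot, ball
```

In a manager-based Isaac Lab environment this kind of sampling would live in a reset event term; here it is shown standalone for clarity.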


Control

We were able to control our robot using both distance-based and velocity-based commands; the video below shows the robot moving with distance commands (KI controller). In the end we opted for velocity commands, as they are closer to the actual robot's API. (actions config) (video)
[video: distance_control]
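The distance-based control can be sketched as an integral (KI) controller that turns the remaining distance into a clamped velocity command. The gain and velocity limit below are illustrative, not the values in the actions config.

```python
# Minimal sketch of a KI (integral-gain) controller that tracks a
# distance command by emitting velocity commands. Gains are hypothetical.
class KIController:
    def __init__(self, ki: float = 0.5, v_max: float = 1.0):
        self.ki = ki            # integral gain (assumed value)
        self.v_max = v_max      # velocity command limit (assumed value)
        self.integral = 0.0

    def step(self, distance_error: float, dt: float) -> float:
        """Accumulate the distance error and return a clamped velocity."""
        self.integral += distance_error * dt
        v = self.ki * self.integral
        return max(-self.v_max, min(self.v_max, v))
```

A velocity-based action space skips this layer entirely, which is one reason it maps more directly onto the real robot's API.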

Reinforcement learning environment

The observations, rewards and terminations of the environment can be found in the base_env_setup file.

The input to the agent is a tensor of shape (128, 128, 4) (RGBD), which is well suited to a CNN-based policy. The group 'sim' is used internally, for example for reward generation and terminations. (observations mdp)
[image: observations]
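The (128, 128, 4) observation is simply an RGB image with a depth channel appended. A small sketch of how such a tensor can be assembled; the helper name is ours, only the shapes come from the text.

```python
# Sketch: stack an (H, W, 3) RGB image and an (H, W) depth map into the
# (H, W, 4) RGBD observation the policy consumes.
import numpy as np

def make_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Concatenate RGB and depth along the channel axis."""
    assert rgb.shape[:2] == depth.shape, "RGB and depth resolutions must match"
    return np.concatenate([rgb, depth[..., None]], axis=-1)

obs = make_rgbd(np.zeros((128, 128, 3), np.float32),
                np.zeros((128, 128), np.float32))
assert obs.shape == (128, 128, 4)
```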

The rewards are an intermediate reward for being close to the ball (L2 distance smaller than a threshold) and a final reward for being oriented correctly. Negative rewards are also used to avoid certain situations such as long runs and wall slides. A termination command has not yet been taken into account in the reward design, but it is a good idea for real robots. (rewards mdp)
[image: rewards]
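The reward structure above can be sketched in a few lines. All thresholds and weights here are hypothetical placeholders; the real terms live in the rewards mdp config.

```python
# Hedged sketch of the reward described in the text: a proximity bonus
# below an L2 distance threshold, an orientation bonus, and penalties
# for long runs and wall contact. All constants are made up.
import numpy as np

def compute_reward(robot_pos, ball_pos, heading_error,
                   steps, wall_contact,
                   dist_threshold=0.5, orient_threshold=0.2):
    reward = 0.0
    dist = np.linalg.norm(np.asarray(robot_pos) - np.asarray(ball_pos))
    if dist < dist_threshold:
        reward += 1.0                      # intermediate: close to the ball
        if abs(heading_error) < orient_threshold:
            reward += 5.0                  # final: oriented correctly
    reward -= 0.01 * steps                 # discourage long runs
    if wall_contact:
        reward -= 0.5                      # discourage wall slides
    return reward
```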

The rest of the Managers look like the following:


Training pipeline

Training in Isaac Lab is done by wrapping the environment in multiple layers, starting with registering it as a gymnasium environment (gymnasium was previously known as OpenAI Gym). This is done in the init.py file.
Afterwards, the environment can be wrapped for video recording, logging, etc.
Finally, the environment is wrapped for a training framework such as Stable-Baselines3 or rl_games. The configurations can be found in the agents folder, and an example pipeline for SB3 in the train.py file.
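The layering can be illustrated with the wrapper pattern itself. The class names below are stand-ins for the gymnasium / video / SB3 wrappers, not the real APIs: each layer adds one concern and delegates to the environment it wraps.

```python
# Illustration of the wrapper pattern used by the pipeline. These classes
# are hypothetical stand-ins, not gymnasium / Isaac Lab / SB3 classes.
class BaseEnv:
    def step(self, action):
        return f"obs({action})"

class VideoWrapper:
    """Would record a frame per step, then delegate."""
    def __init__(self, env):
        self.env = env
    def step(self, action):
        return self.env.step(action)

class Sb3Wrapper:
    """Would convert data to the training framework's format, then delegate."""
    def __init__(self, env):
        self.env = env
    def step(self, action):
        return self.env.step(action)

# Wrappers compose from the inside out: base env, then video, then SB3.
env = Sb3Wrapper(VideoWrapper(BaseEnv()))
```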

We can train many instances in parallel (on the order of thousands, depending on the amount of graphics memory and the environment) using headless mode, which disables full visual rendering and keeps only the required buffers in VRAM; training then runs at tens of thousands of FPS.

Below is a video example with 8 environments and rendering enabled.
[video: envs]

Results

We were not able to obtain a final model due to time constraints, but we tried training a model with the PPO algorithm using both MLP and CNN architectures, whose configs are defined in the agents folder.

Using TensorBoard to visualize the training, we can see that the model is minimizing its loss, but the rewards did not increase as expected; instead they decreased over time, which needs further investigation.

We also faced last-minute problems with mesh collisions not working properly, which led to performance and scaling problems during training.
[image: incompatible mesh]