Human2Sim2Robot: Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration
teaser.mov
This repository contains the official implementation of the Human2Sim2Robot framework, which was introduced in Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration. It consists of:
- Real-to-Sim: Digital twin creation for the robot, environment, and objects.
- Human Video Demonstration: Data collection and processing.
- Simulation-based Policy Learning: RL training in simulation to learn dexterous manipulation policies.
- Sim-to-Real: Policy deployment in the real world.
human2sim2robot
├── assets
│   └── // Assets such as robot URDF files, object models, etc.
├── data
│   └── // Store raw video demos, processed demo data, etc.
├── docs
│   └── // Documentation
├── pretrained_models
│   └── // Pretrained models
├── runs
│   └── // Simulation-based policy learning training runs
└── human2sim2robot
    ├── // Source code
    ├── hardware_deployment
    │   └── // Sim-to-real hardware deployment of policy
    ├── human_demo
    │   └── // Human video demo collection and processing
    ├── ppo
    │   └── // PPO training and evaluation
    ├── real_to_sim
    │   └── // Real-to-sim digital twin creation
    ├── sim_training
    │   └── // Simulation-based policy learning
    └── utils
        └── // Shared utilities
Please see the Installation documentation for more details.
Please run all commands from the root directory of this repository.
For most scripts in this codebase, you can add --help after the script filepath to see the command-line arguments.
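For example, to see the arguments of the download script used later in this README:
python human2sim2robot/utils/download.py --help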
You can run commands even faster with tab completion. Run the setup below to get tab completion for most scripts in this codebase. This is experimental and only works for zsh (not bash). It is just a small quality-of-life improvement, so feel free to skip it!
One-time run to set up tab completion (takes a minute):
python human2sim2robot/utils/setup_zsh_tab_completion.py
Run this in every new session (or put it in ~/.zshrc with the proper path):
fpath+=`pwd`/.zsh_tab_completion
autoload -Uz compinit && compinit
Now, for most scripts in this codebase, you will have tab autocompletion. For example:
python human2sim2robot/utils/download.py \
-
After the script filepath, you can enter a - and then press "TAB" to get auto-completion!
Note that this does not apply to the sim training script, which uses hydra.
For the sim training script, enable tab completion of command-line arguments by running the following (or putting this into your .zshrc with the proper path):
eval "$(python human2sim2robot/sim_training/run.py -sc install=bash)"
For example:
python human2sim2robot/sim_training/run.py \
te
You can now press "TAB" to get auto-completion!
WARNING: These two tab completions seem to conflict with each other. Thus, only use one at a time. Sorry :(
Fill out this form to download the Human2Sim2Robot assets, data, and pretrained models. After filling out this form, you will receive a URL <download_url>, which will be used next.
NOTE: Navigating to <download_url> will take you to a forbidden page. This is expected. We use it in the steps below by setting this environment variable:
export DOWNLOAD_URL=<download_url>
First, we will download the assets, human demo raw data (all tasks), human demo processed data (all tasks), and pretrained models:
python human2sim2robot/utils/download.py \
--download_url ${DOWNLOAD_URL} \
--include_assets \
--all_tasks \
--include_human_demo_raw_data \
--include_human_demo_processed_data \
--include_pretrained_models
This should create folders for assets, data/human_demo_raw_data/, data/human_demo_processed_data/, and pretrained_models. To choose a specific task, you can replace --all_tasks with --task_name <TASK_NAME> (e.g., snackbox_push, plate_pivotliftrack).
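For example, to download everything for only the snackbox_push task:
python human2sim2robot/utils/download.py \
--download_url ${DOWNLOAD_URL} \
--include_assets \
--task_name snackbox_push \
--include_human_demo_raw_data \
--include_human_demo_processed_data \
--include_pretrained_models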
The goal of this section is to create a digital twin of the robot, objects, camera, and environment. The details for this section can be found in the Real-to-Sim documentation.
Note that you can skip this section and simply use the provided digital twins in assets/.
The goal of this section is to collect human video demonstrations of a task. The details for this section can be found in the Human Demonstration documentation.
Note that you can skip this section and simply use the provided data in data/human_demo_raw_data/ and data/human_demo_processed_data/.
For example:
export TASK_NAME=snackbox_push # You can replace this with your task name (plate_pivotliftrack)
export OBJ_PATH=assets/kiri/snackbox/snackbox.obj # You can replace this with your object obj path (assets/ikea/plate/plate.obj)
export FPS=30 # You can replace this with your FPS
You should have the following structure:
data/
├── human_demo_processed_data
│   └── $TASK_NAME
│       ├── hand_pose_trajectory
│       ├── object_pose_trajectory
│       └── retargeted_robot.npz
└── human_demo_raw_data
    └── $TASK_NAME
        ├── depth
        ├── rgb
        ├── masks
        ├── hand_masks
        └── cam_K.txt
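As an optional sanity check, you can count the frames in the raw data (the frames are expected to be numbered like 00000.png, matching the %05d pattern used by the visualization commands below):
ls data/human_demo_raw_data/$TASK_NAME/rgb | wc -l
ls data/human_demo_raw_data/$TASK_NAME/depth | wc -l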
You can run the following to create videos to visualize the data:
export VIDEOS_DIR=data/human_demo_videos/$TASK_NAME
mkdir -p $VIDEOS_DIR
# RGB
ffmpeg \
-framerate $FPS \
-i data/human_demo_raw_data/$TASK_NAME/rgb/%05d.png \
-c:v libx264 \
-pix_fmt yuv420p \
$VIDEOS_DIR/rgb.mp4
# Depth
ffmpeg \
-framerate $FPS \
-i data/human_demo_raw_data/$TASK_NAME/depth/%05d.png \
-c:v libx264 \
-pix_fmt yuv420p \
$VIDEOS_DIR/depth.mp4
# Object Masks
ffmpeg \
-framerate $FPS \
-i data/human_demo_raw_data/$TASK_NAME/masks/%05d.png \
-c:v libx264 \
-pix_fmt yuv420p \
$VIDEOS_DIR/masks.mp4
# Hand Masks
ffmpeg \
-framerate $FPS \
-i data/human_demo_raw_data/$TASK_NAME/hand_masks/%05d.png \
-c:v libx264 \
-pix_fmt yuv420p \
$VIDEOS_DIR/hand_masks.mp4
# Object Pose Trajectory
ffmpeg \
-framerate $FPS \
-i data/human_demo_processed_data/$TASK_NAME/object_pose_trajectory/track_vis/%05d.png \
-c:v libx264 \
-pix_fmt yuv420p \
$VIDEOS_DIR/object_pose_trajectory.mp4
# Hand Pose Trajectory
ffmpeg \
-framerate $FPS \
-i data/human_demo_processed_data/$TASK_NAME/hand_pose_trajectory/%05d.png \
-c:v libx264 \
-pix_fmt yuv420p \
$VIDEOS_DIR/hand_pose_trajectory.mp4
Human2Sim2Robot_Example_nicer.mp4
Visualize the human demonstration in a 3D viewer (with only hand keypoints instead of full hand meshes):
python human2sim2robot/human_demo/visualize_demo.py \
--obj-path $OBJ_PATH \
--object-poses-dir data/human_demo_processed_data/$TASK_NAME/object_pose_trajectory/ob_in_cam \
--hand-poses-dir data/human_demo_processed_data/$TASK_NAME/hand_pose_trajectory/ \
--visualize-table
Snackbox_Push_Demo_Data.mp4
Visualize the human demonstration in a 3D viewer (with full hand meshes, which is slower to load):
python human2sim2robot/human_demo/visualize_demo.py \
--obj-path $OBJ_PATH \
--object-poses-dir data/human_demo_processed_data/$TASK_NAME/object_pose_trajectory/ob_in_cam \
--hand-poses-dir data/human_demo_processed_data/$TASK_NAME/hand_pose_trajectory/ \
--visualize-table \
--visualize-hand-meshes # This makes things slower to load, but you can see the full hand mesh
Snackbox_Push_Demo_Data_Hand.mp4
Visualize the human demonstration and the robot retargeted trajectory in a 3D viewer:
python human2sim2robot/human_demo/visualize_demo.py \
--obj-path $OBJ_PATH \
--object-poses-dir data/human_demo_processed_data/$TASK_NAME/object_pose_trajectory/ob_in_cam \
--hand-poses-dir data/human_demo_processed_data/$TASK_NAME/hand_pose_trajectory/ \
--visualize-table \
--robot-file iiwa_allegro.yml \
--retargeted-robot-file data/human_demo_processed_data/$TASK_NAME/retargeted_robot.npz \
--visualize-transforms
Snackbox_Push_Demo_Robot.mp4
snackbox_push_sim.mp4
ours.mp4
The goal of this section is to train a policy in simulation using the processed human demonstration data and evaluate these policies in simulation. The details for this section can be found in the Sim Training documentation.
For this step, you need data in data/human_demo_processed_data/ as described above.
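A quick way to confirm the processed data is in place for your task:
ls data/human_demo_processed_data/$TASK_NAME
# Expect: hand_pose_trajectory  object_pose_trajectory  retargeted_robot.npz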
If you have a pre-trained policy in the following directory:
export PRETRAINED_POLICY_DIR=pretrained_models/Experiment_snackbox_push_2025-03-28_10-33-38-374687
$PRETRAINED_POLICY_DIR
├── config_resolved.yaml
├── config.yaml
├── log.txt
└── nn
    └── best.pth
To test this pre-trained policy, run the following:
python human2sim2robot/sim_training/run.py \
--config-path ../../$PRETRAINED_POLICY_DIR \
--config-name config \
device_id=0 \
graphics_device_id=0 \
test=True \
num_envs=1 \
train.player.deterministic=False \
headless=False \
task.sim.enable_viewer_sync_at_start=True \
task.env.custom.enableDebugViz=False \
task.randomize=False \
checkpoint=$PRETRAINED_POLICY_DIR/nn/best.pth
Note that we have the ../../ because when the config path is specified as a relative path, it needs to be relative to the sim_training directory. If you specify the config path as an absolute path, you can remove the ../../.
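For example, a minimal sketch of the same test command with an absolute config path, assuming you launch it from the repository root (so $(pwd) expands to the repo root); the other overrides from above can be kept as-is:
python human2sim2robot/sim_training/run.py \
--config-path $(pwd)/$PRETRAINED_POLICY_DIR \
--config-name config \
test=True \
num_envs=1 \
checkpoint=$PRETRAINED_POLICY_DIR/nn/best.pth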
You can also run this other pretrained policy:
export PRETRAINED_POLICY_DIR=pretrained_models/Experiment_plate_pivotliftrack_2025-03-28_17-08-54-978176
From the simulator, you can press "R" to reset the environment and "E" to toggle debug visualizations.
To train a policy, run the following:
export URDF_PATH=assets/kiri/snackbox/snackbox.urdf # You can replace this with your object urdf path
export TASK_NAME=snackbox_push # You can replace this with your task name
python human2sim2robot/sim_training/run.py \
task=CrossEmbodiment \
train=CrossEmbodimentPPOLSTM \
experiment=Experiment_$TASK_NAME \
wandb_group=my_wandb_group \
task.env.custom.ENABLE_FABRIC_COLLISION_AVOIDANCE=True \
task.env.custom.FABRIC_CSPACE_DAMPING=65 \
task.env.custom.FABRIC_CSPACE_DAMPING_HAND=5 \
task.env.controlFrequencyInv=4 \
task.env.custom.USE_CUROBO=True \
task.randomize=True \
randomization_params=RandomizationParams_medium \
task.env.custom.object_urdf_path=$URDF_PATH \
task.env.custom.retargeted_robot_file=data/human_demo_processed_data/$TASK_NAME/retargeted_robot.npz \
task.env.custom.object_poses_dir=data/human_demo_processed_data/$TASK_NAME/object_pose_trajectory/ob_in_cam \
headless=True \
device_id=0 \
graphics_device_id=0
You can remove task.randomize=True and randomization_params=RandomizationParams_medium if you want to train a policy without domain randomization.
This will train a policy and save the results in runs/<experiment>_<timestamp>. The policy can then be tested similarly to above, after changing:
export PRETRAINED_POLICY_DIR=runs/<experiment>_<timestamp>
You can also train using this different task:
export URDF_PATH=assets/ikea/plate/plate.urdf # You can replace this with your object urdf path
export TASK_NAME=plate_pivotliftrack # You can replace this with your task name
If you are using wandb, you can easily test a policy you see there: open the associated run, go to "Overview", and copy the command. Then go to "Files", then "runs", then <experiment>_<timestamp>, then "nn", then "best.pth", and copy the URL. We put these together and add/remove some key arguments for testing:
test=True \
num_envs=1 \
train.player.deterministic=False \
headless=False \
task.sim.enable_viewer_sync_at_start=True \
task.env.custom.enableDebugViz=False \
checkpoint="https://wandb.ai/tylerlum/cross_embodiment/groups/my_wandb_group/files/runs/Experiment_snackbox_push_2025-03-27_11-16-39-387984/nn/best.pth\?runName\=Experiment_snackbox_push_2025-03-27_11-16-39-387984_v491d1sj"
Note that the URL is surrounded by double quotes, and the "?" and "=" are escaped with "\" (the shell typically adds these escapes automatically if you copy and paste the URL on its own).
For a full example:
python human2sim2robot/sim_training/run.py \
task=CrossEmbodiment \
train=CrossEmbodimentPPOLSTM \
experiment=Experiment_snackbox_push \
wandb_group=my_wandb_group \
task.env.custom.ENABLE_FABRIC_COLLISION_AVOIDANCE=True \
task.env.custom.FABRIC_CSPACE_DAMPING=65 \
task.env.custom.FABRIC_CSPACE_DAMPING_HAND=5 \
task.env.controlFrequencyInv=4 \
task.env.custom.USE_CUROBO=True \
task.randomize=True \
randomization_params=RandomizationParams_medium \
task.env.custom.object_urdf_path=assets/kiri/snackbox/snackbox.urdf \
task.env.custom.retargeted_robot_file=data/human_demo_processed_data/snackbox_push/retargeted_robot.npz \
task.env.custom.object_poses_dir=data/human_demo_processed_data/snackbox_push/object_pose_trajectory/ob_in_cam \
test=True \
num_envs=1 \
train.player.deterministic=False \
headless=False \
task.sim.enable_viewer_sync_at_start=True \
task.env.custom.enableDebugViz=False \
checkpoint="https://wandb.ai/tylerlum/cross_embodiment/groups/my_wandb_group/files/runs/Experiment_snackbox_push_2025-03-27_11-16-39-387984/nn/best.pth\?runName\=Experiment_snackbox_push_2025-03-27_11-16-39-387984_v491d1sj"
The goal of this section is to deploy the policy in the real world. The details for this section can be found in the Hardware Deployment documentation. For this step, you need to have a policy in $PRETRAINED_POLICY_DIR as described above.
NOTE: The purpose of this documentation is NOT to be super precise and detailed, but rather to be a quick reference for how the hardware deployment roughly works.
This is the RL policy we trained in simulation.
flowchart LR
subgraph "Inputs"
A["/object_pose"]
B["/goal_object_pose"]
C["/iiwa/joint_states"]
D["/allegroHand_0/joint_states"]
E["/fabric_state"]
end
RL["rl_policy_node"]
subgraph "Outputs"
F["/palm_target"]
G["/hand_target"]
end
A --> RL
B --> RL
C --> RL
D --> RL
E --> RL
RL --> F
RL --> G
- /object_pose is the current 6D pose of the object to be manipulated
- /goal_object_pose is the desired 6D pose of the object
- /iiwa/joint_states provides the current joint states of the IIWA arm
- /allegroHand_0/joint_states provides the current joint states of the Allegro hand
- /fabric_state is the joint state of the geometric fabric controller
- /palm_target is a 6D vector (xyz position and euler RPY) for the palm that the robot should move towards, used by the geometric fabric controller
- /hand_target is a 5D vector controlling the robot hand, derived from PCA dimensionality reduction of the 16 joint angles to 5 eigengrasps
This is the geometric fabric controller, which should run similarly to the one used during training.
flowchart LR
subgraph "Inputs"
A["/palm_target"]
B["/hand_target"]
C["/iiwa/joint_states"]
D["/allegroHand_0/joint_states"]
end
IAF["fabric_ros_node"]
subgraph "Outputs"
E["/fabric_state"]
F["/iiwa/joint_cmd"]
G["/allegroHand_0/joint_cmd"]
end
A --> IAF
B --> IAF
C --> IAF
D --> IAF
IAF --> E
IAF --> F
IAF --> G
- /palm_target is the 6D target pose (xyz position and euler RPY) for the palm from the RL policy
- /hand_target is the 5D eigengrasp vector for controlling the Allegro hand from the RL policy
- /iiwa/joint_states provides the current joint states of the IIWA arm
- /allegroHand_0/joint_states provides the current joint states of the Allegro hand
- /fabric_state publishes the current state of the geometric fabric controller
- /iiwa/joint_cmd outputs the joint position targets for the IIWA arm (not published if PUBLISH_CMD = False)
- /allegroHand_0/joint_cmd outputs the joint position targets for the Allegro hand (not published if PUBLISH_CMD = False)
The geometric fabric controller typically runs at 60Hz, but the robot arm and hand can run at 200-1000Hz. This node upsamples the fabric state positions (otherwise the 60Hz targets would be like "step changes"). This is visualized below.
flowchart LR
subgraph "Inputs"
A["/fabric_state"]
end
FU["fabric_upsampler_ros_node"]
subgraph "Outputs"
B["/iiwa/joint_cmd"]
C["/allegroHand_0/joint_cmd"]
end
A --> FU
FU --> B
FU --> C
- /fabric_state is the joint state of the geometric fabric controller at the standard rate (60Hz)
- /iiwa/joint_cmd outputs upsampled joint position targets for the IIWA arm (at 200Hz or 1000Hz)
- /allegroHand_0/joint_cmd outputs upsampled joint position targets for the Allegro hand (at 200Hz or 1000Hz)
- When using this node, set PUBLISH_CMD = False in the fabric_ros_node to avoid conflicting commands
Below, we show what the visualizer looks like:
Plate_Push_RealWorldVisualization_Trimmed.mp4
ruff check --extend-select I --fix human2sim2robot; ruff format human2sim2robot
@inproceedings{TODO,
title = {TODO},
author = {TODO},
booktitle = {TODO},
year = {TODO},
url = {TODO}
}
If you have any questions, issues, or feedback, please contact Tyler Lum.