+
+
+
\ No newline at end of file
diff --git a/docs/data/0.json b/docs/data/0.json
new file mode 100644
index 00000000..06279d31
--- /dev/null
+++ b/docs/data/0.json
@@ -0,0 +1,544 @@
+{
+ "0": {
+ "file_id": 0,
+ "content": "/README.md",
+ "type": "filepath"
+ },
+ "1": {
+ "file_id": 0,
+ "content": "This code includes ACT, Diffusion Policy, and VINN implementations with two simulated environments, installation instructions for dependencies and environment, demo scripts, data generation and visualization guides, training tips, and expected success rate evaluation.",
+ "type": "summary"
+ },
+ "2": {
+ "file_id": 0,
+ "content": "# Imitation Learning algorithms and Co-training for Mobile ALOHA\n#### Project Website: https://mobile-aloha.github.io/\nThis repo contains the implementation of ACT, Diffusion Policy and VINN, together with 2 simulated environments:\nTransfer Cube and Bimanual Insertion. You can train and evaluate them in sim or real.\nFor real, you would also need to install [Mobile ALOHA](https://github.com/MarkFzp/mobile-aloha). This repo is forked from the [ACT repo](https://github.com/tonyzhaozh/act).\n### Updates:\nYou can find all scripted/human demo for simulated environments [here](https://drive.google.com/drive/folders/1gPR03v05S1xiInoVJn7G7VJ9pDCnxq9O?usp=share_link).\n### Repo Structure\n- ``imitate_episodes.py`` Train and Evaluate ACT\n- ``policy.py`` An adaptor for ACT policy\n- ``detr`` Model definitions of ACT, modified from DETR\n- ``sim_env.py`` Mujoco + DM_Control environments with joint space control\n- ``ee_sim_env.py`` Mujoco + DM_Control environments with EE space control\n- ``scripted_policy.py`` Scripted policies for sim environments",
+ "type": "code",
+ "location": "/README.md:1-20"
+ },
+ "3": {
+ "file_id": 0,
+ "content": "This code contains the implementation of ACT, Diffusion Policy, and VINN along with two simulated environments (Transfer Cube and Bimanual Insertion) that can be trained and evaluated in sim or real settings. It also requires installing Mobile ALOHA from a separate repository, which has been forked from the ACT repo. The code is organized into several Python files, each responsible for specific aspects of the algorithms or environments. Demo scripts for simulated environments are available online.",
+ "type": "comment"
+ },
+ "4": {
+ "file_id": 0,
+ "content": "- ``constants.py`` Constants shared across files\n- ``utils.py`` Utils such as data loading and helper functions\n- ``visualize_episodes.py`` Save videos from a .hdf5 dataset\n### Installation\n conda create -n aloha python=3.8.10\n conda activate aloha\n pip install torchvision\n pip install torch\n pip install pyquaternion\n pip install pyyaml\n pip install rospkg\n pip install pexpect\n pip install mujoco==2.3.7\n pip install dm_control==1.0.14\n pip install opencv-python\n pip install matplotlib\n pip install einops\n pip install packaging\n pip install h5py\n pip install ipython\n cd act/detr && pip install -e .\n- also need to install https://github.com/ARISE-Initiative/robomimic/tree/r2d2 (note the r2d2 branch) for Diffusion Policy by `pip install -e .`\n### Example Usages\nTo set up a new terminal, run:\n conda activate aloha\n cd \n### Simulated experiments (LEGACY table-top ALOHA environments)\nWe use ``sim_transfer_cube_scripted`` task in the examples below. Another option is ``sim_insertion_scripted``.",
+ "type": "code",
+ "location": "/README.md:21-57"
+ },
+ "5": {
+ "file_id": 0,
+ "content": "This code provides installation instructions for the environment and dependencies needed to run the ALOHA codebase. It also mentions the necessary steps to set up a new terminal and highlights some of the available simulation experiments.",
+ "type": "comment"
+ },
+ "6": {
+ "file_id": 0,
+ "content": "To generated 50 episodes of scripted data, run:\n python3 record_sim_episodes.py --task_name sim_transfer_cube_scripted --dataset_dir --num_episodes 50\nTo can add the flag ``--onscreen_render`` to see real-time rendering.\nTo visualize the simulated episodes after it is collected, run\n python3 visualize_episodes.py --dataset_dir --episode_idx 0\nNote: to visualize data from the mobile-aloha hardware, use the visualize_episodes.py from https://github.com/MarkFzp/mobile-aloha\nTo train ACT:\n # Transfer Cube task\n python3 imitate_episodes.py --task_name sim_transfer_cube_scripted --ckpt_dir --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_epochs 2000 --lr 1e-5 --seed 0\nTo evaluate the policy, run the same command but add ``--eval``. This loads the best validation checkpoint.\nThe success rate should be around 90% for transfer cube, and around 50% for insertion.\nTo enable temporal ensembling, add flag ``--temporal_agg``.",
+ "type": "code",
+ "location": "/README.md:58-77"
+ },
+ "7": {
+ "file_id": 0,
+ "content": "This code provides instructions for generating and visualizing data, training the ACT model, and evaluating its performance. It also mentions the expected success rates for different tasks and includes an option for temporal ensembling.",
+ "type": "comment"
+ },
+ "8": {
+ "file_id": 0,
+ "content": "Videos will be saved to ```` for each rollout.\nYou can also add ``--onscreen_render`` to see real-time rendering during evaluation.\nFor real-world data where things can be harder to model, train for at least 5000 epochs or 3-4 times the length after the loss has plateaued.\nPlease refer to [tuning tips](https://docs.google.com/document/d/1FVIZfoALXg_ZkYKaYVh-qOlaXveq5CtvJHXkY25eYhs/edit?usp=sharing) for more info.\n### [ACT tuning tips](https://docs.google.com/document/d/1FVIZfoALXg_ZkYKaYVh-qOlaXveq5CtvJHXkY25eYhs/edit?usp=sharing)\nTL;DR: if your ACT policy is jerky or pauses in the middle of an episode, just train for longer! Success rate and smoothness can improve way after loss plateaus.",
+ "type": "code",
+ "location": "/README.md:78-85"
+ },
+ "9": {
+ "file_id": 0,
+ "content": "This code snippet provides instructions for saving videos to a specified directory during rollouts, and suggests using the \"--onscreen_render\" option for real-time rendering. It recommends training for at least 5000 epochs or three to four times the length of data after loss plateaus for better results in real-world scenarios. The code also provides a link to additional tuning tips for further information and emphasizes that longer training can improve success rate and smoothness even when the loss has plateaued.",
+ "type": "comment"
+ },
+ "10": {
+ "file_id": 1,
+ "content": "/__init__.py",
+ "type": "filepath"
+ },
+ "11": {
+ "file_id": 1,
+ "content": "The code snippet appears to be incomplete or empty. There is no visible functionality that can be described or commented upon in this context. Please provide more information or a complete code sample for accurate analysis and commenting.",
+ "type": "summary"
+ },
+ "12": {
+ "file_id": 1,
+ "content": "w",
+ "type": "code",
+ "location": "/__init__.py:1-1"
+ },
+ "13": {
+ "file_id": 1,
+ "content": "The code snippet appears to be incomplete or empty. There is no visible functionality that can be described or commented upon in this context. Please provide more information or a complete code sample for accurate analysis and commenting.",
+ "type": "comment"
+ },
+ "14": {
+ "file_id": 2,
+ "content": "/align.py",
+ "type": "filepath"
+ },
+ "15": {
+ "file_id": 2,
+ "content": "This code imports modules, defines a calibration function for head cam and symmetrical arms, creates instances of InterbotixManipulatorXS bots, sets arm positions to sleep for 2 seconds, and opens grippers.",
+ "type": "summary"
+ },
+ "16": {
+ "file_id": 2,
+ "content": "from interbotix_xs_modules.arm import InterbotixManipulatorXS\nfrom aloha_scripts.robot_utils import move_arms, torque_on, move_grippers\nfrom constants import PUPPET_GRIPPER_JOINT_OPEN, PUPPET_GRIPPER_JOINT_CLOSE\nimport argparse\nimport numpy as np\n# for calibrating head cam and arms being symmetrical\ndef main():\n argparser = argparse.ArgumentParser()\n argparser.add_argument('--all', action='store_true', default=False)\n args = argparser.parse_args()\n puppet_bot_left = InterbotixManipulatorXS(robot_model=\"vx300s\", group_name=\"arm\", gripper_name=\"gripper\", robot_name=f'puppet_left', init_node=True)\n puppet_bot_right = InterbotixManipulatorXS(robot_model=\"vx300s\", group_name=\"arm\", gripper_name=\"gripper\", robot_name=f'puppet_right', init_node=False)\n all_bots = [puppet_bot_left, puppet_bot_right]\n for bot in all_bots:\n torque_on(bot)\n multiplier = np.array([-1, 1, 1, -1, 1, 1])\n puppet_sleep_position_left = np.array([-0.8, -0.5, 0.5, 0, 0.65, 0])\n puppet_sleep_position_right = puppet_sleep_position_left * multiplier",
+ "type": "code",
+ "location": "/align.py:1-23"
+ },
+ "17": {
+ "file_id": 2,
+ "content": "Code imports necessary modules and defines a function for calibrating head cam and symmetrical arms. It creates instances of InterbotixManipulatorXS for left and right puppet bots, turns on torque, and initializes positions based on multipliers for symmetry.",
+ "type": "comment"
+ },
+ "18": {
+ "file_id": 2,
+ "content": " all_positions = [puppet_sleep_position_left, puppet_sleep_position_right]\n move_arms(all_bots, all_positions, move_time=2)\n # move_grippers(all_bots, [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=1) # open\nif __name__ == '__main__':\n main()",
+ "type": "code",
+ "location": "/align.py:24-31"
+ },
+ "19": {
+ "file_id": 2,
+ "content": "Sets all bots' arm positions to sleep positions for 2 seconds, then opens grippers.",
+ "type": "comment"
+ },
+ "20": {
+ "file_id": 3,
+ "content": "/commands.txt",
+ "type": "filepath"
+ },
+ "21": {
+ "file_id": 3,
+ "content": "The code trains RL models, preprocesses data, and experiments with hyperparameters. It creates a Conda environment, trains multi-task camera views for mobile chair tasks, caches features, evaluates VINN model, and uses separate dataset directories and checkpoints.",
+ "type": "summary"
+ },
+ "22": {
+ "file_id": 3,
+ "content": "conda activate mimic\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\npython3 imitate_episodes.py \\\n--task_name sim_transfer_cube_human \\\n--ckpt_dir /scr/tonyzhao/train_logs/vq_test \\\n--policy_class ACT --kl_weight 10 --chunk_size 100 \\\n--hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \\\n--num_epochs 10000 --lr 1e-5 --seed 0 --vq\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name all \\\n--ckpt_dir /scr/tonyzhao/train_logs/pretrain_all \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 24 --dim_feedforward 3200 --num_epochs 5000 --lr 1e-4 --seed 0\n#### NOTE to reproduce this experiment, uncomment the sim data filtering in utils.py\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name all \\\n--ckpt_dir /scr/tonyzhao/train_logs/pretrain_all \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 24 --dim_feedforward 3200 --lr 1e-4 --seed 0 \\",
+ "type": "code",
+ "location": "/commands.txt:2-28"
+ },
+ "23": {
+ "file_id": 3,
+ "content": "This code activates a conda environment, sets up some environment variables, and then runs Python scripts with different parameters for model training and experimentation. It seems to be related to reinforcement learning tasks using the MUJOCO library. The code executes multiple experiments with varying hyperparameters to train and evaluate models on different datasets or tasks.",
+ "type": "comment"
+ },
+ "24": {
+ "file_id": 3,
+ "content": "--num_steps 1000000 --eval_every 10000000000 --validate_every 2000 --save_every 5000\n# generate mirrored data\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\npython3 record_sim_episodes.py --task_name sim_transfer_cube_scripted_mirror --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --num_episodes 50\npython3 postprocess_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --num_episodes 50\n# the sim_transfer_cube_scripted_mirror will have 100 episodes\n# I then copy the whole dir to sim_transfer_cube_scripted then removed all mirrored episodes\n# this gives sim_transfer_cube_scripted_mirror (100 episodes) and sim_transfer_cube_scripted (50 episodes)\n# visualize the original data\npython3 visualize_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --episode_idx 0\n# visualize the artificially mirrored data\npython3 visualize_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --episode_idx 0 --ismirror",
+ "type": "code",
+ "location": "/commands.txt:29-43"
+ },
+ "25": {
+ "file_id": 3,
+ "content": "This code generates mirrored data for a simulation task, creates two dataset directories (one with 100 episodes and the other with 50), visualizes original and artificially mirrored data from the first episode in the dataset. The user then activates a conda environment, changes to the directory containing the code, and runs Python scripts to accomplish these tasks.",
+ "type": "comment"
+ },
+ "26": {
+ "file_id": 3,
+ "content": "# sanity check\n# replay the mirrored data action in the original env\npython3 replay_episodes.py --dataset_path /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror/mirror_episode_0.hdf5\n# replay the original data action in the original env\npython3 replay_episodes.py --dataset_path /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror/episode_0.hdf5\n# launch experiment on original data\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 12 --dim_feedforward 3200 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000 --no_encoder\n# launch experiment on all data\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted_mirror \\",
+ "type": "code",
+ "location": "/commands.txt:45-69"
+ },
+ "27": {
+ "file_id": 3,
+ "content": "The code sanity checks the mirrored and original data by replaying the actions in their respective environments, then launches experiments on both datasets using the ACT policy with specified parameters.",
+ "type": "comment"
+ },
+ "28": {
+ "file_id": 3,
+ "content": "--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_mirror \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 12 --dim_feedforward 3200 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000 --no_encoder\n####### DIFFUSION POLICY\n- first install https://github.com/ARISE-Initiative/robomimic/tree/r2d2 (note the r2d2 branch)\n- on top of it pip install the current repo requirements\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\",
+ "type": "code",
+ "location": "/commands.txt:70-97"
+ },
+ "29": {
+ "file_id": 3,
+ "content": "The code is running a Python script named \"imitate_episodes.py\" from the act-plus-plus repository, training a policy for imitation learning using different configurations. It switches between two policies (ACT and Diffusion) with varying hyperparameters, such as chunk size, batch size, and number of steps. The code also specifies the task name, checkpoint directory, and activates a specific conda environment before running the script on different GPUs.",
+ "type": "comment"
+ },
+ "30": {
+ "file_id": 3,
+ "content": "--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_1 \\\n--policy_class Diffusion --chunk_size 16 \\\n--batch_size 32 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000\n# above are all 100 train diffusion steps, 1e-5\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_2_50step_1e-4 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000\n# Dec 10\n######################## more diffusion ########################\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_3_chunk64 \\\n--policy_class Diffusion --chunk_size 64 \\",
+ "type": "code",
+ "location": "/commands.txt:98-125"
+ },
+ "31": {
+ "file_id": 3,
+ "content": "The code snippet is used to train and evaluate a policy for a task named \"sim_transfer_cube_scripted\" using different configurations. It activates a specific conda environment, sets the MUJOCO_GL environment variable, changes directory to the project's root, and executes the imitate_episodes.py script multiple times with varying parameters such as CUDA device, learning rate, chunk size, and checkpoint directories. The code seems to be part of a larger training process involving different diffusion steps, potentially for model performance optimization or comparison.",
+ "type": "comment"
+ },
+ "32": {
+ "file_id": 3,
+ "content": "--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 200000 --eval_every 4000 --validate_every 4000 --save_every 4000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_4_regressionTest \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_5_noEMA \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus",
+ "type": "code",
+ "location": "/commands.txt:126-152"
+ },
+ "33": {
+ "file_id": 3,
+ "content": "This code activates a conda environment, sets MUJOCO_GL to egl, changes directory to act-plus-plus, and runs three different python scripts with varying hyperparameters for training and evaluation on the \"sim_transfer_cube_scripted\" task. The policy class is set to Diffusion and chunk size is 32. Each script has different checkpoint directories, numbers of steps, and evaluation frequencies.",
+ "type": "comment"
+ },
+ "34": {
+ "file_id": 3,
+ "content": "CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_6_noEMA_seed1 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 1 \\\n--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000\n###### Diffusion Real ######\n## deploy\npython3 imitate_episodes.py --task_name aloha_mobile_wipe_wine --ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/wipe_wine_diffusion_augmentation_seed0/ --policy_class Diffusion --chunk_size 32 --batch_size 32 --lr 1e-4 --seed 0 --num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000 --eval\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000",
+ "type": "code",
+ "location": "/commands.txt:153-173"
+ },
+ "35": {
+ "file_id": 3,
+ "content": "This code is training and evaluating a diffusion-based policy model for two different tasks: \"sim_transfer_cube_scripted\" and \"aloha_mobile_wipe_wine\". It specifies the necessary command line arguments such as task name, checkpoint directory, policy class, chunk size, batch size, learning rate, seed, number of steps, evaluation frequency, validation frequency, and save frequency. The code also sets the CUDA device, environment variables, and activates a conda environment before running the training and evaluation scripts.",
+ "type": "comment"
+ },
+ "36": {
+ "file_id": 3,
+ "content": "## Cotrain\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# train no cotrain again with augmentations\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_augmentation_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n## Cotrain with augmentations\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\",
+ "type": "code",
+ "location": "/commands.txt:175-201"
+ },
+ "37": {
+ "file_id": 3,
+ "content": "This code is activating the mobile conda environment, setting MUJOCO_GL to egl, and running Python scripts in the act-plus-plus directory. It trains a model (Diffusion policy) for two different tasks: \"aloha\\_mobile\\_wipe\\_wine\\_cotrain\" and \"aloha\\_mobile\\_wipe\\_wine\". The first task is trained again with augmentations, while the second task is trained with augmentations. The code is running on CUDA device 0 and 1, saving models every 5000 steps, evaluating every 100,000 steps, and validating every 5,000 steps for a total of 1,000,000 steps.",
+ "type": "comment"
+ },
+ "38": {
+ "file_id": 3,
+ "content": "--task_name aloha_mobile_wipe_wine_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# try chunk size 64, no cotrain\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_augmentation_chunk64_seed0 \\\n--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# chunk 64 with cotrain\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_seed0 \\",
+ "type": "code",
+ "location": "/commands.txt:202-227"
+ },
+ "39": {
+ "file_id": 3,
+ "content": "The code is executing two different training jobs for a robotics task called 'aloha_mobile_wipe_wine'. It first trains the model with chunk size 32 and cotrain, then with chunk size 64 without cotrain. It also validates and saves models every 5000 steps. The code requires specific environment activation and environmental variable settings.",
+ "type": "comment"
+ },
+ "40": {
+ "file_id": 3,
+ "content": "--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# chunk 64 with cotrain + EMA\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_2_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_ema_seed0 \\\n--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# chunk 64 with cotrain + EMA + 3e-4\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_2_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_ema_3e-4_seed0 \\\n--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 3e-4 --seed 0 \\",
+ "type": "code",
+ "location": "/commands.txt:228-256"
+ },
+ "41": {
+ "file_id": 3,
+ "content": "This code activates the conda environment, sets environment variables, and runs a Python script to train a diffusion policy model with chunk size 64 for a task named \"aloha\\_mobile\\_wipe\\_wine\\_2\\_cotrain\". It saves checkpoints every 5000 steps. The first command trains the model with learning rate 1e-4, while the second one trains it with learning rate 3e-4.",
+ "type": "comment"
+ },
+ "42": {
+ "file_id": 3,
+ "content": "--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n######################## VINN ########################\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name top --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=0 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name left_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=sim_transfer_cube_scripted\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt",
+ "type": "code",
+ "location": "/commands.txt:257-278"
+ },
+ "43": {
+ "file_id": 3,
+ "content": "This code activates a conda environment, changes directory, sets CUDA_VISIBLE_DEVICES, and runs the train.py script for different camera names with the same seed in a loop, then it switches to another conda environment and runs a vinn_cache_feature.py script using the saved checkpoint path.",
+ "type": "comment"
+ },
+ "44": {
+ "file_id": 3,
+ "content": "TASK_NAME=sim_transfer_cube_scripted\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test \\\n--task_name $TASK_NAME \n## TODO\nmake sure env is consistent\ntune a bit more\n######################## VINN Real ########################\n### test backward compatibility\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name top --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name left_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA",
+ "type": "code",
+ "location": "/commands.txt:280-307"
+ },
+ "45": {
+ "file_id": 3,
+ "content": "This code is running a series of commands to train and evaluate the VINN model on the sim_transfer_cube_scripted task. It first selects the dataset, loads the pre-trained model, evaluates it, and then tests backward compatibility with two different camera names ('top' and 'left_wrist'). The environment is activated, specific CUDA devices are set, and the training process is executed for both cameras using the byol_pytorch package.",
+ "type": "comment"
+ },
+ "46": {
+ "file_id": 3,
+ "content": "_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=sim_transfer_cube_scripted\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt\nTASK_NAME=sim_transfer_cube_scripted\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test \\\n--task_name $TASK_NAME \n### new data loader passed backward compatibility\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_high --seed 0",
+ "type": "code",
+ "location": "/commands.txt:307-331"
+ },
+ "47": {
+ "file_id": 3,
+ "content": "Training a BYOL model for the sim_transfer_cube_scripted task, evaluating the trained model using vinn_eval.py, and utilizing the vinn_select_k.py to choose K best features from the dataset.",
+ "type": "comment"
+ },
+ "48": {
+ "file_id": 3,
+ "content": "#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_left_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_right_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_high --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_high --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_left_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_right_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_high --seed 0",
+ "type": "code",
+ "location": "/commands.txt:332-346"
+ },
+ "49": {
+ "file_id": 3,
+ "content": "This code snippet executes Python training scripts using CUDA for various tasks and cameras. It activates a specific conda environment, changes the directory to the relevant project folder, and trains models with different configurations (single-camera or co-trained) on tasks such as aloha_mobile_wipe_wine and aloha_mobile_wash_pan.",
+ "type": "comment"
+ },
+ "50": {
+ "file_id": 3,
+ "content": "#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_right_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_high --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_right_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_high --seed 0",
+ "type": "code",
+ "location": "/commands.txt:347-362"
+ },
+ "51": {
+ "file_id": 3,
+ "content": "This code snippet is running Python scripts using the CUDA_VISIBLE_DEVICES environment variable to control which GPU(s) are used. The commands are training different models for various tasks such as aloha_mobile_wash_pan_cotrain, aloha_mobile_elevator_truncated, etc., using different camera names and seeds. Some models are trained on the cam_left_wrist, cam_right_wrist, or cam_high cameras. The code is activated using Conda environments.",
+ "type": "comment"
+ },
+ "52": {
+ "file_id": 3,
+ "content": "CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_right_wrist --seed 0\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine_cotrain\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1",
+ "type": "code",
+ "location": "/commands.txt:363-387"
+ },
+ "53": {
+ "file_id": 3,
+ "content": "The code is running two different Python scripts in a conda environment, training models on specific tasks (aloha_mobile_elevator_truncated_cotrain and aloha_mobile_wipe_wine_cotrain), using CUDA device 1. It then uses these trained models to cache features for the corresponding datasets and sets the CUDA visible devices, changing directories between actions.",
+ "type": "comment"
+ },
+ "54": {
+ "file_id": 3,
+ "content": "cd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wash_pan\nDATA_NAME=aloha_mobile_wash_pan\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wash_pan_cotrain\nDATA_NAME=aloha_mobile_wash_pan\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_elevator_truncated\nDATA_NAME=aloha_mobile_elevator_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}",
+ "type": "code",
+ "location": "/commands.txt:388-409"
+ },
+ "55": {
+ "file_id": 3,
+ "content": "The code activates a conda environment called \"mobile\", sets the CUDA_VISIBLE_DEVICES environment variable to 1, and runs the vinn_cache_feature.py script for multiple tasks using different checkpoint paths and dataset directories.",
+ "type": "comment"
+ },
+ "56": {
+ "file_id": 3,
+ "content": "conda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_elevator_truncated_cotrain\nDATA_NAME=aloha_mobile_elevator_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\n# push chair task\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=0 \ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\npython3 train.py --task aloha_mobile_chair_truncated --cam_name cam_high --seed 0\npython3 train.py --task aloha_mobile_chair_truncated --cam_name cam_left_wrist --seed 0\npython3 train.py --task aloha_mobile_chair_truncated --cam_name cam_right_wrist --seed 0\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_chair_truncated\nDATA_NAME=aloha_mobile_chair_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\",
+ "type": "code",
+ "location": "/commands.txt:411-433"
+ },
+ "57": {
+ "file_id": 3,
+ "content": "This code activates a specific conda environment, sets the visible CUDA devices, changes directories, and runs multiple training scripts for different camera views in a mobile chair task. It then activates another environment, changes directories again, and runs a feature caching script on trained models for the chair and elevator tasks.",
+ "type": "comment"
+ },
+ "58": {
+ "file_id": 3,
+ "content": "--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\npython3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_high --seed 0\npython3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_left_wrist --seed 0\npython3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_right_wrist --seed 0\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_chair_truncated_cotrain\nDATA_NAME=aloha_mobile_chair_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\n# cache feature again for wipe wine\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=0\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_c",
+ "type": "code",
+ "location": "/commands.txt:434-459"
+ },
+ "59": {
+ "file_id": 3,
+ "content": "This code snippet trains a BYOL model on the aloha_mobile_chair_truncated_cotrain task, then uses vinn_cache_feature.py to cache features for wipe wine dataset. It activates a conda environment, sets CUDA_VISIBLE_DEVICES, changes directories, and runs Python training scripts with specific parameters.",
+ "type": "comment"
+ },
+ "60": {
+ "file_id": 3,
+ "content": "ache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine_cotrain\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\n# run on real robot\nTASK_NAME=aloha_mobile_chair_truncated\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME} \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0 \\\n--task_name $TASK_NAME ",
+ "type": "code",
+ "location": "/commands.txt:459-481"
+ },
+ "61": {
+ "file_id": 3,
+ "content": "This code is running a series of commands to train and evaluate a vision-in-nervous-system (VINN) model. The model is being trained on different datasets for various tasks such as chair recognition, mobile wipe, and wine classification. The commands use Python scripts with specific paths and arguments to perform these tasks.",
+ "type": "comment"
+ },
+ "62": {
+ "file_id": 3,
+ "content": "TASK_NAME=aloha_mobile_chair_truncated\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME} \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0 \\\n--task_name $TASK_NAME \n# eval on real robot\nconda activate aloha\ncd /home/mobile-aloha/interbotix_ws/src/act\nTASK_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/byol-${TASK_NAME}-DUMMY-seed-0.pt\nTASK_NAME=aloha_mobile_wipe_wine\npython3 vinn_select_k.py \\\n--dataset_dir /home/mobile-aloha/data/${TASK_NAME} \\\n--ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/VINN-eval-seed-0-test \\\nTASK_NAME=aloha_mobile_wipe_wine\npython3 vinn_eval.py \\\n--dataset_dir /home/mobile-aloha/data/${TASK_NAME} \\",
+ "type": "code",
+ "location": "/commands.txt:485-514"
+ },
+ "63": {
+ "file_id": 3,
+ "content": "The code runs two separate python scripts for evaluating and training models on different datasets. The first set of commands trains a model using the VINN approach and BYOL implementation, while the second set of commands evaluates and caches features on a real robot. Both processes involve multiple dataset directories and checkpoint paths to train/evaluate/cache feature sets.",
+ "type": "comment"
+ },
+ "64": {
+ "file_id": 3,
+ "content": "--model_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/VINN-eval-seed-0-test \\\n--task_name $TASK_NAME \n---------------------------------------------------------------------------------------\nNOTE: chunk size cannot be any number, try before launching\nTODO: Add history, EMA at test time\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 train_actuator_network.py",
+ "type": "code",
+ "location": "/commands.txt:515-527"
+ },
+ "65": {
+ "file_id": 3,
+ "content": "This code activates a conda environment, changes to the project directory, sets the CUDA device, and runs a Python script for training an actuator network. The task name is provided as a variable, but the chunk size and some additional features are noted for future improvement.",
+ "type": "comment"
+ },
+ "66": {
+ "file_id": 4,
+ "content": "/compress_data.py",
+ "type": "filepath"
+ },
+ "67": {
+ "file_id": 4,
+ "content": "The code compresses images, handles HDF5 datasets, and processes videos. It removes depth images, concatenates camera videos, decompresses/compresses images, and saves the first episode video.",
+ "type": "summary"
+ },
+ "68": {
+ "file_id": 4,
+ "content": "\"\"\"\nExample usage:\n$ python3 script/compress_data.py --dataset_dir /scr/lucyshi/dataset/aloha_test\n\"\"\"\nimport os\nimport h5py\nimport cv2\nimport numpy as np\nimport argparse\nfrom tqdm import tqdm\n# Constants\nDT = 0.02\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\ndef compress_dataset(input_dataset_path, output_dataset_path):\n # Check if output path exists\n if os.path.exists(output_dataset_path):\n print(f\"The file {output_dataset_path} already exists. Exiting...\")\n return\n # Load the uncompressed dataset\n with h5py.File(input_dataset_path, 'r') as infile:\n # Create the compressed dataset\n with h5py.File(output_dataset_path, 'w') as outfile:\n outfile.attrs['sim'] = infile.attrs['sim']\n outfile.attrs['compress'] = True\n # Copy non-image data directly\n for key in infile.keys():\n if key != 'observations':\n outfile.copy(infile[key], key)",
+ "type": "code",
+ "location": "/compress_data.py:1-35"
+ },
+ "69": {
+ "file_id": 4,
+ "content": "The code compresses a dataset by creating a new compressed HDF5 file. It checks if the output path already exists, loads the uncompressed dataset, creates the compressed dataset with the same non-image data and attributes, and then copies over only the 'observations' key from the input file to the output file.",
+ "type": "comment"
+ },
+ "70": {
+ "file_id": 4,
+ "content": " obs_group = infile['observations']\n # Create observation group in the output\n out_obs_group = outfile.create_group('observations')\n # Copy non-image data in observations directly\n for key in obs_group.keys():\n if key != 'images':\n out_obs_group.copy(obs_group[key], key)\n image_group = obs_group['images']\n out_image_group = out_obs_group.create_group('images')\n # JPEG compression parameters\n encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50]\n compressed_lens = [] # List to store compressed lengths for each camera\n for cam_name in image_group.keys():\n if \"_depth\" in cam_name: # Depth images are not compressed\n out_image_group.copy(image_group[cam_name], cam_name)\n else:\n images = image_group[cam_name]\n compressed_images = []\n cam_compressed_lens = [] # List to store compressed lengths for this camera",
+ "type": "code",
+ "location": "/compress_data.py:37-61"
+ },
+ "71": {
+ "file_id": 4,
+ "content": "Creates observation group in output file, copies non-image data, creates image group in observations, applies JPEG compression parameters, skips depth images, stores compressed lengths for each camera.",
+ "type": "comment"
+ },
+ "72": {
+ "file_id": 4,
+ "content": " # Compress each image\n for image in images:\n result, encoded_image = cv2.imencode('.jpg', image, encode_param)\n compressed_images.append(encoded_image)\n cam_compressed_lens.append(len(encoded_image)) # Store the length\n compressed_lens.append(cam_compressed_lens)\n # Find the maximum length of the compressed images\n max_len = max(len(img) for img in compressed_images)\n # Create dataset to store compressed images\n compressed_dataset = out_image_group.create_dataset(cam_name, (len(compressed_images), max_len), dtype='uint8')\n # Store compressed images\n for i, img in enumerate(compressed_images):\n compressed_dataset[i, :len(img)] = img\n # Save the compressed lengths to the HDF5 file\n compressed_lens = np.array(compressed_lens)",
+ "type": "code",
+ "location": "/compress_data.py:63-82"
+ },
+ "73": {
+ "file_id": 4,
+ "content": "This code compresses images and stores their lengths in a list. It then finds the maximum length of the compressed images and creates a dataset to store them in an HDF5 file, with the same length as the number of images. Finally, it saves the compressed lengths to the HDF5 file.",
+ "type": "comment"
+ },
+ "74": {
+ "file_id": 4,
+ "content": " _ = outfile.create_dataset('compress_len', compressed_lens.shape)\n outfile['/compress_len'][...] = compressed_lens\n print(f\"Compressed dataset saved to {output_dataset_path}\")\ndef save_videos(video, dt, video_path=None):\n if isinstance(video, list):\n cam_names = list(video[0].keys())\n h, w, _ = video[0][cam_names[0]].shape\n w = w * len(cam_names)\n fps = int(1/dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n # bitrate = 1000000\n # out.set(cv2.VIDEOWRITER_PROP_BITRATE, bitrate)\n for ts, image_dict in enumerate(video):\n images = []\n for cam_name in cam_names:\n image = image_dict[cam_name]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n images.append(image)\n images = np.concatenate(images, axis=1)\n out.write(images)\n out.release()\n print(f'Saved video to: {video_path}')\n elif isinstance(video, dict):",
+ "type": "code",
+ "location": "/compress_data.py:83-108"
+ },
+ "75": {
+ "file_id": 4,
+ "content": "Code saves a compressed dataset to the specified output path. It first checks if the video is in a list or dictionary format, and then creates a VideoWriter object with the desired parameters. For each frame of the video, it concatenates images from all cameras into one image, swaps B and R channels, and writes the resulting image to the output file. Finally, it releases the VideoWriter object and prints the saved video path.",
+ "type": "comment"
+ },
+ "76": {
+ "file_id": 4,
+ "content": " cam_names = list(video.keys())\n # Remove depth images\n cam_names = [cam_name for cam_name in cam_names if '_depth' not in cam_name]\n all_cam_videos = []\n for cam_name in cam_names:\n all_cam_videos.append(video[cam_name])\n all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension\n n_frames, h, w, _ = all_cam_videos.shape\n fps = int(1 / dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for t in range(n_frames):\n image = all_cam_videos[t]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n out.write(image)\n out.release()\n print(f'Saved video to: {video_path}')\ndef load_and_save_first_episode_video(dataset_dir, video_path):\n dataset_name = 'episode_0'\n _, _, _, _, image_dict = load_hdf5(dataset_dir, dataset_name)\n save_videos(image_dict, DT, video_path=video_path)\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')",
+ "type": "code",
+ "location": "/compress_data.py:109-135"
+ },
+ "77": {
+ "file_id": 4,
+ "content": "This code loads an HDF5 dataset, removes depth images, concatenates remaining camera videos along the width dimension, saves the resulting video, and provides functions for loading and saving the first episode video.",
+ "type": "comment"
+ },
+ "78": {
+ "file_id": 4,
+ "content": " if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n compressed = root.attrs.get('compress', False)\n image_dict = dict()\n for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n if compressed:\n compress_len = root['/compress_len'][()]\n if compressed:\n for cam_id, cam_name in enumerate(image_dict.keys()):\n padded_compressed_image_list = image_dict[cam_name]\n image_list = []\n for frame_id, padded_compressed_image in enumerate(padded_compressed_image_list):\n image_len = int(compress_len[cam_id, frame_id])\n compressed_image = padded_compressed_image\n image = cv2.imdecode(compressed_image, 1)\n image_list.append(image)\n image_dict[cam_name] = image_list\n return None, None, None, None, image_dict # Return only the image dict for this application",
+ "type": "code",
+ "location": "/compress_data.py:136-159"
+ },
+ "79": {
+ "file_id": 4,
+ "content": "This code checks if the dataset file exists, loads compressed images from the file, and returns an image dictionary. If the dataset file is missing, it prints a message and exits. Compressed images are loaded for each camera, and the compressed images are decompressed into a list of images per camera. The final result is the image dictionary containing these lists of images.",
+ "type": "comment"
+ },
+ "80": {
+ "file_id": 4,
+ "content": "if __name__ == '__main__':\n parser = argparse.ArgumentParser(description=\"Compress all HDF5 datasets in a directory.\")\n parser.add_argument('--dataset_dir', action='store', type=str, required=True, help='Directory containing the uncompressed datasets.')\n args = parser.parse_args()\n output_dataset_dir = args.dataset_dir + '_compressed'\n os.makedirs(output_dataset_dir, exist_ok=True)\n # Iterate over each file in the directory\n for filename in tqdm(os.listdir(args.dataset_dir), desc=\"Compressing data\"):\n if filename.endswith('.hdf5'):\n input_path = os.path.join(args.dataset_dir, filename)\n output_path = os.path.join(output_dataset_dir, filename)\n compress_dataset(input_path, output_path)\n # After processing all datasets, load and save the video for the first episode\n print(f'Saving video for episode 0 in {output_dataset_dir}')\n video_path = os.path.join(output_dataset_dir, 'episode_0_video.mp4')\n load_and_save_first_episode_video(output_dataset_dir, video_path)",
+ "type": "code",
+ "location": "/compress_data.py:162-181"
+ },
+ "81": {
+ "file_id": 4,
+ "content": "This code compresses all HDF5 datasets in a specified directory. It requires the directory path, creates a compressed dataset directory, iterates over each file ending with '.hdf5', compresses the dataset using 'compress_dataset' function, and after processing all datasets, loads and saves the video for the first episode.",
+ "type": "comment"
+ },
+ "82": {
+ "file_id": 5,
+ "content": "/conda_env.yaml",
+ "type": "filepath"
+ },
+ "83": {
+ "file_id": 5,
+ "content": "This YAML file defines a Conda environment named \"aloha\" with specified channels, Python version, and required packages for the codebase.",
+ "type": "summary"
+ },
+ "84": {
+ "file_id": 5,
+ "content": "name: aloha\nchannels:\n - pytorch\n - nvidia\n - conda-forge\ndependencies:\n - python=3.9\n - pip=23.0.1\n - pytorch=2.0.0\n - torchvision=0.15.0\n - pytorch-cuda=11.8\n - pyquaternion=0.9.9\n - pyyaml=6.0\n - rospkg=1.5.0\n - pexpect=4.8.0\n - mujoco=2.3.3\n - dm_control=1.0.9\n - py-opencv=4.7.0\n - matplotlib=3.7.1\n - einops=0.6.0\n - packaging=23.0\n - h5py=3.8.0\n - ipython=8.12.0",
+ "type": "code",
+ "location": "/conda_env.yaml:1-23"
+ },
+ "85": {
+ "file_id": 5,
+ "content": "This YAML file defines a Conda environment named \"aloha\" with specified channels, Python version, and required packages for the codebase.",
+ "type": "comment"
+ },
+ "86": {
+ "file_id": 6,
+ "content": "/constants.py",
+ "type": "filepath"
+ },
+ "87": {
+ "file_id": 6,
+ "content": "This code defines task parameters and simulation environments for robotics applications, including gripper position limits, joint names, and normalization functions for master and puppet grippers.",
+ "type": "summary"
+ },
+ "88": {
+ "file_id": 6,
+ "content": "import pathlib\nimport os\n### Task parameters\nDATA_DIR = '/home/zfu/interbotix_ws/src/act/data' if os.getlogin() == 'zfu' else '/scr/tonyzhao/datasets'\nSIM_TASK_CONFIGS = {\n 'sim_transfer_cube_scripted':{\n 'dataset_dir': DATA_DIR + '/sim_transfer_cube_scripted',\n 'num_episodes': 50,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n 'sim_transfer_cube_human':{\n 'dataset_dir': DATA_DIR + '/sim_transfer_cube_human',\n 'num_episodes': 50,\n 'episode_len': 400,\n 'camera_names': ['top']\n },\n 'sim_insertion_scripted': {\n 'dataset_dir': DATA_DIR + '/sim_insertion_scripted',\n 'num_episodes': 50,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n 'sim_insertion_human': {\n 'dataset_dir': DATA_DIR + '/sim_insertion_human',\n 'num_episodes': 50,\n 'episode_len': 500,\n 'camera_names': ['top']\n },\n 'all': {\n 'dataset_dir': DATA_DIR + '/',",
+ "type": "code",
+ "location": "/constants.py:1-35"
+ },
+ "89": {
+ "file_id": 6,
+ "content": "This code defines constant values for task parameters. It specifies different simulation tasks, their associated dataset directories, the number of episodes, episode length, and camera names. These constants are used for organizing and accessing datasets in the 'DATA_DIR' directory.",
+ "type": "comment"
+ },
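+
+ A short usage sketch of the task configuration dictionary above; the lookup pattern is illustrative, other code in the repo may read these values differently.
+
+ ```python
+ from constants import SIM_TASK_CONFIGS
+
+ task_config = SIM_TASK_CONFIGS['sim_transfer_cube_scripted']
+ dataset_dir = task_config['dataset_dir']    # <DATA_DIR>/sim_transfer_cube_scripted
+ episode_len = task_config['episode_len']    # 400 timesteps per episode
+ camera_names = task_config['camera_names']  # ['top', 'left_wrist', 'right_wrist']
+ ```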
+ "90": {
+ "file_id": 6,
+ "content": " 'num_episodes': None,\n 'episode_len': None,\n 'name_filter': lambda n: 'sim' not in n,\n 'camera_names': ['cam_high', 'cam_left_wrist', 'cam_right_wrist']\n },\n 'sim_transfer_cube_scripted_mirror':{\n 'dataset_dir': DATA_DIR + '/sim_transfer_cube_scripted_mirror',\n 'num_episodes': None,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n 'sim_insertion_scripted_mirror': {\n 'dataset_dir': DATA_DIR + '/sim_insertion_scripted_mirror',\n 'num_episodes': None,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n}\n### Simulation envs fixed constants\nDT = 0.02\nFPS = 50\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTART_ARM_POSE = [0, -0.96, 1.16, 0, -0.3, 0, 0.02239, -0.02239, 0, -0.96, 1.16, 0, -0.3, 0, 0.02239, -0.02239]\nXML_DIR = str(pathlib.Path(__file__).parent.resolve()) + '/assets/' # note: absolute path\n# Left finger position limits (qpos[7]), right_finger = -1 * left_finger",
+ "type": "code",
+ "location": "/constants.py:36-66"
+ },
+ "91": {
+ "file_id": 6,
+ "content": "This code defines a dictionary containing constant values for simulation environments. It includes dataset directories, episode parameters, and camera names for each environment. Additionally, there are constants defining the time step (DT), frame rate (FPS), joint names, initial arm pose, and finger position limits for the simulation. These constants will be used in the simulation processes to ensure consistency across different environments and tasks.",
+ "type": "comment"
+ },
+ "92": {
+ "file_id": 6,
+ "content": "MASTER_GRIPPER_POSITION_OPEN = 0.02417\nMASTER_GRIPPER_POSITION_CLOSE = 0.01244\nPUPPET_GRIPPER_POSITION_OPEN = 0.05800\nPUPPET_GRIPPER_POSITION_CLOSE = 0.01844\n# Gripper joint limits (qpos[6])\nMASTER_GRIPPER_JOINT_OPEN = -0.8\nMASTER_GRIPPER_JOINT_CLOSE = -1.65\nPUPPET_GRIPPER_JOINT_OPEN = 1.4910\nPUPPET_GRIPPER_JOINT_CLOSE = -0.6213\n############################ Helper functions ############################\nMASTER_GRIPPER_POSITION_NORMALIZE_FN = lambda x: (x - MASTER_GRIPPER_POSITION_CLOSE) / (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE)\nPUPPET_GRIPPER_POSITION_NORMALIZE_FN = lambda x: (x - PUPPET_GRIPPER_POSITION_CLOSE) / (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE)\nMASTER_GRIPPER_POSITION_UNNORMALIZE_FN = lambda x: x * (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE) + MASTER_GRIPPER_POSITION_CLOSE\nPUPPET_GRIPPER_POSITION_UNNORMALIZE_FN = lambda x: x * (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE) + PUPPET_GRIPPER_POSITION_CLOSE\nMASTER2P",
+ "type": "code",
+ "location": "/constants.py:67-84"
+ },
+ "93": {
+ "file_id": 6,
+ "content": "This code defines gripper position and joint limits for the master and puppet grippers. It also includes normalization and unnormalization functions to convert gripper positions between normalized and actual values. The purpose is likely to enable consistent handling of gripper positions regardless of their current state.",
+ "type": "comment"
+ },
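+
+ A small numeric check of the linear rescaling these helpers perform, using the constants above. This is a standalone reimplementation for illustration; it does not import the repo's lambdas by name.
+
+ ```python
+ MASTER_OPEN, MASTER_CLOSE = 0.02417, 0.01244
+ PUPPET_OPEN, PUPPET_CLOSE = 0.05800, 0.01844
+
+ def master_normalize(x):
+     return (x - MASTER_CLOSE) / (MASTER_OPEN - MASTER_CLOSE)
+
+ def puppet_unnormalize(x):
+     return x * (PUPPET_OPEN - PUPPET_CLOSE) + PUPPET_CLOSE
+
+ # A half-open master gripper maps to the halfway point of the puppet's range.
+ x = (MASTER_OPEN + MASTER_CLOSE) / 2
+ print(master_normalize(x))                      # 0.5
+ print(puppet_unnormalize(master_normalize(x)))  # 0.03822
+ ```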
+ "94": {
+ "file_id": 6,
+ "content": "UPPET_POSITION_FN = lambda x: PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(MASTER_GRIPPER_POSITION_NORMALIZE_FN(x))\nMASTER_GRIPPER_JOINT_NORMALIZE_FN = lambda x: (x - MASTER_GRIPPER_JOINT_CLOSE) / (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE)\nPUPPET_GRIPPER_JOINT_NORMALIZE_FN = lambda x: (x - PUPPET_GRIPPER_JOINT_CLOSE) / (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE)\nMASTER_GRIPPER_JOINT_UNNORMALIZE_FN = lambda x: x * (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE) + MASTER_GRIPPER_JOINT_CLOSE\nPUPPET_GRIPPER_JOINT_UNNORMALIZE_FN = lambda x: x * (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE) + PUPPET_GRIPPER_JOINT_CLOSE\nMASTER2PUPPET_JOINT_FN = lambda x: PUPPET_GRIPPER_JOINT_UNNORMALIZE_FN(MASTER_GRIPPER_JOINT_NORMALIZE_FN(x))\nMASTER_GRIPPER_VELOCITY_NORMALIZE_FN = lambda x: x / (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE)\nPUPPET_GRIPPER_VELOCITY_NORMALIZE_FN = lambda x: x / (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE)\nMASTE",
+ "type": "code",
+ "location": "/constants.py:84-95"
+ },
+ "95": {
+ "file_id": 6,
+ "content": "This code defines various lambda functions for joint normalization and unnormalization, gripper velocity normalization, as well as a master-to-puppet joint conversion function. These functions are likely used in robotics or similar applications to manipulate and convert gripper positions and velocities between two systems with different open and closed positions.",
+ "type": "comment"
+ },
+ "96": {
+ "file_id": 6,
+ "content": "R_POS2JOINT = lambda x: MASTER_GRIPPER_POSITION_NORMALIZE_FN(x) * (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE) + MASTER_GRIPPER_JOINT_CLOSE\nMASTER_JOINT2POS = lambda x: MASTER_GRIPPER_POSITION_UNNORMALIZE_FN((x - MASTER_GRIPPER_JOINT_CLOSE) / (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE))\nPUPPET_POS2JOINT = lambda x: PUPPET_GRIPPER_POSITION_NORMALIZE_FN(x) * (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE) + PUPPET_GRIPPER_JOINT_CLOSE\nPUPPET_JOINT2POS = lambda x: PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN((x - PUPPET_GRIPPER_JOINT_CLOSE) / (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE))\nMASTER_GRIPPER_JOINT_MID = (MASTER_GRIPPER_JOINT_OPEN + MASTER_GRIPPER_JOINT_CLOSE)/2",
+ "type": "code",
+ "location": "/constants.py:95-100"
+ },
+ "97": {
+ "file_id": 6,
+ "content": "This code defines four lambda functions, two each for the master and puppet grippers. The functions convert gripper positions to joint angles (pos2joint) and vice versa (joint2pos). It also calculates the midpoint of the master gripper's joint range. These functions use normalize and unnormalize FN from respective constants.",
+ "type": "comment"
+ },
+ "98": {
+ "file_id": 7,
+ "content": "/detr/README.md",
+ "type": "filepath"
+ },
+ "99": {
+ "file_id": 7,
+ "content": "This code snippet is modified from the DETR repository and licensed under Apache 2.0. It cites End-to-End Object Detection with Transformers paper as its reference.",
+ "type": "summary"
+ }
+}
\ No newline at end of file
diff --git a/docs/data/1.json b/docs/data/1.json
new file mode 100644
index 00000000..89c3857f
--- /dev/null
+++ b/docs/data/1.json
@@ -0,0 +1,545 @@
+{
+ "100": {
+ "file_id": 7,
+ "content": "This part of the codebase is modified from DETR https://github.com/facebookresearch/detr under APACHE 2.0.\n @article{Carion2020EndtoEndOD,\n title={End-to-End Object Detection with Transformers},\n author={Nicolas Carion and Francisco Massa and Gabriel Synnaeve and Nicolas Usunier and Alexander Kirillov and Sergey Zagoruyko},\n journal={ArXiv},\n year={2020},\n volume={abs/2005.12872}\n }",
+ "type": "code",
+ "location": "/detr/README.md:1-9"
+ },
+ "101": {
+ "file_id": 7,
+ "content": "This code snippet is modified from the DETR repository and licensed under Apache 2.0. It cites End-to-End Object Detection with Transformers paper as its reference.",
+ "type": "comment"
+ },
+ "102": {
+ "file_id": 8,
+ "content": "/detr/main.py",
+ "type": "filepath"
+ },
+ "103": {
+ "file_id": 8,
+ "content": "This script uses argparse to control options for a deep learning model's transformer detector, initializing the model on GPU and creating an AdamW optimizer before returning the model and optimizer.",
+ "type": "summary"
+ },
+ "104": {
+ "file_id": 8,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport argparse\nfrom pathlib import Path\nimport numpy as np\nimport torch\nfrom .models import build_ACT_model, build_CNNMLP_model\nimport IPython\ne = IPython.embed\ndef get_args_parser():\n parser = argparse.ArgumentParser('Set transformer detector', add_help=False)\n parser.add_argument('--lr', default=1e-4, type=float) # will be overridden\n parser.add_argument('--lr_backbone', default=1e-5, type=float) # will be overridden\n parser.add_argument('--batch_size', default=2, type=int) # not used\n parser.add_argument('--weight_decay', default=1e-4, type=float)\n parser.add_argument('--epochs', default=300, type=int) # not used\n parser.add_argument('--lr_drop', default=200, type=int) # not used\n parser.add_argument('--clip_max_norm', default=0.1, type=float, # not used\n help='gradient clipping max norm')\n # Model parameters\n # * Backbone\n parser.add_argument('--backbone', default='resnet18', type=str, # will be overridden",
+ "type": "code",
+ "location": "/detr/main.py:1-25"
+ },
+ "105": {
+ "file_id": 8,
+ "content": "This code imports necessary libraries and functions, defines a parser for command-line arguments, and sets default values for those arguments. It also includes options to customize the backbone model, learning rates, and weight decay for training a transformer detector.",
+ "type": "comment"
+ },
+ "106": {
+ "file_id": 8,
+ "content": " help=\"Name of the convolutional backbone to use\")\n parser.add_argument('--dilation', action='store_true',\n help=\"If true, we replace stride with dilation in the last convolutional block (DC5)\")\n parser.add_argument('--position_embedding', default='sine', type=str, choices=('sine', 'learned'),\n help=\"Type of positional embedding to use on top of the image features\")\n parser.add_argument('--camera_names', default=[], type=list, # will be overridden\n help=\"A list of camera names\")\n # * Transformer\n parser.add_argument('--enc_layers', default=4, type=int, # will be overridden\n help=\"Number of encoding layers in the transformer\")\n parser.add_argument('--dec_layers', default=6, type=int, # will be overridden\n help=\"Number of decoding layers in the transformer\")\n parser.add_argument('--dim_feedforward', default=2048, type=int, # will be overridden\n",
+ "type": "code",
+ "location": "/detr/main.py:26-40"
+ },
+ "107": {
+ "file_id": 8,
+ "content": "This code is defining command line arguments for the main function of a deep learning model. The options include specifying the backbone, enabling dilation in the last convolutional block, choosing the type of positional embedding, and setting the number of encoding and decoding layers as well as the feedforward dimension size in the transformer component of the model.",
+ "type": "comment"
+ },
+ "108": {
+ "file_id": 8,
+ "content": " help=\"Intermediate size of the feedforward layers in the transformer blocks\")\n parser.add_argument('--hidden_dim', default=256, type=int, # will be overridden\n help=\"Size of the embeddings (dimension of the transformer)\")\n parser.add_argument('--dropout', default=0.1, type=float,\n help=\"Dropout applied in the transformer\")\n parser.add_argument('--nheads', default=8, type=int, # will be overridden\n help=\"Number of attention heads inside the transformer's attentions\")\n parser.add_argument('--num_queries', default=400, type=int, # will be overridden\n help=\"Number of query slots\")\n parser.add_argument('--pre_norm', action='store_true')\n # * Segmentation\n parser.add_argument('--masks', action='store_true',\n help=\"Train segmentation head if the flag is provided\")\n # repeat args in imitate_episodes just to avoid error. Will not be used\n parser.add_argument('--eval', action='store_true')",
+ "type": "code",
+ "location": "/detr/main.py:40-56"
+ },
+ "109": {
+ "file_id": 8,
+ "content": "This code is using the argparse module to define command line arguments for a Python script. The arguments include options such as intermediate layer size, hidden dimensions, dropout rate, number of attention heads, number of query slots, pre-normalization, and training segmentation head. The `eval` argument is used to evaluate the model.",
+ "type": "comment"
+ },
+ "110": {
+ "file_id": 8,
+ "content": " parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)\n parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)\n parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--seed', action='store', type=int, help='seed', required=True)\n parser.add_argument('--num_steps', action='store', type=int, help='num_epochs', required=True)\n parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)\n parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)\n parser.add_argument('--temporal_agg', action='store_true')\n parser.add_argument('--use_vq', action='store_true')\n parser.add_argument('--vq_class', action='store', type=int, help='vq_class', required=False)\n parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim', required=False)",
+ "type": "code",
+ "location": "/detr/main.py:57-69"
+ },
+ "111": {
+ "file_id": 8,
+ "content": "The code defines command-line arguments using the \"argparse\" module. It requires a directory for checkpoints, policy class name, task name, seed value, number of steps, and optional arguments like KL weight, chunk size, temporal aggregation, use VQ, VQ class, and VQ dimension.",
+ "type": "comment"
+ },
+ "112": {
+ "file_id": 8,
+ "content": " parser.add_argument('--load_pretrain', action='store_true', default=False)\n parser.add_argument('--action_dim', action='store', type=int, required=False)\n parser.add_argument('--eval_every', action='store', type=int, default=500, help='eval_every', required=False)\n parser.add_argument('--validate_every', action='store', type=int, default=500, help='validate_every', required=False)\n parser.add_argument('--save_every', action='store', type=int, default=500, help='save_every', required=False)\n parser.add_argument('--resume_ckpt_path', action='store', type=str, help='load_ckpt_path', required=False)\n parser.add_argument('--no_encoder', action='store_true')\n parser.add_argument('--skip_mirrored_data', action='store_true')\n parser.add_argument('--actuator_network_dir', action='store', type=str, help='actuator_network_dir', required=False)\n parser.add_argument('--history_len', action='store', type=int)\n parser.add_argument('--future_len', action='store', type=int)\n parser.add_argument('--prediction_len', action='store', type=int)",
+ "type": "code",
+ "location": "/detr/main.py:70-81"
+ },
+ "113": {
+ "file_id": 8,
+ "content": "The code snippet is from a Python script that uses the 'argparse' module to add various command-line arguments with default values, types, and help messages. These arguments control options such as loading pre-trained data, action dimension, evaluation intervals, validation intervals, saving intervals, resuming from a checkpoint file path, skipping mirrored data, and specifying network directories for actuators.",
+ "type": "comment"
+ },
+ "114": {
+ "file_id": 8,
+ "content": " return parser\ndef build_ACT_model_and_optimizer(args_override):\n parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])\n args = parser.parse_args()\n for k, v in args_override.items():\n setattr(args, k, v)\n model = build_ACT_model(args)\n model.cuda()\n param_dicts = [\n {\"params\": [p for n, p in model.named_parameters() if \"backbone\" not in n and p.requires_grad]},\n {\n \"params\": [p for n, p in model.named_parameters() if \"backbone\" in n and p.requires_grad],\n \"lr\": args.lr_backbone,\n },\n ]\n optimizer = torch.optim.AdamW(param_dicts, lr=args.lr,\n weight_decay=args.weight_decay)\n return model, optimizer\ndef build_CNNMLP_model_and_optimizer(args_override):\n parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])\n args = parser.parse_args()\n for k, v in args_override.items():\n setattr(args, k, v)\n model = build_CNNMLP_model(args)",
+ "type": "code",
+ "location": "/detr/main.py:83-116"
+ },
+ "115": {
+ "file_id": 8,
+ "content": "This code defines functions `build_ACT_model_and_optimizer` and `build_CNNMLP_model_and_optimizer`. The functions parse arguments for DETR training and evaluation script, build the respective models, and set up AdamW optimizers with specified learning rates and weight decay.",
+ "type": "comment"
+ },
+ "116": {
+ "file_id": 8,
+ "content": " model.cuda()\n param_dicts = [\n {\"params\": [p for n, p in model.named_parameters() if \"backbone\" not in n and p.requires_grad]},\n {\n \"params\": [p for n, p in model.named_parameters() if \"backbone\" in n and p.requires_grad],\n \"lr\": args.lr_backbone,\n },\n ]\n optimizer = torch.optim.AdamW(param_dicts, lr=args.lr,\n weight_decay=args.weight_decay)\n return model, optimizer",
+ "type": "code",
+ "location": "/detr/main.py:117-129"
+ },
+ "117": {
+ "file_id": 8,
+ "content": "The code initializes the model on GPU, separates backbone and non-backbone parameters into two dictionaries for different learning rates, creates an AdamW optimizer with specified learning rate and weight decay, and returns the model and optimizer.",
+ "type": "comment"
+ },
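+
+ A minimal, self-contained sketch of the two-group optimizer pattern described above; the toy model and learning rates are placeholders standing in for the real policy and its args.
+
+ ```python
+ from collections import OrderedDict
+ import torch
+ from torch import nn
+
+ # Any parameter whose name contains "backbone" gets its own learning rate.
+ model = nn.Sequential(OrderedDict([
+     ('backbone', nn.Linear(8, 8)),   # stands in for the ResNet feature extractor
+     ('head', nn.Linear(8, 2)),       # stands in for the transformer / action head
+ ]))
+ param_dicts = [
+     {"params": [p for n, p in model.named_parameters()
+                 if "backbone" not in n and p.requires_grad]},
+     {"params": [p for n, p in model.named_parameters()
+                 if "backbone" in n and p.requires_grad],
+      "lr": 1e-5},                    # plays the role of args.lr_backbone
+ ]
+ optimizer = torch.optim.AdamW(param_dicts, lr=1e-4, weight_decay=1e-4)
+ print([group['lr'] for group in optimizer.param_groups])  # [0.0001, 1e-05]
+ ```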
+ "118": {
+ "file_id": 9,
+ "content": "/detr/models/__init__.py",
+ "type": "filepath"
+ },
+ "119": {
+ "file_id": 9,
+ "content": "The code imports the build functions for DETR-VAE and CNN+MLP models from their respective modules. It defines two model building functions, `build_ACT_model` and `build_CNNMLP_model`, which return the built models using the imported build functions based on given arguments.",
+ "type": "summary"
+ },
+ "120": {
+ "file_id": 9,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nfrom .detr_vae import build as build_vae\nfrom .detr_vae import build_cnnmlp as build_cnnmlp\ndef build_ACT_model(args):\n return build_vae(args)\ndef build_CNNMLP_model(args):\n return build_cnnmlp(args)",
+ "type": "code",
+ "location": "/detr/models/__init__.py:1-9"
+ },
+ "121": {
+ "file_id": 9,
+ "content": "The code imports the build functions for DETR-VAE and CNN+MLP models from their respective modules. It defines two model building functions, `build_ACT_model` and `build_CNNMLP_model`, which return the built models using the imported build functions based on given arguments.",
+ "type": "comment"
+ },
+ "122": {
+ "file_id": 10,
+ "content": "/detr/models/backbone.py",
+ "type": "filepath"
+ },
+ "123": {
+ "file_id": 10,
+ "content": "The code defines a Backbone class for ResNet backbones with frozen BatchNorm layers and builds a vision transformer backbone model using position embedding.",
+ "type": "summary"
+ },
+ "124": {
+ "file_id": 10,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nBackbone modules.\n\"\"\"\nfrom collections import OrderedDict\nimport torch\nimport torch.nn.functional as F\nimport torchvision\nfrom torch import nn\nfrom torchvision.models._utils import IntermediateLayerGetter\nfrom typing import Dict, List\nfrom util.misc import NestedTensor, is_main_process\nfrom .position_encoding import build_position_encoding\nimport IPython\ne = IPython.embed\nclass FrozenBatchNorm2d(torch.nn.Module):\n \"\"\"\n BatchNorm2d where the batch statistics and the affine parameters are fixed.\n Copy-paste from torchvision.misc.ops with added eps before rqsrt,\n without which any other policy_models than torchvision.policy_models.resnet[18,34,50,101]\n produce nans.\n \"\"\"\n def __init__(self, n):\n super(FrozenBatchNorm2d, self).__init__()\n self.register_buffer(\"weight\", torch.ones(n))\n self.register_buffer(\"bias\", torch.zeros(n))\n self.register_buffer(\"running_mean\", torch.zeros(n))\n self.register_buffer(\"running_var\", torch.ones(n))",
+ "type": "code",
+ "location": "/detr/models/backbone.py:1-35"
+ },
+ "125": {
+ "file_id": 10,
+ "content": "This code snippet defines a class called \"FrozenBatchNorm2d\" which extends torch.nn.Module and fixes the batch statistics and affine parameters in BatchNorm2d. It also initializes buffers for weight, bias, running_mean, and running_var with appropriate values. The purpose is to avoid the BatchNorm2d parameters from updating during training, enabling it to function as a frozen layer.",
+ "type": "comment"
+ },
+ "126": {
+ "file_id": 10,
+ "content": " def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,\n missing_keys, unexpected_keys, error_msgs):\n num_batches_tracked_key = prefix + 'num_batches_tracked'\n if num_batches_tracked_key in state_dict:\n del state_dict[num_batches_tracked_key]\n super(FrozenBatchNorm2d, self)._load_from_state_dict(\n state_dict, prefix, local_metadata, strict,\n missing_keys, unexpected_keys, error_msgs)\n def forward(self, x):\n # move reshapes to the beginning\n # to make it fuser-friendly\n w = self.weight.reshape(1, -1, 1, 1)\n b = self.bias.reshape(1, -1, 1, 1)\n rv = self.running_var.reshape(1, -1, 1, 1)\n rm = self.running_mean.reshape(1, -1, 1, 1)\n eps = 1e-5\n scale = w * (rv + eps).rsqrt()\n bias = b - rm * scale\n return x * scale + bias\nclass BackboneBase(nn.Module):\n def __init__(self, backbone: nn.Module, train_backbone: bool, num_channels: int, return_interm_layers: bool):",
+ "type": "code",
+ "location": "/detr/models/backbone.py:37-62"
+ },
+ "127": {
+ "file_id": 10,
+ "content": "Function \"_load_from_state_dict\" deletes \"num_batches_tracked_key\" from state_dict, then calls parent class's version of _load_from_state_dict. Function \"forward\" reshapes weights and biases for efficient processing, calculates scale and bias, and returns the processed input. Class \"BackboneBase\" initializes with backbone, train_backbone, num_channels, and return_interm_layers parameters.",
+ "type": "comment"
+ },
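+
+ A quick numeric check of the fused forward() algebra, with arbitrary toy statistics; eps matches the value in the code.
+
+ ```python
+ import torch
+
+ x = torch.randn(2, 3, 4, 4)
+ weight, bias = torch.rand(3) + 0.5, torch.randn(3)
+ running_mean, running_var = torch.randn(3), torch.rand(3) + 0.5
+ eps = 1e-5
+
+ w = weight.reshape(1, -1, 1, 1)
+ b = bias.reshape(1, -1, 1, 1)
+ rv = running_var.reshape(1, -1, 1, 1)
+ rm = running_mean.reshape(1, -1, 1, 1)
+ scale = w * (rv + eps).rsqrt()
+ y_fused = x * scale + (b - rm * scale)           # what forward() computes
+
+ y_reference = (x - rm) / torch.sqrt(rv + eps) * w + b
+ assert torch.allclose(y_fused, y_reference, atol=1e-6)
+ ```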
+ "128": {
+ "file_id": 10,
+ "content": " super().__init__()\n # for name, parameter in backbone.named_parameters(): # only train later layers # TODO do we want this?\n # if not train_backbone or 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:\n # parameter.requires_grad_(False)\n if return_interm_layers:\n return_layers = {\"layer1\": \"0\", \"layer2\": \"1\", \"layer3\": \"2\", \"layer4\": \"3\"}\n else:\n return_layers = {'layer4': \"0\"}\n self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)\n self.num_channels = num_channels\n def forward(self, tensor):\n xs = self.body(tensor)\n return xs\n # out: Dict[str, NestedTensor] = {}\n # for name, x in xs.items():\n # m = tensor_list.mask\n # assert m is not None\n # mask = F.interpolate(m[None].float(), size=x.shape[-2:]).to(torch.bool)[0]\n # out[name] = NestedTensor(x, mask)\n # return out\nclass Backbone(BackboneBase):",
+ "type": "code",
+ "location": "/detr/models/backbone.py:63-86"
+ },
+ "129": {
+ "file_id": 10,
+ "content": "This code defines a Backbone class in Python, which is part of a larger codebase. The class extends the BackboneBase and includes an init method to initialize the object, and a forward method for processing input data through the backbone model. It also handles nested tensors and returns them in a dictionary format.",
+ "type": "comment"
+ },
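+
+ A hedged sketch of what the IntermediateLayerGetter wrapper yields for the return_interm_layers=False case; torchvision API assumed (newer versions use weights=None where older ones used pretrained=False).
+
+ ```python
+ import torch
+ import torchvision
+ from torchvision.models._utils import IntermediateLayerGetter
+
+ resnet = torchvision.models.resnet18(weights=None)
+ body = IntermediateLayerGetter(resnet, return_layers={'layer4': '0'})
+
+ features = body(torch.randn(1, 3, 480, 640))
+ print(features['0'].shape)   # torch.Size([1, 512, 15, 20]) -- stride-32 feature map
+ ```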
+ "130": {
+ "file_id": 10,
+ "content": " \"\"\"ResNet backbone with frozen BatchNorm.\"\"\"\n def __init__(self, name: str,\n train_backbone: bool,\n return_interm_layers: bool,\n dilation: bool):\n backbone = getattr(torchvision.models, name)(\n replace_stride_with_dilation=[False, False, dilation],\n pretrained=is_main_process(), norm_layer=FrozenBatchNorm2d) # pretrained # TODO do we want frozen batch_norm??\n num_channels = 512 if name in ('resnet18', 'resnet34') else 2048\n super().__init__(backbone, train_backbone, num_channels, return_interm_layers)\nclass Joiner(nn.Sequential):\n def __init__(self, backbone, position_embedding):\n super().__init__(backbone, position_embedding)\n def forward(self, tensor_list: NestedTensor):\n xs = self[0](tensor_list)\n out: List[NestedTensor] = []\n pos = []\n for name, x in xs.items():\n out.append(x)\n # position encoding\n pos.append(self[1](x).to(x.dtype))\n return out, pos",
+ "type": "code",
+ "location": "/detr/models/backbone.py:87-112"
+ },
+ "131": {
+ "file_id": 10,
+ "content": "The code defines a ResNet backbone model with frozen BatchNorm for transfer learning tasks. It includes an option to freeze the BatchNorm layers and a Joiner class that combines the output of the backbone and position encoding for further processing in a list format.",
+ "type": "comment"
+ },
+ "132": {
+ "file_id": 10,
+ "content": "def build_backbone(args):\n position_embedding = build_position_encoding(args)\n train_backbone = args.lr_backbone > 0\n return_interm_layers = args.masks\n backbone = Backbone(args.backbone, train_backbone, return_interm_layers, args.dilation)\n model = Joiner(backbone, position_embedding)\n model.num_channels = backbone.num_channels\n return model",
+ "type": "code",
+ "location": "/detr/models/backbone.py:115-122"
+ },
+ "133": {
+ "file_id": 10,
+ "content": "This function builds a backbone model for a vision transformer. It takes arguments, creates position embedding, sets train and return flags, initializes the backbone, combines it with the position embedding, and returns the final model.",
+ "type": "comment"
+ },
+ "134": {
+ "file_id": 11,
+ "content": "/detr/models/detr_vae.py",
+ "type": "filepath"
+ },
+ "135": {
+ "file_id": 11,
+ "content": "This code defines a DETRVAE model for image object detection, using deep learning architecture and presents a CVAE-DETR model that generates latent inputs. The transformer-based model predicts actions and latent variables using PyTorch.",
+ "type": "summary"
+ },
+ "136": {
+ "file_id": 11,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nDETR model and criterion classes.\n\"\"\"\nimport torch\nfrom torch import nn\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nfrom .backbone import build_backbone\nfrom .transformer import build_transformer, TransformerEncoder, TransformerEncoderLayer\nimport numpy as np\nimport IPython\ne = IPython.embed\ndef reparametrize(mu, logvar):\n std = logvar.div(2).exp()\n eps = Variable(std.data.new(std.size()).normal_())\n return mu + std * eps\ndef get_sinusoid_encoding_table(n_position, d_hid):\n def get_position_angle_vec(position):\n return [position / np.power(10000, 2 * (hid_j // 2) / d_hid) for hid_j in range(d_hid)]\n sinusoid_table = np.array([get_position_angle_vec(pos_i) for pos_i in range(n_position)])\n sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i\n sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1\n return torch.FloatTensor(sinusoid_table).unsqueeze(0)\nclass DETRVAE(nn.Module):",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:1-35"
+ },
+ "137": {
+ "file_id": 11,
+ "content": "This code defines the DETRVAE model and its associated functions. It uses modules like `torch`, `nn`, and `TransformerEncoder` to build a deep learning architecture for detecting objects in images. The `reparametrize` function is used for reparameterization trick, while `get_sinusoid_encoding_table` generates sinusoid encodings for positional encoding. The class `DETRVAE` is the main model implementation.",
+ "type": "comment"
+ },
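+
+ A minimal demonstration of the reparameterization trick that reparametrize() implements; the shapes are toys (32 matches the latent_dim used later in the file), and torch.randn_like stands in for the legacy Variable-based noise call.
+
+ ```python
+ import torch
+
+ mu = torch.zeros(4, 32, requires_grad=True)
+ logvar = torch.zeros(4, 32, requires_grad=True)
+
+ std = logvar.div(2).exp()          # sigma = exp(logvar / 2)
+ eps = torch.randn_like(std)        # eps ~ N(0, I)
+ z = mu + std * eps                 # z ~ N(mu, sigma^2), differentiable in mu and logvar
+
+ z.sum().backward()
+ print(mu.grad.shape, logvar.grad.shape)   # gradients reach both parameters
+ ```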
+ "138": {
+ "file_id": 11,
+ "content": " \"\"\" This is the DETR module that performs object detection \"\"\"\n def __init__(self, backbones, transformer, encoder, state_dim, num_queries, camera_names, vq, vq_class, vq_dim, action_dim):\n \"\"\" Initializes the model.\n Parameters:\n backbones: torch module of the backbone to be used. See backbone.py\n transformer: torch module of the transformer architecture. See transformer.py\n state_dim: robot state dimension of the environment\n num_queries: number of object queries, ie detection slot. This is the maximal number of objects\n DETR can detect in a single image. For COCO, we recommend 100 queries.\n aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.\n \"\"\"\n super().__init__()\n self.num_queries = num_queries\n self.camera_names = camera_names\n self.transformer = transformer\n self.encoder = encoder\n self.vq, self.vq_class, self.vq_dim = vq, vq_class, vq_dim",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:36-52"
+ },
+ "139": {
+ "file_id": 11,
+ "content": "The code defines a class called `DETR` for object detection. It takes in backbone, transformer, encoder, state_dim, num_queries, camera_names, vq, vq_class, and vq_dim as parameters to initialize the model. The `num_queries` represents the maximal number of objects that DETR can detect in a single image, and auxiliary decoding losses are optional.",
+ "type": "comment"
+ },
+ "140": {
+ "file_id": 11,
+ "content": " self.state_dim, self.action_dim = state_dim, action_dim\n hidden_dim = transformer.d_model\n self.action_head = nn.Linear(hidden_dim, action_dim)\n self.is_pad_head = nn.Linear(hidden_dim, 1)\n self.query_embed = nn.Embedding(num_queries, hidden_dim)\n if backbones is not None:\n self.input_proj = nn.Conv2d(backbones[0].num_channels, hidden_dim, kernel_size=1)\n self.backbones = nn.ModuleList(backbones)\n self.input_proj_robot_state = nn.Linear(state_dim, hidden_dim)\n else:\n # input_dim = 14 + 7 # robot_state + env_state\n self.input_proj_robot_state = nn.Linear(state_dim, hidden_dim)\n self.input_proj_env_state = nn.Linear(7, hidden_dim)\n self.pos = torch.nn.Embedding(2, hidden_dim)\n self.backbones = None\n # encoder extra parameters\n self.latent_dim = 32 # final size of latent z # TODO tune\n self.cls_embed = nn.Embedding(1, hidden_dim) # extra cls token embedding",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:53-71"
+ },
+ "141": {
+ "file_id": 11,
+ "content": "The code initializes the DETR-VAE model by setting state and action dimensions, defining linear layers for action and pad heads, an embedding layer for queries, and additional layers based on whether backbones are provided or not. If no backbones are provided, it adds separate layers for robot state and environment state projections, a position embedding, and sets the backbones to None. It also sets the latent dimension of the latent z variable to 32 (to be tuned) and adds an extra cls token embedding.",
+ "type": "comment"
+ },
+ "142": {
+ "file_id": 11,
+ "content": " self.encoder_action_proj = nn.Linear(action_dim, hidden_dim) # project action to embedding\n self.encoder_joint_proj = nn.Linear(state_dim, hidden_dim) # project qpos to embedding\n print(f'Use VQ: {self.vq}, {self.vq_class}, {self.vq_dim}')\n if self.vq:\n self.latent_proj = nn.Linear(hidden_dim, self.vq_class * self.vq_dim)\n else:\n self.latent_proj = nn.Linear(hidden_dim, self.latent_dim*2) # project hidden state to latent std, var\n self.register_buffer('pos_table', get_sinusoid_encoding_table(1+1+num_queries, hidden_dim)) # [CLS], qpos, a_seq\n # decoder extra parameters\n if self.vq:\n self.latent_out_proj = nn.Linear(self.vq_class * self.vq_dim, hidden_dim)\n else:\n self.latent_out_proj = nn.Linear(self.latent_dim, hidden_dim) # project latent sample to embedding\n self.additional_pos_embed = nn.Embedding(2, hidden_dim) # learned position embedding for proprio and latent\n def encode(self, qpos, actions=None, is_pad=None, vq_sample=None):",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:72-90"
+ },
+ "143": {
+ "file_id": 11,
+ "content": "The code initializes the layers for a variational autoencoder (VAE) in DETR model. It includes linear layers to project actions and qpos to embedding, VQ-VAE specific latent projection, and decoder parameters such as latent out projection and learned position embeddings for proprio and latent. The encode function takes qpos, actions, is_pad, and vq_sample as inputs.",
+ "type": "comment"
+ },
+ "144": {
+ "file_id": 11,
+ "content": " bs, _ = qpos.shape\n if self.encoder is None:\n latent_sample = torch.zeros([bs, self.latent_dim], dtype=torch.float32).to(qpos.device)\n latent_input = self.latent_out_proj(latent_sample)\n probs = binaries = mu = logvar = None\n else:\n # cvae encoder\n is_training = actions is not None # train or val\n ### Obtain latent z from action sequence\n if is_training:\n # project action sequence to embedding dim, and concat with a CLS token\n action_embed = self.encoder_action_proj(actions) # (bs, seq, hidden_dim)\n qpos_embed = self.encoder_joint_proj(qpos) # (bs, hidden_dim)\n qpos_embed = torch.unsqueeze(qpos_embed, axis=1) # (bs, 1, hidden_dim)\n cls_embed = self.cls_embed.weight # (1, hidden_dim)\n cls_embed = torch.unsqueeze(cls_embed, axis=0).repeat(bs, 1, 1) # (bs, 1, hidden_dim)\n encoder_input = torch.cat([cls_embed, qpos_embed, action_embed], axis=1) # (bs, seq+1, hidden_dim)",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:91-107"
+ },
+ "145": {
+ "file_id": 11,
+ "content": "This code is part of a CVAE (Conditional Variational Autoencoder) model. It obtains the latent variable z from an action sequence and a query position during training. The encoder projects the action sequence to an embedding dimension and concatenates it with a query position embedding and a fixed CLs token embedding. These inputs are then passed to the encoder to get the latent representation.",
+ "type": "comment"
+ },
+ "146": {
+ "file_id": 11,
+ "content": " encoder_input = encoder_input.permute(1, 0, 2) # (seq+1, bs, hidden_dim)\n # do not mask cls token\n cls_joint_is_pad = torch.full((bs, 2), False).to(qpos.device) # False: not a padding\n is_pad = torch.cat([cls_joint_is_pad, is_pad], axis=1) # (bs, seq+1)\n # obtain position embedding\n pos_embed = self.pos_table.clone().detach()\n pos_embed = pos_embed.permute(1, 0, 2) # (seq+1, 1, hidden_dim)\n # query model\n encoder_output = self.encoder(encoder_input, pos=pos_embed, src_key_padding_mask=is_pad)\n encoder_output = encoder_output[0] # take cls output only\n latent_info = self.latent_proj(encoder_output)\n if self.vq:\n logits = latent_info.reshape([*latent_info.shape[:-1], self.vq_class, self.vq_dim])\n probs = torch.softmax(logits, dim=-1)\n binaries = F.one_hot(torch.mult",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:108-123"
+ },
+ "147": {
+ "file_id": 11,
+ "content": "This code snippet is part of a DETR model, specifically the VAE (Variational Autoencoder) implementation. Here, it prepares the input for the encoder and then passes it through the encoder to obtain an encoded representation (latent_info). This encoding is used for the VQ-VAE loss (if enabled), where a one-hot binary encoding of the latents is used to learn a codebook.",
+ "type": "comment"
+ },
+ "148": {
+ "file_id": 11,
+ "content": "inomial(probs.view(-1, self.vq_dim), 1).squeeze(-1), self.vq_dim).view(-1, self.vq_class, self.vq_dim).float()\n binaries_flat = binaries.view(-1, self.vq_class * self.vq_dim)\n probs_flat = probs.view(-1, self.vq_class * self.vq_dim)\n straigt_through = binaries_flat - probs_flat.detach() + probs_flat\n latent_input = self.latent_out_proj(straigt_through)\n mu = logvar = None\n else:\n probs = binaries = None\n mu = latent_info[:, :self.latent_dim]\n logvar = latent_info[:, self.latent_dim:]\n latent_sample = reparametrize(mu, logvar)\n latent_input = self.latent_out_proj(latent_sample)\n else:\n mu = logvar = binaries = probs = None\n if self.vq:\n latent_input = self.latent_out_proj(vq_sample.view(-1, self.vq_class * self.vq_dim))\n else:\n ",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:123-141"
+ },
+ "149": {
+ "file_id": 11,
+ "content": "This code is for a Variational Autoencoder (VAE) model, specifically the DETR-VAE. It calculates the latent input based on whether or not the model is in VQ-VAE mode. If it is, it computes binaries and probs, subtracts them, passes through the latent projection layer, and assigns them to mu and logvar as None. If not, it uses either the provided vq_sample (if available) or calculates the latent input using the latent projection layer if VQ mode is disabled.",
+ "type": "comment"
+ },
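+
+ A self-contained sketch of the straight-through estimator used here, with toy shapes; the real code applies it per vq_class group.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ logits = torch.randn(2, 4, requires_grad=True)
+ probs = torch.softmax(logits, dim=-1)
+ samples = torch.multinomial(probs, num_samples=1).squeeze(-1)
+ hard = F.one_hot(samples, num_classes=4).float()
+
+ # Forward value is the hard one-hot code; gradient flows through the soft probs.
+ straight_through = hard - probs.detach() + probs
+
+ loss = (straight_through * torch.arange(4.0)).sum()
+ loss.backward()
+ print(straight_through)        # exact one-hot rows
+ print(logits.grad)             # non-zero: gradients reached the logits
+ ```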
+ "150": {
+ "file_id": 11,
+ "content": " latent_sample = torch.zeros([bs, self.latent_dim], dtype=torch.float32).to(qpos.device)\n latent_input = self.latent_out_proj(latent_sample)\n return latent_input, probs, binaries, mu, logvar\n def forward(self, qpos, image, env_state, actions=None, is_pad=None, vq_sample=None):\n \"\"\"\n qpos: batch, qpos_dim\n image: batch, num_cam, channel, height, width\n env_state: None\n actions: batch, seq, action_dim\n \"\"\"\n latent_input, probs, binaries, mu, logvar = self.encode(qpos, actions, is_pad, vq_sample)\n # cvae decoder\n if self.backbones is not None:\n # Image observation features and position embeddings\n all_cam_features = []\n all_cam_pos = []\n for cam_id, cam_name in enumerate(self.camera_names):\n features, pos = self.backbones[cam_id](image[:, cam_id])\n features = features[0] # take the last layer feature\n pos = pos[0]",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:141-163"
+ },
+ "151": {
+ "file_id": 11,
+ "content": "This code snippet defines a method for creating latent samples, initializing variables, and performing encoding using a VAE (Variational AutoEncoder). The forward function takes in inputs like qpos, image, env_state, actions, is_pad, and vq_sample. It encodes the input using the encode method and then applies the CVAE decoder if backbones are provided.",
+ "type": "comment"
+ },
+ "152": {
+ "file_id": 11,
+ "content": " all_cam_features.append(self.input_proj(features))\n all_cam_pos.append(pos)\n # proprioception features\n proprio_input = self.input_proj_robot_state(qpos)\n # fold camera dimension into width dimension\n src = torch.cat(all_cam_features, axis=3)\n pos = torch.cat(all_cam_pos, axis=3)\n hs = self.transformer(src, None, self.query_embed.weight, pos, latent_input, proprio_input, self.additional_pos_embed.weight)[0]\n else:\n qpos = self.input_proj_robot_state(qpos)\n env_state = self.input_proj_env_state(env_state)\n transformer_input = torch.cat([qpos, env_state], axis=1) # seq length = 2\n hs = self.transformer(transformer_input, None, self.query_embed.weight, self.pos.weight)[0]\n a_hat = self.action_head(hs)\n is_pad_hat = self.is_pad_head(hs)\n return a_hat, is_pad_hat, [mu, logvar], probs, binaries\nclass CNNMLP(nn.Module):\n def __init__(self, backbones, state_dim, camera_names):",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:164-184"
+ },
+ "153": {
+ "file_id": 11,
+ "content": "This code defines a model for predicting actions and latent variables. It includes a transformer network, action head, and is pad head. The input includes camera features, proprioception features, robot state, and environment state. The model handles both scenarios with or without cameras. The CNNMLP class initializes the model using backbones, state_dim, and camera names.",
+ "type": "comment"
+ },
+ "154": {
+ "file_id": 11,
+ "content": " \"\"\" Initializes the model.\n Parameters:\n backbones: torch module of the backbone to be used. See backbone.py\n transformer: torch module of the transformer architecture. See transformer.py\n state_dim: robot state dimension of the environment\n num_queries: number of object queries, ie detection slot. This is the maximal number of objects\n DETR can detect in a single image. For COCO, we recommend 100 queries.\n aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.\n \"\"\"\n super().__init__()\n self.camera_names = camera_names\n self.action_head = nn.Linear(1000, state_dim) # TODO add more\n if backbones is not None:\n self.backbones = nn.ModuleList(backbones)\n backbone_down_projs = []\n for backbone in backbones:\n down_proj = nn.Sequential(\n nn.Conv2d(backbone.num_channels, 128, kernel_size=5),",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:185-202"
+ },
+ "155": {
+ "file_id": 11,
+ "content": "This code initializes the model and takes parameters for backbones, transformer, state_dim, num_queries, and aux_loss. It creates an action head using a linear layer with 1000 input size and state_dim output size. If backbones are provided, it creates a ModuleList of backbones and initializes down_proj for each backbone using conv2d with specified parameters.",
+ "type": "comment"
+ },
+ "156": {
+ "file_id": 11,
+ "content": " nn.Conv2d(128, 64, kernel_size=5),\n nn.Conv2d(64, 32, kernel_size=5)\n )\n backbone_down_projs.append(down_proj)\n self.backbone_down_projs = nn.ModuleList(backbone_down_projs)\n mlp_in_dim = 768 * len(backbones) + state_dim\n self.mlp = mlp(input_dim=mlp_in_dim, hidden_dim=1024, output_dim=self.action_dim, hidden_depth=2)\n else:\n raise NotImplementedError\n def forward(self, qpos, image, env_state, actions=None):\n \"\"\"\n qpos: batch, qpos_dim\n image: batch, num_cam, channel, height, width\n env_state: None\n actions: batch, seq, action_dim\n \"\"\"\n is_training = actions is not None # train or val\n bs, _ = qpos.shape\n # Image observation features and position embeddings\n all_cam_features = []\n for cam_id, cam_name in enumerate(self.camera_names):\n features, pos = self.backbones[cam_id](image[:, cam_id])\n features = features[0] # take the last layer feature",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:203-227"
+ },
+ "157": {
+ "file_id": 11,
+ "content": "This code is for a DETR model in PyTorch. It defines the architecture and forward pass. The backbone network consists of two convolutions to downsample the input, followed by a mlp layer if needed. The forward method takes in qpos, image, env_state (None in this case), and optionally actions for training or validation. It extracts image features from each camera view using backbones, concatenates them, and performs positional encoding.",
+ "type": "comment"
+ },
+ "158": {
+ "file_id": 11,
+ "content": " pos = pos[0] # not used\n all_cam_features.append(self.backbone_down_projs[cam_id](features))\n # flatten everything\n flattened_features = []\n for cam_feature in all_cam_features:\n flattened_features.append(cam_feature.reshape([bs, -1]))\n flattened_features = torch.cat(flattened_features, axis=1) # 768 each\n features = torch.cat([flattened_features, qpos], axis=1) # qpos: 14\n a_hat = self.mlp(features)\n return a_hat\ndef mlp(input_dim, hidden_dim, output_dim, hidden_depth):\n if hidden_depth == 0:\n mods = [nn.Linear(input_dim, output_dim)]\n else:\n mods = [nn.Linear(input_dim, hidden_dim), nn.ReLU(inplace=True)]\n for i in range(hidden_depth - 1):\n mods += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True)]\n mods.append(nn.Linear(hidden_dim, output_dim))\n trunk = nn.Sequential(*mods)\n return trunk\ndef build_encoder(args):\n d_model = args.hidden_dim # 256\n dropout = args.dropout # 0.1",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:228-254"
+ },
+ "159": {
+ "file_id": 11,
+ "content": "This code defines a DETR VAE model, including functions for building the encoder and creating an MLP. The encoder takes input features and positions (qpos) to create a flattened feature matrix, which is then passed through an MLP to produce the final output (a_hat).",
+ "type": "comment"
+ },
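+
+ A shape check for the mlp() helper; the input and output sizes are hypothetical, roughly matching the 768-per-camera features and 14-dimensional qpos/actions mentioned above.
+
+ ```python
+ import torch
+ from torch import nn
+
+ def mlp(input_dim, hidden_dim, output_dim, hidden_depth):
+     if hidden_depth == 0:
+         mods = [nn.Linear(input_dim, output_dim)]
+     else:
+         mods = [nn.Linear(input_dim, hidden_dim), nn.ReLU(inplace=True)]
+         for _ in range(hidden_depth - 1):
+             mods += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True)]
+         mods.append(nn.Linear(hidden_dim, output_dim))
+     return nn.Sequential(*mods)
+
+ # 3 cameras * 768 features + 14-dim qpos in, 14-dim action out, 2 hidden layers.
+ head = mlp(input_dim=768 * 3 + 14, hidden_dim=1024, output_dim=14, hidden_depth=2)
+ print(head(torch.randn(8, 768 * 3 + 14)).shape)   # torch.Size([8, 14])
+ ```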
+ "160": {
+ "file_id": 11,
+ "content": " nhead = args.nheads # 8\n dim_feedforward = args.dim_feedforward # 2048\n num_encoder_layers = args.enc_layers # 4 # TODO shared with VAE decoder\n normalize_before = args.pre_norm # False\n activation = \"relu\"\n encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward,\n dropout, activation, normalize_before)\n encoder_norm = nn.LayerNorm(d_model) if normalize_before else None\n encoder = TransformerEncoder(encoder_layer, num_encoder_layers, encoder_norm)\n return encoder\ndef build(args):\n state_dim = 14 # TODO hardcode\n # From state\n # backbone = None # from state for now, no need for conv nets\n # From image\n backbones = []\n for _ in args.camera_names:\n backbone = build_backbone(args)\n backbones.append(backbone)\n transformer = build_transformer(args)\n if args.no_encoder:\n encoder = None\n else:\n encoder = build_transformer(args)\n model = DETRVAE(\n backbones,\n transformer,",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:255-289"
+ },
+ "161": {
+ "file_id": 11,
+ "content": "This code builds a DETRVAE model by defining its components and parameters. It initializes the transformer encoder, decoder, and VAE components based on provided arguments. The backbone for image processing is built using a function call to build_backbone(args). If no encoder is required, it sets the encoder as None.",
+ "type": "comment"
+ },
+ "162": {
+ "file_id": 11,
+ "content": " encoder,\n state_dim=state_dim,\n num_queries=args.num_queries,\n camera_names=args.camera_names,\n vq=args.vq,\n vq_class=args.vq_class,\n vq_dim=args.vq_dim,\n action_dim=args.action_dim,\n )\n n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)\n print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n return model\ndef build_cnnmlp(args):\n state_dim = 14 # TODO hardcode\n # From state\n # backbone = None # from state for now, no need for conv nets\n # From image\n backbones = []\n for _ in args.camera_names:\n backbone = build_backbone(args)\n backbones.append(backbone)\n model = CNNMLP(\n backbones,\n state_dim=state_dim,\n camera_names=args.camera_names,\n )\n n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)\n print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n return model",
+ "type": "code",
+ "location": "/detr/models/detr_vae.py:290-325"
+ },
+ "163": {
+ "file_id": 11,
+ "content": "This code defines two functions, `detr_vae` and `build_cnnmlp`, which build different models. Both functions return a model object after printing the number of parameters it has. The `detr_vae` function requires additional arguments like `state_dim`, `num_queries`, `camera_names`, `vq`, `vq_class`, and `action_dim`. The `build_cnnmlp` function requires an `args` argument, which it uses to create a CNNMLP model by building backbones for each camera name provided in the arguments.",
+ "type": "comment"
+ },
+ "164": {
+ "file_id": 12,
+ "content": "/detr/models/latent_model.py",
+ "type": "filepath"
+ },
+ "165": {
+ "file_id": 12,
+ "content": "Latent_Model_Transformer extends nn.Module, uses self-attention for latent space sequence modeling, has configurable input/output dimensions and sequence length, defaulting to 256 latent dimension, 8 heads, and 3 layers. The class has 'forward' and 'generate' methods for generating new samples by iteratively sampling from the output of the forward pass using temperature-scaled softmax and one-hot encoding.",
+ "type": "summary"
+ },
+ "166": {
+ "file_id": 12,
+ "content": "import torch.nn as nn\nfrom torch.nn import functional as F\nimport torch\nDROPOUT_RATE = 0.1\n# a causal transformer block\nclass Causal_Transformer_Block(nn.Module):\n def __init__(self, seq_len, latent_dim, num_head) -> None:\n super().__init__()\n self.num_head = num_head\n self.latent_dim = latent_dim\n self.ln_1 = nn.LayerNorm(latent_dim)\n self.attn = nn.MultiheadAttention(latent_dim, num_head, dropout=DROPOUT_RATE, batch_first=True)\n self.ln_2 = nn.LayerNorm(latent_dim)\n self.mlp = nn.Sequential(\n nn.Linear(latent_dim, 4 * latent_dim),\n nn.GELU(),\n nn.Linear(4 * latent_dim, latent_dim),\n nn.Dropout(DROPOUT_RATE),\n )\n # self.register_buffer(\"attn_mask\", torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool())\n def forward(self, x):\n attn_mask = torch.triu(torch.ones(x.shape[1], x.shape[1], device=x.device, dtype=torch.bool), diagonal=1)\n x = self.ln_1(x)\n x = x + self.attn(x, x, x, attn_mask=attn_mask)[0]",
+ "type": "code",
+ "location": "/detr/models/latent_model.py:1-28"
+ },
+ "167": {
+ "file_id": 12,
+ "content": "Causal Transformer block: LayerNormalization, MultiHeadAttention with dropout, and MLP sequential layers.",
+ "type": "comment"
+ },
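+
+ A small demonstration of the causal mask construction used in forward(), with tiny dimensions; averaged attention weights are returned by default.
+
+ ```python
+ import torch
+ from torch import nn
+
+ seq_len, dim, heads = 5, 16, 4
+ x = torch.randn(2, seq_len, dim)
+
+ # True above the diagonal = "may not attend to future positions".
+ attn_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
+ attn = nn.MultiheadAttention(dim, heads, batch_first=True)
+ out, weights = attn(x, x, x, attn_mask=attn_mask)
+
+ print(attn_mask.int())
+ print(weights[0])   # each row has zeros strictly above the diagonal
+ ```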
+ "168": {
+ "file_id": 12,
+ "content": " x = self.ln_2(x)\n x = x + self.mlp(x)\n return x\n# use self-attention instead of RNN to model the latent space sequence\nclass Latent_Model_Transformer(nn.Module):\n def __init__(self, input_dim, output_dim, seq_len, latent_dim=256, num_head=8, num_layer=3) -> None:\n super().__init__()\n self.input_dim = input_dim\n self.output_dim = output_dim\n self.seq_len = seq_len\n self.latent_dim = latent_dim\n self.num_head = num_head\n self.num_layer = num_layer\n self.input_layer = nn.Linear(input_dim, latent_dim)\n self.weight_pos_embed = nn.Embedding(seq_len, latent_dim)\n self.attention_blocks = nn.Sequential(\n nn.Dropout(DROPOUT_RATE),\n *[Causal_Transformer_Block(seq_len, latent_dim, num_head) for _ in range(num_layer)],\n nn.LayerNorm(latent_dim)\n )\n self.output_layer = nn.Linear(latent_dim, output_dim)\n def forward(self, x):\n x = self.input_layer(x)\n x = x + self.weight_pos_embed(torch.arange(x.shape[1], device=x.device))",
+ "type": "code",
+ "location": "/detr/models/latent_model.py:29-55"
+ },
+ "169": {
+ "file_id": 12,
+ "content": "In \"act-plus-plus/detr/models/latent_model.py\", lines 28-54 define a class Latent_Model_Transformer that extends nn.Module. This model uses self-attention instead of RNN to model the latent space sequence. It takes an input dimension, output dimension, sequence length, latent dimension (default 256), number of heads (default 8), and number of layers (default 3). The forward method applies an input layer, adds positional embedding, passes through a series of causal transformer blocks, and finally outputs through an output layer.",
+ "type": "comment"
+ },
+ "170": {
+ "file_id": 12,
+ "content": " x = self.attention_blocks(x)\n logits = self.output_layer(x)\n return logits\n @torch.no_grad()\n def generate(self, n, temperature=0.1, x=None):\n if x is None:\n x = torch.zeros((n, 1, self.input_dim), device=self.weight_pos_embed.weight.device)\n for i in range(self.seq_len):\n logits = self.forward(x)[:, -1]\n probs = torch.softmax(logits / temperature, dim=-1)\n samples = torch.multinomial(probs, num_samples=1)[..., 0]\n samples_one_hot = F.one_hot(samples.long(), num_classes=self.output_dim).float()\n x = torch.cat([x, samples_one_hot[:, None, :]], dim=1)\n return x[:, 1:, :]",
+ "type": "code",
+ "location": "/detr/models/latent_model.py:56-72"
+ },
+ "171": {
+ "file_id": 12,
+ "content": "This code defines a class with two methods: 'forward' and 'generate'. The 'forward' method applies attention blocks to the input, then passes it through an output layer to produce logits. The 'generate' method generates new samples by iteratively sampling from the output of the forward pass using temperature-scaled softmax and one-hot encoding. The generated samples are appended to the original input and returned after trimming unnecessary rows.",
+ "type": "comment"
+ },
+ "172": {
+ "file_id": 13,
+ "content": "/detr/models/position_encoding.py",
+ "type": "filepath"
+ },
+ "173": {
+ "file_id": 13,
+ "content": "This code defines a transformer positional embedding class using sine and cosine encodings for position embeddings. The forward function applies these encodings to the input tensor 'x', normalizing cumulative sums before applying dimensional transformation. This learned absolute position embedding extends nn.Module and is used in transformer models.",
+ "type": "summary"
+ },
+ "174": {
+ "file_id": 13,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nVarious positional encodings for the transformer.\n\"\"\"\nimport math\nimport torch\nfrom torch import nn\nfrom util.misc import NestedTensor\nimport IPython\ne = IPython.embed\nclass PositionEmbeddingSine(nn.Module):\n \"\"\"\n This is a more standard version of the position embedding, very similar to the one\n used by the Attention is all you need paper, generalized to work on images.\n \"\"\"\n def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None):\n super().__init__()\n self.num_pos_feats = num_pos_feats\n self.temperature = temperature\n self.normalize = normalize\n if scale is not None and normalize is False:\n raise ValueError(\"normalize should be True if scale is passed\")\n if scale is None:\n scale = 2 * math.pi\n self.scale = scale\n def forward(self, tensor):\n x = tensor\n # mask = tensor_list.mask\n # assert mask is not None",
+ "type": "code",
+ "location": "/detr/models/position_encoding.py:1-33"
+ },
+ "175": {
+ "file_id": 13,
+ "content": "This code defines a positional embedding class for transformers, similar to the one used in the Attention is All You Need paper. It takes in parameters such as num_pos_feats (number of position features), temperature, normalize (whether to normalize or not), and scale. The forward function applies sine and cosine positional encodings to tensor.",
+ "type": "comment"
+ },
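+
+ The per-axis encoding follows the usual sine/cosine recipe; a 1-D analogue, written independently of the 2-D image version above and only for illustration, looks like this:
+
+ ```python
+ import numpy as np
+
+ def sinusoid_1d(n_position, d_model, temperature=10000):
+     """PE[pos, 2i] = sin(pos / T^(2i/d)),  PE[pos, 2i+1] = cos(pos / T^(2i/d))."""
+     pos = np.arange(n_position)[:, None].astype(np.float64)
+     i = np.arange(d_model)[None, :]
+     angle = pos / np.power(temperature, 2 * (i // 2) / d_model)
+     return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))
+
+ table = sinusoid_1d(n_position=6, d_model=8)
+ print(table.shape)   # (6, 8)
+ ```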
+ "176": {
+ "file_id": 13,
+ "content": " # not_mask = ~mask\n not_mask = torch.ones_like(x[0, [0]])\n y_embed = not_mask.cumsum(1, dtype=torch.float32)\n x_embed = not_mask.cumsum(2, dtype=torch.float32)\n if self.normalize:\n eps = 1e-6\n y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale\n x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale\n dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)\n dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)\n pos_x = x_embed[:, :, :, None] / dim_t\n pos_y = y_embed[:, :, :, None] / dim_t\n pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3)\n pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3)\n pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)\n return pos\nclass PositionEmbeddingLearned(nn.Module):\n \"\"\"\n Absolute pos embedding, learned.",
+ "type": "code",
+ "location": "/detr/models/position_encoding.py:34-57"
+ },
+ "177": {
+ "file_id": 13,
+ "content": "This code generates position embeddings for a given input tensor 'x'. It first creates not_mask and computes the cumulative sums along rows and columns. Then, it normalizes these sums by dividing them with their respective last elements plus a small epsilon value and multiplies them by a scale factor. The code then calculates a temperature-based dimensional transformation for each element in 'x'. It further computes the sine and cosine of the transformed values, stacks them and flattens them along one dimension. Finally, it concatenates the y and x embeddings along the last dimension, permutes the dimensions, and returns the result. This class extends nn.Module and is used for creating learned absolute position embeddings.",
+ "type": "comment"
+ },
+ "178": {
+ "file_id": 13,
+ "content": " \"\"\"\n def __init__(self, num_pos_feats=256):\n super().__init__()\n self.row_embed = nn.Embedding(50, num_pos_feats)\n self.col_embed = nn.Embedding(50, num_pos_feats)\n self.reset_parameters()\n def reset_parameters(self):\n nn.init.uniform_(self.row_embed.weight)\n nn.init.uniform_(self.col_embed.weight)\n def forward(self, tensor_list: NestedTensor):\n x = tensor_list.tensors\n h, w = x.shape[-2:]\n i = torch.arange(w, device=x.device)\n j = torch.arange(h, device=x.device)\n x_emb = self.col_embed(i)\n y_emb = self.row_embed(j)\n pos = torch.cat([\n x_emb.unsqueeze(0).repeat(h, 1, 1),\n y_emb.unsqueeze(1).repeat(1, w, 1),\n ], dim=-1).permute(2, 0, 1).unsqueeze(0).repeat(x.shape[0], 1, 1, 1)\n return pos\ndef build_position_encoding(args):\n N_steps = args.hidden_dim // 2\n if args.position_embedding in ('v2', 'sine'):\n # TODO find a better way of exposing other arguments\n position_embedding = PositionEmbeddingSine(N_steps, normalize=True)",
+ "type": "code",
+ "location": "/detr/models/position_encoding.py:58-87"
+ },
+ "179": {
+ "file_id": 13,
+ "content": "This code defines a class \"PositionEmbeddingSine\" for creating position encoding using sine and cosine functions. It takes the number of positional features as input and initializes two embedding layers, one for rows and another for columns. The \"forward\" method computes position embeddings by applying row and column embeddings to image indices and returns them. The \"build_position_encoding\" function creates an instance of PositionEmbeddingSine based on the given arguments.",
+ "type": "comment"
+ },
+ "180": {
+ "file_id": 13,
+ "content": " elif args.position_embedding in ('v3', 'learned'):\n position_embedding = PositionEmbeddingLearned(N_steps)\n else:\n raise ValueError(f\"not supported {args.position_embedding}\")\n return position_embedding",
+ "type": "code",
+ "location": "/detr/models/position_encoding.py:88-93"
+ },
+ "181": {
+ "file_id": 13,
+ "content": "This code snippet checks the value of 'args.position_embedding' and if it is set to either 'v3' or 'learned', it creates a PositionEmbeddingLearned object. If the input is neither of these, it raises a ValueError with an error message. Finally, it returns the created position embedding object.",
+ "type": "comment"
+ },
+ "182": {
+ "file_id": 14,
+ "content": "/detr/models/transformer.py",
+ "type": "filepath"
+ },
+ "183": {
+ "file_id": 14,
+ "content": "The code defines a Transformer class in PyTorch for data processing, featuring encoder and decoder modules, positional embeddings, transformer layers, and optional masks and position embeddings.",
+ "type": "summary"
+ },
+ "184": {
+ "file_id": 14,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nDETR Transformer class.\nCopy-paste from torch.nn.Transformer with modifications:\n * positional encodings are passed in MHattention\n * extra LN at the end of encoder is removed\n * decoder returns a stack of activations from all decoding layers\n\"\"\"\nimport copy\nfrom typing import Optional, List\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn, Tensor\nimport IPython\ne = IPython.embed\nclass Transformer(nn.Module):\n def __init__(self, d_model=512, nhead=8, num_encoder_layers=6,\n num_decoder_layers=6, dim_feedforward=2048, dropout=0.1,\n activation=\"relu\", normalize_before=False,\n return_intermediate_dec=False):\n super().__init__()\n encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward,\n dropout, activation, normalize_before)\n encoder_norm = nn.LayerNorm(d_model) if normalize_before else None",
+ "type": "code",
+ "location": "/detr/models/transformer.py:1-30"
+ },
+ "185": {
+ "file_id": 14,
+ "content": "This code defines the Transformer class from scratch with minor modifications to the original implementation, including passing positional encodings in MHAttention, removing an extra LN layer in the encoder, and allowing for intermediate decoder activations to be returned. It inherits from nn.Module and has several parameters for customization.",
+ "type": "comment"
+ },
+ "186": {
+ "file_id": 14,
+ "content": " self.encoder = TransformerEncoder(encoder_layer, num_encoder_layers, encoder_norm)\n decoder_layer = TransformerDecoderLayer(d_model, nhead, dim_feedforward,\n dropout, activation, normalize_before)\n decoder_norm = nn.LayerNorm(d_model)\n self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm,\n return_intermediate=return_intermediate_dec)\n self._reset_parameters()\n self.d_model = d_model\n self.nhead = nhead\n def _reset_parameters(self):\n for p in self.parameters():\n if p.dim() > 1:\n nn.init.xavier_uniform_(p)\n def forward(self, src, mask, query_embed, pos_embed, latent_input=None, proprio_input=None, additional_pos_embed=None):\n # TODO flatten only when input has H and W\n if len(src.shape) == 4: # has H and W\n # flatten NxCxHxW to HWxNxC\n bs, c, h, w = src.shape\n src = src.flatten(2).permute(2, 0, 1)",
+ "type": "code",
+ "location": "/detr/models/transformer.py:31-54"
+ },
+ "187": {
+ "file_id": 14,
+ "content": "This code initializes a Transformer model with an encoder and decoder, performing parameter initialization and normalization. It also includes a forward method for processing input data with possible flattening for images.",
+ "type": "comment"
+ },
+ "188": {
+ "file_id": 14,
+ "content": " pos_embed = pos_embed.flatten(2).permute(2, 0, 1).repeat(1, bs, 1)\n query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)\n # mask = mask.flatten(1)\n additional_pos_embed = additional_pos_embed.unsqueeze(1).repeat(1, bs, 1) # seq, bs, dim\n pos_embed = torch.cat([additional_pos_embed, pos_embed], axis=0)\n addition_input = torch.stack([latent_input, proprio_input], axis=0)\n src = torch.cat([addition_input, src], axis=0)\n else:\n assert len(src.shape) == 3\n # flatten NxHWxC to HWxNxC\n bs, hw, c = src.shape\n src = src.permute(1, 0, 2)\n pos_embed = pos_embed.unsqueeze(1).repeat(1, bs, 1)\n query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)\n tgt = torch.zeros_like(query_embed)\n memory = self.encoder(src, src_key_padding_mask=mask, pos=pos_embed)\n hs = self.decoder(tgt, memory, memory_key_padding_mask=mask,\n pos=pos_embed, query_pos=query_embed)",
+ "type": "code",
+ "location": "/detr/models/transformer.py:55-75"
+ },
+ "189": {
+ "file_id": 14,
+ "content": "The code initializes the transformer model by handling different source (src) input shapes. It either flattens and repeats the inputs if the shape is bs, hw, c or simply permutes and repeats if the shape is NxHWxC. Positional embeddings are calculated for both position and additional positional information. The decoder uses these embeddings to process target (tgt) and source memory.",
+ "type": "comment"
+ },
+ "190": {
+ "file_id": 14,
+ "content": " hs = hs.transpose(1, 2)\n return hs\nclass TransformerEncoder(nn.Module):\n def __init__(self, encoder_layer, num_layers, norm=None):\n super().__init__()\n self.layers = _get_clones(encoder_layer, num_layers)\n self.num_layers = num_layers\n self.norm = norm\n def forward(self, src,\n mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n output = src\n for layer in self.layers:\n output = layer(output, src_mask=mask,\n src_key_padding_mask=src_key_padding_mask, pos=pos)\n if self.norm is not None:\n output = self.norm(output)\n return output\nclass TransformerDecoder(nn.Module):\n def __init__(self, decoder_layer, num_layers, norm=None, return_intermediate=False):\n super().__init__()\n self.layers = _get_clones(decoder_layer, num_layers)\n self.num_layers = num_layers\n self.norm = norm",
+ "type": "code",
+ "location": "/detr/models/transformer.py:76-109"
+ },
+ "191": {
+ "file_id": 14,
+ "content": "This code defines two classes: TransformerEncoder and TransformerDecoder. The TransformerEncoder class initializes an encoder with a specified number of layers and normalization method, then forwards input through each layer in the encoder. The TransformerDecoder class initializes a decoder with a specified number of layers and normalization method, then forwards input through each layer in the decoder. Both classes can handle optional masks and positions during forward propagation.",
+ "type": "comment"
+ },
+ "192": {
+ "file_id": 14,
+ "content": " self.return_intermediate = return_intermediate\n def forward(self, tgt, memory,\n tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,\n memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,\n query_pos: Optional[Tensor] = None):\n output = tgt\n intermediate = []\n for layer in self.layers:\n output = layer(output, memory, tgt_mask=tgt_mask,\n memory_mask=memory_mask,\n tgt_key_padding_mask=tgt_key_padding_mask,\n memory_key_padding_mask=memory_key_padding_mask,\n pos=pos, query_pos=query_pos)\n if self.return_intermediate:\n intermediate.append(self.norm(output))\n if self.norm is not None:\n output = self.norm(output)\n if self.return_intermediate:",
+ "type": "code",
+ "location": "/detr/models/transformer.py:110-134"
+ },
+ "193": {
+ "file_id": 14,
+ "content": "The code defines a Transformer model's forward pass, where each layer applies its operations iteratively on the target (tgt) and memory inputs. The intermediate results are stored if return_intermediate is set to True. Finally, the norm layer normalizes the output, and if return_intermediate is set, stores the normalized outputs as intermediates.",
+ "type": "comment"
+ },
+ "194": {
+ "file_id": 14,
+ "content": " intermediate.pop()\n intermediate.append(output)\n if self.return_intermediate:\n return torch.stack(intermediate)\n return output.unsqueeze(0)\nclass TransformerEncoderLayer(nn.Module):\n def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,\n activation=\"relu\", normalize_before=False):\n super().__init__()\n self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)\n # Implementation of Feedforward model\n self.linear1 = nn.Linear(d_model, dim_feedforward)\n self.dropout = nn.Dropout(dropout)\n self.linear2 = nn.Linear(dim_feedforward, d_model)\n self.norm1 = nn.LayerNorm(d_model)\n self.norm2 = nn.LayerNorm(d_model)\n self.dropout1 = nn.Dropout(dropout)\n self.dropout2 = nn.Dropout(dropout)\n self.activation = _get_activation_fn(activation)\n self.normalize_before = normalize_before\n def with_pos_embed(self, tensor, pos: Optional[Tensor]):",
+ "type": "code",
+ "location": "/detr/models/transformer.py:135-163"
+ },
+ "195": {
+ "file_id": 14,
+ "content": "This code defines a class called \"TransformerEncoderLayer\" which implements a layer for the transformer encoder in the Transformer model. It consists of a self-attention mechanism, followed by a feedforward network and normalization layers. The \"return_intermediate\" parameter controls whether intermediate results are returned or not.",
+ "type": "comment"
+ },
+ "196": {
+ "file_id": 14,
+ "content": " return tensor if pos is None else tensor + pos\n def forward_post(self,\n src,\n src_mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n q = k = self.with_pos_embed(src, pos)\n src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,\n key_padding_mask=src_key_padding_mask)[0]\n src = src + self.dropout1(src2)\n src = self.norm1(src)\n src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))\n src = src + self.dropout2(src2)\n src = self.norm2(src)\n return src\n def forward_pre(self, src,\n src_mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n src2 = self.norm1(src)\n q = k = self.with_pos_embed(src2, pos)\n src2 = self.self_attn(q, k, value=src2, attn_mask=src_mask,",
+ "type": "code",
+ "location": "/detr/models/transformer.py:164-187"
+ },
+ "197": {
+ "file_id": 14,
+ "content": "This code defines three functions: `forward_post`, `forward_pre`, and a helper function that calculates the tensor based on positional embeddings. The `forward_post` function applies self-attention to the input source, adds it back to the original source, and performs two feed-forward layers with residual connections and layer normalization for each of them. The `forward_pre` function applies layer normalization to the input source, calculates self-attention based on positional embeddings, and performs two feed-forward layers similar to `forward_post`. The code seems to be part of a transformer model in natural language processing or computer vision tasks that incorporate position information.",
+ "type": "comment"
+ },
+ "198": {
+ "file_id": 14,
+ "content": " key_padding_mask=src_key_padding_mask)[0]\n src = src + self.dropout1(src2)\n src2 = self.norm2(src)\n src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))\n src = src + self.dropout2(src2)\n return src\n def forward(self, src,\n src_mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n if self.normalize_before:\n return self.forward_pre(src, src_mask, src_key_padding_mask, pos)\n return self.forward_post(src, src_mask, src_key_padding_mask, pos)\nclass TransformerDecoderLayer(nn.Module):\n def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,\n activation=\"relu\", normalize_before=False):\n super().__init__()\n self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)\n self.multihead_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)",
+ "type": "code",
+ "location": "/detr/models/transformer.py:188-210"
+ },
+ "199": {
+ "file_id": 14,
+ "content": "This code defines a TransformerDecoderLayer class that inherits from nn.Module and takes in parameters such as d_model, nhead, dim_feedforward, dropout, activation, and normalize_before. The class has methods for forward pass and initializing the layer. It also includes an instance of MultiheadAttention for self attention and multi-headed attention.",
+ "type": "comment"
+ }
+}
\ No newline at end of file
diff --git a/docs/data/2.json b/docs/data/2.json
new file mode 100644
index 00000000..18fbfbc5
--- /dev/null
+++ b/docs/data/2.json
@@ -0,0 +1,545 @@
+{
+ "200": {
+ "file_id": 14,
+ "content": " # Implementation of Feedforward model\n self.linear1 = nn.Linear(d_model, dim_feedforward)\n self.dropout = nn.Dropout(dropout)\n self.linear2 = nn.Linear(dim_feedforward, d_model)\n self.norm1 = nn.LayerNorm(d_model)\n self.norm2 = nn.LayerNorm(d_model)\n self.norm3 = nn.LayerNorm(d_model)\n self.dropout1 = nn.Dropout(dropout)\n self.dropout2 = nn.Dropout(dropout)\n self.dropout3 = nn.Dropout(dropout)\n self.activation = _get_activation_fn(activation)\n self.normalize_before = normalize_before\n def with_pos_embed(self, tensor, pos: Optional[Tensor]):\n return tensor if pos is None else tensor + pos\n def forward_post(self, tgt, memory,\n tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,\n memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,",
+ "type": "code",
+ "location": "/detr/models/transformer.py:211-234"
+ },
+ "201": {
+ "file_id": 14,
+ "content": "This code defines a class for the Feedforward model in Transformer architecture. It includes several linear layers, dropout layers, and layer normalization. The forward_post method takes input tensors, masks, and positional embeddings as arguments to perform feed-forward operations.",
+ "type": "comment"
+ },
+ "202": {
+ "file_id": 14,
+ "content": " query_pos: Optional[Tensor] = None):\n q = k = self.with_pos_embed(tgt, query_pos)\n tgt2 = self.self_attn(q, k, value=tgt, attn_mask=tgt_mask,\n key_padding_mask=tgt_key_padding_mask)[0]\n tgt = tgt + self.dropout1(tgt2)\n tgt = self.norm1(tgt)\n tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt, query_pos),\n key=self.with_pos_embed(memory, pos),\n value=memory, attn_mask=memory_mask,\n key_padding_mask=memory_key_padding_mask)[0]\n tgt = tgt + self.dropout2(tgt2)\n tgt = self.norm2(tgt)\n tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))\n tgt = tgt + self.dropout3(tgt2)\n tgt = self.norm3(tgt)\n return tgt\n def forward_pre(self, tgt, memory,\n tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,",
+ "type": "code",
+ "location": "/detr/models/transformer.py:235-255"
+ },
+ "203": {
+ "file_id": 14,
+ "content": "This function performs multi-head self-attention, applies layer normalization and feed-forward network layers to the target sequence. It takes in the target (tgt) and memory sequences, along with optional masking tensors for attention masks and key padding masks. It returns the processed target sequence.",
+ "type": "comment"
+ },
+ "204": {
+ "file_id": 14,
+ "content": " memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,\n query_pos: Optional[Tensor] = None):\n tgt2 = self.norm1(tgt)\n q = k = self.with_pos_embed(tgt2, query_pos)\n tgt2 = self.self_attn(q, k, value=tgt2, attn_mask=tgt_mask,\n key_padding_mask=tgt_key_padding_mask)[0]\n tgt = tgt + self.dropout1(tgt2)\n tgt2 = self.norm2(tgt)\n tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt2, query_pos),\n key=self.with_pos_embed(memory, pos),\n value=memory, attn_mask=memory_mask,\n key_padding_mask=memory_key_padding_mask)[0]\n tgt = tgt + self.dropout2(tgt2)\n tgt2 = self.norm3(tgt)\n tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))\n tgt = tgt + self.dropout3(tgt2)\n return tgt\n def forward(self, tgt, memory,",
+ "type": "code",
+ "location": "/detr/models/transformer.py:256-275"
+ },
+ "205": {
+ "file_id": 14,
+ "content": "This code defines a function for the transformer model in PyTorch. It performs self-attention on the target sequence (tgt) and applies multi-head attention to interact with memory, incorporating positional embeddings and masking for attentive processing. Finally, it passes through a feed-forward network and dropout layers before returning the modified target sequence.",
+ "type": "comment"
+ },
+ "206": {
+ "file_id": 14,
+ "content": " tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,\n memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,\n query_pos: Optional[Tensor] = None):\n if self.normalize_before:\n return self.forward_pre(tgt, memory, tgt_mask, memory_mask,\n tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)\n return self.forward_post(tgt, memory, tgt_mask, memory_mask,\n tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)\ndef _get_clones(module, N):\n return nn.ModuleList([copy.deepcopy(module) for i in range(N)])\ndef build_transformer(args):\n return Transformer(\n d_model=args.hidden_dim,\n dropout=args.dropout,\n nhead=args.nheads,\n dim_feedforward=args.dim_feedforward,\n num_encoder_layers=args.enc_layers,",
+ "type": "code",
+ "location": "/detr/models/transformer.py:276-299"
+ },
+ "207": {
+ "file_id": 14,
+ "content": "The code defines a Transformer model with optional masks and position embeddings, using deepcopy to create N identical modules for parallel processing. The build_transformer function initializes the Transformer model with given argument values.",
+ "type": "comment"
+ },
+ "208": {
+ "file_id": 14,
+ "content": " num_decoder_layers=args.dec_layers,\n normalize_before=args.pre_norm,\n return_intermediate_dec=True,\n )\ndef _get_activation_fn(activation):\n \"\"\"Return an activation function given a string\"\"\"\n if activation == \"relu\":\n return F.relu\n if activation == \"gelu\":\n return F.gelu\n if activation == \"glu\":\n return F.glu\n raise RuntimeError(F\"activation should be relu/gelu, not {activation}.\")",
+ "type": "code",
+ "location": "/detr/models/transformer.py:300-314"
+ },
+ "209": {
+ "file_id": 14,
+ "content": "This code defines a function for creating a transformer model with specified parameters and returns an activation function based on the input string.",
+ "type": "comment"
+ },
+ "210": {
+ "file_id": 15,
+ "content": "/detr/setup.py",
+ "type": "filepath"
+ },
+ "211": {
+ "file_id": 15,
+ "content": "The code imports necessary modules and sets up a setup script for the \"detr\" package using setuptools. It defines the package name, version, licenses, and reads the long description from the README file.",
+ "type": "summary"
+ },
+ "212": {
+ "file_id": 15,
+ "content": "from distutils.core import setup\nfrom setuptools import find_packages\nsetup(\n name='detr',\n version='0.0.0',\n packages=find_packages(),\n license='MIT License',\n long_description=open('README.md').read(),\n)",
+ "type": "code",
+ "location": "/detr/setup.py:1-10"
+ },
+ "213": {
+ "file_id": 15,
+ "content": "The code imports necessary modules and sets up a setup script for the \"detr\" package using setuptools. It defines the package name, version, licenses, and reads the long description from the README file.",
+ "type": "comment"
+ },
+ "214": {
+ "file_id": 16,
+ "content": "/detr/util/__init__.py",
+ "type": "filepath"
+ },
+ "215": {
+ "file_id": 16,
+ "content": "This is the copyright statement for the codebase, indicating that Facebook and its affiliates hold the rights to this code.",
+ "type": "summary"
+ },
+ "216": {
+ "file_id": 16,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved",
+ "type": "code",
+ "location": "/detr/util/__init__.py:1-1"
+ },
+ "217": {
+ "file_id": 16,
+ "content": "This is the copyright statement for the codebase, indicating that Facebook and its affiliates hold the rights to this code.",
+ "type": "comment"
+ },
+ "218": {
+ "file_id": 17,
+ "content": "/detr/util/box_ops.py",
+ "type": "filepath"
+ },
+ "219": {
+ "file_id": 17,
+ "content": "This code contains functions for bounding box manipulation and GIoU, including coordinate system conversion utilities, IOU calculation, modified torchvision box_iou function, and two functions for computing mask coordinates.",
+ "type": "summary"
+ },
+ "220": {
+ "file_id": 17,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nUtilities for bounding box manipulation and GIoU.\n\"\"\"\nimport torch\nfrom torchvision.ops.boxes import box_area\ndef box_cxcywh_to_xyxy(x):\n x_c, y_c, w, h = x.unbind(-1)\n b = [(x_c - 0.5 * w), (y_c - 0.5 * h),\n (x_c + 0.5 * w), (y_c + 0.5 * h)]\n return torch.stack(b, dim=-1)\ndef box_xyxy_to_cxcywh(x):\n x0, y0, x1, y1 = x.unbind(-1)\n b = [(x0 + x1) / 2, (y0 + y1) / 2,\n (x1 - x0), (y1 - y0)]\n return torch.stack(b, dim=-1)\n# modified from torchvision to also return the union\ndef box_iou(boxes1, boxes2):\n area1 = box_area(boxes1)\n area2 = box_area(boxes2)\n lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2]\n rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2]\n wh = (rb - lt).clamp(min=0) # [N,M,2]\n inter = wh[:, :, 0] * wh[:, :, 1] # [N,M]\n union = area1[:, None] + area2 - inter\n iou = inter / union\n return iou, union\ndef generalized_box_iou(boxes1, boxes2):\n \"\"\"",
+ "type": "code",
+ "location": "/detr/util/box_ops.py:1-41"
+ },
+ "221": {
+ "file_id": 17,
+ "content": "This code is from the \"act-plus-plus/detr/util/box_ops.py\" file and contains functions for bounding box manipulation and GIoU (Generalized Intersection over Union). The code includes utilities to convert between (cxcywh) and (xyxy) coordinate systems, and calculate the IOU (Intersection Over Union) and Generalized Box IOU between two sets of boxes. It also includes a modified version of torchvision's box_iou function that returns the union as well.",
+ "type": "comment"
+ },
+ "222": {
+ "file_id": 17,
+ "content": " Generalized IoU from https://giou.stanford.edu/\n The boxes should be in [x0, y0, x1, y1] format\n Returns a [N, M] pairwise matrix, where N = len(boxes1)\n and M = len(boxes2)\n \"\"\"\n # degenerate boxes gives inf / nan results\n # so do an early check\n assert (boxes1[:, 2:] >= boxes1[:, :2]).all()\n assert (boxes2[:, 2:] >= boxes2[:, :2]).all()\n iou, union = box_iou(boxes1, boxes2)\n lt = torch.min(boxes1[:, None, :2], boxes2[:, :2])\n rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])\n wh = (rb - lt).clamp(min=0) # [N,M,2]\n area = wh[:, :, 0] * wh[:, :, 1]\n return iou - (area - union) / area\ndef masks_to_boxes(masks):\n \"\"\"Compute the bounding boxes around the provided masks\n The masks should be in format [N, H, W] where N is the number of masks, (H, W) are the spatial dimensions.\n Returns a [N, 4] tensors, with the boxes in xyxy format\n \"\"\"\n if masks.numel() == 0:\n return torch.zeros((0, 4), device=masks.device)\n h, w = masks.shape[-2:]\n y = torch.arange(0, h, dtype=torch.float)",
+ "type": "code",
+ "location": "/detr/util/box_ops.py:42-76"
+ },
+ "223": {
+ "file_id": 17,
+ "content": "The code snippet contains two functions: \"generalized_iou\" and \"masks_to_boxes\". The first function calculates a pairwise matrix of Intersection over Union (IoU) between two sets of bounding boxes, taking into account degenerate cases. It asserts that the boxes are in the correct format and computes the IoU and union area between boxes. The second function takes a set of masks and returns the corresponding bounding boxes in xyxy format. It checks if the mask tensor is empty and then calculates the y-coordinates for the bounding boxes.",
+ "type": "comment"
+ },
+ "224": {
+ "file_id": 17,
+ "content": " x = torch.arange(0, w, dtype=torch.float)\n y, x = torch.meshgrid(y, x)\n x_mask = (masks * x.unsqueeze(0))\n x_max = x_mask.flatten(1).max(-1)[0]\n x_min = x_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]\n y_mask = (masks * y.unsqueeze(0))\n y_max = y_mask.flatten(1).max(-1)[0]\n y_min = y_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]\n return torch.stack([x_min, y_min, x_max, y_max], 1)",
+ "type": "code",
+ "location": "/detr/util/box_ops.py:77-88"
+ },
+ "225": {
+ "file_id": 17,
+ "content": "Computes the minimum and maximum x,y coordinates within masks using meshgrid and masked fill operations, then stacks them into a tensor.",
+ "type": "comment"
+ },
+ "226": {
+ "file_id": 18,
+ "content": "/detr/util/misc.py",
+ "type": "filepath"
+ },
+ "227": {
+ "file_id": 18,
+ "content": "The \"SmoothedValue\" and MetricLogger classes log metrics, offer smoothing, and track progress updates with memory usage. The PyTorch NestedTensor class supports distributed training, ONNX tracing, image padding, and accuracy functions.",
+ "type": "summary"
+ },
+ "228": {
+ "file_id": 18,
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nMisc functions, including distributed helpers.\nMostly copy-paste from torchvision references.\n\"\"\"\nimport os\nimport subprocess\nimport time\nfrom collections import defaultdict, deque\nimport datetime\nimport pickle\nfrom packaging import version\nfrom typing import Optional, List\nimport torch\nimport torch.distributed as dist\nfrom torch import Tensor\n# needed due to empty tensor bug in pytorch and torchvision 0.5\nimport torchvision\nif version.parse(torchvision.__version__) < version.parse('0.7'):\n from torchvision.ops import _new_empty_tensor\n from torchvision.ops.misc import _output_size\nclass SmoothedValue(object):\n \"\"\"Track a series of values and provide access to smoothed values over a\n window or the global series average.\n \"\"\"\n def __init__(self, window_size=20, fmt=None):\n if fmt is None:\n fmt = \"{median:.4f} ({global_avg:.4f})\"\n self.deque = deque(maxlen=window_size)\n self.total = 0.0\n self.count = 0",
+ "type": "code",
+ "location": "/detr/util/misc.py:1-37"
+ },
+ "229": {
+ "file_id": 18,
+ "content": "The code is a Python class called \"SmoothedValue\" that tracks a series of values and provides access to smoothed values over a window or the global average. It uses a deque data structure with maximum length \"window_size\" for efficient storage, and keeps track of the total and count of values. The format string \"fmt\" determines how the smoothed value and global average are displayed.",
+ "type": "comment"
+ },
+ "230": {
+ "file_id": 18,
+ "content": " self.fmt = fmt\n def update(self, value, n=1):\n self.deque.append(value)\n self.count += n\n self.total += value * n\n def synchronize_between_processes(self):\n \"\"\"\n Warning: does not synchronize the deque!\n \"\"\"\n if not is_dist_avail_and_initialized():\n return\n t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda')\n dist.barrier()\n dist.all_reduce(t)\n t = t.tolist()\n self.count = int(t[0])\n self.total = t[1]\n @property\n def median(self):\n d = torch.tensor(list(self.deque))\n return d.median().item()\n @property\n def avg(self):\n d = torch.tensor(list(self.deque), dtype=torch.float32)\n return d.mean().item()\n @property\n def global_avg(self):\n return self.total / self.count\n @property\n def max(self):\n return max(self.deque)\n @property\n def value(self):\n return self.deque[-1]\n def __str__(self):\n return self.fmt.format(",
+ "type": "code",
+ "location": "/detr/util/misc.py:38-81"
+ },
+ "231": {
+ "file_id": 18,
+ "content": "The code defines a class that tracks a deque (double-ended queue) and provides various properties such as median, average, maximum value, global average, and current value. It also allows updating the deque with values and synchronizing counts and totals across multiple processes using PyTorch's distributed functions.",
+ "type": "comment"
+ },
+ "232": {
+ "file_id": 18,
+ "content": " median=self.median,\n avg=self.avg,\n global_avg=self.global_avg,\n max=self.max,\n value=self.value)\ndef all_gather(data):\n \"\"\"\n Run all_gather on arbitrary picklable data (not necessarily tensors)\n Args:\n data: any picklable object\n Returns:\n list[data]: list of data gathered from each rank\n \"\"\"\n world_size = get_world_size()\n if world_size == 1:\n return [data]\n # serialized to a Tensor\n buffer = pickle.dumps(data)\n storage = torch.ByteStorage.from_buffer(buffer)\n tensor = torch.ByteTensor(storage).to(\"cuda\")\n # obtain Tensor size of each rank\n local_size = torch.tensor([tensor.numel()], device=\"cuda\")\n size_list = [torch.tensor([0], device=\"cuda\") for _ in range(world_size)]\n dist.all_gather(size_list, local_size)\n size_list = [int(size.item()) for size in size_list]\n max_size = max(size_list)\n # receiving Tensor from all ranks\n # we pad the tensor because torch all_gather does not support",
+ "type": "code",
+ "location": "/detr/util/misc.py:82-114"
+ },
+ "233": {
+ "file_id": 18,
+ "content": "This function runs \"all_gather\" on any picklable data object, not necessarily tensors. It first checks if the world size is 1 and returns data if so. If not, it picks up the data, converts it into a byte tensor, gathers the local size of the tensor from each rank using all_gather, finds the maximum size among them, and finally performs an all_gather on the tensor while padding when necessary.",
+ "type": "comment"
+ },
+ "234": {
+ "file_id": 18,
+ "content": " # gathering tensors of different shapes\n tensor_list = []\n for _ in size_list:\n tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device=\"cuda\"))\n if local_size != max_size:\n padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device=\"cuda\")\n tensor = torch.cat((tensor, padding), dim=0)\n dist.all_gather(tensor_list, tensor)\n data_list = []\n for size, tensor in zip(size_list, tensor_list):\n buffer = tensor.cpu().numpy().tobytes()[:size]\n data_list.append(pickle.loads(buffer))\n return data_list\ndef reduce_dict(input_dict, average=True):\n \"\"\"\n Args:\n input_dict (dict): all the values will be reduced\n average (bool): whether to do average or sum\n Reduce the values in the dictionary from all processes so that all processes\n have the averaged results. Returns a dict with the same fields as\n input_dict, after reduction.\n \"\"\"\n world_size = get_world_size()\n if world_size < 2:\n return input_dict",
+ "type": "code",
+ "location": "/detr/util/misc.py:115-143"
+ },
+ "235": {
+ "file_id": 18,
+ "content": "This code snippet is responsible for gathering tensors of different shapes and reducing the values in a dictionary from all processes. It first creates empty tensors for various sizes, then gathers them using all-gather operation. Afterwards, it converts tensors to data and appends them to a list. The second function reduces the values in an input dictionary across multiple processes by averaging or summing them, based on the specified flag.",
+ "type": "comment"
+ },
+ "236": {
+ "file_id": 18,
+ "content": " with torch.no_grad():\n names = []\n values = []\n # sort the keys so that they are consistent across processes\n for k in sorted(input_dict.keys()):\n names.append(k)\n values.append(input_dict[k])\n values = torch.stack(values, dim=0)\n dist.all_reduce(values)\n if average:\n values /= world_size\n reduced_dict = {k: v for k, v in zip(names, values)}\n return reduced_dict\nclass MetricLogger(object):\n def __init__(self, delimiter=\"\\t\"):\n self.meters = defaultdict(SmoothedValue)\n self.delimiter = delimiter\n def update(self, **kwargs):\n for k, v in kwargs.items():\n if isinstance(v, torch.Tensor):\n v = v.item()\n assert isinstance(v, (float, int))\n self.meters[k].update(v)\n def __getattr__(self, attr):\n if attr in self.meters:\n return self.meters[attr]\n if attr in self.__dict__:\n return self.__dict__[attr]\n raise AttributeError(\"'{}' object has no attribute '{}'\".format(",
+ "type": "code",
+ "location": "/detr/util/misc.py:144-176"
+ },
+ "237": {
+ "file_id": 18,
+ "content": "This code snippet defines a class MetricLogger which logs metrics such as average and sum. It also contains a function that averages values across processes, creating a reduced dictionary after performing all-reduce operation. This could be useful for distributed training where different processes need to communicate their results for aggregation and averaging purposes.",
+ "type": "comment"
+ },
+ "238": {
+ "file_id": 18,
+ "content": " type(self).__name__, attr))\n def __str__(self):\n loss_str = []\n for name, meter in self.meters.items():\n loss_str.append(\n \"{}: {}\".format(name, str(meter))\n )\n return self.delimiter.join(loss_str)\n def synchronize_between_processes(self):\n for meter in self.meters.values():\n meter.synchronize_between_processes()\n def add_meter(self, name, meter):\n self.meters[name] = meter\n def log_every(self, iterable, print_freq, header=None):\n i = 0\n if not header:\n header = ''\n start_time = time.time()\n end = time.time()\n iter_time = SmoothedValue(fmt='{avg:.4f}')\n data_time = SmoothedValue(fmt='{avg:.4f}')\n space_fmt = ':' + str(len(str(len(iterable)))) + 'd'\n if torch.cuda.is_available():\n log_msg = self.delimiter.join([\n header,\n '[{0' + space_fmt + '}/{1}]',\n 'eta: {eta}',\n '{meters}',",
+ "type": "code",
+ "location": "/detr/util/misc.py:177-208"
+ },
+ "239": {
+ "file_id": 18,
+ "content": "The code defines a class with methods for logging iterable data every 'print_freq' iterations. It includes synchronization, adding meters, and displaying loss metrics as strings. The class also has a timer to calculate elapsed time for each iteration of the iterable.",
+ "type": "comment"
+ },
+ "240": {
+ "file_id": 18,
+ "content": " 'time: {time}',\n 'data: {data}',\n 'max mem: {memory:.0f}'\n ])\n else:\n log_msg = self.delimiter.join([\n header,\n '[{0' + space_fmt + '}/{1}]',\n 'eta: {eta}',\n '{meters}',\n 'time: {time}',\n 'data: {data}'\n ])\n MB = 1024.0 * 1024.0\n for obj in iterable:\n data_time.update(time.time() - end)\n yield obj\n iter_time.update(time.time() - end)\n if i % print_freq == 0 or i == len(iterable) - 1:\n eta_seconds = iter_time.global_avg * (len(iterable) - i)\n eta_string = str(datetime.timedelta(seconds=int(eta_seconds)))\n if torch.cuda.is_available():\n print(log_msg.format(\n i, len(iterable), eta=eta_string,\n meters=str(self),\n time=str(iter_time), data=str(data_time),",
+ "type": "code",
+ "location": "/detr/util/misc.py:209-234"
+ },
+ "241": {
+ "file_id": 18,
+ "content": "This code snippet is part of a progress bar implementation. It calculates elapsed time, remaining time estimation, and memory usage for an iterable. The log message is constructed with dynamic placeholders and printed at specified intervals based on the print_freq variable. The CUDA availability check ensures proper printing to the console or CUDA device.",
+ "type": "comment"
+ },
+ "242": {
+ "file_id": 18,
+ "content": " memory=torch.cuda.max_memory_allocated() / MB))\n else:\n print(log_msg.format(\n i, len(iterable), eta=eta_string,\n meters=str(self),\n time=str(iter_time), data=str(data_time)))\n i += 1\n end = time.time()\n total_time = time.time() - start_time\n total_time_str = str(datetime.timedelta(seconds=int(total_time)))\n print('{} Total time: {} ({:.4f} s / it)'.format(\n header, total_time_str, total_time / len(iterable)))\ndef get_sha():\n cwd = os.path.dirname(os.path.abspath(__file__))\n def _run(command):\n return subprocess.check_output(command, cwd=cwd).decode('ascii').strip()\n sha = 'N/A'\n diff = \"clean\"\n branch = 'N/A'\n try:\n sha = _run(['git', 'rev-parse', 'HEAD'])\n subprocess.check_output(['git', 'diff'], cwd=cwd)\n diff = _run(['git', 'diff-index', 'HEAD'])\n diff = \"has uncommited changes\" if diff else \"clean\"",
+ "type": "code",
+ "location": "/detr/util/misc.py:235-261"
+ },
+ "243": {
+ "file_id": 18,
+ "content": "The code defines a function that calculates the total time taken for an iterable and logs progress updates. It also includes functions to get the current branch, uncommitted changes, and SHA of the current file's directory.",
+ "type": "comment"
+ },
+ "244": {
+ "file_id": 18,
+ "content": " branch = _run(['git', 'rev-parse', '--abbrev-ref', 'HEAD'])\n except Exception:\n pass\n message = f\"sha: {sha}, status: {diff}, branch: {branch}\"\n return message\ndef collate_fn(batch):\n batch = list(zip(*batch))\n batch[0] = nested_tensor_from_tensor_list(batch[0])\n return tuple(batch)\ndef _max_by_axis(the_list):\n # type: (List[List[int]]) -> List[int]\n maxes = the_list[0]\n for sublist in the_list[1:]:\n for index, item in enumerate(sublist):\n maxes[index] = max(maxes[index], item)\n return maxes\nclass NestedTensor(object):\n def __init__(self, tensors, mask: Optional[Tensor]):\n self.tensors = tensors\n self.mask = mask\n def to(self, device):\n # type: (Device) -> NestedTensor # noqa\n cast_tensor = self.tensors.to(device)\n mask = self.mask\n if mask is not None:\n assert mask is not None\n cast_mask = mask.to(device)\n else:\n cast_mask = None\n return NestedTensor(cast_tensor, cast_mask)",
+ "type": "code",
+ "location": "/detr/util/misc.py:262-298"
+ },
+ "245": {
+ "file_id": 18,
+ "content": "This code defines a class NestedTensor, functions collate_fn, _max_by_axis, and _run. NestedTensor represents tensors with optional masking for PyTorch. collate_fn organizes input batches of different dimensions into tuples. _max_by_axis finds the maximum value along an axis in a list of lists. _run executes a git command and returns its output. These functions appear to be used in deep learning tasks, potentially for data processing or model training.",
+ "type": "comment"
+ },
+ "246": {
+ "file_id": 18,
+ "content": " def decompose(self):\n return self.tensors, self.mask\n def __repr__(self):\n return str(self.tensors)\ndef nested_tensor_from_tensor_list(tensor_list: List[Tensor]):\n # TODO make this more general\n if tensor_list[0].ndim == 3:\n if torchvision._is_tracing():\n # nested_tensor_from_tensor_list() does not export well to ONNX\n # call _onnx_nested_tensor_from_tensor_list() instead\n return _onnx_nested_tensor_from_tensor_list(tensor_list)\n # TODO make it support different-sized images\n max_size = _max_by_axis([list(img.shape) for img in tensor_list])\n # min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))\n batch_shape = [len(tensor_list)] + max_size\n b, c, h, w = batch_shape\n dtype = tensor_list[0].dtype\n device = tensor_list[0].device\n tensor = torch.zeros(batch_shape, dtype=dtype, device=device)\n mask = torch.ones((b, h, w), dtype=torch.bool, device=device)\n for img, pad_img, m in zip(tensor_list, tensor, mask):",
+ "type": "code",
+ "location": "/detr/util/misc.py:300-324"
+ },
+ "247": {
+ "file_id": 18,
+ "content": "The code defines a `decompose` function that returns tensors and mask, and a `__repr__` function that returns the tensor representation. The main function is `nested_tensor_from_tensor_list`, which takes a list of tensors and creates a nested tensor by resizing them to have the same maximum shape while padding smaller ones. It supports 3D tensors and has TODOs for generalization and supporting different-sized images.",
+ "type": "comment"
+ },
+ "248": {
+ "file_id": 18,
+ "content": " pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)\n m[: img.shape[1], :img.shape[2]] = False\n else:\n raise ValueError('not supported')\n return NestedTensor(tensor, mask)\n# _onnx_nested_tensor_from_tensor_list() is an implementation of\n# nested_tensor_from_tensor_list() that is supported by ONNX tracing.\n@torch.jit.unused\ndef _onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> NestedTensor:\n max_size = []\n for i in range(tensor_list[0].dim()):\n max_size_i = torch.max(torch.stack([img.shape[i] for img in tensor_list]).to(torch.float32)).to(torch.int64)\n max_size.append(max_size_i)\n max_size = tuple(max_size)\n # work around for\n # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)\n # m[: img.shape[1], :img.shape[2]] = False\n # which is not yet supported in onnx\n padded_imgs = []\n padded_masks = []\n for img in tensor_list:\n padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]",
+ "type": "code",
+ "location": "/detr/util/misc.py:325-349"
+ },
+ "249": {
+ "file_id": 18,
+ "content": "This code is creating a NestedTensor from a list of tensors. It checks if the input tensor_list has the same size and data type, then pads the images to have the maximum size in each dimension, and sets the mask accordingly. If not supported, it raises a ValueError. This implementation is designed to be compatible with ONNX tracing using @torch.jit.unused decorator.",
+ "type": "comment"
+ },
+ "250": {
+ "file_id": 18,
+ "content": " padded_img = torch.nn.functional.pad(img, (0, padding[2], 0, padding[1], 0, padding[0]))\n padded_imgs.append(padded_img)\n m = torch.zeros_like(img[0], dtype=torch.int, device=img.device)\n padded_mask = torch.nn.functional.pad(m, (0, padding[2], 0, padding[1]), \"constant\", 1)\n padded_masks.append(padded_mask.to(torch.bool))\n tensor = torch.stack(padded_imgs)\n mask = torch.stack(padded_masks)\n return NestedTensor(tensor, mask=mask)\ndef setup_for_distributed(is_master):\n \"\"\"\n This function disables printing when not in master process\n \"\"\"\n import builtins as __builtin__\n builtin_print = __builtin__.print\n def print(*args, **kwargs):\n force = kwargs.pop('force', False)\n if is_master or force:\n builtin_print(*args, **kwargs)\n __builtin__.print = print\ndef is_dist_avail_and_initialized():\n if not dist.is_available():\n return False\n if not dist.is_initialized():\n return False\n return True\ndef get_world_size():",
+ "type": "code",
+ "location": "/detr/util/misc.py:350-386"
+ },
+ "251": {
+ "file_id": 18,
+ "content": "This code snippet is from the act-plus-plus/detr/util/misc.py file and contains functions to pad images, handle distributed training, and check if distributed training is available and initialized. It also sets up a custom print function for non-master processes in distributed training and returns the world size.",
+ "type": "comment"
+ },
+ "252": {
+ "file_id": 18,
+ "content": " if not is_dist_avail_and_initialized():\n return 1\n return dist.get_world_size()\ndef get_rank():\n if not is_dist_avail_and_initialized():\n return 0\n return dist.get_rank()\ndef is_main_process():\n return get_rank() == 0\ndef save_on_master(*args, **kwargs):\n if is_main_process():\n torch.save(*args, **kwargs)\ndef init_distributed_mode(args):\n if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:\n args.rank = int(os.environ[\"RANK\"])\n args.world_size = int(os.environ['WORLD_SIZE'])\n args.gpu = int(os.environ['LOCAL_RANK'])\n elif 'SLURM_PROCID' in os.environ:\n args.rank = int(os.environ['SLURM_PROCID'])\n args.gpu = args.rank % torch.cuda.device_count()\n else:\n print('Not using distributed mode')\n args.distributed = False\n return\n args.distributed = True\n torch.cuda.set_device(args.gpu)\n args.dist_backend = 'nccl'\n print('| distributed init (rank {}): {}'.format(\n args.rank, args.dist_url), flush=True)",
+ "type": "code",
+ "location": "/detr/util/misc.py:387-425"
+ },
+ "253": {
+ "file_id": 18,
+ "content": "This code sets up distributed mode for deep learning tasks. It checks if the distribution environment is available and initialized, then gets world size and rank, defines helper functions like saving on master process only, and initializes distributed mode based on the environment variables. The code assumes the use of either Torch or NCCL backend for distributed training.",
+ "type": "comment"
+ },
+ "254": {
+ "file_id": 18,
+ "content": " torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,\n world_size=args.world_size, rank=args.rank)\n torch.distributed.barrier()\n setup_for_distributed(args.rank == 0)\n@torch.no_grad()\ndef accuracy(output, target, topk=(1,)):\n \"\"\"Computes the precision@k for the specified values of k\"\"\"\n if target.numel() == 0:\n return [torch.zeros([], device=output.device)]\n maxk = max(topk)\n batch_size = target.size(0)\n _, pred = output.topk(maxk, 1, True, True)\n pred = pred.t()\n correct = pred.eq(target.view(1, -1).expand_as(pred))\n res = []\n for k in topk:\n correct_k = correct[:k].view(-1).float().sum(0)\n res.append(correct_k.mul_(100.0 / batch_size))\n return res\ndef interpolate(input, size=None, scale_factor=None, mode=\"nearest\", align_corners=None):\n # type: (Tensor, Optional[List[int]], Optional[float], str, Optional[bool]) -> Tensor\n \"\"\"\n Equivalent to nn.functional.interpolate, but with support for empty batch sizes.",
+ "type": "code",
+ "location": "/detr/util/misc.py:426-454"
+ },
+ "255": {
+ "file_id": 18,
+ "content": "This code initializes a distributed process group and sets up functions for calculating accuracy and interpolating tensors. The distributed process group allows for parallel processing across multiple devices, while the accuracy function computes precision@k for specified values of k, and the interpolate function provides equivalent functionality to nn.functional.interpolate but supports empty batch sizes.",
+ "type": "comment"
+ },
+ "256": {
+ "file_id": 18,
+ "content": " This will eventually be supported natively by PyTorch, and this\n class can go away.\n \"\"\"\n if version.parse(torchvision.__version__) < version.parse('0.7'):\n if input.numel() > 0:\n return torch.nn.functional.interpolate(\n input, size, scale_factor, mode, align_corners\n )\n output_shape = _output_size(2, input, size, scale_factor)\n output_shape = list(input.shape[:-2]) + list(output_shape)\n return _new_empty_tensor(input, output_shape)\n else:\n return torchvision.ops.misc.interpolate(input, size, scale_factor, mode, align_corners)",
+ "type": "code",
+ "location": "/detr/util/misc.py:455-468"
+ },
+ "257": {
+ "file_id": 18,
+ "content": "This function checks the PyTorch and torchvision versions, and performs interpolation differently based on the version. If the version is below 0.7, it uses torch.nn.functional.interpolate(). Otherwise, it calls torchvision.ops.misc.interpolate(). The code also handles empty input cases by returning a new tensor with the appropriate shape.",
+ "type": "comment"
+ },
+ "258": {
+ "file_id": 19,
+ "content": "/detr/util/plot_utils.py",
+ "type": "filepath"
+ },
+ "259": {
+ "file_id": 19,
+ "content": "The \"plot_logs\" function generates matplotlib plots using training logs, handling missing files and plotting precision-recall curves with interpolated mAP values, setting axes titles and legends.",
+ "type": "summary"
+ },
+ "260": {
+ "file_id": 19,
+ "content": "\"\"\"\nPlotting utilities to visualize training logs.\n\"\"\"\nimport torch\nimport pandas as pd\nimport numpy as np\nimport seaborn as sns\nimport matplotlib.pyplot as plt\nfrom pathlib import Path, PurePath\ndef plot_logs(logs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt'):\n '''\n Function to plot specific fields from training log(s). Plots both training and test results.\n :: Inputs - logs = list containing Path objects, each pointing to individual dir with a log file\n - fields = which results to plot from each log file - plots both training and test for each field.\n - ewm_col = optional, which column to use as the exponential weighted smoothing of the plots\n - log_name = optional, name of log file if different than default 'log.txt'.\n :: Outputs - matplotlib plots of results in fields, color coded for each log file.\n - solid lines are training results, dashed lines are test results.\n '''\n func_name = \"plot_utils.py::plot_logs\"",
+ "type": "code",
+ "location": "/detr/util/plot_utils.py:1-26"
+ },
+ "261": {
+ "file_id": 19,
+ "content": "This code defines a function \"plot_logs\" that takes in training logs, fields to plot (like class_error, loss), and optional parameters like ewm_col and log_name. It then generates matplotlib plots showing the results of each field color-coded for each log file with solid lines representing training results and dashed lines for test results.",
+ "type": "comment"
+ },
+ "262": {
+ "file_id": 19,
+ "content": " # verify logs is a list of Paths (list[Paths]) or single Pathlib object Path,\n # convert single Path to list to avoid 'not iterable' error\n if not isinstance(logs, list):\n if isinstance(logs, PurePath):\n logs = [logs]\n print(f\"{func_name} info: logs param expects a list argument, converted to list[Path].\")\n else:\n raise ValueError(f\"{func_name} - invalid argument for logs parameter.\\n \\\n Expect list[Path] or single Path obj, received {type(logs)}\")\n # Quality checks - verify valid dir(s), that every item in list is Path object, and that log_name exists in each dir\n for i, dir in enumerate(logs):\n if not isinstance(dir, PurePath):\n raise ValueError(f\"{func_name} - non-Path object in logs argument of {type(dir)}: \\n{dir}\")\n if not dir.exists():\n raise ValueError(f\"{func_name} - invalid directory in logs argument:\\n{dir}\")\n # verify log_name exists\n fn = Path(dir / log_name)\n if not fn.exists():",
+ "type": "code",
+ "location": "/detr/util/plot_utils.py:28-47"
+ },
+ "263": {
+ "file_id": 19,
+ "content": "This code checks if the 'logs' argument is a list of Paths or a single Path object. If not, it raises an error. It then iterates over each directory in the logs list and ensures they exist as directories. Finally, it checks if the log_name exists within each directory.",
+ "type": "comment"
+ },
+ "264": {
+ "file_id": 19,
+ "content": " print(f\"-> missing {log_name}. Have you gotten to Epoch 1 in training?\")\n print(f\"--> full path of missing log file: {fn}\")\n return\n # load log file(s) and plot\n dfs = [pd.read_json(Path(p) / log_name, lines=True) for p in logs]\n fig, axs = plt.subplots(ncols=len(fields), figsize=(16, 5))\n for df, color in zip(dfs, sns.color_palette(n_colors=len(logs))):\n for j, field in enumerate(fields):\n if field == 'mAP':\n coco_eval = pd.DataFrame(\n np.stack(df.test_coco_eval_bbox.dropna().values)[:, 1]\n ).ewm(com=ewm_col).mean()\n axs[j].plot(coco_eval, c=color)\n else:\n df.interpolate().ewm(com=ewm_col).mean().plot(\n y=[f'train_{field}', f'test_{field}'],\n ax=axs[j],\n color=[color] * 2,\n style=['-', '--']\n )\n for ax, field in zip(axs, fields):\n ax.legend([Path(p).name for p in logs])",
+ "type": "code",
+ "location": "/detr/util/plot_utils.py:48-72"
+ },
+ "265": {
+ "file_id": 19,
+ "content": "This code checks for a missing log file and prompts the user to make sure they've reached Epoch 1 in training. It then loads log files, plots data frames for specified fields, and handles missing log files. The plot includes mAP (mean average precision) values using COCO evaluation metrics, and other field values interpolated and smoothed using exponential weighted moving averages.",
+ "type": "comment"
+ },
+ "266": {
+ "file_id": 19,
+ "content": " ax.set_title(field)\ndef plot_precision_recall(files, naming_scheme='iter'):\n if naming_scheme == 'exp_id':\n # name becomes exp_id\n names = [f.parts[-3] for f in files]\n elif naming_scheme == 'iter':\n names = [f.stem for f in files]\n else:\n raise ValueError(f'not supported {naming_scheme}')\n fig, axs = plt.subplots(ncols=2, figsize=(16, 5))\n for f, color, name in zip(files, sns.color_palette(\"Blues\", n_colors=len(files)), names):\n data = torch.load(f)\n # precision is n_iou, n_points, n_cat, n_area, max_det\n precision = data['precision']\n recall = data['params'].recThrs\n scores = data['scores']\n # take precision for all classes, all areas and 100 detections\n precision = precision[0, :, :, 0, -1].mean(1)\n scores = scores[0, :, :, 0, -1].mean(1)\n prec = precision.mean()\n rec = data['recall'][0, :, 0, -1].mean()\n print(f'{naming_scheme} {name}: mAP@50={prec * 100: 05.1f}, ' +\n f'score={scores.mean():0.3f}, ' +",
+ "type": "code",
+ "location": "/detr/util/plot_utils.py:73-97"
+ },
+ "267": {
+ "file_id": 19,
+ "content": "The code defines a function plot_precision_recall that takes in files, and depending on the naming_scheme, extracts either the exp_id or stem from each file. It then creates a figure with two subplots and for each file, it loads the corresponding data and calculates precision, recall, and mean average precision (mAP) at 50. The results are printed out for each file in the format \"naming_scheme name: mAP@50=precision%, score=score\".",
+ "type": "comment"
+ },
+ "268": {
+ "file_id": 19,
+ "content": " f'f1={2 * prec * rec / (prec + rec + 1e-8):0.3f}'\n )\n axs[0].plot(recall, precision, c=color)\n axs[1].plot(recall, scores, c=color)\n axs[0].set_title('Precision / Recall')\n axs[0].legend(names)\n axs[1].set_title('Scores / Recall')\n axs[1].legend(names)\n return fig, axs",
+ "type": "code",
+ "location": "/detr/util/plot_utils.py:98-107"
+ },
+ "269": {
+ "file_id": 19,
+ "content": "This code plots Precision-Recall curves and scores against Recall, sets titles for the axes, adds legends with given names, and returns the figure and axis objects.",
+ "type": "comment"
+ },
+ "270": {
+ "file_id": 20,
+ "content": "/dxl_test.py",
+ "type": "filepath"
+ },
+ "271": {
+ "file_id": 20,
+ "content": "This code imports DynamixelClient and creates an instance with IDs 1 and 2, connects to the '/dev/ttyDXL_wheels' port in a non-blocking manner. It then prints the current position, velocity, and current information of the connected motors.",
+ "type": "summary"
+ },
+ "272": {
+ "file_id": 20,
+ "content": "from dynamixel_client import DynamixelClient\nclient = DynamixelClient([1, 2], port='/dev/ttyDXL_wheels', lazy_connect=True)\nprint(client.read_pos_vel_cur())",
+ "type": "code",
+ "location": "/dxl_test.py:1-4"
+ },
+ "273": {
+ "file_id": 20,
+ "content": "This code imports DynamixelClient and creates an instance with IDs 1 and 2, connects to the '/dev/ttyDXL_wheels' port in a non-blocking manner. It then prints the current position, velocity, and current information of the connected motors.",
+ "type": "comment"
+ },
+ "274": {
+ "file_id": 21,
+ "content": "/dynamixel_client.py",
+ "type": "filepath"
+ },
+ "275": {
+ "file_id": 21,
+ "content": "The code uses DynamixelSDK for motor communication, offering a class for control and incorporating functions for cleanup, conversion, and initialization. It manages motion control through command-line arguments and handles data from Dynamixel motors in an infinite loop.",
+ "type": "summary"
+ },
+ "276": {
+ "file_id": 21,
+ "content": "\"\"\"Communication using the DynamixelSDK.\"\"\"\n##This is based off of the dynamixel SDK\nimport atexit\nimport logging\nimport time\nfrom typing import Optional, Sequence, Union, Tuple\nimport numpy as np\nPROTOCOL_VERSION = 2.0\n# The following addresses assume XH motors.\nADDR_TORQUE_ENABLE = 64\nADDR_GOAL_POSITION = 116\nADDR_PRESENT_POSITION = 132\nADDR_PRESENT_VELOCITY = 128\nADDR_PRESENT_CURRENT = 126\nADDR_PRESENT_POS_VEL_CUR = 126\n# Data Byte Length\nLEN_PRESENT_POSITION = 4\nLEN_PRESENT_VELOCITY = 4\nLEN_PRESENT_CURRENT = 2\nLEN_PRESENT_POS_VEL_CUR = 10\nLEN_GOAL_POSITION = 4\nDEFAULT_POS_SCALE = 2.0 * np.pi / 4096 # 0.088 degrees\n# See http://emanual.robotis.com/docs/en/dxl/x/xh430-v210/#goal-velocity\nDEFAULT_VEL_SCALE = 0.229 * 2.0 * np.pi / 60.0 # 0.229 rpm\nDEFAULT_CUR_SCALE = 1.34\ndef dynamixel_cleanup_handler():\n \"\"\"Cleanup function to ensure Dynamixels are disconnected properly.\"\"\"\n open_clients = list(DynamixelClient.OPEN_CLIENTS)\n for open_client in open_clients:\n if open_client.port_handler.is_using:\n logging.warning('Forcing client to close.')",
+ "type": "code",
+ "location": "/dynamixel_client.py:1-38"
+ },
+ "277": {
+ "file_id": 21,
+ "content": "This code is for communicating with Dynamixel motors using the DynamixelSDK. It defines protocol version, addresses for various motor data, byte lengths, and scale factors for position, velocity, and current. The dynamixel_cleanup_handler function ensures Dynamixels are disconnected properly before exiting.",
+ "type": "comment"
+ },
+ "278": {
+ "file_id": 21,
+ "content": " open_client.port_handler.is_using = False\n open_client.disconnect()\ndef signed_to_unsigned(value: int, size: int) -> int:\n \"\"\"Converts the given value to its unsigned representation.\"\"\"\n if value < 0:\n bit_size = 8 * size\n max_value = (1 << bit_size) - 1\n value = max_value + value\n return value\ndef unsigned_to_signed(value: int, size: int) -> int:\n \"\"\"Converts the given value from its unsigned representation.\"\"\"\n bit_size = 8 * size\n if (value & (1 << (bit_size - 1))) != 0:\n value = -((1 << bit_size) - value)\n return value\nclass DynamixelClient:\n \"\"\"Client for communicating with Dynamixel motors.\n NOTE: This only supports Protocol 2.\n \"\"\"\n # The currently open clients.\n OPEN_CLIENTS = set()\n def __init__(self,\n motor_ids: Sequence[int],\n port: str = '/dev/ttyUSB0',\n baudrate: int = 1000000,\n lazy_connect: bool = False,\n pos_scale: Optional[float] = None,",
+ "type": "code",
+ "location": "/dynamixel_client.py:39-74"
+ },
+ "279": {
+ "file_id": 21,
+ "content": "The code defines a class `DynamixelClient` for communicating with Dynamixel motors, supporting Protocol 2. It also contains functions `signed_to_unsigned` and `unsigned_to_signed` for converting signed to unsigned values and vice versa. The client can be initialized with motor IDs, port, baudrate, lazy connect option, and optional position scale.",
+ "type": "comment"
+ },
+ "280": {
+ "file_id": 21,
+ "content": " vel_scale: Optional[float] = None,\n cur_scale: Optional[float] = None):\n \"\"\"Initializes a new client.\n Args:\n motor_ids: All motor IDs being used by the client.\n port: The Dynamixel device to talk to. e.g.\n - Linux: /dev/ttyUSB0\n - Mac: /dev/tty.usbserial-*\n - Windows: COM1\n baudrate: The Dynamixel baudrate to communicate with.\n lazy_connect: If True, automatically connects when calling a method\n that requires a connection, if not already connected.\n pos_scale: The scaling factor for the positions. This is\n motor-dependent. If not provided, uses the default scale.\n vel_scale: The scaling factor for the velocities. This is\n motor-dependent. If not provided uses the default scale.\n cur_scale: The scaling factor for the currents. This is\n motor-dependent. If not provided uses the default scale.",
+ "type": "code",
+ "location": "/dynamixel_client.py:75-93"
+ },
+ "281": {
+ "file_id": 21,
+ "content": "This code snippet is the constructor of a class, initializing a new Dynamixel client. It takes motor IDs, device port, baudrate, and optional scaling factors for positions, velocities, and currents as arguments. If not provided, it uses default scales. Lazy connectivity is also available if a method requires a connection when not already connected.",
+ "type": "comment"
+ },
+ "282": {
+ "file_id": 21,
+ "content": " \"\"\"\n import dynamixel_sdk\n self.dxl = dynamixel_sdk\n self.motor_ids = list(motor_ids)\n self.port_name = port\n self.baudrate = baudrate\n self.lazy_connect = lazy_connect\n self.port_handler = self.dxl.PortHandler(port)\n self.packet_handler = self.dxl.PacketHandler(PROTOCOL_VERSION)\n self._pos_vel_cur_reader = DynamixelPosVelCurReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,\n )\n self._pos_reader = DynamixelPosReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,",
+ "type": "code",
+ "location": "/dynamixel_client.py:94-118"
+ },
+ "283": {
+ "file_id": 21,
+ "content": "This code imports the dynamixel_sdk library and initializes variables for port, baudrate, lazy connect, and protocol version. It also creates handlers for the port and packet communication and instantiates two reader classes for position, velocity, and current data. These readers can be used to access information from Dynamixel motors.",
+ "type": "comment"
+ },
+ "284": {
+ "file_id": 21,
+ "content": " )\n self._vel_reader = DynamixelVelReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,\n )\n self._cur_reader = DynamixelCurReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,\n )\n self._sync_writers = {}\n self.OPEN_CLIENTS.add(self)\n @property\n def is_connected(self) -> bool:\n return self.port_handler.is_open\n def connect(self):\n \"\"\"Connects to the Dynamixel motors.\n NOTE: This should be called after all DynamixelClients on the same\n process are created.",
+ "type": "code",
+ "location": "/dynamixel_client.py:119-146"
+ },
+ "285": {
+ "file_id": 21,
+ "content": "The code initializes reader and writer objects for the Dynamixel motors, handles open clients, and provides a connect method. The `_vel_reader` and `_cur_reader` objects are created with optional scales for position (pos_scale), velocity (vel_scale), and current (cur_scale). These scales allow custom adjustment to the motor data readings. The `self._sync_writers` dictionary is initialized, likely used for synchronous writer operations. The code also includes an `is_connected` property that returns the status of the connection to the Dynamixel motors and a `connect` method which should be called after all DynamixelClients on the same process are created.",
+ "type": "comment"
+ },
+ "286": {
+ "file_id": 21,
+ "content": " \"\"\"\n assert not self.is_connected, 'Client is already connected.'\n if self.port_handler.openPort():\n logging.info('Succeeded to open port: %s', self.port_name)\n else:\n raise OSError(\n ('Failed to open port at {} (Check that the device is powered '\n 'on and connected to your computer).').format(self.port_name))\n if self.port_handler.setBaudRate(self.baudrate):\n logging.info('Succeeded to set baudrate to %d', self.baudrate)\n else:\n raise OSError(\n ('Failed to set the baudrate to {} (Ensure that the device was '\n 'configured for this baudrate).').format(self.baudrate))\n # Start with all motors enabled. NO, I want to set settings before enabled\n #self.set_torque_enabled(self.motor_ids, True)\n def disconnect(self):\n \"\"\"Disconnects from the Dynamixel device.\"\"\"\n if not self.is_connected:\n return\n if self.port_handler.is_using:",
+ "type": "code",
+ "location": "/dynamixel_client.py:147-171"
+ },
+ "287": {
+ "file_id": 21,
+ "content": "This code checks if the client is already connected and then attempts to open the port. If successful, it logs a message indicating the port has been opened. It also sets the baud rate and logs a success message if that's successful too. The code then enables all motors with True values for settings before enabling. Lastly, there is a function disconnect() which checks if the client is connected, and if so, it disconnects from the Dynamixel device.",
+ "type": "comment"
+ },
+ "288": {
+ "file_id": 21,
+ "content": " logging.error('Port handler in use; cannot disconnect.')\n return\n # Ensure motors are disabled at the end.\n self.set_torque_enabled(self.motor_ids, False, retries=0)\n self.port_handler.closePort()\n if self in self.OPEN_CLIENTS:\n self.OPEN_CLIENTS.remove(self)\n def set_torque_enabled(self,\n motor_ids: Sequence[int],\n enabled: bool,\n retries: int = -1,\n retry_interval: float = 0.25):\n \"\"\"Sets whether torque is enabled for the motors.\n Args:\n motor_ids: The motor IDs to configure.\n enabled: Whether to engage or disengage the motors.\n retries: The number of times to retry. If this is <0, will retry\n forever.\n retry_interval: The number of seconds to wait between retries.\n \"\"\"\n remaining_ids = list(motor_ids)\n while remaining_ids:\n remaining_ids = self.write_byte(",
+ "type": "code",
+ "location": "/dynamixel_client.py:172-196"
+ },
+ "289": {
+ "file_id": 21,
+ "content": "The code is disconnecting the port handler and ensuring motors are disabled. It removes the client from OPEN_CLIENTS, sets motor torque enabled or disabled, retries if necessary, and waits between retries for a specific duration.",
+ "type": "comment"
+ },
+ "290": {
+ "file_id": 21,
+ "content": " remaining_ids,\n int(enabled),\n ADDR_TORQUE_ENABLE,\n )\n if remaining_ids:\n logging.error('Could not set torque %s for IDs: %s',\n 'enabled' if enabled else 'disabled',\n str(remaining_ids))\n if retries == 0:\n break\n time.sleep(retry_interval)\n retries -= 1\n def read_pos_vel_cur(self) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._pos_vel_cur_reader.read()\n def read_pos(self) -> np.ndarray:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._pos_reader.read()\n def read_vel(self) -> np.ndarray:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._vel_reader.read()\n def read_cur(self) -> np.ndarray:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._cur_reader.read()",
+ "type": "code",
+ "location": "/dynamixel_client.py:197-221"
+ },
+ "291": {
+ "file_id": 21,
+ "content": "The code defines a function to set the torque of Dynamixel motors. It iterates over each ID and enables/disables the torque for them. If there are remaining unsuccessful IDs, it logs an error message. The code also includes methods to read positions, velocities, and currents from the motors. Each method uses a reader object to retrieve the data.",
+ "type": "comment"
+ },
+ "292": {
+ "file_id": 21,
+ "content": " def write_desired_pos(self, motor_ids: Sequence[int],\n positions: np.ndarray):\n \"\"\"Writes the given desired positions.\n Args:\n motor_ids: The motor IDs to write to.\n positions: The joint angles in radians to write.\n \"\"\"\n assert len(motor_ids) == len(positions)\n # Convert to Dynamixel position space.\n positions = positions / self._pos_vel_cur_reader.pos_scale\n self.sync_write(motor_ids, positions, ADDR_GOAL_POSITION,\n LEN_GOAL_POSITION)\n def write_byte(\n self,\n motor_ids: Sequence[int],\n value: int,\n address: int,\n ) -> Sequence[int]:\n \"\"\"Writes a value to the motors.\n Args:\n motor_ids: The motor IDs to write to.\n value: The value to write to the control table.\n address: The control table address to write to.\n Returns:\n A list of IDs that were unsuccessful.\n \"\"\"\n self.check_connected()",
+ "type": "code",
+ "location": "/dynamixel_client.py:223-254"
+ },
+ "293": {
+ "file_id": 21,
+ "content": "This code defines two functions, \"write_desired_pos\" and \"write_byte\". The first function writes the given desired positions to the specified motor IDs. It takes in a list of motor IDs and an array of joint angles, converts the angles to Dynamixel position space, then uses sync_write to write the positions to the motors' goal position address. The second function writes a value to the control table at a given address for specified motor IDs. It returns a list of unsuccessful IDs if any occur during writing.",
+ "type": "comment"
+ },
+ "294": {
+ "file_id": 21,
+ "content": " errored_ids = []\n for motor_id in motor_ids:\n comm_result, dxl_error = self.packet_handler.write1ByteTxRx(\n self.port_handler, motor_id, address, value)\n success = self.handle_packet_result(\n comm_result, dxl_error, motor_id, context='write_byte')\n if not success:\n errored_ids.append(motor_id)\n return errored_ids\n def sync_write(self, motor_ids: Sequence[int],\n values: Sequence[Union[int, float]], address: int,\n size: int):\n \"\"\"Writes values to a group of motors.\n Args:\n motor_ids: The motor IDs to write to.\n values: The values to write.\n address: The control table address to write to.\n size: The size of the control table value being written to.\n \"\"\"\n self.check_connected()\n key = (address, size)\n if key not in self._sync_writers:\n self._sync_writers[key] = self.dxl.GroupSyncWrite(",
+ "type": "code",
+ "location": "/dynamixel_client.py:255-279"
+ },
+ "295": {
+ "file_id": 21,
+ "content": "This code defines a function `sync_write` that takes motor IDs, values, address, and size as input to write the same value at the specified address for multiple motors. It first checks if the connection is established and then creates a key based on the address and size. If this key is not present in the internal dictionary `self._sync_writers`, it initializes a GroupSyncWrite operation with the given parameters. This function also returns an empty list of motor IDs that had errors during the write operation, which are stored in the variable `errored_ids` by checking if each write operation was successful or not.",
+ "type": "comment"
+ },
+ "296": {
+ "file_id": 21,
+ "content": " self.port_handler, self.packet_handler, address, size)\n sync_writer = self._sync_writers[key]\n errored_ids = []\n for motor_id, desired_pos in zip(motor_ids, values):\n value = signed_to_unsigned(int(desired_pos), size=size)\n value = value.to_bytes(size, byteorder='little')\n success = sync_writer.addParam(motor_id, value)\n if not success:\n errored_ids.append(motor_id)\n if errored_ids:\n logging.error('Sync write failed for: %s', str(errored_ids))\n comm_result = sync_writer.txPacket()\n self.handle_packet_result(comm_result, context='sync_write')\n sync_writer.clearParam()\n def check_connected(self):\n \"\"\"Ensures the robot is connected.\"\"\"\n if self.lazy_connect and not self.is_connected:\n self.connect()\n if not self.is_connected:\n raise OSError('Must call connect() first.')\n def handle_packet_result(self,\n comm_result: int,",
+ "type": "code",
+ "location": "/dynamixel_client.py:280-307"
+ },
+ "297": {
+ "file_id": 21,
+ "content": "The code snippet handles synchronous writes to multiple motors. It iterates over motor IDs and desired positions, converts them to the required format, adds them to the packet writer, logs any failures, sends the packet, clears the packet writer, and checks if the robot is connected.",
+ "type": "comment"
+ },
+ "298": {
+ "file_id": 21,
+ "content": " dxl_error: Optional[int] = None,\n dxl_id: Optional[int] = None,\n context: Optional[str] = None):\n \"\"\"Handles the result from a communication request.\"\"\"\n error_message = None\n if comm_result != self.dxl.COMM_SUCCESS:\n error_message = self.packet_handler.getTxRxResult(comm_result)\n elif dxl_error is not None:\n error_message = self.packet_handler.getRxPacketError(dxl_error)\n if error_message:\n if dxl_id is not None:\n error_message = '[Motor ID: {}] {}'.format(\n dxl_id, error_message)\n if context is not None:\n error_message = '> {}: {}'.format(context, error_message)\n logging.error(error_message)\n return False\n return True\n def convert_to_unsigned(self, value: int, size: int) -> int:\n \"\"\"Converts the given value to its unsigned representation.\"\"\"\n if value < 0:",
+ "type": "code",
+ "location": "/dynamixel_client.py:308-329"
+ },
+ "299": {
+ "file_id": 21,
+ "content": "This function handles communication results and checks for errors. It formats the error message with motor ID and context if provided, then logs the error and returns False. The convert_to_unsigned function converts a given value to its unsigned representation.",
+ "type": "comment"
+ }
+}
\ No newline at end of file
diff --git a/docs/data/3.json b/docs/data/3.json
new file mode 100644
index 00000000..6cfa663d
--- /dev/null
+++ b/docs/data/3.json
@@ -0,0 +1,550 @@
+{
+ "300": {
+ "file_id": 21,
+ "content": " max_value = (1 << (8 * size)) - 1\n value = max_value + value\n return value\n def __enter__(self):\n \"\"\"Enables use as a context manager.\"\"\"\n if not self.is_connected:\n self.connect()\n return self\n def __exit__(self, *args):\n \"\"\"Enables use as a context manager.\"\"\"\n self.disconnect()\n def __del__(self):\n \"\"\"Automatically disconnect on destruction.\"\"\"\n self.disconnect()\nclass DynamixelReader:\n \"\"\"Reads data from Dynamixel motors.\n This wraps a GroupBulkRead from the DynamixelSDK.\n \"\"\"\n def __init__(self, client: DynamixelClient, motor_ids: Sequence[int],\n address: int, size: int):\n \"\"\"Initializes a new reader.\"\"\"\n self.client = client\n self.motor_ids = motor_ids\n self.address = address\n self.size = size\n self._initialize_data()\n self.operation = self.client.dxl.GroupBulkRead(client.port_handler,\n client.packet_handler)",
+ "type": "code",
+ "location": "/dynamixel_client.py:330-365"
+ },
+ "301": {
+ "file_id": 21,
+ "content": "This code defines a DynamixelReader class for reading data from Dynamixel motors using GroupBulkRead from the DynamixelSDK. It also provides context management functionality with __enter__ and __exit__ methods, and automatically disconnects on destruction with __del__.",
+ "type": "comment"
+ },
+ "302": {
+ "file_id": 21,
+ "content": " for motor_id in motor_ids:\n success = self.operation.addParam(motor_id, address, size)\n if not success:\n raise OSError(\n '[Motor ID: {}] Could not add parameter to bulk read.'\n .format(motor_id))\n def read(self, retries: int = 1):\n \"\"\"Reads data from the motors.\"\"\"\n self.client.check_connected()\n success = False\n while not success and retries >= 0:\n comm_result = self.operation.txRxPacket()\n success = self.client.handle_packet_result(\n comm_result, context='read')\n retries -= 1\n # If we failed, send a copy of the previous data.\n if not success:\n return self._get_data()\n errored_ids = []\n for i, motor_id in enumerate(self.motor_ids):\n # Check if the data is available.\n available = self.operation.isAvailable(motor_id, self.address,\n self.size)",
+ "type": "code",
+ "location": "/dynamixel_client.py:367-392"
+ },
+ "303": {
+ "file_id": 21,
+ "content": "This code adds parameters to a bulk read operation for each motor ID, reads data from motors with retries in case of errors or disconnections, and returns previous data if the read fails.",
+ "type": "comment"
+ },
+ "304": {
+ "file_id": 21,
+ "content": " if not available:\n errored_ids.append(motor_id)\n continue\n self._update_data(i, motor_id)\n if errored_ids:\n logging.error('Bulk read data is unavailable for: %s',\n str(errored_ids))\n return self._get_data()\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n self._data[index] = self.operation.getData(motor_id, self.address,\n self.size)\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._data.copy()\nclass DynamixelPosVelCurReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,",
+ "type": "code",
+ "location": "/dynamixel_client.py:393-425"
+ },
+ "305": {
+ "file_id": 21,
+ "content": "This code is part of a Dynamixel client that communicates with a robot's servo motors to read position and velocity data. It initializes the cached data, updates the data for specific motor IDs, returns a copy of the data, and handles cases where data is unavailable.",
+ "type": "comment"
+ },
+ "306": {
+ "file_id": 21,
+ "content": " vel_scale: float = 1.0,\n cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,\n size=LEN_PRESENT_POS_VEL_CUR,\n )\n self.pos_scale = pos_scale\n self.vel_scale = vel_scale\n self.cur_scale = cur_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._pos_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n self._vel_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n self._cur_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n cur = self.operation.getData(motor_id, ADDR_PRESENT_CURRENT,\n LEN_PRESENT_CURRENT)\n vel = self.operation.getData(motor_id, ADDR_PRESENT_VELOCITY,\n LEN_PRESENT_VELOCITY)\n pos = self.operation.getData(motor_id, ADDR_PRESENT_POSITION,",
+ "type": "code",
+ "location": "/dynamixel_client.py:426-450"
+ },
+ "307": {
+ "file_id": 21,
+ "content": "This code defines a class for reading Dynamixel servo data. It takes in a client, motor IDs, and scales for position, velocity, and current. It initializes cached data arrays with zeros for each motor. The _update_data function reads and stores the current, velocity, and position data from the specified address for the given motor ID.",
+ "type": "comment"
+ },
+ "308": {
+ "file_id": 21,
+ "content": " LEN_PRESENT_POSITION)\n cur = unsigned_to_signed(cur, size=2)\n vel = unsigned_to_signed(vel, size=4)\n pos = unsigned_to_signed(pos, size=4)\n self._pos_data[index] = float(pos) * self.pos_scale\n self._vel_data[index] = float(vel) * self.vel_scale\n self._cur_data[index] = float(cur) * self.cur_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return (self._pos_data.copy(), self._vel_data.copy(),\n self._cur_data.copy())\nclass DynamixelPosReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,\n vel_scale: float = 1.0,\n cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,\n size=LEN_PRESENT_POS_VEL_CUR,\n )",
+ "type": "code",
+ "location": "/dynamixel_client.py:451-479"
+ },
+ "309": {
+ "file_id": 21,
+ "content": "The code defines a class `DynamixelPosReader` that inherits from `DynamixelReader` and reads positions and velocities of motors. It takes a client, motor IDs, and scaling factors for position, velocity, and current as parameters. The `__init__` method initializes the superclass with the address and size for reading present position, velocity, and current data. The `_get_data` method returns a copy of the stored position, velocity, and current data.",
+ "type": "comment"
+ },
+ "310": {
+ "file_id": 21,
+ "content": " self.pos_scale = pos_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._pos_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n pos = self.operation.getData(motor_id, ADDR_PRESENT_POSITION,\n LEN_PRESENT_POSITION)\n pos = unsigned_to_signed(pos, size=4)\n self._pos_data[index] = float(pos) * self.pos_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._pos_data.copy()\nclass DynamixelVelReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,\n vel_scale: float = 1.0,\n cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,",
+ "type": "code",
+ "location": "/dynamixel_client.py:480-509"
+ },
+ "311": {
+ "file_id": 21,
+ "content": "The code defines a class `DynamixelReader` that reads position and velocity data from Dynamixel motors. It initializes cached data, updates the data for a given motor ID, and returns a copy of the data. The `DynamixelVelReader` subclass extends this functionality to read positions, velocities, and currents.",
+ "type": "comment"
+ },
+ "312": {
+ "file_id": 21,
+ "content": " size=LEN_PRESENT_POS_VEL_CUR,\n )\n self.pos_scale = pos_scale\n self.vel_scale = vel_scale\n self.cur_scale = cur_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._vel_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n vel = self.operation.getData(motor_id, ADDR_PRESENT_VELOCITY,\n LEN_PRESENT_VELOCITY)\n vel = unsigned_to_signed(vel, size=4)\n self._vel_data[index] = float(vel) * self.vel_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._vel_data.copy()\nclass DynamixelCurReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,\n vel_scale: float = 1.0,",
+ "type": "code",
+ "location": "/dynamixel_client.py:510-538"
+ },
+ "313": {
+ "file_id": 21,
+ "content": "This code defines a class DynamixelCurReader that inherits from DynamixelReader and reads positions and velocities from dynamixel motors. The constructor takes in a client, motor IDs, optional position scale, and optional velocity scale. It initializes cached data and sets the position and velocity scales. The _initialize_data method initializes the velocity data with zeros. The _update_data method updates the data index for the given motor ID by getting the velocity from the DynamixelClient, converting it to a signed integer, scaling it by the velocity scale, and storing it in the velocity data. The _get_data method returns a copy of the velocity data.",
+ "type": "comment"
+ },
+ "314": {
+ "file_id": 21,
+ "content": " cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,\n size=LEN_PRESENT_POS_VEL_CUR,\n )\n self.cur_scale = cur_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._cur_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n cur = self.operation.getData(motor_id, ADDR_PRESENT_CURRENT,\n LEN_PRESENT_CURRENT)\n cur = unsigned_to_signed(cur, size=2)\n self._cur_data[index] = float(cur) * self.cur_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._cur_data.copy()\n# Register global cleanup function.\natexit.register(dynamixel_cleanup_handler)\nif __name__ == '__main__':\n import argparse\n import itertools\n parser = argparse.ArgumentParser()",
+ "type": "code",
+ "location": "/dynamixel_client.py:539-571"
+ },
+ "315": {
+ "file_id": 21,
+ "content": "The code defines a class for reading the present current values from Dynamixel motors. It initializes data and updates data index for the given motor ID. The function returns a copy of the data. Global cleanup function is registered for atexit module to handle clean-up operations upon program termination.",
+ "type": "comment"
+ },
+ "316": {
+ "file_id": 21,
+ "content": " parser.add_argument(\n '-m',\n '--motors',\n required=True,\n help='Comma-separated list of motor IDs.')\n parser.add_argument(\n '-d',\n '--device',\n default='/dev/ttyUSB0',\n help='The Dynamixel device to connect to.')\n parser.add_argument(\n '-b', '--baud', default=1000000, help='The baudrate to connect with.')\n parsed_args = parser.parse_args()\n motors = [int(motor) for motor in parsed_args.motors.split(',')]\n way_points = [np.zeros(len(motors)), np.full(len(motors), np.pi)]\n with DynamixelClient(motors, parsed_args.device,\n parsed_args.baud) as dxl_client:\n for step in itertools.count():\n if step > 0 and step % 50 == 0:\n way_point = way_points[(step // 100) % len(way_points)]\n print('Writing: {}'.format(way_point.tolist()))\n dxl_client.write_desired_pos(motors, way_point)\n read_start = time.time()\n pos_now, vel_now, cur_now = dxl_client.read_pos_vel_cur()",
+ "type": "code",
+ "location": "/dynamixel_client.py:572-598"
+ },
+ "317": {
+ "file_id": 21,
+ "content": "The code defines command-line arguments for motor IDs, device, and baudrate. It then parses these arguments into a list of motors, and creates waypoints for motion control using numpy arrays. The DynamixelClient class is instantiated with the parsed arguments, and in an infinite loop, writes waypoint positions to motors and reads current position, velocity, and current values from the device at regular intervals.",
+ "type": "comment"
+ },
+ "318": {
+ "file_id": 21,
+ "content": " if step % 5 == 0:\n print('[{}] Frequency: {:.2f} Hz'.format(\n step, 1.0 / (time.time() - read_start)))\n print('> Pos: {}'.format(pos_now.tolist()))\n print('> Vel: {}'.format(vel_now.tolist()))\n print('> Cur: {}'.format(cur_now.tolist()))",
+ "type": "code",
+ "location": "/dynamixel_client.py:599-604"
+ },
+ "319": {
+ "file_id": 21,
+ "content": "This code block prints the frequency, positions, velocities, and currents of the dynamixel servos every 5 steps in the loop.",
+ "type": "comment"
+ },
+ "320": {
+ "file_id": 22,
+ "content": "/ee_sim_env.py",
+ "type": "filepath"
+ },
+ "321": {
+ "file_id": 22,
+ "content": "The code creates a function for a bi-manual robot environment, initializes tasks and robots, sets rewards, uses physics simulation, and derives the \"InsertionEETask\" class. It assigns fixed rewards of 4 to contact scenarios in peg insertion tasks.",
+ "type": "summary"
+ },
+ "322": {
+ "file_id": 22,
+ "content": "import numpy as np\nimport collections\nimport os\nfrom constants import DT, XML_DIR, START_ARM_POSE\nfrom constants import PUPPET_GRIPPER_POSITION_CLOSE\nfrom constants import PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN\nfrom utils import sample_box_pose, sample_insertion_pose\nfrom dm_control import mujoco\nfrom dm_control.rl import control\nfrom dm_control.suite import base\nimport IPython\ne = IPython.embed\ndef make_ee_sim_env(task_name):\n \"\"\"\n Environment for simulated robot bi-manual manipulation, with end-effector control.\n Action space: [left_arm_pose (7), # position and quaternion for end effector\n left_gripper_positions (1), # normalized gripper position (0: close, 1: open)\n right_arm_pose (7), # position and quaternion for end effector\n right_gripper_positions (1),] # normalized gripper position (0: close, 1: open)",
+ "type": "code",
+ "location": "/ee_sim_env.py:1-26"
+ },
+ "323": {
+ "file_id": 22,
+ "content": "The code imports necessary libraries and defines a function `make_ee_sim_env(task_name)` that creates an environment for simulated robot bi-manual manipulation with end-effector control. The action space includes left and right arm pose, along with gripper positions for both arms.",
+ "type": "comment"
+ },
+ "324": {
+ "file_id": 22,
+ "content": " Observation space: {\"qpos\": Concat[ left_arm_qpos (6), # absolute joint position\n left_gripper_position (1), # normalized gripper position (0: close, 1: open)\n right_arm_qpos (6), # absolute joint position\n right_gripper_qpos (1)] # normalized gripper position (0: close, 1: open)\n \"qvel\": Concat[ left_arm_qvel (6), # absolute joint velocity (rad)\n left_gripper_velocity (1), # normalized gripper velocity (pos: opening, neg: closing)\n right_arm_qvel (6), # absolute joint velocity (rad)\n right_gripper_qvel (1)] # normalized gripper velocity (pos: opening, neg: closing)\n \"images\": {\"main\": (480x640x3)} # h, w, c, dtype='uint8'\n \"\"\"\n if 'sim_transfer_cube' in task_name:",
+ "type": "code",
+ "location": "/ee_sim_env.py:28-38"
+ },
+ "325": {
+ "file_id": 22,
+ "content": "The code defines the observation space for a simulation environment, including absolute joint positions and velocities for both left and right arms, gripper positions and velocities, and image data from a camera. This is likely used in a robotics control algorithm or reinforcement learning task. If \"sim_transfer_cube\" is in the task name, it suggests that the simulation involves transferring an object (possibly a cube) between the left and right arms.",
+ "type": "comment"
+ },
+ "326": {
+ "file_id": 22,
+ "content": " xml_path = os.path.join(XML_DIR, f'bimanual_viperx_ee_transfer_cube.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = TransferCubeEETask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n elif 'sim_insertion' in task_name:\n xml_path = os.path.join(XML_DIR, f'bimanual_viperx_ee_insertion.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = InsertionEETask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n else:\n raise NotImplementedError\n return env\nclass BimanualViperXEETask(base.Task):\n def __init__(self, random=None):\n super().__init__(random=random)\n def before_step(self, action, physics):\n a_len = len(action) // 2\n action_left = action[:a_len]\n action_right = action[a_len:]",
+ "type": "code",
+ "location": "/ee_sim_env.py:39-61"
+ },
+ "327": {
+ "file_id": 22,
+ "content": "This code initializes an environment for a bimanual ViperX EE task, possibly either cube transfer or insertion. It joins the XML file path with the directory and loads the physics from the XML file. Then, it instantiates the specific task (TransferCubeEETask or InsertionEETask) based on the task name. Finally, it creates an environment object using the physics and task, setting the time limit, control timestep, and other options. If no matching task name is found, it raises a NotImplementedError. The BimanualViperXEETask class initializes the base task with an optional random parameter.",
+ "type": "comment"
+ },
+ "328": {
+ "file_id": 22,
+ "content": " # set mocap position and quat\n # left\n np.copyto(physics.data.mocap_pos[0], action_left[:3])\n np.copyto(physics.data.mocap_quat[0], action_left[3:7])\n # right\n np.copyto(physics.data.mocap_pos[1], action_right[:3])\n np.copyto(physics.data.mocap_quat[1], action_right[3:7])\n # set gripper\n g_left_ctrl = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(action_left[7])\n g_right_ctrl = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(action_right[7])\n np.copyto(physics.data.ctrl, np.array([g_left_ctrl, -g_left_ctrl, g_right_ctrl, -g_right_ctrl]))\n def initialize_robots(self, physics):\n # reset joint position\n physics.named.data.qpos[:16] = START_ARM_POSE\n # reset mocap to align with end effector\n # to obtain these numbers:\n # (1) make an ee_sim env and reset to the same start_pose\n # (2) get env._physics.named.data.xpos['vx300s_left/gripper_link']\n # get env._physics.named.data.xquat['vx300s_left/gripper_link']",
+ "type": "code",
+ "location": "/ee_sim_env.py:63-84"
+ },
+ "329": {
+ "file_id": 22,
+ "content": "This code initializes robots in the environment by resetting joint positions and setting mocap (motion capture) position and quaternion for left and right arms. It also sets gripper control values using a provided function, ensuring proper alignment between end effector and mocap data.",
+ "type": "comment"
+ },
+ "330": {
+ "file_id": 22,
+ "content": " # repeat the same for right side\n np.copyto(physics.data.mocap_pos[0], [-0.31718881+0.1, 0.5, 0.29525084])\n np.copyto(physics.data.mocap_quat[0], [1, 0, 0, 0])\n # right\n np.copyto(physics.data.mocap_pos[1], np.array([0.31718881-0.1, 0.49999888, 0.29525084]))\n np.copyto(physics.data.mocap_quat[1], [1, 0, 0, 0])\n # reset gripper control\n close_gripper_control = np.array([\n PUPPET_GRIPPER_POSITION_CLOSE,\n -PUPPET_GRIPPER_POSITION_CLOSE,\n PUPPET_GRIPPER_POSITION_CLOSE,\n -PUPPET_GRIPPER_POSITION_CLOSE,\n ])\n np.copyto(physics.data.ctrl, close_gripper_control)\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n super().initialize_episode(physics)\n @staticmethod\n def get_qpos(physics):\n qpos_raw = physics.data.qpos.copy()\n left_qpos_raw = qpos_raw[:8]\n right_qpos_raw = qpos_raw[8:16]\n left_arm_qpos = left_qpos_raw[:6]",
+ "type": "code",
+ "location": "/ee_sim_env.py:85-110"
+ },
+ "331": {
+ "file_id": 22,
+ "content": "This code segment sets the initial positions, orientations, and gripper control for both left and right sides of a simulated robot arm. It also defines an initialize_episode function and a get_qpos static method in a class inheriting from an unspecified base class. The left and right positions are set using numpy's copyto() function, and the gripper control is initialized to close position.",
+ "type": "comment"
+ },
+ "332": {
+ "file_id": 22,
+ "content": " right_arm_qpos = right_qpos_raw[:6]\n left_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(left_qpos_raw[6])]\n right_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(right_qpos_raw[6])]\n return np.concatenate([left_arm_qpos, left_gripper_qpos, right_arm_qpos, right_gripper_qpos])\n @staticmethod\n def get_qvel(physics):\n qvel_raw = physics.data.qvel.copy()\n left_qvel_raw = qvel_raw[:8]\n right_qvel_raw = qvel_raw[8:16]\n left_arm_qvel = left_qvel_raw[:6]\n right_arm_qvel = right_qvel_raw[:6]\n left_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(left_qvel_raw[6])]\n right_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(right_qvel_raw[6])]\n return np.concatenate([left_arm_qvel, left_gripper_qvel, right_arm_qvel, right_gripper_qvel])\n @staticmethod\n def get_env_state(physics):\n raise NotImplementedError\n def get_observation(self, physics):\n # note: it is important to do .copy()\n obs = collections.OrderedDict()",
+ "type": "code",
+ "location": "/ee_sim_env.py:111-133"
+ },
+ "333": {
+ "file_id": 22,
+ "content": "The code defines functions to extract joint positions, velocities, and environment state from physics data. It normalizes gripper position and velocity values using the respective PUPPET_*_NORMALIZE_FN functions. The get_observation function combines left and right arm joint positions, gripper positions, and velocities into a concatenated numpy array. The code also includes an unimplemented get_env_state method.",
+ "type": "comment"
+ },
+ "334": {
+ "file_id": 22,
+ "content": " obs['qpos'] = self.get_qpos(physics)\n obs['qvel'] = self.get_qvel(physics)\n obs['env_state'] = self.get_env_state(physics)\n obs['images'] = dict()\n obs['images']['top'] = physics.render(height=480, width=640, camera_id='top')\n # obs['images']['angle'] = physics.render(height=480, width=640, camera_id='angle')\n # obs['images']['vis'] = physics.render(height=480, width=640, camera_id='front_close')\n # used in scripted policy to obtain starting pose\n obs['mocap_pose_left'] = np.concatenate([physics.data.mocap_pos[0], physics.data.mocap_quat[0]]).copy()\n obs['mocap_pose_right'] = np.concatenate([physics.data.mocap_pos[1], physics.data.mocap_quat[1]]).copy()\n # used when replaying joint trajectory\n obs['gripper_ctrl'] = physics.data.ctrl.copy()\n return obs\n def get_reward(self, physics):\n raise NotImplementedError\nclass TransferCubeEETask(BimanualViperXEETask):\n def __init__(self, random=None):\n super().__init__(random=random)",
+ "type": "code",
+ "location": "/ee_sim_env.py:134-155"
+ },
+ "335": {
+ "file_id": 22,
+ "content": "This code defines a class for an environment in which a robot arm needs to manipulate a cube. The environment is initialized and returns observation (obs) containing information about the state of the robot, images from different camera perspectives, starting pose of the left and right mocap hands, and gripper control data. It also defines a reward function that needs to be implemented for specific tasks within this environment. This class inherits from BimanualViperXEETask which is likely another class for similar environments.",
+ "type": "comment"
+ },
+ "336": {
+ "file_id": 22,
+ "content": " self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n self.initialize_robots(physics)\n # randomize box position\n cube_pose = sample_box_pose()\n box_start_idx = physics.model.name2id('red_box_joint', 'joint')\n np.copyto(physics.data.qpos[box_start_idx : box_start_idx + 7], cube_pose)\n # print(f\"randomized cube position to {cube_position}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether left gripper is holding the box\n all_contact_pairs = []\n for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')",
+ "type": "code",
+ "location": "/ee_sim_env.py:156-181"
+ },
+ "337": {
+ "file_id": 22,
+ "content": "The code initializes the environment for each episode, randomizes the box position, and defines methods to get the environment state and reward in a physics simulation. The maximum reward is set to 4.",
+ "type": "comment"
+ },
+ "338": {
+ "file_id": 22,
+ "content": " contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)\n touch_left_gripper = (\"red_box\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n touch_right_gripper = (\"red_box\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs\n touch_table = (\"red_box\", \"table\") in all_contact_pairs\n reward = 0\n if touch_right_gripper:\n reward = 1\n if touch_right_gripper and not touch_table: # lifted\n reward = 2\n if touch_left_gripper: # attempted transfer\n reward = 3\n if touch_left_gripper and not touch_table: # successful transfer\n reward = 4\n return reward\nclass InsertionEETask(BimanualViperXEETask):\n def __init__(self, random=None):\n super().__init__(random=random)\n self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n self.initialize_robots(physics)",
+ "type": "code",
+ "location": "/ee_sim_env.py:182-208"
+ },
+ "339": {
+ "file_id": 22,
+ "content": "The code defines a class called \"InsertionEETask\" which inherits from the \"BimanualViperXEETask\". This task seems to be related to manipulating objects in a simulation environment. It initializes the state of the environment at the start of each episode by calling the \"initialize_robots()\" function. The code checks for different contact scenarios and assigns corresponding rewards, ranging from 0 to 4. The maximum reward is set to 4.",
+ "type": "comment"
+ },
+ "340": {
+ "file_id": 22,
+ "content": " # randomize peg and socket position\n peg_pose, socket_pose = sample_insertion_pose()\n id2index = lambda j_id: 16 + (j_id - 16) * 7 # first 16 is robot qpos, 7 is pose dim # hacky\n peg_start_id = physics.model.name2id('red_peg_joint', 'joint')\n peg_start_idx = id2index(peg_start_id)\n np.copyto(physics.data.qpos[peg_start_idx : peg_start_idx + 7], peg_pose)\n # print(f\"randomized cube position to {cube_position}\")\n socket_start_id = physics.model.name2id('blue_socket_joint', 'joint')\n socket_start_idx = id2index(socket_start_id)\n np.copyto(physics.data.qpos[socket_start_idx : socket_start_idx + 7], socket_pose)\n # print(f\"randomized cube position to {cube_position}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether peg touches the pin\n all_contact_pairs = []",
+ "type": "code",
+ "location": "/ee_sim_env.py:209-232"
+ },
+ "341": {
+ "file_id": 22,
+ "content": "This code initializes the episode by randomizing the peg and socket positions in a physics simulation. It converts joint IDs to indices, sets the new positions for the peg and socket using numpy copyto function, and calls the superclass' initialize_episode method. It also includes a get_env_state function which returns the environment state from the physics data qpos array excluding the first 16 elements (robot qpos), and a placeholder get_reward function that will return whether the peg touches the pin in all contact pairs.",
+ "type": "comment"
+ },
+ "342": {
+ "file_id": 22,
+ "content": " for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')\n contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)\n touch_right_gripper = (\"red_peg\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs\n touch_left_gripper = (\"socket-1\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-2\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-3\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-4\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n peg_touch_table = (\"red_peg\", \"table\") in all_contact_pairs\n socket_touch_table = (\"socket-1\", \"table\") in all_contact_pairs or \\",
+ "type": "code",
+ "location": "/ee_sim_env.py:233-248"
+ },
+ "343": {
+ "file_id": 22,
+ "content": "This code checks for contact between various objects in a physics simulation. It iterates through all contacts, retrieves the associated geometries and converts their IDs to names. Then, it identifies if a red peg is touching the right gripper, and checks multiple conditions for left gripper and socket-peg interactions with the table.",
+ "type": "comment"
+ },
+ "344": {
+ "file_id": 22,
+ "content": " (\"socket-2\", \"table\") in all_contact_pairs or \\\n (\"socket-3\", \"table\") in all_contact_pairs or \\\n (\"socket-4\", \"table\") in all_contact_pairs\n peg_touch_socket = (\"red_peg\", \"socket-1\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-2\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-3\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-4\") in all_contact_pairs\n pin_touched = (\"red_peg\", \"pin\") in all_contact_pairs\n reward = 0\n if touch_left_gripper and touch_right_gripper: # touch both\n reward = 1\n if touch_left_gripper and touch_right_gripper and (not peg_touch_table) and (not socket_touch_table): # grasp both\n reward = 2\n if peg_touch_socket and (not peg_touch_table) and (not socket_touch_table): # peg and socket touching\n reward = 3\n if pin_touched: # successful insertion",
+ "type": "code",
+ "location": "/ee_sim_env.py:249-265"
+ },
+ "345": {
+ "file_id": 22,
+ "content": "This code determines the reward based on contact pairs. It checks for touching \"socket-1\" to \"table\", \"socket-2\" to \"table\", etc. It also checks if any of the pegs are touching a socket, table or both, and if the red peg is touching the pin. The reward is given based on these conditions. If both gripper touch something, it gives a reward of 1. If both gripper touches nothing but grasp something, reward is 2. If peg touches socket but not table, reward is 3. Finally, if any peg touches the pin, it's considered as successful insertion.",
+ "type": "comment"
+ },
+ "346": {
+ "file_id": 22,
+ "content": " reward = 4\n return reward",
+ "type": "code",
+ "location": "/ee_sim_env.py:266-267"
+ },
+ "347": {
+ "file_id": 22,
+ "content": "This code snippet assigns a fixed reward value of 4 and then returns it. This suggests the reward is determined solely by this function without any external factors influencing it.",
+ "type": "comment"
+ },
+ "348": {
+ "file_id": 23,
+ "content": "/imitate_episodes.py",
+ "type": "filepath"
+ },
+ "349": {
+ "file_id": 23,
+ "content": "This program trains a policy network for robot control using reinforcement learning, VQ-VAE implementation, and behavioral cloning, while logging data, saving checkpoints, and validating performance.",
+ "type": "summary"
+ },
+ "350": {
+ "file_id": 23,
+ "content": "import torch\nimport numpy as np\nimport os\nimport pickle\nimport argparse\nimport matplotlib.pyplot as plt\nfrom copy import deepcopy\nfrom itertools import repeat\nfrom tqdm import tqdm\nfrom einops import rearrange\nimport wandb\nimport time\nfrom torchvision import transforms\nfrom constants import FPS\nfrom constants import PUPPET_GRIPPER_JOINT_OPEN\nfrom utils import load_data # data functions\nfrom utils import sample_box_pose, sample_insertion_pose # robot functions\nfrom utils import compute_dict_mean, set_seed, detach_dict, calibrate_linear_vel, postprocess_base_action # helper functions\nfrom policy import ACTPolicy, CNNMLPPolicy, DiffusionPolicy\nfrom visualize_episodes import save_videos\nfrom detr.models.latent_model import Latent_Model_Transformer\nfrom sim_env import BOX_POSE\nimport IPython\ne = IPython.embed\ndef get_auto_index(dataset_dir):\n max_idx = 1000\n for i in range(max_idx+1):\n if not os.path.isfile(os.path.join(dataset_dir, f'qpos_{i}.npy')):\n return i\n raise Exception(f\"Error getting auto index, or more than {max_idx} episodes\")",
+ "type": "code",
+ "location": "/imitate_episodes.py:1-35"
+ },
+ "351": {
+ "file_id": 23,
+ "content": "This code imports necessary libraries and defines functions for a reinforcement learning task. It sets up the environment, loads data, and initializes policy models. The `get_auto_index` function is used to find the next available index in the dataset directory.",
+ "type": "comment"
+ },
+ "352": {
+ "file_id": 23,
+ "content": "def main(args):\n set_seed(1)\n # command line parameters\n is_eval = args['eval']\n ckpt_dir = args['ckpt_dir']\n policy_class = args['policy_class']\n onscreen_render = args['onscreen_render']\n task_name = args['task_name']\n batch_size_train = args['batch_size']\n batch_size_val = args['batch_size']\n num_steps = args['num_steps']\n eval_every = args['eval_every']\n validate_every = args['validate_every']\n save_every = args['save_every']\n resume_ckpt_path = args['resume_ckpt_path']\n # get task parameters\n is_sim = task_name[:4] == 'sim_'\n if is_sim or task_name == 'all':\n from constants import SIM_TASK_CONFIGS\n task_config = SIM_TASK_CONFIGS[task_name]\n else:\n from aloha_scripts.constants import TASK_CONFIGS\n task_config = TASK_CONFIGS[task_name]\n dataset_dir = task_config['dataset_dir']\n # num_episodes = task_config['num_episodes']\n episode_len = task_config['episode_len']\n camera_names = task_config['camera_names']\n stats_dir = task_config.get('stats_dir', None)",
+ "type": "code",
+ "location": "/imitate_episodes.py:37-65"
+ },
+ "353": {
+ "file_id": 23,
+ "content": "The code defines a main function that takes command line arguments and uses them to set up the environment for running the simulation. It first sets the seed, then parses various parameters such as is_eval, ckpt_dir, policy_class, onscreen_render, task_name, batch_size_train, batch_size_val, num_steps, eval_every, validate_every, save_every, and resume_ckpt_path. It also determines if the task is simulation-based or not, then retrieves the task parameters from either SIM_TASK_CONFIGS or TASK_CONFIGS based on the task name. These parameters include dataset_dir, episode_len, camera_names, and stats_dir.",
+ "type": "comment"
+ },
+ "354": {
+ "file_id": 23,
+ "content": " sample_weights = task_config.get('sample_weights', None)\n train_ratio = task_config.get('train_ratio', 0.99)\n name_filter = task_config.get('name_filter', lambda n: True)\n # fixed parameters\n state_dim = 14\n lr_backbone = 1e-5\n backbone = 'resnet18'\n if policy_class == 'ACT':\n enc_layers = 4\n dec_layers = 7\n nheads = 8\n policy_config = {'lr': args['lr'],\n 'num_queries': args['chunk_size'],\n 'kl_weight': args['kl_weight'],\n 'hidden_dim': args['hidden_dim'],\n 'dim_feedforward': args['dim_feedforward'],\n 'lr_backbone': lr_backbone,\n 'backbone': backbone,\n 'enc_layers': enc_layers,\n 'dec_layers': dec_layers,\n 'nheads': nheads,\n 'camera_names': camera_names,\n 'vq': args['use_vq'],\n 'vq_class': args['vq_class'],",
+ "type": "code",
+ "location": "/imitate_episodes.py:66-90"
+ },
+ "355": {
+ "file_id": 23,
+ "content": "This code sets various fixed parameters for the ACT policy. It gets the sample weights, train ratio, and name filter from the task configuration. The state dimension is set to 14. Backbone learning rate is set to 1e-5 with a predefined backbone model. If the policy class is ACT, it further defines encoder layers, decoder layers, number of attention heads, and other configurations for the policy based on provided arguments. Camera names are also defined if needed. It also handles whether or not to use VQ (if specified by args).",
+ "type": "comment"
+ },
+ "356": {
+ "file_id": 23,
+ "content": " 'vq_dim': args['vq_dim'],\n 'action_dim': 16,\n 'no_encoder': args['no_encoder'],\n }\n elif policy_class == 'Diffusion':\n policy_config = {'lr': args['lr'],\n 'camera_names': camera_names,\n 'action_dim': 16,\n 'observation_horizon': 1,\n 'action_horizon': 8,\n 'prediction_horizon': args['chunk_size'],\n 'num_queries': args['chunk_size'],\n 'num_inference_timesteps': 10,\n 'ema_power': 0.75,\n 'vq': False,\n }\n elif policy_class == 'CNNMLP':\n policy_config = {'lr': args['lr'], 'lr_backbone': lr_backbone, 'backbone' : backbone, 'num_queries': 1,\n 'camera_names': camera_names,}\n else:\n raise NotImplementedError\n actuator_config = {\n 'actuator_network_dir': args['actuator_network_dir'],",
+ "type": "code",
+ "location": "/imitate_episodes.py:91-115"
+ },
+ "357": {
+ "file_id": 23,
+ "content": "This code is setting up different configurations for the policy based on the given policy_class. The 'AuxCritic' configuration includes an auxiliary critic, 'Diffusion' uses diffusion-based policy, and 'CNNMLP' uses a CNN and MLP-based policy. All configurations include learning rate (lr), camera names, and actuator network directory settings.",
+ "type": "comment"
+ },
+ "358": {
+ "file_id": 23,
+ "content": " 'history_len': args['history_len'],\n 'future_len': args['future_len'],\n 'prediction_len': args['prediction_len'],\n }\n config = {\n 'num_steps': num_steps,\n 'eval_every': eval_every,\n 'validate_every': validate_every,\n 'save_every': save_every,\n 'ckpt_dir': ckpt_dir,\n 'resume_ckpt_path': resume_ckpt_path,\n 'episode_len': episode_len,\n 'state_dim': state_dim,\n 'lr': args['lr'],\n 'policy_class': policy_class,\n 'onscreen_render': onscreen_render,\n 'policy_config': policy_config,\n 'task_name': task_name,\n 'seed': args['seed'],\n 'temporal_agg': args['temporal_agg'],\n 'camera_names': camera_names,\n 'real_robot': not is_sim,\n 'load_pretrain': args['load_pretrain'],\n 'actuator_config': actuator_config,\n }\n if not os.path.isdir(ckpt_dir):\n os.makedirs(ckpt_dir)\n config_path = os.path.join(ckpt_dir, 'config.pkl')\n expr_name = ckpt_dir.split('/')[-1]",
+ "type": "code",
+ "location": "/imitate_episodes.py:116-146"
+ },
+ "359": {
+ "file_id": 23,
+ "content": "The code is defining and initializing two dictionaries: 'train_args' and 'config'. These dictionaries store various arguments for the training process. The code also checks if a directory exists and creates it if not, and stores configuration information in a file named 'config.pkl' within that directory. This information will likely be used to train an agent for a specific task or environment.",
+ "type": "comment"
+ },
+ "360": {
+ "file_id": 23,
+ "content": " if not is_eval:\n wandb.init(project=\"mobile-aloha2\", reinit=True, entity=\"mobile-aloha2\", name=expr_name)\n wandb.config.update(config)\n with open(config_path, 'wb') as f:\n pickle.dump(config, f)\n if is_eval:\n ckpt_names = [f'policy_last.ckpt']\n results = []\n for ckpt_name in ckpt_names:\n success_rate, avg_return = eval_bc(config, ckpt_name, save_episode=True, num_rollouts=10)\n # wandb.log({'success_rate': success_rate, 'avg_return': avg_return})\n results.append([ckpt_name, success_rate, avg_return])\n for ckpt_name, success_rate, avg_return in results:\n print(f'{ckpt_name}: {success_rate=} {avg_return=}')\n print()\n exit()\n train_dataloader, val_dataloader, stats, _ = load_data(dataset_dir, name_filter, camera_names, batch_size_train, batch_size_val, args['chunk_size'], args['skip_mirrored_data'], config['load_pretrain'], policy_class, stats_dir_l=stats_dir, sample_weights=sample_weights, train_ratio=train_ratio)",
+ "type": "code",
+ "location": "/imitate_episodes.py:147-165"
+ },
+ "361": {
+ "file_id": 23,
+ "content": "The code initializes the WandB for evaluation, updates the config file if not in evaluation mode, and then evaluates different checkpoints. It logs success rate and average return for each checkpoint, prints them on console, and exits the program. If in training mode, it loads data, creates dataloaders, and returns necessary objects.",
+ "type": "comment"
+ },
+ "362": {
+ "file_id": 23,
+ "content": " # save dataset stats\n stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n with open(stats_path, 'wb') as f:\n pickle.dump(stats, f)\n best_ckpt_info = train_bc(train_dataloader, val_dataloader, config)\n best_step, min_val_loss, best_state_dict = best_ckpt_info\n # save best checkpoint\n ckpt_path = os.path.join(ckpt_dir, f'policy_best.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Best ckpt, val loss {min_val_loss:.6f} @ step{best_step}')\n wandb.finish()\ndef make_policy(policy_class, policy_config):\n if policy_class == 'ACT':\n policy = ACTPolicy(policy_config)\n elif policy_class == 'CNNMLP':\n policy = CNNMLPPolicy(policy_config)\n elif policy_class == 'Diffusion':\n policy = DiffusionPolicy(policy_config)\n else:\n raise NotImplementedError\n return policy\ndef make_optimizer(policy_class, policy):\n if policy_class == 'ACT':\n optimizer = policy.configure_optimizers()\n elif policy_class == 'CNNMLP':\n optimizer = policy.configure_optimizers()",
+ "type": "code",
+ "location": "/imitate_episodes.py:167-198"
+ },
+ "363": {
+ "file_id": 23,
+ "content": "This code saves dataset statistics, trains a behavioral cloning model, and saves the best checkpoint. It also creates a policy object based on the policy class and configures an optimizer for it.",
+ "type": "comment"
+ },
+ "364": {
+ "file_id": 23,
+ "content": " elif policy_class == 'Diffusion':\n optimizer = policy.configure_optimizers()\n else:\n raise NotImplementedError\n return optimizer\ndef get_image(ts, camera_names, rand_crop_resize=False):\n curr_images = []\n for cam_name in camera_names:\n curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')\n curr_images.append(curr_image)\n curr_image = np.stack(curr_images, axis=0)\n curr_image = torch.from_numpy(curr_image / 255.0).float().cuda().unsqueeze(0)\n if rand_crop_resize:\n print('rand crop resize is used!')\n original_size = curr_image.shape[-2:]\n ratio = 0.95\n curr_image = curr_image[..., int(original_size[0] * (1 - ratio) / 2): int(original_size[0] * (1 + ratio) / 2),\n int(original_size[1] * (1 - ratio) / 2): int(original_size[1] * (1 + ratio) / 2)]\n curr_image = curr_image.squeeze(0)\n resize_transform = transforms.Resize(original_size, antialias=True)\n curr_image = resize_transform(curr_image)",
+ "type": "code",
+ "location": "/imitate_episodes.py:199-222"
+ },
+ "365": {
+ "file_id": 23,
+ "content": "This code snippet checks the policy class and configures the optimizer accordingly. If the policy class is 'Diffusion', it sets the optimizer using the policy's method. For any other policy class, a NotImplementedError is raised. The get_image function takes timestep (ts), camera names, and rand_crop_resize flag as input. It retrieves images from ts observation and reshapes them into a tensor for further processing. If rand_crop_resize is True, it randomly crops and resizes the image while maintaining aspect ratio.",
+ "type": "comment"
+ },
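For concreteness, here is a minimal sketch of that crop-and-resize step, assuming a (K, C, H, W) float tensor in [0, 1]; the 0.95 ratio matches the source:

```python
# Sketch only: keep the central 95% of each side, then resize back to the original size.
import torch
from torchvision import transforms

def center_crop_resize(image: torch.Tensor, ratio: float = 0.95) -> torch.Tensor:
    h, w = image.shape[-2:]
    cropped = image[..., int(h * (1 - ratio) / 2): int(h * (1 + ratio) / 2),
                         int(w * (1 - ratio) / 2): int(w * (1 + ratio) / 2)]
    return transforms.Resize((h, w), antialias=True)(cropped)
```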
+ "366": {
+ "file_id": 23,
+ "content": " curr_image = curr_image.unsqueeze(0)\n return curr_image\ndef eval_bc(config, ckpt_name, save_episode=True, num_rollouts=50):\n set_seed(1000)\n ckpt_dir = config['ckpt_dir']\n state_dim = config['state_dim']\n real_robot = config['real_robot']\n policy_class = config['policy_class']\n onscreen_render = config['onscreen_render']\n policy_config = config['policy_config']\n camera_names = config['camera_names']\n max_timesteps = config['episode_len']\n task_name = config['task_name']\n temporal_agg = config['temporal_agg']\n onscreen_cam = 'angle'\n vq = config['policy_config']['vq']\n actuator_config = config['actuator_config']\n use_actuator_net = actuator_config['actuator_network_dir'] is not None\n # load policy and stats\n ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n policy = make_policy(policy_class, policy_config)\n loading_status = policy.deserialize(torch.load(ckpt_path))\n print(loading_status)\n policy.cuda()\n policy.eval()\n if vq:\n vq_dim = config['policy_config']['vq_dim']",
+ "type": "code",
+ "location": "/imitate_episodes.py:223-253"
+ },
+ "367": {
+ "file_id": 23,
+ "content": "The code snippet loads a policy model from a checkpoint file and sets the model to evaluation mode. It also initializes variables related to the task, such as state dimensions and camera names. The policy is created using a specified class and configuration, and if the policy uses a VQ-VAE, it initializes the corresponding dimensions.",
+ "type": "comment"
+ },
+ "368": {
+ "file_id": 23,
+ "content": " vq_class = config['policy_config']['vq_class']\n latent_model = Latent_Model_Transformer(vq_dim, vq_dim, vq_class)\n latent_model_ckpt_path = os.path.join(ckpt_dir, 'latent_model_last.ckpt')\n latent_model.deserialize(torch.load(latent_model_ckpt_path))\n latent_model.eval()\n latent_model.cuda()\n print(f'Loaded policy from: {ckpt_path}, latent model from: {latent_model_ckpt_path}')\n else:\n print(f'Loaded: {ckpt_path}')\n stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n with open(stats_path, 'rb') as f:\n stats = pickle.load(f)\n # if use_actuator_net:\n # prediction_len = actuator_config['prediction_len']\n # future_len = actuator_config['future_len']\n # history_len = actuator_config['history_len']\n # actuator_network_dir = actuator_config['actuator_network_dir']\n # from act.train_actuator_network import ActuatorNetwork\n # actuator_network = ActuatorNetwork(prediction_len)\n # actuator_network_path = os.path.join(actuator_network_dir, 'actuator_net_last.ckpt')",
+ "type": "code",
+ "location": "/imitate_episodes.py:254-274"
+ },
+ "369": {
+ "file_id": 23,
+ "content": "This code is loading a policy from the specified checkpoint path and a latent model from the specified latent_model_ckpt_path. It also loads dataset statistics from stats_path. Additionally, if use_actuator_net is True, it initializes an ActuatorNetwork object with specific parameters, and loads the actuator network from its designated checkpoint path.",
+ "type": "comment"
+ },
+ "370": {
+ "file_id": 23,
+ "content": " # loading_status = actuator_network.load_state_dict(torch.load(actuator_network_path))\n # actuator_network.eval()\n # actuator_network.cuda()\n # print(f'Loaded actuator network from: {actuator_network_path}, {loading_status}')\n # actuator_stats_path = os.path.join(actuator_network_dir, 'actuator_net_stats.pkl')\n # with open(actuator_stats_path, 'rb') as f:\n # actuator_stats = pickle.load(f)\n # actuator_unnorm = lambda x: x * actuator_stats['commanded_speed_std'] + actuator_stats['commanded_speed_std']\n # actuator_norm = lambda x: (x - actuator_stats['observed_speed_mean']) / actuator_stats['observed_speed_mean']\n # def collect_base_action(all_actions, norm_episode_all_base_actions):\n # post_processed_actions = post_process(all_actions.squeeze(0).cpu().numpy())\n # norm_episode_all_base_actions += actuator_norm(post_processed_actions[:, -2:]).tolist()\n pre_process = lambda s_qpos: (s_qpos - stats['qpos_mean']) / stats['qpos_std']",
+ "type": "code",
+ "location": "/imitate_episodes.py:275-290"
+ },
+ "371": {
+ "file_id": 23,
+ "content": "Loading the actuator network from the specified path, evaluating the network, moving it to GPU if available, and printing a message confirming the loading status. The actuator_net_stats.pkl file is opened and actuator stats are loaded. Two lambda functions, actuator_unnorm and actuator_norm, are defined for data normalization. A function named collect_base_action is defined to collect base actions after post-processing them. A pre_process lambda function is also defined for normalizing the state qpos.",
+ "type": "comment"
+ },
+ "372": {
+ "file_id": 23,
+ "content": " if policy_class == 'Diffusion':\n post_process = lambda a: ((a + 1) / 2) * (stats['action_max'] - stats['action_min']) + stats['action_min']\n else:\n post_process = lambda a: a * stats['action_std'] + stats['action_mean']\n # load environment\n if real_robot:\n from aloha_scripts.robot_utils import move_grippers # requires aloha\n from aloha_scripts.real_env import make_real_env # requires aloha\n env = make_real_env(init_node=True, setup_robots=True, setup_base=True)\n env_max_reward = 0\n else:\n from sim_env import make_sim_env\n env = make_sim_env(task_name)\n env_max_reward = env.task.max_reward\n query_frequency = policy_config['num_queries']\n if temporal_agg:\n query_frequency = 1\n num_queries = policy_config['num_queries']\n if real_robot:\n BASE_DELAY = 13\n query_frequency -= BASE_DELAY\n max_timesteps = int(max_timesteps * 1) # may increase for real-world tasks\n episode_returns = []\n highest_rewards = []",
+ "type": "code",
+ "location": "/imitate_episodes.py:291-318"
+ },
+ "373": {
+ "file_id": 23,
+ "content": "This code block initializes the environment and sets up parameters based on whether it is running in a real-world or simulation environment. It also accounts for temporal aggregation and potential delay in the real world. Finally, it initializes empty lists to store episode returns and highest rewards during the learning process.",
+ "type": "comment"
+ },
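As a reference, here is a minimal sketch of the (de)normalization built from the dataset statistics, mirroring the code above (the `stats` keys are the ones saved in dataset_stats.pkl):

```python
# Sketch: observations are z-scored; Diffusion outputs in [-1, 1] are min-max
# de-normalized, while other policies are de-z-scored.
def make_processors(stats, policy_class):
    pre_process = lambda qpos: (qpos - stats['qpos_mean']) / stats['qpos_std']
    if policy_class == 'Diffusion':
        post_process = lambda a: ((a + 1) / 2) * (stats['action_max'] - stats['action_min']) + stats['action_min']
    else:
        post_process = lambda a: a * stats['action_std'] + stats['action_mean']
    return pre_process, post_process
```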
+ "374": {
+ "file_id": 23,
+ "content": " for rollout_id in range(num_rollouts):\n if real_robot:\n e()\n rollout_id += 0\n ### set task\n if 'sim_transfer_cube' in task_name:\n BOX_POSE[0] = sample_box_pose() # used in sim reset\n elif 'sim_insertion' in task_name:\n BOX_POSE[0] = np.concatenate(sample_insertion_pose()) # used in sim reset\n ts = env.reset()\n ### onscreen render\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(env._physics.render(height=480, width=640, camera_id=onscreen_cam))\n plt.ion()\n ### evaluation loop\n if temporal_agg:\n all_time_actions = torch.zeros([max_timesteps, max_timesteps+num_queries, 16]).cuda()\n # qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()\n qpos_history_raw = np.zeros((max_timesteps, state_dim))\n image_list = [] # for visualization\n qpos_list = []\n target_qpos_list = []\n rewards = []\n # if use_actuator_net:",
+ "type": "code",
+ "location": "/imitate_episodes.py:319-347"
+ },
+ "375": {
+ "file_id": 23,
+ "content": "This code initializes a rollout_id for a loop, sets the task based on the task name, resets the environment, renders the screen if desired, and prepares variables for an evaluation loop. If \"use_actuator_net\" is enabled, this will be used.",
+ "type": "comment"
+ },
+ "376": {
+ "file_id": 23,
+ "content": " # norm_episode_all_base_actions = [actuator_norm(np.zeros(history_len, 2)).tolist()]\n with torch.inference_mode():\n time0 = time.time()\n DT = 1 / FPS\n culmulated_delay = 0 \n for t in range(max_timesteps):\n time1 = time.time()\n ### update onscreen render and wait for DT\n if onscreen_render:\n image = env._physics.render(height=480, width=640, camera_id=onscreen_cam)\n plt_img.set_data(image)\n plt.pause(DT)\n ### process previous timestep to get qpos and image_list\n time2 = time.time()\n obs = ts.observation\n if 'images' in obs:\n image_list.append(obs['images'])\n else:\n image_list.append({'main': obs['image']})\n qpos_numpy = np.array(obs['qpos'])\n qpos_history_raw[t] = qpos_numpy\n qpos = pre_process(qpos_numpy)",
+ "type": "code",
+ "location": "/imitate_episodes.py:348-370"
+ },
+ "377": {
+ "file_id": 23,
+ "content": "The code updates the onscreen render and waits for a delay (DT), processes previous timestep to get qpos and image_list, and pre-processes qpos. It does this within a loop for maximum timesteps, with timing measurements at specific points.",
+ "type": "comment"
+ },
+ "378": {
+ "file_id": 23,
+ "content": " qpos = torch.from_numpy(qpos).float().cuda().unsqueeze(0)\n # qpos_history[:, t] = qpos\n if t % query_frequency == 0:\n curr_image = get_image(ts, camera_names, rand_crop_resize=(config['policy_class'] == 'Diffusion'))\n # print('get image: ', time.time() - time2)\n if t == 0:\n # warm up\n for _ in range(10):\n policy(qpos, curr_image)\n print('network warm up done')\n time1 = time.time()\n ### query policy\n time3 = time.time()\n if config['policy_class'] == \"ACT\":\n if t % query_frequency == 0:\n if vq:\n if rollout_id == 0:\n for _ in range(10):\n vq_sample = latent_model.generate(1, temperature=1, x=None)\n print(torch.nonzero(vq_sample[0])[:, 1].cpu().numpy())",
+ "type": "code",
+ "location": "/imitate_episodes.py:371-392"
+ },
+ "379": {
+ "file_id": 23,
+ "content": "This code performs query-based policy execution in a reinforcement learning environment. It prepares input data and queries the policy network for action choices based on the current state. If the frequency requirement is met, it captures the image from a specified camera and applies any required preprocessing. The code also includes a warm-up step to prepare the neural network before executing the policy, and handles generating samples from a latent model if necessary.",
+ "type": "comment"
+ },
+ "380": {
+ "file_id": 23,
+ "content": " vq_sample = latent_model.generate(1, temperature=1, x=None)\n all_actions = policy(qpos, curr_image, vq_sample=vq_sample)\n else:\n # e()\n all_actions = policy(qpos, curr_image)\n # if use_actuator_net:\n # collect_base_action(all_actions, norm_episode_all_base_actions)\n if real_robot:\n all_actions = torch.cat([all_actions[:, :-BASE_DELAY, :-2], all_actions[:, BASE_DELAY:, -2:]], dim=2)\n if temporal_agg:\n all_time_actions[[t], t:t+num_queries] = all_actions\n actions_for_curr_step = all_time_actions[:, t]\n actions_populated = torch.all(actions_for_curr_step != 0, axis=1)\n actions_for_curr_step = actions_for_curr_step[actions_populated]\n k = 0.01\n exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))",
+ "type": "code",
+ "location": "/imitate_episodes.py:393-408"
+ },
+ "381": {
+ "file_id": 23,
+ "content": "This code generates an action based on the given state and either additional latent variables or just the state. If using a real robot, it modifies the generated actions to account for a base delay in the actuator response time. If temporal aggregation is enabled, the code collects all-time actions, filters out any zeros, and assigns weights based on an exponential function of the action index.",
+ "type": "comment"
+ },
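A minimal sketch of the temporal-aggregation step, assuming `actions_for_curr_step` is the (N, action_dim) tensor of all past predictions for the current timestep (index 0 is the oldest):

```python
import numpy as np
import torch

def aggregate_actions(actions_for_curr_step: torch.Tensor, k: float = 0.01) -> torch.Tensor:
    # exp(-k * i): the oldest prediction gets the largest weight; a small k keeps
    # the weights close to uniform so new chunks blend in smoothly.
    weights = np.exp(-k * np.arange(len(actions_for_curr_step)))
    weights = torch.from_numpy(weights / weights.sum()).to(actions_for_curr_step).unsqueeze(1)
    return (actions_for_curr_step * weights).sum(dim=0, keepdim=True)
```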
+ "382": {
+ "file_id": 23,
+ "content": " exp_weights = exp_weights / exp_weights.sum()\n exp_weights = torch.from_numpy(exp_weights).cuda().unsqueeze(dim=1)\n raw_action = (actions_for_curr_step * exp_weights).sum(dim=0, keepdim=True)\n else:\n raw_action = all_actions[:, t % query_frequency]\n # if t % query_frequency == query_frequency - 1:\n # # zero out base actions to avoid overshooting\n # raw_action[0, -2:] = 0\n elif config['policy_class'] == \"Diffusion\":\n if t % query_frequency == 0:\n all_actions = policy(qpos, curr_image)\n # if use_actuator_net:\n # collect_base_action(all_actions, norm_episode_all_base_actions)\n if real_robot:\n all_actions = torch.cat([all_actions[:, :-BASE_DELAY, :-2], all_actions[:, BASE_DELAY:, -2:]], dim=2)",
+ "type": "code",
+ "location": "/imitate_episodes.py:409-423"
+ },
+ "383": {
+ "file_id": 23,
+ "content": "This code appears to be part of a larger program that utilizes different policies and actions for robotic control. It seems to handle policy selection based on the current time step, t, and query frequency. If the policy is set as \"Diffusion\", it retrieves new actions from the policy at specific intervals, potentially accounting for delays or base actions. The code also handles real robot interactions, adjusting action sequences accordingly.",
+ "type": "comment"
+ },
+ "384": {
+ "file_id": 23,
+ "content": " raw_action = all_actions[:, t % query_frequency]\n elif config['policy_class'] == \"CNNMLP\":\n raw_action = policy(qpos, curr_image)\n all_actions = raw_action.unsqueeze(0)\n # if use_actuator_net:\n # collect_base_action(all_actions, norm_episode_all_base_actions)\n else:\n raise NotImplementedError\n # print('query policy: ', time.time() - time3)\n ### post-process actions\n time4 = time.time()\n raw_action = raw_action.squeeze(0).cpu().numpy()\n action = post_process(raw_action)\n target_qpos = action[:-2]\n # if use_actuator_net:\n # assert(not temporal_agg)\n # if t % prediction_len == 0:\n # offset_start_ts = t + history_len\n # actuator_net_in = np.array(norm_episode_all_base_actions[offset_start_ts - history_len: offset_start_ts + future_len])",
+ "type": "code",
+ "location": "/imitate_episodes.py:424-444"
+ },
+ "385": {
+ "file_id": 23,
+ "content": "This code selects the policy based on the config value and performs necessary actions. It uses CNNMLP for querying the policy, post-processes the raw action output, and assigns target_qpos from the processed action values. It also handles actuator net usage with temporal aggregation if configured.",
+ "type": "comment"
+ },
+ "386": {
+ "file_id": 23,
+ "content": " # actuator_net_in = torch.from_numpy(actuator_net_in).float().unsqueeze(dim=0).cuda()\n # pred = actuator_network(actuator_net_in)\n # base_action_chunk = actuator_unnorm(pred.detach().cpu().numpy()[0])\n # base_action = base_action_chunk[t % prediction_len]\n # else:\n base_action = action[-2:]\n # base_action = calibrate_linear_vel(base_action, c=0.19)\n # base_action = postprocess_base_action(base_action)\n # print('post process: ', time.time() - time4)\n ### step the environment\n time5 = time.time()\n if real_robot:\n ts = env.step(target_qpos, base_action)\n else:\n ts = env.step(target_qpos)\n # print('step env: ', time.time() - time5)\n ### for visualization\n qpos_list.append(qpos_numpy)\n target_qpos_list.append(target_qpos)",
+ "type": "code",
+ "location": "/imitate_episodes.py:445-465"
+ },
+ "387": {
+ "file_id": 23,
+ "content": "Code segment is responsible for updating the base action based on whether an actuator network prediction is available or not. If a prediction exists, it normalizes and detaches the prediction before selecting the relevant chunk. Else, it uses the last two elements of the given action as the base action after applying linear velocity calibration (commented out) and post-processing (also commented out). The code then steps the environment using the calculated base action and appends current qpos to qpos_list and target_qpos to target_qpos_list for visualization purposes.",
+ "type": "comment"
+ },
+ "388": {
+ "file_id": 23,
+ "content": " rewards.append(ts.reward)\n duration = time.time() - time1\n sleep_time = max(0, DT - duration)\n # print(sleep_time)\n time.sleep(sleep_time)\n # time.sleep(max(0, DT - duration - culmulated_delay))\n if duration >= DT:\n culmulated_delay += (duration - DT)\n print(f'Warning: step duration: {duration:.3f} s at step {t} longer than DT: {DT} s, culmulated delay: {culmulated_delay:.3f} s')\n # else:\n # culmulated_delay = max(0, culmulated_delay - (DT - duration))\n print(f'Avg fps {max_timesteps / (time.time() - time0)}')\n plt.close()\n if real_robot:\n move_grippers([env.puppet_bot_left, env.puppet_bot_right], [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=0.5) # open\n # save qpos_history_raw\n log_id = get_auto_index(ckpt_dir)\n np.save(os.path.join(ckpt_dir, f'qpos_{log_id}.npy'), qpos_history_raw)",
+ "type": "code",
+ "location": "/imitate_episodes.py:466-484"
+ },
+ "389": {
+ "file_id": 23,
+ "content": "The code appends rewards to a list, calculates and controls sleep time for synchronization, handles step duration longer than DT by accumulating delay, prints warning and updates cumulative delay if necessary, calculates average FPS, closes the plot window. If real_robot is True, it opens grippers and saves qpos_history_raw in a specified directory with an auto-incrementing index.",
+ "type": "comment"
+ },
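The timing logic amounts to a fixed-rate control loop; a small sketch follows (DT = 1 / FPS in the source; 50 Hz is assumed here purely for illustration):

```python
import time

DT = 1 / 50  # assumed control period for this sketch

def run_fixed_rate(step_fn, num_steps):
    accumulated_delay = 0.0
    for t in range(num_steps):
        start = time.time()
        step_fn(t)
        duration = time.time() - start
        time.sleep(max(0, DT - duration))        # finish out the control period
        if duration > DT:                        # track overruns
            accumulated_delay += duration - DT
            print(f'step {t} overran DT by {duration - DT:.3f}s (total {accumulated_delay:.3f}s)')
```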
+ "390": {
+ "file_id": 23,
+ "content": " plt.figure(figsize=(10, 20))\n # plot qpos_history_raw for each qpos dim using subplots\n for i in range(state_dim):\n plt.subplot(state_dim, 1, i+1)\n plt.plot(qpos_history_raw[:, i])\n # remove x axis\n if i != state_dim - 1:\n plt.xticks([])\n plt.tight_layout()\n plt.savefig(os.path.join(ckpt_dir, f'qpos_{log_id}.png'))\n plt.close()\n rewards = np.array(rewards)\n episode_return = np.sum(rewards[rewards!=None])\n episode_returns.append(episode_return)\n episode_highest_reward = np.max(rewards)\n highest_rewards.append(episode_highest_reward)\n print(f'Rollout {rollout_id}\\n{episode_return=}, {episode_highest_reward=}, {env_max_reward=}, Success: {episode_highest_reward==env_max_reward}')\n # if save_episode:\n # save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))\n success_rate = np.mean(np.array(highest_rewards) == env_max_reward)",
+ "type": "code",
+ "location": "/imitate_episodes.py:485-508"
+ },
+ "391": {
+ "file_id": 23,
+ "content": "The code plots the history of qpos for each dimension and saves it as an image, calculates episode return and highest reward, prints the results, and checks if the highest reward equals the environment's maximum reward. It then calculates the success rate based on the highest rewards.",
+ "type": "comment"
+ },
+ "392": {
+ "file_id": 23,
+ "content": " avg_return = np.mean(episode_returns)\n summary_str = f'\\nSuccess rate: {success_rate}\\nAverage return: {avg_return}\\n\\n'\n for r in range(env_max_reward+1):\n more_or_equal_r = (np.array(highest_rewards) >= r).sum()\n more_or_equal_r_rate = more_or_equal_r / num_rollouts\n summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\\n'\n print(summary_str)\n # save success rate to txt\n result_file_name = 'result_' + ckpt_name.split('.')[0] + '.txt'\n with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:\n f.write(summary_str)\n f.write(repr(episode_returns))\n f.write('\\n\\n')\n f.write(repr(highest_rewards))\n return success_rate, avg_return\ndef forward_pass(data, policy):\n image_data, qpos_data, action_data, is_pad = data\n image_data, qpos_data, action_data, is_pad = image_data.cuda(), qpos_data.cuda(), action_data.cuda(), is_pad.cuda()\n return policy(qpos_data, image_data, action_data, is_pad) # TODO remove None",
+ "type": "code",
+ "location": "/imitate_episodes.py:509-532"
+ },
+ "393": {
+ "file_id": 23,
+ "content": "Code block calculates success rate and average return from episode results, displays summary in console, writes the summary to a text file along with episode returns and highest rewards.\n\nThe forward_pass function takes input data (image_data, qpos_data, action_data, is_pad) and passes it through the policy network.",
+ "type": "comment"
+ },
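A sketch of the summary computation, using the same definitions as the code above (a rollout counts as a success if its best reward equals the environment maximum):

```python
import numpy as np

def summarize_rollouts(episode_returns, highest_rewards, env_max_reward):
    highest = np.array(highest_rewards)
    lines = [f'Success rate: {np.mean(highest == env_max_reward)}',
             f'Average return: {np.mean(episode_returns)}']
    for r in range(env_max_reward + 1):
        n = int((highest >= r).sum())
        lines.append(f'Reward >= {r}: {n}/{len(highest)} = {100 * n / len(highest):.1f}%')
    return '\n'.join(lines)
```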
+ "394": {
+ "file_id": 23,
+ "content": "def train_bc(train_dataloader, val_dataloader, config):\n num_steps = config['num_steps']\n ckpt_dir = config['ckpt_dir']\n seed = config['seed']\n policy_class = config['policy_class']\n policy_config = config['policy_config']\n eval_every = config['eval_every']\n validate_every = config['validate_every']\n save_every = config['save_every']\n set_seed(seed)\n policy = make_policy(policy_class, policy_config)\n if config['load_pretrain']:\n loading_status = policy.deserialize(torch.load(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'policy_step_50000_seed_0.ckpt')))\n print(f'loaded! {loading_status}')\n if config['resume_ckpt_path'] is not None:\n loading_status = policy.deserialize(torch.load(config['resume_ckpt_path']))\n print(f'Resume policy from: {config[\"resume_ckpt_path\"]}, Status: {loading_status}')\n policy.cuda()\n optimizer = make_optimizer(policy_class, policy)\n min_val_loss = np.inf\n best_ckpt_info = None\n train_dataloader = repeater(train_dataloader)",
+ "type": "code",
+ "location": "/imitate_episodes.py:535-560"
+ },
+ "395": {
+ "file_id": 23,
+ "content": "The code defines a \"train_bc\" function which trains a policy using a specified data loader. It sets up various configurations, checks if it should load pre-trained weights or resume training from a previous checkpoint, and initializes the optimizer. The function uses a repeater to repeat the training data loader for consistency.",
+ "type": "comment"
+ },
+ "396": {
+ "file_id": 23,
+ "content": " for step in tqdm(range(num_steps+1)):\n # validation\n if step % validate_every == 0:\n print('validating')\n with torch.inference_mode():\n policy.eval()\n validation_dicts = []\n for batch_idx, data in enumerate(val_dataloader):\n forward_dict = forward_pass(data, policy)\n validation_dicts.append(forward_dict)\n if batch_idx > 50:\n break\n validation_summary = compute_dict_mean(validation_dicts)\n epoch_val_loss = validation_summary['loss']\n if epoch_val_loss < min_val_loss:\n min_val_loss = epoch_val_loss\n best_ckpt_info = (step, min_val_loss, deepcopy(policy.serialize()))\n for k in list(validation_summary.keys()):\n validation_summary[f'val_{k}'] = validation_summary.pop(k) \n wandb.log(validation_summary, step=step)\n print(f'Val loss: {epoch_val_loss:.5f}')",
+ "type": "code",
+ "location": "/imitate_episodes.py:561-584"
+ },
+ "397": {
+ "file_id": 23,
+ "content": "This code is performing a validation step at certain intervals during training. It logs the validation summary to WandB and keeps track of the best validation loss seen so far. The best model checkpoint information is updated if the current validation loss is lower than the minimum previously observed.",
+ "type": "comment"
+ },
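A minimal sketch of that bookkeeping, with a stand-in for the repo's compute_dict_mean helper (the real helper lives in utils.py):

```python
from copy import deepcopy
import torch

def compute_dict_mean(dicts):
    # average each key over a list of per-batch loss dicts (stand-in for the utils helper)
    return {k: torch.stack([d[k] for d in dicts]).mean() for k in dicts[0]}

def update_best(step, val_loss, min_val_loss, best_ckpt_info, policy):
    # keep a serialized copy of the weights whenever the validation loss improves
    if val_loss < min_val_loss:
        return val_loss, (step, val_loss, deepcopy(policy.serialize()))
    return min_val_loss, best_ckpt_info
```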
+ "398": {
+ "file_id": 23,
+ "content": " summary_string = ''\n for k, v in validation_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n # evaluation\n if (step > 0) and (step % eval_every == 0):\n # first save then eval\n ckpt_name = f'policy_step_{step}_seed_{seed}.ckpt'\n ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n torch.save(policy.serialize(), ckpt_path)\n success, _ = eval_bc(config, ckpt_name, save_episode=True, num_rollouts=10)\n wandb.log({'success': success}, step=step)\n # training\n policy.train()\n optimizer.zero_grad()\n data = next(train_dataloader)\n forward_dict = forward_pass(data, policy)\n # backward\n loss = forward_dict['loss']\n loss.backward()\n optimizer.step()\n wandb.log(forward_dict, step=step) # not great, make training 1-2% slower\n if step % save_every == 0:\n ckpt_path = os.path.join(ckpt_dir, f'policy_step_{step}_seed_{seed}.ckpt')",
+ "type": "code",
+ "location": "/imitate_episodes.py:585-611"
+ },
+ "399": {
+ "file_id": 23,
+ "content": "The code performs validation, evaluation, and training steps. It logs the success rate of evaluations, saves checkpoints at certain intervals, trains a policy network using forward and backward passes, and logs data for later analysis.",
+ "type": "comment"
+ }
+}
\ No newline at end of file
diff --git a/docs/data/4.json b/docs/data/4.json
new file mode 100644
index 00000000..cfb2eaf2
--- /dev/null
+++ b/docs/data/4.json
@@ -0,0 +1,546 @@
+{
+ "400": {
+ "file_id": 23,
+ "content": " torch.save(policy.serialize(), ckpt_path)\n ckpt_path = os.path.join(ckpt_dir, f'policy_last.ckpt')\n torch.save(policy.serialize(), ckpt_path)\n best_step, min_val_loss, best_state_dict = best_ckpt_info\n ckpt_path = os.path.join(ckpt_dir, f'policy_step_{best_step}_seed_{seed}.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Training finished:\\nSeed {seed}, val loss {min_val_loss:.6f} at step {best_step}')\n return best_ckpt_info\ndef repeater(data_loader):\n epoch = 0\n for loader in repeat(data_loader):\n for data in loader:\n yield data\n print(f'Epoch {epoch} done')\n epoch += 1\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--eval', action='store_true')\n parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)\n parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)",
+ "type": "code",
+ "location": "/imitate_episodes.py:612-638"
+ },
+ "401": {
+ "file_id": 23,
+ "content": "The code defines a function to train and save a policy, repeats the data loader for multiple epochs, and takes command-line arguments for evaluation, on-screen rendering, checkpoint directory, and policy class. The training finishes when it finds the best model based on validation loss, saves it, and prints information about the best step, seed, and validation loss.",
+ "type": "comment"
+ },
+ "402": {
+ "file_id": 23,
+ "content": " parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--batch_size', action='store', type=int, help='batch_size', required=True)\n parser.add_argument('--seed', action='store', type=int, help='seed', required=True)\n parser.add_argument('--num_steps', action='store', type=int, help='num_steps', required=True)\n parser.add_argument('--lr', action='store', type=float, help='lr', required=True)\n parser.add_argument('--load_pretrain', action='store_true', default=False)\n parser.add_argument('--eval_every', action='store', type=int, default=500, help='eval_every', required=False)\n parser.add_argument('--validate_every', action='store', type=int, default=500, help='validate_every', required=False)\n parser.add_argument('--save_every', action='store', type=int, default=500, help='save_every', required=False)\n parser.add_argument('--resume_ckpt_path', action='store', type=str, help='resume_ckpt_path', required=False)",
+ "type": "code",
+ "location": "/imitate_episodes.py:639-648"
+ },
+ "403": {
+ "file_id": 23,
+ "content": "The code above is using the ArgumentParser from Python's argparse module to add various command-line arguments for a task. These arguments include 'task_name', 'batch_size', 'seed', 'num_steps', 'lr', 'load_pretrain', 'eval_every', and 'validate_every'. The 'save_every' argument is optional, as well as the 'resume_ckpt_path'. These arguments are required or defaulted depending on the specifications.",
+ "type": "comment"
+ },
+ "404": {
+ "file_id": 23,
+ "content": " parser.add_argument('--skip_mirrored_data', action='store_true')\n parser.add_argument('--actuator_network_dir', action='store', type=str, help='actuator_network_dir', required=False)\n parser.add_argument('--history_len', action='store', type=int)\n parser.add_argument('--future_len', action='store', type=int)\n parser.add_argument('--prediction_len', action='store', type=int)\n # for ACT\n parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)\n parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)\n parser.add_argument('--hidden_dim', action='store', type=int, help='hidden_dim', required=False)\n parser.add_argument('--dim_feedforward', action='store', type=int, help='dim_feedforward', required=False)\n parser.add_argument('--temporal_agg', action='store_true')\n parser.add_argument('--use_vq', action='store_true')\n parser.add_argument('--vq_class', action='store', type=int, help='vq_class')",
+ "type": "code",
+ "location": "/imitate_episodes.py:649-662"
+ },
+ "405": {
+ "file_id": 23,
+ "content": "This code is using the Argparse module to define command-line arguments for a Python script. The arguments include options such as skipping mirrored data, specifying directories and lengths for history, future, and prediction. For ACT (Adaptive Computation Time) model, additional arguments like KL weight, chunk size, hidden dimension, feedforward dimension, and use of Variational Quantization are defined. These arguments allow the user to customize the behavior of the script based on their specific needs.",
+ "type": "comment"
+ },
+ "406": {
+ "file_id": 23,
+ "content": " parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim')\n parser.add_argument('--no_encoder', action='store_true')\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/imitate_episodes.py:663-666"
+ },
+ "407": {
+ "file_id": 23,
+ "content": "These lines are adding command line arguments to the parser object, allowing users to specify values for 'vq_dim' and 'no_encoder'. The first argument, '--vq_dim', uses integer type and provides a help message. The second argument, '--no_encoder', is set as a boolean flag when true. Lastly, the main function is called with the parsed arguments passed in as keyword arguments.",
+ "type": "comment"
+ },
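For reference, a hypothetical invocation built from the arguments above; all values are illustrative placeholders, not recommended settings from the authors:

```
python3 imitate_episodes.py \
    --task_name sim_transfer_cube_scripted --ckpt_dir ckpts/act_transfer_cube \
    --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 \
    --dim_feedforward 3200 --batch_size 8 --lr 1e-5 --seed 0 --num_steps 2000
```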
+ "408": {
+ "file_id": 24,
+ "content": "/policy.py",
+ "type": "filepath"
+ },
+ "409": {
+ "file_id": 24,
+ "content": "The code creates a policy network for multi-camera image tasks, trains a noise residual prediction model, and includes an ACTPolicy class for reinforcement learning with normalization and loss calculation. It also defines a CNNMLP model for processing states, images, actions, with KL divergence, MSE loss, and training/inference modes.",
+ "type": "summary"
+ },
+ "410": {
+ "file_id": 24,
+ "content": "import torch.nn as nn\nfrom torch.nn import functional as F\nimport torchvision.transforms as transforms\nimport torch\nimport numpy as np\nfrom detr.main import build_ACT_model_and_optimizer, build_CNNMLP_model_and_optimizer\nimport IPython\ne = IPython.embed\nfrom collections import OrderedDict\nfrom robomimic.models.base_nets import ResNet18Conv, SpatialSoftmax\nfrom robomimic.algo.diffusion_policy import replace_bn_with_gn, ConditionalUnet1D\nfrom diffusers.schedulers.scheduling_ddpm import DDPMScheduler\nfrom diffusers.schedulers.scheduling_ddim import DDIMScheduler\nfrom diffusers.training_utils import EMAModel\nclass DiffusionPolicy(nn.Module):\n def __init__(self, args_override):\n super().__init__()\n self.camera_names = args_override['camera_names']\n self.observation_horizon = args_override['observation_horizon'] ### TODO TODO TODO DO THIS\n self.action_horizon = args_override['action_horizon'] # apply chunk size\n self.prediction_horizon = args_override['prediction_horizon'] # chunk size",
+ "type": "code",
+ "location": "/policy.py:1-28"
+ },
+ "411": {
+ "file_id": 24,
+ "content": "The code imports necessary libraries and classes, defines a class for the DiffusionPolicy model, and includes parameters such as camera names, observation horizon, action horizon, and prediction horizon. The function build_ACT_model_and_optimizer and build_CNNMLP_model_and_optimizer are used to create models and optimizers, while replace_bn_with_gn and ConditionalUnet1D functions are called. EMAModel and scheduling classes DDPMScheduler and DDIMScheduler are also imported for training and scheduling purposes.",
+ "type": "comment"
+ },
+ "412": {
+ "file_id": 24,
+ "content": " self.num_inference_timesteps = args_override['num_inference_timesteps']\n self.ema_power = args_override['ema_power']\n self.lr = args_override['lr']\n self.weight_decay = 0\n self.num_kp = 32\n self.feature_dimension = 64\n self.ac_dim = args_override['action_dim'] # 14 + 2\n self.obs_dim = self.feature_dimension * len(self.camera_names) + 14 # camera features and proprio\n backbones = []\n pools = []\n linears = []\n for _ in self.camera_names:\n backbones.append(ResNet18Conv(**{'input_channel': 3, 'pretrained': False, 'input_coord_conv': False}))\n pools.append(SpatialSoftmax(**{'input_shape': [512, 15, 20], 'num_kp': self.num_kp, 'temperature': 1.0, 'learnable_temperature': False, 'noise_std': 0.0}))\n linears.append(torch.nn.Linear(int(np.prod([self.num_kp, 2])), self.feature_dimension))\n backbones = nn.ModuleList(backbones)\n pools = nn.ModuleList(pools)\n linears = nn.ModuleList(linears)",
+ "type": "code",
+ "location": "/policy.py:29-48"
+ },
+ "413": {
+ "file_id": 24,
+ "content": "Initializing the model's parameters with values from args_override dictionary. Creating lists of ResNet18Conv, SpatialSoftmax, and Linear layers for each camera name. Converting lists to nn.ModuleList to facilitate efficient computation during model execution.",
+ "type": "comment"
+ },
+ "414": {
+ "file_id": 24,
+ "content": " backbones = replace_bn_with_gn(backbones) # TODO\n noise_pred_net = ConditionalUnet1D(\n input_dim=self.ac_dim,\n global_cond_dim=self.obs_dim*self.observation_horizon\n )\n nets = nn.ModuleDict({\n 'policy': nn.ModuleDict({\n 'backbones': backbones,\n 'pools': pools,\n 'linears': linears,\n 'noise_pred_net': noise_pred_net\n })\n })\n nets = nets.float().cuda()\n ENABLE_EMA = True\n if ENABLE_EMA:\n ema = EMAModel(model=nets, power=self.ema_power)\n else:\n ema = None\n self.nets = nets\n self.ema = ema\n # setup noise scheduler\n self.noise_scheduler = DDIMScheduler(\n num_train_timesteps=50,\n beta_schedule='squaredcos_cap_v2',\n clip_sample=True,\n set_alpha_to_one=True,\n steps_offset=0,\n prediction_type='epsilon'\n )\n n_parameters = sum(p.numel() for p in self.parameters())",
+ "type": "code",
+ "location": "/policy.py:50-86"
+ },
+ "415": {
+ "file_id": 24,
+ "content": "This code defines a policy network with backbones, pools, linears, and noise prediction. The model is created as a PyTorch module, converted to float type, and moved to the GPU for faster computation. Optionally, an exponential moving average (EMA) model is also created if ENABLE_EMA flag is set. A noise scheduler is setup to manage the noise during training.",
+ "type": "comment"
+ },
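A sketch of one camera's feature pipeline, reusing the constructor arguments shown above (robomimic must be installed; the shapes assume 480x640 RGB input):

```python
import numpy as np
import torch
import torch.nn as nn
from robomimic.models.base_nets import ResNet18Conv, SpatialSoftmax

num_kp, feature_dim = 32, 64
backbone = ResNet18Conv(input_channel=3, pretrained=False, input_coord_conv=False)
pool = SpatialSoftmax(input_shape=[512, 15, 20], num_kp=num_kp, temperature=1.0,
                      learnable_temperature=False, noise_std=0.0)
linear = nn.Linear(int(np.prod([num_kp, 2])), feature_dim)

def cam_features(cam_image: torch.Tensor) -> torch.Tensor:
    feat = backbone(cam_image)            # (B, 512, 15, 20) conv features
    kp = torch.flatten(pool(feat), 1)     # (B, num_kp * 2) soft keypoint coordinates
    return linear(kp)                     # (B, feature_dim) per-camera embedding
```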
+ "416": {
+ "file_id": 24,
+ "content": " print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n def configure_optimizers(self):\n optimizer = torch.optim.AdamW(self.nets.parameters(), lr=self.lr, weight_decay=self.weight_decay)\n return optimizer\n def __call__(self, qpos, image, actions=None, is_pad=None):\n B = qpos.shape[0]\n if actions is not None: # training time\n nets = self.nets\n all_features = []\n for cam_id in range(len(self.camera_names)):\n cam_image = image[:, cam_id]\n cam_features = nets['policy']['backbones'][cam_id](cam_image)\n pool_features = nets['policy']['pools'][cam_id](cam_features)\n pool_features = torch.flatten(pool_features, start_dim=1)\n out_features = nets['policy']['linears'][cam_id](pool_features)\n all_features.append(out_features)\n obs_cond = torch.cat(all_features + [qpos], dim=1)\n # sample noise to add to actions\n noise = torch.randn(actions.shape, device=obs_cond.device)",
+ "type": "code",
+ "location": "/policy.py:87-111"
+ },
+ "417": {
+ "file_id": 24,
+ "content": "This code initializes an optimizer for the policy network in a multi-camera image task. It prints the number of parameters in the model and defines the __call__ method, which takes in input poses, images, actions (if training), and is_pad flags. During training, it extracts features from each camera's input, concatenates them with qpos, and adds noise to actions for better exploration.",
+ "type": "comment"
+ },
+ "418": {
+ "file_id": 24,
+ "content": " # sample a diffusion iteration for each data point\n timesteps = torch.randint(\n 0, self.noise_scheduler.config.num_train_timesteps, \n (B,), device=obs_cond.device\n ).long()\n # add noise to the clean actions according to the noise magnitude at each diffusion iteration\n # (this is the forward diffusion process)\n noisy_actions = self.noise_scheduler.add_noise(\n actions, noise, timesteps)\n # predict the noise residual\n noise_pred = nets['policy']['noise_pred_net'](noisy_actions, timesteps, global_cond=obs_cond)\n # L2 loss\n all_l2 = F.mse_loss(noise_pred, noise, reduction='none')\n loss = (all_l2 * ~is_pad.unsqueeze(-1)).mean()\n loss_dict = {}\n loss_dict['l2_loss'] = loss\n loss_dict['loss'] = loss\n if self.training and self.ema is not None:\n self.ema.step(nets)\n return loss_dict",
+ "type": "code",
+ "location": "/policy.py:113-137"
+ },
+ "419": {
+ "file_id": 24,
+ "content": "This code snippet samples diffusion iterations for each data point, adds noise to clean actions based on the noise magnitude at each iteration, predicts the noise residual using a neural network, calculates the L2 loss between predicted and actual noise, and returns the loss for training purposes. It also optionally updates an exponential moving average (EMA) of the model's parameters if in training mode and EMA is not None.",
+ "type": "comment"
+ },
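A minimal sketch of that training objective, assuming `noise_pred_net` is any network taking (noisy_actions, timesteps, global_cond), as the ConditionalUnet1D does in this repo:

```python
import torch
import torch.nn.functional as F
from diffusers.schedulers.scheduling_ddpm import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=50, beta_schedule='squaredcos_cap_v2')

def diffusion_loss(noise_pred_net, actions, obs_cond, is_pad):
    B = actions.shape[0]
    timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (B,),
                              device=actions.device).long()
    noise = torch.randn_like(actions)
    noisy_actions = scheduler.add_noise(actions, noise, timesteps)   # forward process
    noise_pred = noise_pred_net(noisy_actions, timesteps, global_cond=obs_cond)
    l2 = F.mse_loss(noise_pred, noise, reduction='none')
    return (l2 * ~is_pad.unsqueeze(-1)).mean()                       # ignore padded steps
```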
+ "420": {
+ "file_id": 24,
+ "content": " else: # inference time\n To = self.observation_horizon\n Ta = self.action_horizon\n Tp = self.prediction_horizon\n action_dim = self.ac_dim\n nets = self.nets\n if self.ema is not None:\n nets = self.ema.averaged_model\n all_features = []\n for cam_id in range(len(self.camera_names)):\n cam_image = image[:, cam_id]\n cam_features = nets['policy']['backbones'][cam_id](cam_image)\n pool_features = nets['policy']['pools'][cam_id](cam_features)\n pool_features = torch.flatten(pool_features, start_dim=1)\n out_features = nets['policy']['linears'][cam_id](pool_features)\n all_features.append(out_features)\n obs_cond = torch.cat(all_features + [qpos], dim=1)\n # initialize action from Guassian noise\n noisy_action = torch.randn(\n (B, Tp, action_dim), device=obs_cond.device)\n naction = noisy_action",
+ "type": "code",
+ "location": "/policy.py:138-162"
+ },
+ "421": {
+ "file_id": 24,
+ "content": "This code is initializing action from Gaussian noise at inference time. It first determines the observation, action, and prediction horizons based on the policy settings. Then it retrieves the camera-specific networks and, if the exponential moving average (EMA) is not None, uses the averaged model instead of the current one. For each camera, it extracts features by passing images through the corresponding backbones, pools, and linears. Finally, it concatenates all extracted features with qpos, initializes noisy action from Gaussian noise, and sets naction to this noisy action.",
+ "type": "comment"
+ },
+ "422": {
+ "file_id": 24,
+ "content": " # init scheduler\n self.noise_scheduler.set_timesteps(self.num_inference_timesteps)\n for k in self.noise_scheduler.timesteps:\n # predict noise\n noise_pred = nets['policy']['noise_pred_net'](\n sample=naction, \n timestep=k,\n global_cond=obs_cond\n )\n # inverse diffusion step (remove noise)\n naction = self.noise_scheduler.step(\n model_output=noise_pred,\n timestep=k,\n sample=naction\n ).prev_sample\n return naction\n def serialize(self):\n return {\n \"nets\": self.nets.state_dict(),\n \"ema\": self.ema.averaged_model.state_dict() if self.ema is not None else None,\n }\n def deserialize(self, model_dict):\n status = self.nets.load_state_dict(model_dict[\"nets\"])\n print('Loaded model')\n if model_dict.get(\"ema\", None) is not None:",
+ "type": "code",
+ "location": "/policy.py:164-193"
+ },
+ "423": {
+ "file_id": 24,
+ "content": "The code initializes the noise scheduler and iterates through timesteps, predicting noise and performing inverse diffusion steps to remove noise from samples. It also includes functions for serializing and deserializing the model's parameters.",
+ "type": "comment"
+ },
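A matching sketch of the reverse (denoising) loop at inference time, with the same assumed `noise_pred_net` and scheduler settings mirroring the ones in policy.py:

```python
import torch
from diffusers.schedulers.scheduling_ddim import DDIMScheduler

@torch.no_grad()
def sample_actions(noise_pred_net, obs_cond, horizon, action_dim, num_inference_timesteps=10):
    scheduler = DDIMScheduler(num_train_timesteps=50, beta_schedule='squaredcos_cap_v2',
                              clip_sample=True, prediction_type='epsilon')
    scheduler.set_timesteps(num_inference_timesteps)
    naction = torch.randn((obs_cond.shape[0], horizon, action_dim), device=obs_cond.device)
    for k in scheduler.timesteps:
        noise_pred = noise_pred_net(naction, k, global_cond=obs_cond)   # predict noise
        naction = scheduler.step(noise_pred, k, naction).prev_sample    # remove one step of noise
    return naction
```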
+ "424": {
+ "file_id": 24,
+ "content": " print('Loaded EMA')\n status_ema = self.ema.averaged_model.load_state_dict(model_dict[\"ema\"])\n status = [status, status_ema]\n return status\nclass ACTPolicy(nn.Module):\n def __init__(self, args_override):\n super().__init__()\n model, optimizer = build_ACT_model_and_optimizer(args_override)\n self.model = model # CVAE decoder\n self.optimizer = optimizer\n self.kl_weight = args_override['kl_weight']\n self.vq = args_override['vq']\n print(f'KL Weight {self.kl_weight}')\n def __call__(self, qpos, image, actions=None, is_pad=None, vq_sample=None):\n env_state = None\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n image = normalize(image)\n if actions is not None: # training time\n actions = actions[:, :self.model.num_queries]\n is_pad = is_pad[:, :self.model.num_queries]\n loss_dict = dict()",
+ "type": "code",
+ "location": "/policy.py:194-218"
+ },
+ "425": {
+ "file_id": 24,
+ "content": "The code defines an `ACTPolicy` class that uses the ACT model and optimizer for reinforcement learning tasks. It normalizes images, handles both training and testing scenarios, and calculates loss during training time. The kl_weight and vq arguments are taken from args_override.",
+ "type": "comment"
+ },
+ "426": {
+ "file_id": 24,
+ "content": " a_hat, is_pad_hat, (mu, logvar), probs, binaries = self.model(qpos, image, env_state, actions, is_pad, vq_sample)\n if self.vq or self.model.encoder is None:\n total_kld = [torch.tensor(0.0)]\n else:\n total_kld, dim_wise_kld, mean_kld = kl_divergence(mu, logvar)\n if self.vq:\n loss_dict['vq_discrepancy'] = F.l1_loss(probs, binaries, reduction='mean')\n all_l1 = F.l1_loss(actions, a_hat, reduction='none')\n l1 = (all_l1 * ~is_pad.unsqueeze(-1)).mean()\n loss_dict['l1'] = l1\n loss_dict['kl'] = total_kld[0]\n loss_dict['loss'] = loss_dict['l1'] + loss_dict['kl'] * self.kl_weight\n return loss_dict\n else: # inference time\n a_hat, _, (_, _), _, _ = self.model(qpos, image, env_state, vq_sample=vq_sample) # no action, sample from prior\n return a_hat\n def configure_optimizers(self):\n return self.optimizer\n @torch.no_grad()\n def vq_encode(self, qpos, actions, is_pad):",
+ "type": "code",
+ "location": "/policy.py:219-240"
+ },
+ "427": {
+ "file_id": 24,
+ "content": "The code is defining a policy function for an agent in a reinforcement learning environment. It calculates loss based on differences between predicted and actual actions, as well as KL divergence to penalize the model's confidence in its predictions. The code also defines an optimizer for training and a function to encode actions into binary representations for VQ-VAE (Variable Quantization Variational Autoencoder) models.",
+ "type": "comment"
+ },
+ "428": {
+ "file_id": 24,
+ "content": " actions = actions[:, :self.model.num_queries]\n is_pad = is_pad[:, :self.model.num_queries]\n _, _, binaries, _, _ = self.model.encode(qpos, actions, is_pad)\n return binaries\n def serialize(self):\n return self.state_dict()\n def deserialize(self, model_dict):\n return self.load_state_dict(model_dict)\nclass CNNMLPPolicy(nn.Module):\n def __init__(self, args_override):\n super().__init__()\n model, optimizer = build_CNNMLP_model_and_optimizer(args_override)\n self.model = model # decoder\n self.optimizer = optimizer\n def __call__(self, qpos, image, actions=None, is_pad=None):\n env_state = None # TODO\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n image = normalize(image)\n if actions is not None: # training time\n actions = actions[:, 0]\n a_hat = self.model(qpos, image, env_state, actions)\n mse = F.mse_loss(actions, a_hat)",
+ "type": "code",
+ "location": "/policy.py:241-270"
+ },
+ "429": {
+ "file_id": 24,
+ "content": "This code defines a class for the CNNMLP policy model in an environment. The __init__ function initializes the model and optimizer based on arguments override, while the __call__ function takes in state (qpos), image, actions (if training time), and is_pad for processing. It normalizes the image, and if actions are provided, it calculates the MSE loss between predicted (a_hat) and actual (actions) actions.",
+ "type": "comment"
+ },
+ "430": {
+ "file_id": 24,
+ "content": " loss_dict = dict()\n loss_dict['mse'] = mse\n loss_dict['loss'] = loss_dict['mse']\n return loss_dict\n else: # inference time\n a_hat = self.model(qpos, image, env_state) # no action, sample from prior\n return a_hat\n def configure_optimizers(self):\n return self.optimizer\ndef kl_divergence(mu, logvar):\n batch_size = mu.size(0)\n assert batch_size != 0\n if mu.data.ndimension() == 4:\n mu = mu.view(mu.size(0), mu.size(1))\n if logvar.data.ndimension() == 4:\n logvar = logvar.view(logvar.size(0), logvar.size(1))\n klds = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())\n total_kld = klds.sum(1).mean(0, True)\n dimension_wise_kld = klds.mean(0)\n mean_kld = klds.mean(1).mean(0, True)\n return total_kld, dimension_wise_kld, mean_kld",
+ "type": "code",
+ "location": "/policy.py:271-295"
+ },
+ "431": {
+ "file_id": 24,
+ "content": "This code is a part of a neural network policy model. It calculates the KL divergence between two variables, and depending on whether it's training or inference time, it either returns the action estimate (a_hat) or the losses for different loss types like mse. The optimizer configuration function returns the optimizer used by the model.",
+ "type": "comment"
+ },
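For reference, the closed form computed by kl_divergence is the KL divergence between the encoder's diagonal Gaussian and the standard normal prior:

$$
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
= -\tfrac{1}{2} \sum_j \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big),
$$

which matches `klds = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())` summed over latent dimensions.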
+ "432": {
+ "file_id": 25,
+ "content": "/postprocess_episodes.py",
+ "type": "filepath"
+ },
+ "433": {
+ "file_id": 25,
+ "content": "The code imports libraries, loads data, processes episode information, scales actions, compresses images with JPEG quality 50, and saves in HDF5 format. It generates datasets for image variables and populates root dataset from data_dict.",
+ "type": "summary"
+ },
+ "434": {
+ "file_id": 25,
+ "content": "import os\nimport numpy as np\nimport cv2\nimport h5py\nimport argparse\nimport time\nfrom visualize_episodes import visualize_joints, visualize_timestamp, save_videos\nimport matplotlib.pyplot as plt\nfrom constants import DT\nimport IPython\ne = IPython.embed\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\nMIRROR_STATE_MULTIPLY = np.array([-1, 1, 1, -1, 1, -1, 1]).astype('float32')\nMIRROR_BASE_MULTIPLY = np.array([1, -1]).astype('float32')\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')\n if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n is_sim = root.attrs['sim']\n compressed = root.attrs.get('compress', False)\n qpos = root['/observations/qpos'][()]\n qvel = root['/observations/qvel'][()]\n action = root['/action'][()]\n image_dict = dict()",
+ "type": "code",
+ "location": "/postprocess_episodes.py:1-33"
+ },
+ "435": {
+ "file_id": 25,
+ "content": "This code imports necessary libraries and defines constants for a robotics data processing script. It loads data from .hdf5 files, including robot joint positions and velocities, as well as actions performed by the robot.",
+ "type": "comment"
+ },
+ "436": {
+ "file_id": 25,
+ "content": " for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n if 'base_action' in root.keys():\n print('base_action exists')\n base_action = root['/base_action'][()]\n else:\n base_action = None\n if compressed:\n compress_len = root['/compress_len'][()]\n if compressed:\n for cam_id, cam_name in enumerate(image_dict.keys()):\n # un-pad and uncompress\n padded_compressed_image_list = image_dict[cam_name]\n image_list = []\n for padded_compressed_image in padded_compressed_image_list: # [:1000] to save memory\n image = cv2.imdecode(padded_compressed_image, 1)\n image_list.append(image)\n image_dict[cam_name] = np.array(image_list)\n return qpos, qvel, action, base_action, image_dict, is_sim\ndef main(args):\n dataset_dir = args['dataset_dir']\n num_episodes = args['num_episodes']\n start_idx = 0",
+ "type": "code",
+ "location": "/postprocess_episodes.py:34-60"
+ },
+ "437": {
+ "file_id": 25,
+ "content": "Iterates through image keys, stores in image_dict.\nChecks if base_action exists and assigns value accordingly.\nIf compressed, un-pads and uncompresses images, stores in image_dict.\nReturns various variables including base_action and image_dict.",
+ "type": "comment"
+ },
+ "438": {
+ "file_id": 25,
+ "content": " for episode_idx in range(start_idx, start_idx + num_episodes):\n dataset_name = f'episode_{episode_idx}'\n qpos, qvel, action, base_action, image_dict, is_sim = load_hdf5(dataset_dir, dataset_name)\n # process proprioception\n qpos = np.concatenate([qpos[:, 7:] * MIRROR_STATE_MULTIPLY, qpos[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)\n qvel = np.concatenate([qvel[:, 7:] * MIRROR_STATE_MULTIPLY, qvel[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)\n action = np.concatenate([action[:, 7:] * MIRROR_STATE_MULTIPLY, action[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)\n if base_action is not None:\n base_action = base_action * MIRROR_BASE_MULTIPLY\n # mirror image obs\n if 'left_wrist' in image_dict.keys():\n image_dict['left_wrist'], image_dict['right_wrist'] = image_dict['right_wrist'][:, :, ::-1], image_dict['left_wrist'][:, :, ::-1]\n elif 'cam_left_wrist' in image_dict.keys():\n image_dict['cam_left_wrist'], image_dict['",
+ "type": "code",
+ "location": "/postprocess_episodes.py:61-77"
+ },
+ "439": {
+ "file_id": 25,
+ "content": "This code is part of a function that loads and processes episode data from HDF5 files. It iterates over multiple episodes, concatenating mirrored proprioception and action data, and optionally scales the base action. If any images with 'left_wrist' or 'cam_left_wrist' keys exist in the image dictionary, it swaps their positions for mirroring purposes.",
+ "type": "comment"
+ },
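A minimal sketch of that mirroring transform (the multipliers are the constants defined in this file; qpos/action are 14-D with the left arm first):

```python
import numpy as np

MIRROR_STATE_MULTIPLY = np.array([-1, 1, 1, -1, 1, -1, 1], dtype='float32')  # per 7-DoF arm
MIRROR_BASE_MULTIPLY = np.array([1, -1], dtype='float32')                    # negate angular velocity

def mirror_state(x):
    # swap the two arms and flip the sign of mirrored joints; x has shape (T, 14)
    return np.concatenate([x[:, 7:] * MIRROR_STATE_MULTIPLY,
                           x[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)

def mirror_images(left, right):
    # swap left/right wrist cameras and flip each (T, H, W, C) stack horizontally
    return right[:, :, ::-1], left[:, :, ::-1]
```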
+ "440": {
+ "file_id": 25,
+ "content": "cam_right_wrist'] = image_dict['cam_right_wrist'][:, :, ::-1], image_dict['cam_left_wrist'][:, :, ::-1]\n else:\n raise Exception('No left_wrist or cam_left_wrist in image_dict')\n if 'top' in image_dict.keys():\n image_dict['top'] = image_dict['top'][:, :, ::-1]\n elif 'cam_high' in image_dict.keys():\n image_dict['cam_high'] = image_dict['cam_high'][:, :, ::-1]\n else:\n raise Exception('No top or cam_high in image_dict')\n # saving\n data_dict = {\n '/observations/qpos': qpos,\n '/observations/qvel': qvel,\n '/action': action,\n '/base_action': base_action,\n } if base_action is not None else {\n '/observations/qpos': qpos,\n '/observations/qvel': qvel,\n '/action': action,\n }\n for cam_name in image_dict.keys():\n data_dict[f'/observations/images/{cam_name}'] = image_dict[cam_name]\n max_timesteps = len(qpos)\n COMPRESS = True",
+ "type": "code",
+ "location": "/postprocess_episodes.py:77-103"
+ },
+ "441": {
+ "file_id": 25,
+ "content": "This code checks for specific keys in the image_dict and adjusts the values if necessary. If 'left_wrist' or 'cam_left_wrist' is present, it flips the image. It also handles if 'top' or 'cam_high' are present, flipping them accordingly. Then, it creates a data_dict with necessary keys ('/observations/qpos', '/observations/qvel', '/action', and '/base_action') for saving. Finally, it loops through the image_dict to add its contents as key-value pairs in the data_dict, and sets max_timesteps as the length of qpos. The code uses compression while saving.",
+ "type": "comment"
+ },
+ "442": {
+ "file_id": 25,
+ "content": " if COMPRESS:\n # JPEG compression\n t0 = time.time()\n encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50] # tried as low as 20, seems fine\n compressed_len = []\n for cam_name in image_dict.keys():\n image_list = data_dict[f'/observations/images/{cam_name}']\n compressed_list = []\n compressed_len.append([])\n for image in image_list:\n result, encoded_image = cv2.imencode('.jpg', image, encode_param) # 0.02 sec # cv2.imdecode(encoded_image, 1)\n compressed_list.append(encoded_image)\n compressed_len[-1].append(len(encoded_image))\n data_dict[f'/observations/images/{cam_name}'] = compressed_list\n print(f'compression: {time.time() - t0:.2f}s')\n # pad so it has same length\n t0 = time.time()\n compressed_len = np.array(compressed_len)\n padded_size = compressed_len.max()\n for cam_name in image_dict.keys():",
+ "type": "code",
+ "location": "/postprocess_episodes.py:105-125"
+ },
+ "443": {
+ "file_id": 25,
+ "content": "This code compresses images using JPEG compression with a quality level of 50, stores the compressed images in the data dictionary, and measures the time taken for the compression process.",
+ "type": "comment"
+ },
+ "444": {
+ "file_id": 25,
+ "content": " compressed_image_list = data_dict[f'/observations/images/{cam_name}']\n padded_compressed_image_list = []\n for compressed_image in compressed_image_list:\n padded_compressed_image = np.zeros(padded_size, dtype='uint8')\n image_len = len(compressed_image)\n padded_compressed_image[:image_len] = compressed_image\n padded_compressed_image_list.append(padded_compressed_image)\n data_dict[f'/observations/images/{cam_name}'] = padded_compressed_image_list\n print(f'padding: {time.time() - t0:.2f}s')\n # HDF5\n t0 = time.time()\n dataset_path = os.path.join(dataset_dir, f'mirror_episode_{episode_idx}')\n with h5py.File(dataset_path + '.hdf5', 'w', rdcc_nbytes=1024 ** 2 * 2) as root:\n root.attrs['sim'] = is_sim\n root.attrs['compress'] = COMPRESS\n obs = root.create_group('observations')\n image = obs.create_group('images')",
+ "type": "code",
+ "location": "/postprocess_episodes.py:126-143"
+ },
+ "445": {
+ "file_id": 25,
+ "content": "This code is padding compressed images, adding them to the data dictionary, and saving the dataset in HDF5 format. The padding ensures all images have the same length for consistency in the HDF5 file. It also records the time taken to pad the images.",
+ "type": "comment"
+ },
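A sketch of the compress-then-pad scheme (quality 50 as in the source), which lets variable-length JPEG buffers fit in one fixed-width uint8 HDF5 dataset:

```python
import cv2
import numpy as np

def compress_and_pad(images, quality=50):
    encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), quality]
    encoded = [cv2.imencode('.jpg', im, encode_param)[1] for im in images]
    lengths = np.array([len(e) for e in encoded])
    padded = np.zeros((len(encoded), lengths.max()), dtype='uint8')
    for i, e in enumerate(encoded):
        padded[i, :len(e)] = e.flatten()
    return padded, lengths

# to recover frame i: cv2.imdecode(padded[i][:lengths[i]], 1)
```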
+ "446": {
+ "file_id": 25,
+ "content": " for cam_name in image_dict.keys():\n if COMPRESS:\n _ = image.create_dataset(cam_name, (max_timesteps, padded_size), dtype='uint8',\n chunks=(1, padded_size), )\n else:\n _ = image.create_dataset(cam_name, (max_timesteps, 480, 640, 3), dtype='uint8',\n chunks=(1, 480, 640, 3), )\n qpos = obs.create_dataset('qpos', (max_timesteps, 14))\n qvel = obs.create_dataset('qvel', (max_timesteps, 14))\n action = root.create_dataset('action', (max_timesteps, 14))\n if base_action is not None:\n base_action = root.create_dataset('base_action', (max_timesteps, 2))\n for name, array in data_dict.items():\n root[name][...] = array\n if COMPRESS:\n _ = root.create_dataset('compress_len', (len(image_dict.keys()), max_timesteps))\n root['/compress_len'][...] = compressed_len",
+ "type": "code",
+ "location": "/postprocess_episodes.py:144-162"
+ },
+ "447": {
+ "file_id": 25,
+ "content": "This code creates datasets for image data and other variables, based on whether to compress or not. It also creates datasets for qpos, qvel, action, and base_action if they are not None. Additionally, it populates the root dataset with data from the data_dict and creates a 'compress_len' dataset if compression is enabled.",
+ "type": "comment"
+ },
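+ "447a": {
+ "file_id": 25,
+ "content": "For reference, a self-contained sketch of the h5py pattern used above (file name, camera name, and sizes are made up): it creates a chunked uint8 dataset inside an 'observations/images' group and writes an array into it.\n\nimport h5py\nimport numpy as np\n\nmax_timesteps, padded_size = 10, 2048  # illustrative sizes\nwith h5py.File('example_episode.hdf5', 'w') as root:\n    root.attrs['sim'] = False\n    obs = root.create_group('observations')\n    images = obs.create_group('images')\n    cam = images.create_dataset('cam_high', (max_timesteps, padded_size), dtype='uint8', chunks=(1, padded_size))\n    cam[...] = np.zeros((max_timesteps, padded_size), dtype=np.uint8)",
+ "type": "comment"
+ },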
+ "448": {
+ "file_id": 25,
+ "content": " print(f'Saving {dataset_path}: {time.time() - t0:.1f} secs\\n')\n if episode_idx == start_idx:\n save_videos(image_dict, DT, video_path=os.path.join(dataset_dir, dataset_name + '_mirror_video.mp4'))\n # visualize_joints(qpos, action, plot_path=os.path.join(dataset_dir, dataset_name + '_mirror_qpos.png'))\n # visualize_timestamp(t_list, dataset_path) # TODO addn timestamp back\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_dir', action='store', type=str, help='Dataset dir.', required=True)\n parser.add_argument('--num_episodes', action='store', type=int, help='Number of episodes.', required=True)\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/postprocess_episodes.py:164-175"
+ },
+ "449": {
+ "file_id": 25,
+ "content": "The code snippet saves the dataset, prints the time taken for the process, and has options to save videos and visualize joints. The user is required to specify the dataset directory and the number of episodes.",
+ "type": "comment"
+ },
+ "450": {
+ "file_id": 26,
+ "content": "/record_sim_episodes.py",
+ "type": "filepath"
+ },
+ "451": {
+ "file_id": 26,
+ "content": "The code imports libraries, initializes policy and environment, iterates over time steps, takes actions, updates state, determines success, and evaluates simulation episodes, storing data in an HDF5 file for visualization or analysis. It also creates datasets from camera images, qpos, and actions.",
+ "type": "summary"
+ },
+ "452": {
+ "file_id": 26,
+ "content": "import time\nimport os\nimport numpy as np\nimport argparse\nimport matplotlib.pyplot as plt\nimport h5py\nfrom constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN, SIM_TASK_CONFIGS\nfrom ee_sim_env import make_ee_sim_env\nfrom sim_env import make_sim_env, BOX_POSE\nfrom scripted_policy import PickAndTransferPolicy, InsertionPolicy\nimport IPython\ne = IPython.embed\ndef main(args):\n \"\"\"\n Generate demonstration data in simulation.\n First rollout the policy (defined in ee space) in ee_sim_env. Obtain the joint trajectory.\n Replace the gripper joint positions with the commanded joint position.\n Replay this joint trajectory (as action sequence) in sim_env, and record all observations.\n Save this episode of data, and continue to next episode of data collection.\n \"\"\"\n task_name = args['task_name']\n dataset_dir = args['dataset_dir']\n num_episodes = args['num_episodes']\n onscreen_render = args['onscreen_render']\n inject_noise = False\n render_cam_name = 'top'\n if not os.path.isdir(dataset_dir):",
+ "type": "code",
+ "location": "/record_sim_episodes.py:1-33"
+ },
+ "453": {
+ "file_id": 26,
+ "content": "The code imports necessary libraries, defines the main function to generate demonstration data in simulation. It first rolls out policy in ee_sim_env and obtains joint trajectory, then replaces gripper joint positions with commanded positions. Finally, it replay joint trajectory in sim_env and record observations for each episode before saving the dataset.",
+ "type": "comment"
+ },
+ "454": {
+ "file_id": 26,
+ "content": " os.makedirs(dataset_dir, exist_ok=True)\n episode_len = SIM_TASK_CONFIGS[task_name]['episode_len']\n camera_names = SIM_TASK_CONFIGS[task_name]['camera_names']\n if task_name == 'sim_transfer_cube_scripted':\n policy_cls = PickAndTransferPolicy\n elif task_name == 'sim_insertion_scripted':\n policy_cls = InsertionPolicy\n elif task_name == 'sim_transfer_cube_scripted_mirror':\n policy_cls = PickAndTransferPolicy\n else:\n raise NotImplementedError\n success = []\n for episode_idx in range(num_episodes):\n print(f'{episode_idx=}')\n print('Rollout out EE space scripted policy')\n # setup the environment\n env = make_ee_sim_env(task_name)\n ts = env.reset()\n episode = [ts]\n policy = policy_cls(inject_noise)\n # setup plotting\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images'][render_cam_name])\n plt.ion()\n for step in range(episode_len):",
+ "type": "code",
+ "location": "/record_sim_episodes.py:34-61"
+ },
+ "455": {
+ "file_id": 26,
+ "content": "This code snippet is creating a new directory for the dataset, setting up the episode length and camera names based on the task name, and then initializing the policy class depending on the task. It also creates an empty list for success and starts a loop for each episode where it sets up the environment, resets the environment, creates an episode list with the first observation, initializes the policy, and then starts another loop to iterate through steps in each episode.",
+ "type": "comment"
+ },
+ "456": {
+ "file_id": 26,
+ "content": " action = policy(ts)\n ts = env.step(action)\n episode.append(ts)\n if onscreen_render:\n plt_img.set_data(ts.observation['images'][render_cam_name])\n plt.pause(0.002)\n plt.close()\n episode_return = np.sum([ts.reward for ts in episode[1:]])\n episode_max_reward = np.max([ts.reward for ts in episode[1:]])\n if episode_max_reward == env.task.max_reward:\n print(f\"{episode_idx=} Successful, {episode_return=}\")\n else:\n print(f\"{episode_idx=} Failed\")\n joint_traj = [ts.observation['qpos'] for ts in episode]\n # replace gripper pose with gripper control\n gripper_ctrl_traj = [ts.observation['gripper_ctrl'] for ts in episode]\n for joint, ctrl in zip(joint_traj, gripper_ctrl_traj):\n left_ctrl = PUPPET_GRIPPER_POSITION_NORMALIZE_FN(ctrl[0])\n right_ctrl = PUPPET_GRIPPER_POSITION_NORMALIZE_FN(ctrl[2])\n joint[6] = left_ctrl\n joint[6+7] = right_ctrl",
+ "type": "code",
+ "location": "/record_sim_episodes.py:62-84"
+ },
+ "457": {
+ "file_id": 26,
+ "content": "This code is iterating over each time step in the episode, taking actions based on a policy, updating the state, and appending the state to the episode list. It also renders images for each state if the onscreen_render flag is set. It calculates the episode return and maximum reward, then prints whether the episode was successful or not. Finally, it extracts joint and gripper control trajectories from the episode and applies normalization to gripper positions.",
+ "type": "comment"
+ },
+ "458": {
+ "file_id": 26,
+ "content": " subtask_info = episode[0].observation['env_state'].copy() # box pose at step 0\n # clear unused variables\n del env\n del episode\n del policy\n # setup the environment\n print('Replaying joint commands')\n env = make_sim_env(task_name)\n BOX_POSE[0] = subtask_info # make sure the sim_env has the same object configurations as ee_sim_env\n ts = env.reset()\n episode_replay = [ts]\n # setup plotting\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images'][render_cam_name])\n plt.ion()\n for t in range(len(joint_traj)): # note: this will increase episode length by 1\n action = joint_traj[t]\n ts = env.step(action)\n episode_replay.append(ts)\n if onscreen_render:\n plt_img.set_data(ts.observation['images'][render_cam_name])\n plt.pause(0.02)\n episode_return = np.sum([ts.reward for ts in episode_replay[1:]])",
+ "type": "code",
+ "location": "/record_sim_episodes.py:86-113"
+ },
+ "459": {
+ "file_id": 26,
+ "content": "This code is replaying joint commands from a previous episode. It first saves the initial box pose, clears unused variables, sets up the environment, and resets it. Then, for each joint command in the trajectory, it performs an action in the environment and appends the new state to the episode_replay list. If onscreen_render is True, it updates a plot with the current observation image. Finally, it calculates the total reward from the episode and stores it as episode_return.",
+ "type": "comment"
+ },
+ "460": {
+ "file_id": 26,
+ "content": " episode_max_reward = np.max([ts.reward for ts in episode_replay[1:]])\n if episode_max_reward == env.task.max_reward:\n success.append(1)\n print(f\"{episode_idx=} Successful, {episode_return=}\")\n else:\n success.append(0)\n print(f\"{episode_idx=} Failed\")\n plt.close()\n \"\"\"\n For each timestep:\n observations\n - images\n - each_cam_name (480, 640, 3) 'uint8'\n - qpos (14,) 'float64'\n - qvel (14,) 'float64'\n action (14,) 'float64'\n \"\"\"\n data_dict = {\n '/observations/qpos': [],\n '/observations/qvel': [],\n '/action': [],\n }\n for cam_name in camera_names:\n data_dict[f'/observations/images/{cam_name}'] = []\n # because the replaying, there will be eps_len + 1 actions and eps_len + 2 timesteps\n # truncate here to be consistent\n joint_traj = joint_traj[:-1]",
+ "type": "code",
+ "location": "/record_sim_episodes.py:114-145"
+ },
+ "461": {
+ "file_id": 26,
+ "content": "This code measures the success of each episode in a simulation by checking if the maximum reward reached the maximum possible reward. If it did, the episode is considered successful and printed as such; otherwise, it's considered a failure. The code also collects observations and actions into a data dictionary for potential visualization or analysis purposes.",
+ "type": "comment"
+ },
+ "462": {
+ "file_id": 26,
+ "content": " episode_replay = episode_replay[:-1]\n # len(joint_traj) i.e. actions: max_timesteps\n # len(episode_replay) i.e. time steps: max_timesteps + 1\n max_timesteps = len(joint_traj)\n while joint_traj:\n action = joint_traj.pop(0)\n ts = episode_replay.pop(0)\n data_dict['/observations/qpos'].append(ts.observation['qpos'])\n data_dict['/observations/qvel'].append(ts.observation['qvel'])\n data_dict['/action'].append(action)\n for cam_name in camera_names:\n data_dict[f'/observations/images/{cam_name}'].append(ts.observation['images'][cam_name])\n # HDF5\n t0 = time.time()\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_idx}')\n with h5py.File(dataset_path + '.hdf5', 'w', rdcc_nbytes=1024 ** 2 * 2) as root:\n root.attrs['sim'] = True\n obs = root.create_group('observations')\n image = obs.create_group('images')\n for cam_name in camera_names:",
+ "type": "code",
+ "location": "/record_sim_episodes.py:146-167"
+ },
+ "463": {
+ "file_id": 26,
+ "content": "This code segment is part of a function that processes episode data from a simulation and saves it as an HDF5 file. It extracts observations, actions, and camera images from the episode replay and stores them in the dictionary \"data_dict\". After processing all timesteps, it creates an HDF5 file with the episode data, including attributes and groups for observations and images.",
+ "type": "comment"
+ },
+ "464": {
+ "file_id": 26,
+ "content": " _ = image.create_dataset(cam_name, (max_timesteps, 480, 640, 3), dtype='uint8',\n chunks=(1, 480, 640, 3), )\n # compression='gzip',compression_opts=2,)\n # compression=32001, compression_opts=(0, 0, 0, 0, 9, 1, 1), shuffle=False)\n qpos = obs.create_dataset('qpos', (max_timesteps, 14))\n qvel = obs.create_dataset('qvel', (max_timesteps, 14))\n action = root.create_dataset('action', (max_timesteps, 14))\n for name, array in data_dict.items():\n root[name][...] = array\n print(f'Saving: {time.time() - t0:.1f} secs\\n')\n print(f'Saved to {dataset_dir}')\n print(f'Success: {np.sum(success)} / {len(success)}')\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--dataset_dir', action='store', type=str, help='dataset saving dir', required=True)",
+ "type": "code",
+ "location": "/record_sim_episodes.py:168-186"
+ },
+ "465": {
+ "file_id": 26,
+ "content": "This code creates datasets for camera images, qpos, qvel, and actions in a specific order. It then assigns the array values to corresponding names within the root dataset. Finally, it provides statistics on the saving time, saved location, and success rate of the task. The code assumes 'max_timesteps', 'data_dict', 'cam_name' and 'obs' are predefined variables.",
+ "type": "comment"
+ },
+ "466": {
+ "file_id": 26,
+ "content": " parser.add_argument('--num_episodes', action='store', type=int, help='num_episodes', required=False)\n parser.add_argument('--onscreen_render', action='store_true')\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/record_sim_episodes.py:187-190"
+ },
+ "467": {
+ "file_id": 26,
+ "content": "The code above adds command line arguments for the number of episodes and on-screen rendering to a parser. The 'num_episodes' argument is of type int, required=False, and helps specify the number of episodes to run. The 'onscreen_render' argument, when set to true, enables on-screen rendering during game playback. The main function takes the arguments parsed by the parser object to execute the program.",
+ "type": "comment"
+ },
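+ "467a": {
+ "file_id": 26,
+ "content": "A small, standalone illustration (not from the repository) of the vars(parser.parse_args()) pattern used by this script, with made-up argument values:\n\nimport argparse\n\nparser = argparse.ArgumentParser()\nparser.add_argument('--task_name', type=str, required=True)\nparser.add_argument('--num_episodes', type=int, required=False, default=1)\nparser.add_argument('--onscreen_render', action='store_true')\nargs = vars(parser.parse_args(['--task_name', 'sim_transfer_cube_scripted', '--num_episodes', '2']))\nprint(args)  # {'task_name': 'sim_transfer_cube_scripted', 'num_episodes': 2, 'onscreen_render': False}",
+ "type": "comment"
+ },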
+ "468": {
+ "file_id": 27,
+ "content": "/replay_episodes.py",
+ "type": "filepath"
+ },
+ "469": {
+ "file_id": 27,
+ "content": "The code imports libraries and defines a main function to replay an episode from an existing dataset, organizing images into videos. The 'save_videos' function is defined for command line arguments and executed if the script is run directly.",
+ "type": "summary"
+ },
+ "470": {
+ "file_id": 27,
+ "content": "import os\nimport h5py\nimport argparse\nfrom collections import defaultdict \nfrom sim_env import make_sim_env\nfrom utils import sample_box_pose, sample_insertion_pose\nfrom sim_env import BOX_POSE\nfrom constants import DT\nfrom visualize_episodes import save_videos\nimport IPython\ne = IPython.embed\ndef main(args):\n dataset_path = args['dataset_path']\n if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n actions = root['/action'][()]\n env = make_sim_env('sim_transfer_cube')\n BOX_POSE[0] = sample_box_pose() # used in sim reset\n ts = env.reset()\n episode_replay = [ts]\n for action in actions:\n ts = env.step(action)\n episode_replay.append(ts)\n # saving\n image_dict = defaultdict(lambda: [])\n while episode_replay:\n ts = episode_replay.pop(0)\n for cam_name, image in ts.observation['images'].items():\n image_dict[cam_name].append(image)\n video_path = dataset_path.replace('episode_', 'replay_episode_').replace('hdf5', 'mp4')",
+ "type": "code",
+ "location": "/replay_episodes.py:1-41"
+ },
+ "471": {
+ "file_id": 27,
+ "content": "The code imports necessary libraries and defines a main function to replay an episode from an existing dataset. It checks if the dataset file exists, then reads the actions and initializes the simulation environment. It performs the steps of the replayed episode by taking actions in the environment, appending states to the episode_replay list. The code then organizes images from each state into a dictionary for saving. Finally, it creates a video path with the modified name and saves the images as videos in that new path.",
+ "type": "comment"
+ },
+ "472": {
+ "file_id": 27,
+ "content": " save_videos(image_dict, DT, video_path=video_path)\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_path', action='store', type=str, help='Dataset path.', required=True)\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/replay_episodes.py:42-48"
+ },
+ "473": {
+ "file_id": 27,
+ "content": "The code defines a function \"save_videos\" and checks if the script is run directly. It sets up an ArgumentParser for command line arguments, including '--dataset_path'. Then it calls main with the parsed command line arguments.",
+ "type": "comment"
+ },
+ "474": {
+ "file_id": 28,
+ "content": "/scripted_policy.py",
+ "type": "filepath"
+ },
+ "475": {
+ "file_id": 28,
+ "content": "The code introduces a `BasePolicy` class for robotic arm policy, incorporating trajectory generation, updating poses and gripper commands, and executing pre-generated trajectories. It initializes an environment and runs two episodes of actions using PickAndTransferPolicy to test cube transfer simulation scripts.",
+ "type": "summary"
+ },
+ "476": {
+ "file_id": 28,
+ "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom pyquaternion import Quaternion\nfrom constants import SIM_TASK_CONFIGS\nfrom ee_sim_env import make_ee_sim_env\nimport IPython\ne = IPython.embed\nclass BasePolicy:\n def __init__(self, inject_noise=False):\n self.inject_noise = inject_noise\n self.step_count = 0\n self.left_trajectory = None\n self.right_trajectory = None\n def generate_trajectory(self, ts_first):\n raise NotImplementedError\n @staticmethod\n def interpolate(curr_waypoint, next_waypoint, t):\n t_frac = (t - curr_waypoint[\"t\"]) / (next_waypoint[\"t\"] - curr_waypoint[\"t\"])\n curr_xyz = curr_waypoint['xyz']\n curr_quat = curr_waypoint['quat']\n curr_grip = curr_waypoint['gripper']\n next_xyz = next_waypoint['xyz']\n next_quat = next_waypoint['quat']\n next_grip = next_waypoint['gripper']\n xyz = curr_xyz + (next_xyz - curr_xyz) * t_frac\n quat = curr_quat + (next_quat - curr_quat) * t_frac\n gripper = curr_grip + (next_grip - curr_grip) * t_frac",
+ "type": "code",
+ "location": "/scripted_policy.py:1-33"
+ },
+ "477": {
+ "file_id": 28,
+ "content": "The code defines a `BasePolicy` class for a robotic arm policy with methods to generate and interpolate trajectories. It imports necessary libraries, handles injecting noise, and includes utility functions.",
+ "type": "comment"
+ },
+ "478": {
+ "file_id": 28,
+ "content": " return xyz, quat, gripper\n def __call__(self, ts):\n # generate trajectory at first timestep, then open-loop execution\n if self.step_count == 0:\n self.generate_trajectory(ts)\n # obtain left and right waypoints\n if self.left_trajectory[0]['t'] == self.step_count:\n self.curr_left_waypoint = self.left_trajectory.pop(0)\n next_left_waypoint = self.left_trajectory[0]\n if self.right_trajectory[0]['t'] == self.step_count:\n self.curr_right_waypoint = self.right_trajectory.pop(0)\n next_right_waypoint = self.right_trajectory[0]\n # interpolate between waypoints to obtain current pose and gripper command\n left_xyz, left_quat, left_gripper = self.interpolate(self.curr_left_waypoint, next_left_waypoint, self.step_count)\n right_xyz, right_quat, right_gripper = self.interpolate(self.curr_right_waypoint, next_right_waypoint, self.step_count)\n # Inject noise\n if self.inject_noise:\n scale = 0.01",
+ "type": "code",
+ "location": "/scripted_policy.py:34-56"
+ },
+ "479": {
+ "file_id": 28,
+ "content": "This code is responsible for executing a pre-generated trajectory by interpolating between waypoints, obtaining the current pose and gripper command for both left and right sides. It also allows injecting noise if enabled. The function is called at each timestep to update the pose and gripper commands.",
+ "type": "comment"
+ },
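+ "479a": {
+ "file_id": 28,
+ "content": "A standalone sketch of the linear waypoint interpolation described above, with two made-up waypoints in the same dict format (position, quaternion components, and gripper value are all blended linearly):\n\nimport numpy as np\n\ncurr = {'t': 0, 'xyz': np.array([0.0, 0.0, 0.0]), 'quat': np.array([1.0, 0.0, 0.0, 0.0]), 'gripper': 1.0}\nnxt = {'t': 10, 'xyz': np.array([0.1, 0.0, 0.2]), 'quat': np.array([1.0, 0.0, 0.0, 0.0]), 'gripper': 0.0}\n\nt = 4\nfrac = (t - curr['t']) / (nxt['t'] - curr['t'])\nxyz = curr['xyz'] + (nxt['xyz'] - curr['xyz']) * frac\ngripper = curr['gripper'] + (nxt['gripper'] - curr['gripper']) * frac\nprint(xyz, gripper)  # 40% of the way between the two waypoints",
+ "type": "comment"
+ },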
+ "480": {
+ "file_id": 28,
+ "content": " left_xyz = left_xyz + np.random.uniform(-scale, scale, left_xyz.shape)\n right_xyz = right_xyz + np.random.uniform(-scale, scale, right_xyz.shape)\n action_left = np.concatenate([left_xyz, left_quat, [left_gripper]])\n action_right = np.concatenate([right_xyz, right_quat, [right_gripper]])\n self.step_count += 1\n return np.concatenate([action_left, action_right])\nclass PickAndTransferPolicy(BasePolicy):\n def generate_trajectory(self, ts_first):\n init_mocap_pose_right = ts_first.observation['mocap_pose_right']\n init_mocap_pose_left = ts_first.observation['mocap_pose_left']\n box_info = np.array(ts_first.observation['env_state'])\n box_xyz = box_info[:3]\n box_quat = box_info[3:]\n # print(f\"Generate trajectory for {box_xyz=}\")\n gripper_pick_quat = Quaternion(init_mocap_pose_right[3:])\n gripper_pick_quat = gripper_pick_quat * Quaternion(axis=[0.0, 1.0, 0.0], degrees=-60)\n meet_left_quat = Quaternion(axis=[1.0, 0.0, 0.0], degrees=90)",
+ "type": "code",
+ "location": "/scripted_policy.py:57-81"
+ },
+ "481": {
+ "file_id": 28,
+ "content": "The code snippet is part of a PickAndTransferPolicy class. It generates a trajectory for picking up an object and transferring it from one robot arm to another. The code adds random uniform noise to the action coordinates, concatenates the actions with quaternions and gripper states, increments the step count, and returns the combined action for both arms. The method also initializes variables based on the first time step observation, including the initial mocap poses of both robot arms and box information (XYZ and quaternion).",
+ "type": "comment"
+ },
+ "482": {
+ "file_id": 28,
+ "content": " meet_xyz = np.array([0, 0.5, 0.25])\n self.left_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_left[:3], \"quat\": init_mocap_pose_left[3:], \"gripper\": 0}, # sleep\n {\"t\": 100, \"xyz\": meet_xyz + np.array([-0.1, 0, -0.02]), \"quat\": meet_left_quat.elements, \"gripper\": 1}, # approach meet position\n {\"t\": 260, \"xyz\": meet_xyz + np.array([0.02, 0, -0.02]), \"quat\": meet_left_quat.elements, \"gripper\": 1}, # move to meet position\n {\"t\": 310, \"xyz\": meet_xyz + np.array([0.02, 0, -0.02]), \"quat\": meet_left_quat.elements, \"gripper\": 0}, # close gripper\n {\"t\": 360, \"xyz\": meet_xyz + np.array([-0.1, 0, -0.02]), \"quat\": np.array([1, 0, 0, 0]), \"gripper\": 0}, # move left\n {\"t\": 400, \"xyz\": meet_xyz + np.array([-0.1, 0, -0.02]), \"quat\": np.array([1, 0, 0, 0]), \"gripper\": 0}, # stay\n ]\n self.right_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_right[:3], \"quat\": init_mocap_pose_right[3:], \"gripper\": 0}, # sleep",
+ "type": "code",
+ "location": "/scripted_policy.py:83-95"
+ },
+ "483": {
+ "file_id": 28,
+ "content": "Code defines trajectory for left and right robot arms. Left arm starts by sleeping, then approaches and moves to meet position, closes gripper, moves left, and stays at final position. Right arm also sleeps, follows similar steps as left arm. All movements are time-based.",
+ "type": "comment"
+ },
+ "484": {
+ "file_id": 28,
+ "content": " {\"t\": 90, \"xyz\": box_xyz + np.array([0, 0, 0.08]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # approach the cube\n {\"t\": 130, \"xyz\": box_xyz + np.array([0, 0, -0.015]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # go down\n {\"t\": 170, \"xyz\": box_xyz + np.array([0, 0, -0.015]), \"quat\": gripper_pick_quat.elements, \"gripper\": 0}, # close gripper\n {\"t\": 200, \"xyz\": meet_xyz + np.array([0.05, 0, 0]), \"quat\": gripper_pick_quat.elements, \"gripper\": 0}, # approach meet position\n {\"t\": 220, \"xyz\": meet_xyz, \"quat\": gripper_pick_quat.elements, \"gripper\": 0}, # move to meet position\n {\"t\": 310, \"xyz\": meet_xyz, \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # open gripper\n {\"t\": 360, \"xyz\": meet_xyz + np.array([0.1, 0, 0]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # move to right\n {\"t\": 400, \"xyz\": meet_xyz + np.array([0.1, 0, 0]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # stay",
+ "type": "code",
+ "location": "/scripted_policy.py:96-103"
+ },
+ "485": {
+ "file_id": 28,
+ "content": "This code represents a sequence of actions for a robot gripper. It begins by approaching and gripping the cube, then moving downwards, closing the gripper at a certain position, moving to a meet position, opening the gripper, and finally moving right and staying in that position. The actions are time-based with specific positions and gripper states.",
+ "type": "comment"
+ },
+ "486": {
+ "file_id": 28,
+ "content": " ]\nclass InsertionPolicy(BasePolicy):\n def generate_trajectory(self, ts_first):\n init_mocap_pose_right = ts_first.observation['mocap_pose_right']\n init_mocap_pose_left = ts_first.observation['mocap_pose_left']\n peg_info = np.array(ts_first.observation['env_state'])[:7]\n peg_xyz = peg_info[:3]\n peg_quat = peg_info[3:]\n socket_info = np.array(ts_first.observation['env_state'])[7:]\n socket_xyz = socket_info[:3]\n socket_quat = socket_info[3:]\n gripper_pick_quat_right = Quaternion(init_mocap_pose_right[3:])\n gripper_pick_quat_right = gripper_pick_quat_right * Quaternion(axis=[0.0, 1.0, 0.0], degrees=-60)\n gripper_pick_quat_left = Quaternion(init_mocap_pose_right[3:])\n gripper_pick_quat_left = gripper_pick_quat_left * Quaternion(axis=[0.0, 1.0, 0.0], degrees=60)\n meet_xyz = np.array([0, 0.5, 0.15])\n lift_right = 0.00715\n self.left_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_left[:3], \"quat\": init_mocap_pose_left[3:], \"gripper\": 0}, # sleep",
+ "type": "code",
+ "location": "/scripted_policy.py:104-131"
+ },
+ "487": {
+ "file_id": 28,
+ "content": "This code initializes variables for the InsertionPolicy class's generate_trajectory method. It extracts information from the observation and calculates gripper quaternions for both hands, defining their starting positions and orientation. The meet_xyz variable represents a specific target position, while lift_right is an arbitrary value. The left_trajectory list is initialized with the first point as the initial mocap pose of the left hand in sleep mode.",
+ "type": "comment"
+ },
+ "488": {
+ "file_id": 28,
+ "content": " {\"t\": 120, \"xyz\": socket_xyz + np.array([0, 0, 0.08]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 1}, # approach the cube\n {\"t\": 170, \"xyz\": socket_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 1}, # go down\n {\"t\": 220, \"xyz\": socket_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 0}, # close gripper\n {\"t\": 285, \"xyz\": meet_xyz + np.array([-0.1, 0, 0]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 0}, # approach meet position\n {\"t\": 340, \"xyz\": meet_xyz + np.array([-0.05, 0, 0]), \"quat\": gripper_pick_quat_left.elements,\"gripper\": 0}, # insertion\n {\"t\": 400, \"xyz\": meet_xyz + np.array([-0.05, 0, 0]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 0}, # insertion\n ]\n self.right_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_right[:3], \"quat\": init_mocap_pose_right[3:], \"gripper\": 0}, # sleep\n {\"t\": 12",
+ "type": "code",
+ "location": "/scripted_policy.py:132-142"
+ },
+ "489": {
+ "file_id": 28,
+ "content": "This code defines a list of trajectory points for left and right arms, specifying their xyz coordinates, orientation quaternion, and gripper state at each time step. It follows a sequence of actions such as approaching the cube, going down, closing the gripper, and reaching insertion positions.",
+ "type": "comment"
+ },
+ "490": {
+ "file_id": 28,
+ "content": "0, \"xyz\": peg_xyz + np.array([0, 0, 0.08]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 1}, # approach the cube\n {\"t\": 170, \"xyz\": peg_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 1}, # go down\n {\"t\": 220, \"xyz\": peg_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # close gripper\n {\"t\": 285, \"xyz\": meet_xyz + np.array([0.1, 0, lift_right]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # approach meet position\n {\"t\": 340, \"xyz\": meet_xyz + np.array([0.05, 0, lift_right]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # insertion\n {\"t\": 400, \"xyz\": meet_xyz + np.array([0.05, 0, lift_right]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # insertion\n ]\ndef test_policy(task_name):\n # example rolling out pick_and_transfer policy\n onscreen_render = True\n inject_noise = False\n # setup the environment\n episode_len = SIM_TASK_CONFIGS[task_name]['episode_len']",
+ "type": "code",
+ "location": "/scripted_policy.py:142-158"
+ },
+ "491": {
+ "file_id": 28,
+ "content": "This code defines a policy for picking up and transferring an object, with specific timings and positions. The policy is applied within the `test_policy` function, which also sets up the environment and allows for onscreen rendering and noise injection.",
+ "type": "comment"
+ },
+ "492": {
+ "file_id": 28,
+ "content": " if 'sim_transfer_cube' in task_name:\n env = make_ee_sim_env('sim_transfer_cube')\n elif 'sim_insertion' in task_name:\n env = make_ee_sim_env('sim_insertion')\n else:\n raise NotImplementedError\n for episode_idx in range(2):\n ts = env.reset()\n episode = [ts]\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images']['angle'])\n plt.ion()\n policy = PickAndTransferPolicy(inject_noise)\n for step in range(episode_len):\n action = policy(ts)\n ts = env.step(action)\n episode.append(ts)\n if onscreen_render:\n plt_img.set_data(ts.observation['images']['angle'])\n plt.pause(0.02)\n plt.close()\n episode_return = np.sum([ts.reward for ts in episode[1:]])\n if episode_return > 0:\n print(f\"{episode_idx=} Successful, {episode_return=}\")\n else:\n print(f\"{episode_idx=} Failed\")\nif __name__ == '__main__':",
+ "type": "code",
+ "location": "/scripted_policy.py:159-191"
+ },
+ "493": {
+ "file_id": 28,
+ "content": "The code initializes an environment (env) depending on the task_name, and then executes two episodes of actions. For each episode, it resets the environment, performs actions based on a PickAndTransferPolicy, and updates the state. If onscreen_render is True, it renders the state using matplotlib. It calculates the episode return and prints whether the episode was successful or not based on the return value. The code is called as a main function.",
+ "type": "comment"
+ },
+ "494": {
+ "file_id": 28,
+ "content": " test_task_name = 'sim_transfer_cube_scripted'\n test_policy(test_task_name)",
+ "type": "code",
+ "location": "/scripted_policy.py:192-193"
+ },
+ "495": {
+ "file_id": 28,
+ "content": "The code is calling a test_policy function with the task name \"sim_transfer_cube_scripted\". This suggests it's testing a simulation script for transferring a cube.",
+ "type": "comment"
+ },
+ "496": {
+ "file_id": 29,
+ "content": "/setup.py",
+ "type": "filepath"
+ },
+ "497": {
+ "file_id": 29,
+ "content": "This code is a Python setup script that utilizes the distutils and setuptools packages to create a distribution package for the 'act' software. It specifies the name, version, packages, license, and long_description of the software.",
+ "type": "summary"
+ },
+ "498": {
+ "file_id": 29,
+ "content": "from distutils.core import setup\nfrom setuptools import find_packages\nsetup(\n name='act',\n version='0.0.0',\n packages=find_packages(),\n license='MIT License',\n long_description=open('README.md').read(),\n)",
+ "type": "code",
+ "location": "/setup.py:1-10"
+ },
+ "499": {
+ "file_id": 29,
+ "content": "This code is a Python setup script that utilizes the distutils and setuptools packages to create a distribution package for the 'act' software. It specifies the name, version, packages, license, and long_description of the software.",
+ "type": "comment"
+ }
+}
\ No newline at end of file
diff --git a/docs/data/5.json b/docs/data/5.json
new file mode 100644
index 00000000..f141021d
--- /dev/null
+++ b/docs/data/5.json
@@ -0,0 +1,549 @@
+{
+ "500": {
+ "file_id": 30,
+ "content": "/sim_env.py",
+ "type": "filepath"
+ },
+ "501": {
+ "file_id": 30,
+ "content": "This code defines a bi-manual manipulation environment, sets up action and observation spaces for cube transfer tasks, initializes physics simulation, and enables interactive plotting. It determines rewards based on contact and gripper positions.",
+ "type": "summary"
+ },
+ "502": {
+ "file_id": 30,
+ "content": "import numpy as np\nimport os\nimport collections\nimport matplotlib.pyplot as plt\nfrom dm_control import mujoco\nfrom dm_control.rl import control\nfrom dm_control.suite import base\nfrom constants import DT, XML_DIR, START_ARM_POSE\nfrom constants import PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN\nfrom constants import MASTER_GRIPPER_POSITION_NORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN\nimport IPython\ne = IPython.embed\nBOX_POSE = [None] # to be changed from outside\ndef make_sim_env(task_name):\n \"\"\"\n Environment for simulated robot bi-manual manipulation, with joint position control\n Action space: [left_arm_qpos (6), # absolute joint position\n left_gripper_positions (1), # normalized gripper position (0: close, 1: open)\n right_arm_qpos (6), # absolute joint position\n right_gripper_positions (1),] # normalized gripper position (0: close, 1: open)",
+ "type": "code",
+ "location": "/sim_env.py:1-26"
+ },
+ "503": {
+ "file_id": 30,
+ "content": "The code imports necessary libraries and defines a function called make_sim_env, which creates a simulation environment for robot bi-manual manipulation with joint position control. The action space consists of left arm joint positions, left gripper position (normalized), right arm joint positions, and right gripper position (normalized).",
+ "type": "comment"
+ },
+ "504": {
+ "file_id": 30,
+ "content": " Observation space: {\"qpos\": Concat[ left_arm_qpos (6), # absolute joint position\n left_gripper_position (1), # normalized gripper position (0: close, 1: open)\n right_arm_qpos (6), # absolute joint position\n right_gripper_qpos (1)] # normalized gripper position (0: close, 1: open)\n \"qvel\": Concat[ left_arm_qvel (6), # absolute joint velocity (rad)\n left_gripper_velocity (1), # normalized gripper velocity (pos: opening, neg: closing)\n right_arm_qvel (6), # absolute joint velocity (rad)\n right_gripper_qvel (1)] # normalized gripper velocity (pos: opening, neg: closing)\n \"images\": {\"main\": (480x640x3)} # h, w, c, dtype='uint8'\n \"\"\"\n if 'sim_transfer_cube' in task_name:",
+ "type": "code",
+ "location": "/sim_env.py:28-38"
+ },
+ "505": {
+ "file_id": 30,
+ "content": "This code defines the observation space for a simulation environment, including joint positions and velocities of both arms and gripper states, along with image input from a camera. It is specific to tasks involving transferring cubes.",
+ "type": "comment"
+ },
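+ "505a": {
+ "file_id": 30,
+ "content": "To make the documented observation layout concrete, a hypothetical observation with the shapes listed above (all values are placeholders):\n\nimport collections\nimport numpy as np\n\nobs = collections.OrderedDict()\nobs['qpos'] = np.zeros(14)  # 6 left arm joints + 1 left gripper + 6 right arm joints + 1 right gripper\nobs['qvel'] = np.zeros(14)  # same layout, velocities\nobs['images'] = {'main': np.zeros((480, 640, 3), dtype=np.uint8)}\nprint(list(obs.keys()), obs['images']['main'].shape)",
+ "type": "comment"
+ },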
+ "506": {
+ "file_id": 30,
+ "content": " xml_path = os.path.join(XML_DIR, f'bimanual_viperx_transfer_cube.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = TransferCubeTask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n elif 'sim_insertion' in task_name:\n xml_path = os.path.join(XML_DIR, f'bimanual_viperx_insertion.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = InsertionTask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n else:\n raise NotImplementedError\n return env\nclass BimanualViperXTask(base.Task):\n def __init__(self, random=None):\n super().__init__(random=random)\n def before_step(self, action, physics):\n left_arm_action = action[:6]\n right_arm_action = action[7:7+6]\n normalized_left_gripper_action = action[6]",
+ "type": "code",
+ "location": "/sim_env.py:39-61"
+ },
+ "507": {
+ "file_id": 30,
+ "content": "The code sets up a bimanual ViperX environment with either a cube transfer or an insertion task. It first defines the XML path for the environment, then initializes physics from the path and a specific task (TransferCubeTask or InsertionTask) depending on the task_name. The Environment class is instantiated with these parameters, including time limit and control timestep. Finally, it returns the environment and initializes BimanualViperXTask which extends base.Task.",
+ "type": "comment"
+ },
+ "508": {
+ "file_id": 30,
+ "content": " normalized_right_gripper_action = action[7+6]\n left_gripper_action = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(normalized_left_gripper_action)\n right_gripper_action = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(normalized_right_gripper_action)\n full_left_gripper_action = [left_gripper_action, -left_gripper_action]\n full_right_gripper_action = [right_gripper_action, -right_gripper_action]\n env_action = np.concatenate([left_arm_action, full_left_gripper_action, right_arm_action, full_right_gripper_action])\n super().before_step(env_action, physics)\n return\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n super().initialize_episode(physics)\n @staticmethod\n def get_qpos(physics):\n qpos_raw = physics.data.qpos.copy()\n left_qpos_raw = qpos_raw[:8]\n right_qpos_raw = qpos_raw[8:16]\n left_arm_qpos = left_qpos_raw[:6]\n right_arm_qpos = right_qpos_raw[:6]",
+ "type": "code",
+ "location": "/sim_env.py:62-84"
+ },
+ "509": {
+ "file_id": 30,
+ "content": "This code initializes the environment for each episode and before each step, performing actions on a puppet using gripper positions that are first normalized then unnormalized. The actions involve left and right arm movements as well as full gripper actions. It also retrieves the state of the environment using qpos from physics data.",
+ "type": "comment"
+ },
+ "510": {
+ "file_id": 30,
+ "content": " left_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(left_qpos_raw[6])]\n right_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(right_qpos_raw[6])]\n return np.concatenate([left_arm_qpos, left_gripper_qpos, right_arm_qpos, right_gripper_qpos])\n @staticmethod\n def get_qvel(physics):\n qvel_raw = physics.data.qvel.copy()\n left_qvel_raw = qvel_raw[:8]\n right_qvel_raw = qvel_raw[8:16]\n left_arm_qvel = left_qvel_raw[:6]\n right_arm_qvel = right_qvel_raw[:6]\n left_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(left_qvel_raw[6])]\n right_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(right_qvel_raw[6])]\n return np.concatenate([left_arm_qvel, left_gripper_qvel, right_arm_qvel, right_gripper_qvel])\n @staticmethod\n def get_env_state(physics):\n raise NotImplementedError\n def get_observation(self, physics):\n obs = collections.OrderedDict()\n obs['qpos'] = self.get_qpos(physics)\n obs['qvel'] = self.get_qvel(physics)",
+ "type": "code",
+ "location": "/sim_env.py:85-107"
+ },
+ "511": {
+ "file_id": 30,
+ "content": "This code defines two methods, `get_qpos` and `get_qvel`, which extract the joint positions and velocities from physics data. The left and right gripper positions and velocities are normalized using `PUPPET_GRIPPER_POSITION_NORMALIZE_FN` and `PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN`. These values are then concatenated into observation arrays 'qpos' and 'qvel', which will be used for the environment state. The `get_env_state` method is not implemented yet, while `get_observation` combines the qpos and qvel observations in an ordered dictionary.",
+ "type": "comment"
+ },
+ "512": {
+ "file_id": 30,
+ "content": " obs['env_state'] = self.get_env_state(physics)\n obs['images'] = dict()\n obs['images']['top'] = physics.render(height=480, width=640, camera_id='top')\n obs['images']['left_wrist'] = physics.render(height=480, width=640, camera_id='left_wrist')\n obs['images']['right_wrist'] = physics.render(height=480, width=640, camera_id='right_wrist')\n # obs['images']['angle'] = physics.render(height=480, width=640, camera_id='angle')\n # obs['images']['vis'] = physics.render(height=480, width=640, camera_id='front_close')\n return obs\n def get_reward(self, physics):\n # return whether left gripper is holding the box\n raise NotImplementedError\nclass TransferCubeTask(BimanualViperXTask):\n def __init__(self, random=None):\n super().__init__(random=random)\n self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n # TODO Notice: this function does not randomize the env configuration. Instead, set BOX_POSE from outside",
+ "type": "code",
+ "location": "/sim_env.py:108-130"
+ },
+ "513": {
+ "file_id": 30,
+ "content": "This code is defining a class `SimEnv` which returns the observation and reward in a bimanual task. It also includes methods for getting the environment state and calculating rewards based on left gripper holding the box. The `TransferCubeTask` inherits from `BimanualViperXTask` and initializes the environment at the start of each episode, with a maximum reward set to 4.",
+ "type": "comment"
+ },
+ "514": {
+ "file_id": 30,
+ "content": " # reset qpos, control and box position\n with physics.reset_context():\n physics.named.data.qpos[:16] = START_ARM_POSE\n np.copyto(physics.data.ctrl, START_ARM_POSE)\n assert BOX_POSE[0] is not None\n physics.named.data.qpos[-7:] = BOX_POSE[0]\n # print(f\"{BOX_POSE=}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether left gripper is holding the box\n all_contact_pairs = []\n for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')\n contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)",
+ "type": "code",
+ "location": "/sim_env.py:131-154"
+ },
+ "515": {
+ "file_id": 30,
+ "content": "Code resets the arm and box positions, gets the environment state by copying qpos values from 16th index onwards, and calculates the reward based on gripper contact with the box.",
+ "type": "comment"
+ },
+ "516": {
+ "file_id": 30,
+ "content": " touch_left_gripper = (\"red_box\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n touch_right_gripper = (\"red_box\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs\n touch_table = (\"red_box\", \"table\") in all_contact_pairs\n reward = 0\n if touch_right_gripper:\n reward = 1\n if touch_right_gripper and not touch_table: # lifted\n reward = 2\n if touch_left_gripper: # attempted transfer\n reward = 3\n if touch_left_gripper and not touch_table: # successful transfer\n reward = 4\n return reward\nclass InsertionTask(BimanualViperXTask):\n def __init__(self, random=None):\n super().__init__(random=random)\n self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n # TODO Notice: this function does not randomize the env configuration. Instead, set BOX_POSE from outside\n # reset qpos, control and box position",
+ "type": "code",
+ "location": "/sim_env.py:156-180"
+ },
+ "517": {
+ "file_id": 30,
+ "content": "This code snippet checks for the contact between different objects and assigns a reward based on those contacts. The 'InsertionTask' class initializes an episode by resetting the qpos, control, and box position. However, it currently does not randomize the environment configuration.",
+ "type": "comment"
+ },
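+ "517a": {
+ "file_id": 30,
+ "content": "A pure-Python sketch of the staged transfer reward shown above, using a hypothetical contact list in place of MuJoCo's contact buffer:\n\nall_contact_pairs = [('red_box', 'vx300s_right/10_right_gripper_finger')]\n\ntouch_right = ('red_box', 'vx300s_right/10_right_gripper_finger') in all_contact_pairs\ntouch_left = ('red_box', 'vx300s_left/10_left_gripper_finger') in all_contact_pairs\ntouch_table = ('red_box', 'table') in all_contact_pairs\n\nreward = 0\nif touch_right:\n    reward = 1\nif touch_right and not touch_table:  # lifted\n    reward = 2\nif touch_left:  # attempted transfer\n    reward = 3\nif touch_left and not touch_table:  # successful transfer\n    reward = 4\nprint(reward)  # 2: grasped by the right gripper and off the table",
+ "type": "comment"
+ },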
+ "518": {
+ "file_id": 30,
+ "content": " with physics.reset_context():\n physics.named.data.qpos[:16] = START_ARM_POSE\n np.copyto(physics.data.ctrl, START_ARM_POSE)\n assert BOX_POSE[0] is not None\n physics.named.data.qpos[-7*2:] = BOX_POSE[0] # two objects\n # print(f\"{BOX_POSE=}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether peg touches the pin\n all_contact_pairs = []\n for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')\n contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)\n touch_right_gripper = (\"red_peg\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs",
+ "type": "code",
+ "location": "/sim_env.py:181-205"
+ },
+ "519": {
+ "file_id": 30,
+ "content": "This code is part of a physics simulation environment setup. It initializes the episode by setting up the arm and box positions, and then defines methods to get the environment state and reward based on contact between objects in the simulation.",
+ "type": "comment"
+ },
+ "520": {
+ "file_id": 30,
+ "content": " touch_left_gripper = (\"socket-1\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-2\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-3\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-4\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n peg_touch_table = (\"red_peg\", \"table\") in all_contact_pairs\n socket_touch_table = (\"socket-1\", \"table\") in all_contact_pairs or \\\n (\"socket-2\", \"table\") in all_contact_pairs or \\\n (\"socket-3\", \"table\") in all_contact_pairs or \\\n (\"socket-4\", \"table\") in all_contact_pairs\n peg_touch_socket = (\"red_peg\", \"socket-1\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-2\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-3\") in all_contact_pairs or \\",
+ "type": "code",
+ "location": "/sim_env.py:206-218"
+ },
+ "521": {
+ "file_id": 30,
+ "content": "This code checks if the left gripper finger of vx300s_left is in contact with any of the four sockets, and also verifies if the red peg is touching the table, a socket, or itself. The purpose seems to be detecting specific object interactions in a simulated environment.",
+ "type": "comment"
+ },
+ "522": {
+ "file_id": 30,
+ "content": " (\"red_peg\", \"socket-4\") in all_contact_pairs\n pin_touched = (\"red_peg\", \"pin\") in all_contact_pairs\n reward = 0\n if touch_left_gripper and touch_right_gripper: # touch both\n reward = 1\n if touch_left_gripper and touch_right_gripper and (not peg_touch_table) and (not socket_touch_table): # grasp both\n reward = 2\n if peg_touch_socket and (not peg_touch_table) and (not socket_touch_table): # peg and socket touching\n reward = 3\n if pin_touched: # successful insertion\n reward = 4\n return reward\ndef get_action(master_bot_left, master_bot_right):\n action = np.zeros(14)\n # arm action\n action[:6] = master_bot_left.dxl.joint_states.position[:6]\n action[7:7+6] = master_bot_right.dxl.joint_states.position[:6]\n # gripper action\n left_gripper_pos = master_bot_left.dxl.joint_states.position[7]\n right_gripper_pos = master_bot_right.dxl.joint_states.position[7]\n normalized_left_pos = MASTER_GRIPPER_POSITION_NORMALIZE_FN(left_gripper_pos)",
+ "type": "code",
+ "location": "/sim_env.py:219-242"
+ },
+ "523": {
+ "file_id": 30,
+ "content": "The code is defining a function to determine rewards based on contact between different objects and checking gripper positions. It also includes a function for generating action sequences, setting arm joint positions, and normalizing left gripper position.",
+ "type": "comment"
+ },
+ "524": {
+ "file_id": 30,
+ "content": " normalized_right_pos = MASTER_GRIPPER_POSITION_NORMALIZE_FN(right_gripper_pos)\n action[6] = normalized_left_pos\n action[7+6] = normalized_right_pos\n return action\ndef test_sim_teleop():\n \"\"\" Testing teleoperation in sim with ALOHA. Requires hardware and ALOHA repo to work. \"\"\"\n from interbotix_xs_modules.arm import InterbotixManipulatorXS\n BOX_POSE[0] = [0.2, 0.5, 0.05, 1, 0, 0, 0]\n # source of data\n master_bot_left = InterbotixManipulatorXS(robot_model=\"wx250s\", group_name=\"arm\", gripper_name=\"gripper\",\n robot_name=f'master_left', init_node=True)\n master_bot_right = InterbotixManipulatorXS(robot_model=\"wx250s\", group_name=\"arm\", gripper_name=\"gripper\",\n robot_name=f'master_right', init_node=False)\n # setup the environment\n env = make_sim_env('sim_transfer_cube')\n ts = env.reset()\n episode = [ts]\n # setup plotting\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images']['angle'])",
+ "type": "code",
+ "location": "/sim_env.py:243-266"
+ },
+ "525": {
+ "file_id": 30,
+ "content": "This code sets up a teleoperation test in the simulation environment using ALOHA and InterbotixManipulatorXS for left and right arms. It initializes the environment, resets it, and starts an episode by adding the first timestep to the episode list. It also sets up plotting for visualizing the simulation's observation images.",
+ "type": "comment"
+ },
+ "526": {
+ "file_id": 30,
+ "content": " plt.ion()\n for t in range(1000):\n action = get_action(master_bot_left, master_bot_right)\n ts = env.step(action)\n episode.append(ts)\n plt_img.set_data(ts.observation['images']['angle'])\n plt.pause(0.02)\nif __name__ == '__main__':\n test_sim_teleop()",
+ "type": "code",
+ "location": "/sim_env.py:267-279"
+ },
+ "527": {
+ "file_id": 30,
+ "content": "This code enables interactive plotting of the simulation environment's observations and takes input actions for a specific number of time steps. It utilizes matplotlib's `ion()` function to enable interactive plotting, and then iterates through 1000 time steps, getting actions from `get_action` and updating the plot using `plt_img.set_data`. The `test_sim_teleop()` function is called when the script is run directly.",
+ "type": "comment"
+ },
+ "528": {
+ "file_id": 31,
+ "content": "/train_actuator_network.py",
+ "type": "filepath"
+ },
+ "529": {
+ "file_id": 31,
+ "content": "The code trains a neural network, visualizes predictions, logs progress, handles exceptions, performs forward/backward passes, updates policy state, saves checkpoints, and sets up data loaders for validation. It plots and saves commanded, observed, and predicted angular speeds for an actuator network, initializes a transformer-based prediction network, calculates MSE loss, normalizes data, and trains the actuator network if necessary.",
+ "type": "summary"
+ },
+ "530": {
+ "file_id": 31,
+ "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom torch.utils.data import DataLoader\nimport os\nimport h5py\nimport math\nimport wandb\nimport pickle\nimport matplotlib.pyplot as plt\nfrom copy import deepcopy\nfrom tqdm import tqdm\nfrom utils import find_all_hdf5\nfrom imitate_episodes import repeater, compute_dict_mean\nimport IPython\ne = IPython.embed\ndef main():\n ### Idea\n # input : o o o o o o # observed speed \n # target: a a a a a a # commanded speed\n # at test time, input desired speed profile and convert that to command\n #########################################################\n history_len = 50\n future_len = 50\n prediction_len = 50\n batch_size_train = 16\n batch_size_val = 16\n lr = 1e-4\n weight_decay = 1e-4\n num_steps = 10000\n validate_every = 2000\n save_every = 2000\n expr_name = f'actuator_network_test_{history_len}_{future_len}_{prediction_len}'\n ckpt_dir = f'/scr/tonyzhao/train_logs/{expr_name}' if os.getlogin() == 'tonyzhao' else f'./ckpts/{expr_name}'",
+ "type": "code",
+ "location": "/train_actuator_network.py:2-41"
+ },
+ "531": {
+ "file_id": 31,
+ "content": "This code is importing necessary libraries and defining parameters for training an actuator network. The actuator network takes in observed speed inputs and converts them into desired commanded speeds at test time. It will train the network using specified batch sizes, learning rate, weight decay, number of steps, and save checkpoints periodically.",
+ "type": "comment"
+ },
+ "532": {
+ "file_id": 31,
+ "content": " dataset_dir = '/scr/tonyzhao/compressed_datasets/aloha_mobile_fork/' if os.getlogin() == 'tonyzhao' else '/home/zfu/data/aloha_mobile_fork/'\n #########################################################\n assert(history_len + future_len >= prediction_len)\n assert(future_len % prediction_len == 0)\n wandb.init(project=\"mobile-aloha2\", reinit=True, entity=\"mobile-aloha2\", name=expr_name) # mode='disabled', \n if not os.path.isdir(ckpt_dir):\n os.makedirs(ckpt_dir)\n dataset_path_list = find_all_hdf5(dataset_dir, skip_mirrored_data=True)\n dataset_path_list = [n for n in dataset_path_list if 'replayed' in n]\n num_episodes = len(dataset_path_list)\n # obtain train test split\n train_ratio = 0.9\n shuffled_episode_ids = np.random.permutation(num_episodes)\n train_episode_ids = shuffled_episode_ids[:int(train_ratio * num_episodes)]\n val_episode_ids = shuffled_episode_ids[int(train_ratio * num_episodes):]\n print(f'\\n\\nData from: {dataset_dir}\\n- Train on {len(train_episode_ids)} episodes\\n- Test on {len(val_episode_ids)} episodes\\n\\n')",
+ "type": "code",
+ "location": "/train_actuator_network.py:42-61"
+ },
+ "533": {
+ "file_id": 31,
+ "content": "Code initializes variables, asserts conditions, initializes a wandb project, checks if a directory exists, finds HDF5 files in the dataset directory, calculates train and validation split, and prints information about the data source.",
+ "type": "comment"
+ },
+ "534": {
+ "file_id": 31,
+ "content": " # obtain normalization stats for qpos and action\n # if load_pretrain:\n # with open(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'dataset_stats.pkl'), 'rb') as f:\n # norm_stats = pickle.load(f)\n # print('Loaded pretrain dataset stats')\n norm_stats, all_episode_len = get_norm_stats(dataset_path_list)\n train_episode_len = [all_episode_len[i] for i in train_episode_ids]\n val_episode_len = [all_episode_len[i] for i in val_episode_ids]\n assert(all_episode_len[0] % prediction_len == 0)\n # save dataset stats\n stats_path = os.path.join(ckpt_dir, f'actuator_net_stats.pkl')\n with open(stats_path, 'wb') as f:\n pickle.dump(norm_stats, f)\n # construct dataset and dataloader\n train_dataset = EpisodicDataset(dataset_path_list, norm_stats, train_episode_ids, train_episode_len, history_len, future_len, prediction_len)\n val_dataset = EpisodicDataset(dataset_path_list, norm_stats, val_episode_ids, val_episode_len, history_len, future_len, prediction_len)",
+ "type": "code",
+ "location": "/train_actuator_network.py:63-80"
+ },
+ "535": {
+ "file_id": 31,
+ "content": "This code loads normalization stats for qpos and action, either from a file or by calling get_norm_stats function. It then calculates train and val episode lengths based on episode IDs. The code asserts that the all_episode_len is divisible by prediction_len. Next, it saves the dataset stats in a pickle file, constructs train and val datasets using EpisodicDataset class, and utilizes these datasets for further training.",
+ "type": "comment"
+ },
+ "536": {
+ "file_id": 31,
+ "content": " train_dataloader = DataLoader(train_dataset, batch_size=batch_size_train, shuffle=True, pin_memory=True, num_workers=1, prefetch_factor=1)\n val_dataloader = DataLoader(val_dataset, batch_size=batch_size_val, shuffle=True, pin_memory=True, num_workers=1, prefetch_factor=1)\n policy = ActuatorNetwork(prediction_len).cuda()\n optimizer = torch.optim.AdamW(policy.parameters(), lr=lr, weight_decay=weight_decay)\n n_parameters = sum(p.numel() for p in policy.parameters() if p.requires_grad)\n print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n min_val_loss = np.inf\n best_ckpt_info = None\n train_dataloader = repeater(train_dataloader)\n for step in tqdm(range(num_steps+1)):\n # validation\n if step % validate_every == 0:\n print('validating')\n with torch.inference_mode():\n policy.eval()\n validation_dicts = []\n for batch_idx, data in enumerate(val_dataloader):\n observed_speed, commanded_speed = data",
+ "type": "code",
+ "location": "/train_actuator_network.py:81-102"
+ },
+ "537": {
+ "file_id": 31,
+ "content": "Creates data loaders for training and validation datasets. Initializes ActuatorNetwork model, optimizer, and prints the number of parameters. Sets initial minimum validation loss and best checkpoint information. Repeats training data loader for iterations. Validates model performance at specified intervals.",
+ "type": "comment"
+ },
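+ "537a": {
+ "file_id": 31,
+ "content": "The repeater helper is imported from imitate_episodes; a minimal stand-in (assumed behavior: cycle the finite loader forever so the training loop can simply call next) might look like this:\n\ndef repeater(loader):\n    # Yield batches from the loader over and over, restarting when it is exhausted.\n    while True:\n        for batch in loader:\n            yield batch\n\nloader = [1, 2, 3]  # stands in for a torch DataLoader\nstream = repeater(loader)\nprint([next(stream) for _ in range(7)])  # [1, 2, 3, 1, 2, 3, 1]",
+ "type": "comment"
+ },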
+ "538": {
+ "file_id": 31,
+ "content": " out, forward_dict = policy(observed_speed.cuda(), commanded_speed.cuda())\n validation_dicts.append(forward_dict)\n validation_summary = compute_dict_mean(validation_dicts)\n epoch_val_loss = validation_summary['loss']\n if epoch_val_loss < min_val_loss:\n min_val_loss = epoch_val_loss\n best_ckpt_info = (step, min_val_loss, deepcopy(policy.state_dict()))\n for k in list(validation_summary.keys()):\n validation_summary[f'val_{k}'] = validation_summary.pop(k) \n wandb.log(validation_summary, step=step)\n print(f'Val loss: {epoch_val_loss:.5f}')\n summary_string = ''\n for k, v in validation_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n visualize_prediction(dataset_path_list, val_episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, 'val')",
+ "type": "code",
+ "location": "/train_actuator_network.py:103-121"
+ },
+ "539": {
+ "file_id": 31,
+ "content": "This code measures the validation loss during training, keeps track of the best validation loss so far, logs the current validation summary to Wandb, and prints out a summary for the current epoch. It also visualizes predictions with a separate function.",
+ "type": "comment"
+ },
+ "540": {
+ "file_id": 31,
+ "content": " visualize_prediction(dataset_path_list, train_episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, 'train')\n # training\n policy.train()\n optimizer.zero_grad()\n data = next(train_dataloader)\n observed_speed, commanded_speed = data\n out, forward_dict = policy(observed_speed.cuda(), commanded_speed.cuda())\n # backward\n loss = forward_dict['loss']\n loss.backward()\n optimizer.step()\n wandb.log(forward_dict, step=step) # not great, make training 1-2% slower\n if step % save_every == 0:\n ckpt_path = os.path.join(ckpt_dir, f'actuator_net_step_{step}.ckpt')\n torch.save(policy.state_dict(), ckpt_path)\n ckpt_path = os.path.join(ckpt_dir, f'actuator_net_last.ckpt')\n torch.save(policy.state_dict(), ckpt_path)\n best_step, min_val_loss, best_state_dict = best_ckpt_info\n ckpt_path = os.path.join(ckpt_dir, f'actuator_net_step_{best_step}.ckpt')\n torch.save(best_state_dict, ckpt_path)",
+ "type": "code",
+ "location": "/train_actuator_network.py:122-146"
+ },
+ "541": {
+ "file_id": 31,
+ "content": "The code is training an actuator network policy using data from a dataloader. It performs forward and backward passes to calculate loss, updates the policy's state with an optimizer, logs progress to W&B, saves checkpoints at specified intervals, and overwrites the latest checkpoint with the final step of training.",
+ "type": "comment"
+ },
+ "542": {
+ "file_id": 31,
+ "content": " print(f'Training finished:\\nval loss {min_val_loss:.6f} at step {best_step}')\ndef visualize_prediction(dataset_path_list, episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, name):\n num_vis = 2\n episode_ids = episode_ids[:num_vis]\n vis_path = [dataset_path_list[i] for i in episode_ids]\n for i, dataset_path in enumerate(vis_path):\n try:\n with h5py.File(dataset_path, 'r') as root:\n commanded_speed = root['/base_action'][()]\n observed_speed = root['/obs_tracer'][()]\n except Exception as ee:\n print(f'Error loading {dataset_path} in get_norm_stats')\n print(ee)\n quit()\n # commanded_speed = (commanded_speed - norm_stats[\"commanded_speed_mean\"]) / norm_stats[\"commanded_speed_std\"]\n norm_observed_speed = (observed_speed - norm_stats[\"observed_speed_mean\"]) / norm_stats[\"observed_speed_std\"]\n out_unnorm_fn = lambda x: (x * norm_stats[\"commanded_speed_std\"]) + norm_stats[\"commanded_speed_mean\"]",
+ "type": "code",
+ "location": "/train_actuator_network.py:147-167"
+ },
+ "543": {
+ "file_id": 31,
+ "content": "This code segment is responsible for training a neural network and visualizing the predictions. It prints the minimum validation loss and the corresponding step number when training finishes. The visualize_prediction function reads data from a dataset path list, selects episodes for visualization, loads data from HDF5 files, normalizes observed speeds, and provides an unnormalized output function. It also handles potential exceptions during data loading.",
+ "type": "comment"
+ },
+ "544": {
+ "file_id": 31,
+ "content": " history_pad = np.zeros((history_len, 2))\n future_pad = np.zeros((future_len, 2))\n norm_observed_speed = np.concatenate([history_pad, norm_observed_speed, future_pad], axis=0)\n episode_len = commanded_speed.shape[0]\n all_pred = []\n for t in range(0, episode_len, prediction_len):\n offset_start_ts = t + history_len\n policy_input = norm_observed_speed[offset_start_ts-history_len: offset_start_ts+future_len]\n policy_input = torch.from_numpy(policy_input).float().unsqueeze(dim=0).cuda()\n pred = policy(policy_input)\n pred = pred.detach().cpu().numpy()[0]\n all_pred += out_unnorm_fn(pred).tolist()\n all_pred = np.array(all_pred)\n plot_path = os.path.join(ckpt_dir, f'{name}{i}_step{step}_linear')\n plt.figure()\n plt.plot(commanded_speed[:, 0], label='commanded_speed_linear')\n plt.plot(observed_speed[:, 0], label='observed_speed_linear')\n plt.plot(all_pred[:, 0], label='pred_commanded_speed_linear')",
+ "type": "code",
+ "location": "/train_actuator_network.py:169-189"
+ },
+ "545": {
+ "file_id": 31,
+ "content": "This code segment is preparing input data and feeding it to a neural network policy for prediction. The predicted commanded speed values are then plotted alongside the actual commanded and observed speeds in a plot.",
+ "type": "comment"
+ },
+ "546": {
+ "file_id": 31,
+ "content": " # plot vertical grey dotted lines every prediction_len\n for t in range(0, episode_len, prediction_len):\n plt.axvline(t, linestyle='--', color='grey')\n plt.legend()\n plt.savefig(plot_path)\n plt.close()\n plot_path = os.path.join(ckpt_dir, f'{name}{i}_step{step}_angular')\n plt.figure()\n plt.plot(commanded_speed[:, 1], label='commanded_speed_angular')\n plt.plot(observed_speed[:, 1], label='observed_speed_angular')\n plt.plot(all_pred[:, 1], label='pred_commanded_speed_angular')\n # plot vertical dotted lines every prediction_len\n for t in range(0, episode_len, prediction_len):\n plt.axvline(t, linestyle='--', color='grey')\n plt.legend()\n plt.savefig(plot_path)\n plt.close()\nclass ActuatorNetwork(nn.Module):\n def __init__(self, prediction_len):\n super().__init__()\n d_model = 256\n encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)\n self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=3)",
+ "type": "code",
+ "location": "/train_actuator_network.py:190-217"
+ },
+ "547": {
+ "file_id": 31,
+ "content": "The code plots the commanded, observed, and predicted angular speeds of an actuator network. It saves the resulting plot in a specified directory. The code also includes vertical dotted lines at regular intervals for visual reference. The ActuatorNetwork class initializes a transformer encoder with a specific number of layers and heads.",
+ "type": "comment"
+ },
+ "548": {
+ "file_id": 31,
+ "content": " self.pe = PositionalEncoding(d_model)\n self.in_proj = nn.Linear(2, d_model)\n self.out_proj = nn.Linear(d_model, 2)\n self.prediction_len = prediction_len\n def forward(self, src, tgt=None):\n if tgt is not None: # training time\n # (batch, seq, feature) -> (seq, batch, feature)\n src = self.in_proj(src)\n src = torch.einsum('b s d -> s b d', src)\n src = self.pe(src)\n out = self.transformer(src)\n tgt = torch.einsum('b s d -> s b d', tgt)\n assert(self.prediction_len == tgt.shape[0])\n out = out[0: self.prediction_len] # take first few tokens only for prediction\n out = self.out_proj(out)\n l2_loss = loss = F.mse_loss(out, tgt)\n loss_dict = {'loss': l2_loss}\n out = torch.einsum('s b d -> b s d', out)\n return out, loss_dict\n else:\n src = self.in_proj(src)\n src = torch.einsum('b s d -> s b d', src)\n src = self.pe(src)",
+ "type": "code",
+ "location": "/train_actuator_network.py:218-243"
+ },
+ "549": {
+ "file_id": 31,
+ "content": "This code initializes a network for transformer-based prediction. It includes a PositionalEncoding layer, input and output projection layers, and a prediction length parameter. During training time, it rearranges input data, applies positional encoding, passes through the transformer, and calculates an MSE loss between predicted and target outputs. It returns predicted outputs and loss dictionary.",
+ "type": "comment"
+ },
+ "550": {
+ "file_id": 31,
+ "content": " out = self.transformer(src)\n out = out[0: self.prediction_len] # take first few tokens only for prediction\n out = self.out_proj(out)\n out = torch.einsum('s b d -> b s d', out)\n return out\nclass PositionalEncoding(nn.Module):\n def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):\n super().__init__()\n self.dropout = nn.Dropout(p=dropout)\n position = torch.arange(max_len).unsqueeze(1)\n div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))\n pe = torch.zeros(max_len, 1, d_model)\n pe[:, 0, 0::2] = torch.sin(position * div_term)\n pe[:, 0, 1::2] = torch.cos(position * div_term)\n self.register_buffer('pe', pe)\n def forward(self, x):\n \"\"\"\n Arguments:\n x: Tensor, shape ``[seq_len, batch_size, embedding_dim]``\n \"\"\"\n x = x + self.pe[:x.size(0)]\n return self.dropout(x)\ndef get_norm_stats(dataset_path_list):\n all_commanded_speed = []",
+ "type": "code",
+ "location": "/train_actuator_network.py:244-272"
+ },
+ "551": {
+ "file_id": 31,
+ "content": "train_actuator_network.py:243-271 - Applies transformer and positional encoding to source data, extracts the first few tokens for prediction, and then rearranges the output.\nPositionalEncoding - Generates positional encodings of a given size and applies them as an additional dimension in an embedding layer.",
+ "type": "comment"
+ },
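+ To illustrate the sinusoidal table built by PositionalEncoding, the self-contained sketch below recomputes it for a small d_model and adds it to a dummy (seq_len, batch, d_model) input; the sizes are arbitrary and only meant to show the broadcasting:
+ ```python
+ import math
+ import torch
+ 
+ d_model, max_len = 8, 16
+ position = torch.arange(max_len).unsqueeze(1)                      # (max_len, 1)
+ div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
+ pe = torch.zeros(max_len, 1, d_model)
+ pe[:, 0, 0::2] = torch.sin(position * div_term)                    # even dimensions
+ pe[:, 0, 1::2] = torch.cos(position * div_term)                    # odd dimensions
+ 
+ x = torch.zeros(5, 2, d_model)                                     # (seq_len, batch, d_model)
+ x_with_pe = x + pe[:x.size(0)]                                     # broadcasts over the batch axis
+ print(x_with_pe.shape)                                             # torch.Size([5, 2, 8])
+ ```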
+ "552": {
+ "file_id": 31,
+ "content": " all_observed_speed = []\n all_episode_len = []\n for dataset_path in dataset_path_list:\n try:\n with h5py.File(dataset_path, 'r') as root:\n commanded_speed = root['/base_action'][()]\n observed_speed = root['/obs_tracer'][()]\n except Exception as e:\n print(f'Error loading {dataset_path} in get_norm_stats')\n print(e)\n quit()\n all_commanded_speed.append(torch.from_numpy(commanded_speed))\n all_observed_speed.append(torch.from_numpy(observed_speed))\n all_episode_len.append(len(commanded_speed))\n all_commanded_speed = torch.cat(all_commanded_speed, dim=0)\n all_observed_speed = torch.cat(all_observed_speed, dim=0)\n # normalize all_commanded_speed\n commanded_speed_mean = all_commanded_speed.mean(dim=[0]).float()\n commanded_speed_std = all_commanded_speed.std(dim=[0]).float()\n commanded_speed_std = torch.clip(commanded_speed_std, 1e-2, np.inf) # clipping\n # normalize all_observed_speed",
+ "type": "code",
+ "location": "/train_actuator_network.py:273-295"
+ },
+ "553": {
+ "file_id": 31,
+ "content": "This code loads and normalizes commanded and observed speed data from multiple datasets. It calculates the mean and standard deviation for both sets of data, clips any outliers in the standard deviation, and stores the normalized data for further analysis or training purposes.",
+ "type": "comment"
+ },
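+ The same mean/std-with-clipping pattern can be shown in isolation. This hedged sketch uses synthetic speed data rather than the HDF5 fields read above:
+ ```python
+ import torch
+ 
+ def normalization_stats(x, min_std=1e-2):
+     # Per-dimension mean/std over the time axis, with the std clipped away from zero
+     # so that near-constant channels do not blow up after normalization.
+     mean = x.mean(dim=0).float()
+     std = torch.clip(x.std(dim=0).float(), min_std, float('inf'))
+     return mean, std
+ 
+ speeds = torch.randn(1000, 2) * torch.tensor([0.5, 1e-6])   # second channel is nearly constant
+ mean, std = normalization_stats(speeds)
+ normalized = (speeds - mean) / std                          # safe even for the tiny-variance channel
+ ```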
+ "554": {
+ "file_id": 31,
+ "content": " observed_speed_mean = all_observed_speed.mean(dim=[0]).float()\n observed_speed_std = all_observed_speed.std(dim=[0]).float()\n observed_speed_std = torch.clip(observed_speed_std, 1e-2, np.inf) # clipping\n stats = {\"commanded_speed_mean\": commanded_speed_mean.numpy(), \"commanded_speed_std\": commanded_speed_std.numpy(),\n \"observed_speed_mean\": observed_speed_mean.numpy(), \"observed_speed_std\": observed_speed_std.numpy()}\n return stats, all_episode_len\nclass EpisodicDataset(torch.utils.data.Dataset):\n def __init__(self, dataset_path_list, norm_stats, episode_ids, episode_len, history_len, future_len, prediction_len):\n super(EpisodicDataset).__init__()\n self.episode_ids = episode_ids\n self.dataset_path_list = dataset_path_list\n self.norm_stats = norm_stats\n self.episode_len = episode_len\n self.cumulative_len = np.cumsum(self.episode_len)\n self.max_episode_len = max(episode_len)\n self.history_len = history_len\n self.future_len = future_len",
+ "type": "code",
+ "location": "/train_actuator_network.py:296-316"
+ },
+ "555": {
+ "file_id": 31,
+ "content": "This code calculates the mean and standard deviation of observed speeds, clips the standard deviation to prevent extreme values, and stores these statistics in a dictionary. The dictionary contains the means and standard deviations for both commanded and observed speeds. The code also defines an EpisodicDataset class that initializes with dataset paths, normalization stats, episode IDs, episode lengths, history length, future length, and prediction length.",
+ "type": "comment"
+ },
+ "556": {
+ "file_id": 31,
+ "content": " self.prediction_len = prediction_len\n self.is_sim = False\n self.history_pad = np.zeros((self.history_len, 2))\n self.future_pad = np.zeros((self.future_len, 2))\n self.prediction_pad = np.zeros((self.prediction_len, 2))\n self.__getitem__(0) # initialize self.is_sim\n def __len__(self):\n return sum(self.episode_len)\n def _locate_transition(self, index):\n assert index < self.cumulative_len[-1]\n episode_index = np.argmax(self.cumulative_len > index) # argmax returns first True index\n start_ts = index - (self.cumulative_len[episode_index] - self.episode_len[episode_index])\n episode_id = self.episode_ids[episode_index]\n return episode_id, start_ts\n def __getitem__(self, index):\n episode_id, start_ts = self._locate_transition(index)\n dataset_path = self.dataset_path_list[episode_id]\n try:\n # print(dataset_path)\n with h5py.File(dataset_path, 'r') as root:\n commanded_speed = root['/base_action'][()]",
+ "type": "code",
+ "location": "/train_actuator_network.py:317-340"
+ },
+ "557": {
+ "file_id": 31,
+ "content": "Initializes attributes and checks if it is a simulation. Returns length based on episode lengths. Locates transition index, finds the dataset path, and reads commanded speed from the HDF5 file.",
+ "type": "comment"
+ },
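+ The cumulative-length lookup in _locate_transition is easiest to see with toy numbers; the sketch below mirrors the logic with made-up episode lengths:
+ ```python
+ import numpy as np
+ 
+ episode_len = [5, 3, 7]                     # lengths of three episodes
+ cumulative_len = np.cumsum(episode_len)     # [5, 8, 15]
+ 
+ def locate_transition(index):
+     # Map a flat sample index onto (episode_index, start_ts) within that episode.
+     assert index < cumulative_len[-1]
+     episode_index = np.argmax(cumulative_len > index)   # first episode whose cumsum exceeds index
+     start_ts = index - (cumulative_len[episode_index] - episode_len[episode_index])
+     return episode_index, start_ts
+ 
+ print(locate_transition(0))    # (0, 0)
+ print(locate_transition(5))    # (1, 0)  first timestep of the second episode
+ print(locate_transition(14))   # (2, 6)  last timestep of the third episode
+ ```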
+ "558": {
+ "file_id": 31,
+ "content": " observed_speed = root['/obs_tracer'][()]\n observed_speed = np.concatenate([self.history_pad, observed_speed, self.future_pad], axis=0)\n commanded_speed = np.concatenate([commanded_speed, self.prediction_pad], axis=0)\n offset_start_ts = start_ts + self.history_len\n commanded_speed = commanded_speed[start_ts: start_ts+self.prediction_len]\n observed_speed = observed_speed[offset_start_ts-self.history_len: offset_start_ts+self.future_len]\n commanded_speed = torch.from_numpy(commanded_speed).float()\n observed_speed = torch.from_numpy(observed_speed).float()\n # normalize to mean 0 std 1\n commanded_speed = (commanded_speed - self.norm_stats[\"commanded_speed_mean\"]) / self.norm_stats[\"commanded_speed_std\"]\n observed_speed = (observed_speed - self.norm_stats[\"observed_speed_mean\"]) / self.norm_stats[\"observed_speed_std\"]\n except:\n print(f'Error loading {dataset_path} in __getitem__')",
+ "type": "code",
+ "location": "/train_actuator_network.py:341-357"
+ },
+ "559": {
+ "file_id": 31,
+ "content": "This code is preparing input data for a machine learning model. It concatenates historical and future observations with commanded speeds, adjusts the timestamps, and normalizes the data to have zero mean and unit standard deviation. If there's an error loading the dataset, it prints an error message.",
+ "type": "comment"
+ },
+ "560": {
+ "file_id": 31,
+ "content": " quit()\n # print(image_data.dtype, qpos_data.dtype, action_data.dtype, is_pad.dtype)\n return observed_speed, commanded_speed\nif __name__ == '__main__':\n main()",
+ "type": "code",
+ "location": "/train_actuator_network.py:358-367"
+ },
+ "561": {
+ "file_id": 31,
+ "content": "This code appears to be part of a program that trains an actuator network. It defines a function, possibly for training the actuator network, which may take in image data, joint position data, and other related data, calculates observed and commanded speeds, and returns these values. The code also includes a quit() command and some print statements for debugging purposes. Lastly, there is an if __name__ == '__main__': statement that suggests this code could be executed directly as a main program when the script is run.",
+ "type": "comment"
+ },
+ "562": {
+ "file_id": 32,
+ "content": "/train_latent_model.py",
+ "type": "filepath"
+ },
+ "563": {
+ "file_id": 32,
+ "content": "The code uses the ACT-Plus-Plus framework for robot manipulation, incorporating deep reinforcement learning and latent models with visual inputs. It saves and plots training curves while supporting customization through command-line arguments, adding new \"--vq_class\" and \"--vq_dim\" options for the latent model's class and dimensionality.",
+ "type": "summary"
+ },
+ "564": {
+ "file_id": 32,
+ "content": "import torch\nimport numpy as np\nimport os\nimport pickle\nimport argparse\nimport matplotlib.pyplot as plt\nfrom copy import deepcopy\nfrom tqdm import tqdm\nfrom einops import rearrange\nimport torch.nn.functional as F\nfrom constants import DT\nfrom constants import PUPPET_GRIPPER_JOINT_OPEN\nfrom utils import load_data # data functions\nfrom utils import sample_box_pose, sample_insertion_pose # robot functions\nfrom utils import compute_dict_mean, set_seed, detach_dict # helper functions\nfrom policy import ACTPolicy, CNNMLPPolicy\nfrom visualize_episodes import save_videos\nfrom detr.models.latent_model import Latent_Model_Transformer\nfrom sim_env import BOX_POSE\nimport IPython\ne = IPython.embed\ndef main(args):\n set_seed(1)\n # command line parameters\n is_eval = args['eval']\n ckpt_dir = args['ckpt_dir']\n policy_class = args['policy_class']\n onscreen_render = args['onscreen_render']\n task_name = args['task_name']\n batch_size_train = args['batch_size']\n batch_size_val = args['batch_size']\n num_epochs = args['num_epochs']",
+ "type": "code",
+ "location": "/train_latent_model.py:1-36"
+ },
+ "565": {
+ "file_id": 32,
+ "content": "The code imports necessary libraries, defines functions for robot manipulation and data processing. It initializes parameters from command line inputs and sets a seed for reproducibility. This script aims to train a latent model in the ACT-Plus-Plus framework.",
+ "type": "comment"
+ },
+ "566": {
+ "file_id": 32,
+ "content": " # get task parameters\n is_sim = task_name[:4] == 'sim_'\n if is_sim:\n from constants import SIM_TASK_CONFIGS\n task_config = SIM_TASK_CONFIGS[task_name]\n else:\n from aloha_scripts.constants import TASK_CONFIGS\n task_config = TASK_CONFIGS[task_name]\n dataset_dir = task_config['dataset_dir']\n num_episodes = task_config['num_episodes']\n episode_len = task_config['episode_len']\n camera_names = task_config['camera_names']\n name_filter = task_config.get('name_filter', lambda n: True)\n # fixed parameters\n state_dim = 14\n lr_backbone = 1e-5\n backbone = 'resnet18'\n if policy_class == 'ACT':\n enc_layers = 4\n dec_layers = 7\n nheads = 8\n policy_config = {'lr': args['lr'],\n 'num_queries': args['chunk_size'],\n 'kl_weight': args['kl_weight'],\n 'hidden_dim': args['hidden_dim'],\n 'dim_feedforward': args['dim_feedforward'],\n 'lr_backbone': lr_backbone,",
+ "type": "code",
+ "location": "/train_latent_model.py:38-65"
+ },
+ "567": {
+ "file_id": 32,
+ "content": "This code retrieves task parameters from the task name and configuration files, sets fixed parameters for the model, and assigns values to variables like dataset_dir, num_episodes, episode_len, camera_names. The code also applies a lambda function as a name filter, if specified in the configuration file.",
+ "type": "comment"
+ },
+ "568": {
+ "file_id": 32,
+ "content": " 'backbone': backbone,\n 'enc_layers': enc_layers,\n 'dec_layers': dec_layers,\n 'nheads': nheads,\n 'camera_names': camera_names,\n 'vq': True,\n 'vq_class': args['vq_class'],\n 'vq_dim': args['vq_dim'],\n }\n elif policy_class == 'CNNMLP':\n policy_config = {'lr': args['lr'], 'lr_backbone': lr_backbone, 'backbone' : backbone, 'num_queries': 1,\n 'camera_names': camera_names,}\n else:\n raise NotImplementedError\n config = {\n 'num_epochs': num_epochs,\n 'ckpt_dir': ckpt_dir,\n 'episode_len': episode_len,\n 'state_dim': state_dim,\n 'lr': args['lr'],\n 'policy_class': policy_class,\n 'onscreen_render': onscreen_render,\n 'policy_config': policy_config,\n 'task_name': task_name,\n 'seed': args['seed'],\n 'temporal_agg': args['temporal_agg'],",
+ "type": "code",
+ "location": "/train_latent_model.py:66-92"
+ },
+ "569": {
+ "file_id": 32,
+ "content": "This code is defining the configuration for training a latent model. It has different policy classes, such as 'Transformer', 'CNNMLP', and others not yet implemented. The configuration includes parameters like learning rate, backbone architecture, camera names, episode length, etc. If an unsupported policy class is given, it raises a NotImplementedError.",
+ "type": "comment"
+ },
+ "570": {
+ "file_id": 32,
+ "content": " 'camera_names': camera_names,\n 'real_robot': not is_sim\n }\n # if is_eval:\n # ckpt_names = [f'policy_best.ckpt']\n # results = []\n # for ckpt_name in ckpt_names:\n # success_rate, avg_return = eval_bc(config, ckpt_name, save_episode=True)\n # results.append([ckpt_name, success_rate, avg_return])\n # for ckpt_name, success_rate, avg_return in results:\n # print(f'{ckpt_name}: {success_rate=} {avg_return=}')\n # print()\n # exit()\n train_dataloader, val_dataloader, stats, _ = load_data(dataset_dir, name_filter, camera_names, batch_size_train, batch_size_val)\n # save dataset stats\n # if not os.path.isdir(ckpt_dir):\n # os.makedirs(ckpt_dir)\n # stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n # with open(stats_path, 'wb') as f:\n # pickle.dump(stats, f)\n ckpt_name = f'policy_last.ckpt'\n best_ckpt_info = train_bc(train_dataloader, val_dataloader, config, ckpt_name)\n best_epoch, min_val_loss, best_state_dict = best_ckpt_info",
+ "type": "code",
+ "location": "/train_latent_model.py:93-120"
+ },
+ "571": {
+ "file_id": 32,
+ "content": "This code snippet is loading data and training a behavioral cloning (BC) model. If `is_eval` is true, it evaluates the best checkpoint. It loads the data, saves the dataset stats if necessary, trains the BC model, and stores information about the best checkpoint.",
+ "type": "comment"
+ },
+ "572": {
+ "file_id": 32,
+ "content": " # save best checkpoint\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_best.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Best ckpt, val loss {min_val_loss:.6f} @ epoch{best_epoch}')\ndef make_policy(policy_class, policy_config):\n if policy_class == 'ACT':\n policy = ACTPolicy(policy_config)\n elif policy_class == 'CNNMLP':\n policy = CNNMLPPolicy(policy_config)\n else:\n raise NotImplementedError\n return policy\n# def make_optimizer(policy_class, policy):\n# if policy_class == 'ACT':\n# optimizer = policy.configure_optimizers()\n# elif policy_class == 'CNNMLP':\n# optimizer = policy.configure_optimizers()\n# else:\n# raise NotImplementedError\n# return optimizer\ndef get_image(ts, camera_names):\n curr_images = []\n for cam_name in camera_names:\n curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')\n curr_images.append(curr_image)\n curr_image = np.stack(curr_images, axis=0)\n curr_image = torch.from_numpy(curr_image / 255.0).float().cuda().unsqueeze(0)",
+ "type": "code",
+ "location": "/train_latent_model.py:122-154"
+ },
+ "573": {
+ "file_id": 32,
+ "content": "Code snippet saves the best checkpoint for a latent model, defines a policy function based on the given class, and gets an image from the observations.",
+ "type": "comment"
+ },
+ "574": {
+ "file_id": 32,
+ "content": " return curr_image\n# def eval_bc(config, ckpt_name, save_episode=True):\n# set_seed(1000)\n# ckpt_dir = config['ckpt_dir']\n# state_dim = config['state_dim']\n# real_robot = config['real_robot']\n# policy_class = config['policy_class']\n# onscreen_render = config['onscreen_render']\n# policy_config = config['policy_config']\n# camera_names = config['camera_names']\n# max_timesteps = config['episode_len']\n# task_name = config['task_name']\n# temporal_agg = config['temporal_agg']\n# onscreen_cam = 'angle'\n# # load policy and stats\n# ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n# policy = make_policy(policy_class, policy_config)\n# loading_status = policy.load_state_dict(torch.load(ckpt_path))\n# print(loading_status)\n# policy.cuda()\n# policy.eval()\n# print(f'Loaded: {ckpt_path}')\n# stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n# with open(stats_path, 'rb') as f:\n# stats = pickle.load(f)\n# pre_process = lambda s_qpos: (s_qpos - stats['qpos_mean']) / stats['qpos_std']",
+ "type": "code",
+ "location": "/train_latent_model.py:155-184"
+ },
+ "575": {
+ "file_id": 32,
+ "content": "This code defines a function to evaluate the performance of a trained policy. It loads the policy from a checkpoint file, prepares the necessary configurations, and then evaluates the policy by running episodes. The function takes in the configuration, checkpoint name, and an optional parameter for saving episode results. It uses torch and pickle libraries for loading and processing data.",
+ "type": "comment"
+ },
+ "576": {
+ "file_id": 32,
+ "content": "# post_process = lambda a: a * stats['action_std'] + stats['action_mean']\n# # load environment\n# if real_robot:\n# from aloha_scripts.robot_utils import move_grippers # requires aloha\n# from aloha_scripts.real_env import make_real_env # requires aloha\n# env = make_real_env(init_node=True)\n# env_max_reward = 0\n# else:\n# from sim_env import make_sim_env\n# env = make_sim_env(task_name)\n# env_max_reward = env.task.max_reward\n# query_frequency = policy_config['num_queries']\n# if temporal_agg:\n# query_frequency = 1\n# num_queries = policy_config['num_queries']\n# max_timesteps = int(max_timesteps * 1) # may increase for real-world tasks\n# num_rollouts = 50\n# episode_returns = []\n# highest_rewards = []\n# for rollout_id in range(num_rollouts):\n# rollout_id += 0\n# ### set task\n# if 'sim_transfer_cube' in task_name:\n# BOX_POSE[0] = sample_box_pose() # used in sim reset\n# elif 'sim_insertion' in task_name:",
+ "type": "code",
+ "location": "/train_latent_model.py:185-213"
+ },
+ "577": {
+ "file_id": 32,
+ "content": "This code is initializing an environment, either real or simulated, based on the \"real_robot\" flag. It then sets up variables for rollout number of episodes, maximum timesteps, query frequency (which may change depending on temporal aggregation), and stores episode returns and highest rewards in lists. The last few lines seem to set up task-specific poses for certain tasks.",
+ "type": "comment"
+ },
+ "578": {
+ "file_id": 32,
+ "content": "# BOX_POSE[0] = np.concatenate(sample_insertion_pose()) # used in sim reset\n# ts = env.reset()\n# ### onscreen render\n# if onscreen_render:\n# ax = plt.subplot()\n# plt_img = ax.imshow(env._physics.render(height=480, width=640, camera_id=onscreen_cam))\n# plt.ion()\n# ### evaluation loop\n# if temporal_agg:\n# all_time_actions = torch.zeros([max_timesteps, max_timesteps+num_queries, state_dim]).cuda()\n# qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()\n# image_list = [] # for visualization\n# qpos_list = []\n# target_qpos_list = []\n# rewards = []\n# with torch.inference_mode():\n# for t in range(max_timesteps):\n# ### update onscreen render and wait for DT\n# if onscreen_render:\n# image = env._physics.render(height=480, width=640, camera_id=onscreen_cam)\n# plt_img.set_data(image)",
+ "type": "code",
+ "location": "/train_latent_model.py:214-238"
+ },
+ "579": {
+ "file_id": 32,
+ "content": "This code snippet is part of a training process for a latent model. It resets the environment, performs on-screen rendering if needed, and then enters an evaluation loop to collect data for training. The code uses PyTorch for inference mode and handles on-screen rendering, image capturing, and storing data for further analysis or model training.",
+ "type": "comment"
+ },
+ "580": {
+ "file_id": 32,
+ "content": "# plt.pause(DT)\n# ### process previous timestep to get qpos and image_list\n# obs = ts.observation\n# if 'images' in obs:\n# image_list.append(obs['images'])\n# else:\n# image_list.append({'main': obs['image']})\n# qpos_numpy = np.array(obs['qpos'])\n# qpos = pre_process(qpos_numpy)\n# qpos = torch.from_numpy(qpos).float().cuda().unsqueeze(0)\n# qpos_history[:, t] = qpos\n# curr_image = get_image(ts, camera_names)\n# ### query policy\n# if config['policy_class'] == \"ACT\":\n# if t % query_frequency == 0:\n# all_actions = policy(qpos, curr_image)\n# if temporal_agg:\n# all_time_actions[[t], t:t+num_queries] = all_actions\n# actions_for_curr_step = all_time_actions[:, t]\n# actions_populated = torch.all(actions_for_curr_step != 0, axis=1)",
+ "type": "code",
+ "location": "/train_latent_model.py:239-260"
+ },
+ "581": {
+ "file_id": 32,
+ "content": "This code segment is part of a deep reinforcement learning algorithm that interacts with an environment. It processes observations, pre-processes state variables (qpos), and queries the policy to generate actions. The 'policy_class' determines whether to use an ACT policy or not. If so, it queries the policy for actions at specific intervals (query_frequency) and possibly aggregates them over time if temporal_agg is set to True. This algorithm likely trains a latent model in an environment with potential visual input from cameras.",
+ "type": "comment"
+ },
+ "582": {
+ "file_id": 32,
+ "content": "# actions_for_curr_step = actions_for_curr_step[actions_populated]\n# k = 0.01\n# exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))\n# exp_weights = exp_weights / exp_weights.sum()\n# exp_weights = torch.from_numpy(exp_weights).cuda().unsqueeze(dim=1)\n# raw_action = (actions_for_curr_step * exp_weights).sum(dim=0, keepdim=True)\n# else:\n# raw_action = all_actions[:, t % query_frequency]\n# elif config['policy_class'] == \"CNNMLP\":\n# raw_action = policy(qpos, curr_image)\n# else:\n# raise NotImplementedError\n# ### post-process actions\n# raw_action = raw_action.squeeze(0).cpu().numpy()\n# action = post_process(raw_action)\n# target_qpos = action\n# ### step the environment",
+ "type": "code",
+ "location": "/train_latent_model.py:261-279"
+ },
+ "583": {
+ "file_id": 32,
+ "content": "This code determines the raw action for a given step in an environment. It first checks the policy class and then applies the appropriate method to get the raw action. If the policy class is \"Exponential\", it calculates weights based on actions, sums them, and uses them to compute the raw action. If the policy class is \"CNNMLP\", it calls a predefined function \"policy\" with the current state and image as inputs. If none of these conditions are met, it raises an error. The resulting raw_action is then post-processed and used to determine target_qpos for the next step in the environment.",
+ "type": "comment"
+ },
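+ The exponential weighting described above can be reproduced standalone. The sketch below ensembles a few overlapping action predictions for one timestep; the values and k are illustrative, not taken from the repo:
+ ```python
+ import numpy as np
+ 
+ def temporal_ensemble(actions_for_curr_step, k=0.01):
+     # Weighted average of all previously predicted actions for the current timestep.
+     # Earlier predictions get exponentially larger weights, mirroring w_i = exp(-k * i).
+     exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))
+     exp_weights = exp_weights / exp_weights.sum()
+     return (actions_for_curr_step * exp_weights[:, None]).sum(axis=0)
+ 
+ # three overlapping predictions (made at t-2, t-1, t) for a 2-DoF action
+ preds = np.array([[0.10, 0.00],
+                   [0.12, 0.02],
+                   [0.14, 0.04]])
+ print(temporal_ensemble(preds))   # close to the simple mean because k is small
+ ```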
+ "584": {
+ "file_id": 32,
+ "content": "# ts = env.step(target_qpos)\n# ### for visualization\n# qpos_list.append(qpos_numpy)\n# target_qpos_list.append(target_qpos)\n# rewards.append(ts.reward)\n# plt.close()\n# if real_robot:\n# move_grippers([env.puppet_bot_left, env.puppet_bot_right], [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=0.5) # open\n# pass\n# rewards = np.array(rewards)\n# episode_return = np.sum(rewards[rewards!=None])\n# episode_returns.append(episode_return)\n# episode_highest_reward = np.max(rewards)\n# highest_rewards.append(episode_highest_reward)\n# print(f'Rollout {rollout_id}\\n{episode_return=}, {episode_highest_reward=}, {env_max_reward=}, Success: {episode_highest_reward==env_max_reward}')\n# if save_episode:\n# save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))\n# success_rate = np.mean(np.array(highest_rewards) == env_max_reward)",
+ "type": "code",
+ "location": "/train_latent_model.py:280-302"
+ },
+ "585": {
+ "file_id": 32,
+ "content": "This code segment is tracking the reward, episode return, and highest reward during a rollout in a robotics environment. It also handles visualization by appending qpos and target_qpos to lists, and has options to save videos of the episodes. It prints the rollout results and calculates the success rate based on the highest rewards achieved.",
+ "type": "comment"
+ },
+ "586": {
+ "file_id": 32,
+ "content": "# avg_return = np.mean(episode_returns)\n# summary_str = f'\\nSuccess rate: {success_rate}\\nAverage return: {avg_return}\\n\\n'\n# for r in range(env_max_reward+1):\n# more_or_equal_r = (np.array(highest_rewards) >= r).sum()\n# more_or_equal_r_rate = more_or_equal_r / num_rollouts\n# summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\\n'\n# print(summary_str)\n# # save success rate to txt\n# result_file_name = 'result_' + ckpt_name.split('.')[0] + '.txt'\n# with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:\n# f.write(summary_str)\n# f.write(repr(episode_returns))\n# f.write('\\n\\n')\n# f.write(repr(highest_rewards))\n# return success_rate, avg_return\ndef forward_pass(data, policy, latent_model):\n image_data, qpos_data, action_data, is_pad = data\n image_data, qpos_data, action_data, is_pad = image_data.cuda(), qpos_data.cuda(), action_data.cuda(), is_pad.cuda()\n forward_dict = {}",
+ "type": "code",
+ "location": "/train_latent_model.py:303-326"
+ },
+ "587": {
+ "file_id": 32,
+ "content": "The code calculates the success rate and average return for a set of rollouts in an environment. It then creates a summary string with reward thresholds, success rate, and average return, and writes it to a text file along with episode returns and highest rewards. The function is part of a larger codebase for training a latent model using policy and latent_model parameters.",
+ "type": "comment"
+ },
+ "588": {
+ "file_id": 32,
+ "content": " gt_labels = policy.vq_encode(qpos_data, action_data, is_pad)\n inputs = torch.cat([torch.zeros_like(gt_labels)[:, [0]], gt_labels[:, :-1]], dim=1)\n output_logits = latent_model(inputs)\n ce_loss = F.cross_entropy(output_logits, gt_labels)\n with torch.no_grad():\n output_labels = F.one_hot(torch.argmax(output_logits, dim=-1), num_classes=gt_labels.shape[-1]).float()\n # output_latents = F.softmax(output_logits, dim=-1)\n l1_error = F.l1_loss(output_labels, gt_labels, reduction='mean')\n # l1_errors = []\n # for i in range(l1_errors.shape[1]):\n # l1_errors.append(torch.mean(l1_errors[:, i]).item())\n forward_dict['loss'] = ce_loss\n forward_dict['l1_error'] = l1_error\n return forward_dict\ndef train_bc(train_dataloader, val_dataloader, config, ckpt_name):\n num_epochs = config['num_epochs']\n ckpt_dir = config['ckpt_dir']\n seed = config['seed']\n policy_class = config['policy_class']\n policy_config = config['policy_config']\n set_seed(seed)",
+ "type": "code",
+ "location": "/train_latent_model.py:327-353"
+ },
+ "589": {
+ "file_id": 32,
+ "content": "This code uses VQ-VAE to encode data, then feeds it into a latent model and calculates cross entropy loss. It also measures L1 error between output labels and ground truth labels for evaluation. The train_bc function trains the policy using a specified number of epochs with a given configuration and checkpoint directory.",
+ "type": "comment"
+ },
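+ The shifted-input, cross-entropy setup in forward_pass follows a standard next-token pattern. A hedged sketch with random stand-in logits is shown below; the exact shapes returned by the real vq_encode may differ, and the soft-target form of F.cross_entropy assumes a recent PyTorch version:
+ ```python
+ import torch
+ import torch.nn.functional as F
+ 
+ batch, seq_len, num_classes = 4, 8, 32
+ 
+ # ground-truth discrete codes as one-hot vectors, standing in for the VQ encoder output
+ gt_codes = torch.randint(0, num_classes, (batch, seq_len))
+ gt_labels = F.one_hot(gt_codes, num_classes).float()
+ 
+ # teacher forcing: the latent model would see a zero start token plus all codes except the last
+ inputs = torch.cat([torch.zeros_like(gt_labels)[:, :1], gt_labels[:, :-1]], dim=1)
+ 
+ # stand-in for the latent transformer's per-step logits
+ logits = torch.randn(batch, seq_len, num_classes, requires_grad=True)
+ 
+ # cross-entropy against one-hot probability targets; channels go in dim 1
+ loss = F.cross_entropy(logits.transpose(1, 2), gt_labels.transpose(1, 2))
+ loss.backward()
+ ```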
+ "590": {
+ "file_id": 32,
+ "content": " vq_dim = config['policy_config']['vq_dim']\n vq_class = config['policy_config']['vq_class']\n latent_model = Latent_Model_Transformer(vq_dim, vq_dim, vq_class)\n latent_model.cuda()\n ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n policy = make_policy(policy_class, policy_config)\n loading_status = policy.load_state_dict(torch.load(ckpt_path))\n policy.eval()\n policy.cuda()\n optimizer = torch.optim.AdamW(latent_model.parameters(), lr=config['lr'])\n train_history = []\n validation_history = []\n min_val_loss = np.inf\n best_ckpt_info = None\n for epoch in tqdm(range(num_epochs)):\n print(f'\\nEpoch {epoch}')\n # validation\n with torch.inference_mode():\n latent_model.eval()\n epoch_dicts = []\n for batch_idx, data in enumerate(val_dataloader):\n forward_dict = forward_pass(data, policy, latent_model)\n epoch_dicts.append(forward_dict)\n epoch_summary = compute_dict_mean(epoch_dicts)\n validation_history.append(epoch_summary)",
+ "type": "code",
+ "location": "/train_latent_model.py:355-382"
+ },
+ "591": {
+ "file_id": 32,
+ "content": "This code initializes a latent model and policy, loads checkpoints for the policy, optimizes the latent model using AdamW, trains for specified number of epochs, and validates the performance at each epoch.",
+ "type": "comment"
+ },
+ "592": {
+ "file_id": 32,
+ "content": " epoch_val_loss = epoch_summary['loss']\n if epoch_val_loss < min_val_loss:\n min_val_loss = epoch_val_loss\n best_ckpt_info = (epoch, min_val_loss, deepcopy(latent_model.state_dict()))\n print(f'Val loss: {epoch_val_loss:.5f}')\n summary_string = ''\n for k, v in epoch_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n # training\n optimizer.zero_grad()\n for batch_idx, data in enumerate(train_dataloader):\n forward_dict = forward_pass(data, policy, latent_model)\n # backward\n loss = forward_dict['loss']\n loss.backward()\n optimizer.step()\n optimizer.zero_grad()\n train_history.append(detach_dict(forward_dict))\n epoch_summary = compute_dict_mean(train_history[(batch_idx+1)*epoch:(batch_idx+1)*(epoch+1)])\n epoch_train_loss = epoch_summary['loss']\n print(f'Train loss: {epoch_train_loss:.5f}')",
+ "type": "code",
+ "location": "/train_latent_model.py:384-406"
+ },
+ "593": {
+ "file_id": 32,
+ "content": "This code is saving the best checkpoint, printing validation and training losses, iterating through dataloader for backpropagation, computing mean of dictionary values to get epoch summary, and storing it in a list.",
+ "type": "comment"
+ },
+ "594": {
+ "file_id": 32,
+ "content": " summary_string = ''\n for k, v in epoch_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n if epoch % 100 == 0:\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_epoch_{epoch}_seed_{seed}.ckpt')\n torch.save(latent_model.state_dict(), ckpt_path)\n plot_history(train_history, validation_history, epoch, ckpt_dir, seed)\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_last.ckpt')\n torch.save(latent_model.state_dict(), ckpt_path)\n best_epoch, min_val_loss, best_state_dict = best_ckpt_info\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_epoch_{best_epoch}_seed_{seed}.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Training finished:\\nSeed {seed}, val loss {min_val_loss:.6f} at epoch {best_epoch}')\n # save training curves\n plot_history(train_history, validation_history, num_epochs, ckpt_dir, seed)\n return best_ckpt_info\ndef plot_history(train_history, validation_history, num_epochs, ckpt_dir, seed):",
+ "type": "code",
+ "location": "/train_latent_model.py:407-431"
+ },
+ "595": {
+ "file_id": 32,
+ "content": "The code snippet saves the latent model's state at each epoch, keeps track of the best checkpoint, and plots the training curves. It prints the final validation loss and epoch where it occurred.",
+ "type": "comment"
+ },
+ "596": {
+ "file_id": 32,
+ "content": " # save training curves\n for key in train_history[0]:\n plot_path = os.path.join(ckpt_dir, f'latent_model_val_{key}_seed_{seed}.png')\n plt.figure()\n train_values = [summary[key].item() for summary in train_history]\n val_values = [summary[key].item() for summary in validation_history]\n plt.plot(np.linspace(0, num_epochs-1, len(train_history)), train_values, label='train')\n plt.plot(np.linspace(0, num_epochs-1, len(validation_history)), val_values, label='validation')\n # plt.ylim([-0.1, 1])\n plt.tight_layout()\n plt.legend()\n plt.title(key)\n plt.savefig(plot_path)\n print(f'Saved plots to {ckpt_dir}')\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--eval', action='store_true')\n parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)\n parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)",
+ "type": "code",
+ "location": "/train_latent_model.py:432-453"
+ },
+ "597": {
+ "file_id": 32,
+ "content": "This code saves training curves for a latent model and plots them. It iterates over keys in train_history, generates plots for each key (train and validation), and saves the plot to ckpt_dir with seed appended. The code also takes command-line arguments such as --eval, --onscreen_render, --ckpt_dir, and --policy_class.",
+ "type": "comment"
+ },
+ "598": {
+ "file_id": 32,
+ "content": " parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--batch_size', action='store', type=int, help='batch_size', required=True)\n parser.add_argument('--seed', action='store', type=int, help='seed', required=True)\n parser.add_argument('--num_epochs', action='store', type=int, help='num_epochs', required=True)\n parser.add_argument('--lr', action='store', type=float, help='lr', required=True)\n # for ACT\n parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)\n parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)\n parser.add_argument('--hidden_dim', action='store', type=int, help='hidden_dim', required=False)\n parser.add_argument('--dim_feedforward', action='store', type=int, help='dim_feedforward', required=False)\n parser.add_argument('--temporal_agg', action='store_true')\n parser.add_argument('--use_vq', action='store_true')",
+ "type": "code",
+ "location": "/train_latent_model.py:454-466"
+ },
+ "599": {
+ "file_id": 32,
+ "content": "This code defines command-line arguments for the program, specifying required and optional parameters such as task_name, batch_size, seed, num_epochs, lr, kl_weight, chunk_size, hidden_dim, dim_feedforward, and temporal_agg. These options allow the user to customize the training process of the latent model.",
+ "type": "comment"
+ }
+}
\ No newline at end of file
diff --git a/docs/data/6.json b/docs/data/6.json
new file mode 100644
index 00000000..6a126040
--- /dev/null
+++ b/docs/data/6.json
@@ -0,0 +1,547 @@
+{
+ "600": {
+ "file_id": 32,
+ "content": " parser.add_argument('--vq_class', action='store', type=int, help='vq_class')\n parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim')\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/train_latent_model.py:467-470"
+ },
+ "601": {
+ "file_id": 32,
+ "content": "This code is adding two arguments, \"--vq_class\" and \"--vq_dim\", to the parser using store action and specifying their types as integer (int). These arguments provide parameters for a latent model's class and dimensionality. The main function is then called with these parameters obtained from parsing command line arguments.",
+ "type": "comment"
+ },
+ "602": {
+ "file_id": 33,
+ "content": "/truncate_data.py",
+ "type": "filepath"
+ },
+ "603": {
+ "file_id": 33,
+ "content": "This script truncates and compresses a dataset using h5py, creating an observation group with limited image data. It saves truncated datasets or videos, extracts camera names, resizes images, and requires 'act-plus-plus' for argument parsing and directory manipulation. Output dataset directory has '_truncated' suffix.",
+ "type": "summary"
+ },
+ "604": {
+ "file_id": 33,
+ "content": "\"\"\"\nExample usage:\n$ python3 script/compress_data.py --dataset_dir /scr/lucyshi/dataset/aloha_test\n\"\"\"\nimport os\nimport h5py\nimport cv2\nimport numpy as np\nimport argparse\nfrom tqdm import tqdm\n# Constants\nDT = 0.02\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\nTRUNCATE_LEN = 2250\ndef compress_dataset(input_dataset_path, output_dataset_path):\n # Check if output path exists\n if os.path.exists(output_dataset_path):\n print(f\"The file {output_dataset_path} already exists. Exiting...\")\n return\n # Load the uncompressed dataset\n with h5py.File(input_dataset_path, 'r') as infile:\n # Create the compressed dataset\n with h5py.File(output_dataset_path, 'w') as outfile:\n outfile.attrs['sim'] = infile.attrs['sim']\n outfile.attrs['compress'] = True\n # Copy non-image data directly\n for key in infile.keys():\n if key != 'observations' and key != 'compress_len':",
+ "type": "code",
+ "location": "/truncate_data.py:1-35"
+ },
+ "605": {
+ "file_id": 33,
+ "content": "This script compresses a dataset by truncating its length and storing the compressed dataset in a new file. It checks if the output path already exists and copies non-image data directly to the output file. The script takes an input_dataset_path and an output_dataset_path as arguments, and it uses h5py library for handling HDF5 files.",
+ "type": "comment"
+ },
+ "606": {
+ "file_id": 33,
+ "content": " data = infile[key][:TRUNCATE_LEN]\n out_data = outfile.create_dataset(key, (TRUNCATE_LEN, data.shape[1]))\n out_data[:] = data\n data_compress_len = infile['compress_len']\n out_data_compress_len = outfile.create_dataset('compress_len', data_compress_len.shape)\n out_data_compress_len[:] = data_compress_len\n # Create observation group in the output\n obs_group = infile['observations']\n out_obs_group = outfile.create_group('observations')\n for key in obs_group.keys():\n if key != 'images':\n data = obs_group[key][:TRUNCATE_LEN]\n out_data = out_obs_group.create_dataset(key, (TRUNCATE_LEN, data.shape[1]))\n out_data[:] = data\n image_group = obs_group['images']\n out_image_group = out_obs_group.create_group('images')\n for cam_name in image_group.keys():\n data = image_group[cam_name][:TRUNCATE_LEN]",
+ "type": "code",
+ "location": "/truncate_data.py:36-57"
+ },
+ "607": {
+ "file_id": 33,
+ "content": "Truncates and compresses data, creates observation group with limited image data.",
+ "type": "comment"
+ },
+ "608": {
+ "file_id": 33,
+ "content": " out_data = out_image_group.create_dataset(cam_name, (TRUNCATE_LEN, data.shape[1]), dtype='uint8')\n out_data[:] = data\n print(f\"Truncated dataset saved to {output_dataset_path}\")\ndef save_videos(video, dt, video_path=None):\n if isinstance(video, list):\n cam_names = list(video[0].keys())\n h, w, _ = video[0][cam_names[0]].shape\n w = w * len(cam_names)\n fps = int(1/dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n # bitrate = 1000000\n # out.set(cv2.VIDEOWRITER_PROP_BITRATE, bitrate)\n for ts, image_dict in enumerate(video):\n images = []\n for cam_name in cam_names:\n image = image_dict[cam_name]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n images.append(image)\n images = np.concatenate(images, axis=1)\n out.write(images)\n out.release()\n print(f'Saved video to: {video_path}')\n elif isinstance(video, dict):",
+ "type": "code",
+ "location": "/truncate_data.py:58-84"
+ },
+ "609": {
+ "file_id": 33,
+ "content": "This code saves a truncated dataset or video depending on the input format. If a list of videos is given, it extracts camera names, resizes the images, and concatenates them into a single video. It then writes the video to the specified path and prints a success message.",
+ "type": "comment"
+ },
+ "610": {
+ "file_id": 33,
+ "content": " cam_names = list(video.keys())\n # Remove depth images\n cam_names = [cam_name for cam_name in cam_names if '_depth' not in cam_name]\n all_cam_videos = []\n for cam_name in cam_names:\n all_cam_videos.append(video[cam_name])\n all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension\n n_frames, h, w, _ = all_cam_videos.shape\n fps = int(1 / dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for t in range(n_frames):\n image = all_cam_videos[t]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n out.write(image)\n out.release()\n print(f'Saved video to: {video_path}')\ndef load_and_save_first_episode_video(dataset_dir, video_path):\n dataset_name = 'episode_0'\n _, _, _, _, image_dict = load_hdf5(dataset_dir, dataset_name)\n save_videos(image_dict, DT, video_path=video_path)\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')",
+ "type": "code",
+ "location": "/truncate_data.py:85-111"
+ },
+ "611": {
+ "file_id": 33,
+ "content": "The code loads and saves a video from an HDF5 file. It first removes depth images, concatenates the remaining videos along the width dimension, converts the BGR image to RGB, then writes the video to a specified path at the given frame rate. The function `load_and_save_first_episode_video` calls other functions to load the dataset and save the video.",
+ "type": "comment"
+ },
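+ Writing frames out with OpenCV, as save_videos does, boils down to a VideoWriter plus an RGB-to-BGR channel swap. A minimal sketch with random frames follows; the output path, frame sizes, and dt are placeholders:
+ ```python
+ import cv2
+ import numpy as np
+ 
+ frames = (np.random.rand(30, 240, 320, 3) * 255).astype(np.uint8)  # 30 RGB frames
+ dt = 0.02                                                          # control period, so fps = 50
+ h, w = frames.shape[1:3]
+ 
+ out = cv2.VideoWriter('demo.mp4', cv2.VideoWriter_fourcc(*'mp4v'), int(1 / dt), (w, h))
+ for frame in frames:
+     out.write(frame[:, :, [2, 1, 0]])  # swap R and B: OpenCV expects BGR ordering
+ out.release()
+ ```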
+ "612": {
+ "file_id": 33,
+ "content": " if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n compressed = root.attrs.get('compress', False)\n image_dict = dict()\n for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n if compressed:\n compress_len = root['/compress_len'][()]\n if compressed:\n for cam_id, cam_name in enumerate(image_dict.keys()):\n padded_compressed_image_list = image_dict[cam_name]\n image_list = []\n for frame_id, padded_compressed_image in enumerate(padded_compressed_image_list):\n image_len = int(compress_len[cam_id, frame_id])\n compressed_image = padded_compressed_image\n image = cv2.imdecode(compressed_image, 1)\n image_list.append(image)\n image_dict[cam_name] = image_list\n return None, None, None, None, image_dict # Return only the image dict for this application",
+ "type": "code",
+ "location": "/truncate_data.py:112-135"
+ },
+ "613": {
+ "file_id": 33,
+ "content": "This code checks if a dataset exists and reads the compressed image data from it. If compression is enabled, it decompresses the images and stores them in an image dictionary for further processing. The function returns only the image dictionary as the output.",
+ "type": "comment"
+ },
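+ The compressed datasets store JPEG buffers that are decoded frame by frame with cv2.imdecode. This sketch shows the encode/decode round trip on a synthetic image; the quality setting and image size are arbitrary:
+ ```python
+ import cv2
+ import numpy as np
+ 
+ image = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
+ 
+ # encode to a 1-D uint8 JPEG buffer (roughly what the recording side stores)
+ ok, encoded = cv2.imencode('.jpg', image, [int(cv2.IMWRITE_JPEG_QUALITY), 50])
+ assert ok
+ print(encoded.shape, encoded.dtype)          # e.g. (N, 1) uint8, much smaller than the raw frame
+ 
+ # decode back to an HxWx3 BGR image, as done when reading a compressed dataset
+ decoded = cv2.imdecode(encoded, 1)           # flag 1 == cv2.IMREAD_COLOR
+ print(decoded.shape)                         # (480, 640, 3)
+ ```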
+ "614": {
+ "file_id": 33,
+ "content": "if __name__ == '__main__':\n parser = argparse.ArgumentParser(description=\"Compress all HDF5 datasets in a directory.\")\n parser.add_argument('--dataset_dir', action='store', type=str, required=True, help='Directory containing the uncompressed datasets.')\n args = parser.parse_args()\n output_dataset_dir = args.dataset_dir + '_truncated'\n os.makedirs(output_dataset_dir, exist_ok=True)\n # Iterate over each file in the directory\n for filename in tqdm(os.listdir(args.dataset_dir), desc=\"Truncating data\"):\n if filename.endswith('.hdf5'):\n input_path = os.path.join(args.dataset_dir, filename)\n output_path = os.path.join(output_dataset_dir, filename)\n compress_dataset(input_path, output_path)\n # After processing all datasets, load and save the video for the first episode\n print(f'Saving video for episode 0 in {output_dataset_dir}')\n video_path = os.path.join(output_dataset_dir, 'episode_0_video.mp4')\n load_and_save_first_episode_video(output_dataset_dir, video_path)",
+ "type": "code",
+ "location": "/truncate_data.py:138-157"
+ },
+ "615": {
+ "file_id": 33,
+ "content": "This code compresses all HDF5 datasets in a specified directory and saves the video for the first episode. It requires the 'act-plus-plus' library and utilizes argument parsing, file iteration, and directory creation/manipulation. The output dataset directory is created as a suffix of the input dataset directory with '_truncated'.",
+ "type": "comment"
+ },
+ "616": {
+ "file_id": 34,
+ "content": "/utils.py",
+ "type": "filepath"
+ },
+ "617": {
+ "file_id": 34,
+ "content": "EpisodicDataset class processes data, applies augmentations, handles legacy data, and provides torch tensor compatibility for model usage. It loads images, creates masks, retrieves stats, and includes functions for locating HDF5 files, generating batches, pre/post-processing, sampling poses, calculating means, and setting random seeds.",
+ "type": "summary"
+ },
+ "618": {
+ "file_id": 34,
+ "content": "import numpy as np\nimport torch\nimport os\nimport h5py\nimport pickle\nimport fnmatch\nimport cv2\nfrom time import time\nfrom torch.utils.data import TensorDataset, DataLoader\nimport torchvision.transforms as transforms\nimport IPython\ne = IPython.embed\ndef flatten_list(l):\n return [item for sublist in l for item in sublist]\nclass EpisodicDataset(torch.utils.data.Dataset):\n def __init__(self, dataset_path_list, camera_names, norm_stats, episode_ids, episode_len, chunk_size, policy_class):\n super(EpisodicDataset).__init__()\n self.episode_ids = episode_ids\n self.dataset_path_list = dataset_path_list\n self.camera_names = camera_names\n self.norm_stats = norm_stats\n self.episode_len = episode_len\n self.chunk_size = chunk_size\n self.cumulative_len = np.cumsum(self.episode_len)\n self.max_episode_len = max(episode_len)\n self.policy_class = policy_class\n if self.policy_class == 'Diffusion':\n self.augment_images = True\n else:\n self.augment_images = False",
+ "type": "code",
+ "location": "/utils.py:1-33"
+ },
+ "619": {
+ "file_id": 34,
+ "content": "Class EpisodicDataset loads episode data from a list of paths. It can optionally augment images depending on the chosen policy class. The dataset is initialized with the given parameters, including the number of episodes, their IDs, and lengths. It calculates the cumulative length of episodes and checks if the policy class is \"Diffusion\" to determine whether or not to apply image augmentations.",
+ "type": "comment"
+ },
+ "620": {
+ "file_id": 34,
+ "content": " self.transformations = None\n self.__getitem__(0) # initialize self.is_sim and self.transformations\n self.is_sim = False\n # def __len__(self):\n # return sum(self.episode_len)\n def _locate_transition(self, index):\n assert index < self.cumulative_len[-1]\n episode_index = np.argmax(self.cumulative_len > index) # argmax returns first True index\n start_ts = index - (self.cumulative_len[episode_index] - self.episode_len[episode_index])\n episode_id = self.episode_ids[episode_index]\n return episode_id, start_ts\n def __getitem__(self, index):\n episode_id, start_ts = self._locate_transition(index)\n dataset_path = self.dataset_path_list[episode_id]\n try:\n # print(dataset_path)\n with h5py.File(dataset_path, 'r') as root:\n try: # some legacy data does not have this attribute\n is_sim = root.attrs['sim']\n except:\n is_sim = False\n compressed = root.attrs.get('compress', False)",
+ "type": "code",
+ "location": "/utils.py:34-58"
+ },
+ "621": {
+ "file_id": 34,
+ "content": "This code initializes transformations and is_sim, defines a function to locate transition based on index, and gets item at specified index by locating the transition using episode ID and start timestamp. It also handles legacy data without certain attributes.",
+ "type": "comment"
+ },
+ "622": {
+ "file_id": 34,
+ "content": " if '/base_action' in root:\n base_action = root['/base_action'][()]\n base_action = preprocess_base_action(base_action)\n action = np.concatenate([root['/action'][()], base_action], axis=-1)\n else: \n action = root['/action'][()]\n dummy_base_action = np.zeros([action.shape[0], 2])\n action = np.concatenate([action, dummy_base_action], axis=-1)\n original_action_shape = action.shape\n episode_len = original_action_shape[0]\n # get observation at start_ts only\n qpos = root['/observations/qpos'][start_ts]\n qvel = root['/observations/qvel'][start_ts]\n image_dict = dict()\n for cam_name in self.camera_names:\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][start_ts]\n if compressed:\n for cam_name in image_dict.keys():",
+ "type": "code",
+ "location": "/utils.py:59-77"
+ },
+ "623": {
+ "file_id": 34,
+ "content": "This code block is for processing the input data based on whether a base action is specified or not. If it exists, the base action is preprocessed and concatenated with the given action, otherwise a dummy base action is added before concatenation. It also stores the initial observation and image data at the start timestamp.",
+ "type": "comment"
+ },
+ "624": {
+ "file_id": 34,
+ "content": " decompressed_image = cv2.imdecode(image_dict[cam_name], 1)\n image_dict[cam_name] = np.array(decompressed_image)\n # get all actions after and including start_ts\n if is_sim:\n action = action[start_ts:]\n action_len = episode_len - start_ts\n else:\n action = action[max(0, start_ts - 1):] # hack, to make timesteps more aligned\n action_len = episode_len - max(0, start_ts - 1) # hack, to make timesteps more aligned\n # self.is_sim = is_sim\n padded_action = np.zeros((self.max_episode_len, original_action_shape[1]), dtype=np.float32)\n padded_action[:action_len] = action\n is_pad = np.zeros(self.max_episode_len)\n is_pad[action_len:] = 1\n padded_action = padded_action[:self.chunk_size]\n is_pad = is_pad[:self.chunk_size]\n # new axis for different cameras\n all_cam_images = []",
+ "type": "code",
+ "location": "/utils.py:78-99"
+ },
+ "625": {
+ "file_id": 34,
+ "content": "This code segment is preprocessing video data for an agent in a simulation. It loads and decompresses images from the dictionary, adjusts actions based on timestamps, pads actions to match the maximum episode length, creates a padding mask, and stores camera images into a list.",
+ "type": "comment"
+ },
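+ The padding logic described above (zero-pad the remaining actions to the maximum episode length, mark the padded steps, then cut to the chunk size) can be seen with toy shapes:
+ ```python
+ import numpy as np
+ 
+ max_episode_len, chunk_size, action_dim = 10, 4, 2
+ action = np.arange(6 * action_dim, dtype=np.float32).reshape(6, action_dim)   # 6 remaining steps
+ 
+ padded_action = np.zeros((max_episode_len, action_dim), dtype=np.float32)
+ padded_action[:len(action)] = action
+ is_pad = np.zeros(max_episode_len)
+ is_pad[len(action):] = 1            # 1 marks padded (invalid) timesteps
+ 
+ padded_action = padded_action[:chunk_size]   # the policy only consumes a fixed-size chunk
+ is_pad = is_pad[:chunk_size]
+ print(padded_action.shape, is_pad)           # (4, 2) [0. 0. 0. 0.]
+ ```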
+ "626": {
+ "file_id": 34,
+ "content": " for cam_name in self.camera_names:\n all_cam_images.append(image_dict[cam_name])\n all_cam_images = np.stack(all_cam_images, axis=0)\n # construct observations\n image_data = torch.from_numpy(all_cam_images)\n qpos_data = torch.from_numpy(qpos).float()\n action_data = torch.from_numpy(padded_action).float()\n is_pad = torch.from_numpy(is_pad).bool()\n # channel last\n image_data = torch.einsum('k h w c -> k c h w', image_data)\n # augmentation\n if self.transformations is None:\n print('Initializing transformations')\n original_size = image_data.shape[2:]\n ratio = 0.95\n self.transformations = [\n transforms.RandomCrop(size=[int(original_size[0] * ratio), int(original_size[1] * ratio)]),\n transforms.Resize(original_size, antialias=True),\n transforms.RandomRotation(degrees=[-5.0, 5.0], expand=False),",
+ "type": "code",
+ "location": "/utils.py:100-121"
+ },
+ "627": {
+ "file_id": 34,
+ "content": "The code reads images from multiple camera sources, stacks them into a single numpy array, and converts the arrays to torch tensors. It then rearranges the image tensor's dimensions for compatibility with the model, applies optional augmentations such as cropping and rotation, and assigns boolean values to indicate padding positions.",
+ "type": "comment"
+ },
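+ "627a": {
+ "file_id": 34,
+ "content": "A small illustrative sketch (dummy tensor, not repository code) of the channel reordering used above: torch.einsum can permute the axes of a stack of k camera images from channel-last to channel-first.\n\nimport torch\n\nimages = torch.zeros(3, 480, 640, 3)              # k cameras, H, W, C (channel-last)\nimages_chw = torch.einsum('khwc->kchw', images)   # same effect as images.permute(0, 3, 1, 2)\nprint(images_chw.shape)                           # torch.Size([3, 3, 480, 640])",
+ "type": "example"
+ },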
+ "628": {
+ "file_id": 34,
+ "content": " transforms.ColorJitter(brightness=0.3, contrast=0.4, saturation=0.5) #, hue=0.08)\n ]\n if self.augment_images:\n for transform in self.transformations:\n image_data = transform(image_data)\n # normalize image and change dtype to float\n image_data = image_data / 255.0\n if self.policy_class == 'Diffusion':\n # normalize to [-1, 1]\n action_data = ((action_data - self.norm_stats[\"action_min\"]) / (self.norm_stats[\"action_max\"] - self.norm_stats[\"action_min\"])) * 2 - 1\n else:\n # normalize to mean 0 std 1\n action_data = (action_data - self.norm_stats[\"action_mean\"]) / self.norm_stats[\"action_std\"]\n qpos_data = (qpos_data - self.norm_stats[\"qpos_mean\"]) / self.norm_stats[\"qpos_std\"]\n except:\n print(f'Error loading {dataset_path} in __getitem__')\n quit()\n # print(image_data.dtype, qpos_data.dtype, action_data.dtype, is_pad.dtype)",
+ "type": "code",
+ "location": "/utils.py:122-145"
+ },
+ "629": {
+ "file_id": 34,
+ "content": "The code applies transformations to image data, normalizes the image and action data based on policy class, and adjusts qpos data based on mean and std. It also handles any potential errors while loading the dataset.",
+ "type": "comment"
+ },
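+ "629a": {
+ "file_id": 34,
+ "content": "An illustrative sketch (not repository code) of the two normalization schemes mentioned above, using made-up statistics in place of norm_stats.\n\nimport numpy as np\n\naction = np.array([0.2, -0.5, 1.0])\nstats = {'action_min': np.array([-1.0, -1.0, -1.0]), 'action_max': np.array([1.0, 1.0, 2.0]),\n         'action_mean': np.array([0.0, 0.1, 0.5]), 'action_std': np.array([0.5, 0.4, 0.8])}\n\n# Diffusion policy: scale each dimension to [-1, 1]\naction_minmax = (action - stats['action_min']) / (stats['action_max'] - stats['action_min']) * 2 - 1\n\n# other policies: zero mean, unit variance\naction_zscore = (action - stats['action_mean']) / stats['action_std']",
+ "type": "example"
+ },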
+ "630": {
+ "file_id": 34,
+ "content": " return image_data, qpos_data, action_data, is_pad\ndef get_norm_stats(dataset_path_list):\n all_qpos_data = []\n all_action_data = []\n all_episode_len = []\n for dataset_path in dataset_path_list:\n try:\n with h5py.File(dataset_path, 'r') as root:\n qpos = root['/observations/qpos'][()]\n qvel = root['/observations/qvel'][()]\n if '/base_action' in root:\n base_action = root['/base_action'][()]\n base_action = preprocess_base_action(base_action)\n action = np.concatenate([root['/action'][()], base_action], axis=-1)\n else:\n action = root['/action'][()]\n dummy_base_action = np.zeros([action.shape[0], 2])\n action = np.concatenate([action, dummy_base_action], axis=-1)\n except Exception as e:\n print(f'Error loading {dataset_path} in get_norm_stats')\n print(e)\n quit()\n all_qpos_data.append(torch.from_numpy(qpos))",
+ "type": "code",
+ "location": "/utils.py:146-171"
+ },
+ "631": {
+ "file_id": 34,
+ "content": "This function, \"get_norm_stats\", takes a list of dataset paths and returns image data, qpos data, action data, and an indicator whether the pad is needed or not. It first initializes empty lists for all_qpos_data, all_action_data, and all_episode_len. Then, it iterates over each dataset path in the list. For each path, it opens the HDF5 file using 'r' mode and extracts qpos and qvel data from specific paths within the file. If a '/base_action' path exists, it retrieves base_action data and preprocesses it before concatenating with action data. Otherwise, it assumes dummy base_action and performs concatenation. The extracted data is appended to their respective lists, but if an error occurs during loading, the function prints an error message and quits.",
+ "type": "comment"
+ },
+ "632": {
+ "file_id": 34,
+ "content": " all_action_data.append(torch.from_numpy(action))\n all_episode_len.append(len(qpos))\n all_qpos_data = torch.cat(all_qpos_data, dim=0)\n all_action_data = torch.cat(all_action_data, dim=0)\n # normalize action data\n action_mean = all_action_data.mean(dim=[0]).float()\n action_std = all_action_data.std(dim=[0]).float()\n action_std = torch.clip(action_std, 1e-2, np.inf) # clipping\n # normalize qpos data\n qpos_mean = all_qpos_data.mean(dim=[0]).float()\n qpos_std = all_qpos_data.std(dim=[0]).float()\n qpos_std = torch.clip(qpos_std, 1e-2, np.inf) # clipping\n action_min = all_action_data.min(dim=0).values.float()\n action_max = all_action_data.max(dim=0).values.float()\n eps = 0.0001\n stats = {\"action_mean\": action_mean.numpy(), \"action_std\": action_std.numpy(),\n \"action_min\": action_min.numpy() - eps,\"action_max\": action_max.numpy() + eps,\n \"qpos_mean\": qpos_mean.numpy(), \"qpos_std\": qpos_std.numpy(),\n \"example_qpos\": qpos}\n return stats, all_episode_len",
+ "type": "code",
+ "location": "/utils.py:172-196"
+ },
+ "633": {
+ "file_id": 34,
+ "content": "This code is processing and normalizing data for training in a machine learning context. It appends action and qpos data, normalizes the action and qpos data by calculating their means, standard deviations, and clipping them to avoid large values, and stores these statistics along with minimum and maximum action values and an example qpos. Finally, it returns these statistics and all episode lengths.",
+ "type": "comment"
+ },
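+ "633a": {
+ "file_id": 34,
+ "content": "An illustrative sketch of the std clipping mentioned above: flooring the per-dimension standard deviation keeps a nearly constant action dimension from producing huge normalized values. The tensor is synthetic.\n\nimport torch\n\nall_action_data = torch.randn(1000, 16)\nall_action_data[:, 0] = 0.0                              # a dimension that never moves\naction_std = all_action_data.std(dim=0)\naction_std = torch.clip(action_std, 1e-2, float('inf'))  # floor at 1e-2, as in get_norm_stats\nprint(action_std[0])                                     # tensor(0.0100) instead of 0",
+ "type": "example"
+ },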
+ "634": {
+ "file_id": 34,
+ "content": "def find_all_hdf5(dataset_dir, skip_mirrored_data):\n hdf5_files = []\n for root, dirs, files in os.walk(dataset_dir):\n for filename in fnmatch.filter(files, '*.hdf5'):\n if 'features' in filename: continue\n if skip_mirrored_data and 'mirror' in filename:\n continue\n hdf5_files.append(os.path.join(root, filename))\n print(f'Found {len(hdf5_files)} hdf5 files')\n return hdf5_files\ndef BatchSampler(batch_size, episode_len_l, sample_weights):\n sample_probs = np.array(sample_weights) / np.sum(sample_weights) if sample_weights is not None else None\n sum_dataset_len_l = np.cumsum([0] + [np.sum(episode_len) for episode_len in episode_len_l])\n while True:\n batch = []\n for _ in range(batch_size):\n episode_idx = np.random.choice(len(episode_len_l), p=sample_probs)\n step_idx = np.random.randint(sum_dataset_len_l[episode_idx], sum_dataset_len_l[episode_idx + 1])\n batch.append(step_idx)\n yield batch",
+ "type": "code",
+ "location": "/utils.py:198-218"
+ },
+ "635": {
+ "file_id": 34,
+ "content": "The code provides two functions: \"find_all_hdf5\" and \"BatchSampler\". The first function searches for all HDF5 files in a specified directory, excluding any with 'features' in their name or 'mirror' if skipping mirrored data is set. It then returns the list of found files. The second function, BatchSampler, generates batches of samples from a list of episode lengths and sample weights (if provided). It randomly selects an episode, a step within that episode, and appends it to the batch until the desired batch size is reached.",
+ "type": "comment"
+ },
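+ "635a": {
+ "file_id": 34,
+ "content": "A minimal sketch (two fabricated datasets with the episode lengths shown) of the weighted flat-index sampling that BatchSampler performs.\n\nimport numpy as np\n\nepisode_len_l = [[400, 400], [1300]]             # per-dataset episode lengths\nsample_weights = [1, 2]                          # second dataset sampled twice as often\nprobs = np.array(sample_weights) / np.sum(sample_weights)\ncumsum = np.cumsum([0] + [np.sum(l) for l in episode_len_l])  # [0, 800, 2100]\n\nbatch = []\nfor _ in range(8):\n    d = np.random.choice(len(episode_len_l), p=probs)          # pick a dataset\n    batch.append(np.random.randint(cumsum[d], cumsum[d + 1]))  # flat step index inside it\nprint(batch)",
+ "type": "example"
+ },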
+ "636": {
+ "file_id": 34,
+ "content": "def load_data(dataset_dir_l, name_filter, camera_names, batch_size_train, batch_size_val, chunk_size, skip_mirrored_data=False, load_pretrain=False, policy_class=None, stats_dir_l=None, sample_weights=None, train_ratio=0.99):\n if type(dataset_dir_l) == str:\n dataset_dir_l = [dataset_dir_l]\n dataset_path_list_list = [find_all_hdf5(dataset_dir, skip_mirrored_data) for dataset_dir in dataset_dir_l]\n num_episodes_0 = len(dataset_path_list_list[0])\n dataset_path_list = flatten_list(dataset_path_list_list)\n dataset_path_list = [n for n in dataset_path_list if name_filter(n)]\n num_episodes_l = [len(dataset_path_list) for dataset_path_list in dataset_path_list_list]\n num_episodes_cumsum = np.cumsum(num_episodes_l)\n # obtain train test split on dataset_dir_l[0]\n shuffled_episode_ids_0 = np.random.permutation(num_episodes_0)\n train_episode_ids_0 = shuffled_episode_ids_0[:int(train_ratio * num_episodes_0)]\n val_episode_ids_0 = shuffled_episode_ids_0[int(train_ratio * num_episodes_0):]",
+ "type": "code",
+ "location": "/utils.py:220-233"
+ },
+ "637": {
+ "file_id": 34,
+ "content": "This function loads data from one or multiple directories, applying a name filter and splitting the data into training and validation sets. It also supports skipping mirrored data and loading pre-trained data. The train/val split is done based on a provided ratio, and the data is shuffled randomly before splitting.",
+ "type": "comment"
+ },
+ "638": {
+ "file_id": 34,
+ "content": " train_episode_ids_l = [train_episode_ids_0] + [np.arange(num_episodes) + num_episodes_cumsum[idx] for idx, num_episodes in enumerate(num_episodes_l[1:])]\n val_episode_ids_l = [val_episode_ids_0]\n train_episode_ids = np.concatenate(train_episode_ids_l)\n val_episode_ids = np.concatenate(val_episode_ids_l)\n print(f'\\n\\nData from: {dataset_dir_l}\\n- Train on {[len(x) for x in train_episode_ids_l]} episodes\\n- Test on {[len(x) for x in val_episode_ids_l]} episodes\\n\\n')\n # obtain normalization stats for qpos and action\n # if load_pretrain:\n # with open(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'dataset_stats.pkl'), 'rb') as f:\n # norm_stats = pickle.load(f)\n # print('Loaded pretrain dataset stats')\n _, all_episode_len = get_norm_stats(dataset_path_list)\n train_episode_len_l = [[all_episode_len[i] for i in train_episode_ids] for train_episode_ids in train_episode_ids_l]\n val_episode_len_l = [[all_episode_len[i] for i in val_episode_ids] for val_episode_ids in val_episode_ids_l]",
+ "type": "code",
+ "location": "/utils.py:234-247"
+ },
+ "639": {
+ "file_id": 34,
+ "content": "Code generates train and validation episode IDs for multiple datasets, concatenates them, and prints details about the data. It also loads normalization stats for qpos and action (if load_pretrain is True) from a specific file path. The code then calculates the length of each episode for training and validation sets based on all_episode_len list.",
+ "type": "comment"
+ },
+ "640": {
+ "file_id": 34,
+ "content": " train_episode_len = flatten_list(train_episode_len_l)\n val_episode_len = flatten_list(val_episode_len_l)\n if stats_dir_l is None:\n stats_dir_l = dataset_dir_l\n elif type(stats_dir_l) == str:\n stats_dir_l = [stats_dir_l]\n norm_stats, _ = get_norm_stats(flatten_list([find_all_hdf5(stats_dir, skip_mirrored_data) for stats_dir in stats_dir_l]))\n print(f'Norm stats from: {stats_dir_l}')\n batch_sampler_train = BatchSampler(batch_size_train, train_episode_len_l, sample_weights)\n batch_sampler_val = BatchSampler(batch_size_val, val_episode_len_l, None)\n # print(f'train_episode_len: {train_episode_len}, val_episode_len: {val_episode_len}, train_episode_ids: {train_episode_ids}, val_episode_ids: {val_episode_ids}')\n # construct dataset and dataloader\n train_dataset = EpisodicDataset(dataset_path_list, camera_names, norm_stats, train_episode_ids, train_episode_len, chunk_size, policy_class)\n val_dataset = EpisodicDataset(dataset_path_list, camera_names, norm_stats, val_episode_ids, val_episode_len, chunk_size, policy_class)",
+ "type": "code",
+ "location": "/utils.py:248-264"
+ },
+ "641": {
+ "file_id": 34,
+ "content": "This code block initializes training and validation episode lengths, checks the stats directory type, fetches normalization statistics from HDF5 files, creates batch samplers for training and validation sets, constructs EpisodicDataset instances for training and validation data.",
+ "type": "comment"
+ },
+ "642": {
+ "file_id": 34,
+ "content": " train_num_workers = (8 if os.getlogin() == 'zfu' else 16) if train_dataset.augment_images else 2\n val_num_workers = 8 if train_dataset.augment_images else 2\n print(f'Augment images: {train_dataset.augment_images}, train_num_workers: {train_num_workers}, val_num_workers: {val_num_workers}')\n train_dataloader = DataLoader(train_dataset, batch_sampler=batch_sampler_train, pin_memory=True, num_workers=train_num_workers, prefetch_factor=2)\n val_dataloader = DataLoader(val_dataset, batch_sampler=batch_sampler_val, pin_memory=True, num_workers=val_num_workers, prefetch_factor=2)\n return train_dataloader, val_dataloader, norm_stats, train_dataset.is_sim\ndef calibrate_linear_vel(base_action, c=None):\n if c is None:\n c = 0.0 # 0.19\n v = base_action[..., 0]\n w = base_action[..., 1]\n base_action = base_action.copy()\n base_action[..., 0] = v - c * w\n return base_action\ndef smooth_base_action(base_action):\n return np.stack([\n np.convolve(base_action[:, i], np.ones(5)/5, mode='same') for i in range(base_action.shape[1])",
+ "type": "code",
+ "location": "/utils.py:265-284"
+ },
+ "643": {
+ "file_id": 34,
+ "content": "This code sets the number of workers for training and validation data loaders based on whether images are being augmented or not. It also defines a function to calibrate linear velocity, smooths the base action using convolution with a moving average filter, and returns the train and validation dataloaders along with other variables.",
+ "type": "comment"
+ },
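+ "643a": {
+ "file_id": 34,
+ "content": "A short illustrative sketch of the 5-step moving average used by smooth_base_action; the base_action array here is fabricated.\n\nimport numpy as np\n\nbase_action = np.random.randn(200, 2)  # (timesteps, [linear_vel, angular_vel])\nsmoothed = np.stack([\n    np.convolve(base_action[:, i], np.ones(5) / 5, mode='same')\n    for i in range(base_action.shape[1])\n], axis=-1).astype(np.float32)\nprint(smoothed.shape)  # (200, 2)",
+ "type": "example"
+ },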
+ "644": {
+ "file_id": 34,
+ "content": " ], axis=-1).astype(np.float32)\ndef preprocess_base_action(base_action):\n # base_action = calibrate_linear_vel(base_action)\n base_action = smooth_base_action(base_action)\n return base_action\ndef postprocess_base_action(base_action):\n linear_vel, angular_vel = base_action\n linear_vel *= 1.0\n angular_vel *= 1.0\n # angular_vel = 0\n # if np.abs(linear_vel) < 0.05:\n # linear_vel = 0\n return np.array([linear_vel, angular_vel])\n### env utils\ndef sample_box_pose():\n x_range = [0.0, 0.2]\n y_range = [0.4, 0.6]\n z_range = [0.05, 0.05]\n ranges = np.vstack([x_range, y_range, z_range])\n cube_position = np.random.uniform(ranges[:, 0], ranges[:, 1])\n cube_quat = np.array([1, 0, 0, 0])\n return np.concatenate([cube_position, cube_quat])\ndef sample_insertion_pose():\n # Peg\n x_range = [0.1, 0.2]\n y_range = [0.4, 0.6]\n z_range = [0.05, 0.05]\n ranges = np.vstack([x_range, y_range, z_range])\n peg_position = np.random.uniform(ranges[:, 0], ranges[:, 1])\n peg_quat = np.array([1, 0, 0, 0])",
+ "type": "code",
+ "location": "/utils.py:285-324"
+ },
+ "645": {
+ "file_id": 34,
+ "content": "This code defines several functions for preprocessing and postprocessing base actions, as well as sampling random poses for objects. It uses numpy array manipulations and random sampling to accomplish these tasks. The calibration and smoothing of the base action are used to refine input data before it is passed on or returned from a function. The two pose-sampling functions generate random positions and orientations for an object (cube or peg) within specified ranges.",
+ "type": "comment"
+ },
+ "646": {
+ "file_id": 34,
+ "content": " peg_pose = np.concatenate([peg_position, peg_quat])\n # Socket\n x_range = [-0.2, -0.1]\n y_range = [0.4, 0.6]\n z_range = [0.05, 0.05]\n ranges = np.vstack([x_range, y_range, z_range])\n socket_position = np.random.uniform(ranges[:, 0], ranges[:, 1])\n socket_quat = np.array([1, 0, 0, 0])\n socket_pose = np.concatenate([socket_position, socket_quat])\n return peg_pose, socket_pose\n### helper functions\ndef compute_dict_mean(epoch_dicts):\n result = {k: None for k in epoch_dicts[0]}\n num_items = len(epoch_dicts)\n for k in result:\n value_sum = 0\n for epoch_dict in epoch_dicts:\n value_sum += epoch_dict[k]\n result[k] = value_sum / num_items\n return result\ndef detach_dict(d):\n new_d = dict()\n for k, v in d.items():\n new_d[k] = v.detach()\n return new_d\ndef set_seed(seed):\n torch.manual_seed(seed)\n np.random.seed(seed)",
+ "type": "code",
+ "location": "/utils.py:325-360"
+ },
+ "647": {
+ "file_id": 34,
+ "content": "Function: compute_dict_mean\nPurpose: Calculate the mean of values for each key in a list of dictionaries.\n\nFunction: detach_dict\nPurpose: Create a new dictionary where all values are detached from their current computation graph.\n\nFunction: set_seed\nPurpose: Set random seed for both PyTorch and NumPy to ensure reproducible results.",
+ "type": "comment"
+ },
+ "648": {
+ "file_id": 35,
+ "content": "/vinn_cache_feature.py",
+ "type": "filepath"
+ },
+ "649": {
+ "file_id": 35,
+ "content": "The code imports libraries, sets parameters, initializes models and preprocesses images for feature extraction. It performs inference, saves features to an HDF5 file, converts tensors to NumPy arrays, and prints the total time taken using argument parser.",
+ "type": "summary"
+ },
+ "650": {
+ "file_id": 35,
+ "content": "import torch\nimport argparse\nimport pathlib\nfrom torch import nn\nimport torchvision\nimport os\nimport time\nimport h5py\nimport h5py\nfrom torchvision import models, transforms\nfrom PIL import Image\nfrom tqdm import tqdm\nimport cv2\nimport numpy as np\nimport IPython\ne = IPython.embed\ndef chunks(lst, n):\n \"\"\"Yield successive n-sized chunks from lst.\"\"\"\n for i in range(0, len(lst), n):\n yield lst[i:i + n]\ndef expand_greyscale(t):\n return t.expand(3, -1, -1)\ndef main(args):\n #################################################\n batch_size = 256\n #################################################\n ckpt_path = args.ckpt_path\n dataset_dir = args.dataset_dir\n ckpt_name = pathlib.PurePath(ckpt_path).name\n dataset_name = ckpt_name.split('-')[1]\n repr_type = ckpt_name.split('-')[0]\n seed = int(ckpt_name.split('-')[-1][:-3])\n if 'cotrain' in ckpt_name:\n repr_type += '_cotrain'\n episode_idxs = [int(name.split('_')[1].split('.')[0]) for name in os.listdir(dataset_dir) if ('.hdf5' in name) and ('features' not in name)]",
+ "type": "code",
+ "location": "/vinn_cache_feature.py:1-44"
+ },
+ "651": {
+ "file_id": 35,
+ "content": "This code imports necessary libraries, defines functions for chunking lists and expanding greyscale images, and sets parameters such as batch size. It also takes command-line arguments for the checkpoint path and dataset directory, extracts relevant information from the checkpoint name, and lists all episode indexes in the dataset.",
+ "type": "comment"
+ },
+ "652": {
+ "file_id": 35,
+ "content": " episode_idxs.sort()\n assert len(episode_idxs) == episode_idxs[-1] + 1 # no holes\n num_episodes = len(episode_idxs)\n feature_extractors = {}\n for episode_idx in range(num_episodes):\n # load all images\n print(f'loading data')\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_idx}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n image_dict = {}\n camera_names = list(root[f'/observations/images/'].keys())\n print(f'Camera names: {camera_names}')\n for cam_name in camera_names:\n image = root[f'/observations/images/{cam_name}'][:]\n uncompressed_image = []\n for im in image:\n im = np.array(cv2.imdecode(im, 1))\n uncompressed_image.append(im)\n image = np.stack(uncompressed_image, axis=0)\n image_dict[cam_name] = image\n print(f'loading model')\n # load pretrain nets after cam names are known\n if not feature_extractors:",
+ "type": "code",
+ "location": "/vinn_cache_feature.py:45-72"
+ },
+ "653": {
+ "file_id": 35,
+ "content": "Loading data and models for each episode, ensuring no holes in the episode indices, and creating feature extractors. The code first checks if there are any existing feature extractors, then loads images and models for each camera name within the dataset, and stores them in a dictionary.",
+ "type": "comment"
+ },
+ "654": {
+ "file_id": 35,
+ "content": " for cam_name in camera_names:\n resnet = torchvision.models.resnet18(pretrained=True)\n loading_status = resnet.load_state_dict(torch.load(ckpt_path.replace('DUMMY', cam_name)))\n print(cam_name, loading_status)\n resnet = nn.Sequential(*list(resnet.children())[:-1])\n resnet = resnet.cuda()\n resnet.eval()\n feature_extractors[cam_name] = resnet\n # inference with resnet\n feature_dict = {}\n for cam_name, images in image_dict.items():\n # Preprocess images\n image_size = 120 # TODO NOTICE: reduced resolution\n transform = transforms.Compose([\n transforms.Resize(image_size), # will scale the image\n transforms.CenterCrop(image_size),\n transforms.ToTensor(),\n transforms.Lambda(expand_greyscale),\n transforms.Normalize(\n mean=torch.tensor([0.485, 0.456, 0.406]),",
+ "type": "code",
+ "location": "/vinn_cache_feature.py:73-93"
+ },
+ "655": {
+ "file_id": 35,
+ "content": "This code initializes a ResNet18 model for each camera name, loads the checkpoint file with the corresponding camera name, modifies the model, and stores it in feature_extractors. Then, it preprocesses images using specified transforms and normalization before passing them to the model for inference.",
+ "type": "comment"
+ },
+ "656": {
+ "file_id": 35,
+ "content": " std=torch.tensor([0.229, 0.224, 0.225])),\n ])\n processed_images = []\n for image in tqdm(images):\n image = Image.fromarray(image)\n image = transform(image)\n processed_images.append(image)\n processed_images = torch.stack(processed_images).cuda()\n # query the model\n all_features = []\n with torch.inference_mode():\n for batch in chunks(processed_images, batch_size):\n print('inference')\n features = feature_extractors[cam_name](batch)\n features = features.squeeze(axis=3).squeeze(axis=2)\n all_features.append(features)\n all_features = torch.cat(all_features, axis=0)\n max_timesteps = all_features.shape[0]\n feature_dict[cam_name] = all_features\n # TODO START diagnostics\n # first_image = images[0]\n # first_processed_image = processed_images[0].cpu().numpy()",
+ "type": "code",
+ "location": "/vinn_cache_feature.py:94-117"
+ },
+ "657": {
+ "file_id": 35,
+ "content": "This code processes images, queries a model for features, and stores the extracted features in a dictionary. It uses torch.tensor for standardization, Image.fromarray to convert image to PIL image, transforms images, stacks them, performs inference mode, extracts features from each batch of processed images, concatenates them into all_features list, and finally stores them in feature_dict.",
+ "type": "comment"
+ },
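+ "657a": {
+ "file_id": 35,
+ "content": "A small sketch (assumed shapes and a stand-in extractor, not repository code) of batched feature extraction with the chunks helper and torch.inference_mode, as described above.\n\nimport torch\nfrom torch import nn\n\ndef chunks(lst, n):\n    for i in range(0, len(lst), n):\n        yield lst[i:i + n]\n\nextractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 120 * 120, 512))  # stand-in for the ResNet trunk\nprocessed_images = torch.randn(1000, 3, 120, 120)\n\nall_features = []\nwith torch.inference_mode():\n    for batch in chunks(processed_images, 256):\n        all_features.append(extractor(batch))\nall_features = torch.cat(all_features, dim=0)\nprint(all_features.shape)  # torch.Size([1000, 512])",
+ "type": "example"
+ },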
+ "658": {
+ "file_id": 35,
+ "content": " # first_feature = all_features[0].cpu().numpy()\n # import numpy as np\n # np.save('first_image.npy', first_image)\n # np.save('first_processed_image.npy', first_processed_image)\n # np.save('first_feature.npy', first_feature)\n # torch.save(resnet.state_dict(), 'rn.ckpt')\n # e()\n # exit()\n # TODO END diagnostics\n # save\n dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_idx}.hdf5')\n print(dataset_path)\n # HDF5\n t0 = time.time()\n with h5py.File(dataset_path, 'w', rdcc_nbytes=1024 ** 2 * 2) as root:\n features = root.create_group('features')\n for cam_name, array in feature_dict.items():\n cam_feature = features.create_dataset(cam_name, (max_timesteps, 512))\n features[cam_name][...] = array.cpu().numpy()\n print(f'Saving: {time.time() - t0:.1f} secs\\n')\nif __name__ == '__main__':",
+ "type": "code",
+ "location": "/vinn_cache_feature.py:118-142"
+ },
+ "659": {
+ "file_id": 35,
+ "content": "The code is saving features to an HDF5 file. It creates a group called 'features' within the file and then saves feature data for each camera name in the feature_dict as datasets within the 'features' group. The feature data is converted from PyTorch tensors to NumPy arrays before being saved, and the total time taken to save the features is printed.",
+ "type": "comment"
+ },
+ "660": {
+ "file_id": 35,
+ "content": " parser = argparse.ArgumentParser(description='cache features')\n parser.add_argument('--ckpt_path', type=str, required=True, help='ckpt_path')\n parser.add_argument('--dataset_dir', type=str, required=True, help='dataset_dir')\n args = parser.parse_args()\n main(args)",
+ "type": "code",
+ "location": "/vinn_cache_feature.py:143-148"
+ },
+ "661": {
+ "file_id": 35,
+ "content": "This code sets up an argument parser, adds arguments for ckpt_path and dataset_dir with necessary types and requirements, and then parses the given arguments to be used in the main function.",
+ "type": "comment"
+ },
+ "662": {
+ "file_id": 36,
+ "content": "/vinn_eval.py",
+ "type": "filepath"
+ },
+ "663": {
+ "file_id": 36,
+ "content": "This code defines a function for nearest neighbor calculation, performs rollouts, and preprocesses features for image classification tasks. It uses command-line arguments to run the script with specific directories and checkpoints.",
+ "type": "summary"
+ },
+ "664": {
+ "file_id": 36,
+ "content": "import torch\nfrom torch import nn\nimport torch.nn.functional as F\nimport numpy as np\nimport h5py\nimport pathlib\nimport os\nimport argparse\nimport matplotlib.pyplot as plt\nfrom PIL import Image\nimport torchvision\nfrom torchvision import transforms\n# from visualize_episodes import visualize_joints\nfrom utils import set_seed, sample_box_pose\n# from imitate_episodes import get_image\nfrom sim_env import BOX_POSE\nfrom constants import DT\nfrom imitate_episodes import save_videos\nfrom einops import rearrange\nimport time\nDT = 0.02\nimport IPython\ne = IPython.embed\n# modified from https://github.com/jyopari/VINN/blob/main/nearest-neighbor-eval/handle_nn.ipynb\ndef calculate_nearest_neighbors(curr_feature, support_inputs, support_targets, k, state_weight):\n has_skip = len(support_targets.shape) == 3\n if has_skip: # when there is action skip\n num_targets, skip, a_dim = support_targets.shape\n support_targets = support_targets.view((num_targets, -1))\n curr_vis_feature, curr_s_feature = curr_feature\n support_vis_feature, support_s_feature = support_inputs",
+ "type": "code",
+ "location": "/vinn_eval.py:1-35"
+ },
+ "665": {
+ "file_id": 36,
+ "content": "This code imports necessary libraries and defines a function that calculates nearest neighbors for a given feature. The function takes the current feature, support inputs, support targets, number of neighbors to consider (k), and state weight as input parameters. It also handles cases where there is an action skip in the support targets by reshaping them before processing. The code defines separate features for visual and spatial modalities (curr_vis_feature, curr_s_feature, support_vis_feature, support_s_feature).",
+ "type": "comment"
+ },
+ "666": {
+ "file_id": 36,
+ "content": " pairwise_dist_vis = torch.norm(curr_vis_feature - support_vis_feature, dim=1).unsqueeze(0)\n pairwise_dist_s = torch.norm(curr_s_feature - support_s_feature, dim=1).unsqueeze(0)\n pairwise_dist = pairwise_dist_vis + pairwise_dist_s * state_weight\n sorted_dist, index = torch.sort(pairwise_dist, dim=1) # sort the support axis\n permuted_support_targets = support_targets[index]\n topk_dist = pairwise_dist[:, :k]\n topk_support_targets = permuted_support_targets[:, :k]\n weights = F.softmax(-topk_dist, dim=1)\n weighted_support_targets = weights.unsqueeze(2) * topk_support_targets\n prediction = torch.sum(weighted_support_targets, dim=1)\n if has_skip:\n num_predictions = prediction.shape[0]\n prediction = prediction.reshape((num_predictions, skip, a_dim))\n return prediction\ndef main(args):\n # TODO ######################\n k = None # for scripted box transfer\n skip = 100\n real_robot = True\n save_episode = True\n # TODO ######################\n onscreen_cam = 'main'",
+ "type": "code",
+ "location": "/vinn_eval.py:37-63"
+ },
+ "667": {
+ "file_id": 36,
+ "content": "The code calculates pairwise distances between current and support features, sorts them, and assigns weights to the top-k distances. It then uses these weights to create a weighted sum of support targets as the prediction. The function takes arguments 'args', but they are not used in this snippet. Additionally, it allows skipping predictions for every 100th frame with 'has_skip' flag.",
+ "type": "comment"
+ },
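+ "667a": {
+ "file_id": 36,
+ "content": "A simplified sketch (illustrative only: one query, visual features only, fabricated shapes) of the softmax-weighted k-nearest-neighbor prediction described above.\n\nimport torch\nimport torch.nn.functional as F\n\nk = 3\ncurr_feature = torch.randn(1, 512)          # query feature\nsupport_inputs = torch.randn(100, 512)      # cached support features\nsupport_targets = torch.randn(100, 16)      # actions paired with the support features\n\ndist = torch.norm(support_inputs - curr_feature, dim=1)   # (100,)\ntopk_dist, idx = torch.topk(dist, k, largest=False)       # k smallest distances\nweights = F.softmax(-topk_dist, dim=0)                    # closer neighbors weigh more\nprediction = (weights.unsqueeze(1) * support_targets[idx]).sum(dim=0)\nprint(prediction.shape)  # torch.Size([16])",
+ "type": "example"
+ },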
+ "668": {
+ "file_id": 36,
+ "content": " state_dim = 14\n dataset_dir = args['dataset_dir']\n onscreen_render = args['onscreen_render']\n ckpt_dir = args['ckpt_dir']\n model_dir = args['model_dir']\n task_name = args['task_name']\n if 'insertion' in task_name:\n sim_episode_len = 400\n env_max_reward = 4\n ks = [None]\n elif 'transfer_cube' in task_name:\n sim_episode_len = 400\n env_max_reward = 4\n ks = [1, 1, 1]\n if 'human' in dataset_dir:\n state_weight = 5\n else:\n state_weight = 10\n print(f'{state_weight=}')\n elif task_name == 'ziploc_slide':\n env_max_reward = 1\n ks = [71]\n state_weight = 0\n elif task_name == 'aloha_mobile_wipe_wine':\n sim_episode_len = 1300\n env_max_reward = 4\n ks = [2, 2, 2]\n state_weight = 5\n print(f'{state_weight=}')\n else:\n raise NotImplementedError\n model_name = pathlib.PurePath(model_dir).name\n seed = int(model_name.split('-')[-1][:-3])\n repr_type = 'byol'\n if 'cotrain' in model_name:",
+ "type": "code",
+ "location": "/vinn_eval.py:64-101"
+ },
+ "669": {
+ "file_id": 36,
+ "content": "This code sets various parameters and configurations for different tasks based on the task name provided. It assigns specific episode lengths, maximum rewards, kernel sizes (ks), and state weights depending on the task type. If the task is not implemented, it raises a NotImplementedError. The model name's last part before the file extension is used as the seed, and the representation type is set to 'byol'. For models with 'cotrain' in their names, it assigns the repr_type accordingly.",
+ "type": "comment"
+ },
+ "670": {
+ "file_id": 36,
+ "content": " repr_type += '_cotrain'\n e() # make sure!\n k = ks[seed]\n if real_robot:\n BASE_DELAY = 15\n query_freq = skip - BASE_DELAY\n # load train data\n vis_features = []\n state_features = []\n Y = []\n for episode_id in range(0, 40):\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n action = root['/action'][:]\n base_action = root['/base_action'][:]\n action = np.concatenate([action, base_action], axis=1)\n camera_names = list(root[f'/observations/images/'].keys())\n # Visual feature\n all_cam_feature = []\n for cam_name in camera_names:\n feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')\n with h5py.File(feature_dataset_path, 'r') as root:\n cam_feature = root[f'/features/{cam_name}'][:]\n all_cam_feature.append(cam_feature)\n vis_fea = np.concatenate(all_cam_feature, axis=1)",
+ "type": "code",
+ "location": "/vinn_eval.py:102-130"
+ },
+ "671": {
+ "file_id": 36,
+ "content": "This code loads train data by iterating over 40 episodes. It retrieves action, base_action, and camera names from a dataset file. For each episode, it concatenates the visual features of all cameras into 'vis_fea'. The repr_type is extended with '_cotrain', and BASE_DELAY is set to 15 for real_robot cases.",
+ "type": "comment"
+ },
+ "672": {
+ "file_id": 36,
+ "content": " ## State feature\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n s_fea = root['/observations/qpos'][:]\n # stack actions together\n eps_len = len(action)\n indices = np.tile(np.arange(skip), eps_len).reshape(eps_len, skip) # each row is 0, 1, ... skip\n offset = np.expand_dims(np.arange(eps_len), axis=1)\n indices = indices + offset # row1: 0, 1, ... skip; row2: 1, 2, ... skip+1\n # indices will exceed eps_len, thus clamp to eps_len-1\n indices = np.clip(indices, 0, eps_len-1)\n # stack action\n action = action[indices] # new shape: eps_len, skip, a_dim\n vis_features.append(vis_fea)\n state_features.append(s_fea)\n Y.append(action)\n vis_features = np.concatenate(vis_features)\n state_features = np.concatenate(state_features)\n Y = np.concatenate(Y)\n train_inputs = [torch.from_numpy(vis_features).cuda(), torch.from_numpy(state_features).cuda()]",
+ "type": "code",
+ "location": "/vinn_eval.py:132-154"
+ },
+ "673": {
+ "file_id": 36,
+ "content": "This code reads episode data from a file, stacks actions together, appends them to feature lists, and then concatenates the feature lists. Finally, it creates torch tensors for training inputs.",
+ "type": "comment"
+ },
+ "674": {
+ "file_id": 36,
+ "content": " train_targets = torch.from_numpy(Y).cuda()\n set_seed(1000)\n feature_extractors = {}\n for cam_name in camera_names:\n resnet = torchvision.models.resnet18(pretrained=True)\n loading_status = resnet.load_state_dict(torch.load(model_dir.replace('DUMMY', cam_name)))\n print(cam_name, loading_status)\n resnet = nn.Sequential(*list(resnet.children())[:-1])\n resnet = resnet.cuda()\n resnet.eval()\n feature_extractors[cam_name] = resnet\n # load environment\n if real_robot:\n from aloha_scripts.real_env import make_real_env #### TODO TODO\n env = make_real_env(init_node=True, setup_robots=True, setup_base=True)\n max_timesteps = sim_episode_len\n camera_names = ['cam_high', 'cam_left_wrist', 'cam_right_wrist']\n else:\n from sim_env import make_sim_env\n env = make_sim_env(task_name)\n max_timesteps = sim_episode_len\n num_rollouts = 50\n episode_returns = []\n max_rewards = []\n for rollout_id in range(num_rollouts):",
+ "type": "code",
+ "location": "/vinn_eval.py:155-185"
+ },
+ "675": {
+ "file_id": 36,
+ "content": "The code initializes feature extractors for each camera, loads the environment based on real_robot flag, and starts a loop to perform rollouts. It creates episode returns and maximum rewards lists for tracking performance metrics during the rollouts.",
+ "type": "comment"
+ },
+ "676": {
+ "file_id": 36,
+ "content": " ### set task\n BOX_POSE[0] = sample_box_pose() # used in sim reset\n ts = env.reset()\n ### evaluation loop\n qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()\n image_list = [] # for visualization\n qpos_list = []\n target_qpos_list = []\n rewards = []\n with torch.inference_mode():\n for t in range(sim_episode_len):\n start_time = time.time()\n if t % 100 == 0: print(t)\n if t % query_freq == 0:\n ### process previous timestep to get qpos and image_list\n obs = ts.observation\n if 'images' in obs:\n image_list.append(obs['images'])\n else:\n image_list.append({'main': obs['image']})\n qpos_numpy = np.array(obs['qpos'])\n # qpos = pre_process(qpos_numpy)\n qpos = torch.from_numpy(qpos_numpy).float().cuda().unsqueeze(0)",
+ "type": "code",
+ "location": "/vinn_eval.py:186-209"
+ },
+ "677": {
+ "file_id": 36,
+ "content": "This code sets up a task, resets the environment, and enters an evaluation loop. It collects data for visualization, including qpos and images, and stores them in lists. The code is performing these actions at specific intervals based on the provided conditions.",
+ "type": "comment"
+ },
+ "678": {
+ "file_id": 36,
+ "content": " qpos_history[:, t] = qpos\n _, curr_image_raw = get_image(ts, camera_names)\n image_size = 120\n transform = transforms.Compose([\n transforms.Resize(image_size), # will scale the image\n transforms.CenterCrop(image_size),\n transforms.ToTensor(),\n transforms.Lambda(expand_greyscale),\n transforms.Normalize(\n mean=torch.tensor([0.485, 0.456, 0.406]),\n std=torch.tensor([0.229, 0.224, 0.225])),\n ])\n all_cam_features = []\n for cam_id, curr_image in enumerate(curr_image_raw):\n curr_image = Image.fromarray(curr_image) # TODO only one camera\n curr_image = transform(curr_image)\n curr_image = curr_image.unsqueeze(dim=0).cuda()\n curr_image_feature = feature_extractors[camera_names[cam_id]](curr_image)",
+ "type": "code",
+ "location": "/vinn_eval.py:210-229"
+ },
+ "679": {
+ "file_id": 36,
+ "content": "This code segment processes an image for a robotics task. It stores the current qpos in history, retrieves and preprocesses raw camera images using transforms such as resizing, cropping, normalization, and tensor conversion. It then collects features from each camera using respective feature extractors and stores them in all_cam_features.",
+ "type": "comment"
+ },
+ "680": {
+ "file_id": 36,
+ "content": " curr_image_feature = curr_image_feature.squeeze(3).squeeze(2)\n all_cam_features.append(curr_image_feature)\n curr_image_feature = torch.cat(all_cam_features, dim=1)\n ### Visual feature\n # curr_feature = curr_image_feature\n ### State feature\n # curr_feature = qpos\n ### Both features\n curr_feature = [curr_image_feature, qpos]\n action = calculate_nearest_neighbors(curr_feature, train_inputs, train_targets, k, state_weight) # TODO use this\n action = action.squeeze(0).cpu().numpy()\n action = np.concatenate([action[:-BASE_DELAY, :-2], action[BASE_DELAY:, -2:]], axis=1)\n print(f'Query: {(time.time() - start_time):.3f}s')\n curr_action = action[t % query_freq]\n target_qpos = curr_action[:-2]\n base_action = curr_action[-2:]",
+ "type": "code",
+ "location": "/vinn_eval.py:230-250"
+ },
+ "681": {
+ "file_id": 36,
+ "content": "The code preprocesses visual and state features, calculates nearest neighbors for action selection using a specified metric, and filters out the required action based on query frequency. The resulting target position and base action are extracted for further processing.",
+ "type": "comment"
+ },
+ "682": {
+ "file_id": 36,
+ "content": " # ### SAFETY\n # max_a = 0.05\n # curr_qpos = qpos.squeeze().cpu().numpy()\n # target_qpos = target_qpos.clip(curr_qpos - max_a, curr_qpos + max_a)\n # ### SAFETY\n ### step the environment\n ts = env.step(target_qpos, base_action=base_action)\n duration = time.time() - start_time\n # print(f'{duration:.3f}')\n time.sleep(max(0, DT - duration))\n ### save things for visualization\n qpos_list.append(qpos_numpy)\n target_qpos_list.append(target_qpos)\n rewards.append(ts.reward)\n # if real_robot and t != 0 and t % 60 == 0:\n # e()\n plt.close()\n if real_robot:\n env.puppet_bot_left.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"position\")\n env.puppet_bot_right.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"position\")\n env.puppet_bot_left.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"pwm\")",
+ "type": "code",
+ "location": "/vinn_eval.py:252-275"
+ },
+ "683": {
+ "file_id": 36,
+ "content": "This code chunk is responsible for controlling the movement of a robot's joints, ensuring safety by clipping target positions within safe limits. It steps through the environment and saves information for visualization. If the robot is real, it sets the operating modes for the gripper and pwm.",
+ "type": "comment"
+ },
+ "684": {
+ "file_id": 36,
+ "content": " env.puppet_bot_right.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"pwm\")\n rewards = np.array(rewards)\n episode_return = np.sum(rewards[rewards!=None])\n episode_returns.append(episode_return)\n max_reward = np.max(rewards)\n max_rewards.append(max_reward)\n print(f'{episode_return=}, {max_reward=}')\n if save_episode:\n save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))\n # visualize_joints(qpos_list, target_qpos_list, plot_path=os.path.join(ckpt_dir, f'qpos{rollout_id}.png'))\n # visualize_joints(qpos_list, example_qpos, plot_path=os.path.join(ckpt_dir, f'qpos_reference{rollout_id}.png'), label_overwrite=(\"policy\", \"dataset\"))\n success_rate = np.mean(np.array(max_rewards) == env_max_reward)\n avg_return = np.mean(episode_returns)\n summary_str = f'\\nSuccess rate: {success_rate}\\nAverage return: {avg_return}\\n\\n'\n for r in range(env_max_reward+1):\n more_or_equal_r = (np.array(max_rewards) >= r).sum()",
+ "type": "code",
+ "location": "/vinn_eval.py:276-294"
+ },
+ "685": {
+ "file_id": 36,
+ "content": "This code sets the operating modes for the robot's gripper and calculates rewards, episode returns, and maximum rewards. It then prints these values and saves videos or images if required. Finally, it calculates success rate and average return and constructs a summary string.",
+ "type": "comment"
+ },
+ "686": {
+ "file_id": 36,
+ "content": " more_or_equal_r_rate = more_or_equal_r / num_rollouts\n summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\\n'\n print(summary_str)\n # save success rate to txt\n result_file_name = f'result_{skip}_{k}' + '.txt'\n with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:\n f.write(summary_str)\n f.write(repr(episode_returns))\n f.write('\\n\\n')\n f.write(repr(max_rewards))\n return success_rate, avg_return\ndef get_image(ts, camera_names):\n if 'images' in ts.observation:\n curr_images = []\n for cam_name in camera_names:\n curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')\n curr_images.append(curr_image)\n curr_image_raw = np.stack(curr_images, axis=0)\n else:\n curr_image_raw = rearrange(ts.observation['image'], 'h w c -> c h w')\n curr_image = torch.from_numpy(curr_image_raw / 255.0).float().cuda().unsqueeze(0)\n curr_image_raw = rearrange(curr_image_raw, 'b c h w -> b h w c')",
+ "type": "code",
+ "location": "/vinn_eval.py:295-322"
+ },
+ "687": {
+ "file_id": 36,
+ "content": "This function calculates the success rate, average return, and saves results to a text file for each episode. It retrieves images from observations, processes them, and stores the current image raw data in the correct format for further processing or visualization.",
+ "type": "comment"
+ },
+ "688": {
+ "file_id": 36,
+ "content": " return curr_image, curr_image_raw\ndef expand_greyscale(t):\n return t.expand(3, -1, -1)\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--dataset_dir', action='store', type=str, help='The text to parse.', required=True)\n parser.add_argument('--model_dir', action='store', type=str, help='model_dir', required=True)\n parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--ckpt_dir', action='store', type=str, help='The text to parse.', required=True)\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/vinn_eval.py:323-336"
+ },
+ "689": {
+ "file_id": 36,
+ "content": "The code defines a function expand_greyscale, sets up argument parsing with required parameters like dataset_dir and model_dir, and calls main function with the parsed arguments. The main function is not defined in this chunk but is called by passing the command-line arguments as variables. It seems to be a script for running an image classification task with specific directories and checkpoints.",
+ "type": "comment"
+ },
+ "690": {
+ "file_id": 37,
+ "content": "/vinn_select_k.py",
+ "type": "filepath"
+ },
+ "691": {
+ "file_id": 37,
+ "content": "The function `calculate_nearest_neighbors()` computes nearest neighbor losses and is used to select the optimal value of 'k' for a dataset, plotting and saving the best loss. User inputs: dataset directory, checkpoint directory.",
+ "type": "summary"
+ },
+ "692": {
+ "file_id": 37,
+ "content": "import torch\nimport torch.nn.functional as F\nimport numpy as np\nimport h5py\nimport pathlib\nimport os\nimport argparse\nimport matplotlib.pyplot as plt\nimport IPython\ne = IPython.embed\n# modified from https://github.com/jyopari/VINN/blob/main/nearest-neighbor-eval/handle_nn.ipynb\ndef calculate_nearest_neighbors(query_inputs, query_targets, support_inputs, support_targets, max_k):\n with torch.no_grad():\n pairwise_dist = []\n for q_in in query_inputs:\n diff = support_inputs - q_in.unsqueeze(0)\n dist = torch.norm(diff, dim=1)\n pairwise_dist.append(dist)\n pairwise_dist = torch.stack(pairwise_dist)\n sorted_dist, index = torch.sort(pairwise_dist, dim=1) # sort the support axis\n permuted_support_targets = support_targets[index]\n errors = []\n for k in range(1, max_k):\n topk_dist = pairwise_dist[:, :k]\n topk_support_targets = permuted_support_targets[:, :k]\n weights = F.softmax(-topk_dist, dim=1)\n weighted_support_targets = weights.unsqueeze(2) * topk_support_targets",
+ "type": "code",
+ "location": "/vinn_select_k.py:1-31"
+ },
+ "693": {
+ "file_id": 37,
+ "content": "Code imports necessary libraries and defines a function `calculate_nearest_neighbors()` that takes in query inputs, target values, support inputs, and support targets as well as a maximum value of K. It then calculates the pairwise distances between the query inputs and support inputs, sorts them, and calculates weights for the nearest neighbors using softmax. Finally, it computes errors by weighting the support targets based on these calculated weights.",
+ "type": "comment"
+ },
+ "694": {
+ "file_id": 37,
+ "content": " prediction = torch.sum(weighted_support_targets, dim=1)\n error = F.mse_loss(prediction, query_targets)\n errors.append(error)\n return errors\ndef chunks(lst, n):\n \"\"\"Yield successive n-sized chunks from lst.\"\"\"\n for i in range(0, len(lst), n):\n yield lst[i:i + n]\ndef main(args):\n # TODO ######################\n dataset_dir = args['dataset_dir']\n ckpt_dir = args['ckpt_dir']\n seed = 0\n max_k = 400\n batch_size = 100\n # TODO ######################\n repr_type = 'byol'\n if 'cotrain' in ckpt_dir:\n repr_type += '_cotrain'\n e() # make sure!\n if not os.path.isdir(ckpt_dir):\n os.makedirs(ckpt_dir)\n episode_idxs = [int(name.split('_')[1].split('.')[0]) for name in os.listdir(dataset_dir) if ('.hdf5' in name) and ('features' not in name)]\n episode_idxs.sort()\n assert len(episode_idxs) == episode_idxs[-1] + 1 # no holes\n num_episodes = len(episode_idxs)\n val_split = int(num_episodes * 0.8)\n # load train data\n X = []",
+ "type": "code",
+ "location": "/vinn_select_k.py:32-66"
+ },
+ "695": {
+ "file_id": 37,
+ "content": "This code reads episode indices from a specified directory, sorts them, and asserts there are no gaps. It then determines a validation split of 80% for training data. The code loads the training data into list X.",
+ "type": "comment"
+ },
+ "696": {
+ "file_id": 37,
+ "content": " Y = []\n for episode_id in range(0, val_split):\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n action = root['/action'][:]\n camera_names = list(root[f'/observations/images/'].keys())\n all_cam_feature = []\n feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')\n with h5py.File(feature_dataset_path, 'r') as root:\n for cam_name in camera_names:\n cam_feature = root[f'/features/{cam_name}'][:]\n all_cam_feature.append(cam_feature)\n cam_feature = np.concatenate(all_cam_feature, axis=1)\n X.append(cam_feature)\n Y.append(action)\n X = np.concatenate(X)\n Y = np.concatenate(Y)\n train_inputs = torch.from_numpy(X).cuda()\n train_targets = torch.from_numpy(Y).cuda()\n print(f'All features: {train_inputs.shape}')\n # load test data\n X = []\n Y = []\n for episode_id in range(val_split, num_episodes):",
+ "type": "code",
+ "location": "/vinn_select_k.py:67-94"
+ },
+ "697": {
+ "file_id": 37,
+ "content": "This code loads data from HDF5 files and concatenates it for training. It reads action labels and camera features for each episode, then combines them into a single feature matrix (X) and action label matrix (Y). The code also prints the shape of the feature matrices.",
+ "type": "comment"
+ },
+ "698": {
+ "file_id": 37,
+ "content": " dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n action = root['/action'][:]\n all_cam_feature = []\n feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')\n with h5py.File(feature_dataset_path, 'r') as root:\n for cam_name in camera_names:\n cam_feature = root[f'/features/{cam_name}'][:]\n all_cam_feature.append(cam_feature)\n cam_feature = np.concatenate(all_cam_feature, axis=1)\n X.append(cam_feature)\n Y.append(action)\n X = np.concatenate(X)\n Y = np.concatenate(Y)\n val_inputs = torch.from_numpy(X).cuda()\n val_targets = torch.from_numpy(Y).cuda()\n val_losses = []\n for inputs, targets in zip(chunks(val_inputs, batch_size), chunks(val_targets, batch_size)):\n val_loss = calculate_nearest_neighbors(inputs, targets, train_inputs, train_targets, max_k)\n val_loss = torch.stack(val_loss)",
+ "type": "code",
+ "location": "/vinn_select_k.py:95-118"
+ },
+ "699": {
+ "file_id": 37,
+ "content": "This code loads data from multiple HDF5 files, concatenates camera features into a single feature matrix (X), and associates corresponding actions as targets (Y). It then prepares the data for training by converting to PyTorch tensors and computing nearest neighbor losses using a custom function. The resulting losses are stored in val_losses list.",
+ "type": "comment"
+ }
+}
\ No newline at end of file
diff --git a/docs/data/7.json b/docs/data/7.json
new file mode 100644
index 00000000..c58b031b
--- /dev/null
+++ b/docs/data/7.json
@@ -0,0 +1,78 @@
+{
+ "700": {
+ "file_id": 37,
+ "content": " val_losses.append(val_loss)\n val_losses = torch.mean(torch.stack(val_losses), dim=0)\n val_loss = val_losses\n val_loss = torch.tensor(val_loss).cpu().numpy()\n print(f'min val loss of {np.min(val_loss)} at k={np.argmin(val_loss)}')\n plt.plot(np.arange(1, max_k), val_loss)\n plt.savefig(os.path.join(ckpt_dir, f'k_select-seed{seed}.png'))\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_dir', action='store', type=str, help='The text to parse.', required=True)\n parser.add_argument('--ckpt_dir', action='store', type=str, help='The text to parse.', required=True)\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/vinn_select_k.py:119-134"
+ },
+ "701": {
+ "file_id": 37,
+ "content": "This code is used to select the optimal value of 'k' for a dataset. It calculates the validation loss for different values of 'k', plots the losses, and saves the best loss in an image file. The user needs to provide the directory path for the dataset and the checkpoint directory as input arguments.",
+ "type": "comment"
+ },
+ "702": {
+ "file_id": 38,
+ "content": "/visualize_episodes.py",
+ "type": "filepath"
+ },
+ "703": {
+ "file_id": 38,
+ "content": "This code imports libraries, defines functions for loading data from HDF5 files and generates a timestamp plot for camera frames using timestamps, converting them to float values, calculating time differences, and saving the resulting plot.",
+ "type": "summary"
+ },
+ "704": {
+ "file_id": 38,
+ "content": "import os\nimport numpy as np\nimport cv2\nimport h5py\nimport argparse\nimport matplotlib.pyplot as plt\nfrom constants import DT\nimport IPython\ne = IPython.embed\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')\n if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n is_sim = root.attrs['sim']\n qpos = root['/observations/qpos'][()]\n qvel = root['/observations/qvel'][()]\n action = root['/action'][()]\n image_dict = dict()\n for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n return qpos, qvel, action, image_dict\ndef main(args):\n dataset_dir = args['dataset_dir']\n episode_idx = args['episode_idx']",
+ "type": "code",
+ "location": "/visualize_episodes.py:1-35"
+ },
+ "705": {
+ "file_id": 38,
+ "content": "This code imports necessary libraries, defines a list of joint names and state names, and contains two functions. The `load_hdf5` function loads dataset from hdf5 file, retrieves qpos, qvel, action, and image data. It returns these values. The `main` function takes arguments for dataset directory and episode index, but does not contain any code within it. The joint names likely represent different body parts' movement data in a robotics or simulation context.",
+ "type": "comment"
+ },
+ "706": {
+ "file_id": 38,
+ "content": " ismirror = args['ismirror']\n if ismirror:\n dataset_name = f'mirror_episode_{episode_idx}'\n else:\n dataset_name = f'episode_{episode_idx}'\n qpos, qvel, action, image_dict = load_hdf5(dataset_dir, dataset_name)\n save_videos(image_dict, DT, video_path=os.path.join(dataset_dir, dataset_name + '_video.mp4'))\n visualize_joints(qpos, action, plot_path=os.path.join(dataset_dir, dataset_name + '_qpos.png'))\n # visualize_timestamp(t_list, dataset_path) # TODO addn timestamp back\ndef save_videos(video, dt, video_path=None):\n if isinstance(video, list):\n cam_names = list(video[0].keys())\n cam_names = sorted(cam_names)\n h, w, _ = video[0][cam_names[0]].shape\n w = w * len(cam_names)\n fps = int(1/dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for ts, image_dict in enumerate(video):\n images = []\n for cam_name in cam_names:\n image = image_dict[cam_name]\n image = image[:, :, [2, 1, 0]] # swap B and R channel",
+ "type": "code",
+ "location": "/visualize_episodes.py:36-60"
+ },
+ "707": {
+ "file_id": 38,
+ "content": "The code loads data from an HDF5 file based on a boolean mirror flag, then saves the images as videos and visualizes joint positions. The video saving function takes in a list of images and writes them to a file with a specified fourcc code and framerate.",
+ "type": "comment"
+ },
+ "708": {
+ "file_id": 38,
+ "content": " images.append(image)\n images = np.concatenate(images, axis=1)\n out.write(images)\n out.release()\n print(f'Saved video to: {video_path}')\n elif isinstance(video, dict):\n cam_names = list(video.keys())\n cam_names = sorted(cam_names)\n all_cam_videos = []\n for cam_name in cam_names:\n all_cam_videos.append(video[cam_name])\n all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension\n n_frames, h, w, _ = all_cam_videos.shape\n fps = int(1 / dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for t in range(n_frames):\n image = all_cam_videos[t]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n out.write(image)\n out.release()\n print(f'Saved video to: {video_path}')\ndef visualize_joints(qpos_list, command_list, plot_path=None, ylim=None, label_overwrite=None):\n if label_overwrite:\n label1, label2 = label_overwrite",
+ "type": "code",
+ "location": "/visualize_episodes.py:61-87"
+ },
+ "709": {
+ "file_id": 38,
+ "content": "Code snippet handles saving a video by either concatenating images or concatenating multiple videos horizontally, then writing to a file. It also has functionality for visualizing joint positions over time and plots them if necessary with optional custom labels.",
+ "type": "comment"
+ },
+ "710": {
+ "file_id": 38,
+ "content": " else:\n label1, label2 = 'State', 'Command'\n qpos = np.array(qpos_list) # ts, dim\n command = np.array(command_list)\n num_ts, num_dim = qpos.shape\n h, w = 2, num_dim\n num_figs = num_dim\n fig, axs = plt.subplots(num_figs, 1, figsize=(w, h * num_figs))\n # plot joint state\n all_names = [name + '_left' for name in STATE_NAMES] + [name + '_right' for name in STATE_NAMES]\n for dim_idx in range(num_dim):\n ax = axs[dim_idx]\n ax.plot(qpos[:, dim_idx], label=label1)\n ax.set_title(f'Joint {dim_idx}: {all_names[dim_idx]}')\n ax.legend()\n # plot arm command\n for dim_idx in range(num_dim):\n ax = axs[dim_idx]\n ax.plot(command[:, dim_idx], label=label2)\n ax.legend()\n if ylim:\n for dim_idx in range(num_dim):\n ax = axs[dim_idx]\n ax.set_ylim(ylim)\n plt.tight_layout()\n plt.savefig(plot_path)\n print(f'Saved qpos plot to: {plot_path}')\n plt.close()\ndef visualize_timestamp(t_list, dataset_path):\n plot_path = dataset_path.replace('.pkl', '_timestamp.png')",
+ "type": "code",
+ "location": "/visualize_episodes.py:88-123"
+ },
+ "711": {
+ "file_id": 38,
+ "content": "This code visualizes the joint state and arm command over time for a given set of timestamps. It first converts the provided data into numpy arrays and creates subplots for each dimension. Then, it plots the joint state and arm command values against timestamps for each dimension. Optionally, it sets the y-axis limits. Finally, it saves the resulting plot as an image and prints its location.",
+ "type": "comment"
+ },
+ "712": {
+ "file_id": 38,
+ "content": " h, w = 4, 10\n fig, axs = plt.subplots(2, 1, figsize=(w, h*2))\n # process t_list\n t_float = []\n for secs, nsecs in t_list:\n t_float.append(secs + nsecs * 10E-10)\n t_float = np.array(t_float)\n ax = axs[0]\n ax.plot(np.arange(len(t_float)), t_float)\n ax.set_title(f'Camera frame timestamps')\n ax.set_xlabel('timestep')\n ax.set_ylabel('time (sec)')\n ax = axs[1]\n ax.plot(np.arange(len(t_float)-1), t_float[:-1] - t_float[1:])\n ax.set_title(f'dt')\n ax.set_xlabel('timestep')\n ax.set_ylabel('time (sec)')\n plt.tight_layout()\n plt.savefig(plot_path)\n print(f'Saved timestamp plot to: {plot_path}')\n plt.close()\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_dir', action='store', type=str, help='Dataset dir.', required=True)\n parser.add_argument('--episode_idx', action='store', type=int, help='Episode index.', required=False)\n parser.add_argument('--ismirror', action='store_true')\n main(vars(parser.parse_args()))",
+ "type": "code",
+ "location": "/visualize_episodes.py:124-154"
+ },
+ "713": {
+ "file_id": 38,
+ "content": "This code generates a timestamp plot for camera frames from a given dataset. It reads the timestamps, converts them to float values, plots them against timesteps, and calculates the time difference between consecutive timestamps. The resulting plot is saved and the file path is printed. The code expects the dataset directory, episode index, and a flag for mirror augmentation as input arguments.",
+ "type": "comment"
+ }
+}
\ No newline at end of file
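The save_videos logic summarized in the entries above (horizontal concatenation of per-camera frames, then an RGB-to-BGR swap before writing with OpenCV) can be read in isolation. Below is a minimal sketch of the dict-input branch, assuming each camera provides uint8 RGB frames of identical height and that cv2 and numpy are installed; the function name is illustrative, not part of the repo.

```python
import cv2
import numpy as np

def save_multicam_video(video: dict, dt: float, video_path: str) -> None:
    """Sketch: `video` maps camera name -> uint8 array of shape (T, H, W, 3), RGB."""
    cam_names = sorted(video.keys())
    # Concatenate cameras side by side along the width dimension.
    all_cam_videos = np.concatenate([video[c] for c in cam_names], axis=2)
    n_frames, h, w, _ = all_cam_videos.shape
    fps = int(1 / dt)
    out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    for t in range(n_frames):
        frame = all_cam_videos[t][:, :, [2, 1, 0]]  # RGB -> BGR, as OpenCV expects
        out.write(frame)
    out.release()
    print(f'Saved video to: {video_path}')
```

Usage would look like `save_multicam_video({'cam_high': frames}, dt=0.02, video_path='rollout.mp4')`, mirroring the DT used throughout the quoted scripts.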
diff --git a/docs/data/titles/0.json b/docs/data/titles/0.json
new file mode 100644
index 00000000..686bab31
--- /dev/null
+++ b/docs/data/titles/0.json
@@ -0,0 +1,302 @@
+{
+ "/README.md": "Comprehensive Guide to ACT, Diffusion Policy, and VINN",
+ "/README.md:1-20": "Multi-Algorithm Robotic Environment Simulator",
+ "/README.md:21-57": "ALOHA Installation & Setup Guide",
+ "/README.md:58-77": "Train, Visualize, and Evaluate ACT Model",
+ "/README.md:78-85": "Optimize Video Training with Real-Time Rendering and Extended Epochs",
+ "/__init__.py": "Incomplete Code Snippet",
+ "/align.py": "Calibrating Interbotix Arms and Opening Grippers",
+ "/align.py:1-23": "Calibrating Puppet Bots' Head Cam and Arms",
+ "/align.py:24-31": "Sleep & Open Bots' Grippers",
+ "/commands.txt": "Training RL Models and Preprocessing Data",
+ "/commands.txt:126-152": "Conda Environment, MUJOCO_GL, Act-Plus-Plus Task Training",
+ "/commands.txt:153-173": "Diffusion-based Policy Model Training & Evaluation Code",
+ "/commands.txt:175-201": "Training Diffusion Policies on Aloha Tasks",
+ "/commands.txt:2-28": "Automated Reinforcement Learning Experimentation",
+ "/commands.txt:202-227": "Training Robot Model with Chunk Sizes and Environment Settings",
+ "/commands.txt:228-256": "Training Diffusion Policy Model: Aloha Mobile Wipe",
+ "/commands.txt:257-278": "Activate, Change, Set, Run and Cache",
+ "/commands.txt:280-307": "Training VINN Model with Camera Compatibility",
+ "/commands.txt:29-43": "Mirrored Data Simulation Generator",
+ "/commands.txt:307-331": "BYOL Training and Feature Selection for sim_transfer_cube_scripted Task",
+ "/commands.txt:332-346": "CUDA-Powered Python Training Scripts",
+ "/commands.txt:347-362": "CUDA Control for Multi-Model Training",
+ "/commands.txt:363-387": "Conda Environment: Training and Caching Features with Python",
+ "/commands.txt:388-409": "Activate Conda Environment for CUDA",
+ "/commands.txt:411-433": "Environment Switching and Training Scripts Execution",
+ "/commands.txt:434-459": "BYOL Model Training on Aloha and Wine Datasets",
+ "/commands.txt:45-69": "Sanity Checking Mirrored Data with ACT Policy",
+ "/commands.txt:459-481": "Training VINN Model through Commands",
+ "/commands.txt:485-514": "Training and Evaluating Models with VINN and BYOL",
+ "/commands.txt:515-527": "Activate Conda, Change Directory, Train Network",
+ "/commands.txt:70-97": "Hyperparameter Tuning for Imitation Learning",
+ "/commands.txt:98-125": "Train & Evaluate Sim Transfer Cube Policy",
+ "/compress_data.py": "Data Compression and Processing Tool",
+ "/compress_data.py:1-35": "Compress HDF5 Dataset Efficiently",
+ "/compress_data.py:109-135": "HDF5 Data Preprocessing and Video Conversion",
+ "/compress_data.py:136-159": "Compressed Images Loader",
+ "/compress_data.py:162-181": "Compress HDF5 Datasets in Directory",
+ "/compress_data.py:37-61": "Compress Data with JPEG Parameters",
+ "/compress_data.py:63-82": "Image Compression and Storage in HDF5",
+ "/compress_data.py:83-108": "Compress and Save Multi-Camera Video",
+ "/conda_env.yaml": "Conda Environment Definition: aloha",
+ "/constants.py": "Gripper Limits and Environments: Robotics Task Parameters",
+ "/constants.py:1-35": "Task Parameters Constants",
+ "/constants.py:36-66": "Simulation Constants Dictionary",
+ "/constants.py:67-84": "Gripper Position and Limits Constants",
+ "/constants.py:84-95": "Joint Normalization and Conversion Functions",
+ "/constants.py:95-100": "Gripper Angle Conversion Functions",
+ "/detr/README.md": "DETR Code Snippet",
+ "/detr/main.py": "DETR Model Initialization and Optimizer Creation",
+ "/detr/main.py:1-25": "Transformer Detector Training Code",
+ "/detr/main.py:117-129": "GPU Initialization and Optimizer Setup",
+ "/detr/main.py:26-40": "Command Line Arguments for DETR Model",
+ "/detr/main.py:40-56": "Command Line Arguments for DETR Model",
+ "/detr/main.py:57-69": "CLI Argument Parsing with Argparse",
+ "/detr/main.py:70-81": "Command-Line Arguments for DETR Python Script",
+ "/detr/main.py:83-116": "Build Models and Optimizers for DETR",
+ "/detr/models/__init__.py": "Building Models in DETR",
+ "/detr/models/backbone.py": "ResNet-VT Backbone Builder",
+ "/detr/models/backbone.py:1-35": "Frozen BatchNorm2d Implementation",
+ "/detr/models/backbone.py:115-122": "Building Vision Transformer Backbone Model",
+ "/detr/models/backbone.py:37-62": "Backbone Base: Reshaping and Loading Weights",
+ "/detr/models/backbone.py:63-86": "Nested Backbone Dictionary Class",
+ "/detr/models/backbone.py:87-112": "ResNet Backbone Model for Transfer Learning",
+ "/detr/models/detr_vae.py": "DETR-CVAE: Image Object Detection with Latent Inputs",
+ "/detr/models/detr_vae.py:1-35": "DETRVAE Model Implementation",
+ "/detr/models/detr_vae.py:108-123": "DETR VAE Encoder and Latent Representation",
+ "/detr/models/detr_vae.py:123-141": "DETR-VAE: VQ-VAE Mode Latent Input Calculation",
+ "/detr/models/detr_vae.py:141-163": "VAE Latent Samples and Encoding",
+ "/detr/models/detr_vae.py:164-184": "Detr-VAE Model: Predicting Actions and Latent Variables",
+ "/detr/models/detr_vae.py:185-202": "Initializing DETR-VAE Model",
+ "/detr/models/detr_vae.py:203-227": "DETR Model: PyTorch Implementation",
+ "/detr/models/detr_vae.py:228-254": "DETR VAE Model: Encoder and MLP",
+ "/detr/models/detr_vae.py:255-289": "DETR-VAE Model Initialization",
+ "/detr/models/detr_vae.py:290-325": "Building DETR-VAE Models",
+ "/detr/models/detr_vae.py:36-52": "DETR: Object Detection with Transformer",
+ "/detr/models/detr_vae.py:53-71": "DETR-VAE Model Initialization",
+ "/detr/models/detr_vae.py:72-90": "VAE Layer Initialization in DETR Model",
+ "/detr/models/detr_vae.py:91-107": "Conditional VAE Model for Action Sequences",
+ "/detr/models/latent_model.py": "Latent Model Transformer: Sequential Sampling",
+ "/detr/models/latent_model.py:1-28": "Causal Transformer Architecture",
+ "/detr/models/latent_model.py:29-55": "Latent Space Transformer Model",
+ "/detr/models/latent_model.py:56-72": "Attention-Based Latent Model",
+ "/detr/models/position_encoding.py": "Transformer Positional Embedding Class",
+ "/detr/models/position_encoding.py:1-33": "Transformer Positional Embedding Class",
+ "/detr/models/position_encoding.py:34-57": "Position Embedding Generation",
+ "/detr/models/position_encoding.py:58-87": "Sine-Cosine Position Embedding Class",
+ "/detr/models/position_encoding.py:88-93": "Position Embedding Initializer",
+ "/detr/models/transformer.py": "Transformer Class for Data Processing",
+ "/detr/models/transformer.py:1-30": "Custom Transformer Class from Scratch",
+ "/detr/models/transformer.py:110-134": "Transformer Model Forward Pass: Layer Operations and Normalization",
+ "/detr/models/transformer.py:135-163": "Transformer Encoder Layer: Self-Attention and Feedforward",
+ "/detr/models/transformer.py:164-187": "Transformer Positional Embeddings Functions",
+ "/detr/models/transformer.py:188-210": "Transformer Decoder Layer Class",
+ "/detr/models/transformer.py:211-234": "Transformer Feedforward Layer",
+ "/detr/models/transformer.py:235-255": "Multi-Head Self Attention Layer",
+ "/detr/models/transformer.py:256-275": "Transformer Model Function in PyTorch",
+ "/detr/models/transformer.py:276-299": "Parallel Transformer Models with Masks",
+ "/detr/models/transformer.py:300-314": "Transformer Model Factory Function",
+ "/detr/models/transformer.py:31-54": "Transformer Model Initialization",
+ "/detr/models/transformer.py:55-75": "Transformer Model Input Shape Handling",
+ "/detr/models/transformer.py:76-109": "Transformer Encoder and Decoder Classes",
+ "/detr/setup.py": "Setting Up detr Package",
+ "/detr/util/__init__.py": "Affiliates' Codebase Copyright Statement",
+ "/detr/util/box_ops.py": "Bounding Box Manipulation and GIoU Utilities",
+ "/detr/util/box_ops.py:1-41": "Bounding Box Manipulation and GIoU Utilities",
+ "/detr/util/box_ops.py:42-76": "Calculate IoU and Boxes from Masks",
+ "/detr/util/box_ops.py:77-88": "Box Coordinates Tensor",
+ "/detr/util/misc.py": "Smooth Metric Logger and NestedTensor Utilities",
+ "/detr/util/misc.py:1-37": "Python Smoothed Value Tracker",
+ "/detr/util/misc.py:115-143": "All-Gather and Reduce Dictionaries",
+ "/detr/util/misc.py:144-176": "Distributed Metrics Logger",
+ "/detr/util/misc.py:177-208": "Iterable Data Logger Class",
+ "/detr/util/misc.py:209-234": "Progress Bar Calculation and Printing Code",
+ "/detr/util/misc.py:235-261": "Total Time Calculator and Progress Logger",
+ "/detr/util/misc.py:262-298": "NestedTensor Class for PyTorch",
+ "/detr/util/misc.py:300-324": "Nested Tensor from Tensor List",
+ "/detr/util/misc.py:325-349": "Creating NestedTensor from List of Tensors",
+ "/detr/util/misc.py:350-386": "Distributed Training Utilities",
+ "/detr/util/misc.py:38-81": "Deque Tracker with Properties and Sync",
+ "/detr/util/misc.py:387-425": "Distributed Deep Learning Setup Code",
+ "/detr/util/misc.py:426-454": "Distributed Process Group Initialization",
+ "/detr/util/misc.py:455-468": "Version-Based Interpolation Check",
+ "/detr/util/misc.py:82-114": "All-Gather Utility Function",
+ "/detr/util/plot_utils.py": "Plotting Precision-Recall with Interpolated mAP",
+ "/detr/util/plot_utils.py:1-26": "Visualize Training Logs with Matplotlib",
+ "/detr/util/plot_utils.py:28-47": "Validate Log Directories Existence",
+ "/detr/util/plot_utils.py:48-72": "Epoch 1 Log File Checker and Planner",
+ "/detr/util/plot_utils.py:73-97": "Precision-Recall Plotting Function",
+ "/detr/util/plot_utils.py:98-107": "Plot Precision-Recall Curves and Scores",
+ "/dxl_test.py": "Dynamixel Wheel Control Test",
+ "/dynamixel_client.py": "Dynamixel Motor Control Client",
+ "/dynamixel_client.py:1-38": "Dynamixel Motor Communication Python Library",
+ "/dynamixel_client.py:119-146": "Dynamixel Motor Initialization and Connection",
+ "/dynamixel_client.py:147-171": "Dynamixel Client: Connect, Configure, Control",
+ "/dynamixel_client.py:172-196": "Disabling Motors and Cleaning Clients",
+ "/dynamixel_client.py:197-221": "Dynamixel Motor Torque Control",
+ "/dynamixel_client.py:223-254": "Dynamixel Position Control Functions",
+ "/dynamixel_client.py:255-279": "Sync Write Function",
+ "/dynamixel_client.py:280-307": "Synchronous Motor Write System",
+ "/dynamixel_client.py:308-329": "Error Handling and Unsigned Conversion",
+ "/dynamixel_client.py:330-365": "Dynamixel Motor Data Reader",
+ "/dynamixel_client.py:367-392": "Bulk Motor Data Read with Retries",
+ "/dynamixel_client.py:39-74": "Dynamixel Client Class and Conversion Functions",
+ "/dynamixel_client.py:393-425": "Dynamixel Client: Read and Update Motor Data",
+ "/dynamixel_client.py:426-450": "Dynamixel Servo Data Reader Class",
+ "/dynamixel_client.py:451-479": "Dynamixel Position Reader Class",
+ "/dynamixel_client.py:480-509": "Dynamixel Motor Data Reader",
+ "/dynamixel_client.py:510-538": "Dynamixel Cur Reader Class",
+ "/dynamixel_client.py:539-571": "Dynamixel Motor Data Reader Class",
+ "/dynamixel_client.py:572-598": "Dynamixel Client: Motor Control via Waypoints",
+ "/dynamixel_client.py:599-604": "Dynamixel Servo Status Tracker",
+ "/dynamixel_client.py:75-93": "Dynamixel Client Constructor",
+ "/dynamixel_client.py:94-118": "Dynamixel Motor Readers Initialization",
+ "/ee_sim_env.py": "Insertion Environment: Bi-Manual Peg Tasks",
+ "/ee_sim_env.py:1-26": "Bi-manual Manipulation Environment",
+ "/ee_sim_env.py:111-133": "Normalizing Joints and Gripper Positions",
+ "/ee_sim_env.py:134-155": "Robot Arm Environment Class",
+ "/ee_sim_env.py:156-181": "Randomized Box Environment Initialization",
+ "/ee_sim_env.py:182-208": "Insertion EE Task: Rewarding Contact Scenarios",
+ "/ee_sim_env.py:209-232": "Randomizing Peg and Socket Positions in Physics Simulation",
+ "/ee_sim_env.py:233-248": "Red Peg Detection in Physics Simulation",
+ "/ee_sim_env.py:249-265": "Reward-Based Contact Pair Detector",
+ "/ee_sim_env.py:266-267": "Fixed Reward Value Function",
+ "/ee_sim_env.py:28-38": "Robotics Simulation Environment: Observing Arm and Gripper Positions, Velocities, and Camera Data",
+ "/ee_sim_env.py:39-61": "Initializing Bimanual ViperX EE Task Environment",
+ "/ee_sim_env.py:63-84": "Resetting Robot Positions and Gripper Control",
+ "/ee_sim_env.py:85-110": "Robot Arm Initialization Code",
+ "/imitate_episodes.py": "Episodic Robot Imitation Learning",
+ "/imitate_episodes.py:1-35": "Reinforcement Learning Task Setup",
+ "/imitate_episodes.py:116-146": "Defining and Initializing Training Dictionaries",
+ "/imitate_episodes.py:147-165": "Evaluating Checkpoints and Config Updates",
+ "/imitate_episodes.py:167-198": "Save, Train, Optimize",
+ "/imitate_episodes.py:199-222": "Policy-Based Optimizer Configuration",
+ "/imitate_episodes.py:223-253": "Initialize Policy and Environment Variables",
+ "/imitate_episodes.py:254-274": "Loading and Initializing Models",
+ "/imitate_episodes.py:275-290": "Actuator Network Loading and Evaluation",
+ "/imitate_episodes.py:291-318": "Initialize Environment and Parameters",
+ "/imitate_episodes.py:319-347": "Initializing Rollout Loop and Task",
+ "/imitate_episodes.py:348-370": "Timestep Processing and Render Update",
+ "/imitate_episodes.py:37-65": "Setting Up Simulation Environment",
+ "/imitate_episodes.py:371-392": "Policy Execution in Reinforcement Learning Environment",
+ "/imitate_episodes.py:393-408": "Temporal Aggregation for Robot Action Generation",
+ "/imitate_episodes.py:409-423": "Robotic Control Policy Switching Code",
+ "/imitate_episodes.py:424-444": "Policy Selection and Actions Processing",
+ "/imitate_episodes.py:445-465": "Actuator Network Update and Base Action Calculation",
+ "/imitate_episodes.py:466-484": "Sleep Sync and Reward Appending",
+ "/imitate_episodes.py:485-508": "Episode Performance Analysis",
+ "/imitate_episodes.py:509-532": "Episode Summary Calculator",
+ "/imitate_episodes.py:535-560": "Training Policy with Data Loader",
+ "/imitate_episodes.py:561-584": "Validation Logging and Model Checkpointing",
+ "/imitate_episodes.py:585-611": "Training and Validation Logging",
+ "/imitate_episodes.py:612-638": "Best Model Identification and Saving",
+ "/imitate_episodes.py:639-648": "Command-Line Arguments for Task Execution",
+ "/imitate_episodes.py:649-662": "Customizable ACT Model with Argparse Options",
+ "/imitate_episodes.py:66-90": "ACT Policy Configuration",
+ "/imitate_episodes.py:663-666": "Command Line Arguments for Parser Object",
+ "/imitate_episodes.py:91-115": "Configuring Policy-Based RL Agents",
+ "/policy.py": "Policy Network for Multi-Camera Tasks",
+ "/policy.py:1-28": "Diffusion Policy Model Creation",
+ "/policy.py:113-137": "Diffusion Noise Sampling and Loss Calculation",
+ "/policy.py:138-162": "Gaussian Noise Init Action from Policy",
+ "/policy.py:164-193": "Noise Scheduler Initialization",
+ "/policy.py:194-218": "ACT Policy Class for RL Tasks",
+ "/policy.py:219-240": "Training Policy Function for RL with KL Divergence",
+ "/policy.py:241-270": "CNNMLP Policy Model Initialization",
+ "/policy.py:271-295": "KL Divergence Neural Network Policy Model",
+ "/policy.py:29-48": "Initializing Model Parameters and Creating Layers",
+ "/policy.py:50-86": "Defining Policy Networks with Backbones",
+ "/policy.py:87-111": "Optimizing Multi-Camera Policy Network",
+ "/postprocess_episodes.py": "Efficient Data Processing: Postprocess Episodes",
+ "/postprocess_episodes.py:1-33": "Robot Data Processing Script",
+ "/postprocess_episodes.py:105-125": "Compress and Measure Image Compression Time",
+ "/postprocess_episodes.py:126-143": "Padding and Saving Images for HDF5",
+ "/postprocess_episodes.py:144-162": "Code for Dataset Creation and Population",
+ "/postprocess_episodes.py:164-175": "Episode Post-processing Tool",
+ "/postprocess_episodes.py:34-60": "Postprocess Episodes: Unpack, Uncompress and Store Images",
+ "/postprocess_episodes.py:61-77": "Episode Data Processor",
+ "/postprocess_episodes.py:77-103": "Episode Postprocessing: Flip, Data Dict Creation, Compression",
+ "/record_sim_episodes.py": "Record and Evaluate Simulation Episodes",
+ "/record_sim_episodes.py:1-33": "Replaying Simulated Episodes for Datasets",
+ "/record_sim_episodes.py:114-145": "Episode Success Measurer",
+ "/record_sim_episodes.py:146-167": "Sim Episode Processor",
+ "/record_sim_episodes.py:168-186": "Simulation Data Generation and Analysis",
+ "/record_sim_episodes.py:187-190": "Command Line Arguments for Parser",
+ "/record_sim_episodes.py:34-61": "Episode Loop Initialization",
+ "/record_sim_episodes.py:62-84": "Episode Recording and Trajectory Extraction",
+ "/record_sim_episodes.py:86-113": "Episode Replay Mechanism",
+ "/replay_episodes.py": "Episode Replayer: Organize Images to Videos",
+ "/replay_episodes.py:1-41": "Replay Episode Simulator",
+ "/replay_episodes.py:42-48": "Save Videos Function Setup",
+ "/scripted_policy.py": "Scripted Policy: Robotic Arm Trajectory Execution",
+ "/scripted_policy.py:1-33": "Robotic Arm Policy Generation",
+ "/scripted_policy.py:104-131": "Initialize InsertionPolicy Variables",
+ "/scripted_policy.py:132-142": "Robot Arm Trajectory Sequence Script",
+ "/scripted_policy.py:142-158": "Object Transfer Policy Script",
+ "/scripted_policy.py:159-191": "PickAndTransfer: Policy Execution and Evaluation",
+ "/scripted_policy.py:192-193": "Scripted Policy: Testing Simulated Cube Transfer",
+ "/scripted_policy.py:34-56": "Interpolating Trajectory Executor",
+ "/scripted_policy.py:57-81": "PickAndTransferPolicy: Noisy Trajectory Generation",
+ "/scripted_policy.py:83-95": "Robot Arm Sequence and Movement Control",
+ "/scripted_policy.py:96-103": "Robot Gripper Sequence: Approach, Grip, Move, Close, Meet, Open, Stay",
+ "/setup.py": "Setup 'act' Software Distribution",
+ "/sim_env.py": "Bi-Manual Manipulation Environment",
+ "/sim_env.py:1-26": "Simulation Environment for Bi-Manual Robot Control",
+ "/sim_env.py:108-130": "Bimanual Task Environment Class",
+ "/sim_env.py:131-154": "Gripper Reward Calculation Algorithm",
+ "/sim_env.py:156-180": "Episode Initialization and Contact Rewards",
+ "/sim_env.py:181-205": "Physics Simulation Environment Setup",
+ "/sim_env.py:206-218": "Checking Gripper and Peg Contact in Vx300s",
+ "/sim_env.py:219-242": "Function Defines Rewards and Actions in Sim Environment",
+ "/sim_env.py:243-266": "Teleoperation Test Setup: ALOHA and InterbotixManipulatorXS",
+ "/sim_env.py:267-279": "Interactive Simulation Env. Plotting & Action Inputs",
+ "/sim_env.py:28-38": "Observation Space for Cube Transfer Simulation",
+ "/sim_env.py:39-61": "Bimanual ViperX Environment Setup",
+ "/sim_env.py:62-84": "Gripper Actions in Puppet Environment",
+ "/sim_env.py:85-107": "Joint Data Extraction and Normalization",
+ "/train_actuator_network.py": "Train Actuator Network: Data to Policy",
+ "/train_actuator_network.py:103-121": "Tracking Validation Loss and Predictions",
+ "/train_actuator_network.py:122-146": "Training Actuator Network Policy",
+ "/train_actuator_network.py:147-167": "Visualizing Neural Network Predictions",
+ "/train_actuator_network.py:169-189": "Neural Network Prediction of Commands",
+ "/train_actuator_network.py:190-217": "Visualize Actuator Network Speeds",
+ "/train_actuator_network.py:2-41": "Training Actuator Network: Parameters and Libraries",
+ "/train_actuator_network.py:218-243": "Train Transformer Network for Prediction",
+ "/train_actuator_network.py:244-272": "Transformer & Positional Encoding for Source Data Extraction",
+ "/train_actuator_network.py:273-295": "Normalizing Speed Data for Analysis",
+ "/train_actuator_network.py:296-316": "Calculate Means and Standard Deviations of Speeds",
+ "/train_actuator_network.py:317-340": "Locate Transition Index and Read Commanded Speed",
+ "/train_actuator_network.py:341-357": "Preparing Input Data for ML",
+ "/train_actuator_network.py:358-367": "Train Actuator Network Function",
+ "/train_actuator_network.py:42-61": "Initialize, Assert, Split & Print Data",
+ "/train_actuator_network.py:63-80": "Train Actuator Network: Dataset Preparation and Saving",
+ "/train_actuator_network.py:81-102": "Train Actuator Network: Initialize, Validate, Repeat",
+ "/train_latent_model.py": "Train Latent Model with ACT-Plus-Plus",
+ "/train_latent_model.py:1-36": "Train Latent Model with ACT-Plus-Plus",
+ "/train_latent_model.py:122-154": "Save Best Checkpoint and Define Policy Function",
+ "/train_latent_model.py:155-184": "Policy Evaluation Function",
+ "/train_latent_model.py:185-213": "Real/Simulated Environment Initialization and Setup",
+ "/train_latent_model.py:214-238": "Training Latent Model Evaluation Loop",
+ "/train_latent_model.py:239-260": "Training Latent Model in Interactive Environment",
+ "/train_latent_model.py:261-279": "Policy-Based Action Selection",
+ "/train_latent_model.py:280-302": "Episode Tracking and Visualization in Robotics",
+ "/train_latent_model.py:303-326": "Rollout Performance Summarizer",
+ "/train_latent_model.py:327-353": "Train Latent Model with VQ-VAE",
+ "/train_latent_model.py:355-382": "Train Latent Model and Policy",
+ "/train_latent_model.py:38-65": "Retrieve Task Parameters from File",
+ "/train_latent_model.py:384-406": "Epoch Summary and Loss Tracking",
+ "/train_latent_model.py:407-431": "Saving and Plotting Latent Model Progress",
+ "/train_latent_model.py:432-453": "Plotting Training Curves for Latent Model",
+ "/train_latent_model.py:454-466": "Customizable Latent Model Training",
+ "/train_latent_model.py:467-470": "Parse Latent Model Parameters",
+ "/train_latent_model.py:66-92": "Latent Model Training Configurator",
+ "/train_latent_model.py:93-120": "Train Behavioral Cloning Model",
+ "/truncate_data.py": "Truncate and Compress Dataset with h5py",
+ "/truncate_data.py:1-35": "Truncate and Compress Dataset",
+ "/truncate_data.py:112-135": "Compress-Decompress Image Data",
+ "/truncate_data.py:138-157": "Truncate HDF5 Datasets",
+ "/truncate_data.py:36-57": "Data Compression and Observation Group Creation",
+ "/truncate_data.py:58-84": "Truncate and Concatenate Data",
+ "/truncate_data.py:85-111": "Load and Save First Episode Video"
+}
\ No newline at end of file
diff --git a/docs/data/titles/1.json b/docs/data/titles/1.json
new file mode 100644
index 00000000..e9b5e8c8
--- /dev/null
+++ b/docs/data/titles/1.json
@@ -0,0 +1,51 @@
+{
+ "/utils.py": "Episodic Dataset Processing & Augmentation",
+ "/utils.py:1-33": "Episodic Dataset Augmenter",
+ "/utils.py:100-121": "Image Stacking and Preprocessing",
+ "/utils.py:122-145": "Image Data Preprocessing and Error Handling",
+ "/utils.py:146-171": "Get Normalized Stats from Dataset Paths",
+ "/utils.py:172-196": "Normalizing Data for Machine Learning Training",
+ "/utils.py:198-218": "HDF5 Search and Batch Sampler Functions",
+ "/utils.py:220-233": "Load and Split Data Function",
+ "/utils.py:234-247": "Generate and Print Train/Validation Episode Details",
+ "/utils.py:248-264": "Episodic Dataset Initialization",
+ "/utils.py:265-284": "Augmented Worker Datasets",
+ "/utils.py:285-324": "Base Action Preprocessing and Pose Sampling",
+ "/utils.py:325-360": "Calculate Mean, Detach Values, Set Seed Utilities",
+ "/utils.py:34-58": "Transformation Initialization and Location Function",
+ "/utils.py:59-77": "Base Action Concatenation and Data Storage",
+ "/utils.py:78-99": "Video Preprocessing for Agent in Simulation",
+ "/vinn_cache_feature.py": "Cache Feature Extraction",
+ "/vinn_cache_feature.py:1-44": "VINN Feature Extraction Code",
+ "/vinn_cache_feature.py:118-142": "Saving Features to HDF5 File",
+ "/vinn_cache_feature.py:143-148": "Setting up Argument Parser",
+ "/vinn_cache_feature.py:45-72": "Load and Organize Data for Feature Extractors",
+ "/vinn_cache_feature.py:73-93": "Initialize and Preprocess ResNet18 Model for Inference",
+ "/vinn_cache_feature.py:94-117": "Feature Extraction and Storage from Images",
+ "/vinn_eval.py": "Vinn Evaluation Script",
+ "/vinn_eval.py:1-35": "Nearest Neighbor Calculator",
+ "/vinn_eval.py:102-130": "Train Data Loading: Visual Feature Concatenation",
+ "/vinn_eval.py:132-154": "Stacking Actions and Formatting Torch Tensors",
+ "/vinn_eval.py:155-185": "Rollout Performance Tracking Algorithm",
+ "/vinn_eval.py:186-209": "Evaluation Loop and Data Collection",
+ "/vinn_eval.py:210-229": "Robotics Image Processing",
+ "/vinn_eval.py:230-250": "Nearest Neighbor Action Selection",
+ "/vinn_eval.py:252-275": "Robot Joint Control and Safety Ensuring",
+ "/vinn_eval.py:276-294": "Gripper Mode Setting and Performance Evaluation",
+ "/vinn_eval.py:295-322": "Episode Analysis: Success Rate, Average Return",
+ "/vinn_eval.py:323-336": "Expand Greyscale Image Classification Script",
+ "/vinn_eval.py:37-63": "Weighted Pairwise Distance Prediction",
+ "/vinn_eval.py:64-101": "Task-Specific Parameter Setting and Configuration",
+ "/vinn_select_k.py": "Selecting Optimal K with Nearest Neighbors",
+ "/vinn_select_k.py:1-31": "Selecting K-Nearest Neighbors with Softmax Weights",
+ "/vinn_select_k.py:119-134": "Optimal 'k' Selection with Python Code",
+ "/vinn_select_k.py:32-66": "Episode Index Validator",
+ "/vinn_select_k.py:67-94": "Concatenate HDF5 Data for Training",
+ "/vinn_select_k.py:95-118": "Prepare HDF5 Data for Training in PyTorch",
+ "/visualize_episodes.py": "Timestamp Plot Generator",
+ "/visualize_episodes.py:1-35": "Robotics Data Loading and Processing",
+ "/visualize_episodes.py:124-154": "Timestamp Visualization for Camera Frames",
+ "/visualize_episodes.py:36-60": "Mirrored HDF5 Data Visualization",
+ "/visualize_episodes.py:61-87": "Video Concatenator and Joint Visualizer",
+ "/visualize_episodes.py:88-123": "Visualize Joint States and Arm Commands"
+}
\ No newline at end of file
diff --git a/docs/doc/00a40130-d31a-4515-841b-5c7d06356105.json b/docs/doc/00a40130-d31a-4515-841b-5c7d06356105.json
new file mode 100644
index 00000000..32694b9b
--- /dev/null
+++ b/docs/doc/00a40130-d31a-4515-841b-5c7d06356105.json
@@ -0,0 +1,10 @@
+{
+ "summary": "This YAML file defines a Conda environment named \"aloha\" with specified channels, Python version, and required packages for the codebase.",
+ "details": [
+ {
+ "comment": "This YAML file defines a Conda environment named \"aloha\" with specified channels, Python version, and required packages for the codebase.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/conda_env.yaml\":0-22",
+ "content": "name: aloha\nchannels:\n - pytorch\n - nvidia\n - conda-forge\ndependencies:\n - python=3.9\n - pip=23.0.1\n - pytorch=2.0.0\n - torchvision=0.15.0\n - pytorch-cuda=11.8\n - pyquaternion=0.9.9\n - pyyaml=6.0\n - rospkg=1.5.0\n - pexpect=4.8.0\n - mujoco=2.3.3\n - dm_control=1.0.9\n - py-opencv=4.7.0\n - matplotlib=3.7.1\n - einops=0.6.0\n - packaging=23.0\n - h5py=3.8.0\n - ipython=8.12.0"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/07a12791-77ee-4b6b-a8ab-3b326b5c176e.json b/docs/doc/07a12791-77ee-4b6b-a8ab-3b326b5c176e.json
new file mode 100644
index 00000000..b9f31a13
--- /dev/null
+++ b/docs/doc/07a12791-77ee-4b6b-a8ab-3b326b5c176e.json
@@ -0,0 +1,25 @@
+{
+ "summary": "This code includes ACT, Diffusion Policy, and VINN implementations with two simulated environments, installation instructions for dependencies and environment, demo scripts, data generation and visualization guides, training tips, and expected success rate evaluation.",
+ "details": [
+ {
+ "comment": "This code contains the implementation of ACT, Diffusion Policy, and VINN along with two simulated environments (Transfer Cube and Bimanual Insertion) that can be trained and evaluated in sim or real settings. It also requires installing Mobile ALOHA from a separate repository, which has been forked from the ACT repo. The code is organized into several Python files, each responsible for specific aspects of the algorithms or environments. Demo scripts for simulated environments are available online.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/README.md\":0-19",
+ "content": "# Imitation Learning algorithms and Co-training for Mobile ALOHA\n#### Project Website: https://mobile-aloha.github.io/\nThis repo contains the implementation of ACT, Diffusion Policy and VINN, together with 2 simulated environments:\nTransfer Cube and Bimanual Insertion. You can train and evaluate them in sim or real.\nFor real, you would also need to install [Mobile ALOHA](https://github.com/MarkFzp/mobile-aloha). This repo is forked from the [ACT repo](https://github.com/tonyzhaozh/act).\n### Updates:\nYou can find all scripted/human demo for simulated environments [here](https://drive.google.com/drive/folders/1gPR03v05S1xiInoVJn7G7VJ9pDCnxq9O?usp=share_link).\n### Repo Structure\n- ``imitate_episodes.py`` Train and Evaluate ACT\n- ``policy.py`` An adaptor for ACT policy\n- ``detr`` Model definitions of ACT, modified from DETR\n- ``sim_env.py`` Mujoco + DM_Control environments with joint space control\n- ``ee_sim_env.py`` Mujoco + DM_Control environments with EE space control\n- ``scripted_policy.py`` Scripted policies for sim environments"
+ },
+ {
+ "comment": "This code provides installation instructions for the environment and dependencies needed to run the ALOHA codebase. It also mentions the necessary steps to set up a new terminal and highlights some of the available simulation experiments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/README.md\":20-56",
+ "content": "- ``constants.py`` Constants shared across files\n- ``utils.py`` Utils such as data loading and helper functions\n- ``visualize_episodes.py`` Save videos from a .hdf5 dataset\n### Installation\n conda create -n aloha python=3.8.10\n conda activate aloha\n pip install torchvision\n pip install torch\n pip install pyquaternion\n pip install pyyaml\n pip install rospkg\n pip install pexpect\n pip install mujoco==2.3.7\n pip install dm_control==1.0.14\n pip install opencv-python\n pip install matplotlib\n pip install einops\n pip install packaging\n pip install h5py\n pip install ipython\n cd act/detr && pip install -e .\n- also need to install https://github.com/ARISE-Initiative/robomimic/tree/r2d2 (note the r2d2 branch) for Diffusion Policy by `pip install -e .`\n### Example Usages\nTo set up a new terminal, run:\n conda activate aloha\n cd \n### Simulated experiments (LEGACY table-top ALOHA environments)\nWe use ``sim_transfer_cube_scripted`` task in the examples below. Another option is ``sim_insertion_scripted``."
+ },
+ {
+ "comment": "This code provides instructions for generating and visualizing data, training the ACT model, and evaluating its performance. It also mentions the expected success rates for different tasks and includes an option for temporal ensembling.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/README.md\":57-76",
+ "content": "To generated 50 episodes of scripted data, run:\n python3 record_sim_episodes.py --task_name sim_transfer_cube_scripted --dataset_dir --num_episodes 50\nTo can add the flag ``--onscreen_render`` to see real-time rendering.\nTo visualize the simulated episodes after it is collected, run\n python3 visualize_episodes.py --dataset_dir --episode_idx 0\nNote: to visualize data from the mobile-aloha hardware, use the visualize_episodes.py from https://github.com/MarkFzp/mobile-aloha\nTo train ACT:\n # Transfer Cube task\n python3 imitate_episodes.py --task_name sim_transfer_cube_scripted --ckpt_dir --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_epochs 2000 --lr 1e-5 --seed 0\nTo evaluate the policy, run the same command but add ``--eval``. This loads the best validation checkpoint.\nThe success rate should be around 90% for transfer cube, and around 50% for insertion.\nTo enable temporal ensembling, add flag ``--temporal_agg``."
+ },
+ {
+ "comment": "This code snippet provides instructions for saving videos to a specified directory during rollouts, and suggests using the \"--onscreen_render\" option for real-time rendering. It recommends training for at least 5000 epochs or three to four times the length of data after loss plateaus for better results in real-world scenarios. The code also provides a link to additional tuning tips for further information and emphasizes that longer training can improve success rate and smoothness even when the loss has plateaued.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/README.md\":77-84",
+ "content": "Videos will be saved to ```` for each rollout.\nYou can also add ``--onscreen_render`` to see real-time rendering during evaluation.\nFor real-world data where things can be harder to model, train for at least 5000 epochs or 3-4 times the length after the loss has plateaued.\nPlease refer to [tuning tips](https://docs.google.com/document/d/1FVIZfoALXg_ZkYKaYVh-qOlaXveq5CtvJHXkY25eYhs/edit?usp=sharing) for more info.\n### [ACT tuning tips](https://docs.google.com/document/d/1FVIZfoALXg_ZkYKaYVh-qOlaXveq5CtvJHXkY25eYhs/edit?usp=sharing)\nTL;DR: if your ACT policy is jerky or pauses in the middle of an episode, just train for longer! Success rate and smoothness can improve way after loss plateaus."
+ }
+ ]
+}
\ No newline at end of file
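The README excerpts above describe recording scripted episodes to .hdf5 files and visualizing them afterwards. A small sketch for inspecting one recorded episode is shown below; it assumes the episode layout visible in the loaders quoted elsewhere in this diff (/action, /observations/qpos, /observations/images/<cam_name>), and the path in the usage line is only an example.

```python
import h5py

def inspect_episode(dataset_path: str) -> None:
    """Print the main datasets stored in a recorded episode file."""
    with h5py.File(dataset_path, 'r') as root:
        action = root['/action'][:]           # (T, action_dim) commanded joint positions
        qpos = root['/observations/qpos'][:]  # (T, state_dim) measured joint positions
        cam_names = list(root['/observations/images/'].keys())
        print(f'timesteps: {len(action)}, action dim: {action.shape[1]}, '
              f'qpos dim: {qpos.shape[1]}, cameras: {cam_names}')

# Example: inspect_episode('data/sim_transfer_cube_scripted/episode_0.hdf5')
```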
diff --git a/docs/doc/0a83e4f4-5f11-4680-a393-edf111c2aa59.json b/docs/doc/0a83e4f4-5f11-4680-a393-edf111c2aa59.json
new file mode 100644
index 00000000..45f8aca4
--- /dev/null
+++ b/docs/doc/0a83e4f4-5f11-4680-a393-edf111c2aa59.json
@@ -0,0 +1,70 @@
+{
+ "summary": "This code defines a function for nearest neighbor calculation, performs rollouts, and preprocesses features for image classification tasks. It uses command-line arguments to run the script with specific directories and checkpoints.",
+ "details": [
+ {
+ "comment": "This code imports necessary libraries and defines a function that calculates nearest neighbors for a given feature. The function takes the current feature, support inputs, support targets, number of neighbors to consider (k), and state weight as input parameters. It also handles cases where there is an action skip in the support targets by reshaping them before processing. The code defines separate features for visual and spatial modalities (curr_vis_feature, curr_s_feature, support_vis_feature, support_s_feature).",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":0-34",
+ "content": "import torch\nfrom torch import nn\nimport torch.nn.functional as F\nimport numpy as np\nimport h5py\nimport pathlib\nimport os\nimport argparse\nimport matplotlib.pyplot as plt\nfrom PIL import Image\nimport torchvision\nfrom torchvision import transforms\n# from visualize_episodes import visualize_joints\nfrom utils import set_seed, sample_box_pose\n# from imitate_episodes import get_image\nfrom sim_env import BOX_POSE\nfrom constants import DT\nfrom imitate_episodes import save_videos\nfrom einops import rearrange\nimport time\nDT = 0.02\nimport IPython\ne = IPython.embed\n# modified from https://github.com/jyopari/VINN/blob/main/nearest-neighbor-eval/handle_nn.ipynb\ndef calculate_nearest_neighbors(curr_feature, support_inputs, support_targets, k, state_weight):\n has_skip = len(support_targets.shape) == 3\n if has_skip: # when there is action skip\n num_targets, skip, a_dim = support_targets.shape\n support_targets = support_targets.view((num_targets, -1))\n curr_vis_feature, curr_s_feature = curr_feature\n support_vis_feature, support_s_feature = support_inputs"
+ },
+ {
+ "comment": "The code calculates pairwise distances between current and support features, sorts them, and assigns weights to the top-k distances. It then uses these weights to create a weighted sum of support targets as the prediction. The function takes arguments 'args', but they are not used in this snippet. Additionally, it allows skipping predictions for every 100th frame with 'has_skip' flag.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":36-62",
+ "content": " pairwise_dist_vis = torch.norm(curr_vis_feature - support_vis_feature, dim=1).unsqueeze(0)\n pairwise_dist_s = torch.norm(curr_s_feature - support_s_feature, dim=1).unsqueeze(0)\n pairwise_dist = pairwise_dist_vis + pairwise_dist_s * state_weight\n sorted_dist, index = torch.sort(pairwise_dist, dim=1) # sort the support axis\n permuted_support_targets = support_targets[index]\n topk_dist = pairwise_dist[:, :k]\n topk_support_targets = permuted_support_targets[:, :k]\n weights = F.softmax(-topk_dist, dim=1)\n weighted_support_targets = weights.unsqueeze(2) * topk_support_targets\n prediction = torch.sum(weighted_support_targets, dim=1)\n if has_skip:\n num_predictions = prediction.shape[0]\n prediction = prediction.reshape((num_predictions, skip, a_dim))\n return prediction\ndef main(args):\n # TODO ######################\n k = None # for scripted box transfer\n skip = 100\n real_robot = True\n save_episode = True\n # TODO ######################\n onscreen_cam = 'main'"
+ },
+ {
+ "comment": "This code sets various parameters and configurations for different tasks based on the task name provided. It assigns specific episode lengths, maximum rewards, kernel sizes (ks), and state weights depending on the task type. If the task is not implemented, it raises a NotImplementedError. The model name's last part before the file extension is used as the seed, and the representation type is set to 'byol'. For models with 'cotrain' in their names, it assigns the repr_type accordingly.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":63-100",
+ "content": " state_dim = 14\n dataset_dir = args['dataset_dir']\n onscreen_render = args['onscreen_render']\n ckpt_dir = args['ckpt_dir']\n model_dir = args['model_dir']\n task_name = args['task_name']\n if 'insertion' in task_name:\n sim_episode_len = 400\n env_max_reward = 4\n ks = [None]\n elif 'transfer_cube' in task_name:\n sim_episode_len = 400\n env_max_reward = 4\n ks = [1, 1, 1]\n if 'human' in dataset_dir:\n state_weight = 5\n else:\n state_weight = 10\n print(f'{state_weight=}')\n elif task_name == 'ziploc_slide':\n env_max_reward = 1\n ks = [71]\n state_weight = 0\n elif task_name == 'aloha_mobile_wipe_wine':\n sim_episode_len = 1300\n env_max_reward = 4\n ks = [2, 2, 2]\n state_weight = 5\n print(f'{state_weight=}')\n else:\n raise NotImplementedError\n model_name = pathlib.PurePath(model_dir).name\n seed = int(model_name.split('-')[-1][:-3])\n repr_type = 'byol'\n if 'cotrain' in model_name:"
+ },
+ {
+ "comment": "This code loads train data by iterating over 40 episodes. It retrieves action, base_action, and camera names from a dataset file. For each episode, it concatenates the visual features of all cameras into 'vis_fea'. The repr_type is extended with '_cotrain', and BASE_DELAY is set to 15 for real_robot cases.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":101-129",
+ "content": " repr_type += '_cotrain'\n e() # make sure!\n k = ks[seed]\n if real_robot:\n BASE_DELAY = 15\n query_freq = skip - BASE_DELAY\n # load train data\n vis_features = []\n state_features = []\n Y = []\n for episode_id in range(0, 40):\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n action = root['/action'][:]\n base_action = root['/base_action'][:]\n action = np.concatenate([action, base_action], axis=1)\n camera_names = list(root[f'/observations/images/'].keys())\n # Visual feature\n all_cam_feature = []\n for cam_name in camera_names:\n feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')\n with h5py.File(feature_dataset_path, 'r') as root:\n cam_feature = root[f'/features/{cam_name}'][:]\n all_cam_feature.append(cam_feature)\n vis_fea = np.concatenate(all_cam_feature, axis=1)"
+ },
+ {
+ "comment": "This code reads episode data from a file, stacks actions together, appends them to feature lists, and then concatenates the feature lists. Finally, it creates torch tensors for training inputs.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":131-153",
+ "content": " ## State feature\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n s_fea = root['/observations/qpos'][:]\n # stack actions together\n eps_len = len(action)\n indices = np.tile(np.arange(skip), eps_len).reshape(eps_len, skip) # each row is 0, 1, ... skip\n offset = np.expand_dims(np.arange(eps_len), axis=1)\n indices = indices + offset # row1: 0, 1, ... skip; row2: 1, 2, ... skip+1\n # indices will exceed eps_len, thus clamp to eps_len-1\n indices = np.clip(indices, 0, eps_len-1)\n # stack action\n action = action[indices] # new shape: eps_len, skip, a_dim\n vis_features.append(vis_fea)\n state_features.append(s_fea)\n Y.append(action)\n vis_features = np.concatenate(vis_features)\n state_features = np.concatenate(state_features)\n Y = np.concatenate(Y)\n train_inputs = [torch.from_numpy(vis_features).cuda(), torch.from_numpy(state_features).cuda()]"
+ },
+ {
+ "comment": "The code initializes feature extractors for each camera, loads the environment based on real_robot flag, and starts a loop to perform rollouts. It creates episode returns and maximum rewards lists for tracking performance metrics during the rollouts.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":154-184",
+ "content": " train_targets = torch.from_numpy(Y).cuda()\n set_seed(1000)\n feature_extractors = {}\n for cam_name in camera_names:\n resnet = torchvision.models.resnet18(pretrained=True)\n loading_status = resnet.load_state_dict(torch.load(model_dir.replace('DUMMY', cam_name)))\n print(cam_name, loading_status)\n resnet = nn.Sequential(*list(resnet.children())[:-1])\n resnet = resnet.cuda()\n resnet.eval()\n feature_extractors[cam_name] = resnet\n # load environment\n if real_robot:\n from aloha_scripts.real_env import make_real_env #### TODO TODO\n env = make_real_env(init_node=True, setup_robots=True, setup_base=True)\n max_timesteps = sim_episode_len\n camera_names = ['cam_high', 'cam_left_wrist', 'cam_right_wrist']\n else:\n from sim_env import make_sim_env\n env = make_sim_env(task_name)\n max_timesteps = sim_episode_len\n num_rollouts = 50\n episode_returns = []\n max_rewards = []\n for rollout_id in range(num_rollouts):"
+ },
+ {
+ "comment": "This code sets up a task, resets the environment, and enters an evaluation loop. It collects data for visualization, including qpos and images, and stores them in lists. The code is performing these actions at specific intervals based on the provided conditions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":185-208",
+ "content": " ### set task\n BOX_POSE[0] = sample_box_pose() # used in sim reset\n ts = env.reset()\n ### evaluation loop\n qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()\n image_list = [] # for visualization\n qpos_list = []\n target_qpos_list = []\n rewards = []\n with torch.inference_mode():\n for t in range(sim_episode_len):\n start_time = time.time()\n if t % 100 == 0: print(t)\n if t % query_freq == 0:\n ### process previous timestep to get qpos and image_list\n obs = ts.observation\n if 'images' in obs:\n image_list.append(obs['images'])\n else:\n image_list.append({'main': obs['image']})\n qpos_numpy = np.array(obs['qpos'])\n # qpos = pre_process(qpos_numpy)\n qpos = torch.from_numpy(qpos_numpy).float().cuda().unsqueeze(0)"
+ },
+ {
+ "comment": "This code segment processes an image for a robotics task. It stores the current qpos in history, retrieves and preprocesses raw camera images using transforms such as resizing, cropping, normalization, and tensor conversion. It then collects features from each camera using respective feature extractors and stores them in all_cam_features.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":209-228",
+ "content": " qpos_history[:, t] = qpos\n _, curr_image_raw = get_image(ts, camera_names)\n image_size = 120\n transform = transforms.Compose([\n transforms.Resize(image_size), # will scale the image\n transforms.CenterCrop(image_size),\n transforms.ToTensor(),\n transforms.Lambda(expand_greyscale),\n transforms.Normalize(\n mean=torch.tensor([0.485, 0.456, 0.406]),\n std=torch.tensor([0.229, 0.224, 0.225])),\n ])\n all_cam_features = []\n for cam_id, curr_image in enumerate(curr_image_raw):\n curr_image = Image.fromarray(curr_image) # TODO only one camera\n curr_image = transform(curr_image)\n curr_image = curr_image.unsqueeze(dim=0).cuda()\n curr_image_feature = feature_extractors[camera_names[cam_id]](curr_image)"
+ },
+ {
+ "comment": "The code preprocesses visual and state features, calculates nearest neighbors for action selection using a specified metric, and filters out the required action based on query frequency. The resulting target position and base action are extracted for further processing.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":229-249",
+ "content": " curr_image_feature = curr_image_feature.squeeze(3).squeeze(2)\n all_cam_features.append(curr_image_feature)\n curr_image_feature = torch.cat(all_cam_features, dim=1)\n ### Visual feature\n # curr_feature = curr_image_feature\n ### State feature\n # curr_feature = qpos\n ### Both features\n curr_feature = [curr_image_feature, qpos]\n action = calculate_nearest_neighbors(curr_feature, train_inputs, train_targets, k, state_weight) # TODO use this\n action = action.squeeze(0).cpu().numpy()\n action = np.concatenate([action[:-BASE_DELAY, :-2], action[BASE_DELAY:, -2:]], axis=1)\n print(f'Query: {(time.time() - start_time):.3f}s')\n curr_action = action[t % query_freq]\n target_qpos = curr_action[:-2]\n base_action = curr_action[-2:]"
+ },
+ {
+ "comment": "This code chunk is responsible for controlling the movement of a robot's joints, ensuring safety by clipping target positions within safe limits. It steps through the environment and saves information for visualization. If the robot is real, it sets the operating modes for the gripper and pwm.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":251-274",
+ "content": " # ### SAFETY\n # max_a = 0.05\n # curr_qpos = qpos.squeeze().cpu().numpy()\n # target_qpos = target_qpos.clip(curr_qpos - max_a, curr_qpos + max_a)\n # ### SAFETY\n ### step the environment\n ts = env.step(target_qpos, base_action=base_action)\n duration = time.time() - start_time\n # print(f'{duration:.3f}')\n time.sleep(max(0, DT - duration))\n ### save things for visualization\n qpos_list.append(qpos_numpy)\n target_qpos_list.append(target_qpos)\n rewards.append(ts.reward)\n # if real_robot and t != 0 and t % 60 == 0:\n # e()\n plt.close()\n if real_robot:\n env.puppet_bot_left.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"position\")\n env.puppet_bot_right.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"position\")\n env.puppet_bot_left.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"pwm\")"
+ },
+ {
+ "comment": "This code sets the operating modes for the robot's gripper and calculates rewards, episode returns, and maximum rewards. It then prints these values and saves videos or images if required. Finally, it calculates success rate and average return and constructs a summary string.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":275-293",
+ "content": " env.puppet_bot_right.dxl.robot_set_operating_modes(\"single\", \"gripper\", \"pwm\")\n rewards = np.array(rewards)\n episode_return = np.sum(rewards[rewards!=None])\n episode_returns.append(episode_return)\n max_reward = np.max(rewards)\n max_rewards.append(max_reward)\n print(f'{episode_return=}, {max_reward=}')\n if save_episode:\n save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))\n # visualize_joints(qpos_list, target_qpos_list, plot_path=os.path.join(ckpt_dir, f'qpos{rollout_id}.png'))\n # visualize_joints(qpos_list, example_qpos, plot_path=os.path.join(ckpt_dir, f'qpos_reference{rollout_id}.png'), label_overwrite=(\"policy\", \"dataset\"))\n success_rate = np.mean(np.array(max_rewards) == env_max_reward)\n avg_return = np.mean(episode_returns)\n summary_str = f'\\nSuccess rate: {success_rate}\\nAverage return: {avg_return}\\n\\n'\n for r in range(env_max_reward+1):\n more_or_equal_r = (np.array(max_rewards) >= r).sum()"
+ },
+ {
+ "comment": "This function calculates the success rate, average return, and saves results to a text file for each episode. It retrieves images from observations, processes them, and stores the current image raw data in the correct format for further processing or visualization.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":294-321",
+ "content": " more_or_equal_r_rate = more_or_equal_r / num_rollouts\n summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\\n'\n print(summary_str)\n # save success rate to txt\n result_file_name = f'result_{skip}_{k}' + '.txt'\n with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:\n f.write(summary_str)\n f.write(repr(episode_returns))\n f.write('\\n\\n')\n f.write(repr(max_rewards))\n return success_rate, avg_return\ndef get_image(ts, camera_names):\n if 'images' in ts.observation:\n curr_images = []\n for cam_name in camera_names:\n curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')\n curr_images.append(curr_image)\n curr_image_raw = np.stack(curr_images, axis=0)\n else:\n curr_image_raw = rearrange(ts.observation['image'], 'h w c -> c h w')\n curr_image = torch.from_numpy(curr_image_raw / 255.0).float().cuda().unsqueeze(0)\n curr_image_raw = rearrange(curr_image_raw, 'b c h w -> b h w c')"
+ },
+ {
+ "comment": "The code defines a function expand_greyscale, sets up argument parsing with required parameters like dataset_dir and model_dir, and calls main function with the parsed arguments. The main function is not defined in this chunk but is called by passing the command-line arguments as variables. It seems to be a script for running an image classification task with specific directories and checkpoints.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_eval.py\":322-335",
+ "content": " return curr_image, curr_image_raw\ndef expand_greyscale(t):\n return t.expand(3, -1, -1)\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--dataset_dir', action='store', type=str, help='The text to parse.', required=True)\n parser.add_argument('--model_dir', action='store', type=str, help='model_dir', required=True)\n parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--ckpt_dir', action='store', type=str, help='The text to parse.', required=True)\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
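The vinn_eval.py entries above describe the core VINN step: softmax-weighting the k nearest support examples by negated feature distance and averaging their actions. A standalone sketch of that weighting is given below; it assumes plain 2-D tensors (no action chunking) and combines visual and state distances with a state_weight factor, as in the quoted code. The function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def knn_weighted_action(curr_vis, curr_state, support_vis, support_state,
                        support_actions, k, state_weight):
    """Weighted k-NN prediction.

    curr_vis / curr_state: (1, d) query features.
    support_vis / support_state: (N, d) cached support features.
    support_actions: (N, a_dim) actions paired with the support features.
    """
    dist_vis = torch.norm(curr_vis - support_vis, dim=1)        # (N,)
    dist_state = torch.norm(curr_state - support_state, dim=1)  # (N,)
    dist = dist_vis + state_weight * dist_state
    topk_dist, idx = torch.topk(dist, k, largest=False)         # k smallest distances
    weights = F.softmax(-topk_dist, dim=0)                      # closer -> larger weight
    return (weights.unsqueeze(1) * support_actions[idx]).sum(dim=0)  # (a_dim,)
```

Sorting ascending and slicing the first k, as the quoted code does, is equivalent to topk with largest=False; the sketch only changes the presentation.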
diff --git a/docs/doc/0ab776c9-65fb-4ffc-8547-0aa013a25be9.json b/docs/doc/0ab776c9-65fb-4ffc-8547-0aa013a25be9.json
new file mode 100644
index 00000000..b3a79b44
--- /dev/null
+++ b/docs/doc/0ab776c9-65fb-4ffc-8547-0aa013a25be9.json
@@ -0,0 +1,115 @@
+{
+ "summary": "The code uses DynamixelSDK for motor communication, offering a class for control and incorporating functions for cleanup, conversion, and initialization. It manages motion control through command-line arguments and handles data from Dynamixel motors in an infinite loop.",
+ "details": [
+ {
+ "comment": "This code is for communicating with Dynamixel motors using the DynamixelSDK. It defines protocol version, addresses for various motor data, byte lengths, and scale factors for position, velocity, and current. The dynamixel_cleanup_handler function ensures Dynamixels are disconnected properly before exiting.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":0-37",
+ "content": "\"\"\"Communication using the DynamixelSDK.\"\"\"\n##This is based off of the dynamixel SDK\nimport atexit\nimport logging\nimport time\nfrom typing import Optional, Sequence, Union, Tuple\nimport numpy as np\nPROTOCOL_VERSION = 2.0\n# The following addresses assume XH motors.\nADDR_TORQUE_ENABLE = 64\nADDR_GOAL_POSITION = 116\nADDR_PRESENT_POSITION = 132\nADDR_PRESENT_VELOCITY = 128\nADDR_PRESENT_CURRENT = 126\nADDR_PRESENT_POS_VEL_CUR = 126\n# Data Byte Length\nLEN_PRESENT_POSITION = 4\nLEN_PRESENT_VELOCITY = 4\nLEN_PRESENT_CURRENT = 2\nLEN_PRESENT_POS_VEL_CUR = 10\nLEN_GOAL_POSITION = 4\nDEFAULT_POS_SCALE = 2.0 * np.pi / 4096 # 0.088 degrees\n# See http://emanual.robotis.com/docs/en/dxl/x/xh430-v210/#goal-velocity\nDEFAULT_VEL_SCALE = 0.229 * 2.0 * np.pi / 60.0 # 0.229 rpm\nDEFAULT_CUR_SCALE = 1.34\ndef dynamixel_cleanup_handler():\n \"\"\"Cleanup function to ensure Dynamixels are disconnected properly.\"\"\"\n open_clients = list(DynamixelClient.OPEN_CLIENTS)\n for open_client in open_clients:\n if open_client.port_handler.is_using:\n logging.warning('Forcing client to close.')"
+ },
+ {
+ "comment": "The code defines a class `DynamixelClient` for communicating with Dynamixel motors, supporting Protocol 2. It also contains functions `signed_to_unsigned` and `unsigned_to_signed` for converting signed to unsigned values and vice versa. The client can be initialized with motor IDs, port, baudrate, lazy connect option, and optional position scale.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":38-73",
+ "content": " open_client.port_handler.is_using = False\n open_client.disconnect()\ndef signed_to_unsigned(value: int, size: int) -> int:\n \"\"\"Converts the given value to its unsigned representation.\"\"\"\n if value < 0:\n bit_size = 8 * size\n max_value = (1 << bit_size) - 1\n value = max_value + value\n return value\ndef unsigned_to_signed(value: int, size: int) -> int:\n \"\"\"Converts the given value from its unsigned representation.\"\"\"\n bit_size = 8 * size\n if (value & (1 << (bit_size - 1))) != 0:\n value = -((1 << bit_size) - value)\n return value\nclass DynamixelClient:\n \"\"\"Client for communicating with Dynamixel motors.\n NOTE: This only supports Protocol 2.\n \"\"\"\n # The currently open clients.\n OPEN_CLIENTS = set()\n def __init__(self,\n motor_ids: Sequence[int],\n port: str = '/dev/ttyUSB0',\n baudrate: int = 1000000,\n lazy_connect: bool = False,\n pos_scale: Optional[float] = None,"
+ },
+ {
+ "comment": "This code snippet is the constructor of a class, initializing a new Dynamixel client. It takes motor IDs, device port, baudrate, and optional scaling factors for positions, velocities, and currents as arguments. If not provided, it uses default scales. Lazy connectivity is also available if a method requires a connection when not already connected.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":74-92",
+ "content": " vel_scale: Optional[float] = None,\n cur_scale: Optional[float] = None):\n \"\"\"Initializes a new client.\n Args:\n motor_ids: All motor IDs being used by the client.\n port: The Dynamixel device to talk to. e.g.\n - Linux: /dev/ttyUSB0\n - Mac: /dev/tty.usbserial-*\n - Windows: COM1\n baudrate: The Dynamixel baudrate to communicate with.\n lazy_connect: If True, automatically connects when calling a method\n that requires a connection, if not already connected.\n pos_scale: The scaling factor for the positions. This is\n motor-dependent. If not provided, uses the default scale.\n vel_scale: The scaling factor for the velocities. This is\n motor-dependent. If not provided uses the default scale.\n cur_scale: The scaling factor for the currents. This is\n motor-dependent. If not provided uses the default scale."
+ },
+ {
+ "comment": "This code imports the dynamixel_sdk library and initializes variables for port, baudrate, lazy connect, and protocol version. It also creates handlers for the port and packet communication and instantiates two reader classes for position, velocity, and current data. These readers can be used to access information from Dynamixel motors.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":93-117",
+ "content": " \"\"\"\n import dynamixel_sdk\n self.dxl = dynamixel_sdk\n self.motor_ids = list(motor_ids)\n self.port_name = port\n self.baudrate = baudrate\n self.lazy_connect = lazy_connect\n self.port_handler = self.dxl.PortHandler(port)\n self.packet_handler = self.dxl.PacketHandler(PROTOCOL_VERSION)\n self._pos_vel_cur_reader = DynamixelPosVelCurReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,\n )\n self._pos_reader = DynamixelPosReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,"
+ },
+ {
+ "comment": "The code initializes reader and writer objects for the Dynamixel motors, handles open clients, and provides a connect method. The `_vel_reader` and `_cur_reader` objects are created with optional scales for position (pos_scale), velocity (vel_scale), and current (cur_scale). These scales allow custom adjustment to the motor data readings. The `self._sync_writers` dictionary is initialized, likely used for synchronous writer operations. The code also includes an `is_connected` property that returns the status of the connection to the Dynamixel motors and a `connect` method which should be called after all DynamixelClients on the same process are created.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":118-145",
+ "content": " )\n self._vel_reader = DynamixelVelReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,\n )\n self._cur_reader = DynamixelCurReader(\n self,\n self.motor_ids,\n pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,\n vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,\n cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,\n )\n self._sync_writers = {}\n self.OPEN_CLIENTS.add(self)\n @property\n def is_connected(self) -> bool:\n return self.port_handler.is_open\n def connect(self):\n \"\"\"Connects to the Dynamixel motors.\n NOTE: This should be called after all DynamixelClients on the same\n process are created."
+ },
+ {
+ "comment": "This code checks if the client is already connected and then attempts to open the port. If successful, it logs a message indicating the port has been opened. It also sets the baud rate and logs a success message if that's successful too. The code then enables all motors with True values for settings before enabling. Lastly, there is a function disconnect() which checks if the client is connected, and if so, it disconnects from the Dynamixel device.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":146-170",
+ "content": " \"\"\"\n assert not self.is_connected, 'Client is already connected.'\n if self.port_handler.openPort():\n logging.info('Succeeded to open port: %s', self.port_name)\n else:\n raise OSError(\n ('Failed to open port at {} (Check that the device is powered '\n 'on and connected to your computer).').format(self.port_name))\n if self.port_handler.setBaudRate(self.baudrate):\n logging.info('Succeeded to set baudrate to %d', self.baudrate)\n else:\n raise OSError(\n ('Failed to set the baudrate to {} (Ensure that the device was '\n 'configured for this baudrate).').format(self.baudrate))\n # Start with all motors enabled. NO, I want to set settings before enabled\n #self.set_torque_enabled(self.motor_ids, True)\n def disconnect(self):\n \"\"\"Disconnects from the Dynamixel device.\"\"\"\n if not self.is_connected:\n return\n if self.port_handler.is_using:"
+ },
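+
+ connect() raises OSError with a descriptive message when the port cannot be opened or the baud rate cannot be set; a minimal handling sketch, reusing the client constructed in the earlier example:
+
+ ```python
+ import logging
+
+ try:
+     client.connect()
+ except OSError as exc:
+     # Raised when the port cannot be opened or the baudrate cannot be set.
+     logging.error('Could not reach the Dynamixel bus: %s', exc)
+ ```
+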
+ {
+ "comment": "The code is disconnecting the port handler and ensuring motors are disabled. It removes the client from OPEN_CLIENTS, sets motor torque enabled or disabled, retries if necessary, and waits between retries for a specific duration.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":171-195",
+ "content": " logging.error('Port handler in use; cannot disconnect.')\n return\n # Ensure motors are disabled at the end.\n self.set_torque_enabled(self.motor_ids, False, retries=0)\n self.port_handler.closePort()\n if self in self.OPEN_CLIENTS:\n self.OPEN_CLIENTS.remove(self)\n def set_torque_enabled(self,\n motor_ids: Sequence[int],\n enabled: bool,\n retries: int = -1,\n retry_interval: float = 0.25):\n \"\"\"Sets whether torque is enabled for the motors.\n Args:\n motor_ids: The motor IDs to configure.\n enabled: Whether to engage or disengage the motors.\n retries: The number of times to retry. If this is <0, will retry\n forever.\n retry_interval: The number of seconds to wait between retries.\n \"\"\"\n remaining_ids = list(motor_ids)\n while remaining_ids:\n remaining_ids = self.write_byte("
+ },
+ {
+ "comment": "The code defines a function to set the torque of Dynamixel motors. It iterates over each ID and enables/disables the torque for them. If there are remaining unsuccessful IDs, it logs an error message. The code also includes methods to read positions, velocities, and currents from the motors. Each method uses a reader object to retrieve the data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":196-220",
+ "content": " remaining_ids,\n int(enabled),\n ADDR_TORQUE_ENABLE,\n )\n if remaining_ids:\n logging.error('Could not set torque %s for IDs: %s',\n 'enabled' if enabled else 'disabled',\n str(remaining_ids))\n if retries == 0:\n break\n time.sleep(retry_interval)\n retries -= 1\n def read_pos_vel_cur(self) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._pos_vel_cur_reader.read()\n def read_pos(self) -> np.ndarray:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._pos_reader.read()\n def read_vel(self) -> np.ndarray:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._vel_reader.read()\n def read_cur(self) -> np.ndarray:\n \"\"\"Returns the current positions and velocities.\"\"\"\n return self._cur_reader.read()"
+ },
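+
+ A short usage sketch of the torque and read interface described above, assuming the client from the earlier example is already connected:
+
+ ```python
+ # Enable torque on every configured motor, then poll the grouped reader once.
+ client.set_torque_enabled(client.motor_ids, True)
+ pos, vel, cur = client.read_pos_vel_cur()  # one numpy entry per motor
+ print(pos)  # joint angles in radians (raw ticks multiplied by pos_scale)
+ ```
+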
+ {
+ "comment": "This code defines two functions, \"write_desired_pos\" and \"write_byte\". The first function writes the given desired positions to the specified motor IDs. It takes in a list of motor IDs and an array of joint angles, converts the angles to Dynamixel position space, then uses sync_write to write the positions to the motors' goal position address. The second function writes a value to the control table at a given address for specified motor IDs. It returns a list of unsuccessful IDs if any occur during writing.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":222-253",
+ "content": " def write_desired_pos(self, motor_ids: Sequence[int],\n positions: np.ndarray):\n \"\"\"Writes the given desired positions.\n Args:\n motor_ids: The motor IDs to write to.\n positions: The joint angles in radians to write.\n \"\"\"\n assert len(motor_ids) == len(positions)\n # Convert to Dynamixel position space.\n positions = positions / self._pos_vel_cur_reader.pos_scale\n self.sync_write(motor_ids, positions, ADDR_GOAL_POSITION,\n LEN_GOAL_POSITION)\n def write_byte(\n self,\n motor_ids: Sequence[int],\n value: int,\n address: int,\n ) -> Sequence[int]:\n \"\"\"Writes a value to the motors.\n Args:\n motor_ids: The motor IDs to write to.\n value: The value to write to the control table.\n address: The control table address to write to.\n Returns:\n A list of IDs that were unsuccessful.\n \"\"\"\n self.check_connected()"
+ },
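+
+ The radians-to-ticks conversion inside write_desired_pos is a division by the position scale; with the default scale it works out as follows:
+
+ ```python
+ import numpy as np
+
+ DEFAULT_POS_SCALE = 2.0 * np.pi / 4096         # radians per tick
+ positions = np.array([0.0, np.pi / 2, np.pi])  # desired joint angles in radians
+ ticks = positions / DEFAULT_POS_SCALE          # what sync_write receives
+ print(ticks)                                   # [   0. 1024. 2048.]
+ ```
+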
+ {
+ "comment": "This code defines a function `sync_write` that takes motor IDs, values, address, and size as input to write the same value at the specified address for multiple motors. It first checks if the connection is established and then creates a key based on the address and size. If this key is not present in the internal dictionary `self._sync_writers`, it initializes a GroupSyncWrite operation with the given parameters. This function also returns an empty list of motor IDs that had errors during the write operation, which are stored in the variable `errored_ids` by checking if each write operation was successful or not.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":254-278",
+ "content": " errored_ids = []\n for motor_id in motor_ids:\n comm_result, dxl_error = self.packet_handler.write1ByteTxRx(\n self.port_handler, motor_id, address, value)\n success = self.handle_packet_result(\n comm_result, dxl_error, motor_id, context='write_byte')\n if not success:\n errored_ids.append(motor_id)\n return errored_ids\n def sync_write(self, motor_ids: Sequence[int],\n values: Sequence[Union[int, float]], address: int,\n size: int):\n \"\"\"Writes values to a group of motors.\n Args:\n motor_ids: The motor IDs to write to.\n values: The values to write.\n address: The control table address to write to.\n size: The size of the control table value being written to.\n \"\"\"\n self.check_connected()\n key = (address, size)\n if key not in self._sync_writers:\n self._sync_writers[key] = self.dxl.GroupSyncWrite("
+ },
+ {
+ "comment": "The code snippet handles synchronous writes to multiple motors. It iterates over motor IDs and desired positions, converts them to the required format, adds them to the packet writer, logs any failures, sends the packet, clears the packet writer, and checks if the robot is connected.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":279-306",
+ "content": " self.port_handler, self.packet_handler, address, size)\n sync_writer = self._sync_writers[key]\n errored_ids = []\n for motor_id, desired_pos in zip(motor_ids, values):\n value = signed_to_unsigned(int(desired_pos), size=size)\n value = value.to_bytes(size, byteorder='little')\n success = sync_writer.addParam(motor_id, value)\n if not success:\n errored_ids.append(motor_id)\n if errored_ids:\n logging.error('Sync write failed for: %s', str(errored_ids))\n comm_result = sync_writer.txPacket()\n self.handle_packet_result(comm_result, context='sync_write')\n sync_writer.clearParam()\n def check_connected(self):\n \"\"\"Ensures the robot is connected.\"\"\"\n if self.lazy_connect and not self.is_connected:\n self.connect()\n if not self.is_connected:\n raise OSError('Must call connect() first.')\n def handle_packet_result(self,\n comm_result: int,"
+ },
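+
+ Inside sync_write, each value is converted to its unsigned representation and serialized little-endian before being handed to the GroupSyncWrite; in isolation the packing step looks like this (the goal-position length of 4 bytes is assumed):
+
+ ```python
+ LEN_GOAL_POSITION = 4
+
+ def pack_value(desired: float, size: int = LEN_GOAL_POSITION) -> bytes:
+     """Mirrors the per-motor packing done inside sync_write."""
+     value = int(desired)
+     if value < 0:  # signed_to_unsigned, inlined from the helper above
+         value = (1 << (8 * size)) - 1 + value
+     return value.to_bytes(size, byteorder='little')
+
+ print(pack_value(2048))  # b'\x00\x08\x00\x00'
+ ```
+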
+ {
+ "comment": "This function handles communication results and checks for errors. It formats the error message with motor ID and context if provided, then logs the error and returns False. The convert_to_unsigned function converts a given value to its unsigned representation.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":307-328",
+ "content": " dxl_error: Optional[int] = None,\n dxl_id: Optional[int] = None,\n context: Optional[str] = None):\n \"\"\"Handles the result from a communication request.\"\"\"\n error_message = None\n if comm_result != self.dxl.COMM_SUCCESS:\n error_message = self.packet_handler.getTxRxResult(comm_result)\n elif dxl_error is not None:\n error_message = self.packet_handler.getRxPacketError(dxl_error)\n if error_message:\n if dxl_id is not None:\n error_message = '[Motor ID: {}] {}'.format(\n dxl_id, error_message)\n if context is not None:\n error_message = '> {}: {}'.format(context, error_message)\n logging.error(error_message)\n return False\n return True\n def convert_to_unsigned(self, value: int, size: int) -> int:\n \"\"\"Converts the given value to its unsigned representation.\"\"\"\n if value < 0:"
+ },
+ {
+ "comment": "This code defines a DynamixelReader class for reading data from Dynamixel motors using GroupBulkRead from the DynamixelSDK. It also provides context management functionality with __enter__ and __exit__ methods, and automatically disconnects on destruction with __del__.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":329-364",
+ "content": " max_value = (1 << (8 * size)) - 1\n value = max_value + value\n return value\n def __enter__(self):\n \"\"\"Enables use as a context manager.\"\"\"\n if not self.is_connected:\n self.connect()\n return self\n def __exit__(self, *args):\n \"\"\"Enables use as a context manager.\"\"\"\n self.disconnect()\n def __del__(self):\n \"\"\"Automatically disconnect on destruction.\"\"\"\n self.disconnect()\nclass DynamixelReader:\n \"\"\"Reads data from Dynamixel motors.\n This wraps a GroupBulkRead from the DynamixelSDK.\n \"\"\"\n def __init__(self, client: DynamixelClient, motor_ids: Sequence[int],\n address: int, size: int):\n \"\"\"Initializes a new reader.\"\"\"\n self.client = client\n self.motor_ids = motor_ids\n self.address = address\n self.size = size\n self._initialize_data()\n self.operation = self.client.dxl.GroupBulkRead(client.port_handler,\n client.packet_handler)"
+ },
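+
+ Because __enter__ and __exit__ are wired to connect() and disconnect(), the client can be used as a context manager, which is how the script's __main__ block uses it later; a minimal sketch with placeholder IDs, port, and module name:
+
+ ```python
+ from dynamixel_client import DynamixelClient  # module name assumed from the file path
+
+ # Placeholder motor IDs, port, and baudrate.
+ with DynamixelClient([1, 2], '/dev/ttyUSB0', 1000000) as dxl_client:
+     pos, vel, cur = dxl_client.read_pos_vel_cur()
+ # On exit the motors are torque-disabled and the port is closed.
+ ```
+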
+ {
+ "comment": "This code adds parameters to a bulk read operation for each motor ID, reads data from motors with retries in case of errors or disconnections, and returns previous data if the read fails.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":366-391",
+ "content": " for motor_id in motor_ids:\n success = self.operation.addParam(motor_id, address, size)\n if not success:\n raise OSError(\n '[Motor ID: {}] Could not add parameter to bulk read.'\n .format(motor_id))\n def read(self, retries: int = 1):\n \"\"\"Reads data from the motors.\"\"\"\n self.client.check_connected()\n success = False\n while not success and retries >= 0:\n comm_result = self.operation.txRxPacket()\n success = self.client.handle_packet_result(\n comm_result, context='read')\n retries -= 1\n # If we failed, send a copy of the previous data.\n if not success:\n return self._get_data()\n errored_ids = []\n for i, motor_id in enumerate(self.motor_ids):\n # Check if the data is available.\n available = self.operation.isAvailable(motor_id, self.address,\n self.size)"
+ },
+ {
+ "comment": "This code is part of a Dynamixel client that communicates with a robot's servo motors to read position and velocity data. It initializes the cached data, updates the data for specific motor IDs, returns a copy of the data, and handles cases where data is unavailable.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":392-424",
+ "content": " if not available:\n errored_ids.append(motor_id)\n continue\n self._update_data(i, motor_id)\n if errored_ids:\n logging.error('Bulk read data is unavailable for: %s',\n str(errored_ids))\n return self._get_data()\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n self._data[index] = self.operation.getData(motor_id, self.address,\n self.size)\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._data.copy()\nclass DynamixelPosVelCurReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,"
+ },
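+
+ The reader subclasses that follow all override the same three hooks (_initialize_data, _update_data, _get_data). As a sketch of that pattern only, here is a hypothetical reader that is not part of the original file; the temperature address and length are assumptions and should be checked against the motor's control table:
+
+ ```python
+ import numpy as np
+
+ class DynamixelTemperatureReader(DynamixelReader):
+     """Hypothetical subclass illustrating the reader hooks."""
+
+     ADDR_PRESENT_TEMPERATURE = 146  # assumed X-series address, 1 byte
+     LEN_PRESENT_TEMPERATURE = 1
+
+     def __init__(self, client, motor_ids):
+         super().__init__(client, motor_ids,
+                          address=self.ADDR_PRESENT_TEMPERATURE,
+                          size=self.LEN_PRESENT_TEMPERATURE)
+
+     def _initialize_data(self):
+         # Called by DynamixelReader.__init__ before the bulk read is built.
+         self._temp_data = np.zeros(len(self.motor_ids), dtype=np.float32)
+
+     def _update_data(self, index, motor_id):
+         self._temp_data[index] = self.operation.getData(
+             motor_id, self.address, self.size)
+
+     def _get_data(self):
+         return self._temp_data.copy()
+ ```
+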
+ {
+ "comment": "This code defines a class for reading Dynamixel servo data. It takes in a client, motor IDs, and scales for position, velocity, and current. It initializes cached data arrays with zeros for each motor. The _update_data function reads and stores the current, velocity, and position data from the specified address for the given motor ID.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":425-449",
+ "content": " vel_scale: float = 1.0,\n cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,\n size=LEN_PRESENT_POS_VEL_CUR,\n )\n self.pos_scale = pos_scale\n self.vel_scale = vel_scale\n self.cur_scale = cur_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._pos_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n self._vel_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n self._cur_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n cur = self.operation.getData(motor_id, ADDR_PRESENT_CURRENT,\n LEN_PRESENT_CURRENT)\n vel = self.operation.getData(motor_id, ADDR_PRESENT_VELOCITY,\n LEN_PRESENT_VELOCITY)\n pos = self.operation.getData(motor_id, ADDR_PRESENT_POSITION,"
+ },
+ {
+ "comment": "The code defines a class `DynamixelPosReader` that inherits from `DynamixelReader` and reads positions and velocities of motors. It takes a client, motor IDs, and scaling factors for position, velocity, and current as parameters. The `__init__` method initializes the superclass with the address and size for reading present position, velocity, and current data. The `_get_data` method returns a copy of the stored position, velocity, and current data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":450-478",
+ "content": " LEN_PRESENT_POSITION)\n cur = unsigned_to_signed(cur, size=2)\n vel = unsigned_to_signed(vel, size=4)\n pos = unsigned_to_signed(pos, size=4)\n self._pos_data[index] = float(pos) * self.pos_scale\n self._vel_data[index] = float(vel) * self.vel_scale\n self._cur_data[index] = float(cur) * self.cur_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return (self._pos_data.copy(), self._vel_data.copy(),\n self._cur_data.copy())\nclass DynamixelPosReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,\n vel_scale: float = 1.0,\n cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,\n size=LEN_PRESENT_POS_VEL_CUR,\n )"
+ },
+ {
+ "comment": "The code defines a class `DynamixelReader` that reads position and velocity data from Dynamixel motors. It initializes cached data, updates the data for a given motor ID, and returns a copy of the data. The `DynamixelVelReader` subclass extends this functionality to read positions, velocities, and currents.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":479-508",
+ "content": " self.pos_scale = pos_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._pos_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n pos = self.operation.getData(motor_id, ADDR_PRESENT_POSITION,\n LEN_PRESENT_POSITION)\n pos = unsigned_to_signed(pos, size=4)\n self._pos_data[index] = float(pos) * self.pos_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._pos_data.copy()\nclass DynamixelVelReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,\n vel_scale: float = 1.0,\n cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,"
+ },
+ {
+ "comment": "This code defines a class DynamixelCurReader that inherits from DynamixelReader and reads positions and velocities from dynamixel motors. The constructor takes in a client, motor IDs, optional position scale, and optional velocity scale. It initializes cached data and sets the position and velocity scales. The _initialize_data method initializes the velocity data with zeros. The _update_data method updates the data index for the given motor ID by getting the velocity from the DynamixelClient, converting it to a signed integer, scaling it by the velocity scale, and storing it in the velocity data. The _get_data method returns a copy of the velocity data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":509-537",
+ "content": " size=LEN_PRESENT_POS_VEL_CUR,\n )\n self.pos_scale = pos_scale\n self.vel_scale = vel_scale\n self.cur_scale = cur_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._vel_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n vel = self.operation.getData(motor_id, ADDR_PRESENT_VELOCITY,\n LEN_PRESENT_VELOCITY)\n vel = unsigned_to_signed(vel, size=4)\n self._vel_data[index] = float(vel) * self.vel_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._vel_data.copy()\nclass DynamixelCurReader(DynamixelReader):\n \"\"\"Reads positions and velocities.\"\"\"\n def __init__(self,\n client: DynamixelClient,\n motor_ids: Sequence[int],\n pos_scale: float = 1.0,\n vel_scale: float = 1.0,"
+ },
+ {
+ "comment": "The code defines a class for reading the present current values from Dynamixel motors. It initializes data and updates data index for the given motor ID. The function returns a copy of the data. Global cleanup function is registered for atexit module to handle clean-up operations upon program termination.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":538-570",
+ "content": " cur_scale: float = 1.0):\n super().__init__(\n client,\n motor_ids,\n address=ADDR_PRESENT_POS_VEL_CUR,\n size=LEN_PRESENT_POS_VEL_CUR,\n )\n self.cur_scale = cur_scale\n def _initialize_data(self):\n \"\"\"Initializes the cached data.\"\"\"\n self._cur_data = np.zeros(len(self.motor_ids), dtype=np.float32)\n def _update_data(self, index: int, motor_id: int):\n \"\"\"Updates the data index for the given motor ID.\"\"\"\n cur = self.operation.getData(motor_id, ADDR_PRESENT_CURRENT,\n LEN_PRESENT_CURRENT)\n cur = unsigned_to_signed(cur, size=2)\n self._cur_data[index] = float(cur) * self.cur_scale\n def _get_data(self):\n \"\"\"Returns a copy of the data.\"\"\"\n return self._cur_data.copy()\n# Register global cleanup function.\natexit.register(dynamixel_cleanup_handler)\nif __name__ == '__main__':\n import argparse\n import itertools\n parser = argparse.ArgumentParser()"
+ },
+ {
+ "comment": "The code defines command-line arguments for motor IDs, device, and baudrate. It then parses these arguments into a list of motors, and creates waypoints for motion control using numpy arrays. The DynamixelClient class is instantiated with the parsed arguments, and in an infinite loop, writes waypoint positions to motors and reads current position, velocity, and current values from the device at regular intervals.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":571-597",
+ "content": " parser.add_argument(\n '-m',\n '--motors',\n required=True,\n help='Comma-separated list of motor IDs.')\n parser.add_argument(\n '-d',\n '--device',\n default='/dev/ttyUSB0',\n help='The Dynamixel device to connect to.')\n parser.add_argument(\n '-b', '--baud', default=1000000, help='The baudrate to connect with.')\n parsed_args = parser.parse_args()\n motors = [int(motor) for motor in parsed_args.motors.split(',')]\n way_points = [np.zeros(len(motors)), np.full(len(motors), np.pi)]\n with DynamixelClient(motors, parsed_args.device,\n parsed_args.baud) as dxl_client:\n for step in itertools.count():\n if step > 0 and step % 50 == 0:\n way_point = way_points[(step // 100) % len(way_points)]\n print('Writing: {}'.format(way_point.tolist()))\n dxl_client.write_desired_pos(motors, way_point)\n read_start = time.time()\n pos_now, vel_now, cur_now = dxl_client.read_pos_vel_cur()"
+ },
+ {
+ "comment": "This code block prints the frequency, positions, velocities, and currents of the dynamixel servos every 5 steps in the loop.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dynamixel_client.py\":598-603",
+ "content": " if step % 5 == 0:\n print('[{}] Frequency: {:.2f} Hz'.format(\n step, 1.0 / (time.time() - read_start)))\n print('> Pos: {}'.format(pos_now.tolist()))\n print('> Vel: {}'.format(vel_now.tolist()))\n print('> Cur: {}'.format(cur_now.tolist()))"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/128bd002-3d65-4bb6-8d7b-be70ac6cd669.json b/docs/doc/128bd002-3d65-4bb6-8d7b-be70ac6cd669.json
new file mode 100644
index 00000000..adb9ba6f
--- /dev/null
+++ b/docs/doc/128bd002-3d65-4bb6-8d7b-be70ac6cd669.json
@@ -0,0 +1,115 @@
+{
+ "summary": "The code trains RL models, preprocesses data, and experiments with hyperparameters. It creates a Conda environment, trains multi-task camera views for mobile chair tasks, caches features, evaluates VINN model, and uses separate dataset directories and checkpoints.",
+ "details": [
+ {
+ "comment": "This code activates a conda environment, sets up some environment variables, and then runs Python scripts with different parameters for model training and experimentation. It seems to be related to reinforcement learning tasks using the MUJOCO library. The code executes multiple experiments with varying hyperparameters to train and evaluate models on different datasets or tasks.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":1-27",
+ "content": "conda activate mimic\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\npython3 imitate_episodes.py \\\n--task_name sim_transfer_cube_human \\\n--ckpt_dir /scr/tonyzhao/train_logs/vq_test \\\n--policy_class ACT --kl_weight 10 --chunk_size 100 \\\n--hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \\\n--num_epochs 10000 --lr 1e-5 --seed 0 --vq\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name all \\\n--ckpt_dir /scr/tonyzhao/train_logs/pretrain_all \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 24 --dim_feedforward 3200 --num_epochs 5000 --lr 1e-4 --seed 0\n#### NOTE to reproduce this experiment, uncomment the sim data filtering in utils.py\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name all \\\n--ckpt_dir /scr/tonyzhao/train_logs/pretrain_all \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 24 --dim_feedforward 3200 --lr 1e-4 --seed 0 \\"
+ },
+ {
+ "comment": "This code generates mirrored data for a simulation task, creates two dataset directories (one with 100 episodes and the other with 50), visualizes original and artificially mirrored data from the first episode in the dataset. The user then activates a conda environment, changes to the directory containing the code, and runs Python scripts to accomplish these tasks.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":28-42",
+ "content": "--num_steps 1000000 --eval_every 10000000000 --validate_every 2000 --save_every 5000\n# generate mirrored data\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\npython3 record_sim_episodes.py --task_name sim_transfer_cube_scripted_mirror --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --num_episodes 50\npython3 postprocess_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --num_episodes 50\n# the sim_transfer_cube_scripted_mirror will have 100 episodes\n# I then copy the whole dir to sim_transfer_cube_scripted then removed all mirrored episodes\n# this gives sim_transfer_cube_scripted_mirror (100 episodes) and sim_transfer_cube_scripted (50 episodes)\n# visualize the original data\npython3 visualize_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --episode_idx 0\n# visualize the artificially mirrored data\npython3 visualize_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --episode_idx 0 --ismirror"
+ },
+ {
+ "comment": "The code sanity checks the mirrored and original data by replaying the actions in their respective environments, then launches experiments on both datasets using the ACT policy with specified parameters.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":44-68",
+ "content": "# sanity check\n# replay the mirrored data action in the original env\npython3 replay_episodes.py --dataset_path /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror/mirror_episode_0.hdf5\n# replay the original data action in the original env\npython3 replay_episodes.py --dataset_path /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror/episode_0.hdf5\n# launch experiment on original data\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 12 --dim_feedforward 3200 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000 --no_encoder\n# launch experiment on all data\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted_mirror \\"
+ },
+ {
+ "comment": "The code is running a Python script named \"imitate_episodes.py\" from the act-plus-plus repository, training a policy for imitation learning using different configurations. It switches between two policies (ACT and Diffusion) with varying hyperparameters, such as chunk size, batch size, and number of steps. The code also specifies the task name, checkpoint directory, and activates a specific conda environment before running the script on different GPUs.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":69-96",
+ "content": "--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_mirror \\\n--policy_class ACT --kl_weight 10 --chunk_size 50 \\\n--hidden_dim 512 --batch_size 12 --dim_feedforward 3200 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000 --no_encoder\n####### DIFFUSION POLICY\n- first install https://github.com/ARISE-Initiative/robomimic/tree/r2d2 (note the r2d2 branch)\n- on top of it pip install the current repo requirements\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\"
+ },
+ {
+ "comment": "The code snippet is used to train and evaluate a policy for a task named \"sim_transfer_cube_scripted\" using different configurations. It activates a specific conda environment, sets the MUJOCO_GL environment variable, changes directory to the project's root, and executes the imitate_episodes.py script multiple times with varying parameters such as CUDA device, learning rate, chunk size, and checkpoint directories. The code seems to be part of a larger training process involving different diffusion steps, potentially for model performance optimization or comparison.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":97-124",
+ "content": "--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_1 \\\n--policy_class Diffusion --chunk_size 16 \\\n--batch_size 32 --lr 1e-5 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000\n# above are all 100 train diffusion steps, 1e-5\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_2_50step_1e-4 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000\n# Dec 10\n######################## more diffusion ########################\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_3_chunk64 \\\n--policy_class Diffusion --chunk_size 64 \\"
+ },
+ {
+ "comment": "This code activates a conda environment, sets MUJOCO_GL to egl, changes directory to act-plus-plus, and runs three different python scripts with varying hyperparameters for training and evaluation on the \"sim_transfer_cube_scripted\" task. The policy class is set to Diffusion and chunk size is 32. Each script has different checkpoint directories, numbers of steps, and evaluation frequencies.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":125-151",
+ "content": "--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 200000 --eval_every 4000 --validate_every 4000 --save_every 4000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_4_regressionTest \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_5_noEMA \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus"
+ },
+ {
+ "comment": "This code is training and evaluating a diffusion-based policy model for two different tasks: \"sim_transfer_cube_scripted\" and \"aloha_mobile_wipe_wine\". It specifies the necessary command line arguments such as task name, checkpoint directory, policy class, chunk size, batch size, learning rate, seed, number of steps, evaluation frequency, validation frequency, and save frequency. The code also sets the CUDA device, environment variables, and activates a conda environment before running the training and evaluation scripts.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":152-172",
+ "content": "CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_6_noEMA_seed1 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 1 \\\n--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000\n###### Diffusion Real ######\n## deploy\npython3 imitate_episodes.py --task_name aloha_mobile_wipe_wine --ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/wipe_wine_diffusion_augmentation_seed0/ --policy_class Diffusion --chunk_size 32 --batch_size 32 --lr 1e-4 --seed 0 --num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000 --eval\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000"
+ },
+ {
+ "comment": "This code is activating the mobile conda environment, setting MUJOCO_GL to egl, and running Python scripts in the act-plus-plus directory. It trains a model (Diffusion policy) for two different tasks: \"aloha\\_mobile\\_wipe\\_wine\\_cotrain\" and \"aloha\\_mobile\\_wipe\\_wine\". The first task is trained again with augmentations, while the second task is trained with augmentations. The code is running on CUDA device 0 and 1, saving models every 5000 steps, evaluating every 100,000 steps, and validating every 5,000 steps for a total of 1,000,000 steps.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":174-200",
+ "content": "## Cotrain\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# train no cotrain again with augmentations\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_augmentation_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n## Cotrain with augmentations\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\"
+ },
+ {
+ "comment": "The code is executing two different training jobs for a robotics task called 'aloha_mobile_wipe_wine'. It first trains the model with chunk size 32 and cotrain, then with chunk size 64 without cotrain. It also validates and saves models every 5000 steps. The code requires specific environment activation and environmental variable settings.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":201-226",
+ "content": "--task_name aloha_mobile_wipe_wine_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_seed0 \\\n--policy_class Diffusion --chunk_size 32 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# try chunk size 64, no cotrain\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_augmentation_chunk64_seed0 \\\n--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# chunk 64 with cotrain\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_seed0 \\"
+ },
+ {
+ "comment": "This code activates the conda environment, sets environment variables, and runs a Python script to train a diffusion policy model with chunk size 64 for a task named \"aloha\\_mobile\\_wipe\\_wine\\_2\\_cotrain\". It saves checkpoints every 5000 steps. The first command trains the model with learning rate 1e-4, while the second one trains it with learning rate 3e-4.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":227-255",
+ "content": "--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# chunk 64 with cotrain + EMA\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_2_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_ema_seed0 \\\n--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 1e-4 --seed 0 \\\n--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n# chunk 64 with cotrain + EMA + 3e-4\nconda activate mobile\nexport MUJOCO_GL=egl\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \\\n--task_name aloha_mobile_wipe_wine_2_cotrain \\\n--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_ema_3e-4_seed0 \\\n--policy_class Diffusion --chunk_size 64 \\\n--batch_size 32 --lr 3e-4 --seed 0 \\"
+ },
+ {
+ "comment": "This code activates a conda environment, changes directory, sets CUDA_VISIBLE_DEVICES, and runs the train.py script for different camera names with the same seed in a loop, then it switches to another conda environment and runs a vinn_cache_feature.py script using the saved checkpoint path.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":256-277",
+ "content": "--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000\n######################## VINN ########################\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name top --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=0 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name left_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=sim_transfer_cube_scripted\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt"
+ },
+ {
+ "comment": "This code is running a series of commands to train and evaluate the VINN model on the sim_transfer_cube_scripted task. It first selects the dataset, loads the pre-trained model, evaluates it, and then tests backward compatibility with two different camera names ('top' and 'left_wrist'). The environment is activated, specific CUDA devices are set, and the training process is executed for both cameras using the byol_pytorch package.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":279-306",
+ "content": "TASK_NAME=sim_transfer_cube_scripted\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test \\\n--task_name $TASK_NAME \n## TODO\nmake sure env is consistent\ntune a bit more\n######################## VINN Real ########################\n### test backward compatibility\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name top --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name left_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA"
+ },
+ {
+ "comment": "Training a BYOL model for the sim_transfer_cube_scripted task, evaluating the trained model using vinn_eval.py, and utilizing the vinn_select_k.py to choose K best features from the dataset.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":306-330",
+ "content": "_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=sim_transfer_cube_scripted\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt\nTASK_NAME=sim_transfer_cube_scripted\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test \\\n--task_name $TASK_NAME \n### new data loader passed backward compatibility\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_high --seed 0"
+ },
+ {
+ "comment": "This code snippet executes Python training scripts using CUDA for various tasks and cameras. It activates a specific conda environment, changes the directory to the relevant project folder, and trains models with different configurations (single-camera or co-trained) on tasks such as aloha_mobile_wipe_wine and aloha_mobile_wash_pan.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":331-345",
+ "content": "#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_left_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_right_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_high --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_high --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_left_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_right_wrist --seed 0\n#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_high --seed 0"
+ },
+ {
+ "comment": "This code snippet is running Python scripts using the CUDA_VISIBLE_DEVICES environment variable to control which GPU(s) are used. The commands are training different models for various tasks such as aloha_mobile_wash_pan_cotrain, aloha_mobile_elevator_truncated, etc., using different camera names and seeds. Some models are trained on the cam_left_wrist, cam_right_wrist, or cam_high cameras. The code is activated using Conda environments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":346-361",
+ "content": "#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_right_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_high --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_right_wrist --seed 0\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_right_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_high --seed 0"
+ },
+ {
+ "comment": "The code is running two different Python scripts in a conda environment, training models on specific tasks (aloha_mobile_elevator_truncated_cotrain and aloha_mobile_wipe_wine_cotrain), using CUDA device 1. It then uses these trained models to cache features for the corresponding datasets and sets the CUDA visible devices, changing directories between actions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":362-386",
+ "content": "CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_left_wrist --seed 0\nCUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_right_wrist --seed 0\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine_cotrain\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1"
+ },
+ {
+ "comment": "The code activates a conda environment called \"mobile\", sets the CUDA_VISIBLE_DEVICES environment variable to 1, and runs the vinn_cache_feature.py script for multiple tasks using different checkpoint paths and dataset directories.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":387-408",
+ "content": "cd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wash_pan\nDATA_NAME=aloha_mobile_wash_pan\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wash_pan_cotrain\nDATA_NAME=aloha_mobile_wash_pan\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_elevator_truncated\nDATA_NAME=aloha_mobile_elevator_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}"
+ },
+ {
+ "comment": "This code activates a specific conda environment, sets the visible CUDA devices, changes directories, and runs multiple training scripts for different camera views in a mobile chair task. It then activates another environment, changes directories again, and runs a feature caching script on trained models for the chair and elevator tasks.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":410-432",
+ "content": "conda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_elevator_truncated_cotrain\nDATA_NAME=aloha_mobile_elevator_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\n# push chair task\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=0 \ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\npython3 train.py --task aloha_mobile_chair_truncated --cam_name cam_high --seed 0\npython3 train.py --task aloha_mobile_chair_truncated --cam_name cam_left_wrist --seed 0\npython3 train.py --task aloha_mobile_chair_truncated --cam_name cam_right_wrist --seed 0\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_chair_truncated\nDATA_NAME=aloha_mobile_chair_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\"
+ },
+ {
+ "comment": "This code snippet trains a BYOL model on the aloha_mobile_chair_truncated_cotrain task, then uses vinn_cache_feature.py to cache features for wipe wine dataset. It activates a conda environment, sets CUDA_VISIBLE_DEVICES, changes directories, and runs Python training scripts with specific parameters.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":433-458",
+ "content": "--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=1\ncd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning\npython3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_high --seed 0\npython3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_left_wrist --seed 0\npython3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_right_wrist --seed 0\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_chair_truncated_cotrain\nDATA_NAME=aloha_mobile_chair_truncated\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\n# cache feature again for wipe wine\nconda activate mobile\nexport CUDA_VISIBLE_DEVICES=0\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_c"
+ },
+ {
+ "comment": "This code is running a series of commands to train and evaluate a vision-in-nervous-system (VINN) model. The model is being trained on different datasets for various tasks such as chair recognition, mobile wipe, and wine classification. The commands use Python scripts with specific paths and arguments to perform these tasks.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":458-480",
+ "content": "ache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\ncd /home/tonyzhao/Research/act-plus-plus\nTASK_NAME=aloha_mobile_wipe_wine_cotrain\nDATA_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}\n# run on real robot\nTASK_NAME=aloha_mobile_chair_truncated\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME} \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0 \\\n--task_name $TASK_NAME "
+ },
+ {
+ "comment": "The code runs two separate python scripts for evaluating and training models on different datasets. The first set of commands trains a model using the VINN approach and BYOL implementation, while the second set of commands evaluates and caches features on a real robot. Both processes involve multiple dataset directories and checkpoint paths to train/evaluate/cache feature sets.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":484-513",
+ "content": "TASK_NAME=aloha_mobile_chair_truncated\npython3 vinn_select_k.py \\\n--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME} \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0\npython3 vinn_eval.py \\\n--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \\\n--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0 \\\n--task_name $TASK_NAME \n# eval on real robot\nconda activate aloha\ncd /home/mobile-aloha/interbotix_ws/src/act\nTASK_NAME=aloha_mobile_wipe_wine\npython3 vinn_cache_feature.py --ckpt_path /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/byol-${TASK_NAME}-DUMMY-seed-0.pt\nTASK_NAME=aloha_mobile_wipe_wine\npython3 vinn_select_k.py \\\n--dataset_dir /home/mobile-aloha/data/${TASK_NAME} \\\n--ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/VINN-eval-seed-0-test \\\nTASK_NAME=aloha_mobile_wipe_wine\npython3 vinn_eval.py \\\n--dataset_dir /home/mobile-aloha/data/${TASK_NAME} \\"
+ },
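+ {
+ "comment": "For reference, VINN-style evaluation retrieves actions by comparing the current BYOL feature against the cached dataset features and averaging the actions of the nearest neighbors. The snippet below is a minimal illustrative sketch of that retrieval step with made-up shapes and function names; it is not the API of vinn_eval.py.",
+ "content": "import numpy as np\n\ndef knn_action(query_feat, cached_feats, cached_actions, k=10, temperature=1.0):\n    # distances from the query feature to every cached feature\n    dists = np.linalg.norm(cached_feats - query_feat, axis=1)\n    idx = np.argsort(dists)[:k]\n    # softmax weights over negative distances: closer neighbors contribute more\n    w = np.exp(-dists[idx] / temperature)\n    w = w / w.sum()\n    # weighted average of the neighbors' recorded actions\n    return (w[:, None] * cached_actions[idx]).sum(axis=0)\n\n# toy usage with random cached features (N x D) and actions (N x action_dim)\nfeats = np.random.randn(100, 512)\nactions = np.random.randn(100, 14)\nprint(knn_action(np.random.randn(512), feats, actions, k=5).shape)  # (14,)"
+ },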
+ {
+ "comment": "This code activates a conda environment, changes to the project directory, sets the CUDA device, and runs a Python script for training an actuator network. The task name is provided as a variable, but the chunk size and some additional features are noted for future improvement.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/commands.txt\":514-526",
+ "content": "--model_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/byol-${TASK_NAME}-DUMMY-seed-0.pt \\\n--ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/VINN-eval-seed-0-test \\\n--task_name $TASK_NAME \n---------------------------------------------------------------------------------------\nNOTE: chunk size cannot be any number, try before launching\nTODO: Add history, EMA at test time\nconda activate mobile\ncd /home/tonyzhao/Research/act-plus-plus\nCUDA_VISIBLE_DEVICES=1 python3 train_actuator_network.py"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/2997b934-83e4-46a4-b750-e794d914f340.json b/docs/doc/2997b934-83e4-46a4-b750-e794d914f340.json
new file mode 100644
index 00000000..d53d0548
--- /dev/null
+++ b/docs/doc/2997b934-83e4-46a4-b750-e794d914f340.json
@@ -0,0 +1,55 @@
+{
+ "summary": "The code introduces a `BasePolicy` class for robotic arm policy, incorporating trajectory generation, updating poses and gripper commands, and executing pre-generated trajectories. It initializes an environment and runs two episodes of actions using PickAndTransferPolicy to test cube transfer simulation scripts.",
+ "details": [
+ {
+ "comment": "The code defines a `BasePolicy` class for a robotic arm policy with methods to generate and interpolate trajectories. It imports necessary libraries, handles injecting noise, and includes utility functions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":0-32",
+ "content": "import numpy as np\nimport matplotlib.pyplot as plt\nfrom pyquaternion import Quaternion\nfrom constants import SIM_TASK_CONFIGS\nfrom ee_sim_env import make_ee_sim_env\nimport IPython\ne = IPython.embed\nclass BasePolicy:\n def __init__(self, inject_noise=False):\n self.inject_noise = inject_noise\n self.step_count = 0\n self.left_trajectory = None\n self.right_trajectory = None\n def generate_trajectory(self, ts_first):\n raise NotImplementedError\n @staticmethod\n def interpolate(curr_waypoint, next_waypoint, t):\n t_frac = (t - curr_waypoint[\"t\"]) / (next_waypoint[\"t\"] - curr_waypoint[\"t\"])\n curr_xyz = curr_waypoint['xyz']\n curr_quat = curr_waypoint['quat']\n curr_grip = curr_waypoint['gripper']\n next_xyz = next_waypoint['xyz']\n next_quat = next_waypoint['quat']\n next_grip = next_waypoint['gripper']\n xyz = curr_xyz + (next_xyz - curr_xyz) * t_frac\n quat = curr_quat + (next_quat - curr_quat) * t_frac\n gripper = curr_grip + (next_grip - curr_grip) * t_frac"
+ },
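+ {
+ "comment": "The interpolate helper blends position, quaternion elements, and gripper command element-wise according to how far the current timestep lies between two waypoints (a straight lerp rather than a slerp). Below is a small self-contained usage sketch with made-up waypoint values.",
+ "content": "import numpy as np\n\ndef interpolate(curr_waypoint, next_waypoint, t):\n    # fraction of the way from curr_waypoint to next_waypoint at timestep t\n    t_frac = (t - curr_waypoint['t']) / (next_waypoint['t'] - curr_waypoint['t'])\n    xyz = curr_waypoint['xyz'] + (next_waypoint['xyz'] - curr_waypoint['xyz']) * t_frac\n    quat = curr_waypoint['quat'] + (next_waypoint['quat'] - curr_waypoint['quat']) * t_frac\n    gripper = curr_waypoint['gripper'] + (next_waypoint['gripper'] - curr_waypoint['gripper']) * t_frac\n    return xyz, quat, gripper\n\nwp0 = {'t': 0, 'xyz': np.zeros(3), 'quat': np.array([1., 0., 0., 0.]), 'gripper': 0.0}\nwp1 = {'t': 100, 'xyz': np.array([0., 0.5, 0.25]), 'quat': np.array([1., 0., 0., 0.]), 'gripper': 1.0}\nprint(interpolate(wp0, wp1, 25))  # 25 percent of the way: xyz ~ [0, 0.125, 0.0625], gripper 0.25"
+ },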
+ {
+ "comment": "This code is responsible for executing a pre-generated trajectory by interpolating between waypoints, obtaining the current pose and gripper command for both left and right sides. It also allows injecting noise if enabled. The function is called at each timestep to update the pose and gripper commands.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":33-55",
+ "content": " return xyz, quat, gripper\n def __call__(self, ts):\n # generate trajectory at first timestep, then open-loop execution\n if self.step_count == 0:\n self.generate_trajectory(ts)\n # obtain left and right waypoints\n if self.left_trajectory[0]['t'] == self.step_count:\n self.curr_left_waypoint = self.left_trajectory.pop(0)\n next_left_waypoint = self.left_trajectory[0]\n if self.right_trajectory[0]['t'] == self.step_count:\n self.curr_right_waypoint = self.right_trajectory.pop(0)\n next_right_waypoint = self.right_trajectory[0]\n # interpolate between waypoints to obtain current pose and gripper command\n left_xyz, left_quat, left_gripper = self.interpolate(self.curr_left_waypoint, next_left_waypoint, self.step_count)\n right_xyz, right_quat, right_gripper = self.interpolate(self.curr_right_waypoint, next_right_waypoint, self.step_count)\n # Inject noise\n if self.inject_noise:\n scale = 0.01"
+ },
+ {
+ "comment": "The code snippet is part of a PickAndTransferPolicy class. It generates a trajectory for picking up an object and transferring it from one robot arm to another. The code adds random uniform noise to the action coordinates, concatenates the actions with quaternions and gripper states, increments the step count, and returns the combined action for both arms. The method also initializes variables based on the first time step observation, including the initial mocap poses of both robot arms and box information (XYZ and quaternion).",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":56-80",
+ "content": " left_xyz = left_xyz + np.random.uniform(-scale, scale, left_xyz.shape)\n right_xyz = right_xyz + np.random.uniform(-scale, scale, right_xyz.shape)\n action_left = np.concatenate([left_xyz, left_quat, [left_gripper]])\n action_right = np.concatenate([right_xyz, right_quat, [right_gripper]])\n self.step_count += 1\n return np.concatenate([action_left, action_right])\nclass PickAndTransferPolicy(BasePolicy):\n def generate_trajectory(self, ts_first):\n init_mocap_pose_right = ts_first.observation['mocap_pose_right']\n init_mocap_pose_left = ts_first.observation['mocap_pose_left']\n box_info = np.array(ts_first.observation['env_state'])\n box_xyz = box_info[:3]\n box_quat = box_info[3:]\n # print(f\"Generate trajectory for {box_xyz=}\")\n gripper_pick_quat = Quaternion(init_mocap_pose_right[3:])\n gripper_pick_quat = gripper_pick_quat * Quaternion(axis=[0.0, 1.0, 0.0], degrees=-60)\n meet_left_quat = Quaternion(axis=[1.0, 0.0, 0.0], degrees=90)"
+ },
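+ {
+ "comment": "The grasp orientation above is obtained by composing the initial mocap quaternion with a rotation about the y axis using pyquaternion. A short standalone sketch of that composition, starting from an identity orientation for illustration:",
+ "content": "import numpy as np\nfrom pyquaternion import Quaternion\n\n# start from an identity orientation (w, x, y, z) and tilt the gripper about the y axis\ninit_quat = Quaternion(np.array([1.0, 0.0, 0.0, 0.0]))\ngripper_pick_quat = init_quat * Quaternion(axis=[0.0, 1.0, 0.0], degrees=-60)\nprint(gripper_pick_quat.elements)  # 4-vector (w, x, y, z) used in the waypoint dicts"
+ },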
+ {
+ "comment": "Code defines trajectory for left and right robot arms. Left arm starts by sleeping, then approaches and moves to meet position, closes gripper, moves left, and stays at final position. Right arm also sleeps, follows similar steps as left arm. All movements are time-based.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":82-94",
+ "content": " meet_xyz = np.array([0, 0.5, 0.25])\n self.left_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_left[:3], \"quat\": init_mocap_pose_left[3:], \"gripper\": 0}, # sleep\n {\"t\": 100, \"xyz\": meet_xyz + np.array([-0.1, 0, -0.02]), \"quat\": meet_left_quat.elements, \"gripper\": 1}, # approach meet position\n {\"t\": 260, \"xyz\": meet_xyz + np.array([0.02, 0, -0.02]), \"quat\": meet_left_quat.elements, \"gripper\": 1}, # move to meet position\n {\"t\": 310, \"xyz\": meet_xyz + np.array([0.02, 0, -0.02]), \"quat\": meet_left_quat.elements, \"gripper\": 0}, # close gripper\n {\"t\": 360, \"xyz\": meet_xyz + np.array([-0.1, 0, -0.02]), \"quat\": np.array([1, 0, 0, 0]), \"gripper\": 0}, # move left\n {\"t\": 400, \"xyz\": meet_xyz + np.array([-0.1, 0, -0.02]), \"quat\": np.array([1, 0, 0, 0]), \"gripper\": 0}, # stay\n ]\n self.right_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_right[:3], \"quat\": init_mocap_pose_right[3:], \"gripper\": 0}, # sleep"
+ },
+ {
+ "comment": "This code represents a sequence of actions for a robot gripper. It begins by approaching and gripping the cube, then moving downwards, closing the gripper at a certain position, moving to a meet position, opening the gripper, and finally moving right and staying in that position. The actions are time-based with specific positions and gripper states.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":95-102",
+ "content": " {\"t\": 90, \"xyz\": box_xyz + np.array([0, 0, 0.08]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # approach the cube\n {\"t\": 130, \"xyz\": box_xyz + np.array([0, 0, -0.015]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # go down\n {\"t\": 170, \"xyz\": box_xyz + np.array([0, 0, -0.015]), \"quat\": gripper_pick_quat.elements, \"gripper\": 0}, # close gripper\n {\"t\": 200, \"xyz\": meet_xyz + np.array([0.05, 0, 0]), \"quat\": gripper_pick_quat.elements, \"gripper\": 0}, # approach meet position\n {\"t\": 220, \"xyz\": meet_xyz, \"quat\": gripper_pick_quat.elements, \"gripper\": 0}, # move to meet position\n {\"t\": 310, \"xyz\": meet_xyz, \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # open gripper\n {\"t\": 360, \"xyz\": meet_xyz + np.array([0.1, 0, 0]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # move to right\n {\"t\": 400, \"xyz\": meet_xyz + np.array([0.1, 0, 0]), \"quat\": gripper_pick_quat.elements, \"gripper\": 1}, # stay"
+ },
+ {
+ "comment": "This code initializes variables for the InsertionPolicy class's generate_trajectory method. It extracts information from the observation and calculates gripper quaternions for both hands, defining their starting positions and orientation. The meet_xyz variable represents a specific target position, while lift_right is an arbitrary value. The left_trajectory list is initialized with the first point as the initial mocap pose of the left hand in sleep mode.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":103-130",
+ "content": " ]\nclass InsertionPolicy(BasePolicy):\n def generate_trajectory(self, ts_first):\n init_mocap_pose_right = ts_first.observation['mocap_pose_right']\n init_mocap_pose_left = ts_first.observation['mocap_pose_left']\n peg_info = np.array(ts_first.observation['env_state'])[:7]\n peg_xyz = peg_info[:3]\n peg_quat = peg_info[3:]\n socket_info = np.array(ts_first.observation['env_state'])[7:]\n socket_xyz = socket_info[:3]\n socket_quat = socket_info[3:]\n gripper_pick_quat_right = Quaternion(init_mocap_pose_right[3:])\n gripper_pick_quat_right = gripper_pick_quat_right * Quaternion(axis=[0.0, 1.0, 0.0], degrees=-60)\n gripper_pick_quat_left = Quaternion(init_mocap_pose_right[3:])\n gripper_pick_quat_left = gripper_pick_quat_left * Quaternion(axis=[0.0, 1.0, 0.0], degrees=60)\n meet_xyz = np.array([0, 0.5, 0.15])\n lift_right = 0.00715\n self.left_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_left[:3], \"quat\": init_mocap_pose_left[3:], \"gripper\": 0}, # sleep"
+ },
+ {
+ "comment": "This code defines a list of trajectory points for left and right arms, specifying their xyz coordinates, orientation quaternion, and gripper state at each time step. It follows a sequence of actions such as approaching the cube, going down, closing the gripper, and reaching insertion positions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":131-141",
+ "content": " {\"t\": 120, \"xyz\": socket_xyz + np.array([0, 0, 0.08]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 1}, # approach the cube\n {\"t\": 170, \"xyz\": socket_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 1}, # go down\n {\"t\": 220, \"xyz\": socket_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 0}, # close gripper\n {\"t\": 285, \"xyz\": meet_xyz + np.array([-0.1, 0, 0]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 0}, # approach meet position\n {\"t\": 340, \"xyz\": meet_xyz + np.array([-0.05, 0, 0]), \"quat\": gripper_pick_quat_left.elements,\"gripper\": 0}, # insertion\n {\"t\": 400, \"xyz\": meet_xyz + np.array([-0.05, 0, 0]), \"quat\": gripper_pick_quat_left.elements, \"gripper\": 0}, # insertion\n ]\n self.right_trajectory = [\n {\"t\": 0, \"xyz\": init_mocap_pose_right[:3], \"quat\": init_mocap_pose_right[3:], \"gripper\": 0}, # sleep\n {\"t\": 12"
+ },
+ {
+ "comment": "This code defines a policy for picking up and transferring an object, with specific timings and positions. The policy is applied within the `test_policy` function, which also sets up the environment and allows for onscreen rendering and noise injection.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":141-157",
+ "content": "0, \"xyz\": peg_xyz + np.array([0, 0, 0.08]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 1}, # approach the cube\n {\"t\": 170, \"xyz\": peg_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 1}, # go down\n {\"t\": 220, \"xyz\": peg_xyz + np.array([0, 0, -0.03]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # close gripper\n {\"t\": 285, \"xyz\": meet_xyz + np.array([0.1, 0, lift_right]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # approach meet position\n {\"t\": 340, \"xyz\": meet_xyz + np.array([0.05, 0, lift_right]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # insertion\n {\"t\": 400, \"xyz\": meet_xyz + np.array([0.05, 0, lift_right]), \"quat\": gripper_pick_quat_right.elements, \"gripper\": 0}, # insertion\n ]\ndef test_policy(task_name):\n # example rolling out pick_and_transfer policy\n onscreen_render = True\n inject_noise = False\n # setup the environment\n episode_len = SIM_TASK_CONFIGS[task_name]['episode_len']"
+ },
+ {
+ "comment": "The code initializes an environment (env) depending on the task_name, and then executes two episodes of actions. For each episode, it resets the environment, performs actions based on a PickAndTransferPolicy, and updates the state. If onscreen_render is True, it renders the state using matplotlib. It calculates the episode return and prints whether the episode was successful or not based on the return value. The code is called as a main function.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":158-190",
+ "content": " if 'sim_transfer_cube' in task_name:\n env = make_ee_sim_env('sim_transfer_cube')\n elif 'sim_insertion' in task_name:\n env = make_ee_sim_env('sim_insertion')\n else:\n raise NotImplementedError\n for episode_idx in range(2):\n ts = env.reset()\n episode = [ts]\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images']['angle'])\n plt.ion()\n policy = PickAndTransferPolicy(inject_noise)\n for step in range(episode_len):\n action = policy(ts)\n ts = env.step(action)\n episode.append(ts)\n if onscreen_render:\n plt_img.set_data(ts.observation['images']['angle'])\n plt.pause(0.02)\n plt.close()\n episode_return = np.sum([ts.reward for ts in episode[1:]])\n if episode_return > 0:\n print(f\"{episode_idx=} Successful, {episode_return=}\")\n else:\n print(f\"{episode_idx=} Failed\")\nif __name__ == '__main__':"
+ },
+ {
+ "comment": "The code is calling a test_policy function with the task name \"sim_transfer_cube_scripted\". This suggests it's testing a simulation script for transferring a cube.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/scripted_policy.py\":191-192",
+ "content": " test_task_name = 'sim_transfer_cube_scripted'\n test_policy(test_task_name)"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/32aeb5cb-10b7-4c0f-bcf1-893eed15be1c.json b/docs/doc/32aeb5cb-10b7-4c0f-bcf1-893eed15be1c.json
new file mode 100644
index 00000000..77ec92fb
--- /dev/null
+++ b/docs/doc/32aeb5cb-10b7-4c0f-bcf1-893eed15be1c.json
@@ -0,0 +1,85 @@
+{
+ "summary": "The code trains a neural network, visualizes predictions, logs progress, handles exceptions, performs forward/backward passes, updates policy state, saves checkpoints, and sets up data loaders for validation. It plots and saves commanded, observed, and predicted angular speeds for an actuator network, initializes a transformer-based prediction network, calculates MSE loss, normalizes data, and trains the actuator network if necessary.",
+ "details": [
+ {
+ "comment": "This code is importing necessary libraries and defining parameters for training an actuator network. The actuator network takes in observed speed inputs and converts them into desired commanded speeds at test time. It will train the network using specified batch sizes, learning rate, weight decay, number of steps, and save checkpoints periodically.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":1-40",
+ "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom torch.utils.data import DataLoader\nimport os\nimport h5py\nimport math\nimport wandb\nimport pickle\nimport matplotlib.pyplot as plt\nfrom copy import deepcopy\nfrom tqdm import tqdm\nfrom utils import find_all_hdf5\nfrom imitate_episodes import repeater, compute_dict_mean\nimport IPython\ne = IPython.embed\ndef main():\n ### Idea\n # input : o o o o o o # observed speed \n # target: a a a a a a # commanded speed\n # at test time, input desired speed profile and convert that to command\n #########################################################\n history_len = 50\n future_len = 50\n prediction_len = 50\n batch_size_train = 16\n batch_size_val = 16\n lr = 1e-4\n weight_decay = 1e-4\n num_steps = 10000\n validate_every = 2000\n save_every = 2000\n expr_name = f'actuator_network_test_{history_len}_{future_len}_{prediction_len}'\n ckpt_dir = f'/scr/tonyzhao/train_logs/{expr_name}' if os.getlogin() == 'tonyzhao' else f'./ckpts/{expr_name}'"
+ },
+ {
+ "comment": "Code initializes variables, asserts conditions, initializes a wandb project, checks if a directory exists, finds HDF5 files in the dataset directory, calculates train and validation split, and prints information about the data source.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":41-60",
+ "content": " dataset_dir = '/scr/tonyzhao/compressed_datasets/aloha_mobile_fork/' if os.getlogin() == 'tonyzhao' else '/home/zfu/data/aloha_mobile_fork/'\n #########################################################\n assert(history_len + future_len >= prediction_len)\n assert(future_len % prediction_len == 0)\n wandb.init(project=\"mobile-aloha2\", reinit=True, entity=\"mobile-aloha2\", name=expr_name) # mode='disabled', \n if not os.path.isdir(ckpt_dir):\n os.makedirs(ckpt_dir)\n dataset_path_list = find_all_hdf5(dataset_dir, skip_mirrored_data=True)\n dataset_path_list = [n for n in dataset_path_list if 'replayed' in n]\n num_episodes = len(dataset_path_list)\n # obtain train test split\n train_ratio = 0.9\n shuffled_episode_ids = np.random.permutation(num_episodes)\n train_episode_ids = shuffled_episode_ids[:int(train_ratio * num_episodes)]\n val_episode_ids = shuffled_episode_ids[int(train_ratio * num_episodes):]\n print(f'\\n\\nData from: {dataset_dir}\\n- Train on {len(train_episode_ids)} episodes\\n- Test on {len(val_episode_ids)} episodes\\n\\n')"
+ },
+ {
+ "comment": "This code loads normalization stats for qpos and action, either from a file or by calling get_norm_stats function. It then calculates train and val episode lengths based on episode IDs. The code asserts that the all_episode_len is divisible by prediction_len. Next, it saves the dataset stats in a pickle file, constructs train and val datasets using EpisodicDataset class, and utilizes these datasets for further training.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":62-79",
+ "content": " # obtain normalization stats for qpos and action\n # if load_pretrain:\n # with open(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'dataset_stats.pkl'), 'rb') as f:\n # norm_stats = pickle.load(f)\n # print('Loaded pretrain dataset stats')\n norm_stats, all_episode_len = get_norm_stats(dataset_path_list)\n train_episode_len = [all_episode_len[i] for i in train_episode_ids]\n val_episode_len = [all_episode_len[i] for i in val_episode_ids]\n assert(all_episode_len[0] % prediction_len == 0)\n # save dataset stats\n stats_path = os.path.join(ckpt_dir, f'actuator_net_stats.pkl')\n with open(stats_path, 'wb') as f:\n pickle.dump(norm_stats, f)\n # construct dataset and dataloader\n train_dataset = EpisodicDataset(dataset_path_list, norm_stats, train_episode_ids, train_episode_len, history_len, future_len, prediction_len)\n val_dataset = EpisodicDataset(dataset_path_list, norm_stats, val_episode_ids, val_episode_len, history_len, future_len, prediction_len)"
+ },
+ {
+ "comment": "Creates data loaders for training and validation datasets. Initializes ActuatorNetwork model, optimizer, and prints the number of parameters. Sets initial minimum validation loss and best checkpoint information. Repeats training data loader for iterations. Validates model performance at specified intervals.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":80-101",
+ "content": " train_dataloader = DataLoader(train_dataset, batch_size=batch_size_train, shuffle=True, pin_memory=True, num_workers=1, prefetch_factor=1)\n val_dataloader = DataLoader(val_dataset, batch_size=batch_size_val, shuffle=True, pin_memory=True, num_workers=1, prefetch_factor=1)\n policy = ActuatorNetwork(prediction_len).cuda()\n optimizer = torch.optim.AdamW(policy.parameters(), lr=lr, weight_decay=weight_decay)\n n_parameters = sum(p.numel() for p in policy.parameters() if p.requires_grad)\n print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n min_val_loss = np.inf\n best_ckpt_info = None\n train_dataloader = repeater(train_dataloader)\n for step in tqdm(range(num_steps+1)):\n # validation\n if step % validate_every == 0:\n print('validating')\n with torch.inference_mode():\n policy.eval()\n validation_dicts = []\n for batch_idx, data in enumerate(val_dataloader):\n observed_speed, commanded_speed = data"
+ },
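+ {
+ "comment": "The repeater utility is imported from imitate_episodes and not shown in this file; the sketch below illustrates the kind of infinite dataloader cycling such a helper performs so that training can be step-based rather than epoch-based (an assumption about its behavior, not the exact implementation).",
+ "content": "def repeater(data_loader):\n    # yield batches forever by restarting the loader whenever it is exhausted\n    while True:\n        for batch in data_loader:\n            yield batch\n\n# usage: wrap the loader once, then pull batches with next() inside the step loop\n# train_dataloader = repeater(train_dataloader)\n# data = next(train_dataloader)"
+ },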
+ {
+ "comment": "This code measures the validation loss during training, keeps track of the best validation loss so far, logs the current validation summary to Wandb, and prints out a summary for the current epoch. It also visualizes predictions with a separate function.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":102-120",
+ "content": " out, forward_dict = policy(observed_speed.cuda(), commanded_speed.cuda())\n validation_dicts.append(forward_dict)\n validation_summary = compute_dict_mean(validation_dicts)\n epoch_val_loss = validation_summary['loss']\n if epoch_val_loss < min_val_loss:\n min_val_loss = epoch_val_loss\n best_ckpt_info = (step, min_val_loss, deepcopy(policy.state_dict()))\n for k in list(validation_summary.keys()):\n validation_summary[f'val_{k}'] = validation_summary.pop(k) \n wandb.log(validation_summary, step=step)\n print(f'Val loss: {epoch_val_loss:.5f}')\n summary_string = ''\n for k, v in validation_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n visualize_prediction(dataset_path_list, val_episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, 'val')"
+ },
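+ {
+ "comment": "compute_dict_mean is likewise imported from imitate_episodes; the sketch below shows the kind of key-wise averaging it performs over a list of loss dictionaries (an assumption about the helper, not its exact code).",
+ "content": "import torch\n\ndef compute_dict_mean(epoch_dicts):\n    # average each key across a list of dicts of scalar tensors\n    return {k: torch.stack([d[k] for d in epoch_dicts]).mean() for k in epoch_dicts[0]}\n\ndicts = [{'loss': torch.tensor(1.0)}, {'loss': torch.tensor(3.0)}]\nprint(compute_dict_mean(dicts)['loss'])  # tensor(2.)"
+ },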
+ {
+ "comment": "The code is training an actuator network policy using data from a dataloader. It performs forward and backward passes to calculate loss, updates the policy's state with an optimizer, logs progress to W&B, saves checkpoints at specified intervals, and overwrites the latest checkpoint with the final step of training.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":121-145",
+ "content": " visualize_prediction(dataset_path_list, train_episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, 'train')\n # training\n policy.train()\n optimizer.zero_grad()\n data = next(train_dataloader)\n observed_speed, commanded_speed = data\n out, forward_dict = policy(observed_speed.cuda(), commanded_speed.cuda())\n # backward\n loss = forward_dict['loss']\n loss.backward()\n optimizer.step()\n wandb.log(forward_dict, step=step) # not great, make training 1-2% slower\n if step % save_every == 0:\n ckpt_path = os.path.join(ckpt_dir, f'actuator_net_step_{step}.ckpt')\n torch.save(policy.state_dict(), ckpt_path)\n ckpt_path = os.path.join(ckpt_dir, f'actuator_net_last.ckpt')\n torch.save(policy.state_dict(), ckpt_path)\n best_step, min_val_loss, best_state_dict = best_ckpt_info\n ckpt_path = os.path.join(ckpt_dir, f'actuator_net_step_{best_step}.ckpt')\n torch.save(best_state_dict, ckpt_path)"
+ },
+ {
+ "comment": "This code segment is responsible for training a neural network and visualizing the predictions. It prints the minimum validation loss and the corresponding step number when training finishes. The visualize_prediction function reads data from a dataset path list, selects episodes for visualization, loads data from HDF5 files, normalizes observed speeds, and provides an unnormalized output function. It also handles potential exceptions during data loading.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":146-166",
+ "content": " print(f'Training finished:\\nval loss {min_val_loss:.6f} at step {best_step}')\ndef visualize_prediction(dataset_path_list, episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, name):\n num_vis = 2\n episode_ids = episode_ids[:num_vis]\n vis_path = [dataset_path_list[i] for i in episode_ids]\n for i, dataset_path in enumerate(vis_path):\n try:\n with h5py.File(dataset_path, 'r') as root:\n commanded_speed = root['/base_action'][()]\n observed_speed = root['/obs_tracer'][()]\n except Exception as ee:\n print(f'Error loading {dataset_path} in get_norm_stats')\n print(ee)\n quit()\n # commanded_speed = (commanded_speed - norm_stats[\"commanded_speed_mean\"]) / norm_stats[\"commanded_speed_std\"]\n norm_observed_speed = (observed_speed - norm_stats[\"observed_speed_mean\"]) / norm_stats[\"observed_speed_std\"]\n out_unnorm_fn = lambda x: (x * norm_stats[\"commanded_speed_std\"]) + norm_stats[\"commanded_speed_mean\"]"
+ },
+ {
+ "comment": "This code segment is preparing input data and feeding it to a neural network policy for prediction. The predicted commanded speed values are then plotted alongside the actual commanded and observed speeds in a plot.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":168-188",
+ "content": " history_pad = np.zeros((history_len, 2))\n future_pad = np.zeros((future_len, 2))\n norm_observed_speed = np.concatenate([history_pad, norm_observed_speed, future_pad], axis=0)\n episode_len = commanded_speed.shape[0]\n all_pred = []\n for t in range(0, episode_len, prediction_len):\n offset_start_ts = t + history_len\n policy_input = norm_observed_speed[offset_start_ts-history_len: offset_start_ts+future_len]\n policy_input = torch.from_numpy(policy_input).float().unsqueeze(dim=0).cuda()\n pred = policy(policy_input)\n pred = pred.detach().cpu().numpy()[0]\n all_pred += out_unnorm_fn(pred).tolist()\n all_pred = np.array(all_pred)\n plot_path = os.path.join(ckpt_dir, f'{name}{i}_step{step}_linear')\n plt.figure()\n plt.plot(commanded_speed[:, 0], label='commanded_speed_linear')\n plt.plot(observed_speed[:, 0], label='observed_speed_linear')\n plt.plot(all_pred[:, 0], label='pred_commanded_speed_linear')"
+ },
+ {
+ "comment": "The code plots the commanded, observed, and predicted angular speeds of an actuator network. It saves the resulting plot in a specified directory. The code also includes vertical dotted lines at regular intervals for visual reference. The ActuatorNetwork class initializes a transformer encoder with a specific number of layers and heads.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":189-216",
+ "content": " # plot vertical grey dotted lines every prediction_len\n for t in range(0, episode_len, prediction_len):\n plt.axvline(t, linestyle='--', color='grey')\n plt.legend()\n plt.savefig(plot_path)\n plt.close()\n plot_path = os.path.join(ckpt_dir, f'{name}{i}_step{step}_angular')\n plt.figure()\n plt.plot(commanded_speed[:, 1], label='commanded_speed_angular')\n plt.plot(observed_speed[:, 1], label='observed_speed_angular')\n plt.plot(all_pred[:, 1], label='pred_commanded_speed_angular')\n # plot vertical dotted lines every prediction_len\n for t in range(0, episode_len, prediction_len):\n plt.axvline(t, linestyle='--', color='grey')\n plt.legend()\n plt.savefig(plot_path)\n plt.close()\nclass ActuatorNetwork(nn.Module):\n def __init__(self, prediction_len):\n super().__init__()\n d_model = 256\n encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)\n self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=3)"
+ },
+ {
+ "comment": "This code initializes a network for transformer-based prediction. It includes a PositionalEncoding layer, input and output projection layers, and a prediction length parameter. During training time, it rearranges input data, applies positional encoding, passes through the transformer, and calculates an MSE loss between predicted and target outputs. It returns predicted outputs and loss dictionary.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":217-242",
+ "content": " self.pe = PositionalEncoding(d_model)\n self.in_proj = nn.Linear(2, d_model)\n self.out_proj = nn.Linear(d_model, 2)\n self.prediction_len = prediction_len\n def forward(self, src, tgt=None):\n if tgt is not None: # training time\n # (batch, seq, feature) -> (seq, batch, feature)\n src = self.in_proj(src)\n src = torch.einsum('b s d -> s b d', src)\n src = self.pe(src)\n out = self.transformer(src)\n tgt = torch.einsum('b s d -> s b d', tgt)\n assert(self.prediction_len == tgt.shape[0])\n out = out[0: self.prediction_len] # take first few tokens only for prediction\n out = self.out_proj(out)\n l2_loss = loss = F.mse_loss(out, tgt)\n loss_dict = {'loss': l2_loss}\n out = torch.einsum('s b d -> b s d', out)\n return out, loss_dict\n else:\n src = self.in_proj(src)\n src = torch.einsum('b s d -> s b d', src)\n src = self.pe(src)"
+ },
+ {
+ "comment": "train_actuator_network.py:243-271 - Applies transformer and positional encoding to source data, extracts the first few tokens for prediction, and then rearranges the output.\nPositionalEncoding - Generates positional encodings of a given size and applies them as an additional dimension in an embedding layer.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":243-271",
+ "content": " out = self.transformer(src)\n out = out[0: self.prediction_len] # take first few tokens only for prediction\n out = self.out_proj(out)\n out = torch.einsum('s b d -> b s d', out)\n return out\nclass PositionalEncoding(nn.Module):\n def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):\n super().__init__()\n self.dropout = nn.Dropout(p=dropout)\n position = torch.arange(max_len).unsqueeze(1)\n div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))\n pe = torch.zeros(max_len, 1, d_model)\n pe[:, 0, 0::2] = torch.sin(position * div_term)\n pe[:, 0, 1::2] = torch.cos(position * div_term)\n self.register_buffer('pe', pe)\n def forward(self, x):\n \"\"\"\n Arguments:\n x: Tensor, shape ``[seq_len, batch_size, embedding_dim]``\n \"\"\"\n x = x + self.pe[:x.size(0)]\n return self.dropout(x)\ndef get_norm_stats(dataset_path_list):\n all_commanded_speed = []"
+ },
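+ {
+ "comment": "For reference, the pe buffer built above is the standard sinusoidal positional encoding; the closed form it implements is:",
+ "content": "PE_{(pos,\\,2i)} = \\sin\\left(\\frac{pos}{10000^{2i/d_{model}}}\\right), \\qquad PE_{(pos,\\,2i+1)} = \\cos\\left(\\frac{pos}{10000^{2i/d_{model}}}\\right)"
+ },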
+ {
+ "comment": "This code loads and normalizes commanded and observed speed data from multiple datasets. It calculates the mean and standard deviation for both sets of data, clips any outliers in the standard deviation, and stores the normalized data for further analysis or training purposes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":272-294",
+ "content": " all_observed_speed = []\n all_episode_len = []\n for dataset_path in dataset_path_list:\n try:\n with h5py.File(dataset_path, 'r') as root:\n commanded_speed = root['/base_action'][()]\n observed_speed = root['/obs_tracer'][()]\n except Exception as e:\n print(f'Error loading {dataset_path} in get_norm_stats')\n print(e)\n quit()\n all_commanded_speed.append(torch.from_numpy(commanded_speed))\n all_observed_speed.append(torch.from_numpy(observed_speed))\n all_episode_len.append(len(commanded_speed))\n all_commanded_speed = torch.cat(all_commanded_speed, dim=0)\n all_observed_speed = torch.cat(all_observed_speed, dim=0)\n # normalize all_commanded_speed\n commanded_speed_mean = all_commanded_speed.mean(dim=[0]).float()\n commanded_speed_std = all_commanded_speed.std(dim=[0]).float()\n commanded_speed_std = torch.clip(commanded_speed_std, 1e-2, np.inf) # clipping\n # normalize all_observed_speed"
+ },
+ {
+ "comment": "This code calculates the mean and standard deviation of observed speeds, clips the standard deviation to prevent extreme values, and stores these statistics in a dictionary. The dictionary contains the means and standard deviations for both commanded and observed speeds. The code also defines an EpisodicDataset class that initializes with dataset paths, normalization stats, episode IDs, episode lengths, history length, future length, and prediction length.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":295-315",
+ "content": " observed_speed_mean = all_observed_speed.mean(dim=[0]).float()\n observed_speed_std = all_observed_speed.std(dim=[0]).float()\n observed_speed_std = torch.clip(observed_speed_std, 1e-2, np.inf) # clipping\n stats = {\"commanded_speed_mean\": commanded_speed_mean.numpy(), \"commanded_speed_std\": commanded_speed_std.numpy(),\n \"observed_speed_mean\": observed_speed_mean.numpy(), \"observed_speed_std\": observed_speed_std.numpy()}\n return stats, all_episode_len\nclass EpisodicDataset(torch.utils.data.Dataset):\n def __init__(self, dataset_path_list, norm_stats, episode_ids, episode_len, history_len, future_len, prediction_len):\n super(EpisodicDataset).__init__()\n self.episode_ids = episode_ids\n self.dataset_path_list = dataset_path_list\n self.norm_stats = norm_stats\n self.episode_len = episode_len\n self.cumulative_len = np.cumsum(self.episode_len)\n self.max_episode_len = max(episode_len)\n self.history_len = history_len\n self.future_len = future_len"
+ },
+ {
+ "comment": "Initializes attributes and checks if it is a simulation. Returns length based on episode lengths. Locates transition index, finds the dataset path, and reads commanded speed from the HDF5 file.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":316-339",
+ "content": " self.prediction_len = prediction_len\n self.is_sim = False\n self.history_pad = np.zeros((self.history_len, 2))\n self.future_pad = np.zeros((self.future_len, 2))\n self.prediction_pad = np.zeros((self.prediction_len, 2))\n self.__getitem__(0) # initialize self.is_sim\n def __len__(self):\n return sum(self.episode_len)\n def _locate_transition(self, index):\n assert index < self.cumulative_len[-1]\n episode_index = np.argmax(self.cumulative_len > index) # argmax returns first True index\n start_ts = index - (self.cumulative_len[episode_index] - self.episode_len[episode_index])\n episode_id = self.episode_ids[episode_index]\n return episode_id, start_ts\n def __getitem__(self, index):\n episode_id, start_ts = self._locate_transition(index)\n dataset_path = self.dataset_path_list[episode_id]\n try:\n # print(dataset_path)\n with h5py.File(dataset_path, 'r') as root:\n commanded_speed = root['/base_action'][()]"
+ },
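+ {
+ "comment": "_locate_transition converts a flat sample index into an episode index and a start timestep using the cumulative episode lengths (the real method additionally maps the episode index through episode_ids). A small worked example with made-up lengths:",
+ "content": "import numpy as np\n\nepisode_len = [400, 250, 300]            # made-up per-episode lengths\ncumulative_len = np.cumsum(episode_len)  # [400, 650, 950]\n\ndef locate_transition(index):\n    episode_index = np.argmax(cumulative_len > index)  # first episode whose cumulative length exceeds index\n    start_ts = index - (cumulative_len[episode_index] - episode_len[episode_index])\n    return int(episode_index), int(start_ts)\n\nprint(locate_transition(0))    # (0, 0)\nprint(locate_transition(399))  # (0, 399)  last sample of the first episode\nprint(locate_transition(400))  # (1, 0)    first sample of the second episode\nprint(locate_transition(700))  # (2, 50)"
+ },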
+ {
+ "comment": "This code is preparing input data for a machine learning model. It concatenates historical and future observations with commanded speeds, adjusts the timestamps, and normalizes the data to have zero mean and unit standard deviation. If there's an error loading the dataset, it prints an error message.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":340-356",
+ "content": " observed_speed = root['/obs_tracer'][()]\n observed_speed = np.concatenate([self.history_pad, observed_speed, self.future_pad], axis=0)\n commanded_speed = np.concatenate([commanded_speed, self.prediction_pad], axis=0)\n offset_start_ts = start_ts + self.history_len\n commanded_speed = commanded_speed[start_ts: start_ts+self.prediction_len]\n observed_speed = observed_speed[offset_start_ts-self.history_len: offset_start_ts+self.future_len]\n commanded_speed = torch.from_numpy(commanded_speed).float()\n observed_speed = torch.from_numpy(observed_speed).float()\n # normalize to mean 0 std 1\n commanded_speed = (commanded_speed - self.norm_stats[\"commanded_speed_mean\"]) / self.norm_stats[\"commanded_speed_std\"]\n observed_speed = (observed_speed - self.norm_stats[\"observed_speed_mean\"]) / self.norm_stats[\"observed_speed_std\"]\n except:\n print(f'Error loading {dataset_path} in __getitem__')"
+ },
+ {
+ "comment": "This code appears to be part of a program that trains an actuator network. It defines a function, possibly for training the actuator network, which may take in image data, joint position data, and other related data, calculates observed and commanded speeds, and returns these values. The code also includes a quit() command and some print statements for debugging purposes. Lastly, there is an if __name__ == '__main__': statement that suggests this code could be executed directly as a main program when the script is run.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_actuator_network.py\":357-366",
+ "content": " quit()\n # print(image_data.dtype, qpos_data.dtype, action_data.dtype, is_pad.dtype)\n return observed_speed, commanded_speed\nif __name__ == '__main__':\n main()"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/34ebf763-4413-4a8e-8d86-a9bdb8d93ace.json b/docs/doc/34ebf763-4413-4a8e-8d86-a9bdb8d93ace.json
new file mode 100644
index 00000000..91374214
--- /dev/null
+++ b/docs/doc/34ebf763-4413-4a8e-8d86-a9bdb8d93ace.json
@@ -0,0 +1,60 @@
+{
+ "summary": "The code creates a policy network for multi-camera image tasks, trains a noise residual prediction model, and includes an ACTPolicy class for reinforcement learning with normalization and loss calculation. It also defines a CNNMLP model for processing states, images, actions, with KL divergence, MSE loss, and training/inference modes.",
+ "details": [
+ {
+ "comment": "The code imports necessary libraries and classes, defines a class for the DiffusionPolicy model, and includes parameters such as camera names, observation horizon, action horizon, and prediction horizon. The function build_ACT_model_and_optimizer and build_CNNMLP_model_and_optimizer are used to create models and optimizers, while replace_bn_with_gn and ConditionalUnet1D functions are called. EMAModel and scheduling classes DDPMScheduler and DDIMScheduler are also imported for training and scheduling purposes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":0-27",
+ "content": "import torch.nn as nn\nfrom torch.nn import functional as F\nimport torchvision.transforms as transforms\nimport torch\nimport numpy as np\nfrom detr.main import build_ACT_model_and_optimizer, build_CNNMLP_model_and_optimizer\nimport IPython\ne = IPython.embed\nfrom collections import OrderedDict\nfrom robomimic.models.base_nets import ResNet18Conv, SpatialSoftmax\nfrom robomimic.algo.diffusion_policy import replace_bn_with_gn, ConditionalUnet1D\nfrom diffusers.schedulers.scheduling_ddpm import DDPMScheduler\nfrom diffusers.schedulers.scheduling_ddim import DDIMScheduler\nfrom diffusers.training_utils import EMAModel\nclass DiffusionPolicy(nn.Module):\n def __init__(self, args_override):\n super().__init__()\n self.camera_names = args_override['camera_names']\n self.observation_horizon = args_override['observation_horizon'] ### TODO TODO TODO DO THIS\n self.action_horizon = args_override['action_horizon'] # apply chunk size\n self.prediction_horizon = args_override['prediction_horizon'] # chunk size"
+ },
+ {
+ "comment": "Initializing the model's parameters with values from args_override dictionary. Creating lists of ResNet18Conv, SpatialSoftmax, and Linear layers for each camera name. Converting lists to nn.ModuleList to facilitate efficient computation during model execution.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":28-47",
+ "content": " self.num_inference_timesteps = args_override['num_inference_timesteps']\n self.ema_power = args_override['ema_power']\n self.lr = args_override['lr']\n self.weight_decay = 0\n self.num_kp = 32\n self.feature_dimension = 64\n self.ac_dim = args_override['action_dim'] # 14 + 2\n self.obs_dim = self.feature_dimension * len(self.camera_names) + 14 # camera features and proprio\n backbones = []\n pools = []\n linears = []\n for _ in self.camera_names:\n backbones.append(ResNet18Conv(**{'input_channel': 3, 'pretrained': False, 'input_coord_conv': False}))\n pools.append(SpatialSoftmax(**{'input_shape': [512, 15, 20], 'num_kp': self.num_kp, 'temperature': 1.0, 'learnable_temperature': False, 'noise_std': 0.0}))\n linears.append(torch.nn.Linear(int(np.prod([self.num_kp, 2])), self.feature_dimension))\n backbones = nn.ModuleList(backbones)\n pools = nn.ModuleList(pools)\n linears = nn.ModuleList(linears)"
+ },
+ {
+ "comment": "This code defines a policy network with backbones, pools, linears, and noise prediction. The model is created as a PyTorch module, converted to float type, and moved to the GPU for faster computation. Optionally, an exponential moving average (EMA) model is also created if ENABLE_EMA flag is set. A noise scheduler is setup to manage the noise during training.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":49-85",
+ "content": " backbones = replace_bn_with_gn(backbones) # TODO\n noise_pred_net = ConditionalUnet1D(\n input_dim=self.ac_dim,\n global_cond_dim=self.obs_dim*self.observation_horizon\n )\n nets = nn.ModuleDict({\n 'policy': nn.ModuleDict({\n 'backbones': backbones,\n 'pools': pools,\n 'linears': linears,\n 'noise_pred_net': noise_pred_net\n })\n })\n nets = nets.float().cuda()\n ENABLE_EMA = True\n if ENABLE_EMA:\n ema = EMAModel(model=nets, power=self.ema_power)\n else:\n ema = None\n self.nets = nets\n self.ema = ema\n # setup noise scheduler\n self.noise_scheduler = DDIMScheduler(\n num_train_timesteps=50,\n beta_schedule='squaredcos_cap_v2',\n clip_sample=True,\n set_alpha_to_one=True,\n steps_offset=0,\n prediction_type='epsilon'\n )\n n_parameters = sum(p.numel() for p in self.parameters())"
+ },
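+ {
+ "comment": "EMAModel from diffusers maintains an exponential moving average of the network weights (its power argument warms up the decay over training). The snippet below is a minimal sketch of the underlying update rule, not diffusers' actual implementation.",
+ "content": "import torch\n\n@torch.no_grad()\ndef ema_update(ema_params, model_params, decay=0.999):\n    # theta_ema <- decay * theta_ema + (1 - decay) * theta\n    for p_ema, p in zip(ema_params, model_params):\n        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)\n\nnet = torch.nn.Linear(4, 2)\nema_net = torch.nn.Linear(4, 2)\nema_net.load_state_dict(net.state_dict())\n# ... after each optimizer step on net ...\nema_update(list(ema_net.parameters()), list(net.parameters()))"
+ },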
+ {
+ "comment": "This code initializes an optimizer for the policy network in a multi-camera image task. It prints the number of parameters in the model and defines the __call__ method, which takes in input poses, images, actions (if training), and is_pad flags. During training, it extracts features from each camera's input, concatenates them with qpos, and adds noise to actions for better exploration.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":86-110",
+ "content": " print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n def configure_optimizers(self):\n optimizer = torch.optim.AdamW(self.nets.parameters(), lr=self.lr, weight_decay=self.weight_decay)\n return optimizer\n def __call__(self, qpos, image, actions=None, is_pad=None):\n B = qpos.shape[0]\n if actions is not None: # training time\n nets = self.nets\n all_features = []\n for cam_id in range(len(self.camera_names)):\n cam_image = image[:, cam_id]\n cam_features = nets['policy']['backbones'][cam_id](cam_image)\n pool_features = nets['policy']['pools'][cam_id](cam_features)\n pool_features = torch.flatten(pool_features, start_dim=1)\n out_features = nets['policy']['linears'][cam_id](pool_features)\n all_features.append(out_features)\n obs_cond = torch.cat(all_features + [qpos], dim=1)\n # sample noise to add to actions\n noise = torch.randn(actions.shape, device=obs_cond.device)"
+ },
+ {
+ "comment": "This code snippet samples diffusion iterations for each data point, adds noise to clean actions based on the noise magnitude at each iteration, predicts the noise residual using a neural network, calculates the L2 loss between predicted and actual noise, and returns the loss for training purposes. It also optionally updates an exponential moving average (EMA) of the model's parameters if in training mode and EMA is not None.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":112-136",
+ "content": " # sample a diffusion iteration for each data point\n timesteps = torch.randint(\n 0, self.noise_scheduler.config.num_train_timesteps, \n (B,), device=obs_cond.device\n ).long()\n # add noise to the clean actions according to the noise magnitude at each diffusion iteration\n # (this is the forward diffusion process)\n noisy_actions = self.noise_scheduler.add_noise(\n actions, noise, timesteps)\n # predict the noise residual\n noise_pred = nets['policy']['noise_pred_net'](noisy_actions, timesteps, global_cond=obs_cond)\n # L2 loss\n all_l2 = F.mse_loss(noise_pred, noise, reduction='none')\n loss = (all_l2 * ~is_pad.unsqueeze(-1)).mean()\n loss_dict = {}\n loss_dict['l2_loss'] = loss\n loss_dict['loss'] = loss\n if self.training and self.ema is not None:\n self.ema.step(nets)\n return loss_dict"
+ },
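+ {
+ "comment": "The add_noise call implements the standard forward diffusion step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. Below is a minimal sketch of that formula with an illustrative linear beta schedule (the scheduler configured above uses its own squaredcos_cap_v2 schedule).",
+ "content": "import torch\n\nnum_train_timesteps = 50\nbetas = torch.linspace(1e-4, 2e-2, num_train_timesteps)   # illustrative schedule only\nalphas_cumprod = torch.cumprod(1.0 - betas, dim=0)\n\ndef add_noise(clean_actions, noise, timesteps):\n    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps\n    a_bar = alphas_cumprod[timesteps].view(-1, 1, 1)\n    return a_bar.sqrt() * clean_actions + (1.0 - a_bar).sqrt() * noise\n\nactions = torch.randn(8, 16, 16)           # (batch, prediction horizon, action dim), made-up shapes\nnoise = torch.randn_like(actions)\nt = torch.randint(0, num_train_timesteps, (8,))\nprint(add_noise(actions, noise, t).shape)  # torch.Size([8, 16, 16])"
+ },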
+ {
+ "comment": "This code is initializing action from Gaussian noise at inference time. It first determines the observation, action, and prediction horizons based on the policy settings. Then it retrieves the camera-specific networks and, if the exponential moving average (EMA) is not None, uses the averaged model instead of the current one. For each camera, it extracts features by passing images through the corresponding backbones, pools, and linears. Finally, it concatenates all extracted features with qpos, initializes noisy action from Gaussian noise, and sets naction to this noisy action.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":137-161",
+ "content": " else: # inference time\n To = self.observation_horizon\n Ta = self.action_horizon\n Tp = self.prediction_horizon\n action_dim = self.ac_dim\n nets = self.nets\n if self.ema is not None:\n nets = self.ema.averaged_model\n all_features = []\n for cam_id in range(len(self.camera_names)):\n cam_image = image[:, cam_id]\n cam_features = nets['policy']['backbones'][cam_id](cam_image)\n pool_features = nets['policy']['pools'][cam_id](cam_features)\n pool_features = torch.flatten(pool_features, start_dim=1)\n out_features = nets['policy']['linears'][cam_id](pool_features)\n all_features.append(out_features)\n obs_cond = torch.cat(all_features + [qpos], dim=1)\n # initialize action from Guassian noise\n noisy_action = torch.randn(\n (B, Tp, action_dim), device=obs_cond.device)\n naction = noisy_action"
+ },
+ {
+ "comment": "The code initializes the noise scheduler and iterates through timesteps, predicting noise and performing inverse diffusion steps to remove noise from samples. It also includes functions for serializing and deserializing the model's parameters.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":163-192",
+ "content": " # init scheduler\n self.noise_scheduler.set_timesteps(self.num_inference_timesteps)\n for k in self.noise_scheduler.timesteps:\n # predict noise\n noise_pred = nets['policy']['noise_pred_net'](\n sample=naction, \n timestep=k,\n global_cond=obs_cond\n )\n # inverse diffusion step (remove noise)\n naction = self.noise_scheduler.step(\n model_output=noise_pred,\n timestep=k,\n sample=naction\n ).prev_sample\n return naction\n def serialize(self):\n return {\n \"nets\": self.nets.state_dict(),\n \"ema\": self.ema.averaged_model.state_dict() if self.ema is not None else None,\n }\n def deserialize(self, model_dict):\n status = self.nets.load_state_dict(model_dict[\"nets\"])\n print('Loaded model')\n if model_dict.get(\"ema\", None) is not None:"
+ },
+ {
+ "comment": "The code defines an `ACTPolicy` class that uses the ACT model and optimizer for reinforcement learning tasks. It normalizes images, handles both training and testing scenarios, and calculates loss during training time. The kl_weight and vq arguments are taken from args_override.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":193-217",
+ "content": " print('Loaded EMA')\n status_ema = self.ema.averaged_model.load_state_dict(model_dict[\"ema\"])\n status = [status, status_ema]\n return status\nclass ACTPolicy(nn.Module):\n def __init__(self, args_override):\n super().__init__()\n model, optimizer = build_ACT_model_and_optimizer(args_override)\n self.model = model # CVAE decoder\n self.optimizer = optimizer\n self.kl_weight = args_override['kl_weight']\n self.vq = args_override['vq']\n print(f'KL Weight {self.kl_weight}')\n def __call__(self, qpos, image, actions=None, is_pad=None, vq_sample=None):\n env_state = None\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n image = normalize(image)\n if actions is not None: # training time\n actions = actions[:, :self.model.num_queries]\n is_pad = is_pad[:, :self.model.num_queries]\n loss_dict = dict()"
+ },
+ {
+ "comment": "The code is defining a policy function for an agent in a reinforcement learning environment. It calculates loss based on differences between predicted and actual actions, as well as KL divergence to penalize the model's confidence in its predictions. The code also defines an optimizer for training and a function to encode actions into binary representations for VQ-VAE (Variable Quantization Variational Autoencoder) models.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":218-239",
+ "content": " a_hat, is_pad_hat, (mu, logvar), probs, binaries = self.model(qpos, image, env_state, actions, is_pad, vq_sample)\n if self.vq or self.model.encoder is None:\n total_kld = [torch.tensor(0.0)]\n else:\n total_kld, dim_wise_kld, mean_kld = kl_divergence(mu, logvar)\n if self.vq:\n loss_dict['vq_discrepancy'] = F.l1_loss(probs, binaries, reduction='mean')\n all_l1 = F.l1_loss(actions, a_hat, reduction='none')\n l1 = (all_l1 * ~is_pad.unsqueeze(-1)).mean()\n loss_dict['l1'] = l1\n loss_dict['kl'] = total_kld[0]\n loss_dict['loss'] = loss_dict['l1'] + loss_dict['kl'] * self.kl_weight\n return loss_dict\n else: # inference time\n a_hat, _, (_, _), _, _ = self.model(qpos, image, env_state, vq_sample=vq_sample) # no action, sample from prior\n return a_hat\n def configure_optimizers(self):\n return self.optimizer\n @torch.no_grad()\n def vq_encode(self, qpos, actions, is_pad):"
+ },
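+ {
+ "comment": "The KL term above is the closed-form divergence between the diagonal Gaussian posterior parameterized by (mu, logvar) and a standard normal prior, as computed by the kl_divergence helper at the end of this file:",
+ "content": "D_{KL}\\big(\\mathcal{N}(\\mu, \\operatorname{diag}(\\sigma^2)) \\,\\|\\, \\mathcal{N}(0, I)\\big) = -\\tfrac{1}{2} \\sum_j \\left(1 + \\log \\sigma_j^2 - \\mu_j^2 - \\sigma_j^2\\right)"
+ },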
+ {
+ "comment": "This code defines a class for the CNNMLP policy model in an environment. The __init__ function initializes the model and optimizer based on arguments override, while the __call__ function takes in state (qpos), image, actions (if training time), and is_pad for processing. It normalizes the image, and if actions are provided, it calculates the MSE loss between predicted (a_hat) and actual (actions) actions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":240-269",
+ "content": " actions = actions[:, :self.model.num_queries]\n is_pad = is_pad[:, :self.model.num_queries]\n _, _, binaries, _, _ = self.model.encode(qpos, actions, is_pad)\n return binaries\n def serialize(self):\n return self.state_dict()\n def deserialize(self, model_dict):\n return self.load_state_dict(model_dict)\nclass CNNMLPPolicy(nn.Module):\n def __init__(self, args_override):\n super().__init__()\n model, optimizer = build_CNNMLP_model_and_optimizer(args_override)\n self.model = model # decoder\n self.optimizer = optimizer\n def __call__(self, qpos, image, actions=None, is_pad=None):\n env_state = None # TODO\n normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n std=[0.229, 0.224, 0.225])\n image = normalize(image)\n if actions is not None: # training time\n actions = actions[:, 0]\n a_hat = self.model(qpos, image, env_state, actions)\n mse = F.mse_loss(actions, a_hat)"
+ },
+ {
+ "comment": "This code is a part of a neural network policy model. It calculates the KL divergence between two variables, and depending on whether it's training or inference time, it either returns the action estimate (a_hat) or the losses for different loss types like mse. The optimizer configuration function returns the optimizer used by the model.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/policy.py\":270-294",
+ "content": " loss_dict = dict()\n loss_dict['mse'] = mse\n loss_dict['loss'] = loss_dict['mse']\n return loss_dict\n else: # inference time\n a_hat = self.model(qpos, image, env_state) # no action, sample from prior\n return a_hat\n def configure_optimizers(self):\n return self.optimizer\ndef kl_divergence(mu, logvar):\n batch_size = mu.size(0)\n assert batch_size != 0\n if mu.data.ndimension() == 4:\n mu = mu.view(mu.size(0), mu.size(1))\n if logvar.data.ndimension() == 4:\n logvar = logvar.view(logvar.size(0), logvar.size(1))\n klds = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())\n total_kld = klds.sum(1).mean(0, True)\n dimension_wise_kld = klds.mean(0)\n mean_kld = klds.mean(1).mean(0, True)\n return total_kld, dimension_wise_kld, mean_kld"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/36cd1645-4712-4e54-83f4-a8ce014a390f.json b/docs/doc/36cd1645-4712-4e54-83f4-a8ce014a390f.json
new file mode 100644
index 00000000..26d68d80
--- /dev/null
+++ b/docs/doc/36cd1645-4712-4e54-83f4-a8ce014a390f.json
@@ -0,0 +1,10 @@
+{
+ "summary": "The code imports necessary modules and sets up a setup script for the \"detr\" package using setuptools. It defines the package name, version, licenses, and reads the long description from the README file.",
+ "details": [
+ {
+ "comment": "The code imports necessary modules and sets up a setup script for the \"detr\" package using setuptools. It defines the package name, version, licenses, and reads the long description from the README file.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/setup.py\":0-9",
+ "content": "from distutils.core import setup\nfrom setuptools import find_packages\nsetup(\n name='detr',\n version='0.0.0',\n packages=find_packages(),\n license='MIT License',\n long_description=open('README.md').read(),\n)"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/41e6ec73-e0fc-4445-8118-8616de3da593.json b/docs/doc/41e6ec73-e0fc-4445-8118-8616de3da593.json
new file mode 100644
index 00000000..b2f5c46c
--- /dev/null
+++ b/docs/doc/41e6ec73-e0fc-4445-8118-8616de3da593.json
@@ -0,0 +1,100 @@
+{
+ "summary": "The code uses the ACT-Plus-Plus framework for robot manipulation, incorporating deep reinforcement learning and latent models with visual inputs. It saves and plots training curves while supporting customization through command-line arguments, adding new \"--vq_class\" and \"--vq_dim\" options for the latent model's class and dimensionality.",
+ "details": [
+ {
+ "comment": "The code imports necessary libraries, defines functions for robot manipulation and data processing. It initializes parameters from command line inputs and sets a seed for reproducibility. This script aims to train a latent model in the ACT-Plus-Plus framework.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":0-35",
+ "content": "import torch\nimport numpy as np\nimport os\nimport pickle\nimport argparse\nimport matplotlib.pyplot as plt\nfrom copy import deepcopy\nfrom tqdm import tqdm\nfrom einops import rearrange\nimport torch.nn.functional as F\nfrom constants import DT\nfrom constants import PUPPET_GRIPPER_JOINT_OPEN\nfrom utils import load_data # data functions\nfrom utils import sample_box_pose, sample_insertion_pose # robot functions\nfrom utils import compute_dict_mean, set_seed, detach_dict # helper functions\nfrom policy import ACTPolicy, CNNMLPPolicy\nfrom visualize_episodes import save_videos\nfrom detr.models.latent_model import Latent_Model_Transformer\nfrom sim_env import BOX_POSE\nimport IPython\ne = IPython.embed\ndef main(args):\n set_seed(1)\n # command line parameters\n is_eval = args['eval']\n ckpt_dir = args['ckpt_dir']\n policy_class = args['policy_class']\n onscreen_render = args['onscreen_render']\n task_name = args['task_name']\n batch_size_train = args['batch_size']\n batch_size_val = args['batch_size']\n num_epochs = args['num_epochs']"
+ },
+ {
+ "comment": "This code retrieves task parameters from the task name and configuration files, sets fixed parameters for the model, and assigns values to variables like dataset_dir, num_episodes, episode_len, camera_names. The code also applies a lambda function as a name filter, if specified in the configuration file.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":37-64",
+ "content": " # get task parameters\n is_sim = task_name[:4] == 'sim_'\n if is_sim:\n from constants import SIM_TASK_CONFIGS\n task_config = SIM_TASK_CONFIGS[task_name]\n else:\n from aloha_scripts.constants import TASK_CONFIGS\n task_config = TASK_CONFIGS[task_name]\n dataset_dir = task_config['dataset_dir']\n num_episodes = task_config['num_episodes']\n episode_len = task_config['episode_len']\n camera_names = task_config['camera_names']\n name_filter = task_config.get('name_filter', lambda n: True)\n # fixed parameters\n state_dim = 14\n lr_backbone = 1e-5\n backbone = 'resnet18'\n if policy_class == 'ACT':\n enc_layers = 4\n dec_layers = 7\n nheads = 8\n policy_config = {'lr': args['lr'],\n 'num_queries': args['chunk_size'],\n 'kl_weight': args['kl_weight'],\n 'hidden_dim': args['hidden_dim'],\n 'dim_feedforward': args['dim_feedforward'],\n 'lr_backbone': lr_backbone,"
+ },
+ {
+ "comment": "This code is defining the configuration for training a latent model. It has different policy classes, such as 'Transformer', 'CNNMLP', and others not yet implemented. The configuration includes parameters like learning rate, backbone architecture, camera names, episode length, etc. If an unsupported policy class is given, it raises a NotImplementedError.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":65-91",
+ "content": " 'backbone': backbone,\n 'enc_layers': enc_layers,\n 'dec_layers': dec_layers,\n 'nheads': nheads,\n 'camera_names': camera_names,\n 'vq': True,\n 'vq_class': args['vq_class'],\n 'vq_dim': args['vq_dim'],\n }\n elif policy_class == 'CNNMLP':\n policy_config = {'lr': args['lr'], 'lr_backbone': lr_backbone, 'backbone' : backbone, 'num_queries': 1,\n 'camera_names': camera_names,}\n else:\n raise NotImplementedError\n config = {\n 'num_epochs': num_epochs,\n 'ckpt_dir': ckpt_dir,\n 'episode_len': episode_len,\n 'state_dim': state_dim,\n 'lr': args['lr'],\n 'policy_class': policy_class,\n 'onscreen_render': onscreen_render,\n 'policy_config': policy_config,\n 'task_name': task_name,\n 'seed': args['seed'],\n 'temporal_agg': args['temporal_agg'],"
+ },
+ {
+ "comment": "This code snippet is loading data and training a behavioral cloning (BC) model. If `is_eval` is true, it evaluates the best checkpoint. It loads the data, saves the dataset stats if necessary, trains the BC model, and stores information about the best checkpoint.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":92-119",
+ "content": " 'camera_names': camera_names,\n 'real_robot': not is_sim\n }\n # if is_eval:\n # ckpt_names = [f'policy_best.ckpt']\n # results = []\n # for ckpt_name in ckpt_names:\n # success_rate, avg_return = eval_bc(config, ckpt_name, save_episode=True)\n # results.append([ckpt_name, success_rate, avg_return])\n # for ckpt_name, success_rate, avg_return in results:\n # print(f'{ckpt_name}: {success_rate=} {avg_return=}')\n # print()\n # exit()\n train_dataloader, val_dataloader, stats, _ = load_data(dataset_dir, name_filter, camera_names, batch_size_train, batch_size_val)\n # save dataset stats\n # if not os.path.isdir(ckpt_dir):\n # os.makedirs(ckpt_dir)\n # stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n # with open(stats_path, 'wb') as f:\n # pickle.dump(stats, f)\n ckpt_name = f'policy_last.ckpt'\n best_ckpt_info = train_bc(train_dataloader, val_dataloader, config, ckpt_name)\n best_epoch, min_val_loss, best_state_dict = best_ckpt_info"
+ },
+ {
+ "comment": "Code snippet saves the best checkpoint for a latent model, defines a policy function based on the given class, and gets an image from the observations.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":121-153",
+ "content": " # save best checkpoint\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_best.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Best ckpt, val loss {min_val_loss:.6f} @ epoch{best_epoch}')\ndef make_policy(policy_class, policy_config):\n if policy_class == 'ACT':\n policy = ACTPolicy(policy_config)\n elif policy_class == 'CNNMLP':\n policy = CNNMLPPolicy(policy_config)\n else:\n raise NotImplementedError\n return policy\n# def make_optimizer(policy_class, policy):\n# if policy_class == 'ACT':\n# optimizer = policy.configure_optimizers()\n# elif policy_class == 'CNNMLP':\n# optimizer = policy.configure_optimizers()\n# else:\n# raise NotImplementedError\n# return optimizer\ndef get_image(ts, camera_names):\n curr_images = []\n for cam_name in camera_names:\n curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')\n curr_images.append(curr_image)\n curr_image = np.stack(curr_images, axis=0)\n curr_image = torch.from_numpy(curr_image / 255.0).float().cuda().unsqueeze(0)"
+ },
+ {
+ "comment": "This code defines a function to evaluate the performance of a trained policy. It loads the policy from a checkpoint file, prepares the necessary configurations, and then evaluates the policy by running episodes. The function takes in the configuration, checkpoint name, and an optional parameter for saving episode results. It uses torch and pickle libraries for loading and processing data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":154-183",
+ "content": " return curr_image\n# def eval_bc(config, ckpt_name, save_episode=True):\n# set_seed(1000)\n# ckpt_dir = config['ckpt_dir']\n# state_dim = config['state_dim']\n# real_robot = config['real_robot']\n# policy_class = config['policy_class']\n# onscreen_render = config['onscreen_render']\n# policy_config = config['policy_config']\n# camera_names = config['camera_names']\n# max_timesteps = config['episode_len']\n# task_name = config['task_name']\n# temporal_agg = config['temporal_agg']\n# onscreen_cam = 'angle'\n# # load policy and stats\n# ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n# policy = make_policy(policy_class, policy_config)\n# loading_status = policy.load_state_dict(torch.load(ckpt_path))\n# print(loading_status)\n# policy.cuda()\n# policy.eval()\n# print(f'Loaded: {ckpt_path}')\n# stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n# with open(stats_path, 'rb') as f:\n# stats = pickle.load(f)\n# pre_process = lambda s_qpos: (s_qpos - stats['qpos_mean']) / stats['qpos_std']"
+ },
+ {
+ "comment": "This code is initializing an environment, either real or simulated, based on the \"real_robot\" flag. It then sets up variables for rollout number of episodes, maximum timesteps, query frequency (which may change depending on temporal aggregation), and stores episode returns and highest rewards in lists. The last few lines seem to set up task-specific poses for certain tasks.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":184-212",
+ "content": "# post_process = lambda a: a * stats['action_std'] + stats['action_mean']\n# # load environment\n# if real_robot:\n# from aloha_scripts.robot_utils import move_grippers # requires aloha\n# from aloha_scripts.real_env import make_real_env # requires aloha\n# env = make_real_env(init_node=True)\n# env_max_reward = 0\n# else:\n# from sim_env import make_sim_env\n# env = make_sim_env(task_name)\n# env_max_reward = env.task.max_reward\n# query_frequency = policy_config['num_queries']\n# if temporal_agg:\n# query_frequency = 1\n# num_queries = policy_config['num_queries']\n# max_timesteps = int(max_timesteps * 1) # may increase for real-world tasks\n# num_rollouts = 50\n# episode_returns = []\n# highest_rewards = []\n# for rollout_id in range(num_rollouts):\n# rollout_id += 0\n# ### set task\n# if 'sim_transfer_cube' in task_name:\n# BOX_POSE[0] = sample_box_pose() # used in sim reset\n# elif 'sim_insertion' in task_name:"
+ },
+ {
+ "comment": "This code snippet is part of a training process for a latent model. It resets the environment, performs on-screen rendering if needed, and then enters an evaluation loop to collect data for training. The code uses PyTorch for inference mode and handles on-screen rendering, image capturing, and storing data for further analysis or model training.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":213-237",
+ "content": "# BOX_POSE[0] = np.concatenate(sample_insertion_pose()) # used in sim reset\n# ts = env.reset()\n# ### onscreen render\n# if onscreen_render:\n# ax = plt.subplot()\n# plt_img = ax.imshow(env._physics.render(height=480, width=640, camera_id=onscreen_cam))\n# plt.ion()\n# ### evaluation loop\n# if temporal_agg:\n# all_time_actions = torch.zeros([max_timesteps, max_timesteps+num_queries, state_dim]).cuda()\n# qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()\n# image_list = [] # for visualization\n# qpos_list = []\n# target_qpos_list = []\n# rewards = []\n# with torch.inference_mode():\n# for t in range(max_timesteps):\n# ### update onscreen render and wait for DT\n# if onscreen_render:\n# image = env._physics.render(height=480, width=640, camera_id=onscreen_cam)\n# plt_img.set_data(image)"
+ },
+ {
+ "comment": "This code segment is part of a deep reinforcement learning algorithm that interacts with an environment. It processes observations, pre-processes state variables (qpos), and queries the policy to generate actions. The 'policy_class' determines whether to use an ACT policy or not. If so, it queries the policy for actions at specific intervals (query_frequency) and possibly aggregates them over time if temporal_agg is set to True. This algorithm likely trains a latent model in an environment with potential visual input from cameras.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":238-259",
+ "content": "# plt.pause(DT)\n# ### process previous timestep to get qpos and image_list\n# obs = ts.observation\n# if 'images' in obs:\n# image_list.append(obs['images'])\n# else:\n# image_list.append({'main': obs['image']})\n# qpos_numpy = np.array(obs['qpos'])\n# qpos = pre_process(qpos_numpy)\n# qpos = torch.from_numpy(qpos).float().cuda().unsqueeze(0)\n# qpos_history[:, t] = qpos\n# curr_image = get_image(ts, camera_names)\n# ### query policy\n# if config['policy_class'] == \"ACT\":\n# if t % query_frequency == 0:\n# all_actions = policy(qpos, curr_image)\n# if temporal_agg:\n# all_time_actions[[t], t:t+num_queries] = all_actions\n# actions_for_curr_step = all_time_actions[:, t]\n# actions_populated = torch.all(actions_for_curr_step != 0, axis=1)"
+ },
+ {
+ "comment": "This code determines the raw action for a given step in an environment. It first checks the policy class and then applies the appropriate method to get the raw action. If the policy class is \"Exponential\", it calculates weights based on actions, sums them, and uses them to compute the raw action. If the policy class is \"CNNMLP\", it calls a predefined function \"policy\" with the current state and image as inputs. If none of these conditions are met, it raises an error. The resulting raw_action is then post-processed and used to determine target_qpos for the next step in the environment.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":260-278",
+ "content": "# actions_for_curr_step = actions_for_curr_step[actions_populated]\n# k = 0.01\n# exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))\n# exp_weights = exp_weights / exp_weights.sum()\n# exp_weights = torch.from_numpy(exp_weights).cuda().unsqueeze(dim=1)\n# raw_action = (actions_for_curr_step * exp_weights).sum(dim=0, keepdim=True)\n# else:\n# raw_action = all_actions[:, t % query_frequency]\n# elif config['policy_class'] == \"CNNMLP\":\n# raw_action = policy(qpos, curr_image)\n# else:\n# raise NotImplementedError\n# ### post-process actions\n# raw_action = raw_action.squeeze(0).cpu().numpy()\n# action = post_process(raw_action)\n# target_qpos = action\n# ### step the environment"
+ },
+ {
+ "comment": "This code segment is tracking the reward, episode return, and highest reward during a rollout in a robotics environment. It also handles visualization by appending qpos and target_qpos to lists, and has options to save videos of the episodes. It prints the rollout results and calculates the success rate based on the highest rewards achieved.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":279-301",
+ "content": "# ts = env.step(target_qpos)\n# ### for visualization\n# qpos_list.append(qpos_numpy)\n# target_qpos_list.append(target_qpos)\n# rewards.append(ts.reward)\n# plt.close()\n# if real_robot:\n# move_grippers([env.puppet_bot_left, env.puppet_bot_right], [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=0.5) # open\n# pass\n# rewards = np.array(rewards)\n# episode_return = np.sum(rewards[rewards!=None])\n# episode_returns.append(episode_return)\n# episode_highest_reward = np.max(rewards)\n# highest_rewards.append(episode_highest_reward)\n# print(f'Rollout {rollout_id}\\n{episode_return=}, {episode_highest_reward=}, {env_max_reward=}, Success: {episode_highest_reward==env_max_reward}')\n# if save_episode:\n# save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))\n# success_rate = np.mean(np.array(highest_rewards) == env_max_reward)"
+ },
+ {
+ "comment": "The code calculates the success rate and average return for a set of rollouts in an environment. It then creates a summary string with reward thresholds, success rate, and average return, and writes it to a text file along with episode returns and highest rewards. The function is part of a larger codebase for training a latent model using policy and latent_model parameters.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":302-325",
+ "content": "# avg_return = np.mean(episode_returns)\n# summary_str = f'\\nSuccess rate: {success_rate}\\nAverage return: {avg_return}\\n\\n'\n# for r in range(env_max_reward+1):\n# more_or_equal_r = (np.array(highest_rewards) >= r).sum()\n# more_or_equal_r_rate = more_or_equal_r / num_rollouts\n# summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\\n'\n# print(summary_str)\n# # save success rate to txt\n# result_file_name = 'result_' + ckpt_name.split('.')[0] + '.txt'\n# with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:\n# f.write(summary_str)\n# f.write(repr(episode_returns))\n# f.write('\\n\\n')\n# f.write(repr(highest_rewards))\n# return success_rate, avg_return\ndef forward_pass(data, policy, latent_model):\n image_data, qpos_data, action_data, is_pad = data\n image_data, qpos_data, action_data, is_pad = image_data.cuda(), qpos_data.cuda(), action_data.cuda(), is_pad.cuda()\n forward_dict = {}"
+ },
+ {
+ "comment": "This code uses VQ-VAE to encode data, then feeds it into a latent model and calculates cross entropy loss. It also measures L1 error between output labels and ground truth labels for evaluation. The train_bc function trains the policy using a specified number of epochs with a given configuration and checkpoint directory.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":326-352",
+ "content": " gt_labels = policy.vq_encode(qpos_data, action_data, is_pad)\n inputs = torch.cat([torch.zeros_like(gt_labels)[:, [0]], gt_labels[:, :-1]], dim=1)\n output_logits = latent_model(inputs)\n ce_loss = F.cross_entropy(output_logits, gt_labels)\n with torch.no_grad():\n output_labels = F.one_hot(torch.argmax(output_logits, dim=-1), num_classes=gt_labels.shape[-1]).float()\n # output_latents = F.softmax(output_logits, dim=-1)\n l1_error = F.l1_loss(output_labels, gt_labels, reduction='mean')\n # l1_errors = []\n # for i in range(l1_errors.shape[1]):\n # l1_errors.append(torch.mean(l1_errors[:, i]).item())\n forward_dict['loss'] = ce_loss\n forward_dict['l1_error'] = l1_error\n return forward_dict\ndef train_bc(train_dataloader, val_dataloader, config, ckpt_name):\n num_epochs = config['num_epochs']\n ckpt_dir = config['ckpt_dir']\n seed = config['seed']\n policy_class = config['policy_class']\n policy_config = config['policy_config']\n set_seed(seed)"
+ },
+ {
+ "comment": "This code initializes a latent model and policy, loads checkpoints for the policy, optimizes the latent model using AdamW, trains for specified number of epochs, and validates the performance at each epoch.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":354-381",
+ "content": " vq_dim = config['policy_config']['vq_dim']\n vq_class = config['policy_config']['vq_class']\n latent_model = Latent_Model_Transformer(vq_dim, vq_dim, vq_class)\n latent_model.cuda()\n ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n policy = make_policy(policy_class, policy_config)\n loading_status = policy.load_state_dict(torch.load(ckpt_path))\n policy.eval()\n policy.cuda()\n optimizer = torch.optim.AdamW(latent_model.parameters(), lr=config['lr'])\n train_history = []\n validation_history = []\n min_val_loss = np.inf\n best_ckpt_info = None\n for epoch in tqdm(range(num_epochs)):\n print(f'\\nEpoch {epoch}')\n # validation\n with torch.inference_mode():\n latent_model.eval()\n epoch_dicts = []\n for batch_idx, data in enumerate(val_dataloader):\n forward_dict = forward_pass(data, policy, latent_model)\n epoch_dicts.append(forward_dict)\n epoch_summary = compute_dict_mean(epoch_dicts)\n validation_history.append(epoch_summary)"
+ },
+ {
+ "comment": "This code is saving the best checkpoint, printing validation and training losses, iterating through dataloader for backpropagation, computing mean of dictionary values to get epoch summary, and storing it in a list.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":383-405",
+ "content": " epoch_val_loss = epoch_summary['loss']\n if epoch_val_loss < min_val_loss:\n min_val_loss = epoch_val_loss\n best_ckpt_info = (epoch, min_val_loss, deepcopy(latent_model.state_dict()))\n print(f'Val loss: {epoch_val_loss:.5f}')\n summary_string = ''\n for k, v in epoch_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n # training\n optimizer.zero_grad()\n for batch_idx, data in enumerate(train_dataloader):\n forward_dict = forward_pass(data, policy, latent_model)\n # backward\n loss = forward_dict['loss']\n loss.backward()\n optimizer.step()\n optimizer.zero_grad()\n train_history.append(detach_dict(forward_dict))\n epoch_summary = compute_dict_mean(train_history[(batch_idx+1)*epoch:(batch_idx+1)*(epoch+1)])\n epoch_train_loss = epoch_summary['loss']\n print(f'Train loss: {epoch_train_loss:.5f}')"
+ },
+ {
+ "comment": "The code snippet saves the latent model's state at each epoch, keeps track of the best checkpoint, and plots the training curves. It prints the final validation loss and epoch where it occurred.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":406-430",
+ "content": " summary_string = ''\n for k, v in epoch_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n if epoch % 100 == 0:\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_epoch_{epoch}_seed_{seed}.ckpt')\n torch.save(latent_model.state_dict(), ckpt_path)\n plot_history(train_history, validation_history, epoch, ckpt_dir, seed)\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_last.ckpt')\n torch.save(latent_model.state_dict(), ckpt_path)\n best_epoch, min_val_loss, best_state_dict = best_ckpt_info\n ckpt_path = os.path.join(ckpt_dir, f'latent_model_epoch_{best_epoch}_seed_{seed}.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Training finished:\\nSeed {seed}, val loss {min_val_loss:.6f} at epoch {best_epoch}')\n # save training curves\n plot_history(train_history, validation_history, num_epochs, ckpt_dir, seed)\n return best_ckpt_info\ndef plot_history(train_history, validation_history, num_epochs, ckpt_dir, seed):"
+ },
+ {
+ "comment": "This code saves training curves for a latent model and plots them. It iterates over keys in train_history, generates plots for each key (train and validation), and saves the plot to ckpt_dir with seed appended. The code also takes command-line arguments such as --eval, --onscreen_render, --ckpt_dir, and --policy_class.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":431-452",
+ "content": " # save training curves\n for key in train_history[0]:\n plot_path = os.path.join(ckpt_dir, f'latent_model_val_{key}_seed_{seed}.png')\n plt.figure()\n train_values = [summary[key].item() for summary in train_history]\n val_values = [summary[key].item() for summary in validation_history]\n plt.plot(np.linspace(0, num_epochs-1, len(train_history)), train_values, label='train')\n plt.plot(np.linspace(0, num_epochs-1, len(validation_history)), val_values, label='validation')\n # plt.ylim([-0.1, 1])\n plt.tight_layout()\n plt.legend()\n plt.title(key)\n plt.savefig(plot_path)\n print(f'Saved plots to {ckpt_dir}')\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--eval', action='store_true')\n parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)\n parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)"
+ },
+ {
+ "comment": "This code defines command-line arguments for the program, specifying required and optional parameters such as task_name, batch_size, seed, num_epochs, lr, kl_weight, chunk_size, hidden_dim, dim_feedforward, and temporal_agg. These options allow the user to customize the training process of the latent model.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":453-465",
+ "content": " parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--batch_size', action='store', type=int, help='batch_size', required=True)\n parser.add_argument('--seed', action='store', type=int, help='seed', required=True)\n parser.add_argument('--num_epochs', action='store', type=int, help='num_epochs', required=True)\n parser.add_argument('--lr', action='store', type=float, help='lr', required=True)\n # for ACT\n parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)\n parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)\n parser.add_argument('--hidden_dim', action='store', type=int, help='hidden_dim', required=False)\n parser.add_argument('--dim_feedforward', action='store', type=int, help='dim_feedforward', required=False)\n parser.add_argument('--temporal_agg', action='store_true')\n parser.add_argument('--use_vq', action='store_true')"
+ },
+ {
+ "comment": "This code is adding two arguments, \"--vq_class\" and \"--vq_dim\", to the parser using store action and specifying their types as integer (int). These arguments provide parameters for a latent model's class and dimensionality. The main function is then called with these parameters obtained from parsing command line arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/train_latent_model.py\":466-469",
+ "content": " parser.add_argument('--vq_class', action='store', type=int, help='vq_class')\n parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim')\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
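The temporal aggregation described in the comments above (inside the commented-out evaluation loop) averages the overlapping action-chunk predictions for the current timestep with exponentially decaying weights. A small stand-alone sketch of just the weighting step, with made-up tensor shapes, assuming 5 overlapping predictions of a 14-dim action:

    import numpy as np
    import torch

    # pretend 5 earlier chunks each predicted a 14-dim action for the current timestep;
    # row 0 is the oldest prediction and receives the largest weight
    actions_for_curr_step = torch.randn(5, 14)

    k = 0.01
    exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))
    exp_weights = exp_weights / exp_weights.sum()                   # normalize to sum to 1
    exp_weights = torch.from_numpy(exp_weights).float().unsqueeze(1)

    # weighted average over the overlapping predictions -> a single (1, 14) action
    raw_action = (actions_for_curr_step * exp_weights).sum(dim=0, keepdim=True)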
diff --git a/docs/doc/4b1cfb7e-44a5-41be-ad85-3a22b3a1af72.json b/docs/doc/4b1cfb7e-44a5-41be-ad85-3a22b3a1af72.json
new file mode 100644
index 00000000..ee1ce14a
--- /dev/null
+++ b/docs/doc/4b1cfb7e-44a5-41be-ad85-3a22b3a1af72.json
@@ -0,0 +1,30 @@
+{
+ "summary": "The function `calculate_nearest_neighbors()` computes nearest neighbor losses and is used to select the optimal value of 'k' for a dataset, plotting and saving the best loss. User inputs: dataset directory, checkpoint directory.",
+ "details": [
+ {
+ "comment": "Code imports necessary libraries and defines a function `calculate_nearest_neighbors()` that takes in query inputs, target values, support inputs, and support targets as well as a maximum value of K. It then calculates the pairwise distances between the query inputs and support inputs, sorts them, and calculates weights for the nearest neighbors using softmax. Finally, it computes errors by weighting the support targets based on these calculated weights.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_select_k.py\":0-30",
+ "content": "import torch\nimport torch.nn.functional as F\nimport numpy as np\nimport h5py\nimport pathlib\nimport os\nimport argparse\nimport matplotlib.pyplot as plt\nimport IPython\ne = IPython.embed\n# modified from https://github.com/jyopari/VINN/blob/main/nearest-neighbor-eval/handle_nn.ipynb\ndef calculate_nearest_neighbors(query_inputs, query_targets, support_inputs, support_targets, max_k):\n with torch.no_grad():\n pairwise_dist = []\n for q_in in query_inputs:\n diff = support_inputs - q_in.unsqueeze(0)\n dist = torch.norm(diff, dim=1)\n pairwise_dist.append(dist)\n pairwise_dist = torch.stack(pairwise_dist)\n sorted_dist, index = torch.sort(pairwise_dist, dim=1) # sort the support axis\n permuted_support_targets = support_targets[index]\n errors = []\n for k in range(1, max_k):\n topk_dist = pairwise_dist[:, :k]\n topk_support_targets = permuted_support_targets[:, :k]\n weights = F.softmax(-topk_dist, dim=1)\n weighted_support_targets = weights.unsqueeze(2) * topk_support_targets"
+ },
+ {
+ "comment": "This code reads episode indices from a specified directory, sorts them, and asserts there are no gaps. It then determines a validation split of 80% for training data. The code loads the training data into list X.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_select_k.py\":31-65",
+ "content": " prediction = torch.sum(weighted_support_targets, dim=1)\n error = F.mse_loss(prediction, query_targets)\n errors.append(error)\n return errors\ndef chunks(lst, n):\n \"\"\"Yield successive n-sized chunks from lst.\"\"\"\n for i in range(0, len(lst), n):\n yield lst[i:i + n]\ndef main(args):\n # TODO ######################\n dataset_dir = args['dataset_dir']\n ckpt_dir = args['ckpt_dir']\n seed = 0\n max_k = 400\n batch_size = 100\n # TODO ######################\n repr_type = 'byol'\n if 'cotrain' in ckpt_dir:\n repr_type += '_cotrain'\n e() # make sure!\n if not os.path.isdir(ckpt_dir):\n os.makedirs(ckpt_dir)\n episode_idxs = [int(name.split('_')[1].split('.')[0]) for name in os.listdir(dataset_dir) if ('.hdf5' in name) and ('features' not in name)]\n episode_idxs.sort()\n assert len(episode_idxs) == episode_idxs[-1] + 1 # no holes\n num_episodes = len(episode_idxs)\n val_split = int(num_episodes * 0.8)\n # load train data\n X = []"
+ },
+ {
+ "comment": "This code loads data from HDF5 files and concatenates it for training. It reads action labels and camera features for each episode, then combines them into a single feature matrix (X) and action label matrix (Y). The code also prints the shape of the feature matrices.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_select_k.py\":66-93",
+ "content": " Y = []\n for episode_id in range(0, val_split):\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n action = root['/action'][:]\n camera_names = list(root[f'/observations/images/'].keys())\n all_cam_feature = []\n feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')\n with h5py.File(feature_dataset_path, 'r') as root:\n for cam_name in camera_names:\n cam_feature = root[f'/features/{cam_name}'][:]\n all_cam_feature.append(cam_feature)\n cam_feature = np.concatenate(all_cam_feature, axis=1)\n X.append(cam_feature)\n Y.append(action)\n X = np.concatenate(X)\n Y = np.concatenate(Y)\n train_inputs = torch.from_numpy(X).cuda()\n train_targets = torch.from_numpy(Y).cuda()\n print(f'All features: {train_inputs.shape}')\n # load test data\n X = []\n Y = []\n for episode_id in range(val_split, num_episodes):"
+ },
+ {
+ "comment": "This code loads data from multiple HDF5 files, concatenates camera features into a single feature matrix (X), and associates corresponding actions as targets (Y). It then prepares the data for training by converting to PyTorch tensors and computing nearest neighbor losses using a custom function. The resulting losses are stored in val_losses list.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_select_k.py\":94-117",
+ "content": " dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n action = root['/action'][:]\n all_cam_feature = []\n feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')\n with h5py.File(feature_dataset_path, 'r') as root:\n for cam_name in camera_names:\n cam_feature = root[f'/features/{cam_name}'][:]\n all_cam_feature.append(cam_feature)\n cam_feature = np.concatenate(all_cam_feature, axis=1)\n X.append(cam_feature)\n Y.append(action)\n X = np.concatenate(X)\n Y = np.concatenate(Y)\n val_inputs = torch.from_numpy(X).cuda()\n val_targets = torch.from_numpy(Y).cuda()\n val_losses = []\n for inputs, targets in zip(chunks(val_inputs, batch_size), chunks(val_targets, batch_size)):\n val_loss = calculate_nearest_neighbors(inputs, targets, train_inputs, train_targets, max_k)\n val_loss = torch.stack(val_loss)"
+ },
+ {
+ "comment": "This code is used to select the optimal value of 'k' for a dataset. It calculates the validation loss for different values of 'k', plots the losses, and saves the best loss in an image file. The user needs to provide the directory path for the dataset and the checkpoint directory as input arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_select_k.py\":118-133",
+ "content": " val_losses.append(val_loss)\n val_losses = torch.mean(torch.stack(val_losses), dim=0)\n val_loss = val_losses\n val_loss = torch.tensor(val_loss).cpu().numpy()\n print(f'min val loss of {np.min(val_loss)} at k={np.argmin(val_loss)}')\n plt.plot(np.arange(1, max_k), val_loss)\n plt.savefig(os.path.join(ckpt_dir, f'k_select-seed{seed}.png'))\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_dir', action='store', type=str, help='The text to parse.', required=True)\n parser.add_argument('--ckpt_dir', action='store', type=str, help='The text to parse.', required=True)\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
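The k-selection logic documented above boils down to softmax-weighted k-nearest-neighbor regression on frozen visual features. A compact sketch with toy tensors (the feature and action sizes are assumptions) illustrates the idea; unlike the script, it recomputes the top-k per value of k rather than reusing one sorted distance matrix:

    import torch
    import torch.nn.functional as F

    def knn_predict(query_x, support_x, support_y, k):
        dist = torch.cdist(query_x, support_x)                     # pairwise L2 distances
        topk_dist, idx = torch.topk(dist, k, dim=1, largest=False)
        weights = F.softmax(-topk_dist, dim=1)                     # closer neighbors weigh more
        return (weights.unsqueeze(-1) * support_y[idx]).sum(dim=1)

    support_x, support_y = torch.randn(100, 16), torch.randn(100, 4)   # toy features / actions
    query_x, query_y = torch.randn(10, 16), torch.randn(10, 4)
    errors = [F.mse_loss(knn_predict(query_x, support_x, support_y, k), query_y)
              for k in range(1, 21)]
    best_k = 1 + int(torch.stack(errors).argmin())                 # k with the lowest val loss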
diff --git a/docs/doc/504f2448-310e-4eb1-87c6-a4a0c865531e.json b/docs/doc/504f2448-310e-4eb1-87c6-a4a0c865531e.json
new file mode 100644
index 00000000..d2bc6f5e
--- /dev/null
+++ b/docs/doc/504f2448-310e-4eb1-87c6-a4a0c865531e.json
@@ -0,0 +1,25 @@
+{
+ "summary": "This code defines a transformer positional embedding class using sine and cosine encodings for position embeddings. The forward function applies these encodings to the input tensor 'x', normalizing cumulative sums before applying dimensional transformation. This learned absolute position embedding extends nn.Module and is used in transformer models.",
+ "details": [
+ {
+ "comment": "This code defines a positional embedding class for transformers, similar to the one used in the Attention is All You Need paper. It takes in parameters such as num_pos_feats (number of position features), temperature, normalize (whether to normalize or not), and scale. The forward function applies sine and cosine positional encodings to tensor.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/position_encoding.py\":0-32",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nVarious positional encodings for the transformer.\n\"\"\"\nimport math\nimport torch\nfrom torch import nn\nfrom util.misc import NestedTensor\nimport IPython\ne = IPython.embed\nclass PositionEmbeddingSine(nn.Module):\n \"\"\"\n This is a more standard version of the position embedding, very similar to the one\n used by the Attention is all you need paper, generalized to work on images.\n \"\"\"\n def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None):\n super().__init__()\n self.num_pos_feats = num_pos_feats\n self.temperature = temperature\n self.normalize = normalize\n if scale is not None and normalize is False:\n raise ValueError(\"normalize should be True if scale is passed\")\n if scale is None:\n scale = 2 * math.pi\n self.scale = scale\n def forward(self, tensor):\n x = tensor\n # mask = tensor_list.mask\n # assert mask is not None"
+ },
+ {
+ "comment": "This code generates position embeddings for a given input tensor 'x'. It first creates not_mask and computes the cumulative sums along rows and columns. Then, it normalizes these sums by dividing them with their respective last elements plus a small epsilon value and multiplies them by a scale factor. The code then calculates a temperature-based dimensional transformation for each element in 'x'. It further computes the sine and cosine of the transformed values, stacks them and flattens them along one dimension. Finally, it concatenates the y and x embeddings along the last dimension, permutes the dimensions, and returns the result. This class extends nn.Module and is used for creating learned absolute position embeddings.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/position_encoding.py\":33-56",
+ "content": " # not_mask = ~mask\n not_mask = torch.ones_like(x[0, [0]])\n y_embed = not_mask.cumsum(1, dtype=torch.float32)\n x_embed = not_mask.cumsum(2, dtype=torch.float32)\n if self.normalize:\n eps = 1e-6\n y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale\n x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale\n dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)\n dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)\n pos_x = x_embed[:, :, :, None] / dim_t\n pos_y = y_embed[:, :, :, None] / dim_t\n pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3)\n pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3)\n pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)\n return pos\nclass PositionEmbeddingLearned(nn.Module):\n \"\"\"\n Absolute pos embedding, learned."
+ },
+ {
+ "comment": "This code defines a class \"PositionEmbeddingSine\" for creating position encoding using sine and cosine functions. It takes the number of positional features as input and initializes two embedding layers, one for rows and another for columns. The \"forward\" method computes position embeddings by applying row and column embeddings to image indices and returns them. The \"build_position_encoding\" function creates an instance of PositionEmbeddingSine based on the given arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/position_encoding.py\":57-86",
+ "content": " \"\"\"\n def __init__(self, num_pos_feats=256):\n super().__init__()\n self.row_embed = nn.Embedding(50, num_pos_feats)\n self.col_embed = nn.Embedding(50, num_pos_feats)\n self.reset_parameters()\n def reset_parameters(self):\n nn.init.uniform_(self.row_embed.weight)\n nn.init.uniform_(self.col_embed.weight)\n def forward(self, tensor_list: NestedTensor):\n x = tensor_list.tensors\n h, w = x.shape[-2:]\n i = torch.arange(w, device=x.device)\n j = torch.arange(h, device=x.device)\n x_emb = self.col_embed(i)\n y_emb = self.row_embed(j)\n pos = torch.cat([\n x_emb.unsqueeze(0).repeat(h, 1, 1),\n y_emb.unsqueeze(1).repeat(1, w, 1),\n ], dim=-1).permute(2, 0, 1).unsqueeze(0).repeat(x.shape[0], 1, 1, 1)\n return pos\ndef build_position_encoding(args):\n N_steps = args.hidden_dim // 2\n if args.position_embedding in ('v2', 'sine'):\n # TODO find a better way of exposing other arguments\n position_embedding = PositionEmbeddingSine(N_steps, normalize=True)"
+ },
+ {
+ "comment": "This code snippet checks the value of 'args.position_embedding' and if it is set to either 'v3' or 'learned', it creates a PositionEmbeddingLearned object. If the input is neither of these, it raises a ValueError with an error message. Finally, it returns the created position embedding object.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/position_encoding.py\":87-92",
+ "content": " elif args.position_embedding in ('v3', 'learned'):\n position_embedding = PositionEmbeddingLearned(N_steps)\n else:\n raise ValueError(f\"not supported {args.position_embedding}\")\n return position_embedding"
+ }
+ ]
+}
\ No newline at end of file
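As a reference for the sine/cosine scheme summarized above, here is a 1-D analogue of PositionEmbeddingSine; the repo's version works on 2-D image grids and supports normalization, and the sequence length and model width below are arbitrary choices:

    import torch

    def sinusoidal_positions(seq_len, d_model, temperature=10000.0):
        position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
        dim_t = torch.arange(d_model, dtype=torch.float32)
        dim_t = temperature ** (2 * (dim_t // 2) / d_model)                  # paired frequencies
        angles = position / dim_t                                            # (seq_len, d_model)
        pos = torch.zeros(seq_len, d_model)
        pos[:, 0::2] = angles[:, 0::2].sin()    # even channels: sine
        pos[:, 1::2] = angles[:, 1::2].cos()    # odd channels: cosine
        return pos

    pe = sinusoidal_positions(50, 256)          # (50, 256), added to the token embeddings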
diff --git a/docs/doc/53e32bad-9a45-4da6-a0cf-0886ddb2d0f8.json b/docs/doc/53e32bad-9a45-4da6-a0cf-0886ddb2d0f8.json
new file mode 100644
index 00000000..b29af775
--- /dev/null
+++ b/docs/doc/53e32bad-9a45-4da6-a0cf-0886ddb2d0f8.json
@@ -0,0 +1,75 @@
+{
+ "summary": "This code defines a DETRVAE model for image object detection, using deep learning architecture and presents a CVAE-DETR model that generates latent inputs. The transformer-based model predicts actions and latent variables using PyTorch.",
+ "details": [
+ {
+ "comment": "This code defines the DETRVAE model and its associated functions. It uses modules like `torch`, `nn`, and `TransformerEncoder` to build a deep learning architecture for detecting objects in images. The `reparametrize` function is used for reparameterization trick, while `get_sinusoid_encoding_table` generates sinusoid encodings for positional encoding. The class `DETRVAE` is the main model implementation.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":0-34",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nDETR model and criterion classes.\n\"\"\"\nimport torch\nfrom torch import nn\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nfrom .backbone import build_backbone\nfrom .transformer import build_transformer, TransformerEncoder, TransformerEncoderLayer\nimport numpy as np\nimport IPython\ne = IPython.embed\ndef reparametrize(mu, logvar):\n std = logvar.div(2).exp()\n eps = Variable(std.data.new(std.size()).normal_())\n return mu + std * eps\ndef get_sinusoid_encoding_table(n_position, d_hid):\n def get_position_angle_vec(position):\n return [position / np.power(10000, 2 * (hid_j // 2) / d_hid) for hid_j in range(d_hid)]\n sinusoid_table = np.array([get_position_angle_vec(pos_i) for pos_i in range(n_position)])\n sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i\n sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1\n return torch.FloatTensor(sinusoid_table).unsqueeze(0)\nclass DETRVAE(nn.Module):"
+ },
+ {
+ "comment": "The code defines a class called `DETR` for object detection. It takes in backbone, transformer, encoder, state_dim, num_queries, camera_names, vq, vq_class, and vq_dim as parameters to initialize the model. The `num_queries` represents the maximal number of objects that DETR can detect in a single image, and auxiliary decoding losses are optional.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":35-51",
+ "content": " \"\"\" This is the DETR module that performs object detection \"\"\"\n def __init__(self, backbones, transformer, encoder, state_dim, num_queries, camera_names, vq, vq_class, vq_dim, action_dim):\n \"\"\" Initializes the model.\n Parameters:\n backbones: torch module of the backbone to be used. See backbone.py\n transformer: torch module of the transformer architecture. See transformer.py\n state_dim: robot state dimension of the environment\n num_queries: number of object queries, ie detection slot. This is the maximal number of objects\n DETR can detect in a single image. For COCO, we recommend 100 queries.\n aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.\n \"\"\"\n super().__init__()\n self.num_queries = num_queries\n self.camera_names = camera_names\n self.transformer = transformer\n self.encoder = encoder\n self.vq, self.vq_class, self.vq_dim = vq, vq_class, vq_dim"
+ },
+ {
+ "comment": "The code initializes the DETR-VAE model by setting state and action dimensions, defining linear layers for action and pad heads, an embedding layer for queries, and additional layers based on whether backbones are provided or not. If no backbones are provided, it adds separate layers for robot state and environment state projections, a position embedding, and sets the backbones to None. It also sets the latent dimension of the latent z variable to 32 (to be tuned) and adds an extra cls token embedding.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":52-70",
+ "content": " self.state_dim, self.action_dim = state_dim, action_dim\n hidden_dim = transformer.d_model\n self.action_head = nn.Linear(hidden_dim, action_dim)\n self.is_pad_head = nn.Linear(hidden_dim, 1)\n self.query_embed = nn.Embedding(num_queries, hidden_dim)\n if backbones is not None:\n self.input_proj = nn.Conv2d(backbones[0].num_channels, hidden_dim, kernel_size=1)\n self.backbones = nn.ModuleList(backbones)\n self.input_proj_robot_state = nn.Linear(state_dim, hidden_dim)\n else:\n # input_dim = 14 + 7 # robot_state + env_state\n self.input_proj_robot_state = nn.Linear(state_dim, hidden_dim)\n self.input_proj_env_state = nn.Linear(7, hidden_dim)\n self.pos = torch.nn.Embedding(2, hidden_dim)\n self.backbones = None\n # encoder extra parameters\n self.latent_dim = 32 # final size of latent z # TODO tune\n self.cls_embed = nn.Embedding(1, hidden_dim) # extra cls token embedding"
+ },
+ {
+ "comment": "The code initializes the layers for a variational autoencoder (VAE) in DETR model. It includes linear layers to project actions and qpos to embedding, VQ-VAE specific latent projection, and decoder parameters such as latent out projection and learned position embeddings for proprio and latent. The encode function takes qpos, actions, is_pad, and vq_sample as inputs.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":71-89",
+ "content": " self.encoder_action_proj = nn.Linear(action_dim, hidden_dim) # project action to embedding\n self.encoder_joint_proj = nn.Linear(state_dim, hidden_dim) # project qpos to embedding\n print(f'Use VQ: {self.vq}, {self.vq_class}, {self.vq_dim}')\n if self.vq:\n self.latent_proj = nn.Linear(hidden_dim, self.vq_class * self.vq_dim)\n else:\n self.latent_proj = nn.Linear(hidden_dim, self.latent_dim*2) # project hidden state to latent std, var\n self.register_buffer('pos_table', get_sinusoid_encoding_table(1+1+num_queries, hidden_dim)) # [CLS], qpos, a_seq\n # decoder extra parameters\n if self.vq:\n self.latent_out_proj = nn.Linear(self.vq_class * self.vq_dim, hidden_dim)\n else:\n self.latent_out_proj = nn.Linear(self.latent_dim, hidden_dim) # project latent sample to embedding\n self.additional_pos_embed = nn.Embedding(2, hidden_dim) # learned position embedding for proprio and latent\n def encode(self, qpos, actions=None, is_pad=None, vq_sample=None):"
+ },
+ {
+ "comment": "This code is part of a CVAE (Conditional Variational Autoencoder) model. It obtains the latent variable z from an action sequence and a query position during training. The encoder projects the action sequence to an embedding dimension and concatenates it with a query position embedding and a fixed CLs token embedding. These inputs are then passed to the encoder to get the latent representation.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":90-106",
+ "content": " bs, _ = qpos.shape\n if self.encoder is None:\n latent_sample = torch.zeros([bs, self.latent_dim], dtype=torch.float32).to(qpos.device)\n latent_input = self.latent_out_proj(latent_sample)\n probs = binaries = mu = logvar = None\n else:\n # cvae encoder\n is_training = actions is not None # train or val\n ### Obtain latent z from action sequence\n if is_training:\n # project action sequence to embedding dim, and concat with a CLS token\n action_embed = self.encoder_action_proj(actions) # (bs, seq, hidden_dim)\n qpos_embed = self.encoder_joint_proj(qpos) # (bs, hidden_dim)\n qpos_embed = torch.unsqueeze(qpos_embed, axis=1) # (bs, 1, hidden_dim)\n cls_embed = self.cls_embed.weight # (1, hidden_dim)\n cls_embed = torch.unsqueeze(cls_embed, axis=0).repeat(bs, 1, 1) # (bs, 1, hidden_dim)\n encoder_input = torch.cat([cls_embed, qpos_embed, action_embed], axis=1) # (bs, seq+1, hidden_dim)"
+ },
+ {
+ "comment": "This code snippet is part of a DETR model, specifically the VAE (Variational Autoencoder) implementation. Here, it prepares the input for the encoder and then passes it through the encoder to obtain an encoded representation (latent_info). This encoding is used for the VQ-VAE loss (if enabled), where a one-hot binary encoding of the latents is used to learn a codebook.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":107-122",
+ "content": " encoder_input = encoder_input.permute(1, 0, 2) # (seq+1, bs, hidden_dim)\n # do not mask cls token\n cls_joint_is_pad = torch.full((bs, 2), False).to(qpos.device) # False: not a padding\n is_pad = torch.cat([cls_joint_is_pad, is_pad], axis=1) # (bs, seq+1)\n # obtain position embedding\n pos_embed = self.pos_table.clone().detach()\n pos_embed = pos_embed.permute(1, 0, 2) # (seq+1, 1, hidden_dim)\n # query model\n encoder_output = self.encoder(encoder_input, pos=pos_embed, src_key_padding_mask=is_pad)\n encoder_output = encoder_output[0] # take cls output only\n latent_info = self.latent_proj(encoder_output)\n if self.vq:\n logits = latent_info.reshape([*latent_info.shape[:-1], self.vq_class, self.vq_dim])\n probs = torch.softmax(logits, dim=-1)\n binaries = F.one_hot(torch.mult"
+ },
+ {
+ "comment": "This code is for a Variational Autoencoder (VAE) model, specifically the DETR-VAE. It calculates the latent input based on whether or not the model is in VQ-VAE mode. If it is, it computes binaries and probs, subtracts them, passes through the latent projection layer, and assigns them to mu and logvar as None. If not, it uses either the provided vq_sample (if available) or calculates the latent input using the latent projection layer if VQ mode is disabled.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":122-140",
+ "content": "inomial(probs.view(-1, self.vq_dim), 1).squeeze(-1), self.vq_dim).view(-1, self.vq_class, self.vq_dim).float()\n binaries_flat = binaries.view(-1, self.vq_class * self.vq_dim)\n probs_flat = probs.view(-1, self.vq_class * self.vq_dim)\n straigt_through = binaries_flat - probs_flat.detach() + probs_flat\n latent_input = self.latent_out_proj(straigt_through)\n mu = logvar = None\n else:\n probs = binaries = None\n mu = latent_info[:, :self.latent_dim]\n logvar = latent_info[:, self.latent_dim:]\n latent_sample = reparametrize(mu, logvar)\n latent_input = self.latent_out_proj(latent_sample)\n else:\n mu = logvar = binaries = probs = None\n if self.vq:\n latent_input = self.latent_out_proj(vq_sample.view(-1, self.vq_class * self.vq_dim))\n else:\n "
+ },
+ {
+ "comment": "This code snippet defines a method for creating latent samples, initializing variables, and performing encoding using a VAE (Variational AutoEncoder). The forward function takes in inputs like qpos, image, env_state, actions, is_pad, and vq_sample. It encodes the input using the encode method and then applies the CVAE decoder if backbones are provided.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":140-162",
+ "content": " latent_sample = torch.zeros([bs, self.latent_dim], dtype=torch.float32).to(qpos.device)\n latent_input = self.latent_out_proj(latent_sample)\n return latent_input, probs, binaries, mu, logvar\n def forward(self, qpos, image, env_state, actions=None, is_pad=None, vq_sample=None):\n \"\"\"\n qpos: batch, qpos_dim\n image: batch, num_cam, channel, height, width\n env_state: None\n actions: batch, seq, action_dim\n \"\"\"\n latent_input, probs, binaries, mu, logvar = self.encode(qpos, actions, is_pad, vq_sample)\n # cvae decoder\n if self.backbones is not None:\n # Image observation features and position embeddings\n all_cam_features = []\n all_cam_pos = []\n for cam_id, cam_name in enumerate(self.camera_names):\n features, pos = self.backbones[cam_id](image[:, cam_id])\n features = features[0] # take the last layer feature\n pos = pos[0]"
+ },
+ {
+ "comment": "This code defines a model for predicting actions and latent variables. It includes a transformer network, action head, and is pad head. The input includes camera features, proprioception features, robot state, and environment state. The model handles both scenarios with or without cameras. The CNNMLP class initializes the model using backbones, state_dim, and camera names.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":163-183",
+ "content": " all_cam_features.append(self.input_proj(features))\n all_cam_pos.append(pos)\n # proprioception features\n proprio_input = self.input_proj_robot_state(qpos)\n # fold camera dimension into width dimension\n src = torch.cat(all_cam_features, axis=3)\n pos = torch.cat(all_cam_pos, axis=3)\n hs = self.transformer(src, None, self.query_embed.weight, pos, latent_input, proprio_input, self.additional_pos_embed.weight)[0]\n else:\n qpos = self.input_proj_robot_state(qpos)\n env_state = self.input_proj_env_state(env_state)\n transformer_input = torch.cat([qpos, env_state], axis=1) # seq length = 2\n hs = self.transformer(transformer_input, None, self.query_embed.weight, self.pos.weight)[0]\n a_hat = self.action_head(hs)\n is_pad_hat = self.is_pad_head(hs)\n return a_hat, is_pad_hat, [mu, logvar], probs, binaries\nclass CNNMLP(nn.Module):\n def __init__(self, backbones, state_dim, camera_names):"
+ },
+ {
+ "comment": "This code initializes the model and takes parameters for backbones, transformer, state_dim, num_queries, and aux_loss. It creates an action head using a linear layer with 1000 input size and state_dim output size. If backbones are provided, it creates a ModuleList of backbones and initializes down_proj for each backbone using conv2d with specified parameters.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":184-201",
+ "content": " \"\"\" Initializes the model.\n Parameters:\n backbones: torch module of the backbone to be used. See backbone.py\n transformer: torch module of the transformer architecture. See transformer.py\n state_dim: robot state dimension of the environment\n num_queries: number of object queries, ie detection slot. This is the maximal number of objects\n DETR can detect in a single image. For COCO, we recommend 100 queries.\n aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.\n \"\"\"\n super().__init__()\n self.camera_names = camera_names\n self.action_head = nn.Linear(1000, state_dim) # TODO add more\n if backbones is not None:\n self.backbones = nn.ModuleList(backbones)\n backbone_down_projs = []\n for backbone in backbones:\n down_proj = nn.Sequential(\n nn.Conv2d(backbone.num_channels, 128, kernel_size=5),"
+ },
+ {
+ "comment": "This code is for a DETR model in PyTorch. It defines the architecture and forward pass. The backbone network consists of two convolutions to downsample the input, followed by a mlp layer if needed. The forward method takes in qpos, image, env_state (None in this case), and optionally actions for training or validation. It extracts image features from each camera view using backbones, concatenates them, and performs positional encoding.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":202-226",
+ "content": " nn.Conv2d(128, 64, kernel_size=5),\n nn.Conv2d(64, 32, kernel_size=5)\n )\n backbone_down_projs.append(down_proj)\n self.backbone_down_projs = nn.ModuleList(backbone_down_projs)\n mlp_in_dim = 768 * len(backbones) + state_dim\n self.mlp = mlp(input_dim=mlp_in_dim, hidden_dim=1024, output_dim=self.action_dim, hidden_depth=2)\n else:\n raise NotImplementedError\n def forward(self, qpos, image, env_state, actions=None):\n \"\"\"\n qpos: batch, qpos_dim\n image: batch, num_cam, channel, height, width\n env_state: None\n actions: batch, seq, action_dim\n \"\"\"\n is_training = actions is not None # train or val\n bs, _ = qpos.shape\n # Image observation features and position embeddings\n all_cam_features = []\n for cam_id, cam_name in enumerate(self.camera_names):\n features, pos = self.backbones[cam_id](image[:, cam_id])\n features = features[0] # take the last layer feature"
+ },
+ {
+ "comment": "This code defines a DETR VAE model, including functions for building the encoder and creating an MLP. The encoder takes input features and positions (qpos) to create a flattened feature matrix, which is then passed through an MLP to produce the final output (a_hat).",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":227-253",
+ "content": " pos = pos[0] # not used\n all_cam_features.append(self.backbone_down_projs[cam_id](features))\n # flatten everything\n flattened_features = []\n for cam_feature in all_cam_features:\n flattened_features.append(cam_feature.reshape([bs, -1]))\n flattened_features = torch.cat(flattened_features, axis=1) # 768 each\n features = torch.cat([flattened_features, qpos], axis=1) # qpos: 14\n a_hat = self.mlp(features)\n return a_hat\ndef mlp(input_dim, hidden_dim, output_dim, hidden_depth):\n if hidden_depth == 0:\n mods = [nn.Linear(input_dim, output_dim)]\n else:\n mods = [nn.Linear(input_dim, hidden_dim), nn.ReLU(inplace=True)]\n for i in range(hidden_depth - 1):\n mods += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True)]\n mods.append(nn.Linear(hidden_dim, output_dim))\n trunk = nn.Sequential(*mods)\n return trunk\ndef build_encoder(args):\n d_model = args.hidden_dim # 256\n dropout = args.dropout # 0.1"
+ },
+ {
+ "comment": "This code builds a DETRVAE model by defining its components and parameters. It initializes the transformer encoder, decoder, and VAE components based on provided arguments. The backbone for image processing is built using a function call to build_backbone(args). If no encoder is required, it sets the encoder as None.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":254-288",
+ "content": " nhead = args.nheads # 8\n dim_feedforward = args.dim_feedforward # 2048\n num_encoder_layers = args.enc_layers # 4 # TODO shared with VAE decoder\n normalize_before = args.pre_norm # False\n activation = \"relu\"\n encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward,\n dropout, activation, normalize_before)\n encoder_norm = nn.LayerNorm(d_model) if normalize_before else None\n encoder = TransformerEncoder(encoder_layer, num_encoder_layers, encoder_norm)\n return encoder\ndef build(args):\n state_dim = 14 # TODO hardcode\n # From state\n # backbone = None # from state for now, no need for conv nets\n # From image\n backbones = []\n for _ in args.camera_names:\n backbone = build_backbone(args)\n backbones.append(backbone)\n transformer = build_transformer(args)\n if args.no_encoder:\n encoder = None\n else:\n encoder = build_transformer(args)\n model = DETRVAE(\n backbones,\n transformer,"
+ },
+ {
+ "comment": "This code defines two functions, `detr_vae` and `build_cnnmlp`, which build different models. Both functions return a model object after printing the number of parameters it has. The `detr_vae` function requires additional arguments like `state_dim`, `num_queries`, `camera_names`, `vq`, `vq_class`, and `action_dim`. The `build_cnnmlp` function requires an `args` argument, which it uses to create a CNNMLP model by building backbones for each camera name provided in the arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/detr_vae.py\":289-324",
+ "content": " encoder,\n state_dim=state_dim,\n num_queries=args.num_queries,\n camera_names=args.camera_names,\n vq=args.vq,\n vq_class=args.vq_class,\n vq_dim=args.vq_dim,\n action_dim=args.action_dim,\n )\n n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)\n print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n return model\ndef build_cnnmlp(args):\n state_dim = 14 # TODO hardcode\n # From state\n # backbone = None # from state for now, no need for conv nets\n # From image\n backbones = []\n for _ in args.camera_names:\n backbone = build_backbone(args)\n backbones.append(backbone)\n model = CNNMLP(\n backbones,\n state_dim=state_dim,\n camera_names=args.camera_names,\n )\n n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)\n print(\"number of parameters: %.2fM\" % (n_parameters/1e6,))\n return model"
+ }
+ ]
+}
\ No newline at end of file
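
For readers skimming these generated notes, here is a minimal, self-contained sketch of the mlp() helper documented above and of how the CNNMLP head consumes flattened camera features plus qpos. The dimensions below (two cameras, a 16-dimensional action) are illustrative assumptions, not values taken from the repo's configuration.

```python
import torch
import torch.nn as nn

def mlp(input_dim, hidden_dim, output_dim, hidden_depth):
    # hidden_depth == 0 degenerates to a single Linear layer
    if hidden_depth == 0:
        mods = [nn.Linear(input_dim, output_dim)]
    else:
        mods = [nn.Linear(input_dim, hidden_dim), nn.ReLU(inplace=True)]
        for _ in range(hidden_depth - 1):
            mods += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True)]
        mods.append(nn.Linear(hidden_dim, output_dim))
    return nn.Sequential(*mods)

# Example: 768 flattened features per camera, 2 cameras, 14-dim qpos, 16-dim action (all assumed)
trunk = mlp(input_dim=768 * 2 + 14, hidden_dim=1024, output_dim=16, hidden_depth=2)
features = torch.randn(8, 768 * 2 + 14)   # batch of 8 concatenated feature/qpos vectors
a_hat = trunk(features)
print(a_hat.shape)                         # torch.Size([8, 16])
```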
diff --git a/docs/doc/57309c6a-3e99-4ff0-b6b5-25ccaeab58df.json b/docs/doc/57309c6a-3e99-4ff0-b6b5-25ccaeab58df.json
new file mode 100644
index 00000000..d3d0b135
--- /dev/null
+++ b/docs/doc/57309c6a-3e99-4ff0-b6b5-25ccaeab58df.json
@@ -0,0 +1,10 @@
+{
+ "summary": "This is the copyright statement for the codebase, indicating that Facebook and its affiliates hold the rights to this code.",
+ "details": [
+ {
+ "comment": "This is the copyright statement for the codebase, indicating that Facebook and its affiliates hold the rights to this code.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/__init__.py\":0-0",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/5d29d82d-0f38-4492-81e6-8b9fc999c5ea.json b/docs/doc/5d29d82d-0f38-4492-81e6-8b9fc999c5ea.json
new file mode 100644
index 00000000..cb5a2582
--- /dev/null
+++ b/docs/doc/5d29d82d-0f38-4492-81e6-8b9fc999c5ea.json
@@ -0,0 +1,20 @@
+{
+ "summary": "Latent_Model_Transformer extends nn.Module, uses self-attention for latent space sequence modeling, has configurable input/output dimensions and sequence length, defaulting to 256 latent dimension, 8 heads, and 3 layers. The class has 'forward' and 'generate' methods for generating new samples by iteratively sampling from the output of the forward pass using temperature-scaled softmax and one-hot encoding.",
+ "details": [
+ {
+ "comment": "Causal Transformer block: LayerNormalization, MultiHeadAttention with dropout, and MLP sequential layers.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/latent_model.py\":0-27",
+ "content": "import torch.nn as nn\nfrom torch.nn import functional as F\nimport torch\nDROPOUT_RATE = 0.1\n# a causal transformer block\nclass Causal_Transformer_Block(nn.Module):\n def __init__(self, seq_len, latent_dim, num_head) -> None:\n super().__init__()\n self.num_head = num_head\n self.latent_dim = latent_dim\n self.ln_1 = nn.LayerNorm(latent_dim)\n self.attn = nn.MultiheadAttention(latent_dim, num_head, dropout=DROPOUT_RATE, batch_first=True)\n self.ln_2 = nn.LayerNorm(latent_dim)\n self.mlp = nn.Sequential(\n nn.Linear(latent_dim, 4 * latent_dim),\n nn.GELU(),\n nn.Linear(4 * latent_dim, latent_dim),\n nn.Dropout(DROPOUT_RATE),\n )\n # self.register_buffer(\"attn_mask\", torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool())\n def forward(self, x):\n attn_mask = torch.triu(torch.ones(x.shape[1], x.shape[1], device=x.device, dtype=torch.bool), diagonal=1)\n x = self.ln_1(x)\n x = x + self.attn(x, x, x, attn_mask=attn_mask)[0]"
+ },
+ {
+ "comment": "In \"act-plus-plus/detr/models/latent_model.py\", lines 28-54 define a class Latent_Model_Transformer that extends nn.Module. This model uses self-attention instead of RNN to model the latent space sequence. It takes an input dimension, output dimension, sequence length, latent dimension (default 256), number of heads (default 8), and number of layers (default 3). The forward method applies an input layer, adds positional embedding, passes through a series of causal transformer blocks, and finally outputs through an output layer.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/latent_model.py\":28-54",
+ "content": " x = self.ln_2(x)\n x = x + self.mlp(x)\n return x\n# use self-attention instead of RNN to model the latent space sequence\nclass Latent_Model_Transformer(nn.Module):\n def __init__(self, input_dim, output_dim, seq_len, latent_dim=256, num_head=8, num_layer=3) -> None:\n super().__init__()\n self.input_dim = input_dim\n self.output_dim = output_dim\n self.seq_len = seq_len\n self.latent_dim = latent_dim\n self.num_head = num_head\n self.num_layer = num_layer\n self.input_layer = nn.Linear(input_dim, latent_dim)\n self.weight_pos_embed = nn.Embedding(seq_len, latent_dim)\n self.attention_blocks = nn.Sequential(\n nn.Dropout(DROPOUT_RATE),\n *[Causal_Transformer_Block(seq_len, latent_dim, num_head) for _ in range(num_layer)],\n nn.LayerNorm(latent_dim)\n )\n self.output_layer = nn.Linear(latent_dim, output_dim)\n def forward(self, x):\n x = self.input_layer(x)\n x = x + self.weight_pos_embed(torch.arange(x.shape[1], device=x.device))"
+ },
+ {
+ "comment": "This code defines a class with two methods: 'forward' and 'generate'. The 'forward' method applies attention blocks to the input, then passes it through an output layer to produce logits. The 'generate' method generates new samples by iteratively sampling from the output of the forward pass using temperature-scaled softmax and one-hot encoding. The generated samples are appended to the original input and returned after trimming unnecessary rows.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/latent_model.py\":55-71",
+ "content": " x = self.attention_blocks(x)\n logits = self.output_layer(x)\n return logits\n @torch.no_grad()\n def generate(self, n, temperature=0.1, x=None):\n if x is None:\n x = torch.zeros((n, 1, self.input_dim), device=self.weight_pos_embed.weight.device)\n for i in range(self.seq_len):\n logits = self.forward(x)[:, -1]\n probs = torch.softmax(logits / temperature, dim=-1)\n samples = torch.multinomial(probs, num_samples=1)[..., 0]\n samples_one_hot = F.one_hot(samples.long(), num_classes=self.output_dim).float()\n x = torch.cat([x, samples_one_hot[:, None, :]], dim=1)\n return x[:, 1:, :]"
+ }
+ ]
+}
\ No newline at end of file
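
A minimal sketch of the sampling step that the generate method above performs at each position: temperature-scaled softmax over the logits, a multinomial draw, and one-hot encoding of the drawn class. The batch size and number of latent classes below are toy assumptions, not the repo's settings.

```python
import torch
import torch.nn.functional as F

def sample_one_step(logits, temperature=0.1, num_classes=32):
    probs = torch.softmax(logits / temperature, dim=-1)        # low temperature sharpens the distribution
    samples = torch.multinomial(probs, num_samples=1)[..., 0]  # one class index per batch element
    return F.one_hot(samples.long(), num_classes=num_classes).float()

logits = torch.randn(4, 32)     # batch of 4, 32 latent classes (toy values)
one_hot = sample_one_step(logits)
print(one_hot.shape)            # torch.Size([4, 32]); fed back as the next input token
```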
diff --git a/docs/doc/6c1a572f-cac7-42f1-afc2-8194a105a46f.json b/docs/doc/6c1a572f-cac7-42f1-afc2-8194a105a46f.json
new file mode 100644
index 00000000..fa9bdddc
--- /dev/null
+++ b/docs/doc/6c1a572f-cac7-42f1-afc2-8194a105a46f.json
@@ -0,0 +1,70 @@
+{
+ "summary": "This code defines a bi-manual manipulation environment, sets up action and observation spaces for cube transfer tasks, initializes physics simulation, and enables interactive plotting. It determines rewards based on contact and gripper positions.",
+ "details": [
+ {
+ "comment": "The code imports necessary libraries and defines a function called make_sim_env, which creates a simulation environment for robot bi-manual manipulation with joint position control. The action space consists of left arm joint positions, left gripper position (normalized), right arm joint positions, and right gripper position (normalized).",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":0-25",
+ "content": "import numpy as np\nimport os\nimport collections\nimport matplotlib.pyplot as plt\nfrom dm_control import mujoco\nfrom dm_control.rl import control\nfrom dm_control.suite import base\nfrom constants import DT, XML_DIR, START_ARM_POSE\nfrom constants import PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN\nfrom constants import MASTER_GRIPPER_POSITION_NORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN\nimport IPython\ne = IPython.embed\nBOX_POSE = [None] # to be changed from outside\ndef make_sim_env(task_name):\n \"\"\"\n Environment for simulated robot bi-manual manipulation, with joint position control\n Action space: [left_arm_qpos (6), # absolute joint position\n left_gripper_positions (1), # normalized gripper position (0: close, 1: open)\n right_arm_qpos (6), # absolute joint position\n right_gripper_positions (1),] # normalized gripper position (0: close, 1: open)"
+ },
+ {
+ "comment": "This code defines the observation space for a simulation environment, including joint positions and velocities of both arms and gripper states, along with image input from a camera. It is specific to tasks involving transferring cubes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":27-37",
+ "content": " Observation space: {\"qpos\": Concat[ left_arm_qpos (6), # absolute joint position\n left_gripper_position (1), # normalized gripper position (0: close, 1: open)\n right_arm_qpos (6), # absolute joint position\n right_gripper_qpos (1)] # normalized gripper position (0: close, 1: open)\n \"qvel\": Concat[ left_arm_qvel (6), # absolute joint velocity (rad)\n left_gripper_velocity (1), # normalized gripper velocity (pos: opening, neg: closing)\n right_arm_qvel (6), # absolute joint velocity (rad)\n right_gripper_qvel (1)] # normalized gripper velocity (pos: opening, neg: closing)\n \"images\": {\"main\": (480x640x3)} # h, w, c, dtype='uint8'\n \"\"\"\n if 'sim_transfer_cube' in task_name:"
+ },
+ {
+ "comment": "The code sets up a bimanual ViperX environment with either a cube transfer or an insertion task. It first defines the XML path for the environment, then initializes physics from the path and a specific task (TransferCubeTask or InsertionTask) depending on the task_name. The Environment class is instantiated with these parameters, including time limit and control timestep. Finally, it returns the environment and initializes BimanualViperXTask which extends base.Task.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":38-60",
+ "content": " xml_path = os.path.join(XML_DIR, f'bimanual_viperx_transfer_cube.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = TransferCubeTask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n elif 'sim_insertion' in task_name:\n xml_path = os.path.join(XML_DIR, f'bimanual_viperx_insertion.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = InsertionTask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n else:\n raise NotImplementedError\n return env\nclass BimanualViperXTask(base.Task):\n def __init__(self, random=None):\n super().__init__(random=random)\n def before_step(self, action, physics):\n left_arm_action = action[:6]\n right_arm_action = action[7:7+6]\n normalized_left_gripper_action = action[6]"
+ },
+ {
+ "comment": "This code initializes the environment for each episode and before each step, performing actions on a puppet using gripper positions that are first normalized then unnormalized. The actions involve left and right arm movements as well as full gripper actions. It also retrieves the state of the environment using qpos from physics data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":61-83",
+ "content": " normalized_right_gripper_action = action[7+6]\n left_gripper_action = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(normalized_left_gripper_action)\n right_gripper_action = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(normalized_right_gripper_action)\n full_left_gripper_action = [left_gripper_action, -left_gripper_action]\n full_right_gripper_action = [right_gripper_action, -right_gripper_action]\n env_action = np.concatenate([left_arm_action, full_left_gripper_action, right_arm_action, full_right_gripper_action])\n super().before_step(env_action, physics)\n return\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n super().initialize_episode(physics)\n @staticmethod\n def get_qpos(physics):\n qpos_raw = physics.data.qpos.copy()\n left_qpos_raw = qpos_raw[:8]\n right_qpos_raw = qpos_raw[8:16]\n left_arm_qpos = left_qpos_raw[:6]\n right_arm_qpos = right_qpos_raw[:6]"
+ },
+ {
+ "comment": "This code defines two methods, `get_qpos` and `get_qvel`, which extract the joint positions and velocities from physics data. The left and right gripper positions and velocities are normalized using `PUPPET_GRIPPER_POSITION_NORMALIZE_FN` and `PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN`. These values are then concatenated into observation arrays 'qpos' and 'qvel', which will be used for the environment state. The `get_env_state` method is not implemented yet, while `get_observation` combines the qpos and qvel observations in an ordered dictionary.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":84-106",
+ "content": " left_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(left_qpos_raw[6])]\n right_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(right_qpos_raw[6])]\n return np.concatenate([left_arm_qpos, left_gripper_qpos, right_arm_qpos, right_gripper_qpos])\n @staticmethod\n def get_qvel(physics):\n qvel_raw = physics.data.qvel.copy()\n left_qvel_raw = qvel_raw[:8]\n right_qvel_raw = qvel_raw[8:16]\n left_arm_qvel = left_qvel_raw[:6]\n right_arm_qvel = right_qvel_raw[:6]\n left_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(left_qvel_raw[6])]\n right_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(right_qvel_raw[6])]\n return np.concatenate([left_arm_qvel, left_gripper_qvel, right_arm_qvel, right_gripper_qvel])\n @staticmethod\n def get_env_state(physics):\n raise NotImplementedError\n def get_observation(self, physics):\n obs = collections.OrderedDict()\n obs['qpos'] = self.get_qpos(physics)\n obs['qvel'] = self.get_qvel(physics)"
+ },
+ {
+ "comment": "This code is defining a class `SimEnv` which returns the observation and reward in a bimanual task. It also includes methods for getting the environment state and calculating rewards based on left gripper holding the box. The `TransferCubeTask` inherits from `BimanualViperXTask` and initializes the environment at the start of each episode, with a maximum reward set to 4.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":107-129",
+ "content": " obs['env_state'] = self.get_env_state(physics)\n obs['images'] = dict()\n obs['images']['top'] = physics.render(height=480, width=640, camera_id='top')\n obs['images']['left_wrist'] = physics.render(height=480, width=640, camera_id='left_wrist')\n obs['images']['right_wrist'] = physics.render(height=480, width=640, camera_id='right_wrist')\n # obs['images']['angle'] = physics.render(height=480, width=640, camera_id='angle')\n # obs['images']['vis'] = physics.render(height=480, width=640, camera_id='front_close')\n return obs\n def get_reward(self, physics):\n # return whether left gripper is holding the box\n raise NotImplementedError\nclass TransferCubeTask(BimanualViperXTask):\n def __init__(self, random=None):\n super().__init__(random=random)\n self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n # TODO Notice: this function does not randomize the env configuration. Instead, set BOX_POSE from outside"
+ },
+ {
+ "comment": "Code resets the arm and box positions, gets the environment state by copying qpos values from 16th index onwards, and calculates the reward based on gripper contact with the box.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":130-153",
+ "content": " # reset qpos, control and box position\n with physics.reset_context():\n physics.named.data.qpos[:16] = START_ARM_POSE\n np.copyto(physics.data.ctrl, START_ARM_POSE)\n assert BOX_POSE[0] is not None\n physics.named.data.qpos[-7:] = BOX_POSE[0]\n # print(f\"{BOX_POSE=}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether left gripper is holding the box\n all_contact_pairs = []\n for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')\n contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)"
+ },
+ {
+ "comment": "This code snippet checks for the contact between different objects and assigns a reward based on those contacts. The 'InsertionTask' class initializes an episode by resetting the qpos, control, and box position. However, it currently does not randomize the environment configuration.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":155-179",
+ "content": " touch_left_gripper = (\"red_box\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n touch_right_gripper = (\"red_box\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs\n touch_table = (\"red_box\", \"table\") in all_contact_pairs\n reward = 0\n if touch_right_gripper:\n reward = 1\n if touch_right_gripper and not touch_table: # lifted\n reward = 2\n if touch_left_gripper: # attempted transfer\n reward = 3\n if touch_left_gripper and not touch_table: # successful transfer\n reward = 4\n return reward\nclass InsertionTask(BimanualViperXTask):\n def __init__(self, random=None):\n super().__init__(random=random)\n self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n # TODO Notice: this function does not randomize the env configuration. Instead, set BOX_POSE from outside\n # reset qpos, control and box position"
+ },
+ {
+ "comment": "This code is part of a physics simulation environment setup. It initializes the episode by setting up the arm and box positions, and then defines methods to get the environment state and reward based on contact between objects in the simulation.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":180-204",
+ "content": " with physics.reset_context():\n physics.named.data.qpos[:16] = START_ARM_POSE\n np.copyto(physics.data.ctrl, START_ARM_POSE)\n assert BOX_POSE[0] is not None\n physics.named.data.qpos[-7*2:] = BOX_POSE[0] # two objects\n # print(f\"{BOX_POSE=}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether peg touches the pin\n all_contact_pairs = []\n for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')\n contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)\n touch_right_gripper = (\"red_peg\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs"
+ },
+ {
+ "comment": "This code checks if the left gripper finger of vx300s_left is in contact with any of the four sockets, and also verifies if the red peg is touching the table, a socket, or itself. The purpose seems to be detecting specific object interactions in a simulated environment.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":205-217",
+ "content": " touch_left_gripper = (\"socket-1\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-2\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-3\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-4\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n peg_touch_table = (\"red_peg\", \"table\") in all_contact_pairs\n socket_touch_table = (\"socket-1\", \"table\") in all_contact_pairs or \\\n (\"socket-2\", \"table\") in all_contact_pairs or \\\n (\"socket-3\", \"table\") in all_contact_pairs or \\\n (\"socket-4\", \"table\") in all_contact_pairs\n peg_touch_socket = (\"red_peg\", \"socket-1\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-2\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-3\") in all_contact_pairs or \\"
+ },
+ {
+ "comment": "The code is defining a function to determine rewards based on contact between different objects and checking gripper positions. It also includes a function for generating action sequences, setting arm joint positions, and normalizing left gripper position.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":218-241",
+ "content": " (\"red_peg\", \"socket-4\") in all_contact_pairs\n pin_touched = (\"red_peg\", \"pin\") in all_contact_pairs\n reward = 0\n if touch_left_gripper and touch_right_gripper: # touch both\n reward = 1\n if touch_left_gripper and touch_right_gripper and (not peg_touch_table) and (not socket_touch_table): # grasp both\n reward = 2\n if peg_touch_socket and (not peg_touch_table) and (not socket_touch_table): # peg and socket touching\n reward = 3\n if pin_touched: # successful insertion\n reward = 4\n return reward\ndef get_action(master_bot_left, master_bot_right):\n action = np.zeros(14)\n # arm action\n action[:6] = master_bot_left.dxl.joint_states.position[:6]\n action[7:7+6] = master_bot_right.dxl.joint_states.position[:6]\n # gripper action\n left_gripper_pos = master_bot_left.dxl.joint_states.position[7]\n right_gripper_pos = master_bot_right.dxl.joint_states.position[7]\n normalized_left_pos = MASTER_GRIPPER_POSITION_NORMALIZE_FN(left_gripper_pos)"
+ },
+ {
+ "comment": "This code sets up a teleoperation test in the simulation environment using ALOHA and InterbotixManipulatorXS for left and right arms. It initializes the environment, resets it, and starts an episode by adding the first timestep to the episode list. It also sets up plotting for visualizing the simulation's observation images.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":242-265",
+ "content": " normalized_right_pos = MASTER_GRIPPER_POSITION_NORMALIZE_FN(right_gripper_pos)\n action[6] = normalized_left_pos\n action[7+6] = normalized_right_pos\n return action\ndef test_sim_teleop():\n \"\"\" Testing teleoperation in sim with ALOHA. Requires hardware and ALOHA repo to work. \"\"\"\n from interbotix_xs_modules.arm import InterbotixManipulatorXS\n BOX_POSE[0] = [0.2, 0.5, 0.05, 1, 0, 0, 0]\n # source of data\n master_bot_left = InterbotixManipulatorXS(robot_model=\"wx250s\", group_name=\"arm\", gripper_name=\"gripper\",\n robot_name=f'master_left', init_node=True)\n master_bot_right = InterbotixManipulatorXS(robot_model=\"wx250s\", group_name=\"arm\", gripper_name=\"gripper\",\n robot_name=f'master_right', init_node=False)\n # setup the environment\n env = make_sim_env('sim_transfer_cube')\n ts = env.reset()\n episode = [ts]\n # setup plotting\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images']['angle'])"
+ },
+ {
+ "comment": "This code enables interactive plotting of the simulation environment's observations and takes input actions for a specific number of time steps. It utilizes matplotlib's `ion()` function to enable interactive plotting, and then iterates through 1000 time steps, getting actions from `get_action` and updating the plot using `plt_img.set_data`. The `test_sim_teleop()` function is called when the script is run directly.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/sim_env.py\":266-278",
+ "content": " plt.ion()\n for t in range(1000):\n action = get_action(master_bot_left, master_bot_right)\n ts = env.step(action)\n episode.append(ts)\n plt_img.set_data(ts.observation['images']['angle'])\n plt.pause(0.02)\nif __name__ == '__main__':\n test_sim_teleop()"
+ }
+ ]
+}
\ No newline at end of file
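
The staged reward described above can be read as a ladder in which later stages overwrite earlier ones, so the returned value is the highest stage currently satisfied. A standalone sketch of the transfer-cube version (no MuJoCo required; the geom names are copied from the documented code):

```python
def transfer_cube_reward(contact_pairs):
    touch_right = ("red_box", "vx300s_right/10_right_gripper_finger") in contact_pairs
    touch_left = ("red_box", "vx300s_left/10_left_gripper_finger") in contact_pairs
    touch_table = ("red_box", "table") in contact_pairs

    reward = 0
    if touch_right:
        reward = 1                        # right gripper touches the box
    if touch_right and not touch_table:
        reward = 2                        # box lifted off the table
    if touch_left:
        reward = 3                        # transfer attempted
    if touch_left and not touch_table:
        reward = 4                        # successful transfer
    return reward

# Example: box held by the left gripper, off the table -> highest stage reached
print(transfer_cube_reward({("red_box", "vx300s_left/10_left_gripper_finger")}))  # 4
```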
diff --git a/docs/doc/77134bbb-c326-4338-86ba-f62497573dc1.json b/docs/doc/77134bbb-c326-4338-86ba-f62497573dc1.json
new file mode 100644
index 00000000..5f3ba75e
--- /dev/null
+++ b/docs/doc/77134bbb-c326-4338-86ba-f62497573dc1.json
@@ -0,0 +1,10 @@
+{
+ "summary": "The code imports the build functions for DETR-VAE and CNN+MLP models from their respective modules. It defines two model building functions, `build_ACT_model` and `build_CNNMLP_model`, which return the built models using the imported build functions based on given arguments.",
+ "details": [
+ {
+ "comment": "The code imports the build functions for DETR-VAE and CNN+MLP models from their respective modules. It defines two model building functions, `build_ACT_model` and `build_CNNMLP_model`, which return the built models using the imported build functions based on given arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/__init__.py\":0-8",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nfrom .detr_vae import build as build_vae\nfrom .detr_vae import build_cnnmlp as build_cnnmlp\ndef build_ACT_model(args):\n return build_vae(args)\ndef build_CNNMLP_model(args):\n return build_cnnmlp(args)"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/78c83283-e5ac-4d86-a89f-63356cf80d2d.json b/docs/doc/78c83283-e5ac-4d86-a89f-63356cf80d2d.json
new file mode 100644
index 00000000..721f1edd
--- /dev/null
+++ b/docs/doc/78c83283-e5ac-4d86-a89f-63356cf80d2d.json
@@ -0,0 +1,10 @@
+{
+ "summary": "This code snippet is modified from the DETR repository and licensed under Apache 2.0. It cites End-to-End Object Detection with Transformers paper as its reference.",
+ "details": [
+ {
+ "comment": "This code snippet is modified from the DETR repository and licensed under Apache 2.0. It cites End-to-End Object Detection with Transformers paper as its reference.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/README.md\":0-8",
+ "content": "This part of the codebase is modified from DETR https://github.com/facebookresearch/detr under APACHE 2.0.\n @article{Carion2020EndtoEndOD,\n title={End-to-End Object Detection with Transformers},\n author={Nicolas Carion and Francisco Massa and Gabriel Synnaeve and Nicolas Usunier and Alexander Kirillov and Sergey Zagoruyko},\n journal={ArXiv},\n year={2020},\n volume={abs/2005.12872}\n }"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/7e3bec04-4f19-4c39-81c1-25731840c4aa.json b/docs/doc/7e3bec04-4f19-4c39-81c1-25731840c4aa.json
new file mode 100644
index 00000000..94458055
--- /dev/null
+++ b/docs/doc/7e3bec04-4f19-4c39-81c1-25731840c4aa.json
@@ -0,0 +1,15 @@
+{
+ "summary": "The code imports libraries and defines a main function to replay an episode from an existing dataset, organizing images into videos. The 'save_videos' function is defined for command line arguments and executed if the script is run directly.",
+ "details": [
+ {
+ "comment": "The code imports necessary libraries and defines a main function to replay an episode from an existing dataset. It checks if the dataset file exists, then reads the actions and initializes the simulation environment. It performs the steps of the replayed episode by taking actions in the environment, appending states to the episode_replay list. The code then organizes images from each state into a dictionary for saving. Finally, it creates a video path with the modified name and saves the images as videos in that new path.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/replay_episodes.py\":0-40",
+ "content": "import os\nimport h5py\nimport argparse\nfrom collections import defaultdict \nfrom sim_env import make_sim_env\nfrom utils import sample_box_pose, sample_insertion_pose\nfrom sim_env import BOX_POSE\nfrom constants import DT\nfrom visualize_episodes import save_videos\nimport IPython\ne = IPython.embed\ndef main(args):\n dataset_path = args['dataset_path']\n if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n actions = root['/action'][()]\n env = make_sim_env('sim_transfer_cube')\n BOX_POSE[0] = sample_box_pose() # used in sim reset\n ts = env.reset()\n episode_replay = [ts]\n for action in actions:\n ts = env.step(action)\n episode_replay.append(ts)\n # saving\n image_dict = defaultdict(lambda: [])\n while episode_replay:\n ts = episode_replay.pop(0)\n for cam_name, image in ts.observation['images'].items():\n image_dict[cam_name].append(image)\n video_path = dataset_path.replace('episode_', 'replay_episode_').replace('hdf5', 'mp4')"
+ },
+ {
+ "comment": "The code defines a function \"save_videos\" and checks if the script is run directly. It sets up an ArgumentParser for command line arguments, including '--dataset_path'. Then it calls main with the parsed command line arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/replay_episodes.py\":41-47",
+ "content": " save_videos(image_dict, DT, video_path=video_path)\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_path', action='store', type=str, help='Dataset path.', required=True)\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/82b74cd8-88bf-4f3e-ab20-000c681d49d5.json b/docs/doc/82b74cd8-88bf-4f3e-ab20-000c681d49d5.json
new file mode 100644
index 00000000..482ea08b
--- /dev/null
+++ b/docs/doc/82b74cd8-88bf-4f3e-ab20-000c681d49d5.json
@@ -0,0 +1,80 @@
+{
+ "summary": "The \"SmoothedValue\" and MetricLogger classes log metrics, offer smoothing, and track progress updates with memory usage. The PyTorch NestedTensor class supports distributed training, ONNX tracing, image padding, and accuracy functions.",
+ "details": [
+ {
+ "comment": "The code is a Python class called \"SmoothedValue\" that tracks a series of values and provides access to smoothed values over a window or the global average. It uses a deque data structure with maximum length \"window_size\" for efficient storage, and keeps track of the total and count of values. The format string \"fmt\" determines how the smoothed value and global average are displayed.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":0-36",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nMisc functions, including distributed helpers.\nMostly copy-paste from torchvision references.\n\"\"\"\nimport os\nimport subprocess\nimport time\nfrom collections import defaultdict, deque\nimport datetime\nimport pickle\nfrom packaging import version\nfrom typing import Optional, List\nimport torch\nimport torch.distributed as dist\nfrom torch import Tensor\n# needed due to empty tensor bug in pytorch and torchvision 0.5\nimport torchvision\nif version.parse(torchvision.__version__) < version.parse('0.7'):\n from torchvision.ops import _new_empty_tensor\n from torchvision.ops.misc import _output_size\nclass SmoothedValue(object):\n \"\"\"Track a series of values and provide access to smoothed values over a\n window or the global series average.\n \"\"\"\n def __init__(self, window_size=20, fmt=None):\n if fmt is None:\n fmt = \"{median:.4f} ({global_avg:.4f})\"\n self.deque = deque(maxlen=window_size)\n self.total = 0.0\n self.count = 0"
+ },
+ {
+ "comment": "The code defines a class that tracks a deque (double-ended queue) and provides various properties such as median, average, maximum value, global average, and current value. It also allows updating the deque with values and synchronizing counts and totals across multiple processes using PyTorch's distributed functions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":37-80",
+ "content": " self.fmt = fmt\n def update(self, value, n=1):\n self.deque.append(value)\n self.count += n\n self.total += value * n\n def synchronize_between_processes(self):\n \"\"\"\n Warning: does not synchronize the deque!\n \"\"\"\n if not is_dist_avail_and_initialized():\n return\n t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda')\n dist.barrier()\n dist.all_reduce(t)\n t = t.tolist()\n self.count = int(t[0])\n self.total = t[1]\n @property\n def median(self):\n d = torch.tensor(list(self.deque))\n return d.median().item()\n @property\n def avg(self):\n d = torch.tensor(list(self.deque), dtype=torch.float32)\n return d.mean().item()\n @property\n def global_avg(self):\n return self.total / self.count\n @property\n def max(self):\n return max(self.deque)\n @property\n def value(self):\n return self.deque[-1]\n def __str__(self):\n return self.fmt.format("
+ },
+ {
+ "comment": "This function runs \"all_gather\" on any picklable data object, not necessarily tensors. It first checks if the world size is 1 and returns data if so. If not, it picks up the data, converts it into a byte tensor, gathers the local size of the tensor from each rank using all_gather, finds the maximum size among them, and finally performs an all_gather on the tensor while padding when necessary.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":81-113",
+ "content": " median=self.median,\n avg=self.avg,\n global_avg=self.global_avg,\n max=self.max,\n value=self.value)\ndef all_gather(data):\n \"\"\"\n Run all_gather on arbitrary picklable data (not necessarily tensors)\n Args:\n data: any picklable object\n Returns:\n list[data]: list of data gathered from each rank\n \"\"\"\n world_size = get_world_size()\n if world_size == 1:\n return [data]\n # serialized to a Tensor\n buffer = pickle.dumps(data)\n storage = torch.ByteStorage.from_buffer(buffer)\n tensor = torch.ByteTensor(storage).to(\"cuda\")\n # obtain Tensor size of each rank\n local_size = torch.tensor([tensor.numel()], device=\"cuda\")\n size_list = [torch.tensor([0], device=\"cuda\") for _ in range(world_size)]\n dist.all_gather(size_list, local_size)\n size_list = [int(size.item()) for size in size_list]\n max_size = max(size_list)\n # receiving Tensor from all ranks\n # we pad the tensor because torch all_gather does not support"
+ },
+ {
+ "comment": "This code snippet is responsible for gathering tensors of different shapes and reducing the values in a dictionary from all processes. It first creates empty tensors for various sizes, then gathers them using all-gather operation. Afterwards, it converts tensors to data and appends them to a list. The second function reduces the values in an input dictionary across multiple processes by averaging or summing them, based on the specified flag.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":114-142",
+ "content": " # gathering tensors of different shapes\n tensor_list = []\n for _ in size_list:\n tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device=\"cuda\"))\n if local_size != max_size:\n padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device=\"cuda\")\n tensor = torch.cat((tensor, padding), dim=0)\n dist.all_gather(tensor_list, tensor)\n data_list = []\n for size, tensor in zip(size_list, tensor_list):\n buffer = tensor.cpu().numpy().tobytes()[:size]\n data_list.append(pickle.loads(buffer))\n return data_list\ndef reduce_dict(input_dict, average=True):\n \"\"\"\n Args:\n input_dict (dict): all the values will be reduced\n average (bool): whether to do average or sum\n Reduce the values in the dictionary from all processes so that all processes\n have the averaged results. Returns a dict with the same fields as\n input_dict, after reduction.\n \"\"\"\n world_size = get_world_size()\n if world_size < 2:\n return input_dict"
+ },
+ {
+ "comment": "This code snippet defines a class MetricLogger which logs metrics such as average and sum. It also contains a function that averages values across processes, creating a reduced dictionary after performing all-reduce operation. This could be useful for distributed training where different processes need to communicate their results for aggregation and averaging purposes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":143-175",
+ "content": " with torch.no_grad():\n names = []\n values = []\n # sort the keys so that they are consistent across processes\n for k in sorted(input_dict.keys()):\n names.append(k)\n values.append(input_dict[k])\n values = torch.stack(values, dim=0)\n dist.all_reduce(values)\n if average:\n values /= world_size\n reduced_dict = {k: v for k, v in zip(names, values)}\n return reduced_dict\nclass MetricLogger(object):\n def __init__(self, delimiter=\"\\t\"):\n self.meters = defaultdict(SmoothedValue)\n self.delimiter = delimiter\n def update(self, **kwargs):\n for k, v in kwargs.items():\n if isinstance(v, torch.Tensor):\n v = v.item()\n assert isinstance(v, (float, int))\n self.meters[k].update(v)\n def __getattr__(self, attr):\n if attr in self.meters:\n return self.meters[attr]\n if attr in self.__dict__:\n return self.__dict__[attr]\n raise AttributeError(\"'{}' object has no attribute '{}'\".format("
+ },
+ {
+ "comment": "The code defines a class with methods for logging iterable data every 'print_freq' iterations. It includes synchronization, adding meters, and displaying loss metrics as strings. The class also has a timer to calculate elapsed time for each iteration of the iterable.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":176-207",
+ "content": " type(self).__name__, attr))\n def __str__(self):\n loss_str = []\n for name, meter in self.meters.items():\n loss_str.append(\n \"{}: {}\".format(name, str(meter))\n )\n return self.delimiter.join(loss_str)\n def synchronize_between_processes(self):\n for meter in self.meters.values():\n meter.synchronize_between_processes()\n def add_meter(self, name, meter):\n self.meters[name] = meter\n def log_every(self, iterable, print_freq, header=None):\n i = 0\n if not header:\n header = ''\n start_time = time.time()\n end = time.time()\n iter_time = SmoothedValue(fmt='{avg:.4f}')\n data_time = SmoothedValue(fmt='{avg:.4f}')\n space_fmt = ':' + str(len(str(len(iterable)))) + 'd'\n if torch.cuda.is_available():\n log_msg = self.delimiter.join([\n header,\n '[{0' + space_fmt + '}/{1}]',\n 'eta: {eta}',\n '{meters}',"
+ },
+ {
+ "comment": "This code snippet is part of a progress bar implementation. It calculates elapsed time, remaining time estimation, and memory usage for an iterable. The log message is constructed with dynamic placeholders and printed at specified intervals based on the print_freq variable. The CUDA availability check ensures proper printing to the console or CUDA device.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":208-233",
+ "content": " 'time: {time}',\n 'data: {data}',\n 'max mem: {memory:.0f}'\n ])\n else:\n log_msg = self.delimiter.join([\n header,\n '[{0' + space_fmt + '}/{1}]',\n 'eta: {eta}',\n '{meters}',\n 'time: {time}',\n 'data: {data}'\n ])\n MB = 1024.0 * 1024.0\n for obj in iterable:\n data_time.update(time.time() - end)\n yield obj\n iter_time.update(time.time() - end)\n if i % print_freq == 0 or i == len(iterable) - 1:\n eta_seconds = iter_time.global_avg * (len(iterable) - i)\n eta_string = str(datetime.timedelta(seconds=int(eta_seconds)))\n if torch.cuda.is_available():\n print(log_msg.format(\n i, len(iterable), eta=eta_string,\n meters=str(self),\n time=str(iter_time), data=str(data_time),"
+ },
+ {
+ "comment": "The code defines a function that calculates the total time taken for an iterable and logs progress updates. It also includes functions to get the current branch, uncommitted changes, and SHA of the current file's directory.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":234-260",
+ "content": " memory=torch.cuda.max_memory_allocated() / MB))\n else:\n print(log_msg.format(\n i, len(iterable), eta=eta_string,\n meters=str(self),\n time=str(iter_time), data=str(data_time)))\n i += 1\n end = time.time()\n total_time = time.time() - start_time\n total_time_str = str(datetime.timedelta(seconds=int(total_time)))\n print('{} Total time: {} ({:.4f} s / it)'.format(\n header, total_time_str, total_time / len(iterable)))\ndef get_sha():\n cwd = os.path.dirname(os.path.abspath(__file__))\n def _run(command):\n return subprocess.check_output(command, cwd=cwd).decode('ascii').strip()\n sha = 'N/A'\n diff = \"clean\"\n branch = 'N/A'\n try:\n sha = _run(['git', 'rev-parse', 'HEAD'])\n subprocess.check_output(['git', 'diff'], cwd=cwd)\n diff = _run(['git', 'diff-index', 'HEAD'])\n diff = \"has uncommited changes\" if diff else \"clean\""
+ },
+ {
+ "comment": "This code defines a class NestedTensor, functions collate_fn, _max_by_axis, and _run. NestedTensor represents tensors with optional masking for PyTorch. collate_fn organizes input batches of different dimensions into tuples. _max_by_axis finds the maximum value along an axis in a list of lists. _run executes a git command and returns its output. These functions appear to be used in deep learning tasks, potentially for data processing or model training.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":261-297",
+ "content": " branch = _run(['git', 'rev-parse', '--abbrev-ref', 'HEAD'])\n except Exception:\n pass\n message = f\"sha: {sha}, status: {diff}, branch: {branch}\"\n return message\ndef collate_fn(batch):\n batch = list(zip(*batch))\n batch[0] = nested_tensor_from_tensor_list(batch[0])\n return tuple(batch)\ndef _max_by_axis(the_list):\n # type: (List[List[int]]) -> List[int]\n maxes = the_list[0]\n for sublist in the_list[1:]:\n for index, item in enumerate(sublist):\n maxes[index] = max(maxes[index], item)\n return maxes\nclass NestedTensor(object):\n def __init__(self, tensors, mask: Optional[Tensor]):\n self.tensors = tensors\n self.mask = mask\n def to(self, device):\n # type: (Device) -> NestedTensor # noqa\n cast_tensor = self.tensors.to(device)\n mask = self.mask\n if mask is not None:\n assert mask is not None\n cast_mask = mask.to(device)\n else:\n cast_mask = None\n return NestedTensor(cast_tensor, cast_mask)"
+ },
+ {
+ "comment": "The code defines a `decompose` function that returns tensors and mask, and a `__repr__` function that returns the tensor representation. The main function is `nested_tensor_from_tensor_list`, which takes a list of tensors and creates a nested tensor by resizing them to have the same maximum shape while padding smaller ones. It supports 3D tensors and has TODOs for generalization and supporting different-sized images.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":299-323",
+ "content": " def decompose(self):\n return self.tensors, self.mask\n def __repr__(self):\n return str(self.tensors)\ndef nested_tensor_from_tensor_list(tensor_list: List[Tensor]):\n # TODO make this more general\n if tensor_list[0].ndim == 3:\n if torchvision._is_tracing():\n # nested_tensor_from_tensor_list() does not export well to ONNX\n # call _onnx_nested_tensor_from_tensor_list() instead\n return _onnx_nested_tensor_from_tensor_list(tensor_list)\n # TODO make it support different-sized images\n max_size = _max_by_axis([list(img.shape) for img in tensor_list])\n # min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))\n batch_shape = [len(tensor_list)] + max_size\n b, c, h, w = batch_shape\n dtype = tensor_list[0].dtype\n device = tensor_list[0].device\n tensor = torch.zeros(batch_shape, dtype=dtype, device=device)\n mask = torch.ones((b, h, w), dtype=torch.bool, device=device)\n for img, pad_img, m in zip(tensor_list, tensor, mask):"
+ },
+ {
+ "comment": "This code is creating a NestedTensor from a list of tensors. It checks if the input tensor_list has the same size and data type, then pads the images to have the maximum size in each dimension, and sets the mask accordingly. If not supported, it raises a ValueError. This implementation is designed to be compatible with ONNX tracing using @torch.jit.unused decorator.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":324-348",
+ "content": " pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)\n m[: img.shape[1], :img.shape[2]] = False\n else:\n raise ValueError('not supported')\n return NestedTensor(tensor, mask)\n# _onnx_nested_tensor_from_tensor_list() is an implementation of\n# nested_tensor_from_tensor_list() that is supported by ONNX tracing.\n@torch.jit.unused\ndef _onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> NestedTensor:\n max_size = []\n for i in range(tensor_list[0].dim()):\n max_size_i = torch.max(torch.stack([img.shape[i] for img in tensor_list]).to(torch.float32)).to(torch.int64)\n max_size.append(max_size_i)\n max_size = tuple(max_size)\n # work around for\n # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)\n # m[: img.shape[1], :img.shape[2]] = False\n # which is not yet supported in onnx\n padded_imgs = []\n padded_masks = []\n for img in tensor_list:\n padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]"
+ },
+ {
+ "comment": "This code snippet is from the act-plus-plus/detr/util/misc.py file and contains functions to pad images, handle distributed training, and check if distributed training is available and initialized. It also sets up a custom print function for non-master processes in distributed training and returns the world size.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":349-385",
+ "content": " padded_img = torch.nn.functional.pad(img, (0, padding[2], 0, padding[1], 0, padding[0]))\n padded_imgs.append(padded_img)\n m = torch.zeros_like(img[0], dtype=torch.int, device=img.device)\n padded_mask = torch.nn.functional.pad(m, (0, padding[2], 0, padding[1]), \"constant\", 1)\n padded_masks.append(padded_mask.to(torch.bool))\n tensor = torch.stack(padded_imgs)\n mask = torch.stack(padded_masks)\n return NestedTensor(tensor, mask=mask)\ndef setup_for_distributed(is_master):\n \"\"\"\n This function disables printing when not in master process\n \"\"\"\n import builtins as __builtin__\n builtin_print = __builtin__.print\n def print(*args, **kwargs):\n force = kwargs.pop('force', False)\n if is_master or force:\n builtin_print(*args, **kwargs)\n __builtin__.print = print\ndef is_dist_avail_and_initialized():\n if not dist.is_available():\n return False\n if not dist.is_initialized():\n return False\n return True\ndef get_world_size():"
+ },
+ {
+ "comment": "This code sets up distributed mode for deep learning tasks. It checks if the distribution environment is available and initialized, then gets world size and rank, defines helper functions like saving on master process only, and initializes distributed mode based on the environment variables. The code assumes the use of either Torch or NCCL backend for distributed training.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":386-424",
+ "content": " if not is_dist_avail_and_initialized():\n return 1\n return dist.get_world_size()\ndef get_rank():\n if not is_dist_avail_and_initialized():\n return 0\n return dist.get_rank()\ndef is_main_process():\n return get_rank() == 0\ndef save_on_master(*args, **kwargs):\n if is_main_process():\n torch.save(*args, **kwargs)\ndef init_distributed_mode(args):\n if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:\n args.rank = int(os.environ[\"RANK\"])\n args.world_size = int(os.environ['WORLD_SIZE'])\n args.gpu = int(os.environ['LOCAL_RANK'])\n elif 'SLURM_PROCID' in os.environ:\n args.rank = int(os.environ['SLURM_PROCID'])\n args.gpu = args.rank % torch.cuda.device_count()\n else:\n print('Not using distributed mode')\n args.distributed = False\n return\n args.distributed = True\n torch.cuda.set_device(args.gpu)\n args.dist_backend = 'nccl'\n print('| distributed init (rank {}): {}'.format(\n args.rank, args.dist_url), flush=True)"
+ },
+ {
+ "comment": "This code initializes a distributed process group and sets up functions for calculating accuracy and interpolating tensors. The distributed process group allows for parallel processing across multiple devices, while the accuracy function computes precision@k for specified values of k, and the interpolate function provides equivalent functionality to nn.functional.interpolate but supports empty batch sizes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":425-453",
+ "content": " torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,\n world_size=args.world_size, rank=args.rank)\n torch.distributed.barrier()\n setup_for_distributed(args.rank == 0)\n@torch.no_grad()\ndef accuracy(output, target, topk=(1,)):\n \"\"\"Computes the precision@k for the specified values of k\"\"\"\n if target.numel() == 0:\n return [torch.zeros([], device=output.device)]\n maxk = max(topk)\n batch_size = target.size(0)\n _, pred = output.topk(maxk, 1, True, True)\n pred = pred.t()\n correct = pred.eq(target.view(1, -1).expand_as(pred))\n res = []\n for k in topk:\n correct_k = correct[:k].view(-1).float().sum(0)\n res.append(correct_k.mul_(100.0 / batch_size))\n return res\ndef interpolate(input, size=None, scale_factor=None, mode=\"nearest\", align_corners=None):\n # type: (Tensor, Optional[List[int]], Optional[float], str, Optional[bool]) -> Tensor\n \"\"\"\n Equivalent to nn.functional.interpolate, but with support for empty batch sizes."
+ },
+ {
+ "comment": "This function checks the PyTorch and torchvision versions, and performs interpolation differently based on the version. If the version is below 0.7, it uses torch.nn.functional.interpolate(). Otherwise, it calls torchvision.ops.misc.interpolate(). The code also handles empty input cases by returning a new tensor with the appropriate shape.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/misc.py\":454-467",
+ "content": " This will eventually be supported natively by PyTorch, and this\n class can go away.\n \"\"\"\n if version.parse(torchvision.__version__) < version.parse('0.7'):\n if input.numel() > 0:\n return torch.nn.functional.interpolate(\n input, size, scale_factor, mode, align_corners\n )\n output_shape = _output_size(2, input, size, scale_factor)\n output_shape = list(input.shape[:-2]) + list(output_shape)\n return _new_empty_tensor(input, output_shape)\n else:\n return torchvision.ops.misc.interpolate(input, size, scale_factor, mode, align_corners)"
+ }
+ ]
+}
\ No newline at end of file
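
A self-contained sketch of the SmoothedValue idea documented above, stripped of the distributed synchronization: a fixed-size deque provides windowed statistics while a running total and count give the global average. The class name and window size here are illustrative, not the repo's implementation.

```python
from collections import deque
import statistics

class SmoothedValueSketch:
    def __init__(self, window_size=20):
        self.window = deque(maxlen=window_size)  # only the most recent values are kept
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.window.append(value)
        self.total += value * n
        self.count += n

    @property
    def median(self):
        return statistics.median(self.window)    # windowed median

    @property
    def global_avg(self):
        return self.total / self.count           # average over the whole series

tracker = SmoothedValueSketch(window_size=3)
for loss in [1.0, 0.8, 0.6, 0.4]:
    tracker.update(loss)
print(tracker.median)      # 0.6 (median of the last 3 values)
print(tracker.global_avg)  # 0.7 (average over all 4 values)
```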
diff --git a/docs/doc/8557fd34-6870-45f5-9434-1ebfe0512282.json b/docs/doc/8557fd34-6870-45f5-9434-1ebfe0512282.json
new file mode 100644
index 00000000..7c49b7de
--- /dev/null
+++ b/docs/doc/8557fd34-6870-45f5-9434-1ebfe0512282.json
@@ -0,0 +1,40 @@
+{
+ "summary": "The code compresses images, handles HDF5 datasets, and processes videos. It removes depth images, concatenates camera videos, decompresses/compresses images, and saves the first episode video.",
+ "details": [
+ {
+ "comment": "The code compresses a dataset by creating a new compressed HDF5 file. It checks if the output path already exists, loads the uncompressed dataset, creates the compressed dataset with the same non-image data and attributes, and then copies over only the 'observations' key from the input file to the output file.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/compress_data.py\":0-34",
+ "content": "\"\"\"\nExample usage:\n$ python3 script/compress_data.py --dataset_dir /scr/lucyshi/dataset/aloha_test\n\"\"\"\nimport os\nimport h5py\nimport cv2\nimport numpy as np\nimport argparse\nfrom tqdm import tqdm\n# Constants\nDT = 0.02\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\ndef compress_dataset(input_dataset_path, output_dataset_path):\n # Check if output path exists\n if os.path.exists(output_dataset_path):\n print(f\"The file {output_dataset_path} already exists. Exiting...\")\n return\n # Load the uncompressed dataset\n with h5py.File(input_dataset_path, 'r') as infile:\n # Create the compressed dataset\n with h5py.File(output_dataset_path, 'w') as outfile:\n outfile.attrs['sim'] = infile.attrs['sim']\n outfile.attrs['compress'] = True\n # Copy non-image data directly\n for key in infile.keys():\n if key != 'observations':\n outfile.copy(infile[key], key)"
+ },
+ {
+ "comment": "Creates observation group in output file, copies non-image data, creates image group in observations, applies JPEG compression parameters, skips depth images, stores compressed lengths for each camera.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/compress_data.py\":36-60",
+ "content": " obs_group = infile['observations']\n # Create observation group in the output\n out_obs_group = outfile.create_group('observations')\n # Copy non-image data in observations directly\n for key in obs_group.keys():\n if key != 'images':\n out_obs_group.copy(obs_group[key], key)\n image_group = obs_group['images']\n out_image_group = out_obs_group.create_group('images')\n # JPEG compression parameters\n encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50]\n compressed_lens = [] # List to store compressed lengths for each camera\n for cam_name in image_group.keys():\n if \"_depth\" in cam_name: # Depth images are not compressed\n out_image_group.copy(image_group[cam_name], cam_name)\n else:\n images = image_group[cam_name]\n compressed_images = []\n cam_compressed_lens = [] # List to store compressed lengths for this camera"
+ },
+ {
+ "comment": "This code compresses images and stores their lengths in a list. It then finds the maximum length of the compressed images and creates a dataset to store them in an HDF5 file, with the same length as the number of images. Finally, it saves the compressed lengths to the HDF5 file.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/compress_data.py\":62-81",
+ "content": " # Compress each image\n for image in images:\n result, encoded_image = cv2.imencode('.jpg', image, encode_param)\n compressed_images.append(encoded_image)\n cam_compressed_lens.append(len(encoded_image)) # Store the length\n compressed_lens.append(cam_compressed_lens)\n # Find the maximum length of the compressed images\n max_len = max(len(img) for img in compressed_images)\n # Create dataset to store compressed images\n compressed_dataset = out_image_group.create_dataset(cam_name, (len(compressed_images), max_len), dtype='uint8')\n # Store compressed images\n for i, img in enumerate(compressed_images):\n compressed_dataset[i, :len(img)] = img\n # Save the compressed lengths to the HDF5 file\n compressed_lens = np.array(compressed_lens)"
+ },
+ {
+ "comment": "Code saves a compressed dataset to the specified output path. It first checks if the video is in a list or dictionary format, and then creates a VideoWriter object with the desired parameters. For each frame of the video, it concatenates images from all cameras into one image, swaps B and R channels, and writes the resulting image to the output file. Finally, it releases the VideoWriter object and prints the saved video path.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/compress_data.py\":82-107",
+ "content": " _ = outfile.create_dataset('compress_len', compressed_lens.shape)\n outfile['/compress_len'][...] = compressed_lens\n print(f\"Compressed dataset saved to {output_dataset_path}\")\ndef save_videos(video, dt, video_path=None):\n if isinstance(video, list):\n cam_names = list(video[0].keys())\n h, w, _ = video[0][cam_names[0]].shape\n w = w * len(cam_names)\n fps = int(1/dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n # bitrate = 1000000\n # out.set(cv2.VIDEOWRITER_PROP_BITRATE, bitrate)\n for ts, image_dict in enumerate(video):\n images = []\n for cam_name in cam_names:\n image = image_dict[cam_name]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n images.append(image)\n images = np.concatenate(images, axis=1)\n out.write(images)\n out.release()\n print(f'Saved video to: {video_path}')\n elif isinstance(video, dict):"
+ },
+ {
+ "comment": "This code loads an HDF5 dataset, removes depth images, concatenates remaining camera videos along the width dimension, saves the resulting video, and provides functions for loading and saving the first episode video.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/compress_data.py\":108-134",
+ "content": " cam_names = list(video.keys())\n # Remove depth images\n cam_names = [cam_name for cam_name in cam_names if '_depth' not in cam_name]\n all_cam_videos = []\n for cam_name in cam_names:\n all_cam_videos.append(video[cam_name])\n all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension\n n_frames, h, w, _ = all_cam_videos.shape\n fps = int(1 / dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for t in range(n_frames):\n image = all_cam_videos[t]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n out.write(image)\n out.release()\n print(f'Saved video to: {video_path}')\ndef load_and_save_first_episode_video(dataset_dir, video_path):\n dataset_name = 'episode_0'\n _, _, _, _, image_dict = load_hdf5(dataset_dir, dataset_name)\n save_videos(image_dict, DT, video_path=video_path)\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')"
+ },
+ {
+ "comment": "This code checks if the dataset file exists, loads compressed images from the file, and returns an image dictionary. If the dataset file is missing, it prints a message and exits. Compressed images are loaded for each camera, and the compressed images are decompressed into a list of images per camera. The final result is the image dictionary containing these lists of images.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/compress_data.py\":135-158",
+ "content": " if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n compressed = root.attrs.get('compress', False)\n image_dict = dict()\n for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n if compressed:\n compress_len = root['/compress_len'][()]\n if compressed:\n for cam_id, cam_name in enumerate(image_dict.keys()):\n padded_compressed_image_list = image_dict[cam_name]\n image_list = []\n for frame_id, padded_compressed_image in enumerate(padded_compressed_image_list):\n image_len = int(compress_len[cam_id, frame_id])\n compressed_image = padded_compressed_image\n image = cv2.imdecode(compressed_image, 1)\n image_list.append(image)\n image_dict[cam_name] = image_list\n return None, None, None, None, image_dict # Return only the image dict for this application"
+ },
+ {
+ "comment": "This code compresses all HDF5 datasets in a specified directory. It requires the directory path, creates a compressed dataset directory, iterates over each file ending with '.hdf5', compresses the dataset using 'compress_dataset' function, and after processing all datasets, loads and saves the video for the first episode.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/compress_data.py\":161-180",
+ "content": "if __name__ == '__main__':\n parser = argparse.ArgumentParser(description=\"Compress all HDF5 datasets in a directory.\")\n parser.add_argument('--dataset_dir', action='store', type=str, required=True, help='Directory containing the uncompressed datasets.')\n args = parser.parse_args()\n output_dataset_dir = args.dataset_dir + '_compressed'\n os.makedirs(output_dataset_dir, exist_ok=True)\n # Iterate over each file in the directory\n for filename in tqdm(os.listdir(args.dataset_dir), desc=\"Compressing data\"):\n if filename.endswith('.hdf5'):\n input_path = os.path.join(args.dataset_dir, filename)\n output_path = os.path.join(output_dataset_dir, filename)\n compress_dataset(input_path, output_path)\n # After processing all datasets, load and save the video for the first episode\n print(f'Saving video for episode 0 in {output_dataset_dir}')\n video_path = os.path.join(output_dataset_dir, 'episode_0_video.mp4')\n load_and_save_first_episode_video(output_dataset_dir, video_path)"
+ }
+ ]
+}
\ No newline at end of file
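The padded-JPEG layout documented above (a per-camera uint8 dataset of shape (num_frames, max_len) plus a compress_len table) can be exercised on its own. The sketch below uses synthetic frames and an illustrative file name rather than code from the repo, assuming only that OpenCV, h5py, and numpy are installed:

```python
# Minimal sketch of the padded-JPEG storage scheme (synthetic data, made-up file name).
import cv2
import h5py
import numpy as np

frames = [np.random.randint(0, 256, (48, 64, 3), dtype=np.uint8) for _ in range(5)]
encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50]

encoded, lens = [], []
for frame in frames:
    ok, buf = cv2.imencode('.jpg', frame, encode_param)   # JPEG-encode each frame
    assert ok
    encoded.append(buf.flatten())
    lens.append(len(buf))

max_len = max(lens)
with h5py.File('example_compressed.hdf5', 'w') as f:
    dset = f.create_dataset('images/cam0', (len(encoded), max_len), dtype='uint8')
    for i, buf in enumerate(encoded):
        dset[i, :len(buf)] = buf                          # zero-pad the tail
    f.create_dataset('compress_len', data=np.array(lens))

with h5py.File('example_compressed.hdf5', 'r') as f:
    padded = f['images/cam0'][()]
    stored_lens = f['compress_len'][()]
    frame0 = cv2.imdecode(padded[0, :stored_lens[0]], 1)  # decode one frame back
    print(frame0.shape)
```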
diff --git a/docs/doc/85d60cd4-44ad-47b6-ba14-c1a611468bd6.json b/docs/doc/85d60cd4-44ad-47b6-ba14-c1a611468bd6.json
new file mode 100644
index 00000000..b764082b
--- /dev/null
+++ b/docs/doc/85d60cd4-44ad-47b6-ba14-c1a611468bd6.json
@@ -0,0 +1,10 @@
+{
+ "summary": "This code is a Python setup script that utilizes the distutils and setuptools packages to create a distribution package for the 'act' software. It specifies the name, version, packages, license, and long_description of the software.",
+ "details": [
+ {
+ "comment": "This code is a Python setup script that utilizes the distutils and setuptools packages to create a distribution package for the 'act' software. It specifies the name, version, packages, license, and long_description of the software.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/setup.py\":0-9",
+ "content": "from distutils.core import setup\nfrom setuptools import find_packages\nsetup(\n name='act',\n version='0.0.0',\n packages=find_packages(),\n license='MIT License',\n long_description=open('README.md').read(),\n)"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/8741269f-98a0-4e12-9960-2e02f3cb8af2.json b/docs/doc/8741269f-98a0-4e12-9960-2e02f3cb8af2.json
new file mode 100644
index 00000000..e9b1c282
--- /dev/null
+++ b/docs/doc/8741269f-98a0-4e12-9960-2e02f3cb8af2.json
@@ -0,0 +1,15 @@
+{
+ "summary": "This code imports modules, defines a calibration function for head cam and symmetrical arms, creates instances of InterbotixManipulatorXS bots, sets arm positions to sleep for 2 seconds, and opens grippers.",
+ "details": [
+ {
+ "comment": "Code imports necessary modules and defines a function for calibrating head cam and symmetrical arms. It creates instances of InterbotixManipulatorXS for left and right puppet bots, turns on torque, and initializes positions based on multipliers for symmetry.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/align.py\":0-22",
+ "content": "from interbotix_xs_modules.arm import InterbotixManipulatorXS\nfrom aloha_scripts.robot_utils import move_arms, torque_on, move_grippers\nfrom constants import PUPPET_GRIPPER_JOINT_OPEN, PUPPET_GRIPPER_JOINT_CLOSE\nimport argparse\nimport numpy as np\n# for calibrating head cam and arms being symmetrical\ndef main():\n argparser = argparse.ArgumentParser()\n argparser.add_argument('--all', action='store_true', default=False)\n args = argparser.parse_args()\n puppet_bot_left = InterbotixManipulatorXS(robot_model=\"vx300s\", group_name=\"arm\", gripper_name=\"gripper\", robot_name=f'puppet_left', init_node=True)\n puppet_bot_right = InterbotixManipulatorXS(robot_model=\"vx300s\", group_name=\"arm\", gripper_name=\"gripper\", robot_name=f'puppet_right', init_node=False)\n all_bots = [puppet_bot_left, puppet_bot_right]\n for bot in all_bots:\n torque_on(bot)\n multiplier = np.array([-1, 1, 1, -1, 1, 1])\n puppet_sleep_position_left = np.array([-0.8, -0.5, 0.5, 0, 0.65, 0])\n puppet_sleep_position_right = puppet_sleep_position_left * multiplier"
+ },
+ {
+ "comment": "Sets all bots' arm positions to sleep positions for 2 seconds, then opens grippers.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/align.py\":23-30",
+ "content": " all_positions = [puppet_sleep_position_left, puppet_sleep_position_right]\n move_arms(all_bots, all_positions, move_time=2)\n # move_grippers(all_bots, [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=1) # open\nif __name__ == '__main__':\n main()"
+ }
+ ]
+}
\ No newline at end of file
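The left/right symmetry mentioned in the summary comes from an element-wise sign flip on the mirrored joints. A tiny numpy-only sketch (joint values copied from the documented script; no robot hardware required):

```python
import numpy as np

# Sign flips for the joints that mirror across the robot's midline:
# waist, shoulder, elbow, forearm_roll, wrist_angle, wrist_rotate.
multiplier = np.array([-1, 1, 1, -1, 1, 1])

puppet_sleep_position_left = np.array([-0.8, -0.5, 0.5, 0, 0.65, 0])
puppet_sleep_position_right = puppet_sleep_position_left * multiplier  # mirrored pose

print(puppet_sleep_position_left)
print(puppet_sleep_position_right)
```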
diff --git a/docs/doc/8f01b981-212d-440f-82e5-a17d6e4efd62.json b/docs/doc/8f01b981-212d-440f-82e5-a17d6e4efd62.json
new file mode 100644
index 00000000..16478bd5
--- /dev/null
+++ b/docs/doc/8f01b981-212d-440f-82e5-a17d6e4efd62.json
@@ -0,0 +1,35 @@
+{
+ "summary": "This script truncates and compresses a dataset using h5py, creating an observation group with limited image data. It saves truncated datasets or videos, extracts camera names, resizes images, and requires 'act-plus-plus' for argument parsing and directory manipulation. Output dataset directory has '_truncated' suffix.",
+ "details": [
+ {
+ "comment": "This script compresses a dataset by truncating its length and storing the compressed dataset in a new file. It checks if the output path already exists and copies non-image data directly to the output file. The script takes an input_dataset_path and an output_dataset_path as arguments, and it uses h5py library for handling HDF5 files.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/truncate_data.py\":0-34",
+ "content": "\"\"\"\nExample usage:\n$ python3 script/compress_data.py --dataset_dir /scr/lucyshi/dataset/aloha_test\n\"\"\"\nimport os\nimport h5py\nimport cv2\nimport numpy as np\nimport argparse\nfrom tqdm import tqdm\n# Constants\nDT = 0.02\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\nTRUNCATE_LEN = 2250\ndef compress_dataset(input_dataset_path, output_dataset_path):\n # Check if output path exists\n if os.path.exists(output_dataset_path):\n print(f\"The file {output_dataset_path} already exists. Exiting...\")\n return\n # Load the uncompressed dataset\n with h5py.File(input_dataset_path, 'r') as infile:\n # Create the compressed dataset\n with h5py.File(output_dataset_path, 'w') as outfile:\n outfile.attrs['sim'] = infile.attrs['sim']\n outfile.attrs['compress'] = True\n # Copy non-image data directly\n for key in infile.keys():\n if key != 'observations' and key != 'compress_len':"
+ },
+ {
+ "comment": "Truncates and compresses data, creates observation group with limited image data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/truncate_data.py\":35-56",
+ "content": " data = infile[key][:TRUNCATE_LEN]\n out_data = outfile.create_dataset(key, (TRUNCATE_LEN, data.shape[1]))\n out_data[:] = data\n data_compress_len = infile['compress_len']\n out_data_compress_len = outfile.create_dataset('compress_len', data_compress_len.shape)\n out_data_compress_len[:] = data_compress_len\n # Create observation group in the output\n obs_group = infile['observations']\n out_obs_group = outfile.create_group('observations')\n for key in obs_group.keys():\n if key != 'images':\n data = obs_group[key][:TRUNCATE_LEN]\n out_data = out_obs_group.create_dataset(key, (TRUNCATE_LEN, data.shape[1]))\n out_data[:] = data\n image_group = obs_group['images']\n out_image_group = out_obs_group.create_group('images')\n for cam_name in image_group.keys():\n data = image_group[cam_name][:TRUNCATE_LEN]"
+ },
+ {
+ "comment": "This code saves a truncated dataset or video depending on the input format. If a list of videos is given, it extracts camera names, resizes the images, and concatenates them into a single video. It then writes the video to the specified path and prints a success message.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/truncate_data.py\":57-83",
+ "content": " out_data = out_image_group.create_dataset(cam_name, (TRUNCATE_LEN, data.shape[1]), dtype='uint8')\n out_data[:] = data\n print(f\"Truncated dataset saved to {output_dataset_path}\")\ndef save_videos(video, dt, video_path=None):\n if isinstance(video, list):\n cam_names = list(video[0].keys())\n h, w, _ = video[0][cam_names[0]].shape\n w = w * len(cam_names)\n fps = int(1/dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n # bitrate = 1000000\n # out.set(cv2.VIDEOWRITER_PROP_BITRATE, bitrate)\n for ts, image_dict in enumerate(video):\n images = []\n for cam_name in cam_names:\n image = image_dict[cam_name]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n images.append(image)\n images = np.concatenate(images, axis=1)\n out.write(images)\n out.release()\n print(f'Saved video to: {video_path}')\n elif isinstance(video, dict):"
+ },
+ {
+ "comment": "The code loads and saves a video from an HDF5 file. It first removes depth images, concatenates the remaining videos along the width dimension, converts the BGR image to RGB, then writes the video to a specified path at the given frame rate. The function `load_and_save_first_episode_video` calls other functions to load the dataset and save the video.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/truncate_data.py\":84-110",
+ "content": " cam_names = list(video.keys())\n # Remove depth images\n cam_names = [cam_name for cam_name in cam_names if '_depth' not in cam_name]\n all_cam_videos = []\n for cam_name in cam_names:\n all_cam_videos.append(video[cam_name])\n all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension\n n_frames, h, w, _ = all_cam_videos.shape\n fps = int(1 / dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for t in range(n_frames):\n image = all_cam_videos[t]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n out.write(image)\n out.release()\n print(f'Saved video to: {video_path}')\ndef load_and_save_first_episode_video(dataset_dir, video_path):\n dataset_name = 'episode_0'\n _, _, _, _, image_dict = load_hdf5(dataset_dir, dataset_name)\n save_videos(image_dict, DT, video_path=video_path)\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')"
+ },
+ {
+ "comment": "This code checks if a dataset exists and reads the compressed image data from it. If compression is enabled, it decompresses the images and stores them in an image dictionary for further processing. The function returns only the image dictionary as the output.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/truncate_data.py\":111-134",
+ "content": " if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n compressed = root.attrs.get('compress', False)\n image_dict = dict()\n for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n if compressed:\n compress_len = root['/compress_len'][()]\n if compressed:\n for cam_id, cam_name in enumerate(image_dict.keys()):\n padded_compressed_image_list = image_dict[cam_name]\n image_list = []\n for frame_id, padded_compressed_image in enumerate(padded_compressed_image_list):\n image_len = int(compress_len[cam_id, frame_id])\n compressed_image = padded_compressed_image\n image = cv2.imdecode(compressed_image, 1)\n image_list.append(image)\n image_dict[cam_name] = image_list\n return None, None, None, None, image_dict # Return only the image dict for this application"
+ },
+ {
+ "comment": "This code compresses all HDF5 datasets in a specified directory and saves the video for the first episode. It requires the 'act-plus-plus' library and utilizes argument parsing, file iteration, and directory creation/manipulation. The output dataset directory is created as a suffix of the input dataset directory with '_truncated'.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/truncate_data.py\":137-156",
+ "content": "if __name__ == '__main__':\n parser = argparse.ArgumentParser(description=\"Compress all HDF5 datasets in a directory.\")\n parser.add_argument('--dataset_dir', action='store', type=str, required=True, help='Directory containing the uncompressed datasets.')\n args = parser.parse_args()\n output_dataset_dir = args.dataset_dir + '_truncated'\n os.makedirs(output_dataset_dir, exist_ok=True)\n # Iterate over each file in the directory\n for filename in tqdm(os.listdir(args.dataset_dir), desc=\"Truncating data\"):\n if filename.endswith('.hdf5'):\n input_path = os.path.join(args.dataset_dir, filename)\n output_path = os.path.join(output_dataset_dir, filename)\n compress_dataset(input_path, output_path)\n # After processing all datasets, load and save the video for the first episode\n print(f'Saving video for episode 0 in {output_dataset_dir}')\n video_path = os.path.join(output_dataset_dir, 'episode_0_video.mp4')\n load_and_save_first_episode_video(output_dataset_dir, video_path)"
+ }
+ ]
+}
\ No newline at end of file
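The core of the truncation step is copying only the first TRUNCATE_LEN rows of each time-indexed dataset into a new HDF5 file. A minimal stand-alone sketch with toy data and made-up file names:

```python
# Sketch of the truncation pattern: keep only the first TRUNCATE_LEN rows.
import h5py
import numpy as np

TRUNCATE_LEN = 10

with h5py.File('example_in.hdf5', 'w') as f:          # build a toy input file
    f.create_dataset('action', data=np.random.rand(25, 14).astype(np.float32))
    f.create_dataset('observations/qpos', data=np.random.rand(25, 14).astype(np.float32))

with h5py.File('example_in.hdf5', 'r') as infile, h5py.File('example_out.hdf5', 'w') as outfile:
    for key in ('action', 'observations/qpos'):
        data = infile[key][:TRUNCATE_LEN]              # slice away everything after TRUNCATE_LEN
        outfile.create_dataset(key, data=data)

with h5py.File('example_out.hdf5', 'r') as f:
    print(f['action'].shape)                           # (10, 14)
```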
diff --git a/docs/doc/902d9bd9-43f3-4fc4-b249-a3e0ccf8c860.json b/docs/doc/902d9bd9-43f3-4fc4-b249-a3e0ccf8c860.json
new file mode 100644
index 00000000..cd5ae73d
--- /dev/null
+++ b/docs/doc/902d9bd9-43f3-4fc4-b249-a3e0ccf8c860.json
@@ -0,0 +1,70 @@
+{
+ "summary": "The code creates a function for a bi-manual robot environment, initializes tasks and robots, sets rewards, uses physics simulation, and derives the \"InsertionEETask\" class. It assigns fixed rewards of 4 to contact scenarios in peg insertion tasks.",
+ "details": [
+ {
+ "comment": "The code imports necessary libraries and defines a function `make_ee_sim_env(task_name)` that creates an environment for simulated robot bi-manual manipulation with end-effector control. The action space includes left and right arm pose, along with gripper positions for both arms.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":0-25",
+ "content": "import numpy as np\nimport collections\nimport os\nfrom constants import DT, XML_DIR, START_ARM_POSE\nfrom constants import PUPPET_GRIPPER_POSITION_CLOSE\nfrom constants import PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN\nfrom constants import PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN\nfrom utils import sample_box_pose, sample_insertion_pose\nfrom dm_control import mujoco\nfrom dm_control.rl import control\nfrom dm_control.suite import base\nimport IPython\ne = IPython.embed\ndef make_ee_sim_env(task_name):\n \"\"\"\n Environment for simulated robot bi-manual manipulation, with end-effector control.\n Action space: [left_arm_pose (7), # position and quaternion for end effector\n left_gripper_positions (1), # normalized gripper position (0: close, 1: open)\n right_arm_pose (7), # position and quaternion for end effector\n right_gripper_positions (1),] # normalized gripper position (0: close, 1: open)"
+ },
+ {
+ "comment": "The code defines the observation space for a simulation environment, including absolute joint positions and velocities for both left and right arms, gripper positions and velocities, and image data from a camera. This is likely used in a robotics control algorithm or reinforcement learning task. If \"sim_transfer_cube\" is in the task name, it suggests that the simulation involves transferring an object (possibly a cube) between the left and right arms.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":27-37",
+ "content": " Observation space: {\"qpos\": Concat[ left_arm_qpos (6), # absolute joint position\n left_gripper_position (1), # normalized gripper position (0: close, 1: open)\n right_arm_qpos (6), # absolute joint position\n right_gripper_qpos (1)] # normalized gripper position (0: close, 1: open)\n \"qvel\": Concat[ left_arm_qvel (6), # absolute joint velocity (rad)\n left_gripper_velocity (1), # normalized gripper velocity (pos: opening, neg: closing)\n right_arm_qvel (6), # absolute joint velocity (rad)\n right_gripper_qvel (1)] # normalized gripper velocity (pos: opening, neg: closing)\n \"images\": {\"main\": (480x640x3)} # h, w, c, dtype='uint8'\n \"\"\"\n if 'sim_transfer_cube' in task_name:"
+ },
+ {
+ "comment": "This code initializes an environment for a bimanual ViperX EE task, possibly either cube transfer or insertion. It joins the XML file path with the directory and loads the physics from the XML file. Then, it instantiates the specific task (TransferCubeEETask or InsertionEETask) based on the task name. Finally, it creates an environment object using the physics and task, setting the time limit, control timestep, and other options. If no matching task name is found, it raises a NotImplementedError. The BimanualViperXEETask class initializes the base task with an optional random parameter.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":38-60",
+ "content": " xml_path = os.path.join(XML_DIR, f'bimanual_viperx_ee_transfer_cube.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = TransferCubeEETask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n elif 'sim_insertion' in task_name:\n xml_path = os.path.join(XML_DIR, f'bimanual_viperx_ee_insertion.xml')\n physics = mujoco.Physics.from_xml_path(xml_path)\n task = InsertionEETask(random=False)\n env = control.Environment(physics, task, time_limit=20, control_timestep=DT,\n n_sub_steps=None, flat_observation=False)\n else:\n raise NotImplementedError\n return env\nclass BimanualViperXEETask(base.Task):\n def __init__(self, random=None):\n super().__init__(random=random)\n def before_step(self, action, physics):\n a_len = len(action) // 2\n action_left = action[:a_len]\n action_right = action[a_len:]"
+ },
+ {
+ "comment": "This code initializes robots in the environment by resetting joint positions and setting mocap (motion capture) position and quaternion for left and right arms. It also sets gripper control values using a provided function, ensuring proper alignment between end effector and mocap data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":62-83",
+ "content": " # set mocap position and quat\n # left\n np.copyto(physics.data.mocap_pos[0], action_left[:3])\n np.copyto(physics.data.mocap_quat[0], action_left[3:7])\n # right\n np.copyto(physics.data.mocap_pos[1], action_right[:3])\n np.copyto(physics.data.mocap_quat[1], action_right[3:7])\n # set gripper\n g_left_ctrl = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(action_left[7])\n g_right_ctrl = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(action_right[7])\n np.copyto(physics.data.ctrl, np.array([g_left_ctrl, -g_left_ctrl, g_right_ctrl, -g_right_ctrl]))\n def initialize_robots(self, physics):\n # reset joint position\n physics.named.data.qpos[:16] = START_ARM_POSE\n # reset mocap to align with end effector\n # to obtain these numbers:\n # (1) make an ee_sim env and reset to the same start_pose\n # (2) get env._physics.named.data.xpos['vx300s_left/gripper_link']\n # get env._physics.named.data.xquat['vx300s_left/gripper_link']"
+ },
+ {
+ "comment": "This code segment sets the initial positions, orientations, and gripper control for both left and right sides of a simulated robot arm. It also defines an initialize_episode function and a get_qpos static method in a class inheriting from an unspecified base class. The left and right positions are set using numpy's copyto() function, and the gripper control is initialized to close position.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":84-109",
+ "content": " # repeat the same for right side\n np.copyto(physics.data.mocap_pos[0], [-0.31718881+0.1, 0.5, 0.29525084])\n np.copyto(physics.data.mocap_quat[0], [1, 0, 0, 0])\n # right\n np.copyto(physics.data.mocap_pos[1], np.array([0.31718881-0.1, 0.49999888, 0.29525084]))\n np.copyto(physics.data.mocap_quat[1], [1, 0, 0, 0])\n # reset gripper control\n close_gripper_control = np.array([\n PUPPET_GRIPPER_POSITION_CLOSE,\n -PUPPET_GRIPPER_POSITION_CLOSE,\n PUPPET_GRIPPER_POSITION_CLOSE,\n -PUPPET_GRIPPER_POSITION_CLOSE,\n ])\n np.copyto(physics.data.ctrl, close_gripper_control)\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n super().initialize_episode(physics)\n @staticmethod\n def get_qpos(physics):\n qpos_raw = physics.data.qpos.copy()\n left_qpos_raw = qpos_raw[:8]\n right_qpos_raw = qpos_raw[8:16]\n left_arm_qpos = left_qpos_raw[:6]"
+ },
+ {
+ "comment": "The code defines functions to extract joint positions, velocities, and environment state from physics data. It normalizes gripper position and velocity values using the respective PUPPET_*_NORMALIZE_FN functions. The get_observation function combines left and right arm joint positions, gripper positions, and velocities into a concatenated numpy array. The code also includes an unimplemented get_env_state method.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":110-132",
+ "content": " right_arm_qpos = right_qpos_raw[:6]\n left_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(left_qpos_raw[6])]\n right_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(right_qpos_raw[6])]\n return np.concatenate([left_arm_qpos, left_gripper_qpos, right_arm_qpos, right_gripper_qpos])\n @staticmethod\n def get_qvel(physics):\n qvel_raw = physics.data.qvel.copy()\n left_qvel_raw = qvel_raw[:8]\n right_qvel_raw = qvel_raw[8:16]\n left_arm_qvel = left_qvel_raw[:6]\n right_arm_qvel = right_qvel_raw[:6]\n left_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(left_qvel_raw[6])]\n right_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(right_qvel_raw[6])]\n return np.concatenate([left_arm_qvel, left_gripper_qvel, right_arm_qvel, right_gripper_qvel])\n @staticmethod\n def get_env_state(physics):\n raise NotImplementedError\n def get_observation(self, physics):\n # note: it is important to do .copy()\n obs = collections.OrderedDict()"
+ },
+ {
+ "comment": "This code defines a class for an environment in which a robot arm needs to manipulate a cube. The environment is initialized and returns observation (obs) containing information about the state of the robot, images from different camera perspectives, starting pose of the left and right mocap hands, and gripper control data. It also defines a reward function that needs to be implemented for specific tasks within this environment. This class inherits from BimanualViperXEETask which is likely another class for similar environments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":133-154",
+ "content": " obs['qpos'] = self.get_qpos(physics)\n obs['qvel'] = self.get_qvel(physics)\n obs['env_state'] = self.get_env_state(physics)\n obs['images'] = dict()\n obs['images']['top'] = physics.render(height=480, width=640, camera_id='top')\n # obs['images']['angle'] = physics.render(height=480, width=640, camera_id='angle')\n # obs['images']['vis'] = physics.render(height=480, width=640, camera_id='front_close')\n # used in scripted policy to obtain starting pose\n obs['mocap_pose_left'] = np.concatenate([physics.data.mocap_pos[0], physics.data.mocap_quat[0]]).copy()\n obs['mocap_pose_right'] = np.concatenate([physics.data.mocap_pos[1], physics.data.mocap_quat[1]]).copy()\n # used when replaying joint trajectory\n obs['gripper_ctrl'] = physics.data.ctrl.copy()\n return obs\n def get_reward(self, physics):\n raise NotImplementedError\nclass TransferCubeEETask(BimanualViperXEETask):\n def __init__(self, random=None):\n super().__init__(random=random)"
+ },
+ {
+ "comment": "The code initializes the environment for each episode, randomizes the box position, and defines methods to get the environment state and reward in a physics simulation. The maximum reward is set to 4.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":155-180",
+ "content": " self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n self.initialize_robots(physics)\n # randomize box position\n cube_pose = sample_box_pose()\n box_start_idx = physics.model.name2id('red_box_joint', 'joint')\n np.copyto(physics.data.qpos[box_start_idx : box_start_idx + 7], cube_pose)\n # print(f\"randomized cube position to {cube_position}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether left gripper is holding the box\n all_contact_pairs = []\n for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')"
+ },
+ {
+ "comment": "The code defines a class called \"InsertionEETask\" which inherits from the \"BimanualViperXEETask\". This task seems to be related to manipulating objects in a simulation environment. It initializes the state of the environment at the start of each episode by calling the \"initialize_robots()\" function. The code checks for different contact scenarios and assigns corresponding rewards, ranging from 0 to 4. The maximum reward is set to 4.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":181-207",
+ "content": " contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)\n touch_left_gripper = (\"red_box\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n touch_right_gripper = (\"red_box\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs\n touch_table = (\"red_box\", \"table\") in all_contact_pairs\n reward = 0\n if touch_right_gripper:\n reward = 1\n if touch_right_gripper and not touch_table: # lifted\n reward = 2\n if touch_left_gripper: # attempted transfer\n reward = 3\n if touch_left_gripper and not touch_table: # successful transfer\n reward = 4\n return reward\nclass InsertionEETask(BimanualViperXEETask):\n def __init__(self, random=None):\n super().__init__(random=random)\n self.max_reward = 4\n def initialize_episode(self, physics):\n \"\"\"Sets the state of the environment at the start of each episode.\"\"\"\n self.initialize_robots(physics)"
+ },
+ {
+ "comment": "This code initializes the episode by randomizing the peg and socket positions in a physics simulation. It converts joint IDs to indices, sets the new positions for the peg and socket using numpy copyto function, and calls the superclass' initialize_episode method. It also includes a get_env_state function which returns the environment state from the physics data qpos array excluding the first 16 elements (robot qpos), and a placeholder get_reward function that will return whether the peg touches the pin in all contact pairs.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":208-231",
+ "content": " # randomize peg and socket position\n peg_pose, socket_pose = sample_insertion_pose()\n id2index = lambda j_id: 16 + (j_id - 16) * 7 # first 16 is robot qpos, 7 is pose dim # hacky\n peg_start_id = physics.model.name2id('red_peg_joint', 'joint')\n peg_start_idx = id2index(peg_start_id)\n np.copyto(physics.data.qpos[peg_start_idx : peg_start_idx + 7], peg_pose)\n # print(f\"randomized cube position to {cube_position}\")\n socket_start_id = physics.model.name2id('blue_socket_joint', 'joint')\n socket_start_idx = id2index(socket_start_id)\n np.copyto(physics.data.qpos[socket_start_idx : socket_start_idx + 7], socket_pose)\n # print(f\"randomized cube position to {cube_position}\")\n super().initialize_episode(physics)\n @staticmethod\n def get_env_state(physics):\n env_state = physics.data.qpos.copy()[16:]\n return env_state\n def get_reward(self, physics):\n # return whether peg touches the pin\n all_contact_pairs = []"
+ },
+ {
+ "comment": "This code checks for contact between various objects in a physics simulation. It iterates through all contacts, retrieves the associated geometries and converts their IDs to names. Then, it identifies if a red peg is touching the right gripper, and checks multiple conditions for left gripper and socket-peg interactions with the table.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":232-247",
+ "content": " for i_contact in range(physics.data.ncon):\n id_geom_1 = physics.data.contact[i_contact].geom1\n id_geom_2 = physics.data.contact[i_contact].geom2\n name_geom_1 = physics.model.id2name(id_geom_1, 'geom')\n name_geom_2 = physics.model.id2name(id_geom_2, 'geom')\n contact_pair = (name_geom_1, name_geom_2)\n all_contact_pairs.append(contact_pair)\n touch_right_gripper = (\"red_peg\", \"vx300s_right/10_right_gripper_finger\") in all_contact_pairs\n touch_left_gripper = (\"socket-1\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-2\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-3\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs or \\\n (\"socket-4\", \"vx300s_left/10_left_gripper_finger\") in all_contact_pairs\n peg_touch_table = (\"red_peg\", \"table\") in all_contact_pairs\n socket_touch_table = (\"socket-1\", \"table\") in all_contact_pairs or \\"
+ },
+ {
+ "comment": "This code determines the reward based on contact pairs. It checks for touching \"socket-1\" to \"table\", \"socket-2\" to \"table\", etc. It also checks if any of the pegs are touching a socket, table or both, and if the red peg is touching the pin. The reward is given based on these conditions. If both gripper touch something, it gives a reward of 1. If both gripper touches nothing but grasp something, reward is 2. If peg touches socket but not table, reward is 3. Finally, if any peg touches the pin, it's considered as successful insertion.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":248-264",
+ "content": " (\"socket-2\", \"table\") in all_contact_pairs or \\\n (\"socket-3\", \"table\") in all_contact_pairs or \\\n (\"socket-4\", \"table\") in all_contact_pairs\n peg_touch_socket = (\"red_peg\", \"socket-1\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-2\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-3\") in all_contact_pairs or \\\n (\"red_peg\", \"socket-4\") in all_contact_pairs\n pin_touched = (\"red_peg\", \"pin\") in all_contact_pairs\n reward = 0\n if touch_left_gripper and touch_right_gripper: # touch both\n reward = 1\n if touch_left_gripper and touch_right_gripper and (not peg_touch_table) and (not socket_touch_table): # grasp both\n reward = 2\n if peg_touch_socket and (not peg_touch_table) and (not socket_touch_table): # peg and socket touching\n reward = 3\n if pin_touched: # successful insertion"
+ },
+ {
+ "comment": "This code snippet assigns a fixed reward value of 4 and then returns it. This suggests the reward is determined solely by this function without any external factors influencing it.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/ee_sim_env.py\":265-266",
+ "content": " reward = 4\n return reward"
+ }
+ ]
+}
\ No newline at end of file
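The staged reward logic described above only needs the set of contact pairs, so it can be illustrated without MuJoCo. The sketch below reproduces the transfer-cube reward stages on a plain list of (geom1, geom2) tuples; geom names follow the documented source:

```python
# Stand-alone sketch of the staged transfer-cube reward, operating on a plain
# list of contact pairs instead of live MuJoCo contact data.
def transfer_cube_reward(all_contact_pairs):
    touch_right_gripper = ("red_box", "vx300s_right/10_right_gripper_finger") in all_contact_pairs
    touch_left_gripper = ("red_box", "vx300s_left/10_left_gripper_finger") in all_contact_pairs
    touch_table = ("red_box", "table") in all_contact_pairs

    reward = 0
    if touch_right_gripper:
        reward = 1
    if touch_right_gripper and not touch_table:   # lifted off the table
        reward = 2
    if touch_left_gripper:                        # attempted transfer
        reward = 3
    if touch_left_gripper and not touch_table:    # successful transfer
        reward = 4
    return reward


# Example: box grasped by the right gripper and lifted off the table -> reward 2
print(transfer_cube_reward([("red_box", "vx300s_right/10_right_gripper_finger")]))
```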
diff --git a/docs/doc/95dbf634-c6f8-406d-bde4-51976e63f59c.json b/docs/doc/95dbf634-c6f8-406d-bde4-51976e63f59c.json
new file mode 100644
index 00000000..9a23bfd8
--- /dev/null
+++ b/docs/doc/95dbf634-c6f8-406d-bde4-51976e63f59c.json
@@ -0,0 +1,80 @@
+{
+ "summary": "EpisodicDataset class processes data, applies augmentations, handles legacy data, and provides torch tensor compatibility for model usage. It loads images, creates masks, retrieves stats, and includes functions for locating HDF5 files, generating batches, pre/post-processing, sampling poses, calculating means, and setting random seeds.",
+ "details": [
+ {
+ "comment": "Class EpisodicDataset loads episode data from a list of paths. It can optionally augment images depending on the chosen policy class. The dataset is initialized with the given parameters, including the number of episodes, their IDs, and lengths. It calculates the cumulative length of episodes and checks if the policy class is \"Diffusion\" to determine whether or not to apply image augmentations.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":0-32",
+ "content": "import numpy as np\nimport torch\nimport os\nimport h5py\nimport pickle\nimport fnmatch\nimport cv2\nfrom time import time\nfrom torch.utils.data import TensorDataset, DataLoader\nimport torchvision.transforms as transforms\nimport IPython\ne = IPython.embed\ndef flatten_list(l):\n return [item for sublist in l for item in sublist]\nclass EpisodicDataset(torch.utils.data.Dataset):\n def __init__(self, dataset_path_list, camera_names, norm_stats, episode_ids, episode_len, chunk_size, policy_class):\n super(EpisodicDataset).__init__()\n self.episode_ids = episode_ids\n self.dataset_path_list = dataset_path_list\n self.camera_names = camera_names\n self.norm_stats = norm_stats\n self.episode_len = episode_len\n self.chunk_size = chunk_size\n self.cumulative_len = np.cumsum(self.episode_len)\n self.max_episode_len = max(episode_len)\n self.policy_class = policy_class\n if self.policy_class == 'Diffusion':\n self.augment_images = True\n else:\n self.augment_images = False"
+ },
+ {
+ "comment": "This code initializes transformations and is_sim, defines a function to locate transition based on index, and gets item at specified index by locating the transition using episode ID and start timestamp. It also handles legacy data without certain attributes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":33-57",
+ "content": " self.transformations = None\n self.__getitem__(0) # initialize self.is_sim and self.transformations\n self.is_sim = False\n # def __len__(self):\n # return sum(self.episode_len)\n def _locate_transition(self, index):\n assert index < self.cumulative_len[-1]\n episode_index = np.argmax(self.cumulative_len > index) # argmax returns first True index\n start_ts = index - (self.cumulative_len[episode_index] - self.episode_len[episode_index])\n episode_id = self.episode_ids[episode_index]\n return episode_id, start_ts\n def __getitem__(self, index):\n episode_id, start_ts = self._locate_transition(index)\n dataset_path = self.dataset_path_list[episode_id]\n try:\n # print(dataset_path)\n with h5py.File(dataset_path, 'r') as root:\n try: # some legacy data does not have this attribute\n is_sim = root.attrs['sim']\n except:\n is_sim = False\n compressed = root.attrs.get('compress', False)"
+ },
+ {
+ "comment": "This code block is for processing the input data based on whether a base action is specified or not. If it exists, the base action is preprocessed and concatenated with the given action, otherwise a dummy base action is added before concatenation. It also stores the initial observation and image data at the start timestamp.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":58-76",
+ "content": " if '/base_action' in root:\n base_action = root['/base_action'][()]\n base_action = preprocess_base_action(base_action)\n action = np.concatenate([root['/action'][()], base_action], axis=-1)\n else: \n action = root['/action'][()]\n dummy_base_action = np.zeros([action.shape[0], 2])\n action = np.concatenate([action, dummy_base_action], axis=-1)\n original_action_shape = action.shape\n episode_len = original_action_shape[0]\n # get observation at start_ts only\n qpos = root['/observations/qpos'][start_ts]\n qvel = root['/observations/qvel'][start_ts]\n image_dict = dict()\n for cam_name in self.camera_names:\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][start_ts]\n if compressed:\n for cam_name in image_dict.keys():"
+ },
+ {
+ "comment": "This code segment is preprocessing video data for an agent in a simulation. It loads and decompresses images from the dictionary, adjusts actions based on timestamps, pads actions to match the maximum episode length, creates a padding mask, and stores camera images into a list.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":77-98",
+ "content": " decompressed_image = cv2.imdecode(image_dict[cam_name], 1)\n image_dict[cam_name] = np.array(decompressed_image)\n # get all actions after and including start_ts\n if is_sim:\n action = action[start_ts:]\n action_len = episode_len - start_ts\n else:\n action = action[max(0, start_ts - 1):] # hack, to make timesteps more aligned\n action_len = episode_len - max(0, start_ts - 1) # hack, to make timesteps more aligned\n # self.is_sim = is_sim\n padded_action = np.zeros((self.max_episode_len, original_action_shape[1]), dtype=np.float32)\n padded_action[:action_len] = action\n is_pad = np.zeros(self.max_episode_len)\n is_pad[action_len:] = 1\n padded_action = padded_action[:self.chunk_size]\n is_pad = is_pad[:self.chunk_size]\n # new axis for different cameras\n all_cam_images = []"
+ },
+ {
+ "comment": "The code reads images from multiple camera sources, stacks them into a single numpy array, and converts the arrays to torch tensors. It then rearranges the image tensor's dimensions for compatibility with the model, applies optional augmentations such as cropping and rotation, and assigns boolean values to indicate padding positions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":99-120",
+ "content": " for cam_name in self.camera_names:\n all_cam_images.append(image_dict[cam_name])\n all_cam_images = np.stack(all_cam_images, axis=0)\n # construct observations\n image_data = torch.from_numpy(all_cam_images)\n qpos_data = torch.from_numpy(qpos).float()\n action_data = torch.from_numpy(padded_action).float()\n is_pad = torch.from_numpy(is_pad).bool()\n # channel last\n image_data = torch.einsum('k h w c -> k c h w', image_data)\n # augmentation\n if self.transformations is None:\n print('Initializing transformations')\n original_size = image_data.shape[2:]\n ratio = 0.95\n self.transformations = [\n transforms.RandomCrop(size=[int(original_size[0] * ratio), int(original_size[1] * ratio)]),\n transforms.Resize(original_size, antialias=True),\n transforms.RandomRotation(degrees=[-5.0, 5.0], expand=False),"
+ },
+ {
+ "comment": "The code applies transformations to image data, normalizes the image and action data based on policy class, and adjusts qpos data based on mean and std. It also handles any potential errors while loading the dataset.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":121-144",
+ "content": " transforms.ColorJitter(brightness=0.3, contrast=0.4, saturation=0.5) #, hue=0.08)\n ]\n if self.augment_images:\n for transform in self.transformations:\n image_data = transform(image_data)\n # normalize image and change dtype to float\n image_data = image_data / 255.0\n if self.policy_class == 'Diffusion':\n # normalize to [-1, 1]\n action_data = ((action_data - self.norm_stats[\"action_min\"]) / (self.norm_stats[\"action_max\"] - self.norm_stats[\"action_min\"])) * 2 - 1\n else:\n # normalize to mean 0 std 1\n action_data = (action_data - self.norm_stats[\"action_mean\"]) / self.norm_stats[\"action_std\"]\n qpos_data = (qpos_data - self.norm_stats[\"qpos_mean\"]) / self.norm_stats[\"qpos_std\"]\n except:\n print(f'Error loading {dataset_path} in __getitem__')\n quit()\n # print(image_data.dtype, qpos_data.dtype, action_data.dtype, is_pad.dtype)"
+ },
+ {
+ "comment": "This function, \"get_norm_stats\", takes a list of dataset paths and returns image data, qpos data, action data, and an indicator whether the pad is needed or not. It first initializes empty lists for all_qpos_data, all_action_data, and all_episode_len. Then, it iterates over each dataset path in the list. For each path, it opens the HDF5 file using 'r' mode and extracts qpos and qvel data from specific paths within the file. If a '/base_action' path exists, it retrieves base_action data and preprocesses it before concatenating with action data. Otherwise, it assumes dummy base_action and performs concatenation. The extracted data is appended to their respective lists, but if an error occurs during loading, the function prints an error message and quits.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":145-170",
+ "content": " return image_data, qpos_data, action_data, is_pad\ndef get_norm_stats(dataset_path_list):\n all_qpos_data = []\n all_action_data = []\n all_episode_len = []\n for dataset_path in dataset_path_list:\n try:\n with h5py.File(dataset_path, 'r') as root:\n qpos = root['/observations/qpos'][()]\n qvel = root['/observations/qvel'][()]\n if '/base_action' in root:\n base_action = root['/base_action'][()]\n base_action = preprocess_base_action(base_action)\n action = np.concatenate([root['/action'][()], base_action], axis=-1)\n else:\n action = root['/action'][()]\n dummy_base_action = np.zeros([action.shape[0], 2])\n action = np.concatenate([action, dummy_base_action], axis=-1)\n except Exception as e:\n print(f'Error loading {dataset_path} in get_norm_stats')\n print(e)\n quit()\n all_qpos_data.append(torch.from_numpy(qpos))"
+ },
+ {
+ "comment": "This code is processing and normalizing data for training in a machine learning context. It appends action and qpos data, normalizes the action and qpos data by calculating their means, standard deviations, and clipping them to avoid large values, and stores these statistics along with minimum and maximum action values and an example qpos. Finally, it returns these statistics and all episode lengths.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":171-195",
+ "content": " all_action_data.append(torch.from_numpy(action))\n all_episode_len.append(len(qpos))\n all_qpos_data = torch.cat(all_qpos_data, dim=0)\n all_action_data = torch.cat(all_action_data, dim=0)\n # normalize action data\n action_mean = all_action_data.mean(dim=[0]).float()\n action_std = all_action_data.std(dim=[0]).float()\n action_std = torch.clip(action_std, 1e-2, np.inf) # clipping\n # normalize qpos data\n qpos_mean = all_qpos_data.mean(dim=[0]).float()\n qpos_std = all_qpos_data.std(dim=[0]).float()\n qpos_std = torch.clip(qpos_std, 1e-2, np.inf) # clipping\n action_min = all_action_data.min(dim=0).values.float()\n action_max = all_action_data.max(dim=0).values.float()\n eps = 0.0001\n stats = {\"action_mean\": action_mean.numpy(), \"action_std\": action_std.numpy(),\n \"action_min\": action_min.numpy() - eps,\"action_max\": action_max.numpy() + eps,\n \"qpos_mean\": qpos_mean.numpy(), \"qpos_std\": qpos_std.numpy(),\n \"example_qpos\": qpos}\n return stats, all_episode_len"
+ },
+ {
+ "comment": "The code provides two functions: \"find_all_hdf5\" and \"BatchSampler\". The first function searches for all HDF5 files in a specified directory, excluding any with 'features' in their name or 'mirror' if skipping mirrored data is set. It then returns the list of found files. The second function, BatchSampler, generates batches of samples from a list of episode lengths and sample weights (if provided). It randomly selects an episode, a step within that episode, and appends it to the batch until the desired batch size is reached.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":197-217",
+ "content": "def find_all_hdf5(dataset_dir, skip_mirrored_data):\n hdf5_files = []\n for root, dirs, files in os.walk(dataset_dir):\n for filename in fnmatch.filter(files, '*.hdf5'):\n if 'features' in filename: continue\n if skip_mirrored_data and 'mirror' in filename:\n continue\n hdf5_files.append(os.path.join(root, filename))\n print(f'Found {len(hdf5_files)} hdf5 files')\n return hdf5_files\ndef BatchSampler(batch_size, episode_len_l, sample_weights):\n sample_probs = np.array(sample_weights) / np.sum(sample_weights) if sample_weights is not None else None\n sum_dataset_len_l = np.cumsum([0] + [np.sum(episode_len) for episode_len in episode_len_l])\n while True:\n batch = []\n for _ in range(batch_size):\n episode_idx = np.random.choice(len(episode_len_l), p=sample_probs)\n step_idx = np.random.randint(sum_dataset_len_l[episode_idx], sum_dataset_len_l[episode_idx + 1])\n batch.append(step_idx)\n yield batch"
+ },
+ {
+ "comment": "This function loads data from one or multiple directories, applying a name filter and splitting the data into training and validation sets. It also supports skipping mirrored data and loading pre-trained data. The train/val split is done based on a provided ratio, and the data is shuffled randomly before splitting.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":219-232",
+ "content": "def load_data(dataset_dir_l, name_filter, camera_names, batch_size_train, batch_size_val, chunk_size, skip_mirrored_data=False, load_pretrain=False, policy_class=None, stats_dir_l=None, sample_weights=None, train_ratio=0.99):\n if type(dataset_dir_l) == str:\n dataset_dir_l = [dataset_dir_l]\n dataset_path_list_list = [find_all_hdf5(dataset_dir, skip_mirrored_data) for dataset_dir in dataset_dir_l]\n num_episodes_0 = len(dataset_path_list_list[0])\n dataset_path_list = flatten_list(dataset_path_list_list)\n dataset_path_list = [n for n in dataset_path_list if name_filter(n)]\n num_episodes_l = [len(dataset_path_list) for dataset_path_list in dataset_path_list_list]\n num_episodes_cumsum = np.cumsum(num_episodes_l)\n # obtain train test split on dataset_dir_l[0]\n shuffled_episode_ids_0 = np.random.permutation(num_episodes_0)\n train_episode_ids_0 = shuffled_episode_ids_0[:int(train_ratio * num_episodes_0)]\n val_episode_ids_0 = shuffled_episode_ids_0[int(train_ratio * num_episodes_0):]"
+ },
+ {
+ "comment": "Code generates train and validation episode IDs for multiple datasets, concatenates them, and prints details about the data. It also loads normalization stats for qpos and action (if load_pretrain is True) from a specific file path. The code then calculates the length of each episode for training and validation sets based on all_episode_len list.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":233-246",
+ "content": " train_episode_ids_l = [train_episode_ids_0] + [np.arange(num_episodes) + num_episodes_cumsum[idx] for idx, num_episodes in enumerate(num_episodes_l[1:])]\n val_episode_ids_l = [val_episode_ids_0]\n train_episode_ids = np.concatenate(train_episode_ids_l)\n val_episode_ids = np.concatenate(val_episode_ids_l)\n print(f'\\n\\nData from: {dataset_dir_l}\\n- Train on {[len(x) for x in train_episode_ids_l]} episodes\\n- Test on {[len(x) for x in val_episode_ids_l]} episodes\\n\\n')\n # obtain normalization stats for qpos and action\n # if load_pretrain:\n # with open(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'dataset_stats.pkl'), 'rb') as f:\n # norm_stats = pickle.load(f)\n # print('Loaded pretrain dataset stats')\n _, all_episode_len = get_norm_stats(dataset_path_list)\n train_episode_len_l = [[all_episode_len[i] for i in train_episode_ids] for train_episode_ids in train_episode_ids_l]\n val_episode_len_l = [[all_episode_len[i] for i in val_episode_ids] for val_episode_ids in val_episode_ids_l]"
+ },
+ {
+ "comment": "This code block initializes training and validation episode lengths, checks the stats directory type, fetches normalization statistics from HDF5 files, creates batch samplers for training and validation sets, constructs EpisodicDataset instances for training and validation data.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":247-263",
+ "content": " train_episode_len = flatten_list(train_episode_len_l)\n val_episode_len = flatten_list(val_episode_len_l)\n if stats_dir_l is None:\n stats_dir_l = dataset_dir_l\n elif type(stats_dir_l) == str:\n stats_dir_l = [stats_dir_l]\n norm_stats, _ = get_norm_stats(flatten_list([find_all_hdf5(stats_dir, skip_mirrored_data) for stats_dir in stats_dir_l]))\n print(f'Norm stats from: {stats_dir_l}')\n batch_sampler_train = BatchSampler(batch_size_train, train_episode_len_l, sample_weights)\n batch_sampler_val = BatchSampler(batch_size_val, val_episode_len_l, None)\n # print(f'train_episode_len: {train_episode_len}, val_episode_len: {val_episode_len}, train_episode_ids: {train_episode_ids}, val_episode_ids: {val_episode_ids}')\n # construct dataset and dataloader\n train_dataset = EpisodicDataset(dataset_path_list, camera_names, norm_stats, train_episode_ids, train_episode_len, chunk_size, policy_class)\n val_dataset = EpisodicDataset(dataset_path_list, camera_names, norm_stats, val_episode_ids, val_episode_len, chunk_size, policy_class)"
+ },
+ {
+ "comment": "This code sets the number of workers for training and validation data loaders based on whether images are being augmented or not. It also defines a function to calibrate linear velocity, smooths the base action using convolution with a moving average filter, and returns the train and validation dataloaders along with other variables.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":264-283",
+ "content": " train_num_workers = (8 if os.getlogin() == 'zfu' else 16) if train_dataset.augment_images else 2\n val_num_workers = 8 if train_dataset.augment_images else 2\n print(f'Augment images: {train_dataset.augment_images}, train_num_workers: {train_num_workers}, val_num_workers: {val_num_workers}')\n train_dataloader = DataLoader(train_dataset, batch_sampler=batch_sampler_train, pin_memory=True, num_workers=train_num_workers, prefetch_factor=2)\n val_dataloader = DataLoader(val_dataset, batch_sampler=batch_sampler_val, pin_memory=True, num_workers=val_num_workers, prefetch_factor=2)\n return train_dataloader, val_dataloader, norm_stats, train_dataset.is_sim\ndef calibrate_linear_vel(base_action, c=None):\n if c is None:\n c = 0.0 # 0.19\n v = base_action[..., 0]\n w = base_action[..., 1]\n base_action = base_action.copy()\n base_action[..., 0] = v - c * w\n return base_action\ndef smooth_base_action(base_action):\n return np.stack([\n np.convolve(base_action[:, i], np.ones(5)/5, mode='same') for i in range(base_action.shape[1])"
+ },
+ {
+ "comment": "This code defines several functions for preprocessing and postprocessing base actions, as well as sampling random poses for objects. It uses numpy array manipulations and random sampling to accomplish these tasks. The calibration and smoothing of the base action are used to refine input data before it is passed on or returned from a function. The two pose-sampling functions generate random positions and orientations for an object (cube or peg) within specified ranges.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":284-323",
+ "content": " ], axis=-1).astype(np.float32)\ndef preprocess_base_action(base_action):\n # base_action = calibrate_linear_vel(base_action)\n base_action = smooth_base_action(base_action)\n return base_action\ndef postprocess_base_action(base_action):\n linear_vel, angular_vel = base_action\n linear_vel *= 1.0\n angular_vel *= 1.0\n # angular_vel = 0\n # if np.abs(linear_vel) < 0.05:\n # linear_vel = 0\n return np.array([linear_vel, angular_vel])\n### env utils\ndef sample_box_pose():\n x_range = [0.0, 0.2]\n y_range = [0.4, 0.6]\n z_range = [0.05, 0.05]\n ranges = np.vstack([x_range, y_range, z_range])\n cube_position = np.random.uniform(ranges[:, 0], ranges[:, 1])\n cube_quat = np.array([1, 0, 0, 0])\n return np.concatenate([cube_position, cube_quat])\ndef sample_insertion_pose():\n # Peg\n x_range = [0.1, 0.2]\n y_range = [0.4, 0.6]\n z_range = [0.05, 0.05]\n ranges = np.vstack([x_range, y_range, z_range])\n peg_position = np.random.uniform(ranges[:, 0], ranges[:, 1])\n peg_quat = np.array([1, 0, 0, 0])"
+ },
+ {
+ "comment": "Function: compute_dict_mean\nPurpose: Calculate the mean of values for each key in a list of dictionaries.\n\nFunction: detach_dict\nPurpose: Create a new dictionary where all values are detached from their current computation graph.\n\nFunction: set_seed\nPurpose: Set random seed for both PyTorch and NumPy to ensure reproducible results.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/utils.py\":324-359",
+ "content": " peg_pose = np.concatenate([peg_position, peg_quat])\n # Socket\n x_range = [-0.2, -0.1]\n y_range = [0.4, 0.6]\n z_range = [0.05, 0.05]\n ranges = np.vstack([x_range, y_range, z_range])\n socket_position = np.random.uniform(ranges[:, 0], ranges[:, 1])\n socket_quat = np.array([1, 0, 0, 0])\n socket_pose = np.concatenate([socket_position, socket_quat])\n return peg_pose, socket_pose\n### helper functions\ndef compute_dict_mean(epoch_dicts):\n result = {k: None for k in epoch_dicts[0]}\n num_items = len(epoch_dicts)\n for k in result:\n value_sum = 0\n for epoch_dict in epoch_dicts:\n value_sum += epoch_dict[k]\n result[k] = value_sum / num_items\n return result\ndef detach_dict(d):\n new_d = dict()\n for k, v in d.items():\n new_d[k] = v.detach()\n return new_d\ndef set_seed(seed):\n torch.manual_seed(seed)\n np.random.seed(seed)"
+ }
+ ]
+}
\ No newline at end of file
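The flat-index-to-(episode, timestep) mapping used by EpisodicDataset._locate_transition relies on the cumulative sum of episode lengths. A small sketch with made-up episode lengths:

```python
# Sketch of cumulative-length indexing: map a flat sample index to (episode, timestep).
import numpy as np

episode_len = [5, 3, 7]                   # timesteps per episode (made up)
cumulative_len = np.cumsum(episode_len)   # [5, 8, 15]

def locate_transition(index):
    assert index < cumulative_len[-1]
    episode_index = np.argmax(cumulative_len > index)   # first episode whose cumsum exceeds index
    start_ts = index - (cumulative_len[episode_index] - episode_len[episode_index])
    return episode_index, start_ts

print(locate_transition(0))   # (0, 0)
print(locate_transition(6))   # (1, 1)
print(locate_transition(14))  # (2, 6)
```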
diff --git a/docs/doc/a0556bb0-cf54-4bde-8083-aa4ee6740ee1.json b/docs/doc/a0556bb0-cf54-4bde-8083-aa4ee6740ee1.json
new file mode 100644
index 00000000..4786f8f6
--- /dev/null
+++ b/docs/doc/a0556bb0-cf54-4bde-8083-aa4ee6740ee1.json
@@ -0,0 +1,10 @@
+{
+ "summary": "This code imports DynamixelClient and creates an instance with IDs 1 and 2, connects to the '/dev/ttyDXL_wheels' port in a non-blocking manner. It then prints the current position, velocity, and current information of the connected motors.",
+ "details": [
+ {
+ "comment": "This code imports DynamixelClient and creates an instance with IDs 1 and 2, connects to the '/dev/ttyDXL_wheels' port in a non-blocking manner. It then prints the current position, velocity, and current information of the connected motors.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/dxl_test.py\":0-3",
+ "content": "from dynamixel_client import DynamixelClient\nclient = DynamixelClient([1, 2], port='/dev/ttyDXL_wheels', lazy_connect=True)\nprint(client.read_pos_vel_cur())"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/bd818ef2-b15d-49da-9128-3134dac15a8e.json b/docs/doc/bd818ef2-b15d-49da-9128-3134dac15a8e.json
new file mode 100644
index 00000000..4c117c98
--- /dev/null
+++ b/docs/doc/bd818ef2-b15d-49da-9128-3134dac15a8e.json
@@ -0,0 +1,35 @@
+{
+ "summary": "The code imports libraries, sets parameters, initializes models and preprocesses images for feature extraction. It performs inference, saves features to an HDF5 file, converts tensors to NumPy arrays, and prints the total time taken using argument parser.",
+ "details": [
+ {
+ "comment": "This code imports necessary libraries, defines functions for chunking lists and expanding greyscale images, and sets parameters such as batch size. It also takes command-line arguments for the checkpoint path and dataset directory, extracts relevant information from the checkpoint name, and lists all episode indexes in the dataset.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_cache_feature.py\":0-43",
+ "content": "import torch\nimport argparse\nimport pathlib\nfrom torch import nn\nimport torchvision\nimport os\nimport time\nimport h5py\nimport h5py\nfrom torchvision import models, transforms\nfrom PIL import Image\nfrom tqdm import tqdm\nimport cv2\nimport numpy as np\nimport IPython\ne = IPython.embed\ndef chunks(lst, n):\n \"\"\"Yield successive n-sized chunks from lst.\"\"\"\n for i in range(0, len(lst), n):\n yield lst[i:i + n]\ndef expand_greyscale(t):\n return t.expand(3, -1, -1)\ndef main(args):\n #################################################\n batch_size = 256\n #################################################\n ckpt_path = args.ckpt_path\n dataset_dir = args.dataset_dir\n ckpt_name = pathlib.PurePath(ckpt_path).name\n dataset_name = ckpt_name.split('-')[1]\n repr_type = ckpt_name.split('-')[0]\n seed = int(ckpt_name.split('-')[-1][:-3])\n if 'cotrain' in ckpt_name:\n repr_type += '_cotrain'\n episode_idxs = [int(name.split('_')[1].split('.')[0]) for name in os.listdir(dataset_dir) if ('.hdf5' in name) and ('features' not in name)]"
+ },
+ {
+ "comment": "Loading data and models for each episode, ensuring no holes in the episode indices, and creating feature extractors. The code first checks if there are any existing feature extractors, then loads images and models for each camera name within the dataset, and stores them in a dictionary.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_cache_feature.py\":44-71",
+ "content": " episode_idxs.sort()\n assert len(episode_idxs) == episode_idxs[-1] + 1 # no holes\n num_episodes = len(episode_idxs)\n feature_extractors = {}\n for episode_idx in range(num_episodes):\n # load all images\n print(f'loading data')\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_idx}.hdf5')\n with h5py.File(dataset_path, 'r') as root:\n image_dict = {}\n camera_names = list(root[f'/observations/images/'].keys())\n print(f'Camera names: {camera_names}')\n for cam_name in camera_names:\n image = root[f'/observations/images/{cam_name}'][:]\n uncompressed_image = []\n for im in image:\n im = np.array(cv2.imdecode(im, 1))\n uncompressed_image.append(im)\n image = np.stack(uncompressed_image, axis=0)\n image_dict[cam_name] = image\n print(f'loading model')\n # load pretrain nets after cam names are known\n if not feature_extractors:"
+ },
+ {
+ "comment": "This code initializes a ResNet18 model for each camera name, loads the checkpoint file with the corresponding camera name, modifies the model, and stores it in feature_extractors. Then, it preprocesses images using specified transforms and normalization before passing them to the model for inference.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_cache_feature.py\":72-92",
+ "content": " for cam_name in camera_names:\n resnet = torchvision.models.resnet18(pretrained=True)\n loading_status = resnet.load_state_dict(torch.load(ckpt_path.replace('DUMMY', cam_name)))\n print(cam_name, loading_status)\n resnet = nn.Sequential(*list(resnet.children())[:-1])\n resnet = resnet.cuda()\n resnet.eval()\n feature_extractors[cam_name] = resnet\n # inference with resnet\n feature_dict = {}\n for cam_name, images in image_dict.items():\n # Preprocess images\n image_size = 120 # TODO NOTICE: reduced resolution\n transform = transforms.Compose([\n transforms.Resize(image_size), # will scale the image\n transforms.CenterCrop(image_size),\n transforms.ToTensor(),\n transforms.Lambda(expand_greyscale),\n transforms.Normalize(\n mean=torch.tensor([0.485, 0.456, 0.406]),"
+ },
+ {
+ "comment": "This code processes images, queries a model for features, and stores the extracted features in a dictionary. It uses torch.tensor for standardization, Image.fromarray to convert image to PIL image, transforms images, stacks them, performs inference mode, extracts features from each batch of processed images, concatenates them into all_features list, and finally stores them in feature_dict.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_cache_feature.py\":93-116",
+ "content": " std=torch.tensor([0.229, 0.224, 0.225])),\n ])\n processed_images = []\n for image in tqdm(images):\n image = Image.fromarray(image)\n image = transform(image)\n processed_images.append(image)\n processed_images = torch.stack(processed_images).cuda()\n # query the model\n all_features = []\n with torch.inference_mode():\n for batch in chunks(processed_images, batch_size):\n print('inference')\n features = feature_extractors[cam_name](batch)\n features = features.squeeze(axis=3).squeeze(axis=2)\n all_features.append(features)\n all_features = torch.cat(all_features, axis=0)\n max_timesteps = all_features.shape[0]\n feature_dict[cam_name] = all_features\n # TODO START diagnostics\n # first_image = images[0]\n # first_processed_image = processed_images[0].cpu().numpy()"
+ },
+ {
+ "comment": "The code is saving features to an HDF5 file. It creates a group called 'features' within the file and then saves feature data for each camera name in the feature_dict as datasets within the 'features' group. The feature data is converted from PyTorch tensors to NumPy arrays before being saved, and the total time taken to save the features is printed.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_cache_feature.py\":117-141",
+ "content": " # first_feature = all_features[0].cpu().numpy()\n # import numpy as np\n # np.save('first_image.npy', first_image)\n # np.save('first_processed_image.npy', first_processed_image)\n # np.save('first_feature.npy', first_feature)\n # torch.save(resnet.state_dict(), 'rn.ckpt')\n # e()\n # exit()\n # TODO END diagnostics\n # save\n dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_idx}.hdf5')\n print(dataset_path)\n # HDF5\n t0 = time.time()\n with h5py.File(dataset_path, 'w', rdcc_nbytes=1024 ** 2 * 2) as root:\n features = root.create_group('features')\n for cam_name, array in feature_dict.items():\n cam_feature = features.create_dataset(cam_name, (max_timesteps, 512))\n features[cam_name][...] = array.cpu().numpy()\n print(f'Saving: {time.time() - t0:.1f} secs\\n')\nif __name__ == '__main__':"
+ },
+ {
+ "comment": "This code sets up an argument parser, adds arguments for ckpt_path and dataset_dir with necessary types and requirements, and then parses the given arguments to be used in the main function.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/vinn_cache_feature.py\":142-147",
+ "content": " parser = argparse.ArgumentParser(description='cache features')\n parser.add_argument('--ckpt_path', type=str, required=True, help='ckpt_path')\n parser.add_argument('--dataset_dir', type=str, required=True, help='dataset_dir')\n args = parser.parse_args()\n main(args)"
+ }
+ ]
+}
\ No newline at end of file
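For reference, the cached features written by this script can be read back with plain h5py. The sketch below follows the 'features/<cam_name>' layout and file-name pattern shown above; the concrete path is purely hypothetical:

```python
# Sketch: reading one cached feature file produced by vinn_cache_feature.py.
import h5py

path = 'data/sim_transfer_cube/byol_features_seed0_episode_0.hdf5'  # hypothetical path
with h5py.File(path, 'r') as root:
    for cam_name in root['features'].keys():
        feats = root[f'features/{cam_name}'][()]   # shape (num_timesteps, 512), one row per frame
        print(cam_name, feats.shape, feats.dtype)
```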
diff --git a/docs/doc/be5aa4ad-78d6-43e7-9242-66ebe1cc3c05.json b/docs/doc/be5aa4ad-78d6-43e7-9242-66ebe1cc3c05.json
new file mode 100644
index 00000000..ab06d7a7
--- /dev/null
+++ b/docs/doc/be5aa4ad-78d6-43e7-9242-66ebe1cc3c05.json
@@ -0,0 +1,45 @@
+{
+ "summary": "The code imports libraries, initializes policy and environment, iterates over time steps, takes actions, updates state, determines success, and evaluates simulation episodes, storing data in an HDF5 file for visualization or analysis. It also creates datasets from camera images, qpos, and actions.",
+ "details": [
+ {
+ "comment": "The code imports necessary libraries, defines the main function to generate demonstration data in simulation. It first rolls out policy in ee_sim_env and obtains joint trajectory, then replaces gripper joint positions with commanded positions. Finally, it replay joint trajectory in sim_env and record observations for each episode before saving the dataset.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":0-32",
+ "content": "import time\nimport os\nimport numpy as np\nimport argparse\nimport matplotlib.pyplot as plt\nimport h5py\nfrom constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN, SIM_TASK_CONFIGS\nfrom ee_sim_env import make_ee_sim_env\nfrom sim_env import make_sim_env, BOX_POSE\nfrom scripted_policy import PickAndTransferPolicy, InsertionPolicy\nimport IPython\ne = IPython.embed\ndef main(args):\n \"\"\"\n Generate demonstration data in simulation.\n First rollout the policy (defined in ee space) in ee_sim_env. Obtain the joint trajectory.\n Replace the gripper joint positions with the commanded joint position.\n Replay this joint trajectory (as action sequence) in sim_env, and record all observations.\n Save this episode of data, and continue to next episode of data collection.\n \"\"\"\n task_name = args['task_name']\n dataset_dir = args['dataset_dir']\n num_episodes = args['num_episodes']\n onscreen_render = args['onscreen_render']\n inject_noise = False\n render_cam_name = 'top'\n if not os.path.isdir(dataset_dir):"
+ },
+ {
+ "comment": "This code snippet is creating a new directory for the dataset, setting up the episode length and camera names based on the task name, and then initializing the policy class depending on the task. It also creates an empty list for success and starts a loop for each episode where it sets up the environment, resets the environment, creates an episode list with the first observation, initializes the policy, and then starts another loop to iterate through steps in each episode.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":33-60",
+ "content": " os.makedirs(dataset_dir, exist_ok=True)\n episode_len = SIM_TASK_CONFIGS[task_name]['episode_len']\n camera_names = SIM_TASK_CONFIGS[task_name]['camera_names']\n if task_name == 'sim_transfer_cube_scripted':\n policy_cls = PickAndTransferPolicy\n elif task_name == 'sim_insertion_scripted':\n policy_cls = InsertionPolicy\n elif task_name == 'sim_transfer_cube_scripted_mirror':\n policy_cls = PickAndTransferPolicy\n else:\n raise NotImplementedError\n success = []\n for episode_idx in range(num_episodes):\n print(f'{episode_idx=}')\n print('Rollout out EE space scripted policy')\n # setup the environment\n env = make_ee_sim_env(task_name)\n ts = env.reset()\n episode = [ts]\n policy = policy_cls(inject_noise)\n # setup plotting\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images'][render_cam_name])\n plt.ion()\n for step in range(episode_len):"
+ },
+ {
+ "comment": "This code is iterating over each time step in the episode, taking actions based on a policy, updating the state, and appending the state to the episode list. It also renders images for each state if the onscreen_render flag is set. It calculates the episode return and maximum reward, then prints whether the episode was successful or not. Finally, it extracts joint and gripper control trajectories from the episode and applies normalization to gripper positions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":61-83",
+ "content": " action = policy(ts)\n ts = env.step(action)\n episode.append(ts)\n if onscreen_render:\n plt_img.set_data(ts.observation['images'][render_cam_name])\n plt.pause(0.002)\n plt.close()\n episode_return = np.sum([ts.reward for ts in episode[1:]])\n episode_max_reward = np.max([ts.reward for ts in episode[1:]])\n if episode_max_reward == env.task.max_reward:\n print(f\"{episode_idx=} Successful, {episode_return=}\")\n else:\n print(f\"{episode_idx=} Failed\")\n joint_traj = [ts.observation['qpos'] for ts in episode]\n # replace gripper pose with gripper control\n gripper_ctrl_traj = [ts.observation['gripper_ctrl'] for ts in episode]\n for joint, ctrl in zip(joint_traj, gripper_ctrl_traj):\n left_ctrl = PUPPET_GRIPPER_POSITION_NORMALIZE_FN(ctrl[0])\n right_ctrl = PUPPET_GRIPPER_POSITION_NORMALIZE_FN(ctrl[2])\n joint[6] = left_ctrl\n joint[6+7] = right_ctrl"
+ },
+ {
+ "comment": "This code is replaying joint commands from a previous episode. It first saves the initial box pose, clears unused variables, sets up the environment, and resets it. Then, for each joint command in the trajectory, it performs an action in the environment and appends the new state to the episode_replay list. If onscreen_render is True, it updates a plot with the current observation image. Finally, it calculates the total reward from the episode and stores it as episode_return.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":85-112",
+ "content": " subtask_info = episode[0].observation['env_state'].copy() # box pose at step 0\n # clear unused variables\n del env\n del episode\n del policy\n # setup the environment\n print('Replaying joint commands')\n env = make_sim_env(task_name)\n BOX_POSE[0] = subtask_info # make sure the sim_env has the same object configurations as ee_sim_env\n ts = env.reset()\n episode_replay = [ts]\n # setup plotting\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(ts.observation['images'][render_cam_name])\n plt.ion()\n for t in range(len(joint_traj)): # note: this will increase episode length by 1\n action = joint_traj[t]\n ts = env.step(action)\n episode_replay.append(ts)\n if onscreen_render:\n plt_img.set_data(ts.observation['images'][render_cam_name])\n plt.pause(0.02)\n episode_return = np.sum([ts.reward for ts in episode_replay[1:]])"
+ },
+ {
+ "comment": "This code measures the success of each episode in a simulation by checking if the maximum reward reached the maximum possible reward. If it did, the episode is considered successful and printed as such; otherwise, it's considered a failure. The code also collects observations and actions into a data dictionary for potential visualization or analysis purposes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":113-144",
+ "content": " episode_max_reward = np.max([ts.reward for ts in episode_replay[1:]])\n if episode_max_reward == env.task.max_reward:\n success.append(1)\n print(f\"{episode_idx=} Successful, {episode_return=}\")\n else:\n success.append(0)\n print(f\"{episode_idx=} Failed\")\n plt.close()\n \"\"\"\n For each timestep:\n observations\n - images\n - each_cam_name (480, 640, 3) 'uint8'\n - qpos (14,) 'float64'\n - qvel (14,) 'float64'\n action (14,) 'float64'\n \"\"\"\n data_dict = {\n '/observations/qpos': [],\n '/observations/qvel': [],\n '/action': [],\n }\n for cam_name in camera_names:\n data_dict[f'/observations/images/{cam_name}'] = []\n # because the replaying, there will be eps_len + 1 actions and eps_len + 2 timesteps\n # truncate here to be consistent\n joint_traj = joint_traj[:-1]"
+ },
+ {
+ "comment": "This code segment is part of a function that processes episode data from a simulation and saves it as an HDF5 file. It extracts observations, actions, and camera images from the episode replay and stores them in the dictionary \"data_dict\". After processing all timesteps, it creates an HDF5 file with the episode data, including attributes and groups for observations and images.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":145-166",
+ "content": " episode_replay = episode_replay[:-1]\n # len(joint_traj) i.e. actions: max_timesteps\n # len(episode_replay) i.e. time steps: max_timesteps + 1\n max_timesteps = len(joint_traj)\n while joint_traj:\n action = joint_traj.pop(0)\n ts = episode_replay.pop(0)\n data_dict['/observations/qpos'].append(ts.observation['qpos'])\n data_dict['/observations/qvel'].append(ts.observation['qvel'])\n data_dict['/action'].append(action)\n for cam_name in camera_names:\n data_dict[f'/observations/images/{cam_name}'].append(ts.observation['images'][cam_name])\n # HDF5\n t0 = time.time()\n dataset_path = os.path.join(dataset_dir, f'episode_{episode_idx}')\n with h5py.File(dataset_path + '.hdf5', 'w', rdcc_nbytes=1024 ** 2 * 2) as root:\n root.attrs['sim'] = True\n obs = root.create_group('observations')\n image = obs.create_group('images')\n for cam_name in camera_names:"
+ },
+ {
+ "comment": "This code creates datasets for camera images, qpos, qvel, and actions in a specific order. It then assigns the array values to corresponding names within the root dataset. Finally, it provides statistics on the saving time, saved location, and success rate of the task. The code assumes 'max_timesteps', 'data_dict', 'cam_name' and 'obs' are predefined variables.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":167-185",
+ "content": " _ = image.create_dataset(cam_name, (max_timesteps, 480, 640, 3), dtype='uint8',\n chunks=(1, 480, 640, 3), )\n # compression='gzip',compression_opts=2,)\n # compression=32001, compression_opts=(0, 0, 0, 0, 9, 1, 1), shuffle=False)\n qpos = obs.create_dataset('qpos', (max_timesteps, 14))\n qvel = obs.create_dataset('qvel', (max_timesteps, 14))\n action = root.create_dataset('action', (max_timesteps, 14))\n for name, array in data_dict.items():\n root[name][...] = array\n print(f'Saving: {time.time() - t0:.1f} secs\\n')\n print(f'Saved to {dataset_dir}')\n print(f'Success: {np.sum(success)} / {len(success)}')\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--dataset_dir', action='store', type=str, help='dataset saving dir', required=True)"
+ },
+ {
+ "comment": "The code above adds command line arguments for the number of episodes and on-screen rendering to a parser. The 'num_episodes' argument is of type int, required=False, and helps specify the number of episodes to run. The 'onscreen_render' argument, when set to true, enables on-screen rendering during game playback. The main function takes the arguments parsed by the parser object to execute the program.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/record_sim_episodes.py\":186-189",
+ "content": " parser.add_argument('--num_episodes', action='store', type=int, help='num_episodes', required=False)\n parser.add_argument('--onscreen_render', action='store_true')\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
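The episode files written by record_sim_episodes.py can be inspected directly with h5py. A minimal sketch, assuming an episode_0.hdf5 saved by the script above (the path is illustrative):

```python
# Sketch: inspecting one saved episode, following the dataset layout documented above.
import h5py

with h5py.File('data/sim_transfer_cube_scripted/episode_0.hdf5', 'r') as root:
    print('sim:', root.attrs['sim'])
    qpos = root['/observations/qpos'][()]               # (T, 14) joint positions
    action = root['/action'][()]                        # (T, 14) commanded joint positions
    cam_names = list(root['/observations/images'].keys())
    frame = root[f'/observations/images/{cam_names[0]}'][0]   # (480, 640, 3) uint8
    print(qpos.shape, action.shape, cam_names, frame.shape)
```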
diff --git a/docs/doc/bf83c79f-d679-48b7-b3e1-b0cadbe952ee.json b/docs/doc/bf83c79f-d679-48b7-b3e1-b0cadbe952ee.json
new file mode 100644
index 00000000..632859c1
--- /dev/null
+++ b/docs/doc/bf83c79f-d679-48b7-b3e1-b0cadbe952ee.json
@@ -0,0 +1,40 @@
+{
+ "summary": "This script uses argparse to control options for a deep learning model's transformer detector, initializing the model on GPU and creating an AdamW optimizer before returning the model and optimizer.",
+ "details": [
+ {
+ "comment": "This code imports necessary libraries and functions, defines a parser for command-line arguments, and sets default values for those arguments. It also includes options to customize the backbone model, learning rates, and weight decay for training a transformer detector.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/main.py\":0-24",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport argparse\nfrom pathlib import Path\nimport numpy as np\nimport torch\nfrom .models import build_ACT_model, build_CNNMLP_model\nimport IPython\ne = IPython.embed\ndef get_args_parser():\n parser = argparse.ArgumentParser('Set transformer detector', add_help=False)\n parser.add_argument('--lr', default=1e-4, type=float) # will be overridden\n parser.add_argument('--lr_backbone', default=1e-5, type=float) # will be overridden\n parser.add_argument('--batch_size', default=2, type=int) # not used\n parser.add_argument('--weight_decay', default=1e-4, type=float)\n parser.add_argument('--epochs', default=300, type=int) # not used\n parser.add_argument('--lr_drop', default=200, type=int) # not used\n parser.add_argument('--clip_max_norm', default=0.1, type=float, # not used\n help='gradient clipping max norm')\n # Model parameters\n # * Backbone\n parser.add_argument('--backbone', default='resnet18', type=str, # will be overridden"
+ },
+ {
+ "comment": "This code is defining command line arguments for the main function of a deep learning model. The options include specifying the backbone, enabling dilation in the last convolutional block, choosing the type of positional embedding, and setting the number of encoding and decoding layers as well as the feedforward dimension size in the transformer component of the model.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/main.py\":25-39",
+ "content": " help=\"Name of the convolutional backbone to use\")\n parser.add_argument('--dilation', action='store_true',\n help=\"If true, we replace stride with dilation in the last convolutional block (DC5)\")\n parser.add_argument('--position_embedding', default='sine', type=str, choices=('sine', 'learned'),\n help=\"Type of positional embedding to use on top of the image features\")\n parser.add_argument('--camera_names', default=[], type=list, # will be overridden\n help=\"A list of camera names\")\n # * Transformer\n parser.add_argument('--enc_layers', default=4, type=int, # will be overridden\n help=\"Number of encoding layers in the transformer\")\n parser.add_argument('--dec_layers', default=6, type=int, # will be overridden\n help=\"Number of decoding layers in the transformer\")\n parser.add_argument('--dim_feedforward', default=2048, type=int, # will be overridden\n"
+ },
+ {
+ "comment": "This code is using the argparse module to define command line arguments for a Python script. The arguments include options such as intermediate layer size, hidden dimensions, dropout rate, number of attention heads, number of query slots, pre-normalization, and training segmentation head. The `eval` argument is used to evaluate the model.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/main.py\":39-55",
+ "content": " help=\"Intermediate size of the feedforward layers in the transformer blocks\")\n parser.add_argument('--hidden_dim', default=256, type=int, # will be overridden\n help=\"Size of the embeddings (dimension of the transformer)\")\n parser.add_argument('--dropout', default=0.1, type=float,\n help=\"Dropout applied in the transformer\")\n parser.add_argument('--nheads', default=8, type=int, # will be overridden\n help=\"Number of attention heads inside the transformer's attentions\")\n parser.add_argument('--num_queries', default=400, type=int, # will be overridden\n help=\"Number of query slots\")\n parser.add_argument('--pre_norm', action='store_true')\n # * Segmentation\n parser.add_argument('--masks', action='store_true',\n help=\"Train segmentation head if the flag is provided\")\n # repeat args in imitate_episodes just to avoid error. Will not be used\n parser.add_argument('--eval', action='store_true')"
+ },
+ {
+ "comment": "The code defines command-line arguments using the \"argparse\" module. It requires a directory for checkpoints, policy class name, task name, seed value, number of steps, and optional arguments like KL weight, chunk size, temporal aggregation, use VQ, VQ class, and VQ dimension.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/main.py\":56-68",
+ "content": " parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)\n parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)\n parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--seed', action='store', type=int, help='seed', required=True)\n parser.add_argument('--num_steps', action='store', type=int, help='num_epochs', required=True)\n parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)\n parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)\n parser.add_argument('--temporal_agg', action='store_true')\n parser.add_argument('--use_vq', action='store_true')\n parser.add_argument('--vq_class', action='store', type=int, help='vq_class', required=False)\n parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim', required=False)"
+ },
+ {
+ "comment": "The code snippet is from a Python script that uses the 'argparse' module to add various command-line arguments with default values, types, and help messages. These arguments control options such as loading pre-trained data, action dimension, evaluation intervals, validation intervals, saving intervals, resuming from a checkpoint file path, skipping mirrored data, and specifying network directories for actuators.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/main.py\":69-80",
+ "content": " parser.add_argument('--load_pretrain', action='store_true', default=False)\n parser.add_argument('--action_dim', action='store', type=int, required=False)\n parser.add_argument('--eval_every', action='store', type=int, default=500, help='eval_every', required=False)\n parser.add_argument('--validate_every', action='store', type=int, default=500, help='validate_every', required=False)\n parser.add_argument('--save_every', action='store', type=int, default=500, help='save_every', required=False)\n parser.add_argument('--resume_ckpt_path', action='store', type=str, help='load_ckpt_path', required=False)\n parser.add_argument('--no_encoder', action='store_true')\n parser.add_argument('--skip_mirrored_data', action='store_true')\n parser.add_argument('--actuator_network_dir', action='store', type=str, help='actuator_network_dir', required=False)\n parser.add_argument('--history_len', action='store', type=int)\n parser.add_argument('--future_len', action='store', type=int)\n parser.add_argument('--prediction_len', action='store', type=int)"
+ },
+ {
+ "comment": "This code defines functions `build_ACT_model_and_optimizer` and `build_CNNMLP_model_and_optimizer`. The functions parse arguments for DETR training and evaluation script, build the respective models, and set up AdamW optimizers with specified learning rates and weight decay.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/main.py\":82-115",
+ "content": " return parser\ndef build_ACT_model_and_optimizer(args_override):\n parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])\n args = parser.parse_args()\n for k, v in args_override.items():\n setattr(args, k, v)\n model = build_ACT_model(args)\n model.cuda()\n param_dicts = [\n {\"params\": [p for n, p in model.named_parameters() if \"backbone\" not in n and p.requires_grad]},\n {\n \"params\": [p for n, p in model.named_parameters() if \"backbone\" in n and p.requires_grad],\n \"lr\": args.lr_backbone,\n },\n ]\n optimizer = torch.optim.AdamW(param_dicts, lr=args.lr,\n weight_decay=args.weight_decay)\n return model, optimizer\ndef build_CNNMLP_model_and_optimizer(args_override):\n parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])\n args = parser.parse_args()\n for k, v in args_override.items():\n setattr(args, k, v)\n model = build_CNNMLP_model(args)"
+ },
+ {
+ "comment": "The code initializes the model on GPU, separates backbone and non-backbone parameters into two dictionaries for different learning rates, creates an AdamW optimizer with specified learning rate and weight decay, and returns the model and optimizer.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/main.py\":116-128",
+ "content": " model.cuda()\n param_dicts = [\n {\"params\": [p for n, p in model.named_parameters() if \"backbone\" not in n and p.requires_grad]},\n {\n \"params\": [p for n, p in model.named_parameters() if \"backbone\" in n and p.requires_grad],\n \"lr\": args.lr_backbone,\n },\n ]\n optimizer = torch.optim.AdamW(param_dicts, lr=args.lr,\n weight_decay=args.weight_decay)\n return model, optimizer"
+ }
+ ]
+}
\ No newline at end of file
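The optimizer setup above (a smaller learning rate for backbone parameters than for the rest of the model) is a standard PyTorch pattern. A self-contained toy sketch with illustrative values:

```python
# Toy sketch of the two-parameter-group AdamW setup used in build_*_model_and_optimizer.
import torch
from torch import nn

model = nn.ModuleDict({'backbone': nn.Linear(8, 8), 'head': nn.Linear(8, 2)})
param_dicts = [
    # everything except the backbone trains at the default lr
    {"params": [p for n, p in model.named_parameters() if "backbone" not in n and p.requires_grad]},
    # backbone parameters get their own, smaller lr
    {"params": [p for n, p in model.named_parameters() if "backbone" in n and p.requires_grad],
     "lr": 1e-5},
]
optimizer = torch.optim.AdamW(param_dicts, lr=1e-4, weight_decay=1e-4)
print([group['lr'] for group in optimizer.param_groups])   # [0.0001, 1e-05]
```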
diff --git a/docs/doc/d57440ff-c372-4ddc-83ff-6bca097ba32c.json b/docs/doc/d57440ff-c372-4ddc-83ff-6bca097ba32c.json
new file mode 100644
index 00000000..c5e89510
--- /dev/null
+++ b/docs/doc/d57440ff-c372-4ddc-83ff-6bca097ba32c.json
@@ -0,0 +1,10 @@
+{
+ "summary": "The code snippet appears to be incomplete or empty. There is no visible functionality that can be described or commented upon in this context. Please provide more information or a complete code sample for accurate analysis and commenting.",
+ "details": [
+ {
+ "comment": "The code snippet appears to be incomplete or empty. There is no visible functionality that can be described or commented upon in this context. Please provide more information or a complete code sample for accurate analysis and commenting.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/__init__.py\":0-0",
+ "content": "w"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/dcb8b74a-7974-41a3-85da-c1cb52853eed.json b/docs/doc/dcb8b74a-7974-41a3-85da-c1cb52853eed.json
new file mode 100644
index 00000000..4b891641
--- /dev/null
+++ b/docs/doc/dcb8b74a-7974-41a3-85da-c1cb52853eed.json
@@ -0,0 +1,20 @@
+{
+ "summary": "This code contains functions for bounding box manipulation and GIoU, including coordinate system conversion utilities, IOU calculation, modified torchvision box_iou function, and two functions for computing mask coordinates.",
+ "details": [
+ {
+ "comment": "This code is from the \"act-plus-plus/detr/util/box_ops.py\" file and contains functions for bounding box manipulation and GIoU (Generalized Intersection over Union). The code includes utilities to convert between (cxcywh) and (xyxy) coordinate systems, and calculate the IOU (Intersection Over Union) and Generalized Box IOU between two sets of boxes. It also includes a modified version of torchvision's box_iou function that returns the union as well.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/box_ops.py\":0-40",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nUtilities for bounding box manipulation and GIoU.\n\"\"\"\nimport torch\nfrom torchvision.ops.boxes import box_area\ndef box_cxcywh_to_xyxy(x):\n x_c, y_c, w, h = x.unbind(-1)\n b = [(x_c - 0.5 * w), (y_c - 0.5 * h),\n (x_c + 0.5 * w), (y_c + 0.5 * h)]\n return torch.stack(b, dim=-1)\ndef box_xyxy_to_cxcywh(x):\n x0, y0, x1, y1 = x.unbind(-1)\n b = [(x0 + x1) / 2, (y0 + y1) / 2,\n (x1 - x0), (y1 - y0)]\n return torch.stack(b, dim=-1)\n# modified from torchvision to also return the union\ndef box_iou(boxes1, boxes2):\n area1 = box_area(boxes1)\n area2 = box_area(boxes2)\n lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2]\n rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2]\n wh = (rb - lt).clamp(min=0) # [N,M,2]\n inter = wh[:, :, 0] * wh[:, :, 1] # [N,M]\n union = area1[:, None] + area2 - inter\n iou = inter / union\n return iou, union\ndef generalized_box_iou(boxes1, boxes2):\n \"\"\""
+ },
+ {
+ "comment": "The code snippet contains two functions: \"generalized_iou\" and \"masks_to_boxes\". The first function calculates a pairwise matrix of Intersection over Union (IoU) between two sets of bounding boxes, taking into account degenerate cases. It asserts that the boxes are in the correct format and computes the IoU and union area between boxes. The second function takes a set of masks and returns the corresponding bounding boxes in xyxy format. It checks if the mask tensor is empty and then calculates the y-coordinates for the bounding boxes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/box_ops.py\":41-75",
+ "content": " Generalized IoU from https://giou.stanford.edu/\n The boxes should be in [x0, y0, x1, y1] format\n Returns a [N, M] pairwise matrix, where N = len(boxes1)\n and M = len(boxes2)\n \"\"\"\n # degenerate boxes gives inf / nan results\n # so do an early check\n assert (boxes1[:, 2:] >= boxes1[:, :2]).all()\n assert (boxes2[:, 2:] >= boxes2[:, :2]).all()\n iou, union = box_iou(boxes1, boxes2)\n lt = torch.min(boxes1[:, None, :2], boxes2[:, :2])\n rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])\n wh = (rb - lt).clamp(min=0) # [N,M,2]\n area = wh[:, :, 0] * wh[:, :, 1]\n return iou - (area - union) / area\ndef masks_to_boxes(masks):\n \"\"\"Compute the bounding boxes around the provided masks\n The masks should be in format [N, H, W] where N is the number of masks, (H, W) are the spatial dimensions.\n Returns a [N, 4] tensors, with the boxes in xyxy format\n \"\"\"\n if masks.numel() == 0:\n return torch.zeros((0, 4), device=masks.device)\n h, w = masks.shape[-2:]\n y = torch.arange(0, h, dtype=torch.float)"
+ },
+ {
+ "comment": "Computes the minimum and maximum x,y coordinates within masks using meshgrid and masked fill operations, then stacks them into a tensor.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/box_ops.py\":76-87",
+ "content": " x = torch.arange(0, w, dtype=torch.float)\n y, x = torch.meshgrid(y, x)\n x_mask = (masks * x.unsqueeze(0))\n x_max = x_mask.flatten(1).max(-1)[0]\n x_min = x_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]\n y_mask = (masks * y.unsqueeze(0))\n y_max = y_mask.flatten(1).max(-1)[0]\n y_min = y_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]\n return torch.stack([x_min, y_min, x_max, y_max], 1)"
+ }
+ ]
+}
\ No newline at end of file
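A short, self-contained sketch exercising the utilities above on two toy boxes; the import path is assumed from the file location:

```python
# Sketch: converting box formats and computing pairwise Generalized IoU.
import torch
from detr.util.box_ops import box_cxcywh_to_xyxy, generalized_box_iou  # assumed import path

boxes_cxcywh = torch.tensor([[0.5, 0.5, 0.4, 0.4],
                             [0.3, 0.3, 0.2, 0.2]])
boxes_xyxy = box_cxcywh_to_xyxy(boxes_cxcywh)        # -> [[0.3, 0.3, 0.7, 0.7], [0.2, 0.2, 0.4, 0.4]]
giou = generalized_box_iou(boxes_xyxy, boxes_xyxy)   # (2, 2) pairwise matrix, diagonal is 1.0
print(giou)
```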
diff --git a/docs/doc/e760b2b5-71f7-4912-aab1-d855338f4a1b.json b/docs/doc/e760b2b5-71f7-4912-aab1-d855338f4a1b.json
new file mode 100644
index 00000000..4e23e83d
--- /dev/null
+++ b/docs/doc/e760b2b5-71f7-4912-aab1-d855338f4a1b.json
@@ -0,0 +1,30 @@
+{
+ "summary": "This code imports libraries, defines functions for loading data from HDF5 files and generates a timestamp plot for camera frames using timestamps, converting them to float values, calculating time differences, and saving the resulting plot.",
+ "details": [
+ {
+ "comment": "This code imports necessary libraries, defines a list of joint names and state names, and contains two functions. The `load_hdf5` function loads dataset from hdf5 file, retrieves qpos, qvel, action, and image data. It returns these values. The `main` function takes arguments for dataset directory and episode index, but does not contain any code within it. The joint names likely represent different body parts' movement data in a robotics or simulation context.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/visualize_episodes.py\":0-34",
+ "content": "import os\nimport numpy as np\nimport cv2\nimport h5py\nimport argparse\nimport matplotlib.pyplot as plt\nfrom constants import DT\nimport IPython\ne = IPython.embed\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')\n if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n is_sim = root.attrs['sim']\n qpos = root['/observations/qpos'][()]\n qvel = root['/observations/qvel'][()]\n action = root['/action'][()]\n image_dict = dict()\n for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n return qpos, qvel, action, image_dict\ndef main(args):\n dataset_dir = args['dataset_dir']\n episode_idx = args['episode_idx']"
+ },
+ {
+ "comment": "The code loads data from an HDF5 file based on a boolean mirror flag, then saves the images as videos and visualizes joint positions. The video saving function takes in a list of images and writes them to a file with a specified fourcc code and framerate.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/visualize_episodes.py\":35-59",
+ "content": " ismirror = args['ismirror']\n if ismirror:\n dataset_name = f'mirror_episode_{episode_idx}'\n else:\n dataset_name = f'episode_{episode_idx}'\n qpos, qvel, action, image_dict = load_hdf5(dataset_dir, dataset_name)\n save_videos(image_dict, DT, video_path=os.path.join(dataset_dir, dataset_name + '_video.mp4'))\n visualize_joints(qpos, action, plot_path=os.path.join(dataset_dir, dataset_name + '_qpos.png'))\n # visualize_timestamp(t_list, dataset_path) # TODO addn timestamp back\ndef save_videos(video, dt, video_path=None):\n if isinstance(video, list):\n cam_names = list(video[0].keys())\n cam_names = sorted(cam_names)\n h, w, _ = video[0][cam_names[0]].shape\n w = w * len(cam_names)\n fps = int(1/dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for ts, image_dict in enumerate(video):\n images = []\n for cam_name in cam_names:\n image = image_dict[cam_name]\n image = image[:, :, [2, 1, 0]] # swap B and R channel"
+ },
+ {
+ "comment": "Code snippet handles saving a video by either concatenating images or concatenating multiple videos horizontally, then writing to a file. It also has functionality for visualizing joint positions over time and plots them if necessary with optional custom labels.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/visualize_episodes.py\":60-86",
+ "content": " images.append(image)\n images = np.concatenate(images, axis=1)\n out.write(images)\n out.release()\n print(f'Saved video to: {video_path}')\n elif isinstance(video, dict):\n cam_names = list(video.keys())\n cam_names = sorted(cam_names)\n all_cam_videos = []\n for cam_name in cam_names:\n all_cam_videos.append(video[cam_name])\n all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension\n n_frames, h, w, _ = all_cam_videos.shape\n fps = int(1 / dt)\n out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n for t in range(n_frames):\n image = all_cam_videos[t]\n image = image[:, :, [2, 1, 0]] # swap B and R channel\n out.write(image)\n out.release()\n print(f'Saved video to: {video_path}')\ndef visualize_joints(qpos_list, command_list, plot_path=None, ylim=None, label_overwrite=None):\n if label_overwrite:\n label1, label2 = label_overwrite"
+ },
+ {
+ "comment": "This code visualizes the joint state and arm command over time for a given set of timestamps. It first converts the provided data into numpy arrays and creates subplots for each dimension. Then, it plots the joint state and arm command values against timestamps for each dimension. Optionally, it sets the y-axis limits. Finally, it saves the resulting plot as an image and prints its location.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/visualize_episodes.py\":87-122",
+ "content": " else:\n label1, label2 = 'State', 'Command'\n qpos = np.array(qpos_list) # ts, dim\n command = np.array(command_list)\n num_ts, num_dim = qpos.shape\n h, w = 2, num_dim\n num_figs = num_dim\n fig, axs = plt.subplots(num_figs, 1, figsize=(w, h * num_figs))\n # plot joint state\n all_names = [name + '_left' for name in STATE_NAMES] + [name + '_right' for name in STATE_NAMES]\n for dim_idx in range(num_dim):\n ax = axs[dim_idx]\n ax.plot(qpos[:, dim_idx], label=label1)\n ax.set_title(f'Joint {dim_idx}: {all_names[dim_idx]}')\n ax.legend()\n # plot arm command\n for dim_idx in range(num_dim):\n ax = axs[dim_idx]\n ax.plot(command[:, dim_idx], label=label2)\n ax.legend()\n if ylim:\n for dim_idx in range(num_dim):\n ax = axs[dim_idx]\n ax.set_ylim(ylim)\n plt.tight_layout()\n plt.savefig(plot_path)\n print(f'Saved qpos plot to: {plot_path}')\n plt.close()\ndef visualize_timestamp(t_list, dataset_path):\n plot_path = dataset_path.replace('.pkl', '_timestamp.png')"
+ },
+ {
+ "comment": "This code generates a timestamp plot for camera frames from a given dataset. It reads the timestamps, converts them to float values, plots them against timesteps, and calculates the time difference between consecutive timestamps. The resulting plot is saved and the file path is printed. The code expects the dataset directory, episode index, and a flag for mirror augmentation as input arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/visualize_episodes.py\":123-153",
+ "content": " h, w = 4, 10\n fig, axs = plt.subplots(2, 1, figsize=(w, h*2))\n # process t_list\n t_float = []\n for secs, nsecs in t_list:\n t_float.append(secs + nsecs * 10E-10)\n t_float = np.array(t_float)\n ax = axs[0]\n ax.plot(np.arange(len(t_float)), t_float)\n ax.set_title(f'Camera frame timestamps')\n ax.set_xlabel('timestep')\n ax.set_ylabel('time (sec)')\n ax = axs[1]\n ax.plot(np.arange(len(t_float)-1), t_float[:-1] - t_float[1:])\n ax.set_title(f'dt')\n ax.set_xlabel('timestep')\n ax.set_ylabel('time (sec)')\n plt.tight_layout()\n plt.savefig(plot_path)\n print(f'Saved timestamp plot to: {plot_path}')\n plt.close()\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_dir', action='store', type=str, help='Dataset dir.', required=True)\n parser.add_argument('--episode_idx', action='store', type=int, help='Episode index.', required=False)\n parser.add_argument('--ismirror', action='store_true')\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
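visualize_joints can also be called directly on in-memory arrays. The sketch below uses dummy 14-DoF data (7 joints per arm) and assumes the repo modules (visualize_episodes.py, constants.py) are importable; the output path is illustrative:

```python
# Sketch: plotting dummy joint states vs. commands with visualize_joints.
import numpy as np
from visualize_episodes import visualize_joints   # assumes the repo root is on PYTHONPATH

T, num_dim = 50, 14                                # timesteps x (7 left-arm + 7 right-arm dims)
qpos = np.random.uniform(-1.0, 1.0, (T, num_dim))  # stand-in for recorded joint positions
action = qpos + 0.05 * np.random.randn(T, num_dim) # stand-in for commanded positions
visualize_joints(qpos, action, plot_path='episode_0_qpos.png')
```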
diff --git a/docs/doc/ead63cfe-5616-420e-8de0-99573c1d448c.json b/docs/doc/ead63cfe-5616-420e-8de0-99573c1d448c.json
new file mode 100644
index 00000000..b95fe290
--- /dev/null
+++ b/docs/doc/ead63cfe-5616-420e-8de0-99573c1d448c.json
@@ -0,0 +1,45 @@
+{
+ "summary": "The code imports libraries, loads data, processes episode information, scales actions, compresses images with JPEG quality 50, and saves in HDF5 format. It generates datasets for image variables and populates root dataset from data_dict.",
+ "details": [
+ {
+ "comment": "This code imports necessary libraries and defines constants for a robotics data processing script. It loads data from .hdf5 files, including robot joint positions and velocities, as well as actions performed by the robot.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":0-32",
+ "content": "import os\nimport numpy as np\nimport cv2\nimport h5py\nimport argparse\nimport time\nfrom visualize_episodes import visualize_joints, visualize_timestamp, save_videos\nimport matplotlib.pyplot as plt\nfrom constants import DT\nimport IPython\ne = IPython.embed\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTATE_NAMES = JOINT_NAMES + [\"gripper\"]\nMIRROR_STATE_MULTIPLY = np.array([-1, 1, 1, -1, 1, -1, 1]).astype('float32')\nMIRROR_BASE_MULTIPLY = np.array([1, -1]).astype('float32')\ndef load_hdf5(dataset_dir, dataset_name):\n dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')\n if not os.path.isfile(dataset_path):\n print(f'Dataset does not exist at \\n{dataset_path}\\n')\n exit()\n with h5py.File(dataset_path, 'r') as root:\n is_sim = root.attrs['sim']\n compressed = root.attrs.get('compress', False)\n qpos = root['/observations/qpos'][()]\n qvel = root['/observations/qvel'][()]\n action = root['/action'][()]\n image_dict = dict()"
+ },
+ {
+ "comment": "Iterates through image keys, stores in image_dict.\nChecks if base_action exists and assigns value accordingly.\nIf compressed, un-pads and uncompresses images, stores in image_dict.\nReturns various variables including base_action and image_dict.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":33-59",
+ "content": " for cam_name in root[f'/observations/images/'].keys():\n image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]\n if 'base_action' in root.keys():\n print('base_action exists')\n base_action = root['/base_action'][()]\n else:\n base_action = None\n if compressed:\n compress_len = root['/compress_len'][()]\n if compressed:\n for cam_id, cam_name in enumerate(image_dict.keys()):\n # un-pad and uncompress\n padded_compressed_image_list = image_dict[cam_name]\n image_list = []\n for padded_compressed_image in padded_compressed_image_list: # [:1000] to save memory\n image = cv2.imdecode(padded_compressed_image, 1)\n image_list.append(image)\n image_dict[cam_name] = np.array(image_list)\n return qpos, qvel, action, base_action, image_dict, is_sim\ndef main(args):\n dataset_dir = args['dataset_dir']\n num_episodes = args['num_episodes']\n start_idx = 0"
+ },
+ {
+ "comment": "This code is part of a function that loads and processes episode data from HDF5 files. It iterates over multiple episodes, concatenating mirrored proprioception and action data, and optionally scales the base action. If any images with 'left_wrist' or 'cam_left_wrist' keys exist in the image dictionary, it swaps their positions for mirroring purposes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":60-76",
+ "content": " for episode_idx in range(start_idx, start_idx + num_episodes):\n dataset_name = f'episode_{episode_idx}'\n qpos, qvel, action, base_action, image_dict, is_sim = load_hdf5(dataset_dir, dataset_name)\n # process proprioception\n qpos = np.concatenate([qpos[:, 7:] * MIRROR_STATE_MULTIPLY, qpos[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)\n qvel = np.concatenate([qvel[:, 7:] * MIRROR_STATE_MULTIPLY, qvel[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)\n action = np.concatenate([action[:, 7:] * MIRROR_STATE_MULTIPLY, action[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)\n if base_action is not None:\n base_action = base_action * MIRROR_BASE_MULTIPLY\n # mirror image obs\n if 'left_wrist' in image_dict.keys():\n image_dict['left_wrist'], image_dict['right_wrist'] = image_dict['right_wrist'][:, :, ::-1], image_dict['left_wrist'][:, :, ::-1]\n elif 'cam_left_wrist' in image_dict.keys():\n image_dict['cam_left_wrist'], image_dict['"
+ },
+ {
+ "comment": "This code checks for specific keys in the image_dict and adjusts the values if necessary. If 'left_wrist' or 'cam_left_wrist' is present, it flips the image. It also handles if 'top' or 'cam_high' are present, flipping them accordingly. Then, it creates a data_dict with necessary keys ('/observations/qpos', '/observations/qvel', '/action', and '/base_action') for saving. Finally, it loops through the image_dict to add its contents as key-value pairs in the data_dict, and sets max_timesteps as the length of qpos. The code uses compression while saving.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":76-102",
+ "content": "cam_right_wrist'] = image_dict['cam_right_wrist'][:, :, ::-1], image_dict['cam_left_wrist'][:, :, ::-1]\n else:\n raise Exception('No left_wrist or cam_left_wrist in image_dict')\n if 'top' in image_dict.keys():\n image_dict['top'] = image_dict['top'][:, :, ::-1]\n elif 'cam_high' in image_dict.keys():\n image_dict['cam_high'] = image_dict['cam_high'][:, :, ::-1]\n else:\n raise Exception('No top or cam_high in image_dict')\n # saving\n data_dict = {\n '/observations/qpos': qpos,\n '/observations/qvel': qvel,\n '/action': action,\n '/base_action': base_action,\n } if base_action is not None else {\n '/observations/qpos': qpos,\n '/observations/qvel': qvel,\n '/action': action,\n }\n for cam_name in image_dict.keys():\n data_dict[f'/observations/images/{cam_name}'] = image_dict[cam_name]\n max_timesteps = len(qpos)\n COMPRESS = True"
+ },
+ {
+ "comment": "This code compresses images using JPEG compression with a quality level of 50, stores the compressed images in the data dictionary, and measures the time taken for the compression process.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":104-124",
+ "content": " if COMPRESS:\n # JPEG compression\n t0 = time.time()\n encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50] # tried as low as 20, seems fine\n compressed_len = []\n for cam_name in image_dict.keys():\n image_list = data_dict[f'/observations/images/{cam_name}']\n compressed_list = []\n compressed_len.append([])\n for image in image_list:\n result, encoded_image = cv2.imencode('.jpg', image, encode_param) # 0.02 sec # cv2.imdecode(encoded_image, 1)\n compressed_list.append(encoded_image)\n compressed_len[-1].append(len(encoded_image))\n data_dict[f'/observations/images/{cam_name}'] = compressed_list\n print(f'compression: {time.time() - t0:.2f}s')\n # pad so it has same length\n t0 = time.time()\n compressed_len = np.array(compressed_len)\n padded_size = compressed_len.max()\n for cam_name in image_dict.keys():"
+ },
+ {
+ "comment": "This code is padding compressed images, adding them to the data dictionary, and saving the dataset in HDF5 format. The padding ensures all images have the same length for consistency in the HDF5 file. It also records the time taken to pad the images.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":125-142",
+ "content": " compressed_image_list = data_dict[f'/observations/images/{cam_name}']\n padded_compressed_image_list = []\n for compressed_image in compressed_image_list:\n padded_compressed_image = np.zeros(padded_size, dtype='uint8')\n image_len = len(compressed_image)\n padded_compressed_image[:image_len] = compressed_image\n padded_compressed_image_list.append(padded_compressed_image)\n data_dict[f'/observations/images/{cam_name}'] = padded_compressed_image_list\n print(f'padding: {time.time() - t0:.2f}s')\n # HDF5\n t0 = time.time()\n dataset_path = os.path.join(dataset_dir, f'mirror_episode_{episode_idx}')\n with h5py.File(dataset_path + '.hdf5', 'w', rdcc_nbytes=1024 ** 2 * 2) as root:\n root.attrs['sim'] = is_sim\n root.attrs['compress'] = COMPRESS\n obs = root.create_group('observations')\n image = obs.create_group('images')"
+ },
+ {
+ "comment": "This code creates datasets for image data and other variables, based on whether to compress or not. It also creates datasets for qpos, qvel, action, and base_action if they are not None. Additionally, it populates the root dataset with data from the data_dict and creates a 'compress_len' dataset if compression is enabled.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":143-161",
+ "content": " for cam_name in image_dict.keys():\n if COMPRESS:\n _ = image.create_dataset(cam_name, (max_timesteps, padded_size), dtype='uint8',\n chunks=(1, padded_size), )\n else:\n _ = image.create_dataset(cam_name, (max_timesteps, 480, 640, 3), dtype='uint8',\n chunks=(1, 480, 640, 3), )\n qpos = obs.create_dataset('qpos', (max_timesteps, 14))\n qvel = obs.create_dataset('qvel', (max_timesteps, 14))\n action = root.create_dataset('action', (max_timesteps, 14))\n if base_action is not None:\n base_action = root.create_dataset('base_action', (max_timesteps, 2))\n for name, array in data_dict.items():\n root[name][...] = array\n if COMPRESS:\n _ = root.create_dataset('compress_len', (len(image_dict.keys()), max_timesteps))\n root['/compress_len'][...] = compressed_len"
+ },
+ {
+ "comment": "The code snippet saves the dataset, prints the time taken for the process, and has options to save videos and visualize joints. The user is required to specify the dataset directory and the number of episodes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/postprocess_episodes.py\":163-174",
+ "content": " print(f'Saving {dataset_path}: {time.time() - t0:.1f} secs\\n')\n if episode_idx == start_idx:\n save_videos(image_dict, DT, video_path=os.path.join(dataset_dir, dataset_name + '_mirror_video.mp4'))\n # visualize_joints(qpos, action, plot_path=os.path.join(dataset_dir, dataset_name + '_mirror_qpos.png'))\n # visualize_timestamp(t_list, dataset_path) # TODO addn timestamp back\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--dataset_dir', action='store', type=str, help='Dataset dir.', required=True)\n parser.add_argument('--num_episodes', action='store', type=int, help='Number of episodes.', required=True)\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
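The compress / pad / decompress round trip used above relies only on standard OpenCV and NumPy calls. A minimal sketch on a single dummy frame; the padding margin stands in for the per-batch maximum computed by the script:

```python
# Sketch: JPEG-compress a frame, pad the byte buffer to a fixed size, then decode it again.
import cv2
import numpy as np

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50]
ok, encoded = cv2.imencode('.jpg', frame, encode_param)   # 1-D uint8 JPEG buffer

padded_size = len(encoded) + 128                          # stand-in for the max length over a batch
padded = np.zeros(padded_size, dtype='uint8')
padded[:len(encoded)] = encoded.flatten()                 # zero-pad so all rows share one length

decoded = cv2.imdecode(padded, 1)                         # trailing padding bytes are ignored
print(decoded.shape)                                      # (480, 640, 3)
```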
diff --git a/docs/doc/eadbdecf-ab15-4002-8de8-e5c78ebe1b9b.json b/docs/doc/eadbdecf-ab15-4002-8de8-e5c78ebe1b9b.json
new file mode 100644
index 00000000..d1732c85
--- /dev/null
+++ b/docs/doc/eadbdecf-ab15-4002-8de8-e5c78ebe1b9b.json
@@ -0,0 +1,150 @@
+{
+ "summary": "This program trains a policy network for robot control using reinforcement learning, VQ-VAE implementation, and behavioral cloning, while logging data, saving checkpoints, and validating performance.",
+ "details": [
+ {
+ "comment": "This code imports necessary libraries and defines functions for a reinforcement learning task. It sets up the environment, loads data, and initializes policy models. The `get_auto_index` function is used to find the next available index in the dataset directory.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":0-34",
+ "content": "import torch\nimport numpy as np\nimport os\nimport pickle\nimport argparse\nimport matplotlib.pyplot as plt\nfrom copy import deepcopy\nfrom itertools import repeat\nfrom tqdm import tqdm\nfrom einops import rearrange\nimport wandb\nimport time\nfrom torchvision import transforms\nfrom constants import FPS\nfrom constants import PUPPET_GRIPPER_JOINT_OPEN\nfrom utils import load_data # data functions\nfrom utils import sample_box_pose, sample_insertion_pose # robot functions\nfrom utils import compute_dict_mean, set_seed, detach_dict, calibrate_linear_vel, postprocess_base_action # helper functions\nfrom policy import ACTPolicy, CNNMLPPolicy, DiffusionPolicy\nfrom visualize_episodes import save_videos\nfrom detr.models.latent_model import Latent_Model_Transformer\nfrom sim_env import BOX_POSE\nimport IPython\ne = IPython.embed\ndef get_auto_index(dataset_dir):\n max_idx = 1000\n for i in range(max_idx+1):\n if not os.path.isfile(os.path.join(dataset_dir, f'qpos_{i}.npy')):\n return i\n raise Exception(f\"Error getting auto index, or more than {max_idx} episodes\")"
+ },
+ {
+ "comment": "The code defines a main function that takes command line arguments and uses them to set up the environment for running the simulation. It first sets the seed, then parses various parameters such as is_eval, ckpt_dir, policy_class, onscreen_render, task_name, batch_size_train, batch_size_val, num_steps, eval_every, validate_every, save_every, and resume_ckpt_path. It also determines if the task is simulation-based or not, then retrieves the task parameters from either SIM_TASK_CONFIGS or TASK_CONFIGS based on the task name. These parameters include dataset_dir, episode_len, camera_names, and stats_dir.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":36-64",
+ "content": "def main(args):\n set_seed(1)\n # command line parameters\n is_eval = args['eval']\n ckpt_dir = args['ckpt_dir']\n policy_class = args['policy_class']\n onscreen_render = args['onscreen_render']\n task_name = args['task_name']\n batch_size_train = args['batch_size']\n batch_size_val = args['batch_size']\n num_steps = args['num_steps']\n eval_every = args['eval_every']\n validate_every = args['validate_every']\n save_every = args['save_every']\n resume_ckpt_path = args['resume_ckpt_path']\n # get task parameters\n is_sim = task_name[:4] == 'sim_'\n if is_sim or task_name == 'all':\n from constants import SIM_TASK_CONFIGS\n task_config = SIM_TASK_CONFIGS[task_name]\n else:\n from aloha_scripts.constants import TASK_CONFIGS\n task_config = TASK_CONFIGS[task_name]\n dataset_dir = task_config['dataset_dir']\n # num_episodes = task_config['num_episodes']\n episode_len = task_config['episode_len']\n camera_names = task_config['camera_names']\n stats_dir = task_config.get('stats_dir', None)"
+ },
+ {
+ "comment": "This code sets various fixed parameters for the ACT policy. It gets the sample weights, train ratio, and name filter from the task configuration. The state dimension is set to 14. Backbone learning rate is set to 1e-5 with a predefined backbone model. If the policy class is ACT, it further defines encoder layers, decoder layers, number of attention heads, and other configurations for the policy based on provided arguments. Camera names are also defined if needed. It also handles whether or not to use VQ (if specified by args).",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":65-89",
+ "content": " sample_weights = task_config.get('sample_weights', None)\n train_ratio = task_config.get('train_ratio', 0.99)\n name_filter = task_config.get('name_filter', lambda n: True)\n # fixed parameters\n state_dim = 14\n lr_backbone = 1e-5\n backbone = 'resnet18'\n if policy_class == 'ACT':\n enc_layers = 4\n dec_layers = 7\n nheads = 8\n policy_config = {'lr': args['lr'],\n 'num_queries': args['chunk_size'],\n 'kl_weight': args['kl_weight'],\n 'hidden_dim': args['hidden_dim'],\n 'dim_feedforward': args['dim_feedforward'],\n 'lr_backbone': lr_backbone,\n 'backbone': backbone,\n 'enc_layers': enc_layers,\n 'dec_layers': dec_layers,\n 'nheads': nheads,\n 'camera_names': camera_names,\n 'vq': args['use_vq'],\n 'vq_class': args['vq_class'],"
+ },
+ {
+ "comment": "This code is setting up different configurations for the policy based on the given policy_class. The 'AuxCritic' configuration includes an auxiliary critic, 'Diffusion' uses diffusion-based policy, and 'CNNMLP' uses a CNN and MLP-based policy. All configurations include learning rate (lr), camera names, and actuator network directory settings.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":90-114",
+ "content": " 'vq_dim': args['vq_dim'],\n 'action_dim': 16,\n 'no_encoder': args['no_encoder'],\n }\n elif policy_class == 'Diffusion':\n policy_config = {'lr': args['lr'],\n 'camera_names': camera_names,\n 'action_dim': 16,\n 'observation_horizon': 1,\n 'action_horizon': 8,\n 'prediction_horizon': args['chunk_size'],\n 'num_queries': args['chunk_size'],\n 'num_inference_timesteps': 10,\n 'ema_power': 0.75,\n 'vq': False,\n }\n elif policy_class == 'CNNMLP':\n policy_config = {'lr': args['lr'], 'lr_backbone': lr_backbone, 'backbone' : backbone, 'num_queries': 1,\n 'camera_names': camera_names,}\n else:\n raise NotImplementedError\n actuator_config = {\n 'actuator_network_dir': args['actuator_network_dir'],"
+ },
+ {
+ "comment": "The code is defining and initializing two dictionaries: 'train_args' and 'config'. These dictionaries store various arguments for the training process. The code also checks if a directory exists and creates it if not, and stores configuration information in a file named 'config.pkl' within that directory. This information will likely be used to train an agent for a specific task or environment.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":115-145",
+ "content": " 'history_len': args['history_len'],\n 'future_len': args['future_len'],\n 'prediction_len': args['prediction_len'],\n }\n config = {\n 'num_steps': num_steps,\n 'eval_every': eval_every,\n 'validate_every': validate_every,\n 'save_every': save_every,\n 'ckpt_dir': ckpt_dir,\n 'resume_ckpt_path': resume_ckpt_path,\n 'episode_len': episode_len,\n 'state_dim': state_dim,\n 'lr': args['lr'],\n 'policy_class': policy_class,\n 'onscreen_render': onscreen_render,\n 'policy_config': policy_config,\n 'task_name': task_name,\n 'seed': args['seed'],\n 'temporal_agg': args['temporal_agg'],\n 'camera_names': camera_names,\n 'real_robot': not is_sim,\n 'load_pretrain': args['load_pretrain'],\n 'actuator_config': actuator_config,\n }\n if not os.path.isdir(ckpt_dir):\n os.makedirs(ckpt_dir)\n config_path = os.path.join(ckpt_dir, 'config.pkl')\n expr_name = ckpt_dir.split('/')[-1]"
+ },
+ {
+ "comment": "The code initializes the WandB for evaluation, updates the config file if not in evaluation mode, and then evaluates different checkpoints. It logs success rate and average return for each checkpoint, prints them on console, and exits the program. If in training mode, it loads data, creates dataloaders, and returns necessary objects.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":146-164",
+ "content": " if not is_eval:\n wandb.init(project=\"mobile-aloha2\", reinit=True, entity=\"mobile-aloha2\", name=expr_name)\n wandb.config.update(config)\n with open(config_path, 'wb') as f:\n pickle.dump(config, f)\n if is_eval:\n ckpt_names = [f'policy_last.ckpt']\n results = []\n for ckpt_name in ckpt_names:\n success_rate, avg_return = eval_bc(config, ckpt_name, save_episode=True, num_rollouts=10)\n # wandb.log({'success_rate': success_rate, 'avg_return': avg_return})\n results.append([ckpt_name, success_rate, avg_return])\n for ckpt_name, success_rate, avg_return in results:\n print(f'{ckpt_name}: {success_rate=} {avg_return=}')\n print()\n exit()\n train_dataloader, val_dataloader, stats, _ = load_data(dataset_dir, name_filter, camera_names, batch_size_train, batch_size_val, args['chunk_size'], args['skip_mirrored_data'], config['load_pretrain'], policy_class, stats_dir_l=stats_dir, sample_weights=sample_weights, train_ratio=train_ratio)"
+ },
+ {
+ "comment": "This code saves dataset statistics, trains a behavioral cloning model, and saves the best checkpoint. It also creates a policy object based on the policy class and configures an optimizer for it.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":166-197",
+ "content": " # save dataset stats\n stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n with open(stats_path, 'wb') as f:\n pickle.dump(stats, f)\n best_ckpt_info = train_bc(train_dataloader, val_dataloader, config)\n best_step, min_val_loss, best_state_dict = best_ckpt_info\n # save best checkpoint\n ckpt_path = os.path.join(ckpt_dir, f'policy_best.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Best ckpt, val loss {min_val_loss:.6f} @ step{best_step}')\n wandb.finish()\ndef make_policy(policy_class, policy_config):\n if policy_class == 'ACT':\n policy = ACTPolicy(policy_config)\n elif policy_class == 'CNNMLP':\n policy = CNNMLPPolicy(policy_config)\n elif policy_class == 'Diffusion':\n policy = DiffusionPolicy(policy_config)\n else:\n raise NotImplementedError\n return policy\ndef make_optimizer(policy_class, policy):\n if policy_class == 'ACT':\n optimizer = policy.configure_optimizers()\n elif policy_class == 'CNNMLP':\n optimizer = policy.configure_optimizers()"
+ },
+ {
+ "comment": "This code snippet checks the policy class and configures the optimizer accordingly. If the policy class is 'Diffusion', it sets the optimizer using the policy's method. For any other policy class, a NotImplementedError is raised. The get_image function takes timestep (ts), camera names, and rand_crop_resize flag as input. It retrieves images from ts observation and reshapes them into a tensor for further processing. If rand_crop_resize is True, it randomly crops and resizes the image while maintaining aspect ratio.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":198-221",
+ "content": " elif policy_class == 'Diffusion':\n optimizer = policy.configure_optimizers()\n else:\n raise NotImplementedError\n return optimizer\ndef get_image(ts, camera_names, rand_crop_resize=False):\n curr_images = []\n for cam_name in camera_names:\n curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')\n curr_images.append(curr_image)\n curr_image = np.stack(curr_images, axis=0)\n curr_image = torch.from_numpy(curr_image / 255.0).float().cuda().unsqueeze(0)\n if rand_crop_resize:\n print('rand crop resize is used!')\n original_size = curr_image.shape[-2:]\n ratio = 0.95\n curr_image = curr_image[..., int(original_size[0] * (1 - ratio) / 2): int(original_size[0] * (1 + ratio) / 2),\n int(original_size[1] * (1 - ratio) / 2): int(original_size[1] * (1 + ratio) / 2)]\n curr_image = curr_image.squeeze(0)\n resize_transform = transforms.Resize(original_size, antialias=True)\n curr_image = resize_transform(curr_image)"
+ },
+ {
+ "comment": "The code snippet loads a policy model from a checkpoint file and sets the model to evaluation mode. It also initializes variables related to the task, such as state dimensions and camera names. The policy is created using a specified class and configuration, and if the policy uses a VQ-VAE, it initializes the corresponding dimensions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":222-252",
+ "content": " curr_image = curr_image.unsqueeze(0)\n return curr_image\ndef eval_bc(config, ckpt_name, save_episode=True, num_rollouts=50):\n set_seed(1000)\n ckpt_dir = config['ckpt_dir']\n state_dim = config['state_dim']\n real_robot = config['real_robot']\n policy_class = config['policy_class']\n onscreen_render = config['onscreen_render']\n policy_config = config['policy_config']\n camera_names = config['camera_names']\n max_timesteps = config['episode_len']\n task_name = config['task_name']\n temporal_agg = config['temporal_agg']\n onscreen_cam = 'angle'\n vq = config['policy_config']['vq']\n actuator_config = config['actuator_config']\n use_actuator_net = actuator_config['actuator_network_dir'] is not None\n # load policy and stats\n ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n policy = make_policy(policy_class, policy_config)\n loading_status = policy.deserialize(torch.load(ckpt_path))\n print(loading_status)\n policy.cuda()\n policy.eval()\n if vq:\n vq_dim = config['policy_config']['vq_dim']"
+ },
+ {
+ "comment": "This code is loading a policy from the specified checkpoint path and a latent model from the specified latent_model_ckpt_path. It also loads dataset statistics from stats_path. Additionally, if use_actuator_net is True, it initializes an ActuatorNetwork object with specific parameters, and loads the actuator network from its designated checkpoint path.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":253-273",
+ "content": " vq_class = config['policy_config']['vq_class']\n latent_model = Latent_Model_Transformer(vq_dim, vq_dim, vq_class)\n latent_model_ckpt_path = os.path.join(ckpt_dir, 'latent_model_last.ckpt')\n latent_model.deserialize(torch.load(latent_model_ckpt_path))\n latent_model.eval()\n latent_model.cuda()\n print(f'Loaded policy from: {ckpt_path}, latent model from: {latent_model_ckpt_path}')\n else:\n print(f'Loaded: {ckpt_path}')\n stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')\n with open(stats_path, 'rb') as f:\n stats = pickle.load(f)\n # if use_actuator_net:\n # prediction_len = actuator_config['prediction_len']\n # future_len = actuator_config['future_len']\n # history_len = actuator_config['history_len']\n # actuator_network_dir = actuator_config['actuator_network_dir']\n # from act.train_actuator_network import ActuatorNetwork\n # actuator_network = ActuatorNetwork(prediction_len)\n # actuator_network_path = os.path.join(actuator_network_dir, 'actuator_net_last.ckpt')"
+ },
+ {
+ "comment": "Loading the actuator network from the specified path, evaluating the network, moving it to GPU if available, and printing a message confirming the loading status. The actuator_net_stats.pkl file is opened and actuator stats are loaded. Two lambda functions, actuator_unnorm and actuator_norm, are defined for data normalization. A function named collect_base_action is defined to collect base actions after post-processing them. A pre_process lambda function is also defined for normalizing the state qpos.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":274-289",
+ "content": " # loading_status = actuator_network.load_state_dict(torch.load(actuator_network_path))\n # actuator_network.eval()\n # actuator_network.cuda()\n # print(f'Loaded actuator network from: {actuator_network_path}, {loading_status}')\n # actuator_stats_path = os.path.join(actuator_network_dir, 'actuator_net_stats.pkl')\n # with open(actuator_stats_path, 'rb') as f:\n # actuator_stats = pickle.load(f)\n # actuator_unnorm = lambda x: x * actuator_stats['commanded_speed_std'] + actuator_stats['commanded_speed_std']\n # actuator_norm = lambda x: (x - actuator_stats['observed_speed_mean']) / actuator_stats['observed_speed_mean']\n # def collect_base_action(all_actions, norm_episode_all_base_actions):\n # post_processed_actions = post_process(all_actions.squeeze(0).cpu().numpy())\n # norm_episode_all_base_actions += actuator_norm(post_processed_actions[:, -2:]).tolist()\n pre_process = lambda s_qpos: (s_qpos - stats['qpos_mean']) / stats['qpos_std']"
+ },
+ {
+ "comment": "This code block initializes the environment and sets up parameters based on whether it is running in a real-world or simulation environment. It also accounts for temporal aggregation and potential delay in the real world. Finally, it initializes empty lists to store episode returns and highest rewards during the learning process.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":290-317",
+ "content": " if policy_class == 'Diffusion':\n post_process = lambda a: ((a + 1) / 2) * (stats['action_max'] - stats['action_min']) + stats['action_min']\n else:\n post_process = lambda a: a * stats['action_std'] + stats['action_mean']\n # load environment\n if real_robot:\n from aloha_scripts.robot_utils import move_grippers # requires aloha\n from aloha_scripts.real_env import make_real_env # requires aloha\n env = make_real_env(init_node=True, setup_robots=True, setup_base=True)\n env_max_reward = 0\n else:\n from sim_env import make_sim_env\n env = make_sim_env(task_name)\n env_max_reward = env.task.max_reward\n query_frequency = policy_config['num_queries']\n if temporal_agg:\n query_frequency = 1\n num_queries = policy_config['num_queries']\n if real_robot:\n BASE_DELAY = 13\n query_frequency -= BASE_DELAY\n max_timesteps = int(max_timesteps * 1) # may increase for real-world tasks\n episode_returns = []\n highest_rewards = []"
+ },
+ {
+ "comment": "This code initializes a rollout_id for a loop, sets the task based on the task name, resets the environment, renders the screen if desired, and prepares variables for an evaluation loop. If \"use_actuator_net\" is enabled, this will be used.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":318-346",
+ "content": " for rollout_id in range(num_rollouts):\n if real_robot:\n e()\n rollout_id += 0\n ### set task\n if 'sim_transfer_cube' in task_name:\n BOX_POSE[0] = sample_box_pose() # used in sim reset\n elif 'sim_insertion' in task_name:\n BOX_POSE[0] = np.concatenate(sample_insertion_pose()) # used in sim reset\n ts = env.reset()\n ### onscreen render\n if onscreen_render:\n ax = plt.subplot()\n plt_img = ax.imshow(env._physics.render(height=480, width=640, camera_id=onscreen_cam))\n plt.ion()\n ### evaluation loop\n if temporal_agg:\n all_time_actions = torch.zeros([max_timesteps, max_timesteps+num_queries, 16]).cuda()\n # qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()\n qpos_history_raw = np.zeros((max_timesteps, state_dim))\n image_list = [] # for visualization\n qpos_list = []\n target_qpos_list = []\n rewards = []\n # if use_actuator_net:"
+ },
+ {
+ "comment": "The code updates the onscreen render and waits for a delay (DT), processes previous timestep to get qpos and image_list, and pre-processes qpos. It does this within a loop for maximum timesteps, with timing measurements at specific points.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":347-369",
+ "content": " # norm_episode_all_base_actions = [actuator_norm(np.zeros(history_len, 2)).tolist()]\n with torch.inference_mode():\n time0 = time.time()\n DT = 1 / FPS\n culmulated_delay = 0 \n for t in range(max_timesteps):\n time1 = time.time()\n ### update onscreen render and wait for DT\n if onscreen_render:\n image = env._physics.render(height=480, width=640, camera_id=onscreen_cam)\n plt_img.set_data(image)\n plt.pause(DT)\n ### process previous timestep to get qpos and image_list\n time2 = time.time()\n obs = ts.observation\n if 'images' in obs:\n image_list.append(obs['images'])\n else:\n image_list.append({'main': obs['image']})\n qpos_numpy = np.array(obs['qpos'])\n qpos_history_raw[t] = qpos_numpy\n qpos = pre_process(qpos_numpy)"
+ },
+ {
+ "comment": "This code performs query-based policy execution in a reinforcement learning environment. It prepares input data and queries the policy network for action choices based on the current state. If the frequency requirement is met, it captures the image from a specified camera and applies any required preprocessing. The code also includes a warm-up step to prepare the neural network before executing the policy, and handles generating samples from a latent model if necessary.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":370-391",
+ "content": " qpos = torch.from_numpy(qpos).float().cuda().unsqueeze(0)\n # qpos_history[:, t] = qpos\n if t % query_frequency == 0:\n curr_image = get_image(ts, camera_names, rand_crop_resize=(config['policy_class'] == 'Diffusion'))\n # print('get image: ', time.time() - time2)\n if t == 0:\n # warm up\n for _ in range(10):\n policy(qpos, curr_image)\n print('network warm up done')\n time1 = time.time()\n ### query policy\n time3 = time.time()\n if config['policy_class'] == \"ACT\":\n if t % query_frequency == 0:\n if vq:\n if rollout_id == 0:\n for _ in range(10):\n vq_sample = latent_model.generate(1, temperature=1, x=None)\n print(torch.nonzero(vq_sample[0])[:, 1].cpu().numpy())"
+ },
+ {
+ "comment": "This code generates an action based on the given state and either additional latent variables or just the state. If using a real robot, it modifies the generated actions to account for a base delay in the actuator response time. If temporal aggregation is enabled, the code collects all-time actions, filters out any zeros, and assigns weights based on an exponential function of the action index.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":392-407",
+ "content": " vq_sample = latent_model.generate(1, temperature=1, x=None)\n all_actions = policy(qpos, curr_image, vq_sample=vq_sample)\n else:\n # e()\n all_actions = policy(qpos, curr_image)\n # if use_actuator_net:\n # collect_base_action(all_actions, norm_episode_all_base_actions)\n if real_robot:\n all_actions = torch.cat([all_actions[:, :-BASE_DELAY, :-2], all_actions[:, BASE_DELAY:, -2:]], dim=2)\n if temporal_agg:\n all_time_actions[[t], t:t+num_queries] = all_actions\n actions_for_curr_step = all_time_actions[:, t]\n actions_populated = torch.all(actions_for_curr_step != 0, axis=1)\n actions_for_curr_step = actions_for_curr_step[actions_populated]\n k = 0.01\n exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))"
+ },
+ {
+ "comment": "This code appears to be part of a larger program that utilizes different policies and actions for robotic control. It seems to handle policy selection based on the current time step, t, and query frequency. If the policy is set as \"Diffusion\", it retrieves new actions from the policy at specific intervals, potentially accounting for delays or base actions. The code also handles real robot interactions, adjusting action sequences accordingly.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":408-422",
+ "content": " exp_weights = exp_weights / exp_weights.sum()\n exp_weights = torch.from_numpy(exp_weights).cuda().unsqueeze(dim=1)\n raw_action = (actions_for_curr_step * exp_weights).sum(dim=0, keepdim=True)\n else:\n raw_action = all_actions[:, t % query_frequency]\n # if t % query_frequency == query_frequency - 1:\n # # zero out base actions to avoid overshooting\n # raw_action[0, -2:] = 0\n elif config['policy_class'] == \"Diffusion\":\n if t % query_frequency == 0:\n all_actions = policy(qpos, curr_image)\n # if use_actuator_net:\n # collect_base_action(all_actions, norm_episode_all_base_actions)\n if real_robot:\n all_actions = torch.cat([all_actions[:, :-BASE_DELAY, :-2], all_actions[:, BASE_DELAY:, -2:]], dim=2)"
+ },
+ {
+ "comment": "This code selects the policy based on the config value and performs necessary actions. It uses CNNMLP for querying the policy, post-processes the raw action output, and assigns target_qpos from the processed action values. It also handles actuator net usage with temporal aggregation if configured.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":423-443",
+ "content": " raw_action = all_actions[:, t % query_frequency]\n elif config['policy_class'] == \"CNNMLP\":\n raw_action = policy(qpos, curr_image)\n all_actions = raw_action.unsqueeze(0)\n # if use_actuator_net:\n # collect_base_action(all_actions, norm_episode_all_base_actions)\n else:\n raise NotImplementedError\n # print('query policy: ', time.time() - time3)\n ### post-process actions\n time4 = time.time()\n raw_action = raw_action.squeeze(0).cpu().numpy()\n action = post_process(raw_action)\n target_qpos = action[:-2]\n # if use_actuator_net:\n # assert(not temporal_agg)\n # if t % prediction_len == 0:\n # offset_start_ts = t + history_len\n # actuator_net_in = np.array(norm_episode_all_base_actions[offset_start_ts - history_len: offset_start_ts + future_len])"
+ },
+ {
+ "comment": "Code segment is responsible for updating the base action based on whether an actuator network prediction is available or not. If a prediction exists, it normalizes and detaches the prediction before selecting the relevant chunk. Else, it uses the last two elements of the given action as the base action after applying linear velocity calibration (commented out) and post-processing (also commented out). The code then steps the environment using the calculated base action and appends current qpos to qpos_list and target_qpos to target_qpos_list for visualization purposes.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":444-464",
+ "content": " # actuator_net_in = torch.from_numpy(actuator_net_in).float().unsqueeze(dim=0).cuda()\n # pred = actuator_network(actuator_net_in)\n # base_action_chunk = actuator_unnorm(pred.detach().cpu().numpy()[0])\n # base_action = base_action_chunk[t % prediction_len]\n # else:\n base_action = action[-2:]\n # base_action = calibrate_linear_vel(base_action, c=0.19)\n # base_action = postprocess_base_action(base_action)\n # print('post process: ', time.time() - time4)\n ### step the environment\n time5 = time.time()\n if real_robot:\n ts = env.step(target_qpos, base_action)\n else:\n ts = env.step(target_qpos)\n # print('step env: ', time.time() - time5)\n ### for visualization\n qpos_list.append(qpos_numpy)\n target_qpos_list.append(target_qpos)"
+ },
+ {
+ "comment": "The code appends rewards to a list, calculates and controls sleep time for synchronization, handles step duration longer than DT by accumulating delay, prints warning and updates cumulative delay if necessary, calculates average FPS, closes the plot window. If real_robot is True, it opens grippers and saves qpos_history_raw in a specified directory with an auto-incrementing index.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":465-483",
+ "content": " rewards.append(ts.reward)\n duration = time.time() - time1\n sleep_time = max(0, DT - duration)\n # print(sleep_time)\n time.sleep(sleep_time)\n # time.sleep(max(0, DT - duration - culmulated_delay))\n if duration >= DT:\n culmulated_delay += (duration - DT)\n print(f'Warning: step duration: {duration:.3f} s at step {t} longer than DT: {DT} s, culmulated delay: {culmulated_delay:.3f} s')\n # else:\n # culmulated_delay = max(0, culmulated_delay - (DT - duration))\n print(f'Avg fps {max_timesteps / (time.time() - time0)}')\n plt.close()\n if real_robot:\n move_grippers([env.puppet_bot_left, env.puppet_bot_right], [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=0.5) # open\n # save qpos_history_raw\n log_id = get_auto_index(ckpt_dir)\n np.save(os.path.join(ckpt_dir, f'qpos_{log_id}.npy'), qpos_history_raw)"
+ },
+ {
+ "comment": "The code plots the history of qpos for each dimension and saves it as an image, calculates episode return and highest reward, prints the results, and checks if the highest reward equals the environment's maximum reward. It then calculates the success rate based on the highest rewards.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":484-507",
+ "content": " plt.figure(figsize=(10, 20))\n # plot qpos_history_raw for each qpos dim using subplots\n for i in range(state_dim):\n plt.subplot(state_dim, 1, i+1)\n plt.plot(qpos_history_raw[:, i])\n # remove x axis\n if i != state_dim - 1:\n plt.xticks([])\n plt.tight_layout()\n plt.savefig(os.path.join(ckpt_dir, f'qpos_{log_id}.png'))\n plt.close()\n rewards = np.array(rewards)\n episode_return = np.sum(rewards[rewards!=None])\n episode_returns.append(episode_return)\n episode_highest_reward = np.max(rewards)\n highest_rewards.append(episode_highest_reward)\n print(f'Rollout {rollout_id}\\n{episode_return=}, {episode_highest_reward=}, {env_max_reward=}, Success: {episode_highest_reward==env_max_reward}')\n # if save_episode:\n # save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))\n success_rate = np.mean(np.array(highest_rewards) == env_max_reward)"
+ },
+ {
+ "comment": "Code block calculates success rate and average return from episode results, displays summary in console, writes the summary to a text file along with episode returns and highest rewards.\n\nThe forward_pass function takes input data (image_data, qpos_data, action_data, is_pad) and passes it through the policy network.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":508-531",
+ "content": " avg_return = np.mean(episode_returns)\n summary_str = f'\\nSuccess rate: {success_rate}\\nAverage return: {avg_return}\\n\\n'\n for r in range(env_max_reward+1):\n more_or_equal_r = (np.array(highest_rewards) >= r).sum()\n more_or_equal_r_rate = more_or_equal_r / num_rollouts\n summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\\n'\n print(summary_str)\n # save success rate to txt\n result_file_name = 'result_' + ckpt_name.split('.')[0] + '.txt'\n with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:\n f.write(summary_str)\n f.write(repr(episode_returns))\n f.write('\\n\\n')\n f.write(repr(highest_rewards))\n return success_rate, avg_return\ndef forward_pass(data, policy):\n image_data, qpos_data, action_data, is_pad = data\n image_data, qpos_data, action_data, is_pad = image_data.cuda(), qpos_data.cuda(), action_data.cuda(), is_pad.cuda()\n return policy(qpos_data, image_data, action_data, is_pad) # TODO remove None"
+ },
+ {
+ "comment": "The code defines a \"train_bc\" function which trains a policy using a specified data loader. It sets up various configurations, checks if it should load pre-trained weights or resume training from a previous checkpoint, and initializes the optimizer. The function uses a repeater to repeat the training data loader for consistency.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":534-559",
+ "content": "def train_bc(train_dataloader, val_dataloader, config):\n num_steps = config['num_steps']\n ckpt_dir = config['ckpt_dir']\n seed = config['seed']\n policy_class = config['policy_class']\n policy_config = config['policy_config']\n eval_every = config['eval_every']\n validate_every = config['validate_every']\n save_every = config['save_every']\n set_seed(seed)\n policy = make_policy(policy_class, policy_config)\n if config['load_pretrain']:\n loading_status = policy.deserialize(torch.load(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'policy_step_50000_seed_0.ckpt')))\n print(f'loaded! {loading_status}')\n if config['resume_ckpt_path'] is not None:\n loading_status = policy.deserialize(torch.load(config['resume_ckpt_path']))\n print(f'Resume policy from: {config[\"resume_ckpt_path\"]}, Status: {loading_status}')\n policy.cuda()\n optimizer = make_optimizer(policy_class, policy)\n min_val_loss = np.inf\n best_ckpt_info = None\n train_dataloader = repeater(train_dataloader)"
+ },
+ {
+ "comment": "This code is performing a validation step at certain intervals during training. It logs the validation summary to WandB and keeps track of the best validation loss seen so far. The best model checkpoint information is updated if the current validation loss is lower than the minimum previously observed.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":560-583",
+ "content": " for step in tqdm(range(num_steps+1)):\n # validation\n if step % validate_every == 0:\n print('validating')\n with torch.inference_mode():\n policy.eval()\n validation_dicts = []\n for batch_idx, data in enumerate(val_dataloader):\n forward_dict = forward_pass(data, policy)\n validation_dicts.append(forward_dict)\n if batch_idx > 50:\n break\n validation_summary = compute_dict_mean(validation_dicts)\n epoch_val_loss = validation_summary['loss']\n if epoch_val_loss < min_val_loss:\n min_val_loss = epoch_val_loss\n best_ckpt_info = (step, min_val_loss, deepcopy(policy.serialize()))\n for k in list(validation_summary.keys()):\n validation_summary[f'val_{k}'] = validation_summary.pop(k) \n wandb.log(validation_summary, step=step)\n print(f'Val loss: {epoch_val_loss:.5f}')"
+ },
+ {
+ "comment": "The code performs validation, evaluation, and training steps. It logs the success rate of evaluations, saves checkpoints at certain intervals, trains a policy network using forward and backward passes, and logs data for later analysis.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":584-610",
+ "content": " summary_string = ''\n for k, v in validation_summary.items():\n summary_string += f'{k}: {v.item():.3f} '\n print(summary_string)\n # evaluation\n if (step > 0) and (step % eval_every == 0):\n # first save then eval\n ckpt_name = f'policy_step_{step}_seed_{seed}.ckpt'\n ckpt_path = os.path.join(ckpt_dir, ckpt_name)\n torch.save(policy.serialize(), ckpt_path)\n success, _ = eval_bc(config, ckpt_name, save_episode=True, num_rollouts=10)\n wandb.log({'success': success}, step=step)\n # training\n policy.train()\n optimizer.zero_grad()\n data = next(train_dataloader)\n forward_dict = forward_pass(data, policy)\n # backward\n loss = forward_dict['loss']\n loss.backward()\n optimizer.step()\n wandb.log(forward_dict, step=step) # not great, make training 1-2% slower\n if step % save_every == 0:\n ckpt_path = os.path.join(ckpt_dir, f'policy_step_{step}_seed_{seed}.ckpt')"
+ },
+ {
+ "comment": "The code defines a function to train and save a policy, repeats the data loader for multiple epochs, and takes command-line arguments for evaluation, on-screen rendering, checkpoint directory, and policy class. The training finishes when it finds the best model based on validation loss, saves it, and prints information about the best step, seed, and validation loss.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":611-637",
+ "content": " torch.save(policy.serialize(), ckpt_path)\n ckpt_path = os.path.join(ckpt_dir, f'policy_last.ckpt')\n torch.save(policy.serialize(), ckpt_path)\n best_step, min_val_loss, best_state_dict = best_ckpt_info\n ckpt_path = os.path.join(ckpt_dir, f'policy_step_{best_step}_seed_{seed}.ckpt')\n torch.save(best_state_dict, ckpt_path)\n print(f'Training finished:\\nSeed {seed}, val loss {min_val_loss:.6f} at step {best_step}')\n return best_ckpt_info\ndef repeater(data_loader):\n epoch = 0\n for loader in repeat(data_loader):\n for data in loader:\n yield data\n print(f'Epoch {epoch} done')\n epoch += 1\nif __name__ == '__main__':\n parser = argparse.ArgumentParser()\n parser.add_argument('--eval', action='store_true')\n parser.add_argument('--onscreen_render', action='store_true')\n parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)\n parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)"
+ },
+ {
+ "comment": "The code above is using the ArgumentParser from Python's argparse module to add various command-line arguments for a task. These arguments include 'task_name', 'batch_size', 'seed', 'num_steps', 'lr', 'load_pretrain', 'eval_every', and 'validate_every'. The 'save_every' argument is optional, as well as the 'resume_ckpt_path'. These arguments are required or defaulted depending on the specifications.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":638-647",
+ "content": " parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)\n parser.add_argument('--batch_size', action='store', type=int, help='batch_size', required=True)\n parser.add_argument('--seed', action='store', type=int, help='seed', required=True)\n parser.add_argument('--num_steps', action='store', type=int, help='num_steps', required=True)\n parser.add_argument('--lr', action='store', type=float, help='lr', required=True)\n parser.add_argument('--load_pretrain', action='store_true', default=False)\n parser.add_argument('--eval_every', action='store', type=int, default=500, help='eval_every', required=False)\n parser.add_argument('--validate_every', action='store', type=int, default=500, help='validate_every', required=False)\n parser.add_argument('--save_every', action='store', type=int, default=500, help='save_every', required=False)\n parser.add_argument('--resume_ckpt_path', action='store', type=str, help='resume_ckpt_path', required=False)"
+ },
+ {
+ "comment": "This code is using the Argparse module to define command-line arguments for a Python script. The arguments include options such as skipping mirrored data, specifying directories and lengths for history, future, and prediction. For ACT (Adaptive Computation Time) model, additional arguments like KL weight, chunk size, hidden dimension, feedforward dimension, and use of Variational Quantization are defined. These arguments allow the user to customize the behavior of the script based on their specific needs.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":648-661",
+ "content": " parser.add_argument('--skip_mirrored_data', action='store_true')\n parser.add_argument('--actuator_network_dir', action='store', type=str, help='actuator_network_dir', required=False)\n parser.add_argument('--history_len', action='store', type=int)\n parser.add_argument('--future_len', action='store', type=int)\n parser.add_argument('--prediction_len', action='store', type=int)\n # for ACT\n parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)\n parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)\n parser.add_argument('--hidden_dim', action='store', type=int, help='hidden_dim', required=False)\n parser.add_argument('--dim_feedforward', action='store', type=int, help='dim_feedforward', required=False)\n parser.add_argument('--temporal_agg', action='store_true')\n parser.add_argument('--use_vq', action='store_true')\n parser.add_argument('--vq_class', action='store', type=int, help='vq_class')"
+ },
+ {
+ "comment": "These lines are adding command line arguments to the parser object, allowing users to specify values for 'vq_dim' and 'no_encoder'. The first argument, '--vq_dim', uses integer type and provides a help message. The second argument, '--no_encoder', is set as a boolean flag when true. Lastly, the main function is called with the parsed arguments passed in as keyword arguments.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/imitate_episodes.py\":662-665",
+ "content": " parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim')\n parser.add_argument('--no_encoder', action='store_true')\n main(vars(parser.parse_args()))"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/f506ace2-d779-4831-be0b-8d52fef2db26.json b/docs/doc/f506ace2-d779-4831-be0b-8d52fef2db26.json
new file mode 100644
index 00000000..34127dbf
--- /dev/null
+++ b/docs/doc/f506ace2-d779-4831-be0b-8d52fef2db26.json
@@ -0,0 +1,30 @@
+{
+ "summary": "The code defines a Backbone class for ResNet backbones with frozen BatchNorm layers and builds a vision transformer backbone model using position embedding.",
+ "details": [
+ {
+ "comment": "This code snippet defines a class called \"FrozenBatchNorm2d\" which extends torch.nn.Module and fixes the batch statistics and affine parameters in BatchNorm2d. It also initializes buffers for weight, bias, running_mean, and running_var with appropriate values. The purpose is to avoid the BatchNorm2d parameters from updating during training, enabling it to function as a frozen layer.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/backbone.py\":0-34",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nBackbone modules.\n\"\"\"\nfrom collections import OrderedDict\nimport torch\nimport torch.nn.functional as F\nimport torchvision\nfrom torch import nn\nfrom torchvision.models._utils import IntermediateLayerGetter\nfrom typing import Dict, List\nfrom util.misc import NestedTensor, is_main_process\nfrom .position_encoding import build_position_encoding\nimport IPython\ne = IPython.embed\nclass FrozenBatchNorm2d(torch.nn.Module):\n \"\"\"\n BatchNorm2d where the batch statistics and the affine parameters are fixed.\n Copy-paste from torchvision.misc.ops with added eps before rqsrt,\n without which any other policy_models than torchvision.policy_models.resnet[18,34,50,101]\n produce nans.\n \"\"\"\n def __init__(self, n):\n super(FrozenBatchNorm2d, self).__init__()\n self.register_buffer(\"weight\", torch.ones(n))\n self.register_buffer(\"bias\", torch.zeros(n))\n self.register_buffer(\"running_mean\", torch.zeros(n))\n self.register_buffer(\"running_var\", torch.ones(n))"
+ },
+ {
+ "comment": "Function \"_load_from_state_dict\" deletes \"num_batches_tracked_key\" from state_dict, then calls parent class's version of _load_from_state_dict. Function \"forward\" reshapes weights and biases for efficient processing, calculates scale and bias, and returns the processed input. Class \"BackboneBase\" initializes with backbone, train_backbone, num_channels, and return_interm_layers parameters.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/backbone.py\":36-61",
+ "content": " def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,\n missing_keys, unexpected_keys, error_msgs):\n num_batches_tracked_key = prefix + 'num_batches_tracked'\n if num_batches_tracked_key in state_dict:\n del state_dict[num_batches_tracked_key]\n super(FrozenBatchNorm2d, self)._load_from_state_dict(\n state_dict, prefix, local_metadata, strict,\n missing_keys, unexpected_keys, error_msgs)\n def forward(self, x):\n # move reshapes to the beginning\n # to make it fuser-friendly\n w = self.weight.reshape(1, -1, 1, 1)\n b = self.bias.reshape(1, -1, 1, 1)\n rv = self.running_var.reshape(1, -1, 1, 1)\n rm = self.running_mean.reshape(1, -1, 1, 1)\n eps = 1e-5\n scale = w * (rv + eps).rsqrt()\n bias = b - rm * scale\n return x * scale + bias\nclass BackboneBase(nn.Module):\n def __init__(self, backbone: nn.Module, train_backbone: bool, num_channels: int, return_interm_layers: bool):"
+ },
+ {
+ "comment": "This code defines a Backbone class in Python, which is part of a larger codebase. The class extends the BackboneBase and includes an init method to initialize the object, and a forward method for processing input data through the backbone model. It also handles nested tensors and returns them in a dictionary format.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/backbone.py\":62-85",
+ "content": " super().__init__()\n # for name, parameter in backbone.named_parameters(): # only train later layers # TODO do we want this?\n # if not train_backbone or 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:\n # parameter.requires_grad_(False)\n if return_interm_layers:\n return_layers = {\"layer1\": \"0\", \"layer2\": \"1\", \"layer3\": \"2\", \"layer4\": \"3\"}\n else:\n return_layers = {'layer4': \"0\"}\n self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)\n self.num_channels = num_channels\n def forward(self, tensor):\n xs = self.body(tensor)\n return xs\n # out: Dict[str, NestedTensor] = {}\n # for name, x in xs.items():\n # m = tensor_list.mask\n # assert m is not None\n # mask = F.interpolate(m[None].float(), size=x.shape[-2:]).to(torch.bool)[0]\n # out[name] = NestedTensor(x, mask)\n # return out\nclass Backbone(BackboneBase):"
+ },
+ {
+ "comment": "The code defines a ResNet backbone model with frozen BatchNorm for transfer learning tasks. It includes an option to freeze the BatchNorm layers and a Joiner class that combines the output of the backbone and position encoding for further processing in a list format.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/backbone.py\":86-111",
+ "content": " \"\"\"ResNet backbone with frozen BatchNorm.\"\"\"\n def __init__(self, name: str,\n train_backbone: bool,\n return_interm_layers: bool,\n dilation: bool):\n backbone = getattr(torchvision.models, name)(\n replace_stride_with_dilation=[False, False, dilation],\n pretrained=is_main_process(), norm_layer=FrozenBatchNorm2d) # pretrained # TODO do we want frozen batch_norm??\n num_channels = 512 if name in ('resnet18', 'resnet34') else 2048\n super().__init__(backbone, train_backbone, num_channels, return_interm_layers)\nclass Joiner(nn.Sequential):\n def __init__(self, backbone, position_embedding):\n super().__init__(backbone, position_embedding)\n def forward(self, tensor_list: NestedTensor):\n xs = self[0](tensor_list)\n out: List[NestedTensor] = []\n pos = []\n for name, x in xs.items():\n out.append(x)\n # position encoding\n pos.append(self[1](x).to(x.dtype))\n return out, pos"
+ },
+ {
+ "comment": "This function builds a backbone model for a vision transformer. It takes arguments, creates position embedding, sets train and return flags, initializes the backbone, combines it with the position embedding, and returns the final model.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/backbone.py\":114-121",
+ "content": "def build_backbone(args):\n position_embedding = build_position_encoding(args)\n train_backbone = args.lr_backbone > 0\n return_interm_layers = args.masks\n backbone = Backbone(args.backbone, train_backbone, return_interm_layers, args.dilation)\n model = Joiner(backbone, position_embedding)\n model.num_channels = backbone.num_channels\n return model"
+ }
+ ]
+}
\ No newline at end of file
diff --git a/docs/doc/f9303e9e-b4b1-459f-a325-a9eb84cff3e5.json b/docs/doc/f9303e9e-b4b1-459f-a325-a9eb84cff3e5.json
new file mode 100644
index 00000000..8f4bb692
--- /dev/null
+++ b/docs/doc/f9303e9e-b4b1-459f-a325-a9eb84cff3e5.json
@@ -0,0 +1,70 @@
+{
+ "summary": "The code defines a Transformer class in PyTorch for data processing, featuring encoder and decoder modules, positional embeddings, transformer layers, and optional masks and position embeddings.",
+ "details": [
+ {
+ "comment": "This code defines the Transformer class from scratch with minor modifications to the original implementation, including passing positional encodings in MHAttention, removing an extra LN layer in the encoder, and allowing for intermediate decoder activations to be returned. It inherits from nn.Module and has several parameters for customization.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":0-29",
+ "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nDETR Transformer class.\nCopy-paste from torch.nn.Transformer with modifications:\n * positional encodings are passed in MHattention\n * extra LN at the end of encoder is removed\n * decoder returns a stack of activations from all decoding layers\n\"\"\"\nimport copy\nfrom typing import Optional, List\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn, Tensor\nimport IPython\ne = IPython.embed\nclass Transformer(nn.Module):\n def __init__(self, d_model=512, nhead=8, num_encoder_layers=6,\n num_decoder_layers=6, dim_feedforward=2048, dropout=0.1,\n activation=\"relu\", normalize_before=False,\n return_intermediate_dec=False):\n super().__init__()\n encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward,\n dropout, activation, normalize_before)\n encoder_norm = nn.LayerNorm(d_model) if normalize_before else None"
+ },
+ {
+ "comment": "This code initializes a Transformer model with an encoder and decoder, performing parameter initialization and normalization. It also includes a forward method for processing input data with possible flattening for images.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":30-53",
+ "content": " self.encoder = TransformerEncoder(encoder_layer, num_encoder_layers, encoder_norm)\n decoder_layer = TransformerDecoderLayer(d_model, nhead, dim_feedforward,\n dropout, activation, normalize_before)\n decoder_norm = nn.LayerNorm(d_model)\n self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm,\n return_intermediate=return_intermediate_dec)\n self._reset_parameters()\n self.d_model = d_model\n self.nhead = nhead\n def _reset_parameters(self):\n for p in self.parameters():\n if p.dim() > 1:\n nn.init.xavier_uniform_(p)\n def forward(self, src, mask, query_embed, pos_embed, latent_input=None, proprio_input=None, additional_pos_embed=None):\n # TODO flatten only when input has H and W\n if len(src.shape) == 4: # has H and W\n # flatten NxCxHxW to HWxNxC\n bs, c, h, w = src.shape\n src = src.flatten(2).permute(2, 0, 1)"
+ },
+ {
+ "comment": "The code initializes the transformer model by handling different source (src) input shapes. It either flattens and repeats the inputs if the shape is bs, hw, c or simply permutes and repeats if the shape is NxHWxC. Positional embeddings are calculated for both position and additional positional information. The decoder uses these embeddings to process target (tgt) and source memory.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":54-74",
+ "content": " pos_embed = pos_embed.flatten(2).permute(2, 0, 1).repeat(1, bs, 1)\n query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)\n # mask = mask.flatten(1)\n additional_pos_embed = additional_pos_embed.unsqueeze(1).repeat(1, bs, 1) # seq, bs, dim\n pos_embed = torch.cat([additional_pos_embed, pos_embed], axis=0)\n addition_input = torch.stack([latent_input, proprio_input], axis=0)\n src = torch.cat([addition_input, src], axis=0)\n else:\n assert len(src.shape) == 3\n # flatten NxHWxC to HWxNxC\n bs, hw, c = src.shape\n src = src.permute(1, 0, 2)\n pos_embed = pos_embed.unsqueeze(1).repeat(1, bs, 1)\n query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)\n tgt = torch.zeros_like(query_embed)\n memory = self.encoder(src, src_key_padding_mask=mask, pos=pos_embed)\n hs = self.decoder(tgt, memory, memory_key_padding_mask=mask,\n pos=pos_embed, query_pos=query_embed)"
+ },
+ {
+ "comment": "This code defines two classes: TransformerEncoder and TransformerDecoder. The TransformerEncoder class initializes an encoder with a specified number of layers and normalization method, then forwards input through each layer in the encoder. The TransformerDecoder class initializes a decoder with a specified number of layers and normalization method, then forwards input through each layer in the decoder. Both classes can handle optional masks and positions during forward propagation.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":75-108",
+ "content": " hs = hs.transpose(1, 2)\n return hs\nclass TransformerEncoder(nn.Module):\n def __init__(self, encoder_layer, num_layers, norm=None):\n super().__init__()\n self.layers = _get_clones(encoder_layer, num_layers)\n self.num_layers = num_layers\n self.norm = norm\n def forward(self, src,\n mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n output = src\n for layer in self.layers:\n output = layer(output, src_mask=mask,\n src_key_padding_mask=src_key_padding_mask, pos=pos)\n if self.norm is not None:\n output = self.norm(output)\n return output\nclass TransformerDecoder(nn.Module):\n def __init__(self, decoder_layer, num_layers, norm=None, return_intermediate=False):\n super().__init__()\n self.layers = _get_clones(decoder_layer, num_layers)\n self.num_layers = num_layers\n self.norm = norm"
+ },
+ {
+ "comment": "The code defines a Transformer model's forward pass, where each layer applies its operations iteratively on the target (tgt) and memory inputs. The intermediate results are stored if return_intermediate is set to True. Finally, the norm layer normalizes the output, and if return_intermediate is set, stores the normalized outputs as intermediates.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":109-133",
+ "content": " self.return_intermediate = return_intermediate\n def forward(self, tgt, memory,\n tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,\n memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,\n query_pos: Optional[Tensor] = None):\n output = tgt\n intermediate = []\n for layer in self.layers:\n output = layer(output, memory, tgt_mask=tgt_mask,\n memory_mask=memory_mask,\n tgt_key_padding_mask=tgt_key_padding_mask,\n memory_key_padding_mask=memory_key_padding_mask,\n pos=pos, query_pos=query_pos)\n if self.return_intermediate:\n intermediate.append(self.norm(output))\n if self.norm is not None:\n output = self.norm(output)\n if self.return_intermediate:"
+ },
+ {
+ "comment": "This code defines a class called \"TransformerEncoderLayer\" which implements a layer for the transformer encoder in the Transformer model. It consists of a self-attention mechanism, followed by a feedforward network and normalization layers. The \"return_intermediate\" parameter controls whether intermediate results are returned or not.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":134-162",
+ "content": " intermediate.pop()\n intermediate.append(output)\n if self.return_intermediate:\n return torch.stack(intermediate)\n return output.unsqueeze(0)\nclass TransformerEncoderLayer(nn.Module):\n def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,\n activation=\"relu\", normalize_before=False):\n super().__init__()\n self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)\n # Implementation of Feedforward model\n self.linear1 = nn.Linear(d_model, dim_feedforward)\n self.dropout = nn.Dropout(dropout)\n self.linear2 = nn.Linear(dim_feedforward, d_model)\n self.norm1 = nn.LayerNorm(d_model)\n self.norm2 = nn.LayerNorm(d_model)\n self.dropout1 = nn.Dropout(dropout)\n self.dropout2 = nn.Dropout(dropout)\n self.activation = _get_activation_fn(activation)\n self.normalize_before = normalize_before\n def with_pos_embed(self, tensor, pos: Optional[Tensor]):"
+ },
+ {
+ "comment": "This code defines three functions: `forward_post`, `forward_pre`, and a helper function that calculates the tensor based on positional embeddings. The `forward_post` function applies self-attention to the input source, adds it back to the original source, and performs two feed-forward layers with residual connections and layer normalization for each of them. The `forward_pre` function applies layer normalization to the input source, calculates self-attention based on positional embeddings, and performs two feed-forward layers similar to `forward_post`. The code seems to be part of a transformer model in natural language processing or computer vision tasks that incorporate position information.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":163-186",
+ "content": " return tensor if pos is None else tensor + pos\n def forward_post(self,\n src,\n src_mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n q = k = self.with_pos_embed(src, pos)\n src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,\n key_padding_mask=src_key_padding_mask)[0]\n src = src + self.dropout1(src2)\n src = self.norm1(src)\n src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))\n src = src + self.dropout2(src2)\n src = self.norm2(src)\n return src\n def forward_pre(self, src,\n src_mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n src2 = self.norm1(src)\n q = k = self.with_pos_embed(src2, pos)\n src2 = self.self_attn(q, k, value=src2, attn_mask=src_mask,"
+ },
+ {
+ "comment": "This code defines a TransformerDecoderLayer class that inherits from nn.Module and takes in parameters such as d_model, nhead, dim_feedforward, dropout, activation, and normalize_before. The class has methods for forward pass and initializing the layer. It also includes an instance of MultiheadAttention for self attention and multi-headed attention.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":187-209",
+ "content": " key_padding_mask=src_key_padding_mask)[0]\n src = src + self.dropout1(src2)\n src2 = self.norm2(src)\n src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))\n src = src + self.dropout2(src2)\n return src\n def forward(self, src,\n src_mask: Optional[Tensor] = None,\n src_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None):\n if self.normalize_before:\n return self.forward_pre(src, src_mask, src_key_padding_mask, pos)\n return self.forward_post(src, src_mask, src_key_padding_mask, pos)\nclass TransformerDecoderLayer(nn.Module):\n def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,\n activation=\"relu\", normalize_before=False):\n super().__init__()\n self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)\n self.multihead_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)"
+ },
+ {
+ "comment": "This code defines a class for the Feedforward model in Transformer architecture. It includes several linear layers, dropout layers, and layer normalization. The forward_post method takes input tensors, masks, and positional embeddings as arguments to perform feed-forward operations.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":210-233",
+ "content": " # Implementation of Feedforward model\n self.linear1 = nn.Linear(d_model, dim_feedforward)\n self.dropout = nn.Dropout(dropout)\n self.linear2 = nn.Linear(dim_feedforward, d_model)\n self.norm1 = nn.LayerNorm(d_model)\n self.norm2 = nn.LayerNorm(d_model)\n self.norm3 = nn.LayerNorm(d_model)\n self.dropout1 = nn.Dropout(dropout)\n self.dropout2 = nn.Dropout(dropout)\n self.dropout3 = nn.Dropout(dropout)\n self.activation = _get_activation_fn(activation)\n self.normalize_before = normalize_before\n def with_pos_embed(self, tensor, pos: Optional[Tensor]):\n return tensor if pos is None else tensor + pos\n def forward_post(self, tgt, memory,\n tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,\n memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,"
+ },
+ {
+ "comment": "This function performs multi-head self-attention, applies layer normalization and feed-forward network layers to the target sequence. It takes in the target (tgt) and memory sequences, along with optional masking tensors for attention masks and key padding masks. It returns the processed target sequence.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":234-254",
+ "content": " query_pos: Optional[Tensor] = None):\n q = k = self.with_pos_embed(tgt, query_pos)\n tgt2 = self.self_attn(q, k, value=tgt, attn_mask=tgt_mask,\n key_padding_mask=tgt_key_padding_mask)[0]\n tgt = tgt + self.dropout1(tgt2)\n tgt = self.norm1(tgt)\n tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt, query_pos),\n key=self.with_pos_embed(memory, pos),\n value=memory, attn_mask=memory_mask,\n key_padding_mask=memory_key_padding_mask)[0]\n tgt = tgt + self.dropout2(tgt2)\n tgt = self.norm2(tgt)\n tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))\n tgt = tgt + self.dropout3(tgt2)\n tgt = self.norm3(tgt)\n return tgt\n def forward_pre(self, tgt, memory,\n tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,"
+ },
+ {
+ "comment": "This code defines a function for the transformer model in PyTorch. It performs self-attention on the target sequence (tgt) and applies multi-head attention to interact with memory, incorporating positional embeddings and masking for attentive processing. Finally, it passes through a feed-forward network and dropout layers before returning the modified target sequence.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":255-274",
+ "content": " memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,\n query_pos: Optional[Tensor] = None):\n tgt2 = self.norm1(tgt)\n q = k = self.with_pos_embed(tgt2, query_pos)\n tgt2 = self.self_attn(q, k, value=tgt2, attn_mask=tgt_mask,\n key_padding_mask=tgt_key_padding_mask)[0]\n tgt = tgt + self.dropout1(tgt2)\n tgt2 = self.norm2(tgt)\n tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt2, query_pos),\n key=self.with_pos_embed(memory, pos),\n value=memory, attn_mask=memory_mask,\n key_padding_mask=memory_key_padding_mask)[0]\n tgt = tgt + self.dropout2(tgt2)\n tgt2 = self.norm3(tgt)\n tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))\n tgt = tgt + self.dropout3(tgt2)\n return tgt\n def forward(self, tgt, memory,"
+ },
+ {
+ "comment": "The code defines a Transformer model with optional masks and position embeddings, using deepcopy to create N identical modules for parallel processing. The build_transformer function initializes the Transformer model with given argument values.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":275-298",
+ "content": " tgt_mask: Optional[Tensor] = None,\n memory_mask: Optional[Tensor] = None,\n tgt_key_padding_mask: Optional[Tensor] = None,\n memory_key_padding_mask: Optional[Tensor] = None,\n pos: Optional[Tensor] = None,\n query_pos: Optional[Tensor] = None):\n if self.normalize_before:\n return self.forward_pre(tgt, memory, tgt_mask, memory_mask,\n tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)\n return self.forward_post(tgt, memory, tgt_mask, memory_mask,\n tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)\ndef _get_clones(module, N):\n return nn.ModuleList([copy.deepcopy(module) for i in range(N)])\ndef build_transformer(args):\n return Transformer(\n d_model=args.hidden_dim,\n dropout=args.dropout,\n nhead=args.nheads,\n dim_feedforward=args.dim_feedforward,\n num_encoder_layers=args.enc_layers,"
+ },
+ {
+ "comment": "This code defines a function for creating a transformer model with specified parameters and returns an activation function based on the input string.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/models/transformer.py\":299-313",
+ "content": " num_decoder_layers=args.dec_layers,\n normalize_before=args.pre_norm,\n return_intermediate_dec=True,\n )\ndef _get_activation_fn(activation):\n \"\"\"Return an activation function given a string\"\"\"\n if activation == \"relu\":\n return F.relu\n if activation == \"gelu\":\n return F.gelu\n if activation == \"glu\":\n return F.glu\n raise RuntimeError(F\"activation should be relu/gelu, not {activation}.\")"
+ }
+ ]
+}
\ No newline at end of file
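The decoder-layer code summarized in this file follows the standard DETR pattern: self-attention over the query sequence, cross-attention into the encoder memory, and a feed-forward block, with normalize_before switching between pre-norm and post-norm ordering. The sketch below is a minimal, hypothetical usage example rather than code from the repo: the import path and tensor sizes are assumptions, and the (sequence, batch, d_model) layout follows the default of nn.MultiheadAttention.

    # Hypothetical usage sketch of the TransformerDecoderLayer described above.
    # Assumptions: the module is importable as detr.models.transformer, and
    # tensors use the (seq_len, batch, d_model) layout of nn.MultiheadAttention.
    import torch
    from detr.models.transformer import TransformerDecoderLayer

    layer = TransformerDecoderLayer(d_model=512, nhead=8, dim_feedforward=3200,
                                    dropout=0.1, activation="relu",
                                    normalize_before=False)  # post-norm path

    num_queries, batch, mem_len, d_model = 100, 2, 300, 512
    tgt = torch.zeros(num_queries, batch, d_model)        # decoder queries
    memory = torch.randn(mem_len, batch, d_model)         # encoder output
    query_pos = torch.randn(num_queries, batch, d_model)  # query embeddings
    pos = torch.randn(mem_len, batch, d_model)            # memory positional encoding

    out = layer(tgt, memory, pos=pos, query_pos=query_pos)
    print(out.shape)  # torch.Size([100, 2, 512])

In the full model, build_transformer stacks num_encoder_layers and num_decoder_layers copies of these layers via _get_clones, using the hyperparameters parsed from the command line.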
diff --git a/docs/doc/fed5dcc3-e480-44b7-aa3a-2090e6e612ad.json b/docs/doc/fed5dcc3-e480-44b7-aa3a-2090e6e612ad.json
new file mode 100644
index 00000000..d6b50cb8
--- /dev/null
+++ b/docs/doc/fed5dcc3-e480-44b7-aa3a-2090e6e612ad.json
@@ -0,0 +1,30 @@
+{
+ "summary": "This code defines task parameters and simulation environments for robotics applications, including gripper position limits, joint names, and normalization functions for master and puppet grippers.",
+ "details": [
+ {
+ "comment": "This code defines constant values for task parameters. It specifies different simulation tasks, their associated dataset directories, the number of episodes, episode length, and camera names. These constants are used for organizing and accessing datasets in the 'DATA_DIR' directory.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/constants.py\":0-34",
+ "content": "import pathlib\nimport os\n### Task parameters\nDATA_DIR = '/home/zfu/interbotix_ws/src/act/data' if os.getlogin() == 'zfu' else '/scr/tonyzhao/datasets'\nSIM_TASK_CONFIGS = {\n 'sim_transfer_cube_scripted':{\n 'dataset_dir': DATA_DIR + '/sim_transfer_cube_scripted',\n 'num_episodes': 50,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n 'sim_transfer_cube_human':{\n 'dataset_dir': DATA_DIR + '/sim_transfer_cube_human',\n 'num_episodes': 50,\n 'episode_len': 400,\n 'camera_names': ['top']\n },\n 'sim_insertion_scripted': {\n 'dataset_dir': DATA_DIR + '/sim_insertion_scripted',\n 'num_episodes': 50,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n 'sim_insertion_human': {\n 'dataset_dir': DATA_DIR + '/sim_insertion_human',\n 'num_episodes': 50,\n 'episode_len': 500,\n 'camera_names': ['top']\n },\n 'all': {\n 'dataset_dir': DATA_DIR + '/',"
+ },
+ {
+ "comment": "This code defines a dictionary containing constant values for simulation environments. It includes dataset directories, episode parameters, and camera names for each environment. Additionally, there are constants defining the time step (DT), frame rate (FPS), joint names, initial arm pose, and finger position limits for the simulation. These constants will be used in the simulation processes to ensure consistency across different environments and tasks.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/constants.py\":35-65",
+ "content": " 'num_episodes': None,\n 'episode_len': None,\n 'name_filter': lambda n: 'sim' not in n,\n 'camera_names': ['cam_high', 'cam_left_wrist', 'cam_right_wrist']\n },\n 'sim_transfer_cube_scripted_mirror':{\n 'dataset_dir': DATA_DIR + '/sim_transfer_cube_scripted_mirror',\n 'num_episodes': None,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n 'sim_insertion_scripted_mirror': {\n 'dataset_dir': DATA_DIR + '/sim_insertion_scripted_mirror',\n 'num_episodes': None,\n 'episode_len': 400,\n 'camera_names': ['top', 'left_wrist', 'right_wrist']\n },\n}\n### Simulation envs fixed constants\nDT = 0.02\nFPS = 50\nJOINT_NAMES = [\"waist\", \"shoulder\", \"elbow\", \"forearm_roll\", \"wrist_angle\", \"wrist_rotate\"]\nSTART_ARM_POSE = [0, -0.96, 1.16, 0, -0.3, 0, 0.02239, -0.02239, 0, -0.96, 1.16, 0, -0.3, 0, 0.02239, -0.02239]\nXML_DIR = str(pathlib.Path(__file__).parent.resolve()) + '/assets/' # note: absolute path\n# Left finger position limits (qpos[7]), right_finger = -1 * left_finger"
+ },
+ {
+ "comment": "This code defines gripper position and joint limits for the master and puppet grippers. It also includes normalization and unnormalization functions to convert gripper positions between normalized and actual values. The purpose is likely to enable consistent handling of gripper positions regardless of their current state.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/constants.py\":66-83",
+ "content": "MASTER_GRIPPER_POSITION_OPEN = 0.02417\nMASTER_GRIPPER_POSITION_CLOSE = 0.01244\nPUPPET_GRIPPER_POSITION_OPEN = 0.05800\nPUPPET_GRIPPER_POSITION_CLOSE = 0.01844\n# Gripper joint limits (qpos[6])\nMASTER_GRIPPER_JOINT_OPEN = -0.8\nMASTER_GRIPPER_JOINT_CLOSE = -1.65\nPUPPET_GRIPPER_JOINT_OPEN = 1.4910\nPUPPET_GRIPPER_JOINT_CLOSE = -0.6213\n############################ Helper functions ############################\nMASTER_GRIPPER_POSITION_NORMALIZE_FN = lambda x: (x - MASTER_GRIPPER_POSITION_CLOSE) / (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE)\nPUPPET_GRIPPER_POSITION_NORMALIZE_FN = lambda x: (x - PUPPET_GRIPPER_POSITION_CLOSE) / (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE)\nMASTER_GRIPPER_POSITION_UNNORMALIZE_FN = lambda x: x * (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE) + MASTER_GRIPPER_POSITION_CLOSE\nPUPPET_GRIPPER_POSITION_UNNORMALIZE_FN = lambda x: x * (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE) + PUPPET_GRIPPER_POSITION_CLOSE\nMASTER2P"
+ },
+ {
+ "comment": "This code defines various lambda functions for joint normalization and unnormalization, gripper velocity normalization, as well as a master-to-puppet joint conversion function. These functions are likely used in robotics or similar applications to manipulate and convert gripper positions and velocities between two systems with different open and closed positions.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/constants.py\":83-94",
+ "content": "UPPET_POSITION_FN = lambda x: PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(MASTER_GRIPPER_POSITION_NORMALIZE_FN(x))\nMASTER_GRIPPER_JOINT_NORMALIZE_FN = lambda x: (x - MASTER_GRIPPER_JOINT_CLOSE) / (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE)\nPUPPET_GRIPPER_JOINT_NORMALIZE_FN = lambda x: (x - PUPPET_GRIPPER_JOINT_CLOSE) / (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE)\nMASTER_GRIPPER_JOINT_UNNORMALIZE_FN = lambda x: x * (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE) + MASTER_GRIPPER_JOINT_CLOSE\nPUPPET_GRIPPER_JOINT_UNNORMALIZE_FN = lambda x: x * (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE) + PUPPET_GRIPPER_JOINT_CLOSE\nMASTER2PUPPET_JOINT_FN = lambda x: PUPPET_GRIPPER_JOINT_UNNORMALIZE_FN(MASTER_GRIPPER_JOINT_NORMALIZE_FN(x))\nMASTER_GRIPPER_VELOCITY_NORMALIZE_FN = lambda x: x / (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE)\nPUPPET_GRIPPER_VELOCITY_NORMALIZE_FN = lambda x: x / (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE)\nMASTE"
+ },
+ {
+ "comment": "This code defines four lambda functions, two each for the master and puppet grippers. The functions convert gripper positions to joint angles (pos2joint) and vice versa (joint2pos). It also calculates the midpoint of the master gripper's joint range. These functions use normalize and unnormalize FN from respective constants.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/constants.py\":94-99",
+ "content": "R_POS2JOINT = lambda x: MASTER_GRIPPER_POSITION_NORMALIZE_FN(x) * (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE) + MASTER_GRIPPER_JOINT_CLOSE\nMASTER_JOINT2POS = lambda x: MASTER_GRIPPER_POSITION_UNNORMALIZE_FN((x - MASTER_GRIPPER_JOINT_CLOSE) / (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE))\nPUPPET_POS2JOINT = lambda x: PUPPET_GRIPPER_POSITION_NORMALIZE_FN(x) * (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE) + PUPPET_GRIPPER_JOINT_CLOSE\nPUPPET_JOINT2POS = lambda x: PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN((x - PUPPET_GRIPPER_JOINT_CLOSE) / (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE))\nMASTER_GRIPPER_JOINT_MID = (MASTER_GRIPPER_JOINT_OPEN + MASTER_GRIPPER_JOINT_CLOSE)/2"
+ }
+ ]
+}
\ No newline at end of file
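The gripper constants summarized above pair each device's open and close limits with lambdas that normalize a position to [0, 1] and rescale it into the other device's range; MASTER2PUPPET_POSITION_FN simply chains the two steps. A small sketch of how these helpers compose is below; the numbers are the ones listed in this file, and the import assumes constants.py is on the Python path.

    # Sketch of composing the gripper mapping helpers from constants.py.
    # Import path assumed; expected values follow the constants above.
    from constants import (
        MASTER_GRIPPER_POSITION_OPEN,
        MASTER_GRIPPER_POSITION_NORMALIZE_FN,
        PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN,
        MASTER2PUPPET_POSITION_FN,
    )

    master_pos = MASTER_GRIPPER_POSITION_OPEN                        # 0.02417 (fully open)
    normalized = MASTER_GRIPPER_POSITION_NORMALIZE_FN(master_pos)    # -> 1.0
    puppet_pos = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(normalized)  # -> 0.05800

    # MASTER2PUPPET_POSITION_FN chains the two steps above.
    assert abs(MASTER2PUPPET_POSITION_FN(master_pos) - puppet_pos) < 1e-9
    print(normalized, puppet_pos)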
diff --git a/docs/doc/ffe5fc2c-ef19-46fe-837a-635d7e4a7acb.json b/docs/doc/ffe5fc2c-ef19-46fe-837a-635d7e4a7acb.json
new file mode 100644
index 00000000..20319413
--- /dev/null
+++ b/docs/doc/ffe5fc2c-ef19-46fe-837a-635d7e4a7acb.json
@@ -0,0 +1,30 @@
+{
+ "summary": "The \"plot_logs\" function generates matplotlib plots using training logs, handling missing files and plotting precision-recall curves with interpolated mAP values, setting axes titles and legends.",
+ "details": [
+ {
+ "comment": "This code defines a function \"plot_logs\" that takes in training logs, fields to plot (like class_error, loss), and optional parameters like ewm_col and log_name. It then generates matplotlib plots showing the results of each field color-coded for each log file with solid lines representing training results and dashed lines for test results.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/plot_utils.py\":0-25",
+ "content": "\"\"\"\nPlotting utilities to visualize training logs.\n\"\"\"\nimport torch\nimport pandas as pd\nimport numpy as np\nimport seaborn as sns\nimport matplotlib.pyplot as plt\nfrom pathlib import Path, PurePath\ndef plot_logs(logs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt'):\n '''\n Function to plot specific fields from training log(s). Plots both training and test results.\n :: Inputs - logs = list containing Path objects, each pointing to individual dir with a log file\n - fields = which results to plot from each log file - plots both training and test for each field.\n - ewm_col = optional, which column to use as the exponential weighted smoothing of the plots\n - log_name = optional, name of log file if different than default 'log.txt'.\n :: Outputs - matplotlib plots of results in fields, color coded for each log file.\n - solid lines are training results, dashed lines are test results.\n '''\n func_name = \"plot_utils.py::plot_logs\""
+ },
+ {
+ "comment": "This code checks if the 'logs' argument is a list of Paths or a single Path object. If not, it raises an error. It then iterates over each directory in the logs list and ensures they exist as directories. Finally, it checks if the log_name exists within each directory.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/plot_utils.py\":27-46",
+ "content": " # verify logs is a list of Paths (list[Paths]) or single Pathlib object Path,\n # convert single Path to list to avoid 'not iterable' error\n if not isinstance(logs, list):\n if isinstance(logs, PurePath):\n logs = [logs]\n print(f\"{func_name} info: logs param expects a list argument, converted to list[Path].\")\n else:\n raise ValueError(f\"{func_name} - invalid argument for logs parameter.\\n \\\n Expect list[Path] or single Path obj, received {type(logs)}\")\n # Quality checks - verify valid dir(s), that every item in list is Path object, and that log_name exists in each dir\n for i, dir in enumerate(logs):\n if not isinstance(dir, PurePath):\n raise ValueError(f\"{func_name} - non-Path object in logs argument of {type(dir)}: \\n{dir}\")\n if not dir.exists():\n raise ValueError(f\"{func_name} - invalid directory in logs argument:\\n{dir}\")\n # verify log_name exists\n fn = Path(dir / log_name)\n if not fn.exists():"
+ },
+ {
+ "comment": "This code checks for a missing log file and prompts the user to make sure they've reached Epoch 1 in training. It then loads log files, plots data frames for specified fields, and handles missing log files. The plot includes mAP (mean average precision) values using COCO evaluation metrics, and other field values interpolated and smoothed using exponential weighted moving averages.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/plot_utils.py\":47-71",
+ "content": " print(f\"-> missing {log_name}. Have you gotten to Epoch 1 in training?\")\n print(f\"--> full path of missing log file: {fn}\")\n return\n # load log file(s) and plot\n dfs = [pd.read_json(Path(p) / log_name, lines=True) for p in logs]\n fig, axs = plt.subplots(ncols=len(fields), figsize=(16, 5))\n for df, color in zip(dfs, sns.color_palette(n_colors=len(logs))):\n for j, field in enumerate(fields):\n if field == 'mAP':\n coco_eval = pd.DataFrame(\n np.stack(df.test_coco_eval_bbox.dropna().values)[:, 1]\n ).ewm(com=ewm_col).mean()\n axs[j].plot(coco_eval, c=color)\n else:\n df.interpolate().ewm(com=ewm_col).mean().plot(\n y=[f'train_{field}', f'test_{field}'],\n ax=axs[j],\n color=[color] * 2,\n style=['-', '--']\n )\n for ax, field in zip(axs, fields):\n ax.legend([Path(p).name for p in logs])"
+ },
+ {
+ "comment": "The code defines a function plot_precision_recall that takes in files, and depending on the naming_scheme, extracts either the exp_id or stem from each file. It then creates a figure with two subplots and for each file, it loads the corresponding data and calculates precision, recall, and mean average precision (mAP) at 50. The results are printed out for each file in the format \"naming_scheme name: mAP@50=precision%, score=score\".",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/plot_utils.py\":72-96",
+ "content": " ax.set_title(field)\ndef plot_precision_recall(files, naming_scheme='iter'):\n if naming_scheme == 'exp_id':\n # name becomes exp_id\n names = [f.parts[-3] for f in files]\n elif naming_scheme == 'iter':\n names = [f.stem for f in files]\n else:\n raise ValueError(f'not supported {naming_scheme}')\n fig, axs = plt.subplots(ncols=2, figsize=(16, 5))\n for f, color, name in zip(files, sns.color_palette(\"Blues\", n_colors=len(files)), names):\n data = torch.load(f)\n # precision is n_iou, n_points, n_cat, n_area, max_det\n precision = data['precision']\n recall = data['params'].recThrs\n scores = data['scores']\n # take precision for all classes, all areas and 100 detections\n precision = precision[0, :, :, 0, -1].mean(1)\n scores = scores[0, :, :, 0, -1].mean(1)\n prec = precision.mean()\n rec = data['recall'][0, :, 0, -1].mean()\n print(f'{naming_scheme} {name}: mAP@50={prec * 100: 05.1f}, ' +\n f'score={scores.mean():0.3f}, ' +"
+ },
+ {
+ "comment": "This code plots Precision-Recall curves and scores against Recall, sets titles for the axes, adds legends with given names, and returns the figure and axis objects.",
+ "location": "\"/media/root/Prima/works/act-plus-plus/docs/src/detr/util/plot_utils.py\":97-106",
+ "content": " f'f1={2 * prec * rec / (prec + rec + 1e-8):0.3f}'\n )\n axs[0].plot(recall, precision, c=color)\n axs[1].plot(recall, scores, c=color)\n axs[0].set_title('Precision / Recall')\n axs[0].legend(names)\n axs[1].set_title('Scores / Recall')\n axs[1].legend(names)\n return fig, axs"
+ }
+ ]
+}
\ No newline at end of file
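Per the summary above, plot_logs takes a list of Path objects, each pointing to a run directory containing a log.txt, and plots training (solid) versus test (dashed) curves for the requested fields. A minimal, hypothetical call is sketched below; the run directories and the import path are placeholders, not repo defaults.

    # Hypothetical call to plot_logs for two training runs.
    from pathlib import Path
    import matplotlib.pyplot as plt
    from detr.util.plot_utils import plot_logs  # import path assumed

    runs = [Path('outputs/run_a'), Path('outputs/run_b')]  # each must contain log.txt
    plot_logs(runs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0)
    plt.show()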
diff --git a/docs/github-markdown.css b/docs/github-markdown.css
new file mode 100755
index 00000000..96a4f29e
--- /dev/null
+++ b/docs/github-markdown.css
@@ -0,0 +1,1197 @@
+@media (prefers-color-scheme: dark) {
+
+ .markdown-body,
+ [data-theme="dark"] {
+ /*dark*/
+ color-scheme: dark;
+ --color-prettylights-syntax-comment: #8b949e;
+ --color-prettylights-syntax-constant: #79c0ff;
+ --color-prettylights-syntax-entity: #d2a8ff;
+ --color-prettylights-syntax-storage-modifier-import: #c9d1d9;
+ --color-prettylights-syntax-entity-tag: #7ee787;
+ --color-prettylights-syntax-keyword: #ff7b72;
+ --color-prettylights-syntax-string: #a5d6ff;
+ --color-prettylights-syntax-variable: #ffa657;
+ --color-prettylights-syntax-brackethighlighter-unmatched: #f85149;
+ --color-prettylights-syntax-invalid-illegal-text: #f0f6fc;
+ --color-prettylights-syntax-invalid-illegal-bg: #8e1519;
+ --color-prettylights-syntax-carriage-return-text: #f0f6fc;
+ --color-prettylights-syntax-carriage-return-bg: #b62324;
+ --color-prettylights-syntax-string-regexp: #7ee787;
+ --color-prettylights-syntax-markup-list: #f2cc60;
+ --color-prettylights-syntax-markup-heading: #1f6feb;
+ --color-prettylights-syntax-markup-italic: #c9d1d9;
+ --color-prettylights-syntax-markup-bold: #c9d1d9;
+ --color-prettylights-syntax-markup-deleted-text: #ffdcd7;
+ --color-prettylights-syntax-markup-deleted-bg: #67060c;
+ --color-prettylights-syntax-markup-inserted-text: #aff5b4;
+ --color-prettylights-syntax-markup-inserted-bg: #033a16;
+ --color-prettylights-syntax-markup-changed-text: #ffdfb6;
+ --color-prettylights-syntax-markup-changed-bg: #5a1e02;
+ --color-prettylights-syntax-markup-ignored-text: #c9d1d9;
+ --color-prettylights-syntax-markup-ignored-bg: #1158c7;
+ --color-prettylights-syntax-meta-diff-range: #d2a8ff;
+ --color-prettylights-syntax-brackethighlighter-angle: #8b949e;
+ --color-prettylights-syntax-sublimelinter-gutter-mark: #484f58;
+ --color-prettylights-syntax-constant-other-reference-link: #a5d6ff;
+ --color-fg-default: #e6edf3;
+ --color-fg-muted: #848d97;
+ --color-fg-subtle: #6e7681;
+ --color-canvas-default: #0d1117;
+ --color-canvas-subtle: #161b22;
+ --color-border-default: #30363d;
+ --color-border-muted: #21262d;
+ --color-neutral-muted: rgba(110, 118, 129, 0.4);
+ --color-accent-fg: #2f81f7;
+ --color-accent-emphasis: #1f6feb;
+ --color-success-fg: #3fb950;
+ --color-success-emphasis: #238636;
+ --color-attention-fg: #d29922;
+ --color-attention-emphasis: #9e6a03;
+ --color-attention-subtle: rgba(187, 128, 9, 0.15);
+ --color-danger-fg: #f85149;
+ --color-danger-emphasis: #da3633;
+ --color-done-fg: #a371f7;
+ --color-done-emphasis: #8957e5;
+ }
+}
+
+@media (prefers-color-scheme: light) {
+
+ .markdown-body,
+ [data-theme="light"] {
+ /*light*/
+ color-scheme: light;
+ --color-prettylights-syntax-comment: #57606a;
+ --color-prettylights-syntax-constant: #0550ae;
+ --color-prettylights-syntax-entity: #6639ba;
+ --color-prettylights-syntax-storage-modifier-import: #24292f;
+ --color-prettylights-syntax-entity-tag: #116329;
+ --color-prettylights-syntax-keyword: #cf222e;
+ --color-prettylights-syntax-string: #0a3069;
+ --color-prettylights-syntax-variable: #953800;
+ --color-prettylights-syntax-brackethighlighter-unmatched: #82071e;
+ --color-prettylights-syntax-invalid-illegal-text: #f6f8fa;
+ --color-prettylights-syntax-invalid-illegal-bg: #82071e;
+ --color-prettylights-syntax-carriage-return-text: #f6f8fa;
+ --color-prettylights-syntax-carriage-return-bg: #cf222e;
+ --color-prettylights-syntax-string-regexp: #116329;
+ --color-prettylights-syntax-markup-list: #3b2300;
+ --color-prettylights-syntax-markup-heading: #0550ae;
+ --color-prettylights-syntax-markup-italic: #24292f;
+ --color-prettylights-syntax-markup-bold: #24292f;
+ --color-prettylights-syntax-markup-deleted-text: #82071e;
+ --color-prettylights-syntax-markup-deleted-bg: #ffebe9;
+ --color-prettylights-syntax-markup-inserted-text: #116329;
+ --color-prettylights-syntax-markup-inserted-bg: #dafbe1;
+ --color-prettylights-syntax-markup-changed-text: #953800;
+ --color-prettylights-syntax-markup-changed-bg: #ffd8b5;
+ --color-prettylights-syntax-markup-ignored-text: #eaeef2;
+ --color-prettylights-syntax-markup-ignored-bg: #0550ae;
+ --color-prettylights-syntax-meta-diff-range: #8250df;
+ --color-prettylights-syntax-brackethighlighter-angle: #57606a;
+ --color-prettylights-syntax-sublimelinter-gutter-mark: #8c959f;
+ --color-prettylights-syntax-constant-other-reference-link: #0a3069;
+ --color-fg-default: #1F2328;
+ --color-fg-muted: #656d76;
+ --color-fg-subtle: #6e7781;
+ --color-canvas-default: #ffffff;
+ --color-canvas-subtle: #f6f8fa;
+ --color-border-default: #d0d7de;
+ --color-border-muted: hsla(210, 18%, 87%, 1);
+ --color-neutral-muted: rgba(175, 184, 193, 0.2);
+ --color-accent-fg: #0969da;
+ --color-accent-emphasis: #0969da;
+ --color-success-fg: #1a7f37;
+ --color-success-emphasis: #1f883d;
+ --color-attention-fg: #9a6700;
+ --color-attention-emphasis: #9a6700;
+ --color-attention-subtle: #fff8c5;
+ --color-danger-fg: #d1242f;
+ --color-danger-emphasis: #cf222e;
+ --color-done-fg: #8250df;
+ --color-done-emphasis: #8250df;
+ }
+}
+
+.markdown-body {
+ -ms-text-size-adjust: 100%;
+ -webkit-text-size-adjust: 100%;
+ margin: 0;
+ color: var(--color-fg-default);
+ background-color: var(--color-canvas-default);
+ font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", "Noto Sans", Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji";
+ font-size: 16px;
+ line-height: 1.5;
+ word-wrap: break-word;
+}
+
+.markdown-body .octicon {
+ display: inline-block;
+ fill: currentColor;
+ vertical-align: text-bottom;
+}
+
+.markdown-body h1:hover .anchor .octicon-link:before,
+.markdown-body h2:hover .anchor .octicon-link:before,
+.markdown-body h3:hover .anchor .octicon-link:before,
+.markdown-body h4:hover .anchor .octicon-link:before,
+.markdown-body h5:hover .anchor .octicon-link:before,
+.markdown-body h6:hover .anchor .octicon-link:before {
+ width: 16px;
+ height: 16px;
+ content: ' ';
+ display: inline-block;
+ background-color: currentColor;
+ -webkit-mask-image: url("data:image/svg+xml,");
+ mask-image: url("data:image/svg+xml,");
+}
+
+.markdown-body details,
+.markdown-body figcaption,
+.markdown-body figure {
+ display: block;
+}
+
+.markdown-body summary {
+ display: list-item;
+}
+
+.markdown-body [hidden] {
+ display: none !important;
+}
+
+.markdown-body a {
+ background-color: transparent;
+ color: var(--color-accent-fg);
+ text-decoration: none;
+}
+
+.markdown-body abbr[title] {
+ border-bottom: none;
+ -webkit-text-decoration: underline dotted;
+ text-decoration: underline dotted;
+}
+
+.markdown-body b,
+.markdown-body strong {
+ font-weight: var(--base-text-weight-semibold, 600);
+}
+
+.markdown-body dfn {
+ font-style: italic;
+}
+
+.markdown-body h1 {
+ margin: .67em 0;
+ font-weight: var(--base-text-weight-semibold, 600);
+ padding-bottom: .3em;
+ font-size: 2em;
+ border-bottom: 1px solid var(--color-border-muted);
+}
+
+.markdown-body mark {
+ background-color: var(--color-attention-subtle);
+ color: var(--color-fg-default);
+}
+
+.markdown-body small {
+ font-size: 90%;
+}
+
+.markdown-body sub,
+.markdown-body sup {
+ font-size: 75%;
+ line-height: 0;
+ position: relative;
+ vertical-align: baseline;
+}
+
+.markdown-body sub {
+ bottom: -0.25em;
+}
+
+.markdown-body sup {
+ top: -0.5em;
+}
+
+.markdown-body img {
+ border-style: none;
+ max-width: 100%;
+ box-sizing: content-box;
+ background-color: var(--color-canvas-default);
+}
+
+.markdown-body code,
+.markdown-body kbd,
+.markdown-body pre,
+.markdown-body samp {
+ font-family: monospace;
+ font-size: 1em;
+}
+
+.markdown-body figure {
+ margin: 1em 40px;
+}
+
+.markdown-body hr {
+ box-sizing: content-box;
+ overflow: hidden;
+ background: transparent;
+ border-bottom: 1px solid var(--color-border-muted);
+ height: .25em;
+ padding: 0;
+ margin: 24px 0;
+ background-color: var(--color-border-default);
+ border: 0;
+}
+
+.markdown-body input {
+ font: inherit;
+ margin: 0;
+ overflow: visible;
+ font-family: inherit;
+ font-size: inherit;
+ line-height: inherit;
+}
+
+.markdown-body [type=button],
+.markdown-body [type=reset],
+.markdown-body [type=submit] {
+ -webkit-appearance: button;
+ appearance: button;
+}
+
+.markdown-body [type=checkbox],
+.markdown-body [type=radio] {
+ box-sizing: border-box;
+ padding: 0;
+}
+
+.markdown-body [type=number]::-webkit-inner-spin-button,
+.markdown-body [type=number]::-webkit-outer-spin-button {
+ height: auto;
+}
+
+.markdown-body [type=search]::-webkit-search-cancel-button,
+.markdown-body [type=search]::-webkit-search-decoration {
+ -webkit-appearance: none;
+ appearance: none;
+}
+
+.markdown-body ::-webkit-input-placeholder {
+ color: inherit;
+ opacity: .54;
+}
+
+.markdown-body ::-webkit-file-upload-button {
+ -webkit-appearance: button;
+ appearance: button;
+ font: inherit;
+}
+
+.markdown-body a:hover {
+ text-decoration: underline;
+}
+
+.markdown-body ::placeholder {
+ color: var(--color-fg-subtle);
+ opacity: 1;
+}
+
+.markdown-body hr::before {
+ display: table;
+ content: "";
+}
+
+.markdown-body hr::after {
+ display: table;
+ clear: both;
+ content: "";
+}
+
+.markdown-body table {
+ border-spacing: 0;
+ border-collapse: collapse;
+ display: block;
+ width: max-content;
+ max-width: 100%;
+ overflow: auto;
+}
+
+.markdown-body td,
+.markdown-body th {
+ padding: 0;
+}
+
+.markdown-body details summary {
+ cursor: pointer;
+}
+
+.markdown-body details:not([open])>*:not(summary) {
+ display: none !important;
+}
+
+.markdown-body a:focus,
+.markdown-body [role=button]:focus,
+.markdown-body input[type=radio]:focus,
+.markdown-body input[type=checkbox]:focus {
+ outline: 2px solid var(--color-accent-fg);
+ outline-offset: -2px;
+ box-shadow: none;
+}
+
+.markdown-body a:focus:not(:focus-visible),
+.markdown-body [role=button]:focus:not(:focus-visible),
+.markdown-body input[type=radio]:focus:not(:focus-visible),
+.markdown-body input[type=checkbox]:focus:not(:focus-visible) {
+ outline: solid 1px transparent;
+}
+
+.markdown-body a:focus-visible,
+.markdown-body [role=button]:focus-visible,
+.markdown-body input[type=radio]:focus-visible,
+.markdown-body input[type=checkbox]:focus-visible {
+ outline: 2px solid var(--color-accent-fg);
+ outline-offset: -2px;
+ box-shadow: none;
+}
+
+.markdown-body a:not([class]):focus,
+.markdown-body a:not([class]):focus-visible,
+.markdown-body input[type=radio]:focus,
+.markdown-body input[type=radio]:focus-visible,
+.markdown-body input[type=checkbox]:focus,
+.markdown-body input[type=checkbox]:focus-visible {
+ outline-offset: 0;
+}
+
+.markdown-body kbd {
+ display: inline-block;
+ padding: 3px 5px;
+ font: 11px ui-monospace, SFMono-Regular, SF Mono, Menlo, Consolas, Liberation Mono, monospace;
+ line-height: 10px;
+ color: var(--color-fg-default);
+ vertical-align: middle;
+ background-color: var(--color-canvas-subtle);
+ border: solid 1px var(--color-neutral-muted);
+ border-bottom-color: var(--color-neutral-muted);
+ border-radius: 6px;
+ box-shadow: inset 0 -1px 0 var(--color-neutral-muted);
+}
+
+.markdown-body h1,
+.markdown-body h2,
+.markdown-body h3,
+.markdown-body h4,
+.markdown-body h5,
+.markdown-body h6 {
+ margin-top: 24px;
+ margin-bottom: 16px;
+ font-weight: var(--base-text-weight-semibold, 600);
+ line-height: 1.25;
+}
+
+.markdown-body h2 {
+ font-weight: var(--base-text-weight-semibold, 600);
+ padding-bottom: .3em;
+ font-size: 1.5em;
+ border-bottom: 1px solid var(--color-border-muted);
+}
+
+.markdown-body h3 {
+ font-weight: var(--base-text-weight-semibold, 600);
+ font-size: 1.25em;
+}
+
+.markdown-body h4 {
+ font-weight: var(--base-text-weight-semibold, 600);
+ font-size: 1em;
+}
+
+.markdown-body h5 {
+ font-weight: var(--base-text-weight-semibold, 600);
+ font-size: .875em;
+}
+
+.markdown-body h6 {
+ font-weight: var(--base-text-weight-semibold, 600);
+ font-size: .85em;
+ color: var(--color-fg-muted);
+}
+
+.markdown-body p {
+ margin-top: 0;
+ margin-bottom: 10px;
+}
+
+.markdown-body blockquote {
+ margin: 0;
+ padding: 0 1em;
+ color: var(--color-fg-muted);
+ border-left: .25em solid var(--color-border-default);
+}
+
+.markdown-body ul,
+.markdown-body ol {
+ margin-top: 0;
+ margin-bottom: 0;
+ padding-left: 2em;
+}
+
+.markdown-body ol ol,
+.markdown-body ul ol {
+ list-style-type: lower-roman;
+}
+
+.markdown-body ul ul ol,
+.markdown-body ul ol ol,
+.markdown-body ol ul ol,
+.markdown-body ol ol ol {
+ list-style-type: lower-alpha;
+}
+
+.markdown-body dd {
+ margin-left: 0;
+}
+
+.markdown-body tt,
+.markdown-body code,
+.markdown-body samp {
+ font-family: ui-monospace, SFMono-Regular, SF Mono, Menlo, Consolas, Liberation Mono, monospace;
+ font-size: 12px;
+}
+
+.markdown-body pre {
+ margin-top: 0;
+ margin-bottom: 0;
+ font-family: ui-monospace, SFMono-Regular, SF Mono, Menlo, Consolas, Liberation Mono, monospace;
+ font-size: 12px;
+ word-wrap: normal;
+}
+
+.markdown-body .octicon {
+ display: inline-block;
+ overflow: visible !important;
+ vertical-align: text-bottom;
+ fill: currentColor;
+}
+
+.markdown-body input::-webkit-outer-spin-button,
+.markdown-body input::-webkit-inner-spin-button {
+ margin: 0;
+ -webkit-appearance: none;
+ appearance: none;
+}
+
+.markdown-body .mr-2 {
+ margin-right: var(--base-size-8, 8px) !important;
+}
+
+.markdown-body::before {
+ display: table;
+ content: "";
+}
+
+.markdown-body::after {
+ display: table;
+ clear: both;
+ content: "";
+}
+
+.markdown-body>*:first-child {
+ margin-top: 0 !important;
+}
+
+.markdown-body>*:last-child {
+ margin-bottom: 0 !important;
+}
+
+.markdown-body a:not([href]) {
+ color: inherit;
+ text-decoration: none;
+}
+
+.markdown-body .absent {
+ color: var(--color-danger-fg);
+}
+
+.markdown-body .anchor {
+ float: left;
+ padding-right: 4px;
+ margin-left: -20px;
+ line-height: 1;
+}
+
+.markdown-body .anchor:focus {
+ outline: none;
+}
+
+.markdown-body p,
+.markdown-body blockquote,
+.markdown-body ul,
+.markdown-body ol,
+.markdown-body dl,
+.markdown-body table,
+.markdown-body pre,
+.markdown-body details {
+ margin-top: 0;
+ margin-bottom: 16px;
+}
+
+.markdown-body blockquote>:first-child {
+ margin-top: 0;
+}
+
+.markdown-body blockquote>:last-child {
+ margin-bottom: 0;
+}
+
+.markdown-body h1 .octicon-link,
+.markdown-body h2 .octicon-link,
+.markdown-body h3 .octicon-link,
+.markdown-body h4 .octicon-link,
+.markdown-body h5 .octicon-link,
+.markdown-body h6 .octicon-link {
+ color: var(--color-fg-default);
+ vertical-align: middle;
+ visibility: hidden;
+}
+
+.markdown-body h1:hover .anchor,
+.markdown-body h2:hover .anchor,
+.markdown-body h3:hover .anchor,
+.markdown-body h4:hover .anchor,
+.markdown-body h5:hover .anchor,
+.markdown-body h6:hover .anchor {
+ text-decoration: none;
+}
+
+.markdown-body h1:hover .anchor .octicon-link,
+.markdown-body h2:hover .anchor .octicon-link,
+.markdown-body h3:hover .anchor .octicon-link,
+.markdown-body h4:hover .anchor .octicon-link,
+.markdown-body h5:hover .anchor .octicon-link,
+.markdown-body h6:hover .anchor .octicon-link {
+ visibility: visible;
+}
+
+.markdown-body h1 tt,
+.markdown-body h1 code,
+.markdown-body h2 tt,
+.markdown-body h2 code,
+.markdown-body h3 tt,
+.markdown-body h3 code,
+.markdown-body h4 tt,
+.markdown-body h4 code,
+.markdown-body h5 tt,
+.markdown-body h5 code,
+.markdown-body h6 tt,
+.markdown-body h6 code {
+ padding: 0 .2em;
+ font-size: inherit;
+}
+
+.markdown-body summary h1,
+.markdown-body summary h2,
+.markdown-body summary h3,
+.markdown-body summary h4,
+.markdown-body summary h5,
+.markdown-body summary h6 {
+ display: inline-block;
+}
+
+.markdown-body summary h1 .anchor,
+.markdown-body summary h2 .anchor,
+.markdown-body summary h3 .anchor,
+.markdown-body summary h4 .anchor,
+.markdown-body summary h5 .anchor,
+.markdown-body summary h6 .anchor {
+ margin-left: -40px;
+}
+
+.markdown-body summary h1,
+.markdown-body summary h2 {
+ padding-bottom: 0;
+ border-bottom: 0;
+}
+
+.markdown-body ul.no-list,
+.markdown-body ol.no-list {
+ padding: 0;
+ list-style-type: none;
+}
+
+.markdown-body ol[type="a s"] {
+ list-style-type: lower-alpha;
+}
+
+.markdown-body ol[type="A s"] {
+ list-style-type: upper-alpha;
+}
+
+.markdown-body ol[type="i s"] {
+ list-style-type: lower-roman;
+}
+
+.markdown-body ol[type="I s"] {
+ list-style-type: upper-roman;
+}
+
+.markdown-body ol[type="1"] {
+ list-style-type: decimal;
+}
+
+.markdown-body div>ol:not([type]) {
+ list-style-type: decimal;
+}
+
+.markdown-body ul ul,
+.markdown-body ul ol,
+.markdown-body ol ol,
+.markdown-body ol ul {
+ margin-top: 0;
+ margin-bottom: 0;
+}
+
+.markdown-body li>p {
+ margin-top: 16px;
+}
+
+.markdown-body li+li {
+ margin-top: .25em;
+}
+
+.markdown-body dl {
+ padding: 0;
+}
+
+.markdown-body dl dt {
+ padding: 0;
+ margin-top: 16px;
+ font-size: 1em;
+ font-style: italic;
+ font-weight: var(--base-text-weight-semibold, 600);
+}
+
+.markdown-body dl dd {
+ padding: 0 16px;
+ margin-bottom: 16px;
+}
+
+.markdown-body table th {
+ font-weight: var(--base-text-weight-semibold, 600);
+}
+
+.markdown-body table th,
+.markdown-body table td {
+ padding: 6px 13px;
+ border: 1px solid var(--color-border-default);
+}
+
+.markdown-body table td>:last-child {
+ margin-bottom: 0;
+}
+
+.markdown-body table tr {
+ background-color: var(--color-canvas-default);
+ border-top: 1px solid var(--color-border-muted);
+}
+
+.markdown-body table tr:nth-child(2n) {
+ background-color: var(--color-canvas-subtle);
+}
+
+.markdown-body table img {
+ background-color: transparent;
+}
+
+.markdown-body img[align=right] {
+ padding-left: 20px;
+}
+
+.markdown-body img[align=left] {
+ padding-right: 20px;
+}
+
+.markdown-body .emoji {
+ max-width: none;
+ vertical-align: text-top;
+ background-color: transparent;
+}
+
+.markdown-body span.frame {
+ display: block;
+ overflow: hidden;
+}
+
+.markdown-body span.frame>span {
+ display: block;
+ float: left;
+ width: auto;
+ padding: 7px;
+ margin: 13px 0 0;
+ overflow: hidden;
+ border: 1px solid var(--color-border-default);
+}
+
+.markdown-body span.frame span img {
+ display: block;
+ float: left;
+}
+
+.markdown-body span.frame span span {
+ display: block;
+ padding: 5px 0 0;
+ clear: both;
+ color: var(--color-fg-default);
+}
+
+.markdown-body span.align-center {
+ display: block;
+ overflow: hidden;
+ clear: both;
+}
+
+.markdown-body span.align-center>span {
+ display: block;
+ margin: 13px auto 0;
+ overflow: hidden;
+ text-align: center;
+}
+
+.markdown-body span.align-center span img {
+ margin: 0 auto;
+ text-align: center;
+}
+
+.markdown-body span.align-right {
+ display: block;
+ overflow: hidden;
+ clear: both;
+}
+
+.markdown-body span.align-right>span {
+ display: block;
+ margin: 13px 0 0;
+ overflow: hidden;
+ text-align: right;
+}
+
+.markdown-body span.align-right span img {
+ margin: 0;
+ text-align: right;
+}
+
+.markdown-body span.float-left {
+ display: block;
+ float: left;
+ margin-right: 13px;
+ overflow: hidden;
+}
+
+.markdown-body span.float-left span {
+ margin: 13px 0 0;
+}
+
+.markdown-body span.float-right {
+ display: block;
+ float: right;
+ margin-left: 13px;
+ overflow: hidden;
+}
+
+.markdown-body span.float-right>span {
+ display: block;
+ margin: 13px auto 0;
+ overflow: hidden;
+ text-align: right;
+}
+
+.markdown-body code,
+.markdown-body tt {
+ padding: .2em .4em;
+ margin: 0;
+ font-size: 85%;
+ white-space: break-spaces;
+ background-color: var(--color-neutral-muted);
+ border-radius: 6px;
+}
+
+.markdown-body code br,
+.markdown-body tt br {
+ display: none;
+}
+
+.markdown-body del code {
+ text-decoration: inherit;
+}
+
+.markdown-body samp {
+ font-size: 85%;
+}
+
+.markdown-body pre code {
+ font-size: 100%;
+}
+
+.markdown-body pre>code {
+ padding: 0;
+ margin: 0;
+ word-break: normal;
+ white-space: pre;
+ background: transparent;
+ border: 0;
+}
+
+.markdown-body .highlight {
+ margin-bottom: 16px;
+}
+
+.markdown-body .highlight pre {
+ margin-bottom: 0;
+ word-break: normal;
+}
+
+.markdown-body .highlight pre,
+.markdown-body pre {
+ padding: 16px;
+ overflow: auto;
+ font-size: 85%;
+ line-height: 1.45;
+ color: var(--color-fg-default);
+ background-color: var(--color-canvas-subtle);
+ border-radius: 6px;
+}
+
+.markdown-body pre code,
+.markdown-body pre tt {
+ display: inline;
+ max-width: auto;
+ padding: 0;
+ margin: 0;
+ overflow: visible;
+ line-height: inherit;
+ word-wrap: normal;
+ background-color: transparent;
+ border: 0;
+}
+
+.markdown-body .csv-data td,
+.markdown-body .csv-data th {
+ padding: 5px;
+ overflow: hidden;
+ font-size: 12px;
+ line-height: 1;
+ text-align: left;
+ white-space: nowrap;
+}
+
+.markdown-body .csv-data .blob-num {
+ padding: 10px 8px 9px;
+ text-align: right;
+ background: var(--color-canvas-default);
+ border: 0;
+}
+
+.markdown-body .csv-data tr {
+ border-top: 0;
+}
+
+.markdown-body .csv-data th {
+ font-weight: var(--base-text-weight-semibold, 600);
+ background: var(--color-canvas-subtle);
+ border-top: 0;
+}
+
+.markdown-body [data-footnote-ref]::before {
+ content: "[";
+}
+
+.markdown-body [data-footnote-ref]::after {
+ content: "]";
+}
+
+.markdown-body .footnotes {
+ font-size: 12px;
+ color: var(--color-fg-muted);
+ border-top: 1px solid var(--color-border-default);
+}
+
+.markdown-body .footnotes ol {
+ padding-left: 16px;
+}
+
+.markdown-body .footnotes ol ul {
+ display: inline-block;
+ padding-left: 16px;
+ margin-top: 16px;
+}
+
+.markdown-body .footnotes li {
+ position: relative;
+}
+
+.markdown-body .footnotes li:target::before {
+ position: absolute;
+ top: -8px;
+ right: -8px;
+ bottom: -8px;
+ left: -24px;
+ pointer-events: none;
+ content: "";
+ border: 2px solid var(--color-accent-emphasis);
+ border-radius: 6px;
+}
+
+.markdown-body .footnotes li:target {
+ color: var(--color-fg-default);
+}
+
+.markdown-body .footnotes .data-footnote-backref g-emoji {
+ font-family: monospace;
+}
+
+.markdown-body .pl-c {
+ color: var(--color-prettylights-syntax-comment);
+}
+
+.markdown-body .pl-c1,
+.markdown-body .pl-s .pl-v {
+ color: var(--color-prettylights-syntax-constant);
+}
+
+.markdown-body .pl-e,
+.markdown-body .pl-en {
+ color: var(--color-prettylights-syntax-entity);
+}
+
+.markdown-body .pl-smi,
+.markdown-body .pl-s .pl-s1 {
+ color: var(--color-prettylights-syntax-storage-modifier-import);
+}
+
+.markdown-body .pl-ent {
+ color: var(--color-prettylights-syntax-entity-tag);
+}
+
+.markdown-body .pl-k {
+ color: var(--color-prettylights-syntax-keyword);
+}
+
+.markdown-body .pl-s,
+.markdown-body .pl-pds,
+.markdown-body .pl-s .pl-pse .pl-s1,
+.markdown-body .pl-sr,
+.markdown-body .pl-sr .pl-cce,
+.markdown-body .pl-sr .pl-sre,
+.markdown-body .pl-sr .pl-sra {
+ color: var(--color-prettylights-syntax-string);
+}
+
+.markdown-body .pl-v,
+.markdown-body .pl-smw {
+ color: var(--color-prettylights-syntax-variable);
+}
+
+.markdown-body .pl-bu {
+ color: var(--color-prettylights-syntax-brackethighlighter-unmatched);
+}
+
+.markdown-body .pl-ii {
+ color: var(--color-prettylights-syntax-invalid-illegal-text);
+ background-color: var(--color-prettylights-syntax-invalid-illegal-bg);
+}
+
+.markdown-body .pl-c2 {
+ color: var(--color-prettylights-syntax-carriage-return-text);
+ background-color: var(--color-prettylights-syntax-carriage-return-bg);
+}
+
+.markdown-body .pl-sr .pl-cce {
+ font-weight: bold;
+ color: var(--color-prettylights-syntax-string-regexp);
+}
+
+.markdown-body .pl-ml {
+ color: var(--color-prettylights-syntax-markup-list);
+}
+
+.markdown-body .pl-mh,
+.markdown-body .pl-mh .pl-en,
+.markdown-body .pl-ms {
+ font-weight: bold;
+ color: var(--color-prettylights-syntax-markup-heading);
+}
+
+.markdown-body .pl-mi {
+ font-style: italic;
+ color: var(--color-prettylights-syntax-markup-italic);
+}
+
+.markdown-body .pl-mb {
+ font-weight: bold;
+ color: var(--color-prettylights-syntax-markup-bold);
+}
+
+.markdown-body .pl-md {
+ color: var(--color-prettylights-syntax-markup-deleted-text);
+ background-color: var(--color-prettylights-syntax-markup-deleted-bg);
+}
+
+.markdown-body .pl-mi1 {
+ color: var(--color-prettylights-syntax-markup-inserted-text);
+ background-color: var(--color-prettylights-syntax-markup-inserted-bg);
+}
+
+.markdown-body .pl-mc {
+ color: var(--color-prettylights-syntax-markup-changed-text);
+ background-color: var(--color-prettylights-syntax-markup-changed-bg);
+}
+
+.markdown-body .pl-mi2 {
+ color: var(--color-prettylights-syntax-markup-ignored-text);
+ background-color: var(--color-prettylights-syntax-markup-ignored-bg);
+}
+
+.markdown-body .pl-mdr {
+ font-weight: bold;
+ color: var(--color-prettylights-syntax-meta-diff-range);
+}
+
+.markdown-body .pl-ba {
+ color: var(--color-prettylights-syntax-brackethighlighter-angle);
+}
+
+.markdown-body .pl-sg {
+ color: var(--color-prettylights-syntax-sublimelinter-gutter-mark);
+}
+
+.markdown-body .pl-corl {
+ text-decoration: underline;
+ color: var(--color-prettylights-syntax-constant-other-reference-link);
+}
+
+.markdown-body g-emoji {
+ display: inline-block;
+ min-width: 1ch;
+ font-family: "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol";
+ font-size: 1em;
+ font-style: normal !important;
+ font-weight: var(--base-text-weight-normal, 400);
+ line-height: 1;
+ vertical-align: -0.075em;
+}
+
+.markdown-body g-emoji img {
+ width: 1em;
+ height: 1em;
+}
+
+.markdown-body .task-list-item {
+ list-style-type: none;
+}
+
+.markdown-body .task-list-item label {
+ font-weight: var(--base-text-weight-normal, 400);
+}
+
+.markdown-body .task-list-item.enabled label {
+ cursor: pointer;
+}
+
+.markdown-body .task-list-item+.task-list-item {
+ margin-top: 4px;
+}
+
+.markdown-body .task-list-item .handle {
+ display: none;
+}
+
+.markdown-body .task-list-item-checkbox {
+ margin: 0 .2em .25em -1.4em;
+ vertical-align: middle;
+}
+
+.markdown-body .contains-task-list:dir(rtl) .task-list-item-checkbox {
+ margin: 0 -1.6em .25em .2em;
+}
+
+.markdown-body .contains-task-list {
+ position: relative;
+}
+
+.markdown-body .contains-task-list:hover .task-list-item-convert-container,
+.markdown-body .contains-task-list:focus-within .task-list-item-convert-container {
+ display: block;
+ width: auto;
+ height: 24px;
+ overflow: visible;
+ clip: auto;
+}
+
+.markdown-body ::-webkit-calendar-picker-indicator {
+ filter: invert(50%);
+}
+
+.markdown-body .markdown-alert {
+ padding: var(--base-size-8) var(--base-size-16);
+ margin-bottom: 16px;
+ color: inherit;
+ border-left: .25em solid var(--color-border-default);
+}
+
+.markdown-body .markdown-alert>:first-child {
+ margin-top: 0;
+}
+
+.markdown-body .markdown-alert>:last-child {
+ margin-bottom: 0;
+}
+
+.markdown-body .markdown-alert .markdown-alert-title {
+ display: flex;
+ font-weight: var(--base-text-weight-medium, 500);
+ align-items: center;
+ line-height: 1;
+}
+
+.markdown-body .markdown-alert.markdown-alert-note {
+ border-left-color: var(--color-accent-emphasis);
+}
+
+.markdown-body .markdown-alert.markdown-alert-note .markdown-alert-title {
+ color: var(--color-accent-fg);
+}
+
+.markdown-body .markdown-alert.markdown-alert-important {
+ border-left-color: var(--color-done-emphasis);
+}
+
+.markdown-body .markdown-alert.markdown-alert-important .markdown-alert-title {
+ color: var(--color-done-fg);
+}
+
+.markdown-body .markdown-alert.markdown-alert-warning {
+ border-left-color: var(--color-attention-emphasis);
+}
+
+.markdown-body .markdown-alert.markdown-alert-warning .markdown-alert-title {
+ color: var(--color-attention-fg);
+}
+
+.markdown-body .markdown-alert.markdown-alert-tip {
+ border-left-color: var(--color-success-emphasis);
+}
+
+.markdown-body .markdown-alert.markdown-alert-tip .markdown-alert-title {
+ color: var(--color-success-fg);
+}
+
+.markdown-body .markdown-alert.markdown-alert-caution {
+ border-left-color: var(--color-danger-emphasis);
+}
+
+.markdown-body .markdown-alert.markdown-alert-caution .markdown-alert-title {
+ color: var(--color-danger-fg);
+}
\ No newline at end of file
diff --git a/docs/index.html b/docs/index.html
new file mode 100755
index 00000000..d1154b4d
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,1250 @@
+ Search Code By Comment
+
+ Document index of:
\ No newline at end of file
diff --git a/docs/metadata.json b/docs/metadata.json
new file mode 100644
index 00000000..0067396b
--- /dev/null
+++ b/docs/metadata.json
@@ -0,0 +1,205 @@
+{
+ "url": {
+ "full": "https://github.com/MarkFzp/act-plus-plus",
+ "partial": "MarkFzp/act-plus-plus"
+ },
+ "file_mapping": {
+ "0": {
+ "filepath": "/README.md",
+ "entry_id": 0,
+ "language_id": "plain-text"
+ },
+ "1": {
+ "filepath": "/__init__.py",
+ "entry_id": 10,
+ "language_id": "python"
+ },
+ "2": {
+ "filepath": "/align.py",
+ "entry_id": 14,
+ "language_id": "python"
+ },
+ "3": {
+ "filepath": "/commands.txt",
+ "entry_id": 20,
+ "language_id": "plain-text"
+ },
+ "4": {
+ "filepath": "/compress_data.py",
+ "entry_id": 66,
+ "language_id": "python"
+ },
+ "5": {
+ "filepath": "/conda_env.yaml",
+ "entry_id": 82,
+ "language_id": "yaml"
+ },
+ "6": {
+ "filepath": "/constants.py",
+ "entry_id": 86,
+ "language_id": "python"
+ },
+ "7": {
+ "filepath": "/detr/README.md",
+ "entry_id": 98,
+ "language_id": "plain-text"
+ },
+ "8": {
+ "filepath": "/detr/main.py",
+ "entry_id": 102,
+ "language_id": "python"
+ },
+ "9": {
+ "filepath": "/detr/models/__init__.py",
+ "entry_id": 118,
+ "language_id": "python"
+ },
+ "10": {
+ "filepath": "/detr/models/backbone.py",
+ "entry_id": 122,
+ "language_id": "python"
+ },
+ "11": {
+ "filepath": "/detr/models/detr_vae.py",
+ "entry_id": 134,
+ "language_id": "python"
+ },
+ "12": {
+ "filepath": "/detr/models/latent_model.py",
+ "entry_id": 164,
+ "language_id": "python"
+ },
+ "13": {
+ "filepath": "/detr/models/position_encoding.py",
+ "entry_id": 172,
+ "language_id": "python"
+ },
+ "14": {
+ "filepath": "/detr/models/transformer.py",
+ "entry_id": 182,
+ "language_id": "python"
+ },
+ "15": {
+ "filepath": "/detr/setup.py",
+ "entry_id": 210,
+ "language_id": "python"
+ },
+ "16": {
+ "filepath": "/detr/util/__init__.py",
+ "entry_id": 214,
+ "language_id": "python"
+ },
+ "17": {
+ "filepath": "/detr/util/box_ops.py",
+ "entry_id": 218,
+ "language_id": "python"
+ },
+ "18": {
+ "filepath": "/detr/util/misc.py",
+ "entry_id": 226,
+ "language_id": "python"
+ },
+ "19": {
+ "filepath": "/detr/util/plot_utils.py",
+ "entry_id": 258,
+ "language_id": "python"
+ },
+ "20": {
+ "filepath": "/dxl_test.py",
+ "entry_id": 270,
+ "language_id": "python"
+ },
+ "21": {
+ "filepath": "/dynamixel_client.py",
+ "entry_id": 274,
+ "language_id": "python"
+ },
+ "22": {
+ "filepath": "/ee_sim_env.py",
+ "entry_id": 320,
+ "language_id": "python"
+ },
+ "23": {
+ "filepath": "/imitate_episodes.py",
+ "entry_id": 348,
+ "language_id": "python"
+ },
+ "24": {
+ "filepath": "/policy.py",
+ "entry_id": 408,
+ "language_id": "python"
+ },
+ "25": {
+ "filepath": "/postprocess_episodes.py",
+ "entry_id": 432,
+ "language_id": "python"
+ },
+ "26": {
+ "filepath": "/record_sim_episodes.py",
+ "entry_id": 450,
+ "language_id": "python"
+ },
+ "27": {
+ "filepath": "/replay_episodes.py",
+ "entry_id": 468,
+ "language_id": "python"
+ },
+ "28": {
+ "filepath": "/scripted_policy.py",
+ "entry_id": 474,
+ "language_id": "python"
+ },
+ "29": {
+ "filepath": "/setup.py",
+ "entry_id": 496,
+ "language_id": "python"
+ },
+ "30": {
+ "filepath": "/sim_env.py",
+ "entry_id": 500,
+ "language_id": "python"
+ },
+ "31": {
+ "filepath": "/train_actuator_network.py",
+ "entry_id": 528,
+ "language_id": "python"
+ },
+ "32": {
+ "filepath": "/train_latent_model.py",
+ "entry_id": 562,
+ "language_id": "python"
+ },
+ "33": {
+ "filepath": "/truncate_data.py",
+ "entry_id": 602,
+ "language_id": "python"
+ },
+ "34": {
+ "filepath": "/utils.py",
+ "entry_id": 616,
+ "language_id": "python"
+ },
+ "35": {
+ "filepath": "/vinn_cache_feature.py",
+ "entry_id": 648,
+ "language_id": "python"
+ },
+ "36": {
+ "filepath": "/vinn_eval.py",
+ "entry_id": 662,
+ "language_id": "python"
+ },
+ "37": {
+ "filepath": "/vinn_select_k.py",
+ "entry_id": 690,
+ "language_id": "python"
+ },
+ "38": {
+ "filepath": "/visualize_episodes.py",
+ "entry_id": 702,
+ "language_id": "python"
+ }
+ },
+ "project_name": "act-plus-plus",
+ "split_count": 8
+}
\ No newline at end of file
diff --git a/docs/metadata_title.json b/docs/metadata_title.json
new file mode 100644
index 00000000..f885a7a9
--- /dev/null
+++ b/docs/metadata_title.json
@@ -0,0 +1 @@
+{"split_count": 2}
\ No newline at end of file
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
new file mode 100644
index 00000000..b2c4b38b
--- /dev/null
+++ b/docs/sitemap.xml
@@ -0,0 +1,247 @@
+
+
+
+
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/README.md
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/__init__.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/align.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/commands.txt
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/compress_data.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/conda_env.yaml
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/constants.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/README.md
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/main.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/models/__init__.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/models/backbone.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/models/detr_vae.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/models/latent_model.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/models/position_encoding.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/models/transformer.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/setup.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/util/__init__.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/util/box_ops.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/util/misc.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/detr/util/plot_utils.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/dxl_test.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/dynamixel_client.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/ee_sim_env.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/imitate_episodes.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/policy.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/postprocess_episodes.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/record_sim_episodes.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/replay_episodes.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/scripted_policy.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/setup.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/sim_env.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/train_actuator_network.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/train_latent_model.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/truncate_data.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/utils.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/vinn_cache_feature.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/vinn_eval.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/vinn_select_k.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus?q=/visualize_episodes.py
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
+ https://james4ever0.github.io/act-plus-plus/tree.html?full=true
+ 2023-12-28T09:21:02+00:00
+ 1.00
+
+
+
\ No newline at end of file
diff --git a/docs/src/README.md b/docs/src/README.md
new file mode 100644
index 00000000..1e67d242
--- /dev/null
+++ b/docs/src/README.md
@@ -0,0 +1,85 @@
+# Imitation Learning algorithms and Co-training for Mobile ALOHA
+
+
+#### Project Website: https://mobile-aloha.github.io/
+
+This repo contains the implementation of ACT, Diffusion Policy and VINN, together with 2 simulated environments:
+Transfer Cube and Bimanual Insertion. You can train and evaluate them in sim or real.
+For real, you would also need to install [Mobile ALOHA](https://github.com/MarkFzp/mobile-aloha). This repo is forked from the [ACT repo](https://github.com/tonyzhaozh/act).
+
+### Updates:
+You can find all scripted/human demo for simulated environments [here](https://drive.google.com/drive/folders/1gPR03v05S1xiInoVJn7G7VJ9pDCnxq9O?usp=share_link).
+
+
+### Repo Structure
+- ``imitate_episodes.py`` Train and Evaluate ACT
+- ``policy.py`` An adaptor for ACT policy
+- ``detr`` Model definitions of ACT, modified from DETR
+- ``sim_env.py`` Mujoco + DM_Control environments with joint space control
+- ``ee_sim_env.py`` Mujoco + DM_Control environments with EE space control
+- ``scripted_policy.py`` Scripted policies for sim environments
+- ``constants.py`` Constants shared across files
+- ``utils.py`` Utils such as data loading and helper functions
+- ``visualize_episodes.py`` Save videos from a .hdf5 dataset
+
+
+### Installation
+
+ conda create -n aloha python=3.8.10
+ conda activate aloha
+ pip install torchvision
+ pip install torch
+ pip install pyquaternion
+ pip install pyyaml
+ pip install rospkg
+ pip install pexpect
+ pip install mujoco==2.3.7
+ pip install dm_control==1.0.14
+ pip install opencv-python
+ pip install matplotlib
+ pip install einops
+ pip install packaging
+ pip install h5py
+ pip install ipython
+ cd act/detr && pip install -e .
+
+- also need to install https://github.com/ARISE-Initiative/robomimic/tree/r2d2 (note the r2d2 branch) for Diffusion Policy by `pip install -e .`
+
+### Example Usages
+
+To set up a new terminal, run:
+
+ conda activate aloha
+ cd <path to this repo>
+
+### Simulated experiments (LEGACY table-top ALOHA environments)
+
+We use ``sim_transfer_cube_scripted`` task in the examples below. Another option is ``sim_insertion_scripted``.
+To generate 50 episodes of scripted data, run:
+
+ python3 record_sim_episodes.py --task_name sim_transfer_cube_scripted --dataset_dir <data save dir> --num_episodes 50
+
+You can add the flag ``--onscreen_render`` to see real-time rendering.
+To visualize an episode after it has been collected, run
+
+ python3 visualize_episodes.py --dataset_dir <data save dir> --episode_idx 0
+
+Note: to visualize data from the mobile-aloha hardware, use the visualize_episodes.py from https://github.com/MarkFzp/mobile-aloha
+
+To train ACT:
+
+ # Transfer Cube task
+ python3 imitate_episodes.py --task_name sim_transfer_cube_scripted --ckpt_dir <ckpt dir> --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_epochs 2000 --lr 1e-5 --seed 0
+
+
+To evaluate the policy, run the same command but add ``--eval``. This loads the best validation checkpoint.
+The success rate should be around 90% for transfer cube, and around 50% for insertion.
+To enable temporal ensembling, add flag ``--temporal_agg``.
+Videos will be saved to ``<ckpt dir>`` for each rollout.
+You can also add ``--onscreen_render`` to see real-time rendering during evaluation.
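+
+For example, a sketch of an evaluation command for the Transfer Cube policy trained above, with temporal ensembling enabled (``<ckpt dir>`` is a placeholder for your checkpoint directory):
+
+ python3 imitate_episodes.py --task_name sim_transfer_cube_scripted --ckpt_dir <ckpt dir> --policy_class ACT --kl_weight 10 --chunk_size 100 --hidden_dim 512 --batch_size 8 --dim_feedforward 3200 --num_epochs 2000 --lr 1e-5 --seed 0 --eval --temporal_agg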
+
+For real-world data, which is harder to model, train for at least 5000 epochs, or roughly 3-4x the number of epochs it takes for the loss to plateau.
+Please refer to [tuning tips](https://docs.google.com/document/d/1FVIZfoALXg_ZkYKaYVh-qOlaXveq5CtvJHXkY25eYhs/edit?usp=sharing) for more info.
+
+### [ACT tuning tips](https://docs.google.com/document/d/1FVIZfoALXg_ZkYKaYVh-qOlaXveq5CtvJHXkY25eYhs/edit?usp=sharing)
+TL;DR: if your ACT policy is jerky or pauses in the middle of an episode, just train for longer! Success rate and smoothness can improve way after loss plateaus.
diff --git a/docs/src/__init__.py b/docs/src/__init__.py
new file mode 100644
index 00000000..6bf0c97a
--- /dev/null
+++ b/docs/src/__init__.py
@@ -0,0 +1 @@
+w
\ No newline at end of file
diff --git a/docs/src/align.py b/docs/src/align.py
new file mode 100644
index 00000000..8b7fdcb1
--- /dev/null
+++ b/docs/src/align.py
@@ -0,0 +1,31 @@
+from interbotix_xs_modules.arm import InterbotixManipulatorXS
+from aloha_scripts.robot_utils import move_arms, torque_on, move_grippers
+from constants import PUPPET_GRIPPER_JOINT_OPEN, PUPPET_GRIPPER_JOINT_CLOSE
+import argparse
+import numpy as np
+
+# for calibrating the head cam and checking that the two arms are symmetric
+
+def main():
+ argparser = argparse.ArgumentParser()
+ argparser.add_argument('--all', action='store_true', default=False)
+ args = argparser.parse_args()
+
+ puppet_bot_left = InterbotixManipulatorXS(robot_model="vx300s", group_name="arm", gripper_name="gripper", robot_name=f'puppet_left', init_node=True)
+ puppet_bot_right = InterbotixManipulatorXS(robot_model="vx300s", group_name="arm", gripper_name="gripper", robot_name=f'puppet_right', init_node=False)
+
+ all_bots = [puppet_bot_left, puppet_bot_right]
+ for bot in all_bots:
+ torque_on(bot)
+
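+ # mirror the left arm's sleep pose onto the right arm by negating the waist and forearm_roll joints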
+ multiplier = np.array([-1, 1, 1, -1, 1, 1])
+ puppet_sleep_position_left = np.array([-0.8, -0.5, 0.5, 0, 0.65, 0])
+ puppet_sleep_position_right = puppet_sleep_position_left * multiplier
+ all_positions = [puppet_sleep_position_left, puppet_sleep_position_right]
+ move_arms(all_bots, all_positions, move_time=2)
+
+ # move_grippers(all_bots, [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=1) # open
+
+
+if __name__ == '__main__':
+ main()
diff --git a/docs/src/commands.txt b/docs/src/commands.txt
new file mode 100644
index 00000000..7f66ccab
--- /dev/null
+++ b/docs/src/commands.txt
@@ -0,0 +1,530 @@
+
+conda activate mimic
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+python3 imitate_episodes.py \
+--task_name sim_transfer_cube_human \
+--ckpt_dir /scr/tonyzhao/train_logs/vq_test \
+--policy_class ACT --kl_weight 10 --chunk_size 100 \
+--hidden_dim 512 --batch_size 8 --dim_feedforward 3200 \
+--num_epochs 10000 --lr 1e-5 --seed 0 --vq
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name all \
+--ckpt_dir /scr/tonyzhao/train_logs/pretrain_all \
+--policy_class ACT --kl_weight 10 --chunk_size 50 \
+--hidden_dim 512 --batch_size 24 --dim_feedforward 3200 --num_epochs 5000 --lr 1e-4 --seed 0
+
+
+#### NOTE to reproduce this experiment, uncomment the sim data filtering in utils.py
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name all \
+--ckpt_dir /scr/tonyzhao/train_logs/pretrain_all \
+--policy_class ACT --kl_weight 10 --chunk_size 50 \
+--hidden_dim 512 --batch_size 24 --dim_feedforward 3200 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 10000000000 --validate_every 2000 --save_every 5000
+
+# generate mirrored data
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus
+python3 record_sim_episodes.py --task_name sim_transfer_cube_scripted_mirror --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --num_episodes 50
+python3 postprocess_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --num_episodes 50
+# the sim_transfer_cube_scripted_mirror will have 100 episodes
+# I then copy the whole dir to sim_transfer_cube_scripted then removed all mirrored episodes
+# this gives sim_transfer_cube_scripted_mirror (100 episodes) and sim_transfer_cube_scripted (50 episodes)
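+# hedged sketch of that copy/remove step (mirrored files follow the mirror_episode_*.hdf5 naming used below):
+# cp -r /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror /scr/tonyzhao/datasets/sim_transfer_cube_scripted
+# rm /scr/tonyzhao/datasets/sim_transfer_cube_scripted/mirror_episode_*.hdf5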
+
+# visualize the original data
+python3 visualize_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --episode_idx 0
+# visualize the artificially mirrored data
+python3 visualize_episodes.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror --episode_idx 0 --ismirror
+
+# sanity check
+# replay the mirrored data action in the original env
+python3 replay_episodes.py --dataset_path /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror/mirror_episode_0.hdf5
+# replay the original data action in the original env
+python3 replay_episodes.py --dataset_path /scr/tonyzhao/datasets/sim_transfer_cube_scripted_mirror/episode_0.hdf5
+
+
+# launch experiment on original data
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted \
+--policy_class ACT --kl_weight 10 --chunk_size 50 \
+--hidden_dim 512 --batch_size 12 --dim_feedforward 3200 --lr 1e-5 --seed 0 \
+--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000 --no_encoder
+
+
+# launch experiment on all data
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted_mirror \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_mirror \
+--policy_class ACT --kl_weight 10 --chunk_size 50 \
+--hidden_dim 512 --batch_size 12 --dim_feedforward 3200 --lr 1e-5 --seed 0 \
+--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000 --no_encoder
+
+
+####### DIFFUSION POLICY
+
+- first install https://github.com/ARISE-Initiative/robomimic/tree/r2d2 (note the r2d2 branch)
+- on top of it, pip install this repo's requirements (see the sketch below)
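+
+# hedged sketch of the robomimic install above (the clone path is an assumption):
+# git clone -b r2d2 https://github.com/ARISE-Initiative/robomimic.git
+# cd robomimic && pip install -e .
+# then install this repo's own dependencies on top (see the README's Installation section)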
+
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_0 \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-5 --seed 0 \
+--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000
+
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_1 \
+--policy_class Diffusion --chunk_size 16 \
+--batch_size 32 --lr 1e-5 --seed 0 \
+--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000
+
+
+# above are all 100 train diffusion steps, 1e-5
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_2_50step_1e-4 \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 100000 --eval_every 2000 --validate_every 2000 --save_every 2000
+
+# Dec 10
+
+######################## more diffusion ########################
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_3_chunk64 \
+--policy_class Diffusion --chunk_size 64 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 200000 --eval_every 4000 --validate_every 4000 --save_every 4000
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_4_regressionTest \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000
+
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_5_noEMA \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/cube_scripted_diffusion_sweep_6_noEMA_seed1 \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 1 \
+--num_steps 200000 --eval_every 6000 --validate_every 6000 --save_every 6000
+
+###### Diffusion Real ######
+
+## deploy
+python3 imitate_episodes.py --task_name aloha_mobile_wipe_wine --ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/wipe_wine_diffusion_augmentation_seed0/ --policy_class Diffusion --chunk_size 32 --batch_size 32 --lr 1e-4 --seed 0 --num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000 --eval
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_seed0 \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+## Cotrain
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine_cotrain \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_seed0 \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+# train no cotrain again with augmentations
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_augmentation_seed0 \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+## Cotrain with augmentations
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine_cotrain \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_seed0 \
+--policy_class Diffusion --chunk_size 32 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+# try chunk size 64, no cotrain
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_diffusion_augmentation_chunk64_seed0 \
+--policy_class Diffusion --chunk_size 64 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+# chunk 64 with cotrain
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine_cotrain \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_seed0 \
+--policy_class Diffusion --chunk_size 64 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+
+
+# chunk 64 with cotrain + EMA
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine_2_cotrain \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_ema_seed0 \
+--policy_class Diffusion --chunk_size 64 \
+--batch_size 32 --lr 1e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+
+# chunk 64 with cotrain + EMA + 3e-4
+
+conda activate mobile
+export MUJOCO_GL=egl
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 imitate_episodes.py \
+--task_name aloha_mobile_wipe_wine_2_cotrain \
+--ckpt_dir /scr/tonyzhao/train_logs/wipe_wine_cotrain_diffusion_augmentation_chunk64_ema_3e-4_seed0 \
+--policy_class Diffusion --chunk_size 64 \
+--batch_size 32 --lr 3e-4 --seed 0 \
+--num_steps 1000000 --eval_every 1000000 --validate_every 5000 --save_every 5000
+
+
+######################## VINN ########################
+
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+CUDA_VISIBLE_DEVICES=1 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name top --seed 0
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+CUDA_VISIBLE_DEVICES=0 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name left_wrist --seed 0
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+CUDA_VISIBLE_DEVICES=1 python3 train.py --dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted --cam_name right_wrist --seed 0
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=sim_transfer_cube_scripted
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt
+
+TASK_NAME=sim_transfer_cube_scripted
+python3 vinn_select_k.py \
+--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test
+
+python3 vinn_eval.py \
+--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \
+--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test \
+--task_name $TASK_NAME
+
+## TODO
+make sure env is consistent
+tune a bit more
+
+
+######################## VINN Real ########################
+
+### test backward compatibility
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name top --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name left_wrist --seed 0
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task sim_transfer_cube_scripted --cam_name right_wrist --seed 0
+
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=sim_transfer_cube_scripted
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt
+
+TASK_NAME=sim_transfer_cube_scripted
+python3 vinn_select_k.py \
+--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \
+--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test
+
+python3 vinn_eval.py \
+--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \
+--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-seed-0-test \
+--task_name $TASK_NAME
+
+### new data loader passed backward compatibility
+
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_high --seed 0
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_left_wrist --seed 0
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine --cam_name cam_right_wrist --seed 0
+
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_high --seed 0
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_left_wrist --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_right_wrist --seed 0
+
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_high --seed 0
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_left_wrist --seed 0
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan --cam_name cam_right_wrist --seed 0
+
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_high --seed 0
+#CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_left_wrist --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_right_wrist --seed 0
+
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wipe_wine_cotrain --cam_name cam_right_wrist --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_high --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_left_wrist --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated --cam_name cam_right_wrist --seed 0
+
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_wash_pan_cotrain --cam_name cam_right_wrist --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_high --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_left_wrist --seed 0
+CUDA_VISIBLE_DEVICES=1 python3 train.py --task aloha_mobile_elevator_truncated_cotrain --cam_name cam_right_wrist --seed 0
+
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=1
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_wipe_wine
+DATA_NAME=aloha_mobile_wipe_wine
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=1
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_wipe_wine_cotrain
+DATA_NAME=aloha_mobile_wipe_wine
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=1
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_wash_pan
+DATA_NAME=aloha_mobile_wash_pan
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=1
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_wash_pan_cotrain
+DATA_NAME=aloha_mobile_wash_pan
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=1
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_elevator_truncated
+DATA_NAME=aloha_mobile_elevator_truncated
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=1
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_elevator_truncated_cotrain
+DATA_NAME=aloha_mobile_elevator_truncated
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+
+
+# push chair task
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=0
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+python3 train.py --task aloha_mobile_chair_truncated --cam_name cam_high --seed 0
+python3 train.py --task aloha_mobile_chair_truncated --cam_name cam_left_wrist --seed 0
+python3 train.py --task aloha_mobile_chair_truncated --cam_name cam_right_wrist --seed 0
+
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_chair_truncated
+DATA_NAME=aloha_mobile_chair_truncated
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=1
+cd /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning
+python3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_high --seed 0
+python3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_left_wrist --seed 0
+python3 train.py --task aloha_mobile_chair_truncated_cotrain --cam_name cam_right_wrist --seed 0
+
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_chair_truncated_cotrain
+DATA_NAME=aloha_mobile_chair_truncated
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+
+# cache feature again for wipe wine
+
+conda activate mobile
+export CUDA_VISIBLE_DEVICES=0
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_wipe_wine
+DATA_NAME=aloha_mobile_wipe_wine
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+cd /home/tonyzhao/Research/act-plus-plus
+TASK_NAME=aloha_mobile_wipe_wine_cotrain
+DATA_NAME=aloha_mobile_wipe_wine
+python3 vinn_cache_feature.py --ckpt_path /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME}
+
+
+
+# run on real robot
+
+TASK_NAME=aloha_mobile_chair_truncated
+DATA_NAME=aloha_mobile_chair_truncated
+python3 vinn_select_k.py \
+--dataset_dir /scr/tonyzhao/mobile_aloha_datasets/${DATA_NAME} \
+--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0
+
+python3 vinn_eval.py \
+--dataset_dir /scr/tonyzhao/datasets/sim_transfer_cube_scripted \
+--model_dir /home/tonyzhao/Research/act-plus-plus/byol_pytorch/examples/lightning/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--ckpt_dir /scr/tonyzhao/train_logs/VINN-eval-${TASK_NAME}-seed-0 \
+--task_name $TASK_NAME
+
+
+
+# eval on real robot
+
+conda activate aloha
+cd /home/mobile-aloha/interbotix_ws/src/act
+TASK_NAME=aloha_mobile_wipe_wine
+python3 vinn_cache_feature.py --ckpt_path /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/byol-${TASK_NAME}-DUMMY-seed-0.pt
+
+
+TASK_NAME=aloha_mobile_wipe_wine
+python3 vinn_select_k.py \
+--dataset_dir /home/mobile-aloha/data/${TASK_NAME} \
+--ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/VINN-eval-seed-0-test \
+
+
+TASK_NAME=aloha_mobile_wipe_wine
+python3 vinn_eval.py \
+--dataset_dir /home/mobile-aloha/data/${TASK_NAME} \
+--model_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/byol-${TASK_NAME}-DUMMY-seed-0.pt \
+--ckpt_dir /home/mobile-aloha/interbotix_ws/src/act/ckpts/vinn_ckpts/VINN-eval-seed-0-test \
+--task_name $TASK_NAME
+
+
+---------------------------------------------------------------------------------------
+
+NOTE: chunk size cannot be any number, try before launching
+TODO: Add history, EMA at test time
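+
+# hedged sketch of "try before launching": a short dry run with the target chunk size (throwaway ckpt_dir, example chunk size)
+# CUDA_VISIBLE_DEVICES=0 python3 imitate_episodes.py \
+# --task_name sim_transfer_cube_scripted \
+# --ckpt_dir /tmp/chunk_size_check \
+# --policy_class Diffusion --chunk_size 48 \
+# --batch_size 32 --lr 1e-4 --seed 0 \
+# --num_steps 20 --eval_every 1000000 --validate_every 1000000 --save_every 1000000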
+
+conda activate mobile
+cd /home/tonyzhao/Research/act-plus-plus
+CUDA_VISIBLE_DEVICES=1 python3 train_actuator_network.py
+
+
+
diff --git a/docs/src/compress_data.py b/docs/src/compress_data.py
new file mode 100644
index 00000000..93a4989f
--- /dev/null
+++ b/docs/src/compress_data.py
@@ -0,0 +1,182 @@
+"""
+Example usage:
+$ python3 script/compress_data.py --dataset_dir /scr/lucyshi/dataset/aloha_test
+"""
+import os
+import h5py
+import cv2
+import numpy as np
+import argparse
+from tqdm import tqdm
+
+# Constants
+DT = 0.02
+JOINT_NAMES = ["waist", "shoulder", "elbow", "forearm_roll", "wrist_angle", "wrist_rotate"]
+STATE_NAMES = JOINT_NAMES + ["gripper"]
+
+
+def compress_dataset(input_dataset_path, output_dataset_path):
+ # Check if output path exists
+ if os.path.exists(output_dataset_path):
+ print(f"The file {output_dataset_path} already exists. Exiting...")
+ return
+
+ # Load the uncompressed dataset
+ with h5py.File(input_dataset_path, 'r') as infile:
+ # Create the compressed dataset
+ with h5py.File(output_dataset_path, 'w') as outfile:
+
+ outfile.attrs['sim'] = infile.attrs['sim']
+ outfile.attrs['compress'] = True
+
+ # Copy non-image data directly
+ for key in infile.keys():
+ if key != 'observations':
+ outfile.copy(infile[key], key)
+
+ obs_group = infile['observations']
+
+ # Create observation group in the output
+ out_obs_group = outfile.create_group('observations')
+
+ # Copy non-image data in observations directly
+ for key in obs_group.keys():
+ if key != 'images':
+ out_obs_group.copy(obs_group[key], key)
+
+ image_group = obs_group['images']
+ out_image_group = out_obs_group.create_group('images')
+
+ # JPEG compression parameters
+ encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50]
+
+ compressed_lens = [] # List to store compressed lengths for each camera
+
+ for cam_name in image_group.keys():
+ if "_depth" in cam_name: # Depth images are not compressed
+ out_image_group.copy(image_group[cam_name], cam_name)
+ else:
+ images = image_group[cam_name]
+ compressed_images = []
+ cam_compressed_lens = [] # List to store compressed lengths for this camera
+
+ # Compress each image
+ for image in images:
+ result, encoded_image = cv2.imencode('.jpg', image, encode_param)
+ compressed_images.append(encoded_image)
+ cam_compressed_lens.append(len(encoded_image)) # Store the length
+
+ compressed_lens.append(cam_compressed_lens)
+
+ # Find the maximum length of the compressed images
+ max_len = max(len(img) for img in compressed_images)
+
+ # Create dataset to store compressed images
+ compressed_dataset = out_image_group.create_dataset(cam_name, (len(compressed_images), max_len), dtype='uint8')
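+ # frames are zero-padded to max_len; the true byte counts are saved to 'compress_len' below so readers can trim before cv2.imdecode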
+
+ # Store compressed images
+ for i, img in enumerate(compressed_images):
+ compressed_dataset[i, :len(img)] = img
+
+ # Save the compressed lengths to the HDF5 file
+ compressed_lens = np.array(compressed_lens)
+ _ = outfile.create_dataset('compress_len', compressed_lens.shape)
+ outfile['/compress_len'][...] = compressed_lens
+
+ print(f"Compressed dataset saved to {output_dataset_path}")
+
+
+def save_videos(video, dt, video_path=None):
+ if isinstance(video, list):
+ cam_names = list(video[0].keys())
+ h, w, _ = video[0][cam_names[0]].shape
+ w = w * len(cam_names)
+ fps = int(1/dt)
+ out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
+ # bitrate = 1000000
+ # out.set(cv2.VIDEOWRITER_PROP_BITRATE, bitrate)
+ for ts, image_dict in enumerate(video):
+ images = []
+ for cam_name in cam_names:
+ image = image_dict[cam_name]
+ image = image[:, :, [2, 1, 0]] # swap B and R channel
+ images.append(image)
+ images = np.concatenate(images, axis=1)
+ out.write(images)
+ out.release()
+ print(f'Saved video to: {video_path}')
+ elif isinstance(video, dict):
+ cam_names = list(video.keys())
+ # Remove depth images
+ cam_names = [cam_name for cam_name in cam_names if '_depth' not in cam_name]
+ all_cam_videos = []
+ for cam_name in cam_names:
+ all_cam_videos.append(video[cam_name])
+ all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension
+
+ n_frames, h, w, _ = all_cam_videos.shape
+ fps = int(1 / dt)
+ out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
+ for t in range(n_frames):
+ image = all_cam_videos[t]
+ image = image[:, :, [2, 1, 0]] # swap B and R channel
+ out.write(image)
+ out.release()
+ print(f'Saved video to: {video_path}')
+
+
+def load_and_save_first_episode_video(dataset_dir, video_path):
+ dataset_name = 'episode_0'
+ _, _, _, _, image_dict = load_hdf5(dataset_dir, dataset_name)
+ save_videos(image_dict, DT, video_path=video_path)
+
+
+def load_hdf5(dataset_dir, dataset_name):
+ dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')
+ if not os.path.isfile(dataset_path):
+ print(f'Dataset does not exist at \n{dataset_path}\n')
+ exit()
+
+ with h5py.File(dataset_path, 'r') as root:
+ compressed = root.attrs.get('compress', False)
+ image_dict = dict()
+ for cam_name in root[f'/observations/images/'].keys():
+ image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]
+ if compressed:
+ compress_len = root['/compress_len'][()]
+
+ if compressed:
+ for cam_id, cam_name in enumerate(image_dict.keys()):
+ padded_compressed_image_list = image_dict[cam_name]
+ image_list = []
+ for frame_id, padded_compressed_image in enumerate(padded_compressed_image_list):
+ image_len = int(compress_len[cam_id, frame_id])
+ compressed_image = padded_compressed_image[:image_len] # trim the zero padding before decoding
+ image = cv2.imdecode(compressed_image, 1)
+ image_list.append(image)
+ image_dict[cam_name] = image_list
+
+ return None, None, None, None, image_dict # Return only the image dict for this application
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description="Compress all HDF5 datasets in a directory.")
+ parser.add_argument('--dataset_dir', action='store', type=str, required=True, help='Directory containing the uncompressed datasets.')
+
+ args = parser.parse_args()
+
+ output_dataset_dir = args.dataset_dir + '_compressed'
+ os.makedirs(output_dataset_dir, exist_ok=True)
+
+ # Iterate over each file in the directory
+ for filename in tqdm(os.listdir(args.dataset_dir), desc="Compressing data"):
+ if filename.endswith('.hdf5'):
+ input_path = os.path.join(args.dataset_dir, filename)
+ output_path = os.path.join(output_dataset_dir, filename)
+ compress_dataset(input_path, output_path)
+
+ # After processing all datasets, load and save the video for the first episode
+ print(f'Saving video for episode 0 in {output_dataset_dir}')
+ video_path = os.path.join(output_dataset_dir, 'episode_0_video.mp4')
+ load_and_save_first_episode_video(output_dataset_dir, video_path)
+
diff --git a/docs/src/conda_env.yaml b/docs/src/conda_env.yaml
new file mode 100644
index 00000000..0f44d6b0
--- /dev/null
+++ b/docs/src/conda_env.yaml
@@ -0,0 +1,23 @@
+name: aloha
+channels:
+ - pytorch
+ - nvidia
+ - conda-forge
+dependencies:
+ - python=3.9
+ - pip=23.0.1
+ - pytorch=2.0.0
+ - torchvision=0.15.0
+ - pytorch-cuda=11.8
+ - pyquaternion=0.9.9
+ - pyyaml=6.0
+ - rospkg=1.5.0
+ - pexpect=4.8.0
+ - mujoco=2.3.3
+ - dm_control=1.0.9
+ - py-opencv=4.7.0
+ - matplotlib=3.7.1
+ - einops=0.6.0
+ - packaging=23.0
+ - h5py=3.8.0
+ - ipython=8.12.0
diff --git a/docs/src/constants.py b/docs/src/constants.py
new file mode 100644
index 00000000..266a812f
--- /dev/null
+++ b/docs/src/constants.py
@@ -0,0 +1,100 @@
+import pathlib
+import os
+
+### Task parameters
+DATA_DIR = '/home/zfu/interbotix_ws/src/act/data' if os.getlogin() == 'zfu' else '/scr/tonyzhao/datasets'
+SIM_TASK_CONFIGS = {
+ 'sim_transfer_cube_scripted':{
+ 'dataset_dir': DATA_DIR + '/sim_transfer_cube_scripted',
+ 'num_episodes': 50,
+ 'episode_len': 400,
+ 'camera_names': ['top', 'left_wrist', 'right_wrist']
+ },
+
+ 'sim_transfer_cube_human':{
+ 'dataset_dir': DATA_DIR + '/sim_transfer_cube_human',
+ 'num_episodes': 50,
+ 'episode_len': 400,
+ 'camera_names': ['top']
+ },
+
+ 'sim_insertion_scripted': {
+ 'dataset_dir': DATA_DIR + '/sim_insertion_scripted',
+ 'num_episodes': 50,
+ 'episode_len': 400,
+ 'camera_names': ['top', 'left_wrist', 'right_wrist']
+ },
+
+ 'sim_insertion_human': {
+ 'dataset_dir': DATA_DIR + '/sim_insertion_human',
+ 'num_episodes': 50,
+ 'episode_len': 500,
+ 'camera_names': ['top']
+ },
+ 'all': {
+ 'dataset_dir': DATA_DIR + '/',
+ 'num_episodes': None,
+ 'episode_len': None,
+ 'name_filter': lambda n: 'sim' not in n,
+ 'camera_names': ['cam_high', 'cam_left_wrist', 'cam_right_wrist']
+ },
+
+ 'sim_transfer_cube_scripted_mirror':{
+ 'dataset_dir': DATA_DIR + '/sim_transfer_cube_scripted_mirror',
+ 'num_episodes': None,
+ 'episode_len': 400,
+ 'camera_names': ['top', 'left_wrist', 'right_wrist']
+ },
+
+ 'sim_insertion_scripted_mirror': {
+ 'dataset_dir': DATA_DIR + '/sim_insertion_scripted_mirror',
+ 'num_episodes': None,
+ 'episode_len': 400,
+ 'camera_names': ['top', 'left_wrist', 'right_wrist']
+ },
+
+}
+
+### Simulation envs fixed constants
+DT = 0.02
+FPS = 50
+JOINT_NAMES = ["waist", "shoulder", "elbow", "forearm_roll", "wrist_angle", "wrist_rotate"]
+START_ARM_POSE = [0, -0.96, 1.16, 0, -0.3, 0, 0.02239, -0.02239, 0, -0.96, 1.16, 0, -0.3, 0, 0.02239, -0.02239]
+
+XML_DIR = str(pathlib.Path(__file__).parent.resolve()) + '/assets/' # note: absolute path
+
+# Left finger position limits (qpos[7]), right_finger = -1 * left_finger
+MASTER_GRIPPER_POSITION_OPEN = 0.02417
+MASTER_GRIPPER_POSITION_CLOSE = 0.01244
+PUPPET_GRIPPER_POSITION_OPEN = 0.05800
+PUPPET_GRIPPER_POSITION_CLOSE = 0.01844
+
+# Gripper joint limits (qpos[6])
+MASTER_GRIPPER_JOINT_OPEN = -0.8
+MASTER_GRIPPER_JOINT_CLOSE = -1.65
+PUPPET_GRIPPER_JOINT_OPEN = 1.4910
+PUPPET_GRIPPER_JOINT_CLOSE = -0.6213
+
+############################ Helper functions ############################
+
+MASTER_GRIPPER_POSITION_NORMALIZE_FN = lambda x: (x - MASTER_GRIPPER_POSITION_CLOSE) / (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE)
+PUPPET_GRIPPER_POSITION_NORMALIZE_FN = lambda x: (x - PUPPET_GRIPPER_POSITION_CLOSE) / (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE)
+MASTER_GRIPPER_POSITION_UNNORMALIZE_FN = lambda x: x * (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE) + MASTER_GRIPPER_POSITION_CLOSE
+PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN = lambda x: x * (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE) + PUPPET_GRIPPER_POSITION_CLOSE
+MASTER2PUPPET_POSITION_FN = lambda x: PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(MASTER_GRIPPER_POSITION_NORMALIZE_FN(x))
+
+MASTER_GRIPPER_JOINT_NORMALIZE_FN = lambda x: (x - MASTER_GRIPPER_JOINT_CLOSE) / (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE)
+PUPPET_GRIPPER_JOINT_NORMALIZE_FN = lambda x: (x - PUPPET_GRIPPER_JOINT_CLOSE) / (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE)
+MASTER_GRIPPER_JOINT_UNNORMALIZE_FN = lambda x: x * (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE) + MASTER_GRIPPER_JOINT_CLOSE
+PUPPET_GRIPPER_JOINT_UNNORMALIZE_FN = lambda x: x * (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE) + PUPPET_GRIPPER_JOINT_CLOSE
+MASTER2PUPPET_JOINT_FN = lambda x: PUPPET_GRIPPER_JOINT_UNNORMALIZE_FN(MASTER_GRIPPER_JOINT_NORMALIZE_FN(x))
+
+MASTER_GRIPPER_VELOCITY_NORMALIZE_FN = lambda x: x / (MASTER_GRIPPER_POSITION_OPEN - MASTER_GRIPPER_POSITION_CLOSE)
+PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN = lambda x: x / (PUPPET_GRIPPER_POSITION_OPEN - PUPPET_GRIPPER_POSITION_CLOSE)
+
+MASTER_POS2JOINT = lambda x: MASTER_GRIPPER_POSITION_NORMALIZE_FN(x) * (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE) + MASTER_GRIPPER_JOINT_CLOSE
+MASTER_JOINT2POS = lambda x: MASTER_GRIPPER_POSITION_UNNORMALIZE_FN((x - MASTER_GRIPPER_JOINT_CLOSE) / (MASTER_GRIPPER_JOINT_OPEN - MASTER_GRIPPER_JOINT_CLOSE))
+PUPPET_POS2JOINT = lambda x: PUPPET_GRIPPER_POSITION_NORMALIZE_FN(x) * (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE) + PUPPET_GRIPPER_JOINT_CLOSE
+PUPPET_JOINT2POS = lambda x: PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN((x - PUPPET_GRIPPER_JOINT_CLOSE) / (PUPPET_GRIPPER_JOINT_OPEN - PUPPET_GRIPPER_JOINT_CLOSE))
+
+MASTER_GRIPPER_JOINT_MID = (MASTER_GRIPPER_JOINT_OPEN + MASTER_GRIPPER_JOINT_CLOSE)/2
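+
+# Example (illustrative variable name): map a master gripper joint reading (qpos[6]) to the matching puppet joint command
+# puppet_gripper_cmd = MASTER2PUPPET_JOINT_FN(master_qpos[6])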
diff --git a/docs/src/detr/README.md b/docs/src/detr/README.md
new file mode 100644
index 00000000..500b1b8d
--- /dev/null
+++ b/docs/src/detr/README.md
@@ -0,0 +1,9 @@
+This part of the codebase is modified from DETR https://github.com/facebookresearch/detr under APACHE 2.0.
+
+ @article{Carion2020EndtoEndOD,
+ title={End-to-End Object Detection with Transformers},
+ author={Nicolas Carion and Francisco Massa and Gabriel Synnaeve and Nicolas Usunier and Alexander Kirillov and Sergey Zagoruyko},
+ journal={ArXiv},
+ year={2020},
+ volume={abs/2005.12872}
+ }
\ No newline at end of file
diff --git a/docs/src/detr/main.py b/docs/src/detr/main.py
new file mode 100644
index 00000000..acf15109
--- /dev/null
+++ b/docs/src/detr/main.py
@@ -0,0 +1,130 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+import argparse
+from pathlib import Path
+
+import numpy as np
+import torch
+from .models import build_ACT_model, build_CNNMLP_model
+
+import IPython
+e = IPython.embed
+
+def get_args_parser():
+ parser = argparse.ArgumentParser('Set transformer detector', add_help=False)
+ parser.add_argument('--lr', default=1e-4, type=float) # will be overridden
+ parser.add_argument('--lr_backbone', default=1e-5, type=float) # will be overridden
+ parser.add_argument('--batch_size', default=2, type=int) # not used
+ parser.add_argument('--weight_decay', default=1e-4, type=float)
+ parser.add_argument('--epochs', default=300, type=int) # not used
+ parser.add_argument('--lr_drop', default=200, type=int) # not used
+ parser.add_argument('--clip_max_norm', default=0.1, type=float, # not used
+ help='gradient clipping max norm')
+
+ # Model parameters
+ # * Backbone
+ parser.add_argument('--backbone', default='resnet18', type=str, # will be overridden
+ help="Name of the convolutional backbone to use")
+ parser.add_argument('--dilation', action='store_true',
+ help="If true, we replace stride with dilation in the last convolutional block (DC5)")
+ parser.add_argument('--position_embedding', default='sine', type=str, choices=('sine', 'learned'),
+ help="Type of positional embedding to use on top of the image features")
+ parser.add_argument('--camera_names', default=[], type=list, # will be overridden
+ help="A list of camera names")
+
+ # * Transformer
+ parser.add_argument('--enc_layers', default=4, type=int, # will be overridden
+ help="Number of encoding layers in the transformer")
+ parser.add_argument('--dec_layers', default=6, type=int, # will be overridden
+ help="Number of decoding layers in the transformer")
+ parser.add_argument('--dim_feedforward', default=2048, type=int, # will be overridden
+ help="Intermediate size of the feedforward layers in the transformer blocks")
+ parser.add_argument('--hidden_dim', default=256, type=int, # will be overridden
+ help="Size of the embeddings (dimension of the transformer)")
+ parser.add_argument('--dropout', default=0.1, type=float,
+ help="Dropout applied in the transformer")
+ parser.add_argument('--nheads', default=8, type=int, # will be overridden
+ help="Number of attention heads inside the transformer's attentions")
+ parser.add_argument('--num_queries', default=400, type=int, # will be overridden
+ help="Number of query slots")
+ parser.add_argument('--pre_norm', action='store_true')
+
+ # * Segmentation
+ parser.add_argument('--masks', action='store_true',
+ help="Train segmentation head if the flag is provided")
+
+ # repeat args in imitate_episodes just to avoid error. Will not be used
+ parser.add_argument('--eval', action='store_true')
+ parser.add_argument('--onscreen_render', action='store_true')
+ parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)
+ parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)
+ parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)
+ parser.add_argument('--seed', action='store', type=int, help='seed', required=True)
+ parser.add_argument('--num_steps', action='store', type=int, help='num_steps', required=True)
+ parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)
+ parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)
+ parser.add_argument('--temporal_agg', action='store_true')
+
+ parser.add_argument('--use_vq', action='store_true')
+ parser.add_argument('--vq_class', action='store', type=int, help='vq_class', required=False)
+ parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim', required=False)
+ parser.add_argument('--load_pretrain', action='store_true', default=False)
+ parser.add_argument('--action_dim', action='store', type=int, required=False)
+ parser.add_argument('--eval_every', action='store', type=int, default=500, help='eval_every', required=False)
+ parser.add_argument('--validate_every', action='store', type=int, default=500, help='validate_every', required=False)
+ parser.add_argument('--save_every', action='store', type=int, default=500, help='save_every', required=False)
+ parser.add_argument('--resume_ckpt_path', action='store', type=str, help='load_ckpt_path', required=False)
+ parser.add_argument('--no_encoder', action='store_true')
+ parser.add_argument('--skip_mirrored_data', action='store_true')
+ parser.add_argument('--actuator_network_dir', action='store', type=str, help='actuator_network_dir', required=False)
+ parser.add_argument('--history_len', action='store', type=int)
+ parser.add_argument('--future_len', action='store', type=int)
+ parser.add_argument('--prediction_len', action='store', type=int)
+
+ return parser
+
+
+def build_ACT_model_and_optimizer(args_override):
+ parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])
+ args = parser.parse_args()
+
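+ # entries in args_override replace the CLI-parsed defaults (see the "will be overridden" notes in get_args_parser)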
+ for k, v in args_override.items():
+ setattr(args, k, v)
+
+ model = build_ACT_model(args)
+ model.cuda()
+
+ param_dicts = [
+ {"params": [p for n, p in model.named_parameters() if "backbone" not in n and p.requires_grad]},
+ {
+ "params": [p for n, p in model.named_parameters() if "backbone" in n and p.requires_grad],
+ "lr": args.lr_backbone,
+ },
+ ]
+ optimizer = torch.optim.AdamW(param_dicts, lr=args.lr,
+ weight_decay=args.weight_decay)
+
+ return model, optimizer
+
+
+def build_CNNMLP_model_and_optimizer(args_override):
+ parser = argparse.ArgumentParser('DETR training and evaluation script', parents=[get_args_parser()])
+ args = parser.parse_args()
+
+ for k, v in args_override.items():
+ setattr(args, k, v)
+
+ model = build_CNNMLP_model(args)
+ model.cuda()
+
+ param_dicts = [
+ {"params": [p for n, p in model.named_parameters() if "backbone" not in n and p.requires_grad]},
+ {
+ "params": [p for n, p in model.named_parameters() if "backbone" in n and p.requires_grad],
+ "lr": args.lr_backbone,
+ },
+ ]
+ optimizer = torch.optim.AdamW(param_dicts, lr=args.lr,
+ weight_decay=args.weight_decay)
+
+ return model, optimizer
+
diff --git a/docs/src/detr/models/__init__.py b/docs/src/detr/models/__init__.py
new file mode 100644
index 00000000..cc78db10
--- /dev/null
+++ b/docs/src/detr/models/__init__.py
@@ -0,0 +1,9 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+from .detr_vae import build as build_vae
+from .detr_vae import build_cnnmlp as build_cnnmlp
+
+def build_ACT_model(args):
+ return build_vae(args)
+
+def build_CNNMLP_model(args):
+ return build_cnnmlp(args)
\ No newline at end of file
diff --git a/docs/src/detr/models/backbone.py b/docs/src/detr/models/backbone.py
new file mode 100644
index 00000000..f28637ea
--- /dev/null
+++ b/docs/src/detr/models/backbone.py
@@ -0,0 +1,122 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+Backbone modules.
+"""
+from collections import OrderedDict
+
+import torch
+import torch.nn.functional as F
+import torchvision
+from torch import nn
+from torchvision.models._utils import IntermediateLayerGetter
+from typing import Dict, List
+
+from util.misc import NestedTensor, is_main_process
+
+from .position_encoding import build_position_encoding
+
+import IPython
+e = IPython.embed
+
+class FrozenBatchNorm2d(torch.nn.Module):
+ """
+ BatchNorm2d where the batch statistics and the affine parameters are fixed.
+
+ Copy-paste from torchvision.misc.ops with added eps before rsqrt,
+ without which any other policy_models than torchvision.policy_models.resnet[18,34,50,101]
+ produce nans.
+ """
+
+ def __init__(self, n):
+ super(FrozenBatchNorm2d, self).__init__()
+ self.register_buffer("weight", torch.ones(n))
+ self.register_buffer("bias", torch.zeros(n))
+ self.register_buffer("running_mean", torch.zeros(n))
+ self.register_buffer("running_var", torch.ones(n))
+
+ def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
+ missing_keys, unexpected_keys, error_msgs):
+ num_batches_tracked_key = prefix + 'num_batches_tracked'
+ if num_batches_tracked_key in state_dict:
+ del state_dict[num_batches_tracked_key]
+
+ super(FrozenBatchNorm2d, self)._load_from_state_dict(
+ state_dict, prefix, local_metadata, strict,
+ missing_keys, unexpected_keys, error_msgs)
+
+ def forward(self, x):
+ # move reshapes to the beginning
+ # to make it fuser-friendly
+ w = self.weight.reshape(1, -1, 1, 1)
+ b = self.bias.reshape(1, -1, 1, 1)
+ rv = self.running_var.reshape(1, -1, 1, 1)
+ rm = self.running_mean.reshape(1, -1, 1, 1)
+ eps = 1e-5
+ scale = w * (rv + eps).rsqrt()
+ bias = b - rm * scale
+ return x * scale + bias
+
+
+class BackboneBase(nn.Module):
+
+ def __init__(self, backbone: nn.Module, train_backbone: bool, num_channels: int, return_interm_layers: bool):
+ super().__init__()
+ # for name, parameter in backbone.named_parameters(): # only train later layers # TODO do we want this?
+ # if not train_backbone or 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:
+ # parameter.requires_grad_(False)
+ if return_interm_layers:
+ return_layers = {"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"}
+ else:
+ return_layers = {'layer4': "0"}
+ self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
+ self.num_channels = num_channels
+
+ def forward(self, tensor):
+ xs = self.body(tensor)
+ return xs
+ # out: Dict[str, NestedTensor] = {}
+ # for name, x in xs.items():
+ # m = tensor_list.mask
+ # assert m is not None
+ # mask = F.interpolate(m[None].float(), size=x.shape[-2:]).to(torch.bool)[0]
+ # out[name] = NestedTensor(x, mask)
+ # return out
+
+
+class Backbone(BackboneBase):
+ """ResNet backbone with frozen BatchNorm."""
+ def __init__(self, name: str,
+ train_backbone: bool,
+ return_interm_layers: bool,
+ dilation: bool):
+ backbone = getattr(torchvision.models, name)(
+ replace_stride_with_dilation=[False, False, dilation],
+ pretrained=is_main_process(), norm_layer=FrozenBatchNorm2d) # pretrained # TODO do we want frozen batch_norm??
+ num_channels = 512 if name in ('resnet18', 'resnet34') else 2048
+ super().__init__(backbone, train_backbone, num_channels, return_interm_layers)
+
+
+class Joiner(nn.Sequential):
+ def __init__(self, backbone, position_embedding):
+ super().__init__(backbone, position_embedding)
+
+ def forward(self, tensor_list: NestedTensor):
+ xs = self[0](tensor_list)
+ out: List[NestedTensor] = []
+ pos = []
+ for name, x in xs.items():
+ out.append(x)
+ # position encoding
+ pos.append(self[1](x).to(x.dtype))
+
+ return out, pos
+
+
+def build_backbone(args):
+ position_embedding = build_position_encoding(args)
+ train_backbone = args.lr_backbone > 0
+ return_interm_layers = args.masks
+ backbone = Backbone(args.backbone, train_backbone, return_interm_layers, args.dilation)
+ model = Joiner(backbone, position_embedding)
+ model.num_channels = backbone.num_channels
+ return model
diff --git a/docs/src/detr/models/detr_vae.py b/docs/src/detr/models/detr_vae.py
new file mode 100644
index 00000000..8c193529
--- /dev/null
+++ b/docs/src/detr/models/detr_vae.py
@@ -0,0 +1,326 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+DETR model and criterion classes.
+"""
+import torch
+from torch import nn
+from torch.autograd import Variable
+import torch.nn.functional as F
+from .backbone import build_backbone
+from .transformer import build_transformer, TransformerEncoder, TransformerEncoderLayer
+
+import numpy as np
+
+import IPython
+e = IPython.embed
+
+
+def reparametrize(mu, logvar):
+ std = logvar.div(2).exp()
+ eps = Variable(std.data.new(std.size()).normal_())
+ return mu + std * eps
+
+
+def get_sinusoid_encoding_table(n_position, d_hid):
+ def get_position_angle_vec(position):
+ return [position / np.power(10000, 2 * (hid_j // 2) / d_hid) for hid_j in range(d_hid)]
+
+ sinusoid_table = np.array([get_position_angle_vec(pos_i) for pos_i in range(n_position)])
+ sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i
+ sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1
+
+ return torch.FloatTensor(sinusoid_table).unsqueeze(0)
+
+
+class DETRVAE(nn.Module):
+ """ DETR-style CVAE used by ACT: encodes demonstrated action sequences into a latent and decodes a chunk of future actions """
+ def __init__(self, backbones, transformer, encoder, state_dim, num_queries, camera_names, vq, vq_class, vq_dim, action_dim):
+ """ Initializes the model.
+ Parameters:
+ backbones: torch module of the backbone to be used. See backbone.py
+ transformer: torch module of the transformer architecture. See transformer.py
+ state_dim: robot state dimension of the environment
+ num_queries: number of action queries, i.e. the length of the predicted action chunk (chunk_size)
+ camera_names: list of camera names whose images are encoded by the backbones
+ """
+ super().__init__()
+ self.num_queries = num_queries
+ self.camera_names = camera_names
+ self.transformer = transformer
+ self.encoder = encoder
+ self.vq, self.vq_class, self.vq_dim = vq, vq_class, vq_dim
+ self.state_dim, self.action_dim = state_dim, action_dim
+ hidden_dim = transformer.d_model
+ self.action_head = nn.Linear(hidden_dim, action_dim)
+ self.is_pad_head = nn.Linear(hidden_dim, 1)
+ self.query_embed = nn.Embedding(num_queries, hidden_dim)
+ if backbones is not None:
+ self.input_proj = nn.Conv2d(backbones[0].num_channels, hidden_dim, kernel_size=1)
+ self.backbones = nn.ModuleList(backbones)
+ self.input_proj_robot_state = nn.Linear(state_dim, hidden_dim)
+ else:
+ # input_dim = 14 + 7 # robot_state + env_state
+ self.input_proj_robot_state = nn.Linear(state_dim, hidden_dim)
+ self.input_proj_env_state = nn.Linear(7, hidden_dim)
+ self.pos = torch.nn.Embedding(2, hidden_dim)
+ self.backbones = None
+
+ # encoder extra parameters
+ self.latent_dim = 32 # final size of latent z # TODO tune
+ self.cls_embed = nn.Embedding(1, hidden_dim) # extra cls token embedding
+ self.encoder_action_proj = nn.Linear(action_dim, hidden_dim) # project action to embedding
+ self.encoder_joint_proj = nn.Linear(state_dim, hidden_dim) # project qpos to embedding
+
+ print(f'Use VQ: {self.vq}, {self.vq_class}, {self.vq_dim}')
+ if self.vq:
+ self.latent_proj = nn.Linear(hidden_dim, self.vq_class * self.vq_dim)
+ else:
+ self.latent_proj = nn.Linear(hidden_dim, self.latent_dim*2) # project hidden state to latent std, var
+ self.register_buffer('pos_table', get_sinusoid_encoding_table(1+1+num_queries, hidden_dim)) # [CLS], qpos, a_seq
+
+ # decoder extra parameters
+ if self.vq:
+ self.latent_out_proj = nn.Linear(self.vq_class * self.vq_dim, hidden_dim)
+ else:
+ self.latent_out_proj = nn.Linear(self.latent_dim, hidden_dim) # project latent sample to embedding
+ self.additional_pos_embed = nn.Embedding(2, hidden_dim) # learned position embedding for proprio and latent
+
+
+ def encode(self, qpos, actions=None, is_pad=None, vq_sample=None):
+ bs, _ = qpos.shape
+ if self.encoder is None:
+ latent_sample = torch.zeros([bs, self.latent_dim], dtype=torch.float32).to(qpos.device)
+ latent_input = self.latent_out_proj(latent_sample)
+ probs = binaries = mu = logvar = None
+ else:
+ # cvae encoder
+ is_training = actions is not None # train or val
+ ### Obtain latent z from action sequence
+ if is_training:
+ # project action sequence to embedding dim, and concat with a CLS token
+ action_embed = self.encoder_action_proj(actions) # (bs, seq, hidden_dim)
+ qpos_embed = self.encoder_joint_proj(qpos) # (bs, hidden_dim)
+ qpos_embed = torch.unsqueeze(qpos_embed, axis=1) # (bs, 1, hidden_dim)
+ cls_embed = self.cls_embed.weight # (1, hidden_dim)
+ cls_embed = torch.unsqueeze(cls_embed, axis=0).repeat(bs, 1, 1) # (bs, 1, hidden_dim)
+ encoder_input = torch.cat([cls_embed, qpos_embed, action_embed], axis=1) # (bs, seq+1, hidden_dim)
+ encoder_input = encoder_input.permute(1, 0, 2) # (seq+1, bs, hidden_dim)
+ # do not mask cls token
+ cls_joint_is_pad = torch.full((bs, 2), False).to(qpos.device) # False: not a padding
+ is_pad = torch.cat([cls_joint_is_pad, is_pad], axis=1) # (bs, seq+1)
+ # obtain position embedding
+ pos_embed = self.pos_table.clone().detach()
+ pos_embed = pos_embed.permute(1, 0, 2) # (seq+1, 1, hidden_dim)
+ # query model
+ encoder_output = self.encoder(encoder_input, pos=pos_embed, src_key_padding_mask=is_pad)
+ encoder_output = encoder_output[0] # take cls output only
+ latent_info = self.latent_proj(encoder_output)
+
+ if self.vq:
+ logits = latent_info.reshape([*latent_info.shape[:-1], self.vq_class, self.vq_dim])
+ probs = torch.softmax(logits, dim=-1)
+ binaries = F.one_hot(torch.multinomial(probs.view(-1, self.vq_dim), 1).squeeze(-1), self.vq_dim).view(-1, self.vq_class, self.vq_dim).float()
+ binaries_flat = binaries.view(-1, self.vq_class * self.vq_dim)
+ probs_flat = probs.view(-1, self.vq_class * self.vq_dim)
+ straight_through = binaries_flat - probs_flat.detach() + probs_flat
+ latent_input = self.latent_out_proj(straight_through)
+ mu = logvar = None
+ else:
+ probs = binaries = None
+ mu = latent_info[:, :self.latent_dim]
+ logvar = latent_info[:, self.latent_dim:]
+ latent_sample = reparametrize(mu, logvar)
+ latent_input = self.latent_out_proj(latent_sample)
+
+ else:
+ mu = logvar = binaries = probs = None
+ if self.vq:
+ latent_input = self.latent_out_proj(vq_sample.view(-1, self.vq_class * self.vq_dim))
+ else:
+ latent_sample = torch.zeros([bs, self.latent_dim], dtype=torch.float32).to(qpos.device)
+ latent_input = self.latent_out_proj(latent_sample)
+
+ return latent_input, probs, binaries, mu, logvar
+
+ def forward(self, qpos, image, env_state, actions=None, is_pad=None, vq_sample=None):
+ """
+ qpos: batch, qpos_dim
+ image: batch, num_cam, channel, height, width
+ env_state: None
+ actions: batch, seq, action_dim
+ """
+ latent_input, probs, binaries, mu, logvar = self.encode(qpos, actions, is_pad, vq_sample)
+
+ # cvae decoder
+ if self.backbones is not None:
+ # Image observation features and position embeddings
+ all_cam_features = []
+ all_cam_pos = []
+ for cam_id, cam_name in enumerate(self.camera_names):
+ features, pos = self.backbones[cam_id](image[:, cam_id])
+ features = features[0] # take the last layer feature
+ pos = pos[0]
+ all_cam_features.append(self.input_proj(features))
+ all_cam_pos.append(pos)
+ # proprioception features
+ proprio_input = self.input_proj_robot_state(qpos)
+ # fold camera dimension into width dimension
+ src = torch.cat(all_cam_features, axis=3)
+ pos = torch.cat(all_cam_pos, axis=3)
+ hs = self.transformer(src, None, self.query_embed.weight, pos, latent_input, proprio_input, self.additional_pos_embed.weight)[0]
+ else:
+ qpos = self.input_proj_robot_state(qpos)
+ env_state = self.input_proj_env_state(env_state)
+ transformer_input = torch.cat([qpos, env_state], axis=1) # seq length = 2
+ hs = self.transformer(transformer_input, None, self.query_embed.weight, self.pos.weight)[0]
+ a_hat = self.action_head(hs)
+ is_pad_hat = self.is_pad_head(hs)
+ return a_hat, is_pad_hat, [mu, logvar], probs, binaries
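+
+ # Shape notes (illustrative, based on the code above): `hs` has shape
+ # (bs, num_queries, hidden_dim) after indexing [0] into the transformer output,
+ # so `is_pad_hat` is (bs, num_queries, 1) and `a_hat` is (bs, num_queries, action_dim),
+ # assuming `action_head` projects hidden_dim -> action_dim as set up in the constructor.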
+
+
+
+class CNNMLP(nn.Module):
+ def __init__(self, backbones, state_dim, camera_names):
+ """ Initializes the model.
+ Parameters:
+ backbones: torch module of the backbone to be used. See backbone.py
+ transformer: torch module of the transformer architecture. See transformer.py
+ state_dim: robot state dimension of the environment
+ num_queries: number of object queries, ie detection slot. This is the maximal number of objects
+ DETR can detect in a single image. For COCO, we recommend 100 queries.
+ aux_loss: True if auxiliary decoding losses (loss at each decoder layer) are to be used.
+ """
+ super().__init__()
+ self.camera_names = camera_names
+ self.action_head = nn.Linear(1000, state_dim) # TODO add more
+ if backbones is not None:
+ self.backbones = nn.ModuleList(backbones)
+ backbone_down_projs = []
+ for backbone in backbones:
+ down_proj = nn.Sequential(
+ nn.Conv2d(backbone.num_channels, 128, kernel_size=5),
+ nn.Conv2d(128, 64, kernel_size=5),
+ nn.Conv2d(64, 32, kernel_size=5)
+ )
+ backbone_down_projs.append(down_proj)
+ self.backbone_down_projs = nn.ModuleList(backbone_down_projs)
+
+ mlp_in_dim = 768 * len(backbones) + state_dim
+ self.mlp = mlp(input_dim=mlp_in_dim, hidden_dim=1024, output_dim=state_dim, hidden_depth=2) # actions share the robot state dimension here
+ else:
+ raise NotImplementedError
+
+ def forward(self, qpos, image, env_state, actions=None):
+ """
+ qpos: batch, qpos_dim
+ image: batch, num_cam, channel, height, width
+ env_state: None
+ actions: batch, seq, action_dim
+ """
+ is_training = actions is not None # train or val
+ bs, _ = qpos.shape
+ # Image observation features and position embeddings
+ all_cam_features = []
+ for cam_id, cam_name in enumerate(self.camera_names):
+ features, pos = self.backbones[cam_id](image[:, cam_id])
+ features = features[0] # take the last layer feature
+ pos = pos[0] # not used
+ all_cam_features.append(self.backbone_down_projs[cam_id](features))
+ # flatten everything
+ flattened_features = []
+ for cam_feature in all_cam_features:
+ flattened_features.append(cam_feature.reshape([bs, -1]))
+ flattened_features = torch.cat(flattened_features, axis=1) # 768 each
+ features = torch.cat([flattened_features, qpos], axis=1) # qpos: 14
+ a_hat = self.mlp(features)
+ return a_hat
+
+
+def mlp(input_dim, hidden_dim, output_dim, hidden_depth):
+ if hidden_depth == 0:
+ mods = [nn.Linear(input_dim, output_dim)]
+ else:
+ mods = [nn.Linear(input_dim, hidden_dim), nn.ReLU(inplace=True)]
+ for i in range(hidden_depth - 1):
+ mods += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True)]
+ mods.append(nn.Linear(hidden_dim, output_dim))
+ trunk = nn.Sequential(*mods)
+ return trunk
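+
+ # Minimal usage sketch of the helper above (hypothetical dimensions); would run if uncommented:
+ # net = mlp(input_dim=768 + 14, hidden_dim=1024, output_dim=14, hidden_depth=2)
+ # out = net(torch.randn(8, 768 + 14)) # -> (8, 14)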
+
+
+def build_encoder(args):
+ d_model = args.hidden_dim # 256
+ dropout = args.dropout # 0.1
+ nhead = args.nheads # 8
+ dim_feedforward = args.dim_feedforward # 2048
+ num_encoder_layers = args.enc_layers # 4 # TODO shared with VAE decoder
+ normalize_before = args.pre_norm # False
+ activation = "relu"
+
+ encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward,
+ dropout, activation, normalize_before)
+ encoder_norm = nn.LayerNorm(d_model) if normalize_before else None
+ encoder = TransformerEncoder(encoder_layer, num_encoder_layers, encoder_norm)
+
+ return encoder
+
+
+def build(args):
+ state_dim = 14 # TODO hardcode
+
+ # From state
+ # backbone = None # from state for now, no need for conv nets
+ # From image
+ backbones = []
+ for _ in args.camera_names:
+ backbone = build_backbone(args)
+ backbones.append(backbone)
+
+ transformer = build_transformer(args)
+
+ if args.no_encoder:
+ encoder = None
+ else:
+ encoder = build_transformer(args)
+
+ model = DETRVAE(
+ backbones,
+ transformer,
+ encoder,
+ state_dim=state_dim,
+ num_queries=args.num_queries,
+ camera_names=args.camera_names,
+ vq=args.vq,
+ vq_class=args.vq_class,
+ vq_dim=args.vq_dim,
+ action_dim=args.action_dim,
+ )
+
+ n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)
+ print("number of parameters: %.2fM" % (n_parameters/1e6,))
+
+ return model
+
+def build_cnnmlp(args):
+ state_dim = 14 # TODO hardcode
+
+ # From state
+ # backbone = None # from state for now, no need for conv nets
+ # From image
+ backbones = []
+ for _ in args.camera_names:
+ backbone = build_backbone(args)
+ backbones.append(backbone)
+
+ model = CNNMLP(
+ backbones,
+ state_dim=state_dim,
+ camera_names=args.camera_names,
+ )
+
+ n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)
+ print("number of parameters: %.2fM" % (n_parameters/1e6,))
+
+ return model
+
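+
+ # For reference, `reparametrize(mu, logvar)` used in DETRVAE.encode() is the standard
+ # VAE reparameterization trick; a minimal equivalent sketch (assumed - the actual helper
+ # is defined or imported earlier in this file):
+ #
+ # def reparametrize(mu, logvar):
+ # std = (0.5 * logvar).exp()
+ # eps = torch.randn_like(std)
+ # return mu + eps * std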
diff --git a/docs/src/detr/models/latent_model.py b/docs/src/detr/models/latent_model.py
new file mode 100644
index 00000000..a6dfe050
--- /dev/null
+++ b/docs/src/detr/models/latent_model.py
@@ -0,0 +1,73 @@
+import torch.nn as nn
+from torch.nn import functional as F
+import torch
+
+DROPOUT_RATE = 0.1
+
+# a causal transformer block
+class Causal_Transformer_Block(nn.Module):
+ def __init__(self, seq_len, latent_dim, num_head) -> None:
+ super().__init__()
+ self.num_head = num_head
+ self.latent_dim = latent_dim
+ self.ln_1 = nn.LayerNorm(latent_dim)
+ self.attn = nn.MultiheadAttention(latent_dim, num_head, dropout=DROPOUT_RATE, batch_first=True)
+ self.ln_2 = nn.LayerNorm(latent_dim)
+ self.mlp = nn.Sequential(
+ nn.Linear(latent_dim, 4 * latent_dim),
+ nn.GELU(),
+ nn.Linear(4 * latent_dim, latent_dim),
+ nn.Dropout(DROPOUT_RATE),
+ )
+
+ # self.register_buffer("attn_mask", torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool())
+
+ def forward(self, x):
+ attn_mask = torch.triu(torch.ones(x.shape[1], x.shape[1], device=x.device, dtype=torch.bool), diagonal=1)
+ x = self.ln_1(x)
+ x = x + self.attn(x, x, x, attn_mask=attn_mask)[0]
+ x = self.ln_2(x)
+ x = x + self.mlp(x)
+
+ return x
+
+# use self-attention instead of RNN to model the latent space sequence
+class Latent_Model_Transformer(nn.Module):
+ def __init__(self, input_dim, output_dim, seq_len, latent_dim=256, num_head=8, num_layer=3) -> None:
+ super().__init__()
+ self.input_dim = input_dim
+ self.output_dim = output_dim
+ self.seq_len = seq_len
+ self.latent_dim = latent_dim
+ self.num_head = num_head
+ self.num_layer = num_layer
+ self.input_layer = nn.Linear(input_dim, latent_dim)
+ self.weight_pos_embed = nn.Embedding(seq_len, latent_dim)
+ self.attention_blocks = nn.Sequential(
+ nn.Dropout(DROPOUT_RATE),
+ *[Causal_Transformer_Block(seq_len, latent_dim, num_head) for _ in range(num_layer)],
+ nn.LayerNorm(latent_dim)
+ )
+ self.output_layer = nn.Linear(latent_dim, output_dim)
+
+ def forward(self, x):
+ x = self.input_layer(x)
+ x = x + self.weight_pos_embed(torch.arange(x.shape[1], device=x.device))
+ x = self.attention_blocks(x)
+ logits = self.output_layer(x)
+
+ return logits
+
+ @torch.no_grad()
+ def generate(self, n, temperature=0.1, x=None):
+ if x is None:
+ x = torch.zeros((n, 1, self.input_dim), device=self.weight_pos_embed.weight.device)
+ for i in range(self.seq_len):
+ logits = self.forward(x)[:, -1]
+ probs = torch.softmax(logits / temperature, dim=-1)
+ samples = torch.multinomial(probs, num_samples=1)[..., 0]
+ samples_one_hot = F.one_hot(samples.long(), num_classes=self.output_dim).float()
+ x = torch.cat([x, samples_one_hot[:, None, :]], dim=1)
+
+ return x[:, 1:, :]
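+
+ # Usage sketch (hypothetical sizes): generate() autoregressively samples one-hot codes
+ # and feeds them back in, so it expects input_dim == output_dim; would run if uncommented:
+ #
+ # prior = Latent_Model_Transformer(input_dim=32, output_dim=32, seq_len=8)
+ # codes = prior.generate(n=4, temperature=0.1) # -> (4, 8, 32) one-hot samples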
+
diff --git a/docs/src/detr/models/position_encoding.py b/docs/src/detr/models/position_encoding.py
new file mode 100644
index 00000000..209d9171
--- /dev/null
+++ b/docs/src/detr/models/position_encoding.py
@@ -0,0 +1,93 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+Various positional encodings for the transformer.
+"""
+import math
+import torch
+from torch import nn
+
+from util.misc import NestedTensor
+
+import IPython
+e = IPython.embed
+
+class PositionEmbeddingSine(nn.Module):
+ """
+ This is a more standard version of the position embedding, very similar to the one
+ used by the Attention is all you need paper, generalized to work on images.
+ """
+ def __init__(self, num_pos_feats=64, temperature=10000, normalize=False, scale=None):
+ super().__init__()
+ self.num_pos_feats = num_pos_feats
+ self.temperature = temperature
+ self.normalize = normalize
+ if scale is not None and normalize is False:
+ raise ValueError("normalize should be True if scale is passed")
+ if scale is None:
+ scale = 2 * math.pi
+ self.scale = scale
+
+ def forward(self, tensor):
+ x = tensor
+ # mask = tensor_list.mask
+ # assert mask is not None
+ # not_mask = ~mask
+
+ not_mask = torch.ones_like(x[0, [0]])
+ y_embed = not_mask.cumsum(1, dtype=torch.float32)
+ x_embed = not_mask.cumsum(2, dtype=torch.float32)
+ if self.normalize:
+ eps = 1e-6
+ y_embed = y_embed / (y_embed[:, -1:, :] + eps) * self.scale
+ x_embed = x_embed / (x_embed[:, :, -1:] + eps) * self.scale
+
+ dim_t = torch.arange(self.num_pos_feats, dtype=torch.float32, device=x.device)
+ dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
+
+ pos_x = x_embed[:, :, :, None] / dim_t
+ pos_y = y_embed[:, :, :, None] / dim_t
+ pos_x = torch.stack((pos_x[:, :, :, 0::2].sin(), pos_x[:, :, :, 1::2].cos()), dim=4).flatten(3)
+ pos_y = torch.stack((pos_y[:, :, :, 0::2].sin(), pos_y[:, :, :, 1::2].cos()), dim=4).flatten(3)
+ pos = torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)
+ return pos
+
+
+class PositionEmbeddingLearned(nn.Module):
+ """
+ Absolute pos embedding, learned.
+ """
+ def __init__(self, num_pos_feats=256):
+ super().__init__()
+ self.row_embed = nn.Embedding(50, num_pos_feats)
+ self.col_embed = nn.Embedding(50, num_pos_feats)
+ self.reset_parameters()
+
+ def reset_parameters(self):
+ nn.init.uniform_(self.row_embed.weight)
+ nn.init.uniform_(self.col_embed.weight)
+
+ def forward(self, tensor_list: NestedTensor):
+ x = tensor_list.tensors
+ h, w = x.shape[-2:]
+ i = torch.arange(w, device=x.device)
+ j = torch.arange(h, device=x.device)
+ x_emb = self.col_embed(i)
+ y_emb = self.row_embed(j)
+ pos = torch.cat([
+ x_emb.unsqueeze(0).repeat(h, 1, 1),
+ y_emb.unsqueeze(1).repeat(1, w, 1),
+ ], dim=-1).permute(2, 0, 1).unsqueeze(0).repeat(x.shape[0], 1, 1, 1)
+ return pos
+
+
+def build_position_encoding(args):
+ N_steps = args.hidden_dim // 2
+ if args.position_embedding in ('v2', 'sine'):
+ # TODO find a better way of exposing other arguments
+ position_embedding = PositionEmbeddingSine(N_steps, normalize=True)
+ elif args.position_embedding in ('v3', 'learned'):
+ position_embedding = PositionEmbeddingLearned(N_steps)
+ else:
+ raise ValueError(f"not supported {args.position_embedding}")
+
+ return position_embedding
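+
+ # Shape sketch (illustrative): with args.hidden_dim = 256 the sine variant maps a
+ # feature map of shape (B, C, H, W) to a positional tensor of shape (1, 256, H, W),
+ # independent of the batch size:
+ #
+ # pe = PositionEmbeddingSine(num_pos_feats=128, normalize=True)
+ # pos = pe(torch.zeros(2, 512, 15, 20)) # -> (1, 256, 15, 20)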
diff --git a/docs/src/detr/models/transformer.py b/docs/src/detr/models/transformer.py
new file mode 100644
index 00000000..f38afd0e
--- /dev/null
+++ b/docs/src/detr/models/transformer.py
@@ -0,0 +1,314 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+DETR Transformer class.
+
+Copy-paste from torch.nn.Transformer with modifications:
+ * positional encodings are passed in MHattention
+ * extra LN at the end of encoder is removed
+ * decoder returns a stack of activations from all decoding layers
+"""
+import copy
+from typing import Optional, List
+
+import torch
+import torch.nn.functional as F
+from torch import nn, Tensor
+
+import IPython
+e = IPython.embed
+
+class Transformer(nn.Module):
+
+ def __init__(self, d_model=512, nhead=8, num_encoder_layers=6,
+ num_decoder_layers=6, dim_feedforward=2048, dropout=0.1,
+ activation="relu", normalize_before=False,
+ return_intermediate_dec=False):
+ super().__init__()
+
+ encoder_layer = TransformerEncoderLayer(d_model, nhead, dim_feedforward,
+ dropout, activation, normalize_before)
+ encoder_norm = nn.LayerNorm(d_model) if normalize_before else None
+ self.encoder = TransformerEncoder(encoder_layer, num_encoder_layers, encoder_norm)
+
+ decoder_layer = TransformerDecoderLayer(d_model, nhead, dim_feedforward,
+ dropout, activation, normalize_before)
+ decoder_norm = nn.LayerNorm(d_model)
+ self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm,
+ return_intermediate=return_intermediate_dec)
+
+ self._reset_parameters()
+
+ self.d_model = d_model
+ self.nhead = nhead
+
+ def _reset_parameters(self):
+ for p in self.parameters():
+ if p.dim() > 1:
+ nn.init.xavier_uniform_(p)
+
+ def forward(self, src, mask, query_embed, pos_embed, latent_input=None, proprio_input=None, additional_pos_embed=None):
+ # TODO flatten only when input has H and W
+ if len(src.shape) == 4: # has H and W
+ # flatten NxCxHxW to HWxNxC
+ bs, c, h, w = src.shape
+ src = src.flatten(2).permute(2, 0, 1)
+ pos_embed = pos_embed.flatten(2).permute(2, 0, 1).repeat(1, bs, 1)
+ query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)
+ # mask = mask.flatten(1)
+
+ additional_pos_embed = additional_pos_embed.unsqueeze(1).repeat(1, bs, 1) # seq, bs, dim
+ pos_embed = torch.cat([additional_pos_embed, pos_embed], axis=0)
+
+ addition_input = torch.stack([latent_input, proprio_input], axis=0)
+ src = torch.cat([addition_input, src], axis=0)
+ else:
+ assert len(src.shape) == 3
+ # flatten NxHWxC to HWxNxC
+ bs, hw, c = src.shape
+ src = src.permute(1, 0, 2)
+ pos_embed = pos_embed.unsqueeze(1).repeat(1, bs, 1)
+ query_embed = query_embed.unsqueeze(1).repeat(1, bs, 1)
+
+ tgt = torch.zeros_like(query_embed)
+ memory = self.encoder(src, src_key_padding_mask=mask, pos=pos_embed)
+ hs = self.decoder(tgt, memory, memory_key_padding_mask=mask,
+ pos=pos_embed, query_pos=query_embed)
+ hs = hs.transpose(1, 2)
+ return hs
+
+class TransformerEncoder(nn.Module):
+
+ def __init__(self, encoder_layer, num_layers, norm=None):
+ super().__init__()
+ self.layers = _get_clones(encoder_layer, num_layers)
+ self.num_layers = num_layers
+ self.norm = norm
+
+ def forward(self, src,
+ mask: Optional[Tensor] = None,
+ src_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None):
+ output = src
+
+ for layer in self.layers:
+ output = layer(output, src_mask=mask,
+ src_key_padding_mask=src_key_padding_mask, pos=pos)
+
+ if self.norm is not None:
+ output = self.norm(output)
+
+ return output
+
+
+class TransformerDecoder(nn.Module):
+
+ def __init__(self, decoder_layer, num_layers, norm=None, return_intermediate=False):
+ super().__init__()
+ self.layers = _get_clones(decoder_layer, num_layers)
+ self.num_layers = num_layers
+ self.norm = norm
+ self.return_intermediate = return_intermediate
+
+ def forward(self, tgt, memory,
+ tgt_mask: Optional[Tensor] = None,
+ memory_mask: Optional[Tensor] = None,
+ tgt_key_padding_mask: Optional[Tensor] = None,
+ memory_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None,
+ query_pos: Optional[Tensor] = None):
+ output = tgt
+
+ intermediate = []
+
+ for layer in self.layers:
+ output = layer(output, memory, tgt_mask=tgt_mask,
+ memory_mask=memory_mask,
+ tgt_key_padding_mask=tgt_key_padding_mask,
+ memory_key_padding_mask=memory_key_padding_mask,
+ pos=pos, query_pos=query_pos)
+ if self.return_intermediate:
+ intermediate.append(self.norm(output))
+
+ if self.norm is not None:
+ output = self.norm(output)
+ if self.return_intermediate:
+ intermediate.pop()
+ intermediate.append(output)
+
+ if self.return_intermediate:
+ return torch.stack(intermediate)
+
+ return output.unsqueeze(0)
+
+
+class TransformerEncoderLayer(nn.Module):
+
+ def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,
+ activation="relu", normalize_before=False):
+ super().__init__()
+ self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+ # Implementation of Feedforward model
+ self.linear1 = nn.Linear(d_model, dim_feedforward)
+ self.dropout = nn.Dropout(dropout)
+ self.linear2 = nn.Linear(dim_feedforward, d_model)
+
+ self.norm1 = nn.LayerNorm(d_model)
+ self.norm2 = nn.LayerNorm(d_model)
+ self.dropout1 = nn.Dropout(dropout)
+ self.dropout2 = nn.Dropout(dropout)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ def with_pos_embed(self, tensor, pos: Optional[Tensor]):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(self,
+ src,
+ src_mask: Optional[Tensor] = None,
+ src_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None):
+ q = k = self.with_pos_embed(src, pos)
+ src2 = self.self_attn(q, k, value=src, attn_mask=src_mask,
+ key_padding_mask=src_key_padding_mask)[0]
+ src = src + self.dropout1(src2)
+ src = self.norm1(src)
+ src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))
+ src = src + self.dropout2(src2)
+ src = self.norm2(src)
+ return src
+
+ def forward_pre(self, src,
+ src_mask: Optional[Tensor] = None,
+ src_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None):
+ src2 = self.norm1(src)
+ q = k = self.with_pos_embed(src2, pos)
+ src2 = self.self_attn(q, k, value=src2, attn_mask=src_mask,
+ key_padding_mask=src_key_padding_mask)[0]
+ src = src + self.dropout1(src2)
+ src2 = self.norm2(src)
+ src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
+ src = src + self.dropout2(src2)
+ return src
+
+ def forward(self, src,
+ src_mask: Optional[Tensor] = None,
+ src_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None):
+ if self.normalize_before:
+ return self.forward_pre(src, src_mask, src_key_padding_mask, pos)
+ return self.forward_post(src, src_mask, src_key_padding_mask, pos)
+
+
+class TransformerDecoderLayer(nn.Module):
+
+ def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1,
+ activation="relu", normalize_before=False):
+ super().__init__()
+ self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+ self.multihead_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout)
+ # Implementation of Feedforward model
+ self.linear1 = nn.Linear(d_model, dim_feedforward)
+ self.dropout = nn.Dropout(dropout)
+ self.linear2 = nn.Linear(dim_feedforward, d_model)
+
+ self.norm1 = nn.LayerNorm(d_model)
+ self.norm2 = nn.LayerNorm(d_model)
+ self.norm3 = nn.LayerNorm(d_model)
+ self.dropout1 = nn.Dropout(dropout)
+ self.dropout2 = nn.Dropout(dropout)
+ self.dropout3 = nn.Dropout(dropout)
+
+ self.activation = _get_activation_fn(activation)
+ self.normalize_before = normalize_before
+
+ def with_pos_embed(self, tensor, pos: Optional[Tensor]):
+ return tensor if pos is None else tensor + pos
+
+ def forward_post(self, tgt, memory,
+ tgt_mask: Optional[Tensor] = None,
+ memory_mask: Optional[Tensor] = None,
+ tgt_key_padding_mask: Optional[Tensor] = None,
+ memory_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None,
+ query_pos: Optional[Tensor] = None):
+ q = k = self.with_pos_embed(tgt, query_pos)
+ tgt2 = self.self_attn(q, k, value=tgt, attn_mask=tgt_mask,
+ key_padding_mask=tgt_key_padding_mask)[0]
+ tgt = tgt + self.dropout1(tgt2)
+ tgt = self.norm1(tgt)
+ tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt, query_pos),
+ key=self.with_pos_embed(memory, pos),
+ value=memory, attn_mask=memory_mask,
+ key_padding_mask=memory_key_padding_mask)[0]
+ tgt = tgt + self.dropout2(tgt2)
+ tgt = self.norm2(tgt)
+ tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))
+ tgt = tgt + self.dropout3(tgt2)
+ tgt = self.norm3(tgt)
+ return tgt
+
+ def forward_pre(self, tgt, memory,
+ tgt_mask: Optional[Tensor] = None,
+ memory_mask: Optional[Tensor] = None,
+ tgt_key_padding_mask: Optional[Tensor] = None,
+ memory_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None,
+ query_pos: Optional[Tensor] = None):
+ tgt2 = self.norm1(tgt)
+ q = k = self.with_pos_embed(tgt2, query_pos)
+ tgt2 = self.self_attn(q, k, value=tgt2, attn_mask=tgt_mask,
+ key_padding_mask=tgt_key_padding_mask)[0]
+ tgt = tgt + self.dropout1(tgt2)
+ tgt2 = self.norm2(tgt)
+ tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt2, query_pos),
+ key=self.with_pos_embed(memory, pos),
+ value=memory, attn_mask=memory_mask,
+ key_padding_mask=memory_key_padding_mask)[0]
+ tgt = tgt + self.dropout2(tgt2)
+ tgt2 = self.norm3(tgt)
+ tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
+ tgt = tgt + self.dropout3(tgt2)
+ return tgt
+
+ def forward(self, tgt, memory,
+ tgt_mask: Optional[Tensor] = None,
+ memory_mask: Optional[Tensor] = None,
+ tgt_key_padding_mask: Optional[Tensor] = None,
+ memory_key_padding_mask: Optional[Tensor] = None,
+ pos: Optional[Tensor] = None,
+ query_pos: Optional[Tensor] = None):
+ if self.normalize_before:
+ return self.forward_pre(tgt, memory, tgt_mask, memory_mask,
+ tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)
+ return self.forward_post(tgt, memory, tgt_mask, memory_mask,
+ tgt_key_padding_mask, memory_key_padding_mask, pos, query_pos)
+
+
+def _get_clones(module, N):
+ return nn.ModuleList([copy.deepcopy(module) for i in range(N)])
+
+
+def build_transformer(args):
+ return Transformer(
+ d_model=args.hidden_dim,
+ dropout=args.dropout,
+ nhead=args.nheads,
+ dim_feedforward=args.dim_feedforward,
+ num_encoder_layers=args.enc_layers,
+ num_decoder_layers=args.dec_layers,
+ normalize_before=args.pre_norm,
+ return_intermediate_dec=True,
+ )
+
+
+def _get_activation_fn(activation):
+ """Return an activation function given a string"""
+ if activation == "relu":
+ return F.relu
+ if activation == "gelu":
+ return F.gelu
+ if activation == "glu":
+ return F.glu
+ raise RuntimeError(F"activation should be relu/gelu, not {activation}.")
diff --git a/docs/src/detr/setup.py b/docs/src/detr/setup.py
new file mode 100644
index 00000000..55d18c0d
--- /dev/null
+++ b/docs/src/detr/setup.py
@@ -0,0 +1,10 @@
+from distutils.core import setup
+from setuptools import find_packages
+
+setup(
+ name='detr',
+ version='0.0.0',
+ packages=find_packages(),
+ license='MIT License',
+ long_description=open('README.md').read(),
+)
\ No newline at end of file
diff --git a/docs/src/detr/util/__init__.py b/docs/src/detr/util/__init__.py
new file mode 100644
index 00000000..168f9979
--- /dev/null
+++ b/docs/src/detr/util/__init__.py
@@ -0,0 +1 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
diff --git a/docs/src/detr/util/box_ops.py b/docs/src/detr/util/box_ops.py
new file mode 100644
index 00000000..9c088e5b
--- /dev/null
+++ b/docs/src/detr/util/box_ops.py
@@ -0,0 +1,88 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+Utilities for bounding box manipulation and GIoU.
+"""
+import torch
+from torchvision.ops.boxes import box_area
+
+
+def box_cxcywh_to_xyxy(x):
+ x_c, y_c, w, h = x.unbind(-1)
+ b = [(x_c - 0.5 * w), (y_c - 0.5 * h),
+ (x_c + 0.5 * w), (y_c + 0.5 * h)]
+ return torch.stack(b, dim=-1)
+
+
+def box_xyxy_to_cxcywh(x):
+ x0, y0, x1, y1 = x.unbind(-1)
+ b = [(x0 + x1) / 2, (y0 + y1) / 2,
+ (x1 - x0), (y1 - y0)]
+ return torch.stack(b, dim=-1)
+
+
+# modified from torchvision to also return the union
+def box_iou(boxes1, boxes2):
+ area1 = box_area(boxes1)
+ area2 = box_area(boxes2)
+
+ lt = torch.max(boxes1[:, None, :2], boxes2[:, :2]) # [N,M,2]
+ rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:]) # [N,M,2]
+
+ wh = (rb - lt).clamp(min=0) # [N,M,2]
+ inter = wh[:, :, 0] * wh[:, :, 1] # [N,M]
+
+ union = area1[:, None] + area2 - inter
+
+ iou = inter / union
+ return iou, union
+
+
+def generalized_box_iou(boxes1, boxes2):
+ """
+ Generalized IoU from https://giou.stanford.edu/
+
+ The boxes should be in [x0, y0, x1, y1] format
+
+ Returns a [N, M] pairwise matrix, where N = len(boxes1)
+ and M = len(boxes2)
+ """
+ # degenerate boxes gives inf / nan results
+ # so do an early check
+ assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
+ assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
+ iou, union = box_iou(boxes1, boxes2)
+
+ lt = torch.min(boxes1[:, None, :2], boxes2[:, :2])
+ rb = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])
+
+ wh = (rb - lt).clamp(min=0) # [N,M,2]
+ area = wh[:, :, 0] * wh[:, :, 1]
+
+ return iou - (area - union) / area
+
+
+def masks_to_boxes(masks):
+ """Compute the bounding boxes around the provided masks
+
+ The masks should be in format [N, H, W] where N is the number of masks, (H, W) are the spatial dimensions.
+
+ Returns a [N, 4] tensors, with the boxes in xyxy format
+ """
+ if masks.numel() == 0:
+ return torch.zeros((0, 4), device=masks.device)
+
+ h, w = masks.shape[-2:]
+
+ y = torch.arange(0, h, dtype=torch.float)
+ x = torch.arange(0, w, dtype=torch.float)
+ y, x = torch.meshgrid(y, x)
+
+ x_mask = (masks * x.unsqueeze(0))
+ x_max = x_mask.flatten(1).max(-1)[0]
+ x_min = x_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]
+
+ y_mask = (masks * y.unsqueeze(0))
+ y_max = y_mask.flatten(1).max(-1)[0]
+ y_min = y_mask.masked_fill(~(masks.bool()), 1e8).flatten(1).min(-1)[0]
+
+ return torch.stack([x_min, y_min, x_max, y_max], 1)
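+
+ # Example (illustrative): GIoU of a unit box against a half-overlapping unit box.
+ #
+ # b1 = torch.tensor([[0.0, 0.0, 1.0, 1.0]])
+ # b2 = torch.tensor([[0.5, 0.5, 1.5, 1.5]])
+ # generalized_box_iou(b1, b2) # IoU = 0.25/1.75 ~ 0.143, GIoU ~ -0.079 (enclosing area 2.25)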
diff --git a/docs/src/detr/util/misc.py b/docs/src/detr/util/misc.py
new file mode 100644
index 00000000..dfa9fb5b
--- /dev/null
+++ b/docs/src/detr/util/misc.py
@@ -0,0 +1,468 @@
+# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
+"""
+Misc functions, including distributed helpers.
+
+Mostly copy-paste from torchvision references.
+"""
+import os
+import subprocess
+import time
+from collections import defaultdict, deque
+import datetime
+import pickle
+from packaging import version
+from typing import Optional, List
+
+import torch
+import torch.distributed as dist
+from torch import Tensor
+
+# needed due to empty tensor bug in pytorch and torchvision 0.5
+import torchvision
+if version.parse(torchvision.__version__) < version.parse('0.7'):
+ from torchvision.ops import _new_empty_tensor
+ from torchvision.ops.misc import _output_size
+
+
+class SmoothedValue(object):
+ """Track a series of values and provide access to smoothed values over a
+ window or the global series average.
+ """
+
+ def __init__(self, window_size=20, fmt=None):
+ if fmt is None:
+ fmt = "{median:.4f} ({global_avg:.4f})"
+ self.deque = deque(maxlen=window_size)
+ self.total = 0.0
+ self.count = 0
+ self.fmt = fmt
+
+ def update(self, value, n=1):
+ self.deque.append(value)
+ self.count += n
+ self.total += value * n
+
+ def synchronize_between_processes(self):
+ """
+ Warning: does not synchronize the deque!
+ """
+ if not is_dist_avail_and_initialized():
+ return
+ t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda')
+ dist.barrier()
+ dist.all_reduce(t)
+ t = t.tolist()
+ self.count = int(t[0])
+ self.total = t[1]
+
+ @property
+ def median(self):
+ d = torch.tensor(list(self.deque))
+ return d.median().item()
+
+ @property
+ def avg(self):
+ d = torch.tensor(list(self.deque), dtype=torch.float32)
+ return d.mean().item()
+
+ @property
+ def global_avg(self):
+ return self.total / self.count
+
+ @property
+ def max(self):
+ return max(self.deque)
+
+ @property
+ def value(self):
+ return self.deque[-1]
+
+ def __str__(self):
+ return self.fmt.format(
+ median=self.median,
+ avg=self.avg,
+ global_avg=self.global_avg,
+ max=self.max,
+ value=self.value)
+
+
+def all_gather(data):
+ """
+ Run all_gather on arbitrary picklable data (not necessarily tensors)
+ Args:
+ data: any picklable object
+ Returns:
+ list[data]: list of data gathered from each rank
+ """
+ world_size = get_world_size()
+ if world_size == 1:
+ return [data]
+
+ # serialized to a Tensor
+ buffer = pickle.dumps(data)
+ storage = torch.ByteStorage.from_buffer(buffer)
+ tensor = torch.ByteTensor(storage).to("cuda")
+
+ # obtain Tensor size of each rank
+ local_size = torch.tensor([tensor.numel()], device="cuda")
+ size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
+ dist.all_gather(size_list, local_size)
+ size_list = [int(size.item()) for size in size_list]
+ max_size = max(size_list)
+
+ # receiving Tensor from all ranks
+ # we pad the tensor because torch all_gather does not support
+ # gathering tensors of different shapes
+ tensor_list = []
+ for _ in size_list:
+ tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
+ if local_size != max_size:
+ padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
+ tensor = torch.cat((tensor, padding), dim=0)
+ dist.all_gather(tensor_list, tensor)
+
+ data_list = []
+ for size, tensor in zip(size_list, tensor_list):
+ buffer = tensor.cpu().numpy().tobytes()[:size]
+ data_list.append(pickle.loads(buffer))
+
+ return data_list
+
+
+def reduce_dict(input_dict, average=True):
+ """
+ Args:
+ input_dict (dict): all the values will be reduced
+ average (bool): whether to do average or sum
+ Reduce the values in the dictionary from all processes so that all processes
+ have the averaged results. Returns a dict with the same fields as
+ input_dict, after reduction.
+ """
+ world_size = get_world_size()
+ if world_size < 2:
+ return input_dict
+ with torch.no_grad():
+ names = []
+ values = []
+ # sort the keys so that they are consistent across processes
+ for k in sorted(input_dict.keys()):
+ names.append(k)
+ values.append(input_dict[k])
+ values = torch.stack(values, dim=0)
+ dist.all_reduce(values)
+ if average:
+ values /= world_size
+ reduced_dict = {k: v for k, v in zip(names, values)}
+ return reduced_dict
+
+
+class MetricLogger(object):
+ def __init__(self, delimiter="\t"):
+ self.meters = defaultdict(SmoothedValue)
+ self.delimiter = delimiter
+
+ def update(self, **kwargs):
+ for k, v in kwargs.items():
+ if isinstance(v, torch.Tensor):
+ v = v.item()
+ assert isinstance(v, (float, int))
+ self.meters[k].update(v)
+
+ def __getattr__(self, attr):
+ if attr in self.meters:
+ return self.meters[attr]
+ if attr in self.__dict__:
+ return self.__dict__[attr]
+ raise AttributeError("'{}' object has no attribute '{}'".format(
+ type(self).__name__, attr))
+
+ def __str__(self):
+ loss_str = []
+ for name, meter in self.meters.items():
+ loss_str.append(
+ "{}: {}".format(name, str(meter))
+ )
+ return self.delimiter.join(loss_str)
+
+ def synchronize_between_processes(self):
+ for meter in self.meters.values():
+ meter.synchronize_between_processes()
+
+ def add_meter(self, name, meter):
+ self.meters[name] = meter
+
+ def log_every(self, iterable, print_freq, header=None):
+ i = 0
+ if not header:
+ header = ''
+ start_time = time.time()
+ end = time.time()
+ iter_time = SmoothedValue(fmt='{avg:.4f}')
+ data_time = SmoothedValue(fmt='{avg:.4f}')
+ space_fmt = ':' + str(len(str(len(iterable)))) + 'd'
+ if torch.cuda.is_available():
+ log_msg = self.delimiter.join([
+ header,
+ '[{0' + space_fmt + '}/{1}]',
+ 'eta: {eta}',
+ '{meters}',
+ 'time: {time}',
+ 'data: {data}',
+ 'max mem: {memory:.0f}'
+ ])
+ else:
+ log_msg = self.delimiter.join([
+ header,
+ '[{0' + space_fmt + '}/{1}]',
+ 'eta: {eta}',
+ '{meters}',
+ 'time: {time}',
+ 'data: {data}'
+ ])
+ MB = 1024.0 * 1024.0
+ for obj in iterable:
+ data_time.update(time.time() - end)
+ yield obj
+ iter_time.update(time.time() - end)
+ if i % print_freq == 0 or i == len(iterable) - 1:
+ eta_seconds = iter_time.global_avg * (len(iterable) - i)
+ eta_string = str(datetime.timedelta(seconds=int(eta_seconds)))
+ if torch.cuda.is_available():
+ print(log_msg.format(
+ i, len(iterable), eta=eta_string,
+ meters=str(self),
+ time=str(iter_time), data=str(data_time),
+ memory=torch.cuda.max_memory_allocated() / MB))
+ else:
+ print(log_msg.format(
+ i, len(iterable), eta=eta_string,
+ meters=str(self),
+ time=str(iter_time), data=str(data_time)))
+ i += 1
+ end = time.time()
+ total_time = time.time() - start_time
+ total_time_str = str(datetime.timedelta(seconds=int(total_time)))
+ print('{} Total time: {} ({:.4f} s / it)'.format(
+ header, total_time_str, total_time / len(iterable)))
+
+
+def get_sha():
+ cwd = os.path.dirname(os.path.abspath(__file__))
+
+ def _run(command):
+ return subprocess.check_output(command, cwd=cwd).decode('ascii').strip()
+ sha = 'N/A'
+ diff = "clean"
+ branch = 'N/A'
+ try:
+ sha = _run(['git', 'rev-parse', 'HEAD'])
+ subprocess.check_output(['git', 'diff'], cwd=cwd)
+ diff = _run(['git', 'diff-index', 'HEAD'])
+ diff = "has uncommited changes" if diff else "clean"
+ branch = _run(['git', 'rev-parse', '--abbrev-ref', 'HEAD'])
+ except Exception:
+ pass
+ message = f"sha: {sha}, status: {diff}, branch: {branch}"
+ return message
+
+
+def collate_fn(batch):
+ batch = list(zip(*batch))
+ batch[0] = nested_tensor_from_tensor_list(batch[0])
+ return tuple(batch)
+
+
+def _max_by_axis(the_list):
+ # type: (List[List[int]]) -> List[int]
+ maxes = the_list[0]
+ for sublist in the_list[1:]:
+ for index, item in enumerate(sublist):
+ maxes[index] = max(maxes[index], item)
+ return maxes
+
+
+class NestedTensor(object):
+ def __init__(self, tensors, mask: Optional[Tensor]):
+ self.tensors = tensors
+ self.mask = mask
+
+ def to(self, device):
+ # type: (Device) -> NestedTensor # noqa
+ cast_tensor = self.tensors.to(device)
+ mask = self.mask
+ if mask is not None:
+ assert mask is not None
+ cast_mask = mask.to(device)
+ else:
+ cast_mask = None
+ return NestedTensor(cast_tensor, cast_mask)
+
+ def decompose(self):
+ return self.tensors, self.mask
+
+ def __repr__(self):
+ return str(self.tensors)
+
+
+def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
+ # TODO make this more general
+ if tensor_list[0].ndim == 3:
+ if torchvision._is_tracing():
+ # nested_tensor_from_tensor_list() does not export well to ONNX
+ # call _onnx_nested_tensor_from_tensor_list() instead
+ return _onnx_nested_tensor_from_tensor_list(tensor_list)
+
+ # TODO make it support different-sized images
+ max_size = _max_by_axis([list(img.shape) for img in tensor_list])
+ # min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))
+ batch_shape = [len(tensor_list)] + max_size
+ b, c, h, w = batch_shape
+ dtype = tensor_list[0].dtype
+ device = tensor_list[0].device
+ tensor = torch.zeros(batch_shape, dtype=dtype, device=device)
+ mask = torch.ones((b, h, w), dtype=torch.bool, device=device)
+ for img, pad_img, m in zip(tensor_list, tensor, mask):
+ pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
+ m[: img.shape[1], :img.shape[2]] = False
+ else:
+ raise ValueError('not supported')
+ return NestedTensor(tensor, mask)
+
+
+# _onnx_nested_tensor_from_tensor_list() is an implementation of
+# nested_tensor_from_tensor_list() that is supported by ONNX tracing.
+@torch.jit.unused
+def _onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> NestedTensor:
+ max_size = []
+ for i in range(tensor_list[0].dim()):
+ max_size_i = torch.max(torch.stack([img.shape[i] for img in tensor_list]).to(torch.float32)).to(torch.int64)
+ max_size.append(max_size_i)
+ max_size = tuple(max_size)
+
+ # work around for
+ # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
+ # m[: img.shape[1], :img.shape[2]] = False
+ # which is not yet supported in onnx
+ padded_imgs = []
+ padded_masks = []
+ for img in tensor_list:
+ padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]
+ padded_img = torch.nn.functional.pad(img, (0, padding[2], 0, padding[1], 0, padding[0]))
+ padded_imgs.append(padded_img)
+
+ m = torch.zeros_like(img[0], dtype=torch.int, device=img.device)
+ padded_mask = torch.nn.functional.pad(m, (0, padding[2], 0, padding[1]), "constant", 1)
+ padded_masks.append(padded_mask.to(torch.bool))
+
+ tensor = torch.stack(padded_imgs)
+ mask = torch.stack(padded_masks)
+
+ return NestedTensor(tensor, mask=mask)
+
+
+def setup_for_distributed(is_master):
+ """
+ This function disables printing when not in master process
+ """
+ import builtins as __builtin__
+ builtin_print = __builtin__.print
+
+ def print(*args, **kwargs):
+ force = kwargs.pop('force', False)
+ if is_master or force:
+ builtin_print(*args, **kwargs)
+
+ __builtin__.print = print
+
+
+def is_dist_avail_and_initialized():
+ if not dist.is_available():
+ return False
+ if not dist.is_initialized():
+ return False
+ return True
+
+
+def get_world_size():
+ if not is_dist_avail_and_initialized():
+ return 1
+ return dist.get_world_size()
+
+
+def get_rank():
+ if not is_dist_avail_and_initialized():
+ return 0
+ return dist.get_rank()
+
+
+def is_main_process():
+ return get_rank() == 0
+
+
+def save_on_master(*args, **kwargs):
+ if is_main_process():
+ torch.save(*args, **kwargs)
+
+
+def init_distributed_mode(args):
+ if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ:
+ args.rank = int(os.environ["RANK"])
+ args.world_size = int(os.environ['WORLD_SIZE'])
+ args.gpu = int(os.environ['LOCAL_RANK'])
+ elif 'SLURM_PROCID' in os.environ:
+ args.rank = int(os.environ['SLURM_PROCID'])
+ args.gpu = args.rank % torch.cuda.device_count()
+ else:
+ print('Not using distributed mode')
+ args.distributed = False
+ return
+
+ args.distributed = True
+
+ torch.cuda.set_device(args.gpu)
+ args.dist_backend = 'nccl'
+ print('| distributed init (rank {}): {}'.format(
+ args.rank, args.dist_url), flush=True)
+ torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
+ world_size=args.world_size, rank=args.rank)
+ torch.distributed.barrier()
+ setup_for_distributed(args.rank == 0)
+
+
+@torch.no_grad()
+def accuracy(output, target, topk=(1,)):
+ """Computes the precision@k for the specified values of k"""
+ if target.numel() == 0:
+ return [torch.zeros([], device=output.device)]
+ maxk = max(topk)
+ batch_size = target.size(0)
+
+ _, pred = output.topk(maxk, 1, True, True)
+ pred = pred.t()
+ correct = pred.eq(target.view(1, -1).expand_as(pred))
+
+ res = []
+ for k in topk:
+ correct_k = correct[:k].view(-1).float().sum(0)
+ res.append(correct_k.mul_(100.0 / batch_size))
+ return res
+
+
+def interpolate(input, size=None, scale_factor=None, mode="nearest", align_corners=None):
+ # type: (Tensor, Optional[List[int]], Optional[float], str, Optional[bool]) -> Tensor
+ """
+ Equivalent to nn.functional.interpolate, but with support for empty batch sizes.
+ This will eventually be supported natively by PyTorch, and this
+ class can go away.
+ """
+ if version.parse(torchvision.__version__) < version.parse('0.7'):
+ if input.numel() > 0:
+ return torch.nn.functional.interpolate(
+ input, size, scale_factor, mode, align_corners
+ )
+
+ output_shape = _output_size(2, input, size, scale_factor)
+ output_shape = list(input.shape[:-2]) + list(output_shape)
+ return _new_empty_tensor(input, output_shape)
+ else:
+ return torchvision.ops.misc.interpolate(input, size, scale_factor, mode, align_corners)
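+
+ # Usage sketch (hypothetical training loop) for MetricLogger / SmoothedValue:
+ #
+ # logger = MetricLogger(delimiter=" ")
+ # for samples, targets in logger.log_every(data_loader, print_freq=10, header='Epoch: [0]'):
+ # loss = compute_loss(samples, targets) # hypothetical loss computation
+ # logger.update(loss=loss.item())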
diff --git a/docs/src/detr/util/plot_utils.py b/docs/src/detr/util/plot_utils.py
new file mode 100644
index 00000000..0f24bed0
--- /dev/null
+++ b/docs/src/detr/util/plot_utils.py
@@ -0,0 +1,107 @@
+"""
+Plotting utilities to visualize training logs.
+"""
+import torch
+import pandas as pd
+import numpy as np
+import seaborn as sns
+import matplotlib.pyplot as plt
+
+from pathlib import Path, PurePath
+
+
+def plot_logs(logs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'), ewm_col=0, log_name='log.txt'):
+ '''
+ Function to plot specific fields from training log(s). Plots both training and test results.
+
+ :: Inputs - logs = list containing Path objects, each pointing to individual dir with a log file
+ - fields = which results to plot from each log file - plots both training and test for each field.
+ - ewm_col = optional, which column to use as the exponential weighted smoothing of the plots
+ - log_name = optional, name of log file if different than default 'log.txt'.
+
+ :: Outputs - matplotlib plots of results in fields, color coded for each log file.
+ - solid lines are training results, dashed lines are test results.
+
+ '''
+ func_name = "plot_utils.py::plot_logs"
+
+ # verify logs is a list of Paths (list[Paths]) or single Pathlib object Path,
+ # convert single Path to list to avoid 'not iterable' error
+
+ if not isinstance(logs, list):
+ if isinstance(logs, PurePath):
+ logs = [logs]
+ print(f"{func_name} info: logs param expects a list argument, converted to list[Path].")
+ else:
+ raise ValueError(f"{func_name} - invalid argument for logs parameter.\n \
+ Expect list[Path] or single Path obj, received {type(logs)}")
+
+ # Quality checks - verify valid dir(s), that every item in list is Path object, and that log_name exists in each dir
+ for i, dir in enumerate(logs):
+ if not isinstance(dir, PurePath):
+ raise ValueError(f"{func_name} - non-Path object in logs argument of {type(dir)}: \n{dir}")
+ if not dir.exists():
+ raise ValueError(f"{func_name} - invalid directory in logs argument:\n{dir}")
+ # verify log_name exists
+ fn = Path(dir / log_name)
+ if not fn.exists():
+ print(f"-> missing {log_name}. Have you gotten to Epoch 1 in training?")
+ print(f"--> full path of missing log file: {fn}")
+ return
+
+ # load log file(s) and plot
+ dfs = [pd.read_json(Path(p) / log_name, lines=True) for p in logs]
+
+ fig, axs = plt.subplots(ncols=len(fields), figsize=(16, 5))
+
+ for df, color in zip(dfs, sns.color_palette(n_colors=len(logs))):
+ for j, field in enumerate(fields):
+ if field == 'mAP':
+ coco_eval = pd.DataFrame(
+ np.stack(df.test_coco_eval_bbox.dropna().values)[:, 1]
+ ).ewm(com=ewm_col).mean()
+ axs[j].plot(coco_eval, c=color)
+ else:
+ df.interpolate().ewm(com=ewm_col).mean().plot(
+ y=[f'train_{field}', f'test_{field}'],
+ ax=axs[j],
+ color=[color] * 2,
+ style=['-', '--']
+ )
+ for ax, field in zip(axs, fields):
+ ax.legend([Path(p).name for p in logs])
+ ax.set_title(field)
+
+
+def plot_precision_recall(files, naming_scheme='iter'):
+ if naming_scheme == 'exp_id':
+ # name becomes exp_id
+ names = [f.parts[-3] for f in files]
+ elif naming_scheme == 'iter':
+ names = [f.stem for f in files]
+ else:
+ raise ValueError(f'not supported {naming_scheme}')
+ fig, axs = plt.subplots(ncols=2, figsize=(16, 5))
+ for f, color, name in zip(files, sns.color_palette("Blues", n_colors=len(files)), names):
+ data = torch.load(f)
+ # precision is n_iou, n_points, n_cat, n_area, max_det
+ precision = data['precision']
+ recall = data['params'].recThrs
+ scores = data['scores']
+ # take precision for all classes, all areas and 100 detections
+ precision = precision[0, :, :, 0, -1].mean(1)
+ scores = scores[0, :, :, 0, -1].mean(1)
+ prec = precision.mean()
+ rec = data['recall'][0, :, 0, -1].mean()
+ print(f'{naming_scheme} {name}: mAP@50={prec * 100: 05.1f}, ' +
+ f'score={scores.mean():0.3f}, ' +
+ f'f1={2 * prec * rec / (prec + rec + 1e-8):0.3f}'
+ )
+ axs[0].plot(recall, precision, c=color)
+ axs[1].plot(recall, scores, c=color)
+
+ axs[0].set_title('Precision / Recall')
+ axs[0].legend(names)
+ axs[1].set_title('Scores / Recall')
+ axs[1].legend(names)
+ return fig, axs
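+
+ # Usage sketch (hypothetical paths): each directory is expected to contain a
+ # DETR-style 'log.txt' with one JSON record per line.
+ #
+ # logs = [Path('work_dirs/run1'), Path('work_dirs/run2')]
+ # plot_logs(logs, fields=('class_error', 'loss_bbox_unscaled', 'mAP'))
+ # plt.show()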
diff --git a/docs/src/dxl_test.py b/docs/src/dxl_test.py
new file mode 100644
index 00000000..1d5be391
--- /dev/null
+++ b/docs/src/dxl_test.py
@@ -0,0 +1,4 @@
+from dynamixel_client import DynamixelClient
+client = DynamixelClient([1, 2], port='/dev/ttyDXL_wheels', lazy_connect=True)
+
+print(client.read_pos_vel_cur())
diff --git a/docs/src/dynamixel_client.py b/docs/src/dynamixel_client.py
new file mode 100644
index 00000000..814fe97e
--- /dev/null
+++ b/docs/src/dynamixel_client.py
@@ -0,0 +1,604 @@
+"""Communication using the DynamixelSDK."""
+# This is based on the Dynamixel SDK.
+import atexit
+import logging
+import time
+from typing import Optional, Sequence, Union, Tuple
+
+import numpy as np
+
+PROTOCOL_VERSION = 2.0
+
+# The following addresses assume XH motors.
+ADDR_TORQUE_ENABLE = 64
+ADDR_GOAL_POSITION = 116
+ADDR_PRESENT_POSITION = 132
+ADDR_PRESENT_VELOCITY = 128
+ADDR_PRESENT_CURRENT = 126
+ADDR_PRESENT_POS_VEL_CUR = 126
+
+# Data Byte Length
+LEN_PRESENT_POSITION = 4
+LEN_PRESENT_VELOCITY = 4
+LEN_PRESENT_CURRENT = 2
+LEN_PRESENT_POS_VEL_CUR = 10
+LEN_GOAL_POSITION = 4
+
+DEFAULT_POS_SCALE = 2.0 * np.pi / 4096 # 0.088 degrees
+# See http://emanual.robotis.com/docs/en/dxl/x/xh430-v210/#goal-velocity
+DEFAULT_VEL_SCALE = 0.229 * 2.0 * np.pi / 60.0 # 0.229 rpm
+DEFAULT_CUR_SCALE = 1.34
+
+
+def dynamixel_cleanup_handler():
+ """Cleanup function to ensure Dynamixels are disconnected properly."""
+ open_clients = list(DynamixelClient.OPEN_CLIENTS)
+ for open_client in open_clients:
+ if open_client.port_handler.is_using:
+ logging.warning('Forcing client to close.')
+ open_client.port_handler.is_using = False
+ open_client.disconnect()
+
+
+def signed_to_unsigned(value: int, size: int) -> int:
+ """Converts the given value to its unsigned representation."""
+ if value < 0:
+ bit_size = 8 * size
+ max_value = (1 << bit_size) - 1
+ value = max_value + value
+ return value
+
+
+def unsigned_to_signed(value: int, size: int) -> int:
+ """Converts the given value from its unsigned representation."""
+ bit_size = 8 * size
+ if (value & (1 << (bit_size - 1))) != 0:
+ value = -((1 << bit_size) - value)
+ return value
+
+
+class DynamixelClient:
+ """Client for communicating with Dynamixel motors.
+
+ NOTE: This only supports Protocol 2.
+ """
+
+ # The currently open clients.
+ OPEN_CLIENTS = set()
+
+ def __init__(self,
+ motor_ids: Sequence[int],
+ port: str = '/dev/ttyUSB0',
+ baudrate: int = 1000000,
+ lazy_connect: bool = False,
+ pos_scale: Optional[float] = None,
+ vel_scale: Optional[float] = None,
+ cur_scale: Optional[float] = None):
+ """Initializes a new client.
+
+ Args:
+ motor_ids: All motor IDs being used by the client.
+ port: The Dynamixel device to talk to. e.g.
+ - Linux: /dev/ttyUSB0
+ - Mac: /dev/tty.usbserial-*
+ - Windows: COM1
+ baudrate: The Dynamixel baudrate to communicate with.
+ lazy_connect: If True, automatically connects when calling a method
+ that requires a connection, if not already connected.
+ pos_scale: The scaling factor for the positions. This is
+ motor-dependent. If not provided, uses the default scale.
+ vel_scale: The scaling factor for the velocities. This is
+ motor-dependent. If not provided uses the default scale.
+ cur_scale: The scaling factor for the currents. This is
+ motor-dependent. If not provided uses the default scale.
+ """
+ import dynamixel_sdk
+ self.dxl = dynamixel_sdk
+
+ self.motor_ids = list(motor_ids)
+ self.port_name = port
+ self.baudrate = baudrate
+ self.lazy_connect = lazy_connect
+
+ self.port_handler = self.dxl.PortHandler(port)
+ self.packet_handler = self.dxl.PacketHandler(PROTOCOL_VERSION)
+
+ self._pos_vel_cur_reader = DynamixelPosVelCurReader(
+ self,
+ self.motor_ids,
+ pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,
+ vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,
+ cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,
+ )
+ self._pos_reader = DynamixelPosReader(
+ self,
+ self.motor_ids,
+ pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,
+ vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,
+ cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,
+ )
+ self._vel_reader = DynamixelVelReader(
+ self,
+ self.motor_ids,
+ pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,
+ vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,
+ cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,
+ )
+ self._cur_reader = DynamixelCurReader(
+ self,
+ self.motor_ids,
+ pos_scale=pos_scale if pos_scale is not None else DEFAULT_POS_SCALE,
+ vel_scale=vel_scale if vel_scale is not None else DEFAULT_VEL_SCALE,
+ cur_scale=cur_scale if cur_scale is not None else DEFAULT_CUR_SCALE,
+ )
+ self._sync_writers = {}
+
+ self.OPEN_CLIENTS.add(self)
+
+ @property
+ def is_connected(self) -> bool:
+ return self.port_handler.is_open
+
+ def connect(self):
+ """Connects to the Dynamixel motors.
+
+ NOTE: This should be called after all DynamixelClients on the same
+ process are created.
+ """
+ assert not self.is_connected, 'Client is already connected.'
+
+ if self.port_handler.openPort():
+ logging.info('Succeeded to open port: %s', self.port_name)
+ else:
+ raise OSError(
+ ('Failed to open port at {} (Check that the device is powered '
+ 'on and connected to your computer).').format(self.port_name))
+
+ if self.port_handler.setBaudRate(self.baudrate):
+ logging.info('Succeeded to set baudrate to %d', self.baudrate)
+ else:
+ raise OSError(
+ ('Failed to set the baudrate to {} (Ensure that the device was '
+ 'configured for this baudrate).').format(self.baudrate))
+
+ # Torque is intentionally left disabled here so motor settings can be configured before enabling.
+ # self.set_torque_enabled(self.motor_ids, True)
+
+ def disconnect(self):
+ """Disconnects from the Dynamixel device."""
+ if not self.is_connected:
+ return
+ if self.port_handler.is_using:
+ logging.error('Port handler in use; cannot disconnect.')
+ return
+ # Ensure motors are disabled at the end.
+ self.set_torque_enabled(self.motor_ids, False, retries=0)
+ self.port_handler.closePort()
+ if self in self.OPEN_CLIENTS:
+ self.OPEN_CLIENTS.remove(self)
+
+ def set_torque_enabled(self,
+ motor_ids: Sequence[int],
+ enabled: bool,
+ retries: int = -1,
+ retry_interval: float = 0.25):
+ """Sets whether torque is enabled for the motors.
+
+ Args:
+ motor_ids: The motor IDs to configure.
+ enabled: Whether to engage or disengage the motors.
+ retries: The number of times to retry. If this is <0, will retry
+ forever.
+ retry_interval: The number of seconds to wait between retries.
+ """
+ remaining_ids = list(motor_ids)
+ while remaining_ids:
+ remaining_ids = self.write_byte(
+ remaining_ids,
+ int(enabled),
+ ADDR_TORQUE_ENABLE,
+ )
+ if remaining_ids:
+ logging.error('Could not set torque %s for IDs: %s',
+ 'enabled' if enabled else 'disabled',
+ str(remaining_ids))
+ if retries == 0:
+ break
+ time.sleep(retry_interval)
+ retries -= 1
+
+ def read_pos_vel_cur(self) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
+ """Returns the current positions, velocities, and currents."""
+ return self._pos_vel_cur_reader.read()
+
+ def read_pos(self) -> np.ndarray:
+ """Returns the current positions."""
+ return self._pos_reader.read()
+
+ def read_vel(self) -> np.ndarray:
+ """Returns the current velocities."""
+ return self._vel_reader.read()
+
+ def read_cur(self) -> np.ndarray:
+ """Returns the present currents."""
+ return self._cur_reader.read()
+
+ def write_desired_pos(self, motor_ids: Sequence[int],
+ positions: np.ndarray):
+ """Writes the given desired positions.
+
+ Args:
+ motor_ids: The motor IDs to write to.
+ positions: The joint angles in radians to write.
+ """
+ assert len(motor_ids) == len(positions)
+
+ # Convert to Dynamixel position space.
+ positions = positions / self._pos_vel_cur_reader.pos_scale
+ self.sync_write(motor_ids, positions, ADDR_GOAL_POSITION,
+ LEN_GOAL_POSITION)
+
+ def write_byte(
+ self,
+ motor_ids: Sequence[int],
+ value: int,
+ address: int,
+ ) -> Sequence[int]:
+ """Writes a value to the motors.
+
+ Args:
+ motor_ids: The motor IDs to write to.
+ value: The value to write to the control table.
+ address: The control table address to write to.
+
+ Returns:
+ A list of IDs that were unsuccessful.
+ """
+ self.check_connected()
+ errored_ids = []
+ for motor_id in motor_ids:
+ comm_result, dxl_error = self.packet_handler.write1ByteTxRx(
+ self.port_handler, motor_id, address, value)
+ success = self.handle_packet_result(
+ comm_result, dxl_error, motor_id, context='write_byte')
+ if not success:
+ errored_ids.append(motor_id)
+ return errored_ids
+
+ def sync_write(self, motor_ids: Sequence[int],
+ values: Sequence[Union[int, float]], address: int,
+ size: int):
+ """Writes values to a group of motors.
+
+ Args:
+ motor_ids: The motor IDs to write to.
+ values: The values to write.
+ address: The control table address to write to.
+ size: The size of the control table value being written to.
+ """
+ self.check_connected()
+ key = (address, size)
+ if key not in self._sync_writers:
+ self._sync_writers[key] = self.dxl.GroupSyncWrite(
+ self.port_handler, self.packet_handler, address, size)
+ sync_writer = self._sync_writers[key]
+
+ errored_ids = []
+ for motor_id, desired_pos in zip(motor_ids, values):
+ value = signed_to_unsigned(int(desired_pos), size=size)
+ value = value.to_bytes(size, byteorder='little')
+ success = sync_writer.addParam(motor_id, value)
+ if not success:
+ errored_ids.append(motor_id)
+
+ if errored_ids:
+ logging.error('Sync write failed for: %s', str(errored_ids))
+
+ comm_result = sync_writer.txPacket()
+ self.handle_packet_result(comm_result, context='sync_write')
+
+ sync_writer.clearParam()
+
+ def check_connected(self):
+ """Ensures the robot is connected."""
+ if self.lazy_connect and not self.is_connected:
+ self.connect()
+ if not self.is_connected:
+ raise OSError('Must call connect() first.')
+
+ def handle_packet_result(self,
+ comm_result: int,
+ dxl_error: Optional[int] = None,
+ dxl_id: Optional[int] = None,
+ context: Optional[str] = None):
+ """Handles the result from a communication request."""
+ error_message = None
+ if comm_result != self.dxl.COMM_SUCCESS:
+ error_message = self.packet_handler.getTxRxResult(comm_result)
+ elif dxl_error is not None:
+ error_message = self.packet_handler.getRxPacketError(dxl_error)
+ if error_message:
+ if dxl_id is not None:
+ error_message = '[Motor ID: {}] {}'.format(
+ dxl_id, error_message)
+ if context is not None:
+ error_message = '> {}: {}'.format(context, error_message)
+ logging.error(error_message)
+ return False
+ return True
+
+ def convert_to_unsigned(self, value: int, size: int) -> int:
+ """Converts the given value to its unsigned representation."""
+ if value < 0:
+ max_value = (1 << (8 * size)) - 1
+ value = max_value + value
+ return value
+
+ def __enter__(self):
+ """Enables use as a context manager."""
+ if not self.is_connected:
+ self.connect()
+ return self
+
+ def __exit__(self, *args):
+ """Enables use as a context manager."""
+ self.disconnect()
+
+ def __del__(self):
+ """Automatically disconnect on destruction."""
+ self.disconnect()
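+
+ # Usage sketch (assumed port and motor IDs): the class supports use as a context
+ # manager, which connects on enter and disconnects on exit.
+ #
+ # with DynamixelClient([1, 2], port='/dev/ttyUSB0') as client:
+ # client.set_torque_enabled(client.motor_ids, True)
+ # pos, vel, cur = client.read_pos_vel_cur()
+ # client.write_desired_pos(client.motor_ids, pos)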
+
+
+class DynamixelReader:
+ """Reads data from Dynamixel motors.
+
+ This wraps a GroupBulkRead from the DynamixelSDK.
+ """
+
+ def __init__(self, client: DynamixelClient, motor_ids: Sequence[int],
+ address: int, size: int):
+ """Initializes a new reader."""
+ self.client = client
+ self.motor_ids = motor_ids
+ self.address = address
+ self.size = size
+ self._initialize_data()
+
+ self.operation = self.client.dxl.GroupBulkRead(client.port_handler,
+ client.packet_handler)
+
+ for motor_id in motor_ids:
+ success = self.operation.addParam(motor_id, address, size)
+ if not success:
+ raise OSError(
+ '[Motor ID: {}] Could not add parameter to bulk read.'
+ .format(motor_id))
+
+ def read(self, retries: int = 1):
+ """Reads data from the motors."""
+ self.client.check_connected()
+ success = False
+ while not success and retries >= 0:
+ comm_result = self.operation.txRxPacket()
+ success = self.client.handle_packet_result(
+ comm_result, context='read')
+ retries -= 1
+
+ # If we failed, send a copy of the previous data.
+ if not success:
+ return self._get_data()
+
+ errored_ids = []
+ for i, motor_id in enumerate(self.motor_ids):
+ # Check if the data is available.
+ available = self.operation.isAvailable(motor_id, self.address,
+ self.size)
+ if not available:
+ errored_ids.append(motor_id)
+ continue
+
+ self._update_data(i, motor_id)
+
+ if errored_ids:
+ logging.error('Bulk read data is unavailable for: %s',
+ str(errored_ids))
+
+ return self._get_data()
+
+ def _initialize_data(self):
+ """Initializes the cached data."""
+ self._data = np.zeros(len(self.motor_ids), dtype=np.float32)
+
+ def _update_data(self, index: int, motor_id: int):
+ """Updates the data index for the given motor ID."""
+ self._data[index] = self.operation.getData(motor_id, self.address,
+ self.size)
+
+ def _get_data(self):
+ """Returns a copy of the data."""
+ return self._data.copy()
+
+
+class DynamixelPosVelCurReader(DynamixelReader):
+ """Reads positions, velocities, and currents."""
+
+ def __init__(self,
+ client: DynamixelClient,
+ motor_ids: Sequence[int],
+ pos_scale: float = 1.0,
+ vel_scale: float = 1.0,
+ cur_scale: float = 1.0):
+ super().__init__(
+ client,
+ motor_ids,
+ address=ADDR_PRESENT_POS_VEL_CUR,
+ size=LEN_PRESENT_POS_VEL_CUR,
+ )
+ self.pos_scale = pos_scale
+ self.vel_scale = vel_scale
+ self.cur_scale = cur_scale
+
+ def _initialize_data(self):
+ """Initializes the cached data."""
+ self._pos_data = np.zeros(len(self.motor_ids), dtype=np.float32)
+ self._vel_data = np.zeros(len(self.motor_ids), dtype=np.float32)
+ self._cur_data = np.zeros(len(self.motor_ids), dtype=np.float32)
+
+ def _update_data(self, index: int, motor_id: int):
+ """Updates the data index for the given motor ID."""
+ cur = self.operation.getData(motor_id, ADDR_PRESENT_CURRENT,
+ LEN_PRESENT_CURRENT)
+ vel = self.operation.getData(motor_id, ADDR_PRESENT_VELOCITY,
+ LEN_PRESENT_VELOCITY)
+ pos = self.operation.getData(motor_id, ADDR_PRESENT_POSITION,
+ LEN_PRESENT_POSITION)
+ cur = unsigned_to_signed(cur, size=2)
+ vel = unsigned_to_signed(vel, size=4)
+ pos = unsigned_to_signed(pos, size=4)
+ self._pos_data[index] = float(pos) * self.pos_scale
+ self._vel_data[index] = float(vel) * self.vel_scale
+ self._cur_data[index] = float(cur) * self.cur_scale
+
+ def _get_data(self):
+ """Returns a copy of the data."""
+ return (self._pos_data.copy(), self._vel_data.copy(),
+ self._cur_data.copy())
+
+
+class DynamixelPosReader(DynamixelReader):
+ """Reads positions."""
+
+ def __init__(self,
+ client: DynamixelClient,
+ motor_ids: Sequence[int],
+ pos_scale: float = 1.0,
+ vel_scale: float = 1.0,
+ cur_scale: float = 1.0):
+ super().__init__(
+ client,
+ motor_ids,
+ address=ADDR_PRESENT_POS_VEL_CUR,
+ size=LEN_PRESENT_POS_VEL_CUR,
+ )
+ self.pos_scale = pos_scale
+
+ def _initialize_data(self):
+ """Initializes the cached data."""
+ self._pos_data = np.zeros(len(self.motor_ids), dtype=np.float32)
+
+ def _update_data(self, index: int, motor_id: int):
+ """Updates the data index for the given motor ID."""
+ pos = self.operation.getData(motor_id, ADDR_PRESENT_POSITION,
+ LEN_PRESENT_POSITION)
+ pos = unsigned_to_signed(pos, size=4)
+ self._pos_data[index] = float(pos) * self.pos_scale
+
+ def _get_data(self):
+ """Returns a copy of the data."""
+ return self._pos_data.copy()
+
+
+
+class DynamixelVelReader(DynamixelReader):
+ """Reads velocities."""
+
+ def __init__(self,
+ client: DynamixelClient,
+ motor_ids: Sequence[int],
+ pos_scale: float = 1.0,
+ vel_scale: float = 1.0,
+ cur_scale: float = 1.0):
+ super().__init__(
+ client,
+ motor_ids,
+ address=ADDR_PRESENT_POS_VEL_CUR,
+ size=LEN_PRESENT_POS_VEL_CUR,
+ )
+ self.pos_scale = pos_scale
+ self.vel_scale = vel_scale
+ self.cur_scale = cur_scale
+
+ def _initialize_data(self):
+ """Initializes the cached data."""
+ self._vel_data = np.zeros(len(self.motor_ids), dtype=np.float32)
+
+ def _update_data(self, index: int, motor_id: int):
+ """Updates the data index for the given motor ID."""
+ vel = self.operation.getData(motor_id, ADDR_PRESENT_VELOCITY,
+ LEN_PRESENT_VELOCITY)
+ vel = unsigned_to_signed(vel, size=4)
+ self._vel_data[index] = float(vel) * self.vel_scale
+
+ def _get_data(self):
+ """Returns a copy of the data."""
+ return self._vel_data.copy()
+
+class DynamixelCurReader(DynamixelReader):
+ """Reads currents."""
+
+ def __init__(self,
+ client: DynamixelClient,
+ motor_ids: Sequence[int],
+ pos_scale: float = 1.0,
+ vel_scale: float = 1.0,
+ cur_scale: float = 1.0):
+ super().__init__(
+ client,
+ motor_ids,
+ address=ADDR_PRESENT_POS_VEL_CUR,
+ size=LEN_PRESENT_POS_VEL_CUR,
+ )
+ self.cur_scale = cur_scale
+
+ def _initialize_data(self):
+ """Initializes the cached data."""
+ self._cur_data = np.zeros(len(self.motor_ids), dtype=np.float32)
+
+ def _update_data(self, index: int, motor_id: int):
+ """Updates the data index for the given motor ID."""
+ cur = self.operation.getData(motor_id, ADDR_PRESENT_CURRENT,
+ LEN_PRESENT_CURRENT)
+ cur = unsigned_to_signed(cur, size=2)
+ self._cur_data[index] = float(cur) * self.cur_scale
+
+ def _get_data(self):
+ """Returns a copy of the data."""
+ return self._cur_data.copy()
+
+
+# Register global cleanup function.
+atexit.register(dynamixel_cleanup_handler)
+
+if __name__ == '__main__':
+ import argparse
+ import itertools
+
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ '-m',
+ '--motors',
+ required=True,
+ help='Comma-separated list of motor IDs.')
+ parser.add_argument(
+ '-d',
+ '--device',
+ default='/dev/ttyUSB0',
+ help='The Dynamixel device to connect to.')
+ parser.add_argument(
+ '-b', '--baud', default=1000000, help='The baudrate to connect with.')
+ parsed_args = parser.parse_args()
+
+ motors = [int(motor) for motor in parsed_args.motors.split(',')]
+
+ way_points = [np.zeros(len(motors)), np.full(len(motors), np.pi)]
+
+ with DynamixelClient(motors, parsed_args.device,
+ parsed_args.baud) as dxl_client:
+ for step in itertools.count():
+ if step > 0 and step % 50 == 0:
+ way_point = way_points[(step // 100) % len(way_points)]
+ print('Writing: {}'.format(way_point.tolist()))
+ dxl_client.write_desired_pos(motors, way_point)
+ read_start = time.time()
+ pos_now, vel_now, cur_now = dxl_client.read_pos_vel_cur()
+ if step % 5 == 0:
+ print('[{}] Frequency: {:.2f} Hz'.format(
+ step, 1.0 / (time.time() - read_start)))
+ print('> Pos: {}'.format(pos_now.tolist()))
+ print('> Vel: {}'.format(vel_now.tolist()))
+ print('> Cur: {}'.format(cur_now.tolist()))
diff --git a/docs/src/ee_sim_env.py b/docs/src/ee_sim_env.py
new file mode 100644
index 00000000..c553cb45
--- /dev/null
+++ b/docs/src/ee_sim_env.py
@@ -0,0 +1,267 @@
+import numpy as np
+import collections
+import os
+
+from constants import DT, XML_DIR, START_ARM_POSE
+from constants import PUPPET_GRIPPER_POSITION_CLOSE
+from constants import PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN
+from constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN
+from constants import PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN
+
+from utils import sample_box_pose, sample_insertion_pose
+from dm_control import mujoco
+from dm_control.rl import control
+from dm_control.suite import base
+
+import IPython
+e = IPython.embed
+
+
+def make_ee_sim_env(task_name):
+ """
+ Environment for simulated robot bi-manual manipulation, with end-effector control.
+ Action space: [left_arm_pose (7), # position and quaternion for end effector
+ left_gripper_positions (1), # normalized gripper position (0: close, 1: open)
+ right_arm_pose (7), # position and quaternion for end effector
+ right_gripper_positions (1),] # normalized gripper position (0: close, 1: open)
+
+ Observation space: {"qpos": Concat[ left_arm_qpos (6), # absolute joint position
+ left_gripper_position (1), # normalized gripper position (0: close, 1: open)
+ right_arm_qpos (6), # absolute joint position
+ right_gripper_qpos (1)] # normalized gripper position (0: close, 1: open)
+ "qvel": Concat[ left_arm_qvel (6), # absolute joint velocity (rad)
+ left_gripper_velocity (1), # normalized gripper velocity (pos: opening, neg: closing)
+ right_arm_qvel (6), # absolute joint velocity (rad)
+ right_gripper_qvel (1)] # normalized gripper velocity (pos: opening, neg: closing)
+ "images": {"main": (480x640x3)} # h, w, c, dtype='uint8'
+ """
+ if 'sim_transfer_cube' in task_name:
+ xml_path = os.path.join(XML_DIR, f'bimanual_viperx_ee_transfer_cube.xml')
+ physics = mujoco.Physics.from_xml_path(xml_path)
+ task = TransferCubeEETask(random=False)
+ env = control.Environment(physics, task, time_limit=20, control_timestep=DT,
+ n_sub_steps=None, flat_observation=False)
+ elif 'sim_insertion' in task_name:
+ xml_path = os.path.join(XML_DIR, f'bimanual_viperx_ee_insertion.xml')
+ physics = mujoco.Physics.from_xml_path(xml_path)
+ task = InsertionEETask(random=False)
+ env = control.Environment(physics, task, time_limit=20, control_timestep=DT,
+ n_sub_steps=None, flat_observation=False)
+ else:
+ raise NotImplementedError
+ return env
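+
+# Usage sketch (illustrative, not part of the original pipeline): the action
+# expected by these environments is a 16-D vector
+#   [left ee pos (3), left ee quat (4), left gripper (1),
+#    right ee pos (3), right ee quat (4), right gripper (1)],
+# where gripper values are normalized (1.0 = open, 0.0 = closed), e.g.
+#   env = make_ee_sim_env('sim_transfer_cube_scripted')
+#   ts = env.reset()
+#   action = np.concatenate([ts.observation['mocap_pose_left'], [1.0],
+#                            ts.observation['mocap_pose_right'], [1.0]])
+#   ts = env.step(action)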
+
+class BimanualViperXEETask(base.Task):
+ def __init__(self, random=None):
+ super().__init__(random=random)
+
+ def before_step(self, action, physics):
+ a_len = len(action) // 2
+ action_left = action[:a_len]
+ action_right = action[a_len:]
+
+ # set mocap position and quat
+ # left
+ np.copyto(physics.data.mocap_pos[0], action_left[:3])
+ np.copyto(physics.data.mocap_quat[0], action_left[3:7])
+ # right
+ np.copyto(physics.data.mocap_pos[1], action_right[:3])
+ np.copyto(physics.data.mocap_quat[1], action_right[3:7])
+
+ # set gripper
+ g_left_ctrl = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(action_left[7])
+ g_right_ctrl = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(action_right[7])
+ np.copyto(physics.data.ctrl, np.array([g_left_ctrl, -g_left_ctrl, g_right_ctrl, -g_right_ctrl]))
+
+ def initialize_robots(self, physics):
+ # reset joint position
+ physics.named.data.qpos[:16] = START_ARM_POSE
+
+ # reset mocap to align with end effector
+ # to obtain these numbers:
+ # (1) make an ee_sim env and reset to the same start_pose
+ # (2) get env._physics.named.data.xpos['vx300s_left/gripper_link']
+ # get env._physics.named.data.xquat['vx300s_left/gripper_link']
+ # repeat the same for right side
+ np.copyto(physics.data.mocap_pos[0], [-0.31718881+0.1, 0.5, 0.29525084])
+ np.copyto(physics.data.mocap_quat[0], [1, 0, 0, 0])
+ # right
+ np.copyto(physics.data.mocap_pos[1], np.array([0.31718881-0.1, 0.49999888, 0.29525084]))
+ np.copyto(physics.data.mocap_quat[1], [1, 0, 0, 0])
+
+ # reset gripper control
+ close_gripper_control = np.array([
+ PUPPET_GRIPPER_POSITION_CLOSE,
+ -PUPPET_GRIPPER_POSITION_CLOSE,
+ PUPPET_GRIPPER_POSITION_CLOSE,
+ -PUPPET_GRIPPER_POSITION_CLOSE,
+ ])
+ np.copyto(physics.data.ctrl, close_gripper_control)
+
+ def initialize_episode(self, physics):
+ """Sets the state of the environment at the start of each episode."""
+ super().initialize_episode(physics)
+
+ @staticmethod
+ def get_qpos(physics):
+ qpos_raw = physics.data.qpos.copy()
+ left_qpos_raw = qpos_raw[:8]
+ right_qpos_raw = qpos_raw[8:16]
+ left_arm_qpos = left_qpos_raw[:6]
+ right_arm_qpos = right_qpos_raw[:6]
+ left_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(left_qpos_raw[6])]
+ right_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(right_qpos_raw[6])]
+ return np.concatenate([left_arm_qpos, left_gripper_qpos, right_arm_qpos, right_gripper_qpos])
+
+ @staticmethod
+ def get_qvel(physics):
+ qvel_raw = physics.data.qvel.copy()
+ left_qvel_raw = qvel_raw[:8]
+ right_qvel_raw = qvel_raw[8:16]
+ left_arm_qvel = left_qvel_raw[:6]
+ right_arm_qvel = right_qvel_raw[:6]
+ left_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(left_qvel_raw[6])]
+ right_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(right_qvel_raw[6])]
+ return np.concatenate([left_arm_qvel, left_gripper_qvel, right_arm_qvel, right_gripper_qvel])
+
+ @staticmethod
+ def get_env_state(physics):
+ raise NotImplementedError
+
+ def get_observation(self, physics):
+ # note: it is important to do .copy()
+ obs = collections.OrderedDict()
+ obs['qpos'] = self.get_qpos(physics)
+ obs['qvel'] = self.get_qvel(physics)
+ obs['env_state'] = self.get_env_state(physics)
+ obs['images'] = dict()
+ obs['images']['top'] = physics.render(height=480, width=640, camera_id='top')
+ # obs['images']['angle'] = physics.render(height=480, width=640, camera_id='angle')
+ # obs['images']['vis'] = physics.render(height=480, width=640, camera_id='front_close')
+ # used in scripted policy to obtain starting pose
+ obs['mocap_pose_left'] = np.concatenate([physics.data.mocap_pos[0], physics.data.mocap_quat[0]]).copy()
+ obs['mocap_pose_right'] = np.concatenate([physics.data.mocap_pos[1], physics.data.mocap_quat[1]]).copy()
+
+ # used when replaying joint trajectory
+ obs['gripper_ctrl'] = physics.data.ctrl.copy()
+ return obs
+
+ def get_reward(self, physics):
+ raise NotImplementedError
+
+
+class TransferCubeEETask(BimanualViperXEETask):
+ def __init__(self, random=None):
+ super().__init__(random=random)
+ self.max_reward = 4
+
+ def initialize_episode(self, physics):
+ """Sets the state of the environment at the start of each episode."""
+ self.initialize_robots(physics)
+ # randomize box position
+ cube_pose = sample_box_pose()
+ box_start_idx = physics.model.name2id('red_box_joint', 'joint')
+ np.copyto(physics.data.qpos[box_start_idx : box_start_idx + 7], cube_pose)
+ # print(f"randomized cube position to {cube_position}")
+
+ super().initialize_episode(physics)
+
+ @staticmethod
+ def get_env_state(physics):
+ env_state = physics.data.qpos.copy()[16:]
+ return env_state
+
+ def get_reward(self, physics):
+ # return whether left gripper is holding the box
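+ # reward stages below: 1 = right gripper touches the cube, 2 = cube lifted off
+ # the table, 3 = left gripper touches the cube (transfer attempted),
+ # 4 = left gripper holds the cube off the table (successful transfer)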
+ all_contact_pairs = []
+ for i_contact in range(physics.data.ncon):
+ id_geom_1 = physics.data.contact[i_contact].geom1
+ id_geom_2 = physics.data.contact[i_contact].geom2
+ name_geom_1 = physics.model.id2name(id_geom_1, 'geom')
+ name_geom_2 = physics.model.id2name(id_geom_2, 'geom')
+ contact_pair = (name_geom_1, name_geom_2)
+ all_contact_pairs.append(contact_pair)
+
+ touch_left_gripper = ("red_box", "vx300s_left/10_left_gripper_finger") in all_contact_pairs
+ touch_right_gripper = ("red_box", "vx300s_right/10_right_gripper_finger") in all_contact_pairs
+ touch_table = ("red_box", "table") in all_contact_pairs
+
+ reward = 0
+ if touch_right_gripper:
+ reward = 1
+ if touch_right_gripper and not touch_table: # lifted
+ reward = 2
+ if touch_left_gripper: # attempted transfer
+ reward = 3
+ if touch_left_gripper and not touch_table: # successful transfer
+ reward = 4
+ return reward
+
+
+class InsertionEETask(BimanualViperXEETask):
+ def __init__(self, random=None):
+ super().__init__(random=random)
+ self.max_reward = 4
+
+ def initialize_episode(self, physics):
+ """Sets the state of the environment at the start of each episode."""
+ self.initialize_robots(physics)
+ # randomize peg and socket position
+ peg_pose, socket_pose = sample_insertion_pose()
+ id2index = lambda j_id: 16 + (j_id - 16) * 7 # first 16 is robot qpos, 7 is pose dim # hacky
+
+ peg_start_id = physics.model.name2id('red_peg_joint', 'joint')
+ peg_start_idx = id2index(peg_start_id)
+ np.copyto(physics.data.qpos[peg_start_idx : peg_start_idx + 7], peg_pose)
+ # print(f"randomized cube position to {cube_position}")
+
+ socket_start_id = physics.model.name2id('blue_socket_joint', 'joint')
+ socket_start_idx = id2index(socket_start_id)
+ np.copyto(physics.data.qpos[socket_start_idx : socket_start_idx + 7], socket_pose)
+ # print(f"randomized cube position to {cube_position}")
+
+ super().initialize_episode(physics)
+
+ @staticmethod
+ def get_env_state(physics):
+ env_state = physics.data.qpos.copy()[16:]
+ return env_state
+
+ def get_reward(self, physics):
+ # return whether peg touches the pin
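+ # reward stages below: 1 = both grippers touch their objects, 2 = peg and
+ # socket both lifted off the table, 3 = peg touching the socket while lifted,
+ # 4 = peg touches the pin inside the socket (successful insertion)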
+ all_contact_pairs = []
+ for i_contact in range(physics.data.ncon):
+ id_geom_1 = physics.data.contact[i_contact].geom1
+ id_geom_2 = physics.data.contact[i_contact].geom2
+ name_geom_1 = physics.model.id2name(id_geom_1, 'geom')
+ name_geom_2 = physics.model.id2name(id_geom_2, 'geom')
+ contact_pair = (name_geom_1, name_geom_2)
+ all_contact_pairs.append(contact_pair)
+
+ touch_right_gripper = ("red_peg", "vx300s_right/10_right_gripper_finger") in all_contact_pairs
+ touch_left_gripper = ("socket-1", "vx300s_left/10_left_gripper_finger") in all_contact_pairs or \
+ ("socket-2", "vx300s_left/10_left_gripper_finger") in all_contact_pairs or \
+ ("socket-3", "vx300s_left/10_left_gripper_finger") in all_contact_pairs or \
+ ("socket-4", "vx300s_left/10_left_gripper_finger") in all_contact_pairs
+
+ peg_touch_table = ("red_peg", "table") in all_contact_pairs
+ socket_touch_table = ("socket-1", "table") in all_contact_pairs or \
+ ("socket-2", "table") in all_contact_pairs or \
+ ("socket-3", "table") in all_contact_pairs or \
+ ("socket-4", "table") in all_contact_pairs
+ peg_touch_socket = ("red_peg", "socket-1") in all_contact_pairs or \
+ ("red_peg", "socket-2") in all_contact_pairs or \
+ ("red_peg", "socket-3") in all_contact_pairs or \
+ ("red_peg", "socket-4") in all_contact_pairs
+ pin_touched = ("red_peg", "pin") in all_contact_pairs
+
+ reward = 0
+ if touch_left_gripper and touch_right_gripper: # touch both
+ reward = 1
+ if touch_left_gripper and touch_right_gripper and (not peg_touch_table) and (not socket_touch_table): # grasp both
+ reward = 2
+ if peg_touch_socket and (not peg_touch_table) and (not socket_touch_table): # peg and socket touching
+ reward = 3
+ if pin_touched: # successful insertion
+ reward = 4
+ return reward
diff --git a/docs/src/imitate_episodes.py b/docs/src/imitate_episodes.py
new file mode 100644
index 00000000..3f2fa450
--- /dev/null
+++ b/docs/src/imitate_episodes.py
@@ -0,0 +1,666 @@
+import torch
+import numpy as np
+import os
+import pickle
+import argparse
+import matplotlib.pyplot as plt
+from copy import deepcopy
+from itertools import repeat
+from tqdm import tqdm
+from einops import rearrange
+import wandb
+import time
+from torchvision import transforms
+
+from constants import FPS
+from constants import PUPPET_GRIPPER_JOINT_OPEN
+from utils import load_data # data functions
+from utils import sample_box_pose, sample_insertion_pose # robot functions
+from utils import compute_dict_mean, set_seed, detach_dict, calibrate_linear_vel, postprocess_base_action # helper functions
+from policy import ACTPolicy, CNNMLPPolicy, DiffusionPolicy
+from visualize_episodes import save_videos
+
+from detr.models.latent_model import Latent_Model_Transformer
+
+from sim_env import BOX_POSE
+
+import IPython
+e = IPython.embed
+
+def get_auto_index(dataset_dir):
+ max_idx = 1000
+ for i in range(max_idx+1):
+ if not os.path.isfile(os.path.join(dataset_dir, f'qpos_{i}.npy')):
+ return i
+ raise Exception(f"Error getting auto index, or more than {max_idx} episodes")
+
+def main(args):
+ set_seed(1)
+ # command line parameters
+ is_eval = args['eval']
+ ckpt_dir = args['ckpt_dir']
+ policy_class = args['policy_class']
+ onscreen_render = args['onscreen_render']
+ task_name = args['task_name']
+ batch_size_train = args['batch_size']
+ batch_size_val = args['batch_size']
+ num_steps = args['num_steps']
+ eval_every = args['eval_every']
+ validate_every = args['validate_every']
+ save_every = args['save_every']
+ resume_ckpt_path = args['resume_ckpt_path']
+
+ # get task parameters
+ is_sim = task_name[:4] == 'sim_'
+ if is_sim or task_name == 'all':
+ from constants import SIM_TASK_CONFIGS
+ task_config = SIM_TASK_CONFIGS[task_name]
+ else:
+ from aloha_scripts.constants import TASK_CONFIGS
+ task_config = TASK_CONFIGS[task_name]
+ dataset_dir = task_config['dataset_dir']
+ # num_episodes = task_config['num_episodes']
+ episode_len = task_config['episode_len']
+ camera_names = task_config['camera_names']
+ stats_dir = task_config.get('stats_dir', None)
+ sample_weights = task_config.get('sample_weights', None)
+ train_ratio = task_config.get('train_ratio', 0.99)
+ name_filter = task_config.get('name_filter', lambda n: True)
+
+ # fixed parameters
+ state_dim = 14
+ lr_backbone = 1e-5
+ backbone = 'resnet18'
+ if policy_class == 'ACT':
+ enc_layers = 4
+ dec_layers = 7
+ nheads = 8
+ policy_config = {'lr': args['lr'],
+ 'num_queries': args['chunk_size'],
+ 'kl_weight': args['kl_weight'],
+ 'hidden_dim': args['hidden_dim'],
+ 'dim_feedforward': args['dim_feedforward'],
+ 'lr_backbone': lr_backbone,
+ 'backbone': backbone,
+ 'enc_layers': enc_layers,
+ 'dec_layers': dec_layers,
+ 'nheads': nheads,
+ 'camera_names': camera_names,
+ 'vq': args['use_vq'],
+ 'vq_class': args['vq_class'],
+ 'vq_dim': args['vq_dim'],
+ 'action_dim': 16,
+ 'no_encoder': args['no_encoder'],
+ }
+ elif policy_class == 'Diffusion':
+
+ policy_config = {'lr': args['lr'],
+ 'camera_names': camera_names,
+ 'action_dim': 16,
+ 'observation_horizon': 1,
+ 'action_horizon': 8,
+ 'prediction_horizon': args['chunk_size'],
+ 'num_queries': args['chunk_size'],
+ 'num_inference_timesteps': 10,
+ 'ema_power': 0.75,
+ 'vq': False,
+ }
+ elif policy_class == 'CNNMLP':
+ policy_config = {'lr': args['lr'], 'lr_backbone': lr_backbone, 'backbone' : backbone, 'num_queries': 1,
+ 'camera_names': camera_names,}
+ else:
+ raise NotImplementedError
+
+ actuator_config = {
+ 'actuator_network_dir': args['actuator_network_dir'],
+ 'history_len': args['history_len'],
+ 'future_len': args['future_len'],
+ 'prediction_len': args['prediction_len'],
+ }
+
+ config = {
+ 'num_steps': num_steps,
+ 'eval_every': eval_every,
+ 'validate_every': validate_every,
+ 'save_every': save_every,
+ 'ckpt_dir': ckpt_dir,
+ 'resume_ckpt_path': resume_ckpt_path,
+ 'episode_len': episode_len,
+ 'state_dim': state_dim,
+ 'lr': args['lr'],
+ 'policy_class': policy_class,
+ 'onscreen_render': onscreen_render,
+ 'policy_config': policy_config,
+ 'task_name': task_name,
+ 'seed': args['seed'],
+ 'temporal_agg': args['temporal_agg'],
+ 'camera_names': camera_names,
+ 'real_robot': not is_sim,
+ 'load_pretrain': args['load_pretrain'],
+ 'actuator_config': actuator_config,
+ }
+
+ if not os.path.isdir(ckpt_dir):
+ os.makedirs(ckpt_dir)
+ config_path = os.path.join(ckpt_dir, 'config.pkl')
+ expr_name = ckpt_dir.split('/')[-1]
+ if not is_eval:
+ wandb.init(project="mobile-aloha2", reinit=True, entity="mobile-aloha2", name=expr_name)
+ wandb.config.update(config)
+ with open(config_path, 'wb') as f:
+ pickle.dump(config, f)
+ if is_eval:
+ ckpt_names = [f'policy_last.ckpt']
+ results = []
+ for ckpt_name in ckpt_names:
+ success_rate, avg_return = eval_bc(config, ckpt_name, save_episode=True, num_rollouts=10)
+ # wandb.log({'success_rate': success_rate, 'avg_return': avg_return})
+ results.append([ckpt_name, success_rate, avg_return])
+
+ for ckpt_name, success_rate, avg_return in results:
+ print(f'{ckpt_name}: {success_rate=} {avg_return=}')
+ print()
+ exit()
+
+ train_dataloader, val_dataloader, stats, _ = load_data(dataset_dir, name_filter, camera_names, batch_size_train, batch_size_val, args['chunk_size'], args['skip_mirrored_data'], config['load_pretrain'], policy_class, stats_dir_l=stats_dir, sample_weights=sample_weights, train_ratio=train_ratio)
+
+ # save dataset stats
+ stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')
+ with open(stats_path, 'wb') as f:
+ pickle.dump(stats, f)
+
+ best_ckpt_info = train_bc(train_dataloader, val_dataloader, config)
+ best_step, min_val_loss, best_state_dict = best_ckpt_info
+
+ # save best checkpoint
+ ckpt_path = os.path.join(ckpt_dir, f'policy_best.ckpt')
+ torch.save(best_state_dict, ckpt_path)
+ print(f'Best ckpt, val loss {min_val_loss:.6f} @ step{best_step}')
+ wandb.finish()
+
+
+def make_policy(policy_class, policy_config):
+ if policy_class == 'ACT':
+ policy = ACTPolicy(policy_config)
+ elif policy_class == 'CNNMLP':
+ policy = CNNMLPPolicy(policy_config)
+ elif policy_class == 'Diffusion':
+ policy = DiffusionPolicy(policy_config)
+ else:
+ raise NotImplementedError
+ return policy
+
+
+def make_optimizer(policy_class, policy):
+ if policy_class == 'ACT':
+ optimizer = policy.configure_optimizers()
+ elif policy_class == 'CNNMLP':
+ optimizer = policy.configure_optimizers()
+ elif policy_class == 'Diffusion':
+ optimizer = policy.configure_optimizers()
+ else:
+ raise NotImplementedError
+ return optimizer
+
+
+def get_image(ts, camera_names, rand_crop_resize=False):
+ curr_images = []
+ for cam_name in camera_names:
+ curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')
+ curr_images.append(curr_image)
+ curr_image = np.stack(curr_images, axis=0)
+ curr_image = torch.from_numpy(curr_image / 255.0).float().cuda().unsqueeze(0)
+
+ if rand_crop_resize:
+ print('rand crop resize is used!')
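+ # note: despite the name, this is a deterministic center crop to 95% of the
+ # image, resized back to the original resolution (enabled only for the
+ # Diffusion policy, see the call in eval_bc)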
+ original_size = curr_image.shape[-2:]
+ ratio = 0.95
+ curr_image = curr_image[..., int(original_size[0] * (1 - ratio) / 2): int(original_size[0] * (1 + ratio) / 2),
+ int(original_size[1] * (1 - ratio) / 2): int(original_size[1] * (1 + ratio) / 2)]
+ curr_image = curr_image.squeeze(0)
+ resize_transform = transforms.Resize(original_size, antialias=True)
+ curr_image = resize_transform(curr_image)
+ curr_image = curr_image.unsqueeze(0)
+
+ return curr_image
+
+
+def eval_bc(config, ckpt_name, save_episode=True, num_rollouts=50):
+ set_seed(1000)
+ ckpt_dir = config['ckpt_dir']
+ state_dim = config['state_dim']
+ real_robot = config['real_robot']
+ policy_class = config['policy_class']
+ onscreen_render = config['onscreen_render']
+ policy_config = config['policy_config']
+ camera_names = config['camera_names']
+ max_timesteps = config['episode_len']
+ task_name = config['task_name']
+ temporal_agg = config['temporal_agg']
+ onscreen_cam = 'angle'
+ vq = config['policy_config']['vq']
+ actuator_config = config['actuator_config']
+ use_actuator_net = actuator_config['actuator_network_dir'] is not None
+
+ # load policy and stats
+ ckpt_path = os.path.join(ckpt_dir, ckpt_name)
+ policy = make_policy(policy_class, policy_config)
+ loading_status = policy.deserialize(torch.load(ckpt_path))
+ print(loading_status)
+ policy.cuda()
+ policy.eval()
+ if vq:
+ vq_dim = config['policy_config']['vq_dim']
+ vq_class = config['policy_config']['vq_class']
+ latent_model = Latent_Model_Transformer(vq_dim, vq_dim, vq_class)
+ latent_model_ckpt_path = os.path.join(ckpt_dir, 'latent_model_last.ckpt')
+ latent_model.deserialize(torch.load(latent_model_ckpt_path))
+ latent_model.eval()
+ latent_model.cuda()
+ print(f'Loaded policy from: {ckpt_path}, latent model from: {latent_model_ckpt_path}')
+ else:
+ print(f'Loaded: {ckpt_path}')
+ stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')
+ with open(stats_path, 'rb') as f:
+ stats = pickle.load(f)
+ # if use_actuator_net:
+ # prediction_len = actuator_config['prediction_len']
+ # future_len = actuator_config['future_len']
+ # history_len = actuator_config['history_len']
+ # actuator_network_dir = actuator_config['actuator_network_dir']
+
+ # from act.train_actuator_network import ActuatorNetwork
+ # actuator_network = ActuatorNetwork(prediction_len)
+ # actuator_network_path = os.path.join(actuator_network_dir, 'actuator_net_last.ckpt')
+ # loading_status = actuator_network.load_state_dict(torch.load(actuator_network_path))
+ # actuator_network.eval()
+ # actuator_network.cuda()
+ # print(f'Loaded actuator network from: {actuator_network_path}, {loading_status}')
+
+ # actuator_stats_path = os.path.join(actuator_network_dir, 'actuator_net_stats.pkl')
+ # with open(actuator_stats_path, 'rb') as f:
+ # actuator_stats = pickle.load(f)
+
+ # actuator_unnorm = lambda x: x * actuator_stats['commanded_speed_std'] + actuator_stats['commanded_speed_std']
+ # actuator_norm = lambda x: (x - actuator_stats['observed_speed_mean']) / actuator_stats['observed_speed_mean']
+ # def collect_base_action(all_actions, norm_episode_all_base_actions):
+ # post_processed_actions = post_process(all_actions.squeeze(0).cpu().numpy())
+ # norm_episode_all_base_actions += actuator_norm(post_processed_actions[:, -2:]).tolist()
+
+ pre_process = lambda s_qpos: (s_qpos - stats['qpos_mean']) / stats['qpos_std']
+ if policy_class == 'Diffusion':
+ post_process = lambda a: ((a + 1) / 2) * (stats['action_max'] - stats['action_min']) + stats['action_min']
+ else:
+ post_process = lambda a: a * stats['action_std'] + stats['action_mean']
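+ # Diffusion outputs are assumed to live in [-1, 1] (min-max normalized actions),
+ # while the other policies predict z-score normalized actions, hence the two
+ # different un-normalization functions above.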
+
+ # load environment
+ if real_robot:
+ from aloha_scripts.robot_utils import move_grippers # requires aloha
+ from aloha_scripts.real_env import make_real_env # requires aloha
+ env = make_real_env(init_node=True, setup_robots=True, setup_base=True)
+ env_max_reward = 0
+ else:
+ from sim_env import make_sim_env
+ env = make_sim_env(task_name)
+ env_max_reward = env.task.max_reward
+
+ query_frequency = policy_config['num_queries']
+ if temporal_agg:
+ query_frequency = 1
+ num_queries = policy_config['num_queries']
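+ # without temporal aggregation the policy is re-queried once every num_queries
+ # steps and its chunk is executed open-loop; with temporal aggregation it is
+ # queried every step and overlapping predictions are ensembled further below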
+ if real_robot:
+ BASE_DELAY = 13
+ query_frequency -= BASE_DELAY
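+ # BASE_DELAY pairs the arm targets predicted for step t with the base targets
+ # (last 2 action dims) predicted for step t + BASE_DELAY (see the torch.cat
+ # below), presumably to compensate for mobile-base actuation latency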
+
+ max_timesteps = int(max_timesteps * 1) # may increase for real-world tasks
+
+ episode_returns = []
+ highest_rewards = []
+ for rollout_id in range(num_rollouts):
+ if real_robot:
+ e()
+ rollout_id += 0
+ ### set task
+ if 'sim_transfer_cube' in task_name:
+ BOX_POSE[0] = sample_box_pose() # used in sim reset
+ elif 'sim_insertion' in task_name:
+ BOX_POSE[0] = np.concatenate(sample_insertion_pose()) # used in sim reset
+
+ ts = env.reset()
+
+ ### onscreen render
+ if onscreen_render:
+ ax = plt.subplot()
+ plt_img = ax.imshow(env._physics.render(height=480, width=640, camera_id=onscreen_cam))
+ plt.ion()
+
+ ### evaluation loop
+ if temporal_agg:
+ all_time_actions = torch.zeros([max_timesteps, max_timesteps+num_queries, 16]).cuda()
+
+ # qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()
+ qpos_history_raw = np.zeros((max_timesteps, state_dim))
+ image_list = [] # for visualization
+ qpos_list = []
+ target_qpos_list = []
+ rewards = []
+ # if use_actuator_net:
+ # norm_episode_all_base_actions = [actuator_norm(np.zeros(history_len, 2)).tolist()]
+ with torch.inference_mode():
+ time0 = time.time()
+ DT = 1 / FPS
+ culmulated_delay = 0
+ for t in range(max_timesteps):
+ time1 = time.time()
+ ### update onscreen render and wait for DT
+ if onscreen_render:
+ image = env._physics.render(height=480, width=640, camera_id=onscreen_cam)
+ plt_img.set_data(image)
+ plt.pause(DT)
+
+ ### process previous timestep to get qpos and image_list
+ time2 = time.time()
+ obs = ts.observation
+ if 'images' in obs:
+ image_list.append(obs['images'])
+ else:
+ image_list.append({'main': obs['image']})
+ qpos_numpy = np.array(obs['qpos'])
+ qpos_history_raw[t] = qpos_numpy
+ qpos = pre_process(qpos_numpy)
+ qpos = torch.from_numpy(qpos).float().cuda().unsqueeze(0)
+ # qpos_history[:, t] = qpos
+ if t % query_frequency == 0:
+ curr_image = get_image(ts, camera_names, rand_crop_resize=(config['policy_class'] == 'Diffusion'))
+ # print('get image: ', time.time() - time2)
+
+ if t == 0:
+ # warm up
+ for _ in range(10):
+ policy(qpos, curr_image)
+ print('network warm up done')
+ time1 = time.time()
+
+ ### query policy
+ time3 = time.time()
+ if config['policy_class'] == "ACT":
+ if t % query_frequency == 0:
+ if vq:
+ if rollout_id == 0:
+ for _ in range(10):
+ vq_sample = latent_model.generate(1, temperature=1, x=None)
+ print(torch.nonzero(vq_sample[0])[:, 1].cpu().numpy())
+ vq_sample = latent_model.generate(1, temperature=1, x=None)
+ all_actions = policy(qpos, curr_image, vq_sample=vq_sample)
+ else:
+ # e()
+ all_actions = policy(qpos, curr_image)
+ # if use_actuator_net:
+ # collect_base_action(all_actions, norm_episode_all_base_actions)
+ if real_robot:
+ all_actions = torch.cat([all_actions[:, :-BASE_DELAY, :-2], all_actions[:, BASE_DELAY:, -2:]], dim=2)
+ if temporal_agg:
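+ # temporal ensembling (ACT): every previously predicted chunk that covers the
+ # current timestep is combined with exponential weights exp(-k * i), where
+ # i = 0 is the oldest prediction, so older predictions get slightly more weight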
+ all_time_actions[[t], t:t+num_queries] = all_actions
+ actions_for_curr_step = all_time_actions[:, t]
+ actions_populated = torch.all(actions_for_curr_step != 0, axis=1)
+ actions_for_curr_step = actions_for_curr_step[actions_populated]
+ k = 0.01
+ exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))
+ exp_weights = exp_weights / exp_weights.sum()
+ exp_weights = torch.from_numpy(exp_weights).cuda().unsqueeze(dim=1)
+ raw_action = (actions_for_curr_step * exp_weights).sum(dim=0, keepdim=True)
+ else:
+ raw_action = all_actions[:, t % query_frequency]
+ # if t % query_frequency == query_frequency - 1:
+ # # zero out base actions to avoid overshooting
+ # raw_action[0, -2:] = 0
+ elif config['policy_class'] == "Diffusion":
+ if t % query_frequency == 0:
+ all_actions = policy(qpos, curr_image)
+ # if use_actuator_net:
+ # collect_base_action(all_actions, norm_episode_all_base_actions)
+ if real_robot:
+ all_actions = torch.cat([all_actions[:, :-BASE_DELAY, :-2], all_actions[:, BASE_DELAY:, -2:]], dim=2)
+ raw_action = all_actions[:, t % query_frequency]
+ elif config['policy_class'] == "CNNMLP":
+ raw_action = policy(qpos, curr_image)
+ all_actions = raw_action.unsqueeze(0)
+ # if use_actuator_net:
+ # collect_base_action(all_actions, norm_episode_all_base_actions)
+ else:
+ raise NotImplementedError
+ # print('query policy: ', time.time() - time3)
+
+ ### post-process actions
+ time4 = time.time()
+ raw_action = raw_action.squeeze(0).cpu().numpy()
+ action = post_process(raw_action)
+ target_qpos = action[:-2]
+
+ # if use_actuator_net:
+ # assert(not temporal_agg)
+ # if t % prediction_len == 0:
+ # offset_start_ts = t + history_len
+ # actuator_net_in = np.array(norm_episode_all_base_actions[offset_start_ts - history_len: offset_start_ts + future_len])
+ # actuator_net_in = torch.from_numpy(actuator_net_in).float().unsqueeze(dim=0).cuda()
+ # pred = actuator_network(actuator_net_in)
+ # base_action_chunk = actuator_unnorm(pred.detach().cpu().numpy()[0])
+ # base_action = base_action_chunk[t % prediction_len]
+ # else:
+ base_action = action[-2:]
+ # base_action = calibrate_linear_vel(base_action, c=0.19)
+ # base_action = postprocess_base_action(base_action)
+ # print('post process: ', time.time() - time4)
+
+ ### step the environment
+ time5 = time.time()
+ if real_robot:
+ ts = env.step(target_qpos, base_action)
+ else:
+ ts = env.step(target_qpos)
+ # print('step env: ', time.time() - time5)
+
+ ### for visualization
+ qpos_list.append(qpos_numpy)
+ target_qpos_list.append(target_qpos)
+ rewards.append(ts.reward)
+ duration = time.time() - time1
+ sleep_time = max(0, DT - duration)
+ # print(sleep_time)
+ time.sleep(sleep_time)
+ # time.sleep(max(0, DT - duration - culmulated_delay))
+ if duration >= DT:
+ culmulated_delay += (duration - DT)
+ print(f'Warning: step duration: {duration:.3f} s at step {t} longer than DT: {DT} s, cumulative delay: {culmulated_delay:.3f} s')
+ # else:
+ # culmulated_delay = max(0, culmulated_delay - (DT - duration))
+
+ print(f'Avg fps {max_timesteps / (time.time() - time0)}')
+ plt.close()
+ if real_robot:
+ move_grippers([env.puppet_bot_left, env.puppet_bot_right], [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=0.5) # open
+ # save qpos_history_raw
+ log_id = get_auto_index(ckpt_dir)
+ np.save(os.path.join(ckpt_dir, f'qpos_{log_id}.npy'), qpos_history_raw)
+ plt.figure(figsize=(10, 20))
+ # plot qpos_history_raw for each qpos dim using subplots
+ for i in range(state_dim):
+ plt.subplot(state_dim, 1, i+1)
+ plt.plot(qpos_history_raw[:, i])
+ # remove x axis
+ if i != state_dim - 1:
+ plt.xticks([])
+ plt.tight_layout()
+ plt.savefig(os.path.join(ckpt_dir, f'qpos_{log_id}.png'))
+ plt.close()
+
+
+ rewards = np.array(rewards)
+ episode_return = np.sum(rewards[rewards!=None])
+ episode_returns.append(episode_return)
+ episode_highest_reward = np.max(rewards)
+ highest_rewards.append(episode_highest_reward)
+ print(f'Rollout {rollout_id}\n{episode_return=}, {episode_highest_reward=}, {env_max_reward=}, Success: {episode_highest_reward==env_max_reward}')
+
+ # if save_episode:
+ # save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))
+
+ success_rate = np.mean(np.array(highest_rewards) == env_max_reward)
+ avg_return = np.mean(episode_returns)
+ summary_str = f'\nSuccess rate: {success_rate}\nAverage return: {avg_return}\n\n'
+ for r in range(env_max_reward+1):
+ more_or_equal_r = (np.array(highest_rewards) >= r).sum()
+ more_or_equal_r_rate = more_or_equal_r / num_rollouts
+ summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\n'
+
+ print(summary_str)
+
+ # save success rate to txt
+ result_file_name = 'result_' + ckpt_name.split('.')[0] + '.txt'
+ with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:
+ f.write(summary_str)
+ f.write(repr(episode_returns))
+ f.write('\n\n')
+ f.write(repr(highest_rewards))
+
+ return success_rate, avg_return
+
+
+def forward_pass(data, policy):
+ image_data, qpos_data, action_data, is_pad = data
+ image_data, qpos_data, action_data, is_pad = image_data.cuda(), qpos_data.cuda(), action_data.cuda(), is_pad.cuda()
+ return policy(qpos_data, image_data, action_data, is_pad) # TODO remove None
+
+
+def train_bc(train_dataloader, val_dataloader, config):
+ num_steps = config['num_steps']
+ ckpt_dir = config['ckpt_dir']
+ seed = config['seed']
+ policy_class = config['policy_class']
+ policy_config = config['policy_config']
+ eval_every = config['eval_every']
+ validate_every = config['validate_every']
+ save_every = config['save_every']
+
+ set_seed(seed)
+
+ policy = make_policy(policy_class, policy_config)
+ if config['load_pretrain']:
+ loading_status = policy.deserialize(torch.load(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'policy_step_50000_seed_0.ckpt')))
+ print(f'loaded! {loading_status}')
+ if config['resume_ckpt_path'] is not None:
+ loading_status = policy.deserialize(torch.load(config['resume_ckpt_path']))
+ print(f'Resume policy from: {config["resume_ckpt_path"]}, Status: {loading_status}')
+ policy.cuda()
+ optimizer = make_optimizer(policy_class, policy)
+
+ min_val_loss = np.inf
+ best_ckpt_info = None
+
+ train_dataloader = repeater(train_dataloader)
+ for step in tqdm(range(num_steps+1)):
+ # validation
+ if step % validate_every == 0:
+ print('validating')
+
+ with torch.inference_mode():
+ policy.eval()
+ validation_dicts = []
+ for batch_idx, data in enumerate(val_dataloader):
+ forward_dict = forward_pass(data, policy)
+ validation_dicts.append(forward_dict)
+ if batch_idx > 50:
+ break
+
+ validation_summary = compute_dict_mean(validation_dicts)
+
+ epoch_val_loss = validation_summary['loss']
+ if epoch_val_loss < min_val_loss:
+ min_val_loss = epoch_val_loss
+ best_ckpt_info = (step, min_val_loss, deepcopy(policy.serialize()))
+ for k in list(validation_summary.keys()):
+ validation_summary[f'val_{k}'] = validation_summary.pop(k)
+ wandb.log(validation_summary, step=step)
+ print(f'Val loss: {epoch_val_loss:.5f}')
+ summary_string = ''
+ for k, v in validation_summary.items():
+ summary_string += f'{k}: {v.item():.3f} '
+ print(summary_string)
+
+ # evaluation
+ if (step > 0) and (step % eval_every == 0):
+ # first save then eval
+ ckpt_name = f'policy_step_{step}_seed_{seed}.ckpt'
+ ckpt_path = os.path.join(ckpt_dir, ckpt_name)
+ torch.save(policy.serialize(), ckpt_path)
+ success, _ = eval_bc(config, ckpt_name, save_episode=True, num_rollouts=10)
+ wandb.log({'success': success}, step=step)
+
+ # training
+ policy.train()
+ optimizer.zero_grad()
+ data = next(train_dataloader)
+ forward_dict = forward_pass(data, policy)
+ # backward
+ loss = forward_dict['loss']
+ loss.backward()
+ optimizer.step()
+ wandb.log(forward_dict, step=step) # not great, make training 1-2% slower
+
+ if step % save_every == 0:
+ ckpt_path = os.path.join(ckpt_dir, f'policy_step_{step}_seed_{seed}.ckpt')
+ torch.save(policy.serialize(), ckpt_path)
+
+ ckpt_path = os.path.join(ckpt_dir, f'policy_last.ckpt')
+ torch.save(policy.serialize(), ckpt_path)
+
+ best_step, min_val_loss, best_state_dict = best_ckpt_info
+ ckpt_path = os.path.join(ckpt_dir, f'policy_step_{best_step}_seed_{seed}.ckpt')
+ torch.save(best_state_dict, ckpt_path)
+ print(f'Training finished:\nSeed {seed}, val loss {min_val_loss:.6f} at step {best_step}')
+
+ return best_ckpt_info
+
+def repeater(data_loader):
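+ # cycles the dataloader indefinitely so train_bc can iterate by optimization
+ # step rather than by epoch; logs a message at every epoch boundary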
+ epoch = 0
+ for loader in repeat(data_loader):
+ for data in loader:
+ yield data
+ print(f'Epoch {epoch} done')
+ epoch += 1
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--eval', action='store_true')
+ parser.add_argument('--onscreen_render', action='store_true')
+ parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)
+ parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)
+ parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)
+ parser.add_argument('--batch_size', action='store', type=int, help='batch_size', required=True)
+ parser.add_argument('--seed', action='store', type=int, help='seed', required=True)
+ parser.add_argument('--num_steps', action='store', type=int, help='num_steps', required=True)
+ parser.add_argument('--lr', action='store', type=float, help='lr', required=True)
+ parser.add_argument('--load_pretrain', action='store_true', default=False)
+ parser.add_argument('--eval_every', action='store', type=int, default=500, help='eval_every', required=False)
+ parser.add_argument('--validate_every', action='store', type=int, default=500, help='validate_every', required=False)
+ parser.add_argument('--save_every', action='store', type=int, default=500, help='save_every', required=False)
+ parser.add_argument('--resume_ckpt_path', action='store', type=str, help='resume_ckpt_path', required=False)
+ parser.add_argument('--skip_mirrored_data', action='store_true')
+ parser.add_argument('--actuator_network_dir', action='store', type=str, help='actuator_network_dir', required=False)
+ parser.add_argument('--history_len', action='store', type=int)
+ parser.add_argument('--future_len', action='store', type=int)
+ parser.add_argument('--prediction_len', action='store', type=int)
+
+ # for ACT
+ parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)
+ parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)
+ parser.add_argument('--hidden_dim', action='store', type=int, help='hidden_dim', required=False)
+ parser.add_argument('--dim_feedforward', action='store', type=int, help='dim_feedforward', required=False)
+ parser.add_argument('--temporal_agg', action='store_true')
+ parser.add_argument('--use_vq', action='store_true')
+ parser.add_argument('--vq_class', action='store', type=int, help='vq_class')
+ parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim')
+ parser.add_argument('--no_encoder', action='store_true')
+
+ main(vars(parser.parse_args()))
diff --git a/docs/src/policy.py b/docs/src/policy.py
new file mode 100644
index 00000000..4e4a0aa9
--- /dev/null
+++ b/docs/src/policy.py
@@ -0,0 +1,295 @@
+import torch.nn as nn
+from torch.nn import functional as F
+import torchvision.transforms as transforms
+import torch
+import numpy as np
+from detr.main import build_ACT_model_and_optimizer, build_CNNMLP_model_and_optimizer
+import IPython
+e = IPython.embed
+
+from collections import OrderedDict
+from robomimic.models.base_nets import ResNet18Conv, SpatialSoftmax
+from robomimic.algo.diffusion_policy import replace_bn_with_gn, ConditionalUnet1D
+
+
+from diffusers.schedulers.scheduling_ddpm import DDPMScheduler
+from diffusers.schedulers.scheduling_ddim import DDIMScheduler
+from diffusers.training_utils import EMAModel
+
+
+class DiffusionPolicy(nn.Module):
+ def __init__(self, args_override):
+ super().__init__()
+
+ self.camera_names = args_override['camera_names']
+
+ self.observation_horizon = args_override['observation_horizon'] ### TODO TODO TODO DO THIS
+ self.action_horizon = args_override['action_horizon'] # apply chunk size
+ self.prediction_horizon = args_override['prediction_horizon'] # chunk size
+ self.num_inference_timesteps = args_override['num_inference_timesteps']
+ self.ema_power = args_override['ema_power']
+ self.lr = args_override['lr']
+ self.weight_decay = 0
+
+ self.num_kp = 32
+ self.feature_dimension = 64
+ self.ac_dim = args_override['action_dim'] # 14 + 2
+ self.obs_dim = self.feature_dimension * len(self.camera_names) + 14 # camera features and proprio
+
+ backbones = []
+ pools = []
+ linears = []
+ for _ in self.camera_names:
+ backbones.append(ResNet18Conv(**{'input_channel': 3, 'pretrained': False, 'input_coord_conv': False}))
+ pools.append(SpatialSoftmax(**{'input_shape': [512, 15, 20], 'num_kp': self.num_kp, 'temperature': 1.0, 'learnable_temperature': False, 'noise_std': 0.0}))
+ linears.append(torch.nn.Linear(int(np.prod([self.num_kp, 2])), self.feature_dimension))
+ backbones = nn.ModuleList(backbones)
+ pools = nn.ModuleList(pools)
+ linears = nn.ModuleList(linears)
+
+ backbones = replace_bn_with_gn(backbones) # TODO
+
+
+ noise_pred_net = ConditionalUnet1D(
+ input_dim=self.ac_dim,
+ global_cond_dim=self.obs_dim*self.observation_horizon
+ )
+
+ nets = nn.ModuleDict({
+ 'policy': nn.ModuleDict({
+ 'backbones': backbones,
+ 'pools': pools,
+ 'linears': linears,
+ 'noise_pred_net': noise_pred_net
+ })
+ })
+
+ nets = nets.float().cuda()
+ ENABLE_EMA = True
+ if ENABLE_EMA:
+ ema = EMAModel(model=nets, power=self.ema_power)
+ else:
+ ema = None
+ self.nets = nets
+ self.ema = ema
+
+ # setup noise scheduler
+ self.noise_scheduler = DDIMScheduler(
+ num_train_timesteps=50,
+ beta_schedule='squaredcos_cap_v2',
+ clip_sample=True,
+ set_alpha_to_one=True,
+ steps_offset=0,
+ prediction_type='epsilon'
+ )
+
+ n_parameters = sum(p.numel() for p in self.parameters())
+ print("number of parameters: %.2fM" % (n_parameters/1e6,))
+
+
+ def configure_optimizers(self):
+ optimizer = torch.optim.AdamW(self.nets.parameters(), lr=self.lr, weight_decay=self.weight_decay)
+ return optimizer
+
+
+ def __call__(self, qpos, image, actions=None, is_pad=None):
+ B = qpos.shape[0]
+ if actions is not None: # training time
+ nets = self.nets
+ all_features = []
+ for cam_id in range(len(self.camera_names)):
+ cam_image = image[:, cam_id]
+ cam_features = nets['policy']['backbones'][cam_id](cam_image)
+ pool_features = nets['policy']['pools'][cam_id](cam_features)
+ pool_features = torch.flatten(pool_features, start_dim=1)
+ out_features = nets['policy']['linears'][cam_id](pool_features)
+ all_features.append(out_features)
+
+ obs_cond = torch.cat(all_features + [qpos], dim=1)
+
+ # sample noise to add to actions
+ noise = torch.randn(actions.shape, device=obs_cond.device)
+
+ # sample a diffusion iteration for each data point
+ timesteps = torch.randint(
+ 0, self.noise_scheduler.config.num_train_timesteps,
+ (B,), device=obs_cond.device
+ ).long()
+
+ # add noise to the clean actions according to the noise magnitude at each diffusion iteration
+ # (this is the forward diffusion process)
+ noisy_actions = self.noise_scheduler.add_noise(
+ actions, noise, timesteps)
+
+ # predict the noise residual
+ noise_pred = nets['policy']['noise_pred_net'](noisy_actions, timesteps, global_cond=obs_cond)
+
+ # L2 loss
+ all_l2 = F.mse_loss(noise_pred, noise, reduction='none')
+ loss = (all_l2 * ~is_pad.unsqueeze(-1)).mean()
+
+ loss_dict = {}
+ loss_dict['l2_loss'] = loss
+ loss_dict['loss'] = loss
+
+ if self.training and self.ema is not None:
+ self.ema.step(nets)
+ return loss_dict
+ else: # inference time
+ To = self.observation_horizon
+ Ta = self.action_horizon
+ Tp = self.prediction_horizon
+ action_dim = self.ac_dim
+
+ nets = self.nets
+ if self.ema is not None:
+ nets = self.ema.averaged_model
+
+ all_features = []
+ for cam_id in range(len(self.camera_names)):
+ cam_image = image[:, cam_id]
+ cam_features = nets['policy']['backbones'][cam_id](cam_image)
+ pool_features = nets['policy']['pools'][cam_id](cam_features)
+ pool_features = torch.flatten(pool_features, start_dim=1)
+ out_features = nets['policy']['linears'][cam_id](pool_features)
+ all_features.append(out_features)
+
+ obs_cond = torch.cat(all_features + [qpos], dim=1)
+
+ # initialize action from Gaussian noise
+ noisy_action = torch.randn(
+ (B, Tp, action_dim), device=obs_cond.device)
+ naction = noisy_action
+
+ # init scheduler
+ self.noise_scheduler.set_timesteps(self.num_inference_timesteps)
+
+ for k in self.noise_scheduler.timesteps:
+ # predict noise
+ noise_pred = nets['policy']['noise_pred_net'](
+ sample=naction,
+ timestep=k,
+ global_cond=obs_cond
+ )
+
+ # inverse diffusion step (remove noise)
+ naction = self.noise_scheduler.step(
+ model_output=noise_pred,
+ timestep=k,
+ sample=naction
+ ).prev_sample
+
+ return naction
+
+ def serialize(self):
+ return {
+ "nets": self.nets.state_dict(),
+ "ema": self.ema.averaged_model.state_dict() if self.ema is not None else None,
+ }
+
+ def deserialize(self, model_dict):
+ status = self.nets.load_state_dict(model_dict["nets"])
+ print('Loaded model')
+ if model_dict.get("ema", None) is not None:
+ print('Loaded EMA')
+ status_ema = self.ema.averaged_model.load_state_dict(model_dict["ema"])
+ status = [status, status_ema]
+ return status
+
+class ACTPolicy(nn.Module):
+ def __init__(self, args_override):
+ super().__init__()
+ model, optimizer = build_ACT_model_and_optimizer(args_override)
+ self.model = model # CVAE decoder
+ self.optimizer = optimizer
+ self.kl_weight = args_override['kl_weight']
+ self.vq = args_override['vq']
+ print(f'KL Weight {self.kl_weight}')
+
+ def __call__(self, qpos, image, actions=None, is_pad=None, vq_sample=None):
+ env_state = None
+ normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
+ std=[0.229, 0.224, 0.225])
+ image = normalize(image)
+ if actions is not None: # training time
+ actions = actions[:, :self.model.num_queries]
+ is_pad = is_pad[:, :self.model.num_queries]
+
+ loss_dict = dict()
+ a_hat, is_pad_hat, (mu, logvar), probs, binaries = self.model(qpos, image, env_state, actions, is_pad, vq_sample)
+ if self.vq or self.model.encoder is None:
+ total_kld = [torch.tensor(0.0)]
+ else:
+ total_kld, dim_wise_kld, mean_kld = kl_divergence(mu, logvar)
+ if self.vq:
+ loss_dict['vq_discrepancy'] = F.l1_loss(probs, binaries, reduction='mean')
+ all_l1 = F.l1_loss(actions, a_hat, reduction='none')
+ l1 = (all_l1 * ~is_pad.unsqueeze(-1)).mean()
+ loss_dict['l1'] = l1
+ loss_dict['kl'] = total_kld[0]
+ loss_dict['loss'] = loss_dict['l1'] + loss_dict['kl'] * self.kl_weight
+ return loss_dict
+ else: # inference time
+ a_hat, _, (_, _), _, _ = self.model(qpos, image, env_state, vq_sample=vq_sample) # no action, sample from prior
+ return a_hat
+
+ def configure_optimizers(self):
+ return self.optimizer
+
+ @torch.no_grad()
+ def vq_encode(self, qpos, actions, is_pad):
+ actions = actions[:, :self.model.num_queries]
+ is_pad = is_pad[:, :self.model.num_queries]
+
+ _, _, binaries, _, _ = self.model.encode(qpos, actions, is_pad)
+
+ return binaries
+
+ def serialize(self):
+ return self.state_dict()
+
+ def deserialize(self, model_dict):
+ return self.load_state_dict(model_dict)
+
+
+class CNNMLPPolicy(nn.Module):
+ def __init__(self, args_override):
+ super().__init__()
+ model, optimizer = build_CNNMLP_model_and_optimizer(args_override)
+ self.model = model # decoder
+ self.optimizer = optimizer
+
+ def __call__(self, qpos, image, actions=None, is_pad=None):
+ env_state = None # TODO
+ normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
+ std=[0.229, 0.224, 0.225])
+ image = normalize(image)
+ if actions is not None: # training time
+ actions = actions[:, 0]
+ a_hat = self.model(qpos, image, env_state, actions)
+ mse = F.mse_loss(actions, a_hat)
+ loss_dict = dict()
+ loss_dict['mse'] = mse
+ loss_dict['loss'] = loss_dict['mse']
+ return loss_dict
+ else: # inference time
+ a_hat = self.model(qpos, image, env_state) # no action, sample from prior
+ return a_hat
+
+ def configure_optimizers(self):
+ return self.optimizer
+
+def kl_divergence(mu, logvar):
+ batch_size = mu.size(0)
+ assert batch_size != 0
+ if mu.data.ndimension() == 4:
+ mu = mu.view(mu.size(0), mu.size(1))
+ if logvar.data.ndimension() == 4:
+ logvar = logvar.view(logvar.size(0), logvar.size(1))
+
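+ # closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ):
+ #   KL = -0.5 * sum_d (1 + logvar_d - mu_d^2 - exp(logvar_d))
+ # summed over latent dims and averaged over the batch below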
+ klds = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp())
+ total_kld = klds.sum(1).mean(0, True)
+ dimension_wise_kld = klds.mean(0)
+ mean_kld = klds.mean(1).mean(0, True)
+
+ return total_kld, dimension_wise_kld, mean_kld
diff --git a/docs/src/postprocess_episodes.py b/docs/src/postprocess_episodes.py
new file mode 100644
index 00000000..4d78f8f6
--- /dev/null
+++ b/docs/src/postprocess_episodes.py
@@ -0,0 +1,175 @@
+import os
+import numpy as np
+import cv2
+import h5py
+import argparse
+import time
+from visualize_episodes import visualize_joints, visualize_timestamp, save_videos
+
+import matplotlib.pyplot as plt
+from constants import DT
+
+import IPython
+e = IPython.embed
+
+JOINT_NAMES = ["waist", "shoulder", "elbow", "forearm_roll", "wrist_angle", "wrist_rotate"]
+STATE_NAMES = JOINT_NAMES + ["gripper"]
+
+MIRROR_STATE_MULTIPLY = np.array([-1, 1, 1, -1, 1, -1, 1]).astype('float32')
+MIRROR_BASE_MULTIPLY = np.array([1, -1]).astype('float32')
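+# Mirroring swaps the two arms and negates the joints whose sign flips under a
+# left-right reflection: waist, forearm_roll and wrist_rotate (see JOINT_NAMES);
+# for the base action, the first (presumably linear) component is kept and the
+# second (presumably angular) component is negated.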
+
+def load_hdf5(dataset_dir, dataset_name):
+ dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')
+ if not os.path.isfile(dataset_path):
+ print(f'Dataset does not exist at \n{dataset_path}\n')
+ exit()
+
+ with h5py.File(dataset_path, 'r') as root:
+ is_sim = root.attrs['sim']
+ compressed = root.attrs.get('compress', False)
+ qpos = root['/observations/qpos'][()]
+ qvel = root['/observations/qvel'][()]
+ action = root['/action'][()]
+ image_dict = dict()
+ for cam_name in root[f'/observations/images/'].keys():
+ image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]
+ if 'base_action' in root.keys():
+ print('base_action exists')
+ base_action = root['/base_action'][()]
+ else:
+ base_action = None
+ if compressed:
+ compress_len = root['/compress_len'][()]
+
+ if compressed:
+ for cam_id, cam_name in enumerate(image_dict.keys()):
+ # un-pad and uncompress
+ padded_compressed_image_list = image_dict[cam_name]
+ image_list = []
+ for padded_compressed_image in padded_compressed_image_list: # [:1000] to save memory
+ image = cv2.imdecode(padded_compressed_image, 1)
+ image_list.append(image)
+ image_dict[cam_name] = np.array(image_list)
+
+ return qpos, qvel, action, base_action, image_dict, is_sim
+
+def main(args):
+ dataset_dir = args['dataset_dir']
+ num_episodes = args['num_episodes']
+
+ start_idx = 0
+ for episode_idx in range(start_idx, start_idx + num_episodes):
+ dataset_name = f'episode_{episode_idx}'
+
+ qpos, qvel, action, base_action, image_dict, is_sim = load_hdf5(dataset_dir, dataset_name)
+
+ # process proprioception
+ qpos = np.concatenate([qpos[:, 7:] * MIRROR_STATE_MULTIPLY, qpos[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)
+ qvel = np.concatenate([qvel[:, 7:] * MIRROR_STATE_MULTIPLY, qvel[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)
+ action = np.concatenate([action[:, 7:] * MIRROR_STATE_MULTIPLY, action[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)
+ if base_action is not None:
+ base_action = base_action * MIRROR_BASE_MULTIPLY
+
+ # mirror image obs
+ if 'left_wrist' in image_dict.keys():
+ image_dict['left_wrist'], image_dict['right_wrist'] = image_dict['right_wrist'][:, :, ::-1], image_dict['left_wrist'][:, :, ::-1]
+ elif 'cam_left_wrist' in image_dict.keys():
+ image_dict['cam_left_wrist'], image_dict['cam_right_wrist'] = image_dict['cam_right_wrist'][:, :, ::-1], image_dict['cam_left_wrist'][:, :, ::-1]
+ else:
+ raise Exception('No left_wrist or cam_left_wrist in image_dict')
+
+ if 'top' in image_dict.keys():
+ image_dict['top'] = image_dict['top'][:, :, ::-1]
+ elif 'cam_high' in image_dict.keys():
+ image_dict['cam_high'] = image_dict['cam_high'][:, :, ::-1]
+ else:
+ raise Exception('No top or cam_high in image_dict')
+
+ # saving
+ data_dict = {
+ '/observations/qpos': qpos,
+ '/observations/qvel': qvel,
+ '/action': action,
+ '/base_action': base_action,
+ } if base_action is not None else {
+ '/observations/qpos': qpos,
+ '/observations/qvel': qvel,
+ '/action': action,
+ }
+ for cam_name in image_dict.keys():
+ data_dict[f'/observations/images/{cam_name}'] = image_dict[cam_name]
+ max_timesteps = len(qpos)
+
+ COMPRESS = True
+
+ if COMPRESS:
+ # JPEG compression
+ t0 = time.time()
+ encode_param = [int(cv2.IMWRITE_JPEG_QUALITY), 50] # tried as low as 20, seems fine
+ compressed_len = []
+ for cam_name in image_dict.keys():
+ image_list = data_dict[f'/observations/images/{cam_name}']
+ compressed_list = []
+ compressed_len.append([])
+ for image in image_list:
+ result, encoded_image = cv2.imencode('.jpg', image, encode_param) # 0.02 sec # cv2.imdecode(encoded_image, 1)
+ compressed_list.append(encoded_image)
+ compressed_len[-1].append(len(encoded_image))
+ data_dict[f'/observations/images/{cam_name}'] = compressed_list
+ print(f'compression: {time.time() - t0:.2f}s')
+
+ # pad so it has same length
+ t0 = time.time()
+ compressed_len = np.array(compressed_len)
+ padded_size = compressed_len.max()
+ for cam_name in image_dict.keys():
+ compressed_image_list = data_dict[f'/observations/images/{cam_name}']
+ padded_compressed_image_list = []
+ for compressed_image in compressed_image_list:
+ padded_compressed_image = np.zeros(padded_size, dtype='uint8')
+ image_len = len(compressed_image)
+ padded_compressed_image[:image_len] = compressed_image
+ padded_compressed_image_list.append(padded_compressed_image)
+ data_dict[f'/observations/images/{cam_name}'] = padded_compressed_image_list
+ print(f'padding: {time.time() - t0:.2f}s')
+
+ # HDF5
+ t0 = time.time()
+ dataset_path = os.path.join(dataset_dir, f'mirror_episode_{episode_idx}')
+ with h5py.File(dataset_path + '.hdf5', 'w', rdcc_nbytes=1024 ** 2 * 2) as root:
+ root.attrs['sim'] = is_sim
+ root.attrs['compress'] = COMPRESS
+ obs = root.create_group('observations')
+ image = obs.create_group('images')
+ for cam_name in image_dict.keys():
+ if COMPRESS:
+ _ = image.create_dataset(cam_name, (max_timesteps, padded_size), dtype='uint8',
+ chunks=(1, padded_size), )
+ else:
+ _ = image.create_dataset(cam_name, (max_timesteps, 480, 640, 3), dtype='uint8',
+ chunks=(1, 480, 640, 3), )
+ qpos = obs.create_dataset('qpos', (max_timesteps, 14))
+ qvel = obs.create_dataset('qvel', (max_timesteps, 14))
+ action = root.create_dataset('action', (max_timesteps, 14))
+ if base_action is not None:
+ base_action = root.create_dataset('base_action', (max_timesteps, 2))
+
+ for name, array in data_dict.items():
+ root[name][...] = array
+
+ if COMPRESS:
+ _ = root.create_dataset('compress_len', (len(image_dict.keys()), max_timesteps))
+ root['/compress_len'][...] = compressed_len
+
+ print(f'Saving {dataset_path}: {time.time() - t0:.1f} secs\n')
+
+ if episode_idx == start_idx:
+ save_videos(image_dict, DT, video_path=os.path.join(dataset_dir, dataset_name + '_mirror_video.mp4'))
+ # visualize_joints(qpos, action, plot_path=os.path.join(dataset_dir, dataset_name + '_mirror_qpos.png'))
+ # visualize_timestamp(t_list, dataset_path) # TODO: add timestamp back
+
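+# A minimal illustrative sketch (not part of the original pipeline), assuming
+# MIRROR_STATE_MULTIPLY (defined earlier in this file) is a per-joint sign vector
+# broadcastable against one 7-DoF arm slice, and images are (T, H, W, C) uint8 arrays.
+def _mirror_example():
+    T = 3
+    qpos = np.arange(T * 14, dtype=np.float64).reshape(T, 14)
+    # swap the left/right 7-DoF blocks and apply the sign flip, as in main() above
+    mirrored_qpos = np.concatenate([qpos[:, 7:] * MIRROR_STATE_MULTIPLY,
+                                    qpos[:, :7] * MIRROR_STATE_MULTIPLY], axis=1)
+    # swap wrist cameras and flip each frame horizontally (axis 2 is image width)
+    imgs = {'left_wrist': np.zeros((T, 480, 640, 3), dtype=np.uint8),
+            'right_wrist': np.ones((T, 480, 640, 3), dtype=np.uint8)}
+    imgs['left_wrist'], imgs['right_wrist'] = imgs['right_wrist'][:, :, ::-1], imgs['left_wrist'][:, :, ::-1]
+    return mirrored_qpos, imgs
+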
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--dataset_dir', action='store', type=str, help='Dataset dir.', required=True)
+ parser.add_argument('--num_episodes', action='store', type=int, help='Number of episodes.', required=True)
+ main(vars(parser.parse_args()))
diff --git a/docs/src/record_sim_episodes.py b/docs/src/record_sim_episodes.py
new file mode 100644
index 00000000..c8348877
--- /dev/null
+++ b/docs/src/record_sim_episodes.py
@@ -0,0 +1,191 @@
+import time
+import os
+import numpy as np
+import argparse
+import matplotlib.pyplot as plt
+import h5py
+
+from constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN, SIM_TASK_CONFIGS
+from ee_sim_env import make_ee_sim_env
+from sim_env import make_sim_env, BOX_POSE
+from scripted_policy import PickAndTransferPolicy, InsertionPolicy
+
+import IPython
+e = IPython.embed
+
+
+def main(args):
+ """
+ Generate demonstration data in simulation.
+ First rollout the policy (defined in ee space) in ee_sim_env. Obtain the joint trajectory.
+ Replace the gripper joint positions with the commanded joint position.
+ Replay this joint trajectory (as action sequence) in sim_env, and record all observations.
+ Save this episode of data, and continue to next episode of data collection.
+ """
+
+ task_name = args['task_name']
+ dataset_dir = args['dataset_dir']
+ num_episodes = args['num_episodes']
+ onscreen_render = args['onscreen_render']
+ inject_noise = False
+ render_cam_name = 'top'
+
+ if not os.path.isdir(dataset_dir):
+ os.makedirs(dataset_dir, exist_ok=True)
+
+ episode_len = SIM_TASK_CONFIGS[task_name]['episode_len']
+ camera_names = SIM_TASK_CONFIGS[task_name]['camera_names']
+ if task_name == 'sim_transfer_cube_scripted':
+ policy_cls = PickAndTransferPolicy
+ elif task_name == 'sim_insertion_scripted':
+ policy_cls = InsertionPolicy
+ elif task_name == 'sim_transfer_cube_scripted_mirror':
+ policy_cls = PickAndTransferPolicy
+ else:
+ raise NotImplementedError
+
+ success = []
+ for episode_idx in range(num_episodes):
+ print(f'{episode_idx=}')
+ print('Rolling out EE space scripted policy')
+ # setup the environment
+ env = make_ee_sim_env(task_name)
+ ts = env.reset()
+ episode = [ts]
+ policy = policy_cls(inject_noise)
+ # setup plotting
+ if onscreen_render:
+ ax = plt.subplot()
+ plt_img = ax.imshow(ts.observation['images'][render_cam_name])
+ plt.ion()
+ for step in range(episode_len):
+ action = policy(ts)
+ ts = env.step(action)
+ episode.append(ts)
+ if onscreen_render:
+ plt_img.set_data(ts.observation['images'][render_cam_name])
+ plt.pause(0.002)
+ plt.close()
+
+ episode_return = np.sum([ts.reward for ts in episode[1:]])
+ episode_max_reward = np.max([ts.reward for ts in episode[1:]])
+ if episode_max_reward == env.task.max_reward:
+ print(f"{episode_idx=} Successful, {episode_return=}")
+ else:
+ print(f"{episode_idx=} Failed")
+
+ joint_traj = [ts.observation['qpos'] for ts in episode]
+ # replace gripper pose with gripper control
+ gripper_ctrl_traj = [ts.observation['gripper_ctrl'] for ts in episode]
+ for joint, ctrl in zip(joint_traj, gripper_ctrl_traj):
+ left_ctrl = PUPPET_GRIPPER_POSITION_NORMALIZE_FN(ctrl[0])
+ right_ctrl = PUPPET_GRIPPER_POSITION_NORMALIZE_FN(ctrl[2])
+ joint[6] = left_ctrl
+ joint[6+7] = right_ctrl
+
+ subtask_info = episode[0].observation['env_state'].copy() # box pose at step 0
+
+ # clear unused variables
+ del env
+ del episode
+ del policy
+
+ # setup the environment
+ print('Replaying joint commands')
+ env = make_sim_env(task_name)
+ BOX_POSE[0] = subtask_info # make sure the sim_env has the same object configurations as ee_sim_env
+ ts = env.reset()
+
+ episode_replay = [ts]
+ # setup plotting
+ if onscreen_render:
+ ax = plt.subplot()
+ plt_img = ax.imshow(ts.observation['images'][render_cam_name])
+ plt.ion()
+ for t in range(len(joint_traj)): # note: this will increase episode length by 1
+ action = joint_traj[t]
+ ts = env.step(action)
+ episode_replay.append(ts)
+ if onscreen_render:
+ plt_img.set_data(ts.observation['images'][render_cam_name])
+ plt.pause(0.02)
+
+ episode_return = np.sum([ts.reward for ts in episode_replay[1:]])
+ episode_max_reward = np.max([ts.reward for ts in episode_replay[1:]])
+ if episode_max_reward == env.task.max_reward:
+ success.append(1)
+ print(f"{episode_idx=} Successful, {episode_return=}")
+ else:
+ success.append(0)
+ print(f"{episode_idx=} Failed")
+
+ plt.close()
+
+ """
+ For each timestep:
+ observations
+ - images
+ - each_cam_name (480, 640, 3) 'uint8'
+ - qpos (14,) 'float64'
+ - qvel (14,) 'float64'
+
+ action (14,) 'float64'
+ """
+
+ data_dict = {
+ '/observations/qpos': [],
+ '/observations/qvel': [],
+ '/action': [],
+ }
+ for cam_name in camera_names:
+ data_dict[f'/observations/images/{cam_name}'] = []
+
+ # because of the replay, there are eps_len + 1 actions and eps_len + 2 timesteps
+ # truncate here to keep them consistent
+ joint_traj = joint_traj[:-1]
+ episode_replay = episode_replay[:-1]
+
+ # len(joint_traj) i.e. actions: max_timesteps
+ # len(episode_replay) i.e. time steps: max_timesteps + 1
+ max_timesteps = len(joint_traj)
+ while joint_traj:
+ action = joint_traj.pop(0)
+ ts = episode_replay.pop(0)
+ data_dict['/observations/qpos'].append(ts.observation['qpos'])
+ data_dict['/observations/qvel'].append(ts.observation['qvel'])
+ data_dict['/action'].append(action)
+ for cam_name in camera_names:
+ data_dict[f'/observations/images/{cam_name}'].append(ts.observation['images'][cam_name])
+
+ # HDF5
+ t0 = time.time()
+ dataset_path = os.path.join(dataset_dir, f'episode_{episode_idx}')
+ with h5py.File(dataset_path + '.hdf5', 'w', rdcc_nbytes=1024 ** 2 * 2) as root:
+ root.attrs['sim'] = True
+ obs = root.create_group('observations')
+ image = obs.create_group('images')
+ for cam_name in camera_names:
+ _ = image.create_dataset(cam_name, (max_timesteps, 480, 640, 3), dtype='uint8',
+ chunks=(1, 480, 640, 3), )
+ # compression='gzip',compression_opts=2,)
+ # compression=32001, compression_opts=(0, 0, 0, 0, 9, 1, 1), shuffle=False)
+ qpos = obs.create_dataset('qpos', (max_timesteps, 14))
+ qvel = obs.create_dataset('qvel', (max_timesteps, 14))
+ action = root.create_dataset('action', (max_timesteps, 14))
+
+ for name, array in data_dict.items():
+ root[name][...] = array
+ print(f'Saving: {time.time() - t0:.1f} secs\n')
+
+ print(f'Saved to {dataset_dir}')
+ print(f'Success: {np.sum(success)} / {len(success)}')
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)
+ parser.add_argument('--dataset_dir', action='store', type=str, help='dataset saving dir', required=True)
+ parser.add_argument('--num_episodes', action='store', type=int, help='num_episodes', required=False)
+ parser.add_argument('--onscreen_render', action='store_true')
+
+ main(vars(parser.parse_args()))
+
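+# Hypothetical usage (dataset_dir is a placeholder; add --onscreen_render to visualize), e.g.:
+#   python3 record_sim_episodes.py --task_name sim_transfer_cube_scripted \
+#       --dataset_dir <dataset_dir> --num_episodes 50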
diff --git a/docs/src/replay_episodes.py b/docs/src/replay_episodes.py
new file mode 100644
index 00000000..2e4f6757
--- /dev/null
+++ b/docs/src/replay_episodes.py
@@ -0,0 +1,48 @@
+import os
+import h5py
+import argparse
+from collections import defaultdict
+from sim_env import make_sim_env
+from utils import sample_box_pose, sample_insertion_pose
+from sim_env import BOX_POSE
+from constants import DT
+from visualize_episodes import save_videos
+
+import IPython
+e = IPython.embed
+
+
+def main(args):
+ dataset_path = args['dataset_path']
+
+
+ if not os.path.isfile(dataset_path):
+ print(f'Dataset does not exist at \n{dataset_path}\n')
+ exit()
+
+ with h5py.File(dataset_path, 'r') as root:
+ actions = root['/action'][()]
+
+ env = make_sim_env('sim_transfer_cube')
+ BOX_POSE[0] = sample_box_pose() # used in sim reset
+ ts = env.reset()
+ episode_replay = [ts]
+ for action in actions:
+ ts = env.step(action)
+ episode_replay.append(ts)
+
+ # saving
+ image_dict = defaultdict(lambda: [])
+ while episode_replay:
+ ts = episode_replay.pop(0)
+ for cam_name, image in ts.observation['images'].items():
+ image_dict[cam_name].append(image)
+
+ video_path = dataset_path.replace('episode_', 'replay_episode_').replace('hdf5', 'mp4')
+ save_videos(image_dict, DT, video_path=video_path)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--dataset_path', action='store', type=str, help='Dataset path.', required=True)
+ main(vars(parser.parse_args()))
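+
+# Hypothetical usage (path is a placeholder), e.g.:
+#   python3 replay_episodes.py --dataset_path <dataset_dir>/episode_0.hdf5
+# The resulting video is written next to the dataset as replay_episode_0.mp4.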
diff --git a/docs/src/scripted_policy.py b/docs/src/scripted_policy.py
new file mode 100644
index 00000000..4fd8f000
--- /dev/null
+++ b/docs/src/scripted_policy.py
@@ -0,0 +1,194 @@
+import numpy as np
+import matplotlib.pyplot as plt
+from pyquaternion import Quaternion
+
+from constants import SIM_TASK_CONFIGS
+from ee_sim_env import make_ee_sim_env
+
+import IPython
+e = IPython.embed
+
+
+class BasePolicy:
+ def __init__(self, inject_noise=False):
+ self.inject_noise = inject_noise
+ self.step_count = 0
+ self.left_trajectory = None
+ self.right_trajectory = None
+
+ def generate_trajectory(self, ts_first):
+ raise NotImplementedError
+
+ @staticmethod
+ def interpolate(curr_waypoint, next_waypoint, t):
+ t_frac = (t - curr_waypoint["t"]) / (next_waypoint["t"] - curr_waypoint["t"])
+ curr_xyz = curr_waypoint['xyz']
+ curr_quat = curr_waypoint['quat']
+ curr_grip = curr_waypoint['gripper']
+ next_xyz = next_waypoint['xyz']
+ next_quat = next_waypoint['quat']
+ next_grip = next_waypoint['gripper']
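+ # note: quaternion components are interpolated element-wise (lerp, not slerp) and the
+ # result is not re-normalized; a simple approximation used by the scripted policies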
+ xyz = curr_xyz + (next_xyz - curr_xyz) * t_frac
+ quat = curr_quat + (next_quat - curr_quat) * t_frac
+ gripper = curr_grip + (next_grip - curr_grip) * t_frac
+ return xyz, quat, gripper
+
+ def __call__(self, ts):
+ # generate trajectory at first timestep, then open-loop execution
+ if self.step_count == 0:
+ self.generate_trajectory(ts)
+
+ # obtain left and right waypoints
+ if self.left_trajectory[0]['t'] == self.step_count:
+ self.curr_left_waypoint = self.left_trajectory.pop(0)
+ next_left_waypoint = self.left_trajectory[0]
+
+ if self.right_trajectory[0]['t'] == self.step_count:
+ self.curr_right_waypoint = self.right_trajectory.pop(0)
+ next_right_waypoint = self.right_trajectory[0]
+
+ # interpolate between waypoints to obtain current pose and gripper command
+ left_xyz, left_quat, left_gripper = self.interpolate(self.curr_left_waypoint, next_left_waypoint, self.step_count)
+ right_xyz, right_quat, right_gripper = self.interpolate(self.curr_right_waypoint, next_right_waypoint, self.step_count)
+
+ # Inject noise
+ if self.inject_noise:
+ scale = 0.01
+ left_xyz = left_xyz + np.random.uniform(-scale, scale, left_xyz.shape)
+ right_xyz = right_xyz + np.random.uniform(-scale, scale, right_xyz.shape)
+
+ action_left = np.concatenate([left_xyz, left_quat, [left_gripper]])
+ action_right = np.concatenate([right_xyz, right_quat, [right_gripper]])
+
+ self.step_count += 1
+ return np.concatenate([action_left, action_right])
+
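+# A minimal illustrative sketch (not part of the original file): interpolating halfway
+# between two hand-written waypoints with BasePolicy.interpolate. Values are made up.
+def _interpolation_example():
+    wp0 = {"t": 0, "xyz": np.zeros(3), "quat": np.array([1, 0, 0, 0]), "gripper": 0}
+    wp1 = {"t": 10, "xyz": np.array([0.1, 0.0, 0.2]), "quat": np.array([1, 0, 0, 0]), "gripper": 1}
+    xyz, quat, gripper = BasePolicy.interpolate(wp0, wp1, t=5)
+    # at t=5: xyz == [0.05, 0.0, 0.1], gripper == 0.5
+    return xyz, quat, gripper
+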
+
+class PickAndTransferPolicy(BasePolicy):
+
+ def generate_trajectory(self, ts_first):
+ init_mocap_pose_right = ts_first.observation['mocap_pose_right']
+ init_mocap_pose_left = ts_first.observation['mocap_pose_left']
+
+ box_info = np.array(ts_first.observation['env_state'])
+ box_xyz = box_info[:3]
+ box_quat = box_info[3:]
+ # print(f"Generate trajectory for {box_xyz=}")
+
+ gripper_pick_quat = Quaternion(init_mocap_pose_right[3:])
+ gripper_pick_quat = gripper_pick_quat * Quaternion(axis=[0.0, 1.0, 0.0], degrees=-60)
+
+ meet_left_quat = Quaternion(axis=[1.0, 0.0, 0.0], degrees=90)
+
+ meet_xyz = np.array([0, 0.5, 0.25])
+
+ self.left_trajectory = [
+ {"t": 0, "xyz": init_mocap_pose_left[:3], "quat": init_mocap_pose_left[3:], "gripper": 0}, # sleep
+ {"t": 100, "xyz": meet_xyz + np.array([-0.1, 0, -0.02]), "quat": meet_left_quat.elements, "gripper": 1}, # approach meet position
+ {"t": 260, "xyz": meet_xyz + np.array([0.02, 0, -0.02]), "quat": meet_left_quat.elements, "gripper": 1}, # move to meet position
+ {"t": 310, "xyz": meet_xyz + np.array([0.02, 0, -0.02]), "quat": meet_left_quat.elements, "gripper": 0}, # close gripper
+ {"t": 360, "xyz": meet_xyz + np.array([-0.1, 0, -0.02]), "quat": np.array([1, 0, 0, 0]), "gripper": 0}, # move left
+ {"t": 400, "xyz": meet_xyz + np.array([-0.1, 0, -0.02]), "quat": np.array([1, 0, 0, 0]), "gripper": 0}, # stay
+ ]
+
+ self.right_trajectory = [
+ {"t": 0, "xyz": init_mocap_pose_right[:3], "quat": init_mocap_pose_right[3:], "gripper": 0}, # sleep
+ {"t": 90, "xyz": box_xyz + np.array([0, 0, 0.08]), "quat": gripper_pick_quat.elements, "gripper": 1}, # approach the cube
+ {"t": 130, "xyz": box_xyz + np.array([0, 0, -0.015]), "quat": gripper_pick_quat.elements, "gripper": 1}, # go down
+ {"t": 170, "xyz": box_xyz + np.array([0, 0, -0.015]), "quat": gripper_pick_quat.elements, "gripper": 0}, # close gripper
+ {"t": 200, "xyz": meet_xyz + np.array([0.05, 0, 0]), "quat": gripper_pick_quat.elements, "gripper": 0}, # approach meet position
+ {"t": 220, "xyz": meet_xyz, "quat": gripper_pick_quat.elements, "gripper": 0}, # move to meet position
+ {"t": 310, "xyz": meet_xyz, "quat": gripper_pick_quat.elements, "gripper": 1}, # open gripper
+ {"t": 360, "xyz": meet_xyz + np.array([0.1, 0, 0]), "quat": gripper_pick_quat.elements, "gripper": 1}, # move to right
+ {"t": 400, "xyz": meet_xyz + np.array([0.1, 0, 0]), "quat": gripper_pick_quat.elements, "gripper": 1}, # stay
+ ]
+
+
+class InsertionPolicy(BasePolicy):
+
+ def generate_trajectory(self, ts_first):
+ init_mocap_pose_right = ts_first.observation['mocap_pose_right']
+ init_mocap_pose_left = ts_first.observation['mocap_pose_left']
+
+ peg_info = np.array(ts_first.observation['env_state'])[:7]
+ peg_xyz = peg_info[:3]
+ peg_quat = peg_info[3:]
+
+ socket_info = np.array(ts_first.observation['env_state'])[7:]
+ socket_xyz = socket_info[:3]
+ socket_quat = socket_info[3:]
+
+ gripper_pick_quat_right = Quaternion(init_mocap_pose_right[3:])
+ gripper_pick_quat_right = gripper_pick_quat_right * Quaternion(axis=[0.0, 1.0, 0.0], degrees=-60)
+
+ gripper_pick_quat_left = Quaternion(init_mocap_pose_right[3:])
+ gripper_pick_quat_left = gripper_pick_quat_left * Quaternion(axis=[0.0, 1.0, 0.0], degrees=60)
+
+ meet_xyz = np.array([0, 0.5, 0.15])
+ lift_right = 0.00715
+
+ self.left_trajectory = [
+ {"t": 0, "xyz": init_mocap_pose_left[:3], "quat": init_mocap_pose_left[3:], "gripper": 0}, # sleep
+ {"t": 120, "xyz": socket_xyz + np.array([0, 0, 0.08]), "quat": gripper_pick_quat_left.elements, "gripper": 1}, # approach the cube
+ {"t": 170, "xyz": socket_xyz + np.array([0, 0, -0.03]), "quat": gripper_pick_quat_left.elements, "gripper": 1}, # go down
+ {"t": 220, "xyz": socket_xyz + np.array([0, 0, -0.03]), "quat": gripper_pick_quat_left.elements, "gripper": 0}, # close gripper
+ {"t": 285, "xyz": meet_xyz + np.array([-0.1, 0, 0]), "quat": gripper_pick_quat_left.elements, "gripper": 0}, # approach meet position
+ {"t": 340, "xyz": meet_xyz + np.array([-0.05, 0, 0]), "quat": gripper_pick_quat_left.elements,"gripper": 0}, # insertion
+ {"t": 400, "xyz": meet_xyz + np.array([-0.05, 0, 0]), "quat": gripper_pick_quat_left.elements, "gripper": 0}, # insertion
+ ]
+
+ self.right_trajectory = [
+ {"t": 0, "xyz": init_mocap_pose_right[:3], "quat": init_mocap_pose_right[3:], "gripper": 0}, # sleep
+ {"t": 120, "xyz": peg_xyz + np.array([0, 0, 0.08]), "quat": gripper_pick_quat_right.elements, "gripper": 1}, # approach the cube
+ {"t": 170, "xyz": peg_xyz + np.array([0, 0, -0.03]), "quat": gripper_pick_quat_right.elements, "gripper": 1}, # go down
+ {"t": 220, "xyz": peg_xyz + np.array([0, 0, -0.03]), "quat": gripper_pick_quat_right.elements, "gripper": 0}, # close gripper
+ {"t": 285, "xyz": meet_xyz + np.array([0.1, 0, lift_right]), "quat": gripper_pick_quat_right.elements, "gripper": 0}, # approach meet position
+ {"t": 340, "xyz": meet_xyz + np.array([0.05, 0, lift_right]), "quat": gripper_pick_quat_right.elements, "gripper": 0}, # insertion
+ {"t": 400, "xyz": meet_xyz + np.array([0.05, 0, lift_right]), "quat": gripper_pick_quat_right.elements, "gripper": 0}, # insertion
+
+ ]
+
+
+def test_policy(task_name):
+ # example rolling out pick_and_transfer policy
+ onscreen_render = True
+ inject_noise = False
+
+ # setup the environment
+ episode_len = SIM_TASK_CONFIGS[task_name]['episode_len']
+ if 'sim_transfer_cube' in task_name:
+ env = make_ee_sim_env('sim_transfer_cube')
+ elif 'sim_insertion' in task_name:
+ env = make_ee_sim_env('sim_insertion')
+ else:
+ raise NotImplementedError
+
+ for episode_idx in range(2):
+ ts = env.reset()
+ episode = [ts]
+ if onscreen_render:
+ ax = plt.subplot()
+ plt_img = ax.imshow(ts.observation['images']['angle'])
+ plt.ion()
+
+ policy = PickAndTransferPolicy(inject_noise)
+ for step in range(episode_len):
+ action = policy(ts)
+ ts = env.step(action)
+ episode.append(ts)
+ if onscreen_render:
+ plt_img.set_data(ts.observation['images']['angle'])
+ plt.pause(0.02)
+ plt.close()
+
+ episode_return = np.sum([ts.reward for ts in episode[1:]])
+ if episode_return > 0:
+ print(f"{episode_idx=} Successful, {episode_return=}")
+ else:
+ print(f"{episode_idx=} Failed")
+
+
+if __name__ == '__main__':
+ test_task_name = 'sim_transfer_cube_scripted'
+ test_policy(test_task_name)
+
diff --git a/docs/src/setup.py b/docs/src/setup.py
new file mode 100644
index 00000000..f9373217
--- /dev/null
+++ b/docs/src/setup.py
@@ -0,0 +1,10 @@
+from distutils.core import setup
+from setuptools import find_packages
+
+setup(
+ name='act',
+ version='0.0.0',
+ packages=find_packages(),
+ license='MIT License',
+ long_description=open('README.md').read(),
+)
diff --git a/docs/src/sim_env.py b/docs/src/sim_env.py
new file mode 100644
index 00000000..03b21f95
--- /dev/null
+++ b/docs/src/sim_env.py
@@ -0,0 +1,280 @@
+import numpy as np
+import os
+import collections
+import matplotlib.pyplot as plt
+from dm_control import mujoco
+from dm_control.rl import control
+from dm_control.suite import base
+
+from constants import DT, XML_DIR, START_ARM_POSE
+from constants import PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN
+from constants import MASTER_GRIPPER_POSITION_NORMALIZE_FN
+from constants import PUPPET_GRIPPER_POSITION_NORMALIZE_FN
+from constants import PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN
+
+import IPython
+e = IPython.embed
+
+BOX_POSE = [None] # to be changed from outside
+
+def make_sim_env(task_name):
+ """
+ Environment for simulated robot bi-manual manipulation, with joint position control
+ Action space: [left_arm_qpos (6), # absolute joint position
+ left_gripper_positions (1), # normalized gripper position (0: close, 1: open)
+ right_arm_qpos (6), # absolute joint position
+ right_gripper_positions (1),] # normalized gripper position (0: close, 1: open)
+
+ Observation space: {"qpos": Concat[ left_arm_qpos (6), # absolute joint position
+ left_gripper_position (1), # normalized gripper position (0: close, 1: open)
+ right_arm_qpos (6), # absolute joint position
+ right_gripper_qpos (1)] # normalized gripper position (0: close, 1: open)
+ "qvel": Concat[ left_arm_qvel (6), # absolute joint velocity (rad)
+ left_gripper_velocity (1), # normalized gripper velocity (pos: opening, neg: closing)
+ right_arm_qvel (6), # absolute joint velocity (rad)
+ right_gripper_qvel (1)] # normalized gripper velocity (pos: opening, neg: closing)
+ "images": {"main": (480x640x3)} # h, w, c, dtype='uint8'
+ """
+ if 'sim_transfer_cube' in task_name:
+ xml_path = os.path.join(XML_DIR, f'bimanual_viperx_transfer_cube.xml')
+ physics = mujoco.Physics.from_xml_path(xml_path)
+ task = TransferCubeTask(random=False)
+ env = control.Environment(physics, task, time_limit=20, control_timestep=DT,
+ n_sub_steps=None, flat_observation=False)
+ elif 'sim_insertion' in task_name:
+ xml_path = os.path.join(XML_DIR, f'bimanual_viperx_insertion.xml')
+ physics = mujoco.Physics.from_xml_path(xml_path)
+ task = InsertionTask(random=False)
+ env = control.Environment(physics, task, time_limit=20, control_timestep=DT,
+ n_sub_steps=None, flat_observation=False)
+ else:
+ raise NotImplementedError
+ return env
+
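+# A minimal illustrative sketch (not part of the original file): stepping the environment
+# with a hand-built 14-dim joint-space action matching the layout documented above.
+# sample_box_pose comes from utils (used the same way in replay_episodes.py); BOX_POSE
+# must be set before reset, see TransferCubeTask.initialize_episode below.
+def _sim_env_example():
+    from utils import sample_box_pose
+    BOX_POSE[0] = sample_box_pose()
+    env = make_sim_env('sim_transfer_cube')
+    ts = env.reset()
+    action = np.zeros(14)
+    action[6] = 1.0  # left gripper fully open (normalized)
+    action[7 + 6] = 1.0  # right gripper fully open (normalized)
+    ts = env.step(action)
+    return ts.observation['qpos']
+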
+class BimanualViperXTask(base.Task):
+ def __init__(self, random=None):
+ super().__init__(random=random)
+
+ def before_step(self, action, physics):
+ left_arm_action = action[:6]
+ right_arm_action = action[7:7+6]
+ normalized_left_gripper_action = action[6]
+ normalized_right_gripper_action = action[7+6]
+
+ left_gripper_action = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(normalized_left_gripper_action)
+ right_gripper_action = PUPPET_GRIPPER_POSITION_UNNORMALIZE_FN(normalized_right_gripper_action)
+
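+ # each gripper is actuated by two opposing finger joints, so the scalar command is
+ # duplicated with a sign flip before being concatenated into the 16-dim sim control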
+ full_left_gripper_action = [left_gripper_action, -left_gripper_action]
+ full_right_gripper_action = [right_gripper_action, -right_gripper_action]
+
+ env_action = np.concatenate([left_arm_action, full_left_gripper_action, right_arm_action, full_right_gripper_action])
+ super().before_step(env_action, physics)
+ return
+
+ def initialize_episode(self, physics):
+ """Sets the state of the environment at the start of each episode."""
+ super().initialize_episode(physics)
+
+ @staticmethod
+ def get_qpos(physics):
+ qpos_raw = physics.data.qpos.copy()
+ left_qpos_raw = qpos_raw[:8]
+ right_qpos_raw = qpos_raw[8:16]
+ left_arm_qpos = left_qpos_raw[:6]
+ right_arm_qpos = right_qpos_raw[:6]
+ left_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(left_qpos_raw[6])]
+ right_gripper_qpos = [PUPPET_GRIPPER_POSITION_NORMALIZE_FN(right_qpos_raw[6])]
+ return np.concatenate([left_arm_qpos, left_gripper_qpos, right_arm_qpos, right_gripper_qpos])
+
+ @staticmethod
+ def get_qvel(physics):
+ qvel_raw = physics.data.qvel.copy()
+ left_qvel_raw = qvel_raw[:8]
+ right_qvel_raw = qvel_raw[8:16]
+ left_arm_qvel = left_qvel_raw[:6]
+ right_arm_qvel = right_qvel_raw[:6]
+ left_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(left_qvel_raw[6])]
+ right_gripper_qvel = [PUPPET_GRIPPER_VELOCITY_NORMALIZE_FN(right_qvel_raw[6])]
+ return np.concatenate([left_arm_qvel, left_gripper_qvel, right_arm_qvel, right_gripper_qvel])
+
+ @staticmethod
+ def get_env_state(physics):
+ raise NotImplementedError
+
+ def get_observation(self, physics):
+ obs = collections.OrderedDict()
+ obs['qpos'] = self.get_qpos(physics)
+ obs['qvel'] = self.get_qvel(physics)
+ obs['env_state'] = self.get_env_state(physics)
+ obs['images'] = dict()
+ obs['images']['top'] = physics.render(height=480, width=640, camera_id='top')
+ obs['images']['left_wrist'] = physics.render(height=480, width=640, camera_id='left_wrist')
+ obs['images']['right_wrist'] = physics.render(height=480, width=640, camera_id='right_wrist')
+ # obs['images']['angle'] = physics.render(height=480, width=640, camera_id='angle')
+ # obs['images']['vis'] = physics.render(height=480, width=640, camera_id='front_close')
+
+ return obs
+
+ def get_reward(self, physics):
+ # return whether left gripper is holding the box
+ raise NotImplementedError
+
+
+class TransferCubeTask(BimanualViperXTask):
+ def __init__(self, random=None):
+ super().__init__(random=random)
+ self.max_reward = 4
+
+ def initialize_episode(self, physics):
+ """Sets the state of the environment at the start of each episode."""
+ # TODO Notice: this function does not randomize the env configuration. Instead, set BOX_POSE from outside
+ # reset qpos, control and box position
+ with physics.reset_context():
+ physics.named.data.qpos[:16] = START_ARM_POSE
+ np.copyto(physics.data.ctrl, START_ARM_POSE)
+ assert BOX_POSE[0] is not None
+ physics.named.data.qpos[-7:] = BOX_POSE[0]
+ # print(f"{BOX_POSE=}")
+ super().initialize_episode(physics)
+
+ @staticmethod
+ def get_env_state(physics):
+ env_state = physics.data.qpos.copy()[16:]
+ return env_state
+
+ def get_reward(self, physics):
+ # return whether left gripper is holding the box
+ all_contact_pairs = []
+ for i_contact in range(physics.data.ncon):
+ id_geom_1 = physics.data.contact[i_contact].geom1
+ id_geom_2 = physics.data.contact[i_contact].geom2
+ name_geom_1 = physics.model.id2name(id_geom_1, 'geom')
+ name_geom_2 = physics.model.id2name(id_geom_2, 'geom')
+ contact_pair = (name_geom_1, name_geom_2)
+ all_contact_pairs.append(contact_pair)
+
+ touch_left_gripper = ("red_box", "vx300s_left/10_left_gripper_finger") in all_contact_pairs
+ touch_right_gripper = ("red_box", "vx300s_right/10_right_gripper_finger") in all_contact_pairs
+ touch_table = ("red_box", "table") in all_contact_pairs
+
+ reward = 0
+ if touch_right_gripper:
+ reward = 1
+ if touch_right_gripper and not touch_table: # lifted
+ reward = 2
+ if touch_left_gripper: # attempted transfer
+ reward = 3
+ if touch_left_gripper and not touch_table: # successful transfer
+ reward = 4
+ return reward
+
+
+class InsertionTask(BimanualViperXTask):
+ def __init__(self, random=None):
+ super().__init__(random=random)
+ self.max_reward = 4
+
+ def initialize_episode(self, physics):
+ """Sets the state of the environment at the start of each episode."""
+ # TODO Notice: this function does not randomize the env configuration. Instead, set BOX_POSE from outside
+ # reset qpos, control and box position
+ with physics.reset_context():
+ physics.named.data.qpos[:16] = START_ARM_POSE
+ np.copyto(physics.data.ctrl, START_ARM_POSE)
+ assert BOX_POSE[0] is not None
+ physics.named.data.qpos[-7*2:] = BOX_POSE[0] # two objects
+ # print(f"{BOX_POSE=}")
+ super().initialize_episode(physics)
+
+ @staticmethod
+ def get_env_state(physics):
+ env_state = physics.data.qpos.copy()[16:]
+ return env_state
+
+ def get_reward(self, physics):
+ # return whether peg touches the pin
+ all_contact_pairs = []
+ for i_contact in range(physics.data.ncon):
+ id_geom_1 = physics.data.contact[i_contact].geom1
+ id_geom_2 = physics.data.contact[i_contact].geom2
+ name_geom_1 = physics.model.id2name(id_geom_1, 'geom')
+ name_geom_2 = physics.model.id2name(id_geom_2, 'geom')
+ contact_pair = (name_geom_1, name_geom_2)
+ all_contact_pairs.append(contact_pair)
+
+ touch_right_gripper = ("red_peg", "vx300s_right/10_right_gripper_finger") in all_contact_pairs
+ touch_left_gripper = ("socket-1", "vx300s_left/10_left_gripper_finger") in all_contact_pairs or \
+ ("socket-2", "vx300s_left/10_left_gripper_finger") in all_contact_pairs or \
+ ("socket-3", "vx300s_left/10_left_gripper_finger") in all_contact_pairs or \
+ ("socket-4", "vx300s_left/10_left_gripper_finger") in all_contact_pairs
+
+ peg_touch_table = ("red_peg", "table") in all_contact_pairs
+ socket_touch_table = ("socket-1", "table") in all_contact_pairs or \
+ ("socket-2", "table") in all_contact_pairs or \
+ ("socket-3", "table") in all_contact_pairs or \
+ ("socket-4", "table") in all_contact_pairs
+ peg_touch_socket = ("red_peg", "socket-1") in all_contact_pairs or \
+ ("red_peg", "socket-2") in all_contact_pairs or \
+ ("red_peg", "socket-3") in all_contact_pairs or \
+ ("red_peg", "socket-4") in all_contact_pairs
+ pin_touched = ("red_peg", "pin") in all_contact_pairs
+
+ reward = 0
+ if touch_left_gripper and touch_right_gripper: # touch both
+ reward = 1
+ if touch_left_gripper and touch_right_gripper and (not peg_touch_table) and (not socket_touch_table): # grasp both
+ reward = 2
+ if peg_touch_socket and (not peg_touch_table) and (not socket_touch_table): # peg and socket touching
+ reward = 3
+ if pin_touched: # successful insertion
+ reward = 4
+ return reward
+
+
+def get_action(master_bot_left, master_bot_right):
+ action = np.zeros(14)
+ # arm action
+ action[:6] = master_bot_left.dxl.joint_states.position[:6]
+ action[7:7+6] = master_bot_right.dxl.joint_states.position[:6]
+ # gripper action
+ left_gripper_pos = master_bot_left.dxl.joint_states.position[7]
+ right_gripper_pos = master_bot_right.dxl.joint_states.position[7]
+ normalized_left_pos = MASTER_GRIPPER_POSITION_NORMALIZE_FN(left_gripper_pos)
+ normalized_right_pos = MASTER_GRIPPER_POSITION_NORMALIZE_FN(right_gripper_pos)
+ action[6] = normalized_left_pos
+ action[7+6] = normalized_right_pos
+ return action
+
+def test_sim_teleop():
+ """ Testing teleoperation in sim with ALOHA. Requires hardware and ALOHA repo to work. """
+ from interbotix_xs_modules.arm import InterbotixManipulatorXS
+
+ BOX_POSE[0] = [0.2, 0.5, 0.05, 1, 0, 0, 0]
+
+ # source of data
+ master_bot_left = InterbotixManipulatorXS(robot_model="wx250s", group_name="arm", gripper_name="gripper",
+ robot_name=f'master_left', init_node=True)
+ master_bot_right = InterbotixManipulatorXS(robot_model="wx250s", group_name="arm", gripper_name="gripper",
+ robot_name=f'master_right', init_node=False)
+
+ # setup the environment
+ env = make_sim_env('sim_transfer_cube')
+ ts = env.reset()
+ episode = [ts]
+ # setup plotting
+ ax = plt.subplot()
+ plt_img = ax.imshow(ts.observation['images']['angle'])
+ plt.ion()
+
+ for t in range(1000):
+ action = get_action(master_bot_left, master_bot_right)
+ ts = env.step(action)
+ episode.append(ts)
+
+ plt_img.set_data(ts.observation['images']['angle'])
+ plt.pause(0.02)
+
+
+if __name__ == '__main__':
+ test_sim_teleop()
+
diff --git a/docs/src/train_actuator_network.py b/docs/src/train_actuator_network.py
new file mode 100644
index 00000000..2e256781
--- /dev/null
+++ b/docs/src/train_actuator_network.py
@@ -0,0 +1,367 @@
+
+import numpy as np
+import torch
+from torch import nn
+from torch.nn import functional as F
+from torch.utils.data import DataLoader
+import os
+import h5py
+import math
+import wandb
+import pickle
+import matplotlib.pyplot as plt
+from copy import deepcopy
+from tqdm import tqdm
+from utils import find_all_hdf5
+from imitate_episodes import repeater, compute_dict_mean
+
+import IPython
+e = IPython.embed
+
+def main():
+ ### Idea
+ # input : o o o o o o # observed speed
+ # target: a a a a a a # commanded speed
+ # at test time, input desired speed profile and convert that to command
+
+ #########################################################
+ history_len = 50
+ future_len = 50
+ prediction_len = 50
+ batch_size_train = 16
+ batch_size_val = 16
+ lr = 1e-4
+ weight_decay = 1e-4
+
+ num_steps = 10000
+ validate_every = 2000
+ save_every = 2000
+
+ expr_name = f'actuator_network_test_{history_len}_{future_len}_{prediction_len}'
+ ckpt_dir = f'/scr/tonyzhao/train_logs/{expr_name}' if os.getlogin() == 'tonyzhao' else f'./ckpts/{expr_name}'
+ dataset_dir = '/scr/tonyzhao/compressed_datasets/aloha_mobile_fork/' if os.getlogin() == 'tonyzhao' else '/home/zfu/data/aloha_mobile_fork/'
+ #########################################################
+ assert(history_len + future_len >= prediction_len)
+ assert(future_len % prediction_len == 0)
+
+ wandb.init(project="mobile-aloha2", reinit=True, entity="mobile-aloha2", name=expr_name) # mode='disabled',
+
+ if not os.path.isdir(ckpt_dir):
+ os.makedirs(ckpt_dir)
+
+ dataset_path_list = find_all_hdf5(dataset_dir, skip_mirrored_data=True)
+ dataset_path_list = [n for n in dataset_path_list if 'replayed' in n]
+ num_episodes = len(dataset_path_list)
+
+ # obtain train test split
+ train_ratio = 0.9
+ shuffled_episode_ids = np.random.permutation(num_episodes)
+ train_episode_ids = shuffled_episode_ids[:int(train_ratio * num_episodes)]
+ val_episode_ids = shuffled_episode_ids[int(train_ratio * num_episodes):]
+ print(f'\n\nData from: {dataset_dir}\n- Train on {len(train_episode_ids)} episodes\n- Test on {len(val_episode_ids)} episodes\n\n')
+
+ # obtain normalization stats for qpos and action
+ # if load_pretrain:
+ # with open(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'dataset_stats.pkl'), 'rb') as f:
+ # norm_stats = pickle.load(f)
+ # print('Loaded pretrain dataset stats')
+ norm_stats, all_episode_len = get_norm_stats(dataset_path_list)
+ train_episode_len = [all_episode_len[i] for i in train_episode_ids]
+ val_episode_len = [all_episode_len[i] for i in val_episode_ids]
+ assert(all_episode_len[0] % prediction_len == 0)
+
+ # save dataset stats
+ stats_path = os.path.join(ckpt_dir, f'actuator_net_stats.pkl')
+ with open(stats_path, 'wb') as f:
+ pickle.dump(norm_stats, f)
+
+ # construct dataset and dataloader
+ train_dataset = EpisodicDataset(dataset_path_list, norm_stats, train_episode_ids, train_episode_len, history_len, future_len, prediction_len)
+ val_dataset = EpisodicDataset(dataset_path_list, norm_stats, val_episode_ids, val_episode_len, history_len, future_len, prediction_len)
+ train_dataloader = DataLoader(train_dataset, batch_size=batch_size_train, shuffle=True, pin_memory=True, num_workers=1, prefetch_factor=1)
+ val_dataloader = DataLoader(val_dataset, batch_size=batch_size_val, shuffle=True, pin_memory=True, num_workers=1, prefetch_factor=1)
+
+ policy = ActuatorNetwork(prediction_len).cuda()
+ optimizer = torch.optim.AdamW(policy.parameters(), lr=lr, weight_decay=weight_decay)
+
+ n_parameters = sum(p.numel() for p in policy.parameters() if p.requires_grad)
+ print("number of parameters: %.2fM" % (n_parameters/1e6,))
+
+ min_val_loss = np.inf
+ best_ckpt_info = None
+ train_dataloader = repeater(train_dataloader)
+ for step in tqdm(range(num_steps+1)):
+ # validation
+ if step % validate_every == 0:
+ print('validating')
+
+ with torch.inference_mode():
+ policy.eval()
+ validation_dicts = []
+ for batch_idx, data in enumerate(val_dataloader):
+ observed_speed, commanded_speed = data
+ out, forward_dict = policy(observed_speed.cuda(), commanded_speed.cuda())
+ validation_dicts.append(forward_dict)
+
+ validation_summary = compute_dict_mean(validation_dicts)
+
+ epoch_val_loss = validation_summary['loss']
+ if epoch_val_loss < min_val_loss:
+ min_val_loss = epoch_val_loss
+ best_ckpt_info = (step, min_val_loss, deepcopy(policy.state_dict()))
+ for k in list(validation_summary.keys()):
+ validation_summary[f'val_{k}'] = validation_summary.pop(k)
+ wandb.log(validation_summary, step=step)
+ print(f'Val loss: {epoch_val_loss:.5f}')
+ summary_string = ''
+ for k, v in validation_summary.items():
+ summary_string += f'{k}: {v.item():.3f} '
+ print(summary_string)
+
+ visualize_prediction(dataset_path_list, val_episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, 'val')
+ visualize_prediction(dataset_path_list, train_episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, 'train')
+
+
+ # training
+ policy.train()
+ optimizer.zero_grad()
+ data = next(train_dataloader)
+ observed_speed, commanded_speed = data
+ out, forward_dict = policy(observed_speed.cuda(), commanded_speed.cuda())
+ # backward
+ loss = forward_dict['loss']
+ loss.backward()
+ optimizer.step()
+ wandb.log(forward_dict, step=step) # not ideal: makes training 1-2% slower
+
+ if step % save_every == 0:
+ ckpt_path = os.path.join(ckpt_dir, f'actuator_net_step_{step}.ckpt')
+ torch.save(policy.state_dict(), ckpt_path)
+
+ ckpt_path = os.path.join(ckpt_dir, f'actuator_net_last.ckpt')
+ torch.save(policy.state_dict(), ckpt_path)
+
+ best_step, min_val_loss, best_state_dict = best_ckpt_info
+ ckpt_path = os.path.join(ckpt_dir, f'actuator_net_step_{best_step}.ckpt')
+ torch.save(best_state_dict, ckpt_path)
+ print(f'Training finished:\nval loss {min_val_loss:.6f} at step {best_step}')
+
+
+def visualize_prediction(dataset_path_list, episode_ids, policy, norm_stats, history_len, future_len, prediction_len, ckpt_dir, step, name):
+ num_vis = 2
+ episode_ids = episode_ids[:num_vis]
+ vis_path = [dataset_path_list[i] for i in episode_ids]
+
+ for i, dataset_path in enumerate(vis_path):
+ try:
+ with h5py.File(dataset_path, 'r') as root:
+ commanded_speed = root['/base_action'][()]
+ observed_speed = root['/obs_tracer'][()]
+ except Exception as ee:
+ print(f'Error loading {dataset_path} in visualize_prediction')
+ print(ee)
+ quit()
+
+ # commanded_speed = (commanded_speed - norm_stats["commanded_speed_mean"]) / norm_stats["commanded_speed_std"]
+ norm_observed_speed = (observed_speed - norm_stats["observed_speed_mean"]) / norm_stats["observed_speed_std"]
+ out_unnorm_fn = lambda x: (x * norm_stats["commanded_speed_std"]) + norm_stats["commanded_speed_mean"]
+
+ history_pad = np.zeros((history_len, 2))
+ future_pad = np.zeros((future_len, 2))
+ norm_observed_speed = np.concatenate([history_pad, norm_observed_speed, future_pad], axis=0)
+
+ episode_len = commanded_speed.shape[0]
+
+ all_pred = []
+ for t in range(0, episode_len, prediction_len):
+ offset_start_ts = t + history_len
+ policy_input = norm_observed_speed[offset_start_ts-history_len: offset_start_ts+future_len]
+ policy_input = torch.from_numpy(policy_input).float().unsqueeze(dim=0).cuda()
+ pred = policy(policy_input)
+ pred = pred.detach().cpu().numpy()[0]
+ all_pred += out_unnorm_fn(pred).tolist()
+ all_pred = np.array(all_pred)
+
+ plot_path = os.path.join(ckpt_dir, f'{name}{i}_step{step}_linear')
+ plt.figure()
+ plt.plot(commanded_speed[:, 0], label='commanded_speed_linear')
+ plt.plot(observed_speed[:, 0], label='observed_speed_linear')
+ plt.plot(all_pred[:, 0], label='pred_commanded_speed_linear')
+ # plot vertical grey dotted lines every prediction_len
+ for t in range(0, episode_len, prediction_len):
+ plt.axvline(t, linestyle='--', color='grey')
+ plt.legend()
+ plt.savefig(plot_path)
+ plt.close()
+
+ plot_path = os.path.join(ckpt_dir, f'{name}{i}_step{step}_angular')
+ plt.figure()
+ plt.plot(commanded_speed[:, 1], label='commanded_speed_angular')
+ plt.plot(observed_speed[:, 1], label='observed_speed_angular')
+ plt.plot(all_pred[:, 1], label='pred_commanded_speed_angular')
+ # plot vertical dotted lines every prediction_len
+ for t in range(0, episode_len, prediction_len):
+ plt.axvline(t, linestyle='--', color='grey')
+ plt.legend()
+ plt.savefig(plot_path)
+ plt.close()
+
+
+
+class ActuatorNetwork(nn.Module):
+
+ def __init__(self, prediction_len):
+ super().__init__()
+ d_model = 256
+ encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
+ self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=3)
+ self.pe = PositionalEncoding(d_model)
+ self.in_proj = nn.Linear(2, d_model)
+ self.out_proj = nn.Linear(d_model, 2)
+ self.prediction_len = prediction_len
+
+ def forward(self, src, tgt=None):
+ if tgt is not None: # training time
+ # (batch, seq, feature) -> (seq, batch, feature)
+ src = self.in_proj(src)
+ src = torch.einsum('b s d -> s b d', src)
+ src = self.pe(src)
+ out = self.transformer(src)
+
+ tgt = torch.einsum('b s d -> s b d', tgt)
+ assert(self.prediction_len == tgt.shape[0])
+ out = out[0: self.prediction_len] # take first few tokens only for prediction
+ out = self.out_proj(out)
+
+ l2_loss = F.mse_loss(out, tgt)
+ loss_dict = {'loss': l2_loss}
+ out = torch.einsum('s b d -> b s d', out)
+ return out, loss_dict
+ else:
+ src = self.in_proj(src)
+ src = torch.einsum('b s d -> s b d', src)
+ src = self.pe(src)
+ out = self.transformer(src)
+ out = out[0: self.prediction_len] # take first few tokens only for prediction
+ out = self.out_proj(out)
+ out = torch.einsum('s b d -> b s d', out)
+ return out
+
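+# A minimal inference sketch (not part of the original file): turning a window of observed
+# base speed into the next chunk of commanded speed, mirroring how visualize_prediction
+# calls the network. norm_stats uses the keys produced by get_norm_stats; inputs are dummies.
+def _actuator_net_inference_example(policy, norm_stats, history_len=50, future_len=50):
+    observed = np.zeros((history_len + future_len, 2))  # (linear, angular) speed window
+    normed = (observed - norm_stats['observed_speed_mean']) / norm_stats['observed_speed_std']
+    inp = torch.from_numpy(normed).float().unsqueeze(0).cuda()  # (1, seq, 2)
+    with torch.inference_mode():
+        pred = policy(inp)  # (1, prediction_len, 2), normalized commanded speed
+    command = pred.cpu().numpy()[0] * norm_stats['commanded_speed_std'] + norm_stats['commanded_speed_mean']
+    return command
+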
+
+
+class PositionalEncoding(nn.Module):
+ def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
+ super().__init__()
+ self.dropout = nn.Dropout(p=dropout)
+ position = torch.arange(max_len).unsqueeze(1)
+ div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
+ pe = torch.zeros(max_len, 1, d_model)
+ pe[:, 0, 0::2] = torch.sin(position * div_term)
+ pe[:, 0, 1::2] = torch.cos(position * div_term)
+ self.register_buffer('pe', pe)
+
+ def forward(self, x):
+ """
+ Arguments:
+ x: Tensor, shape ``[seq_len, batch_size, embedding_dim]``
+ """
+ x = x + self.pe[:x.size(0)]
+ return self.dropout(x)
+
+def get_norm_stats(dataset_path_list):
+ all_commanded_speed = []
+ all_observed_speed = []
+ all_episode_len = []
+ for dataset_path in dataset_path_list:
+ try:
+ with h5py.File(dataset_path, 'r') as root:
+ commanded_speed = root['/base_action'][()]
+ observed_speed = root['/obs_tracer'][()]
+ except Exception as e:
+ print(f'Error loading {dataset_path} in get_norm_stats')
+ print(e)
+ quit()
+ all_commanded_speed.append(torch.from_numpy(commanded_speed))
+ all_observed_speed.append(torch.from_numpy(observed_speed))
+ all_episode_len.append(len(commanded_speed))
+ all_commanded_speed = torch.cat(all_commanded_speed, dim=0)
+ all_observed_speed = torch.cat(all_observed_speed, dim=0)
+
+ # normalize all_commanded_speed
+ commanded_speed_mean = all_commanded_speed.mean(dim=[0]).float()
+ commanded_speed_std = all_commanded_speed.std(dim=[0]).float()
+ commanded_speed_std = torch.clip(commanded_speed_std, 1e-2, np.inf) # clipping
+
+ # normalize all_observed_speed
+ observed_speed_mean = all_observed_speed.mean(dim=[0]).float()
+ observed_speed_std = all_observed_speed.std(dim=[0]).float()
+ observed_speed_std = torch.clip(observed_speed_std, 1e-2, np.inf) # clipping
+
+ stats = {"commanded_speed_mean": commanded_speed_mean.numpy(), "commanded_speed_std": commanded_speed_std.numpy(),
+ "observed_speed_mean": observed_speed_mean.numpy(), "observed_speed_std": observed_speed_std.numpy()}
+
+ return stats, all_episode_len
+
+
+class EpisodicDataset(torch.utils.data.Dataset):
+ def __init__(self, dataset_path_list, norm_stats, episode_ids, episode_len, history_len, future_len, prediction_len):
+ super(EpisodicDataset).__init__()
+ self.episode_ids = episode_ids
+ self.dataset_path_list = dataset_path_list
+ self.norm_stats = norm_stats
+ self.episode_len = episode_len
+ self.cumulative_len = np.cumsum(self.episode_len)
+ self.max_episode_len = max(episode_len)
+ self.history_len = history_len
+ self.future_len = future_len
+ self.prediction_len = prediction_len
+ self.is_sim = False
+ self.history_pad = np.zeros((self.history_len, 2))
+ self.future_pad = np.zeros((self.future_len, 2))
+ self.prediction_pad = np.zeros((self.prediction_len, 2))
+ self.__getitem__(0) # initialize self.is_sim
+
+ def __len__(self):
+ return sum(self.episode_len)
+
+ def _locate_transition(self, index):
+ assert index < self.cumulative_len[-1]
+ episode_index = np.argmax(self.cumulative_len > index) # argmax returns first True index
+ start_ts = index - (self.cumulative_len[episode_index] - self.episode_len[episode_index])
+ episode_id = self.episode_ids[episode_index]
+ return episode_id, start_ts
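+ # e.g. with episode_len = [400, 400] and episode_ids = [3, 7]:
+ # index 0 -> (episode_id 3, start_ts 0); index 401 -> (episode_id 7, start_ts 1)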
+
+ def __getitem__(self, index):
+ episode_id, start_ts = self._locate_transition(index)
+ dataset_path = self.dataset_path_list[episode_id]
+ try:
+ # print(dataset_path)
+ with h5py.File(dataset_path, 'r') as root:
+ commanded_speed = root['/base_action'][()]
+ observed_speed = root['/obs_tracer'][()]
+ observed_speed = np.concatenate([self.history_pad, observed_speed, self.future_pad], axis=0)
+ commanded_speed = np.concatenate([commanded_speed, self.prediction_pad], axis=0)
+
+ offset_start_ts = start_ts + self.history_len
+ commanded_speed = commanded_speed[start_ts: start_ts+self.prediction_len]
+ observed_speed = observed_speed[offset_start_ts-self.history_len: offset_start_ts+self.future_len]
+
+ commanded_speed = torch.from_numpy(commanded_speed).float()
+ observed_speed = torch.from_numpy(observed_speed).float()
+
+ # normalize to mean 0 std 1
+ commanded_speed = (commanded_speed - self.norm_stats["commanded_speed_mean"]) / self.norm_stats["commanded_speed_std"]
+ observed_speed = (observed_speed - self.norm_stats["observed_speed_mean"]) / self.norm_stats["observed_speed_std"]
+
+ except Exception as e:
+ print(f'Error loading {dataset_path} in __getitem__')
+ print(e)
+ quit()
+
+ # print(image_data.dtype, qpos_data.dtype, action_data.dtype, is_pad.dtype)
+ return observed_speed, commanded_speed
+
+
+
+
+if __name__ == '__main__':
+ main()
diff --git a/docs/src/train_latent_model.py b/docs/src/train_latent_model.py
new file mode 100644
index 00000000..c8512171
--- /dev/null
+++ b/docs/src/train_latent_model.py
@@ -0,0 +1,470 @@
+import torch
+import numpy as np
+import os
+import pickle
+import argparse
+import matplotlib.pyplot as plt
+from copy import deepcopy
+from tqdm import tqdm
+from einops import rearrange
+import torch.nn.functional as F
+
+from constants import DT
+from constants import PUPPET_GRIPPER_JOINT_OPEN
+from utils import load_data # data functions
+from utils import sample_box_pose, sample_insertion_pose # robot functions
+from utils import compute_dict_mean, set_seed, detach_dict # helper functions
+from policy import ACTPolicy, CNNMLPPolicy
+from visualize_episodes import save_videos
+from detr.models.latent_model import Latent_Model_Transformer
+
+from sim_env import BOX_POSE
+
+import IPython
+e = IPython.embed
+
+def main(args):
+ set_seed(1)
+ # command line parameters
+ is_eval = args['eval']
+ ckpt_dir = args['ckpt_dir']
+ policy_class = args['policy_class']
+ onscreen_render = args['onscreen_render']
+ task_name = args['task_name']
+ batch_size_train = args['batch_size']
+ batch_size_val = args['batch_size']
+ num_epochs = args['num_epochs']
+
+ # get task parameters
+ is_sim = task_name[:4] == 'sim_'
+ if is_sim:
+ from constants import SIM_TASK_CONFIGS
+ task_config = SIM_TASK_CONFIGS[task_name]
+ else:
+ from aloha_scripts.constants import TASK_CONFIGS
+ task_config = TASK_CONFIGS[task_name]
+ dataset_dir = task_config['dataset_dir']
+ num_episodes = task_config['num_episodes']
+ episode_len = task_config['episode_len']
+ camera_names = task_config['camera_names']
+ name_filter = task_config.get('name_filter', lambda n: True)
+
+ # fixed parameters
+ state_dim = 14
+ lr_backbone = 1e-5
+ backbone = 'resnet18'
+ if policy_class == 'ACT':
+ enc_layers = 4
+ dec_layers = 7
+ nheads = 8
+ policy_config = {'lr': args['lr'],
+ 'num_queries': args['chunk_size'],
+ 'kl_weight': args['kl_weight'],
+ 'hidden_dim': args['hidden_dim'],
+ 'dim_feedforward': args['dim_feedforward'],
+ 'lr_backbone': lr_backbone,
+ 'backbone': backbone,
+ 'enc_layers': enc_layers,
+ 'dec_layers': dec_layers,
+ 'nheads': nheads,
+ 'camera_names': camera_names,
+ 'vq': True,
+ 'vq_class': args['vq_class'],
+ 'vq_dim': args['vq_dim'],
+ }
+ elif policy_class == 'CNNMLP':
+ policy_config = {'lr': args['lr'], 'lr_backbone': lr_backbone, 'backbone' : backbone, 'num_queries': 1,
+ 'camera_names': camera_names,}
+ else:
+ raise NotImplementedError
+
+ config = {
+ 'num_epochs': num_epochs,
+ 'ckpt_dir': ckpt_dir,
+ 'episode_len': episode_len,
+ 'state_dim': state_dim,
+ 'lr': args['lr'],
+ 'policy_class': policy_class,
+ 'onscreen_render': onscreen_render,
+ 'policy_config': policy_config,
+ 'task_name': task_name,
+ 'seed': args['seed'],
+ 'temporal_agg': args['temporal_agg'],
+ 'camera_names': camera_names,
+ 'real_robot': not is_sim
+ }
+
+ # if is_eval:
+ # ckpt_names = [f'policy_best.ckpt']
+ # results = []
+ # for ckpt_name in ckpt_names:
+ # success_rate, avg_return = eval_bc(config, ckpt_name, save_episode=True)
+ # results.append([ckpt_name, success_rate, avg_return])
+
+ # for ckpt_name, success_rate, avg_return in results:
+ # print(f'{ckpt_name}: {success_rate=} {avg_return=}')
+ # print()
+ # exit()
+
+ train_dataloader, val_dataloader, stats, _ = load_data(dataset_dir, name_filter, camera_names, batch_size_train, batch_size_val)
+
+ # save dataset stats
+ # if not os.path.isdir(ckpt_dir):
+ # os.makedirs(ckpt_dir)
+ # stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')
+ # with open(stats_path, 'wb') as f:
+ # pickle.dump(stats, f)
+
+ ckpt_name = f'policy_last.ckpt'
+ best_ckpt_info = train_bc(train_dataloader, val_dataloader, config, ckpt_name)
+ best_epoch, min_val_loss, best_state_dict = best_ckpt_info
+
+ # save best checkpoint
+ ckpt_path = os.path.join(ckpt_dir, f'latent_model_best.ckpt')
+ torch.save(best_state_dict, ckpt_path)
+ print(f'Best ckpt, val loss {min_val_loss:.6f} @ epoch {best_epoch}')
+
+
+def make_policy(policy_class, policy_config):
+ if policy_class == 'ACT':
+ policy = ACTPolicy(policy_config)
+ elif policy_class == 'CNNMLP':
+ policy = CNNMLPPolicy(policy_config)
+ else:
+ raise NotImplementedError
+ return policy
+
+
+# def make_optimizer(policy_class, policy):
+# if policy_class == 'ACT':
+# optimizer = policy.configure_optimizers()
+# elif policy_class == 'CNNMLP':
+# optimizer = policy.configure_optimizers()
+# else:
+# raise NotImplementedError
+# return optimizer
+
+
+def get_image(ts, camera_names):
+ curr_images = []
+ for cam_name in camera_names:
+ curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')
+ curr_images.append(curr_image)
+ curr_image = np.stack(curr_images, axis=0)
+ curr_image = torch.from_numpy(curr_image / 255.0).float().cuda().unsqueeze(0)
+ return curr_image
+
+
+# def eval_bc(config, ckpt_name, save_episode=True):
+# set_seed(1000)
+# ckpt_dir = config['ckpt_dir']
+# state_dim = config['state_dim']
+# real_robot = config['real_robot']
+# policy_class = config['policy_class']
+# onscreen_render = config['onscreen_render']
+# policy_config = config['policy_config']
+# camera_names = config['camera_names']
+# max_timesteps = config['episode_len']
+# task_name = config['task_name']
+# temporal_agg = config['temporal_agg']
+# onscreen_cam = 'angle'
+
+# # load policy and stats
+# ckpt_path = os.path.join(ckpt_dir, ckpt_name)
+# policy = make_policy(policy_class, policy_config)
+# loading_status = policy.load_state_dict(torch.load(ckpt_path))
+# print(loading_status)
+# policy.cuda()
+# policy.eval()
+# print(f'Loaded: {ckpt_path}')
+# stats_path = os.path.join(ckpt_dir, f'dataset_stats.pkl')
+# with open(stats_path, 'rb') as f:
+# stats = pickle.load(f)
+
+# pre_process = lambda s_qpos: (s_qpos - stats['qpos_mean']) / stats['qpos_std']
+# post_process = lambda a: a * stats['action_std'] + stats['action_mean']
+
+# # load environment
+# if real_robot:
+# from aloha_scripts.robot_utils import move_grippers # requires aloha
+# from aloha_scripts.real_env import make_real_env # requires aloha
+# env = make_real_env(init_node=True)
+# env_max_reward = 0
+# else:
+# from sim_env import make_sim_env
+# env = make_sim_env(task_name)
+# env_max_reward = env.task.max_reward
+
+# query_frequency = policy_config['num_queries']
+# if temporal_agg:
+# query_frequency = 1
+# num_queries = policy_config['num_queries']
+
+# max_timesteps = int(max_timesteps * 1) # may increase for real-world tasks
+
+# num_rollouts = 50
+# episode_returns = []
+# highest_rewards = []
+# for rollout_id in range(num_rollouts):
+# rollout_id += 0
+# ### set task
+# if 'sim_transfer_cube' in task_name:
+# BOX_POSE[0] = sample_box_pose() # used in sim reset
+# elif 'sim_insertion' in task_name:
+# BOX_POSE[0] = np.concatenate(sample_insertion_pose()) # used in sim reset
+
+# ts = env.reset()
+
+# ### onscreen render
+# if onscreen_render:
+# ax = plt.subplot()
+# plt_img = ax.imshow(env._physics.render(height=480, width=640, camera_id=onscreen_cam))
+# plt.ion()
+
+# ### evaluation loop
+# if temporal_agg:
+# all_time_actions = torch.zeros([max_timesteps, max_timesteps+num_queries, state_dim]).cuda()
+
+# qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()
+# image_list = [] # for visualization
+# qpos_list = []
+# target_qpos_list = []
+# rewards = []
+# with torch.inference_mode():
+# for t in range(max_timesteps):
+# ### update onscreen render and wait for DT
+# if onscreen_render:
+# image = env._physics.render(height=480, width=640, camera_id=onscreen_cam)
+# plt_img.set_data(image)
+# plt.pause(DT)
+
+# ### process previous timestep to get qpos and image_list
+# obs = ts.observation
+# if 'images' in obs:
+# image_list.append(obs['images'])
+# else:
+# image_list.append({'main': obs['image']})
+# qpos_numpy = np.array(obs['qpos'])
+# qpos = pre_process(qpos_numpy)
+# qpos = torch.from_numpy(qpos).float().cuda().unsqueeze(0)
+# qpos_history[:, t] = qpos
+# curr_image = get_image(ts, camera_names)
+
+# ### query policy
+# if config['policy_class'] == "ACT":
+# if t % query_frequency == 0:
+# all_actions = policy(qpos, curr_image)
+# if temporal_agg:
+# all_time_actions[[t], t:t+num_queries] = all_actions
+# actions_for_curr_step = all_time_actions[:, t]
+# actions_populated = torch.all(actions_for_curr_step != 0, axis=1)
+# actions_for_curr_step = actions_for_curr_step[actions_populated]
+# k = 0.01
+# exp_weights = np.exp(-k * np.arange(len(actions_for_curr_step)))
+# exp_weights = exp_weights / exp_weights.sum()
+# exp_weights = torch.from_numpy(exp_weights).cuda().unsqueeze(dim=1)
+# raw_action = (actions_for_curr_step * exp_weights).sum(dim=0, keepdim=True)
+# else:
+# raw_action = all_actions[:, t % query_frequency]
+# elif config['policy_class'] == "CNNMLP":
+# raw_action = policy(qpos, curr_image)
+# else:
+# raise NotImplementedError
+
+# ### post-process actions
+# raw_action = raw_action.squeeze(0).cpu().numpy()
+# action = post_process(raw_action)
+# target_qpos = action
+
+# ### step the environment
+# ts = env.step(target_qpos)
+
+# ### for visualization
+# qpos_list.append(qpos_numpy)
+# target_qpos_list.append(target_qpos)
+# rewards.append(ts.reward)
+
+# plt.close()
+# if real_robot:
+# move_grippers([env.puppet_bot_left, env.puppet_bot_right], [PUPPET_GRIPPER_JOINT_OPEN] * 2, move_time=0.5) # open
+# pass
+
+# rewards = np.array(rewards)
+# episode_return = np.sum(rewards[rewards!=None])
+# episode_returns.append(episode_return)
+# episode_highest_reward = np.max(rewards)
+# highest_rewards.append(episode_highest_reward)
+# print(f'Rollout {rollout_id}\n{episode_return=}, {episode_highest_reward=}, {env_max_reward=}, Success: {episode_highest_reward==env_max_reward}')
+
+# if save_episode:
+# save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))
+
+# success_rate = np.mean(np.array(highest_rewards) == env_max_reward)
+# avg_return = np.mean(episode_returns)
+# summary_str = f'\nSuccess rate: {success_rate}\nAverage return: {avg_return}\n\n'
+# for r in range(env_max_reward+1):
+# more_or_equal_r = (np.array(highest_rewards) >= r).sum()
+# more_or_equal_r_rate = more_or_equal_r / num_rollouts
+# summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\n'
+
+# print(summary_str)
+
+# # save success rate to txt
+# result_file_name = 'result_' + ckpt_name.split('.')[0] + '.txt'
+# with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:
+# f.write(summary_str)
+# f.write(repr(episode_returns))
+# f.write('\n\n')
+# f.write(repr(highest_rewards))
+
+# return success_rate, avg_return
+
+
+def forward_pass(data, policy, latent_model):
+ image_data, qpos_data, action_data, is_pad = data
+ image_data, qpos_data, action_data, is_pad = image_data.cuda(), qpos_data.cuda(), action_data.cuda(), is_pad.cuda()
+ forward_dict = {}
+ gt_labels = policy.vq_encode(qpos_data, action_data, is_pad)
+ inputs = torch.cat([torch.zeros_like(gt_labels)[:, [0]], gt_labels[:, :-1]], dim=1)
+ output_logits = latent_model(inputs)
+ ce_loss = F.cross_entropy(output_logits, gt_labels)
+
+ with torch.no_grad():
+ output_labels = F.one_hot(torch.argmax(output_logits, dim=-1), num_classes=gt_labels.shape[-1]).float()
+ # output_latents = F.softmax(output_logits, dim=-1)
+ l1_error = F.l1_loss(output_labels, gt_labels, reduction='mean')
+ # l1_errors = []
+ # for i in range(l1_errors.shape[1]):
+ # l1_errors.append(torch.mean(l1_errors[:, i]).item())
+
+ forward_dict['loss'] = ce_loss
+ forward_dict['l1_error'] = l1_error
+
+ return forward_dict
+
+
+def train_bc(train_dataloader, val_dataloader, config, ckpt_name):
+ num_epochs = config['num_epochs']
+ ckpt_dir = config['ckpt_dir']
+ seed = config['seed']
+ policy_class = config['policy_class']
+ policy_config = config['policy_config']
+
+ set_seed(seed)
+
+ vq_dim = config['policy_config']['vq_dim']
+ vq_class = config['policy_config']['vq_class']
+ latent_model = Latent_Model_Transformer(vq_dim, vq_dim, vq_class)
+ latent_model.cuda()
+
+ ckpt_path = os.path.join(ckpt_dir, ckpt_name)
+ policy = make_policy(policy_class, policy_config)
+ loading_status = policy.load_state_dict(torch.load(ckpt_path))
+ policy.eval()
+ policy.cuda()
+
+ optimizer = torch.optim.AdamW(latent_model.parameters(), lr=config['lr'])
+
+ train_history = []
+ validation_history = []
+ min_val_loss = np.inf
+ best_ckpt_info = None
+ for epoch in tqdm(range(num_epochs)):
+ print(f'\nEpoch {epoch}')
+ # validation
+ with torch.inference_mode():
+ latent_model.eval()
+ epoch_dicts = []
+ for batch_idx, data in enumerate(val_dataloader):
+ forward_dict = forward_pass(data, policy, latent_model)
+ epoch_dicts.append(forward_dict)
+ epoch_summary = compute_dict_mean(epoch_dicts)
+ validation_history.append(epoch_summary)
+
+ epoch_val_loss = epoch_summary['loss']
+ if epoch_val_loss < min_val_loss:
+ min_val_loss = epoch_val_loss
+ best_ckpt_info = (epoch, min_val_loss, deepcopy(latent_model.state_dict()))
+ print(f'Val loss: {epoch_val_loss:.5f}')
+ summary_string = ''
+ for k, v in epoch_summary.items():
+ summary_string += f'{k}: {v.item():.3f} '
+ print(summary_string)
+
+ # training
+ optimizer.zero_grad()
+ for batch_idx, data in enumerate(train_dataloader):
+ forward_dict = forward_pass(data, policy, latent_model)
+ # backward
+ loss = forward_dict['loss']
+ loss.backward()
+ optimizer.step()
+ optimizer.zero_grad()
+ train_history.append(detach_dict(forward_dict))
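+ # The slice below averages only this epoch's entries of train_history, assuming every epoch
+ # contributes exactly (batch_idx + 1) entries, i.e. a fixed number of batches per epoch.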
+ epoch_summary = compute_dict_mean(train_history[(batch_idx+1)*epoch:(batch_idx+1)*(epoch+1)])
+ epoch_train_loss = epoch_summary['loss']
+ print(f'Train loss: {epoch_train_loss:.5f}')
+ summary_string = ''
+ for k, v in epoch_summary.items():
+ summary_string += f'{k}: {v.item():.3f} '
+ print(summary_string)
+
+ if epoch % 100 == 0:
+ ckpt_path = os.path.join(ckpt_dir, f'latent_model_epoch_{epoch}_seed_{seed}.ckpt')
+ torch.save(latent_model.state_dict(), ckpt_path)
+ plot_history(train_history, validation_history, epoch, ckpt_dir, seed)
+
+ ckpt_path = os.path.join(ckpt_dir, 'latent_model_last.ckpt')
+ torch.save(latent_model.state_dict(), ckpt_path)
+
+ best_epoch, min_val_loss, best_state_dict = best_ckpt_info
+ ckpt_path = os.path.join(ckpt_dir, f'latent_model_epoch_{best_epoch}_seed_{seed}.ckpt')
+ torch.save(best_state_dict, ckpt_path)
+ print(f'Training finished:\nSeed {seed}, val loss {min_val_loss:.6f} at epoch {best_epoch}')
+
+ # save training curves
+ plot_history(train_history, validation_history, num_epochs, ckpt_dir, seed)
+
+ return best_ckpt_info
+
+
+def plot_history(train_history, validation_history, num_epochs, ckpt_dir, seed):
+ # save training curves
+ for key in train_history[0]:
+ plot_path = os.path.join(ckpt_dir, f'latent_model_val_{key}_seed_{seed}.png')
+ plt.figure()
+ train_values = [summary[key].item() for summary in train_history]
+ val_values = [summary[key].item() for summary in validation_history]
+ plt.plot(np.linspace(0, num_epochs-1, len(train_history)), train_values, label='train')
+ plt.plot(np.linspace(0, num_epochs-1, len(validation_history)), val_values, label='validation')
+ # plt.ylim([-0.1, 1])
+ plt.tight_layout()
+ plt.legend()
+ plt.title(key)
+ plt.savefig(plot_path)
+ print(f'Saved plots to {ckpt_dir}')
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--eval', action='store_true')
+ parser.add_argument('--onscreen_render', action='store_true')
+ parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)
+ parser.add_argument('--policy_class', action='store', type=str, help='policy_class, capitalize', required=True)
+ parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)
+ parser.add_argument('--batch_size', action='store', type=int, help='batch_size', required=True)
+ parser.add_argument('--seed', action='store', type=int, help='seed', required=True)
+ parser.add_argument('--num_epochs', action='store', type=int, help='num_epochs', required=True)
+ parser.add_argument('--lr', action='store', type=float, help='lr', required=True)
+
+ # for ACT
+ parser.add_argument('--kl_weight', action='store', type=int, help='KL Weight', required=False)
+ parser.add_argument('--chunk_size', action='store', type=int, help='chunk_size', required=False)
+ parser.add_argument('--hidden_dim', action='store', type=int, help='hidden_dim', required=False)
+ parser.add_argument('--dim_feedforward', action='store', type=int, help='dim_feedforward', required=False)
+ parser.add_argument('--temporal_agg', action='store_true')
+ parser.add_argument('--use_vq', action='store_true')
+ parser.add_argument('--vq_class', action='store', type=int, help='vq_class')
+ parser.add_argument('--vq_dim', action='store', type=int, help='vq_dim')
+
+ main(vars(parser.parse_args()))
diff --git a/docs/src/truncate_data.py b/docs/src/truncate_data.py
new file mode 100644
index 00000000..8a7586b5
--- /dev/null
+++ b/docs/src/truncate_data.py
@@ -0,0 +1,158 @@
+"""
+Example usage:
+$ python3 script/compress_data.py --dataset_dir /scr/lucyshi/dataset/aloha_test
+"""
+import os
+import h5py
+import cv2
+import numpy as np
+import argparse
+from tqdm import tqdm
+
+# Constants
+DT = 0.02
+JOINT_NAMES = ["waist", "shoulder", "elbow", "forearm_roll", "wrist_angle", "wrist_rotate"]
+STATE_NAMES = JOINT_NAMES + ["gripper"]
+TRUNCATE_LEN = 2250
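+# Number of timesteps kept per episode; everything after TRUNCATE_LEN is dropped.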
+
+
+def compress_dataset(input_dataset_path, output_dataset_path):
+ # Check if output path exists
+ if os.path.exists(output_dataset_path):
+ print(f"The file {output_dataset_path} already exists. Exiting...")
+ return
+
+ # Load the source dataset
+ with h5py.File(input_dataset_path, 'r') as infile:
+ # Create the truncated dataset
+ with h5py.File(output_dataset_path, 'w') as outfile:
+
+ outfile.attrs['sim'] = infile.attrs['sim']
+ outfile.attrs['compress'] = True
+
+ # Copy non-image data directly
+ for key in infile.keys():
+ if key != 'observations' and key != 'compress_len':
+ data = infile[key][:TRUNCATE_LEN]
+ out_data = outfile.create_dataset(key, (TRUNCATE_LEN, data.shape[1]))
+ out_data[:] = data
+
+ data_compress_len = infile['compress_len']
+ out_data_compress_len = outfile.create_dataset('compress_len', data_compress_len.shape)
+ out_data_compress_len[:] = data_compress_len
+
+ # Create observation group in the output
+ obs_group = infile['observations']
+ out_obs_group = outfile.create_group('observations')
+ for key in obs_group.keys():
+ if key != 'images':
+ data = obs_group[key][:TRUNCATE_LEN]
+ out_data = out_obs_group.create_dataset(key, (TRUNCATE_LEN, data.shape[1]))
+ out_data[:] = data
+
+ image_group = obs_group['images']
+ out_image_group = out_obs_group.create_group('images')
+
+ for cam_name in image_group.keys():
+ data = image_group[cam_name][:TRUNCATE_LEN]
+ out_data = out_image_group.create_dataset(cam_name, (TRUNCATE_LEN, data.shape[1]), dtype='uint8')
+ out_data[:] = data
+
+
+ print(f"Truncated dataset saved to {output_dataset_path}")
+
+
+def save_videos(video, dt, video_path=None):
+ if isinstance(video, list):
+ cam_names = list(video[0].keys())
+ h, w, _ = video[0][cam_names[0]].shape
+ w = w * len(cam_names)
+ fps = int(1/dt)
+ out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
+ # bitrate = 1000000
+ # out.set(cv2.VIDEOWRITER_PROP_BITRATE, bitrate)
+ for ts, image_dict in enumerate(video):
+ images = []
+ for cam_name in cam_names:
+ image = image_dict[cam_name]
+ image = image[:, :, [2, 1, 0]] # swap B and R channel
+ images.append(image)
+ images = np.concatenate(images, axis=1)
+ out.write(images)
+ out.release()
+ print(f'Saved video to: {video_path}')
+ elif isinstance(video, dict):
+ cam_names = list(video.keys())
+ # Remove depth images
+ cam_names = [cam_name for cam_name in cam_names if '_depth' not in cam_name]
+ all_cam_videos = []
+ for cam_name in cam_names:
+ all_cam_videos.append(video[cam_name])
+ all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension
+
+ n_frames, h, w, _ = all_cam_videos.shape
+ fps = int(1 / dt)
+ out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
+ for t in range(n_frames):
+ image = all_cam_videos[t]
+ image = image[:, :, [2, 1, 0]] # swap B and R channel
+ out.write(image)
+ out.release()
+ print(f'Saved video to: {video_path}')
+
+
+def load_and_save_first_episode_video(dataset_dir, video_path):
+ dataset_name = 'episode_0'
+ _, _, _, _, image_dict = load_hdf5(dataset_dir, dataset_name)
+ save_videos(image_dict, DT, video_path=video_path)
+
+
+def load_hdf5(dataset_dir, dataset_name):
+ dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')
+ if not os.path.isfile(dataset_path):
+ print(f'Dataset does not exist at \n{dataset_path}\n')
+ exit()
+
+ with h5py.File(dataset_path, 'r') as root:
+ compressed = root.attrs.get('compress', False)
+ image_dict = dict()
+ for cam_name in root[f'/observations/images/'].keys():
+ image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]
+ if compressed:
+ compress_len = root['/compress_len'][()]
+
+ if compressed:
+ for cam_id, cam_name in enumerate(image_dict.keys()):
+ padded_compressed_image_list = image_dict[cam_name]
+ image_list = []
+ for frame_id, padded_compressed_image in enumerate(padded_compressed_image_list):
+ image_len = int(compress_len[cam_id, frame_id])
+ compressed_image = padded_compressed_image
+ image = cv2.imdecode(compressed_image, 1)
+ image_list.append(image)
+ image_dict[cam_name] = image_list
+
+ return None, None, None, None, image_dict # Return only the image dict for this application
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description="Truncate all HDF5 datasets in a directory.")
+ parser.add_argument('--dataset_dir', action='store', type=str, required=True, help='Directory containing the datasets to truncate.')
+
+ args = parser.parse_args()
+
+ output_dataset_dir = args.dataset_dir + '_truncated'
+ os.makedirs(output_dataset_dir, exist_ok=True)
+
+ # Iterate over each file in the directory
+ for filename in tqdm(os.listdir(args.dataset_dir), desc="Truncating data"):
+ if filename.endswith('.hdf5'):
+ input_path = os.path.join(args.dataset_dir, filename)
+ output_path = os.path.join(output_dataset_dir, filename)
+ compress_dataset(input_path, output_path)
+
+ # After processing all datasets, load and save the video for the first episode
+ print(f'Saving video for episode 0 in {output_dataset_dir}')
+ video_path = os.path.join(output_dataset_dir, 'episode_0_video.mp4')
+ load_and_save_first_episode_video(output_dataset_dir, video_path)
+
diff --git a/docs/src/utils.py b/docs/src/utils.py
new file mode 100644
index 00000000..d65cd40c
--- /dev/null
+++ b/docs/src/utils.py
@@ -0,0 +1,360 @@
+import numpy as np
+import torch
+import os
+import h5py
+import pickle
+import fnmatch
+import cv2
+from time import time
+from torch.utils.data import TensorDataset, DataLoader
+import torchvision.transforms as transforms
+
+import IPython
+e = IPython.embed
+
+def flatten_list(l):
+ return [item for sublist in l for item in sublist]
+
+class EpisodicDataset(torch.utils.data.Dataset):
+ def __init__(self, dataset_path_list, camera_names, norm_stats, episode_ids, episode_len, chunk_size, policy_class):
+ super().__init__()
+ self.episode_ids = episode_ids
+ self.dataset_path_list = dataset_path_list
+ self.camera_names = camera_names
+ self.norm_stats = norm_stats
+ self.episode_len = episode_len
+ self.chunk_size = chunk_size
+ self.cumulative_len = np.cumsum(self.episode_len)
+ self.max_episode_len = max(episode_len)
+ self.policy_class = policy_class
+ if self.policy_class == 'Diffusion':
+ self.augment_images = True
+ else:
+ self.augment_images = False
+ self.transformations = None
+ self.__getitem__(0) # initialize self.is_sim and self.transformations
+ self.is_sim = False
+
+ # def __len__(self):
+ # return sum(self.episode_len)
+
+ def _locate_transition(self, index):
+ assert index < self.cumulative_len[-1]
+ episode_index = np.argmax(self.cumulative_len > index) # argmax returns first True index
+ start_ts = index - (self.cumulative_len[episode_index] - self.episode_len[episode_index])
+ episode_id = self.episode_ids[episode_index]
+ return episode_id, start_ts
+
+ def __getitem__(self, index):
+ episode_id, start_ts = self._locate_transition(index)
+ dataset_path = self.dataset_path_list[episode_id]
+ try:
+ # print(dataset_path)
+ with h5py.File(dataset_path, 'r') as root:
+ try: # some legacy data does not have this attribute
+ is_sim = root.attrs['sim']
+ except KeyError:
+ is_sim = False
+ compressed = root.attrs.get('compress', False)
+ if '/base_action' in root:
+ base_action = root['/base_action'][()]
+ base_action = preprocess_base_action(base_action)
+ action = np.concatenate([root['/action'][()], base_action], axis=-1)
+ else:
+ action = root['/action'][()]
+ dummy_base_action = np.zeros([action.shape[0], 2])
+ action = np.concatenate([action, dummy_base_action], axis=-1)
+ original_action_shape = action.shape
+ episode_len = original_action_shape[0]
+ # get observation at start_ts only
+ qpos = root['/observations/qpos'][start_ts]
+ qvel = root['/observations/qvel'][start_ts]
+ image_dict = dict()
+ for cam_name in self.camera_names:
+ image_dict[cam_name] = root[f'/observations/images/{cam_name}'][start_ts]
+
+ if compressed:
+ for cam_name in image_dict.keys():
+ decompressed_image = cv2.imdecode(image_dict[cam_name], 1)
+ image_dict[cam_name] = np.array(decompressed_image)
+
+ # get all actions after and including start_ts
+ if is_sim:
+ action = action[start_ts:]
+ action_len = episode_len - start_ts
+ else:
+ action = action[max(0, start_ts - 1):] # hack, to make timesteps more aligned
+ action_len = episode_len - max(0, start_ts - 1) # hack, to make timesteps more aligned
+
+ # self.is_sim = is_sim
+ padded_action = np.zeros((self.max_episode_len, original_action_shape[1]), dtype=np.float32)
+ padded_action[:action_len] = action
+ is_pad = np.zeros(self.max_episode_len)
+ is_pad[action_len:] = 1
+
+ padded_action = padded_action[:self.chunk_size]
+ is_pad = is_pad[:self.chunk_size]
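+ # Actions from start_ts onward are zero-padded to the longest episode length, is_pad marks
+ # the padded tail, and both are then cut to chunk_size so every sample is a fixed-length
+ # action chunk regardless of where in the episode it starts.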
+
+ # new axis for different cameras
+ all_cam_images = []
+ for cam_name in self.camera_names:
+ all_cam_images.append(image_dict[cam_name])
+ all_cam_images = np.stack(all_cam_images, axis=0)
+
+ # construct observations
+ image_data = torch.from_numpy(all_cam_images)
+ qpos_data = torch.from_numpy(qpos).float()
+ action_data = torch.from_numpy(padded_action).float()
+ is_pad = torch.from_numpy(is_pad).bool()
+
+ # channel last
+ image_data = torch.einsum('k h w c -> k c h w', image_data)
+
+ # augmentation
+ if self.transformations is None:
+ print('Initializing transformations')
+ original_size = image_data.shape[2:]
+ ratio = 0.95
+ self.transformations = [
+ transforms.RandomCrop(size=[int(original_size[0] * ratio), int(original_size[1] * ratio)]),
+ transforms.Resize(original_size, antialias=True),
+ transforms.RandomRotation(degrees=[-5.0, 5.0], expand=False),
+ transforms.ColorJitter(brightness=0.3, contrast=0.4, saturation=0.5) #, hue=0.08)
+ ]
+
+ if self.augment_images:
+ for transform in self.transformations:
+ image_data = transform(image_data)
+
+ # normalize image and change dtype to float
+ image_data = image_data / 255.0
+
+ if self.policy_class == 'Diffusion':
+ # normalize to [-1, 1]
+ action_data = ((action_data - self.norm_stats["action_min"]) / (self.norm_stats["action_max"] - self.norm_stats["action_min"])) * 2 - 1
+ else:
+ # normalize to mean 0 std 1
+ action_data = (action_data - self.norm_stats["action_mean"]) / self.norm_stats["action_std"]
+
+ qpos_data = (qpos_data - self.norm_stats["qpos_mean"]) / self.norm_stats["qpos_std"]
+
+ except Exception as err:
+ print(f'Error loading {dataset_path} in __getitem__: {err}')
+ quit()
+
+ # print(image_data.dtype, qpos_data.dtype, action_data.dtype, is_pad.dtype)
+ return image_data, qpos_data, action_data, is_pad
+
+
+def get_norm_stats(dataset_path_list):
+ all_qpos_data = []
+ all_action_data = []
+ all_episode_len = []
+
+ for dataset_path in dataset_path_list:
+ try:
+ with h5py.File(dataset_path, 'r') as root:
+ qpos = root['/observations/qpos'][()]
+ qvel = root['/observations/qvel'][()]
+ if '/base_action' in root:
+ base_action = root['/base_action'][()]
+ base_action = preprocess_base_action(base_action)
+ action = np.concatenate([root['/action'][()], base_action], axis=-1)
+ else:
+ action = root['/action'][()]
+ dummy_base_action = np.zeros([action.shape[0], 2])
+ action = np.concatenate([action, dummy_base_action], axis=-1)
+ except Exception as e:
+ print(f'Error loading {dataset_path} in get_norm_stats')
+ print(e)
+ quit()
+ all_qpos_data.append(torch.from_numpy(qpos))
+ all_action_data.append(torch.from_numpy(action))
+ all_episode_len.append(len(qpos))
+ all_qpos_data = torch.cat(all_qpos_data, dim=0)
+ all_action_data = torch.cat(all_action_data, dim=0)
+
+ # normalize action data
+ action_mean = all_action_data.mean(dim=[0]).float()
+ action_std = all_action_data.std(dim=[0]).float()
+ action_std = torch.clip(action_std, 1e-2, np.inf) # clip tiny stds so near-constant dims are not blown up by normalization
+
+ # normalize qpos data
+ qpos_mean = all_qpos_data.mean(dim=[0]).float()
+ qpos_std = all_qpos_data.std(dim=[0]).float()
+ qpos_std = torch.clip(qpos_std, 1e-2, np.inf) # clip tiny stds so near-constant dims are not blown up by normalization
+
+ action_min = all_action_data.min(dim=0).values.float()
+ action_max = all_action_data.max(dim=0).values.float()
+
+ eps = 0.0001
+ stats = {"action_mean": action_mean.numpy(), "action_std": action_std.numpy(),
+ "action_min": action_min.numpy() - eps,"action_max": action_max.numpy() + eps,
+ "qpos_mean": qpos_mean.numpy(), "qpos_std": qpos_std.numpy(),
+ "example_qpos": qpos}
+
+ return stats, all_episode_len
+
+def find_all_hdf5(dataset_dir, skip_mirrored_data):
+ hdf5_files = []
+ for root, dirs, files in os.walk(dataset_dir):
+ for filename in fnmatch.filter(files, '*.hdf5'):
+ if 'features' in filename: continue
+ if skip_mirrored_data and 'mirror' in filename:
+ continue
+ hdf5_files.append(os.path.join(root, filename))
+ print(f'Found {len(hdf5_files)} hdf5 files')
+ return hdf5_files
+
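+# BatchSampler yields index batches forever: each sample first picks a dataset (optionally
+# weighted by sample_weights), then a uniform flattened timestep index inside that dataset,
+# so extra datasets (e.g. co-training data) can be over- or under-sampled relative to the target data.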
+def BatchSampler(batch_size, episode_len_l, sample_weights):
+ sample_probs = np.array(sample_weights) / np.sum(sample_weights) if sample_weights is not None else None
+ sum_dataset_len_l = np.cumsum([0] + [np.sum(episode_len) for episode_len in episode_len_l])
+ while True:
+ batch = []
+ for _ in range(batch_size):
+ episode_idx = np.random.choice(len(episode_len_l), p=sample_probs)
+ step_idx = np.random.randint(sum_dataset_len_l[episode_idx], sum_dataset_len_l[episode_idx + 1])
+ batch.append(step_idx)
+ yield batch
+
+def load_data(dataset_dir_l, name_filter, camera_names, batch_size_train, batch_size_val, chunk_size, skip_mirrored_data=False, load_pretrain=False, policy_class=None, stats_dir_l=None, sample_weights=None, train_ratio=0.99):
+ if type(dataset_dir_l) == str:
+ dataset_dir_l = [dataset_dir_l]
+ dataset_path_list_list = [find_all_hdf5(dataset_dir, skip_mirrored_data) for dataset_dir in dataset_dir_l]
+ num_episodes_0 = len(dataset_path_list_list[0])
+ dataset_path_list = flatten_list(dataset_path_list_list)
+ dataset_path_list = [n for n in dataset_path_list if name_filter(n)]
+ num_episodes_l = [len(dataset_path_list) for dataset_path_list in dataset_path_list_list]
+ num_episodes_cumsum = np.cumsum(num_episodes_l)
+
+ # obtain train test split on dataset_dir_l[0]
+ shuffled_episode_ids_0 = np.random.permutation(num_episodes_0)
+ train_episode_ids_0 = shuffled_episode_ids_0[:int(train_ratio * num_episodes_0)]
+ val_episode_ids_0 = shuffled_episode_ids_0[int(train_ratio * num_episodes_0):]
+ train_episode_ids_l = [train_episode_ids_0] + [np.arange(num_episodes) + num_episodes_cumsum[idx] for idx, num_episodes in enumerate(num_episodes_l[1:])]
+ val_episode_ids_l = [val_episode_ids_0]
+ train_episode_ids = np.concatenate(train_episode_ids_l)
+ val_episode_ids = np.concatenate(val_episode_ids_l)
+ print(f'\n\nData from: {dataset_dir_l}\n- Train on {[len(x) for x in train_episode_ids_l]} episodes\n- Test on {[len(x) for x in val_episode_ids_l]} episodes\n\n')
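+ # Note: only the first dataset directory is split into train/val; episodes from any extra
+ # directories (e.g. co-training data) are all assigned to the training set.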
+
+ # obtain normalization stats for qpos and action
+ # if load_pretrain:
+ # with open(os.path.join('/home/zfu/interbotix_ws/src/act/ckpts/pretrain_all', 'dataset_stats.pkl'), 'rb') as f:
+ # norm_stats = pickle.load(f)
+ # print('Loaded pretrain dataset stats')
+ _, all_episode_len = get_norm_stats(dataset_path_list)
+ train_episode_len_l = [[all_episode_len[i] for i in train_episode_ids] for train_episode_ids in train_episode_ids_l]
+ val_episode_len_l = [[all_episode_len[i] for i in val_episode_ids] for val_episode_ids in val_episode_ids_l]
+ train_episode_len = flatten_list(train_episode_len_l)
+ val_episode_len = flatten_list(val_episode_len_l)
+ if stats_dir_l is None:
+ stats_dir_l = dataset_dir_l
+ elif type(stats_dir_l) == str:
+ stats_dir_l = [stats_dir_l]
+ norm_stats, _ = get_norm_stats(flatten_list([find_all_hdf5(stats_dir, skip_mirrored_data) for stats_dir in stats_dir_l]))
+ print(f'Norm stats from: {stats_dir_l}')
+
+ batch_sampler_train = BatchSampler(batch_size_train, train_episode_len_l, sample_weights)
+ batch_sampler_val = BatchSampler(batch_size_val, val_episode_len_l, None)
+
+ # print(f'train_episode_len: {train_episode_len}, val_episode_len: {val_episode_len}, train_episode_ids: {train_episode_ids}, val_episode_ids: {val_episode_ids}')
+
+ # construct dataset and dataloader
+ train_dataset = EpisodicDataset(dataset_path_list, camera_names, norm_stats, train_episode_ids, train_episode_len, chunk_size, policy_class)
+ val_dataset = EpisodicDataset(dataset_path_list, camera_names, norm_stats, val_episode_ids, val_episode_len, chunk_size, policy_class)
+ train_num_workers = (8 if os.getlogin() == 'zfu' else 16) if train_dataset.augment_images else 2
+ val_num_workers = 8 if train_dataset.augment_images else 2
+ print(f'Augment images: {train_dataset.augment_images}, train_num_workers: {train_num_workers}, val_num_workers: {val_num_workers}')
+ train_dataloader = DataLoader(train_dataset, batch_sampler=batch_sampler_train, pin_memory=True, num_workers=train_num_workers, prefetch_factor=2)
+ val_dataloader = DataLoader(val_dataset, batch_sampler=batch_sampler_val, pin_memory=True, num_workers=val_num_workers, prefetch_factor=2)
+
+ return train_dataloader, val_dataloader, norm_stats, train_dataset.is_sim
+
+def calibrate_linear_vel(base_action, c=None):
+ if c is None:
+ c = 0.0 # 0.19
+ v = base_action[..., 0]
+ w = base_action[..., 1]
+ base_action = base_action.copy()
+ base_action[..., 0] = v - c * w
+ return base_action
+
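+# smooth_base_action applies a 5-step moving average ('same'-mode convolution) to each base
+# action dimension independently.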
+def smooth_base_action(base_action):
+ return np.stack([
+ np.convolve(base_action[:, i], np.ones(5)/5, mode='same') for i in range(base_action.shape[1])
+ ], axis=-1).astype(np.float32)
+
+def preprocess_base_action(base_action):
+ # base_action = calibrate_linear_vel(base_action)
+ base_action = smooth_base_action(base_action)
+
+ return base_action
+
+def postprocess_base_action(base_action):
+ linear_vel, angular_vel = base_action
+ linear_vel *= 1.0
+ angular_vel *= 1.0
+ # angular_vel = 0
+ # if np.abs(linear_vel) < 0.05:
+ # linear_vel = 0
+ return np.array([linear_vel, angular_vel])
+
+### env utils
+
+def sample_box_pose():
+ x_range = [0.0, 0.2]
+ y_range = [0.4, 0.6]
+ z_range = [0.05, 0.05]
+
+ ranges = np.vstack([x_range, y_range, z_range])
+ cube_position = np.random.uniform(ranges[:, 0], ranges[:, 1])
+
+ cube_quat = np.array([1, 0, 0, 0])
+ return np.concatenate([cube_position, cube_quat])
+
+def sample_insertion_pose():
+ # Peg
+ x_range = [0.1, 0.2]
+ y_range = [0.4, 0.6]
+ z_range = [0.05, 0.05]
+
+ ranges = np.vstack([x_range, y_range, z_range])
+ peg_position = np.random.uniform(ranges[:, 0], ranges[:, 1])
+
+ peg_quat = np.array([1, 0, 0, 0])
+ peg_pose = np.concatenate([peg_position, peg_quat])
+
+ # Socket
+ x_range = [-0.2, -0.1]
+ y_range = [0.4, 0.6]
+ z_range = [0.05, 0.05]
+
+ ranges = np.vstack([x_range, y_range, z_range])
+ socket_position = np.random.uniform(ranges[:, 0], ranges[:, 1])
+
+ socket_quat = np.array([1, 0, 0, 0])
+ socket_pose = np.concatenate([socket_position, socket_quat])
+
+ return peg_pose, socket_pose
+
+### helper functions
+
+def compute_dict_mean(epoch_dicts):
+ result = {k: None for k in epoch_dicts[0]}
+ num_items = len(epoch_dicts)
+ for k in result:
+ value_sum = 0
+ for epoch_dict in epoch_dicts:
+ value_sum += epoch_dict[k]
+ result[k] = value_sum / num_items
+ return result
+
+def detach_dict(d):
+ new_d = dict()
+ for k, v in d.items():
+ new_d[k] = v.detach()
+ return new_d
+
+def set_seed(seed):
+ torch.manual_seed(seed)
+ np.random.seed(seed)
diff --git a/docs/src/vinn_cache_feature.py b/docs/src/vinn_cache_feature.py
new file mode 100644
index 00000000..08636f73
--- /dev/null
+++ b/docs/src/vinn_cache_feature.py
@@ -0,0 +1,148 @@
+import torch
+import argparse
+import pathlib
+from torch import nn
+import torchvision
+import os
+import time
+import h5py
+from torchvision import models, transforms
+from PIL import Image
+from tqdm import tqdm
+import cv2
+import numpy as np
+
+import IPython
+e = IPython.embed
+
+
+def chunks(lst, n):
+ """Yield successive n-sized chunks from lst."""
+ for i in range(0, len(lst), n):
+ yield lst[i:i + n]
+
+def expand_greyscale(t):
+ return t.expand(3, -1, -1)
+
+
+def main(args):
+ #################################################
+ batch_size = 256
+ #################################################
+
+ ckpt_path = args.ckpt_path
+ dataset_dir = args.dataset_dir
+ ckpt_name = pathlib.PurePath(ckpt_path).name
+ dataset_name = ckpt_name.split('-')[1]
+ repr_type = ckpt_name.split('-')[0]
+ seed = int(ckpt_name.split('-')[-1][:-3])
+
+ if 'cotrain' in ckpt_name:
+ repr_type += '_cotrain'
+
+ episode_idxs = [int(name.split('_')[1].split('.')[0]) for name in os.listdir(dataset_dir) if ('.hdf5' in name) and ('features' not in name)]
+ episode_idxs.sort()
+ assert len(episode_idxs) == episode_idxs[-1] + 1 # no holes
+ num_episodes = len(episode_idxs)
+
+ feature_extractors = {}
+
+ for episode_idx in range(num_episodes):
+
+ # load all images
+ print(f'loading data')
+ dataset_path = os.path.join(dataset_dir, f'episode_{episode_idx}.hdf5')
+ with h5py.File(dataset_path, 'r') as root:
+ image_dict = {}
+ camera_names = list(root[f'/observations/images/'].keys())
+ print(f'Camera names: {camera_names}')
+ for cam_name in camera_names:
+ image = root[f'/observations/images/{cam_name}'][:]
+ uncompressed_image = []
+ for im in image:
+ im = np.array(cv2.imdecode(im, 1))
+ uncompressed_image.append(im)
+ image = np.stack(uncompressed_image, axis=0)
+
+ image_dict[cam_name] = image
+
+ print(f'loading model')
+ # load pretrain nets after cam names are known
+ if not feature_extractors:
+ for cam_name in camera_names:
+ resnet = torchvision.models.resnet18(pretrained=True)
+ loading_status = resnet.load_state_dict(torch.load(ckpt_path.replace('DUMMY', cam_name)))
+ print(cam_name, loading_status)
+ resnet = nn.Sequential(*list(resnet.children())[:-1])
+ resnet = resnet.cuda()
+ resnet.eval()
+ feature_extractors[cam_name] = resnet
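+ # With the final fc layer stripped, each image yields the 512-dim global-average-pooled
+ # ResNet-18 feature (the trailing 1x1 spatial dims are squeezed away after inference below).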
+
+ # inference with resnet
+ feature_dict = {}
+ for cam_name, images in image_dict.items():
+ # Preprocess images
+ image_size = 120 # TODO NOTICE: reduced resolution
+ transform = transforms.Compose([
+ transforms.Resize(image_size), # will scale the image
+ transforms.CenterCrop(image_size),
+ transforms.ToTensor(),
+ transforms.Lambda(expand_greyscale),
+ transforms.Normalize(
+ mean=torch.tensor([0.485, 0.456, 0.406]),
+ std=torch.tensor([0.229, 0.224, 0.225])),
+ ])
+ processed_images = []
+ for image in tqdm(images):
+ image = Image.fromarray(image)
+ image = transform(image)
+ processed_images.append(image)
+ processed_images = torch.stack(processed_images).cuda()
+
+ # query the model
+ all_features = []
+ with torch.inference_mode():
+ for batch in chunks(processed_images, batch_size):
+ print('inference')
+ features = feature_extractors[cam_name](batch)
+ features = features.squeeze(axis=3).squeeze(axis=2)
+ all_features.append(features)
+ all_features = torch.cat(all_features, axis=0)
+ max_timesteps = all_features.shape[0]
+ feature_dict[cam_name] = all_features
+
+ # TODO START diagnostics
+ # first_image = images[0]
+ # first_processed_image = processed_images[0].cpu().numpy()
+ # first_feature = all_features[0].cpu().numpy()
+ # import numpy as np
+ # np.save('first_image.npy', first_image)
+ # np.save('first_processed_image.npy', first_processed_image)
+ # np.save('first_feature.npy', first_feature)
+ # torch.save(resnet.state_dict(), 'rn.ckpt')
+ # e()
+ # exit()
+ # TODO END diagnostics
+
+
+ # save
+ dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_idx}.hdf5')
+ print(dataset_path)
+ # HDF5
+ t0 = time.time()
+ with h5py.File(dataset_path, 'w', rdcc_nbytes=1024 ** 2 * 2) as root:
+ features = root.create_group('features')
+ for cam_name, array in feature_dict.items():
+ cam_feature = features.create_dataset(cam_name, (max_timesteps, 512))
+ features[cam_name][...] = array.cpu().numpy()
+ print(f'Saving: {time.time() - t0:.1f} secs\n')
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description='cache features')
+ parser.add_argument('--ckpt_path', type=str, required=True, help='ckpt_path')
+ parser.add_argument('--dataset_dir', type=str, required=True, help='dataset_dir')
+ args = parser.parse_args()
+
+ main(args)
\ No newline at end of file
diff --git a/docs/src/vinn_eval.py b/docs/src/vinn_eval.py
new file mode 100644
index 00000000..397ddab7
--- /dev/null
+++ b/docs/src/vinn_eval.py
@@ -0,0 +1,336 @@
+import torch
+from torch import nn
+import torch.nn.functional as F
+import numpy as np
+import h5py
+import pathlib
+import os
+import argparse
+import matplotlib.pyplot as plt
+from PIL import Image
+import torchvision
+from torchvision import transforms
+# from visualize_episodes import visualize_joints
+from utils import set_seed, sample_box_pose
+# from imitate_episodes import get_image
+from sim_env import BOX_POSE
+from constants import DT
+from imitate_episodes import save_videos
+from einops import rearrange
+import time
+
+DT = 0.02
+import IPython
+e = IPython.embed
+
+# modified from https://github.com/jyopari/VINN/blob/main/nearest-neighbor-eval/handle_nn.ipynb
+
+def calculate_nearest_neighbors(curr_feature, support_inputs, support_targets, k, state_weight):
+ has_skip = len(support_targets.shape) == 3
+ if has_skip: # when there is action skip
+ num_targets, skip, a_dim = support_targets.shape
+ support_targets = support_targets.view((num_targets, -1))
+
+ curr_vis_feature, curr_s_feature = curr_feature
+ support_vis_feature, support_s_feature = support_inputs
+
+ pairwise_dist_vis = torch.norm(curr_vis_feature - support_vis_feature, dim=1).unsqueeze(0)
+ pairwise_dist_s = torch.norm(curr_s_feature - support_s_feature, dim=1).unsqueeze(0)
+ pairwise_dist = pairwise_dist_vis + pairwise_dist_s * state_weight
+
+ sorted_dist, index = torch.sort(pairwise_dist, dim=1) # sort the support axis
+ permuted_support_targets = support_targets[index]
+ topk_dist = sorted_dist[:, :k] # distances of the k nearest neighbors
+ topk_support_targets = permuted_support_targets[:, :k]
+ weights = F.softmax(-topk_dist, dim=1)
+ weighted_support_targets = weights.unsqueeze(2) * topk_support_targets
+ prediction = torch.sum(weighted_support_targets, dim=1)
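+ # Soft k-NN regression: distances combine visual and state features (state scaled by
+ # state_weight), and the k nearest support actions are averaged with softmax(-distance)
+ # weights over those k distances.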
+
+ if has_skip:
+ num_predictions = prediction.shape[0]
+ prediction = prediction.reshape((num_predictions, skip, a_dim))
+
+ return prediction
+
+
+def main(args):
+ # TODO ######################
+ k = None # for scripted box transfer
+ skip = 100
+ real_robot = True
+ save_episode = True
+ # TODO ######################
+ onscreen_cam = 'main'
+ state_dim = 14
+ dataset_dir = args['dataset_dir']
+ onscreen_render = args['onscreen_render']
+ ckpt_dir = args['ckpt_dir']
+ model_dir = args['model_dir']
+ task_name = args['task_name']
+
+ if 'insertion' in task_name:
+ sim_episode_len = 400
+ env_max_reward = 4
+ ks = [None]
+ elif 'transfer_cube' in task_name:
+ sim_episode_len = 400
+ env_max_reward = 4
+ ks = [1, 1, 1]
+ if 'human' in dataset_dir:
+ state_weight = 5
+ else:
+ state_weight = 10
+ print(f'{state_weight=}')
+ elif task_name == 'ziploc_slide':
+ env_max_reward = 1
+ ks = [71]
+ state_weight = 0
+ elif task_name == 'aloha_mobile_wipe_wine':
+ sim_episode_len = 1300
+ env_max_reward = 4
+ ks = [2, 2, 2]
+ state_weight = 5
+ print(f'{state_weight=}')
+ else:
+ raise NotImplementedError
+
+ model_name = pathlib.PurePath(model_dir).name
+ seed = int(model_name.split('-')[-1][:-3])
+
+ repr_type = 'byol'
+ if 'cotrain' in model_name:
+ repr_type += '_cotrain'
+ e() # make sure!
+
+ k = ks[seed]
+
+ if real_robot:
+ BASE_DELAY = 15
+ query_freq = skip - BASE_DELAY
+
+ # load train data
+ vis_features = []
+ state_features = []
+ Y = []
+ for episode_id in range(0, 40):
+ dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')
+ with h5py.File(dataset_path, 'r') as root:
+ action = root['/action'][:]
+ base_action = root['/base_action'][:]
+ action = np.concatenate([action, base_action], axis=1)
+ camera_names = list(root[f'/observations/images/'].keys())
+
+ # Visual feature
+ all_cam_feature = []
+ for cam_name in camera_names:
+ feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')
+ with h5py.File(feature_dataset_path, 'r') as root:
+ cam_feature = root[f'/features/{cam_name}'][:]
+ all_cam_feature.append(cam_feature)
+ vis_fea = np.concatenate(all_cam_feature, axis=1)
+
+ ## State feature
+ dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')
+ with h5py.File(dataset_path, 'r') as root:
+ s_fea = root['/observations/qpos'][:]
+
+ # stack actions together
+ eps_len = len(action)
+ indices = np.tile(np.arange(skip), eps_len).reshape(eps_len, skip) # each row is 0, 1, ... skip
+ offset = np.expand_dims(np.arange(eps_len), axis=1)
+ indices = indices + offset # row1: 0, 1, ... skip; row2: 1, 2, ... skip+1
+ # indices will exceed eps_len, thus clamp to eps_len-1
+ indices = np.clip(indices, 0, eps_len-1)
+ # stack action
+ action = action[indices] # new shape: eps_len, skip, a_dim
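+ # Tiny example (assuming eps_len=4, skip=2): indices = [[0,1],[1,2],[2,3],[3,3]] after the
+ # clip, so each timestep is paired with the next `skip` actions, repeating the final action
+ # once the episode ends.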
+
+ vis_features.append(vis_fea)
+ state_features.append(s_fea)
+ Y.append(action)
+
+ vis_features = np.concatenate(vis_features)
+ state_features = np.concatenate(state_features)
+ Y = np.concatenate(Y)
+ train_inputs = [torch.from_numpy(vis_features).cuda(), torch.from_numpy(state_features).cuda()]
+ train_targets = torch.from_numpy(Y).cuda()
+
+ set_seed(1000)
+ feature_extractors = {}
+ for cam_name in camera_names:
+ resnet = torchvision.models.resnet18(pretrained=True)
+ loading_status = resnet.load_state_dict(torch.load(model_dir.replace('DUMMY', cam_name)))
+ print(cam_name, loading_status)
+ resnet = nn.Sequential(*list(resnet.children())[:-1])
+ resnet = resnet.cuda()
+ resnet.eval()
+ feature_extractors[cam_name] = resnet
+
+
+
+ # load environment
+ if real_robot:
+ from aloha_scripts.real_env import make_real_env #### TODO TODO
+ env = make_real_env(init_node=True, setup_robots=True, setup_base=True)
+ max_timesteps = sim_episode_len
+ camera_names = ['cam_high', 'cam_left_wrist', 'cam_right_wrist']
+ else:
+ from sim_env import make_sim_env
+ env = make_sim_env(task_name)
+ max_timesteps = sim_episode_len
+
+
+ num_rollouts = 50
+ episode_returns = []
+ max_rewards = []
+ for rollout_id in range(num_rollouts):
+ ### set task
+ BOX_POSE[0] = sample_box_pose() # used in sim reset
+ ts = env.reset()
+
+ ### evaluation loop
+ qpos_history = torch.zeros((1, max_timesteps, state_dim)).cuda()
+ image_list = [] # for visualization
+ qpos_list = []
+ target_qpos_list = []
+ rewards = []
+ with torch.inference_mode():
+ for t in range(sim_episode_len):
+ start_time = time.time()
+ if t % 100 == 0: print(t)
+ if t % query_freq == 0:
+ ### process previous timestep to get qpos and image_list
+ obs = ts.observation
+ if 'images' in obs:
+ image_list.append(obs['images'])
+ else:
+ image_list.append({'main': obs['image']})
+ qpos_numpy = np.array(obs['qpos'])
+ # qpos = pre_process(qpos_numpy)
+ qpos = torch.from_numpy(qpos_numpy).float().cuda().unsqueeze(0)
+ qpos_history[:, t] = qpos
+ _, curr_image_raw = get_image(ts, camera_names)
+
+ image_size = 120
+ transform = transforms.Compose([
+ transforms.Resize(image_size), # will scale the image
+ transforms.CenterCrop(image_size),
+ transforms.ToTensor(),
+ transforms.Lambda(expand_greyscale),
+ transforms.Normalize(
+ mean=torch.tensor([0.485, 0.456, 0.406]),
+ std=torch.tensor([0.229, 0.224, 0.225])),
+ ])
+
+ all_cam_features = []
+ for cam_id, curr_image in enumerate(curr_image_raw):
+ curr_image = Image.fromarray(curr_image) # TODO only one camera
+ curr_image = transform(curr_image)
+ curr_image = curr_image.unsqueeze(dim=0).cuda()
+ curr_image_feature = feature_extractors[camera_names[cam_id]](curr_image)
+ curr_image_feature = curr_image_feature.squeeze(3).squeeze(2)
+ all_cam_features.append(curr_image_feature)
+ curr_image_feature = torch.cat(all_cam_features, dim=1)
+
+ ### Visual feature
+ # curr_feature = curr_image_feature
+
+ ### State feature
+ # curr_feature = qpos
+
+ ### Both features
+ curr_feature = [curr_image_feature, qpos]
+
+ action = calculate_nearest_neighbors(curr_feature, train_inputs, train_targets, k, state_weight) # TODO use this
+ action = action.squeeze(0).cpu().numpy()
+ action = np.concatenate([action[:-BASE_DELAY, :-2], action[BASE_DELAY:, -2:]], axis=1)
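+ # Arm targets come from the first (skip - BASE_DELAY) steps of the chunk while base actions
+ # are taken BASE_DELAY steps later; this appears intended to compensate for the mobile base
+ # responding more slowly than the arms on the real robot.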
+ print(f'Query: {(time.time() - start_time):.3f}s')
+
+ curr_action = action[t % query_freq]
+ target_qpos = curr_action[:-2]
+ base_action = curr_action[-2:]
+
+ # ### SAFETY
+ # max_a = 0.05
+ # curr_qpos = qpos.squeeze().cpu().numpy()
+ # target_qpos = target_qpos.clip(curr_qpos - max_a, curr_qpos + max_a)
+ # ### SAFETY
+
+ ### step the environment
+ ts = env.step(target_qpos, base_action=base_action)
+ duration = time.time() - start_time
+ # print(f'{duration:.3f}')
+ time.sleep(max(0, DT - duration))
+
+ ### save things for visualization
+ qpos_list.append(qpos_numpy)
+ target_qpos_list.append(target_qpos)
+ rewards.append(ts.reward)
+
+ # if real_robot and t != 0 and t % 60 == 0:
+ # e()
+ plt.close()
+ if real_robot:
+ env.puppet_bot_left.dxl.robot_set_operating_modes("single", "gripper", "position")
+ env.puppet_bot_right.dxl.robot_set_operating_modes("single", "gripper", "position")
+ env.puppet_bot_left.dxl.robot_set_operating_modes("single", "gripper", "pwm")
+ env.puppet_bot_right.dxl.robot_set_operating_modes("single", "gripper", "pwm")
+
+ rewards = np.array(rewards)
+ episode_return = np.sum(rewards[rewards!=None])
+ episode_returns.append(episode_return)
+ max_reward = np.max(rewards)
+ max_rewards.append(max_reward)
+
+ print(f'{episode_return=}, {max_reward=}')
+ if save_episode:
+ save_videos(image_list, DT, video_path=os.path.join(ckpt_dir, f'video{rollout_id}.mp4'))
+ # visualize_joints(qpos_list, target_qpos_list, plot_path=os.path.join(ckpt_dir, f'qpos{rollout_id}.png'))
+ # visualize_joints(qpos_list, example_qpos, plot_path=os.path.join(ckpt_dir, f'qpos_reference{rollout_id}.png'), label_overwrite=("policy", "dataset"))
+
+ success_rate = np.mean(np.array(max_rewards) == env_max_reward)
+ avg_return = np.mean(episode_returns)
+ summary_str = f'\nSuccess rate: {success_rate}\nAverage return: {avg_return}\n\n'
+ for r in range(env_max_reward+1):
+ more_or_equal_r = (np.array(max_rewards) >= r).sum()
+ more_or_equal_r_rate = more_or_equal_r / num_rollouts
+ summary_str += f'Reward >= {r}: {more_or_equal_r}/{num_rollouts} = {more_or_equal_r_rate*100}%\n'
+
+ print(summary_str)
+
+ # save success rate to txt
+ result_file_name = f'result_{skip}_{k}' + '.txt'
+ with open(os.path.join(ckpt_dir, result_file_name), 'w') as f:
+ f.write(summary_str)
+ f.write(repr(episode_returns))
+ f.write('\n\n')
+ f.write(repr(max_rewards))
+
+ return success_rate, avg_return
+
+
+
+def get_image(ts, camera_names):
+ if 'images' in ts.observation:
+ curr_images = []
+ for cam_name in camera_names:
+ curr_image = rearrange(ts.observation['images'][cam_name], 'h w c -> c h w')
+ curr_images.append(curr_image)
+ curr_image_raw = np.stack(curr_images, axis=0)
+ else:
+ curr_image_raw = rearrange(ts.observation['image'], 'h w c -> c h w')
+ curr_image = torch.from_numpy(curr_image_raw / 255.0).float().cuda().unsqueeze(0)
+ curr_image_raw = rearrange(curr_image_raw, 'b c h w -> b h w c')
+ return curr_image, curr_image_raw
+
+
+def expand_greyscale(t):
+ return t.expand(3, -1, -1)
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--onscreen_render', action='store_true')
+ parser.add_argument('--dataset_dir', action='store', type=str, help='dataset_dir', required=True)
+ parser.add_argument('--model_dir', action='store', type=str, help='model_dir', required=True)
+ parser.add_argument('--task_name', action='store', type=str, help='task_name', required=True)
+ parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)
+ main(vars(parser.parse_args()))
diff --git a/docs/src/vinn_select_k.py b/docs/src/vinn_select_k.py
new file mode 100644
index 00000000..b0e2b3c9
--- /dev/null
+++ b/docs/src/vinn_select_k.py
@@ -0,0 +1,134 @@
+import torch
+import torch.nn.functional as F
+import numpy as np
+import h5py
+import pathlib
+import os
+import argparse
+import matplotlib.pyplot as plt
+
+import IPython
+e = IPython.embed
+
+# modified from https://github.com/jyopari/VINN/blob/main/nearest-neighbor-eval/handle_nn.ipynb
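+# calculate_nearest_neighbors sweeps k from 1 to max_k-1: for each k it predicts every query
+# action as a softmax(-distance)-weighted average of the k nearest support actions and records
+# the validation MSE, so errors[i] corresponds to k = i + 1.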
+
+def calculate_nearest_neighbors(query_inputs, query_targets, support_inputs, support_targets, max_k):
+ with torch.no_grad():
+ pairwise_dist = []
+ for q_in in query_inputs:
+ diff = support_inputs - q_in.unsqueeze(0)
+ dist = torch.norm(diff, dim=1)
+ pairwise_dist.append(dist)
+ pairwise_dist = torch.stack(pairwise_dist)
+
+ sorted_dist, index = torch.sort(pairwise_dist, dim=1) # sort the support axis
+ permuted_support_targets = support_targets[index]
+ errors = []
+ for k in range(1, max_k):
+ topk_dist = sorted_dist[:, :k] # distances of the k nearest neighbors
+ topk_support_targets = permuted_support_targets[:, :k]
+ weights = F.softmax(-topk_dist, dim=1)
+ weighted_support_targets = weights.unsqueeze(2) * topk_support_targets
+ prediction = torch.sum(weighted_support_targets, dim=1)
+ error = F.mse_loss(prediction, query_targets)
+ errors.append(error)
+ return errors
+
+def chunks(lst, n):
+ """Yield successive n-sized chunks from lst."""
+ for i in range(0, len(lst), n):
+ yield lst[i:i + n]
+
+def main(args):
+ # TODO ######################
+ dataset_dir = args['dataset_dir']
+ ckpt_dir = args['ckpt_dir']
+ seed = 0
+ max_k = 400
+ batch_size = 100
+ # TODO ######################
+
+ repr_type = 'byol'
+ if 'cotrain' in ckpt_dir:
+ repr_type += '_cotrain'
+ e() # make sure!
+
+ if not os.path.isdir(ckpt_dir):
+ os.makedirs(ckpt_dir)
+
+ episode_idxs = [int(name.split('_')[1].split('.')[0]) for name in os.listdir(dataset_dir) if ('.hdf5' in name) and ('features' not in name)]
+ episode_idxs.sort()
+ assert len(episode_idxs) == episode_idxs[-1] + 1 # no holes
+ num_episodes = len(episode_idxs)
+ val_split = int(num_episodes * 0.8)
+
+ # load train data
+ X = []
+ Y = []
+ for episode_id in range(0, val_split):
+ dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')
+ with h5py.File(dataset_path, 'r') as root:
+ action = root['/action'][:]
+ camera_names = list(root[f'/observations/images/'].keys())
+
+ all_cam_feature = []
+ feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')
+ with h5py.File(feature_dataset_path, 'r') as root:
+ for cam_name in camera_names:
+ cam_feature = root[f'/features/{cam_name}'][:]
+ all_cam_feature.append(cam_feature)
+ cam_feature = np.concatenate(all_cam_feature, axis=1)
+
+ X.append(cam_feature)
+ Y.append(action)
+
+ X = np.concatenate(X)
+ Y = np.concatenate(Y)
+ train_inputs = torch.from_numpy(X).cuda()
+ train_targets = torch.from_numpy(Y).cuda()
+ print(f'All features: {train_inputs.shape}')
+
+ # load test data
+ X = []
+ Y = []
+ for episode_id in range(val_split, num_episodes):
+ dataset_path = os.path.join(dataset_dir, f'episode_{episode_id}.hdf5')
+ with h5py.File(dataset_path, 'r') as root:
+ action = root['/action'][:]
+
+ all_cam_feature = []
+ feature_dataset_path = os.path.join(dataset_dir, f'{repr_type}_features_seed{seed}_episode_{episode_id}.hdf5')
+ with h5py.File(feature_dataset_path, 'r') as root:
+ for cam_name in camera_names:
+ cam_feature = root[f'/features/{cam_name}'][:]
+ all_cam_feature.append(cam_feature)
+ cam_feature = np.concatenate(all_cam_feature, axis=1)
+
+ X.append(cam_feature)
+ Y.append(action)
+
+ X = np.concatenate(X)
+ Y = np.concatenate(Y)
+ val_inputs = torch.from_numpy(X).cuda()
+ val_targets = torch.from_numpy(Y).cuda()
+
+ val_losses = []
+ for inputs, targets in zip(chunks(val_inputs, batch_size), chunks(val_targets, batch_size)):
+ val_loss = calculate_nearest_neighbors(inputs, targets, train_inputs, train_targets, max_k)
+ val_loss = torch.stack(val_loss)
+ val_losses.append(val_loss)
+ val_losses = torch.mean(torch.stack(val_losses), dim=0)
+
+ val_loss = val_losses.cpu().numpy()
+ print(f'min val loss of {np.min(val_loss)} at k={np.argmin(val_loss) + 1}') # errors[i] corresponds to k = i + 1
+
+ plt.plot(np.arange(1, max_k), val_loss)
+ plt.savefig(os.path.join(ckpt_dir, f'k_select-seed{seed}.png'))
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--dataset_dir', action='store', type=str, help='dataset_dir', required=True)
+ parser.add_argument('--ckpt_dir', action='store', type=str, help='ckpt_dir', required=True)
+ main(vars(parser.parse_args()))
\ No newline at end of file
diff --git a/docs/src/visualize_episodes.py b/docs/src/visualize_episodes.py
new file mode 100644
index 00000000..6caca0f4
--- /dev/null
+++ b/docs/src/visualize_episodes.py
@@ -0,0 +1,154 @@
+import os
+import numpy as np
+import cv2
+import h5py
+import argparse
+
+import matplotlib.pyplot as plt
+from constants import DT
+
+import IPython
+e = IPython.embed
+
+JOINT_NAMES = ["waist", "shoulder", "elbow", "forearm_roll", "wrist_angle", "wrist_rotate"]
+STATE_NAMES = JOINT_NAMES + ["gripper"]
+
+def load_hdf5(dataset_dir, dataset_name):
+ dataset_path = os.path.join(dataset_dir, dataset_name + '.hdf5')
+ if not os.path.isfile(dataset_path):
+ print(f'Dataset does not exist at \n{dataset_path}\n')
+ exit()
+
+ with h5py.File(dataset_path, 'r') as root:
+ is_sim = root.attrs['sim']
+ qpos = root['/observations/qpos'][()]
+ qvel = root['/observations/qvel'][()]
+ action = root['/action'][()]
+ image_dict = dict()
+ for cam_name in root[f'/observations/images/'].keys():
+ image_dict[cam_name] = root[f'/observations/images/{cam_name}'][()]
+
+ return qpos, qvel, action, image_dict
+
+def main(args):
+ dataset_dir = args['dataset_dir']
+ episode_idx = args['episode_idx']
+ ismirror = args['ismirror']
+ if ismirror:
+ dataset_name = f'mirror_episode_{episode_idx}'
+ else:
+ dataset_name = f'episode_{episode_idx}'
+
+ qpos, qvel, action, image_dict = load_hdf5(dataset_dir, dataset_name)
+ save_videos(image_dict, DT, video_path=os.path.join(dataset_dir, dataset_name + '_video.mp4'))
+ visualize_joints(qpos, action, plot_path=os.path.join(dataset_dir, dataset_name + '_qpos.png'))
+ # visualize_timestamp(t_list, dataset_path) # TODO addn timestamp back
+
+
+def save_videos(video, dt, video_path=None):
+ if isinstance(video, list):
+ cam_names = list(video[0].keys())
+ cam_names = sorted(cam_names)
+ h, w, _ = video[0][cam_names[0]].shape
+ w = w * len(cam_names)
+ fps = int(1/dt)
+ out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
+ for ts, image_dict in enumerate(video):
+ images = []
+ for cam_name in cam_names:
+ image = image_dict[cam_name]
+ image = image[:, :, [2, 1, 0]] # swap B and R channel
+ images.append(image)
+ images = np.concatenate(images, axis=1)
+ out.write(images)
+ out.release()
+ print(f'Saved video to: {video_path}')
+ elif isinstance(video, dict):
+ cam_names = list(video.keys())
+ cam_names = sorted(cam_names)
+ all_cam_videos = []
+ for cam_name in cam_names:
+ all_cam_videos.append(video[cam_name])
+ all_cam_videos = np.concatenate(all_cam_videos, axis=2) # width dimension
+
+ n_frames, h, w, _ = all_cam_videos.shape
+ fps = int(1 / dt)
+ out = cv2.VideoWriter(video_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
+ for t in range(n_frames):
+ image = all_cam_videos[t]
+ image = image[:, :, [2, 1, 0]] # swap B and R channel
+ out.write(image)
+ out.release()
+ print(f'Saved video to: {video_path}')
+
+
+def visualize_joints(qpos_list, command_list, plot_path=None, ylim=None, label_overwrite=None):
+ if label_overwrite:
+ label1, label2 = label_overwrite
+ else:
+ label1, label2 = 'State', 'Command'
+
+ qpos = np.array(qpos_list) # ts, dim
+ command = np.array(command_list)
+ num_ts, num_dim = qpos.shape
+ h, w = 2, num_dim
+ num_figs = num_dim
+ fig, axs = plt.subplots(num_figs, 1, figsize=(w, h * num_figs))
+
+ # plot joint state
+ all_names = [name + '_left' for name in STATE_NAMES] + [name + '_right' for name in STATE_NAMES]
+ for dim_idx in range(num_dim):
+ ax = axs[dim_idx]
+ ax.plot(qpos[:, dim_idx], label=label1)
+ ax.set_title(f'Joint {dim_idx}: {all_names[dim_idx]}')
+ ax.legend()
+
+ # plot arm command
+ for dim_idx in range(num_dim):
+ ax = axs[dim_idx]
+ ax.plot(command[:, dim_idx], label=label2)
+ ax.legend()
+
+ if ylim:
+ for dim_idx in range(num_dim):
+ ax = axs[dim_idx]
+ ax.set_ylim(ylim)
+
+ plt.tight_layout()
+ plt.savefig(plot_path)
+ print(f'Saved qpos plot to: {plot_path}')
+ plt.close()
+
+def visualize_timestamp(t_list, dataset_path):
+ plot_path = dataset_path.replace('.pkl', '_timestamp.png')
+ h, w = 4, 10
+ fig, axs = plt.subplots(2, 1, figsize=(w, h*2))
+ # process t_list
+ t_float = []
+ for secs, nsecs in t_list:
+ t_float.append(secs + nsecs * 1e-9) # nanoseconds to seconds
+ t_float = np.array(t_float)
+
+ ax = axs[0]
+ ax.plot(np.arange(len(t_float)), t_float)
+ ax.set_title(f'Camera frame timestamps')
+ ax.set_xlabel('timestep')
+ ax.set_ylabel('time (sec)')
+
+ ax = axs[1]
+ ax.plot(np.arange(len(t_float)-1), t_float[1:] - t_float[:-1])
+ ax.set_title(f'dt')
+ ax.set_xlabel('timestep')
+ ax.set_ylabel('time (sec)')
+
+ plt.tight_layout()
+ plt.savefig(plot_path)
+ print(f'Saved timestamp plot to: {plot_path}')
+ plt.close()
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--dataset_dir', action='store', type=str, help='Dataset dir.', required=True)
+ parser.add_argument('--episode_idx', action='store', type=int, help='Episode index.', required=False)
+ parser.add_argument('--ismirror', action='store_true')
+ main(vars(parser.parse_args()))
diff --git a/docs/tree.html b/docs/tree.html
new file mode 100644
index 00000000..bab940c6
--- /dev/null
+++ b/docs/tree.html
@@ -0,0 +1,172 @@
+ Project structure of: MarkFzp/act-plus-plus