
Commit d469782

make new droid idle filter default (#625)
2 parents: 255fe9b + 245048e

File tree: 5 files changed (+44, −32 lines)


README.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ This is an experiment: $\pi_0$ was developed for our own robots, which differ fr
 
 ## Updates
 
+- [Sept 2025]: We have added an [improved idle filter](examples/droid/README_train.md#data-filtering) for DROID training.
 - [Jun 2025]: We have added [instructions](examples/droid/README_train.md) for using `openpi` to train VLAs on the full [DROID dataset](https://droid-dataset.github.io/). This is an approximate open-source implementation of the training pipeline used to train pi0-FAST-DROID.
 
examples/droid/README_train.md

Lines changed: 10 additions & 8 deletions
@@ -30,20 +30,14 @@ First, change the `rlds_data_dir` path in your `TrainConfig` to the directory th
 
 Then, compute normalization statistics (this will take ~10 minutes):
 ```bash
-uv run --group rlds scripts/compute_norm_stats.py --config-name pi0_fast_droid_finetune --max-frames 10_000_000
+uv run --group rlds scripts/compute_norm_stats.py --config-name pi0_fast_droid_finetune
 ```
 
 Run training:
 ```bash
-uv run --group rlds scripts/train.py pi0_fast_droid_finetune --exp-name=my_experiment --overwrite
+XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 uv run --group rlds scripts/train.py pi0_fast_droid_finetune --exp-name=my_experiment --overwrite
 ```
 
-By default, training uses no filtering. Alternatively, you can use a custom filtering scheme by providing a JSON file that maps from episode keys to a list of time step ranges (denoted as tuples of start and end time step indices) in that episode you wish to keep. The episode key is a unique ID defined as `f"{recording_folderpath}--{file_path}"`. We choose this convention because both paths are easily accessible in the DROID RLDS episodes' metadata.
-
-We provide an example of such a filtering scheme in [filtering/compute_droid_nonidle_ranges.py](examples/droid/filtering/compute_droid_nonidle_ranges.py), which is significantly more aggressive than the default (and thus leads to policies that take significantly fewer idle actions). We recommend using the filter produced by this script, and have also provided a copy of the filter [here](https://huggingface.co/KarlP/droid#filtering-data) specifically for `droid/1.0.1`.
-
-The filter JSON you wish to use can be specified by modifying the line `filter_dict_path="<path_to_filter_dict>"` in [src/openpi/training/config.py](src/openpi/training/config.py).
-
 **Note**: The original pi0-FAST-DROID model was trained with joint velocity actions.
 Joint velocity actions are not compatible with simulated evaluation environments (much harder to simulate).
 Thus, we do not recommend training with joint velocity actions and instead use joint position actions here.
@@ -57,6 +51,14 @@ If you start from PaliGemma instead of pi0 initialization, plan with ~5 days on
 We have experimented with LoRA for cheaper finetuning, but haven't found the policies to perform well so far.
 
 
+## Data Filtering
+
+Like any diverse real-robot dataset, the DROID dataset isn't perfectly "clean", and we have found data filtering to significantly improve policy performance. Concretely, the DROID dataset contains many *idle* time steps in which the robot does not move (in part an artifact of the VR teleoperation interface used during data collection; we will not go into detail here). Filtering out these idle transitions appropriately improves policy performance.
+
+By default, our openpi training recipe implements the same idle filter used to train all pi-DROID models. We implement it by pre-computing which dataset indices to sample during training; see [compute_droid_nonidle_ranges.py](examples/droid/compute_droid_nonidle_ranges.py) for how we compute these indices. Roughly speaking, we filter out any time step for which the next chunk of actions would be largely idle. During training, our code automatically pulls our pre-computed list of indices from cloud storage and applies it. If you want to modify the idle filter or create custom sampling logic, you can modify our script to generate a new index list and provide it via the `filter_dict_path="<path_to_filter_dict>"` argument in [src/openpi/training/config.py](src/openpi/training/config.py).
+
+**Note**: Our list of filtering indices is only valid for the `droid/1.0.1` dataset mentioned in the download section above and will not provide valid filtering for any other version of the DROID dataset, so make sure you download the dataset above! If you have a custom DROID version, you can rerun the [compute_droid_nonidle_ranges.py](examples/droid/compute_droid_nonidle_ranges.py) script to generate a new list of sampling indices.
+
 ## RoboArena
 
 Consider submitting your DROID policies to the [RoboArena benchmark](https://robo-arena.github.io/), which allows you to evaluate your policies on diverse tasks & scenes, **in the real world**! :)
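
For reference, the filter file introduced in the new Data Filtering section is a flat JSON object mapping episode keys to kept `[start, end)` ranges, matching the format documented in the `droid_rlds_dataset.py` comments later in this diff. A minimal sketch of writing one (the episode key shown is a placeholder, not a real DROID path):

```python
import json

# Placeholder episode key, following the f"{recording_folderpath}--{file_path}"
# convention; real keys come from the DROID RLDS episode metadata.
episode_key = "<recording_folderpath>--<file_path>"

# Keep time steps [0, 100) and [200, 300) of this episode; drop everything else.
filter_dict = {episode_key: [[0, 100], [200, 300]]}

with open("my_filter.json", "w") as f:
    json.dump(filter_dict, f)
```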

examples/droid/compute_droid_nonidle_ranges.py

Lines changed: 4 additions & 3 deletions
@@ -1,8 +1,9 @@
 """
-Iterates through the DROID dataset and a json mapping from episode unique IDs to ranges of time steps
-that should not be filtered out (all others are).
+Iterates through the DROID dataset and creates a json mapping from episode unique IDs to ranges of time steps
+that should be sampled during training (all others are filtered out).
 
-Specifically, we look for ranges of consecutive steps that contain at most min_idle_len consecutive idle frames
+Filtering logic:
+We look for ranges of consecutive steps that contain at most min_idle_len consecutive idle frames
 (default to 7 -- as most DROID action-chunking policies run the first 8 actions generated in each chunk, filtering
 this way means the policy will not get stuck outputting stationary actions). Additionally, we also only keep non-idle
 ranges of length at least min_non_idle_len (default to 16 frames = ~1 second), while also removing the last
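
To illustrate the logic this docstring describes, here is a rough, self-contained sketch of computing sample ranges from a per-step idle mask. The boolean `is_idle` input and the exact boundary handling are assumptions, not the repo's implementation, and the end-of-range trimming mentioned in the (truncated) docstring is omitted:

```python
import numpy as np

def compute_sample_ranges(
    is_idle: np.ndarray,  # bool mask, one entry per time step (assumed input)
    min_idle_len: int = 7,  # tolerate at most this many consecutive idle frames
    min_non_idle_len: int = 16,  # drop kept ranges shorter than this (~1 second)
) -> list[list[int]]:
    """Return [start, end) ranges whose idle runs never exceed min_idle_len."""
    ranges: list[list[int]] = []
    start, idle_run = 0, 0
    for t, idle in enumerate(is_idle):
        idle_run = idle_run + 1 if idle else 0
        if idle_run > min_idle_len:
            # The idle run got too long: close the range before it started.
            end = t - idle_run + 1
            if end - start >= min_non_idle_len:
                ranges.append([start, end])
            start = t + 1  # try to resume after the current idle frame
    if len(is_idle) - start >= min_non_idle_len:
        ranges.append([start, len(is_idle)])
    return ranges

# Example: a long idle stretch in the middle splits the episode into two ranges.
mask = np.array([False] * 40 + [True] * 20 + [False] * 40)
print(compute_sample_ranges(mask))  # [[0, 40], [60, 100]]
```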

src/openpi/training/config.py

Lines changed: 2 additions & 4 deletions
@@ -93,7 +93,7 @@ class DataConfig:
     rlds_data_dir: str | None = None
     # Action space for DROID dataset.
     action_space: droid_rlds_dataset.DroidActionSpace | None = None
-    # Path to the filter dictionary file for DROID dataset
+    # Path to the data filter file for DROID dataset
     filter_dict_path: str | None = None
 
 
@@ -350,7 +350,7 @@ class RLDSDroidDataConfig(DataConfigFactory):
     # to tuples denoting ranges of time steps to keep (start, end). Episodes are uniquely identified with
     # f"{recording_folderpath}--{file_path}", both of which are present in the RLDS episode metadata.
     # Path to the filter dictionary file.
-    filter_dict_path: str | None = None
+    filter_dict_path: str | None = "gs://openpi-assets/droid/droid_sample_ranges_v1_0_1.json"
 
     @override
     def create(self, assets_dirs: pathlib.Path, model_config: _model.BaseModelConfig) -> DataConfig:
@@ -693,8 +693,6 @@ def __post_init__(self) -> None:
             # Set this to the path to your DROID RLDS dataset (the parent directory of the `droid` directory).
             rlds_data_dir="<path_to_droid_rlds_dataset>",
             action_space=droid_rlds_dataset.DroidActionSpace.JOINT_POSITION,
-            # Set this to the path for whatever filtering json you wish to use (or None)
-            filter_dict_path="<path_to_filtering_json_or_None>",
         ),
         weight_loader=weight_loaders.CheckpointWeightLoader("gs://openpi-assets/checkpoints/pi0_fast_base/params"),
         lr_schedule=_optimizer.CosineDecaySchedule(
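
With this change, DROID training picks up the hosted filter by default, so the per-config override is gone. A sketch of swapping in a locally generated filter instead; `get_config` and the `data` attribute are assumptions based on how the rest of the repo is typically used, not confirmed by this diff:

```python
import dataclasses

from openpi.training import config as _config

# Load the existing DROID fine-tuning config by name (assumed helper).
train_config = _config.get_config("pi0_fast_droid_finetune")

# Point its data config at a custom filter file, or None to disable filtering.
data_config = dataclasses.replace(
    train_config.data,  # assumed to hold the RLDSDroidDataConfig factory
    filter_dict_path="/path/to/my_filter.json",
)
```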

src/openpi/training/droid_rlds_dataset.py

Lines changed: 27 additions & 17 deletions
@@ -7,8 +7,14 @@
 
 from enum import Enum
 from enum import auto
+import json
+import logging
 from pathlib import Path
 
+import tqdm
+
+import openpi.shared.download as download
+
 
 class DroidActionSpace(Enum):
     """Action space for DROID dataset."""
@@ -32,7 +38,7 @@ def __init__(
        shuffle_buffer_size: int = 250_000,
        num_parallel_reads: int = -1,  # -1 == tf.data.AUTOTUNE -- hack to not import tf at top level
        num_parallel_calls: int = -1,  # -1 == tf.data.AUTOTUNE -- hack to not import tf at top level
-       filter_dict_path=None,
+       filter_dict_path=None,  # Path to json file with indices to sample during training
    ):
        # Import tensorflow here to not make it mandatory in case RLDS data loader is not used.
        import dlimp as dl
@@ -52,32 +58,27 @@
            )
        )
 
-       # Repeat dataset so we never run out of data.
-       dataset = dataset.repeat()
+       # # Repeat dataset so we never run out of data.
+       # dataset = dataset.repeat()
 
        # Load the filter dictionary if provided.
-       # The filter dictionary is a JSON file that maps episode keys to ranges of frames to keep
+       # The filter dictionary is a JSON file that maps episode keys to ranges of frames to sample
        # (e.g.,
        # {
-       #     "keep_ranges": {
-       #         "<episode key>": [[0, 100], [200, 300]]
-       #     }
-       # }
-       # means keep frames 0-89 and 200-289).
+       #     "<episode key>": [[0, 100], [200, 300]]
+       # }
+       # means keep frames 0-99 and 200-299).
        if filter_dict_path is not None:
-           import json
-
-           from tqdm import tqdm
-
-           with Path(filter_dict_path).open("r") as f:
+           cached_filter_dict_path = download.maybe_download(filter_dict_path)
+           with Path(cached_filter_dict_path).open("r") as f:
                filter_dict = json.load(f)
 
-           print(f"Using filter dictionary with {len(filter_dict['keep_ranges'])} episodes")
+           logging.info(f"Using filter dictionary with {len(filter_dict)} episodes")
 
            keys_tensor = []
            values_tensor = []
 
-           for episode_key, ranges in tqdm(filter_dict.items()):
+           for episode_key, ranges in tqdm.tqdm(filter_dict.items(), desc="Creating idle filter hash table..."):
                for start, end in ranges:
                    for t in range(start, end):
                        frame_key = f"{episode_key}--{t}"
@@ -86,7 +87,7 @@ def __init__(
            self.filter_table = tf.lookup.StaticHashTable(
                tf.lookup.KeyValueTensorInitializer(keys_tensor, values_tensor), default_value=False
            )
-           print("Filter hash table initialized")
+           logging.info("Filter hash table initialized")
        else:
            self.filter_table = tf.lookup.StaticHashTable(
                tf.lookup.KeyValueTensorInitializer([""], [True]), default_value=True
@@ -122,6 +123,7 @@ def restructure(traj):
            traj_len = tf.shape(traj["action"])[0]
            indices = tf.as_string(tf.range(traj_len))
 
+           # Data filtering:
            # Compute a uniquely-identifying step ID by concatenating the recording folderpath, file path,
            # and each step's time step index. This will index into the filter hash table, and if it returns true,
            # then the frame passes the filter.
@@ -175,11 +177,19 @@ def chunk_actions(traj):
        # Flatten: map from trajectory dataset to dataset of individual action chunks
        dataset = dataset.flatten(num_parallel_calls=num_parallel_calls)
 
+       # Filter data that doesn't pass the filter
        def filter_from_dict(frame):
            return frame["passes_filter"]
 
        dataset = dataset.filter(filter_from_dict)
 
+       # Remove "passes_filter" key from output
+       def remove_passes_filter(frame):
+           frame.pop("passes_filter")
+           return frame
+
+       dataset = dataset.map(remove_passes_filter)
+
        # Decode images: RLDS saves encoded images, only decode now for efficiency
        def decode_images(traj):
            traj["observation"]["image"] = tf.io.decode_image(
