This repository contains the code for the paper STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation.
There are two ways to set up the environment: using our Dockerfile with the provided Makefile (the much easier path), or setting up a conda environment.
sudo make build # build the GPU image
sudo make sanity # quick check: imports mujoco_py/gym/d4rl + steps Hopper to ensure mujoco works
To automatically download the assets (policies, diffusion models, datasets, etc.), run:
# Pick ONE of these; assets are saved into your repo on the host:
# Since Docker mounts your repo at /workspace, you can also download the archive manually and place it in the repo on the host (in case the Google Drive link is not accessible)
sudo make download ZIP_LOCAL=assets.zip
sudo make download ZIP_URL="https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing"
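If the make download route is unavailable (e.g. the Drive link is blocked), a hedged alternative is to download assets.zip manually and unpack it yourself; the extraction destination below (the repository root) is an assumption, so prefer the Makefile target when it works:
import zipfile

# Sketch only: unpack a manually downloaded assets.zip into the repo root.
# The destination "." is an assumption; `make download` may arrange files differently.
with zipfile.ZipFile("assets.zip") as zf:
    zf.extractall(".")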
After these steps you have full access to the code; open a shell in the container with:
sudo make shell
You are good to go! Please read the sections below on how to run the experiments and train the models.
conda create -n ope python=3.9
conda activate ope
You need to install D4RL in order to run the experiments. You can do this by running the following command:
pip install git+https://github.com/Farama-Foundation/d4rl@master#egg=d4rl
To use MuJoCo 2.1.0, you need a license key and the correct binaries. Please follow the instructions on the OpenAI MuJoCo website to get the license key and install the binaries.
Once D4RL is installed, you can safely update the environment using the provided env.yml:
conda env update -n ope -f env.yml
If you also want to run the diffusion policy experiments, you need to install the CleanDiffuser library; instructions can be found in its repository.
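As a quick check that the conda environment works (mirroring the Docker make sanity target), the sketch below imports the key packages and steps a Hopper environment; it assumes the pre-0.26 Gym API that D4RL uses:
import gym
import d4rl        # registers the D4RL environments
import mujoco_py   # fails here if the MuJoCo binaries or license are misconfigured

env = gym.make("hopper-medium-v2")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print("Environment stepped OK:", obs.shape, reward)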
This framework supports running off-policy evaluation (OPE) experiments on two types of environments:
- D4RL benchmark environments (with pre-collected datasets)
- Standard Gym environments (with custom dataset generation)
For D4RL environments, the datasets are already included in the D4RL package, so no additional dataset generation is required.
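For example, a pre-collected D4RL dataset can be pulled directly from the package; a minimal sketch using the standard d4rl API:
import gym
import d4rl

env = gym.make("hopper-medium-v2")
dataset = d4rl.qlearning_dataset(env)   # dict with observations, actions, rewards, terminals, next_observations
print(dataset["observations"].shape, dataset["actions"].shape)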
For Gym environments, you need to generate datasets using the provided script:
python opelab/examples/gym/generate_dataset.py --name dataset_name
This script:
- Collects trajectories using a specified policy
- Stores observations, actions, rewards, and terminal states
- Computes and saves normalization statistics for the dataset (illustrated in the sketch after this list)
- Saves everything to the dataset/dataset_name/ directory
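The normalization statistics are commonly the per-dimension mean and standard deviation of the stored arrays; the following is a minimal sketch of that idea (not the repository's actual code):
import numpy as np

# Placeholder array standing in for the collected observations.
observations = np.random.randn(1000, 3)
mean = observations.mean(axis=0)        # per-dimension mean
std = observations.std(axis=0) + 1e-8   # per-dimension std; epsilon avoids division by zero
normalized = (observations - mean) / std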
For both environment types, you can train diffusion models:
For D4RL environments:
python opelab/examples/d4rl/diffusion_trainer.py --dataset hopper-medium-v2 --T 16 --D 256 --epochs 100 --output_dir ./trained_models
For Gym environments:
python opelab/examples/gym/diffusion_trainer.py --env Pendulum-v1 --T 2 --D 128 --epochs 100 --train_steps 100 --batch_size 64 --output_dir ./trained_models
Create JSON configuration files to define the experiment setup. Separate configurations are needed for D4RL and Gym environments. An example D4RL configuration:
{
"env_name": "hopper-medium-v2",
"guidance_hyperparams": {
"action_scale": 0.5,
"normalize_grad": true,
"k_guide": 1,
"use_neg_grad": true,
"ratio": 0.5
},
"target_policy_paths": [
"policy/hopper/dope/1.pkl",
"policy/hopper/dope/2.pkl"
],
"baseline_configs": {
"Naive": {
"class": "OnPolicy",
"params": {}
},
"Diffuser": {
"class": "Diffuser",
"params": {
"T": 16,
"D": 256,
"num_samples": 50,
"model_path": "path/to/diffusion/model.pth"
}
}
},
"experiment_params": {
"horizon": 768,
"rollouts": 50,
"gamma": 0.99,
"trials": 5,
"save_path": "results/experiment_name"
}
}
For Gym environments, the configuration additionally specifies the behavior policy and the path to the generated dataset:
{
"env_name": "Pendulum-v1",
"behavior_policy_path": "policy/pendulum/Pi_3.pkl",
"dataset_path": "dataset/pdataset/",
"guidance_hyperparams": {
"action_scale": 0.1,
"normalize_grad": true,
"k_guide": 1
},
"target_policy_paths": [
"policy/pendulum/Pi_1.pkl",
"policy/pendulum/Pi_2.pkl"
],
"baseline_configs": {
"Naive": {
"class": "OnPolicy",
"params": {}
},
"Diffuser": {
"class": "Diffuser",
"params": {
"T": 2,
"D": 256,
"num_samples": 100,
"model_path": "pendulum/T2D256/m1.pth"
}
}
},
"experiment_params": {
"horizon": 200,
"rollouts": 50,
"gamma": 0.99,
"trials": 5,
"save_path": "results/pendulum/experiment_name"
}
}
The main entry point for running OPE experiments is main_full.py, which is available for both D4RL and Gym environments.
python opelab/examples/d4rl/main_full.py --config opelab/examples/d4rl/configs/hopper.json
python opelab/examples/gym/main_full.py --config opelab/examples/gym/configs/pendulum.json
The key experiment parameters are:
- horizon: Number of timesteps to evaluate
- rollouts: Number of environment rollouts to perform
- gamma: Discount factor
- trials: Number of evaluation trials
- save_path: Path to save the experiment results
- top_k: Number of top policies to identify
- oracle_rollouts: Number of rollouts for the oracle evaluation
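Before launching a long run, it can help to sanity-check a configuration file; the sketch below (not part of the repository) simply loads the JSON and prints the pieces main_full.py consumes:
import json

with open("opelab/examples/d4rl/configs/hopper.json") as f:
    cfg = json.load(f)

print("environment:", cfg["env_name"])
print("target policies:", cfg["target_policy_paths"])
print("baselines:", list(cfg["baseline_configs"]))
print("experiment params:", cfg["experiment_params"])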
The full model checkpoints and policies are hosted at this link: Google Drive.
Our codebase builds upon several open-source projects. We would like to acknowledge the following repositories:
If you find this code useful in your research, please consider citing the following paper:
@InProceedings{stitch_ope,
title={{STITCH}-{OPE}: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation},
author={Hossein Goli and Michael Gimelfarb and Nathan Samuel de Lara and Haruki Nishimura and Masha Itkina and Florian Shkurti},
booktitle={Advances in Neural Information Processing Systems},
year={2025}}
