Listwise Reward Estimation for Offline Preference-based Reinforcement Learning

This is the official implementation of LiRE (ICML 2024).

This repository contains the offline RL datasets and the scripts to reproduce our experiments.

The code is based on:

  • CORL: an offline reinforcement learning library that provides single-file implementations of offline RL algorithms.
  • PEBBLE: an online preference-based reinforcement learning codebase. We used its SAC implementation to create the new offline preference-based RL dataset (a generic labeling sketch follows this list).
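
Below is a minimal, generic sketch of the scripted-teacher labeling recipe commonly used to build preference datasets (compare the ground-truth returns of two trajectory segments, as in PEBBLE/B-Pref). It only illustrates the general idea and is not this repository's actual dataset-generation code; the segment length and random data are made up for the example.

  # Generic scripted-teacher labeling sketch (illustration only, not this repo's code).
  import numpy as np

  def label_preference(rewards_a, rewards_b):
      """Return 1 if segment B is preferred over segment A, else 0,
      judged by the ground-truth return of each segment."""
      return int(np.sum(rewards_b) > np.sum(rewards_a))

  # Example with two random length-50 segments of per-step ground-truth rewards.
  seg_a, seg_b = np.random.rand(50), np.random.rand(50)
  print("preferred:", "B" if label_preference(seg_a, seg_b) else "A")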

Please visit our paper and project page for more details.

Installation

1. Install with conda env file
  conda env create -f LiRE.yml
  pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld
  pip install git+https://github.com/denisyarats/dmc2gym.git
  pip install gdown
  sudo apt install unzip
2. Install with installation list
  conda create -n LiRE python=3.9
  conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
  pip install "gym[mujoco_py,classic_control]==0.23.0"
  pip install pyrallis rich tqdm==4.64.0 wandb==0.12.21
  pip install git+https://github.com/denisyarats/dmc2gym.git
  pip install git+https://github.com/Farama-Foundation/Metaworld.git@master#egg=metaworld
  pip install gdown
  sudo apt install unzip
  • Troubleshooting
    • AttributeError: module 'numpy' has no attribute 'int'
      • In .../LiRE/lib/python3.9/site-packages/dmc2gym/wrappers.py, change dim = np.int(np.prod(s.shape)) to dim = int(np.prod(s.shape))
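
As a quick post-installation sanity check, the snippet below (illustrative only, not one of the repository's scripts) imports the key packages and builds a DMControl environment through dmc2gym; the domain and task names are arbitrary examples.

  # Post-installation sanity check (illustrative; not part of this repo's scripts).
  import gym        # gym 0.23.0 with the mujoco_py / classic_control extras
  import metaworld  # MetaWorld benchmark
  import dmc2gym    # DMControl-to-gym wrapper

  # Build a DMControl task via dmc2gym; "walker"/"walk" are example names.
  env = dmc2gym.make(domain_name="walker", task_name="walk", seed=0)
  obs = env.reset()
  obs, reward, done, info = env.step(env.action_space.sample())
  print("obs shape:", obs.shape, "reward:", reward)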

Algorithms

In this repo, you can run MR, SeqRank, and LiRE.

For the other baselines, we experimented with the following repositories:

Algorithm  URL
PT         https://github.com/csmile-1006/PreferenceTransformer
DPPO       https://github.com/snu-mllab/DPPO
IPL        https://github.com/jhejna/inverse-preference-learning
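
To give a flavor of what listwise reward estimation can look like, here is a minimal PyTorch sketch that scores trajectory segments with a learned reward model and applies a Plackett-Luce style listwise loss over a ranked list of segments. This is not the exact LiRE objective; the architecture, segment shapes, and loss form are assumptions made purely for illustration.

  # Minimal listwise reward-learning sketch (illustration only, not the exact LiRE objective).
  import torch
  import torch.nn as nn

  class RewardModel(nn.Module):
      def __init__(self, obs_dim, act_dim, hidden=256):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
              nn.Linear(hidden, hidden), nn.ReLU(),
              nn.Linear(hidden, 1),
          )

      def forward(self, obs, act):
          # Per-step rewards summed over each segment -> one score per segment.
          return self.net(torch.cat([obs, act], dim=-1)).sum(dim=-2).squeeze(-1)

  def listwise_loss(model, obs, act):
      # obs/act: (num_segments, segment_len, dim), segments ordered from most
      # to least preferred. Plackett-Luce: -sum_i log softmax(scores[i:])[0].
      scores = model(obs, act)
      loss = scores.new_zeros(())
      for i in range(scores.shape[0] - 1):
          loss = loss - torch.log_softmax(scores[i:], dim=0)[0]
      return loss

  # Toy usage: 4 ranked segments of length 50 in a 10-D obs / 3-D action space.
  model = RewardModel(obs_dim=10, act_dim=3)
  obs, act = torch.randn(4, 50, 10), torch.randn(4, 50, 3)
  listwise_loss(model, obs, act).backward()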

Dataset

For more details, please see here:

  • MetaWorld
  • DMControl
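
For reference, one common way to instantiate a single MetaWorld task (the ML1 API) is sketched below; the task name is just an example, and the exact call pattern may differ slightly depending on the Metaworld commit that gets installed.

  # Illustrative MetaWorld ML1 usage (task name is an example; API may vary by commit).
  import random
  import metaworld

  ml1 = metaworld.ML1("button-press-v2")        # single-task benchmark
  env = ml1.train_classes["button-press-v2"]()  # construct the environment
  env.set_task(random.choice(ml1.train_tasks))  # sample a task variation
  obs = env.reset()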

Scripts

Please see here.
