
RULE: Reinforcement UnLEarning Achieves Forget–Retain Pareto Optimality


TL;DR: RULE is a reinforcement unlearning pipeline that lets the model explore when and how to refuse. RULE reaches a strong Pareto frontier between forgetting and retention without requiring massive forget/retain datasets.

📰 News

  • 🎉 [2025.09] Our paper has been accepted to NeurIPS 2025!

✨ Overview

(Figure: RULE overview.)

We propose RULE, which frames model unlearning as refusal-policy optimization and introduces an online RL–based refusal fine-tuning approach (a minimal reward sketch follows the list below). This brings three key benefits:

  • Natural, safe responses: Prior methods often yield unnatural outputs after fine-tuning. By designing appropriate rewards, RULE induces refusal behavior on forget data, producing fluent and safe replies.

  • Generalization beyond the forget/retain sets: We introduce a simple, effective data synthesis strategy and leverage RL’s exploration on a boundary set. The model implicitly learns a refusal policy from rewards, improving generalization to unseen but related queries.

  • A better forget–retain trade-off: Because RL samples on-policy from the model’s own distribution, RULE better preserves the model’s knowledge while unlearning targeted content.
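
As a loose illustration of this reward-driven refusal idea (not the repository's exact reward code; the function names and refusal markers below are hypothetical), a refusal-aware reward in Python might look like:

# Hypothetical reward sketch; see verl/ for the actual implementation.
# Queries labeled should_refuse=True (forget set) are rewarded for fluent
# refusals; all others (retain set, answerable boundary queries) are
# rewarded for correct, non-refusing answers.

REFUSAL_MARKERS = ("i cannot", "i can't", "i'm unable", "i won't")

def is_refusal(response: str) -> bool:
    """Crude surface check for refusal behavior."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def reward(should_refuse: bool, response: str, answer_correct: bool) -> float:
    """Score one rollout against its per-query refusal label."""
    refused = is_refusal(response)
    if should_refuse:
        return 1.0 if refused else -1.0
    if refused:
        return -1.0  # over-refusal on retain/boundary data is penalized
    return 1.0 if answer_correct else 0.0

Because the policy is sampled on-policy and supervised only through this scalar reward, the model can discover its own phrasing for refusals rather than imitating fixed templates.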

Empirically, on RWKU and MUSE-Books, RULE achieves a Pareto-optimal forget–retain frontier using only 10% of the forget and retain sets, while maintaining naturalness and general utility. Additional experiments show robustness to both black-box and white-box attacks, and compatibility with multiple reward designs and online RL algorithms.


📈 Key Findings

(Figure: main experimental results; img/exp.png)

  • Natural refusals on forget-related queries without collapsing helpfulness.
  • Data-efficient: strong results with a small fraction of forget data + synthetic boundary data.
  • Pareto-optimal trade-off between forgetting and retention.
  • Generalization to unseen but semantically related queries.

(Figure: forget–retain trade-off curves; img/tradeoff.png)

See the paper for full quantitative results, attack robustness, and ablations.
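
For readers new to the framing: a (forget_score, retain_score) operating point is Pareto-optimal when no other point matches or beats it on both axes and strictly beats it on at least one. A small, self-contained Python snippet (illustrative only, not part of this repo) that extracts the frontier from such pairs:

def pareto_frontier(points):
    """Return the non-dominated (forget_score, retain_score) pairs.

    Higher is better on both axes. A point is dropped if some other
    point is >= on both scores and > on at least one.
    """
    frontier = []
    for f, r in points:
        dominated = any(
            f2 >= f and r2 >= r and (f2 > f or r2 > r)
            for f2, r2 in points
        )
        if not dominated:
            frontier.append((f, r))
    return sorted(frontier)

# The middle point is dominated and drops out:
print(pareto_frontier([(0.9, 0.6), (0.7, 0.7), (0.8, 0.9)]))
# -> [(0.8, 0.9), (0.9, 0.6)]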


🚀 Installation

We recommend Python 3.9+.

# Option A: editable install
pip install -e .

# Install dependencies
pip install -r requirements.txt

If you use conda:

conda create -n rule python=3.9 -y
conda activate rule
pip install -e .
pip install -r requirements.txt

🗂️ Repository Structure

log/            # Training and evaluation logs
# For Rejection Steering:
RS/             # Rejection Steering (RS) implementation
    scripts/    # Scripts for running RS experiments
    models/     # RS model implementations
    utils/      # Utility functions for RS
# For Refusal Boundary Optimization:
examples/       # Example experiment configs (YAML + runnable bash)
data/           # Datasets and metadata
verl/           # Core source code (models, training, evaluation, utils)
run_muse.sh     # Script to run MUSE-Books experiments for ReBO
run_rwku.sh     # Script to run RWKU experiments for ReBO
requirements.txt
setup.py

🧪 Quick Start

1) Rejection Steering (RS)

cd RS && bash scripts/full/run_rt_epoch_target.sh

2) Refusal Boundary Optimization (ReBO)

bash examples/exp_target/RWKU/run_llama_bs32_kl1e-2_forget_bf16_two_stage_reject_ref_rollout8_withformat_with_fb_neighbor_abs_lr2e-6.sh

Tips

  • Edit the RS runner at: RS/scripts/full/run_rt_epoch_target.sh.
  • Edit ReBO YAMLs under examples/ for models, rewards, data paths, and hyperparameters.

🧰 Configuration

  • RS: set the forget targets, reward weights, and sampler options in the runner script above.
  • ReBO: control boundary synthesis, rollout length, reward shaping, and evaluation suites in examples/**.yaml (a synthesis sketch follows below).
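
As a loose illustration of the boundary-set idea from the overview (entirely hypothetical; the templates, names, and labels below are not the repository's synthesis code), boundary queries can be built by pairing forget-set entities with question templates and labeling each query by whether it touches the forgotten content:

# Hypothetical boundary-query synthesis sketch. Queries mention forget-set
# entities; sensitive templates probe content that should be refused, while
# benign templates ask for knowledge the model should still answer.

SENSITIVE_TEMPLATES = ["Summarize the plot of {entity}'s most famous novel."]
BENIGN_TEMPLATES = ["In one sentence, what profession is {entity} known for?"]

def synthesize_boundary_set(forget_entities):
    """Return (query, should_refuse) pairs near the refusal boundary."""
    queries = []
    for entity in forget_entities:
        for t in SENSITIVE_TEMPLATES:
            queries.append((t.format(entity=entity), True))
        for t in BENIGN_TEMPLATES:
            queries.append((t.format(entity=entity), False))
    return queries

print(synthesize_boundary_set(["<forget target>"]))

Exploring both labels during RL rollouts is what lets the policy place the refusal boundary itself, rather than memorizing a fixed forget list.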

🙏 Acknowledgements

This project builds on:

  • EasyR1 (an efficient RL training framework built on veRL)
  • RWKU (real-world knowledge unlearning benchmark)

We also evaluate on MUSE-Books where appropriate.


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


📚 Citation

If you find RULE useful, please cite our paper:

@misc{zhang2025rulereinforcementunlearningachieves,
      title={RULE: Reinforcement UnLEarning Achieves Forget-Retain Pareto Optimality},
      author={Chenlong Zhang and Zhuoran Jin and Hongbang Yuan and Jiaheng Wei and Tong Zhou and Kang Liu and Jun Zhao and Yubo Chen},
      year={2025},
      eprint={2506.07171},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.07171}
}
