Reference implementation of the probabilistic evaluation framework proposed in the paper:
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten, Stephan Günnemann, Leo Schwinn
International Conference on Learning Representations, ICLR 2025 (Oral)
Training code and further supplementary material will be released soon. In the meantime, you can explore our demo notebook, which demonstrates that greedy evaluations can misleadingly suggest successful unlearning, whereas our probabilistic evaluations assess model capabilities more accurately.
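The intuition behind this claim can be illustrated with a toy example. The sketch below is not the repository's code or the paper's exact metric. The output distribution, the leaked string, and the helper names (`greedy_output`, `sampled_leak_rate`, `prob_leak_within`) are all hypothetical. A greedy evaluation inspects only the single most likely answer. Sampling from the full output distribution instead shows how often the supposedly forgotten information still appears.

```python
import random

# Hypothetical output distribution of an "unlearned" model for one question
# about forgotten data (toy numbers, not results from the paper).
OUTPUT_PROBS = {
    "I don't know.": 0.6,
    "The answer is <forgotten fact>.": 0.3,  # leaks the supposedly forgotten fact
    "It might be something else.": 0.1,
}
LEAKED = "The answer is <forgotten fact>."


def greedy_output(probs):
    """Greedy evaluation: only the single most likely output is inspected."""
    return max(probs, key=probs.get)


def sampled_leak_rate(probs, n_samples, seed=0):
    """Toy probabilistic evaluation: fraction of sampled outputs that leak."""
    rng = random.Random(seed)
    outputs = list(probs)
    weights = [probs[o] for o in outputs]
    samples = rng.choices(outputs, weights=weights, k=n_samples)
    return sum(s == LEAKED for s in samples) / n_samples


def prob_leak_within(p_leak, k):
    """Probability that at least one of k independent samples leaks."""
    return 1 - (1 - p_leak) ** k


if __name__ == "__main__":
    # Greedy decoding suggests the fact was forgotten.
    print(greedy_output(OUTPUT_PROBS))  # prints: I don't know.
    # Sampling reveals that the fact still leaks with substantial probability.
    print(f"estimated leak rate: {sampled_leak_rate(OUTPUT_PROBS, 10_000):.3f}")
    print(f"P(leak within 10 samples): {prob_leak_within(0.3, 10):.3f}")
```

Here the greedy check passes, yet roughly 30% of sampled answers leak the fact. With that leak rate, ten independent samples leak it at least once with probability 1 - 0.7^10 ≈ 0.97.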
Please cite our paper if you use this code in your own work:
@inproceedings{scholten2024probabilistic,
title={A Probabilistic Perspective on Unlearning and Alignment for Large Language Models},
author={Yan Scholten and Stephan G{\"u}nnemann and Leo Schwinn},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=51WraMid8K}
}
For questions and feedback, please contact:
Yan Scholten, Technical University of Munich
Stephan Günnemann, Technical University of Munich
Leo Schwinn, Technical University of Munich
The code by Yan Scholten, Stephan Günnemann and Leo Schwinn is licensed under the MIT License.