Code for “Reject Only Critical Tokens: Pivot-Aware Speculative Decoding.”
This repo currently includes core components for dataset generation, pivot-classifier training, and PAD decoding.
Conventional speculative decoding preserves the target model’s distribution but suffers from low acceptance rates. Pivot-Aware Speculative Decoding (PAD) reframes verification around task utility rather than exact distribution matching: a lightweight classifier accepts all non-pivotal tokens and rejects only the critical (pivot) tokens that would degrade downstream performance.
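The verification idea described above can be illustrated with a minimal sketch. This is not the repo's implementation: the function name `pad_verify`, the `is_pivot` callback and its signature, and the use of greedy target tokens are all assumptions made for illustration. The classifier in this repo may take different inputs (for example, model hidden states).

```python
from typing import Callable, List, Sequence


def pad_verify(
    draft_tokens: Sequence[int],
    target_tokens: Sequence[int],
    is_pivot: Callable[[int, int, int], bool],
) -> List[int]:
    """Return the tokens kept from one speculation round.

    draft_tokens:  tokens proposed by the draft model.
    target_tokens: tokens the target model would pick at the same positions
                   (assumed here to be greedy choices).
    is_pivot:      hypothetical classifier hook; called as
                   is_pivot(position, draft_token, target_token) and returns
                   True when the mismatch is judged critical.
    """
    if len(draft_tokens) != len(target_tokens):
        raise ValueError("draft and target sequences must have equal length")

    kept: List[int] = []
    for pos, (draft_tok, target_tok) in enumerate(zip(draft_tokens, target_tokens)):
        # Matching tokens are always accepted; mismatches are accepted
        # unless the classifier flags them as pivots.
        if draft_tok == target_tok or not is_pivot(pos, draft_tok, target_tok):
            kept.append(draft_tok)
            continue
        # Pivot mismatch: substitute the target token and end the round.
        kept.append(target_tok)
        break
    return kept


draft = [5, 7, 9, 11]
target = [5, 8, 9, 12]

# Toy classifier (assumption): only the mismatch at position 3 is a pivot.
pad_result = pad_verify(draft, target, lambda pos, d, t: pos == 3)

# Treating every mismatch as a pivot recovers strict token matching.
strict_result = pad_verify(draft, target, lambda pos, d, t: True)

print(pad_result)     # [5, 7, 9, 12]
print(strict_result)  # [5, 8]
```

In this toy run, strict matching keeps two tokens from the round, while the pivot-aware rule keeps four because the non-critical mismatch at position 1 is accepted.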
Highlights
- Utility-based objective: Matches the target model’s expected utility, not its exact distribution.
- Pivot-token classifier: Identifies tokens likely to harm task performance.
- Speedups in practice: Up to 2.5× faster with comparable accuracy.
- Drop-in friendly: Minimal overhead; compatible with standard speculative decoding (SD) pipelines.
If you use PAD, please cite:
@inproceedings{
ziashahabi2025pad,
title={Reject Only Critical Tokens: Pivot-Aware Speculative Decoding},
author={Amir Ziashahabi and Yavuz Faruk Bakman and Duygu Nur Yaldiz and Mostafa El-Khamy and Sai Praneeth Karimireddy and Salman Avestimehr},
booktitle={NeurIPS 2025 Workshop on Efficient Reasoning},
year={2025},
}
Questions or collaborations: Amir Ziashahabi, [email protected]