Amir-zsh/PAD

Pivot-Aware Speculative Decoding (PAD)

Code for “Reject Only Critical Tokens: Pivot-Aware Speculative Decoding.”

This repo currently includes core components for dataset generation, pivot-classifier training, and PAD decoding.


Overview

Conventional speculative decoding preserves the target model’s output distribution exactly, but it often suffers from low acceptance rates. PAD reframes verification around task utility rather than exact distribution matching: a lightweight classifier accepts all non-pivotal draft tokens and rejects only the critical (pivot) tokens that would degrade downstream performance.
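The verification idea described above can be illustrated with a toy sketch. This is not the repository's API: `pad_verify`, its arguments, and the `is_pivot` callable are hypothetical names introduced here for illustration, and the rule that draft tokens matching the target are always accepted is an assumption. The real classifier presumably operates on model features rather than raw token IDs.

```python
from typing import Callable, List, Sequence


def pad_verify(
    draft_tokens: Sequence[int],
    target_tokens: Sequence[int],
    is_pivot: Callable[[int, int], bool],
) -> List[int]:
    """Toy pivot-aware verification of one block of draft tokens.

    draft_tokens[i]  : token proposed by the draft model at position i.
    target_tokens[i] : token the target model prefers at position i
                       (hypothetical input, e.g. from one target forward pass).
    is_pivot(i, tok) : hypothetical stand-in for the pivot classifier;
                       returns True if accepting `tok` at position i is
                       expected to hurt task performance.
    """
    if len(draft_tokens) != len(target_tokens):
        raise ValueError("draft and target sequences must have equal length")

    accepted: List[int] = []
    for i, (draft_tok, target_tok) in enumerate(zip(draft_tokens, target_tokens)):
        # Assumption: a mismatch alone is not grounds for rejection;
        # only a mismatching token flagged as a pivot is rejected.
        if draft_tok != target_tok and is_pivot(i, draft_tok):
            accepted.append(target_tok)  # fall back to the target's token
            break  # everything after a rejected token is discarded
        accepted.append(draft_tok)
    return accepted


# Example inputs (made-up token IDs).
draft = [5, 7, 9, 4]
target = [5, 8, 2, 4]

# Pivot-aware: only token 9 is treated as critical.
pad_result = pad_verify(draft, target, lambda i, tok: tok == 9)

# Treating every mismatch as critical mimics strict greedy verification.
strict_result = pad_verify(draft, target, lambda i, tok: True)
```

With these inputs the pivot-aware call keeps the mismatching but harmless token at position 1 and stops only at position 2, while the strict variant stops at the first mismatch. Accepting longer runs of draft tokens per verification step is where the speedup would come from.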

Highlights

  • Utility-based objective: Matches the target model’s expected utility, not its exact distribution.
  • Pivot-token classifier: Identifies tokens likely to harm task performance.
  • Speedups in practice: Up to 2.5× faster with comparable accuracy.
  • Drop-in friendly: Minimal overhead; compatible with standard speculative decoding (SD) pipelines.

📄 Reference

If you use PAD, please cite:

@inproceedings{ziashahabi2025pad,
  title={Reject Only Critical Tokens: Pivot-Aware Speculative Decoding},
  author={Amir Ziashahabi and Yavuz Faruk Bakman and Duygu Nur Yaldiz and Mostafa El-Khamy and Sai Praneeth Karimireddy and Salman Avestimehr},
  booktitle={NeurIPS 2025 Workshop on Efficient Reasoning},
  year={2025}
}

📬 Contact

Questions or collaborations: Amir Ziashahabi ([email protected])


About

Official implementation of Pivot-Aware Speculative Decoding.
