Description
It looks like the model runner API supports prompt lookup speculative decoding: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/prompt_lookup
However, it doesn't seem to be part of the LLM API yet:
TensorRT-LLM/tensorrt_llm/llmapi/llm_args.py, lines 851 to 854 in 3ee4332
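
For context, here is a rough sketch of the drafting step that prompt lookup decoding relies on, as I understand the technique: match the last few tokens against earlier text and propose whatever followed as draft tokens. This is plain Python for illustration only. The function name `prompt_lookup_draft` and its parameters `max_ngram` and `num_draft` are hypothetical and are not part of TensorRT-LLM or its LLM API.

```python
from typing import List, Sequence


def prompt_lookup_draft(
    tokens: Sequence[int],
    max_ngram: int = 3,   # hypothetical knob: longest trailing n-gram to match
    num_draft: int = 4,   # hypothetical knob: max number of draft tokens to propose
) -> List[int]:
    """Propose draft tokens by finding the trailing n-gram earlier in `tokens`."""
    # Try the longest n-gram first, then fall back to shorter ones.
    for n in range(min(max_ngram, len(tokens) - 1), 0, -1):
        suffix = list(tokens[-n:])
        # Scan earlier positions from most recent to oldest; the range
        # excludes the position of the trailing n-gram itself.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                # Tokens that followed the earlier occurrence become the draft.
                return list(tokens[start + n:start + n + num_draft])
    return []  # no earlier occurrence: nothing to speculate on


# Prompt plus generated tokens so far; the tail [1, 2, 3] also appears at the start.
draft = prompt_lookup_draft([1, 2, 3, 4, 5, 1, 2, 3])
print(draft)
```

The target model would then verify these draft tokens in a single forward pass and keep the accepted prefix. That verification step is what the runtime has to provide, so it is left out of the sketch.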