[Feature] Prompt lookup speculative decoding for LLM API #3138

@tonyay163

Description

@tonyay163

It looks like the model runner API already supports prompt lookup speculative decoding: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/prompt_lookup

However, it doesn't seem to be exposed through the LLM API yet — the `speculative_config` parameter only accepts these config types:

```python
speculative_config: Optional[Union[LookaheadDecodingConfig,
                                   MedusaDecodingConfig,
                                   EagleDecodingConfig,
                                   MTPDecodingConfig]] = None
```
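For context, prompt lookup decoding proposes draft tokens by matching the model's current suffix n-gram against earlier occurrences in the prompt and speculating that the tokens which followed there will follow again. A minimal sketch of that lookup step (function name and parameters are illustrative, not the TensorRT-LLM API):

```python
def propose_draft_tokens(token_ids, ngram_size=3, num_draft_tokens=5):
    """Propose draft tokens via prompt lookup (illustrative sketch).

    Match the last `ngram_size` tokens against earlier occurrences in
    `token_ids`; if a match is found, return the tokens that followed it
    as speculative drafts for the target model to verify.
    """
    if len(token_ids) < ngram_size + 1:
        return []
    pattern = token_ids[-ngram_size:]
    # Search backwards so the most recent earlier occurrence wins.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start:start + ngram_size] == pattern:
            follow = token_ids[start + ngram_size:
                               start + ngram_size + num_draft_tokens]
            if follow:
                return follow
    return []  # no earlier occurrence: fall back to normal decoding
```

The drafts are cheap to produce (no draft model is needed), and the target model verifies them in a single forward pass, accepting the longest matching prefix — which is why this works well for tasks like summarization or code editing where the output repeats spans of the prompt.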
