Open
Labels
Community Engagement (help/insights needed from community), feature request (New feature or request. This includes new model, dtype, functionality support), waiting for feedback
Description
It looks like the model runner API supports prompt lookup speculative decoding: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/prompt_lookup
However, it doesn't seem to be part of the LLM API yet:
`TensorRT-LLM/tensorrt_llm/llmapi/llm_args.py`, lines 851 to 854 in 3ee4332:

    speculative_config: Optional[Union[LookaheadDecodingConfig,
                                       MedusaDecodingConfig,
                                       EagleDecodingConfig,
                                       MTPDecodingConfig]] = None
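
For context on what is being requested: prompt lookup decoding proposes draft tokens by matching the most recent n-gram of the sequence against earlier tokens and copying what followed the match; the target model then verifies those drafts. The snippet below is only a minimal sketch of that idea in plain Python. It is not TensorRT-LLM code, and the function name `propose_draft_tokens` and its parameters `max_ngram_size` and `num_draft_tokens` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def propose_draft_tokens(
    token_ids: Sequence[int],
    max_ngram_size: int = 3,
    num_draft_tokens: int = 4,
) -> List[int]:
    """Hypothetical helper: propose draft tokens via n-gram lookup in the context."""
    n_tokens = len(token_ids)
    # Try the longest n-gram first, since a longer match is more specific.
    for ngram_size in range(min(max_ngram_size, n_tokens - 1), 0, -1):
        suffix = list(token_ids[n_tokens - ngram_size:])
        # Scan earlier positions from most recent to oldest,
        # skipping the position of the suffix itself.
        for start in range(n_tokens - ngram_size - 1, -1, -1):
            if list(token_ids[start:start + ngram_size]) == suffix:
                end = start + ngram_size + num_draft_tokens
                # Copy the tokens that followed the earlier occurrence.
                return list(token_ids[start + ngram_size:end])
    # No earlier occurrence of the suffix: propose nothing.
    return []


if __name__ == "__main__":
    context = [1, 2, 3, 4, 5, 1, 2, 3]
    print(propose_draft_tokens(context))
```

In this sketch the sequence ends in `1, 2, 3`, which also appears at the start, so the tokens that followed it there are proposed as drafts. An LLM API config for this feature would presumably need to carry at least the two knobs shown here, though the actual field names are up to the maintainers.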