
[Feature] Prompt lookup speculative decoding for LLM API #3138

Open
tonyay163 opened this issue Mar 28, 2025 · 4 comments

@tonyay163

It looks like the model runner API supports prompt lookup speculative decoding: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/prompt_lookup

However, it doesn't seem to be part of the LLM API yet:

speculative_config: Optional[Union[LookaheadDecodingConfig,
                                   MedusaDecodingConfig,
                                   EagleDecodingConfig,
                                   MTPDecodingConfig]] = None
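For context, prompt lookup speculative decoding drafts candidate tokens by matching the trailing n-gram of the sequence generated so far against earlier occurrences in the prompt, then copying the tokens that followed the match as a draft for the target model to verify. A minimal sketch of that drafting step (illustrative only, not the TensorRT-LLM implementation; the function and parameter names are made up):

from typing import List

def propose_draft_tokens(token_ids: List[int],
                         ngram_size: int = 3,
                         num_draft_tokens: int = 5) -> List[int]:
    """Find the most recent earlier occurrence of the trailing n-gram
    and copy its continuation as draft tokens."""
    if len(token_ids) <= ngram_size:
        return []
    pattern = token_ids[-ngram_size:]
    # Scan right to left (excluding the trailing n-gram itself) so the
    # most recent match wins.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start:start + ngram_size] == pattern:
            continuation = token_ids[start + ngram_size:
                                     start + ngram_size + num_draft_tokens]
            if continuation:
                # The target model verifies these drafts in one forward pass.
                return continuation
    return []  # no match: fall back to normal one-token decoding

For example, with token_ids = [1, 2, 3, 1, 2, 3] and ngram_size = 2, the trailing bigram [2, 3] matches at position 1 and the sketch returns the continuation [1, 2, 3].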

@juney-nvidia
Collaborator

juney-nvidia commented Mar 28, 2025

Hi @tonyay163,

Thanks for bringing this to our attention.
It is true that prompt lookup speculative decoding is not exposed at the LLM API level yet.
We are currently working to make the LLM API stable enough for the official TensorRT-LLM 1.0 release, so for now we may not have the bandwidth to expose prompt lookup speculative decoding in the LLM API ourselves.

If you are interested, you are welcome to contribute the code to TensorRT-LLM directly.

@Superjomn for visibility on this.

@tonyay163
Author

Thanks for the quick response @juney-nvidia. Is there an example PR where the other speculative decoding methods were implemented that I can refer to?

@Superjomn
Collaborator

Superjomn commented Mar 28, 2025

Hi @tonyay163, I am afraid the major MRs were internal before we switched to GitHub. Recently, we have been focusing on the PyTorch path; here is some related code I know of.

cc @lfr-0531 in case there is more information about contributing to the PyTorch speculative decoding part.

@juney-nvidia
Collaborator

juney-nvidia commented Mar 28, 2025

@tonyay163

As @Superjomn said, we are now focusing on the PyTorch path to improve the ease of use of TensorRT-LLM (while still ensuring the best performance). Also, since there is already prompt lookup speculative decoding support in the TensorRT path, you can decide whether you want to implement it in the PyTorch path (by following the MTP example shared by @Superjomn) or expose the current prompt lookup implementation in the TensorRT path through the LLM API.

In our design, the details of both the TensorRT and PyTorch paths are hidden behind the LLM API, so as long as you are using the LLM API, switching between TensorRT and PyTorch should be relatively seamless for end users. (There can still be cases where switching from the TensorRT path to the PyTorch path requires some user-side changes, but the code changes should be very small.)
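For illustration, a minimal usage sketch of that backend-agnostic design (the model name and sampling values are placeholders, and the exact import paths may differ across TensorRT-LLM versions):

from tensorrt_llm import LLM, SamplingParams

# The LLM API hides whether the TensorRT or the PyTorch path runs underneath,
# so this call is meant to stay the same when the backend changes.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)

A prompt lookup config, once exposed, would presumably be passed through the same speculative_config argument as the existing options.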

Please let me know whether this is clear enough.

Thanks
June
