

@pawel-olejniczak (Contributor)

The sampler crashes when repetition penalties are used: make_selective_sampling_metadata() sets prompt_token_ids=None when skip_copy=True (decode phase), but the penalty computation needs the prompt tokens. Add a caching mechanism for prompt_token_ids so the tensor is reused when skip_copy=True and penalties are enabled. The cache is invalidated whenever the batch composition changes. This fixes repetition penalty support while preserving the skip_copy performance optimization.
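Why the prompt tokens matter: in the usual repetition-penalty rule (as used in vLLM), every token id that appears in the prompt *or* in the output so far is penalized, with positive logits divided by the penalty and non-positive logits multiplied by it. The framework-free sketch below illustrates that rule with plain lists standing in for tensors; `apply_repetition_penalty` is a hypothetical name, not the vLLM or vllm-gaudi API.

```python
from typing import List, Optional, Sequence


def apply_repetition_penalty(
    logits: List[float],
    prompt_token_ids: Optional[Sequence[int]],
    output_token_ids: Sequence[int],
    penalty: float,
) -> List[float]:
    """Hypothetical sketch: penalize token ids seen in the prompt or the output."""
    if prompt_token_ids is None:
        # Mirrors the crash described above: without prompt tokens the
        # set of "already seen" tokens cannot be built.
        raise ValueError("repetition penalty requires prompt_token_ids")
    seen = set(prompt_token_ids) | set(output_token_ids)
    penalized = list(logits)
    for tok in seen:
        # Push the logit toward "less likely" regardless of its sign.
        if penalized[tok] > 0:
            penalized[tok] /= penalty
        else:
            penalized[tok] *= penalty
    return penalized


print(apply_repetition_penalty([2.0, -1.0, 0.5, 3.0], [0], [1], 2.0))
```

Here token 0 comes only from the prompt, so dropping `prompt_token_ids` during decode would silently skip (or, as in this sketch, reject) its penalty.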


Signed-off-by: Paweł Olejniczak <[email protected]>
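A minimal sketch of the caching idea, assuming a batch is identified by the ordered tuple of its request IDs and that nested Python lists stand in for the padded prompt_token_ids tensor. `PromptTokenCache`, its `get()` signature, and the `pad_id` parameter are hypothetical illustrations, not the actual vllm-gaudi change.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class PromptTokenCache:
    """Hypothetical cache reusing padded prompt_token_ids across decode steps."""

    def __init__(self) -> None:
        self._key: Optional[Tuple[str, ...]] = None
        self._value: Optional[List[List[int]]] = None
        self.builds = 0  # number of times the padded batch was (re)built

    def invalidate(self) -> None:
        self._key, self._value = None, None

    def get(
        self,
        req_ids: Sequence[str],
        prompts: Dict[str, List[int]],
        pad_id: int,
        skip_copy: bool,
        needs_penalties: bool,
    ) -> Optional[List[List[int]]]:
        if not needs_penalties:
            return None  # assumption: nothing downstream needs prompt tokens
        key = tuple(req_ids)
        if skip_copy and key == self._key and self._value is not None:
            return self._value  # same batch composition: reuse cached value
        # Batch changed (or copy not skipped): rebuild and re-cache.
        max_len = max((len(prompts[r]) for r in req_ids), default=0)
        padded = [prompts[r] + [pad_id] * (max_len - len(prompts[r])) for r in req_ids]
        self._key, self._value = key, padded
        self.builds += 1
        return padded


cache = PromptTokenCache()
prompts = {"a": [1, 2, 3], "b": [4]}
cache.get(["a", "b"], prompts, 0, skip_copy=False, needs_penalties=True)  # build
cache.get(["a", "b"], prompts, 0, skip_copy=True, needs_penalties=True)   # reuse
```

Keying on the ordered request IDs means any added, removed, or reordered request yields a new key, which is one way to realize "invalidated when batch composition changes"; the real change may track this differently.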
@pawel-olejniczak force-pushed the dev/polejnix/fix_repetition_penalty_crash branch from d8fe456 to b76d0a0 on December 31, 2025 13:44
@github-actions

✅ CI Passed

All checks passed successfully against the following vllm commit:
b3a2bdf1ac90748d58bf8c05f8d0095ede5c7eca

@iboiko-habana merged commit 25e637c into vllm-project:main on Jan 5, 2026
50 checks passed
yingjie-han pushed a commit to yingjie-han/vllm-gaudi that referenced this pull request Jan 16, 2026
