feat: repetition detector for degenerate token loops #65
Open
janhilgard wants to merge 1 commit into waybarrios:main from
Adds a lightweight repetition detector to the scheduler that monitors the last 32 generated tokens per request and stops generation when degenerate patterns are detected:

- Single-token repetition (8+ identical tokens)
- Short sequence repetition (2-4 token patterns repeated 6+ times)

This prevents runaway generation when models enter degenerate loops, saving compute and improving reliability for long-running requests.

Includes 15 unit tests covering all detection patterns and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
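The two detection rules above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `is_degenerate`, the constant names, and the choice to test only the tail of the window (rather than any position inside it) are assumptions. Only the thresholds (32-token window, 8+ identical tokens, 2-4 token patterns repeated 6+ times) come from the description.

```python
from typing import Sequence

# Thresholds taken from the PR description; the names are hypothetical.
WINDOW = 32                   # tokens inspected per request
MIN_SINGLE_RUN = 8            # 8+ identical tokens
PATTERN_LENGTHS = (2, 3, 4)   # 2-4 token patterns
MIN_PATTERN_REPEATS = 6       # repeated 6+ times


def is_degenerate(tokens: Sequence[int]) -> bool:
    """Return True if the tail of `tokens` looks like a degenerate loop."""
    recent = list(tokens[-WINDOW:])

    # Rule 1: the last MIN_SINGLE_RUN tokens are all the same token.
    if len(recent) >= MIN_SINGLE_RUN:
        tail = recent[-MIN_SINGLE_RUN:]
        if all(t == tail[0] for t in tail):
            return True

    # Rule 2: the last n * MIN_PATTERN_REPEATS tokens are one n-token
    # pattern repeated back to back (assumption: only the tail is tested).
    for n in PATTERN_LENGTHS:
        span = n * MIN_PATTERN_REPEATS
        if len(recent) < span:
            continue
        tail = recent[-span:]
        pattern = tail[:n]
        if all(tail[i] == pattern[i % n] for i in range(span)):
            return True

    return False
```

The longest pattern check needs 4 x 6 = 24 tokens, so it fits inside the 32-token window.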
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Feb 11, 2026
Moves repetition detection logic to feature/repetition-detector branch (PR waybarrios#65) per review feedback on PR waybarrios#53. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5 tasks
How is this different from repetition penalties or DRY?
Collaborator
Author
Good question! They solve different problems:

Repetition penalty / DRY are preventative: they modify logits during sampling to discourage repetition before it happens. They work well most of the time.

This detector is a safety net: it doesn't touch sampling at all. It monitors output and terminates generation when degenerate loops have already formed. Think of it as a circuit breaker for the server.

Why both are needed:

The overhead is near-zero (list append + periodic check on a 32-token window), so it's cheap insurance.
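To illustrate the "append + periodic check" cost model, here is a hedged sketch of a per-request monitor that sits next to the sampling loop without touching logits. Everything here is hypothetical: the `RepetitionMonitor` class, the `check_every` interval (the PR does not state how often the check runs), the `ends_with_run` example predicate, and the `generate` helper. A `deque` is used instead of a plain list so the window stays bounded.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional, Sequence, Tuple


def ends_with_run(tokens: Sequence[int], run: int = 8) -> bool:
    """Example predicate: the last `run` tokens are all identical."""
    if len(tokens) < run:
        return False
    tail = list(tokens)[-run:]
    return all(t == tail[0] for t in tail)


class RepetitionMonitor:
    """Hypothetical per-request circuit breaker; not the PR's actual class."""

    def __init__(
        self,
        is_degenerate: Callable[[Sequence[int]], bool],
        window: int = 32,      # window size from the PR description
        check_every: int = 4,  # assumed check interval
    ) -> None:
        self._is_degenerate = is_degenerate
        self._recent: Deque[int] = deque(maxlen=window)
        self._check_every = check_every
        self._since_check = 0

    def observe(self, token_id: int) -> bool:
        """Record one sampled token; return True if generation should stop."""
        self._recent.append(token_id)  # O(1); oldest token drops off
        self._since_check += 1
        if self._since_check < self._check_every:
            return False               # most steps pay only for the append
        self._since_check = 0
        return self._is_degenerate(list(self._recent))


def generate(
    stream: Iterable[int], monitor: RepetitionMonitor
) -> Tuple[List[int], Optional[str]]:
    """Consume already-sampled tokens; stop early on a degenerate loop."""
    output: List[int] = []
    for token_id in stream:
        output.append(token_id)
        if monitor.observe(token_id):
            return output, "stop"      # finish_reason used by this PR
    return output, None                # stream ended for some other reason
```

Because the monitor only sees tokens after they are sampled, it can be combined with a repetition penalty or DRY without changing either one.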
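Summary
Stops generation with finish_reason="stop" when degenerate patterns are detected:

- Single-token repetition (8+ identical tokens, e.g. 0 0 0 0 0 0 0 0)
- Short sequence repetition (2-4 token patterns repeated 6+ times, e.g. ab ab ab ab ab ab)

Split out from PR #53 per review feedback. This touches the scheduler hot path and is independent of the GPT-OSS reasoning parser.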
Test plan
- 15 unit tests (tests/test_repetition_detector.py)

🤖 Generated with Claude Code