
[Bug] Crash special token xgrammar #3108

Open
5 tasks done
maximegmd opened this issue Jan 24, 2025 · 3 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@maximegmd

maximegmd commented Jan 24, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

When using xgrammar with an EBNF grammar, SGLang crashes if the model outputs a reserved special token (here `<|reserved_special_token_247|>`, id 128255).
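For context, constrained decoding is normally expected to mask out every token the grammar does not allow, including reserved special tokens, before sampling, so `accept_token` should never see one. A conceptual sketch of that masking step, with purely illustrative token ids and logit values:

```python
import math

# Conceptual sketch of grammar-constrained sampling: tokens outside the
# grammar's allowed set (including reserved special tokens such as
# id 128255) get -inf logits and can never be chosen. All ids and
# values here are illustrative, not taken from the real model.
logits = {42: 0.7, 128000: 1.2, 128255: 3.5}
allowed = {42, 128000}  # tokens the grammar's bitmask would permit

masked = {tid: (v if tid in allowed else -math.inf) for tid, v in logits.items()}
best = max(masked, key=masked.get)
print(best)  # 128000 -- the reserved token cannot be sampled
```

The crash above suggests this mask was not applied (or not applied to special tokens) before the sampled token reached the matcher.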

[2025-01-24 04:52:54 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1756, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 512, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1089, in process_batch_result
    self.process_batch_result_decode(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1253, in process_batch_result_decode
    req.grammar.accept_token(next_token_id)
  File "/sgl-workspace/sglang/python/sglang/srt/constrained/xgrammar_backend.py", line 52, in accept_token
    assert self.matcher.accept_token(token)
  File "/usr/local/lib/python3.10/dist-packages/xgrammar/matcher.py", line 205, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [04:52:54] /workspace/cpp/grammar_matcher.cc:361: Token id 128255: <|reserved_special_token_247|> is regarded as a special token, and cannot be accepted by the GrammarMatcher


[2025-01-24 04:52:54 TP2] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1756, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 512, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1089, in process_batch_result
    self.process_batch_result_decode(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1253, in process_batch_result_decode
    req.grammar.accept_token(next_token_id)
  File "/sgl-workspace/sglang/python/sglang/srt/constrained/xgrammar_backend.py", line 52, in accept_token
    assert self.matcher.accept_token(token)
  File "/usr/local/lib/python3.10/dist-packages/xgrammar/matcher.py", line 205, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [04:52:54] /workspace/cpp/grammar_matcher.cc:361: Token id 128255: <|reserved_special_token_247|> is regarded as a special token, and cannot be accepted by the GrammarMatcher


[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
...

Followed by an infinite stream of:

[2025-01-24 04:53:06] Exception in callback Loop._read_from_self
handle: <Handle Loop._read_from_self>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 399, in uvloop.loop.Loop._read_from_self
  File "uvloop/loop.pyx", line 404, in uvloop.loop.Loop._invoke_signals
  File "uvloop/loop.pyx", line 379, in uvloop.loop.Loop._ceval_process_signals
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 508, in kill_process_tree
    itself.send_signal(signal.SIGQUIT)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1285, in send_signal
    self._send_signal(sig)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1266, in _send_signal
    os.kill(self.pid, sig)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
...
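One possible mitigation (a sketch only, not SGLang's actual code): replace the bare `assert` in the xgrammar backend's `accept_token` with error handling, so a rejected token aborts only the offending request instead of killing the scheduler process. `DummyMatcher` below is a stand-in for xgrammar's `GrammarMatcher`; the class and attribute names are hypothetical.

```python
# Hypothetical sketch: degrade gracefully when the matcher rejects a token,
# instead of asserting and crashing the whole scheduler.
class DummyMatcher:
    """Stand-in for xgrammar's GrammarMatcher; raises on special tokens."""
    SPECIAL_IDS = {128255}

    def accept_token(self, token_id):
        if token_id in self.SPECIAL_IDS:
            raise RuntimeError(f"Token id {token_id} is a special token")
        return True

class SafeGrammar:
    """Wraps a matcher; on failure, flags the request instead of crashing."""

    def __init__(self, matcher):
        self.matcher = matcher
        self.finished = False

    def accept_token(self, token_id):
        if self.finished:
            return False
        try:
            return self.matcher.accept_token(token_id)
        except RuntimeError:
            self.finished = True  # abort this single request only
            return False

g = SafeGrammar(DummyMatcher())
print(g.accept_token(42))      # True
print(g.accept_token(128255))  # False -- no scheduler crash
```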

Reproduction

docker run -d --gpus all \
    -p 8000:8000 \
    -v /home/azureuser/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=*****" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-70B --host 0.0.0.0 --port 8000 --tp 4 --dp 1 --grammar-backend xgrammar

Environment

Latest docker image: https://hub.docker.com/layers/lmsysorg/sglang/latest/images/sha256-576f608ad94fda242249416b3d9d27f8448091cfeff5776f6b99d90f4a42c13b

Microsoft Azure 4xA100 80G.

@adarshxs
Contributor

adarshxs commented Jan 24, 2025

Could you share your prompt and your EBNF grammar, @maximegmd? I'll have a look.

@maximegmd
Author

I cannot share the prompts as they contain private information, but the grammar is:

GRAMMAR = """
root ::= reasoning
reasoning ::= "<think>\\n" line* "</think>" "\\n" "\\n" scores
line ::= [^\\n<]* (("<" [^/] line) | "\\n")
scientific_accuracy ::= "Scientific accuracy: " values
harm_risk ::= "Harm risk: " values
inaccurate_irrelevant ::= "Inaccurate or irrelevant information: " values
missing_information ::= "Missing information: " values
hallucination_risk ::= "Hallucination risk: " values
refusal ::= "Refusal: " values
scores ::= scientific_accuracy "\\n" harm_risk "\\n" inaccurate_irrelevant "\\n" missing_information "\\n" hallucination_risk
values ::= ("1" | "2" | "3" | "4" | "5")
"""
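For reference, a hand-built example of the output shape this grammar is meant to accept (purely illustrative, not real model output):

```python
import re

# Illustrative completion matching the grammar above: a <think> block,
# a blank line, then the five score lines from the `scores` rule.
sample = (
    "<think>\n"
    "reasoning goes here\n"
    "</think>\n"
    "\n"
    "Scientific accuracy: 4\n"
    "Harm risk: 1\n"
    "Inaccurate or irrelevant information: 2\n"
    "Missing information: 3\n"
    "Hallucination risk: 1"
)
# Each score line is "<label>: <1-5>", matching the `values` rule.
assert all(re.fullmatch(r".+: [1-5]", ln) for ln in sample.splitlines()[-5:])
print("sample is well-formed")
```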

The occurrence rate is about 1 in 30,000 requests; it crashes the inference image around once a day for us. I haven't found a 100% reliable repro for this bug.

@zhaochenyang20
Collaborator

cc @shuaills @Ubospica

@zhaochenyang20 zhaochenyang20 self-assigned this Jan 25, 2025
@zhaochenyang20 zhaochenyang20 added the help wanted Extra attention is needed label Jan 25, 2025