
[Bug] Crash special token xgrammar #3108

Open
5 tasks done
maximegmd opened this issue Jan 24, 2025 · 3 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@maximegmd

maximegmd commented Jan 24, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

When using xgrammar with an EBNF grammar, SGLang crashes if the model outputs a reserved special token (here `<|reserved_special_token_247|>`, id 128255).
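For context, constrained decoding is normally expected to mask out every token the grammar does not allow, including reserved special tokens, before sampling, so `accept_token` should never see one. A conceptual sketch of that masking step, with purely illustrative token ids and logit values:

```python
import math

# Conceptual sketch of grammar-constrained sampling: tokens outside the
# grammar's allowed set (including reserved special tokens such as
# id 128255) get -inf logits and can never be chosen. All ids and
# values here are illustrative, not taken from the real model.
logits = {42: 0.7, 128000: 1.2, 128255: 3.5}
allowed = {42, 128000}  # tokens the grammar's bitmask would permit

masked = {tid: (v if tid in allowed else -math.inf) for tid, v in logits.items()}
best = max(masked, key=masked.get)
print(best)  # 128000 -- the reserved token cannot be sampled
```

The crash above suggests this mask was not applied (or not applied to special tokens) before the sampled token reached the matcher.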

[2025-01-24 04:52:54 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1756, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 512, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1089, in process_batch_result
    self.process_batch_result_decode(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1253, in process_batch_result_decode
    req.grammar.accept_token(next_token_id)
  File "/sgl-workspace/sglang/python/sglang/srt/constrained/xgrammar_backend.py", line 52, in accept_token
    assert self.matcher.accept_token(token)
  File "/usr/local/lib/python3.10/dist-packages/xgrammar/matcher.py", line 205, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [04:52:54] /workspace/cpp/grammar_matcher.cc:361: Token id 128255: <|reserved_special_token_247|> is regarded as a special token, and cannot be accepted by the GrammarMatcher


[2025-01-24 04:52:54 TP2] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1756, in run_scheduler_process
    scheduler.event_loop_overlap()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 512, in event_loop_overlap
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1089, in process_batch_result
    self.process_batch_result_decode(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1253, in process_batch_result_decode
    req.grammar.accept_token(next_token_id)
  File "/sgl-workspace/sglang/python/sglang/srt/constrained/xgrammar_backend.py", line 52, in accept_token
    assert self.matcher.accept_token(token)
  File "/usr/local/lib/python3.10/dist-packages/xgrammar/matcher.py", line 205, in accept_token
    return self._handle.accept_token(token_id, debug_print)
RuntimeError: [04:52:54] /workspace/cpp/grammar_matcher.cc:361: Token id 128255: <|reserved_special_token_247|> is regarded as a special token, and cannot be accepted by the GrammarMatcher


[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
[2025-01-24 04:52:54] Received sigquit from a child proces. It usually means the child failed.
...

Followed by an infinite stream of:

[2025-01-24 04:53:06] Exception in callback Loop._read_from_self
handle: <Handle Loop._read_from_self>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
  File "uvloop/loop.pyx", line 399, in uvloop.loop.Loop._read_from_self
  File "uvloop/loop.pyx", line 404, in uvloop.loop.Loop._invoke_signals
  File "uvloop/loop.pyx", line 379, in uvloop.loop.Loop._ceval_process_signals
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
  File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 508, in kill_process_tree
    itself.send_signal(signal.SIGQUIT)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1285, in send_signal
    self._send_signal(sig)
  File "/usr/local/lib/python3.10/dist-packages/psutil/__init__.py", line 1266, in _send_signal
    os.kill(self.pid, sig)
  File "/sgl-workspace/sglang/python/sglang/srt/entrypoints/engine.py", line 332, in sigquit_handler
    kill_process_tree(os.getpid())
...
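One possible mitigation (a sketch only, not SGLang's actual code): replace the bare `assert` in the xgrammar backend's `accept_token` with error handling, so a rejected token aborts only the offending request instead of killing the scheduler process. `DummyMatcher` below is a stand-in for xgrammar's `GrammarMatcher`; the class and attribute names are hypothetical.

```python
# Hypothetical sketch: degrade gracefully when the matcher rejects a token,
# instead of asserting and crashing the whole scheduler.
class DummyMatcher:
    """Stand-in for xgrammar's GrammarMatcher; raises on special tokens."""
    SPECIAL_IDS = {128255}

    def accept_token(self, token_id):
        if token_id in self.SPECIAL_IDS:
            raise RuntimeError(f"Token id {token_id} is a special token")
        return True

class SafeGrammar:
    """Wraps a matcher; on failure, flags the request instead of crashing."""

    def __init__(self, matcher):
        self.matcher = matcher
        self.finished = False

    def accept_token(self, token_id):
        if self.finished:
            return False
        try:
            return self.matcher.accept_token(token_id)
        except RuntimeError:
            self.finished = True  # abort this single request only
            return False

g = SafeGrammar(DummyMatcher())
print(g.accept_token(42))      # True
print(g.accept_token(128255))  # False -- no scheduler crash
```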

Reproduction

docker run -d --gpus all \
    -p 8000:8000 \
    -v /home/azureuser/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=*****" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1-Distill-Llama-70B --host 0.0.0.0 --port 8000 --tp 4 --dp 1 --grammar-backend xgrammar

Environment

Latest docker image: https://hub.docker.com/layers/lmsysorg/sglang/latest/images/sha256-576f608ad94fda242249416b3d9d27f8448091cfeff5776f6b99d90f4a42c13b

Microsoft Azure 4xA100 80G.

@adarshxs
Contributor

adarshxs commented Jan 24, 2025

Could you share your prompt and your EBNF grammar, @maximegmd? I'll have a look.

@maximegmd
Author

I cannot share the prompts as they contain private information, but the grammar is:

GRAMMAR = """
root ::= reasoning
reasoning ::= "<think>\\n" line* "</think>" "\\n" "\\n" scores
line ::= [^\\n<]* (("<" [^/] line) | "\\n")
scientific_accuracy ::= "Scientific accuracy: " values
harm_risk ::= "Harm risk: " values
inaccurate_irrelevant ::= "Inaccurate or irrelevant information: " values
missing_information ::= "Missing information: " values
hallucination_risk ::= "Hallucination risk: " values
refusal ::= "Refusal: " values
scores ::= scientific_accuracy "\\n" harm_risk "\\n" inaccurate_irrelevant "\\n" missing_information "\\n" hallucination_risk
values ::= ("1" | "2" | "3" | "4" | "5")
"""
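For reference, a hand-built example of the output shape this grammar is meant to accept (purely illustrative, not real model output):

```python
import re

# Illustrative completion matching the grammar above: a <think> block,
# a blank line, then the five score lines from the `scores` rule.
sample = (
    "<think>\n"
    "reasoning goes here\n"
    "</think>\n"
    "\n"
    "Scientific accuracy: 4\n"
    "Harm risk: 1\n"
    "Inaccurate or irrelevant information: 2\n"
    "Missing information: 3\n"
    "Hallucination risk: 1"
)
# Each score line is "<label>: <1-5>", matching the `values` rule.
assert all(re.fullmatch(r".+: [1-5]", ln) for ln in sample.splitlines()[-5:])
print("sample is well-formed")
```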

The occurrence rate is about 1 in 30,000 requests; it crashes the inference image around once a day for us. I haven't found a 100% reliable repro for this bug.

@zhaochenyang20
Collaborator

cc @shuaills @Ubospica

@zhaochenyang20 zhaochenyang20 self-assigned this Jan 25, 2025
@zhaochenyang20 zhaochenyang20 added the help wanted Extra attention is needed label Jan 25, 2025