[Bug] Fix GLM4-MoE TopK output format mismatch for unquantized models #18206
Motivation
Fixes `ValueError: too many values to unpack (expected 3)` when using EAGLE speculative decoding with unquantized GLM4-MoE models. The server crashes immediately after warmup, on the first generation request.
Launch command to reproduce:
Hardware: B200
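For context, the crash is a plain Python tuple-unpacking failure: the call site expects the 3-field standard TopK output but receives the 5-field bypassed variant. A minimal standalone illustration (not SGLang code; the two trailing field names are placeholders):

```python
# Standalone illustration of the failure mode (not SGLang code):
# a call site that unpacks 3 fields receives a 5-field tuple instead.
five_field_output = ("topk_weights", "topk_ids", "router_logits",
                     "extra_field_1", "extra_field_2")  # placeholder names
try:
    topk_weights, topk_ids, router_logits = five_field_output
except ValueError as e:
    print(e)  # too many values to unpack (expected 3)
```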
Modifications
Root Cause: GLM4-MoE's TopK layer doesn't explicitly set `output_format`, so auto-detection incorrectly produces `BypassedTopKOutput` (5 fields) instead of `StandardTopKOutput` (3 fields) when using EAGLE with unquantized models.
Fix: Set an explicit `output_format=TopKOutputFormat.STANDARD` for unquantized GLM4-MoE models in `glm4_moe.py`.
Accuracy Tests
✅ Tested unquantized GLM4-MoE with EAGLE speculative decoding (no longer crashes)
✅ Verified FP4 quantized models still work correctly with flashinfer_trtllm backend
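The fix described under Modifications amounts to passing an explicit format instead of relying on auto-detection. A hedged sketch of the pattern (the class and enum names come from the PR description; the stub constructor below is illustrative, not sglang's actual signature):

```python
# Sketch of the fix pattern. TopKOutputFormat and TopK mirror names from
# the PR description; this stub constructor is hypothetical.
from enum import Enum, auto

class TopKOutputFormat(Enum):
    STANDARD = auto()  # 3-field output: weights, ids, router logits
    BYPASSED = auto()  # extended 5-field output used by some quantized paths

class TopK:
    def __init__(self, top_k, output_format=None):
        self.top_k = top_k
        # Before the fix: output_format=None fell through to auto-detection,
        # which could wrongly pick the bypassed format for unquantized
        # EAGLE runs (modeled here as a bad default).
        self.output_format = output_format or TopKOutputFormat.BYPASSED

# After the fix: GLM4-MoE constructs its TopK with an explicit format,
# so auto-detection never runs for unquantized models.
topk = TopK(top_k=8, output_format=TopKOutputFormat.STANDARD)
assert topk.output_format is TopKOutputFormat.STANDARD
```

Pinning the format at construction time keeps the quantized paths (which rely on the bypassed output) untouched while making the unquantized path deterministic.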
Benchmarking and Profiling
Checklist
Review Process
`/tag-run-ci-label`, `/rerun-failed-ci`, `/tag-and-rerun-ci`