[Issue]: GLM-5 aiter fused_moe with SGLang + MI355 #2059

@ozziemoreno


Problem Description

The SGLang server handles 16 concurrent requests from the client side without issue, but crashes when concurrency is raised to 17 or more. The crash appears to happen inside the aiter fused_moe kernel. Is there a known-working container I should use?

Operating System

Ubuntu 22.04.5 LTS (Jammy Jellyfish)

CPU

AMD EPYC 9575F 64-Core Processor

GPU

8x AMD Instinct MI355X

ROCm Version

ROCm 7.2

ROCm Component

No response

Steps to Reproduce

SGLANG_ROCM_FUSED_DECODE_MLA=0 python3 -m sglang.launch_server --model zai-org/GLM-5 --tp-size 8 --attention-backend triton --disable-radix-cache --watchdog-timeout 1200

python3 -m sglang.bench_serving --dataset-name random --random-range-ratio 1 --max-concurrency 17 --num-prompt 32 --random-input 1000 --random-output 60 --model zai-org/GLM-5 --warmup-requests 0 --backend sglang-oai
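To bracket the failure threshold more precisely, the bench command above can be swept across a few concurrency values around the 16-pass / 17-fail boundary. A minimal sketch (the sweep values and the `bench_cmd` helper are illustrative, not part of the original report; it only prints the command lines, which can then be run one at a time against the already-launched server):

```python
import shlex

def bench_cmd(max_concurrency: int) -> list[str]:
    """Build the sglang.bench_serving command line for one concurrency level."""
    return [
        "python3", "-m", "sglang.bench_serving",
        "--dataset-name", "random",
        "--random-range-ratio", "1",
        "--max-concurrency", str(max_concurrency),
        "--num-prompt", "32",
        "--random-input", "1000",
        "--random-output", "60",
        "--model", "zai-org/GLM-5",
        "--warmup-requests", "0",
        "--backend", "sglang-oai",
    ]

# Sweep around the reported 16-pass / 17-fail boundary.
for conc in (15, 16, 17, 18):
    print(shlex.join(bench_cmd(conc)))
    # To execute each point directly: subprocess.run(bench_cmd(conc))
```

Running the printed commands in order (restarting the server between crashes) confirms whether the failure is a hard threshold at 17 or load-dependent.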

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

dmesg

[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32826)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:  Process python3 pid 3247476 thread python3 pid 3247476
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:   in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:   cookie node_id 1 fault from die AID0.XCD0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:          MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:          WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu:          RW: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32797)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:  Process python3 pid 3247479 thread python3 pid 3247479
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:   in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:   cookie node_id 2 fault from die AID0.XCD1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:          MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:          WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu:          RW: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32800)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:  Process python3 pid 3247474 thread python3 pid 3247474
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:   in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:   cookie node_id 2 fault from die AID0.XCD1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:          MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:          WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu:          RW: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32844)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:  Process python3 pid 3247480 thread python3 pid 3247480
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:   in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:   cookie node_id 2 fault from die AID0.XCD1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:          MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:          WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:          MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu:          RW: 0x0
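For reference, the `VM_L2_PROTECTION_FAULT_STATUS` value in the dmesg output can be decoded by hand. The bit layout below is an assumption inferred from the fields the amdgpu driver prints (MORE_FAULTS, WALKER_ERROR, PERMISSION_FAULTS, MAPPING_ERROR, client ID, RW); it reproduces the driver's own decode of `0x00301031` above, but the kernel sources for the running ROCm/kernel version remain the authority:

```python
def decode_fault_status(status: int) -> dict:
    """Decode VM_L2_PROTECTION_FAULT_STATUS fields.

    Bit positions are an assumption matching the fields the amdgpu
    driver prints for this fault; verify against the kernel sources
    for your driver version before relying on them.
    """
    return {
        "MORE_FAULTS": status & 0x1,               # bit 0
        "WALKER_ERROR": (status >> 1) & 0x7,       # bits 1-3
        "PERMISSION_FAULTS": (status >> 4) & 0xF,  # bits 4-7
        "MAPPING_ERROR": (status >> 8) & 0x1,      # bit 8
        "CID": (status >> 9) & 0x1F,               # bits 9-13 (0x8 = TCP)
        "RW": (status >> 18) & 0x1,                # bit 18 (0 = read)
    }

print(decode_fault_status(0x00301031))
```

For the faults logged here this yields MORE_FAULTS=1, PERMISSION_FAULTS=0x3, CID=0x8 (TCP) and RW=0: a read to address 0x0, i.e. a null-pointer dereference on the GPU, consistent with the `Memory access fault ... on address (nil)` messages in the server log below.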

sglang server crash output

[2026-02-17 23:51:55] INFO:     127.0.0.1:36718 - "GET /get_server_info HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50626 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50634 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50636 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50646 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50662 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50664 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50678 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50692 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50702 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50716 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50732 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50740 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50746 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50748 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50762 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50768 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO:     127.0.0.1:50778 - "POST /v1/completions HTTP/1.1" 200 OK
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP4] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP4] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP0] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP0] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP1] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP1] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP3] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP3] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP5] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP5] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP7] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP7] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP2] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP2] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP6] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP6] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
  warnings.warn(
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:11 TP5] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP5] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:11 TP0] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:11 TP0] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP2] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:12 TP2] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP3] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:12 TP3] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP1] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:12 TP1] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP7] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP6] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:12 TP7] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:12 TP6] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
Memory access fault by GPU node-7 (Agent handle: 0x20916760) on address (nil). Reason: Unknown.
Memory access fault by GPU node-4 (Agent handle: 0x1992e500) on address (nil). Reason: Unknown.
Memory access fault by GPU node-2 (Agent handle: 0x27c21790) on address (nil). Reason: Unknown.
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP4] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
[2026-02-17 23:52:12 TP4] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False) 
Memory access fault by GPU node-9 (Agent handle: 0x21c27f20) on address (nil). Reason: Unknown.
Memory access fault by GPU node-3 (Agent handle: 0x410a8c90) on address (nil). Reason: Unknown.
Memory access fault by GPU node-5 (Agent handle: 0x16c8a2b0) on address (nil). Reason: Unknown.
Memory access fault by GPU node-6 (Agent handle: 0x4b79ff10) on address (nil). Reason: Unknown.
Memory access fault by GPU node-8 (Agent handle: 0x169cc520) on address (nil). Reason: Unknown.
GPU coredump: Directory "/coredumps not writable or does not exist
GPU core dump failed
Fatal Python error: Aborted

Thread 0x000070b8fd5ff640 (most recent call first):
  File "/app/sglang-repo/python/sglang/srt/utils/watchdog.py", line 145 in _watchdog_once
  File "/app/sglang-repo/python/sglang/srt/utils/watchdog.py", line 125 in _watchdog_thread
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1012 in run
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap

(The remaining faulthandler output is interleaved across the eight TP ranks aborting simultaneously; the de-interleaved main-thread traceback of a crashing rank follows. Watchdog, tqdm monitor, and threading bootstrap frames from the other ranks are omitted.)

Thread 0x0000714544532740 (most recent call first):
  File "/app/aiter-repo/aiter/fused_moe.py", line 117 in fused_moe
  File "/app/sglang-repo/python/sglang/srt/layers/quantization/unquant.py", line 407 in forward_cuda
  File "/app/sglang-repo/python/sglang/srt/layers/utils/multi_platform.py", line 83 in forward_hip
  File "/app/sglang-repo/python/sglang/srt/layers/utils/multi_platform.py", line 71 in forward
  File "/app/sglang-repo/python/sglang/srt/layers/quantization/unquant.py", line 342 in apply
  File "/app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 1017 in run_moe_core
  File "/app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 996 in forward_impl
  File "/app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 977 in forward
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 680 in forward_normal
  File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 582 in forward
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 2421 in forward
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 2730 in forward
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
  File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
  File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 2919 in forward
  File "/app/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124 in decorate_context
  File "/app/sglang-repo/python/sglang/srt/model_executor/model_runner.py", line 2327 in forward_extend
  File "/app/sglang-repo/python/sglang/srt/model_executor/model_runner.py", line 2489 in _forward_raw
  File "/app/sglang-repo/python/sglang/srt/model_executor/model_runner.py", line 2390 in forward
  File "/app/sglang-repo/python/sglang/srt/managers/tp_worker.py", line 456 in forward_batch_generation
  File "/app/sglang-repo/python/sglang/srt/managers/scheduler.py", line 2341 in run_batch
  File "/app/sglang-repo/python/sglang/srt/managers/scheduler.py", line 1153 in event_loop_overlap
  File "/app/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124 in decorate_context
  File "/app/sglang-repo/python/sglang/srt/managers/scheduler.py", line 3160 in run_scheduler_process
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 108 in run
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py", line 314 in _bootstrap
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/spawn.py", line 135 in _main
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/spawn.py", line 122 in spawn_main
  File "<string>", line 1 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pybase64._pybase64, charset_normalizer.md, …
, line   File "requests.packages.charset_normalizer.md"60 in 1075", line , /app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.pyrun in /app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe_triton_kernels.py60requests.packages.chardet.md"
, line _bootstrap_inner" in   File 60
run" in   File 
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.pyrun"
, line   File 1075" in ",   File /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_bootstrap_inner/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"multidict._multidict""
, line   File , line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py, 1075"1032 in "yarl._quoting_c in /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_bootstrap, line , propcache._helpers_c_bootstrap_inner"
1075, aiohttp._http_writer
, line 
 in ,   File 1032Thread 0x_bootstrap_inneraiohttp._http_parser" in 0000768e028a6740
, /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_bootstrap" (most recent call first):
  File aiohttp._websocket.mask
, line   File ", 
1032"/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.pyaiohttp._websocket.reader_cThread 0x in _bootstrap""00007ef782fff640, 
, line  (most recent call first):
frozenlist._frozenlist
1032  File Thread 0x,  in "0000791b26ef2740torch._C_bootstrap/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py (most recent call first):
, 
"  File torch._C._dynamo.autograd_compiler
, line ", Thread 0x359/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.pytorch._C._dynamo.eval_frame000077fc634b3740 in ",  (most recent call first):
wait, line torch._C._dynamo.guards  File 
1787, "  File  in torch._C._dynamo.utils/app/aiter-repo/aiter/rotary_embedding.py"_call_impl, torch._C._fft"/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
, , line "  File torch._C._linalg180, line ", torch._C._nested in 655/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py, forward_native in "torch._C._nn
wait, line ,   File 
328  File  in "_wrapped_call_impl/app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py
"torch._C._sparse", line   File , /app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py60torch._C._special" in , line run0
 in   File _call_impl"
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py  File ", line 1075 in _bootstrap_inner
  File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap
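The dump above is hard to read because all tensor-parallel workers wrote their faulthandler output to the same stderr at once. A minimal sketch for getting a clean per-process dump on the next crash (the log path is a hypothetical choice, not part of SGLang; it uses only the stdlib `faulthandler` module):

```python
import faulthandler
import os
import tempfile

# Route each process's faulthandler dump into its own per-PID file so the
# stacks from the TP worker processes do not interleave on shared stderr.
log_path = os.path.join(tempfile.gettempdir(), f"fault_{os.getpid()}.log")
log_file = open(log_path, "w")
faulthandler.enable(file=log_file, all_threads=True)

# A fatal signal (e.g. the SIGABRT after the HIP page fault) triggers the
# dump automatically; here we force one just to confirm the file is written.
faulthandler.dump_traceback(file=log_file)
log_file.flush()
```

Placing this early in the worker entry point (before model load) would make it possible to tell which rank faults first.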

pip packages

amd-aiter                         0.1.10.post4.dev17+g7411c9975       /app/aiter-repo
conch-triton-kernels              1.2.1
flash_attn                        2.8.3
sgl-kernel                        0.3.21
sglang                            0.5.6.post3.dev2061+g10569d04b      /app/sglang-repo/python
sglang-router                     0.3.2
torch                             2.10.0a0+git449b176
torchao                           0.9.0
torchaudio                        2.10.0+27b7ebd
torchvision                       0.25.0+8ac84ee
transformers                      5.2.0.dev0
triton                            3.5.0+gitc3c476f3
triton_kernels                    1.0.0
