Problem Description
The sglang server passes a benchmark at a client-side concurrency of 16, but crashes when concurrency is raised to 17 or higher; the crash appears to happen during fused_moe. Is there a working container I should use?
Operating System
Ubuntu 22.04.5 LTS (Jammy Jellyfish)
CPU
AMD EPYC 9575F 64-Core Processor
GPU
8x AMD Instinct MI355X
ROCm Version
ROCm 7.2
ROCm Component
No response
Steps to Reproduce
SGLANG_ROCM_FUSED_DECODE_MLA=0 python3 -m sglang.launch_server --model zai-org/GLM-5 --tp-size 8 --attention-backend triton --disable-radix-cache --watchdog-timeout 1200
python3 -m sglang.bench_serving --dataset-name random --random-range-ratio 1 --max-concurrency 17 --num-prompt 32 --random-input 1000 --random-output 60 --model zai-org/GLM-5 --warmup-requests 0 --backend sglang-oai
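To bisect the exact concurrency threshold without bench_serving, the failure can also be triggered from a small client script. This is a minimal sketch, not part of the original report: it assumes the server from the launch command above listens on sglang's default port 30000 and serves the OpenAI-compatible /v1/completions route seen in the crash log; adjust BASE_URL, prompt length, and request counts as needed.

# Hypothetical repro helper: ramp client-side concurrency and watch for the crash.
# Assumes the server is at http://127.0.0.1:30000 (sglang's default port).
import concurrent.futures

import requests

BASE_URL = "http://127.0.0.1:30000"
MODEL = "zai-org/GLM-5"
PROMPT = "word " * 1000  # roughly mirrors --random-input 1000

def one_request() -> int:
    # Single OpenAI-compatible completion call, ~60 output tokens like --random-output 60.
    resp = requests.post(
        f"{BASE_URL}/v1/completions",
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 60},
        timeout=600,
    )
    return resp.status_code

def run_at_concurrency(n: int, num_prompts: int = 32) -> None:
    # Issue num_prompts requests with at most n in flight at once.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        codes = list(pool.map(lambda _: one_request(), range(num_prompts)))
    print(f"concurrency={n}: status codes {sorted(set(codes))}")

if __name__ == "__main__":
    for n in (16, 17, 18):  # 16 passes, 17+ crashes per this report
        run_at_concurrency(n)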
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
dmesg
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32826)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: Process python3 pid 3247476 thread python3 pid 3247476
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: cookie node_id 1 fault from die AID0.XCD0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:65:00.0: amdgpu: RW: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32797)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: Process python3 pid 3247479 thread python3 pid 3247479
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: cookie node_id 2 fault from die AID0.XCD1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:85:00.0: amdgpu: RW: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32800)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: Process python3 pid 3247474 thread python3 pid 3247474
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: cookie node_id 2 fault from die AID0.XCD1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:75:00.0: amdgpu: RW: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:3 pasid:32844)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: Process python3 pid 3247480 thread python3 pid 3247480
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: in page starting at address 0x0000000000000000 from IH client 0x1b (UTCL2)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: cookie node_id 2 fault from die AID0.XCD1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: MORE_FAULTS: 0x1
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: WALKER_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: MAPPING_ERROR: 0x0
[Thu Feb 12 20:35:53 2026] amdgpu 0000:e5:00.0: amdgpu: RW: 0x0
sglang server crash output
[2026-02-17 23:51:55] INFO: 127.0.0.1:36718 - "GET /get_server_info HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50626 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50634 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50636 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50646 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50662 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50664 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50678 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50692 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50702 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50716 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50732 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50740 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50746 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50748 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50762 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50768 - "POST /v1/completions HTTP/1.1" 200 OK
[2026-02-17 23:52:11] INFO: 127.0.0.1:50778 - "POST /v1/completions HTTP/1.1" 200 OK
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP4] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP4] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP0] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP0] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP1] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP1] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP3] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP3] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP5] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP5] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP7] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP7] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP2] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP2] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[2026-02-17 23:52:11 TP6] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = False, estimated_m_per_expert = 573
[aiter] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP6] [fused_moe] using 2stage default for (256, 16384, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
/app/py_3.12/lib/python3.12/site-packages/triton/backends/amd/compiler.py:79: UserWarning: kpack is deprecated starting from gfx950 and will be removed in later releases. So for now kpack = 2 will be overwritten to 1 to make transitioning easier.
warnings.warn(
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:11 TP5] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP5] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:11 TP0] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:11 TP0] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP2] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:12 TP2] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP3] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:12 TP3] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP1] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:12 TP1] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP7] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP6] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:12 TP7] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:12 TP6] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
Memory access fault by GPU node-7 (Agent handle: 0x20916760) on address (nil). Reason: Unknown.
Memory access fault by GPU node-4 (Agent handle: 0x1992e500) on address (nil). Reason: Unknown.
Memory access fault by GPU node-2 (Agent handle: 0x27c21790) on address (nil). Reason: Unknown.
[aiter] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[2026-02-17 23:52:12 TP4] run_1stage = False, ksplit = 0 q_type = QuantType.No block_m = 128 use_nt = True, estimated_m_per_expert = 35
[aiter] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
[2026-02-17 23:52:12 TP4] [fused_moe] using 2stage default for (256, 1024, 6144, 256, 257, 9, 'ActivationType.Silu', 'torch.bfloat16', 'torch.bfloat16', 'torch.bfloat16', 'QuantType.No', True, False)
Memory access fault by GPU node-9 (Agent handle: 0x21c27f20) on address (nil). Reason: Unknown.
Memory access fault by GPU node-3 (Agent handle: 0x410a8c90) on address (nil). Reason: Unknown.
Memory access fault by GPU node-5 (Agent handle: 0x16c8a2b0) on address (nil). Reason: Unknown.
Memory access fault by GPU node-6 (Agent handle: 0x4b79ff10) on address (nil). Reason: Unknown.
Memory access fault by GPU node-8 (Agent handle: 0x169cc520) on address (nil). Reason: Unknown.
GPU coredump: Directory "/coredumps not writable or does not exist
GPU core dump failed
Fatal Python error: Aborted
Thread 0x000070b8fd5ff640 (most recent call first):
File "/app/sglang-repo/python/sglang/srt/utils/watchdog.py", line 145 in _watchdog_once
File "/app/sglang-repo/python/sglang/srt/utils/watchdog.py", line GPU coredump: Directory "/coredumps not writable or does not exist
125 in _watchdog_threadGPU core dump failed
File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1012 in runFatal Python error:
Aborted File
"/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"Thread 0x, line 1075000079ed76fff640 in (most recent call first):
_bootstrap_inner
File File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py""/app/sglang-repo/python/sglang/srt/utils/watchdog.py, line "1032 in _bootstrap, line
145
in Thread 0x_watchdog_once000070d0559ff640
(most recent call first):
File "/app/sglang-repo/python/sglang/srt/utils/watchdog.py File "", line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py125" in , line _watchdog_thread359
in File wait
File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py""/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py, line "1012 in , line 655run in
wait File
" File /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py""/app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py, line "1075 in , line 60_bootstrap_inner in
run File
" File /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py""/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py, line "1032, line in 1075_bootstrap in
_bootstrap_inner
Thread 0x File 00007a04e73ff640" (most recent call first):
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py" File , line 1032 in "_bootstrap/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
"
Thread 0x, line 0000714544532740359 (most recent call first):
in File wait"
/app/aiter-repo/aiter/fused_moe.py File "", line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py117" in , line fused_moe655
in File wait
" File /app/sglang-repo/python/sglang/srt/layers/quantization/unquant.py""/app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py, line "407 in , line forward_cuda60
in run File
File ""/app/sglang-repo/python/sglang/srt/layers/utils/multi_platform.py/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"", line , line 107583 in in _bootstrap_innerforward_hip
File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py File "", line /app/sglang-repo/python/sglang/srt/layers/utils/multi_platform.py1032" in , line _bootstrap71
in
Thread 0xforward00007a79d6741740
(most recent call first):
File "/app/sglang-repo/python/sglang/srt/layers/quantization/unquant.py" File , line 342 in "apply/app/sglang-repo/python/sglang/srt/layers/quantization/unquant.py
" File , line "152/app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/layer.py in "apply, line
1017 File in "run_moe_core/app/sglang-repo/python/sglang/srt/layers/linear.py
" File , line "1429/app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/layer.py in "forward, line
996 in File forward_impl
" File /app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py""/app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line , line 1787977 in in _call_implforward
File File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 680 in forward_normal
File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 582 in forward
File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 2421 in forward
File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 2730 in forward
File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787 in _call_impl
File "/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776 in _wrapped_call_impl
File "/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py", line 2919 in forward
File "/app/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.pyGPU coredump: Directory "/coredumps not writable or does not exist
", line GPU core dump failed
124 in decorate_context
File "/app/sglang-repo/python/sglang/srt/model_executor/model_runner.py", line 2327 in forward_extend
File "/app/sglang-repo/python/sglang/srt/model_executor/model_runner.py"Fatal Python error: , line Aborted2489
in Thread 0x_forward_raw000077ec879ff640
(most recent call first):
File File ""/app/sglang-repo/python/sglang/srt/model_executor/model_runner.py/app/sglang-repo/python/sglang/srt/utils/watchdog.py"", line , line 2390145 in in forward_watchdog_once
File File ""/app/sglang-repo/python/sglang/srt/managers/tp_worker.py/app/sglang-repo/python/sglang/srt/utils/watchdog.py"", line GPU coredump: Directory "/coredumps not writable or does not exist
, line 456125GPU core dump failed
in in forward_batch_generationFatal Python error: _watchdog_thread
Aborted
File
File ""Thread 0x/app/sglang-repo/python/sglang/srt/managers/scheduler.py/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py0000780c9bfff640"" (most recent call first):
, line , line File 23411012"GPU coredump: Directory "/coredumps not writable or does not exist
in in /app/sglang-repo/python/sglang/srt/utils/watchdog.pyGPU core dump failed
run_batch
run"Fatal Python error: File
GPU coredump: Directory "/coredumps not writable or does not exist
, line AbortedGPU coredump: Directory "/coredumps not writable or does not exist
" File "GPU core dump failed
145
GPU core dump failed
/app/sglang-repo/python/sglang/srt/managers/scheduler.py/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py in Thread 0x"Fatal Python error: Fatal Python error: "_watchdog_once, line 0000777060dff640, line AbortedAborted
1075 (most recent call first):
GPU coredump: Directory "/coredumps not writable or does not exist
1153 in
File Thread 0x"/app/sglang-repo/python/sglang/srt/utils/watchdog.py000076019e3ff640" (most recent call first):
in File GPU core dump failed
event_loop_overlapThread 0x, line File _bootstrap_inner"
00007ee01edff640Fatal Python error: 125"
/app/sglang-repo/python/sglang/srt/utils/watchdog.py File (most recent call first):
Aborted
File in /app/sglang-repo/python/sglang/srt/utils/watchdog.py File ""Thread 0x"_watchdog_thread"", line /app/py_3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py0000788ec1fff640/app/sglang-repo/python/sglang/srt/utils/watchdog.py
, line 145/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py145" (most recent call first):
" File in "" in , line File , line _watchdog_once/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py, line _watchdog_once124"145
"1032
in /app/sglang-repo/python/sglang/srt/utils/watchdog.py in File , line 1012 in in File decorate_context"_watchdog_once, line "run_bootstrap"
145/app/sglang-repo/python/sglang/srt/utils/watchdog.py
/app/sglang-repo/python/sglang/srt/utils/watchdog.py File File in " File
"""_watchdog_once, line "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"Thread 0x, line /app/sglang-repo/python/sglang/srt/managers/scheduler.py/app/sglang-repo/python/sglang/srt/utils/watchdog.py
125, line 00007803e3fff640125"" File in 1075 (most recent call first):
in , line 3160, line "_watchdog_thread in File _watchdog_thread in 125/app/sglang-repo/python/sglang/srt/utils/watchdog.py
_bootstrap_inner"
run_scheduler_process
in " File
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py File File _watchdog_thread, line " File """
125/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py, line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py File in ""359"""_watchdog_thread, line , line 1032 in in , line , line 108/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
1012_bootstrap
wait1012 in run
" File in
Thread 0x
in File , line "run000078240c9ff640 File run"1012/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
(most recent call first):
"
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/process.py" in " File File /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py File , line run, line ""/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py""314
1012/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py in File in ", line 359 in 655"_bootstrap"run, line wait in , line
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
1075
wait1075 File " File in File
in ", line "_bootstrap_inner"/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py File _bootstrap_inner/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/spawn.py1075/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
""
" in "_bootstrap_inner File , line /app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py File , line , line
"655""135 in 1075 File /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py in wait, line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_main in "_bootstrap_inner"
60"
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
, line File in , line File " File , line "1032"run1032"1032/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py" in /app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py
in /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/spawn.py in , line _bootstrap" File _bootstrap"_bootstrap1032
, line "
, line in
60/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
122_bootstrapThread 0x in 00007618fabff640"Thread 0xThread 0x in
run (most recent call first):
, line 00007787a7fff64000007ef693fff640spawn_main
File 1075 (most recent call first):
(most recent call first):
Thread 0x File File " in File File 000078a62a3ff640""/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_bootstrap_inner"" (most recent call first):
<string>/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py File """, line 359 File "", line , line , line in ", line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py359 in 1 in 1075wait/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py359"wait<module>
in
" in , line
_bootstrap_inner File , line wait359 File
"1032
in " File /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py" in File wait/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 655_bootstrap"
"/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py in wait
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py File , line
Extension modules: "
""655numpy._core._multiarray_umath, line File Thread 0x00007878ed350740 (most recent call first):
, line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py in 1032"/app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py, File 655"wait in "numpy.linalg._umath_linalg" in , line
_bootstrap, line 60 in /app/py_3.12/lib/python3.12/site-packages/torch/_ops.pywait655 File
, run"
in "
pybase64._pybase64
File wait/app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.pyThread 0x File ", charset_normalizer.md"
"00007899003a4740/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"/app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py, File , line (most recent call first):
, line File "requests.packages.charset_normalizer.md"60 in 1075", line , /app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.pyrun in /app/sglang-repo/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe_triton_kernels.py60requests.packages.chardet.md"
, line _bootstrap_inner" in File 60
run" in File
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.pyrun"
, line File 1075" in ", File /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_bootstrap_inner/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py"multidict._multidict""
, line File , line /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py, 1075"1032 in "yarl._quoting_c in /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_bootstrap, line , propcache._helpers_c_bootstrap_inner"
1075, aiohttp._http_writer
, line
in , File 1032Thread 0x_bootstrap_inneraiohttp._http_parser" in 0000768e028a6740
, /root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py_bootstrap" (most recent call first):
File aiohttp._websocket.mask
, line File ",
1032"/app/sglang-repo/python/sglang/srt/models/deepseek_v2.py/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.pyaiohttp._websocket.reader_cThread 0x in _bootstrap""00007ef782fff640,
, line (most recent call first):
frozenlist._frozenlist
1032 File Thread 0x, in "0000791b26ef2740torch._C_bootstrap/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py (most recent call first):
,
" File torch._C._dynamo.autograd_compiler
, line ", Thread 0x359/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.pytorch._C._dynamo.eval_frame000077fc634b3740 in ", (most recent call first):
wait, line torch._C._dynamo.guards File
1787, " File in torch._C._dynamo.utils/app/aiter-repo/aiter/rotary_embedding.py"_call_impl, torch._C._fft"/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py
, , line " File torch._C._linalg180, line ", torch._C._nested in 655/app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py, forward_native in "torch._C._nn
wait, line , File
328 File in "_wrapped_call_impl/app/py_3.12/lib/python3.12/site-packages/tqdm/_monitor.py
"torch._C._sparse", line File , /app/py_3.12/lib/python3.12/site-packages/torch/nn/modules/module.py60torch._C._special" in , line run0
in File _call_impl"
/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py File ", line 1075 in _bootstrap_inner
File "/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap
pip packages
amd-aiter 0.1.10.post4.dev17+g7411c9975 /app/aiter-repo
conch-triton-kernels 1.2.1
flash_attn 2.8.3
sgl-kernel 0.3.21
sglang 0.5.6.post3.dev2061+g10569d04b /app/sglang-repo/python
sglang-router 0.3.2
torch 2.10.0a0+git449b176
torchao 0.9.0
torchaudio 2.10.0+27b7ebd
torchvision 0.25.0+8ac84ee
transformers 5.2.0.dev0
triton 3.5.0+gitc3c476f3
triton_kernels 1.0.0