[Bug] [ROCm] Running DeepSeek V3 on MI300X, getting "Config not found, Performance might be sub-optimal" error #3219

nikhil-tensorwave · 2025-01-30T19:34:09Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

I am running DeepSeek V3 on a node with 8x MI300X GPUs on ROCm 6.3.1. It runs in Docker using an image built from Dockerfile.rocm, but I have noticed this warning:

Using default W8A8 Block FP8 kernel config. Performance might be sub-optimal! Config file not found at <multiple config files>

In the container built from Dockerfile.rocm, with SGLang v0.4.2, these are the missing config files:

/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=24576,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=7168,K=16384,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=32768,K=512,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json

I also tried the Docker image lmsysorg/sglang:v0.4.1.post4-rocm620, following a blog post from AMD. That image has SGLang v0.4.1 and is missing the following config files:

/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=1536,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=3072,K=1536,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=576,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=7168,K=2048,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=4608,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=7168,K=2304,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=512,K=7168,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/N=7168,K=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/configs/E=256,N=256,device_name=AMD_Instinct_MI300X,dtype=fp8_w8a8,block_shape=[128, 128].json
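To compare the two lists, it can help to pull the shapes out of the file names. The following is a minimal sketch that assumes the comma-separated `key=value` naming visible in the paths above; `parse_config_name` and `missing_configs` are hypothetical helpers written for this issue, not part of SGLang.

```python
import re
from pathlib import Path

# Matches key=value pairs; a value is either a bracketed list such as
# "[128, 128]" or any run of characters up to the next comma.
FIELD = re.compile(r"(\w+)=(\[[^\]]*\]|[^,]+)")


def parse_config_name(path):
    """Split a tuning-config file name into a dict of its key=value fields.

    Hypothetical helper: assumes the naming pattern seen in the warning.
    All values are returned as strings.
    """
    name = Path(path).name
    if name.endswith(".json"):
        name = name[: -len(".json")]
    return dict(FIELD.findall(name))


def missing_configs(paths):
    """Return the subset of paths that do not exist as files on disk."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    example = (
        "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/configs/"
        "N=24576,K=1536,device_name=AMD_Instinct_MI300X,"
        "dtype=fp8_w8a8,block_shape=[128, 128].json"
    )
    fields = parse_config_name(example)
    print(fields["N"], fields["K"], fields["block_shape"])
```

Running `missing_configs` inside the container on the paths from the warning shows which files are actually absent from that image.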

Reproduction

Reproduction steps:
I launched the container with the command:

docker run -it --network=host \
    --group-add=video \
    --ipc=host \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    --device /dev/kfd \
    --device /dev/dri \
    --shm-size 16G \
    -p 8080:8080 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    v0.4.2-rocm620:latest

Then I started the server with the command:

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --port 8080

Environment

ROCm 6.3.1
8xMI300X

Docker images:

v0.4.2-rocm620:latest built from docker/Dockerfile.rocm and build instructions from SGLang docs
lmsysorg/sglang:v0.4.1.post4-rocm620
