Problem Description
[Issue]: Deploying GLM-4.6 on MI300X fails with the error "aiter : wrapper_mha_batch_prefill() Expected a value of type 'int' for argument 'max_seqlen_k' but instead found type 'NoneType'"
Model: GLM-4.6 FP8
https://www.modelscope.cn/models/ZhipuAI/GLM-4.6-FP8
Docker image: rocm/sglang-0.5.2
Reference link: https://deepwiki.com/zai-org/GLM-4.5/5.3-sglang-deployment
Service start command:
python3 -m sglang.launch_server \
  --model-path /data/models/GLM-4.6 \
  --tp-size 8 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.8 \
  --disable-shared-experts-fusion \
  --served-model-name glm-4.6 \
  --host 0.0.0.0 \
  --port 8011
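For anyone trying to reproduce this, the launch command can be assembled programmatically, which makes it easier to test variants. The following is a minimal sketch, not part of the original setup: the names LAUNCH_FLAGS and build_launch_argv are hypothetical helpers, and the flag values are copied from the command above. The second variant drops the --speculative-* flags as a way to check whether the EAGLE path is involved; this is an untested assumption, not a confirmed cause or workaround.

```python
import shlex

# Flags copied from the launch command in this report.
# A value of None marks a boolean switch that takes no argument.
LAUNCH_FLAGS = {
    "--model-path": "/data/models/GLM-4.6",
    "--tp-size": "8",
    "--speculative-algorithm": "EAGLE",
    "--speculative-num-steps": "3",
    "--speculative-eagle-topk": "1",
    "--speculative-num-draft-tokens": "4",
    "--mem-fraction-static": "0.8",
    "--disable-shared-experts-fusion": None,
    "--served-model-name": "glm-4.6",
    "--host": "0.0.0.0",
    "--port": "8011",
}


def build_launch_argv(flags, drop_prefix=None):
    """Return the argv list for the server launch.

    drop_prefix (hypothetical helper option): skip every flag whose name
    starts with this prefix, e.g. "--speculative-" to try a run without
    speculative decoding.
    """
    argv = ["python3", "-m", "sglang.launch_server"]
    for name, value in flags.items():
        if drop_prefix and name.startswith(drop_prefix):
            continue
        argv.append(name)
        if value is not None:
            argv.append(value)
    return argv


if __name__ == "__main__":
    # Command as reported in this issue.
    print(shlex.join(build_launch_argv(LAUNCH_FLAGS)))
    # Variant without speculative decoding, for comparison only.
    print(shlex.join(build_launch_argv(LAUNCH_FLAGS, drop_prefix="--speculative-")))
```

The printed lines can be pasted into a shell inside the container, or the argv list can be passed to subprocess.run directly.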
Operating System
Ubuntu 22.04
CPU
AMD EPYC Genoa 9654
GPU
8 x AMD MI300X
ROCm Version
ROCm 7.0
ROCm Component
No response
Steps to Reproduce
Launch the server with the configuration below. Deployment fails with the error "aiter : wrapper_mha_batch_prefill() Expected a value of type 'int' for argument 'max_seqlen_k' but instead found type 'NoneType'"
Model: GLM-4.6 FP8
https://www.modelscope.cn/models/ZhipuAI/GLM-4.6-FP8
Docker image: rocm/sglang-0.5.2
Reference link: https://deepwiki.com/zai-org/GLM-4.5/5.3-sglang-deployment
Service start command:
python3 -m sglang.launch_server \
  --model-path /data/models/GLM-4.6 \
  --tp-size 8 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.8 \
  --disable-shared-experts-fusion \
  --served-model-name glm-4.6 \
  --host 0.0.0.0 \
  --port 8011
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response