Weight output_partition_size = 576 is not divisible by weight quantization block_n = 128 #4306

wp778 · 2025-03-11T08:51:05Z


I am using four A800 servers, each with 8x40G GPUs, to deploy meituan/DeepSeek-R1-Block-INT8.

The startup commands I used are:

python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 0 --trust-remote --host 0.0.0.0 --port 30000 --enable-torch-compile --torch-compile-max-bs 8

python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 1 --trust-remote --enable-torch-compile --torch-compile-max-bs 8

python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 2 --trust-remote --enable-torch-compile --torch-compile-max-bs 8

python3 -m sglang.launch_server --model /mnt/oceanfs/DeepSeek-R1-Block-INT8 --tp 32 --dist-init-addr 10.0.0.3:5000 --nnodes 4 --node-rank 3 --trust-remote --enable-torch-compile --torch-compile-max-bs 8

However, during startup, I encountered the following error:
"Weight output_partition_size = 576 is not divisible by weight quantization block_n = 128"

Why is this happening, and how can I resolve it?

wp778 · 2025-03-11T08:52:48Z


However, Meituan requires this section to be present in the config.json file and it cannot be removed.


wp778 Mar 11, 2025
Author

[2025-03-11 08:45:55 TP1] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2255, in run_scheduler_process
    scheduler = Scheduler(server_args, port_args, gpu_id, tp_rank, dp_rank)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 216, in __init__
    self.tp_worker = TpWorkerClass(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker_overlap_thread.py", line 63, in __init__
    self.worker = TpModelWorker(server_args, gpu_id, tp_rank, dp_rank, nccl_port)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 74, in __init__
    self.model_runner = ModelRunner(
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 169, in __init__
    self.load_model()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 351, in load_model
    self.model = get_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 357, in load_model
    model = _initialize_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 138, in _initialize_model
    return model_class(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 1057, in __init__
    self.model = DeepseekV2Model(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 1016, in __init__
    [
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 1017, in <listcomp>
    DeepseekV2DecoderLayer(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 947, in __init__
    self.mlp = DeepseekV2MLP(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 85, in __init__
    self.gate_up_proj = MergedColumnParallelLinear(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/linear.py", line 507, in __init__
    super().__init__(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/linear.py", line 360, in __init__
    self.quant_method.create_weights(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/blockwise_int8.py", line 162, in create_weights
    raise ValueError(
ValueError: Weight output_partition_size = 576 is not divisible by weight quantization block_n = 128.
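
For what it's worth, the 576 in the message is consistent with the dense MLP's output dimension being split evenly across the 32 tensor-parallel ranks. The sketch below reproduces that arithmetic. It assumes intermediate_size = 18432, which I believe is the value in DeepSeek-R1's config.json (please check your own copy), and `valid_tp_sizes` is a hypothetical helper, not part of sglang. It only checks this one layer and says nothing about whether the model fits in GPU memory at a given --tp.

```python
# Assumption: intermediate_size as I believe it appears in DeepSeek-R1's config.json.
INTERMEDIATE_SIZE = 18432
# block_n and the TP size are taken from the error message and the launch command.
BLOCK_N = 128
TP_SIZE = 32


def valid_tp_sizes(output_size, block_n, candidates):
    """Hypothetical helper: TP sizes whose per-rank shard is a whole number of blocks."""
    ok = []
    for tp in candidates:
        if output_size % tp != 0:
            continue  # the dimension cannot be split evenly across ranks
        shard = output_size // tp
        if shard % block_n == 0:
            ok.append(tp)
    return ok


shard = INTERMEDIATE_SIZE // TP_SIZE   # per-rank output_partition_size
remainder = shard % BLOCK_N            # non-zero means the check fails

print(f"shard = {shard}, remainder = {remainder}")
print("TP sizes passing this check:", valid_tp_sizes(INTERMEDIATE_SIZE, BLOCK_N, [8, 16, 32]))
```

Under that assumption the per-rank shard is 576, which leaves a remainder of 64 when divided by 128, so the check in blockwise_int8.py fails at --tp 32, while 8 and 16 would pass for this particular layer.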
