How to make 671B R1 inference in batch? #4411

Andcircle · 2025-03-14T05:25:51Z

Andcircle
Mar 14, 2025

current set up is:
2 nodes of 8 x H100

Launch
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-R1 --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 0 --trust-remote-code

Then use python openai lib to call server get inference result one by one
Tried to call the server simultaneously with 'response = client.chat.completions.create', but the inference speed got much slower.

How can I do batch inference efficiently?
Thanks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to make 671B R1 inference in batch? #4411

{{title}}

Replies: 0 comments

Select a reply

How to make 671B R1 inference in batch? #4411

Andcircle Mar 14, 2025

Replies: 0 comments

Andcircle
Mar 14, 2025