Skip to content

Conversation

ming1753
Copy link
Collaborator

@ming1753 ming1753 commented Oct 14, 2025

export FD_ENABLE_MAX_PREFILL=1
python -m fastdeploy.entrypoints.openai.api_server \
        --model /path/to/checkpoint \
        --port 8185 \
        --metrics-port 8189 \
        --engine-worker-queue-port 8198 \
        --cache-queue-port 55660 \
        --max-model-len 16384 \
        --max-num-batched-tokens 16384 \
        --gpu-memory-utilization 0.7 \
        --max-num-seqs 256 \
        --workers 2 \
        --graph-optimization-config '{"graph_opt_level":0, "use_cudagraph":true}' \

Copy link

paddle-bot bot commented Oct 14, 2025

Thanks for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants