Description
Checklist
- 1. I have searched related issues but cannot get the expected help.
- 2. The bug has not been fixed in the latest version.
- 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
- 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
- 5. Please use English, otherwise it will be closed.
Describe the bug
Much of the time it is fine, but occasionally the stream terminates abruptly with:
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
when using the OpenAI API endpoint. E.g. I see about 250 of those failures over the course of 12 hours (and many more underlying requests actually fail, since we use 3 retries with exponential backoff). Interestingly, these events occur in clusters, suggesting the entire sglang server hangs while serving the 8 simultaneous requests.
Perhaps even worse, sometimes a response gets totally stuck and hangs for an hour.
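For context, the retry logic mentioned above looks roughly like the following. This is a minimal, self-contained sketch: the function name and defaults are hypothetical, and in production we catch `httpcore.RemoteProtocolError` rather than the generic `ConnectionError` used here for the demo.

```python
import time


def call_with_backoff(fn, retryable=(ConnectionError,), max_retries=3, base_delay=1.0):
    """Call fn(), retrying on `retryable` errors with exponential backoff.

    In our setup `retryable` would include httpcore.RemoteProtocolError;
    ConnectionError is used here only to keep the sketch dependency-free.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Even with this wrapper, the clustered failures mean all 3 retries often land while the server is still hung, so the request ultimately fails anyway.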
Reproduction
image: lmsysorg/sglang:v0.4.2-rocm620
command:
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --host 0.0.0.0 --port 5000 --trust-remote-code --context-length 65536 --tp 8 --random-seed 1234 --download-dir /root/.cache/huggingface/hub/
There's no easy repro. The pattern of usage is a ~14k-token system prompt plus a query, followed by a good number of chat turns. In some cases the large context is also filled for RAG, etc.
I have shared logs: these are the entire logs, from start to finish, of a run during which these issues occur.
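To make the usage pattern concrete, here is a hedged sketch of the request shape (the helper name, prompt contents, and `base_url` are hypothetical; the streaming call assumes the `openai` client pointed at the server command above):

```python
def build_messages(system_prompt, turns, final_query):
    """Assemble the chat payload: large system prompt + prior turns + new query."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_msg, assistant_msg in turns:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": final_query})
    return messages


if __name__ == "__main__":
    # Streaming call against the server launched above (requires `openai` installed
    # and the sglang server running on port 5000):
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:5000/v1", api_key="EMPTY")
    stream = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",
        messages=build_messages("<~14k-token system prompt>", [], "hello"),
        stream=True,
    )
    for chunk in stream:
        # The RemoteProtocolError surfaces here, mid-iteration, when the
        # server closes the connection before completing the chunked body.
        pass
```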
Environment
root@ef5e23d28c0e:/sgl-workspace# python3 -m sglang.check_env
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.26.4)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
WARNING 01-28 22:04:48 rocm.py:17] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/pydantic/_internal/_config.py:341: UserWarning: Valid config keys have changed in V2:
* 'fields' has been removed
warnings.warn(message, UserWarning)
Python: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0]
ROCM available: True
GPU 0,1,2,3,4,5,6,7: AMD Instinct MI300X
GPU 0,1,2,3,4,5,6,7 Compute Capability: 9.4
ROCM_HOME: /opt/rocm
HIPCC: HIP version: 6.2.41133-dd7f95766
ROCM Driver Version: 6.7.0
PyTorch: 2.5.0+git13a0629
flashinfer: Module Not Found
triton: 3.0.0
transformers: 4.46.1
torchao: 0.8.0
numpy: 1.26.4
aiohttp: 3.10.10
fastapi: 0.115.4
hf_transfer: 0.1.9
huggingface_hub: 0.26.2
interegular: 0.3.3
modelscope: 1.22.3
orjson: 3.10.15
packaging: 24.1
psutil: 6.1.0
pydantic: 2.9.2
multipart: 0.0.20
zmq: 26.2.0
uvicorn: 0.32.0
uvloop: 0.21.0
vllm: 0.6.3.post2.dev1+g1ef171e0.d20250114
openai: 1.60.1
anthropic: 0.45.0
decord: 0.6.0
AMD Topology:
============================ ROCm System Management Interface ============================
=============================== Link Type between two GPUs ===============================
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7
GPU0 0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI
GPU1 XGMI 0 XGMI XGMI XGMI XGMI XGMI XGMI
GPU2 XGMI XGMI 0 XGMI XGMI XGMI XGMI XGMI
GPU3 XGMI XGMI XGMI 0 XGMI XGMI XGMI XGMI
GPU4 XGMI XGMI XGMI XGMI 0 XGMI XGMI XGMI
GPU5 XGMI XGMI XGMI XGMI XGMI 0 XGMI XGMI
GPU6 XGMI XGMI XGMI XGMI XGMI XGMI 0 XGMI
GPU7 XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0
================================== End of ROCm SMI Log ===================================
ulimit soft: 1048576