Why is there only one GPU being used even when I set --dp 2? #1206
-
I have `export CUDA_VISIBLE_DEVICES=0,1` and both GPUs have 23 GB of memory. However, I get the following error message after I run:

```
python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-si --port=30000 --chat-template=vicuna_v1.1 --dp-size 2 --enable-p2p-check --chunked-prefill-size=16384 --mem-fraction-static 0.7
```

```
[gpu=0] Init nccl begin.
Traceback (most recent call last):
  ...
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 MiB. GPU 0 has a total capacity of 23.65 GiB of which 17.56 MiB is free. Process 4097022 has 23.63 GiB memory in use. Of the allocated memory 23.14 GiB is allocated by PyTorch, and 91.81 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.
```

After diving into process.py, I found that when `dp_worker_id=0` and `gpu_ids=[0]`, there are 3 processes:

```
dp_size: 2
whether_daemonic: False False
self: <Process name='Process-1' pid=851486 parent=851370 started>
whether_daemonic: False False
whether_daemonic: False False
self: <Process name='Process-2' pid=851490 parent=851370 started>
self: <Process name='Process-1:1' pid=851489 parent=851486 started>
dp_worker_id: 0
gpu_ids: [0]
```

The 3rd process (`Process-1:1`) is a child of the 1st process (`Process-1`). Is that permitted? Should these processes be daemonic?
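For reference, here is a minimal standalone sketch (not sglang code; the function names are illustrative) of the `multiprocessing` rule in question: a non-daemonic process may start its own children, so the `Process-1` → `Process-1:1` pattern above is legal, whereas a daemonic process raises an `AssertionError` when it tries to spawn a child.

```python
import multiprocessing as mp

def grandchild():
    print("grandchild running")

def child():
    # Legal: a non-daemonic process may start its own children,
    # which is exactly the Process-1 -> Process-1:1 pattern above.
    p = mp.Process(target=grandchild)
    p.start()
    p.join()

if __name__ == "__main__":
    parent = mp.Process(target=child)  # daemon defaults to False
    # parent.daemon = True  # would make child() fail with
    # "AssertionError: daemonic processes are not allowed to have children"
    parent.start()
    parent.join()
```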
-
What is your precision? You should be able to run the fp16 version with 23 GB of memory; maybe you are running an fp32 one. You can add `--dtype float16` when you launch the server.
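As a rough weights-only sanity check: a 7B-parameter model takes about 7B × 2 bytes ≈ 14 GB in fp16, which fits in 23.65 GiB, but about 7B × 4 bytes ≈ 28 GB in fp32, which does not. Applying the suggestion to the command from the question would look like this (all flags other than `--dtype` are unchanged from the original):

```
python3 -m sglang.launch_server --model-path lmms-lab/llava-onevision-qwen2-7b-si --port=30000 --chat-template=vicuna_v1.1 --dp-size 2 --enable-p2p-check --chunked-prefill-size=16384 --mem-fraction-static 0.7 --dtype float16
```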