dp and tp size for 8 x A100 #1126
-
Hello, I am running Llama 3.1 on 8 x A100. I tried tp 8 + dp 8 with `--mem-fraction-static 0.4`, but it gives `CUDA error: invalid device ordinal`. What is the correct setup if I want both tp and dp?
Replies: 6 comments 2 replies
-
Please run `python3 -m sglang.check_env` and share the output.
-
```
Python: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
...
Legend: X = Self
```
-
Because you're using PCIe. Try to add
-
Does dp size x tp size have to equal the total number of GPUs? So if I have 8 GPUs, can I only do tp 4 x dp 2?
-
num_gpu = dp x tp |
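
That constraint can be sketched as a quick sanity check. Note that `valid_parallel_config` is a hypothetical helper for illustration, not part of sglang:

```python
def valid_parallel_config(num_gpus: int, dp: int, tp: int) -> bool:
    # dp size x tp size must equal the total number of GPUs available.
    return dp * tp == num_gpus

# tp 8 + dp 8 would need 64 GPUs, so it fails on an 8-GPU node.
print(valid_parallel_config(8, dp=8, tp=8))
# tp 4 x dp 2 uses exactly 8 GPUs.
print(valid_parallel_config(8, dp=2, tp=4))
```

So on 8 x A100, valid combinations are dp 1 x tp 8, dp 2 x tp 4, dp 4 x tp 2, or dp 8 x tp 1.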