🚀 Enhance GRPO VLLM server from sync to async and accelerate training #3182
base: main
Conversation
@binary-husky the speedups you posted look great, though I have a question on how you parallelize the computation. The picture shows a data dependency between rollouts and model training (and the vllm update). In other words, this achieves parallelization within grad accum steps, and works only if grad accum > 1?
@fabianlim Yes, it works only if the grad accum step is > 1.
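To make the dependency concrete, here is a rough sketch of the intended overlap (the `client`/`trainer` objects and method names are illustrative placeholders, not the PR's actual API). With grad accum == 1 there is no "next" micro-batch to prefetch before the weights change, so nothing can overlap.

```python
def train_one_optimizer_step(micro_batches, client, trainer):
    """Overlap vllm generation for micro-batch i+1 with training on micro-batch i."""
    # Kick off generation for the first micro-batch (non-blocking).
    client.generate(micro_batches[0])
    for i, batch in enumerate(micro_batches):
        # Block until the rollouts for the current micro-batch are ready.
        completions = client.get_future(batch)
        # Start generating the next micro-batch before the backward pass runs.
        if i + 1 < len(micro_batches):
            client.generate(micro_batches[i + 1])
        trainer.accumulate_gradients(batch, completions)
    trainer.optimizer_step()   # weights only change after the last micro-batch,
    client.sync_weights()      # so the vllm server is updated once per optimizer step
```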
We should make this happen! The new Mistral reasoning model uses a pipeline like this one.
Shouldn't populating
- client first calls `generate` (non-blocking), then after a while calls `get_future` (with identical arguments) to get the result
- client automatically calls `get_future` inside `generate`, blocking further execution before the generation is complete

Generation is requested for the next `gradient_accumulation_steps` batches ahead of time, so that training and vllm generation can run in parallel! However, I have to admit that this piece of code is not elegant enough; remove it if it disqualifies.
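To make the generate / get_future protocol concrete, here is a minimal, self-contained sketch (not the PR's code; the class name, the thread-pool stand-in for the vllm server, and the argument-keying scheme are all illustrative assumptions):

```python
import json
from concurrent.futures import Future, ThreadPoolExecutor

class AsyncGenerationClient:
    """Hypothetical illustration of the protocol described above: `generate`
    submits work and can return immediately; a later `get_future` call with
    identical arguments blocks until the result is ready."""

    def __init__(self, backend):
        self._backend = backend  # callable: prompts -> completions (stands in for the vllm server)
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._futures: dict[str, Future] = {}

    def _key(self, prompts, **kwargs) -> str:
        # Identify a request by its arguments so a later get_future call
        # with identical arguments can find the pending result.
        return json.dumps([prompts, kwargs], sort_keys=True)

    def generate(self, prompts, blocking=True, **kwargs):
        key = self._key(prompts, **kwargs)
        if key not in self._futures:
            self._futures[key] = self._pool.submit(self._backend, prompts, **kwargs)
        if blocking:
            # Backward-compatible mode: get_future is called inside generate,
            # blocking until the generation is complete.
            return self.get_future(prompts, **kwargs)
        return None  # non-blocking: the caller comes back later with get_future

    def get_future(self, prompts, **kwargs):
        key = self._key(prompts, **kwargs)
        return self._futures.pop(key).result()  # blocks until generation finishes
```

The key point is that `generate(..., blocking=False)` returns immediately, so the trainer can run the backward pass while the server is still generating the next batch.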
There is also a `RolloutEngine` in `trl.scripts.vllm_serve`, for more sophisticated vllm inference functionality, trying to support `lm_generate > MCP tool_call > lm_generate > another MCP tool_call > ...`, but it is not complete yet.
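A rough, hypothetical sketch of the multi-turn rollout that pattern implies: generate, detect a tool call, execute it, append the tool result, and generate again until the model stops calling tools. The injected callables (`lm_generate`, `parse_tool_call`, `call_mcp_tool`) and the `<tool_result>` formatting are placeholders, not real TRL/vllm APIs.

```python
from typing import Callable, Optional

def rollout(
    prompt: str,
    lm_generate: Callable[[str], str],                 # wraps one vllm generation call
    parse_tool_call: Callable[[str], Optional[dict]],  # extracts an MCP tool call, if any
    call_mcp_tool: Callable[[dict], str],              # executes the tool and returns its output
    max_turns: int = 4,
) -> str:
    """Interleave model generation with MCP tool calls until the model stops
    requesting tools or the turn budget is exhausted."""
    transcript = prompt
    for _ in range(max_turns):
        completion = lm_generate(transcript)
        transcript += completion
        tool_call = parse_tool_call(completion)
        if tool_call is None:
            break  # no tool requested: the model produced a final answer
        tool_result = call_mcp_tool(tool_call)
        transcript += f"\n<tool_result>{tool_result}</tool_result>\n"
    return transcript
```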