I would like to see the code for DPU (Delayed Parameter Update) in ZeRO-Offload. Could you please let me know where that part is located? Thank you.
Replies: 1 comment 4 replies
Hi @simey1128,

DPU has not been committed to the DeepSpeed main repo because it introduces one step of staleness in the parameters and slightly changes the loss at each training step. As a result, it cannot pass the DeepSpeed unit tests that we added to check the correctness of system optimizations.

To enable delayed parameter update, the files that need to be changed can be found in the repo https://github.com/jren73/delay_param_update. Also note that this implementation is based on DeepSpeed v0.3.0.
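To illustrate what "one step of staleness" means, here is a minimal toy sketch in plain Python. It is not the code from the linked repo and does not use any DeepSpeed API; the function `train`, the quadratic loss, and all parameter names are hypothetical and chosen only for illustration. With `delayed=True`, the gradient computed at step t is applied only at step t+1, so each forward/backward pass sees parameters that are missing the most recent update.

```python
def train(steps, lr, delayed, w0=0.0, target=3.0):
    """Toy gradient descent on loss(w) = (w - target)**2.

    Hypothetical illustration only, not DeepSpeed code.
    delayed=False: apply each gradient immediately (standard update).
    delayed=True:  apply the gradient from the previous step, so the
                   parameter used in each step is one update behind.
    """
    w = w0
    pending_grad = None  # gradient produced in the previous step, not yet applied
    losses = []
    for _ in range(steps):
        # "Forward/backward" on the parameter value that is currently visible.
        loss = (w - target) ** 2
        grad = 2.0 * (w - target)
        losses.append(loss)

        if delayed:
            # Apply the gradient from the previous step, then queue this one.
            if pending_grad is not None:
                w -= lr * pending_grad
            pending_grad = grad
        else:
            w -= lr * grad
    return w, losses


w_sync, losses_sync = train(steps=50, lr=0.1, delayed=False)
w_dpu, losses_dpu = train(steps=50, lr=0.1, delayed=True)
print("standard final w:", w_sync)
print("delayed  final w:", w_dpu)
print("loss at second step (standard, delayed):", losses_sync[1], losses_dpu[1])
```

In this toy problem both variants converge to the same minimum, but the per-step losses differ from the second step onward, which is the kind of difference that makes an exact-match correctness test fail.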