I have observed a recent change in `LinearWithGradAccumulationAndAsyncCommunication` that stores the gradient of the weights in `WeightGradStore` as part of the new Zero Bubble Pipeline Parallelism feature (#396):

`Megatron-DeepSpeed/megatron/core/tensor_parallel/layers.py`, line 370 at `1280f59`
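For context, here is a simplified sketch of the pattern I mean (hypothetical names and signatures, not the actual code at the permalink):

```python
# Simplified sketch of the deferral pattern (hypothetical names/signatures;
# the actual implementation is at the permalink above).
import torch

class WeightGradStore:
    """Collects deferred weight-gradient computations for the zero-bubble schedule."""
    cache = []

    @classmethod
    def put(cls, total_input, grad_output, weight, compute_fn):
        cls.cache.append((total_input, grad_output, weight, compute_fn))

def backward_sketch(ctx, grad_output):
    total_input, weight = ctx.saved_tensors
    grad_input = grad_output.matmul(weight)
    if ctx.defer_weight_grad:  # hypothetical flag for the zero-bubble path
        # The weight gradient is NOT computed here; the computation is queued
        # in WeightGradStore and grad_weight is returned as None.
        WeightGradStore.put(total_input, grad_output, weight,
                            lambda inp, g, w: torch.matmul(g.t(), inp))
        grad_weight = None
    else:
        grad_weight = grad_output.t().matmul(total_input)
    return grad_input, grad_weight
```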
However, the stored gradients are only accessed in `deepspeed_zbh1_engine`:

`Megatron-DeepSpeed/megatron/core/pipeline_parallel/deepspeed_zbh1_engine.py`, line 108 at `1280f59`
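As far as I can tell, only the zbh1 engine drains the store. Continuing the sketch above (again with hypothetical names), the consumer side would look roughly like this:

```python
# Rough sketch of the consumer side (hypothetical names; the real call site
# is at the permalink above). Only the zbh1 schedule runs this step.
def exec_weight_grad_pass_sketch():
    for total_input, grad_output, weight, compute_fn in WeightGradStore.cache:
        grad = compute_fn(total_input, grad_output, weight)
        # Accumulate into .grad the way a normal backward would.
        weight.grad = grad if weight.grad is None else weight.grad + grad
    WeightGradStore.cache.clear()

# If the zero-bubble schedule is not enabled, nothing ever calls this pass,
# so the queued weight-gradient computations are never executed.
```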
If the Zero Bubble Pipeline Parallelism feature is not enabled, it seems that the gradients are never returned. Is this expected behavior?