I have successfully trained GoogleNet from scratch using 2 machines, covering the entire ImageNet dataset (1,281,167 images).
I achieved 62.3% top-1 and 84.7% top-5 accuracy in 26 epochs.
Hopefully some statistician can prove PSGD works, at least with simple momentum and the Nesterov method... For AdaDelta and Adam (squared gradients), I am not sure about the implications for PSGD...
The key to distributed data training is starting from a model whose top-1 accuracy on GoogleNet is already high enough (say at least 5%; I call this step the "first opinion"). Otherwise SparkNet gets stuck at random guessing instead of learning to improve accuracy.
I am starting to suspect this initial opinion is very critical and may introduce bias later on.
Hopefully some statistician can start from the above observation and prove that first-order methods like PSGD with momentum or Nesterov momentum converge from a reasonable starting point when dealing with dramatically different data in parallel with only occasional communication.
I will also try out methods that use squared gradients (AdaDelta, Adam, RMSProp, etc.) to check convergence. There is strong evidence suggesting squared-gradient methods also converge under PSGD.
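To make the setup above concrete, here is a minimal toy sketch (not the actual SparkNet implementation) of what "PSGD with momentum and occasional communication" might look like: each worker runs SGD with classical momentum on its own data shard, and the workers average their parameters only every few steps. The two-worker least-squares problem, the learning rate, and the sync interval are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: recover w_true from noisy observations.
# Each shard stands in for one machine's local data partition.
w_true = np.array([2.0, -3.0])

def make_shard(n):
    X = rng.normal(size=(n, 2))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

shards = [make_shard(200) for _ in range(2)]  # two "machines"

lr, mu, sync_every = 0.05, 0.9, 10
w = [np.zeros(2) for _ in shards]  # per-worker parameters
v = [np.zeros(2) for _ in shards]  # per-worker momentum buffers

for step in range(1, 201):
    # Each worker takes a local momentum-SGD step on its own shard.
    for i, (X, y) in enumerate(shards):
        grad = 2 * X.T @ (X @ w[i] - y) / len(y)
        v[i] = mu * v[i] - lr * grad   # classical momentum update
        w[i] = w[i] + v[i]
    # Occasional communication: average parameters across workers.
    if step % sync_every == 0:
        avg = sum(w) / len(w)
        w = [avg.copy() for _ in w]

print(sum(w) / len(w))  # should end up near w_true
```

On this easy convex problem the periodic averaging is enough for both workers to converge to the same solution; the open question raised above is whether the same scheme stays stable when the shards are dramatically different and the loss is non-convex.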