GoogleNet training successful using PSGD method-- will Adam, AdaDelta work? #140

nhe150 · 2016-07-01T21:29:48Z

I have used a modified version of SparkNet.
I have successfully trained GoogleNet from scratch on 2 machines, covering the entire ImageNet training set (1.281167 million images).
I achieved 62.3% top-1 and 84.7% top-5 accuracy in 26 epochs.

Hopefully some statistician can prove that PSGD works, at least with simple momentum or Nesterov momentum. For AdaDelta and Adam (which use squared gradients), I am not sure what the implications of PSGD are.

robertnishihara · 2016-07-03T21:49:49Z

Nice work! Thanks for running the benchmark.

nhe150 · 2016-07-07T22:00:52Z

The key to distributed data-parallel training is starting from a model whose top-1 accuracy on GoogleNet is already high enough (say at least 5%; I call this step the first opinion). Otherwise SparkNet gets stuck at random guessing instead of learning its way to higher accuracy.

I am starting to suspect that the initial opinion is critical and may cause bias later on.

nhe150 · 2016-07-08T17:27:15Z

Hopefully some statistician can start from the above observation and show that first-order methods under PSGD, such as momentum and Nesterov momentum, converge from a reasonable starting point when the workers process dramatically different data in parallel with only occasional communication.
I will also try methods that use squared gradients (AdaDelta, Adam, RMSProp, etc.) to check convergence. There is strong evidence suggesting that these methods also converge under PSGD.
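
To make the setup above concrete, here is a minimal sketch, assuming PSGD means each worker runs momentum SGD on its own data shard and the models are averaged only every few steps. The toy quadratic loss, the function names, the hyperparameters, and the choice to keep momentum local to each worker are all assumptions for illustration. None of this is SparkNet code.

```python
from statistics import mean


def local_momentum_steps(w, v, shard_mean, steps, lr, mu):
    """Run `steps` momentum-SGD updates on one worker's shard.

    Toy per-worker loss: 0.5 * (w - shard_mean)**2, so the gradient is w - shard_mean.
    """
    for _ in range(steps):
        grad = w - shard_mean
        v = mu * v - lr * grad
        w = w + v
    return w, v


def train_with_periodic_averaging(shards, rounds=50, local_steps=10, lr=0.1, mu=0.9):
    """Hypothetical PSGD loop: local training, then average the weights."""
    shard_means = [mean(s) for s in shards]
    w_global = 0.0
    # Assumption: each worker keeps its own momentum; only weights are averaged.
    velocities = [0.0] * len(shards)
    for _ in range(rounds):
        local_weights = []
        for k, m in enumerate(shard_means):
            w_k, velocities[k] = local_momentum_steps(
                w_global, velocities[k], m, local_steps, lr, mu
            )
            local_weights.append(w_k)
        # The only communication step in a round.
        w_global = mean(local_weights)
    return w_global


# Two workers with very different data.
shards = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
w = train_with_periodic_averaging(shards)
target = mean([x for s in shards for x in s])
print(w, target)
```

On this convex toy problem with equal-sized shards the averaged weight approaches the minimizer of the combined loss. That says nothing about a non-convex network like GoogleNet, and nothing about squared-gradient methods such as Adam, where the per-worker second-moment estimates would also have to be handled somehow.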

nhe150 · 2016-07-08T17:29:18Z

And it seems that PSGD converges faster than all the other methods. I will call this clustering wisdom in training.
