Elastic resets
#2566
-
@t0278611 no, I never implemented this. There was also a simplified SAM variant that used an EMA (I'm not sure whether it was of the model weights or of past gradients) to approximate the original SAM algorithm. I tried that one at one point but couldn't improve on past training runs. I'm open to adding this if there's some evidence of success with hacked timm scripts and models in the image space. I've run across a lot of paper ideas over the years that I couldn't replicate.
-
Using model EMA (at least for small-ish models) seems to drastically improve the validation results. I wonder why the EMA weights are never fed back into training (a bit like the Lookahead optimizer). It should be fairly straightforward to implement "elastic resets", where the online weights periodically get overwritten by the EMA weights. I did not find the feature in the train args; if it exists, please do point it out.
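To make the idea concrete, here is a minimal sketch of the reset as described above: plain SGD, an EMA copy of the weights, and a periodic overwrite of the online weights by the EMA. It is framework-free on purpose; the dicts of floats stand in for a model's parameters, and every name here (`ema_update`, `elastic_reset`, `train`, `reset_every`, `decay`) is hypothetical, not a timm function or train arg.

```python
def ema_update(ema, online, decay):
    """Move each EMA weight toward the matching online weight."""
    for name, w in online.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * w


def elastic_reset(online, ema):
    """Overwrite the online weights with the current EMA weights."""
    for name in online:
        online[name] = ema[name]


def train(online, grad_fn, steps, lr=0.1, decay=0.9, reset_every=5):
    """Plain SGD with an EMA copy and a periodic reset (illustrative only)."""
    ema = dict(online)  # the EMA starts as a copy of the online weights
    for step in range(1, steps + 1):
        grads = grad_fn(online)
        for name in online:
            online[name] -= lr * grads[name]
        ema_update(ema, online, decay)
        # Assumption: reset on a fixed step interval; 0 disables resets.
        if reset_every and step % reset_every == 0:
            elastic_reset(online, ema)
    return online, ema


# Toy problem: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
def toy_grad(weights):
    return {"w": 2.0 * (weights["w"] - 3.0)}


online, ema = train({"w": 0.0}, toy_grad, steps=10)
print(online, ema)
```

In a real training script the same three steps would act on the model's parameter tensors, and the optimizer state (momentum buffers and the like) would probably need some thought at each reset, since it was accumulated for the pre-reset weights.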