I'm trying to understand the log-uniform learning rates (p. 12 of the UNREAL paper, line 27 of main.py). My understanding is that the implementation should randomly select a value from a log-uniform distribution between alpha_low and alpha_high, something like
(from https://stackoverflow.com/a/43977980). Can you comment on the "initial_alpha_log_rate" option, and on the static learning rate for each training thread?
Another note I have is about maze_environment.py. I tried training it with the default values (just to make sure everything was running smoothly), and it looked like it was diverging. Changing the hit reward to -0.001 (instead of -1) allows it to train with the default hyperparameters. If that's unexpected, let me know; otherwise I can submit a PR.
I noticed your comment while looking at the log-uniform function of the UNREAL algorithm.
Actually, I think the code you suggest isn't a correct implementation of log-uniform sampling, and even the Stack Overflow answer is not correct. The low and high parameters are expected to be inside the logarithm, and the uniform distribution should sample a value between log(low) and log(high).
For reference, here is a similar (though not identical) question and answer: https://math.stackexchange.com/a/1411802
My second question is about the training process: was the problem caused by the wrong predefined hit reward, or by insufficient training time?
I couldn't find a decisive source defining "log uniform", just comments on Stack Overflow and the like. I went with the SO answer, but you're right that the code snippet is incorrect. It should be:
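A minimal sketch of the corrected sampling, assuming the fix is the one described above (sample uniformly between log(low) and log(high), then exponentiate); the function name and example bounds here are illustrative, not taken from the repo:

```python
import math
import random

def log_uniform(low, high):
    """Sample from a log-uniform distribution over [low, high].

    Draws uniformly between log(low) and log(high) and exponentiates,
    so samples are spread evenly across orders of magnitude rather
    than linearly between low and high.
    """
    return math.exp(random.uniform(math.log(low), math.log(high)))

# Example: pick a per-thread learning rate between 1e-4 and 5e-3
alpha = log_uniform(1e-4, 5e-3)
```

Each training thread would call this once at startup so that threads cover different orders of magnitude of the learning rate.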
Great implementation, thanks for sharing!