
Log uniform learning rates #16

Open
tomelliot opened this issue Aug 23, 2017 · 2 comments

tomelliot commented Aug 23, 2017

Great implementation, thanks for sharing!

I'm trying to understand the log-uniform learning rates (p. 12 of the UNREAL paper, line 27 of main.py). My understanding is that the implementation should randomly sample a value from a log-uniform distribution between alpha_low and alpha_high, something like

import numpy as np

def loguniform(low=0, high=1, size=None):
    return np.exp(np.random.uniform(low, high, size))

(from https://stackoverflow.com/a/43977980). Can you comment on the "initial_alpha_log_rate" option, and static learning rate for each training thread?
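For context, UNREAL-style A3C implementations commonly give each training thread a fixed learning rate chosen by interpolating between the bounds in log space, with the interpolation weight drawn uniformly from [0, 1] once at start-up. A hypothetical sketch of that scheme (the helper name `log_uniform` and the bound values are my assumptions, not taken from this repository):

```python
import math
import random

def log_uniform(lo, hi, rate):
    """Interpolate between lo and hi in log space.

    rate in [0, 1] acts as the interpolation weight; sampling it
    uniformly once per thread yields a log-uniformly distributed,
    then static, per-thread learning rate.
    """
    log_lo = math.log(lo)
    log_hi = math.log(hi)
    return math.exp(log_lo * (1.0 - rate) + log_hi * rate)

# each thread would draw its own rate once and keep the result fixed
lr = log_uniform(1e-4, 5e-3, random.random())
```

With rate = 0 this returns lo, with rate = 1 it returns hi, and with rate = 0.5 it returns the geometric mean of the bounds, which is exactly log-uniform sampling over [lo, hi].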

Another note I have is around maze_environment.py. I tried training it with the default values (just to make sure everything was running smoothly), and it looked like it was diverging. Changing the hit reward to -0.001 (instead of -1) allows it to train with the default hyperparameters. If that's unexpected let me know, otherwise I can submit a PR.

@kvas7andy

Hi, @tomelliot!

I noticed your comment while looking at the log-uniform function in the UNREAL algorithm. Actually, I don't think the code you suggest is a correct implementation of log-uniform sampling, and even the Stack Overflow answer is wrong: the low and high parameters are expected to go inside the logarithm, so the uniform distribution should sample a value between log(low) and log(high).
For reference, here is a similar (though not identical) question and answer: https://math.stackexchange.com/a/1411802

Second question is about training process. Was the problem because of wrong predefined hit reward or not enough training time?


tomelliot commented Feb 17, 2018

I couldn't find a definitive source defining "log uniform", just comments on Stack Overflow and the like. I went with the SO answer, but you're right that the snippet isn't correct. It should be:

def loguniform(low, high, size=None):
    # low and high must be > 0, since we take their logs
    return np.exp(np.random.uniform(np.log(low), np.log(high), size))
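As a quick sanity check (a sketch assuming NumPy; the bound values here are arbitrary), the corrected sampler keeps every draw inside [low, high], and the logs of the draws are spread uniformly, so their mean sits near the log of the geometric mean of the bounds:

```python
import numpy as np

def loguniform(low, high, size=None):
    # sample uniformly in log space, then map back with exp;
    # low and high must be > 0
    return np.exp(np.random.uniform(np.log(low), np.log(high), size))

samples = loguniform(1e-4, 1e-2, size=100_000)
# all samples lie in [1e-4, 1e-2]; np.log(samples).mean() is close to
# (np.log(1e-4) + np.log(1e-2)) / 2, i.e. log of the geometric mean
```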

Second question is about training process. Was the problem because of wrong predefined hit reward or not enough training time?

IIRC training time wasn't the problem - I ran it for sufficient epochs for the complexity of the problem.
