I'm trying to understand the log-uniform learning rates (p. 12 of the UNREAL paper, line 27 of main.py). My understanding is that the implementation should randomly select a value from a log-uniform distribution between alpha_low and alpha_high, something like
(from https://stackoverflow.com/a/43977980). Can you comment on the "initial_alpha_log_rate" option, and on the static learning rate for each training thread?
Another note I have is about maze_environment.py. I tried training it with the default values (just to make sure everything was running smoothly), and it looked like it was diverging. Changing the hit reward to -0.001 (instead of -1) allows it to train with the default hyperparameters. If that's unexpected, let me know; otherwise I can submit a PR.
I noticed your comment while looking at the log-uniform function of the UNREAL algorithm.
Actually, I think the code you suggest isn't a correct implementation of log-uniform sampling, and even the Stack Overflow answer is not correct. The low and high parameters are expected to be inside the logarithm, and the uniform distribution should sample a value between log(low) and log(high).
For reference, here is a similar (though not identical) question and answer: https://math.stackexchange.com/a/1411802
My second question is about the training process: was the problem caused by the wrong predefined hit reward, or by insufficient training time?
I couldn't find a decisive source defining "log uniform", just comments on Stack Overflow and the like. I went with the SO answer, but you're right that the code snippet is incorrect. It should be:
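A minimal sketch of the corrected sampling, assuming the fix is the one described above (sample uniformly between log(low) and log(high), then exponentiate); the function name and example bounds here are illustrative, not taken from the repo:

```python
import math
import random

def log_uniform(low, high):
    """Sample from a log-uniform distribution over [low, high].

    Draws uniformly between log(low) and log(high) and exponentiates,
    so samples are spread evenly across orders of magnitude rather
    than linearly between low and high.
    """
    return math.exp(random.uniform(math.log(low), math.log(high)))

# Example: pick a per-thread learning rate between 1e-4 and 5e-3
alpha = log_uniform(1e-4, 5e-3)
```

Each training thread would call this once at startup so that threads cover different orders of magnitude of the learning rate.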
Great implementation, thanks for sharing!