Hello, I noticed that the authors of UNREAL sample sequences so that zero rewards and non-zero rewards are equally represented in the reward prediction task, as described in Section 3.2 of the paper, but the code doesn't seem to do this. Is something wrong?
Thanks.
I think this part is implemented in sample_rp_sequence() in ~/train/experience.py:107. A random number is drawn to decide which container to sample from.
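The idea can be sketched roughly like this (a minimal illustration, not the actual UNREAL code; class and method names here are invented for the example): frames are kept in two pools keyed on whether their reward is zero, and a fair coin flip chooses which pool to sample from, so both classes are equally represented on average.

```python
import random


class ReplayBuffer:
    """Minimal sketch of skewed sampling for the reward-prediction task.

    This is NOT the actual experience.py implementation; it only illustrates
    the two-container idea behind sample_rp_sequence().
    """

    def __init__(self):
        self.zero_reward_frames = []     # frames whose reward == 0
        self.nonzero_reward_frames = []  # frames whose reward != 0

    def add(self, frame, reward):
        # Route each frame into the container matching its reward class.
        if reward == 0:
            self.zero_reward_frames.append((frame, reward))
        else:
            self.nonzero_reward_frames.append((frame, reward))

    def sample_rp_frame(self):
        # Flip a fair coin to pick the container, so zero and non-zero
        # rewards each get sampled ~50% of the time (falling back to the
        # zero-reward pool if no non-zero rewards have been seen yet).
        if random.random() < 0.5 and self.nonzero_reward_frames:
            return random.choice(self.nonzero_reward_frames)
        return random.choice(self.zero_reward_frames)
```

Even if non-zero rewards are rare in the raw experience stream, the coin flip makes them appear in roughly half of the reward-prediction training samples.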
@hugemicrobe Thank you, you are right. I have another follow-up question about combining the losses from all the tasks. Since different tasks draw different input samples (e.g., the reward prediction task versus the main A3C task), why is it valid to sum their losses and compute gradients from that sum? Thanks!
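The setup behind this question can be illustrated with a tiny numeric sketch (again, not the UNREAL code): each task computes its loss on its own batch, but the losses depend on the same shared parameters, so the gradient of the summed loss is simply the sum of the per-task gradients.

```python
def grad_total(w, x_main, y_main, x_rp, y_rp):
    """Gradient of a summed two-task loss w.r.t. a shared parameter w.

    Each task has its own squared-error loss on its own sample, but both
    losses depend on the same shared parameter, so
        d(L_main + L_rp)/dw = dL_main/dw + dL_rp/dw.
    """
    grad_main = 2 * (w * x_main - y_main) * x_main  # d/dw of (w*x - y)^2
    grad_rp = 2 * (w * x_rp - y_rp) * x_rp
    return grad_main + grad_rp  # gradient of the summed loss
```

Summing losses computed on different batches is therefore equivalent to accumulating each task's gradient into the shared parameters in one backward pass.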