
About reward prediction task #15

Open
klvn930815 opened this issue Jul 18, 2017 · 2 comments
Comments

@klvn930815

Hello, I found that the authors of UNREAL sample frames so that zero rewards and non-zero rewards are equally represented in the reward prediction task, as described in Section 3.2 of the paper, but the code doesn't seem to do this. Is something wrong?

Thanks.

@hugemicrobe

I think this part is implemented in sample_rp_sequence() in ~/train/experience.py:107. Random numbers are drawn to decide which container to sample from.
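The idea can be sketched as follows. This is a simplified, hypothetical illustration (not the repo's actual `Experience` class): frames are kept in two containers by reward, and a coin flip decides which container a reward-prediction sample comes from, so zero-reward and non-zero-reward frames end up equally represented as described in Section 3.2 of the paper.

```python
import random
from collections import deque

class ReplayBuffer:
    """Toy sketch of skewed sampling for the reward prediction task."""

    def __init__(self, capacity=2000):
        # Frames are bucketed by whether the following reward was zero.
        self.zero_reward_frames = deque(maxlen=capacity)
        self.nonzero_reward_frames = deque(maxlen=capacity)

    def add(self, frame, reward):
        if reward == 0:
            self.zero_reward_frames.append(frame)
        else:
            self.nonzero_reward_frames.append(frame)

    def sample_rp_frame(self):
        # A random number decides the container, as in sample_rp_sequence();
        # fall back to the other container if the chosen one is empty.
        if random.random() < 0.5 and self.zero_reward_frames:
            return random.choice(self.zero_reward_frames)
        if self.nonzero_reward_frames:
            return random.choice(self.nonzero_reward_frames)
        return random.choice(self.zero_reward_frames)
```

Even if non-zero rewards are rare in the raw experience stream, each reward-prediction sample is then drawn from the non-zero bucket about half the time.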

@klvn930815
Author

@hugemicrobe Thank you, you are right. I have another question about how the losses from all the tasks are combined. Since different tasks are fed different samples (e.g. the reward prediction task versus the main task), why is it valid to sum up the losses and compute gradients from the sum? Thanks!
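One way to see why summing is valid: all tasks share the same network parameters, so each per-task loss is a function of the same parameter vector even though it is evaluated on its own batch, and the gradient of the sum is simply the sum of the per-task gradients. A minimal sketch with a single shared scalar weight `w` and hand-derived squared-error gradients (purely illustrative, not the repo's code):

```python
def main_loss_grad(w, x, y):
    # Gradient of the "main" task loss L_main = (w*x - y)^2 w.r.t. w.
    return 2 * (w * x - y) * x

def rp_loss_grad(w, x, y):
    # Gradient of an auxiliary "reward prediction" loss on its own,
    # differently sampled batch, but using the *same* shared weight w.
    return 2 * (w * x - y) * x

def total_grad(w, x_main, y_main, x_rp, y_rp):
    # d/dw (L_main + L_rp) = dL_main/dw + dL_rp/dw, by linearity of
    # differentiation; the inputs differ, the parameter does not.
    return main_loss_grad(w, x_main, y_main) + rp_loss_grad(w, x_rp, y_rp)
```

So one backward pass through the summed loss updates the shared parameters with the combined gradient, which is exactly what separate per-task updates would accumulate.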
