Hello, I noticed that the authors of UNREAL sample sequences so that zero rewards and non-zero rewards are equally represented in the reward prediction task, as described in Section 3.2 of the paper, but the code doesn't seem to do this. Is something wrong?
Thanks.
I think this part is implemented in sample_rp_sequence() in ~/train/experience.py:107. A random number is drawn to decide which container to sample from.
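The idea can be sketched roughly like this (a minimal illustration, not the actual UNREAL code; class and method names here are invented for the example): frames are kept in two pools keyed on whether their reward is zero, and a fair coin flip chooses which pool to sample from, so both classes are equally represented on average.

```python
import random


class ReplayBuffer:
    """Minimal sketch of skewed sampling for the reward-prediction task.

    This is NOT the actual experience.py implementation; it only illustrates
    the two-container idea behind sample_rp_sequence().
    """

    def __init__(self):
        self.zero_reward_frames = []     # frames whose reward == 0
        self.nonzero_reward_frames = []  # frames whose reward != 0

    def add(self, frame, reward):
        # Route each frame into the container matching its reward class.
        if reward == 0:
            self.zero_reward_frames.append((frame, reward))
        else:
            self.nonzero_reward_frames.append((frame, reward))

    def sample_rp_frame(self):
        # Flip a fair coin to pick the container, so zero and non-zero
        # rewards each get sampled ~50% of the time (falling back to the
        # zero-reward pool if no non-zero rewards have been seen yet).
        if random.random() < 0.5 and self.nonzero_reward_frames:
            return random.choice(self.nonzero_reward_frames)
        return random.choice(self.zero_reward_frames)
```

Even if non-zero rewards are rare in the raw experience stream, the coin flip makes them appear in roughly half of the reward-prediction training samples.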
@hugemicrobe Thank you, you are right. I have another follow-up question about combining the losses from all the tasks. Since different tasks draw different input samples (e.g., the reward prediction task versus the main A3C task), why is it valid to sum their losses and compute gradients from that sum? Thanks!
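The setup behind this question can be illustrated with a tiny numeric sketch (again, not the UNREAL code): each task computes its loss on its own batch, but the losses depend on the same shared parameters, so the gradient of the summed loss is simply the sum of the per-task gradients.

```python
def grad_total(w, x_main, y_main, x_rp, y_rp):
    """Gradient of a summed two-task loss w.r.t. a shared parameter w.

    Each task has its own squared-error loss on its own sample, but both
    losses depend on the same shared parameter, so
        d(L_main + L_rp)/dw = dL_main/dw + dL_rp/dw.
    """
    grad_main = 2 * (w * x_main - y_main) * x_main  # d/dw of (w*x - y)^2
    grad_rp = 2 * (w * x_rp - y_rp) * x_rp
    return grad_main + grad_rp  # gradient of the summed loss
```

Summing losses computed on different batches is therefore equivalent to accumulating each task's gradient into the shared parameters in one backward pass.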