First of all, great work.
In your thesis, the "Dropout as a Bayesian Approximation..." paper, and the "Concrete Dropout" paper, @yaringal, you appear to apply the dropout distribution only to the weights and not to the biases, which leads to a p-dependent regularization term that involves only the weight matrices.
However, in the PyTorch implementation (I didn't check the other ones), the regularization term sums the squares of layer.parameters(), which collects the biases as well. This yields a p-dependent regularization term for the biases too, which is probably not what you want once you start optimizing p. Is this a bug, or am I missing something?
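To make the discrepancy concrete, here is a minimal sketch of the two variants (function names, the weight-regularizer scale, and the exact form of the p-dependent factor are illustrative paraphrases, not copied from the repo):

```python
import torch
import torch.nn as nn

WEIGHT_REG = 1e-6  # illustrative scale, not the repo's actual value

def reg_all_params(layer: nn.Linear, p: float) -> torch.Tensor:
    # Paraphrase of the behaviour described in the issue: the sum of
    # squares runs over layer.parameters(), so the bias also picks up
    # the p-dependent 1/(1-p) factor.
    sum_sq = sum(torch.sum(param ** 2) for param in layer.parameters())
    return WEIGHT_REG * sum_sq / (1.0 - p)

def reg_weights_only(layer: nn.Linear, p: float) -> torch.Tensor:
    # What the papers describe: only the weight matrix carries the
    # p-dependent factor; the bias would be regularized separately
    # (without the 1/(1-p) scaling) or not at all.
    sum_sq = torch.sum(layer.weight ** 2)
    return WEIGHT_REG * sum_sq / (1.0 - p)
```

The difference between the two is exactly the p-scaled bias term, which becomes a function of p (and hence affects its gradient) as soon as p is treated as a trainable parameter.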