
@tom-pollak
Collaborator

We want the weight decay (wd) to be centered around 1 (k = 1 is the exponential distribution), not 0.

```diff
 for _ in range(max_iters):
     grad = (np.log(k) - digamma(k) - s) / (1.0 / k - polygamma(1, k) + 1e-8)
-    grad += k**2 * wd
+    grad += (k - 1) ** 2 * wd
```
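A minimal sketch of the same Newton solve with the penalty centered at 1. It assumes (not confirmed by the PR) that s = log(mean(x)) - mean(log(x)) and that the goal is to maximize the gamma profile log-likelihood minus wd * (k - 1)**2. Here the penalty enters through its derivative, 2 * wd * (k - 1), in both the gradient and the Hessian, rather than being added to the step directly. Note that (k - 1)**2 * wd is never negative, so on its own it always shifts the step in the same direction, while the derivative of a quadratic penalty changes sign at k = 1. The function name fit_gamma_shape, the starting point and the tolerance are illustrative.

```python
import numpy as np
from scipy.special import digamma, polygamma


def fit_gamma_shape(x, wd=0.0, max_iters=100, tol=1e-10):
    """Hypothetical helper: Newton solve for the gamma shape k with an
    L2 penalty wd * (k - 1)**2 that pulls k towards 1 (exponential), not 0."""
    # Assumes x is positive and not constant (so s > 0).
    x = np.asarray(x, dtype=float)
    s = np.log(x.mean()) - np.log(x).mean()  # >= 0 by Jensen's inequality
    # Closed-form approximation commonly used to initialize the shape MLE.
    k = (3.0 - s + np.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(max_iters):
        # Gradient and Hessian of the penalized profile log-likelihood (per sample).
        grad = np.log(k) - digamma(k) - s - 2.0 * wd * (k - 1.0)
        hess = 1.0 / k - polygamma(1, k) - 2.0 * wd  # always negative for k > 0
        k_new = max(k - grad / hess, 1e-8)  # Newton step, keeping k strictly positive
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    return k
```

With wd = 0 this reduces to the usual shape MLE condition log(k) - digamma(k) = s; as wd grows, the fitted k moves towards 1 from either side.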
Collaborator


Exponential distro for the win!

```python
# Apply custom weight decay centered at 1
if wd > 0:
    with pt.no_grad():
        theta.data -= wd * (theta.data - 1)
```
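Ignoring the gradient updates between calls, this update shrinks the distance from theta to 1 by a factor of (1 - wd) each time it runs (assuming 0 < wd < 1). It is also exactly one unit-step gradient-descent step on the penalty (wd/2)(theta - 1)^2. Writing \lambda for wd:

```math
\theta \leftarrow \theta - \lambda(\theta - 1)
\;\Longleftrightarrow\;
\theta - 1 \leftarrow (1 - \lambda)(\theta - 1),
\qquad
\theta_t = 1 + (1 - \lambda)^t(\theta_0 - 1),
\qquad
\nabla_\theta \tfrac{\lambda}{2}\lVert\theta - 1\rVert^2 = \lambda(\theta - 1)
```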
Collaborator


theta here can be more general than the k-values of a gamma distribution, so centering the decay at 1 may not make sense for every entry. (This code isn't being used in experiments at the moment, so it's not an issue for the deadline.)
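One way to handle this, sketched under assumptions: make the decay target a parameter instead of hard-coding 1. The helper decay_towards and its center argument are hypothetical, not part of the PR. center can be a float or a tensor broadcastable to theta, so only the entries that are gamma shape parameters need to decay towards 1 while the rest decay towards 0.

```python
import torch


def decay_towards(theta: torch.Tensor, wd: float, center=1.0) -> None:
    """Hypothetical helper: in-place decoupled weight decay pulling theta towards center.

    center may be a float or a tensor broadcastable to theta.
    """
    if wd <= 0:
        return
    with torch.no_grad():
        # theta <- theta - wd * (theta - center), outside the autograd graph
        theta.sub_(wd * (theta - center))


# Usage: first two entries decay towards 1, the last one towards 0.
theta = torch.tensor([3.0, -1.0, 0.5], requires_grad=True)
decay_towards(theta, 0.5, torch.tensor([1.0, 1.0, 0.0]))
# theta now holds [2.0, 0.0, 0.25]
```

Like the theta.data update in the PR, the in-place update under torch.no_grad() keeps the decay out of the autograd graph. The default center=1.0 matches the current behavior.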
