About the loss setting in beta-VAE #1
Hello! I have a question about the loss setting in beta-VAE.

In the MODEL ARCHITECTURE part of the paper, the authors say that they "replace the pixel log-likelihood term in Eq. 2 with an L2 loss in the high-level feature space of DAE", which is what your model implements: [loss = L2(z_d - z_out_d) + beta * KL]

However, in the MODEL DETAILS part, they also say that "The reconstruction error was taken in the last layer of the DAE (in the pixel space of DAE reconstructions) using L2 loss and before the non-linearity." This suggests the loss should instead be [loss = L2(x_d - x_out_d) + beta * KL]

I'm wondering which is right, and why the two descriptions are inconsistent, because with a pre-trained DAE, neither formulation works well for me when training the beta-VAE (the reconstruction loss stays much larger than the latent/KL loss).

Looking forward to any reply. Thanks a lot!
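To make the two quoted formulations concrete, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration: the `vae`/`dae` interfaces, the `encode`/`decode_logits` method names, and the default `beta` are hypothetical, not this repo's actual code.

```python
import torch
import torch.nn.functional as F

def beta_vae_dae_loss(x, vae, dae, beta=0.5, feature_space=True):
    """Hypothetical sketch of the two candidate losses. `vae` is assumed to
    return (reconstruction, mu, logvar); `dae` is a frozen, pre-trained
    denoising autoencoder with assumed encode/decode_logits methods."""
    recon, mu, logvar = vae(x)

    if feature_space:
        # MODEL ARCHITECTURE reading: L2 in the DAE's high-level feature
        # space, i.e. between the bottleneck codes z_d and z_out_d.
        with torch.no_grad():
            z_d = dae.encode(x)          # target branch: no gradient needed
        z_out_d = dae.encode(recon)      # gradient flows back to the VAE
        recon_loss = F.mse_loss(z_out_d, z_d, reduction='sum')
    else:
        # MODEL DETAILS reading: L2 in the DAE's pixel space, taken on the
        # last layer *before* the output non-linearity (pre-sigmoid logits).
        with torch.no_grad():
            x_d = dae.decode_logits(dae.encode(x))
        x_out_d = dae.decode_logits(dae.encode(recon))
        recon_loss = F.mse_loss(x_out_d, x_d, reduction='sum')

    # Standard KL(q(z|x) || N(0, I)) term, weighted by beta.
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl
```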
Yes, I'm not following the paper: I'm calculating the loss with the DAE bottleneck z. I tried that loss too (calculating it with the output before the sigmoid activation), but I didn't see much difference in the results, so I kept this loss calculation. Also, I'm using beta=0.5, while the original paper uses beta=53.0 (Line 15 in cc86131); that gap suggests they are not calculating the loss with the bottleneck z like I do, I think.

One more thing I should mention: when visualizing the output of the VAE, I pass it through the DAE (Lines 359 to 364 in cc86131). This is because the output of the VAE itself is too noisy. We calculate the reconstruction loss through the DAE, but the DAE can produce the same output even when its input is noisy, so the reconstruction loss can reach zero even while the VAE output still contains noise. The VAE output is therefore inherently noisy, I think, so I pass it through the DAE to clean it up for visualization.
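As a minimal sketch of that visualization trick, under the same assumed `vae`/`dae` callables as above (the real code is at Lines 359 to 364 in cc86131):

```python
import torch

@torch.no_grad()
def visualize_reconstruction(x, vae, dae, through_dae=True):
    """Hypothetical sketch of the 'through_dae' visualization step; the
    vae/dae interfaces here are assumptions, not the repo's actual API."""
    recon, _, _ = vae(x)   # raw VAE output: minimizes the loss, but noisy
    if through_dae:
        # The loss only constrains the DAE features of the reconstruction,
        # and a DAE maps noisy and clean versions of an image to nearly the
        # same output, so the noise left in `recon` was never penalized.
        # Running the reconstruction through the DAE strips it for display.
        recon = dae(recon)
    return recon
```

The design point is that the pre-trained DAE is exactly the component trained to discard the kind of noise the reconstruction loss never penalizes, which is why it is a natural cleanup stage before display.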
You are right. I tried the original loss and there is no obvious difference; it may be due to the convergence of the DAE. And the second point you mentioned is really helpful! I implemented this model in PyTorch, so I had only compared your overall architecture against the descriptions in the paper and didn't notice the 'through_dae' term. As you said, the output is noisy, and this operation really makes sense. Thanks again~