
About the loss setting in beta-VAE #1

Open
simplespy opened this issue Jan 8, 2018 · 2 comments
@simplespy

Hello! I have a question about the loss setting in beta-VAE.

In the MODEL ARCHITECTURE part of the paper, the authors said that they "replace the pixel log-likelihood term in Eq. 2 with an L2 loss in the high-level feature space of DAE", as is implemented in your model. [loss = L2(z_d - z_out_d) + beta * KL]

However, in the MODEL DETAILS part, they also say that "The reconstruction error was taken in the last layer of the DAE (in the pixel space of DAE reconstructions) using L2 loss and before the non-linearity." It seems the loss should instead be [loss = L2(x_d - x_out_d) + beta * KL].

I'm wondering which one is correct, and why the two descriptions are inconsistent. With a pre-trained DAE, while training the beta-VAE I find the two loss terms are poorly balanced (the reconstruction loss is much larger than the latent/KL loss).
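For concreteness, here is a minimal NumPy sketch of the two candidate losses. The names `z_d`, `z_out_d`, `x_d`, `x_out_d` follow the notation above; `kl_gaussian` assumes a diagonal-Gaussian posterior (standard for beta-VAE, but the exact shapes here are my assumption):

```python
import numpy as np

def kl_gaussian(mu, log_var):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def beta_vae_loss_feature(z_d, z_out_d, mu, log_var, beta):
    # Variant 1: L2 in the DAE's high-level feature (bottleneck) space
    recon = np.sum((z_d - z_out_d) ** 2)
    return recon + beta * kl_gaussian(mu, log_var)

def beta_vae_loss_pixel(x_d, x_out_d, mu, log_var, beta):
    # Variant 2: L2 in the pixel space of the DAE reconstructions
    recon = np.sum((x_d - x_out_d) ** 2)
    return recon + beta * kl_gaussian(mu, log_var)
```

The two variants differ only in which space the residual is measured in, which is exactly why their scales (and hence the balance against the KL term) can differ so much.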

Looking forward to any reply. Thanks a lot!

@miyosuda
Owner

miyosuda commented Jan 8, 2018

> However, in the MODEL DETAILS part, they also said that "The reconstruction error was taken in the last layer of the DAE (in the pixel space of DAE reconstructions) using L2 loss and before the non-linearity." It seems that the loss should be [loss = L2(x_d - x_out_d) + beta * KL]

Yes, I'm not following the paper there; I calculate the loss with the DAE bottleneck z.

I also tried

[loss = L2(x_d - x_out_d) + beta * KL]

(computing the L2 on the output before the sigmoid activation), but I didn't see much difference in the results. So I'm using

[loss = L2(z_d - z_out_d) + beta * KL]

for the loss calculation.

Also, I'm using beta=0.5, while the original paper uses beta=53.0:

tf.app.flags.DEFINE_float("vae_beta", 0.5, "Beta-VAE beta hyper parameter")

This large gap suggests to me that the authors are not calculating the loss with the bottleneck z as I do.
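One plausible reading of the beta gap is a difference in loss scale: a summed pixel-space L2 over thousands of pixels is far larger than a summed L2 over a low-dimensional feature vector, so a proportionally larger beta is needed to give the KL term the same relative weight. A rough sketch (the dimensions are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x_residual = rng.normal(size=(64, 64))  # pixel-space residual, 4096 dims
z_residual = rng.normal(size=(100,))    # feature-space residual, 100 dims

pixel_l2 = np.sum(x_residual ** 2)      # scales with the number of pixels
feature_l2 = np.sum(z_residual ** 2)    # scales with the feature size

# With unit-variance residuals, the summed pixel loss is roughly
# 4096 / 100, i.e. around 40x larger, so the KL term needs a
# correspondingly larger beta to exert the same relative pressure.
ratio = pixel_l2 / feature_l2
```

This is only a scale argument, not evidence about what the authors actually did; other factors (averaging instead of summing, input normalization) would change the ratio.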

One more thing worth mentioning: when visualizing the output of the VAE, I pass it through the DAE.

scan/model.py, lines 359 to 364 at cc86131:

```python
def reconstruct(self, sess, xs, through_dae=True):
    """ Reconstruct given data. """
    if through_dae:
        # Use output from DAE decoder
        return sess.run(self.x_out_d,
                        feed_dict={self.x: xs})
```

This is because the output of the VAE itself is too noisy. Since the reconstruction loss is calculated through the DAE, and a DAE can produce the same output even when its input is noisy, the reconstruction loss can be near zero even when the VAE output contains noise. So the VAE output itself is inherently noisy, I think, and I pass it through the DAE to clean it up for visualization.
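That visualization path can be sketched as follows. The helpers `vae_encode`, `vae_decode`, and `dae` are hypothetical stand-ins for the model components, not functions from this repo:

```python
import numpy as np

def visualize_reconstruction(x, vae_encode, vae_decode, dae):
    """Reconstruct x with the beta-VAE, then denoise through the DAE.

    vae_encode / vae_decode: hypothetical beta-VAE encoder and decoder.
    dae: hypothetical denoising autoencoder mapping a (possibly noisy)
         image to its cleaned reconstruction.
    """
    z = vae_encode(x)
    x_vae = vae_decode(z)  # noisy: the training loss never compared raw pixels
    return dae(x_vae)      # the DAE removes the noise for display
```

In the repo's TensorFlow model this corresponds to calling `reconstruct(..., through_dae=True)` as shown in the snippet above.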

@simplespy
Author

You are right. I tried the original loss and there is no obvious difference; it may be due to the convergence of the DAE.

The second point you mentioned is really helpful! I implemented this model in PyTorch, so I only compared your overall architecture against the descriptions in the paper and didn't notice the 'through_dae' argument. As you said, the output is noisy, and this operation really makes sense.

Thanks again~
