
About the loss setting in beta-VAE #1

Open
simplespy opened this issue Jan 8, 2018 · 2 comments
@simplespy

Hello! I have a question about the loss setting in beta-VAE.

In the MODEL ARCHITECTURE part of the paper, the authors said that they "replace the pixel log-likelihood term in Eq. 2 with an L2 loss in the high-level feature space of DAE", as is implemented in your model. [loss = L2(z_d - z_out_d) + beta * KL]

However, in the MODEL DETAILS part, they also say that "The reconstruction error was taken in the last layer of the DAE (in the pixel space of DAE reconstructions) using L2 loss and before the non-linearity." It seems the loss should instead be [loss = L2(x_d - x_out_d) + beta * KL].

I'm wondering which one is correct, and why the two descriptions are inconsistent. With a pre-trained DAE, while training the beta-VAE I find the two loss terms are poorly balanced (the reconstruction loss is much larger than the latent/KL loss).
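For concreteness, here is a minimal NumPy sketch of the two candidate losses. The names `z_d`, `z_out_d`, `x_d`, `x_out_d` follow the notation above; `kl_gaussian` assumes a diagonal-Gaussian posterior (standard for beta-VAE, but the exact shapes here are my assumption):

```python
import numpy as np

def kl_gaussian(mu, log_var):
    # KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))

def beta_vae_loss_feature(z_d, z_out_d, mu, log_var, beta):
    # Variant 1: L2 in the DAE's high-level feature (bottleneck) space
    recon = np.sum((z_d - z_out_d) ** 2)
    return recon + beta * kl_gaussian(mu, log_var)

def beta_vae_loss_pixel(x_d, x_out_d, mu, log_var, beta):
    # Variant 2: L2 in the pixel space of the DAE reconstructions
    recon = np.sum((x_d - x_out_d) ** 2)
    return recon + beta * kl_gaussian(mu, log_var)
```

The two variants differ only in which space the residual is measured in, which is exactly why their scales (and hence the balance against the KL term) can differ so much.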

Looking forward to any reply. Thanks a lot!

@miyosuda
Owner

miyosuda commented Jan 8, 2018

> However, in the MODEL DETAILS part, they also said that "The reconstruction error was taken in the last layer of the DAE (in the pixel space of DAE reconstructions) using L2 loss and before the non-linearity." It seems that the loss should be [loss = L2(x_d - x_out_d) + beta * KL]

Yes, I'm not following the paper there; I calculate the loss with the DAE bottleneck z.

I also tried

[loss = L2(x_d - x_out_d) + beta * KL]

(computing the L2 on the output before the sigmoid activation), but I didn't see much difference in the results. So I'm using

[loss = L2(z_d - z_out_d) + beta * KL]

for the loss calculation.

Also, I'm using beta=0.5, while the original paper uses beta=53.0:

tf.app.flags.DEFINE_float("vae_beta", 0.5, "Beta-VAE beta hyper parameter")

This large gap suggests to me that the authors are not calculating the loss with the bottleneck z as I do.
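One plausible reading of the beta gap is a difference in loss scale: a summed pixel-space L2 over thousands of pixels is far larger than a summed L2 over a low-dimensional feature vector, so a proportionally larger beta is needed to give the KL term the same relative weight. A rough sketch (the dimensions are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x_residual = rng.normal(size=(64, 64))  # pixel-space residual, 4096 dims
z_residual = rng.normal(size=(100,))    # feature-space residual, 100 dims

pixel_l2 = np.sum(x_residual ** 2)      # scales with the number of pixels
feature_l2 = np.sum(z_residual ** 2)    # scales with the feature size

# With unit-variance residuals, the summed pixel loss is roughly
# 4096 / 100, i.e. around 40x larger, so the KL term needs a
# correspondingly larger beta to exert the same relative pressure.
ratio = pixel_l2 / feature_l2
```

This is only a scale argument, not evidence about what the authors actually did; other factors (averaging instead of summing, input normalization) would change the ratio.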

One more thing worth mentioning: when visualizing the output of the VAE, I pass it through the DAE.

scan/model.py, lines 359 to 364 at cc86131:

```python
def reconstruct(self, sess, xs, through_dae=True):
    """ Reconstruct given data. """
    if through_dae:
        # Use output from DAE decoder
        return sess.run(self.x_out_d,
                        feed_dict={self.x: xs})
```

This is because the output of the VAE itself is too noisy. Since the reconstruction loss is calculated through the DAE, and a DAE can produce the same output even when its input is noisy, the reconstruction loss can be near zero even when the VAE output contains noise. So the VAE output itself is inherently noisy, I think, and I pass it through the DAE to clean it up for visualization.
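That visualization path can be sketched as follows. The helpers `vae_encode`, `vae_decode`, and `dae` are hypothetical stand-ins for the model components, not functions from this repo:

```python
import numpy as np

def visualize_reconstruction(x, vae_encode, vae_decode, dae):
    """Reconstruct x with the beta-VAE, then denoise through the DAE.

    vae_encode / vae_decode: hypothetical beta-VAE encoder and decoder.
    dae: hypothetical denoising autoencoder mapping a (possibly noisy)
         image to its cleaned reconstruction.
    """
    z = vae_encode(x)
    x_vae = vae_decode(z)  # noisy: the training loss never compared raw pixels
    return dae(x_vae)      # the DAE removes the noise for display
```

In the repo's TensorFlow model this corresponds to calling `reconstruct(..., through_dae=True)` as shown in the snippet above.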

@simplespy
Author

You are right. I tried the original loss and there is no obvious difference; it may be due to the convergence of the DAE.

The second point you mentioned is really helpful! I implemented this model in PyTorch, so I only compared your overall architecture against the descriptions in the paper and didn't notice the 'through_dae' argument. As you said, the output is noisy, and this operation really makes sense.

Thanks again~
