Notes on Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions Synthesis

Introduction
Network structure
Open questions

Introduction

The paper presents a method for disentangling the latent space into label relevant and label irrelevant dimensions, $z_s$ and $z_u$, for a single input. Two separate encoders map the input to $z_s$ and $z_u$ respectively, and the concatenated code is passed to the decoder to reconstruct the input.
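The data flow described above can be sketched in a few lines of NumPy. This is only an illustration of the two-encoder, one-decoder layout, not the paper's architecture: the layer sizes, the single linear layers, the sigmoid output, the Gaussian sampling in the $z_s$ encoder, and all names (`enc_s`, `enc_u`, `dec`, `reconstruct`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
X_DIM, ZS_DIM, ZU_DIM = 784, 16, 64


def linear(in_dim, out_dim):
    """Return small random weights and a zero bias for one linear layer."""
    return rng.normal(0.0, 0.01, size=(in_dim, out_dim)), np.zeros(out_dim)


def encode(x, mu_layer, logvar_layer):
    """Map x to a reparameterised sample from a diagonal Gaussian."""
    mu = x @ mu_layer[0] + mu_layer[1]
    logvar = x @ logvar_layer[0] + logvar_layer[1]
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps, mu, logvar


# Two separate encoders (assumed linear here) and one shared decoder.
enc_s = (linear(X_DIM, ZS_DIM), linear(X_DIM, ZS_DIM))  # label relevant
enc_u = (linear(X_DIM, ZU_DIM), linear(X_DIM, ZU_DIM))  # label irrelevant
dec = linear(ZS_DIM + ZU_DIM, X_DIM)


def reconstruct(x):
    """Encode x into z_s and z_u, concatenate them, and decode."""
    z_s, _, _ = encode(x, *enc_s)
    z_u, _, _ = encode(x, *enc_u)
    z = np.concatenate([z_s, z_u], axis=-1)  # code given to the decoder
    logits = z @ dec[0] + dec[1]
    x_hat = 1.0 / (1.0 + np.exp(-logits))  # sigmoid output (an assumption)
    return x_hat, z_s, z_u
```

Calling `reconstruct` on a batch of shape `(batch, 784)` returns a reconstruction of the same shape together with the two codes, so either code can be swapped between inputs before decoding.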

The label irrelevant code $z_u$ represents the characteristics common to all inputs, so it is constrained by a standard Gaussian prior, and its encoder is trained with amortized variational inference, as in a VAE. In contrast, $z_s$ is assumed to follow a Gaussian mixture distribution in which each component corresponds to a particular class. The parameters of the Gaussian components in the $z_s$ encoder are optimized with label supervision in a global stochastic way.
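To make the two priors concrete, here is a minimal sketch of the corresponding KL regularisers for diagonal Gaussian posteriors. It assumes, purely for illustration, that each class component of the mixture has a mean stored in a hypothetical array `class_means` and unit covariance, and that the component is selected by the ground-truth label. The notes above do not specify these details, so this should not be read as the paper's exact loss.

```python
import numpy as np


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians.

    The divergence is summed over the last (latent) axis.
    """
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(
        logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0,
        axis=-1,
    )


def kl_label_irrelevant(mu_u, logvar_u):
    """Pull the z_u posterior towards the standard Gaussian N(0, I)."""
    zeros = np.zeros_like(mu_u)
    return kl_diag_gaussians(mu_u, logvar_u, zeros, zeros)


def kl_label_relevant(mu_s, logvar_s, labels, class_means):
    """Pull the z_s posterior towards the component of its own class.

    Assumption: each component is N(class_means[y], I).
    """
    mu_p = class_means[labels]  # one component mean per example
    return kl_diag_gaussians(mu_s, logvar_s, mu_p, np.zeros_like(mu_s))
```

With this choice, the $z_u$ term is the usual VAE regulariser, while the $z_s$ term vanishes only when the posterior for an input sits exactly on the component of its class.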

Optimization of a VAE is quite stable, but its results are blurry, mainly because the posterior defined by $q_\phi(z|x)$ is not complex enough to capture the true posterior. VAEs are also known for "posterior collapse".
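The role of the approximate posterior can be seen from the standard decomposition of the log-likelihood. The symbols $p_\theta(x|z)$ for the decoder and $p(z)$ for the prior are notation assumed here, not taken from the notes above:

$$
\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \mathrm{KL}\left(q_\phi(z|x) \,\|\, p(z)\right)}_{\text{ELBO}} + \mathrm{KL}\left(q_\phi(z|x) \,\|\, p_\theta(z|x)\right)
$$

The last term is the gap between the training objective and the true log-likelihood. It is zero only when $q_\phi(z|x)$ matches the true posterior $p_\theta(z|x)$, which a simple family of posteriors may be unable to do.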

Network structure

The model structure is defined as the following graph.

Open questions

This method could be applied to class-conditional GANs, especially for multi-class problems.
