Instead of disentangling representations directly in the latent space, which is difficult for flow-based models, this approach learns an effective encoder that maps the distribution of conditions into the latent space and builds a tight connection between the real and generated distributions in an adversarial manner.
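A minimal NumPy sketch of this idea, with hypothetical shapes and linear maps standing in for the real networks: the encoder maps a condition vector into the latent space, and a discriminator scores latent codes so the generated latent distribution can be matched to the real one adversarially.

```python
import numpy as np

rng = np.random.default_rng(0)
cond_dim, latent_dim = 10, 32  # hypothetical sizes

# Linear stand-ins for the condition encoder and the latent discriminator.
W_enc = rng.standard_normal((cond_dim, latent_dim)) * 0.1
W_disc = rng.standard_normal((latent_dim, 1)) * 0.1

def encode_condition(c):
    """Map a batch of conditions into the latent space."""
    return c @ W_enc

def discriminate(z):
    """Score latent codes: the adversarial signal that ties the
    generated latent distribution to the real one."""
    return 1.0 / (1.0 + np.exp(-(z @ W_disc)))

c = rng.standard_normal((4, cond_dim))         # a batch of conditions
z_gen = encode_condition(c)                    # generated latents
z_real = rng.standard_normal((4, latent_dim))  # latents from real data

scores = discriminate(np.vstack([z_gen, z_real]))
print(z_gen.shape, scores.shape)
```

In a full implementation the discriminator would be trained to tell `z_real` from `z_gen` while the encoder is trained to fool it; only the forward passes are sketched here.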
CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training. This paper shows great potential for dealing with fine-grained classification problems.
The model structure is shown in the following graph. $c_u$ and a random noise vector are sampled from their respective distributions and concatenated with the supervised conditions. The authors pretrain GoogLeNet on the MNIST and CelebA datasets and then compute the top-1 accuracy of the samples generated by the different approaches. They also compute the FID score of the generated samples using a pretrained GoogLeNet.
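As a sketch of the top-1 accuracy protocol (with random features and a random linear head standing in for the pretrained GoogLeNet, so the numbers themselves are meaningless): classify each generated sample and compare the argmax prediction against its conditioning label.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, feat_dim, n_classes = 100, 64, 10

# Stand-ins for generated-sample features and their conditioning labels.
features = rng.standard_normal((n_samples, feat_dim))
labels = rng.integers(0, n_classes, size=n_samples)

# Hypothetical classifier head (a pretrained GoogLeNet in the paper).
W = rng.standard_normal((feat_dim, n_classes))

logits = features @ W
pred = logits.argmax(axis=1)           # top-1 prediction per sample
top1_acc = (pred == labels).mean()     # fraction of matches
print(f"top-1 accuracy: {top1_acc:.2f}")
```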
We can use the GLOW model to extract latent vectors, learn the latent features of different classes, and compute the diff between them. We can then guide image synthesis using this diff. Unlike makeup transfer, the style position may be stochastic; how can we encourage that?
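A toy sketch of the latent-diff idea, with a simple invertible affine map standing in for GLOW (all names and shapes hypothetical): encode samples from two classes, take the difference of their latent means, add it to a source latent, and decode back.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Invertible elementwise affine map as a stand-in for the GLOW flow.
scale = rng.uniform(0.5, 2.0, size=dim)
shift = rng.standard_normal(dim)

def encode(x):
    return (x - shift) / scale

def decode(z):
    return z * scale + shift

class_a = rng.standard_normal((50, dim)) + 2.0  # samples of class A
class_b = rng.standard_normal((50, dim)) - 2.0  # samples of class B

# Difference of latent means captures the class direction in latent space.
z_diff = encode(class_a).mean(axis=0) - encode(class_b).mean(axis=0)

x = class_b[0]
x_edited = decode(encode(x) + z_diff)      # push a B sample toward class A
print(np.allclose(decode(encode(x)), x))   # the map is exactly invertible
```

With a real GLOW model, `encode`/`decode` would be the forward and inverse passes of the flow; the mean-difference edit is the same.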