Some studies propose to decompose images into three layers, i.e., a face structure layer, a skin detail layer, and a color layer, and to transfer the information from each layer of the reference image to the corresponding layer of the target image. However, the predefined layers and transfer functions are not data-driven and thus tend to generate artifacts in many cases.
One possible approach is to compute the average latent vector of makeup images and the average latent vector of non-makeup images, and then use their difference as the direction of manipulation. However, this approach suffers from two major issues: 1) it can only capture a general makeup style rather than a user-specified one, and 2) it requires many images of the same person or the same makeup to estimate the average latent vectors reliably.
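The average-difference approach can be sketched in a few lines. This is a minimal illustration with toy latent vectors, not the paper's method; the encoder that would produce these latents (e.g., Glow) is assumed and omitted.

```python
import numpy as np

def makeup_direction(makeup_latents, nonmakeup_latents):
    """Mean-difference direction between the two classes in latent space.
    Both inputs are (num_images, latent_dim) arrays from an assumed encoder."""
    return makeup_latents.mean(axis=0) - nonmakeup_latents.mean(axis=0)

def apply_makeup(z, direction, strength=1.0):
    """Move a non-makeup latent along the makeup direction."""
    return z + strength * direction

# Toy latents: makeup latents are offset by +2 in every dimension.
rng = np.random.default_rng(0)
non = rng.normal(0.0, 1.0, size=(100, 8))
mk = rng.normal(2.0, 1.0, size=(100, 8))

d = makeup_direction(mk, non)
z_edit = apply_makeup(non[0], d)
```

Note that `d` is a single global direction: every latent is moved the same way, which is exactly why this scheme cannot express a user-specified makeup.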
To address these issues, BeautyGlow first defines a transformation matrix that decomposes a latent vector into a latent vector of makeup features and a latent vector of facial identity features. Because paired data are lacking, we further formulate a new loss function, comprising a perceptual loss, a makeup loss, an intra-domain loss, an inter-domain loss, and a cycle consistency loss, to guide the decomposition. Compared with GAN-based methods, BeautyGlow does not need to train two large networks, i.e., a generator and a discriminator, which makes training more stable.
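The decomposition-and-recombination idea can be illustrated with a purely linear sketch. Here `W` stands in for the learned transformation matrix; treating `W @ z` as the makeup component and the residual as identity is an assumption for illustration, not the trained decomposition from the paper.

```python
import numpy as np

def decompose(z, W):
    """Split latent z into a makeup part and an identity part (sketch).
    W would be learned via the losses described in the text; here it is
    just a given matrix, with W @ z taken as the makeup component."""
    z_makeup = W @ z
    z_identity = z - z_makeup
    return z_makeup, z_identity

def transfer(z_source, z_reference, W):
    """Combine the source's identity with the reference's makeup."""
    makeup_ref, _ = decompose(z_reference, W)
    _, identity_src = decompose(z_source, W)
    return identity_src + makeup_ref

# Toy 4-dimensional latents for a source face and a reference face.
z_s = np.array([1.0, 2.0, 3.0, 4.0])
z_r = np.array([4.0, 3.0, 2.0, 1.0])
z_out = transfer(z_s, z_r, 0.5 * np.eye(4))
```

The two extremes are instructive: with `W = 0` no makeup is extracted and `transfer` returns the source latent unchanged, while with `W = I` everything counts as makeup and the reference latent is returned wholesale.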
At a high level, the model learns the latent features of different classes and applies their difference both to manipulate latent vectors and to guide image synthesis. Unlike makeup, however, the spatial position of a general style may be stochastic, and how to encourage such stochasticity remains an open question.
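One simple way to inject stochasticity into a shared class-difference direction is to perturb it with per-sample noise, so each edited latent realizes the style slightly differently. This is a hypothetical sketch of that idea, not a method proposed in the text.

```python
import numpy as np

def stochastic_edit(z, direction, strength=1.0, noise_scale=0.3, rng=None):
    """Apply a class-difference direction plus per-sample Gaussian noise,
    so repeated calls realize the style differently (hypothetical sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, noise_scale, size=z.shape)
    return z + strength * direction + eps

# Two edits of the same latent along the same direction differ only by noise.
z = np.zeros(8)
d = np.ones(8)
out1 = stochastic_edit(z, d, rng=np.random.default_rng(1))
out2 = stochastic_edit(z, d, rng=np.random.default_rng(2))
```

Whether such unstructured noise actually moves the style's spatial position, rather than merely jittering the result, is precisely the open question raised above.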