DISENTANGLING THE SPATIAL STRUCTURE AND STYLE IN CONDITIONAL VAE
Ziye Zhang, Li Sun, Zhilin Zheng, Qingli Li
SPS
Length: 10:26
This paper proposes a structure for the conditional variational autoencoder (cVAE) that disentangles the latent vector into a spatial structure code and a style code, complementary to each other, with one ($z_s$) being label relevant and the other ($z_u$) label irrelevant. Unlike a traditional cVAE, our network maps the condition label into its relevant code $z_s$ through a separate module. Depending on whether the label directly relates to the image spatial structure, the $z_s$ output by the condition mapping module is used either as the style code, with spatial dimensions of $1\times1$, or as the spatial structure code, with a single channel. Based on the input image and its corresponding $z_s$, the encoder produces a posterior distribution close to a common prior regardless of the label, so $z_u$ sampled from it becomes label irrelevant. The decoder employs $z_s$ and $z_u$ through two typical adaptive normalization modules to reconstruct the input image. Results on two datasets with different types of labels show the effectiveness of our method.
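To make the two code shapes concrete, the following is a minimal NumPy sketch of how a style code (spatial size $1\times1$) and a single-channel spatial structure code could each modulate a normalized feature map. The abstract does not name the two adaptive normalization modules, so this sketch assumes an AdaIN-like per-channel modulation for the style code and a SPADE-like per-pixel modulation for the spatial code. The function names, the weight matrices `w_gamma` and `w_beta`, and all tensor sizes are hypothetical and chosen only for illustration. Whichever of $z_s$ and $z_u$ plays the style role would go through the first function, and the other through the second.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each channel of x (shape C x H x W) over its spatial positions."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def style_modulation(x, z_style, w_gamma, w_beta):
    """AdaIN-like step (assumed): a 1x1 style code gives one scale and shift per channel.

    z_style: vector of length D; w_gamma, w_beta: hypothetical (C, D) linear maps.
    """
    gamma = w_gamma @ z_style  # shape (C,)
    beta = w_beta @ z_style    # shape (C,)
    return (1.0 + gamma)[:, None, None] * instance_norm(x) + beta[:, None, None]

def spatial_modulation(x, z_spatial, w_gamma, w_beta):
    """SPADE-like step (assumed): a single-channel map gives a scale and shift per pixel.

    z_spatial: array of shape (H, W); w_gamma, w_beta: hypothetical per-channel
    weights of shape (C,), standing in for a 1x1 convolution.
    """
    gamma = w_gamma[:, None, None] * z_spatial[None]  # shape (C, H, W)
    beta = w_beta[:, None, None] * z_spatial[None]    # shape (C, H, W)
    return (1.0 + gamma) * instance_norm(x) + beta

# Illustrative sizes only: 4 feature channels, 8x8 feature map, 16-dim style code.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))
z_style = rng.normal(size=16)
z_spatial = rng.normal(size=(8, 8))

styled = style_modulation(features, z_style,
                          rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
structured = spatial_modulation(styled, z_spatial,
                                rng.normal(size=4), rng.normal(size=4))
```

In this sketch the style code can only change per-channel statistics, while the spatial code can change the feature map differently at each pixel, which is one way to read the abstract's distinction between label types that affect spatial structure and those that do not.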