Proceedings: Proceedings of the AAAI Conference on Artificial Intelligence, 36
Issue: No. 3: AAAI-22 Technical Tracks 3
Track: AAAI Technical Track on Computer Vision III
Abstract:
Single image generation (SIG), the task of generating diverse samples that share the visual content of a given natural image, was first introduced by SinGAN, which builds a pyramid of GANs to progressively learn the internal patch distribution of the single image. SinGAN shows excellent performance on a wide range of image manipulation tasks. However, it has two limitations. First, lacking semantic information, SinGAN cannot handle object images as well as it handles scene and texture images. Second, its independent progressive training scheme is time-consuming and prone to accumulating artifacts. To tackle these problems, in this paper we dig into the single image generation problem and improve SinGAN by fully utilizing internal and external priors. The main contributions of this paper include: 1) We interpret single image generation from the perspective of the general generative task, that is, learning a diverse distribution from the Dirac distribution concentrated on a single image. To solve this non-trivial problem, we construct a regularized latent variable model to formulate SIG. To the best of our knowledge, this is the first clear formulation and optimization goal for SIG, and all existing SIG methods can be regarded as special cases of this model. 2) We design a novel Prior-based end-to-end training GAN (PetsGAN), which is infused with internal and external priors to overcome the problems of SinGAN. On the one hand, we employ a pre-trained GAN model to inject an external prior for image generation, which alleviates the lack of semantic information and yields natural, reasonable, and diverse samples, even for object images. On the other hand, we fully exploit the internal prior through a differentiable patch-matching module and an effective reconstruction network to generate consistent and realistic texture. 3) We conduct extensive qualitative and quantitative experiments on three datasets. The experimental results show that our method surpasses other methods in generated image quality, diversity, and training speed. Moreover, we apply our method to other image manipulation tasks (e.g., style transfer, harmonization), and the results further demonstrate the effectiveness and efficiency of our method.
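To make the regularized latent variable formulation concrete, the following is a minimal sketch in our own notation (the divergence D, regularizer R, and weight λ are illustrative placeholders, not the paper's exact equations). Given the single training image x_0, the data distribution is the Dirac distribution δ_{x_0}, and SIG fits a latent variable model by solving

\[
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz,
\qquad
\min_\theta \; D\big(p_\theta,\ \delta_{x_0}\big) + \lambda\, R(p_\theta),
\]

where D pulls generated samples toward the content of x_0 and R rewards diversity. Without R, the objective is trivially minimized by memorization, p_\theta = \delta_{x_0}, which is why a regularization term is essential to obtaining diverse samples from a single image.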
DOI: 10.1609/aaai.v36i3.20251