Abstract
Motivated by the recent progress in generative models, we introduce a model that generates images from natural language descriptions. The proposed model iteratively draws patches on a canvas, while attending to the relevant words in the description. After training on Microsoft COCO, we compare our model with several baseline generative models on image generation and retrieval tasks. We demonstrate that our model produces higher quality samples than other approaches and generates images with novel scene compositions corresponding to previously unseen captions in the dataset.
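The attend-and-draw loop described above can be sketched in a toy form. This is a minimal illustration, not the paper's actual architecture: the dimensions, the blob-shaped "patch", and the simple additive state update are all assumptions; a real model would use a trained recurrent network and a learned patch decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
num_words, word_dim, canvas_h, canvas_w, steps = 5, 8, 16, 16, 4

word_embeddings = rng.standard_normal((num_words, word_dim))
canvas = np.zeros((canvas_h, canvas_w))

def attend(query, keys):
    """Soft attention: softmax over dot-product scores."""
    scores = keys @ query
    weights = np.exp(scores - scores.max())
    return weights / weights.sum()

state = rng.standard_normal(word_dim)  # stand-in for a recurrent state
for t in range(steps):
    # Attend to the caption words most relevant at this drawing step.
    alpha = attend(state, word_embeddings)   # (num_words,) weights, sum to 1
    context = alpha @ word_embeddings        # weighted summary of the caption
    # "Draw" a patch whose content depends on the attended context;
    # a real model would decode a located patch, here we add a simple blob.
    patch = np.outer(np.sin(np.arange(canvas_h) * context[0]),
                     np.cos(np.arange(canvas_w) * context[1]))
    canvas += 0.25 * patch                   # iterative canvas refinement
    state = state + 0.1 * context            # update the drawing state

print(canvas.shape)
```

The key property mirrored here is that the canvas is built up over several steps, with each step conditioned on a different soft mixture of the caption's words rather than on the whole caption at once.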
URL
https://arxiv.org/abs/1511.02793