Abstract
In this paper, we address the task of learning novel visual concepts, and their interactions with other concepts, from a few images with sentence descriptions. Using linguistic context and visual features, our method efficiently hypothesizes the semantic meaning of new words and adds them to its word dictionary so that they can be used to describe images that contain these novel concepts. Our method has an image captioning module based on m-RNN with several improvements. In particular, we propose a transposed weight sharing scheme, which not only improves performance on image captioning but also makes the model more suitable for the novel concept learning task. We also propose methods to prevent overfitting to the new concepts. In addition, three novel concept datasets are constructed for this new task. In the experiments, we show that our method effectively learns novel visual concepts from a few examples without disturbing the previously learned concepts. The project page is this http URL.
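To make the transposed weight sharing idea concrete, below is a minimal PyTorch-style sketch of the general weight-tying pattern it relies on: the softmax (decoding) weights are the transpose of the word embedding matrix, so the two layers share one parameter set. The class name, layer sizes, and intermediate projection here are illustrative assumptions, not the paper's exact architecture; the authors' formulation inside m-RNN may differ in its layers and nonlinearities.

```python
import torch
import torch.nn as nn


class TiedWordDecoder(nn.Module):
    """Word decoder whose softmax weights are the transpose of the word
    embedding matrix (a sketch of transposed weight sharing)."""

    def __init__(self, vocab_size: int, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Hypothetical intermediate projection from the captioning RNN's
        # multimodal state into the embedding space; sizes are assumptions.
        self.proj = nn.Linear(hidden_dim, embed_dim)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_dim) state from the captioning model.
        z = torch.tanh(self.proj(hidden))        # (batch, embed_dim)
        # Reuse the embedding matrix transposed as the output weights
        # instead of learning a separate vocab_size x hidden_dim decoder.
        logits = z @ self.embed.weight.t()       # (batch, vocab_size)
        return logits


if __name__ == "__main__":
    decoder = TiedWordDecoder(vocab_size=10000, embed_dim=512, hidden_dim=1024)
    state = torch.randn(4, 1024)
    print(decoder(state).shape)  # torch.Size([4, 10000])
```

Under this kind of tying, adding a novel word amounts to adding a row to the shared embedding matrix; if only that row is updated while the rest of the network stays fixed, the new concept can be learned from a few examples with less risk of overfitting or of disturbing previously learned concepts, which is consistent with the behavior the abstract describes.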
URL
https://arxiv.org/abs/1504.06692