Abstract
This report presents our submission to the MS COCO Captioning Challenge 2015. The method uses Convolutional Neural Network activations as an embedding to find semantically similar images. From these images, the most typical caption is selected based on unigram frequencies. Although the method received low scores on automated evaluation metrics and on human-assessed average correctness, it is competitive in the proportion of captions that pass the Turing test and that are judged better than or equal to human captions.
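The abstract describes a two-step pipeline: retrieve the nearest neighbors of a query image in CNN activation space, then pick the most "typical" caption among the neighbors' captions using unigram frequencies. The sketch below illustrates one plausible reading of that pipeline; the similarity measure (cosine), the neighborhood size `k`, and the typicality score (mean unigram log-probability over the candidate pool) are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np
from collections import Counter

def select_caption(query_feat, db_feats, db_captions, k=5):
    """Sketch of nearest-neighbor caption selection (assumed details).

    query_feat:  CNN activation vector of the query image, shape (D,).
    db_feats:    CNN activations of the caption database images, shape (N, D).
    db_captions: one caption string per database image, length N.
    """
    # Cosine similarity in the CNN activation embedding space.
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q

    # Captions of the k most semantically similar images.
    neighbors = np.argsort(-sims)[:k]
    candidates = [db_captions[i] for i in neighbors]

    # Unigram frequencies pooled over the candidate captions.
    counts = Counter(w for c in candidates for w in c.lower().split())
    total = sum(counts.values())

    # "Most typical" caption: highest mean unigram log-probability.
    def typicality(caption):
        words = caption.lower().split()
        return sum(np.log(counts[w] / total) for w in words) / len(words)

    return max(candidates, key=typicality)
```

Averaging the log-probability per word (rather than summing) keeps the score from simply favoring the shortest candidate caption.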
URL
https://arxiv.org/abs/1506.03995