Abstract
Temporal common sense has applications in AI tasks such as QA, multi-document summarization, and human-AI communication. We propose the task of sequencing: given a jumbled set of aligned image-caption pairs that belong to a story, the task is to sort them so that the output sequence forms a coherent story. We present multiple approaches, via unary (position) and pairwise (order) predictions, and their ensemble-based combinations, achieving strong results on this task. We use both text-based and image-based features, which provide complementary improvements. Using qualitative examples, we demonstrate that our models have learnt interesting aspects of temporal common sense.
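To make the unary and pairwise formulation concrete, the following is a minimal sketch of how position scores and order scores might be combined to pick an ordering. It is not the paper's method: the function name `best_ordering`, the score tables, the linear mixing `weight`, and the brute-force search over permutations are all assumptions chosen for illustration, and the scores would in practice come from trained text and image models.

```python
from itertools import permutations


def best_ordering(unary, pairwise, weight=0.5):
    """Return the ordering of item indices with the highest combined score.

    unary[i][p]    : hypothetical score for placing item i at position p
    pairwise[i][j] : hypothetical score for item i appearing before item j
    weight         : assumed mixing weight between the two score types
    """
    n = len(unary)
    best_order, best_score = None, float("-inf")
    # Brute force is only feasible for short stories (n! candidate orderings).
    for order in permutations(range(n)):
        # Unary term: how well each item fits the position it was assigned.
        u = sum(unary[item][pos] for pos, item in enumerate(order))
        # Pairwise term: agreement with every "earlier before later" pair.
        p = sum(
            pairwise[order[a]][order[b]]
            for a in range(n)
            for b in range(a + 1, n)
        )
        score = weight * u + (1.0 - weight) * p
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order)
```

Setting `weight` to 1.0 or 0.0 reduces the sketch to a purely unary or purely pairwise predictor, which is one simple way to compare the two signals before combining them.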
URL
https://arxiv.org/abs/1606.07493