Abstract
Automatically describing videos has always been fascinating. In this work, we attempt to describe videos from a specific domain: broadcast videos of lawn tennis matches. Given a video shot from a tennis match, we aim to generate a textual commentary similar to what a human expert would write on a sports website. Unlike many recent works that focus on generating short captions, we are interested in generating semantically richer descriptions. This demands a detailed low-level analysis of the video content, especially the actions and interactions among subjects. We address this by limiting our domain to the game of lawn tennis. Rich descriptions are generated by leveraging a large corpus of human-created descriptions harvested from the Internet. We evaluate our method on a newly created tennis video dataset. Extensive analysis demonstrates that our approach addresses both the semantic correctness and the readability aspects of the task.
URL
https://arxiv.org/abs/1511.08522