Abstract
This paper presents a novel approach to sentiment analysis of news videos, based on the fusion of audio, textual, and visual cues extracted from their contents. The proposed approach aims to contribute to the semiodiscursive study of how the ethos (identity) of this media universe is constructed, a universe that has become a central part of the daily lives of millions of people. To achieve this goal, we apply state-of-the-art computational methods for (1) automatic emotion recognition from facial expressions, (2) extraction of modulations in the participants' speech, and (3) sentiment analysis of the closed captions associated with the videos of interest. More specifically, we compute features such as the visual intensities of recognized emotions, the field sizes (shot framing) of participants, voicing probability, sound loudness, speech fundamental frequency, and the sentiment scores (polarities) of the sentences in the closed captions. Experimental results on a dataset of 520 annotated news videos from four popular TV newscasts (three Brazilian and one American) show that our approach achieves an accuracy of up to 84% in the sentiment (tension level) classification task, demonstrating its strong potential for use by media analysts in several applications, especially in the journalistic domain.
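The abstract does not specify the fusion or classification strategy in detail. As an illustration only, the sketch below shows one plausible late-fusion setup in Python, where the per-modality features named above are concatenated into a single vector per video and fed to an off-the-shelf classifier (here a scikit-learn RandomForestClassifier, an assumption, not the authors' model); all feature values and labels are placeholders.

```python
# Minimal sketch (not the authors' code) of the late-fusion idea described
# in the abstract: concatenate audio, textual, and visual features per video
# and train a classifier to predict a tension-level label.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def video_features(visual_emotions, voicing_prob, loudness, f0, caption_polarity):
    """Fuse the per-modality features of one news video into a single vector."""
    return np.concatenate([
        visual_emotions,               # e.g., mean intensities of recognized emotions
        [voicing_prob, loudness, f0],  # speech modulation statistics
        [caption_polarity],            # sentiment polarity of closed-caption sentences
    ])

# Example fused vector for one video (all values hypothetical).
fused = video_features(
    visual_emotions=[0.7, 0.1, 0.2],   # e.g., happiness/sadness/anger intensities
    voicing_prob=0.8, loudness=0.5, f0=180.0, caption_polarity=-0.3,
)

# X: one fused feature vector per annotated video; y: tension-level labels.
rng = np.random.default_rng(0)
X = rng.random((520, fused.size))      # placeholder features for 520 videos
y = rng.integers(0, 2, 520)            # placeholder binary tension labels
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```

In such a feature-level (early/late hybrid) fusion scheme, each modality contributes a fixed-length block of the vector, so any standard classifier can be trained on the concatenation without modality-specific models.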
URL
https://arxiv.org/abs/1604.02612