
Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels

2016-04-12
Ishan Misra, C. Lawrence Zitnick, Margaret Mitchell, Ross Girshick

Abstract

When human annotators are given a choice about what to label in an image, they apply their own subjective judgments on what to ignore and what to mention. We refer to these noisy “human-centric” annotations as exhibiting human reporting bias. Examples of such annotations include image tags and keywords found on photo sharing sites, or in datasets containing image captions. In this paper, we use these noisy annotations for learning visually correct image classifiers. Such annotations do not use consistent vocabulary, and miss a significant amount of the information present in an image; however, we demonstrate that the noise in these annotations exhibits structure and can be modeled. We propose an algorithm to decouple the human reporting bias from the correct visually grounded labels. Our results are highly interpretable for reporting “what’s in the image” versus “what’s worth saying.” We demonstrate the algorithm’s efficacy along a variety of metrics and datasets, including MS COCO and Yahoo Flickr 100M. We show significant improvements over traditional algorithms for both image classification and image captioning, doubling the performance of existing methods in some cases.
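To make the decoupling idea concrete, the sketch below shows one way the factorization described in the abstract could look: the noisy human label h is modeled as a mixture over a latent visual label v, so that p(h=1|I) = p(h=1|v=1,I)·p(v=1|I) + p(h=1|v=0,I)·p(v=0|I), and the visually grounded classifier p(v=1|I) falls out as a byproduct. This is a minimal illustration, not the paper's exact architecture; the layer sizes, single-label setup, and head names are assumptions.

```python
import torch
import torch.nn as nn

class DecoupledClassifier(nn.Module):
    """Sketch of decoupling reporting bias from visual grounding:
        p(h=1 | I) = p(h=1 | v=1, I) * p(v=1 | I)
                   + p(h=1 | v=0, I) * p(v=0 | I)
    where h is the noisy human-centric label and v the latent visual label.
    Feature dimension and linear heads are illustrative assumptions."""

    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        # Visual head: probability the concept is actually present, p(v=1 | I).
        self.visual = nn.Linear(feat_dim, 1)
        # Relevance heads: probability a human would mention the concept,
        # conditioned on whether it is visually present, p(h=1 | v, I).
        self.mention_if_present = nn.Linear(feat_dim, 1)
        self.mention_if_absent = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        p_v = torch.sigmoid(self.visual(feats))                  # p(v=1 | I)
        p_h_v1 = torch.sigmoid(self.mention_if_present(feats))   # p(h=1 | v=1, I)
        p_h_v0 = torch.sigmoid(self.mention_if_absent(feats))    # p(h=1 | v=0, I)
        # Marginalize out the latent visual label v.
        return p_h_v1 * p_v + p_h_v0 * (1.0 - p_v)

# Training supervises only the observed noisy label h; the visually
# grounded classifier p(v=1 | I) is never directly labeled.
model = DecoupledClassifier()
feats = torch.randn(4, 2048)              # stand-in for CNN image features
h = torch.randint(0, 2, (4, 1)).float()   # noisy human-centric labels
loss = nn.functional.binary_cross_entropy(model(feats), h)
loss.backward()
```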


URL

https://arxiv.org/abs/1512.06974

PDF

https://arxiv.org/pdf/1512.06974

