Abstract
We introduce SocialIQa, the first large-scale benchmark for commonsense reasoning about social situations. This resource contains 45,000 multiple choice questions for probing emotional and social intelligence in a variety of everyday situations (e.g., Q: "Skylar went to Jan's birthday party and gave her a gift. What does Skylar need to do before this?" A: "Go shopping"). Through crowdsourcing, we collect commonsense questions along with correct and incorrect answers about social interactions, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to the wrong question. While humans can easily solve these questions (90%), our benchmark is more challenging for existing question-answering (QA) models, such as those based on pretrained language models (77%). Notably, we further establish SocialIQa as a resource for transfer learning of commonsense knowledge, achieving state-of-the-art performance on several commonsense reasoning tasks (Winograd Schemas, COPA).
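To make the "right answer to the wrong question" idea concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a multiple-choice item can be modelled as a context, a question, a list of answers, and the index of the correct one. The names `MultipleChoiceItem`, `build_item`, and `accuracy` are hypothetical, and the distractor string is invented purely for illustration; only the context, question, and correct answer come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MultipleChoiceItem:
    """Hypothetical container for one multiple-choice question."""
    context: str
    question: str
    answers: List[str]
    label: int  # index of the correct answer in `answers`


def build_item(context: str, question: str, correct: str,
               answers_to_other_questions: List[str]) -> MultipleChoiceItem:
    """Build an item whose distractors are correct answers to *different*
    questions about the same context (the idea named in the abstract)."""
    # The correct answer is placed first; shuffling is omitted for brevity.
    answers = [correct] + list(answers_to_other_questions)
    return MultipleChoiceItem(context, question, answers, label=0)


def accuracy(items: List[MultipleChoiceItem],
             predict: Callable[[MultipleChoiceItem], int]) -> float:
    """Fraction of items for which `predict` returns the correct index."""
    hits = sum(1 for item in items if predict(item) == item.label)
    return hits / len(items)


# Context, question, and correct answer are taken from the abstract.
# The distractor is an invented answer to a different, hypothetical
# question (for example, how Jan might feel afterwards).
item = build_item(
    context="Skylar went to Jan's birthday party and gave her a gift.",
    question="What does Skylar need to do before this?",
    correct="Go shopping",
    answers_to_other_questions=["Grateful"],
)
```

Because the distractor was written as a genuine answer to another question, it is presumably less likely to carry the stylistic cues that hand-written wrong answers tend to have. The human and model scores quoted above (90% and 77%) are accuracies in the sense computed by `accuracy`.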
URL
http://arxiv.org/abs/1904.09728