SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

2019-05-02

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

arXiv_AI Transfer_Learning

Abstract

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently come close to the level of non-expert humans, suggesting limited headroom for further research. This paper recaps lessons learned from the GLUE benchmark and presents SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a new public leaderboard. SuperGLUE will be available soon at super.gluebenchmark.com.

URL

http://arxiv.org/abs/1905.00537

PDF

http://arxiv.org/pdf/1905.00537
