A Probabilistic Generative Model of Linguistic Typology

Abstract

In the Principles and Parameters framework, the structural features of languages depend on parameters that may be toggled on or off, with a single parameter often dictating the status of multiple features. The implied covariance between features inspires our probabilisation of this line of linguistic inquiry—we develop a generative model of language based on exponential-family matrix factorisation. By modelling all languages and features within the same architecture, we show how structural similarities between languages can be exploited to predict typological features with near-perfect accuracy, besting several baselines on the task of predicting held-out features. Furthermore, we show that language representations pre-trained on monolingual text allow for generalisation to unobserved languages. This finding has clear practical and also theoretical implications: the results confirm what linguists have hypothesised, i.e. that there are significant correlations between typological features and languages.
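To make the idea of exponential-family matrix factorisation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes binary typological features with a Bernoulli likelihood (one member of the exponential family; the abstract does not say which member is used), so each language/feature cell is modelled as the sigmoid of a dot product between a language vector and a feature vector. The toy matrix, the function names (`init_embeddings`, `prob`, `neg_log_likelihood`, `train`) and all hyperparameters are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_embeddings(n_rows, dim, rng):
    # Small random vectors, one per language or per feature.
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_rows)]


def prob(lang_vec, feat_vec):
    # Bernoulli mean: P(feature is "on") = sigmoid(<language, feature>).
    return sigmoid(sum(a * b for a, b in zip(lang_vec, feat_vec)))


def neg_log_likelihood(observed, langs, feats):
    # Sum of Bernoulli negative log-likelihoods over observed cells only.
    total = 0.0
    for (i, j), y in observed.items():
        p = prob(langs[i], feats[j])
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def train(observed, langs, feats, lr=0.1, epochs=200, l2=0.01, seed=0):
    # Plain SGD over observed cells with a small L2 penalty (an assumption).
    rng = random.Random(seed)
    cells = sorted(observed)
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            # Gradient of the negative log-likelihood w.r.t. the logit.
            g = prob(langs[i], feats[j]) - observed[(i, j)]
            for k in range(len(langs[i])):
                li, fj = langs[i][k], feats[j][k]
                langs[i][k] -= lr * (g * fj + l2 * li)
                feats[j][k] -= lr * (g * li + l2 * fj)


# Hypothetical toy data: rows are languages, columns are binary features.
# Languages 0-1 and 2-3 form two structurally similar groups.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
held_out = (1, 1)  # cell hidden during training; its true value is 1
observed = {
    (i, j): toy[i][j]
    for i in range(4)
    for j in range(4)
    if (i, j) != held_out
}

rng = random.Random(0)
langs = init_embeddings(4, 2, rng)
feats = init_embeddings(4, 2, rng)

nll_before = neg_log_likelihood(observed, langs, feats)
train(observed, langs, feats)
nll_after = neg_log_likelihood(observed, langs, feats)
p_held_out = prob(langs[1], feats[1])

print("NLL before/after:", nll_before, nll_after)
print("P(held-out feature is on):", p_held_out)
```

Because all languages and features share one latent space, the held-out cell is predicted from the features that language 1 shares with language 0. Under the same assumptions, generalising to an unobserved language would amount to fixing its language vector to a representation obtained elsewhere (for example, pre-trained on text) and reusing the learned feature vectors unchanged.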

URL

http://arxiv.org/abs/1903.10950

PDF

http://arxiv.org/pdf/1903.10950
