
Word embeddings for idiolect identification

2019-02-10
Konstantinos Perifanos, Eirini Florou, Dionysis Goutsos

Abstract

The term idiolect refers to the unique and distinctive use of language by an individual, and it is the theoretical foundation of Authorship Attribution. In this paper we focus on learning distributed representations (embeddings) of social media users that reflect their writing style. These representations can be considered stylistic fingerprints of the authors. We explore the performance of the two main flavours of distributed representations, namely embeddings produced by neural probabilistic language models (such as word2vec) and by matrix factorization (such as GloVe).

URL

http://arxiv.org/abs/1902.03658

PDF

http://arxiv.org/pdf/1902.03658

