Abstract
When analyzing the spread of viruses, epidemiologists often need to identify the location of infected hosts. This information can be found in public databases such as GenBank~\cite{genebank}; however, the information provided in these databases is usually limited to the country or state level. More fine-grained location information requires phylogeographers to manually read the relevant scientific articles. In this work, we propose an approach to automate place name identification in medical (epidemiology) articles. The focus of this paper is to propose a deep-learning-based model for toponym detection and to experiment with the use of external linguistic features and domain-specific information. The model was evaluated on a collection of $105$ epidemiology articles from PubMed Central~\cite{Weissenbacher2015} provided by the recent SemEval task $12$~\cite{semeval-2019-web}. Our best detection model achieves an F1 score of $80.13\%$, a significant improvement over the state-of-the-art score of $69.84\%$. These results underline the importance of domain-specific embeddings as well as specific linguistic features for toponym detection in medical journals.
URL
http://arxiv.org/abs/1904.11018