Abstract
The subtleties of human perception, as measured by vision scientists through the use of psychophysics, are important clues to the internal workings of visual recognition. For instance, measured reaction time can indicate whether a visual stimulus is easy for a subject to recognize, or whether it is hard. In this paper, we consider how to incorporate psychophysical measurements of visual perception into the loss function of a deep neural network being trained for a recognition task, under the assumption that such information can enforce consistency with human behavior. As a case study to assess the viability of this approach, we look at the problem of handwritten document transcription. While good progress has been made towards automatically transcribing modern handwriting, significant challenges remain in transcribing historical documents. Here we work towards a comprehensive transcription solution for Medieval manuscripts that combines networks trained using our novel loss formulation with natural language processing elements. In a baseline assessment, reliable performance is demonstrated for the standard IAM and RIMES datasets. Further, we go on to show feasibility for our approach on a previously published dataset and a new dataset of digitized Latin manuscripts, originally produced by scribes in the Cloister of St. Gall around the middle of the 9th century.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1904.03734