Efficient Video Scene Text Spotting: Unifying Detection, Tracking, and Recognition

Abstract

This paper proposes a unified framework for efficiently spotting scene text in videos. The method localizes and tracks text in each frame, and recognizes each tracked text stream only once. Specifically, we first train a spatial-temporal text detector to localize text regions in sequential frames. Second, a text tracker is trained to group the localized text regions into corresponding cropped text streams. To spot video text efficiently, we recognize each tracked text stream once, using a text region quality scoring mechanism, instead of recognizing the cropped text regions one by one. Experiments on two public benchmarks demonstrate that our method achieves impressive performance.
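
The "recognize once per stream" idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the detector and tracker have already produced streams, that every region already carries a quality score, and that the highest-scoring region is the one passed to the recognizer. The names `TextRegion`, `recognize_streams`, and `toy_recognizer` are hypothetical, as is the selection rule itself, since the abstract does not say how the scores are used.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TextRegion:
    frame_index: int
    crop_id: str     # stand-in for the cropped image of the text region
    quality: float   # hypothetical quality score, higher is better


def recognize_streams(
    streams: Dict[int, List[TextRegion]],
    recognizer: Callable[[TextRegion], str],
) -> Dict[int, str]:
    """Call the recognizer once per tracked stream, on its best-scored region."""
    results: Dict[int, str] = {}
    for track_id, regions in streams.items():
        if not regions:
            continue  # nothing to recognize in an empty stream
        # Assumption: the single highest-quality region represents the stream.
        best = max(regions, key=lambda r: r.quality)
        results[track_id] = recognizer(best)
    return results


def toy_recognizer(region: TextRegion) -> str:
    # Placeholder for a real text recognition model.
    return region.crop_id.upper()


streams = {
    0: [TextRegion(0, "exit", 0.41), TextRegion(1, "exit", 0.93)],
    1: [TextRegion(0, "cafe", 0.77)],
}
print(recognize_streams(streams, toy_recognizer))
```

With this structure the number of recognizer calls grows with the number of tracked streams rather than with the number of detected regions, which is where the efficiency described above would come from.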

URL

http://arxiv.org/abs/1903.03299

PDF

http://arxiv.org/pdf/1903.03299
