
Efficient Video Scene Text Spotting: Unifying Detection, Tracking, and Recognition

2019-03-08
Zhanzhan Cheng, Jing Lu, Jianwen Xie, Yi Niu, Shiliang Pu, Fei Wu

Abstract

This paper proposes a unified framework for efficiently spotting scene text in videos. The method localizes and tracks text across frames, and recognizes each tracked text stream only once. Specifically, we first train a spatial-temporal text detector to localize text regions in sequential frames. Second, a well-designed text tracker is trained to group the localized text regions into corresponding cropped text streams. Finally, to spot video text efficiently, we recognize each tracked text stream once with a text region quality scoring mechanism, instead of recognizing the cropped text regions one by one. Experiments on two public benchmarks demonstrate that our method achieves impressive performance.
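The detect, track, recognize-once flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper trains its detector and tracker, whereas this sketch swaps in greedy IoU matching between consecutive frames as a stand-in tracker, takes per-region quality scores as given inputs, and assumes that the highest-scoring region in a stream is the one passed to the recognizer. The names Region, track, spot, and recognize are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Region:
    """One detected text region. All fields are illustrative assumptions."""
    frame: int      # index of the frame the region was detected in
    box: Box        # bounding box from a (hypothetical) detector
    quality: float  # score from a (hypothetical) quality scorer
    crop: str       # stand-in for the cropped image patch


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def track(frames: List[List[Region]],
          iou_threshold: float = 0.5) -> List[List[Region]]:
    """Group per-frame regions into streams by greedy IoU matching.

    Stand-in for the trained tracker mentioned in the abstract.
    """
    streams: List[List[Region]] = []
    for regions in frames:
        extended = set()  # streams that already received a region this frame
        for region in regions:
            best: Optional[int] = None
            best_iou = iou_threshold
            for idx, stream in enumerate(streams):
                last = stream[-1]
                # Only extend streams that were seen in the previous frame.
                if idx in extended or last.frame != region.frame - 1:
                    continue
                overlap = iou(last.box, region.box)
                if overlap >= best_iou:
                    best, best_iou = idx, overlap
            if best is None:
                streams.append([region])  # start a new stream
            else:
                streams[best].append(region)
                extended.add(best)
    return streams


def spot(frames: List[List[Region]],
         recognize: Callable[[str], str]) -> List[str]:
    """Call the recognizer once per stream, on its highest-quality region."""
    results = []
    for stream in track(frames):
        best_region = max(stream, key=lambda r: r.quality)
        results.append(recognize(best_region.crop))
    return results
```

With N frames and K tracked texts, this calls the recognizer K times rather than once per cropped region (up to N times K), which is the efficiency argument the abstract makes.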

URL

http://arxiv.org/abs/1903.03299

PDF

http://arxiv.org/pdf/1903.03299

