Abstract
Crowd counting from a single image is a challenging task due to high appearance similarity, perspective changes and severe congestion. Many methods only focus on the local appearance features and they cannot handle the aforementioned challenges. In order to tackle them, we propose a Perspective Crowd Counting Network (PCC Net), which consists of three parts: 1) Density Map Estimation (DME) focuses on learning very local features for density map estimation; 2) Random High-level Density Classification (R-HDC) extracts global features to predict the coarse density labels of random patches in images; 3) Fore-/Background Segmentation (FBS) encodes mid-level features to segments the foreground and background. Besides, the DULR module is embedded in PCC Net to encode the perspective changes on four directions (Down, Up, Left and Right). The proposed PCC Net is verified on five mainstream datasets, which achieves the state-of-the-art performance on the one and attains the competitive results on the other four datasets. The source code is available at https://github.com/gjy3035/PCC-Net.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1905.10085