Abstract
In recent years, the performance of video-based person Re-Identification (ReID) methods has improved considerably. However, with the influx of new video domains, such as egocentric videos, it has become apparent that many open challenges remain. These challenges stem from factors such as poor video quality caused by ego-motion, motion blur, severe changes in lighting conditions, and perspective distortions. To facilitate research towards overcoming these challenges, this paper contributes a new, first-of-its-kind dataset called EgoReID. The dataset is captured using three mobile phones with non-overlapping fields of view. It contains 900 IDs and around 10,200 tracks with a total of 176,000 detections. Moreover, for each video we also provide 12-sensor metadata. Directly applying current approaches to our dataset results in poor performance. Considering the unique nature of our dataset, we propose a new framework which takes advantage of both visual and sensor metadata to successfully perform person ReID. Specifically, we propose to employ human body parsing and extract weighted local video features from different body regions. In addition, we employ sensor metadata to determine the target's next camera and their estimated time of arrival, so that the search is performed only among tracks present in the predicted next camera around the estimated time. This considerably improves our ReID performance, as it significantly reduces the search space.
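The abstract's search-space reduction and weighted body-part matching can be illustrated with a minimal sketch. This is not the authors' implementation; the `Track` structure, the part-feature layout, the ETA window, and the region weights are all illustrative assumptions for how gallery tracks could be filtered by predicted camera and arrival time and then ranked by a weighted sum of per-region similarities.

```python
# Hypothetical sketch of the described pipeline: filter gallery tracks to the
# predicted next camera within a window around the estimated arrival time,
# then rank the survivors by weighted per-body-region feature similarity.
import numpy as np
from dataclasses import dataclass

@dataclass
class Track:
    camera_id: int
    start_time: float              # seconds on a common clock (assumed)
    part_features: np.ndarray      # shape (num_parts, feat_dim), L2-normalized

def rank_candidates(query: Track, gallery: list, predicted_camera: int,
                    eta: float, window: float = 30.0,
                    part_weights: np.ndarray = None) -> list:
    """Return gallery indices sorted by weighted part-feature similarity,
    restricted to tracks in the predicted camera near the ETA."""
    num_parts = query.part_features.shape[0]
    if part_weights is None:
        part_weights = np.full(num_parts, 1.0 / num_parts)

    scored = []
    for idx, cand in enumerate(gallery):
        # Spatio-temporal filter: keep only tracks in the predicted next
        # camera whose start time falls inside the arrival window.
        if cand.camera_id != predicted_camera:
            continue
        if abs(cand.start_time - eta) > window:
            continue
        # Cosine similarity per body region (features are L2-normalized),
        # combined with the region weights.
        sims = np.sum(query.part_features * cand.part_features, axis=1)
        scored.append((idx, float(np.dot(part_weights, sims))))

    scored.sort(key=lambda s: s[1], reverse=True)
    return [idx for idx, _ in scored]
```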
URL
http://arxiv.org/abs/1812.09570