Abstract
Person re-identification (re-ID) is a task of matching pedestrians under disjoint camera views. To recognise paired snapshots, it has to cope with large cross-view variations caused by the camera view shift. Supervised deep neural networks are effective in producing a set of non-linear projections that can transform cross-view images into a common feature space. However, they typically impose a symmetric architecture, yielding the network ill-conditioned on its optimisation. In this paper, we learn view-invariant subspace for person re-ID, and its corresponding similarity metric using an adversarial view adaptation approach. The main contribution is to learn coupled asymmetric mappings regarding view characteristics which are adversarially trained to address the view discrepancy by optimising the cross-entropy view confusion objective. To determine the similarity value, the network is empowered with a similarity discriminator to promote features that are highly discriminant in distinguishing positive and negative pairs. The other contribution includes an adaptive weighing on the most difficult samples to address the imbalance of within/between-identity pairs. Our approach achieves notable improved performance in comparison to state-of-the-arts on benchmark datasets.
Abstract (translated by Google)
URL
https://arxiv.org/abs/1904.01755