No-Frills Human-Object Interaction Detection: Factorization, Appearance and Layout Encodings, and Training Techniques

Abstract

We show that with an appropriate factorization, and encodings of layout and appearance constructed from outputs of pretrained object detectors, a relatively simple model outperforms more sophisticated approaches on human-object interaction detection. Our model includes factors for detection scores, human and object appearance, coarse layout (box-pair configuration), and optionally fine-grained layout (human pose). We also develop training techniques that improve learning efficiency by: (i) eliminating train-inference mismatch; (ii) rejecting easy negatives during mini-batch training; and (iii) constructing training mini-batches with a ratio of negatives to positives that is two orders of magnitude larger than in existing approaches. We conduct a thorough ablation study on the challenging HICO-Det dataset to understand the importance of the different factors and training techniques.
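
As a rough illustration of what a factorized score and a negative-heavy mini-batch could look like, here is a minimal Python sketch. The abstract does not spell out how the factors are combined, so the rule used here (summing per-factor scores, applying a sigmoid, and multiplying by the two detector confidences) is an assumption. The names `hoi_score`, `build_minibatch`, the factor keys, and the ratio value are all hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hoi_score(human_det: float, object_det: float, factor_logits: dict) -> float:
    """Score one (human, object, interaction) candidate.

    Assumption: per-factor scores are summed and squashed with a sigmoid,
    then multiplied by the two detector confidences.
    """
    interaction = sigmoid(sum(factor_logits.values()))
    return human_det * object_det * interaction


def build_minibatch(positives, negatives, neg_to_pos_ratio, rng):
    """Keep all positives and sample negatives without replacement.

    The number of negatives is neg_to_pos_ratio * len(positives),
    capped at the number of negatives available.
    """
    n_neg = min(len(negatives), neg_to_pos_ratio * len(positives))
    return list(positives), rng.sample(list(negatives), n_neg)


# Hypothetical factor scores for one candidate pair (they sum to 0.0).
logits = {
    "human_appearance": 0.5,
    "object_appearance": 0.25,
    "box_layout": -0.5,
    "pose_layout": -0.25,
}
score = hoi_score(0.9, 0.8, logits)

# Placeholder ratio of 10; the abstract gives no concrete value.
pos, neg = build_minibatch(
    ["p0", "p1"], [f"n{i}" for i in range(50)], 10, random.Random(0)
)
```

Multiplying by the detector confidences means a candidate pair with a weak detection cannot receive a high interaction score, whatever the appearance and layout factors say. Easy-negative rejection, item (ii) in the abstract, is not shown here.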

URL

https://arxiv.org/abs/1811.05967

PDF

https://arxiv.org/pdf/1811.05967
