Abstract
Activity recognition is a challenging problem with many practical applications. In addition to the visual features, recent approaches have benefited from the use of context, e.g., inter-relationships among the activities and objects. However, these approaches require data to be labeled, entirely available beforehand, and not designed to be updated continuously, which make them unsuitable for surveillance applications. In contrast, we propose a continuous-learning framework for context-aware activity recognition from unlabeled video, which has two distinct advantages over existing methods. First, it employs a novel active-learning technique that not only exploits the informativeness of the individual activities but also utilizes their contextual information during query selection; this leads to significant reduction in expensive manual annotation effort. Second, the learned models can be adapted online as more data is available. We formulate a conditional random field model that encodes the context and devise an information-theoretic approach that utilizes entropy and mutual information of the nodes to compute the set of most informative queries, which are labeled by a human. These labels are combined with graphical inference techniques for incremental updates. We provide a theoretical formulation of the active learning framework with an analytic solution. Experiments on six challenging datasets demonstrate that our framework achieves superior performance with significantly less manual labeling.
Abstract (translated by Google)
URL
http://arxiv.org/abs/1904.04406