Abstract
Image understanding relies heavily on accurate multi-label classification. In recent years, deep learning (DL) algorithms have become very successful tools for multi-label classification of image objects. Building on this set of tools, various implementations of DL algorithms for multi-label classification have been published for public use in the form of application programming interfaces (APIs). In this study, we evaluate and compare 10 of the most prominent publicly available APIs in a best-of-breed challenge. The evaluation is performed on the Visual Genome labeling benchmark dataset using 12 well-recognized similarity metrics. Additionally, for the first time in this kind of comparison, we use a semantic similarity metric to evaluate semantic similarity performance. In this evaluation, the Microsoft Computer Vision, IBM Visual Recognition, and Imagga APIs show better performance than the other APIs.
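The abstract does not name the 12 similarity metrics, so the following is only a minimal sketch of the general idea of scoring an API's predicted labels against a reference label set. It assumes exact, case-insensitive string matching and uses four common set-overlap measures (precision, recall, F1, Jaccard); these are illustrative choices and not necessarily the metrics used in the study. The function name `label_set_scores` and the example labels are hypothetical.

```python
def label_set_scores(predicted, reference):
    """Score predicted labels against reference labels using set overlap.

    Assumption: labels are compared as case-insensitive exact strings.
    These measures illustrate the idea only; they are not claimed to be
    the metrics used in the paper.
    """
    pred = {label.strip().lower() for label in predicted}
    ref = {label.strip().lower() for label in reference}

    overlap = len(pred & ref)
    union = len(pred | ref)

    # Guard every division so empty label sets yield 0.0 instead of an error.
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    jaccard = overlap / union if union else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


# Hypothetical example: labels returned by an API vs. reference labels for one image.
scores = label_set_scores(
    ["Dog", "grass", "ball"],
    ["dog", "frisbee", "grass", "tree"],
)
print(scores)
```

Exact matching of this kind gives no credit for near-synonyms (for example "puppy" against "dog"), which is the sort of gap a semantic similarity metric is meant to address.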
URL
http://arxiv.org/abs/1903.09190