Abstract
Fine-grained vehicle classification is the task of classifying make, model, and year of a vehicle. This is a very challenging task, because vehicles of different types but similar color and viewpoint can often look much more similar than vehicles of same type but differing color and viewpoint. Vehicle make, model, and year in com- bination with vehicle color - are of importance in several applications such as vehicle search, re-identification, tracking, and traffic analysis. In this work we investigate the suitability of several recent landmark convolutional neural network (CNN) architectures, which have shown top results on large scale image classification tasks, for the task of fine-grained classification of vehicles. We compare the performance of the networks VGG16, several ResNets, Inception architectures, the recent DenseNets, and MobileNet. For classification we use the Stanford Cars-196 dataset which features 196 different types of vehicles. We investigate several aspects of CNN training, such as data augmentation and training from scratch vs. fine-tuning. Importantly, we introduce no aspects in the architectures or training process which are specific to vehicle classification. Our final model achieves a state-of-the-art classification accuracy of 94.6% outperforming all related works, even approaches which are specifically tailored for the task, e.g. by including vehicle part detections.
Abstract (translated by Google)
URL
https://arxiv.org/abs/1806.02987