Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Embedded CNN based vehicle classification and counting in non-laned road traffic

2019-01-18

Mayank Singh Chauhan, Arshdeep Singh, Mansi Khemka, Arneish Prateek, Rijurekha Sen

arXiv_CV

arXiv_CV Object_Detection CNN Inference Classification Detection
Abstract

Classifying and counting vehicles in road traffic has numerous applications in the transportation engineering domain. However, the wide variety of vehicles (two-wheelers, three-wheelers, cars, buses, trucks, etc.) plying the roads of developing regions without any lane discipline makes vehicle classification and counting a hard problem to automate. In this paper, we use state-of-the-art Convolutional Neural Network (CNN) based object detection models and train them for multiple vehicle classes using data from Delhi roads. We get up to 75% mAP on an 80-20 train-test split using 5562 video frames from four different locations. As robust network connectivity is scarce in developing regions for continuous video transmissions from the road to cloud servers, we also evaluate the latency, energy and hardware cost of embedded implementations of our CNN model based inferences.

URL

http://arxiv.org/abs/1901.06358

PDF

http://arxiv.org/pdf/1901.06358
Robust Anomaly Detection in Images using Adversarial Autoencoders

2019-01-18

Laura Beggel, Michael Pfeiffer, Bernd Bischl

arXiv_CV

arXiv_CV Adversarial Object_Detection Detection
Abstract

Reliably detecting anomalies in a given set of images is a task of high practical relevance for visual quality inspection, surveillance, or medical image analysis. Autoencoder neural networks learn to reconstruct normal images, and hence can classify as anomalies those images whose reconstruction error exceeds some threshold. Here we analyze a fundamental problem of this approach when the training set is contaminated with a small fraction of outliers. We find that continued training of autoencoders inevitably reduces the reconstruction error of outliers, and hence degrades the anomaly detection performance. In order to counteract this effect, an adversarial autoencoder architecture is adapted, which imposes a prior distribution on the latent representation, typically placing anomalies into low-likelihood regions. Utilizing the likelihood model, potential anomalies can be identified and rejected already during training, which results in an anomaly detector that is significantly more robust to the presence of outliers during training.
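The reconstruction-error rule described in this abstract can be illustrated with a minimal sketch. It is not the paper's adversarial autoencoder: a PCA projection stands in for a trained autoencoder, and the synthetic data, the 99th-percentile threshold, and all function names are assumptions made for illustration only.

```python
import numpy as np

def fit_linear_autoencoder(x_train, n_components):
    """Fit a PCA projection, used here as a stand-in for a trained autoencoder."""
    mean = x_train.mean(axis=0)
    # Rows of vt are the principal directions of the centred training data.
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x, mean, components):
    """Squared error between each sample and its encode-decode reconstruction."""
    codes = (x - mean) @ components.T      # encode
    recon = codes @ components + mean      # decode
    return np.sum((x - recon) ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical "normal" data lying close to a 2-D subspace of an 8-D space.
basis = rng.normal(size=(2, 8))
normal = rng.normal(size=(500, 2)) @ basis + 0.01 * rng.normal(size=(500, 8))
# Hypothetical outliers that do not follow the subspace structure.
outliers = 3.0 * rng.normal(size=(5, 8))

mean, components = fit_linear_autoencoder(normal, n_components=2)
train_err = reconstruction_error(normal, mean, components)
# Assumed decision rule: flag anything above the 99th percentile of training error.
threshold = np.quantile(train_err, 0.99)

is_anomaly = reconstruction_error(outliers, mean, components) > threshold
```

If outliers were mixed into `normal` before fitting, the fitted model would start to reconstruct them better and the gap to the threshold would shrink, which is the contamination problem the abstract analyzes.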

URL

http://arxiv.org/abs/1901.06355

PDF

http://arxiv.org/pdf/1901.06355
Adapting Convolutional Neural Networks for Geographical Domain Shift

2019-01-18

Pavel Ostyakov, Sergey I. Nikolenko

arXiv_CV

arXiv_CV GAN CNN
Abstract

We present the winning solution for the Inclusive Images Competition organized as part of the Conference on Neural Information Processing Systems (NeurIPS 2018) Competition Track. The competition was organized to study ways to cope with domain shift in image processing, specifically geographical shift: the training and two test sets in the competition had different geographical distributions. Our solution has proven to be relatively straightforward and simple: it is an ensemble of several CNNs where only the last layer is fine-tuned with the help of a small labeled set of tuning labels made available by the organizers. We believe that while domain shift remains a formidable problem, our approach opens up new possibilities for alleviating this problem in practice, where small labeled datasets from the target domain are usually either available or can be obtained and labeled cheaply.

URL

http://arxiv.org/abs/1901.06345

PDF

http://arxiv.org/pdf/1901.06345
Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection

2019-01-18

Fan Yang, Lei Zhang, Sijia Yu, Danil Prokhorov, Xue Mei, Haibin Ling

arXiv_CV

arXiv_CV Segmentation Semantic_Segmentation Deep_Learning Detection
Abstract

Pavement crack detection is a critical task for ensuring road safety. Manual crack detection is extremely time-consuming. Therefore, an automatic road crack detection method is required to speed up this process. However, it remains a challenging task due to the intensity inhomogeneity of cracks and complexity of the background, e.g., the low contrast with surrounding pavements and possible shadows with similar intensity. Inspired by recent advances of deep learning in computer vision, we propose a novel network architecture, named Feature Pyramid and Hierarchical Boosting Network (FPHBN), for pavement crack detection. The proposed network integrates semantic information into low-level features for crack detection in a feature pyramid way. It also balances the contribution of both easy and hard samples to the loss by nested sample reweighting in a hierarchical way. To demonstrate the superiority and generality of the proposed method, we evaluate it on five crack datasets and compare it with state-of-the-art crack detection, edge detection, and semantic segmentation methods. Extensive experiments show that the proposed method outperforms these state-of-the-art methods in terms of accuracy and generality.

URL

http://arxiv.org/abs/1901.06340

PDF

http://arxiv.org/pdf/1901.06340
Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation

2019-01-18

Wei Sun, Tianfu Wu

arXiv_CV

arXiv_CV Adversarial Attention GAN Optimization
Abstract

Image synthesis and image-to-image translation are two important generative learning tasks. Remarkable progress has been made by learning Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) and cycle-consistent GANs (CycleGANs) (Zhu et al., 2017) respectively. This paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP), which is a novel architectural unit and can be easily integrated into both generators and discriminators in GANs and CycleGANs. The proposed SPAP integrates Atrous Spatial Pyramid Pooling (ASPP) (Chen et al., 2018), a proposed cascade attention mechanism and residual connections (He et al., 2016). It leverages the advantages of the three components to facilitate effective end-to-end generative learning: (i) the capability of fusing multi-scale information by ASPP; (ii) the capability of capturing relative importance between spatial locations (especially multi-scale context) or feature channels by attention; (iii) the capability of preserving information and enhancing optimization feasibility by residual connections. Coarse-to-fine and fine-to-coarse SPAP are studied and intriguing attention maps are observed in both tasks. In experiments, the proposed SPAP is tested in GANs on the CelebA-HQ-128 dataset (Karras et al., 2017), and tested in CycleGANs on image-to-image translation datasets including the Cityscapes dataset (Cordts et al., 2016) and the Facade and Aerial Maps datasets (Zhu et al., 2017), obtaining better performance in both cases.

URL

http://arxiv.org/abs/1901.06322

PDF

http://arxiv.org/pdf/1901.06322
Physics-Constrained Deep Learning for High-dimensional Surrogate Modeling and Uncertainty Quantification without Labeled Data

2019-01-18

Yinhao Zhu, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, Paris Perdikaris

arXiv_CV

arXiv_CV CNN Deep_Learning
Abstract

Surrogate modeling and uncertainty quantification tasks for PDE systems are most often considered as supervised learning problems where input and output data pairs are used for training. The construction of such emulators is by definition a small data problem which poses challenges to deep learning approaches that have been developed to operate in the big data regime. Even in cases where such models have been shown to have good predictive capability in high dimensions, they fail to address constraints in the data implied by the PDE model. This paper provides a methodology that incorporates the governing equations of the physical model in the loss/likelihood functions. The resulting physics-constrained, deep learning models are trained without any labeled data (e.g. employing only input data) and provide comparable predictive responses with data-driven models while obeying the constraints of the problem at hand. This work employs a convolutional encoder-decoder neural network approach as well as a conditional flow-based generative model for the solution of PDEs, surrogate model construction, and uncertainty quantification tasks. The methodology is posed as a minimization problem of the reverse Kullback-Leibler (KL) divergence between the model predictive density and the reference conditional density, where the latter is defined as the Boltzmann-Gibbs distribution at a given inverse temperature with the underlying potential relating to the PDE system of interest. The generalization capability of these models to out-of-distribution input is considered. Quantification and interpretation of the predictive uncertainty is provided for a number of problems.
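The reverse KL objective mentioned in this abstract can be written out as follows. The notation is assumed here rather than taken from the paper: $x$ is the input, $y$ the PDE solution, $p_\theta(y \mid x)$ the model predictive density, $V(y, x)$ the potential tied to the PDE, $\beta$ the inverse temperature, and $Z_\beta(x)$ the normalizing constant.

```latex
% Reference density: Boltzmann-Gibbs distribution at inverse temperature \beta
p_\beta(y \mid x) = \frac{\exp\bigl(-\beta\, V(y, x)\bigr)}{Z_\beta(x)}

% Reverse KL divergence between the model density and the reference density
D_{\mathrm{KL}}\bigl(p_\theta(y \mid x) \,\|\, p_\beta(y \mid x)\bigr)
  = \mathbb{E}_{p_\theta(y \mid x)}\bigl[\log p_\theta(y \mid x)\bigr]
  + \beta\, \mathbb{E}_{p_\theta(y \mid x)}\bigl[V(y, x)\bigr]
  + \log Z_\beta(x)
```

Since $\log Z_\beta(x)$ does not depend on $\theta$, minimizing this expression only needs samples from the model and evaluations of the potential, which is consistent with the claim that no labeled output data are required.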

URL

http://arxiv.org/abs/1901.06314

PDF

http://arxiv.org/pdf/1901.06314
Neural-network simulations of memory consolidation and reconsolidation

2019-01-18

Peter Helfer, Thomas R. Shultz

arXiv_CV

arXiv_CV Prediction
Abstract

In the mammalian brain, newly acquired memories depend on the hippocampus for maintenance and recall, but over time these functions are taken over by the neocortex through a process called systems consolidation. However, reactivation of a consolidated memory can induce a brief period of temporary hippocampus-dependence, followed by a return to hippocampus-independence. Here we present a computational model that uses simulation of recently described mechanisms of synaptic plasticity to account for findings from the systems consolidation/reconsolidation literature and to make predictions for future research.

URL

https://arxiv.org/abs/1901.02270

PDF

https://arxiv.org/pdf/1901.02270
RxNN: A Framework for Evaluating Deep Neural Networks on Resistive Crossbars

2019-01-18

Shubham Jain, Abhronil Sengupta, Kaushik Roy, Anand Raghunathan

arXiv_CV

arXiv_CV Knowledge Quantitative
Abstract

Resistive crossbars have emerged as promising building blocks for realizing DNNs due to their ability to compactly and efficiently realize the dominant DNN computational kernel, viz., vector-matrix multiplication. However, a key challenge with resistive crossbars is that they suffer from a range of device and circuit level non-idealities such as interconnect parasitics, peripheral circuits, sneak paths, and process variations. These non-idealities can lead to errors in vector-matrix multiplication that eventually degrade the DNN’s accuracy. There has been no study of the impact of non-idealities on the accuracy of large-scale DNNs, in part because existing device and circuit models are infeasible to use in application-level evaluation. In this work, we present a fast and accurate simulation framework to enable evaluation and re-training of large-scale DNNs on resistive crossbar based hardware fabrics. We first characterize the impact of crossbar non-idealities on errors incurred in the realized vector-matrix multiplications and observe that the errors have significant data and hardware-instance dependence that should be considered. We propose a Fast Crossbar Model (FCM) to accurately capture the errors arising due to crossbar non-idealities while being four-to-five orders of magnitude faster than circuit simulation. Finally, we develop RxNN, a software framework to evaluate and re-train DNNs on resistive crossbar systems. RxNN is based on the popular Caffe machine learning framework, and we use it to evaluate a suite of large-scale DNNs developed for the ImageNet Challenge (ILSVRC). Our experiments reveal that resistive crossbar non-idealities can lead to significant accuracy degradations (9.6%-32%) for these large-scale DNNs. To the best of our knowledge, this work is the first quantitative evaluation of the accuracy of large-scale DNNs on resistive crossbar based hardware.

URL

http://arxiv.org/abs/1809.00072

PDF

http://arxiv.org/pdf/1809.00072
Improving Sequence-to-Sequence Learning via Optimal Transport

2019-01-18

Liqun Chen, Yizhe Zhang, Ruiyi Zhang, Chenyang Tao, Zhe Gan, Haichao Zhang, Bai Li, Dinghan Shen, Changyou Chen, Lawrence Carin

arXiv_CL

arXiv_CL Image_Caption Summarization Caption
Abstract

Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE). However, standard MLE training considers a word-level objective, predicting the next word given the previous ground-truth partial sentence. This procedure focuses on modeling local syntactic patterns, and may fail to capture long-range semantic structure. We present a novel solution to alleviate these issues. Our approach imposes global sequence-level guidance via new supervision based on optimal transport, enabling the overall characterization and preservation of semantic features. We further show that this method can be understood as a Wasserstein gradient flow trying to match our model to the ground truth sequence distribution. Extensive experiments are conducted to validate the utility of the proposed approach, showing consistent improvements over a wide variety of NLP tasks, including machine translation, abstractive text summarization, and image captioning.
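As a rough illustration of a sequence-level optimal transport loss, the sketch below computes an entropy-regularized transport plan between two sets of token embeddings with Sinkhorn iterations. This is a generic technique chosen for brevity and is not claimed to be the solver used in the paper; the cosine cost, the uniform token weights, the regularization strength, and the random embeddings are all assumptions.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cost 1 - cos(x_i, y_j) between two sets of embeddings."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, reg=0.5, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform weight on each generated token
    b = np.full(m, 1.0 / m)          # uniform weight on each reference token
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)       # rescale to match column marginals
        u = a / (kernel @ v)         # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]

rng = np.random.default_rng(0)
generated = rng.normal(size=(5, 16))   # hypothetical embeddings of 5 generated tokens
reference = rng.normal(size=(7, 16))   # hypothetical embeddings of 7 reference tokens

cost = cosine_cost(generated, reference)
plan = sinkhorn_plan(cost)
ot_loss = float(np.sum(plan * cost))   # scalar that could be added to the MLE loss
```

Unlike the word-level MLE objective, this quantity compares the two sequences as whole sets of tokens, so it does not depend on exact position-by-position alignment.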

URL

http://arxiv.org/abs/1901.06283

PDF

http://arxiv.org/pdf/1901.06283
A Recent Survey on the Applications of Genetic Programming in Image Processing

2019-01-18

Asifullah Khan, Aqsa Saeed Qureshi, Noor ul Wahab, Mutawara Hussain, Muhammad Yousaf Hamza

arXiv_AI

arXiv_AI Survey Optimization Classification Recognition
Abstract

During the last two decades, Genetic Programming (GP) has been widely used to tackle optimization, classification, and automatic feature selection tasks. The widespread use of GP is mainly due to its flexible and comprehensible tree-type structure. Similarly, research is also gaining momentum in the field of Image Processing (IP) because of its promising results over wide areas of applications ranging from medical IP to multispectral imaging. IP is mainly involved in applications such as computer vision, pattern recognition, image compression, storage and transmission, and medical diagnostics. The pervasive nature of images and the complexity of their associated algorithms gave an impetus to the exploration of GP. GP has thus been used in different ways for IP since its inception. Many interesting GP techniques have been developed and employed in the field of IP. To give the research community an extensive view of these techniques, this paper presents the diverse applications of GP in IP and provides useful resources for further research. Also, a comparison of different parameters used in ten different applications of IP is summarized in tabular form. Moreover, an analysis of different parameters used in IP related tasks is carried out to save the time needed in future for evaluating the parameters of GP. As more advancement is made in GP methodologies, its success in solving complex tasks, not only related to IP but also in other fields, will increase. Additionally, guidelines are provided for applying GP in IP related tasks, pros and cons of GP techniques are discussed, and some future directions are also set.

URL

http://arxiv.org/abs/1901.07387

PDF

http://arxiv.org/pdf/1901.07387
Computing large market equilibria using abstractions

2019-01-18

Christian Kroer, Alexander Peysakhovich, Eric Sodomka, Nicolas E. Stier-Moses

arXiv_AI

arXiv_AI
Abstract

Computing market equilibria is an important practical problem for market design (e.g. fair division, item allocation). However, computing equilibria requires large amounts of information (e.g. all valuations for all buyers for all items) and compute power. We consider ameliorating these issues by applying a method used for solving complex games: constructing a coarsened abstraction of a given market, solving for the equilibrium in the abstraction, and lifting the prices and allocations back to the original market. We show how to bound important quantities such as regret, envy, Nash social welfare, Pareto optimality, and maximin share when the abstracted prices and allocations are used in place of the real equilibrium. We then study two abstraction methods of interest for practitioners: 1) filling in unknown valuations using techniques from matrix completion, 2) reducing the problem size by aggregating groups of buyers/items into smaller numbers of representative buyers/items and solving for equilibrium in this coarsened market. We find that in real data allocations/prices that are relatively close to equilibria can be computed from even very coarse abstractions.

URL

http://arxiv.org/abs/1901.06230

PDF

http://arxiv.org/pdf/1901.06230
Computing Optimal Coarse Correlated Equilibria in Sequential Games

2019-01-18

Andrea Celli, Stefano Coniglio, Nicola Gatti

arXiv_AI

arXiv_AI Relation
Abstract

We investigate the computation of equilibria in extensive-form games where ex ante correlation is possible, focusing on correlated equilibria requiring the least amount of communication between the players and the mediator. Motivated by the hardness results on the computation of normal-form correlated equilibria, we introduce the notion of normal-form coarse correlated equilibrium, extending the definition of coarse correlated equilibrium to sequential games. We show that, in two-player games without chance moves, an optimal (e.g., social welfare maximizing) normal-form coarse correlated equilibrium can be computed in polynomial time, and that in general multi-player games (including two-player games with Chance), the problem is NP-hard. For the former case, we provide a polynomial-time algorithm based on the ellipsoid method and also propose a more practical one, which can be efficiently applied to problems of considerable size. Then, we discuss how our algorithm can be extended to games with Chance and games with more than two players.

URL

http://arxiv.org/abs/1901.06221

PDF

http://arxiv.org/pdf/1901.06221
Lung and Pancreatic Tumor Characterization in the Deep Learning Era: Novel Supervised and Unsupervised Learning Approaches

2019-01-18

Sarfaraz Hussein, Pujan Kandel, Candice W. Bolan, Michael B. Wallace, Ulas Bagci

arXiv_AI

arXiv_AI Sparse CNN Transfer_Learning Classification Deep_Learning
Abstract

Risk stratification (characterization) of tumors from radiology images can be more accurate and faster with computer-aided diagnosis (CAD) tools. Tumor characterization through such tools can also enable non-invasive cancer staging, prognosis, and foster personalized treatment planning as a part of precision medicine. In this study, we propose both supervised and unsupervised machine learning strategies to improve tumor characterization. Our first approach is based on supervised learning for which we demonstrate significant gains with deep learning algorithms, particularly by utilizing a 3D Convolutional Neural Network and Transfer Learning. Motivated by the radiologists’ interpretations of the scans, we then show how to incorporate task dependent feature representations into a CAD system via a graph-regularized sparse Multi-Task Learning (MTL) framework. In the second approach, we explore an unsupervised learning algorithm to address the limited availability of labeled training data, a common problem in medical imaging applications. Inspired by learning from label proportion (LLP) approaches in computer vision, we propose to use proportion-SVM for characterizing tumors. We also seek the answer to the fundamental question about the goodness of “deep features” for unsupervised tumor classification. We evaluate our proposed supervised and unsupervised learning algorithms on two different tumor diagnosis challenges: lung and pancreas with 1018 CT and 171 MRI scans, respectively, and obtain the state-of-the-art sensitivity and specificity results in both problems.

URL

http://arxiv.org/abs/1801.03230

PDF

http://arxiv.org/pdf/1801.03230
Red blood cell image generation for data augmentation using Conditional Generative Adversarial Networks

2019-01-18

Oleksandr Bailo, DongShik Ham, Young Min Shin

arXiv_CV

arXiv_CV Adversarial Object_Detection Segmentation GAN Detection
Abstract

In this paper, we describe how to apply image-to-image translation techniques to medical blood smear data to generate new data samples and meaningfully enlarge small datasets. Specifically, given the segmentation mask of the microscopy image, we are able to generate photorealistic images of blood cells which are further used alongside real data during the network training for segmentation and object detection tasks. This image data generation approach is based on conditional generative adversarial networks, which have proven capabilities for high-quality image synthesis. In addition to synthesizing blood images, we synthesize segmentation masks as well, which leads to a diverse variety of generated samples. The effectiveness of the technique is thoroughly analyzed and quantified through a number of experiments on a manually collected and annotated dataset of blood smears taken under a microscope.

URL

http://arxiv.org/abs/1901.06219

PDF

http://arxiv.org/pdf/1901.06219
On-Policy Trust Region Policy Optimisation with Replay Buffers

2019-01-18

Dmitry Kangin, Nicolas Pugeault

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

Building upon the recent success of deep reinforcement learning methods, we investigate the possibility of on-policy reinforcement learning improvement by reusing the data from several consecutive policies. On-policy methods bring many benefits, such as the ability to evaluate each resulting policy. However, they usually discard all the information about the policies which existed before. In this work, we propose an adaptation of the replay buffer concept, borrowed from the off-policy learning setting, to create a method that combines the advantages of on- and off-policy learning. To achieve this, the proposed algorithm generalises the $Q$-, value and advantage functions for data from multiple policies. The method uses trust region optimisation, while avoiding some of the common problems of algorithms such as TRPO or ACKTR: it uses hyperparameters to replace the trust region selection heuristics, as well as a trainable covariance matrix instead of a fixed one. In many cases, the method improves the results not only compared to the state-of-the-art trust region on-policy learning algorithms such as PPO, ACKTR and TRPO, but also with respect to their off-policy counterpart DDPG.

URL

http://arxiv.org/abs/1901.06212

PDF

http://arxiv.org/pdf/1901.06212
Generative Adversarial Classifier for Handwriting Characters Super-Resolution

2019-01-18

Zhuang Qian, Kaizhu Huang, Qiufeng Wang, Jimin Xiao, Rui Zhang

arXiv_AI

arXiv_AI Adversarial Super_Resolution Attention GAN Classification Recognition
Abstract

Generative Adversarial Networks (GANs) have received great attention recently due to their excellent performance in image generation, transformation, and super-resolution. However, GANs have rarely been studied and trained for classification, so the generated images may not be appropriate for classification. In this paper, we propose a novel Generative Adversarial Classifier (GAC) particularly for low-resolution Handwriting Character Recognition. Specifically, by additionally involving a classifier in the training process of normal GANs, GAC is calibrated for learning suitable structures and restored character images that benefit the classification. Experimental results show that our proposed method can achieve remarkable performance in handwriting characters 8x super-resolution, approximately 10% and 20% higher than the present state-of-the-art methods respectively on benchmark data CASIA-HWDB1.1 and MNIST.

URL

http://arxiv.org/abs/1901.06199

PDF

http://arxiv.org/pdf/1901.06199
Model-Free Active Input-Output Feedback Linearization of a Single-Link Flexible Joint Manipulator: An Improved ADRC Approach

2019-01-18

Wameedh Riyadh Abdul Adheem, Ibraheem Kasim Ibraheem

arXiv_RO

arXiv_RO Knowledge
Abstract

Traditional Input-Output Feedback Linearization (IOFL) requires full knowledge of system dynamics and assumes no disturbance at the input channel and no system uncertainties. In this paper, a model-free Active Input-Output Feedback Linearization (AIOFL) technique based on an Improved Active Disturbance Rejection Control (IADRC) paradigm is proposed to design a feedback linearization control law for a generalized nonlinear system with known relative degree. The Linearization Control Law (LCL) is composed of a scaled generalized disturbance estimated by an Improved Nonlinear Extended State Observer (INLESO) with saturation-like behavior and the nominal control law produced by an Improved Nonlinear State Error Feedback (INLSEF). The proposed AIOFL cancels in real time the generalized disturbances, which represent all the unwanted dynamics, exogenous disturbances, and system uncertainties, and transforms the system into a chain of integrators up to the relative degree of the system, the only information required about the nonlinear system. Stability analysis has been conducted based on Lyapunov functions and revealed the convergence of the INLESO and the asymptotic stability of the closed-loop system. Verification of the outcomes has been achieved by applying the proposed AIOFL technique to the Single-Link Flexible Joint Manipulator (SLFJM). The simulation results validated the effectiveness of the proposed AIOFL tool based on IADRC as compared to the conventional ADRC based AIOFL and the traditional IOFL techniques.

URL

http://arxiv.org/abs/1805.00222

PDF

http://arxiv.org/pdf/1805.00222
TactileGCN: A Graph Convolutional Network for Predicting Grasp Stability with Tactile Sensors

2019-01-18

Alberto Garcia-Garcia, Brayan Stiven Zapata-Impata, Sergio Orts-Escolano, Pablo Gil, Jose Garcia-Rodriguez

arXiv_RO

arXiv_RO CNN
Abstract

Tactile sensors provide useful contact data during the interaction with an object which can be used to accurately learn to determine the stability of a grasp. Most works in the literature represent tactile readings as plain feature vectors or matrix-like tactile images, using them to train machine learning models. In this work, we explore an alternative way of exploiting tactile information to predict grasp stability by leveraging graph-like representations of tactile data, which preserve the actual spatial arrangement of the sensor’s taxels and their locality. In our experiments, we trained a Graph Neural Network to classify grasps as stable or slippery. To train such a network and prove its predictive capabilities for the problem at hand, we captured a novel dataset of approximately 5000 three-fingered grasps across 41 objects for training and 1000 grasps with 10 unknown objects for testing. Our experiments show that this novel approach can be effectively used to predict grasp stability.

URL

http://arxiv.org/abs/1901.06181

PDF

http://arxiv.org/pdf/1901.06181
Identifying Unclear Questions in Community Question Answering Websites

2019-01-18

Jan Trienes, Krisztian Balog

arXiv_CL

arXiv_CL Face Text_Classification Classification
Abstract

Thousands of complex natural language questions are submitted to community question answering websites on a daily basis, rendering them as one of the most important information sources these days. However, oftentimes submitted questions are unclear and cannot be answered without further clarification questions by expert community members. This study is the first to investigate the complex task of classifying a question as clear or unclear, i.e., if it requires further clarification. We construct a novel dataset and propose a classification approach that is based on the notion of similar questions. This approach is compared to state-of-the-art text classification baselines. Our main finding is that the similar questions approach is a viable alternative that can be used as a stepping stone towards the development of supportive user interfaces for question formulation.

URL

http://arxiv.org/abs/1901.06168

PDF

http://arxiv.org/pdf/1901.06168
Backbone Can Not be Trained at Once: Rolling Back to Pre-trained Network for Person Re-Identification

2019-01-18

Youngmin Ro, Jongwon Choi, Dae Ung Jo, Byeongho Heo, Jongin Lim, Jin Young Choi

arXiv_CV

arXiv_CV Re-identification Segmentation Pose_Estimation Person_Re-identification CNN Classification
Abstract

In the person re-identification (ReID) task, because of the shortage of training data, it is common to fine-tune a classification network pre-trained on a large dataset. However, it is relatively difficult to sufficiently fine-tune the low-level layers of the network due to the vanishing gradient problem. In this work, we propose a novel fine-tuning strategy that allows low-level layers to be sufficiently trained by rolling back the weights of high-level layers to their initial pre-trained weights. Our strategy alleviates the problem of vanishing gradients in low-level layers and robustly trains the low-level layers to fit the ReID dataset, thereby increasing the performance of ReID tasks. The improved performance of the proposed strategy is validated via several experiments. Furthermore, without any add-ons such as pose estimation or segmentation, our strategy exhibits state-of-the-art performance using only a vanilla deep convolutional neural network architecture.

URL

http://arxiv.org/abs/1901.06140

PDF

http://arxiv.org/pdf/1901.06140
Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification

2019-01-18

Weitao Feng, Zhihao Hu, Wei Wu, Junjie Yan, Wanli Ouyang

arXiv_CV

arXiv_CV Re-identification Tracking Object_Tracking Classification
Abstract

In this paper, we propose a unified Multi-Object Tracking (MOT) framework learning to make full use of long term and short term cues for handling complex cases in MOT scenes. Besides, for better association, we propose switcher-aware classification (SAC), which takes the potential identity-switch causer (switcher) into consideration. Specifically, the proposed framework includes a Single Object Tracking (SOT) sub-net to capture short term cues, a re-identification (ReID) sub-net to extract long term cues and a switcher-aware classifier to make matching decisions using extracted features from the main target and the switcher. Short term cues help to find false negatives, while long term cues avoid critical mistakes when occlusion happens, and the SAC learns to combine multiple cues in an effective way and improves robustness. The method is evaluated on the challenging MOT benchmarks and achieves the state-of-the-art results.

URL

http://arxiv.org/abs/1901.06129

PDF

http://arxiv.org/pdf/1901.06129
Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods

2019-01-18

Apratim Bhattacharyya, Mario Fritz, Bernt Schiele

arXiv_CV

arXiv_CV Inference Prediction
Abstract

For autonomous agents to successfully operate in the real world, the ability to anticipate future scene states is a key competence. In real-world scenarios, future states become increasingly uncertain and multi-modal, particularly on long time horizons. Dropout-based Bayesian inference provides a computationally tractable, theoretically well-grounded approach to learning likely hypotheses/models that deal with uncertain futures and make predictions that correspond well to observations, that is, are well calibrated. However, it turns out that such approaches fall short of capturing complex real-world scenes, even falling behind plain deterministic approaches in accuracy. This is because the log-likelihood estimate used discourages diversity. In this work, we propose a novel Bayesian formulation for anticipating future scene states which leverages synthetic likelihoods that encourage the learning of diverse models to accurately capture the multi-modal nature of future scene states. We show that our approach achieves accurate state-of-the-art predictions and calibrated probabilities through extensive experiments for scene anticipation on the Cityscapes dataset. Moreover, we show that our approach generalizes across diverse tasks such as digit generation and precipitation forecasting.

URL

http://arxiv.org/abs/1810.00746

PDF

http://arxiv.org/pdf/1810.00746
Read All
Fast High-Dimensional Kernel Filtering

2019-01-18

Pravin Nair, Kunal N. Chaudhury

arXiv_CV

arXiv_CV
Abstract

The bilateral and nonlocal means filters are instances of kernel-based filters that are popularly used in image processing. It was recently shown that fast and accurate bilateral filtering of grayscale images can be performed using a low-rank approximation of the kernel matrix. More specifically, based on the eigendecomposition of the kernel matrix, the overall filtering was approximated using spatial convolutions, for which efficient algorithms are available. Unfortunately, this technique cannot be scaled to high-dimensional data such as color and hyperspectral images. This is simply because one needs to compute/store a large matrix and perform its eigendecomposition in this case. We show how this problem can be solved using the Nyström method, which is generally used for approximating the eigendecomposition of large matrices. The resulting algorithm can also be used for nonlocal means filtering. We demonstrate the effectiveness of our proposal for bilateral and nonlocal means filtering of color and hyperspectral images. In particular, our method is shown to be competitive with state-of-the-art fast algorithms, and moreover it comes with a theoretical guarantee on the approximation error.

URL

http://arxiv.org/abs/1901.06112

PDF

http://arxiv.org/pdf/1901.06112
Read All
CRDN: Cascaded Residual Dense Networks for Dynamic MR Imaging with Edge-enhanced Loss Constraint

2019-01-18

Ziwen Ke, Shanshan Wang, Huitao Cheng, Leslie Ying, Qiegen Liu, Hairong Zheng, Dong Liang

arXiv_CV

arXiv_CV Knowledge CNN Deep_Learning
Abstract

Dynamic magnetic resonance (MR) imaging has generated great research interest, as it can provide both spatial and temporal information for clinical diagnosis. However, slow imaging speed, or long scanning time, is still one of the challenges for dynamic MR imaging. Most existing methods reconstruct dynamic MR images from incomplete k-space data under the guidance of compressed sensing (CS) or low-rank theory, and suffer from long iterative reconstruction times. Recently, deep learning has shown great potential in accelerating dynamic MR. Our previous work proposed a dynamic MR imaging method with both k-space and spatial prior knowledge integrated via multi-supervised network training. Nevertheless, the reconstructed images were still over-smoothed to a certain degree at high acceleration factors. In this work, we propose cascaded residual dense networks for dynamic MR imaging with an edge-enhanced loss constraint, dubbed CRDN. Specifically, the cascaded residual dense networks fully exploit the hierarchical features from all the convolutional layers with both local and global feature fusion. We further utilize the total variation (TV) loss function, which has edge-enhancing properties, for training the networks.
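
For readers unfamiliar with the total variation term mentioned above, here is a minimal sketch of the anisotropic TV of a 2D image, written in plain Python. How the paper weights or combines this term with its reconstruction loss is not stated in the abstract, so the function below only illustrates the quantity itself; the name `total_variation` and the toy images are assumptions.

```python
def total_variation(image):
    """Anisotropic total variation of a 2D image given as a list of rows.

    Sums the absolute differences between vertically and horizontally
    adjacent pixels.
    """
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # vertical neighbour
                tv += abs(image[r + 1][c] - image[r][c])
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(image[r][c + 1] - image[r][c])
    return tv

flat = [[1.0, 1.0], [1.0, 1.0]]   # constant image: no variation
edge = [[0.0, 1.0], [0.0, 1.0]]   # one vertical edge

tv_flat = total_variation(flat)
tv_edge = total_variation(edge)
print(tv_flat, tv_edge)
```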

URL

http://arxiv.org/abs/1901.06111

PDF

http://arxiv.org/pdf/1901.06111
Read All
Linearized ADMM and Fast Nonlocal Denoising for Efficient Plug-and-Play Restoration

2019-01-18

Unni V. S., Sanjay Ghosh, Kunal N. Chaudhury

arXiv_CV

arXiv_CV Regularization Super_Resolution Optimization
Abstract

In plug-and-play image restoration, the regularization is performed using powerful denoisers such as nonlocal means (NLM) or BM3D. This is done within the framework of alternating direction method of multipliers (ADMM), where the regularization step is formally replaced by an off-the-shelf denoiser. Each plug-and-play iteration involves the inversion of the forward model followed by a denoising step. In this paper, we present a couple of ideas for improving the efficiency of the inversion and denoising steps. First, we propose to use linearized ADMM, which generally allows us to perform the inversion at a lower cost than standard ADMM. Moreover, we can easily incorporate hard constraints into the optimization framework as a result. Second, we develop a fast algorithm for doubly stochastic NLM, originally proposed by Sreehari et al. (IEEE TCI, 2016), which is about 80x faster than brute-force computation. This particular denoiser can be expressed as the proximal map of a convex regularizer and, as a consequence, we can guarantee convergence for linearized plug-and-play ADMM. We demonstrate the effectiveness of our proposals for super-resolution and single-photon imaging.

URL

http://arxiv.org/abs/1901.06110

PDF

http://arxiv.org/pdf/1901.06110
Read All
Look before you sweep: Visibility-aware motion planning

2019-01-18

Gustavo Goretkin, Leslie Pack Kaelbling, Tomás Lozano-Pérez

arXiv_RO

arXiv_RO Detection
Abstract

This paper addresses the problem of planning for a robot with a directional obstacle-detection sensor that must move through a cluttered environment. The planning objective is to remain safe by finding a path for the complete robot, including sensor, that guarantees that the robot will not move into any part of the workspace before it has been seen by the sensor. Although a great deal of work has addressed a version of this problem in which the “field of view” of the sensor is a sphere around the robot, there is very little work addressing robots with a narrow or occluded field of view. We give a formal definition of the problem, several solution methods with different computational trade-offs, and experimental results in illustrative domains.

URL

http://arxiv.org/abs/1901.06109

PDF

http://arxiv.org/pdf/1901.06109
Read All
Exploring Semi-supervised Variational Autoencoders for Biomedical Relation Extraction

2019-01-18

Yijia Zhang, Zhiyong Lu

arXiv_CL

arXiv_CL Knowledge Relation_Extraction CNN RNN Relation Memory_Networks
Abstract

The biomedical literature provides a rich source of knowledge such as protein-protein interactions (PPIs), drug-drug interactions (DDIs) and chemical-protein interactions (CPIs). Biomedical relation extraction aims to automatically extract biomedical relations from biomedical text for various kinds of biomedical research. State-of-the-art methods for biomedical relation extraction are primarily based on supervised machine learning and therefore depend on sufficient labeled data. However, creating large sets of training data is prohibitively expensive and labor-intensive, especially in biomedicine, where domain knowledge is required. In contrast, there is a large amount of unlabeled biomedical text available in PubMed. Hence, computational methods capable of employing unlabeled data to reduce the burden of manual annotation are of particular interest in biomedical relation extraction. We present a novel semi-supervised approach based on the variational autoencoder (VAE) for biomedical relation extraction. Our model consists of three parts: a classifier, an encoder and a decoder. The classifier is implemented using multi-layer convolutional neural networks (CNNs), and the encoder and decoder are implemented using bidirectional long short-term memory networks (Bi-LSTMs) and CNNs, respectively. The semi-supervised mechanism allows our model to learn features from both the labeled and unlabeled data. We evaluate our method on multiple public PPI, DDI and CPI corpora. Experimental results show that our method effectively exploits the unlabeled data to improve performance and reduce the dependence on labeled data. To the best of our knowledge, this is the first semi-supervised VAE-based method for (biomedical) relation extraction. Our results suggest that exploiting such unlabeled data can greatly improve performance in various biomedical relation extraction tasks.

URL

http://arxiv.org/abs/1901.06103

PDF

http://arxiv.org/pdf/1901.06103
Read All
Pelee: A Real-Time Object Detection System on Mobile Devices

2019-01-18

Robert J. Wang, Xiang Li, Charles X. Ling

arXiv_CV

arXiv_CV Object_Detection CNN Deep_Learning Detection
Abstract

The increasing need to run Convolutional Neural Network (CNN) models on mobile devices with limited computing power and memory resources encourages studies on efficient model design. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and MobileNetV2. However, all these models are heavily dependent on depthwise separable convolution, which lacks an efficient implementation in most deep learning frameworks. In this study, we propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On the ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves higher accuracy and over 1.8 times faster speed than MobileNet and MobileNetV2 on NVIDIA TX2. Meanwhile, PeleeNet is only 66% of the model size of MobileNet. We then propose a real-time object detection system by combining PeleeNet with the Single Shot MultiBox Detector (SSD) method and optimizing the architecture for fast speed. Our proposed detection system, named Pelee, achieves 76.4% mAP (mean average precision) on PASCAL VOC2007 and 22.4 mAP on the MS COCO dataset at the speed of 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2. The result on COCO outperforms YOLOv2, with higher precision, 13.6 times lower computational cost, and 11.3 times smaller model size.

URL

https://arxiv.org/abs/1804.06882

PDF

https://arxiv.org/pdf/1804.06882
Read All
Theory of Minds: Understanding Behavior in Groups Through Inverse Planning

2019-01-18

Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning Inference Relation
Abstract

Human social behavior is structured by relationships. We form teams, groups, tribes, and alliances at all scales of human life. These structures guide multi-agent cooperation and competition, but when we observe others these underlying relationships are typically unobservable and hence must be inferred. Humans make these inferences intuitively and flexibly, often making rapid generalizations about the latent relationships that underlie behavior from just sparse and noisy observations. Rapid and accurate inferences are important for determining who to cooperate with, who to compete with, and how to cooperate in order to compete. Towards the goal of building machine-learning algorithms with human-like social intelligence, we develop a generative model of multi-agent action understanding based on a novel representation for these latent relationships called Composable Team Hierarchies (CTH). This representation is grounded in the formalism of stochastic games and multi-agent reinforcement learning. We use CTH as a target for Bayesian inference yielding a new algorithm for understanding behavior in groups that can both infer hidden relationships as well as predict future actions for multiple agents interacting together. Our algorithm rapidly recovers an underlying causal model of how agents relate in spatial stochastic games from just a few observations. The patterns of inference made by this algorithm closely correspond with human judgments and the algorithm makes the same rapid generalizations that people do.

URL

http://arxiv.org/abs/1901.06085

PDF

http://arxiv.org/pdf/1901.06085
Read All
DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning

2019-01-18

Sheng He, Lambert Schomaker

arXiv_CV

arXiv_CV Deep_Learning
Abstract

This paper presents a novel iterative deep learning framework and applies it to document enhancement and binarization. Unlike traditional methods, which predict the binary label of each pixel in the input image, we train the neural network to learn the degradations in document images and produce uniform versions of the degraded input images, which allows the network to refine the output iteratively. Two different iterative methods are studied in this paper: recurrent refinement (RR), which uses the same trained neural network in each iteration for document enhancement, and stacked refinement (SR), which uses a stack of different neural networks for iterative output refinement. Given the learned uniform and enhanced image, the binarization map can be easily obtained with a global or local threshold. The experimental results on several public benchmark data sets show that our proposed methods provide a new clean version of the degraded image that is suitable for visualization, and yield promising binarization results using the global Otsu’s threshold on the enhanced images learned iteratively by the neural network.
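
The final binarization step relies on the global Otsu threshold. As a reminder of how that classical threshold is computed, here is a minimal pure-Python sketch that picks the gray level maximizing the between-class variance. It covers only the thresholding step, not the paper's neural enhancement; the function name and the toy pixel list are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    `pixels` is a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the lower class
    sum0 = 0    # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy "enhanced" image: dark ink at level 10, bright paper at level 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, sum(binary))
```

On a cleanly enhanced image the histogram is close to bimodal, which is the situation in which a single global threshold separates ink from background well.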

URL

http://arxiv.org/abs/1901.06081

PDF

http://arxiv.org/pdf/1901.06081
Read All
Chinese Word Segmentation: Another Decade Review

2019-01-18

Hai Zhao, Deng Cai, Changning Huang, Chunyu Kit

arXiv_CL

arXiv_CL Review Segmentation Attention Deep_Learning Recognition
Abstract

This paper reviews the development of Chinese word segmentation (CWS) in the most recent decade, 2007-2017. Special attention is paid to the deep learning technologies that have already permeated most areas of natural language processing (NLP). The basic view we have arrived at is that, compared to traditional supervised learning methods, neural network based methods have not shown any superior performance. The most critical challenge still lies in balancing the recognition of in-vocabulary (IV) and out-of-vocabulary (OOV) words. However, as neural models have the potential to capture the essential linguistic structure of natural language, we are optimistic that significant progress may arrive in the near future.

URL

http://arxiv.org/abs/1901.06079

PDF

http://arxiv.org/pdf/1901.06079
Read All
On-field player workload exposure and knee injury risk monitoring via deep learning

2019-01-18

William R. Johnson, Ajmal Mian, David Lloyd, Jacqueline Alderson

arXiv_CV

arXiv_CV CNN Deep_Learning Prediction Relation
Abstract

In sports analytics, an understanding of accurate on-field 3D knee joint moments (KJM) could provide an early warning system for athlete workload exposure and knee injury risk. Traditionally, this analysis has relied on captive laboratory force plates and associated downstream biomechanical modeling, and many researchers have approached the problem of portability by extrapolating models built on linear statistics. An alternative approach would be to capitalize on recent advances in deep learning. In this study, using the pre-trained CaffeNet convolutional neural network (CNN) model, multivariate regression of marker-based motion capture to 3D KJM was compared across three sports-related movement types. The strongest overall mean correlation to source modeling of 0.8895 was achieved over the initial 33% of stance phase for sidestepping. The accuracy of these mean predictions of the three critical KJM associated with anterior cruciate ligament (ACL) injury demonstrates the feasibility of on-field knee injury assessment using deep learning in lieu of laboratory embedded force plates. This multidisciplinary research approach significantly advances machine representation of real-world physical models with practical application for both community and professional level athletes.
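
The abstract reports agreement between predicted and reference knee joint moments as a correlation. A minimal sketch of such a comparison follows, assuming the Pearson correlation coefficient is the measure meant (the abstract does not name it). The `pearson_r` helper and the sample values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Made-up moment curves sampled over part of a stance phase.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]   # perfectly linearly related to the reference

r = pearson_r(reference, predicted)
print(r)
```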

URL

http://arxiv.org/abs/1809.08016

PDF

http://arxiv.org/pdf/1809.08016
Read All
Learning Mutually Local-global U-nets For High-resolution Retinal Lesion Segmentation in Fundus Images

2019-01-18

Zizheng Yan, Xiaoguang Han, Changmiao Wang, Yuda Qiu, Zixiang Xiong, Shuguang Cui

arXiv_CV

arXiv_CV Segmentation
Abstract

Diabetic retinopathy is the most important complication of diabetes. Early diagnosis of retinal lesions helps to avoid visual loss or blindness. Because fundus images are high-resolution and lesion regions are small, applying existing methods, such as U-Nets, to segment fundus photographs is very challenging. Although downsampling the input images could simplify the problem, it loses detailed information. Conducting patch-level analysis helps reach fine-scale segmentation, yet usually leads to misjudgments owing to the lack of context information. In this paper, we propose an efficient network that combines the two, not only being aware of local details but also making full use of context. This is implemented by integrating the decoder parts of a global-level U-net and a patch-level one. The two streams are jointly optimized, ensuring that they enhance each other. Experimental results demonstrate that our new framework significantly outperforms existing patch-based and global-based methods, especially when the lesion regions are scattered and small.

URL

http://arxiv.org/abs/1901.06047

PDF

http://arxiv.org/pdf/1901.06047
Read All
Good Similar Patches for Image Denoising

2019-01-18

Si Lu

arXiv_CV

arXiv_CV
Abstract

Patch-based denoising algorithms like BM3D have achieved outstanding performance. An important idea for the success of these methods is to exploit the recurrence of similar patches in an input image to estimate the underlying image structures. However, in these algorithms, the similar patches used for denoising are obtained via Nearest Neighbour Search (NNS) and are sometimes not optimal. First, due to the existence of noise, NNS can select similar patches with similar noise patterns to the reference patch. Second, the unreliable noisy pixels in digital images can bring a bias to the patch searching process and result in a loss of color fidelity in the final denoising result. We observe that given a set of good similar patches, their distribution is not necessarily centered at the noisy reference patch and can be approximated by a Gaussian component. Based on this observation, we present a patch searching method that clusters similar patch candidates into patch groups using Gaussian Mixture Model-based clustering, and selects the patch group that contains the reference patch as the final patches for denoising. We also use an unreliable pixel estimation algorithm to pre-process the input noisy images to further improve the patch searching. Our experiments show that our approach can better capture the underlying patch structures and can consistently enable state-of-the-art patch-based denoising algorithms, such as BM3D, LPCA and PLOW, to better denoise images by providing them with patches found by our approach, without modifying these algorithms.

URL

http://arxiv.org/abs/1901.06046

PDF

http://arxiv.org/pdf/1901.06046
Read All
SAML-QC: a Stochastic Assessment and Machine Learning based QC technique for Industrial Printing

2019-01-18

Azhar Hussain

arXiv_CV

arXiv_CV Detection Recognition
Abstract

Recently, advances in industrial automation and high-speed printing have raised numerous challenges related to the printing quality inspection of final products. This paper proposes a machine vision based technique to assess the printing quality of text on industrial objects. The assessment is based on three quality defects: text misalignment, varying printing shades, and misprinted text. The proposed scheme performs the quality inspection through a stochastic assessment technique based on the second-order statistics of printing. First, the text-containing area on the printed product is identified through image processing techniques. Second, the alignment of the identified text-containing area is tested. Third, optical character recognition is performed to divide the text into small boxes; the intensity value of each text-containing box is taken as a random variable, and second-order statistics are estimated to determine the varying printing defects in the text under one, two and three sigma thresholds. Fourth, K-Nearest Neighbors based supervised machine learning is performed to provide the stochastic process for misprinted text detection. Finally, the technique is deployed on an industrial image for printing quality assessment with varying values of n and m. The results show that the proposed SAML-QC technique can perform real-time automated inspection for industrial printing.
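
One way to read the sigma-threshold step for varying printing shades is sketched below: treat the mean intensity of each text box as a sample, estimate mean and standard deviation, and flag boxes that deviate by more than k standard deviations. This is an interpretation of the abstract, not the paper's exact procedure; `flag_shade_defects` and the intensity values are hypothetical.

```python
import statistics

def flag_shade_defects(box_intensities, k=2.0):
    """Return indices of boxes whose intensity lies more than k sigma from the mean."""
    mean = statistics.mean(box_intensities)
    sigma = statistics.pstdev(box_intensities)  # population standard deviation
    return [i for i, value in enumerate(box_intensities)
            if abs(value - mean) > k * sigma]

# Hypothetical mean intensities of ten character boxes; the last one is printed too light.
intensities = [100, 102, 98, 101, 99, 100, 101, 99, 100, 160]

flagged_1_sigma = flag_shade_defects(intensities, k=1.0)
flagged_2_sigma = flag_shade_defects(intensities, k=2.0)
print(flagged_1_sigma, flagged_2_sigma)
```

A larger k makes the check more tolerant, which is presumably why the abstract evaluates one, two and three sigma thresholds.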

URL

http://arxiv.org/abs/1901.07370

PDF

http://arxiv.org/pdf/1901.07370
Read All
Automatic Keyboard Layout Design for Low-Resource Latin-Script Languages

2019-01-18

Theresa Breiner, Chieu Nguyen, Daan van Esch, Jeremy O'Brien

arXiv_CL

arXiv_CL
Abstract

We present our approach to automatically designing and implementing keyboard layouts on mobile devices for typing low-resource languages written in the Latin script. For many speakers, one of the barriers in accessing and creating text content on the web is the absence of input tools for their language. Ease in typing in these languages would lower technological barriers to online communication and collaboration, likely leading to the creation of more web content. Unfortunately, it can be time-consuming to develop layouts manually even for language communities that use a keyboard layout very similar to English; starting from scratch requires many configuration files to describe multiple possible behaviors for each key. With our approach, we only need a small amount of data in each language to generate keyboard layouts with very little human effort. This process can help serve speakers of low-resource languages in a scalable way, allowing us to develop input tools for more languages. Having input tools that reflect the linguistic diversity of the world will let as many people as possible use technology to learn, communicate, and express themselves in their own native languages.

URL

http://arxiv.org/abs/1901.06039

PDF

http://arxiv.org/pdf/1901.06039
Read All
High-speed Video from Asynchronous Camera Array

2019-01-17

Si Lu

arXiv_CV

arXiv_CV
Abstract

This paper presents a method for capturing high-speed video using an asynchronous camera array. Our method sequentially fires each sensor in a camera array with a small time offset and assembles captured frames into a high-speed video according to the time stamps. The resulting video, however, suffers from parallax jittering caused by the viewpoint difference among sensors in the camera array. To address this problem, we develop a dedicated novel view synthesis algorithm that transforms the video frames as if they were captured by a single reference sensor. Specifically, for any frame from a non-reference sensor, we find the two temporally neighboring frames captured by the reference sensor. Using these three frames, we render a new frame with the same time stamp as the non-reference frame but from the viewpoint of the reference sensor. Specifically, we segment these frames into super-pixels and then apply local content-preserving warping to warp them to form the new frame. We employ a multi-label Markov Random Field method to blend these warped frames. Our experiments show that our method can produce high-quality and high-speed video of a wide variety of scenes with large parallax, scene dynamics, and camera motion and outperforms several baseline and state-of-the-art approaches.
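
The frame-assembly step (before any view synthesis) amounts to merging per-sensor frame streams by time stamp. A minimal sketch follows, assuming each frame is a `(timestamp_ms, sensor_id)` tuple and each sensor's stream is already sorted; the tuple layout, the timing values, and `assemble_high_speed` are illustrative assumptions.

```python
import heapq

def assemble_high_speed(streams):
    """Merge per-sensor frame lists (each sorted by time stamp) into one sorted sequence."""
    return list(heapq.merge(*streams, key=lambda frame: frame[0]))

# Three hypothetical sensors, each capturing every 33 ms, fired 11 ms apart.
streams = [
    [(33 * i + 11 * sensor, sensor) for i in range(3)]
    for sensor in range(3)
]

video = assemble_high_speed(streams)
timestamps = [t for t, _ in video]
sensors = [s for _, s in video]
print(timestamps)
print(sensors)
```

The merged sequence has three times the frame rate of any single sensor, but consecutive frames come from different viewpoints, which is the parallax jitter the paper's view synthesis is designed to remove.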

URL

http://arxiv.org/abs/1901.06034

PDF

http://arxiv.org/pdf/1901.06034
Read All
A Survey of the Recent Architectures of Deep Convolutional Neural Networks

2019-01-17

Asifullah Khan, Anabia Sohail, Umme Zahoora, Aqsa Saeed Qureshi

arXiv_CV

arXiv_CV Regularization Attention Survey CNN Optimization
Abstract

Deep Convolutional Neural Networks (CNNs) are a special type of Neural Networks, which have shown state-of-the-art results on various competitive benchmarks. The powerful learning ability of deep CNNs is largely achieved with the use of multiple non-linear feature extraction stages that can automatically learn hierarchical representations from the data. The availability of a large amount of data and improvements in hardware processing units have accelerated research in CNNs, and recently very interesting deep CNN architectures have been reported. The recent race in deep CNN architectures for achieving high performance on challenging benchmarks has shown that innovative architectural ideas, as well as parameter optimization, can improve CNN performance on various vision-related tasks. In this regard, different ideas in CNN design have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and restructuring of processing units. However, the major improvement in representational capacity is achieved by restructuring the processing units. In particular, the idea of using a block as a structural unit instead of a layer is gaining substantial appreciation. This survey thus focuses on the intrinsic taxonomy present in the recently reported CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories. These seven categories are based on spatial exploitation, depth, multi-path, width, feature map exploitation, channel boosting and attention. Additionally, it covers the elementary understanding of CNN components and sheds light on the current challenges and applications of CNNs.

URL

http://arxiv.org/abs/1901.06032

PDF

http://arxiv.org/pdf/1901.06032
Read All
Scale-Aware Attention Network for Crowd Counting

2019-01-17

Rahul Rama Varior, Bing Shuai, Joe Tighe, Davide Modolo

arXiv_CV

arXiv_CV Attention CNN Prediction
Abstract

In crowd counting datasets, people appear at different scales, depending on their distance to the camera. To address this issue, we propose a novel multi-branch scale-aware attention network that exploits the hierarchical structure of convolutional neural networks and generates, in a single forward pass, multi-scale density predictions from different layers of the architecture. To aggregate these maps into our final prediction, we present a new soft attention mechanism that learns a set of gating masks. Furthermore, we introduce a scale-aware loss function to regularize the training of different branches and guide them to specialize on a particular scale. As this new training requires ground-truth annotations for the size of each head, we also propose a simple, yet effective technique to estimate it automatically. Finally, we present an ablation study on each of these components and compare our approach against the literature on 4 crowd counting datasets: UCF-QNRF, ShanghaiTech A & B and UCF_CC_50. Without bells and whistles, our approach achieves state-of-the-art results on all these datasets. We observe a remarkable improvement on UCF-QNRF (25%) and a significant one on the others (around 10%).
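
To make the aggregation step concrete, here is a minimal sketch of fusing per-branch density maps with soft gating masks. It assumes the masks are obtained by a per-pixel softmax over branch logits, which the abstract does not specify; the flattened-list layout, `fuse_density_maps`, and the numbers are illustrative only.

```python
import math

def fuse_density_maps(density_maps, gate_logits):
    """Fuse B flattened density maps using a per-pixel softmax over B gate logits."""
    fused = []
    for px in range(len(density_maps[0])):
        logits = [branch[px] for branch in gate_logits]
        peak = max(logits)                          # subtract max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        norm = sum(exps)
        fused.append(sum(e / norm * branch[px]
                         for e, branch in zip(exps, density_maps)))
    return fused

# Two hypothetical branches predicting density on a two-pixel image.
density_maps = [[1.0, 0.0], [3.0, 2.0]]
gate_logits = [[0.0, 0.0], [0.0, 0.0]]   # equal logits give equal weights

fused = fuse_density_maps(density_maps, gate_logits)
count = sum(fused)   # the crowd count is the integral of the density map
print(fused, count)
```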

URL

http://arxiv.org/abs/1901.06026

PDF

http://arxiv.org/pdf/1901.06026
Read All
Analyzing Covariate Influence on Gender and Race Prediction from Near-Infrared Ocular Images

2019-01-17

Denton Bobeldyk, Arun Ross

arXiv_CV

arXiv_CV Review Face Prediction Recognition
Abstract

Recent research has explored the possibility of automatically deducing information such as gender, age and race of an individual from their biometric data. While the face modality has been extensively studied in this regard, the iris modality has been studied less. In this paper, we first review the medical literature to establish a biological basis for extracting gender and race cues from the iris. Then, we demonstrate that it is possible to use simple texture descriptors, like BSIF (Binarized Statistical Image Feature) and LBP (Local Binary Patterns), to extract gender and race attributes from an NIR ocular image used in a typical iris recognition system. The proposed method predicts gender and race from a single eye image with an accuracy of 86% and 90%, respectively. In addition, the following analyses are conducted: (a) the role of different parts of the ocular region on attribute prediction; (b) the influence of gender on race prediction, and vice-versa; (c) the impact of eye color on gender and race prediction; (d) the impact of image blur on gender and race prediction; (e) the generalizability of the method across different datasets; and (f) the consistency of prediction performance across the left and right eyes.
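
For reference, the basic LBP descriptor mentioned above can be sketched in a few lines: each pixel is encoded by comparing its eight neighbours with the centre value. The bit ordering (clockwise from the top-left neighbour) is one common convention chosen here as an assumption; the paper's exact LBP variant and parameters are not given in the abstract.

```python
def lbp_code(patch):
    """8-bit Local Binary Pattern code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbour positions, clockwise starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code

patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch)
print(code)
```

A texture descriptor is then typically built from the histogram of these codes over an image region and passed to a classifier.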

URL

http://arxiv.org/abs/1805.01912

PDF

http://arxiv.org/pdf/1805.01912
Read All
Facial Landmark Point Localization using Coarse-to-Fine Deep Recurrent Neural Network

2019-01-17

Shahar Mahpod, Rig Das, Emanuele Maiorana, Yosi Keller, Patrizio Campisi

arXiv_CV

arXiv_CV Face Deep_Learning Recognition Face_Recognition
Abstract

The accurate localization of facial landmarks is at the core of face analysis tasks, such as face recognition and facial expression analysis, to name a few. In this work we propose a novel localization approach based on a Deep Learning architecture that utilizes dual cascaded CNN subnetworks of the same length, where each subnetwork in a cascade refines the accuracy of its predecessor. The first set of cascaded subnetworks estimates heatmaps that encode the landmarks’ locations, while the second set of cascaded subnetworks refines the heatmaps-based localization using regression, and also receives as input the output of the corresponding heatmap estimation subnetwork. The proposed scheme is experimentally shown to compare favorably with contemporary state-of-the-art schemes.

URL

http://arxiv.org/abs/1805.01760

PDF

http://arxiv.org/pdf/1805.01760
Read All
FARSA: Fully Automated Roadway Safety Assessment

2019-01-17

Weilian Song, Scott Workman, Armin Hadzic, Xu Zhang, Eric Green, Mei Chen, Reginald Souleyrette, Nathan Jacobs

arXiv_CV

arXiv_CV Attention CNN
Abstract

This paper addresses the task of road safety assessment. An emerging approach for conducting such assessments in the United States is through the US Road Assessment Program (usRAP), which rates roads from highest risk (1 star) to lowest (5 stars). Obtaining these ratings requires manual, fine-grained labeling of roadway features in street-level panoramas, a slow and costly process. We propose to automate this process using a deep convolutional neural network that directly estimates the star rating from a street-level panorama, requiring milliseconds per image at test time. Our network also estimates many other road-level attributes, including curvature, roadside hazards, and the type of median. To support this, we incorporate task-specific attention layers so the network can focus on the panorama regions that are most useful for a particular task. We evaluated our approach on a large dataset of real-world images from two US states. We found that incorporating additional tasks, and using a semi-supervised training approach, significantly reduced overfitting problems, allowed us to optimize more layers of the network, and resulted in higher accuracy.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.06013

PDF

http://arxiv.org/pdf/1901.06013
Instance-Level Microtubule Segmentation Using Recurrent Attention

2019-01-17

Samira Masoudi, Afsaneh Razi, Cameron H.G. Wright, Jay C. Gatlin, Ulas Bagci

arXiv_CV

arXiv_CV Segmentation Attention Deep_Learning
Abstract

We propose a new deep learning algorithm for multiple microtubule (MT) segmentation in time-lapse images using recurrent attention. Segmentation results from each pair of succeeding frames are fed into a Hungarian algorithm to assign correspondences among MTs and generate a distinct path through the frames. Based on the obtained trajectories, we calculate MT velocities. The results of this work are expected to help biologists characterize MT behaviors as well as their potential interactions. To validate our technique, we first use the statistics derived from the real time-lapse series of MT gliding assays to produce a large set of simulated data. We employ this dataset to train our network and optimize its hyperparameters. Then, we utilize the trained model to initialize the network while learning about the real data. Our experimental results show that the proposed algorithm improves the precision for MT instance velocity estimation to 71.3% from the baseline result (29.3%). We also demonstrate how the injection of temporal information into our network can reduce the false negative rates from 67.8% (baseline) down to 28.7% (proposed).
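The frame-to-frame matching step mentioned in this abstract can be sketched as a minimum-cost assignment between instance centroids, followed by a speed estimate per matched pair. This sketch does not implement the Hungarian algorithm itself: it uses a brute-force search over permutations, which reaches the same optimum for a handful of instances. The centroid representation, the Euclidean cost, and the names `match_instances` and `velocities` are assumptions for illustration, not details from the paper.

```python
from itertools import permutations
from math import hypot


def match_instances(prev_pts, next_pts):
    """Return (i, j) pairs that minimize the total centroid distance.

    Brute-force stand-in for the Hungarian algorithm; assumes
    len(prev_pts) <= len(next_pts) and only a few instances per frame.
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(next_pts)), len(prev_pts)):
        cost = sum(
            hypot(prev_pts[i][0] - next_pts[j][0], prev_pts[i][1] - next_pts[j][1])
            for i, j in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm))


def velocities(prev_pts, next_pts, dt=1.0):
    """Speed of each matched instance: displacement divided by the frame interval."""
    return {
        i: hypot(next_pts[j][0] - prev_pts[i][0], next_pts[j][1] - prev_pts[i][1]) / dt
        for i, j in match_instances(prev_pts, next_pts)
    }


# Two hypothetical instance centroids in consecutive frames (arbitrary units).
prev_frame = [(0.0, 0.0), (10.0, 10.0)]
next_frame = [(10.0, 13.0), (3.0, 4.0)]
pairs = match_instances(prev_frame, next_frame)
speeds = velocities(prev_frame, next_frame, dt=1.0)
```

Here instance 0 is matched to the second detection and instance 1 to the first, even though the detections arrive in the opposite order. For more than a few instances, a polynomial-time solver would be the practical choice.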

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.06006

PDF

http://arxiv.org/pdf/1901.06006
PSACNN: Pulse Sequence Resilient Fast Whole Brain Segmentation

2019-01-17

Amod Jog, Andrew Hoopes, Douglas N. Greves, Koen Van Leemput, Bruce Fischl

arXiv_CV

arXiv_CV Segmentation CNN
Abstract

With the advent of convolutional neural networks (CNN), supervised learning methods are increasingly being used for whole brain segmentation. However, a large, manually annotated training dataset of labeled brain images required to train such supervised methods is frequently difficult to obtain or create. In addition, existing training datasets are generally acquired with a homogeneous magnetic resonance imaging (MRI) acquisition protocol. CNNs trained on such datasets are unable to generalize on test data with different acquisition protocols. Modern neuroimaging studies and clinical trials are necessarily multi-center initiatives with a wide variety of acquisition protocols. Despite stringent protocol harmonization practices, it is very difficult to standardize the gamut of MRI imaging parameters across scanners, field strengths, receive coils etc., that affect image contrast. In this paper we propose a CNN-based segmentation algorithm that, in addition to being highly accurate and fast, is also resilient to variation in the input acquisition. Our approach relies on building approximate forward models of pulse sequences that produce a typical test image. For a given pulse sequence, we use its forward model to generate plausible, synthetic training examples that appear as if they were acquired in a scanner with that pulse sequence. Sampling over a wide variety of pulse sequences results in a wide variety of augmented training examples that help build an image contrast invariant model. Our method trains a single CNN that can segment input MRI images with acquisition parameters as disparate as $T_1$-weighted and $T_2$-weighted contrasts with only $T_1$-weighted training data. The segmentations generated are highly accurate with state-of-the-art results (overall Dice overlap $= 0.94$), with a fast run time ($\approx$ 45 seconds), and consistent across a wide range of acquisition protocols.
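For readers unfamiliar with the Dice overlap quoted in this abstract, the following minimal sketch computes it for one label as twice the intersection divided by the sum of the two region sizes. The flat label lists, the function name `dice_overlap`, and the convention of returning 1.0 when a label is absent from both inputs are assumptions made for illustration; the paper's own evaluation protocol may differ.

```python
def dice_overlap(pred, truth, label):
    """Dice coefficient 2*|A & B| / (|A| + |B|) for one label.

    pred and truth are equal-length flat sequences of voxel labels.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    size_pred = sum(1 for p in pred if p == label)
    size_truth = sum(1 for t in truth if t == label)
    intersection = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if size_pred + size_truth == 0:
        # Assumed convention: a label missing from both counts as perfect agreement.
        return 1.0
    return 2.0 * intersection / (size_pred + size_truth)


# Toy six-voxel example with three labels (0, 1, 2).
pred = [0, 1, 1, 2, 2, 2]
truth = [0, 1, 2, 2, 2, 2]
dice_label_1 = dice_overlap(pred, truth, 1)  # 2*1 / (2 + 1)
dice_label_2 = dice_overlap(pred, truth, 2)  # 2*3 / (3 + 4)
```

An overall score such as the one reported is typically an aggregate over many labels and subjects; how that aggregate is formed is not stated in the abstract.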

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05992

PDF

http://arxiv.org/pdf/1901.05992
Semantic Nighttime Image Segmentation with Synthetic Stylized Data, Gradual Adaptation and Uncertainty-Aware Evaluation

2019-01-17

Christos Sakaridis, Dengxin Dai, Luc Van Gool

arXiv_CV

arXiv_CV Segmentation Style_Transfer Semantic_Segmentation Prediction
Abstract

This work addresses the problem of semantic segmentation of nighttime images. The main direction of recent progress in semantic segmentation pertains to daytime scenes with favorable illumination conditions. We focus on improving the performance of state-of-the-art methods on the nighttime domain by adapting them to nighttime data without extra annotations, and designing a new evaluation framework to address the uncertainty of semantics in nighttime images. To this end, we make the following contributions: 1) a novel pipeline for dataset-scale guided style transfer to generate synthetic nighttime images from real daytime input; 2) a framework to gradually adapt semantic segmentation models from day to night via stylized and real images of progressively increasing darkness; 3) a novel uncertainty-aware annotation and evaluation framework and metric for semantic segmentation in adverse conditions; 4) the Dark Zurich dataset with 2416 nighttime and 2920 twilight unlabeled images plus 20 nighttime images with pixel-level annotations that conform to our newly proposed evaluation. Our experiments show that both our stylized data per se and our gradual adaptation significantly boost performance at nighttime, both for standard evaluation metrics and for our metric. Moreover, our new evaluation reveals that state-of-the-art segmentation models output overly confident predictions at indiscernible regions compared to visible ones.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05946

PDF

https://arxiv.org/pdf/1901.05946
Foreground-aware Image Inpainting

2019-01-17

Wei Xiong, Zhe Lin, Jimei Yang, Xin Lu, Connelly Barnes, Jiebo Luo

arXiv_CV

arXiv_CV Inference
Abstract

Existing image inpainting methods typically fill holes by borrowing information from surrounding image regions. They often produce unsatisfactory results when the holes overlap with or touch foreground objects due to lack of information about the actual extent of foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such as distracting object removal. To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference and content completion. Specifically, our model learns to predict the foreground contour first, and then inpaints the missing region using the predicted contour as guidance. We show that by this disentanglement, the contour completion model predicts reasonable contours of objects, and further substantially improves the performance of image inpainting. Experiments show that our method significantly outperforms existing methods and achieves superior inpainting results on challenging cases with complex compositions.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05945

PDF

https://arxiv.org/pdf/1901.05945
Sign Language Representation by TEO Humanoid Robot: End-User Interest, Comprehension and Satisfaction

2019-01-17

Jennifer J. Gago, Juan G. Victores, Carlos Balaguer

arXiv_RO

arXiv_RO
Abstract

In this paper, we illustrate our work on improving the accessibility of Cyber-Physical Systems (CPS), presenting a study on human-robot interaction where the end-users are either deaf or hearing-impaired people. Current trends in robotic designs include devices with robotic arms and hands capable of performing manipulation and grasping tasks. This paper focuses on how these devices can be used for a different purpose, which is that of enabling robotic communication via sign language. For the study, several tests and questionnaires are run to check and measure how end-users feel about interpreting sign language represented by a humanoid robotic assistant as opposed to subtitles on a screen. Stemming from this dichotomy, dactylology, basic vocabulary representation and end-user satisfaction are the main topics covered by a delivered form, in which additional comments are valued and taken into consideration for further decision making regarding robot-human interaction. The experiments were performed using TEO, a household companion humanoid robot developed at the University Carlos III de Madrid (UC3M), via representations in Spanish Sign Language (LSE), and a total of 16 deaf and hearing-impaired participants.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05939

PDF

http://arxiv.org/pdf/1901.05939
Resource-Aware Algorithms for Distributed Loop Closure Detection with Provable Performance Guarantees

2019-01-17

Yulun Tian, Kasra Khosoussi, Jonathan P. How

arXiv_RO

arXiv_RO Detection SLAM
Abstract

Inter-robot loop closure detection, e.g., for collaborative simultaneous localization and mapping (CSLAM), is a fundamental capability for many multirobot applications in GPS-denied regimes. In real-world scenarios, this is a resource-intensive process that involves exchanging observations and verifying potential matches. This poses severe challenges especially for small-size and low-cost robots with various operational and resource constraints that limit, e.g., energy consumption, communication bandwidth, and computation capacity. This paper presents resource-aware algorithms for distributed inter-robot loop closure detection. In particular, we seek to select a subset of potential inter-robot loop closures that maximizes a monotone submodular performance metric without exceeding computation and communication budgets. We demonstrate that this problem is in general NP-hard, and present efficient approximation algorithms with provable performance guarantees. A convex relaxation scheme is used to certify near-optimal performance of the proposed framework in real and synthetic SLAM benchmarks.
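To make the selection problem in this abstract concrete, here is a minimal sketch of a generic cost-benefit greedy heuristic for picking candidates under a single budget. It is not the paper's algorithm, and no claim is made about its approximation guarantee. The objective is a simple coverage count (monotone and submodular), and the candidate names, costs, covered items and the function `greedy_select` are all hypothetical.

```python
def greedy_select(candidates, budget):
    """Greedily pick candidates by marginal coverage gain per unit cost.

    candidates maps a name to (cost, set_of_covered_items); costs must be > 0.
    Returns (chosen names in pick order, covered items, total cost spent).
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(candidates)
    while remaining:
        best_name, best_ratio = None, 0.0
        for name, (cost, items) in remaining.items():
            if spent + cost > budget:
                continue  # would exceed the budget
            ratio = len(items - covered) / cost  # marginal gain per unit cost
            if ratio > best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            break  # nothing affordable adds new coverage
        cost, items = remaining.pop(best_name)
        chosen.append(best_name)
        covered |= items
        spent += cost
    return chosen, covered, spent


# Hypothetical candidate loop closures: (communication cost, items they cover).
candidates = {
    "a": (2.0, {1, 2, 3}),
    "b": (1.0, {3, 4}),
    "c": (3.0, {5, 6, 7, 8}),
    "d": (1.0, {1}),
}
chosen, covered, spent = greedy_select(candidates, budget=4.0)
```

With a budget of 4.0 the heuristic first takes "b" (best gain per cost), then "c", and stops because neither remaining candidate fits. The abstract describes separate computation and communication budgets; this sketch collapses them into one for brevity.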

Abstract (translated by Google)

URL

http://arxiv.org/abs/1901.05925

PDF

http://arxiv.org/pdf/1901.05925
EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search

2019-01-17

Jiemin Fang, Yukang Chen, Xinbang Zhang, Qian Zhang, Chang Huang, Gaofeng Meng, Wenyu Liu, Xinggang Wang

arXiv_CV

arXiv_CV
Abstract

Neural architecture search (NAS) methods have been proposed to release human experts from tedious architecture engineering. However, most current methods are constrained to small-scale search due to limited computational resources. Meanwhile, directly applying architectures searched on small datasets to large-scale tasks often bears no performance guarantee. This limitation impedes the wide use of NAS on large-scale tasks. To overcome this obstacle, we propose an elastic architecture transfer mechanism for accelerating large-scale neural architecture search (EAT-NAS). In our implementations, architectures are first searched on a small dataset (the width and depth of architectures are taken into consideration as well), e.g., CIFAR-10, and the best is chosen as the basic architecture. Then the whole architecture is transferred with elasticity. We accelerate the search process on a large-scale dataset, e.g., the whole ImageNet dataset, with the help of the basic architecture. What we propose is not only a NAS method but a mechanism for architecture-level transfer. In our experiments, we obtain two final models, EATNet-A and EATNet-B, that achieve competitive accuracies of 73.8% and 73.7% on ImageNet, respectively, and that also surpass the models searched from scratch on ImageNet under the same settings. In terms of computational cost, EAT-NAS takes less than 5 days on 8 TITAN X GPUs, which is significantly less than the computational consumption of the state-of-the-art large-scale NAS methods.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05884

PDF

https://arxiv.org/pdf/1901.05884
UltraCompression: Framework for High Density Compression of Ultrasound Volumes using Physics Modeling Deep Neural Networks

2019-01-17

Debarghya China, Francis Tom, Sumanth Nandamuri, Aupendu Kar, Mukundhan Srinivasan, Pabitra Mitra, Debdoot Sheet

arXiv_CV

arXiv_CV Segmentation CNN
Abstract

Ultrasound image compression by preserving speckle-based key information is a challenging task. In this paper, we introduce an ultrasound image compression framework with the ability to retain realism of speckle appearance despite achieving very high-density compression factors. The compressor employs a tissue segmentation method, transmitting segments along with transducer frequency, number of samples and image size as essential information required for decompression. The decompressor is based on a convolutional network trained to generate patho-realistic ultrasound images which convey essential information pertinent to tissue pathology visible in the images. We demonstrate generalizability of the building blocks using two variants to build the compressor. We have evaluated the quality of decompressed images using distortion losses as well as perception loss and compared it with other off-the-shelf solutions. The proposed method achieves a compression ratio of $725:1$ while preserving the statistical distribution of speckles. This enables image segmentation on decompressed images to achieve a Dice score of $0.89 \pm 0.11$, which evidently is not so accurately achievable when images are compressed with current standards like JPEG, JPEG 2000, WebP and BPG. We envision this framework to serve as a roadmap for speckle image compression standards.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1901.05880

PDF

https://arxiv.org/pdf/1901.05880
