Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Unsupervised Data Uncertainty Learning in Visual Retrieval Systems

2019-02-07

Ahmed Taha, Yi-Ting Chen, Teruhisa Misu, Abhinav Shrivastava, Larry Davis

arXiv_CV

arXiv_CV Embedding
Abstract

We introduce an unsupervised formulation to estimate heteroscedastic uncertainty in retrieval systems. We propose an extension to triplet loss that models data uncertainty for each input. Besides improving performance, our formulation models local noise in the embedding space. It quantifies input uncertainty and thus enhances interpretability of the system. This helps identify noisy observations in query and search databases. Evaluations on both image and video retrieval applications highlight the utility of our approach. We highlight our efficiency in modeling local noise using two real-world datasets: the Clothing1M and Honda Driving datasets. Qualitative results illustrate our ability to identify confusing scenarios in various domains. Uncertainty learning also enables data cleaning by detecting noisy training labels.
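For readers unfamiliar with the loss being extended, here is a minimal sketch of the standard triplet loss only. The abstract does not spell out the uncertainty extension, so it is not reproduced here; the function names and the `margin` value are illustrative assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin of 0.2 is an arbitrary illustrative value.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Easy triplet: the negative is already far from the anchor, so the loss is zero.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])
# Hard triplet: the negative is almost as close as the positive, so the loss is positive.
hard = triplet_loss([0.0, 0.0], [0.5, 0.0], [0.6, 0.0])
```

A noisy input tends to produce "hard" triplets regardless of the embedding, which is the kind of case a per-input uncertainty term is meant to flag.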

URL

http://arxiv.org/abs/1902.02586

PDF

http://arxiv.org/pdf/1902.02586
Illumination Invariant Foreground Object Segmentation using ForeGANs

2019-02-07

Maryam Sultana, Soon Ki Jung

arXiv_CV

arXiv_CV Adversarial Segmentation GAN
Abstract

Foreground segmentation algorithms suffer performance degradation in the presence of challenges such as dynamic backgrounds and varying illumination conditions. To handle these challenges, we present a foreground segmentation method based on a generative adversarial network (GAN). We aim to segment foreground objects in the presence of these two major challenges in background scenes in real environments. To address this problem, our GAN model is trained on background image samples with various illumination conditions, including dynamic changes. At test time, the GAN model has to generate, via back-propagation, a background sample that matches the test sample under similar illumination conditions. The generated background sample is then subtracted from the given test sample to segment foreground objects. We also propose a dataset for this problem containing time-lapse video sequences captured from dawn until dusk. A comparison of our proposed method with five state-of-the-art methods highlights the strength of our algorithm for foreground segmentation in the presence of challenging illumination conditions and dynamic backgrounds.
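The final subtraction step can be sketched as follows. This is only an illustration: the `background` array stands in for the GAN-generated background (the generator itself is not reproduced), and the `threshold` value and function name are assumptions rather than details from the paper.

```python
def foreground_mask(frame, background, threshold=30):
    """Mark a pixel as foreground (1) when its absolute difference from the
    background estimate exceeds the threshold, otherwise background (0).

    Both inputs are 2-D lists of grayscale intensities with the same shape.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]

# Stand-in for a generated background with matching illumination.
background = [[100, 100, 100], [100, 100, 100]]
# Test frame: the middle column contains an object, the rest is small noise.
frame = [[102, 180, 99], [100, 20, 101]]
mask = foreground_mask(frame, background)
```

The better the generated background matches the illumination of the test frame, the smaller the differences on true background pixels and the cleaner the resulting mask.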

URL

http://arxiv.org/abs/1902.03120

PDF

http://arxiv.org/pdf/1902.03120
License Plate Recognition with Compressive Sensing Based Feature Extraction

2019-02-07

Andrej Jokic, Nikola Vukovic

arXiv_CV

arXiv_CV Classification Recognition
Abstract

License plate recognition is a key component of many automatic traffic control systems. It enables the automatic identification of vehicles in many applications. Such systems must be able to identify vehicles from images taken in various conditions, including low light, rain, and snow. In order to reduce the complexity and cost of the hardware required for such devices, the algorithm should be as efficient as possible. This paper proposes a license plate recognition system which uses a new approach based on compressive sensing techniques for dimensionality reduction and feature extraction. Dimensionality reduction will enable precise classification with less training data while demanding less computational power. Based on the extracted features, character recognition and classification are done by a Support Vector Machine classifier.
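As a rough illustration of compressive-sensing-style dimensionality reduction, the sketch below projects a flattened character image onto a small number of random measurements. The random Gaussian measurement matrix, the sizes, and the function names are assumptions chosen for illustration; the abstract does not state which measurement operator the authors use, and the SVM stage is omitted.

```python
import random

def measurement_matrix(m, n, seed=0):
    """Build an m x n random Gaussian matrix scaled by 1/sqrt(m),
    a common (assumed) choice of compressive sensing operator."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def compress(phi, x):
    """Compute the measurement vector y = phi * x for a length-n signal x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]

# Stand-in for a flattened 8x8 character image (64 values).
pixels = [float(i % 7) for i in range(64)]
phi = measurement_matrix(16, 64)
# 16 features instead of 64 raw pixels; these would be fed to the classifier.
features = compress(phi, pixels)
```

The classifier then works on `features`, which is a quarter of the original size in this toy setting.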

URL

http://arxiv.org/abs/1902.05386

PDF

http://arxiv.org/pdf/1902.05386
Learning to detect chest radiographs containing lung nodules using visual attention networks

2019-02-07

Emanuele Pesce, Petros-Pavlos Ypsilantis, Samuel Withey, Robert Bakewell, Vicky Goh, Giovanni Montana

arXiv_CV

arXiv_CV Salient Attention Reinforcement_Learning CNN Classification Detection
Abstract

Machine learning approaches hold great potential for the automated detection of lung nodules in chest radiographs, but training the algorithms requires very large amounts of manually annotated images, which are difficult to obtain. Weak labels indicating whether a radiograph is likely to contain pulmonary nodules are typically easier to obtain at scale by parsing historical free-text radiological reports associated with the radiographs. Using a repository of over 700,000 chest radiographs, in this study we demonstrate that promising nodule detection performance can be achieved using weak labels through convolutional neural networks for radiograph classification. We propose two network architectures for the classification of images likely to contain pulmonary nodules using both weak labels and manually-delineated bounding boxes, when these are available. Annotated nodules are used at training time to deliver a visual attention mechanism informing the model about its localisation performance. The first architecture extracts saliency maps from high-level convolutional layers and compares the estimated position of a nodule against the ground truth, when this is available. A corresponding localisation error is then back-propagated along with the softmax classification error. The second approach consists of a recurrent attention model that learns to observe a short sequence of smaller image portions through reinforcement learning. When a nodule annotation is available at training time, the reward function is modified accordingly so that exploring portions of the radiographs away from a nodule incurs a larger penalty. Our empirical results demonstrate the potential advantages of these architectures in comparison to competing methodologies.

URL

http://arxiv.org/abs/1712.00996

PDF

http://arxiv.org/pdf/1712.00996
Fully Convolutional Neural Network for Semantic Segmentation of Anatomical Structure and Pathologies in Colour Fundus Images Associated with Diabetic Retinopathy

2019-02-07

Oindrila Saha, Rachana Sathish, Debdoot Sheet

arXiv_CV

arXiv_CV Segmentation CNN Semantic_Segmentation Classification Prediction Detection
Abstract

Diabetic retinopathy (DR) is the most common form of diabetic eye disease. Retinopathy can affect all diabetic patients and becomes particularly dangerous, increasing the risk of blindness, if it is left untreated. The success rate of treatment depends solely on diagnosis at an early stage. The development of automated computer-aided disease diagnosis tools could help in faster detection of symptoms with a wider reach and reasonable cost. This paper proposes a method for the automated segmentation of retinal lesions and the optic disk in fundus images using a deep fully convolutional neural network for semantic segmentation. This trainable segmentation pipeline consists of an encoder network and a corresponding decoder network, followed by pixel-wise classification to segment microaneurysms, hemorrhages, hard exudates, soft exudates, and the optic disk from the background. The network was trained using a binary cross-entropy criterion with Sigmoid as the last layer, while during testing an additional SoftMax layer was used for boosting the response of a single class. The performance of the proposed method is evaluated using sensitivity, positive prediction value (PPV) and accuracy as the metrics. Further, the position of the optic disk is localised using the segmented output map.
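The three evaluation metrics named above have standard definitions, sketched here for a single binary mask. The function name and the toy masks are illustrative assumptions; the abstract does not describe how the authors aggregate over classes or images.

```python
def segmentation_metrics(pred, truth):
    """Return (sensitivity, PPV, accuracy) for flat binary masks,
    where 1 marks a lesion pixel and 0 marks background."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # TP / (TP + FN)
    ppv = tp / (tp + fp) if tp + fp else 0.0          # TP / (TP + FP)
    accuracy = (tp + tn) / len(truth)                 # (TP + TN) / all pixels
    return sensitivity, ppv, accuracy

truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]
sensitivity, ppv, accuracy = segmentation_metrics(pred, truth)
```

Because lesions occupy few pixels, accuracy alone can look high even for a poor segmentation, which is why sensitivity and PPV are reported alongside it.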

URL

http://arxiv.org/abs/1902.03122

PDF

http://arxiv.org/pdf/1902.03122
The Actor-Advisor: Policy Gradient With Off-Policy Advice

2019-02-07

Hélène Plisnier, Denis Steckelmacher, Diederik M. Roijers, Ann Nowé

arXiv_AI

arXiv_AI Knowledge GAN Transfer_Learning
Abstract

Actor-critic algorithms learn an explicit policy (actor), and an accompanying value function (critic). The actor performs actions in the environment, while the critic evaluates the actor’s current policy. However, despite their stability and promising convergence properties, current actor-critic algorithms do not outperform critic-only ones in practice. We believe that the fact that the critic learns Q^pi, instead of the optimal Q-function Q*, prevents state-of-the-art robust and sample-efficient off-policy learning algorithms from being used. In this paper, we propose an elegant solution, the Actor-Advisor architecture, in which a Policy Gradient actor learns from unbiased Monte-Carlo returns, while being shaped (or advised) by the Softmax policy arising from an off-policy critic. The critic can be learned independently from the actor, using any state-of-the-art algorithm. Being advised by a high-quality critic, the actor quickly and robustly learns the task, while its use of the Monte-Carlo return helps overcome any bias the critic may have. In addition to a new Actor-Critic formulation, the Actor-Advisor, a method that allows an external advisory policy to shape a Policy Gradient actor, can be applied to many other domains. By varying the source of advice, we demonstrate the wide applicability of the Actor-Advisor to three other important subfields of RL: safe RL with backup policies, efficient leverage of domain knowledge, and transfer learning in RL. Our experimental results demonstrate the benefits of the Actor-Advisor compared to state-of-the-art actor-critic methods, illustrate its applicability to the three other application scenarios listed above, and show that many important challenges of RL can now be solved using a single elegant solution.
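One way to picture an actor being "shaped" by a Softmax policy derived from a critic is to multiply the two action distributions and renormalize. This multiplicative rule is an assumption made for illustration; the abstract does not state the exact combination rule, and the names and `temperature` parameter below are hypothetical.

```python
import math

def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of action values."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def advised_policy(actor_probs, q_values, temperature=1.0):
    """Combine the actor's action probabilities with the advisor's softmax
    policy by element-wise product followed by renormalization
    (an assumed shaping rule, not confirmed by the abstract)."""
    advice = softmax(q_values, temperature)
    mixed = [a * b for a, b in zip(actor_probs, advice)]
    total = sum(mixed)
    return [m / total for m in mixed]

actor = [0.5, 0.5]   # an untrained actor with no preference
q = [1.0, 0.0]       # the critic prefers action 0
policy = advised_policy(actor, q)
```

With a uniform actor, the advised policy simply equals the advisor's softmax policy; as the actor becomes more confident, it can override the advice.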

URL

http://arxiv.org/abs/1902.02556

PDF

http://arxiv.org/pdf/1902.02556
First Impressions: A Survey on Vision-Based Apparent Personality Trait Analysis

2019-02-07

Julio C. S. Jacques Junior, Yağmur Güçlütürk, Marc Pérez, Umut Güçlü, Carlos Andujar, Xavier Baró, Hugo Jair Escalante, Isabelle Guyon, Marcel A. J. van Gerven, Rob van Lier, Sergio Escalera

arXiv_CV

arXiv_CV Review GAN Face Survey Recognition
Abstract

Personality analysis has been widely studied in psychology, neuropsychology, and signal processing fields, among others. Over the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have been by far the most considered cues of information for analyzing personality. However, recently there has been an increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that methods of this sort could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, aspects of the subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push the research in the field, are reviewed.

URL

http://arxiv.org/abs/1804.08046

PDF

http://arxiv.org/pdf/1804.08046
Target Speaker Extraction for Overlapped Multi-Talker Speaker Verification

2019-02-07

Wei Rao, Chenglin Xu, Eng Siong Chng, Haizhou Li

arXiv_SD

arXiv_SD
Abstract

The performance of speaker verification degrades significantly when the test speech is corrupted by interfering speakers. Speaker diarization does well at separating speakers if the speakers do not overlap in time. However, if multiple talkers speak at the same time, a technique is needed to separate the speech in the spectral domain. This paper proposes an overlapped multi-talker speaker verification framework using target speaker extraction methods. Specifically, given the target speaker information, the target speaker’s speech is first extracted from the overlapped multi-talker speech by a target speaker extraction module. Then, the extracted speech is passed to the speaker verification system. Experimental results show that the proposed approach significantly improves the performance of overlapped multi-talker speaker verification and achieves 65.7% relative EER reduction.

URL

http://arxiv.org/abs/1902.02546

PDF

http://arxiv.org/pdf/1902.02546
Online Clustering by Penalized Weighted GMM

2019-02-07

Shlomo Bugdary, Shay Maymon

arXiv_CV

arXiv_CV
Abstract

With the dawn of the Big Data era, data sets are growing rapidly. Data is streaming from everywhere: from cameras, mobile phones, cars, and other electronic devices. Clustering streaming data is a very challenging problem. Unlike traditional clustering algorithms, where the dataset can be stored and scanned multiple times, clustering streaming data has to satisfy constraints such as limited memory size, real-time response, unknown data statistics and an unknown number of clusters. In this paper, we present a novel online clustering algorithm which can be used to cluster streaming data without knowing the number of clusters a priori. Results on both synthetic and real datasets show that the proposed algorithm produces partitions which are close to those obtained by clustering the whole dataset at once.

URL

http://arxiv.org/abs/1902.02544

PDF

http://arxiv.org/pdf/1902.02544
DoPAMINE: Double-sided Masked CNN for Pixel Adaptive Multiplicative Noise Despeckling

2019-02-07

Sunghwan Joo, Sungmin Cha, Taesup Moon

arXiv_CV

arXiv_CV
Abstract

We propose DoPAMINE, a new neural network based multiplicative noise despeckling algorithm. Our algorithm is inspired by Neural AIDE (N-AIDE), which is a recently proposed neural adaptive image denoiser. While the original N-AIDE was designed for the additive noise case, we show that the same framework, i.e., adaptively learning a network for pixel-wise affine denoisers by minimizing an unbiased estimate of MSE, can be applied to the multiplicative noise case as well. Moreover, we derive a double-sided masked CNN architecture which can control the variance of the activation values in each layer and converge fast to high denoising performance during supervised training. In the experimental results, we show our DoPAMINE possesses high adaptivity via fine-tuning the network parameters based on the given noisy image and achieves significantly better despeckling results compared to SAR-DRN, a state-of-the-art CNN-based algorithm.

URL

http://arxiv.org/abs/1902.02530

PDF

http://arxiv.org/pdf/1902.02530
In-Memory and Error-Immune Differential RRAM Implementation of Binarized Deep Neural Networks

2019-02-07

Marc Bocquet, Tifenn Hirztlin, Jacques-Olivier Klein, Etienne Nowak, Elisa Vianello, Jean-Michel Portal, Damien Querlioz

arXiv_CV

arXiv_CV Recognition
Abstract

RRAM-based in-Memory Computing is an exciting road for implementing highly energy efficient neural networks. This vision is however challenged by RRAM variability, as the efficient implementation of in-memory computing does not allow error correction. In this work, we fabricated and tested a differential HfO2-based memory structure and its associated sense circuitry, which are ideal for in-memory computing. For the first time, we show that our approach achieves the same reliability benefits as error correction, but without any CMOS overhead. We show, also for the first time, that it can naturally implement Binarized Deep Neural Networks, a very recent development of Artificial Intelligence, with extreme energy efficiency, and that the system is fully satisfactory for image recognition applications. Finally, we evidence how the extra reliability provided by the differential memory allows programming the devices in low voltage conditions, where they feature high endurance of billions of cycles.

URL

https://arxiv.org/abs/1902.02528

PDF

https://arxiv.org/pdf/1902.02528
Agent-Based Adaptive Level Generation for Dynamic Difficulty Adjustment in Angry Birds

2019-02-07

Matthew Stephenson, Jochen Renz

arXiv_AI

arXiv_AI
Abstract

This paper presents an adaptive level generation algorithm for the physics-based puzzle game Angry Birds. The proposed algorithm is based on a pre-existing level generator for this game, extended so that the difficulty of the generated levels can be adjusted based on the player’s performance. This allows for the creation of personalised levels tailored specifically to the player’s own abilities. The effectiveness of our proposed method is evaluated using several agents with differing strategies and AI techniques. By using these agents as models / representations of real human players’ characteristics, we can optimise level properties efficiently over a large number of generations. As a secondary investigation, we also demonstrate that by combining the performance of several agents it is possible to generate levels that are especially challenging for certain players but not others.

URL

http://arxiv.org/abs/1902.02518

PDF

http://arxiv.org/pdf/1902.02518
Improving Latent User Models in Online Social Media

2019-02-07

Adit Krishnan, Ashish Sharma, Hari Sundaram

arXiv_AI

arXiv_AI Face
Abstract

Modern social platforms are characterized by the presence of rich user-behavior data associated with the publication, sharing and consumption of textual content. Users interact with content and with each other in a complex and dynamic social environment while simultaneously evolving over time. In order to effectively characterize users and predict their future behavior in such a setting, it is necessary to overcome several challenges. Content heterogeneity and temporal inconsistency of behavior data result in severe sparsity at the user level. In this paper, we propose a novel mutual-enhancement framework to simultaneously partition and learn latent activity profiles of users. We propose a flexible user partitioning approach to effectively discover rare behaviors and tackle user-level sparsity. We extensively evaluate the proposed framework on massive datasets from real-world platforms including Q&A networks and interactive online courses (MOOCs). Our results indicate significant gains over state-of-the-art behavior models (15% on average) in a varied range of tasks, and our gains are further magnified for users with limited interaction data. The proposed algorithms are amenable to parallelization, scale linearly in the size of datasets, and provide flexibility to model diverse facets of user behavior.

URL

http://arxiv.org/abs/1711.11124

PDF

http://arxiv.org/pdf/1711.11124
Advances on CNN-based super-resolution of Sentinel-2 images

2019-02-07

Massimiliano Gargiulo

arXiv_CV

arXiv_CV Super_Resolution CNN
Abstract

Thanks to their temporal-spatial coverage and free access, Sentinel-2 images are very interesting for the community. However, a relatively coarse spatial resolution, compared to that of state-of-the-art commercial products, motivates the study of super-resolution techniques to mitigate such a limitation. Specifically, thirteen bands are sensed simultaneously but at different spatial resolutions: 10, 20, and 60 meters depending on the spectral location. Here, building upon our previous convolutional neural network (CNN) based method, we propose an improved CNN solution to super-resolve the 20-m resolution bands, benefiting from the spatial details conveyed by the accompanying 10-m spectral bands.

URL

http://arxiv.org/abs/1902.02513

PDF

http://arxiv.org/pdf/1902.02513
Sparsely Aggregated Convolutional Networks

2019-02-07

Ligeng Zhu, Ruizhi Deng, Michael Maire, Zhiwei Deng, Greg Mori, Ping Tan

arXiv_CV

arXiv_CV Sparse CNN
Abstract

We explore a key architectural aspect of deep convolutional neural networks: the pattern of internal skip connections used to aggregate outputs of earlier layers for consumption by deeper layers. Such aggregation is critical to facilitate training of very deep networks in an end-to-end manner. This is a primary reason for the widespread adoption of residual networks, which aggregate outputs via cumulative summation. While subsequent works investigate alternative aggregation operations (e.g. concatenation), we focus on an orthogonal question: which outputs to aggregate at a particular point in the network. We propose a new internal connection structure which aggregates only a sparse set of previous outputs at any given depth. Our experiments demonstrate this simple design change offers superior performance with fewer parameters and lower computational requirements. Moreover, we show that sparse aggregation allows networks to scale more robustly to 1000+ layers, thereby opening future avenues for training long-running visual processes.

URL

http://arxiv.org/abs/1801.05895

PDF

http://arxiv.org/pdf/1801.05895
Towards Autoencoding Variational Inference for Aspect-based Opinion Summary

2019-02-07

Tai Hoang, Huy Le, Tho Quan

arXiv_CL

arXiv_CL Sentiment Knowledge Sentiment_Classification Inference Classification
Abstract

Aspect-based Opinion Summary (AOS), consisting of aspect discovery and sentiment classification steps, has recently been emerging as one of the most crucial data mining tasks in e-commerce systems. Along this direction, the LDA-based model is considered a notably suitable approach, since this model offers both topic modeling and sentiment classification. However, unlike traditional topic modeling, aspect discovery often requires some initial seed words, whose prior knowledge is not easy to incorporate into LDA models. Moreover, LDA approaches rely on sampling methods, which need to load the whole corpus into memory, making them hardly scalable. In this research, we study an alternative approach to the AOS problem, based on Autoencoding Variational Inference (AVI). Firstly, we introduce the Autoencoding Variational Inference for Aspect Discovery (AVIAD) model, which extends the previous work of Autoencoding Variational Inference for Topic Models (AVITM) to embed prior knowledge of seed words. This work includes enhancement of the previous AVI architecture and also modification of the loss function. Ultimately, we present the Autoencoding Variational Inference for Joint Sentiment/Topic (AVIJST) model. In this model, we substantially extend the AVI model to support the JST model, which performs topic modeling for corresponding sentiment. The experimental results show that our proposed models enjoy higher topic coherence, faster convergence time and better accuracy on sentiment classification, as compared to their LDA-based counterparts.

URL

http://arxiv.org/abs/1902.02507

PDF

http://arxiv.org/pdf/1902.02507
Conv-codes: Audio Hashing For Bird Species Classification

2019-02-07

Anshul Thakur, Pulkit Sharma, Vinayak Abrol, Padmanabhan Rajan

arXiv_SD

arXiv_SD Sparse Classification
Abstract

In this work, we propose a supervised, convex representation based audio hashing framework for bird species classification. The proposed framework utilizes archetypal analysis, a matrix factorization technique, to obtain convex-sparse representations of a bird vocalization. These convex representations are hashed using Bloom filters with non-cryptographic hash functions to obtain compact binary codes, designated as conv-codes. The conv-codes extracted from the training examples are clustered using class-specific k-medoids clustering with Jaccard coefficient as the similarity metric. A hash table is populated using the cluster centers as keys while hash values/slots are pointers to the species identification information. During testing, the hash table is searched to find the species information corresponding to a cluster center that exhibits maximum similarity with the test conv-code. Hence, the proposed framework classifies a bird vocalization in the conv-code space and requires no explicit classifier or reconstruction error calculations. Apart from that, based on min-hash and direct addressing, we also propose a variant of the proposed framework that provides faster and effective classification. The performances of both these frameworks are compared with existing bird species classification frameworks on the audio recordings of 50 different bird species.
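The test-time lookup described above can be sketched as a nearest-center search under the Jaccard coefficient. Only that lookup is illustrated: archetypal analysis, Bloom-filter hashing, and k-medoids clustering are not reproduced, and the binary codes, species labels, and function names below are made up for the example.

```python
def jaccard(a, b):
    """Jaccard coefficient between two equal-length binary codes:
    bits set in both codes divided by bits set in either code."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def classify(code, table):
    """Return the species stored under the cluster center that is most
    similar to the test code."""
    best_center = max(table, key=lambda center: jaccard(code, center))
    return table[best_center]

# Hypothetical hash table: cluster centers (binary codes) map to species labels.
table = {
    (1, 1, 0, 0, 1, 0): "species_a",
    (0, 0, 1, 1, 0, 1): "species_b",
}
test_code = (1, 0, 0, 0, 1, 0)
label = classify(test_code, table)
```

No separate classifier is trained in this scheme; the label comes directly from the most similar stored center.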

URL

http://arxiv.org/abs/1902.02498

PDF

http://arxiv.org/pdf/1902.02498
CHIP: Channel-wise Disentangled Interpretation of Deep Convolutional Neural Networks

2019-02-07

Xinrui Cui, Dan Wang, Z. Jane Wang

arXiv_CV

arXiv_CV Regularization Sparse Knowledge CNN Image_Classification Classification Prediction
Abstract

With the widespread application of deep convolutional neural networks (DCNNs), it becomes increasingly important for DCNNs not only to make accurate predictions but also to explain how they make their decisions. In this work, we propose a CHannel-wise disentangled InterPretation (CHIP) model to give a visual interpretation of the predictions of DCNNs. The proposed model distills the class-discriminative importance of channels in networks by utilizing sparse regularization. Here, we first introduce the network perturbation technique to learn the model. The proposed model is capable of not only distilling global-perspective knowledge from networks but also presenting a class-discriminative visual interpretation for specific predictions of networks. It is noteworthy that the proposed model is able to interpret different layers of networks without re-training. By combining the distilled interpretation knowledge in different layers, we further propose the Refined CHIP visual interpretation, which is both high-resolution and class-discriminative. Experimental results on the standard dataset demonstrate that the proposed model provides promising visual interpretation for the predictions of networks in the image classification task compared with existing visual interpretation methods. Besides, the proposed method outperforms related approaches in the ILSVRC 2015 weakly-supervised localization task.

URL

http://arxiv.org/abs/1902.02497

PDF

http://arxiv.org/pdf/1902.02497
An Architecture Combining Convolutional Neural Network and Support Vector Machine for Image Classification

2019-02-07

Abien Fred Agarap

arXiv_CV

arXiv_CV CNN Image_Classification Classification
Abstract

Convolutional neural networks (CNNs) are similar to “ordinary” neural networks in the sense that they are made up of hidden layers consisting of neurons with “learnable” parameters. These neurons receive inputs, perform a dot product, and then follow it with a non-linearity. The whole network expresses the mapping between raw image pixels and their class scores. Conventionally, the Softmax function is the classifier used at the last layer of this network. However, there have been studies (Alalshekmubarak and Smith, 2013; Agarap, 2017; Tang, 2013) conducted to challenge this norm. The cited studies introduce the usage of a linear support vector machine (SVM) in an artificial neural network architecture. This project is yet another take on the subject, and is inspired by (Tang, 2013). Empirical data has shown that the CNN-SVM model was able to achieve a test accuracy of ~99.04% using the MNIST dataset (LeCun, Cortes, and Burges, 2010). On the other hand, the CNN-Softmax was able to achieve a test accuracy of ~99.23% using the same dataset. Both models were also tested on the recently-published Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017), which is supposed to be a more difficult image classification dataset than MNIST (Zalandoresearch, 2017). This proved to be the case, as CNN-SVM reached a test accuracy of ~90.72%, while the CNN-Softmax reached a test accuracy of ~91.86%. The said results may be improved if data preprocessing techniques were employed on the datasets, and if the base CNN model were relatively more sophisticated than the one used in this study.
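To make the contrast between the two last-layer classifiers concrete, the sketch below compares Softmax cross-entropy with a one-vs-rest squared hinge (L2-SVM) loss on the same class scores. The squared hinge form is an assumption about how the SVM objective is set up; the abstract only says a linear SVM replaces Softmax, and the function names and scores are illustrative.

```python
import math

def softmax_cross_entropy(scores, label):
    """Cross-entropy of the softmax distribution against the true class."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum - scores[label]

def l2_svm_loss(scores, label):
    """One-vs-rest squared hinge loss: the target is +1 for the true class
    and -1 for every other class (assumed formulation)."""
    return sum(
        max(0.0, 1.0 - (1.0 if k == label else -1.0) * s) ** 2
        for k, s in enumerate(scores)
    )

confident = [2.0, -1.5, -3.0]   # every class clears the margin
uncertain = [0.5, 0.0, -2.0]    # classes 0 and 1 violate the margin

ce_confident = softmax_cross_entropy(confident, 0)
svm_confident = l2_svm_loss(confident, 0)
svm_uncertain = l2_svm_loss(uncertain, 0)
```

The hinge loss drops to exactly zero once every score clears the margin, whereas the cross-entropy keeps pushing scores apart.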

URL

http://arxiv.org/abs/1712.03541

PDF

http://arxiv.org/pdf/1712.03541
Towards Building an Intelligent Anti-Malware System: A Deep Learning Approach using Support Vector Machine for Malware Classification

2019-02-07

Abien Fred Agarap

arXiv_CV

arXiv_CV Classification Deep_Learning Detection Relation
Abstract

Effective and efficient mitigation of malware is a long-time endeavor in the information security community. The development of an anti-malware system that can counteract unknown malware is a prolific activity that may benefit several sectors. We envision an intelligent anti-malware system that utilizes the power of deep learning (DL) models. Using such models would enable the detection of newly-released malware through mathematical generalization. That is, finding the relationship between a given malware $x$ and its corresponding malware family $y$, $f: x \mapsto y$. To accomplish this feat, we used the Malimg dataset (Nataraj et al., 2011), which consists of malware images that were processed from malware binaries, and then we trained the following DL models to classify each malware family: CNN-SVM (Tang, 2013), GRU-SVM (Agarap, 2017), and MLP-SVM. Empirical evidence has shown that the GRU-SVM stands out among the DL models with a predictive accuracy of ~84.92%. This stands to reason, since the mentioned model had the most sophisticated architecture design among the presented models. The exploration of an even more optimal DL-SVM model is the next stage towards the engineering of an intelligent anti-malware system.

URL

http://arxiv.org/abs/1801.00318

PDF

http://arxiv.org/pdf/1801.00318
Deep Learning using Rectified Linear Units

2019-02-07

Abien Fred Agarap

arXiv_CV

arXiv_CV Classification Deep_Learning Prediction
Abstract

We introduce the use of rectified linear units (ReLU) as the classification function in a deep neural network (DNN). Conventionally, ReLU is used as an activation function in DNNs, with the Softmax function as their classification function. However, there have been several studies on using a classification function other than Softmax, and this study is an addition to those. We accomplish this by taking the activation of the penultimate layer $h_{n - 1}$ in a neural network, then multiplying it by weight parameters $\theta$ to get the raw scores $o_{i}$. Afterwards, we threshold the raw scores $o_{i}$ by $0$, i.e. $f(o) = \max(0, o_{i})$, where $f(o)$ is the ReLU function. We provide class predictions $\hat{y}$ through the argmax function, i.e. $\arg\max f(x)$.
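The prediction rule described in this abstract can be sketched in a few lines. The weights and activations below are made-up numbers, and the layout of `theta` as one weight row per class (with no bias term) is an assumption for illustration.

```python
def relu_classify(h, theta):
    """Predict a class from penultimate activations h.

    theta holds one weight row per class. The raw scores o_i are the dot
    products theta_i . h, thresholded at zero by ReLU, and the prediction
    is the argmax of the thresholded scores.
    """
    scores = [sum(w * x for w, x in zip(row, h)) for row in theta]
    activated = [max(0.0, o) for o in scores]
    prediction = max(range(len(activated)), key=activated.__getitem__)
    return prediction, activated

h = [1.0, 2.0]                                    # stand-in for h_{n-1}
theta = [[0.5, -1.0], [1.0, 0.5], [-1.0, 0.25]]   # three classes
prediction, activated = relu_classify(h, theta)
```

Note that when every raw score is negative, all thresholded scores are zero and the argmax falls back to the first class, a tie that Softmax would not produce.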

URL

http://arxiv.org/abs/1803.08375

PDF

http://arxiv.org/pdf/1803.08375
Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks

2019-02-07

Amirata Ghorbani, James Wexler, Been Kim

arXiv_CV

arXiv_CV Prediction Relation
Abstract

Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. For high-stakes domains such as medicine, providing intuitive explanations that can be consumed by domain experts without ML expertise becomes crucial. To meet this demand, concept-based methods (e.g., TCAV) were introduced to provide explanations using user-chosen high-level concepts rather than individual input features. While these methods successfully leverage rich representations learned by the networks to reveal how human-defined concepts are related to the prediction, they require users to select concepts of their choice and collect labeled examples of those concepts. In this work, we introduce DTCAV (Discovery TCAV), a global concept-based interpretability method that can automatically discover concepts as image segments, along with each concept’s estimated importance for a deep neural network’s predictions. We validate that discovered concepts are as coherent to humans as hand-labeled concepts. We also show that the discovered concepts carry significant signal for prediction by analyzing a network’s performance with stitched/added/deleted concepts. DTCAV results revealed a number of undesirable correlations (e.g., a basketball player’s jersey was a more important concept for predicting the basketball class than the ball itself) and show the potential shallow reasoning of these networks.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03129

PDF

http://arxiv.org/pdf/1902.03129
Read All
End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification

2019-02-07

Hee-Soo Heo, Jee-weon Jung, IL-Ho Yang, Sung-Hyun Yoon, Hye-jin Shim, Ha-Jin Yu

arXiv_SD

arXiv_SD Embedding
Abstract

In recent years, speaker verification has been primarily performed using deep neural networks that are trained to output embeddings from input features such as spectrograms or filterbank energies. Therefore, studies have been conducted to design various loss functions, including metric learning, to train deep neural networks to make them suitable for speaker verification. We propose end-to-end loss functions for speaker verification using speaker bases, which are trainable parameters. We expect that each speaker basis will represent the corresponding speaker in the process of training deep neural networks. Conventional loss functions can only consider a limited number of speakers that are included in a mini-batch. In contrast, as the proposed loss functions are based on speaker bases, each sample can be compared against all speakers regardless of mini-batch composition. Through a speaker verification experiment performed using the VoxCeleb 1, we confirmed that the proposed loss functions could increase between-speaker variations and perform hard negative mining for each mini-batch. In particular, it was shown that the system trained through the proposed loss functions had an equal error rate of 5.55%. In addition, the proposed loss functions reduced errors by approximately 15% compared with the system trained with the conventional center loss function.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02455

PDF

http://arxiv.org/pdf/1902.02455
Read All
Theoretical analysis on Noise2Noise using Stein's Unbiased Risk Estimator for Gaussian denoising: Towards unsupervised training with clipped noisy images

2019-02-07

Magauiya Zhussip, Shakarim Soltanayev, Se Young Chun

arXiv_CV

arXiv_CV
Abstract

Recently, Noise2Noise has been proposed for unsupervised training of deep neural networks in image restoration problems including Gaussian denoising. However, it does not work well for truncated noise with non-zero mean. Here, we perform theoretical analysis on Noise2Noise for the limited case of Gaussian noise removal using Stein’s Unbiased Risk Estimator (SURE). We extend SURE to deal with a pair of noise realizations to directly compare with Noise2Noise. Then, we show that Noise2Noise with Gaussian noise is a special case of our newly extended SURE with a pair of uncorrelated noise realizations. Lastly, we propose a compensation method for clipped Gaussian noise so that it approximately follows a normal distribution and show how this compensation method can be used for SURE-based unsupervised denoiser training. We also show that our theoretical analysis provides insights on how to use Noise2Noise for clipped Gaussian noise.
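For orientation, the two standard objectives the abstract compares can be written side by side. These are the textbook forms, not the extended estimator derived in the paper. The notation is assumed for this note: $h$ is the denoiser, $x \in \mathbb{R}^N$ the clean image, and $y = x + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$.

```latex
% Classical SURE: an unbiased estimate of the mean squared error that
% needs only the noisy image y and the noise level sigma.
\mathbb{E}\!\left[\tfrac{1}{N}\,\lVert x - h(y)\rVert^2\right]
  = \mathbb{E}\!\left[\tfrac{1}{N}\,\lVert y - h(y)\rVert^2 - \sigma^2
      + \tfrac{2\sigma^2}{N}\sum_{i=1}^{N}\frac{\partial h_i(y)}{\partial y_i}\right]

% Noise2Noise: regress one noisy realization onto another, with
% y_1 = x + n_1 and y_2 = x + n_2, where n_2 is zero-mean and independent of y_1.
\mathbb{E}\,\lVert h(y_1) - y_2\rVert^2
  = \mathbb{E}\,\lVert h(y_1) - x\rVert^2 + \mathbb{E}\,\lVert n_2\rVert^2
```

The second identity relies on $n_2$ having zero mean, which is consistent with the abstract's remark that Noise2Noise degrades for truncated noise with non-zero mean.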

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02452

PDF

http://arxiv.org/pdf/1902.02452
Read All
Speeding up scaled gradient projection methods using deep neural networks for inverse problems in image processing

2019-02-07

Byung Hyun Lee, Se Young Chun

arXiv_CV

arXiv_CV Sparse Optimization
Abstract

Conventional optimization based methods have utilized forward models with image priors to solve inverse problems in image processing. Recently, deep neural networks (DNN) have been investigated to significantly improve the image quality of the solution for inverse problems. Most DNN based inverse problems have focused on using data-driven image priors with massive amounts of data. However, these methods often do not inherit nice properties of conventional approaches using theoretically well-grounded optimization algorithms such as monotone, global convergence. Here we investigate another possibility of using DNN for inverse problems in image processing. We propose methods to use DNNs to seamlessly speed up convergence rates of conventional optimization based methods. Our DNN-incorporated scaled gradient projection methods, without breaking theoretical properties, significantly improved convergence speed over state-of-the-art conventional optimization methods such as ISTA or FISTA in practice for inverse problems such as image inpainting, compressive image recovery with partial Fourier samples, image deblurring, and medical image reconstruction with sparse-view projections.
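As background for the baselines named in this abstract, here is a minimal sketch of plain ISTA for the problem $\min_x \tfrac{1}{2}\lVert Ax - b\rVert^2 + \lambda \lVert x \rVert_1$. It is the conventional baseline only, not the authors' DNN-incorporated scaled gradient projection method, and the toy matrix `A` and vector `b` are assumptions made for the example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, b, lam, step, iterations=200):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 with plain ISTA.

    A is a list of rows; step should be at most 1 / L, where L is the
    largest eigenvalue of A^T A.
    """
    n = len(A[0])
    x = [0.0] * n
    for _ in range(iterations):
        # Residual r = A x - b
        r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        # Gradient of the smooth term: A^T r
        grad = [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]
        # Gradient step followed by soft-thresholding
        x = [soft_threshold(xj - step * g, step * lam) for xj, g in zip(x, grad)]
    return x


# Toy problem (hypothetical data): A is the identity, so the minimiser is
# simply the soft-thresholded b.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.2]
x_hat = ista(A, b, lam=0.5, step=1.0)
```

With the identity matrix the iteration reaches its fixed point after one step, which makes the result easy to check by hand; realistic forward models need many more iterations, which is the cost that acceleration schemes target.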

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02449

PDF

http://arxiv.org/pdf/1902.02449
Read All
Effectiveness of LSTMs in Predicting Congestive Heart Failure Onset

2019-02-07

Sunil Mallya, Marc Overhage, Navneet Srivastava, Tatsuya Arai, Cole Erdman

arXiv_AI

arXiv_AI Sparse Embedding CNN RNN Deep_Learning Prediction
Abstract

In this paper we present a recurrent neural network (RNN) based architecture that achieves an AUCROC of 0.9147 for predicting the onset of Congestive Heart Failure (CHF) 15 months in advance using a 12-month observation window on a large cohort of 216,394 patients. We believe this to be the largest study in CHF onset prediction with respect to the number of CHF case patients in the cohort and the test set (3,332 CHF patients) on which the AUC metrics are reported. We explore the extent to which an LSTM (Long Short-Term Memory) based model, a variant of RNNs, can accurately predict the onset of CHF when compared to known linear baselines like Logistic Regression, Random Forests and deep learning based models such as Multi-Layer Perceptron and Convolutional Neural Networks. We utilize demographics, medical diagnosis and procedure data from 21,405 CHF and 194,989 control patients as our features. We describe our feature embedding strategy for medical diagnosis codes that accommodates the sparse, irregular, longitudinal, and high-dimensional characteristics of EHR data. We empirically show that LSTMs can capture the longitudinal aspects of EHR data better than the proposed baselines. As an attempt to interpret the model, we present a temporal data analysis-based technique on false positives to attribute feature importance. A model capable of predicting the onset of congestive heart failure months in the future with this level of accuracy and precision can support efforts of practitioners to implement risk factor reduction strategies and researchers to begin to systematically evaluate interventions to potentially delay or avert development of the disease with high mortality, morbidity and significant costs.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02443

PDF

http://arxiv.org/pdf/1902.02443
Read All
2019-05-31

A Simple Baseline for Bayesian Uncertainty in Deep Learning

2019-02-07

Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson

arXiv_AI

arXiv_AI Transfer_Learning Deep_Learning Detection Gradient_Descent
Abstract

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of computer vision tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including MC dropout, KFAC Laplace, and temperature scaling.
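The moment-tracking idea can be sketched with the diagonal part of the covariance alone. This is a simplified illustration: the abstract describes a low rank plus diagonal covariance, whereas the sketch below keeps only the diagonal term, and the class name `DiagonalSWAG` and the stand-in iterates are assumptions made for the example.

```python
import math
import random


class DiagonalSWAG:
    """Running first and second moments of flattened weight vectors."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.sq_mean = [0.0] * dim

    def collect(self, weights):
        """Fold one SGD iterate into the running moments."""
        self.n += 1
        for i, w in enumerate(weights):
            self.mean[i] += (w - self.mean[i]) / self.n
            self.sq_mean[i] += (w * w - self.sq_mean[i]) / self.n

    def variance(self):
        # Clamp at zero to guard against small negative values from rounding
        return [max(s - m * m, 0.0) for m, s in zip(self.mean, self.sq_mean)]

    def sample(self, rng):
        """Draw one weight vector from N(mean, diag(variance))."""
        return [rng.gauss(m, math.sqrt(v)) for m, v in zip(self.mean, self.variance())]


swag = DiagonalSWAG(dim=2)
for iterate in ([1.0, 0.0], [3.0, 0.0], [2.0, 0.0]):  # stand-ins for SGD iterates
    swag.collect(iterate)

rng = random.Random(0)
samples = [swag.sample(rng) for _ in range(5)]
```

In a Bayesian model averaging setup, each sampled weight vector would be loaded into the network and the resulting predictions averaged; that step depends on the model and is left out here.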

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02476

PDF

http://arxiv.org/pdf/1902.02476
Read All
Augmenting Learning Components for Safety in Resource Constrained Autonomous Robots

2019-02-06

Shreyas Ramakrishna, Abhishek Dubey, Matthew P Burruss, Charles Hartsell, Nagabhushan Mahadevan, Saideep Nannapaneni, Aron Laszka, Gabor Karsai

arXiv_AI

arXiv_AI
Abstract

This paper deals with resource constrained autonomous robots commonly found in factories, hospitals, and education laboratories, which popularly use learning enabled components (LEC) to make control actions. However, these LECs do not provide any safety guarantees, and testing them is challenging. To overcome these challenges, we introduce a framework that performs confidence estimation, resource management, and supervised safety control of autonomous systems with LECs. Using this framework, we make the following contributions: (1) allow for seamless integration of safety controllers and different simplex strategies to aid the LEC, (2) introduce RL-Simplex and illustrate the use of Q-learning to learn the optimal weights for the arbitration logic of the Simplex Architecture, (3) design a system level monitor that uses the current state information and a discrete Bayesian network model learned from past data to estimate a metric, which indicates if the car will remain in the safe region, and (4) provide a Resource Manager that performs dynamic task offloading depending on the resource temperature and CPU utilization while continually adjusting vehicle speed to compensate for the latency overhead. We compare the speed, steering and safety performance of the different controllers and simplex strategies, and we find RL-Simplex to have 60% fewer safety violations and higher optimized speed during indoor driving ($\sim 0.40$ m/s) than the original system (using only LEC).

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02432

PDF

http://arxiv.org/pdf/1902.02432
Read All
Toward A Neuro-inspired Creative Decoder

2019-02-06

Payel Das, Brian Quanz, Pin-Yu Chen, Jaw-wook Ahn

arXiv_AI

arXiv_AI
Abstract

Creativity, a process that generates novel and valuable ideas, involves increased association between task-positive (control) and task-negative (default) networks in the brain. Inspired by this seminal finding, in this study we propose a creative decoder that directly modulates the neuronal activation pattern while sampling from the learned latent space. The proposed approach is fully unsupervised and can be used off the shelf. Our experiments on three different image datasets (MNIST, FMNIST, CELEBA) reveal that the co-activation between task-positive and task-negative neurons during decoding in a deep neural net enables generation of novel artifacts. We further identify sufficient conditions on several novelty metrics towards measuring the creativity of generated samples.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02399

PDF

http://arxiv.org/pdf/1902.02399
Read All
Real-time 3D Traffic Cone Detection for Autonomous Driving

2019-02-06

Ankit Dhall, Dengxin Dai, Luc Van Gool

arXiv_CV

arXiv_CV Regularization Object_Detection Detection
Abstract

Considerable progress has been made in semantic scene understanding of road scenes with monocular cameras. It is, however, mainly related to certain classes such as cars and pedestrians. This work investigates traffic cones, an object class crucial for traffic control in the context of autonomous vehicles. 3D object detection using images from a monocular camera is intrinsically an ill-posed problem. In this work, we leverage the unique structure of traffic cones and propose a pipelined approach to the problem. Specifically, we first detect cones in images by a tailored 2D object detector; then, the spatial arrangement of keypoints on a traffic cone is detected by our deep structural regression network, where the fact that the cross-ratio is projection invariant is leveraged for network regularization; finally, the 3D position of cones is recovered by the classical Perspective n-Point algorithm. Extensive experiments show that our approach can accurately detect traffic cones and estimate their position in the 3D world in real time. The proposed method is also deployed on a real-time, critical system. It runs efficiently on the low-power Jetson TX2, providing accurate 3D position estimates, allowing a race-car to map and drive autonomously on an unseen track indicated by traffic cones. With the help of robust and accurate perception, our race-car won both Formula Student Competitions held in Italy and Germany in 2018, cruising at a top speed of 54 km/h. Visualization of the complete pipeline, mapping and navigation can be found on our project page.
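The projective invariance of the cross-ratio mentioned in this abstract is easy to check numerically for four collinear points. The sketch below demonstrates only that property, not the paper's regularization term; the point positions and the transformation `h` are hypothetical values chosen for the example.

```python
def cross_ratio(a, b, c, d):
    """Cross-ratio (a, b; c, d) of four collinear points given by 1D coordinates."""
    return ((c - a) * (d - b)) / ((c - b) * (d - a))


def projective_map(x, h):
    """Apply a 1D projective transformation x -> (h00*x + h01) / (h10*x + h11)."""
    (h00, h01), (h10, h11) = h
    return (h00 * x + h01) / (h10 * x + h11)


points = [0.0, 1.0, 3.0, 7.0]   # hypothetical positions along a line
h = ((2.0, 1.0), (0.1, 1.0))    # hypothetical non-degenerate transformation

before = cross_ratio(*points)
after = cross_ratio(*(projective_map(x, h) for x in points))
```

The values `before` and `after` agree up to floating-point rounding, even though the individual distances between the points change under the transformation.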

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02394

PDF

http://arxiv.org/pdf/1902.02394
Read All
Distributed Synthesis of Surveillance Strategies for Mobile Sensors

2019-02-06

Suda Bharadwaj, Rayna Dimitrova, Ufuk Topcu

arXiv_AI

arXiv_AI
Abstract

We study the problem of synthesizing strategies for a mobile sensor network to conduct surveillance in partnership with static alarm triggers. We formulate the problem as a multi-agent reactive synthesis problem with surveillance objectives specified as temporal logic formulas. In order to avoid the state space blow-up arising from a centralized strategy computation, we propose a method to decentralize the surveillance strategy synthesis by decomposing the multi-agent game into subgames that can be solved independently. We also decompose the global surveillance specification into local specifications for each sensor, and show that if the sensors satisfy their local surveillance specifications, then the sensor network as a whole will satisfy the global surveillance objective. Thus, our method is able to guarantee global surveillance properties in a mobile sensor network while synthesizing completely decentralized strategies with no need for coordination between the sensors. We also present a case study in which we demonstrate an application of decentralized surveillance strategy synthesis.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02393

PDF

http://arxiv.org/pdf/1902.02393
Read All
Investigating RNN Memory using Neuro-Evolution: Investigating Recurrent Neural Network Memory Structures using Neuro-Evolution

2019-02-06

Alexander Ororbia, Ahmed Ahmed Elsaid, Travis Desell

arXiv_AI

arXiv_AI RNN Prediction
Abstract

This paper presents a new algorithm, Evolutionary eXploration of Augmenting Memory Models (EXAMM), which is capable of evolving recurrent neural networks (RNNs) using a wide variety of memory structures, such as Delta-RNN, GRU, LSTM, MGU and UGRNN cells. EXAMM evolved RNNs to perform prediction of large-scale, real world time series data from the aviation and power industries. These data sets consist of very long time series (thousands of readings), each with a large number of potentially correlated and dependent parameters. Four different parameters were selected for prediction and EXAMM runs were performed using each memory cell type alone, each cell type with feed forward nodes, and with all possible memory cell types. Evolved RNN performance was measured using repeated k-fold cross validation, resulting in 1210 EXAMM runs which evolved 2,420,000 RNNs in 12,100 CPU hours on a high performance computing cluster. Generalization of the evolved RNNs was examined statistically, providing interesting findings that can help refine the RNN memory cell design as well as inform future neuro-evolution algorithms development.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02390

PDF

http://arxiv.org/pdf/1902.02390
Read All
Global Explanations of Neural Networks: Mapping the Landscape of Predictions

2019-02-06

Mark Ibrahim, Melissa Louie, Ceena Modarres, John Paisley

arXiv_AI

arXiv_AI Prediction
Abstract

A barrier to the wider adoption of neural networks is their lack of interpretability. While local explanation methods exist for one prediction, most global attributions still reduce neural network decisions to a single set of features. In response, we present an approach for generating global attributions called GAM, which explains the landscape of neural network predictions across subpopulations. GAM augments global explanations with the proportion of samples that each attribution best explains and specifies which samples are described by each attribution. Global explanations also have tunable granularity to detect more or fewer subpopulations. We demonstrate that GAM’s global explanations 1) yield the known feature importances of simulated data, 2) match feature weights of interpretable statistical models on real data, and 3) are intuitive to practitioners through user studies. With more transparent predictions, GAM can help ensure neural network decisions are generated for the right reasons.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02384

PDF

http://arxiv.org/pdf/1902.02384
Read All
End-to-end Anchored Speech Recognition

2019-02-06

Yiming Wang, Xing Fan, I-Fan Chen, Yuzong Liu, Tongfei Chen, Björn Hoffmeister

arXiv_CL

arXiv_CL Attention Face Speech_Recognition Recognition
Abstract

Voice-controlled household devices, like Amazon Echo or Google Home, face the problem of performing speech recognition of device-directed speech in the presence of interfering background speech, i.e., background noise and interfering speech from another person or media device in proximity need to be ignored. We propose two end-to-end models to tackle this problem with information extracted from the “anchored segment”. The anchored segment refers to the wake-up word part of an audio stream, which contains valuable speaker information that can be used to suppress interfering speech and background noise. The first method is called “Multi-source Attention” where the attention mechanism takes both the speaker information and decoder state into consideration. The second method directly learns a frame-level mask on top of the encoder output. We also explore a multi-task learning setup where we use the ground truth of the mask to guide the learner. Given that audio data with interfering speech is rare in our training data set, we also propose a way to synthesize “noisy” speech from “clean” speech to mitigate the mismatch between training and test data. Our proposed methods show up to 15% relative reduction in WER for Amazon Alexa live data with interfering background speech without significantly degrading on clean speech.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02383

PDF

http://arxiv.org/pdf/1902.02383
Read All
Compression of Recurrent Neural Networks for Efficient Language Modeling

2019-02-06

Artem M. Grachev, Dmitry I. Ignatov, Andrey V. Savchenko

arXiv_CL

arXiv_CL Attention Inference RNN Language_Model
Abstract

Recurrent neural networks have proved to be an effective method for statistical language modeling. However, in practice their memory and run-time complexity are usually too large to be implemented in real-time offline mobile applications. In this paper we consider several compression techniques for recurrent neural networks including Long Short-Term Memory models. We pay particular attention to the high-dimensional output problem caused by the very large vocabulary size. We focus on effective compression methods in the context of their exploitation on devices: pruning, quantization, and matrix decomposition approaches (low-rank factorization and tensor train decomposition, in particular). For each model we investigate the trade-off between its size, suitability for fast inference and perplexity. We propose a general pipeline for applying the most suitable methods to compress recurrent neural networks for language modeling. It has been shown in the experimental study with the Penn Treebank (PTB) dataset that the most efficient results in terms of speed and compression-perplexity balance are obtained by matrix decomposition techniques.
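To make the low-rank factorization idea concrete, the sketch below counts parameters before and after replacing a weight matrix $W$ of shape rows × cols with a product $UV$ of shapes rows × rank and rank × cols. The layer sizes and the rank are hypothetical and are not taken from the paper's experiments.

```python
def dense_params(rows, cols):
    """Parameter count of a full rows x cols weight matrix."""
    return rows * cols


def low_rank_params(rows, cols, rank):
    """Parameter count when W is replaced by U (rows x rank) times V (rank x cols)."""
    return rank * (rows + cols)


def compression_ratio(rows, cols, rank):
    """How many times smaller the factored layer is than the dense one."""
    return dense_params(rows, cols) / low_rank_params(rows, cols, rank)


# Hypothetical sizes: a 10,000-word vocabulary and a 200-unit hidden state.
vocab, hidden, rank = 10_000, 200, 20
full = dense_params(hidden, vocab)
factored = low_rank_params(hidden, vocab, rank)
ratio = compression_ratio(hidden, vocab, rank)
```

For these assumed sizes the output layer shrinks from 2,000,000 to 204,000 parameters, a reduction of just under ten times. Whether perplexity survives that reduction is the empirical question the abstract addresses.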

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02380

PDF

http://arxiv.org/pdf/1902.02380
Read All
Centroid-based deep metric learning for speaker recognition

2019-02-06

Jixuan Wang, Kuan-Chieh Wang, Marc Law, Frank Rudzicz, Michael Brudno

arXiv_SD

arXiv_SD Embedding Image_Classification Classification Recognition
Abstract

Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model is evaluated on unseen classes. Here, we optimize a speaker embedding model with prototypical network loss (PNL), a state-of-the-art approach for the few-shot image classification task. The resulting embedding model outperforms the state-of-the-art triplet loss based models in both speaker verification and identification tasks, for both seen and unseen speakers.
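The prototypical network loss named in this abstract can be sketched as follows: each class is represented by the mean of its support embeddings, and a query is scored by a softmax over negative squared distances to those means. This is the generic form of the loss, not the authors' training code, and the 2-D embeddings are hypothetical stand-ins for speaker embeddings.

```python
import math


def prototype(embeddings):
    """Mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def prototypical_loss(query, target, prototypes):
    """Negative log-softmax over negative squared distances to each prototype."""
    logits = [-squared_distance(query, p) for p in prototypes]
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[target]


# Hypothetical 2-D embeddings for two speakers (support utterances).
support = {
    0: [[0.0, 0.0], [0.0, 2.0]],
    1: [[4.0, 0.0], [4.0, 2.0]],
}
prototypes = [prototype(support[k]) for k in sorted(support)]
loss_near = prototypical_loss([0.5, 1.0], target=0, prototypes=prototypes)
loss_far = prototypical_loss([3.5, 1.0], target=0, prototypes=prototypes)
```

A query close to its own speaker's prototype yields a loss near zero, while the same label on a query near the other prototype yields a large loss, which is the signal that pulls utterances of a speaker together during training.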

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02375

PDF

http://arxiv.org/pdf/1902.02375
Read All
The role of a layer in deep neural networks: a Gaussian Process perspective

2019-02-06

Oded Ben-David, Zohar Ringel

arXiv_AI

arXiv_AI Optimization Deep_Learning
Abstract

A fundamental question in deep learning concerns the role played by individual layers in a deep neural network (DNN) and the transferable properties of the data representations which they learn. To the extent that layers have clear roles one should be able to optimize them separately using layer-wise loss functions. Such loss functions would describe what is the set of good data representations at each depth of the network and provide a target for layer-wise greedy optimization (LEGO). Here we introduce the Deep Gaussian Layer-wise loss functions (DGLs) which, we believe, are the first supervised layer-wise loss functions which are both explicit and competitive in terms of accuracy. The DGLs have a solid theoretical foundation, they become exact for wide DNNs, and we find that they can monitor standard end-to-end training. Being highly structured and symmetric, the DGLs provide a promising analytic route to understanding the internal representations generated by DNNs.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02354

PDF

http://arxiv.org/pdf/1902.02354
Read All
Deep Morphological Simplification Network for Guided Registration of Brain Magnetic Resonance Images

2019-02-06

Dongming Wei, Zhengwang Wu, Gang Li, Xiaohuan Cao, Dinggang Shen, Qian Wang

arXiv_CV

arXiv_CV Knowledge Face Deep_Learning
Abstract

Objective: Deformable brain MR image registration is challenging due to large inter-subject anatomical variation. For example, the highly complex cortical folding pattern makes it hard to accurately align corresponding cortical structures of individual images. In this paper, we propose a novel deep learning approach to simplify the difficult registration problem of brain MR images. Methods: We train a morphological simplification network (MS-Net), which can generate a “simple” image with fewer anatomical details based on the “complex” input. With MS-Net, the complexity of the fixed image or the moving image under registration can be reduced gradually, thus building an individual (simplification) trajectory represented by MS-Net outputs. Since the generated images at the ends of the two trajectories (of the fixed and moving images) are so simple and very similar in appearance, they are easy to register. Thus, the two trajectories can act as a bridge to link the fixed and the moving images, and guide their registration. Results: Our experiments show that the proposed method can achieve highly accurate registration performance on different datasets (i.e., NIREP, LPBA, IBSR, CUMC, and MGH). Moreover, the method can also be easily transferred across diverse image datasets and obtain superior accuracy on surface alignment. Conclusion and Significance: We propose MS-Net as a powerful and flexible tool to simplify brain MR images and their registration. To our knowledge, this is the first work to simplify brain MR image registration by deep learning, instead of estimating the deformation field directly.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02342

PDF

http://arxiv.org/pdf/1902.02342
Read All
Extending a model for ontology-based Arabic-English machine translation

2019-02-06

Neama Abdulaziz Dahan, Fadl Mutaher Ba-Alwi

arXiv_CL

arXiv_CL Ontology
Abstract

The accelerating demand for telecommunication has given rise to many research groups, especially in the fields of communication facilitation and machine translation. When people communicate with others who have different languages and cultures, they need instant translations. However, the available instant translators still provide somewhat poor Arabic-English translations; for instance, when translating books or articles, the meaning is not totally accurate. Therefore, using semantic web techniques to handle homographs and homonyms semantically, the aim of this research is to extend a model for ontology-based Arabic-English machine translation, named NAN, which simulates the human way of translating. The experimental results show that NAN translations are somewhat more similar to human translations than those of the other instant translators. The resulting translations will help non-native speakers of Arabic and English obtain texts in the target language that are reasonably correct and semantically closer to human translations.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02326

PDF

http://arxiv.org/pdf/1902.02326
Read All
Is AmI Robust to Adversarial Examples?

2019-02-06

Nicholas Carlini

arXiv_AI

arXiv_AI Adversarial
Abstract

No.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02322

PDF

http://arxiv.org/pdf/1902.02322
Read All
On overfitting and asymptotic bias in batch reinforcement learning with partial observability

2019-02-06

Vincent Francois-Lavet, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, Raphael Fonteneau

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smartgrids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1709.07796

PDF

http://arxiv.org/pdf/1709.07796
Read All
CESMA: Centralized Expert Supervises Multi-Agents

2019-02-06

Alex Tong Lin, Mark J. Debord, Katia Estabridis, Gary Hewer, Stanley Osher

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

We consider the reinforcement learning problem of training multiple agents in order to maximize a shared reward. In this multi-agent system, each agent seeks to maximize the reward while interacting with other agents, and they may or may not be able to communicate. Typically the agents do not have access to other agent policies and thus each agent observes a non-stationary and partially-observable environment. In order to resolve this issue, we demonstrate a novel multi-agent training framework that first turns a multi-agent problem into a single-agent problem to obtain a centralized expert that is then used to guide supervised learning for multiple independent agents with the goal of decentralizing the policy. We additionally demonstrate a way to turn the exponential growth in the joint action space into a linear growth for the centralized policy. Overall, the problem is twofold: the problem of obtaining a centralized expert, and then the problem of supervised learning to train the multi-agents. We demonstrate our solutions to both of these tasks, and show that supervised learning can be used to decentralize a multi-agent policy.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02311

PDF

http://arxiv.org/pdf/1902.02311
Read All
A Guiding Principle for Causal Decision Problems

2019-02-06

M. Gonzalez-Soto, L.E. Sucar, H.J. Escalante

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

We define a Causal Decision Problem as a Decision Problem where the available actions, the family of uncertain events and the set of outcomes are related through the variables of a Causal Graphical Model $\mathcal{G}$. A solution criterion based on Pearl’s Do-Calculus and the Expected Utility criterion for rational preferences is proposed. The implementation of this criterion leads to an on-line decision making procedure that has been shown to have similar performance to classic Reinforcement Learning algorithms while allowing for a causal model of an environment to be learned. Thus, we aim to provide the theoretical guarantees of the usefulness and optimality of a decision making procedure based on causal information.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02279

PDF

http://arxiv.org/pdf/1902.02279
Read All
Simultaneous x, y Pixel Estimation and Feature Extraction for Multiple Small Objects in a Scene: A Description of the ALIEN Network

2019-02-06

Seth Zuckerman, Timothy Klein, Alexander Boxer, Christopher Goldman, Brian Lang

arXiv_CV

arXiv_CV Detection
Abstract

We present a deep-learning network that detects multiple small objects (hundreds to thousands) in a scene while simultaneously estimating their x,y pixel locations together with a characteristic feature-set (for instance, target orientation and color). All estimations are performed in a single, forward pass which makes implementing the network fast and efficient. In this paper, we describe the architecture of our network — nicknamed ALIEN — and detail its performance when applied to vehicle detection.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.05387

PDF

http://arxiv.org/pdf/1902.05387
Read All
Unsupervised Polyglot Text To Speech

2019-02-06

Eliya Nachmani, Lior Wolf

arXiv_CL

arXiv_CL
Abstract

We present a TTS neural network that is able to produce speech in multiple languages. The proposed network is able to transfer a voice, which was presented as a sample in a source language, into one of several target languages. Training is done without using matching or parallel data, i.e., without samples of the same speaker in multiple languages, making the method much more applicable. The conversion is based on learning a polyglot network that has multiple per-language sub-networks and adding loss terms that preserve the speaker’s identity in multiple languages. We evaluate the proposed polyglot neural network for three languages with a total of more than 400 speakers and demonstrate convincing conversion capabilities.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02263

PDF

http://arxiv.org/pdf/1902.02263
Read All
CLEAR: A Consistent Lifting, Embedding, and Alignment Rectification Algorithm for Multi-Agent Data Association

2019-02-06

Kaveh Fathian, Kasra Khosoussi, Parker Lusk, Yulun Tian, Jonathan P. How

arXiv_CV

arXiv_CV Tracking Embedding Object_Tracking SLAM
Abstract

A fundamental challenge in many robotics applications is to correctly synchronize and fuse observations across a team of sensors or agents. Instead of solely relying on pairwise matches among observations, multi-way matching methods leverage the notion of cycle consistency to (i) provide a natural correction mechanism for removing noise and outliers from pairwise matches; and (ii) construct an efficient and low-rank representation of the data by merging the redundant observations. To solve this computationally challenging problem, state-of-the-art techniques resort to relaxation and rounding, which can result in a solution that violates the cycle consistency principle and therefore loses the aforementioned benefits. In this work, we present the CLEAR algorithm to address this issue by generating solutions that are, by construction, cycle consistent. Through a novel spectral graph clustering approach, CLEAR fuses techniques from the multi-way matching and spectral clustering literature and provides consistent solutions, even in challenging high-noise regimes. Our resulting general framework can provide significant improvement in the accuracy and efficiency of existing distributed multi-agent learning, collaborative SLAM, and multi-object tracking pipelines, which traditionally use pairwise (but potentially inconsistent) correspondences.
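
As an illustration of what cycle consistency means for pairwise matches, the sketch below checks whether a set of pairwise associations is transitive: if item a of agent i matches item b of agent j, and b matches item c of agent k, then a must match c. This is only a consistency check under assumed data structures, not the CLEAR algorithm; the dictionary layout and the names `with_inverses` and `is_cycle_consistent` are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, Tuple

# A pairwise match maps item indices of one agent to item indices of another.
Match = Dict[int, int]
Matches = Dict[Tuple[int, int], Match]


def with_inverses(forward: Matches) -> Matches:
    """Add the reverse direction (j, i) for every given match (i, j)."""
    full = dict(forward)
    for (i, j), match in forward.items():
        full[(j, i)] = {b: a for a, b in match.items()}
    return full


def is_cycle_consistent(matches: Matches, agents: Iterable[int]) -> bool:
    """Return True if composing i->j with j->k never contradicts i->k."""
    for i, j, k in permutations(list(agents), 3):
        m_ij, m_jk, m_ik = matches[(i, j)], matches[(j, k)], matches[(i, k)]
        for a, b in m_ij.items():
            # Only check chains that actually continue through agent j.
            if b in m_jk and m_ik.get(a) != m_jk[b]:
                return False
    return True


# Three agents, two items each. The first set is consistent.
consistent = with_inverses({
    (0, 1): {0: 1, 1: 0},
    (1, 2): {0: 0, 1: 1},
    (0, 2): {0: 1, 1: 0},
})

# Same as above, but the direct 0->2 match contradicts the path via agent 1.
inconsistent = with_inverses({
    (0, 1): {0: 1, 1: 0},
    (1, 2): {0: 0, 1: 1},
    (0, 2): {0: 0, 1: 1},
})

if __name__ == "__main__":
    print(is_cycle_consistent(consistent, range(3)))
    print(is_cycle_consistent(inconsistent, range(3)))
```

A check like this only detects violations; producing matches that are consistent by construction is the harder problem the abstract describes.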

URL

http://arxiv.org/abs/1902.02256

PDF

http://arxiv.org/pdf/1902.02256
Read All
Generative Image Translation for Data Augmentation of Bone Lesion Pathology

2019-02-06

Anant Gupta, Srivas Venkatesh, Sumit Chopra, Christian Ledig

arXiv_CV

arXiv_CV Adversarial Transfer_Learning Classification Detection
Abstract

Insufficient training data and severe class imbalance are often limiting factors when developing machine learning models for the classification of rare diseases. In this work, we address the problem of classifying bone lesions from X-ray images by increasing the small number of positive samples in the training set. We propose a generative data augmentation approach based on a cycle-consistent generative adversarial network that synthesizes bone lesions on images without pathology. We pose the generative task as an image-patch translation problem that we optimize specifically for distinct bones (humerus, tibia, femur). In experimental results, we confirm that the described method mitigates the class imbalance problem in the binary classification task of bone lesion detection. We show that the augmented training sets enable the training of superior classifiers achieving better performance on a held-out test set. Additionally, we demonstrate the feasibility of transfer learning and apply a generative model that was trained on one body part to another.

URL

http://arxiv.org/abs/1902.02248

PDF

http://arxiv.org/pdf/1902.02248
Read All
The Connection between DNNs and Classic Classifiers: Generalize, Memorize, or Both?

2019-02-06

Gilad Cohen, Guillermo Sapiro, Raja Giryes

arXiv_AI

arXiv_AI Classification Relation
Abstract

This work studies the relationship between the classification performed by deep neural networks (DNNs) and the decisions of various classic classifiers, namely $k$-nearest neighbors ($k$-NN), support vector machines (SVM), and logistic regression (LR). This is studied at various layers of the network, providing us with new insights on the ability of DNNs to both memorize the training data and generalize to new data at the same time, where $k$-NN serves as the ideal estimator that perfectly memorizes the data. First, we show that DNNs’ generalization improves gradually along their layers and that memorization of non-generalizing networks happens only at the last layers. We also observe that the behavior of DNNs compared to the linear classifiers SVM and LR is much the same on the training and test data regardless of whether the network generalizes. On the other hand, the similarity to $k$-NN holds only in the absence of overfitting. This suggests that the $k$-NN behavior of the network on new data is a good sign of generalization. Moreover, this allows us to use existing $k$-NN theory for DNNs.

URL

http://arxiv.org/abs/1805.06822

PDF

http://arxiv.org/pdf/1805.06822
Read All
Adversarially Learning a Local Anatomical Prior: Vertebrae Labelling with 2D reformations

2019-02-06

Anjany Sekuboyina, Markus Rempfler, Alexander Valentinitsch, Jan S. Kirschke, Bjoern H. Menze

arXiv_CV

arXiv_CV Adversarial
Abstract

Robust localisation and identification of vertebrae, jointly termed vertebrae labelling, in computed tomography (CT) images is an essential component of automated spine analysis. Current approaches for this task mostly work with 3D scans and consist of a sequence of multiple networks. In contrast, our approach relies only on 2D reformations, enabling us to design an end-to-end trainable, standalone network. Our contributions include: (1) Inspired by the workflow of human experts, a novel butterfly-shaped network architecture (termed Btrfly net) that efficiently combines information across sufficiently informative sagittal and coronal reformations. (2) Two adversarial training regimes that encode an anatomical prior of the spine’s shape into the Btrfly net, each enforcing the prior in a distinct manner. We evaluate our approach on a public benchmarking dataset of 302 CT scans, achieving performance comparable to state-of-the-art methods (identification rate of $>$88%) without any post-processing stages. Addressing its translation to clinical settings, an in-house dataset of 65 CT scans with a higher data variability is introduced, where we discuss refinements that render our approach robust to such scenarios.

URL

http://arxiv.org/abs/1902.02205

PDF

http://arxiv.org/pdf/1902.02205
Read All
