Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher

2019-02-09

Seyed-Iman Mirzadeh, Mehrdad Farajtabar, Ang Li, Hassan Ghasemzadeh

arXiv_AI

arXiv_AI Knowledge GAN
Abstract

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices such as smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large pre-trained network (a.k.a. teacher) is used to train a smaller network (a.k.a. student). However, in this paper, we show that the student network's performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher; in other words, a teacher can effectively transfer its knowledge only to students down to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (a.k.a. teacher assistant) to bridge the gap between the student and the teacher. We study the effect of teacher assistant size and extend the framework to multi-step distillation. Moreover, we conduct empirical and theoretical analyses of the teacher assistant knowledge distillation framework. Extensive experiments on the CIFAR-10 and CIFAR-100 datasets with plain CNN and ResNet architectures substantiate the effectiveness of our proposed approach.
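The teacher assistant scheme builds on standard knowledge distillation. As a rough illustration (not the authors' code), the pure-Python sketch below computes the commonly used distillation loss: a blend of hard-label cross-entropy and the KL divergence between temperature-softened teacher and student outputs. The function names and the defaults `T=4.0` and `alpha=0.5` are our own assumptions.

```python
import math

# Sketch only: names and default hyperparameters are assumptions, not from the paper.


def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a larger T gives a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """(1 - alpha) * hard-label cross-entropy + alpha * T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    # The T**2 factor keeps soft-target gradients on a scale comparable to the hard-label term.
    return (1 - alpha) * ce + alpha * (T ** 2) * kl
```

Under the teacher assistant scheme, a loss of this kind would be applied twice: first with the teacher's logits as targets to train the assistant, then with the trained assistant's logits as targets to train the student. A student whose logits equal the teacher's gets a zero distillation term.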

URL

http://arxiv.org/abs/1902.03393

PDF

http://arxiv.org/pdf/1902.03393
Contextual Hourglass Network for Semantic Segmentation of High Resolution Aerial Imagery

2019-02-09

Panfeng Li, Youzuo Lin, Emily Schultz-Fellenz

arXiv_CV

arXiv_CV Segmentation Attention CNN Semantic_Segmentation Deep_Learning Prediction
Abstract

Semantic segmentation for aerial imagery is a challenging and important problem in remotely sensed imagery analysis. In recent years, with the success of deep learning, various convolutional neural network (CNN) based models have been developed. However, due to the varying sizes of objects and imbalanced class labels, it can be challenging to obtain accurate pixel-wise semantic segmentation results. To address these challenges, we develop a novel semantic segmentation method, which we call the Contextual Hourglass Network. To improve the robustness of the prediction, we design a new contextual hourglass module that incorporates an attention mechanism on processed low-resolution feature maps to exploit the contextual semantics. We further exploit the stacked encoder-decoder structure by connecting multiple contextual hourglass modules end to end. This architecture can effectively extract rich multi-scale features and adds more feedback loops for better learning of contextual semantics through intermediate supervision. To demonstrate the efficacy of our semantic segmentation method, we test it on the Potsdam and Vaihingen datasets. Compared with other baseline methods, our method yields the best overall performance.

URL

http://arxiv.org/abs/1810.12813

PDF

http://arxiv.org/pdf/1810.12813
Iteratively reweighted penalty alternating minimization methods with continuation for image deblurring

2019-02-09

Tao Sun, Dongsheng Li, Hao Jiang, Zhe Quan

arXiv_CV

arXiv_CV
Abstract

In this paper, we consider a class of nonconvex problems with linear constraints that appear frequently in image processing. We solve this problem with the penalty method and propose an iteratively reweighted alternating minimization algorithm. To speed up the algorithm, we also apply a continuation strategy to the penalty parameter. A convergence result is proved for the algorithm. Compared with the nonconvex ADMM, the proposed algorithm enjoys both theoretical and computational advantages, such as weaker convergence requirements and faster speed. Numerical results demonstrate the efficiency of the proposed algorithm.
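To make the penalty-plus-continuation idea concrete, here is a toy sketch under our own simplifying assumptions: the linear constraint is just `x = y`, the nonconvex regularizer is `lam * sum(log(eps + |y_i|))`, and the data term is `0.5 * ||x - b||^2`. Reweighting linearizes the log penalty into a weighted l1 term, each alternating step then has a closed form, and `beta` grows geometrically (continuation). The names `irpam` and `soft_threshold` and all parameter values are hypothetical; this does not reproduce the paper's algorithm for general linear constraints or its convergence analysis.

```python
# Hypothetical toy model; names and parameter values are illustrative assumptions.


def soft_threshold(v, t):
    """Proximal operator of t*|.|: shrink v toward zero by t (t >= 0)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def irpam(b, lam=0.1, eps=0.1, beta=1.0, rho=1.2, beta_max=1e4, iters=300):
    """Toy iteratively reweighted penalty alternating minimization with continuation.

    Model problem:
        min_{x, y} 0.5*||x - b||^2 + lam * sum_i log(eps + |y_i|)   s.t.  x = y,
    handled by adding the quadratic penalty (beta/2)*||x - y||^2.
    """
    x, y = list(b), list(b)
    for _ in range(iters):
        # Reweighting: linearize the concave log term at the current y,
        # giving a weighted l1 term with weights 1 / (eps + |y_i|).
        w = [1.0 / (eps + abs(yi)) for yi in y]
        # y-step: weighted l1 + penalty subproblem, solved by soft-thresholding.
        y = [soft_threshold(xi, lam * wi / beta) for xi, wi in zip(x, w)]
        # x-step: quadratic in x, so the minimizer is a weighted average of b and y.
        x = [(bi + beta * yi) / (1.0 + beta) for bi, yi in zip(b, y)]
        # Continuation: increase the penalty parameter geometrically up to a cap.
        beta = min(rho * beta, beta_max)
    return x, y
```

With `b = [3.0, 0.05]`, the large entry is shrunk only slightly (to about 2.97) while the small entry is driven exactly to zero, the sparsity-promoting behavior that motivates log-type penalties. Growing `beta` gradually rather than starting large lets `x` settle at each penalty level before the coupling becomes stiff.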

URL

http://arxiv.org/abs/1902.04062

PDF

http://arxiv.org/pdf/1902.04062
Optimization of dynamic mobile robot path planning based on evolutionary methods

2019-02-09

Masoud Fetanat, Sajjad Haghzad, Saeed Bagheri Shouraki

arXiv_RO

arXiv_RO Optimization
Abstract

This paper presents evolutionary methods for optimization in dynamic mobile robot path planning. In dynamic mobile path planning, the goal is to find an optimal feasible path from a starting point to a target point in the presence of various obstacles, while keeping the path smooth and safe. The Pattern Search (PS) algorithm, Genetic Algorithm (GA), and Particle Swarm Optimization (PSO) are used to find an optimal path along which the mobile robot reaches the target point while avoiding obstacles. To demonstrate the proposed methods, they are first applied to two different paths in an environment with dynamic obstacles. These first results show that PSO converges and minimizes the objective function better than the others, while PS has the lowest run time of all the algorithms in both the initial and the modified environment. The second test path is in a z-type environment, where the same algorithms are compared; the same results hold in this environment as well.
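As an illustration of the PSO variant mentioned above, the sketch below optimizes a single intermediate waypoint between a start and a goal, using path length plus a penalty for cutting into a circular obstacle as the objective. The obstacle layout, the penalty weight of 100, and the PSO coefficients (`w=0.7`, `c1=c2=1.5`) are our own assumptions; the paper's objective also accounts for smoothness and moving obstacles, which this static toy omits.

```python
import math
import random


def seg_point_dist(a, b, p):
    """Euclidean distance from point p to the segment from a to b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        t = 0.0
    else:
        # Project p onto the segment's line and clamp to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


# Hypothetical scenario: the straight line from START to GOAL is blocked by one obstacle.
START, GOAL = (0.0, 0.0), (10.0, 0.0)
OBSTACLES = [((5.0, 0.0), 2.0)]  # (center, radius)


def path_cost(waypoint):
    """Length of START -> waypoint -> GOAL plus a penalty for cutting into obstacles."""
    pts = [START, tuple(waypoint), GOAL]
    length = sum(math.dist(pts[k], pts[k + 1]) for k in range(2))
    violation = 0.0
    for center, radius in OBSTACLES:
        for k in range(2):
            violation += max(0.0, radius - seg_point_dist(pts[k], pts[k + 1], center))
    return length + 100.0 * violation


def pso(objective, dim, n_particles=30, iters=200, bounds=(-5.0, 10.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization (minimization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

Calling `pso(path_cost, dim=2)` returns a waypoint above or below the obstacle. For this layout the shortest single-waypoint detour is about 10.9 units long, versus 10 for the blocked straight line.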

URL

http://arxiv.org/abs/1902.03390

PDF

http://arxiv.org/pdf/1902.03390
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-tracking

2019-02-09

Hiroki Tamaru, Yuki Saito, Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari

arXiv_AI

arXiv_AI Tracking
Abstract

This paper proposes a generative moment matching network (GMMN)-based post-filter that provides inter-utterance pitch variation for deep neural network (DNN)-based singing voice synthesis. The natural pitch variation of a human singing voice leads to a richer musical experience and is used in double-tracking, a recording method in which two performances of the same phrase are recorded and mixed to create a richer, layered sound. However, singing voices synthesized using conventional DNN-based methods never vary because the synthesis process is deterministic and only one waveform is synthesized from one musical score. To address this problem, we use a GMMN to model the variation of the modulation spectrum of the pitch contour of natural singing voices and add a randomized inter-utterance variation to the pitch contour generated by conventional DNN-based singing voice synthesis. Experimental evaluations suggest that 1) our approach can provide perceptible inter-utterance pitch variation while preserving speech quality. We extend our approach to double-tracking, and the evaluation demonstrates that 2) GMMN-based neural double-tracking is perceptually closer to natural double-tracking than conventional signal processing-based artificial double-tracking is.
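A generative moment matching network is trained by minimizing the maximum mean discrepancy (MMD) between generated and real samples. The sketch below computes the biased squared-MMD estimate with a Gaussian kernel on small toy point sets. In the paper the samples would be modulation-spectrum features of pitch contours; the point sets and the kernel bandwidth `sigma=1.0` used here are arbitrary assumptions.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))


def mmd2(X, Y, sigma=1.0):
    """Biased estimate of squared MMD between sample sets X and Y (lists of vectors)."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in X for b in X) / (len(X) ** 2)
    kyy = sum(gaussian_kernel(a, b, sigma) for a in Y for b in Y) / (len(Y) ** 2)
    kxy = sum(gaussian_kernel(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    # Small when the two sample sets look alike, large when they differ.
    return kxx + kyy - 2 * kxy
```

A GMMN-style generator would be updated by gradient descent on this quantity, computed between a batch of its outputs and a batch of natural data.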

URL

http://arxiv.org/abs/1902.03389

PDF

http://arxiv.org/pdf/1902.03389
When Causal Intervention Meets Image Masking and Adversarial Perturbation for Deep Neural Networks

2019-02-09

Chao-Han Huck Yang, Yi-Chieh Liu, Pin-Yu Chen, Xiaoli Ma, Yi-Chang James Tsai

arXiv_AI

arXiv_AI Adversarial Inference Prediction Relation
Abstract

Discovering and exploiting causality in deep neural networks (DNNs) are crucial challenges for understanding and reasoning about causal effects (CE) in an explainable visual model. “Intervention” has been widely used for recognizing a causal relation ontologically. In this paper, we propose a causal inference framework for visual reasoning via do-calculus. To study the intervention effects on pixel-level feature(s) for causal reasoning, we introduce pixel-wise masking and adversarial perturbation. In our framework, CE is calculated using features in a latent space and perturbed predictions from a DNN-based model. We further provide a first look into the characteristics of the CE discovered for adversarially perturbed images generated by gradient-based methods. Experimental results on the ChestX-ray 14 dataset show that CE is a competitive and robust index for understanding DNNs, compared with conventional methods such as class-activation mappings (CAMs), when reasoning about human-interpretable feature(s) (e.g., symptoms). Moreover, CE holds promise for detecting adversarial examples, as it possesses distinct characteristics in the presence of adversarial perturbations.

URL

http://arxiv.org/abs/1902.03380

PDF

http://arxiv.org/pdf/1902.03380
Region based Ensemble Learning Network for Fine-grained Classification

2019-02-09

Weikuang Li, Tian Wang, Chuanyun Wang, Guangcun Shan, Mengyi Zhang, Hichem Snoussi

arXiv_CV

arXiv_CV Attention Classification Detection Recognition
Abstract

As an important research topic in computer vision, fine-grained classification, which aims to recognize subordinate-level categories, has attracted significant attention. We propose a novel region-based ensemble learning network for fine-grained classification. Our approach contains a detection module and a classification module. The detection module is based on the Faster R-CNN framework and locates the semantic regions of the object. The classification module uses an ensemble learning method, which trains a set of sub-classifiers for different semantic regions and combines them to obtain a stronger classifier. In the evaluation, we conduct experiments on the CUB-2011 dataset, and the results show that our method is efficient for fine-grained classification. We also extend our approach to remote sensing scene recognition and evaluate it on the NWPU-RESISC45 dataset.

URL

http://arxiv.org/abs/1902.03377

PDF

http://arxiv.org/pdf/1902.03377
Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding

2019-02-09

Zihao Zhu, Changchang Yin, Buyue Qian, Yu Cheng, Jishang Wei, Fei Wang

arXiv_AI

arXiv_AI Sparse Embedding CNN
Abstract

Evaluating the clinical similarity between pairs of patients is a fundamental problem in healthcare informatics. A proper patient similarity measure enables various downstream applications, such as cohort studies and comparative effectiveness research on treatments. One major data source for patient similarity research is Electronic Health Records (EHRs), which are usually heterogeneous, longitudinal, and sparse. Although existing studies on learning patient similarity from EHRs have proven useful in solving real clinical problems, their applicability is limited by the lack of medical interpretations. Moreover, most previous methods assume a vector-based representation for patients, which typically requires aggregating medical events over a certain time period. As a consequence, temporal information is lost. In this paper, we propose a patient similarity evaluation framework based on the temporal matching of longitudinal patient EHRs. Two efficient methods are presented, one unsupervised and one supervised, both of which preserve the temporal properties of EHRs. The supervised scheme uses a convolutional neural network architecture and learns an optimal representation of patient clinical records with medical concept embedding. Empirical results on real-world clinical data demonstrate substantial improvement over the baselines. We make our code and sample data available for further study.

URL

http://arxiv.org/abs/1902.03376

PDF

http://arxiv.org/pdf/1902.03376
Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration

2019-02-09

Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern

arXiv_CV

arXiv_CV Segmentation GAN Classification Detection Recommendation
Abstract

The International Skin Imaging Collaboration (ISIC) is a global partnership that has organized the world’s largest public repository of dermoscopic images of skin lesions. This archive has been used for 3 consecutive years to host challenges on skin lesion analysis toward melanoma detection, covering 3 analysis tasks of lesion segmentation, lesion attribute detection, and disease classification. The most recent instance in 2018 was hosted at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference in Granada, Spain. The dataset included over 10,000 images. Approximately 900 users registered for data download, 115 submitted to the lesion segmentation task, 25 submitted to the lesion attribute detection task, and 159 submitted to the disease classification task, making this the largest study in the field to date. Important new analyses were introduced to better reflect the difficulties of translating research systems to clinical practice. This article summarizes the results of these analyses, and makes recommendations for future challenges in medical imaging.

URL

http://arxiv.org/abs/1902.03368

PDF

http://arxiv.org/pdf/1902.03368
HE-SLAM: a Stereo SLAM System Based on Histogram Equalization and ORB Features

2019-02-09

Yinghong Fang, Guangcun Shan, Xin Li, Wenliang Liu, Tian Wang, Hichem Snoussi

arXiv_RO

arXiv_RO Detection SLAM
Abstract

In real-life environments, due to the sudden appearance of windows, lights, and objects blocking the light source, a visual SLAM system can easily capture low-contrast images caused by over-exposure or under-exposure. In such cases, direct methods that estimate camera motion from pixel luminance information are infeasible, and it is often difficult to find enough valid feature points without image processing. This paper proposes HE-SLAM, a new method combining histogram equalization and ORB feature extraction, which is robust in more scenes, especially scenes with low-contrast images. Because HE-SLAM uses histogram equalization to improve image contrast, it can extract enough valid feature points in low-contrast images for subsequent feature matching, keyframe selection, bundle adjustment, and loop closure detection. The proposed HE-SLAM has been tested on popular datasets (such as KITTI and EuRoC), and the real-time performance and robustness of the system are demonstrated by comparing system runtime and the root mean square error (RMSE) of the absolute trajectory error (ATE) with state-of-the-art methods such as ORB-SLAM2.
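Global histogram equalization remaps gray levels through the normalized cumulative histogram, spreading a narrow band of intensities across the full range. The sketch below is a plain-Python version for a flat list of 8-bit pixel values, assuming the textbook formula; the abstract does not say whether HE-SLAM uses global equalization, an adaptive variant such as CLAHE, or a library routine.

```python
# Textbook global histogram equalization; not necessarily HE-SLAM's exact variant.


def equalize_histogram(pixels, levels=256):
    """Equalize a flat list of integer gray levels in [0, levels - 1]."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    # Lookup table: map each level's CDF value linearly onto [0, levels - 1].
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if c > 0 else 0
           for c in cdf]
    return [lut[p] for p in pixels]
```

For example, the dark pixels `[50, 50, 51, 52]` map to `[0, 0, 128, 255]`, which is the kind of contrast stretch that helps a feature detector find corners in under-exposed frames.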

URL

http://arxiv.org/abs/1902.03365

PDF

http://arxiv.org/pdf/1902.03365
Image Decomposition and Classification through a Generative Model

2019-02-09

Houpu Yao, Malcolm Regan, Yezhou Yang, Yi Ren

arXiv_CV

arXiv_CV Adversarial Classification
Abstract

We demonstrate in this paper that a generative model can be designed to perform classification tasks under challenging settings, including adversarial attacks and input distribution shifts. Specifically, we propose a conditional variational autoencoder that learns both the decomposition of inputs and the distributions of the resulting components. At test time, we jointly optimize the latent variables of the generator and the relaxed component labels to find the best match between the given input and the output of the generator. The model demonstrates promising performance at recognizing overlapping components from the multiMNIST dataset and novel component combinations from a traffic sign dataset. Experiments also show that the proposed model achieves high robustness on the MNIST and NORB datasets, in particular against high-strength gradient attacks and non-gradient attacks.

URL

http://arxiv.org/abs/1902.03361

PDF

http://arxiv.org/pdf/1902.03361
Improving Deep Image Clustering With Spatial Transformer Layers

2019-02-09

Thiago V.M. Souza, Cleber Zanchettin

arXiv_CV

arXiv_CV Attention Deep_Learning
Abstract

Image clustering is an important but challenging task in machine learning. As in most image processing areas, the latest improvements have come from models based on deep learning. However, classical deep learning methods have difficulty dealing with spatial image transformations such as scale and rotation. In this paper, we propose the use of visual attention techniques to mitigate this problem in image clustering methods. We evaluate the combination of a deep image clustering model called Deep Adaptive Clustering (DAC) with Visual Spatial Transformer Networks (STN). The proposed model is evaluated on the MNIST and FashionMNIST datasets and outperforms the baseline model in the experiments.

URL

http://arxiv.org/abs/1902.05401

PDF

http://arxiv.org/pdf/1902.05401
Challenges in Partially-Automated Roadway Feature Mapping Using Mobile Laser Scanning and Vehicle Trajectory Data

2019-02-09

Mohammad Billah, Farzana Rahman, Arash Maskooki, Michael Todd, Matthew Barth, Jay A. Farrell

arXiv_CV

arXiv_CV
Abstract

Connected vehicle and driver’s assistance applications are greatly facilitated by Enhanced Digital Maps (EDMs) that represent roadway features (e.g., lane edges or centerlines, stop bars). Due to the large number of signalized intersections and miles of roadway, manual development of EDMs on a global basis is not feasible. Mobile Terrestrial Laser Scanning (MTLS) is the preferred data acquisition method to provide data for automated EDM development. Such systems provide an MTLS trajectory and a point cloud for the roadway environment. The challenge is to automatically convert these data into an EDM. This article presents a new processing and feature extraction method, experimental demonstration providing SAE-J2735 map messages for eleven example intersections, and a discussion of the results that points out remaining challenges and suggests directions for future research.

URL

http://arxiv.org/abs/1902.03346

PDF

http://arxiv.org/pdf/1902.03346
Adversarial Audio Synthesis

2019-02-09

Chris Donahue, Julian McAuley, Miller Puckette

arXiv_SD

arXiv_SD Adversarial GAN Image_Generation
Abstract

Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales. Generative adversarial networks (GANs) have seen wide success at generating images that are both locally and globally coherent, but they have seen little application to audio generation. In this paper we introduce WaveGAN, a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio. WaveGAN is capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation. Our experiments demonstrate that, without labels, WaveGAN learns to produce intelligible words when trained on a small-vocabulary speech dataset, and can also synthesize audio from other domains such as drums, bird vocalizations, and piano. We compare WaveGAN to a method which applies GANs designed for image generation on image-like audio feature representations, finding both approaches to be promising.

URL

http://arxiv.org/abs/1802.04208

PDF

http://arxiv.org/pdf/1802.04208
Photorealistic Image Synthesis for Object Instance Detection

2019-02-09

Tomas Hodan, Vibhav Vineet, Ran Gal, Emanuel Shalev, Jon Hanzelka, Treb Connell, Pedro Urbina, Sudipta N. Sinha, Brian Guenter

arXiv_AI

arXiv_AI Object_Detection CNN Detection
Abstract

We present an approach to synthesize highly photorealistic images of 3D object models, which we use to train a convolutional neural network for detecting the objects in real images. The proposed approach has three key ingredients: (1) 3D object models are rendered in 3D models of complete scenes with realistic materials and lighting, (2) plausible geometric configuration of objects and cameras in a scene is generated using physics simulations, and (3) high photorealism of the synthesized images is achieved by physically based rendering. When trained on images synthesized by the proposed approach, the Faster R-CNN object detector achieves a 24% absolute improvement of mAP@.75IoU on Rutgers APC and 11% on LineMod-Occluded datasets, compared to a baseline where the training images are synthesized by rendering object models on top of random photographs. This work is a step towards being able to effectively train object detectors without capturing or annotating any real images. A dataset of 600K synthetic images with ground truth annotations for various computer vision tasks will be released on the project website: thodan.github.io/objectsynth.
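The mAP@.75IoU metric quoted above counts a detection as correct only when its intersection over union (IoU) with a ground-truth box is at least 0.75. The minimal sketch below, assuming boxes given as `(x1, y1, x2, y2)` corner coordinates, shows that matching test; computing full mAP additionally requires ranking detections by confidence and averaging precision over recall levels, which is omitted here. The helper names are our own.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold=0.75):
    """Hypothetical helper: a detection matches at the .75 IoU threshold."""
    return iou(pred, gt) >= threshold
```

A box that covers 90% of a ground-truth box with no spill-over (IoU 0.9) passes the test, while one with IoU 0.7 would count as a miss at this threshold even though it would pass the looser 0.5 criterion.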

URL

http://arxiv.org/abs/1902.03334

PDF

http://arxiv.org/pdf/1902.03334
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

2019-02-08

Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents’ actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents’ behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
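The counterfactual influence reward can be sketched as a KL divergence: agent k compares agent j's action distribution given k's actual action with j's marginal distribution, obtained by averaging over the actions k could have taken. The pure-Python sketch below assumes the counterfactual policies of agent j are already available as probability tables (in the paper they come from learned models of other agents); the function and argument names are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def influence_reward(p_ak, policy_j_given_ak, taken_ak):
    """Counterfactual influence of agent k on agent j (hypothetical interface).

    p_ak: agent k's own action distribution.
    policy_j_given_ak[a]: agent j's action distribution had agent k taken action a,
        with everything else held fixed (the counterfactual).
    taken_ak: the action agent k actually took.
    """
    n_j = len(policy_j_given_ak[0])
    # Marginal policy of j: integrate out k's action using the counterfactuals.
    marginal = [sum(p_ak[a] * policy_j_given_ak[a][b] for a in range(len(p_ak)))
                for b in range(n_j)]
    return kl(policy_j_given_ak[taken_ak], marginal)
```

If agent j behaves the same whatever agent k does, the reward is zero. Averaged over agent k's actions, this KL equals the mutual information between the two agents' actions, which matches the equivalence stated in the abstract.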

URL

http://arxiv.org/abs/1810.08647

PDF

http://arxiv.org/pdf/1810.08647
Active Area Coverage from Equilibrium

2019-02-08

Ian Abraham, Ahalya Prabhakar, Todd D. Murphey

arXiv_RO

arXiv_RO Sparse
Abstract

This paper develops a method for robots to integrate stability into actively seeking out informative measurements through coverage. We derive a controller using hybrid systems theory that allows us to consider safe equilibrium policies during active data collection. We show that our method is able to maintain Lyapunov attractiveness while still actively seeking out data. Using incremental sparse Gaussian processes, we define distributions which allow a robot to actively seek out informative measurements. We illustrate our methods for shape estimation using a cart double pendulum, dynamic model learning of a hovering quadrotor, and generating galloping gaits starting from stationary equilibrium by learning a dynamics model for the half-cheetah system from the Roboschool environment.

URL

http://arxiv.org/abs/1902.03320

PDF

http://arxiv.org/pdf/1902.03320
A Differentiable Augmented Lagrangian Method for Bilevel Nonlinear Optimization

2019-02-08

Benoit Landry, Zachary Manchester, Marco Pavone

arXiv_RO

arXiv_RO Optimization
Abstract

Many problems in modern robotics can be addressed by modeling them as bilevel optimization problems. In this work, we leverage augmented Lagrangian methods and recent advances in automatic differentiation to develop a general-purpose nonlinear optimization solver that is well suited to bilevel optimization. We then demonstrate the validity and scalability of our algorithm with two representative robotic problems, namely robust control and parameter estimation for a system involving contact. We stress the general nature of the algorithm and its potential relevance to many other problems in robotics.
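The augmented Lagrangian method alternates between approximately minimizing f(x) + lam*h(x) + (rho/2)*h(x)^2 over x and updating the multiplier lam <- lam + rho*h(x). The sketch below shows only this classic loop for a single equality constraint, with plain gradient descent as the inner solver. The paper's contribution of making the solver differentiable for bilevel problems is not reproduced, and the step size, `rho`, and iteration counts are our own assumptions.

```python
# Classic augmented Lagrangian loop; hyperparameters are illustrative assumptions.


def augmented_lagrangian(grad_f, h, grad_h, x0, rho=10.0, outer=20, inner=500, lr=0.04):
    """Minimize f(x) subject to the scalar equality constraint h(x) = 0."""
    x = list(x0)
    lam = 0.0
    for _ in range(outer):
        # Inner loop: gradient descent on L(x) = f(x) + lam*h(x) + (rho/2)*h(x)**2,
        # whose gradient is grad_f(x) + (lam + rho*h(x)) * grad_h(x).
        for _ in range(inner):
            g_f, g_h, hx = grad_f(x), grad_h(x), h(x)
            x = [xi - lr * (gf + (lam + rho * hx) * gh)
                 for xi, gf, gh in zip(x, g_f, g_h)]
        # Outer loop: first-order multiplier update.
        lam += rho * h(x)
    return x, lam
```

On the toy problem of minimizing x^2 + y^2 subject to x + y = 1, passing `grad_f = lambda v: [2 * v[0], 2 * v[1]]`, `h = lambda v: v[0] + v[1] - 1`, and `grad_h = lambda v: [1.0, 1.0]` converges to x = y = 0.5 with multiplier lam = -1. Unlike a pure penalty method, the multiplier update lets the constraint be met without driving `rho` to infinity.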

URL

http://arxiv.org/abs/1902.03319

PDF

http://arxiv.org/pdf/1902.03319
A sequential guiding network with attention for image captioning

2019-02-08

Daouda Sow, Zengchang Qin, Mouhamed Niasse, Tao Wan

arXiv_CV

arXiv_CV Image_Caption Attention Caption CNN RNN Deep_Learning
Abstract

Recent advances of deep learning in both computer vision (CV) and natural language processing (NLP) provide a new way of understanding semantics, which lets us tackle more challenging tasks such as automatically generating descriptions of natural images. For this task, the encoder-decoder framework has achieved promising performance when a convolutional neural network (CNN) is used as the image encoder and a recurrent neural network (RNN) as the decoder. In this paper, we introduce a sequential guiding network that guides the decoder during word generation. The new model is an extension of the encoder-decoder framework with attention that has an additional guiding long short-term memory (LSTM) and can be trained end to end using image-description pairs. We validate our approach by conducting extensive experiments on a benchmark dataset, MS COCO Captions. The proposed model achieves significant improvement compared with other state-of-the-art deep learning models.

URL

https://arxiv.org/abs/1811.00228

PDF

https://arxiv.org/pdf/1811.00228
Motion Scaling Solutions for Improved Performance in High Delay Surgical Teleoperation

2019-02-08

Florian Richter, Ryan K. Orosco, Michael C. Yip

arXiv_RO

arXiv_RO
Abstract

Robotic teleoperation brings great potential for advances within the field of surgery. The ability of a surgeon to reach a patient remotely opens exciting opportunities. Early experience with telerobotic surgery has been interesting, but clinical feasibility remains out of reach, largely due to the deleterious effects of communication delays. Teleoperation tasks are significantly impacted by unavoidable signal latency, which directly results in slower operations, less precise movements, and increased human error. Introducing significant changes to the surgical workflow, for example semi-automation or self-correction, presents too great a technological and ethical burden for commercial surgical robotic systems to adopt. In this paper, we present three simple and intuitive motion scaling solutions to counter the effects of delay in teleoperated robotic systems and help improve operator accuracy. Motion scaling potentially improves user performance and reduces errors with minimal change to the underlying teleoperation architecture. To validate the use of motion scaling as a performance enhancer in telesurgery, we conducted a user study with 17 participants, and our results show that the proposed solutions do indeed reduce the error rate when operating under high delay.

URL

http://arxiv.org/abs/1902.03290

PDF

http://arxiv.org/pdf/1902.03290
Augmented Reality Predictive Displays to Help Mitigate the Effects of Delayed Telesurgery

2019-02-08

Florian Richter, Yifei Zhang, Yuheng Zhi, Ryan K. Orosco, Michael C. Yip

arXiv_RO

arXiv_RO Tracking
Abstract

Surgical robots offer the exciting potential of remote telesurgery, but advances are needed to make this technology efficient and accurate enough to ensure patient safety. Achieving these goals is hindered by the deleterious effects of latency between the remote operator and the bedside robot. Predictive displays have been successful in overcoming these effects by giving the operator immediate visual feedback. However, previously developed predictive displays cannot be directly applied to telesurgery due to the unique challenges in tracking the 3D geometry of the surgical environment. In this paper, we present the first predictive display for teleoperated surgical robots. The predictive display is stereoscopic, uses Augmented Reality (AR) to show the predicted motions alongside the complex tissue found in situ within surgical environments, and overcomes the challenges of accurately tracking slave tools in real time. We call this a Stereoscopic AR Predictive Display (SARPD). We provide measurements to show the performance of the real-time tool tracking and AR rendering. To test the SARPD’s performance, we conducted a user study with ten participants on the da Vinci Surgical System. The results showed, with statistical significance, that using SARPD decreased task completion time while having no effect on error rates when operating under delay.

URL

http://arxiv.org/abs/1809.08627

PDF

http://arxiv.org/pdf/1809.08627
FERAtt: Facial Expression Recognition with Attention Net

2019-02-08

Pedro D. Marrero Fernandez, Fidel A. Guerrero Peña, Tsang Ing Ren, Alexandre Cunha

arXiv_CV

arXiv_CV Attention Face CNN Classification Recognition
Abstract

We present a new end-to-end network architecture for facial expression recognition with an attention model. It focuses attention on the human face and uses a Gaussian space representation for expression recognition. We devise this architecture based on two fundamental, complementary components: (1) facial image correction and attention and (2) facial expression representation and classification. The first component uses an encoder-decoder style network and a convolutional feature extractor whose outputs are multiplied pixel-wise to obtain a feature attention map. The second component is responsible for obtaining an embedded representation and classification of the facial expression. We propose a loss function that creates a Gaussian structure on the representation space. To demonstrate the proposed method, we create two larger and more comprehensive synthetic datasets using the traditional BU3DFE and CK+ facial datasets. We compare results with the PreActResNet18 baseline. Our experiments on these datasets show the superiority of our approach in recognizing facial expressions.

URL

http://arxiv.org/abs/1902.03284

PDF

http://arxiv.org/pdf/1902.03284
Machine learning and chord based feature engineering for genre prediction in popular Brazilian music

2019-02-08

Bruna D. Wundervald, Walmes M. Zeviani

arXiv_SD

arXiv_SD Classification Prediction Relation
Abstract

Music genre can be hard to describe: many factors are involved, such as style, musical technique, and historical context. Some genres even have overlapping characteristics. Seeking a better understanding of how music genres relate to harmonic structure, we gathered data about the chords of thousands of popular Brazilian songs. Here, ‘popular’ refers not only to the genre named MPB (Brazilian Popular Music) but to nine different genres that were considered particular to the Brazilian case. The main goals of the present work are to extract and engineer harmonically related features from chord data and to use them to classify popular Brazilian music genres, establishing a connection between harmonic relationships and Brazilian genres. We also emphasize the generality of the method for obtaining the data, which allows for replication and direct extension of this work. Our final model is a combination of multiple classification trees, also known as a random forest. We found that features extracted from harmonic elements can predict music genre satisfactorily for the Brazilian case, as can features obtained from the Spotify API. The variables considered in this work also give an intuition about how they relate to the genres.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03283

PDF

http://arxiv.org/pdf/1902.03283
Read All
Constrained-CNN losses for weakly supervised segmentation

2019-02-08

Hoel Kervadec, Jose Dolz, Meng Tang, Eric Granger, Yuri Boykov, Ismail Ben Ayed

arXiv_CV

arXiv_CV Knowledge Segmentation Attention Weakly_Supervised Optimization
Abstract

Weakly-supervised learning based on, e.g., partially labelled images or image-tags, is currently attracting significant attention in CNN segmentation as it can mitigate the need for full and laborious pixel/voxel annotations. Enforcing high-order (global) inequality constraints on the network output (for instance, to constrain the size of the target region) can leverage unlabeled data, guiding the training process with domain-specific knowledge. Inequality constraints are very flexible because they do not assume exact prior knowledge. However, constrained Lagrangian dual optimization has been largely avoided in deep networks, mainly for computational tractability reasons. To the best of our knowledge, the method of [Pathak et al., 2015] is the only prior work that addresses deep CNNs with linear constraints in weakly supervised segmentation. It uses the constraints to synthesize fully-labeled training masks (proposals) from weak labels, mimicking full supervision and facilitating dual optimization. We propose to introduce a differentiable penalty, which enforces inequality constraints directly in the loss function, avoiding expensive Lagrangian dual iterates and proposal generation. From a constrained-optimization perspective, our simple penalty-based approach is not optimal, as there is no guarantee that the constraints are satisfied. However, surprisingly, it yields substantially better results than the Lagrangian-based constrained CNNs in [Pathak et al., 2015], while reducing the computational demand for training. By annotating only a small fraction of the pixels, the proposed approach can reach a level of segmentation performance that is comparable to full supervision on three separate tasks. While our experiments focused on basic linear constraints such as the target-region size and image tags, our framework can be easily extended to other non-linear constraints.
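As a rough illustration of a penalty that enforces an inequality constraint directly in the loss, here is a minimal sketch for the target-region size. It assumes the soft region size is the sum of per-pixel foreground probabilities and that the constraint is `lower <= size <= upper`; the quadratic form and the names `size_penalty`, `lower`, `upper` are our assumptions, not necessarily the authors' exact loss.

```python
def size_penalty(probs, lower, upper):
    """Quadratic penalty on the predicted (soft) size of the target region.

    probs: per-pixel foreground probabilities (e.g. softmax outputs).
    lower, upper: assumed bounds on the region size, in pixels.
    Returns 0 when the soft size lies inside [lower, upper]; otherwise the
    squared distance to the violated bound, which is differentiable in probs.
    """
    size = sum(probs)  # soft pixel count: a differentiable surrogate of the size
    if size < lower:
        return (size - lower) ** 2
    if size > upper:
        return (size - upper) ** 2
    return 0.0
```

With `probs = [0.9, 0.8, 0.1, 0.0]` the soft size is 1.8, so bounds of `[1.0, 1.5]` give a penalty of roughly 0.09, while any size inside the bounds costs nothing. Unlike a Lagrangian dual scheme, there is no multiplier to update, which matches the trade-off described above: cheaper training, but no guarantee that the constraint ends up satisfied.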

Abstract (translated by Google)

URL

http://arxiv.org/abs/1805.04628

PDF

http://arxiv.org/pdf/1805.04628
Read All
Learning Ontologies with Epistemic Reasoning: The EL Case

2019-02-08

Ana Ozaki, Nicolas Troquard

arXiv_AI

arXiv_AI
Abstract

We investigate the problem of learning description logic ontologies from entailments via queries, using epistemic reasoning. We introduce a new learning model consisting of epistemic membership and example queries and show that polynomial learnability in this model coincides with polynomial learnability in Angluin’s exact learning model with membership and equivalence queries. We then instantiate our learning framework to EL and show some complexity results for an epistemic extension of EL where epistemic operators can be applied over the axioms. Finally, we transfer known results for EL ontologies and its fragments to our learning model based on epistemic reasoning.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03273

PDF

http://arxiv.org/pdf/1902.03273
Read All
Does the 'Artificial Intelligence Clinician' learn optimal treatment strategies for sepsis in intensive care?

2019-02-08

Russell Jeter, Christopher Josef, Supreeth Shashikumar, Shamim Nemati

arXiv_AI

arXiv_AI Attention
Abstract

From 2017 to 2018 the number of scientific publications found via PubMed search using the keyword “Machine Learning” increased by 46% (4,317 to 6,307). The results of studies involving machine learning, artificial intelligence (AI), and big data have captured the attention of healthcare practitioners, healthcare managers, and the public at a time when Western medicine grapples with unmitigated cost increases and public demands for accountability. The complexity involved in healthcare applications of machine learning and the size of the associated data sets has afforded many researchers an uncontested opportunity to satisfy these demands with relatively little oversight. In a recent Nature Medicine article, “The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care,” Komorowski and his coauthors propose methods to train an artificial intelligence clinician to treat sepsis patients with vasopressors and IV fluids. In this post, we will closely examine the claims laid out in this paper. In particular, we will study the individual treatment profiles suggested by their AI Clinician to gain insight into how their AI Clinician intends to treat patients on an individual level.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03271

PDF

http://arxiv.org/pdf/1902.03271
Read All
FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary

2019-02-08

Yingzhen Yang, Nebojsa Jojic, Jun Huan

arXiv_AI

arXiv_AI Object_Detection CNN Image_Classification Inference Classification Prediction Detection
Abstract

We present a novel method of compression of deep Convolutional Neural Networks (CNNs). The proposed method reduces the number of parameters of each convolutional layer by learning a 3D tensor termed Filter Summary (FS). The convolutional filters are extracted from FS as overlapping 3D blocks, and nearby filters in FS share weights in their overlapping regions in a natural way. The resultant neural network based on such a weight-sharing scheme, termed Filter Summary CNNs or FSNet, has an FS in each convolution layer instead of a set of independent filters as in the conventional convolution layer. FSNet has the same architecture as the baseline CNN to be compressed, and each convolution layer of FSNet generates the same number of filters from FS as the baseline CNN in the forward process. Without hurting the inference speed, the parameter space of FSNet is much smaller than that of the baseline CNN. In addition, FSNet is compatible with weight quantization, leading to an even higher compression ratio when combined with weight quantization. Experiments demonstrate the effectiveness of FSNet in compression of CNNs for computer vision tasks including image classification and object detection. For the classification task, FSNet with 0.22M effective parameters has a prediction accuracy of 93.91% on the CIFAR-10 dataset with less than 0.3% accuracy drop, using ResNet-18 with 11.18M parameters as the baseline. Furthermore, the FSNet version of ResNet-50 with 2.75M effective parameters achieves top-1 and top-5 accuracy of 63.80% and 85.72% respectively on the ILSVRC-12 benchmark. For the object detection task, FSNet is used to compress the Single Shot MultiBox Detector (SSD300) of 26.32M parameters. FSNet with 0.45M effective parameters achieves mAP of 67.63% on the VOC2007 test data with weight quantization, and FSNet with 0.68M parameters achieves mAP of 70.00% with weight quantization on the same test data.
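The weight-sharing idea (many filters read out of one smaller summary, with neighbours overlapping) can be sketched in a few lines. This is a simplified illustration, not FSNet's actual layout: the summary is flattened to 1-D, each filter is a contiguous window with a fixed stride, and windows wrap around the end. The names `extract_filters` and `compression_ratio` are hypothetical.

```python
def extract_filters(fs, num_filters, filter_len, stride):
    """Read num_filters overlapping filters out of a 1-D 'filter summary'.

    Simplifying assumption: filter i is the window of length filter_len that
    starts at i * stride and wraps around the end of the summary, so
    neighbouring filters share weights wherever their windows overlap.
    """
    size = len(fs)
    return [
        [fs[(i * stride + j) % size] for j in range(filter_len)]
        for i in range(num_filters)
    ]


def compression_ratio(num_filters, filter_len, summary_len):
    """Weights of independent filters divided by weights stored in the summary."""
    return (num_filters * filter_len) / summary_len
```

With a 16-weight summary, eight filters of length 8 and stride 2 yield 64 effective filter weights from 16 stored parameters (a ratio of 4.0), and consecutive filters share 6 of their 8 weights. Because the forward pass still sees ordinary filters, the convolution itself is unchanged, which is consistent with the claim that inference speed is not hurt.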

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03264

PDF

http://arxiv.org/pdf/1902.03264
Read All
Skin Lesion Synthesis with Generative Adversarial Networks

2019-02-08

Alceu Bissoto, Fábio Perez, Eduardo Valle, Sandra Avila

arXiv_CV

arXiv_CV Adversarial Knowledge GAN Classification Detection
Abstract

Skin cancer is by far the most common type of cancer. Early detection is the key to significantly increasing the chances of successful treatment. Currently, Deep Neural Networks achieve state-of-the-art results in automated skin cancer classification. To push the results further, we need to address the lack of annotated data, which is expensive and requires much effort from specialists. To bypass this problem, we propose using Generative Adversarial Networks to generate realistic synthetic skin lesion images. To the best of our knowledge, our results are the first to show visually appealing synthetic images that comprise clinically meaningful information.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03253

PDF

http://arxiv.org/pdf/1902.03253
Read All
Invariant-equivariant representation learning for multi-class data

2019-02-08

Ilya Feige

arXiv_AI

arXiv_AI Represenation_Learning Quantitative
Abstract

Representations learnt through deep neural networks tend to be highly informative, but opaque in terms of what information they learn to encode. We introduce an approach to probabilistic modelling that learns to represent data with two separate deep representations: an invariant representation that encodes the information of the class to which the data belongs, and an equivariant representation that encodes the symmetry transformation defining the particular data point within the class manifold (equivariant in the sense that the representation varies naturally with symmetry transformations). This approach is based primarily on the strategic routing of data through the two latent variables, and thus is conceptually transparent, easy to implement, and in principle generally applicable to any data composed of discrete classes of continuous distributions (e.g. objects in images, topics in language, individuals in behavioural data). We demonstrate qualitatively compelling representation learning and competitive quantitative performance, in both supervised and semi-supervised settings, versus comparable modelling approaches in the literature with little fine tuning.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03251

PDF

http://arxiv.org/pdf/1902.03251
Read All
Insertion Transformer: Flexible Sequence Generation via Insertion Operations

2019-02-08

Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

arXiv_CL

arXiv_CL
Abstract

We present the Insertion Transformer, an iterative, partially autoregressive model for sequence generation based on insertion operations. Unlike typical autoregressive models which rely on a fixed, often left-to-right ordering of the output, our approach accommodates arbitrary orderings by allowing for tokens to be inserted anywhere in the sequence during decoding. This flexibility confers a number of advantages: for instance, not only can our model be trained to follow specific orderings such as left-to-right generation or a binary tree traversal, but it can also be trained to maximize entropy over all valid insertions for robustness. In addition, our model seamlessly accommodates both fully autoregressive generation (one insertion at a time) and partially autoregressive generation (simultaneous insertions at multiple locations). We validate our approach by analyzing its performance on the WMT 2014 English-German machine translation task under various settings for training and decoding. We find that the Insertion Transformer outperforms many prior non-autoregressive approaches to translation at comparable or better levels of parallelism, and successfully recovers the performance of the original Transformer while requiring only logarithmically many iterations during decoding.
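The "logarithmically many iterations" result follows from the balanced-binary-tree ordering mentioned above: if every round inserts the middle token of each remaining gap in parallel, an n-token output needs about log2(n + 1) rounds. The sketch below is our own illustration, not the authors' code; it contains no model and only computes which target positions would be filled in each round. The name `balanced_tree_schedule` is hypothetical.

```python
def balanced_tree_schedule(target):
    """Group the tokens of `target` into parallel insertion rounds.

    In each round the middle token of every still-empty span is inserted,
    mimicking a balanced-binary-tree order. Returns a list of rounds, each a
    list of (position_in_target, token) pairs.
    """
    rounds = []
    spans = [(0, len(target))] if target else []  # half-open spans [lo, hi)
    while spans:
        this_round, next_spans = [], []
        for lo, hi in spans:
            mid = (lo + hi) // 2
            this_round.append((mid, target[mid]))
            if lo < mid:            # left gap still has tokens to fill
                next_spans.append((lo, mid))
            if mid + 1 < hi:        # right gap still has tokens to fill
                next_spans.append((mid + 1, hi))
        rounds.append(this_round)
        spans = next_spans
    return rounds
```

For the 7-token sentence "the quick brown fox jumps over the", the first round inserts only "fox", and all seven tokens are placed after 3 rounds, whereas left-to-right decoding would need 7 steps; an 8-token target needs 4 rounds.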

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03249

PDF

http://arxiv.org/pdf/1902.03249
Read All
Ask Not What AI Can Do, But What AI Should Do: Towards a Framework of Task Delegability

2019-02-08

Brian Lubars, Chenhao Tan

arXiv_AI

arXiv_AI Survey
Abstract

Although artificial intelligence holds promise for addressing societal challenges, the questions of exactly which tasks to automate and to what extent remain understudied. We approach the problem of task delegability from a human-centered perspective by developing a framework on human perception of task delegation to artificial intelligence. We consider four high-level factors that can contribute to a delegation decision: motivation, difficulty, risk, and trust. To obtain an empirical understanding of human preferences in different tasks, we build a dataset of 100 tasks from academic papers, popular media portrayal of AI, and everyday life. For each task, we administer a survey to collect judgments of each factor and ask subjects to pick the extent to which they prefer AI involvement. We find little preference for full AI control and a strong preference for machine-in-the-loop designs, in which humans play the leading role. Our framework can effectively predict human preferences in degrees of AI assistance. Among the four factors, trust is the most predictive of human preferences of optimal human-machine delegation. This framework represents a first step towards characterizing human preferences of automation across tasks. We hope this work may encourage and aid future efforts towards understanding such individual attitudes; our goal is to inform the public and the AI research community rather than to dictate any direction in technology development.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03245

PDF

http://arxiv.org/pdf/1902.03245
Read All
A 3D Probabilistic Deep Learning System for Detection and Diagnosis of Lung Cancer Using Low-Dose CT Scans

2019-02-08

Onur Ozdemir, Rebecca L. Russell, Andrew A. Berlin

arXiv_CV

arXiv_CV Knowledge CNN Classification Deep_Learning Detection
Abstract

We introduce a new end-to-end computer aided detection and diagnosis system for lung cancer screening using low-dose CT scans. Our system is based on 3D convolutional neural networks and achieves state-of-the-art performance for both lung nodule detection and malignancy classification tasks on the publicly available LUNA16 and Kaggle Data Science Bowl challenges. Furthermore, we characterize model uncertainty in our system and show that we can use this to provide well-calibrated classification probabilities for nodule detection and patient malignancy diagnosis. To the best of our knowledge, model uncertainty has not been considered in the context of lung CT analysis before. These calibrated probabilities informed by model uncertainty can be used for subsequent risk-based decision making towards diagnostic interventions or disease treatments, as we demonstrate using a probability-based patient referral strategy to further improve our results.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03233

PDF

http://arxiv.org/pdf/1902.03233
Read All
Minimal Images in Deep Neural Networks: Fragile Object Recognition in Natural Images

2019-02-08

Sanjana Srivastava, Guy Ben-Yosef, Xavier Boix

arXiv_CV

arXiv_CV Adversarial Recognition
Abstract

The human ability to recognize objects is impaired when the object is not shown in full. “Minimal images” are the smallest regions of an image that remain recognizable for humans. Ullman et al. 2016 show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a phenomenon common to humans and existing state-of-the-art deep neural networks (DNNs), and are much more prominent in DNNs. We found many cases where DNNs classified one region correctly and the other incorrectly, though the two regions differed by only one row or column of pixels and were often bigger than the average human minimal image size. We show that this phenomenon is independent of previous works that have reported a lack of invariance to minor modifications in object location in DNNs. Our results thus reveal a new failure mode of DNNs that also affects humans, but to a much lesser degree. They expose how fragile DNN recognition ability is for natural images even without adversarial patterns being introduced. Bringing the robustness of DNNs in natural images to the human level remains an open challenge for the community.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03227

PDF

http://arxiv.org/pdf/1902.03227
Read All
Object tracking in video signals using Compressive Sensing

2019-02-08

Marijana Kracunov, Milica Bastica, Jovana Tesovic

arXiv_CV

arXiv_CV Tracking Object_Tracking
Abstract

The main subject of this work is reducing the number of pixels in video signals using Compressive Sensing, while maintaining the quality needed to recover the trace of an object. The quality of frames from a video containing a moving object is gradually reduced by keeping a different number of pixels in each iteration, going from 45% all the way down to 1%. Using an object-tracing algorithm, the results were satisfactory and showed only minor changes between the trajectory graphs obtained from the original and reconstructed videos.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.06253

PDF

http://arxiv.org/pdf/1903.06253
Read All
Software-Defined FPGA Accelerator Design for Mobile Deep Learning Applications

2019-02-08

Panagiotis G. Mousouliotis, Loukas P. Petrou

arXiv_CV

arXiv_CV Object_Detection Segmentation Attention CNN Semantic_Segmentation Optimization Inference Deep_Learning Detection Recognition
Abstract

Recently, the field of deep learning has received great attention from the scientific community, and it is used to provide improved solutions to many computer vision problems. Convolutional neural networks (CNNs) have been successfully used to attack problems such as object recognition, object detection, semantic segmentation, and scene understanding. The rapid development of deep learning goes hand in hand with the adoption of GPUs for accelerating its processes, such as network training and inference. Even though FPGA design existed long before GPUs were used to accelerate computations, and despite the fact that high-level synthesis (HLS) tools are becoming more attractive, the adoption of FPGAs for deep learning research and application development is poor because it requires hardware-design expertise. This work presents a workflow for deep learning mobile application acceleration on small, low-cost, low-power FPGA devices using HLS tools. This workflow eases the design of an improved version of the SqueezeJet accelerator used for the speedup of mobile-friendly, low-parameter ImageNet-class CNNs, such as SqueezeNet v1.1 and ZynqNet. Additionally, the workflow includes the development of an HLS-driven analytical model which is used for performance estimation of the accelerator. This model can also be used to direct the design process and lead to future design improvements and optimizations.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03192

PDF

http://arxiv.org/pdf/1902.03192
Read All
Speaker diarisation using 2D self-attentive combination of embeddings

2019-02-08

Guangzhi Sun, Chao Zhang, Phil Woodland

arXiv_SD

arXiv_SD Embedding
Abstract

Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework to improve performance by combining them into a single embedding, referred to as a c-vector. This combination uses a 2-dimensional (2D) self-attentive structure, which extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings. Two types of 2D self-attentive structure are proposed in this paper: the simultaneous combination and the consecutive combination, which adopt a single self-attentive layer and multiple self-attentive layers, respectively. The penalty term in the original self-attentive layer, which is jointly minimised with the objective function to encourage diversity of annotation vectors, is also modified to capture not only different local peaks but also the overall trends in the multiple annotation vectors. Experiments on the AMI meeting corpus show that our modified penalty term improves the relative speaker error rate (SER) of d-vector systems by 6% and 21%, and a further 10% relative SER reduction can be obtained using the c-vector from our best 2D self-attentive structure.
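The core of a "2D" self-attentive combination is that the softmax runs jointly over time steps and embedding types, so one weighted average mixes, say, i-vectors and d-vectors across a segment. The sketch below is a minimal single-head version in pure Python, assuming all embedding types have already been projected to the same dimension and using the standard score `w2 · tanh(W1 h)`. It omits the consecutive variant and the modified penalty term, and the name `self_attentive_2d` is hypothetical.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def self_attentive_2d(embeddings, w1, w2):
    """Combine embeddings over time AND embedding type with one attention head.

    embeddings: list over time steps; each item is a list over embedding types
                (e.g. [i_vector, d_vector]) of equal-length float vectors.
    w1: hidden projection matrix (list of rows); w2: scoring vector.
    Returns the attention-weighted average of all (time, type) vectors.
    """
    flat = [vec for frame in embeddings for vec in frame]
    # Unnormalised score for each (time, type) pair: w2 . tanh(W1 h)
    scores = [sum(a * math.tanh(b) for a, b in zip(w2, matvec(w1, h))) for h in flat]
    # One softmax over time and type jointly (shifted by the max for stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(flat[0])
    return [sum(w * h[i] for w, h in zip(weights, flat)) for i in range(dim)]
```

With a zero scoring vector every (time, type) pair gets the same weight and the result is a plain mean; learned parameters let the layer favour whichever embedding type is more reliable at each moment.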

Abstract (translated by Google)

URL

https://arxiv.org/abs/1902.03190

PDF

https://arxiv.org/pdf/1902.03190
Read All
Implicit Generative Modeling of Random Noise during Training for Adversarial Robustness

2019-02-08

Priyadarshini Panda, Kaushik Roy

arXiv_CV

arXiv_CV Adversarial
Abstract

We introduce a Noise-based prior Learning (NoL) approach for training neural networks that are intrinsically robust to adversarial attacks. We find that the implicit generative modeling of random noise, with the same loss function used during posterior maximization, improves a model’s understanding of the data manifold, furthering adversarial robustness. We evaluate our approach’s efficacy and provide a simple visualization tool for understanding adversarial data, using Principal Component Analysis. Our analysis reveals that adversarial robustness, in general, manifests in models with higher variance along the high-ranked principal components. We show that models learnt with our approach perform remarkably well against a wide range of attacks. Furthermore, combining NoL with state-of-the-art adversarial training extends the robustness of a model, even beyond what it is adversarially trained for, in both white-box and black-box attack scenarios.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1807.02188

PDF

http://arxiv.org/pdf/1807.02188
Read All
Understanding The Impact of Partner Choice on Cooperation and Social Norms by means of Multi-agent Reinforcement Learning

2019-02-08

Nicolas Anastassacos, Steve Hailes, Mirco Musolesi

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

The human ability to coordinate and cooperate has been vital to the development of societies for thousands of years. While it is not fully clear how this behavior arises, social norms are thought to be a key factor in this development. In contrast to laws set by authorities, norms tend to evolve in a bottom-up manner from interactions between members of a society. While much behavior can be explained through the use of social norms, it is difficult to measure the extent to which they shape society as well as how they are affected by other societal dynamics. In this paper, we discuss the design and evaluation of a reinforcement learning model for understanding how the opportunity to choose who you interact with in a society affects the overall societal outcome and the strength of social norms. We first study the emergence of norms and then the emergence of cooperation in the presence of norms. In our model, agents interact with other agents in a society in the form of repeated matrix-games: coordination games and cooperation games. In particular, at each stage of our model, agents are either able to choose a partner to interact with or are forced to interact at random, and they learn using policy gradients.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03185

PDF

http://arxiv.org/pdf/1902.03185
Read All
Asynchronous Spatial Image Convolutions for Event Cameras

2019-02-08

Cedric Scheerlinck, Nick Barnes, Robert Mahony

arXiv_CV

arXiv_CV Image_Caption Tracking Detection
Abstract

Spatial convolution is arguably the most fundamental of 2D image processing operations. Conventional spatial image convolution can only be applied to a conventional image, that is, an array of pixel values (or similar image representation) that are associated with a single instant in time. Event cameras have serial, asynchronous output with no natural notion of an image frame, and each event arrives with a different timestamp. In this paper, we propose a method to compute the convolution of a linear spatial kernel with the output of an event camera. The approach operates on the event stream output of the camera directly without synthesising pseudo-image frames as is common in the literature. The key idea is the introduction of an internal state that directly encodes the convolved image information, which is updated asynchronously as each event arrives from the camera. The state can be read off as often as, and whenever, required for use in higher level vision algorithms for real-time robotic systems. We demonstrate the application of our method to corner detection, providing an implementation of a Harris corner-response “state” that can be used in real-time for feature detection and tracking on robotic systems.
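The internal-state idea relies on linearity: convolving an image with a kernel equals summing a shifted, scaled copy of the kernel for every pixel contribution, so each arriving event can update the convolved state locally. The sketch below shows that per-event update in pure Python. It deliberately omits any temporal filtering or decay the paper may apply, assumes each event contributes `polarity * contrast` to the underlying image, and uses the hypothetical name `apply_event`.

```python
def apply_event(state, kernel, x, y, polarity, contrast=1.0):
    """Update the convolved-image state in place for a single event.

    state: 2-D list (rows x cols) holding the running convolution result.
    kernel: odd-sized 2-D list, the linear spatial kernel, centred on its middle.
    (x, y): event pixel as (column, row); polarity: +1 or -1.
    Adding a scaled copy of the kernel centred on the event keeps `state`
    equal to kernel convolved with the accumulated event image.
    """
    kr, kc = len(kernel) // 2, len(kernel[0]) // 2
    rows, cols = len(state), len(state[0])
    for i, krow in enumerate(kernel):
        for j, k in enumerate(krow):
            r, c = y + i - kr, x + j - kc
            if 0 <= r < rows and 0 <= c < cols:  # ignore kernel taps off the sensor
                state[r][c] += polarity * contrast * k
```

Each update touches only as many pixels as the kernel has taps, so the state can be read at any moment without rebuilding a frame; for example, a 3×3 Laplacian kernel costs nine additions per event, and an opposite-polarity event at the same pixel exactly cancels the first.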

Abstract (translated by Google)

URL

http://arxiv.org/abs/1812.00438

PDF

http://arxiv.org/pdf/1812.00438
Read All
BINet: Multi-perspective Business Process Anomaly Classification

2019-02-08

Timo Nolle, Stefan Luettgen, Alexander Seeliger, Max Mühlhäuser

arXiv_AI

arXiv_AI Classification Detection
Abstract

In this paper, we introduce BINet, a neural network architecture for real-time multi-perspective anomaly detection in business process event logs. BINet is designed to handle both the control flow and the data perspective of a business process. Additionally, we propose a set of heuristics for setting the threshold of an anomaly detection algorithm automatically. We demonstrate that BINet can be used to detect anomalies in event logs not only on a case level but also on event attribute level. Finally, we demonstrate that a simple set of rules can be used to utilize the output of BINet for anomaly classification. We compare BINet to eight other state-of-the-art anomaly detection algorithms and evaluate their performance on an elaborate data corpus of 29 synthetic and 15 real-life event logs. BINet outperforms all other methods both on the synthetic as well as on the real-life datasets.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03155

PDF

http://arxiv.org/pdf/1902.03155
Read All
Discretization based Solutions for Secure Machine Learning against Adversarial Attacks

2019-02-08

Priyadarshini Panda, Indranil Chakraborty, Kaushik Roy

arXiv_CV

arXiv_CV Adversarial Deep_Learning
Abstract

Adversarial examples are perturbed inputs that are designed (from a deep learning network’s (DLN) parameter gradients) to mislead the DLN during test time. Intuitively, constraining the dimensionality of inputs or parameters of a network reduces the ‘space’ in which adversarial examples exist. Guided by this intuition, we demonstrate that discretization greatly improves the robustness of DLNs against adversarial attacks. Specifically, discretizing the input space (reducing the allowed pixel levels from 256 values, or 8 bits, to 4 values, or 2 bits) extensively improves the adversarial robustness of DLNs for a substantial range of perturbations with minimal loss in test accuracy. Furthermore, we find that Binary Neural Networks (BNNs) and related variants are intrinsically more robust than their full precision counterparts in adversarial scenarios. Combining input discretization with BNNs further improves robustness, even removing the need for adversarial training for certain magnitudes of perturbation. We evaluate the effect of discretization on the MNIST, CIFAR10, CIFAR100 and ImageNet datasets. Across all datasets, we observe maximal adversarial resistance with 2-bit input discretization that incurs an adversarial accuracy loss of just ~1-2% as compared to clean test accuracy.
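The intuition behind input discretization is easy to see numerically: with only 4 allowed levels, small perturbations usually round back to the same level. Below is a minimal sketch of 8-bit to 2-bit quantisation; uniform rounding and rescaling to [0, 1] are our assumptions, since the abstract does not specify the exact scheme, and `discretize` is a hypothetical name.

```python
def discretize(pixels, bits=2):
    """Uniformly quantise 8-bit pixel values (0-255) to 2**bits levels.

    Returns values rescaled to [0, 1]. Uniform rounding is an assumption;
    the paper's exact discretisation scheme may differ.
    """
    levels = 2 ** bits - 1  # e.g. 3 steps between 4 levels for 2 bits
    return [round(p / 255 * levels) / levels for p in pixels]
```

A pixel value of 100 and its perturbed versions 92 and 108 all map to the same 2-bit level (1/3), so a perturbation of ±8 is erased entirely. Pixels lying near a level boundary can still flip, which is one reason robustness holds only for a limited range of perturbation magnitudes.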

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03151

PDF

http://arxiv.org/pdf/1902.03151
Read All
Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance

2019-02-08

Ethan C. Jackson, Mark Daley

arXiv_AI

arXiv_AI Reinforcement_Learning Detection
Abstract

Reinforcement learning (RL) problems often feature deceptive local optima, and learning methods that optimize purely for reward signal often fail to learn strategies for overcoming them. Deep neuroevolution and novelty search have been proposed as effective alternatives to gradient-based methods for learning RL policies directly from pixels. In this paper, we introduce and evaluate the use of novelty search over agent action sequences by string edit metric distance as a means for promoting innovation. We also introduce a method for stagnation detection and population resampling inspired by recent developments in the RL community that uses the same mechanisms as novelty search to promote and develop innovative policies. Our methods extend a state-of-the-art method for deep neuroevolution using a simple-yet-effective genetic algorithm (GA) designed to efficiently learn deep RL policy network weights. Experiments using four games from the Atari 2600 benchmark were conducted. Results provide further evidence that GAs are competitive with gradient-based algorithms for deep RL. Results also demonstrate that novelty search over action sequences is an effective source of selection pressure that can be integrated into existing evolutionary algorithms for deep RL.
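Novelty search over action sequences needs two pieces: a string edit (Levenshtein) distance between sequences, and a novelty score. The sketch below uses the common novelty-search formulation, the mean distance to the k nearest behaviours in an archive. The value of k, the archive policy, and the names `edit_distance` and `novelty` are assumptions for illustration, not details given in the abstract.

```python
def edit_distance(a, b):
    """Levenshtein distance between two action sequences (any sequences)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def novelty(candidate, archive, k=3):
    """Mean edit distance from candidate to its k nearest neighbours in archive."""
    if not archive:
        return float("inf")  # with an empty archive, any behaviour is novel
    dists = sorted(edit_distance(candidate, other) for other in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)
```

For instance, the action sequence `[0, 0, 0]` compared against an archive of `[0, 0, 0]`, `[0, 0, 1]` and `[1, 1, 1]` with `k=2` has nearest distances 0 and 1, giving a novelty of 0.5. In a genetic algorithm, this score (alone or mixed with reward) would then drive selection.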

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03142

PDF

http://arxiv.org/pdf/1902.03142
Read All
Bounded Fuzzy Possibilistic Method

2019-02-08

Hossein Yazdani

arXiv_AI

arXiv_AI Classification Detection
Abstract

This paper introduces the Bounded Fuzzy Possibilistic Method (BFPM) by addressing several issues that previous clustering/classification methods have not considered. In fuzzy clustering, an object’s membership values should sum to 1. Hence, any object may obtain full membership in at most one cluster. Possibilistic clustering methods remove this restriction. However, BFPM differs from previous fuzzy and possibilistic clustering approaches by allowing the membership function to take larger values with respect to all clusters. Furthermore, in BFPM, a data object can have full membership in multiple clusters or even in all clusters. BFPM relaxes the boundary conditions (restrictions) in membership assignment. The proposed methodology satisfies the necessity of obtaining full memberships and overcomes the issues that conventional methods have in dealing with overlapping clusters. Analysing the objects’ movements from their own cluster to another (mutation) is also proposed in this paper. BFPM has been applied in different domains such as geometry, set theory, anomaly detection, risk management, disease diagnosis, and other disciplines. Validity and comparison indexes have also been used to evaluate the accuracy of BFPM. BFPM has been evaluated in terms of accuracy, fuzzification constant (different norms), objects’ movement analysis, and covering diversity. The promising results show the importance of considering the proposed methodology in learning methods to track the behaviour of data objects, in addition to obtaining accurate results.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03127

PDF

http://arxiv.org/pdf/1902.03127
Read All
Speech enhancement with variational autoencoders and alpha-stable distributions

2019-02-08

Simon Leglaive, Umut Simsekli, Antoine Liutkus, Laurent Girin, Radu Horaud

arXiv_SD

arXiv_SD Knowledge
Abstract

This paper focuses on single-channel semi-supervised speech enhancement. We learn a speaker-independent deep generative speech model using the framework of variational autoencoders. The noise model remains unsupervised because we do not assume prior knowledge of the noisy recording environment. In this context, our contribution is to propose a noise model based on alpha-stable distributions, instead of the more conventional Gaussian non-negative matrix factorization approach found in previous studies. We develop a Monte Carlo expectation-maximization algorithm for estimating the model parameters at test time. Experimental results show the superiority of the proposed approach both in terms of perceptual quality and intelligibility of the enhanced speech signal.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03926

PDF

http://arxiv.org/pdf/1902.03926
Read All
Towards autonomous ocean observing systems using Miniature Underwater Gliders with UAV deployment and recovery capabilities

2019-02-08

Erik Sollesnes, Ole Martin Brokstad, Rolf Klæboe, Bendik Vågen, Alfredo Carella, Alex Alcocer, Artur Piotr Zolich, Tor Arne Johansen

arXiv_RO

arXiv_RO Face Optimization
Abstract

This paper presents preliminary results towards the development of an autonomous ocean observing system using Miniature Underwater Gliders (MUGs) that can operate with the support of Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vessels (USVs) for deployment, recovery, battery charging, and communication relay. The system reduces human intervention to a minimum, revolutionizing the affordability of a broad range of surveillance and data collection operations. The MUGs are equipped with a small Variable Buoyancy System (VBS) composed of a gas-filled piston and a linear actuator powered by a brushless DC motor and a rechargeable lithium-ion battery in an oil-filled flexible enclosure. By using a fully pressure-tolerant electronic design, the aim is to reduce the total complexity, weight, and cost of the overall system. A first prototype of the VBS was built and demonstrated in a small aquarium. The electronic components were tested in a pressure testing facility to a minimum of 20 bar. Preliminary results are promising, and future work will focus on system and weight optimization, UAV deployment/recovery strategies, and sea trials to an operating depth of 200 m.
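A quick worked check relating the two pressure figures in the abstract, using the hydrostatic relation p = rho * g * h with an assumed seawater density of 1025 kg/m^3 and g = 9.81 m/s^2. At 200 m the gauge pressure comes to about 20.1 bar, so the 20 bar test level and the 200 m target depth are of the same magnitude. The constants are textbook values, not figures from the paper.

```python
SEAWATER_DENSITY = 1025.0  # kg/m^3, assumed typical seawater density
G = 9.81                   # m/s^2, standard gravity (rounded)

def gauge_pressure_bar(depth_m, rho=SEAWATER_DENSITY, g=G):
    """Hydrostatic gauge pressure p = rho * g * h, converted from Pa to bar."""
    return rho * g * depth_m / 1e5

for depth in (10, 50, 100, 200):
    print(f"{depth:4d} m -> {gauge_pressure_bar(depth):6.2f} bar (gauge)")
```

Add roughly 1 bar of atmospheric pressure for the absolute value.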

Abstract (translated by Google)

URL

https://arxiv.org/abs/1902.03112

PDF

https://arxiv.org/pdf/1902.03112
Read All
Humor in Word Embeddings: Cockamamie Gobbledegook for Nincompoops

2019-02-08

Limor Gultchin (University of Oxford), Genevieve Patterson (TRASH), Nancy Baym (Microsoft Research), Nathaniel Swinger (Lexington High School), Adam Tauman Kalai (Microsoft Research)

arXiv_CL

arXiv_CL Embedding
Abstract

We study humor in word embeddings, a popular AI tool that associates each word with a Euclidean vector. We find that: (a) the word vectors capture multiple aspects of humor discussed in theories of humor; and (b) each individual’s sense of humor can be represented by a vector, and these sense-of-humor vectors accurately predict differences in people’s sense of humor on new, unrated words. The fact that single-word humor seems to be relatively easy for AI has implications for the study of humor in language. Humor ratings are taken from the work of Engelthaler and Hills (2017) as well as our own crowdsourcing study of 120,000 words.
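One simple way to read "each individual's sense of humor can be represented by a vector" is a per-person linear model on top of fixed word vectors. A minimal sketch, assuming ridge regression and synthetic random vectors in place of real embeddings or crowdsourced ratings; the function names are hypothetical and the paper's actual model may differ.

```python
import numpy as np

def fit_humor_vector(embeddings, ratings, lam=1.0):
    """Ridge regression: w = (E^T E + lam * I)^-1 E^T r."""
    d = embeddings.shape[1]
    a = embeddings.T @ embeddings + lam * np.eye(d)
    return np.linalg.solve(a, embeddings.T @ ratings)

def predict_ratings(embeddings, w):
    """Predicted humor rating of each word is its dot product with w."""
    return embeddings @ w

# Synthetic stand-in data: random "embeddings" and one simulated rater.
rng = np.random.default_rng(0)
dim = 20
train_vecs = rng.normal(size=(500, dim))
true_w = rng.normal(size=dim)
train_ratings = train_vecs @ true_w + 0.1 * rng.normal(size=500)

w = fit_humor_vector(train_vecs, train_ratings)
new_vecs = rng.normal(size=(100, dim))  # stand-ins for new, unrated words
corr = np.corrcoef(predict_ratings(new_vecs, w), new_vecs @ true_w)[0, 1]
print(corr)
```

Fitting one such vector per rater and comparing predictions on held-out words is the kind of test the abstract describes; with real data the correlation would of course be far lower than on this noise-free toy.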

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.02783

PDF

http://arxiv.org/pdf/1902.02783
Read All
FocusNet: An attention-based Fully Convolutional Network for Medical Image Segmentation

2019-02-08

Chaitanya Kaul, Suresh Manandhar, Nick Pears

arXiv_CV

arXiv_CV Segmentation Attention CNN
Abstract

We propose a novel technique to incorporate attention within convolutional neural networks using feature maps generated by a separate convolutional autoencoder. Our attention architecture is well suited to incorporation into deep convolutional networks. We evaluate our model on benchmark datasets for skin cancer segmentation and lung lesion segmentation. Results show highly competitive performance when compared with U-Net and its residual variant.
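The abstract does not give equations, so here is a minimal sketch of one common way to turn an auxiliary feature map into attention: project the autoencoder's maps onto the feature channels with a 1x1 convolution, squash with a sigmoid, and multiply element-wise. The sigmoid gating, the 1x1 projection, and all names are assumptions for illustration, not FocusNet's documented design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_attention(features, ae_maps, proj):
    """Gate segmentation features with attention derived from autoencoder maps.

    features: (C, H, W) feature map from the segmentation network
    ae_maps:  (C_a, H, W) feature map from a separately trained autoencoder
    proj:     (C, C_a) weights of a hypothetical 1x1 convolution
    """
    logits = np.einsum("oc,chw->ohw", proj, ae_maps)  # 1x1 conv = per-pixel linear map
    gate = sigmoid(logits)                              # attention weights in (0, 1)
    return features * gate                              # element-wise re-weighting

rng = np.random.default_rng(0)
feats = rng.random((8, 16, 16)) + 0.1   # positive toy features
ae = rng.normal(size=(4, 16, 16))       # toy autoencoder activations
proj = 0.1 * rng.normal(size=(8, 4))
out = autoencoder_attention(feats, ae, proj)
print(out.shape)
```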

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03091

PDF

http://arxiv.org/pdf/1902.03091
Read All
Addressing Overfitting on Pointcloud Classification using Atrous XCRF

2019-02-08

Hasan Asyari Arief, Ulf Geir Indahl, Geir-Harald Strand, Håvard Tveite

arXiv_CV

arXiv_CV Classification
Abstract

Advances in techniques for automated classification of pointcloud data introduce great opportunities for many new and existing applications. However, with a limited number of labeled points, automated classification by a machine learning model is prone to overfitting and poor generalization. The present paper addresses this problem by inducing controlled noise (on a trained model) generated by invoking conditional random field similarity penalties using nearby features. The method is called Atrous XCRF and works by forcing a trained model to respect the similarity penalties provided by unlabeled data. In a benchmark study carried out using the ISPRS 3D labeling dataset, our technique achieves 84.97% in terms of overall accuracy and 71.05% in terms of F1 score. The result is on par with the current best model for the benchmark dataset and has the highest value in terms of F1 score.
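A minimal sketch of a CRF-style similarity penalty computed on unlabeled points, under stated assumptions: "atrous" is read here as a dilated neighborhood (every dilation-th of the nearest neighbors), pairwise weights are a Gaussian kernel on feature distance, and disagreement between two points is one minus the dot product of their predicted class probabilities. The function names and parameters are hypothetical and this is not the authors' Atrous XCRF. The penalty is zero when neighbors share the same confident label and grows when similar points receive different labels.

```python
import numpy as np

def dilated_knn(points, k, dilation):
    """Indices of k neighbors: every dilation-th of the k*dilation nearest (self excluded).

    Brute-force O(n^2) distances, fine for small clouds only.
    """
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    order = np.argsort(d2, axis=1)[:, 1:k * dilation + 1]  # column 0 is the point itself
    return order[:, ::dilation]

def similarity_penalty(features, probs, nbrs, theta=1.0):
    """Sum of kernel-weighted disagreements between each point and its neighbors."""
    fi = features[:, None, :]                    # (n, 1, f)
    fj = features[nbrs]                          # (n, k, f)
    w = np.exp(-((fi - fj) ** 2).sum(-1) / (2 * theta ** 2))
    agree = (probs[:, None, :] * probs[nbrs]).sum(-1)  # dot product of class probabilities
    return float((w * (1.0 - agree)).sum())

rng = np.random.default_rng(0)
pts = rng.random((50, 3))          # toy xyz coordinates
feats = rng.random((50, 4))        # toy per-point features
nbrs = dilated_knn(pts, k=4, dilation=2)
same = np.tile([1.0, 0.0], (50, 1))              # everyone predicts class 0
mixed = np.eye(2)[rng.integers(0, 2, 50)]        # random one-hot predictions
print(similarity_penalty(feats, same, nbrs), similarity_penalty(feats, mixed, nbrs))
```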

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03088

PDF

http://arxiv.org/pdf/1902.03088
Read All
Skeleton-Based Online Action Prediction Using Scale Selection Network

2019-02-08

Jun Liu, Amir Shahroudy, Gang Wang, Ling-Yu Duan, Alex C. Kot

arXiv_CV

arXiv_CV CNN Prediction
Abstract

Action prediction aims to recognize the class label of an ongoing activity when only part of it has been observed. In this paper, we focus on online action prediction in streaming 3D skeleton sequences. A dilated convolutional network is introduced to model the motion dynamics in the temporal dimension via a sliding window over the temporal axis. Since there are significant temporal scale variations in the observed part of the ongoing action at different time steps, a novel window scale selection method is proposed to make our network focus on the performed part of the ongoing action and to suppress possible interference from previous actions at each step. An activation sharing scheme is also proposed to handle the overlapping computations among adjacent time steps, which enables our framework to run more efficiently. Moreover, to enhance the performance of our framework for action prediction with skeletal input data, a hierarchy of dilated tree convolutions is also designed to learn multi-level structured semantic representations over the skeleton joints at each frame. Our proposed approach is evaluated on four challenging datasets. The extensive experiments demonstrate the effectiveness of our method for skeleton-based online action prediction.
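The sliding-window dilated convolution, and the idea of not recomputing overlapping work between adjacent time steps, can be illustrated with a causal dilated 1D convolution over per-frame features. A minimal sketch, assuming each skeleton frame has already been turned into a fixed-length feature vector: the streaming version keeps only the last (K - 1) * dilation + 1 frames and computes one new output per incoming frame, and it matches the full-sequence computation. This is a generic caching scheme in the spirit of activation sharing, not the authors' implementation; window scale selection and the tree convolutions are omitted.

```python
import numpy as np
from collections import deque

def dilated_causal_conv(x, w, dilation):
    """Offline reference. x: (T, C_in), w: (K, C_in, C_out).

    y[t] = sum_k x[t - k*dilation] @ w[k], treating frames before t=0 as zero.
    """
    T, K = x.shape[0], w.shape[0]
    y = np.zeros((T, w.shape[2]))
    for t in range(T):
        for k in range(K):
            s = t - k * dilation
            if s >= 0:
                y[t] += x[s] @ w[k]
    return y

class StreamingDilatedConv:
    """Online version: caches just enough past frames to emit one output per step."""

    def __init__(self, w, dilation):
        self.w = w
        self.dilation = dilation
        self.buf = deque(maxlen=(w.shape[0] - 1) * dilation + 1)

    def step(self, frame):
        self.buf.append(frame)  # buf[-1] is the current frame
        out = np.zeros(self.w.shape[2])
        for k in range(self.w.shape[0]):
            idx = len(self.buf) - 1 - k * self.dilation
            if idx >= 0:
                out += self.buf[idx] @ self.w[k]
        return out

rng = np.random.default_rng(0)
frames = rng.normal(size=(12, 3))      # 12 frames, 3 features each
weights = rng.normal(size=(3, 3, 2))   # kernel size 3, 3 -> 2 channels
offline = dilated_causal_conv(frames, weights, dilation=2)
layer = StreamingDilatedConv(weights, dilation=2)
online = np.array([layer.step(f) for f in frames])
print(np.allclose(offline, online))
```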

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.03084

PDF

http://arxiv.org/pdf/1902.03084
Read All
Spectral-Spatial Diffusion Geometry for Hyperspectral Image Clustering

2019-02-08

James M. Murphy, Mauro Maggioni

arXiv_CV

arXiv_CV Regularization
Abstract

An unsupervised learning algorithm that exploits spatially regularized random walks is proposed for clustering hyperspectral image (HSI) data. Markov diffusions are defined on the space of HSI spectra with transitions constrained to near spatial neighbors. The explicit incorporation of spatial regularity into the diffusion construction leads to smoother random processes that are better suited to unsupervised machine learning than those based on spectra alone. The regularized diffusion process is subsequently used to embed the high-dimensional HSI into a lower dimensional space through diffusion distances. Cluster modes are computed using density estimation and diffusion distances, and all other points are labeled according to these modes. The proposed method has low computational complexity and performs competitively against state-of-the-art HSI clustering algorithms on real data. In particular, the proposed spatial regularization confers an empirical advantage over non-regularized methods.
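A minimal sketch of the spatially constrained diffusion step described in the abstract: Gaussian affinities between spectra, zeroed outside a spatial radius, row-normalized into a Markov matrix P, followed by the diffusion distance D_t(i, j)^2 = sum_k (P^t[i, k] - P^t[j, k])^2 / pi_k, where pi is the stationary distribution of P. The kernel width sigma, the radius, and t are illustrative assumptions, and the mode detection and labeling stages are omitted.

```python
import numpy as np

def spatial_diffusion_distances(spectra, height, width, sigma=1.0, radius=1.5, t=3):
    """spectra: (height*width, bands), pixels in row-major order. Returns (D, P)."""
    rows, cols = np.divmod(np.arange(height * width), width)
    pos = np.stack([rows, cols], axis=1).astype(float)
    spatial = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)
    spec2 = ((spectra[:, None] - spectra[None, :]) ** 2).sum(-1)
    # Spectral affinity, but only between pixels that are spatial neighbors.
    W = np.exp(-spec2 / sigma ** 2) * (spatial <= radius)
    deg = W.sum(axis=1)
    P = W / deg[:, None]        # random-walk transition matrix
    pi = deg / deg.sum()        # stationary distribution of P (W is symmetric)
    Pt = np.linalg.matrix_power(P, t)
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt((diff ** 2 / pi).sum(-1)), P

# Toy 2x4 image with two bands: left two columns near (0, 0), right two near (5, 5).
spectra = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5],
                    [0, 0.1], [0.1, 0.1], [5, 5.1], [5.1, 5.1]])
D, P = spatial_diffusion_distances(spectra, height=2, width=4)
print(D[0, 1], D[0, 2])  # same-region pair vs. cross-region pair
```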

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.05402

PDF

http://arxiv.org/pdf/1902.05402
Read All