Since it is usually difficult to capture an all-in-focus image of a 3D scene directly, various multi-focus image fusion methods are employed to generate it from several images focused at different depths. However, the performance of existing methods is barely satisfactory and often degrades for areas near the focused/defocused boundary (FDB). In this paper, a boundary-aware method based on deep neural networks is proposed to overcome this problem. (1) To acquire improved fusion images, a 2-channel deep network is proposed to better extract the relative defocus information of the two source images. (2) After analyzing the different situations for patches far away from and near the FDB, we use two networks to handle them respectively. (3) To simulate reality more precisely, a new approach to dataset generation is designed. Experiments demonstrate that the proposed method outperforms the state-of-the-art methods, both qualitatively and quantitatively.
http://arxiv.org/abs/1904.00198
This paper presents a novel approach to exploiting distinctive invariant features in a convolutional neural network. The proposed CNN model uses a Scale Invariant Feature Transform (SIFT) descriptor instead of the max-pooling layer. The max-pooling layer discards the pose, i.e., the translational and rotational relationships between low-level features, and is hence unable to capture the spatial hierarchies between low- and high-level features. The SIFT descriptor layer captures the orientation and the spatial relationships of the features extracted by the convolutional layers. The proposed SIFT Descriptor CNN therefore combines the feature extraction capabilities of the CNN model with the rotation invariance of the SIFT descriptor. Experimental results on the MNIST and fashionMNIST datasets indicate reasonable improvements over conventional methods available in the literature.
http://arxiv.org/abs/1904.00197
In image-based feature descriptor design, an iterative scanning process utilizing the convolution operation is often adopted to extract local information from the image pixels. In this paper, we propose a convolution-free Local Binary Pattern (CF-LBP) descriptor and a convolution-free Histogram of Oriented Gradients (CF-HOG) descriptor in matrix form for mammogram classification. An integrated form of CF-LBP and CF-HOG, CF-LBP-HOG, is subsequently constructed in a single matrix formulation. The proposed descriptors are evaluated using a publicly available mammogram database. The results show promising performance in terms of classification accuracy and computational efficiency.
http://arxiv.org/abs/1904.00187
In recent years, zero-shot recognition (ZSR) has gained increasing attention in the machine learning and image processing fields. It aims at recognizing unseen class instances with knowledge transferred from seen classes. This is typically achieved by exploiting a pre-defined semantic feature space (FS), i.e., semantic attributes or word vectors, as a bridge to transfer knowledge between seen and unseen classes. However, due to the absence of unseen classes during training, conventional ZSR easily suffers from the domain shift and hubness problems. In this paper, we propose a novel ZSR learning framework that handles these two issues well by adaptively adjusting the semantic FS. To the best of our knowledge, our work is the first to consider adaptive adjustment of the semantic FS in ZSR. Moreover, our solution can be formulated as a more efficient framework that significantly speeds up training. Extensive experiments show the remarkable performance improvement of our model compared with existing methods.
http://arxiv.org/abs/1904.00170
Facial images in surveillance or mobile scenarios often have large view-point variations in terms of pitch and yaw angles. These jointly occurring angle variations make face recognition challenging. Current public face databases mainly consider the case of yaw variations. In this paper, a new large-scale Multi-yaw Multi-pitch high-quality database is proposed for Facial Pose Analysis (M2FPA), including face frontalization, face rotation, facial pose estimation and pose-invariant face recognition. It contains 397,544 images of 229 subjects with variations in yaw, pitch, attributes, illumination and accessories. M2FPA is the most comprehensive multi-view face database for facial pose analysis. Further, we provide an effective benchmark for face frontalization and pose-invariant face recognition on M2FPA with several state-of-the-art methods, including DR-GAN, TP-GAN and CAPG-GAN. We believe that the new database and benchmark can significantly push forward the advance of facial pose analysis in real-world applications. Moreover, a simple yet effective parsing-guided discriminator is introduced to capture local consistency during GAN optimization. Extensive quantitative and qualitative results on M2FPA and Multi-PIE demonstrate the superiority of our face frontalization method. Baseline results for both face synthesis and face recognition from state-of-the-art methods demonstrate the challenge offered by this new database.
http://arxiv.org/abs/1904.00168
Generative adversarial networks (GANs) have recently led to highly realistic image synthesis results. In this work, we describe a new method to expose GAN-synthesized images using the locations of facial landmark points. Our method is based on the observation that the configurations of facial parts generated by GAN models differ from those of real faces, due to the lack of global constraints. We perform experiments demonstrating this phenomenon, and show that an SVM classifier trained on the locations of facial landmark points is sufficient to achieve good classification performance for GAN-synthesized faces.
http://arxiv.org/abs/1904.00167
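A minimal sketch of the classification stage described above, assuming landmark extraction has already produced flattened (x, y) coordinates for each face; the random arrays and the RBF kernel choice are placeholders, not the paper's data or settings.

```python
# Hedged sketch: an SVM over facial-landmark locations separating real faces
# from GAN-synthesized ones. Landmark extraction is omitted; random arrays
# stand in for real features and labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_faces, n_landmarks = 200, 68                    # 68 landmarks is a common convention
X = rng.normal(size=(n_faces, n_landmarks * 2))   # flattened (x, y) positions per face
y = rng.integers(0, 2, size=n_faces)              # 1 = GAN-synthesized, 0 = real

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```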
The recently introduced generative adversarial network (GAN) has shown numerous promising results in generating realistic samples. The essential task of a GAN is to control the features of samples generated from a random distribution. While current GAN structures, such as the conditional GAN, successfully generate samples with desired major features, they often fail to produce detailed features that bring specific differences among samples. To overcome this limitation, here we propose a controllable GAN (ControlGAN) structure. By separating a feature classifier from the discriminator, the generator of ControlGAN is designed to learn to generate synthetic samples with specific detailed features. Evaluated on multiple image datasets, ControlGAN shows the ability to generate improved samples with well-controlled features. Furthermore, we demonstrate that ControlGAN can generate intermediate and opposite features for interpolated and extrapolated input labels that are not used in the training process. This implies that ControlGAN can significantly contribute to the variety of generated samples.
http://arxiv.org/abs/1708.00598
In machine translation, context information is an important factor, but a model that takes it into account has not been proposed. This paper proposes a new model that can integrate context information into translation. The model is based on the encoder-decoder architecture: when translating the current sentence, it integrates the output of the encoder for the preceding sentence with that of the current encoder. The model can thus consider context information, and its scores are higher than those of the existing model.
http://arxiv.org/abs/1904.00160
Conventional methods for facial age analysis tend to utilize accurate age labels in a supervised way. However, existing age datasets lie in a limited range of ages, leading to a long-tailed distribution. To alleviate this problem, this paper proposes a Universal Variational Aging (UVA) framework to formulate facial age priors in a disentangling manner. Benefiting from the variational evidence lower bound, the facial images are encoded and disentangled into an age-irrelevant distribution and an age-related distribution in the latent space. A conditional introspective adversarial learning mechanism is introduced to boost image quality. In this way, when manipulating the age-related distribution, UVA can achieve age translation with arbitrary ages. Further, by sampling noise from the age-irrelevant distribution, we can generate photorealistic facial images with a specific age. Moreover, given an input face image, the mean value of the age-related distribution can be treated as an age estimator. These results indicate that UVA can efficiently and accurately estimate the age-related distribution in a disentangling manner, even if the training dataset has a long-tailed age distribution. UVA is the first attempt to achieve facial age analysis tasks, including age translation, age generation and age estimation, in a universal framework. Qualitative and quantitative experiments demonstrate the superiority of UVA on five popular datasets, including CACD2000, Morph, UTKFace, MegaAge-Asian and FG-NET.
http://arxiv.org/abs/1904.00158
In the last decade, deep artificial neural networks have achieved astounding performance in many natural language processing tasks. Given the high productivity of language, these models must possess effective generalization abilities. It is widely assumed that humans handle linguistic productivity by means of algebraic compositional rules: Are deep networks similarly compositional? After reviewing the main innovations characterizing current deep language processing networks, I discuss a set of studies suggesting that deep networks are capable of subtle grammar-dependent generalizations, but also that they do not rely on systematic compositional rules. I argue that the intriguing behaviour of these devices (still awaiting a full understanding) should be of interest to linguists and cognitive scientists, as it offers a new perspective on possible computational strategies to deal with linguistic productivity beyond rule-based compositionality, and it might lead to new insights into the less systematic generalization patterns that also appear in natural language.
http://arxiv.org/abs/1904.00157
We propose a neural network for unsupervised anomaly detection with a novel robust subspace recovery layer (RSR layer). This layer seeks to extract the underlying subspace from a latent representation of the given data and remove outliers that lie away from this subspace. It is used together with an encoder and a decoder. The encoder maps the data into the latent space, from which the RSR layer extracts the subspace. The decoder then smoothly maps back the underlying subspace to a "manifold" close to the original data. We illustrate algorithmic choices and performance for artificial data with corrupted manifold structure. We also demonstrate competitive precision and recall for image datasets.
http://arxiv.org/abs/1904.00152
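One concrete reading of the architecture above, as a PyTorch sketch: a linear RSR projection sits between encoder and decoder, and a penalty pulls latent codes toward the recovered subspace. All dimensions, the loss weight, and the exact penalty form are illustrative assumptions rather than the paper's configuration.

```python
# Hedged sketch of an autoencoder with a linear RSR layer A: reconstruct the
# input from the projected code, and penalize codes that lie far from the
# subspace spanned by A.
import torch
import torch.nn as nn

class RSRAutoencoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=64, subspace_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.A = nn.Linear(latent_dim, subspace_dim, bias=False)  # RSR layer
        self.decoder = nn.Sequential(nn.Linear(subspace_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        z_sub = self.A(z)                  # project onto the candidate subspace
        z_back = z_sub @ self.A.weight     # lift back into the latent space
        return self.decoder(z_sub), z, z_back

model = RSRAutoencoder()
x = torch.randn(32, 784)                   # placeholder batch
x_hat, z, z_back = model(x)
loss = ((x_hat - x) ** 2).mean() + 0.1 * (z - z_back).norm(dim=1).mean()
loss.backward()                            # outliers incur a large subspace penalty
```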
We introduce the problem of learning affective correspondence between audio (music) and visual data (images). For this task, a music clip and an image are considered similar (having true correspondence) if they have similar emotion content. In order to estimate this crossmodal, emotion-centric similarity, we propose a deep neural network architecture that learns to project the data from the two modalities to a common representation space, and performs a binary classification task of predicting the affective correspondence (true or false). To facilitate the current study, we construct a large scale database containing more than $3,500$ music clips and $85,000$ images with three emotion classes (positive, neutral, negative). The proposed approach achieves $61.67\%$ accuracy for the affective correspondence prediction task on this database, outperforming two relevant and competitive baselines. We also demonstrate that our network learns modality-specific representations of emotion (without explicitly being trained with emotion labels), which are useful for emotion recognition in individual modalities.
http://arxiv.org/abs/1904.00150
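A hedged sketch of the two-branch idea above: modality-specific encoders project audio and image features into a common space, and a small head emits a binary correspondence logit. Feature dimensions and layer sizes are invented for illustration.

```python
# Hedged sketch: project two modalities to a shared space and classify
# affective correspondence (true/false). Input features are assumed to come
# from upstream audio/image feature extractors not shown here.
import torch
import torch.nn as nn

class CorrespondenceNet(nn.Module):
    def __init__(self, audio_dim=128, image_dim=512, common_dim=64):
        super().__init__()
        self.audio_proj = nn.Sequential(nn.Linear(audio_dim, common_dim), nn.ReLU())
        self.image_proj = nn.Sequential(nn.Linear(image_dim, common_dim), nn.ReLU())
        self.head = nn.Linear(2 * common_dim, 1)   # binary correspondence logit

    def forward(self, audio_feat, image_feat):
        a = self.audio_proj(audio_feat)
        v = self.image_proj(image_feat)
        return self.head(torch.cat([a, v], dim=-1))

logit = CorrespondenceNet()(torch.randn(8, 128), torch.randn(8, 512))
print(logit.shape)   # (8, 1): one correspondence score per pair
```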
This paper presents a neural relation extraction method to deal with the noisy training data generated by distant supervision. Previous studies mainly focus on sentence-level de-noising by designing neural networks with intra-bag attentions. In this paper, both intra-bag and inter-bag attentions are considered in order to deal with the noise at sentence-level and bag-level respectively. First, relation-aware bag representations are calculated by weighting sentence embeddings using intra-bag attentions. Here, each possible relation is utilized as the query for attention calculation instead of only using the target relation in conventional methods. Furthermore, the representation of a group of bags in the training set which share the same relation label is calculated by weighting bag representations using a similarity-based inter-bag attention module. Finally, a bag group is utilized as a training sample when building our relation extractor. Experimental results on the New York Times dataset demonstrate the effectiveness of our proposed intra-bag and inter-bag attention modules. Our method also achieves better relation extraction accuracy than state-of-the-art methods on this dataset.
http://arxiv.org/abs/1904.00143
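To make the intra-bag step concrete, here is a hedged sketch of attention in which every relation embedding, not just the target relation, serves as a query over a bag's sentence embeddings; the shapes and dot-product scoring are illustrative assumptions.

```python
# Hedged sketch of intra-bag attention with all relations as queries:
# each relation attends over the sentences of one bag, yielding one
# relation-aware bag representation per relation.
import torch

n_sent, n_rel, dim = 5, 4, 16
sent_emb = torch.randn(n_sent, dim)      # sentence embeddings of one bag
rel_emb = torch.randn(n_rel, dim)        # one query vector per relation

scores = rel_emb @ sent_emb.T            # (n_rel, n_sent) attention logits
alpha = torch.softmax(scores, dim=1)     # attention weights per relation
bag_reps = alpha @ sent_emb              # (n_rel, dim) relation-aware bag reps
print(bag_reps.shape)
```

The inter-bag step would then weight several such bag representations sharing a label by their similarity, following the same attention pattern one level up.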
This paper describes the system submitted by the ANA Team for SemEval-2019 Task 3: EmoContext. We propose a novel Hierarchical LSTMs for Contextual Emotion Detection (HRLCE) model. It classifies the emotion of an utterance given its conversational context. The results show that, on this task, our HRLCE outperforms the most recent state-of-the-art text classification framework: BERT. We combine the results generated by BERT and HRLCE to achieve an overall score of 0.7709, which ranked 5th on the final leaderboard of the competition among 165 teams.
http://arxiv.org/abs/1904.00132
This work presents computational methods for transferring body movements from one person to another with videos collected in the wild. Specifically, we train a personalized model on a single video from the Internet which can generate videos of this target person driven by the motions of other people. Our model is built on two generative networks: a human (foreground) synthesis net, which generates photo-realistic imagery of the target person in a novel pose, and a fusion net, which combines the generated foreground with the scene (background), adding shadows or reflections as needed to enhance realism. We validate the efficacy of our proposed models over baselines with qualitative and quantitative evaluations as well as a subjective test.
http://arxiv.org/abs/1904.00129
Despite continuously improving performance, contemporary image captioning models are prone to “hallucinating” objects that are not actually in a scene. One problem is that standard metrics only measure similarity to ground truth captions and may not fully capture image relevance. In this work, we propose a new image relevance metric to evaluate current models with veridical visual labels and assess their rate of object hallucination. We analyze how captioning model architectures and learning objectives contribute to object hallucination, explore when hallucination is likely due to image misclassification or language priors, and assess how well current sentence metrics capture object hallucination. We investigate these questions on the standard image captioning benchmark, MSCOCO, using a diverse set of models. Our analysis yields several interesting findings, including that models which score best on standard sentence metrics do not always have lower hallucination and that models which hallucinate more tend to make errors driven by language priors.
https://arxiv.org/abs/1809.02156
We present an approach to minimally supervised relation extraction that combines the benefits of learned representations and structured learning, and accurately predicts sentence-level relation mentions given only proposition-level supervision from a KB. By explicitly reasoning about missing data during learning, our approach enables large-scale training of 1D convolutional neural networks while mitigating the issue of label noise inherent in distant supervision. Our approach achieves state-of-the-art results on minimally supervised sentential relation extraction, outperforming a number of baselines, including a competitive approach that uses the attention layer of a purely neural model.
http://arxiv.org/abs/1904.00118
Authors’ keyphrases assigned to scientific articles are essential for recognizing content and topic aspects. Most of the proposed supervised and unsupervised methods for keyphrase generation are unable to produce terms that are valuable but do not appear in the text. In this paper, we explore the possibility of considering the keyphrase string as an abstractive summary of the title and the abstract. First, we collect, process and release a large dataset of scientific paper metadata that contains 2.2 million records. Then we experiment with popular text summarization neural architectures. Despite using advanced deep learning models, large quantities of data and many days of computation, our systematic evaluation on four test datasets reveals that the explored text summarization methods could not produce better keyphrases than the simpler unsupervised methods, or the existing supervised ones.
http://arxiv.org/abs/1904.00110
Metaheuristics are general methods that guide the application of concrete heuristic(s) to problems that are too hard to solve using exact algorithms. However, even though a growing body of literature has been devoted to their statistical evaluation, the approaches proposed so far can assess only the coupled effects of metaheuristics and heuristics. They do not reveal anything about how efficient the examined metaheuristic is at guiding its subordinate heuristic(s), nor do they provide information about how much the heuristic component of the combined algorithm contributes to the overall performance. In this paper, we propose a simple yet effective methodology for assessing these contributions: deriving a naive, placebo metaheuristic from the one being studied and comparing the distributions of chosen performance metrics for the two methods. We propose three measures of difference between the two distributions. These measures, which we call BER values (benefit, equivalence, risk), are based on a preselected threshold of practical significance, which represents the minimal difference between two performance scores required for them to be considered practically different. We illustrate the usefulness of our methodology on the example of Simulated Annealing, the Boolean Satisfiability Problem, and the Flip heuristic.
http://arxiv.org/abs/1904.00103
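One natural reading of the BER values, sketched below under explicit assumptions: performance samples are compared pairwise between the studied metaheuristic and its placebo variant, higher scores are better, and each pair is classified by the practical-significance threshold. The exact statistic in the paper may differ.

```python
# Hedged sketch: benefit/equivalence/risk fractions over all cross-pair score
# differences, relative to a practical-significance threshold.
import numpy as np

def ber(meta_scores, placebo_scores, threshold):
    diffs = meta_scores[:, None] - placebo_scores[None, :]   # all pairwise gaps
    benefit = np.mean(diffs > threshold)        # practically better
    risk = np.mean(diffs < -threshold)          # practically worse
    equivalence = 1.0 - benefit - risk          # within the threshold band
    return benefit, equivalence, risk

rng = np.random.default_rng(0)
meta = rng.normal(1.0, 1.0, 500)       # toy score samples, higher = better
placebo = rng.normal(0.8, 1.0, 500)
print(ber(meta, placebo, threshold=0.5))
```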
This paper develops an autonomous tethered aerial visual assistant for robot operations in unstructured or confined environments. Robotic tele-operation in remote environments is difficult due to a lack of sufficient situational awareness, mostly caused by the stationary and limited field of view and the lack of depth perception of the robot's onboard camera. The emerging state of the practice is to use two robots: a primary, and a secondary that acts as a visual assistant to overcome the perceptual limitations of the onboard sensors by providing an external viewpoint. However, problems exist when using a tele-operated visual assistant: extra manpower, manually chosen suboptimal viewpoints, and extra teamwork demands between the primary and secondary operators. In this work, we use an autonomous tethered aerial visual assistant to replace the secondary robot and operator, reducing the human-robot ratio from 2:2 to 1:2. This visual assistant is able to autonomously navigate through unstructured or confined spaces in a risk-aware manner, while continuously maintaining good viewpoint quality to increase the primary operator's situational awareness. With the proposed co-robot team, tele-operation missions in nuclear operations, bomb squads, disaster response, and other domains with novel tasks or highly occluded environments could benefit from reduced manpower and teamwork demand, along with improved visual assistance quality based on trustworthy risk-aware motion in cluttered environments.
http://arxiv.org/abs/1904.00078
Automatic delineation and measurement of main organs such as the liver is one of the critical steps for the assessment of hepatic diseases, planning, and postoperative or treatment follow-up. However, addressing this problem typically requires performing computed tomography (CT) scanning and complicated postprocessing of the resulting scans using slice-by-slice techniques. In this paper, we show that 3D organ shape can be automatically predicted directly from topogram images, which are easier to acquire and involve limited exposure to radiation during acquisition compared to CT scans. We evaluate our approach on the challenging task of predicting liver shape using a generative model. We also demonstrate that our method can be combined with user annotations, such as a 2D mask, for improved prediction accuracy. We show compelling results on 3D liver shape reconstruction and volume estimation on 2129 CT scans.
http://arxiv.org/abs/1904.00073
As 3D scanning solutions become increasingly popular, several deep learning setups have been developed for the task of scan completion, i.e., plausibly filling in regions that were missed in the raw scans. These methods, however, largely rely on supervision in the form of paired training data, i.e., partial scans with corresponding desired completed scans. While these methods have been successfully demonstrated on synthetic data, they cannot be directly used on real scans in the absence of suitable paired training data. We develop a first approach that works directly on input point clouds, does not require paired training data, and hence can be directly applied to real scans for scan completion. We evaluate the approach qualitatively on several real-world datasets (ScanNet, Matterport, KITTI), quantitatively on the 3D-EPN shape completion benchmark dataset, and demonstrate realistic completions under varying levels of incompleteness.
http://arxiv.org/abs/1904.00069
Automatic segmentation of brain Magnetic Resonance Imaging (MRI) images is one of the vital steps for quantitative analysis of the brain for further inspection. In this paper, NeuroNet has been adopted to segment the brain tissues (white matter (WM), grey matter (GM) and cerebrospinal fluid (CSF)); it uses a Residual Network (ResNet) in the encoder and a Fully Convolutional Network (FCN) in the decoder. To achieve the best performance, various hyper-parameters have been tuned, while the network parameters (kernels and biases) were initialized using the NeuroNet pre-trained model. Different pre-processing pipelines have also been introduced to obtain a robust trained model. The model has been trained and tested on the IBSR18 dataset. To validate the research outcome, performance was measured quantitatively using the Dice Similarity Coefficient (DSC) and is reported on average as 0.84 for CSF, 0.94 for GM, and 0.94 for WM. The outcome of the research indicates that, for the IBSR18 dataset, pre-processing and proper tuning of hyper-parameters for the NeuroNet model improve the DSC for brain tissue segmentation.
http://arxiv.org/abs/1904.00068
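For reference, the Dice Similarity Coefficient used for evaluation above is the standard overlap metric DSC = 2|A ∩ B| / (|A| + |B|); the snippet below is a generic NumPy implementation, not code from the paper.

```python
# Dice Similarity Coefficient between two binary segmentation masks.
import numpy as np

def dice(pred, target, eps=1e-8):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
print(dice(a, b))   # 2*2 / (3+3) ≈ 0.667
```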
The attention mechanism has been widely applied to various sound-related tasks. In this work, we propose a Multi-Scale Time-Frequency Attention (MTFA) module for sound event detection. By generating an attention heatmap, MTFA enables the model to focus on discriminative components of the spectrogram along both the time and frequency axes. Besides, gathering information at multiple scales helps the model adapt better to the characteristics of different categories of target events. The proposed method is demonstrated on task 2 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 Challenge. To the best of our knowledge, our method outperforms all previous methods that do not use model ensembles on the development dataset and achieves state-of-the-art performance on the evaluation dataset by reducing the error rate from 0.13 to 0.09. This demonstrates the effectiveness of MTFA at retrieving discriminative representations for sound event detection.
http://arxiv.org/abs/1904.00063
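A hedged PyTorch sketch of the heatmap mechanism described above: a small convolutional branch produces a sigmoid mask over the time-frequency plane that reweights the spectrogram features. The multi-scale aggregation and the actual MTFA architecture are not reproduced here; layer sizes are illustrative.

```python
# Hedged sketch of time-frequency attention: learn a heatmap in [0, 1] over
# the spectrogram and use it to emphasize discriminative bins.
import torch
import torch.nn as nn

class TFAttention(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.heatmap = nn.Sequential(
            nn.Conv2d(channels, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, spec):               # spec: (batch, ch, freq, time)
        return spec * self.heatmap(spec)   # element-wise reweighting

spec = torch.randn(4, 1, 64, 128)          # placeholder log-mel batch
print(TFAttention()(spec).shape)
```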
Identification and localization of sounds are both integral parts of computational auditory scene analysis. Although each can be solved separately, the goal of forming coherent auditory objects and achieving a comprehensive spatial scene understanding suggests pursuing a joint solution of the two problems. This work presents an approach that robustly binds localization with the detection of sound events in a binaural robotic system. Both tasks are joined through the use of spatial stream segregation which produces probabilistic time-frequency masks for individual sources attributable to separate locations, enabling segregated sound event detection operating on these streams. We use simulations of a comprehensive suite of test scenes with multiple co-occurring sound sources, and propose performance measures for systematic investigation of the impact of scene complexity on this segregated detection of sound types. Analyzing the effect of head orientation, we show how a robot can facilitate high performance through optimal head rotation. Furthermore, we investigate the performance of segregated detection given possible localization error as well as error in the estimation of number of active sources. Our analysis demonstrates that the proposed approach is an effective method to obtain joint sound event location and type information under a wide range of conditions.
http://arxiv.org/abs/1904.00055
Deep convolutional neural networks have significantly boosted the capability of salient object detection in handling large variations of scenes and object appearances. However, convolution operations seek to generate strong responses on individual pixels, and lack the ability to maintain the spatial structure of objects. Moreover, down-sampling operations, such as pooling and striding, lose spatial details of the salient objects. In this paper, we propose a simple yet effective Siamese Edge-Enhancement Network (SE2Net) to preserve the edge structure for salient object detection. Specifically, a novel multi-stage siamese network is built to aggregate low-level and high-level features, and to estimate the salient maps of edges and regions in parallel. As a result, the predicted regions become more accurate by enhancing the responses at edges, and the predicted edges become more semantic by suppressing the false positives in the background. After the refined salient maps of edges and regions are produced by SE2Net, an edge-guided inference algorithm is designed to further improve the resulting salient masks along the predicted edges. Extensive experiments on several benchmark datasets show that our method is superior to the state-of-the-art approaches.
http://arxiv.org/abs/1904.00048
The operational space of an autonomous vehicle (AV) can be diverse and vary significantly. This may lead to scenarios that were not postulated in the design phase. Because of this, formulating a rule-based decision maker for selecting maneuvers may not be ideal. Similarly, it may not be effective to design an a priori cost function and then solve the optimal control problem in real time. In order to address these issues and to avoid peculiar behaviors when encountering unforeseen scenarios, we propose a reinforcement learning (RL) based method, where the ego car, i.e., an autonomous vehicle, learns to make decisions by directly interacting with simulated traffic. The decision maker for the AV is implemented as a deep neural network providing an action choice for a given system state. In a critical application such as driving, an RL agent without an explicit notion of safety may not converge, or may need an extremely large number of samples before finding a reliable policy. To address this issue, this paper combines reinforcement learning with an additional short-horizon safety check (SC). In a critical scenario, the safety check also provides an alternate safe action to the agent, if one exists. This leads to two novel contributions. First, it generalizes the states that could lead to undesirable “near-misses” or “collisions”. Second, the inclusion of the safety check provides a safe and stable training environment. This significantly enhances learning efficiency without inhibiting meaningful exploration, ensuring safe and optimal learned behavior. We demonstrate the performance of the developed algorithm in a highway driving scenario where the trained AV encounters varying traffic density.
http://arxiv.org/abs/1904.00035
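The action-override pattern described above can be sketched as follows; the longitudinal dynamics, thresholds, and fallback rule here are toy assumptions made only to show how a short-horizon check can veto and replace an RL agent's proposed action.

```python
# Hedged sketch of a short-horizon safety check: roll the proposed action
# forward through toy dynamics; if a near-miss is predicted, return a safe
# fallback action instead.
def safety_check(state, action, horizon=5):
    gap, rel_speed = state                 # toy state: gap to lead car, relative speed
    for _ in range(horizon):
        gap -= (action - rel_speed)        # toy longitudinal rollout
        if gap < 2.0:                      # predicted near-miss threshold
            return min(action, rel_speed - 1.0)   # fallback: decelerate
    return action

state = (10.0, 1.0)      # placeholder values
proposed = 3.0           # action proposed by the RL policy
print("executed action:", safety_check(state, proposed))
```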
Deep neural networks are known to suffer from catastrophic forgetting in class-incremental learning, where the performance on previous tasks drastically degrades when learning a new task. To alleviate this effect, we propose to leverage a continuous and large stream of unlabeled data in the wild. In particular, to leverage such transient external data effectively, we design a novel class-incremental learning scheme with (a) a new distillation loss, termed global distillation, (b) a learning strategy to avoid overfitting to the most recent task, and (c) a sampling strategy for the desired external data. Our experimental results on various datasets, including CIFAR and ImageNet, demonstrate the superiority of the proposed methods over prior methods, particularly when a stream of unlabeled data is accessible: we achieve up to 9.3% of relative performance improvement compared to the state-of-the-art method.
http://arxiv.org/abs/1903.12648
We consider preoperative prediction of thyroid cancer based on ultra-high-resolution whole-slide cytopathology images. Inspired by how human experts perform diagnosis, our approach first identifies and classifies diagnostic image regions containing informative thyroid cells, which only comprise a tiny fraction of the entire image. These local estimates are then aggregated into a single prediction of thyroid malignancy. Several unique characteristics of thyroid cytopathology guide our deep-learning-based approach. While our method is closely related to multiple-instance learning, it deviates from these methods by using a supervised procedure to extract diagnostically relevant regions. Moreover, we propose to simultaneously predict thyroid malignancy, as well as a diagnostic score assigned by a human expert, which further allows us to devise an improved training strategy. Experimental results show that the proposed algorithm achieves performance comparable to human experts, and demonstrate the potential of using the algorithm for screening and as an assistive tool for the improved diagnosis of indeterminate cases.
http://arxiv.org/abs/1904.00839
Insufficient or even unavailable training data for emerging classes is a big challenge for many classification tasks, including text classification. Recognising text documents of classes that have never been seen in the learning stage, so-called zero-shot text classification, is therefore difficult, and only a few previous works have tackled this problem. In this paper, we propose a two-phase framework together with data augmentation and feature augmentation to solve this problem. Four kinds of semantic knowledge (word embeddings, class descriptions, class hierarchy, and a general knowledge graph) are incorporated into the proposed framework to deal with instances of unseen classes effectively. Experimental results show that each of the two phases, as well as their combination, achieves the best overall accuracy compared with baselines and recent approaches in classifying real-world texts under the zero-shot scenario.
http://arxiv.org/abs/1903.12626
Robotic systems often need to consider multiple tasks concurrently. This challenge calls for control synthesis algorithms that are capable of fulfilling multiple control specifications simultaneously while maintaining the stability of the overall system. In this paper, we decompose complex, multi-objective tasks into subtasks, where individual subtask controllers are designed independently and then combined to generate the overall control policy. In particular, we adopt Riemannian Motion Policies (RMPs), a recently proposed controller structure in robotics, and RMPflow, its associated computational framework for combining RMP controllers. We re-establish and extend the stability results of RMPflow through a rigorous Control Lyapunov Function (CLF) treatment. We then show that RMPflow can stably combine individually designed subtask controllers that satisfy certain CLF constraints. This new insight leads to an efficient CLF-based computational framework to generate stable controllers that consider all the subtasks simultaneously. Compared with the original usage of RMPflow, our framework provides users the flexibility to incorporate design heuristics through nominal controllers for the subtasks. We validate the proposed computational framework through numerical simulation and robotic implementation.
http://arxiv.org/abs/1903.12605
Research on aerial manipulation systems has increased rapidly in recent years. These systems are very attractive for a wide range of applications due to their unique features. However, the dynamics, control and manipulation tasks of such systems are quite challenging because they are naturally unstable, have very fast dynamics and strong nonlinearities, are very susceptible to parameter variations due to carrying a payload as well as external disturbances, and have complex inverse kinematics. In addition, manipulation tasks require estimating (applying) a certain force of (at) the end-effector, as well as its accurate positioning. Thus, in this article, a robust force estimation and impedance control scheme is proposed to address these issues. Robustness is achieved based on the Disturbance Observer (DOb) technique. Then, a linear controller with low computational cost is used for tracking performance. For teleoperation purposes, the contact force needs to be identified. However, the currently developed techniques for force estimation have limitations, because they ignore some dynamics and/or require an indicator of environment contact. Unlike these techniques, we propose a technique based on the linearization capabilities of DOb and a Fast Tracking Recursive Least Squares (FTRLS) algorithm. The complex inverse kinematics problem of such a system is solved by a Jacobian-based algorithm. The stability analysis of the proposed scheme is presented. The algorithm is tested on tracking of task-space reference trajectories along with impedance control. The efficiency of the proposed technique is demonstrated via numerical simulation.
http://arxiv.org/abs/1904.00008
Descriptive comments play a crucial role in the software engineering process. They decrease development time, enable better bug detection, and facilitate the reuse of previously written code. However, comments are commonly the last of a software developer’s priorities and are thus either insufficient or missing entirely. Automatic source code summarization may therefore have the ability to significantly improve the software development process. We introduce a novel encoder-decoder model that summarizes source code, effectively writing a comment to describe the code’s functionality. We make two primary innovations beyond current source code summarization models. First, our encoder is fully language-agnostic and requires no complex input preprocessing. Second, our decoder has an open vocabulary, enabling it to predict any word, even ones not seen in training. We demonstrate results comparable to state-of-the-art methods on a single-language data set and provide the first results on a data set consisting of multiple programming languages.
http://arxiv.org/abs/1904.00805
Implementing color constancy as a pre-processing step in contemporary digital cameras is of significant importance, as it removes the influence of scene illumination on object colors. Several benchmark color constancy datasets have been created for the purpose of developing and testing new color constancy methods. However, they all have numerous drawbacks, including a small number of images, erroneously extracted ground-truth illuminations, long histories of misuse, violations of their stated assumptions, etc. To overcome these and similar problems, in this paper a color constancy benchmark dataset generator is proposed. For a given camera sensor, it enables the generation of any number of realistic raw images taken in a subset of the real world, namely images of printed photographs. Datasets with such images share many positive features with other existing real-world datasets, while some of the negative features are completely eliminated. The generated images can be successfully used to train methods that afterward achieve high accuracy on real-world datasets. This opens the way for creating datasets large enough for advanced deep learning techniques. Experimental results are presented and discussed. The source code is available at this http URL
http://arxiv.org/abs/1903.12581
Deep learning methods capable of handling relational data have proliferated over the last years. In contrast to traditional relational learning methods that leverage first-order logic for representing such data, these deep learning methods aim at re-representing symbolic relational data in Euclidean spaces. They offer better scalability, but can only numerically approximate relational structures and are less flexible in terms of reasoning tasks supported. This paper introduces a novel framework for relational representation learning that combines the best of both worlds. This framework, inspired by the auto-encoding principle, uses first-order logic as a data representation language, and the mapping between the original and latent representation is done by means of logic programs instead of neural networks. We show how learning can be cast as a constraint optimisation problem for which existing solvers can be used. The use of logic as a representation language makes the proposed framework more accurate (as the representation is exact, rather than approximate), more flexible, and more interpretable than deep learning methods. We experimentally show that these latent representations are indeed beneficial in relational learning tasks.
http://arxiv.org/abs/1903.12577
Prostate cancer is the most common cancer among US men. However, prostate imaging is still challenging despite the advances in multi-parametric Magnetic Resonance Imaging (MRI), which provides both morphologic and functional information pertaining to the pathological regions. Along with whole prostate gland segmentation, distinguishing between the Central Gland (CG) and Peripheral Zone (PZ) can guide towards differential diagnosis, since the frequency and severity of tumors differ in these regions; however, their boundary is often weak and fuzzy. This work presents a preliminary study on Deep Learning to automatically delineate the CG and PZ, aiming at evaluating the generalization ability of Convolutional Neural Networks (CNNs) on two multi-centric MRI prostate datasets. Especially, we compared three CNN-based architectures: SegNet, U-Net, and pix2pix. In such a context, the segmentation performances achieved with/without pre-training were compared in 4-fold cross-validation. In general, U-Net outperforms the other methods, especially when training and testing are performed on multiple datasets.
http://arxiv.org/abs/1903.12571
A femtosecond laser focused inside bulk GaN was used to slice a thin GaN film with an epitaxial device structure from a bulk GaN substrate. The demonstrated laser slicing lift-off process did not require any special release layers in the epitaxial structure. GaN film with a thickness of 5 $\mu$m and an InGaN LED epitaxial device structure was lifted off a GaN substrate and transferred onto a copper substrate. The electroluminescence of the LED chip after the laser slicing lift-off was demonstrated.
https://arxiv.org/abs/1902.06348
Due to the lack of available annotated medical images, accurate computer-assisted diagnosis requires intensive Data Augmentation (DA) techniques, such as geometric/intensity transformations of original images; however, those transformed images intrinsically have a distribution similar to the original ones, leading to limited performance improvement. To fill the gap in the real image distribution, we synthesize brain contrast-enhanced Magnetic Resonance (MR) images (realistic but completely different from the original ones) using Generative Adversarial Networks (GANs). This study exploits Progressive Growing of GANs (PGGANs), a multi-stage generative training method, to generate original-sized 256 x 256 MR images for Convolutional Neural Network-based brain tumor detection, which is challenging via conventional GANs; difficulties arise due to unstable GAN training at high resolution and the variety of tumors in size, location, shape, and contrast. Our preliminary results show that this novel PGGAN-based DA method can achieve promising performance improvement, when combined with classical DA, in tumor detection and also in other medical imaging tasks.
http://arxiv.org/abs/1903.12564
It is well known that deep neural networks (DNNs) are vulnerable to adversarial attacks, which are implemented by adding crafted perturbations onto benign examples. Min-max robust optimization based adversarial training can provide a notion of security against adversarial attacks. However, adversarial robustness requires a significantly larger capacity of the network than that for the natural training with only benign examples. This paper proposes a framework of concurrent adversarial training and weight pruning that enables model compression while still preserving the adversarial robustness and essentially tackles the dilemma of adversarial training. Furthermore, this work studies two hypotheses about weight pruning in the conventional network pruning setting and finds that weight pruning is essential for reducing the network model size in the adversarial setting, i.e., training a small model from scratch even with inherited initialization from the large model cannot achieve both adversarial robustness and model compression.
http://arxiv.org/abs/1903.12561
Accurate computer-assisted diagnosis using Convolutional Neural Networks (CNNs) requires large-scale annotated training data, which entails time-consuming labor from expert physicians; thus, Data Augmentation (DA) using Generative Adversarial Networks (GANs) is essential in Medical Imaging, since GANs can synthesize additional annotated training data to handle small and fragmented collections of medical images from various scanners; those images are realistic but completely different from the original ones, filling the gap in the real image distribution. As a tutorial, this paper introduces background on GAN-based Medical Image Augmentation, along with tricks to achieve high classification/object detection/segmentation performance using it, based on our empirical experience and related work. Moreover, we present our first GAN-based DA work using automatic bounding box annotation, for robust CNN-based brain metastases detection on 256 x 256 MR images; GAN-based DA can boost sensitivity in diagnosis by 10% with a clinically acceptable amount of additional False Positives, even with highly rough and inconsistent bounding boxes.
http://arxiv.org/abs/1904.00838
Time series forecasting is one of the challenging problems for humankind. Traditional forecasting methods using mean regression models have severe shortcomings in reflecting real-world fluctuations. While new probabilistic methods rush to the rescue, they struggle with technical difficulties like quantile crossing or selecting a prior distribution. To meld the strengths of these fields while avoiding their weaknesses, and to push the boundary of the state of the art, we introduce ForGAN - one-step-ahead probabilistic forecasting with generative adversarial networks. ForGAN utilizes the power of the conditional generative adversarial network to learn the data-generating distribution and compute probabilistic forecasts from it. We argue how to evaluate ForGAN in comparison to regression methods. To investigate the probabilistic forecasting of ForGAN, we create a new dataset and demonstrate our method's abilities on it. This dataset will be made publicly available for comparison. Furthermore, we test ForGAN on two publicly available datasets, namely the Mackey-Glass dataset and the Internet traffic dataset (A5M), where the impressive performance of ForGAN demonstrates its high capability in forecasting future values.
http://arxiv.org/abs/1903.12549
In this work we describe a novel motion guided method for targetless self-calibration of a LiDAR and camera and use the re-projection of LiDAR points onto the image reference frame for real-time depth upsampling. The calibration parameters are estimated by optimizing an objective function that penalizes distances between 2D and re-projected 3D motion vectors obtained from time-synchronized image and point cloud sequences. For upsampling, we propose a simple, yet effective and time efficient formulation that minimizes depth gradients subject to an equality constraint involving the LiDAR measurements. We test our algorithms on real data from urban environments and demonstrate that our two methods are effective and suitable to mobile robotics and autonomous vehicle applications imposing real-time requirements.
http://arxiv.org/abs/1803.10681
Topic models, such as LDA, are widely used in Natural Language Processing. Making their output interpretable is an important area of research, with applications to areas such as the enhancement of exploratory search interfaces and the development of interpretable machine learning models. Conventionally, topics are represented by their n most probable words; however, these representations are often difficult for humans to interpret. This paper explores the re-ranking of topic words to generate more interpretable topic representations. A range of approaches are compared and evaluated in two experiments. The first uses crowdworkers to associate topics represented by different word rankings with related documents. The second experiment is an automatic approach based on a document retrieval task applied on multiple domains. Results in both experiments demonstrate that re-ranking words improves topic interpretability and that the most effective re-ranking schemes are those which combine information about the importance of words both within topics and their relative frequency in the entire corpus. In addition, the close correlation between the results of the two evaluation approaches suggests that the automatic method proposed here could be used to evaluate re-ranking methods without the need for human judgements.
http://arxiv.org/abs/1903.12542
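As a concrete example of the most effective family reported above (schemes combining within-topic importance with corpus-wide frequency), here is a hedged sketch that scores words by their topic probability weighted against their overall corpus probability; the actual re-ranking schemes compared in the paper may differ.

```python
# Hedged sketch: re-rank topic words by topic-specific relevance, i.e. favor
# words that are probable in the topic but rare in the corpus overall.
import numpy as np

def rerank(topic_p, corpus_p, vocab, n=3, eps=1e-12):
    score = topic_p * np.log(topic_p / (corpus_p + eps) + eps)
    top = np.argsort(score)[::-1][:n]
    return [vocab[i] for i in top]

vocab = ["model", "data", "gene", "protein", "cell", "learn"]
topic_p = np.array([0.30, 0.25, 0.20, 0.12, 0.08, 0.05])   # p(word | topic)
corpus_p = np.array([0.20, 0.22, 0.02, 0.01, 0.02, 0.10])  # p(word) in corpus
print(rerank(topic_p, corpus_p, vocab))   # surfaces topic-specific words
```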
The Estimation of Distribution Algorithm (EDA) is a new class of population-based search methods in which a probabilistic model of individuals is estimated from the high-quality individuals and used to generate new individuals. In this paper we compute (1) some upper bounds on the number of iterations required for global convergence of an EDA, and (2) the exact number of iterations needed for an EDA to converge to the global optimum.
http://arxiv.org/abs/cs/0601132
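Since the abstract is purely theoretical, a concrete instance may help fix ideas: the sketch below is a standard univariate EDA (UMDA-style) on the OneMax toy problem, with all parameters chosen for illustration rather than taken from the paper.

```python
# UMDA-style EDA on OneMax: fit a per-bit Bernoulli model to the best
# individuals, then sample the next population from that model.
import numpy as np

rng = np.random.default_rng(0)
n_bits, pop_size, n_select, iters = 30, 100, 30, 50
p = np.full(n_bits, 0.5)                   # initial probabilistic model

for _ in range(iters):
    pop = (rng.random((pop_size, n_bits)) < p).astype(int)
    fitness = pop.sum(axis=1)              # OneMax: number of ones
    elite = pop[np.argsort(fitness)[-n_select:]]
    p = elite.mean(axis=0)                 # re-estimate the model from the elite
    p = np.clip(p, 0.02, 0.98)             # keep the model from collapsing early

print("best fitness:", fitness.max(), "of", n_bits)
```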
Gaze redirection is the task of changing the gaze to a desired direction for a given monocular eye patch image. Many applications such as videoconferencing, films and games, and generation of training data for gaze estimation require redirecting the gaze, without distorting the appearance of the area surrounding the eye and while producing photo-realistic images. Existing methods lack the ability to generate perceptually plausible images. In this work, we present a novel method to alleviate this problem by leveraging generative adversarial training to synthesize an eye image conditioned on a target gaze direction. Our method ensures perceptual similarity and consistency of synthesized images to the real images. Furthermore, a gaze estimation loss is used to control the gaze direction accurately. To attain high-quality images, we incorporate perceptual and cycle consistency losses into our architecture. In extensive evaluations we show that the proposed method outperforms state-of-the-art approaches in terms of both image quality and redirection precision. Finally, we show that generated images can bring significant improvement for the gaze estimation task if used to augment real training data.
http://arxiv.org/abs/1903.12530
While deep neural network (DNN) based single image super-resolution (SISR) methods are rapidly gaining popularity, they are mainly designed for the widely-used bicubic degradation, and there remains the fundamental challenge of super-resolving low-resolution (LR) images with arbitrary blur kernels. Meanwhile, plug-and-play image restoration has been recognized for its high flexibility due to its modular structure, which allows easy plug-in of denoiser priors. In this paper, we propose a principled formulation and framework by extending bicubic-degradation-based deep SISR with the help of the plug-and-play framework to handle LR images with arbitrary blur kernels. Specifically, we design a new SISR degradation model so as to take advantage of existing blind deblurring methods for blur kernel estimation. To optimize the energy function induced by the new degradation model, we then derive a plug-and-play algorithm via the variable splitting technique, which allows us to plug in any super-resolver prior, rather than a denoiser prior, as a modular part. Quantitative and qualitative evaluations on synthetic and real LR images demonstrate that the proposed deep plug-and-play super-resolution framework is flexible and effective in dealing with blurry LR images.
http://arxiv.org/abs/1903.12529
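The variable-splitting idea above can be illustrated with a hedged half-quadratic-splitting loop: alternate a data-fit step on the degradation model with a plug-in prior step. Below, a Gaussian smoother stands in for the learned super-resolver prior that the paper actually plugs in, and the blur operator, step size, and penalty weight are toy assumptions.

```python
# Hedged HQS-style plug-and-play sketch: gradient step on the data term plus
# a quadratic coupling to z, where z comes from a swappable prior module.
import numpy as np
from scipy.ndimage import gaussian_filter

def hqs(y, degrade, adjoint, prior_step, mu=0.5, step=0.2, iters=20):
    x = adjoint(y)                            # crude initialization
    for _ in range(iters):
        z = prior_step(x)                     # prior subproblem: plug-in module
        grad = adjoint(degrade(x) - y) + mu * (x - z)
        x = x - step * grad                   # data subproblem: gradient move
    return x

blur = lambda img: gaussian_filter(img, sigma=1.5)   # toy degradation (self-adjoint)
clean = np.zeros((32, 32)); clean[12:20, 12:20] = 1.0
y = blur(clean) + 0.01 * np.random.default_rng(0).normal(size=clean.shape)
x_hat = hqs(y, blur, blur, prior_step=lambda v: gaussian_filter(v, sigma=0.8))
print("mean abs error:", np.abs(x_hat - clean).mean())
```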
We present a training system, which can provably defend significantly larger neural networks than previously possible, including ResNet-34 and DenseNet-100. Our approach is based on differentiable abstract interpretation and introduces two novel concepts: (i) abstract layers for fine-tuning the precision and scalability of the abstraction, (ii) a flexible domain specific language (DSL) for describing training objectives that combine abstract and concrete losses with arbitrary specifications. Our training method is implemented in the DiffAI system.
http://arxiv.org/abs/1903.12519
An effective way to achieve intelligence is to simulate various intelligent behaviors of the human brain. In recent years, bio-inspired learning methods have emerged that differ from the classical mathematical programming principle. From the perspective of brain inspiration, reinforcement learning has gained additional interest in solving decision-making tasks as increasing neuroscientific research demonstrates significant links between reinforcement learning and specific neural substrates. Because of the tremendous research focused on human brains and reinforcement learning, scientists have investigated how robots can autonomously tackle complex tasks, in the form of self-driving agent control, in a human-like way. In this study, we propose an end-to-end architecture using a novel deep Q-network architecture in conjunction with recurrence to resolve this problem in the field of simulated self-driving. The main contribution of this study is that we trained the driving agent using a brain-inspired trial-and-error technique, in line with real-world situations. In addition, there are three innovations in the proposed learning network: raw screen outputs are the only information the driving agent can rely on; a weighted layer enhances the differences within a lengthy episode; and a modified replay mechanism overcomes the problem of sparsity and accelerates learning. The proposed network was trained and tested in a third-party OpenAI Gym environment. After training for several episodes, the resulting driving agent performed advanced behaviors in the given scene. We hope that, in the future, the proposed brain-inspired learning system will inspire practicable self-driving control solutions.
http://arxiv.org/abs/1903.12517
One of the grand challenges of deep learning is the requirement to obtain large labeled training data sets. While synthesized data sets can be used to overcome this challenge, it is important that these data sets close the reality gap, i.e., that a model trained on synthetic image data is able to generalize to real images. Whereas the reality gap can be considered bridged in several application scenarios, training on synthesized images containing reflecting materials requires further research. Since the appearance of objects with reflecting materials is dominated by the surrounding environment, this interaction needs to be considered during training data generation. Therefore, within this paper we examine the effect of reflecting materials in the context of synthetic image generation for training object detectors. We investigate the influence of the rendering approach used for image synthesis, the effect of domain randomization, as well as the amount of training data used. To be able to compare our results to the state of the art, we focus on indoor scenes, as they have been investigated extensively. Within this scenario, bathroom furniture is a natural choice for objects with reflecting materials, for which we report our findings on real and synthetic testing data.
http://arxiv.org/abs/1904.00824
The traditional ground-and-solve approach to Answer Set Programming (ASP) suffers from the grounding bottleneck, which makes large-scale problem instances unsolvable. Lazy grounding is an alternative approach that interleaves grounding with solving and thus uses space more efficiently. The limited view on the search space in lazy grounding poses unique challenges, however, and can have adverse effects on solving performance. In this paper we present a novel characterization of degrees of laziness in grounding for ASP, i.e. of compromises between lazily grounding as little as possible and the traditional full grounding upfront. We investigate how these degrees of laziness compare to each other formally as well as, by means of an experimental analysis using a number of benchmarks, in terms of their effects on solving performance. Our contributions are the introduction of a range of novel lazy grounding strategies, a formal account on their relationships and their correctness, and an investigation of their effects on solving performance. Experiments show that our approach performs significantly better than state-of-the-art lazy grounding in many cases.
http://arxiv.org/abs/1903.12510