Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

On Time-frequency Scattering and Computer Music

2019-05-20

Vincent Lostanlen

arXiv_SD

arXiv_SD Recognition
Abstract

Time-frequency scattering is a mathematical transformation of sound waves. Its core purpose is to mimic the way the human auditory system extracts information from its environment. In the context of improving the artificial intelligence of sounds, it has found successful applications in automatic speech transcription as well as the recognition of urban sounds and musical sounds. In this article, we show that time-frequency scattering can also be useful for applications in contemporary music creation.

URL

http://arxiv.org/abs/1810.04506

PDF

http://arxiv.org/pdf/1810.04506
Enriching Pre-trained Language Model with Entity Information for Relation Classification

2019-05-20

Shanchan Wu, Yifan He

arXiv_CL

arXiv_CL CNN RNN Classification Language_Model Relation
Abstract

Relation classification is an important NLP task to extract relations between entities. The state-of-the-art methods for relation classification are primarily based on Convolutional or Recurrent Neural Networks. Recently, the pre-trained BERT model achieves very successful results in many NLP classification / sequence labeling tasks. Relation classification differs from those tasks in that it relies on information of both the sentence and the two target entities. In this paper, we propose a model that both leverages the pre-trained BERT language model and incorporates information from the target entities to tackle the relation classification task. We locate the target entities and transfer the information through the pre-trained architecture and incorporate the corresponding encoding of the two entities. We achieve significant improvement over the state-of-the-art method on the SemEval-2010 task 8 relational dataset.
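One way to picture "incorporating the corresponding encoding of the two entities" is to pool the hidden vectors over each entity span and concatenate them with a sentence-level vector. The sketch below is only an illustration under that assumption: the function names, the span convention (start inclusive, end exclusive), and the use of position 0 as the sentence vector are hypothetical, and plain lists stand in for the encoder output.

```python
def mean_vector(vectors):
    """Average a non-empty list of equal-length vectors, component by component."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def relation_features(hidden, e1_span, e2_span):
    """Concatenate a sentence vector with the averaged vectors of two entity spans.

    hidden is a list of per-token vectors; each span is (start, end), end exclusive.
    """
    sentence = hidden[0]  # assumption: position 0 holds the sentence-level token
    e1 = mean_vector(hidden[e1_span[0]:e1_span[1]])
    e2 = mean_vector(hidden[e2_span[0]:e2_span[1]])
    return sentence + e1 + e2


# Toy "encoder output": four tokens with two-dimensional hidden vectors.
hidden = [[1.0, 1.0], [2.0, 0.0], [4.0, 2.0], [0.0, 6.0]]
features = relation_features(hidden, (1, 3), (3, 4))
```

The resulting feature vector would then feed a classifier over relation labels; that final layer is omitted here.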

URL

http://arxiv.org/abs/1905.08284

PDF

http://arxiv.org/pdf/1905.08284
Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

2019-05-20

Egor Zakharov, Aliaksandra Shysheya, Egor Burkov, Victor Lempitsky

arXiv_CV

arXiv_CV Adversarial CNN
Abstract

Several recent works have shown how highly realistic human head images can be obtained by training convolutional neural networks to generate them. In order to create a personalized talking head model, these works require training on a large dataset of images of a single person. However, in many practical scenarios, such personalized talking head models need to be learned from a few image views of a person, potentially even a single image. Here, we present a system with such few-shot capability. It performs lengthy meta-learning on a large dataset of videos, and after that is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high capacity generators and discriminators. Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters. We show that such an approach is able to learn highly realistic and personalized talking head models of new people and even portrait paintings.

URL

http://arxiv.org/abs/1905.08233

PDF

http://arxiv.org/pdf/1905.08233
Adversarially robust transfer learning

2019-05-20

Ali Shafahi, Parsa Saadatpanah, Chen Zhu, Amin Ghiasi, Christoph Studer, David Jacobs, Tom Goldstein

arXiv_CV

arXiv_CV Adversarial Transfer_Learning
Abstract

Transfer learning, in which a network is trained on one task and re-purposed on another, is often used to produce neural network classifiers when data is scarce or full-scale training is too costly. When the goal is to produce a model that is not only accurate but also adversarially robust, data scarcity and computational limitations become even more cumbersome. We consider robust transfer learning, in which we transfer not only performance but also robustness from a source model to a target domain. We start by observing that robust networks contain robust feature extractors. By training classifiers on top of these feature extractors, we produce new models that inherit the robustness of their parent networks. We then consider the case of fine-tuning a network by re-training end-to-end in the target domain. When using lifelong learning strategies, this process preserves the robustness of the source network while achieving high accuracy. By using such strategies, it is possible to produce accurate and robust models with little data, and without the cost of adversarial training.

URL

http://arxiv.org/abs/1905.08232

PDF

http://arxiv.org/pdf/1905.08232
Patch-based 3D Human Pose Refinement

2019-05-20

Qingfu Wan, Weichao Qiu, Alan L. Yuille

arXiv_CV

arXiv_CV Segmentation Pose_Estimation Prediction
Abstract

State-of-the-art 3D human pose estimation approaches typically estimate pose from the entire RGB image in a single forward run. In this paper, we develop a post-processing step to refine 3D human pose estimation from body part patches. Using local patches as input has two advantages. First, the fine details around body parts are zoomed in to high resolution for more precise 3D pose prediction. Second, it enables the part appearance to be shared between poses to benefit rare poses. In order to acquire an informative representation of patches, we explore different input modalities and validate the superiority of fusing predicted segmentation with RGB. We show that our method consistently boosts the accuracy of state-of-the-art 3D human pose methods.

URL

http://arxiv.org/abs/1905.08231

PDF

http://arxiv.org/pdf/1905.08231
Vision-based Navigation of Autonomous Vehicle in Roadway Environments with Unexpected Hazards

2019-05-20

Mhafuzul Islam, Mashrur Chowdhury, Hongda Li, Hongxin Hu

arXiv_AI

arXiv_AI Adversarial Object_Detection Segmentation Semantic_Segmentation Detection
Abstract

Vision-based navigation of autonomous vehicles primarily depends on Deep Neural Network (DNN) based systems in which the controller obtains input from sensors/detectors, such as cameras, and produces a vehicle control output, such as a steering wheel angle, to navigate the vehicle safely in a roadway traffic environment. Typically, these DNN-based systems of the autonomous vehicle are trained through supervised learning; however, recent studies show that a trained DNN-based system can be compromised by perturbation or adversarial inputs. Similarly, this perturbation can be introduced into the DNN-based systems of an autonomous vehicle by unexpected roadway hazards, such as debris and roadblocks. In this study, we first introduce a roadway hazardous environment (both intentional and unintentional roadway hazards) that can compromise the DNN-based navigational system of an autonomous vehicle and produce an incorrect steering wheel angle, which can cause crashes resulting in fatality and injury. Then, we develop a DNN-based autonomous vehicle driving system using object detection and semantic segmentation to mitigate the adverse effect of this type of hazardous environment, which helps the autonomous vehicle to navigate safely around such hazards. We find that our developed DNN-based autonomous vehicle driving system, including hazardous object detection and semantic segmentation, improves the navigational ability of an autonomous vehicle to avoid a potential hazard by 21% compared to the traditional DNN-based autonomous vehicle driving system.

URL

http://arxiv.org/abs/1810.03967

PDF

http://arxiv.org/pdf/1810.03967
Accelerated Discovery of Sustainable Building Materials

2019-05-20

Xiou Ge, Richard T. Goodwin, Jeremy R. Gregory, Randolph E. Kirchain, Joana Maria, Lav R. Varshney

arXiv_AI

arXiv_AI Face
Abstract

Concrete is the most widely used engineered material in the world with more than 10 billion tons produced annually. Unfortunately, with that scale comes a significant burden in terms of energy, water, and release of greenhouse gases and other pollutants. As such, there is interest in creating concrete formulas that minimize this environmental burden, while satisfying engineering performance requirements. Recent advances in artificial intelligence have enabled machines to generate highly plausible artifacts, such as images of realistic looking faces. Semi-supervised generative models allow generation of artifacts with specific, desired characteristics. In this work, we use Conditional Variational Autoencoders (CVAE), a type of semi-supervised generative model, to discover concrete formulas with desired properties. Our model is trained using open data from the UCI Machine Learning Repository joined with environmental impact data computed using a web-based tool. We demonstrate CVAEs can design concrete formulas with lower emissions and natural resource usage while meeting design requirements. To ensure fair comparison between extant and generated formulas, we also train regression models to predict the environmental impacts and strength of discovered formulas. With these results, a construction engineer may create a formula that meets structural needs and best addresses local environmental concerns.

URL

http://arxiv.org/abs/1905.08222

PDF

http://arxiv.org/pdf/1905.08222
Aligning Script Events with Narrative Texts

2019-05-20

Simon Ostermann, Michael Roth, Stefan Thater, Manfred Pinkal

arXiv_CL

arXiv_CL Knowledge
Abstract

Script knowledge plays a central role in text understanding and is relevant for a variety of downstream tasks. In this paper, we consider two recent datasets which provide a rich and general representation of script events in terms of paraphrase sets. We introduce the task of mapping event mentions in narrative texts to such script event types, and present a model for this task that exploits rich linguistic representations as well as information on temporal ordering. The results of our experiments demonstrate that this complex task is indeed feasible.

URL

http://arxiv.org/abs/1710.05709

PDF

http://arxiv.org/pdf/1710.05709
Drone Shadow Tracking

2019-05-20

Xiaoyan Zou, Ruofan Zhou, Majed El Helou, Sabine Süsstrunk

arXiv_CV

arXiv_CV Knowledge Face Tracking Drone Detection Relation
Abstract

Aerial videos taken by a drone not too far above the surface may contain the drone’s shadow projected on the scene. This deteriorates the aesthetic quality of videos. With the presence of other shadows, shadow removal cannot be directly applied, and the shadow of the drone must be tracked. Tracking a drone’s shadow in a video is, however, challenging. The varying size, shape, change of orientation and drone altitude pose difficulties. The shadow can also easily disappear over dark areas. However, a shadow has specific properties that can be leveraged, besides its geometric shape. In this paper, we incorporate knowledge of the shadow’s physical properties, in the form of shadow detection masks, into a correlation-based tracking algorithm. We capture a test set of aerial videos taken with different settings and compare our results to those of a state-of-the-art tracking algorithm.

URL

http://arxiv.org/abs/1905.08214

PDF

http://arxiv.org/pdf/1905.08214
Target Conditioned Sampling: Optimizing Data Selection for Multilingual Neural Machine Translation

2019-05-20

Xinyi Wang, Graham Neubig

arXiv_CL

arXiv_CL NMT
Abstract

To improve low-resource Neural Machine Translation (NMT) with multilingual corpora, training on the most related high-resource language only is often more effective than using all data available (Neubig and Hu, 2018). However, it is possible that an intelligent data selection strategy can further improve low-resource NMT with data from other auxiliary languages. In this paper, we seek to construct a sampling distribution over all multilingual data, so that it minimizes the training loss of the low-resource language. Based on this formulation, we propose an efficient algorithm, Target Conditioned Sampling (TCS), which first samples a target sentence, and then conditionally samples its source sentence. Experiments show that TCS brings significant gains of up to 2 BLEU on three of four languages we test, with minimal training overhead.
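The two-step sampling that gives TCS its name can be sketched as follows: draw a target sentence first, then draw one of its source sentences conditioned on that target. This is a toy illustration only. The corpus, the placeholder sentences, the `lang_weight` similarity scores, the uniform choice of target, and the function name `tcs_sample` are all assumptions made for the example, not the distributions used in the paper.

```python
import random

# Hypothetical toy corpus: each target sentence maps to the source sentences
# available for it, keyed by auxiliary language code.
corpus = {
    "target sentence A": {"lang1": "source A in lang1", "lang2": "source A in lang2"},
    "target sentence B": {"lang1": "source B in lang1"},
}

# Hypothetical similarity of each auxiliary language to the low-resource language.
lang_weight = {"lang1": 0.3, "lang2": 0.7}


def tcs_sample(corpus, lang_weight, rng):
    """Sample a target sentence, then a source sentence conditioned on it."""
    # Step 1: sample a target sentence (uniformly here, as an assumption).
    target = rng.choice(sorted(corpus))
    # Step 2: sample a source for that target, weighted by language similarity.
    langs = sorted(corpus[target])
    weights = [lang_weight[lang] for lang in langs]
    lang = rng.choices(langs, weights=weights, k=1)[0]
    return lang, corpus[target][lang], target


rng = random.Random(0)
lang, source, target = tcs_sample(corpus, lang_weight, rng)
```

Because the source is drawn only from the entries stored under the sampled target, every sampled pair is a valid translation pair from the multilingual data.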

URL

http://arxiv.org/abs/1905.08212

PDF

http://arxiv.org/pdf/1905.08212
Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

2019-05-20

Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, Dongmei Zhang

arXiv_CL

arXiv_CL Knowledge
Abstract

We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard.

URL

http://arxiv.org/abs/1905.08205

PDF

http://arxiv.org/pdf/1905.08205
A Bayesian Approach to Robust Reinforcement Learning

2019-05-20

Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor

arXiv_AI

arXiv_AI Adversarial Reinforcement_Learning
Abstract

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE) which encourages safe exploration for adapting the uncertainty set to new observations while preserving robustness. We propose a URBE-based algorithm, DQN-URBE, that scales this method to higher dimensional domains. Our experiments show that the derived URBE-based strategy leads to a better trade-off between less conservative solutions and robustness in the presence of model misspecification. In addition, we show that the DQN-URBE algorithm can adapt significantly faster to changing dynamics online compared to existing robust techniques with fixed uncertainty sets.
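For readers unfamiliar with the worst-case formulation, the sketch below shows a standard robust Bellman backup for a single state-action pair over a finite uncertainty set. It is not the URBE introduced in the paper; the finite set of candidate transition models, the function name `robust_backup`, and the toy numbers are assumptions chosen for illustration.

```python
def robust_backup(rewards, transition_models, values, gamma=0.9):
    """Worst-case one-step value of one state-action pair.

    rewards[i] and values[i] refer to next state i; each entry of
    transition_models is one candidate distribution over next states.
    """
    return min(
        sum(p * (r + gamma * v) for p, r, v in zip(model, rewards, values))
        for model in transition_models
    )


rewards = [1.0, 0.0]            # reward for landing in each next state
values = [0.0, 0.0]             # current value estimates of the next states
uncertainty_set = [
    [1.0, 0.0],                 # optimistic model: always reach state 0
    [0.5, 0.5],                 # pessimistic model: reach state 0 half the time
]
worst_case = robust_backup(rewards, uncertainty_set, values)
```

The policy is evaluated under the least favourable model in the set, which is why a large or poorly chosen uncertainty set leads to overly conservative solutions.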

URL

http://arxiv.org/abs/1905.08188

PDF

http://arxiv.org/pdf/1905.08188
Semi-Supervised Learning by Augmented Distribution Alignment

2019-05-20

Qin Wang, Wen Li, Luc Van Gool

arXiv_CV

arXiv_CV Adversarial
Abstract

In this work, we propose a simple yet effective semi-supervised learning approach called Augmented Distribution Alignment. We reveal that an essential sampling bias exists in semi-supervised learning due to the limited amount of labeled samples, which often leads to a considerable empirical distribution mismatch between labeled data and unlabeled data. To this end, we propose to align the empirical distributions of labeled and unlabeled data to alleviate the bias. On one hand, we adopt an adversarial training strategy to minimize the distribution distance between labeled and unlabeled data as inspired by domain adaptation works. On the other hand, to deal with the small sample size issue of labeled data, we also propose a simple interpolation strategy to generate pseudo training samples. Those two strategies can be easily implemented into existing deep neural networks. We demonstrate the effectiveness of our proposed approach on the benchmark SVHN and CIFAR10 datasets, on which we achieve new state-of-the-art error rates of 3.54% and 10.09%, respectively. Our code will be available at https://github.com/qinenergy/adanet.
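The abstract mentions only "a simple interpolation strategy" for generating pseudo training samples. A mixup-style convex combination of a labeled sample with an unlabeled sample and its predicted label is one plausible reading, sketched below. The function name `interpolate`, the toy vectors, and the fixed mixing weight are assumptions for the example.

```python
def interpolate(x_a, x_b, y_a, y_b, lam):
    """Convex combination of two samples and of their (soft) labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# A labeled sample with a one-hot label.
x_labeled, y_labeled = [1.0, 0.0], [1.0, 0.0]
# An unlabeled sample with a model-predicted soft label (hypothetical values).
x_unlabeled, y_predicted = [0.0, 1.0], [0.2, 0.8]

x_mix, y_mix = interpolate(x_labeled, x_unlabeled, y_labeled, y_predicted, lam=0.5)
```

Since both labels are probability vectors and the weights sum to one, the interpolated label is again a valid probability vector.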

URL

http://arxiv.org/abs/1905.08171

PDF

http://arxiv.org/pdf/1905.08171
DARC: Differentiable ARchitecture Compression

2019-05-20

Shashank Singh, Ashish Khetan, Zohar Karnin

arXiv_CV

arXiv_CV NAS CNN Image_Classification Inference Classification
Abstract

In many learning situations, resources at inference time are significantly more constrained than resources at training time. This paper studies a general paradigm, called Differentiable ARchitecture Compression (DARC), that combines model compression and architecture search to learn models that are resource-efficient at inference time. Given a resource-intensive base architecture, DARC utilizes the training data to learn which sub-components can be replaced by cheaper alternatives. The high-level technique can be applied to any neural architecture, and we report experiments on state-of-the-art convolutional neural networks for image classification. For a WideResNet with 97.2% accuracy on CIFAR-10, we improve single-sample inference speed by 2.28× and memory footprint by 5.64×, with no accuracy loss. For a ResNet with 79.15% Top1 accuracy on ImageNet, we improve batch inference speed by 1.29× and memory footprint by 3.57× with 1% accuracy loss. We also give theoretical Rademacher complexity bounds in simplified cases, showing how DARC avoids overfitting despite over-parameterization.

URL

http://arxiv.org/abs/1905.08170

PDF

http://arxiv.org/pdf/1905.08170
Interpretable Neural Predictions with Differentiable Binary Variables

2019-05-20

Joost Bastings, Wilker Aziz, Ivan Titov

arXiv_CL

arXiv_CL Attention Prediction
Abstract

The success of neural networks comes hand in hand with a desire for more interpretability. We focus on text classifiers and make them more interpretable by having them provide a justification, a rationale, for their predictions. We approach this problem by jointly training two neural network models: a latent model that selects a rationale (i.e. a short and informative part of the input text), and a classifier that learns from the words in the rationale alone. Previous work proposed to assign binary latent masks to input positions and to promote short selections via sparsity-inducing penalties such as L0 regularisation. We propose a latent model that mixes discrete and continuous behaviour allowing at the same time for binary selections and gradient-based training without REINFORCE. In our formulation, we can tractably compute the expected value of penalties such as L0, which allows us to directly optimise the model towards a pre-specified text selection rate. We show that our approach is competitive with previous work on rationale extraction, and explore further uses in attention mechanisms.

URL

http://arxiv.org/abs/1905.08160

PDF

http://arxiv.org/pdf/1905.08160
Self-Supervised Similarity Learning for Digital Pathology

2019-05-20

Jacob Gildenblat, Eldad Klaiman

arXiv_CV

arXiv_CV Image_Retrieval Knowledge Deep_Learning
Abstract

Using features extracted from networks pretrained on ImageNet is a common practice in applications of deep learning for digital pathology. However, it presents the downside of missing domain-specific image information. In digital pathology, supervised training data is expensive and difficult to collect. We propose a self-supervised method for feature extraction by similarity learning on whole slide images (WSI) that is simple to implement and allows creation of robust and compact image descriptors. We train a siamese network, exploiting image spatial continuity and assuming spatially adjacent tiles in the image are more similar to each other than distant tiles. Our network outputs feature vectors of length 128, which allows dramatically lower memory storage and faster processing than networks pretrained on ImageNet. We apply the method on digital pathology whole slide images (WSI) from the Camelyon16 train set and assess and compare our method by measuring image retrieval of tumor tiles and descriptor pair distance ratio for distant/near tiles in the Camelyon16 test set. We show that our method yields better retrieval task results than existing ImageNet-based and generic self-supervised feature extraction methods. To the best of our knowledge, this is also the first published method for self-supervised learning tailored for digital pathology.
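The self-supervision signal described here comes from tile positions alone: nearby tiles are treated as similar, distant tiles as dissimilar. The sketch below shows how such training pairs could be labeled on a tile grid. The grid size, the `near_radius` threshold, the choice of Chebyshev distance, and the function name `sample_tile_pair` are assumptions for illustration; the abstract does not state how the paper defines "adjacent".

```python
import random


def sample_tile_pair(grid_w, grid_h, near_radius, rng):
    """Pick two tile coordinates and label the pair 1 (near) or 0 (distant)."""
    ax, ay = rng.randrange(grid_w), rng.randrange(grid_h)
    bx, by = rng.randrange(grid_w), rng.randrange(grid_h)
    # Chebyshev distance on the tile grid decides the similarity label.
    distance = max(abs(ax - bx), abs(ay - by))
    label = 1 if distance <= near_radius else 0
    return (ax, ay), (bx, by), label


rng = random.Random(0)
pairs = [sample_tile_pair(grid_w=20, grid_h=20, near_radius=1, rng=rng) for _ in range(100)]
```

Each labeled pair would then be fed to the two branches of a siamese network, with the label telling the loss whether the two descriptors should be pulled together or pushed apart.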

URL

http://arxiv.org/abs/1905.08139

PDF

http://arxiv.org/pdf/1905.08139
Zero-Shot Knowledge Distillation in Deep Networks

2019-05-20

Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty

arXiv_CV

arXiv_CV Knowledge
Abstract

Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted from it in order to train the Student. However, accessing the dataset on which the Teacher has been trained may not always be feasible if the dataset is very large or it poses privacy or safety concerns (e.g., bio-metric or medical data). Hence, in this paper, we propose a novel data-free method to train the Student from the Teacher. Without even using any meta-data, we synthesize the Data Impressions from the complex Teacher model and utilize these as surrogates for the original training data samples to transfer its learning to Student via knowledge distillation. We, therefore, dub our method “Zero-Shot Knowledge Distillation” and demonstrate that our framework results in competitive generalization performance as achieved by distillation using the actual training data samples on multiple benchmark datasets.
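As background, the distillation step itself is commonly implemented as a cross-entropy between temperature-softened Teacher and Student outputs. The sketch below shows that standard loss only; it does not cover how the Data Impressions are synthesized, and the function names and temperature value are assumptions for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened Teacher and Student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


matching = distillation_loss([2.0, 0.0], [2.0, 0.0])
mismatched = distillation_loss([2.0, 0.0], [0.0, 2.0])
```

In the data-free setting of the paper, the inputs that produce these logits would be the synthesized Data Impressions rather than real training samples.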

URL

http://arxiv.org/abs/1905.08114

PDF

http://arxiv.org/pdf/1905.08114
Image Captioning based on Deep Learning Methods: A Survey

2019-05-20

Yiyu Wang, Jungang Xu, Yingfei Sun, Ben He

arXiv_CV

arXiv_CV Image_Caption Image_Retrieval Attention Caption Survey Deep_Learning
Abstract

Image captioning is a challenging task that is attracting more and more attention in the field of Artificial Intelligence, and it can be applied to efficient image retrieval, intelligent blind guidance, human-computer interaction, etc. In this paper, we present a survey on advances in image captioning based on Deep Learning methods, including the Encoder-Decoder structure, improved methods in the Encoder, improved methods in the Decoder, and other improvements. Furthermore, we discuss future research directions.

URL

http://arxiv.org/abs/1905.08110

PDF

http://arxiv.org/pdf/1905.08110
Decoding the Rejuvenating Effects of Mechanical Loading on Skeletal Maturation using in Vivo Imaging and Deep Learning

2019-05-20

Pouyan Asgharzadeh, Oliver Röhrle, Bettina M. Willie, Annette I. Birkhold

arXiv_AI

arXiv_AI Deep_Learning Prediction Quantitative
Abstract

Throughout the process of aging, deterioration of bone macro- and micro-architecture, as well as material decomposition, results in a loss of strength and therefore in an increased likelihood of fractures. To date, the precise contributions of age-related changes in bone (re)modeling and (de)mineralization dynamics and their effect on the loss of functional integrity are not completely understood. Here, we present an image-based deep learning approach to quantitatively describe the dynamic effects of short-term aging and adaptive response to treatment in proximal mouse tibia and fibula. Our approach allowed us to perform an end-to-end age prediction based on μCT images to determine the dynamic biological process of tissue maturation during a two-week period, therefore permitting a short-term bone aging prediction with 95% accuracy. In a second application, our radiomics analysis reveals that two weeks of in vivo mechanical loading are associated with an underlying rejuvenating effect of 5 days. Additionally, by quantitatively analyzing the learning process, we could, for the first time, identify the localization of the age-relevant encoded information and demonstrate 89% load-induced similarity of these locations in the loaded tibia with younger bones. These data suggest that our method enables identifying a general prognostic phenotype of a certain bone age as well as a temporal and localized loading-treatment effect on this apparent bone age. Future translational applications of this method may provide an improved decision-support method for osteoporosis treatment at low cost.

URL

http://arxiv.org/abs/1905.08099

PDF

http://arxiv.org/pdf/1905.08099
Optimizing the Latent Space of Generative Networks

2019-05-20

Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

arXiv_CV

arXiv_CV Adversarial GAN CNN Optimization
Abstract

Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between a generator and a discriminator function; and parameterizing the generator and the discriminator as deep convolutional neural networks. The goal of this paper is to disentangle the contribution of these two factors to the success of GANs. In particular, we introduce Generative Latent Optimization (GLO), a framework to train deep convolutional generators using simple reconstruction losses. Throughout a variety of experiments, we show that GLO enjoys many of the desirable properties of GANs: synthesizing visually-appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors; all of this without the adversarial optimization scheme.

URL

http://arxiv.org/abs/1707.05776

PDF

http://arxiv.org/pdf/1707.05776
Activity Recognition and Prediction in Real Homes

2019-05-20

Flavia Dias Casagrande, Evi Zouganeli

arXiv_CV

arXiv_CV CNN Transfer_Learning RNN Classification Prediction Recognition
Abstract

In this paper, we present work in progress on activity recognition and prediction in real homes using either binary sensor data or depth video data. We present our field trial and set-up for collecting and storing the data, our methods, and our current results. We compare the accuracy of predicting the next binary sensor event using probabilistic methods and Long Short-Term Memory (LSTM) networks, include the time information to improve prediction accuracy, as well as predict both the next sensor event and its mean time of occurrence using one LSTM model. We investigate transfer learning between apartments and show that it is possible to pre-train the model with data from other apartments and achieve good accuracy in a new apartment straight away. In addition, we present preliminary results from activity recognition using low-resolution depth video data from seven apartments, and classify four activities - no movement, standing up, sitting down, and TV interaction - by using a relatively simple processing method where we apply an Infinite Impulse Response (IIR) filter to extract movements from the frames prior to feeding them to a convolutional LSTM network for the classification.
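To illustrate how an IIR filter can extract movement from a frame sequence, the sketch below keeps a first-order IIR (exponential moving average) estimate of the background and reports each new frame's absolute deviation from it. The filter order, the coefficient `alpha`, the flat-list frame representation, and the function name `iir_movement` are assumptions; the abstract does not give the filter design used in the study.

```python
def iir_movement(frames, alpha=0.5):
    """Return per-pixel movement maps using a first-order IIR background estimate.

    frames is a list of equally sized flat pixel lists; the first frame
    initializes the background and produces no movement map.
    """
    background = list(frames[0])
    movements = []
    for frame in frames[1:]:
        # Movement is the deviation of the new frame from the running background.
        movements.append([abs(p - b) for p, b in zip(frame, background)])
        # IIR update: background[t] = alpha * frame[t] + (1 - alpha) * background[t - 1]
        background = [alpha * p + (1.0 - alpha) * b for p, b in zip(frame, background)]
    return movements


# Two-pixel toy frames: the second pixel changes once and then stays put.
frames = [[0.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
movement = iir_movement(frames)
```

The response to a change decays over the following frames as the background catches up, so a static scene fades out of the movement maps that would be passed on to the classifier.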

URL

http://arxiv.org/abs/1905.08654

PDF

http://arxiv.org/pdf/1905.08654
Catastrophic forgetting: still a problem for DNNs

2019-05-20

B. Pfülb, A. Gepperth, S. Abdullah, A. Kilian

arXiv_CV

arXiv_CV
Abstract

We investigate the performance of DNNs when trained on class-incremental visual problems consisting of initial training, followed by retraining with added visual classes. Catastrophic forgetting (CF) behavior is measured using a new evaluation procedure that aims at an application-oriented view of incremental learning. In particular, it imposes that model selection must be performed on the initial dataset alone, as well as demanding that retraining control be performed only using the retraining dataset, as the initial dataset is usually too large to be kept. Experiments are conducted on class-incremental problems derived from MNIST, using a variety of different DNN models, some of them recently proposed to avoid catastrophic forgetting. When comparing our new evaluation procedure to previous approaches for assessing CF, we find their findings are completely negated, and that none of the tested methods can avoid CF in all experiments. This stresses the importance of a realistic empirical measurement procedure for catastrophic forgetting, and the need for further research in incremental learning for DNNs.

URL

http://arxiv.org/abs/1905.08077

PDF

http://arxiv.org/pdf/1905.08077
The Twin-System Approach as One Generic Solution for XAI: An Overview of ANN-CBR Twins for Explaining Deep Learning

2019-05-20

Mark T. Keane, Eoin M. Kenny

arXiv_AI

arXiv_AI Deep_Learning
Abstract

The notion of twin systems is proposed to address the eXplainable AI (XAI) problem, where an uninterpretable black-box system is mapped to a white-box ‘twin’ that is more interpretable. In this short paper, we overview very recent work that advances a generic solution to the XAI problem, the so-called twin system approach. The most popular twinning in the literature is that between an Artificial Neural Network (ANN) as a black box and a Case-Based Reasoning (CBR) system as a white box, where the latter acts as an interpretable proxy for the former. We outline how recent work reviving this idea has applied it to deep learning methods. Furthermore, we detail the many fruitful directions in which this work may be taken, such as determining the most (i) accurate feature-weighting methods to be used, (ii) appropriate deployments for explanatory cases, (iii) useful cases of explanatory value to users.

URL

http://arxiv.org/abs/1905.08069

PDF

http://arxiv.org/pdf/1905.08069
Less Memory, Faster Speed: Refining Self-Attention Module for Image Reconstruction

2019-05-20

Zheng Wang, Jianwu Li, Ge Song, Tieling Li

arXiv_CV

arXiv_CV Adversarial Attention GAN
Abstract

Self-attention (SA) mechanisms can effectively capture global dependencies in deep neural networks, and have been applied to natural language processing and image processing successfully. However, SA modules for image reconstruction have high time and space complexity, which restricts their application to higher-resolution images. In this paper, we refine the SA module in self-attention generative adversarial networks (SAGAN) via adapting a non-local operation, revising the connectivity among the units in the SA module and re-implementing its computational pattern, such that its time and space complexity is reduced from O(n^2) to O(n), but it is still equivalent to the original SA module. Further, we explore the principles behind the module and discover that our module is a special kind of channel attention mechanism. Experimental results based on two benchmark datasets of image reconstruction verify that under the same computational environment, the two models can achieve comparable effectiveness for image reconstruction, but the proposed one runs faster and takes up less memory space.
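One well-known way to obtain an exactly equivalent module at lower cost is to reorder the matrix products: without a softmax between them, (Q Kᵀ) V equals Q (Kᵀ V), and the second form never builds the n × n attention map. Whether this is the precise reformulation used in the paper is an assumption; the sketch below only demonstrates the associativity argument on small integer matrices with hypothetical helper names.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


# n = 3 positions, d = 2 channels (toy sizes; in images n is much larger than d).
q = [[1, 0], [0, 1], [1, 1]]
k = [[1, 2], [3, 4], [5, 6]]
v = [[1, 0], [0, 1], [2, 2]]

# Quadratic order: the intermediate Q K^T is n x n.
quadratic = matmul(matmul(q, transpose(k)), v)
# Linear order: the intermediate K^T V is only d x d.
linear = matmul(q, matmul(transpose(k), v))
```

Both orders give the same n × d output, but the second grows linearly with the number of positions n, and its d × d intermediate mixes channels rather than positions, which is consistent with the abstract's description of the module as a form of channel attention.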

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.08008

PDF

http://arxiv.org/pdf/1905.08008
Read All
Deep Transfer Learning Methods for Colon Cancer Classification in Confocal Laser Microscopy Images

2019-05-20

Nils Gessert, Marcel Bengs, Lukas Wittig, Daniel Drömann, Tobias Keck, Alexander Schlaefer, David B. Ellebrecht

arXiv_CV

arXiv_CV CNN Image_Classification Transfer_Learning Classification Detection
Abstract

Purpose: The gold standard for colorectal cancer metastases detection in the peritoneum is histological evaluation of a removed tissue sample. For feedback during interventions, real-time in-vivo imaging with confocal laser microscopy has been proposed for differentiation of benign and malignant tissue by manual expert evaluation. Automatic image classification could improve the surgical workflow further by providing immediate feedback. Methods: We analyze the feasibility of classifying tissue from confocal laser microscopy in the colon and peritoneum. For this purpose, we adopt both classical and state-of-the-art convolutional neural networks to directly learn from the images. As the available dataset is small, we investigate several transfer learning strategies including partial freezing variants and full fine-tuning. We address the distinction of different tissue types, as well as benign and malignant tissue. Results: We present a thorough analysis of transfer learning strategies for colorectal cancer with confocal laser microscopy. In the peritoneum, metastases are classified with an AUC of 97.1 and in the colon, the primarius is classified with an AUC of 73.1. In general, transfer learning substantially improves performance over training from scratch. We find that the optimal transfer learning strategy differs for models and classification tasks. Conclusions: We demonstrate that convolutional neural networks and transfer learning can be used to identify cancer tissue with confocal laser microscopy. We show that there is no generally optimal transfer learning strategy and that model-specific as well as task-specific engineering is required. Given the high performance for the peritoneum, even with a small dataset, application for intraoperative decision support could be feasible.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07991

PDF

http://arxiv.org/pdf/1905.07991
Read All
Interpreting, axiomatising and representing coherent choice functions in terms of desirability

2019-05-20

Jasper De Bock, Gert de Cooman

arXiv_AI

arXiv_AI
Abstract

Choice functions constitute a simple, direct and very general mathematical framework for modelling choice under uncertainty. In particular, they are able to represent the set-valued choices that appear in imprecise-probabilistic decision making. We provide these choice functions with a clear interpretation in terms of desirability, use this interpretation to derive a set of basic coherence axioms, and show that this notion of coherence leads to a representation in terms of sets of strict preference orders. By imposing additional properties such as totality, the mixing property and Archimedeanity, we obtain representations in terms of sets of strict total orders, lexicographic probability systems, coherent lower previsions or linear previsions.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.00336

PDF

http://arxiv.org/pdf/1903.00336
Read All
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory

2019-05-20

Ron Amit, Ron Meir

arXiv_AI

arXiv_AI Knowledge
Abstract

In meta-learning an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Under the assumption that future tasks are ‘related’ to previous tasks, the accumulated knowledge should be learned in a way which captures the common structure across learned tasks, while allowing the learner sufficient flexibility to adapt to novel aspects of new tasks. We present a framework for meta-learning that is based on generalization error bounds, allowing us to extend various PAC-Bayes bounds to meta-learning. Learning takes place through the construction of a distribution over hypotheses based on the observed tasks, and its utilization for learning a new task. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks. We develop a gradient-based algorithm which minimizes an objective function derived from the bounds and demonstrate its effectiveness numerically with deep neural networks. In addition to establishing the improved performance available through meta-learning, we demonstrate the intuitive way by which prior information is manifested at different levels of the network.
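
To make the role of the prior concrete, here is a minimal sketch of one classical single-task PAC-Bayes bound (a McAllester-style form for losses in [0, 1]); the paper's extended meta-learning bounds are not reproduced here. The factorized Gaussian prior and posterior, and all function names, are assumptions chosen for illustration.

```python
import math

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for factorized Gaussians given per-dimension means and std devs."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total

def mcallester_bound(empirical_error, kl, m, delta):
    """Empirical error plus the complexity term sqrt((KL + ln(m / delta)) / (2 (m - 1)))."""
    return empirical_error + math.sqrt((kl + math.log(m / delta)) / (2 * (m - 1)))

# Same posterior and data, two different priors (hypothetical numbers).
posterior_mu, posterior_sigma = [0.0, 1.0], [1.0, 1.0]
kl_near = kl_diag_gaussians(posterior_mu, posterior_sigma, [0.1, 0.9], [1.0, 1.0])
kl_far = kl_diag_gaussians(posterior_mu, posterior_sigma, [3.0, -2.0], [1.0, 1.0])
bound_near = mcallester_bound(0.10, kl_near, m=1000, delta=0.05)
bound_far = mcallester_bound(0.10, kl_far, m=1000, delta=0.05)
```

A prior that already sits close to the posterior yields a smaller KL term and hence a tighter bound, which is the intuition behind learning an experience-dependent prior from earlier tasks.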

Abstract (translated by Google)

URL

http://arxiv.org/abs/1711.01244

PDF

http://arxiv.org/pdf/1711.01244
Read All
Spin Detection in Robotic Table Tennis

2019-05-20

Jonas Tebbe, Lukas Klamt, Andreas Zell

arXiv_CV

arXiv_CV Detection
Abstract

In table tennis the rotation (spin) of the ball plays a crucial role. A table tennis match will feature a variety of strokes. Each generates different amounts and types of spin. To develop a robot which can compete with a human player, the robot needs to be able to detect spin, so that it can plan an appropriate return stroke. In this paper we compare three methods for estimating spin. The first two approaches use a high-speed camera that captures the ball in flight at a frame rate of 380 Hz. This camera allows the movement of the circular brand logo printed on the ball to be seen. The first approach uses background difference to determine the position of the logo. In a second alternative, a CNN is trained to predict the orientation of the logo. The third method evaluates the trajectory of the ball and derives the rotation from the effect of the Magnus force. In a demonstration, our robot must respond to different spin types in a real table tennis rally against a human opponent.
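
The trajectory-based method relies on the Magnus effect, which deflects a spinning ball in the direction of the cross product of its angular velocity and its velocity. A minimal sketch, assuming a simplified model a = k (omega x v) with a hypothetical lumped coefficient `k_magnus` (the abstract gives no model details or constants):

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def magnus_acceleration(omega, velocity, k_magnus):
    """Acceleration from the simplified Magnus model a = k * (omega x v)."""
    return tuple(k_magnus * c for c in cross(omega, velocity))

# Ball flying along +x with z pointing up; topspin is a rotation about +y.
topspin = magnus_acceleration((0.0, 100.0, 0.0), (10.0, 0.0, 0.0), k_magnus=0.001)
backspin = magnus_acceleration((0.0, -100.0, 0.0), (10.0, 0.0, 0.0), k_magnus=0.001)
```

In this toy setup topspin pushes the ball downward and backspin pushes it upward, so fitting the observed deviation from a spin-free trajectory gives an estimate of the rotation.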

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07967

PDF

http://arxiv.org/pdf/1905.07967
Read All
Learning to Explain with Complemental Examples

2019-05-20

Atsushi Kanehira, Tatsuya Harada

arXiv_CV

arXiv_CV
Abstract

This paper addresses the generation of explanations with visual examples. Given an input sample, we build a system that not only classifies it into a specific category, but also outputs linguistic explanations and a set of visual examples that render the decision interpretable. Focusing especially on the complementarity of the multimodal information, i.e., linguistic and visual examples, we attempt to achieve it by maximizing the interaction information, which provides a natural definition of complementarity from an information-theoretical viewpoint. We propose a novel framework to generate complemental explanations, in which the joint distribution of the variables to explain and those to be explained is parameterized by three different neural networks: predictor, linguistic explainer, and example selector. Explanation models are trained collaboratively to maximize the interaction information to ensure that the generated explanations are complemental to each other for the target. The results of experiments conducted on several datasets demonstrate the effectiveness of the proposed method.
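
Interaction information can be illustrated on a small discrete distribution. The sketch below uses the convention I(X;Y|Z) - I(X;Y) (sign conventions differ across the literature, and the paper's exact definition is not given in the abstract); the function names and the XOR example are illustrative assumptions.

```python
import math
from collections import defaultdict

def marginal(joint, keep):
    """Marginalize a joint distribution {outcome tuple: prob} onto the indices in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return out

def mutual_information(joint, a, b):
    """I(A;B) in bits for the variables at indices a and b."""
    pab, pa, pb = marginal(joint, (a, b)), marginal(joint, (a,)), marginal(joint, (b,))
    return sum(
        p * math.log2(p / (pa[(x,)] * pb[(y,)]))
        for (x, y), p in pab.items() if p > 0
    )

def conditional_mutual_information(joint, a, b, c):
    """I(A;B|C) in bits."""
    pabc = marginal(joint, (a, b, c))
    pac, pbc, pc = marginal(joint, (a, c)), marginal(joint, (b, c)), marginal(joint, (c,))
    return sum(
        p * math.log2(p * pc[(z,)] / (pac[(x, z)] * pbc[(y, z)]))
        for (x, y, z), p in pabc.items() if p > 0
    )

def interaction_information(joint):
    """I(X;Y|Z) - I(X;Y) for a joint over (X, Y, Z)."""
    return conditional_mutual_information(joint, 0, 1, 2) - mutual_information(joint, 0, 1)

# XOR: X and Y are fair, independent bits and Z = X xor Y.
xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
xor_interaction = interaction_information(xor_joint)
```

In the XOR case X and Y share no information on their own but become fully dependent once Z is known, giving one bit of interaction; this is the sense in which two sources can be complementary with respect to a target.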

Abstract (translated by Google)

URL

http://arxiv.org/abs/1812.01280

PDF

http://arxiv.org/pdf/1812.01280
Read All
Guiding Theorem Proving by Recurrent Neural Networks

2019-05-20

Bartosz Piotrowski, Josef Urban

arXiv_AI

arXiv_AI RNN
Abstract

We describe two theorem proving tasks, premise selection and internal guidance, for which machine learning has recently been used with some success. We argue, however, that the existing methods do not correspond to the way humans approach these tasks. In particular, the existing methods so far lack the notion of a state that is updated each time a choice in the reasoning process is made. To address that, we propose an analogy with tasks such as machine translation, where stateful architectures such as recurrent neural networks have recently been very successful. Then we develop and publish a series of sequence-to-sequence data sets that correspond to the theorem proving tasks using several encodings, and provide the first experimental evaluation of the performance of recurrent neural networks on such tasks.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07961

PDF

http://arxiv.org/pdf/1905.07961
Read All
Spatio-Temporal Road Scene Reconstruction using Superpixel Markov Random Field

2019-05-20

Yaochen Li, Yuehu Liu, Jihua Zhu, Shiqi Ma, Zhenning Niu, Rui Guo

arXiv_CV

arXiv_CV Detection
Abstract

Scene model construction based on image rendering is an indispensable but challenging technique in computer vision and intelligent transportation systems. In this paper, we propose a framework for constructing 3D corridor-based road scene models. This consists of two successive stages: road detection and scene construction. The road detection is realized by a new superpixel Markov random field (MRF) algorithm. The data fidelity term in the MRF’s energy function is jointly computed according to the superpixel features of color, texture and location. The smoothness term is established on the basis of the interaction of spatio-temporally adjacent superpixels. In the subsequent scene construction, the foreground and background regions are modeled independently. Experiments for road detection demonstrate the proposed method outperforms the state-of-the-art in both accuracy and speed. The scene construction experiments confirm that the proposed scene models show better correctness ratios, and have the potential to support a range of applications.
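
The energy described above is a sum of a data fidelity term and a smoothness term over neighboring superpixels. A minimal sketch, assuming per-superpixel unary costs are already computed (in the paper they come from color, texture and location features) and assuming a Potts-style smoothness penalty, which is a common choice and not necessarily the paper's:

```python
def mrf_energy(labels, unary, edges, smoothness_weight):
    """Energy of a labeling: unary data costs plus a penalty per disagreeing neighbor pair.

    labels: label index per superpixel.
    unary:  unary[i][l] is the cost of giving superpixel i the label l.
    edges:  pairs (i, j) of spatially or temporally adjacent superpixels.
    """
    data_term = sum(unary[i][labels[i]] for i in range(len(labels)))
    smoothness_term = sum(1 for i, j in edges if labels[i] != labels[j])
    return data_term + smoothness_weight * smoothness_term

# Three superpixels, labels 0 = road and 1 = non-road (hypothetical costs).
unary = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3]]
edges = [(0, 1), (1, 2)]
energy_mixed = mrf_energy([0, 0, 1], unary, edges, smoothness_weight=0.5)
energy_smooth = mrf_energy([0, 0, 0], unary, edges, smoothness_weight=0.5)
```

With this weight the fully smooth labeling has lower energy even though the third superpixel's own features mildly prefer the other label, which shows how the smoothness term regularizes noisy per-superpixel evidence.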

Abstract (translated by Google)

URL

http://arxiv.org/abs/1811.09790

PDF

http://arxiv.org/pdf/1811.09790
Read All
Cluster-GCN: An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks

2019-05-20

Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, Cho-Jui Hsieh

arXiv_AI

arXiv_AI Embedding CNN Prediction
Abstract

Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms. To test the scalability of our algorithm, we create a new Amazon2M dataset with 2 million nodes and 61 million edges, which is more than 5 times larger than the previous largest publicly available dataset (Reddit). For training a 3-layer GCN on this data, Cluster-GCN is faster than the previous state-of-the-art VR-GCN (1523 seconds vs 1961 seconds) and uses much less memory (2.2GB vs 11.2GB). Furthermore, for training a 4-layer GCN on this data, our algorithm can finish in around 36 minutes while all the existing GCN training algorithms fail to train due to the out-of-memory issue. Finally, Cluster-GCN allows us to train a much deeper GCN without much time and memory overhead, which leads to improved prediction accuracy: using a 5-layer Cluster-GCN, we achieve a state-of-the-art test F1 score of 99.36 on the PPI dataset, while the previous best result was 98.71 by [16].
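
The sampling step can be sketched in a few lines. This is a minimal illustration, assuming the clusters have already been produced by some graph clustering algorithm; the function names and the toy graph are hypothetical, and the GCN forward and backward passes are omitted.

```python
import random

def induced_subgraph(adjacency, nodes):
    """Keep only edges whose two endpoints are both in `nodes`."""
    keep = set(nodes)
    return {u: [v for v in adjacency[u] if v in keep] for u in nodes}

def sample_cluster_batch(adjacency, clusters, rng):
    """Pick one precomputed cluster at random and return its induced subgraph."""
    block = rng.choice(clusters)
    return induced_subgraph(adjacency, block)

# Two triangles joined by the single edge 2-3.
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
clusters = [[0, 1, 2], [3, 4, 5]]
batch = sample_cluster_batch(adjacency, clusters, random.Random(0))
```

Every neighbor list in `batch` only mentions nodes inside the sampled block, so a multi-layer GCN run on it never has to expand beyond the block, whatever its depth.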

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07953

PDF

http://arxiv.org/pdf/1905.07953
Read All
Learning Image-Specific Attributes by Hyperbolic Neighborhood Graph Propagation

2019-05-20

Xiaofeng Xu, Ivor W. Tsang, Xiaofeng Cao, Ruiheng Zhang, Chuancai Liu

arXiv_CV

arXiv_CV Classification Relation
Abstract

As a kind of semantic representation of visual object descriptions, attributes are widely used in various computer vision tasks. In most existing attribute-based research, class-specific attributes (CSA), which are class-level annotations, are usually adopted due to their low annotation cost, since they are annotated per class instead of per individual image. However, class-specific attributes are usually noisy because of annotation errors and the diversity of individual images. Therefore, it is desirable to obtain image-specific attributes (ISA), which are image-level annotations, from the original class-specific attributes. In this paper, we propose to learn image-specific attributes by graph-based attribute propagation. Considering the intrinsic property of hyperbolic geometry that its distance expands exponentially, a hyperbolic neighborhood graph (HNG) is constructed to characterize the relationship between samples. Based on HNG, we define neighborhood consistency for each sample to identify inconsistent samples. Subsequently, inconsistent samples are refined based on their neighbors in HNG. Extensive experiments on five benchmark datasets demonstrate the significant superiority of the learned image-specific attributes over the original class-specific attributes in the zero-shot object classification task.
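
The expanding behavior of hyperbolic distance can be seen with the standard distance formula of the Poincaré ball model. Which hyperbolic model the paper uses is not stated in the abstract, so the choice of model here is an assumption made for illustration.

```python
import math

def squared_norm(x):
    """Squared Euclidean norm of a vector."""
    return sum(c * c for c in x)

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball (Poincare model)."""
    diff = squared_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * diff / ((1 - squared_norm(u)) * (1 - squared_norm(v))))

# Two pairs with the same Euclidean gap of 0.1, one near the origin, one near the boundary.
d_center = poincare_distance((0.0, 0.0), (0.1, 0.0))
d_boundary = poincare_distance((0.8, 0.0), (0.9, 0.0))
```

The pair near the boundary is several times farther apart than the pair near the origin, so samples that look like close neighbors in Euclidean terms can be clearly separated when building the neighborhood graph.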

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07933

PDF

http://arxiv.org/pdf/1905.07933
Read All
Fast Regularity-Constrained Plane Reconstruction

2019-05-20

Yangbin Lin, Jialian Li, Cheng Wang, Zhonggui Chen, Zongyue Wang, Jonathan Li

arXiv_CV

arXiv_CV Knowledge Relation
Abstract

Man-made environments typically comprise planar structures that exhibit numerous geometric relationships, such as parallelism, coplanarity, and orthogonality. Making full use of these relationships can considerably improve the robustness of algorithmic plane reconstruction of complex scenes. This research leverages a constraint model requiring minimal prior knowledge to implicitly establish relationships among planes. We introduce a method based on energy minimization to reconstruct the planes consistent with our constraint model. The proposed algorithm is efficient, easy to understand, and simple to implement. The experimental results show that our algorithm successfully reconstructs planes under high percentages of noise and outliers. It is superior to other state-of-the-art regularity-constrained plane reconstruction methods in terms of speed and robustness.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07922

PDF

http://arxiv.org/pdf/1905.07922
Read All
Disparity-based HDR imaging

2019-05-20

Jennifer Bonnard (CRESTIC), Gilles Valette (CRESTIC), Céline Loscos (CRESTIC)

arXiv_CV

arXiv_CV
Abstract

High-dynamic range imaging makes it possible to extend the dynamic range of intensity values to get close to what the human eye is able to perceive. Although there has been huge progress in the range capacity of digital camera sensors, the need to capture several exposures in order to reconstruct high-dynamic range values persists. In this paper, we present a study on how to acquire high-dynamic range values for multi-stereo images. In many papers, disparity has been used to register pixels of different images and guide the reconstruction. In this paper, we show the limitations of such approaches and propose heuristics as solutions to the problematic cases identified.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07918

PDF

http://arxiv.org/pdf/1905.07918
Read All
Skeleton-Based Hand Gesture Recognition by Learning SPD Matrices with Neural Networks

2019-05-20

Xuan Nguyen, Luc Brun, Olivier Lezoray, Sébastien Bougleux

arXiv_CV

arXiv_CV Recognition
Abstract

In this paper, we propose a new hand gesture recognition method based on skeletal data by learning SPD matrices with neural networks. We model the hand skeleton as a graph and introduce a neural network for SPD matrix learning, taking as input the 3D coordinates of hand joints. The proposed network is based on two newly designed layers that transform a set of SPD matrices into an SPD matrix. For gesture recognition, we train a linear SVM classifier using features extracted from our network. Experimental results on a challenging dataset (Dynamic Hand Gesture dataset from the SHREC 2017 3D Shape Retrieval Contest) show that the proposed method outperforms state-of-the-art methods.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07917

PDF

http://arxiv.org/pdf/1905.07917
Read All
FloWaveNet : A Generative Flow for Raw Audio

2019-05-20

Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon

arXiv_SD

arXiv_SD Inference
Abstract

Most modern text-to-speech architectures use a WaveNet vocoder for synthesizing high-fidelity waveform audio, but there have been limitations, such as high inference time, in its practical application due to its ancestral sampling scheme. The recently suggested Parallel WaveNet and ClariNet have achieved real-time audio synthesis capability by incorporating inverse autoregressive flow for parallel sampling. However, these approaches require a two-stage training pipeline with a well-trained teacher network and can only produce natural sound by using probability distillation along with auxiliary loss terms. We propose FloWaveNet, a flow-based generative model for raw audio synthesis. FloWaveNet requires only a single-stage training procedure and a single maximum likelihood loss, without any additional auxiliary terms, and it is inherently parallel due to the characteristics of generative flow. The model can efficiently sample raw audio in real-time, with clarity comparable to previous two-stage parallel models. The code and samples for all models, including our FloWaveNet, are publicly available.
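
Flow-based models are trained by maximum likelihood because each layer is invertible and has a cheap log-determinant. The sketch below shows a generic affine coupling layer from the normalizing-flow literature; it is not claimed to be FloWaveNet's exact layer, and `scale_and_shift` is a toy stand-in for a learned network.

```python
import math

def scale_and_shift(x_a):
    """Toy stand-in for the learned network conditioned on the untouched half."""
    log_scale = [math.tanh(x) for x in x_a]
    shift = [0.5 * x for x in x_a]
    return log_scale, shift

def coupling_forward(x_a, x_b):
    """Transform x_b given x_a; return both halves and the log |det Jacobian|."""
    log_scale, shift = scale_and_shift(x_a)
    y_b = [xb * math.exp(s) + t for xb, s, t in zip(x_b, log_scale, shift)]
    return x_a, y_b, sum(log_scale)

def coupling_inverse(y_a, y_b):
    """Exactly undo coupling_forward; every element is recovered independently."""
    log_scale, shift = scale_and_shift(y_a)
    x_b = [(yb - t) * math.exp(-s) for yb, s, t in zip(y_b, log_scale, shift)]
    return y_a, x_b

x_a, x_b = [0.3, -1.2], [0.5, 2.0]
y_a, y_b, log_det = coupling_forward(x_a, x_b)
recovered_a, recovered_b = coupling_inverse(y_a, y_b)
```

Neither direction contains a sample-by-sample loop over earlier outputs, which is why sampling from such a model is parallel, in contrast to ancestral sampling in an autoregressive vocoder. The log-determinant is added to the log-density of the transformed variable to obtain the single likelihood loss.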

Abstract (translated by Google)

URL

http://arxiv.org/abs/1811.02155

PDF

http://arxiv.org/pdf/1811.02155
Read All
Learning to Count Objects with Few Exemplar Annotations

2019-05-20

Jianfeng Wang, Rong Xiao, Yandong Guo, Lei Zhang

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

In this paper, we study the problem of object counting with incomplete annotations. Based on the observation that in many object counting problems the target objects are normally repeated and highly similar to each other, we are particularly interested in the setting where only a few exemplar annotations are provided. Directly applying object detection with incomplete annotations will result in severe accuracy degradation due to its improper handling of unlabeled object instances. To address the problem, we propose a positiveness-focused object detector (PFOD) to progressively propagate the incomplete labels before applying the general object detection algorithm. The PFOD focuses on the positive samples and ignores the negative instances for most of the learning time. This strategy, though simple, dramatically boosts the object counting accuracy. On the CARPK dataset for parking lot car counting, we improved mAP@0.5 from 4.58% to 72.44% using only 5 training images each with 5 bounding boxes. On the Drink35 dataset for shelf product counting, the mAP@0.5 is improved from 14.16% to 53.73% using 10 training images each with 5 bounding boxes.
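
The "@0.5" in mAP@0.5 refers to the intersection-over-union (IoU) threshold at which a predicted box counts as matching a ground-truth box. A minimal sketch of that matching test, assuming boxes in (x_min, y_min, x_max, y_max) form; the function names are illustrative, and the averaging over classes and recall levels is omitted.

```python
def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    union = box_area(box_a) + box_area(box_b) - intersection
    return intersection / union if union > 0 else 0.0

def matches(prediction, ground_truth, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count as a hit."""
    return iou(prediction, ground_truth) >= threshold

ground_truth = (0.0, 0.0, 2.0, 2.0)
tight_prediction = (0.0, 0.0, 2.0, 1.5)    # IoU 0.75, counts as a hit
shifted_prediction = (1.0, 0.0, 3.0, 2.0)  # IoU 1/3, counts as a miss
```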

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07898

PDF

http://arxiv.org/pdf/1905.07898
Read All
Intentional Attention Mask Transformation for Robust CNN Classification

2019-05-20

Masanari Kimura, Masayuki Tanaka

arXiv_AI

arXiv_AI Attention CNN Classification Recognition
Abstract

Convolutional Neural Networks have achieved impressive results in various tasks, but interpreting their internal mechanism is a challenging problem. To tackle this problem, we exploit a multi-channel attention mechanism in feature space. Our network architecture allows us to obtain an attention mask for each feature, while existing CNN visualization methods provide only a common attention mask for all features. We apply the proposed multi-channel attention mechanism to a multi-attribute recognition task. We can obtain a different attention mask for each feature and for each attribute. Those analyses give us deeper insight into the feature space of CNNs. Furthermore, our proposed attention mechanism naturally derives a method for improving the robustness of CNNs. From the observation of feature space based on the proposed attention mask, we demonstrate that we can obtain robust CNNs by intentionally emphasizing features that are important for attributes. The experimental results for the benchmark dataset show that the proposed method gives high human interpretability while accurately grasping the attributes of the data, and improves network robustness.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.02719

PDF

http://arxiv.org/pdf/1905.02719
Read All
Abusive Language Detection in Online Conversations by Combining Content- and Graph-based Features

2019-05-20

Noé Cecillon (LIA), Vincent Labatut (LIA), Richard Dufour (LIA), Georges Linarès (LIA)

arXiv_CL

arXiv_CL Classification Detection
Abstract

In recent years, online social networks have allowed worldwide users to meet and discuss. As guarantors of these communities, the administrators of these platforms must prevent users from adopting inappropriate behaviors. This verification task, mainly done by humans, is more and more difficult due to the ever-growing amount of messages to check. Methods have been proposed to automate this moderation process, mainly by providing approaches based on the textual content of the exchanged messages. Recent work has also shown that characteristics derived from the structure of conversations, in the form of conversational graphs, can help detect these abusive messages. In this paper, we propose to take advantage of both sources of information by proposing fusion methods integrating content- and graph-based features. Our experiments on raw chat logs show that the content of the messages and their dynamics within a conversation contain partially complementary information, allowing performance improvements on an abusive message classification task with a final F-measure of 93.26%.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07894

PDF

http://arxiv.org/pdf/1905.07894
Read All
Independent Vector Analysis with more Microphones than Sources

2019-05-20

Robin Scheibler, Nobutaka Ono

arXiv_SD

arXiv_SD
Abstract

We extend frequency-domain blind source separation based on independent vector analysis to the case where there are more microphones than sources. The signal is modelled as non-Gaussian sources in a Gaussian background. The proposed algorithm is based on a parametrization of the demixing matrix decreasing the number of parameters to estimate. Furthermore, orthogonal constraints between the signal and background subspaces are imposed to regularize the separation. The problem can then be posed as a constrained likelihood maximization. We propose efficient alternating updates guaranteed to converge to a stationary point of the cost function. The performance of the algorithm is assessed on simulated signals. We find that the separation performance is on par with that of the conventional determined algorithm at a fraction of the computational cost.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07880

PDF

http://arxiv.org/pdf/1905.07880
Read All
Procedural Synthesis of Remote Sensing Images for Robust Change Detection with Neural Networks

2019-05-20

Maria Kolos, Anton Marin, Alexey Artemov, Evgeny Burnaev

arXiv_CV

arXiv_CV CNN Deep_Learning Detection Recognition
Abstract

Data-driven methods such as convolutional neural networks (CNNs) are known to deliver state-of-the-art performance on image recognition tasks when the training data are abundant. However, in some instances, such as change detection in remote sensing images, annotated data cannot be obtained in sufficient quantities. In this work, we propose a simple and efficient method for creating realistic targeted synthetic datasets in the remote sensing domain, leveraging the opportunities offered by game development engines. We provide a description of the pipeline for procedural geometry generation and rendering as well as an evaluation of the efficiency of produced datasets in a change detection scenario. Our evaluations demonstrate that our pipeline helps to improve the performance and convergence of deep learning models when the amount of real-world data is severely limited.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07877

PDF

http://arxiv.org/pdf/1905.07877
Read All
Planning coordinated motions for tethered planar mobile robots

2019-05-20

Xu Zhang, Quang-Cuong Pham

arXiv_RO

arXiv_RO
Abstract

This paper considers the motion planning problem for multiple tethered planar mobile robots. Each robot is attached to a fixed base by a flexible cable. Since the robots share a common workspace, the interactions amongst the robots, cables, and obstacles pose significant difficulties for planning. Previous works have studied the problem of detecting whether a target cable configuration is intersecting (or entangled). Here, we are interested in the motion planning problem: how to plan and coordinate the robot motions to realize a given non-intersecting target cable configuration. We identify four possible modes of motion, depending on whether (i) the robots move in straight lines or follow their cable lines; (ii) the robots move sequentially or concurrently. We present an in-depth analysis of Straight & Concurrent, which is the most practically interesting mode of motion. In particular, we propose algorithms that (a) detect whether a given target cable configuration is realizable by a Straight & Concurrent motion, and (b) return a valid coordinated motion plan. The algorithms are analyzed in detail and validated in simulations and in a hardware experiment.
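
The precondition mentioned above, a non-intersecting target cable configuration, can be checked with a standard segment-crossing test. This is a minimal sketch under strong assumptions: each cable is a taut straight segment from base to target, there are no obstacles, and only proper crossings are detected (touching or collinear cables are ignored). It illustrates the intersection check only, not the paper's realizability or planning algorithms.

```python
def orientation(p, q, r):
    """Signed area test: > 0 if r is left of the line p -> q, < 0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a1, a2, b1, b2):
    """True when segment a1-a2 and segment b1-b2 cross at an interior point."""
    d1, d2 = orientation(a1, a2, b1), orientation(a1, a2, b2)
    d3, d4 = orientation(b1, b2, a1), orientation(b1, b2, a2)
    return d1 * d2 < 0 and d3 * d4 < 0

def configuration_intersects(bases, targets):
    """True when any two straight cables (base i to target i) cross each other."""
    n = len(bases)
    return any(
        segments_cross(bases[i], targets[i], bases[j], targets[j])
        for i in range(n) for j in range(i + 1, n)
    )

bases = [(0, 0), (2, 0)]
crossed = configuration_intersects(bases, [(2, 2), (0, 2)])   # robots swap sides
parallel = configuration_intersects(bases, [(0, 2), (2, 2)])  # robots go straight up
```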

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07873

PDF

http://arxiv.org/pdf/1905.07873
Read All
PaperRobot: Incremental Draft Generation of Scientific Ideas

2019-05-20

Qingyun Wang, Lifu Huang, Zhiying Jiang, Kevin Knight, Heng Ji, Mohit Bansal, Yi Luan

arXiv_AI

arXiv_AI Knowledge_Graph Knowledge Attention
Abstract

We present PaperRobot, which performs as an automatic research assistant by (1) conducting deep understanding of a large collection of human-written papers in a target domain and constructing comprehensive background knowledge graphs (KGs); (2) creating new ideas by predicting links from the background KGs, by combining graph attention and contextual text attention; (3) incrementally writing some key elements of a new paper based on memory-attention networks: from the input title along with predicted related entities to generate a paper abstract, from the abstract to generate conclusion and future work, and finally from future work to generate a title for a follow-on paper. Turing Tests, where a biomedical domain expert is asked to compare a system output and a human-authored string, show that PaperRobot-generated abstracts, conclusion and future work sections, and new titles are chosen over human-written ones up to 30%, 24% and 12% of the time, respectively.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07870

PDF

http://arxiv.org/pdf/1905.07870
Read All
Reinforcement Learning without Ground-Truth State

2019-05-20

Xingyu Lin, Harjatin Singh Baweja, David Held

arXiv_RO

arXiv_RO Knowledge Reinforcement_Learning
Abstract

To perform robot manipulation tasks, a low-dimensional state of the environment typically needs to be estimated. However, designing a state estimator can sometimes be difficult, especially in environments with deformable objects. An alternative is to learn an end-to-end policy that maps directly from high-dimensional sensor inputs to actions. However, if this policy is trained with reinforcement learning, then without a state estimator, it is hard to specify a reward function based on continuous and high-dimensional observations. To meet this challenge, we propose a simple indicator reward function for goal-conditioned reinforcement learning: we only give a positive reward when the robot’s observation exactly matches a target goal observation. We show that by utilizing the goal relabeling technique, we can learn with the indicator reward function even in continuous state spaces, in which we do not expect two observations to ever be identical. We propose two methods to further speed up convergence with indicator rewards: reward balancing and reward filtering. We show comparable performance between our method and an oracle which uses the ground-truth state for computing rewards, even though our method only operates on raw observations and does not have access to the ground-truth state. We demonstrate our method in complex tasks in continuous state spaces such as rope manipulation from RGB-D images, without knowledge of the ground-truth state.
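
The interplay between the indicator reward and goal relabeling can be sketched as follows. This assumes a hindsight-style strategy that relabels a trajectory with the observation it actually ended in; the abstract only says "goal relabeling", so the specific strategy, the tuple layout and the function names are assumptions, and reward balancing and filtering are not shown.

```python
def indicator_reward(observation, goal):
    """Positive reward only when the observation exactly equals the goal observation."""
    return 1.0 if observation == goal else 0.0

def relabel_with_final_observation(trajectory):
    """Relabel (obs, action, next_obs) transitions with the last reached observation as goal.

    Returns tuples (obs, action, next_obs, goal, reward).
    """
    new_goal = trajectory[-1][2]
    return [
        (obs, action, next_obs, new_goal, indicator_reward(next_obs, new_goal))
        for obs, action, next_obs in trajectory
    ]

# Toy 1D observations; the original goal (2.0,) is never reached exactly.
trajectory = [((0.0,), "right", (0.5,)), ((0.5,), "right", (1.0,))]
original_rewards = [indicator_reward(nxt, (2.0,)) for _, _, nxt in trajectory]
relabeled = relabel_with_final_observation(trajectory)
relabeled_rewards = [transition[4] for transition in relabeled]
```

Under the original goal every reward is zero, while after relabeling the final transition matches its goal exactly, so positive rewards appear even in a continuous space where two independent observations would never coincide.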

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07866

PDF

http://arxiv.org/pdf/1905.07866
Read All
Not All Parts Are Created Equal: 3D Pose Estimation by Modelling Bi-directional Dependencies of Body Parts

2019-05-20

Jue Wang, Shaoli Huang, Xinchao Wang, Dacheng Tao

arXiv_CV

arXiv_CV Pose_Estimation Prediction
Abstract

Not all the human body parts have the same degree of freedom (DOF) due to the physiological structure. For example, the limbs may move more flexibly and freely than the torso does. Most of the existing 3D pose estimation methods, despite the very promising results achieved, treat the body joints equally and consequently often lead to larger reconstruction errors on the limbs. In this paper, we propose a progressive approach that explicitly accounts for the distinct DOFs among the body parts. We model parts with higher DOFs like the elbows, as dependent components of the corresponding parts with lower DOFs like the torso, of which the 3D locations can be more reliably estimated. Meanwhile, the high-DOF parts may, in turn, impose a constraint on where the low-DOF ones lie. As a result, parts with different DOFs supervise one another, yielding physically constrained and plausible pose-estimation results. To further facilitate the prediction of the high-DOF parts, we introduce a pose-attribute estimation, where the relative location of a limb joint with respect to the torso, which has the least DOF of a human body, is explicitly estimated and further fed to the joint-estimation module. The proposed approach achieves very promising results, outperforming the state of the art on several benchmarks.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07862

PDF

http://arxiv.org/pdf/1905.07862
Read All
Perceptual Values from Observation

2019-05-20

Ashley D. Edwards, Charles L. Isbell

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning
Abstract

Imitation by observation is an approach for learning from expert demonstrations that lack action information, such as videos. Recent approaches to this problem can be placed into two broad categories: training dynamics models that aim to predict the actions taken between states, and learning rewards or features for computing them for Reinforcement Learning (RL). In this paper, we introduce a novel approach that learns values, rather than rewards, directly from observations. We show that by using values, we can significantly speed up RL by removing the need to bootstrap action-values, as compared to sparse-reward specifications.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.07861

PDF

http://arxiv.org/pdf/1905.07861
Read All
Three Dimensional Convolutional Neural Network Pruning with Regularization-Based Method

2019-05-20

Yuxin Zhang, Huan Wang, Yang Luo, Lu Yu, Haoji Hu, Hangguan Shan, Tony Q. S. Quek

arXiv_CV

arXiv_CV Regularization CNN
Abstract

Despite enjoying extensive applications in video analysis, three-dimensional convolutional neural networks (3D CNNs) are restricted by their massive computation and storage consumption. To solve this problem, we propose a three-dimensional regularization-based neural network pruning method that assigns different regularization parameters to different weight groups based on their importance to the network. Further, we analyze the redundancy and computation cost of each layer to determine the different pruning ratios. Experiments show that pruning based on our method can lead to 2x theoretical speedup with only 0.41% accuracy loss for 3DResNet18 and 3.28% accuracy loss for C3D. The proposed method performs favorably against other popular methods for model compression and acceleration.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1811.07555

PDF

http://arxiv.org/pdf/1811.07555
Read All
Multimodal Explanations by Predicting Counterfactuality in Videos

2019-05-20

Atsushi Kanehira, Kentaro Takemoto, Sho Inayoshi, Tatsuya Harada

arXiv_CV

arXiv_CV Action_Recognition Optimization Inference Recognition
Abstract

This study addresses generating counterfactual explanations with multimodal information. Our goal is not only to classify a video into a specific category, but also to explain why it is not categorized into a specific class, using combinations of visual-linguistic information. The requirements that the expected output should satisfy are referred to as counterfactuality in this paper: (1) compatibility of visual-linguistic explanations, and (2) positiveness/negativeness for the specific positive/negative class. Exploiting a spatio-temporal region (tube) and an attribute as visual and linguistic explanations respectively, the explanation model is trained to predict the counterfactuality for possible combinations of multimodal information in a post-hoc manner. The optimization problem, which appears during training/inference, can be efficiently solved by inserting a novel neural network layer, namely the maximum subpath layer. We demonstrate the effectiveness of this method by comparison with a baseline on action recognition datasets extended for this task. Moreover, we provide information-theoretical insight into the proposed method.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1812.01263

PDF

http://arxiv.org/pdf/1812.01263
Read All
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective

2019-05-20

Jing Zhang, Wanqing Li, Philip Ogunbona, Dong Xu

arXiv_CV

arXiv_CV Review Survey Transfer_Learning Recognition
Abstract

This paper takes a problem-oriented perspective and presents a comprehensive review of transfer learning methods, both shallow and deep, for cross-dataset visual recognition. Specifically, it categorises cross-dataset recognition into seventeen problems based on a set of carefully chosen data and label attributes. Such a problem-oriented taxonomy has allowed us to examine how different transfer learning approaches tackle each problem and how well each problem has been researched to date. The comprehensive problem-oriented review of the advances in transfer learning has revealed not only the challenges in transfer learning for visual recognition, but also the problems (e.g. eight of the seventeen problems) that have been scarcely studied. This survey presents not only an up-to-date technical review for researchers, but also a systematic approach and a reference for a machine learning practitioner to categorise a real problem and look up a possible solution accordingly.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1705.04396

PDF

http://arxiv.org/pdf/1705.04396
Read All
