Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

DSReg: Using Distant Supervision as a Regularizer

2019-05-28

Yuxian Meng, Muyu Li, Wei Wu, Jiwei Li

arXiv_CL

arXiv_CL Text_Classification Classification
Abstract

In this paper, we aim at tackling a general issue in NLP tasks where some of the negative examples are highly similar to the positive examples, i.e., hard-negative examples. We propose the distant supervision as a regularizer (DSReg) approach to tackle this issue. The original task is converted to a multi-task learning problem, in which distant supervision is used to retrieve hard-negative examples. The obtained hard-negative examples are then used as a regularizer. The original target objective of distinguishing positive examples from negative examples is jointly optimized with the auxiliary task objective of distinguishing softened positive (i.e., hard-negative examples plus positive examples) from easy-negative examples. In the neural context, this can be done by outputting the same representation from the last neural layer to different $softmax$ functions. Using this strategy, we can improve the performance of baseline models in a range of different NLP tasks, including text classification, sequence labeling and reading comprehension.
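The shared-representation idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the two heads, the toy weights, and the `aux_weight` mixing coefficient are all assumptions introduced here, and only the two objectives named in the abstract are modelled.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, bias, h):
    # One dense layer: weights is a list of rows, one row per output class.
    return [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, bias)]

def joint_loss(h, is_positive, is_hard_negative, main_head, aux_head, aux_weight=0.5):
    """Cross-entropy of the main task plus a weighted auxiliary term.

    h          : shared representation from the last layer
    main_head  : (weights, bias) for positive vs. negative
    aux_head   : (weights, bias) for softened positive vs. easy negative
    aux_weight : hypothetical mixing coefficient (not taken from the paper)
    """
    main_label = 1 if is_positive else 0
    # Softened positive = positive examples plus hard-negative examples.
    aux_label = 1 if (is_positive or is_hard_negative) else 0
    p_main = softmax(linear(*main_head, h))
    p_aux = softmax(linear(*aux_head, h))
    return -math.log(p_main[main_label]) - aux_weight * math.log(p_aux[aux_label])
```

A hard-negative example is labelled 0 by the main head but 1 by the auxiliary head, so the same vector `h` is pulled in two directions, which is the regularizing effect the abstract describes.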

URL

https://arxiv.org/abs/1905.11658

PDF

https://arxiv.org/pdf/1905.11658
Discrete Infomax Codes for Meta-Learning

2019-05-28

Yoonho Lee, Wonjae Kim, Seungjin Choi

arXiv_CV

arXiv_CV Classification
Abstract

Learning compact discrete representations of data is itself a key task in addition to facilitating subsequent processing. It is also relevant to meta-learning since a latent representation shared across relevant tasks enables a model to adapt to new tasks quickly. In this paper, we present a method for learning a stochastic encoder that yields discrete p-way codes of length d by maximizing the mutual information between representations and labels. We show that previous loss functions for deep metric learning are approximations to this information-theoretic objective function. Our model, Discrete InfoMax Codes (DIMCO), learns to produce a short representation of data that can be used to classify classes with few labeled examples. Our analysis shows that using shorter codes reduces overfitting in the context of few-shot classification. Experiments show that DIMCO requires less memory (i.e., code length) for performance similar to previous methods and that our method is particularly effective when the training dataset is small.
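The quantity being maximized, the mutual information between discrete codes and labels, can be estimated from paired samples as follows. This is only a sketch of the measurement on fixed data; DIMCO itself trains a stochastic encoder, which is not reproduced here, and the function name is an assumption.

```python
import math
from collections import Counter

def mutual_information(codes, labels):
    """Empirical I(code; label) in bits from paired samples."""
    n = len(codes)
    joint = Counter(zip(codes, labels))
    code_counts = Counter(codes)
    label_counts = Counter(labels)
    mi = 0.0
    for (c, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) written with raw counts.
        ratio = count * n / (code_counts[c] * label_counts[y])
        mi += p_xy * math.log2(ratio)
    return mi
```

Codes that determine a balanced binary label carry one bit about it, while codes that are independent of the label carry none.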

URL

https://arxiv.org/abs/1905.11656

PDF

https://arxiv.org/pdf/1905.11656
A High-Performance CNN Method for Offline Handwritten Chinese Character Recognition and Visualization

2019-05-28

Pavlo Melnyk, Zhiqiang You, Keqin Li

arXiv_CV

arXiv_CV CNN Recognition
Abstract

Recent research has introduced fast, compact and efficient convolutional neural networks (CNNs) for offline handwritten Chinese character recognition (HCCR). However, much of this work did not address the problem of network interpretability. We propose a new deep CNN architecture with high recognition performance that is capable of learning deep features for visualization. A special characteristic of our model is its bottleneck layers, which enable us to retain its expressiveness while reducing the number of multiply-accumulate operations and the required storage. We introduce a modification of global weighted average pooling (GWAP): global weighted output average pooling (GWOAP). This paper demonstrates how they allow us to calculate class activation maps (CAMs) in order to indicate the most relevant input character image regions used by our CNN to identify a certain class. Evaluating on the ICDAR-2013 offline HCCR competition dataset, we show that our model enables a relative 0.83% error reduction while having 49% fewer parameters and the same computational cost compared to the current state-of-the-art single-network method trained only on handwritten data. Our solution outperforms even recent residual learning approaches.
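For readers unfamiliar with class activation maps, the standard computation is a per-class weighted sum of the final feature maps. The sketch below shows that generic form only; it does not reproduce the GWAP or GWOAP pooling described in the abstract, and the function name is an assumption.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps for one class.

    feature_maps  : list of K maps, each H x W (nested lists)
    class_weights : K classifier weights for the class of interest
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]
    return cam
```

Large values in the returned map mark the input regions that contributed most to the class score.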

URL

http://arxiv.org/abs/1812.11489

PDF

http://arxiv.org/pdf/1812.11489
The Nipple-Areola Complex for Criminal Identification

2019-05-28

Wojciech Michal Matkowski, Krzysztof Matkowski, Adams Wai-Kin Kong, Cory Lloyd Hall

arXiv_CV

arXiv_CV Face Deep_Learning Recognition
Abstract

In digital and multimedia forensics, identification of child sexual offenders based on digital evidence images is highly challenging because the offender’s face or other obvious characteristics such as tattoos are occluded, covered, or not visible at all. Nevertheless, other naked body parts, e.g., the chest, are still visible. Some researchers have proposed skin marks, skin texture, vein or androgenic hair patterns for criminal and victim identification. There are no available studies of the nipple-areola complex (NAC) for offender identification. In this paper, we present a study of offender identification based on the NAC, and we present the NTU-Nipple-v1 dataset, which contains 2732 images of 428 different male nipple-areolae. Popular deep learning and hand-crafted recognition methods are evaluated on the provided dataset. The results indicate that the NAC can be a useful characteristic for offender identification.

URL

https://arxiv.org/abs/1905.11651

PDF

https://arxiv.org/pdf/1905.11651
Image Deformation Meta-Networks for One-Shot Learning

2019-05-28

Zitian Chen, Yanwei Fu, Yu-Xiong Wang, Lin Ma, Wei Liu, Martial Hebert

arXiv_CV

arXiv_CV Recognition
Abstract

Humans can robustly learn novel visual concepts even when images undergo various deformations and lose certain information. Mimicking the same behavior and synthesizing deformed instances of new concepts may help visual recognition systems perform better one-shot learning, i.e., learning concepts from one or few examples. Our key insight is that, while the deformed images may not be visually realistic, they still maintain critical semantic information and contribute significantly to formulating classifier decision boundaries. Inspired by the recent progress of meta-learning, we combine a meta-learner with an image deformation sub-network that produces additional training examples, and optimize both models in an end-to-end manner. The deformation sub-network learns to deform images by fusing a pair of images – a probe image that keeps the visual content and a gallery image that diversifies the deformations. We demonstrate results on the widely used one-shot learning benchmarks (miniImageNet and ImageNet 1K Challenge datasets), which significantly outperform state-of-the-art approaches.

URL

https://arxiv.org/abs/1905.11641

PDF

https://arxiv.org/pdf/1905.11641
AmoebaContact and GDFold: a new pipeline for rapid prediction of protein structures

2019-05-28

Wenzhi Mao, Wenze Ding, Haipeng Gong

arXiv_CV

arXiv_CV Prediction Gradient_Descent
Abstract

Native contacts between residues can be predicted from the amino acid sequence of proteins, and the predicted contact information can assist de novo protein structure prediction. Here, we present a novel pipeline of a residue contact predictor AmoebaContact and a contact-assisted folder GDFold for rapid protein structure prediction. Unlike mainstream contact predictors that utilize human-designed neural networks, AmoebaContact adopts a set of network architectures that are found to be optimal for contact prediction through automatic searching and predicts the residue contacts at a series of cutoffs. Unlike conventional contact-assisted folders that only use top-scored contact pairs, GDFold considers all residue pairs from the prediction results of AmoebaContact in a differentiable loss function and optimizes the atom coordinates using the gradient descent algorithm. The combination of AmoebaContact and GDFold allows quick reconstruction of the protein structure, with model quality comparable to the state-of-the-art protein structure prediction methods.
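The idea of optimizing coordinates by gradient descent over all residue pairs can be illustrated with a toy loss. Everything specific below is an assumption made for illustration: the squared-error form, the single `target` distance, and the step size are not taken from the paper, whose actual loss function is not given in the abstract.

```python
import math

def contact_loss_and_grad(coords, contact_prob, target=1.0):
    """Loss = sum over all pairs of p_ij * (d_ij - target)^2, with its gradient."""
    n = len(coords)
    dim = len(coords[0])
    loss = 0.0
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [a - b for a, b in zip(coords[i], coords[j])]
            dist = math.sqrt(sum(d * d for d in diff)) or 1e-12  # avoid dividing by zero
            err = dist - target
            loss += contact_prob[i][j] * err * err
            scale = 2.0 * contact_prob[i][j] * err / dist
            for k in range(dim):
                grad[i][k] += scale * diff[k]
                grad[j][k] -= scale * diff[k]
    return loss, grad

def fold(coords, contact_prob, steps=200, lr=0.1):
    """Plain gradient descent on the coordinates; returns a new coordinate list."""
    coords = [list(c) for c in coords]
    for _ in range(steps):
        _, grad = contact_loss_and_grad(coords, contact_prob)
        for point, g in zip(coords, grad):
            for k in range(len(point)):
                point[k] -= lr * g[k]
    return coords
```

Because every pair contributes in proportion to its predicted probability, low-confidence contacts still nudge the structure instead of being discarded by a top-k cutoff.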

URL

https://arxiv.org/abs/1905.11640

PDF

https://arxiv.org/pdf/1905.11640
Towards robust audio spoofing detection: a detailed comparison of traditional and learned features

2019-05-28

Balamurali BT, Kin Wah Edward Lin, Simon Lui, Jer-Ming Chen, Dorien Herremans

arXiv_SD

arXiv_SD Prediction Detection
Abstract

Automatic speaker verification, like every other biometric system, is vulnerable to spoofing attacks. Using only a few minutes of recorded voice of a genuine client of a speaker verification system, attackers can develop a variety of spoofing attacks that might trick such systems. Detecting these attacks using the audio cues present in the recordings is an important challenge. Most existing spoofing detection systems depend on knowing the spoofing technique used. With this research, we aim at overcoming this limitation by examining robust audio features, both traditional and those learned through an autoencoder, that are generalizable over different types of replay spoofing. Furthermore, we provide a detailed account of all the steps necessary in setting up state-of-the-art audio feature detection, pre- and postprocessing, such that the (non-audio expert) machine learning researcher can implement such systems. Finally, we evaluate the performance of our robust replay speaker detection system with a wide variety and different combinations of both extracted and machine learned audio features on the ‘out in the wild’ ASVspoof 2017 dataset. This dataset contains a variety of new spoofing configurations. Since our focus is on examining which features will ensure robustness, we base our system on a traditional Gaussian Mixture Model-Universal Background Model. We then systematically investigate the relative contribution of each feature set. The fused models, based on both the known audio features and the machine learned features respectively, have a comparable performance with an Equal Error Rate (EER) of 12. The final best performing model, which obtains an EER of 10.8, is a hybrid model that contains both known and machine learned features, thus revealing the importance of incorporating both types of features when developing a robust spoofing prediction model.
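The Equal Error Rate reported above is the operating point where the false acceptance rate equals the false rejection rate. A simple way to estimate it from detector scores is a threshold sweep, sketched below. The function name and the convention that a higher score means "more likely genuine" are assumptions; the authors' evaluation tooling is not described in the abstract.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER (as a fraction) by sweeping a threshold over all scores.

    A trial is accepted as genuine when its score is >= the threshold.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # False rejection: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With finite data the two error curves rarely cross exactly, so the sketch returns the average of the two rates at the threshold where they are closest.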

URL

http://arxiv.org/abs/1905.12439

PDF

http://arxiv.org/pdf/1905.12439
LatentGNN: Learning Efficient Non-local Relations for Visual Recognition

2019-05-28

Songyang Zhang, Shipeng Yan, Xuming He

arXiv_CV

arXiv_CV CNN Relation Recognition
Abstract

Capturing long-range dependencies in feature representations is crucial for many visual recognition tasks. Despite recent successes of deep convolutional networks, it remains challenging to model non-local context relations between visual features. A promising strategy is to model the feature context by a fully-connected graph neural network (GNN), which augments traditional convolutional features with an estimated non-local context representation. However, most GNN-based approaches require computing a dense graph affinity matrix and hence have difficulty in scaling up to tackle complex real-world visual problems. In this work, we propose an efficient yet flexible non-local relation representation based on a novel class of graph neural networks. Our key idea is to introduce a latent space to reduce the complexity of the graph, which allows us to use a low-rank representation for the graph affinity matrix and to achieve linear complexity in computation. Extensive experimental evaluations on three major visual recognition tasks show that our method outperforms prior work by a large margin while maintaining a low computation cost.
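The computational saving from a low-rank affinity matrix follows from associativity: if A = P Qᵀ with P and Q of size N x d, then A x can be computed as P (Qᵀ x) without ever forming the N x N matrix. The sketch below checks this on scalar node features. The names `P`, `Q`, and the helper functions are assumptions for illustration; the learned, normalized affinities of LatentGNN are not reproduced.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def lowrank_propagate(P, Q, x):
    """Compute (P Q^T) x in O(N * d): project to d latent nodes, then back."""
    latent = matvec(transpose(Q), x)  # d values, one per latent node
    return matvec(P, latent)          # N values, one per visible node

def dense_propagate(P, Q, x):
    """Reference version that builds the full N x N affinity matrix, O(N^2)."""
    n = len(P)
    affinity = [[sum(p * q for p, q in zip(P[i], Q[j])) for j in range(n)]
                for i in range(n)]
    return matvec(affinity, x)
```

Both functions return the same vector; only the cost differs, which is why the latent route scales linearly in the number of nodes for a fixed latent size d.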

URL

https://arxiv.org/abs/1905.11634

PDF

https://arxiv.org/pdf/1905.11634
Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

2019-05-28

Zih-Siou Hung, Arun Mallya, Svetlana Lazebnik

arXiv_CV

arXiv_CV Image_Caption Embedding Detection Relation
Abstract

Relations amongst entities play a central role in image understanding. Due to the combinatorial complexity of modeling (subject, predicate, object) relation triplets, it is crucial to develop a method that can not only recognize seen relations, but also generalize well to unseen cases. Inspired by Visual Translation Embedding network (VTransE), we propose the Union Visual Translation Embedding network (UVTransE) to capture both common and rare relations with better accuracy. UVTransE maps the subject, the object, and the union (subject, object) image regions into a low-dimensional relation space where a predicate can be expressed as a vector subtraction, such that predicate $\approx$ union (subject, object) $-$ subject $-$ object. We present a comprehensive evaluation of our method on multiple challenging benchmarks: the Visual Relationship Detection dataset (VRD); UnRel dataset for rare and unusual relations; two subsets of Visual Genome; and the Open Images Challenge. Our approach decisively outperforms VTransE and comes close to or exceeds the state of the art across a range of settings, from small-scale to large-scale datasets, from common to previously unseen relations. On Visual Genome and Open Images, it also achieves promising results on the recently introduced task of scene graph generation.
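The vector relation stated in the abstract, predicate ≈ union(subject, object) − subject − object, is easy to express directly. In the sketch below, the nearest-neighbour lookup over a table of predicate vectors is an assumption added for illustration; the abstract does not say how the predicate is classified from the resulting vector.

```python
def predicate_embedding(union, subject, obj):
    """predicate ≈ union(subject, object) - subject - object, element-wise."""
    return [u - s - o for u, s, o in zip(union, subject, obj)]

def nearest_predicate(vector, predicate_table):
    """Return the name of the closest predicate vector (hypothetical lookup step)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(predicate_table,
               key=lambda name: squared_distance(vector, predicate_table[name]))
```

For example, with toy 2-D embeddings `union=[3, 1]`, `subject=[1, 0]`, `obj=[1, 0]`, the predicate vector is `[1, 1]`, which would then be matched against the known predicates.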

URL

https://arxiv.org/abs/1905.11624

PDF

https://arxiv.org/pdf/1905.11624
Learning V1 Simple Cells with Vector Representations of Local Contents and Matrix Representations of Local Motions

2019-05-28

Ruiqi Gao, Jianwen Xie, Song-Chun Zhu, Ying Nian Wu

arXiv_CV

arXiv_CV Inference Relation
Abstract

Simple cells in primary visual cortex (V1) can be approximated by Gabor filters, and adjacent simple cells tend to have a quadrature phase relationship. This paper entertains the hypothesis that a key purpose of such simple cells is to perceive local motions, i.e., displacements of pixels, caused by the relative motions between the agent and the surrounding environment. Specifically, we propose a representational model that couples the vector representations of local image contents with the matrix representations of local pixel displacements. When the image changes from one time frame to the next due to pixel displacements, the vector at each pixel is rotated by a matrix that represents the displacement of this pixel. We show that by learning from pairs of images that are deformed versions of each other, we can learn both vector and matrix representations. The units in the learned vector representations reproduce properties of V1 simple cells. The learned model enables perceptual inference of local motions.
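The coupling of a content vector with a displacement matrix can be illustrated with the simplest possible case: a 2-D vector rotated by an angle proportional to the displacement. This is an assumption chosen for clarity; in the paper the matrices are learned, and the `frequency` parameter and function names below are hypothetical.

```python
import math

def displacement_matrix(displacement, frequency=1.0):
    """2-D rotation whose angle grows linearly with the pixel displacement."""
    theta = frequency * displacement
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]
```

Two properties make rotations a natural fit: applying displacements 0.5 and then 0.3 gives the same vector as applying 0.8 once, and the vector's norm is unchanged, so only its phase encodes the motion. The two components of such a vector behave like a quadrature pair.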

URL

http://arxiv.org/abs/1902.03871

PDF

http://arxiv.org/pdf/1902.03871
Next-Generation Inertial Navigation Computation Based on Functional Iteration

2019-05-28

Yuanxin Wu

arXiv_RO

arXiv_RO
Abstract

Inertial navigation computation acquires the attitude, velocity and position information of a moving body by integrating inertial measurements from gyroscopes and accelerometers. For over half a century, great efforts have been made to cope with the motion non-commutativity errors and compute the navigation information as accurately as possible, so as not to compromise the measurement quality of inertial sensors. Highly dynamic applications and the forthcoming cold-atom precision inertial navigation systems demand even more accurate inertial navigation computation. This paper introduces an inertial navigation algorithm to fulfill that demand, named iNavFIter, which is based on a new framework of functional iterative integration and Chebyshev polynomials. Remarkably, the proposed iNavFIter reduces the non-commutativity errors, namely the coning/sculling/scrolling errors that have long perplexed the navigation community, to almost machine precision. Numerical results are provided to demonstrate its superior accuracy over state-of-the-art inertial navigation algorithms at affordable computation cost.

URL

https://arxiv.org/abs/1905.11615

PDF

https://arxiv.org/pdf/1905.11615
Cross-lingual Knowledge Graph Alignment via Graph Matching Neural Network

2019-05-28

Kun Xu, Liwei Wang, Mo Yu, Yansong Feng, Yan Song, Zhiguo Wang, Dong Yu

arXiv_CL

arXiv_CL Knowledge_Graph Knowledge Attention Embedding
Abstract

Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail at matching entities that have different facts in two KGs. In this paper, we introduce the topic entity graph, a local sub-graph of an entity, to represent entities with their contextual information in KG. From this view, the KB-alignment task can be formulated as a graph matching problem; and we further propose a graph-attention based solution, which first matches all entities in two topic entity graphs, and then jointly models the local matching information to derive a graph-level matching vector. Experiments show that our model outperforms previous state-of-the-art methods by a large margin.

URL

https://arxiv.org/abs/1905.11605

PDF

https://arxiv.org/pdf/1905.11605
Differentiable Algorithm Networks for Composable Robot Learning

2019-05-28

Peter Karkus, Xiao Ma, David Hsu, Leslie Pack Kaelbling, Wee Sun Lee, Tomas Lozano-Perez

arXiv_RO

arXiv_RO
Abstract

This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model; and it is trained end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and models act as structural assumptions to reduce the data requirements for learning; end-to-end learning allows the modules to adapt to one another and compensate for imperfect models and algorithms, in order to achieve the best overall system performance. We illustrate the DAN methodology through a case study on a simulated robot system, which learns to navigate in complex 3-D environments with only local visual observations and an image of a partially correct 2-D floor map.

URL

https://arxiv.org/abs/1905.11602

PDF

https://arxiv.org/pdf/1905.11602
GraphNVP: An Invertible Flow Model for Generating Molecular Graphs

2019-05-28

Kaushalya Madhawa, Katsuhiko Ishiguro, Kosuke Nakago, Motoki Abe

arXiv_AI

arXiv_AI
Abstract

We propose GraphNVP, the first invertible, normalizing flow-based molecular graph generation model. We decompose the generation of a graph into two steps: generation of (i) an adjacency tensor and (ii) node attributes. This decomposition yields the exact likelihood maximization on graph-structured data, combined with two novel reversible flows. We empirically demonstrate that our model efficiently generates valid molecular graphs with almost no duplicated molecules. In addition, we observe that the learned latent space can be used to generate molecules with desired chemical properties.

URL

http://arxiv.org/abs/1905.11600

PDF

http://arxiv.org/pdf/1905.11600
Brain-inspired reverse adversarial examples

2019-05-28

Shaokai Ye, Sia Huat Tan, Kaidi Xu, Yanzhi Wang, Chenglong Bao, Kaisheng Ma

arXiv_AI

arXiv_AI Adversarial Attention Deep_Learning
Abstract

A human does not have to see all elephants to recognize an animal as an elephant. In contrast, current state-of-the-art deep learning approaches depend heavily on the variety of training samples and the capacity of the network. In practice, the size of the network is always limited and it is impossible to access all the data samples. Under these circumstances, deep learning models are extremely fragile to human-imperceptible adversarial examples, which pose threats to all safety-critical systems. Inspired by the association and attention mechanisms of the human brain, we propose a reverse adversarial examples method that can greatly improve models’ robustness on unseen data. Experiments show that our reverse adversarial method can improve accuracy by 19.02% on average on ResNet18, MobileNet, and VGG16 on unseen data transformations. In addition, the proposed method is also applicable to compressed models and shows potential to compensate for the robustness drop brought by model quantization, with an absolute 30.78% accuracy improvement.

URL

http://arxiv.org/abs/1905.12171

PDF

http://arxiv.org/pdf/1905.12171
Adaptive Lighting for Data-Driven Non-Line-of-Sight 3D Localization and Object Identification

2019-05-28

Sreenithy Chandran, Suren Jayasuriya

arXiv_CV

arXiv_CV Object_Detection Face Detection
Abstract

Non-line-of-sight (NLOS) imaging of objects not visible to either the camera or illumination source is a challenging task with vital applications including surveillance and robotics. Recent NLOS reconstruction advances have been achieved using time-resolved measurements, which require expensive and specialized detectors and laser sources. In contrast, we propose a data-driven approach for NLOS 3D localization requiring only a conventional camera and projector. We achieve an average object identification accuracy of 79% for three classes of objects, and localization of the NLOS object’s centroid with a mean-squared error (MSE) of 2.89cm in the occluded region for real data taken from a hardware prototype. To generalize to line-of-sight (LOS) scenes with non-planar surfaces, we introduce an adaptive lighting algorithm. This algorithm, based on radiosity, identifies and illuminates scene patches in the LOS which most contribute to the NLOS light paths, and can factor in system power constraints. We further improve our average NLOS object identification to 87.8% accuracy and localization to 1.94cm MSE on a complex LOS scene using adaptive lighting for real data, demonstrating the advantage of combining the physics of light transport with active illumination for data-driven NLOS imaging.

URL

https://arxiv.org/abs/1905.11595

PDF

https://arxiv.org/pdf/1905.11595
Efficient Wrapper Feature Selection using Autoencoder and Model Based Elimination

2019-05-28

Sharan Ramjee, Aly El Gamal

arXiv_AI

arXiv_AI Salient Classification
Abstract

We propose a computationally efficient wrapper feature selection method - called Autoencoder and Model Based Elimination of features using Relevance and Redundancy scores (AMBER) - that uses a single ranker model along with autoencoders to perform greedy backward elimination of features. The ranker model is used to prioritize the removal of features that are not critical to the classification task, while the autoencoders are used to prioritize the elimination of correlated features. We demonstrate the superior feature selection ability of AMBER on 4 well known datasets corresponding to different domain applications via comparing the classification accuracies with other computationally efficient state-of-the-art feature selection techniques. Interestingly, we find that the ranker model that is used for feature selection does not necessarily have to be the same as the final classifier that is trained on the selected features. Finally, we note how a smaller number of features can lead to higher accuracies on some datasets, and hypothesize that overfitting the ranker model on the training set facilitates the selection of more salient features.
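Greedy backward elimination, the search strategy named in the abstract, can be sketched independently of how features are scored. In the sketch below, `removal_score` is a hypothetical callable standing in for AMBER's combination of ranker-based relevance and autoencoder-based redundancy; neither model is implemented here.

```python
def backward_eliminate(features, n_keep, removal_score):
    """Greedily drop the feature that is safest to remove until n_keep remain.

    removal_score(feature, remaining) -> float, where a higher value means
    the feature is less relevant or more redundant given the remaining set.
    """
    remaining = list(features)
    while len(remaining) > n_keep:
        candidate = max(remaining, key=lambda f: removal_score(f, remaining))
        remaining.remove(candidate)
    return remaining
```

The score is re-evaluated after every removal, so a feature that was redundant only because of a now-deleted partner is no longer penalized.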

URL

http://arxiv.org/abs/1905.11592

PDF

http://arxiv.org/pdf/1905.11592
Local Label Propagation for Large-Scale Semi-Supervised Learning

2019-05-28

Chengxu Zhuang, Xuehao Ding, Divyanshu Murli, Daniel Yamins

arXiv_CV

arXiv_CV Embedding Recognition
Abstract

A significant issue in training deep neural networks to solve supervised learning tasks is the need for large numbers of labelled datapoints. The goal of semi-supervised learning is to leverage ubiquitous unlabelled data, together with small quantities of labelled data, to achieve high task performance. Though substantial recent progress has been made in developing semi-supervised algorithms that are effective for comparatively small datasets, many of these techniques do not scale readily to the large (unlabelled) datasets characteristic of real-world applications. In this paper we introduce a novel approach to scalable semi-supervised learning, called Local Label Propagation (LLP). Extending ideas from recent work on unsupervised embedding learning, LLP first embeds datapoints, labelled and otherwise, in a common latent space using a deep neural network. It then propagates pseudolabels from known to unknown datapoints in a manner that depends on the local geometry of the embedding, taking into account both inter-point distance and local data density as a weighting on propagation likelihood. The parameters of the deep embedding are then trained to simultaneously maximize pseudolabel categorization performance as well as a metric of the clustering of datapoints within each pseudolabel group, iteratively alternating stages of network training and label propagation. We illustrate the utility of the LLP method on the ImageNet dataset, achieving results that outperform previous state-of-the-art scalable semi-supervised learning algorithms by large margins, consistently across a wide variety of training regimes. We also show that the feature representation learned with LLP transfers well to scene recognition in the Places 205 dataset.
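The propagation step, where pseudolabels are weighted by inter-point distance and local density, might look roughly like the sketch below. The exponential distance kernel, the division by density, and the value of `k` are assumptions chosen for illustration; the abstract does not give the exact weighting used by LLP.

```python
import math
from collections import defaultdict

def propagate_label(query, labelled, k=3):
    """Weighted vote among the k nearest labelled points.

    labelled: list of (point, label, density) tuples with density > 0.
    Returns (pseudolabel, confidence), where confidence is the winning
    label's share of the total vote weight.
    """
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    neighbours = sorted(labelled, key=lambda item: distance(query, item[0]))[:k]
    votes = defaultdict(float)
    for point, label, density in neighbours:
        # Closer points vote more; points in dense regions vote less.
        votes[label] += math.exp(-distance(query, point)) / density
    best = max(votes, key=votes.get)
    return best, votes[best] / sum(votes.values())
```

Dividing by density keeps a tightly packed cluster of one class from outvoting a nearby but sparser class purely through numbers.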

URL

https://arxiv.org/abs/1905.11581

PDF

https://arxiv.org/pdf/1905.11581
Improving Action Localization by Progressive Cross-stream Cooperation

2019-05-28

Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu

arXiv_CV

arXiv_CV Object_Detection Segmentation Classification Detection
Abstract

Spatio-temporal action localization consists of three levels of tasks: spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework that uses both region proposals and features from one stream (i.e. Flow/RGB) to help another stream (i.e. RGB/Flow) iteratively improve action localization results and generate better bounding boxes. Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models. Second, we also propose a new message passing approach to pass information from one stream to another stream in order to learn better representations, which also leads to better action detection models. As a result, our iterative framework progressively improves action localization results at the frame level. To improve action localization results at the video level, we additionally propose a new strategy to train class-specific actionness detectors for better temporal segmentation, which can be readily learnt by focusing on “confusing” samples from the same action class. Comprehensive experiments on two benchmark datasets UCF-101-24 and J-HMDB demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios.

URL

https://arxiv.org/abs/1905.11575

PDF

https://arxiv.org/pdf/1905.11575
JGAN: A Joint Formulation of GAN for Synthesizing Images and Labels

2019-05-28

Minje Park

arXiv_CV

arXiv_CV GAN
Abstract

Image generation with an explicit condition or label generally works better than unconditional image generation. In modern GAN frameworks, both generator and discriminator are formulated to model the conditional distribution of images given labels. In this paper, we provide an alternative formulation of GAN which models the joint distribution of images and labels. There are two advantages of this joint formulation over conditional approaches. The first advantage is that the joint formulation is more robust to label noise, and the second is that we can use any kind of weak labels (or additional information which depends on the original image data) to enhance unconditional image generation. We show the effectiveness of the joint formulation on the CIFAR-10, CIFAR-100, and STL datasets.

URL

https://arxiv.org/abs/1905.11574

PDF

https://arxiv.org/pdf/1905.11574
Case-Based Histopathological Malignancy Diagnosis using Convolutional Neural Networks

2019-05-28

Qicheng Lao, Thomas Fevens

arXiv_CV

arXiv_CV CNN Classification
Abstract

In practice, histopathological diagnosis of tumor malignancy often requires a human expert to scan through histopathological images at multiple magnification levels, after which a final diagnosis can be accurately determined. However, previous research on such classification tasks using convolutional neural networks primarily determines a diagnosis for a single magnification level. In this paper, we propose a case-based approach using deep residual neural networks for histopathological malignancy diagnosis, where a case is defined as a sequence of images from the patient at all available levels of magnification. Effectively, through mimicking what a human expert would actually do, our approach makes a diagnosis decision based on features learned in combination at multiple magnification levels. Our results show that the case-based approach achieves better performance than the state-of-the-art methods when evaluated on BreaKHis, a histopathological image dataset for breast tumors.

URL

https://arxiv.org/abs/1905.11567

PDF

https://arxiv.org/pdf/1905.11567
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion

2019-05-28

Andy T. Liu, Po-chun Hsu, Hung-yi Lee

arXiv_CL

arXiv_CL Adversarial
Abstract

We present an unsupervised end-to-end training scheme where we discover discrete subword units from speech without using any labels. The discrete subword units are learned under an ASR-TTS autoencoder reconstruction setting, where an ASR-Encoder is trained to discover a set of common linguistic units given a variety of speakers, and a TTS-Decoder is trained to project the discovered units back to the designated speech. We propose a discrete encoding method, Multilabel-Binary Vectors (MBV), to make the ASR-TTS autoencoder differentiable. We found that the proposed encoding method automatically separates speech content from speaker style, and is sufficient to cover the full linguistic content of a given language. Therefore, the TTS-Decoder can synthesize speech with the same content as the input of the ASR-Encoder but with different speaker characteristics, which achieves voice conversion (VC). We further improve the quality of VC using adversarial training, where we train a TTS-Patcher that augments the output of the TTS-Decoder. Objective and subjective evaluations show that the proposed approach offers strong VC results as it eliminates speaker identity while preserving content within speech. In the ZeroSpeech 2019 Challenge, we achieved outstanding performance in terms of low bitrate.

URL

https://arxiv.org/abs/1905.11563

PDF

https://arxiv.org/pdf/1905.11563
Leap-LSTM: Enhancing Long Short-Term Memory for Text Categorization

2019-05-28

Ting Huang, Gehui Shen, Zhi-Hong Deng

arXiv_CL

arXiv_CL Sentiment Ontology RNN Classification
Abstract

Recurrent Neural Networks (RNNs) are widely used in the field of natural language processing (NLP), ranging from text categorization to question answering and machine translation. However, RNNs generally read the whole text from beginning to end, or sometimes in reverse, which makes them inefficient for processing long texts. When reading a long document for a categorization task, such as topic categorization, large quantities of words are irrelevant and can be skipped. To this end, we propose Leap-LSTM, an LSTM-enhanced model which dynamically leaps between words while reading texts. At each step, we utilize several feature encoders to extract messages from preceding texts, following texts and the current word, and then determine whether to skip the current word. We evaluate Leap-LSTM on several text categorization tasks: sentiment analysis, news categorization, ontology classification and topic classification, with five benchmark data sets. The experimental results show that our model reads faster and predicts better than standard LSTM. Compared to previous models which can also skip words, our model achieves better trade-offs between performance and efficiency.

URL

https://arxiv.org/abs/1905.11558

PDF

https://arxiv.org/pdf/1905.11558
Read All
Target-Guided Open-Domain Conversation

2019-05-28

Jianheng Tang, Tiancheng Zhao, Chengyan Xiong, Xiaodan Liang, Eric P. Xing, Zhiting Hu

arXiv_CL

arXiv_CL Quantitative Recommendation
Abstract

Many real-world open-domain conversation applications have specific goals to achieve during open-ended chats, such as recommendation, psychotherapy, education, etc. We study the problem of imposing conversational goals on open-domain chat agents. In particular, we want a conversational system to chat naturally with humans and proactively guide the conversation to a designated target subject. The problem is challenging as no public data is available for learning such a target-guided strategy. We propose a structured approach that introduces coarse-grained keywords to control the intended content of system responses. We then attain smooth conversation transition through turn-level supervised learning, and drive the conversation towards the target with discourse-level constraints. We further derive a keyword-augmented conversation dataset for the study. Quantitative and human evaluations show our system can produce meaningful and effective conversations, significantly improving over other approaches.

URL

https://arxiv.org/abs/1905.11553

PDF

https://arxiv.org/pdf/1905.11553
Read All
Room-temperature stability of excitons and transverse-electric polarized deep-ultraviolet luminescence in atomically thin GaN quantum wells

2019-05-28

Dylan Bayerl, Emmanouil Kioupakis

arXiv_CV

arXiv_CV GAN Prediction
Abstract

Quantum confinement profoundly affects the properties and interactions of electrons, holes, and excitons in nanomaterials. We apply first-principles calculations to study the effects of extreme quantum confinement on the electronic, excitonic, and radiative properties of atomically thin GaN quantum wells with a thickness of 1 to 4 atomic monolayers embedded in AlN. We determine the quasiparticle band gaps, exciton energies and wave functions, radiative lifetimes, and Mott critical densities as a function of well and barrier thickness. Our results show that quantum confinement in GaN monolayers increases the band gap up to 5.44 eV and the exciton binding energy up to 215 meV, indicating the thermal stability of excitons at room temperature. Exciton radiative lifetimes range from 1 to 3 nanoseconds at room temperature, while the Mott critical density for exciton dissociation is approximately 10$^{13}$ cm$^{-2}$. The luminescence is transverse-electric polarized, which facilitates light extraction from c-plane heterostructures. We also introduce a simple approximate model for calculating the exciton radiative lifetime based on the free-carrier bimolecular radiative recombination coefficient and the exciton radius, which agrees well with our results obtained from the Bethe-Salpeter equation. Our results demonstrate that atomically thin GaN quantum wells exhibit stable excitons at room temperature for potential applications in efficient light emitters in the deep ultraviolet, as well as room-temperature excitonic devices.
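
The stability claim can be sanity-checked with a short calculation: an exciton survives thermal agitation when its binding energy is well above the thermal energy k_B T. The sketch below compares the 215 meV figure from the abstract with k_B T, assuming 300 K for "room temperature" (that temperature is an assumption, not a value from the paper).

```python
# Boltzmann constant in eV/K (CODATA value).
K_B_EV_PER_K = 8.617333262e-5


def thermal_energy_mev(temperature_k: float) -> float:
    """Return the thermal energy k_B * T in meV."""
    return K_B_EV_PER_K * temperature_k * 1000.0


ROOM_TEMPERATURE_K = 300.0  # assumed value for "room temperature"
BINDING_ENERGY_MEV = 215.0  # upper exciton binding energy quoted in the abstract

kt = thermal_energy_mev(ROOM_TEMPERATURE_K)
ratio = BINDING_ENERGY_MEV / kt

print(f"k_B T at {ROOM_TEMPERATURE_K:.0f} K = {kt:.2f} meV")
print(f"binding energy / k_B T = {ratio:.1f}")
```

Under these assumptions the binding energy is roughly eight times the thermal energy, which is consistent with excitons remaining bound at room temperature.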

URL

https://arxiv.org/abs/1905.11551

PDF

https://arxiv.org/pdf/1905.11551
Read All
Label Universal Targeted Attack

2019-05-27

Naveed Akhtar, Mohammad A. A. K. Jalwana, Mohammed Bennamoun, Ajmal Mian

arXiv_CV

arXiv_CV Face Optimization
Abstract

We introduce Label Universal Targeted Attack (LUTA) that makes a deep model predict a label of the attacker’s choice for ‘any’ sample of a given source class with high probability. Our attack stochastically maximizes the log-probability of the target label for the source class with first order gradient optimization, while accounting for the gradient moments. It also suppresses the leakage of attack information to the non-source classes to avoid raising suspicion of an attack. The perturbations resulting from our attack achieve high fooling ratios on the large-scale ImageNet and VGGFace models, and transfer well to the Physical World. Given full control over the perturbation scope in LUTA, we also demonstrate it as a tool for deep model autopsy. The proposed attack reveals interesting perturbation patterns and observations regarding the deep models.

URL

https://arxiv.org/abs/1905.11544

PDF

https://arxiv.org/pdf/1905.11544
Read All
Jointly Learning Structured Analysis Discriminative Dictionary and Analysis Multiclass Classifier

2019-05-27

Zhao Zhang, Weiming Jiang, Jie Qin, Li Zhang, Fanzhang Li, Min Zhang, Shuicheng Yan

arXiv_CV

arXiv_CV Sparse Classification
Abstract

In this paper, we propose a structured Analysis Discriminative Dictionary Learning (ADDL) framework based on an analysis mechanism. ADDL seamlessly integrates analysis discriminative dictionary learning, analysis representation and analysis classifier training into a unified model. The applied analysis mechanism ensures that the learnt dictionaries, representations and linear classifiers over different classes are as independent and discriminating as possible. The dictionary is obtained by minimizing a reconstruction error and an analytical incoherence promoting term that encourages the sub-dictionaries associated with different classes to be independent. To obtain the representation coefficients, ADDL imposes a sparse l2,1-norm constraint on the coding coefficients instead of using the l0 or l1-norm, since the l0 or l1-norm constraint applied in most existing DL criteria makes the training phase time consuming. The codes-extraction projection, which bridges data with the sparse codes by extracting special features from the given samples, is calculated by minimizing a sparse codes approximation term. Then we compute a linear classifier based on the approximated sparse codes by an analysis mechanism to simultaneously consider the classification and representation powers. Thus, the classification approach of our model is very efficient, because it avoids the extra time-consuming sparse reconstruction process with a trained dictionary for each new test sample that most existing DL algorithms require. Simulations on real image databases demonstrate that our ADDL model can obtain superior performance over other state-of-the-art methods.
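
For readers unfamiliar with the l2,1-norm mentioned above, here is a minimal sketch of how it is computed and how it differs from the l1-norm. It assumes the common row-wise convention (sum of the l2 norms of the rows); some papers sum over columns instead, and the toy matrix is purely illustrative.

```python
import math


def l21_norm(matrix):
    """Sum of the Euclidean norms of the rows (row-wise convention, assumed here)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def l1_norm(matrix):
    """Sum of the absolute values of all entries."""
    return sum(abs(v) for row in matrix for v in row)


# Toy coding-coefficient matrix: the second row is entirely zero,
# which is the kind of row-level sparsity an l2,1 penalty encourages.
codes = [
    [3.0, 4.0],
    [0.0, 0.0],
    [0.0, 5.0],
]

print(l21_norm(codes))  # 5 + 0 + 5
print(l1_norm(codes))   # 3 + 4 + 5
```

Because the l2,1-norm is convex and groups entries by row, it tends to zero out whole rows at once rather than individual entries.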

URL

https://arxiv.org/abs/1905.11543

PDF

https://arxiv.org/pdf/1905.11543
Read All
Actor-Attention-Critic for Multi-Agent Reinforcement Learning

2019-05-27

Shariq Iqbal, Fei Sha

arXiv_AI

arXiv_AI Adversarial Attention Reinforcement_Learning
Abstract

Reinforcement learning in multi-agent scenarios is important for real-world applications but presents challenges beyond those seen in single-agent settings. We present an actor-critic algorithm that trains decentralized policies in multi-agent settings, using centrally computed critics that share an attention mechanism which selects relevant information for each agent at every timestep. This attention mechanism enables more effective and scalable learning in complex multi-agent environments, when compared to recent approaches. Our approach is applicable not only to cooperative settings with shared rewards, but also individualized reward settings, including adversarial settings, as well as settings that do not provide global states, and it makes no assumptions about the action spaces of the agents. As such, it is flexible enough to be applied to most multi-agent learning problems.
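
To make the idea of "selecting relevant information for each agent" concrete, the sketch below shows generic scaled dot-product attention over the encodings of other agents. It is an illustration only: the paper's critic uses learned projections and its own architecture, and the function name `attend` and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weight each value by the scaled dot product of its key with the query."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context


# One agent's query against three other agents (toy numbers).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]

weights, context = attend(query, keys, values)
print(weights)
print(context)
```

The agent whose key aligns best with the query receives the largest weight, so its information dominates the context vector passed to that agent's critic.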

URL

http://arxiv.org/abs/1810.02912

PDF

http://arxiv.org/pdf/1810.02912
Read All
Semantic Fisher Scores for Task Transfer: Using Objects to Classify Scenes

2019-05-27

Mandar Dixit, Yunsheng Li, Nuno Vasconcelos

arXiv_CV

arXiv_CV Classification
Abstract

The transfer of a convolutional neural network (CNN) trained to recognize objects to the task of scene classification is considered. A Bag-of-Semantics (BoS) representation is first induced by feeding scene image patches to the object CNN and representing the scene image by the ensuing bag of posterior class probability vectors (semantic posteriors). The encoding of the BoS with a Fisher vector (FV) is then studied. A link is established between the FV of any probabilistic model and the Q-function of the expectation-maximization (EM) algorithm used to estimate its parameters by maximum likelihood. A network implementation of the MFA Fisher Score (MFA-FS), denoted as the MFAFSNet, is finally proposed to enable end-to-end training. Experiments with various object CNNs and datasets show that the approach has state-of-the-art transfer performance. Somewhat surprisingly, the scene classification results are superior to those of a CNN explicitly trained for scene classification, using a large scene dataset (Places). This suggests that holistic analysis is insufficient for scene classification. The modeling of local object semantics appears to be at least equally important. The two approaches are also shown to be strongly complementary, leading to very large scene classification gains when combined, and outperforming all previous scene classification approaches by a sizeable margin.

URL

https://arxiv.org/abs/1905.11539

PDF

https://arxiv.org/pdf/1905.11539
Read All
A Simple Saliency Method That Passes the Sanity Checks

2019-05-27

Arushi Gupta, Sanjeev Arora

arXiv_AI

arXiv_AI Salient Classification
Abstract

There is great interest in saliency methods (also called attribution methods), which give “explanations” for a deep net’s decision, by assigning a score to each feature/pixel in the input. Their design usually involves credit-assignment via the gradient of the output with respect to input. Recently Adebayo et al. [arXiv:1810.03292] questioned the validity of many of these methods since they do not pass simple sanity checks which test whether the scores shift/vanish when layers of the trained net are randomized, or when the net is retrained using random labels for inputs. We propose a simple fix to existing saliency methods that helps them pass sanity checks, which we call competition for pixels. This involves computing saliency maps for all possible labels in the classification task, and using a simple competition among them to identify and remove less relevant pixels from the map. The simplest variant of this is Competitive Gradient $\odot$ Input (CGI): it is efficient, requires no additional training, and uses only the input and gradient. Some theoretical justification is provided for it (especially for ReLU networks) and its performance is empirically demonstrated.
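
The following sketch illustrates one plausible reading of "competition for pixels" on a toy linear classifier, where the gradient of each logit with respect to the input is simply the corresponding weight row. The competition rule used here (keep a positive score only if it is the largest across labels, keep a negative score only if it is the smallest) is an assumption, since the abstract does not spell out the rule; the weights and input are made up.

```python
def gradient_times_input(weights, x):
    """Gradient (x) Input maps for a linear model: one map per label."""
    return [[w_i * x_i for w_i, x_i in zip(row, x)] for row in weights]


def competitive_map(maps, label):
    """Zero out pixels of the chosen label's map that lose the competition."""
    result = []
    for i, score in enumerate(maps[label]):
        column = [m[i] for m in maps]  # scores of pixel i under every label
        wins = (score > 0 and score == max(column)) or (
            score < 0 and score == min(column)
        )
        result.append(score if wins else 0.0)
    return result


# Two labels, three input "pixels" (toy numbers).
weights = [
    [1.0, 2.0, -1.0],
    [3.0, 0.5, -2.0],
]
x = [1.0, 1.0, 1.0]

maps = gradient_times_input(weights, x)
cgi_label0 = competitive_map(maps, 0)
cgi_label1 = competitive_map(maps, 1)
print(cgi_label0)
print(cgi_label1)
```

For label 0 only the middle pixel survives, because the other two pixels are explained more strongly by label 1. The computation needs only the input and the gradients, in line with the abstract's claim that no additional training is required.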

URL

http://arxiv.org/abs/1905.12152

PDF

http://arxiv.org/pdf/1905.12152
Read All
CGaP: Continuous Growth and Pruning for Efficient Deep Learning

2019-05-27

Xiaocong Du, Zheng Li, Yu Cao

arXiv_CV

arXiv_CV Inference Deep_Learning
Abstract

Today a canonical approach to reduce the computation cost of Deep Neural Networks (DNNs) is to pre-define an over-parameterized model before training to guarantee the learning capacity, and then prune unimportant learning units (filters and neurons) during training to improve model compactness. We argue it is unnecessary to introduce redundancy at the beginning of the training but then reduce redundancy for the ultimate inference model. In this paper, we propose a Continuous Growth and Pruning (CGaP) scheme to minimize the redundancy from the beginning. CGaP starts the training from a small network seed, then expands the model continuously by reinforcing important learning units, and finally prunes the network to obtain a compact and accurate model. As the growth phase favors important learning units, CGaP provides a clear learning purpose to the pruning phase. Experimental results on representative datasets and DNN architectures demonstrate that CGaP outperforms previous pruning-only approaches that deal with pre-defined structures. For VGG-19 on the CIFAR-100 and SVHN datasets, CGaP reduces the number of parameters by 78.9% and 85.8%, and FLOPs by 53.2% and 74.2%, respectively; for ResNet-110 on CIFAR-10, CGaP reduces the number of parameters by 64.0% and FLOPs by 63.3%.

URL

https://arxiv.org/abs/1905.11533

PDF

https://arxiv.org/pdf/1905.11533
Read All
Compositional pre-training for neural semantic parsing

2019-05-27

Amir Ziai

arXiv_CL

arXiv_CL Knowledge Inference
Abstract

Semantic parsing is the process of translating natural language utterances into logical forms, which has many important applications such as question answering and instruction following. Sequence-to-sequence models have been very successful across many NLP tasks. However, a lack of task-specific prior knowledge can be detrimental to the performance of these models. Prior work has used frameworks for inducing grammars over the training examples, which capture conditional independence properties that the model can leverage. Inspired by recent success stories such as BERT, we set out to extend this augmentation framework into two stages. The first stage is to pre-train using a corpus of augmented examples in an unsupervised manner. The second stage is to fine-tune to a domain-specific task. In addition, since the pre-training stage is separate from the training on the main task, we also expand the universe of possible augmentations without causing catastrophic inference. We also propose a novel data augmentation strategy that interchanges tokens that co-occur in similar contexts to produce new training pairs. We demonstrate that the proposed two-stage framework improves parsing accuracy on a standard dataset called GeoQuery for the task of generating logical forms from a set of questions about US geography.

URL

https://arxiv.org/abs/1905.11531

PDF

https://arxiv.org/pdf/1905.11531
Read All
Analyzing the Interpretability Robustness of Self-Explaining Models

2019-05-27

Haizhong Zheng, Earlence Fernandes, Atul Prakash

arXiv_AI

arXiv_AI Adversarial
Abstract

Recently, interpretable models called self-explaining models (SEMs) have been proposed with the goal of providing interpretability robustness. We evaluate the interpretability robustness of SEMs and show that explanations provided by SEMs as currently proposed are not robust to adversarial inputs. Specifically, we successfully created adversarial inputs that do not change the model outputs but cause significant changes in the explanations. We find that even though current SEMs use stable coefficients for mapping explanations to output labels, they do not consider the robustness of the first stage of the model that creates interpretable basis concepts from the input, leading to non-robust explanations. Our work makes a case for future work to start examining how to generate interpretable basis concepts in a robust way.

URL

http://arxiv.org/abs/1905.12429

PDF

http://arxiv.org/pdf/1905.12429
Read All
Adaptive Masked Proxies for Few-Shot Segmentation

2019-05-27

Mennatullah Siam, Boris Oreshkin, Martin Jagersand

arXiv_CV

arXiv_CV Segmentation Embedding Semantic_Segmentation Deep_Learning
Abstract

Deep learning has thrived by training on large-scale datasets. However, for continual learning in applications such as robotics, it is critical to incrementally update the model in a sample-efficient manner. We propose a novel method that constructs the new class weights from a few labelled samples in the support set without back-propagation, relying on our adaptive masked proxies approach. It utilizes multi-resolution average pooling on the output embeddings masked with the label to act as a positive proxy for the new class, while fusing it with the previously learned class signatures. Our proposed method is evaluated on the PASCAL-$5^i$ dataset and outperforms the state of the art in 5-shot semantic segmentation. Unlike previous methods, our proposed approach does not require a second branch to estimate parameters or prototypes, which enables it to be used with 2-stream motion and appearance based segmentation networks. The proposed adaptive proxies allow the method to be used with a continuous data stream. Our online adaptation scheme is evaluated on the DAVIS and FBMS video object segmentation benchmarks. We further propose a novel setup for evaluating continual learning of object segmentation, which we name incremental PASCAL (iPASCAL), where our method is shown to outperform the baseline method. Code is publicly available at https://github.com/MSiam/AdaptiveMaskedProxies.
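
A minimal sketch of the masked average pooling step described above, at a single resolution: embeddings at pixels labelled as the new class are averaged into one proxy vector, which is then blended with a previously stored signature. The blending rule with weight `alpha` and all names here are assumptions for illustration; the linked repository contains the authors' actual implementation.

```python
def masked_average_pool(embeddings, mask):
    """Average the C-dim embeddings of an H x W map over pixels where mask == 1."""
    channels = len(embeddings[0][0])
    sums = [0.0] * channels
    count = 0
    for emb_row, mask_row in zip(embeddings, mask):
        for vector, keep in zip(emb_row, mask_row):
            if keep:
                count += 1
                for c in range(channels):
                    sums[c] += vector[c]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [s / count for s in sums]


def fuse(old_signature, new_proxy, alpha):
    """Blend a stored class signature with a new proxy (assumed update rule)."""
    return [(1.0 - alpha) * o + alpha * n for o, n in zip(old_signature, new_proxy)]


# 2 x 2 embedding map with 2 channels; the label mask covers the top row only.
embeddings = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 5.0], [5.0, 2.0]],
]
mask = [
    [1, 1],
    [0, 0],
]

proxy = masked_average_pool(embeddings, mask)
signature = fuse([0.0, 0.0], proxy, alpha=0.5)
print(proxy)
print(signature)
```

No gradient step is involved: the class weights come directly from pooled features, which is what makes this kind of update cheap enough for a continuous data stream.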

URL

http://arxiv.org/abs/1902.11123

PDF

http://arxiv.org/pdf/1902.11123
Read All
Improved Training Speed, Accuracy, and Data Utilization Through Loss Function Optimization

2019-05-27

Santiago Gonzalez, Risto Miikkulainen

arXiv_CV

arXiv_CV Image_Classification Optimization Classification
Abstract

As the complexity of neural network models has grown, it has become increasingly important to optimize their design automatically through metalearning. Methods for discovering hyperparameters, topologies, and learning rate schedules have led to significant increases in performance. This paper shows that loss functions can be optimized with metalearning as well, and result in similar improvements. The method, Genetic Loss-function Optimization (GLO), discovers loss functions de novo, and optimizes them for a target task. Leveraging techniques from genetic programming, GLO builds loss functions hierarchically from a set of operators and leaf nodes. These functions are repeatedly recombined and mutated to find an optimal structure, and then a covariance-matrix adaptation evolutionary strategy (CMA-ES) is used to find optimal coefficients. Networks trained with GLO loss functions are found to outperform the standard cross-entropy loss on standard image classification tasks. Training with these new loss functions requires fewer steps, results in lower test error, and allows for smaller datasets to be used. Loss-function optimization thus provides a new dimension of metalearning, and constitutes an important step towards AutoML.

URL

https://arxiv.org/abs/1905.11528

PDF

https://arxiv.org/pdf/1905.11528
Read All
Tight Regret Bounds for Model-Based Reinforcement Learning with Greedy Policies

2019-05-27

Yonathan Efroni, Nadav Merlis, Mohammad Ghavamzadeh, Shie Mannor

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

State-of-the-art efficient model-based Reinforcement Learning (RL) algorithms typically act by iteratively solving empirical models, i.e., by performing full-planning on Markov Decision Processes (MDPs) built from the gathered experience. In this paper, we focus on model-based RL in the finite-state finite-horizon MDP setting and establish that exploring with greedy policies, which act by 1-step planning, can achieve tight minimax performance in terms of regret, $\tilde{\mathcal{O}}(\sqrt{HSAT})$. Thus, full-planning in model-based RL can be avoided altogether without any performance degradation, and, by doing so, the computational complexity decreases by a factor of $S$. The results are based on a novel analysis of real-time dynamic programming, then extended to model-based RL. Specifically, we generalize existing algorithms that perform full-planning to ones that act by 1-step planning. For these generalizations, we prove regret bounds with the same rate as their full-planning counterparts.
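
The difference between full-planning and 1-step planning can be sketched as follows: instead of backing up values for every state, the agent performs a single Bellman backup at the state it currently occupies, which costs on the order of S·A operations rather than S²·A. This is a simplified illustration with a known toy model and without the exploration bonuses or horizon indexing a real algorithm would need; the function name and numbers are hypothetical.

```python
def one_step_greedy(state, values, rewards, transitions):
    """Single Bellman backup at `state`; returns (greedy action, backed-up value).

    values[s]            : current value estimate of successor state s
    rewards[s][a]        : immediate reward for action a in state s
    transitions[s][a][t] : probability of moving from s to t under action a
    """
    best_action, best_q = None, float("-inf")
    for action, reward in enumerate(rewards[state]):
        q = reward + sum(
            p * values[nxt] for nxt, p in enumerate(transitions[state][action])
        )
        if q > best_q:
            best_action, best_q = action, q
    return best_action, best_q


# Toy model with 2 states and 2 actions.
values = [0.0, 1.0]
rewards = [[0.5, 0.0], [0.0, 0.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: action 0 stays, action 1 moves to 1
    [[0.0, 1.0], [0.0, 1.0]],  # from state 1: both actions stay in state 1
]

action, value = one_step_greedy(0, values, rewards, transitions)
values[0] = value  # only the visited state's estimate is updated
print(action, value)
```

Only the visited state is touched per step, which is where the factor-of-S saving described in the abstract comes from.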

URL

http://arxiv.org/abs/1905.11527

PDF

http://arxiv.org/pdf/1905.11527
Read All
Enhancing Salient Object Segmentation Through Attention

2019-05-27

Anuj Pahuja, Avishek Majumder, Anirban Chakraborty, R. Venkatesh Babu

arXiv_CV

arXiv_CV Salient Segmentation Attention CNN
Abstract

Segmenting salient objects in an image is an important vision task with ubiquitous applications. The problem becomes more challenging in the presence of a cluttered and textured background, low resolution and/or low contrast images. Even though existing algorithms perform well in segmenting most of the object(s) of interest, they often end up segmenting false positives due to resembling salient objects in the background. In this work, we tackle this problem by iteratively attending to image patches in a recurrent fashion and subsequently enhancing the predicted segmentation mask. Saliency features are estimated independently for every image patch, which are further combined using an aggregation strategy based on a Convolutional Gated Recurrent Unit (ConvGRU) network. The proposed approach works in an end-to-end manner, removing background noise and false positives incrementally. Through extensive evaluation on various benchmark datasets, we show superior performance to the existing approaches without any post-processing.

URL

https://arxiv.org/abs/1905.11522

PDF

https://arxiv.org/pdf/1905.11522
Read All
Budgeted Reinforcement Learning in Continuous State Space

2019-05-27

Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.

URL

http://arxiv.org/abs/1903.01004

PDF

http://arxiv.org/pdf/1903.01004
Read All
Universality Theorems for Generative Models

2019-05-27

Valentin Khrulkov, Ivan Oseledets

arXiv_AI

arXiv_AI
Abstract

Despite the fact that generative models are extremely successful in practice, the theory underlying this phenomenon is only starting to catch up with practice. In this work we address the question of the universality of generative models: is it true that neural networks can approximate any data manifold arbitrarily well? We provide a positive answer to this question and show that under mild assumptions on the activation function one can always find a feedforward neural network that maps the latent space onto a set located within the specified Hausdorff distance from the desired data manifold. We also prove similar theorems for the case of multiclass generative models and cycle generative models, trained to map samples from one manifold to another and vice versa.
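
The Hausdorff distance used in the statement above measures how far two sets are from each other in the worst case. A minimal sketch for finite point samples follows; real manifolds are continuous, so sampling them into point lists is an approximation made here for illustration.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Points sampled from a target set, and points produced by a generator (toy data).
target = [(0.0, 0.0), (1.0, 0.0)]
generated = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]

print(directed_hausdorff(target, generated))  # every target point is matched exactly
print(hausdorff(target, generated))           # the stray point (1, 2) is 2 away
```

The asymmetry matters: a generator can cover the target perfectly and still be far in Hausdorff distance if it also produces points away from the target.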

URL

http://arxiv.org/abs/1905.11520

PDF

http://arxiv.org/pdf/1905.11520
Read All
A Knowledge Graph-based Approach for Exploring the U.S. Opioid Epidemic

2019-05-27

Maulik R. Kamdar, Tymor Hamamsy, Shea Shelton, Ayin Vala, Tome Eftimov, James Zou, Suzanne Tamang

arXiv_AI

arXiv_AI Knowledge_Graph Knowledge
Abstract

The United States is in the midst of an opioid epidemic, with recent estimates indicating that more than 130 people die every day due to drug overdose. Over-prescription of and addiction to opioid painkillers, heroin, and synthetic opioids have led to a public health crisis and created a huge social and economic burden. Statistical learning methods that use data from multiple clinical centers across the US to detect opioid over-prescribing trends and predict possible opioid misuse are required. However, the semantic heterogeneity in the representation of clinical data across different centers makes the development and evaluation of such methods difficult and non-trivial. We create the Opioid Drug Knowledge Graph (ODKG) – a network of opioid-related drugs, active ingredients, formulations, combinations, and brand names. We use the ODKG to normalize drug strings in a clinical data warehouse consisting of patient data from over 400 healthcare facilities in 42 different states. We showcase the use of ODKG to generate summary statistics of opioid prescription trends across US regions. These methods and resources can aid the development of advanced and scalable models to monitor the opioid epidemic and to detect illicit opioid misuse behavior. Our work is relevant to policymakers and pain researchers who wish to systematically assess factors that contribute to opioid over-prescribing and iatrogenic opioid addiction in the US.

URL

http://arxiv.org/abs/1905.11513

PDF

http://arxiv.org/pdf/1905.11513
Read All
Shape Evasion: Preventing Body Shape Inference of Multi-Stage Approaches

2019-05-27

Hosnieh Sattar, Katharina Krombholz, Gerard Pons-Moll, Mario Fritz

arXiv_CV

arXiv_CV Adversarial Inference Deep_Learning Recommendation
Abstract

Modern approaches to pose and body shape estimation have recently achieved strong performance even under challenging real-world conditions. Even from a single image of a clothed person, a realistic looking body shape can be inferred that captures a user’s weight group and body shape type well. This opens up a whole spectrum of applications – in particular in fashion – where virtual try-on and recommendation systems can make use of these new and automated cues. However, a realistic depiction of the undressed body is regarded as highly private, and most people might not consent to it. Hence, we ask if the automatic extraction of such information can be effectively evaded. While adversarial perturbations have been shown to be effective for manipulating the output of machine learning models – in particular, end-to-end deep learning approaches – state-of-the-art shape estimation methods are composed of multiple stages. We perform the first investigation of different strategies that can be used to effectively manipulate the automatic shape estimation while preserving the overall appearance of the original image.

URL

https://arxiv.org/abs/1905.11503

PDF

https://arxiv.org/pdf/1905.11503
Read All
Policy Certificates: Towards Accountable Reinforcement Learning

2019-05-27

Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms is the first to achieve minimax-optimal PAC bounds up to lower-order terms, and this algorithm also matches (and in some settings slightly improves upon) existing minimax regret bounds.

URL

http://arxiv.org/abs/1811.03056

PDF

http://arxiv.org/pdf/1811.03056
Read All
FAN: Focused Attention Networks

2019-05-27

Chu Wang, Babak Samari, Vladimir Kim, Siddhartha Chaudhuri, Kaleem Siddiqi

arXiv_CV

arXiv_CV Object_Detection Attention Embedding Classification Detection Relation
Abstract

Attention networks show promise for both vision and language tasks, by emphasizing relationships between constituent elements through appropriate weighting functions. Such elements could be regions in an image output by a region proposal network, or words in a sentence, represented by word embedding. Thus far, however, the learning of attention weights has been driven solely by the minimization of task specific loss functions. We here introduce a method of learning attention weights to better emphasize informative pair-wise relations between entities. The key idea is to use a novel center-mass cross entropy loss, which can be applied in conjunction with the task specific ones. We then introduce a focused attention backbone to learn these attention weights for general tasks. We demonstrate that the focused attention module leads to a new state-of-the-art for the recovery of relations in a relationship proposal task. Our experiments show that it also boosts performance for diverse vision and language tasks, including object detection, scene categorization and document classification.

URL

https://arxiv.org/abs/1905.11498

PDF

https://arxiv.org/pdf/1905.11498
Read All
Near-optimal Optimistic Reinforcement Learning using Empirical Bernstein Inequalities

2019-05-27

Aristide Tossou, Debabrota Basu, Christos Dimitrakakis

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages a variance-based confidence interval. We show that the proposed algorithm, UCRL-V, achieves the optimal regret $\tilde{\mathcal{O}}(\sqrt{DSAT})$ up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show UCRL-V to outperform the state-of-the-art algorithms.
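
To show why a variance-based confidence interval helps, the sketch below compares a commonly cited form of the empirical Bernstein radius for samples in [0, 1] with the Hoeffding radius. The constants are an assumption taken from that standard form and are not the ones derived in the paper; UCRL-V uses its own construction.

```python
import math


def empirical_bernstein_radius(samples, delta):
    """Confidence radius that shrinks with the sample variance (samples in [0, 1])."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * variance * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def hoeffding_radius(n, delta):
    """Variance-agnostic confidence radius for samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


delta = 0.05
low_variance = [0.5] * 1000                # every observation identical
high_variance = [0.0] * 500 + [1.0] * 500  # maximally spread observations

r_low = empirical_bernstein_radius(low_variance, delta)
r_high = empirical_bernstein_radius(high_variance, delta)
r_hoeffding = hoeffding_radius(1000, delta)
print(r_low, r_high, r_hoeffding)
```

When the observed variance is small, the Bernstein radius is several times tighter than Hoeffding's, which lets an optimistic algorithm stop exploring well-understood transitions sooner.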

URL

http://arxiv.org/abs/1905.12425

PDF

http://arxiv.org/pdf/1905.12425
Read All
AI Feynman: a Physics-Inspired Method for Symbolic Regression

2019-05-27

Silviu-Marian Udrescu (MIT), Max Tegmark (MIT)

arXiv_AI

arXiv_AI
Abstract

A core challenge for both physics and artificial intelligence (AI) is symbolic regression: finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in principle, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we develop a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques. We apply it to 100 equations from the Feynman Lectures on Physics, and it discovers all of them, while previous publicly available software cracks only 71; for a more difficult test set, we improve the state-of-the-art success rate from 15% to 90%.
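
As an illustration of how a simplifying property such as separability can be detected from function evaluations alone, the sketch below tests additive separability: if f(x, y) = g(x) + h(y), then f(x1, y1) + f(x2, y2) − f(x1, y2) − f(x2, y1) is zero for any choice of points. This is a generic check written for this post, not the algorithm from the paper, and the example functions are made up.

```python
def additive_separability_residual(f, x1, x2, y1, y2):
    """Zero (up to rounding) whenever f(x, y) = g(x) + h(y)."""
    return f(x1, y1) + f(x2, y2) - f(x1, y2) - f(x2, y1)


def separable(x, y):
    return x ** 2 + 3.0 * y  # g(x) = x^2, h(y) = 3y


def entangled(x, y):
    return x * y  # cannot be written as g(x) + h(y)


res_separable = additive_separability_residual(separable, 1.0, 2.0, 3.0, 5.0)
res_entangled = additive_separability_residual(entangled, 1.0, 2.0, 3.0, 5.0)
print(res_separable)
print(res_entangled)
```

Once a function passes such a test, the regression problem splits into two smaller problems in fewer variables, which is the kind of recursive simplification the abstract alludes to.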

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.11481

PDF

http://arxiv.org/pdf/1905.11481
Read All
Infusing domain knowledge in AI-based 'black box' models for better explainability with application in bankruptcy prediction

2019-05-27

Sheikh Rabiul Islam, William Eberle, Sid Bundy, Sheikh Khaled Ghafoor

arXiv_AI

arXiv_AI Knowledge Prediction
Abstract

Although “black box” models such as Artificial Neural Networks, Support Vector Machines, and Ensemble Approaches continue to show superior performance in many disciplines, their adoption in sensitive disciplines (e.g., finance, healthcare) is questionable due to the lack of interpretability and explainability of the model. In fact, future adoption of “black box” models is difficult because of the recent rule of “right of explanation” by the European Union where a user can ask for an explanation behind an algorithmic decision, and the newly proposed bill by the US government, the “Algorithmic Accountability Act”, which would require companies to assess their machine learning systems for bias and discrimination and take corrective measures. Top bankruptcy prediction models are AI-based and need better explainability, that is, the extent to which the internal working mechanisms of an AI system can be explained in human terms. Although explainable artificial intelligence is an emerging field of research, infusing domain knowledge for better explainability might be a possible solution. In this work, we demonstrate a way to collect and infuse domain knowledge into a “black box” model for bankruptcy prediction. Our experiments indicate that infused domain knowledge makes the output from the black box model more interpretable and explainable.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1905.11474

PDF

http://arxiv.org/pdf/1905.11474
Read All
End-to-End Pore Extraction and Matching in Latent Fingerprints: Going Beyond Minutiae

2019-05-27

Dinh-Luan Nguyen, Anil K. Jain

arXiv_CV

arXiv_CV Attention Recognition
Abstract

Latent fingerprint recognition is not a new topic, but it has attracted a lot of attention from researchers in both academia and industry over the past 50 years. With the rapid development of pattern recognition techniques, automated fingerprint identification systems (AFIS) have become more and more ubiquitous. However, most AFIS are utilized for live-scan or rolled/slap prints while only a few systems can work on latent fingerprints with reasonable accuracy. Whether taking higher resolution scans of latent fingerprints and their rolled/slap mate prints could help improve the identification accuracy remains an open question in the forensic community. Because pores are one of the most reliable features besides minutiae to identify latent fingerprints, we propose an end-to-end automatic pore extraction and matching system to analyze the utility of pores in latent fingerprint identification. Hence, this paper answers two questions in the latent fingerprint domain: (i) does the incorporation of pores as level-3 features improve the system performance significantly? and (ii) does the 1,000 ppi image resolution improve the recognition results? We believe that our proposed end-to-end pore extraction and matching system will be a concrete baseline for future latent AFIS development.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1905.11472

PDF

https://arxiv.org/pdf/1905.11472
Read All
XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

2019-05-27

Jasdeep Singh, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

arXiv_CL

arXiv_CL Transfer_Learning Inference
Abstract

While natural language processing systems often focus on a single language, multilingual transfer learning has the potential to improve performance, especially for low-resource languages. We introduce XLDA, cross-lingual data augmentation, a method that replaces a segment of the input text with its translation in another language. XLDA improves performance on all 14 tested languages of the cross-lingual natural language inference (XNLI) benchmark. With improvements of up to $4.8\%$, training with XLDA achieves state-of-the-art performance for Greek, Turkish, and Urdu. XLDA is in contrast to, and performs markedly better than, a more naive approach that aggregates examples in various languages in a way that each example is solely in one language. On the SQuAD question answering task, we see that XLDA provides a $1.0\%$ performance increase on the English evaluation set. Comprehensive experiments suggest that most languages are effective as cross-lingual augmentors, that XLDA is robust to a wide range of translation quality, and that XLDA is even more effective for randomly initialized models than for pretrained models.
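
The core idea, replacing one segment of the input with its translation, can be sketched as follows. This is only an illustration under stated assumptions: the example layout (`premise`, `hypothesis`, `label`), the `translations` lookup table, and the function name `xlda_augment` are all hypothetical, and the translations are assumed to have been produced ahead of time by some machine translation system.

```python
import random


def xlda_augment(example, translations, rng):
    """Return a copy of an NLI example with one segment translated.

    example: dict with 'premise', 'hypothesis' and 'label' keys (assumed layout).
    translations: dict mapping (segment_name, language) to translated text.
    rng: a random.Random instance, passed in for reproducibility.
    """
    # Pick which segment to swap; the other one stays in the source language.
    segment = rng.choice(["premise", "hypothesis"])
    languages = sorted(lang for (seg, lang) in translations if seg == segment)
    if not languages:
        # No translation available for this segment: leave the example as is.
        return dict(example)
    language = rng.choice(languages)
    augmented = dict(example)
    augmented[segment] = translations[(segment, language)]
    return augmented
```

The result is a mixed-language example with the original label, in contrast to the naive aggregation described above, where every example is entirely in a single language.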

Abstract (translated by Google)

URL

https://arxiv.org/abs/1905.11471

PDF

https://arxiv.org/pdf/1905.11471
Read All
Scaleable input gradient regularization for adversarial robustness

2019-05-27

Chris Finlay, Adam M Oberman

arXiv_CV

arXiv_CV Regularization Adversarial
Abstract

Input gradient regularization is not thought to be an effective means for promoting adversarial robustness. In this work we revisit this regularization scheme with some new ingredients. First, we derive new per-image theoretical robustness bounds based on local gradient information, and curvature information when available. These bounds strongly motivate input gradient regularization. Second, we implement a scaleable version of input gradient regularization which avoids double backpropagation: adversarially robust ImageNet models are trained in 33 hours on four consumer grade GPUs. Finally, we show experimentally that input gradient regularization is competitive with adversarial training.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1905.11468

PDF

https://arxiv.org/pdf/1905.11468
Read All
Bio-inspired digit recognition using reward-modulated spike-timing-dependent plasticity in deep convolutional networks

2019-05-27

Milad Mozafari, Mohammad Ganjtabesh, Abbas Nowzari-Dalini, Simon J. Thorpe, Timothée Masquelier

arXiv_CV

arXiv_CV CNN Recognition
Abstract

The primate visual system has inspired the development of deep artificial neural networks, which have revolutionized the computer vision domain. Yet these networks are much less energy-efficient than their biological counterparts, and they are typically trained with backpropagation, which is extremely data-hungry. To address these limitations, we used a deep convolutional spiking neural network (DCSNN) and a latency-coding scheme. We trained it using a combination of spike-timing-dependent plasticity (STDP) for the lower layers and reward-modulated STDP (R-STDP) for the higher ones. In short, with R-STDP a correct (resp. incorrect) decision leads to STDP (resp. anti-STDP). This approach led to an accuracy of $97.2\%$ on MNIST, without requiring an external classifier. In addition, we demonstrated that R-STDP extracts features that are diagnostic for the task at hand, and discards the other ones, whereas STDP extracts any feature that repeats. Finally, our approach is biologically plausible, hardware friendly, and energy-efficient.
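
The rule "a correct decision leads to STDP, an incorrect one to anti-STDP" can be sketched for a single synapse as below. This is a simplified illustration, not the authors' implementation: the learning rates `a_plus` and `a_minus`, the `w * (1 - w)` soft bound, and the function name `r_stdp_update` are assumptions made for this example.

```python
def r_stdp_update(weight, pre_before_post, rewarded, a_plus=0.004, a_minus=0.003):
    """Update one synaptic weight in [0, 1] with a reward-modulated STDP rule.

    pre_before_post: True if the presynaptic spike preceded the postsynaptic one.
    rewarded: True if the network's decision was correct.
    """
    # Plain STDP: potentiate when pre fires before post, otherwise depress.
    stdp_change = a_plus if pre_before_post else -a_minus
    # Reward keeps the STDP direction; punishment flips it (anti-STDP).
    sign = 1.0 if rewarded else -1.0
    # The w * (1 - w) factor slows updates near the bounds (assumed soft bound).
    new_weight = weight + sign * stdp_change * weight * (1.0 - weight)
    # Clamp for safety against numerical drift.
    return min(1.0, max(0.0, new_weight))
```

With this rule, a synapse that contributed to a correct decision is strengthened, while the same spike timing followed by an incorrect decision weakens it.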

Abstract (translated by Google)

URL

http://arxiv.org/abs/1804.00227

PDF

http://arxiv.org/pdf/1804.00227
Read All
