Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases. Models trained on such data fail to distinguish pairs like "flights from New York to Florida" and "flights from Florida to New York". This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. Challenging pairs are generated by controlled word swapping and back translation, followed by fluency and paraphrase judgments by human raters. State-of-the-art models trained on existing datasets have dismal performance on PAWS (<40% accuracy); however, including PAWS training data for these models improves their accuracy to 85% while maintaining performance on existing tasks. In contrast, models that do not capture non-local contextual information fail even with PAWS training examples. As such, PAWS provides an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons.
http://arxiv.org/abs/1904.01130
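As a toy illustration of the word-swapping idea (the actual PAWS pipeline uses language-model-guided word scrambling plus back translation, with human raters filtering for fluency and paraphrase status; the helper below is a simplified assumption):

    def swap_spans(sentence, a, b):
        # Swap two phrases to turn a sentence into a high-lexical-overlap
        # non-paraphrase, e.g. origin and destination in a flight query.
        placeholder = "\x00"
        return sentence.replace(a, placeholder).replace(b, a).replace(placeholder, b)

    print(swap_spans("flights from New York to Florida", "New York", "Florida"))
    # -> flights from Florida to New York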
Malicious scripts are an important computer infection threat vector in the wild. For web-scale processing, static analysis offers substantial computing efficiencies. We propose the ScriptNet system for neural malicious JavaScript detection which is based on static analysis. We use the Convoluted Partitioning of Long Sequences (CPoLS) model, which processes JavaScript files as byte sequences. Lower layers capture the sequential nature of these byte sequences while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our model variants are trained in an end-to-end fashion allowing discriminative training even for the sequential processing layers. Evaluating this model on a large corpus of 212,408 JavaScript files indicates that the best performing CPoLS model offers a 97.20% true positive rate (TPR) for the first 60K byte subsequence at a false positive rate (FPR) of 0.50%. The best performing CPoLS model significantly outperforms several baseline models.
http://arxiv.org/abs/1904.01126
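A minimal sketch of the general shape of such a byte-sequence model in PyTorch; the layer sizes, the convolutional partitioning step, and the pooling choice are assumptions rather than the paper's exact CPoLS configuration:

    import torch
    import torch.nn as nn

    class ByteSequenceClassifier(nn.Module):
        def __init__(self, emb_dim=64, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(256, emb_dim)        # one embedding per byte value
            self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=16, stride=8)
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
            self.fc = nn.Linear(hidden, 1)                 # malicious-vs-benign logit

        def forward(self, x):                              # x: (batch, seq_len) byte ids
            h = self.embed(x).transpose(1, 2)              # (batch, emb, seq)
            h = torch.relu(self.conv(h)).transpose(1, 2)   # shorter sequence of chunk features
            out, _ = self.lstm(h)                          # sequential layer over chunks
            pooled, _ = out.max(dim=1)                     # max-pool over time
            return self.fc(pooled).squeeze(-1)

    logits = ByteSequenceClassifier()(torch.randint(0, 256, (4, 4096)))

Because the whole stack is differentiable, the sequential layers receive gradients from the classification loss, which is the end-to-end training property the abstract emphasizes.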
Generative models often use human evaluations to determine and justify progress. Unfortunately, existing human evaluation methods are ad hoc: there is currently no standardized, validated evaluation that: (1) measures perceptual fidelity, (2) is reliable, (3) separates models into clear rank order, and (4) ensures high-quality measurement without intractable cost. In response, we construct Human-eYe Perceptual Evaluation (HYPE), a human metric that is (1) grounded in psychophysics research in perception, (2) reliable across different sets of randomly sampled outputs from a model, (3) able to separate model performances, and (4) efficient in cost and time. We introduce two methods. The first, HYPE-Time, measures visual perception under adaptive time constraints to determine the minimum length of time (e.g., 250 ms) that model output such as a generated face needs to be visible for people to distinguish it as real or fake. The second, HYPE-Infinity, measures human error rate on fake and real images with no time constraints, maintaining stability and drastically reducing time and cost. We test HYPE across four state-of-the-art generative adversarial networks (GANs) on unconditional image generation using two datasets, the popular CelebA and the newer higher-resolution FFHQ, and two sampling techniques of model outputs. By simulating HYPE’s evaluation multiple times, we demonstrate consistent ranking of different models, identifying StyleGAN with truncation trick sampling (27.6% HYPE-Infinity deception rate, with roughly one quarter of images being misclassified by humans) as superior to StyleGAN without truncation (19.0%) on FFHQ. See https://hype.stanford.edu for details.
http://arxiv.org/abs/1904.01121
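A sketch of the HYPE-Infinity score as described above, computed as a human error rate over untimed real/fake judgments (the deployed metric additionally controls rater quality and aggregates across raters, which this sketch omits):

    def hype_infinity(judgments):
        # judgments: (is_fake, judged_real) pairs from raters with unlimited
        # viewing time; an error is a fake judged real or a real judged fake.
        errors = sum(1 for is_fake, judged_real in judgments if is_fake == judged_real)
        return 100.0 * errors / len(judgments)   # percentage, e.g. 27.6

    print(hype_infinity([(True, True), (True, False), (False, True), (False, False)]))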
We present JHU’s system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT). Anti-spoofing has gathered increasing attention since the inauguration of the ASVspoof Challenges, and ASVspoof 2019 is dedicated to addressing attacks from all three major types: text-to-speech, voice conversion, and replay. Building on previous research on deep neural networks (DNNs), ASSERT is a pipeline for DNN-based anti-spoofing. ASSERT has four components: feature engineering, DNN models, network optimization and system combination, where the DNN models are variants of squeeze-excitation and residual networks. We conducted an ablation study of the effectiveness of each component on the ASVspoof 2019 corpus, and experimental results showed that ASSERT obtained more than 93% and 17% relative improvements over the baseline systems in the two sub-challenges of ASVspoof 2019, ranking ASSERT among the top performing systems. Code and pretrained models will be made publicly available.
http://arxiv.org/abs/1904.01120
One of the first steps in the utterance interpretation pipeline of many task-oriented conversational AI systems is to identify user intents and the corresponding slots. Since data collection for machine learning models for this task is time-consuming, it is desirable to make use of existing data in a high-resource language to train models in low-resource languages. However, development of such models has largely been hindered by the lack of multilingual training data. In this paper, we present a new data set of 57k annotated utterances in English (43k), Spanish (8.6k) and Thai (5k) across the domains weather, alarm, and reminder. We use this data set to evaluate three different cross-lingual transfer methods: (1) translating the training data, (2) using cross-lingual pre-trained embeddings, and (3) a novel method of using a multilingual machine translation encoder as contextual word representations. We find that given several hundred training examples in the target language, the latter two methods outperform translating the training data. Further, in very low-resource settings, multilingual contextual word representations give better results than using cross-lingual static embeddings. We also compare the cross-lingual methods to using monolingual resources in the form of contextual ELMo representations and find that given just small amounts of target language data, this method outperforms all cross-lingual methods, which highlights the need for more sophisticated cross-lingual methods.
http://arxiv.org/abs/1810.13327
The theory of deep learning is now considered largely solved, and is well understood by researchers and influencers alike. To maintain our relevance, we therefore seek to apply our skills to under-explored, lucrative applications of this technology. To this end, we propose Deep Industrial Espionage, an efficient end-to-end framework for industrial information propagation and productisation. Specifically, given a single image of a product or service, we aim to reverse-engineer, rebrand and distribute a copycat of the product at a profitable price-point to consumers in an emerging market—all within a single forward pass of a Neural Network. Differently from prior work in machine perception which has been restricted to classifying, detecting and reasoning about object instances, our method offers tangible business value in a wide range of corporate settings. Our approach draws heavily on a promising recent arxiv paper until its original authors’ names can no longer be read (we use felt tip pen). We then rephrase the anonymised paper, add the word “novel” to the title, and submit it to a prestigious, closed-access espionage journal, who assure us that someday, we will be entitled to some fraction of their extortionate readership fees.
http://arxiv.org/abs/1904.01114
Following the success of deep learning in a wide range of applications, neural network-based machine learning techniques have received interest as a means of accelerating magnetic resonance imaging (MRI). A number of ideas inspired by deep learning techniques from computer vision and image processing have been successfully applied to non-linear image reconstruction in the spirit of compressed sensing for both low dose computed tomography and accelerated MRI. The additional integration of multi-coil information to recover missing k-space lines in the MRI reconstruction process is still studied less frequently, even though it is the de-facto standard for currently used accelerated MR acquisitions. This manuscript provides an overview of the recent machine learning approaches that have been proposed specifically for improving parallel imaging. A general background introduction to parallel MRI is given that is structured around the classical view of image space and k-space based methods. Both linear and non-linear methods are covered, followed by a discussion of recent efforts to further improve parallel imaging using machine learning, and specifically using artificial neural networks. Image-domain based techniques that introduce improved regularizers are covered as well as k-space based methods, where the focus is on better interpolation strategies using neural networks. Issues and open problems are discussed as well as recent efforts for producing open datasets and benchmarks for the community.
http://arxiv.org/abs/1904.01112
Zero-shot learning (ZSL) aims at understanding unseen categories with no training examples from class-level descriptions. To improve the discriminative power of zero-shot learning, we model the visual learning process of unseen categories with an inspiration from the psychology of human creativity for producing novel art. We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen and creativity is about creating a likable unseen. We introduce a learning signal inspired by creativity literature that explores the unseen space with hallucinated class-descriptions and encourages careful deviation of their visual feature generations from seen classes while allowing knowledge transfer from seen to unseen classes. Empirically, we show a consistent improvement of several percentage points over the state of the art on the largest available benchmarks for the challenging task we focus on, generalized ZSL from noisy text, using the CUB and NABirds datasets. We also show the advantage of our approach on attribute-based ZSL on three additional datasets (AwA2, aPY, and SUN).
http://arxiv.org/abs/1904.01109
Models play an important role in inverse problems, serving as the prior for representing the original signal to be recovered. REgularization by Denoising (RED) is a recently introduced general framework for constructing such priors using state-of-the-art denoising algorithms. Using RED, solving inverse problems is shown to amount to an iterated denoising process. However, as the complexity of denoising algorithms is generally high, this might lead to an overall slow algorithm. In this paper, we suggest an accelerated technique based on vector extrapolation (VE) to speed up existing RED solvers. Numerical experiments validate the gain obtained by VE, leading to substantial savings in computation compared with the original fixed-point method.
http://arxiv.org/abs/1805.02158
We learn a discriminative fixed-length feature representation of fingerprints, which stands in contrast to the commonly used unordered, variable-length sets of minutiae points. To arrive at this fixed-length representation, we embed fingerprint domain knowledge into a multitask deep convolutional neural network architecture. Empirical results on two public-domain fingerprint databases (NIST SD4 and FVC 2004 DB1) show that, compared to minutiae representations extracted by two state-of-the-art commercial matchers (Verifinger v6.3 and Innovatrics v2.0.3), our fixed-length representations provide (i) higher search accuracy: Rank-1 accuracy of 97.9% vs. 97.3% on NIST SD4 against a gallery size of 2000, and (ii) significantly faster large-scale search: 682,594 matches per second vs. 22 matches per second for the commercial matchers on an i5 3.3 GHz processor with 8 GB of RAM.
http://arxiv.org/abs/1904.01099
In developing countries around the world, a multitude of infants continue to suffer and die from vaccine-preventable diseases and malnutrition. Lamentably, the lack of any official identification documentation makes it exceedingly difficult to prevent these infant deaths. To address this global crisis, we propose Infant-Prints, which comprises (i) a custom, compact, low-cost (85 USD), high-resolution (1,900 ppi) fingerprint reader, (ii) a high-resolution fingerprint matcher, and (iii) a mobile application for infant fingerprint search and verification. Using Infant-Prints, we have collected a longitudinal database of infant fingerprints and demonstrate its ability to perform accurate and reliable recognition of infants enrolled at 0-3 months of age, in time for effective delivery of critical vaccinations and nutritional supplements (TAR=90% @ FAR = 0.1% for infants older than 8 weeks).
http://arxiv.org/abs/1904.01091
Accurate estimates of rotation are crucial to vision-based motion estimation in augmented reality and robotics. In this work, we present a method to extract probabilistic estimates of rotation from deep regression models. First, we build on prior work and argue that a multi-headed network structure we name HydraNet provides better calibrated uncertainty estimates than methods that rely on stochastic forward passes. Second, we extend HydraNet to targets that belong to the rotation group, SO(3), by regressing unit quaternions and using the tools of rotation averaging and uncertainty injection onto the manifold to produce three-dimensional covariances. Finally, we present results and analysis on a synthetic dataset, learn consistent orientation estimates on the 7-Scenes dataset, and show how we can use our learned covariances to fuse deep estimates of relative orientation with classical stereo visual odometry to improve localization on the KITTI dataset.
http://arxiv.org/abs/1904.03182
Long-term metric localization is an essential capability of autonomous mobile robots, but remains challenging for vision-based systems in the presence of appearance change caused by lighting, weather or seasonal variations. While experience-based mapping has proven to be an effective technique for enabling visual localization across appearance change, the number of experiences required for reliable long-term localization can be large, and methods for reducing the necessary number of experiences are desired. Taking inspiration from physics-based models of color constancy, we propose a method for learning a nonlinear mapping from RGB to grayscale colorspaces that maximizes the number of feature matches for images captured under varying lighting and weather conditions. Our key insight is that useful image transformations can be learned by approximating conventional non-differentiable localization pipelines with a differentiable learned model that can predict a convenient measure of localization quality, such as the number of feature matches, for a given pair of images. Moreover, we find that the generality of appearance-robust RGB-to-grayscale mappings can be improved by incorporating a learned low-dimensional context feature computed for a specific image pair. Using synthetic and real-world datasets, we show that our method substantially improves feature matching across day-night cycles and presents a viable strategy for significantly improving the efficiency of experience-based visual localization.
http://arxiv.org/abs/1904.01080
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models. Our algorithm guarantees safety by leveraging Lipschitz-continuity to ensure that no unsafe states are visited during exploration. Unlike many other existing techniques, the provided safety guarantee is deterministic. Our algorithm is optimized to reduce the number of actions needed for exploring the safe space. We demonstrate the performance of our algorithm in comparison with baseline methods in simulation on navigation tasks.
http://arxiv.org/abs/1904.01068
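A sketch of the kind of Lipschitz-based safety certificate the abstract refers to; the specific safety function, distance metric, and the paper's exact conditions (e.g. returnability to the safe set) are assumptions here:

    def provably_safe(s_new, known_safe, safety, L, dist):
        # Lipschitz continuity gives |safety(s) - safety(s_new)| <= L * dist(s, s_new),
        # so safety(s) - L * dist(s, s_new) >= 0 certifies s_new before visiting it.
        return any(safety(s) - L * dist(s, s_new) >= 0 for s in known_safe)

Because the bound is deterministic rather than probabilistic, a state certified this way can never turn out to be unsafe, which matches the deterministic guarantee claimed above.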
We study the problem of image alignment for panoramic stitching. Unlike most existing approaches, which are feature-based, our algorithm works on pixels directly and accounts for errors across whole images globally. Technically, we formulate the alignment problem as rank-1 and sparse matrix decomposition over transformed images, and develop an efficient algorithm for solving this challenging non-convex optimization problem. The algorithm reduces to solving a sequence of subproblems, for which we analytically establish exact recovery conditions, convergence and optimality, together with convergence rate and complexity. We generalize the method to simultaneously align multiple images and recover multiple homographies, extending its application scope toward the vast majority of practical scenarios. Experimental results demonstrate that the proposed algorithm is capable of more accurately aligning the images and generating higher-quality stitched images than state-of-the-art methods.
http://arxiv.org/abs/1904.04158
fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at https://www.youtube.com/watch?v=OtgDdWtHvto
http://arxiv.org/abs/1904.01038
Beam search optimization resolves many issues in neural machine translation. However, this method lacks a principled stopping criterion, does not learn how to stop during training, and in practice naturally prefers longer hypotheses at test time, since it uses raw scores instead of probability-based scores. We propose a novel ranking method which enables an optimal beam search stopping criterion. We further introduce a structured prediction loss function which penalizes suboptimal finished candidates produced by beam search during training. Experiments in neural machine translation on both synthetic data and real languages (German-to-English and Chinese-to-English) demonstrate that our proposed methods lead to better output lengths and BLEU scores.
http://arxiv.org/abs/1904.01032
Several approaches to 3D vision tasks process multiple views of the input independently with deep neural networks pre-trained on natural images, achieving view permutation invariance through a single round of pooling over all views. We argue that this operation discards important information and leads to subpar global descriptors. In this paper, we propose a group convolutional approach to multiple view aggregation where convolutions are performed over a discrete subgroup of the rotation group, thus enabling joint reasoning over all views in an equivariant (instead of invariant) fashion, up to the very last layer. We further develop this idea to operate on smaller discrete homogeneous spaces of the rotation group, where a polar view representation is used to maintain equivariance with only a fraction of the number of input views. We set the new state of the art in several large-scale 3D shape retrieval tasks, and show additional applications to panoramic scene classification.
http://arxiv.org/abs/1904.00993
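The simplest instance of this idea, shown as an assumption-laden sketch: for views taken on a ring around the object, a circular convolution over the view axis commutes with cyclic rotations of the camera ring, so the aggregation is equivariant rather than invariant (the paper works with richer discrete subgroups of the rotation group):

    import torch
    import torch.nn as nn

    views = torch.randn(2, 512, 12)   # (batch, per-view descriptor, 12 views on a ring)
    gconv = nn.Conv1d(512, 512, kernel_size=3, padding=1, padding_mode="circular")
    out = gconv(views)                # (2, 512, 12): rotating the input views
                                      # cyclically shifts the output features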
The concept of conditional computation for deep nets has been proposed previously to improve model performance by selectively using only parts of the model conditioned on the sample it is processing. In this paper, we investigate input-dependent dynamic filter selection in deep convolutional neural networks (CNNs). The problem is interesting because the idea of forcing different parts of the model to learn from different types of samples may help us acquire better filters in CNNs, improve the model generalization performance and potentially increase the interpretability of model behavior. We propose a novel yet simple framework called GaterNet, which involves a backbone and a gater network. The backbone network is a regular CNN that performs the major computation needed for making a prediction, while a global gater network is introduced to generate binary gates for selectively activating filters in the backbone network based on each input. Extensive experiments on the CIFAR and ImageNet datasets show that our models consistently outperform the original models by a large margin. On CIFAR-10, our model also improves upon state-of-the-art results.
http://arxiv.org/abs/1811.11205
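A minimal sketch of input-dependent filter gating; the module names and shapes are assumptions, and the paper trains the binary gates through a differentiable relaxation rather than the fixed random gates used in this usage example:

    import torch
    import torch.nn as nn

    class GatedConvBlock(nn.Module):
        # one backbone layer whose filters are switched on/off per input sample
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

        def forward(self, x, gates):
            # gates: (batch, out_ch) binary vector produced by the gater network
            h = torch.relu(self.conv(x))
            return h * gates.unsqueeze(-1).unsqueeze(-1)   # zero out gated-off filters

    block = GatedConvBlock(3, 64)
    gates = (torch.rand(2, 64) > 0.5).float()              # stand-in for gater output
    y = block(torch.randn(2, 3, 32, 32), gates)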
An important part of information gathering and data analysis is finding out what people think about a product or an entity. Twitter is an opinion-rich social networking site, and its posts, or tweets, can be used for mining people’s opinions. The recent surge of activity in this area can be attributed to the computational treatment of data, which has made opinion extraction and sentiment analysis easier. This paper classifies tweets into positive and negative sentiments, but instead of applying traditional methods to preprocessed text data, we use distributed representations of words and sentences to classify the tweets. We use Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNNs) and Artificial Neural Networks: the first two operate on distributed representations of words, while the latter operates on distributed representations of sentences. We achieve accuracies as high as 81%, and also identify the most effective of the available methods for creating distributed representations of words for sentiment analysis.
http://arxiv.org/abs/1904.12580
In this paper, we present a short description of the method submitted to the ANHIR challenge, organized jointly with the IEEE ISBI 2019 conference. We propose a method consisting of preprocessing, initial alignment and nonrigid registration algorithms, together with a procedure to automatically choose the best result. The method turned out to be robust (99.792% robustness) and accurate (0.38% average median rTRE). The main drawback of the proposed method is its relatively high computation time; however, this aspect can easily be improved by cleaning up the code and providing a GPU implementation.
http://arxiv.org/abs/1904.00982
Conditional text-to-image generation is an active area of research, with many possible applications. Existing research has primarily focused on generating a single image from available conditioning information in one step. One practical extension beyond one-step generation is a system that generates an image iteratively, conditioned on ongoing linguistic input or feedback. This is significantly more challenging than one-step generation tasks, as such a system must understand the contents of its generated images with respect to the feedback history, the current feedback, as well as the interactions among concepts present in the feedback history. In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. We show that our model is able to generate the background, add new objects, and apply simple transformations to existing objects. We believe our approach is an important step toward interactive generation.
http://arxiv.org/abs/1811.09845
This paper focuses on learning transferable adversarial examples specifically against defense models (models designed to defend against adversarial attacks). In particular, we show that a simple universal perturbation can fool a series of state-of-the-art defenses. Adversarial examples generated by existing attacks are generally hard to transfer to defense models. We observe the property of regional homogeneity in adversarial perturbations and suggest that the defenses are less robust to regionally homogeneous perturbations. Therefore, we propose an effective transforming paradigm and a customized gradient transformer module to transform existing perturbations into regionally homogeneous ones. Without explicitly forcing the perturbations to be universal, we observe that a well-trained gradient transformer module tends to output input-independent gradients (hence universal), benefiting from the under-fitting phenomenon. Thorough experiments demonstrate that our work significantly outperforms prior attack algorithms (either image-dependent or universal ones) by an average improvement of 14.0% when attacking 9 defenses in the black-box setting. In addition to this cross-model transferability, we also verify that regionally homogeneous perturbations can transfer well across different vision tasks (attacking with the semantic segmentation task and testing on the object detection task).
http://arxiv.org/abs/1904.00979
Sentiment analysis consists of evaluating opinions or statements through the analysis of text. Among the methods used to estimate the degree to which a text expresses a given sentiment are those based on Gaussian Processes. However, traditional Gaussian Process methods use a predefined kernel with hyperparameters that can be tuned but whose structure cannot be adapted. In this paper, we propose the application of Genetic Programming for evolving Gaussian Process kernels that are more precise for sentiment analysis. We use a very flexible representation of kernels combined with a multi-objective approach that simultaneously considers two quality metrics and the computational time spent by the kernels. Our results show that the algorithm can outperform Gaussian Processes with traditional kernels on some of the sentiment analysis tasks considered.
http://arxiv.org/abs/1904.00977
Large-batch training is key to speeding up deep neural network training in large distributed systems. However, large-batch training is difficult because it produces a generalization gap. Straightforward optimization often leads to accuracy loss on the test set. BERT \cite{devlin2018bert} is a state-of-the-art deep learning model that builds on top of deep bidirectional transformers for language understanding. Previous large-batch training techniques do not perform well for BERT when we scale the batch size (e.g. beyond 8192). BERT pre-training also takes a long time to finish (around three days on 16 TPUv3 chips). To solve this problem, we propose the LAMB optimizer, which helps us to scale the batch size to 65536 without losing accuracy. LAMB is a general optimizer that works for both small and large batch sizes and does not need hyper-parameter tuning besides the learning rate. The baseline BERT-Large model needs 1 million iterations to finish pre-training, while LAMB with batch size 65536/32768 only needs 8599 iterations. We push the batch size to the memory limit of a TPUv3 pod and can finish BERT training in 76 minutes.
http://arxiv.org/abs/1904.00962
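A minimal sketch of the layer-wise adaptation idea behind LAMB, assuming Adam-style moments; bias correction and the paper's clipping function on the weight norm are omitted for brevity:

    import torch

    def lamb_step(p, grad, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
        # Adam-style first and second moment estimates
        m.mul_(b1).add_(grad, alpha=1 - b1)
        v.mul_(b2).addcmul_(grad, grad, value=1 - b2)
        update = m / (v.sqrt() + eps) + wd * p
        # layer-wise trust ratio: scale the step by ||w|| / ||update||,
        # which keeps per-layer step sizes balanced at very large batch sizes
        trust = p.norm() / (update.norm() + eps)
        p.add_(update, alpha=-(lr * trust.item()))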
Graph embedding methods produce unsupervised node features from graphs that can then be used for a variety of machine learning tasks. Modern graphs, particularly in industrial applications, contain billions of nodes and trillions of edges, which exceeds the capability of existing embedding systems. We present PyTorch-BigGraph (PBG), an embedding system that incorporates several modifications to traditional multi-relation embedding systems that allow it to scale to graphs with billions of nodes and trillions of edges. PBG uses graph partitioning to train arbitrarily large embeddings on either a single machine or in a distributed environment. We demonstrate comparable performance with existing embedding systems on common benchmarks, while allowing for scaling to arbitrarily large graphs and parallelization on multiple machines. We train and evaluate embeddings on several large social network graphs as well as the full Freebase dataset, which contains over 100 million nodes and 2 billion edges.
http://arxiv.org/abs/1903.12287
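A sketch of the partition-and-swap training loop that makes this scale; load_part, save_part and train_bucket are hypothetical helpers standing in for PBG's actual I/O and training machinery:

    def train_epoch(num_parts, load_part, save_part, train_bucket):
        # Nodes are split into P partitions; edges whose endpoints fall in
        # partitions (i, j) form bucket (i, j). Only two partitions' worth of
        # embeddings must be held in memory (or on one worker) at a time.
        for i in range(num_parts):
            for j in range(num_parts):
                src_emb, dst_emb = load_part(i), load_part(j)
                train_bucket(i, j, src_emb, dst_emb)   # SGD over the bucket's edges
                save_part(i, src_emb)
                save_part(j, dst_emb)

In the distributed setting, buckets that share no partition can be trained in parallel on different machines.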
Reinforcement learning (RL) algorithms have demonstrated promising results on complex tasks, yet often require impractical numbers of samples because they learn from scratch. Meta-RL aims to address this challenge by leveraging experience from previous tasks in order to more quickly solve new tasks. However, in practice, these algorithms generally also require large amounts of on-policy experience during the meta-training process, making them impractical for use in many problems. To this end, we propose to learn a reinforcement learning procedure through imitation of expert policies that solve previously-seen tasks. This involves a nested optimization, with RL in the inner loop and supervised imitation learning in the outer loop. Because the outer loop imitation learning can be done with off-policy data, we can achieve significant gains in meta-learning sample efficiency. In this paper, we show how this general idea can be used both for meta-reinforcement learning and for learning fast RL procedures from multi-task demonstration data. The former results in an approach that can leverage policies learned for previous tasks without significant amounts of on-policy data during meta-training, whereas the latter is particularly useful in cases where demonstrations are easy for a person to provide. Across a number of continuous control meta-RL problems, we demonstrate significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.
http://arxiv.org/abs/1904.00956
In order to complete tasks in a new environment, robots must be able to recognize unseen, unique objects. Fully supervised methods have made great strides on the object segmentation task, but require many examples of each object class and do not scale to unseen environments. In this work, we present a method that acquires pixelwise object labels for manipulable in-hand objects with no human supervision. Our two-step approach performs a foreground-background segmentation informed by robot kinematics, then uses a self-recognition network to segment the robot from the object in the foreground. We achieve 49.4\% mIoU performance on a difficult and varied assortment of items.
http://arxiv.org/abs/1904.00952
Pneumonia is a fatal disease with the potential to cause severe consequences within a short period of time, because fluid flowing into the lungs can lead to drowning. If not treated with drugs at the right time, pneumonia may result in death. Early diagnosis is therefore a key factor in managing the progression of the disease. This paper focuses on the biological progress of pneumonia and its detection by x-ray imaging, overviews the studies conducted on enhancing the level of diagnosis, and presents the methodology and results of an automated analysis of x-ray images based on various parameters, aimed at detecting the disease at very early stages. We propose a deep learning architecture for the classification task, trained on images modified through multiple preprocessing steps. Our classification method uses convolutional neural networks and a residual network architecture for classifying the images. Our findings yield an accuracy of 78.73%, surpassing the previously top scoring accuracy of 76.8%.
http://arxiv.org/abs/1904.00937
Today, rail vehicle localization is based on infrastructure-side Balises (beacons) together with on-board odometry to determine whether a rail segment is occupied. Such coarse locking leads to sub-optimal usage of the rail networks. New railway standards propose the use of moving blocks centered around the rail vehicles to increase the capacity of the network. However, this approach requires accurate and robust position and velocity estimation of all vehicles. In this work, we investigate the applicability, challenges and limitations of current visual and visual-inertial motion estimation frameworks for rail applications. An evaluation against RTK-GPS ground truth is performed on multiple datasets recorded in industrial, sub-urban, and forest environments. Our results show that stereo visual-inertial odometry has great potential to provide precise motion estimation because of its complementary sensor modalities, and shows superior performance in challenging situations compared to other frameworks.
http://arxiv.org/abs/1904.00936
Simultaneous interpretation, the translation of speech from one language to another in real-time, is an inherently difficult and strenuous task. One of the greatest challenges faced by interpreters is the accurate translation of difficult terminology like proper names, numbers, or other entities. Intelligent computer-assisted interpreting (CAI) tools that could analyze the spoken word and detect terms likely to be untranslated by an interpreter could reduce translation error and improve interpreter performance. In this paper, we propose a task of predicting which terminology simultaneous interpreters will leave untranslated, and examine methods that perform this task using supervised sequence taggers. We describe a number of task-specific features explicitly designed to indicate when an interpreter may struggle with translating a word. Experimental results on a newly-annotated version of the NAIST Simultaneous Translation Corpus (Shimizu et al., 2014) indicate the promise of our proposed method.
http://arxiv.org/abs/1904.00930
As abbreviations often have several distinct meanings, disambiguating their intended meaning in context is important for Machine Reading tasks such as document search, recommendation and question answering. Existing approaches mostly rely on labelled examples of abbreviations and their correct long forms, which is costly to generate and limits their applicability and flexibility. Importantly, they need to be subjected to a full empirical evaluation, which is cumbersome in practice. In this paper, we present an entirely unsupervised abbreviation disambiguation method (called UAD) that picks up abbreviation definitions from text. Creating distinct tokens per meaning, we learn context representations as word embeddings. We demonstrate how to further boost abbreviation disambiguation performance by obtaining better context representations from additional unstructured text. Our method is the first abbreviation disambiguation approach which features a transparent model that allows performance analysis without requiring full-scale evaluation, making it highly relevant for real-world deployments. In our thorough empirical evaluation, UAD achieves high performance on large real world document data sets from different domains and outperforms both baseline and state-of-the-art methods. UAD scales well and supports thousands of abbreviations with many different meanings with a single model.
http://arxiv.org/abs/1904.00929
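A sketch of the inference step such a method enables once each abbreviation meaning has its own token and embedding; the sense-token naming scheme and scoring rule are assumptions, not UAD's exact procedure:

    import numpy as np

    def disambiguate(sense_tokens, context_words, vectors):
        # sense_tokens: e.g. ["CT__computed_tomography", "CT__connecticut"];
        # vectors: maps every token (including sense tokens) to its embedding.
        ctx = np.mean([vectors[w] for w in context_words if w in vectors], axis=0)
        cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        return max(sense_tokens, key=lambda s: cos(vectors[s], ctx))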
Understanding the spatial arrangement and nature of real-world objects is of paramount importance to many complex engineering tasks, including autonomous navigation. Deep learning has revolutionized state-of-the-art performance for tasks in 3D environments; however, relatively little is known about the robustness of these approaches in an adversarial setting. The lack of comprehensive analysis makes it difficult to justify deployment of 3D deep learning models in real-world, safety-critical applications. In this work, we develop an algorithm for analysis of pointwise robustness of neural networks that operate on 3D data. We show that current approaches presented for understanding the resilience of state-of-the-art models vastly overestimate their robustness. We then use our algorithm to evaluate an array of state-of-the-art models in order to demonstrate their vulnerability to occlusion attacks. We show that, in the worst case, these networks can be reduced to 0% classification accuracy after the occlusion of at most 6.5% of the occupied input space.
http://arxiv.org/abs/1904.00923
Independence is the ability to live without being controlled by outside factors, including the actions, judgments, opinions, and regulations of others. In reality, however, travelling or simply walking through a crowded street poses a great challenge for a visually impaired person. They must also learn every detail of the home environment, such as the placement of tables and chairs, to prevent injury. Because of this disability, they have to sacrifice their independence in daily living by depending on sighted people in every busy place, such as buses, footpaths, and railway stations. This paper aims to design an artificial navigation system with adjustable sensitivity, built around an ultrasonic proximity sensor, to assist blind persons in walking fearlessly and independently in both indoor and outdoor environments. The system can detect any type of upcoming obstacle or pothole using the reflection properties of ultrasound, and attaching it to different body areas makes its use more versatile and reliable.
http://arxiv.org/abs/1904.05318
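For reference, the ranging arithmetic behind any such ultrasonic proximity sensor is a round-trip time-of-flight calculation; the adjustable sensitivity then amounts to choosing the distance threshold that triggers an alert:

    SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

    def distance_m(echo_round_trip_s):
        # The ultrasonic pulse travels to the obstacle and back,
        # so halve the round-trip time before converting to distance.
        return SPEED_OF_SOUND * echo_round_trip_s / 2.0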
Deep networks have brought significant advances in robot perception, improving the capabilities of robots in several visual tasks, ranging from object detection and recognition to pose estimation, semantic scene segmentation and many others. Still, most approaches typically address visual tasks in isolation, resulting in overspecialized models which achieve strong performance in specific applications but work poorly in other (often related) tasks. This is clearly sub-optimal for a robot which is often required to perform multiple visual recognition tasks simultaneously in order to properly act and interact with the environment. This problem is exacerbated by the limited computational and memory resources typically available onboard a robotic platform. The problem of learning flexible models which can handle multiple tasks in a lightweight manner has recently gained attention in the computer vision community, and benchmarks supporting this research have been proposed. In this work we study this problem in the robot vision context, proposing a new benchmark, the RGB-D Triathlon, and evaluating state-of-the-art algorithms in this novel challenging scenario. We also define a new evaluation protocol, better suited to the robot vision setting. Results shed light on the strengths and weaknesses of existing approaches and on open issues, suggesting directions for future research.
http://arxiv.org/abs/1904.00912
Convolutional Neural Networks (CNNs) have been providing state-of-the-art performance for learning-related problems involving 2D/3D images in Euclidean space. However, unlike in the Euclidean space, the shapes of many structures in medical imaging have a spherical topology in a manifold space, e.g., brain cortical or subcortical surfaces represented by triangular meshes, with large inter-subject and intra-subject variations in vertex number and local connectivity. Hence, there is no consistent neighborhood definition and thus no straightforward convolution/transposed convolution operations for cortical/subcortical surface data. In this paper, by leveraging the regular and consistent geometric structure of the resampled cortical surface mapped onto the spherical space, we propose a novel convolution filter analogous to the standard convolution on the image grid. Accordingly, we develop corresponding operations for convolution, pooling, and transposed convolution for spherical surface data and thus construct spherical CNNs. Specifically, we propose the Spherical U-Net architecture by replacing all operations in the standard U-Net with their spherical operation counterparts. We then apply the Spherical U-Net to two challenging and neuroscientifically important tasks in infant brains: cortical surface parcellation and cortical attribute map development prediction. Both applications demonstrate the competitive accuracy, computational efficiency, and effectiveness of our proposed Spherical U-Net in comparison with state-of-the-art methods.
http://arxiv.org/abs/1904.00906
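A sketch of a convolution over the regular neighborhood structure described above; on the standard icosahedral resampling most vertices have exactly six neighbors (the twelve original vertices have five, a special case this sketch ignores), and the layer sizes are assumptions:

    import torch
    import torch.nn as nn

    class OneRingConv(nn.Module):
        # a shared filter over each vertex and its 6 ring neighbors on the sphere,
        # analogous to a 3x3 convolution on an image grid
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.fc = nn.Linear(7 * in_ch, out_ch)

        def forward(self, feats, neighbors):
            # feats: (V, in_ch) per-vertex features; neighbors: (V, 6) vertex indices
            ring = torch.cat([feats.unsqueeze(1), feats[neighbors]], dim=1)  # (V, 7, in_ch)
            return self.fc(ring.flatten(1))                                  # (V, out_ch)

    out = OneRingConv(16, 32)(torch.randn(10242, 16),
                              torch.randint(0, 10242, (10242, 6)))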
We introduce a novel approach to the keypoint detection task that combines handcrafted and learned CNN filters within a shallow multi-scale architecture. Handcrafted filters provide anchor structures for learned filters, which localize, score and rank repeatable features. A scale-space representation is used within the network to extract keypoints at different levels. We design a loss function to detect robust features that exist across a range of scales and to maximize the repeatability score. Our Key.Net model is trained on data synthetically created from ImageNet and evaluated on the HPatches benchmark. Results show that our approach outperforms state-of-the-art detectors in terms of repeatability, matching performance and complexity.
http://arxiv.org/abs/1904.00889
Deep neural networks are vulnerable to adversarial attacks, which can fool them by adding minuscule perturbations to the input images. The robustness of existing defenses suffers greatly under white-box attack settings, where an adversary has full knowledge about the network and can iterate several times to find strong perturbations. We observe that the main reason for the existence of such perturbations is the close proximity of different class samples in the learned feature space. This allows model decisions to be totally changed by adding an imperceptible perturbation in the inputs. To counter this, we propose to disentangle the intermediate feature representations of deep networks class-wise. Specifically, we force the features for each class to lie inside a convex polytope that is maximally separated from the polytopes of other classes. In this manner, the network is forced to learn distinct and distant decision regions for each class. We observe that this simple constraint on the features greatly enhances the robustness of learned models, even against the strongest white-box attacks, without degrading the classification performance on clean images. We report extensive evaluations in both \textit{black-box} and white-box attack scenarios and show significant gains in comparison to state-of-the-art defenses.
http://arxiv.org/abs/1904.00887
We consider the problem of unsupervised domain adaptation in semantic segmentation. The key to this task consists in reducing the domain shift, i.e., enforcing the data distributions of the two domains to be similar. A popular strategy is to align the marginal distribution in the feature space through adversarial learning. However, this global alignment strategy does not consider the local category-level feature distribution. A possible consequence of the global movement is that some categories which are originally well aligned between the source and target may be incorrectly mapped. To address this problem, this paper introduces a category-level adversarial network, aiming to enforce local semantic consistency during the trend of global alignment. Our idea is to take a close look at the category-level data distribution and align each class with an adaptive adversarial loss. Specifically, we reduce the weight of the adversarial loss for category-level aligned features while increasing the adversarial force for those poorly aligned. In this process, we decide how well a feature is category-level aligned between source and target by a co-training approach. In two domain adaptation tasks, i.e., GTA5 -> Cityscapes and SYNTHIA -> Cityscapes, we validate that the proposed method matches the state of the art in segmentation accuracy.
http://arxiv.org/abs/1809.09478
For unsupervised domain adaptation problems, the strategy of aligning the two domains in latent feature space through adversarial learning has achieved much progress in image classification, but usually fails in semantic segmentation tasks in which the latent representations are overcomplex. In this work, we equip the adversarial network with a “significance-aware information bottleneck (SIB)” to address the above problem. The new network structure, called SIBAN, enables a significance-aware feature purification before the adversarial adaptation, which eases the feature alignment and stabilizes the adversarial training process. In two domain adaptation tasks, i.e., GTA5 -> Cityscapes and SYNTHIA -> Cityscapes, we validate that the proposed method can yield leading results compared with other feature-space alternatives. Moreover, SIBAN can even match the state-of-the-art output-space methods in segmentation accuracy, while the latter are often considered to be better choices for the domain adaptive segmentation task.
http://arxiv.org/abs/1904.00876
We introduce a novel aggregation method to efficiently perform image denoising. Preliminary filters are aggregated in a non-linear fashion, using a new metric of pixel proximity based on how the pool of filters reaches a consensus. The numerical performance of the method is illustrated and we show that the aggregate significantly outperforms each of the preliminary filters.
http://arxiv.org/abs/1904.00865
Deep convolutional neural networks (CNNs) are data-driven, and their performance relies heavily on training data. The prediction results of traditional networks are biased toward larger classes, which tend to be the background in semantic segmentation tasks. This becomes a major problem for fault detection, where the targets appear very small in the images and vary in both type and size. In this paper we propose a new network architecture, DefectNet, that offers multi-class (including but not limited to) defect detection on highly imbalanced datasets. DefectNet consists of two parallel paths: a fully convolutional network and a dilated convolutional network, to detect large and small objects respectively. We propose a hybrid loss maximising the usefulness of a dice loss and a cross entropy loss, and we also employ the leaky rectified linear unit (leaky ReLU) to deal with the rare occurrence of some targets in training batches. The prediction results show that our DefectNet outperforms state-of-the-art networks for detecting multi-class defects, with an average accuracy improvement of approximately 10% on a wind turbine dataset.
http://arxiv.org/abs/1904.00863
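A sketch of a hybrid dice-plus-cross-entropy loss of the kind described above; the blending weight and the paper's exact formulation are assumptions:

    import torch
    import torch.nn.functional as F

    def hybrid_loss(logits, target, alpha=0.5, eps=1e-6):
        # logits: (batch, classes, H, W); target: (batch, H, W) class indices.
        ce = F.cross_entropy(logits, target)               # dominated by large classes
        probs = logits.softmax(dim=1)
        onehot = F.one_hot(target, probs.shape[1]).permute(0, 3, 1, 2).float()
        inter = (probs * onehot).sum(dim=(0, 2, 3))
        denom = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
        dice = 1 - ((2 * inter + eps) / (denom + eps)).mean()  # per-class overlap term
        return alpha * dice + (1 - alpha) * ce

The dice term weights every class equally regardless of pixel count, which is what helps the small, rare defect classes that plain cross entropy tends to ignore.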
Man-made scenes can be densely packed, containing numerous objects, often identical, positioned in close proximity. We show that precise object detection in such scenes remains a challenging frontier even for state-of-the-art object detectors. We propose a novel, deep-learning based method for precise object detection, designed for such challenging settings. Our contributions include: (1) a layer for estimating the Jaccard index as a detection quality score; (2) a novel EM merging unit, which uses our quality scores to resolve detection overlap ambiguities; and finally, (3) an extensive, annotated data set, \dataset, representing packed retail environments, released for training and testing under such extreme settings. Detection tests on \dataset{} and counting tests on the CARPK and PUCPR+ benchmarks show our method to outperform existing state-of-the-art methods by substantial margins. The code and data will be made available on \url{www.github.com/eg4000/SKU110K_CVPR19}.
http://arxiv.org/abs/1904.00853
Video frame interpolation aims to synthesize nonexistent frames in-between the original frames. While significant advances have been made from the recent deep convolutional neural networks, the quality of interpolation is often reduced due to large object motion or occlusion. In this work, we propose a video frame interpolation method which explicitly detects the occlusion by exploring the depth information. Specifically, we develop a depth-aware flow projection layer to synthesize intermediate flows that preferably sample closer objects than farther ones. In addition, we learn hierarchical features to gather contextual information from neighboring pixels. The proposed model then warps the input frames, depth maps, and contextual features based on the optical flow and local interpolation kernels for synthesizing the output frame. Our model is compact, efficient, and fully differentiable. Quantitative and qualitative results demonstrate that the proposed model performs favorably against state-of-the-art frame interpolation methods on a wide variety of datasets.
http://arxiv.org/abs/1904.00830
Fruit recognition using deep Convolutional Neural Networks (CNNs) is one of the most promising applications in computer vision. In recent times, deep learning based classification has made it possible to recognize fruits from images. However, fruit recognition remains a problem for fruits stacked on a weighing scale because of their complexity and similarity. In this paper, a fruit recognition system using a CNN is proposed. The proposed method uses deep learning techniques for the classification. We have used the Fruits-360 dataset for evaluation, from which we selected a subset containing 17,823 images from 25 different categories. The images are divided into training and test sets. Moreover, for the classification accuracies, we have used various combinations of hidden layers and epochs for different cases and made a comparison between them. The overall losses of the network for the different cases were also observed. Finally, we achieved a best test accuracy of 100% and a training accuracy of 99.79%.
http://arxiv.org/abs/1904.00783
The absence of large scale datasets with pixel-level supervision is a significant obstacle for the training of deep convolutional networks for scene text segmentation. For this reason, synthetic data generation is normally employed to enlarge the training dataset. Nonetheless, synthetic data cannot reproduce the complexity and variability of natural images. In this paper, a weakly supervised learning approach is used to reduce the shift between training on real and synthetic data. Pixel-level supervision for a text detection dataset (i.e. one where only bounding-box annotations are available) is generated. In particular, the COCO-Text-Segmentation (COCO_TS) dataset, which provides pixel-level supervision for the COCO-Text dataset, is created and released. The generated annotations are used to train a deep convolutional neural network for semantic segmentation. Experiments show that the proposed dataset can be used instead of synthetic data, allowing us to use only a fraction of the training samples while significantly improving performance.
http://arxiv.org/abs/1904.00818
When the available data for a target speaker is insufficient to train a high-quality speaker-dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers and train a multi-speaker TTS model instead. Many studies have shown that a neural multi-speaker TTS model trained with small amounts of data from multiple speakers combined can generate synthetic speech with better quality and stability than a speaker-dependent one. However, when the amount of data from each speaker is highly unbalanced, the best approach to making use of the excess data remains unknown. Our experiments showed that simply combining all available data from every speaker to train a multi-speaker model produces performance better than, or at least similar to, its speaker-dependent counterpart. Moreover, by using an ensemble multi-speaker model, in which each subsystem is trained on a subset of the available data, we can further improve the quality of the synthetic speech, especially for underrepresented speakers whose training data is limited.
http://arxiv.org/abs/1904.00771
Recent advances in deep learning for edge detection and segmentation open up a new path for semantic-edge-based ego-motion estimation. In this work, we propose a robust monocular visual odometry (VO) framework using category-aware semantic edges, which can reconstruct large-scale semantic maps in challenging outdoor environments. The core of our approach is a semantic nearest neighbor field that facilitates robust data association of edges across frames using semantics, significantly enlarging the convergence radius during tracking phases. The proposed edge registration method can be easily integrated into direct VO frameworks to estimate photometrically, geometrically, and semantically consistent camera motions. Different types of edges are evaluated and extensive experiments demonstrate that our proposed system outperforms state-of-the-art indirect, direct, and semantic monocular VO systems.
http://arxiv.org/abs/1904.00738
Generative Adversarial Networks (GANs) have become a dominant class of generative models. In recent years, GAN variants have yielded especially impressive results in the synthesis of a variety of forms of data. Examples include compelling natural and artistic images, textures, musical sequences, and 3D object files. However, one obvious synthesis candidate is missing. In this work, we answer one of deep learning’s most pressing questions: GAN you do the GAN GAN? That is, is it possible to train a GAN to model a distribution of GANs? We release the full source code for this project under the MIT license.
http://arxiv.org/abs/1904.00724
This paper considers the fusion of multiple estimates of a spatially extended object, where the object extent is modeled as an ellipse that is parameterized by its orientation and semi-axis lengths. For this purpose, we propose a novel systematic approach that employs a distance measure for ellipses, namely the Gaussian Wasserstein distance, as a cost function. We derive an explicit expression for the minimum mean Gaussian Wasserstein distance (MMGW) estimate. Based on the concept of an MMGW estimator, we develop efficient methods for the fusion of extended target estimates. The proposed fusion methods are evaluated in a simulated experiment, and the benefits of the novel methods are discussed.
http://arxiv.org/abs/1904.00708
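For reference, the squared Gaussian Wasserstein distance underlying the MMGW estimate has the standard closed form for two Gaussians $\mathcal{N}(m_1, \Sigma_1)$ and $\mathcal{N}(m_2, \Sigma_2)$ representing the ellipses:

$$ d^2 = \|m_1 - m_2\|^2 + \mathrm{tr}\!\left( \Sigma_1 + \Sigma_2 - 2 \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right), $$

where each covariance matrix encodes an ellipse's orientation and semi-axis lengths, so the measure jointly penalizes mismatches in center, orientation, and extent.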