Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving

2019-05-01

Sascha Rosbach, Vinit James, Simon Großjohann, Silviu Homoceanu, Stefan Roth

arXiv_AI

arXiv_AI Knowledge Reinforcement_Learning Optimization
Abstract

Behavior and motion planning play an important role in automated driving. Traditionally, behavior planners instruct local motion planners with predefined behaviors. Due to the high scene complexity in urban environments, unpredictable situations may occur in which behavior planners fail to match predefined behavior templates. Recently, general-purpose planners have been introduced, combining behavior and local motion planning. These general-purpose planners allow behavior-aware motion planning given a single reward function. However, two challenges arise: First, this function has to map a complex feature space into rewards. Second, the reward function has to be manually tuned by an expert. Manually tuning this reward function becomes a tedious task. In this paper, we propose an approach that relies on human driving demonstrations to automatically tune reward functions. This study offers important insights into the driving style optimization of general-purpose planners with maximum entropy inverse reinforcement learning. We evaluate our approach based on the expected value difference between learned and demonstrated policies. Furthermore, we compare the similarity of human driven trajectories with optimal policies of our planner under learned and expert-tuned reward functions. Our experiments show that we are able to learn reward functions exceeding the level of manual expert tuning without prior domain knowledge.
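
The maximum entropy inverse reinforcement learning step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a reward that is linear in trajectory features and a finite set of candidate trajectories sampled by the planner; both are simplifying assumptions for illustration, and the names maxent_irl_gradient, candidate_features and demo_features are hypothetical rather than taken from the paper.

```python
import math

def maxent_irl_gradient(theta, candidate_features, demo_features):
    """Gradient of the demonstration log-likelihood under a MaxEnt model.

    Assumes (for illustration only) a linear reward R(tau) = theta . f(tau)
    and a finite candidate set with P(tau) proportional to exp(R(tau)).
    """
    # Reward of every candidate trajectory under the current weights.
    rewards = [sum(t * f for t, f in zip(theta, feats))
               for feats in candidate_features]
    # Softmax over rewards, shifted by the maximum for numerical stability.
    shift = max(rewards)
    weights = [math.exp(r - shift) for r in rewards]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Expected feature values under the current trajectory distribution.
    expected = [sum(p * feats[i] for p, feats in zip(probs, candidate_features))
                for i in range(len(theta))]
    # Demonstrated features minus expected features.
    return [d - e for d, e in zip(demo_features, expected)]

# Toy usage: two candidate trajectories, the human demonstration matches the first.
grad = maxent_irl_gradient([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

Ascending this gradient raises the reward weight of features that the demonstration shows more often than the current trajectory distribution does.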

URL

http://arxiv.org/abs/1905.00229

PDF

http://arxiv.org/pdf/1905.00229
SAI, a Sensible Artificial Intelligence that plays Go

2019-05-01

Francesco Morandin, Gianluca Amato, Rosa Gini, Carlo Metta, Maurizio Parton, Gian-Carlo Pascutto

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero paradigm. The winrate as a function of the komi is modeled with a two-parameter sigmoid function, so that the neural network must predict just one more variable to assess the winrate for all komi values. A second novel feature is that training is based on self-play games that occasionally branch (with changed komi) when the position is uneven. With this setting, reinforcement learning is shown to work on 7x7 Go, obtaining very strong playing agents. As a useful byproduct, the sigmoid parameters given by the network make it possible to estimate the score difference on the board and to evaluate how decided the game is.
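
A two-parameter sigmoid of the komi could look like the sketch below. The parameter names alpha and beta, the sign convention (komi counted as points given to the opponent) and the function name winrate are assumptions made for illustration; the paper's exact parameterization may differ.

```python
import math

def winrate(komi: float, alpha: float, beta: float) -> float:
    """Estimated winrate for the current player at a given komi.

    alpha: komi at which the position is even (winrate 0.5); assumed reading.
    beta:  steepness; a larger beta means the game is more decided.
    Assumes komi is awarded to the opponent, so winrate falls as komi grows.
    """
    return 1.0 / (1.0 + math.exp(-beta * (alpha - komi)))

# One (alpha, beta) pair yields a winrate for every komi value.
curve = [winrate(k, alpha=3.0, beta=2.0) for k in (0.0, 3.0, 6.0)]
```

Under this reading, alpha acts as an estimate of the score difference on the board and beta indicates how sharply the outcome depends on it.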

URL

http://arxiv.org/abs/1809.03928

PDF

http://arxiv.org/pdf/1809.03928
StackNet: Stacking Parameters for Continual learning

2019-05-01

Jangho Kim, Jeesoo Kim, Nojun Kwak

arXiv_CV

arXiv_CV Classification
Abstract

Training a neural network for a classification task typically assumes that the training data are given from the beginning. However, in the real world, additional data accumulate gradually and the model requires additional training without accessing the old training data. This usually leads to the catastrophic forgetting problem, which is inevitable for the traditional training methodology of neural networks. In this paper, we propose a continual learning method that is able to learn additional tasks while retaining the performance of previously learned tasks by stacking parameters. Composed of two complementary components, the index module and the StackNet, our method estimates the index of the corresponding task for an input sample with the index module and utilizes a particular portion of StackNet with this index. The StackNet guarantees no degradation in the performance of the previously learned tasks and the index module shows high confidence in finding the origin of an input sample.

URL

https://arxiv.org/abs/1809.02441

PDF

https://arxiv.org/pdf/1809.02441
On the Interaction between Autonomous Mobility on Demand Systems and Power Distribution Networks -- An Optimal Power Flow Approach

2019-05-01

Alvaro Estandia, Maximilian Schiffer, Federico Rossi, Emre Can Kara, Ram Rajagopal, Marco Pavone

arXiv_RO

arXiv_RO Optimization
Abstract

In future transportation systems, the charging behavior of electric Autonomous Mobility on Demand (AMoD) fleets, i.e., fleets of self-driving cars that service on-demand trip requests, will likely challenge power distribution networks (PDNs), causing overloads or voltage drops. In this paper, we show that these challenges can be significantly attenuated if the PDNs’ operational constraints and exogenous loads (e.g., from homes or businesses) are considered when operating the electric AMoD fleet. We focus on a system-level perspective, assuming full cooperation between the AMoD and the PDN operators. Through this single entity perspective, we derive an upper bound on the benefits of coordination. We present an optimization-based modeling approach to jointly control an electric AMoD fleet and a series of PDNs, and analyze the benefit of coordination under load balancing constraints. For a case study in Orange County, CA, we show that coordinating the electric AMoD fleet and the PDNs helps to reduce 99% of overloads and 50% of voltage drops which the electric AMoD fleet causes without coordination. Our results show that coordinating electric AMoD and PDNs helps to level loads and can significantly postpone the point at which upgrading the network’s capacity to a larger scale becomes inevitable to preserve stability.

URL

http://arxiv.org/abs/1905.00200

PDF

http://arxiv.org/pdf/1905.00200
Cyber-Physical Testbed for Human-Robot Collaborative Task Planning and Execution

2019-05-01

Tuly Hazbar, Shitij Kumar, Ferat Sahin

arXiv_RO

arXiv_RO
Abstract

In this paper, we present a cyber-physical testbed created to enable a human-robot team to perform a shared task in a shared workspace. The testbed is suitable for the implementation of a tabletop manipulation task, a common human-robot collaboration scenario. The testbed integrates elements that exist in the physical and virtual world. In this work, we report the insights we gathered throughout our exploration in understanding and implementing task planning and execution for a human-robot team.

URL

http://arxiv.org/abs/1905.00199

PDF

http://arxiv.org/pdf/1905.00199
Declarative Question Answering over Knowledge Bases containing Natural Language Text with Answer Set Programming

2019-05-01

Arindam Mitra, Peter Clark, Oyvind Tafjord, Chitta Baral

arXiv_AI

arXiv_AI Knowledge
Abstract

While in recent years machine learning (ML) based approaches have been the popular choice for developing end-to-end question answering systems, such systems often struggle when additional knowledge is needed to correctly answer the questions. Proposed alternatives involve translating the question and the natural language text to a logical representation and then using logical reasoning. However, this alternative falters when the size of the text gets bigger. To address this, we propose an approach that does logical reasoning over premises written in natural language text. The proposed method uses recent features of Answer Set Programming (ASP) to call external NLP modules (which may be based on ML) which perform simple textual entailment. To test our approach, we develop a corpus based on life cycle questions and show that our system achieves up to $18\%$ performance gain when compared to standard MCQ solvers.

URL

http://arxiv.org/abs/1905.00198

PDF

http://arxiv.org/pdf/1905.00198
Nested Variational Autoencoder for Topic Modeling on Microtexts with Word Vectors

2019-05-01

Trung Trinh, Tho Quan, Trung Mai

arXiv_CL

arXiv_CL Knowledge Embedding Optimization Inference Relation
Abstract

Most of the information on the Internet is represented in the form of microtexts, which are short text snippets like news headlines or tweets. This source of information is abundant, and mining this data could uncover meaningful insights. Topic modeling is one of the popular methods to extract knowledge from a collection of documents. Nevertheless, conventional topic models such as Latent Dirichlet Allocation (LDA) are unable to perform well on short documents, mostly due to the scarcity of word co-occurrence statistics embedded in the data. The objective of our research is to create a topic model which can achieve great performance on microtexts while requiring a small runtime for scalability to large datasets. To compensate for the lack of information in microtexts, we allow our method to take advantage of word embeddings for additional knowledge of relationships between words. For speed and scalability, we apply Auto-Encoding Variational Bayes, an algorithm that can perform efficient black-box inference in probabilistic models. The result of our work is a novel topic model called Nested Variational Autoencoder, which is a distribution that takes into account word vectors and is parameterized by a neural network architecture. For optimization, the model is trained to approximate the posterior distribution of the original LDA model. Experiments show the improvements of our model on microtexts as well as its runtime advantage.

URL

http://arxiv.org/abs/1905.00195

PDF

http://arxiv.org/pdf/1905.00195
Horizon: Facebook's Open Source Applied Reinforcement Learning Platform

2019-05-01

Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen

arXiv_AI

arXiv_AI Face Reinforcement_Learning
Abstract

In this paper we present Horizon, Facebook’s open source applied reinforcement learning (RL) platform. Horizon is an end-to-end platform designed to solve industry applied RL problems where datasets are large (millions to billions of observations), the feedback loop is slow (vs. a simulator), and experiments must be done with care because they don’t run in a simulator. Unlike other RL platforms, which are often designed for fast prototyping and experimentation, Horizon is designed with production use cases as top of mind. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, optimized serving, and a model-based data understanding tool. We also showcase and describe real examples where reinforcement learning models trained with Horizon significantly outperformed and replaced supervised learning systems at Facebook.

URL

http://arxiv.org/abs/1811.00260

PDF

http://arxiv.org/pdf/1811.00260
Learning fashion compatibility across apparel categories for outfit recommendation

2019-05-01

Luisa F. Polania, Satyajit Gupte

arXiv_CV

arXiv_CV Embedding Relation Recommendation
Abstract

This paper addresses the problem of generating recommendations for completing the outfit given that a user is interested in a particular apparel item. The proposed method is based on a siamese network used for feature extraction followed by a fully-connected network used for learning a fashion compatibility metric. The embeddings generated by the siamese network are augmented with color histogram features motivated by the important role that color plays in determining fashion compatibility. The training of the network is formulated as a maximum a posteriori (MAP) problem where Laplacian distributions are assumed for the filters of the siamese network to promote sparsity and matrix-variate normal distributions are assumed for the weights of the metric network to efficiently exploit correlations between the input units of each fully-connected layer.

URL

http://arxiv.org/abs/1905.03703

PDF

http://arxiv.org/pdf/1905.03703
Disease Identification From Unstructured User Input

2019-05-01

Fahim Faisal (1), Shafkat Ahmed Bhuiyan (1), Dr. Abu Raihan Mostofa Kamal (1) ((1) Islamic University of Technology)

arXiv_CL

arXiv_CL Knowledge Text_Classification Classification Relation
Abstract

The increasing number of Internet users has led to the rapid popularization of online searching for health-related advice. Nowadays, when facing a health problem, people tend to “go online” first instead of consulting a health professional. With the proliferation of online symptom checker sites and health forums, it is easy to gain knowledge about a health condition based on a number of given symptoms. Though existing symptom checkers provide an instant sense of disease diagnosis, these question-answering and selection based systems lack interactivity. Online health forum sites can also be underwhelming because of their time-intensive nature and reliability issues. In this scenario, this paper proposes a web-based automated disease identification framework which takes unstructured textual data like health forum posts as input and provides a symptom-disease correlation based ranking of probable diseases as output, considering all important factors. The proposed framework incorporates a lexicographic and semantic feature based two-phase state-of-the-art text classification system and a disease knowledge base based similarity measurement module to identify the probable disease. We evaluate this framework while varying the number of feature components, and the results suggest that significant accuracy and reliability are obtained over baseline systems through effective feature engineering, while keeping up with increased user interactivity.

URL

http://arxiv.org/abs/1905.01987

PDF

http://arxiv.org/pdf/1905.01987
Unsupervised Temperature Scaling: Post-Processing Unsupervised Calibration of Deep Models Decisions

2019-05-01

Azadeh Sadat Mozafari, Hugo Siqueira Gomes, Wilson Leão, Christian Gagné

arXiv_CV

arXiv_CV Inference Deep_Learning Detection
Abstract

The great performance of deep learning is undeniable, with impressive results on a wide range of tasks. However, the output confidence of these models is usually not well calibrated, which can be an issue for applications where confidence in the decisions is central to bringing trust and reliability (e.g., autonomous driving or medical diagnosis). For models using softmax at the last layer, Temperature Scaling (TS) is a state-of-the-art calibration method, with low time and memory complexity as well as demonstrated effectiveness. TS relies on a T parameter to rescale and calibrate values of the softmax layer, using a labelled dataset to determine the value of that parameter. We propose an Unsupervised Temperature Scaling (UTS) approach, which does not depend on labelled samples to calibrate the model, allowing, for example, the use of a part of the test samples for calibrating the pre-trained model before going into inference mode. We provide theoretical justifications for UTS and assess its effectiveness on a wide range of deep models and datasets. We also demonstrate calibration results of UTS on skin lesion detection, a problem where a well-calibrated output can play an important role in accurate decision-making.
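
The rescaling step that TS and UTS share can be sketched as follows. This shows only how a temperature T changes the softmax output; how T is chosen (with labels in TS, without labels in UTS) is not reproduced here, and the function name softmax_with_temperature is a hypothetical one used for illustration.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits divided by a temperature T > 0.

    T > 1 softens the distribution (lower confidence), T < 1 sharpens it.
    The predicted class (argmax) is unchanged for any positive T.
    """
    if T <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / T for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    shift = max(scaled)
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Same logits, two temperatures: the ranking is kept, the confidence changes.
sharp = softmax_with_temperature([2.0, 1.0, 0.0], T=1.0)
soft = softmax_with_temperature([2.0, 1.0, 0.0], T=2.0)
```

Because the argmax is preserved, calibrating with a temperature changes the reported confidence without changing the model's accuracy.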

URL

http://arxiv.org/abs/1905.00174

PDF

http://arxiv.org/pdf/1905.00174
ShapeShifter: Robust Physical Adversarial Attack on Faster R-CNN Object Detector

2019-05-01

Shang-Tse Chen, Cory Cornelius, Jason Martin, Duen Horng Chau

arXiv_CV

arXiv_CV Adversarial Object_Detection Image_Classification Classification Detection
Abstract

Given the ability to directly manipulate image pixels in the digital input space, an adversary can easily generate imperceptible perturbations to fool a Deep Neural Network (DNN) image classifier, as demonstrated in prior work. In this work, we propose ShapeShifter, an attack that tackles the more challenging problem of crafting physical adversarial perturbations to fool image-based object detectors like Faster R-CNN. Attacking an object detector is more difficult than attacking an image classifier, as it needs to mislead the classification results in multiple bounding boxes with different scales. Extending the digital attack to the physical world adds another layer of difficulty, because it requires the perturbation to be robust enough to survive real-world distortions due to different viewing distances and angles, lighting conditions, and camera limitations. We show that the Expectation over Transformation technique, which was originally proposed to enhance the robustness of adversarial perturbations in image classification, can be successfully adapted to the object detection setting. ShapeShifter can generate adversarially perturbed stop signs that are consistently mis-detected by Faster R-CNN as other objects, posing a potential threat to autonomous vehicles and other safety-critical computer vision systems.
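
The idea behind Expectation over Transformation can be sketched as a Monte Carlo average of a loss over randomly sampled transformations of the perturbed input. The toy below uses a scalar input and a random brightness factor purely for illustration; eot_loss, sample_brightness and toy_loss are hypothetical names, and the actual attack on Faster R-CNN is considerably more involved.

```python
import random

def eot_loss(loss_fn, perturbed_input, sample_transform, n_samples=100, seed=0):
    """Average loss_fn over n_samples random transformations of the input."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        transform = sample_transform(rng)      # draw one random distortion
        total += loss_fn(transform(perturbed_input))
    return total / n_samples

def sample_brightness(rng):
    """Toy distortion: scale the input by a random factor in [0.8, 1.2]."""
    factor = rng.uniform(0.8, 1.2)
    return lambda value: value * factor

def toy_loss(value):
    """Toy objective: squared distance of the distorted input from 1.0."""
    return (value - 1.0) ** 2

estimate = eot_loss(toy_loss, 3.0, sample_brightness, n_samples=50, seed=1)
```

Minimizing such an averaged loss, rather than the loss on a single undistorted input, is what pushes a perturbation to remain effective across viewing conditions.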

URL

http://arxiv.org/abs/1804.05810

PDF

http://arxiv.org/pdf/1804.05810
State-of-the-art in 360° Video/Image Processing: Perception, Assessment and Compression

2019-05-01

Chen Li, Mai Xu, Shanyi Zhang, Patrick Le Callet

arXiv_CV

arXiv_CV Review QA Attention Survey VQA
Abstract

Nowadays, 360° video/image has become increasingly popular and has drawn great attention. The spherical viewing range of 360° video/image accounts for huge amounts of data, which poses challenges to 360° video/image processing in terms of storage, transmission, etc. Accordingly, recent years have witnessed an explosive emergence of works on 360° video/image processing. In this paper, we review the state-of-the-art works on 360° video/image processing from the aspects of perception, assessment and compression. First, this paper reviews both datasets and visual attention modelling approaches for 360° video/image. Second, we survey the related works on both subjective and objective visual quality assessment (VQA) of 360° video/image. Third, we give an overview of the compression approaches for 360° video/image, which utilize either the spherical characteristics or visual attention models. Finally, we summarize this overview paper and give an outlook on future research trends in 360° video/image processing.

URL

https://arxiv.org/abs/1905.00161

PDF

https://arxiv.org/pdf/1905.00161
Precise Synthetic Image and LiDAR Dataset for Autonomous Vehicle Perception

2019-05-01

Braden Hurl, Krzysztof Czarnecki, Steven Waslander

arXiv_CV

arXiv_CV Object_Detection Segmentation Semantic_Segmentation Detection
Abstract

We introduce the Precise Synthetic Image and LiDAR (PreSIL) dataset for autonomous vehicle perception. Grand Theft Auto V (GTA V), a commercial video game, has a large detailed world with realistic graphics, which provides a diverse data collection environment. Existing work creating synthetic data for autonomous driving with GTA V have not released their datasets and rely on an in-game raycasting function which represents people as cylinders and can fail to capture vehicles past 30 metres. Our work creates a precise LiDAR simulator within GTA V which collides with detailed models for all entities no matter the type or position. The PreSIL dataset consists of over 50,000 instances and includes high-definition images with full resolution depth information, semantic segmentation (images), point-wise segmentation (point clouds), ground point labels (point clouds), and detailed annotations for all vehicles and people. Collecting additional data with our framework is entirely automatic and requires no human annotation of any kind. We demonstrate the effectiveness of our dataset by showing an improvement of up to 5% average precision on the KITTI 3D Object Detection benchmark challenge when state-of-the-art 3D object detection networks are pre-trained with our data. The data and code are available at https://uwaterloo.ca/waterloo-intelligent-systems-engineering-lab/projects/precise-synthetic-image-and-lidar-presil-dataset-autonomous

URL

http://arxiv.org/abs/1905.00160

PDF

http://arxiv.org/pdf/1905.00160
A Style Transfer Approach to Source Separation

2019-05-01

Shrikant Venkataramani, Efthymios Tzinis, Paris Smaragdis

arXiv_SD

arXiv_SD Style_Transfer
Abstract

Training neural networks for source separation involves presenting a mixture recording at the input of the network and updating network parameters in order to produce an output that resembles the clean source. Consequently, supervised source separation depends on the availability of paired mixture-clean training examples. In this paper, we interpret source separation as a style transfer problem. We present a variational auto-encoder network that exploits the commonality across the domain of mixtures and the domain of clean sounds and learns a shared latent representation across the two domains. Using these cycle-consistent variational auto-encoders, we learn a mapping from the mixture domain to the domain of clean sounds and perform source separation without explicitly supervising with paired training examples.

URL

http://arxiv.org/abs/1905.00151

PDF

http://arxiv.org/pdf/1905.00151
Self-Supervised Convolutional Subspace Clustering Network

2019-05-01

Junjian Zhang, Chun-Guang Li, Chong You, Xianbiao Qi, Honggang Zhang, Jun Guo, Zhouchen Lin

arXiv_CV

arXiv_CV CNN Optimization Classification
Abstract

Subspace clustering methods based on data self-expression have become very popular for learning from data that lie in a union of low-dimensional linear subspaces. However, the applicability of subspace clustering has been limited because practical visual data in raw form do not necessarily lie in such linear subspaces. On the other hand, while Convolutional Neural Network (ConvNet) has been demonstrated to be a powerful tool for extracting discriminative features from visual data, training such a ConvNet usually requires a large amount of labeled data, which are unavailable in subspace clustering applications. To achieve simultaneous feature learning and subspace clustering, we propose an end-to-end trainable framework, called Self-Supervised Convolutional Subspace Clustering Network (S$^2$ConvSCN), that combines a ConvNet module (for feature learning), a self-expression module (for subspace clustering) and a spectral clustering module (for self-supervision) into a joint optimization framework. Particularly, we introduce a dual self-supervision that exploits the output of spectral clustering to supervise the training of the feature learning module (via a classification loss) and the self-expression module (via a spectral clustering loss). Our experiments on four benchmark datasets show the effectiveness of the dual self-supervision and demonstrate superior performance of our proposed approach.

URL

http://arxiv.org/abs/1905.00149

PDF

http://arxiv.org/pdf/1905.00149
Fair Classification and Social Welfare

2019-05-01

Lily Hu, Yiling Chen

arXiv_AI

arXiv_AI Classification Relation
Abstract

Now that machine learning algorithms lie at the center of many resource allocation pipelines, computer scientists have been unwittingly cast as partial social planners. Given this state of affairs, important questions follow. What is the relationship between fairness as defined by computer scientists and notions of social welfare? In this paper, we present a welfare-based analysis of classification and fairness regimes. We translate a loss minimization program into a social welfare maximization problem with a set of implied welfare weights on individuals and groups, weights that can be analyzed through a distributive justice lens. In the converse direction, we ask what the space of possible labelings is for a given dataset and hypothesis class. We provide an algorithm that answers this question with respect to linear hyperplanes in $\mathbb{R}^d$ and runs in $O(n^d d)$. Our main findings on the relationship between fairness criteria and welfare center on sensitivity analyses of fairness-constrained empirical risk minimization programs. We characterize the ranges of $\Delta \epsilon$ perturbations to a fairness parameter $\epsilon$ that yield better, worse, and neutral outcomes in utility for individuals and, by extension, groups. We show that applying stricter fairness criteria that are codified as parity constraints can worsen welfare outcomes for both groups. More generally, always preferring “more fair” classifiers does not abide by the Pareto Principle, a fundamental axiom of social choice theory and welfare economics. Recent work in machine learning has rallied around these notions of fairness as critical to ensuring that algorithmic systems do not have disparate negative impact on disadvantaged social groups. By showing that these constraints often fail to translate into improved outcomes for these groups, we cast doubt on their effectiveness as a means to ensure justice.

URL

http://arxiv.org/abs/1905.00147

PDF

http://arxiv.org/pdf/1905.00147
Inferring the Importance of Product Appearance: A Step Towards the Screenless Revolution

2019-05-01

Yongshun Gong, Jinfeng Yi, Dongdong Chen, Jian Zhang, Jiayu Zhou, Zhihua Zhou

arXiv_CV

arXiv_CV Classification
Abstract

Nowadays, almost all online orders are placed through screened devices such as mobile phones, tablets, and computers. With the rapid development of the Internet of Things (IoT) and smart appliances, more and more screenless smart devices, e.g., smart speakers and smart refrigerators, appear in our daily lives. They open up new means of interaction and may provide an excellent opportunity to reach new customers and increase sales. However, not all items are suitable for screenless shopping, since the appearance of some items plays an important role in consumer decision making. Typical examples include clothes, dolls, bags, and shoes. In this paper, we aim to infer the significance of every item’s appearance in consumer decision making and identify the group of items that are suitable for screenless shopping. Specifically, we formulate the problem as a classification task that predicts if an item’s appearance has a significant impact on people’s purchase behavior. To solve this problem, we extract features from three different views, namely items’ intrinsic properties, items’ images, and users’ comments, and collect a set of necessary labels via crowdsourcing. We then propose an iterative semi-supervised learning framework with three carefully designed loss functions. We conduct extensive experiments on a real-world transaction dataset collected from the online retail giant JD.com. Experimental results verify the effectiveness of the proposed method.

URL

http://arxiv.org/abs/1905.03698

PDF

http://arxiv.org/pdf/1905.03698
Semi-Unsupervised Lifelong Learning for Sentiment Classification

2019-04-30

Xianbin Hong, Gautam Pal, Sheng-Uei Guan, Prudence Wong, Dawei Liu, Ka Lok Man, Xin Huang

arXiv_AI

arXiv_AI Sentiment Review Knowledge Attention Sentiment_Classification Classification
Abstract

Lifelong machine learning is a novel machine learning paradigm which continually learns tasks and accumulates knowledge for reuse. The abilities to extract and reuse knowledge enable lifelong machine learning to understand the knowledge needed for solving a task and to obtain the ability to solve related problems. In sentiment classification, traditional approaches like Naive Bayes focus on the probability of each word carrying positive or negative sentiment. However, the lifelong machine learning approach in this paper investigates this problem from a different angle and attempts to discover which words determine the sentiment of a review. We focus on obtaining knowledge during learning for use in future learning, rather than just solving the current task.

URL

http://arxiv.org/abs/1905.01988

PDF

http://arxiv.org/pdf/1905.01988
ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal after Weight Pruning

2019-04-30

Xiaolong Ma, Geng Yuan, Sheng Lin, Zhengang Li, Hao Sun, Yanzhi Wang

arXiv_AI

arXiv_AI Optimization
Abstract

The state-of-the-art DNN structures involve high computation and great demand for memory storage, which pose intensive challenges on DNN framework resources. To mitigate these challenges, weight pruning techniques have been studied. However, a high-accuracy solution for extreme structured pruning that combines different types of structured sparsity is still waiting to be unraveled, due to the extremely reduced weights in DNN networks. In this paper, we propose a DNN framework which combines two different types of structured weight pruning (filter and column pruning) by incorporating the alternating direction method of multipliers (ADMM) algorithm for better pruning performance. We are the first to find non-optimality of the ADMM process and unused weights in a structured pruned model, and further design an optimization framework which contains the first proposed Network Purification and Unused Path Removal algorithms, which are dedicated to post-processing a structured pruned model after ADMM steps. Some highlights show that we achieve 232x compression on LeNet-5, 60x compression on ResNet-18 CIFAR-10 and over 5x compression on AlexNet. We share our models at anonymous link this http URL

URL

http://arxiv.org/abs/1905.00136

PDF

http://arxiv.org/pdf/1905.00136
Harmonic Networks with Limited Training Samples

2019-04-30

Matej Ulicny, Vladimir A. Krylov, Rozenn Dahyot

arXiv_CV

arXiv_CV CNN
Abstract

Convolutional neural networks (CNNs) are very popular nowadays for image processing. CNNs allow one to learn optimal filters in a (mostly) supervised machine learning context. However, this typically requires abundant labelled training data to estimate the filter parameters. Alternative strategies have been deployed for reducing the number of parameters and/or filters to be learned and thus decreasing overfitting. In the context of reverting to preset filters, we propose here a computationally efficient harmonic block that uses Discrete Cosine Transform (DCT) filters in CNNs. In this work we examine the performance of harmonic networks in limited training data scenarios. We validate experimentally that their performance compares well against scattering networks that use wavelets as preset filters.
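
A preset DCT filter bank of the kind the abstract refers to can be generated as in the sketch below. It builds unnormalized two-dimensional DCT-II basis functions for a k x k window; the choice of DCT-II without normalization and the name dct_filter_bank are assumptions for illustration, and how the paper's harmonic block combines the filter responses is not shown.

```python
import math

def dct_filter_bank(k=3):
    """Return k*k preset filters, each a k x k nested list.

    Filter (u, v) is the separable, unnormalized DCT-II basis function
    cos(pi * (2x + 1) * u / (2k)) * cos(pi * (2y + 1) * v / (2k)).
    """
    bank = []
    for u in range(k):          # vertical frequency
        for v in range(k):      # horizontal frequency
            filt = [[math.cos(math.pi * (2 * x + 1) * u / (2 * k)) *
                     math.cos(math.pi * (2 * y + 1) * v / (2 * k))
                     for y in range(k)]
                    for x in range(k)]
            bank.append(filt)
    return bank

# For k = 3 this gives 9 fixed filters; the first one is the constant (DC) filter.
filters = dct_filter_bank(3)
```

Since the filters are fixed, a network built on them only has to learn how to weight their responses, which is one way of reducing the number of parameters to estimate from limited data.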

URL

http://arxiv.org/abs/1905.00135

PDF

http://arxiv.org/pdf/1905.00135
3D Grasp Stability Analysis with Coulomb Friction with Hierarchical Convex Relaxations

2019-04-30

Maximilian Haas-Heger, Matei Ciocarlie

arXiv_RO

arXiv_RO
Abstract

We present an algorithm to determine quasistatic equilibrium of three dimensional grasps in the presence of Coulomb Friction. Due to the non-convexity of this friction law we introduce a relaxation that allows us to formulate the problem as a Mixed-Integer Problem. This type of problem can be solved efficiently with methods such as the branch and bound algorithm. However, as the number of integer variables will greatly affect computation time we present an algorithm that successively refines the friction constraint relaxation locally to obtain solutions to arbitrary accuracy efficiently. This allows us to determine if a system is quasistatically stable (i.e. it is in equilibrium) or not. Furthermore, we can solve for the equilibrium contact forces or actuator commands necessary for stability. We apply this algorithm to analyze the conditions for stability of robotic grasps.

URL

http://arxiv.org/abs/1905.00134

PDF

http://arxiv.org/pdf/1905.00134
Read All
To believe or not to believe: Validating explanation fidelity for dynamic malware analysis

2019-04-30

Li Chen, Carter Yagemann, Evan Downing

arXiv_AI

arXiv_AI Transfer_Learning Classification Deep_Learning Detection
Abstract

Converting malware into images followed by vision-based deep learning algorithms has shown superior threat detection efficacy compared with classical machine learning algorithms. When malware is visualized as images, visual-based interpretation schemes can also be applied to extract insights into why individual samples are classified as malicious. In this work, via two case studies of dynamic malware classification, we extend the local interpretable model-agnostic explanation algorithm to explain image-based dynamic malware classification and examine its interpretation fidelity. For both case studies, we first train deep learning models via transfer learning on malware images, demonstrate high classification effectiveness, apply an explanation method on the images, and correlate the results back to the samples to validate whether the algorithmic insights are consistent with security domain expertise. In our first case study, the interpretation framework identifies indirect calls that uniquely characterize the underlying exploit behavior of a malware family. In our second case study, the interpretation framework extracts insightful information such as cryptography-related APIs when applied on images created from API existence, but generates ambiguous interpretations on images created from API sequences and frequencies. Our findings indicate that current image-based interpretation techniques are promising for explaining vision-based malware classification. We continue to develop image-based interpretation schemes specifically for security applications.

URL

http://arxiv.org/abs/1905.00122

PDF

http://arxiv.org/pdf/1905.00122
Read All
Think Again Networks and the Delta Loss

2019-04-30

Alexandre Salle, Marcelo Prates

arXiv_CL

arXiv_CL
Abstract

This short paper introduces an abstraction called Think Again Networks (ThinkNet) which can be applied to any state-dependent function (such as a recurrent neural network).

URL

https://arxiv.org/abs/1904.11816

PDF

https://arxiv.org/pdf/1904.11816
Read All
FastContext: an efficient and scalable implementation of the ConText algorithm

2019-04-30

Jianlin Shi, John F. Hurdle

arXiv_CL

arXiv_CL
Abstract

Objective: To develop and evaluate FastContext, an efficient, scalable implementation of the ConText algorithm suitable for very large-scale clinical natural language processing. Background: The ConText algorithm performs with state-of-the-art accuracy in detecting the experiencer, negation status, and temporality of concept mentions in clinical narratives. However, the speed limitation of its current implementations hinders its use in big data processing. Methods: We developed FastContext through hashing ConText’s rules, then compared its speed and accuracy with JavaConText and GeneralConText, two widely used Java implementations. Results: FastContext ran two orders of magnitude faster and was slowed less by rule increases than the other two implementations used in this study for comparison. Additionally, FastContext consistently gained accuracy as the rules increased (the desired outcome of adding new rules), while the other two implementations did not. Conclusions: FastContext is an efficient, scalable implementation of the popular ConText algorithm, suitable for natural language applications on very large clinical corpora.

URL

http://arxiv.org/abs/1905.00079

PDF

http://arxiv.org/pdf/1905.00079
Read All
Deep Learning for Audio Signal Processing

2019-04-30

Hendrik Purwins (1), Bo Li (2), Tuomas Virtanen (3), Jan Schlüter (4 and 5), Shuo-yiin Chang (2), Tara Sainath (2) ((1) Aalborg University Copenhagen, (2) Google, (3) Tampere University, (4) Université de Toulon, (5) Austrian Research Institute for Artificial Intelligence)

arXiv_SD

arXiv_SD Review Speech_Recognition Tracking CNN Deep_Learning Detection Recognition
Abstract

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.

URL

http://arxiv.org/abs/1905.00078

PDF

http://arxiv.org/pdf/1905.00078
Read All
Show, Attend and Translate: Unsupervised Image Translation with Self-Regularization and Attention

2019-04-30

Chao Yang, Taehwan Kim, Ruizhe Wang, Hao Peng, C.-C. Jay Kuo

arXiv_CV

arXiv_CV Regularization Salient Adversarial Segmentation Attention Detection
Abstract

Image translation between two domains is a class of problems aiming to learn a mapping from an input image in the source domain to an output image in the target domain. It has been applied to numerous domains, such as data augmentation, domain adaptation, and unsupervised training. When paired training data is not accessible, image translation becomes an ill-posed problem. We constrain the problem with the assumption that the translated image needs to be perceptually similar to the original image and also appear to be drawn from the new domain, and propose a simple yet effective image translation model consisting of a single generator trained with a self-regularization term and an adversarial term. We further notice that existing image translation techniques are agnostic to the subjects of interest and often introduce unwanted changes or artifacts to the input. Thus we propose to add an attention module to predict an attention map to guide the image translation process. The module learns to attend to key parts of the image while keeping everything else unaltered, essentially avoiding undesired artifacts or changes. The predicted attention map also opens the door to applications such as unsupervised segmentation and saliency detection. Extensive experiments and evaluations show that our model, while being simpler, achieves significantly better performance than existing image translation methods.

URL

http://arxiv.org/abs/1806.06195

PDF

http://arxiv.org/pdf/1806.06195
Read All
Predicting How to Distribute Work Between Algorithms and Humans to Segment an Image Batch

2019-04-30

Danna Gurari, Yinan Zhao, Suyog Dutt Jain, Margrit Betke, Kristen Grauman

arXiv_CV

arXiv_CV Segmentation Prediction
Abstract

Foreground object segmentation is a critical step for many image analysis tasks. While automated methods can produce high-quality results, their failures disappoint users in need of practical solutions. We propose a resource allocation framework for predicting how best to allocate a fixed budget of human annotation effort in order to collect higher quality segmentations for a given batch of images and automated methods. The framework is based on a prediction module that estimates the quality of given algorithm-drawn segmentations. We demonstrate the value of the framework for two novel tasks related to predicting how to distribute annotation efforts between algorithms and humans. Specifically, we develop two systems that automatically decide, for a batch of images, when to recruit humans versus computers to create 1) coarse segmentations required to initialize segmentation tools and 2) final, fine-grained segmentations. Experiments demonstrate the advantage of relying on a mix of human and computer efforts over relying on either resource alone for segmenting objects in images coming from three diverse modalities (visible, phase contrast microscopy, and fluorescence microscopy).

URL

http://arxiv.org/abs/1905.00060

PDF

http://arxiv.org/pdf/1905.00060
Read All
Personalized Ranking in eCommerce Search

2019-04-30

Grigor Aslanyan, Aritra Mandal, Prathyusha Senthil Kumar, Amit Jaiswal, Manojkumar Rangasamy Kannadasan

arXiv_CL

arXiv_CL Embedding
Abstract

We address the problem of personalization in the context of eCommerce search. Specifically, we develop personalization ranking features that use in-session context to augment a generic ranker optimized for conversion and relevance. We use a combination of latent features learned from item co-clicks in historic sessions and content-based features that use item title and price. Personalization in search has been discussed extensively in the existing literature. The novelty of our work is combining and comparing content-based and content-agnostic features and showing that they complement each other to result in a significant improvement of the ranker. Moreover, our technique does not require an explicit re-ranking step, does not rely on learning user profiles from long term search behavior, and does not involve complex modeling of query-item-user features. Our approach captures item co-click propensity using lightweight item embeddings. We experimentally show that our technique significantly outperforms a generic ranker in terms of Mean Reciprocal Rank (MRR). We also provide anecdotal evidence for the semantic similarity captured by the item embeddings on the eBay search engine.
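Mean Reciprocal Rank, the metric named above, averages the reciprocal of the position of the first relevant item over all queries. The sketch below is a minimal plain-Python version; the function name and the toy item IDs are illustrative assumptions, not eBay data.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average 1/rank of the first relevant item per query (0 if none is found)."""
    if not ranked_lists:
        raise ValueError("need at least one query")
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for position, item in enumerate(ranking, start=1):
            if item in relevant:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(ranked_lists)


# Two toy queries: the first relevant item sits at rank 2 and at rank 1.
rankings = [["a", "b", "c"], ["x", "y"]]
relevant = [{"b"}, {"x"}]
print(mean_reciprocal_rank(rankings, relevant))  # (1/2 + 1/1) / 2 = 0.75
```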

URL

http://arxiv.org/abs/1905.00052

PDF

http://arxiv.org/pdf/1905.00052
Read All
Attentive Spatio-Temporal Representation Learning for Diving Classification

2019-04-30

Gagan Kanojia, Sudhakar Kumawat, Shanmuganathan Raman

arXiv_CV

arXiv_CV Attention Represenation_Learning RNN Classification
Abstract

Competitive diving is a well-recognized aquatic sport in which a person dives from a platform or a springboard into the water. Based on the acrobatics performed during the dive, diving is classified into a finite set of action classes which are standardized by FINA. In this work, we propose an attention guided LSTM-based neural network architecture for the task of diving classification. The network takes the frames of a diving video as input and determines its class. We evaluate the performance of the proposed model on a recently introduced competitive diving dataset, Diving48. It contains over 18,000 video clips which cover 48 classes of diving. The proposed model outperforms the classification accuracy of the state-of-the-art models in both 2D and 3D frameworks by 11.54% and 4.24%, respectively. We show that the network is able to localize the diver in the video frames during the dive without being trained with such supervision.

URL

http://arxiv.org/abs/1905.00050

PDF

http://arxiv.org/pdf/1905.00050
Read All
Photofeeler-D3: A Neural Network with Voter Modeling for Dating Photo Impression Prediction

2019-04-30

Agastya Kalra, Ben Peterson

arXiv_CV

arXiv_CV CNN Prediction
Abstract

In just a few years, online dating has become the dominant way that young people meet to date, making the deceptively error-prone task of picking good dating profile photos vital to a generation’s ability to form romantic connections. Until now, artificial intelligence approaches to Dating Photo Impression Prediction (DPIP) have been very inaccurate, unadaptable to real-world application, and have only taken into account a subject’s physical attractiveness. To that effect, we propose Photofeeler-D3 - the first convolutional neural network as accurate as 10 human votes for how smart, trustworthy, and attractive the subject appears in highly variable dating photos. Our “attractive” output is also applicable to Facial Beauty Prediction (FBP), making Photofeeler-D3 state-of-the-art for both DPIP and FBP. We achieve this by leveraging Photofeeler’s Dating Dataset (PDD) with over 1 million images and tens of millions of votes, our novel technique of voter modeling, and cutting-edge computer vision techniques.

URL

http://arxiv.org/abs/1904.07435

PDF

http://arxiv.org/pdf/1904.07435
Read All
Limits on a population of collisional-triples as progenitors of Type-Ia supernovae

2019-04-30

Na'ama Hallakoun, Dan Maoz

arXiv_CV

arXiv_CV
Abstract

The progenitor systems of Type-Ia supernovae (SNe Ia) are yet unknown. The collisional-triple SN Ia progenitor model posits that SNe Ia result from head-on collisions of binary white dwarfs (WDs), driven by dynamical perturbations by the tertiary stars in mild-hierarchical triple systems. To reproduce the Galactic SN Ia rate, some 30-55 per cent of all WDs would need to be in triple systems of a specific architecture. We test this scenario by searching the Gaia DR2 database for the postulated progenitor triples. Within a volume out to 120 pc, we search around Gaia-resolved double WDs with projected separations up to 300 au, for physical tertiary companions at projected separations out to 9000 au. At 120 pc, Gaia can detect faint low-mass tertiaries down to the bottom of the main sequence and to the coolest WDs. Around 27 double WDs, we identify zero tertiaries at such separations, setting a 95 per cent confidence upper limit of 11 per cent on the fraction of binary WDs that are part of mild hierarchical triples of the kind required by the model. As only a fraction (likely ~10 per cent) of all WDs are in <300 au WD binaries, the potential collisional-triple progenitor population appears to be at least an order of magnitude (and likely several) smaller than required by the model.
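The quoted upper limit can be sanity-checked with a simple binomial argument: if a fraction f of systems had a tertiary, the chance of finding none in N independent systems is (1 - f)^N, and setting that to 0.05 gives the 95 per cent limit. This is an assumed back-of-the-envelope model (the abstract does not describe the statistical treatment actually used), and the function name is hypothetical.

```python
def zero_detection_upper_limit(n_systems, confidence=0.95):
    """Upper limit on a fraction f when 0 of n_systems show the feature.

    Solves (1 - f) ** n_systems = 1 - confidence for f.
    """
    if n_systems <= 0:
        raise ValueError("n_systems must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_systems)


# 27 double white dwarfs with zero tertiaries found.
limit = zero_detection_upper_limit(27)
print(f"{100 * limit:.1f} per cent")
```

With N = 27 this simple estimate comes out at roughly 10.5 per cent, in line with the limit of about 11 per cent quoted in the abstract.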

URL

https://arxiv.org/abs/1905.00032

PDF

https://arxiv.org/pdf/1905.00032
Read All
OpenEDS: Open Eye Dataset

2019-04-30

Stephan J. Garbin, Yiru Shen, Immo Schuetz, Robert Cavin, Gregory Hughes, Sachin S. Talathi

arXiv_CV

arXiv_CV Segmentation Tracking Semantic_Segmentation
Abstract

We present a large scale data set, OpenEDS: Open Eye Dataset, of eye-images captured using a virtual-reality (VR) head-mounted display mounted with two synchronized eye-facing cameras at a frame rate of 200 Hz under controlled illumination. This dataset is compiled from video capture of the eye-region collected from 152 individual participants and is divided into four subsets: (i) 12,759 images with pixel-level annotations for key eye-regions: iris, pupil and sclera, (ii) 252,690 unlabelled eye-images, (iii) 91,200 frames from randomly selected video sequences of 1.5 seconds in duration and (iv) 143 pairs of left and right point cloud data compiled from corneal topography of eye regions collected from a subset (143 out of 152) of participants in the study. A baseline experiment has been evaluated on OpenEDS for the task of semantic segmentation of pupil, iris, sclera and background, with a mean intersection-over-union (mIoU) of 98.3%. We anticipate that OpenEDS will create opportunities for researchers in the eye tracking community and the broader machine learning and computer vision community to advance the state of eye-tracking for VR applications. The dataset is available for download upon request at https://research.fb.com/programs/openeds-challenge
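The mIoU figure above is the per-class intersection-over-union averaged across classes. A minimal sketch over flattened label lists follows; the function name and the tiny label arrays are illustrative assumptions, and the benchmark's own evaluation code may handle absent classes differently.

```python
def mean_iou(predicted, target, num_classes):
    """Mean intersection-over-union over classes that occur in either labelling."""
    if len(predicted) != len(target):
        raise ValueError("label sequences must have the same length")
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(predicted, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(predicted, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    if not ious:
        raise ValueError("no class present in the labels")
    return sum(ious) / len(ious)


# Four pixels, two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```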

URL

http://arxiv.org/abs/1905.03702

PDF

http://arxiv.org/pdf/1905.03702
Read All
Categorical Feature Compression via Submodular Optimization

2019-04-30

MohammadHossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab S. Mirrokni, Afshin Rostamizadeh

arXiv_AI

arXiv_AI Optimization Prediction
Abstract

In the era of big data, learning from categorical features with very large vocabularies (e.g., 28 million for the Criteo click prediction dataset) has become a practical challenge for machine learning researchers and practitioners. We design a highly-scalable vocabulary compression algorithm that seeks to maximize the mutual information between the compressed categorical feature and the target binary labels and we furthermore show that its solution is guaranteed to be within a $1-1/e \approx 63\%$ factor of the global optimal solution. To achieve this, we introduce a novel re-parametrization of the mutual information objective, which we prove is submodular, and design a data structure to query the submodular function in amortized $O(\log n )$ time (where $n$ is the input vocabulary size). Our complete algorithm is shown to operate in $O(n \log n )$ time. Additionally, we design a distributed implementation in which the query data structure is decomposed across $O(k)$ machines such that each machine only requires $O(\frac n k)$ space, while still preserving the approximation guarantee and using only logarithmic rounds of computation. We also provide analysis of simple alternative heuristic compression methods to demonstrate they cannot achieve any approximation guarantee. Using the large-scale Criteo learning task, we demonstrate better performance in retaining mutual information and also verify competitive learning performance compared to other baseline methods.
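The $1-1/e$ factor is the classic guarantee of greedy selection for monotone submodular maximization under a cardinality constraint. The sketch below shows that generic greedy rule on a toy coverage function; it is not the paper's mutual-information objective, its query data structure, or its distributed variant, and the function name and sets are illustrative assumptions.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily, each time taking the largest marginal coverage gain."""
    chosen, covered = [], set()
    for _ in range(min(k, len(sets))):
        candidates = [i for i in range(len(sets)) if i not in chosen]
        # Marginal gain = number of not-yet-covered elements the set would add.
        best = max(candidates, key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # nothing left to gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


toy_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
picked, covered = greedy_max_coverage(toy_sets, k=2)
print(picked, sorted(covered))
```

Coverage is monotone and submodular (adding a set helps less the more is already covered), which is the property that makes the greedy guarantee apply.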

URL

http://arxiv.org/abs/1904.13389

PDF

http://arxiv.org/pdf/1904.13389
Read All
Comparative evaluation of 2D feature correspondence selection algorithms

2019-04-30

Chen Zhao, Jiaqi Yang, Yang Xiao, Zhiguo Cao

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Correspondence selection, which aims to find correct feature correspondences among raw feature matches, is pivotal for a number of feature-matching-based tasks. Various 2D (image) correspondence selection algorithms have been presented with decades of progress. Unfortunately, the lack of an in-depth evaluation makes it difficult for developers to choose a proper algorithm given a specific application. This paper fills this gap by evaluating eight 2D correspondence selection algorithms ranging from classical methods to the most recent ones on four standard datasets. The diversity of experimental datasets brings various nuisances including zoom, rotation, blur, viewpoint change, JPEG compression, light change, different rendering styles and multi-structures for a comprehensive test. To further create different distributions of initial matches, a set of combinations of detector and descriptor is also taken into consideration. We measure the quality of a correspondence selection algorithm from four perspectives, i.e., precision, recall, F-measure and efficiency. According to the evaluation results, the current advantages and limitations of all considered algorithms are summarized in aggregate, which could serve as a “user guide” for developers.
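Three of the four quality measures listed above can be computed directly from the set of matches an algorithm keeps and the set of ground-truth correct matches. The following sketch assumes matches are identified by hashable IDs; the function name and the toy IDs are hypothetical.

```python
def selection_scores(selected, correct):
    """Return (precision, recall, F-measure) for a set of selected correspondences."""
    selected, correct = set(selected), set(correct)
    true_positives = len(selected & correct)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# The algorithm keeps matches 1-4; matches 2, 3 and 5 are actually correct.
print(selection_scores({1, 2, 3, 4}, {2, 3, 5}))
```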

URL

http://arxiv.org/abs/1904.13383

PDF

http://arxiv.org/pdf/1904.13383
Read All
Very Deep Self-Attention Networks for End-to-End Speech Recognition

2019-04-30

Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Muller, Alex Waibel

arXiv_CL

arXiv_CL Attention Speech_Recognition RNN Recognition
Abstract

Recently, end-to-end sequence-to-sequence models for speech recognition have gained significant interest in the research community. While previous architecture choices revolve around time-delay neural networks (TDNN) and long short-term memory (LSTM) recurrent neural networks, we propose to use self-attention via the Transformer architecture as an alternative. Our analysis shows that deep Transformer networks with high learning capacity are able to exceed the performance of previous end-to-end approaches and even match the conventional hybrid systems. Moreover, we trained very deep models with up to 48 Transformer layers for both encoders and decoders combined with stochastic residual connections, which greatly improve generalizability and training efficiency. The resulting models outperform all previous end-to-end ASR approaches on the Switchboard benchmark. An ensemble of these models achieves 9.9% and 17.7% WER on Switchboard and CallHome test sets respectively. This finding brings our end-to-end models to competitive levels with previous hybrid systems. Further, with model ensembling the Transformers can outperform certain hybrid systems, which are more complicated in terms of both structure and training procedure.
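Word error rate (WER), the metric reported above, is the word-level edit distance between hypothesis and reference divided by the number of reference words. A minimal sketch follows; the function name and example sentences are illustrative, and real scoring tools typically apply text normalization that is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between the processed reference prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            current.append(min(
                previous[j] + 1,                          # deletion
                current[j - 1] + 1,                       # insertion
                previous[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```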

URL

http://arxiv.org/abs/1904.13377

PDF

http://arxiv.org/pdf/1904.13377
Read All
The Level Weighted Structural Similarity Loss: A Step Away from the MSE

2019-04-30

Yingjing Lu

arXiv_CV

arXiv_CV CNN Relation
Abstract

The Mean Square Error (MSE) has shown its strength when applied in deep generative models such as Auto-Encoders to model reconstruction loss. However, especially in the image domain, the limitation of MSE is obvious: it assumes pixel independence and ignores spatial relationships of samples. This contradicts most architectures of Auto-Encoders which use convolutional layers to extract spatially dependent features. Building on the structural similarity metric (SSIM), we propose a novel level weighted structural similarity (LWSSIM) loss for convolutional Auto-Encoders. Experiments on common datasets on various Auto-Encoder variants show that our loss is able to outperform the MSE loss and the Vanilla SSIM loss. We also provide reasons why our model is able to succeed in cases where the standard SSIM loss fails.
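To make the contrast between MSE and SSIM concrete, the sketch below computes both on a tiny patch. It uses the standard SSIM formula over a single global window with the usual constants 0.01 and 0.03; practical SSIM uses local sliding windows, and the paper's LWSSIM weighting is not reproduced. The function names and pixel values are illustrative assumptions.

```python
def mse(x, y):
    """Mean squared error between two equally long pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def ssim_global(x, y, dynamic_range=1.0):
    """Single-window SSIM: compares means, variances and covariance of the patches."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    )


patch = [0.1, 0.5, 0.9, 0.3]
brighter = [p + 0.1 for p in patch]  # same structure, shifted brightness
inverted = [1.0 - p for p in patch]  # structure flipped
print(mse(patch, brighter), ssim_global(patch, brighter))
print(mse(patch, inverted), ssim_global(patch, inverted))
```

A uniform brightness shift keeps the structure intact and scores a high SSIM, while inverting the patch flips the sign of the covariance and drives SSIM negative; MSE only reports the size of the pixel-wise difference in each case.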

URL

http://arxiv.org/abs/1904.13362

PDF

http://arxiv.org/pdf/1904.13362
Read All
Structured Prediction using cGANs with Fusion Discriminator

2019-04-30

Faisal Mahmood, Wenhao Xu, Nicholas J. Durr, Jeremiah W. Johnson, Alan Yuille

arXiv_CV

arXiv_CV Adversarial Segmentation GAN CNN Semantic_Segmentation Prediction
Abstract

We propose the fusion discriminator, a single unified framework for incorporating conditional information into a generative adversarial network (GAN) for a variety of distinct structured prediction tasks, including image synthesis, semantic segmentation, and depth estimation. Much like commonly used convolutional neural network – conditional Markov random field (CNN-CRF) models, the proposed method is able to enforce higher-order consistency in the model, but without being limited to a very specific class of potentials. The method is conceptually simple and flexible, and our experimental results demonstrate improvement on several diverse structured prediction tasks.

URL

http://arxiv.org/abs/1904.13358

PDF

http://arxiv.org/pdf/1904.13358
Read All
Object Contour and Edge Detection with RefineContourNet

2019-04-30

Andre Peter Kelm, Vijesh Soorya Rao, Udo Zolzer

arXiv_CV

arXiv_CV Detection
Abstract

A ResNet-based multi-path refinement CNN is used for object contour detection. For this task, we prioritise the effective utilization of the high-level abstraction capability of a ResNet, which leads to state-of-the-art results for edge detection. Keeping our focus in mind, we fuse the high, mid and low-level features in that specific order, which differs from many other approaches. It uses the tensor with the highest-levelled features as the starting point to combine it layer-by-layer with features of a lower abstraction level until it reaches the lowest level. We train this network on a modified PASCAL VOC 2012 dataset for object contour detection and evaluate on a refined PASCAL-val dataset reaching an excellent performance and an Optimal Dataset Scale (ODS) of 0.752. Furthermore, by fine-training on the BSDS500 dataset we reach state-of-the-art results for edge-detection with an ODS of 0.824.

URL

http://arxiv.org/abs/1904.13353

PDF

http://arxiv.org/pdf/1904.13353
Read All
Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text

2019-04-30

Murali Karthick Baskar, Shinji Watanabe, Ramon Astudillo, Takaaki Hori, Lukáš Burget, Jan Černocký

arXiv_CL

arXiv_CL
Abstract

Sequence-to-sequence ASR models require large quantities of data to attain high performance. For this reason, there has been a recent surge in interest for self-supervised and supervised training in such models. This work builds upon recent results showing notable improvements in self-supervised training using cycle-consistency and related techniques. Such techniques derive training procedures and losses able to leverage unpaired speech and/or text data by combining ASR with text-to-speech (TTS) models. In particular, this work proposes a new self-supervised loss combining an end-to-end differentiable ASR$\rightarrow$TTS loss with a point estimate TTS$\rightarrow$ASR loss. The method is able to leverage both unpaired speech and text data to outperform recently proposed related techniques in terms of %WER. We provide extensive results analyzing the impact of data quantity and speech and text modalities and show consistent gains across WSJ and Librispeech corpora. Our code is provided to reproduce the experiments.

URL

http://arxiv.org/abs/1905.01152

PDF

http://arxiv.org/pdf/1905.01152
Read All
PYRO-NN: Python Reconstruction Operators in Neural Networks

2019-04-30

Christopher Syben, Markus Michen, Bernhard Stimpel, Stephan Seitz, Stefan Ploner, Andreas K. Maier

arXiv_CV

arXiv_CV Embedding Deep_Learning
Abstract

Purpose: Recently, several attempts have been made to transfer deep learning to medical image reconstruction. An increasing number of publications follow the concept of embedding the CT reconstruction as a known operator into a neural network. However, most of the approaches presented lack an efficient CT reconstruction framework fully integrated into deep learning environments. As a result, many approaches are forced to use workarounds for mathematically unambiguously solvable problems. Methods: PYRO-NN is a generalized framework to embed known operators into the prevalent deep learning framework Tensorflow. The current status includes state-of-the-art parallel-, fan- and cone-beam projectors and back-projectors accelerated with CUDA provided as Tensorflow layers. On top of this, the framework provides a high level Python API to conduct FBP and iterative reconstruction experiments with data from real CT systems. Results: The framework provides all necessary algorithms and tools to design end-to-end neural network pipelines with integrated CT reconstruction algorithms. The high level Python API allows a simple use of the layers as known from Tensorflow. To demonstrate the capabilities of the layers, the framework comes with three baseline experiments showing a cone-beam short scan FDK reconstruction, a CT reconstruction filter learning setup, and a TV regularized iterative reconstruction. All algorithms and tools are referenced to a scientific publication and are compared to existing non-deep-learning reconstruction frameworks. The framework is available as open-source software at https://github.com/csyben/PYRO-NN. Conclusions: PYRO-NN comes with the prevalent deep learning framework Tensorflow and allows setting up end-to-end trainable neural networks in the medical image reconstruction context. We believe that the framework will be a step towards reproducible research.

URL

http://arxiv.org/abs/1904.13342

PDF

http://arxiv.org/pdf/1904.13342
Read All
Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

2019-04-30

Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong

arXiv_AI

arXiv_AI Regularization Adversarial Speech_Recognition Classification Recognition
Abstract

Unsupervised domain adaptation of speech signals aims at adapting a well-trained source-domain acoustic model to the unlabeled data from the target domain. This can be achieved by adversarial training of deep neural network (DNN) acoustic models to learn an intermediate deep representation that is both senone-discriminative and domain-invariant. Specifically, the DNN is trained to jointly optimize the primary task of senone classification and the secondary task of domain classification with adversarial objective functions. In this work, instead of only focusing on learning a domain-invariant feature (i.e. the shared component between domains), we also characterize the difference between the source and target domain distributions by explicitly modeling the private component of each domain through a private component extractor DNN. The private component is trained to be orthogonal with the shared component and thus implicitly increases the degree of domain-invariance of the shared component. A reconstructor DNN is used to reconstruct the original speech feature from the private and shared components as a regularization. This domain separation framework is applied to the unsupervised environment adaptation task and achieves an 11.08% relative WER reduction from the gradient reversal layer training, a representative adversarial training method, for automatic speech recognition on the CHiME-3 dataset.
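For readers unfamiliar with the metric, a relative WER reduction is conventionally computed against a baseline system as shown below. The symbols are illustrative; the abstract does not state the absolute WERs behind the reported percentage.

```latex
\[
\text{relative WER reduction}
  = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{new}}}{\mathrm{WER}_{\text{baseline}}} \times 100\%
\]
```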

URL

http://arxiv.org/abs/1711.08010

PDF

http://arxiv.org/pdf/1711.08010
Read All
Coevo: a collaborative design platform with artificial agents

2019-04-30

Gerard Serra, David Miralles

arXiv_AI

arXiv_AI Knowledge
Abstract

We present Coevo, an online platform that allows both humans and artificial agents to design shapes that solve different tasks. Our goal is to explore common shared design tools that can be used by humans and artificial agents in a context of creation. This approach can provide a better knowledge transfer and interaction with artificial agents since a common language of design is defined. In this paper, we outline the main components of this platform and discuss the definition of a human-centered language to enhance human-AI collaboration in co-creation scenarios.

URL

http://arxiv.org/abs/1904.13333

PDF

http://arxiv.org/pdf/1904.13333
Read All
Adversarial Feature-Mapping for Speech Enhancement

2019-04-30

Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang

arXiv_AI

arXiv_AI Adversarial Classification
Abstract

Feature-mapping with deep neural networks is commonly used for single-channel speech enhancement, in which a feature-mapping network directly transforms the noisy features to the corresponding enhanced ones and is trained to minimize the mean square errors between the enhanced and clean features. In this paper, we propose an adversarial feature-mapping (AFM) method for speech enhancement which advances the feature-mapping approach with adversarial learning. An additional discriminator network is introduced to distinguish the enhanced features from the real clean ones. The two networks are jointly optimized to minimize the feature-mapping loss and simultaneously min-maximize the discrimination loss. The distribution of the enhanced features is further pushed towards that of the clean features through this adversarial multi-task training. To achieve better performance on the ASR task, senone-aware (SA) AFM is further proposed in which an acoustic model network is jointly trained with the feature-mapping and discriminator networks to optimize the senone classification loss in addition to the AFM losses. Evaluated on the CHiME-3 dataset, the proposed AFM achieves 16.95% and 5.27% relative word error rate (WER) improvements over the real noisy data and the feature-mapping baseline respectively and the SA-AFM achieves 9.85% relative WER improvement over the multi-conditional acoustic model.

URL

http://arxiv.org/abs/1809.02251

PDF

http://arxiv.org/pdf/1809.02251
Cycle-Consistent Speech Enhancement

2019-04-30

Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang

arXiv_CL

arXiv_CL Adversarial
Abstract

Feature mapping using deep neural networks is an effective approach for single-channel speech enhancement. Noisy features are transformed to the enhanced ones through a mapping network and the mean square errors between the enhanced and clean features are minimized. In this paper, we propose cycle-consistent speech enhancement (CSE), in which an additional inverse mapping network is introduced to reconstruct the noisy features from the enhanced ones. A cycle-consistent constraint is enforced to minimize the reconstruction loss. Similarly, a backward cycle of mappings is performed in the opposite direction with the same networks and losses. With cycle-consistency, the speech structure is well preserved in the enhanced features while noise is effectively reduced such that the feature-mapping network generalizes better to unseen data. In cases where only non-parallel noisy and clean data is available for training, two discriminator networks are used to distinguish the enhanced and noised features from the clean and noisy ones. The discrimination losses are jointly optimized with reconstruction losses through adversarial multi-task learning. Evaluated on the CHiME-3 dataset, the proposed CSE achieves 19.60% and 6.69% relative word error rate improvements, respectively, with and without parallel clean and noisy speech data.
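The forward cycle described in this abstract (noisy to enhanced, then back to noisy) can be sketched as a mapping loss plus a reconstruction loss. This is only an illustration: `forward_map` and `inverse_map` are hypothetical linear stand-ins for the two networks, and the `cycle_weight` parameter is an assumption, not something the abstract specifies.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def forward_map(noisy):
    """Hypothetical stand-in for the enhancement (noisy -> enhanced) network."""
    return [0.5 * v for v in noisy]


def inverse_map(enhanced):
    """Hypothetical stand-in for the inverse (enhanced -> noisy) network."""
    return [2.0 * v for v in enhanced]


def forward_cycle_loss(noisy, clean, cycle_weight=1.0):
    """Feature-mapping loss plus weighted cycle-consistency reconstruction loss."""
    enhanced = forward_map(noisy)
    reconstructed = inverse_map(enhanced)
    mapping_loss = mse(enhanced, clean)    # enhanced features should match clean ones
    cycle_loss = mse(reconstructed, noisy)  # round trip should recover the noisy input
    return mapping_loss + cycle_weight * cycle_loss


print(forward_cycle_loss(noisy=[2.0, 4.0], clean=[1.0, 2.0]))
```

The backward cycle mentioned in the abstract would apply the same two mappings in the opposite order, starting from clean features.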

URL

http://arxiv.org/abs/1809.02253

PDF

http://arxiv.org/pdf/1809.02253
Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation

2019-04-30

Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang

arXiv_CL

arXiv_CL Adversarial Knowledge Transfer_Learning Classification Recognition
Abstract

Teacher-student (T/S) learning has been shown to be effective in unsupervised domain adaptation [1]. It is a form of transfer learning, not in terms of the transfer of recognition decisions, but of the knowledge of posterior probabilities in the source domain as evaluated by the teacher model. It learns to handle the speaker and environment variability inherent in and restricted to the speech signal in the target domain without proactively addressing the robustness to other likely conditions. Performance degradation may thus ensue. In this work, we advance T/S learning by proposing adversarial T/S learning to explicitly achieve condition-robust unsupervised domain adaptation. In this method, a student acoustic model and a condition classifier are jointly optimized to minimize the Kullback-Leibler divergence between the output distributions of the teacher and student models, and simultaneously, to min-maximize the condition classification loss. A condition-invariant deep feature is learned in the adapted student model through this procedure. We further propose multi-factorial adversarial T/S learning which suppresses condition variabilities caused by multiple factors simultaneously. Evaluated with the noisy CHiME-3 test set, the proposed methods achieve relative word error rate improvements of 44.60% and 5.38%, respectively, over a clean source model and a strong T/S learning baseline model.
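The T/S part of the objective in this abstract is a Kullback-Leibler divergence between the teacher's and the student's output distributions. A minimal sketch of that quantity for a single frame follows; the function name, the `eps` clamp, and the example posteriors are assumptions made for illustration.

```python
import math


def kl_divergence(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for two discrete distributions over the same classes.

    Terms with zero teacher probability contribute nothing; student
    probabilities are clamped to eps to avoid log(0).
    """
    total = 0.0
    for t, s in zip(teacher_probs, student_probs):
        if t > 0.0:
            total += t * math.log(t / max(s, eps))
    return total


# Hypothetical posteriors for one frame (not data from the paper).
teacher = [0.7, 0.2, 0.1]
student = [0.5, 0.3, 0.2]
print(kl_divergence(teacher, student))
```

The divergence is zero when the student reproduces the teacher's posteriors exactly and grows as the two distributions move apart.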

URL

http://arxiv.org/abs/1804.00644

PDF

http://arxiv.org/pdf/1804.00644
Learning from Implicit Information in Natural Language Instructions for Robotic Manipulations

2019-04-30

Ozan Arkan Can, Pedro Zuidberg Dos Martires, Andreas Persson, Julian Gaal, Amy Loutfi, Luc De Raedt, Deniz Yuret, Alessandro Saffiotti

arXiv_AI

arXiv_AI Relation
Abstract

Human-robot interaction often occurs in the form of instructions given from a human to a robot. For a robot to successfully follow instructions, a common representation of the world and objects in it should be shared between humans and the robot so that the instructions can be grounded. Achieving this representation can be done via learning, where both the world representation and the language grounding are learned simultaneously. However, in robotics this can be a difficult task due to the cost and scarcity of data. In this paper, we tackle the problem by separately learning the world representation of the robot and the language grounding. While this approach can address the challenges in getting sufficient data, it may give rise to inconsistencies between both learned components. Therefore, we further propose Bayesian learning to resolve such inconsistencies between the natural language grounding and a robot’s world representation by exploiting spatio-relational information that is implicitly present in instructions given by a human. Moreover, we demonstrate the feasibility of our approach on a scenario involving a robotic arm in the physical world.

URL

http://arxiv.org/abs/1904.13324

PDF

http://arxiv.org/pdf/1904.13324
Model Comparison for Semantic Grouping

2019-04-30

Francisco Vargas, Kamen Brestnichki, Nils Hammerla

arXiv_CL

arXiv_CL Embedding
Abstract

We introduce a probabilistic framework for quantifying the semantic similarity between two groups of embeddings. We formulate the task of semantic similarity as a model comparison task in which we contrast a generative model which jointly models two sentences versus one that does not. We illustrate how this framework can be used for the Semantic Textual Similarity tasks using clear assumptions about how the embeddings of words are generated. We apply model comparison that utilises information criteria to address some of the shortcomings of Bayesian model comparison, whilst still penalising model complexity. We achieve competitive results by applying the proposed framework with an appropriate choice of likelihood on the STS datasets.
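The idea of scoring similarity by comparing a joint model against two separate models with an information criterion can be illustrated on toy data. The sketch below is not the paper's method: it assumes a univariate Gaussian likelihood, uses the Bayesian information criterion (BIC), and works on scalar values rather than word embeddings. All function names are hypothetical.

```python
import math


def gaussian_max_log_likelihood(values, var_floor=1e-6):
    """Log-likelihood of the data under a Gaussian fitted by maximum likelihood."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum((v - mean) ** 2 for v in values) / n, var_floor)
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def bic(log_likelihood, num_params, num_points):
    """Bayesian information criterion; lower values indicate a preferred model."""
    return num_params * math.log(num_points) - 2.0 * log_likelihood


def similarity_score(group_a, group_b):
    """BIC of the separate model minus BIC of the joint model.

    Positive scores mean one shared Gaussian explains both groups better
    once model complexity is penalised.
    """
    n = len(group_a) + len(group_b)
    joint = bic(gaussian_max_log_likelihood(group_a + group_b), 2, n)
    separate = bic(
        gaussian_max_log_likelihood(group_a) + gaussian_max_log_likelihood(group_b),
        4,
        n,
    )
    return separate - joint


print(similarity_score([0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5]))
print(similarity_score([0.0, 1.0, 2.0, 3.0], [100.0, 101.0, 102.0, 103.0]))
```

Overlapping groups give a positive score and well-separated groups a negative one, which is the behaviour a similarity measure of this kind needs.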

URL

http://arxiv.org/abs/1904.13323

PDF

http://arxiv.org/pdf/1904.13323
Speaker-Invariant Training via Adversarial Learning

2019-04-30

Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang (Fred) Juang

arXiv_AI

arXiv_AI Adversarial Classification
Abstract

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system. We call the scheme speaker-invariant training (SIT). In SIT, a DNN acoustic model and a speaker classifier network are jointly optimized to minimize the senone (tied triphone state) classification loss, and simultaneously mini-maximize the speaker classification loss. A speaker-invariant and senone-discriminative deep feature is learned through this adversarial multi-task learning. With SIT, a canonical DNN acoustic model with significantly reduced variance in its output probabilities is learned with no explicit speaker-independent (SI) transformations or speaker-specific representations used in training or testing. Evaluated on the CHiME-3 dataset, the SIT achieves 4.99% relative word error rate (WER) improvement over the conventional SI acoustic model. With additional unsupervised speaker adaptation, the speaker-adapted (SA) SIT model achieves 4.86% relative WER gain over the SA SI acoustic model.
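The two competing objectives in this abstract can be written down for a single frame. In this sketch the acoustic model minimizes the senone loss while maximizing the speaker loss, and the speaker classifier minimizes the speaker loss. The trade-off weight `lam`, the function names, and the toy probabilities are assumptions; the abstract does not give the exact weighting.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])


def adversarial_objectives(senone_probs, senone_label,
                           speaker_probs, speaker_label, lam=0.5):
    """Return (acoustic_model_objective, speaker_classifier_objective).

    lam is a hypothetical trade-off weight between the two losses.
    """
    senone_loss = cross_entropy(senone_probs, senone_label)
    speaker_loss = cross_entropy(speaker_probs, speaker_label)
    # The acoustic model wants low senone loss and HIGH speaker loss.
    acoustic_model_objective = senone_loss - lam * speaker_loss
    # The speaker classifier simply wants low speaker loss.
    speaker_classifier_objective = speaker_loss
    return acoustic_model_objective, speaker_classifier_objective


# Toy posteriors for one frame (not data from the paper).
am_obj, spk_obj = adversarial_objectives(
    senone_probs=[0.8, 0.1, 0.1], senone_label=0,
    speaker_probs=[0.5, 0.5], speaker_label=1,
)
print(am_obj, spk_obj)
```

Both objectives are minimized by their respective networks; the opposing sign on the speaker loss is what makes the training adversarial.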

URL

http://arxiv.org/abs/1804.00732

PDF

http://arxiv.org/pdf/1804.00732
Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support

2019-04-30

Gokce Keskin, Tyler Lee, Cory Stephenson, Oguz H. Elibol

arXiv_CL

arXiv_CL GAN Embedding
Abstract

We present a Cycle-GAN based many-to-many voice conversion method that can convert between speakers that are not in the training set. This property is enabled through speaker embeddings generated by a neural network that is jointly trained with the Cycle-GAN. In contrast to prior work in this domain, our method enables conversion between an out-of-dataset speaker and a target speaker in either direction and does not require re-training. Out-of-dataset speaker conversion quality is evaluated using an independently trained speaker identification model, and shows good style conversion characteristics for previously unheard speakers. Subjective tests on human listeners show style conversion quality for in-dataset speakers is comparable to the state-of-the-art baseline model.

URL

https://arxiv.org/abs/1905.02525

PDF

https://arxiv.org/pdf/1905.02525