In this research, Artificial Neural Networks (ANNs) have been used as a powerful tool to solve the inverse kinematic equations of a parallel robot. For this purpose, we have developed the kinematic equations of a Tricept parallel kinematic mechanism with two rotational and one translational degrees of freedom (DoF). Using the analytical method, the inverse kinematic equations are solved for a specific trajectory and used as inputs for the applied ANNs. Both applied networks (Multi-Layer Perceptron and Radial Basis Function) met the required performance, solving the complex inverse kinematics with adequate accuracy and speed.
http://arxiv.org/abs/1904.04668
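A minimal sketch of this pipeline, assuming training pairs generated from an analytical forward model; the forward map `fk` below is a hypothetical stand-in, not the Tricept kinematics, and the layer sizes are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def fk(q):
    """Hypothetical forward kinematics: joint variables -> end-effector pose."""
    return np.column_stack([np.sin(q[:, 0]) + q[:, 2],
                            np.cos(q[:, 1]) * q[:, 2],
                            q[:, 0] + q[:, 1]])

# analytical IK solutions along sampled motions provide (pose, joints) pairs
q_train = rng.uniform(-1.0, 1.0, size=(5000, 3))   # 2 rotations + 1 translation
x_train = fk(q_train)

# the MLP learns the inverse map pose -> joints; an RBF net is used analogously
ik_net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
ik_net.fit(x_train, q_train)

q_test = rng.uniform(-1.0, 1.0, size=(100, 3))
err = np.abs(ik_net.predict(fk(q_test)) - q_test).mean()
print(f"mean absolute joint error: {err:.4f}")
```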
Recent advances in automatic depression detection mostly derive from modality fusion and deep learning methods. However, multi-modal approaches introduce significant difficulty in the data collection phase, while the opaqueness of deep learning methods lowers their credibility. This work proposes a text-based multi-task BLSTM model with pretrained word embeddings. Our method outputs depression presence results as well as a predicted severity score, culminating in a state-of-the-art F1 score of 0.87, outperforming previous multi-modal studies. We also achieve the lowest RMSE compared with currently available text-based approaches. Further, by utilizing a per-time-step attention mechanism we analyse the sentences/words that contribute most to predicting the depressed state. Surprisingly, 'unmeaningful' words/paralinguistic information such as 'um' and 'uh' are the indicators for our model when making a depression prediction. This reveals for the first time that fillers in a conversation trigger a depression alert for a deep learning model.
http://arxiv.org/abs/1904.05154
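A minimal PyTorch sketch of a multi-task BLSTM with per-time-step attention in the spirit of the model above; dimensions, the attention form, and the two output heads are illustrative assumptions rather than the authors' exact architecture:

```python
import torch
import torch.nn as nn

class DepressionBLSTM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # could load pretrained vectors
        self.blstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)           # one score per time step
        self.cls_head = nn.Linear(2 * hidden, 2)       # depression present / absent
        self.reg_head = nn.Linear(2 * hidden, 1)       # severity score

    def forward(self, tokens):
        h, _ = self.blstm(self.emb(tokens))                  # (B, T, 2H)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # (B, T) attention
        ctx = torch.bmm(a.unsqueeze(1), h).squeeze(1)        # weighted summary
        return self.cls_head(ctx), self.reg_head(ctx), a

model = DepressionBLSTM()
logits, severity, attn = model(torch.randint(0, 10000, (4, 25)))
print(logits.shape, severity.shape, attn.shape)  # inspect attn for salient words
```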
Reinforcement learning combined with deep neural networks has recently performed remarkably well in many game genres. It surpassed human-level performance in fixed game environments and turn-based two-player board games. However, to the best of our knowledge, no research has shown results surpassing human level in modern complex fighting games. This is due to the inherent difficulties of modern fighting games, including vast action spaces, real-time constraints, and the performance generalization required to handle various opponents. We overcame these challenges and built 1v1 battle AI agents for the commercial game, “Blade & Soul”. The trained agents competed against five professional gamers and achieved a 62% win rate. This paper presents a practical reinforcement learning method including a novel self-play curriculum and data skipping techniques. Through the curriculum, three different styles of agents are created by reward shaping and trained against each other for robust performance. Additionally, this paper proposes data skipping techniques which increase data efficiency and facilitate exploration in vast spaces.
http://arxiv.org/abs/1904.03821
Soft robots are welcomed in many robotic applications because of their high flexibility, which also poses a long-standing challenge for their proprioception, i.e., measuring the real-time 3D shapes of the soft robots from internal sensors. The challenge exists in both sensor design and robot modeling. In this paper, we propose a framework to measure the real-time high-resolution 3D shapes of soft robots. The framework is based on an embedded camera that captures the inside/outside patterns of the robots under different loading conditions, and a CNN that produces a latent code representing the robot state, which can then be used to reconstruct the 3D shape using a neural network improved from FoldingNet. We tested the framework on four different soft actuators with various kinds of deformations, and achieved real-time computation (<2 ms/frame) for robust shape estimation of high precision (<5% relative error for 2025 points) at an arbitrary resolution. We believe the method could be widely applied to different designs of soft robots for proprioception, enabling people to better control them in complicated environments. Our code is available at https://ai4ce.github.io/Deep-Soft-Prorioception/.
http://arxiv.org/abs/1904.03820
In recent years, there has been increasing demand for automatic architecture search in deep learning. Numerous approaches have been proposed and have led to state-of-the-art results in various applications, including image classification and language modeling. In this paper, we propose a novel way of architecture search by means of weighted networks (WeNet), which consist of a number of networks, each assigned a weight. These weights are updated with back-propagation to reflect the importance of different networks. Such weighted networks bear similarity to a mixture of experts. We conduct experiments on Penn Treebank and WikiText-2. We show that the proposed WeNet can find recurrent architectures which result in state-of-the-art performance.
http://arxiv.org/abs/1904.03819
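A hedged sketch of the weighted-networks idea: several candidate networks, each assigned a learnable scalar weight updated by back-propagation. The candidate modules below are toy placeholders, not the recurrent cells the paper searches over:

```python
import torch
import torch.nn as nn

class WeightedNet(nn.Module):
    def __init__(self, candidates):
        super().__init__()
        self.candidates = nn.ModuleList(candidates)
        # one learnable weight per candidate network, softmax-normalized
        self.logits = nn.Parameter(torch.zeros(len(candidates)))

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)
        return sum(wi * net(x) for wi, net in zip(w, self.candidates))

candidates = [nn.Sequential(nn.Linear(16, 16), nn.ReLU()),
              nn.Sequential(nn.Linear(16, 16), nn.Tanh()),
              nn.Linear(16, 16)]
model = WeightedNet(candidates)
y = model(torch.randn(8, 16)).sum()
y.backward()              # gradients flow into the architecture weights
print(model.logits.grad)  # their magnitudes reflect candidate importance
```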
We tackle the problem of automatic portrait matting on mobile devices. The proposed model is aimed at attaining real-time inference on mobile devices with minimal degradation of model performance. Our model MMNet, based on multi-branch dilated convolution with linear bottleneck blocks, outperforms the state-of-the-art model and is orders of magnitude faster. The model can be accelerated four times to attain 30 FPS on a Xiaomi Mi 5 device with a moderate increase in the gradient error. Under the same conditions, our model has an order of magnitude fewer parameters and is faster than Mobile DeepLabv3 while maintaining comparable performance. The accompanying implementation can be found at https://github.com/hyperconnect/MMNet.
http://arxiv.org/abs/1904.03816
Robots must cost less and be force-controlled to enable widespread, safe deployment in unconstrained human environments. We propose Quasi-Direct Drive actuation as a capable paradigm for low-cost, force-controlled robotic manipulation in human environments. Our prototype, Blue, is a human-scale 7-Degree-of-Freedom arm with a 2 kg payload. Blue can cost less than $5000. We show that Blue has dynamic properties that meet or exceed the needs of human operators: the robot has a nominal position-control bandwidth of 7.5 Hz and repeatability within 4 mm. We demonstrate a Virtual Reality based interface that can be used as a method for telepresence and for collecting robot training demonstrations. Manufacturability, scaling, and potential use-cases for the Blue system are also addressed. Videos and additional information can be found online at berkeleyopenarms.github.io
http://arxiv.org/abs/1904.03815
Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide adoption of convolutional neural networks (CNNs) in KWS systems due to their exceptional accuracy and robustness. The main challenge faced by KWS systems is the trade-off between high accuracy and low latency. Unfortunately, there has been little quantitative analysis of the actual latency of KWS models on mobile devices. This is especially concerning since conventional convolution-based KWS approaches are known to require a large number of operations to attain an adequate level of performance. In this paper, we propose a temporal convolution for real-time KWS on mobile devices. Unlike most 2D convolution-based KWS approaches that require a deep architecture to fully capture both low- and high-frequency domains, we exploit temporal convolutions with a compact ResNet architecture. On the Google Speech Commands dataset, we achieve more than 385x speedup on a Google Pixel 1 and surpass the accuracy of the state-of-the-art model. In addition, we release the implementation of the proposed and baseline models, including an end-to-end pipeline for training models and evaluating them on mobile devices.
http://arxiv.org/abs/1904.03814
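A minimal sketch of a temporal-convolution residual block for KWS, where per-frame spectral features (e.g. MFCCs) are treated as channels so a 1D convolution along time already spans all frequency bins; kernel size and width are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class TemporalResBlock(nn.Module):
    def __init__(self, channels, k=9):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, k, padding=k // 2)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, k, padding=k // 2)
        self.bn2 = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x):                       # x: (batch, features, time)
        h = self.act(self.bn1(self.conv1(x)))
        return self.act(x + self.bn2(self.conv2(h)))  # residual over time

# 40 MFCC features treated as channels; convolution slides along time only
block = TemporalResBlock(40)
print(block(torch.randn(2, 40, 98)).shape)      # -> torch.Size([2, 40, 98])
```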
Accurate and robust visual localization under a wide range of viewing condition variations, including season and illumination changes as well as weather and day-night variations, is a key component of many computer vision and robotics applications. Under these conditions, most traditional methods would fail to locate the camera. In this paper we present a visual localization algorithm that combines a structure-based method and an image-based method with semantic information. Given semantic information about the query and database images, the retrieved images are scored according to the semantic consistency of the 3D model and the query image. The semantic matching score is then used as a weight for RANSAC’s sampling, and the pose is solved by a standard PnP solver. Experiments on the challenging long-term visual localization benchmark dataset demonstrate that our method improves significantly on the state of the art.
http://arxiv.org/abs/1904.03803
The lack of code-switch training data is one of the major concerns in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the distributions of the output token embeddings of monolingual languages to be similar, and hence promotes the ASR model to easily code-switch between languages. Specifically, we propose to use Jensen-Shannon divergence and cosine distance based constraints. The former enforces the output embeddings of monolingual languages to possess similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the high effectiveness of the proposed method, yielding up to 4.5% absolute mixed error rate improvement on a Mandarin-English code-switching ASR task.
http://arxiv.org/abs/1904.03802
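A hedged sketch of the two constraints, applied to the output token-embedding matrices of two monolingual languages. How the embedding sets are reduced to discrete distributions below (softmax over per-dimension statistics) is an illustrative assumption, not necessarily the paper's estimator; the cosine term on centroids follows the description directly:

```python
import torch

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (a.add(eps).log() - b.add(eps).log())).sum()
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def embedding_constraints(emb_a, emb_b):
    # (1) JS term: compare per-dimension energy profiles of the two matrices
    p = torch.softmax(emb_a.pow(2).mean(0), dim=0)
    q = torch.softmax(emb_b.pow(2).mean(0), dim=0)
    js = js_divergence(p, q)
    # (2) cosine term: pull the two embedding centroids together
    cos = 1 - torch.nn.functional.cosine_similarity(
        emb_a.mean(0), emb_b.mean(0), dim=0)
    return js, cos

emb_zh = torch.randn(3000, 256, requires_grad=True)  # Mandarin output embeddings
emb_en = torch.randn(1000, 256, requires_grad=True)  # English output embeddings
js, cos = embedding_constraints(emb_zh, emb_en)
(js + cos).backward()        # both terms are added to the ASR training loss
print(float(js), float(cos))
```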
Neural language models (NLMs) achieve strong generalization capability by learning dense representations of words and using them to estimate probability distribution functions. However, learning the representation of rare words is a challenging problem, causing the NLM to produce unreliable probability estimates. To address this problem, we propose a method to enrich representations of rare words in a pre-trained NLM and consequently improve its probability estimation performance. The proposed method augments the word embedding matrices of the pre-trained NLM while keeping other parameters unchanged. Specifically, our method updates the embedding vectors of rare words using the embedding vectors of other semantically and syntactically similar words. To evaluate the proposed method, we enrich the rare street names in a pre-trained NLM and use it to rescore the 100-best hypotheses output from the Singapore English speech recognition system. The enriched NLM reduces the word error rate by 6% relative and improves the recognition accuracy of the rare words by 16% absolute compared to the baseline NLM.
http://arxiv.org/abs/1904.03799
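A minimal sketch of the enrichment step: the embedding vector of a rare word is pulled toward a similarity-weighted average of the vectors of related words, with all other NLM parameters untouched. The blending factor `alpha` and the similarity scores are hypothetical inputs:

```python
import numpy as np

def enrich_rare_embedding(emb, rare_id, similar_ids, sim_scores, alpha=0.5):
    """Blend a rare word's vector with its similar words' vectors in place."""
    w = np.asarray(sim_scores, dtype=float)
    w /= w.sum()                                   # normalize similarity weights
    neighbor_mean = (w[:, None] * emb[similar_ids]).sum(axis=0)
    emb[rare_id] = (1 - alpha) * emb[rare_id] + alpha * neighbor_mean
    return emb

rng = np.random.default_rng(0)
emb = rng.normal(size=(50000, 300))                # pretrained NLM embedding matrix
# e.g. a rare street name, enriched from semantically/syntactically close words
emb = enrich_rare_embedding(emb, rare_id=41234,
                            similar_ids=[120, 980, 4411],
                            sim_scores=[0.9, 0.7, 0.6])
```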
We present FoveaBox, an accurate, flexible and completely anchor-free framework for object detection. While almost all state-of-the-art object detectors utilize predefined anchors to enumerate possible locations, scales and aspect ratios for the search of objects, their performance and generalization ability are also limited by the design of anchors. Instead, FoveaBox directly learns the object existence possibility and the bounding box coordinates without anchor reference. This is achieved by: (a) predicting category-sensitive semantic maps for the object existence possibility, and (b) producing a category-agnostic bounding box for each position that potentially contains an object. The scales of target boxes are naturally associated with feature pyramid representations for each input image. Without bells and whistles, FoveaBox achieves state-of-the-art single-model performance of 42.1 AP on the standard COCO detection benchmark. Especially for objects with arbitrary aspect ratios, FoveaBox brings significant improvement over anchor-based detectors. More surprisingly, when challenged with stretched testing images, FoveaBox shows great robustness and generalization ability to the changed distribution of bounding box shapes. The code will be made publicly available.
http://arxiv.org/abs/1904.03797
This paper summarizes several follow-up contributions for improving our submitted NWPU speaker-dependent system for the CHiME-5 challenge, which aims to solve the problem of multi-channel, highly-overlapped conversational speech recognition in a dinner party scenario with reverberations and non-stationary noises. We adopt a speaker-aware training method by using i-vectors as the target speaker information for multi-talker speech separation. With only one unified separation model for all speakers, we achieve a 10% absolute improvement in terms of word error rate (WER) over the previous baseline of 80.28% on the development set by leveraging our newly proposed data processing techniques and beamforming approach. With our improved back-end acoustic model, we further reduce the WER to 60.15%, which surpasses the result of our submitted CHiME-5 challenge system without applying any fusion techniques.
http://arxiv.org/abs/1904.03792
Typically, a salient object detection (SOD) model faces opposite requirements in processing object interiors and boundaries. The features of interiors should be invariant to strong appearance change so as to pop out the salient object as a whole, while the features of boundaries should be selective to slight appearance change to distinguish salient objects from the background. To address this selectivity-invariance dilemma, we propose a novel boundary-aware network with successive dilation for image-based SOD. In this network, the feature selectivity at boundaries is enhanced by incorporating a boundary localization stream, while the feature invariance at interiors is guaranteed with a complex interior perception stream. Moreover, a transition compensation stream is adopted to amend probable failures in transitional regions between interiors and boundaries. In particular, an integrated successive dilation module is proposed to enhance the feature invariance at interiors and transitional regions. Extensive experiments on six datasets show that the proposed approach outperforms 16 state-of-the-art methods.
https://arxiv.org/abs/1812.10066
This paper proposes a determined blind source separation method using Bayesian non-parametric modelling of sources. Conventionally, source signals are separated from a given set of mixture signals by modelling them using non-negative matrix factorization (NMF). However, in NMF, a latent variable signifying model complexity must be appropriately specified to avoid over-fitting or under-fitting. As real-world sources can be of varying and unknown complexities, we propose a Bayesian non-parametric framework which is invariant to such latent variables. We show that our proposed method adapts to different source complexities, while conventional methods require parameter tuning for optimal separation.
http://arxiv.org/abs/1904.03787
The design of neural network architectures is frequently either based on human expertise using trial and error and empirical feedback, or tackled via large-scale reinforcement learning strategies run over distinct discrete architecture choices. In the latter case, the optimization task is non-differentiable and also not very amenable to derivative-free optimization methods. Most methods in use today require exorbitant computational resources. And if we want networks that additionally satisfy resource constraints, these challenges are exacerbated because the search procedure must now balance accuracy with budget constraints on resources. We formulate this problem as the optimization of a set function; we find that the empirical behavior of this set function often (but not always) satisfies marginal gain and monotonicity principles, properties central to the idea of submodularity. Based on this observation, we adapt algorithms that are well known within discrete optimization to obtain heuristic schemes for neural network architecture search with resource constraints on the architecture. This simple scheme, when applied on CIFAR-100 and ImageNet, identifies resource-constrained architectures with quantifiably better performance than current state-of-the-art models designed for mobile devices. Specifically, we find high-performing architectures with fewer parameters and computations via a much faster search method.
http://arxiv.org/abs/1904.03786
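A hedged sketch of such a greedy marginal-gain scheme: grow an architecture by repeatedly adding the candidate operation with the best accuracy gain per unit of resource cost, subject to a budget. The `evaluate` function is a toy stand-in for training and validating a candidate network, which dominates the real cost:

```python
import random

CANDIDATES = [("conv3x3", 30), ("conv5x5", 70), ("sep_conv", 20),
              ("skip", 1), ("pool", 2)]            # (op, resource cost)

def evaluate(arch):
    """Stand-in validation accuracy of an architecture (list of ops)."""
    random.seed(str(arch))                         # deterministic toy proxy
    return min(0.99, 0.5 + 0.04 * len(arch) + random.random() * 0.05)

def greedy_search(budget):
    arch, spent, score = [], 0, evaluate([])
    while True:
        best = None
        for op, cost in CANDIDATES:
            if spent + cost > budget:
                continue
            gain = evaluate(arch + [op]) - score   # marginal gain of adding op
            if gain > 0 and (best is None or gain / cost > best[0]):
                best = (gain / cost, op, cost)
        if best is None:                           # no affordable improving op
            return arch, score
        _, op, cost = best
        arch.append(op); spent += cost; score = evaluate(arch)

arch, acc = greedy_search(budget=150)
print(arch, round(acc, 3))
```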
Deep convolutional neural networks have achieved remarkable success in computer vision. However, deep neural networks require large computing resources to achieve high performance. Although depthwise separable convolution can be an efficient module to approximate a standard convolution, it often leads to reduced representational power of networks. In this paper, under budget constraints such as computational cost (MAdds) and parameter count, we propose a novel basic architectural block, ANTBlock. It boosts representational power by modeling, in a high-dimensional space, the interdependency of channels between a depthwise convolution layer and a projection layer in the ANTBlocks. Our experiments show that ANTNet, built from a sequence of ANTBlocks, consistently outperforms state-of-the-art low-cost mobile convolutional neural networks across multiple datasets. On CIFAR-100, our model achieves 75.7% top-1 accuracy, which is 1.5% higher than MobileNetV2, with 8.3% fewer parameters and 19.6% less computational cost. On ImageNet, our model achieves 72.8% top-1 accuracy, a 0.8% improvement over MobileNetV2, while running in 157.7 ms (20% faster) on an iPhone 5s.
http://arxiv.org/abs/1904.03775
Audio-visual multi-modal modeling has been demonstrated to be effective in many speech-related tasks, such as speech recognition and speech enhancement. This paper introduces a new time-domain audio-visual architecture for target speaker extraction from monaural mixtures. The architecture generalizes the previous TasNet (time-domain speech separation network) to enable multi-modal learning, and meanwhile extends classical audio-visual speech separation from the frequency domain to the time domain. The main components of the proposed architecture include an audio encoder, a video encoder that extracts lip embeddings from video streams, a multi-modal separation network, and an audio decoder. Experiments on simulated mixtures based on the recently released LRS2 dataset show that our method brings 3 dB+ and 4 dB+ Si-SNR improvements on 2- and 3-speaker cases respectively, compared to audio-only TasNet and frequency-domain audio-visual networks.
http://arxiv.org/abs/1904.03760
Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks. Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. To efficiently solve the objective, we exploit two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem. This allows us to use high-dimensional embeddings with improved generalization at a modest increase in computational overhead. Our approach, named MetaOptNet, achieves state-of-the-art performance on miniImageNet, tieredImageNet, CIFAR-FS, and FC100 few-shot learning benchmarks.
http://arxiv.org/abs/1904.03758
Grasping and manipulating objects is an important human skill. Since most objects are designed to be manipulated by human hands, anthropomorphic hands can enable richer human-robot interaction. Desirable grasps are not only stable, but also functional: they enable post-grasp actions with the object. However, functional grasp synthesis for high-DoF anthropomorphic hands from object shape alone is challenging. We present ContactGrasp, a framework that allows functional grasp synthesis from object shape and contact on the object surface. Contact can be manually specified or obtained through demonstrations. Our contact representation is object-centric and allows functional grasp synthesis even for hand models different from the one used for demonstration. Using a dataset of contact demonstrations from humans grasping diverse household objects, we synthesize functional grasps for three hand models and two functional intents.
http://arxiv.org/abs/1904.03754
Convolutional Neural Networks (CNNs) achieve impressive results in a wide variety of fields. Their success benefited from a massive boost with the ability to train very deep CNN models. Despite their positive results, CNNs fail to properly address problems with non-Euclidean data. To overcome this challenge, Graph Convolutional Networks (GCNs) build graphs to represent non-Euclidean data, borrow concepts from CNNs, and apply them to train these models. GCNs show promising results, but they are limited to very shallow models due to the vanishing gradient problem. As a result, most state-of-the-art GCN algorithms are no deeper than 3 or 4 layers. In this work, we present new ways to successfully train very deep GCNs. We borrow concepts from CNNs, mainly residual/dense connections and dilated convolutions, and adapt them to GCN architectures. Through extensive experiments, we show the positive effect of these deep GCN frameworks. Finally, we use these new concepts to build a very deep 56-layer GCN, and show how it significantly boosts performance (+3.7% mIoU over the state of the art) in the task of point cloud semantic segmentation. The project website is available at https://sites.google.com/view/deep-gcns
http://arxiv.org/abs/1904.03751
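A minimal sketch of the core idea: a vanilla GCN aggregation step wrapped with a residual connection so that very deep stacks remain trainable; the paper additionally uses dense connections and dilated neighborhood aggregation, which are omitted here:

```python
import torch
import torch.nn as nn

class ResGCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, h, a_hat):      # h: (N, dim) node features, a_hat: (N, N)
        return h + torch.relu(a_hat @ self.lin(h))   # vanilla GCN step + skip

n, dim = 100, 64
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.t()) + torch.eye(n)).clamp(max=1)    # symmetrize, self-loops
deg = adj.sum(1)
a_hat = adj / torch.sqrt(deg[:, None] * deg[None, :])  # D^-1/2 (A+I) D^-1/2

h = torch.randn(n, dim)
for layer in [ResGCNLayer(dim) for _ in range(56)]:    # 56 layers, as in the paper
    h = layer(h, a_hat)
print(h.shape)    # residual skips keep activations and gradients well-behaved
```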
It has been demonstrated that very simple attacks can fool highly sophisticated neural network architectures. In particular, so-called adversarial examples, constructed from perturbations of input data that are small or imperceptible to humans but lead to different predictions, may pose an enormous risk in certain critical applications. In light of this, there has been a great deal of work on developing adversarial training strategies to improve model robustness. These training strategies are very expensive, in both human and computational time. To complement these approaches, we propose a very simple and inexpensive strategy which can be used to "retrofit" a previously trained network to improve its resilience to adversarial attacks. More concretely, we propose a new activation function, the JumpReLU, which, when used in place of a ReLU in an already-trained model, leads to a trade-off between predictive accuracy and robustness. This trade-off is controlled by the jump size, a hyper-parameter which can be tuned during the validation stage. Our empirical results demonstrate that this increases model robustness, protecting against adversarial attacks with substantially increased levels of perturbation. This is accomplished simply by retrofitting existing networks with our JumpReLU activation function, without the need for retraining the model. Additionally, we demonstrate that adversarially trained (robust) models can greatly benefit from retrofitting.
http://arxiv.org/abs/1904.03750
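A minimal sketch of the JumpReLU and of retrofitting it into an already-trained network: the activation equals ReLU except that values must clear a jump threshold kappa before passing through, and kappa is tuned on validation data with no retraining:

```python
import torch
import torch.nn as nn

class JumpReLU(nn.Module):
    def __init__(self, kappa=0.5):
        super().__init__()
        self.kappa = kappa                        # jump size hyper-parameter

    def forward(self, x):
        return x * (x > self.kappa).to(x.dtype)   # 0 below kappa, identity above

def retrofit(model, kappa):
    """Replace every ReLU in an already-trained model with JumpReLU."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, JumpReLU(kappa))
        else:
            retrofit(child, kappa)
    return model

net = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
retrofit(net, kappa=0.4)                          # no retraining required
print(net)
```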
We propose a human-operator guided planning approach to pushing-based robotic manipulation in clutter. Most recent approaches to this problem employ the power of randomized planning (e.g. control-sampling based kinodynamic RRT) to produce a fast working solution. We build on these control-based randomized planning approaches, but investigate using them in conjunction with human-operator input. In our framework, the human operator supplies a high-level plan, in the form of an ordered sequence of objects and their approximate goal positions. We present experiments in simulation and on a real robotic setup, where we compare the success rate and planning times of our human-in-the-loop approach with fully autonomous sampling-based planners. We show that the guidance provided by the human operator makes the low-level kinodynamic planner solve the planning problem faster and with higher success rates.
http://arxiv.org/abs/1904.03748
Surface inspection systems are an important application domain for computer vision, as they are used for defect detection and classification in the manufacturing industry. Existing systems use hand-crafted features which require extensive domain knowledge to create. Even though convolutional neural networks (CNNs) have proven successful in many large-scale challenges, industrial inspection systems have so far barely realized their potential due to two significant challenges: real-time processing speed requirements and specialized, narrow, domain-specific datasets which are sometimes limited in size. In this paper, we propose CNN models that are specifically designed to handle the capacity and real-time speed requirements of surface inspection systems. To train and evaluate our network models, we created a surface image dataset containing more than 22000 labeled images with many types of surface materials, and achieved 98.0% accuracy in binary defect classification. To solve the class imbalance problem in our datasets, we introduce neural data augmentation methods which are also applicable to similar domains that suffer from the same problem. Our results show that deep learning based methods are feasible for use in surface inspection systems and outperform traditional methods in accuracy and inference time by considerable margins.
http://arxiv.org/abs/1904.04671
Several important security issues of Deep Neural Networks (DNNs) have been raised recently, associated with different applications and components. The most widely investigated security concern of DNNs is their malicious input, a.k.a. adversarial examples. Nevertheless, the security challenge of DNN parameters is not well explored yet. In this work, we are the first to propose a novel DNN weight attack methodology called Bit-Flip Attack (BFA) which can crush a neural network by maliciously flipping an extremely small number of bits within its weight storage memory system (i.e., DRAM). The bit-flip operations can be conducted through the well-known Row-Hammer attack, while our main contribution is an algorithm to identify the most vulnerable bits of DNN weight parameters (stored in memory as binary bits) that maximize the accuracy degradation with a minimum number of bit-flips. Our proposed BFA utilizes a Progressive Bit Search (PBS) method which combines gradient ranking and progressive search to identify the most vulnerable bit to be flipped. With the aid of PBS, we can successfully attack a ResNet-18 into full malfunction (i.e., top-1 accuracy degrades from 69.8% to 0.1%) with only 13 bit-flips out of 93 million bits, whereas randomly flipping 100 bits merely degrades the accuracy by less than 1%.
http://arxiv.org/abs/1903.12269
Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese. On constituency grammar induction, they are competitive with recent neural language models that induce tree structures from words through attention mechanisms.
http://arxiv.org/abs/1904.03746
Complex tasks such as surveillance, construction, and search and rescue can benefit from the maneuverability of multirotor Micro Aerial Vehicles (MAVs) to obtain robust, cooperative system behavior, and formation control is a prominent component of these complex tasks. This work focuses on the problem of three-dimensional formation control of multirotor MAVs using exclusively relative sensory information. It proposes a centralized Nonlinear Model Predictive Control (NMPC) approach in a leader-follower scheme. A realistic six degrees of freedom mathematical model of a multirotor MAV is introduced and leveraged in the control laws. The problem is formulated based on NMPC and a relative sensing framework with respect to the local coordinate frames of the robots. This type of formulation makes the formation independent of full knowledge of global or common reference frames and of expensive global localization sensors. A Real-Time Iteration (RTI) based solution to the optimal control problem (OCP) is proposed, taking the novel formulation into account. An extensive scenario is designed to test and validate the strategy. Evaluation of the results suggests that satisfactory, robust performance is achieved and maintained under model uncertainty and noise in local sensors, even in cases where the dynamics of the formation suddenly change.
http://arxiv.org/abs/1904.03742
Learning a shared dialog structure from a set of task-oriented dialogs is an important challenge in computational linguistics. The learned dialog structure can shed light on how to analyze human dialogs and, more importantly, contribute to the design and evaluation of dialog systems. We propose to extract dialog structures using a modified VRNN model with discrete latent vectors. Different from existing HMM-based models, our model is based on the variational autoencoder (VAE). Such a model is able to capture more dynamics in dialogs beyond the surface forms of the language. We find that, qualitatively, our method extracts meaningful dialog structure and, quantitatively, outperforms previous models in the ability to predict unseen data. We further evaluate the model’s effectiveness in a downstream task, dialog system building. Experiments show that, by integrating the learned dialog structure into the reward function design, the model converges faster and to a better outcome in a reinforcement learning setting.
http://arxiv.org/abs/1904.03736
The subtleties of human perception, as measured by vision scientists through the use of psychophysics, are important clues to the internal workings of visual recognition. For instance, measured reaction time can indicate whether a visual stimulus is easy for a subject to recognize, or whether it is hard. In this paper, we consider how to incorporate psychophysical measurements of visual perception into the loss function of a deep neural network being trained for a recognition task, under the assumption that such information can enforce consistency with human behavior. As a case study to assess the viability of this approach, we look at the problem of handwritten document transcription. While good progress has been made towards automatically transcribing modern handwriting, significant challenges remain in transcribing historical documents. Here we work towards a comprehensive transcription solution for Medieval manuscripts that combines networks trained using our novel loss formulation with natural language processing elements. In a baseline assessment, reliable performance is demonstrated for the standard IAM and RIMES datasets. Further, we go on to show feasibility for our approach on a previously published dataset and a new dataset of digitized Latin manuscripts, originally produced by scribes in the Cloister of St. Gall around the middle of the 9th century.
http://arxiv.org/abs/1904.03734
With technological advances leading to an increase in mechanisms of image tampering, our fraud detection methods must continue to be upgraded to match their sophistication. One problem with current methods is that they require prior knowledge of the method of forgery in order to determine which features to extract from the image to localize the region of interest. When a machine learning algorithm is used to learn different types of tampering from a large set of various image types, with a big enough database we can easily classify which images are tampered (by training on the entire image feature map for each image), but we are still left with the question of which features to train on and how to localize the manipulation. To solve this, object detection networks such as Faster RCNN, which combine an RPN (Region Proposal Network) with a CNN, have recently been adapted to fraud detection by utilizing their ability to propose bounding boxes for objects of interest to localize the tampering artifacts. In this work, an existing bilinear Faster RCNN model is modified so that the second stream takes as input the ELA (Error Level Analysis) JPEG compression level mask.
http://arxiv.org/abs/1904.08484
Spectropolarimetric inversions are routinely used in the field of Solar Physics for the extraction of physical information from observations. The application to two-dimensional fields of view often requires the use of supercomputers with parallelized inversion codes. Even in this case, the computing time spent on the process is still very large. Our aim is to develop a new inversion code based on the application of convolutional neural networks that can quickly provide a three-dimensional cube of thermodynamical and magnetic properties from the interpretation of two-dimensional maps of Stokes profiles. We train two different architectures of fully convolutional neural networks. To this end, we use the synthetic Stokes profiles obtained from two snapshots of three-dimensional magneto-hydrodynamic numerical simulations of different structures of the solar atmosphere. We provide an extensive analysis of the new inversion technique, showing that it infers the thermodynamical and magnetic properties with a precision comparable to that of standard inversion techniques. However, it provides several key improvements: our method is around one million times faster, it returns a three-dimensional view of the physical properties of the region of interest in geometrical height, it provides quantities that cannot be obtained otherwise (pressure and Wilson depression) and the inferred properties are decontaminated from the blurring effect of instrumental point spread functions for free. The code is provided for free on a specific repository, with options for training and evaluation.
http://arxiv.org/abs/1904.03714
Artificial intelligence is revolutionizing formal education, fueled by innovations in learning assessment, content generation, and instructional delivery. Informal, lifelong learning settings have been the subject of less attention. We provide a proof-of-concept for an embodied book discussion companion, designed to stimulate conversations with readers about particularly creative metaphors in fiction literature. We collect ratings from 26 participants, each of whom discuss Jane Austen’s “Pride and Prejudice” with the robot across one or more sessions, and find that participants rate their interactions highly. This suggests that companion robots could be an interesting entryway for the promotion of lifelong learning and cognitive exercise in future applications.
http://arxiv.org/abs/1904.03713
Existing works on motion deblurring either ignore the effects of depth-dependent blur or work under the assumption of a multi-layered scene wherein each layer is modeled as a fronto-parallel plane. In this work, we consider the case of 3D scenes with piecewise planar structure, i.e., a scene that can be modeled as a combination of multiple planes with arbitrary orientations. We first propose an approach for estimating the normal of a planar scene from a single motion blurred observation. We then develop an algorithm for automatic recovery of the number of planes, the parameters corresponding to each plane, and the camera motion from a single motion blurred image of a multiplanar 3D scene. Finally, we propose a first-of-its-kind approach to recover the planar geometry and latent image of the scene by adopting an alternating minimization framework built on our findings. Experiments on synthetic and real data reveal that our proposed method achieves state-of-the-art results.
http://arxiv.org/abs/1904.03710
The automatic recognition of micro-expressions has been boosted ever since the successful introduction of deep learning approaches. While researchers working on such topics increasingly tend to learn from the nature of micro-expressions, the practice of using deep learning techniques has evolved from processing the entire video clip of a micro-expression to recognition on the apex frame. Using the apex frame discards redundant information, but thereby leaves out the temporal evidence of the micro-expression. In this paper, we propose to perform recognition based on the spatial information from the apex frame as well as the temporal information from the adjacent frames. To this end, a novel Apex-Time Network (ATNet) is proposed. Through extensive experiments on three benchmarks, we demonstrate the improvement achieved by adding the temporal information learned from frames adjacent to the apex frame. Notably, the model with such temporal information is more robust in cross-dataset validations.
http://arxiv.org/abs/1904.03699
We present a framework for dynamic quadrupedal locomotion over challenging terrain, where the choice of appropriate footholds is crucial for the success of the behaviour. We build a model of the environment on-line and on-board using an efficient occupancy grid representation. We use Anytime Repairing A* (ARA*) to search over a tree of possible actions, choose a rough body path and select the locally best footholds accordingly. We run an n-step lookahead optimization of the body trajectory using a dynamic stability metric, the Zero Moment Point (ZMP), that generates natural dynamic whole-body motions. A combination of floating-base inverse dynamics and virtual model control accurately executes the desired motions on an actively compliant system. Experimental trials show that this framework allows us to traverse terrains at nearly 6 times the speed of our previous work, evaluated over the same set of trials.
http://arxiv.org/abs/1904.03695
We present a legged motion planning approach for quadrupedal locomotion over challenging terrain. We decompose the problem into body action planning and footstep planning. We use a lattice representation together with a set of defined body movement primitives for computing a body action plan. The lattice representation allows us to plan versatile movements that ensure feasibility for every possible plan. To this end, we propose a set of rules that define the footstep search regions and footstep sequence given a body action. We use Anytime Repairing A* (ARA*) search that guarantees bounded suboptimal plans. Our main contribution is a planning approach that generates on-line versatile movements. Experimental trials demonstrate the performance of our planning approach in a set of challenging terrain conditions. The terrain information and plans are computed on-line and on-board.
http://arxiv.org/abs/1904.03693
Multimodal information (e.g., visible and thermal) can generate robust pedestrian detections to facilitate around-the-clock computer vision applications, such as autonomous driving and video surveillance. However, it still remains a crucial challenge to train a reliable detector working well in different multispectral pedestrian datasets without manual annotations. In this paper, we propose a novel unsupervised domain adaptation framework for multispectral pedestrian detection, by iteratively generating pseudo annotations and updating the parameters of our designed multispectral pedestrian detector on target domain. Pseudo annotations are generated using the detector trained on source domain, and then updated by fixing the parameters of detector and minimizing the cross entropy loss without back-propagation. Training labels are generated using the pseudo annotations by considering the characteristics of similarity and complementarity between well-aligned visible and infrared image pairs. The parameters of detector are updated using the generated labels by minimizing our defined multi-detection loss function with back-propagation. The optimal parameters of detector can be obtained after iteratively updating the pseudo annotations and parameters. Experimental results show that our proposed unsupervised multimodal domain adaptation method achieves significantly higher detection performance than the approach without domain adaptation, and is competitive with the supervised multispectral pedestrian detectors.
http://arxiv.org/abs/1904.03692
The relevance vector machine (RVM) can be seen as a probabilistic version of the support vector machine which is able to produce sparse solutions by linearly weighting a small number of basis functions instead of using all of them. Despite the merits of the RVM, such as probabilistic predictions and relaxed parameter tuning, it predicts poorly for test instances that are far away from the relevance vectors. As a solution, we propose a new combination of the RVM and the k-nearest neighbor (k-NN) rule which resolves this issue by dealing with every test instance regionally. In our setting, we obtain the relevance vectors for each test instance in the local area given by the k-NN rule. In this way, the relevance vectors are closer and more relevant to the test instance, which results in a more accurate model. This can be seen as a piecewise learner which locally classifies test instances. The model is hence called the localized relevance vector machine (LRVM). The LRVM is examined on several datasets from the University of California, Irvine (UCI) repository. Results supported by statistical tests indicate that the performance of the LRVM is competitive with a few state-of-the-art classifiers.
http://arxiv.org/abs/1904.03688
Whereas conventional spoken language understanding (SLU) systems map speech to text, and then text to intent, end-to-end SLU systems map speech directly to intent through a single trainable model. Achieving high accuracy with these end-to-end models without a large amount of training data is difficult. We propose a method to reduce the data requirements of end-to-end SLU in which the model is first pre-trained to predict words and phonemes, thus learning good features for SLU. We introduce a new SLU dataset, Fluent Speech Commands, and show that our method improves performance both when the full dataset is used for training and when only a small subset is used. We also describe preliminary experiments to gauge the model’s ability to generalize to new phrases not heard during training.
http://arxiv.org/abs/1904.03670
The motivation of this paper is to address the problem of registering airborne LiDAR data and optical aerial or satellite imagery acquired from different platforms, at different times, with different points of view and levels of detail. We present a robust registration method based on building regions, which are extracted from optical images using mean shift segmentation and from LiDAR data using a 3D point cloud filtering process. The matching of the extracted building segments is then carried out using Graph Transformation Matching (GTM), which allows determining a common pattern of relative positions of segment centers. Thanks to this registration, the relative shifts between the data sets are significantly reduced, which enables a subsequent fine registration and a resulting high-quality data fusion.
http://arxiv.org/abs/1904.03668
High-speed and high-acceleration movements are inherently hard to control. Applying learning to the control of such motions on anthropomorphic robot arms can improve the accuracy of the control but might damage the system. The inherent exploration of learning approaches can lead to instabilities and to the robot reaching joint limits at high speeds. Hardware that enables safe exploration of high-speed and high-acceleration movements is therefore desirable. To address this issue, we propose to use robots actuated by Pneumatic Artificial Muscles (PAMs). In this paper, we present a four degrees of freedom (DoF) robot arm that reaches high joint angle accelerations of up to 28000 deg/s^2 while avoiding dangerous joint limits thanks to the antagonistic actuation and limits on the air pressure ranges. With this robot arm, we are able to tune control parameters using Bayesian optimization directly on the hardware without additional safety considerations. The achieved tracking performance on a fast trajectory exceeds previous results on comparable PAM-driven robots. We also show that our system can be controlled well on slow trajectories with PID controllers due to careful construction considerations such as minimal bending of cables, lightweight kinematics and minimal contact between the PAMs and between the PAMs and the links. Finally, we propose a novel technique to control the co-contraction of antagonistic muscle pairs. Experimental results illustrate that choosing the optimal co-contraction level is vital to reach better tracking performance. Through the use of PAM-driven robots and learning, we take a small step towards the future development of robots capable of more human-like motions.
http://arxiv.org/abs/1904.03665
Neural sequence-to-sequence models are currently the dominant approach in several natural language processing tasks, but require large parallel corpora. We present a sequence-to-sequence-to-sequence autoencoder (SEQ^3), consisting of two chained encoder-decoder pairs, with words used as a sequence of discrete latent variables. We apply the proposed model to unsupervised abstractive sentence compression, where the first and last sequences are the input and reconstructed sentences, respectively, while the middle sequence is the compressed sentence. Constraining the length of the latent word sequences forces the model to distill important information from the input. A pretrained language model, acting as a prior over the latent sequences, encourages the compressed sentences to be human-readable. Continuous relaxations enable us to sample from categorical distributions, allowing gradient-based optimization, unlike alternatives that rely on reinforcement learning. The proposed model does not require parallel text-summary pairs, achieving promising results in unsupervised sentence compression on benchmark datasets.
http://arxiv.org/abs/1904.03651
In this paper, we introduce an image quality assessment (IQA) method for pediatric T1- and T2-weighted MR images. IQA is first performed slice-wise using a nonlocal residual neural network (NR-Net) and then volume-wise by agglomerating the slice QA results using a random forest. Our method requires only a small amount of quality-annotated images for training and is designed to be robust to annotation noise that might occur due to rater errors and the inevitable mix of good and bad slices in an image volume. Using a small set of quality-assessed images, we pre-train NR-Net to annotate each image slice with an initial quality rating (i.e., pass, questionable, fail), which we then refine by semi-supervised learning and iterative self-training. Experimental results demonstrate that our method, trained using only samples of modest size, exhibits great generalizability and is capable of real-time (milliseconds per volume) large-scale IQA with near-perfect accuracy.
http://arxiv.org/abs/1904.03639
Humans can easily recognize the importance of people in social event images, and they always focus on the most important individuals. However, learning to learn the relation between people in an image, and inferring the most important person based on this relation, remains undeveloped. In this work, we propose a deep imPOrtance relatIon NeTwork (POINT) that combines both relation modeling and feature learning. In particular, we infer two types of interaction modules: the person-person interaction module that learns the interaction between people and the event-person interaction module that learns to describe how a person is involved in the event occurring in an image. We then estimate the importance relations among people from both interactions and encode the relation feature from the importance relations. In this way, POINT automatically learns several types of relation features in parallel, and we aggregate these relation features and the person’s feature to form the importance feature for important people classification. Extensive experimental results show that our method is effective for important people detection and verify the efficacy of learning to learn relations for important people detection.
http://arxiv.org/abs/1904.03632
Pedestrian detection in a crowd is a very challenging issue. This paper addresses the problem with a novel Non-Maximum Suppression (NMS) algorithm that better refines the bounding boxes given by detectors. The contributions are threefold: (1) we propose adaptive-NMS, which applies a dynamic suppression threshold to an instance, according to the target density; (2) we design an efficient subnetwork to learn density scores, which can be conveniently embedded into both single-stage and two-stage detectors; and (3) we achieve state-of-the-art results on the CityPersons and CrowdHuman benchmarks.
http://arxiv.org/abs/1904.03629
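A hedged sketch of contribution (1): greedy NMS in which the suppression threshold for each kept box becomes max(base threshold, predicted density), so detections in crowded regions tolerate more overlap. In the paper, densities come from the learned subnetwork; here they are given as inputs:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def adaptive_nms(boxes, scores, density, base_thresh=0.5):
    order = np.argsort(scores)[::-1]           # process highest scores first
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        thresh = max(base_thresh, density[i])  # dynamic, density-aware threshold
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
density = np.array([0.8, 0.8, 0.1])           # the crowded pair survives suppression
print(adaptive_nms(boxes, scores, density))   # -> [0, 1, 2]
```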
We study how to leverage off-the-shelf visual and linguistic data to cope with out-of-vocabulary answers in the visual question answering task. Existing large-scale visual datasets with annotations such as image class labels, bounding boxes and region descriptions are good sources for learning rich and diverse visual concepts. However, it is not straightforward how these visual concepts can be captured and transferred to visual question answering models, due to the missing link between question-dependent answering models and visual data without questions. We tackle this problem in two steps: 1) learning a task conditional visual classifier, which is capable of solving diverse question-specific visual recognition tasks, based on unsupervised task discovery, and 2) transferring the task conditional visual classifier to visual question answering models. Specifically, we employ linguistic knowledge sources such as a structured lexical database (e.g. WordNet) and visual descriptions for unsupervised task discovery, and transfer a learned task conditional visual classifier as an answering unit in a visual question answering model. We empirically show that the proposed algorithm generalizes to out-of-vocabulary answers successfully using the knowledge transferred from the visual dataset.
http://arxiv.org/abs/1810.02358
Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. In this paper, we propose to use network distillation to efficiently compute image embeddings with small networks. Network distillation has been successfully applied to improve image classification, but has hardly been explored for metric learning. To this end, we propose two new loss functions that model the communication of a deep teacher network to a small student network. We evaluate our system on several datasets, including CUB-200-2011, Cars-196 and Stanford Online Products, and show that embeddings computed using small student networks perform significantly better than those computed using standard networks of similar size. Results on a very compact network (MobileNet-0.25), which can be used on mobile devices, show that the proposed method can greatly improve Recall@1 results from 27.5% to 44.6%. Furthermore, we investigate various aspects of distillation for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. (Code is available at https://github.com/yulu0724/EmbeddingDistillation.)
http://arxiv.org/abs/1904.03624
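A hedged sketch of distillation for embeddings with two losses in the spirit described: an absolute term that matches student and teacher embeddings directly, and a relative term that matches their pairwise distance structure within a batch. The exact formulations in the paper may differ in detail:

```python
import torch
import torch.nn as nn

def absolute_loss(student_emb, teacher_emb):
    return (student_emb - teacher_emb).pow(2).sum(dim=1).mean()

def relative_loss(student_emb, teacher_emb):
    ds = torch.cdist(student_emb, student_emb)   # student pairwise distances
    dt = torch.cdist(teacher_emb, teacher_emb)   # teacher pairwise distances
    return (ds - dt).pow(2).mean()

teacher = nn.Sequential(nn.Linear(512, 128)).eval()   # stands in for a deep net
student = nn.Sequential(nn.Linear(512, 128))          # compact mobile network
opt = torch.optim.SGD(student.parameters(), lr=0.01)

x = torch.randn(32, 512)                              # a batch of image features
with torch.no_grad():
    t = teacher(x)
s = student(x)
loss = absolute_loss(s, t) + relative_loss(s, t)
loss.backward(); opt.step()
print(float(loss))
```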
We demonstrate how to compute the low-energy spectrum of small ($L\le 50$), but otherwise arbitrary, spin-glass instances using modern Graphics Processing Units or a similar heterogeneous architecture. Our algorithm performs an exhaustive (i.e. brute-force) search of all possible configurations to select the $N\ll 2^L$ lowest ones together with their corresponding energies. We mainly focus on the Ising model defined on an arbitrary graph. An open-source implementation based on CUDA Fortran and a suitable Python wrapper are provided. As opposed to heuristic approaches, ours is exact and thus can serve as a reference point to benchmark other algorithms and hardware, including quantum and digital annealers. Our implementation offers unprecedented speed and efficiency, already visible on commodity hardware, and can be launched on professional, high-end graphics cards with virtually no extra effort. As a practical application, we employ it to demonstrate that, despite its one-dimensional nature, the recent Matrix Product State based algorithm can still accurately approximate the low-energy spectrum of fully connected graphs of size $L$ approaching $50$.
https://arxiv.org/abs/1904.03621
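A minimal CPU/NumPy sketch of the brute-force idea: enumerate all $2^L$ spin configurations of an Ising instance on an arbitrary graph and keep the $N$ lowest energies. The paper's contribution is doing this efficiently on GPUs for $L$ up to about $50$; this toy version is only feasible for small $L$:

```python
import numpy as np

def lowest_states(J, h, n_keep):
    L = len(h)
    # all 2^L configurations with spins in {-1, +1}, one row each
    bits = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1
    s = 2 * bits - 1
    # E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i  (J symmetric, zero diagonal)
    energies = -0.5 * np.einsum('ki,ij,kj->k', s, J, s) - s @ h
    idx = np.argsort(energies)[:n_keep]
    return energies[idx], s[idx]

rng = np.random.default_rng(1)
L = 12
J = rng.normal(size=(L, L)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)
h = rng.normal(size=L)
E, states = lowest_states(J, h, n_keep=5)
print(E)                      # the low-energy spectrum, lowest first
```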
Sketching is more fundamental to human cognition than speech. Deep Neural Networks (DNNs) have achieved the state of the art in speech-related tasks but have not made significant progress in generating stroke-based sketches, a.k.a. sketches in vector format. Though there are Variational Auto-Encoders (VAEs) for generating sketches in vector format, there is no Generative Adversarial Network (GAN) architecture for the same. In this paper, we propose a standalone GAN architecture, SkeGAN, and a VAE-GAN architecture, VASkeGAN, for sketch generation in vector format. SkeGAN is a stochastic policy in Reinforcement Learning (RL), capable of generating both multidimensional continuous and discrete outputs. VASkeGAN hybridizes a VAE and a GAN, in order to couple the efficient representation of data by the VAE with the powerful generating capabilities of a GAN, to produce visually appealing sketches. We also propose a new metric called the Ske-score which quantifies the quality of vector sketches. We have validated that SkeGAN and VASkeGAN generate visually appealing sketches via a human Turing test and the Ske-score.
https://arxiv.org/abs/1904.03620
Deep speaker embedding has achieved state-of-the-art performance in speaker recognition. A potential problem is that these embedding vectors (called 'x-vectors') are not Gaussian, causing performance degradation with the popular PLDA back-end scoring. In this paper, we propose a regularization approach based on the Variational Auto-Encoder (VAE). This model transforms x-vectors to a latent space where the mapped latent codes are more Gaussian, and hence more suitable for PLDA scoring.
http://arxiv.org/abs/1904.03617
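A hedged sketch of the VAE regularization: x-vectors are encoded into a latent space whose prior is a standard Gaussian, so the KL term pushes the mapped codes toward Gaussianity before PLDA scoring. Dimensions and architecture below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class XVectorVAE(nn.Module):
    def __init__(self, dim=512, latent=150):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent)
        self.logvar = nn.Linear(256, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(),
                                 nn.Linear(256, dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = (recon - x).pow(2).sum(1).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1).mean()
    return rec + kl          # KL to N(0, I) is the Gaussianizing regularizer

vae = XVectorVAE()
x = torch.randn(64, 512)     # a batch of x-vectors
recon, mu, logvar = vae(x)
loss = vae_loss(x, recon, mu, logvar)
loss.backward()
# at scoring time, mu (the latent code) replaces the raw x-vector for PLDA
```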