Single image dehazing is an ill-posed problem that has recently drawn considerable attention. Despite the significant increase in interest in dehazing over the past few years, the validation of dehazing methods remains largely unsatisfactory, due to the lack of pairs of real hazy images and corresponding haze-free reference images. To address this limitation, we introduce Dense-Haze, a novel dehazing dataset. Characterized by dense and homogeneous hazy scenes, Dense-Haze contains 33 pairs of real hazy and corresponding haze-free images of various outdoor scenes. The hazy scenes were recorded by introducing real haze generated by professional haze machines. The hazy and haze-free scenes contain the same visual content captured under the same illumination parameters. The Dense-Haze dataset aims to push forward the state-of-the-art in single-image dehazing by promoting robust methods for real and varied hazy scenes. We also provide a comprehensive qualitative and quantitative evaluation of state-of-the-art single-image dehazing techniques on the Dense-Haze dataset. Not surprisingly, our study reveals that existing dehazing techniques perform poorly on dense homogeneous hazy scenes and that there is still much room for improvement.
https://arxiv.org/abs/1904.02904
Robot animation is a new form of character animation that extends the traditional process by allowing animated motion to become interactive and adaptable during interaction with users in real-world settings. This paper reviews how this new type of character animation has evolved and been shaped by character animation principles and practices. We outline new paradigms that aim to allow character animators to become robot animators and to properly take part in the development of social robots. One such paradigm consists of the 12 principles of robot animation, which describe general concepts that both animators and robot developers should consider in order to properly understand each other. We also introduce the concept of Kinematronics, for specifying the controllable and programmable expressive abilities of robots, and the Nutty Workflow and Pipeline. The Nutty Pipeline introduces the concept of the Programmable Robot Animation Engine, which makes it possible to generate, compose, and blend various types of animation sources into a final, interaction-enabled motion that can be rendered on robots in real time during real-world interactions. Additionally, we describe types of tools that can be developed and integrated into Nutty-based workflows and pipelines, allowing animation artists to take an integral part in developing the expressive behaviour of social robots, and thus to evolve from standard (3D) character animators towards full-stack robot animators.
http://arxiv.org/abs/1904.02898
WaveCycleGAN was recently proposed to bridge the gap between natural and synthesized speech waveforms in statistical parametric speech synthesis; it provides fast inference by using a moving-average model rather than an autoregressive one, and high-quality speech synthesis through adversarial training. However, the human ear can still distinguish the processed speech waveforms from natural ones. One possible cause of this distinguishability is the aliasing introduced into the processed waveform by the down/up-sampling modules. To eliminate the aliasing and provide higher-quality speech synthesis, we propose WaveCycleGAN2, which 1) uses generators without down/up-sampling modules and 2) combines discriminators in the waveform domain and the acoustic parameter domain. The results show that the proposed method 1) alleviates the aliasing well, 2) is useful for speech waveforms generated by both analysis-and-synthesis and statistical parametric speech synthesis, and 3) achieves a mean opinion score comparable to those of natural speech and speech synthesized by WaveNet (open WaveNet) and WaveGlow, while processing speech samples at a rate of more than 150 kHz on an NVIDIA Tesla P100.
https://arxiv.org/abs/1904.02892
With the increasing number of online stores, there is a pressing need for intelligent search systems that can understand item photos snapped by customers and search large-scale product databases to find the desired items. However, it is challenging for conventional retrieval systems to match up item photos captured by customers with the ones officially released by stores, especially for garment images. To bridge customer- and store-provided garment photos, existing studies have widely exploited clothing attributes (\textit{e.g.,} black) and landmarks (\textit{e.g.,} collar) to learn a common embedding space for garment representations. Unfortunately, they omit the sequential correlation of attributes and consume a large amount of human labor to label the landmarks. In this paper, we propose a deep multi-task cross-domain hashing scheme, termed \textit{DMCH}, in which cross-domain embedding and sequential attribute learning are modeled simultaneously. Sequential attribute learning not only provides semantic guidance for the embedding, but also generates rich attention on discriminative local details (\textit{e.g.,} black buttons) of clothing items without requiring extra landmark labels. This leads to promising performance and a 306$\times$ boost in efficiency when compared with state-of-the-art models, as demonstrated through rigorous experiments on two public fashion datasets.
https://arxiv.org/abs/1904.02887
Deep neural networks are vulnerable to adversarial examples, which can mislead classifiers by adding imperceptible perturbations. An intriguing property of adversarial examples is their good transferability, making black-box attacks feasible in real-world applications. Due to the threat of adversarial attacks, many methods have been proposed to improve the robustness. Several state-of-the-art defenses are shown to be robust against transferable adversarial examples. In this paper, we propose a translation-invariant attack method to generate more transferable adversarial examples against the defense models. By optimizing a perturbation over an ensemble of translated images, the generated adversarial example is less sensitive to the white-box model being attacked and has better transferability. To improve the efficiency of attacks, we further show that our method can be implemented by convolving the gradient at the untranslated image with a pre-defined kernel. Our method is generally applicable to any gradient-based attack method. Extensive experiments on the ImageNet dataset validate the effectiveness of the proposed method. Our best attack fools eight state-of-the-art defenses at an 82% success rate on average based only on the transferability, demonstrating the insecurity of the current defense techniques.
https://arxiv.org/abs/1904.02884
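The kernel trick described in the abstract above, smoothing the input gradient with a pre-defined kernel instead of attacking an ensemble of translated images, admits a compact sketch. Here is a minimal, hedged PyTorch version of one translation-invariant FGSM-style step; `model` and the Gaussian `kernel` are assumed inputs, and the names are illustrative rather than the authors' code:

```python
import torch
import torch.nn.functional as F

def ti_fgsm_step(model, x, y, eps, kernel):
    """One translation-invariant attack step: smooth the input gradient
    with a pre-defined kernel before taking the sign step."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # depthwise-convolve the gradient with the kernel (one copy per channel),
    # equivalent to averaging gradients over an ensemble of translated inputs
    c = x.shape[1]
    k = kernel.to(x).expand(c, 1, *kernel.shape)
    grad = F.conv2d(grad, k, padding=kernel.shape[-1] // 2, groups=c)
    return (x + eps * grad.sign()).detach()
```

The same gradient-smoothing line can be dropped into any gradient-based attack (iterative FGSM, momentum variants), which is what makes the method generally applicable.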
We present a method for single-image 3D cuboid object detection and multi-view object SLAM in both static and dynamic environments, and demonstrate that the two parts can improve each other. First, for single-image object detection, we generate high-quality cuboid proposals from 2D bounding boxes and sampled vanishing points. The proposals are further scored and selected based on their alignment with image edges. Second, we propose multi-view bundle adjustment with new object measurements to jointly optimize the poses of cameras, objects, and points. Objects can provide long-range geometric and scale constraints to improve camera pose estimation and reduce monocular drift. Instead of treating dynamic regions as outliers, we utilize object representations and motion model constraints to improve the camera pose estimation. The 3D detection experiments on SUN RGBD and KITTI show better accuracy and robustness than existing approaches. On the public TUM and KITTI odometry datasets and our own collected datasets, our SLAM method achieves state-of-the-art monocular camera pose estimation and, at the same time, improves the 3D object detection accuracy.
http://arxiv.org/abs/1806.00557
This paper introduces a new speech corpus called “LibriTTS” designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues that make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at a 24 kHz sampling rate from 2,456 speakers, together with the corresponding texts. Experimental results show that neural end-to-end TTS models trained on the LibriTTS corpus achieved mean opinion scores above 4.0 for naturalness for five out of six evaluation speakers. The corpus is freely available for download from this http URL.
https://arxiv.org/abs/1904.02882
We propose a novel multi-task learning architecture, which allows learning of task-specific feature-level attention. Our design, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with a soft-attention module for each task. These modules allow for learning of task-specific features from the global features, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be trained end-to-end and can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. We evaluate our approach on a variety of datasets, across both image-to-image predictions and image classification tasks. We show that our architecture is state-of-the-art in multi-task learning compared to existing methods, and is also less sensitive to various weighting schemes in the multi-task loss function. Code is available at https://github.com/lorenmt/mtan.
http://arxiv.org/abs/1803.10704
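The per-task soft-attention module described in the MTAN abstract above can be sketched in a few lines. This is a minimal reading of the design, assuming 2D feature maps and a sigmoid-gated mask; the exact layer composition in MTAN differs, so treat the sizes and layout as illustrative:

```python
import torch.nn as nn

class TaskAttention(nn.Module):
    """Per-task soft attention: learn a (0, 1) mask over the shared
    global feature pool so each task selects its own features while
    the pool itself stays shared across tasks."""
    def __init__(self, channels):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )

    def forward(self, shared_features):
        # element-wise gating of the shared features
        return shared_features * self.mask(shared_features)
```

One such module is instantiated per task, so the parameter overhead grows only with the number of tasks, not with the backbone depth.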
Can we automatically design a Convolutional Network (ConvNet) with the highest image classification accuracy under the runtime constraint of a mobile device? Neural architecture search (NAS) has revolutionized the design of hardware-efficient ConvNets by automating this process. However, the NAS problem remains challenging due to the combinatorially large design space, resulting in significant search time (at least 200 GPU-hours). To alleviate this complexity, we propose Single-Path NAS, a novel differentiable NAS method for designing hardware-efficient ConvNets in less than 4 hours. Our contributions are as follows: 1. Single-path search space: Compared to previous differentiable NAS methods, Single-Path NAS uses one single-path over-parameterized ConvNet to encode all architectural decisions with shared convolutional kernel parameters, hence drastically decreasing the number of trainable parameters and the search cost down to a few epochs. 2. Hardware-efficient ImageNet classification: Single-Path NAS achieves 74.96% top-1 accuracy on ImageNet with 79ms latency on a Pixel 1 phone, which is state-of-the-art accuracy compared to NAS methods with similar constraints (<80ms). 3. NAS efficiency: The Single-Path NAS search cost is only 8 epochs (30 TPU-hours), which is up to 5,000x faster than prior work. 4. Reproducibility: Unlike all recent mobile-efficient NAS methods, which only release pretrained models, we open-source our entire codebase at: this https URL.
https://arxiv.org/abs/1904.02877
In many real-world planning problems with factored, mixed discrete and continuous state and action spaces, such as Reservoir Control, Heating, Ventilation and Air Conditioning (HVAC), and Navigation domains, it is difficult to obtain a model of the complex nonlinear dynamics that govern state evolution. However, the ubiquity of modern sensors allows us to collect large quantities of data from each of these complex systems and build accurate, nonlinear deep neural network models of their state transitions. But one major problem remains for the task of control: how can we plan with deep-network-learned transition models without resorting to Monte Carlo Tree Search and other black-box transition model techniques that ignore model structure and do not easily extend to mixed discrete and continuous domains? In this paper, we introduce two types of nonlinear planning methods that can leverage deep neural network learned transition models: Hybrid Deep MILP Planner (HD-MILP-Plan) and Tensorflow Planner (TF-Plan). In HD-MILP-Plan, we make the critical observation that the Rectified Linear Unit transfer function for deep networks not only allows faster convergence of model learning, but also permits a direct compilation of the deep network transition model to a Mixed-Integer Linear Program encoding. Further, we identify deep-network-specific optimizations for HD-MILP-Plan that improve performance over a base encoding, and show that we can plan optimally with respect to the learned deep networks. In TF-Plan, we take advantage of the efficiency of auto-differentiation tools and GPU-based computation by encoding a subclass of purely continuous planning problems as Recurrent Neural Networks and directly optimizing the actions through backpropagation. We compare both planners and show that TF-Plan is able to approximate the optimal plans found by HD-MILP-Plan in less computation time…
https://arxiv.org/abs/1904.02873
Recent state-of-the-art image segmentation algorithms are mostly based on deep neural networks, thanks to their high performance and fast computation time. However, these methods are usually trained in a supervised manner, which requires a large number of high-quality ground-truth segmentation masks. On the other hand, classical image segmentation approaches such as level-set methods are still useful for generating segmentation masks without labels, but these algorithms are usually computationally expensive and often have limitations in semantic segmentation. In this paper, we propose a novel multiphase level-set loss function for deep learning-based semantic image segmentation with little or no labeled data. This loss function is based on the observation that the softmax layer of deep neural networks bears a striking similarity to the characteristic function in classical multiphase level-set algorithms. We show that the multiphase level-set loss function enables semi-supervised or even unsupervised semantic segmentation. In addition, our loss function can also be used as a regularization term to enhance supervised semantic segmentation algorithms. Experimental results on multiple datasets demonstrate the effectiveness of the proposed method.
https://arxiv.org/abs/1904.02872
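The softmax-as-characteristic-function observation above suggests a region-based loss in the spirit of a multiphase Chan-Vese energy. The sketch below is one plausible reading of the abstract, not the authors' exact loss: the softmax output takes the place of the hard region indicator, and each region is penalized for intensity deviation from its soft mean:

```python
import torch

def multiphase_levelset_loss(probs, image, eps=1e-8):
    """Region term of a multiphase level-set energy. `probs` (B, K, H, W)
    is the softmax output acting as a soft characteristic function over
    K regions of `image` (B, 1, H, W)."""
    loss = 0.0
    for k in range(probs.shape[1]):
        p_k = probs[:, k:k + 1]
        # per-region mean intensity, weighted by soft membership
        c_k = (image * p_k).sum(dim=(2, 3), keepdim=True) \
              / (p_k.sum(dim=(2, 3), keepdim=True) + eps)
        # penalize intensity deviation from the region mean
        loss = loss + ((image - c_k) ** 2 * p_k).mean()
    return loss
```

Because the term requires only the image and the network's own softmax output, it can be used alone (unsupervised), mixed with a small supervised loss (semi-supervised), or added as a regularizer to a fully supervised objective.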
Tree-structured neural network architectures for sentence encoding draw inspiration from the approach to semantic composition generally seen in formal linguistics, and have shown empirical improvements over comparable sequence models by doing so. Moreover, adding multiplicative interaction terms to the composition functions in these models can yield significant further improvements. However, existing compositional approaches that adopt such a powerful composition function scale poorly, with parameter counts exploding as model dimension or vocabulary size grows. We introduce the Lifted Matrix-Space model, which uses a global transformation to map vector word embeddings to matrices, which can then be composed via an operation based on matrix-matrix multiplication. Its composition function effectively transmits a larger number of activations across layers with relatively few model parameters. We evaluate our model on the Stanford NLI corpus, the Multi-Genre NLI corpus, and the Stanford Sentiment Treebank and find that it consistently outperforms TreeLSTM (Tai et al., 2015), the previous best known composition function for tree-structured models.
http://arxiv.org/abs/1711.03602
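The lifting-and-composition step at the heart of the Lifted Matrix-Space model above can be sketched as follows. The single global map `W` matches the abstract; the `tanh` wrapper around the matrix product is an illustrative placeholder, since the paper's composition function adds further learned structure:

```python
import torch

def lift(word_vec, W, m):
    """Lift a d-dimensional word embedding to an m x m matrix using one
    global transformation W of shape (m * m, d), shared across the
    whole vocabulary (so parameters do not grow with vocabulary size)."""
    return (W @ word_vec).view(m, m)

def compose(left, right):
    # composition built on matrix-matrix multiplication (sketch only)
    return torch.tanh(left @ right)

# usage: compose two lifted word matrices at a tree node
d, m = 300, 20
W = torch.randn(m * m, d) * 0.01
phrase = compose(lift(torch.randn(d), W, m), lift(torch.randn(d), W, m))
```

The key scaling property is visible here: the multiplicative interaction lives in the m x m matrix product, while the only large parameter block is the single (m*m, d) map, rather than one matrix per word.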
Recently, deep learning based video super-resolution (SR) methods have achieved promising performance. To simultaneously exploit the spatial and temporal information of videos, employing 3-dimensional (3D) convolutions is a natural approach. However, directly utilizing 3D convolutions may lead to excessively high computational complexity, which restricts the depth of video SR models and thus undermines the performance. In this paper, we present a novel fast spatio-temporal residual network (FSTRN) that adopts 3D convolutions for the video SR task in order to enhance performance while maintaining a low computational load. Specifically, we propose a fast spatio-temporal residual block (FRB) that divides each 3D filter into the product of two 3D filters with considerably lower dimensions. Furthermore, we design cross-space residual learning that directly links the low-resolution space and the high-resolution space, which can greatly relieve the computational burden on the feature fusion and up-scaling parts. Extensive evaluations and comparisons on benchmark datasets validate the strengths of the proposed approach and demonstrate that the proposed network significantly outperforms the current state-of-the-art methods.
https://arxiv.org/abs/1904.02870
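A hedged sketch of the filter factorization in the fast spatio-temporal residual block above: the split of a k x k x k filter into a 1 x k x k spatial filter followed by a k x 1 x 1 temporal filter is an assumed instantiation of "two 3D filters with considerably lower dimensions", written here in PyTorch for illustration:

```python
import torch.nn as nn

class FastSTResidualBlock(nn.Module):
    """Replace one k x k x k 3D conv with a spatial (1 x k x k) conv
    followed by a temporal (k x 1 x 1) conv, cutting each filter from
    k**3 weights down to k**2 + k."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, (1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.temporal = nn.Conv3d(channels, channels, (k, 1, 1),
                                  padding=(k // 2, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (B, C, T, H, W)
        # residual link keeps the factorized block easy to optimize
        return x + self.temporal(self.relu(self.spatial(x)))
```

For k = 3 this drops the per-filter cost from 27 weights to 12, which is what allows deeper video SR models at the same compute budget.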
As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare and consumer markets, it has been suggested that individuals should be compensated for the data that they generate, but it is not clear what an equitable valuation for individual data would be. In this work, we develop a principled framework to address data valuation in the context of supervised machine learning. Given a learning algorithm trained on $n$ data points to produce a predictor, we propose data Shapley as a metric to quantify the value of each training datum to the predictor's performance. Data Shapley uniquely satisfies several natural properties of equitable data valuation. We develop Monte Carlo and gradient-based methods to efficiently estimate data Shapley values in practical settings where complex learning algorithms, including neural networks, are trained on large datasets. In addition to being equitable, extensive experiments across biomedical, image, and synthetic data demonstrate that data Shapley has several other benefits: 1) it is more powerful than the popular leave-one-out or leverage-score methods in providing insight into which data are more valuable for a given learning task; 2) low-Shapley-value data effectively capture outliers and corruptions; 3) high-Shapley-value data indicate what type of new data to acquire to improve the predictor.
https://arxiv.org/abs/1904.02868
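The Monte Carlo estimator mentioned above has a simple form: average each point's marginal contribution over random permutations of the training set. A sketch, where `performance(indices)` is an assumed callable that retrains the learner on the given subset and returns a test score:

```python
import numpy as np

def monte_carlo_data_shapley(n_points, performance, n_perms=100, seed=0):
    """Estimate each training point's data Shapley value as its average
    marginal contribution over random permutations of the dataset."""
    rng = np.random.default_rng(seed)
    values = np.zeros(n_points)
    for _ in range(n_perms):
        perm = rng.permutation(n_points)
        prev = performance([])            # score of the empty coalition
        subset = []
        for i in perm:
            subset.append(i)
            score = performance(subset)   # retrain and evaluate
            values[i] += score - prev
            prev = score
    return values / n_perms
```

In practice the inner loop is truncated once marginal gains become negligible (the "truncated Monte Carlo" variant), since retraining at every step dominates the cost.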
One of the key limitations of traditional machine learning methods is their requirement for training data that exemplifies all the information to be learned. This is a particular problem for visual question answering methods, which may be asked questions about virtually anything. The approach we propose is a step toward overcoming this limitation by searching for the required information at test time. The resulting method dynamically utilizes data from an external source, such as a large set of questions/answers or images/captions. Concretely, we learn a set of base weights for a simple VQA model, which are then adapted to a given question using information retrieved specifically for that question. The adaptation process leverages recent advances in gradient-based meta-learning together with contributions for efficient retrieval and cross-domain adaptation. We surpass the state-of-the-art on the VQA-CP v2 benchmark and demonstrate our approach to be intrinsically more robust to out-of-distribution test data. We demonstrate the use of external non-VQA data, using the MS COCO captioning dataset to support the answering process. This approach opens a new avenue for open-domain VQA systems that interface with diverse sources of data.
https://arxiv.org/abs/1904.02865
Abuse on the Internet represents a significant societal problem of our time. Previous research on automated abusive language detection in Twitter has shown that community-based profiling of users is a promising technique for this task. However, existing approaches only capture shallow properties of online communities by modeling follower-following relationships. In contrast, working with graph convolutional networks (GCNs), we present the first approach that captures not only the structure of online communities but also the linguistic behavior of the users within them. We show that such a heterogeneous graph-structured modeling of communities significantly advances the current state of the art in abusive language detection.
http://arxiv.org/abs/1904.04073
Face anti-spoofing is designed to keep face recognition systems from recognizing fake faces as genuine users. While advanced face anti-spoofing methods are being developed, new types of spoof attacks are also being created and have become a threat to all existing systems. We define the detection of unknown spoof attacks as Zero-Shot Face Anti-spoofing (ZSFA). Previous work on ZSFA has only studied one or two types of spoof attacks, such as print/replay attacks, which limits insight into this problem. In this work, we expand the ZSFA problem to a wide range of 13 types of spoof attacks, including print attacks, replay attacks, 3D mask attacks, and so on. A novel Deep Tree Network (DTN) is proposed to tackle ZSFA. The tree is learned to partition the spoof samples into semantic sub-groups in an unsupervised fashion. When a data sample arrives, whether a known or unknown attack, DTN routes it to the most similar spoof cluster and makes the binary decision. In addition, to enable the study of ZSFA, we introduce the first face anti-spoofing database that contains diverse types of spoof attacks. Experiments show that our proposed method achieves the state of the art on multiple testing protocols of ZSFA.
https://arxiv.org/abs/1904.02860
This paper introduces our approach to building a robot with communication capability based on two key features: stochastic neural dynamics and prediction error minimization (PEM). A preliminary experiment with humanoid robots showed that a robot was able to imitate another's actions by means of these key features. In addition, we found that certain communicative patterns emerged between the two robots, in which each robot inferred the intention of the other agent behind the sensory observations.
http://arxiv.org/abs/1904.02858
Knowledge graphs have evolved rapidly in recent years and their usefulness has been demonstrated in many artificial intelligence tasks. However, knowledge graphs often have many missing facts. To solve this problem, many knowledge graph embedding models have been developed to populate knowledge graphs, and these have shown outstanding performance. However, knowledge graph embedding models are so-called black boxes: the user does not know how the information in a knowledge graph is processed, and the models can be difficult to interpret. In this paper, we utilize graph patterns in a knowledge graph to overcome such problems. Our proposed model, the {\it graph pattern entity ranking model} (GRank), constructs an entity ranking system for each graph pattern and evaluates them using a ranking measure. By doing so, we can find graph patterns that are useful for predicting facts. We then perform link prediction tasks on standard datasets to evaluate our GRank method. We show that our approach outperforms other state-of-the-art approaches such as ComplEx and TorusE on standard metrics such as HITS@{\it n} and MRR. Moreover, our model is easily interpretable because the output facts are described by graph patterns.
https://arxiv.org/abs/1904.02856
This paper proposes effective modelling of sound event spectra with a hidden data-size imbalance, for improved Acoustic Event Detection (AED). The proposed method models each event as an aggregated representation of a few latent factors, while conventional approaches try to find acoustic elements directly from the event spectra. In the method, all the latent factors across all events are assigned comparable importance and complexity to overcome the hidden imbalance of data sizes in event spectra. To extract the latent factors in each event, the proposed method employs clustering, performs non-negative matrix factorization on each latent factor, and learns its acoustic elements as a sub-dictionary. Separate sub-dictionary learning effectively models the acoustic elements under limited data sizes and avoids over-fitting due to hidden imbalances in the training data. On the polyphonic sound event detection task from the DCASE 2013 challenge, an AED system based on the proposed modelling achieves a detection F-measure of 46.5%, a significant improvement of more than 19% over existing state-of-the-art methods.
https://arxiv.org/abs/1904.02852
In this work, we develop a novel sampling-based motion planning approach to generate plans in a risky and uncertain environment. To model a variety of risk-sensitivity profiles, we propose an adaptation of Cumulative Prospect Theory (CPT) to the setting of path planning. This leads to the definition of a non-rational continuous cost envelope (as well as a continuous uncertainty envelope) associated with an obstacle environment. We use these metrics along with standard costs such as path length to formulate path planning problems. Building on RRT*, we then develop a sampling-based motion planner that generates desirable paths from the perspective of a given risk-sensitivity profile. Since risk sensitivity can vary greatly, we provide a tuning knob to accommodate a diversity of decision makers (DMs), ranging from totally risk-averse to risk-indifferent. Additionally, we adapt a Simultaneous Perturbation Stochastic Approximation (SPSA)-based algorithm to learn the CPT parameters that best represent a given DM. Simulations are presented in a 2D environment to evaluate the modeling approach and the algorithm's performance.
http://arxiv.org/abs/1904.02851
University laboratories deliver unique hands-on experimentation for STEM students but often lack state-of-the-art equipment and provide only limited access to the equipment they have. The University of Texas Cloud Laboratory provides remote access to a cutting-edge series elastic actuator for student experimentation in human-centered robotics, dynamical systems, and controls. Through a browser-based interface, students are provided with various learning materials using the remote hardware-in-the-loop system for effective experiment-based education. This paper discusses the methods used to connect remote hardware to mobile browsers, the adaptation of textbook materials on system identification and feedback control, data processing to generate clean and useful results for student interpretation, and initial usage of the end-to-end system for individual and group learning.
http://arxiv.org/abs/1803.11119
In ultrasound (US) imaging, individual channel RF measurements are back-propagated and accumulated to form an image after applying specific delays. While this time reversal is usually implemented using a hardware- or software-based delay-and-sum (DAS) beamformer, the performance of DAS decreases rapidly in situations where data acquisition is not ideal. Herein, for the first time, we demonstrate that a single data-driven beamformer designed as a deep neural network can directly process sub-sampled RF data acquired at different sampling rates to generate high-quality US images. In particular, the proposed deep beamformer is evaluated for two distinct acquisition schemes: focused ultrasound imaging and planewave imaging. Experimental results show that the proposed deep beamformer exhibits significant performance gains for both focused and planewave imaging schemes, in terms of contrast-to-noise ratio and structural similarity.
https://arxiv.org/abs/1904.02843
When assigning quantitative labels to a dataset, different methodologies may rely on different scales. In particular, when assigning polarities to words in a sentiment lexicon, annotators may use binary, categorical, or continuous labels. Naturally, it is of interest to unify these labels from disparate scales, both to achieve maximal coverage over words and to create a single, more robust sentiment lexicon while retaining scale coherence. We introduce a generative model of sentiment lexica that combines disparate scales into a common latent representation. We realize this model with a novel multi-view variational autoencoder (VAE), called SentiVAE. We evaluate our approach via a downstream text classification task involving nine English-language sentiment analysis datasets; our representation outperforms six individual sentiment lexica, as well as a straightforward combination thereof.
https://arxiv.org/abs/1904.02839
Traditional color images only depict color intensities in the red, green, and blue channels, often causing object trackers to fail in challenging scenarios, e.g., background clutter and rapid changes of target appearance. Alternatively, the material information of targets contained in the large number of bands of hyperspectral images (HSI) is more robust to these challenging conditions. In this paper, we conduct a comprehensive study on how material information can be utilized to boost object tracking from three aspects: benchmark dataset, material feature representation, and material-based tracking. In terms of benchmark, we construct a dataset of fully-annotated videos, which contain both hyperspectral and color sequences of the same scene. Material information is represented by a spectral-spatial histogram of multidimensional gradients, which describes the 3D local spectral-spatial structure in an HSI, and by abundances, which encode the underlying material distribution. These two types of features are embedded into correlation filters, yielding material-based tracking. Experimental results on the collected benchmark dataset show the potential and advantages of material-based object tracking.
http://arxiv.org/abs/1812.04179
To improve the throughput and energy efficiency of Deep Neural Networks (DNNs) on customized hardware, lightweight neural networks constrain the weights of DNNs to be a limited combination (denoted as $k\in\{1,2\}$) of powers of 2. In such networks, the multiply-accumulate operation can be replaced with a single shift operation, or two shifts and an add operation. To provide even more design flexibility, the $k$ for each convolutional filter can be optimally chosen instead of being fixed for every filter. In this paper, we formulate the selection of $k$ to be differentiable, and describe model training for determining $k$-based weights on a per-filter basis. Over 46 FPGA-design experiments involving eight configurations and four data sets reveal that lightweight neural networks with a flexible $k$ value (dubbed FLightNNs) fully utilize the hardware resources on Field Programmable Gate Arrays (FPGAs). Our experimental results show that FLightNNs can achieve a 2$\times$ speedup when compared to lightweight NNs with $k=2$, with only 0.1\% accuracy degradation. Compared to a 4-bit fixed-point quantization, FLightNNs achieve higher accuracy and up to 2$\times$ inference speedup, due to their lightweight shift operations. In addition, our experiments also demonstrate that FLightNNs can achieve higher computational energy efficiency for ASIC implementation.
https://arxiv.org/abs/1904.02835
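To make the shift-replaces-multiply claim above concrete, here is a tiny sketch (illustrative, not from the paper) of multiplying by a weight constrained to a sum of k signed powers of two:

```python
def shift_add_multiply(x, exponents, signs):
    """Multiply integer x by a weight of the form sum(s * 2**e):
    k = 1 needs one shift; k = 2 needs two shifts and an add."""
    acc = 0
    for e, s in zip(exponents, signs):
        acc += s * (x << e)  # each term costs a single shift
    return acc

# weight 6 = 2**2 + 2**1  (k = 2): 5 * 6 == 30
assert shift_add_multiply(5, [2, 1], [1, 1]) == 30
# weight -4 = -(2**2)     (k = 1): 7 * -4 == -28
assert shift_add_multiply(7, [2], [-1]) == -28
```

In hardware, the k = 1 case is just wiring (a fixed shift), while k = 2 adds one adder, which is why letting each filter pick its own k trades accuracy against area and latency.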
In this work, we present a framework that is capable of accurately representing soft robotic actuators in a multiphysics environment in real time. We propose a constraint-based dynamics model of a 1-dimensional pneumatic soft actuator that accounts for internal pressure forces, as well as the effects of actuator latency and damping under inflation and deflation, and we demonstrate its accuracy on a full soft robotic snake composed of multiple 1D actuators. We verify our model's accuracy in static deformation and dynamic locomotion open-loop control experiments. To achieve real-time performance, we leverage the parallel computation power of GPUs to allow interactive control and feedback.
http://arxiv.org/abs/1904.02833
Different from traditional supervised learning, in which each training example has only one explicit label, superset label learning (SLL) refers to the problem in which a training example can be associated with a set of candidate labels, only one of which is correct. Existing SLL methods are either regularization-based or instance-based, and the latter have achieved state-of-the-art performance. This is because the latest instance-based methods contain an explicit disambiguation operation that accurately picks out the groundtruth label of each training example from its ambiguous candidate labels. However, such a disambiguation operation does not fully consider the mutually exclusive relationship among different candidate labels, so the disambiguated labels are usually generated in a nondiscriminative way, which is unfavorable for instance-based methods to obtain satisfactory performance. To address this defect, we develop a novel regularization approach for instance-based superset label learning (RegISL) so that our instance-based method also inherits the good discriminative ability possessed by the regularization scheme. Specifically, we employ a graph to represent the training set, and require the examples that are adjacent on the graph to obtain similar labels. More importantly, a discrimination term is proposed to enlarge the gap in values between possible labels and unlikely labels for every training example. As a result, the intrinsic constraints among different candidate labels are deployed, and the disambiguated labels generated by RegISL are more discriminative and accurate than those output by existing instance-based algorithms. Experimental results on various tasks convincingly demonstrate the superiority of our RegISL over other typical SLL methods in terms of both training accuracy and test accuracy.
https://arxiv.org/abs/1904.02832
This paper presents the Automatic Algorithm Discoverer (AAD), an evolutionary framework for synthesizing programs of high complexity. To guide evolution, prior evolutionary algorithms have depended on fitness (objective) functions, which are challenging to design. To make evolutionary progress, AAD instead employs Problem Guided Evolution (PGE), which requires the introduction of a group of problems together. With PGE, solutions discovered for simpler problems are used to solve more complex problems in the same group. PGE also enables several new evolutionary strategies and naturally lends itself to High-Performance Computing (HPC) techniques. We find that PGE and related evolutionary strategies enable AAD to discover algorithms of similar or higher complexity relative to the state-of-the-art. Specifically, AAD produces Python code for 29 array/vector problems ranging from min, max, and reverse, to more challenging problems like sorting and matrix-vector multiplication. Additionally, we find that AAD adapts to constrained environments/inputs and demonstrates outside-the-box problem-solving abilities.
https://arxiv.org/abs/1904.02830
In this paper, we propose a novel Convolutional Neural Network (CNN) structure for general-purpose multi-task learning (MTL), which enables automatic feature fusing at every layer from different tasks. This is in contrast with the most widely used MTL CNN structures which empirically or heuristically share features on some specific layers (e.g., share all the features except the last convolutional layer). The proposed layerwise feature fusing scheme is formulated by combining existing CNN components in a novel way, with clear mathematical interpretability as discriminative dimensionality reduction, which is referred to as Neural Discriminative Dimensionality Reduction (NDDR). Specifically, we first concatenate features with the same spatial resolution from different tasks according to their channel dimension. Then, we show that the discriminative dimensionality reduction can be fulfilled by 1x1 Convolution, Batch Normalization, and Weight Decay in one CNN. The use of existing CNN components ensures the end-to-end training and the extensibility of the proposed NDDR layer to various state-of-the-art CNN architectures in a “plug-and-play” manner. The detailed ablation analysis shows that the proposed NDDR layer is easy to train and also robust to different hyperparameters. Experiments on different task sets with various base network architectures demonstrate the promising performance and desirable generalizability of our proposed method. The code of our paper is available at https://github.com/ethanygao/NDDR-CNN.
http://arxiv.org/abs/1801.08297
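The fusion step described in the NDDR abstract above maps directly onto standard CNN components. A minimal sketch, assuming 2D features and applying weight decay through the optimizer rather than inside the module; the layer names are illustrative:

```python
import torch
import torch.nn as nn

class NDDRLayer(nn.Module):
    """Fuse same-resolution features from several tasks: concatenate
    along the channel dimension, then reduce back to per-task features
    with a 1x1 convolution followed by batch normalization."""
    def __init__(self, channels, num_tasks):
        super().__init__()
        self.reducers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels * num_tasks, channels, kernel_size=1),
                nn.BatchNorm2d(channels),
            )
            for _ in range(num_tasks)
        ])

    def forward(self, task_features):
        # task_features: list of (B, C, H, W) tensors, one per task
        fused = torch.cat(task_features, dim=1)
        return [reduce(fused) for reduce in self.reducers]
```

Because the layer is built from a 1x1 conv plus batch norm, it can be dropped between corresponding stages of any pair of backbone networks, which is the "plug-and-play" property claimed above.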
Binarized Neural Networks (BNNs) can significantly reduce the inference latency and energy consumption in resource-constrained devices due to their pure-logical computation and fewer memory accesses. However, training BNNs is difficult since the activation flow encounters degeneration, saturation, and gradient mismatch problems. Prior work alleviates these issues by increasing activation bits and adding floating-point scaling factors, thereby sacrificing BNN’s energy efficiency. In this paper, we propose to use distribution loss to explicitly regularize the activation flow, and develop a framework to systematically formulate the loss. Our experiments show that the distribution loss can consistently improve the accuracy of BNNs without losing their energy benefits. Moreover, equipped with the proposed regularization, BNN training is shown to be robust to the selection of hyper-parameters including optimizer and learning rate.
https://arxiv.org/abs/1904.02823
Programming languages are emerging as a challenging and interesting domain for machine learning. A core task, which has received significant attention in recent years, is building generative models of source code. However, to our knowledge, previous generative models have always been framed in terms of generating static snapshots of code. In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files. This requires extracting intent from previous edits and leveraging it to generate subsequent edits. We develop several neural networks and use synthetic data to test their ability to learn challenging edit patterns that require strong generalization. We then collect and train our models on a large-scale dataset of Google source code, consisting of millions of fine-grained edits from thousands of Python developers. From the modeling perspective, our main conclusion is that a new composition of attentional and pointer network components provides the best overall performance and scalability. From the application perspective, our results provide preliminary evidence of the feasibility of developing tools that learn to predict future edits.
https://arxiv.org/abs/1904.02818
Contextualized word embeddings such as ELMo and BERT provide a foundation for strong performance across a range of natural language processing tasks, in part by pretraining on a large and topically diverse corpus. However, the applicability of this approach is unknown when the target domain varies substantially from the text used during pretraining. Specifically, we are interested in the scenario in which labeled data is available only in a canonical source domain such as news text, and the target domain is distinct from both the labeled corpus and the pretraining data. To address this scenario, we propose domain-adaptive fine-tuning, in which the contextualized embeddings are adapted by masked language modeling on the target domain. We test this approach on the challenging domain of Early Modern English, which differs substantially from existing pretraining corpora. Domain-adaptive fine-tuning yields an improvement of 4\% in part-of-speech tagging accuracy over a BERT baseline, substantially improving on prior work on this task.
https://arxiv.org/abs/1904.02817
The success of deep learning techniques has renewed interest in the development of dialogue systems. However, current systems struggle to have consistent long-term conversations with users and fail to build rapport. Topic spotting, the task of automatically inferring the topic of a conversation, has been shown to be helpful in making a dialog system more engaging and efficient. We propose a hierarchical model with self-attention for topic spotting. Experiments on the Switchboard corpus show the superior performance of our model over previously proposed techniques for topic spotting and deep models for text classification. Additionally, in contrast to offline processing of dialog, we also analyze the performance of our model in a more realistic setting, i.e., an online setting where the topic is identified in real time as the dialog progresses. Results show that our model is able to generalize even with limited information in the online setting.
https://arxiv.org/abs/1904.02815
Group convolution has been shown to offer great computational savings in various 2D convolutional architectures for image classification. It is natural to ask: 1) whether group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter most in 3D group convolutional networks; and 3) what good computation/accuracy trade-offs exist with 3D group convolutional networks. This paper studies the effects of group convolution in 3D convolutional networks for video classification. We empirically demonstrate that the amount of channel interaction plays an important role in the accuracy of group convolutional networks. Our experiments suggest two main findings. First, it is good practice to factorize 3D convolutions by separating channel interactions and spatiotemporal interactions, as this leads to improved accuracy and lower computational cost. Second, 3D channel-separated convolutions provide a form of regularization, yielding lower training accuracy but higher test accuracy compared to 3D convolutions. These two empirical findings lead us to design an architecture, the Channel-Separated Convolutional Network (CSN), which is simple, efficient, yet accurate. On Kinetics and Sports1M, our CSNs significantly outperform state-of-the-art models while being 11 times more efficient.
https://arxiv.org/abs/1904.02811
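The channel-separation idea above, isolating channel interactions from spatiotemporal interactions, reduces to a pointwise 1x1x1 convolution followed by a depthwise 3x3x3 convolution. A minimal sketch with illustrative module names:

```python
import torch.nn as nn

class ChannelSeparatedConv3d(nn.Module):
    """Factorize a 3D conv so that channel interactions (pointwise
    1x1x1 conv) and spatiotemporal interactions (depthwise 3x3x3 conv)
    never mix inside a single kernel."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.pointwise = nn.Conv3d(in_channels, out_channels,
                                   kernel_size=1)
        self.depthwise = nn.Conv3d(out_channels, out_channels,
                                   kernel_size=3, padding=1,
                                   groups=out_channels)

    def forward(self, x):  # x: (B, C, T, H, W)
        return self.depthwise(self.pointwise(x))
```

The depthwise conv here is the extreme case of group convolution (one group per channel); intermediate group counts interpolate between this block and a full 3D convolution, which is the design space the paper explores.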
With the advent of modern expert systems driven by deep learning that supplement human experts (e.g., radiologists, dermatologists, surveillance scanners), we analyze how and when such expert systems enhance human performance in a fine-grained, small-target visual search task. We set up a two-session factorial experimental design in which humans visually search for a target with and without a Deep Learning (DL) expert system. We evaluate changes in human target detection performance and eye movements in the presence of the DL system. We find that performance improvements with the DL system (implemented via a Faster R-CNN with a VGG16 backbone) interact with observers' perceptual abilities (e.g., sensitivity). The main results are: 1) the DL system reduces the false alarm rate per image on average across observer groups of both high and low sensitivity; 2) only human observers with high sensitivity perform better than the DL system, while the low-sensitivity group does not surpass individual DL system performance, even when aided by the DL system itself; 3) increases in the number of trials and decreases in viewing time were driven by the DL system mainly for the low-sensitivity group; 4) the DL system aids the human observer in fixating on a target by the 3rd fixation. These results provide insights into the benefits and limitations of deep learning systems that are collaborative or competitive with humans.
http://arxiv.org/abs/1904.02805
A typical conversation comprises multiple turns between participants, who go back and forth between different topics. At each user turn, dialogue state tracking (DST) aims to estimate the user's goal by processing the current utterance. However, in many turns, users implicitly refer to a previous goal, necessitating the use of relevant dialogue history. Nonetheless, distinguishing relevant history is challenging, and the popular method of using dialogue recency for this is inefficient. We therefore propose a novel framework for DST that identifies relevant historical context by referring to the past utterances where a particular slot-value changes, and uses that together with a weighted system utterance to identify the relevant context. Specifically, we use the current user utterance and the most recent system utterance to determine the relevance of a system utterance. Empirical analyses show that our method improves joint goal accuracy by 2.75% and 2.36% on the WoZ 2.0 and MultiWoZ 2.0 restaurant domain datasets, respectively, over the previous state-of-the-art GLAD model.
https://arxiv.org/abs/1904.02800
Our study employs sentiment analysis to evaluate the compatibility of Amazon.com reviews with their corresponding ratings. Sentiment analysis is the task of identifying and classifying the sentiment expressed in a piece of text as positive or negative. On e-commerce websites such as Amazon.com, consumers can submit their reviews along with a specific polarity rating. In some instances, there is a mismatch between the review and the rating. To identify reviews with mismatched ratings, we performed sentiment analysis using deep learning on Amazon.com product review data. Product reviews were converted to vectors using paragraph vectors, which were then used to train a recurrent neural network with gated recurrent units. Our model incorporated both the semantic relationships of the review text and product information. We also developed a web service application that predicts the rating score for a submitted review using the trained model and, if there is a mismatch between the predicted and submitted rating scores, provides feedback to the reviewer.
http://arxiv.org/abs/1904.04096
Lazy search algorithms can efficiently solve problems where edge evaluation is the bottleneck in computation, as is the case for robotic motion planning. The optimal algorithm in this class, LazySP, lazily restricts edge evaluation to only the shortest path. Doing so comes at the expense of search effort, i.e., LazySP must recompute the search tree every time an edge is found to be invalid. This becomes prohibitively expensive when dealing with large graphs or highly cluttered environments. Our key insight is the need to balance both edge evaluation and search effort to minimize the total planning time. Our contribution is two-fold. First, we propose a framework, Generalized Lazy Search (GLS), that seamlessly toggles between search and evaluation to prevent wasted efforts. We show that for a choice of toggle, GLS is provably more efficient than LazySP. Second, we leverage prior experience of edge probabilities to derive GLS policies that minimize expected planning time. We show that GLS equipped with such priors significantly outperforms competitive baselines for many simulated environments in R2, SE(2) and 7-DoF manipulation.
http://arxiv.org/abs/1904.02795
We propose Visual Query Detection (VQD), a new visual grounding task. In VQD, a system is guided by natural language to localize a \emph{variable} number of objects in an image. VQD is related to visual referring expression recognition, where the task is to localize only \emph{one} object. We describe the first dataset for VQD and we propose baseline algorithms that demonstrate the difficulty of the task compared to referring expression recognition.
https://arxiv.org/abs/1904.02794
The majority of current systems for end-to-end dialog generation focus on response quality without an explicit control over the affective content of the responses. In this paper, we present an affect-driven dialog system, which generates emotional responses in a controlled manner using a continuous representation of emotions. The system achieves this by modeling emotions at a word and sequence level using: (1) a vector representation of the desired emotion, (2) an affect regularizer, which penalizes neutral words, and (3) an affect sampling method, which forces the neural network to generate diverse words that are emotionally relevant. During inference, we use a reranking procedure that aims to extract the most emotionally relevant responses using a human-in-the-loop optimization process. We study the performance of our system in terms of both quantitative (BLEU score and response diversity), and qualitative (emotional appropriateness) measures.
https://arxiv.org/abs/1904.02793
How can we measure whether a natural language generation system produces both high quality and diverse outputs? Human evaluation captures quality but not diversity, as it does not catch models that simply plagiarize from the training set. On the other hand, statistical evaluation (i.e., perplexity) captures diversity but not quality, as models that occasionally emit low quality samples would be insufficiently penalized. In this paper, we propose a unified framework which evaluates both diversity and quality, based on the optimal error rate of predicting whether a sentence is human- or machine-generated. We demonstrate that this error rate can be efficiently estimated by combining human and statistical evaluation, using an evaluation metric which we call HUSE. On summarization and chit-chat dialogue, we show that (i) HUSE detects diversity defects which fool pure human evaluation and that (ii) techniques such as annealing for improving quality actually decrease HUSE due to decreased diversity.
https://arxiv.org/abs/1904.02792
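The abstract above frames HUSE as the optimal error rate of a human-vs-machine discriminator combining human judgments with statistical evaluation. A hedged sketch of that idea using a k-NN classifier over two per-sentence features (a human quality judgment and the model's log-probability); the exact feature set, neighbor count, and factor of two are assumptions consistent with the framing, not the paper's precise protocol:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def huse_like_score(features, labels, k=15):
    """features: (N, 2) array of [human judgment, model log-prob] per
    sentence; labels: 1 for human-written, 0 for machine-generated.
    Returns twice the estimated optimal discrimination error, so that
    indistinguishable distributions score near 1."""
    clf = KNeighborsClassifier(n_neighbors=k)
    accuracy = cross_val_score(clf, features, labels, cv=5).mean()
    return 2.0 * (1.0 - accuracy)
```

A model that plagiarizes the training set would fool the human-judgment feature but be exposed by the log-probability feature, and vice versa for a diverse but low-quality sampler, which is why combining the two features catches both failure modes.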
Neural text-to-speech synthesis (NTTS) models have shown significant progress in generating high-quality speech; however, they require a large quantity of training data, which makes creating models for multiple styles expensive and time-consuming. In this paper, different styles of speech are analysed based on prosodic variations, and from this a model is proposed to synthesise speech in the style of a newscaster with just a few hours of supplementary data. We pose the problem of synthesising in a target style using limited data as that of creating a bi-style model that can synthesise both neutral-style and newscaster-style speech via a one-hot vector that factorises the two styles. We also propose conditioning the model on contextual word embeddings, and we extensively evaluate it against neutral NTTS and neutral concatenative-based synthesis. This model closes the gap in perceived style-appropriateness between natural recordings of newscaster-style speech and neutral speech synthesis by approximately two-thirds.
https://arxiv.org/abs/1904.02790
A major obstacle in reinforcement learning-based sentence generation is the large action space whose size is equal to the vocabulary size of the target-side language. To improve the efficiency of reinforcement learning, we present a novel approach for reducing the action space based on dynamic vocabulary prediction. Our method first predicts a fixed-size small vocabulary for each input to generate its target sentence. The input-specific vocabularies are then used at supervised and reinforcement learning steps, and also at test time. In our experiments on six machine translation and two image captioning datasets, our method achieves faster reinforcement learning ($\sim$2.7x faster) with less GPU memory ($\sim$2.3x less) than the full-vocabulary counterpart. The reinforcement learning with our method consistently leads to significant improvement of BLEU scores, and the scores are equal to or better than those of baselines using the full vocabularies, with faster decoding time ($\sim$3x faster) on CPUs.
https://arxiv.org/abs/1809.01694
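The action-space reduction above amounts to scoring only a small, input-specific vocabulary at each decoding step. A hedged sketch, where `small_vocab_ids` is the predicted per-input vocabulary (its prediction step is left abstract):

```python
import torch
import torch.nn.functional as F

def restricted_log_softmax(logits, small_vocab_ids):
    """Score only a predicted input-specific vocabulary, shrinking the
    action space for both supervised and reinforcement learning steps.
    logits: (B, V_full); small_vocab_ids: (V_small,) index tensor."""
    small_logits = logits.index_select(dim=-1, index=small_vocab_ids)
    return F.log_softmax(small_logits, dim=-1)
```

Since the softmax (and the REINFORCE sampling distribution built on it) now runs over V_small instead of the full target vocabulary, both the compute and the memory of each update shrink roughly in proportion, matching the speedups reported above.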
In this paper, we tackle the problem of crowd counting and present a crowd density estimation based approach for obtaining the crowd count. Most existing crowd counting approaches rely on local features for estimating the crowd density map. In this work, we investigate the usefulness of combining local with non-local features for crowd counting. We use convolution layers for extracting local features and a type of self-attention mechanism for extracting non-local features. We combine the local and non-local features and use them for estimating the crowd density map. We conduct experiments on three publicly available crowd counting datasets and achieve significant improvement over previous approaches.
https://arxiv.org/abs/1904.02774
Our understanding and ability to effectively monitor and manage coastal ecosystems are severely limited by observation methods. Automatic recognition of species in natural environments is a promising tool that would revolutionize video and image analysis for a wide range of applications in marine ecology. However, classifying fish from images captured by underwater cameras is in general very challenging due to noise and illumination variations in water. Previous classification methods in the literature rely on filtering the images to separate the fish from the background, or on sharpening the images by removing background noise. This pre-filtering process may negatively impact the classification accuracy. In this work, we propose a Convolutional Neural Network (CNN) using the Squeeze-and-Excitation (SE) architecture for classifying images of fish without pre-filtering. Different from conventional schemes, this scheme is divided into two steps. The first step is to train the fish classifier on a public data set, i.e., Fish4Knowledge, without using image augmentation; we call this pre-training. The second step is to train the classifier on a new data set consisting of the species that we are interested in classifying; we call this post-training. The weights obtained from pre-training are applied to post-training as a prior. This is also known as transfer learning. Our solution achieves state-of-the-art accuracy of 99.27% on the pre-training. The accuracy on the post-training is 83.68%. Experiments on the post-training with image augmentation yield an accuracy of 87.74%, indicating that the solution is viable with a larger data set.
https://arxiv.org/abs/1904.02768
Sentence simplification is the task of rewriting texts so they are easier to understand. Recent research has applied sequence-to-sequence (Seq2Seq) models to this task, focusing largely on training-time improvements via reinforcement learning and memory augmentation. One of the main problems with applying generic Seq2Seq models for simplification is that these models tend to copy directly from the original sentence, resulting in outputs that are relatively long and complex. We aim to alleviate this issue through the use of two main techniques. First, we incorporate content word complexities, as predicted with a leveled word complexity model, into our loss function during training. Second, we generate a large set of diverse candidate simplifications at test time, and rerank these to promote fluency, adequacy, and simplicity. Here, we measure simplicity through a novel sentence complexity model. These extensions allow our models to perform competitively with state-of-the-art systems while generating simpler sentences. We report standard automatic and human evaluation metrics.
https://arxiv.org/abs/1904.02767
This paper presents and evaluates a novel approach to integrating a non-invasive Brain-Computer Interface (BCI) with the Robot Operating System (ROS) to mentally drive a telepresence robot. Controlling a mobile device with human brain signals might improve the quality of life of people suffering from severe physical disabilities or of elderly people who can no longer move. The BCI user is thus able to actively interact with relatives and friends located in different rooms thanks to a video streaming connection to the robot. To facilitate the control of the robot via BCI, we explore new ROS-based algorithms for navigation and obstacle avoidance, making the system safer and more reliable. In this regard, the robot can exploit two maps of the environment, one for localization and one for navigation, and both can also be used by the BCI user to watch the position of the robot while it is moving. As demonstrated by the experimental results, the user's cognitive workload is reduced, decreasing the number of commands necessary to complete the task and helping them to stay attentive for longer periods of time.
http://arxiv.org/abs/1712.01772
Collision prediction in a dynamic and unknown environment relies on knowledge of how the environment is changing. Many collision prediction methods rely on deterministic knowledge of how obstacles are moving in the environment. However, complete deterministic knowledge of the obstacles' motion is often unavailable. This work proposes a Gaussian process based prediction method that replaces the assumption of deterministic knowledge of each obstacle's future behavior with probabilistic knowledge, allowing a larger class of obstacles to be considered. The method relies solely on position and velocity measurements to predict collisions with dynamic obstacles. We show that the uncertainty region for obstacle positions can be expressed in terms of a combination of polynomials generated with Gaussian process regression. To control the growth of uncertainty over arbitrary time horizons, a probabilistic obstacle intention is assumed as a distribution over obstacle positions and velocities, which can be naturally included in the Gaussian process framework. Our approach is demonstrated in two case studies in which (i) an obstacle overtakes the agent and (ii) an obstacle crosses the agent's path perpendicularly. In these simulations, we show that collisions can be predicted despite having limited knowledge of the obstacle's behavior.
http://arxiv.org/abs/1904.02765
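A small sketch of the core ingredient above: fit a Gaussian process to an obstacle's position measurements and propagate an uncertainty envelope forward in time. The kernel choice, 1D setting, and synthetic data are illustrative only; the paper works with positions and velocities jointly:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t_obs = np.linspace(0.0, 5.0, 20).reshape(-1, 1)              # times
x_obs = 0.8 * t_obs.ravel() + 0.05 * rng.standard_normal(20)  # 1D position

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                              normalize_y=True)
gp.fit(t_obs, x_obs)

# predict a future trajectory with a ~95% uncertainty envelope
t_future = np.linspace(5.0, 8.0, 30).reshape(-1, 1)
mean, std = gp.predict(t_future, return_std=True)
upper, lower = mean + 2.0 * std, mean - 2.0 * std
```

A collision check then asks whether the agent's planned position ever falls inside the obstacle's envelope; the assumed intention distribution keeps `std` from growing without bound over long horizons.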
Perceptual features (PFs) have been used with great success in tasks such as transfer learning, style transfer, and super-resolution. However, the efficacy of PFs as a key source of information for learning generative models is not well studied. Here we investigate the use of PFs in the context of learning implicit generative models through moment matching (MM). More specifically, we propose a new, effective MM approach that learns implicit generative models by performing mean and covariance matching of features extracted from pretrained ConvNets. Our proposed approach improves upon existing MM methods by: (1) breaking away from the problematic min/max game of adversarial learning; (2) avoiding online learning of kernel functions; and (3) being efficient with respect to both the number of used moments and the required minibatch size. Our experimental results demonstrate that, due to the expressiveness of PFs from pretrained deep ConvNets, our method achieves state-of-the-art results on challenging benchmarks.
https://arxiv.org/abs/1904.02762
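A sketch of the mean-and-covariance matching over pretrained-ConvNet features described above; how the features are extracted and which layers are used are left abstract, and the sum of squared differences is an assumed distance:

```python
import torch

def feature_moment_loss(real_feats, fake_feats):
    """Match first and second moments of (N, D) feature matrices drawn
    from real and generated minibatches; the generator minimizes this,
    with no discriminator and hence no min/max game."""
    mu_r, mu_f = real_feats.mean(dim=0), fake_feats.mean(dim=0)
    xr = real_feats - mu_r
    xf = fake_feats - mu_f
    cov_r = xr.t() @ xr / (real_feats.shape[0] - 1)
    cov_f = xf.t() @ xf / (fake_feats.shape[0] - 1)
    return (mu_r - mu_f).pow(2).sum() + (cov_r - cov_f).pow(2).sum()
```

Because the feature extractor is fixed and pretrained, the objective is a plain minimization for the generator, which is what sidesteps both adversarial instability and online kernel learning.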