Fundamental frequency is one of the most important characteristics of speech and audio signals. Harmonic model-based fundamental frequency estimators offer a higher estimation accuracy and robustness against noise than the widely used autocorrelation-based methods. However, the traditional harmonic model-based estimators do not take the temporal smoothness of the fundamental frequency, the model order, and the voicing into account as they process each data segment independently. In this paper, a fully Bayesian fundamental frequency tracking algorithm based on the harmonic model and a first-order Markov process model is proposed. Smoothness priors are imposed on the fundamental frequencies, model orders, and voicing using first-order Markov process models. Using these Markov models, fundamental frequency estimation and voicing detection errors can be reduced. Using the harmonic model, the proposed fundamental frequency tracker has an improved robustness to noise. An analytical form of the likelihood function, which can be computed efficiently, is derived. Compared to the state-of-the-art neural network and non-parametric approaches, the proposed fundamental frequency tracking algorithm reduces the mean absolute errors and gross errors by 15\% and 20\% on the Keele pitch database and 36\% and 26\% on sustained /a/ sounds from a database of Parkinson’s disease voices under 0 dB white Gaussian noise. A MATLAB version of the proposed algorithm is made freely available for reproduction of the results\footnote{An implementation of the proposed algorithm using MATLAB may be found in \url{https://tinyurl.com/yxn4a543}}.
http://arxiv.org/abs/1905.08557
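For orientation, a minimal LaTeX sketch of the kind of harmonic model and first-order Markov smoothness prior referred to above; the notation is generic and the Gaussian transition density is only one illustrative choice, not necessarily the paper's:

\[
  x(n) = \sum_{l=1}^{L}\bigl(a_l\cos(l\,\omega_0 n) + b_l\sin(l\,\omega_0 n)\bigr) + e(n),
  \qquad
  p\bigl(\omega_0^{(t)}\mid\omega_0^{(t-1)}\bigr) \propto \exp\!\Bigl(-\tfrac{\bigl(\omega_0^{(t)}-\omega_0^{(t-1)}\bigr)^2}{2\sigma^2}\Bigr),
\]

where $L$ is the (unknown) number of harmonics, $e(n)$ is noise, and analogous first-order Markov priors are placed on the model order and the voicing state across segments $t$.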
This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge. The goal of the SELD task is to detect the temporal activities of a known set of sound event classes, and further localize them in space when active. As part of the challenge, a synthesized dataset with each sound event associated with a spatial coordinate represented using azimuth and elevation angles is provided. These sound events are spatialized using real-life impulse responses collected at multiple spatial coordinates in five different rooms with varying dimensions and material properties. A baseline SELD method employing a convolutional recurrent neural network is used to generate benchmark scores for this reverberant dataset. The benchmark scores are obtained using the recommended cross-validation setup.
http://arxiv.org/abs/1905.08546
X-ray images are used by physicians in every modern healthcare organization and hospital to guide surgical and medical treatment. Because X-ray imaging can depict bone structure painlessly, it allows physicians to evaluate and identify diseases of the skeletal system faster and more efficiently. This paper presents an efficient contrast enhancement technique using morphological operators that helps to visualize important bone segments and soft tissues more clearly. Top-hat and bottom-hat transforms are utilized to enhance the image, where the gradient magnitude is used to automatically select the structuring element (SE) size. Experimental evaluation on different X-ray imaging databases shows the effectiveness of our method, which also produces comparatively better output than some existing image enhancement techniques.
http://arxiv.org/abs/1905.08545
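A minimal Python/OpenCV sketch of the classic top-hat/bottom-hat contrast enhancement the abstract above builds on; the fixed structuring-element size below is a placeholder for the paper's automatic, gradient-magnitude-based selection:

    import cv2

    def enhance_xray(img, se_size=15):
        # img: grayscale uint8 X-ray image.
        # se_size is a hypothetical fixed value; the paper selects the SE size
        # automatically from the gradient magnitude of the image.
        se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (se_size, se_size))
        top = cv2.morphologyEx(img, cv2.MORPH_TOPHAT, se)        # bright details
        bottom = cv2.morphologyEx(img, cv2.MORPH_BLACKHAT, se)   # dark details
        # Boost bright structures and suppress dark ones (saturating arithmetic).
        return cv2.subtract(cv2.add(img, top), bottom)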
High-dimensional data classification is a fundamental task in machine learning and imaging science. In this paper, we propose a two-stage multiphase semi-supervised classification method for classifying high-dimensional data and unstructured point clouds. To begin with, a fuzzy classification method such as the standard support vector machine is used to generate a warm initialization. We then apply a two-stage approach named SaT (smoothing and thresholding) to improve the classification. In the first stage, an unconstrained convex variational model is implemented to purify and smooth the initialization, followed by a second stage that projects the smoothed partition obtained in stage one onto a binary partition. These two stages can be repeated, with the latest result as a new initialization, to keep improving the classification quality. We show that the convex model of the smoothing stage has a unique solution and can be solved by a specifically designed primal-dual algorithm whose convergence is guaranteed. We test our method and compare it with the state-of-the-art methods on several benchmark data sets. The experimental results demonstrate clearly that our method is superior in both classification accuracy and computation speed for high-dimensional data and point clouds.
http://arxiv.org/abs/1905.08538
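The two SaT stages above can be illustrated schematically (a generic smoothing-plus-thresholding instance with a quadratic fidelity term and a convex regularizer $R$, not the paper's exact model):

\[
  u^{\star} = \arg\min_{u}\; \tfrac{1}{2}\|u - u^{0}\|_2^{2} + \lambda\, R(u),
  \qquad
  u_i^{\mathrm{bin}} =
  \begin{cases}
    1, & u^{\star}_i \ge \tau,\\
    0, & \text{otherwise},
  \end{cases}
\]

where $u^{0}$ is the fuzzy initialization (e.g. SVM scores), $\lambda > 0$ balances fidelity against smoothness, and $\tau$ is a threshold such as $1/2$; the binary partition can then serve as the initialization of the next round.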
The funnel lane concept is a qualitative visual navigation method that helps robots navigate autonomously using a recorded video. A visual path is extracted from the video by selecting some keyframes, and the robot uses this visual path for its navigation. Unlike some other methods, the funnel lane does not rely on traditional calculations of Jacobians, homographies, fundamental matrices, or the focus of expansion, and does not require any camera calibration. However, the funnel lane has some shortcomings. One problem is that it gives no information about the radius of rotation, so in turns the robot rotates with a constant radius along the path. This reduces maneuverability and prevents the robot from handling all turning conditions. In addition, this problem makes it hard for the robot to correct its path when it deviates from the desired path. Another flaw is that in some situations the robot cannot tell whether a translation or a rotation should be followed in the visual path, which leads it to deviate and fail to follow the desired path. This paper introduces the sloped funnel lane technique, which does not have these shortcomings. The roll and pitch angles are added to the funnel lane, which helps the robot set its radius of rotation according to the turning conditions it faces. Moreover, they help reduce the ambiguity between translation and rotation. Therefore the robot can deal with different turning conditions, and the navigation method is more robust and accurate. Experimental results on challenging scenarios on a real ground robot demonstrate the effectiveness of the sloped funnel lane technique.
http://arxiv.org/abs/1808.07707
The high sensitivity of neural architecture search (NAS) methods to their inputs, such as the step-size (i.e., learning rate) and the search space, prevents practitioners from applying them out-of-the-box to their own problems, even though their purpose is to automate part of the tuning process. Aiming at a fast, robust, and widely applicable NAS, we develop a generic optimization framework for NAS. We turn the coupled optimization of connection weights and neural architecture into a differentiable optimization by means of stochastic relaxation. The framework accepts an arbitrary search space (widely applicable) and enables gradient-based simultaneous optimization of weights and architecture (fast). We propose a stochastic natural gradient method with an adaptive step-size mechanism built upon our theoretical investigation (robust). Despite its simplicity and the absence of problem-dependent parameter tuning, our method exhibits near state-of-the-art performance with low computational budgets on both image classification and inpainting tasks.
https://arxiv.org/abs/1905.08537
The performance of a Part-of-speech (POS) tagger is highly dependent on the domain of the processed text, and for many domains there is no or only very little training data available. This work addresses the problem of POS tagging noisy user-generated text using a neural network. We propose an architecture that trains an out-of-domain model on a large newswire corpus and transfers those weights by using them as a prior for a model trained on the target domain (a dataset of German Tweets) for which very few annotations are available. The neural network has two standard bidirectional LSTMs at its core. However, we find it crucial to also encode a set of task-specific features and to obtain reliable (source-domain and target-domain) word representations. Experiments with different regularization techniques such as early stopping, dropout, and fine-tuning the domain-adaptation prior weights are conducted. Our best model uses external weights from the out-of-domain model as well as feature embeddings and pre-trained word and sub-word embeddings, and achieves a tagging accuracy of slightly over 90%, improving on the previous state of the art for this task.
http://arxiv.org/abs/1905.08920
Lake and Baroni (2018) introduced the SCAN dataset probing the ability of seq2seq models to capture compositional generalizations, such as inferring the meaning of “jump around” zero-shot from the component words. Recurrent networks (RNNs) were found to completely fail the most challenging generalization cases. We test here a convolutional network (CNN) on these tasks, reporting hugely improved performance with respect to RNNs. Despite this large improvement, however, the CNN has not induced systematic rules, suggesting that the difference between compositional and non-compositional behaviour is not clear-cut.
http://arxiv.org/abs/1905.08527
Many dialogue management frameworks allow the system designer to directly define belief rules to implement an efficient dialogue policy. Because these rules are directly defined, the components are said to be hand-crafted. As dialogues become more complex, the number of states, transitions, and policy decisions becomes very large. To facilitate the dialogue policy design process, we propose an approach to automatically learn belief rules using supervised machine learning. We validate our ideas in the Student-Advisor conversation domain, where we extract latent beliefs such as the student being curious, confused, or neutral. Further, we perform epistemic reasoning that helps tailor the dialogue to the student’s emotional state and hence improve the overall effectiveness of the dialogue system. Our latent belief identification approach shows an accuracy of 87%, which results in efficient and meaningful dialogue management.
http://arxiv.org/abs/1811.10238
Inverse reinforcement learning (IRL) is an ill-posed inverse problem, since expert demonstrations may be explained by many reward functions, which are hard to recover by local search methods such as gradient methods. In this paper, we generalize the original IRL problem to recovering a probability distribution over reward functions. We call this generalized problem stochastic inverse reinforcement learning (SIRL), and we first formulate it as an expectation optimization problem. We adopt the Monte Carlo expectation-maximization (MCEM) method, a global search method, to estimate the parameters of the probability distribution as the first solution to SIRL. With our approach, it is possible to observe the deep intrinsic properties of IRL from a global viewpoint, and the technique achieves considerably robust recovery performance on the classic learning environment, objectworld.
http://arxiv.org/abs/1905.08513
Question answering (QA) using textual sources, such as reading comprehension (RC), has attracted much attention recently. This study focuses on the task of explainable multi-hop QA, which requires the system to return the answer with evidence sentences by reasoning over and gathering disjoint pieces of the reference texts. For evidence extraction in explainable multi-hop QA, existing methods extract evidence sentences by evaluating the importance of each sentence independently. In this study, we propose the Query Focused Extractor (QFE) model and introduce multi-task learning of the QA model for answer selection and the QFE model for evidence extraction. Inspired by extractive summarization models, QFE sequentially extracts evidence sentences using an RNN with an attention mechanism over the question sentence. This enables QFE to consider the dependency among the evidence sentences and to cover the important information in the question sentence. Experimental results show that QFE with a simple RC baseline model achieves a state-of-the-art evidence extraction score on HotpotQA. Although designed for RC, QFE also achieves a state-of-the-art evidence extraction score on FEVER, which is a recognizing-textual-entailment task on a large textual database.
http://arxiv.org/abs/1905.08511
Graph Neural Networks (GNNs) are an effective framework for representation learning and prediction on graph-structured data. A neighborhood aggregation scheme is applied in the training of GNNs and their variants, in which the representation of each node is calculated by recursively aggregating and transforming the representations of its neighboring nodes. A variety of GNNs and variants have been built and have achieved state-of-the-art results on both node and graph classification tasks. However, despite the common neighborhood used in state-of-the-art GNN models, there is little analysis of the properties of the neighborhood in the neighborhood aggregation scheme. Here, we analyze the properties of the nodes, edges, and neighborhoods of the graph model. Our results characterize the efficiency of the common neighborhood used in state-of-the-art GNNs and show that it is not sufficient for the representation learning of the nodes. We propose a simple neighborhood that is likely to be more sufficient. We empirically validate our theoretical analysis on a number of graph classification benchmarks and demonstrate that our method achieves state-of-the-art performance on the listed benchmarks. The implementation code is available at \url{https://github.com/CODE-SUBMIT/Neighborhood-Enlargement-in-Graph-Network}.
http://arxiv.org/abs/1905.08509
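For reference, the generic neighborhood aggregation update analyzed above can be written as

\[
  h_v^{(k)} = \mathrm{UPDATE}^{(k)}\!\Bigl(h_v^{(k-1)},\; \mathrm{AGGREGATE}^{(k)}\bigl(\{\, h_u^{(k-1)} : u \in \mathcal{N}(v) \,\}\bigr)\Bigr),
\]

where $\mathcal{N}(v)$ is the neighborhood of node $v$; the choice (and proposed enlargement) of $\mathcal{N}(v)$ is the subject of the paper.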
Many Multi-View-Stereo algorithms extract a 3D mesh model of a scene, after fusing depth maps into a volumetric representation of the space. Due to the limited scalability of such representations, the estimated model does not capture fine details of the scene. Therefore a mesh refinement algorithm is usually applied; it improves the mesh resolution and accuracy by minimizing the photometric error induced by the 3D model into pairs of cameras. The choice of these pairs significantly affects the quality of the refinement and usually relies on sparse 3D points belonging to the surface. Instead, in this paper, to increase the quality of pairs selection, we exploit the 3D model (before the refinement) to compute five metrics: scene coverage, mutual image overlap, image resolution, camera parallax, and a new symmetry term. To improve the refinement robustness, we also propose an explicit method to manage occlusions, which may negatively affect the computation of the photometric error. The proposed method takes into account the depth of the model while computing the similarity measure and its gradient. We quantitatively and qualitatively validated our approach on publicly available datasets against state of the art reconstruction methods.
http://arxiv.org/abs/1905.08502
With the growth of images on the web, hashing, which enables high-speed image retrieval, has been actively studied. In recent years, various hashing methods based on deep neural networks have been proposed and have achieved higher precision than other hashing methods. In these methods, multiple losses for the hash codes and the parameters of the neural networks are defined, and hash codes are generated that minimize the weighted sum of the losses. Therefore, an expert has to tune the weights for the losses heuristically, and the probabilistic optimality of the loss function cannot be explained. In order to generate explainable hash codes without weight tuning, we theoretically derive a single loss function with no hyperparameters for the hash code from the probability distribution of the images. By generating hash codes that minimize this loss function, highly accurate image retrieval with probabilistic optimality is performed. We evaluate the hashing performance on MNIST, CIFAR-10, and SVHN, and show that the proposed method outperforms state-of-the-art hashing methods.
http://arxiv.org/abs/1905.08501
For a machine learning task, lacking sufficient samples means that the trained model has low confidence in approaching the ground truth function. Since the generative adversarial network (GAN) was proposed, there has been hope for small-sample data augmentation (DA) with realistic fake data, and many works have validated the viability of GAN-based DA. Although most of these works report that higher accuracy can be achieved using GAN-based DA, some researchers stress that the fake data generated by a GAN have an inherent bias. In this paper, we explore when this bias is low enough not to hurt performance. We set up experiments to depict the bias in different GAN-based DA settings and, from the results, design a pipeline to inspect whether a specific dataset can be efficiently augmented with GAN-based DA. Finally, based on our attempts to reduce the bias, we propose some advice to mitigate bias in GAN-based DA applications.
http://arxiv.org/abs/1905.08495
Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-variance-distortionless-response (MVDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, provided that an accurate estimate of the correlation matrices and especially the speech interframe correlation vector is available. Typical estimation procedures of the correlation matrices and the speech interframe correlation (IFC) vector require an estimate of the speech presence probability (SPP) in each time-frequency bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate a speech mask and a noise mask for each time-frequency bin, using which two different SPP estimates are derived. Aiming at achieving a robust performance, the DNN is trained for various noise types and signal-to-noise ratios. Experimental results show that the multi-frame MVDR in combination with the proposed data-driven SPP estimator yields an increased speech quality compared to a state-of-the-art model-based estimator.
http://arxiv.org/abs/1905.08492
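One common form of the multi-frame MVDR filter mentioned above (written in generic notation, not necessarily the paper's) is

\[
  \mathbf{w}_{\mathrm{MFMVDR}} = \frac{\mathbf{R}_{n}^{-1}\,\boldsymbol{\gamma}_{x}}{\boldsymbol{\gamma}_{x}^{H}\,\mathbf{R}_{n}^{-1}\,\boldsymbol{\gamma}_{x}},
\]

where $\mathbf{R}_{n}$ is the noise correlation matrix across neighboring time frames and $\boldsymbol{\gamma}_{x}$ is the speech interframe correlation (IFC) vector; the DNN-based SPP estimates enter through the recursive estimation of these correlation quantities.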
In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method. Our previous research verified the effectiveness of the ExcitNet-based speech generation model in a parametric TTS framework. However, the challenge remains to build a high-quality speech synthesis system because auxiliary conditional features estimated by a simple deep neural network often contain large prediction errors, and the errors are inevitably propagated throughout the autoregressive generation process of the ExcitNet vocoder. To generate more natural speech signals, we exploited a sequence-to-sequence (seq2seq) acoustic model with an attention-based generative network (e.g., Tacotron 2) to estimate the condition parameters of the ExcitNet vocoder. Because the seq2seq acoustic model accurately estimates spectral parameters, and because the ExcitNet model effectively generates the corresponding time-domain excitation signals, combining these two models can synthesize natural speech signals. Furthermore, we verified the merit of the proposed method in producing expressive speech segments by adopting a global style token-based emotion embedding method. The experimental results confirmed that the proposed system significantly outperforms the systems with a similarly configured conventional WaveNet vocoder and our best prior parametric TTS counterpart.
http://arxiv.org/abs/1905.08486
This work offers a new method for generating photo-realistic images from semantic label maps and simulator edge map images. We do so in a conditional manner: we train a generative adversarial network (GAN), given an image and its semantic label map, to output a photo-realistic version of that scene. Existing GAN architectures still lack photo-realism capabilities. We address this issue by embedding edge maps and presenting the generator with an edge map image as a prior, which enables generating high-level details in the image. We also offer a model that uses this generator to create visually appealing videos when a sequence of images is given.
http://arxiv.org/abs/1905.08474
We analyze a new robust method for the reconstruction of probability distributions of observed data in the presence of output outliers. It is based on a so-called gradient conjugate prior (GCP) network which outputs the parameters of a prior. By rigorously studying the dynamics of the GCP learning process, we derive an explicit formula for correcting the obtained variance of the marginal distribution and removing the bias caused by outliers in the training set. Assuming a Gaussian (input-dependent) ground truth distribution contaminated with a proportion $\varepsilon$ of outliers, we show that the fitted mean is in a $c e^{-1/\varepsilon}$-neighborhood of the ground truth mean and the corrected variance is in a $b\varepsilon$-neighborhood of the ground truth variance, whereas the uncorrected variance of the marginal distribution can even be infinite. We explicitly find $b$ as a function of the output of the GCP network, without a priori knowledge of the outliers (possibly input-dependent) distribution. Experiments with synthetic and real-world data sets indicate that the GCP network fitted with a standard optimizer outperforms other robust methods for regression.
http://arxiv.org/abs/1905.08464
In this work, we propose a non-autoregressive seq2seq model that converts text to spectrogram. It is fully convolutional and obtains about 17.5 times speed-up over Deep Voice 3 at synthesis while maintaining comparable speech quality using a WaveNet vocoder. Interestingly, it has even fewer attention errors than the autoregressive model on the challenging test sentences. Furthermore, we build the first fully parallel neural text-to-speech system by applying the inverse autoregressive flow~(IAF) as the parallel neural vocoder. Our system can synthesize speech from text through a single feed-forward pass. We also explore a novel approach to train the IAF from scratch as a generative model for raw waveform, which avoids the need for distillation from a separately trained WaveNet.
http://arxiv.org/abs/1905.08459
Prevalent approaches to the Chinese word segmentation task mostly rely on Bi-LSTM neural networks. However, Bi-LSTM-based methods have some inherent drawbacks: they are hard to parallelize, inefficient at applying dropout to inhibit overfitting, and inefficient at capturing character information at distant positions of a long sentence for the word segmentation task. In this work, we propose a sequence-to-sequence transformer model for Chinese word segmentation, which is based on a type of convolutional neural network called the temporal convolutional network. The model uses the temporal convolutional network to construct an encoder and one layer of fully connected neural network to build a decoder, applies dropout to inhibit overfitting, captures character information at distant positions of a sentence by adding encoder layers, couples a conditional random field model to train the parameters, and uses the Viterbi algorithm to infer the final Chinese word segmentation result. Experiments on traditional Chinese and simplified Chinese corpora show that the segmentation performance of the model is equivalent to that of Bi-LSTM-based methods, while the model offers far better parallelism than Bi-LSTM-based models.
http://arxiv.org/abs/1905.08454
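A minimal PyTorch sketch of the kind of temporal-convolutional encoder with a single linear decoder described above; layer sizes are illustrative, and the CRF layer and Viterbi decoding are omitted:

    import torch.nn as nn

    class TCNBlock(nn.Module):
        # One dilated 1-D convolution block with a residual connection.
        def __init__(self, channels, kernel_size=3, dilation=1):
            super().__init__()
            pad = (kernel_size - 1) // 2 * dilation      # keep the sequence length
            self.conv = nn.Conv1d(channels, channels, kernel_size,
                                  padding=pad, dilation=dilation)
            self.act = nn.ReLU()
            self.drop = nn.Dropout(0.2)

        def forward(self, x):                            # x: (batch, channels, seq)
            return x + self.drop(self.act(self.conv(x)))

    class TCNSegmenter(nn.Module):
        # Stacked dilated convolutions as encoder, one linear layer as decoder;
        # the emission scores would normally feed a CRF layer.
        def __init__(self, vocab_size, emb_dim=128, n_tags=4, n_layers=4):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.encoder = nn.Sequential(
                *[TCNBlock(emb_dim, dilation=2 ** i) for i in range(n_layers)])
            self.decoder = nn.Linear(emb_dim, n_tags)    # e.g. BMES tags

        def forward(self, char_ids):                     # char_ids: (batch, seq)
            x = self.emb(char_ids).transpose(1, 2)       # -> (batch, emb, seq)
            x = self.encoder(x).transpose(1, 2)          # -> (batch, seq, emb)
            return self.decoder(x)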
Recently, autonomous driving development has ignited competition among car makers and technology corporations. Cars with low-level automation are already commercially available, but highly automated vehicles, which drive by themselves without human monitoring, are still in their infancy. Such autonomous vehicles (AVs) rely entirely on the in-car computing system to interpret the environment and make driving decisions. Therefore, computing system design is essential, particularly for enhancing driving safety. However, to our knowledge, no clear guideline exists so far for safety-aware AV computing system and architecture design. To understand the safety requirements of AV computing systems, we performed a field study by running industrial Level-4 autonomous driving fleets in various locations, road conditions, and traffic patterns. The field study indicates that traditional computing system performance metrics, such as tail latency, average latency, maximum latency, and timeout, cannot fully satisfy the safety requirements of AV computing system design. To address this issue, we propose a `safety score’ as a primary metric for measuring the level of safety in AV computing system design. Furthermore, we propose a perception latency model, which helps architects estimate the safety score of a given architecture and system design without physically testing it in an AV. We demonstrate the use of our safety score and latency model by developing and evaluating a safety-aware computation hardware resource management scheme for AV computing systems.
http://arxiv.org/abs/1905.08453
The science of solving clinical problems by analyzing images generated in clinical practice is known as medical image analysis. The aim is to extract information in an effective and efficient manner for improved clinical diagnosis. Recent advances in the field of biomedical engineering have made medical image analysis one of the top research and development areas. One reason for this advancement is the application of machine learning techniques to the analysis of medical images. Deep learning is successfully used as a machine learning tool, where a neural network is capable of automatically learning features. This is in contrast to methods that use traditionally hand-crafted features, whose selection and calculation is a challenging task. Among deep learning techniques, deep convolutional networks are actively used for medical image analysis, in application areas such as segmentation, abnormality detection, disease classification, computer-aided diagnosis, and retrieval. In this study, a comprehensive review of the current state of the art in medical image analysis using deep convolutional networks is presented. The challenges and potential of these techniques are also highlighted.
http://arxiv.org/abs/1709.02250
Dynamically changing environments, unreliable state estimation, and operation under severe resource constraints are fundamental challenges for robotics, which still limit the deployment of small autonomous drones. We address these challenges in the context of autonomous, vision-based drone racing in dynamic environments. A racing drone must traverse a track with possibly moving gates at high speed. We enable this functionality by combining the performance of a state-of-the-art path-planning and control system with the perceptual awareness of a convolutional neural network (CNN). The CNN directly maps raw images to a desired waypoint and speed. Given the CNN output, the planner generates a short minimum-jerk trajectory segment that is tracked by a model-based controller to actuate the drone towards the waypoint. The resulting modular system has several desirable features: (i) it can run fully on-board, (ii) it does not require globally consistent state estimation, and (iii) it is both platform and domain independent. We extensively test the precision and robustness of our system, both in simulation and on a physical platform. In both domains, our method significantly outperforms the prior state of the art. In order to understand the limits of our approach, we additionally compare against professional human drone pilots with different skill levels.
http://arxiv.org/abs/1905.09727
Hamilton-Jacobi (HJ) reachability analysis has been developed over the past decades into a widely-applicable tool for determining goal satisfaction and safety verification in nonlinear systems. While HJ reachability can be formulated very generally, computational complexity can be a serious impediment for many systems of practical interest. Much prior work has been devoted to computing approximate solutions to large reachability problems, yet many of these methods may only apply to very restrictive problem classes, do not generate controllers, and/or can be extremely conservative. In this paper, we present a new method for approximating the optimal controller of the HJ reachability problem for control-affine systems. While also a specific problem class, many dynamical systems of interest are, or can be well approximated, by control-affine models. We explicitly avoid storing a representation of the reachability value function, and instead learn a controller as a sequence of simple binary classifiers. We compare our approach to existing grid-based methodologies in HJ reachability and demonstrate its utility on several examples, including a physical quadrotor navigation task.
http://arxiv.org/abs/1803.03237
Graph-based clustering has shown promising performance in many tasks. A key step of graph-based approaches is the construction of the similarity graph. In general, learning the graph in kernel space can enhance clustering accuracy due to the incorporation of nonlinearity. However, most existing kernel-based graph learning mechanisms are not similarity-preserving and hence lead to sub-optimal performance. To overcome this drawback, we propose a more discriminative graph learning method which, for the first time, can preserve the pairwise similarities between samples in an adaptive manner. Specifically, we require the learned graph to be close to a kernel matrix, which serves as a measure of similarity in the raw data. Moreover, the structure is adaptively tuned so that the number of connected components of the graph is exactly equal to the number of clusters. Finally, our method unifies clustering and graph learning, so cluster indicators can be obtained directly from the graph itself without a further clustering step. The effectiveness of this approach is examined in both single and multiple kernel learning scenarios on several datasets.
http://arxiv.org/abs/1905.08419
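Graph learning objectives of the kind described above typically take the following form (a generic illustration only, not the paper's exact formulation):

\[
  \min_{Z}\; \|K - Z\|_F^{2}
  \quad \text{s.t.} \quad Z \ge 0,\;\; Z\mathbf{1} = \mathbf{1},\;\; \operatorname{rank}(L_Z) = n - c,
\]

where $K$ is the kernel matrix, $Z$ the learned similarity graph, $L_Z$ its graph Laplacian, $n$ the number of samples, and $c$ the number of clusters; the rank constraint forces the graph to have exactly $c$ connected components, so the cluster indicators can be read off the graph directly.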
Using FPGAs to accelerate ConvNets has attracted significant attention in recent years. However, FPGA accelerator design has not leveraged the latest progress of ConvNets. As a result, key application characteristics such as frames-per-second (FPS) are ignored in favor of simply counting GOPs, and results on accuracy, which is critical to application success, are often not even reported. In this work, we adopt an algorithm-hardware co-design approach to develop a ConvNet accelerator called Synetgy and a novel ConvNet model called DiracDeltaNet. Both the accelerator and ConvNet are tailored to FPGA requirements. DiracDeltaNet, as the name suggests, is a ConvNet with only 1$\times$1 convolutions, while spatial convolutions are replaced by more efficient shift operations. DiracDeltaNet achieves competitive accuracy on ImageNet (88.7% top-5), but with 42$\times$ fewer parameters and 48$\times$ fewer OPs than VGG16. We further quantize DiracDeltaNet’s weights and activations to 4 bits, with less than 1% accuracy loss. These quantizations exploit well the nature of FPGA hardware. In short, DiracDeltaNet’s small model size, low computational OP count, low precision and simplified operators allow us to co-design a highly customized computing unit for an FPGA. We implement the computing units for DiracDeltaNet on an Ultra96 SoC system through high-level synthesis. Our accelerator’s final top-5 accuracy of 88.1% on ImageNet is higher than all previously reported embedded FPGA accelerators. In addition, the accelerator reaches an inference speed of 96.5 FPS on the ImageNet classification task, surpassing prior works with similar accuracy by at least 16.9$\times$.
http://arxiv.org/abs/1811.08634
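A rough PyTorch sketch of the shift-plus-pointwise-convolution pattern that replaces spatial convolutions in DiracDeltaNet above; the channel grouping below follows the generic shift-operator idea and is not the paper's exact configuration (torch.roll also wraps around at the borders, whereas a faithful shift would zero-fill):

    import torch
    import torch.nn as nn

    def shift(x):
        # Shift channel groups by one pixel in different directions; this adds
        # spatial mixing with zero parameters and negligible compute.
        c = x.shape[1]
        g = c // 5
        out = torch.empty_like(x)
        out[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g], shifts=1, dims=2)    # down
        out[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], shifts=-1, dims=2)   # up
        out[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g], shifts=1, dims=3)    # right
        out[:, 3*g:4*g] = torch.roll(x[:, 3*g:4*g], shifts=-1, dims=3)   # left
        out[:, 4*g:] = x[:, 4*g:]                                        # unshifted
        return out

    class ShiftConvBlock(nn.Module):
        # Shift followed by a 1x1 convolution: all learnable parameters live in
        # the pointwise convolution.
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.pw = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.ReLU()

        def forward(self, x):
            return self.act(self.bn(self.pw(shift(x))))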
In the detection of anemia, leukemia, and other blood diseases, the number and type of leukocytes are essential evaluation parameters. However, the conventional leukocyte counting method is not only quite time-consuming but also error-prone. Consequently, many automated methods have been introduced for the diagnosis of medical images. It remains difficult to accurately extract related features and count the number of cells under variable conditions such as background, staining method, staining degree, lighting conditions, and so on. Therefore, in order to adapt to various complex situations, we consider the RGB color space, the HSI color space, and the linear combination of the G, H, and S components, and propose a fast and accurate algorithm for the segmentation of peripheral blood leukocytes in this paper. First, the leukocyte nucleus is separated using the stepwise averaging method. Then, based on interval-valued fuzzy sets, the leukocyte cytoplasm is segmented by minimizing the fuzzy divergence. Next, post-processing is carried out using the concave-convex iterative repair algorithm and the decision mechanism of candidate mask sets. Experimental results show that the proposed method outperforms the existing non-fuzzy-set methods. Among the methods based on fuzzy sets, interval-valued fuzzy sets perform slightly better than interval-valued intuitionistic fuzzy sets and intuitionistic fuzzy sets.
http://arxiv.org/abs/1905.08416
An accurate segmentation of lung nodules in computed tomography (CT) images is critical to lung cancer analysis and diagnosis. However, due to the variety of lung nodules and the similarity of visual characteristics between nodules and their surroundings, a robust segmentation of nodules becomes a challenging problem. In this study, we propose the Dual-branch Residual Network (DB-ResNet) which is a data-driven model. Our approach integrates two new schemes to improve the generalization capability of the model: 1) the proposed model can simultaneously capture multi-view and multi-scale features of different nodules in CT images; 2) we combine the features of the intensity and the convolution neural networks (CNN). We propose a pooling method, called the central intensity-pooling layer (CIP), to extract the intensity features of the center voxel of the block, and then use the CNN to obtain the convolutional features of the center voxel of the block. In addition, we designed a weighted sampling strategy based on the boundary of nodules for the selection of those voxels using the weighting score, to increase the accuracy of the model. The proposed method has been extensively evaluated on the LIDC dataset containing 986 nodules. Experimental results show that the DB-ResNet achieves superior segmentation performance with an average dice score of 82.74% on the dataset. Moreover, we compared our results with those of four radiologists on the same dataset. The comparison showed that our average dice score was 0.49% higher than that of human experts. This proves that our proposed method is as good as the experienced radiologist.
http://arxiv.org/abs/1905.08413
The 2D Multi-Agent Path Finding (MAPF) problem aims at finding collision-free paths for a number of agents, from a set of start locations to a set of goal positions in a known 2D environment. MAPF has been studied in theoretical computer science, robotics, and artificial intelligence over several decades, due to its importance for robot navigation. It is currently experiencing significant scientific progress due to its relevance in automated warehousing (such as those operated by Amazon) and in other contemporary application areas. In this paper, we demonstrate that many recently developed MAPF algorithms apply more broadly than currently believed in the MAPF research community. In particular, we describe the 3D Pipe Routing (PR) problem, which aims at placing collision-free pipes from given start locations to given goal locations in a known 3D environment. The MAPF and PR problems are similar: a solution to a MAPF instance is a set of blocked cells in x-y-t space, while a solution to the corresponding PR instance is a set of blocked cells in x-y-z space. We show how to use this similarity to apply several recently developed MAPF algorithms to the PR problem, and discuss their performance on abstract PR instances. We also discuss further research necessary to tackle real-world pipe-routing instances of interest to industry today. This opens up a new direction of industrial relevance for the MAPF research community.
http://arxiv.org/abs/1905.08412
Applying convolutional neural networks to spherical images requires particular considerations. We look to the millennia of work on cartographic map projections to provide the tools to define an optimal representation of spherical images for the convolution operation. We propose a representation for deep spherical image inference based on the icosahedral Snyder equal-area (ISEA) projection, a projection onto a geodesic grid, and show that it vastly exceeds the state-of-the-art for convolution on spherical images, improving semantic segmentation results by 12.6%.
http://arxiv.org/abs/1905.08409
Structured information about entities is critical for many semantic parsing tasks. We present an approach that uses a Graph Neural Network (GNN) architecture to incorporate information about relevant entities and their relations during parsing. Combined with a decoder copy mechanism, this approach provides a conceptually simple mechanism to generate logical forms with entities. We demonstrate that this approach is competitive with state-of-the-art across several tasks without pre-training, and outperforms existing approaches when combined with BERT pre-training.
http://arxiv.org/abs/1905.08407
Automated prediction of public speaking performance enables novel systems for tutoring public speaking skills. We use the largest open repository—TED Talks—to predict the ratings provided by the online viewers. The dataset contains over 2200 talk transcripts and the associated meta information including over 5.5 million ratings from spontaneous visitors to the website. We carefully removed the bias present in the dataset (e.g., the speakers’ reputations, popularity gained by publicity, etc.) by modeling the data generating process using a causal diagram. We use a word sequence based recurrent architecture and a dependency tree based recursive architecture as the neural networks for predicting the TED talk ratings. Our neural network models can predict the ratings with an average F-score of 0.77 which largely outperforms the competitive baseline method.
http://arxiv.org/abs/1905.08392
Usage similarity estimation addresses the semantic proximity of word instances in different contexts. We apply contextualized (ELMo and BERT) word and sentence embeddings to this task, and propose supervised models that leverage these representations for prediction. Our models are further assisted by lexical substitute annotations automatically assigned to word instances by context2vec, a neural model that relies on a bidirectional LSTM. We perform an extensive comparison of existing word and sentence representations on benchmark datasets addressing both graded and binary similarity. The best performing models outperform previous methods in both settings.
http://arxiv.org/abs/1905.08377
After a hurricane, damage assessment is critical to emergency managers for efficient response and resource allocation. One way to gauge the damage extent is to quantify the number of flooded/damaged buildings, which is traditionally done by ground survey. This process can be labor-intensive and time-consuming. In this paper, we propose to improve the efficiency of building damage assessment by applying image classification algorithms to post-hurricane satellite imagery. At the known building coordinates (available from public data), we extract square-sized images from the satellite imagery to create training, validation, and test datasets. Each square-sized image contains a building to be classified as either ‘Flooded/Damaged’ (labeled by volunteers in a crowd-sourcing project) or ‘Undamaged’. We design and train a convolutional neural network from scratch and compare it with an existing neural network used widely for common object classification. We demonstrate the promise of our damage annotation model (over 97% accuracy) in the case study of building damage assessment in the Greater Houston area affected by 2017 Hurricane Harvey.
http://arxiv.org/abs/1807.01688
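A compact PyTorch sketch of the kind of from-scratch CNN described above for classifying square satellite patches as 'Flooded/Damaged' or 'Undamaged'; the layer sizes are illustrative, not the architecture reported in the paper:

    import torch.nn as nn

    class DamageClassifier(nn.Module):
        def __init__(self, patch_size=128):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(128 * (patch_size // 8) ** 2, 64), nn.ReLU(),
                nn.Dropout(0.5),
                nn.Linear(64, 1),        # single logit; train with BCEWithLogitsLoss
            )

        def forward(self, x):            # x: (batch, 3, patch_size, patch_size)
            return self.classifier(self.features(x))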
Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption. Existing approaches typically separate the DNN model development step from its deployment on IoT devices, resulting in suboptimal solutions. In this paper, we first introduce a few interesting but counterintuitive observations about such a separate design approach, and empirically show why it may lead to suboptimal designs. Motivated by these observations, we then propose a novel and practical bi-directional co-design approach: a bottom-up DNN model design strategy together with a top-down flow for DNN accelerator design. It enables a joint optimization of both DNN models and their deployment configurations on IoT devices, represented here by FPGAs. We demonstrate the effectiveness of the proposed co-design approach on a real-life object detection application using the Pynq-Z1 embedded FPGA. Our method obtains state-of-the-art results on both QoR with high accuracy (IoU) and QoS with high throughput (FPS) and high energy efficiency.
http://arxiv.org/abs/1905.08369
We describe a rational, but low resolution model of probability.
http://arxiv.org/abs/1905.10924
Kernel methods have produced state-of-the-art results for a number of NLP tasks such as relation extraction, but suffer from poor scalability due to the high cost of computing kernel similarities between natural language structures. A recently proposed technique, kernelized locality-sensitive hashing (KLSH), can significantly reduce the computational cost, but is only applicable to classifiers operating on kNN graphs. Here we propose to use random subspaces of KLSH codes for efficiently constructing an explicit representation of NLP structures suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for classification problems by maximizing an approximation of mutual information between the KLSH codes (feature vectors) and the class labels. We evaluate the proposed approach on biomedical relation extraction datasets, and observe significant and robust improvements in accuracy w.r.t. state-of-the-art classifiers, along with drastic (orders-of-magnitude) speedup compared to conventional kernel methods.
http://arxiv.org/abs/1711.04044
Nowadays, Machine Learning (ML) is seen as the universal solution to improve the effectiveness of information retrieval (IR) methods. However, while mathematics is a precise and accurate science, it is usually expressed by less accurate and imprecise descriptions, contributing to the relative dearth of machine learning applications for IR in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in ML, it seems canonical to apply ML techniques to represent and retrieve mathematics semantically. In this work, we apply popular text embedding techniques to the arXiv collection of STEM documents and explore how these are unable to properly understand mathematics from that corpus. In addition, we also investigate the missing aspects that would allow mathematics to be learned by computers.
http://arxiv.org/abs/1905.08359
Bioacoustic sensors, sometimes known as autonomous recording units (ARUs), can record sounds of wildlife over long periods of time in scalable and minimally invasive ways. Deriving per-species abundance estimates from these sensors requires detection, classification, and quantification of animal vocalizations as individual acoustic events. Yet, variability in ambient noise, both over time and across sensors, hinders the reliability of current automated systems for sound event detection (SED), such as convolutional neural networks (CNN) in the time-frequency domain. In this article, we develop, benchmark, and combine several machine listening techniques to improve the generalizability of SED models across heterogeneous acoustic environments. As a case study, we consider the problem of detecting avian flight calls from a ten-hour recording of nocturnal bird migration, recorded by a network of six ARUs in the presence of heterogeneous background noise. Starting from a CNN yielding state-of-the-art accuracy on this task, we introduce two noise adaptation techniques, respectively integrating short-term (60-millisecond) and long-term (30-minute) context. First, we apply per-channel energy normalization (PCEN) in the time-frequency domain, which applies short-term automatic gain control to every subband in the mel-frequency spectrogram. Secondly, we replace the last dense layer in the network by a context-adaptive neural network (CA-NN) layer, i.e. an affine layer whose weights are dynamically adapted at prediction time by an auxiliary network taking long-term summary statistics of spectrotemporal features as input. We show that both techniques are helpful and complementary. […] We release a pre-trained version of our best performing system under the name of BirdVoxDetect, a ready-to-use detector of avian flight calls in field recordings.
http://arxiv.org/abs/1905.08352
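The short-term adaptation step above (PCEN) is available off the shelf; a minimal librosa sketch with default parameters (the hop length, mel resolution, and file path are placeholders, not the paper's settings):

    import librosa

    y, sr = librosa.load("field_recording.wav", sr=22050)       # placeholder path
    # Mel-frequency magnitude spectrogram.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128,
                                       hop_length=512, power=1.0)
    # Per-channel energy normalization: short-term automatic gain control per
    # mel band, followed by dynamic-range compression. The 2**31 scaling follows
    # the librosa documentation's convention for float input.
    S_pcen = librosa.pcen(S * (2 ** 31), sr=sr, hop_length=512)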
While research on iterated revision is predominant in the field of iterated belief change, the class of iterated contraction operators has received more attention in recent years. In this article, we examine a non-prioritized generalisation of iterated contraction. In particular, the class of weak decrement operators is introduced, which are operators that achieve over multiple steps the same as a contraction. Inspired by Darwiche and Pearl’s work on iterated revision, the subclass of decrement operators is defined. For both decrement and weak decrement operators, postulates are presented, and for each of them a representation theorem in the framework of total preorders is given. Furthermore, we present two types of decrement operators which have a unique representative.
http://arxiv.org/abs/1905.08347
The way developers edit day-to-day code tends to be repetitive, often reusing existing code elements. Many researchers have tried to automate repetitive code changes by learning from specific change templates applied to a limited scope. The advancement of Neural Machine Translation (NMT) and the availability of vast open-source evolutionary data open up the possibility of automatically learning those templates from the wild. However, unlike the natural languages for which NMT techniques were originally devised, source code and its changes have certain properties. For instance, compared to natural language, the source code vocabulary can be significantly larger. Further, good changes to code do not break its syntactic structure. Thus, deploying state-of-the-art NMT models without adapting the methods to the source code domain yields sub-optimal results. To this end, we propose a novel tree-based NMT system to model source code changes and learn code change patterns from the wild. We realize our model in a change suggestion engine, CODIT, train the model with more than 30k real-world changes, and evaluate it on 6k patches. Our evaluation shows the effectiveness of CODIT in learning and suggesting patches. CODIT also shows promise in generating bug-fix patches.
https://arxiv.org/abs/1810.00314
Operation in real-world traffic requires autonomous vehicles to be able to plan their motion in complex environments (multiple moving participants). Planning through such environments requires the right search space to be provided to the trajectory or maneuver planners so that the safest motion for the ego vehicle can be identified. Given the current states of the environment and its participants, analyzing the risks based on the predicted trajectories of all the traffic participants provides the necessary search space for motion planning. This paper provides a fresh taxonomy of the safety risks that an autonomous vehicle should be able to handle while navigating through traffic. It provides a reference system architecture that needs to be implemented and describes a novel way of identifying and predicting the behaviors of the traffic participants using classic Multi-Model Adaptive Estimation (MMAE). Preliminary simulation results of the implemented model are included.
http://arxiv.org/abs/1905.08332
A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse models are different from the traditional large ones in that there is much more variation in their layer shapes and sizes, and often require specialized hardware to exploit sparsity for performance improvement. Thus, many DNN accelerators designed for large DNNs do not perform well on these models. In this work, we present Eyeriss v2, a DNN accelerator architecture designed for running compact and sparse DNNs. To deal with the widely varying layer shapes and sizes, it introduces a highly flexible on-chip network, called hierarchical mesh, that can adapt to the different amounts of data reuse and bandwidth requirements of different data types, which improves the utilization of the computation resources. Furthermore, Eyeriss v2 can process sparse data directly in the compressed domain for both weights and activations, and therefore is able to improve both processing speed and energy efficiency with sparse models. Overall, with sparse MobileNet, Eyeriss v2 in a 65nm CMOS process achieves a throughput of 1470.6 inferences/sec and 2560.3 inferences/J at a batch size of 1, which is 12.6x faster and 2.5x more energy efficient than the original Eyeriss running MobileNet. We also present an analysis methodology called Eyexam that provides a systematic way of understanding the performance limits for DNN processors as a function of specific characteristics of the DNN model and accelerator design; it applies these characteristics as sequential steps to increasingly tighten the bound on the performance limits.
http://arxiv.org/abs/1807.07928
Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This paper introduces OBOE, a collaborative filtering method for time-constrained model selection and hyperparameter tuning. OBOE forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. OBOE can find good models under constraints on the number of models fit or the total time budget. To this end, this paper develops a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems. Moreover, the success of the bilinear model used by OBOE suggests that AutoML may be simpler than was previously understood.
http://arxiv.org/abs/1808.03233
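A simplified numpy sketch of the collaborative-filtering core of OBOE described above (low-rank fit of the error matrix, then least-squares inference of a new dataset's latent features); the experiment-design step that chooses which fast models to run under a time budget is omitted, and all data below are placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    E = rng.random((50, 30))          # cross-validated errors: 50 models x 30 datasets
    k = 5                             # assumed latent rank

    # Low-rank model: model embeddings X such that E ~= X @ Y.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    X = U[:, :k] * s[:k]              # (n_models, k) feature vectors for the models

    # For a new dataset, run a few fast but informative models and record errors.
    fast_models = [0, 3, 7, 12]                # illustrative indices of cheap models
    e_observed = rng.random(len(fast_models))  # their cross-validated errors (placeholder)

    # Infer the new dataset's latent features from the observed rows, then
    # predict the errors of all models and rank them.
    y_new, *_ = np.linalg.lstsq(X[fast_models], e_observed, rcond=None)
    e_predicted = X @ y_new
    promising = np.argsort(e_predicted)[:5]    # candidate models to fit in full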
Surgical workflow analysis is of importance for understanding the onset and persistence of surgical phases and individual tool usage across surgery and in each phase. It is beneficial for clinical quality control and to hospital administrators for understanding surgery planning. Video acquired during surgery can typically be leveraged for this task. Currently, a combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) is popularly used for video analysis in general, not being restricted to surgical videos. In this paper, we propose a multi-task learning framework using a CNN followed by a bi-directional long short-term memory (Bi-LSTM) to learn to encapsulate both forward and backward temporal dependencies. Further, the joint distribution indicating the set of tools associated with a phase is used as an additional loss during learning to correct for their co-occurrence in any predictions. Experimental evaluation is performed using the Cholec80 dataset. We report mean average precision (mAP) scores of 0.99 and 0.86 for tool and phase identification respectively, which are higher than the prior art in the field.
http://arxiv.org/abs/1905.08315
In this paper we analyze the convolutional layers of a VGG16 model pre-trained on ILSVRC2012. We base our analysis on the responses of neurons to the images of all classes in the ImageNet database. In our analysis, we first propose a visualization method to illustrate the learned content of each neuron. Next, we investigate single- and multi-faceted neurons based on the diversity of neuron responses to different classes. Finally, we compute the neuronal similarity at each layer and compare the layers. Our results demonstrate that neurons in lower layers exhibit a multi-faceted behavior, whereas the majority of neurons in higher layers are single-faceted and tend to respond to a smaller number of classes.
http://arxiv.org/abs/1811.00161
This article proposes a biologically inspired neurocomputational architecture which learns associations between words and referents in different contexts, considering evidence collected from the Psycholinguistics and Neurolinguistics literature. The multi-layered architecture takes as input raw images of objects (referents) and streams of word phonemes (labels), builds an adequate representation, recognizes the current context, and associates labels with referents incrementally by employing a Self-Organizing Map which creates new association nodes (prototypes) as required, adjusts the existing prototypes to better represent the input stimuli, and removes prototypes that become obsolete/unused. The model takes into account the current context to retrieve the correct meaning of words with multiple meanings. Simulations show that the model can reach up to 78% word-referent association accuracy in ambiguous situations and approximates well the learning rates of humans as reported by three different authors in five Cross-Situational Word Learning experiments, also displaying similar learning patterns in the different learning conditions.
http://arxiv.org/abs/1905.08300
Software analytics can be improved by surveying; i.e., rechecking and (possibly) revising the labels offered by prior analysis. Surveying is a time-consuming task and effective surveyors must carefully manage their time. Specifically, they must balance the cost of further surveying against the additional benefits of that extra effort. This paper proposes SURVEY0, an incremental logistic regression estimation method that implements cost/benefit analysis. Some classifier is used to rank the as-yet-unvisited examples according to how interesting they might be. Humans then review the most interesting examples, after which their feedback is used to update an estimator of how many examples remain. This paper evaluates SURVEY0 in the context of self-admitted technical debt. As software projects mature, they can accumulate “technical debt”, i.e., developer decisions which are sub-optimal and decrease the overall quality of the code. Such decisions are often commented on by programmers in the code; i.e., it is self-admitted technical debt (SATD). Recent results show that text classifiers can automatically detect such debt. We find that we can significantly outperform prior results by SURVEYing the data. Specifically, for ten open-source Java projects, we can find 83% of the technical debt via SURVEY0 using just 16% of the comments (and if higher levels of recall are required, SURVEY0 can adjust towards that with some additional effort).
http://arxiv.org/abs/1905.08297
N-discount optimality was introduced as a hierarchical form of policy and value-function optimality, with Blackwell optimality lying at the top level of the hierarchy (Veinott, 1969; Blackwell, 1962). We formalize notions of myopic discount factors, value functions, and policies in terms of Blackwell optimality in MDPs, and we provide a novel concept of regret, called Blackwell regret, which measures the regret compared to a Blackwell optimal policy. Our main analysis focuses on long-horizon MDPs with sparse rewards. We show that selecting a discount factor under which zero Blackwell regret can be achieved becomes arbitrarily hard. Moreover, even with oracle knowledge of such a discount factor that can realize a Blackwell regret-free value function, an $\epsilon$-Blackwell optimal value function may not even be gain optimal. Difficulties associated with this class of problems are discussed, and the notion of a policy gap at a state is defined as the difference in expected return between a given policy and any other policy that differs at that state; we prove certain properties related to this gap. Finally, we provide experimental results that further support our theoretical results.
http://arxiv.org/abs/1905.08293
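For reference, the standard definition underlying the setup above: a stationary policy $\pi^{*}$ is Blackwell optimal if there exists $\bar{\gamma} \in [0,1)$ such that

\[
  V^{\pi^{*}}_{\gamma}(s) \;\ge\; V^{\pi}_{\gamma}(s)
  \qquad \text{for all policies } \pi,\ \text{all states } s,\ \text{and all } \gamma \in [\bar{\gamma}, 1),
\]

i.e. $\pi^{*}$ is discount-optimal simultaneously for every discount factor sufficiently close to 1; the Blackwell regret defined in the paper measures performance relative to such a policy.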