Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

On the Effectiveness of Low Frequency Perturbations

2019-02-28

Yash Sharma, Gavin Weiguang Ding, Marcus Brubaker

arXiv_CV

arXiv_CV Adversarial Optimization
Abstract

Carefully crafted, often imperceptible, adversarial perturbations have been shown to cause state-of-the-art models to yield extremely inaccurate outputs, rendering them unsuitable for safety-critical application domains. In addition, recent work has shown that constraining the attack space to a low frequency regime is particularly effective. Yet, it remains unclear whether this is due to generally constraining the attack search space or specifically removing high frequency components from consideration. By systematically controlling the frequency components of the perturbation, evaluating against the top-placing defense submissions in the NeurIPS 2017 competition, we empirically show that performance improvements in both optimization and generalization are yielded only when low frequency components are preserved. In fact, the defended models based on (ensemble) adversarial training are roughly as vulnerable to low frequency perturbations as undefended models, suggesting that the purported robustness of proposed defenses is reliant upon adversarial perturbations being high frequency in nature. We do find that under $\ell_\infty$ $\epsilon=16/255$, a commonly used distortion bound, low frequency perturbations are indeed perceptible. This questions the use of the $\ell_\infty$-norm, in particular, as a distortion metric, and suggests that explicitly considering the frequency space is promising for learning robust models which better align with human perception.
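As a rough illustration of what "constraining the attack space to a low frequency regime" can mean, the sketch below projects a square perturbation onto its lowest DCT coefficients and then clips it to an $\ell_\infty$ bound. The choice of the DCT, the square cut-off `keep`, and the function names are assumptions made for illustration; the abstract does not state which transform or mask the authors use.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def low_frequency_project(delta, keep, eps):
    """Keep only the keep x keep lowest DCT coefficients of delta,
    then clip every entry to [-eps, eps] (hypothetical helper)."""
    n = len(delta)
    c = dct_matrix(n)
    ct = transpose(c)
    coeffs = matmul(matmul(c, delta), ct)      # forward 2-D DCT
    for u in range(n):
        for v in range(n):
            if u >= keep or v >= keep:
                coeffs[u][v] = 0.0             # drop high frequencies
    low = matmul(matmul(ct, coeffs), c)        # inverse 2-D DCT
    return [[max(-eps, min(eps, x)) for x in row] for row in low]
```

With `keep=1` only the mean of the perturbation survives, so a checkerboard pattern is removed entirely while a constant offset passes through. Note that the final clipping step can reintroduce some high-frequency content when it is active.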

URL

http://arxiv.org/abs/1903.00073

PDF

http://arxiv.org/pdf/1903.00073
Learning to Plan via Neural Exploration-Exploitation Trees

2019-02-28

Binghong Chen, Bo Dai, Le Song

arXiv_RO

arXiv_RO
Abstract

Sampling-based algorithms such as RRT and its variants are powerful tools for path planning problems in high-dimensional continuous state and action spaces. While these algorithms perform systematic exploration of the state space, they do not fully exploit past planning experiences from similar environments. In this paper, we design a meta path planning algorithm, called Neural Exploration-Exploitation Trees (NEXT), which can exploit past experience to drastically reduce the sample requirement for solving new path planning problems. More specifically, NEXT contains a novel neural architecture which can learn from experience the dependency between task structures and promising path search directions. This learned prior is then integrated with a UCB-type algorithm to achieve an online balance between exploration and exploitation when solving a new problem. Empirically, we show that NEXT can complete the planning tasks with very small search trees and significantly outperforms previous state-of-the-art methods on several benchmark problems.
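To make the "UCB-type" balance concrete, here is a generic UCB1-style selection rule over tree nodes. It is only a sketch: the abstract does not give the actual NEXT scoring rule, and the `value` field (standing in for a learned prior estimate), the `visits` field, and the constant `c` are hypothetical.

```python
import math

def ucb_select(nodes, total_visits, c=1.0):
    """Return the index of the node with the highest UCB score.

    Each node is a dict with a 'value' (assumed learned estimate of how
    promising the node is) and a 'visits' count.
    """
    def score(node):
        # The exploration bonus shrinks as a node is visited more often.
        bonus = c * math.sqrt(math.log(total_visits + 1) / (node["visits"] + 1))
        return node["value"] + bonus

    return max(range(len(nodes)), key=lambda i: score(nodes[i]))
```

With equal values the rule prefers the less-visited node (exploration); with equal visit counts it prefers the node with the higher value (exploitation).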

URL

http://arxiv.org/abs/1903.00070

PDF

http://arxiv.org/pdf/1903.00070
Vine Robots: Design, Teleoperation, and Deployment for Navigation and Exploration

2019-02-28

Margaret M. Coad, Laura H. Blumenschein, Sadie Cutler, Javier A. Reyna Zepeda, Nicholas D. Naclerio, Haitham El-Hussieny, Usman Mehmood, Jee-Hwan Ryu, Elliot W. Hawkes, Allison M. Okamura

arXiv_RO

arXiv_RO
Abstract

A new class of robots has recently been explored, characterized by tip extension, significant length change, and directional control. Here, we call this class of robots “vine robots,” because they behave similarly to plants with a trailing growth habit. Due to their growth-based movement, vine robots are well suited for navigation and exploration in cluttered environments, but until now, they have not been deployed outside the lab. Portability of these robots and steerability at length scales relevant for navigation are key to field applications. In addition, intuitive human-in-the-loop teleoperation enables movement in unknown and dynamic environments. We present a vine robot system that is teleoperated using a custom-designed flexible joystick and camera system, long enough for use in navigation tasks, and portable for use in the field. We report on deployment of this system in two scenarios: a soft robot navigation competition and exploration of an archaeological site. The competition course required movement over uneven terrain, past unstable obstacles, and through a small aperture. The archaeological site required movement over rocks and through horizontal and vertical turns. The robot tip successfully moved past the obstacles and through the tunnels, demonstrating the capability of vine robots to achieve real-world navigation and exploration tasks.

URL

http://arxiv.org/abs/1903.00069

PDF

http://arxiv.org/pdf/1903.00069
Non-Parametric Adaptation for Neural Machine Translation

2019-02-28

Ankur Bapna, Orhan Firat

arXiv_CL

arXiv_CL NMT Inference Gradient_Descent
Abstract

Neural Networks trained with gradient descent are known to be susceptible to catastrophic forgetting caused by parameter shift during the training process. In the context of Neural Machine Translation (NMT) this results in poor performance on heterogeneous datasets and on sub-tasks like rare phrase translation. On the other hand, non-parametric approaches are immune to forgetting, perfectly complementing the generalization ability of NMT. However, attempts to combine non-parametric or retrieval based approaches with NMT have only been successful on narrow domains, possibly due to over-reliance on sentence level retrieval. We propose a novel n-gram level retrieval approach that relies on local phrase level similarities, allowing us to retrieve neighbors that are useful for translation even when overall sentence similarity is low. We complement this with an expressive neural network, allowing our model to extract information from the noisy retrieved context. We evaluate our semi-parametric NMT approach on a heterogeneous dataset composed of WMT, IWSLT, JRC-Acquis and OpenSubtitles, and demonstrate gains on all 4 evaluation sets. The semi-parametric nature of our approach opens the door for non-parametric domain adaptation, demonstrating strong inference-time adaptation performance on new domains without the need for any parameter updates.
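The difference between sentence-level and n-gram-level retrieval can be sketched with a toy retriever that ranks stored sentences by the number of n-grams they share with the query. Raw overlap counting and the names `ngrams` and `retrieve` are assumptions for illustration; the abstract does not specify the similarity measure actually used.

```python
def ngrams(tokens, n):
    """Set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def retrieve(query, corpus, n=2, k=1):
    """Indices of the k corpus sentences sharing the most n-grams with query."""
    q = ngrams(query.split(), n)
    overlap = [len(q & ngrams(sentence.split(), n)) for sentence in corpus]
    # Sort by overlap (descending), breaking ties by corpus position.
    ranked = sorted(range(len(corpus)), key=lambda i: (-overlap[i], i))
    return ranked[:k]
```

A long sentence that matches the query only in one local phrase can still rank first here, even though its overall similarity to the query is low.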

URL

http://arxiv.org/abs/1903.00058

PDF

http://arxiv.org/pdf/1903.00058
Speeding up Deep Learning with Transient Servers

2019-02-28

Shijian Li, Robert J. Walls, Lijie Xu, Tian Guo

arXiv_CV

arXiv_CV Deep_Learning
Abstract

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable—e.g., for rapidly evaluating new model designs—they often come with significantly higher monetary costs due to sublinear scalability. In this paper, we investigate the feasibility of using training clusters composed of cheaper transient GPU servers to get the benefits of distributed training without the high costs. We conduct the first large-scale empirical analysis, launching more than a thousand GPU servers of various capacities, aimed at understanding the characteristics of transient GPU servers and their impact on distributed training performance. Our study demonstrates the potential of transient servers with a speedup of 7.7X with more than 62.9% monetary savings for some cluster configurations. We also identify a number of important challenges and opportunities for redesigning distributed training frameworks to be transient-aware. For example, the dynamic cost and availability characteristics of transient servers suggest the need for frameworks to dynamically change cluster configurations to best take advantage of current conditions.

URL

http://arxiv.org/abs/1903.00045

PDF

http://arxiv.org/pdf/1903.00045
Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence

2019-02-28

David Manheim

arXiv_AI

arXiv_AI Adversarial Optimization
Abstract

Overoptimization failures in machine learning and artificial intelligence systems can involve specification gaming, reward hacking, fragility to distributional shifts, and Goodhart’s or Campbell’s law. These failure modes are an important challenge in building safe AI systems, and multi-agent systems have additional failure modes that are closely related. These failure modes for multi-agent systems are more complex, more problematic, and less well understood than the single-agent case. They are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing AI, the paper explains why these failure modes are in some sense fundamental. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of: accidental steering, coordination failures, adversarial misalignment, input spoofing and filtering, and goal co-option or direct hacking. The paper then discusses ongoing and potential work on mitigation of these failure modes, and what to expect when these failures continue to proliferate.

URL

http://arxiv.org/abs/1810.10862

PDF

http://arxiv.org/pdf/1810.10862
Reinforcement Learning based Curriculum Optimization for Neural Machine Translation

2019-02-28

Gaurav Kumar, George Foster, Colin Cherry, Maxim Krikun

arXiv_CL

arXiv_CL Knowledge Reinforcement_Learning Optimization NMT
Abstract

We consider the problem of making efficient use of heterogeneous training data in neural machine translation (NMT). Specifically, given a training dataset with a sentence-level feature such as noise, we seek an optimal curriculum, or order for presenting examples to the system during training. Our curriculum framework allows examples to appear an arbitrary number of times, and thus generalizes data weighting, filtering, and fine-tuning schemes. Rather than relying on prior knowledge to design a curriculum, we use reinforcement learning to learn one automatically, jointly with the NMT system, in the course of a single training run. We show that this approach can beat uniform and filtering baselines on Paracrawl and WMT English-to-French datasets by up to +3.4 BLEU, and match the performance of a hand-designed, state-of-the-art curriculum.

URL

http://arxiv.org/abs/1903.00041

PDF

http://arxiv.org/pdf/1903.00041
RLgraph: Modular Computation Graphs for Deep Reinforcement Learning

2019-02-28

Michael Schaarschmidt, Sven Mika, Kai Fricke, Eiko Yoneki

arXiv_AI

arXiv_AI Reinforcement_Learning Deep_Learning
Abstract

Reinforcement learning (RL) tasks are challenging to implement, execute and test due to algorithmic instability, hyper-parameter sensitivity, and heterogeneous distributed communication patterns. We argue for the separation of logical component composition, backend graph definition, and distributed execution. To this end, we introduce RLgraph, a library for designing and executing reinforcement learning tasks in both static graph and define-by-run paradigms. The resulting implementations are robust, incrementally testable, and yield high performance across different deep learning frameworks and distributed backends.

URL

http://arxiv.org/abs/1810.09028

PDF

http://arxiv.org/pdf/1810.09028
Data-Driven Gait Segmentation for Walking Assistance in a Lower-Limb Assistive Device

2019-02-28

Aleksandra Kalinowska, Thomas A. Berrueta, Adam Zoss, Todd Murphey

arXiv_RO

arXiv_RO Segmentation
Abstract

Hybrid systems, such as bipedal walkers, are challenging to control because of discontinuities in their nonlinear dynamics. Little can be predicted about the systems’ evolution without modeling the guard conditions that govern transitions between hybrid modes, so even systems with reliable state sensing can be difficult to control. We propose an algorithm that allows for determining the hybrid mode of a system in real-time using data-driven analysis. The algorithm is used with data-driven dynamics identification to enable model predictive control based entirely on data. Two examples—a simulated hopper and experimental data from a bipedal walker—are used. In the context of the first example, we are able to closely approximate the dynamics of a hybrid SLIP model and then successfully use them for control in simulation. In the second example, we demonstrate gait partitioning of human walking data, accurately differentiating between stance and swing, as well as selected subphases of swing. We identify contact events, such as heel strike and toe-off, without a contact sensor using only kinematics data from the knee and hip joints, which could be particularly useful in providing online assistance during walking. Our algorithm does not assume a predefined gait structure or gait phase transitions, lending itself to segmentation of both healthy and pathological gaits. With this flexibility, impairment-specific rehabilitation strategies or assistance could be designed.

URL

http://arxiv.org/abs/1903.00036

PDF

http://arxiv.org/pdf/1903.00036
SPDA: Superpixel-based Data Augmentation for Biomedical Image Segmentation

2019-02-28

Yizhe Zhang, Lin Yang, Hao Zheng, Peixian Liang, Colleen Mangold, Raquel G. Loreto, David P. Hughes, Danny Z. Chen

arXiv_AI

arXiv_AI Segmentation CNN Deep_Learning
Abstract

Supervised training of a deep neural network aims to “teach” the network to mimic human visual perception that is represented by image-and-label pairs in the training data. Superpixelized (SP) images are visually perceivable to humans, but a conventionally trained deep learning model often performs poorly when working on SP images. To better mimic human visual perception, we think it is desirable for the deep learning model to be able to perceive not only raw images but also SP images. In this paper, we propose a new superpixel-based data augmentation (SPDA) method for training deep learning models for biomedical image segmentation. Our method applies a superpixel generation scheme to all the original training images to generate superpixelized images. The SP images thus obtained are then jointly used with the original training images to train a deep learning model. Our experiments with SPDA on four biomedical image datasets show that SPDA is effective and can consistently improve the performance of state-of-the-art fully convolutional networks for biomedical image segmentation in 2D and 3D images. Additional studies also demonstrate that SPDA can practically reduce the generalization gap.

URL

http://arxiv.org/abs/1903.00035

PDF

http://arxiv.org/pdf/1903.00035
Insertion-based Decoding with automatically Inferred Generation Order

2019-02-28

Jiatao Gu, Qi Liu, Kyunghyun Cho

arXiv_CL

arXiv_CL Image_Caption Caption
Abstract

Conventional neural autoregressive decoding commonly assumes a fixed left-to-right generation order, which may be sub-optimal. In this work, we propose a novel decoding algorithm – InDIGO – which supports flexible sequence generation in arbitrary orders through insertion operations. We extend Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or adaptive orders obtained from beam-search. Experiments on four real-world tasks, including word order recovery, machine translation, image caption and code generation, demonstrate that our algorithm can generate sequences following arbitrary orders, while achieving competitive or even better performance compared to the conventional left-to-right generation. The generated sequences show that InDIGO adopts adaptive generation orders based on input information.
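Generation "in arbitrary orders through insertion operations" can be pictured with a few lines of code that replay a list of insertions. The use of absolute insertion indices is an assumption for illustration; the abstract does not say how InDIGO encodes positions, and `apply_insertions` is a hypothetical helper.

```python
def apply_insertions(ops):
    """Build a sequence by applying (token, index) insertions in order.

    Each index refers to a position in the partial sequence as it exists
    at the moment of insertion.
    """
    sequence = []
    for token, index in ops:
        sequence.insert(index, token)
    return sequence

# Two different generation orders that yield the same sentence.
left_to_right = [("I", 0), ("have", 1), ("a", 2), ("dream", 3)]
content_first = [("dream", 0), ("I", 0), ("a", 1), ("have", 1)]
```

Both operation lists produce `["I", "have", "a", "dream"]`, which shows that the final sequence does not by itself determine the order in which its tokens were generated.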

URL

http://arxiv.org/abs/1902.01370

PDF

http://arxiv.org/pdf/1902.01370
FastFusionNet: New State-of-the-Art for DAWNBench SQuAD

2019-02-28

Felix Wu, Boyi Li, Lequn Wang, Ni Lao, John Blitzer, Kilian Q. Weinberger

arXiv_CL

arXiv_CL Inference RNN
Abstract

In this technical report, we introduce FastFusionNet, an efficient variant of FusionNet [12]. FusionNet is a high-performing reading comprehension architecture, which was designed primarily for maximum retrieval accuracy with less regard towards computational requirements. For FastFusionNet we remove the expensive CoVe layers [21] and substitute the BiLSTMs with far more efficient SRU layers [19]. The resulting architecture obtains state-of-the-art results on DAWNBench [5] while achieving the lowest training and inference time on SQuAD [25] to date. The code is available at https://github.com/felixgwu/FastFusionNet.

URL

http://arxiv.org/abs/1902.11291

PDF

http://arxiv.org/pdf/1902.11291
Homunculus' Brain and Categorical Logic

2019-02-28

Michael Heller

arXiv_AI

arXiv_AI
Abstract

The interaction between syntax (formal language) and its semantics (meanings of language) is well studied in categorical logic. Results of this study are employed to understand how the brain could create meanings. To emphasize the toy character of the proposed model, we prefer to speak of the homunculus’ brain rather than simply of the brain. The homunculus’ brain consists of neurons, each of which is modeled by a category, and axons between neurons, which are modeled by functors between the corresponding neuron-categories. Each neuron (category) has its own program enabling its working, i.e. a “theory” of this neuron. In analogy with what is known from categorical logic, we postulate the existence of a pair of adjoint functors, called Lang and Syn, from a category, now called BRAIN, of categories, to a category, now called MIND, of theories. Our homunculus is a kind of “mathematical robot”, the neuronal architecture of which is not important. Its only aim is to provide us with the opportunity to study how such a simple brain-like structure could “create meanings” out of its purely syntactic program. The pair of adjoint functors Lang and Syn models mutual dependencies between the syntactical structure of a given theory of MIND and the internal logic of its semantics given by a category of BRAIN. In this way, a formal language (syntax) and its meanings (semantics) are interwoven with each other in a manner corresponding to the adjointness of the functors Lang and Syn. Categories BRAIN and MIND interact with each other with their entire structures and, at the same time, these very structures are shaped by this interaction.

URL

http://arxiv.org/abs/1903.03424

PDF

http://arxiv.org/pdf/1903.03424
Optimal Combination of Image Denoisers

2019-02-28

Joon Hee Choi, Omar Elgendy, Stanley H. Chan

arXiv_CV

arXiv_CV Optimization
Abstract

Given a set of image denoisers, each having a different denoising capability, is there a provably optimal way of combining these denoisers to produce an overall better result? An answer to this question is fundamental to designing an ensemble of weak estimators for complex scenes. In this paper, we present an optimal combination scheme by leveraging deep neural networks and convex optimization. The proposed framework, called the Consensus Neural Network (CsNet), introduces three new concepts in image denoising: (1) A provably optimal procedure to combine the denoised outputs via convex optimization; (2) A deep neural network to estimate the mean squared error (MSE) of denoised images without needing the ground truths; (3) An image boosting procedure using a deep neural network to improve contrast and to recover lost details of the combined images. Experimental results show that CsNet can consistently improve denoising performance for both deterministic and neural network denoisers.
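A simplified special case shows why estimated MSEs are enough to choose combination weights. If the denoisers' errors are assumed to be unbiased and mutually uncorrelated (an assumption made here, not a claim from the abstract), minimizing the combined MSE subject to the weights summing to one has a closed form: each weight is proportional to the inverse of that denoiser's MSE. The convex program in the paper presumably covers the general case; the function names below are hypothetical.

```python
def combination_weights(mse_estimates):
    """Inverse-MSE weights; they are non-negative and sum to one."""
    inverse = [1.0 / m for m in mse_estimates]
    total = sum(inverse)
    return [v / total for v in inverse]

def combine(images, weights):
    """Pixel-wise weighted average of equally sized flat images."""
    return [sum(w * image[i] for w, image in zip(weights, images))
            for i in range(len(images[0]))]

def combined_mse(mse_estimates, weights):
    """MSE of the weighted average under the uncorrelated-error assumption."""
    return sum(w * w * m for w, m in zip(weights, mse_estimates))
```

For two denoisers with estimated MSEs of 1.0 and 3.0, the weights come out as 0.75 and 0.25 and the combined MSE is 0.75, lower than that of either denoiser alone.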

URL

http://arxiv.org/abs/1711.06712

PDF

http://arxiv.org/pdf/1711.06712
From Visual to Acoustic Question Answering

2019-02-28

Jerome Abdelnour, Giampiero Salvi, Jean Rouat

arXiv_SD

arXiv_SD QA Relation
Abstract

We introduce the new task of Acoustic Question Answering (AQA) to promote research in acoustic reasoning. The AQA task consists of analyzing an acoustic scene composed of a combination of elementary sounds and answering questions that relate the position and properties of these sounds. The kinds of relational questions asked require that the models perform non-trivial reasoning in order to answer correctly. Although similar problems have been extensively studied in the domain of visual reasoning, we are not aware of any previous studies addressing the problem in the acoustic domain. We propose a method for generating the acoustic scenes from elementary sounds and a number of relevant questions for each scene using templates. We also present preliminary results obtained with two models (FiLM and MAC) that have been shown to work for visual reasoning.

URL

http://arxiv.org/abs/1902.11280

PDF

http://arxiv.org/pdf/1902.11280
A CNN-RNN Framework with a Novel Patch-Based Multi-Attention Mechanism for Multi-Label Image Classification in Remote Sensing

2019-02-28

Gencer Sumbul, Begüm Demir

arXiv_CV

arXiv_CV Attention CNN Image_Classification RNN Classification Relation
Abstract

This paper presents a novel framework that jointly exploits Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) in the context of multi-label remote sensing (RS) image classification. The proposed framework consists of four main modules. The first module aims to extract preliminary local descriptors by considering that RS image bands can be associated with different spatial resolutions. To this end, we introduce a K-Branch CNN in which each branch aims at extracting descriptors of image bands that have the same spatial resolution. The second module aims to model spatial relationships among local descriptors. To this end, we propose a Bidirectional RNN architecture in which Long Short-Term Memory nodes enrich local descriptors by considering spatial relationships of local areas (image patches). The third module aims to define multiple attention scores for local descriptors. To this end, we introduce a novel patch-based multi-attention mechanism that takes into account the joint occurrence of multiple land-cover classes and provides the attention-based local descriptors. The last module aims to employ these descriptors for multi-label RS image classification. Experimental results obtained on our large-scale Sentinel-2 benchmark archive (called BigEarthNet) show the effectiveness of the proposed framework compared to a state-of-the-art method.

URL

http://arxiv.org/abs/1902.11274

PDF

http://arxiv.org/pdf/1902.11274
Efficient Contextual Representation Learning Without Softmax Layer

2019-02-28

Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang

arXiv_CL

arXiv_CL Embedding Represenation_Learning Language_Model
Abstract

Contextual representation models have achieved great success in improving various downstream tasks. However, these language-model-based encoders are difficult to train due to the large parameter sizes and high computational complexity. By carefully examining the training procedure, we find that the softmax layer (the output layer) causes significant inefficiency due to the large vocabulary size. Therefore, we redesign the learning objective and propose an efficient framework for training contextual representation models. Specifically, the proposed approach bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings. Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer and performs language modeling without truncating the vocabulary. When applied to ELMo, our method achieves a 4 times speedup and eliminates 80% trainable parameters while achieving competitive performance on downstream tasks.

URL

http://arxiv.org/abs/1902.11269

PDF

http://arxiv.org/pdf/1902.11269
CircConv: A Structured Convolution with Low Complexity

2019-02-28

Siyu Liao, Zhe Li, Liang Zhao, Qinru Qiu, Yanzhi Wang, Bo Yuan

arXiv_CV

arXiv_CV CNN
Abstract

Deep neural networks (DNNs), especially deep convolutional neural networks (CNNs), have emerged as a powerful technique in various machine learning applications. However, the large model sizes of DNNs place high demands on computation resources and weight storage, thereby limiting the practical deployment of DNNs. To overcome these limitations, this paper proposes to impose a circulant structure on the construction of convolutional layers, leading to circulant convolutional layers (CircConvs) and circulant CNNs. The circulant structure and models can be either trained from scratch or re-trained from a pre-trained non-circulant model, making the approach very flexible for different training environments. Extensive experiments show that this strong structure-imposing approach substantially reduces the number of parameters of convolutional layers and enables significant savings in computational cost by using fast multiplication of the circulant tensor.
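The source of both savings can be seen in the plain matrix case. An n x n circulant matrix is fully determined by its first column (n parameters instead of n squared), and multiplying by it is a circular convolution, which becomes an element-wise product in the Fourier domain. The sketch below checks that identity with a naive DFT; a real implementation would use an FFT for O(n log n) cost, and the paper's layers operate on tensors rather than on a single matrix.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def circulant_matvec_dense(c, x):
    """Multiply by the circulant matrix whose first column is c,
    i.e. entry (i, j) equals c[(i - j) % n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_matvec_fourier(c, x):
    """Same product, computed as an element-wise product of spectra."""
    spectrum = [a * b for a, b in zip(dft(c), dft(x))]
    return [v.real for v in dft(spectrum, inverse=True)]
```

Both functions return the same vector up to floating-point error, but only the first column `c` ever needs to be stored.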

URL

http://arxiv.org/abs/1902.11268

PDF

http://arxiv.org/pdf/1902.11268
Efficient Parameter-free Clustering Using First Neighbor Relations

2019-02-28

M. Saquib Sarfraz, Vivek Sharma, Rainer Stiefelhagen

arXiv_CV

arXiv_CV Relation
Abstract

We present a new clustering method in the form of a single clustering equation that is able to directly discover groupings in the data. The main proposition is that the first neighbor of each sample is all one needs to discover large chains and find the groups in the data. In contrast to most existing clustering algorithms, our method does not require any hyper-parameters or distance thresholds, nor does the number of clusters need to be specified. The proposed algorithm belongs to the family of hierarchical agglomerative methods. The technique has a very low computational overhead, is easily scalable and applicable to large practical problems. Evaluation on well-known datasets from different domains, ranging between 1077 and 8.1 million samples, shows substantial performance gains when compared to the existing clustering techniques.
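One way to read the first-neighbor idea is: link every sample to its nearest other sample and take the connected components of the resulting graph as clusters. The sketch below does exactly that with a union-find structure. It is an interpretation of the abstract rather than the authors' equation, it uses brute-force squared Euclidean distances, and it produces only a single partition instead of a full hierarchy.

```python
def first_neighbor_clusters(points):
    """Cluster labels from connected components of first-neighbor links.

    points: list of equal-length numeric tuples (at least two points).
    """
    n = len(points)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Index of the nearest other point for every point (ties: lowest index).
    nearest = [min((j for j in range(n) if j != i),
                   key=lambda j: dist2(points[i], points[j]))
               for i in range(n)]

    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in enumerate(nearest):
        parent[find(i)] = find(j)           # merge the two components

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

No threshold or cluster count is passed in; the number of clusters falls out of the neighbor structure of the data.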

URL

http://arxiv.org/abs/1902.11266

PDF

http://arxiv.org/pdf/1902.11266
Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport

2019-02-28

Adarsh Subbaswamy, Peter Schulam, Suchi Saria

arXiv_AI

arXiv_AI Knowledge Relation
Abstract

Classical supervised learning produces unreliable models when training and target distributions differ, with most existing solutions requiring samples from the target domain. We propose a proactive approach which learns a relationship in the training domain that will generalize to the target domain by incorporating prior knowledge of aspects of the data generating process that are expected to differ as expressed in a causal selection diagram. Specifically, we remove variables generated by unstable mechanisms from the joint factorization to yield the Surgery Estimator—an interventional distribution that is invariant to the differences across environments. We prove that the surgery estimator finds stable relationships in strictly more scenarios than previous approaches which only consider conditional relationships, and demonstrate this in simulated experiments. We also evaluate on real world data for which the true causal diagram is unknown, performing competitively against entirely data-driven approaches.

URL

http://arxiv.org/abs/1812.04597

PDF

http://arxiv.org/pdf/1812.04597
Incorporating End-to-End Speech Recognition Models for Sentiment Analysis

2019-02-28

Egor Lakomkin, Mohammad Ali Zamani, Cornelius Weber, Sven Magg, Stefan Wermter

arXiv_CL

arXiv_CL Sentiment Sentiment_Classification Speech_Recognition Classification Recognition
Abstract

Previous work on emotion recognition demonstrated a synergistic effect of combining several modalities such as auditory, visual, and transcribed text to estimate the affective state of a speaker. Among these, the linguistic modality is crucial for the evaluation of an expressed emotion. However, manually transcribed spoken text is not available as input to a system in practice. We argue that using ground-truth transcriptions during training and evaluation phases leads to a significant discrepancy in performance compared to real-world conditions, as the spoken text has to be recognized on the fly and can contain speech recognition mistakes. In this paper, we propose a method of integrating an automatic speech recognition (ASR) output with a character-level recurrent neural network for sentiment recognition. In addition, we conduct several experiments investigating sentiment recognition for human-robot interaction in a noise-realistic scenario which is challenging for the ASR systems. We quantify the improvement compared to using only the acoustic modality in sentiment recognition. We demonstrate the effectiveness of this approach on the Multimodal Corpus of Sentiment Intensity (MOSI) by achieving 73.6% accuracy in a binary sentiment classification task, exceeding previously reported results that use only acoustic input. In addition, we set a new state-of-the-art performance on the MOSI dataset (80.4% accuracy, 2% absolute improvement).

URL

http://arxiv.org/abs/1902.11245

PDF

http://arxiv.org/pdf/1902.11245
Application-level Studies of Cellular Neural Network-based Hardware Accelerators

2019-02-28

Qiuwen Lou, Indranil Palit, Tang Li, Andras Horvath, Michael Niemier, X. Sharon Hu

arXiv_CV

arXiv_CV Tracking CNN RNN
Abstract

As cost and performance benefits associated with Moore’s Law scaling slow, researchers are studying alternative architectures (e.g., based on analog and/or spiking circuits) and/or computational models (e.g., convolutional and recurrent neural networks) to perform application-level tasks faster, more energy efficiently, and/or more accurately. We investigate cellular neural network (CeNN)-based co-processors at the application-level for these metrics. While it is well-known that CeNNs can be well-suited for spatio-temporal information processing, few (if any) studies have quantified the energy/delay/accuracy of a CeNN-friendly algorithm and compared the CeNN-based approach to the best von Neumann algorithm at the application level. We present an evaluation framework for such studies. As a case study, a CeNN-friendly target-tracking algorithm was developed and mapped to an array architecture developed in conjunction with the algorithm. We compare the energy, delay, and accuracy of our architecture/algorithm (assuming all overheads) to the most accurate von Neumann algorithm (Struck). Von Neumann CPU data is measured on an Intel i5 chip. The CeNN approach is capable of matching the accuracy of Struck, and can offer approximately 1000x improvements in energy-delay product.

URL

http://arxiv.org/abs/1903.06649

PDF

http://arxiv.org/pdf/1903.06649
Large-Scale Object Mining for Object Discovery from Unlabeled Video

2019-02-28

Aljosa Osep, Paul Voigtlaender, Jonathon Luiten, Stefan Breuers, Bastian Leibe

arXiv_CV

arXiv_CV
Abstract

This paper addresses the problem of object discovery from unlabeled driving videos captured in a realistic automotive setting. Identifying recurring object categories in such raw video streams is a very challenging problem. Not only do object candidates first have to be localized in the input images, but many interesting object categories occur relatively infrequently. Object discovery will therefore have to deal with the difficulties of operating in the long tail of the object distribution. We demonstrate the feasibility of performing fully automatic object discovery in such a setting by mining object tracks using a generic object tracker. In order to facilitate further research in object discovery, we release a collection of more than 360,000 automatically mined object tracks from 10+ hours of video data (560,000 frames). We use this dataset to evaluate the suitability of different feature representations and clustering strategies for object discovery.

URL

http://arxiv.org/abs/1903.00362

PDF

http://arxiv.org/pdf/1903.00362
No Padding Please: Efficient Neural Handwriting Recognition

2019-02-28

Gideon Maillette de Buy Wenniger, Lambert Schomaker, Andy Way

arXiv_CV

arXiv_CV RNN Deep_Learning Recognition
Abstract

Neural handwriting recognition (NHR) is the recognition of handwritten text with deep learning models, such as multi-dimensional long short-term memory (MDLSTM) recurrent neural networks. Models with MDLSTM layers have achieved state-of-the-art results on handwritten text recognition tasks. While multi-directional MDLSTM-layers have an unbeaten ability to capture the complete context in all directions, this strength limits the possibilities for parallelization, and therefore comes at a high computational cost. In this work we develop methods to create efficient MDLSTM-based models for NHR, particularly a method aimed at eliminating computation waste that results from padding. This proposed method, called example-packing, replaces wasteful stacking of padded examples with efficient tiling in a 2-dimensional grid. For word-based NHR this yields a speed improvement of a factor of 6.6 over an already efficient baseline of minimal padding for each batch separately. For line-based NHR the savings are more modest, but still significant. In addition to example-packing, we propose: 1) a technique to optimize parallelization for dynamic graph definition frameworks including PyTorch, using convolutions with grouping, 2) a method for parallelization across GPUs for variable-length example batches. All our techniques are thoroughly tested on our own PyTorch re-implementation of MDLSTM-based NHR models. A thorough evaluation on the IAM dataset shows that our models perform similarly to earlier implementations of state-of-the-art models. Our efficient NHR model and some of the reusable techniques discussed with it offer ways to realize relatively efficient models for the omnipresent scenario of variable-length inputs in deep learning.
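
The padding waste that motivates example-packing can be made concrete with a little arithmetic. The sketch below is not the authors' packing algorithm. It only estimates, for a hypothetical batch of (height, width) image sizes, what fraction of a conventionally padded batch consists of padding. The function name and the example sizes are illustrative assumptions.

```python
def padding_waste(sizes):
    """Fraction of a padded batch that is padding rather than real pixels.

    `sizes` is a list of (height, width) pairs. Every example is assumed
    to be padded to the largest height and width found in the batch.
    """
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    padded_area = len(sizes) * max_h * max_w   # area actually processed
    used_area = sum(h * w for h, w in sizes)   # area holding real content
    return 1.0 - used_area / padded_area


# Hypothetical word images of very different widths.
batch = [(10, 10), (10, 20), (5, 40)]
print(f"wasted fraction: {padding_waste(batch):.3f}")
```

The more the example sizes vary within a batch, the larger this fraction becomes, which is the computation that a tiling scheme tries to avoid.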

URL

http://arxiv.org/abs/1902.11208

PDF

http://arxiv.org/pdf/1902.11208
Jointly Optimizing Diversity and Relevance in Neural Response Generation

2019-02-28

Xiang Gao, Sungjin Lee, Yizhe Zhang, Chris Brockett, Michel Galley, Jianfeng Gao, Bill Dolan

arXiv_AI

arXiv_AI Regularization
Abstract

Although recent neural conversation models have shown great potential, they often generate bland and generic responses. While various approaches have been explored to diversify the output of the conversation model, the improvement often comes at the cost of decreased relevance. In this paper, we propose a method to jointly optimize diversity and relevance that essentially fuses the latent space of a sequence-to-sequence model and that of an autoencoder model by leveraging novel regularization terms. As a result, our approach induces a latent space in which the distance and direction from the predicted response vector roughly match the relevance and diversity, respectively. This property also lends itself well to an intuitive visualization of the latent space. Both automatic and human evaluation results demonstrate that the proposed approach brings significant improvement compared to strong baselines in both diversity and relevance.

URL

http://arxiv.org/abs/1902.11205

PDF

http://arxiv.org/pdf/1902.11205
Two-phase Hair Image Synthesis by Self-Enhancing Generative Model

2019-02-28

Haonan Qiu, Chuan Wang, Hang Zhu, Xiangyu Zhu, Jinjin Gu, Xiaoguang Han

arXiv_CV

arXiv_CV Adversarial Super_Resolution Sparse GAN
Abstract

Generating a plausible hair image given limited guidance, such as sparse sketches or a low-resolution image, has been made possible with the rise of Generative Adversarial Networks (GANs). Traditional image-to-image translation networks can generate recognizable results, but finer textures are usually lost and blur artifacts commonly exist. In this paper, we propose a two-phase generative model for high-quality hair image synthesis. The two-phase pipeline first generates a coarse image by an existing image translation model, then applies a re-generating network with self-enhancing capability to the coarse image. The self-enhancing capability is achieved by a proposed structure extraction layer, which extracts the texture and orientation map from a hair image. Extensive experiments on two tasks, Sketch2Hair and Hair Super-Resolution, demonstrate that our approach is able to synthesize plausible hair images with finer details, and outperforms the state of the art.

URL

http://arxiv.org/abs/1902.11203

PDF

http://arxiv.org/pdf/1902.11203
FaceLiveNet+: A Holistic Networks For Face Authentication Based On Dynamic Multi-task Convolutional Neural Networks

2019-02-28

Zuheng Ming, Junshi Xia, Muhammad Muzzamil Luqman, Jean-Christophe Burie, Kaixing Zhao

arXiv_CV

arXiv_CV Face CNN Recognition
Abstract

This paper proposes a holistic multi-task Convolutional Neural Network (CNN) with dynamic task weights, namely FaceLiveNet+, for face authentication. FaceLiveNet+ can simultaneously employ face verification and facial expression recognition as a solution for liveness control. Compared to single-task learning, the proposed multi-task learning can better capture the feature representation for all of the tasks. The experimental results show the superiority of multi-task learning over single-task learning for both the face verification task and the facial expression recognition task. Rather than using conventional multi-task learning with fixed weights for the tasks, this work proposes a so-called dynamic-weight-unit to automatically learn the weights of the tasks. The experiments have shown the effectiveness of the dynamic weights for training the networks. Finally, the holistic evaluation for face authentication based on the proposed protocol has shown the feasibility of applying FaceLiveNet+ to face authentication.

URL

http://arxiv.org/abs/1902.11179

PDF

http://arxiv.org/pdf/1902.11179
The Ethics of AI Ethics -- An Evaluation of Guidelines

2019-02-28

Thilo Hagendorff

arXiv_AI

arXiv_AI Recommendation
Abstract

Current advances in research, development and application of artificial intelligence (AI) systems have yielded a far-reaching discourse on AI ethics. In consequence, a number of ethics guidelines have been released in recent years. These guidelines comprise normative principles and recommendations aimed to harness the “disruptive” potentials of new AI technologies. Designed as a comprehensive evaluation, this paper analyzes and compares these guidelines highlighting overlaps but also omissions. As a result, I give a detailed overview of the field of AI ethics. Finally, I also examine to what extent the respective ethical principles and values are implemented in the practice of research, development and application of AI systems - and how the effectiveness in the demands of AI ethics can be improved.

URL

http://arxiv.org/abs/1903.03425

PDF

http://arxiv.org/pdf/1903.03425
ROVO: Robust Omnidirectional Visual Odometry for Wide-baseline Wide-FOV Camera Systems

2019-02-28

Hochang Seok, Jongwoo Lim

arXiv_CV

arXiv_CV Pose_Estimation Optimization
Abstract

In this paper we propose a robust visual odometry system for a wide-baseline camera rig with wide field-of-view (FOV) fisheye lenses, which provides full omnidirectional stereo observations of the environment. For more robust and accurate ego-motion estimation we add three components to the standard VO pipeline: 1) the hybrid projection model for improved feature matching, 2) multi-view P3P RANSAC algorithm for pose estimation, and 3) online update of rig extrinsic parameters. The hybrid projection model combines the perspective and cylindrical projection to maximize the overlap between views and minimize the image distortion that degrades feature matching performance. The multi-view P3P RANSAC algorithm extends the conventional P3P RANSAC to multi-view images so that all feature matches in all views are considered in the inlier counting for robust pose estimation. Finally, the online extrinsic calibration is seamlessly integrated in the backend optimization framework so that the changes in camera poses due to shocks or vibrations can be corrected automatically. The proposed system is extensively evaluated with synthetic datasets with ground-truth and real sequences of highly dynamic environments, and its superior performance is demonstrated.

URL

http://arxiv.org/abs/1902.11154

PDF

http://arxiv.org/pdf/1902.11154
Adversarial Training for Satire Detection: Controlling for Confounding Variables

2019-02-28

Robert McHardy, Heike Adel, Roman Klinger

arXiv_CL

arXiv_CL Adversarial Knowledge Attention Classification Detection
Abstract

The automatic detection of satire vs. regular news is relevant for downstream applications (for instance, knowledge base population) and to improve the understanding of linguistic characteristics of satire. Recent approaches build upon corpora which have been labeled automatically based on article sources. We hypothesize that this encourages the models to learn characteristics for different publication sources (e.g., “The Onion” vs. “The Guardian”) rather than characteristics of satire, leading to poor generalization performance to unseen publication sources. We therefore propose a novel model for satire detection with an adversarial component to control for the confounding variable of publication source. On a large novel data set collected from German news (which we make available to the research community), we observe comparable satire classification performance and, as desired, a considerable drop in publication classification performance with adversarial training. Our analysis shows that the adversarial component is crucial for the model to learn to pay attention to linguistic properties of satire.

URL

http://arxiv.org/abs/1902.11145

PDF

http://arxiv.org/pdf/1902.11145
Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability

2019-02-28

Miriam Redi, Besnik Fetahu, Jonathan Morgan, Dario Taraborelli

arXiv_CL

arXiv_CL
Abstract

Wikipedia is playing an increasingly central role on the web, and the policies its contributors follow when sourcing and fact-checking content affect millions of readers. Among these core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate and fact-check Wikipedia at scale, however, its contents do not always evenly comply with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required by collecting labeled data from editors of multiple Wikipedia language editions. We then collect a large-scale crowdsourced dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design and evaluate algorithmic models to determine if a statement requires a citation, and to predict the citation reason based on our taxonomy. We evaluate the robustness of such models across different classes of Wikipedia articles of varying quality, as well as on an additional dataset of claims annotated for fact-checking purposes.

URL

http://arxiv.org/abs/1902.11116

PDF

http://arxiv.org/pdf/1902.11116
Tensor-variate Mixture of Experts

2019-02-28

Noémie Jaquier, Robert Haschke, Sylvain Calinon

arXiv_RO

arXiv_RO GAN
Abstract

When data are organized in matrices or arrays of higher dimensions (tensors), classical regression methods first transform these data into vectors, therefore ignoring the underlying structure of the data and increasing the dimensionality of the problem. This flattening operation typically leads to overfitting when only a few training data are available. In this paper, we present a mixture of experts model that exploits tensorial representations for regression of tensor-valued data. The proposed formulation takes into account the underlying structure of the data and remains efficient when few training data are available. Evaluations on artificially generated data, as well as offline and real-time experiments recognizing hand movements from tactile myography, demonstrate the effectiveness of the proposed approach.

URL

http://arxiv.org/abs/1902.11104

PDF

http://arxiv.org/pdf/1902.11104
Real-time 3D Shape Instantiation for Partially-deployed Stent Segment from a Single 2D Fluoroscopic Image in Robot-assisted Fenestrated Endovascular Aortic Repair

2019-02-28

Jian-Qing Zheng, Xiao-Yun Zhou, Guang-Zhong Yang

arXiv_CV

arXiv_CV CNN Classification
Abstract

In robot-assisted Fenestrated Endovascular Aortic Repair (FEVAR), accurate alignment of stent graft fenestrations or scallops with aortic branches is essential for establishing complete blood flow perfusion. Current navigation is largely based on 2D fluoroscopic images, which lack 3D anatomical information, thus causing longer operation time as well as high risks of radiation exposure. Previously, 3D shape instantiation frameworks for real-time 3D shape reconstruction of fully-deployed or fully-compressed stent grafts from a single 2D fluoroscopic image have been proposed for 3D navigation in robot-assisted FEVAR. However, these methods could not instantiate partially-deployed stent segments, as the 3D marker references are unknown. In this paper, an adapted Graph Convolutional Network (GCN) is proposed to predict 3D marker references from 3D fully-deployed markers. As the original GCN is designed for classification, in this paper, the coarsening layers are removed and the softmax function at the network end is replaced with linear mapping for the regression task. The derived 3D and the 2D marker references are used to instantiate partially-deployed stent segment shape with the existing 3D shape instantiation framework. Validations were performed on three commonly used stent grafts and five patient-specific 3D printed aortic aneurysm phantoms. Comparable performances with average mesh distance errors of 1$\sim$3 mm and average angular errors around 7 degrees were achieved.

URL

http://arxiv.org/abs/1902.11089

PDF

http://arxiv.org/pdf/1902.11089
Rolling Shutter Camera Synchronization with Sub-millisecond Accuracy

2019-02-28

Matej Smid, Jiri Matas

arXiv_CV

arXiv_CV
Abstract

A simple method for synchronization of video streams with a precision better than one millisecond is proposed. The method is applicable to any number of rolling shutter cameras, provided that a few photographic flashes or other abrupt lighting changes are present in the video. The approach exploits the rolling shutter sensor property that every sensor row starts its exposure with a small delay after the onset of the previous row. The cameras may have different frame rates and resolutions, and need not have overlapping fields of view. The method was validated on five minutes of four streams from an ice hockey match. The found transformation maps events visible in all cameras to a reference time with a standard deviation of the temporal error in the range of 0.3 to 0.5 milliseconds. The quality of the synchronization is demonstrated on temporally and spatially overlapping images of a fast moving puck observed in two cameras.
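
The rolling shutter property described above gives each sensor row its own timestamp, which is what makes sub-frame precision possible. The sketch below shows only this timing model, under the assumption of a constant frame period and a constant per-row delay. It is not the authors' estimation procedure, and all function names and numeric values are hypothetical.

```python
def event_time_in_camera(frame_index, row_index, frame_period_s, row_delay_s):
    """Camera-local time at which a given sensor row starts its exposure.

    Assumes frame 0, row 0 starts at local time 0 and that each row starts
    `row_delay_s` seconds after the previous one.
    """
    return frame_index * frame_period_s + row_index * row_delay_s


def clock_offset(event_time_ref_s, event_time_other_s):
    """Offset to add to the other camera's local time to reach reference time."""
    return event_time_ref_s - event_time_other_s


# The same flash is seen at row 500 of frame 2 in a 25 fps camera and at
# row 100 of frame 3 in a 50 fps camera (all values are made up).
t_ref = event_time_in_camera(2, 500, 0.04, 0.00003)
t_other = event_time_in_camera(3, 100, 0.02, 0.00005)
print(f"offset: {clock_offset(t_ref, t_other) * 1000:.3f} ms")
```

Because the row index enters the timestamp, a flash edge that falls in the middle of a frame is localized far more finely than the frame period alone would allow.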

URL

http://arxiv.org/abs/1902.11084

PDF

http://arxiv.org/pdf/1902.11084
Active Transfer Learning for Persian Offline Signature Verification

2019-02-28

Taraneh Younesian, Saeed Masoudnia, Reshad Hosseini, Babak N. Araabi

arXiv_CV

arXiv_CV Transfer_Learning Recognition
Abstract

Offline Signature Verification (OSV) remains a challenging pattern recognition task, especially in the presence of skilled forgeries that are not available during training. This challenge is aggravated when only a small amount of labeled training data is available and intra-personal variations are large. In this study, we address this issue by employing an active learning approach, which selects the most informative instances to label and therefore reduces the human labeling effort significantly. Our proposed OSV includes three steps: feature learning, active learning, and final verification. We benefit from transfer learning using a pre-trained CNN for feature learning. We also propose SVM-based active learning for each user to separate the user's genuine signatures from the random forgeries. Finally, we use the SVMs to verify the authenticity of the questioned signature. We examined our proposed active transfer learning method on UTSig, a Persian offline signature dataset. We achieved nearly 13% improvement compared to the random selection of instances. Our results also showed 1% improvement over the state-of-the-art method in which a fully supervised setting with five more labeled instances per user was used.
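
The abstract does not say how the "most informative" instances are chosen. One common choice for SVM-based active learning is margin-based uncertainty sampling, where the unlabeled samples closest to the decision boundary are sent for labeling. The sketch below illustrates that idea only, as an assumption rather than the authors' criterion. The decision values are hypothetical signed distances that a trained SVM would produce.

```python
def select_most_uncertain(decision_values, k):
    """Indices of the k samples closest to the decision boundary.

    `decision_values` are signed SVM scores for unlabeled samples. A score
    near zero means the classifier is least certain about that sample.
    """
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: abs(decision_values[i]))
    return ranked[:k]


# Hypothetical scores for five unlabeled signatures.
scores = [2.0, -0.1, 0.5, -3.0, 0.05]
print(select_most_uncertain(scores, k=2))
```

After the selected samples are labeled, the SVM would be retrained and the selection repeated until the labeling budget is used up.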

URL

http://arxiv.org/abs/1903.06255

PDF

http://arxiv.org/pdf/1903.06255
AFS: An Attention-based mechanism for Supervised Feature Selection

2019-02-28

Ning Gui, Danni Ge, Ziyin Hu

arXiv_AI

arXiv_AI Attention Classification Relation
Abstract

As an effective data preprocessing step, feature selection has shown its effectiveness in preparing high-dimensional data for many machine learning tasks. The proliferation of high-dimensional, high-volume big data, however, has brought major challenges, e.g., computational complexity and stability on noisy data, to existing feature-selection techniques. This paper introduces a novel neural network-based feature selection architecture, dubbed Attention-based Feature Selection (AFS). AFS consists of two detachable modules: an attention module for feature weight generation and a learning module for the problem modeling. The attention module formulates the correlation problem among features and supervision target into a binary classification problem, supported by a shallow attention net for each feature. Feature weights are generated based on the distribution of respective feature selection patterns adjusted by backpropagation during the training process. The detachable structure allows existing off-the-shelf models to be directly reused, which reduces training time, the demand for training data, and the expertise required. A hybrid initialization method is also introduced to boost the selection accuracy for datasets without enough samples for feature weight generation. Experimental results show that AFS achieves the best accuracy and stability in comparison to several state-of-the-art feature selection algorithms on MNIST, noisy MNIST, and several datasets with small samples.

URL

http://arxiv.org/abs/1902.11074

PDF

http://arxiv.org/pdf/1902.11074
An alternative approach to coherent choice functions

2019-02-28

Jasper De Bock, Gert de Cooman

arXiv_AI

arXiv_AI
Abstract

Choice functions constitute a simple, direct and very general mathematical framework for modelling choice under uncertainty. In particular, they are able to represent the set-valued choices that appear in imprecise-probabilistic decision making. We provide these choice functions with a clear interpretation in terms of desirability, use this interpretation to derive a set of basic coherence axioms, and show that this notion of coherence leads to a representation in terms of sets of strict preference orders. By imposing additional properties such as totality, the mixing property and Archimedeanity, we obtain representation in terms of sets of strict total orders, lexicographic probability systems, coherent lower previsions or linear previsions.

URL

http://arxiv.org/abs/1903.00336

PDF

http://arxiv.org/pdf/1903.00336
Biometric Presentation Attack Detection: Beyond the Visible Spectrum

2019-02-28

Ruben Tolosana, Marta Gomez-Barrero, Christoph Busch, Javier Ortega-Garcia

arXiv_CV

arXiv_CV Face Deep_Learning Detection
Abstract

The increased need for unattended authentication in multiple scenarios has motivated a wide deployment of biometric systems in the last few years. This has in turn led to the disclosure of security concerns specifically related to biometric systems. Among them, Presentation Attacks (PAs, i.e., attempts to log into the system with a fake biometric characteristic or presentation attack instrument) pose a severe threat to the security of the system: any person could eventually fabricate or order a gummy finger or face mask to impersonate someone else. The biometrics community has thus made a considerable effort to the development of automatic Presentation Attack Detection (PAD) mechanisms, for instance through the international LivDet competitions. In this context, we present a novel fingerprint PAD scheme based on $i)$ a new capture device able to acquire images within the short wave infrared (SWIR) spectrum, and $ii)$ an in-depth analysis of several state-of-the-art techniques based on both handcrafted and deep learning features. The approach is evaluated on a database comprising over 4700 samples, stemming from 562 different subjects and 35 different presentation attack instrument (PAI) species. The results show the soundness of the proposed approach with a detection equal error rate (D-EER) as low as 1.36\% even in a realistic scenario where five different PAI species are considered only for testing purposes (i.e., unknown attacks).

URL

http://arxiv.org/abs/1902.11065

PDF

http://arxiv.org/pdf/1902.11065
Efficient Dense Frontier Detection for 2D Graph SLAM Based on Occupancy Grid Submaps

2019-02-28

Juraj Oršulić, Damjan Miklić, Zdenko Kovačić

arXiv_RO

arXiv_RO Detection SLAM
Abstract

In autonomous robot exploration, the frontier is the border in the world map between the explored space and unexplored space. The frontier plays an important role when deciding where in the environment the robots should go explore next. We examine a modular control system pipeline for autonomous exploration where a 2D graph SLAM algorithm based on occupancy grid submaps performs map building and localization. We provide an overview of the state of the art in frontier detection and the relevant SLAM concepts and propose a specialized frontier detection method which is efficiently constrained to active submaps, yet robust to SLAM loop closures.
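
The frontier definition at the start of this abstract can be sketched on a single occupancy grid. The code below uses a common textbook formulation, in which a frontier cell is a free cell with at least one unknown 4-neighbor. It is not the submap-based method the paper proposes, and the cell encoding is an assumption chosen for illustration.

```python
FREE, OCCUPIED, UNKNOWN = 0, 1, -1  # assumed cell encoding


def find_frontier_cells(grid):
    """Return (row, col) of every free cell that touches an unknown cell."""
    rows, cols = len(grid), len(grid[0])
    frontier = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != FREE:
                continue
            # Check the four axis-aligned neighbors that lie inside the grid.
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == UNKNOWN:
                    frontier.append((r, c))
                    break
    return frontier


grid = [
    [FREE, FREE,     UNKNOWN],
    [FREE, OCCUPIED, UNKNOWN],
    [FREE, FREE,     FREE],
]
print(find_frontier_cells(grid))
```

This naive scan visits every cell of the map on each call, which is the cost a method restricted to active submaps would aim to avoid.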

URL

http://arxiv.org/abs/1902.11061

PDF

http://arxiv.org/pdf/1902.11061
Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

2019-02-28

Daniel Ortega, Chia-Yu Li, Gisela Vallejo, Pavel Denisov, Ngoc Thang Vu

arXiv_CL

arXiv_CL Speech_Recognition CNN Classification Recognition
Abstract

This paper presents our latest investigations on dialog act (DA) classification on automatically generated transcriptions. We propose a novel approach that combines convolutional neural networks (CNNs) and conditional random fields (CRFs) for context modeling in DA classification. We explore the impact of transcriptions generated from different automatic speech recognition systems such as hybrid TDNN/HMM and End-to-End systems on the final performance. Experimental results on two benchmark datasets (MRDA and SwDA) show that the combination of CNN and CRF consistently improves the accuracy. Furthermore, they show that although the word error rates are comparable, the End-to-End ASR system seems to be more suitable for DA classification.

URL

http://arxiv.org/abs/1902.11060

PDF

http://arxiv.org/pdf/1902.11060
Representation Learning for Recommender Systems with Application to the Scientific Literature

2019-02-28

Robin Brochier

arXiv_CL

arXiv_CL Attention Embedding Represenation_Learning Recommendation
Abstract

The scientific literature is a large information network linking various actors (laboratories, companies, institutions, etc.). The vast amount of data generated by this network constitutes a dynamic heterogeneous attributed network (HAN), in which new information is constantly produced and from which it is increasingly difficult to extract content of interest. In this article, I present the first work of my thesis, carried out in partnership with an industrial company, Digital Scientific Research Technology. The latter offers a scientific watch tool, Peerus, addressing various issues, such as the real time recommendation of newly published papers or the search for active experts to start new collaborations. To tackle this diversity of applications, a common approach consists in learning representations of the nodes and attributes of this HAN and using them as features for a variety of recommendation tasks. However, most works on attributed network embedding pay too little attention to textual attributes and do not fully take advantage of recent natural language processing techniques. Moreover, proposed methods that jointly learn node and document representations do not provide a way to effectively infer representations for new documents for which network information is missing, which happens to be crucial in real time recommender systems. Finally, the interplay between textual and graph data in text-attributed heterogeneous networks remains an open research direction.

URL

http://arxiv.org/abs/1902.11058

PDF

http://arxiv.org/pdf/1902.11058
A Roadmap-Path Reshaping Algorithm for Real-Time Motion Planning

2019-02-28

Chaoyi Sun (1), Qing Li (1), Li Li (1) ((1) Tsinghua University)

arXiv_RO

arXiv_RO Optimization
Abstract

Real-time motion planning is a vital function of robotic systems. Unlike existing roadmap algorithms, which first determine the free space and then determine the collision-free path, researchers recently proposed several convex relaxation based smoothing algorithms which first select an initial path to link the starting configuration and the goal configuration and then reshape this path to meet other requirements (e.g., collision-free conditions) by using convex relaxation. However, convex relaxation based smoothing algorithms often fail to give a satisfactory path, since the initial paths are selected randomly. Moreover, the curvature constraints were not considered in the existing convex relaxation based smoothing algorithms. In this paper, we show that we can first grid the whole configuration space to pick a candidate path and reshape this shortest path to meet our goal. This new algorithm inherits the merits of the roadmap algorithms and the convex feasible set algorithm. We further discuss how to meet the curvature constraints by using both the Beamlet algorithm to select a better initial path and an iterative optimization algorithm to adjust the curvature of the path. Theoretical analysis and numerical tests show that it can almost surely find a feasible path and uses much less time than the recently proposed convex feasible set algorithm.

URL

http://arxiv.org/abs/1902.11056

PDF

http://arxiv.org/pdf/1902.11056
Link Prediction with Mutual Attention for Text-Attributed Networks

2019-02-28

Robin Brochier, Adrien Guille, Julien Velcin

arXiv_CL

arXiv_CL Attention Prediction
Abstract

In this extended abstract, we present an algorithm that learns a similarity measure between documents from the network topology of a structured corpus. We leverage the Scaled Dot-Product Attention, a recently proposed attention mechanism, to design a mutual attention mechanism between pairs of documents. To train its parameters, we use the network links as supervision. We provide preliminary experiment results with a citation dataset on two prediction tasks, demonstrating the capacity of our model to learn a meaningful textual similarity.
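
For readers unfamiliar with Scaled Dot-Product Attention, the sketch below computes softmax(QK^T / sqrt(d_k)) V in plain Python. How the paper turns this into a mutual attention mechanism and a similarity score is not described in the abstract, so the usage shown here (words of one document attending to words of another) is only an assumed illustration with made-up vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Attention outputs, one per query vector.

    queries: list of d_k-dimensional vectors
    keys:    list of d_k-dimensional vectors
    values:  list of vectors, one per key
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Hypothetical word vectors for two short documents.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 1.0], [0.5, 0.0], [0.0, 2.0]]
a_given_b = scaled_dot_product_attention(doc_a, doc_b, doc_b)  # A attends to B
b_given_a = scaled_dot_product_attention(doc_b, doc_a, doc_a)  # B attends to A
print(a_given_b, b_given_a)
```

Computing the attention in both directions yields one contextualized representation per document, which is the symmetric structure that the term "mutual" suggests.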

URL

http://arxiv.org/abs/1902.11054

PDF

http://arxiv.org/pdf/1902.11054
Segmentation of Roots in Soil with U-Net

2019-02-28

Abraham George Smith, Jens Petersen, Raghavendra Selvan, Camilla Ruø Rasmussen

arXiv_CV

arXiv_CV Segmentation Face CNN Relation
Abstract

Plant root research can provide a way to attain stress-tolerant crops that produce greater yield in a diverse array of conditions. Phenotyping roots in soil is often challenging due to the roots being difficult to access and the use of time consuming manual methods. Rhizotrons allow visual inspection of root growth through transparent surfaces. Agronomists currently manually label photographs of roots obtained from rhizotrons using a line-intersect method to obtain root length density and rooting depth measurements which are essential for their experiments. We investigate the effectiveness of an automated image segmentation method based on the U-Net Convolutional Neural Network (CNN) architecture to enable such measurements. We design a data-set of 50 annotated Chicory (Cichorium intybus L.) root images which we use to train, validate and test the system and compare against a baseline built using the Frangi vesselness filter. We obtain metrics using manual annotations and line-intersect counts. Our results on the held out data show our proposed automated segmentation system to be a viable solution for detecting and quantifying roots. We validate our system using 867 images for which we have obtained line-intersect counts, attaining a Spearman rank correlation of 0.9748 and an $r^2$ of 0.9217. We also achieve an $F_1$ of 0.7 when comparing the automated segmentation to the manual annotations, with our automated segmentation system producing segmentations with higher quality than the manual annotations for large portions of the image.
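
The $F_1$ figure quoted above compares a predicted segmentation with a manual annotation. A minimal pixel-wise version of that metric is sketched below, assuming both masks are binary and of equal size. The exact evaluation protocol of the paper is not given in the abstract, and the tiny masks are made up.

```python
def f1_score(predicted, annotated):
    """Pixel-wise F1 between two equally sized binary masks (lists of rows)."""
    tp = fp = fn = 0
    for p_row, a_row in zip(predicted, annotated):
        for p, a in zip(p_row, a_row):
            if p and a:
                tp += 1          # root pixel found by both
            elif p and not a:
                fp += 1          # predicted root, annotated soil
            elif a and not p:
                fn += 1          # annotated root, missed by prediction
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


predicted = [[1, 1], [0, 0]]
annotated = [[1, 0], [1, 0]]
print(f1_score(predicted, annotated))
```

Note that a score computed this way treats the manual annotation as ground truth, so it penalizes the prediction even where the annotation itself is the less accurate of the two, which is the situation the abstract describes.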

URL

http://arxiv.org/abs/1902.11050

PDF

http://arxiv.org/pdf/1902.11050
Evaluating Rewards for Question Generation Models

2019-02-28

Tom Hosking, Sebastian Riedel

arXiv_CL

arXiv_CL Reinforcement_Learning Prediction
Abstract

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation. Models are trained using teacher forcing to optimise only the one-step-ahead prediction. However, at test time, the model is asked to generate a whole sequence, causing errors to propagate through the generation process (exposure bias). A number of authors have proposed countering this bias by optimising for a reward that is less tightly coupled to the training data, using reinforcement learning. We optimise directly for quality metrics, including a novel approach using a discriminator learned directly from the training data. We confirm that policy gradient methods can be used to decouple training from the ground truth, leading to increases in the metrics used as rewards. We perform a human evaluation, and show that although these metrics have previously been assumed to be good proxies for question quality, they are poorly aligned with human judgement and the model simply learns to exploit the weaknesses of the reward source.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.11049

PDF

http://arxiv.org/pdf/1902.11049
Read All
GCNv2: Efficient Correspondence Prediction for Real-Time SLAM

2019-02-28

Jiexiong Tang, Ludvig Ericson, John Folkesson, Patric Jensfelt

arXiv_CV

arXiv_CV Drone Deep_Learning Prediction SLAM
Abstract

In this paper, we present a deep learning-based network, GCNv2, for generation of keypoints and descriptors. GCNv2 is built on our previous method, GCN, a network trained for 3D projective geometry. GCNv2 is designed to produce a binary descriptor vector, like the ORB feature, so that it can easily replace ORB in systems such as ORB-SLAM. GCNv2 significantly improves the computational efficiency over GCN, which was only able to run on desktop hardware. We show how a modified version of ORB-SLAM using GCNv2 features runs on a Jetson TX2, an embedded low-power platform. Experimental results show that GCNv2 retains almost the same accuracy as GCN and that it is robust enough to use for control of a flying drone.
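
Binary descriptors of the ORB kind are typically compared with the Hamming distance, which is what makes a drop-in replacement possible. The sketch below shows brute-force nearest-neighbour matching under that distance. The 8-bit descriptors and the function names are illustrative assumptions (ORB descriptors are 256 bits long), and this is not code from GCNv2 or ORB-SLAM.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match(query, train):
    """Brute-force matching: for each query descriptor, find the train
    descriptor with the smallest Hamming distance.

    Returns a list of (query_index, train_index, distance) tuples.
    """
    matches = []
    for qi, q in enumerate(query):
        ti = min(range(len(train)), key=lambda i: hamming(q, train[i]))
        matches.append((qi, ti, hamming(q, train[ti])))
    return matches


# Hypothetical 8-bit descriptors from two frames.
query = [0b10110010, 0b00001111]
train = [0b00001110, 0b10110011, 0b11111111]
result = match(query, train)
print(result)
```

The distance reduces to an XOR followed by a bit count, which is cheap on low-power hardware.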

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.11046

PDF

http://arxiv.org/pdf/1902.11046
Read All
Unsupervised Abnormality Detection through Mixed Structure Regularization in Deep Sparse Autoencoders

2019-02-28

Moti Freiman, Ravindra Manjeshwar, Liran Goshen

arXiv_CV

arXiv_CV Regularization Sparse Detection
Abstract

Deep sparse auto-encoders with mixed structure regularization (MSR), in addition to an explicit sparsity regularization term and stochastic corruption of the input data with Gaussian noise, have the potential to improve unsupervised abnormality detection. Unsupervised abnormality detection based on identifying outliers using deep sparse auto-encoders is a very appealing approach for medical computer-aided detection systems, as it requires only healthy data for training rather than expert-annotated abnormalities. In the task of detecting coronary artery disease from Coronary Computed Tomography Angiography (CCTA), our results suggest that MSR has the potential to improve overall performance by 20-30% compared to deep sparse and denoising auto-encoders.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.11036

PDF

http://arxiv.org/pdf/1902.11036
Read All
Multispectral snapshot demosaicing via non-convex matrix completion

2019-02-28

Giancarlo A. Antonucci, Simon Vary, David Humphreys, Robert A. Lamb, Jonathan Piper, Jared Tanner

arXiv_CV

arXiv_CV Sparse
Abstract

Snapshot mosaic multispectral imagery acquires an undersampled data cube by acquiring a single spectral measurement per spatial pixel. Sensors which acquire $p$ frequencies, therefore, suffer from severe $1/p$ undersampling of the full data cube. We show that the missing entries can be accurately imputed using non-convex techniques from sparse approximation and matrix completion initialised with traditional demosaicing algorithms. In particular, we observe the peak signal-to-noise ratio can typically be improved by 2 to 5 dB over current state-of-the-art methods when simulating a $p=16$ mosaic sensor measuring both high and low altitude urban and rural scenes as well as ground-based scenes.
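
The $1/p$ undersampling described above can be made concrete with a small sketch that builds the sampling mask for one band of a $p=16$ sensor. The periodic 4x4 filter layout, the band numbering, and the function names are assumptions chosen for illustration; the abstract only states that each pixel records a single spectral measurement.

```python
def mosaic_band_index(row, col, side=4):
    """Band recorded at pixel (row, col), assuming a periodic side x side
    filter tile (an assumed layout, giving p = side * side bands)."""
    return (row % side) * side + (col % side)


def sampling_mask(height, width, band, side=4):
    """Binary mask: 1 where the given band is measured, 0 where it is missing."""
    return [[1 if mosaic_band_index(r, c, side) == band else 0
             for c in range(width)]
            for r in range(height)]


height, width, side = 8, 8, 4
p = side * side  # 16 bands

mask = sampling_mask(height, width, band=5, side=side)
observed = sum(map(sum, mask))
fraction = observed / (height * width)
print(observed, fraction)
```

Each band is observed at only one pixel per tile, so the remaining entries of the data cube are the ones a completion method has to impute.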

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.11032

PDF

http://arxiv.org/pdf/1902.11032
Read All
Enhancing the Robustness of Deep Neural Networks by Boundary Conditional GAN

2019-02-28

Ke Sun, Zhanxing Zhu, Zhouchen Lin

arXiv_CV

arXiv_CV Adversarial GAN Classification Quantitative
Abstract

Deep neural networks have been widely deployed in various machine learning tasks. However, recent works have demonstrated that they are vulnerable to adversarial examples: carefully crafted small perturbations that cause the network to misclassify. In this work, we propose a novel defense mechanism called Boundary Conditional GAN to enhance the robustness of deep neural networks against adversarial examples. Boundary Conditional GAN, a modified version of Conditional GAN, can generate boundary samples with true labels near the decision boundary of a pre-trained classifier. These boundary samples are fed to the pre-trained classifier as data augmentation to make the decision boundary more robust. We empirically show that the model improved by our approach consistently and successfully defends against various types of adversarial attacks. Further quantitative investigations of the improvement in robustness and visualizations of decision boundaries are also provided to justify the effectiveness of our strategy. This new defense mechanism, which uses boundary samples to enhance the robustness of networks, opens up a new way to consistently defend against adversarial attacks.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1902.11029

PDF

https://arxiv.org/pdf/1902.11029
Read All
Towards Multi-pose Guided Virtual Try-on Network

2019-02-28

Haoye Dong, Xiaodan Liang, Bochao Wang, Hanjiang Lai, Jia Zhu, Jian Yin

arXiv_CV

arXiv_CV Adversarial GAN Quantitative
Abstract

A virtual try-on system under arbitrary human poses has huge application potential, yet raises many challenges, e.g. self-occlusions, heavy misalignment among diverse poses, and diverse clothes textures. Existing methods that aim to fit new clothes onto a person can only transfer clothes under a fixed human pose, and still show unsatisfactory performance: they often fail to preserve the identity, lose texture details, and decrease the diversity of poses. In this paper, we make the first attempt towards a multi-pose guided virtual try-on system, which enables transferring clothes onto a person image under diverse poses. Given an input person image, a desired clothes image, and a desired pose, the proposed Multi-pose Guided Virtual Try-on Network (MG-VTON) can generate a new person image after fitting the desired clothes into the input image and manipulating human poses. Our MG-VTON is constructed in three stages: 1) a desired human parsing map of the target image is synthesized to match both the desired pose and the desired clothes shape; 2) a deep Warping Generative Adversarial Network (Warp-GAN) warps the desired clothes appearance into the synthesized human parsing map and alleviates the misalignment problem between the input human pose and desired human pose; 3) a refinement render utilizing multi-pose composition masks recovers the texture details of clothes and removes some artifacts. Extensive experiments on well-known datasets and our newly collected largest virtual try-on benchmark demonstrate that our MG-VTON significantly outperforms all state-of-the-art methods both qualitatively and quantitatively, with promising multi-pose virtual try-on performance.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.11026

PDF

http://arxiv.org/pdf/1902.11026
Read All

