Named entity recognition (NER) is among the SLU tasks that extract semantic information from textual documents. Until now, NER from speech has been performed through a pipeline process: first applying automatic speech recognition (ASR) to the audio, then applying NER to the ASR outputs. Such an approach has several disadvantages (error propagation, ASR tuning metrics that are sub-optimal with respect to the final task, a reduced search space at the ASR output level…), and it is known that more integrated approaches outperform sequential ones when they can be applied. In this paper, we present a first study of an end-to-end approach that directly extracts named entities from speech through a single neural architecture. In this way, ASR and NER can be jointly optimized. Experiments are carried out on easily accessible French data drawn from several evaluation campaigns. Experimental results show that this end-to-end approach provides better results (F-measure = 0.69 on test data) for detecting named entity categories than a classical pipeline approach (F-measure = 0.65).
https://arxiv.org/abs/1805.12045
Despite impressive progress in high-resource settings, Neural Machine Translation (NMT) still struggles in low-resource and out-of-domain scenarios, often failing to match the quality of phrase-based translation. We propose a novel technique that combines back-translation and multilingual NMT to improve performance in these difficult cases. Our technique trains a single model for both directions of a language pair, allowing us to back-translate source or target monolingual data without requiring an auxiliary model. We then continue training on the augmented parallel data, enabling a cycle of improvement for a single model that can incorporate any source, target, or parallel data to improve both translation directions. As a byproduct, these models can reduce training and deployment costs significantly compared to uni-directional models. Extensive experiments show that our technique outperforms standard back-translation in low-resource scenarios, improves quality on cross-domain tasks, and effectively reduces costs across the board.
https://arxiv.org/abs/1805.11213
In BUNDLE: Real-Time Multi-Threaded Scheduling to Reduce Cache Contention, Tessler and Fisher propose a scheduling mechanism and combined worst-case execution time calculation method that treats the instruction cache as a beneficial resource shared between threads. Object analysis produces a worst-case execution time bound and separates code segments into regions. Threads are dynamically placed in bundles associated with regions at run time by the BUNDLE scheduling algorithm, where they benefit from shared cache values. In the evaluation of the previous work, tasks were created with a predetermined worst-case execution time path through the control flow graph. A priori knowledge of the worst-case path is an impractical restriction on any analysis. At the time, the only other solution available was an all-paths search of the graph, which is an equally impractical approach due to its complexity. The primary focus of this work is to build upon BUNDLE, expanding its applicability beyond a proof of concept. We present a complete worst-case execution time calculation method that includes thread-level context switch costs and operates on real programs with representative architecture parameters, and we compare our results to those produced by Heptane's state-of-the-art method. To these ends, we propose a modification to the BUNDLE scheduling algorithm called BUNDLEP. Bundles are assigned priorities that enforce an ordered flow of threads through the control flow graph, avoiding the need for multiple all-paths searches through the graph. In many cases, our evaluation shows a run-time and analytical benefit for BUNDLEP compared to serialized thread execution and state-of-the-art WCET analysis.
https://arxiv.org/abs/1805.12041
Flexible functional split in the Cloud Radio Access Network (CRAN) greatly alleviates fronthaul capacity and latency challenges. In such an architecture, part of the baseband processing is done locally and the rest is done remotely in the central cloud. At the same time, Energy Harvesting (EH) technologies are increasingly adopted due to their sustainability and economic advantages. Power consumption due to baseband processing has a huge share in the total power consumption breakdown of smaller base stations. Given that such base stations are powered by EH, energy availability, in addition to QoS constraints, conditions the decision of where to place each baseband function in the system. This work focuses on determining the performance bounds of an optimal placement of the baseband functional split option in virtualized small cells that are solely powered by EH. The work applies Dynamic Programming (DP); in particular, Shortest Path search is used to determine the optimal functional split option considering traffic QoS requirements and the available energy budget.
https://arxiv.org/abs/1805.12015
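A toy sketch of the shortest-path DP idea described above, under heavy assumptions: three hypothetical split options with made-up per-slot power draws and QoS penalties, a discretized battery as the DP state, and harvested energy given per slot. Only the structure (states, energy-feasibility pruning, backtracking) reflects the approach; none of the numbers come from the paper.

```python
import math

# Hypothetical per-slot figures, purely for illustration: local power draw of
# three split options (more centralization -> less local processing) and the
# QoS penalty incurred by pushing more functions to the central cloud.
power = {"local": 3, "mid": 2, "central": 1}          # W per slot (assumed)
penalty = {"local": 0.0, "mid": 0.5, "central": 1.2}  # QoS cost (assumed)

def best_split_schedule(harvest, battery0=5, cap=10):
    """Shortest-path DP over (slot, battery) states: pick a split option per
    slot minimizing total QoS penalty within the harvested-energy budget."""
    dist = {(0, battery0): 0.0}
    prev = {}
    for t, h in enumerate(harvest):
        for (slot, b), cost in list(dist.items()):
            if slot != t:
                continue
            for opt, p in power.items():
                b2 = b + h - p
                if b2 < 0:
                    continue                      # option not energy-feasible
                b2 = min(cap, b2)
                if cost + penalty[opt] < dist.get((t + 1, b2), math.inf):
                    dist[(t + 1, b2)] = cost + penalty[opt]
                    prev[(t + 1, b2)] = ((t, b), opt)
    T = len(harvest)
    state = min((s for s in dist if s[0] == T), key=dist.get)
    schedule = []
    while state in prev:                          # walk back to recover choices
        state, opt = prev[state]
        schedule.append(opt)
    return schedule[::-1]

print(best_split_schedule([1, 0, 2, 1]))   # e.g. ['local', 'mid', 'mid', 'mid']
```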
What is an effective expression that draws laughter from human beings? In the present paper, in order to consider this question from an academic standpoint, we generate image captions that draw a "laugh" by a computer. We construct a system that outputs funny captions based on the image captioning approach proposed in the computer vision field. Moreover, we propose the Funny Score, which flexibly assigns weights according to an evaluation database; the Funny Score brings out "laughter" more effectively when optimizing a model. In addition, we build a self-collected BoketeDB, which contains themes (images) and funny captions (text) posted on "Bokete", an image Ogiri website. In an experiment, we use BoketeDB to verify the effectiveness of the proposed method by comparing its results with those obtained using MS COCO pre-trained CNN+LSTM as the baseline and with funny captions created by humans. We refer to the proposed method, which uses the BoketeDB pre-trained model, as the Neural Joking Machine (NJM).
https://arxiv.org/abs/1805.11850
Recent object detectors find instances while categorizing candidate regions in an input image. As each region is evaluated independently, the number of candidate regions from a detector is usually larger than the number of objects. Since the final goal of detection is to assign a single detection to each object, an additional algorithm, such as non-maximum suppression (NMS), is used to select a single bounding box for each object. While simple heuristic algorithms such as NMS are effective for stand-alone objects, they can fail to detect overlapped objects. In this paper, we address this issue by training a network to distinguish different objects while localizing and categorizing them. We propose an instance-aware detection network (IDNet), which can learn to extract features from candidate regions and measure their similarities. Based on pairwise similarities and detection qualities, IDNet selects an optimal subset of candidate bounding boxes using determinantal point processes (DPPs). Extensive experiments demonstrate that the proposed algorithm performs favorably compared to existing state-of-the-art detection methods, particularly for overlapped objects, on the PASCAL VOC and MS COCO datasets.
https://arxiv.org/abs/1805.10765
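As a point of reference for the heuristic the abstract above contrasts with its DPP-based selection, here is a minimal numpy sketch of greedy NMS; the box format and threshold are illustrative assumptions. The overlapped-object failure mode is visible in the code: a genuine second object whose box has high IoU with a kept box is suppressed.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = np.argsort(scores)[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes.
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes overlapping the kept box too strongly -- this is exactly
        # where a second, genuinely distinct object can be lost.
        order = order[1:][iou <= iou_thresh]
    return keep
```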
Recent breakthroughs in recurrent deep neural networks with long short-term memory (LSTM) units have led to major advances in artificial intelligence. State-of-the-art LSTM models with significantly increased complexity and a large number of parameters, however, have a bottleneck in computing power resulting from limited memory capacity and data communication bandwidth. Here we demonstrate experimentally that LSTM can be implemented with a memristor crossbar, which has a small circuit footprint to store a large number of parameters and in-memory computing capability that circumvents the 'von Neumann bottleneck'. We illustrate the capability of our system by solving real-world problems in regression and classification, which shows that memristor LSTM is a promising low-power and low-latency hardware platform for edge inference.
https://arxiv.org/abs/1805.11801
This work explores the application of deep learning, a machine learning technique that uses deep neural networks (DNN) at its core, to an automated theorem proving (ATP) problem. To this end, we construct a statistical model that quantifies the likelihood that a proof is indeed a correct proof of a given proposition. Based on this model, we give a proof-synthesis procedure that searches for a proof in order of likelihood. This procedure uses an estimator of the likelihood of an inference rule being applied at each step of a proof. As an implementation of the estimator, we propose a proposition-to-proof architecture, a DNN tailored to the automated proof synthesis problem. To empirically demonstrate its usefulness, we apply our model to synthesize proofs in propositional logic. We train the proposition-to-proof model on a training dataset of proposition-proof pairs. The evaluation against a benchmark set shows very high accuracy and an improvement over recent work on neural proof synthesis.
https://arxiv.org/abs/1805.11799
In this thesis, we investigate the problem of efficient data detection in large MIMO and high order MU-MIMO systems. First, near-optimal low-complexity detection algorithms are proposed for regular MIMO systems. Then, a family of low-complexity hard-output and soft-output detection schemes based on channel matrix puncturing targeted for large MIMO systems is proposed. The performance of these schemes is characterized and analyzed mathematically, and bounds on capacity, diversity gain, and probability of bit error are derived. After that, efficient high order MU-MIMO detectors are proposed, based on joint modulation classification and subspace detection, where the modulation type of the interferer is estimated, while multiple decoupled streams are individually detected. Hardware architectures are designed for the proposed algorithms, and the promised gains are verified via simulations. Finally, we map the studied search-based detection schemes to low-resolution precoding at the transmitter side in massive MIMO and report the performance-complexity tradeoffs.
https://arxiv.org/abs/1805.11514
Thin InN and GaN/InN films were grown on oxygen-polar (O) (000-1) and zinc-polar (Zn) (0001) zinc oxide (ZnO) by plasma-assisted molecular beam epitaxy (PAMBE). The influence of the growth rate (GR) and the substrate polarity on the growth mode and the surface morphology of InN and GaN/InN was investigated in situ by reflection high-energy electron diffraction (RHEED) and ex situ by atomic force microscopy (AFM). During InN deposition, a transition from two dimensional to three dimensional (2D-3D) growth mode is observed in RHEED. The critical thickness for relaxation increases with decreasing GR and varies from 0.6 ML (GR: 1.0 ML/s) to 1.2 MLs (GR: 0.2 ML/s) on O-ZnO and from 1.2 MLs (GR: 0.5 ML/s) to 1.7 MLs (GR: 0.2 ML/s) on Zn-ZnO. The critical thickness for relaxation of GaN on top of 1.2 MLs and 1.5 MLs thick InN is close to zero on O-ZnO and 1.6 MLs on Zn-ZnO, respectively.
https://arxiv.org/abs/1805.11495
Intense volatility in financial markets affects humans worldwide; therefore, relatively accurate prediction of volatility is critical. We suggest that massive data sources resulting from human interaction with the Internet may offer a new perspective on the behavior of market participants in periods of large market movements. We first select 28 keywords related to finance as indicators of public mood and macroeconomic factors. The daily search volumes of these 28 keywords, based on the Baidu index, are then collected manually for the period from June 1, 2006 to October 29, 2017. We apply a Long Short-Term Memory neural network to forecast CSI300 volatility using these search volume data. Compared to the benchmark GARCH model, our forecast is more accurate, which demonstrates the effectiveness of the LSTM neural network for volatility forecasting.
https://arxiv.org/abs/1805.11954
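A minimal PyTorch sketch of the forecasting setup described above, with assumed shapes (30-day windows over the 28 search-volume series, next-day CSI300 volatility as the target) and random stand-in tensors in place of the real data; the window length, hidden size, and training details are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class VolatilityLSTM(nn.Module):
    """LSTM regressor: a window of 28 search-volume features -> next-day volatility."""
    def __init__(self, n_features=28, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, window, 28)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # last hidden state -> volatility

model = VolatilityLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(16, 30, 28)                # stand-in for real Baidu-index windows
y = torch.rand(16, 1)                      # stand-in for realized volatility
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```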
Technical analysis is used to discover investment opportunities. To test this hypothesis we propose a hybrid system using machine learning techniques together with genetic algorithms. Under technical analysis, there are more ways to represent a currency exchange time series than can be tested computationally, i.e., it is unfeasible to search the whole input feature space, so a genetic algorithm is an alternative. In this work, an architecture for automatic feature selection is proposed that optimizes the cross-validated performance estimate of a Naive Bayes model using a genetic algorithm. The proposed architecture improves the return on investment of the unoptimized system from 0.43% to 10.29% on the validation set. The selected features and the model decision boundary are visualized using the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm.
https://arxiv.org/abs/1805.11232
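The following is a minimal sketch of the wrapper approach described above, not the paper's exact operators: a genetic algorithm over boolean feature masks whose fitness is the cross-validated score of a scikit-learn Naive Bayes model on the selected subset. Population size, mutation rate, and the tournament/crossover scheme are assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated score of Naive Bayes restricted to the selected features."""
    if not mask.any():
        return -np.inf
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=5).mean()

def ga_select(X, y, pop=20, gens=30, p_mut=0.05):
    n = X.shape[1]
    population = rng.random((pop, n)) < 0.5        # random boolean feature masks
    for _ in range(gens):
        scores = np.array([fitness(m, X, y) for m in population])
        # Tournament selection: keep the better of two random individuals.
        idx = rng.integers(0, pop, (pop, 2))
        parents = population[np.where(scores[idx[:, 0]] > scores[idx[:, 1]],
                                      idx[:, 0], idx[:, 1])]
        # One-point crossover followed by bit-flip mutation.
        cut = rng.integers(1, n, pop)
        children = np.array([np.concatenate([parents[i, :cut[i]],
                                             parents[(i + 1) % pop, cut[i]:]])
                             for i in range(pop)])
        children ^= rng.random((pop, n)) < p_mut
        population = children
    scores = np.array([fitness(m, X, y) for m in population])
    return population[scores.argmax()]

# mask = ga_select(X, y)   # X: (n_samples, n_features), y: class labels
```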
Machine translation systems require semantic knowledge and grammatical understanding. Neural machine translation (NMT) systems often assume this information is captured by an attention mechanism and a decoder that ensures fluency. Recent work has shown that incorporating explicit syntax alleviates the burden of modeling both types of knowledge. However, requiring parses is expensive and does not explore the question of what syntax a model needs during translation. To address both of these issues we introduce a model that simultaneously translates while inducing dependency trees. In this way, we leverage the benefits of structure while investigating what syntax NMT must induce to maximize performance. We show that our dependency trees are (1) language-pair dependent and (2) improve translation quality.
https://arxiv.org/abs/1805.10850
We propose a temporally coherent generative model addressing the super-resolution problem for fluid flows. Our work represents a first approach to synthesize four-dimensional physics fields with neural networks. Based on a conditional generative adversarial network that is designed for the inference of three-dimensional volumetric data, our model generates consistent and detailed results by using a novel temporal discriminator, in addition to the commonly used spatial one. Our experiments show that the generator is able to infer more realistic high-resolution details by using additional physical quantities, such as low-resolution velocities or vorticities. Besides improvements in the training process and in the generated outputs, these inputs offer means for artistic control as well. We additionally employ a physics-aware data augmentation step, which is crucial to avoid overfitting and to reduce memory requirements. In this way, our network learns to generate advected quantities with highly detailed, realistic, and temporally coherent features. Our method works instantaneously, using only a single time-step of low-resolution fluid data. We demonstrate the abilities of our method using a variety of complex inputs and applications in two and three dimensions.
https://arxiv.org/abs/1801.09710
OpenNMT is an open-source toolkit for neural machine translation (NMT). The system prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques. OpenNMT has been used in several production MT systems, modified for numerous research papers, and is implemented across several deep learning frameworks.
https://arxiv.org/abs/1805.11462
We present an improved search for binary compact-object mergers using a network of ground-based gravitational-wave detectors. We model a volumetric, isotropic source population and incorporate the resulting distribution over signal amplitude, time delay, and coalescence phase into the ranking of candidate events. We describe an improved modeling of the background distribution, and demonstrate incorporating a prior model of the binary mass distribution in the ranking of candidate events. We find a $\sim 10\%$ and $\sim 20\%$ increase in detection volume for simulated binary neutron star and neutron star–black hole binary systems, respectively, corresponding to a reduction of the false alarm rates assigned to signals by between one and two orders of magnitude.
https://arxiv.org/abs/1705.01513
Recent work has shown that state-of-the-art models are highly vulnerable to adversarial perturbations of the input. We propose cowboy, an approach to detecting and defending against adversarial attacks by using both the discriminator and generator of a GAN trained on the same dataset. We show that the discriminator consistently scores the adversarial samples lower than the real samples across multiple attacks and datasets. We provide empirical evidence that adversarial samples lie outside of the data manifold learned by the GAN. Based on this, we propose a cleaning method which uses both the discriminator and generator of the GAN to project the samples back onto the data manifold. This cleaning procedure is independent of the classifier and type of attack and thus can be deployed in existing systems.
https://arxiv.org/abs/1805.10652
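A sketch of the two steps described above, with assumed interfaces for the trained generator G and discriminator D and an assumed weighting between the reconstruction and discriminator terms: detection thresholds the discriminator score, and cleaning searches the latent space for a nearby on-manifold sample. Treat the hyperparameters as illustrative, not the paper's.

```python
import torch

def detect_adversarial(D, x, tau):
    """Flag inputs the discriminator scores below a threshold calibrated on
    clean data -- the detection signal described in the abstract above."""
    return D(x).squeeze() < tau

def clean(G, D, x, z_dim=100, steps=200, lam=0.1, lr=0.05):
    """Project a flagged sample back onto the GAN manifold: find z whose
    generated image is close to x and also scores well under D."""
    z = torch.randn(x.shape[0], z_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        x_hat = G(z)
        loss = ((x_hat - x) ** 2).mean() - lam * D(x_hat).mean()
        loss.backward()
        opt.step()
    return G(z).detach()    # cleaned sample, fed to the unchanged classifier
```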
We present the DeepScores dataset with the goal of advancing the state of the art in small object recognition, and of placing the question of object recognition in the context of scene understanding. DeepScores contains high-quality images of musical scores, partitioned into 300,000 sheets of written music containing symbols of different shapes and sizes. With close to a hundred million small objects, our dataset is not only unique, but also the largest public dataset. DeepScores comes with ground truth for object classification, detection, and semantic segmentation. DeepScores thus poses a relevant challenge for computer vision in general, beyond the scope of optical music recognition (OMR) research. We present a detailed statistical analysis of the dataset, comparing it with other computer vision datasets such as Caltech101/256, PASCAL VOC, SUN, SVHN, ImageNet, MS-COCO, smaller computer vision datasets, and other OMR datasets. Finally, we provide baseline performances for object classification and give pointers to future research based on this dataset.
https://arxiv.org/abs/1804.00525
This paper describes a hypernym discovery system for our participation in SemEval-2018 Task 9, which aims to discover the best (set of) candidate hypernyms for input concepts or entities, given the search space of a pre-defined vocabulary. We introduce a neural network architecture for this task and empirically study various neural network models to build representations in latent space for words and phrases. The evaluated models include the convolutional neural network, long short-term memory network, gated recurrent unit, and recurrent convolutional neural network. We also explore different embedding methods, including word embeddings and sense embeddings, for better performance.
https://arxiv.org/abs/1805.10465
Visual Question Answering (VQA) models should have both high robustness and high accuracy. Unfortunately, most current VQA research focuses only on accuracy, because proper methods to measure the robustness of VQA models have been lacking. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and outputs ranked basic questions, with similarity scores, for the given main question. The second module takes the main question, the image, and these basic questions as input and outputs a text-based answer to the main question about the given image. We claim that a robust VQA model is one whose performance does not change much when related basic questions are also made available to it as input. We formulate basic question generation as a LASSO optimization problem, and also propose a large-scale Basic Question Dataset (BQD) and Rscore (a novel robustness measure) for analyzing the robustness of VQA models. We hope our BQD will be used as a benchmark to evaluate the robustness of VQA models, so as to help the community build more robust and accurate VQA models.
https://arxiv.org/abs/1709.04625
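One plausible reading of the LASSO formulation mentioned above (the abstract does not spell out the exact objective, so treat this as an assumption): express the main question's embedding as a sparse non-negative combination of basic-question embeddings and rank basic questions by their weights.

```python
import numpy as np
from sklearn.linear_model import Lasso

def rank_basic_questions(main_q_vec, basic_q_vecs, alpha=0.1):
    """Solve min ||b - A w||^2 + alpha * ||w||_1 with w >= 0, where the columns
    of A are basic-question embeddings and b is the main-question embedding;
    nonzero weights identify (and rank) the most related basic questions."""
    A = np.stack(basic_q_vecs, axis=1)           # shape (d, n_basic)
    w = Lasso(alpha=alpha, positive=True).fit(A, main_q_vec).coef_
    return np.argsort(w)[::-1], w                # indices sorted by weight
```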
Distance metric learning (DML) has been studied extensively in the past decades owing to its superior performance with distance-based algorithms. Most existing methods propose to learn a distance metric with pairwise or triplet constraints. However, the number of constraints is quadratic or even cubic in the number of original examples, which makes it challenging for DML to handle large-scale data sets. Besides, real-world data may contain various kinds of uncertainty, especially image data. This uncertainty can mislead the learning procedure and cause performance degradation. By investigating image data, we find that the original data can be observed from a small set of clean latent examples with different distortions. In this work, we propose the margin preserving metric learning framework to learn the distance metric and latent examples simultaneously. By leveraging the ideal properties of latent examples, training efficiency can be improved significantly, while the learned metric also becomes robust to uncertainty in the original data. Furthermore, we show that although the metric is learned from latent examples only, it preserves the large margin property even for the original data. The empirical study on benchmark image data sets demonstrates the efficacy and efficiency of the proposed method.
http://arxiv.org/abs/1805.10384
Despite advances in deep learning, neural networks can only learn multiple tasks when trained on them jointly. When tasks arrive sequentially, they lose performance on previously learnt tasks. This phenomenon called catastrophic forgetting is a fundamental challenge to overcome before neural networks can learn continually from incoming data. In this work, we derive inspiration from human memory to develop an architecture capable of learning continuously from sequentially incoming tasks, while averting catastrophic forgetting. Specifically, our contributions are: (i) a dual memory architecture emulating the complementary learning systems (hippocampus and the neocortex) in the human brain, (ii) memory consolidation via generative replay of past experiences, (iii) demonstrating advantages of generative replay and dual memories via experiments, and (iv) improved performance retention on challenging tasks even for low capacity models. Our architecture displays many characteristics of the mammalian memory and provides insights on the connection between sleep and learning.
https://arxiv.org/abs/1710.10368
In the past few years, consumer review sites have become the main target of deceptive opinion spam, where fictitious opinions or reviews are deliberately written to sound authentic. Most existing work on detecting deceptive reviews focuses on building supervised classifiers based on syntactic and lexical patterns of an opinion. Given the successful use of neural networks in various classification applications, in this paper we propose FakeGAN, a system that for the first time augments and adopts Generative Adversarial Networks (GANs) for a text classification task, in particular detecting deceptive reviews. Unlike standard GAN models, which have a single generator and discriminator, FakeGAN uses two discriminator models and one generative model. The generator is modeled as a stochastic policy agent in reinforcement learning (RL), and the discriminators use a Monte Carlo search algorithm to estimate and pass the intermediate action-value as the RL reward to the generator. Providing the generator model with two discriminator models avoids the mode collapse issue by learning from both the truthful and deceptive review distributions. Indeed, our experiments show that using two discriminators gives FakeGAN high stability, which is a known issue for GAN architectures. While FakeGAN is built upon a semi-supervised classifier, known for lower accuracy, our evaluation results on a dataset of TripAdvisor hotel reviews show the same accuracy as state-of-the-art approaches that apply supervised machine learning. These results indicate that GANs can be effective for text classification tasks; specifically, FakeGAN is effective at detecting deceptive reviews.
https://arxiv.org/abs/1805.10364
Neural Machine Translation (NMT) systems rely on large amounts of parallel data, which is a major challenge for low-resource languages. Building on recent work on unsupervised and semi-supervised methods, we present an approach that combines zero-shot and dual learning. The latter relies on reinforcement learning to exploit the duality of the machine translation task, and requires only monolingual data for the target language pair. Experiments show that a zero-shot dual system, trained on English-French and English-Spanish, outperforms a standard NMT system by large margins in zero-shot translation performance on Spanish-French (both directions). The zero-shot dual method approaches the performance of a comparable supervised setting to within 2.2 BLEU points. Our method also obtains improvements in the setting where a small amount of parallel data for the zero-shot language pair is available. When we add Russian to extend our experiments to jointly modeling 6 zero-shot translation directions, all directions improve by between 4 and 15 BLEU points, again reaching performance near that of the supervised setting.
https://arxiv.org/abs/1805.10338
We present a simple and powerful algorithm for parallel black box optimization called Successive Halving and Classification (SHAC). The algorithm operates in $K$ stages of parallel function evaluations and trains a cascade of binary classifiers to iteratively cull the undesirable regions of the search space. SHAC is easy to implement, requires no tuning of its own configuration parameters, is invariant to the scale of the objective function and can be built using any choice of binary classifier. We adopt tree-based classifiers within SHAC and achieve competitive performance against several strong baselines for optimizing synthetic functions, hyperparameters and architectures.
https://arxiv.org/abs/1805.10255
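Since the abstract above specifies the algorithm quite precisely, a compact sketch is easy to give. The following assumes minimization over a hypercube, a fixed batch per stage, and scikit-learn decision trees as the binary classifiers (matching the paper's tree-based choice); all other details are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def shac(f, dim, stages=5, batch=100, seed=0):
    """Successive Halving and Classification: each stage trains a classifier
    that culls the worse half of the surviving search space."""
    rng = np.random.default_rng(seed)
    cascade = []

    def sample():
        # Rejection-sample points that every classifier so far accepts.
        while True:
            x = rng.uniform(-1, 1, dim)
            if all(c.predict(x[None])[0] == 1 for c in cascade):
                return x

    best_x, best_y = None, np.inf
    for _ in range(stages):
        X = np.array([sample() for _ in range(batch)])
        y = np.array([f(x) for x in X])              # parallelizable evaluations
        if y.min() < best_y:
            best_x, best_y = X[y.argmin()], y.min()
        labels = (y <= np.median(y)).astype(int)     # better half -> class 1
        cascade.append(DecisionTreeClassifier(max_depth=4).fit(X, labels))
    return best_x, best_y

# e.g. minimize a synthetic quadratic:
x, val = shac(lambda x: np.sum((x - 0.3) ** 2), dim=5)
```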
Recently, IBM, Google, and Intel showcased quantum computers ranging from 49 to 72 qubits. While these systems represent a significant milestone in the advancement of quantum computing, existing and near-term quantum computers are not yet large enough to fully support quantum error correction. Such systems, with a few tens to a few hundreds of qubits, are termed Noisy Intermediate Scale Quantum (NISQ) computers, and they can provide benefits for a class of quantum algorithms. In this paper, we study the problems of Qubit Allocation (mapping program qubits to machine qubits) and Qubit Movement (routing qubits from one location to another to perform entanglement). We observe that there exists variation in the error rates of different qubits and links, which can have an impact on the decisions for qubit movement and qubit allocation. We analyze characterization data for the IBM-Q20 quantum computer gathered over 52 days to understand and quantify the variation in error rates, and find that there is indeed significant variability in the error rates of the qubits and the links connecting them. We define reliability metrics for NISQ computers and show that device variability has a substantial impact on overall system reliability. To exploit the variability in error rates, we propose Variation-Aware Qubit Movement (VQM) and Variation-Aware Qubit Allocation (VQA), policies that optimize the movement and allocation of qubits to avoid the weaker qubits and links and guide more operations towards the stronger qubits and links. We show that our variation-aware policies improve the reliability of the NISQ system by up to 2.5x.
https://arxiv.org/abs/1805.10224
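A minimal sketch of the variation-aware movement idea: if each link's error rate is known from characterization, weighting edges by the negative log of the success probability turns "most reliable route" into a shortest-path problem. The coupling graph and error rates below are made up for illustration, and this is our reading of the policy rather than the paper's exact implementation.

```python
import math
import networkx as nx

def most_reliable_route(coupling_edges, error_rate, src, dst):
    """Variation-aware qubit movement: weight each link by -log of its success
    probability so that a shortest path maximizes end-to-end reliability."""
    G = nx.Graph()
    for (a, b) in coupling_edges:
        G.add_edge(a, b, weight=-math.log(1.0 - error_rate[(a, b)]))
    return nx.dijkstra_path(G, src, dst, weight="weight")

# Hypothetical 4-qubit ring with one noisy link:
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
rates = {(0, 1): 0.02, (1, 2): 0.15, (2, 3): 0.02, (0, 3): 0.03}
print(most_reliable_route(edges, rates, 0, 2))   # [0, 3, 2]: routes around the weak link
```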
Although sequence-to-sequence neural machine translation (NMT) models have achieved state-of-the-art performance in recent years, it is a widespread concern that recurrent neural network (RNN) units have difficulty capturing long-distance state information: RNNs can hardly extract features with long-term dependencies as the sequence becomes longer. Similarly, convolutional neural networks (CNNs) have recently been introduced into NMT for speed; however, CNNs focus on capturing local features of the sequence. To address this issue, we incorporate a relation network into the standard encoder-decoder framework to enhance information propagation in the neural network, ensuring that the information of the source sentence can flow into the decoder adequately. Experiments show that the proposed framework outperforms a statistical MT model and the state-of-the-art NMT model significantly on two data sets of different scales.
https://arxiv.org/abs/1709.01766
Although neural machine translation (NMT) with the encoder-decoder framework has achieved great success in recent times, it still suffers from some drawbacks: RNNs tend to forget old information, which is often useful, and the encoder only operates on words without considering word relationships. To solve these problems, we introduce relation networks (RNs) into NMT to refine the encoding representations of the source. In our method, the RN first augments the representation of each source word with its neighbors and reasons about all possible pairwise relations between them. Then the source representations and all the relations are fed to the attention module and the decoder together, keeping the main encoder-decoder architecture unchanged. Experiments on two Chinese-to-English data sets of different scales both show that our method can outperform competitive baselines significantly.
https://arxiv.org/abs/1709.03980
An analysis of different techniques for recognizing and detecting objects under extreme scale variation is presented. Scale-specific and scale-invariant detector designs are compared by training them with different configurations of input data. By evaluating the performance of different network architectures for classifying small objects on ImageNet, we show that CNNs are not robust to changes in scale. Based on this analysis, we propose to train and test detectors on the same scales of an image pyramid. Since small and large objects are difficult to recognize at smaller and larger scales respectively, we present a novel training scheme called Scale Normalization for Image Pyramids (SNIP), which selectively back-propagates the gradients of object instances of different sizes as a function of the image scale. On the COCO dataset, our single model performance is 45.7% and an ensemble of 3 networks obtains an mAP of 48.3%. We use off-the-shelf ImageNet-1000 pre-trained models and only train with bounding box supervision. Our submission won the Best Student Entry in the COCO 2017 challenge. Code will be made available at \url{this http URL}.
https://arxiv.org/abs/1711.08189
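A sketch of the gradient-selection rule at the heart of SNIP, with an assumed per-scale valid-range table (the paper's actual ranges differ): instances whose resized extent falls outside the range for the current pyramid scale are excluded from the loss rather than penalized.

```python
def snip_valid(boxes, scale, ranges):
    """Keep gradients only for instances whose resized size falls inside the
    valid range for this pyramid scale; the rest are ignored, not penalized."""
    lo, hi = ranges[scale]                  # e.g. assumed {0.5: (120, 1e9), ...}
    valid = []
    for (x1, y1, x2, y2) in boxes:
        side = max(x2 - x1, y2 - y1) * scale
        valid.append(lo <= side <= hi)
    return valid

# Training then multiplies each instance's loss by its validity mask, so very
# small objects only contribute gradients at large scales and vice versa.
```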
Neural machine translation (NMT) has a drawback in that it can generate only high-frequency words, owing to the computational cost of the softmax function in the output layer. In Japanese-English NMT, Japanese predicate conjugation causes an increase in vocabulary size; for example, one verb can have as many as 19 surface varieties. In this research, we focus on predicate conjugation for compressing the vocabulary size in Japanese, since the vocabulary list is otherwise filled with the various conjugated forms of verbs. We propose methods that use predicate conjugation information without discarding linguistic information; the proposed methods can generate low-frequency words and deal with unknown words. Two methods are considered for introducing conjugation information: the first treats it as a token (conjugation token) and the second treats it as an embedded vector (conjugation feature). The results demonstrate that the vocabulary size can be compressed by approximately 86.1% (Tanaka corpus) and that the NMT models can output words not in the training data set. Furthermore, BLEU scores improved by 0.91 points in Japanese-to-English translation and by 0.32 points in English-to-Japanese translation with ASPEC.
https://arxiv.org/abs/1805.10047
Neural Machine Translation (NMT) has drawn much attention recently due to its promising translation performance. However, several studies indicate that NMT often generates fluent but unfaithful translations. In this paper, we propose a method to alleviate this problem by using a phrase table as recommendation memory. The main idea is to add a bonus to words worthy of recommendation, so that NMT can make correct predictions. Specifically, we first build a prefix tree to accommodate all the candidate target phrases obtained by searching the phrase translation table according to the source sentence. Then, we construct a recommendation word set by matching the candidate target phrases against the target words previously translated by NMT. After that, we determine the specific bonus value for each recommendable word using the attention vector and phrase translation probability. Finally, we integrate this bonus value into NMT to improve the translation results. Extensive experiments demonstrate that the proposed methods obtain remarkable improvements over strong attention-based NMT.
https://arxiv.org/abs/1805.09960
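A minimal sketch of the prefix-tree step described above: candidate target phrases from the phrase table are stored in a trie, and at each decoding step the words that extend the already-translated suffix toward a stored phrase form the recommendation set. The phrase examples and the exact matching policy are assumptions for illustration.

```python
class PhraseTrie:
    """Prefix tree over candidate target phrases (sequences of words)."""
    def __init__(self):
        self.children, self.is_phrase_end = {}, False

    def insert(self, phrase):
        node = self
        for word in phrase:
            node = node.children.setdefault(word, PhraseTrie())
        node.is_phrase_end = True

    def next_words(self, prefix):
        """Words that extend `prefix` toward some stored phrase -- the
        candidates that would receive a recommendation bonus next step."""
        node = self
        for word in prefix:
            if word not in node.children:
                return set()
            node = node.children[word]
        return set(node.children)

trie = PhraseTrie()
for phrase in [("the", "european", "union"), ("the", "united", "nations")]:
    trie.insert(phrase)
print(trie.next_words(("the",)))   # {'european', 'united'}
```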
In this paper, we introduce the Fairness GAN, an approach for generating a dataset that is plausibly similar to a given multimedia dataset, but is more fair with respect to protected attributes in allocative decision making. We propose a novel auxiliary classifier GAN that strives for demographic parity or equality of opportunity and show empirical results on several datasets, including the CelebFaces Attributes (CelebA) dataset, the Quick, Draw! dataset, and a dataset of soccer player images and the offenses they were called for. The proposed formulation is well-suited to absorbing unlabeled data; we leverage this to augment the soccer dataset with the much larger CelebA dataset. The methodology tends to improve demographic parity and equality of opportunity while generating plausible images.
https://arxiv.org/abs/1805.09910
We demonstrate the fast training and decoding speed of RETURNN for attention models in translation, due to fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives over 1% absolute BLEU improvement and allows training deeper recurrent encoder networks. Promising preliminary results on maximum expected BLEU training are presented. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition, and show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop for experimenting with alternative architectures, and its generality allows it to be used on a wide range of applications.
https://arxiv.org/abs/1805.05225
Though quite challenging, leveraging large-scale unlabeled or partially labeled images in a cost-effective way has increasingly attracted interest owing to its great importance to computer vision. To tackle this problem, many Active Learning (AL) methods have been developed. However, these methods mainly define their sample selection criteria within a single image context, leading to suboptimal robustness and an impractical solution for large-scale object detection. In this paper, aiming to remedy the drawbacks of existing AL methods, we present a principled Self-supervised Sample Mining (SSM) process accounting for the real challenges in object detection. Specifically, our SSM process concentrates on automatically discovering and pseudo-labeling reliable region proposals to enhance the object detector via the introduced cross-image validation, i.e., pasting these proposals into different labeled images to comprehensively measure their values under different image contexts. By resorting to the SSM process, we propose a new AL framework for gradually incorporating unlabeled or partially labeled data into model learning while minimizing the annotation effort of users. Extensive experiments on two public benchmarks clearly demonstrate that our proposed framework can achieve performance comparable to state-of-the-art methods with significantly fewer annotations.
https://arxiv.org/abs/1803.09867
We introduce Primal-Dual Wasserstein GAN, a new learning algorithm for building latent variable models of the data distribution based on the primal and the dual formulations of the optimal transport (OT) problem. We utilize the primal formulation to learn a flexible inference mechanism and to create an optimal approximate coupling between the data distribution and the generative model. In order to learn the generative model, we use the dual formulation and train the decoder adversarially through a critic network that is regularized by the approximate coupling obtained from the primal. Unlike previous methods that violate various properties of the optimal critic, we regularize the norm and the direction of the gradients of the critic function. Our model shares many of the desirable properties of auto-encoding models in terms of mode coverage and latent structure, while avoiding their undesirable averaging properties, e.g. their inability to capture sharp visual features when modeling real images. We compare our algorithm with several other generative modeling techniques that utilize Wasserstein distances on Frechet Inception Distance (FID) and Inception Scores (IS).
https://arxiv.org/abs/1805.09575
Detection of small objects in large swaths of imagery is one of the primary problems in satellite imagery analytics. While object detection in ground-based imagery has benefited from research into new deep learning approaches, transitioning such technology to overhead imagery is nontrivial. Among the challenges is the sheer number of pixels and geographic extent per image: a single DigitalGlobe satellite image encompasses >64 km2 and over 250 million pixels. Another challenge is that objects of interest are minuscule (often only ~10 pixels in extent), which complicates traditional computer vision techniques. To address these issues, we propose a pipeline (You Only Look Twice, or YOLT) that evaluates satellite images of arbitrary size at a rate of >0.5 km2/s. The proposed approach can rapidly detect objects of vastly different scales with relatively little training data over multiple sensors. We evaluate large test images at native resolution, and yield scores of F1 > 0.8 for vehicle localization. We further explore resolution and object size requirements by systematically testing the pipeline at decreasing resolution, and conclude that objects only ~5 pixels in size can still be localized with high confidence. Code is available at this https URL.
https://arxiv.org/abs/1805.09512
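A sketch of the tiling step implied above (window size and overlap are assumptions, not the paper's exact values): an arbitrarily large image is cut into overlapping detector-sized windows, and per-window detections are later shifted back to global coordinates and merged with non-maximum suppression.

```python
def tile_windows(width, height, win=416, overlap=0.15):
    """Yield (x1, y1, x2, y2) windows covering a large image with overlap, so
    objects cut by one window boundary appear whole in a neighboring window."""
    stride = int(win * (1 - overlap))
    xs = list(range(0, max(width - win, 0) + 1, stride))
    ys = list(range(0, max(height - win, 0) + 1, stride))
    # Make sure the right and bottom edges are fully covered.
    if xs[-1] != max(width - win, 0):
        xs.append(max(width - win, 0))
    if ys[-1] != max(height - win, 0):
        ys.append(max(height - win, 0))
    for y0 in ys:
        for x0 in xs:
            yield x0, y0, x0 + win, y0 + win

windows = list(tile_windows(16000, 16000))   # ~2000 windows for a large scene
```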
Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
https://arxiv.org/abs/1804.06786
During the last years, there has been a lot of interest in achieving some kind of complex reasoning using deep neural networks. To do so, models like Memory Networks (MemNNs) have combined external memory storage and attention mechanisms. These architectures, however, lack more complex reasoning mechanisms that could allow, for instance, relational reasoning. Relation Networks (RNs), on the other hand, have shown outstanding results in relational reasoning tasks. Unfortunately, their computational cost grows quadratically with the number of memories, which is prohibitive for larger problems. To solve these issues, we introduce the Working Memory Network, a MemNN architecture with a novel working memory storage and reasoning module. Our model retains the relational reasoning abilities of the RN while reducing its computational complexity from quadratic to linear. We tested our model on the text QA dataset bAbI and the visual QA dataset NLVR. In the jointly trained bAbI-10k, we set a new state of the art, achieving a mean error of less than 0.5%. Moreover, a simple ensemble of two of our models solves all 20 tasks in the joint version of the benchmark.
https://arxiv.org/abs/1805.09354
We present direct experimental evidence of Anderson localization induced by the intrinsic alloy compositional disorder of InGaN/GaN quantum wells. Our approach relies on the measurement of the luminescence spectrum under local injection of electrons from a scanning tunneling microscope tip into a near-surface single quantum well. Fluctuations in the emission line shape are observed on a few-nanometer scale. Narrow emission peaks characteristic of single localized states are resolved. Calculations in the framework of the localization landscape theory provide the effective confining potential map stemming from composition fluctuations. This theory explains well the observed nanometer-scale carrier localization and the energies of these Anderson-type localized states. The energy spreading of the emission from localized states is consistent with the usually observed very broad photo- or electro-luminescence spectra of InGaN/GaN quantum well structures.
https://arxiv.org/abs/1805.09030
Image captioning is a challenging task that combines the fields of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural network (RNN) or long short-term memory (LSTM) based models dominate this field. However, RNNs and LSTMs cannot be computed in parallel and ignore the underlying hierarchical structure of a sentence. In this paper, we propose a framework that employs only convolutional neural networks (CNNs) to generate captions. Owing to parallel computing, our basic model is around 3 times faster than NIC (an LSTM-based model) during training, while also providing better results. We conduct extensive experiments on MSCOCO and investigate the influence of the model's width and depth. Compared with LSTM-based models that apply similar attention mechanisms, our proposed models achieve comparable BLEU-1,2,3,4 and METEOR scores and higher CIDEr scores. We also test our model on the paragraph annotation dataset and obtain a higher CIDEr score than hierarchical LSTMs.
https://arxiv.org/abs/1805.09019
GANs are powerful generative models that are able to model the manifold of natural images. We leverage this property to perform manifold regularization by approximating the Laplacian norm using a Monte Carlo approximation that is easily computed with the GAN. When incorporated into the feature-matching GAN of Improved GAN, we achieve state-of-the-art results for GAN-based semi-supervised learning on the CIFAR-10 dataset, with a method that is significantly easier to implement than competing methods.
https://arxiv.org/abs/1805.08957
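One common Monte Carlo approximation of such a manifold penalty, sketched below in PyTorch, is a stochastic finite difference along the generator manifold; this is not necessarily the paper's exact estimator, and the batch size, step size, and classifier/generator interfaces are assumptions.

```python
import torch

def manifold_reg(classifier, generator, batch=64, z_dim=100, eps=1e-2):
    """Monte Carlo estimate of a Laplacian-norm penalty: penalize how much the
    classifier's output moves under small steps along the GAN manifold."""
    z = torch.randn(batch, z_dim)
    delta = torch.randn(batch, z_dim)
    delta = eps * delta / delta.norm(dim=1, keepdim=True)   # small random step
    out = classifier(generator(z))
    out_shifted = classifier(generator(z + delta))
    return ((out - out_shifted) ** 2).sum(dim=1).mean() / eps ** 2

# total_loss = supervised_loss + unsup_gan_loss + lam * manifold_reg(f, g)
```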
Conventional approaches to image registration consist of time-consuming iterative methods. Most current deep learning (DL) based registration methods extract deep features to use in an iterative setting. We propose an end-to-end DL method for registering multimodal images. Our approach uses generative adversarial networks (GANs), which eliminate the need for time-consuming iterative methods and directly generate the registered image together with the deformation field. Appropriate constraints in the GAN cost function produce accurately registered images in less than a second. Experiments demonstrate its accuracy for multimodal retinal and cardiac MR image registration.
https://arxiv.org/abs/1805.02369
We propose to focus on the problem of discovering neural network architectures efficient in terms of both prediction quality and cost. For instance, our approach is able to solve the following tasks: learn a neural network able to predict well in less than 100 milliseconds or learn an efficient model that fits in a 50 Mb memory. Our contribution is a novel family of models called Budgeted Super Networks (BSN). They are learned using gradient descent techniques applied on a budgeted learning objective function which integrates a maximum authorized cost, while making no assumption on the nature of this cost. We present a set of experiments on computer vision problems and analyze the ability of our technique to deal with three different costs: the computation cost, the memory consumption cost and a distributed computation cost. We particularly show that our model can discover neural network architectures that have a better accuracy than the ResNet and Convolutional Neural Fabrics architectures on CIFAR-10 and CIFAR-100, at a lower cost.
https://arxiv.org/abs/1706.00046
This paper presents a cost-effective scene perception system aimed at visually impaired individuals. We use an Odroid system integrated with a USB camera and a USB laser that can be attached to the chest. The system classifies the detected objects, reports their distance from the user, and provides voice output. Experimental results provided in this paper use outdoor traffic scenes. The object detection and classification framework exploits a multi-modal-fusion-based Faster RCNN using motion, sharpening, and blurring filters for efficient feature representation.
https://arxiv.org/abs/1805.08798
We present a statistical mechanics model of deep feed-forward neural networks (FFN). Our energy-based approach naturally explains several known results and heuristics, providing a solid theoretical framework and new instruments for a systematic development of FFN. We infer that FFN can be understood as performing three basic steps: encoding, representation validation, and propagation. We obtain a set of natural activations – such as sigmoid, $\tanh$ and ReLU – together with a state-of-the-art one, recently obtained by Ramachandran et al. (arXiv:1710.05941) using an extensive search algorithm. We term this activation ESP (Expected Signal Propagation), explain its probabilistic meaning, and study the eigenvalue spectrum of the associated Hessian on classification tasks. We find that ESP allows for faster training and more consistent performance over a wide range of network architectures.
https://arxiv.org/abs/1805.08786
Computer vision algorithms with pixel-wise labeling tasks, such as semantic segmentation and salient object detection, have gone through a significant accuracy increase with the incorporation of deep learning. Deep segmentation methods slightly modify and fine-tune pre-trained networks that have hundreds of millions of parameters. In this work, we question the need to have such memory demanding networks for the specific task of salient object segmentation. To this end, we propose a way to learn a memory-efficient network from scratch by training it only on salient object detection datasets. Our method encodes images to gridized superpixels that preserve both the object boundaries and the connectivity rules of regular pixels. This representation allows us to use convolutional neural networks that operate on regular grids. By using these encoded images, we train a memory-efficient network using only 0.048% of the number of parameters that other deep salient object detection networks have. Our method shows comparable accuracy with the state-of-the-art deep salient object detection methods and provides a faster and a much more memory-efficient alternative to them. Due to its easy deployment, such a network is preferable for applications in memory limited devices such as mobile phones and IoT devices.
https://arxiv.org/abs/1712.09558
Generative Adversarial Networks (GANs) are one of the most practical methods for learning data distributions. A popular GAN formulation is based on the use of Wasserstein distance as a metric between probability distributions. Unfortunately, minimizing the Wasserstein distance between the data distribution and the generative model distribution is a computationally challenging problem, as its objective is non-convex, non-smooth, and even hard to compute. In this work, we show that obtaining gradient information of the smoothed Wasserstein GAN formulation, which is based on regularized Optimal Transport (OT), is computationally effortless, and hence one can apply first-order optimization methods to minimize this objective. Consequently, we establish a theoretical convergence guarantee to stationarity for a proposed class of GAN optimization algorithms. Unlike the original non-smooth formulation, our algorithm only requires solving the discriminator to approximate optimality. We apply our method to learning MNIST digits as well as CIFAR-10 images. Our experiments show that our method is computationally efficient and generates images comparable to state-of-the-art algorithms given the same architecture and computational power.
https://arxiv.org/abs/1802.08249
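For reference, here is the classic Sinkhorn iteration that solves the entropy-regularized OT problem underlying such smoothed Wasserstein objectives; this is a generic sketch with toy marginals, not the paper's training loop.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=200):
    """Entropy-regularized OT between marginals a and b under cost matrix C:
    alternating scaling updates yield the smoothed Wasserstein coupling."""
    K = np.exp(-C / eps)                   # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]     # transport plan

# Hypothetical toy marginals and squared-distance cost:
x, y = np.linspace(0, 1, 5), np.linspace(0, 1, 5)
C = (x[:, None] - y[None, :]) ** 2
P = sinkhorn(np.full(5, 0.2), np.full(5, 0.2), C)
print(P.sum())                             # ~1: a valid coupling
```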
Answering visual questions requires acquiring everyday common knowledge and modeling the semantic connections among different parts of an image, which is too difficult for VQA systems to learn from images with only the supervision from answers. Meanwhile, image captioning systems with a beam search strategy tend to generate similar captions and fail to describe images diversely. To address these issues, we present a system in which the two tasks compensate for each other, jointly producing image captions and answering visual questions. In particular, we utilize question and image features to generate question-related captions, and use the generated captions as additional features to provide new knowledge to the VQA system. For image captioning, our system attains more informative results in terms of the relative improvements on VQA tasks, as well as competitive results on automated metrics. Applying our system to the VQA tasks, our results on the VQA v2 dataset achieve 65.8% using generated captions and 69.1% using annotated captions on the validation set, and 68.4% on the test-standard set. Further, an ensemble of 10 models achieves 69.7% on the test-standard split.
https://arxiv.org/abs/1805.08389
Despite the remarkable successes of generative adversarial networks (GANs) in many applications, theoretical understandings of their performance is still limited. In this paper, we present a simple shallow GAN model fed by high-dimensional input data. The dynamics of the training process of the proposed model can be exactly analyzed in the high-dimensional limit. In particular, by using the tool of scaling limits of stochastic processes, we show that the macroscopic quantities measuring the quality of the training process converge to a deterministic process that is characterized as the unique solution of a finite-dimensional ordinary differential equation (ODE). The proposed model is simple, but its training process already exhibits several different phases that can mimic the behaviors of more realistic GAN models used in practice. Specifically, depending on the choice of the learning rates, the training process can reach either a successful, a failed, or an oscillating phase. By studying the steady-state solutions of the limiting ODEs, we obtain a phase diagram that precisely characterizes the conditions under which each phase takes place. Although this work focuses on a simple GAN model, the analysis methods developed here might prove useful in the theoretical understanding of other variants of GANs with more advanced training algorithms.
https://arxiv.org/abs/1805.08349
Visual language grounding is widely studied in modern neural image captioning systems, which typically adopt an encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for language caption generation. To study the robustness of language grounding to adversarial perturbations in machine vision and perception, we propose Show-and-Fool, a novel algorithm for crafting adversarial examples in neural image captioning. The proposed algorithm provides two evaluation approaches, which check whether neural image captioning systems can be misled into outputting randomly chosen captions or keywords. Our extensive experiments show that our algorithm can successfully craft visually similar adversarial examples with randomly targeted captions or keywords, and that the adversarial examples can be made highly transferable to other image captioning systems. Consequently, our approach yields new robustness implications for neural image captioning and novel insights in visual language grounding.
https://arxiv.org/abs/1712.02051