We present a foveated object detector (FOD) as a biologically inspired alternative to the sliding window (SW) approach, the dominant search method in computer vision object detection. Like the human visual system, the FOD has higher resolution at the fovea and lower resolution in the visual periphery; consequently, more computational resources are allocated at the fovea and relatively fewer at the periphery. The FOD processes the entire scene, uses retino-specific object detection classifiers to guide eye movements, aligns its fovea with regions of interest in the input image, and integrates observations across multiple fixations. Our approach combines modern object detectors from computer vision with a recent model of the peripheral pooling regions found in the V1 layer of the human visual system. We assess various eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD performs on par with the SW detector while yielding significant computational savings.
https://arxiv.org/abs/1408.0814
Generative Adversarial Networks (GANs) were intuitively and attractively explained from the perspective of game theory, in which the two parties involved are a discriminator and a generator. In this game, the task of the discriminator is to discriminate between real and generated (i.e., fake) data, while the task of the generator is to generate fake data that maximally confuses the discriminator. In this paper, we propose a new viewpoint for GANs, which we term the minimizing general loss viewpoint. This viewpoint establishes a connection between the general loss of a classification problem with a convex loss function and an f-divergence between the true and fake data distributions. Mathematically, we propose a setting for the classification of true and fake data in which we can prove that the general loss of this classification problem is exactly the negative f-divergence for a certain convex function f. This allows us to interpret the problem of learning the generator to minimize the f-divergence between the true and fake data distributions as that of maximizing the general loss, which is equivalent to the min-max problem in GANs when the logistic loss is used in the classification problem. This viewpoint strengthens GANs in two ways. First, it allows us to employ any convex loss function for the discriminator. Second, it suggests that rather than limiting ourselves to NN-based discriminators, we can alternatively utilize other powerful families. Adopting this viewpoint, we then propose using the kernel-based family for discriminators. This family has two appealing features: i) a powerful capacity for classifying data of a non-linear nature, and ii) convexity in the feature space. Using the convexity of this family, we can further apply Fenchel duality to equivalently transform the max-min problem into the max-max dual problem.
https://arxiv.org/abs/1711.01744
We present an empirical study of active learning for Visual Question Answering, where a deep VQA model selects informative question-image pairs from a pool and queries an oracle for answers to maximally improve its performance under a limited query budget. Drawing analogies from human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a fast and effective goal-driven active learning scoring function to pick question-image pairs for deep VQA models under the Bayesian Neural Network framework. We find that deep VQA models need large amounts of training data before they can start asking informative questions. But once they do, all three approaches outperform the random selection baseline and achieve significant query savings. For the scenario where the model is allowed to ask generic questions about images but is evaluated only on specific questions (e.g., questions whose answer is either yes or no), our proposed goal-driven scoring function performs the best.
https://arxiv.org/abs/1711.01732
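As a rough illustration of the entropy ("cramming") criterion above under an approximate Bayesian treatment, the following sketch scores pool items by the predictive entropy of Monte Carlo dropout samples; the scoring details are our assumption, not the paper's exact goal-driven function.

```python
import numpy as np

def mc_dropout_entropy(predict_fn, pool, n_samples=20):
    """Score each pool item by predictive entropy, approximating a
    Bayesian posterior with stochastic (dropout-enabled) forward passes.

    predict_fn(batch) -> (batch, n_classes) softmax probabilities,
    with dropout left active at inference time."""
    probs = np.mean([predict_fn(pool) for _ in range(n_samples)], axis=0)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return entropy  # higher = more informative query

# Selection: query the oracle for the top-k most uncertain pairs, e.g.
#   scores = mc_dropout_entropy(model_predict, pool_features)
#   query_idx = np.argsort(-scores)[:k]
```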
A robust and fast automatic moving object detection and tracking system is essential to characterize target objects and extract spatial and temporal information for different functionalities, including video surveillance, urban traffic monitoring and navigation, and robotics. In this dissertation, I present a collaborative Spatial Pyramid Context-aware Tracking (SPCT) system for moving object detection and tracking. The proposed visual tracker is composed of one master tracker, which usually relies on visual object features, and two auxiliary trackers based on object temporal motion information, which are called dynamically to assist the master tracker. SPCT utilizes image spatial context at different levels to make the video tracking system resistant to occlusion and background noise, and to improve target localization accuracy and robustness. We chose seven pre-selected complementary feature channels, including RGB color, intensity, and a spatial pyramid of HoG, to encode object color, shape, and spatial layout information. We exploit the integral histogram as a building block to meet the demands of real-time performance. A novel fast algorithm is presented to accurately evaluate spatially weighted local histograms in constant time complexity using an extension of the integral histogram method. Different techniques are explored to efficiently compute the integral histogram on GPU architectures and are applied to fast spatio-temporal median computations and 3D face reconstruction texturing. We propose a multi-component framework based on semantic fusion of motion information with a projected building footprint map to significantly reduce the false alarm rate in urban scenes with many tall structures. Experiments on the extensive VOTC2016 benchmark dataset and on aerial video confirm that combining complementary tracking cues in an intelligent fusion framework enables persistent tracking for Full Motion Video and Wide Aerial Motion Imagery.
https://arxiv.org/abs/1711.01656
Internet of Things (IoT) is expected to enable a myriad of applications by interconnecting objects - such as sensors and robots - over the Internet. IoT applications range from healthcare to autonomous vehicles and include disaster management. Enabling these applications in cloud environments requires the design of appropriate IoT Infrastructure-as-a-Service (IoT IaaS) to ease the provisioning of IoT objects as cloud services. This paper discusses a case study on search and rescue IoT applications in large-scale disaster scenarios. It proposes an IoT IaaS architecture that virtualizes robots (IaaS for robots) and provides them to upstream applications as-a-Service. Both node-level and network-level robot virtualization are supported. The proposed architecture meets a set of identified requirements, such as the need for a unified description model for heterogeneous robots, a publication/discovery mechanism, and federation with other IaaS for robots when needed. A validating proof of concept is built, and experiments are conducted to evaluate its performance. Lessons learned and prospective research directions are discussed.
https://arxiv.org/abs/1710.04919
We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements. This novel architecture introduces the notion of edge-processing to provide flexibility and combines junction pipelining and operational parallelization to speed up training. The overall effect is to reduce network complexity by factors up to 30x and training time by up to 35x relative to GPUs, while maintaining high fidelity of inference results. This has the potential to enable extensive parameter searches and development of the largely unexplored theoretical foundation of DNNs. The architecture automatically adapts itself to different network sizes given available hardware resources. As proof of concept, we show results obtained for different bit widths.
https://arxiv.org/abs/1711.01343
Generative Adversarial Networks (GANs) are powerful models for learning complex distributions. Stable training of GANs has been addressed in many recent works which explore different metrics between distributions. In this paper we introduce Fisher GAN, which fits within the Integral Probability Metrics (IPM) framework for training GANs. Fisher GAN defines a critic with a data-dependent constraint on its second-order moments. We show that Fisher GAN allows for stable and time-efficient training that does not compromise the capacity of the critic and does not need data-independent constraints such as weight clipping. We analyze Fisher IPM theoretically and provide an algorithm based on the Augmented Lagrangian method for Fisher GAN. We validate our claims on both image sample generation and semi-supervised classification using Fisher GAN.
https://arxiv.org/abs/1705.09675
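For orientation, a schematic of the Fisher critic objective with its augmented Lagrangian, as we read the construction; the penalty weight, sign conventions, and multiplier update are assumptions of this sketch, not a verified transcription of the paper.

```python
import torch

def fisher_critic_objective(f_real, f_fake, lam, rho=1e-6):
    """Schematic Fisher IPM critic objective (a sketch, assuming the
    augmented-Lagrangian form: an IPM term plus a second-order moment
    constraint Omega = 0.5*E_P[f^2] + 0.5*E_Q[f^2] held at 1)."""
    ipm = f_real.mean() - f_fake.mean()
    omega = 0.5 * (f_real ** 2).mean() + 0.5 * (f_fake ** 2).mean()
    # The critic maximizes: IPM + lam*(1 - Omega) - rho/2*(1 - Omega)^2
    objective = ipm + lam * (1.0 - omega) - 0.5 * rho * (1.0 - omega) ** 2
    return objective, omega

# After each critic step, the multiplier would typically be updated as
#   lam <- lam - rho * (1 - omega.detach())
# (the sign convention here is an assumption of this sketch).
```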
Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and we propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.
https://arxiv.org/abs/1705.09783
Inspired by a child’s learning experience - being taught first, followed by observation and questioning - we investigate a critically supervised learning methodology for object detection in this work. Specifically, we propose a taught-observe-ask (TOA) method that consists of several novel components, such as negative object proposal, critical example mining, and machine-guided question-answer (QA) labeling. To consider labeling time and performance jointly, new evaluation methods are developed to compare the performance of the TOA method with that of fully and weakly supervised learning methods. Extensive experiments are conducted on the PASCAL VOC and Caltech benchmark datasets. The TOA method significantly improves on the performance of weak supervision while demanding only about 3-6% of the labeling time of full supervision. The effectiveness of each novel component is also analyzed.
https://arxiv.org/abs/1711.01043
While neural machine translation (NMT) has become the new paradigm, its parameter optimization requires large-scale parallel data, which is scarce in many domains and language pairs. In this paper, we address a new translation scenario in which there exist only monolingual corpora and phrase pairs. We propose a new method for translation with partially aligned sentence pairs, which are derived from the phrase pairs and monolingual corpora. To make full use of the partially aligned corpora, we adapt the conventional NMT training method in two aspects. On the one hand, different generation strategies are designed for aligned and unaligned target words. On the other hand, a different objective function is designed to model the partially aligned parts. The experiments demonstrate that our method achieves a relatively good result in such a translation scenario and that tiny bitexts can boost translation quality to a large extent.
https://arxiv.org/abs/1711.01006
There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training hard attention models can be difficult because of the discrete latent variables they introduce. Previous work used REINFORCE and Q-learning to approach these issues, but those methods can provide high-variance gradient estimates and be slow to train. In this paper, we tackle the problem of learning hard attention for a sequential task using variational inference methods, specifically the recently introduced VIMCO and NVIL. Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We demonstrate our method on a phoneme recognition task in clean and noisy environments and show that our method outperforms REINFORCE, with the difference being greater for a more complicated task.
https://arxiv.org/abs/1705.05524
We report results from the preliminary trials of Colibri, a dedicated fast-photometry array for the detection of small Kuiper belt objects through serendipitous stellar occultations. Colibri’s novel data processing pipeline analyzed 4000 star hours with two overlapping-field EMCCD cameras, detecting no Kuiper belt objects and one false positive occultation event in a high ecliptic latitude field. No occultations would be expected at these latitudes, allowing these results to provide a control sample for the upcoming main Colibri campaign. The empirical false positive rate found by the processing pipeline is consistent with the 0.002% simulation-determined false positive rate. We also describe Colibri’s software design, kernel sets for modeling stellar occultations, and method for retrieving occultation parameters from noisy diffraction curves. Colibri’s main campaign will begin in mid-2018, operating at a 40 Hz sampling rate.
https://arxiv.org/abs/1711.00358
Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using this cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of phrase-based SMT is limited by the phrase-based translation rule table. We propose a soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the forced decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.
https://arxiv.org/abs/1711.00309
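The reranking step itself is simple once the soft forced decoding cost is available; a hypothetical sketch (the interpolation weight and score convention are ours):

```python
def rerank_nbest(nbest, forced_decode_cost, weight=0.5):
    """Rerank NMT n-best outputs with a phrase-based decoding cost.

    nbest: list of (hypothesis, nmt_score), higher scores being better.
    forced_decode_cost(hyp) -> non-negative cost from the soft forced
    decoder (lower is better). The interpolation weight is a placeholder;
    in practice it would be tuned on held-out data."""
    rescored = [(hyp, nmt_score - weight * forced_decode_cost(hyp))
                for hyp, nmt_score in nbest]
    return max(rescored, key=lambda pair: pair[1])[0]
```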
We describe a novel architecture for semantic image retrieval—in particular, retrieval of instances of visual situations. Visual situations are concepts such as “a boxing match,” “walking the dog,” “a crowd waiting for a bus,” or “a game of ping-pong,” whose instantiations in images are linked more by their common spatial and semantic structure than by low-level visual similarity. Given a query situation description, our architecture—called Situate—learns models capturing the visual features of expected objects as well as the expected spatial configuration of relationships among objects. Given a new image, Situate uses these models in an attempt to ground (i.e., to create a bounding box locating) each expected component of the situation in the image via an active search procedure. Situate uses the resulting grounding to compute a score indicating the degree to which the new image is judged to contain an instance of the situation. Such scores can be used to rank images in a collection as part of a retrieval system. In the preliminary study described here, we demonstrate the promise of this system by comparing Situate’s performance with that of two baseline methods, as well as with a related semantic image-retrieval system based on “scene graphs.”
https://arxiv.org/abs/1711.00088
It is commonly agreed that the use of relevant invariances as a good statistical bias is important in machine learning. However, most approaches that explicitly incorporate invariances into a model architecture only make use of very simple transformations, such as translations and rotations. Hence, there is a need for methods to model and extract richer transformations that capture much higher-level invariances. To that end, we introduce a tool for parametrizing the set of filters of a trained convolutional neural network with the latent space of a generative adversarial network. We then show that the method can capture highly non-linear invariances of the data by visualizing their effect in the data space.
https://arxiv.org/abs/1710.11386
The recent progress in formation of two-dimensional (2D) GaN by a migration-enhanced encapsulated technique opens up new possibilities for group III-V 2D semiconductors with a band gap within the visible energy spectrum. Using first-principles calculations we explored alloying of 2D-GaN to achieve an optically active material with a tuneable band gap. The effect of isoelectronic III-V substitutional elements on the band gaps, band offsets, and spatial electron localization is studied. In addition to optoelectronic properties, the formability of alloys is evaluated using impurity formation energies. A dilute highly-mismatched solid solution 2D-GaN$_{1-x}$P$_x$ features an efficient band gap reduction in combination with a moderate energy penalty associated with incorporation of phosphorus in 2D-GaN, which is substantially lower than in the case of the bulk GaN. The group-V alloying elements also introduce significant disorder and localization at the valence band edge that facilitates direct band gap optical transitions, thus implying the feasibility of using III-V alloys of 2D-GaN in light-emitting devices.
https://arxiv.org/abs/1707.04625
Recurrent neural networks (RNNs) have been successfully applied to various natural language processing (NLP) tasks and achieved better results than conventional methods. However, the lack of understanding of the mechanisms behind their effectiveness limits further improvements on their architectures. In this paper, we present a visual analytics method for understanding and comparing RNN models for NLP tasks. We propose a technique to explain the function of individual hidden state units based on their expected response to input texts. We then co-cluster hidden state units and words based on the expected response and visualize co-clustering results as memory chips and word clouds to provide more structured knowledge on RNNs’ hidden states. We also propose a glyph-based sequence visualization based on aggregate information to analyze the behavior of an RNN’s hidden state at the sentence-level. The usability and effectiveness of our method are demonstrated through case studies and reviews from domain experts.
https://arxiv.org/abs/1710.10777
Deep learning models require extensive architecture design exploration and hyperparameter optimization to perform well on a given task. The exploration of the model design space is often made by a human expert, and optimized using a combination of grid search and search heuristics over a large space of possible choices. Neural Architecture Search (NAS) is a Reinforcement Learning approach that has been proposed to automate architecture design. NAS has been successfully applied to generate Neural Networks that rival the best human-designed architectures. However, NAS requires sampling, constructing, and training hundreds to thousands of models to achieve well-performing architectures. This procedure needs to be executed from scratch for each new task. The application of NAS to a wide set of tasks currently lacks a way to transfer generalizable knowledge across tasks. In this paper, we present the Multitask Neural Model Search (MNMS) controller. Our goal is to learn a generalizable framework that can condition model construction on successful model searches for previously seen tasks, thus significantly speeding up the search for new tasks. We demonstrate that MNMS can conduct an automated architecture search for multiple tasks simultaneously while still learning well-performing, specialized models for each task. We then show that pre-trained MNMS controllers can transfer learning to new tasks. By leveraging knowledge from previous searches, we find that pre-trained MNMS models start from a better location in the search space and reduce search time on unseen tasks, while still discovering models that outperform published human-designed models.
https://arxiv.org/abs/1710.10776
A deep region-based object detector consists of a region proposal step and a deep object recognition step. In this paper, we make significant improvements to both steps. For region proposal, we propose a novel lightweight cascade structure that effectively improves RPN proposal quality. For object recognition, we re-implement global context modeling with a few modifications and obtain a performance boost (4.2% mAP gain on the ILSVRC 2016 validation set). In addition, we apply the idea of pre-training extensively and show its importance in both steps. Together with common training and testing tricks, we improve the Faster R-CNN baseline by a large margin. In particular, we obtain 87.9% mAP on the PASCAL VOC 2012 test set, 65.3% on the ILSVRC 2016 test set, and 36.8% on the COCO test-std set.
https://arxiv.org/abs/1710.10749
We present Direct Assessment, a method for manually assessing the quality of automatically generated video captions. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Automatic metrics for comparing automatic video captions against a manual caption, such as BLEU and METEOR, drawn from techniques used in evaluating machine translation, were used in the TRECVid video captioning task in 2016, but these are shown to have weaknesses. The work presented here brings human assessment into the evaluation by crowdsourcing how well a caption describes a video. We automatically degrade the quality of some sample captions, which are assessed manually, and from this we are able to rate the quality of the human assessors, a factor we take into account in the evaluation. Using data from the TRECVid video-to-text task in 2016, we show that our direct assessment method is replicable and robust and should scale to settings where there are many caption-generation techniques to be evaluated.
https://arxiv.org/abs/1710.10586
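A common way to realize such crowdsourced assessment, sketched here as an assumption rather than the paper's exact pipeline, is to z-normalize raw scores per assessor before averaging per caption:

```python
from collections import defaultdict
import statistics

def standardize_scores(ratings):
    """ratings: list of (assessor_id, caption_id, raw_score).
    Z-normalize per assessor to remove individual scoring biases, then
    average per caption (a sketch of the usual Direct Assessment style
    aggregation, not the paper's exact procedure)."""
    by_assessor = defaultdict(list)
    for assessor, _, score in ratings:
        by_assessor[assessor].append(score)
    stats = {a: (statistics.mean(v), statistics.pstdev(v) or 1.0)
             for a, v in by_assessor.items()}
    by_caption = defaultdict(list)
    for assessor, caption, score in ratings:
        mu, sd = stats[assessor]
        by_caption[caption].append((score - mu) / sd)
    return {c: statistics.mean(z) for c, z in by_caption.items()}
```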
While the visualization of statistical data is a maturing technology, the visualization of textual data is still in its infancy, especially for artistic text. Because the visualization of artistic text is valuable and attractive in both art and information science, we attempt to realize this tentative idea in this article. We propose Generative Adversarial Network based Artistic Textual Visualization (GAN-ATV), which can create paintings after analyzing the semantic content of existing poems. Our GAN-ATV consists of two main sections: a natural language analysis section and a visual information synthesis section. In the natural language analysis section, we use Bag-of-Words (BoW) feature descriptors and a two-layer network to mine and analyze the high-level semantic information in poems. In the visual information synthesis section, we design a cross-modal semantic understanding module and integrate it with a Generative Adversarial Network (GAN) to create paintings whose content corresponds to the original poems. Moreover, in order to train our GAN-ATV and verify its performance, we establish a cross-modal artistic dataset named “Cross-Art”. In the Cross-Art dataset, there are six topics, each with its corresponding paintings and poems. Experimental results on the Cross-Art dataset are presented in this article.
https://arxiv.org/abs/1710.10553
Reward augmented maximum likelihood (RAML), a simple and effective learning framework to directly optimize towards the reward function in structured prediction tasks, has led to a number of impressive empirical successes. RAML incorporates task-specific reward by performing maximum-likelihood updates on candidate outputs sampled according to an exponentiated payoff distribution, which gives higher probabilities to candidates that are close to the reference output. While RAML is notable for its simplicity, efficiency, and its impressive empirical successes, the theoretical properties of RAML, especially the behavior of the exponentiated payoff distribution, have not been examined thoroughly. In this work, we introduce softmax Q-distribution estimation, a novel theoretical interpretation of RAML, which reveals the relation between RAML and Bayesian decision theory. The softmax Q-distribution can be regarded as a smooth approximation of the Bayes decision boundary, and the Bayes decision rule is achieved by decoding with this Q-distribution. We further show that RAML is equivalent to approximately estimating the softmax Q-distribution, with the temperature $\tau$ controlling the approximation error. We perform two experiments, one on synthetic data for multi-class classification and one on real data for image captioning, to demonstrate the relationship between RAML and the proposed softmax Q-distribution estimation method, verifying our theoretical analysis. Additional experiments on three structured prediction tasks with rewards defined on sequential (named entity recognition), tree-based (dependency parsing), and irregular (machine translation) structures show notable improvements over maximum likelihood baselines.
https://arxiv.org/abs/1705.07136
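For reference, the exponentiated payoff distribution and the RAML objective discussed above take the following standard form (notation reconstructed from the usual formulation, not copied from the paper):

```latex
% Exponentiated payoff distribution over candidate outputs y, given
% reference y* and reward r, with temperature tau:
q(y \mid y^{*}; \tau) \;=\;
    \frac{\exp\{ r(y, y^{*}) / \tau \}}
         {\sum_{y'} \exp\{ r(y', y^{*}) / \tau \}}
% RAML minimizes the expected negative log-likelihood under q:
\mathcal{L}_{\mathrm{RAML}}(\theta) \;=\;
    -\, \mathbb{E}_{y \sim q(\cdot \mid y^{*}; \tau)}
    \left[ \log p_{\theta}(y \mid x) \right]
```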
Since the creation of Generative Adversarial Networks (GANs), much work has been done to improve their training stability, their generated image quality, and their range of applications, but almost none of it has explored their self-training potential. Self-training was used before the advent of deep learning to allow training on limited labelled data and has shown impressive results in semi-supervised learning. In this work, we combine these two ideas and make GANs self-trainable for semi-supervised learning tasks by exploiting their infinite data generation potential. Results show that using even the simplest form of self-training yields an improvement. We also show results for a more complex self-training scheme that performs at least as well as the basic scheme but with significantly less data augmentation.
https://arxiv.org/abs/1710.10313
We present a new “learning-to-learn”-type approach that enables rapid learning of concepts from small-to-medium sized training sets and is primarily designed for web-initialized image retrieval. At the core of our approach is a deep architecture (a Set2Model network) that maps sets of examples to simple generative probabilistic models such as Gaussians or mixtures of Gaussians in the space of high-dimensional descriptors. The parameters of the embedding into the descriptor space are trained end-to-end in the meta-learning stage using a set of training learning problems. The main technical novelty of our approach is the derivation of the backprop process through the mixture model fitting, which makes the likelihood of the resulting models differentiable with respect to the positions of the input descriptors. While the meta-learning process for a Set2Model network is discriminative, a trained Set2Model network performs generative learning of generative models in the descriptor space, which facilitates learning in cases where no negative examples are available and whenever the concept being learned is polysemous or represented by noisy training sets. Among other experiments, we demonstrate that these properties allow Set2Model networks to pick visual concepts from the raw outputs of Internet image search engines better than a set of strong baselines.
https://arxiv.org/abs/1612.07697
We present a study of GaN single-nanowire ultraviolet photodetectors with an embedded GaN/AlN superlattice. The heterostructure dimensions and doping profile were designed in such a way that the application of positive or negative bias leads to an enhancement of the collection of photogenerated carriers from the GaN/AlN superlattice or from the GaN base, respectively, as confirmed by electron beam-induced current measurements. The devices display enhanced response in the ultraviolet A ($\approx$ 330-360 nm) / B ($\approx$ 280-330 nm) spectral windows under positive/negative bias. The result is explained by correlation of the photocurrent measurements with scanning transmission electron microscopy observations of the same single nanowire, and semi-classical simulations of the strain and band structure in one and three dimensions.
https://arxiv.org/abs/1712.01869
Erbium (Er) doped GaN has been studied extensively for optoelectronic applications, yet its defect physics is still not well understood. In this work, we report a first-principles hybrid density functional study of the structure, energetics, and thermodynamic transition levels of Er-related defect complexes in GaN. We discover for the first time that Er$_{\rm Ga}$-C$_{\rm N}$-$V_{\rm N}$, a defect complex of Er, a C impurity, and an N vacancy, and Er$_{\rm Ga}$-O$_{\rm N}$-$V_{\rm N}$, a complex of Er, an O impurity, and an N vacancy, form defect levels at 0.18 and 0.46 eV below the conduction band, respectively. Together with Er$_{\rm Ga}$-$V_{\rm N}$, a defect complex of Er and an N vacancy which has recently been found to produce a donor level at 0.61 eV, these defect complexes provide an explanation for the Er-related defect levels observed in experiments. The role of these defects in optical excitation of the luminescent Er center is also discussed.
https://arxiv.org/abs/1710.09886
We present experimental measurements of the thermal boundary conductance (TBC) from $77$ to $500$ K across isolated heteroepitaxially grown ZnO films on GaN substrates. These data provide an assessment of the assumptions that drive the phonon-gas-model-based diffuse mismatch model (DMM) and atomistic Green’s function (AGF) formalisms for predicting TBC. Our measurements, when compared to previous experimental data, suggest that the TBC can be influenced by long-wavelength, zone-center modes in a material on one side of the interface, as opposed to the “vibrational mismatch” concept assumed in the DMM; this disagreement is pronounced at high temperatures. At room temperature, we measure the ZnO/GaN TBC as $490\lbrack +150, -110\rbrack$ MW m$^{-2}$ K$^{-1}$. The disagreement between the DMM and AGF predictions and the experimental data at these elevated temperatures suggests a non-negligible contribution from additional modes contributing to TBC that are not accounted for in the fundamental assumptions of these harmonic formalisms, such as inelastic scattering. Given the high quality of these ZnO/GaN interfaces, these results provide an invaluable critical and quantitative assessment of the accuracy of assumptions in the current state of the art of computational approaches for predicting the phonon TBC across interfaces.
https://arxiv.org/abs/1710.09525
The performance of data-driven networks for tumor classification varies with the stain style of histopathological images. This article proposes a stain-style transfer (SST) model based on conditional generative adversarial networks (GANs), which learns not only a certain color distribution but also the corresponding histopathological pattern. Our model considers a feature-preserving loss in addition to the well-known GAN loss. Consequently, our model not only transfers initial stain styles to the desired one but also prevents degradation of the tumor classifier on transferred images. The model is examined using the CAMELYON16 dataset.
https://arxiv.org/abs/1710.08543
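A minimal sketch of a generator objective combining the two terms described above; the loss shapes, the feature extractor, and the weighting are placeholders, not the paper's exact formulation.

```python
import torch

def sst_generator_loss(d_fake_logits, feat_orig, feat_trans, alpha=1.0):
    """Sketch of an SST-style generator objective: a standard GAN term
    plus a feature-preserving term that keeps the tumor classifier's
    features stable under style transfer.

    d_fake_logits: discriminator logits on transferred images.
    feat_orig / feat_trans: classifier features of the original and the
    transferred image (the extractor and alpha are placeholders)."""
    gan_term = torch.nn.functional.softplus(-d_fake_logits).mean()
    feature_term = ((feat_orig - feat_trans) ** 2).mean()
    return gan_term + alpha * feature_term
```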
We present the first approach to automated audio captioning. We employ an encoder-decoder scheme with an alignment model in between. The input to the encoder is a sequence of log mel-band energies calculated from an audio file, while the output is a sequence of words, i.e. a caption. The encoder is a multi-layered, bi-directional gated recurrent unit (GRU) and the decoder a multi-layered GRU with a classification layer connected to the last GRU of the decoder. The classification layer and the alignment model are fully connected layers with shared weights between timesteps. The proposed method is evaluated using data drawn from a commercial sound effects library, ProSound Effects. The resulting captions were rated through metrics utilized in machine translation and image captioning fields. Results from metrics show that the proposed method can predict words appearing in the original caption, but not always correctly ordered.
https://arxiv.org/abs/1706.10006
Deep convolutional neural networks have led to breakthrough results in numerous practical machine learning tasks such as classification of images in the ImageNet data set, control-policy-learning to play Atari games or the board game Go, and image captioning. Many of these applications first perform feature extraction and then feed the results thereof into a trainable classifier. The mathematical analysis of deep convolutional neural networks for feature extraction was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on a wavelet transform followed by the modulus non-linearity in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. This paper complements Mallat’s results by developing a theory that encompasses general convolutional transforms, or in more technical parlance, general semi-discrete frames (including Weyl-Heisenberg filters, curvelets, shearlets, ridgelets, wavelets, and learned filters), general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), and general Lipschitz-continuous pooling operators emulating, e.g., sub-sampling and averaging. In addition, all of these elements can be different in different network layers. For the resulting feature extractor we prove a translation invariance result of vertical nature in the sense of the features becoming progressively more translation-invariant with increasing network depth, and we establish deformation sensitivity bounds that apply to signal classes such as, e.g., band-limited functions, cartoon functions, and Lipschitz functions.
https://arxiv.org/abs/1512.06293
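Schematically, the per-layer structure analyzed above can be summarized as follows (our simplified notation for the semi-discrete frame setting, not a transcription of the paper's definitions):

```latex
% Layer n: convolve with a frame atom g_{\lambda_n}, apply a Lipschitz
% non-linearity \rho_n, then a Lipschitz pooling operator P_n:
u_n \;=\; P_n\!\big( \rho_n\big( u_{n-1} * g_{\lambda_n} \big) \big),
\qquad u_0 = f .
% Mallat's scattering network is the special case where \rho_n is the
% modulus |\cdot| and the g_{\lambda_n} are wavelets.
```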
N-polar (In,Ga)N/GaN quantum wells prepared on freestanding GaN substrates by plasma-assisted molecular beam epitaxy at conventional growth temperatures of about 650 °C do not exhibit any detectable luminescence even at 10 K. In the present work, we investigate (In,Ga)N/GaN quantum wells grown on Ga- and N-polar GaN substrates at a constant temperature of 730 °C. This exceptionally high temperature results in a vanishing In incorporation for the Ga-polar sample. In contrast, quantum wells with an In content of 20% and abrupt interfaces are formed on N-polar GaN. Moreover, these quantum wells exhibit a spatially homogeneous green luminescence band up to room temperature, but the intensity of this band is observed to strongly quench with temperature. Temperature-dependent photoluminescence transients show that this thermal quenching is related to a high density of nonradiative Shockley-Read-Hall centers with large capture coefficients for electrons and holes.
https://arxiv.org/abs/1710.08351
This paper proposes a network architecture to perform variable length semantic video generation using captions. We adopt a new perspective towards video generation where we allow the captions to be combined with the long-term and short-term dependencies between video frames and thus generate a video in an incremental manner. Our experiments demonstrate our network architecture’s ability to distinguish between objects, actions and interactions in a video and combine them to generate videos for unseen captions. The network also exhibits the capability to perform spatio-temporal style transfer when asked to generate videos for a sequence of captions. We also show that the network’s ability to learn a latent representation allows it to generate videos in an unsupervised manner and perform other tasks such as action recognition. (Accepted in International Conference on Computer Vision (ICCV) 2017)
https://arxiv.org/abs/1708.05980
This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW). Sync-DRAW can also perform text-to-video generation which, to the best of our knowledge, makes it the first approach of its kind. It combines a Variational Autoencoder~(VAE) with a Recurrent Attention Mechanism in a novel manner to create a temporally dependent sequence of frames that are gradually formed over time. The recurrent attention mechanism in Sync-DRAW attends to each individual frame of the video in synchronization, while the VAE learns a latent distribution for the entire video at the global level. Our experiments with Bouncing MNIST, KTH and UCF-101 suggest that Sync-DRAW is efficient in learning the spatial and temporal information of the videos and generates frames with high structural integrity, and can generate videos from simple captions on these datasets. (Accepted as oral paper in ACM-Multimedia 2017)
https://arxiv.org/abs/1611.10314
Mobile visual search applications are emerging that enable users to sense their surroundings with smart phones. However, because of the particular challenges of mobile visual search, achieving a high recognition bitrate has become a consistent target of previous related work. In this paper, we propose a few-parameter, low-latency, and high-accuracy deep hashing approach for constructing binary hash codes for mobile visual search. First, we exploit the architecture of the MobileNet model, which significantly decreases the latency of deep feature extraction by reducing the number of model parameters while maintaining accuracy. Second, we add a hash-like layer into MobileNet to train the model on labeled mobile visual data. Evaluations show that the proposed system can exceed state-of-the-art accuracy performance in terms of the MAP. More importantly, the memory consumption is much less than that of other deep learning models. The proposed method requires only $13$ MB of memory for the neural network and achieves a MAP of $97.80\%$ on the mobile location recognition dataset used for testing.
https://arxiv.org/abs/1710.07750
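A minimal Keras sketch of the described design: a MobileNet backbone with a "hash-like" saturating layer inserted before the classifier. The bit count, class count, and input size here are placeholders, not the paper's values.

```python
import tensorflow as tf

# MobileNet backbone with global average pooling as the feature extractor.
base = tf.keras.applications.MobileNet(
    include_top=False, pooling="avg", weights="imagenet",
    input_shape=(224, 224, 3))
# Saturating tanh layer whose activations are later binarized into codes.
hash_layer = tf.keras.layers.Dense(48, activation="tanh",
                                   name="hash_like")(base.output)
logits = tf.keras.layers.Dense(100, activation="softmax")(hash_layer)
model = tf.keras.Model(base.input, [hash_layer, logits])

# Train with a classification loss on labeled mobile visual data, then
# binarize the saturated activations to obtain compact codes, e.g.:
#   codes = tf.where(model(images)[0] >= 0, 1, 0)
```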
Video captioning is in essence a complex natural process, which is affected by various uncertainties stemming from video content, subjective judgment, etc. In this paper, we build on recent progress in using the encoder-decoder framework for video captioning and address what we find to be a critical deficiency of existing methods: most of the decoders propagate deterministic hidden states. Such complex uncertainty cannot be modeled efficiently by deterministic models. Instead, we propose a generative approach, referred to as the multi-modal stochastic RNN network (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables. Therefore, MS-RNN can improve the performance of video captioning and generate multiple sentences to describe a video considering different random factors. Specifically, a multi-modal LSTM (M-LSTM) is first proposed to interact with both visual and textual features to capture a high-level representation. Then, a backward stochastic LSTM (S-LSTM) is proposed to support uncertainty propagation by introducing latent variables. Experimental results on the challenging MSVD and MSR-VTT datasets show that our proposed MS-RNN approach outperforms state-of-the-art video captioning methods.
https://arxiv.org/abs/1708.02478
The Wasserstein distance received a lot of attention recently in the community of machine learning, especially for its principled way of comparing distributions. It has found numerous applications in several hard problems, such as domain adaptation, dimensionality reduction or generative models. However, its use is still limited by a heavy computational cost. Our goal is to alleviate this problem by providing an approximation mechanism that allows us to break its inherent complexity. It relies on the search for an embedding where the Euclidean distance mimics the Wasserstein distance. We show that such an embedding can be found with a siamese architecture associated with a decoder network that allows us to move from the embedding space back to the original input space. Once this embedding has been found, computing optimization problems in the Wasserstein space (e.g. barycenters, principal directions or even archetypes) can be conducted extremely fast. Numerical experiments supporting this idea are conducted on image datasets, and show the wide potential benefits of our method.
https://arxiv.org/abs/1710.07457
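A hypothetical PyTorch sketch of the embedding idea above: a siamese encoder whose Euclidean distances are regressed onto precomputed Wasserstein distances, plus a decoder for mapping back to input space. The architectures, input dimensions, and loss weighting are our assumptions.

```python
import torch
import torch.nn as nn

# Placeholder shapes: 100-dim normalized histograms, 32-dim embeddings.
enc = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 32))
dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 100))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()))

def training_step(x1, x2, w_dist):
    """x1, x2: batches of inputs; w_dist: their true pairwise Wasserstein
    distances, computed offline with an exact OT solver."""
    z1, z2 = enc(x1), enc(x2)
    d_embed = torch.norm(z1 - z2, dim=1)
    metric_loss = ((d_embed - w_dist) ** 2).mean()   # mimic W-distance
    recon_loss = ((dec(z1) - x1) ** 2).mean() \
               + ((dec(z2) - x2) ** 2).mean()        # map back to inputs
    loss = metric_loss + recon_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```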
Deep learning typically requires training a very capable architecture on large datasets. However, many important learning problems demand the ability to draw valid inferences from small datasets, and such problems pose a particular challenge for deep learning. In this regard, research on “meta-learning” is being actively conducted. Recent work suggested the Memory Augmented Neural Network (MANN) for meta-learning. MANN is an implementation of a Neural Turing Machine (NTM) with the ability to rapidly assimilate new data in its memory and use this data to make accurate predictions. In models such as MANN, input data samples and their appropriate labels from the previous step are bound together in the same memory locations. This often leads to memory interference when performing a task, as these models have to retrieve a feature of an input from a certain memory location and read only the label information bound to that location. In this paper, we address this issue by presenting a more robust MANN. We revisit the idea of meta-learning and propose a new memory augmented neural network that explicitly splits the external memory into feature and label memories. The feature memory stores the features of input data samples, and the label memory stores their labels. Hence, when predicting the label of a given input, our model uses its feature memory unit as a reference to extract the stored feature of the input and, based on that feature, retrieves the label information of the input from the label memory unit. For the network to function in this framework, a new memory-writing module that encodes label information into the label memory in accordance with the meta-learning task structure is designed. We demonstrate that our model outperforms MANN by a large margin in supervised one-shot classification tasks on the Omniglot and MNIST datasets.
https://arxiv.org/abs/1710.07110
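A rough NumPy sketch of the split-memory read described above: the query is matched against the feature memory, and the label is read from the same slots of the separate label memory. The similarity measure and soft-read weighting are assumptions of this sketch.

```python
import numpy as np

def read_label(query_feat, feature_mem, label_mem, k=1):
    """feature_mem: (slots, feat_dim) feature memory;
    label_mem: (slots, label_dim) label memory aligned slot-for-slot.
    Match the query by cosine similarity in the feature memory, then
    read the label stored at the corresponding slot(s)."""
    sims = feature_mem @ query_feat / (
        np.linalg.norm(feature_mem, axis=1)
        * np.linalg.norm(query_feat) + 1e-8)
    top = np.argsort(-sims)[:k]
    weights = np.exp(sims[top]) / np.exp(sims[top]).sum()  # soft read
    return weights @ label_mem[top]  # weighted label vector
```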
Similarity-preserving hashing is a widely used method for nearest neighbour search in large-scale image retrieval tasks. There has been considerable research on generating efficient image representations via deep-network-based hashing methods. However, the issue of efficient searching in the deep representation space remains largely unsolved. To this end, we propose a simple yet efficient deep-network-based multi-index hashing method for simultaneously learning a powerful image representation and efficient searching. To achieve these two goals, we introduce the multi-index hashing (MIH) mechanism into the proposed deep architecture, which divides the binary codes into multiple substrings. Because non-uniformly distributed codes result in inefficient searching, we add two balance constraints at the feature level and the instance level, respectively. Extensive evaluations on several benchmark image retrieval datasets show that the learned balanced binary codes bring dramatic speedups and achieve comparable performance over existing baselines.
https://arxiv.org/abs/1710.06993
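For background, the MIH mechanism referenced above can be sketched as follows (a plain-Python illustration of classic multi-index hashing, not the paper's deep variant): codes are split into substrings, each indexed in its own table, and exact substring matches yield candidates that are then verified with the full Hamming distance.

```python
from collections import defaultdict
from itertools import chain

def build_mih(codes, m=4, bits=64):
    """Index binary codes (fixed-width Python ints) by splitting each
    into m substrings, one hash table per substring position."""
    width = bits // m
    tables = [defaultdict(list) for _ in range(m)]
    for i, code in enumerate(codes):
        for j in range(m):
            tables[j][(code >> (j * width)) & ((1 << width) - 1)].append(i)
    return tables, width

def query_mih(q, codes, tables, width, radius=3):
    """By the pigeonhole principle, any code within Hamming distance
    `radius` of q matches q exactly in at least one substring whenever
    radius < m; larger radii would require probing nearby buckets,
    which is omitted in this sketch."""
    cands = set(chain.from_iterable(
        tables[j].get((q >> (j * width)) & ((1 << width) - 1), [])
        for j in range(len(tables))))
    return [i for i in cands if bin(q ^ codes[i]).count("1") <= radius]
```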
Processing of multi-word expressions (MWEs) is a known problem for any natural language processing task. Even neural machine translation (NMT) struggles to overcome it. This paper presents results of experiments investigating NMT attention allocation to MWEs and improving the automated translation of sentences that contain MWEs in English->Latvian and English->Czech NMT systems. Two improvement strategies were explored: (1) bilingual pairs of automatically extracted MWE candidates were added to the parallel corpus used to train the NMT system, and (2) full sentences containing the automatically extracted MWE candidates were added to the parallel corpus. Both approaches increased automated evaluation results. The best result, a 0.99 BLEU point increase, was reached with the first approach, while the second approach achieved only minimal improvements. We also provide open-source software and tools used for MWE extraction and alignment inspection.
https://arxiv.org/abs/1710.06313
Images in the wild encapsulate rich knowledge about varied abstract concepts and cannot be sufficiently described with models built only using image-caption pairs containing selected objects. We propose to handle such a task with the guidance of a knowledge base that incorporates many abstract concepts. Our method is a two-step process: we first build a multi-entity-label image recognition model to predict abstract concepts as image labels, and then leverage them in the second step as external semantic attention and constrained inference in the caption generation model for describing images that depict unseen/novel objects. Evaluations show that our models outperform most of the prior work on out-of-domain captioning on MSCOCO and are useful for the integration of knowledge and vision in general.
https://arxiv.org/abs/1710.06303
The past several years have witnessed the rapid progress of end-to-end Neural Machine Translation (NMT). However, there exists a discrepancy between training and inference in NMT when decoding, which may lead to serious problems since the model might be in a part of the state space it has never seen during training. To address this issue, Scheduled Sampling has been proposed. However, Scheduled Sampling has certain limitations, and we propose two dynamic oracle-based methods to improve it. We manage to mitigate the discrepancy by changing the training process towards a less guided scheme while aggregating the oracle’s demonstrations. Experimental results show that the proposed approaches improve translation quality over a standard NMT system.
https://arxiv.org/abs/1709.06265
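For contrast with the proposed dynamic-oracle methods, plain Scheduled Sampling can be sketched in a few lines; this is the baseline being improved, with the mixing probability as a placeholder.

```python
import random

def scheduled_sampling_inputs(gold_tokens, model_predictions, p_model):
    """Build decoder inputs by mixing gold and model tokens.

    With probability p_model, each position feeds back the model's own
    prediction instead of the gold token, moving training toward the
    less-guided inference-time regime. (This is plain Scheduled
    Sampling; the paper's dynamic-oracle variants additionally choose
    *which* token to feed based on an oracle, which is not sketched
    here.)"""
    return [pred if random.random() < p_model else gold
            for gold, pred in zip(gold_tokens, model_predictions)]

# p_model is typically annealed from 0 toward a maximum over training.
```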
Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higher-order network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile community-detection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higher-order network flows can be represented in a unified way with networks that distinguish physical nodes for representing a complex system’s objects from state nodes for describing flows between the objects. Moreover, these so-called sparse memory networks allow the information-theoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of different higher-order interactions such as multistep, multi-source, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higher-order flows through complex systems.
https://arxiv.org/abs/1706.04792
We develop a method for policy architecture search and adaptation via gradient-free optimization which can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain-shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent’s lifetime as it learns to drive in a realistic simulated environment.
https://arxiv.org/abs/1710.05958
An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multitask learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations.
https://arxiv.org/abs/1704.00260
Quantum processors promise a paradigm shift in high-performance computing that needs to be assessed by accurate benchmarking measures. In this work, we introduce a new benchmark for variational quantum algorithms (VQA), recently proposed as heuristic algorithms for small-scale quantum processors. In VQA, a classical optimization algorithm guides the quantum dynamics of the processor to yield the best solution for a given problem. A complete assessment of the scalability and competitiveness of VQA should take into account both the quality and the time of dynamics optimization. The method of optimal stopping, employed here, provides such an assessment by explicitly including time as a cost factor. We showcase this measure by benchmarking VQA as a solver for some quadratic unconstrained binary optimization problems. Moreover, we show that a better choice of cost function for the classical routine can significantly improve the performance of the VQA and even improve its scaling properties.
https://arxiv.org/abs/1710.05365
In conveyor belt sushi restaurants, billing is a burdensome job because one has to manually count the dishes and identify their colors to calculate the price. In a busy situation, mistakes can lead to customers being overcharged or undercharged. To deal with this problem, we developed a method that automatically identifies the color of dishes and calculates the total price from real images. Our method consists of ellipse fitting and a convolutional neural network. It achieves 85% precision and 96% recall in ellipse detection, and 92% classification accuracy.
https://arxiv.org/abs/1709.00751
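A hypothetical OpenCV sketch of the two-stage pipeline described above; all thresholds, the cropping logic, and the classifier are placeholders rather than the authors' exact settings.

```python
import cv2

def detect_dishes(frame, classify):
    """Fit ellipses to contours to locate dishes, then classify each
    dish crop with a CNN.

    classify(crop) -> color/price class for a dish image (placeholder)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    dishes = []
    for cnt in contours:
        if len(cnt) < 5 or cv2.contourArea(cnt) < 500:
            continue  # fitEllipse needs >= 5 points; skip tiny blobs
        (cx, cy), (w, h), _angle = cv2.fitEllipse(cnt)
        x0, y0 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
        crop = frame[y0:int(cy + h / 2), x0:int(cx + w / 2)]
        if crop.size:
            dishes.append(classify(crop))
    return dishes  # e.g. sum a price table over the returned classes
```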
Surface Enhanced Laser Desorption/Ionization-Time Of Flight Mass Spectrometry (SELDI-TOF MS) is a variant of MALDI. It is used in many settings, especially for the analysis of protein profiles and for preliminary screening of complex samples in the search for biomarkers. Unfortunately, these analyses are time-consuming and strictly limited with respect to protein identification. SELDI analysis of mass spectra (SELYMATRA) is a Web Application (WA) developed with the aim of reducing these shortcomings by automating the identification processes and introducing the possibility of predicting the proteins present in complex mixtures from cells and tissues analysed by mass spectrometry. SELYMATRA has the following characteristics. The architectural pattern used to develop the WA is Model-View-Controller (MVC), widely used in the development of software systems. The WA expects a user to upload data in a Microsoft Excel spreadsheet file format, usually generated by means of proprietary mass spectrometry software. Several parameters can be set, such as experimental conditions, range of isoelectric point, range of pH, relative errors, and so on. The WA compares the mass values between two mass spectra (sample vs. control) to extract differences and, according to the parameters set, queries a local database to predict the most likely proteins related to the differentially expressed masses. The WA was validated in a cellular model overexpressing a tagged NURR1 receptor. SELYMATRA is available at this http URL.
https://arxiv.org/abs/1710.05914
Vector Quantization (VQ) is a popular image compression technique with a simple decoding architecture and a high compression ratio. Codebook design is the most essential part of Vector Quantization. Linde-Buzo-Gray (LBG) is a traditional method for generating VQ codebooks, but it results in lower PSNR values. A codebook affects the quality of image compression, so the choice of an appropriate codebook is a must. Several optimization techniques have been proposed for global codebook generation to enhance the quality of image compression. In this paper, a novel algorithm called IDE-LBG is proposed, which uses an Improved Differential Evolution Algorithm coupled with LBG for generating optimal VQ codebooks. The proposed IDE works better than traditional DE through modifications in the scaling factor and the boundary control mechanism. The IDE generates better solutions by efficient exploration and exploitation of the search space. The best solution obtained by the IDE is then provided as the initial codebook for LBG. This approach produces an efficient codebook with less computational time, yielding excellent PSNR values and superior-quality reconstructed images. It is observed that the proposed IDE-LBG finds better VQ codebooks than IPSO-LBG, BA-LBG, and FA-LBG.
https://arxiv.org/abs/1710.05311
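For context, the LBG refinement that IDE-LBG seeds with the DE-optimized codebook can be sketched as a k-means-style loop (a plain-NumPy illustration, not the paper's full algorithm):

```python
import numpy as np

def lbg_refine(vectors, codebook, iters=20):
    """Standard LBG (k-means-style) codebook refinement, here assumed to
    be seeded with the codebook returned by the Improved Differential
    Evolution search.

    vectors: (N, d) training vectors; codebook: (K, d) initial codewords."""
    codebook = codebook.copy()
    for _ in range(iters):
        # Assign each training vector to its nearest codeword.
        d = np.linalg.norm(vectors[:, None, :] - codebook[None], axis=2)
        nearest = d.argmin(axis=1)
        # Move each codeword to the centroid of its partition.
        for k in range(len(codebook)):
            members = vectors[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return codebook
```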
Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction. In this paper, we describe a reinforcement learning method based on a softmax value function that requires neither of these procedures. Our method combines the advantages of policy-gradient methods with the efficiency and simplicity of maximum-likelihood approaches. We apply this new cold-start reinforcement learning method in training sequence generation models for structured output prediction problems. Empirical evidence validates this method on automatic summarization and image captioning tasks.
https://arxiv.org/abs/1709.09346
Electron irradiation of GaN nanowires in a scanning electron microscope strongly reduces their luminous efficiency as shown by cathodoluminescence imaging and spectroscopy. We demonstrate that this luminescence quenching originates from a combination of charge trapping at already existing surface states and the formation of new surface states induced by the adsorption of C on the nanowire sidewalls. The interplay of these effects leads to a complex temporal evolution of the quenching, which strongly depends on the incident electron dose per area. Time-resolved photoluminescence measurements on electron-irradiated samples reveal that the carbonaceous adlayer affects both the nonradiative and the radiative recombination dynamics.
https://arxiv.org/abs/1607.03397