Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Image Captioning using Deep Neural Architectures

2018-01-17

Parth Shah, Vishvajit Bakarola, Supriya Pati

arXiv_CV

arXiv_CV Image_Caption Caption Recognition
Abstract

Automatically creating the description of an image using a natural language sentence, such as English, is a very challenging task. It requires expertise in both image processing and natural language processing. This paper discusses different available models for the image captioning task. We also discuss how advances in object recognition and machine translation have greatly improved the performance of image captioning models in recent years. In addition, we discuss how such a model can be implemented. Finally, we evaluate the performance of the model using standard evaluation metrics.
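The abstract does not name its "standard evaluation metrics". BLEU is a common choice for captioning, so the sketch below assumes it. It shows only BLEU's core ingredient, clipped (modified) n-gram precision: each candidate n-gram is credited at most as many times as it appears in any single reference. Full BLEU also takes a geometric mean over n = 1..4 and applies a brevity penalty, and neither step is shown here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n=1):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


candidate = "the the the the the the the".split()
references = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(candidate, references, n=1))  # 2/7
```

Clipping is what stops a degenerate caption that repeats "the" from scoring a perfect unigram precision.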

URL

https://arxiv.org/abs/1801.05568

PDF

https://arxiv.org/pdf/1801.05568
GitGraph - Architecture Search Space Creation through Frequent Computational Subgraph Mining

2018-01-16

Kamil Bennani-Smires, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl

arXiv_CV

arXiv_CV NAS Reinforcement_Learning
Abstract

The dramatic success of deep neural networks across multiple application areas often relies on experts painstakingly designing a network architecture specific to each task. To simplify this process and make it more accessible, an emerging research effort seeks to automate the design of neural network architectures, using e.g. evolutionary algorithms or reinforcement learning or simple search in a constrained space of neural modules. Considering the typical size of the search space (e.g. $10^{10}$ candidates for a $10$-layer network) and the cost of evaluating a single candidate, current architecture search methods are very restricted. They either rely on static pre-built modules to be recombined for the task at hand, or they define a static hand-crafted framework within which they can generate new architectures from the simplest possible operations. In this paper, we relax these restrictions, by capitalizing on the collective wisdom contained in the plethora of neural networks published in online code repositories. Concretely, we (a) extract and publish GitGraph, a corpus of neural architectures and their descriptions; (b) we create problem-specific neural architecture search spaces, implemented as a textual search mechanism over GitGraph; (c) we propose a method of identifying unique common subgraphs within the architectures solving each problem (e.g., image processing, reinforcement learning), that can then serve as modules in the newly created problem specific neural search space.
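GitGraph mines frequent computational subgraphs over general architecture graphs. As a much simpler stand-in, the sketch below restricts itself to linear chains in sequential models: it counts contiguous layer-type chains and keeps those that appear in at least `min_support` architectures. The function name, the layer labels and the support threshold are all hypothetical and are not the paper's mining algorithm.

```python
from collections import Counter


def frequent_chains(architectures, length=2, min_support=2):
    """Count contiguous layer-type chains of a given length and keep those
    that occur in at least `min_support` distinct architectures."""
    support = Counter()
    for layers in architectures:
        # A set, so each chain counts once per architecture (support, not frequency).
        chains = {tuple(layers[i:i + length]) for i in range(len(layers) - length + 1)}
        support.update(chains)
    return {chain: s for chain, s in support.items() if s >= min_support}


# Hypothetical layer sequences, as they might be extracted from code repositories.
archs = [
    ["conv", "relu", "pool", "conv", "relu", "fc"],
    ["conv", "relu", "conv", "relu", "pool", "fc"],
    ["embed", "lstm", "fc"],
]
print(frequent_chains(archs))  # {('conv', 'relu'): 2, ('relu', 'pool'): 2}
```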

URL

https://arxiv.org/abs/1801.05159

PDF

https://arxiv.org/pdf/1801.05159
Localization-Aware Active Learning for Object Detection

2018-01-16

Chieh-Chi Kao, Teng-Yok Lee, Pradeep Sen, Ming-Yu Liu

arXiv_CV

arXiv_CV Object_Detection Image_Classification Classification Prediction Detection
Abstract

Active learning - a class of algorithms that iteratively searches for the most informative samples to include in a training dataset - has been shown to be effective at annotating data for image classification. However, the use of active learning for object detection is still largely unexplored, as determining the informativeness of an object-location hypothesis is more difficult. In this paper, we address this issue and present two metrics for measuring the informativeness of an object hypothesis, which allow us to leverage active learning to reduce the amount of annotated data needed to achieve a target object detection performance. Our first metric measures the ‘localization tightness’ of an object hypothesis, which is based on the overlap ratio between the region proposal and the final prediction. Our second metric measures the ‘localization stability’ of an object hypothesis, which is based on the variation of predicted object locations when input images are corrupted by noise. Our experimental results show that by augmenting a conventional active-learning algorithm designed for classification with the proposed metrics, the amount of labeled training data required can be reduced by up to 25%. Moreover, on the PASCAL 2007 and 2012 datasets, our localization-stability method achieves an average relative improvement of 96.5% and 81.9%, respectively, over the baseline method that uses classification only.
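Both metrics rest on box overlap. The sketch below assumes that "overlap ratio" means intersection over union (IoU). Tightness compares a proposal with its refined prediction, and stability averages the IoU between the clean-image prediction and predictions made on noise-corrupted copies. How the paper turns these values into a sample-selection score is not given in the abstract and is not shown.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_tightness(proposal, prediction):
    """Overlap between the region proposal and the final refined box."""
    return iou(proposal, prediction)


def localization_stability(clean_box, noisy_boxes):
    """Mean overlap between the prediction on the clean image and the
    predictions on noise-corrupted copies; lower means less stable."""
    return sum(iou(clean_box, b) for b in noisy_boxes) / len(noisy_boxes)


print(localization_tightness((0, 0, 10, 10), (5, 0, 15, 10)))  # 1/3
```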

URL

https://arxiv.org/abs/1801.05124

PDF

https://arxiv.org/pdf/1801.05124
Variational Recurrent Neural Machine Translation

2018-01-16

Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, Biao Zhang

arXiv_CL

arXiv_CL NMT Inference RNN
Abstract

Partially inspired by successful applications of variational recurrent neural networks, we propose a novel variational recurrent neural machine translation (VRNMT) model in this paper. Unlike variational NMT, which uses a single latent variable, VRNMT introduces a series of latent random variables to model the translation procedure of a sentence in a generative way. Specifically, the latent random variables are incorporated into the hidden states of the NMT decoder with elements from the variational autoencoder. In this way, these variables are recurrently generated, which enables them to further capture strong and complex dependencies among the output translations at different timesteps. To deal with the challenges of efficient posterior inference and large-scale training that arise when latent variables are incorporated, we build a neural posterior approximator and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on Chinese-English and English-German translation tasks demonstrate that the proposed model achieves significant improvements over both conventional and variational NMT models.
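The two pieces behind "a reparameterization technique to estimate the variational lower bound" can be sketched as below, assuming diagonal Gaussian latent variables as in standard VAE-style models. In VRNMT the means and log-variances would come from neural networks conditioned on the decoding history; here they are plain numbers, and the noise `eps` is passed in explicitly so that the sample is a deterministic function of `mu` and `logvar`.

```python
import math


def reparameterize(mu, logvar, eps):
    """z = mu + sigma * eps with sigma = exp(logvar / 2); eps ~ N(0, 1) is
    passed in so the sample is a differentiable function of mu and logvar."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, logvar, eps)]


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    return 0.5 * sum(
        lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
        for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p)
    )
```

Under these assumptions, one timestep would contribute roughly log p(y_t | z_t, ...) minus `gaussian_kl(posterior, prior)` to the lower bound.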

URL

https://arxiv.org/abs/1801.05119

PDF

https://arxiv.org/pdf/1801.05119
Grounded Language Understanding for Manipulation Instructions Using GAN-Based Classification

2018-01-16

Komei Sugiura, Hisashi Kawai

arXiv_CV

arXiv_CV Adversarial Knowledge QA GAN Classification Quantitative
Abstract

The target task of this study is grounded language understanding for domestic service robots (DSRs). In particular, we focus on instruction understanding for short sentences where verbs are missing. This task is of critical importance for building communicative DSRs because manipulation is essential for DSRs. Existing instruction understanding methods usually estimate missing information only from non-grounded knowledge; therefore, it was unclear whether the predicted action was physically executable. In this paper, we present a grounded instruction understanding method to estimate appropriate objects given an instruction and situation. We extend Generative Adversarial Nets (GAN) and build a GAN-based classifier using latent representations. To quantitatively evaluate the proposed method, we have developed a data set based on the standard data set used for Visual QA. Experimental results have shown that the proposed method gives better results than baseline methods.

URL

https://arxiv.org/abs/1801.05096

PDF

https://arxiv.org/pdf/1801.05096
What Level of Quality can Neural Machine Translation Attain on Literary Text?

2018-01-15

Antonio Toral, Andy Way

arXiv_CL

arXiv_CL NMT
Abstract

Given the rise of a new approach to MT, Neural MT (NMT), and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system pertaining to the previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of twelve widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p < 0.01) on all the novels considered. Overall, NMT results in an 11% relative improvement (3 points absolute) over PBSMT. A complementary human evaluation on three of the books shows that, depending on the book, between 17% and 34% of the translations produced by NMT (versus between 8% and 20% with PBSMT) are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.
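The two overall figures, an 11% relative gain and a 3-point absolute gain, together imply an approximate baseline score, since relative gain = absolute gain / baseline. The quick check below uses rounded inputs, so the implied values (a PBSMT BLEU of about 27 and an NMT BLEU of about 30) are rough back-of-the-envelope estimates and are not numbers reported in the abstract.

```python
def implied_baseline(absolute_gain, relative_gain):
    """If score_new = score_base + absolute_gain and
    absolute_gain / score_base = relative_gain, recover score_base."""
    return absolute_gain / relative_gain


base = implied_baseline(3.0, 0.11)
print(round(base, 1), round(base + 3.0, 1))  # 27.3 30.3
```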

URL

https://arxiv.org/abs/1801.04962

PDF

https://arxiv.org/pdf/1801.04962
Unsupervised Cipher Cracking Using Discrete GANs

2018-01-15

Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser

arXiv_CV

arXiv_CV GAN
Abstract

This work details CipherGAN, an architecture inspired by CycleGAN used for inferring the underlying cipher mapping given banks of unpaired ciphertext and plaintext. We demonstrate that CipherGAN is capable of cracking language data enciphered using shift and Vigenere ciphers to a high degree of fidelity and for vocabularies much larger than previously achieved. We present how CycleGAN can be made compatible with discrete data and train in a stable way. We then prove that the technique used in CipherGAN avoids the common problem of uninformative discrimination associated with GANs applied to discrete data.
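For reference, here are the two ciphers CipherGAN is evaluated on. A shift (Caesar) cipher is a Vigenère cipher whose key has a single letter. The sketch below works over lowercase a-z only and covers just the enciphering and deciphering side. CipherGAN's job is to learn the inverse mapping from unpaired ciphertext and plaintext, and that training is not shown.

```python
import string

ALPHABET = string.ascii_lowercase


def vigenere(text, key, decrypt=False):
    """Vigenère cipher over a-z; a shift cipher is the special case of a
    one-letter key. Non-letters pass through and do not consume key letters."""
    shifts = [ALPHABET.index(k) for k in key.lower()]
    out, j = [], 0
    for ch in text.lower():
        if ch in ALPHABET:
            s = shifts[j % len(shifts)]
            if decrypt:
                s = -s
            out.append(ALPHABET[(ALPHABET.index(ch) + s) % 26])
            j += 1
        else:
            out.append(ch)
    return "".join(out)


print(vigenere("attack at dawn", "lemon"))  # lxfopv ef rnhr
print(vigenere("abc", "d"))                 # def (shift by 3)
```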

URL

https://arxiv.org/abs/1801.04883

PDF

https://arxiv.org/pdf/1801.04883
Top k Memory Candidates in Memory Networks for Common Sense Reasoning

2018-01-14

Vatsal Mahajan

arXiv_CV

arXiv_CV Knowledge Inference Memory_Networks
Abstract

Successful completion of a reasoning task requires the agent to have relevant prior knowledge or some given context of the world dynamics. Usually, the information provided to the system for a reasoning task is just the query or some supporting story, which is often not enough for commonsense reasoning tasks. The goal here is that, if the information provided along with the question is not sufficient to answer it correctly, the model should choose the k most relevant documents that can aid its inference process. In this work, the model dynamically selects the top k most relevant memory candidates that can be used to successfully solve reasoning tasks. Experiments were conducted on a subset of Winograd Schema Challenge (WSC) problems to show that the proposed model has potential for commonsense reasoning. The WSC is a test of machine intelligence designed to be an improvement on the Turing test.
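A minimal sketch of top-k memory selection follows. It assumes dot-product relevance scores and softmax attention over the k survivors, which is a common memory-network pattern. The abstract does not state the paper's actual scoring function, and the embeddings in the usage line are hypothetical.

```python
import math


def top_k_memories(query, memories, k):
    """Score each memory by dot product with the query, keep the k best,
    and return (indices, softmax attention weights over those k)."""
    scores = [sum(q * m for q, m in zip(query, mem)) for mem in memories]
    top = sorted(range(len(memories)), key=lambda i: scores[i], reverse=True)[:k]
    best = scores[top[0]]                       # subtract the max for numerical stability
    exps = [math.exp(scores[i] - best) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


# Hypothetical 2-d embeddings: scores are 0.1, 2.0, 1.0, -1.0.
idx, weights = top_k_memories([1.0, 0.0], [[0.1, 1.0], [2.0, 0.0], [1.0, 1.0], [-1.0, 0.0]], k=2)
print(idx, weights)  # [1, 2] and weights that sum to 1
```

Restricting attention to k candidates keeps irrelevant documents from diluting the weights that the inference step sees.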

URL

https://arxiv.org/abs/1801.04622

PDF

https://arxiv.org/pdf/1801.04622
Gradient descent GAN optimization is locally stable

2018-01-13

Vaishnavh Nagarajan, J. Zico Kolter

arXiv_CV

arXiv_CV Regularization Adversarial GAN Optimization Gradient_Descent
Abstract

Despite the growing prominence of generative adversarial networks (GANs), optimization in GANs is still a poorly understood topic. In this paper, we analyze the “gradient descent” form of GAN optimization, i.e., the natural setting where we simultaneously take small gradient steps in both generator and discriminator parameters. We show that even though GAN optimization does not correspond to a convex-concave game (even for simple parameterizations), under proper conditions, equilibrium points of this optimization procedure are still locally asymptotically stable for the traditional GAN formulation. On the other hand, we show that the recently proposed Wasserstein GAN can have non-convergent limit cycles near equilibrium. Motivated by this stability analysis, we propose an additional regularization term for gradient descent GAN updates, which is able to guarantee local stability for both the WGAN and the traditional GAN, and also shows practical promise in speeding up convergence and addressing mode collapse.
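The toy game below illustrates both findings. It is not the paper's experimental setup. It uses the bilinear game V(x, y) = x * y, where the generator x minimizes and the discriminator y maximizes, to mimic a linear (WGAN-like) critic. Plain simultaneous gradient steps spiral away from the equilibrium (0, 0). The regularized variant adds eta times the squared norm of the discriminator's gradient, (dV/dy)^2 = x^2, to the generator's objective, which is our reading of the proposed regularizer, and the iterates then contract.

```python
def simulate(eta, steps=200, lr=0.1, x=1.0, y=1.0):
    """Simultaneous gradient steps on V(x, y) = x * y, where the generator x
    minimizes V + eta * (dV/dy)**2 and the discriminator y maximizes V.
    Returns the distance from the equilibrium (0, 0) after `steps` updates."""
    for _ in range(steps):
        grad_x = y + 2.0 * eta * x   # d/dx [x*y + eta * x**2], since dV/dy = x
        grad_y = x                   # dV/dy
        x, y = x - lr * grad_x, y + lr * grad_y
    return (x * x + y * y) ** 0.5


print(simulate(eta=0.0))  # grows: plain simultaneous GD spirals outward
print(simulate(eta=1.0))  # shrinks toward the equilibrium (0, 0)
```

With lr = 0.1 and eta = 0, each update scales the distance by exactly sqrt(1.01). With eta = 1, the update matrix has a repeated eigenvalue of 0.9, so the iterates converge.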

URL

https://arxiv.org/abs/1706.04156

PDF

https://arxiv.org/pdf/1706.04156
MOF-BC: A Memory Optimized and Flexible BlockChain for Large Scale Networks

2018-01-13

Ali Dorri, Salil S. Kanhere, Raja Jurdak

arXiv_CV

arXiv_CV
Abstract

BlockChain (BC) immutability ensures BC resilience against modification or removal of the stored data. In large scale networks like the Internet of Things (IoT), however, this feature significantly increases BC storage size and raises privacy challenges. In this paper, we propose a Memory Optimized and Flexible BC (MOF-BC) that enables IoT users and service providers to remove or summarize their transactions, age their data, and exercise the “right to be forgotten”. To increase privacy, a user may employ multiple keys for different transactions. To allow for the removal of stored transactions, all keys would need to be stored, which complicates key management and storage. MOF-BC introduces the notion of a Generator Verifier (GV), which is a signed hash of a Generator Verifier Secret (GVS). The GV changes for each transaction to provide privacy, yet is signed by a unique key, thus minimizing the information that needs to be stored. A flexible transaction fee model and a reward mechanism are proposed to incentivize users to participate in optimizing memory consumption. Qualitative security and privacy analysis demonstrates that MOF-BC is resilient against several security attacks. Evaluation results show that MOF-BC decreases BC memory consumption by up to 25% and the user cost by more than two orders of magnitude compared to conventional BC instantiations.

URL

https://arxiv.org/abs/1801.04416

PDF

https://arxiv.org/pdf/1801.04416
SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

2018-01-13

Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska

arXiv_CV

arXiv_CV Optimization Deep_Learning
Abstract

Going deeper and wider in neural architectures improves accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to switch to less desirable network architectures or nontrivially dissect a network across multiple GPUs. This distracts DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity. SuperNeurons features three memory optimizations, Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation; together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in those memory-saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow have demonstrated that SuperNeurons trains networks at least 3.2432x deeper than current ones, with leading performance. In particular, SuperNeurons can train ResNet2500, which has $10^4$ basic network layers, on a 12GB K40c.
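The idea behind Liveness Analysis can be sketched as follows: release each tensor right after its last reader runs, instead of keeping every output alive. This is a simplified, forward-only sketch with hypothetical layer sizes. Real training also keeps activations for the backward pass, which is where the paper's Unified Tensor Pool and Cost-Aware Recomputation come in, and neither is modeled here.

```python
def peak_memory(sizes, inputs, free_dead=True):
    """Peak live memory for a forward pass over layers in topological order.

    sizes[i]: memory of layer i's output tensor.
    inputs[i]: indices of earlier layers whose outputs layer i reads.
    With free_dead=True, a tensor is released right after its last reader
    runs (liveness analysis); otherwise every output is kept until the end.
    """
    last_use = {}
    for i, deps in enumerate(inputs):
        for d in deps:
            last_use[d] = i
    live, peak = 0, 0
    for i, size in enumerate(sizes):
        live += size                       # allocate layer i's output
        peak = max(peak, live)
        if free_dead:
            for d in inputs[i]:
                if last_use[d] == i:       # no later layer reads tensor d
                    live -= sizes[d]
    return peak


chain = [[], [0], [1], [2]]                       # hypothetical 4-layer chain
print(peak_memory([4, 2, 3, 1], chain))           # 6: largest input + output pair
print(peak_memory([4, 2, 3, 1], chain, False))    # 10: everything kept alive
```

For a plain chain, the liveness peak is the largest adjacent input-plus-output pair, which matches the abstract's "maximal memory usage among layers".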

URL

https://arxiv.org/abs/1801.04380

PDF

https://arxiv.org/pdf/1801.04380
MSDNN: Multi-Scale Deep Neural Network for Salient Object Detection

2018-01-12

Fen Xiao, Wenzheng Deng, Liangchan Peng, Chunhong Cao, Kai Hu, Xieping Gao

arXiv_CV

arXiv_CV Salient Object_Detection Attention CNN Deep_Learning Detection
Abstract

Salient object detection is a fundamental problem that has received a great deal of attention in computer vision. Recently, deep learning models have become a powerful tool for image feature extraction. In this paper, we propose a multi-scale deep neural network (MSDNN) for salient object detection. The proposed model first extracts global high-level features and context information over the whole source image with a recurrent convolutional neural network (RCNN). Then several stacked deconvolutional layers are adopted to obtain the multi-scale feature representation and a series of saliency maps. Finally, we investigate a fusion convolution module (FCM) to build a final pixel-level saliency map. The proposed model is extensively evaluated on four salient object detection benchmark datasets. Results show that our deep model significantly outperforms 12 other state-of-the-art approaches.
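The abstract does not describe the internals of the fusion convolution module. The sketch below assumes the simplest plausible form: a 1x1 convolution over the stacked multi-scale saliency maps, which amounts to a learned weighted sum per pixel, followed by a sigmoid. It also assumes the maps have already been resized to a common H x W. The weights and maps are hypothetical.

```python
import numpy as np


def fuse_saliency_maps(maps, weights, bias=0.0):
    """1x1-convolution-style fusion: a learned weighted sum of N side-output
    saliency maps (shape N x H x W) followed by a sigmoid."""
    logits = np.tensordot(weights, maps, axes=1) + bias   # -> H x W
    return 1.0 / (1.0 + np.exp(-logits))


maps = np.stack([np.zeros((2, 3)), np.ones((2, 3))])      # two hypothetical 2x3 maps
fused = fuse_saliency_maps(maps, np.array([1.0, 2.0]))
print(fused.shape)  # (2, 3)
```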

URL

https://arxiv.org/abs/1801.04187

PDF

https://arxiv.org/pdf/1801.04187
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium

2018-01-12

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, Sepp Hochreiter

arXiv_CV

arXiv_CV Adversarial GAN Optimization Gradient_Descent
Abstract

Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions. TTUR uses separate learning rates for the discriminator and the generator. Using the theory of stochastic approximation, we prove that the TTUR converges under mild assumptions to a stationary local Nash equilibrium. The convergence carries over to the popular Adam optimization, for which we prove that it follows the dynamics of a heavy ball with friction and thus prefers flat minima in the objective landscape. For the evaluation of the performance of GANs at image generation, we introduce the “Fréchet Inception Distance” (FID), which captures the similarity of generated images to real ones better than the Inception Score. In experiments, TTUR improves learning for DCGANs and Improved Wasserstein GANs (WGAN-GP), outperforming conventional GAN training on CelebA, CIFAR-10, SVHN, LSUN Bedrooms, and the One Billion Word Benchmark.
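FID fits a Gaussian to the features of real images and another to the features of generated images, then computes the Fréchet distance ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}) between them. In the paper the features are activations of a pretrained Inception network. The sketch below takes arbitrary feature arrays and does not extract features. It evaluates the trace term through a symmetric matrix so that only NumPy's `eigh`/`eigvalsh` are needed.

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2}).

    Tr((cov1 cov2)^{1/2}) is computed as the sum of square roots of the
    eigenvalues of the symmetric PSD matrix cov1^{1/2} cov2 cov1^{1/2},
    which has the same eigenvalues as cov1 cov2.
    """
    w, v = np.linalg.eigh(cov1)
    sqrt_cov1 = (v * np.sqrt(np.clip(w, 0, None))) @ v.T
    eig = np.clip(np.linalg.eigvalsh(sqrt_cov1 @ cov2 @ sqrt_cov1), 0, None)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2) - 2.0 * np.sqrt(eig).sum())


def fid_from_features(feats_real, feats_fake):
    """Fit Gaussians to two (n_samples x dim) feature arrays and compare them."""
    return frechet_distance(feats_real.mean(0), np.cov(feats_real, rowvar=False),
                            feats_fake.mean(0), np.cov(feats_fake, rowvar=False))
```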

URL

https://arxiv.org/abs/1706.08500

PDF

https://arxiv.org/pdf/1706.08500
Online Detection of Effectively Callback Free Objects with Applications to Smart Contracts

2018-01-12

Shelly Grossman, Ittai Abraham, Guy Golan-Gueta, Yan Michalevsky, Noam Rinetzky, Mooly Sagiv, Yoni Zohar

arXiv_CV

arXiv_CV Detection
Abstract

Callbacks are essential in many programming environments, but they drastically complicate program understanding and reasoning because they allow external objects to mutate an object's local state in unexpected ways, thus breaking modularity. The famous DAO bug in the cryptocurrency framework Ethereum exploited callbacks to steal $150M. We define the notion of Effectively Callback Free (ECF) objects in order to allow callbacks without preventing modular reasoning. An object is ECF in a given execution trace if there exists an equivalent execution trace without callbacks to this object. An object is ECF if it is ECF in every possible execution trace. We study the decidability of dynamically checking ECF in a given execution trace and of statically checking whether an object is ECF. We also show that dynamically checking ECF in Ethereum is feasible and can be done online. By running the history of all execution traces in Ethereum, we were able to verify that virtually all existing contracts, excluding the DAO or contracts with similar known vulnerabilities, are ECF. Finally, we show that ECF, whether it is verified dynamically or statically, enables modular reasoning about objects with encapsulated state.
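The kind of callback that ECF rules out is shown by the toy Python model below. It is not Solidity and not the paper's ECF checker. In the vulnerable version, the account balance is cleared only after the external callback returns, so a re-entrant withdrawal still sees the old balance. Intuitively, no callback-free ordering of the same calls could pay out four times, so that trace is not ECF. In the safe version, the re-entrant call finds a zero balance, which behaves as if it had run after the outer withdrawal finished. The class and function names are hypothetical.

```python
class Bank:
    """Toy account contract. `safe` chooses whether the balance is cleared
    before (safe) or after (vulnerable) the external callback runs."""

    def __init__(self, safe):
        self.safe = safe
        self.balances = {}
        self.funds = 0

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.funds += amount

    def withdraw(self, user, on_receive):
        amount = self.balances.get(user, 0)
        if amount == 0 or amount > self.funds:
            return
        if self.safe:
            self.balances[user] = 0      # update state before the external call
        self.funds -= amount
        on_receive(amount)               # external callback; may re-enter withdraw
        if not self.safe:
            self.balances[user] = 0      # too late: re-entrant calls saw the old balance


def attack(bank, max_reentries=3):
    """Deposit funds, then re-enter withdraw from inside the callback."""
    bank.deposit("victim", 100)
    bank.deposit("attacker", 10)
    received = []

    def on_receive(amount):
        received.append(amount)
        if len(received) <= max_reentries:
            bank.withdraw("attacker", on_receive)

    bank.withdraw("attacker", on_receive)
    return sum(received)


print(attack(Bank(safe=False)))  # 40: the attacker's 10 is paid out four times
print(attack(Bank(safe=True)))   # 10: the re-entrant call sees a zero balance
```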

URL

https://arxiv.org/abs/1801.04032

PDF

https://arxiv.org/pdf/1801.04032
DeepSeek: Content Based Image Search & Retrieval

2018-01-11

Tanya Piplani, David Bamman

arXiv_CV

arXiv_CV Image_Caption Face Caption Deep_Learning Language_Model
Abstract

Most of the internet today is composed of digital media, including videos and images. With pixels becoming the currency in which most transactions happen on the internet, it is becoming increasingly important to have a way of browsing through this ocean of information with relative ease. YouTube has 400 hours of video uploaded every minute, and many millions of images are browsed on Instagram, Facebook, etc. Inspired by recent advances in deep learning and its success on problems such as image captioning, machine translation, word2vec and skip-thoughts, we present DeepSeek, a natural-language-processing-based deep learning model that allows users to enter a description of the kind of images they want to search for; in response, the system retrieves all the images that semantically and contextually relate to the query. Two approaches are described in the following sections.

URL

https://arxiv.org/abs/1801.03406

PDF

https://arxiv.org/pdf/1801.03406
Improved English to Russian Translation by Neural Suffix Prediction

2018-01-11

Kai Song, Yue Zhang, Min Zhang, Weihua Luo

arXiv_CL

arXiv_CL NMT Prediction
Abstract

Neural machine translation (NMT) suffers a performance deficiency when a limited vocabulary fails to cover the source or target side adequately, which happens frequently when dealing with morphologically rich languages. To address this problem, previous work focused on adjusting translation granularity or expanding the vocabulary size. However, morphological information, which may further improve translation quality, is relatively under-considered in NMT architectures. We propose a novel method that can not only reduce data sparsity but also model morphology through a simple but effective mechanism. By predicting the stem and suffix separately during decoding, our system achieves an improvement of up to 1.98 BLEU compared with previous work on English to Russian translation. Our method is orthogonal to the choice of NMT architecture and yields consistent improvements across various domains.
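Predicting the stem and suffix separately only requires that each target word can be split and rejoined. The sketch below assumes a longest-match split against a fixed suffix inventory. The abstract does not say how the paper obtains stems and suffixes, and the tiny Russian suffix list is hypothetical. The point is that the target vocabulary becomes a set of stems plus a small suffix set, which reduces sparsity for morphologically rich languages.

```python
def split_stem_suffix(word, suffixes):
    """Split a word into (stem, suffix) using the longest matching suffix from
    a fixed inventory; words with no matching suffix get an empty suffix."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def join_stem_suffix(stem, suffix):
    """Inverse step at decoding time: concatenate predicted stem and suffix."""
    return stem + suffix


# Hypothetical, tiny suffix inventory for illustration only.
SUFFIXES = ["ами", "ах", "ой", "а", "и", "у"]
print(split_stem_suffix("книгами", SUFFIXES))  # ('книг', 'ами')
```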

URL

https://arxiv.org/abs/1801.03615

PDF

https://arxiv.org/pdf/1801.03615
From Superpixel to Human Shape Modelling for Carried Object Detection

2018-01-10

Farnoosh Ghadiri, Robert Bergevin, Guillaume-Alexandre Bilodeau

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Detecting carried objects is one of the requirements for developing systems to reason about activities involving people and objects. We present an approach to detect carried objects from a single video frame with a novel method that incorporates features from multiple scales. Initially, a foreground mask in a video frame is segmented into multi-scale superpixels. Then the human-like regions in the segmented area are identified by matching a set of extracted features from superpixels against learned features in a codebook. A carried object probability map is generated using the complement of the matching probabilities of superpixels to human-like regions and background information. A group of superpixels with high carried object probability and strong edge support is then merged to obtain the shape of the carried object. We applied our method to two challenging datasets, and results show that our method is competitive with or better than the state-of-the-art.

URL

https://arxiv.org/abs/1801.03551

PDF

https://arxiv.org/pdf/1801.03551
Translating Pro-Drop Languages with Reconstruction Models

2018-01-10

Longyue Wang, Zhaopeng Tu, Shuming Shi, Tong Zhang, Yvette Graham, Qun Liu

arXiv_CL

arXiv_CL Attention NMT
Abstract

Pronouns are frequently omitted in pro-drop languages, such as Chinese, generally leading to significant challenges with respect to the production of complete translations. To date, very little attention has been paid to the dropped pronoun (DP) problem within neural machine translation (NMT). In this work, we propose a novel reconstruction-based approach to alleviating DP translation problems for NMT models. Firstly, DPs within all source sentences are automatically annotated with parallel information extracted from the bilingual training corpus. Next, the annotated source sentence is reconstructed from hidden representations in the NMT model. With auxiliary training objectives, in terms of reconstruction scores, the parameters associated with the NMT model are guided to produce enhanced hidden representations that are encouraged as much as possible to embed annotated DP information. Experimental results on both Chinese-English and Japanese-English dialogue translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT baseline, which is directly built on the training data annotated with DPs.

URL

https://arxiv.org/abs/1801.03257

PDF

https://arxiv.org/pdf/1801.03257
Self-Assembled formation of long, thin, and uncoalesced GaN nanowires on crystalline TiN films

2018-01-09

David van Treeck, Gabriele Calabrese, Jelle J. W. Goertz, Vladimir M. Kaganer, Oliver Brandt, Sergio Fernández-Garrido, Lutz Geelhaar

arXiv_CV

arXiv_CV GAN Face
Abstract

We investigate in detail the self-assembled nucleation and growth of GaN nanowires by molecular beam epitaxy on crystalline TiN films. We demonstrate that this type of substrate allows the growth of long and thin GaN nanowires that do not suffer from coalescence, which is in contrast to the growth on Si and other substrates. Only beyond a certain nanowire length that depends on the nanowire number density and exceeds here 1.5 μm, coalescence takes place by bundling, i.e. the same process as on Si. By analyzing the nearest neighbor distance distribution, we identify diffusion-induced repulsion of neighboring nanowires as the main mechanism limiting the nanowire number density during nucleation on TiN. Since on Si the final number density is determined by shadowing of the impinging molecular beams by existing nanowires, it is the difference in adatom surface diffusion that enables on TiN the formation of nanowire ensembles with reduced number density. These nanowire ensembles combine properties that make them a promising basis for the growth of core-shell heterostructures.
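A nearest-neighbor analysis of this kind compares measured distances with what a completely random arrangement would give at the same density. For a 2-D Poisson pattern with number density λ, the expected nearest-neighbor distance is 1/(2√λ). A mean distance noticeably above that value hints at repulsion between neighbors. The sketch below uses brute force, ignores edge effects, and does not attempt the paper's more detailed fit of the full distribution.

```python
import math


def nearest_neighbor_distances(points):
    """Distance from each 2-D point to its closest other point (O(n^2))."""
    dists = []
    for i, (xi, yi) in enumerate(points):
        dists.append(min(math.hypot(xi - xj, yi - yj)
                         for j, (xj, yj) in enumerate(points) if j != i))
    return dists


def poisson_mean_nn_distance(density):
    """Expected nearest-neighbor distance for a 2-D Poisson (completely random)
    point pattern with the given number density: 1 / (2 * sqrt(density))."""
    return 1.0 / (2.0 * math.sqrt(density))


print(nearest_neighbor_distances([(0, 0), (1, 0), (0, 3)]))  # [1.0, 1.0, 3.0]
```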

URL

https://arxiv.org/abs/1801.02966

PDF

https://arxiv.org/pdf/1801.02966
Synthetic Data Augmentation using GAN for Improved Liver Lesion Classification

2018-01-08

Maayan Frid-Adar, Eyal Klang, Michal Amitai, Jacob Goldberger, Hayit Greenspan

arXiv_CV

arXiv_CV Adversarial GAN Classification
Abstract

In this paper, we present a data augmentation method that generates synthetic medical images using Generative Adversarial Networks (GANs). We propose a training scheme that first uses classical data augmentation to enlarge the training set and then further enlarges the data size and its diversity by applying GAN techniques for synthetic data augmentation. Our method is demonstrated on a limited dataset of computed tomography (CT) images of 182 liver lesions (53 cysts, 64 metastases and 65 hemangiomas). The classification performance using only classic data augmentation yielded 78.6% sensitivity and 88.4% specificity. By adding the synthetic data augmentation the results significantly increased to 85.7% sensitivity and 92.4% specificity.
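Sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The dataset has three lesion classes (53 + 64 + 65 = 182 lesions), but the abstract reports a single sensitivity/specificity pair without stating how it aggregates over classes. The sketch below shows one plausible convention, one-vs-rest per class followed by a macro average, as an assumption. The confusion matrix in the test is hypothetical.

```python
def one_vs_rest(confusion, cls):
    """Sensitivity and specificity of class `cls` from a square confusion
    matrix with rows = true class and columns = predicted class."""
    n = len(confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls][j] for j in range(n)) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


def macro_average(confusion):
    """Average per-class sensitivity and specificity (one averaging choice)."""
    pairs = [one_vs_rest(confusion, c) for c in range(len(confusion))]
    return (sum(p[0] for p in pairs) / len(pairs),
            sum(p[1] for p in pairs) / len(pairs))
```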

URL

https://arxiv.org/abs/1801.02385

PDF

https://arxiv.org/pdf/1801.02385
Morphology dependent surface properties of nanostructured GaN films grown by molecular beam epitaxy

2018-01-08

Abhijit Chatterjee, S. P. Swathi, S.M. Shivaprasad

arXiv_CV

arXiv_CV GAN Face
Abstract

The effect of film morphology on its surface chemistry and band structure has been analyzed for gallium nitride epitaxial films grown by molecular beam epitaxy. The film morphology has been studied using scanning electron microscopy and atomic force microscopy, and the bandstructure, defect and emission properties have been studied by X-ray photoelectron spectroscopy and cathodoluminescence spectroscopy. It was found that the highly porous GaN nanowall network shows the highest relative conductivity and does not have defect related luminescence. The flatter films were more resistive and showed yellow luminescence, due to Ga vacancies. GaN nanowall network exhibited a Fermi level pinning at (1.8 ± 0.2) eV above valence band maximum, suggesting the presence of a Ga adlayer on the surface of GaN nanowall network. Ar ion sputtering was found to preferentially sputter N atoms leading to surface metallization.

URL

https://arxiv.org/abs/1801.02374

PDF

https://arxiv.org/pdf/1801.02374
Approximate FPGA-based LSTMs under Computation Time Constraints

2018-01-07

Michalis Rizakis, Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

arXiv_CV

arXiv_CV Image_Caption Caption RNN Quantitative
Abstract

Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, the models are becoming increasingly demanding in terms of computational and memory load. Emerging latency-sensitive applications including mobile robots and autonomous vehicles often operate under stringent computation time constraints. In this paper, we address the challenge of deploying computationally demanding LSTMs at a constrained time budget by introducing an approximate computing scheme that combines iterative low-rank compression and pruning, along with a novel FPGA-based LSTM architecture. Combined in an end-to-end framework, the approximation method’s parameters are optimised and the architecture is configured to address the problem of high-performance LSTM execution in time-constrained applications. Quantitative evaluation on a real-life image captioning application indicates that the proposed methods required up to 6.5x less time to achieve the same application-level accuracy compared to a baseline method, while achieving an average of 25x higher accuracy under the same computation time constraints.
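The two approximations the scheme combines, low-rank compression and pruning, can be shown on a single weight matrix. The sketch below does one truncated-SVD step followed by magnitude pruning, as a simplified, one-shot stand-in. The paper's method is iterative, tunes its parameters against a time budget, and targets an FPGA architecture, none of which is shown. The matrix, rank and sparsity are hypothetical.

```python
import numpy as np


def low_rank_prune(weight, rank, sparsity):
    """Truncated-SVD approximation of `weight` followed by magnitude pruning
    that zeroes (at least) the smallest `sparsity` fraction of entries."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    k = int(sparsity * approx.size)
    if k > 0:
        threshold = np.sort(np.abs(approx), axis=None)[k - 1]
        approx = np.where(np.abs(approx) <= threshold, 0.0, approx)
    return approx


w = np.random.default_rng(0).normal(size=(8, 6))   # hypothetical gate weight matrix
compressed = low_rank_prune(w, rank=2, sparsity=0.5)
print(np.count_nonzero(compressed), "of", w.size, "entries kept")
```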

URL

https://arxiv.org/abs/1801.02190

PDF

https://arxiv.org/pdf/1801.02190
Improving utility of brain tumor confocal laser endomicroscopy: objective value assessment and diagnostic frame detection with convolutional neural networks

2018-01-06

Mohammadhassan Izadyyazdanabadi, Evgenii Belykh, Nikolay Martirosyan, Jennifer Eschbacher, Peter Nakaji, Yezhou Yang, Mark C. Preul

arXiv_CV

arXiv_CV CNN Detection
Abstract

Confocal laser endomicroscopy (CLE), although capable of obtaining images at cellular resolution in real time during brain tumor surgery, creates as many non-diagnostic as diagnostic images. Non-useful images are often distorted due to relative motion between the probe and the brain, or due to blood artifacts. Many images, however, simply lack diagnostic features immediately informative to the physician. Examining all the hundreds or thousands of images from a single case to discriminate diagnostic images from nondiagnostic ones can be tedious. Providing a real-time diagnostic value assessment of images (fast enough to be used during the surgical acquisition process and accurate enough for the pathologist to rely on) to automatically detect diagnostic frames would streamline the analysis of images and filter useful images for the pathologist/surgeon. We sought to automatically classify images as diagnostic or non-diagnostic. AlexNet, a deep-learning architecture, was evaluated with 4-fold cross-validation. Our dataset includes 16,795 images (8572 nondiagnostic and 8223 diagnostic) from 74 CLE-aided brain tumor surgery patients. The ground truth for all the images is provided by the pathologist. Average model accuracy on test data was 91% overall (90.79% accuracy, 90.94% sensitivity and 90.87% specificity). To evaluate the model reliability, we also performed receiver operating characteristic (ROC) analysis, yielding an average area under the ROC curve (AUC) of 0.958. These results demonstrate that a deeply trained AlexNet network can achieve a model that reliably and quickly recognizes diagnostic CLE images.
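The reported AUC has a concrete interpretation. It is the probability that a randomly chosen diagnostic image receives a higher score than a randomly chosen non-diagnostic one, with ties counted as one half. The sketch below computes it by direct pairwise comparison. That costs O(positives x negatives), so a rank-based version is preferable at the paper's scale of roughly 8,000 images per class. The scores are hypothetical, and how the paper averages over its 4 folds is not shown.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```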

URL

https://arxiv.org/abs/1801.02101

PDF

https://arxiv.org/pdf/1801.02101
Read All
GeoSeq2Seq: Information Geometric Sequence-to-Sequence Networks

2018-01-05

Alessandro Bay, Biswa Sengupta

arXiv_CV

arXiv_CV Image_Caption Caption Embedding RNN
Abstract

The Fisher information metric is an important foundation of information geometry, as it allows us to approximate the local geometry of a probability distribution. Recurrent neural networks such as Sequence-to-Sequence (Seq2Seq) networks, which have recently yielded state-of-the-art performance in speech translation and image captioning, have so far ignored the geometry of the latent embedding that they iteratively learn. We propose the information geometric Seq2Seq (GeoSeq2Seq) network, which bridges the gap between deep recurrent neural networks and information geometry. Specifically, the latent embedding offered by a recurrent network is encoded as a Fisher kernel of a parametric Gaussian Mixture Model, a formalism common in computer vision. We use such a network to predict the shortest routes between two nodes of a graph by learning the adjacency matrix with the GeoSeq2Seq formalism; our results show that, for this problem, the probabilistic representation of the latent embedding outperforms the non-probabilistic embedding by 10-15%.
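
The Fisher-kernel encoding mentioned above is commonly realized as a Fisher vector: the gradient of a GMM log-likelihood with respect to its parameters, evaluated on a set of descriptors. The sketch below computes only the mean-gradient part for a diagonal GMM; treating per-step RNN hidden states as the descriptors is our assumption, and the paper's exact encoding may differ.

```python
import numpy as np

def fisher_vector_means(x, weights, means, sigmas):
    """Mean-gradient part of a Fisher vector for a diagonal GMM.

    x:       (N, D) descriptors, e.g. per-step RNN hidden states (assumed)
    weights: (K,)   mixture weights summing to 1
    means:   (K, D) component means
    sigmas:  (K, D) per-dimension standard deviations
    returns: (K * D,) flattened vector
    """
    n, d = x.shape
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (N, K, D)
    # Log-density of every descriptor under every Gaussian component.
    log_prob = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)[None, :]
                - 0.5 * d * np.log(2 * np.pi))
    log_joint = np.log(weights)[None, :] + log_prob  # (N, K)
    # Posterior responsibilities gamma_nk via a numerically stable softmax.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    gamma = np.exp(log_joint)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Average responsibility-weighted, whitened deviations per component.
    g = np.einsum("nk,nkd->kd", gamma, diff) / (n * np.sqrt(weights)[:, None])
    return g.ravel()
```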

URL

https://arxiv.org/abs/1710.09363

PDF

https://arxiv.org/pdf/1710.09363
Read All
Dual Long Short-Term Memory Networks for Sub-Character Representation Learning

2018-01-04

Han He, Lei Wu, Xiaokun Yang, Hua Yan, Zhimin Gao, Yi Feng, George Townsend

arXiv_CV

arXiv_CV Knowledge Segmentation Embedding Represenation_Learning RNN Memory_Networks
Abstract

Characters have commonly been regarded as the minimal processing unit in Natural Language Processing (NLP). However, many non-Latin languages have logographic writing systems with large alphabets of thousands or even millions of characters. Each character is composed of even smaller parts, which previous work has often ignored. In this paper, we propose a novel architecture that employs two stacked Long Short-Term Memory networks (LSTMs) to learn sub-character-level representations and capture a deeper level of semantic meaning. To provide a concrete study and substantiate the efficiency of our neural architecture, we take Chinese Word Segmentation as a case study: Chinese is a typical example of such a language, in which every character contains several components called radicals. Our networks employ a shared radical-level embedding to solve both Simplified and Traditional Chinese Word Segmentation without extra Traditional-to-Simplified Chinese conversion; this highly end-to-end approach significantly simplifies word segmentation compared to previous work. Radical-level embeddings can also capture deeper semantic meaning below the character level and improve learning performance. Tying radical and character embeddings together reduces the parameter count while sharing and transferring semantic knowledge between the two levels, which substantially boosts performance. On 3 out of 4 Bakeoff 2005 datasets, our method surpassed state-of-the-art results by up to 0.4%. Our results are reproducible; the source code and corpora are available on GitHub.
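
The idea of tying radical and character embeddings can be illustrated by keeping characters and radicals in one vocabulary backed by one shared embedding table. This sketch is hypothetical throughout: the decomposition table is a tiny hand-written example, concatenating the character vector with the mean radical vector is our own choice, and the paper's stacked LSTMs are omitted.

```python
import numpy as np

# Hypothetical character-to-radical decompositions, for illustration only.
DECOMPOSITION = {
    "好": ["女", "子"],
    "妈": ["女", "马"],
    "吗": ["口", "马"],
}

def build_vocab(decomposition):
    """One joint vocabulary for characters and radicals, so both index a single (tied) table."""
    symbols = set(decomposition)
    for radicals in decomposition.values():
        symbols.update(radicals)
    return {s: i for i, s in enumerate(sorted(symbols))}

def embed_with_radicals(text, vocab, table, decomposition):
    """Return a (len(text), 2 * dim) array: character vector followed by mean radical vector."""
    rows = []
    for ch in text:
        char_vec = table[vocab[ch]]
        radicals = decomposition.get(ch, [ch])  # undecomposed characters act as their own radical
        rad_vec = np.mean([table[vocab[r]] for r in radicals], axis=0)
        rows.append(np.concatenate([char_vec, rad_vec]))
    return np.stack(rows)

vocab = build_vocab(DECOMPOSITION)
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 8))  # the shared embedding table, dim = 8
features = embed_with_radicals("妈好", vocab, table, DECOMPOSITION)
```

Because "妈" and "吗" share the radical "马", updates to that radical's row would be shared by both characters, which is the kind of parameter sharing the abstract describes.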

URL

https://arxiv.org/abs/1712.08841

PDF

https://arxiv.org/pdf/1712.08841
Read All
Spot the Difference by Object Detection

2018-01-03

Junhui Wu, Yun Ye, Yu Chen, Zhi Weng

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

In this paper, we propose a simple yet effective solution to a change detection task that detects the differences between two images, which we call “spot the difference”. Our approach uses CNN-based object detection, stacking two aligned images as input and treating the differences between the two images as objects to detect. An early-merging architecture is used as the backbone network. Our method is accurate, fast, and robust while requiring only very cheap annotation. We verify the proposed method on the task of detecting changes between the digital design of a book and a photograph of it. Our object-detection-based method outperforms verification-based methods by a large margin and additionally provides location information. We compress the network and achieve a 24x speedup while maintaining accuracy. In addition, because we synthesize the training data for detection from weakly labeled images, our method does not need expensive bounding box annotation.
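
Early merging simply means the detector sees both images at once as a single multi-channel input. A minimal sketch, assuming the two images are already aligned and stored as H x W x 3 arrays; the function name is hypothetical.

```python
import numpy as np

def early_merge(design, photo):
    """Stack two aligned H x W x 3 images into one H x W x 6 input (early fusion)."""
    if design.shape != photo.shape:
        raise ValueError("images must be aligned to the same shape first")
    return np.concatenate([design, photo], axis=-1)

design = np.zeros((4, 4, 3), dtype=np.float32)
photo = np.ones((4, 4, 3), dtype=np.float32)
x = early_merge(design, photo)  # the backbone's first convolution would take 6 input channels
```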

URL

https://arxiv.org/abs/1801.01051

PDF

https://arxiv.org/pdf/1801.01051
Read All
Single-Shot Refinement Neural Network for Object Detection

2018-01-03

Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

For object detection, the two-stage approach (e.g., Faster R-CNN) has achieved the highest accuracy, whereas the one-stage approach (e.g., SSD) has the advantage of high efficiency. To inherit the merits of both while overcoming their disadvantages, in this paper we propose a novel single-shot detector, called RefineDet, that achieves better accuracy than two-stage methods while maintaining efficiency comparable to one-stage methods. RefineDet consists of two interconnected modules, namely the anchor refinement module and the object detection module. Specifically, the former aims to (1) filter out negative anchors to reduce the search space for the classifier, and (2) coarsely adjust the locations and sizes of anchors to provide better initialization for the subsequent regressor. The latter module takes the refined anchors from the former as input to further improve the regression and predict multi-class labels. Meanwhile, we design a transfer connection block to transfer the features in the anchor refinement module to predict locations, sizes, and class labels of objects in the object detection module. The multi-task loss function enables us to train the whole network end-to-end. Extensive experiments on PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO demonstrate that RefineDet achieves state-of-the-art detection accuracy with high efficiency. Code is available at this https URL
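
The anchor refinement module's two roles, discarding confident negatives and coarsely adjusting anchor boxes, can be sketched in NumPy as below. The (cx, cy, w, h) offset parameterization is the common SSD/Faster R-CNN style encoding and the 0.99 negative threshold is an assumed default; both are illustrative rather than taken from the RefineDet code.

```python
import numpy as np

def decode(anchors, offsets):
    """Apply (dx, dy, dw, dh) regression offsets to (cx, cy, w, h) anchors."""
    cx = anchors[:, 0] + offsets[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + offsets[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(offsets[:, 2])
    h = anchors[:, 3] * np.exp(offsets[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

def refine_anchors(anchors, offsets, bg_prob, neg_threshold=0.99):
    """Drop anchors that are confidently background, then coarsely adjust the survivors.

    The detection module would regress final boxes starting from the refined anchors.
    """
    keep = bg_prob <= neg_threshold
    return decode(anchors[keep], offsets[keep]), keep

anchors = np.array([[10.0, 10.0, 4.0, 4.0], [20.0, 20.0, 8.0, 8.0]])
offsets = np.array([[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
refined, keep = refine_anchors(anchors, offsets, np.array([0.5, 0.999]))
```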

URL

https://arxiv.org/abs/1711.06897

PDF

https://arxiv.org/pdf/1711.06897
Read All
Exploring Architectures, Data and Units For Streaming End-to-End Speech Recognition with RNN-Transducer

2018-01-02

Kanishka Rao, Haşim Sak, Rohit Prabhavalkar

arXiv_CV

arXiv_CV Speech_Recognition RNN Classification Language_Model Recognition
Abstract

We investigate training end-to-end speech recognition models with the recurrent neural network transducer (RNN-T): a streaming, all-neural, sequence-to-sequence architecture that jointly learns acoustic and language model components from transcribed acoustic data. We explore various model architectures and demonstrate how the model can be improved further if additional text or pronunciation data are available. The model consists of an “encoder”, which is initialized from a connectionist temporal classification (CTC) based acoustic model, and a “decoder”, which is partially initialized from a recurrent neural network language model trained on text data alone. The entire neural network is trained with the RNN-T loss and directly outputs the recognized transcript as a sequence of graphemes, thus performing end-to-end speech recognition. We find that performance can be improved further by using sub-word units (“wordpieces”), which capture longer context and significantly reduce substitution errors. The best RNN-T system, a twelve-layer LSTM encoder with a two-layer LSTM decoder trained with 30,000 wordpieces as output targets, achieves a word error rate of 8.5% on voice-search and 5.2% on voice-dictation tasks, comparable to a state-of-the-art baseline at 8.3% on voice-search and 5.4% on voice-dictation.
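
Word error rate, the metric quoted above, is the word-level edit distance (substitutions, deletions and insertions) between the reference and the hypothesis, divided by the number of reference words. A minimal pure-Python sketch; it applies no text normalization, which real evaluations usually do.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Rolling row of the Levenshtein table: prev[j] is the distance between
    # the reference words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

For example, "the bat sat" against the reference "the cat sat" has one substitution in three words, so its WER is 1/3.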

URL

https://arxiv.org/abs/1801.00841

PDF

https://arxiv.org/pdf/1801.00841
Read All
High Performance Architecture for Flow-Table Lookup in SDN on FPGA

2018-01-02

Rashid Hatamia, Hossein Bahramgiria, Ahmad Khonsari

arXiv_CV

arXiv_CV
Abstract

We propose the Range-based Ternary Search Tree (RTST), a tree-based approach for flow-table lookup in software-defined networking (SDN). RTST builds upon the flow tables in SDN switches to provide fast lookup among flows. We present a parallel multi-pipeline architecture for implementing RTST that offers high throughput and low latency. The proposed RTST and architecture achieve a memory efficiency of 1 byte of memory for each byte of flow. We also present a set of techniques to support dynamic updates. Experimental results show that RTST can be used to improve flow-lookup performance. It achieves a throughput of 670 million packets per second (MPPS) for a 1K 15-tuple flow table on a state-of-the-art FPGA.

URL

https://arxiv.org/abs/1801.00840

PDF

https://arxiv.org/pdf/1801.00840
Read All
5G Millimeter Wave Cellular System Capacity with Fully Digital Beamforming

2018-01-02

Sourjya Dutta, C. Nicolas Barati, Aditya Dhananjay, Sundeep Rangan

arXiv_CV

arXiv_CV
Abstract

Because millimeter-wave (mmWave) wireless systems rely heavily on directional links, beamforming (BF) with high-dimensional arrays is essential for cellular systems at these frequencies. How to perform the array processing in a power-efficient manner is a fundamental challenge. Analog and hybrid BF require fewer analog-to-digital converters (ADCs), but can only communicate in a small number of directions at a time, limiting directional search, spatial multiplexing, and control signaling. Digital BF enables flexible spatial processing, but must be operated at a low quantization resolution to stay within reasonable power levels. This paper presents a simple additive white Gaussian noise (AWGN) model to assess the effect of low-resolution quantization on cellular system capacity. Simulations with this model reveal that at moderate resolutions (3-4 bits per ADC), quantization causes negligible loss in downlink cellular capacity. In essence, low-resolution ADCs limit performance only at high SNR, a regime in which cellular systems typically do not operate. The findings suggest that low-resolution fully digital BF architectures can be power efficient and offer greatly enhanced control-plane functionality with data-plane performance comparable to analog BF.
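
One widely used way to model low-resolution ADCs with AWGN is the additive quantization noise model: the quantizer output is (1 - ρ)y + q, where ρ is the normalized quantization mean-squared error and q has variance ρ(1 - ρ)E|y|². Comparing the resulting signal and noise powers gives an effective SNR of (1 - ρ)·SNR / (1 + ρ·SNR), which saturates at (1 - ρ)/ρ as SNR grows. Whether this matches the paper's exact model is an assumption; the ρ values are the standard Lloyd-Max figures for a Gaussian input.

```python
import math

# Normalized MSE of an optimal (Lloyd-Max) scalar quantizer for a Gaussian input.
RHO = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}

def rho(bits):
    if bits in RHO:
        return RHO[bits]
    return math.pi * math.sqrt(3) / 2 * 2 ** (-2 * bits)  # high-resolution approximation

def effective_snr(snr, bits):
    """Linear SNR after b-bit quantization: (1 - rho) * SNR / (1 + rho * SNR)."""
    r = rho(bits)
    return (1 - r) * snr / (1 + r * snr)

def capacity(snr):
    """Shannon capacity in bits/s/Hz for a linear SNR."""
    return math.log2(1 + snr)
```

In this model, 4-bit ADCs at 10 dB SNR keep about 96% of the unquantized capacity, while the effective SNR caps at roughly 20 dB, consistent with the abstract's point that quantization mainly hurts at high SNR.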

URL

https://arxiv.org/abs/1711.02586

PDF

https://arxiv.org/pdf/1711.02586
Read All
Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs

2017-12-30

Xing Di, Vishal M. Patel

arXiv_CV

arXiv_CV Adversarial GAN Face CNN
Abstract

Automatic synthesis of faces from visual attributes is an important problem in computer vision with wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, in which we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then reconstruct the face image from the synthesized sketch. The proposed Attribute2Sketch2Face framework, which is based on a combination of a deep Conditional Variational Autoencoder (CVAE) and Generative Adversarial Networks (GANs), consists of three stages: (1) synthesis of a facial sketch from attributes using a CVAE architecture, (2) enhancement of coarse sketches to produce sharper sketches using a GAN-based framework, and (3) synthesis of a face from the sketch using another GAN-based network. Extensive experiments and comparisons with recent methods verify the effectiveness of the proposed attribute-based three-stage face synthesis method.
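
Stage (1) relies on a CVAE. Its two core pieces, the reparameterized latent sample and the KL term that pulls the attribute-conditioned posterior toward a standard normal prior, are sketched below in NumPy. Conditioning by concatenating the attribute vector is a common choice that we assume here; the encoder and decoder networks themselves are left out.

```python
import numpy as np

def condition(v, attributes):
    """Attach the attribute vector to an encoder input or latent code (assumed conditioning)."""
    return np.concatenate([v, attributes], axis=-1)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I); z stays differentiable in mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)
```

The CVAE loss adds this KL term to a reconstruction loss of the sketch decoded from condition(z, attributes).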

URL

https://arxiv.org/abs/1801.00077

PDF

https://arxiv.org/pdf/1801.00077
Read All
Multi-timescale memory dynamics in a reinforcement learning network with attention-gated memory

2017-12-28

Marco Martinolli, Wulfram Gerstner, Aditya Gilra

arXiv_CV

arXiv_CV Attention Reinforcement_Learning Relation
Abstract

Learning and memory are intertwined in our brain, and their relationship is at the core of several recent neural network models. In particular, the Attention-Gated MEmory Tagging model (AuGMEnT) is a reinforcement learning network with an emphasis on the biological plausibility of memory dynamics and learning. We find that the AuGMEnT network does not solve some hierarchical tasks, in which higher-level stimuli have to be maintained over a long time while lower-level stimuli need to be remembered and forgotten over a shorter timescale. To overcome this limitation, we introduce hybrid AuGMEnT, which has both leaky (short-timescale) and non-leaky (long-timescale) memory units; these allow lower-level information to be exchanged while higher-level information is maintained, thus solving both hierarchical and distractor tasks.
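
The difference between leaky and non-leaky memory units can be seen with a single scalar trace. This is a simplified sketch under our own assumptions (a plain decaying accumulator); AuGMEnT's actual memory update and learning rules are not reproduced.

```python
import numpy as np

def run_memory(inputs, leak):
    """Memory trace m_t = leak * m_{t-1} + x_t; leak = 1 never forgets, leak < 1 decays."""
    m = np.zeros_like(inputs[0], dtype=float)
    trace = []
    for x in inputs:
        m = leak * m + x
        trace.append(m.copy())
    return np.array(trace)

# A cue at t = 0 followed by nine silent steps.
inputs = np.array([[1.0]] + [[0.0]] * 9)
non_leaky = run_memory(inputs, leak=1.0)  # keeps the cue indefinitely
leaky = run_memory(inputs, leak=0.5)      # cue fades: 1, 0.5, 0.25, ...
hybrid = np.concatenate([non_leaky, leaky], axis=1)  # both unit types side by side
```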

URL

https://arxiv.org/abs/1712.10062

PDF

https://arxiv.org/pdf/1712.10062
Read All
Consensus-based Sequence Training for Video Captioning

2017-12-27

Sang Phan, Gustav Eje Henter, Yusuke Miyao, Shin'ichi Satoh

arXiv_CV

arXiv_CV Video_Caption Reinforcement_Learning Caption
Abstract

Captioning models are typically trained using the cross-entropy loss. However, their performance is evaluated on other metrics designed to better correlate with human assessments. Recently, it has been shown that reinforcement learning (RL) can directly optimize these metrics in tasks such as captioning. However, this is computationally costly and requires specifying a baseline reward at each step to make training converge. We propose a fast approach to optimizing one’s objective of interest through the REINFORCE algorithm. First, we show that by replacing model samples with ground-truth sentences, RL training can be seen as a form of weighted cross-entropy loss, giving a fast, RL-based pre-training algorithm. Second, we propose to use the consensus among ground-truth captions of the same video as the baseline reward, which can be computed very efficiently. We call the complete proposal Consensus-based Sequence Training (CST). Applied to the MSRVTT video captioning benchmark, our proposals train significantly faster than comparable methods and establish a new state of the art on the task, improving the CIDEr score from 47.3 to 54.2.
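
The weighted cross-entropy view and the consensus baseline can be sketched as follows. Assumptions: a toy unigram-F1 similarity stands in for CIDEr, each ground-truth caption's reward is its mean similarity to the other references, and the baseline is the mean of those rewards; the exact definitions in CST may differ.

```python
import numpy as np

def unigram_f1(a, b):
    """Toy caption similarity (a stand-in for CIDEr): F1 of unigram overlap."""
    ta, tb = a.split(), b.split()
    common = sum(min(ta.count(w), tb.count(w)) for w in set(ta))
    if common == 0:
        return 0.0
    p, r = common / len(ta), common / len(tb)
    return 2 * p * r / (p + r)

def consensus_weights(captions, metric=unigram_f1):
    """Per-caption weights r_i - b, where r_i is caption i's agreement with the other references."""
    rewards = np.array([
        np.mean([metric(c, o) for j, o in enumerate(captions) if j != i])
        for i, c in enumerate(captions)
    ])
    baseline = rewards.mean()
    return rewards - baseline

def weighted_xent(log_probs, weights):
    """-sum_i w_i * log p(caption_i | video): the RL objective evaluated on ground-truth samples."""
    return -np.sum(weights * np.asarray(log_probs))

captions = ["a man is cooking", "a man cooks food", "a dog runs"]
w = consensus_weights(captions)  # the outlier caption gets a negative weight
```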

URL

https://arxiv.org/abs/1712.09532

PDF

https://arxiv.org/pdf/1712.09532
Read All
Modeling Past and Future for Neural Machine Translation

2017-12-26

Zaixiang Zheng, Hao Zhou, Shujian Huang, Lili Mou, Xinyu Dai, Jiajun Chen, Zhaopeng Tu

arXiv_CL

arXiv_CL Knowledge Attention NMT
Abstract

Existing neural machine translation systems do not explicitly model what has been translated and what has not during the decoding phase. To address this problem, we propose a novel mechanism that separates the source information into two parts, translated Past contents and untranslated Future contents, which are modeled by two additional recurrent layers. The Past and Future contents are fed to both the attention model and the decoder states, giving NMT systems knowledge of the translated and untranslated contents. Experimental results show that the proposed approach significantly improves translation performance on Chinese-English, German-English, and English-German translation tasks. Specifically, the proposed model outperforms the conventional coverage model in both translation quality and alignment error rate.

URL

https://arxiv.org/abs/1711.09502

PDF

https://arxiv.org/pdf/1711.09502
Read All
REDBEE: A Visual-Inertial Drone System for Real-Time Moving Object Detection

2017-12-26

Chong Huang, Peng Chen, Xin Yang, Kwang-Ting (Tim) Cheng

arXiv_CV

arXiv_CV Object_Detection Drone Detection
Abstract

Aerial surveillance and monitoring demand real-time and robust motion detection from a moving camera. Most existing techniques for drones send video data streams back to a ground station with a high-end desktop computer or server. These methods share one major drawback: data transmission is subject to considerable delay and possible corruption. Onboard computation can not only overcome the data corruption problem but also increase the range of motion. Unfortunately, because of their limited weight-bearing capacity, equipping drones with high-performance computing hardware is not feasible. Therefore, a motion detection system with real-time performance and high accuracy for drones with limited computing power is highly desirable. In this paper, we propose REDBEE, a visual-inertial drone system for real-time motion detection that helps overcome the challenges of scenes with strong parallax and dynamic backgrounds. REDBEE, which can run on a state-of-the-art commercial low-power application processor (e.g., the Snapdragon Flight board used for our prototype drone), achieves real-time performance with high detection accuracy. The REDBEE system handles scenes with strong parallax through an inertial-aided dual-plane homography estimation, and it handles scenes with dynamic backgrounds by distinguishing the moving targets through a probabilistic model based on spatial, temporal, and entropy consistency. Experiments demonstrate that our system detects moving targets in outdoor environments with greater accuracy than state-of-the-art real-time onboard detection systems.
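
A core step in homography-based motion detection is checking whether each tracked point moved the way the camera-induced homography predicts. The sketch below shows a single-homography residual test; REDBEE's inertial-aided dual-plane estimation and its probabilistic model are not reproduced, and the 2-pixel threshold is a hypothetical value.

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) pixel coordinates through a 3x3 homography."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def moving_points(H, prev_pts, curr_pts, threshold=2.0):
    """Flag tracked points whose motion is not explained by the camera-induced homography."""
    predicted = apply_homography(H, prev_pts)
    residual = np.linalg.norm(curr_pts - predicted, axis=1)
    return residual > threshold

# Camera motion modelled as a pure 5-pixel shift to the right.
H = np.array([[1.0, 0.0, 5.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
prev_pts = np.array([[0.0, 0.0], [10.0, 10.0]])
curr_pts = np.array([[5.0, 0.0], [25.0, 10.0]])  # the second point moved on its own
flags = moving_points(H, prev_pts, curr_pts)
```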

URL

https://arxiv.org/abs/1712.09162

PDF

https://arxiv.org/pdf/1712.09162
Read All
Improved Training of Wasserstein GANs

2017-12-25

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville

arXiv_CV

arXiv_CV Adversarial GAN Language_Model
Abstract

Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of the gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high-quality generations on CIFAR-10 and LSUN bedrooms.
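
The gradient penalty can be written out for a tiny critic whose input gradient has a closed form, which avoids an autodiff dependency. The one-hidden-layer tanh critic is purely illustrative and λ defaults to 10 here; practical implementations compute the gradient with a framework's automatic differentiation.

```python
import numpy as np

def critic(x, W, b, v):
    """Tiny critic D(x) = v . tanh(W x + b), applied to each row of x."""
    return np.tanh(x @ W.T + b) @ v

def critic_input_grad(x, W, b, v):
    """Closed-form gradient dD/dx for each row of x."""
    a = np.tanh(x @ W.T + b)       # (N, H) hidden activations
    return ((1 - a ** 2) * v) @ W  # (N, D)

def gradient_penalty(real, fake, W, b, v, lam=10.0, rng=None):
    """lam * E[(||grad D(x_hat)||_2 - 1)^2] at random interpolates between real and fake samples."""
    if rng is None:
        rng = np.random.default_rng()
    eps = rng.uniform(size=(len(real), 1))
    x_hat = eps * real + (1 - eps) * fake
    norms = np.linalg.norm(critic_input_grad(x_hat, W, b, v), axis=1)
    return lam * np.mean((norms - 1.0) ** 2)

def critic_loss(real, fake, W, b, v, lam=10.0, rng=None):
    """WGAN-GP critic objective to minimize: E[D(fake)] - E[D(real)] + gradient penalty."""
    return (critic(fake, W, b, v).mean() - critic(real, W, b, v).mean()
            + gradient_penalty(real, fake, W, b, v, lam, rng))
```

Unlike weight clipping, the penalty only constrains the gradient norm along lines between real and generated samples, leaving the critic's weights otherwise free.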

URL

https://arxiv.org/abs/1704.00028

PDF

https://arxiv.org/pdf/1704.00028
Read All
HelPal: A Search System for Mobile Crowd Service

2017-12-25

Yao Wu, Tianzhen Wu, Ziyi Xiong, Yuncheng Wu, Hong Chen, Cuiping Li, Xiaoying Zhang

arXiv_CV

arXiv_CV Face Tracking
Abstract

The proliferation of ubiquitous mobile devices makes location-based services prevalent. Mobile users can volunteer as providers of specific services and, at the same time, search for such services. For example, drivers may be interested in finding available nearby users who are willing to help with motor repair, provide travel directions, or give first aid. With the growing number of mobile users, it is necessary to provide scalable means for such users to connect with other nearby users so that they can help each other with specific services. Motivated by these observations, we design and implement HelPal, a general location-based system in which mobile users provide and receive instant services, which we call mobile crowd service. In this demo, we introduce a mobile crowd service system featuring several novel techniques. We sketch the system architecture and illustrate scenarios through several cases. The demonstration shows a user-friendly search interface that lets users conveniently find skilled and qualified nearby service providers.
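
Finding nearby providers with a given skill reduces, in its simplest form, to a great-circle distance filter. A minimal sketch, assuming providers are plain dictionaries with hypothetical field names; HelPal's own indexing techniques are not described here, and a linear scan is only suitable for small demos.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def find_providers(user, providers, skill, radius_km):
    """Names of providers offering `skill` within `radius_km` of `user`, nearest first."""
    hits = []
    for p in providers:
        if skill in p["skills"]:
            d = haversine_km(user[0], user[1], p["lat"], p["lon"])
            if d <= radius_km:
                hits.append((d, p["name"]))
    return [name for _, name in sorted(hits)]
```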

URL

https://arxiv.org/abs/1712.09010

PDF

https://arxiv.org/pdf/1712.09010
Read All
Change points, memory and epidemic spreading in temporal networks

2017-12-24

Tiago P. Peixoto, Laetitia Gauvin

arXiv_CV

arXiv_CV
Abstract

Dynamic networks exhibit temporal patterns that vary across different time scales, all of which can potentially affect processes that take place on the network. However, most data-driven approaches used to model time-varying networks attempt to capture only a single characteristic time scale in isolation, typically associated either with the short-time memory of a Markov chain or with long-time abrupt changes caused by external or systemic events. Here we propose a unified approach that models both aspects simultaneously, detecting the short- and long-time behaviors of temporal networks. We do so by developing an arbitrary-order mixed Markov model with change points, using a nonparametric Bayesian formulation that allows the Markov order and the positions of change points to be determined from data without overfitting. In addition, we evaluate the quality of the multiscale model by its capacity to reproduce the spreading of epidemics on the temporal network, and we show that describing multiple time scales simultaneously has a synergistic effect: statistically significant features are uncovered that would otherwise remain hidden if each time scale were treated independently.
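
The model's building block, a Markov chain of arbitrary order, can be fitted by counting transitions. This maximum-likelihood sketch is ours and deliberately naive: its likelihood never gets worse as the order grows, which is exactly the overfitting that the paper's nonparametric Bayesian formulation (and its change-point detection, not shown) is designed to avoid.

```python
import math
from collections import Counter, defaultdict

def fit_markov(sequence, order):
    """Maximum-likelihood transition probabilities P(next | previous `order` states)."""
    counts = defaultdict(Counter)
    for t in range(order, len(sequence)):
        context = tuple(sequence[t - order:t])
        counts[context][sequence[t]] += 1
    return {
        ctx: {s: c / sum(nexts.values()) for s, c in nexts.items()}
        for ctx, nexts in counts.items()
    }

def log_likelihood(sequence, order, model):
    """Log-likelihood of `sequence` under `model`; every transition must have been seen in training."""
    return sum(math.log(model[tuple(sequence[t - order:t])][sequence[t]])
               for t in range(order, len(sequence)))
```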

URL

https://arxiv.org/abs/1712.08948

PDF

https://arxiv.org/pdf/1712.08948
Read All
Texture Object Segmentation Based on Affine Invariant Texture Detection

2017-12-23

Jianwei Zhang, Xu Chen, Xuezhong Xiao

arXiv_CV

arXiv_CV Segmentation Detection
Abstract

To segment images with rich textures, a novel detection method based on the principle of affine invariance is proposed. Exploiting the similarity between texture areas, we first apply affine transforms to generate numerous shapes and use the KLT algorithm to verify their similarity. The transforms include rotation, scaling, and perspective deformation to cope with a variety of situations. We then propose an improved LBP method combined with Canny edge detection to handle boundaries in the segmentation process. Moreover, the method's human-computer interaction, which helps separate the matched texture areas from the original images, is user-friendly.
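
For reference, the basic 3x3 local binary pattern on which the improved LBP method builds compares each pixel with its eight neighbours and packs the results into an 8-bit code. The authors' improvements and the Canny coupling are not specified in the abstract, so only the plain operator is sketched here.

```python
import numpy as np

# Offsets of the 8 neighbours, clockwise from the top-left; each contributes one bit.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp(image):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D grayscale image."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) << bit
    return codes

def lbp_histogram(image):
    """256-bin histogram of LBP codes, a common texture descriptor."""
    return np.bincount(lbp(image).ravel(), minlength=256)
```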

URL

https://arxiv.org/abs/1712.08776

PDF

https://arxiv.org/pdf/1712.08776
Read All
SFCN-OPI: Detection and Fine-grained Classification of Nuclei Using Sibling FCN with Objectness Prior Interaction

2017-12-22

Yanning Zhou, Qi Dou, Hao Chen, Jing Qin, Pheng-Ann Heng

arXiv_CV

arXiv_CV CNN Classification Detection
Abstract

Cell nuclei detection and fine-grained classification are fundamental yet challenging problems in histopathology image analysis. Because of the tiny size of nuclei, significant inter- and intra-class variances, and inferior image quality, previous automated methods often suffer from limited accuracy and robustness. Meanwhile, existing approaches usually deal with these two tasks independently, neglecting their close relationship. In this paper, we present a novel method, a sibling fully convolutional network with prior objectness interaction (called SFCN-OPI), that tackles the two tasks simultaneously and interactively in a unified end-to-end framework. Specifically, the sibling FCN branches share features in earlier layers while keeping separate higher layers for their specific tasks. More importantly, the detection branch outputs an objectness prior that dynamically interacts with the fine-grained classification sibling branch during training and testing. With this mechanism, the fine-grained classification focuses on regions with high confidence of nuclei existence and outputs the conditional probability, which in turn benefits detection through backpropagation. Extensive experiments on colon cancer histology images validate the effectiveness of the proposed SFCN-OPI, and our method outperforms state-of-the-art methods by a large margin.
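
The objectness prior interaction amounts to factoring the joint probability as P(nucleus) times P(class | nucleus). A minimal NumPy sketch with hypothetical array shapes; in SFCN-OPI both maps come from the sibling FCN branches and the product is trained end to end.

```python
import numpy as np

def combine_objectness(objectness, class_given_nucleus):
    """Joint per-pixel map: P(class, nucleus) = P(nucleus) * P(class | nucleus).

    objectness:          (H, W) detection-branch probability that a nucleus is present
    class_given_nucleus: (C, H, W) classification-branch distribution over C nucleus types
    """
    return objectness[None, :, :] * class_given_nucleus

objectness = np.array([[1.0, 0.0]])
class_given_nucleus = np.array([[[0.7, 0.5]], [[0.3, 0.5]]])
joint = combine_objectness(objectness, class_given_nucleus)
```

Where the detection branch sees no nucleus, every class probability is suppressed, so classification effectively concentrates on high-objectness regions.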

URL

https://arxiv.org/abs/1712.08297

PDF

https://arxiv.org/pdf/1712.08297
Read All
Phonon transport unveils the prevalent point defects in GaN

2017-12-21

Ankita Katre, Jesús Carrete, Tao Wang, Georg K. H. Madsen, Natalio Mingo

arXiv_CV

arXiv_CV GAN
Abstract

Determining the types and concentrations of vacancies present in intentionally doped GaN is a notoriously difficult and long-debated problem. Here we use an unconventional approach, based on thermal transport modeling, to determine the prevalence of vacancies in previous measurements. This allows us to provide conclusive evidence of the recent hypothesis that gallium vacancies in ammonothermally grown samples can be complexed with hydrogen. Our calculations for O-doped and Mg-O co-doped samples yield a consistent picture interlinking dopant and vacancy concentration, carrier density, and thermal conductivity, in excellent agreement with experimental measurements. These results also highlight the predictive power of ab initio phonon transport modeling, and its value for understanding and quantifying defects in semiconductors.

URL

https://arxiv.org/abs/1712.08124

PDF

https://arxiv.org/pdf/1712.08124
Read All
Memory-induced mechanism for self-sustaining activity in networks

2017-12-21

A. E. Allahverdyan, G. Ver Steeg, A. Galstyan

arXiv_CV

arXiv_CV Attention
Abstract

We study a mechanism for sustaining activity on networks, inspired by a well-known model of neuronal dynamics. Our primary focus is the emergence of self-sustaining collective activity patterns, where no single node can stay active by itself, but the activity provided initially is sustained within the collective of interacting agents. In contrast to existing models of self-sustaining activity that are caused by (long) loops present in the network, here we focus on tree-like structures and examine activation mechanisms that are due to the temporal memory of the nodes. This approach is motivated by applications in social media, where long network loops are rare or absent. Our results suggest that under weak behavioral noise, the nodes robustly split into several clusters, with partial synchronization of nodes within each cluster. We also study a randomly weighted version of the models in which the nodes are allowed to change their connection strengths (this can model attention redistribution), and show that it does facilitate self-sustained activity.

URL

https://arxiv.org/abs/1712.07844

PDF

https://arxiv.org/pdf/1712.07844
Read All
Exploring Models and Data for Remote Sensing Image Caption Generation

2017-12-21

Xiaoqiang Lu, Binqiang Wang, Xiangtao Zheng, Xuelong Li

arXiv_CV

arXiv_CV Image_Caption Review Attention Caption Classification Detection
Abstract

With the recent development of artificial satellites, remote sensing images have attracted extensive attention. Noticeable progress has recently been made in scene classification and target detection. However, it is still not clear how to describe remote sensing image content with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, annotation instructions that account for the special characteristics of remote sensing images are presented, so that these images can be described better. Second, to exploit the contents of remote sensing images exhaustively, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review of the proposed data set is presented to advance the task of remote sensing image captioning. Extensive experiments on the proposed data set demonstrate that the content of remote sensing images can be completely described by generated language descriptions. The data set is available at this https URL

URL

https://arxiv.org/abs/1712.07835

PDF

https://arxiv.org/pdf/1712.07835
Read All
Order-Free RNN with Visual Attention for Multi-Label Classification

2017-12-20

Shang-Fu Chen, Yi-Chen Chen, Chih-Kuan Yeh, Yu-Chiang Frank Wang

arXiv_CV

arXiv_CV Image_Caption Knowledge Attention Caption Inference RNN Classification Prediction
Abstract

In this paper, we propose jointly learning attention and recurrent neural network (RNN) models for multi-label classification. While approaches based on either model exist (e.g., for the task of image captioning), training such existing network architectures typically requires pre-defined label sequences. For multi-label classification, it would be desirable to have a robust inference process, so that prediction errors do not propagate and degrade performance. Our proposed model uniquely integrates attention and Long Short-Term Memory (LSTM) models, which not only addresses the above problem but also allows one to identify visual objects of interest with varying sizes without prior knowledge of a particular label ordering. More importantly, label co-occurrence information can be jointly exploited by our LSTM model. Finally, by extending the beam search technique, our proposed network model can predict multiple labels efficiently.
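
Beam search over label sets differs from captioning beam search mainly in that each label may be emitted only once and a stop symbol ends the set. The sketch below assumes a hypothetical `step_log_probs` callback standing in for the attention-LSTM, and a hypothetical end token; it is not the paper's exact procedure.

```python
import math

def beam_search_labels(step_log_probs, num_labels, beam_width=3, max_len=5):
    """Beam search over label sequences with no repeated labels.

    step_log_probs(prefix) returns num_labels + 1 log-probabilities; the last
    entry is a hypothetical <end> token that closes the label set.
    Returns the highest-scoring label set.
    """
    end = num_labels
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for label, lp in enumerate(step_log_probs(prefix)):
                if label == end:
                    finished.append((prefix, score + lp))
                elif label not in prefix:  # each label is predicted at most once
                    candidates.append((prefix + (label,), score + lp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break
    finished.extend(beams)  # beams still open at max_len count as complete
    best_prefix, _ = max(finished, key=lambda c: c[1])
    return set(best_prefix)
```

Because a finished set is scored regardless of the order in which its labels were emitted, sequences such as (0, 1) and (1, 0) lead to the same prediction, which fits the order-free spirit of the model.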

URL

https://arxiv.org/abs/1707.05495

PDF

https://arxiv.org/pdf/1707.05495
Read All
A Flexible Approach to Automated RNN Architecture Generation

2017-12-20

Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher

arXiv_CV

arXiv_CV Knowledge NAS Reinforcement_Learning RNN Language_Model
Abstract

The process of designing neural architectures requires expert knowledge and extensive trial and error. While automated architecture search may simplify these requirements, the recurrent neural network (RNN) architectures generated by existing methods are limited in both flexibility and components. We propose a domain-specific language (DSL) for use in automated architecture search that can produce novel RNNs of arbitrary depth and width. The DSL is flexible enough to define standard architectures such as the Gated Recurrent Unit and Long Short-Term Memory, and it allows the introduction of non-standard RNN components such as trigonometric curves and layer normalization. Using two different candidate generation techniques, random search with a ranking function and reinforcement learning, we explore the novel architectures produced by the RNN DSL for the language modeling and machine translation domains. The resulting architectures do not follow human intuition yet perform well on their targeted tasks, suggesting that the space of usable RNN architectures is far larger than previously assumed.
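
A DSL of this kind can be represented as nested operator expressions that an evaluator walks recursively. The operator names and the tuple encoding below are our own illustrative choices, not the paper's grammar; the example encodes a plain Elman cell and shows how a non-standard operator such as a sine slots in.

```python
import numpy as np

def evaluate(expr, env):
    """Recursively evaluate a nested-tuple RNN expression.

    Leaves are names looked up in `env` (e.g. "x_t", "h_prev");
    ("MM", name, sub) multiplies sub by the weight matrix env[name].
    """
    if isinstance(expr, str):
        return env[expr]
    op, *args = expr
    if op == "MM":
        name, sub = args
        return env[name] @ evaluate(sub, env)
    vals = [evaluate(a, env) for a in args]
    if op == "Add":
        return vals[0] + vals[1]
    if op == "Mult":
        return vals[0] * vals[1]
    if op == "Tanh":
        return np.tanh(vals[0])
    if op == "Sigmoid":
        return 1.0 / (1.0 + np.exp(-vals[0]))
    if op == "Sin":
        return np.sin(vals[0])
    raise ValueError(f"unknown operator {op!r}")

# An Elman-style cell, h_t = tanh(W x_t + U h_{t-1}), written in this toy DSL.
ELMAN = ("Tanh", ("Add", ("MM", "W", "x_t"), ("MM", "U", "h_prev")))
```

A search procedure (random or RL-based) would generate and rank such trees; the evaluator only fixes what each generated tree computes.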

URL

https://arxiv.org/abs/1712.07316

PDF

https://arxiv.org/pdf/1712.07316
Read All
Hyperparameters Optimization in Deep Convolutional Neural Network / Bayesian Approach with Gaussian Process Prior

2017-12-19

Pushparaja Murugan

arXiv_CV

arXiv_CV CNN Optimization
Abstract

Convolutional neural networks, also known as ConvNets, have been extensively used in many complex machine learning tasks. However, hyperparameter optimization is a crucial step in developing ConvNet architectures, since their accuracy and performance depend on the hyperparameters. This multilayered architecture is parameterized by a set of hyperparameters such as the number of convolutional layers, the number of fully connected dense layers and neurons, the dropout probability, and the learning rate. Searching the hyperparameter space is therefore very difficult for such complex hierarchical architectures. Many methods have been proposed over the past decade to explore the hyperparameter space and find the optimal set of hyperparameter values. Grid search and random search are reportedly inefficient and extremely expensive because of the large number of hyperparameters in the architecture. Sequential model-based Bayesian optimization is therefore a promising alternative technique for finding the extremum of the unknown cost function. A recent study by Snoek et al. that applied Bayesian optimization to nine convolutional network hyperparameters achieved the lowest reported error on the CIFAR-10 benchmark. This article is intended to provide an overview of the mathematical concepts behind Bayesian optimization with a Gaussian process prior.
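
The Bayesian optimization loop described above boils down to a Gaussian process posterior plus an acquisition function. Below is a minimal 1-D sketch using an RBF kernel with a fixed length scale and expected improvement for minimization; the kernel choice, ξ = 0.01 and the toy data are assumptions, and the sketch does not reproduce Snoek et al.'s setup.

```python
import math
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel k(a, b) = exp(-(a - b)^2 / (2 l^2)) for 1-D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length ** 2))

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP with an RBF prior."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    v = np.linalg.solve(K, K_s)
    var = 1.0 - np.sum(K_s * v, axis=0)  # k(x, x) = 1 for the RBF kernel
    return mean, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mean, std, best, xi=0.01):
    """EI for minimization: E[max(best - f(x) - xi, 0)] under the GP posterior."""
    z = (best - mean - xi) / std
    cdf = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mean - xi) * cdf + std * pdf
```

With, say, log learning rates as x and validation errors as y, one iteration evaluates expected_improvement over a grid of candidates, trains the ConvNet at the argmax, and appends the new observation.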

URL

https://arxiv.org/abs/1712.07233

PDF

https://arxiv.org/pdf/1712.07233
Attentive Memory Networks: Efficient Machine Reading for Conversational Search

2017-12-19

Tom Kenter, Maarten de Rijke

arXiv_CV

arXiv_CV Knowledge Memory_Networks
Abstract

Recent advances in conversational systems have changed the search paradigm. Traditionally, a user poses a query to a search engine that returns an answer based on its index, possibly leveraging external knowledge bases and conditioning the response on earlier interactions in the search session. In a natural conversation, there is an additional source of information to take into account: utterances produced earlier in a conversation can also be referred to, and a conversational IR system has to keep track of information conveyed by the user during the conversation, even if it is implicit. We argue that the process of building a representation of the conversation can be framed as a machine reading task, where an automated system is presented with a number of statements about which it should answer questions. The questions should be answered solely by referring to the statements provided, without consulting external knowledge. The time is right for the information retrieval community to embrace this task, both as a stand-alone task and integrated in a broader conversational search setting. In this paper, we focus on machine reading as a stand-alone task and present the Attentive Memory Network (AMN), an end-to-end trainable machine reading algorithm. Its key contribution is efficiency, achieved by a hierarchical input encoder that iterates over the input only once. Speed is an important requirement in conversational search, as gaps between conversational turns have a detrimental effect on naturalness. On 20 datasets commonly used for evaluating machine reading algorithms, we show that the AMN achieves performance comparable to state-of-the-art models while using considerably fewer computations.
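The combination of a hierarchical encoder and attention over stored statements can be illustrated with a generic memory-network-style read. This is a minimal sketch, assuming mean pooling of word vectors as a stand-in for the AMN's recurrent encoders and plain dot-product attention with a fixed number of hops; it is not the paper's architecture, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average word vectors into one statement vector (stand-in for an RNN encoder)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def encode_statements(statements: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Hierarchical encoding: words -> statement vectors, reading each word once."""
    return [mean_pool(words) for words in statements]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], memory: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Dot-product attention: weight each memory slot by its similarity to the query."""
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    read = [sum(w * slot[d] for w, slot in zip(weights, memory))
            for d in range(len(query))]
    return weights, read


def answer_vector(query: Sequence[float], memory: Sequence[Sequence[float]],
                  hops: int = 2) -> Vector:
    """Refine the query by adding what it reads from memory on each hop."""
    q = list(query)
    for _ in range(hops):
        _, read = attend(q, memory)
        q = [a + b for a, b in zip(q, read)]
    return q
```

The efficiency argument in the abstract corresponds to `encode_statements` touching every word exactly once; the attention hops then operate only on the much shorter list of statement vectors.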

URL

https://arxiv.org/abs/1712.07229

PDF

https://arxiv.org/pdf/1712.07229
Learning Fixation Point Strategy for Object Detection and Classification

2017-12-19

Jie Lyu, Zejian Yuan, Dapeng Chen

arXiv_CV

arXiv_CV Object_Detection Attention Classification Detection
Abstract

We propose a novel recurrent attentional structure to localize and recognize objects jointly. The network can learn to extract a sequence of local observations with detailed appearance and rough context, instead of applying sliding windows or convolutions to the entire image. These observations are then fused to complete the detection and classification tasks. For training, we present a hybrid loss function to learn the parameters of the multi-task network end-to-end. In particular, the combination of a stochastic strategy and an object-awareness strategy, named SA, can select richer context and ensure that the last fixation is close to the object. In addition, we build a real-world dataset to verify the capacity of our method to detect objects of interest, including small ones. Our method can predict a precise bounding box on an image and achieves high speed on large images without pooling operations. Experimental results indicate that the proposed method can mine effective context from a few local observations. Moreover, precision and speed can easily be improved by changing the number of recurrent steps. Finally, we will release the source code of our proposed approach.
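A "local observation with detailed appearance and rough context" can be pictured as a two-scale glimpse taken at a fixation point. This is a minimal sketch, assuming a sharp central patch plus a wider patch that is block-averaged down to the same size, a common sensor design in recurrent attention models but not necessarily the one used in the paper. The image is a plain list of rows, and `crop`, `downsample`, `glimpse` and `observe` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]


def crop(image: Image, cy: int, cx: int, size: int, pad: float = 0.0) -> Image:
    """Return a size x size patch centred at (cy, cx), padding outside the image."""
    h, w = len(image), len(image[0])
    half = size // 2
    patch = []
    for y in range(cy - half, cy - half + size):
        row = []
        for x in range(cx - half, cx - half + size):
            row.append(image[y][x] if 0 <= y < h and 0 <= x < w else pad)
        patch.append(row)
    return patch


def downsample(patch: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks to obtain coarse context."""
    n = len(patch) // factor
    return [[sum(patch[i * factor + a][j * factor + b]
                 for a in range(factor) for b in range(factor)) / factor ** 2
             for j in range(n)]
            for i in range(n)]


def glimpse(image: Image, cy: int, cx: int, size: int = 4,
            scale: int = 2) -> Tuple[Image, Image]:
    """Two-scale observation: sharp centre patch plus blurred wider surroundings."""
    fine = crop(image, cy, cx, size)
    coarse = downsample(crop(image, cy, cx, size * scale), scale)
    return fine, coarse


def observe(image: Image, fixations: Sequence[Tuple[int, int]],
            size: int = 4, scale: int = 2) -> List[Tuple[Image, Image]]:
    """Collect one glimpse per fixation point, in the order visited."""
    return [glimpse(image, cy, cx, size, scale) for cy, cx in fixations]
```

Only the pixels inside each glimpse are read, so the cost per step is independent of the full image size; this is the intuition behind the abstract's claim of high speed on large images, and adding recurrent steps simply means adding fixations.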

URL

https://arxiv.org/abs/1712.06897

PDF

https://arxiv.org/pdf/1712.06897
Synthesizing Novel Pairs of Image and Text

2017-12-18

Jason Xie, Tingwen Bao

arXiv_CV

arXiv_CV Image_Caption Adversarial GAN Caption
Abstract

Generating novel pairs of images and text is a problem that combines computer vision and natural language processing. In this paper, we present strategies for generating novel image and caption pairs based on existing captioning datasets. The model takes advantage of recent advances in generative adversarial networks and sequence-to-sequence modeling. We also generalize the model to generate paired samples from multiple domains. Furthermore, we study cycles, generating from image to text and then back to image and vice versa, as well as their connection with autoencoders.
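The link between cycles and autoencoders can be seen in a toy cycle-consistency loss: an image-to-text generator followed by a text-to-image generator is an autoencoder whose bottleneck is the text representation, and the squared reconstruction error measures how well the cycle closes. In this sketch both "generators" are plain linear maps on feature vectors, an assumption made purely for illustration; the paper's GAN and sequence-to-sequence models are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Apply a linear map (stand-in for a generator) to a feature vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cycle_loss(x: Sequence[float], to_other: Matrix, back: Matrix) -> float:
    """Squared error after mapping x to the other modality and back again."""
    recon = matvec(back, matvec(to_other, x))
    return sum((a - b) ** 2 for a, b in zip(x, recon))


def total_cycle_loss(images: Sequence[Vector], texts: Sequence[Vector],
                     img2txt: Matrix, txt2img: Matrix) -> float:
    """Sum both directions: image -> text -> image and text -> image -> text."""
    return (sum(cycle_loss(x, img2txt, txt2img) for x in images)
            + sum(cycle_loss(t, txt2img, img2txt) for t in texts))
```

When the two maps invert each other the loss is zero in both directions; any mismatch shows up as reconstruction error, exactly as it would for an autoencoder's encoder and decoder.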

URL

https://arxiv.org/abs/1712.06682

PDF

https://arxiv.org/pdf/1712.06682
Fermi level and bands offsets determination in insulating (Ga,Mn)N/GaN structures

2017-12-16

L. Janicki, G. Kunert, M. Sawicki, E. Piskorska-Hommel, K. Gas, R. Jakiela, D. Hommel, R. Kudrawiec

arXiv_CV

arXiv_CV GAN Face
Abstract

The Fermi level position in (Ga,Mn)N has been determined from the period analysis of GaN-related Franz-Keldysh oscillations obtained by contactless electroreflectance in a series of GaN/Ga$_{1-x}$Mn$_x$N/GaN(template) bilayers with various Mn concentrations $x$, carefully prepared by molecular beam epitaxy. It is shown that the Fermi level in (Ga,Mn)N is strongly pinned in the middle of the band gap and that the thickness of the depletion layer is negligibly small. For $x > 0.1\%$, the Fermi level is located about 1.25 to 1.55 eV above the valence band, that is, very close to but visibly below the Mn-related Mn$^{2+}$/Mn$^{3+}$ impurity band. The accumulated data allow us to estimate the Mn-related band offsets at the (Ga,Mn)N/GaN interface. Most of the band gap change in (Ga,Mn)N is found to take place in the valence band on the absolute scale, amounting to $-0.028 \pm 0.008$ eV/% Mn. The strong Fermi level pinning in the middle of the band gap, the absence of carrier conductivity within the Mn-related impurity band, and good homogeneity enable a novel functionality of (Ga,Mn)N as a semi-insulating buffer layer for applications in GaN-based heterostructures.
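As a worked example of the reported coefficient, assume the Mn-related change scales linearly with the Mn concentration $x$ expressed in percent, as the units eV/% Mn suggest. Linearity outside the measured range is an assumption, and $x = 5\%$ below is chosen only to illustrate the arithmetic, with the uncertainty scaled by the same factor:

```latex
\Delta E(x) \approx \left(-0.028 \pm 0.008\right)\,\mathrm{eV} \times \frac{x}{1\%},
\qquad
\Delta E(5\%) \approx \left(-0.140 \pm 0.040\right)\,\mathrm{eV}.
```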

URL

https://arxiv.org/abs/1712.06008

PDF

https://arxiv.org/pdf/1712.06008
