Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Sparse and Constrained Attention for Neural Machine Translation

2018-05-21

Chaitanya Malaviya, Pedro Ferreira, André F. T. Martins

arXiv_CL

arXiv_CL Sparse Attention NMT
Abstract

In NMT, words are sometimes dropped from the source or generated repeatedly in the translation. We explore novel strategies to address the coverage problem that change only the attention transformation. Our approach allocates fertilities to source words, used to bound the attention each word can receive. We experiment with various sparse and constrained attention transformations and propose a new one, constrained sparsemax, shown to be differentiable and sparse. Empirical evaluation is provided in three language pairs.
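To make the idea of a sparse attention transformation concrete, here is a minimal sketch of plain sparsemax, the Euclidean projection of a score vector onto the probability simplex. It is not the constrained sparsemax the abstract proposes: it has no fertility upper bounds, and the function name and list-based interface are assumptions made for illustration only.

```python
def sparsemax(scores):
    """Project `scores` onto the probability simplex (plain sparsemax).

    Unlike softmax, the result can contain exact zeros.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        # The k-th largest score is in the support while
        # 1 + k * z_k exceeds the sum of the k largest scores.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    # Shift by the threshold tau and clip negatives to zero.
    return [max(s - tau, 0.0) for s in scores]


if __name__ == "__main__":
    print(sparsemax([1.0, 0.8, 0.1]))
```

The weights still sum to one, but low-scoring source words receive exactly zero attention, which is the sparsity property the abstract refers to.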

URL

https://arxiv.org/abs/1805.08241

PDF

https://arxiv.org/pdf/1805.08241
Read All
The Exoplanet Population Observation Simulator. I - The Inner Edges of Planetary Systems

2018-05-21

Gijs D. Mulders, Ilaria Pascucci, Daniel Apai, Fred J. Ciesla

arXiv_CV

arXiv_CV Knowledge Survey
Abstract

The Kepler survey provides a statistical census of planetary systems out to the habitable zone. Because most planets are non-transiting, orbital architectures are best estimated using simulated observations of ensemble populations. Here, we introduce EPOS, the Exoplanet Population Observation Simulator, to estimate the prevalence and orbital architectures of multi-planet systems based on the latest Kepler data release, DR25. We estimate that at least 42% of sun-like stars have nearly coplanar planetary systems with 7 or more exoplanets. The fraction of stars with at least one planet within 1 au could be as high as 100% depending on assumptions about the distribution of single transiting planets. We estimate an occurrence rate of planets in the habitable zone around sun-like stars of η_Earth = 36 ± 14%. The innermost planets in multi-planet systems are clustered around an orbital period of 10 days (0.1 au), which is reminiscent of the protoplanetary disk inner edge or could be explained by a planet trap at that location. Only a small fraction of planetary systems have the innermost planet at long orbital periods, with fewer than ~8% and ~3% having no planet interior to the orbit of Mercury and Venus, respectively. These results reinforce the view that the solar system is not a typical planetary system, but an outlier among the distribution of known exoplanetary systems. We predict that at least half of the habitable zone exoplanets are accompanied by (non-transiting) planets at shorter orbital periods, hence knowledge of a close-in exoplanet could be used as a way to optimize the search for Earth-size planets in the Habitable Zone with future direct imaging missions.

URL

https://arxiv.org/abs/1805.08211

PDF

https://arxiv.org/pdf/1805.08211
Read All
Italian center for Astronomical Archives publishing solution: modular and distributed

2018-05-21

Marco Molinaro, Nicola F. Calabria, Robert Butora, Sonia Zorba, Riccardo Smareglia

arXiv_CV

arXiv_CV Knowledge Face
Abstract

The Italian center for Astronomical Archives tries to provide astronomical data resources as interoperable services based on IVOA standards. Its VO expertise and knowledge come from active participation within IVOA and VO at the European and international levels, with a twofold goal: to learn from the collaboration and to provide input to the community. The first solution to build an easy to configure and maintain resource publisher conformant to VO standards proved to be too optimistic. For this reason it has been necessary to re-think the architecture with a modular system built around the messaging concept, where each modular component speaks to the other interested parties through a system of broker-managed queues. The first implemented protocol, the Simple Cone Search, shows the messaging task architecture connecting the parametric HTTP interface to the database backend access module and the logging module, and allows multiple cone search resources to be managed together through a configuration manager module. Even if relatively young, it has already proved the flexibility required by the overall system when the database backend changed from MySQL to PostgreSQL+PgSphere. Another implementation test has been made to leverage task distribution over multiple servers to serve simultaneously: FITS cubes direct linking, cubes cutout and cubes positional merging. Currently the implementation of the SIA-2.0 standard protocol is ongoing while for TAP we will be adapting the TAPlib library. Alongside these tools a first administration tool (TASMAN) has been developed to ease the build up and maintenance of TAP_SCHEMA-ta including also ObsCore maintenance capability. Future work will be devoted to widening the range of VO protocols covered by the set of available modules, improving the configuration management and developing specific purpose modules common to all the service components.

URL

https://arxiv.org/abs/1805.08040

PDF

https://arxiv.org/pdf/1805.08040
Read All
Object Detection in Equirectangular Panorama

2018-05-21

Wenyan Yang, Yanlin Qian, Francesco Cricri, Lixin Fan, Joni-Kristian Kamarainen

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

We introduce a high-resolution equirectangular panorama (360-degree, virtual reality) dataset for object detection and propose a multi-projection variant of the YOLO detector. The main challenges with equirectangular panorama images are i) the lack of annotated training data, ii) high-resolution imagery and iii) severe geometric distortions of objects near the panorama projection poles. In this work, we address these challenges by i) using training examples available in the “conventional datasets” (ImageNet and COCO), ii) employing only low-resolution images that require only moderate GPU computing power and memory, and iii) handling projection distortions in our multi-projection YOLO by making multiple stereographic sub-projections. In our experiments, YOLO outperforms the other state-of-the-art detector, Faster RCNN, and our multi-projection YOLO achieves the best accuracy with low-resolution input.
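As a rough illustration of the geometry involved, the sketch below maps one equirectangular pixel to a point on the unit sphere and then applies a single stereographic projection onto the plane tangent at the image centre. The coordinate conventions (longitude zero at the image centre, projection from the antipodal point onto the plane x = 1) and the function names are assumptions chosen for this example; the paper's multi-projection setup is not reproduced here.

```python
import math


def pixel_to_sphere(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit vector (x, y, z)."""
    lon = (u / width - 0.5) * 2.0 * math.pi   # longitude in [-pi, pi)
    lat = (0.5 - v / height) * math.pi        # latitude in (-pi/2, pi/2]
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def stereographic(point):
    """Project a unit vector from (-1, 0, 0) onto the plane x = 1."""
    x, y, z = point
    if x <= -1.0 + 1e-12:
        raise ValueError("the projection pole has no image on the plane")
    scale = 2.0 / (1.0 + x)
    return (scale * y, scale * z)


if __name__ == "__main__":
    # The centre pixel of a 1024x512 panorama maps to the plane origin.
    print(stereographic(pixel_to_sphere(512, 256, 1024, 512)))
```

Points near the tangent direction are only mildly stretched, which is why several sub-projections with different tangent directions can cover the sphere with limited distortion.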

URL

https://arxiv.org/abs/1805.08009

PDF

https://arxiv.org/pdf/1805.08009
Read All
Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

2018-05-21

Marcella Cornia, Lorenzo Baraldi, Giuseppe Serra, Rita Cucchiara

arXiv_CV

arXiv_CV Image_Caption Salient Attention Caption CNN RNN Prediction Quantitative
Abstract

Image captioning has been recently gaining a lot of attention thanks to the impressive achievements shown by deep captioning architectures, which combine Convolutional Neural Networks to extract image representations, and Recurrent Neural Networks to generate the corresponding captions. At the same time, a significant research effort has been dedicated to the development of saliency prediction models, which can predict human eye fixations. Even though saliency information could be useful to condition an image captioning architecture, by providing an indication of what is salient and what is not, research is still struggling to incorporate these two techniques. In this work, we propose an image captioning approach in which a generative recurrent neural network can focus on different parts of the input image during the generation of the caption, by exploiting the conditioning given by a saliency prediction model on which parts of the image are salient and which are contextual. We show, through extensive quantitative and qualitative experiments on large scale datasets, that our model achieves superior performances with respect to captioning baselines with and without saliency, and to different state of the art approaches combining saliency and captioning.

URL

https://arxiv.org/abs/1706.08474

PDF

https://arxiv.org/pdf/1706.08474
Read All
Tree Memory Networks for Modelling Long-term Temporal Dependencies

2018-05-20

Tharindu Fernando, Simon Denman, Aaron McFadyen, Sridha Sridharan, Clinton Fookes

arXiv_CV

arXiv_CV RNN Prediction Relation Memory_Networks VQA
Abstract

In the domain of sequence modelling, Recurrent Neural Networks (RNN) have been capable of achieving impressive results in a variety of application areas including visual question answering, part-of-speech tagging and machine translation. However, this success in modelling short term dependencies has not successfully transitioned to application areas such as trajectory prediction, which require capturing both short term and long term relationships. In this paper, we propose a Tree Memory Network (TMN) for modelling long term and short term relationships in sequence-to-sequence mapping problems. The proposed network architecture is composed of an input module, a controller and a memory module. In contrast to related literature, which models the memory as a sequence of historical states, we model the memory as a recursive tree structure. This structure more effectively captures temporal dependencies across both short term and long term sequences using its hierarchical structure. We demonstrate the effectiveness and flexibility of the proposed TMN in two practical problems, aircraft trajectory modelling and pedestrian trajectory modelling in a surveillance setting, and in both cases we outperform the current state-of-the-art. Furthermore, we perform an in-depth analysis of the evolution of the memory module content over time and provide visual evidence of how the proposed TMN is able to map both long term and short term relationships efficiently via a hierarchical structure.

URL

https://arxiv.org/abs/1703.04706

PDF

https://arxiv.org/pdf/1703.04706
Read All
The IIT Bombay English-Hindi Parallel Corpus

2018-05-19

Anoop Kunchukuttan, Pratik Mehta, Pushpak Bhattacharyya

arXiv_CL

arXiv_CL Knowledge NMT
Abstract

We present the IIT Bombay English-Hindi Parallel Corpus. The corpus is a compilation of parallel corpora previously available in the public domain as well as new parallel corpora we collected. The corpus contains 1.49 million parallel segments, of which 694k segments were not previously available in the public domain. The corpus has been pre-processed for machine translation, and we report baseline phrase-based SMT and NMT translation results on this corpus. This corpus has been used in two editions of shared tasks at the Workshop on Asian Language Translation (2016 and 2017). The corpus is freely available for non-commercial research. To the best of our knowledge, this is the largest publicly available English-Hindi parallel corpus.

URL

https://arxiv.org/abs/1710.02855

PDF

https://arxiv.org/pdf/1710.02855
Read All
Optimizing the F-measure for Threshold-free Salient Object Detection

2018-05-19

Kai Zhao, Shanghua Gao, Qibin Hou, Dan-Dan Li, Ming-Ming Cheng

arXiv_CV

arXiv_CV Salient Object_Detection Optimization Prediction Detection
Abstract

Current CNN-based solutions to salient object detection (SOD) mainly rely on the optimization of cross-entropy loss (CELoss). Then the quality of detected saliency maps is often evaluated in terms of F-measure. In this paper, we investigate an interesting issue: can we consistently use the F-measure formulation in both training and evaluation for SOD? By reformulating the standard F-measure we propose the relaxed F-measure, which is differentiable w.r.t. the posterior and can be easily appended to the back of CNNs as the loss function. Compared to the conventional cross-entropy loss, whose gradients decrease dramatically in the saturated area, our loss function, named FLoss, holds considerable gradients even when the activation approaches the target. Consequently, the FLoss can continuously force the network to produce polarized activations. Comprehensive benchmarks on several popular datasets show that FLoss outperforms state-of-the-art methods by a considerable margin. More specifically, due to the polarized predictions, our method is able to obtain high quality saliency maps without carefully tuning the optimal threshold, showing significant advantages in real world applications.
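A relaxed F-measure can be sketched by replacing hard true-positive counts with sums over continuous predictions, which turns the standard formula F = (1 + β²)·TP / (β²·(TP + FN) + (TP + FP)) into a smooth function of the predicted probabilities. The version below is a minimal illustration under that assumption; the function names, the default β² = 0.3 and the small `eps` term are choices made for this example and may differ from the paper's implementation.

```python
def relaxed_f_measure(pred, target, beta2=0.3, eps=1e-10):
    """Soft F-measure for predictions in [0, 1] and binary targets.

    TP is relaxed to sum(pred * target); TP + FN equals sum(target)
    and TP + FP equals sum(pred).
    """
    tp = sum(p * t for p, t in zip(pred, target))
    return (1.0 + beta2) * tp / (beta2 * sum(target) + sum(pred) + eps)


def f_loss(pred, target, beta2=0.3):
    """Loss to minimise: one minus the relaxed F-measure."""
    return 1.0 - relaxed_f_measure(pred, target, beta2)


if __name__ == "__main__":
    target = [1, 0, 1, 0]
    print(f_loss([0.9, 0.2, 0.8, 0.1], target))
```

Because every term is a sum of products, the loss is differentiable in each prediction, so it can be attached to the output of a network in place of cross-entropy.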

URL

https://arxiv.org/abs/1805.07567

PDF

https://arxiv.org/pdf/1805.07567
Read All
LMNet: Real-time Multiclass Object Detection on CPU using 3D LiDAR

2018-05-18

Kazuki Minemura, Hengfui Liau, Abraham Monrroy, Shinpei Kato

arXiv_CV

arXiv_CV Object_Detection GAN CNN Quantitative Detection
Abstract

This paper describes an optimized single-stage deep convolutional neural network to detect objects in urban environments, using nothing more than point cloud data. This feature enables our method to work regardless of the time of day and the lighting conditions. The proposed network structure employs dilated convolutions to gradually increase the perceptive field as depth increases, which helps to reduce the computation time by about 30%. The network input consists of five perspective representations of the unorganized point cloud data. The network outputs an objectness map and the bounding box offset values for each point. Our experiments showed that using reflection, range, and the position on each of the three axes helped to improve the location and orientation of the output bounding box. We carried out quantitative evaluations with the help of the KITTI dataset evaluation server. It achieved the fastest processing speed among the other contenders, making it suitable for real-time applications. We implemented and tested it on a real vehicle with a Velodyne HDL-64 mounted on top of it. We achieved execution times as fast as 50 FPS using desktop GPUs, and up to 10 FPS on a single Intel Core i5 CPU. The deploy implementation is open-sourced and it can be found as a feature branch inside the autonomous driving framework Autoware. Code is available at: this https URL

URL

https://arxiv.org/abs/1805.04902

PDF

https://arxiv.org/pdf/1805.04902
Read All
Combining Advanced Methods in Japanese-Vietnamese Neural Machine Translation

2018-05-18

Thi-Vinh Ngo, Thanh-Le Ha, Phuong-Thai Nguyen, Le-Minh Nguyen

arXiv_CL

arXiv_CL Segmentation NMT
Abstract

Neural machine translation (NMT) systems have recently obtained state-of-the-art results for many popular language pairs because of the availability of data. For low-resource language pairs, there has been little research in this field due to the lack of bilingual data. In this paper, we attempt to build the first NMT systems for a low-resource language pair: Japanese-Vietnamese. We have also shown significant improvements when combining advanced methods to reduce the adverse impacts of data sparsity and improve the quality of NMT systems. In addition, we proposed a variant of the Byte-Pair Encoding algorithm to perform effective word segmentation for Vietnamese texts and alleviate the rare-word problem that persists in NMT systems.
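For readers unfamiliar with Byte-Pair Encoding, the sketch below shows the standard merge-learning loop: count adjacent symbol pairs over a word list, merge the most frequent pair, and repeat. This is the textbook procedure, not the variant proposed in the abstract, whose details are not given here; the function name and the toy word list are illustrative assumptions.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge operations from a list of words."""
    # Each distinct word is stored as a tuple of symbols with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_bpe(["aab", "aab", "aac"], 2))
```

Frequent character sequences end up as single subword units, so rare words can be represented as sequences of known pieces rather than as unknown tokens.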

URL

https://arxiv.org/abs/1805.07133

PDF

https://arxiv.org/pdf/1805.07133
Read All
SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text

2018-05-18

Alexander Mathews, Lexing Xie, Xuming He

arXiv_CV

arXiv_CV Image_Caption Caption Language_Model
Abstract

Linguistic style is an essential part of written communication, with the power to affect both clarity and attractiveness. With recent advances in vision and language, we can start to tackle the problem of generating image captions that are both visually grounded and appropriately styled. Existing approaches either require styled training captions aligned to images or generate captions with low relevance. We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images. The core idea of this model, called SemStyle, is to separate semantics and style. One key component is a novel and concise semantic term representation generated using natural language processing techniques and frame semantics. In addition, we develop a unified language model that decodes sentences with diverse word choices and syntax for different styles. Evaluations, both automatic and manual, show captions from SemStyle preserve image semantics, are descriptive, and are style shifted. More broadly, this work provides possibilities to learn richer image descriptions from the plethora of linguistic data available on the web.

URL

https://arxiv.org/abs/1805.07030

PDF

https://arxiv.org/pdf/1805.07030
Read All
Zero-Shot Object Detection by Hybrid Region Embedding

2018-05-17

Berkan Demirel, Ramazan Gokberk Cinbis, Nazli Ikizler-Cinbis

arXiv_CV

arXiv_CV Object_Detection Embedding Prediction Detection
Abstract

Object detection is considered one of the most challenging problems in computer vision, since it requires correct prediction of both classes and locations of objects in images. In this study, we define a more difficult scenario, namely zero-shot object detection (ZSD), where no visual training data is available for some of the target object classes. We present a novel approach to tackle this ZSD problem, where a convex combination of embeddings is used in conjunction with a detection framework. For evaluation of ZSD methods, we propose a simple dataset constructed from Fashion-MNIST images and also a custom zero-shot split for the Pascal VOC detection challenge. The experimental results suggest that our method yields promising results for ZSD.
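One common way to use a convex combination of embeddings for zero-shot prediction is to weight the embeddings of seen classes by their predicted probabilities and then pick the unseen class whose embedding is closest to the result. The sketch below illustrates that general idea only; the scoring rule, the toy two-dimensional embeddings and all names are assumptions, and the paper's hybrid region embedding may combine things differently.

```python
import math


def convex_combination(probs, embeddings):
    """Weighted average of embeddings with weights normalised to sum to 1."""
    total = sum(probs)
    weights = [p / total for p in probs]
    dim = len(embeddings[0])
    return [sum(w * e[d] for w, e in zip(weights, embeddings))
            for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_unseen(probs, seen_embeddings, unseen_embeddings):
    """Return the unseen class name closest to the combined embedding."""
    region = convex_combination(probs, seen_embeddings)
    return max(unseen_embeddings,
               key=lambda name: cosine(region, unseen_embeddings[name]))


if __name__ == "__main__":
    seen = [[1.0, 0.0], [0.0, 1.0]]                 # hypothetical seen classes
    unseen = {"zebra": [1.0, 0.2], "truck": [0.1, 1.0]}
    print(predict_unseen([0.9, 0.1], seen, unseen))
```

No training image of the unseen class is needed: the class is reached purely through its position in the embedding space.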

URL

https://arxiv.org/abs/1805.06157

PDF

https://arxiv.org/pdf/1805.06157
Read All
Hungarian Layer: Logics Empowered Neural Architecture

2018-05-17

Han Xiao, Yidong Chen, Xiaodong Shi

arXiv_CV

arXiv_CV RNN
Abstract

Neural architecture is a purely numeric framework, which fits the data as a continuous function. However, because it lacks logic flow (e.g. if, for, while), traditional algorithms (e.g. the Hungarian algorithm, A* search, decision tree algorithms) cannot be embedded into this paradigm, which limits its theory and applications. In this paper, we reform the calculus graph as a dynamic process, which is guided by logic flow. Within our novel methodology, traditional algorithms can empower numerical neural networks. Specifically, regarding the subject of sentence matching, we reformulate this issue as a task-assignment problem, which is solved by the Hungarian algorithm. First, our model applies a BiLSTM to parse the sentences. Then the Hungarian layer aligns the matching positions. Last, we transform the matching results for soft-max regression by another BiLSTM. Extensive experiments show that our model outperforms other state-of-the-art baselines substantially.

URL

https://arxiv.org/abs/1712.02555

PDF

https://arxiv.org/pdf/1712.02555
Read All
Defoiling Foiled Image Captions

2018-05-16

Pranava Madhyastha, Josiah Wang, Lucia Specia

arXiv_CV

arXiv_CV Image_Caption Caption
Abstract

We address the task of detecting foiled image captions, i.e. identifying whether a caption contains a word that has been deliberately replaced by a semantically similar word, thus rendering it inaccurate with respect to the image being described. Solving this problem should in principle require a fine-grained understanding of images to detect linguistically valid perturbations in captions. In such contexts, encoding sufficiently descriptive image information becomes a key challenge. In this paper, we demonstrate that it is possible to solve this task using simple, interpretable yet powerful representations based on explicit object information. Our models achieve state-of-the-art performance on a standard dataset, with scores exceeding those achieved by humans on the task. We also measure the upper-bound performance of our models using gold standard annotations. Our analysis reveals that the simpler model performs well even without image information, suggesting that the dataset contains strong linguistic bias.

URL

https://arxiv.org/abs/1805.06549

PDF

https://arxiv.org/pdf/1805.06549
Read All
Are BLEU and Meaning Representation in Opposition?

2018-05-16

Ondřej Cífka, Ondřej Bojar

arXiv_CL

arXiv_CL Attention NMT Classification
Abstract

One possible way of obtaining continuous-space sentence representations is to train neural machine translation (NMT) systems. The recent attention mechanism, however, removes the single point in the neural network from which the source sentence representation can be extracted. We propose several variations of the attentive NMT architecture bringing this meeting point back. Empirical evaluation suggests that the better the translation quality, the worse the learned sentence representations serve in a wide range of classification and similarity tasks.

URL

https://arxiv.org/abs/1805.06536

PDF

https://arxiv.org/pdf/1805.06536
Read All
Experimental Two-dimensional Quantum Walk on a Photonic Chip

2018-05-16

Hao Tang, Xiao-Feng Lin, Zhen Feng, Jing-Yuan Chen, Jun Gao, Ke Sun, Chao-Yue Wang, Peng-Cheng Lai, Xiao-Yun Xu, Yao Wang, Lu-Feng Qiao, Ai-Lin Yang, Xian-Min Jin

arXiv_CV

arXiv_CV
Abstract

Quantum walks, by virtue of coherent superposition and quantum interference, possess exponential superiority over their classical counterparts in applications of quantum searching and quantum simulation. The quantum enhanced power is highly related to the state space of quantum walks, which can be expanded by enlarging the photon number and/or the dimensions of the evolution network, but the former is considerably challenging due to probabilistic generation of single photons and multiplicative loss. Here we demonstrate a two-dimensional continuous-time quantum walk by using the external geometry of photonic waveguide arrays, rather than the internal degrees of freedom of photons. Using femtosecond laser direct writing, we construct a large-scale three-dimensional structure which forms a two-dimensional lattice with up to 49×49 nodes on a photonic chip. We demonstrate spatial two-dimensional quantum walks using heralded single photons and single-photon-level imaging. We analyze the quantum transport properties via observing the ballistic evolution pattern and the variance profile, which agree well with simulation results. We further reveal the transient nature that is the unique feature for quantum walks beyond one dimension. An architecture that allows a walk to freely evolve in all directions and at a large scale, combined with defect and disorder control, may bring up powerful and versatile quantum walk machines for classically intractable problems.

URL

https://arxiv.org/abs/1704.08242

PDF

https://arxiv.org/pdf/1704.08242
Read All
#phramacovigilance - Exploring Deep Learning Techniques for Identifying Mentions of Medication Intake from Twitter

2018-05-16

Debanjan Mahata, Jasper Friedrichs, Hitkul, Rajiv Ratn Shah

arXiv_CV

arXiv_CV Sentiment CNN Classification Deep_Learning
Abstract

Mining social media messages for health and drug related information has received significant interest in pharmacovigilance research. Social media sites (e.g., Twitter) have been used for monitoring drug abuse, adverse reactions of drug usage and analyzing expression of sentiments related to drugs. Most of these studies are based on aggregated results from a large population rather than specific sets of individuals. In order to conduct studies at an individual level or on specific cohorts, identifying posts mentioning intake of medicine by the user is necessary. Towards this objective, we train different deep neural network classification models on a publicly available annotated dataset and study their performances on identifying mentions of personal intake of medicine in tweets. We also design and train a new architecture of a stacked ensemble of shallow convolutional neural network (CNN) ensembles. We use random search for tuning the hyperparameters of the models and share the details of the values taken by the hyperparameters for the best learnt model in different deep neural network architectures. Our system produces state-of-the-art results, with a micro-averaged F-score of 0.693.

URL

https://arxiv.org/abs/1805.06375

PDF

https://arxiv.org/pdf/1805.06375
Read All
Object detection at 200 Frames Per Second

2018-05-16

Rakesh Mehta, Cemalettin Ozturk

arXiv_CV

arXiv_CV Object_Detection Knowledge Detection
Abstract

In this paper, we propose an efficient and fast object detector which can process hundreds of frames per second. To achieve this goal we investigate three main aspects of the object detection framework: network architecture, loss function and training data (labeled and unlabeled). In order to obtain a compact network architecture, we introduce various improvements, based on recent work, to develop an architecture which is computationally light-weight and achieves a reasonable performance. To further improve the performance, while keeping the complexity the same, we utilize a distillation loss function. Using distillation loss we transfer the knowledge of a more accurate teacher network to the proposed light-weight student network. We propose various innovations to make distillation efficient for the proposed one stage detector pipeline: objectness scaled distillation loss, feature map non-maximal suppression and a single unified distillation loss function for detection. Finally, building upon the distillation loss, we explore how far we can push the performance by utilizing the unlabeled data. We train our model with unlabeled data using the soft labels of the teacher network. Our final network has 10x fewer parameters than the VGG based object detection network and achieves a speed of more than 200 FPS, and the proposed changes improve the detection accuracy by 14 mAP over the baseline on the Pascal dataset.

URL

https://arxiv.org/abs/1805.06361

PDF

https://arxiv.org/pdf/1805.06361
Read All
Document Context Neural Machine Translation with Memory Networks

2018-05-16

Sameen Maruf, Gholamreza Haffari

arXiv_CV

arXiv_CV Prediction Memory_Networks
Abstract

We present a document-level neural machine translation model which takes both source and target document context into account using memory networks. We model the problem as a structured prediction problem with interdependencies among the observed and hidden variables, i.e., the source sentences and their unobserved target translations in the document. The resulting structured prediction problem is tackled with a neural translation model equipped with two memory components, one each for the source and target side, to capture the documental interdependencies. We train the model end-to-end, and propose an iterative decoding algorithm based on block coordinate descent. Experimental results of English translations from French, German, and Estonian documents show that our model is effective in exploiting both source and target document context, and statistically significantly outperforms the previous work in terms of BLEU and METEOR.

URL

https://arxiv.org/abs/1711.03688

PDF

https://arxiv.org/pdf/1711.03688
Read All
Towards Robust Neural Machine Translation

2018-05-16

Yong Cheng, Zhaopeng Tu, Fandong Meng, Junjie Zhai, Yang Liu

arXiv_CL

arXiv_CL Adversarial NMT
Abstract

Small perturbations in the input can severely distort intermediate representations and thus impact translation quality of neural machine translation (NMT) models. In this paper, we propose to improve the robustness of NMT models with adversarial stability training. The basic idea is to make both the encoder and decoder in NMT models robust against input perturbations by enabling them to behave similarly for the original input and its perturbed counterpart. Experimental results on Chinese-English, English-German and English-French translation tasks show that our approaches can not only achieve significant improvements over strong NMT systems but also improve the robustness of NMT models.

URL

https://arxiv.org/abs/1805.06130

PDF

https://arxiv.org/pdf/1805.06130
Read All
Generating Continuous Representations of Medical Texts

2018-05-15

Graham Spinks, Marie-Francine Moens

arXiv_CV

arXiv_CV Adversarial Caption RNN Quantitative
Abstract

We present an architecture that generates medical texts while learning an informative, continuous representation with discriminative features. During training the input to the system is a dataset of captions for medical X-Rays. The acquired continuous representations are of particular interest for use in many machine learning techniques where the discrete and high-dimensional nature of textual input is an obstacle. We use an Adversarially Regularized Autoencoder to create realistic text in both an unconditional and conditional setting. We show that this technique is applicable to medical texts which often contain syntactic and domain-specific shorthands. A quantitative evaluation shows that we achieve a lower model perplexity than a traditional LSTM generator.

URL

https://arxiv.org/abs/1805.05691

PDF

https://arxiv.org/pdf/1805.05691
Read All
Differentiating Objects by Motion: Joint Detection and Tracking of Small Flying Objects

2018-05-15

Ryota Yoshihashi, Tu Tuan Trinh, Rei Kawakami, Shaodi You, Makoto Iida, Takeshi Naemura

arXiv_CV

arXiv_CV Object_Detection Tracking CNN Detection Relation
Abstract

While generic object detection has achieved large improvements with rich feature hierarchies from deep nets, detecting small objects with poor visual cues remains challenging. Motion cues from multiple frames may be more informative for detecting such hard-to-distinguish objects in each frame. However, how to encode discriminative motion patterns, such as deformations and pose changes that characterize objects, has remained an open question. To learn them and thereby realize small object detection, we present a neural model called the Recurrent Correlational Network, where detection and tracking are jointly performed over a multi-frame representation learned through a single, trainable, and end-to-end network. A convolutional long short-term memory network is utilized for learning informative appearance change for detection, while learned representation is shared in tracking for enhancing its performance. In experiments with datasets containing images of scenes with small flying objects, such as birds and unmanned aerial vehicles, the proposed method yielded consistent improvements in detection performance over deep single-frame detectors and existing motion-based detectors. Furthermore, our network performs as well as state-of-the-art generic object trackers when it was evaluated as a tracker on the bird dataset.

URL

https://arxiv.org/abs/1709.04666

PDF

https://arxiv.org/pdf/1709.04666
Read All
NASH: Toward End-to-End Neural Architecture for Generative Semantic Hashing

2018-05-14

Dinghan Shen, Qinliang Su, Paidamoyo Chapfuwa, Wenlin Wang, Guoyin Wang, Lawrence Carin, Ricardo Henao

arXiv_CV

arXiv_CV Inference
Abstract

Semantic hashing has become a powerful paradigm for fast similarity search in many information retrieval systems. While fairly successful, previous techniques generally require two-stage training, and the binary constraints are handled ad hoc. In this paper, we present an end-to-end Neural Architecture for Semantic Hashing (NASH), where the binary hashing codes are treated as Bernoulli latent variables. A neural variational inference framework is proposed for training, where gradients are directly back-propagated through the discrete latent variable to optimize the hash function. We also draw connections between the proposed method and rate-distortion theory, which provides a theoretical foundation for the effectiveness of the proposed framework. Experimental results on three public datasets demonstrate that our method significantly outperforms several state-of-the-art models in both unsupervised and supervised scenarios.

URL

https://arxiv.org/abs/1805.05361

PDF

https://arxiv.org/pdf/1805.05361
Read All
Practical Block-wise Neural Network Architecture Generation

2018-05-14

Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, Cheng-Lin Liu

arXiv_CV

arXiv_CV CNN Image_Classification Classification
Abstract

Convolutional neural networks have achieved remarkable success in computer vision. However, most usable network architectures are hand-crafted and usually require expertise and elaborate design. In this paper, we provide a block-wise network generation pipeline called BlockQNN, which automatically builds high-performance networks using the Q-Learning paradigm with an epsilon-greedy exploration strategy. The optimal network block is constructed by a learning agent that is trained sequentially to choose component layers. We stack the block to construct the whole auto-generated network. To accelerate the generation process, we also propose a distributed asynchronous framework and an early-stop strategy. The block-wise generation brings unique advantages: (1) it achieves results competitive with hand-crafted state-of-the-art networks on image classification; in addition, the best network generated by BlockQNN achieves a 3.54% top-1 error rate on CIFAR-10, which beats all existing auto-generated networks; (2) it offers a tremendous reduction of the search space in designing networks, with the search taking only 3 days on 32 GPUs; and (3) it has strong generalizability, in that the network built on CIFAR also performs well on the larger-scale ImageNet dataset.
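The Q-learning with epsilon-greedy exploration mentioned in the abstract can be sketched in tabular form. This is a generic illustration, not BlockQNN's implementation: the state names, the layer choices (`conv3`, `pool`), the learning rate, the discount, and the reward are all assumed values.

```python
import random


def epsilon_greedy(q_values: dict[str, float], epsilon: float, rng: random.Random) -> str:
    """With probability epsilon explore uniformly, otherwise pick the best-valued action."""
    actions = sorted(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=1.0):
    """Tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # A terminal state has no actions, so its best future value is zero.
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


# Hypothetical two-state episode: choose one layer, then finish the block.
q = {"start": {"conv3": 0.0, "pool": 0.0}, "end": {}}
rng = random.Random(0)
action = epsilon_greedy(q["start"], epsilon=0.1, rng=rng)
q_update(q, "start", action, reward=1.0, next_state="end")
```

In an architecture search setting, the reward would typically come from the validation accuracy of the network built from the chosen layers.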

URL

https://arxiv.org/abs/1708.05552

PDF

https://arxiv.org/pdf/1708.05552
Read All
Method of increasing the information capacity of associative memory of oscillator neural networks using high-order synchronization effect

2018-05-14

Andrei Velichko, Maksim Belyaev, Vadim Putrolaynen, Petr Boriskov

arXiv_CV

arXiv_CV Recognition
Abstract

Computational modelling of two- and three-oscillator schemes with thermally coupled $VO_2$-switches is used to demonstrate a novel method of pattern storage and recognition in an impulse oscillator neural network (ONN) based on the high-order synchronization effect. The method ensures high information capacity of associative memory, i.e. a large number of synchronous states $N_s$. Each state in the system is characterized by the synchronization order, defined as the ratio of harmonic numbers at the common synchronization frequency. The modelling demonstrates attainment of $N_s$ of several orders both for a three-oscillator scheme ($N_s \sim 650$) and for a two-oscillator scheme ($N_s \sim 260$). Several regularities are found; in particular, there is an optimal oscillator coupling strength at which $N_s$ has a maximum. A general tendency toward decreasing information capacity is shown as the coupling strength and the switch inner noise amplitude increase. An algorithm for pattern storage and test vector recognition is suggested. It is also shown that the number of coordinates in each vector should be one less than the number of switches to reduce recognition ambiguity. The demonstrated method of associative memory realization is a general one and may be applied in ONNs with various mechanisms and topologies of oscillator coupling.

URL

https://arxiv.org/abs/1805.08737

PDF

https://arxiv.org/pdf/1805.08737
Read All
Exploiting the Value of the Center-dark Channel Prior for Salient Object Detection

2018-05-14

Chunbiao Zhu, Wenhao Zhang, Thomas H. Li, Ge Li

arXiv_CV

arXiv_CV Salient Object_Detection Detection
Abstract

Saliency detection aims to detect the most attractive objects in images and is widely used as a foundation for various applications. In this paper, we propose a novel salient object detection algorithm for RGB-D images using center-dark channel priors. First, we generate an initial saliency map based on a color saliency map and a depth saliency map of a given RGB-D image. Then, we generate a center-dark channel map based on center saliency and dark channel priors. Finally, we fuse the initial saliency map with the center dark channel map to generate the final saliency map. Extensive evaluations over four benchmark datasets demonstrate that our proposed method performs favorably against most of the state-of-the-art approaches. Besides, we further discuss the application of the proposed algorithm in small target detection and demonstrate the universal value of center-dark channel priors in the field of object detection.

URL

https://arxiv.org/abs/1805.05132

PDF

https://arxiv.org/pdf/1805.05132
Read All
Token-level and sequence-level loss smoothing for RNN language models

2018-05-14

Maha Elbayad, Laurent Besacier, Jakob Verbeek

arXiv_CV

arXiv_CV Image_Caption Caption RNN Language_Model Prediction
Abstract

Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from “exposure bias”: during training, tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations we build upon the recent reward augmented maximum likelihood approach, i.e., sequence-level smoothing that encourages the model to predict sentences close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that token-level and sequence-level loss smoothing are complementary, and significantly improve results.
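A rough sketch of token-level loss smoothing follows: instead of a one-hot target, each token gets target mass according to how similar it is to the ground-truth token, and the loss is the cross-entropy against that soft target. The similarity scores, the temperature, and the function names are assumptions for illustration; the abstract does not specify how similarity is measured.

```python
import math


def smoothed_targets(similarity: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax over similarity-to-ground-truth scores, giving a soft target distribution."""
    weights = {tok: math.exp(score / temperature) for tok, score in similarity.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def smoothed_loss(targets: dict[str, float], predicted: dict[str, float]) -> float:
    """Cross-entropy between the soft targets and the model's predicted distribution."""
    return -sum(t * math.log(predicted[tok]) for tok, t in targets.items())


# Hypothetical similarities to the ground-truth token "cat".
targets = smoothed_targets({"cat": 1.0, "kitten": 0.8, "car": 0.1}, temperature=0.5)
loss = smoothed_loss(targets, {"cat": 0.6, "kitten": 0.3, "car": 0.1})
```

Lowering the temperature concentrates the target on the ground-truth token, approaching ordinary maximum likelihood training.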

URL

https://arxiv.org/abs/1805.05062

PDF

https://arxiv.org/pdf/1805.05062
Read All
Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery

2018-05-13

Thomas Stepleton, Razvan Pascanu, Will Dabney, Siddhant M. Jayakumar, Hubert Soyer, Remi Munos

arXiv_CV

arXiv_CV Reinforcement_Learning RNN Relation
Abstract

Reinforcement learning (RL) agents performing complex tasks must be able to remember observations and actions across sizable time intervals. This is especially true during the initial learning stages, when exploratory behaviour can increase the delay between specific actions and their effects. Many new or popular approaches for learning these distant correlations employ backpropagation through time (BPTT), but this technique requires storing observation traces long enough to span the interval between cause and effect. Besides memory demands, learning dynamics like vanishing gradients and slow convergence due to infrequent weight updates can reduce BPTT’s practicality; meanwhile, although online recurrent network learning is a developing topic, most approaches are not efficient enough to use as replacements. We propose a simple, effective memory strategy that can extend the window over which BPTT can learn without requiring longer traces. We explore this approach empirically on a few tasks and discuss its implications.
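The abstract does not describe the memory strategy in detail, but the title points to low-pass filtering. As a hedged illustration only, the sketch below shows a first-order low-pass filter (an exponential moving average) and how a single input pulse leaves a slowly decaying trace that later time steps can still observe. The function name and the value of `alpha` are assumptions, and this is not claimed to be the paper's architecture.

```python
def low_pass(signal: list[float], alpha: float) -> list[float]:
    """First-order low-pass filter: y_t = (1 - alpha) * y_{t-1} + alpha * x_t, with y_{-1} = 0."""
    filtered, y = [], 0.0
    for x in signal:
        y = (1.0 - alpha) * y + alpha * x
        filtered.append(y)
    return filtered


# A single pulse at t = 0 is still visible several steps later.
trace = low_pass([1.0, 0.0, 0.0, 0.0], alpha=0.5)
print(trace)  # [0.5, 0.25, 0.125, 0.0625]
```

A smaller `alpha` makes the trace decay more slowly, so an event stays visible over a longer window without storing a longer history.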

URL

https://arxiv.org/abs/1805.04955

PDF

https://arxiv.org/pdf/1805.04955
Read All
Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT

2018-05-11

Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne

arXiv_CL

arXiv_CL NMT
Abstract

We explore strategies for incorporating target syntax into Neural Machine Translation. We specifically focus on syntax in ensembles containing multiple sentence representations. We formulate beam search over such ensembles using WFSTs, and describe a delayed SGD update training procedure that is especially effective for long representations like linearized syntax. Our approach gives state-of-the-art performance on a difficult Japanese-English task.
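One common reading of a delayed SGD update is gradient accumulation: gradients from several mini-batches are combined before a single parameter step, which emulates a larger batch when long sequences limit how many examples fit in memory. The sketch below assumes that reading; averaging the accumulated gradients, the scalar linear model, and all names are illustrative choices rather than details from the paper.

```python
def squared_error_grad(weight: float, batch: list[tuple[float, float]]) -> float:
    """Gradient of the mean of 0.5 * (weight * x - y)^2 over one mini-batch."""
    return sum((weight * x - y) * x for x, y in batch) / len(batch)


def delayed_sgd(weight, batches, grad_fn, lr=0.1, delay=2):
    """Accumulate gradients over `delay` batches, then apply one averaged SGD step."""
    accumulated, seen = 0.0, 0
    for batch in batches:
        accumulated += grad_fn(weight, batch)
        seen += 1
        if seen == delay:
            weight -= lr * accumulated / delay
            accumulated, seen = 0.0, 0
    # Gradients from a trailing incomplete group are discarded in this sketch.
    return weight


batches = [[(1.0, 2.0)], [(1.0, 4.0)]]
print(delayed_sgd(0.0, batches, squared_error_grad, lr=0.1, delay=2))
```

With `delay=1` the function reduces to ordinary per-batch SGD, which makes the two schedules easy to compare.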

URL

https://arxiv.org/abs/1805.00456

PDF

https://arxiv.org/pdf/1805.00456
Read All
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

2018-05-11

Jinmian Ye, Linnan Wang, Guangxi Li, Di Chen, Shandian Zhe, Xinqi Chu, Zenglin Xu

arXiv_CV

arXiv_CV Image_Caption Caption Action_Recognition RNN Prediction Recognition
Abstract

Recurrent Neural Networks (RNNs) are powerful sequence modeling tools. However, when dealing with high-dimensional inputs, the training of RNNs becomes computationally expensive due to the large number of model parameters. This hinders RNNs from solving many important computer vision tasks, such as Action Recognition in Videos and Image Captioning. To overcome this problem, we propose a compact and flexible structure, namely Block-Term tensor decomposition, which greatly reduces the parameters of RNNs and improves their training efficiency. Compared with alternative low-rank approximations, such as tensor-train RNN (TT-RNN), our method, Block-Term RNN (BT-RNN), is not only more concise (when using the same rank), but also able to attain a better approximation to the original RNNs with far fewer parameters. On three challenging tasks, including Action Recognition in Videos, Image Captioning and Image Generation, BT-RNN outperforms TT-RNN and the standard RNN in terms of both prediction accuracy and convergence rate. Specifically, BT-LSTM utilizes 17,388 times fewer parameters than the standard LSTM to achieve an accuracy improvement of over 15.6% in the Action Recognition task on the UCF11 dataset.

URL

https://arxiv.org/abs/1712.05134

PDF

https://arxiv.org/pdf/1712.05134
Read All
Deep Neural Machine Translation with Weakly-Recurrent Units

2018-05-10

Mattia Antonino Di Gangi, Marcello Federico

arXiv_CL

arXiv_CL Attention NMT Inference RNN
Abstract

Recurrent neural networks (RNNs) have represented for years the state of the art in neural machine translation. Recently, new architectures have been proposed, which can leverage parallel computation on GPUs better than classical RNNs. Faster training and inference combined with different sequence-to-sequence modeling also lead to performance improvements. While the new models completely depart from the original recurrent architecture, we decided to investigate how to make RNNs more efficient. In this work, we propose a new recurrent NMT architecture, called Simple Recurrent NMT, built on a class of fast and weakly-recurrent units that use layer normalization and multiple attentions. Our experiments on the WMT14 English-to-German and WMT16 English-Romanian benchmarks show that our model represents a valid alternative to LSTMs, as it can achieve better results at a significantly lower computational cost.

URL

https://arxiv.org/abs/1805.04185

PDF

https://arxiv.org/pdf/1805.04185
Read All
Pragmatically Informative Image Captioning with Character-Level Inference

2018-05-10

Reuben Cohn-Gordon, Noah Goodman, Christopher Potts

arXiv_CV

arXiv_CV Image_Caption Caption Inference
Abstract

We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images. Previous attempts to combine RSA with neural image captioning require an inference which normalizes over the entire set of possible utterances. This poses a serious problem of efficiency, previously solved by sampling a small subset of possible utterances. We instead solve this problem by implementing a version of RSA which operates at the level of characters (“a”, “b”, “c”, …) during the unrolling of the caption. We find that the utterance-level effect of referential captions can be obtained with only character-level decisions. Finally, we introduce an automatic method for testing the performance of pragmatic speaker models, and show that our model outperforms a non-pragmatic baseline as well as a word-level RSA captioner.
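The character-level idea can be sketched as a single RSA step over next-character probabilities. The code assumes one common RSA formulation (a literal listener with a uniform prior over images, and a pragmatic speaker that reweights the literal speaker by the listener's belief raised to a rationality parameter `alpha`); the two-image setup, the probabilities, and the names are hypothetical, not taken from the paper.

```python
def pragmatic_speaker(s0: dict[str, dict[str, float]], target: str, alpha: float = 1.0) -> dict[str, float]:
    """s0[image][char] is the literal next-character probability for each image.

    Returns the pragmatic next-character distribution for the target image.
    """
    scores = {}
    for char, p_literal in s0[target].items():
        # Literal listener L0(target | char), assuming a uniform prior over images.
        total = sum(s0[image][char] for image in s0)
        listener = p_literal / total if total > 0 else 0.0
        scores[char] = p_literal * listener ** alpha
    norm = sum(scores.values())
    return {char: score / norm for char, score in scores.items()}


# Hypothetical next-character distributions: "r" (as in "red") fits both images,
# while "b" (as in "blue") is much more likely for the target.
s0 = {"target": {"r": 0.5, "b": 0.5}, "distractor": {"r": 0.9, "b": 0.1}}
s1 = pragmatic_speaker(s0, "target")
```

Since the normalization runs over the character vocabulary rather than over whole utterances, each step stays cheap, which is the efficiency argument made in the abstract.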

URL

https://arxiv.org/abs/1804.05417

PDF

https://arxiv.org/pdf/1804.05417
Read All
Query for Architecture, Click through Military: Comparing the Roles of Search and Navigation on Wikipedia

2018-05-10

Dimitar Dimitrov, Florian Lemmerich, Fabian Flöck, Markus Strohmaier

arXiv_CV

arXiv_CV
Abstract

As one of the richest sources of encyclopedic information on the Web, Wikipedia generates an enormous amount of traffic. In this paper, we study large-scale article access data of the English Wikipedia in order to compare articles with respect to the two main paradigms of information seeking, i.e., search by formulating a query, and navigation by following hyperlinks. To this end, we propose and employ two main metrics, namely (i) searchshare – the relative amount of views an article received by search –, and (ii) resistance – the ability of an article to relay traffic to other Wikipedia articles – to characterize articles. We demonstrate how articles in distinct topical categories differ substantially in terms of these properties. For example, architecture-related articles are often accessed through search and are simultaneously a “dead end” for traffic, whereas historical articles about military events are mainly navigated. We further link traffic differences to varying network, content, and editing activity features. Lastly, we measure the impact of the article properties by modeling access behavior on articles with a gradient boosting approach. The results of this paper constitute a step towards understanding human information seeking behavior on the Web.

URL

https://arxiv.org/abs/1805.04022

PDF

https://arxiv.org/pdf/1805.04022
Read All
WikiPassageQA: A Benchmark Collection for Research on Non-factoid Answer Passage Retrieval

2018-05-10

Daniel Cohen, Liu Yang, W. Bruce Croft

arXiv_CV

arXiv_CV QA
Abstract

With the rise in mobile and voice search, answer passage retrieval acts as a critical component of an effective information retrieval system for open-domain question answering. Currently, there are no comparable collections that address non-factoid question answering within larger documents while simultaneously providing enough examples to train a deep neural network. In this paper, we introduce a new Wikipedia-based collection specifically for non-factoid answer passage retrieval, containing thousands of questions with annotated answers, and show benchmark results on a variety of state-of-the-art neural architectures and retrieval models. The experimental results demonstrate the unique challenges that answer passage retrieval within topically relevant documents presents for future research.

URL

https://arxiv.org/abs/1805.03797

PDF

https://arxiv.org/pdf/1805.03797
Read All
Neural Machine Translation Decoding with Terminology Constraints

2018-05-09

Eva Hasler, Adrià De Gispert, Gonzalo Iglesias, Bill Byrne

arXiv_CL

arXiv_CL Attention NMT
Abstract

Despite the impressive quality improvements yielded by neural machine translation (NMT) systems, controlling their translation output to adhere to user-provided terminology constraints remains an open problem. We describe our approach to constrained neural decoding based on finite-state machines and multi-stack decoding which supports target-side constraints as well as constraints with corresponding aligned input text spans. We demonstrate the performance of our framework on multiple translation tasks and motivate the need for constrained decoding with attentions as a means of reducing misplacement and duplication when translating user constraints.

URL

https://arxiv.org/abs/1805.03750

PDF

https://arxiv.org/pdf/1805.03750
Read All
VizWiz Grand Challenge: Answering Visual Questions from Blind People

2018-05-09

Danna Gurari, Qing Li, Abigale J. Stangl, Anhong Guo, Chi Lin, Kristen Grauman, Jiebo Luo, Jeffrey P. Bigham

arXiv_CV

arXiv_CV QA VQA
Abstract

The study of algorithms to automatically answer visual questions currently is motivated by visual question answering (VQA) datasets constructed in artificial VQA settings. We propose VizWiz, the first goal-oriented VQA dataset arising from a natural VQA setting. VizWiz consists of over 31,000 visual questions originating from blind people who each took a picture using a mobile phone and recorded a spoken question about it, together with 10 crowdsourced answers per visual question. VizWiz differs from the many existing VQA datasets because (1) images are captured by blind photographers and so are often poor quality, (2) questions are spoken and so are more conversational, and (3) often visual questions cannot be answered. Evaluation of modern algorithms for answering visual questions and deciding if a visual question is answerable reveals that VizWiz is a challenging dataset. We introduce this dataset to encourage a larger community to develop more generalized algorithms that can assist blind people.

URL

https://arxiv.org/abs/1802.08218

PDF

https://arxiv.org/pdf/1802.08218
Read All
A Click Sequence Model for Web Search

2018-05-09

Alexey Borisov, Martijn Wardenaar, Ilya Markov, Maarten de Rijke

arXiv_CV

arXiv_CV Attention Embedding Prediction
Abstract

Getting a better understanding of user behavior is important for advancing information retrieval systems. Existing work focuses on modeling and predicting single interaction events, such as clicks. In this paper, we focus for the first time on modeling and predicting sequences of interaction events, in particular sequences of clicks. We formulate the problem of click sequence prediction and propose a click sequence model (CSM) that aims to predict the order in which a user will interact with search engine results. CSM is based on a neural network that follows the encoder-decoder architecture. The encoder computes contextual embeddings of the results. The decoder predicts the sequence of positions of the clicked results. It uses an attention mechanism to extract necessary information about the results at each timestep. We optimize the parameters of CSM by maximizing the likelihood of observed click sequences. We test the effectiveness of CSM on three new tasks: (i) predicting click sequences, (ii) predicting the number of clicks, and (iii) predicting whether or not a user will interact with the results in the order these results are presented on a search engine result page (SERP). Also, we show that CSM achieves state-of-the-art results on a standard click prediction task, where the goal is to predict an unordered set of results a user will click on.

URL

https://arxiv.org/abs/1805.03411

PDF

https://arxiv.org/pdf/1805.03411
Read All
Improving GAN Training via Binarized Representation Entropy Regularization

2018-05-09

Yanshuai Cao, Gavin Weiguang Ding, Kry Yik-Chau Lui, Ruitong Huang

arXiv_CV

arXiv_CV Regularization Adversarial GAN Classification
Abstract

We propose a novel regularizer to improve the training of Generative Adversarial Networks (GANs). The motivation is that when the discriminator D spreads out its model capacity in the right way, the learning signals given to the generator G are more informative and diverse. These in turn help G to explore better and discover the real data manifold while avoiding large unstable jumps due to the erroneous extrapolation made by D. Our regularizer guides the rectifier discriminator D to better allocate its model capacity, by encouraging the binary activation patterns on selected internal layers of D to have a high joint entropy. Experimental results on both synthetic data and real datasets demonstrate improvements in stability and convergence speed of the GAN training, as well as higher sample quality. The approach also leads to higher classification accuracies in semi-supervised learning.
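To make "binary activation patterns" and their joint entropy concrete, the sketch below binarizes the pre-activations of a rectifier layer (unit on or off) and measures the empirical entropy of the resulting patterns over a batch. This is only a measurement for illustration, with hypothetical names and data; it is not the differentiable regularizer proposed in the paper, whose form the abstract does not give.

```python
import math
from collections import Counter


def activation_pattern(pre_activations: list[float]) -> tuple[int, ...]:
    """Binary on/off pattern of a rectifier layer for one input."""
    return tuple(1 if value > 0 else 0 for value in pre_activations)


def pattern_entropy(batch: list[list[float]]) -> float:
    """Empirical entropy (in bits) of the activation patterns seen in a batch."""
    counts = Counter(activation_pattern(row) for row in batch)
    n = len(batch)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Four inputs that each switch on a different subset of two units.
diverse = [[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [-1.0, -1.0]]
# Four inputs that all produce the same pattern.
collapsed = [[0.5, 0.2], [1.0, 3.0], [0.1, 0.1], [2.0, 0.7]]
print(pattern_entropy(diverse), pattern_entropy(collapsed))
```

A higher entropy means the layer partitions its inputs into more distinct activation regions, which is the kind of capacity use the abstract describes encouraging.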

URL

https://arxiv.org/abs/1805.03644

PDF

https://arxiv.org/pdf/1805.03644
Read All
A Memristor based Unsupervised Neuromorphic System Towards Fast and Energy-Efficient GAN

2018-05-09

F. Liu, C. Liu

arXiv_CV

arXiv_CV Adversarial GAN Deep_Learning Relation
Abstract

Deep learning has had immense success in pushing today’s artificial intelligence forward. To address the challenge of limited labeled data in supervised learning, unsupervised learning was proposed years ago, but its low accuracy hinders practical application. The generative adversarial network (GAN) has emerged as an unsupervised learning approach with promising accuracy and is under extensive study. However, executing a GAN is extremely memory- and computation-intensive, which results in very low speed and high power consumption. In this work, we proposed a holistic solution for fast and energy-efficient GAN computation through a memristor-based neuromorphic system. First, we exploited a hardware and software co-design approach to map the computation blocks in GAN efficiently. We also proposed an efficient data flow for optimal parallelism in training and testing, depending on the computation correlations between different computing blocks. To compute the unique and complex loss of GAN, we developed a diff-block with optimized accuracy and performance. Experimental results on big data show that our design achieves a 2.8x speedup and 6.1x energy saving compared with a traditional GPU accelerator, as well as a 5.5x speedup and 1.4x energy saving compared with a previous FPGA-based accelerator.

URL

https://arxiv.org/abs/1806.01775

PDF

https://arxiv.org/pdf/1806.01775
Read All
Apply Chinese Radicals Into Neural Machine Translation: Deeper Than Character Level

2018-05-08

Shaohui Kuang, Lifeng Han

arXiv_CL

arXiv_CL Knowledge Attention Face NMT
Abstract

In neural machine translation (NMT), researchers face the challenge of translating unseen (out-of-vocabulary, OOV) words. To solve this, some researchers propose splitting words in Western languages such as English and German into sub-words or compounds. In this paper, we try to address the OOV issue and improve NMT adequacy for a harder language, Chinese, whose characters are even more sophisticated in composition. We integrate Chinese radicals into the NMT model with different settings to address the unseen-word challenge in Chinese-to-English translation. This can also be considered a semantic part of the MT system, since Chinese radicals usually carry the essential meaning of the words they are constructed in. Meaningful radicals and new characters can be integrated into NMT systems with our models. We use an attention-based NMT system as a strong baseline. Experiments on the standard Chinese-to-English NIST translation shared task data from 2006 and 2008 show that our models outperform the baseline on a wide range of state-of-the-art evaluation metrics, including LEPOR, BEER, and CharacTER, in addition to the traditional BLEU and NIST scores, especially in translation adequacy. We also report some interesting findings from our various experimental settings about the performance of words and characters in Chinese NMT, which differs from that in other languages. For instance, fully character-level NMT may perform very well, or even reach the state of the art, in some other languages, as researchers have recently demonstrated; in Chinese NMT, however, word boundary knowledge is important for model learning.

URL

https://arxiv.org/abs/1805.01565

PDF

https://arxiv.org/pdf/1805.01565
Read All
Improving Character-level Japanese-Chinese Neural Machine Translation with Radicals as an Additional Input Feature

2018-05-08

Jinyi Zhang, Tadahiro Matsumoto

arXiv_CL

arXiv_CL NMT
Abstract

In recent years, Neural Machine Translation (NMT) has been shown to achieve impressive results. While some additional linguistic features of input words improve word-level NMT, no additional character features have so far been used to improve character-level NMT. In this paper, we show that the radicals of Chinese characters (or kanji), used as character-level feature information, can easily provide further improvements in character-level NMT. In experiments on the WAT2016 Japanese-Chinese scientific paper excerpt corpus (ASPEC-JP), we find that the proposed method improves translation quality in two respects: perplexity and BLEU. The character-level NMT model with the radical input feature achieved a state-of-the-art result of 40.61 BLEU points on the test set, an improvement of about 8.6 BLEU points over the best system on the WAT2016 Japanese-to-Chinese translation subtask with ASPEC-JP. The improvements over character-level NMT with no additional input feature are up to about 1.5 and 1.4 BLEU points on the development-test set and the test set of the corpus, respectively.

URL

https://arxiv.org/abs/1805.02937

PDF

https://arxiv.org/pdf/1805.02937
Read All
Hierarchical Temporal Memory using Memristor Networks: A Survey

2018-05-08

Olga Krestinskaya, Irina Dolzhikova, Alex Pappachen James

arXiv_CV

arXiv_CV Review Survey
Abstract

This paper presents a survey of the currently available hardware designs for implementing the human-cortex-inspired algorithm Hierarchical Temporal Memory (HTM). In this review, we focus on state-of-the-art advances in memristive HTM implementation and related HTM applications. With the advent of edge computing, HTM is a potential algorithm for implementing on-chip near-sensor data processing. A comparison of analog memristive circuit implementations with digital and mixed-signal solutions is provided. The advantages of memristive HTM over digital implementations with respect to performance metrics such as processing speed, on-chip area, and power dissipation are discussed. The limitations and open problems concerning memristive HTM, such as design scalability, sneak currents, leakage, parasitic effects, the lack of analog learning circuit implementations, and the unreliability of memristive devices integrated with CMOS circuits, are also discussed.

URL

https://arxiv.org/abs/1805.02921

PDF

https://arxiv.org/pdf/1805.02921
Read All
The Effects of Statistical Multiplicity of Infection on Virus Quantification and Infectivity Assays

2018-05-08

Bhaven Mistry, Maria R. D'Orsogna, Tom Chou

arXiv_CV

arXiv_CV QA VQA
Abstract

Many biological assays are employed in virology to quantify parameters of interest. Two such classes of assays, virus quantification assays (VQA) and infectivity assays (IA), aim to estimate the number of viruses present in a solution, and the ability of a viral strain to successfully infect a host cell, respectively. VQAs operate at extremely dilute concentrations and results can be subject to stochastic variability in virus-cell interactions. At the other extreme, high viral particle concentrations are used in IAs, resulting in large numbers of viruses infecting each cell, enough for measurable change in total transcription activity. Furthermore, host cells can be infected at any concentration regime by multiple particles, resulting in a statistical multiplicity of infection (SMOI) and yielding potentially significant variability in the assay signal and parameter estimates. We develop probabilistic models for SMOI at low and high viral particle concentration limits and apply them to the plaque (VQA), endpoint dilution (VQA), and luciferase reporter (IA) assays. A web-based tool implementing our models and analysis is also developed and presented. We test our proposed new methods for inferring experimental parameters from data using numerical simulations and show improvement on existing procedures in all limits.

URL

https://arxiv.org/abs/1805.02810

PDF

https://arxiv.org/pdf/1805.02810
Read All
ReGAN: RELAX|BAR|INFORCE based Sequence Generation using GANs

2018-05-08

Aparna Balagopalan, Satya Gorti, Mathieu Ravaut, Raeid Saqur

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Generative Adversarial Networks (GANs) have seen a steep ascent to the peak of the ML research zeitgeist in recent years. Mostly catalyzed by its success in the domain of image generation, the technique has seen wide adoption in a variety of other problem domains. Although GANs have had a lot of success in producing more realistic images than other approaches, they have only seen limited use for text sequences. Generation of longer sequences compounds this problem. Most recently, SeqGAN (Yu et al., 2017) has shown improvements in adversarial evaluation and in results with human evaluation compared to an MLE-trained baseline. The main contributions of this paper are three-fold: (1) we show results for sequence generation using a GAN architecture with efficient policy gradient estimators; (2) we attain improved training stability; and (3) we perform a comparative study of recent unbiased low-variance gradient estimation techniques such as REBAR (Tucker et al., 2017), RELAX (Grathwohl et al., 2018) and REINFORCE (Williams, 1992). Using a simple grammar on synthetic datasets of varying length, we indicate the quality of the sequences generated by the model.
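As background for the estimators compared in this abstract, here is a minimal sketch of the REINFORCE (score-function) estimator for a single Bernoulli variable, together with an exact check that it is unbiased. The reward function `f`, the parameter value, and the function names are assumptions for illustration; REBAR and RELAX add control variates on top of this idea and are not shown.

```python
def reinforce_term(b: int, theta: float, f) -> float:
    """Single-sample estimate f(b) * d/dtheta log p(b; theta) for b ~ Bernoulli(theta)."""
    dlogp = 1.0 / theta if b == 1 else -1.0 / (1.0 - theta)
    return f(b) * dlogp


def expected_estimate(theta: float, f) -> float:
    """Exact expectation of the estimator, obtained by enumerating both outcomes."""
    return theta * reinforce_term(1, theta, f) + (1.0 - theta) * reinforce_term(0, theta, f)


def reward(b: int) -> float:
    # Hypothetical reward; any function of the sampled bit works.
    return (b - 0.3) ** 2


# The true gradient of E[f(b)] = theta * f(1) + (1 - theta) * f(0) is f(1) - f(0).
print(expected_estimate(0.6, reward), reward(1) - reward(0))
```

The estimator is unbiased for any `theta` strictly between 0 and 1, but individual samples can vary widely, which is why lower-variance alternatives are of interest.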

URL

https://arxiv.org/abs/1805.02788

PDF

https://arxiv.org/pdf/1805.02788
Read All
Implementing quantum algorithms on temporal photonic cluster states

2018-05-07

Daiqin Su, Krishna Kumar Sabapathy, Casey R. Myers, Haoyu Qi, Christian Weedbrook, Kamil Brádler

arXiv_CV

arXiv_CV Review
Abstract

Implementing quantum algorithms is essential for quantum computation. We study the implementation of three quantum algorithms by performing homodyne measurements on a two-dimensional temporal continuous-variable cluster state. We first review the generation of temporal cluster states and the implementation of gates using the measurement-based model. Alongside this we discuss methods to introduce non-Gaussianity into the cluster states. The first algorithm we consider is Gaussian Boson Sampling in which only Gaussian unitaries need to be implemented. Taking into account the fact that input states are also Gaussian, the errors due to the effect of finite squeezing can be corrected, provided a moderate amount of online squeezing is available. This helps to construct a large Gaussian Boson Sampling machine. The second algorithm is the continuous-variable Instantaneous Quantum Polynomial circuit in which one needs to implement non-Gaussian gates, such as the cubic phase gate. We discuss several methods of implementing the cubic phase gate and fit them into the temporal cluster state architecture. The third algorithm is the continuous-variable version of Grover’s search algorithm, the main challenge of which is the implementation of the inversion operator. We propose a method to implement the inversion operator by injecting a resource state into a teleportation circuit. The resource state is simulated using the Strawberry Fields quantum software package.

URL

https://arxiv.org/abs/1805.02645

PDF

https://arxiv.org/pdf/1805.02645
Read All
Unpaired Multi-Domain Image Generation via Regularized Conditional GANs

2018-05-07

Xudong Mao, Qing Li

arXiv_CV

arXiv_CV GAN Face
Abstract

In this paper, we study the problem of multi-domain image generation, the goal of which is to generate pairs of corresponding images from different domains. With recent developments in generative models, image generation has made great progress and has been applied to various computer vision tasks. However, multi-domain image generation may not achieve the desired performance because of the difficulty of learning the correspondence between images from different domains, especially when information about paired samples is not given. To tackle this problem, we propose the Regularized Conditional GAN (RegCGAN), which is capable of learning to generate corresponding images in the absence of paired training data. RegCGAN is based on the conditional GAN, and we introduce two regularizers to guide the model to learn the corresponding semantics of different domains. We evaluate the proposed model on several tasks for which paired training data is not given, including the generation of edges and photos, the generation of faces with different attributes, etc. The experimental results show that our model can successfully generate corresponding images for all these tasks, while outperforming the baseline methods. We also introduce an approach for applying RegCGAN to unsupervised domain adaptation.

URL

https://arxiv.org/abs/1805.02456

PDF

https://arxiv.org/pdf/1805.02456
ECO: Efficient Convolutional Network for Online Video Understanding

2018-05-07

Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox

arXiv_CV

arXiv_CV Video_Caption Caption CNN Classification Relation
Abstract

The state of the art in video understanding suffers from two problems: (1) The major part of reasoning is performed locally in the video; therefore, it misses important relationships within actions that span several seconds. (2) While there are local methods with fast per-frame processing, the processing of the whole video is not efficient and hampers fast video retrieval or online classification of long-term activities. In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy, which exploits the fact that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10x to 80x faster than state-of-the-art methods.
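
The abstract does not spell out the sampling strategy, so the following is only a rough sketch of one way to exploit frame redundancy: split the video into equal segments and draw a single frame index from each. The function name `sample_frame_indices` and the one-frame-per-segment rule are assumptions made for illustration, not details taken from the abstract.

```python
import random
from typing import List, Optional


def sample_frame_indices(num_frames: int, num_segments: int,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Pick one frame index from each of `num_segments` equal-length segments.

    Hypothetical helper: the segment rule is an assumption for illustration.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()
    # Segment i covers the half-open index range [bounds[i], bounds[i + 1]).
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [rng.randrange(bounds[i], bounds[i + 1]) for i in range(num_segments)]


# Example: reduce a 300-frame video to 16 frames spread over its full length.
indices = sample_frame_indices(300, 16, random.Random(0))
```

Under this assumption the number of frames processed per video stays fixed regardless of video length, which is one plausible reason per-video processing can be fast.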

URL

https://arxiv.org/abs/1804.09066

PDF

https://arxiv.org/pdf/1804.09066
Detecting Small, Densely Distributed Objects with Filter-Amplifier Networks and Loss Boosting

2018-05-07

Zhenhua Chen, David Crandall, Robert Templeman

arXiv_CV

arXiv_CV Detection
Abstract

Detecting small, densely distributed objects is a significant challenge: small objects often contain less distinctive information compared to larger ones, and finer-grained precision of bounding box boundaries is required. In this paper, we propose two techniques for addressing this problem. First, we estimate the likelihood that each pixel belongs to an object boundary rather than predicting coordinates of bounding boxes (as YOLO, Faster-RCNN and SSD do), by proposing a new architecture called Filter-Amplifier Networks (FANs). Second, we introduce a technique called Loss Boosting (LB), which attempts to soften the loss imbalance problem on each image. We test our algorithm on the problem of detecting electrical components on a new, realistic, diverse dataset of printed circuit boards (PCBs), as well as the problem of detecting vehicles in the Vehicle Detection in Aerial Imagery (VEDAI) dataset. Experiments show that our method works significantly better than current state-of-the-art algorithms with respect to accuracy, recall and average IoU.

URL

https://arxiv.org/abs/1802.07845

PDF

https://arxiv.org/e-print/1802.07845
Multi-Domain Neural Machine Translation

2018-05-06

Sander Tars, Mark Fishel

arXiv_CL

arXiv_CL Knowledge NMT
Abstract

We present an approach to neural machine translation (NMT) that supports multiple domains in a single model and allows switching between the domains when translating. The core idea is to treat text domains as distinct languages and use multilingual NMT methods to create multi-domain translation systems. We show that this approach results in significant translation quality gains over fine-tuning. We also explore whether knowledge of pre-specified text domains is necessary; it turns out that it is, but also that quite high translation quality can be reached even when the domain is not known.
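
As a minimal sketch of what "treating domains as languages" could look like in data preparation, assume the model is told the domain through a special token prepended to each source sentence, as some multilingual NMT setups do for the target language. The token format `<2domain>` and the helper `tag_with_domain` are hypothetical and not taken from the abstract.

```python
def tag_with_domain(sentence: str, domain: str) -> str:
    """Prepend a hypothetical domain token such as '<2medical>' to a source sentence."""
    return f"<2{domain}> {sentence}"


# Toy corpus of (source sentence, domain label) pairs; the data is made up.
corpus = [
    ("The patient was discharged yesterday.", "medical"),
    ("Click the Save button to continue.", "it"),
]

# Every sentence carries its domain tag, so one model can serve all domains
# and the domain can be switched at translation time by changing the tag.
tagged = [tag_with_domain(sentence, domain) for sentence, domain in corpus]
```

Under this assumption, switching domains when translating amounts to choosing a different tag for the input sentence.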

URL

https://arxiv.org/abs/1805.02282

PDF

https://arxiv.org/pdf/1805.02282
Limited Evaluation Cooperative Co-evolutionary Differential Evolution for Large-scale Neuroevolution

2018-05-06

Anil Yaman, Decebal Constantin Mocanu, Giovanni Iacca, George Fletcher, Mykola Pechenizkiy

arXiv_CV

arXiv_CV Classification
Abstract

Many real-world control and classification tasks involve a large number of features. When artificial neural networks (ANNs) are used for modeling these tasks, the network architectures tend to be large. Neuroevolution is an effective approach for optimizing ANNs; however, there are two bottlenecks that make its application challenging in the case of high-dimensional networks using direct encoding. First, classic evolutionary algorithms tend not to scale well for searching large parameter spaces; second, the network evaluation over a large number of training instances is in general time-consuming. In this work, we propose an approach called the Limited Evaluation Cooperative Co-evolutionary Differential Evolution algorithm (LECCDE) to optimize high-dimensional ANNs. The proposed method aims to optimize the pre-synaptic weights of each post-synaptic neuron in different subpopulations using a Cooperative Co-evolutionary Differential Evolution algorithm, and employs a limited evaluation scheme where fitness evaluation is performed on a relatively small number of training instances based on fitness inheritance. We test LECCDE on three datasets of various sizes, and our results show that cooperative co-evolution significantly improves the test error compared to standard Differential Evolution, while the limited evaluation scheme facilitates a significant reduction in computing time.

URL

https://arxiv.org/abs/1804.07234

PDF

https://arxiv.org/pdf/1804.07234