Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search

2018-07-18

Arber Zela, Aaron Klein, Stefan Falkner, Frank Hutter

arXiv_CV

arXiv_CV NAS Optimization Deep_Learning Relation
Abstract

While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation suboptimal. Likewise, we demonstrate that the common practice of using very few epochs during the main NAS and much larger numbers of epochs during a post-processing step is inefficient due to little correlation in the relative rankings for these two training regimes. To combat both of these problems, we propose to use a recent combination of Bayesian optimization and Hyperband for efficient joint neural architecture and hyperparameter search.
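As a rough illustration of the budget-allocation idea behind Hyperband, which the abstract names as one half of the proposed search, the sketch below runs a single successive-halving bracket. The function names, the toy loss, and the candidate learning rates are assumptions made for illustration; nothing here is taken from the paper's code, and the Bayesian-optimization half (model-based proposal of configurations) is not shown.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Run one successive-halving bracket and return the surviving config.

    evaluate(config, budget) must return a loss (lower is better).
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every survivor at the current (cheap) budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Keep the best 1/eta fraction and give them eta times more budget.
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_loss(learning_rate, budget):
    # Hypothetical objective: best at learning_rate == 0.1, improves with budget.
    return (learning_rate - 0.1) ** 2 + 1.0 / budget


candidates = [0.001, 0.01, 0.03, 0.05, 0.1, 0.2, 0.3, 0.5, 1.0]
best = successive_halving(candidates, toy_loss)
```

In this toy loss the ranking of configurations is identical at every budget, which is precisely the assumption the abstract questions for real NAS runs.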

URL

https://arxiv.org/abs/1807.06906

PDF

https://arxiv.org/pdf/1807.06906
Read All
BAM: Bottleneck Attention Module

2018-07-18

Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon

arXiv_CV

arXiv_CV Attention NAS CNN Classification Detection
Abstract

Recent advances in deep neural networks have been developed via architecture search for stronger representational power. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective attention module, named Bottleneck Attention Module (BAM), that can be integrated with any feed-forward convolutional neural networks. Our module infers an attention map along two separate pathways, channel and spatial. We place our module at each bottleneck of models where the downsampling of feature maps occurs. Our module constructs a hierarchical attention at bottlenecks with a number of parameters and it is trainable in an end-to-end manner jointly with any feed-forward models. We validate our BAM through extensive experiments on CIFAR-100, ImageNet-1K, VOC 2007 and MS COCO benchmarks. Our experiments show consistent improvement in classification and detection performances with various models, demonstrating the wide applicability of BAM. The code and models will be publicly available.

URL

https://arxiv.org/abs/1807.06514

PDF

https://arxiv.org/pdf/1807.06514
Read All
Unpaired Image Captioning by Language Pivoting

2018-07-18

Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

arXiv_CV

arXiv_CV Image_Caption Caption Quantitative
Abstract

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale image-caption paired corpus might not be available. We present an approach to this unpaired image captioning problem by language pivoting. Our method can effectively capture the characteristics of an image captioner from the pivot language (Chinese) and align it to the target language (English) using another pivot-target (Chinese-English) sentence parallel corpus. We evaluate our method on two image-to-English benchmark datasets: MSCOCO and Flickr30K. Quantitative comparisons against several baseline approaches demonstrate the effectiveness of our method.

URL

https://arxiv.org/abs/1803.05526

PDF

https://arxiv.org/pdf/1803.05526
Read All
Learning to Search with MCTSnets

2018-07-17

Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver

arXiv_CV

arXiv_CV Embedding
Abstract

Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical implementation of MCTS uses cleverly designed rules, optimized to the particular characteristics of the domain. These rules control where the simulation traverses, what to evaluate in the states that are reached, and how to back-up those evaluations. In this paper we instead learn where, what and how to search. Our architecture, which we call an MCTSnet, incorporates simulation-based search inside a neural network, by expanding, evaluating and backing-up a vector embedding. The parameters of the network are trained end-to-end using gradient-based optimisation. When applied to small searches in the well known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines.

URL

https://arxiv.org/abs/1802.04697

PDF

https://arxiv.org/pdf/1802.04697
Read All
Recurrent Stacking of Layers for Compact Neural Machine Translation Models

2018-07-17

Raj Dabre, Atsushi Fujita

arXiv_CL

arXiv_CL NMT
Abstract

In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder. As a result, the addition of each new layer improves the translation quality significantly. However, this also leads to a significant increase in the number of parameters. In this paper, we propose to share parameters across all the layers thereby leading to a recurrently stacked NMT model. We empirically show that the translation quality of a model that recurrently stacks a single layer 6 times is comparable to the translation quality of a model that stacks 6 separate layers. We also show that using pseudo-parallel corpora by back-translation leads to further significant improvements in translation quality.
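The parameter saving described in the abstract can be sketched with a toy stack of dense layers. This is a minimal illustration, assuming plain square weight matrices with a ReLU in place of real NMT encoder or decoder layers; the helper names and the dimensions are hypothetical.

```python
import random


def make_layer(dim, rng):
    # One dense layer as a dim x dim weight matrix (bias omitted for brevity).
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_layer(weights, vector):
    # Matrix-vector product followed by a ReLU nonlinearity.
    return [max(0.0, sum(w * x for w, x in zip(row, vector))) for row in weights]


def run_stack(layers, vector):
    for weights in layers:
        vector = apply_layer(weights, vector)
    return vector


def count_unique_parameters(layers):
    # Shared layers are the same Python object, so count each object once.
    unique = {id(weights): weights for weights in layers}
    return sum(len(row) for weights in unique.values() for row in weights)


rng = random.Random(0)
dim, depth = 8, 6
separate_stack = [make_layer(dim, rng) for _ in range(depth)]
shared_layer = make_layer(dim, rng)
recurrent_stack = [shared_layer] * depth  # the same weights applied 6 times

separate_params = count_unique_parameters(separate_stack)    # 6 * 8 * 8
recurrent_params = count_unique_parameters(recurrent_stack)  # 8 * 8
output = run_stack(recurrent_stack, [1.0] * dim)
```

Both stacks perform six layer applications per input, but the recurrent stack stores one sixth of the weights.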

URL

https://arxiv.org/abs/1807.05353

PDF

https://arxiv.org/pdf/1807.05353
Read All
Leveraging Pre-Trained 3D Object Detection Models For Fast Ground Truth Generation

2018-07-16

Jungwook Lee, Sean Walsh, Ali Harakeh, Steven L. Waslander

arXiv_CV

arXiv_CV Object_Detection Segmentation Detection
Abstract

Training 3D object detectors for autonomous driving has been limited to small datasets due to the effort required to generate annotations. Reducing both task complexity and the amount of task switching done by annotators is key to reducing the effort and time required to generate 3D bounding box annotations. This paper introduces a novel ground truth generation method that combines human supervision with pretrained neural networks to generate per-instance 3D point cloud segmentation, 3D bounding boxes, and class annotations. The annotators provide object anchor clicks which behave as a seed to generate instance segmentation results in 3D. The points belonging to each instance are then used to regress object centroids, bounding box dimensions, and object orientation. Our proposed annotation scheme requires 30x lower human annotation time. We use the KITTI 3D object detection dataset to evaluate the efficiency and the quality of our annotation scheme. We also test the proposed scheme on previously unseen data from the Autonomoose self-driving vehicle to demonstrate generalization capabilities of the network.

URL

https://arxiv.org/abs/1807.06072

PDF

https://arxiv.org/pdf/1807.06072
Read All
Pangloss: Fast Entity Linking in Noisy Text Environments

2018-07-16

Michael Conover, Matthew Hayes, Scott Blackburn, Pete Skomoroch, Sam Shah

arXiv_CV

arXiv_CV Knowledge_Graph Knowledge GAN Embedding
Abstract

Entity linking is the task of mapping potentially ambiguous terms in text to their constituent entities in a knowledge base like Wikipedia. This is useful for organizing content, extracting structured data from textual documents, and in machine learning relevance applications like semantic search, knowledge graph construction, and question answering. Traditionally, this work has focused on text that has been well-formed, like news articles, but in common real world datasets such as messaging, resumes, or short-form social media, non-grammatical, loosely-structured text adds a new dimension to this problem. This paper presents Pangloss, a production system for entity disambiguation on noisy text. Pangloss combines a probabilistic linear-time key phrase identification algorithm with a semantic similarity engine based on context-dependent document embeddings to achieve better than state-of-the-art results (>5% in F1) compared to other research or commercially available systems. In addition, Pangloss leverages a local embedded database with a tiered architecture to house its statistics and metadata, which allows rapid disambiguation in streaming contexts and on-device disambiguation in low-memory environments such as mobile phones.

URL

https://arxiv.org/abs/1807.06036

PDF

https://arxiv.org/pdf/1807.06036
Read All
Applying Domain Randomization to Synthetic Data for Object Category Detection

2018-07-16

João Borrego, Atabak Dehban, Rui Figueiredo, Plinio Moreno, Alexandre Bernardino, José Santos-Victor

arXiv_CV

arXiv_CV Object_Detection Deep_Learning Detection
Abstract

Recent advances in deep learning-based object detection techniques have revolutionized their applicability in several fields. However, since these methods rely on unwieldy and large amounts of data, a common practice is to download models pre-trained on standard datasets and fine-tune them for specific application domains with a small set of domain relevant images. In this work, we show that using synthetic datasets that are not necessarily photo-realistic can be a better alternative to simply fine-tuning pre-trained networks. Specifically, our results show an impressive 25% improvement in the mAP metric over a fine-tuning baseline when only about 200 labelled images are available to train. Finally, an ablation study of our results is presented to delineate the individual contribution of different components in the randomization pipeline.

URL

https://arxiv.org/abs/1807.09834

PDF

https://arxiv.org/pdf/1807.09834
Read All
Object Relation Detection Based on One-shot Learning

2018-07-16

Li Zhou, Jian Zhao, Jianshu Li, Li Yuan, Jiashi Feng

arXiv_CV

arXiv_CV Image_Caption Attention Deep_Learning Detection Relation Recognition
Abstract

Detecting the relations among objects, such as “cat on sofa” and “person ride horse”, is a crucial task in image understanding, and beneficial to bridging the semantic gap between images and natural language. Despite the remarkable progress of deep learning in detection and recognition of individual objects, it is still a challenging task to localize and recognize the relations between objects due to the complex combinatorial nature of various kinds of object relations. Inspired by the recent advances in one-shot learning, we propose a simple yet effective Semantics Induced Learner (SIL) model for solving this challenging task. Learning in a one-shot manner can enable a detection model to adapt to a huge number of object relations with diverse appearance effectively and robustly. In addition, the SIL combines bottom-up and top-down attention mechanisms, therefore enabling attention at the level of vision and semantics favorably. Within our proposed model, the bottom-up mechanism, which is based on Faster R-CNN, proposes object regions, and the top-down mechanism selects and integrates visual features according to semantic information. Experiments demonstrate the effectiveness of our framework over other state-of-the-art methods on two large-scale data sets for object relation detection.

URL

https://arxiv.org/abs/1807.05857

PDF

https://arxiv.org/pdf/1807.05857
Read All
Object Detection with Deep Learning: A Review

2018-07-15

Zhong-Qiu Zhao, Peng Zheng, Shou-tao Xu, Xindong Wu

arXiv_CV

arXiv_CV Image_Caption Salient Review Object_Detection Attention Face Survey CNN Optimization Deep_Learning Detection Face_Detection Relation
Abstract

Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles which combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy and optimization function, etc. In this paper, we provide a review on deep learning based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely Convolutional Neural Network (CNN). Then we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network based learning systems.

URL

https://arxiv.org/abs/1807.05511

PDF

https://arxiv.org/pdf/1807.05511
Read All
Interface roughness, carrier localization and wave function overlap in $c$-plane InGaN/GaN quantum wells: Interplay of well width, alloy microstructure, structural inhomogeneities and Coulomb effects

2018-07-14

Daniel S. P. Tanner, Joshua M. McMahon, Stefan Schulz

arXiv_CV

arXiv_CV Attention GAN Face
Abstract

In this work we present a detailed analysis of the interplay of Coulomb effects and different mechanisms that can lead to carrier localization effects in c-plane InGaN/GaN quantum wells. As mechanisms for carrier localization we consider here effects introduced by random alloy fluctuations as well as structural inhomogeneities such as well width fluctuations. Special attention is paid to the impact of the well width on the results. All calculations have been carried out in the framework of atomistic tight-binding theory. Our theoretical investigations show that, independent of the well widths studied here, carrier localization effects due to built-in fields, well width fluctuations and random alloy fluctuations dominate over Coulomb effects in terms of charge density redistributions. However, the situation is less clear cut when the well width fluctuations are absent. For large well width (approx. > 2.5 nm) charge density redistributions are possible but the electronic and optical properties are basically dominated by the spatial out-of-plane carrier separation originating from the electrostatic built-in field. The situation changes for lower well width (< 2.5 nm) where the Coulomb effect can lead to significant charge density redistributions and thus might compensate a large fraction of the spatial in-plane wave function separation observed in a single-particle picture. Given that this in-plane separation has been regarded as one of the main drivers behind the green gap problem, our calculations indicate that radiative recombination rates might significantly benefit from a reduced quantum well barrier interface roughness.

URL

https://arxiv.org/abs/1807.05392

PDF

https://arxiv.org/pdf/1807.05392
Read All
Predicting Visual Features from Text for Image and Video Caption Retrieval

2018-07-14

Jianfeng Dong, Xirong Li, Cees G. M. Snoek

arXiv_CV

arXiv_CV Video_Caption Caption Embedding CNN
Abstract

This paper strives to find amidst a set of sentences the one best describing the content of a given image or video. Different from existing works, which rely on a joint subspace for their image and video caption retrieval, we propose to do so in a visual space exclusively. Apart from this conceptual novelty, we contribute Word2VisualVec, a deep neural network architecture that learns to predict a visual feature representation from textual input. Example captions are encoded into a textual embedding based on multi-scale sentence vectorization and further transferred into a deep visual feature of choice via a simple multi-layer perceptron. We further generalize Word2VisualVec for video caption retrieval, by predicting from text both 3-D convolutional neural network features as well as a visual-audio representation. Experiments on Flickr8k, Flickr30k, the Microsoft Video Description dataset and the very recent NIST TrecVid challenge for video caption retrieval detail Word2VisualVec’s properties, its benefit over textual embeddings, the potential for multimodal query composition and its state-of-the-art results.

URL

https://arxiv.org/abs/1709.01362

PDF

https://arxiv.org/pdf/1709.01362
Read All
Semantic-aware Grad-GAN for Virtual-to-Real Urban Scene Adaption

2018-07-14

Peilun Li, Xiaodan Liang, Daoyuan Jia, Eric P. Xing

arXiv_CV

arXiv_CV Knowledge Segmentation GAN Semantic_Segmentation Quantitative Recognition
Abstract

Recent advances in vision tasks (e.g., segmentation) highly depend on the availability of large-scale real-world image annotations obtained by cumbersome human labor. Moreover, the perception performance often drops significantly for new scenarios, due to the poor generalization capability of models trained on limited and biased annotations. In this work, we resort to transferring knowledge from automatically rendered scene annotations in the virtual world to facilitate real-world visual tasks. Although virtual-world annotations can be ideally diverse and unlimited, the discrepant data distributions between the virtual and real worlds make knowledge transfer challenging. We thus propose a novel Semantic-aware Grad-GAN (SG-GAN) to perform virtual-to-real domain adaption with the ability of retaining vital semantic information. Beyond the simple holistic color/texture transformation achieved by prior works, SG-GAN successfully personalizes the appearance adaption for each semantic region in order to preserve their key characteristic for better recognition. It presents two main contributions to traditional GANs: 1) a soft gradient-sensitive objective for keeping semantic boundaries; 2) a semantic-aware discriminator for validating the fidelity of personalized adaptions with respect to each semantic region. Qualitative and quantitative experiments demonstrate the superiority of our SG-GAN in scene adaption over state-of-the-art GANs. Further evaluations on semantic segmentation on Cityscapes show that using virtual images adapted by SG-GAN dramatically improves segmentation performance over the original virtual data. We release our code at this https URL.

URL

https://arxiv.org/abs/1801.01726

PDF

https://arxiv.org/pdf/1801.01726
Read All
TequilaGAN: How to easily identify GAN samples

2018-07-13

Rafael Valle, Wilson Cai, Anish Doshi

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

In this paper we show strategies to easily identify fake samples generated with the Generative Adversarial Network framework. One strategy is based on the statistical analysis and comparison of raw pixel values and features extracted from them. The other strategy learns formal specifications from the real data and shows that fake samples violate the specifications of the real data. We show that fake samples produced with GANs have a universal signature that can be used to identify fake samples. We provide results on MNIST, CIFAR10, music and speech data.

URL

https://arxiv.org/abs/1807.04919

PDF

https://arxiv.org/pdf/1807.04919
Read All
TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

2018-07-13

Yunchao Wei, Zhiqiang Shen, Bowen Cheng, Honghui Shi, Jinjun Xiong, Jiashi Feng, Thomas Huang

arXiv_CV

arXiv_CV Object_Detection Segmentation Weakly_Supervised Detection
Abstract

This work provides a simple approach to discover tight object bounding boxes with only image-level supervision, called Tight box mining with Surrounding Segmentation Context (TS2C). We observe that object candidates mined through current multiple instance learning methods are usually trapped on discriminative object parts, rather than the entire object. TS2C leverages surrounding segmentation context derived from weakly-supervised segmentation to suppress such low-quality distracting candidates and boost the high-quality ones. Specifically, TS2C is developed based on two key properties of desirable bounding boxes: 1) high purity, meaning most pixels in the box have a high object response, and 2) high completeness, meaning the box covers high object response pixels comprehensively. With such novel and computable criteria, tighter candidates can be discovered for learning a better object detector. With TS2C, we obtain 48.0% and 44.4% mAP scores on VOC 2007 and 2012 benchmarks, which are the new state of the art.
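The two box properties named in the abstract, purity and completeness, can be made concrete with a simplified sketch. It assumes a binarized object-response mask and the plain ratio definitions below; the paper's actual scoring, which uses surrounding segmentation context, may differ, and the function name is hypothetical.

```python
def box_purity_and_completeness(mask, box):
    """mask: 2D list of 0/1 object responses; box: (top, left, bottom, right), inclusive."""
    top, left, bottom, right = box
    inside = sum(
        mask[r][c]
        for r in range(top, bottom + 1)
        for c in range(left, right + 1)
    )
    area = (bottom - top + 1) * (right - left + 1)
    total = sum(sum(row) for row in mask)
    purity = inside / area                           # share of box pixels on the object
    completeness = inside / total if total else 0.0  # share of object pixels in the box
    return purity, completeness


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
tight = box_purity_and_completeness(mask, (1, 1, 2, 3))  # covers the whole object
part = box_purity_and_completeness(mask, (1, 1, 2, 1))   # covers one object column
loose = box_purity_and_completeness(mask, (0, 0, 3, 4))  # covers the whole image
```

A box on a discriminative part scores high purity but low completeness, a loose box scores the reverse, and only the tight box scores high on both.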

URL

https://arxiv.org/abs/1807.04897

PDF

https://arxiv.org/pdf/1807.04897
Read All
Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes

2018-07-13

Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu

arXiv_CV

arXiv_CV Object_Detection Quantitative Detection
Abstract

Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resources. In this paper, an optical flow based moving object detection framework is proposed to address this problem. We use homography matrices to construct a background model online in the form of optical flow. To separate moving foregrounds from scenes, a dual-mode judging mechanism is designed to improve the system’s adaptation to challenging situations. In the experiments, two evaluation metrics are redefined to reflect the performance of methods more properly. We quantitatively and qualitatively validate the effectiveness and feasibility of our method with videos in various scene conditions. The experimental results show that our method adapts itself to different situations and outperforms the state-of-the-art methods, indicating the advantages of optical flow based methods.

URL

https://arxiv.org/abs/1807.04890

PDF

https://arxiv.org/pdf/1807.04890
Read All
Joint 3D Proposal Generation and Object Detection from View Aggregation

2018-07-12

Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, Steven Waslander

arXiv_CV

arXiv_CV Object_Detection Classification Detection
Abstract

We present AVOD, an Aggregate View Object Detection network for autonomous driving scenarios. The proposed neural network architecture uses LIDAR point clouds and RGB images to generate features that are shared by two subnetworks: a region proposal network (RPN) and a second stage detector network. The proposed RPN uses a novel architecture capable of performing multimodal feature fusion on high resolution feature maps to generate reliable 3D object proposals for multiple object classes in road scenes. Using these proposals, the second stage detection network performs accurate oriented 3D bounding box regression and category classification to predict the extents, orientation, and classification of objects in 3D space. Our proposed architecture is shown to produce state of the art results on the KITTI 3D object detection benchmark while running in real time with a low memory footprint, making it a suitable candidate for deployment on autonomous vehicles. Code is at: this https URL

URL

https://arxiv.org/abs/1712.02294

PDF

https://arxiv.org/pdf/1712.02294
Read All
Unsupervised nonparametric detection of unknown objects in noisy images based on percolation theory

2018-07-12

Mikhail A. Langovoy, Olaf Wittich, Patrick Laurie Davies

arXiv_CV

arXiv_CV Detection
Abstract

We develop an unsupervised, nonparametric, and scalable statistical learning method for detection of unknown objects in noisy images. The method uses results from percolation theory and random graph theory. We present an algorithm that can detect objects of unknown shapes and sizes in the presence of nonparametric noise of unknown level. The noise density is assumed to be unknown and can be very irregular. The algorithm has linear complexity and exponential accuracy and is appropriate for real-time systems. We prove strong consistency and scalability of our method in this setup with minimal assumptions.
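One way to picture a percolation-style detector is: threshold the image, then check whether any connected cluster of bright pixels is larger than noise alone would plausibly produce. The sketch below is only an illustration of that idea; the threshold, the minimum cluster size, the 4-connectivity, and the function names are all assumptions, since the abstract does not specify them.

```python
from collections import deque


def largest_cluster_size(image, threshold):
    """Size of the largest 4-connected cluster of pixels above threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or image[r0][c0] <= threshold:
                continue
            # Breadth-first search over one cluster; each pixel is visited once.
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            size = 0
            while queue:
                r, c = queue.popleft()
                size += 1
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and image[nr][nc] > threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            best = max(best, size)
    return best


def object_detected(image, threshold, min_cluster_size):
    return largest_cluster_size(image, threshold) >= min_cluster_size


noise_only = [
    [0.9, 0.1, 0.2, 0.8],
    [0.1, 0.7, 0.1, 0.2],
    [0.2, 0.1, 0.9, 0.1],
    [0.8, 0.2, 0.1, 0.1],
]
with_object = [
    [0.9, 0.1, 0.2, 0.8],
    [0.1, 0.7, 0.8, 0.9],
    [0.2, 0.9, 0.9, 0.8],
    [0.8, 0.2, 0.7, 0.1],
]
```

Because every pixel is visited once, the scan runs in time linear in the number of pixels, in line with the linear complexity the abstract claims for the authors' algorithm.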

URL

https://arxiv.org/abs/1102.5019

PDF

https://arxiv.org/pdf/1102.5019
Read All
Fictitious GAN: Training GANs with Historical Models

2018-07-11

Hao Ge, Yin Xia, Xu Chen, Randall Berry, Ying Wu

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Generative adversarial networks (GANs) are powerful tools for learning generative models. In practice, the training may suffer from lack of convergence. GANs are commonly viewed as a two-player zero-sum game between two neural networks. Here, we leverage this game theoretic view to study the convergence behavior of the training process. Inspired by the fictitious play learning process, a novel training method, referred to as Fictitious GAN, is introduced. Fictitious GAN trains the deep neural networks using a mixture of historical models. Specifically, the discriminator (resp. generator) is updated according to the best-response to the mixture outputs from a sequence of previously trained generators (resp. discriminators). It is shown that Fictitious GAN can effectively resolve some convergence issues that cannot be resolved by the standard training approach. It is proved that asymptotically the average of the generator outputs has the same distribution as the data samples.
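The fictitious play process that the abstract cites as its inspiration can be shown on a tiny zero-sum game. The sketch below runs classical fictitious play on matching pennies; it is not the Fictitious GAN training procedure, and the function name and tie-breaking rule are assumptions. Each player best-responds to the empirical mixture of the opponent's past actions, the discrete analogue of responding to a mixture of historical models.

```python
def fictitious_play(rounds):
    """Matching pennies: player 0 wins on a match, player 1 wins on a mismatch.

    Returns the fraction of rounds in which each player chose heads.
    """
    # counts[p][a]: times player p has played action a (0 = heads, 1 = tails)
    counts = [[0, 0], [0, 0]]
    for _ in range(rounds):
        # Player 0 copies the opponent's most frequent past action.
        a0 = 0 if counts[1][0] >= counts[1][1] else 1
        # Player 1 plays the opposite of the opponent's most frequent past action.
        a1 = 1 if counts[0][0] >= counts[0][1] else 0
        counts[0][a0] += 1
        counts[1][a1] += 1
    return [player_counts[0] / rounds for player_counts in counts]


heads_frequency = fictitious_play(10000)
```

Individual moves keep cycling, but the empirical frequencies approach the 50/50 mixed equilibrium, which mirrors the abstract's statement that the average of the generator outputs converges even when single iterates do not.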

URL

https://arxiv.org/abs/1803.08647

PDF

https://arxiv.org/pdf/1803.08647
Read All
Manifold regularization with GANs for semi-supervised learning

2018-07-11

Bruno Lecouat, Chuan-Sheng Foo, Houssam Zenati, Vijay Chandrasekhar

arXiv_CV

arXiv_CV Regularization Adversarial GAN
Abstract

Generative Adversarial Networks are powerful generative models that are able to model the manifold of natural images. We leverage this property to perform manifold regularization by approximating a variant of the Laplacian norm using a Monte Carlo approximation that is easily computed with the GAN. When incorporated into the semi-supervised feature-matching GAN we achieve state-of-the-art results for GAN-based semi-supervised learning on CIFAR-10 and SVHN benchmarks, with a method that is significantly easier to implement than competing methods. We also find that manifold regularization improves the quality of generated images, and is affected by the quality of the GAN used to approximate the regularizer.

URL

https://arxiv.org/abs/1807.04307

PDF

https://arxiv.org/pdf/1807.04307
Read All
Recurrent Semi-supervised Classification and Constrained Adversarial Generation with Motion Capture Data

2018-07-11

Félix G. Harvey, Julien Roy, David Kanaa, Christopher Pal

arXiv_CV

arXiv_CV Adversarial GAN Classification
Abstract

We explore recurrent encoder multi-decoder neural network architectures for semi-supervised sequence classification and reconstruction. We find that the use of multiple reconstruction modules helps models generalize in a classification task when only a small amount of labeled data is available, which is often the case in practice. Such models provide useful high-level representations of motions allowing clustering, searching and faster labeling of new sequences. We also propose a new, realistic partitioning of a well-known, high quality motion-capture dataset for better evaluations. We further explore a novel formulation for future-predicting decoders based on conditional recurrent generative adversarial networks, for which we propose both soft and hard constraints for transition generation derived from desired physical properties of synthesized future movements and desired animation goals. We find that using such constraints helps to stabilize the training of recurrent adversarial architectures for animation generation.

URL

https://arxiv.org/abs/1511.06653

PDF

https://arxiv.org/pdf/1511.06653
Read All
Triangular Architecture for Rare Language Translation

2018-07-11

Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma

arXiv_CL

arXiv_CL NMT
Abstract

Neural Machine Translation (NMT) performs poorly on the low-resource language pair $(X,Z)$, especially when $Z$ is a rare language. By introducing another rich language $Y$, we propose a novel triangular training architecture (TA-NMT) to leverage bilingual data $(Y,Z)$ (may be small) and $(X,Y)$ (can be rich) to improve the translation performance of low-resource pairs. In this triangular architecture, $Z$ is taken as the intermediate latent variable, and translation models of $Z$ are jointly optimized with a unified bidirectional EM algorithm under the goal of maximizing the translation likelihood of $(X,Y)$. Empirical results demonstrate that our method significantly improves the translation quality of rare languages on MultiUN and IWSLT2012 datasets, and achieves even better performance when combined with back-translation methods.

URL

https://arxiv.org/abs/1805.04813

PDF

https://arxiv.org/pdf/1805.04813
Read All
Recurrent neural networks running on quantum spins: memory accuracy and capacity

2018-07-11

Aki Kutvonen, Takahiro Sagawa, Keisuke Fujii

arXiv_CV

arXiv_CV Speech_Recognition RNN Prediction Recognition
Abstract

Quantum computing and neural networks show great promise for the future of information processing. In this paper we study a quantum reservoir computer, a framework harnessing quantum dynamics and designed for fast and efficient solving of temporal machine learning tasks such as speech recognition, time series prediction and natural language processing. Specifically, we study memory capacity and accuracy of a quantum reservoir computer based on the fully connected transverse field Ising model by investigating different forms of inter-spin interactions and computing timescales. We show that variation in inter-spin interactions leads to a better memory capacity in general, that engineering the type of interactions can greatly enhance the capacity, and that there exists an optimal timescale at which the capacity is maximized. To connect computational capabilities to physical properties of the underlying system, we also study the out-of-time-ordered correlator and find that its faster decay implies a more accurate memory.

URL

https://arxiv.org/abs/1807.03947

PDF

https://arxiv.org/pdf/1807.03947
Read All
A multiline study of a high-mass young stellar object in the Small Magellanic Cloud with ALMA: The detection of methanol gas at 0.2 solar metallicity

2018-07-11

Takashi Shimonishi, Yoshimasa Watanabe, Yuri Nishimura, Yuri Aikawa, Satoshi Yamamoto, Takashi Onaka, Nami Sakai, Akiko Kawamura

arXiv_CV

arXiv_CV GAN Face Detection
Abstract

We report the results of subparsec-scale submillimeter observations towards an embedded high-mass young stellar object in the Small Magellanic Cloud (SMC) with ALMA. Complementary infrared data obtained with the AKARI satellite and the Gemini South telescope are also presented. The target infrared point source is spatially resolved into two dense molecular cloud cores; one is associated with a high-mass young stellar object (YSO core), while another is not associated with an infrared source (East core). The two cores are dynamically associated but show different chemical characteristics. Emission lines of CS, C33S, H2CS, SO, SO2, CH3OH, H13CO+, H13CN, SiO, and dust continuum are detected from the observed region. Tentative detection of HDS is also reported. The first detection of CH3OH in the SMC has a strong impact on our understanding of the formation of complex organic molecules in metal-poor environments. The gas temperature is estimated to be ~10 K based on the rotation analysis of CH3OH lines. The fractional abundance of CH3OH gas in the East core is estimated to be (0.5-1.5) x 10^(-8), which is comparable with or marginally higher than those of similar cold sources in our Galaxy despite a factor of five lower metallicity in the SMC. This work provides observational evidence that an organic molecule like CH3OH, which is largely formed on grain surfaces, can be produced even in a significantly lower metallicity environment compared to the solar neighborhood. A possible origin of cold CH3OH gas in the observed dense core is discussed.

URL

https://arxiv.org/abs/1806.07120

PDF

https://arxiv.org/pdf/1806.07120
Robust Beamforming Design in a NOMA Cognitive Radio Network Relying on SWIPT

2018-07-11

Haijian Sun, Fuhui Zhou, Rose Qingyang Hu, Lajos Hanzo

arXiv_CV

arXiv_CV Optimization
Abstract

This paper studies a multiple-input single-output non-orthogonal multiple access cognitive radio network relying on simultaneous wireless information and power transfer. A realistic non-linear energy harvesting model is applied and a power splitting architecture is adopted at each secondary user (SU). Since it is difficult to obtain perfect channel state information (CSI) in practice, either a bounded or a Gaussian CSI error model is considered instead. Our robust beamforming and power splitting ratio are jointly designed for two problems with different objectives, namely minimizing the transmission power of the cognitive base station and maximizing the total harvested energy of the SUs. The optimization problems are challenging to solve, mainly because of the non-linear structure of the energy harvesting and CSI error models. We convert them into convex forms by using semi-definite relaxation. For the minimum transmission power problem, we obtain the rank-2 solution under the bounded CSI error model, while for the maximum energy harvesting problem, a two-loop procedure using a one-dimensional search is proposed. Our simulation results show that the proposed scheme significantly outperforms its traditional orthogonal multiple access counterpart. Furthermore, the performance using the Gaussian CSI error model is generally better than that using the bounded CSI error model.

URL

https://arxiv.org/abs/1807.03930

PDF

https://arxiv.org/pdf/1807.03930
Falling Things: A Synthetic Dataset for 3D Object Detection and Pose Estimation

2018-07-10

Jonathan Tremblay, Thang To, Stan Birchfield

arXiv_CV

arXiv_CV Object_Detection Segmentation Pose_Estimation Detection
Abstract

We present a new dataset, called Falling Things (FAT), for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics. By synthetically combining object models and backgrounds of complex composition and high graphical quality, we are able to generate photorealistic images with accurate 3D pose annotations for all objects in all images. Our dataset contains 60k annotated photos of 21 household objects taken from the YCB dataset. For each image, we provide the 3D poses, per-pixel class segmentation, and 2D/3D bounding box coordinates for all objects. To facilitate testing different input modalities, we provide mono and stereo RGB images, along with registered dense depth images. We describe in detail the generation process and statistical analysis of the data.

URL

https://arxiv.org/abs/1804.06534

PDF

https://arxiv.org/pdf/1804.06534
SimArch: A Multi-agent System For Human Path Simulation In Architecture Design

2018-07-10

Yen-Chia Hsu

arXiv_CV

arXiv_CV Prediction Detection
Abstract

Human movement paths are an important consideration in architecture design. By studying these paths, architects know where to arrange the basic elements (e.g. structures, glass, furniture) in the space. This paper presents SimArch, a multi-agent system for human moving path simulation. It involves a behavior model built by using a Markov Decision Process. The model simulates human mental states, target range detection, and collision prediction when agents are on the floor, in a particular small gallery, looking at an exhibit, or leaving the floor. It also models different kinds of human characteristics by assigning different transition probabilities. A modified weighted A* search algorithm quickly plans the sub-optimal path of the agents. In an experiment, SimArch takes a series of preprocessed floorplans as inputs, simulates the moving path, and outputs a density map for evaluation. The density map predicts how likely a person is to appear at a given location. A subsequent discussion illustrates how architects can use the density map to improve their floorplan design.
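
To make the path-planning step concrete, here is a minimal sketch of plain weighted A* on a grid floorplan, followed by a visit-count density map. It is not the paper's modified algorithm or its behavior model: the grid encoding, the Manhattan heuristic, the default weight of 1.5, and the function names are all assumptions made for illustration.

```python
import heapq


def weighted_a_star(grid, start, goal, weight=1.5):
    """Weighted A* on a 4-connected grid; grid[r][c] == 1 marks an obstacle.

    With weight > 1 the heuristic is inflated, so fewer cells are expanded
    but the returned path is not guaranteed to be the shortest one.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance to the goal.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(weight * heuristic(start), 0, start)]
    came_from = {start: None}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                priority = new_cost + weight * heuristic(nxt)
                heapq.heappush(open_heap, (priority, new_cost, nxt))
    return None  # goal unreachable


def density_map(grid, trips, weight=1.5):
    """Count how often each cell is visited over a list of (start, goal) trips."""
    counts = [[0] * len(grid[0]) for _ in grid]
    for start, goal in trips:
        for r, c in weighted_a_star(grid, start, goal, weight) or []:
            counts[r][c] += 1
    return counts
```

Dividing the counts by the number of trips would give a rough per-cell visit frequency, which is one simple reading of "how likely a person is to appear at a location".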

URL

https://arxiv.org/abs/1807.03760

PDF

https://arxiv.org/pdf/1807.03760
Data-Driven Forecasting of High-Dimensional Chaotic Systems with Long Short-Term Memory Networks

2018-07-10

Pantelis R. Vlachas, Wonmin Byeon, Zhong Y. Wan, Themistoklis P. Sapsis, Petros Koumoutsakos

arXiv_CV

arXiv_CV Inference RNN Memory_Networks
Abstract

We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks. The proposed LSTM neural networks perform inference of high-dimensional dynamical systems in their reduced order space and are shown to be an effective set of nonlinear approximators of their attractor. We demonstrate the forecasting performance of the LSTM and compare it with Gaussian processes (GPs) in time series obtained from the Lorenz 96 system, the Kuramoto-Sivashinsky equation and a prototype climate model. The LSTM networks outperform the GPs in short-term forecasting accuracy in all applications considered. A hybrid architecture, extending the LSTM with a mean stochastic model (MSM-LSTM), is proposed to ensure convergence to the invariant measure. This novel hybrid method is fully data-driven and extends the forecasting capabilities of LSTM networks.
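
For readers unfamiliar with the Lorenz 96 benchmark mentioned above, the sketch below integrates the standard Lorenz 96 equations, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices, and cuts the trajectory into (history, next state) pairs of the kind a sequence forecaster is trained on. The dimension, the forcing F = 8, the step size, and the helper names are illustrative assumptions; the paper's reduced-order setup and its LSTM are not reproduced here.

```python
def lorenz96_rhs(x, forcing=8.0):
    """Right-hand side of the Lorenz 96 system with cyclic indices."""
    n = len(x)
    # Negative indices wrap around in Python, which gives the cyclic boundary.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing for i in range(n)]


def rk4_step(x, dt, forcing=8.0):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = lorenz96_rhs(x, forcing)
    k2 = lorenz96_rhs(shifted(x, k1, dt / 2), forcing)
    k3 = lorenz96_rhs(shifted(x, k2, dt / 2), forcing)
    k4 = lorenz96_rhs(shifted(x, k3, dt), forcing)
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(n=8, steps=200, dt=0.01, forcing=8.0):
    """Return a list of states, starting slightly off the fixed point x_i = F."""
    x = [forcing] * n
    x[0] += 0.01  # small perturbation
    trajectory = [x]
    for _ in range(steps):
        x = rk4_step(x, dt, forcing)
        trajectory.append(x)
    return trajectory


def make_windows(trajectory, history):
    """Pair each run of `history` consecutive states with the state that follows it."""
    count = len(trajectory) - history
    inputs = [trajectory[i:i + history] for i in range(count)]
    targets = [trajectory[i + history] for i in range(count)]
    return inputs, targets
```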

URL

https://arxiv.org/abs/1802.07486

PDF

https://arxiv.org/pdf/1802.07486
Topic-Guided Attention for Image Captioning

2018-07-10

Zhihao Zhu, Zhan Xue, Zejian Yuan

arXiv_CV

arXiv_CV Image_Caption Attention Caption Quantitative
Abstract

Attention mechanisms have attracted considerable interest in image captioning because of their powerful performance. Existing attention-based models use feedback information from the caption generator as guidance to determine which of the image features should be attended to. A common defect of these attention generation methods is that they lack higher-level guiding information from the image itself, which limits their ability to select the most informative image features. Therefore, in this paper, we propose a novel attention mechanism, called topic-guided attention, which integrates image topics into the attention model as guiding information to help select the most important image features. Moreover, we extract image features and image topics with separate networks, which can be fine-tuned jointly in an end-to-end manner during training. The experimental results on the benchmark Microsoft COCO dataset show that our method yields state-of-the-art performance on various quantitative metrics.

URL

https://arxiv.org/abs/1807.03514

PDF

https://arxiv.org/pdf/1807.03514
Pooling Pyramid Network for Object Detection

2018-07-09

Pengchong Jin, Vivek Rathod, Xiangxin Zhu

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

We’d like to share a simple tweak of the Single Shot Multibox Detector (SSD) family of detectors, which is effective in reducing model size while maintaining the same quality. We share box predictors across all scales, and replace convolution between scales with max pooling. This has two advantages over vanilla SSD: (1) it avoids score miscalibration across scales; (2) the shared predictor sees the training data over all scales. Since we reduce the number of predictors to one, and trim all convolutions between them, model size is significantly smaller. We empirically show that these changes do not hurt model quality compared to vanilla SSD.
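
The two changes described above can be sketched in a few lines. This is a toy version under stated assumptions: feature maps are single-channel 2D lists, the "predictor" is a hypothetical one-weight scoring function standing in for a real box predictor, and the pooling is 2x2 with stride 2.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2D list (an odd trailing row or column is dropped)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]


def build_pyramid(fmap, levels):
    """Build coarser scales by pooling only; no convolution between scales."""
    pyramid = [fmap]
    for _ in range(levels - 1):
        pyramid.append(max_pool_2x2(pyramid[-1]))
    return pyramid


def shared_predictor(fmap, weight, bias):
    """Hypothetical stand-in for a box predictor: the same parameters score every cell."""
    return [[weight * v + bias for v in row] for row in fmap]


def predict_all_scales(fmap, levels, weight=0.5, bias=0.1):
    """Apply one predictor, with one set of parameters, to every pyramid level."""
    return [shared_predictor(level, weight, bias) for level in build_pyramid(fmap, levels)]
```

Because pooling has no parameters and the predictor is reused, adding a pyramid level adds no weights, which is the source of the size reduction the abstract describes.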

URL

https://arxiv.org/abs/1807.03284

PDF

https://arxiv.org/pdf/1807.03284
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

2018-07-09

Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang

arXiv_CV

arXiv_CV Adversarial Face Reinforcement_Learning Caption
Abstract

Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem. Different from captions, stories have more expressive language styles and contain many imaginary concepts that do not appear in the images. Thus it poses challenges to behavioral cloning algorithms. Furthermore, due to the limitations of automatic metrics on evaluating story quality, reinforcement learning methods with hand-crafted rewards also face difficulties in gaining an overall performance boost. Therefore, we propose an Adversarial REward Learning (AREL) framework to learn an implicit reward function from human demonstrations, and then optimize policy search with the learned reward function. Though automatic evaluation indicates a slight performance boost over state-of-the-art (SOTA) methods in cloning expert behaviors, human evaluation shows that our approach achieves significant improvement in generating more human-like stories than SOTA systems.

URL

https://arxiv.org/abs/1804.09160

PDF

https://arxiv.org/pdf/1804.09160
Learning The Sequential Temporal Information with Recurrent Neural Networks

2018-07-08

Pushparaja Murugan

arXiv_CV

arXiv_CV Image_Caption Review Speech_Recognition Tracking Caption Object_Tracking RNN Language_Model Prediction Recognition
Abstract

Recurrent networks are among the most powerful and promising artificial neural network algorithms for processing sequential data such as natural language, sound, and time series. Unlike a traditional feed-forward network, a recurrent network has an inherent feedback loop that allows it to store temporal context information and pass the state of information along the entire sequence of events. This helps achieve state-of-the-art performance in many important tasks such as language modeling, stock market prediction, image captioning, speech recognition, machine translation, and object tracking. However, training a fully connected RNN and managing the gradient flow are complicated processes, and many studies have been carried out to address this limitation. This article aims to provide brief details about recurrent neurons, their variants, and tips and tricks for training fully recurrent neural networks. This review was carried out as part of our IPO studio software module ‘Multiple Object Tracking’.

URL

https://arxiv.org/abs/1807.02857

PDF

https://arxiv.org/pdf/1807.02857
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

2018-07-08

Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty

arXiv_CV

arXiv_CV Video_Caption Attention Caption RNN Language_Model Prediction
Abstract

The explosion of video data on the internet requires effective and efficient technology to generate captions automatically for people who are not able to watch the videos. Despite the great progress of video captioning research, particularly on video feature encoding, the language decoder is still largely based on the prevailing RNN decoder such as LSTM, which tends to prefer the frequent word that aligns with the video. In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model. Most importantly, we introduce a binary gate into the low-level GRU language decoder to detect the language boundaries. Together with other advanced components including joint video prediction, shared soft attention, and boundary-aware video encoding, our integrated video captioning framework can discover hierarchical language information and distinguish the subject and the object in a sentence, which are often confused during language generation. Extensive experiments on two widely-used video captioning datasets, MSR-Video-to-Text (MSR-VTT) and YouTube-to-Text (MSVD), show that our method is highly competitive compared with the state-of-the-art methods.

URL

https://arxiv.org/abs/1807.03658

PDF

https://arxiv.org/pdf/1807.03658
Optimal Sensor Data Fusion Architecture for Object Detection in Adverse Weather Conditions

2018-07-06

Andreas Pfeuffer, Klaus Dietmayer

arXiv_CV

arXiv_CV Object_Detection Knowledge Detection
Abstract

Good and robust sensor data fusion in diverse weather conditions is a challenging task. There are several fusion architectures in the literature, e.g. the sensor data can be fused right at the beginning (Early Fusion), or they can first be processed separately and then concatenated later (Late Fusion). In this work, different fusion architectures are compared and evaluated by means of object detection tasks, in which the goal is to recognize and localize predefined objects in a stream of data. Usually, state-of-the-art object detectors based on neural networks are highly optimized for good weather conditions, since the well-known benchmarks only consist of sensor data recorded in optimal weather conditions. Therefore, these approaches degrade enormously or even fail in adverse weather conditions. In this work, different sensor fusion architectures are compared under good and adverse weather conditions in order to find the optimal fusion architecture for diverse weather situations. A new training strategy is also introduced such that the performance of the object detector is greatly enhanced in adverse weather scenarios or if a sensor fails. Furthermore, the paper addresses the question of whether the detection accuracy can be increased further by providing the neural network with a-priori knowledge such as the spatial calibration of the sensors.
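
The difference between the two fusion points named above is easiest to see side by side. The following is only a schematic, assuming each sensor reading is a flat list of numbers and using a hypothetical two-number feature extractor (mean and maximum) in place of a neural network; it says nothing about the paper's actual architectures or training strategy.

```python
def extract_features(signal):
    """Hypothetical feature extractor: summarizes a signal by its mean and maximum."""
    return [sum(signal) / len(signal), max(signal)]


def early_fusion(camera, lidar):
    """Fuse the raw sensor data first, then run a single feature extractor."""
    fused_input = camera + lidar  # list concatenation of the raw readings
    return extract_features(fused_input)


def late_fusion(camera, lidar):
    """Process each sensor separately, then concatenate the resulting features."""
    return extract_features(camera) + extract_features(lidar)
```

In the late variant each sensor keeps its own features until the end, so a downstream detector can still see which sensor contributed what; in the early variant that distinction is lost before any processing happens.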

URL

https://arxiv.org/abs/1807.02323

PDF

https://arxiv.org/pdf/1807.02323
Slytherin: Dynamic, Network-assisted Prioritization of Tail Packets in Datacenter Networks

2018-07-05

Hamed Rezaei, Mojtaba Malekpourshahraki, Balajee Vamanan

arXiv_CV

arXiv_CV
Abstract

Datacenter applications demand both low latency and high throughput; while interactive applications (e.g., Web Search) demand low tail latency for their short messages due to their partition-aggregate software architecture, many data-intensive applications (e.g., Map-Reduce) require high throughput for long flows as they move vast amounts of data across the network. Recent proposals improve latency of short flows and throughput of long flows by addressing the shortcomings of existing packet scheduling and congestion control algorithms, respectively. We make the key observation that long tails in the Flow Completion Times (FCT) of short flows result from packets that suffer congestion at more than one switch along their paths in the network. Our proposal, Slytherin, specifically targets packets that suffered from congestion at multiple points and prioritizes them in the network. Slytherin leverages the ECN mechanism, which is widely used in existing datacenters, to identify such tail packets and dynamically prioritizes them using existing priority queues. Compared to existing state-of-the-art packet scheduling proposals, Slytherin achieves 18.6% lower 99th percentile flow completion times for short flows without any loss of throughput. Further, Slytherin reduces the 99th percentile queue length in switches by a factor of about 2x on average.

URL

https://arxiv.org/abs/1807.02184

PDF

https://arxiv.org/pdf/1807.02184
Localization Recall Precision: A New Performance Metric for Object Detection

2018-07-05

Kemal Oksuz, Baris Can Cam, Emre Akbas, Sinan Kalkan

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Average precision (AP), the area under the recall-precision (RP) curve, is the standard performance measure for object detection. Despite its wide acceptance, it has a number of shortcomings, the most important of which are (i) the inability to distinguish very different RP curves, and (ii) the lack of directly measuring bounding box localization accuracy. In this paper, we propose ‘Localization Recall Precision (LRP) Error’, a new metric which we specifically designed for object detection. LRP Error is composed of three components related to localization, false negative (FN) rate and false positive (FP) rate. Based on LRP, we introduce the ‘Optimal LRP’, the minimum achievable LRP error representing the best achievable configuration of the detector in terms of recall-precision and the tightness of the boxes. In contrast to AP, which considers precisions over the entire recall domain, Optimal LRP determines the ‘best’ confidence score threshold for a class, which balances the trade-off between localization and recall-precision. In our experiments, we show that, for state-of-the-art (SOTA) object detectors, Optimal LRP provides richer and more discriminative information than AP. We also demonstrate that the best confidence score thresholds vary significantly among classes and detectors. Moreover, we present LRP results of a simple online video object detector which uses a SOTA still image object detector and show that the class-specific optimized thresholds increase the accuracy against the common approach of using a general threshold for all classes. We provide source code that can compute LRP for the PASCAL VOC and MSCOCO datasets. Our source code can easily be adapted to other datasets as well.
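
As a rough illustration of how the three components can combine into a single number, the sketch below assumes the commonly cited form of the LRP error: the normalized localization error of the true positives plus the counts of false positives and false negatives, divided by the total number of detections and missed objects. The abstract itself does not give the formula, so treat this form, the default IoU threshold tau = 0.5, and all function names as assumptions to be checked against the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def lrp_error(tp_ious, num_fp, num_fn, tau=0.5):
    """Assumed LRP form: 0 is a perfect detector, 1 means nothing was matched.

    tp_ious holds the IoU of each true positive with its matched ground truth,
    each expected to be at least tau.
    """
    total = len(tp_ious) + num_fp + num_fn
    if total == 0:
        return 0.0
    localization = sum((1.0 - v) / (1.0 - tau) for v in tp_ious)
    return (localization + num_fp + num_fn) / total


def optimal_lrp(per_threshold, tau=0.5):
    """Pick the confidence threshold with the lowest LRP error.

    per_threshold maps a confidence threshold to (tp_ious, num_fp, num_fn);
    returns (lowest_error, threshold).
    """
    return min(
        (lrp_error(ious, fp, fn, tau), threshold)
        for threshold, (ious, fp, fn) in per_threshold.items()
    )
```

Unlike AP, which integrates over all thresholds, this picks one operating point per class, which matches the abstract's description of Optimal LRP as selecting the "best" confidence score threshold.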

URL

https://arxiv.org/abs/1807.01696

PDF

https://arxiv.org/pdf/1807.01696
Faster Bounding Box Annotation for Object Detection in Indoor Scenes

2018-07-03

Bishwo Adhikari, Jukka Peltomäki, Jussi Puura, Heikki Huttunen

arXiv_CV

arXiv_CV Object_Detection Deep_Learning Detection
Abstract

This paper proposes an approach for rapid bounding box annotation for object detection datasets. The procedure consists of two stages: the first stage is to annotate a part of the dataset manually, and the second stage proposes annotations for the remaining samples using a model trained with the first-stage annotations. We experimentally study which first/second stage split minimizes the total workload. In addition, we introduce a new fully labeled object detection dataset collected from indoor scenes. Compared to other indoor datasets, our collection has more class categories, different backgrounds, lighting conditions, occlusion and high intra-class differences. We train deep learning based object detectors with a number of state-of-the-art models and compare them in terms of speed and accuracy. The fully annotated dataset is freely available to the research community.

URL

https://arxiv.org/abs/1807.03142

PDF

https://arxiv.org/pdf/1807.03142
A JND-based Video Quality Assessment Model and Its Application

2018-07-02

Haiqiang Wang, Xinfeng Zhang, Chao Yang, C.-C. Jay Kuo

arXiv_CV

arXiv_CV QA VQA
Abstract

Based on the Just-Noticeable-Difference (JND) criterion, a subjective video quality assessment (VQA) dataset, called the VideoSet, was constructed recently. In this work, we propose a JND-based VQA model using a probabilistic framework to analyze and clean collected subjective test data. While most traditional VQA models focus on content variability, our proposed VQA model takes both subject and content variabilities into account. The model parameters used to describe subject and content variabilities are jointly optimized by solving a maximum likelihood estimation (MLE) problem. As an application, the new subjective VQA model is used to filter out unreliable video quality scores collected in the VideoSet. Experiments are conducted to demonstrate the effectiveness of the proposed model.

URL

https://arxiv.org/abs/1807.00920

PDF

https://arxiv.org/pdf/1807.00920
Women also Snowboard: Overcoming Bias in Captioning Models

2018-07-02

Lisa Anne Hendricks, Kaylee Burns, Kate Saenko, Trevor Darrell, Anna Rohrbach

arXiv_CV

arXiv_CV Image_Caption Caption Prediction
Abstract

Most machine learning methods are known to capture and exploit biases of the training data. While some biases are beneficial for learning, others are harmful. Specifically, image captioning models tend to exaggerate biases present in training data. This can lead to incorrect captions in domains where unbiased captions are desired, or required, due to over reliance on the learned prior and image context. We investigate generation of gender specific caption words (e.g. man, woman) based on the person’s appearance or the image context. We introduce a new Equalizer model that ensures equal gender probability when gender evidence is occluded in a scene and confident predictions when gender evidence is present. The resulting model is forced to look at a person rather than use contextual cues to make a gender specific prediction. The losses that comprise our model, the Appearance Confusion Loss and the Confident Loss, are general, and can be added to any description model in order to mitigate impacts of unwanted bias in a description dataset. Our proposed model has lower error than prior work when describing images with people and mentioning their gender and more closely matches the ground truth ratio of sentences including women to sentences including men.

URL

https://arxiv.org/abs/1807.00517

PDF

https://arxiv.org/pdf/1807.00517
Channel Agnostic End-to-End Learning based Communication Systems with Conditional GAN

2018-07-02

Hao Ye, Geoffrey Ye Li, Biing-Hwang Fred Juang, Kathiravetpillai Sivanesan

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

In this article, we use deep neural networks (DNNs) to develop a wireless end-to-end communication system, in which DNNs are employed for all signal-related functionalities, such as encoding, decoding, modulation, and equalization. However, accurate instantaneous channel transfer function, i.e., the channel state information (CSI), is necessary to compute the gradient of the DNN representing. In many communication systems, the channel transfer function is hard to obtain in advance and varies with time and location. In this article, this constraint is released by developing a channel agnostic end-to-end system that does not rely on any prior information about the channel. We use a conditional generative adversarial net (GAN) to represent the channel effects, where the encoded signal of the transmitter will serve as the conditioning information. In addition, in order to deal with the time-varying channel, the received signal corresponding to the pilot data can also be added as a part of the conditioning information. From the simulation results, the proposed method is effective on additive white Gaussian noise (AWGN) and Rayleigh fading channels, which opens a new door for building data-driven communication systems.

URL

https://arxiv.org/abs/1807.00447

PDF

https://arxiv.org/pdf/1807.00447
Dense Information Flow for Neural Machine Translation

2018-07-02

Yanyao Shen, Xu Tan, Di He, Tao Qin, Tie-Yan Liu

arXiv_CL

arXiv_CL Attention Optimization NMT
Abstract

Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework. From the optimization perspective, residual connections are adopted to improve learning performance for both encoder and decoder in most of these deep architectures, and advanced attention connections are applied as well. Inspired by the success of the DenseNet model in computer vision problems, in this paper, we propose a densely connected NMT architecture (DenseNMT) that is able to train more efficiently for NMT. The proposed DenseNMT not only allows dense connection in creating new features for both encoder and decoder, but also uses the dense attention structure to improve attention quality. Our experiments on multiple datasets show that DenseNMT structure is more competitive and efficient.
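
The contrast with residual connections can be shown with a toy stack. This sketch assumes DenseNet-style connectivity, where each layer receives the concatenation of the input and every earlier layer's output; the layers here are arbitrary hypothetical callables on lists of numbers, not the convolutional or attention layers of DenseNMT.

```python
def dense_stack(x, layers):
    """Run layers with dense connectivity.

    Each layer sees the concatenation of the original input and all earlier
    layer outputs, and the final representation concatenates everything.
    """
    features = [x]
    for layer in layers:
        concatenated = [v for feature in features for v in feature]
        features.append(layer(concatenated))
    return [v for feature in features for v in feature]


def residual_stack(x, layers):
    """For comparison: each layer's output is added to its own input only."""
    for layer in layers:
        x = [a + b for a, b in zip(x, layer(x))]
    return x
```

For example, with two layers that each return the sum of their input as a one-element list, `dense_stack([1, 2], ...)` yields `[1, 2, 3, 6]`: the second layer sees the first layer's output alongside the raw input rather than a single summed vector.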

URL

https://arxiv.org/abs/1806.00722

PDF

https://arxiv.org/pdf/1806.00722
Hyperspectral Image Dataset for Benchmarking on Salient Object Detection

2018-07-02

Nevrez Imamoglu, Yu Oishi, Xiaoqiang Zhang, Guanqun Ding, Yuming Fang, Toru Kouyama, Ryosuke Nakamura

arXiv_CV

arXiv_CV Salient Object_Detection Detection
Abstract

Much work has been done on salient object detection using supervised or unsupervised approaches on colour images. Recently, a few studies demonstrated that efficient salient object detection can also be implemented by using spectral features in the visible spectrum of hyperspectral images of natural scenes. However, these hyperspectral salient object detection models were tested on very few images selected from various online public datasets, which were not specifically created for object detection purposes. Therefore, we aim to contribute to the field by releasing a hyperspectral salient object detection dataset with a collection of 60 hyperspectral images with their respective ground-truth binary images and representative rendered colour images (sRGB). We took several aspects into consideration during the data collection, such as variation in object size, number of objects, foreground-background contrast, object position in the image, etc. Then, we prepared ground-truth binary images for each hyperspectral image, where salient objects are labelled on the images. Finally, we evaluated some existing hyperspectral saliency detection models from the literature using the Area Under Curve (AUC) metric.
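
For reference, the AUC of a saliency map against a binary ground-truth mask can be computed as the probability that a randomly chosen salient pixel scores higher than a randomly chosen background pixel. The sketch below uses that pairwise definition on flattened pixel lists; the function name and the flat-list input format are assumptions, and the pairwise loop is quadratic, so it suits small examples rather than full images.

```python
def roc_auc(scores, labels):
    """ROC AUC from saliency scores and binary labels (1 = salient, 0 = background).

    Counts, over all salient/background pixel pairs, how often the salient
    pixel scores higher; ties count as half.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one salient and one background pixel")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```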

URL

https://arxiv.org/abs/1806.11314

PDF

https://arxiv.org/pdf/1806.11314
Product-based Neural Networks for User Response Prediction over Multi-field Categorical Data

2018-07-01

Yanru Qu, Bohui Fang, Weinan Zhang, Ruiming Tang, Minzhe Niu, Huifeng Guo, Yong Yu, Xiuqiang He

arXiv_CV

arXiv_CV Sparse Attention Optimization Prediction Recommendation
Abstract

User response prediction is a crucial component for personalized information retrieval and filtering scenarios, such as recommender systems and web search. The data in user response prediction is mostly in a multi-field categorical format and transformed into sparse representations via one-hot encoding. Due to the sparsity problems in representation and optimization, most research focuses on feature engineering and shallow modeling. Recently, deep neural networks have attracted research attention on such a problem for their high capacity and end-to-end training scheme. In this paper, we study user response prediction in the scenario of click prediction. We first analyze a coupled gradient issue in latent vector-based models and propose kernel product to learn field-aware feature interactions. Then we discuss an insensitive gradient issue in DNN-based models and propose Product-based Neural Network (PNN) which adopts a feature extractor to explore feature interactions. Generalizing the kernel product to a net-in-net architecture, we further propose Product-network In Network (PIN) which can generalize previous models. Extensive experiments on 4 industrial datasets and 1 contest dataset demonstrate that our models consistently outperform 8 baselines on both AUC and log loss. In addition, PIN achieves a large CTR improvement (34.67% relative) in an online A/B test.
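
To illustrate the data format and the simplest kind of feature interaction, the sketch below one-hot encodes a multi-field categorical record and computes pairwise inner products between per-field embedding vectors. The field names, vocabularies, and embedding values are made up for the example, and the plain inner product is only a baseline interaction; the paper's kernel product and PIN sub-networks are not reproduced.

```python
from itertools import combinations


def one_hot(value, vocabulary):
    """Encode one categorical value as a 0/1 vector over its field's vocabulary."""
    return [1 if v == value else 0 for v in vocabulary]


def encode_record(record, vocabularies):
    """Concatenate the one-hot vectors of all fields into one sparse vector."""
    encoded = []
    for field, vocabulary in vocabularies.items():
        encoded.extend(one_hot(record[field], vocabulary))
    return encoded


def pairwise_inner_products(field_embeddings):
    """Inner product between every pair of field embeddings, in combination order."""
    return [
        sum(a * b for a, b in zip(u, v))
        for u, v in combinations(field_embeddings, 2)
    ]
```

With fields `city` in {London, Paris} and `device` in {phone, tablet, desktop}, the record (Paris, phone) becomes `[0, 1, 1, 0, 0]`: exactly one active entry per field, which is why the representation grows sparse as vocabularies get large.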

URL

https://arxiv.org/abs/1807.00311

PDF

https://arxiv.org/pdf/1807.00311
Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships

2018-06-30

Yong Liu, Ruiping Wang, Shiguang Shan, Xilin Chen

arXiv_CV

arXiv_CV Object_Detection Inference Detection Relation Recognition
Abstract

Context is important for accurate visual recognition. In this work we propose an object detection algorithm that not only considers object visual appearance, but also makes use of two kinds of context: scene contextual information and object relationships within a single image. Object detection is therefore regarded as both a cognition problem and a reasoning problem when leveraging this structured information. Specifically, this paper formulates object detection as a problem of graph structure inference, where, given an image, the objects are treated as nodes in a graph and relationships between the objects are modeled as edges in that graph. To this end, we present a so-called Structure Inference Network (SIN), a detector that incorporates a graphical model, which aims to infer object state, into a typical detection framework (e.g. Faster R-CNN). Comprehensive experiments on PASCAL VOC and MS COCO datasets indicate that scene context and object relationships truly improve the performance of object detection with more desirable and reasonable outputs.

URL

https://arxiv.org/abs/1807.00119

PDF

https://arxiv.org/pdf/1807.00119
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features

2018-06-30

Chiori Hori, Huda Alamri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh

arXiv_CV

arXiv_CV QA Attention Caption VQA
Abstract

Dialog systems need to understand dynamic visual scenes in order to have conversations with users about the objects and events around them. Scene-aware dialog systems for real-world applications could be developed by integrating state-of-the-art technologies from multiple research areas, including: end-to-end dialog technologies, which generate system responses using models trained from dialog data; visual question answering (VQA) technologies, which answer questions about images using learned image features; and video description technologies, in which descriptions/captions are generated from videos using multimodal information. We introduce a new dataset of dialogs about videos of human behaviors. Each dialog is a typed conversation that consists of a sequence of 10 question-and-answer (QA) pairs between two Amazon Mechanical Turk (AMT) workers. In total, we collected dialogs on roughly 9,000 videos. Using this new dataset for Audio Visual Scene-aware dialog (AVSD), we trained an end-to-end conversation model that generates responses in a dialog about a video. Our experiments demonstrate that using multimodal features that were developed for multimodal attention-based video description enhances the quality of generated dialog about dynamic scenes (videos). Our dataset, model code and pretrained models will be publicly available for a new Video Scene-Aware Dialog challenge.

URL

https://arxiv.org/abs/1806.08409

PDF

https://arxiv.org/pdf/1806.08409
Weakest-link control of invasive species: Impacts of memory, bounded rationality and network structure in repeated cooperative games

2018-06-29

Adam Kleczkowski, Andrew Bate, Michael Redenti, Nick Hanley

arXiv_CV

arXiv_CV
Abstract

The nature of dispersal of many invasive pests and pathogens in agriculture and forestry makes it necessary to consider how the actions of one manager affect neighbouring properties. In addition to the direct effects of a potential spread of a pest and the resulting economic loss, there are also indirect consequences that affect whole regions and that require coordinated actions to manage and/or to eradicate it (like movement restrictions). In this paper we address the emergence and stability of cooperation among agents who respond to a threat of an invasive pest or disease. The model, based on the weakest-link paradigm, uses repeated multi-participant coordination games where players’ pay-offs depend on management decisions to prevent the invasion on their own land as well as those of their neighbours on a network. We show that for the basic cooperation game agents select the risk-dominant strategy of a Stag hunt game over the pay-off dominant strategy of implementing control measures. However, cooperation can be achieved by the social planner offering a biosecurity payment. The critical level of this payment depends on the details of the decision-making process, with higher trust (based on a reputation of other agents reflecting their past performance) allowing a significant reduction in necessary payments and slowing down decay in cooperation when the payment is low. We also find that allowing for uncertainty in the decision-making process can enhance cooperation for low levels of payments. Finally, we show the importance of industry structure to the emergence of cooperation, with an increase in the average coordination number of network nodes leading to an increase in the critical biosecurity payment.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1807.00701

PDF

https://arxiv.org/pdf/1807.00701
Read All
Convergence Problems with Generative Adversarial Networks

2018-06-29

Samuel A. Barnett

arXiv_CV

arXiv_CV Adversarial GAN
Abstract

Generative adversarial networks (GANs) are a novel approach to generative modelling, a task whose goal it is to learn a distribution of real data points. They have often proved difficult to train: GANs are unlike many techniques in machine learning, in that they are best described as a two-player game between a discriminator and generator. This has yielded both unreliability in the training process, and a general lack of understanding as to how GANs converge, and if so, to what. The purpose of this dissertation is to provide an account of the theory of GANs suitable for the mathematician, highlighting both positive and negative results. This involves identifying the problems when training GANs, and how topological and game-theoretic perspectives of GANs have contributed to our understanding and improved our techniques in recent years.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1806.11382

PDF

https://arxiv.org/pdf/1806.11382
Read All
Generate the corresponding Image from Text Description using Modified GAN-CLS Algorithm

2018-06-29

Fuzhou Gong, Zigeng Xia

arXiv_CV

arXiv_CV Adversarial GAN Inference
Abstract

Synthesizing images or texts automatically is a useful research area in artificial intelligence. Generative adversarial networks (GANs), proposed by Goodfellow in 2014, allow this task to be done more efficiently by using deep neural networks. We consider generating corresponding images from an input text description using a GAN. In this paper, we analyze the GAN-CLS algorithm, an advanced GAN method proposed by Scott Reed in 2016. First, we identify the problem with this algorithm through inference. Then we correct the GAN-CLS algorithm according to the inference by modifying the objective function of the model. Finally, we run experiments on the Oxford-102 dataset and the CUB dataset. As a result, our modified algorithm can generate images that are more plausible than those of the GAN-CLS algorithm in some cases. Also, some of the generated images match the input texts better.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1806.11302

PDF

https://arxiv.org/pdf/1806.11302
Read All
YH Technologies at ActivityNet Challenge 2018

2018-06-29

Ting Yao, Xue Li

arXiv_CV

arXiv_CV Caption Action_Recognition Recognition
Abstract

This notebook paper presents an overview and comparative analysis of our systems designed for the following five tasks in ActivityNet Challenge 2018: temporal action proposals, temporal action localization, dense-captioning events in videos, trimmed action recognition, and spatio-temporal action localization.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1807.00686

PDF

https://arxiv.org/pdf/1807.00686
Read All
Neural Machine Translation with Key-Value Memory-Augmented Attention

2018-06-29

Fandong Meng, Zhaopeng Tu, Yong Cheng, Haiyang Wu, Junjie Zhai, Yuekui Yang, Di Wang

arXiv_CL

arXiv_CL Attention NMT
Abstract

Although attention-based Neural Machine Translation (NMT) has achieved remarkable progress in recent years, it still suffers from issues of repeating and dropping translations. To alleviate these issues, we propose a novel key-value memory-augmented attention model for NMT, called KVMEMATT. Specifically, we maintain a timely updated key-memory to keep track of attention history and a fixed value-memory to store the representation of the source sentence throughout the whole translation process. Via nontrivial transformations and iterative interactions between the two memories, the decoder focuses on more appropriate source word(s) for predicting the next target word at each decoding step, and can therefore improve the adequacy of translations. Experimental results on Chinese=>English and WMT17 German<=>English translation tasks demonstrate the superiority of the proposed model.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1806.11249

PDF

https://arxiv.org/pdf/1806.11249
Read All
