Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

English Out-of-Vocabulary Lexical Evaluation Task

2019-02-17

Han Wang, Ye Wang, Mi Lu, Yoonsuck Choe

arXiv_CL

arXiv_CL Knowledge Embedding Classification Language_Model Prediction
Abstract

Unlike previous unknown-noun tagging tasks (Curran, 2005; Ciaramita and Johnson, 2003), this is the first attempt to focus on out-of-vocabulary (OOV) lexical evaluation tasks that do not require any prior knowledge. OOV words are words that appear only in test samples. The goal of the tasks is to provide solutions for OOV lexical classification and prediction. The tasks require annotators to infer the attributes of the OOV words from their related contexts. We then use unsupervised word embedding methods such as Word2Vec (Mikolov et al., 2013) and Word2GM (Athiwaratkun and Wilson, 2017) to perform baseline experiments on the categorical classification task and the OOV word attribute prediction tasks.
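The definition of OOV words given in the abstract (words that appear only in test samples) can be made concrete with a minimal Python sketch. The function name `oov_words` and the toy sentences are illustrative assumptions, not part of the paper's task data.

```python
def oov_words(train_tokens, test_tokens):
    """Return the test words that never occur in the training tokens."""
    train_vocab = set(train_tokens)
    return {word for word in test_tokens if word not in train_vocab}


# Toy data (hypothetical): "quokka" is the only word unseen in training.
train = "the cat sat on the mat".split()
test = "the quokka sat on the mat".split()
print(oov_words(train, test))  # {'quokka'}
```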

URL

http://arxiv.org/abs/1804.04242

PDF

http://arxiv.org/pdf/1804.04242
Structured Group Local Sparse Tracker

2019-02-17

Mohammadreza Javanmardi, Xiaojun Qi

arXiv_CV

arXiv_CV Regularization Sparse Tracking Optimization Quantitative
Abstract

Sparse representation is considered a viable solution to visual tracking. In this paper, we propose a structured group local sparse tracker (SGLST), which exploits local patches inside target candidates in the particle filter framework. Unlike conventional local sparse trackers, the proposed optimization model in SGLST not only adopts local and spatial information of the target candidates but also attains the spatial layout structure among them by employing a group-sparsity regularization term. To solve the optimization model, we propose an efficient numerical algorithm consisting of two subproblems with closed-form solutions. Both qualitative and quantitative evaluations on benchmarks of challenging image sequences demonstrate the superior performance of the proposed tracker against several state-of-the-art trackers.

URL

http://arxiv.org/abs/1902.06182

PDF

http://arxiv.org/pdf/1902.06182
Iterated Belief Base Revision: A Dynamic Epistemic Logic Approach

2019-02-17

Marlo Souza, Álvaro Moreira, Renata Vieira

arXiv_AI

arXiv_AI Relation
Abstract

AGM’s belief revision is one of the main paradigms in the study of belief change operations. In this context, belief bases (prioritised bases) have been largely used to specify the agent’s belief state - whether representing the agent’s ‘explicit beliefs’ or as a computational model for her belief state. While the connection between iterated AGM-like operations and their encoding in dynamic epistemic logics has been studied before, few works have considered how well-known postulates from iterated belief revision theory can be characterised by means of belief bases and their counterpart in a dynamic epistemic logic. This work investigates how priority graphs, a syntactic representation of preference relations deeply connected to prioritised bases, can be used to characterise belief change operators, focusing on well-known postulates of Iterated Belief Change. We provide syntactic representations of belief change operators in a dynamic context, as well as new negative results regarding the possibility of representing an iterated belief revision operation using transformations on priority graphs.

URL

http://arxiv.org/abs/1902.06178

PDF

http://arxiv.org/pdf/1902.06178
Regularized Evolution for Image Classifier Architecture Search

2019-02-16

Esteban Real, Alok Aggarwal, Yanping Huang, Quoc V Le

arXiv_CV

arXiv_CV NAS Reinforcement_Learning
Abstract

The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier, AmoebaNet-A, that surpasses hand-designs for the first time. To do this, we modify the tournament selection evolutionary algorithm by introducing an age property to favor the younger genotypes. Matching size, AmoebaNet-A has comparable accuracy to current state-of-the-art ImageNet models discovered with more complex architecture-search methods. Scaled to larger size, AmoebaNet-A sets a new state-of-the-art 83.9% top-1 / 96.6% top-5 ImageNet accuracy. In a controlled comparison against a well-known reinforcement learning algorithm, we give evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search. This is relevant when fewer compute resources are available. Evolution is, thus, a simple method to effectively discover high-quality architectures.
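The tournament selection with an age property described in this abstract can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes that "favoring younger genotypes" means discarding the oldest individual each cycle rather than the worst one, and the bit-string "architecture", `random_bits`, and `flip_one_bit` are hypothetical stand-ins for a real search space, where fitness would come from training a network.

```python
import collections
import random


def regularized_evolution(cycles, population_size, sample_size,
                          random_arch, mutate, fitness, rng):
    """Tournament selection with aging: the oldest individual dies each cycle."""
    population = collections.deque()  # left end = oldest individual
    history = []                      # every individual ever evaluated

    # Fill the initial population with random individuals.
    while len(population) < population_size:
        arch = random_arch(rng)
        individual = (arch, fitness(arch))
        population.append(individual)
        history.append(individual)

    # Each cycle: sample a tournament, mutate the winner, retire the oldest.
    while len(history) < cycles:
        tournament = rng.sample(list(population), sample_size)
        parent = max(tournament, key=lambda ind: ind[1])
        child_arch = mutate(parent[0], rng)
        child = (child_arch, fitness(child_arch))
        population.append(child)
        history.append(child)
        population.popleft()  # remove the oldest, not the worst

    return max(history, key=lambda ind: ind[1])


# Hypothetical toy problem: an "architecture" is a tuple of 16 bits and its
# fitness is the number of ones.
def random_bits(rng):
    return tuple(rng.randint(0, 1) for _ in range(16))


def flip_one_bit(arch, rng):
    i = rng.randrange(len(arch))
    return arch[:i] + (1 - arch[i],) + arch[i + 1:]


best_arch, best_fitness = regularized_evolution(
    cycles=500, population_size=20, sample_size=5,
    random_arch=random_bits, mutate=flip_one_bit, fitness=sum,
    rng=random.Random(0))
print(best_arch, best_fitness)
```

Because removal depends on age alone, even a high-scoring individual eventually leaves the population unless its descendants keep winning tournaments.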

URL

https://arxiv.org/abs/1802.01548

PDF

https://arxiv.org/pdf/1802.01548
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey

2019-02-16

Longlong Jing, Yingli Tian

arXiv_CV

arXiv_CV Review Survey Deep_Learning Quantitative
Abstract

Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid the extensive cost of collecting and annotating large-scale datasets, self-supervised learning methods, a subset of unsupervised learning methods, have been proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that are used for self-supervised learning are summarized. Next, the main components and evaluation metrics of self-supervised learning methods are reviewed, followed by the commonly used image and video datasets and the existing self-supervised visual feature learning methods. Quantitative performance comparisons of the reviewed methods on benchmark datasets are then summarized and discussed for both image and video feature learning. Finally, the paper concludes and lists a set of promising future directions for self-supervised visual feature learning.

URL

http://arxiv.org/abs/1902.06162

PDF

http://arxiv.org/pdf/1902.06162
Deep Convolutional Sum-Product Networks for Probabilistic Image Representations

2019-02-16

Jos van de Wolfshaar, Andrzej Pronobis

arXiv_CV

arXiv_CV Image_Caption Regularization CNN Inference Relation
Abstract

Sum-Product Networks (SPNs) are hierarchical probabilistic graphical models capable of fast and exact inference. Applications of SPNs to real-world data such as large image datasets have been fairly limited in previous literature. We introduce Convolutional Sum-Product Networks (ConvSPNs) which exploit the inherent structure of images in a way similar to deep convolutional neural networks, optionally with weight sharing. ConvSPNs encode spatial relationships through local products and local sum operations. ConvSPNs obtain state-of-the-art results compared to other SPN-based approaches on several visual datasets, including color images, for both generative and discriminative tasks. ConvSPNs are the first pure-SPN models applied to color images that do not depend on additional techniques for feature extraction. In addition, we introduce two novel methods for regularizing SPNs trained with hard EM. Both regularization methods have been motivated by observing an exponentially decreasing variance of log probabilities with respect to the depth of randomly structured SPNs. We show that our regularization provides substantial further improvements in generative visual tasks.

URL

http://arxiv.org/abs/1902.06155

PDF

http://arxiv.org/pdf/1902.06155
BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding

2019-02-16

Gencer Sumbul, Marcela Charfuelan, Begüm Demir, Volker Markl

arXiv_CV

arXiv_CV Image_Caption CNN Classification Deep_Learning
Abstract

This paper presents a new large-scale multi-label Sentinel-2 benchmark archive, named BigEarthNet. Our archive consists of 590,326 Sentinel-2 image patches, each of which has 10, 20 and 60 meter image bands associated with pixel sizes of 120x120, 60x60 and 20x20, respectively. Unlike most of the existing archives, each image patch is annotated with multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). BigEarthNet is 20 times larger than the existing archives in remote sensing (RS) and thus is much better suited for use as a training source in the context of deep learning. This paper first addresses the limitations of the existing archives and then describes the properties of our archive. Experimental results obtained in the framework of RS image scene classification problems show that a shallow Convolutional Neural Network (CNN) architecture trained on BigEarthNet achieves higher accuracy than a state-of-the-art CNN model pre-trained on ImageNet (which is a very popular large-scale benchmark archive in computer vision). BigEarthNet opens up promising directions to advance operational RS applications and research in massive Sentinel-2 image archives.
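The band resolutions and patch sizes quoted in this abstract are mutually consistent: multiplying each resolution by its patch width gives the same ground extent. A short Python check, using only the numbers from the abstract (the variable names are illustrative):

```python
# Band resolution (metres per pixel) -> patch width in pixels, from the abstract.
patch_width_px = {10: 120, 20: 60, 60: 20}

# Ground extent of one patch side, in metres, for each band group.
extent_m = {res: res * width for res, width in patch_width_px.items()}
print(extent_m)  # {10: 1200, 20: 1200, 60: 1200}
```

Every patch therefore covers a square of 1200 m by 1200 m on the ground, whichever band group is used.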

URL

http://arxiv.org/abs/1902.06148

PDF

http://arxiv.org/pdf/1902.06148
Neuromodulated Goal-Driven Perception in Uncertain Domains

2019-02-16

Xinyun Zou, Soheil Kolouri, Praveen K. Pilly, Jeffrey L. Krichmar

arXiv_CV

arXiv_CV Attention GAN
Abstract

In uncertain domains, the goals are often unknown and need to be predicted by the organism or system. In this paper, contrastive excitation backprop (c-EB) was used in a goal-driven perception task with pairs of noisy MNIST digits, where the system had to increase attention to one of the two digits corresponding to a goal (i.e., even, odd, low value, or high value) and decrease attention to the distractor digit or noisy background pixels. Because the valid goal was unknown, an online learning model based on the cholinergic and noradrenergic neuromodulatory systems was used to predict a noisy goal (expected uncertainty) and re-adapt when the goal changed (unexpected uncertainty). This neurobiologically plausible model demonstrates how neuromodulatory systems can predict goals in uncertain domains and how attentional mechanisms can enhance the perception of that goal.

URL

http://arxiv.org/abs/1903.00068

PDF

http://arxiv.org/pdf/1903.00068
A Fleet of Miniature Cars for Experiments in Cooperative Driving

2019-02-16

Nicholas Hyldmar, Yijun He, Amanda Prorok

arXiv_RO

arXiv_RO
Abstract

We introduce a unique experimental testbed that consists of a fleet of 16 miniature Ackermann-steering vehicles. We are motivated by a lack of available low-cost platforms to support research and education in multi-car navigation and trajectory planning. This article elaborates on the design of our miniature robotic car, the Cambridge Minicar, as well as the fleet’s control architecture. Our experimental testbed allows us to implement state-of-the-art driver models as well as autonomous control strategies, and test their validity in a real, physical multi-lane setup. Through experiments on our miniature highway, we are able to tangibly demonstrate the benefits of cooperative driving on multi-lane road topographies. Our setup paves the way for indoor large-fleet experimental research.

URL

http://arxiv.org/abs/1902.06133

PDF

http://arxiv.org/pdf/1902.06133
LISA: a MATLAB package for Longitudinal Image Sequence Analysis

2019-02-16

Jang Ik Cho, Xiaofeng Wang, Yifan Xu, Jiayang Sun

arXiv_CV

arXiv_CV Segmentation
Abstract

Large sequences of images (or movies) can now be obtained on an unprecedented scale, which poses fundamental challenges to the existing image analysis techniques. The challenges include heterogeneity, (automatic) alignment, multiple comparisons, potential artifacts, and hidden noise. This paper introduces our MATLAB package, Longitudinal Image Sequence Analysis (LISA), as a one-stop ensemble of image processing and analysis tools for comparing a general class of images from different times, sessions, or subjects. Given two contrasting sequences of images, the image processing in LISA starts with selecting a region of interest in two representative images, followed by automatic or manual segmentation and registration. Automatic segmentation de-noises an image using a mixture of Gaussian distributions of the pixel intensity values, while manual segmentation applies a user-chosen intensity cut-off value to filter out noise. Automatic registration aligns the contrasting images based on a mid-line regression, whereas manual registration lines up the images along a reference line formed by two user-selected points. The processed images are then rendered for simultaneous statistical comparisons to generate D-, S-, T-, and P-maps. The D-map represents a curated difference of contrasting images, the S-map is the non-parametrically smoothed differences, the T-map presents the variance-adjusted, smoothed differences, and the P-map provides multiplicity-controlled p-values. These maps reveal the regions with significant differences due to longitudinal, subject-specific, or treatment changes. A user can skip the image processing step and go directly to the statistical analysis step if the images have already been processed. Hence, LISA offers flexibility in applying other image pre-processing tools. LISA also has a parallel computing option for high-definition images.

URL

http://arxiv.org/abs/1902.06131

PDF

http://arxiv.org/pdf/1902.06131
Atlas-based automated detection of swim bladder in Medaka embryo

2019-02-16

Diane Genest (LIGM), Marc Léonard, Jean Cousty (LIGM), Noémie De Crozé (RCO), Hugues Talbot (LIGM)

arXiv_CV

arXiv_CV Segmentation Detection
Abstract

Fish embryo models are increasingly being used for the assessment of both chemical efficacy and potential toxicity. This article proposes a methodology to automatically detect the swim bladder on 2D images of Medaka fish embryos seen either in dorsal view or in lateral view. After embryo segmentation and for each studied orientation, the method builds an atlas of a healthy embryo. This atlas is then used to define the region of interest and to guide the swim bladder segmentation with a discrete globally optimal active contour. Descriptors are subsequently designed from this segmentation. An automated random forest classifier is built from these descriptors in order to classify embryos with and without a swim bladder. The proposed method is assessed on a dataset of 261 images, containing 202 embryos with a swim bladder (where 196 are in dorsal view and 6 are in lateral view) and 59 without (where 43 are in dorsal view and 16 are in lateral view). We obtain an average precision rate of 95% on the total dataset following 5-fold cross-validation.

URL

http://arxiv.org/abs/1902.06130

PDF

http://arxiv.org/pdf/1902.06130
Timeline-based planning: Expressiveness and Complexity

2019-02-16

Nicola Gigante

arXiv_AI

arXiv_AI
Abstract

Timeline-based planning is an approach originally developed in the context of space mission planning and scheduling, where problem domains are modelled as systems made of a number of independent but interacting components, whose behaviour over time, the timelines, is governed by a set of temporal constraints. This approach is different from the action-based perspective of common PDDL-like planning languages. Timeline-based systems have been successfully deployed in a number of space missions and other domains. However, despite this practical success, a thorough theoretical understanding of the paradigm was missing. This thesis fills this gap, providing the first detailed account of formal and computational properties of the timeline-based approach to planning. In particular, we show that a particularly restricted variant of the formalism is already expressive enough to compactly capture action-based temporal planning problems. Then, finding a solution plan for a timeline-based planning problem is proved to be EXPSPACE-complete. Next, we study the problem of timeline-based planning with uncertainty, which includes external components whose behaviour is not under the control of the planned system. We identify a few issues in the state-of-the-art approach based on flexible plans, proposing timeline-based games, a more general game-theoretic formulation of the problem that addresses those issues. We show that winning strategies for such games can be found in doubly-exponential time. Finally, we study the expressiveness of the formalism from a logic point of view, showing that (most of) timeline-based planning problems can be captured by Bounded TPTL with Past, a fragment of TPTL+P that, unlike the latter, keeps an EXPSPACE satisfiability problem. The logic is introduced and its satisfiability problem is solved by extending a recent one-pass tree-shaped tableau method for LTL.

URL

http://arxiv.org/abs/1902.06123

PDF

http://arxiv.org/pdf/1902.06123
How Machine Learning Helps Us Understand Human Learning: the Value of Big Ideas

2019-02-16

Marc Maliar

arXiv_AI

arXiv_AI Regularization Recognition
Abstract

I use simulations of two multilayer neural networks to gain intuition into the determinants of human learning. The first network, the teacher, is trained to achieve a high accuracy in handwritten digit recognition. The second network, the student, learns to reproduce the output of the first network. I show that learning from the teacher is more effective than learning from the data under the appropriate degree of regularization. Regularization allows the teacher to distinguish the trends and to deliver “big ideas” to the student. I also model other learning situations such as expert and novice teachers, high- and low-ability students, and biased learning experience due to, e.g., poverty and trauma. The results from computer simulation accord remarkably well with the findings of the modern psychological literature. The code is written in MATLAB and will be publicly available from the author’s web page.

URL

http://arxiv.org/abs/1903.03408

PDF

http://arxiv.org/pdf/1903.03408
A Baseline for Multi-Label Image Classification Using An Ensemble of Deep Convolutional Neural Networks

2019-02-16

Qian Wang, Ning Jia, Toby P. Breckon

arXiv_CV

arXiv_CV Attention CNN Image_Classification Classification
Abstract

Recent studies on multi-label image classification have focused on designing more complex architectures of deep neural networks such as the use of attention mechanisms and region proposal networks. Although performance gains have been reported, the backbone deep models of the proposed approaches and the evaluation metrics employed in different works vary, making it difficult to compare them fairly. Moreover, due to the lack of properly investigated baselines, the advantages introduced by the proposed techniques are often ambiguous. To address these issues, we make a thorough investigation of the mainstream deep convolutional neural network architectures for multi-label image classification and present a strong baseline. With the use of proper data augmentation techniques and model ensembles, the basic deep architectures can achieve better performance than many existing more complex ones on three benchmark datasets, providing great insight for future studies on multi-label image classification.

URL

http://arxiv.org/abs/1811.08412

PDF

http://arxiv.org/pdf/1811.08412
Reinforcement Learning of Active Vision for Manipulating Objects under Occlusions

2019-02-16

Ricson Cheng, Arpit Agarwal, Katerina Fragkiadaki

arXiv_CV

arXiv_CV Attention Reinforcement_Learning
Abstract

We consider artificial agents that learn to jointly control their gripper and camera in order to reinforcement learn manipulation policies in the presence of occlusions from distractor objects. Distractors often occlude the object of interest and cause it to disappear from the field of view. We propose hand/eye controllers that learn to move the camera to keep the object within the field of view and visible, in coordination with manipulating it to achieve the desired goal, e.g., pushing it to a target location. We incorporate structural biases of object-centric attention within our actor-critic architectures, which our experiments suggest to be a key for good performance. Our results further highlight the importance of curriculum with regards to environment difficulty. The resulting active vision / manipulation policies outperform static camera setups for a variety of cluttered environments.

URL

http://arxiv.org/abs/1811.08067

PDF

http://arxiv.org/pdf/1811.08067
Semi-supervised Learning on Graph with an Alternating Diffusion Process

2019-02-16

Qilin Li, Senjian An, Ling Li, Wanquan Liu

arXiv_CV

arXiv_CV Inference Relation
Abstract

Graph-based semi-supervised learning usually involves two separate stages: constructing an affinity graph and then propagating labels for transductive inference on the graph. It is suboptimal to solve them independently, as the correlation between the affinity graph and the labels is not fully exploited. In this paper, we integrate the two stages into one unified framework by formulating the graph construction as a regularized function estimation problem similar to label propagation. We propose an alternating diffusion process to solve the two problems simultaneously, which allows us to learn the graph and unknown labels in an iterative fashion. With the proposed framework, we are able to adequately leverage both the given labels and estimated labels to construct a better graph, and effectively propagate labels on such a dynamic graph updated simultaneously with the newly obtained labels. Extensive experiments on various real-world datasets have demonstrated the superiority of the proposed method compared to other state-of-the-art methods.
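For orientation, the second stage mentioned in this abstract, propagating labels on a fixed affinity graph, can be sketched in plain Python. This is the classic iterative label-propagation update on a symmetrically normalised graph, swapped in for illustration; it is not the alternating diffusion process proposed in the paper, which also updates the graph. The toy graph, the function name `propagate_labels`, and the values of `alpha` and `iterations` are assumptions.

```python
import math


def propagate_labels(W, Y, alpha=0.9, iterations=200):
    """Iterate F <- alpha * S F + (1 - alpha) * Y; return the argmax class per node."""
    n, classes = len(W), len(Y[0])
    degree = [sum(row) for row in W]  # assumes every node has at least one edge
    # Symmetrically normalised affinity: S = D^(-1/2) W D^(-1/2).
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
         for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iterations):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n))
              + (1 - alpha) * Y[i][c]
              for c in range(classes)]
             for i in range(n)]
    return [max(range(classes), key=lambda c: F[i][c]) for i in range(n)]


# Hypothetical toy graph: two triangles (nodes 0-2 and 3-5) joined by a weak edge.
W = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
# One labelled node per triangle; all-zero rows are unlabelled.
Y = [[1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 1]]
print(propagate_labels(W, Y))  # [0, 0, 0, 1, 1, 1]
```

Each unlabelled node ends up with the label of the triangle it belongs to, because very little label mass crosses the weak edge.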

URL

http://arxiv.org/abs/1902.06105

PDF

http://arxiv.org/pdf/1902.06105
Study of dynamical system based obstacle avoidance via manipulating orthogonal coordinates

2019-02-16

Weiya Ren

arXiv_RO

arXiv_RO
Abstract

In this paper, we consider the general problem of obstacle avoidance based on dynamical systems. The modulation matrix is developed by introducing orthogonal coordinates, which makes the modulation matrix more reasonable. The new trajectory’s direction can be represented by a linear combination of orthogonal coordinates. An approach for manipulating the orthogonal coordinates is proposed, which introduces a rotation matrix to solve the local minimum problem and provide more reasonable motions in 3-D or higher-dimensional space. The proposed method also provides a solution for patrolling around a convex shape. Experimental results on several designed dynamical systems demonstrate the effectiveness of the proposed approach.

URL

https://arxiv.org/abs/1902.05343

PDF

https://arxiv.org/pdf/1902.05343
Exploring Language Similarities with Dimensionality Reduction Technique

2019-02-16

Sangarshanan Veeraraghavan

arXiv_CL

arXiv_CL
Abstract

In recent years, several novel models have been developed to process natural language, and the development of accurate language translation systems has helped us overcome geographical barriers and communicate ideas effectively. These models are developed mostly for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic and semantic similarities with several other languages, and knowing this can help us leverage existing models to build more specific and accurate models for other languages. Here I explore the idea of representing several popular languages in a lower dimension such that their similarities can be visualized using simple two-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language.

URL

http://arxiv.org/abs/1902.06092

PDF

http://arxiv.org/pdf/1902.06092
DC-Al GAN: Pseudoprogression and True Tumor Progression of Glioblastoma multiform Image Classification Based On DCGAN and Alexnet

2019-02-16

Meiyu Li

arXiv_CV

arXiv_CV Adversarial GAN CNN Image_Classification Classification Relation
Abstract

Glioblastoma multiform (GBM) is a kind of head tumor with an extraordinarily complex treatment process. The survival period is typically 14-16 months, and the 2-year survival rate is approximately 26%-33%. The clinical treatment strategies for the pseudoprogression (PsP) and true tumor progression (TTP) of GBM are different, so accurately distinguishing these two conditions is particularly significant. As PsP and TTP of GBM are similar in shape and other characteristics, it is hard to distinguish these two forms with precision. In order to differentiate them accurately, this paper introduces a feature learning method based on a generative adversarial network: DC-Al GAN. A GAN consists of two architectures: a generator and a discriminator. Alexnet is used as the discriminator in this work. Owing to the adversarial and competitive relationship between generator and discriminator, the latter extracts highly concise features during training. In DC-Al GAN, features are extracted from Alexnet in the final classification phase, and their highly concise nature contributes positively to the classification accuracy. The generator in DC-Al GAN is modified from the deep convolutional generative adversarial network (DCGAN) by adding three convolutional layers. This effectively generates higher-resolution sample images. Feature fusion is used to combine high-layer features with low-layer features, allowing for the creation and use of more precise features for classification. The experimental results confirm that DC-Al GAN achieves high accuracy on GBM datasets for PsP and TTP image classification, which is superior to other state-of-the-art methods.

URL

http://arxiv.org/abs/1902.06085

PDF

http://arxiv.org/pdf/1902.06085
Local Fourier Slice Photography

2019-02-16

Christian Lessig

arXiv_CV

arXiv_CV
Abstract

Light field cameras provide intriguing possibilities, such as post-capture refocus or the ability to look behind an object. This comes, however, at the price of significant storage requirements. Compression techniques can be used to reduce these, but refocusing and reconstruction have so far still required a dense representation. To avoid this, we introduce a sheared local Fourier slice equation that allows for refocusing directly from a compressed light field, either to obtain an image or a compressed representation of it. The result is made possible by wavelets that respect the “slicing’s” intrinsic structure and enable us to derive exact reconstruction filters for the refocused image in closed form. Image reconstruction then amounts to applying these filters to the light field’s wavelet coefficients, and hence no decompression is necessary. We demonstrate that this substantially reduces storage requirements and also computation times. We furthermore analyze the computational complexity of our algorithm and show that it scales linearly with the size of the reconstructed region and the number of non-negligible wavelet coefficients, i.e. with the visual complexity.

URL

http://arxiv.org/abs/1902.06082

PDF

http://arxiv.org/pdf/1902.06082
Re-determinizing Information Set Monte Carlo Tree Search in Hanabi

2019-02-16

James Goodman

arXiv_AI

arXiv_AI
Abstract

This technical report documents the winner of the Computational Intelligence in Games (CIG) 2018 Hanabi competition. We introduce Re-determinizing IS-MCTS, a novel extension of Information Set Monte Carlo Tree Search (IS-MCTS) that prevents a leakage of hidden information into opponent models that can occur in IS-MCTS, and is particularly severe in Hanabi. Re-determinizing IS-MCTS scores higher in Hanabi for 2-4 players than previously published work. Given the 40 ms competition time limit per move, we use a learned evaluation function to estimate leaf node values and avoid full simulations during MCTS. For the Mixed track competition, in which the identity of the other players is unknown, a simple Bayesian opponent model is used that is updated as each game proceeds.

URL

http://arxiv.org/abs/1902.06075

PDF

http://arxiv.org/pdf/1902.06075
Deep Learning for Image Super-resolution: A Survey

2019-02-16

Zhihao Wang, Jian Chen, Steven C.H. Hoi

arXiv_CV

arXiv_CV Super_Resolution Survey Deep_Learning
Abstract

Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. In this survey, we aim to review recent advances of image super-resolution techniques using deep learning approaches in a systematic way. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community.

URL

http://arxiv.org/abs/1902.06068

PDF

http://arxiv.org/pdf/1902.06068
RES-SE-NET: Boosting Performance of Resnets by Enhancing Bridge-connections

2019-02-16

Varshaneya V, Balasubramanian S, Darshan Gera

arXiv_CV

arXiv_CV
Abstract

One of the ways to train deep neural networks effectively is to use residual connections. Residual connections can be classified as being either identity connections or bridge-connections with a reshaping convolution. Empirical observations on the CIFAR-10 and CIFAR-100 datasets using a baseline Resnet model, with bridge-connections removed, have shown a significant reduction in accuracy. This reduction is due to the lack of contribution, in the form of feature maps, by the bridge-connections. Hence bridge-connections are vital for Resnet. However, all feature maps in the bridge-connections are considered to be equally important. In this work, an upgraded architecture “Res-SE-Net” is proposed to further strengthen the contribution from the bridge-connections by quantifying the importance of each feature map and weighting them accordingly using a Squeeze-and-Excitation (SE) block. It is demonstrated that Res-SE-Net generalizes much better than Resnet and SE-Resnet on the benchmark CIFAR-10 and CIFAR-100 datasets.
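The idea of weighting feature maps by importance can be illustrated with a minimal, framework-free sketch of a generic Squeeze-and-Excitation step: global average pooling per channel, a small bottleneck with ReLU, a sigmoid gate, and channel-wise rescaling. This is only a sketch under stated assumptions: the weights `w1` and `w2` are hand-picked rather than learned, biases are omitted, and how the block is attached to the bridge-connections in Res-SE-Net is not shown.

```python
import math


def se_block(feature_maps, w1, w2):
    """Reweight each channel of `feature_maps` (a list of 2-D lists) by a gate in (0, 1)."""
    # Squeeze: global average pooling gives one number per channel.
    z = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[g * v for v in row] for row in fm]
              for fm, g in zip(feature_maps, gates)]
    return scaled, gates


# Hypothetical input: two 2x2 feature maps and hand-picked weights (biases omitted).
maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
w1 = [[0.5, -0.5]]      # 2 channels -> 1 hidden unit
w2 = [[2.0], [-2.0]]    # 1 hidden unit -> 2 gates
scaled, gates = se_block(maps, w1, w2)
print(gates)
```

In a trained network the gates would be learned, so channels that help the task receive values closer to 1 and less useful ones are suppressed.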

URL

http://arxiv.org/abs/1902.06066

PDF

http://arxiv.org/pdf/1902.06066
Skin Lesion Segmentation and Classification with Deep Learning System

2019-02-16

Devansh Bisla, Anna Choromanska, Jennifer A. Stein, David Polsky, Russell Berman

arXiv_CV

arXiv_CV Segmentation Classification Deep_Learning Detection
Abstract

Melanoma is one of the ten most common cancers in the US. Early detection is crucial for survival, but often the cancer is diagnosed in the fatal stage. Deep learning has the potential to improve cancer detection rates, but its applicability to melanoma detection is compromised by the limitations of the available skin lesion databases, which are small, heavily imbalanced, and contain images with occlusions. We propose a complete deep learning system for lesion segmentation and classification that utilizes networks specialized in data purification and augmentation. It contains the processing unit for removing image occlusions and the data generation unit for populating scarce lesion classes, or equivalently creating virtual patients with pre-defined types of lesions. We empirically verify our approach and show superior performance over common baselines.

URL

http://arxiv.org/abs/1902.06061

PDF

http://arxiv.org/pdf/1902.06061
Read All
Min-Entropy Latent Model for Weakly Supervised Object Detection

2019-02-16

Fang Wan, Pengxu Wei, Zhenjun Han, Jianbin Jiao, Qixiang Ye

arXiv_CV

arXiv_CV Object_Detection Weakly_Supervised Image_Classification Optimization Classification Detection
Abstract

Weakly supervised object detection is a challenging task when provided with image category supervision but required to learn, at the same time, object locations and object detectors. The inconsistency between the weak supervision and learning objectives introduces significant randomness to object locations and ambiguity to detectors. In this paper, a min-entropy latent model (MELM) is proposed for weakly supervised object detection. Min-entropy serves as a model to learn object locations and a metric to measure the randomness of object localization during learning. It aims to principally reduce the variance of learned instances and alleviate the ambiguity of detectors. MELM is decomposed into three components including proposal clique partition, object clique discovery, and object localization. MELM is optimized with a recurrent learning algorithm, which leverages continuation optimization to solve the challenging non-convexity problem. Experiments demonstrate that MELM significantly improves the performance of weakly supervised object detection, weakly supervised object localization, and image classification, against the state-of-the-art approaches.
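
The idea of entropy as a measure of localization randomness can be illustrated with plain Shannon entropy over normalized proposal scores. This is a simplified stand-in, not the paper's MELM formulation, and the score lists are invented for illustration.

```python
import math

def localization_entropy(scores):
    """Shannon entropy (in nats) of proposal scores normalized to a distribution."""
    total = sum(scores)
    probs = [s / total for s in scores]
    return -sum(p * math.log(p) for p in probs if p > 0)

# Invented scores for four object proposals in one image.
spread_out = [0.25, 0.25, 0.25, 0.25]    # the detector cannot decide where the object is
concentrated = [0.94, 0.02, 0.02, 0.02]  # the detector commits to one location

high = localization_entropy(spread_out)
low = localization_entropy(concentrated)
```

A learner that minimizes this quantity is pushed towards confident, low-variance localizations.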

URL

http://arxiv.org/abs/1902.06057

PDF

http://arxiv.org/pdf/1902.06057
Read All
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media

2019-02-16

Khuong Vo, Tri Nguyen, Dang Pham, Mao Nguyen, Minh Truong, Trung Mai, Tho Quan

arXiv_CL

arXiv_CL Sentiment Knowledge Embedding CNN Transfer_Learning Deep_Learning
Abstract

Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. As social media channels (e.g. social networks or forums) have become significant sources for brands to observe user opinions about their products, this task is increasingly crucial. However, real data obtained from social media contain a high volume of short and informal messages posted by users on those channels. Such data are difficult for existing methods to handle, especially those based on deep learning. In this paper, we propose an approach to handle this problem. This work extends our previous work, in which we proposed to combine the typical deep learning technique of Convolutional Neural Networks with domain knowledge. The combination is used to obtain additional training data through augmentation and a more reasonable loss function. In this work, we further improve our architecture with several substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, the combination of word-level embeddings and character-level embeddings, and a multitask learning technique for attaching domain knowledge rules in the learning process. Those enhancements, specifically aiming to handle short and informal messages, yield significant performance improvements in experiments on real datasets.

URL

http://arxiv.org/abs/1902.06050

PDF

http://arxiv.org/pdf/1902.06050
Read All
Forecasting the 2017-2018 Yemen Cholera Outbreak with Machine Learning

2019-02-16

Rohil Badkundri, Victor Valbuena, Srikusmanjali Pinnamareddy, Brittney Cantrell, Janet Standeven

arXiv_AI

arXiv_AI Relation
Abstract

The ongoing Yemen cholera outbreak has been deemed one of the worst cholera outbreaks in history, with over a million people impacted and thousands dead. Triggered by a civil war, the outbreak has been shaped by various political, environmental, and epidemiological factors and continues to worsen. While cholera has several effective treatments, the untimely and inefficient distribution of existing medicines has been the primary cause of cholera mortality. With the hope of facilitating resource allocation, various mathematical models have been created to track the Yemeni outbreak and identify at-risk administrative divisions, called governorates. Existing models are not powerful enough to accurately and consistently forecast cholera cases per governorate over multiple timeframes. To address the need for a complex, reliable model, we offer the Cholera Artificial Learning Model (CALM), a system of 4 extreme-gradient-boosting (XGBoost) machine learning models that forecast the number of new cholera cases a Yemeni governorate will experience over horizons ranging from 2 weeks to 2 months. CALM provides a novel machine learning approach that makes use of rainfall data, past cholera cases and deaths data, civil war fatalities, and inter-governorate interactions represented across multiple time frames. Additionally, the use of machine learning, along with extensive feature engineering, allows CALM to easily learn complex non-linear relations apparent in an epidemiological phenomenon. CALM is able to forecast cholera incidence 2 weeks to 2 months in advance within a margin of just 5 cholera cases per 10,000 people in real-world simulation.
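
One way to represent a signal "across multiple time frames" is to build lagged features for a fixed forecast horizon. The sketch below assumes that reading; the lag choices, the horizon, and the weekly counts are all hypothetical and not taken from CALM. The resulting pairs could be passed to any regressor, such as a gradient-boosted tree model.

```python
def make_examples(weekly_cases, lags=(0, 1, 3), horizon=2):
    """Build (features, target) pairs: past values at the given lags predict
    the value `horizon` weeks ahead."""
    examples = []
    for t in range(max(lags), len(weekly_cases) - horizon):
        features = [weekly_cases[t - lag] for lag in lags]
        target = weekly_cases[t + horizon]
        examples.append((features, target))
    return examples

# Invented weekly case counts for one governorate.
weekly_cases = [10, 12, 15, 20, 26, 30, 33, 31]
examples = make_examples(weekly_cases)
```

Training one model per horizon, as the abstract's four models suggest, would amount to calling this with different `horizon` values.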

URL

http://arxiv.org/abs/1902.06739

PDF

http://arxiv.org/pdf/1902.06739
Read All
$\mathcal{R}^2$-CNN: Fast Tiny Object Detection in Large-scale Remote Sensing Images

2019-02-16

Jiangmiao Pang, Cong Li, Jianping Shi, Zhihai Xu, Huajun Feng

arXiv_CV

arXiv_CV Object_Detection Knowledge Attention CNN Detection
Abstract

Recently, convolutional neural networks have brought impressive improvements for object detection. However, detecting tiny objects in large-scale remote sensing images still remains challenging. Firstly, the extremely large input size makes existing object detection solutions too slow for practical use. Secondly, the massive and complex backgrounds cause serious false alarms. Moreover, the ultra tiny objects increase the difficulty of accurate detection. To tackle these problems, we propose a unified and self-reinforced network called $\mathcal{R}^2$-CNN: Remote sensing Region-based Convolutional Neural Network, composed of a backbone Tiny-Net, an intermediate global attention block, and a final classifier and detector. Tiny-Net is a lightweight residual structure which enables fast and powerful feature extraction from inputs. The global attention block is built upon Tiny-Net to inhibit false positives. The classifier is then used to predict the existence of targets in each patch, followed by the detector, which locates them accurately when present. The classifier and detector are mutually reinforced with end-to-end training, which further speeds up the process and avoids false alarms. The effectiveness of $\mathcal{R}^2$-CNN is validated on hundreds of GF-1 and GF-2 images, which are $18000 \times 18192$ pixels at 2.0m resolution and $27620 \times 29200$ pixels at 0.8m resolution, respectively. Specifically, we can process a GF-1 image in 29.4s on a Titan X with just a single thread. To our knowledge, no previous solution can detect tiny objects on such huge remote sensing images gracefully. We believe that it is a significant step towards practical real-time remote sensing systems.
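
The per-patch classification step implies that a huge image is first cut into patches. A minimal sketch of such tiling follows. The 640-pixel patch size and 512-pixel stride are assumptions chosen for illustration; the abstract does not state the values used by the authors.

```python
def patch_origins(length, patch, stride):
    """Start offsets of patches covering [0, length); the last patch is clamped to the edge."""
    if length <= patch:
        return [0]
    origins = list(range(0, length - patch + 1, stride))
    if origins[-1] != length - patch:
        origins.append(length - patch)  # make sure the far border is covered
    return origins

# GF-1 image size from the abstract; patch and stride are hypothetical.
PATCH, STRIDE = 640, 512
xs = patch_origins(18000, PATCH, STRIDE)
ys = patch_origins(18192, PATCH, STRIDE)
num_patches = len(xs) * len(ys)
```

Even under these assumptions a single image yields over a thousand patches, which is why a cheap classifier that rejects empty patches before detection matters for speed.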

URL

http://arxiv.org/abs/1902.06042

PDF

http://arxiv.org/pdf/1902.06042
Read All
TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts

2019-02-16

Michihiro Yasunaga, John Lafferty

arXiv_CL

arXiv_CL Inference RNN Relation
Abstract

Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. Experimental results show that this joint model significantly outperforms existing topic models and equation models for scientific texts. Moreover, we qualitatively show that the model effectively captures the relationship between topics and mathematics, enabling novel applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words.

URL

http://arxiv.org/abs/1902.06034

PDF

http://arxiv.org/pdf/1902.06034
Read All
Learning Quickly to Plan Quickly Using Modular Meta-Learning

2019-02-16

Rohan Chitnis, Leslie Pack Kaelbling, Tomás Lozano-Pérez

arXiv_AI

arXiv_AI
Abstract

Multi-object manipulation problems in continuous state and action spaces can be solved by planners that search over sampled values for the continuous parameters of operators. The efficiency of these planners depends critically on the effectiveness of the samplers used, but effective sampling in turn depends on details of the robot, environment, and task. Our strategy is to learn functions called “specializers” that generate values for continuous operator parameters, given a state description and values for the discrete parameters. Rather than trying to learn a single specializer for each operator from large amounts of data on a single task, we take a modular meta-learning approach. We train on multiple tasks and learn a variety of specializers that, on a new task, can be quickly adapted using relatively little data – thus, our system “learns quickly to plan quickly” using these specializers. We validate our approach experimentally in simulated 3D pick-and-place tasks with continuous state and action spaces. Visit this http URL for a supplementary video.

URL

http://arxiv.org/abs/1809.07878

PDF

http://arxiv.org/pdf/1809.07878
Read All
CruzAffect at AffCon 2019 Shared Task: A feature-rich approach to characterize happiness

2019-02-16

Jiaqi Wu, Ryan Compton, Geetanjali Rakshit, Marilyn Walker, Pranav Anand, Steve Whittaker

arXiv_CL

arXiv_CL Sentiment CNN Classification Deep_Learning Prediction
Abstract

We present our system, CruzAffect, for the CL-Aff Shared Task 2019. CruzAffect consists of several types of robust and efficient models for affective classification tasks. We utilize both traditional classifiers, such as XGBoosted Forest, as well as a deep learning Convolutional Neural Networks (CNN) classifier. We explore rich feature sets such as syntactic features, emotional features, and profile features, and utilize several sentiment lexicons, to discover essential indicators of social involvement and control that a subject might exercise in their happy moments, as described in textual snippets from the HappyDB database. The data comes with a labeled set (10K), and a larger unlabeled set (70K). We therefore use supervised methods on the 10K dataset, and a bootstrapped semi-supervised approach for the 70K. We evaluate these models for binary classification of agency and social labels (Task 1), as well as multi-class prediction for concepts labels (Task 2). We obtain promising results on the held-out data, suggesting that the proposed feature sets effectively represent the data for affective classification tasks. We also build concepts models that discover general themes recurring in happy moments. Our results indicate that generic characteristics are shared between the classes of agency, social and concepts, suggesting it should be possible to build general models for affective classification tasks.

URL

http://arxiv.org/abs/1902.06024

PDF

http://arxiv.org/pdf/1902.06024
Read All
A Fully Differentiable Beam Search Decoder

2019-02-16

Ronan Collobert, Awni Hannun, Gabriel Synnaeve

arXiv_CL

arXiv_CL Attention Speech_Recognition Inference Language_Model Recognition
Abstract

We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms are powerful enough to successfully train an acoustic model from the final transcription, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained language model.

URL

http://arxiv.org/abs/1902.06022

PDF

http://arxiv.org/pdf/1902.06022
Read All
Letter-Based Speech Recognition with Gated ConvNets

2019-02-16

Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

arXiv_AI

arXiv_AI Speech_Recognition Inference Recognition
Abstract

In the recent literature, “end-to-end” speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC). In contrast to traditional phone (or senone)-based approaches, these “end-to-end” approaches alleviate the need for word pronunciation modeling, and do not require a “forced alignment” step at training time. Phone-based approaches, however, remain state of the art on classical benchmarks. In this paper, we propose a letter-based speech recognition system, leveraging a ConvNet acoustic model. Key ingredients of the ConvNet are Gated Linear Units and high dropout. The ConvNet is trained to map audio sequences to their corresponding letter transcriptions, either via a classical CTC approach, or via a recent variant called ASG. Coupled with a simple decoder at inference time, our system matches the best existing letter-based systems on WSJ (in word error rate), and shows near state of the art performance on LibriSpeech.
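
A Gated Linear Unit multiplies one half of its input by the sigmoid of the other half. The sketch below shows only that gating operation on a flat vector; the convolutions that would produce the two halves in the actual acoustic model are left out.

```python
import math

def glu(values):
    """Gated Linear Unit: split the input in half and gate the first half
    with the sigmoid of the second half."""
    half = len(values) // 2
    content, gate = values[:half], values[half:]
    return [x / (1.0 + math.exp(-g)) for x, g in zip(content, gate)]

half_open = glu([2.0, -1.0, 0.0, 0.0])      # gate inputs of 0 pass half of each value
mostly_open = glu([2.0, -1.0, 6.0, 6.0])    # large gate inputs pass almost everything
```

Because the gate is learned, the network can decide per position how much of each feature to let through.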

URL

http://arxiv.org/abs/1712.09444

PDF

http://arxiv.org/pdf/1712.09444
Read All
Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles

2019-02-16

Thiago Freitas dos Santos, Paulo E. Santos, Leonardo A. Ferreira, Reinaldo A. C. Bianchi, Pedro Cabalar

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

Spatial puzzles composed of rigid objects, flexible strings and holes offer interesting domains for reasoning about spatial entities that are common in everyday human activities. The goal of this work is to investigate the automated solution of this kind of puzzle by adapting an algorithm that combines Answer Set Programming (ASP) with Markov Decision Process (MDP), algorithm oASP(MDP), to use heuristics accelerating the learning process. ASP is applied to represent the domain as an MDP, while a Reinforcement Learning algorithm (Q-Learning) is used to find the optimal policies. In this work, the heuristics were obtained from the solution of relaxed versions of the puzzles. Experiments were performed on deterministic, non-deterministic and non-stationary versions of the puzzles. Results show that the proposed approach can accelerate the learning process, presenting an advantage when compared to the non-heuristic versions of oASP(MDP) and Q-Learning.
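
A small sketch of how a heuristic can steer Q-Learning: the standard Q-Learning update is kept unchanged, and the heuristic only biases which action is chosen. The additive bias used here is an assumption about the general style of heuristically accelerated methods, not the exact rule of oASP(MDP); the state and action names are hypothetical.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Standard tabular Q-Learning update on a dict keyed by (state, action)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def choose_action(q, heuristic, state, actions, weight=1.0):
    """Greedy choice over Q-values plus a weighted heuristic bonus."""
    return max(
        actions,
        key=lambda a: q.get((state, a), 0.0) + weight * heuristic.get((state, a), 0.0),
    )

actions = ["pull", "twist"]
q = {}
q_update(q, "s0", "pull", reward=1.0, next_state="s1", actions=actions)

# Hypothetical heuristic, e.g. derived from a relaxed version of the puzzle.
heuristic = {("s0", "twist"): 1.0}
with_heuristic = choose_action(q, heuristic, "s0", actions, weight=1.0)
without_heuristic = choose_action(q, heuristic, "s0", actions, weight=0.0)
```

Since the heuristic affects exploration only, the learned Q-values are still updated by the ordinary rule.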

URL

http://arxiv.org/abs/1903.03411

PDF

http://arxiv.org/pdf/1903.03411
Read All
Realizing Continual Learning through Modeling a Learning System as a Fiber Bundle

2019-02-16

Zhenfeng Cao

arXiv_AI

arXiv_AI
Abstract

A human brain is capable of continual learning by nature; however, the current mainstream deep neural networks suffer from a phenomenon named catastrophic forgetting (i.e., learning a new set of patterns suddenly and completely would result in fully forgetting what has already been learned). In this paper we propose a generic learning model, which regards a learning system as a fiber bundle. By comparing the learning performance of our model with conventional ones whose neural networks are multilayer perceptrons through a variety of machine-learning experiments, we found our proposed model not only enjoys a distinguished capability of continual learning but also bears a high information capacity. In addition, we found in some learning scenarios the learning performance can be further enhanced by making the learning time-aware to mimic the episodic memory in the human brain. Last but not least, we found that the properties of forgetting in our model correspond well to those of human memory. This work may shed light on how a human brain learns.

URL

http://arxiv.org/abs/1903.03511

PDF

http://arxiv.org/pdf/1903.03511
Read All
ProLoNets: Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning

2019-02-15

Andrew Silva, Matthew Gombolay

arXiv_AI

arXiv_AI Knowledge Reinforcement_Learning Gradient_Descent
Abstract

Deep reinforcement learning has seen great success across a breadth of tasks such as in game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth of readily-available human domain experts’ knowledge that could help “warm start” the learning process. Further, learning from demonstration techniques are not yet sufficient to infer this knowledge through sampling-based mechanisms in large state and action spaces, or require immense amounts of data. We present a new reinforcement learning architecture that can encode expert knowledge, in the form of propositional logic, directly into a neural, tree-like structure of fuzzy propositions that are amenable to gradient descent. We show that our novel architecture is able to outperform reinforcement and imitation learning techniques across an array of canonical challenge problems for artificial intelligence.
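
To illustrate what a "fuzzy proposition amenable to gradient descent" might look like, the sketch below turns a hard rule into a sigmoid-smoothed comparison and mixes two leaves by its truth value. The sigmoid form, the rule, and all numbers are assumptions made for illustration and are not taken from the paper.

```python
import math

def fuzzy_node(features, weights, comparator, alpha=1.0):
    """Soft truth value in (0, 1) of the proposition 'weights . features > comparator'."""
    activation = sum(w * x for w, x in zip(weights, features)) - comparator
    return 1.0 / (1.0 + math.exp(-alpha * activation))

def soft_policy(features):
    # Hypothetical expert rule: "if feature 0 exceeds 0.5, prefer action 0, else action 1".
    p_true = fuzzy_node(features, weights=[1.0, 0.0], comparator=0.5, alpha=10.0)
    leaf_if_true = [0.9, 0.1]   # action preferences when the rule holds
    leaf_if_false = [0.1, 0.9]  # action preferences otherwise
    return [p_true * t + (1.0 - p_true) * f for t, f in zip(leaf_if_true, leaf_if_false)]

rule_holds = soft_policy([0.9, 0.0])
rule_fails = soft_policy([0.1, 0.0])
```

Because every step is differentiable, the weights, comparator and leaf values initialized from the expert rule could then be refined by gradient descent.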

URL

http://arxiv.org/abs/1902.06007

PDF

http://arxiv.org/pdf/1902.06007
Read All
Contextual Word Representations: A Contextual Introduction

2019-02-15

Noah A. Smith

arXiv_CL

arXiv_CL Embedding
Abstract

This introduction aims to tell the story of how we put words into computers. It is part of the story of the field of natural language processing (NLP), a branch of artificial intelligence. It targets a wide audience with a basic understanding of computer programming, but avoids a detailed mathematical treatment, and it does not present any algorithms. It also does not focus on any particular application of NLP such as translation, question answering, or information extraction. The ideas presented here were developed by many researchers over many decades, so the citations are not exhaustive but rather direct the reader to a handful of papers that are, in the author’s view, seminal. After reading this document, you should have a general understanding of word vectors (also known as word embeddings): why they exist, what problems they solve, where they come from, how they have changed over time, and what some of the open questions about them are. Readers already familiar with word vectors are advised to skip to Section 5 for the discussion of the most recent advance, contextual word vectors.

URL

http://arxiv.org/abs/1902.06006

PDF

http://arxiv.org/pdf/1902.06006
Read All
Improving Semantic Parsing for Task Oriented Dialog

2019-02-15

Arash Einolghozati, Panupong Pasupat, Sonal Gupta, Rushin Shah, Mrinal Mohit, Mike Lewis, Luke Zettlemoyer

arXiv_AI

arXiv_AI Embedding Language_Model
Abstract

Semantic parsing using hierarchical representations has recently been proposed for task oriented dialog with promising results [Gupta et al 2018]. In this paper, we present three different improvements to the model: contextualized embeddings, ensembling, and pairwise re-ranking based on a language model. We taxonomize the errors possible for the hierarchical representation, such as wrong top intent, missing spans or split spans, and show that the three approaches correct different kinds of errors. The best model combines the three techniques and gives 6.4% better exact match accuracy than the state-of-the-art, with an error reduction of 33%, resulting in a new state-of-the-art result on the Task Oriented Parsing (TOP) dataset.

URL

http://arxiv.org/abs/1902.06000

PDF

http://arxiv.org/pdf/1902.06000
Read All
Bootstrapping Deep Neural Networks from Approximate Image Processing Pipelines

2019-02-15

Kilho Son, Jesse Hostetler, Sek Chai

arXiv_CV

arXiv_CV Knowledge
Abstract

Complex image processing and computer vision systems often consist of a processing pipeline of functional modules. We intend to replace parts or all of a target pipeline with deep neural networks to achieve benefits such as increased accuracy or reduced computational requirement. To acquire a large amount of labeled data necessary to train the deep neural network, we propose a workflow that leverages the target pipeline to create a significantly larger labeled training set automatically, without prior domain knowledge of the target pipeline. We show experimentally that despite the noise introduced by automated labeling and only using a very small initially labeled data set, the trained deep neural networks can achieve similar or even better performance than the components they replace, while in some cases also reducing computational requirements.

URL

http://arxiv.org/abs/1811.12108

PDF

http://arxiv.org/pdf/1811.12108
Read All
On resampling vs. adjusting probabilistic graphical models in estimation of distribution algorithms

2019-02-15

Mohamed El Yafrani, Marcella S. R. Martins, Myriam R. B. S. Delgado, Inkyung Sung, Ricardo Lüders, Markus Wagner

arXiv_AI

arXiv_AI Sparse Relation
Abstract

The Bayesian Optimisation Algorithm (BOA) is an Estimation of Distribution Algorithm (EDA) that uses a Bayesian network as probabilistic graphical model (PGM). Determining the optimal Bayesian network structure given a solution sample is an NP-hard problem. This step should be completed at each iteration of BOA, resulting in a very time-consuming process. For this reason most implementations use greedy estimation algorithms such as K2. However, we show in this paper that significant changes in PGM structure do not occur so frequently, and can be particularly sparse at the end of evolution. A statistical study of BOA is thus presented to characterise a pattern of PGM adjustments that can be used as a guide to reduce the frequency of PGM updates during the evolutionary process. This is accomplished by proposing a new BOA-based optimisation approach (FBOA) whose PGM is not updated at each iteration. This new approach avoids the computational burden usually found in the standard BOA. The results compare the performances of both algorithms on an NK-landscape optimisation problem using the correlation between the ruggedness and the expected runtime over enumerated instances. The experiments show that FBOA presents competitive results while significantly saving computational time.

URL

http://arxiv.org/abs/1902.05946

PDF

http://arxiv.org/pdf/1902.05946
Read All
Operational Neural Networks

2019-02-15

Serkan Kiranyaz, Turker Ince, Alexandros Iosifidis, Moncef Gabbouj

arXiv_AI

arXiv_AI CNN
Abstract

Feed-forward, fully-connected Artificial Neural Networks (ANNs) or the so-called Multi-Layer Perceptrons (MLPs) are well-known universal approximators. However, their learning performance varies significantly depending on the function or the solution space that they attempt to approximate. This is mainly because of their homogeneous configuration based solely on the linear neuron model. Therefore, while they learn very well those problems with a monotonous, relatively simple and linearly separable solution space, they may entirely fail to do so when the solution space is highly nonlinear and complex. Since conventional Convolutional Neural Networks (CNNs) share the same linear neuron model with two additional constraints (local connections and weight sharing), this is also true for them, and it is therefore not surprising that in many challenging problems only the deep CNNs with a massive complexity and depth can achieve the required diversity and the learning performance. In order to address this drawback and also to accomplish a more generalized model over the convolutional neurons, this study proposes a novel network model, called Operational Neural Networks (ONNs), which can be heterogeneous and encapsulate neurons with any set of operators to boost diversity and to learn highly complex and multi-modal functions or spaces with minimal network complexity and training data. Finally, a novel training method is formulated to back-propagate the error through the operational layers of ONNs. Experimental results over highly challenging problems demonstrate the superior learning capabilities of ONNs even with few neurons and hidden layers.
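
The contrast between the linear neuron model and a neuron "with any set of operators" can be sketched by making the per-connection operator and the pooling operator pluggable. The split into nodal, pool and activation operators and the specific choices below are assumptions for illustration.

```python
import math
import statistics

def operational_neuron(inputs, weights, bias, nodal, pool, activation):
    """Apply a nodal operator per connection, pool the results, add the bias, then activate."""
    return activation(pool([nodal(w, y) for w, y in zip(weights, inputs)]) + bias)

inputs, weights, bias = [0.5, -1.0, 2.0], [1.0, 0.5, 0.25], 0.1

# Conventional linear neuron as a special case: multiply, sum, tanh.
linear = operational_neuron(inputs, weights, bias, lambda w, y: w * y, sum, math.tanh)

# One non-linear alternative: sinusoidal nodal operator with a median pool.
sinusoid = operational_neuron(
    inputs, weights, bias, lambda w, y: math.sin(w * y), statistics.median, math.tanh
)
```

A heterogeneous network would let different neurons or layers use different operator sets instead of fixing multiplication and summation everywhere.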

URL

http://arxiv.org/abs/1902.11106

PDF

http://arxiv.org/pdf/1902.11106
Read All
GANFIT: Generative Adversarial Network Fitting for High Fidelity 3D Face Reconstruction

2019-02-15

Baris Gecer, Stylianos Ploumpis, Irene Kotsia, Stefanos Zafeiriou

arXiv_CV

arXiv_CV Adversarial Knowledge GAN Face CNN Optimization Relation
Abstract

In the past few years, a lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the most recent works, differentiable renderers were employed in order to learn the relationship between the facial identity features and the parameters of a 3D morphable model for shape and texture. The texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction of the state-of-the-art methods is still not capable of modelling textures in high fidelity. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful generator of facial texture in UV space. Then, we revisit the original 3D Morphable Models (3DMMs) fitting approaches making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective. We optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. We demonstrate excellent results in photorealistic and identity preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.

URL

http://arxiv.org/abs/1902.05978

PDF

http://arxiv.org/pdf/1902.05978
Read All
The Capacity Constrained Facility Location problem

2019-02-15

Haris Aziz, Hau Chan, Barton E. Lee, David C. Parkes

arXiv_AI

arXiv_AI
Abstract

We initiate the study of the capacity constrained facility location problem from a mechanism design perspective. The capacity constrained setting leads to a new strategic environment where a facility serves a subset of the population, which is endogenously determined by the ex-post Nash equilibrium of an induced subgame and is not directly controlled by the mechanism designer. Our focus is on mechanisms that are ex-post dominant-strategy incentive compatible (DIC) at the reporting stage. We provide a complete characterization of DIC mechanisms via the family of Generalized Median Mechanisms (GMMs). In general, the social welfare optimal mechanism is not DIC. Adopting the worst-case approximation measure, we attain tight lower bounds on the approximation ratio of any DIC mechanism. The well-known median mechanism is shown to be optimal among the family of DIC mechanisms for certain capacity ranges. Surprisingly, the framework we introduce provides a new characterization for the family of GMMs, and is responsive to gaps in the current social choice literature highlighted by Border and Jordan (1983) and Barberà, Massó and Serizawa (1998).
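
For readers unfamiliar with the median mechanism, a minimal one-dimensional sketch follows. The optional fixed "phantom" points are one common way of writing generalized medians and are an assumption here; capacity constraints and the induced subgame studied in the paper are not modeled. All locations are made up.

```python
def median_location(reports, phantoms=()):
    """Place the facility at the (lower) median of reported locations plus fixed phantom points."""
    points = sorted(list(reports) + list(phantoms))
    return points[(len(points) - 1) // 2]

truthful = median_location([0.1, 0.4, 0.9])

# The agent at 0.1 cannot pull the facility closer by exaggerating to 0.0.
misreport = median_location([0.0, 0.4, 0.9])

# Phantom points shift the outcome independently of any agent's report.
with_phantoms = median_location([0.1, 0.4, 0.9], phantoms=[0.0, 0.0])
```

The unchanged outcome under the misreport is the intuition behind incentive compatibility of median-type rules.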

URL

http://arxiv.org/abs/1806.00960

PDF

http://arxiv.org/pdf/1806.00960
Read All
DeepFault: Fault Localization for Deep Neural Networks

2019-02-15

Hasan Ferit Eniser, Simos Gerasimou, Alper Sen

arXiv_CV

arXiv_CV Adversarial
Abstract

Deep Neural Networks (DNNs) are increasingly deployed in safety-critical applications including autonomous vehicles and medical diagnostics. To reduce the residual risk for unexpected DNN behaviour and provide evidence for their trustworthy operation, DNNs should be thoroughly tested. The DeepFault whitebox DNN testing approach presented in our paper addresses this challenge by employing suspiciousness measures inspired by fault localization to establish the hit spectrum of neurons and identify suspicious neurons whose weights have not been calibrated correctly and thus are considered responsible for inadequate DNN performance. DeepFault also uses a suspiciousness-guided algorithm to synthesize new inputs, from correctly classified inputs, that increase the activation values of suspicious neurons. Our empirical evaluation on several DNN instances trained on MNIST and CIFAR-10 datasets shows that DeepFault is effective in identifying suspicious neurons. Also, the inputs synthesized by DeepFault closely resemble the original inputs, exercise the identified suspicious neurons and are highly adversarial.
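
The hit spectrum and a suspiciousness score can be sketched as follows. Ochiai, a standard measure from spectrum-based fault localization, is used here as an assumption, since the abstract does not name the measures; the activation values, threshold and labels are invented.

```python
import math

def hit_spectrum(activations, correct, threshold=0.0):
    """Count, per neuron, how often it was active/inactive on passed/failed inputs.
    activations: one list of neuron outputs per input; correct: one bool per input."""
    spectrum = [{"af": 0, "ap": 0, "nf": 0, "np": 0} for _ in activations[0]]
    for outputs, ok in zip(activations, correct):
        for i, value in enumerate(outputs):
            key = ("a" if value > threshold else "n") + ("p" if ok else "f")
            spectrum[i][key] += 1
    return spectrum

def ochiai(counts):
    """Ochiai suspiciousness: high when a neuron is active mostly on failed inputs."""
    denom = math.sqrt((counts["af"] + counts["nf"]) * (counts["af"] + counts["ap"]))
    return counts["af"] / denom if denom else 0.0

# Invented activations of two neurons on four inputs; the first two inputs are misclassified.
activations = [[0.9, 0.0], [0.8, 0.3], [0.0, 0.7], [0.0, 0.5]]
correct = [False, False, True, True]
scores = [ochiai(c) for c in hit_spectrum(activations, correct)]
```

Here the first neuron fires only on misclassified inputs and therefore ranks as the most suspicious.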

URL

http://arxiv.org/abs/1902.05974

PDF

http://arxiv.org/pdf/1902.05974
Read All
Privacy of Existence of Secrets: Introducing Steganographic DCOPs and Revisiting DCOP Frameworks

2019-02-15

Viorel D. Silaghi, Marius C. Silaghi, René Mandiau

arXiv_AI

arXiv_AI GAN Optimization
Abstract

Here we identify a type of privacy concern in Distributed Constraint Optimization Problems (DCOPs) not previously addressed in the literature, despite its importance and impact on the application field: the privacy of existence of secrets. Science only starts where metrics and assumptions are clearly defined. The area of Distributed Constraint Optimization has emerged at the intersection of the multi-agent system community and constraint programming. For the multi-agent community, constraint optimization problems are an elegant way to express many of the problems occurring in trading and distributed robotics. For the theoretical constraint programming community, DCOPs are a natural extension of their main object of study, the constraint satisfaction problem. As such, the understanding of the DCOP framework has been refined with the needs of the two communities, but sometimes without spelling out the new assumptions formally, which makes it difficult to compare techniques. Here we give a direction to the efforts for structuring concepts in this area.

URL

http://arxiv.org/abs/1902.05943

PDF

http://arxiv.org/pdf/1902.05943
Learning to Understand Goal Specifications by Modelling Reward

2019-02-15

Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

arXiv_AI

arXiv_AI Relation
Abstract

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards. However, this places on environment designers the onus of designing language-conditional reward functions which may not be easily or tractably implemented as the complexity of the environment and the language scales. To overcome this limitation, we present a framework within which instruction-conditional RL agents are trained using rewards obtained not from the environment, but from reward models which are jointly trained from expert examples. As reward models improve, they learn to accurately reward agents for completing tasks for environment configurations—and for instructions—not present amongst the expert data. This framework effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, it enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new expert examples.

URL

http://arxiv.org/abs/1806.01946

PDF

http://arxiv.org/pdf/1806.01946
Robot Co-design: Beyond the Monotone Case

2019-02-15

Luca Carlone, Carlo Pinciroli

arXiv_RO

arXiv_RO Drone Optimization
Abstract

Recent advances in 3D printing and manufacturing of miniaturized robotic hardware and computing are paving the way to build inexpensive and disposable robots. This will have a large impact on several applications including scientific discovery (e.g., hurricane monitoring), search-and-rescue (e.g., operation in confined spaces), and entertainment (e.g., nano drones). The need for inexpensive and task-specific robots clashes with the current practice, where human experts are in charge of designing hardware and software aspects of the robotic platform. This makes the robot design process expensive and time-consuming, and ultimately unsuitable for small-volume, low-cost applications. This paper considers the computational robot co-design problem, which aims to create an automatic algorithm that selects the best robotic modules (sensing, actuation, computing) in order to maximize the performance on a task, while satisfying given specifications (e.g., maximum cost of the resulting design). We propose a binary optimization formulation of the co-design problem and show that this formulation generalizes previous work based on strong modeling assumptions. We show that the proposed formulation can solve relatively large co-design problems in seconds and with minimal human intervention. We demonstrate the proposed approach in two applications: the co-design of an autonomous drone racing platform and the co-design of a multi-robot system.
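
The module-selection idea can be sketched as a tiny binary choice problem solved by exhaustive search. This is not the paper's formulation or solver: the module catalogue, the additive performance model, and the budget below are all hypothetical, and brute force only works for very small catalogues.

```python
from itertools import product

# Hypothetical catalogue: category -> list of (name, performance, cost).
CATALOGUE = {
    "sensing": [("cam_basic", 2, 10), ("cam_hd", 5, 30)],
    "actuation": [("motor_small", 3, 20), ("motor_large", 6, 45)],
    "computing": [("mcu", 1, 5), ("gpu_board", 7, 50)],
}


def best_design(catalogue, budget):
    """Pick exactly one module per category, maximizing total performance
    subject to total cost <= budget (assumed additive performance)."""
    best = None
    for combo in product(*catalogue.values()):
        perf = sum(module[1] for module in combo)
        cost = sum(module[2] for module in combo)
        if cost <= budget and (best is None or perf > best[0]):
            best = (perf, cost, tuple(module[0] for module in combo))
    return best


design = best_design(CATALOGUE, budget=100)
print(design)
```

Each candidate design corresponds to one assignment of the binary "module selected" variables; a real solver would search that space far more efficiently than enumeration.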

URL

http://arxiv.org/abs/1902.05880

PDF

http://arxiv.org/pdf/1902.05880
Street Scene: A new dataset and evaluation protocol for video anomaly detection

2019-02-15

Barathkumar Ramachandra, Michael Jones

arXiv_CV

arXiv_CV Detection
Abstract

Progress in video anomaly detection research is currently slowed by small datasets that lack a wide variety of activities as well as flawed evaluation criteria. This paper aims to help move this research effort forward by introducing a large and varied new dataset called Street Scene, as well as two new evaluation criteria that provide a better estimate of how an algorithm will perform in practice. In addition to the new dataset and evaluation criteria, we present two variations of a novel baseline video anomaly detection algorithm and show they are much more accurate on Street Scene than two state-of-the-art algorithms from the literature.

URL

http://arxiv.org/abs/1902.05872

PDF

http://arxiv.org/pdf/1902.05872
Estimation of blood oxygenation with learned spectral decoloring for quantitative photoacoustic imaging

2019-02-15

Janek Gröhl, Thomas Kirchner, Tim Adler, Lena Maier-Hein

arXiv_CV

arXiv_CV Quantitative
Abstract

One of the main applications of photoacoustic (PA) imaging is the recovery of functional tissue properties, such as blood oxygenation (sO2). This is typically achieved by linear spectral unmixing of relevant chromophores from multispectral photoacoustic images. Despite the progress that has been made towards quantitative PA imaging (qPAI), most sO2 estimation methods yield poor results in realistic settings. In this work, we tackle the challenge by employing learned spectral decoloring for quantitative photoacoustic imaging (LSD-qPAI) to obtain quantitative estimates for blood oxygenation. LSD-qPAI computes sO2 directly from pixel-wise initial pressure spectra Sp0, which are vectors composed of the initial pressure at the same spatial location over all recorded wavelengths. Initial results suggest that LSD-qPAI is able to obtain accurate sO2 estimates directly from multispectral photoacoustic measurements in silico and plausible estimates in vivo.
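
For context, the conventional baseline named in the abstract, linear spectral unmixing, can be sketched as a two-chromophore least-squares fit followed by the ratio sO2 = HbO2 / (Hb + HbO2). This is not LSD-qPAI itself, and the absorption values below are made-up placeholders rather than real hemoglobin spectra.

```python
def unmix_two(spectrum, eps_hb, eps_hbo2):
    """Least-squares fit of spectrum ~ c_hb * eps_hb + c_hbo2 * eps_hbo2,
    solved through the 2x2 normal equations."""
    a11 = sum(x * x for x in eps_hb)
    a12 = sum(x * y for x, y in zip(eps_hb, eps_hbo2))
    a22 = sum(y * y for y in eps_hbo2)
    b1 = sum(x * s for x, s in zip(eps_hb, spectrum))
    b2 = sum(y * s for y, s in zip(eps_hbo2, spectrum))
    det = a11 * a22 - a12 * a12
    c_hb = (b1 * a22 - a12 * b2) / det
    c_hbo2 = (a11 * b2 - a12 * b1) / det
    return c_hb, c_hbo2


def estimate_so2(spectrum, eps_hb, eps_hbo2):
    """Oxygenation as the oxygenated fraction of total hemoglobin."""
    c_hb, c_hbo2 = unmix_two(spectrum, eps_hb, eps_hbo2)
    return c_hbo2 / (c_hb + c_hbo2)


# Placeholder absorption spectra at four wavelengths (illustrative only).
EPS_HB = [3.0, 2.0, 1.0, 0.8]
EPS_HBO2 = [1.0, 1.5, 2.0, 2.5]

# Synthetic pixel spectrum mixed from 30% Hb and 70% HbO2.
pixel = [0.3 * h + 0.7 * o for h, o in zip(EPS_HB, EPS_HBO2)]
so2 = estimate_so2(pixel, EPS_HB, EPS_HBO2)
print(so2)
```

On noise-free synthetic data the fit recovers the mixing fraction; the abstract's point is that this linear model gives poor results in realistic settings, which motivates the learned approach.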

URL

http://arxiv.org/abs/1902.05839

PDF

http://arxiv.org/pdf/1902.05839
Deeply Supervised Multimodal Attentional Translation Embeddings for Visual Relationship Detection

2019-02-15

Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Athanasia Zlatintsi, Petros Maragos

arXiv_CV

arXiv_CV Attention Embedding Quantitative Detection Relation
Abstract

Detecting visual relationships, i.e. <Subject, Predicate, Object> triplets, is a challenging Scene Understanding task approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply supervised two-branch architecture, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal attentional mechanism that exploits spatio-linguistic similarities in a low-dimensional space. We present a variety of experiments comparing against all related approaches in the literature, as well as by re-implementing and fine-tuning several of them. Results on the commonly employed VRD dataset [1] show that the proposed method clearly outperforms all others, while we also justify our claims both quantitatively and qualitatively.

URL

http://arxiv.org/abs/1902.05829

PDF

http://arxiv.org/pdf/1902.05829