This paper presents a novel framework to recover detailed human body shapes from a single image. This is a challenging task due to factors such as variations in human shapes, body poses, and viewpoints. Prior methods typically attempt to recover the human body shape using a parametric template that lacks surface details, so the resulting body shape appears to be without clothing. In this paper, we propose a novel learning-based framework that combines the robustness of a parametric model with the flexibility of free-form 3D deformation. We use deep neural networks to refine the 3D shape in a Hierarchical Mesh Deformation (HMD) framework, utilizing constraints from body joints, silhouettes, and per-pixel shading information. We are able to restore detailed human body shapes beyond skinned models. Experiments demonstrate that our method outperforms previous state-of-the-art approaches, achieving better accuracy in terms of both 2D IoU and 3D metric distance. The code is available at https://github.com/zhuhao-nju/hmd.git
http://arxiv.org/abs/1904.10506
Recent work has shown that speech paired with images can be used to learn semantically meaningful speech representations even without any textual supervision. In real-world low-resource settings, however, we often have access to some transcribed speech. We study whether and how visual grounding is useful in the presence of varying amounts of textual supervision. In particular, we consider the task of semantic speech retrieval in a low-resource setting. We use a previously studied data set and task, where models are trained on images with spoken captions and evaluated on human judgments of semantic relevance. We propose a multitask learning approach to leverage both visual and textual modalities, with visual supervision in the form of keyword probabilities from an external tagger. We find that visual grounding is helpful even in the presence of textual supervision, and we analyze this effect over a range of sizes of transcribed data sets. With ~5 hours of transcribed speech, we obtain 23% higher average precision when also using visual supervision.
http://arxiv.org/abs/1904.10947
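As an illustration of the multitask setup described above, here is a hedged sketch: a shared speech encoder with one head regressing the external tagger's keyword probabilities (visual supervision) and another predicting a bag-of-words target from the available transcriptions (textual supervision). All module names, sizes, and the loss weighting are our assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class MultitaskSpeechNet(nn.Module):
    def __init__(self, n_mel=40, hidden=256, vocab=1000):
        super().__init__()
        self.encoder = nn.GRU(n_mel, hidden, batch_first=True)  # shared speech encoder
        self.visual_head = nn.Linear(hidden, vocab)  # predicts tagger keyword probabilities
        self.text_head = nn.Linear(hidden, vocab)    # predicts transcript bag-of-words

    def forward(self, mel):                          # mel: (B, T, n_mel) spectrogram frames
        _, h = self.encoder(mel)                     # final state: (1, B, hidden)
        h = h.squeeze(0)
        return self.visual_head(h), self.text_head(h)

bce = nn.BCEWithLogitsLoss()

def multitask_loss(model, mel, tagger_probs, bow_labels, alpha=0.5):
    vis_logits, txt_logits = model(mel)
    # soft targets from the external visual tagger, hard bag-of-words targets from text
    return alpha * bce(vis_logits, tagger_probs) + (1 - alpha) * bce(txt_logits, bow_labels)
```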
This work studies the problem of shape reconstruction and object localization using a vision-based tactile sensor, GelSlim. The main contributions are the recovery of local shapes from contact, an approach to reconstruct the tactile shape of objects from tactile imprints, and an accurate method for localizing previously reconstructed objects. The algorithms can be applied to a large variety of 3D objects and provide accurate tactile feedback for in-hand manipulation. Results show that by exploiting the dense tactile information we can reconstruct the shape of objects with high accuracy and perform online object identification and localization, opening the door to reactive manipulation guided by tactile sensing. We provide videos and supplemental information on the project's website: this http URL
http://arxiv.org/abs/1904.10944
In this paper, we propose a novel iterative convolution-thresholding method (ICTM) that is applicable to a range of variational models for image segmentation. A variational model usually minimizes an energy functional consisting of a fidelity term and a regularization term. In the ICTM, the interface between two different segment domains is implicitly represented by their characteristic functions. The fidelity term is then usually written as a linear functional of the characteristic functions, and the regularization term is approximated by a functional of characteristic functions in terms of heat kernel convolution. This allows us to design an iterative convolution-thresholding method to minimize the approximate energy. The method is simple, efficient and enjoys the energy-decaying property. Numerical experiments show that the method is easy to implement, robust and applicable to various image segmentation models.
http://arxiv.org/abs/1904.10917
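As a concrete illustration, here is a hedged two-phase sketch of the convolution-thresholding idea: region means for the fidelity term, a Gaussian (heat-kernel) convolution for the regularizer, and a pointwise threshold. The initialization, parameters, and stopping rule are our assumptions, not the paper's exact algorithm.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ictm_two_phase(image, sigma=2.0, lam=0.5, n_iter=100):
    # u is the characteristic function of the foreground region
    u = (image > image.mean()).astype(float)
    for _ in range(n_iter):
        # region means for the two-phase fidelity term
        c1 = (image * u).sum() / max(u.sum(), 1e-8)
        c0 = (image * (1.0 - u)).sum() / max((1.0 - u).sum(), 1e-8)
        fidelity = (image - c1) ** 2 - (image - c0) ** 2
        # heat-kernel convolution approximating the perimeter regularizer
        Gu = gaussian_filter(u, sigma)
        # thresholding: pick the phase with the lower pointwise linearized energy
        u_new = (fidelity + lam * (1.0 - 2.0 * Gu) < 0).astype(float)
        if np.array_equal(u_new, u):     # the energy-decaying iteration has converged
            break
        u = u_new
    return u

# usage: seg = ictm_two_phase(gray_image) for a 2-D float array gray_image
```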
We propose ViDeNN: a CNN for video denoising without prior knowledge of the noise distribution (blind denoising). The CNN architecture uses a combination of spatial and temporal filtering, learning to spatially denoise the frames first while simultaneously learning how to combine their temporal information, handling object motion, brightness changes, low-light conditions and temporal inconsistencies. We demonstrate the importance of the data used to train CNNs, creating for this purpose a specific dataset for low-light conditions. We test ViDeNN on common benchmarks and on self-collected data, achieving results comparable with the state of the art.
http://arxiv.org/abs/1904.10898
Depth estimation from a single image in the wild remains a challenging problem. One main obstacle is the lack of high-quality training data for images in the wild. In this paper we propose a method to automatically generate such data through Structure-from-Motion (SfM) on Internet videos. The core of this method is a Quality Assessment Network that identifies high-quality reconstructions obtained from SfM. Using this method, we collect single-view depth training data from a large number of YouTube videos and construct a new dataset called YouTube3D. Experiments show that YouTube3D is useful in training depth estimation networks and advances the state of the art of single-view depth estimation in the wild.
http://arxiv.org/abs/1806.09573
Open-domain dialogue agents must be able to converse about many topics while incorporating knowledge about the user into the conversation. In this work we address the acquisition of such knowledge, for personalization in downstream Web applications, by extracting personal attributes from conversations. This problem is more challenging than the established task of information extraction from scientific publications or Wikipedia articles, because dialogues often give merely implicit cues about the speaker. We propose methods for inferring personal attributes, such as profession, age or family status, from conversations using deep learning. Specifically, we propose several Hidden Attribute Models, which are neural networks leveraging attention mechanisms and embeddings. Our methods are trained on a per-predicate basis to output rankings of object values for a given subject-predicate combination (e.g., ranking the doctor and nurse professions high when speakers talk about patients, emergency rooms, etc.). Experiments with various conversational texts, including Reddit discussions, movie scripts and a collection of crowdsourced personal dialogues, demonstrate the viability of our methods and their superior performance compared to state-of-the-art baselines.
http://arxiv.org/abs/1904.10887
This paper describes a prototype system that integrates social media analysis into the European Flood Awareness System (EFAS). This integration allows the collection of social media data to be automatically triggered by flood risk warnings determined by a hydro-meteorological model. Then, we adopt a multi-lingual approach to find flood-related messages by employing two state-of-the-art methodologies: language-agnostic word embeddings and language-aligned word embeddings. Both approaches can be used to bootstrap a classifier of social media messages for a new language with little or no labeled data. Finally, we describe a method for selecting relevant and representative messages and displaying them back in the interface of EFAS.
http://arxiv.org/abs/1904.10876
Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.
http://arxiv.org/abs/1810.09536
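To make the ordered-neuron gating concrete, here is a condensed sketch of an ON-LSTM-style cell: a "cumulative softmax" produces monotone master gates, and their overlap selects where the ordinary LSTM update applies. This operates at per-neuron granularity for clarity (the published model groups neurons into chunks for efficiency), and all names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cumax(logits):
    # "cumulative softmax": a monotonically non-decreasing gate vector in [0, 1]
    return torch.cumsum(F.softmax(logits, dim=-1), dim=-1)

class ONLSTMCell(nn.Module):
    def __init__(self, n_in, n_hid):
        super().__init__()
        self.ih = nn.Linear(n_in, 6 * n_hid)
        self.hh = nn.Linear(n_hid, 6 * n_hid)

    def forward(self, x, state):
        h, c = state
        i, f, o, g, mf, mi = (self.ih(x) + self.hh(h)).chunk(6, dim=-1)
        master_f = cumax(mf)            # rises 0 -> 1 across the neuron ordering
        master_i = 1.0 - cumax(mi)      # falls 1 -> 0 across the neuron ordering
        w = master_f * master_i         # overlap: standard LSTM update applies here
        f_hat = torch.sigmoid(f) * w + (master_f - w)   # high-ranked neurons are kept
        i_hat = torch.sigmoid(i) * w + (master_i - w)   # low-ranked neurons are rewritten
        c = f_hat * c + i_hat * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```

The key point is that updating one neuron forces all neurons below it (in the cumax ordering) to update too, which is the inductive bias toward nested constituents.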
This paper proposes a novel Stochastic Split Linearized Bregman Iteration ($S^{2}$-LBI) algorithm to efficiently train deep networks. $S^{2}$-LBI introduces an iterative regularization path with structural sparsity, combining the computational efficiency of the LBI with model selection consistency in learning structural sparsity. The computed solution path intrinsically enables us to enlarge or simplify a network, which theoretically benefits from the dynamics of our $S^{2}$-LBI algorithm. The experimental results validate $S^{2}$-LBI on the MNIST and CIFAR-10 datasets. For example, on MNIST we can either boost a network with only 1.5K parameters (1 convolutional layer of 5 filters and 1 FC layer) to achieve 98.40\% recognition accuracy, or prune $82.5\%$ of the parameters of the LeNet-5 network while still achieving 98.47\% recognition accuracy. In addition, we also have learning results on ImageNet, which will be added in the next version of our report.
http://arxiv.org/abs/1904.10873
This paper introduces the unsupervised assignment flow, which couples the assignment flow for supervised image labeling with Riemannian gradient flows for label evolution on feature manifolds. The latter component encompasses extensions of state-of-the-art clustering approaches to manifold-valued data. Coupling label evolution with the spatially regularized assignment flow induces a sparsifying effect that enables learning compact label dictionaries in an unsupervised manner. Our approach alleviates the requirement of supervised labeling to have proper labels at hand, because an initial set of labels can evolve and adapt to better values while being assigned to given data. The separation between feature and assignment manifolds enables flexible application, which we demonstrate for three scenarios with manifold-valued features. Experiments show a beneficial effect in both directions: adapting labels improves image labeling, and steering label evolution by spatially regularized assignments leads to proper labels, because the assignment flow for supervised labeling is used exactly, without any approximation, for label learning.
http://arxiv.org/abs/1904.10863
The concept of the optical diffractive neural network (DNN) was proposed recently and is implemented by a cascaded phase mask architecture. Like an optical computer, the system can perform machine learning tasks such as digit recognition in an all-optical manner. However, the system only works under coherent light illumination, and the precision requirements in practical experiments are quite high. This paper proposes an optical machine learning framework based on single-pixel imaging (MLSPI). The MLSPI system can perform the same linear pattern recognition tasks as a DNN. Furthermore, it works under incoherent lighting conditions, has lower experimental complexity, and is easily programmable.
http://arxiv.org/abs/1904.10851
Multilingual representations have mostly been evaluated based on their performance on specific tasks. In this article, we look beyond engineering goals and analyze the relations between languages in computational representations. We introduce a methodology for comparing languages based on their organization of semantic concepts. We propose to conduct an adapted version of representational similarity analysis of a selected set of concepts in computational multilingual representations. Using this analysis method, we can reconstruct a phylogenetic tree that closely resembles those assumed by linguistic experts. These results indicate that multilingual distributional representations which are only trained on monolingual text and bilingual dictionaries preserve relations between languages without the need for any etymological information. In addition, we propose a measure to identify semantic drift between language families. We perform experiments on word-based and sentence-based multilingual models and provide both quantitative results and qualitative examples. Analyses of semantic drift in multilingual representations can serve two purposes: they can indicate unwanted characteristics of the computational models and they provide a quantitative means to study linguistic phenomena across languages. The code is available at https://github.com/beinborn/SemanticDrift.
http://arxiv.org/abs/1904.10820
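A hedged sketch of the representational-similarity pipeline sketched above: for each language, build a concept-by-concept cosine-similarity matrix from its embeddings, compare those matrices across languages (second-order similarity), and cluster the resulting distances into a tree. The random arrays are toy stand-ins for real multilingual embeddings; this is not the authors' released code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def rsa_matrix(emb):
    # emb: (n_concepts, dim) embeddings of a shared concept list for one language
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return emb @ emb.T                               # first-order concept similarity

def language_distance(A, B):
    iu = np.triu_indices_from(A, k=1)
    return 1.0 - np.corrcoef(A[iu], B[iu])[0, 1]     # second-order dissimilarity

langs = {"en": np.random.randn(50, 300),             # toy stand-ins for embeddings
         "de": np.random.randn(50, 300),
         "nl": np.random.randn(50, 300)}
mats = {k: rsa_matrix(v) for k, v in langs.items()}
names = list(mats)
D = np.array([[language_distance(mats[a], mats[b]) for b in names] for a in names])
tree = linkage(squareform(D, checks=False), method="average")  # phylogeny-like tree
```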
Facial landmarks are employed in many research areas, such as facial recognition, craniofacial identification, and age and sex estimation, among the most important. In the forensic field, the focus is on the analysis of a particular set of facial landmarks known as cephalometric landmarks. Previous works demonstrated that the descriptive adequacy of these anatomical references for indirect application (photo-anthropometric description) increased the precision with which these points are marked, contributing to the greater reliability of such analyses. However, most markings are performed manually, and all of them are subject to the subjectivity inherent to expert examiners. In this sense, the purpose of this work is the development and validation of automatic techniques to detect cephalometric landmarks in digital images of frontal faces in the forensic field. The presented approach uses a combination of computer vision and image processing techniques within a supervised learning procedure. The proposed methodology obtains precision similar to that of a group of human manual markers of cephalometric references and proves more accurate than other state-of-the-art facial landmark detection frameworks. It achieves a normalized mean distance error (in pixels) of 0.014, similar to the mean inter-expert dispersion (0.009) and clearly better than the other automatic approaches analyzed in this work (0.026 and 0.101).
http://arxiv.org/abs/1904.10816
Face alignment (or facial landmarking) is an important task in many face-related applications, ranging from registration, tracking and animation to higher-level classification problems such as face, expression or attribute recognition. While several solutions have been presented in the literature for this task so far, reliably locating salient facial features across a wide range of poses still remains challenging. To address this issue, we propose in this paper a novel method for automatic facial landmark localization in 3D face data, designed specifically to address appearance variability caused by significant pose variations. Our method builds on recent cascaded-regression-based approaches to facial landmarking and uses a gating mechanism to incorporate multiple linear cascaded regression models, each trained for a limited range of poses, into a single powerful landmarking model capable of processing arbitrarily posed input data. We develop two distinct approaches around the proposed gating mechanism: i) the first uses a gated multiple ridge descent (GRID) mechanism in conjunction with established (hand-crafted) HOG features for face alignment, and achieves state-of-the-art landmarking performance across a wide range of facial poses; ii) the second simultaneously learns multiple descent directions as well as binary features (SMUF) that are optimal for the alignment task, and, in addition to competitive landmarking results, also ensures extremely rapid processing. We evaluate both approaches in rigorous experiments on several popular datasets of 3D face images, i.e., the FRGCv2 and Bosphorus 3D Face datasets and image collections F and G from the University of Notre Dame. The results of our evaluation show that both approaches are competitive with the state of the art, while exhibiting considerable robustness to pose variations.
http://arxiv.org/abs/1904.10787
Training robust deep learning (DL) systems for disease detection from medical images is challenging due to the limited number of images covering different disease types and severities. The problem is especially acute where there is severe class imbalance. We propose an active learning (AL) framework that selects the most informative samples for training our model using a Bayesian neural network. Informative samples are then used within a novel class-aware generative adversarial network (CAGAN) to generate realistic chest X-ray images for data augmentation by transferring characteristics from one class label to another. Experiments show our proposed AL framework is able to achieve state-of-the-art performance using about $35\%$ of the full dataset, thus saving significant time and effort over conventional methods.
http://arxiv.org/abs/1904.10781
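One common way to realize the Bayesian informativeness criterion implied above is Monte Carlo dropout: sample several stochastic forward passes and pick the unlabeled images with the highest predictive entropy. This is a hedged sketch under that assumption (the paper's exact acquisition function and the CAGAN step are not shown); model and pool names are illustrative.

```python
import torch

@torch.no_grad()
def mc_dropout_entropy(model, x, n_samples=20):
    model.train()                        # keep dropout active at inference time
    probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)             # (B, n_classes) predictive distribution
    return -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)  # entropy per image

def select_informative(model, pool, k=100):
    scores = mc_dropout_entropy(model, pool)   # pool: (N, C, H, W) unlabeled X-rays
    return scores.topk(k).indices              # indices of the k most uncertain samples
```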
Event cameras are novel, bio-inspired visual sensors, whose pixels output asynchronous and independent timestamped spikes at local intensity changes, called ‘events’. Event cameras offer advantages over conventional frame-based cameras in terms of latency, high dynamic range (HDR) and temporal resolution. Until recently, event cameras have been limited to outputting events in the intensity channel; however, recent advances have resulted in the development of color event cameras, such as the Color-DAVIS346. In this work, we present and release the first Color Event Camera Dataset (CED), containing 50 minutes of footage with both color frames and events. CED features a wide variety of indoor and outdoor scenes, which we hope will help drive forward event-based vision research. We also present an extension of the event camera simulator ESIM that enables simulation of color events. Finally, we present an evaluation of three state-of-the-art image reconstruction methods that can be used to convert the Color-DAVIS346 into a continuous-time, HDR, color video camera to visualise the event stream, and for use in downstream vision applications.
http://arxiv.org/abs/1904.10772
This paper proposes a learning-based framework for reconstructing 3D shapes from functional operators, compactly encoded as small-sized matrices. To this end we introduce a novel neural architecture, called OperatorNet, which takes as input a set of linear operators representing a shape and produces its 3D embedding. We demonstrate that this approach significantly outperforms previous purely geometric methods for the same problem. Furthermore, we introduce a novel functional operator, which encodes the extrinsic or pose-dependent shape information, and thus complements purely intrinsic pose-oblivious operators, such as the classical Laplacian. Coupled with this novel operator, our reconstruction network achieves very high reconstruction accuracy, even in the presence of incomplete information about a shape, given a soft or functional map expressed in a reduced basis. Finally, we demonstrate that the multiplicative functional algebra enjoyed by these operators can be used to synthesize entirely new unseen shapes, in the context of shape interpolation and shape analogy applications.
http://arxiv.org/abs/1904.10754
Generative Adversarial Networks (GANs) typically learn a distribution of images in a large image dataset, and are then able to generate new images from this distribution. However, each natural image has its own internal statistics, captured by its unique distribution of patches. In this paper we propose an “Internal GAN” (InGAN) - an image-specific GAN - which trains on a single input image and learns its internal distribution of patches. It is then able to synthesize a plethora of new natural images of significantly different sizes, shapes and aspect-ratios - all with the same internal patch-distribution (same “DNA”) as the input image. In particular, despite large changes in global size/shape of the image, all elements inside the image maintain their local size/shape. InGAN is fully unsupervised, requiring no additional data other than the input image itself. Once trained on the input image, it can remap the input to any size or shape in a single feedforward pass, while preserving the same internal patch distribution. InGAN provides a unified framework for a variety of tasks, bridging the gap between textures and natural images.
http://arxiv.org/abs/1812.00231
Open-ended survey data constitute an important basis for research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow new content to be expressed beyond predefined categories and are a very valuable source of new insights into people's opinions. At the same time, surveys always make ontological assumptions about the nature of the entities that are researched, and this has vital ethical consequences. Human interpretations and opinions can only be properly ascertained in their richness using textual data sources; if these sources are analyzed appropriately, the essential linguistic nature of humans and social entities is safeguarded. Natural Language Processing (NLP) offers possibilities for meeting this ethical business challenge by automating the analysis of natural language and thus allowing for insightful investigations of human judgements. We present a computational pipeline for analysing large amounts of responses to open-ended survey questions and extracting keywords that appropriately represent people's opinions. This pipeline addresses the need to perform such tasks outside the scope of both commercial software and bespoke analysis, exceeds the performance of state-of-the-art systems, and performs this task in a transparent way that allows for scrutinising and exposing potential biases in the analysis. Following the principle of Open Data Science, our code is open-source and generalizable to other datasets.
http://arxiv.org/abs/1808.10685
This paper focuses on a traditional relation extraction task in the context of limited annotated data and a narrow knowledge domain. We explore this task with a clinical corpus consisting of 200 breast cancer follow-up treatment letters in which 16 distinct types of relations are annotated. We experiment with an approach to extracting typed relations called window-bounded co-occurrence (WBC), which uses an adjustable context window around entity mentions of a relevant type, and compare its performance with a more typical intra-sentential co-occurrence baseline. We further introduce a new bag-of-concepts (BoC) approach to feature engineering based on state-of-the-art word embeddings and word synonyms. We demonstrate the competitiveness of BoC by comparing it with methods of higher complexity, and explore its effectiveness on this small dataset.
http://arxiv.org/abs/1904.10743
Modern deep learning approaches have achieved groundbreaking performance in modeling and classifying sequential data. Specifically, attention networks constitute the state-of-the-art paradigm for capturing long temporal dynamics. This paper examines the efficacy of this paradigm in the challenging task of emotion recognition in dyadic conversations. In contrast to existing approaches, our work introduces a novel attention mechanism capable of inferring the magnitude of the effect of each past utterance on the current speaker's emotional state. The proposed attention mechanism performs this inference without the need for a decoder network; this is achieved by means of innovative self-attention arguments. Our self-attention networks capture the correlation patterns among consecutive encoder network states, allowing us to robustly and effectively model temporal dynamics over arbitrarily long temporal horizons and thus to capture strong affective patterns over the course of long discussions. We demonstrate the effectiveness of our approach on the challenging IEMOCAP benchmark. As we show, our devised methodology outperforms state-of-the-art alternatives and commonly used approaches, giving rise to promising new research directions in the context of Online Social Network (OSN) analysis tasks.
http://arxiv.org/abs/1905.01972
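For intuition, here is a crude stand-in for the decoder-free self-attention over consecutive encoder states described above: each utterance state is correlated with all preceding states and re-expressed as a weighted sum of them. This is a generic causal scaled-dot-product sketch, not the paper's exact mechanism.

```python
import math
import torch

def causal_self_attention(states):
    # states: (B, T, d) encoder states, one per utterance in the dialogue
    B, T, d = states.shape
    scores = states @ states.transpose(1, 2) / math.sqrt(d)    # (B, T, T) correlations
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))           # only attend to the past
    return torch.softmax(scores, dim=-1) @ states              # context per utterance
```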
The task of Natural Language Inference (NLI) is widely modeled as supervised sentence pair classification. While there has been a lot of work recently on generating explanations of the predictions of classifiers on a single piece of text, there have been no attempts to generate explanations of classifiers operating on pairs of sentences. In this paper, we show that it is possible to generate token-level explanations for NLI without the need for training data explicitly annotated for this purpose. We use a simple LSTM architecture and evaluate both LIME and Anchor explanations for this task. We compare these to a Multiple Instance Learning (MIL) method that uses thresholded attention to make token-level predictions. The approach we present in this paper is a novel extension of zero-shot single-sentence tagging to sentence pairs for NLI. We conduct our experiments on the well-studied SNLI dataset, which was recently augmented with manual annotation of the tokens that explain the entailment relation. We find that our white-box MIL-based method, while orders of magnitude faster, does not reach the same accuracy as the black-box methods.
http://arxiv.org/abs/1904.10717
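A toy illustration of the thresholded-attention idea: once an attention-based NLI classifier has been trained with only pair-level labels (the multiple-instance setting), tokens whose attention weight exceeds a threshold are emitted as the token-level explanation. The threshold value and names here are our assumptions.

```python
import torch

def explain(tokens, attn_weights, tau=0.1):
    # attn_weights: (T,) attention over tokens from the trained MIL model
    keep = attn_weights > tau
    return [tok for tok, k in zip(tokens, keep.tolist()) if k]

tokens = ["a", "man", "is", "sleeping"]
attn = torch.tensor([0.02, 0.30, 0.03, 0.65])
print(explain(tokens, attn))   # -> ['man', 'sleeping']
```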
A common approach to learning robotic skills is to imitate a policy demonstrated by a supervisor. One existing problem is that, due to the compounding of small errors and perturbations, the robot may leave the states where demonstrations were given. If no strategy is employed to guarantee how the robot will behave when facing unknown states, catastrophic outcomes can occur. An appealing approach is to use Bayesian methods, which offer a quantification of the action uncertainty given the state. Bayesian methods are usually more computationally demanding and require more complex design choices than their non-Bayesian alternatives, which limits their application. In this work, we present a Bayesian method that is simple to set up, computationally efficient, and able to adapt to a wide range of problems. These advantages make the method very convenient for imitation of robotic manipulation tasks in the continuous domain. We exploit the provided uncertainty to fuse the imitation policy with other policies. The approach is validated on a Panda robot with three tasks using different control input/state pairs.
http://arxiv.org/abs/1904.10716
Weather recognition plays an important role in our daily lives and in many computer vision applications. However, recognizing weather conditions from a single image remains challenging and has not been studied thoroughly. Generally, most previous works treat weather recognition as a single-label classification task, namely, determining whether an image belongs to a specific weather class or not. This treatment is not always appropriate, since more than one weather condition may appear simultaneously in a single image. To address this problem, we make the first attempt to view weather recognition as a multi-label classification task, i.e., assigning an image more than one label according to the displayed weather conditions. Specifically, a CNN-RNN based multi-label classification approach is proposed in this paper. The convolutional neural network (CNN) is extended with a channel-wise attention model to extract the most correlated visual features. The recurrent neural network (RNN) further processes the features and mines the dependencies among weather classes. Finally, the weather labels are predicted step by step. Besides, we construct two datasets for the weather recognition task and explore the relationships among different weather conditions. Experimental results demonstrate the superiority and effectiveness of the proposed approach. The newly constructed datasets will be available at https://github.com/wzgwzg/Multi-Label-Weather-Recognition.
http://arxiv.org/abs/1904.10709
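One plausible instantiation of the "channel-wise attention model" mentioned above is a squeeze-and-excitation style block that reweights CNN feature channels; the sketch below shows that pattern. This is a generic construction, not the paper's exact module, and the reduction factor is our assumption.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                   # x: (B, C, H, W) CNN feature maps
        w = self.fc(x.mean(dim=(2, 3)))     # squeeze: global average pool -> (B, C)
        return x * w[:, :, None, None]      # excite: emphasize correlated channels
```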
We present the first purely event-based, energy-efficient approach for object detection and categorization using an event camera. Compared to traditional frame-based cameras, event cameras offer attractive properties: high temporal resolution (on the order of microseconds), low power consumption (a few hundred mW) and wide dynamic range (120 dB). However, event-based object recognition systems are far behind their frame-based counterparts in terms of accuracy. To this end, this paper presents an event-based feature extraction method devised by accumulating local activity across the image frame and then applying principal component analysis (PCA) to the normalized neighborhood regions. Subsequently, we propose a backtracking-free k-d tree mechanism for efficient feature matching that takes advantage of the low dimensionality of the feature representation. Additionally, the proposed k-d tree mechanism allows for feature selection, yielding a lower-dimensional dictionary representation when hardware resources are too limited to implement dimensionality reduction. Consequently, the proposed system can be realized on a field-programmable gate array (FPGA) device with a high performance-to-resource ratio. The proposed system is tested on real-world event-based datasets for object categorization, showing classification performance superior to state-of-the-art algorithms. Additionally, we verified the object detection method and real-time FPGA performance in lab settings under non-controlled illumination conditions, with limited training data and ground-truth annotations.
http://arxiv.org/abs/1904.12665
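A hedged sketch of the descriptor pipeline described above: accumulate event activity into a frame, take a normalized patch around each active pixel, project the patches with PCA, and match descriptors with a k-d tree. An ordinary SciPy k-d tree stands in for the paper's backtracking-free variant; patch size, component count, and event format are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def event_descriptors(events, shape=(180, 240), patch=7, n_components=8):
    acc = np.zeros(shape)
    for x, y in events:                              # events: iterable of (x, y) pixels
        acc[y, x] += 1.0                             # accumulate local activity
    r = patch // 2
    ys, xs = np.nonzero(acc)
    rows = [acc[y - r:y + r + 1, x - r:x + r + 1].ravel()
            for y, x in zip(ys, xs)
            if r <= y < shape[0] - r and r <= x < shape[1] - r]
    P = np.asarray(rows)
    # normalize each neighborhood region, then apply PCA via SVD
    P = (P - P.mean(axis=1, keepdims=True)) / (P.std(axis=1, keepdims=True) + 1e-8)
    Pc = P - P.mean(axis=0)
    _, _, Vt = np.linalg.svd(Pc, full_matrices=False)
    return Pc @ Vt[:n_components].T                  # low-dimensional descriptors

def match_features(train_feats, query_feats):
    tree = cKDTree(train_feats)          # stand-in for the backtracking-free k-d tree
    _, idx = tree.query(query_feats)     # nearest-neighbor feature matching
    return idx
```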
This paper introduces an algorithm to protect the privacy of individuals in streaming video data by blurring faces so that they cannot be reliably recognized. This thwarts any possible face recognition, but, because all facial details are obscured, blurring alone is of limited use. We propose a new clustering algorithm to create raw trajectories for detected faces. Associating faces across frames to form trajectories, it auto-generates the number of clusters and discovers new clusters through deep-feature and position-aggregated affinities. We introduce a Gaussian process to refine the raw trajectories. We conducted an online experiment with 47 participants to evaluate the effectiveness of face blurring compared to the original photo (as-is), as well as users' experience (satisfaction, information sufficiency, enjoyment, social presence, and filter likeability).
http://arxiv.org/abs/1903.10836
Manual image annotation, such as defining and labelling regions of interest, is a fundamental processing stage of many research projects and industrial applications. In this paper, we introduce a simple and standalone manual image annotation tool: the VGG Image Annotator (VIA). This is a lightweight, standalone and offline software package that does not require any installation or setup and runs solely in a web browser. Due to its lightness and flexibility, the VIA software has quickly become an essential and invaluable research support tool in many academic disciplines. Furthermore, it has also been immensely popular in several industrial sectors, which have invested in adapting this open source software to their requirements. Since its public release in 2017, the VIA software has been used more than $500,000$ times and has nurtured a large and thriving open source community.
http://arxiv.org/abs/1904.10699
Zero-Shot Learning (ZSL) aims at classifying unlabeled objects by leveraging auxiliary knowledge, such as semantic representations. A limitation of previous approaches is that only intrinsic properties of objects, e.g. their visual appearance, are taken into account while their context, e.g. the surrounding objects in the image, is ignored. Following the intuitive principle that objects tend to be found in certain contexts but not others, we propose a new and challenging approach, context-aware ZSL, that leverages semantic representations in a new way to model the conditional likelihood of an object to appear in a given context. Finally, through extensive experiments conducted on Visual Genome, we show that contextual information can substantially improve the standard ZSL approach and is robust to unbalanced classes.
http://arxiv.org/abs/1904.12638
There are many different ways a robot can move in human-robot interaction. One way is to use techniques from film animation to instruct the robot how to move. This article is a systematic literature review of human-robot trials, pilots, and evaluations that have applied techniques from animation to robot motion. Across 27 articles, we find that animation techniques improve individuals' interaction with robots, improving their perception of a robot's qualities, their understanding of what a robot intends to do, and the display of the robot's state or possible emotion. Animation techniques also help people relate to robots that do not resemble a human or animal. The reviewed studies suggest further areas for research, such as applying animation principles to other types of robots and situations, combining animation techniques with other modalities, and testing robots that move with animation techniques over the long term.
http://arxiv.org/abs/1812.06784
Single image super-resolution (SR) is extremely difficult if the upscaling factors of image pairs are unknown and differ from each other, which is common in real-world image SR. To tackle this difficulty, we develop two multi-scale deep neural networks (MsDNN) in this work. First, due to the high computational complexity in high-resolution spaces, we process an input image mainly in two different downscaled spaces, which greatly lowers GPU memory usage. Then, to reconstruct image details, we design a multi-scale residual network (MsRN) in the downscaled spaces based on residual blocks. Besides, we propose a multi-scale dense network based on dense blocks for comparison with MsRN. Finally, our empirical experiments show the robustness of MsDNN for image SR when the upscaling factor is unknown. According to the preliminary results of the NTIRE 2019 image SR challenge, our team (ZXHresearch@fudan) ranks 21st among all participants. The implementation of MsDNN is released at https://github.com/shangqigao/gsq-image-SR
http://arxiv.org/abs/1904.10698
Usually considered a classification problem, entity resolution (ER) can be very challenging on real data due to the prevalence of dirty values. The state-of-the-art solutions for ER were built on a variety of learning models (most notably deep neural networks), which require large amounts of accurately labeled training data. Unfortunately, high-quality labeled data usually require expensive manual work and are therefore not readily available in many real scenarios. In this paper, we propose a novel learning paradigm for ER, called gradual machine learning, which aims to enable effective machine learning without manual labeling effort. It begins with some easy instances in a task, which can be automatically labeled by the machine with high accuracy, and then gradually labels more challenging instances based on iterative factor graph inference. In gradual machine learning, the hard instances in a task are gradually labeled in small stages based on the estimated evidential certainty provided by the labeled easier instances. Our extensive experiments on real data show that the proposed approach performs considerably better than its unsupervised alternatives and is highly competitive with state-of-the-art supervised techniques. Using ER as a test case, we demonstrate that gradual machine learning is a promising paradigm potentially applicable to other challenging classification tasks requiring extensive labeling effort.
http://arxiv.org/abs/1810.12125
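A schematic of the gradual labeling loop described above: automatically label the easy record pairs first, then iteratively label the hardest remaining pairs in order of estimated certainty. The simple similarity-based certainty here is a toy stand-in for the paper's scalable factor-graph inference, and the thresholds are our assumptions.

```python
def gradual_machine_learning(pairs, similarity, hi=0.9, lo=0.1, batch=50):
    labels = {}
    for p in pairs:                          # stage 1: automatically label easy instances
        s = similarity(p)
        if s >= hi:
            labels[p] = True                 # confident match
        elif s <= lo:
            labels[p] = False                # confident non-match
    remaining = [p for p in pairs if p not in labels]
    while remaining:                         # stage 2: gradual labeling of hard instances
        # most evidentially certain first (furthest from the decision boundary);
        # in the paper this certainty comes from factor-graph inference over
        # the instances labeled so far
        remaining.sort(key=lambda p: abs(similarity(p) - 0.5), reverse=True)
        for p in remaining[:batch]:
            labels[p] = similarity(p) >= 0.5
        remaining = remaining[batch:]
    return labels
```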
Current research on action recognition mainly focuses on single-view and multi-view recognition, which can hardly satisfy the requirements of human-robot interaction (HRI) applications that must recognize actions from arbitrary views. The lack of datasets also sets up barriers. To provide data for arbitrary-view action recognition, we collect a new large-scale RGB-D action dataset for arbitrary-view action analysis, including RGB videos, depth and skeleton sequences. The dataset includes action samples captured from 8 fixed viewpoints and varying-view sequences that cover the entire 360-degree range of view angles. In total, 118 persons were invited to act out 40 action categories, and 25,600 video samples were collected. Our dataset involves more participants, more viewpoints and a larger number of samples than previous datasets. More importantly, it is the first dataset containing entire 360-degree varying-view sequences. The dataset provides sufficient data for multi-view, cross-view and arbitrary-view action analysis. Besides, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition. Experimental results show that the VS-CNN achieves superior performance.
http://arxiv.org/abs/1904.10681
A challenging problem in the field of deep learning-based machine listening is the degradation of performance when using data from unseen conditions. In this paper we focus on the acoustic scene classification (ASC) task and propose an adversarial deep learning method that allows adapting an acoustic scene classification system to a new acoustic channel resulting from data captured with a different recording device. We build upon the theoretical model of the $H\Delta H$-distance and a previous adversarial discriminative deep learning method for unsupervised domain adaptation in ASC, and we present an adversarial training based method using the Wasserstein distance. We improve the state-of-the-art mean accuracy on data from unseen conditions from 32% to 45%, using the TUT Acoustic Scenes dataset.
http://arxiv.org/abs/1904.10678
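A condensed sketch of Wasserstein-based adversarial adaptation in this setting: a critic scores encoder features, and the target-device encoder is trained so its features become indistinguishable, in approximate Wasserstein distance, from the source encoder's. Function names are ours, and the Lipschitz constraint is only noted, not implemented.

```python
import torch

def critic_loss(critic, src_feats, tgt_feats):
    # the critic approximates the Wasserstein distance between feature
    # distributions: train it to score source high and target low
    return -(critic(src_feats).mean() - critic(tgt_feats).mean())

def adaptation_loss(critic, tgt_feats):
    # train the target encoder so its features move toward the source distribution
    return -critic(tgt_feats).mean()

# In practice the critic must be kept (approximately) 1-Lipschitz,
# e.g. via weight clipping or a gradient penalty, which we omit here.
```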
Acute stroke lesion segmentation tasks are of great clinical interest, as they can help doctors make better-informed treatment decisions. Magnetic resonance imaging (MRI) is time demanding but can provide images that are considered the gold standard for diagnosis. Automated stroke lesion segmentation can provide an estimate of the location and volume of the lesioned tissue, which can help clinicians better assess and evaluate the risks of each treatment. We propose a deep learning methodology for acute and sub-acute stroke lesion segmentation using multimodal MR imaging. The proposed method is evaluated using two public datasets from the 2015 Ischemic Stroke Lesion Segmentation challenge (ISLES 2015), covering the tasks of sub-acute stroke lesion segmentation (SISS) and acute stroke penumbra estimation (SPES) from diffusion, perfusion and anatomical MRI modalities. The performance is compared against state-of-the-art methods with a blind online testing-set evaluation for each challenge. At the time of submitting this manuscript, our approach is ranked first online for the SISS (DSC=0.59$\pm$0.31) and SPES sub-tasks (DSC=0.84$\pm$0.10). When compared with the other submitted strategies, we achieve top-rank performance with a lower Hausdorff distance. Better segmentation results are obtained by leveraging the anatomy and pathophysiology of acute stroke lesions and using a combined approach to minimize the effects of class imbalance. The same training procedure is used for both tasks, showing that the proposed methodology generalizes well enough to deal with different unrelated tasks and imaging modalities without hyper-parameter tuning. A public version of the proposed method has been released to the scientific community at https://github.com/NIC-VICOROB/stroke-mri-segmentation.
http://arxiv.org/abs/1810.13304
In recent years, deep learning techniques have revolutionized the way remote sensing data are processed. Classification of hyperspectral data is no exception to the rule, but it has intrinsic specificities that make the application of deep learning less straightforward than with other optical data. This article reviews previous machine learning approaches and the various deep learning approaches currently proposed for hyperspectral classification, and identifies the problems and difficulties that arise when implementing deep neural networks for this task. In particular, the issues of spatial and spectral resolution, data volume, and transfer of models from multimedia images to hyperspectral data are addressed. Additionally, a comparative study of various families of network architectures is provided, and a software toolbox is publicly released to allow experimenting with these methods. This article is intended both for data scientists with an interest in hyperspectral data and for remote sensing experts eager to apply deep learning techniques to their own datasets.
http://arxiv.org/abs/1904.10674
In this paper, we build a general summarization framework for both edited and raw video summarization. Our work is threefold: 1) Four models are designed to capture the properties of video summaries, i.e., containing important people and objects (importance), being representative of the video content (representativeness), having no similar key-shots (diversity) and preserving the smoothness of the storyline (storyness). These models apply to both edited videos and raw videos. 2) A comprehensive score function is built as a weighted combination of the four models (see the sketch below). The weights of the four models in the score function, called property-weights, are learned in a supervised manner, separately for edited videos and raw videos. 3) The training set is constructed from both edited and raw videos in order to make up for the lack of training data. In particular, each training video is equipped with a pair of mixing coefficients that reduce the structural mess in the training set caused by the rough mixture. We test our framework on three datasets comprising edited videos, short raw videos and long raw videos. Experimental results verify the effectiveness of the proposed framework.
http://arxiv.org/abs/1904.10669
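A one-line sketch of the comprehensive score function in 2) above, with the four property models and their learned property-weights as placeholder callables:

```python
def summary_score(candidate_shots, property_weights, property_models):
    # property_models: [importance, representativeness, diversity, storyness],
    # each mapping a candidate key-shot set to a scalar score
    return sum(w * m(candidate_shots)
               for w, m in zip(property_weights, property_models))
```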
Predicting the future is an important aspect for decision-making in robotics or autonomous driving systems, which heavily rely upon visual scene understanding. While prior work attempts to predict future video pixels, anticipate activities or forecast future scene semantic segments from segmentation of the preceding frames, methods that predict future semantic segmentation solely from the previous frame RGB data in a single end-to-end trainable model do not exist. In this paper, we propose a temporal encoder-decoder network architecture that encodes RGB frames from the past and decodes the future semantic segmentation. The network is coupled with a new knowledge distillation training framework specifically for the forecasting task. Our method, only seeing preceding video frames, implicitly models the scene segments while simultaneously accounting for the object dynamics to infer the future scene semantic segments. Our results on Cityscapes outperform the baseline and current state-of-the-art methods. Code is available at https://github.com/eddyhkchiu/segmenting_the_future/.
http://arxiv.org/abs/1904.10666
Background and Purpose: A current algorithm for obtaining a synthetic myelin volume fraction map (SyMVF) from rapid simultaneous relaxometry imaging (RSRI) has a potential problem: it does not incorporate information from surrounding pixels. The purpose of this study was to develop a method that utilizes a convolutional neural network (CNN) to overcome this problem. Methods: RSRI and magnetization transfer images from 20 healthy volunteers were included. A CNN was trained to reconstruct RSRI-related metric maps into a myelin volume-related index (generated myelin volume index: GenMVI) map, using the myelin volume index map calculated from magnetization transfer images (MTMVI) as reference. The SyMVF and GenMVI maps were statistically compared by testing how well they correlated with the MTMVI map. The correlations were evaluated based on: (i) averaged values obtained from 164 atlas-based ROIs, and (ii) pixel-based comparison for ROIs defined in four different tissue types (cortical and subcortical gray matter, white matter, and whole brain). Results: For atlas-based ROIs, the overall correlation with the MTMVI map was higher for the GenMVI map than for the SyMVF map. In the pixel-based comparison, the correlation with the MTMVI map was stronger for the GenMVI map than for the SyMVF map, and the difference in the distribution across volunteers was significant (Wilcoxon signed-rank test, P<.001) in all tissue types. Conclusion: The proposed method is useful, as it can incorporate more specific information about local tissue properties than the existing method.
http://arxiv.org/abs/1904.10960
In recent years, perceptual-quality driven super-resolution methods have shown satisfactory results. However, super-resolved images still exhibit uncertain texture details and unpleasant artifacts. We build a novel perceptual loss function composed of a morphological-component adversarial loss, a color adversarial loss, and a salient-content loss to ameliorate these problems. The adversarial losses constrain the distribution of the color and morphological components of super-resolved images, and the salient-content loss highlights the perceptual similarity of feature-rich regions. Experiments show that the proposed method achieves significant improvements in terms of perceptual index and visual quality compared with state-of-the-art methods.
http://arxiv.org/abs/1904.10654
In an episodic Markov Decision Process (MDP) problem, an online algorithm chooses from a set of actions in a sequence of $H$ trials, where $H$ is the episode length, in order to maximize the total payoff of the chosen actions. Q-learning, the most popular model-free reinforcement learning (RL) algorithm, directly parameterizes and updates value functions without explicitly modeling the environment. Recently, [Jin et al. 2018] studied the sample complexity of Q-learning with finite states and actions. Their algorithm achieves nearly optimal regret, which shows that Q-learning can be made sample efficient. However, MDPs with large discrete state and action spaces [Silver et al. 2016] or continuous spaces [Mnih et al. 2013] cannot be learned efficiently in this way. Hence, it is critical to develop new algorithms that solve this dilemma with provable guarantees on the sample complexity. With this motivation, we propose a novel algorithm that works for MDPs in a more general setting, with infinitely many states and actions, assuming that the payoff function and transition kernel are Lipschitz continuous. We also provide the corresponding theoretical justification for our algorithm. It achieves regret $\tilde{\mathcal{O}}(K^{\frac{d+1}{d+2}}\sqrt{H^3})$, where $K$ denotes the number of episodes and $d$ denotes the dimension of the joint space. To the best of our knowledge, this is the first analysis in the model-free setting whose established regret matches the lower bound up to a logarithmic factor.
http://arxiv.org/abs/1904.10653
This paper presents a novel approach to automatically synthesize age-progressed facial images in video sequences using deep reinforcement learning. The proposed method models facial structures and the longitudinal face-aging process of given subjects coherently across video frames. The approach is optimized using a long-term-reward reinforcement learning function with deep feature extraction from a deep convolutional neural network. Unlike previous age-progression methods that are only able to synthesize an aged likeness of a face from a single input image, the proposed approach is capable of age-progressing facial likenesses in videos with consistently synthesized facial features across frames. In addition, the deep reinforcement learning method guarantees preservation of the visual identity of input faces after age-progression. Results on videos from our newly collected aging-face database AGFW-v2 demonstrate the advantages of the proposed solution in terms of quality of age-progressed faces, temporal smoothness, and cross-age face verification.
http://arxiv.org/abs/1811.11082
Developing safe human-robot interaction systems is a necessary step towards the widespread integration of autonomous agents in society. A key component of such systems is the ability to reason about the many potential futures (e.g. trajectories) of other agents in the scene. Towards this end, we present the Trajectron, a graph-structured model that predicts many potential future trajectories of multiple agents simultaneously in both highly dynamic and multimodal scenarios (i.e. where the number of agents in the scene is time-varying and there are many possible highly-distinct futures for each agent). It combines tools from recurrent sequence modeling and variational deep generative modeling to produce a distribution of future trajectories for each agent in a scene. We demonstrate the performance of our model on several datasets, obtaining state-of-the-art results on standard trajectory prediction metrics as well as introducing a new metric for comparing models that output distributions.
http://arxiv.org/abs/1810.05993
We address a learning-to-normalize problem by proposing Switchable Normalization (SN), which learns to select different normalizers for different normalization layers of a deep neural network. SN employs three distinct scopes to compute statistics (means and variances): a channel, a layer, and a minibatch. SN switches among them by learning their importance weights in an end-to-end manner. It has several good properties. First, it adapts to various network architectures and tasks (see Fig. 1). Second, it is robust to a wide range of batch sizes, maintaining high performance even with small minibatches (e.g., 2 images/GPU). Third, SN has no sensitive hyper-parameters, unlike group normalization, which searches for the number of groups as a hyper-parameter. Without bells and whistles, SN outperforms its counterparts on various challenging benchmarks, such as ImageNet, COCO, CityScapes, ADE20K, and Kinetics. Analyses of SN are also presented. We hope SN will help ease the use and understanding of normalization techniques in deep learning. The code of SN has been made available at https://github.com/switchablenorms/.
http://arxiv.org/abs/1806.10779
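A simplified sketch of the switching mechanism described above: compute instance-norm, layer-norm, and batch-norm statistics over the three scopes, then blend them with learned softmax importance weights. This is a minimal training-mode version (the released implementation also keeps running statistics for inference), and the parameter layout is our assumption.

```python
import torch
import torch.nn as nn

class SwitchNorm2d(nn.Module):
    def __init__(self, c, eps=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(1, c, 1, 1))
        self.bias = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.mean_w = nn.Parameter(torch.ones(3))   # importance logits for the means
        self.var_w = nn.Parameter(torch.ones(3))    # importance logits for the variances
        self.eps = eps

    def forward(self, x):                           # x: (B, C, H, W)
        m_in, v_in = x.mean((2, 3), keepdim=True), x.var((2, 3), keepdim=True)      # channel
        m_ln, v_ln = x.mean((1, 2, 3), keepdim=True), x.var((1, 2, 3), keepdim=True)  # layer
        m_bn, v_bn = x.mean((0, 2, 3), keepdim=True), x.var((0, 2, 3), keepdim=True)  # batch
        mw = torch.softmax(self.mean_w, 0)          # end-to-end learned switching
        vw = torch.softmax(self.var_w, 0)
        mean = mw[0] * m_in + mw[1] * m_ln + mw[2] * m_bn
        var = vw[0] * v_in + vw[1] * v_ln + vw[2] * v_bn
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
```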
Modeling strategic conflict from a game-theoretic perspective involves dealing with epistemic uncertainty. Payoff uncertainty models are typically restricted to simple probability models due to computational restrictions. Recent breakthroughs in Artificial Intelligence (AI) research applied to poker have resulted in novel approximation approaches, such as counterfactual regret minimization, that can successfully deal with large-scale imperfect-information games. Drawing on these ideas, this work addresses the problem of arbitrary continuous payoff distributions. We propose a method, Harsanyi-Counterfactual Regret Minimization, to solve two-player zero-sum extensive-form games with arbitrary payoff distribution models. Given a game $\Gamma$, we use a Harsanyi transformation to generate a new game $\Gamma^{\#}$, to which we then apply Counterfactual Regret Minimization to obtain $\varepsilon$-Nash equilibria. We include numerical experiments showing how the method can be applied to a previously published problem.
http://arxiv.org/abs/1905.03850
Partial Reconfiguration (PR) is a technique that allows reconfiguring the FPGA chip at runtime. However, current design support tools require manual floorplanning of the partial modules. Several approaches have been proposed in this field, but only a few of them consider all aspects of PR, such as the shape and the aspect ratio of the reconfigurable region. Most of them are defined for old FPGA architectures and have a high computational time. This paper introduces an efficient automatic floorplanning algorithm, which takes into account the heterogeneous architectures of modern FPGA families as well as PR constraints, introducing an aspect-ratio constraint to optimize routing. The algorithm generates possible placements of the partial modules, then applies a recursive pseudo-bipartitioning heuristic search to find the best floorplan. The experiments showed that the algorithm's performance is significantly better than that of other algorithms in this field.
https://arxiv.org/abs/1904.10646
Batch Normalization (BN) improves both convergence and generalization in training neural networks. This work studies these phenomena theoretically. We analyze BN using a basic block of neural networks consisting of a kernel layer, a BN layer, and a nonlinear activation function. This basic network helps us understand the impacts of BN in three aspects. First, by viewing BN as an implicit regularizer, BN can be decomposed into population normalization (PN) and gamma decay as an explicit regularization. Second, the learning dynamics of BN and the regularization show that training converges with a large maximum and effective learning rate. Third, the generalization of BN is explored using statistical mechanics. Experiments demonstrate that BN in convolutional neural networks shares the same traits of regularization as the above analyses.
http://arxiv.org/abs/1809.00846
This work showcases a new approach to causal discovery that leverages user experiments and recent advances in photo-realistic image editing, demonstrating the potential to identify causal factors and understand complex systems counterfactually. We introduce the beauty learning problem as an example: beauty has been discussed metaphysically for centuries, and in our recent paper we showed that it exists, is quantifiable, and can be learned by deep models. We utilize a natural image generator coupled with user studies to infer causal effects from facial semantics to beauty outcomes, and the results align with existing empirical studies. We expect the proposed framework to find broader application in causal inference.
http://arxiv.org/abs/1904.12629
Continual learning aims to enable machine learning models to learn a general solution space for past and future tasks in a sequential manner. Conventional models tend to forget the knowledge of previous tasks while learning a new task, a phenomenon known as catastrophic forgetting. When using Bayesian models in continual learning, knowledge from previous tasks can be retained in two ways: (1) in posterior distributions over the parameters, which contain the knowledge gained from inference on previous tasks and then serve as the priors for the following task; (2) in coresets, which contain knowledge of the data distributions of previous tasks. Here, we show that Bayesian continual learning can be facilitated along both of these avenues through the use of natural gradients and Stein gradients, respectively.
http://arxiv.org/abs/1904.10644
Machine-translated text plays an important role in modern life by smoothing communication among communities that use different languages. However, unnatural translation may lead to misunderstanding; a detector is thus needed to avoid such mistakes. While one previous method measured the naturalness of continuous words using an N-gram language model, another matched non-continuous words across sentences but ignored such words within an individual sentence. We have developed a method that matches similar words throughout a paragraph and estimates paragraph-level coherence, and that can identify machine-translated text. An experiment on 2,000 English human-generated and 2,000 English machine-translated (from German) paragraphs shows that the coherence-based method achieves high performance (accuracy = 87.0%; equal error rate = 13.0%) and is clearly better than previous methods (best accuracy = 72.4%; equal error rate = 29.7%). Similar experiments on Dutch and Japanese obtain 89.2% and 97.9% accuracy, respectively. These results demonstrate that the proposed method is robust across languages with different resource levels.
http://arxiv.org/abs/1904.10641
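A rough sketch of paragraph-level coherence via word matching, in the spirit of the method above: link each word to its most similar word elsewhere in the paragraph using embeddings and average the similarities; a low score suggests machine translation. The `embed` lookup (word to vector, e.g. pretrained GloVe) and the decision threshold are assumptions, not the paper's exact features.

```python
import numpy as np

def coherence_score(words, embed):
    vecs = [embed(w) for w in words]
    sims = []
    for i, v in enumerate(vecs):
        # best match of word i anywhere else in the paragraph
        best = max(
            (float(np.dot(v, u) / (np.linalg.norm(v) * np.linalg.norm(u) + 1e-8))
             for j, u in enumerate(vecs) if j != i),
            default=0.0,
        )
        sims.append(best)
    return float(np.mean(sims)) if sims else 0.0

# classify as machine-translated if coherence_score(...) falls below a
# threshold tuned on held-out human vs. machine-translated paragraphs
```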