Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Fine-Grained Temporal Relation Extraction

2019-02-04

Siddharth Vashishtha, Benjamin Van Durme, Aaron Steven White

arXiv_CL

arXiv_CL Relation_Extraction Relation
Abstract

We present a novel semantic framework for modeling temporal relations and event durations that maps pairs of events to real-valued scales for the purpose of constructing document-level event timelines. We use this framework to construct the largest temporal relations dataset to date, covering the entirety of the Universal Dependencies English Web Treebank. We use this dataset to train models for jointly predicting fine-grained temporal relations and event durations. We report strong results on our data and show the efficacy of a transfer-learning approach for predicting standard, categorical TimeML relations.

URL

http://arxiv.org/abs/1902.01390

PDF

http://arxiv.org/pdf/1902.01390
Embodied Multimodal Multitask Learning

2019-02-04

Devendra Singh Chaplot, Lisa Lee, Ruslan Salakhutdinov, Devi Parikh, Dhruv Batra

arXiv_AI

arXiv_AI Object_Detection Knowledge Attention Reinforcement_Learning Detection
Abstract

Recent efforts on training visual navigation agents conditioned on language using deep reinforcement learning have been successful in learning policies for different multimodal tasks, such as semantic goal navigation and embodied question answering. In this paper, we propose a multitask model capable of jointly learning these multimodal tasks, and transferring knowledge of words and their grounding in visual objects across the tasks. The proposed model uses a novel Dual-Attention unit to disentangle the knowledge of words in the textual representations and visual concepts in the visual representations, and align them with each other. This disentangled task-invariant alignment of representations facilitates grounding and knowledge transfer across both tasks. We show that the proposed model outperforms a range of baselines on both tasks in simulated 3D environments. We also show that this disentanglement of representations makes our model modular, interpretable, and allows for transfer to instructions containing new words by leveraging object detectors.

URL

http://arxiv.org/abs/1902.01385

PDF

http://arxiv.org/pdf/1902.01385
Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

2019-02-04

Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

arXiv_CL

arXiv_CL Weakly_Supervised Face
Abstract

The vast majority of language pairs in the world are low-resource because they have little, if any, parallel data available. Unfortunately, machine translation (MT) systems do not currently work well in this setting. Besides the technical challenges of learning with limited supervision, there is also another challenge: it is very difficult to evaluate methods trained on low resource language pairs because there are very few freely and publicly available benchmarks. In this work, we take sentences from Wikipedia pages and introduce new evaluation datasets in two very low resource language pairs, Nepali-English and Sinhala-English. These are languages with very different morphology and syntax, for which little out-of-domain parallel data is available and for which relatively large amounts of monolingual data are freely available. We describe our process to collect and cross-check the quality of translations, and we report baseline performance using several learning settings: fully supervised, weakly supervised, semi-supervised, and fully unsupervised. Our experiments demonstrate that current state-of-the-art methods perform rather poorly on this benchmark, posing a challenge to the research community working on low resource MT. Data and code to reproduce our experiments are available at https://github.com/facebookresearch/flores.

URL

http://arxiv.org/abs/1902.01382

PDF

http://arxiv.org/pdf/1902.01382
Obstacle Tower: A Generalization Challenge in Vision, Control, and Planning

2019-02-04

Arthur Juliani, Ahmed Khalifa, Vincent-Pierre Berges, Jonathan Harper, Hunter Henry, Adam Crespi, Julian Togelius, Danny Lange

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning
Abstract

The rapid pace of research in Deep Reinforcement Learning has been driven by the presence of fast and challenging simulation environments. These environments often take the form of games, with tasks ranging from simple board games, to classic home console games, to modern strategy games. We propose a new benchmark called Obstacle Tower: a high visual fidelity, 3D, 3rd person, procedurally generated game environment. An agent in the Obstacle Tower must learn to solve both low-level control and high-level planning problems in tandem while learning from pixels and a sparse reward signal. Unlike other similar benchmarks such as the ALE, evaluation of agent performance in Obstacle Tower is based on an agent’s ability to perform well on unseen instances of the environment. In this paper we outline the environment and provide a set of initial baseline results produced by current state-of-the-art Deep RL methods as well as human players. In all cases these algorithms fail to produce agents capable of performing anywhere near human level on a set of evaluations designed to test both memorization and generalization ability. As such, we believe that the Obstacle Tower has the potential to serve as a helpful Deep RL benchmark now and into the future.

URL

http://arxiv.org/abs/1902.01378

PDF

http://arxiv.org/pdf/1902.01378
End-to-End Single Image Fog Removal using Enhanced Cycle Consistent Adversarial Networks

2019-02-04

Wei Liu, Xianxu Hou, Jiang Duan, Guoping Qiu

arXiv_CV

arXiv_CV Adversarial Knowledge GAN Embedding Quantitative
Abstract

Single image defogging is a classical and challenging problem in computer vision. Existing methods for this problem mainly include handcrafted prior-based methods that rely on the atmospheric degradation model and learning-based approaches that require paired fog-fogfree training images. In practice, however, prior-based methods are prone to failure due to their own limitations, and paired training data are extremely difficult to acquire. Inspired by the principle of the CycleGAN network, we have developed an end-to-end learning system that uses unpaired fog and fogfree training images, adversarial discriminators and cycle consistency losses to automatically construct a fog removal system. Similar to CycleGAN, our system has two transformation paths: one maps fog images to a fogfree image domain and the other maps fogfree images to a fog image domain. Instead of one-stage mapping, our system uses a two-stage mapping strategy in each transformation path to enhance the effectiveness of fog removal. Furthermore, we make explicit use of prior knowledge in the networks by embedding the atmospheric degradation principle and a sky prior for mapping fogfree images to the fog image domain. In addition, we also contribute the first real-world natural fog-fogfree image dataset for defogging research. Our multiple real fog images dataset (MRFID) contains images of 200 natural outdoor scenes. For each scene, there is one clear image and four corresponding foggy images of different fog densities, manually selected from a sequence of images taken by a fixed camera over the course of one year. Qualitative and quantitative comparisons against several state-of-the-art methods on both synthetic and real-world images demonstrate that our approach is effective and performs favorably for recovering a clear image from a foggy image.
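
The atmospheric degradation model mentioned in this abstract is commonly written as I(x) = J(x)·t(x) + A·(1 − t(x)), with transmission t(x) = exp(−β·d(x)). The sketch below applies that formula per pixel to synthesize fog from a clear image and a depth map. It assumes this standard formulation and is not the authors' network; the function name `add_fog` and the values of `beta` and `airlight` are illustrative.

```python
import math

def add_fog(clear, depth, beta=1.0, airlight=0.9):
    """Apply I = J*t + A*(1 - t), with t = exp(-beta * depth), per pixel.

    clear: 2D list of intensities in [0, 1] (the fog-free image J)
    depth: 2D list of scene depths with the same shape
    """
    foggy = []
    for row_j, row_d in zip(clear, depth):
        out_row = []
        for j, d in zip(row_j, row_d):
            t = math.exp(-beta * d)  # transmission decays with distance
            out_row.append(j * t + airlight * (1.0 - t))
        foggy.append(out_row)
    return foggy

clear = [[0.2, 0.8], [0.5, 0.1]]
depth = [[0.0, 1.0], [2.0, 50.0]]
foggy = add_fog(clear, depth)
```

Pixels at depth 0 are unchanged, while very distant pixels approach the airlight value, which is why sky regions in foggy images look uniformly bright.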

URL

http://arxiv.org/abs/1902.01374

PDF

http://arxiv.org/pdf/1902.01374
Insertion-based Decoding with Automatically Inferred Generation Order

2019-02-04

Jiatao Gu, Qi Liu, Kyunghyun Cho

arXiv_CL

arXiv_CL
Abstract

Conventional neural autoregressive decoding commonly assumes a left-to-right generation order. In this work, we propose a novel decoding algorithm – INDIGO – which supports flexible generation in an arbitrary order with the help of insertion operations. We use Transformer, a state-of-the-art sequence generation model, to efficiently implement the proposed approach, enabling it to be trained with either a pre-defined generation order or an adaptive order searched based on the model’s own preference. Experiments on three real-world tasks, including machine translation, word order recovery and code generation, demonstrate that our algorithm can generate sequences in an arbitrary order, while achieving competitive or even better performance compared to the conventional left-to-right generation. Case studies show that INDIGO adopts adaptive generation orders based on input information.
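
To make the idea of generating through insertion operations concrete, here is a toy sketch that builds a sentence from a list of (token, slot) insertions. It is a simplification under stated assumptions: absolute slot indices stand in for whatever position representation the model predicts, and the helper `apply_insertions` is hypothetical rather than part of INDIGO.

```python
def apply_insertions(operations):
    """Build a sequence from (token, slot) insertion operations.

    slot is the index at which the token is inserted into the partial
    sequence, so tokens can be produced in any order.
    """
    sequence = []
    for token, slot in operations:
        if not 0 <= slot <= len(sequence):
            raise ValueError(f"slot {slot} out of range for length {len(sequence)}")
        sequence.insert(slot, token)
    return sequence

# Generate "the cat sat down" starting from the verb rather than left to right.
ops = [("sat", 0), ("cat", 0), ("down", 2), ("the", 0)]
result = apply_insertions(ops)
```

A left-to-right decoder is the special case in which every slot equals the current sequence length.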

URL

http://arxiv.org/abs/1902.01370

PDF

http://arxiv.org/pdf/1902.01370
Evaluation of Multidisciplinary Effects of Artificial Intelligence with Optimization Perspective

2019-02-04

M. H. Calp

arXiv_AI

arXiv_AI Optimization
Abstract

Artificial Intelligence has an important place in the scientific community as a result of its successful outputs in different fields. Over time, the field of Artificial Intelligence has been divided into many sub-fields because of the increasing number of different solution approaches, methods, and techniques. Machine Learning has the most remarkable role, with its ability to learn from samples from the environment. On the other hand, intelligent optimization inspired by nature and swarms has developed its own unique scientific literature, with effective solutions provided for optimization problems from different fields. Because intelligent optimization can be applied effectively in different fields, this study aims to provide a general discussion of the multidisciplinary effects of Artificial Intelligence by considering its optimization-oriented solutions. The study briefly covers the background of intelligent optimization and then gives application examples of intelligent optimization from a multidisciplinary perspective.

URL

http://arxiv.org/abs/1902.01362

PDF

http://arxiv.org/pdf/1902.01362
Intelligent Traffic Signal Control: Using Reinforcement Learning with Partial Detection

2019-02-04

Rusheng Zhang, Akihiro Ishikawa, Wenli Wang, Benjamin Striner, Ozan Tonguz

arXiv_AI

arXiv_AI Object_Detection Attention Reinforcement_Learning Detection
Abstract

Intelligent Transportation Systems (ITS) have attracted the attention of researchers and the general public alike as a means to alleviate traffic congestion. Recently, the maturity of wireless technology has enabled a cost-efficient way to achieve ITS by detecting vehicles using Vehicle to Infrastructure (V2I) communications. Traditional ITS algorithms, in most cases, assume that every vehicle is observed, such as by a camera or a loop detector, but a V2I implementation would detect only those vehicles with wireless communications capability. We examine a family of transportation systems, which we will refer to as ‘Partially Detected Intelligent Transportation Systems’. An algorithm that can act well under a small detection rate is highly desirable due to gradual penetration rates of the underlying wireless technologies such as Dedicated Short Range Communications (DSRC) technology. Artificial Intelligence (AI) techniques for Reinforcement Learning (RL) are suitable tools for finding such an algorithm because they can use varied inputs and do not require explicit analytic understanding or modeling of the underlying system dynamics. In this paper, we report an RL algorithm for partially observable ITS based on DSRC. The performance of this system is studied under different car flows, detection rates, and topologies of the road network. Our system is able to efficiently reduce the average waiting time of vehicles at an intersection, even with a low detection rate.

URL

http://arxiv.org/abs/1807.01628

PDF

http://arxiv.org/pdf/1807.01628
Solving The Exam Scheduling Problems in Central Exams With Genetic Algorithms

2019-02-04

Murat Dener, M. Hanefi Calp

arXiv_AI

arXiv_AI GAN
Abstract

An exam scheduling application is expected to use resources efficiently. There are various criteria for using resources efficiently and for carrying out all tests at minimum cost in the shortest possible time. The aim is for educational institutions to carry out central examination organizations successfully under such criteria. In the study, a two-stage genetic algorithm was developed. In the first stage, courses were assigned to sessions. In the second stage, the students who participated in the test session were assigned to examination rooms. The goals of the study are to increase the number of joint students participating in sessions, to use the minimum number of buildings in the same session, and to reduce the number of supervisors by using the minimum number of classrooms possible. In this study, a general-purpose exam scheduling solution for educational institutions was presented. The developed system can be used in different central examinations to create originality. Given the results of the sample application, it is seen that the proposed genetic algorithm gives successful results.

URL

http://arxiv.org/abs/1902.01360

PDF

http://arxiv.org/pdf/1902.01360
An Argument-Marker Model for Syntax-Agnostic Proto-Role Labeling

2019-02-04

Juri Opitz, Anette Frank

arXiv_CL

arXiv_CL Attention Prediction
Abstract

Semantic proto-role labeling (SPRL) is an alternative to semantic role labeling (SRL) that moves beyond a categorical definition of roles, following Dowty’s feature-based view of proto-roles. This theory determines agenthood vs. patienthood based on a participant’s instantiation of more or less typical agent vs. patient properties, such as volitionality in an event. To perform SPRL, we develop an ensemble of hierarchical models with self-attention and concurrently learned predicate-argument markers. Our method is competitive with the state-of-the-art, overall outperforming previous work in two different formulations of the task (multi-label and Likert scale prediction). In contrast to previous work, our results do not depend on supplementary gold syntax.

URL

http://arxiv.org/abs/1902.01349

PDF

http://arxiv.org/pdf/1902.01349
Partial Fingerprint Detection Using Core Point Location

2019-02-04

Wajih Ullah Baig, Adeel Ejaz, Umar Munir, Kashif Sardar

arXiv_CV

arXiv_CV Detection
Abstract

In biometric identification, fingerprint-based identification has been a widely accepted mechanism. Automated fingerprint identification/verification techniques are widely adopted in many civilian and forensic applications. In forensic applications, fingerprints are usually incomplete, broken, unclear or degraded; these are known as partial fingerprints. Fingerprint identification/verification largely suffers from the problem of handling partial fingerprints. In this paper, a novel and simple approach is presented for detecting partial fingerprints using core point location. Our technique is particularly useful during the acquisition stage to determine whether a user needs to re-align the finger to ensure a complete capture of the fingerprint area. This technique is tested on FVC-2002 DB1A. The results, which are presented in the Results section, are very accurate.

URL

http://arxiv.org/abs/1902.01400

PDF

http://arxiv.org/pdf/1902.01400
Towards an Interactive and Interpretable CAD System to Support Proximal Femur Fracture Classification

2019-02-04

Amelia Jiménez-Sánchez, Anees Kazi, Shadi Albarqouni, Chlodwig Kirchhoff, Peter Biberthaler, Nassir Navab, Diana Mateus, Sonja Kirchhoff

arXiv_CV

arXiv_CV Face Classification Deep_Learning
Abstract

Fractures of the proximal femur represent a critical entity in the western world, particularly with the growing elderly population. Such fractures result in high morbidity and mortality, reflecting a significant health and economic impact on our society. Different treatment strategies are recommended for different fracture types, with surgical treatment still being the gold standard in most cases. The success of the treatment and prognosis after surgery strongly depends on an accurate classification of the fracture among standard types, such as those defined by the AO system. However, the classification of fracture types based on x-ray images is difficult, as confirmed by low intra- and inter-expert agreement rates in our in-house study and in the previous literature. The presented work proposes a fully automatic computer-aided diagnosis (CAD) tool, based on current deep learning techniques, able to identify, localize and finally classify proximal femur fractures on x-ray images according to the AO classification. Results of our experimental evaluation show that the performance achieved by the proposed CAD tool is comparable to the average expert for the classification of x-ray images into types “A”, “B” and “normal” (precision of 89%), while the performance is even superior when classifying fractures versus “normal” cases (precision of 94%). In addition, the integration of the proposed CAD tool into daily clinical routine is extensively discussed, towards improving the interface between humans and AI-powered machines in supporting medical decisions.

URL

http://arxiv.org/abs/1902.01338

PDF

http://arxiv.org/pdf/1902.01338
Learning to segment with image-level supervision

2019-02-04

Gaurav Pandey, Ambedkar Dukkipati

arXiv_CV

arXiv_CV Salient Segmentation Weakly_Supervised CNN
Abstract

Deep convolutional networks have achieved the state-of-the-art for semantic image segmentation tasks. However, training these networks requires access to densely labeled images, which are known to be very expensive to obtain. On the other hand, the web provides an almost unlimited source of images annotated at the image level. How can one utilize this much larger weakly annotated set for tasks that require dense labeling? Prior work often relied on localization cues, such as saliency maps, objectness priors, bounding boxes etc., to address this challenging problem. In this paper, we propose a model that generates auxiliary labels for each image, while simultaneously forcing the output of the CNN to satisfy the mean-field constraints imposed by a conditional random field. We show that one can enforce the CRF constraints by forcing the distribution at each pixel to be close to the distribution of its neighbors. This is in stark contrast with methods that compute a recursive expansion of the mean-field distribution using a recurrent architecture and train the resultant distribution. Instead, the proposed model adds an extra loss term to the output of the CNN, and hence, is faster than recursive implementations. We achieve the state-of-the-art for weakly supervised semantic image segmentation on VOC 2012 dataset, assuming no manually labeled pixel level information is available. Furthermore, the incorporation of conditional random fields in CNN incurs little extra time during training.
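
The idea of forcing the distribution at each pixel to be close to the distribution of its neighbors can be sketched as an extra loss term. The version below averages the KL divergence between every pixel and its 4-connected neighbors. This is an assumption-laden illustration: the paper's actual loss may weight neighbors differently, and the names `kl` and `neighbor_consistency_loss` are hypothetical.

```python
import math

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def neighbor_consistency_loss(probs):
    """Mean KL divergence between each pixel and its 4-connected neighbors.

    probs: H x W grid where probs[y][x] is a list of class probabilities.
    """
    height, width = len(probs), len(probs[0])
    total, pairs = 0.0, 0
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    total += kl(probs[y][x], probs[ny][nx])
                    pairs += 1
    return total / pairs if pairs else 0.0

# A spatially smooth prediction and one with a single disagreeing pixel.
smooth = [[[0.9, 0.1], [0.9, 0.1]], [[0.9, 0.1], [0.9, 0.1]]]
noisy = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.9, 0.1]]]
```

The smooth grid incurs zero loss while the noisy grid is penalized, so adding such a term to the segmentation loss pushes the network toward spatially coherent outputs without a recurrent mean-field unrolling.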

URL

http://arxiv.org/abs/1705.01262

PDF

http://arxiv.org/pdf/1705.01262
When Exceptions are the Norm: Exploring the Role of Consent in HRI

2019-02-04

Vasanth Sarathy, Thomas Arnold, Matthias Scheutz

arXiv_RO

arXiv_RO Attention Face
Abstract

HRI researchers have made major strides in developing robotic architectures that are capable of reading a limited set of social cues and producing behaviors that enhance their likeability and feeling of comfort amongst humans. However, the cues in these models are fairly direct and the interactions largely dyadic. To capture the normative qualities of interaction more robustly, we propose consent as a distinct, critical area for HRI research. Convening important insights in existing HRI work around topics like touch, proxemics, gaze, and moral norms, the notion of consent reveals key expectations that can shape how a robot acts in social space. By sorting various kinds of consent through social and legal doctrine, we delineate empirical and technical questions to meet consent challenges faced in major application domains and robotic roles. Attention to consent could show, for example, how extraordinary, norm-violating actions can be justified by agents and accepted by those around them. We argue that operationalizing ideas from legal scholarship can better guide how robotic systems might cultivate and sustain proper forms of consent.

URL

http://arxiv.org/abs/1902.01320

PDF

http://arxiv.org/pdf/1902.01320
Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

2019-02-04

Andrea Galassi, Marco Lippi, Paolo Torroni

arXiv_AI

arXiv_AI Review Attention
Abstract

Attention is an increasingly popular mechanism used in a wide range of neural architectures. Because of the fast-paced advances in this domain, a systematic overview of attention is still missing. In this article, we define a unified model for attention architectures for natural language processing, with a focus on architectures designed to work with vector representation of the textual data. We discuss the dimensions along which proposals differ, the possible uses of attention, and chart the major research activities and open challenges in the area.

URL

http://arxiv.org/abs/1902.02181

PDF

http://arxiv.org/pdf/1902.02181
'Squeeze & Excite' Guided Few-Shot Segmentation of Volumetric Images

2019-02-04

Abhijit Guha Roy, Shayan Siddiqui, Sebastian Pölsterl, Nassir Navab, Christian Wachinger

arXiv_CV

arXiv_CV Segmentation GAN
Abstract

Deep neural networks enable highly accurate image segmentation, but require large amounts of manually annotated data for supervised training. Few-shot learning aims to address this shortcoming by learning a new class from a few annotated support examples. We introduce, for the first time, a novel few-shot framework for the segmentation of volumetric medical images with only a few annotated slices. Compared to other related works in computer vision, the major challenges are the absence of pre-trained networks and the volumetric nature of medical scans. We address these challenges by proposing a new architecture for few-shot segmentation that incorporates ‘squeeze & excite’ blocks. Our two-armed architecture consists of a conditioner arm, which processes the annotated support input and generates a task representation which encodes the relevant information for segmenting a new class. This representation is passed on to the segmenter arm that uses this information to segment the new query image. To facilitate efficient interaction between the conditioner and the segmenter arm, we propose to use ‘channel squeeze & spatial excitation’ blocks: a light-weight computational module that enables heavy interaction between both arms with negligible increase in model complexity. This contribution allows us to perform image segmentation without relying on a pre-trained model, which generally is unavailable for medical scans. Furthermore, we propose an efficient strategy for volumetric segmentation by optimally pairing a few slices of the support volume to all the slices of the query volume. We perform the experiments for organ segmentation on whole-body contrast-enhanced CT scans from the Visceral Dataset. Our proposed model outperforms multiple baselines and existing approaches with respect to the segmentation accuracy by a significant margin.

URL

http://arxiv.org/abs/1902.01314

PDF

http://arxiv.org/pdf/1902.01314
An Effective Approach to Unsupervised Machine Translation

2019-02-04

Mikel Artetxe, Gorka Labaka, Eneko Agirre

arXiv_AI

arXiv_AI NMT
Abstract

While machine translation has traditionally relied on large amounts of parallel corpora, a recent research line has managed to train both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) systems using monolingual corpora only. In this paper, we identify and address several deficiencies of existing unsupervised SMT approaches by exploiting subword information, developing a theoretically well founded unsupervised tuning method, and incorporating a joint refinement procedure. Moreover, we use our improved SMT system to initialize a dual NMT model, which is further fine-tuned through on-the-fly back-translation. Together, we obtain large improvements over the previous state-of-the-art in unsupervised machine translation. For instance, we get 22.5 BLEU points in English-to-German WMT 2014, 5.5 points more than the previous best unsupervised system, and 0.5 points more than the (supervised) shared task winner back in 2014.

URL

http://arxiv.org/abs/1902.01313

PDF

http://arxiv.org/pdf/1902.01313
Real-time Prediction of Automotive Collision Risk from Monocular Video

2019-02-04

Derek J. Phillips, Juan Carlos Aragon, Anjali Roychowdhury, Regina Madigan, Sunil Chintakindi, Mykel J. Kochenderfer

arXiv_CV

arXiv_CV Object_Detection Tracking Object_Tracking Prediction Detection
Abstract

Many automotive applications, such as Advanced Driver Assistance Systems (ADAS) for collision avoidance and warnings, require estimating the future automotive risk of a driving scene. We present a low-cost system that predicts the collision risk over an intermediate time horizon from a monocular video source, such as a dashboard-mounted camera. The modular system includes components for object detection, object tracking, and state estimation. We introduce solutions to the object tracking and distance estimation problems. Advanced approaches to the other tasks are used to produce real-time predictions of the automotive risk for the next 10 s at over 5 Hz. The system is designed such that alternative components can be substituted with minimal effort. It is demonstrated on common physical hardware, specifically an off-the-shelf gaming laptop and a webcam. We extend the framework to support absolute speed estimation and more advanced risk estimation techniques.

URL

http://arxiv.org/abs/1902.01293

PDF

http://arxiv.org/pdf/1902.01293
Real-Time Steganalysis for Stream Media Based on Multi-channel Convolutional Sliding Windows

2019-02-04

Zhongliang Yang, Hao Yang, Yuting Hu, Yongfeng Huang, Yu-Jin Zhang

arXiv_AI

arXiv_AI GAN Face Embedding CNN Detection Relation
Abstract

Previous VoIP steganalysis methods face great challenges in detecting speech signals at low embedding rates, and they generally struggle to perform real-time detection, making it hard for them to truly maintain cyberspace security. To solve these two challenges, in this paper we combine the sliding window detection algorithm with a Convolutional Neural Network and propose a real-time VoIP steganalysis method based on multi-channel convolutional sliding windows. In order to analyze the correlations between frames and different neighborhood frames in a VoIP signal, we define multi-channel sliding detection windows. Within each sliding window, we design two feature extraction channels, each containing multiple convolution layers with multiple convolution kernels per layer, to extract correlation features of the input signal. Then, based on these extracted features, we use a forward fully connected network for feature fusion. Finally, by analyzing the statistical distribution of these features, the discriminator determines whether the input speech signal contains covert information. We designed several experiments to test the proposed model’s detection ability under various conditions, including different embedding rates, different speech lengths, etc. Experimental results showed that the proposed model outperforms all the previous methods, especially at low embedding rates, where it showed state-of-the-art performance. In addition, we also tested the detection efficiency of the proposed model, and the results showed that it can achieve almost real-time detection of VoIP speech signals.

URL

http://arxiv.org/abs/1902.01286

PDF

http://arxiv.org/pdf/1902.01286
Implicit 3D Orientation Learning for 6D Object Detection from RGB Images

2019-02-04

Martin Sundermeyer, Zoltan-Csaba Marton, Maximilian Durner, Manuel Brucker, Rudolph Triebel

arXiv_CV

arXiv_CV Object_Detection Pose_Estimation Detection
Abstract

We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Experiments on the T-LESS and LineMOD datasets show that our method outperforms similar model-based approaches and competes with state-of-the-art approaches that require real pose-annotated images.

URL

http://arxiv.org/abs/1902.01275

PDF

http://arxiv.org/pdf/1902.01275
Dynamic Planning Networks

2019-02-04

Norman Tasfi, Miriam Capretz

arXiv_CV

arXiv_CV Reinforcement_Learning
Abstract

We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning, that combines model-based and model-free aspects for online planning. Our architecture learns to dynamically construct plans using a learned state-transition model by selecting and traversing between simulated states and actions to maximize information before acting. In contrast to model-free methods, model-based planning lets the agent efficiently test action hypotheses without performing costly trial-and-error in the environment. DPN learns to efficiently form plans by expanding a single action-conditional state transition at a time instead of exhaustively evaluating each action, reducing the required number of state-transitions during planning by up to 96%. We observe various emergent planning patterns used to solve environments, including classical search methods such as breadth-first and depth-first search. DPN shows improved data efficiency, performance, and generalization to new and unseen domains in comparison to several baselines.

URL

https://arxiv.org/abs/1812.11240

PDF

https://arxiv.org/pdf/1812.11240
PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos

2019-02-04

Paavo Parmas, Carl Edward Rasmussen, Jan Peters, Kenji Doya

arXiv_AI

arXiv_AI Reinforcement_Learning Optimization Inference Deep_Learning
Abstract

Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, the direction of the gradients becomes essentially random. We show that reparameterization gradients suffer from the problem, while likelihood ratio gradients are robust. Using our insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible, and allows for almost arbitrary models and policies, while simultaneously matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by $10^6$ times.
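
The two gradient estimators contrasted in this abstract can be compared on a one-dimensional toy problem: estimating d/dμ E[f(x)] for x ~ N(μ, σ²) with f(x) = x², whose exact value is 2μ. The sketch below is only an illustration of the two estimator families. The function name `gradient_estimates` and the test function are assumptions, and this smooth, single-step example does not reproduce the chaotic long-chain regime the paper studies (here the reparameterization estimator is the lower-variance one).

```python
import random

def gradient_estimates(mu, sigma, n_samples, seed=0):
    """Estimate d/dmu E[f(x)] for x ~ N(mu, sigma^2) with f(x) = x**2.

    Returns (reparameterization estimate, likelihood ratio estimate).
    The exact answer is 2 * mu.
    """
    rng = random.Random(seed)
    reparam_sum, lr_sum = 0.0, 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps
        # Pathwise (reparameterization): differentiate f(mu + sigma*eps) w.r.t. mu.
        reparam_sum += 2.0 * x
        # Likelihood ratio (score function): f(x) * d/dmu log N(x; mu, sigma^2).
        lr_sum += (x ** 2) * (x - mu) / sigma ** 2
    return reparam_sum / n_samples, lr_sum / n_samples

reparam, lr = gradient_estimates(mu=1.0, sigma=0.5, n_samples=100_000)
```

Both estimators are unbiased for the same quantity. They differ in variance, and the likelihood ratio form never differentiates through f, which is the property that keeps it usable when gradients through a long nonlinear chain become erratic.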

URL

http://arxiv.org/abs/1902.01240

PDF

http://arxiv.org/pdf/1902.01240

Solving Nurse Scheduling Problem Using Constraint Programming Technique

2019-02-04

O.M. Alade, A.O. Amusat

arXiv_AI

arXiv_AI GAN
Abstract

Staff scheduling is a universal problem encountered in many organizations, such as call centers, educational institutions, industry, hospitals, and other public services. It is one of the most important aspects of workforce management strategy and the one most prone to errors, as many factors must be considered, such as staff turnover, employee availability, time between rotations, unusual periods of activity, and even last-minute shift changes. The nurse scheduling problem is a variant of the staff scheduling problem that assigns nurses to shifts as well as rooms per day, taking both hard constraints (i.e., hospital requirements) and soft constraints (i.e., nurse preferences) into account. Most algorithms used for scheduling problems fall short when it comes to the number of inputs they can handle. In this paper, a constraint programming model was developed to solve the nurse scheduling problem. The model was then implemented in the Python programming language.
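
To make the hard/soft constraint distinction concrete, here is a minimal sketch that solves a tiny instance by plain backtracking search. It is not the paper's model: the nurses, shifts, rules, and preferences below are hypothetical, and a real implementation would more likely use a dedicated constraint solver.

```python
from itertools import product

# Hypothetical toy instance (not taken from the paper)
NURSES = ["A", "B", "C"]
DAYS = [0, 1]
SHIFTS = ["day", "night"]
# Soft constraints: (nurse, day, shift) slots a nurse would rather avoid
AVOID = {("A", 0, "night"), ("B", 1, "day")}

# Slots in chronological order; each slot needs exactly one nurse
SLOTS = list(product(DAYS, SHIFTS))

def violates_hard(assignment, slot, nurse):
    # Hard constraints, checked against slots that are already assigned
    day, shift = slot
    for (d, s), n in assignment.items():
        if n != nurse:
            continue
        if d == day:
            return True  # at most one shift per nurse per day
        if d == day - 1 and s == "night" and shift == "day":
            return True  # no day shift directly after a night shift
    return False

def penalty(assignment):
    # Number of violated preferences (soft constraints)
    return sum((n, d, s) in AVOID for (d, s), n in assignment.items())

def solve(i=0, assignment=None, best=None):
    # Depth-first search over slots, keeping the lowest-penalty feasible plan
    assignment = {} if assignment is None else assignment
    best = {"cost": None, "plan": None} if best is None else best
    if i == len(SLOTS):
        cost = penalty(assignment)
        if best["cost"] is None or cost < best["cost"]:
            best["cost"], best["plan"] = cost, dict(assignment)
        return best
    slot = SLOTS[i]
    for nurse in NURSES:
        if not violates_hard(assignment, slot, nurse):
            assignment[slot] = nurse
            solve(i + 1, assignment, best)
            del assignment[slot]
    return best

result = solve()
print(result["cost"], result["plan"])
```

Hard constraints prune branches of the search outright, while soft constraints only add to a penalty that the search minimizes. Exhaustive backtracking like this does not scale beyond toy sizes.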

URL

http://arxiv.org/abs/1902.01193

PDF

http://arxiv.org/pdf/1902.01193

Constructing the Matrix Multilayer Perceptron and its Application to the VAE

2019-02-04

Jalil Taghia, Maria Bånkestad, Fredrik Lindsten, Thomas B. Schön

arXiv_AI

arXiv_AI Prediction Recognition
Abstract

Like most learning algorithms, the multilayer perceptron (MLP) is designed to learn a vector of parameters from data. However, in certain scenarios we are interested in learning structured parameters (predictions) in the form of symmetric positive definite matrices. Here, we introduce a variant of the MLP, referred to as the matrix MLP, that is specialized for learning symmetric positive definite matrices. We also present an application of the model within the context of the variational autoencoder (VAE). Our formulation of the VAE extends the vanilla formulation to cases where the recognition and the generative networks can be from the parametric family of distributions with dense covariance matrices. Two specific examples are discussed in more detail: the dense covariance Gaussian and its generalization, the power exponential distribution. Our new developments are illustrated using both synthetic and real data.

URL

http://arxiv.org/abs/1902.01182

PDF

http://arxiv.org/pdf/1902.01182

Unsupervised Clinical Language Translation

2019-02-04

Wei-Hung Weng, Yu-An Chung, Peter Szolovits

arXiv_CL

arXiv_CL Represenation_Learning
Abstract

As patients’ access to their doctors’ clinical notes becomes common, translating professional, clinical jargon to layperson-understandable language is essential to improve patient-clinician communication. Such translation yields better clinical outcomes by enhancing patients’ understanding of their own health conditions, and thus improving patients’ involvement in their own care. Existing research has used dictionary-based word replacement or definition insertion to approach the need. However, these methods are limited by expert curation, which is hard to scale and has trouble generalizing to unseen datasets that do not share an overlapping vocabulary. In contrast, we approach the clinical word and sentence translation problem in a completely unsupervised manner. We show that a framework using representation learning, bilingual dictionary induction and statistical machine translation yields the best precision at 10 of 0.827 on professional-to-consumer word translation, and mean opinion scores of 4.10 and 4.28 out of 5 for clinical correctness and layperson readability, respectively, on sentence translation. Our fully-unsupervised strategy overcomes the curation problem, and the clinically meaningful evaluation reduces biases from inappropriate evaluators, which are critical in clinical machine learning.
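
The "precision at 10" figure can be read as follows: for each professional term, check whether a correct consumer translation appears among the top 10 ranked candidates, then average over terms. The sketch below implements that common definition from bilingual dictionary induction; the paper's exact evaluation protocol may differ, and the function name and example terms are made up for illustration.

```python
def precision_at_k(ranked_candidates, gold, k=10):
    """Fraction of source terms with at least one gold translation in the top k."""
    if not ranked_candidates:
        return 0.0
    hits = 0
    for term, candidates in ranked_candidates.items():
        # A hit if any of the first k candidates is a gold translation
        if set(candidates[:k]) & gold.get(term, set()):
            hits += 1
    return hits / len(ranked_candidates)

# Hypothetical ranked outputs and gold dictionary
ranked = {
    "myocardial infarction": ["heart attack", "stroke"],
    "hypertension": ["diabetes", "high blood pressure"],
    "dyspnea": ["cough", "fever"],
}
gold = {
    "myocardial infarction": {"heart attack"},
    "hypertension": {"high blood pressure"},
    "dyspnea": {"shortness of breath"},
}

p_at_1 = precision_at_k(ranked, gold, k=1)
p_at_2 = precision_at_k(ranked, gold, k=2)
print(p_at_1, p_at_2)
```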

URL

http://arxiv.org/abs/1902.01177

PDF

http://arxiv.org/pdf/1902.01177

A Unified Framework for Marketing Budget Allocation

2019-02-04

Kui Zhao, Junhao Hua, Ling Yan, Qi Zhang, Huan Xu, Cheng Yang

arXiv_AI

arXiv_AI Optimization Relation
Abstract

While marketing budget allocation has been studied for decades in traditional business, online business now brings many more challenges due to its dynamic environment and complex decision-making process. In this paper, we present a novel unified framework for marketing budget allocation. By leveraging abundant data, the proposed data-driven approach can help us overcome these challenges and make more informed decisions. In our approach, a semi-black-box model is built to forecast the dynamic market response, and an efficient optimization method is proposed to solve the complex allocation task. First, the response in each market segment is forecast by exploring historical data through a semi-black-box model, where the capability of the logit demand curve is enhanced by neural networks. The response model reveals the relationship between sales and marketing cost. Based on the learned model, budget allocation is then formulated as an optimization problem, and we design efficient algorithms to solve it in both continuous and discrete settings. Several kinds of business constraints are supported in one unified optimization paradigm, including a cost upper bound, a profit lower bound, or an ROI lower bound. The proposed framework is easy to implement and can readily handle large-scale problems. It has been successfully applied to many scenarios in Alibaba Group. The results of both offline experiments and online A/B testing demonstrate its effectiveness.

URL

http://arxiv.org/abs/1902.01128

PDF

http://arxiv.org/pdf/1902.01128

Realistic Image Generation using Region-phrase Attention

2019-02-04

Wanming Huang, Yida Xu, Ian Oppermann

arXiv_CV

arXiv_CV Adversarial Attention GAN
Abstract

The Generative Adversarial Network (GAN) has recently been applied to generate synthetic images from text. Despite significant advances, most current state-of-the-art algorithms are regular-grid region based; when attention is used, it is mainly applied between individual regular-grid regions and a word. These approaches are sufficient to generate images that contain a single object in the foreground, such as a “bird” or “flower”. However, natural language descriptions often involve complex foreground objects, and the background may also constitute a variable portion of the generated image. Therefore, the regular-grid based image attention weights may not necessarily concentrate on the intended foreground region(s), which in turn results in an unnatural-looking image. Additionally, individual words such as “a”, “blue” and “shirt” do not necessarily provide a full visual context unless they are applied together. For this reason, we propose a novel method that introduces an additional set of attentions between true-grid regions and word phrases. The true-grid region is derived using a set of auxiliary bounding boxes. These auxiliary bounding boxes serve as superior location indicators of where the alignment and attention should be drawn with the word phrases. Word phrases are derived from analysing Part-of-Speech (POS) results. We perform experiments on this novel network architecture using the Microsoft Common Objects in Context (MSCOCO) dataset, and the model generates $256 \times 256$ images conditioned on a short sentence description. Our proposed approach is capable of generating more realistic images compared with the current state-of-the-art algorithms.

URL

http://arxiv.org/abs/1902.05395

PDF

http://arxiv.org/pdf/1902.05395

Object Detection and 3D Estimation via an FMCW Radar Using a Fully Convolutional Network

2019-02-04

Guoqiang Zhang, Haopeng Li, Fabian Wenger

arXiv_CV

arXiv_CV Object_Detection CNN Deep_Learning Detection
Abstract

This paper considers object detection and 3D estimation using an FMCW radar. A state-of-the-art deep learning framework is employed instead of traditional signal processing. In preparing the radar training data, the ground truth of an object's orientation in 3D space is provided by image analysis, where the images are obtained through a camera coupled to the radar device. To ensure successful training of a fully convolutional network (FCN), we propose a normalization method, which we find essential to apply to the radar signal before feeding it into the neural network. After proper training, the system first detects the presence of an object in an environment. If an object is present, the system then produces an estimate of its 3D position. Experimental results show that the proposed system can be successfully trained and employed for detecting a car and further estimating its 3D position in a noisy environment.

URL

http://arxiv.org/abs/1902.05394

PDF

http://arxiv.org/pdf/1902.05394

The Natural Language of Actions

2019-02-04

Guy Tennenholtz, Shie Mannor

arXiv_AI

arXiv_AI Knowledge Reinforcement_Learning Embedding Relation
Abstract

We introduce Act2Vec, a general framework for learning context-based action representations for Reinforcement Learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these for augmenting state representations as well as improving function approximation of Q-values. We visualize and test action embeddings in three domains including a drawing task, a high dimensional navigation task, and the large action space domain of StarCraft II.

URL

http://arxiv.org/abs/1902.01119

PDF

http://arxiv.org/pdf/1902.01119

Exploring Temporal Dependencies in Multimodal Referring Expressions with Mixed Reality

2019-02-04

Elena Sibirtseva, Ali Ghadirzadeh, Iolanda Leite, Mårten Björkman, Danica Kragic

arXiv_RO

arXiv_RO
Abstract

In collaborative tasks, people rely both on verbal and non-verbal cues simultaneously to communicate with each other. For human-robot interaction to run smoothly and naturally, a robot should be equipped with the ability to robustly disambiguate referring expressions. In this work, we propose a model that can disambiguate multimodal fetching requests using modalities such as head movements, hand gestures, and speech. We analysed the acquired data from mixed reality experiments and formulated a hypothesis that modelling temporal dependencies of events in these three modalities increases the model’s predictive power. We evaluated our model on a Bayesian framework to interpret referring expressions with and without exploiting a temporal prior.

URL

http://arxiv.org/abs/1902.01117

PDF

http://arxiv.org/pdf/1902.01117

Dual Path Multi-Scale Fusion Networks with Attention for Crowd Counting

2019-02-04

Liang Zhu, Zhijian Zhao, Chao Lu, Yining Lin, Yao Peng, Tangren Yao

arXiv_CV

arXiv_CV Attention CNN
Abstract

The task of crowd counting in varying density scenes is extremely challenging due to large scale variations. In this paper, we propose a novel dual path multi-scale fusion network architecture with an attention mechanism, named SFANet, that can perform accurate count estimation as well as produce high-resolution density maps for highly congested crowd scenes. The proposed SFANet contains two main components: a VGG backbone convolutional neural network (CNN) as the front-end feature map extractor and dual path multi-scale fusion networks as the back-end to generate the density map. These dual path multi-scale fusion networks have the same structure; one path is responsible for generating an attention map by highlighting crowd regions in images, while the other path is responsible for fusing multi-scale features as well as the attention map to generate the final high-quality, high-resolution density maps. SFANet can be easily trained in an end-to-end way by dual path joint training. We have evaluated our method on four crowd counting datasets (ShanghaiTech, UCF_CC_50, UCSD and UCF-QNRF). The results demonstrate that with the attention mechanism and multi-scale feature fusion, the proposed SFANet achieves the best performance on all these datasets and generates better quality density maps compared with other state-of-the-art approaches.
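
As a rough illustration of how an attention map and a density map can interact, the sketch below gates a raw density map with an attention map and sums the result to obtain a count. The abstract does not say how SFANet combines the two paths, so the element-wise product used here is an assumption, and all values and function names are hypothetical. The only standard fact relied on is that a density map sums to the estimated number of people.

```python
def apply_attention(density, attention):
    # Element-wise product: suppress density predicted outside crowd regions
    # (assumed fusion rule, not confirmed by the abstract)
    return [[d * a for d, a in zip(d_row, a_row)]
            for d_row, a_row in zip(density, attention)]

def crowd_count(density_map):
    # A density map integrates (sums) to the estimated number of people
    return sum(sum(row) for row in density_map)

# Hypothetical 3x3 maps; the right column and bottom row are background
raw_density = [[0.5, 0.5, 0.2],
               [0.25, 0.75, 0.1],
               [0.0, 0.0, 0.3]]
attention = [[1.0, 1.0, 0.0],
             [1.0, 1.0, 0.0],
             [0.0, 0.0, 0.0]]

refined = apply_attention(raw_density, attention)
count = crowd_count(refined)
print(count)
```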

URL

http://arxiv.org/abs/1902.01115

PDF

http://arxiv.org/pdf/1902.01115

Strategies for Structuring Story Generation

2019-02-04

Angela Fan, Mike Lewis, Yann Dauphin

arXiv_CL

arXiv_CL Face Text_Generation Language_Model
Abstract

Writers generally rely on plans or sketches to write long stories, but most current language models generate word by word from left to right. We explore coarse-to-fine models for creating narrative texts of several hundred words, and introduce new models which decompose stories by abstracting over actions and entities. The model first generates the predicate-argument structure of the text, where different mentions of the same entity are marked with placeholder tokens. It then generates a surface realization of the predicate-argument structure, and finally replaces the entity placeholders with context-sensitive names and references. Human judges prefer the stories from our models to a wide range of previous approaches to hierarchical text generation. Extensive analysis shows that our methods can help improve the diversity and coherence of events and entities in generated stories.

URL

http://arxiv.org/abs/1902.01109

PDF

http://arxiv.org/pdf/1902.01109

Compatible and Diverse Fashion Image Inpainting

2019-02-04

Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis

arXiv_CV

arXiv_CV Quantitative
Abstract

Visual compatibility is critical for fashion analysis, yet is missing in existing fashion image synthesis systems. In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. To this end, we present Fashion Inpainting Networks (FiNet), a two-stage image-to-image generation framework that is able to perform compatible and diverse inpainting. Disentangling the generation of shape and appearance to ensure photorealistic results, our framework consists of a shape generation network and an appearance generation network. More importantly, for each generation network, we introduce two encoders interacting with one another to learn latent codes in a shared compatibility space. The latent representations are jointly optimized with the corresponding generation network to condition the synthesis process, encouraging a diverse set of generated results that are visually compatible with existing fashion garments. In addition, our framework is readily extended to clothing reconstruction and fashion transfer, with impressive results. Extensive experiments and comparisons with state-of-the-art approaches on the fashion synthesis task demonstrate the effectiveness of our method both quantitatively and qualitatively.

URL

http://arxiv.org/abs/1902.01096

PDF

http://arxiv.org/pdf/1902.01096

Predictive Uncertainty Quantification with Compound Density Networks

2019-02-04

Agustinus Kristiadi, Asja Fischer

arXiv_AI

arXiv_AI Adversarial Inference Prediction
Abstract

Despite the huge success of deep neural networks (NNs), finding good mechanisms for quantifying their prediction uncertainty is still an open problem. Bayesian neural networks are one of the most popular approaches to uncertainty quantification. On the other hand, it was recently shown that ensembles of NNs, which belong to the class of mixture models, can be used to quantify prediction uncertainty. In this paper, we build upon these two approaches. First, we increase the mixture model’s flexibility by replacing the fixed mixing weights by an adaptive, input-dependent distribution (specifying the probability of each component) represented by NNs, and by considering uncountably many mixture components. The resulting class of models can be seen as the continuous counterpart to mixture density networks and is therefore referred to as compound density networks (CDNs). We employ both maximum likelihood and variational Bayesian inference to train CDNs, and empirically show that they yield better uncertainty estimates on out-of-distribution data and are more robust to adversarial examples than the previous approaches.

URL

http://arxiv.org/abs/1902.01080

PDF

http://arxiv.org/pdf/1902.01080

Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions

2019-02-04

Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Remco Veltkamp, Ronald Poppe

arXiv_CV

arXiv_CV Salient CNN Video_Classification Classification Deep_Learning Recognition
Abstract

Deep learning approaches have been established as the main methodology for video classification and recognition. Recently, 3-dimensional convolutions have been used to achieve state-of-the-art performance on many challenging video datasets. Because of the high level of complexity of these methods, as the convolution operations are also extended to an additional dimension in order to extract features from it as well, providing a visualization of the signals that the network interprets as informative is a challenging task. An effective way of understanding the network’s inner workings would be to isolate the spatio-temporal regions of the video that the network finds most informative. We propose a method called Saliency Tubes, which demonstrates the foremost points and regions, both at the frame level and over time, that are found to be the main focus points of the network. We demonstrate our findings on widely used datasets for third-person and egocentric action classification and enhance the set of methods and visualizations that improve the intelligibility of 3D Convolutional Neural Networks (CNNs).

URL

http://arxiv.org/abs/1902.01078

PDF

http://arxiv.org/pdf/1902.01078

Jumping Manifolds: Geometry Aware Dense Non-Rigid Structure from Motion

2019-02-04

Suryansh Kumar

arXiv_CV

arXiv_CV Face
Abstract

Given dense image feature correspondences of a non-rigidly moving object across multiple frames, this paper proposes an algorithm to estimate its 3D shape for each frame. To solve this problem accurately, the recent state-of-the-art algorithm reduces this task to a set of local linear subspace reconstruction and clustering problems using a Grassmann manifold representation \cite{kumar2018scalable}. Unfortunately, that method overlooks some of the critical issues associated with the modeling of surface deformations, e.g., the dependence of a local surface deformation on its neighbors. Furthermore, its representation for grouping high-dimensional data points inevitably introduces the drawbacks of categorizing samples on the high-dimensional Grassmann manifold \cite{huang2015projection, harandi2014manifold}. Hence, to deal with these limitations of \cite{kumar2018scalable}, we propose an algorithm that jointly exploits the benefit of the high-dimensional Grassmann manifold to perform reconstruction, and its equivalent lower-dimensional representation to infer suitable clusters. To accomplish this, we project each Grassmannian onto a lower-dimensional Grassmann manifold which preserves and respects the deformation of the structure w.r.t. its neighbors. These Grassmann points in the lower dimension then act as representatives for the selection of high-dimensional Grassmann samples to perform each local reconstruction. In practice, our algorithm provides a geometrically efficient way to solve dense NRSfM by switching between manifolds based on their benefit and usage. Experimental results show that the proposed algorithm is very effective in handling noise, with reconstruction accuracy as good as or better than the competing methods.

URL

http://arxiv.org/abs/1902.01077

PDF

http://arxiv.org/pdf/1902.01077

Causal Effect Identification from Multiple Incomplete Data Sources: A General Search-based Approach

2019-02-04

Santtu Tikka, Antti Hyttinen, Juha Karvanen

arXiv_AI

arXiv_AI Knowledge Inference
Abstract

Causal effect identification considers whether an interventional probability distribution can be uniquely determined without parametric assumptions from measured source distributions and structural knowledge of the generating system. While complete graphical criteria and procedures exist for many identification problems, there are still challenging but important extensions that have not been considered in the literature. To tackle these new settings, we present a search algorithm directly over the rules of do-calculus. Due to the generality of do-calculus, the search is capable of taking more advanced data-generating mechanisms into account along with arbitrary types of both observational and experimental source distributions. The search is enhanced via a heuristic and search space reduction techniques. The approach, called do-search, is provably sound, and it is complete with respect to identifiability problems that have been shown to be completely characterized by do-calculus. When extended with additional rules, the search is capable of handling missing data problems as well. With the versatile search, we are able to approach new problems such as combined transportability and selection bias, or multiple sources of selection bias. We also perform a systematic analysis of bivariate missing data problems and study causal inference under case-control design.

URL

http://arxiv.org/abs/1902.01073

PDF

http://arxiv.org/pdf/1902.01073

A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization

2019-02-04

Wonseok Hwang, Jinyeung Yim, Seunghyun Park, Minjoon Seo

arXiv_CL

arXiv_CL
Abstract

WikiSQL is the task of mapping a natural language question to a SQL query given a table from a Wikipedia article. We first show that learning highly context- and table-aware word representations is arguably the most important consideration for achieving high accuracy in the task. We explore three variants of a BERT-based architecture, and our best model outperforms the previous state of the art by 8.2% and 2.5% in logical form and execution accuracy, respectively. We provide a detailed analysis of the models to guide how word contextualization can be utilized in such a semantic parsing task. We then argue that this score is near the upper bound in WikiSQL, where we observe that most of the evaluation errors are due to wrong annotations. We also measure human accuracy on a portion of the dataset and show that our model exceeds human performance by at least 1.4% execution accuracy.

URL

http://arxiv.org/abs/1902.01069

PDF

http://arxiv.org/pdf/1902.01069

3D point cloud registration with shape constraint

2019-02-04

Swapna Agarwal, Brojeshwar Bhowmick

arXiv_CV

arXiv_CV Detection
Abstract

In this paper, a shape-constrained iterative algorithm is proposed to register a rigid template point-cloud to a given reference point-cloud. The algorithm embeds a shape-based similarity constraint into the principle of gravitation. The shape-constrained gravitation, as induced by the reference, controls the movement of the template such that at each iteration, the template better aligns with the reference in terms of shape. This constraint enables alignment in difficult conditions introduced by change (presence of outliers and/or missing parts), translation, rotation and scaling. We discuss efficient implementation techniques that require minimal manual intervention. The registration is shown to be useful for change detection in the 3D point-cloud. The algorithm is compared with three state-of-the-art registration approaches. The experiments are done on both synthetic and real-world data. The proposed algorithm is shown to perform better in the presence of large rotations, structured and unstructured outliers and missing data.

URL

http://arxiv.org/abs/1902.01061

PDF

http://arxiv.org/pdf/1902.01061

End-to-end feature fusion siamese network for adaptive visual tracking

2019-02-04

Dongyan Guo, Jun Wang, Weixuan Zhao, Ying Cui, Zhenhua Wang, Shengyong Chen

arXiv_CV

arXiv_CV Salient Tracking
Abstract

According to observations, different visual objects have different salient features in different scenarios. Even for the same object, its salient shape and appearance features may change greatly over time in a long-term tracking task. Motivated by these observations, we propose an end-to-end feature fusion framework based on a Siamese network, named FF-Siam, which can effectively fuse different features for adaptive visual tracking. The framework consists of four layers. A feature extraction layer is designed to extract the different features of the target region and search region. The extracted features are then fed into a weight generation layer to obtain the channel weights, which indicate the importance of different feature channels. Both the features and the channel weights are utilized in a template generation layer to generate a discriminative template. Finally, the corresponding response maps created by the convolution of the search region features and the template are passed through a fusion layer to obtain the final response map for locating the target. Experimental results demonstrate that the proposed framework achieves state-of-the-art performance on the popular Temple-Color, OTB50 and UAV123 benchmarks.

URL

http://arxiv.org/abs/1902.01057

PDF

http://arxiv.org/pdf/1902.01057

Training Medical Image Analysis Systems like Radiologists

2019-02-04

Gabriel Maicas, Andrew P. Bradley, Jacinto C. Nascimento, Ian Reid, Gustavo Carneiro

arXiv_AI

arXiv_AI Classification
Abstract

The training of medical image analysis systems using machine learning approaches follows a common script: collect and annotate a large dataset, train the classifier on the training set, and test it on a hold-out test set. This process bears no direct resemblance to radiologist training, which is based on solving a series of tasks of increasing difficulty, where each task involves the use of significantly smaller datasets than those used in machine learning. In this paper, we propose a novel training approach inspired by how radiologists are trained. In particular, we explore the use of meta-training that models a classifier based on a series of tasks. Tasks are selected using teacher-student curriculum learning, where each task consists of simple classification problems containing small training sets. We hypothesize that our proposed meta-training approach can be used to pre-train medical image analysis models. This hypothesis is tested on automatic breast screening classification from DCE-MRI trained with weakly labeled datasets. The classification performance achieved by our approach is shown to be the best in the field for that application, compared to state-of-the-art baseline approaches: DenseNet, multiple instance learning and multi-task learning.

URL

http://arxiv.org/abs/1805.10884

PDF

http://arxiv.org/pdf/1805.10884

Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI

2019-02-04

Gabriel Maicas, Andrew P. Bradley, Jacinto C. Nascimento, Ian Reid, Gustavo Carneiro

arXiv_CV

arXiv_CV Classification
Abstract

We propose a new method for breast cancer screening from DCE-MRI based on a post-hoc approach that is trained using weakly annotated data (i.e., labels are available only at the image level without any lesion delineation). Our proposed post-hoc method automatically diagnoses the whole volume and, for positive cases, localizes the malignant lesions that led to such a diagnosis. Conversely, traditional approaches follow a pre-hoc approach that initially localises suspicious areas that are subsequently classified to establish the breast malignancy; this approach is trained using strongly annotated data (i.e., it needs a delineation and classification of all lesions in an image). Another goal of this paper is to establish the advantages and disadvantages of both approaches when applied to breast screening from DCE-MRI. Relying on experiments on a breast DCE-MRI dataset that contains scans of 117 patients, our results show that the post-hoc method is more accurate for diagnosing the whole volume per patient, achieving an AUC of 0.91, while the pre-hoc method achieves an AUC of 0.81. However, localising the malignant lesions remains challenging for the post-hoc method due to the weakly labelled dataset employed during training.

URL

http://arxiv.org/abs/1809.09404

PDF

http://arxiv.org/pdf/1809.09404

Towards Pedestrian Detection Using RetinaNet in ECCV 2018 Wider Pedestrian Detection Challenge

2019-02-04

Md Ashraful Alam Milton

arXiv_CV

arXiv_CV Object_Detection Segmentation Image_Classification Classification Deep_Learning Detection
Abstract

The aim of this paper is to investigate the performance of RetinaNet-based object detectors on pedestrian detection. Pedestrian detection is an important research topic, as it provides a baseline for general object detection and has a great number of practical applications such as autonomous cars, robotics and security cameras. Though extensive research has made huge progress in pedestrian detection, many issues remain open for further research and improvement. Recent deep learning based methods have shown state-of-the-art performance in computer vision tasks such as image classification, object detection, and segmentation. The Wider Pedestrian Detection Challenge aims at finding improved solutions for the pedestrian detection problem. In this paper, we propose a pedestrian detection system based on RetinaNet. Our solution has scored 0.4061 mAP. The code is available at https://github.com/miltonbd/ECCV_2018_pedestrian_detection_challenege.

URL

http://arxiv.org/abs/1902.01031

PDF

http://arxiv.org/pdf/1902.01031

Extracting Multiple-Relations in One-Pass with Pre-Trained Transformers

2019-02-04

Haoyu Wang, Ming Tan, Mo Yu, Shiyu Chang, Dakuo Wang, Kun Xu, Xiaoxiao Guo, Saloni Potdar

arXiv_AI

arXiv_AI Relation_Extraction Attention Embedding Prediction Relation
Abstract

Most approaches to extracting multiple relations from a paragraph require multiple passes over the paragraph. In practice, multiple passes are computationally expensive, which makes it difficult to scale to longer paragraphs and larger text corpora. In this work, we focus on the task of multiple relation extraction by encoding the paragraph only once (one-pass). We build our solution on pre-trained self-attentive (Transformer) models, where we first add a structured prediction layer to handle extraction between multiple entity pairs, then enhance the paragraph embedding to capture multiple relational information associated with each entity with an entity-aware attention technique. We show that our approach is not only scalable but also achieves state-of-the-art performance on the standard benchmark ACE 2005.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.01030

PDF

http://arxiv.org/pdf/1902.01030
Read All
Estimation with Fast Landmark Selection in Robot Visual Navigation

2019-02-04

Hossein K. Mousavi, Nader Motee

arXiv_RO

arXiv_RO
Abstract

We consider visual feature selection to improve the estimation quality required for the accurate navigation of a robot. We build upon a key property: the contributions of trackable features (landmarks) appear linearly in the information matrix of the corresponding estimation problem. We utilize standard models for the motion and for a camera-based vision system to formulate the feature selection problem over moving finite time horizons. A scalable randomized sampling algorithm is proposed to select the more informative features (and ignore the rest) to achieve a superior position estimation quality. We provide probabilistic performance guarantees for our method. The time complexity of our feature selection algorithm is linear in the number of candidate features, which is practically plausible and outperforms existing greedy methods that scale quadratically with the number of candidate features. Our numerical simulations confirm not only that the execution time of our proposed method is lower than that of the greedy method, but also that the resulting estimation quality is very close to that of the greedy method.
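The linearity property mentioned in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the 2x2 matrices, the names `information_matrix`, `random_selection` and `greedy_selection`, the uniform sampling distribution, and the determinant as the quality score. The abstract does not state the paper's actual sampling probabilities or objective, so this is not the authors' algorithm.

```python
import random


def add(a, b):
    """Element-wise sum of two 2x2 matrices stored as nested lists."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def det2(m):
    """Determinant of a 2x2 matrix, used here as a toy quality score."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def information_matrix(prior, contributions, selected):
    """Prior information plus the sum of the selected landmarks' contributions.

    Each landmark enters additively, which is the linearity property.
    """
    total = prior
    for i in selected:
        total = add(total, contributions[i])
    return total


def random_selection(n, k, rng):
    """Pick k of n landmarks uniformly at random (assumed distribution)."""
    return sorted(rng.sample(range(n), k))


def greedy_selection(prior, contributions, k):
    """Baseline: repeatedly add the landmark that most increases the score."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(contributions)) if i not in selected]
        best = max(
            candidates,
            key=lambda i: det2(information_matrix(prior, contributions, selected + [i])),
        )
        selected.append(best)
    return sorted(selected)


if __name__ == "__main__":
    prior = [[1, 0], [0, 1]]
    contributions = [[[4, 0], [0, 0]], [[0, 0], [0, 3]], [[1, 1], [1, 1]], [[2, 0], [0, 2]]]
    rng = random.Random(0)
    for name, chosen in [
        ("random", random_selection(len(contributions), 2, rng)),
        ("greedy", greedy_selection(prior, contributions, 2)),
    ]:
        print(name, chosen, det2(information_matrix(prior, contributions, chosen)))
```

The greedy baseline re-evaluates the score for every remaining candidate at every step, while the random selection never evaluates the score at all, which is the kind of cost gap the abstract refers to.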

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.01026

PDF

http://arxiv.org/pdf/1902.01026
Read All
Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network

2019-02-04

Shervin Minaee, Amirali Abdolrashidi

arXiv_CV

arXiv_CV Attention Face CNN Deep_Learning Recognition
Abstract

Facial expression recognition has been an active research area over the past few decades, and it is still challenging due to the high intra-class variation. Traditional approaches to this problem rely on hand-crafted features such as SIFT, HOG and LBP, followed by a classifier trained on a database of images or videos. Most of these works perform reasonably well on datasets of images captured under controlled conditions, but fail to perform as well on more challenging datasets with more image variation and partial faces. In recent years, several works have proposed end-to-end frameworks for facial expression recognition using deep learning models. Despite the better performance of these works, there still seems to be considerable room for improvement. In this work, we propose a deep learning approach based on an attentional convolutional network, which is able to focus on important parts of the face and achieves significant improvement over previous models on multiple datasets, including FER-2013, CK+, FERG, and JAFFE. We also use a visualization technique that is able to find important face regions for detecting different emotions, based on the classifier’s output. Through experimental results, we show that different emotions seem to be sensitive to different parts of the face.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.01019

PDF

http://arxiv.org/pdf/1902.01019
Read All
Crystal Loss and Quality Pooling for Unconstrained Face Verification and Recognition

2019-02-04

Rajeev Ranjan, Ankan Bansal, Hongyu Xu, Swami Sankaranarayanan, Jun-Cheng Chen, Carlos D. Castillo, Rama Chellappa

arXiv_CV

arXiv_CV Face CNN Classification Deep_Learning Recognition
Abstract

In recent years, the performance of face verification and recognition systems based on deep convolutional neural networks (DCNNs) has significantly improved. A typical pipeline for face verification includes training a deep network for subject classification with softmax loss, using the penultimate layer output as the feature descriptor, and generating a cosine similarity score given a pair of face images or videos. The softmax loss function does not optimize the features to have higher similarity score for positive pairs and lower similarity score for negative pairs, which leads to a performance gap. In this paper, we propose a new loss function, called Crystal Loss, that restricts the features to lie on a hypersphere of a fixed radius. The loss can be easily implemented using existing deep learning frameworks. We show that integrating this simple step in the training pipeline significantly improves the performance of face verification and recognition systems. We achieve state-of-the-art performance for face verification and recognition on challenging LFW, IJB-A, IJB-B and IJB-C datasets over a large range of false alarm rates (10^-1 to 10^-7).
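A minimal sketch of the idea in the abstract, assuming the hypersphere constraint is applied by L2-normalizing the feature vector and scaling it to a fixed radius before the usual softmax cross-entropy. The function names, the radius parameter `alpha` and its default value are hypothetical and chosen only for illustration; the abstract does not give the exact formulation.

```python
import math


def l2_normalize_and_scale(features, alpha):
    """Project a feature vector onto the hypersphere of radius alpha."""
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [alpha * x / norm for x in features]


def softmax_cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def crystal_style_loss(features, weights, biases, label, alpha=16.0):
    """Softmax loss computed on features constrained to norm alpha.

    weights holds one row per class; alpha=16.0 is an assumed default.
    """
    f = l2_normalize_and_scale(features, alpha)
    logits = [
        sum(w * x for w, x in zip(row, f)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax_cross_entropy(logits, label)


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]
    biases = [0.0, 0.0]
    # Rescaling the raw feature vector does not change the loss,
    # because only its direction survives the normalization step.
    print(crystal_style_loss([1.0, 2.0], weights, biases, label=1))
    print(crystal_style_loss([10.0, 20.0], weights, biases, label=1))
```

Because every feature ends up with the same norm, the classifier can only separate classes by direction, which matches the cosine similarity score used at verification time.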

Abstract (translated by Google)

URL

http://arxiv.org/abs/1804.01159

PDF

http://arxiv.org/pdf/1804.01159
Read All
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

2019-02-04

R. Thomas McCoy, Ellie Pavlick, Tal Linzen

arXiv_CL

arXiv_CL Inference
Abstract

Machine learning systems can often achieve high performance on a test set by relying on heuristics that are effective for frequent example types but break down in more challenging cases. We study this issue within natural language inference (NLI), the task of determining whether one sentence entails another. Based on an analysis of the task, we hypothesize three fallible syntactic heuristics that NLI models are likely to adopt: the lexical overlap heuristic, the subsequence heuristic, and the constituent heuristic. To determine whether models have adopted these heuristics, we introduce a controlled evaluation set called HANS (Heuristic Analysis for NLI Systems), which contains many examples where the heuristics fail. We find that models trained on MNLI, including the state-of-the-art model BERT, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics. We conclude that there is substantial room for improvement in NLI systems, and that the HANS dataset can motivate and measure progress in this area.
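To make two of the named heuristics concrete, here is a small sketch. The definitions are a reading of the heuristic names rather than text from the abstract: the lexical overlap heuristic is taken to predict entailment whenever every hypothesis word occurs in the premise, and the subsequence heuristic whenever the hypothesis is a contiguous subsequence of the premise. The tokenizer and the example sentences are illustrative assumptions, and the constituent heuristic is omitted because it needs a parser.

```python
def tokens(sentence):
    """Lowercase, drop periods, and split on whitespace (a toy tokenizer)."""
    return sentence.lower().replace(".", "").split()


def lexical_overlap_heuristic(premise, hypothesis):
    """Predict entailment if every hypothesis word occurs in the premise."""
    premise_words = set(tokens(premise))
    return all(word in premise_words for word in tokens(hypothesis))


def subsequence_heuristic(premise, hypothesis):
    """Predict entailment if the hypothesis is a contiguous run of the premise."""
    p, h = tokens(premise), tokens(hypothesis)
    return any(p[i:i + len(h)] == h for i in range(len(p) - len(h) + 1))


if __name__ == "__main__":
    # Both pairs are non-entailments, yet each heuristic predicts entailment.
    print(lexical_overlap_heuristic(
        "The doctor was paid by the actor.", "The doctor paid the actor."))  # True
    print(subsequence_heuristic(
        "The doctor near the actor danced.", "The actor danced."))  # True
```

A model that behaves like either function would score well on examples where overlap and entailment coincide and fail on pairs like these, which is the failure mode the evaluation set is built to expose.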

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.01007

PDF

http://arxiv.org/pdf/1902.01007
Read All
A Tangent Distance Preserving Dimensionality Reduction Algorithm

2019-02-04

Xu Zhao, Zongli Jiang

arXiv_CV

arXiv_CV
Abstract

This paper considers the problem of nonlinear dimensionality reduction. Unlike existing methods such as LLE and ISOMAP, which attempt to unfold the true manifold in the low dimensional space, our algorithm tries to preserve the nonlinear structure of the manifold, and shows how the manifold is folded in the high dimensional space. We call this method Tangent Distance Preserving Mapping (TDPM). TDPM uses tangent distance instead of geodesic distance, and then applies MDS to the tangent distance matrix to map the manifold into a low dimensional space in which we can get its nonlinear structure.
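For reference, the MDS step can be written out under the assumption that classical (Torgerson) MDS is meant; the abstract does not say which MDS variant is used. The symbols below are introduced here for illustration: $D$ is the $n \times n$ tangent distance matrix with entries $d_{ij}$, and $k$ is the target dimension.

```latex
\begin{aligned}
D^{(2)} &= \bigl[\, d_{ij}^{2} \,\bigr], \qquad
J = I_n - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}
  && \text{(squared distances, centering matrix)} \\
B &= -\tfrac{1}{2}\, J\, D^{(2)} J
  && \text{(double-centered Gram matrix)} \\
B &= V \Lambda V^{\top}, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n
  && \text{(eigendecomposition)} \\
X &= V_k\, \Lambda_k^{1/2} \in \mathbb{R}^{n \times k}
  && \text{(embedding from the top } k \text{ eigenpairs)}
\end{aligned}
```

Since a tangent distance matrix need not be Euclidean, $B$ may have negative eigenvalues; in that case only the positive eigenvalues among the top $k$ can be used.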

Abstract (translated by Google)

URL

http://arxiv.org/abs/1902.05373

PDF

http://arxiv.org/pdf/1902.05373
Read All
Boosting with Lexicographic Programming: Addressing Class Imbalance without Cost Tuning

2019-02-04

Shounak Datta, Sayak Nag, Swagatam Das

arXiv_CV

arXiv_CV Image_Classification Classification
Abstract

A large amount of research effort has been dedicated to adapting boosting for imbalanced classification. However, boosting methods are yet to be satisfactorily immune to class imbalance, especially for multi-class problems. This is because most of the existing solutions for handling class imbalance rely on expensive cost set tuning for determining the proper level of compensation. We show that the assignment of weights to the component classifiers of a boosted ensemble can be thought of as a game of Tug of War between the classes in the margin space. We then demonstrate how this insight can be used to attain a good compromise between the rare and abundant classes without having to resort to cost set tuning, which has long been the norm for imbalanced classification. The solution is based on a lexicographic linear programming framework which requires two stages. Initially, class-specific component weight combinations are found so as to minimize a hinge loss individually for each of the classes. Subsequently, the final component weights are assigned so that the maximum deviation from the class-specific minimum loss values (obtained in the previous stage) is minimized. Hence, the proposal is not restricted to two-class situations, but is also readily applicable to multi-class problems. Additionally, we derive the dual formulation corresponding to the proposed framework. Experiments conducted on artificial and real-world imbalanced datasets, as well as on challenging applications such as hyperspectral image classification and ImageNet classification, establish the efficacy of the proposal.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1708.09684

PDF

http://arxiv.org/pdf/1708.09684
Read All
