The Large-Scale Pedestrian Retrieval Competition (LSPRC) focuses on person retrieval, an important end application in intelligent surveillance vision systems. Person retrieval aims at searching for targets of interest specified by visual attributes or images. The low image quality, varied camera viewpoints, large pose variations and occlusions in real scenes make it a challenging problem. By providing large-scale surveillance data from real scenes and standard evaluation methods that are closer to real applications, the competition aims to improve the robustness of related algorithms and better handle the complicated situations encountered in real applications. LSPRC includes two kinds of tasks, i.e., Attribute based Pedestrian Retrieval (PR-A) and Re-IDentification (ReID) based Pedestrian Retrieval (PR-ID). The standard evaluation metric, i.e., mean Average Precision (mAP), is used to measure the performance of the two tasks under various scales, poses and occlusions. In addition, a system-level evaluation is introduced, in which the algorithms of the two tasks are integrated, together with a pedestrian detection algorithm, into a large-scale video parsing platform (named ISEE).
http://arxiv.org/abs/1903.02137
Although deep learning models have brought tremendous advancements to the field of open-domain dialogue response generation, recent research results have revealed that the trained models have undesirable generation behaviors, such as malicious responses and generic (boring) responses. In this work, we propose a framework named “Negative Training” to minimize such behaviors. Given a trained model, the framework will first find generated samples that exhibit the undesirable behavior, and then use them to feed negative training signals for fine-tuning the model. Our experiments show that negative training can significantly reduce the hit rate of malicious responses (e.g. from 12.6% to 0%), or discourage frequent responses and improve response diversity (e.g. improve response entropy by over 63%).
http://arxiv.org/abs/1903.02134
Age progression and regression refers to aesthetically rendering a given face image to present effects of face aging and rejuvenation, respectively. Although numerous studies have been conducted on this topic, there are still two major problems: 1) multiple models are usually trained to simulate different age mappings, and 2) the photo-realism of generated face images is heavily influenced by the variation of training images in terms of pose, illumination, and background. To address these issues, in this paper, we propose a framework based on conditional Generative Adversarial Networks (cGANs) to achieve age progression and regression simultaneously. Particularly, since face aging and rejuvenation are largely different in terms of image translation patterns, we model these two processes using two separate generators, each dedicated to one age changing process. In addition, we exploit the spatial attention mechanism to limit image modifications to regions closely related to age changes, so that images with high visual fidelity could be synthesized for in-the-wild cases. Experiments on multiple datasets demonstrate the ability of our model in synthesizing lifelike face images at desired ages with personalized features well preserved.
http://arxiv.org/abs/1903.02133
Solving the RNA inverse folding problem is a critical prerequisite to RNA design, an emerging field in bioengineering with a broad range of applications from reaction catalysis to cancer therapy. Although significant progress has been made in developing machine-based inverse RNA folding algorithms, current approaches still have difficulty designing sequences for large or complex targets. On the other hand, human players of the online RNA design game EteRNA have consistently shown superior performance in this regard, being able to readily design sequences for targets that are challenging for machine algorithms. Here we present a novel approach to the RNA design problem, SentRNA, a design agent consisting of a fully-connected neural network trained end-to-end using human-designed RNA sequences. We show that through this approach, SentRNA can solve complex targets previously unsolvable by any machine-based approach and achieve state-of-the-art performance on two separate challenging test sets. Our results demonstrate that incorporating human design strategies into a design algorithm can significantly boost machine performance and suggests a new paradigm for machine-based RNA design.
http://arxiv.org/abs/1803.03146
We characterize the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer, enabling their efficient computation. This characterization also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. We show that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2\% to 5.3\%.
http://arxiv.org/abs/1805.10408
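As an illustration of the characterization above, here is a minimal NumPy sketch; it assumes a stride-1 convolution with circular (periodic) padding and a kernel stored as (k, k, c_in, c_out): the kernel is Fourier-transformed over its spatial dimensions at the input resolution, and the singular values of the resulting per-frequency matrices are the singular values of the full convolution operator.

```python
import numpy as np

def conv_singular_values(kernel, input_shape):
    """Singular values of a 2D multi-channel circular convolution (sketch).

    kernel: array of shape (k, k, c_in, c_out)
    input_shape: (H, W) spatial size of the input feature map
    Assumes stride 1 and periodic padding.
    """
    # FFT each (c_in, c_out) slice of the kernel over the spatial dims,
    # zero-padded to the input size.
    transforms = np.fft.fft2(kernel, input_shape, axes=(0, 1))
    # The SVD of every per-frequency (c_in x c_out) matrix yields the
    # singular values of the full linear map.
    return np.linalg.svd(transforms, compute_uv=False)
```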
Recent semantic segmentation methods exploit encoder-decoder architectures to produce the desired pixel-wise segmentation prediction. The last layer of the decoders is typically a bilinear upsampling procedure to recover the final pixel-wise prediction. We empirically show that this overly simple and data-independent bilinear upsampling may lead to sub-optimal results. In this work, we propose a data-dependent upsampling (DUpsampling) to replace bilinear upsampling, which takes advantage of the redundancy in the label space of semantic segmentation and is able to recover the pixel-wise prediction from low-resolution outputs of CNNs. The main advantage of the new upsampling layer is that, with a relatively lower-resolution feature map such as $\frac{1}{16}$ or $\frac{1}{32}$ of the input size, we can achieve even better segmentation accuracy, significantly reducing computation complexity. This is made possible by 1) the new upsampling layer's much improved reconstruction capability; and more importantly 2) the flexibility of the DUpsampling-based decoder in leveraging almost arbitrary combinations of the CNN encoder's features. Experiments demonstrate that our proposed decoder outperforms the state-of-the-art decoder, with only $\sim$20\% of the computation. Finally, without any post-processing, the framework equipped with our proposed decoder achieves new state-of-the-art performance on two datasets: 88.1\% mIOU on PASCAL VOC with 30\% of the computation of the previously best model; and 52.5\% mIOU on PASCAL Context.
http://arxiv.org/abs/1903.02120
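A minimal PyTorch sketch of a data-dependent upsampling layer of this kind is shown below: a learned 1x1 projection maps each low-resolution feature vector to an r x r block of class logits, which a depth-to-space rearrangement expands to full resolution. Class and parameter names here are illustrative, not the paper's implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class DUpsamplingSketch(nn.Module):
    """Learned linear upsampling: project each low-res feature vector to an
    (ratio x ratio) block of class logits, then rearrange depth to space."""
    def __init__(self, in_channels, num_classes, ratio):
        super().__init__()
        self.ratio = ratio
        self.proj = nn.Conv2d(in_channels, num_classes * ratio * ratio, kernel_size=1)

    def forward(self, x):                       # x: (N, in_channels, h, w)
        x = self.proj(x)                        # (N, num_classes * ratio^2, h, w)
        return F.pixel_shuffle(x, self.ratio)   # (N, num_classes, h*ratio, w*ratio)
```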
During the past few years, probabilistic approaches to imitation learning have earned a relevant place in the literature. One of their most prominent features, in addition to extracting a mean trajectory from task demonstrations, is that they provide a variance estimation. The intuitive meaning of this variance, however, changes across different techniques, indicating either variability or uncertainty. In this paper we leverage kernelized movement primitives (KMP) to provide a new perspective on imitation learning by predicting variability, correlations and uncertainty about robot actions. This rich set of information is used in combination with optimal controller fusion to learn actions from data, with two main advantages: i) robots become safe when uncertain about their actions and ii) they are able to leverage partial demonstrations, given as elementary sub-tasks, to optimally perform a higher level, more complex task. We showcase our approach in a painting task, where a human user and a KUKA robot collaborate to paint a wooden board. The task is divided into two sub-tasks and we show that using our approach the robot becomes compliant (hence safe) outside the training regions and executes the two sub-tasks with optimal gains.
http://arxiv.org/abs/1903.02114
In automated driving systems (ADS) and advanced driver-assistance systems (ADAS), efficient road segmentation is necessary to perceive the drivable region and build an occupancy map for path planning. Existing algorithms implement gigantic convolutional neural networks (CNNs) that are computationally expensive and time consuming. In this paper, we introduce distributed LSTM, a neural network widely used in audio and video processing, to process rows and columns in images and feature maps. We then propose a new network combining convolutional and distributed LSTM layers to solve the road segmentation problem. Finally, the network is trained and tested on the KITTI road benchmark. The results show that the combined structure enhances feature extraction and processing while taking less processing time than a pure CNN structure.
http://arxiv.org/abs/1808.04450
Residual-based neural networks have shown remarkable results in various visual recognition tasks including Facial Expression Recognition (FER). Despite the tremendous efforts that have been made to improve the performance of FER systems using DNNs, existing methods are not generalizable enough for practical applications. This paper introduces Bounded Residual Gradient Networks (BReG-Net) for facial expression recognition, in which the shortcut connection between the input and the output of the ResNet module is replaced with a differentiable function with a bounded gradient. This configuration prevents the network from facing the vanishing or exploding gradient problem. We show that utilizing such non-linear units results in shallower networks with better performance. Further, by using a weighted loss function which gives a higher priority to less represented categories, we can achieve an overall better recognition rate. The results of our experiments show that BReG-Nets outperform state-of-the-art methods on three publicly available facial databases in the wild, on both the categorical and dimensional models of affect.
http://arxiv.org/abs/1903.02110
Electroencephalogram (EEG) is a common base signal used to monitor brain activity and diagnose sleep disorders. Manual sleep stage scoring is a time-consuming task for sleep experts and is limited by inter-rater reliability. In this paper, we propose an automatic sleep stage annotation method called SleepEEGNet using a single-channel EEG signal. SleepEEGNet is composed of deep convolutional neural networks (CNNs) to extract time-invariant features and frequency information, and a sequence-to-sequence model to capture the complex and long short-term context dependencies between sleep epochs and scores. In addition, to reduce the effect of the class imbalance problem present in the available sleep datasets, we applied novel loss functions to obtain an equal misclassification error for each sleep stage while training the network. We evaluated the proposed method on different single-EEG channels (i.e., Fpz-Cz and Pz-Oz EEG channels) from the Physionet Sleep-EDF datasets published in 2013 and 2018. The evaluation results demonstrate that the proposed method achieved the best annotation performance compared to current literature, with an overall accuracy of 84.26%, a macro F1-score of 79.66% and a Cohen's Kappa coefficient of 0.79. Our developed model is ready to be tested on more sleep EEG signals and to aid sleep specialists in arriving at an accurate diagnosis. The source code is available at https://github.com/SajadMo/SleepEEGNet.
http://arxiv.org/abs/1903.02108
In the large-scale image retrieval task, the two most important requirements are the discriminability of image representations and the efficiency in computation and storage of representations. Regarding the former requirement, the Convolutional Neural Network (CNN) has proven to be a very powerful tool to extract highly discriminative local descriptors for effective image search. Additionally, in order to further improve the discriminative power of the descriptors, recent works adopt fine-tuning strategies. In this paper, taking a different approach, we propose a novel, computationally efficient, and competitive framework. Specifically, we first propose various strategies to compute masks, namely SIFT-mask, SUM-mask, and MAX-mask, to select a representative subset of local convolutional features and eliminate redundant features. Our in-depth analyses demonstrate that the proposed masking schemes are effective at addressing the burstiness drawback and improving retrieval accuracy. Second, we propose to employ recent embedding and aggregating methods which can significantly boost the feature discriminability. Regarding computation and storage efficiency, we include a hashing module to produce very compact binary image representations. Extensive experiments on six image retrieval benchmarks demonstrate that our proposed framework achieves state-of-the-art retrieval performance.
http://arxiv.org/abs/1802.02899
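One reading of the MAX-mask scheme mentioned above is sketched below (our interpretation, not the authors' code): a spatial location of the convolutional feature map is kept if it holds the maximum activation of at least one channel.

```python
import numpy as np

def max_mask(features):
    """features: (H, W, C) convolutional feature map.
    Returns a boolean (H, W) mask selecting locations that are the spatial
    maximum of at least one channel; the rest are treated as redundant."""
    h, w, c = features.shape
    flat = features.reshape(-1, c)
    mask = np.zeros(h * w, dtype=bool)
    mask[np.argmax(flat, axis=0)] = True   # one winning location per channel
    return mask.reshape(h, w)
```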
The field of few-shot learning has recently seen substantial advancements. Most of these advancements came from casting few-shot learning as a meta-learning problem. Model Agnostic Meta Learning or MAML is currently one of the best approaches for few-shot learning via meta-learning. MAML is simple, elegant and very powerful; however, it has a variety of issues: it is very sensitive to neural network architectures, often leads to instability during training, requires arduous hyperparameter searches to stabilize training and achieve high generalization, and is very computationally expensive at both training and inference times. In this paper, we propose various modifications to MAML that not only stabilize the system, but also substantially improve the generalization performance, convergence speed and computational overhead of MAML; we call the resulting method MAML++.
https://arxiv.org/abs/1810.09502
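For context, the sketch below shows the vanilla MAML update that MAML++ modifies (not the MAML++ procedure itself): one inner gradient step on a task's support set, then a meta-update from the query loss, backpropagated through the inner step. It assumes PyTorch 2.x for torch.func.functional_call and a classification loss.

```python
import torch
import torch.nn.functional as F

def maml_meta_step(model, support, query, inner_lr, meta_opt):
    """One second-order MAML meta-update for a single task (sketch).
    meta_opt is an optimizer over model.parameters()."""
    params = dict(model.named_parameters())
    xs, ys = support
    # Inner loop: one gradient step on the support set.
    inner_loss = F.cross_entropy(torch.func.functional_call(model, params, xs), ys)
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    adapted = {name: p - inner_lr * g
               for (name, p), g in zip(params.items(), grads)}
    # Outer loop: evaluate the adapted parameters on the query set and
    # backpropagate through the inner update.
    xq, yq = query
    meta_loss = F.cross_entropy(torch.func.functional_call(model, adapted, xq), yq)
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
```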
We present an open-access dataset of over 8000 acoustic impulse responses from 160 microphones spread across the body and affixed to wearable accessories. The data can be used to evaluate audio capture and array processing systems using wearable devices such as hearing aids, headphones, eyeglasses, jewelry, and clothing. We analyze the acoustic transfer functions of different parts of the body, measure the effects of clothing worn over microphones, compare measurements from a live human subject to those from a mannequin, and simulate the noise-reduction performance of several beamformers. The results suggest that arrays of microphones spread across the body are more effective than those confined to a single device.
http://arxiv.org/abs/1903.02094
Reinforcement Learning (RL) is a machine learning framework for artificially intelligent systems to solve a variety of complex problems. Recent years have seen a surge of successes solving challenging games and smaller domain problems, including simple though non-specific robotic manipulation and grasping tasks. Rapid successes in RL have come in part due to the strong collaborative effort by the RL community to work on common, open-sourced environment simulators such as OpenAI's Gym that allow for expedited development and valid comparisons between different, state-of-the-art strategies. In this paper, we aim to bridge the RL and surgical robotics communities by presenting the first open-sourced reinforcement learning environments for surgical robotics, called dVRL. Through the proposed RL environments, which are functionally equivalent to Gym, we show that it is easy to prototype and implement state-of-the-art RL algorithms on surgical robotics problems that aim to introduce autonomous robotic precision and accuracy to assisting, collaborative, or repetitive tasks during surgery. Learned policies are furthermore successfully transferable to a real robot. Finally, combining dVRL with the international network of over 40 da Vinci Surgical Research Kits in active use at academic institutions, we see dVRL as enabling the broad surgical robotics community to fully leverage the newest strategies in reinforcement learning, and reinforcement learning scientists with no knowledge of surgical robotics to test and develop new algorithms that can solve the real-world, high-impact challenges in autonomous surgery.
http://arxiv.org/abs/1903.02090
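Since the environments are described as functionally equivalent to Gym, interacting with them should look like a standard Gym loop, roughly as sketched below; the environment id is hypothetical, and the classic (pre-0.26) Gym step signature is assumed.

```python
import gym

# Hypothetical environment id; the actual names are registered by dVRL.
env = gym.make("dVRL-Reach-v0")
obs = env.reset()
for _ in range(200):
    action = env.action_space.sample()   # replace with a trained RL policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```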
Autonomous harvesting may provide a viable solution to mounting labor pressures in the United States' strawberry industry. However, due to bottlenecks in machine perception and economic viability, a profitable and commercially adopted strawberry harvesting system remains elusive. In this research, we explore the feasibility of using deep reinforcement learning to overcome these bottlenecks and develop a practical algorithm to address the sub-objective of viewpoint optimization, or the development of a control policy to direct a camera to favorable vantage points for autonomous harvesting. We evaluate the algorithm's performance in a custom, open-source simulated environment and observe affirmative results. Our trained agent yields 8.7 times higher returns than random actions and 8.8 percent faster exploration than our best baseline policy, which uses visual servoing. Visual investigation shows the agent is able to fixate on favorable viewpoints, despite having no explicit means to propagate information through time. Overall, we conclude that deep reinforcement learning is a promising area of research to advance the state of the art in autonomous strawberry harvesting.
http://arxiv.org/abs/1903.02074
Memorability of an image is a characteristic determined by the human observers' ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independent of the observer. The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations. We propose a new concept called the Visual Memory Schema (VMS) referring to an organisation of image components human observers share when encoding and recognising images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for either correctly or incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are analysed separately as well. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.
http://arxiv.org/abs/1903.02056
Directed acyclic graph (DAG) models are popular for capturing causal relationships. From observational and interventional data, a DAG model can only be determined up to its interventional Markov equivalence class (I-MEC). We investigate the size of MECs for random DAG models generated by uniformly sampling and ordering an Erdős-Rényi graph. For constant density, we show that the expected log observational MEC size asymptotically (in the number of vertices) approaches a constant. We characterize I-MEC size in a similar fashion in the above settings with high precision. We show that the asymptotic expected number of interventions required to fully identify a DAG is a constant. These results are obtained by exploiting Meek rules and coupling arguments to provide sharp upper and lower bounds on the asymptotic quantities, which are then calculated numerically up to high precision. Our results have important consequences for experimental design of interventions and the development of algorithms for causal inference.
http://arxiv.org/abs/1903.02054
This paper presents a method for pose estimation of off-road vehicles moving over uneven terrain. It determines the contact points between the wheels and the terrain, assuming rigid contacts between an arbitrary number of wheels and the ground. The terrain is represented by a 3D point cloud, interpolated by a B-patch to provide a continuous terrain representation. The pose estimation problem is formulated as a rigid body contact problem for a given location of the vehicle's center of mass over the terrain and a given yaw angle. The contact points between the wheels and the ground are determined by releasing the vehicle from a given point above the terrain until the contact forces between the wheels and the ground, and the gravitational force, reach equilibrium. The contact forces are calculated using singular value decomposition (SVD) of the deduced contact matrix. The proposed method is computationally efficient, allowing real-time computation during motion, as demonstrated in several examples. Accurate pose estimates can be used for motion planning, stability analyses and traversability analyses over uneven terrain.
http://arxiv.org/abs/1903.02052
Data association in SLAM is fundamentally challenging, and handling ambiguity well is crucial to achieve robust operation in real-world environments. When ambiguous measurements arise, conservatism often mandates that the measurement is discarded or a new landmark is initialized rather than risking an incorrect association. To address the inevitable 'duplicate' landmarks that arise, we present an efficient map-merging framework to detect duplicate constellations of landmarks, providing a high-confidence loop-closure mechanism well-suited for object-level SLAM. This approach uses an incrementally-computable approximation of landmark uncertainty that only depends on local information in the SLAM graph, avoiding expensive recovery of the full system covariance matrix. This enables a search based on geometric consistency (GC) (rather than full joint compatibility (JC)) that inexpensively reduces the search space to a handful of 'best' hypotheses. Furthermore, we reformulate the commonly-used interpretation tree to allow for more efficient integration of clique-based pairwise compatibility, accelerating the branch-and-bound max-cardinality search. Our method is demonstrated to match the performance of full JC methods at significantly-reduced computational cost, facilitating robust object-based loop-closure over large SLAM problems.
http://arxiv.org/abs/1809.09646
Lidar-Monocular Visual Odometry (LIMO), an odometry estimation algorithm, combines a camera and a LIght Detection And Ranging (LIDAR) sensor for visual localization by tracking camera features as well as features from LIDAR measurements, and it estimates the motion using Bundle Adjustment based on robust keyframes. To reject outliers, LIMO uses semantic labelling and weighting of the vegetation landmarks. A drawback of LIMO, as of many other odometry estimation algorithms, is that it has many parameters that need to be manually adjusted according to dynamic changes in the environment in order to decrease translational errors. In this paper, we present and argue for the use of a genetic algorithm to optimize parameters with reference to LIMO and maximize LIMO's localization and motion estimation performance. We evaluate our approach on the well-known KITTI odometry dataset and show that the genetic algorithm helps LIMO reduce translation error on different datasets.
http://arxiv.org/abs/1903.02046
In this paper, we introduce a method to compute a sparse lattice planner control set that is suited to a particular task by learning from a representative dataset of vehicle paths. To do this, we use a scoring measure similar to the Fréchet distance and propose an algorithm for evaluating a given control set according to the scoring measure. Control actions are then selected from a dense control set according to an objective function that rewards improvements in matching the dataset while also encouraging sparsity. This method is evaluated across several experiments involving real and synthetic datasets, and it is shown to generate smaller control sets when compared to the previous state-of-the-art lattice control set computation technique, with these smaller control sets maintaining a high degree of manoeuvrability in the required task. This results in a planning time speedup of up to 4.31x when using the learned control set over the state-of-the-art computed control set. In addition, we show the learned control sets are better able to capture the driving style of the dataset in terms of path curvature.
http://arxiv.org/abs/1903.02044
Being one of the most common diagnostic imaging tests, chest radiography requires timely reporting of potential findings in the images. In this paper, we propose an end-to-end architecture for abnormal chest X-ray identification using generative adversarial one-class learning. Unlike previous approaches, our method takes only normal chest X-ray images as input. The architecture is composed of three deep neural networks, each of which is learned by competing with, while collaborating with, the others to model the underlying content structure of normal chest X-rays. Given a chest X-ray image in the testing phase, if it is normal, the learned architecture can model and reconstruct its content well; if it is abnormal, since such content is unseen in the training phase, the model performs poorly in its reconstruction. This enables distinguishing abnormal chest X-rays from normal ones. Quantitative and qualitative experiments demonstrate the effectiveness and efficiency of our approach, where an AUC of 0.841 is achieved on the challenging NIH Chest X-ray dataset in a one-class learning setting, with the potential to reduce the workload for radiologists.
http://arxiv.org/abs/1903.02040
The establishment of image correspondence through robust image registration is critical to many clinical tasks such as image fusion, organ atlas creation, and tumor growth monitoring, and is a very challenging problem. Since the beginning of the recent deep learning renaissance, the medical imaging research community has developed deep learning based approaches and achieved the state-of-the-art in many applications, including image registration. The rapid adoption of deep learning for image registration applications over the past few years necessitates a comprehensive summary and outlook, which is the main scope of this survey. This requires placing a focus on the different research areas as well as highlighting challenges that practitioners face. This survey, therefore, outlines the evolution of deep learning based medical image registration in the context of both research challenges and relevant innovations in the past few years. Further, this survey highlights future research directions to show how this field may be possibly moved forward to the next level.
http://arxiv.org/abs/1903.02026
In this paper, we consider the problem of crowd counting in images. Given an image of a crowded scene, our goal is to estimate the density map of this image, where each pixel value in the density map corresponds to the crowd density at the corresponding location in the image. Given the estimated density map, the final crowd count can be obtained by summing over all values in the density map. One challenge of crowd counting is the scale variation in images. In this work, we propose a novel scale-aware attention network to address this challenge. Using the attention mechanism popular in recent deep learning architectures, our model can automatically focus on certain global and local scales appropriate for the image. By combining these global and local scale attention, our model outperforms other state-of-the-art methods for crowd counting on several benchmark datasets.
http://arxiv.org/abs/1903.02025
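As the abstract notes, the count is read off the predicted density map by summation; a one-line illustration:

```python
import numpy as np

def crowd_count(density_map):
    """Final crowd count = sum of the predicted per-pixel densities."""
    return float(np.sum(density_map))
```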
Recent reinforcement learning (RL) approaches have shown strong performance in complex domains such as Atari games, but are often highly sample inefficient. A common approach to reduce interaction time with the environment is to use reward shaping, which involves carefully designing reward functions that provide the agent intermediate rewards for progress towards the goal. However, designing appropriate shaping rewards is known to be difficult as well as time-consuming. In this work, we address this problem by using natural language instructions to perform reward shaping. We propose the LanguagE-Action Reward Network (LEARN), a framework that maps free-form natural language instructions to intermediate rewards based on actions taken by the agent. These intermediate language-based rewards can seamlessly be integrated into any standard reinforcement learning algorithm. We experiment with Montezuma’s Revenge from the Atari Learning Environment, a popular benchmark in RL. Our experiments on a diverse set of 15 tasks demonstrate that, for the same number of interactions with the environment, language-based rewards lead to successful completion of the task 60% more often on average, compared to learning without language.
http://arxiv.org/abs/1903.02020
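A rough sketch of how such language-based intermediate rewards could be mixed into the environment reward is given below; learn_model and the mixing weight are hypothetical stand-ins, not the paper's interface.

```python
def shaped_reward(env_reward, action_history, instruction, learn_model, weight=0.1):
    """Add a language-based intermediate reward, predicted from the agent's
    recent actions and a natural-language instruction, to the raw reward."""
    language_reward = learn_model(action_history, instruction)  # hypothetical call
    return env_reward + weight * language_reward
```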
We present the first application of deep learning at scale to do gravitational wave parameter estimation of binary black hole mergers that describe a 4-D signal manifold, i.e., black holes whose spins are aligned or anti-aligned, and which evolve on quasi-circular orbits. We densely sample this 4-D signal manifold using over three hundred thousand simulated waveforms. In order to cover a broad range of astrophysically motivated scenarios, we synthetically enhance this waveform dataset to ensure that our deep learning algorithms can process waveforms located at any point in the data stream of gravitational wave detectors (time invariance) for a broad range of signal-to-noise ratios (scale invariance), which in turn means that our neural network models are trained with over $10^{7}$ waveform signals. We then apply these neural network models to estimate the astrophysical parameters of black hole mergers, and their corresponding black hole remnants, including the final spin and the gravitational wave quasi-normal frequencies. These neural network models represent the first time deep learning is used to provide point-parameter estimation calculations endowed with statistical errors. For each binary black hole merger that ground-based gravitational wave detectors have observed, our deep learning algorithms can reconstruct its parameters within 2 milliseconds using a single Tesla V100 GPU. We show that this new approach produces parameter estimation results that are consistent with Bayesian analyses that have been used to reconstruct the parameters of the catalog of binary black hole mergers observed by the advanced LIGO and Virgo detectors.
http://arxiv.org/abs/1903.01998
We introduce a probabilistic robustness measure for Bayesian Neural Networks (BNNs), defined as the probability that, given a test point, there exists a point within a bounded set such that the BNN prediction differs between the two. Such a measure can be used, for instance, to quantify the probability of the existence of adversarial examples. Building on statistical verification techniques for probabilistic models, we develop a framework that allows us to estimate probabilistic robustness for a BNN with statistical guarantees, i.e., with a priori error and confidence bounds. We provide experimental comparison for several approximate BNN inference techniques on image classification tasks associated to MNIST and a two-class subset of the GTSRB dataset. Our results enable quantification of uncertainty of BNN predictions in adversarial settings.
https://arxiv.org/abs/1903.01980
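In symbols (our notation, not necessarily the authors'), the measure can be written as $P_{\mathrm{rob}}(x, T) = \Pr_{w \sim p(w \mid \mathcal{D})}\left[\exists\, x' \in T : f^{w}(x') \neq f^{w}(x)\right]$, where $f^{w}$ is the network with weights $w$ drawn from the posterior and $T$ is the bounded set around the test point $x$.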
Graph theory is emerging as a new source of tools for time series analysis. One promising method is to transform a signal into its visibility graph, a representation which captures many interesting aspects of the signal. Here we introduce the visibility graph for audio spectra. Such a visibility graph captures the harmonic content whilst being resilient to broadband noise. We propose to use a structural distance between two graphs as a novel harmonic-biased similarity measure. We present experiments demonstrating the utility of this distance measure for real and synthesised audio data. The source code is available online.
http://arxiv.org/abs/1903.01976
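For reference, a minimal O(n^2) sketch of the standard natural-visibility construction on a 1-D sequence (such as a magnitude spectrum) is given below; the paper's exact variant and any pre-processing are not reproduced here.

```python
import numpy as np

def natural_visibility_graph(y):
    """Adjacency matrix of the natural visibility graph of a 1-D sequence:
    samples i and j are connected if every intermediate sample lies strictly
    below the straight line joining (i, y[i]) and (j, y[j])."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(i + 1, n):
            ks = np.arange(i + 1, j)
            line = y[j] + (y[i] - y[j]) * (j - ks) / (j - i)
            if np.all(y[ks] < line):
                adj[i, j] = adj[j, i] = True
    return adj
```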
We propose learning from teleoperated play data (LfP) as a way to scale up multi-task robotic skill learning. Learning from play (LfP) offers three main advantages: 1) It is cheap. Large amounts of play data can be collected quickly as it does not require scene staging, task segmenting, or resetting to an initial state. 2) It is general. It contains both functional and non-functional behavior, relaxing the need for a predefined task distribution. 3) It is rich. Play involves repeated, varied behavior and naturally leads to high coverage of the possible interaction space. These properties distinguish play from expert demonstrations, which are rich, but expensive, and scripted unattended data collection, which is cheap, but insufficiently rich. Variety in play, however, presents a multimodality challenge to methods seeking to learn control on top. To this end, we introduce Play-LMP, a method designed to handle variability in the LfP setting by organizing it in an embedding space. Play-LMP jointly learns 1) reusable latent plan representations unsupervised from play data and 2) a single goal-conditioned policy capable of decoding inferred plans to achieve user-specified tasks. We show empirically that Play-LMP, despite not being trained on task-specific data, is capable of generalizing to 18 complex user-specified manipulation tasks with average success of 85.5%, outperforming individual models trained on expert demonstrations (success of 70.3%). Furthermore, we find that play-supervised models, unlike their expert-trained counterparts, 1) are more robust to perturbations and 2) exhibit retrying-till-success. Finally, despite never being trained with task labels, we find that our agent learns to organize its latent plan space around functional tasks. Videos of the performed experiments are available at learning-from-play.github.io
http://arxiv.org/abs/1903.01973
In this work, we enrich a formalism for argumentation by including a formal characterization of features related to the knowledge, in order to capture proper reasoning in legal domains. We add meta-data information to the arguments in the form of labels representing quantitative and qualitative data about them. These labels are propagated through an argumentative graph according to the relations of support, conflict, and aggregation between arguments.
https://arxiv.org/abs/1903.01966
Bipolar Argumentation Frameworks (BAFs) admit several interpretations of the support relation and diverging definitions of semantics. Recently, several classes of BAFs have been captured as instances of bipolar Assumption-Based Argumentation, a class of Assumption-Based Argumentation (ABA). In this paper, we establish the complexity of bipolar ABA, and consequently of several classes of BAFs. In addition to the standard five complexity problems, we analyse the rarely-addressed extension enumeration problem too. We also advance backtracking-driven algorithms for enumerating extensions of bipolar ABA frameworks, and consequently of BAFs under several interpretations. We prove soundness and completeness of our algorithms, describe their implementation and provide a scalability evaluation. We thus contribute to the study of the as yet uninvestigated complexity problems of (variously interpreted) BAFs as well as of bipolar ABA, and provide the lacking implementations thereof.
https://arxiv.org/abs/1903.01964
This paper describes a testing methodology for quantitatively assessing the risk of unintended memorization of rare or unique sequences in generative sequence models—a common type of neural network. Such models are sometimes trained on sensitive data (e.g., the text of users’ private messages); our methodology allows deep-learning practitioners to choose configurations that minimize memorization during training, thereby benefiting privacy. In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, if not addressed during training, we show that new, efficient procedures can allow extracting unique, secret sequences such as credit card numbers from trained models. We also show that our testing strategy is practical and easy-to-apply, e.g., by describing its use for quantitatively preventing data exposure in Smart Compose, a production, commercial neural network trained on millions of users’ email messages.
http://arxiv.org/abs/1802.08232
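One rank-based way to score memorization of an inserted 'canary' sequence, in the spirit of the methodology above, is sketched below; treat it as an assumed formulation rather than the paper's exact definition.

```python
import math

def exposure(canary_rank, num_candidates):
    """How much more likely the model finds the inserted canary than random
    candidates: all candidate sequences are sorted by model perplexity and
    the canary's rank (1 = most likely) is compared to the candidate-space size."""
    return math.log2(num_candidates) - math.log2(canary_rank)
```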
Numerous past works have tackled the problem of task-driven navigation. But, how to effectively explore a new environment to enable a variety of down-stream tasks has received much less attention. In this work, we study how agents can autonomously explore realistic and complex 3D environments without the context of task-rewards. We propose a learning-based approach and investigate different policy architectures, reward functions, and training paradigms. We find that the use of policies with spatial memory that are bootstrapped with imitation learning and finally finetuned with coverage rewards derived purely from on-board sensors can be effective at exploring novel environments. We show that our learned exploration policies can explore better than classical approaches based on geometry alone and generic learning-based exploration techniques. Finally, we also show how such task-agnostic exploration can be used for down-stream tasks. Code and Videos are available at: https://sites.google.com/view/exploration-for-nav.
http://arxiv.org/abs/1903.01959
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. Existing research on image-based table detection and recognition usually fine-tunes pre-trained models on out-of-domain data with a few thousand human-labeled examples, which makes it difficult to generalize to real-world applications. With TableBank, which contains 417K high-quality labeled tables, we build several strong baselines using state-of-the-art models with deep neural networks. We make TableBank publicly available (this https URL) and hope it will empower more deep learning approaches to the table detection and recognition task.
https://arxiv.org/abs/1903.01949
Temporally locating and classifying action segments in long untrimmed videos is of particular interest to many applications like surveillance and robotics. While traditional approaches follow a two-step pipeline, by generating frame-wise probabilities and then feeding them to high-level temporal models, recent approaches use temporal convolutions to directly classify the video frames. In this paper, we introduce a multi-stage architecture for the temporal action segmentation task. Each stage features a set of dilated temporal convolutions to generate an initial prediction that is refined by the next one. This architecture is trained using a combination of a classification loss and a proposed smoothing loss that penalizes over-segmentation errors. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our model achieves state-of-the-art results on three challenging datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
https://arxiv.org/abs/1903.01945
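A possible form of the smoothing term described above, a truncated mean squared error on frame-to-frame changes in log-probabilities, is sketched below; the truncation threshold is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

def smoothing_loss(logits, tau=4.0):
    """logits: (T, C) frame-wise class scores for one video.
    Penalises abrupt changes between neighbouring frames to discourage
    over-segmentation; differences are truncated at tau."""
    log_probs = F.log_softmax(logits, dim=1)
    delta = torch.abs(log_probs[1:] - log_probs[:-1].detach())
    delta = torch.clamp(delta, max=tau)
    return torch.mean(delta ** 2)
```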
Malaria is one of the leading causes of morbidity and mortality in many developing countries. The development of a highly effective and readily deployable vaccine represents a major goal for world health. There has been recent progress in developing a clinically effective vaccine manufactured using Plasmodium falciparum sporozoites (PfSPZ) extracted from the salivary glands of Anopheles sp. mosquitoes. The harvesting of PfSPZ requires dissection of the mosquito and manual removal of the salivary glands from each mosquito by trained technicians. While PfSPZ-based vaccines have shown highly promising results, the process of dissecting salivary glands is tedious and labor intensive. We propose a mechanical device that will greatly increase the rate of mosquito dissection and deskill the process to make malaria vaccines more affordable and more readily available. This device consists of several components: a sorting stage in which the mosquitoes are sorted into slots, a cutting stage in which the heads are removed, and a squeezing stage in which the salivary glands are extracted and collected. This method allows mosquitoes to be dissected twenty at a time instead of one by one as previously done and significantly reduces the dissection time per mosquito.
http://arxiv.org/abs/1903.02532
In this paper, we propose Orthogonal Generative Adversarial Networks (O-GANs). We orthogonally decompose the discriminator network and add an extra loss term to the objective of common GANs, which forces the discriminator to become an effective encoder. The same extra loss can be embedded into any kind of GAN, and there is almost no increase in computation. Furthermore, we discuss the principle of our method, which relates to fully exploiting the remaining degrees of freedom of the discriminator. To our knowledge, our solution is the simplest approach to training a generative adversarial network with auto-encoding ability.
https://arxiv.org/abs/1903.01931
Cloud computing data centers are growing in size and complexity to the point where monitoring and management of the infrastructure become a challenge due to scalability issues. A possible approach to cope with the size of such data centers is to identify VMs exhibiting a similar behavior. Existing literature demonstrated that clustering together VMs that show a similar behavior may improve the scalability of both monitoring and management of a data center. However, available techniques suffer from a trade-off between accuracy and time to achieve this result. Throughout this paper we propose a different approach where, instead of an unsupervised clustering, we rely on classifiers based on deep learning techniques to assign a newly deployed VM to a cluster of already-known VMs. The two proposed classifiers, namely DeepConv and DeepFFT, use a convolutional neural network and (in the latter model) exploit the Fast Fourier Transform to classify the VMs. Our proposal is validated using a set of traces describing the behavior of VMs from a real cloud data center. The experiments compare our proposal with state-of-the-art solutions available in the literature, demonstrating that our proposal achieves better performance. Furthermore, we show that our solution is significantly faster than the alternatives as it can produce a perfect classification even with just a few samples of data, making our proposal viable also to classify on-demand VMs that are characterized by a short life span.
http://arxiv.org/abs/1903.01930
We propose a new method for analyzing a set of parameters in a multiple criteria ranking method. Unlike the existing techniques, we do not use any optimization technique, instead incorporating and extending a Segmenting Description approach. While considering a value-based preference disaggregation method, we demonstrate the usefulness of the introduced algorithm in a multi-purpose decision analysis exploiting a system of inequalities that models the Decision Maker's preferences. Specifically, we discuss how it can be applied for verifying the consistency between the revealed and estimated preferences as well as for identifying the sources of potential incoherence. Moreover, we employ the method for conducting robustness analysis, i.e., discovering the set of all compatible parameter values and verifying the stability of the suggested recommendation in view of the multiplicity of feasible solutions. In addition, we make clear its suitability for generating arguments about the validity of outcomes and the role of particular criteria. We discuss the favorable characteristics of the Segmenting Description approach which enhance its suitability for use in Multiple Criteria Decision Aiding. These include keeping in memory the entire process of transforming a system of inequalities and avoiding the need for processing the inequalities contained in the basic system, which is subsequently enriched with some hypothesis to be verified. The applicability of the proposed method is exemplified in a numerical study.
https://arxiv.org/abs/1903.01923
In this paper, we introduce a formalism for single-agent decision making that is based on Dynamic Argumentation Frameworks. The formalism can be used to justify a choice, which is based on the current situation in which the agent is involved. Taking advantage of the inference mechanism of the argumentation formalism, it is possible to consider preference relations and conflicts among the available alternatives in that reasoning. With this formalization, given a particular set of evidence, the justified conclusions supported by warranted arguments will be used by the agent's decision rules to determine which alternatives will be selected. We also present an algorithm that implements a choice function based on our formalization. Finally, we complete our presentation by introducing formal results that relate the proposed framework to approaches of classical decision theory.
https://arxiv.org/abs/1903.01920
We present a new approach to diffeomorphic non-rigid registration of medical images. The method is based on optical flow and warps images via gradient flow with the standard $L^2$ inner product. To compute the transformation, we rely on accelerated optimisation on the manifold of diffeomorphisms. We achieve regularity properties of Sobolev gradient flows, which are expensive to compute, owing to a novel method of averaging the gradients in time rather than space. We successfully register brain MRI and challenging abdominal CT scans at speeds orders of magnitude faster than previous approaches. We make our code available in a public repository: this https URL
https://arxiv.org/abs/1903.01905
This article introduces a corpus of cuneiform texts from which the dataset for the Cuneiform Language Identification (CLI) 2019 shared task was derived, as well as some preliminary language identification experiments conducted using that corpus. We also describe the CLI dataset and how it was derived from the corpus. In addition, we provide some baseline language identification results using the CLI dataset. To the best of our knowledge, the experiments detailed here are the first time automatic language identification methods have been used on cuneiform data.
https://arxiv.org/abs/1903.01891
Modern deep neural networks require a tremendous amount of data to train, often needing hundreds or thousands of labeled examples to learn an effective representation. For these networks to work with less data, more structure must be built into their architectures or learned from previous experience. The learned weights of convolutional neural networks (CNNs) trained on large datasets for object recognition contain a substantial amount of structure. These representations have parallels to simple cells in the primary visual cortex, where receptive fields are smooth and contain many regularities. Incorporating smoothness constraints over the kernel weights of modern CNN architectures is a promising way to improve their sample complexity. We propose a smooth kernel regularizer that encourages spatial correlations in convolution kernel weights. The correlation parameters of this regularizer are learned from previous experience, yielding a method with a hierarchical Bayesian interpretation. We show that our correlated regularizer can help constrain models for visual recognition, improving over an L2 regularization baseline.
https://arxiv.org/abs/1903.01882
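A simplified version of such a smoothness penalty is sketched below; here a fixed nearest-neighbour difference penalty stands in for the learned correlation structure described in the abstract.

```python
import torch

def smooth_kernel_penalty(weight, strength=1.0):
    """weight: (out_ch, in_ch, kH, kW) convolution kernel.
    Penalises squared differences between spatially adjacent kernel weights
    so that learned filters vary smoothly across the kernel."""
    dh = weight[..., 1:, :] - weight[..., :-1, :]
    dw = weight[..., :, 1:] - weight[..., :, :-1]
    return strength * (dh.pow(2).mean() + dw.pow(2).mean())
```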
A Timed Argumentation Framework (TAF) is a formalism where arguments are only valid for consideration in a given period of time, called availability intervals, which are defined for every individual argument. The original proposal is based on a single, abstract notion of attack between arguments that remains static and permanent in time. Thus, in general, when identifying the set of acceptable arguments, the outcome associated with a TAF will vary over time. In this work we introduce an extension of TAF adding the capability of modeling a support relation between arguments. In this sense, the resulting framework provides a suitable model for different time-dependent issues. Thus, the main contribution here is to provide an enhanced framework for modeling positive (support) and negative (attack) interactions that vary over time, which are relevant in many real-world situations. This leads to a Timed Bipolar Argumentation Framework (T-BAF), where classical argument extensions can be defined. The proposal aims at advancing the integration of temporal argumentation in different application domains.
https://arxiv.org/abs/1903.01874
Reduced-precision arithmetic improves the size, cost, power and performance of neural networks in digital logic. In convolutional neural networks, the use of 1b weights can achieve state-of-the-art error rates while eliminating multiplication, reducing storage and improving power efficiency. The BinaryConnect binary-weighted system, for example, achieves 9.9% error using floating-point activations on the CIFAR-10 dataset. In this paper, we introduce TinBiNN, a lightweight vector processor overlay for accelerating inference computations with 1b weights and 8b activations. The overlay is very small – it uses about 5,000 4-input LUTs and fits into a low-cost iCE40 UltraPlus FPGA from Lattice Semiconductor. To show this can be useful, we build two embedded 'person detector' systems by shrinking the original BinaryConnect network. The first is a 10-category classifier with an 89% smaller network that runs in 1,315ms and achieves 13.6% error. The other is a 1-category classifier that is even smaller, runs in 195ms, and has only 0.4% error. In both classifiers, the error can be attributed entirely to training and not to reduced precision.
http://arxiv.org/abs/1903.06630
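The abstract's point that 1-bit weights eliminate multiplication can be seen in a toy dot product: with weights restricted to +1/-1, the accumulation reduces to signed additions of the 8-bit activations.

```python
import numpy as np

def binary_weight_dot(activations, weight_signs):
    """activations: int array of 8-bit activations.
    weight_signs: boolean array of the same shape, True meaning weight +1.
    The dot product needs only additions and subtractions."""
    acts = activations.astype(np.int32)
    return int(acts[weight_signs].sum() - acts[~weight_signs].sum())
```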
Argumentation theory is a powerful paradigm that formalizes a type of commonsense reasoning that aims to simulate the human ability to resolve a specific problem in an intelligent manner. A classical argumentation process takes into account only the properties related to the intrinsic logical soundness of an argument in order to determine its acceptability status. However, these properties are not always the only ones that matter to establish the argument’s acceptability—there exist other qualities, such as strength, weight, social votes, trust degree, relevance level, and certainty degree, among others.
https://arxiv.org/abs/1903.01865
In this work, we propose a novel method termed Frustum ConvNet (F-ConvNet) for amodal 3D object detection from point clouds. Given 2D region proposals in an RGB image, our method first generates a sequence of frustums for each region proposal, and uses the obtained frustums to group local points. F-ConvNet aggregates point-wise features as frustum-level feature vectors, and arrays these feature vectors as a feature map for use by its subsequent component of fully convolutional network (FCN), which spatially fuses frustum-level features and supports an end-to-end and continuous estimation of oriented boxes in the 3D space. We also propose component variants of F-ConvNet, including a FCN variant that extracts multi-resolution frustum features, and a refined use of F-ConvNet over a reduced 3D space. Careful ablation studies verify the efficacy of these component variants. F-ConvNet assumes no prior knowledge of the working 3D environment, and is thus dataset-agnostic. We present experiments on both the indoor SUN-RGBD and outdoor KITTI datasets. F-ConvNet outperforms all existing methods on SUN-RGBD, and at the time of submission it outperforms all published works on the KITTI benchmark. We will make the code of F-ConvNet publicly available.
https://arxiv.org/abs/1903.01864
Urban environments pose a significant challenge for autonomous vehicles (AVs) as they must safely navigate while in close proximity to many pedestrians. It is crucial for the AV to correctly understand and predict the future trajectories of pedestrians to avoid collision and plan a safe path. Deep neural networks (DNNs) have shown promising results in accurately predicting pedestrian trajectories, relying on large amounts of annotated real-world data to learn pedestrian behavior. However, collecting and annotating these large real-world pedestrian datasets is costly in both time and labor. This paper describes a novel method using a stochastic sampling-based simulation to train DNNs for pedestrian trajectory prediction with social interaction. Our novel simulation method can generate vast amounts of automatically-annotated, realistic, and naturalistic synthetic pedestrian trajectories based on small amounts of real annotation. We then use such synthetic trajectories to train an off-the-shelf state-of-the-art deep learning approach Social GAN (Generative Adversarial Network) to perform pedestrian trajectory prediction. Our proposed architecture, trained only using synthetic trajectories, achieves better prediction results compared to those trained on human-annotated real-world data using the same network. Our work demonstrates the effectiveness and potential of using simulation as a substitution for human annotation efforts to train high-performing prediction algorithms such as the DNNs.
http://arxiv.org/abs/1903.01860
In Computer Vision, finding simple features is performed using classifiers called interest point (IP) detectors, which are often utilised to track features as the scene changes. For 2D-based classifiers it has been intuitive to measure repeated point reliability using 2D metrics, given the difficulty of establishing ground truth beyond 2D. The aim is to bridge the gap between 2D classifiers and 3D environments, and improve performance analysis of 2D IP classification on 3D objects. This paper builds on existing work with 3D scanned and artificial models to test conventional 2D feature detectors with the assistance of virtualised 3D scenes. Virtual scene depth is leveraged in tests to perform pre-selection of the closest repeatable points in both 2D and 3D contexts before repeatability is measured. This more reliable ground truth is used to analyse testing configurations with a single-model and a 12-model dataset across affine transforms in x, y and z rotation, as well as x and y scaling, with 9 well-known IP detectors. The virtual scene's ground truth demonstrates that 3D pre-selection eliminates a large portion of false positives that are normally considered repeated in 2D configurations. The results indicate that 3D virtual environments can provide assistance in comparing the performance of conventional detectors when extending their applications to 3D environments, and can result in better classification of features when testing prospective classifiers' performance. A ROC-based informedness measure also highlights tradeoffs in 2D/3D performance compared to conventional repeatability measures.
https://arxiv.org/abs/1903.01828
HexagDLy is a Python-library extending the PyTorch deep learning framework with convolution and pooling operations on hexagonal grids. It aims to ease the access to convolutional neural networks for applications that rely on hexagonally sampled data as, for example, commonly found in ground-based astroparticle physics experiments.
https://arxiv.org/abs/1903.01814