Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Implicit Generation and Generalization in Energy-Based Models

2019-03-20

Yilun Du, Igor Mordatch

arXiv_CV

arXiv_CV Adversarial GAN Classification
Abstract

Energy-based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have traditionally been difficult to train. We present techniques to scale MCMC-based EBM training on continuous neural networks, and show its success on the high-dimensional data domains of ImageNet32x32, ImageNet128x128, CIFAR-10, and robotic hand trajectories, achieving significantly better samples than other likelihood models and on par with contemporary GAN approaches, while covering all modes of the data. We highlight unique capabilities of implicit generation, such as energy compositionality and corrupt image reconstruction and completion. Finally, we show that EBMs generalize well and are able to achieve state-of-the-art out-of-distribution classification, exhibit adversarially robust classification, coherent long-term predicted trajectory roll-outs, and generate zero-shot compositions of models.
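
To make "MCMC-based" sampling from an energy function concrete, here is a minimal sketch. The abstract does not name the sampler, so the use of unadjusted Langevin dynamics is an assumption, as are the toy quadratic energy, the step size, and every function name; a real EBM would replace `toy_grad_energy` with the gradient of a neural network's energy output.

```python
import numpy as np

def langevin_sample(grad_energy, x0, step_size=0.01, n_steps=1000, rng=None):
    """Unadjusted Langevin dynamics: x <- x - (eps/2) * grad E(x) + sqrt(eps) * noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x

# Hypothetical toy energy E(x) = 0.5 * ||x - MU||^2, whose Boltzmann
# distribution exp(-E) is a unit-variance Gaussian centred at MU.
MU = np.array([2.0, -1.0])

def toy_grad_energy(x):
    """Analytic gradient of the toy energy (stands in for autodiff on a network)."""
    return x - MU

def run_toy_chains(n_chains=2000, seed=0):
    """Run many independent chains in parallel, all started at the origin."""
    rng = np.random.default_rng(seed)
    x0 = np.zeros((n_chains, 2))
    return langevin_sample(toy_grad_energy, x0, step_size=0.05, n_steps=500, rng=rng)
```

Calling `run_toy_chains()` returns one final state per chain; their empirical mean and spread should sit close to those of the Gaussian defined by the toy energy.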

URL

http://arxiv.org/abs/1903.08689

PDF

http://arxiv.org/pdf/1903.08689
Im2Pencil: Controllable Pencil Illustration from Photographs

2019-03-20

Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang

arXiv_CV
Abstract

We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style. This is a challenging task due to multiple stroke types (e.g., outline and shading), structural complexity of pencil shading (e.g., hatching), and the lack of aligned training data pairs. To address these challenges, we develop a two-branch model that learns separate filters for generating sketchy outlines and tonal shading from a collection of pencil drawings. We create training data pairs by extracting clean outlines and tonal illustrations from original pencil drawings using image filtering techniques, and we manually label the drawing styles. In addition, our model creates different pencil styles (e.g., line sketchiness and shading style) in a user-controllable manner. Experimental results on different types of pencil drawings show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and user evaluations.

URL

http://arxiv.org/abs/1903.08682

PDF

http://arxiv.org/pdf/1903.08682
Probing the Need for Visual Context in Multimodal Machine Translation

2019-03-20

Ozan Caglayan, Pranava Madhyastha, Lucia Specia, Loïc Barrault

arXiv_CL
Abstract

Current work on multimodal machine translation (MMT) has suggested that the visual modality is either unnecessary or only marginally beneficial. We posit that this is a consequence of the very simple, short and repetitive sentences used in the only available dataset for the task (Multi30K), rendering the source text sufficient as context. In the general case, however, we believe that it is possible to combine visual and textual information in order to ground translations. In this paper we probe the contribution of the visual modality to state-of-the-art MMT models by conducting a systematic analysis where we partially deprive the models of source-side textual context. Our results show that under limited textual context, models are capable of leveraging the visual input to generate better translations. This contradicts the current belief that MMT models disregard the visual modality because of either the quality of the image features or the way they are integrated into the model.

URL

http://arxiv.org/abs/1903.08678

PDF

http://arxiv.org/pdf/1903.08678
Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification

2019-03-20

Rodrigo Minetto, Mauricio Pamplona Segundo, Sudeep Sarkar

arXiv_CV

arXiv_CV CNN Optimization Classification
Abstract

In this paper we describe Hydra, an ensemble of convolutional neural networks (CNN) for geospatial land classification. The idea behind Hydra is to create an initial CNN that is coarsely optimized but provides a good starting point for further optimization, which will serve as the Hydra’s body. Then, the obtained weights are fine-tuned multiple times with different augmentation techniques, crop styles, and class weights to form an ensemble of CNNs that represent the Hydra’s heads. By doing so, we prompt convergence to different endpoints, which is a desirable aspect for ensembles. With this framework, we were able to reduce the training time while maintaining the classification performance of the ensemble. We created ensembles for our experiments using two state-of-the-art CNN architectures, ResNet and DenseNet. We have demonstrated the application of our Hydra framework on two datasets, FMOW and NWPU-RESISC45, achieving results comparable to the state-of-the-art for the former and the best reported performance so far for the latter. Code and CNN models are available at https://github.com/maups/hydra-fmow

URL

http://arxiv.org/abs/1802.03518

PDF

http://arxiv.org/pdf/1802.03518
Online continual learning with no task boundaries

2019-03-20

Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio

arXiv_AI

arXiv_AI Optimization
Abstract

Continual learning is the ability of an agent to learn online with a non-stationary and never-ending stream of data. A key component for such a never-ending learning process is to overcome the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The solutions developed so far often relax the problem of continual learning to the easier task-incremental setting, where the stream of data is divided into tasks with clear boundaries. In this paper, we break the limits and move to the more challenging online setting where we assume no information about tasks in the data stream. We start from the idea that each learning step should not increase the losses of the previously learned examples, which we enforce by constraining the optimization process. This means that the number of constraints grows linearly with the number of examples, which is a serious limitation. We develop a solution to select a fixed number of constraints that we use to approximate the feasible region defined by the original constraints. We compare our approach against the methods that rely on task boundaries to select a fixed set of examples, and show comparable or even better results, especially when the boundaries are blurry or when the data distributions are imbalanced.

URL

http://arxiv.org/abs/1903.08671

PDF

http://arxiv.org/pdf/1903.08671
Photometric Mesh Optimization for Video-Aligned 3D Object Reconstruction

2019-03-20

Chen-Hsuan Lin, Oliver Wang, Bryan C. Russell, Eli Shechtman, Vladimir G. Kim, Matthew Fisher, Simon Lucey

arXiv_CV

arXiv_CV Face Optimization
Abstract

In this paper, we address the problem of 3D object mesh reconstruction from RGB videos. Our approach combines the best of multi-view geometric and data-driven methods for 3D reconstruction by optimizing object meshes for multi-view photometric consistency while constraining mesh deformations with a shape prior. We pose this as a piecewise image alignment problem for each mesh face projection. Our approach allows us to update shape parameters from the photometric error without any depth or mask information. Moreover, we show how to avoid a degeneracy of zero photometric gradients via rasterizing from a virtual viewpoint. We demonstrate 3D object mesh reconstruction results from both synthetic and real-world videos with our photometric mesh optimization, which is unachievable with either naïve mesh generation networks or traditional pipelines of surface reconstruction without heavy manual post-processing.

URL

http://arxiv.org/abs/1903.08642

PDF

http://arxiv.org/pdf/1903.08642
An Optimal Task Allocation Strategy for Heterogeneous Multi-Robot Systems

2019-03-20

Gennaro Notomista, Siddharth Mayya, Seth Hutchinson, Magnus Egerstedt

arXiv_RO
Abstract

For a team of heterogeneous robots executing multiple tasks, we propose a novel algorithm to optimally allocate tasks to robots while accounting for their different capabilities. Motivated by the need, in many real-world applications, for robot teams to remain operational for long periods of time, we allow each robot to choose tasks by taking into account the energy consumed in executing them, in addition to the global specifications on the task allocation. The tasks are encoded as constraints in an energy minimization problem solved at each point in time by each robot. The prioritization of a task over others (effectively signifying the allocation of the task to that particular robot) occurs via the introduction of slack variables in the task constraints. Moreover, the suitability of certain robots for certain tasks is also taken into account to generate a task allocation algorithm for a team of robots with heterogeneous capabilities. The efficacy of the developed approach is demonstrated both in simulation and on a team of real robots.

URL

http://arxiv.org/abs/1903.08641

PDF

http://arxiv.org/pdf/1903.08641
An Efficient Schmidt-EKF for 3D Visual-Inertial SLAM

2019-03-20

Patrick Geneva, James Maley, Guoquan Huang

arXiv_CV

arXiv_CV Tracking SLAM Relation
Abstract

Enabling centimeter-accuracy positioning for mobile and wearable sensor systems has great implications for practical applications. In this paper, we propose a novel, high-precision, efficient visual-inertial (VI)-SLAM algorithm, termed Schmidt-EKF VI-SLAM (SEVIS), which optimally fuses IMU measurements and monocular images in a tightly-coupled manner to provide 3D motion tracking with bounded error. In particular, we adapt the Schmidt Kalman filter formulation to selectively include informative features in the state vector while treating them as nuisance parameters (or Schmidt states) once they have matured. This change in modeling allows for significant computational savings by no longer needing to constantly update the Schmidt states (or their covariance), while still allowing the EKF to correctly account for their cross-correlations with the active states. As a result, we achieve linear computational complexity in terms of map size, instead of quadratic as in the standard SLAM systems. In order to fully exploit the map information to bound navigation drifts, we advocate efficient keyframe-aided 2D-to-2D feature matching to find reliable correspondences between current 2D visual measurements and 3D map features. The proposed SEVIS is extensively validated in both simulations and experiments.
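
The idea of leaving the Schmidt states and their covariance untouched while still tracking their cross-correlations can be sketched with the textbook Schmidt-Kalman measurement update. This is a generic illustration rather than the SEVIS implementation: the partitioning of the state into an "active" block followed by a "Schmidt" block, the function name, and all dimensions are assumptions.

```python
import numpy as np

def schmidt_update(x_a, P, H_a, H_s, r, R):
    """Schmidt-Kalman update: the gain for the Schmidt (nuisance) block is forced to zero.

    x_a : active state estimate, shape (n_a,)
    P   : joint covariance of [active, Schmidt] states
    H_a : measurement Jacobian w.r.t. active states
    H_s : measurement Jacobian w.r.t. Schmidt states
    r   : measurement residual
    R   : measurement noise covariance
    """
    n_a = x_a.shape[0]
    P_aa, P_as, P_ss = P[:n_a, :n_a], P[:n_a, n_a:], P[n_a:, n_a:]
    H = np.hstack([H_a, H_s])
    S = H @ P @ H.T + R                      # innovation covariance
    L_a = P_aa @ H_a.T + P_as @ H_s.T        # active rows of P @ H.T
    K_a = np.linalg.solve(S, L_a.T).T        # K_a = L_a @ inv(S), using S symmetric
    x_a_new = x_a + K_a @ r                  # only the active states are corrected
    P_new = P.copy()
    P_new[:n_a, :n_a] = P_aa - K_a @ S @ K_a.T
    P_as_new = P_as - K_a @ (H_a @ P_as + H_s @ P_ss)
    P_new[:n_a, n_a:] = P_as_new             # cross-correlations are still updated
    P_new[n_a:, :n_a] = P_as_new.T
    return x_a_new, P_new                    # the Schmidt block of P is unchanged

def demo(seed=0):
    """Random, hypothetical problem with 3 active states, 4 Schmidt states, 2 measurements."""
    rng = np.random.default_rng(seed)
    n_a, n_s, m = 3, 4, 2
    A = rng.standard_normal((n_a + n_s, n_a + n_s))
    P = A @ A.T + np.eye(n_a + n_s)
    H_a = rng.standard_normal((m, n_a))
    H_s = rng.standard_normal((m, n_s))
    R = 0.1 * np.eye(m)
    r = rng.standard_normal(m)
    x_new, P_new = schmidt_update(np.zeros(n_a), P, H_a, H_s, r, R)
    return P, x_new, P_new, n_a
```

The Schmidt block of the covariance is never rewritten, which is the source of the savings the abstract describes; only the active block and the cross terms change.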

URL

http://arxiv.org/abs/1903.08636

PDF

http://arxiv.org/pdf/1903.08636
Closed-form Preintegration Methods for Graph-based Visual-Inertial Navigation

2019-03-20

Kevin Eckenhoff, Patrick Geneva, Guoquan Huang

arXiv_RO

arXiv_RO Optimization
Abstract

In this paper we propose a new analytical preintegration theory for graph-based sensor fusion with an inertial measurement unit (IMU) and a camera (or other aiding sensors). Rather than using discrete sampling of the measurement dynamics as in current methods, we derive the closed-form solutions to the preintegration equations, yielding improved accuracy in state estimation. We advocate two new different inertial models for preintegration: (i) the model that assumes piecewise constant measurements, and (ii) the model that assumes piecewise constant local true acceleration. We show through extensive Monte-Carlo simulations the effect that the choice of preintegration model has on estimation performance. To validate the proposed preintegration theory, we develop both direct and indirect visual-inertial navigation systems (VINS) that leverage our preintegration. In the first, within a tightly-coupled, sliding-window optimization framework, we jointly estimate the features in the window and the IMU states while performing marginalization to bound the computational cost. In the second, we loosely-couple the IMU preintegration with a direct image alignment that estimates relative camera motion by minimizing the photometric errors (i.e., image intensity difference), allowing for efficient and informative loop closures. Both systems are extensively validated in real-world experiments and are shown to offer competitive performance to state-of-the-art methods.

URL

http://arxiv.org/abs/1805.02774

PDF

http://arxiv.org/pdf/1805.02774
Plug and play methods for magnetic resonance imaging

2019-03-20

Rizwan Ahmad, Charles A. Bouman, Gregery T. Buzzard, Stanley Chan, Edward T. Reehorst, Philip Schniter

arXiv_CV

arXiv_CV Review Deep_Learning
Abstract

Magnetic Resonance Imaging (MRI) is a non-invasive diagnostic tool that provides excellent soft-tissue contrast without the use of ionizing radiation. But, compared to other clinical imaging modalities (e.g., CT or ultrasound), the data acquisition process for MRI is inherently slow. Furthermore, dynamic applications demand collecting a series of images in quick succession. As a result, reducing acquisition time and improving imaging quality for undersampled datasets have been active areas of research for the last two decades. The combination of parallel imaging and compressive sensing (CS) has been shown to benefit a wide range of MRI applications. More recently, deep learning techniques have been shown to outperform CS methods. Some of these techniques pose the MRI reconstruction as a direct inversion problem and tackle it by training a deep neural network (DNN) to map from the measured Fourier samples to the final image. Considering that the forward model in MRI changes from one dataset to the next, such methods have to be either trained over a large and diverse corpus of data or limited to a specific application, and even then they cannot ensure data consistency. An alternative is to use “plug-and-play” (PnP) algorithms, which iterate image denoising with forward-model based signal recovery. PnP algorithms are an excellent fit for compressive MRI because they decouple image modeling from the forward model, which can change significantly among different scans due to variations in the coil sensitivity maps, sampling patterns, and image resolution. Consequently, with PnP, state-of-the-art image-denoising techniques, such as those based on DNNs, can be directly exploited for compressive MRI image reconstruction. The objective of this article is two-fold: i) to review recent advances in plug-and-play methods, and ii) to discuss their application to compressive MRI image reconstruction.
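
The alternation between "image denoising" and "forward-model based signal recovery" can be illustrated with a plug-and-play proximal-gradient loop. This is a toy sketch under several assumptions: the forward model is a small real-valued random matrix rather than an MRI Fourier/coil operator, the plugged-in denoiser is plain soft-thresholding rather than a DNN, and all names and constants are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Stand-in denoiser: shrinks small entries to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pnp_proximal_gradient(A, y, denoise, step, n_iters=200):
    """Alternate a data-consistency gradient step with a pluggable denoiser."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)       # gradient of 0.5 * ||A x - y||^2
        x = denoise(x - step * grad)   # the denoiser takes the place of a proximal operator
    return x

def demo(seed=0):
    """Recover a sparse vector from 40 random measurements of a length-100 signal."""
    rng = np.random.default_rng(seed)
    n, m, k = 100, 40, 5
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    y = A @ x_true
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (largest singular value)^2
    lam = 0.01                               # hypothetical regularization weight
    x_hat = pnp_proximal_gradient(A, y, lambda v: soft_threshold(v, step * lam), step)
    return A, y, x_hat
```

Because the forward operator `A` appears only in the gradient step, the same denoiser can be reused when `A` changes, which is the decoupling the abstract highlights.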

URL

http://arxiv.org/abs/1903.08616

PDF

http://arxiv.org/pdf/1903.08616
Engaging Image Captioning Via Personality

2019-03-20

Kurt Shuster, Samuel Humeau, Hexiang Hu, Antoine Bordes, Jason Weston

arXiv_CV

arXiv_CV Image_Caption Caption
Abstract

Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone and (to a human) state the obvious (e.g., “a man playing a guitar”). While such tasks are useful to verify that a machine understands the content of an image, they are not engaging to humans as captions. With this in mind we define a new task, Personality-Captions, where the goal is to be as engaging to humans as possible by incorporating controllable style and personality traits. We collect and release a large dataset of 201,858 such captions conditioned over 215 possible traits. We build models that combine existing work from (i) sentence representations (Mazare et al., 2018) with Transformers trained on 1.7 billion dialogue examples; and (ii) image representations (Mahajan et al., 2018) with ResNets trained on 3.5 billion social media images. We obtain state-of-the-art performance on Flickr30k and COCO, and strong performance on our new task. Finally, online evaluations validate that our task and models are engaging to humans, with our best model close to human performance.

URL

https://arxiv.org/abs/1810.10665

PDF

https://arxiv.org/pdf/1810.10665
Reinforcing Classical Planning for Adversary Driving Scenarios

2019-03-20

Nazmus Sakib, Hengshuai Yao, Hong Zhang

arXiv_AI

arXiv_AI Knowledge Reinforcement_Learning
Abstract

Adversary scenarios in driving, where the other vehicles may make mistakes or have a competing or malicious intent, have to be studied not only for our safety but also to address the concerns of the public in order to push the technology forward. Classical planning solutions for adversary driving do not exist so far, especially when the vehicles do not communicate their intent. Given recent success in solving hard problems in artificial intelligence (AI), it is worth studying the potential of reinforcement learning for safe driving in adversary settings. In most recent reinforcement learning applications, a deep neural network maps an input state to an optimal policy over primitive actions. However, learning a policy over primitive actions is very difficult and inefficient. On the other hand, the knowledge already learned in classical planning methods should be inherited and reused. To take advantage of both reinforcement learning, which is good at exploring the action space for safety, and classical planning skill models, which are good at handling most driving scenarios, we propose to learn a policy over an action space of primitive actions augmented with classical planning methods. We show two advantages of doing so. First, training this reinforcement learning agent is easier and faster than training the primitive-action agent. Second, our new agent outperforms the primitive-action reinforcement learning agent, human testers, and the classical planning methods that our agent queries as skills.

URL

http://arxiv.org/abs/1903.08606

PDF

http://arxiv.org/pdf/1903.08606
Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet

2019-03-20

Wieland Brendel, Matthias Bethge

arXiv_CV

arXiv_CV Classification Deep_Learning
Abstract

Deep Neural Networks (DNNs) excel on many complex perceptual tasks but it has proven notoriously difficult to understand how they reach their decisions. We here introduce a high-performance DNN architecture on ImageNet whose decisions are considerably easier to explain. Our model, a simple variant of the ResNet-50 architecture called BagNet, classifies an image based on the occurrences of small local image features without taking into account their spatial ordering. This strategy is closely related to the bag-of-feature (BoF) models popular before the onset of deep learning and reaches a surprisingly high accuracy on ImageNet (87.6% top-5 for 33 x 33 px features and AlexNet performance for 17 x 17 px features). The constraint on local features makes it straightforward to analyse how exactly each part of the image influences the classification. Furthermore, the BagNets behave similarly to state-of-the-art deep neural networks such as VGG-16, ResNet-152 or DenseNet-169 in terms of feature sensitivity, error distribution and interactions between image parts. This suggests that the improvements of DNNs over previous bag-of-feature classifiers in the last few years are mostly achieved by better fine-tuning rather than by qualitatively different decision strategies.
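
Classifying from local features "without taking into account their spatial ordering" can be shown with a small sketch: each patch gets its own class logits and the image-level logits are their average, so shuffling the patches changes nothing. The linear per-patch classifier below is a stand-in assumption for the paper's ResNet-style feature extractor, and the image size, patch size, stride, and names are all hypothetical.

```python
import numpy as np

def extract_patches(image, patch, stride):
    """Cut an (H, W, C) image into flattened patch vectors."""
    height, width = image.shape[:2]
    rows = range(0, height - patch + 1, stride)
    cols = range(0, width - patch + 1, stride)
    return np.stack([image[i:i + patch, j:j + patch].ravel() for i in rows for j in cols])

def bag_of_local_features(patches, weights, bias):
    """Score every patch independently, then average the scores over all patches."""
    patch_logits = patches @ weights + bias   # shape (n_patches, n_classes)
    return patch_logits.mean(axis=0), patch_logits

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                          # random stand-in image
patches = extract_patches(image, patch=9, stride=8)      # 3 x 3 grid of 9 x 9 patches
weights = rng.standard_normal((patches.shape[1], 10))    # untrained, illustrative classifier
bias = np.zeros(10)

image_logits, patch_logits = bag_of_local_features(patches, weights, bias)
# Shuffling the patch order leaves the image-level logits unchanged.
shuffled_logits, _ = bag_of_local_features(rng.permutation(patches), weights, bias)
```

The per-patch array `patch_logits` is what makes this kind of model easy to inspect: each row records how much one image region contributed to each class.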

URL

http://arxiv.org/abs/1904.00760

PDF

http://arxiv.org/pdf/1904.00760
DC-SPP-YOLO: Dense Connection and Spatial Pyramid Pooling Based YOLO for Object Detection

2019-03-20

Zhanchao Huang, Jianlin Wang

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

Although the YOLOv2 approach is extremely fast at object detection, its backbone network has a limited ability to extract features and fails to make full use of multi-scale local region features, which restricts the improvement of object detection accuracy. Therefore, this paper proposes DC-SPP-YOLO (Dense Connection and Spatial Pyramid Pooling Based YOLO), an approach for improving the object detection accuracy of YOLOv2. Specifically, dense connections between convolution layers are employed in the backbone network of YOLOv2 to strengthen feature extraction and alleviate the vanishing-gradient problem. Moreover, an improved spatial pyramid pooling is introduced to pool and concatenate the multi-scale local region features, so that the network can learn the object features more comprehensively. The DC-SPP-YOLO model is established and trained with a new loss function composed of mean square error and cross entropy, and is then used for object detection. Experiments demonstrate that the mAP (mean Average Precision) of DC-SPP-YOLO on the PASCAL VOC and UA-DETRAC datasets is higher than that of YOLOv2; by strengthening feature extraction and using the multi-scale local region features, DC-SPP-YOLO achieves higher object detection accuracy than YOLOv2.
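
To illustrate what "pool and concatenate the multi-scale local region features" can look like, here is a NumPy sketch of a spatial pyramid pooling block: stride-1 max pooling at several window sizes, concatenated with the input along the channel axis. The kernel sizes (5, 9, 13) and the function names are assumptions for illustration; the abstract does not specify the details of the paper's improved variant.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_same(x, k):
    """Stride-1 max pooling over a (C, H, W) float array that keeps H and W (odd k)."""
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)),
                    mode="constant", constant_values=-np.inf)
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return windows.max(axis=(-2, -1))

def spatial_pyramid_pool(x, kernel_sizes=(5, 9, 13)):
    """Concatenate the input with its max-pooled versions along the channel axis."""
    pooled = [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate([x] + pooled, axis=0)

# Hypothetical 8-channel, 13 x 13 feature map.
feature_map = np.random.default_rng(0).random((8, 13, 13))
spp_out = spatial_pyramid_pool(feature_map)
```

Each pooled copy summarizes a different neighbourhood size around every location, so the concatenated output has four times as many channels as the input while keeping its spatial size.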

URL

http://arxiv.org/abs/1903.08589

PDF

http://arxiv.org/pdf/1903.08589
A Polynomial-time Solution for Robust Registration with Extreme Outlier Rates

2019-03-20

Heng Yang, Luca Carlone

arXiv_CV

arXiv_CV Optimization
Abstract

We propose a robust approach for the registration of two sets of 3D points in the presence of a large number of outliers. Our first contribution is to reformulate the registration problem using a Truncated Least Squares (TLS) cost that makes the estimation insensitive to a large fraction of spurious point-to-point correspondences. The second contribution is a general framework to decouple rotation, translation, and scale estimation, which allows solving in cascade for the three transformations. Since each subproblem (scale, rotation, and translation estimation) is still non-convex and combinatorial in nature, our third contribution is to show that (i) TLS scale and (component-wise) translation estimation can be solved exactly and in polynomial time via an adaptive voting scheme, (ii) TLS rotation estimation can be relaxed to a semidefinite program and the relaxation is tight in practice, even in the presence of an extreme number of outliers. We validate the proposed algorithm, named TEASER (Truncated least squares Estimation And SEmidefinite Relaxation), in standard registration benchmarks showing that the algorithm outperforms RANSAC and robust local optimization techniques, and favorably compares with Branch-and-Bound methods, while being a polynomial-time algorithm. TEASER can tolerate up to 99% outliers and returns highly-accurate solutions.
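
For a single scalar unknown, the Truncated Least Squares cost can be minimized by enumerating consensus sets, which is the intuition behind a voting scheme. The sketch below is a simplified illustration with equal noise bounds and a quadratic-time loop; it is not the TEASER implementation, and the function name, the shared `bound`, and the sample data are assumptions.

```python
import numpy as np

def tls_scalar_estimate(measurements, bound):
    """Minimize sum_k min((s - s_k)^2, bound^2) over a scalar s.

    Each measurement votes for the interval [s_k - bound, s_k + bound]. Between two
    consecutive interval endpoints the inlier set is constant, so it suffices to try
    one candidate (the inlier mean) per segment and keep the cheapest.
    """
    s = np.asarray(measurements, dtype=float)
    endpoints = np.sort(np.concatenate([s - bound, s + bound]))
    midpoints = 0.5 * (endpoints[:-1] + endpoints[1:])
    best_estimate, best_cost = None, np.inf
    for m in midpoints:
        inliers = np.abs(s - m) <= bound
        if not inliers.any():
            continue                                   # gap between clusters: no votes
        estimate = s[inliers].mean()                   # least-squares fit to this consensus set
        cost = np.minimum((s - estimate) ** 2, bound ** 2).sum()
        if cost < best_cost:
            best_estimate, best_cost = estimate, cost
    return best_estimate, best_cost

# Three consistent measurements near 1.0 and three gross outliers.
estimate, cost = tls_scalar_estimate([1.0, 1.1, 0.9, 5.0, -3.0, 10.0], bound=0.3)
```

Each outlier contributes only the constant `bound ** 2` to the cost, so it cannot drag the estimate away from the cluster of consistent measurements.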

URL

http://arxiv.org/abs/1903.08588

PDF

http://arxiv.org/pdf/1903.08588
Data Augmentation for Leaf Segmentation and Counting Tasks in Rosette Plants

2019-03-20

Dmitry Kuznichov, Alon Zvirin, Yaron Honen, Ron Kimmel

arXiv_CV

arXiv_CV Segmentation Classification Deep_Learning
Abstract

Deep learning techniques involving image processing and data analysis are constantly evolving. Many domains adapt these techniques for object segmentation, instantiation and classification. Recently, agricultural industries adopted those techniques in order to bring automation to farmers around the globe. One analysis procedure required for automatic visual inspection in this domain is leaf count and segmentation. Collecting labeled data from field crops and greenhouses is a complicated task due to the large variety of crops, growth seasons, climate changes, phenotype diversity, and more, especially when specific learning tasks require a large amount of labeled data for training. Data augmentation for training deep neural networks is well established; examples include data synthesis, using generative semi-synthetic models, and applying various kinds of transformations. In this paper we propose a method that preserves the geometric structure of the data objects, thus keeping the physical appearance of the dataset as close as possible to imaged plants in real agricultural scenes. The proposed method provides state-of-the-art results when applied to the standard benchmark in the field, namely, the ongoing Leaf Segmentation Challenge hosted by Computer Vision Problems in Plant Phenotyping.

URL

http://arxiv.org/abs/1903.08583

PDF

http://arxiv.org/pdf/1903.08583
Single Image Deraining: A Comprehensive Benchmark Analysis

2019-03-20

Siyuan Li, Iago Breno Araujo, Wenqi Ren, Zhangyang Wang, Eric K. Tokuda, Roberto Hirata Junior, Roberto Cesar-Junior, Jiawan Zhang, Xiaojie Guo, Xiaochun Cao

arXiv_CV
Abstract

We present a comprehensive study and evaluation of existing single image deraining algorithms, using a new large-scale benchmark consisting of both synthetic and real-world rainy images. This dataset highlights diverse data sources and image contents, and is divided into three subsets (rain streak, rain drop, rain and mist), each serving different training or evaluation purposes. We further provide a rich variety of criteria for deraining algorithm evaluation, ranging from full-reference metrics, to no-reference metrics, to subjective evaluation and the novel task-driven evaluation. Experiments on the dataset shed light on the comparisons and limitations of state-of-the-art deraining algorithms, and suggest promising future directions.

URL

http://arxiv.org/abs/1903.08558

PDF

http://arxiv.org/pdf/1903.08558
OCGAN: One-class Novelty Detection Using GANs with Constrained Latent Representations

2019-03-20

Pramuditha Perera, Ramesh Nallapati, Bing Xiang

arXiv_CV

arXiv_CV Adversarial GAN Detection
Abstract

We present a novel model called OCGAN for the classical problem of one-class novelty detection, where, given a set of examples from a particular class, the goal is to determine if a query example is from the same class. Our solution is based on learning latent representations of in-class examples using a denoising auto-encoder network. The key contribution of our work is our proposal to explicitly constrain the latent space to exclusively represent the given class. In order to accomplish this goal, firstly, we force the latent space to have bounded support by introducing a tanh activation in the encoder’s output layer. Secondly, using a discriminator in the latent space that is trained adversarially, we ensure that encoded representations of in-class examples resemble uniform random samples drawn from the same bounded space. Thirdly, using a second adversarial discriminator in the input space, we ensure all randomly drawn latent samples generate examples that look real. Finally, we introduce a gradient-descent based sampling technique that explores points in the latent space that generate potential out-of-class examples, which are fed back to the network to further train it to generate in-class examples from those points. The effectiveness of the proposed method is measured across four publicly available datasets using two one-class novelty detection protocols where we achieve state-of-the-art results.

URL

http://arxiv.org/abs/1903.08550

PDF

http://arxiv.org/pdf/1903.08550
Convolutional Sparse Coding for Compressed Sensing CT Reconstruction

2019-03-20

Peng Bao, Wenjun Xia, Kang Yang, Weiyan Chen, Mianyi Chen, Yan Xi, Shanzhou Niu, Jiliu Zhou, He Zhang, Huaiqiang Sun, Zhangyang Wang, Yi Zhang

arXiv_CV

arXiv_CV Sparse CNN Quantitative
Abstract

Over the past few years, dictionary learning (DL)-based methods have been successfully used in various image reconstruction problems. However, traditional DL-based computed tomography (CT) reconstruction methods are patch-based and ignore the consistency of pixels in overlapped patches. In addition, the features learned by these methods always contain shifted versions of the same features. In recent years, convolutional sparse coding (CSC) has been developed to address these problems. In this paper, inspired by several successful applications of CSC in the field of signal processing, we explore the potential of CSC in sparse-view CT reconstruction. By directly working on the whole image, without the necessity of dividing the image into overlapped patches in DL-based methods, the proposed methods can maintain more details and avoid artifacts caused by patch aggregation. With predetermined filters, an alternating scheme is developed to optimize the objective function. Extensive experiments with simulated and real CT data were performed to validate the effectiveness of the proposed methods. Qualitative and quantitative results demonstrate that the proposed methods achieve better performance than several existing state-of-the-art methods.

URL

http://arxiv.org/abs/1903.08549

PDF

http://arxiv.org/pdf/1903.08549
Dilated deeply supervised networks for hippocampus segmentation in MRI

2019-03-20

Lukas Folle, Sulaiman Vesal, Nishant Ravikumar, Andreas Maier

arXiv_AI

arXiv_AI Segmentation CNN Semantic_Segmentation
Abstract

Tissue loss in the hippocampi has been heavily correlated with the progression of Alzheimer’s Disease (AD). The shape and structure of the hippocampus are important factors in terms of early AD diagnosis and prognosis by clinicians. However, manual segmentation of such subcortical structures in MR studies is a challenging and subjective task. In this paper, we investigate variants of the well-known 3D U-Net, a type of convolutional neural network (CNN) for semantic segmentation tasks. We propose an alternative form of the 3D U-Net, which uses dilated convolutions and deep supervision to incorporate multi-scale information into the model. The proposed method is evaluated on the task of hippocampus head and body segmentation in an MRI dataset, provided as part of the MICCAI 2018 segmentation decathlon challenge. The experimental results show that our approach outperforms other conventional methods in terms of different segmentation accuracy metrics.

URL

http://arxiv.org/abs/1903.09097

PDF

http://arxiv.org/pdf/1903.09097
Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression

2019-03-20

Maurice Quach, Giuseppe Valenzise, Frederic Dufaux

arXiv_CV

arXiv_CV Face CNN Optimization Classification
Abstract

Efficient point cloud compression is fundamental to enable the deployment of virtual and mixed reality applications, since the number of points to code can range in the order of millions. In this paper, we present a novel data-driven geometry compression method for static point clouds based on learned convolutional transforms and uniform quantization. We perform joint optimization of both rate and distortion using a trade-off parameter. In addition, we cast the decoding process as a binary classification of the point cloud occupancy map. Our method outperforms the MPEG reference solution in terms of rate-distortion on the Microsoft Voxelized Upper Bodies dataset with 51.5% BDBR savings on average. Moreover, while octree-based methods face exponential diminution of the number of points at low bitrates, our method still produces high resolution outputs even at low bitrates.

URL

http://arxiv.org/abs/1903.08548

PDF

http://arxiv.org/pdf/1903.08548
Learning Gentle Object Manipulation with Curiosity-Driven Deep Reinforcement Learning

2019-03-20

Sandy H. Huang, Martina Zambelli, Jackie Kay, Murilo F. Martins, Yuval Tassa, Patrick M. Pilarski, Raia Hadsell

arXiv_RO

arXiv_RO Sparse Reinforcement_Learning
Abstract

Robots must know how to be gentle when they need to interact with fragile objects, or when the robot itself is prone to wear and tear. We propose an approach that enables deep reinforcement learning to train policies that are gentle, both during exploration and task execution. In a reward-based learning environment, a natural approach involves augmenting the (task) reward with a penalty for non-gentleness, which can be defined as excessive impact force. However, augmenting with only this penalty impairs learning: policies get stuck in a local optimum which avoids all contact with the environment. Prior research has shown that combining auxiliary tasks or intrinsic rewards can be beneficial for stabilizing and accelerating learning in sparse-reward domains, and indeed we find that introducing a surprise-based intrinsic reward does avoid the no-contact failure case. However, we show that a simple dynamics-based surprise is not as effective as penalty-based surprise. Penalty-based surprise, based on predicting forceful contacts, has a further benefit: it encourages exploration which is contact-rich yet gentle. We demonstrate the effectiveness of the approach using a complex, tendon-powered robot hand with tactile sensors. Videos are available at this http URL
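The reward structure described above (task reward, minus a penalty for excessive impact force, plus a surprise-based intrinsic reward) can be sketched as below. Everything concrete here is an assumption made for illustration: the thresholded penalty, the squared prediction error used as penalty-based surprise, the weights and the function name.

```python
def shaped_reward(task_reward, impact_force, predicted_penalty,
                  force_threshold=1.0, penalty_weight=1.0, surprise_weight=1.0):
    """Combine task reward, gentleness penalty and intrinsic surprise.

    All functional forms below are illustrative assumptions.
    """
    # Penalty: only force above the threshold counts as non-gentle.
    penalty = max(0.0, impact_force - force_threshold)
    # Penalty-based surprise: how wrong the agent's penalty prediction was.
    surprise = (penalty - predicted_penalty) ** 2
    return task_reward - penalty_weight * penalty + surprise_weight * surprise


# Gentle contact that was predicted correctly: reward is unchanged.
print(shaped_reward(task_reward=1.0, impact_force=0.5, predicted_penalty=0.0))
# Forceful contact that was under-predicted: penalised, but also surprising.
print(shaped_reward(task_reward=1.0, impact_force=3.0, predicted_penalty=1.0))
```

The surprise term gives the agent a reason to keep making contact, which is the abstract's stated remedy for the no-contact local optimum.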

URL

http://arxiv.org/abs/1903.08542

PDF

http://arxiv.org/pdf/1903.08542
Segmentation-Based Deep-Learning Approach for Surface-Defect Detection

2019-03-20

Domen Tabernik, Samo Šela, Jure Skvarč, Danijel Skočaj

arXiv_CV

arXiv_CV Segmentation Face Detection
Abstract

Automated surface-anomaly detection using machine learning has become an interesting and promising area of research, with a very high and direct impact on the application domain of visual inspection. Deep-learning methods have become the most suitable approaches for this task. They allow the inspection system to learn to detect the surface anomaly by simply showing it a number of exemplar images. This paper presents a segmentation-based deep-learning architecture that is designed for the detection and segmentation of surface anomalies and is demonstrated on a specific domain of surface-crack detection. The design of the architecture enables the model to be trained using a small number of samples, which is an important requirement for practical applications. The proposed model is compared with related deep-learning methods, including state-of-the-art commercial software, showing that the proposed approach outperforms the related methods in the domain of surface-crack detection. The large number of experiments also sheds light on the required precision of the annotation, the number of required training samples and the required computational cost. Experiments are performed on a newly created dataset based on a real-world quality control case and demonstrate that the proposed approach is able to learn from a small number of defective surfaces, using only approximately 25-30 defective training samples, instead of the hundreds or thousands usually needed in deep-learning applications. This makes the deep-learning method practical for use in industry where the number of available defective samples is limited. The dataset is also made publicly available to encourage the development and evaluation of new methods for surface-defect detection.

URL

http://arxiv.org/abs/1903.08536

PDF

http://arxiv.org/pdf/1903.08536
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set

2019-03-20

Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, Xin Tong

arXiv_CV

arXiv_CV Face Deep_Learning
Abstract

Recently, deep learning based 3D face reconstruction methods have shown promising results in both quality and efficiency. However, training deep neural networks typically requires a large volume of data, whereas face images with ground-truth 3D face shapes are scarce. In this paper, we propose a novel deep 3D face reconstruction approach that 1) leverages a robust, hybrid loss function for weakly-supervised learning which takes into account both low-level and perception-level information for supervision, and 2) performs multi-image face reconstruction by exploiting complementary information from different images for shape aggregation. Our method is fast, accurate, and robust to occlusion and large pose. We provide comprehensive experiments on three datasets, systematically comparing our method with fifteen recent methods and demonstrating its state-of-the-art performance.

URL

http://arxiv.org/abs/1903.08527

PDF

http://arxiv.org/pdf/1903.08527
Ontology of Card Sleights

2019-03-20

Aaron Sterling

arXiv_AI

arXiv_AI Knowledge Face Ontology
Abstract

We present a machine-readable movement writing for sleight-of-hand moves with cards – a “Labanotation of card magic.” This scheme of movement writing contains 440 categories of motion, and appears to taxonomize all card sleights that have appeared in over 1500 publications. The movement writing is axiomatized in $\mathcal{SROIQ}$(D) Description Logic, and collected formally as an Ontology of Card Sleights, a computational ontology that extends the Basic Formal Ontology and the Information Artifact Ontology. The Ontology of Card Sleights is implemented in OWL DL, a Description Logic fragment of the Web Ontology Language. While ontologies have historically been used to classify at a less granular level, the algorithmic nature of card tricks allows us to transcribe a performer’s actions step by step. We conclude by discussing design criteria we have used to ensure the ontology can be accessed and modified with a simple click-and-drag interface. This may allow database searches and performance transcriptions by users with card magic knowledge, but no ontology background.

URL

http://arxiv.org/abs/1903.08523

PDF

http://arxiv.org/pdf/1903.08523
Augmenting expert detection of early coronary artery occlusion from 12 lead electrocardiograms using deep learning

2019-03-20

Rob Brisk, Raymond R Bond, Dewar D Finlay, James McLaughlin, Alicja Piadlo, Stephen J Leslie, David E Gossman, Ian B A Menown, David J McEneaney

arXiv_AI

arXiv_AI Deep_Learning Detection
Abstract

Early diagnosis of acute coronary artery occlusion based on electrocardiogram (ECG) findings is essential for prompt delivery of primary percutaneous coronary intervention. Current ST elevation (STE) criteria are specific but insensitive. Consequently, it is likely that many patients are missing out on potentially life-saving treatment. Experts combining non-specific ECG changes with STE detect ischaemia with higher sensitivity, but at the cost of specificity. We show that a deep learning model can detect ischaemia caused by acute coronary artery occlusion with a better balance of sensitivity and specificity than STE criteria, existing computerised analysers or expert cardiologists.

URL

http://arxiv.org/abs/1903.04421

PDF

http://arxiv.org/pdf/1903.04421
Artificial Intelligence : from Research to Application ; the Upper-Rhine Artificial Intelligence Symposium

2019-03-20

Andreas Christ, Franz Quint (eds.)

arXiv_AI

arXiv_AI Knowledge
Abstract

The TriRhenaTech alliance universities and their partners presented their competences in the field of artificial intelligence and their cross-border cooperation with industry at the tri-national conference ‘Artificial Intelligence : from Research to Application’ on March 13th, 2019 in Offenburg. The TriRhenaTech alliance is a network of universities in the Upper Rhine Trinational Metropolitan Region comprising the German universities of applied sciences in Furtwangen, Kaiserslautern, Karlsruhe, and Offenburg, the Baden-Wuerttemberg Cooperative State University Loerrach, the French university network Alsace Tech (comprising 14 ‘grandes écoles’ in the fields of engineering, architecture and management) and the University of Applied Sciences and Arts Northwestern Switzerland. The alliance’s common goal is to reinforce the transfer of knowledge, research, and technology, as well as the cross-border mobility of students.

URL

http://arxiv.org/abs/1903.08495

PDF

http://arxiv.org/pdf/1903.08495
CaosDB - Research Data Management for Complex, Changing, and Automated Research Workflows

2019-03-20

Timm Fitschen, Alexander Schlemmer, Daniel Hornung, Henrik tom Wörden, Ulrich Parlitz, Stefan Luther

arXiv_AI

arXiv_AI Face
Abstract

Here we present CaosDB, a Research Data Management System (RDMS) designed to ensure seamless integration of inhomogeneous data sources and repositories of legacy data. Its primary purpose is the management of data from biomedical sciences, both from simulations and experiments, during the complete research data lifecycle. An RDMS for this domain faces particular challenges: research data arise in huge amounts, from a wide variety of sources, and traverse a highly branched path of further processing. To be accepted by its users, an RDMS must be built around the workflows and practices of the scientists and thus support changes in workflow and data structure. Nevertheless, it should encourage and support the development and observation of standards and furthermore facilitate the automation of data acquisition and processing with specialized software. The storage data model of an RDMS must reflect these complexities with appropriate semantics and ontologies while offering simple methods for finding, retrieving, and understanding relevant data. We show how CaosDB responds to these challenges and give an overview of the CaosDB Server, its data model and its easy-to-learn CaosDB Query Language. We briefly discuss the status of the implementation, how we currently use CaosDB, and how we plan to use and extend it.

URL

http://arxiv.org/abs/1801.07653

PDF

http://arxiv.org/pdf/1801.07653
Three-dimensional Segmentation of Trees Through a Flexible Multi-Class Graph Cut Algorithm

2019-03-20

Jonathan Williams, Carola-Bibiane Schönlieb, Tom Swinfield, Juheon Lee, Xiaohao Cai, Lan Qie, David A. Coomes

arXiv_CV

arXiv_CV Knowledge Segmentation Tracking Detection
Abstract

Developing a robust algorithm for automatic individual tree crown (ITC) detection from laser scanning datasets is important for tracking the responses of trees to anthropogenic change. Such approaches allow the size, growth and mortality of individual trees to be measured, enabling forest carbon stocks and dynamics to be tracked and understood. Many algorithms exist for structurally simple forests including coniferous forests and plantations. Finding a robust solution for structurally complex, species-rich tropical forests remains a challenge; existing segmentation algorithms often perform less well than simple area-based approaches when estimating plot-level biomass. Here we describe a Multi-Class Graph Cut (MCGC) approach to tree crown delineation. This uses local three-dimensional geometry and density information, alongside knowledge of crown allometries, to segment individual tree crowns from LiDAR point clouds. Our approach robustly identifies trees in the top and intermediate layers of the canopy, but cannot recognise small trees. From these three-dimensional crowns, we are able to measure individual tree biomass. Comparing these estimates to those from permanent inventory plots, our algorithm is able to produce robust estimates of hectare-scale carbon density, demonstrating the power of ITC approaches in monitoring forests. The flexibility of our method to add additional dimensions of information, such as spectral reflectance, makes this approach an obvious avenue for future development and extension to other sources of three-dimensional data, such as structure from motion datasets.

URL

http://arxiv.org/abs/1903.08481

PDF

http://arxiv.org/pdf/1903.08481
Deep Octonion Networks

2019-03-20

Jiasong Wu, Ling Xu, Youyong Kong, Lotfi Senhadji, Huazhong Shu

arXiv_CV

arXiv_CV Attention Image_Classification Classification Deep_Learning
Abstract

Deep learning is a hot research topic in the field of machine learning. Real-valued neural networks (Real NNs), especially deep real networks (DRNs), have been widely used in many research fields. In recent years, deep complex networks (DCNs) and deep quaternion networks (DQNs) have attracted increasing attention. The octonion algebra, which is an extension of complex algebra and quaternion algebra, can provide a more efficient and compact representation. This paper constructs a general framework of deep octonion networks (DONs) and provides the main building blocks of DONs such as octonion convolution, octonion batch normalization and octonion weight initialization; DONs are then used in image classification tasks on the CIFAR-10 and CIFAR-100 data sets. Compared with the DRNs, the DCNs, and the DQNs, the proposed DONs have better convergence and higher classification accuracy. The success of DONs is also explained by multi-task learning.

URL

http://arxiv.org/abs/1903.08478

PDF

http://arxiv.org/pdf/1903.08478
Combining Coarse and Fine Physics for Manipulation using Parallel-in-Time Integration

2019-03-20

Wisdom C. Agboh, Daniel Ruprecht, Mehmet R. Dogar

arXiv_RO

arXiv_RO Optimization Prediction
Abstract

We present a method for fast and accurate physics-based predictions during non-prehensile manipulation planning and control. Given an initial state and a sequence of controls, the problem of predicting the resulting sequence of states is a key component of a variety of model-based planning and control algorithms. We propose combining a coarse (i.e. computationally cheap but not very accurate) predictive physics model with a fine (i.e. computationally expensive but accurate) predictive physics model, to generate a hybrid model that is at the required speed and accuracy for a given manipulation task. Our approach is based on the Parareal algorithm, a parallel-in-time integration method used for computing numerical solutions for general systems of ordinary differential equations. We use Parareal to combine a coarse pushing model with an off-the-shelf physics engine to deliver physics-based predictions that are as accurate as the physics engine but run in substantially less wall-clock time, thanks to Parareal being amenable to parallelization. We use these physics-based predictions in a model-predictive-control framework based on trajectory optimization, to plan pushing actions that avoid an obstacle and reach a goal location. We show that by combining the two physics models, we can achieve the same success rates as the planner that uses the off-the-shelf physics engine directly, but significantly faster. We present experiments in simulation and on a real robotic setup.
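As a rough illustration of how Parareal combines a coarse and a fine model, here is a minimal sketch on a scalar ordinary differential equation rather than a physics engine. The propagators are stand-ins chosen for this example (one explicit Euler step as the coarse model, one hundred as the fine model), and the list comprehension marked as parallelizable runs serially here.

```python
import numpy as np


def euler(f, y, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with explicit Euler."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        y = y + h * f(t, y)
        t += h
    return y


def parareal(f, y0, t_grid, iterations, coarse_steps=1, fine_steps=100):
    """Parareal iteration: U[n+1] = G(U_new[n]) + F(U_old[n]) - G(U_old[n])."""
    n = len(t_grid) - 1

    def coarse(y, i):
        return euler(f, y, t_grid[i], t_grid[i + 1], coarse_steps)

    def fine(y, i):
        return euler(f, y, t_grid[i], t_grid[i + 1], fine_steps)

    # Initial guess: one cheap serial sweep with the coarse model.
    u = [y0]
    for i in range(n):
        u.append(coarse(u[i], i))

    for _ in range(iterations):
        # These evaluations are independent per interval (parallelizable).
        fine_old = [fine(u[i], i) for i in range(n)]
        coarse_old = [coarse(u[i], i) for i in range(n)]
        # Cheap serial correction sweep.
        new = [y0]
        for i in range(n):
            new.append(coarse(new[i], i) + fine_old[i] - coarse_old[i])
        u = new
    return u


decay = lambda t, y: -y  # test problem: dy/dt = -y
grid = np.linspace(0.0, 1.0, 5)
print(parareal(decay, 1.0, grid, iterations=2))
```

After as many iterations as there are time intervals, the result matches a serial run of the fine model up to rounding; the speed-up comes from stopping earlier while the expensive fine evaluations run in parallel.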

URL

http://arxiv.org/abs/1903.08470

PDF

http://arxiv.org/pdf/1903.08470
In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images

2019-03-20

Marin Oršić, Ivan Krešo, Petra Bevandić, Siniša Šegvić

arXiv_CV

arXiv_CV Segmentation Drone Semantic_Segmentation Prediction Recognition
Abstract

Recent success of semantic segmentation approaches on demanding road driving datasets has spurred interest in many related application fields. Many of these applications involve real-time prediction on mobile platforms such as cars, drones and various kinds of robots. Real-time setup is challenging due to the extraordinary computational complexity involved. Many previous works address the challenge with custom lightweight architectures which decrease computational complexity by reducing depth, width and layer capacity with respect to general purpose architectures. We propose an alternative approach which achieves a significantly better performance across a wide range of computing budgets. First, we rely on a light-weight general purpose architecture as the main recognition engine. Then, we leverage light-weight upsampling with lateral connections as the most cost-effective solution to restore the prediction resolution. Finally, we propose to enlarge the receptive field by fusing shared features at multiple resolutions in a novel fashion. Experiments on several road driving datasets show a substantial advantage of the proposed approach, either with ImageNet pre-trained parameters or when we learn from scratch. Our Cityscapes test submission entitled SwiftNetRN-18 delivers 75.5% MIoU and achieves 39.9Hz on 1024x2048 images on GTX1080Ti.
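For readers unfamiliar with the two reported figures, the sketch below shows how a mean intersection-over-union (MIoU) score can be computed on a single label map and how a frame rate converts to per-frame latency. It is a simplified illustration: benchmark scores are normally accumulated over a whole dataset rather than per image, and the tiny label maps are made up.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Average per-class intersection-over-union on one label map."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append(np.logical_and(pred_c, target_c).sum() / union)
    return float(np.mean(ious))


# Hypothetical 2x2 prediction and ground truth with two classes.
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))

# Latency implied by the reported 39.9 Hz throughput.
frame_time_ms = 1000.0 / 39.9
print(round(frame_time_ms, 1))
```

The second print shows that 39.9 Hz corresponds to roughly 25 ms per 1024x2048 frame.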

URL

http://arxiv.org/abs/1903.08469

PDF

http://arxiv.org/pdf/1903.08469
Neural Speed Reading with Structural-Jump-LSTM

2019-03-20

Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

arXiv_CL

arXiv_CL Inference RNN
Abstract

Recurrent neural networks (RNNs) can model natural language by sequentially ‘reading’ input tokens and outputting a distributed representation of each token. Due to the sequential nature of RNNs, inference time is linearly dependent on the input length, and all inputs are read regardless of their importance. Efforts to speed up this inference, known as ‘neural speed reading’, either ignore or skim over part of the input. We present Structural-Jump-LSTM: the first neural speed reading model to both skip and jump text during inference. The model consists of a standard LSTM and two agents: one capable of skipping single words when reading, and one capable of exploiting punctuation structure (sub-sentence separators (,:), sentence end symbols (.!?), or end of text markers) to jump ahead after reading a word. A comprehensive experimental evaluation of our model against all five state-of-the-art neural reading models shows that Structural-Jump-LSTM achieves the best overall floating point operations (FLOP) reduction (hence is faster), while keeping the same accuracy or even improving it compared to a vanilla LSTM that reads the whole text.

URL

http://arxiv.org/abs/1904.00761

PDF

http://arxiv.org/pdf/1904.00761
Smart Edition of MIDI Files

2019-03-20

Pierre Roy, Francois Pachet

arXiv_SD

arXiv_SD Relation
Abstract

We address the issue of editing musical performance data, in particular MIDI files representing human musical performances. Editing such sequences raises specific issues due to the ambiguous nature of musical objects. The first source of ambiguity is that musicians naturally produce many deviations from the metrical frame. These deviations may be intentional or subconscious, but they play an important role in conveying the groove or feeling of a performance. Relations between musical elements are also usually implicit, creating even more ambiguity. A note is in relation with the surrounding notes in many possible ways: it can be part of a melodic pattern, it can also play a harmonic role with the simultaneous notes, or be a pedal-tone. All these aspects play an essential role that should be preserved, as much as possible, when editing musical sequences. In this paper, we contribute specifically to the problem of editing non-quantized, metrical musical sequences represented as MIDI files. We first list a number of problems caused by naive edit operations applied to performance data, using a motivating example. We then introduce a model, called Dancing MIDI, based on 1) two desirable, well-defined properties for edit operations and 2) two well-defined operations, Split and Concat, with an implementation. We show that our model formally satisfies the two properties, and that it prevents most of the problems that occur with naive edit operations on our motivating example, as well as on a real-world example using an automatic harmonizer.

URL

http://arxiv.org/abs/1903.08459

PDF

http://arxiv.org/pdf/1903.08459
On Class Imbalance and Background Filtering in Visual Relationship Detection

2019-03-20

Alessio Sarullo, Tingting Mu

arXiv_CV

arXiv_CV Detection Relation
Abstract

In this paper we investigate the problems of class imbalance and irrelevant relationships in Visual Relationship Detection (VRD). State-of-the-art deep VRD models still struggle to predict uncommon classes, limiting their applicability. Moreover, many methods are incapable of properly filtering out background relationships while predicting relevant ones. Although these problems are very apparent, they have both been overlooked so far. We analyse why this is the case and propose modifications to both model and training to alleviate the aforementioned issues, as well as suggesting new measures to complement existing ones and give a more holistic picture of the efficacy of a model.

URL

http://arxiv.org/abs/1903.08456

PDF

http://arxiv.org/pdf/1903.08456
Extracting Frequent Gradual Patterns Using Constraints Modeling

2019-03-20

Jerry Lonlac, Saïd Jabbour, Engelbert Mephu Nguifo, Lakhdar Saïs, Badran Raddaoui

arXiv_AI

arXiv_AI
Abstract

In this paper, we propose a constraint-based modeling approach for the problem of discovering frequent gradual patterns in a numerical dataset. This SAT-based declarative approach offers an additional possibility to benefit from the recent progress in satisfiability testing and to exploit the efficiency of modern SAT solvers for enumerating all frequent gradual patterns in a numerical dataset. Our approach can easily be extended with extra constraints, such as temporal constraints, in order to extract more specific patterns in a broad range of gradual pattern mining applications. We show the practical feasibility of our SAT model by running experiments on two real-world datasets.

URL

http://arxiv.org/abs/1903.08452

PDF

http://arxiv.org/pdf/1903.08452
Decay-Function-Free Time-Aware Attention to Context and Speaker Indicator for Spoken Language Understanding

2019-03-20

Jonggu Kim, Jong-Hyeok Lee

arXiv_CL

arXiv_CL Salient Attention Tracking
Abstract

To capture salient contextual information for spoken language understanding (SLU) of a dialogue, we propose time-aware models that automatically learn the latent time-decay function of the history without a manual time-decay function. We also propose a method to identify and label the current speaker to improve the SLU accuracy. In experiments on the benchmark dataset used in Dialog State Tracking Challenge 4, the proposed models achieved significantly higher F1 scores than the state-of-the-art contextual models. Finally, we analyze the effectiveness of the introduced models in detail. The analysis demonstrates that each of the proposed methods individually improves SLU accuracy.

URL

http://arxiv.org/abs/1903.08450

PDF

http://arxiv.org/pdf/1903.08450
Left-to-Right Dependency Parsing with Pointer Networks

2019-03-20

Daniel Fernández-González, Carlos Gómez-Rodríguez

arXiv_CL

arXiv_CL
Abstract

We propose a novel transition-based algorithm that straightforwardly parses sentences from left to right by building $n$ attachments, with $n$ being the length of the input sentence. Similarly to the recent stack-pointer parser by Ma et al. (2018), we use the pointer network framework that, given a word, can directly point to a position from the sentence. However, our left-to-right approach is simpler than the original top-down stack-pointer parser (not requiring a stack) and halves the transition sequence length, from 2$n$-1 actions to $n$. This results in a quadratic non-projective parser that runs twice as fast as the original while achieving the best accuracy to date on the English PTB dataset (96.04% UAS, 94.43% LAS) among fully-supervised single-model dependency parsers, and improves over the former top-down transition system in the majority of languages tested.
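The left-to-right idea, one attachment per word with no stack, can be sketched with a fixed score table standing in for the pointer network. The greedy argmax, the cycle check and the scores below are assumptions made for this illustration; they are not the paper's trained model or its exact decoding procedure.

```python
def creates_cycle(heads, dependent, candidate):
    """True if attaching `dependent` under `candidate` would close a cycle."""
    node = candidate
    while node != 0:  # 0 is the artificial ROOT
        if node == dependent:
            return True
        if node not in heads:
            return False
        node = heads[node]
    return False


def parse_left_to_right(scores):
    """Pick one head per word, scanning the sentence left to right.

    scores[i][h] is a hypothetical score for word i+1 taking head h.
    """
    n = len(scores)
    heads = {}
    for dependent in range(1, n + 1):
        allowed = [h for h in range(n + 1)
                   if not creates_cycle(heads, dependent, h)]
        heads[dependent] = max(allowed, key=lambda h: scores[dependent - 1][h])
    return [heads[d] for d in range(1, n + 1)]


# Made-up scores for "She reads books" (columns: ROOT, She, reads, books).
scores = [
    [0.1, 0.0, 0.8, 0.1],  # She
    [0.7, 0.2, 0.0, 0.1],  # reads
    [0.1, 0.1, 0.7, 0.1],  # books
]
print(parse_left_to_right(scores))
```

The loop performs exactly one attachment per word, matching the $n$ actions described above, in contrast to the 2$n$-1 actions of the top-down system.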

URL

http://arxiv.org/abs/1903.08445

PDF

http://arxiv.org/pdf/1903.08445
Affect in Tweets Using Experts Model

2019-03-20

Subba Reddy Oota, Adithya Avvaru, Mounika Marreddy, Radhika Mamidi

arXiv_CL

arXiv_CL Sentiment Detection
Abstract

Estimating the intensity of emotion has gained significance as modern textual inputs in potential applications like social media, e-retail markets, psychology, advertisements etc. carry a lot of emotions, feelings and expressions along with their meaning. However, traditional sentiment analysis approaches primarily focus on classifying the sentiment in general (positive or negative) or at an aspect level (very positive, low negative, etc.) and cannot exploit the intensity information. Moreover, automatically identifying emotions like anger, fear, joy, sadness, disgust etc. from text introduces challenging scenarios where a single tweet may contain multiple emotions with different intensities and some emotions may even co-occur in some of the tweets. In this paper, we propose an architecture, Experts Model, inspired by the standard Mixture of Experts (MoE) model. The key idea here is that each expert learns different sets of features from the feature vector, which helps in better emotion detection from the tweet. We compared the results of our Experts Model with both baseline results and the top five performers of SemEval-2018 Task-1, Affect in Tweets (AIT). The experimental results show that our proposed approach handles the emotion detection problem and ranks among the top five results.

URL

http://arxiv.org/abs/1904.00762

PDF

http://arxiv.org/pdf/1904.00762
Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks

2019-03-20

Steven Carr, Nils Jansen, Ralf Wimmer, Alexandru C. Serban, Bernd Becker, Ufuk Topcu

arXiv_AI

arXiv_AI RNN
Abstract

We study strategy synthesis for partially observable Markov decision processes (POMDPs). The particular problem is to determine strategies that provably adhere to (probabilistic) temporal logic constraints. This problem is computationally intractable and theoretically hard. We propose a novel method that combines techniques from machine learning and formal verification. First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP. Secondly, we restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, efficient formal verification techniques provide provable guarantees against temporal logic specifications. If the specification is not satisfied, counterexamples supply diagnostic information. We use this information to improve the strategy by iteratively training the RNN. Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.

URL

http://arxiv.org/abs/1903.08428

PDF

http://arxiv.org/pdf/1903.08428
Modeling Intelligent Decision Making Command And Control Agents: An Application to Air Defense

2019-03-20

Sumanta Kumar Das

arXiv_AI

arXiv_AI
Abstract

The paper stands half-way between agent technology and mathematical reasoning to model tactical decision-making tasks. These models are applied to the air defense (AD) domain for command and control (C2). It also addresses the issues related to the evaluation of agents. The agents are designed and implemented using the agent-programming paradigm. The agents are deployed in a simulated air combat environment to perform C2 tasks such as electronic counter-countermeasures, threat assessment, and weapon allocation. The simulated AD system runs without any human intervention, and represents a state-of-the-art model for C2 autonomy. The use of agents as autonomous decision-making entities is particularly useful in view of futuristic network-centric warfare.

URL

http://arxiv.org/abs/1903.08412

PDF

http://arxiv.org/pdf/1903.08412
Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking

2019-03-20

Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

arXiv_CL

arXiv_CL Embedding
Abstract

Automatic fact-checking systems detect misinformation, such as fake news, by (i) selecting check-worthy sentences for fact-checking, (ii) gathering related information to the sentences, and (iii) inferring the factuality of the sentences. Most prior research on (i) uses hand-crafted features to select check-worthy sentences, and does not explicitly account for the recent finding that the top weighted terms in both check-worthy and non-check-worthy sentences are actually overlapping [15]. Motivated by this, we present a neural check-worthiness sentence ranking model that represents each word in a sentence by both its embedding (aiming to capture its semantics) and its syntactic dependencies (aiming to capture its role in modifying the semantics of other terms in the sentence). Our model is an end-to-end trainable neural network for check-worthiness ranking, which is trained on large amounts of unlabelled data through weak supervision. Thorough experimental evaluation against state-of-the-art baselines, with and without weak supervision, shows our model to be superior at all times (+13% in MAP and +28% at various Precision cut-offs from the best baseline with statistical significance). Empirical analysis of the use of weak supervision, word embedding pretraining on domain-specific data, and the use of syntactic dependencies of our model reveals that check-worthy sentences contain notably more identical syntactic dependencies than non-check-worthy sentences.

URL

http://arxiv.org/abs/1903.08404

PDF

http://arxiv.org/pdf/1903.08404
Contextual Compositionality Detection with External Knowledge Bases and Word Embeddings

2019-03-20

Dongsheng Wang, Qiuchi Li, Lucas Chaves Lima, Jakob Grue Simonsen, Christina Lioma

arXiv_AI

arXiv_AI Knowledge Embedding Detection
Abstract

When the meaning of a phrase cannot be inferred from the individual meanings of its words (e.g., hot dog), that phrase is said to be non-compositional. Automatic compositionality detection in multi-word phrases is critical in any application of semantic processing, such as search engines; failing to detect non-compositional phrases can hurt system effectiveness notably. Existing research treats phrases as either compositional or non-compositional in a deterministic manner. In this paper, we operationalize the viewpoint that compositionality is contextual rather than deterministic, i.e., that whether a phrase is compositional or non-compositional depends on its context. For example, the phrase ‘green card’ is compositional when referring to a green colored card, whereas it is non-compositional when meaning permanent residence authorization. We address the challenge of detecting this type of contextual compositionality as follows: given a multi-word phrase, we enrich the word embedding representing its semantics with evidence about its global context (terms it often collocates with) as well as its local context (narratives where that phrase is used, which we call usage scenarios). We further extend this representation with information extracted from external knowledge bases. The resulting representation incorporates both localized context and more general usage of the phrase and allows its compositionality to be detected in a non-deterministic and contextual way. Empirical evaluation of our model on a dataset of phrase compositionality, manually collected by crowdsourcing contextual compositionality assessments, shows that our model outperforms state-of-the-art baselines notably on detecting phrase compositionality.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.08389

PDF

http://arxiv.org/pdf/1903.08389
Read All
Convolution with even-sized kernels and symmetric padding

2019-03-20

Shuang Wu, Guanrui Wang, Pei Tang, Feng Chen, Luping Shi

arXiv_CV

arXiv_CV CNN Classification Deep_Learning
Abstract

Compact convolutional neural networks gain efficiency mainly through depthwise convolutions, expanded channels and complex topologies, which, conversely, increase the training effort. In this work, we identify the shift problem that occurs in even-sized kernel (2x2, 4x4) convolutions, and eliminate it by proposing symmetric padding on each side of the feature maps (C2sp, C4sp). Symmetric padding enlarges the receptive fields of even-sized kernels with little computational cost. In classification tasks, C2sp outperforms the conventional 3x3 convolution and obtains comparable accuracies to existing compact convolution blocks, but consumes less memory and time during training. In generation tasks, C2sp and C4sp both achieve improved image quality and stabilized training. Symmetric padding coupled with even-sized convolution is easy to implement in deep learning frameworks, providing promising building units for architecture designs that emphasize training effort in online and continual learning settings.
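The shift problem mentioned in this abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the helper names (`pad_corner`, `box2x2`, `centroid`) are hypothetical, a 2x2 box filter stands in for a learned kernel, and averaging the four padding directions is only a stand-in for how shifts could offset each other when different channel groups are padded towards different corners.

```python
def pad_corner(x, corner):
    """Zero-pad a 2D list by one row and one column.

    corner is one of "tl", "tr", "bl", "br" and names the sides that
    receive the padding (top/bottom, left/right).
    """
    rows = [([0.0] + list(r)) if "l" in corner else (list(r) + [0.0]) for r in x]
    blank = [0.0] * (len(x[0]) + 1)
    return [blank] + rows if "t" in corner else rows + [blank]


def box2x2(x):
    """Valid 2x2 box filter: sum over every 2x2 window."""
    return [[x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]
             for j in range(len(x[0]) - 1)]
            for i in range(len(x) - 1)]


def centroid(y):
    """Centre of mass (row, col) of a non-negative 2D list."""
    total = sum(map(sum, y))
    r = sum(i * v for i, row in enumerate(y) for v in row) / total
    c = sum(j * v for row in y for j, v in enumerate(row)) / total
    return r, c


# A single impulse at (3, 3) in an 8x8 map.
impulse = [[0.0] * 8 for _ in range(8)]
impulse[3][3] = 1.0

# One output map per padding direction; each keeps the 8x8 size.
outputs = {c: box2x2(pad_corner(impulse, c)) for c in ("tl", "tr", "bl", "br")}

# Padding on one side only moves the response by half a pixel.
one_sided = centroid(outputs["tl"])

# Combining all four directions cancels the half-pixel shifts.
averaged = [[sum(outputs[c][i][j] for c in outputs) / 4 for j in range(8)]
            for i in range(8)]
symmetric = centroid(averaged)
```

Here `one_sided` lands half a pixel away from the impulse, while `symmetric` stays centred on it, which is the intuition behind padding different parts of the feature maps in different directions.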

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.08385

PDF

http://arxiv.org/pdf/1903.08385
Read All
Part-based approximations for morphological operators using asymmetric auto-encoders

2019-03-20

Bastien Ponchon (CMM, LTCI), Santiago Velasco-Forero (CMM), Samy Blusseau (CMM), Jesus Angulo (CMM), Isabelle Bloch (LTCI)

arXiv_CV

arXiv_CV Sparse
Abstract

This paper addresses the issue of building a part-based representation of a dataset of images. More precisely, we look for a non-negative, sparse decomposition of the images on a reduced set of atoms, in order to unveil a morphological and interpretable structure of the data. Additionally, we want this decomposition to be computed online for any new sample that is not part of the initial dataset. Therefore, our solution relies on a sparse, non-negative auto-encoder where the encoder is deep (for accuracy) and the decoder shallow (for interpretability). This method compares favorably to the state-of-the-art online methods on two datasets (MNIST and Fashion MNIST), according to classical metrics and to a new one we introduce, based on the invariance of the representation to morphological dilation.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.00763

PDF

http://arxiv.org/pdf/1904.00763
Read All
Regularize, Expand and Compress: Multi-task based Lifelong Learning via NonExpansive AutoML

2019-03-20

Jie Zhang, Junting Zhang, Shalini Ghosh, Dawei Li, Jingwen Zhu, Heming Zhang, Yalin Wang

arXiv_CV

arXiv_CV Attention
Abstract

Lifelong learning, the problem of continual learning where tasks arrive in sequence, has been lately attracting more attention in the computer vision community. The aim of lifelong learning is to develop a system that can learn new tasks while maintaining the performance on the previously learned tasks. However, there are two obstacles for lifelong learning of deep neural networks: catastrophic forgetting and capacity limitation. To solve the above issues, inspired by the recent breakthroughs in automatically learning good neural network architectures, we develop a Multi-task based lifelong learning via nonexpansive AutoML framework termed Regularize, Expand and Compress (REC). REC is composed of three stages: 1) continually learns the sequential tasks without the learned tasks’ data via a newly proposed multi-task weight consolidation (MWC) algorithm; 2) expands the network to help the lifelong learning with potentially improved model capability and performance by network-transformation based AutoML; 3) compresses the expanded model after learning every new task to maintain model efficiency and performance. The proposed MWC and REC algorithms achieve superior performance over other lifelong learning algorithms on four different datasets.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.08362

PDF

http://arxiv.org/pdf/1903.08362
Read All
GRIP: Generative Robust Inference and Perception for Semantic Robot Manipulation in Adversarial Environments

2019-03-20

Xiaotong Chen, Rui Chen, Zhiqiang Sui, Zhefan Ye, Yanqi Liu, R. Iris Bahar, Odest Chadwicke Jenkins

arXiv_RO

arXiv_RO Adversarial Object_Detection Pose_Estimation CNN Inference Detection
Abstract

Recent advancements have led to a proliferation of machine learning systems used to assist humans in a wide range of tasks. However, we are still far from accurate, reliable, and resource-efficient operations of these systems. For robot perception, convolutional neural networks (CNNs) for object detection and pose estimation have recently come into widespread use. However, neural networks are known to suffer from overfitting during the training process and are less robust in unseen conditions, which makes them especially vulnerable to adversarial scenarios. In this work, we propose Generative Robust Inference and Perception (GRIP) as a two-stage object detection and pose estimation system that aims to combine the relative strengths of discriminative CNNs and generative inference methods to achieve robust estimation. Our results show that a second stage of sample-based generative inference is able to recover from false object detection by CNNs, and produce robust estimations in adversarial conditions. We demonstrate the robustness of GRIP through comparison with state-of-the-art learning-based pose estimators and pick-and-place manipulation in dark and cluttered environments.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.08352

PDF

http://arxiv.org/pdf/1903.08352
Read All
Photon-Flooded Single-Photon 3D Cameras

2019-03-20

Anant Gupta, Atul Ingle, Andreas Velten, Mohit Gupta

arXiv_CV

arXiv_CV
Abstract

Single photon avalanche diodes (SPADs) are starting to play a pivotal role in the development of photon-efficient, long-range LiDAR systems. However, due to non-linearities in their image formation model, a high photon flux (e.g., due to strong sunlight) leads to distortion of the incident temporal waveform, and potentially, large depth errors. Operating SPADs in low flux regimes can mitigate these distortions, but often requires attenuating the signal and thus results in a low signal-to-noise ratio. In this paper, we address the following basic question: what is the optimal photon flux that a SPAD-based LiDAR should be operated in? We derive a closed-form expression for the optimal flux, which is quasi-depth-invariant and depends on the ambient light strength. The optimal flux is lower than what a SPAD typically measures in real-world scenarios but, surprisingly, considerably higher than what is conventionally suggested for avoiding distortions. We propose a simple, adaptive approach for achieving the optimal flux by attenuating incident flux based on an estimate of ambient light strength. Using extensive simulations and a hardware prototype, we show that the optimal flux criterion holds for several depth estimators, under a wide range of illumination conditions.
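The waveform distortion described in this abstract can be sketched with a commonly used first-photon (pile-up) model, in which a SPAD records at most one photon per laser cycle. This is an assumption made for illustration, not the paper's derivation: the function names, the bin count, and the flux values below are all hypothetical, and the paper's closed-form optimal flux is not reproduced here.

```python
import math


def first_photon_probs(flux):
    """Probability that the first detected photon falls in each time bin.

    flux[i] is the mean photon count in bin i per cycle (Poisson arrivals).
    A detection in bin i requires no detection in any earlier bin.
    """
    probs, survive = [], 1.0
    for r in flux:
        probs.append(survive * (1.0 - math.exp(-r)))
        survive *= math.exp(-r)  # chance that nothing has been detected so far
    return probs


def histogram_peak(ambient, signal=0.5, depth_bin=60, n_bins=100):
    """Bin with the highest detection probability for a given ambient flux."""
    flux = [ambient] * n_bins
    flux[depth_bin] += signal  # laser return sits on top of the ambient light
    probs = first_photon_probs(flux)
    return max(range(n_bins), key=probs.__getitem__)


low_flux_peak = histogram_peak(ambient=0.001)  # weak ambient light
high_flux_peak = histogram_peak(ambient=0.1)   # strong ambient light
```

Under weak ambient light the peak stays at the true depth bin, whereas under strong ambient light early bins absorb most detections and the peak moves to the start of the histogram. A naive peak-based depth estimate would therefore be badly wrong in the second case.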

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.08347

PDF

http://arxiv.org/pdf/1903.08347
Read All
Geometry-constrained Car Recognition Using a 3D Perspective Network

2019-03-20

Rui Zeng, Zongyuan Ge, Simon Denman, Sridha Sridharan, Clinton Fookes

arXiv_CV

arXiv_CV Attention CNN Classification Quantitative Detection Recognition
Abstract

We present a novel learning framework for vehicle recognition from a single RGB image. Unlike existing methods which only use attention mechanisms to locate 2D discriminative information, our unified framework learns a joint representation of the 2D global texture and 3D-bounding-box in a mutually correlated and reinforced way. These two kinds of feature representation are combined by a novel fusion network, which predicts the vehicle’s category. The 2D global feature is extracted using an off-the-shelf detection network, where the estimated 2D bounding box assists in finding the region of interest (RoI). With the assistance of the RoI, the 3D bounding box and its corresponding features are generated in a geometrically correct way using a novel 3D Perspective Network (3DPN). The 3DPN consists of a convolutional neural network (CNN), a vanishing point loss, and RoI perspective layers. The CNN regresses the 3D bounding box under the guidance of the proposed vanishing point loss, which provides a perspective geometry constraint. Thanks to the proposed RoI perspective layer, the variation caused by viewpoint changes is corrected via the estimated geometry, enhancing the feature representation. We present qualitative and quantitative results for our approach on the vehicle classification and verification tasks in the BoxCars dataset. The results demonstrate that, by learning how to extract features from the 3D bounding box, we can achieve comparable or superior performance to methods that only use 2D information.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1903.07916

PDF

https://arxiv.org/pdf/1903.07916
Read All
Face Detection in Repeated Settings

2019-03-20

Mohammad Nayeem Teli, Bruce A. Draper, J. Ross Beveridge

arXiv_CV

arXiv_CV Face Detection Face_Detection Relation Recognition
Abstract

Face detection is an important first step before face verification and recognition. In unconstrained settings it is still an open challenge because of the variation in pose, lighting, scale, background and location. However, for the purposes of verification we can control the background and location. Images are primarily captured in places such as the entrance to a sensitive building, in front of a door or some location where the background does not change. We present a correlation-based face detection algorithm to detect faces in such settings, where we control the location, and leave lighting, pose, and scale uncontrolled. In these scenarios the results indicate that our algorithm is easy and fast to train, is more accurate than the Viola and Jones face detector, and is faster to test.
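The general idea of detecting a target by correlation can be sketched as follows. This is a minimal, assumed illustration of plain template correlation, not the trained correlation filter the abstract refers to: the `correlate` and `detect` helpers, the 3x3 template, and the synthetic scene are all hypothetical.

```python
def correlate(scene, template):
    """Raw cross-correlation score at every position where the template fits."""
    th, tw = len(template), len(template[0])
    return [[sum(scene[i + a][j + b] * template[a][b]
                 for a in range(th) for b in range(tw))
             for j in range(len(scene[0]) - tw + 1)]
            for i in range(len(scene) - th + 1)]


def detect(scene, template):
    """Top-left corner (row, col) of the strongest correlation response."""
    scores = correlate(scene, template)
    return max(((i, j) for i in range(len(scores)) for j in range(len(scores[0]))),
               key=lambda p: scores[p[0]][p[1]])


# Hypothetical 3x3 pattern standing in for a face template.
template = [[1.0, 2.0, 1.0],
            [2.0, 5.0, 2.0],
            [1.0, 2.0, 1.0]]

# Fixed, empty background with the pattern placed at row 5, column 7.
scene = [[0.0] * 12 for _ in range(12)]
for a in range(3):
    for b in range(3):
        scene[5 + a][7 + b] = template[a][b]

location = detect(scene, template)
```

With a fixed background, the correlation surface has a single dominant peak at the target position, so detection reduces to finding that maximum. Raw correlation is sensitive to lighting and scale, which a practical detector has to handle separately.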

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.08649

PDF

http://arxiv.org/pdf/1903.08649
Read All
