Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Class-incremental Learning via Deep Model Consolidation

2019-03-19

Junting Zhang, Jie Zhang, Shalini Ghosh, Dawei Li, Serafettin Tasci, Larry Heck, Heming Zhang, C.-C. Jay Kuo

arXiv_CV

arXiv_CV Object_Detection Image_Classification Classification Detection
Abstract

Deep neural networks (DNNs) often suffer from “catastrophic forgetting” during incremental learning (IL): an abrupt degradation of performance on the original set of classes when the training objective is adapted to a newly added set of classes. Existing IL approaches that attempt to overcome catastrophic forgetting tend to produce a model that is biased towards either the old classes or the new classes, unless they are helped by exemplars of the old data. To address this issue, we propose a class-incremental learning paradigm called Deep Model Consolidation (DMC), which works well even when the original training data is not available. The idea is to first train a separate model only for the new classes, and then combine the two individual models, trained on data of two distinct sets of classes (old classes and new classes), via a novel dual distillation training objective. The two models are consolidated by exploiting publicly available unlabeled auxiliary data. This overcomes the potential difficulties due to the unavailability of the original training data. Compared to state-of-the-art techniques, DMC demonstrates significantly better performance on the CIFAR-100 image classification and PASCAL VOC 2007 object detection benchmarks in the IL setting.
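To make the consolidation step more concrete, here is a minimal sketch of one way a distillation target could be built from two teacher models. The abstract does not spell out the loss, so the mean-centering of each teacher's logits, the squared-error form, and all function names below are assumptions for illustration only.

```python
def center(logits):
    """Subtract the mean so logits from different teachers share a common offset."""
    mean = sum(logits) / len(logits)
    return [z - mean for z in logits]


def consolidation_target(old_logits, new_logits):
    """Concatenate centered teacher logits: old classes first, then new classes."""
    return center(old_logits) + center(new_logits)


def distillation_loss(student_logits, old_logits, new_logits):
    """Mean squared error between the student output and the combined target.

    Assumed form; the student is expected to output one logit per class
    across the old and new classes together.
    """
    target = consolidation_target(old_logits, new_logits)
    if len(student_logits) != len(target):
        raise ValueError("student must cover old + new classes")
    return sum((s - t) ** 2 for s, t in zip(student_logits, target)) / len(target)


# One unlabeled auxiliary sample scored by both teachers (made-up numbers).
old = [1.0, 3.0]        # teacher trained on 2 old classes
new = [2.0, 4.0, 6.0]   # teacher trained on 3 new classes
target = consolidation_target(old, new)
loss = distillation_loss(target, old, new)  # a perfect student has zero loss
```

In this sketch no labels are needed: the teachers' outputs on unlabeled auxiliary data are the only supervision, which matches the setting described in the abstract.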

URL

http://arxiv.org/abs/1903.07864

PDF

http://arxiv.org/pdf/1903.07864

A Spring Propelled Extreme Environment Robot for Off-World Cave Exploration

2019-03-19

Steven Morad, Thomas Dailey, Leonard Vance, Jekan Thangavelautham

arXiv_RO

arXiv_RO
Abstract

Pits on the Moon and Mars are intriguing geological formations that have yet to be explored. These geological formations can provide protection from harsh diurnal temperature variations, ionizing radiation, and meteorite impacts. Some have proposed that these underground formations are well-suited as human outposts. Some theorize that the Martian pits may harbor remnants of past life. Unfortunately, these geological formations have been off-limits to conventional wheeled rovers and lander systems due to their collapsed ceiling or ‘skylight’ entrances. In this paper, a new low-cost method to explore these pits is presented using the Spring Propelled Extreme Environment Robot (SPEER). The SPEER consists of a launch system that flings disposable spherical microbots through skylights into the pits. The microbots are low-cost, disposable spheres made of aluminium Al-6061, with an array of adapted COTS sensors and a solid rocket motor for soft landing. By moving most control authority to the launcher, the microbots become very simple, lightweight, and low-cost. We present a preliminary design of the microbots that can be built today using commercial components for under 500 USD. The microbots have a total mass of 1 kg, with more than 750 g available for a science instrument. In this paper, we present the design, dynamics and control, and operation of these microbots. This is followed by initial feasibility studies of the SPEER system by simulating exploration of a known Lunar pit in Mare Tranquillitatis.

URL

http://arxiv.org/abs/1903.07856

PDF

http://arxiv.org/pdf/1903.07856

Fabric Soft Poly-Limbs for Physical Assistance of Daily Living Tasks

2019-03-19

Pham H. Nguyen, Imran I. B. Mohd, Curtis Sparks, Francisco L. Arellano, Wenlong Zhang, Panagiotis Polygerinos

arXiv_RO

arXiv_RO
Abstract

This paper presents the design and development of a highly articulated, continuum, wearable, fabric-based Soft Poly-Limb (fSPL). This fabric soft arm acts as an additional limb that provides the wearer with mobile manipulation assistance through the use of soft actuators made with high-strength inflatable fabrics. In this work, a set of systematic design rules is presented for the creation of highly compliant soft robotic limbs through an understanding of the behavior of the fabric-based components as a function of input pressure. These design rules are generated by investigating a range of parameters through computational finite-element method (FEM) models focusing on the fSPL’s articulation capabilities and payload capacity in 3D space. The theoretical motion and payload outputs of the fSPL and its components are experimentally validated, and additional evaluations verify its capability to safely carry loads 10.1x its body weight by wrapping around the object. Finally, we demonstrate how the fully collapsible fSPL can comfortably be stored in a soft-waist belt and interact with the wearer through spatial mobility and preliminary pick-and-place control experiments.

URL

http://arxiv.org/abs/1903.07852

PDF

http://arxiv.org/pdf/1903.07852

The Probabilistic Object Detection Challenge

2019-03-19

John Skinner, David Hall, Haoyang Zhang, Feras Dayoub, Niko Sünderhauf

arXiv_CV

arXiv_CV Object_Detection Detection
Abstract

We introduce a new challenge for computer and robotic vision, the first ACRV Robotic Vision Challenge, Probabilistic Object Detection. Probabilistic object detection is a new variation on traditional object detection tasks, requiring estimates of spatial and semantic uncertainty. We extend the traditional bounding box format of object detection to express spatial uncertainty using Gaussian distributions for the box corners. The challenge introduces a new test dataset of video sequences, which are designed to more closely resemble the kind of data available to a robotic system. We evaluate probabilistic detections using a new probability-based detection quality (PDQ) measure. The goal in creating this challenge is to draw the computer and robotic vision communities together, toward applying object detection solutions for practical robotics applications.
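As an illustration of what a box with Gaussian corners might look like in code, here is a small data-structure sketch. The class and field names are hypothetical and are not the challenge's official submission format; only the idea (a mean and covariance per corner, plus a distribution over labels) comes from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Vec2 = Tuple[float, float]
Cov2 = Tuple[Tuple[float, float], Tuple[float, float]]


@dataclass
class GaussianCorner:
    mean: Vec2  # expected (x, y) pixel position of the corner
    cov: Cov2   # 2x2 covariance expressing spatial uncertainty


@dataclass
class ProbabilisticDetection:
    top_left: GaussianCorner
    bottom_right: GaussianCorner
    label_probs: Dict[str, float]  # semantic uncertainty: class -> probability

    def mean_box(self) -> Tuple[float, float, float, float]:
        """Conventional (x1, y1, x2, y2) box using only the corner means."""
        return self.top_left.mean + self.bottom_right.mean

    def top_label(self) -> str:
        """Most probable class label."""
        return max(self.label_probs, key=self.label_probs.get)


detection = ProbabilisticDetection(
    top_left=GaussianCorner((10.0, 20.0), ((4.0, 0.0), (0.0, 4.0))),
    bottom_right=GaussianCorner((110.0, 220.0), ((9.0, 0.0), (0.0, 9.0))),
    label_probs={"cup": 0.7, "bowl": 0.3},
)
```

Setting both covariances to zero recovers a conventional bounding box, which is why this format can be read as an extension of the traditional one.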

URL

http://arxiv.org/abs/1903.07840

PDF

http://arxiv.org/pdf/1903.07840

Turing-Completeness of Dynamics in Abstract Persuasion Argumentation

2019-03-19

Ryuta Arisaka

arXiv_AI

arXiv_AI Relation
Abstract

Abstract Persuasion Argumentation (APA) is a dynamic argumentation formalism that extends Dung argumentation with persuasion relations. In this work, we show through two-counter Minsky machine encoding that APA dynamics is Turing-complete.
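For readers unfamiliar with the machine model named above, the following is a minimal simulator of a two-counter Minsky machine. It only illustrates the machine itself, not the paper's encoding into APA; the instruction tuple layout and the `run_minsky` name are choices made for this sketch.

```python
def run_minsky(program, c0=0, c1=0, max_steps=10_000):
    """Run a two-counter machine and return the final counter values.

    Instructions (hypothetical tuple layout for this sketch):
      ("inc", r, nxt)              increment counter r, jump to nxt
      ("dec", r, nxt_pos, nxt_zero) if counter r > 0: decrement and jump to
                                   nxt_pos; otherwise jump to nxt_zero
      ("halt",)                    stop
    """
    counters = [c0, c1]
    pc = 0
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return tuple(counters)
        if instr[0] == "inc":
            _, r, nxt = instr
            counters[r] += 1
            pc = nxt
        elif instr[0] == "dec":
            _, r, nxt_pos, nxt_zero = instr
            if counters[r] > 0:
                counters[r] -= 1
                pc = nxt_pos
            else:
                pc = nxt_zero
        else:
            raise ValueError(f"unknown instruction: {instr!r}")
    raise RuntimeError("step limit reached (the machine may not halt)")


# Addition: move everything from counter 0 into counter 1.
ADD = [
    ("dec", 0, 1, 2),  # 0: if c0 > 0, take one from c0 and go to 1; else halt
    ("inc", 1, 0),     # 1: add one to c1, loop back to 0
    ("halt",),         # 2
]

result = run_minsky(ADD, c0=3, c1=4)
```

The step limit is needed because halting is undecidable for such machines in general, which is what makes them a convenient target for Turing-completeness proofs.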

URL

http://arxiv.org/abs/1903.07837

PDF

http://arxiv.org/pdf/1903.07837

Non-negative representation based discriminative dictionary learning for face recognition

2019-03-19

Zhe Chen, Xiao-Jun Wu, Josef Kittler

arXiv_CV

arXiv_CV Regularization Sparse Face Classification Recognition Face_Recognition
Abstract

In this paper, we propose a non-negative representation based discriminative dictionary learning algorithm (NRDL) for multicategory face classification. In contrast to traditional dictionary learning methods, NRDL investigates the use of non-negative representation (NR), which contributes to learning discriminative dictionary atoms. In order to make the learned dictionary more suitable for classification, NRDL seamlessly incorporates a non-negative representation constraint, discriminative dictionary learning and linear classifier training into a unified model. Specifically, NRDL introduces a positive constraint on the representation matrix to find distinct atoms from heterogeneous training samples, which results in a sparse and discriminative representation. Moreover, a discriminative dictionary encouraging function is proposed to enhance the uniqueness of class-specific sub-dictionaries. Meanwhile, an inter-class incoherence constraint and a compact graph based regularization term are constructed to improve the discriminability of the learned classifier. Experimental results on several benchmark face data sets verify the advantages of our NRDL algorithm over the state-of-the-art dictionary learning methods.

URL

http://arxiv.org/abs/1903.07836

PDF

http://arxiv.org/pdf/1903.07836

Fisher Discriminative Least Square Regression with Self-Adaptive Weighting for Face Recognition

2019-03-19

Zhe Chen, Xiao-Jun Wu, Josef Kittler

arXiv_CV

arXiv_CV Face Classification Recognition Face_Recognition
Abstract

As a supervised classification method, least square regression (LSR) has shown promising performance in multiclass face recognition tasks. However, the latest LSR based classification methods mainly focus on learning a relaxed regression target to replace the traditional zero-one label matrix while ignoring the discriminability of the transformed features. Based on the assumption that the transformed features of samples from the same class have similar structure while those of samples from different classes are uncorrelated, in this paper we propose a novel discriminative LSR method based on the Fisher discrimination criterion (FDLSR), where the projected features have small within-class scatter and large inter-class scatter simultaneously. Moreover, different from other methods, we explore relaxed regression from the view of the transformed features rather than the regression targets. Specifically, we impose a dynamic non-negative weight matrix on the transformed features to enlarge the margin between the true and the false classes by self-adaptively assigning appropriate weights to different features. These two factors encourage the learned transformation for regression to be more discriminative and thus achieve better classification performance. Extensive experiments on various databases demonstrate that the proposed FDLSR method achieves superior performance to other state-of-the-art LSR based classification methods.

URL

http://arxiv.org/abs/1903.07833

PDF

http://arxiv.org/pdf/1903.07833

Low-Rank Discriminative Least Squares Regression for Image Classification

2019-03-19

Zhe Chen, Xiao-Jun Wu, He-Feng Yin, Josef Kittler

arXiv_CV

arXiv_CV Regularization Image_Classification Classification
Abstract

The latest least squares regression (LSR) methods mainly try to learn slack regression targets to replace strict zero-one labels. However, the difference of intra-class targets can also be highlighted when enlarging the distance between different classes, and roughly pursuing relaxed targets may lead to the problem of overfitting. To solve the above problems, we propose a low-rank discriminative least squares regression model (LRDLSR) for multi-class image classification. Specifically, LRDLSR class-wisely imposes a low-rank constraint on the intra-class regression targets to encourage their compactness and similarity. Moreover, LRDLSR introduces an additional regularization term on the learned targets to avoid the problem of overfitting. These two improvements help to learn a more discriminative projection for regression and thus achieve better classification performance. Experimental results over a range of image databases demonstrate the effectiveness of the proposed LRDLSR method.

URL

http://arxiv.org/abs/1903.07832

PDF

http://arxiv.org/pdf/1903.07832

Diversity-Promoting Deep Reinforcement Learning for Interactive Recommendation

2019-03-19

Yong Liu, Yinan Zhang, Qiong Wu, Chunyan Miao, Lizhen Cui, Binqiang Zhao, Yin Zhao, Lu Guan

arXiv_AI

arXiv_AI Attention Reinforcement_Learning Recommendation
Abstract

Interactive recommendation that models the explicit interactions between users and the recommender system has attracted a lot of research attention in recent years. Most previous interactive recommendation systems only focus on optimizing recommendation accuracy while overlooking other important aspects of recommendation quality, such as the diversity of recommendation results. In this paper, we propose a novel recommendation model, named Diversity-promoting Deep Reinforcement Learning (D$^2$RL), which encourages the diversity of recommendation results in interactive recommendations. More specifically, we adopt a Determinantal Point Process (DPP) model to generate diverse yet relevant item recommendations. A personalized DPP kernel matrix is maintained for each user, which is constructed from two parts: a fixed similarity matrix capturing item-item similarity, and the relevance of items dynamically learnt through an actor-critic reinforcement learning framework. We performed extensive offline experiments as well as simulated online experiments with real world datasets to demonstrate the effectiveness of the proposed model.
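The two-part kernel described above can be sketched with the common quality-diversity form L[i][j] = r[i] * S[i][j] * r[j]. That form, the greedy selection rule, and all names below are assumptions for illustration; the abstract does not give the exact construction, and the relevance scores here are fixed numbers rather than outputs of an actor-critic model.

```python
def build_kernel(relevance, similarity):
    """Combine per-item relevance with item-item similarity (assumed form)."""
    n = len(relevance)
    return [[relevance[i] * similarity[i][j] * relevance[j] for j in range(n)]
            for i in range(n)]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return result


def greedy_select(kernel, k):
    """Greedily add the item that maximizes the determinant of the chosen submatrix."""
    chosen = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(len(kernel)):
            if i in chosen:
                continue
            idx = chosen + [i]
            d = det([[kernel[a][b] for b in idx] for a in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break
        chosen.append(best)
    return chosen


# Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
relevance = [1.0, 0.9, 0.7]
similarity = [[1.0, 0.95, 0.1],
              [0.95, 1.0, 0.1],
              [0.1, 0.1, 1.0]]
picked = greedy_select(build_kernel(relevance, similarity), k=2)
```

In this toy example the second pick is item 2 rather than the more relevant item 1, because the determinant shrinks when two selected items are highly similar. That is the diversity effect a DPP is used for.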

URL

http://arxiv.org/abs/1903.07826

PDF

http://arxiv.org/pdf/1903.07826

Compressed Sensing: From Research to Clinical Practice with Data-Driven Learning

2019-03-19

Joseph Y. Cheng, Feiyu Chen, Christopher Sandino, Morteza Mardani, John M. Pauly, Shreyas S. Vasanawala

arXiv_CV

arXiv_CV Review Deep_Learning
Abstract

Compressed sensing in MRI enables high subsampling factors while maintaining diagnostic image quality. This technique enables shortened scan durations and/or improved image resolution. Further, compressed sensing can increase the diagnostic information and value from each scan performed. Overall, compressed sensing has significant clinical impact in improving the diagnostic quality and patient experience for imaging exams. However, a number of challenges exist when moving compressed sensing from research to the clinic. These challenges include hand-crafted image priors, sensitive tuning parameters, and long reconstruction times. Data-driven learning provides a solution to address these challenges. As a result, compressed sensing can have greater clinical impact. In this tutorial, we will review the compressed sensing formulation and outline steps needed to transform this formulation to a deep learning framework. Supplementary open source code in Python will be used to demonstrate this approach with open databases. Further, we will discuss considerations in applying data-driven compressed sensing in the clinical setting.

URL

http://arxiv.org/abs/1903.07824

PDF

http://arxiv.org/pdf/1903.07824

Safe Policy Synthesis in Multi-Agent POMDPs via Discrete-Time Barrier Functions

2019-03-19

Mohamadreza Ahmadi, Andrew Singletary, Joel W. Burdick, Aaron D. Ames

arXiv_RO

arXiv_RO Attention
Abstract

A multi-agent partially observable Markov decision process (MPOMDP) is a modeling paradigm used for high-level planning of heterogeneous autonomous agents subject to uncertainty and partial observation. Despite their modeling efficiency, MPOMDPs have not received significant attention in safety-critical settings. In this paper, we use barrier functions to design policies for MPOMDPs that ensure safety. Notably, our method does not rely on discretization of the belief space, or finite memory. To this end, we formulate sufficient and necessary conditions for the safety of a given set based on discrete-time barrier functions (DTBFs) and we demonstrate that our formulation also allows for Boolean compositions of DTBFs for representing more complicated safe sets. We show that the proposed method can be implemented online by a sequence of one-step greedy algorithms as a standalone safe controller or as a safety-filter given a nominal planning policy. We illustrate the efficiency of the proposed methodology based on DTBFs using a high-fidelity simulation of heterogeneous robots.
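The safety-filter idea can be illustrated on a toy problem. The sketch below assumes a commonly used discrete-time barrier condition, h(x_next) - h(x) >= -alpha * h(x) with 0 < alpha <= 1, on a fully observed, deterministic one-dimensional system. The paper works with beliefs in MPOMDPs, so this is only an analogy; the condition's exact form and every name here are assumptions.

```python
def safety_filter(x, nominal_action, actions, step, h, alpha=0.5):
    """One-step greedy filter: keep the nominal action if it satisfies the
    barrier condition, otherwise return the closest action that does."""

    def satisfies_barrier(a):
        # Assumed condition: h may shrink by at most a fraction alpha per step,
        # so h stays non-negative whenever it starts non-negative.
        return h(step(x, a)) - h(x) >= -alpha * h(x)

    if satisfies_barrier(nominal_action):
        return nominal_action
    safe = [a for a in actions if satisfies_barrier(a)]
    if not safe:
        raise ValueError("no action satisfies the barrier condition")
    return min(safe, key=lambda a: abs(a - nominal_action))


def step(x, a):
    """Toy dynamics: the action is a displacement along a line."""
    return x + a


def h(x):
    """Barrier function: the safe set is {x : h(x) >= 0}, i.e. x <= 10."""
    return 10 - x


ACTIONS = [-1, 0, 1, 2]
far_from_wall = safety_filter(0, 2, ACTIONS, step, h)   # nominal action kept
near_the_wall = safety_filter(9, 2, ACTIONS, step, h)   # nominal action overridden
```

Far from the boundary the nominal planner is left alone; near it, the filter substitutes the nearest action that keeps the barrier condition, which mirrors the "safety-filter given a nominal planning policy" mode mentioned in the abstract.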

URL

http://arxiv.org/abs/1903.07823

PDF

http://arxiv.org/pdf/1903.07823

Trick or TReAT: Thematic Reinforcement for Artistic Typography

2019-03-19

Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, Devi Parikh

arXiv_CV

arXiv_CV
Abstract

An approach to make text visually appealing and memorable is semantic reinforcement - the use of visual cues alluding to the context or theme in which the word is being used to reinforce the message (e.g., Google Doodles). We present a computational approach for semantic reinforcement called TReAT - Thematic Reinforcement for Artistic Typography. Given an input word (e.g. exam) and a theme (e.g. education), the individual letters of the input word are replaced by cliparts relevant to the theme which visually resemble the letters - adding creative context to the potentially boring input word. We use an unsupervised approach to learn a latent space to represent letters and cliparts and compute similarities between the two. Human studies show that participants can reliably recognize the word as well as the theme in our outputs (TReATs) and find them more creative compared to meaningful baselines.

URL

http://arxiv.org/abs/1903.07820

PDF

http://arxiv.org/pdf/1903.07820

Mask-guided Style Transfer Network for Purifying Real Images

2019-03-19

Tongtong Zhao, Yuxiao Yan, Jinjia Peng, Huibing Wang, Xianping Fu

arXiv_CV

arXiv_CV Segmentation Attention Style_Transfer Quantitative
Abstract

Recently, progress in learning-by-synthesis has made it possible to train models on synthetic images, which can effectively reduce the cost of human and material resources. However, because the distribution of synthetic images differs from that of real images, the desired performance cannot be achieved. To solve this problem, previous methods learned a model to improve the realism of the synthetic images. Different from the previous methods, this paper tries to purify real images by extracting discriminative and robust features to convert outdoor real images to indoor synthetic images. In this paper, we first introduce segmentation masks to construct RGB-mask pairs as inputs, then we design a mask-guided style transfer network to learn style features separately from the attention and background (bkgd) regions and learn content features from the full and attention regions. Moreover, we propose a novel region-level task-guided loss to restrain the features learnt from style and content. Experiments were performed using mixed (qualitative and quantitative) methods to demonstrate the possibility of purifying real images in complex directions. We evaluate the proposed method on various public datasets, including LPW, COCO and MPIIGaze. Experimental results show that the proposed method is effective and achieves state-of-the-art results.

URL

http://arxiv.org/abs/1903.08152

PDF

http://arxiv.org/pdf/1903.08152

Self-Weighted Multiview Metric Learning by Maximizing the Cross Correlations

2019-03-19

Huibing Wang, Jinjia Peng, Xianping Fu

arXiv_CV

arXiv_CV Image_Retrieval Face Relation Recognition Face_Recognition
Abstract

In the multimedia era, one sample can always be described from multiple views which contain compatible and complementary information. Most algorithms cannot take information from multiple views into consideration and fail to achieve desirable performance in most situations. For many applications, such as image retrieval and face recognition, an appropriate distance metric can better reflect the similarities between various samples. Therefore, how to construct a good distance metric learning method which can deal with multiview data has been an important topic during the last decade. In this paper, we propose a novel algorithm named Self-weighted Multiview Metric Learning (SM2L) which accomplishes this task by maximizing the cross correlations between different views. Furthermore, because multiple views have different contributions to the learning procedure of SM2L, we adopt a self-weighted learning framework to assign different weights to the views. Various experiments on benchmark datasets verify the performance of our proposed method.

URL

http://arxiv.org/abs/1903.07812

PDF

http://arxiv.org/pdf/1903.07812

Multiple Instance Hybrid Estimator for Hyperspectral Target Characterization and Sub-pixel Target Detection

2019-03-19

Changzhe Jiao, Chao Chen, Ronald G. McGarvey, Stephanie Bohlman, Licheng Jiao, Alina Zare

arXiv_CV

arXiv_CV Object_Detection Sparse Detection
Abstract

The Multiple Instance Hybrid Estimator for discriminative target characterization from imprecisely labeled hyperspectral data is presented. In many hyperspectral target detection problems, acquiring accurately labeled training data is difficult. Furthermore, each pixel containing target is likely to be a mixture of both target and non-target signatures (i.e., sub-pixel targets), making extracting a pure prototype signature for the target class from the data extremely difficult. The proposed approach addresses these problems by introducing a data mixing model and optimizing the response of the hybrid sub-pixel detector within a multiple instance learning framework. The proposed approach iterates between estimating a set of discriminative target and non-target signatures and solving a sparse unmixing problem. After learning target signatures, a signature based detector can then be applied on test data. Both simulated and real hyperspectral target detection experiments show the proposed algorithm is effective at learning discriminative target signatures and achieves superior performance over state-of-the-art comparison algorithms.

URL

http://arxiv.org/abs/1710.11599

PDF

http://arxiv.org/pdf/1710.11599

Dynamic Deep Networks for Retinal Vessel Segmentation

2019-03-19

Aashis Khanal, Rolando Estrada

arXiv_CV

arXiv_CV Segmentation
Abstract

Segmenting the retinal vasculature entails a trade-off between how much of the overall vascular structure we identify vs. how precisely we segment individual vessels. In particular, state-of-the-art methods tend to under-segment faint vessels, as well as pixels that lie on the edges of thicker vessels. Thus, they underestimate the width of individual vessels, as well as the ratio of large to small vessels. More generally, many crucial bio-markers (including the artery-vein (AV) ratio, branching angles, number of bifurcations, fractal dimension, tortuosity, vascular length-to-diameter ratio and wall-to-lumen length) require precise measurements of individual vessels. To address this limitation, we propose a novel, stochastic training scheme for deep neural networks that better classifies the faint, ambiguous regions of the image. Our approach relies on two key innovations. First, we train our deep networks with dynamic weights that fluctuate during each training iteration. This stochastic approach forces the network to learn a mapping that robustly balances precision and recall. Second, we decouple the segmentation process into two steps. In the first half of our pipeline, we estimate the likelihood of every pixel and then use these likelihoods to segment pixels that are clearly vessel or background. In the latter part of our pipeline, we use a second network to classify the ambiguous regions in the image. Our proposed method obtained state-of-the-art results on five retinal datasets (DRIVE, STARE, CHASE-DB, AV-WIDE, and VEVIO) by learning a robust balance between false positive and false negative rates. In addition, we are the first to report segmentation results on the AV-WIDE dataset, and we have made the ground-truth annotations for this dataset publicly available.
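The first step of the two-step pipeline can be pictured as a three-way split of per-pixel likelihoods. The thresholds (0.2 and 0.8) and the function name below are hypothetical; the abstract does not say how "clearly vessel or background" is decided.

```python
def split_by_confidence(likelihoods, low=0.2, high=0.8):
    """Label each pixel likelihood as clear vessel, clear background, or
    ambiguous (left for a second classifier). Thresholds are assumed."""
    labels = []
    for p in likelihoods:
        if p >= high:
            labels.append("vessel")
        elif p <= low:
            labels.append("background")
        else:
            labels.append("ambiguous")
    return labels


# Made-up likelihoods for a row of five pixels.
row = [0.95, 0.5, 0.05, 0.8, 0.21]
labels = split_by_confidence(row)
ambiguous_positions = [i for i, label in enumerate(labels) if label == "ambiguous"]
```

Only the positions in `ambiguous_positions` would be passed on to the second network in a pipeline of this shape.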

URL

http://arxiv.org/abs/1903.07803

PDF

http://arxiv.org/pdf/1903.07803

Robust Visual Tracking Using Dynamic Classifier Selection with Sparse Representation of Label Noise

2019-03-19

Yuefeng Chen, Qing Wang

arXiv_CV

arXiv_CV Sparse Tracking Quantitative Detection
Abstract

Recently, a category of tracking methods based on “tracking-by-detection” has been widely used for the visual tracking problem. Most of these methods update the classifier online using the samples generated by the tracker to handle the appearance changes. However, the self-updating scheme makes these methods suffer from the drifting problem because of the incorrect labels of weak classifiers in training samples. In this paper, we split the class labels into true labels and noise labels and model them by sparse representation. A novel dynamic classifier selection method, robust to noisy training data, is proposed. Moreover, we apply the proposed classifier selection algorithm to visual tracking by integrating a part based online boosting framework. We have evaluated our proposed method on 12 challenging sequences involving severe occlusions, significant illumination changes and large pose variations. Both the qualitative and quantitative evaluations demonstrate that our approach tracks objects accurately and robustly and outperforms state-of-the-art trackers.

URL

http://arxiv.org/abs/1903.07801

PDF

http://arxiv.org/pdf/1903.07801

Predicting Citywide Crowd Flows in Irregular Regions Using Multi-View Graph Convolutional Networks

2019-03-19

Junkai Sun, Junbo Zhang, Qiaofei Li, Xiuwen Yi, Yu Zheng

arXiv_CV

arXiv_CV CNN Prediction Relation
Abstract

Being able to predict the crowd flows in each and every part of a city, especially in irregular regions, is strategically important for traffic control, risk assessment, and public safety. However, it is very challenging because of interactions and spatial correlations between different regions. In addition, it is affected by many factors: i) multiple temporal correlations among different time intervals: closeness, period, trend; ii) complex external influential factors: weather, events; iii) meta features: time of the day, day of the week, and so on. In this paper, we formulate crowd flow forecasting in irregular regions as a spatio-temporal graph (STG) prediction problem in which each node represents a region with time-varying flows. By extending graph convolution to handle the spatial information, we propose using spatial graph convolution to build a multi-view graph convolutional network (MVGCN) for the crowd flow forecasting problem, where different views can capture different factors as mentioned above. We evaluate MVGCN using four real-world datasets (taxicabs and bikes) and extensive experimental results show that our approach outperforms the adaptations of state-of-the-art methods. And we have developed a crowd flow forecasting system for irregular regions that can now be used internally.

URL

http://arxiv.org/abs/1903.07789

PDF

http://arxiv.org/pdf/1903.07789

Probabilistic End-to-end Noise Correction for Learning with Noisy Labels

2019-03-19

Kun Yi, Jianxin Wu

arXiv_CV

arXiv_CV Deep_Learning
Abstract

Deep learning has achieved excellent performance in various computer vision tasks, but requires a lot of training examples with clean labels. It is easy to collect a dataset with noisy labels, but such noise makes networks overfit seriously and accuracies drop dramatically. To address this problem, we propose an end-to-end framework called PENCIL, which can update both network parameters and label estimations as label distributions. PENCIL is independent of the backbone network structure and does not need an auxiliary clean dataset or prior information about noise, thus it is more general and robust than existing methods and is easy to apply. PENCIL outperforms previous state-of-the-art methods by large margins on both synthetic and real-world datasets with different noise types and noise rates. Experiments show that PENCIL is robust on clean datasets, too.

URL

http://arxiv.org/abs/1903.07788

PDF

http://arxiv.org/pdf/1903.07788

Object Detection from Scratch with Deep Supervision

2019-03-19

Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue

arXiv_CV

arXiv_CV Object_Detection Classification Prediction Detection
Abstract

We propose Deeply Supervised Object Detectors (DSOD), an object detection framework that can be trained from scratch. Recent advances in object detection heavily depend on off-the-shelf models pre-trained on large-scale classification datasets like ImageNet and OpenImage. However, one problem is that adopting pre-trained models from classification for the detection task may incur learning bias due to the different objective functions and diverse distributions of object categories. Techniques like fine-tuning on the detection task could alleviate this issue to some extent but are still not fundamental. Furthermore, transferring these pre-trained models across discrepant domains will be more difficult (e.g., from RGB to depth images). Thus, a better solution to handle these critical problems is to train object detectors from scratch, which motivates our proposed method. Previous efforts in this direction mainly failed because of limited training data and naive backbone network structures for object detection. In DSOD, we contribute a set of design principles for learning object detectors from scratch. One of the key principles is deep supervision, enabled by layer-wise dense connections in both backbone networks and prediction layers, which plays a critical role in learning good detectors from scratch. After incorporating several other principles, we build our DSOD based on the single-shot detection framework (SSD). We evaluate our method on the PASCAL VOC 2007, 2012 and COCO datasets. DSOD achieves consistently better results than the state-of-the-art methods with much more compact models. Specifically, DSOD outperforms the baseline method SSD on all three benchmarks, while requiring only 1/2 of the parameters. We also observe that DSOD can achieve comparable/slightly better results than Mask RCNN + FPN (under similar input size) with only 1/3 of the parameters, using no extra data or pre-trained models.

URL

https://arxiv.org/abs/1809.09294

PDF

https://arxiv.org/pdf/1809.09294

Cloze-driven Pretraining of Self-attention Networks

2019-03-19

Alexei Baevski, Sergey Edunov, Yinhan Liu, Luke Zettlemoyer, Michael Auli

arXiv_CL

arXiv_CL Attention
Abstract

We present a new approach for pretraining a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems. Our model solves a cloze-style word reconstruction task, where each word is ablated and must be predicted given the rest of the text. Experiments demonstrate large performance gains on GLUE and new state of the art results on NER as well as constituency parsing benchmarks, consistent with the concurrently introduced BERT model. We also present a detailed analysis of a number of factors that contribute to effective pretraining, including data domain and size, model capacity, and variations on the cloze objective.
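The cloze-style task described above can be pictured as generating one training example per word. This sketch shows only that data view; the `<mask>` placeholder and the function name are hypothetical, and nothing here reflects the model architecture or how the paper actually batches its training.

```python
def cloze_examples(tokens, mask="<mask>"):
    """For each position, hide that token and keep the rest as context.

    Returns a list of (context_tokens, target_token) pairs.
    """
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[:i] + [mask] + tokens[i + 1:]
        examples.append((context, target))
    return examples


sentence = ["the", "cat", "sat"]
examples = cloze_examples(sentence)
for context, target in examples:
    print(" ".join(context), "->", target)
```

Because the context includes words on both sides of the hidden position, a model trained on such pairs has to use bi-directional information, which is the property the abstract emphasizes.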

URL

http://arxiv.org/abs/1903.07785

PDF

http://arxiv.org/pdf/1903.07785

2019-05-31

Lemotif: Abstract Visual Depictions of your Emotional States in Life

2019-03-18

Devi Parikh

arXiv_AI

arXiv_AI Salient
Abstract

We present Lemotif. Lemotif generates a motif for your emotional life. You tell Lemotif a little bit about your day – what were salient events or aspects and how they made you feel. Lemotif will generate a lemotif – a creative abstract visual depiction of your emotions and their sources. Over time, Lemotif can create visual motifs to capture a summary of your emotional states over arbitrary periods of time – making patterns in your emotions and their sources apparent, presenting opportunities to take actions, and measure their effectiveness. The underlying principles in Lemotif are that the lemotif should (1) separate out the sources of the emotions, (2) depict these sources visually, (3) depict the emotions visually, and (4) have a creative aspect to them. We verify via human studies that each of these factors contributes to the proposed lemotifs being favored over corresponding baselines.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07766

PDF

http://arxiv.org/pdf/1903.07766
Read All
Deep Reinforcement Learning with Decorrelation

2019-03-18

Borislav Mavrin, Hengshuai Yao, Linglong Kong

arXiv_AI

arXiv_AI Regularization Reinforcement_Learning Represenation_Learning Relation
Abstract

Learning an effective representation for high-dimensional data is a challenging problem in reinforcement learning (RL). Deep reinforcement learning (DRL) such as Deep Q networks (DQN) achieves remarkable success in computer games by learning deeply encoded representation from convolution networks. In this paper, we propose a simple yet very effective method for representation learning with DRL algorithms. Our key insight is that features learned by DRL algorithms are highly correlated, which interferes with learning. By adding a regularized loss that penalizes correlation in latent features (with only slight computation), we decorrelate features represented by deep neural networks incrementally. On 49 Atari games, with the same regularization factor, our decorrelation algorithms perform $70\%$ in terms of human-normalized scores, which is $40\%$ better than DQN. In particular, ours performs better than DQN on 39 games with 4 close ties and lost only slightly on $6$ games. Empirical results also show that the decorrelation method applies to Quantile Regression DQN (QR-DQN) and significantly boosts performance. Further experiments on the losing games show that our decorrelation algorithms can win over DQN and QR-DQN with a fine-tuned regularization factor.
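
As a rough illustration of a loss that penalizes correlation in latent features, the sketch below adds the squared off-diagonal entries of the batch feature covariance to a base loss. This is one plausible form of such a penalty, not necessarily the exact loss used in the paper. The function names and the weight `lam` are assumptions.

```python
from typing import List


def decorrelation_penalty(features: List[List[float]]) -> float:
    """Sum of squared off-diagonal covariances over a batch of feature vectors."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    penalty = 0.0
    for j in range(d):
        for k in range(j + 1, d):
            cov = sum(
                (row[j] - means[j]) * (row[k] - means[k]) for row in features
            ) / n
            # The covariance matrix is symmetric, so count (j, k) and (k, j).
            penalty += 2.0 * cov * cov
    return penalty


def regularized_loss(base_loss: float, features: List[List[float]], lam: float = 0.1) -> float:
    """Base loss (e.g. a TD loss) plus the weighted decorrelation penalty."""
    return base_loss + lam * decorrelation_penalty(features)


correlated = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
uncorrelated = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(decorrelation_penalty(correlated), decorrelation_penalty(uncorrelated))
```

The penalty is zero when every pair of features is uncorrelated over the batch and grows as features become redundant.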

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07765

PDF

http://arxiv.org/pdf/1903.07765
Read All
Motion Planning for Multi-Mobile-Manipulator Payload Transport Systems

2019-03-18

Rahul Tallamraju, Durgesh Haribhau Salunkhe, Sujit Rajappa, Aamir Ahmad, Kamalakar Karlapalem, Suril Vijaykumar Shah

arXiv_RO

arXiv_RO
Abstract

In this paper, a kinematic motion planning algorithm for cooperative spatial payload manipulation is presented. A hierarchical approach is introduced to compute real-time collision-free motion plans for a formation of mobile manipulator robots. Initially, collision-free configurations of a deformable 2-D virtual bounding box are identified, over a planning horizon, to define a convex workspace for the entire system. Then, 3-D payload configurations whose projections lie within the defined convex workspace are computed. Finally, a convex decentralized model-predictive controller is formulated to plan collision-free trajectories for the formation of mobile manipulators. This approach facilitates real-time motion planning for the system and is scalable in the number of robots. The algorithm is validated in simulated dynamic environments. Simulation video: https://youtu.be/9EKj7RwRs_4.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07758

PDF

http://arxiv.org/pdf/1903.07758
Read All
Learning to Augment Synthetic Images for Sim2Real Policy Transfer

2019-03-18

Alexander Pashevich, Robin A. M. Strudel, Igor Kalevatykh, Ivan Laptev, Cordelia Schmid

arXiv_CV

arXiv_CV Image_Caption
Abstract

Vision and learning have made significant progress that could improve robotics policies for complex tasks and environments. Learning deep neural networks for image understanding, however, requires large amounts of domain-specific visual data. While collecting such data from real robots is possible, such an approach limits the scalability as learning policies typically requires thousands of trials. In this work we attempt to learn manipulation policies in simulated environments. Simulators enable scalability and provide access to the underlying world state during training. Policies learned in simulators, however, do not transfer well to real scenes given the domain gap between real and synthetic data. We follow recent work on domain randomization and augment synthetic images with sequences of random transformations. Our main contribution is to optimize the augmentation strategy for sim2real transfer and to enable domain-independent policy learning. We design an efficient search for depth image augmentations using object localization as a proxy task. Given the resulting sequence of random transformations, we use it to augment synthetic depth images during policy learning. Our augmentation strategy is policy-independent and enables policy learning with no real images. We demonstrate our approach to significantly improve accuracy on three manipulation tasks evaluated on a real robot.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07740

PDF

http://arxiv.org/pdf/1903.07740
Read All
Direct Object Recognition Without Line-of-Sight Using Optical Coherence

2019-03-18

Xin Lei, Liangyu He, Yixuan Tan, Ken Xingze Wang, Xinggang Wang, Yihan Du, Shanhui Fan, Zongfu Yu

arXiv_CV

arXiv_CV Recognition
Abstract

Visual object recognition under situations in which the direct line-of-sight is blocked, such as when it is occluded around the corner, is of practical importance in a wide range of applications. With coherent illumination, the light scattered from diffusive walls forms speckle patterns that contain information about the hidden object. It is possible to realize non-line-of-sight (NLOS) recognition with these speckle patterns. We introduce a novel approach based on speckle pattern recognition with a deep neural network, which is simpler and more robust than other NLOS recognition methods. Simulations and experiments are performed to verify the feasibility and performance of this approach.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07705

PDF

http://arxiv.org/pdf/1903.07705
Read All
Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems

2019-03-18

Hadi Abdullah, Washington Garcia, Christian Peeters, Patrick Traynor, Kevin R. B. Butler, Joseph Wilson

arXiv_SD

arXiv_SD Adversarial Knowledge Recognition
Abstract

Voice Processing Systems (VPSes), now widely deployed, have been made significantly more accurate through the application of recent advances in machine learning. However, adversarial machine learning has similarly advanced and has been used to demonstrate that VPSes are vulnerable to the injection of hidden commands - audio obscured by noise that is correctly recognized by a VPS but not by human beings. Such attacks, though, are often highly dependent on white-box knowledge of a specific machine learning model and limited to specific microphones and speakers, making their use across different acoustic hardware platforms (and thus their practicality) limited. In this paper, we break these dependencies and make hidden command attacks more practical through model-agnostic (blackbox) attacks, which exploit knowledge of the signal processing algorithms commonly used by VPSes to generate the data fed into machine learning systems. Specifically, we exploit the fact that multiple source audio samples have similar feature vectors when transformed by acoustic feature extraction algorithms (e.g., FFTs). We develop four classes of perturbations that create unintelligible audio and test them against 12 machine learning models, including 7 proprietary models (e.g., Google Speech API, Bing Speech API, IBM Speech API, Azure Speaker API, etc), and demonstrate successful attacks against all targets. Moreover, we successfully use our maliciously generated audio samples in multiple hardware configurations, demonstrating effectiveness across both models and real systems. In so doing, we demonstrate that domain-specific knowledge of audio signal processing represents a practical means of generating successful hidden voice command attacks.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.05734

PDF

http://arxiv.org/pdf/1904.05734
Read All
Learning Channel Inter-dependencies at Multiple Scales on Dense Networks for Face Recognition

2019-03-18

Qiangchang Wang, Guodong Guo, Mohammad Iqbal Nouyed

arXiv_CV

arXiv_CV Face Recognition Face_Recognition
Abstract

We propose a new deep network structure for unconstrained face recognition. The proposed network integrates several key components together in order to characterize complex data distributions, such as in unconstrained face images. Inspired by recent progress in deep networks, we consider some important concepts, including multi-scale feature learning, dense connections of network layers, and weighting different network flows, for building our deep network structure. The developed network is evaluated in unconstrained face matching, showing the capability of learning complex data distributions caused by face images with various qualities.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1711.10103

PDF

http://arxiv.org/pdf/1711.10103
Read All
Software-Defined Design Space Exploration for an Efficient AI Accelerator Architecture

2019-03-18

Ye Yu, Yingmin Li, Shuai Che, Niraj K. Jha, Weifeng Zhang

arXiv_AI

arXiv_AI Object_Detection Optimization Detection Recognition
Abstract

Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high computational complexity of DNNs often necessitates extremely fast and efficient hardware. The problem gets worse as the size of neural networks grows exponentially. As a result, customized hardware accelerators have been developed to accelerate DNN processing without sacrificing model accuracy. However, previous accelerator design studies have not fully considered the characteristics of the target applications, which may lead to sub-optimal architecture designs. On the other hand, new DNN models have been developed for better accuracy, but their compatibility with the underlying hardware accelerator is often overlooked. In this article, we propose an application-driven framework for architectural design space exploration of DNN accelerators. This framework is based on a hardware analytical model of individual DNN operations. It models the accelerator design task as a multi-dimensional optimization problem. We demonstrate that it can be efficaciously used in application-driven accelerator architecture design. Given a target DNN, the framework can generate efficient accelerator design solutions with optimized performance and area. Furthermore, we explore the opportunity to use the framework for accelerator configuration optimization under simultaneous diverse DNN applications. The framework is also capable of improving neural network models to best fit the underlying hardware resources.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07676

PDF

http://arxiv.org/pdf/1903.07676
Read All
Sensor Fusion for Predictive Control of Human-Prosthesis-Environment Dynamics in Assistive Walking: A Survey

2019-03-18

Kuangen Zhang, Clarence W. de Silva, Chenglong Fu

arXiv_RO

arXiv_RO Review Survey
Abstract

This survey paper concerns Sensor Fusion for Predictive Control of Human-Prosthesis-Environment Dynamics in Assistive Walking. The powered lower limb prosthesis can imitate the human limb motion and help amputees to recover the walking ability, but it is still a challenge for amputees to walk in complex environments with the powered prosthesis. Previous researchers mainly focused on the interaction between a human and the prosthesis without considering the environmental information, which can provide an environmental context for human-prosthesis interaction. Therefore, in this review, recent sensor fusion methods for the predictive control of human-prosthesis-environment dynamics in assistive walking are critically surveyed. In that backdrop, several pertinent research issues that need further investigation are presented. In particular, general controllers, comparison of sensors, and complete procedures of sensor fusion methods that are applicable in assistive walking are introduced. Also, possible sensor fusion research for human-prosthesis-environment dynamics is presented.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07674

PDF

http://arxiv.org/pdf/1903.07674
Read All
CornerNet: Detecting Objects as Paired Keypoints

2019-03-18

Hei Law, Jia Deng

arXiv_CV

arXiv_CV Object_Detection CNN Detection
Abstract

We propose CornerNet, a new approach to object detection where we detect an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network. By detecting objects as paired keypoints, we eliminate the need for designing a set of anchor boxes commonly used in prior single-stage detectors. In addition to our novel formulation, we introduce corner pooling, a new type of pooling layer that helps the network better localize corners. Experiments show that CornerNet achieves a 42.2% AP on MS COCO, outperforming all existing one-stage detectors.
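
The abstract does not spell out corner pooling, so the following is a minimal sketch of the top-left variant as it is commonly described: for each location, take the maximum of one feature map over everything to its right, the maximum of a second feature map over everything below it, and add the two. The plain-list representation and the function name are assumptions for illustration. A real layer would operate on tensors.

```python
from typing import List

Grid = List[List[float]]


def top_left_corner_pool(top_map: Grid, left_map: Grid) -> Grid:
    """Top-left corner pooling over two H x W feature maps."""
    h, w = len(top_map), len(top_map[0])
    right = [row[:] for row in top_map]
    down = [row[:] for row in left_map]
    # Running maximum from right to left: max over columns j..W-1 in each row.
    for i in range(h):
        for j in range(w - 2, -1, -1):
            right[i][j] = max(right[i][j], right[i][j + 1])
    # Running maximum from bottom to top: max over rows i..H-1 in each column.
    for j in range(w):
        for i in range(h - 2, -1, -1):
            down[i][j] = max(down[i][j], down[i + 1][j])
    return [[right[i][j] + down[i][j] for j in range(w)] for i in range(h)]


# Evidence for an object's top edge at (1, 2) and its left edge at (2, 1).
top_map = [[0, 0, 0, 0], [0, 0, 3, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
left_map = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 2, 0, 0], [0, 0, 0, 0]]
pooled = top_left_corner_pool(top_map, left_map)
for row in pooled:
    print(row)
```

In this toy input the strongest response lands at row 1, column 1, where the top edge and the left edge would meet, even though neither input map has any evidence at that location.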

Abstract (translated by Google)

URL

https://arxiv.org/abs/1808.01244

PDF

https://arxiv.org/pdf/1808.01244
Read All
Neural Sequential Phrase Grounding

2019-03-18

Pelin Dogan, Leonid Sigal, Markus Gross

arXiv_CV

arXiv_CV Embedding RNN
Abstract

We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two stacks of LSTM cells, along with so-far grounded phrase-region pairs. These LSTM stacks collectively capture context for grounding of the next phrase. The resulting architecture, which we call SeqGROUND, supports many-to-many matching by allowing an image region to be matched to multiple phrases and vice versa. We show competitive performance on the Flickr30K benchmark dataset and, through ablation studies, validate the efficacy of sequential grounding as well as individual design choices in our model architecture.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07669

PDF

http://arxiv.org/pdf/1903.07669
Read All
An Updated Duet Model for Passage Re-ranking

2019-03-18

Bhaskar Mitra, Nick Craswell

arXiv_CL

arXiv_CL
Abstract

We propose several small modifications to Duet—a deep neural ranking model—and evaluate the updated model on the MS MARCO passage ranking task. We report significant improvements from the proposed changes based on an ablation study.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07666

PDF

http://arxiv.org/pdf/1903.07666
Read All
Survey of state-of-the-art mixed data clustering algorithms

2019-03-18

Amir Ahmad, Shehroz S. Khan

arXiv_AI

arXiv_AI Review Survey
Abstract

Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we present a taxonomy for the study of mixed data clustering algorithms by identifying five major research themes. We then present a state-of-the-art review of the research works within each research theme. We analyze the strengths and weaknesses of these methods with pointers for future research directions. Lastly, we present an in-depth analysis of the overall challenges in this field, highlight open research questions and discuss guidelines to make progress in the field.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1811.04364

PDF

http://arxiv.org/pdf/1811.04364
Read All
Learning Correspondence from the Cycle-Consistency of Time

2019-03-18

Xiaolong Wang, Allan Jabri, Alexei A. Efros

arXiv_CV

arXiv_CV Segmentation Tracking
Abstract

We introduce a self-supervised method for learning visual correspondence from unlabeled video. The main idea is to use cycle-consistency in time as free supervisory signal for learning visual representations from scratch. At training time, our model learns a feature map representation to be useful for performing cycle-consistent tracking. At test time, we use the acquired representation to find nearest neighbors across space and time. We demonstrate the generalizability of the representation – without finetuning – across a range of visual correspondence tasks, including video object segmentation, keypoint tracking, and optical flow. Our approach outperforms previous self-supervised methods and performs competitively with strongly supervised methods.
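
The cycle-consistency idea can be sketched as follows: track a patch backward through the frames, then forward again, and measure how far the end point lands from the start. This version uses hard nearest-neighbour matching on toy feature vectors, so it is only a check of the idea. Training as described in the abstract would need a differentiable tracker, and all names here are hypothetical.

```python
from typing import List

Feature = List[float]
Frame = List[Feature]  # one feature vector per patch


def nearest(query: Feature, candidates: Frame) -> int:
    """Index of the candidate closest to the query (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, candidates[i])),
    )


def cycle_error(frames: List[Frame], start: int) -> float:
    """Track patch `start` of the last frame backward, then forward again.

    Returns the squared feature distance between the start patch and the
    patch the cycle ends on (0.0 when the cycle closes exactly).
    """
    idx = start
    for t in range(len(frames) - 2, -1, -1):  # backward in time
        idx = nearest(frames[t + 1][idx], frames[t])
    for t in range(1, len(frames)):  # forward in time
        idx = nearest(frames[t - 1][idx], frames[t])
    a, b = frames[-1][start], frames[-1][idx]
    return sum((x - y) ** 2 for x, y in zip(a, b))


frames = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[1.1, 1.0], [0.1, 0.0]],
    [[0.0, 0.1], [1.0, 1.1]],
]
print(cycle_error(frames, start=1))
```

A representation that keeps this error low without labels is one whose features stay stable for the same physical patch over time, which is the supervisory signal the abstract refers to.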

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07593

PDF

http://arxiv.org/pdf/1903.07593
Read All
A Multilingual Encoding Method for Text Classification and Dialect Identification Using Convolutional Neural Network

2019-03-18

Amr Adel Helmy

arXiv_CL

arXiv_CL Text_Classification CNN Classification Language_Model
Abstract

This thesis presents a language-independent text classification model by introducing two new encoding methods, “BUNOW” and “BUNOC”, used for feeding raw text data into a new CNN spatial architecture with vertical and horizontal convolutional processes, instead of commonly used methods such as one-hot vectors or word representations (e.g., word2vec) with a temporal CNN architecture. The proposed model can be classified as a hybrid word-character model in its work methodology: it consumes less memory by using fewer neural network parameters, as in character-level representation, and provides much faster computation with fewer network layers, as in word-level representation. Promising results are achieved compared to state-of-the-art models on two benchmark datasets with different morphology, one for Arabic and one for English.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07588

PDF

http://arxiv.org/pdf/1903.07588
Read All
Human Activity Recognition for Edge Devices

2019-03-18

Manjot Bilkhu, Hammababdullah Ayyubi

arXiv_CV

arXiv_CV Recognition
Abstract

Video activity recognition has recently gained a lot of momentum with the release of the massive Kinetics (400 and 600) data. Architectures such as I3D and C3D networks have shown state-of-the-art performance for activity recognition. The one major pitfall of these state-of-the-art networks is that they require a lot of compute. In this paper we explore how we can achieve results comparable to these state-of-the-art networks on edge devices. We primarily explore two architectures: I3D and Temporal Segment Network. We show that comparable results can be achieved using one tenth of the memory by changing the testing procedure. We also report our results with a ResNet architecture as our backbone, apart from the original Inception architecture. Specifically, we achieve 84.54% top-1 accuracy on the UCF-101 dataset using only RGB frames.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07563

PDF

http://arxiv.org/pdf/1903.07563
Read All
SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations

2019-03-18

Shuaifeng Zhi, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison

arXiv_CV

arXiv_CV Face Relation
Abstract

Systems which incrementally create 3D semantic maps from image sequences must store and update representations of both geometry and semantic entities. However, while there has been much work on the correct formulation for geometrical estimation, state-of-the-art systems usually rely on simple semantic representations which store and update independent label estimates for each surface element (depth pixels, surfels, or voxels). Spatial correlation is discarded, and fused label maps are incoherent and noisy. We introduce a new compact and optimisable semantic representation by training a variational auto-encoder that is conditioned on a colour image. Using this learned latent space, we can tackle semantic label fusion by jointly optimising the low-dimensional codes associated with each of a set of overlapping images, producing consistent fused label maps which preserve spatial correlation. We also show how this approach can be used within a monocular keyframe based semantic mapping system where a similar code approach is used for geometry. The probabilistic formulation allows a flexible formulation where we can jointly estimate motion, geometry and semantics in a unified optimisation.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1903.06482

PDF

https://arxiv.org/pdf/1903.06482
Read All
A Vocoder Based Method For Singing Voice Extraction

2019-03-18

Pritish Chandna, Merlijn Blaauw, Jordi Bonada, Emilia Gomez

arXiv_SD

arXiv_SD CNN Deep_Learning
Abstract

This paper presents a novel method for extracting the vocal track from a musical mixture. The musical mixture consists of a singing voice and a backing track which may comprise various instruments. We use a convolutional network with skip and residual connections as well as dilated convolutions to estimate vocoder parameters, given the spectrogram of an input mixture. The estimated parameters are then used to synthesize the vocal track, without any interference from the backing track. We evaluate our system through objective metrics pertinent to audio quality and interference from background sources, and via a comparative subjective evaluation. We use open-source source separation systems based on Non-negative Matrix Factorization (NMF) and Deep Learning methods as benchmarks for our system and discuss future applications for this particular algorithm.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07554

PDF

http://arxiv.org/pdf/1903.07554
Read All
Visual Cue Integration for Small Target Motion Detection in Natural Cluttered Backgrounds

2019-03-18

Hongxin Wang, Jigen Peng, Qinbing Fu, Huatian Wang, Shigang Yue

arXiv_CV

arXiv_CV Tracking Detection
Abstract

The robust detection of small targets against cluttered backgrounds is important for future artificial visual systems in searching and tracking applications. The insects’ visual systems have demonstrated excellent ability to avoid predators, find prey or identify conspecifics, which always appear as small dim speckles in the visual field. Building a computational model of the insects’ visual pathways could provide effective solutions to detect small moving targets. Although a few visual system models have been proposed, they only make use of small-field visual features for motion detection and their detection results often contain a number of false positives. To address this issue, we develop a new visual system model for small target motion detection against cluttered moving backgrounds. Compared to the existing models, the small-field and wide-field visual features are separately extracted by two motion-sensitive neurons to detect small target motion and background motion. These two types of motion information are further integrated to filter out false positives. Extensive experiments showed that the proposed model can outperform the existing models in terms of detection rates.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07546

PDF

http://arxiv.org/pdf/1903.07546
Read All
InGaN/GaN μLED SPICE modelling with size dependent ABC model integration

2019-03-18

Anis Daami, François Olivier

arXiv_CV

arXiv_CV GAN
Abstract

The need for high brightness micro-displays in portable applications dedicated to mixed and/or virtual reality has drawn an important research wave on InGaN/GaN based micro-sized light emitting diodes (μLEDs). We propose to use a SPICE modelling technique to describe and simulate the electro-optical behavior of the μLED. A sub-circuit portrayal of the whole device will be used to describe current-voltage behavior and the optical power performance of the device based on the ABC model. We suggest an innovative method to derive instantaneously the carrier concentration from the simulated electrical current in order to determine the μLED quantum efficiency. In a second step, a statistical approach is also added into the SPICE model in order to apprehend the spread on experimental data. This μLED SPICE modelling approach is very important to allow the design of robust pixel driving circuits.
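
For readers unfamiliar with the ABC model, the sketch below shows its standard textbook form: the recombination rate is A·n + B·n² + C·n³, the internal quantum efficiency is the radiative share B·n² of that total, and the carrier density n follows from the current density by solving J = q·d·(A·n + B·n² + C·n³). It assumes perfect injection efficiency and uses illustrative coefficient values. It does not reproduce the paper's size-dependent model or its SPICE sub-circuit.

```python
import math

Q = 1.602176634e-19  # elementary charge, in coulombs


def recombination_rate(n, a, b, c):
    """Total ABC recombination rate per unit volume: A*n + B*n^2 + C*n^3."""
    return a * n + b * n ** 2 + c * n ** 3


def iqe(n, a, b, c):
    """Internal quantum efficiency: radiative share of total recombination."""
    return b * n ** 2 / recombination_rate(n, a, b, c)


def carrier_density(j, thickness, a, b, c, lo=1.0, hi=1e30):
    """Solve j = Q * thickness * R(n) for n by bisection on a log scale."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if Q * thickness * recombination_rate(mid, a, b, c) < j:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Illustrative values only (assumed, not taken from the paper).
A, B, C = 1e7, 1e-11, 1e-30  # s^-1, cm^3/s, cm^6/s
D = 3e-7                     # active-region thickness in cm

n = carrier_density(10.0, D, A, B, C)  # current density of 10 A/cm^2
print(f"n = {n:.3e} cm^-3, IQE = {iqe(n, A, B, C):.3f}")
```

In this form the efficiency peaks at n = sqrt(A/C) and droops at higher injection, which is the baseline behavior that size-dependent coefficients would then modify.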

Abstract (translated by Google)

URL

https://arxiv.org/abs/1903.07538

PDF

https://arxiv.org/pdf/1903.07538
Read All
Green InGaN/GaN LEDs: High luminance and blue shift

2019-03-18

Anis Daami, François Olivier, Ludovic Dupré, Christophe Licitra, Franck Henry, François Templier, Stéphanie Le Calvez

arXiv_CV

arXiv_CV GAN Prediction
Abstract

We report in this paper electro-optical results on InGaN/GaN based green micro light-emitting diodes (μLEDs). Current light-voltage measurements reveal that the external quantum efficiency (EQE) behavior versus charge injection does not follow the ABC model prediction. Light-emission homogeneity investigation, carried out by photoluminescence mapping, shows that the Quantum-Confined Stark Effect (QCSE) is less significant at the edges of μLEDs. Electroluminescence shows a subsequent color green-to-blue deviation at high carrier injection levels. The extracted spectra at different current injection levels tend to show the appearance of discrete wavelength emissions. These observations may enhance the hypothesis that higher-energy excited-levels in InGaN quantum wells may also contribute to the blue shift, solely attributed to QCSE lessening under intense electric field magnitudes. We hereby present first results dealing with green μLEDs electro-optical performances with regard to their size.

Abstract (translated by Google)

URL

https://arxiv.org/abs/1903.07535

PDF

https://arxiv.org/pdf/1903.07535
Read All
LYRICS: a General Interface Layer to Integrate AI and Deep Learning

2019-03-18

Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Marco Gori

arXiv_AI

arXiv_AI Knowledge Face Optimization Inference Deep_Learning
Abstract

In spite of the amazing results obtained by deep learning in many applications, a real intelligent behavior of an agent acting in a complex environment is likely to require some kind of higher-level symbolic inference. Therefore, there is a clear need for the definition of a general and tight integration between low-level tasks, processing sensorial data that can be effectively elaborated using deep learning techniques, and the logic reasoning that allows humans to take decisions in complex environments. This paper presents LYRICS, a generic interface layer for AI, which is implemented in TensorFlow (TF). LYRICS provides an input language that allows one to define arbitrary First Order Logic (FOL) background knowledge. The predicates and functions of the FOL knowledge can be bound to any TF computational graph, and the formulas are converted into a set of real-valued constraints, which participate in the overall optimization problem. This allows one to learn the weights of the learners under the constraints imposed by the prior knowledge. The framework is extremely general as it imposes no restrictions in terms of which models or knowledge can be integrated. In this paper, we show the generality of the approach through some use cases of the presented language, including generative models, logic reasoning, model checking and supervised learning.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07534

PDF

http://arxiv.org/pdf/1903.07534
Read All
PZnet: Efficient 3D ConvNet Inference on Manycore CPUs

2019-03-18

Sergiy Popovych, Davit Buniatyan, Aleksandar Zlateski, Kai Li, H. Sebastian Seung

arXiv_CV

arXiv_CV CNN Inference
Abstract

Convolutional nets have been shown to achieve state-of-the-art accuracy in many biomedical image analysis tasks. Many tasks within the biomedical analysis domain involve analyzing volumetric (3D) data acquired by CT, MRI and microscopy acquisition methods. To deploy convolutional nets in practical working systems, it is important to solve the efficient inference problem. Namely, one should be able to apply an already-trained convolutional network to many large images using limited computational resources. In this paper we present PZnet, a CPU-only engine that can be used to perform inference for a variety of 3D convolutional net architectures. PZnet outperforms MKL-based CPU implementations of PyTorch and TensorFlow by more than 3.5x for the popular U-net architecture. Moreover, for 3D convolutions with low featuremap numbers, cloud CPU inference with PZnet outperforms cloud GPU inference in terms of cost efficiency.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07525

PDF

http://arxiv.org/pdf/1903.07525
Read All
Boosted Attention: Leveraging Human Attention for Image Captioning

2019-03-18

Shi Chen, Qi Zhao

arXiv_CV

arXiv_CV Image_Caption Attention Caption
Abstract

Visual attention has shown usefulness in image captioning, with the goal of enabling a caption model to selectively focus on regions of interest. Existing models typically rely on top-down language information and learn attention implicitly by optimizing the captioning objectives. While somewhat effective, the learned top-down attention can fail to focus on correct regions of interest without direct supervision of attention. Inspired by the human visual system which is driven by not only the task-specific top-down signals but also the visual stimuli, we in this work propose to use both types of attention for image captioning. In particular, we highlight the complementary nature of the two types of attention and develop a model (Boosted Attention) to integrate them for image captioning. We validate the proposed approach with state-of-the-art performance across various evaluation metrics.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.00767

PDF

http://arxiv.org/pdf/1904.00767
Read All
EV-IMO: Motion Segmentation Dataset and Learning Pipeline for Event Cameras

2019-03-18

Anton Mitrokhin, Chengxi Ye, Cornelia Fermuller, Yiannis Aloimonos, Tobi Delbruck

arXiv_CV

arXiv_CV Segmentation
Abstract

We present the first event-based learning approach for motion segmentation in indoor scenes and the first event-based dataset - EV-IMO - which includes accurate pixel-wise motion masks, egomotion and ground truth depth. Our approach is based on an efficient implementation of the SfM learning pipeline using a low parameter neural network architecture on event data. In addition to camera egomotion and a dense depth map, the network estimates pixel-wise independently moving object segmentation and computes per-object 3D translational velocities for moving objects. We also train a shallow network with just 40k parameters, which is able to compute depth and egomotion. Our EV-IMO dataset features 32 minutes of indoor recording with up to 3 fast moving objects simultaneously in the camera field of view. The objects and the camera are tracked by the VICON motion capture system. By 3D scanning the room and the objects, accurate depth map ground truth and pixel-wise object masks are obtained, which are reliable even in poor lighting conditions and during fast motion. We then train and evaluate our learning pipeline on EV-IMO and demonstrate that our approach far surpasses its rivals and is well suited for scene constrained robotics applications.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.07520

PDF

http://arxiv.org/pdf/1903.07520
Read All
An Effective Label Noise Model for DNN Text Classification

2019-03-18

Ishan Jindal, Daniel Pressel, Brian Lester, Matthew Nokleby

arXiv_CL

arXiv_CL Regularization Attention Text_Classification CNN Image_Classification Classification
Abstract

Because large, human-annotated datasets suffer from labeling errors, it is crucial to be able to train deep neural networks in the presence of label noise. While training image classification models with label noise has received much attention, training text classification models has not. In this paper, we propose an approach to training deep networks that is robust to label noise. This approach introduces a non-linear processing layer (noise model) that models the statistics of the label noise into a convolutional neural network (CNN) architecture. The noise model and the CNN weights are learned jointly from noisy training data, which prevents the model from overfitting to erroneous labels. Through extensive experiments on several text classification datasets, we show that this approach enables the CNN to learn better sentence representations and is robust even to extreme label noise. We find that proper initialization and regularization of this noise model are critical. Further, in contrast to results focusing on large batch sizes for mitigating label noise in image classification, we find that altering the batch size does not have much effect on classification performance.
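
The idea of a noise model stacked on top of a classifier can be illustrated with a minimal sketch. It assumes the common formulation in which a row-stochastic transition matrix maps the classifier's clean-label probabilities to noisy-label probabilities; the abstract does not confirm that the paper's layer takes this form. The names `softmax`, `apply_noise_model`, and `noisy_label_nll` are hypothetical, and the matrix is fixed here rather than learned jointly with the network.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def apply_noise_model(clean_probs, transition):
    # Assumption: transition[i][j] = P(observed label j | true label i),
    # with each row summing to 1.
    n = len(clean_probs)
    return [
        sum(clean_probs[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]

def noisy_label_nll(logits, noisy_label, transition):
    # Negative log-likelihood of the observed (possibly wrong) label
    # under the noise-adjusted distribution.
    noisy_probs = apply_noise_model(softmax(logits), transition)
    return -math.log(noisy_probs[noisy_label])

# Two classes; class 0 is mislabeled 20% of the time, class 1 30%.
transition = [[0.8, 0.2], [0.3, 0.7]]
loss = noisy_label_nll([2.0, 0.0], noisy_label=1, transition=transition)
```

Training against the noise-adjusted distribution means an erroneous label is penalized less when the transition matrix already explains it, which is one way a jointly learned noise layer could reduce overfitting to bad labels.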

URL

http://arxiv.org/abs/1903.07507

PDF

http://arxiv.org/pdf/1903.07507
Understanding the Limitations of CNN-based Absolute Camera Pose Regression

2019-03-18

Torsten Sattler, Qunjie Zhou, Marc Pollefeys, Laura Leal-Taixe

arXiv_CV

arXiv_CV Image_Retrieval Pose_Estimation CNN Prediction SLAM
Abstract

Visual localization is the task of accurate camera pose estimation in a known scene. It is a key problem in computer vision and robotics, with applications including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality. Traditionally, the localization problem has been tackled using 3D geometry. Recently, end-to-end approaches based on convolutional neural networks have become popular. These methods learn to directly regress the camera pose from an input image. However, they do not achieve the same level of pose accuracy as 3D structure-based methods. To understand this behavior, we develop a theoretical model for camera pose regression. We use our model to predict failure cases for pose regression techniques and verify our predictions through experiments. We furthermore use our model to show that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure. A key result is that current approaches do not consistently outperform a handcrafted image retrieval baseline. This clearly shows that additional research is needed before pose regression algorithms are ready to compete with structure-based methods.

URL

http://arxiv.org/abs/1903.07504

PDF

http://arxiv.org/pdf/1903.07504
Bilinear Representation for Language-based Image Editing Using Conditional Generative Adversarial Networks

2019-03-18

Xiaofeng Mao, Yuefeng Chen, Yuhong Li, Tao Xiong, Yuan He, Hui Xue

arXiv_CV

arXiv_CV Adversarial GAN Quantitative Relation
Abstract

Language-Based Image Editing (LBIE) aims to generate a target image by editing a source image according to a given language description. The main challenge of LBIE is to disentangle the semantics in the image and the text and then combine them to generate realistic images. The editing performance therefore depends heavily on the learned representation. In this work, a conditional generative adversarial network (cGAN) is used for LBIE. We find that existing conditioning methods in cGANs lack representation power, as they cannot learn the second-order correlation between two conditioning vectors. To solve this problem, we propose an improved conditional layer named Bilinear Residual Layer (BRL) to learn more powerful representations for the LBIE task. Qualitative and quantitative comparisons demonstrate that our method generates higher-quality images than previous LBIE techniques.
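
To make "second-order correlation between two conditioning vectors" concrete, here is a minimal sketch that assumes a plain bilinear form with a residual connection. The function names, the layout of `weights`, and the residual wiring are illustrative assumptions; the abstract does not specify how the actual BRL is built.

```python
def bilinear(x, y, weights):
    # weights[k][i][j] couples x[i] with y[j] for output unit k, so every
    # output is a weighted sum of pairwise products x[i] * y[j].
    return [
        sum(x[i] * w[i][j] * y[j]
            for i in range(len(x))
            for j in range(len(y)))
        for w in weights
    ]

def bilinear_residual(x, y, weights):
    # Assumption: the output has the same length as x, so the bilinear
    # term can be added back onto x as a residual.
    return [xi + bi for xi, bi in zip(x, bilinear(x, y, weights))]

# Hypothetical image feature x and text feature y.
x = [1.0, 2.0]
y = [3.0, 4.0]
weights = [
    [[1.0, 0.0], [0.0, 0.0]],  # output 0 uses x[0] * y[0]
    [[0.0, 0.0], [0.0, 1.0]],  # output 1 uses x[1] * y[1]
]
fused = bilinear_residual(x, y, weights)
```

Unlike concatenation followed by a linear layer, which only sums terms in `x` and `y` separately, the product terms let the text feature rescale individual image features.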

URL

http://arxiv.org/abs/1903.07499

PDF

http://arxiv.org/pdf/1903.07499