Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

Aggregated Deep Local Features for Remote Sensing Image Retrieval

2019-03-22

Raffaele Imbriaco, Clint Sebastian, Egor Bondarev, Peter H.N. de With

arXiv_CV

arXiv_CV Image_Retrieval Attention CNN
Abstract

Remote Sensing Image Retrieval remains a challenging topic due to the special nature of Remote Sensing Imagery. Such images contain various different semantic objects, which clearly complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that even without training, the local convolutional features and global representation outperform other systems. After system tuning, we can achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top-three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g. 50% faster than the current systems.
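
The VLAD aggregation step mentioned in the abstract can be sketched as follows. This is a minimal illustration of plain VLAD only: it assumes hard nearest-centroid assignment and a final L2 normalisation, and it omits the attention weighting and dimensionality reduction the paper studies. The centroids and descriptors are made-up toy values, not data from the paper.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD vector."""
    k, d = len(centroids), len(centroids[0])
    residual_sums = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual between the descriptor and that centroid.
        for i in range(d):
            residual_sums[nearest][i] += x[i] - centroids[nearest][i]
    # Concatenate the per-centroid residual sums and L2-normalise the result.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Hypothetical toy codebook (2 centroids) and 3 local descriptors in 2-D.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
global_descriptor = vlad(descriptors, centroids)
```

The resulting global descriptor has length k × d (here 4), which is why descriptor dimensionality is one of the parameters worth tuning.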

URL

http://arxiv.org/abs/1903.09469

PDF

http://arxiv.org/pdf/1903.09469
Factorised Representation Learning in Cardiac Image Analysis

2019-03-22

Agisilaos Chartsias, Thomas Joyce, Giorgos Papanastasiou, Michelle Williams, David Newby, Rohan Dharmakumar, Sotirios A. Tsaftaris

arXiv_CV

arXiv_CV Segmentation Represenation_Learning
Abstract

Typically, a medical image offers spatial information on the anatomy (and pathology) modulated by imaging specific characteristics. Many imaging modalities including Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) can be interpreted in this way. We can venture further and consider that a medical image naturally factors into some spatial factors depicting anatomy and factors that denote the imaging characteristics. Here, we explicitly learn this decomposed (factorised) representation of imaging data, focusing in particular on cardiac images. We propose Spatial Decomposition Network (SDNet), which factorises 2D medical images into spatial anatomical factors and non-spatial imaging factors. We demonstrate that this high-level representation is ideally suited for several medical image analysis tasks, such as semi-supervised segmentation, multi-task segmentation and regression, and image-to-image synthesis. Specifically, we show that our model can match the performance of fully supervised segmentation models, using only a fraction of the labelled images. Critically, we show that our factorised representation also benefits from supervision obtained either when we use auxiliary tasks to train the model in a multi-task setting (e.g. regressing to known cardiac indices), or when aggregating multimodal data from different sources (e.g. pooling together MRI and CT data). To explore the properties of the learned factorisation, we perform latent-space arithmetic and show that we can synthesise CT from MR and vice versa, by swapping the modality factors. We also demonstrate that the factor holding image specific information can be used to predict the input modality with high accuracy.

URL

http://arxiv.org/abs/1903.09467

PDF

http://arxiv.org/pdf/1903.09467
Data Augmentation via Dependency Tree Morphing for Low-Resource Languages

2019-03-22

Gözde Gül Şahin, Mark Steedman

arXiv_CL

arXiv_CL
Abstract

Neural NLP systems achieve high scores in the presence of sizable training datasets. Lack of such datasets leads to poor system performance in the case of low-resource languages. We present two simple text augmentation techniques using dependency trees, inspired by image processing. We crop sentences by removing dependency links, and we rotate sentences by moving the tree fragments around the root. We apply these techniques to augment the training sets of low-resource languages in the Universal Dependencies project. We implement a character-level sequence tagging model and evaluate the augmented datasets on the part-of-speech tagging task. We show that crop and rotate provide improvements over the models trained with non-augmented data for the majority of the languages, especially for languages with rich case marking systems.
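
A rough sketch of what "crop" and "rotate" could look like on a toy dependency tree is given below. It is a simplification under stated assumptions, not the authors' implementation: each token is a hypothetical `(form, head, deprel)` triple with 1-based heads (0 marks the root), every child of the root is assumed to carry a distinct relation label, and the choice of which fragments to keep or move is left to the caller.

```python
def subtree(heads, node):
    """Return the sorted 1-based indices of `node` and all of its descendants."""
    nodes = [node]
    for i, head in enumerate(heads, start=1):
        if head == node:
            nodes.extend(subtree(heads, i))
    return sorted(nodes)


def root_fragments(tokens):
    """Find the root and map each root-child relation to its subtree indices."""
    heads = [head for _, head, _ in tokens]
    root = heads.index(0) + 1
    frags = {}
    for i, (_, head, rel) in enumerate(tokens, start=1):
        if head == root:
            frags[rel] = subtree(heads, i)
    return root, frags


def crop(tokens, keep_rel):
    """Keep only the root and the fragment attached with relation `keep_rel`."""
    root, frags = root_fragments(tokens)
    kept = sorted(frags[keep_rel] + [root])
    return [tokens[i - 1][0] for i in kept]


def rotate(tokens, new_order):
    """Re-order the root's fragments; the root keeps its slot among them."""
    root, frags = root_fragments(tokens)
    n_before = sum(1 for idxs in frags.values() if idxs[0] < root)
    units = new_order[:n_before] + [None] + new_order[n_before:]  # None marks the root
    out = []
    for unit in units:
        idxs = [root] if unit is None else frags[unit]
        out.extend(tokens[i - 1][0] for i in idxs)
    return out


# Toy sentence: "She gave him a book" (hypothetical annotation).
sentence = [
    ("She", 2, "nsubj"),
    ("gave", 0, "root"),
    ("him", 2, "iobj"),
    ("a", 5, "det"),
    ("book", 2, "obj"),
]
cropped = crop(sentence, "obj")                       # ['gave', 'a', 'book']
rotated = rotate(sentence, ["obj", "nsubj", "iobj"])  # ['a', 'book', 'gave', 'She', 'him']
```

The rotated English sentence is ungrammatical, which is consistent with the abstract's observation that the gains are largest for languages with rich case marking, where word order is freer.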

URL

http://arxiv.org/abs/1903.09460

PDF

http://arxiv.org/pdf/1903.09460
LINSPECTOR: Multilingual Probing Tasks for Word Representations

2019-03-22

Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych

arXiv_CL

arXiv_CL Embedding Inference Classification Relation
Abstract

Despite an ever growing number of word representation models introduced for a large number of languages, there is a lack of a standardized technique to provide insights into what is captured by these models. Such insights would help the community to get an estimate of the downstream task performance, as well as to design more informed neural architectures, while avoiding extensive experimentation which requires substantial computational resources not all researchers have access to. A recent development in NLP is to use simple classification tasks, also called probing tasks, that test for a single linguistic feature such as part-of-speech. Existing studies mostly focus on exploring the information encoded by the sentence-level representations for English. However, from a typological perspective the morphologically poor English is rather an outlier: the information encoded by the word order and function words in English is often stored on a subword, morphological level in other languages. To address this, we introduce 15 word-level probing tasks such as case marking, possession, word length, morphological tag count and pseudoword identification for 24 languages. We present experiments on several state of the art word embedding models, in which we relate the probing task performance for a diverse set of languages to a range of classic NLP tasks such as semantic role labeling and natural language inference. We find that a number of probing tests have significantly high positive correlation to the downstream tasks, especially for morphologically rich languages. We show that our tests can be used to explore word embeddings or black-box neural models for linguistic cues in a multilingual setting. We release the probing datasets and the evaluation suite with https://github.com/UKPLab/linspector.

URL

http://arxiv.org/abs/1903.09442

PDF

http://arxiv.org/pdf/1903.09442
An end-to-end Neural Network Framework for Text Clustering

2019-03-22

Jie Zhou, Xingyi Cheng, Jinchao Zhang

arXiv_CL

arXiv_CL Sentiment Review Sentiment_Classification Represenation_Learning Optimization Classification
Abstract

Unsupervised text clustering is one of the major tasks in natural language processing (NLP) and remains a difficult and complex problem. Conventional methods generally treat this task using separate steps, including text representation learning and clustering the representations. As an improvement, neural methods have also been introduced for continuous representation learning to address the sparsity problem. However, the multi-step process still deviates from the unified optimization target. In particular, the second step, clustering, is generally performed with conventional methods such as k-Means. We propose a pure neural framework for text clustering in an end-to-end manner. It jointly learns the text representation and the clustering model. Our model works well when the context can be obtained, which is nearly always the case in the field of NLP. We evaluate our method on two widely used benchmarks: IMDB movie reviews for sentiment classification and 20-Newsgroup for topic categorization. Despite its simplicity, experiments show the model outperforms previous clustering methods by a large margin. Furthermore, the model is also verified on an English wiki dataset as a large corpus.

URL

http://arxiv.org/abs/1903.09424

PDF

http://arxiv.org/pdf/1903.09424
Artificial intelligence-based process for metal scrap sorting

2019-03-22

Maximilian Auer, Kai Osswald, Raphael Volz, Joerg Woidasky

arXiv_AI

arXiv_AI
Abstract

Machine learning offers remarkable benefits for improving workplaces and working conditions, among others in the recycling industry. Here, for example, hand-sorting of medium-value scrap is labor intensive and requires experienced and skilled workers. On the one hand, they have to concentrate intensely to make proper readings and analyses of the material, but on the other hand, this work is monotonous. Therefore, a machine learning approach is proposed for a quick and reliable automated identification of alloys in the recycling industry, while the mere scrap handling is meant to remain in the hands of the workers. To this end, a set of twelve tool and high-speed steels from the field were selected to be identified by their spectrum induced by electric arcs. For data acquisition, the optical emission spectrometer Thorlabs CCS 100 was used. Spectra were post-processed to be fed into the supervised machine learning algorithm. The development of the machine learning software was conducted according to the steps of the VDI 2221 standard method. For programming, Python 3 as well as the Python library sklearn were used. By systematic parameter variation, the appropriate machine learning algorithm was selected and validated. Subsequent validation steps showed that the automated identification process using a machine learning approach and optical emission spectrometry is applicable, reaching a maximum F1 score of 96.9%. This performance is as good as the performance of a highly trained worker using visual grinding spark identification. The tests were based on a self-generated set of 600 spectra per single alloy (7,200 spectra in total) which were produced using an industry workshop device.
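
For readers unfamiliar with the F1 score quoted above, here is a small worked sketch of how it can be computed for a multi-class problem. The abstract does not say which averaging the authors used, so the macro average shown here is an assumption, and the alloy labels are invented for illustration.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each label to its F1 score, 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (one possible averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical alloy labels: one "steel_A" spectrum is misread as "steel_B".
y_true = ["steel_A", "steel_A", "steel_B", "steel_B", "steel_C", "steel_C"]
y_pred = ["steel_A", "steel_B", "steel_B", "steel_B", "steel_C", "steel_C"]
per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

In this toy case the single confusion lowers the F1 of both classes involved (2/3 for `steel_A`, 4/5 for `steel_B`), while `steel_C` stays at 1.0.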

URL

http://arxiv.org/abs/1903.09415

PDF

http://arxiv.org/pdf/1903.09415
Fast Bayesian Uncertainty Estimation of Batch Normalized Single Image Super-Resolution Network

2019-03-22

Aupendu Kar, Prabir Kumar Biswas

arXiv_CV

arXiv_CV Super_Resolution CNN
Abstract

In recent years, deep convolutional neural networks (CNNs) have achieved unprecedented success in the image super-resolution (SR) task. However, due to the black-box nature of neural networks and their lack of transparency, it is hard to trust the outcome. In this regard, we introduce a Bayesian approach for uncertainty estimation in super-resolution networks. We generate Monte Carlo (MC) samples from a posterior distribution by using batch mean and variance as stochastic parameters in the batch-normalization layer during test time. Those MC samples not only reconstruct the image from its low-resolution counterpart but also provide a confidence map of the reconstruction, which will be very impactful for practical use. We also introduce a faster approach for estimating the uncertainty, which can be useful for real-time applications. We validate our results using standard datasets for performance analysis and also for different domain-specific super-resolution tasks. We also estimate uncertainty quality using standard statistical metrics and provide a qualitative evaluation of uncertainty for SR applications.
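
The idea of treating batch statistics as stochastic at test time can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the "network" is a single scalar batch-normalization layer, the training values are synthetic, and the function names are hypothetical. It is not the super-resolution model or the faster estimator from the paper.

```python
import random
import statistics


def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Standard batch-normalization transform for a single value."""
    return gamma * (x - mean) / (var + eps) ** 0.5 + beta


def mc_batchnorm_predict(inputs, train_values, n_samples=50, batch_size=8, seed=0):
    """Run several stochastic passes, each with statistics from a random training batch."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        batch = rng.sample(train_values, batch_size)
        mean = statistics.fmean(batch)
        var = statistics.pvariance(batch)
        outputs.append([batch_norm(v, mean, var) for v in inputs])
    # Mean over passes is the prediction; variance over passes is the uncertainty.
    columns = [[out[i] for out in outputs] for i in range(len(inputs))]
    prediction = [statistics.fmean(col) for col in columns]
    uncertainty = [statistics.pvariance(col) for col in columns]
    return prediction, uncertainty


# Synthetic "training activations" and three test inputs (toy values).
train_values = [i / 10 for i in range(100)]
prediction, uncertainty = mc_batchnorm_predict([0.0, 5.0, 9.0], train_values)
```

In a real image model the same procedure would yield one variance value per output pixel, which is the kind of confidence map the abstract describes.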

URL

http://arxiv.org/abs/1903.09410

PDF

http://arxiv.org/pdf/1903.09410
Effect of Ge-doping on the short-wave, mid- and far-infrared intersubband transitions in GaN/AlGaN heterostructures

2019-03-22

Caroline B. Lim, Akhil Ajay, Jonas Lähnemann, Catherine Bougerol, Eva Monroy

arXiv_CV

arXiv_CV GAN
Abstract

This paper assesses the effects of Ge-doping on the structural and optical (band-to-band and intersubband (ISB)) properties of GaN/AlGaN multi-quantum wells (QWs) designed to display ISB absorption in the short-wave, mid- and far-infrared ranges (SWIR, MIR, and FIR, respectively). The standard c-plane crystallographic orientation is considered for wells absorbing in the SWIR and MIR spectral regions, whereas the FIR structures are grown along the nonpolar m-axis. In all cases, we compare the characteristics of Ge-doped and Si-doped samples with the same design and various doping levels. The use of Ge appears to improve the mosaicity of the highly lattice-mismatched GaN/AlN heterostructures. However, when reducing the lattice mismatch, the mosaicity is rather determined by the substrate and does not show any dependence on the dopant nature or concentration. From the optical point of view, by increasing the dopant density, we observe a blueshift of the photoluminescence in polar samples due to the screening of the internal electric field by free carriers. In the ISB absorption, on the other hand, there is a systematic improvement of the linewidth when using Ge as a dopant for high doping levels, whatever the spectral region under consideration (i.e. different QW size, barrier composition and crystallographic orientation).

URL

https://arxiv.org/abs/1903.09375

PDF

https://arxiv.org/pdf/1903.09375
Deep Hierarchical Reinforcement Learning Based Recommendations via Multi-goals Abstraction

2019-03-22

Dongyang Zhao, Liang Zhang, Bo Zhang, Lizhou Zheng, Yongjun Bao, Weipeng Yan

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning Recommendation
Abstract

The recommender system is an important form of intelligent application, which helps users alleviate information redundancy. Among the metrics used to evaluate a recommender system, the metric of conversion has become more and more important. The majority of existing recommender systems perform poorly on the metric of conversion due to its extremely sparse feedback signal. To tackle this challenge, we propose a deep hierarchical reinforcement learning based recommendation framework, which consists of two components, i.e., a high-level agent and a low-level agent. The high-level agent catches long-term sparse conversion signals, and automatically sets abstract goals for the low-level agent, while the low-level agent follows the abstract goals and interacts with the real-time environment. To solve the inherent problem in hierarchical reinforcement learning, we propose a novel deep hierarchical reinforcement learning algorithm via multi-goals abstraction (HRL-MG). Our proposed algorithm has three characteristics: 1) the high-level agent generates multiple goals to guide the low-level agent in different stages, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state encoder parameters, which increases the update frequency of the high-level agent and thus accelerates the convergence of our proposed algorithm; 3) an appropriate benefit assignment function is designed to allocate rewards in each goal so as to coordinate different goals in a consistent direction. We evaluate our proposed algorithm based on a real-world e-commerce dataset and validate its effectiveness.

URL

http://arxiv.org/abs/1903.09374

PDF

http://arxiv.org/pdf/1903.09374
Few-shot Adaptive Faster R-CNN

2019-03-22

Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng

arXiv_CV

arXiv_CV Regularization Object_Detection Classification Detection
Abstract

To mitigate the detection performance drop caused by domain shift, we aim to develop a novel few-shot adaptation approach that requires only a few target domain images with limited bounding box annotations. To this end, we first observe several significant challenges. First, the target domain data is highly insufficient, making most existing domain adaptation methods ineffective. Second, object detection involves simultaneous localization and classification, further complicating the model adaptation process. Third, the model suffers from over-adaptation (similar to overfitting when training with a few data examples) and instability risk that may lead to degraded detection performance in the target domain. To address these challenges, we first introduce a pairing mechanism over source and target features to alleviate the issue of insufficient target domain samples. We then propose a bi-level module to adapt the source-trained detector to the target domain: 1) the split pooling based image level adaptation module uniformly extracts and aligns paired local patch features over locations, with different scales and aspect ratios; 2) the instance level adaptation module semantically aligns paired object features while avoiding inter-class confusion. Meanwhile, a source model feature regularization (SMFR) is applied to stabilize the adaptation process of the two modules. Combining these contributions gives a novel few-shot adaptive Faster-RCNN framework, termed FAFRCNN, which effectively adapts to the target domain with a few labeled samples. Experiments with multiple datasets show that our model achieves new state-of-the-art performance under both the few-shot domain adaptation (FDA) setting of interest and the unsupervised domain adaptation (UDA) setting.

URL

http://arxiv.org/abs/1903.09372

PDF

http://arxiv.org/pdf/1903.09372
Macro Action Reinforcement Learning with Sequence Disentanglement using Variational Autoencoder

2019-03-22

Kim Heecheol, Masanori Yamada, Kosuke Miyoshi, Hiroshi Yamakawa

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

One problem in the application of reinforcement learning to real-world problems is the curse of dimensionality on the action space. Macro actions, a sequence of primitive actions, have been studied to diminish the dimensionality of the action space with regard to the time axis. However, previous studies relied on humans defining macro actions or assumed macro actions as repetitions of the same primitive actions. We present Factorized Macro Action Reinforcement Learning (FaMARL) which autonomously learns disentangled factor representation of a sequence of actions to generate macro actions that can be directly applied to general reinforcement learning algorithms. FaMARL exhibits higher scores than other reinforcement learning algorithms on environments that require an extensive amount of search.

URL

http://arxiv.org/abs/1903.09366

PDF

http://arxiv.org/pdf/1903.09366
Joint 3D Face Reconstruction and Dense Face Alignment from A Single Image with 2D-Assisted Self-Supervised Learning

2019-03-22

Xiaoguang Tu, Jian Zhao, Zihang Jiang, Yao Luo, Mei Xie, Yang Zhao, Linxiao He, Zheng Ma, Jiashi Feng

arXiv_CV

arXiv_CV Sparse Face Prediction
Abstract

3D face reconstruction from a single 2D image is a challenging problem with broad applications. Recent methods typically aim to learn a CNN-based 3D face model that regresses coefficients of a 3D Morphable Model (3DMM) from 2D images to render 3D face reconstruction or dense face alignment. However, the shortage of training data with 3D annotations considerably limits the performance of those methods. To alleviate this issue, we propose a novel 2D-assisted self-supervised learning (2DASL) method that can effectively use “in-the-wild” 2D face images with noisy landmark information to substantially improve 3D face model learning. Specifically, taking the sparse 2D facial landmarks as additional information, 2DASL introduces four novel self-supervision schemes that view the 2D landmark and 3D landmark prediction as a self-mapping process, including the 2D and 3D landmark self-prediction consistency, cycle-consistency over the 2D landmark prediction and self-critic over the predicted 3DMM coefficients based on landmark predictions. Using these four self-supervision schemes, the 2DASL method significantly relieves demands on the conventional paired 2D-to-3D annotations and gives much higher-quality 3D face models without requiring any additional 3D annotations. Experiments on multiple challenging datasets show that our method outperforms state-of-the-art methods for both 3D face reconstruction and dense face alignment by a large margin.

URL

http://arxiv.org/abs/1903.09359

PDF

http://arxiv.org/pdf/1903.09359
A Model Counter's Guide to Probabilistic Systems

2019-03-22

Marcell Vazquez-Chanlatte, Markus N. Rabe, Sanjit A. Seshia

arXiv_AI

arXiv_AI Inference Relation
Abstract

In this paper, we systematize the modeling of probabilistic systems for the purpose of analyzing them with model counting techniques. Starting from unbiased coin flips, we show how to model biased coins, correlated coins, and distributions over finite sets. From there, we continue with modeling sequential systems, such as Markov chains, and revisit the relationship between weighted and unweighted model counting. Thereby, this work provides a conceptual framework for deriving #SAT encodings for probabilistic inference.
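
The step from unbiased to biased coins can be made concrete with a tiny counting example. The sketch below is an illustration under stated assumptions rather than the paper's encoding: it brute-forces all assignments instead of calling a #SAT solver, and it models a coin with bias 3/8 as a predicate over three fair coin flips.

```python
from itertools import product


def count_models(n_vars, formula):
    """Count the assignments of n_vars Boolean variables that satisfy `formula`."""
    return sum(1 for bits in product([False, True], repeat=n_vars) if formula(bits))


def biased_coin(bits):
    """Heads iff the integer encoded by three fair flips is below 3, so P(heads) = 3/8."""
    value = sum(int(b) << i for i, b in enumerate(bits))
    return value < 3


# Unweighted model count divided by the number of assignments gives the probability.
n_models = count_models(3, biased_coin)  # 3
prob_heads = n_models / 2 ** 3           # 0.375
```

Any bias of the form m / 2^n can be expressed this way with n fair flips, which is how a weighted question can be reduced to an unweighted count.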

URL

http://arxiv.org/abs/1903.09354

PDF

http://arxiv.org/pdf/1903.09354
Binary Space Partitioning Forests

2019-03-22

Xuhui Fan, Bin Li, Scott Anthony Sisson

arXiv_AI

arXiv_AI Inference Relation
Abstract

The Binary Space Partitioning (BSP)-Tree process is proposed to produce flexible 2-D partition structures which are originally used as a Bayesian nonparametric prior for relational modelling. It can hardly be applied to other learning tasks such as regression trees because extending the BSP-Tree process to a higher dimensional space is nontrivial. This paper is the first attempt to extend the BSP-Tree process to a d-dimensional (d>2) space. We propose to generate a cutting hyperplane, which is assumed to be parallel to d-2 dimensions, to cut each node in the d-dimensional BSP-tree. By designing a subtle strategy to sample two free dimensions from d dimensions, the extended BSP-Tree process can inherit the essential self-consistency property from the original version. Based on the extended BSP-Tree process, an ensemble model, which is named the BSP-Forest, is further developed for regression tasks. Thanks to the retained self-consistency property, we can thus significantly reduce the geometric calculations in the inference stage. Compared to its counterpart, the Mondrian Forest, the BSP-Forest can achieve similar performance with fewer cuts due to its flexibility. The BSP-Forest also outperforms other (Bayesian) regression forests on a number of real-world data sets.

URL

http://arxiv.org/abs/1903.09348

PDF

http://arxiv.org/pdf/1903.09348
Overcoming Small Minirhizotron Datasets Using Transfer Learning

2019-03-22

Weihuang Xu, Guohao Yu, Alina Zare, Brendan Zurweller, Diane Rowland, Joel Reyes-Cabrera, Felix B Fritschi, Roser Matamala, Thomas E. Juenger

arXiv_CV

arXiv_CV Segmentation Transfer_Learning
Abstract

Minirhizotron technology is widely used for studying the development of roots. Such systems collect visible-wavelength color imagery of plant roots in-situ by scanning an imaging system within a clear tube driven into the soil. Automated analysis of root systems could facilitate new scientific discoveries that would be critical to address the world’s pressing food, resource, and climate issues. A key component of automated analysis of plant roots from imagery is the automated pixel-level segmentation of roots from their surrounding soil. Supervised learning techniques appear to be an appropriate tool for the challenge due to varying local soil and root conditions; however, the lack of sufficient annotated training data is a major limitation due to the error-prone and time-consuming manual labeling process. In this paper, we investigate the use of deep neural networks based on the U-net architecture for automated, precise pixel-wise root segmentation in minirhizotron imagery. We compiled two minirhizotron image datasets to accomplish this study: one with 17,550 peanut root images and another with 28 switchgrass root images. Both datasets were paired with manually labeled ground truth masks. We trained three neural networks with different architectures on the larger peanut root dataset to explore the effect of the neural network depth on segmentation performance. To tackle the more limited switchgrass root dataset, we showed that models initialized with features pre-trained on the peanut dataset and then fine-tuned on the switchgrass dataset can improve segmentation performance significantly. We obtained 99% segmentation accuracy in switchgrass imagery using only 21 training images. We also observed that features pre-trained on a closely related but relatively moderate size dataset like our peanut dataset are more effective than features pre-trained on the large but unrelated ImageNet dataset.

URL

http://arxiv.org/abs/1903.09344

PDF

http://arxiv.org/pdf/1903.09344
The Binary Space Partitioning-Tree Process

2019-03-22

Xuhui Fan, Bin Li, Scott Anthony Sisson

arXiv_AI

arXiv_AI GAN Inference Relation
Abstract

The Mondrian process represents an elegant and powerful approach for space partition modelling. However, as it restricts the partitions to be axis-aligned, its modelling flexibility is limited. In this work, we propose a self-consistent Binary Space Partitioning (BSP)-Tree process to generalize the Mondrian process. The BSP-Tree process is an almost surely right continuous Markov jump process that allows uniformly distributed oblique cuts in a two-dimensional convex polygon. The BSP-Tree process can also be extended using a non-uniform probability measure to generate direction differentiated cuts. The process is also self-consistent, maintaining distributional invariance under a restricted subdomain. We use Conditional-Sequential Monte Carlo for inference using the tree structure as the high-dimensional variable. The BSP-Tree process’s performance on synthetic data partitioning and relational modelling demonstrates clear inferential improvements over the standard Mondrian process and other related methods.

URL

http://arxiv.org/abs/1903.09343

PDF

http://arxiv.org/pdf/1903.09343
Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

2019-03-22

Kazuki Shimada, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

arXiv_SD

arXiv_SD Speech_Recognition Recognition
Abstract

This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, the minimum variance distortionless response (MVDR) beamforming has been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take a supervised approach that classifies each time-frequency (TF) bin into noise or speech by training a deep neural network (DNN). The performance of ASR, however, is degraded in an unknown noisy environment. To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF). This enables us to accurately estimate the SCMs of speech and noise not from observed noisy mixtures but from separated speech and noise components. In this paper we propose online MVDR beamforming by effectively initializing and incrementally updating the parameters of MNMF. Another main contribution is to comprehensively investigate the performances of ASR obtained by various types of spatial filters, i.e., time-invariant and variant versions of MVDR beamformers and those of rank-1 and full-rank multichannel Wiener filters, in combination with MNMF. The experimental results showed that the proposed method outperformed the state-of-the-art DNN-based beamforming method in unknown environments that did not match training data.
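
Given a steering vector d and a noise SCM R, the standard MVDR weights are w = R⁻¹d / (dᴴR⁻¹d). The sketch below works through that formula for a single frequency bin with two microphones. The numeric values of `noise_scm` and `steering` are invented for illustration; in the paper these quantities are estimated with MNMF, which is not shown here.

```python
def inverse_2x2(m):
    """Invert a 2x2 (possibly complex) matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mvdr_weights(noise_scm, steering):
    """MVDR weights w = R^{-1} d / (d^H R^{-1} d) for two channels."""
    r_inv = inverse_2x2(noise_scm)
    # Numerator: R^{-1} d
    num = [sum(r_inv[i][j] * steering[j] for j in range(2)) for i in range(2)]
    # Denominator: d^H R^{-1} d
    den = sum(steering[i].conjugate() * num[i] for i in range(2))
    return [n / den for n in num]


def apply_beamformer(weights, x):
    """Beamformer output y = w^H x for one time-frequency bin."""
    return sum(w.conjugate() * xi for w, xi in zip(weights, x))


# Hypothetical Hermitian, positive-definite noise SCM and steering vector.
noise_scm = [[2 + 0j, 0.5 + 0.5j], [0.5 - 0.5j, 1 + 0j]]
steering = [1 + 0j, 0 - 1j]
weights = mvdr_weights(noise_scm, steering)
response = apply_beamformer(weights, steering)
```

The value of `response` equals 1 up to rounding, which is the "distortionless" constraint: a signal arriving along the steering vector passes through unchanged while noise power is minimised.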

URL

http://arxiv.org/abs/1903.09341

PDF

http://arxiv.org/pdf/1903.09341
Visceral Machines: Risk-Aversion in Reinforcement Learning with Intrinsic Physiological Rewards

2019-03-22

Daniel McDuff, Ashish Kapoor

arXiv_AI

arXiv_AI Sparse Reinforcement_Learning
Abstract

As people learn to navigate the world, autonomic nervous system (e.g., “fight or flight”) responses provide intrinsic feedback about the potential consequence of action choices (e.g., becoming nervous when close to a cliff edge or driving fast around a bend). Physiological changes are correlated with these biological preparations to protect oneself from danger. We present a novel approach to reinforcement learning that leverages a task-independent intrinsic reward function trained on peripheral pulse measurements that are correlated with human autonomic nervous system responses. Our hypothesis is that such reward functions can circumvent the challenges associated with sparse and skewed rewards in reinforcement learning settings and can help improve sample efficiency. We test this in a simulated driving environment and show that it can increase the speed of learning and reduce the number of collisions during the learning stage.

URL

http://arxiv.org/abs/1805.09975

PDF

http://arxiv.org/pdf/1805.09975
Pose Estimation of Periacetabular Osteotomy Fragments with Intraoperative X-Ray Navigation

2019-03-22

Robert B. Grupp, Rachel A. Hegeman, Ryan J. Murphy, Clayton P. Alexander, Yoshito Otake, Benjamin A. McArthur, Mehran Armand, Russell H. Taylor

arXiv_CV

arXiv_CV Pose_Estimation
Abstract

Objective: State of the art navigation systems for pelvic osteotomies use optical systems with external fiducials. We propose the use of X-Ray navigation for pose estimation of periacetabular fragments without fiducials. Methods: A 2D/3D registration pipeline was developed to recover fragment pose. This pipeline was tested through an extensive simulation study and 6 cadaveric surgeries. Using osteotomy boundaries in the fluoroscopic images, the preoperative plan is refined to more accurately match the intraoperative shape. Results: In simulation, average fragment pose errors were 1.3°/1.7 mm when the planned fragment matched the intraoperative fragment, 2.2°/2.1 mm when the plan was not updated to match the true shape, and 1.9°/2.0 mm when the fragment shape was intraoperatively estimated. In cadaver experiments, the average pose errors were 2.2°/2.2 mm, 3.8°/2.5 mm, and 3.5°/2.2 mm when registering with the actual fragment shape, a preoperative plan, and an intraoperatively refined plan, respectively. Average errors of the lateral center edge angle were less than 2° for all fragment shapes in simulation and cadaver experiments. Conclusion: The proposed pipeline is capable of accurately reporting femoral head coverage within a range clinically identified for long-term joint survivability. Significance: Human interpretation of fragment pose is challenging and usually restricted to rotation about a single anatomical axis. The proposed pipeline provides an intraoperative estimate of rigid pose with respect to all anatomical axes, is compatible with minimally invasive incisions, and has no dependence on external fiducials.

URL

http://arxiv.org/abs/1903.09339

PDF

http://arxiv.org/pdf/1903.09339
A Type-coherent, Expressive Representation as an Initial Step to Language Understanding

2019-03-22

Gene Louis Kim, Lenhart Schubert

arXiv_AI

arXiv_AI Inference
Abstract

A growing interest in tasks involving language understanding by the NLP community has led to the need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs for language understanding: adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULF) for Episodic Logic (EL), which is an initial form for a semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved; they provide a starting point for further resolution into EL, and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize that a divide-and-conquer approach to semantic parsing starting with derivation of ULFs will lead to semantic analyses that do justice to subtle aspects of linguistic meaning, and will enable construction of more accurate semantic parsers.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09333

PDF

http://arxiv.org/pdf/1903.09333
Read All
Unsupervised Deformable Registration for Multi-Modal Images via Disentangled Representations

2019-03-22

Chen Qin, Bibo Shi, Rui Liao, Tommaso Mansi, Daniel Rueckert, Ali Kamen

arXiv_CV

arXiv_CV Adversarial Relation
Abstract

We propose a fully unsupervised multi-modal deformable image registration method (UMDIR), which does not require any ground truth deformation fields or any aligned multi-modal image pairs during training. Multi-modal registration is a key problem in many medical image analysis applications. It is very challenging due to complicated and unknown relationships between different modalities. In this paper, we propose an unsupervised learning approach to reduce the multi-modal registration problem to a mono-modal one through image disentangling. In particular, we decompose images of both modalities into a common latent shape space and separate latent appearance spaces via an unsupervised multi-modal image-to-image translation approach. The proposed registration approach is then built on the factorized latent shape code, with the assumption that the intrinsic shape deformation existing in the original image domain is preserved in this latent space. Specifically, two metrics are proposed for training the network: a latent similarity metric defined in the common shape space and a learning-based image similarity metric based on an adversarial loss. We examined different variations of our proposed approach and compared them with conventional state-of-the-art multi-modal registration methods. Results show that our proposed methods achieve competitive performance against other methods at substantially reduced computation time.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09331

PDF

http://arxiv.org/pdf/1903.09331
Read All
A resnet-based universal method for speckle reduction in optical coherence tomography images

2019-03-22

Cai Ning, Shi Fei, Hu Dianlin, Chen Yang

arXiv_CV

arXiv_CV
Abstract

In this work we propose a ResNet-based universal method for speckle reduction in optical coherence tomography (OCT) images. The proposed model contains 3 main modules: a Convolution-BN-ReLU module, a Branch module and a Residual module. Unlike traditional algorithms, the model learns from training data instead of requiring parameters such as the noise level to be selected manually. Applying the proposed method to OCT images yields a signal-to-noise ratio improvement of more than 22 dB in speckle noise reduction with minimal structure blurring. The proposed method provides strong generalization ability and can process other types of noisy OCT images without retraining. It outperforms other filtering methods in suppressing speckle noise and revealing subtle features.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09330

PDF

http://arxiv.org/pdf/1903.09330
Read All
Improving Safety in Reinforcement Learning Using Model-Based Architectures and Human Intervention

2019-03-22

Bharat Prakash, Mohit Khatwani, Nicholas Waytowich, Tinoosh Mohsenin

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

Recent progress in AI and reinforcement learning has shown great success in solving complex problems with high dimensional state spaces. However, most of these successes have been primarily in simulated environments where failure is of little or no consequence. Most real-world applications, however, require training solutions that are safe to operate, as catastrophic failures are inadmissible, especially when human interaction is involved. Currently, Safe RL systems use human oversight during training and exploration to make sure the RL agent does not enter a catastrophic state. These methods require a large amount of human labor and are very difficult to scale up. We present a hybrid method for reducing the human intervention time by combining model-based approaches and training a supervised learner to improve sample efficiency while also ensuring safety. We evaluate these methods on various grid-world environments using both standard and visual representations and show that our approach achieves better performance in terms of sample efficiency, number of catastrophic states reached, and overall task performance compared to traditional model-free approaches.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09328

PDF

http://arxiv.org/pdf/1903.09328
Read All
Towards Optimal Structured CNN Pruning via Generative Adversarial Learning

2019-03-22

Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann

arXiv_CV

arXiv_CV Regularization Adversarial Sparse CNN Optimization
Abstract

Structured pruning of filters or neurons has received increased focus for compressing convolutional neural networks. Most existing methods rely on multi-stage optimizations in a layer-wise manner for iteratively pruning and retraining, which may not be optimal and may be computation intensive. Besides, these methods are designed for pruning a specific structure, such as filter or block structures, without jointly pruning heterogeneous structures. In this paper, we propose an effective structured pruning approach that jointly prunes filters as well as other structures in an end-to-end manner. To accomplish this, we first introduce a soft mask to scale the output of these structures by defining a new objective function with sparsity regularization to align the output of the baseline and the network with this mask. We then effectively solve the optimization problem by generative adversarial learning (GAL), which learns a sparse soft mask in a label-free and end-to-end manner. By forcing more scaling factors in the soft mask to zero, the fast iterative shrinkage-thresholding algorithm (FISTA) can be leveraged to quickly and reliably remove the corresponding structures. Extensive experiments demonstrate the effectiveness of GAL on different datasets, including MNIST, CIFAR-10 and ImageNet ILSVRC 2012. For example, on ImageNet ILSVRC 2012, the pruned ResNet-50 achieves 10.88% Top-5 error and results in a factor of 3.7x speedup. This significantly outperforms state-of-the-art methods.
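The shrinkage step at the heart of FISTA is the soft-thresholding operator, which is what drives small scaling factors exactly to zero under an L1 penalty. The sketch below shows only that operator applied to a hypothetical soft mask. It omits FISTA's momentum update and the adversarial training described in the abstract, and the mask values and threshold are made up for illustration.

```python
def soft_threshold(values, threshold):
    """Proximal operator of the L1 norm: shrink each value toward zero.

    Entries with magnitude at most `threshold` become exactly zero; the
    rest move toward zero by `threshold` while keeping their sign.
    """
    shrunk = []
    for v in values:
        magnitude = max(abs(v) - threshold, 0.0)
        shrunk.append(magnitude if v >= 0 else -magnitude)
    return shrunk


# Hypothetical scaling factors of a soft mask over four structures.
mask = [0.5, -0.05, 0.02, -0.8]
sparse_mask = soft_threshold(mask, 0.1)

# Structures whose scaling factor reached zero are candidates for removal.
prunable = [i for i, m in enumerate(sparse_mask) if m == 0.0]
```

In this toy run the second and third structures end up with a zero scaling factor, so they would be the ones removed.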

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09291

PDF

http://arxiv.org/pdf/1903.09291
Read All
In-home and remote use of robotic body surrogates by people with profound motor deficits

2019-03-22

Phillip M. Grice, Charles C. Kemp

arXiv_RO

arXiv_RO Face
Abstract

By controlling robots comparable to the human body, people with profound motor deficits could potentially perform a variety of physical tasks for themselves, improving their quality of life. The extent to which this is achievable has been unclear due to the lack of suitable interfaces by which to control robotic body surrogates and a dearth of studies involving substantial numbers of people with profound motor deficits. We developed a novel, web-based augmented reality interface that enables people with profound motor deficits to remotely control a PR2 mobile manipulator from Willow Garage, which is a human-scale, wheeled robot with two arms. We then conducted two studies to investigate the use of robotic body surrogates. In the first study, 15 novice users with profound motor deficits from across the United States controlled a PR2 in Atlanta, GA to perform a modified Action Research Arm Test (ARAT) and a simulated self-care task. Participants achieved clinically meaningful improvements on the ARAT and 12 of 15 participants (80%) successfully completed the simulated self-care task. Participants agreed that the robotic system was easy to use, was useful, and would provide a meaningful improvement in their lives. In the second study, one expert user with profound motor deficits had free use of a PR2 in his home for seven days. He performed a variety of self-care and household tasks, and also used the robot in novel ways. Taking both studies together, our results suggest that people with profound motor deficits can improve their quality of life using robotic body surrogates, and that they can gain benefit with only low-level robot autonomy and without invasive interfaces. However, methods to reduce the rate of errors and increase operational speed merit further investigation.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1803.01477

PDF

http://arxiv.org/pdf/1803.01477
Read All
Fast and accurate reconstruction of HARDI using a 1D encoder-decoder convolutional network

2019-03-21

Shi Yin, Zhengqiang Zhang, Qinmu Peng, Xinge You

arXiv_CV

arXiv_CV Sparse CNN Relation
Abstract

High angular resolution diffusion imaging (HARDI) demands a larger number of data measurements than diffusion tensor imaging, restricting its use in practice. In this work, we explore a learning-based approach to reconstruct HARDI from a smaller number of measurements in q-space. The approach aims to directly learn the mapping between the measured and HARDI signals from HARDI acquisitions collected from other subjects. Specifically, the mapping is represented as a 1D encoder-decoder convolutional neural network under the guidance of compressed sensing (CS) theory for HARDI reconstruction. The proposed network architecture mainly consists of two parts: an encoder network that produces the sparse coefficients and a decoder network that yields the reconstruction result. Experimental results demonstrate that we can reconstruct HARDI signals robustly, accurately and quickly.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09272

PDF

http://arxiv.org/pdf/1903.09272
Read All
First steps to a constructor theory of cognition

2019-03-21

Riccardo Franco

arXiv_AI

arXiv_AI
Abstract

This article applies the conceptual framework of the constructor theory of information to cognition theory. The main result of this work is that cognition theory, in specific situations concerning, for example, the conjunction fallacy heuristic, requires the use of superinformation media, just as quantum theory does. This result entails that quantum and cognition theories can be considered as elements of a general class of superinformation-based subsidiary theories.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.09829

PDF

http://arxiv.org/pdf/1904.09829
Read All
Adversarial camera stickers: A Physical Camera Attack on Deep Learning Classifier

2019-03-21

Juncheng B. Li, Frank R. Schmidt, J. Zico Kolter

arXiv_CV

arXiv_CV Adversarial Deep_Learning
Abstract

Recent work has thoroughly documented the susceptibility of deep learning systems to adversarial examples, but most such instances directly manipulate the digital input to a classifier. Although a smaller line of work considers physical adversarial attacks, in all cases these involve manipulating the object of interest, e.g., putting a physical sticker on an object to misclassify it, or manufacturing an object specifically intended to be misclassified. In this work, we consider an alternative question: is it possible to fool deep classifiers, over all perceived objects of a certain type, by physically manipulating the camera itself? We show that this is indeed possible: by placing a carefully crafted and mainly-translucent sticker over the lens of a camera, one can create universal perturbations of the observed images that are inconspicuous, yet reliably misclassify target objects as a different (targeted) class. To accomplish this, we propose an iterative procedure for updating both the attack perturbation (to make it adversarial for a given classifier) and the threat model itself (to ensure it is physically realizable). For example, we show that we can achieve physically-realizable attacks that fool ImageNet classifiers in a targeted fashion 49.6% of the time. This presents a new class of physically-realizable threat models to consider in the context of adversarially robust machine learning. Link to our demo video: https://youtu.be/wUVmL33Fx54

Abstract (translated by Google)

URL

http://arxiv.org/abs/1904.00759

PDF

http://arxiv.org/pdf/1904.00759
Read All
Deep Learning with Anatomical Priors: Imitating Enhanced Autoencoders in Latent Space for Improved Pelvic Bone Segmentation in MRI

2019-03-21

Duc Duy Pham, Gurbandurdy Dovletov, Sebastian Warwas, Stefan Landgraeber, Marcus Jäger, Josef Pauli

arXiv_CV

arXiv_CV Segmentation Semantic_Segmentation Deep_Learning
Abstract

We propose a 2D Encoder-Decoder based deep learning architecture for semantic segmentation that incorporates anatomical priors by imitating the encoder component of an autoencoder in latent space. The autoencoder is additionally enhanced by means of hierarchical features, extracted by a U-Net module. Our suggested architecture is trained in an end-to-end manner and is evaluated on the example of pelvic bone segmentation in MRI. A comparison to the standard U-Net architecture shows promising improvements.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09263

PDF

http://arxiv.org/pdf/1903.09263
Read All
Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus

2019-03-21

Yan Zhang, Michael M. Zavlanos

arXiv_AI

arXiv_AI Reinforcement_Learning
Abstract

In this paper, we propose a distributed off-policy actor critic method to solve multi-agent reinforcement learning problems. Specifically, we assume that all agents keep local estimates of the global optimal policy parameter and update their local value function estimates independently. Then, we introduce an additional consensus step to let all the agents asymptotically achieve agreement on the global optimal policy function. The convergence analysis of the proposed algorithm is provided and the effectiveness of the proposed algorithm is validated using a distributed resource allocation example. Compared to relevant distributed actor critic methods, here the agents do not share information about their local tasks, but instead they coordinate to estimate the global policy function.
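To illustrate what a consensus step does in general, the sketch below runs plain consensus averaging: each agent repeatedly replaces its local estimate with a weighted average of its neighbours' estimates. This is a generic textbook scheme and not the algorithm from the paper. The three-agent line graph, the doubly stochastic weight matrix and the scalar "policy parameters" are all assumptions chosen for illustration.

```python
def consensus_step(params, weights):
    """One round of averaging: agent i mixes its neighbours' estimates.

    `weights[i][j]` is the weight agent i gives to agent j. A zero entry
    means the two agents do not communicate.
    """
    n = len(params)
    return [sum(weights[i][j] * params[j] for j in range(n)) for i in range(n)]


# Hypothetical line graph 0 - 1 - 2. Rows and columns each sum to 1, so
# the network-wide average is preserved at every step.
weights = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]

# Each agent starts from its own local estimate of a scalar parameter.
estimates = [1.0, 2.0, 6.0]
for _ in range(60):
    estimates = consensus_step(estimates, weights)
```

Agents 0 and 2 never exchange values directly, yet all three estimates approach the initial average of 3.0, which is the sense in which agreement is reached asymptotically.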

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09255

PDF

http://arxiv.org/pdf/1903.09255
Read All
CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification

2019-03-21

Zheng Tang, Milind Naphade, Ming-Yu Liu, Xiaodong Yang, Stan Birchfield, Shuo Wang, Ratnesh Kumar, David Anastasiu, Jenq-Neng Hwang

arXiv_CV

arXiv_CV Re-identification Object_Detection Knowledge Tracking Optimization Detection
Abstract

Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions. Camera geometry and calibration information are provided to aid spatio-temporal analysis. In addition, a subset of the benchmark is made available for the task of image-based vehicle re-identification (ReID). We conducted an extensive experimental evaluation of baselines/state-of-the-art approaches in MTMC tracking, multi-target single-camera (MTSC) tracking, object detection, and image-based ReID on this dataset, analyzing the impact of different network architectures, loss functions, spatio-temporal models and their combinations on task effectiveness. An evaluation server is launched with the release of our benchmark at the 2019 AI City Challenge (https://www.aicitychallenge.org/) that allows researchers to compare the performance of their newest techniques. We expect this dataset to catalyze research in this field, propel the state-of-the-art forward, and lead to deployed traffic optimization(s) in the real world.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09254

PDF

http://arxiv.org/pdf/1903.09254
Read All
Trainable Time Warping: Aligning Time-Series in the Continuous-Time Domain

2019-03-21

Soheil Khorram, Melvin G McInnis, Emily Mower Provost

arXiv_AI

arXiv_AI CNN Optimization Classification
Abstract

Dynamic time warping (DTW) calculates the similarity or alignment between two signals, subject to temporal warping. However, its computational complexity grows exponentially with the number of time-series. Although algorithms have been developed that are linear in the number of time-series, they are generally quadratic in time-series length. The exception is generalized time warping (GTW), which has linear computational cost. Yet, it can only identify simple time warping functions. There is a need for a new fast, high-quality multisequence alignment algorithm. We introduce trainable time warping (TTW), whose complexity is linear in both the number and the length of time-series. TTW performs alignment in the continuous-time domain using a sinc convolutional kernel and a gradient-based optimization technique. We compare TTW and GTW on 85 UCR datasets in time-series averaging and classification. TTW outperforms GTW on 67.1% of the datasets for the averaging tasks, and 61.2% of the datasets for the classification tasks.
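For reference, the classic two-sequence DTW baseline that the abstract contrasts against can be sketched with a dynamic-programming table. This is the standard textbook formulation, not TTW or GTW, and it uses absolute difference as the local cost, which is an assumption. The nested loops make the quadratic dependence on sequence length visible.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two 1D sequences (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance a only, advance b only, or both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# The second sequence is the first with one sample repeated, so a warp
# that stretches the first sequence aligns them perfectly.
warped = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Extending this table to k sequences at once needs a k-dimensional table, which is where the exponential growth in the number of time-series comes from.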

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09245

PDF

http://arxiv.org/pdf/1903.09245
Read All
Low Resource Text Classification with ULMFit and Backtranslation

2019-03-21

Sam Shleifer

arXiv_CL

arXiv_CL Review Text_Classification Classification Deep_Learning
Abstract

In computer vision, virtually every state-of-the-art deep learning system is trained with data augmentation. In text classification, however, data augmentation is less widely practiced because it must be performed before training and risks introducing label noise. We augment the IMDB movie reviews dataset with examples generated by two families of techniques: random token perturbations introduced by Wei and Zou [2019] and backtranslation – translating to a second language then back to English. In low resource environments, backtranslation generates significant improvement on top of the state-of-the-art ULMFit model. A ULMFit model pretrained on wikitext103 and then finetuned on only 50 IMDB examples and 500 synthetic examples generated by backtranslation achieves 80.6% accuracy, an 8.1% improvement over the augmentation-free baseline with only 9 minutes of additional training time. Random token perturbations do not yield any improvements but incur equivalent computational cost. The benefits of training with backtranslated examples decrease with the size of the available training data. On the full dataset, neither augmentation technique improves upon ULMFit’s state-of-the-art performance. We address this by using backtranslations as a form of test-time augmentation.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09244

PDF

http://arxiv.org/pdf/1903.09244
Read All
Inferring Compact Representations for Efficient Natural Language Understanding of Robot Instructions

2019-03-21

Siddharth Patki, Andrea F. Daniele, Matthew R. Walter, Thomas M. Howard

arXiv_AI

arXiv_AI Salient Attention Inference
Abstract

The speed and accuracy with which robots are able to interpret natural language is fundamental to realizing effective human-robot interaction. A great deal of attention has been paid to developing models and approximate inference algorithms that improve the efficiency of language understanding. However, existing methods still attempt to reason over a representation of the environment that is flat and unnecessarily detailed, which limits scalability. An open problem is then to develop methods capable of producing the most compact environment model sufficient for accurate and efficient natural language understanding. We propose a model that leverages environment-related information encoded within instructions to identify the subset of observations and perceptual classifiers necessary to perceive a succinct, instruction-specific environment representation. The framework uses three probabilistic graphical models trained from a corpus of annotated instructions to infer salient scene semantics, perceptual classifiers, and grounded symbols. Experimental results on two robots operating in different environments demonstrate that by exploiting the content and the structure of the instructions, our method learns compact environment representations that significantly improve the efficiency of natural language symbol grounding.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09243

PDF

http://arxiv.org/pdf/1903.09243
Read All
Deep Radiomics for Brain Tumor Detection and Classification from Multi-Sequence MRI

2019-03-21

Subhashis Banerjee, Sushmita Mitra, Francesco Masulli, Stefano Rovetta

arXiv_CV

arXiv_CV Transfer_Learning Classification Prediction Detection
Abstract

Glioma constitutes 80% of malignant primary brain tumors and is usually classified as high-grade glioma (HGG) or low-grade glioma (LGG). LGG tumors are less aggressive, with a slower growth rate than HGG, and are responsive to therapy. Because tumor biopsy is challenging for brain tumor patients, noninvasive imaging techniques like Magnetic Resonance Imaging (MRI) have been extensively employed in diagnosing brain tumors. Automated systems for detecting tumors and predicting their grade from MRI data therefore become necessary for assisting doctors in the framework of augmented intelligence. In this paper, we thoroughly investigate the power of Deep ConvNets for classification of brain tumors using multi-sequence MR images. We propose novel ConvNet models, which are trained from scratch, on MRI patches, slices, and multi-planar volumetric slices. The suitability of transfer learning for the task is next studied by applying two existing ConvNet models (VGGNet and ResNet) trained on the ImageNet dataset, through fine-tuning of the last few layers. Leave-one-patient-out (LOPO) testing and testing on the holdout dataset are used to evaluate the performance of the ConvNets. Results demonstrate that the proposed ConvNets achieve better accuracy in all cases where the model is trained on the multi-planar volumetric dataset. Unlike conventional models, it obtains a testing accuracy of 95% for the low/high grade glioma classification problem. A score of 97% is generated for classification of LGG with/without 1p/19q codeletion, without any additional effort towards extraction and selection of features. We study the properties of self-learned kernels/filters in different layers, through visualization of the intermediate layer outputs. We also compare the results with those of state-of-the-art methods, demonstrating a maximum improvement of 7% on the grading performance of ConvNets and 9% on the prediction of 1p/19q codeletion status.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09240

PDF

http://arxiv.org/pdf/1903.09240
Read All
SkelNetOn 2019 Dataset and Challenge on Deep Learning for Geometric Shape Understanding

2019-03-21

Ilke Demir, Camilla Hahn, Kathryn Leonard, Geraldine Morin, Dana Rahbani, Athina Panotopoulou, Amelie Fondevilla, Elena Balashova, Bastien Durix, Adam Kortylewski

arXiv_CV

arXiv_CV Segmentation Deep_Learning Detection
Abstract

We present the SkelNetOn 2019 Challenge and the Deep Learning for Geometric Shape Understanding workshop to utilize existing and develop novel deep learning architectures for shape understanding. We observed that unlike traditional segmentation and detection tasks, geometry understanding is still a new area for investigation using deep learning techniques. SkelNetOn aims to bring together researchers from different domains to foster learning methods on global shape understanding tasks. We aim to improve and evaluate the state-of-the-art shape understanding approaches, and to serve as reference benchmarks for future research. Similar to other challenges in the computer vision domain, SkelNetOn tracks propose three datasets and corresponding evaluation methodologies, all coherently bundled in three competitions with a dedicated workshop co-located with the CVPR 2019 conference. In this paper, we describe and analyze characteristics of each dataset, define the evaluation criteria of the public competitions, and provide baselines for each task.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09233

PDF

http://arxiv.org/pdf/1903.09233
Read All
MobileNetV2: Inverted Residuals and Linear Bottlenecks

2019-03-21

Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen

arXiv_CV

arXiv_CV Object_Detection Segmentation Semantic_Segmentation Classification Detection
Abstract

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the input and output of the residual block are thin bottleneck layers, in contrast to traditional residual models, which use expanded representations in the input. MobileNetV2 uses lightweight depthwise convolutions to filter features in the intermediate expansion layer. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet classification, COCO object detection, and VOC image segmentation. We evaluate the trade-offs between accuracy, the number of operations measured by multiply-adds (MAdd), and the number of parameters.
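To make the multiply-add (MAdd) accounting concrete, the sketch below counts operations for one inverted residual block: a 1x1 expansion, a depthwise convolution in the expanded space, and a 1x1 linear projection back to a thin bottleneck. It assumes stride 1 and ignores biases, normalization and activations. The expansion factor and the layer sizes in the example are illustrative values, not figures from the abstract.

```python
def inverted_residual_madds(height, width, c_in, c_out, expansion, kernel=3):
    """Multiply-adds of a stride-1 inverted residual block.

    The block expands c_in channels to c_in * expansion, filters them with
    a depthwise kernel x kernel convolution, then projects to c_out.
    """
    hidden = c_in * expansion
    positions = height * width
    expand = positions * c_in * hidden              # 1x1 pointwise expansion
    depthwise = positions * hidden * kernel * kernel  # one filter per channel
    project = positions * hidden * c_out            # 1x1 linear bottleneck
    return expand + depthwise + project


# Hypothetical block: 56x56 feature map, 24 -> 24 channels, expansion 6.
block_madds = inverted_residual_madds(56, 56, 24, 24, expansion=6)

# Closed form of the same count: h * w * t * c_in * (c_in + k^2 + c_out).
closed_form = 56 * 56 * 6 * 24 * (24 + 3 * 3 + 24)
```

The depthwise term grows with the kernel area but not with the number of output channels, which is why filtering in the expanded space stays cheap compared with a full convolution of the same width.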

Abstract (translated by Google)

URL

http://arxiv.org/abs/1801.04381

PDF

http://arxiv.org/pdf/1801.04381
Read All
Multi-person Articulated Tracking with Spatial and Temporal Embeddings

2019-03-21

Sheng Jin, Wentao Liu, Wanli Ouyang, Chen Qian

arXiv_CV

arXiv_CV Pose_Estimation Tracking Embedding Object_Tracking Detection
Abstract

We propose a unified framework for multi-person pose estimation and tracking. Our framework consists of two main components, i.e., SpatialNet and TemporalNet. The SpatialNet accomplishes body part detection and part-level data association in a single frame, while the TemporalNet groups human instances in consecutive frames into trajectories. Specifically, besides body part detection heatmaps, SpatialNet also predicts the Keypoint Embedding (KE) and Spatial Instance Embedding (SIE) for body part association. We model the grouping procedure into a differentiable Pose-Guided Grouping (PGG) module to make the whole part detection and grouping pipeline fully end-to-end trainable. TemporalNet extends spatial grouping of keypoints to temporal grouping of human instances. Given human proposals from two consecutive frames, TemporalNet exploits both appearance features encoded in Human Embedding (HE) and temporally consistent geometric features embodied in Temporal Instance Embedding (TIE) for robust tracking. Extensive experiments demonstrate the effectiveness of our proposed model. Remarkably, we demonstrate substantial improvements over the state-of-the-art pose tracking method from 65.4% to 71.8% Multi-Object Tracking Accuracy (MOTA) on the ICCV’17 PoseTrack Dataset.
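MOTA, the metric quoted above, is commonly defined as one minus the ratio of all tracking errors (misses, false positives and identity switches) to the number of ground-truth objects. The sketch below implements that standard definition. The counts in the example are invented, and the PoseTrack benchmark's own evaluation details (such as per-keypoint averaging) are not reproduced here.

```python
def mota(false_negatives, false_positives, id_switches, num_ground_truth):
    """Multi-Object Tracking Accuracy under the standard definition.

    All arguments are counts summed over every frame of the sequence.
    The score is at most 1.0 and can be negative when errors outnumber
    ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Hypothetical totals for one sequence.
score = mota(false_negatives=10, false_positives=5, id_switches=5,
             num_ground_truth=100)
```

Because the three error types are simply summed, a tracker can trade identity switches against misses without changing its MOTA, which is one reason the metric is usually reported alongside others.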

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09214

PDF

http://arxiv.org/pdf/1903.09214
Read All
Artificial Intelligence for Diabetes Case Management: The Intersection of Physical and Mental Health

2019-03-21

Casey C. Bennett

arXiv_AI

arXiv_AI Classification
Abstract

Diabetes is a major public health problem in the United States, affecting roughly 30 million people. Diabetes complications, along with the mental health comorbidities that often co-occur with them, are major drivers of high healthcare costs, poor outcomes, and reduced treatment adherence in diabetes. Here, we evaluate in a large state-wide population whether we can use artificial intelligence (AI) techniques to identify clusters of patient trajectories within the broader diabetes population in order to create cost-effective, narrowly-focused case management intervention strategies to reduce development of complications. This approach combined data from: 1) claims, 2) case management notes, and 3) social determinants of health from ~300,000 real patients between 2014 and 2016. We categorized complications as five types: Cardiovascular, Neuropathy, Ophthalmic, Renal, and Other. Modeling was performed combining a variety of machine learning algorithms, including supervised classification, unsupervised clustering, natural language processing of unstructured care notes, and feature engineering. The results showed that we can predict development of diabetes complications roughly 83.5% of the time using claims data or social determinants of health data. They also showed we can reveal meaningful clusters in the patient population related to complications and mental health that can be used for a cost-effective screening program, reducing the number of patients to be screened by 85%. This study outlines creation of an AI framework to develop protocols to better address mental health comorbidities that lead to complications development in the diabetes population. Future work is described that outlines potential lines of research and the need for better addressing the ‘people side’ of the equation.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1810.03044

PDF

http://arxiv.org/pdf/1810.03044
Read All
Sparse2Dense: From direct sparse odometry to dense 3D reconstruction

2019-03-21

Jiexiong Tang, John Folkesson, Patric Jensfelt

arXiv_CV

arXiv_CV Sparse Face Tracking Deep_Learning Prediction SLAM
Abstract

In this paper, we propose a new deep learning based dense monocular SLAM method. Compared to existing methods, the proposed framework constructs a dense 3D model via a sparse to dense mapping using learned surface normals. With single view learned depth estimation as prior for monocular visual odometry, we obtain both accurate positioning and high quality depth reconstruction. The depth and normal are predicted by a single network trained in a tightly coupled manner. Experimental results show that our method significantly improves the performance of visual tracking and depth prediction in comparison to the state-of-the-art in deep monocular dense SLAM.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09199

PDF

http://arxiv.org/pdf/1903.09199
Read All
Semantic Comparison of State-of-the-Art Deep Learning Methods for Image Multi-Label Classification

2019-03-21

Adam Kubany, Shimon Ben Ishay, Ruben-sacha Ohayon, Armin Shmilovici, Lior Rokach, Tomer Doitshman

arXiv_CV

arXiv_CV Image_Caption Face Classification Deep_Learning Recognition
Abstract

Image understanding relies heavily on accurate multi-label classification. In recent years deep learning (DL) algorithms have become very successful tools for multi-label classification of image objects. With this set of tools, various implementations of DL algorithms for multi-label classification have been published for public use in the form of application programming interfaces (APIs). In this study, we evaluate and compare 10 of the most prominent publicly available APIs in a best-of-breed challenge. The evaluation of the various APIs is performed on the Visual Genome labeling benchmark dataset using 12 well-recognized similarity metrics. Additionally, for the first time in this kind of comparison, we use a semantic similarity metric to evaluate the semantic similarity performance. In this evaluation, Microsoft Computer Vision, IBM Visual Recognition, and Imagga APIs show better performance than the other APIs.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09190

PDF

http://arxiv.org/pdf/1903.09190
Read All
Long range teleoperation for fine manipulation tasks under time-delay network conditions

2019-03-21

Jun Jin, Laura Petrich, Shida He, Masood Dehghan, Martin Jagersand

arXiv_RO

arXiv_RO Sparse Knowledge Face
Abstract

We present a semi-autonomous teleoperation system based on a coarse-to-fine approach and vision guidance. The system is optimized for long range teleoperation tasks under time-delay network conditions and does not require prior knowledge of the remote scene. Our system initializes with a self exploration behavior that senses the remote surroundings through a freely mounted eye-in-hand webcam. The self exploration stage estimates hand-eye calibration and provides a telepresence interface via real-time 3D geometric reconstruction. The human operator is able to specify a visual task through the interface, and a coarse-to-fine controller guides the remote robot, enabling our system to work in high latency networks. Large motions are guided by coarse 3D estimation, whereas fine motions use image cues (image-based visual servoing, IBVS). Network data transmission cost is minimized by sending only sparse points and a final image to the human side. Experiments from Singapore to Canada on multiple tasks were conducted to show our system’s capability to work in long range teleoperation tasks.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09189

PDF

http://arxiv.org/pdf/1903.09189
Read All
Towards automatic construction of multi-network models for heterogeneous multi-task learning

2019-03-21

Unai Garciarena, Alexander Mendiburu, Roberto Santana

arXiv_AI

arXiv_AI Reinforcement_Learning Classification
Abstract

Multi-task learning, as it is understood nowadays, consists of using one single model to carry out several similar tasks. From classifying hand-written characters of different alphabets to figuring out how to play several Atari games using reinforcement learning, multi-task models have been able to widen their performance range across different tasks, although these tasks are usually of a similar nature. In this work, we attempt to widen this range even further, by including heterogeneous tasks in a single learning procedure. To do so, we first formally define a multi-network model, identifying the necessary components and characteristics to allow different adaptations of said model depending on the tasks it is required to fulfill. Second, employing the formal definition as a starting point, we develop an illustrative model example consisting of three different tasks (classification, regression and data sampling). The performance of this model implementation is then analyzed, showing its capabilities. Motivated by the results of the analysis, we enumerate a set of open challenges and future research lines over which the full potential of the proposed model definition can be exploited.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09171

PDF

http://arxiv.org/pdf/1903.09171
Read All
Quantitative Depth Quality Assessment of RGBD Cameras At Close Range Using 3D Printed Fixtures

2019-03-21

Michele Pratusevich, Jason Chrisos, Shreyas Aditya

arXiv_CV

arXiv_CV Quantitative
Abstract

Mobile robots that manipulate their environments require high-accuracy scene understanding at close range. Typically this understanding is achieved with RGBD cameras, but the evaluation process for selecting an appropriate RGBD camera for the application is minimally quantitative. Limited manufacturer-published metrics do not translate to observed quality in real-world cluttered environments, since quality is application-specific. To bridge the gap, we present a method for quantitatively measuring depth quality using a set of extendable 3D printed fixtures that approximate real-world conditions. By framing depth quality as point cloud density and root mean square error (RMSE) from a known geometry, we present a method that is extendable by other system integrators for custom environments. We show a comparison of 3 cameras and present a case study for camera selection, provide reference meshes and analysis code, and discuss further extensions.
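The two quantities named in the abstract, point cloud density and RMSE from a known geometry, can be illustrated with a minimal sketch. It assumes the simplest possible reference geometry, a flat plane, and is not the authors' analysis code; the function names, the sample points, and the fixture area are all hypothetical.

```python
import math

def plane_rmse(points, normal, offset):
    """RMSE of point-to-plane distances for the plane n . p + offset = 0."""
    nx, ny, nz = normal
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    squared = [((nx * x + ny * y + nz * z + offset) / norm) ** 2
               for x, y, z in points]
    return math.sqrt(sum(squared) / len(squared))

def point_density(points, area_m2):
    """Number of valid depth points per square metre of the reference face."""
    return len(points) / area_m2

# Hypothetical depth samples of a flat fixture face located at z = 0.5 m.
cloud = [(0.0, 0.0, 0.502), (0.1, 0.0, 0.498),
         (0.0, 0.1, 0.501), (0.1, 0.1, 0.499)]
rmse = plane_rmse(cloud, normal=(0.0, 0.0, 1.0), offset=-0.5)
density = point_density(cloud, area_m2=0.01)
print(f"RMSE: {rmse:.6f} m, density: {density:.1f} points/m^2")
```

A curved or multi-part fixture would need a point-to-mesh distance in place of the point-to-plane distance used here.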

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09169

PDF

http://arxiv.org/pdf/1903.09169
Read All
Progressive Sparse Local Attention for Video object detection

2019-03-21

Chaoxu Guo, Bin Fan, Jie Gu, Qian Zhang, Shiming Xiang, Veronique Prinet, Chunhong Pan

arXiv_CV

arXiv_CV Object_Detection Sparse Attention Detection
Abstract

Transferring image-based object detectors to the domain of videos remains a challenging problem. Previous efforts mostly exploit optical flow to propagate features across frames, aiming to achieve a good trade-off between performance and computational complexity. However, introducing an extra model to estimate optical flow would significantly increase the overall model size. In addition, the gap between optical flow and high-level features can hinder it from establishing the spatial correspondence accurately. Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressive sparse strides and uses the correspondence to propagate features. Based on PSLA, Recursive Feature Updating (RFU) and Dense feature Transforming (DFT) are introduced to model temporal appearance and enrich feature representation respectively. Finally, a novel framework for video object detection is proposed. Experiments on ImageNet VID are conducted. Our framework achieves a state-of-the-art speed-accuracy trade-off with significantly reduced model capacity.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09126

PDF

http://arxiv.org/pdf/1903.09126
Read All
PProCRC: Probabilistic Collaboration of Image Patches

2019-03-21

Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

arXiv_CV

arXiv_CV OCR Face Recognition Face_Recognition
Abstract

We present a conditional probabilistic framework for collaborative representation of image patches. It incorporates background compensation and outlier patch suppression into the main formulation itself, thus doing away with the need for pre-processing steps to handle the same. A closed form non-iterative solution of the cost function is derived. The proposed method (PProCRC) outperforms earlier related patch based (PCRC, GP-CRC) as well as the state-of-the-art probabilistic (ProCRC and EProCRC) models on several fine-grained benchmark image datasets for face recognition (AR and LFW) and species recognition (Oxford Flowers and Pets) tasks. We also expand our recent endemic Indian birds (IndBirds) dataset and report results on it. The demo code and IndBirds dataset are available through the lead author.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09123

PDF

http://arxiv.org/pdf/1903.09123
Read All
Closed-Form Optimal Triangulation Based on Angular Errors

2019-03-21

Seong Hun Lee, Javier Civera

arXiv_CV

arXiv_CV Knowledge
Abstract

In this paper, we study closed-form optimal solutions to two-view triangulation with known internal calibration and pose. By formulating the triangulation problem as $L_1$ and $L_\infty$ minimization of angular reprojection errors, we derive the exact closed-form solutions that guarantee global optimality under respective cost functions. To the best of our knowledge, we are the first to present such solutions. Since the angular error is rotationally invariant, our solutions can be applied to any type of central camera, be it perspective, fisheye or omnidirectional. Our methods also require significantly less computation than the existing optimal methods. Experimental results on synthetic and real datasets validate our theoretical derivations.
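To make the cost functions concrete, the sketch below evaluates the angular reprojection errors of a candidate 3D point in two views, then forms the $L_1$ cost (sum of the angles) and the $L_\infty$ cost (largest angle). It only evaluates the costs and does not implement the paper's closed-form solutions. It assumes the bearing vectors are already expressed in the world frame, and every name and number in it is hypothetical.

```python
import math

def angle_between(u, v):
    """Angle in radians between 3D vectors u and v (atan2 form, stable near 0)."""
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = sum(a * b for a, b in zip(u, v))
    return math.atan2(cross_norm, dot)

def angular_errors(point, centers, bearings):
    """Angle between each observed bearing and the ray from camera centre to point."""
    errors = []
    for center, bearing in zip(centers, bearings):
        ray = tuple(p - c for p, c in zip(point, center))
        errors.append(angle_between(ray, bearing))
    return errors

def l1_cost(errors):
    return sum(errors)

def linf_cost(errors):
    return max(errors)

# Hypothetical two-view setup: camera centres one unit apart along x.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
# The first bearing is deliberately off; the second points exactly at the point.
bearings = [(0.0, 0.0, 1.0), (-0.5, 0.0, 2.0)]
candidate = (0.5, 0.0, 2.0)

errors = angular_errors(candidate, centers, bearings)
print("L1 cost:", l1_cost(errors), "L_inf cost:", linf_cost(errors))
```

Because only angles between rays are measured, the same evaluation applies unchanged to perspective, fisheye, or omnidirectional bearings.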

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09115

PDF

http://arxiv.org/pdf/1903.09115
Read All
Levelling the Playing Field: A Comprehensive Comparison of Visual Place Recognition Approaches under Changing Conditions

2019-03-21

Mubariz Zaffar, Ahmad Khaliq, Shoaib Ehsan, Michael Milford, Klaus McDonald-Maier

arXiv_CV

arXiv_CV Recognition
Abstract

In recent years there has been significant improvement in the capability of Visual Place Recognition (VPR) methods, building on the success of both hand-crafted and learnt visual features, temporal filtering and usage of semantic scene information. The wide range of approaches and the relatively recent growth in interest in the field have meant that a wide range of datasets and assessment methodologies have been proposed, often with a focus only on precision-recall type metrics, making comparison difficult. In this paper we present a comprehensive approach to evaluating the performance of 10 state-of-the-art recently-developed VPR techniques, which utilizes three standardized metrics: (a) Matching Performance, (b) Matching Time, and (c) Memory Footprint. Together this analysis provides an up-to-date and widely encompassing snapshot of the various strengths and weaknesses of contemporary approaches to the VPR problem. The aim of this work is to help move this particular research field towards a more mature and unified approach to the problem, enabling better comparison and hence more progress to be made in future research.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09107

PDF

http://arxiv.org/pdf/1903.09107
Read All
Future Near-Collision Prediction from Monocular Video: Feasibility, Dataset, and Challenges

2019-03-21

Aashi Manglik, Xinshuo Weng, Eshed Ohn-Bar, Kris M. Kitani

arXiv_RO

arXiv_RO CNN Deep_Learning Prediction Detection
Abstract

We explore the possibility of using a single monocular camera to forecast the time to collision between a suitcase-shaped robot being pushed by its user and other nearby pedestrians. We develop a purely image-based deep learning approach that directly estimates the time to collision without relying on explicit geometric depth estimates or velocity information to predict future collisions. While previous work has focused on detecting immediate collision in the context of navigating Unmanned Aerial Vehicles, the detection was limited to a binary variable (i.e., collision or no collision). We propose a more fine-grained approach to collision forecasting by predicting the exact time to collision in terms of milliseconds, which is more helpful for collision avoidance in the context of dynamic path planning. To evaluate our method, we have collected a novel large-scale dataset of over 13,000 indoor video segments, each showing a trajectory of at least one person ending in close proximity (a near collision) with the camera mounted on a mobile suitcase-shaped platform. Using this dataset, we do extensive experimentation on different temporal windows as input using an exhaustive list of state-of-the-art convolutional neural networks (CNNs). Our results show that our proposed multi-stream CNN is the best model for predicting time to near-collision. The average prediction error of our time to near collision is 0.75 seconds across our test environments.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09102

PDF

http://arxiv.org/pdf/1903.09102
Read All
Flying through a narrow gap using neural network: an end-to-end planning and control approach

2019-03-21

Jiarong Lin, Luqi Wang, Fei Gao, Shaojie Shen, Fu Zhang

arXiv_RO

arXiv_RO Reinforcement_Learning Drone
Abstract

In this paper, we investigate the problem of enabling a drone to fly through a tilted narrow gap, without a traditional planning and control pipeline. To this end, we propose an end-to-end policy network, which imitates the traditional pipeline and is fine-tuned using reinforcement learning. Unlike previous works which plan dynamically feasible trajectories using motion primitives and track the generated trajectory by a geometric controller, our proposed method is an end-to-end approach which takes the flight scenario as input and directly outputs thrust-attitude control commands for the quadrotor. Key contributions of our paper are: 1) presenting an imitate-reinforce training framework; 2) flying through a narrow gap using an end-to-end policy network, showing that a learning-based method can also address the highly dynamic control problem as the traditional pipeline does (see attached video: https://www.youtube.com/watch?v=jU1qRcLdjx0); 3) proposing a robust imitation of an optimal trajectory generator using multilayer perceptrons; 4) showing how reinforcement learning can improve the performance of imitation learning, and the potential to achieve higher performance over the model-based method.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.09088

PDF

http://arxiv.org/pdf/1903.09088
Read All
