Single-photon avalanche diodes (SPADs) are an emerging technology with a unique capability of capturing individual photons with high timing precision. SPADs are being used in several active imaging systems (e.g., fluorescence lifetime microscopy and LiDAR), albeit mostly limited to low photon flux settings. We propose passive free-running SPAD (PF-SPAD) imaging, an imaging modality that uses SPADs for capturing 2D intensity images with unprecedented dynamic range under ambient lighting, without any active light source. Our key observation is that the precise inter-photon timing measured by a SPAD can be used for estimating scene brightness under ambient lighting conditions, even for very bright scenes. We develop a theoretical model for PF-SPAD imaging, and derive a scene brightness estimator based on the average time of darkness between successive photons detected by a PF-SPAD pixel. Our key insight is that due to the stochastic nature of photon arrivals, this estimator does not suffer from a hard saturation limit. Coupled with high sensitivity at low flux, this enables a PF-SPAD pixel to measure a wide range of scene brightness, from very low to very high, thereby achieving extreme dynamic range. We demonstrate an improvement of over 2 orders of magnitude over conventional sensors by imaging scenes spanning a dynamic range of 10^6:1.
http://arxiv.org/abs/1902.10190
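As a rough illustration of the brightness estimator described above, here is a minimal sketch that inverts the average "time of darkness" between detections; the function and parameter names (e.g. `quantum_eff`) are hypothetical stand-ins, not the paper's exact formulation:

```python
import numpy as np

def estimate_flux(timestamps, dead_time, quantum_eff=0.4):
    """Estimate scene brightness (photon flux) from SPAD detection times.

    Sketch only: the mean inter-detection interval minus the fixed dead
    time approximates the average 'time of darkness'; its reciprocal,
    scaled by detection efficiency, serves as the flux estimate. Because
    photon arrivals are stochastic, this quantity keeps shrinking rather
    than saturating as the scene gets brighter.
    """
    inter_times = np.diff(np.sort(timestamps))
    darkness = np.mean(inter_times) - dead_time
    if darkness <= 0:
        raise ValueError("mean inter-detection time must exceed the dead time")
    return 1.0 / (quantum_eff * darkness)
```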
Attention mechanisms have seen wide adoption in neural NLP models. In addition to improving predictive performance, these are often touted as affording transparency: models equipped with attention provide a distribution over attended-to input units, and this is often presented (at least implicitly) as communicating the relative importance of inputs. However, it is unclear what relationship exists between attention weights and model outputs. In this work, we perform extensive experiments across a variety of NLP tasks that aim to assess the degree to which attention weights provide meaningful 'explanations' for predictions. We find that they largely do not. For example, learned attention weights are frequently uncorrelated with gradient-based measures of feature importance, and one can identify very different attention distributions that nonetheless yield equivalent predictions. Our findings show that standard attention modules do not provide meaningful explanations and should not be treated as though they do. Code for all experiments is available at https://github.com/successar/AttentionExplanation.
http://arxiv.org/abs/1902.10186
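One of the diagnostics mentioned above, correlating attention weights with gradient-based importance, can be sketched in a few lines; the per-token inputs and the use of gradient magnitude as the importance proxy are illustrative assumptions:

```python
import numpy as np
from scipy.stats import kendalltau

def attention_gradient_agreement(attention, gradients):
    """Rank-correlate attention weights with gradient-based importance.

    attention: (T,) attention distribution over T input tokens.
    gradients: (T,) gradient of the output w.r.t. each token's embedding,
    reduced to one scalar per token. A low Kendall tau indicates that
    attention does not track gradient-based feature importance.
    """
    importance = np.abs(gradients)  # magnitude as a simple importance proxy
    tau, p_value = kendalltau(attention, importance)
    return tau, p_value
```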
Target search with unmanned aerial vehicles (UAVs) is a problem relevant to many scenarios, e.g., search and rescue (SaR). However, a key challenge is planning paths for maximal search efficiency given flight time constraints. To address this, we propose the Obstacle-aware Adaptive Informative Path Planning (OA-IPP) algorithm for target search in cluttered environments using UAVs. Our approach leverages a layered planning strategy using a Gaussian Process (GP)-based model of target occupancy to generate informative paths in continuous 3D space. Within this framework, we introduce an adaptive replanning scheme which allows us to trade off between information gain, field coverage, sensor performance, and collision avoidance for efficient target detection. Extensive simulations show that our OA-IPP method performs better than state-of-the-art planners, and we demonstrate its application in a realistic urban SaR scenario.
http://arxiv.org/abs/1902.10182
Many applications in robotics and human-computer interaction can benefit from understanding 3D motion of points in a dynamic environment, widely known as scene flow. While most previous methods focus on stereo and RGB-D images as input, few try to estimate scene flow directly from point clouds. In this work, we propose a novel deep neural network named $FlowNet3D$ that learns scene flow from point clouds in an end-to-end fashion. Our network simultaneously learns deep hierarchical features of point clouds and flow embeddings that represent point motions, supported by two newly proposed learning layers for point sets. We evaluate the network on both challenging synthetic data from FlyingThings3D and real Lidar scans from KITTI. Trained on synthetic data only, our network successfully generalizes to real scans, outperforming various baselines and showing results competitive with the prior art. We also demonstrate two applications of our scene flow output (scan registration and motion segmentation) to show its potential wide use cases.
http://arxiv.org/abs/1806.01411
Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly “intelligent” behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem-solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Finally, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.
http://arxiv.org/abs/1902.10178
We are motivated by large scale submodular optimization problems, where standard algorithms that treat the submodular functions in the \emph{value oracle model} do not scale. In this paper, we present a model called the \emph{precomputational complexity model}, along with a unifying memoization based framework, which looks at the specific form of the given submodular function. A key ingredient in this framework is the notion of a \emph{precomputed statistic}, which is maintained in the course of the algorithms. We show that we can easily integrate this idea into a large class of submodular optimization problems including constrained and unconstrained submodular maximization, minimization, difference of submodular optimization, optimization with submodular constraints and several other related optimization problems. Moreover, memoization can be integrated in both discrete and continuous relaxation flavors of algorithms for these problems. We demonstrate this idea for several commonly occurring submodular functions, and show how the precomputational model provides significant speedups compared to the value oracle model. Finally, we empirically demonstrate this for large scale machine learning problems of data subset selection and summarization.
http://arxiv.org/abs/1902.10176
We show that recent innovations in deep reinforcement learning can effectively color very large graphs – a well-known NP-hard problem with clear commercial applications. Because the Monte Carlo Tree Search with Upper Confidence Bound algorithm used in AlphaGoZero can improve the performance of a given heuristic, our approach allows deep neural networks trained using high performance computing (HPC) technologies to transform computation into improved heuristics with zero prior knowledge. Key to our approach is the introduction of a novel deep neural network architecture (FastColorNet) that has access to the full graph context and requires $O(V)$ time and space to color a graph with $V$ vertices, which enables scaling to very large graphs that arise in real applications like parallel computing, compilers, numerical solvers, and design automation, among others. As a result, we are able to learn new state-of-the-art heuristics for graph coloring.
http://arxiv.org/abs/1902.10162
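For context, the kind of heuristic such a learned policy competes with is classical greedy coloring, which runs in time linear in the size of the graph. This baseline sketch (with the simplest possible vertex ordering) is illustrative and not part of the paper:

```python
def greedy_color(adjacency):
    """Greedy graph coloring over a dict mapping node -> iterable of neighbors.

    Assigns each vertex the smallest color not used by its already-colored
    neighbors. Classical heuristics differ mainly in the vertex order;
    learned approaches aim to beat such orderings.
    """
    colors = {}
    for v in adjacency:  # a fixed vertex order; heuristics differ here
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors
```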
Verification of neural networks enables us to gauge their robustness against adversarial attacks. Verification algorithms fall into two categories: exact verifiers that run in exponential time and relaxed verifiers that are efficient but incomplete. In this paper, we unify all existing LP-relaxed verifiers, to the best of our knowledge, under a general convex relaxation framework. This framework works for neural networks with diverse architectures and nonlinearities and covers both primal and dual views of robustness verification. We further prove strong duality between the primal and dual problems under very mild conditions. Next, we perform large-scale experiments, amounting to more than 22 CPU-years, to obtain the exact solution to the convex-relaxed problem that is optimal within our framework for ReLU networks. We find that the exact solution does not significantly narrow the gap between PGD and existing relaxed verifiers for various networks trained normally or robustly on the MNIST and CIFAR datasets. Our results suggest there is an inherent barrier to tight verification for the large class of methods captured by our framework. We discuss possible causes of this barrier and potential future directions for bypassing it.
https://arxiv.org/abs/1902.08722
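For concreteness, the canonical single-neuron relaxation that LP-relaxed verifiers build on replaces each ReLU $y=\max(x,0)$, given pre-activation bounds $l \le x \le u$ with $l < 0 < u$, by its convex hull (the so-called triangle relaxation):

\[
y \;\ge\; 0, \qquad y \;\ge\; x, \qquad y \;\le\; \frac{u\,(x-l)}{u-l}, \qquad l \;\le\; x \;\le\; u .
\]

Optimizing exactly over the intersection of these per-neuron constraints across the whole network is the convex-relaxed problem whose optimal solution the paper's large-scale experiments compute.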
Modern systems (e.g., deep neural networks, big data analytics, and compilers) are highly configurable, which means they expose different performance behavior under different configurations. The fundamental challenge is that one cannot simply measure all configurations due to the sheer size of the configuration space. Transfer learning has been used to reduce the measurement efforts by transferring knowledge about performance behavior of systems across environments. Previously, research has shown that statistical models are indeed transferable across environments. In this work, we investigate identifiability and transportability of causal effects and statistical relations in highly-configurable systems. Our causal analysis agrees with previous exploratory analysis \cite{Jamshidi17} and confirms that the causal effects of configuration options can be carried over across environments with high confidence. We expect that the ability to carry over causal relations will enable effective performance analysis of highly-configurable systems.
http://arxiv.org/abs/1902.10119
Fine-Grained Named Entity Recognition (FG-NER) is critical for many NLP applications. While classical named entity recognition (NER) has attracted a substantial amount of research, FG-NER is still an open research domain. The current state-of-the-art (SOTA) model for FG-NER relies heavily on manual efforts for building a dictionary and designing hand-crafted features. The end-to-end framework that achieved the SOTA result for NER has not produced results competitive with the SOTA model for FG-NER. In this paper, we investigate, from several aspects, how effective multi-task learning approaches are in an end-to-end framework for FG-NER. Our experiments show that using multi-task learning approaches with contextualized word representation can help an end-to-end neural network model achieve SOTA results without using any additional manual effort for creating data and designing features.
http://arxiv.org/abs/1902.10118
The objective of this paper is speaker recognition “in the wild”, where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame level) network, and the method of temporal aggregation. We propose a powerful speaker recognition deep network, using a “thin-ResNet” trunk architecture, and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, that can be trained end-to-end. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance length on performance, and conclude that for “in the wild” data, a longer length is beneficial.
http://arxiv.org/abs/1902.10107
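A minimal NumPy sketch of the NetVLAD-style aggregation mentioned above; the array shapes and the softmax-assignment parameterization are assumptions (GhostVLAD additionally adds "ghost" clusters whose residuals are dropped from the output):

```python
import numpy as np

def netvlad_pool(features, clusters, assign_w, assign_b):
    """NetVLAD aggregation over frame-level features.

    features: (T, D) frame descriptors; clusters: (K, D) learned centers;
    assign_w: (D, K) and assign_b: (K,) parameterize the soft assignment.
    Returns a (K*D,) L2-normalized utterance-level descriptor.
    """
    logits = features @ assign_w + assign_b                 # (T, K)
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)                       # soft cluster assignment
    # assignment-weighted residuals of each frame to each cluster center
    V = np.einsum('tk,tkd->kd', a, features[:, None, :] - clusters[None, :, :])
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12   # intra-normalization
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)                  # final L2 normalization
```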
In this paper, we introduce a set of opinion annotations for the POM movie review dataset, composed of 1000 videos. The annotation campaign is motivated by the development of a hierarchical opinion prediction framework allowing one to predict the different components of the opinions (e.g. polarity and aspect) and to identify the corresponding textual spans. The resulting annotations have been gathered at two granularity levels: a coarse one (opinionated span) and a finer one (span of opinion components). We introduce specific categories in order to make the annotation of opinions easier for movie reviews. For example, some categories allow the discovery of user recommendation and preference in movie reviews. We provide a quantitative analysis of the annotations and report the inter-annotator agreement under the different levels of granularity. We thus provide the first set of ground-truth annotations which can be used for the task of fine-grained multimodal opinion prediction. Through the inter-annotator study, we further show that a linear structured predictor learns meaningful features even for the prediction of scarce labels. Both the annotations and the baseline system will be made publicly available.
http://arxiv.org/abs/1902.10102
State-of-the-art scene flow algorithms pursue the conflicting targets of accuracy, run time, and robustness. With the successful concept of pixel-wise matching and sparse-to-dense interpolation, we push the limits of scene flow estimation. Avoiding strong assumptions on the domain or the problem yields a more robust algorithm. The algorithm is fast because it avoids explicit regularization during matching, allowing efficient computation. Using image information from multiple time steps and explicit visibility prediction based on previous results, we achieve competitive performance on different data sets. Our contributions and results are evaluated in comparative experiments. Overall, we present an accurate scene flow algorithm that is faster and more generic than any individual benchmark leader.
http://arxiv.org/abs/1902.10099
Numerous artificially intelligent, networked things will populate the battlefield of the future, operating in close collaboration with human warfighters, and fighting as teams in highly adversarial environments. This chapter explores the characteristics, capabilities and intelligence required of such a network of intelligent things and humans: the Internet of Battle Things (IOBT). The IOBT will experience unique challenges that are not yet well addressed by the current generation of AI and machine learning.
http://arxiv.org/abs/1902.10086
Previous research shows that eye-tracking data contains information about the lexical and syntactic properties of text, which can be used to improve natural language processing models. In this work, we leverage eye movement features from three corpora with recorded gaze information to augment a state-of-the-art neural model for named entity recognition (NER) with gaze embeddings. These corpora were manually annotated with named entity labels. Moreover, we show how gaze features, generalized on the word type level, eliminate the need for recorded eye-tracking data at test time. The gaze-augmented models for NER using token-level and type-level features outperform the baselines. We present the benefits of eye-tracking features by evaluating the NER models both on individual datasets and in cross-domain settings.
http://arxiv.org/abs/1902.10068
The temporal dynamics and the discriminative information in audio signals are crucial for Acoustic Scene Classification (ASC). In this work, we propose a temporal feature learning method with a hierarchical architecture called Multi-Layer Temporal Pooling (MLTP). Via recursive non-linear feature mappings and temporal pooling operations, our proposed MLTP can effectively capture the high-level temporal dynamics for an entire audio signal of arbitrary duration in an unsupervised way. With the patch-level discriminative features extracted by a simple pre-trained convolutional neural network (CNN) as input, our method attempts to learn temporal features for the entire audio sample, which are directly used to train the classifier. Experimental results show that our method significantly improves ASC performance. Without using any data augmentation techniques or ensemble strategies, our method still achieves state-of-the-art performance with only one lightweight CNN and a single classifier.
http://arxiv.org/abs/1902.10063
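The hierarchical pooling idea above can be sketched roughly as follows; the specific nonlinearity (tanh), pooling operator (max) and window size are hypothetical placeholders for the MLTP operations, not the paper's exact choices:

```python
import numpy as np

def multilayer_temporal_pooling(patch_features, n_layers=3, window=4):
    """Hierarchical temporal pooling over patch-level CNN features.

    patch_features: (T, D). Each layer applies a non-linear feature
    mapping followed by pooling over non-overlapping temporal windows;
    the top layer is pooled to a single clip-level descriptor.
    """
    x = patch_features
    for _ in range(n_layers):
        x = np.tanh(x)  # simple stand-in for a non-linear feature mapping
        T = (len(x) // window) * window
        if T == 0:
            break
        x = x[:T].reshape(-1, window, x.shape[1]).max(axis=1)  # temporal pooling
    return x.max(axis=0)  # clip-level descriptor for the classifier
```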
Place recognition and loop closure detection are challenging for long-term visual navigation tasks. SeqSLAM is considered to be one of the most successful approaches to achieving long-term localization under varying environmental conditions and changing viewpoints, but it depends on a brute-force, time-consuming sequential matching method. We propose MRS-VPR, a multi-resolution, sampling-based place recognition method, which can significantly improve the matching efficiency and accuracy in sequential matching. The novelty of this method lies in the coarse-to-fine searching pipeline and a particle filter-based global sampling scheme, which together balance matching efficiency and accuracy in the long-term navigation task. Moreover, our model works much better than SeqSLAM when the testing sequence is at a much smaller scale than the reference sequence. Our experiments demonstrate that the proposed method is efficient in locating short temporary trajectories within long-term reference ones without losing accuracy compared to SeqSLAM.
http://arxiv.org/abs/1902.10059
Visual Place Recognition (VPR) is an important component in both computer vision and robotics applications, thanks to its ability to determine whether a place has been visited and where specifically. A major challenge in VPR is to handle changes of environmental conditions including weather, season and illumination. Most VPR methods try to improve place recognition performance while ignoring the environmental factors, leading to decreased accuracy when environmental conditions change significantly, such as day versus night. To this end, we propose an end-to-end conditional visual place recognition method. Specifically, we introduce the multi-domain feature learning method (MDFL) to capture multiple attribute-descriptions for a given place, and then use a feature detaching module to separate the environmental condition-related features from those that are not. The only label required within this feature learning pipeline is the environmental condition. Evaluation of the proposed method is conducted on the multi-season \textit{NORDLAND} dataset and the multi-weather \textit{GTAV} dataset. Experimental results show that our method improves feature robustness against varying environmental conditions.
http://arxiv.org/abs/1902.10058
The critical challenge in the tracking-by-detection framework is how to avoid drift during online learning, where robust features for a variety of appearance changes are difficult to learn and a reasonable intersection-over-union (IoU) threshold that defines the true/false positives is hard to set. This paper presents the TCDCaps method to address these problems via a cascaded dense capsule architecture. To get robust features, we extend original capsules with dense-connected routing, referred to as DCaps. Benefiting from the preservation of part-whole relationships in Capsule Networks, our dense-connected capsules can capture a variety of appearance variations. In addition, to handle the issue of the IoU threshold, a cascaded DCaps model (CDCaps) is proposed to improve the quality of candidates; it consists of sequential DCaps trained with increasing IoU thresholds so as to sequentially improve the quality of candidates. Extensive experiments on 3 popular benchmarks demonstrate the robustness of the proposed TCDCaps.
http://arxiv.org/abs/1902.10054
This paper presents an augmentation of MSCOCO dataset where speech is added to image and text. Speech captions are generated using text-to-speech (TTS) synthesis resulting in 616,767 spoken captions (more than 600h) paired with images. Disfluencies and speed perturbation are added to the signal in order to sound more natural. Each speech signal (WAV) is paired with a JSON file containing exact timecode for each word/syllable/phoneme in the spoken caption. Such a corpus could be used for Language and Vision (LaVi) tasks including speech input or output instead of text. Investigating multimodal learning schemes for unsupervised speech pattern discovery is also possible with this corpus, as demonstrated by a preliminary study conducted on a subset of the corpus (10h, 10k spoken captions).
https://arxiv.org/abs/1707.08435
The scientific literature is growing exponentially, and professionals are no longer able to cope with the current amount of publications. Text mining has in the past provided methods to retrieve and extract information from text; however, most of these approaches ignored tables and figures. The research done in mining table data still does not have an integrated approach for mining that would consider all the complexities and challenges of a table. Our research examines methods for extracting numerical (number of patients, age, gender distribution) and textual (adverse reactions) information from tables in the clinical literature. We present a requirement analysis template and an integral methodology for information extraction from tables in the clinical domain that contains 7 steps: (1) table detection, (2) functional processing, (3) structural processing, (4) semantic tagging, (5) pragmatic processing, (6) cell selection and (7) syntactic processing and extraction. Our approach achieved F-measures between 82% and 92%, depending on the variable, task and its complexity.
http://arxiv.org/abs/1902.10031
The massive progress of machine learning has seen its application over a variety of domains in the past decade. But how do we develop a systematic, scalable and modular strategy to validate machine-learning systems? We present, to the best of our knowledge, the first approach that provides a systematic test framework for machine-learning systems that accept grammar-based inputs. Our OGMA approach automatically discovers erroneous behaviours in classifiers and leverages these erroneous behaviours to improve the respective models. OGMA leverages inherent robustness properties present in any well-trained machine-learning model to direct test generation, thus implementing a scalable test-generation methodology. To evaluate our OGMA approach, we have tested it on three real-world natural language processing (NLP) classifiers. We have found thousands of erroneous behaviours in these systems. We also compare OGMA with a random test generation approach and observe that OGMA is up to 489% more effective than random test generation.
https://arxiv.org/abs/1902.10027
Neural Architecture Search aims at automatically finding neural architectures that are competitive with architectures designed by human experts. While recent approaches have achieved state-of-the-art predictive performance for image recognition, they are problematic under resource constraints for two reasons: (1) the neural architectures found are solely optimized for high predictive performance, without penalizing excessive resource consumption; (2) most architecture search methods require vast computational resources. We address the first shortcoming by proposing LEMONADE, an evolutionary algorithm for multi-objective architecture search that allows approximating the entire Pareto-front of architectures under multiple objectives, such as predictive performance and number of parameters, in a single run of the method. We address the second shortcoming by proposing a Lamarckian inheritance mechanism for LEMONADE which generates children networks that are warmstarted with the predictive performance of their trained parents. This is accomplished by using (approximate) network morphism operators for generating children. The combination of these two contributions allows finding models that are on par with or even outperform both hand-crafted and automatically designed networks.
https://arxiv.org/abs/1804.09081
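The multi-objective selection behind a Pareto-front search rests on dominance checks. As a small illustration (with the objective pair assumed to be validation error and parameter count, both minimized), a front can be filtered like this:

```python
def pareto_front(models):
    """Keep the architectures not dominated on (error, n_params).

    models: list of (error, n_params) tuples, both to be minimized.
    A point is dominated if some other point is no worse in both
    objectives and strictly better in at least one.
    """
    front = []
    for i, a in enumerate(models):
        dominated = any(
            b[0] <= a[0] and b[1] <= a[1] and b != a
            for j, b in enumerate(models) if j != i
        )
        if not dominated:
            front.append(a)
    return front
```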
While depth cameras and inertial sensors have been frequently leveraged for human action recognition, these sensing modalities are impractical in many scenarios where cost or environmental constraints prohibit their use. As such, there has been recent interest in human action recognition using low-cost, readily-available RGB cameras via deep convolutional neural networks. However, many of the deep convolutional neural networks proposed for action recognition thus far have relied heavily on learning global appearance cues directly from imaging data, resulting in highly complex network architectures that are computationally expensive and difficult to train. Motivated to reduce network complexity and achieve higher performance, we introduce the concept of spatio-temporal activation reprojection (STAR). More specifically, we reproject the spatio-temporal activations generated by human pose estimation layers in space and time using a stack of 3D convolutions. Experimental results on UTD-MHAD and J-HMDB demonstrate that an end-to-end architecture based on the proposed STAR framework (which we nickname STAR-Net) is proficient in single-environment and small-scale applications. On UTD-MHAD, STAR-Net outperforms several methods using richer data modalities such as depth and inertial sensors.
http://arxiv.org/abs/1902.10024
In this paper we investigate a robust method to identify anomalies in complex scenes by evaluating collective behavior through local binary pattern (LBP) and Laplacian of Gaussian (LoG) features. Given the challenge of tracking individuals in dense crowded scenes due to multiple occlusions and clutter, we extract LBP and LoG features and use them as an approximate representation of the anomalous situation. These features match the appearance of anomalies well, and their consistency and accuracy are higher in both regular and irregular areas compared to other descriptors. We fuse both features together and exploit them as an input prior to train an MLP neural network, which then identifies anomalies in the test samples. The experimental tests are conducted on a set of benchmark video sequences commonly used for anomaly situation detection.
http://arxiv.org/abs/1902.10016
The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practitioners choosing between different modeling methodologies.
http://arxiv.org/abs/1805.03714
In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination condition, as opposed to – as is common – the policy. The termination condition is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding – arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a “critic” for the termination condition. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning and planning.
http://arxiv.org/abs/1902.09996
Bayesian optimization has become a popular method for high-throughput computing, like the design of computer experiments or hyperparameter tuning of expensive models, where sample efficiency is mandatory. In these applications, distributed and scalable architectures are a necessity. However, Bayesian optimization is mostly sequential. Even parallel variants require certain computations between samples, limiting the parallelization bandwidth. Thompson sampling has been previously applied for distributed Bayesian optimization. However, when compared with other acquisition functions in the sequential setting, Thompson sampling is known to perform suboptimally. In this paper, we present a new method for fully distributed Bayesian optimization, which can be combined with any acquisition function. Our approach considers Bayesian optimization as a partially observable Markov decision process. In this context, stochastic policies, such as the Boltzmann policy, have some interesting properties which can also be studied for Bayesian optimization. Furthermore, the Boltzmann policy trivially allows a distributed Bayesian optimization implementation with a high level of parallelism and scalability. We present results on several benchmarks and applications that show the performance of our method.
http://arxiv.org/abs/1902.09992
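A minimal sketch of the Boltzmann-policy selection described above: each worker independently samples its next query from a softmax over acquisition values, so no coordination is needed between evaluations. The temperature parameter and candidate discretization are illustrative assumptions:

```python
import numpy as np

def boltzmann_select(candidates, acquisition_values, temperature=0.1, rng=None):
    """Pick the next evaluation point with a Boltzmann (softmax) policy.

    candidates: array of candidate inputs; acquisition_values: their
    acquisition scores. As temperature -> 0 this recovers greedy
    maximization; higher temperatures spread workers across the space.
    """
    rng = rng or np.random.default_rng()
    z = acquisition_values / temperature
    p = np.exp(z - z.max())      # numerically stable softmax
    p /= p.sum()
    return candidates[rng.choice(len(candidates), p=p)]
```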
We consider a setting of hierarchical reinforcement learning, in which the reward is a sum of components. For each component we are given a policy that maximizes it, and our goal is to assemble a policy from the individual policies that maximizes the sum of the components. We provide theoretical guarantees for assembling such policies in deterministic MDPs with collectible rewards. Our approach builds on formulating this problem as a traveling salesman problem with discounted reward. We focus on local solutions, i.e., policies that only use information from the current state; thus, they are easy to implement and do not require substantial computational resources. We propose three local stochastic policies and prove that they guarantee better performance than any deterministic local policy in the worst case; experimental results suggest that they also perform better on average.
http://arxiv.org/abs/1902.10140
Frame interpolation attempts to synthesise frames given one or more consecutive video frames. In recent years, deep learning approaches, and notably convolutional neural networks, have succeeded at tackling low- and high-level computer vision problems including frame interpolation. These techniques often tackle two problems, namely algorithm efficiency and reconstruction quality. In this paper, we present a multi-scale generative adversarial network for frame interpolation (\mbox{FIGAN}). To maximise the efficiency of our network, we propose a novel multi-scale residual estimation module where the predicted flow and synthesised frame are constructed in a coarse-to-fine fashion. To improve the quality of synthesised intermediate video frames, our network is jointly supervised at different levels with a perceptual loss function that consists of an adversarial and two content losses. We evaluate the proposed approach using a collection of 60fps videos from YouTube-8m. Our results improve the state-of-the-art accuracy and provide subjective visual quality comparable to the best performing interpolation method at a 47x faster runtime.
http://arxiv.org/abs/1711.06045
Agents are systems that optimize an objective function in an environment. Together, the goal and the environment induce secondary objectives, incentives. Modeling the agent-environment interaction in graphical models called influence diagrams, we can answer two fundamental questions about an agent’s incentives directly from the graph: (1) which nodes is the agent incentivized to observe, and (2) which nodes is the agent incentivized to influence? The answers tell us which information and influence points need extra protection. For example, we may want a classifier for job applications to not use the ethnicity of the candidate, and a reinforcement learning agent not to take direct control of its reward mechanism. Different algorithms and training paradigms can lead to different influence diagrams, so our method can be used to identify algorithms with problematic incentives and help in designing algorithms with better incentives.
http://arxiv.org/abs/1902.09980
Humans and animals are believed to use a very minimal set of trajectories to perform a wide variety of tasks including walking. Our main objective in this paper is twofold: 1) to obtain an effective tool to realize these basic motion patterns for quadrupedal walking, called kinematic motion primitives (kMPs), via trajectories learned from deep reinforcement learning (D-RL), and 2) to realize a set of behaviors, namely trot, walk, gallop and bound, from these kinematic motion primitives in our custom four-legged robot, called 'Stoch'. D-RL is a data-driven approach, which has been shown to be very effective for realizing all kinds of robust locomotion behaviors, both in simulation and in experiment. On the other hand, kMPs are known to capture the underlying structure of walking and yield a set of derived behaviors. We first generate walking gaits from D-RL, which uses policy gradient based approaches. We then analyze the resulting walking using principal component analysis. We observe that the kMPs extracted from PCA follow a similar pattern irrespective of the type of gait generated. Leveraging this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs. This methodology improves the transferability of these gaits to real hardware, lowers the on-board computational overhead, and also avoids multiple training iterations by generating a set of derived behaviors from a single learned gait.
http://arxiv.org/abs/1810.03842
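A rough sketch of the PCA step described above: kinematic motion primitives are taken as the leading principal components of logged joint trajectories, and walking is reconstructed from a few of them. Array shapes and the primitive count are assumptions:

```python
import numpy as np

def extract_kmps(joint_trajectories, n_primitives=4):
    """Extract kinematic motion primitives (kMPs) from walking data.

    joint_trajectories: (T, J) joint angles over time, e.g. logged from
    a D-RL policy rollout. The leading principal components serve as
    the kMPs; their time-varying scores act as activations.
    """
    X = joint_trajectories - joint_trajectories.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt = components
    kmps = Vt[:n_primitives]                          # (n_primitives, J)
    scores = U[:, :n_primitives] * S[:n_primitives]   # (T, n_primitives)
    return kmps, scores

def reconstruct(kmps, scores):
    """Rebuild joint trajectories from a few primitives (mean omitted)."""
    return scores @ kmps
```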
Recent work has demonstrated robust mechanisms by which attacks can be orchestrated on machine learning models. In contrast to adversarial examples, backdoor or trojan attacks embed surgically modified samples with targeted labels in the model training process to cause the targeted model to learn to misclassify chosen samples in the presence of specific triggers, while keeping the model performance stable across other nominal samples. However, current published research on trojan attacks mainly focuses on classification problems, which ignores sequential dependency between inputs. In this paper, we propose methods to discreetly introduce and exploit novel backdoor attacks within a sequential decision-making agent, such as a reinforcement learning agent, by training multiple benign and malicious policies within a single long short-term memory (LSTM) network. We demonstrate the effectiveness as well as the damaging impact of such attacks through initial outcomes generated from our approach, employed on grid-world environments. We also provide evidence as well as intuition on how the trojan trigger and malicious policy are activated. Challenges with network size and unintentional triggers are identified, and analogies with adversarial examples are also discussed. In the end, we propose potential approaches to defend against such attacks or to serve as early detection for them. Results of our work can also be extended to many applications of LSTM and recurrent networks.
http://arxiv.org/abs/1902.09972
The goal of our work is to discover dominant objects without using any annotations. We focus on performing unsupervised object discovery and localization in a strictly general setting where only a single image is given. This is far more challenging than typical co-localization or weakly-supervised localization tasks. To tackle this problem, we propose a simple but effective pattern mining-based method, called Object Mining (OM), which exploits the advantages of data mining and the feature representation of pre-trained convolutional neural networks (CNNs). Specifically, Object Mining first converts the feature maps from a pre-trained CNN model into a set of transactions, and then frequent patterns are discovered from the transaction database through pattern mining techniques. We observe that those discovered patterns, i.e., co-occurrence highlighted regions, typically hold appearance and spatial consistency. Motivated by this observation, we can easily discover and localize possible objects by merging relevant meaningful patterns in an unsupervised manner. Extensive experiments on a variety of benchmarks demonstrate that Object Mining achieves competitive performance compared with the state-of-the-art methods.
http://arxiv.org/abs/1902.09968
Deep learning methods typically require vast amounts of training data to reach their full potential. While some publicly available datasets exist, domain-specific data always needs to be collected and manually labeled, an expensive, time-consuming and error-prone process. Training with synthetic data is therefore very lucrative, as dataset creation and labeling come for free. We propose a novel method for creating purely synthetic training data for object detection. We leverage a large dataset of 3D background models and densely render them using full domain randomization. This yields background images with realistic shapes and texture, on top of which we render the objects of interest. During training, the data generation process follows a curriculum strategy guaranteeing that all foreground models are presented to the network equally under all possible poses and conditions with increasing complexity. As a result, we entirely control the underlying statistics and we create optimal training samples at every stage of training. Using a set of 64 retail objects, we demonstrate that our simple approach enables the training of detectors that outperform models trained with real data on a challenging evaluation dataset.
http://arxiv.org/abs/1902.09967
Voluntary behavior of humans appears to be composed of small, elementary building blocks or behavioral primitives. While this modular organization seems crucial for the learning of complex motor skills and the flexible adaptation of behavior to new circumstances, the problem of learning meaningful, compositional abstractions from sensorimotor experiences remains an open challenge. Here, we introduce a computational learning architecture, termed surprise-based behavioral modularization into event-predictive structures (SUBMODES), that explores behavior and identifies the underlying behavioral units completely from scratch. The SUBMODES architecture bootstraps sensorimotor exploration using a self-organizing neural controller. While exploring the behavioral capabilities of its own body, the system learns modular structures that predict the sensorimotor dynamics and generate the associated behavior. In line with recent theories of event perception, the system uses unexpected prediction error signals, i.e., surprise, to detect transitions between successive behavioral primitives. We show that, when applied to two robotic systems with completely different body kinematics, the system manages to learn a variety of complex and realistic behavioral primitives. Moreover, after initial self-exploration the system can use its learned predictive models progressively more effectively for invoking model predictive planning and goal-directed control in different tasks and environments.
http://arxiv.org/abs/1902.09948
Fine-grained image classification remains challenging due to the large intra-class variance and small inter-class variance. Since the subtle visual differences are only in local regions of discriminative parts among subcategories, part localization is a key issue for fine-grained image classification. Most existing approaches localize the object or parts in an image with object or part annotations, which are expensive and labor-intensive. To tackle this issue, we propose a fully unsupervised part mining (UPM) approach to localize the discriminative parts without even image-level annotations, which largely improves fine-grained classification performance. We first utilize pattern mining techniques to discover frequent patterns, i.e., co-occurrence highlighted regions, in the feature maps extracted from a pre-trained convolutional neural network (CNN) model. Inspired by the fact that these relevant meaningful patterns typically hold appearance and spatial consistency, we then cluster the mined regions to obtain the cluster centers, and the discriminative parts surrounding the cluster centers are generated. Importantly, no annotations or sophisticated training procedures are used in our proposed part localization approach. Finally, a multi-stream classification network is built for aggregating the original, object-level and part-level features simultaneously. Compared with other state-of-the-art approaches, our UPM approach achieves competitive performance.
http://arxiv.org/abs/1902.09941
This paper addresses the topic of semantic world modeling by conjoining probabilistic reasoning and object anchoring. The proposed approach uses a so-called bottom-up object anchoring method that relies on rich, continuous perceptual sensor data. A novel anchoring matching function learns to maintain object entities in space and time and is validated using a large set of humanly annotated ground-truth data of real-world objects. For more complex scenarios, a high-level probabilistic object tracker has been integrated with the anchoring framework and handles the tracking of occluded objects via reasoning about the state of unobserved objects. We demonstrate the performance of our integrated approach through scenarios such as the shell game scenario, where we illustrate how anchored objects are retained by preserving relations through probabilistic reasoning.
http://arxiv.org/abs/1902.09937
Tabular data is the most commonly used form of data in industry. Gradient Boosting Trees, Support Vector Machine, Random Forest, and Logistic Regression are typically used for classification tasks on tabular data. DNN models using categorical embeddings are also applied in this task, but all attempts thus far have used one-dimensional embeddings. The recent work on the Super Characters method using two-dimensional word embeddings achieved state-of-the-art results in text classification tasks, showcasing the promise of this new approach. In this paper, we propose the SuperTML method, which borrows the idea of the Super Characters method and two-dimensional embeddings to address the problem of classification on tabular data. For each input of tabular data, the features are first projected into two-dimensional embeddings like an image, and then this image is fed into fine-tuned two-dimensional CNN models for classification. Experimental results show that the proposed SuperTML method achieves state-of-the-art results on both large and small datasets.
http://arxiv.org/abs/1903.06246
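The two-dimensional embedding described above can be pictured with a small PIL sketch: each feature value is drawn as text into its own region of a blank image before being fed to a CNN. The grid layout, image size and value formatting here are hypothetical choices, not the paper's exact design:

```python
from PIL import Image, ImageDraw

def tabular_to_image(row, size=224, font=None):
    """Render one row of tabular data as a 2D 'image embedding'.

    row: iterable of feature values (numeric or categorical). Each value
    is printed into its own cell of a simple two-column grid; the
    resulting image is what a fine-tuned 2D CNN would classify.
    """
    img = Image.new("L", (size, size), color=0)
    draw = ImageDraw.Draw(img)
    cols = 2
    cell = size // cols
    for i, value in enumerate(row):
        x = (i % cols) * cell + 8
        y = (i // cols) * cell + 8
        draw.text((x, y), str(value), fill=255, font=font)
    return img
```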
Obtaining smart surveillance requires a sensing system that can capture accurate and detailed information about the human walking style. Radar micro-Doppler ($\boldsymbol{\mu}$-D) analysis has proved to be a reliable metric for studying human locomotion. Thus, $\boldsymbol{\mu}$-D signatures can be used to identify humans based on their walking styles. Additionally, the signatures contain information about the radar cross section (RCS) of the moving subject. This paper investigates the effect of human body characteristics on human identification based on $\boldsymbol{\mu}$-D signatures. In our proposed experimental setup, a treadmill is used to collect $\boldsymbol{\mu}$-D signatures of 22 subjects with different genders and body characteristics. Convolutional autoencoders (CAE) are then used to extract the latent space representation from the $\boldsymbol{\mu}$-D signatures. It is then interpreted in two dimensions using t-distributed stochastic neighbor embedding (t-SNE). Our study shows that the body mass index (BMI) correlates with the $\boldsymbol{\mu}$-D signature of the walking subject. A 50-layer deep residual network is then trained to identify the walking subject based on the $\boldsymbol{\mu}$-D signature. We achieve an accuracy of 98% on the test set with a high signal-to-noise ratio (SNR) and 84% across different SNR levels.
http://arxiv.org/abs/1811.07173
Effective spatiotemporal feature representation is crucial to the video-based action recognition task. Focusing on discriminative spatiotemporal feature learning, we propose the Information Fused Temporal Transformation Network (IF-TTN) for action recognition, built on top of the popular Temporal Segment Network (TSN) framework. In the network, an Information Fusion Module (IFM) is designed to fuse the appearance and motion features at multiple ConvNet levels for each video snippet, forming a short-term video descriptor. With the fused features as inputs, Temporal Transformation Networks (TTN) are employed to model the middle-term temporal transformation between neighboring snippets in sequential order. As TSN itself depicts long-term temporal structure by segmental consensus, the proposed network comprehensively considers temporal features at multiple granularities. Our IF-TTN achieves state-of-the-art results on the two most popular action recognition datasets: UCF101 and HMDB51. Empirical investigation reveals that our architecture is robust to the quality of the input motion maps. Replacing optical flow with the motion vectors from a compressed video stream, the performance remains comparable to the flow-based methods while testing is 10x faster.
http://arxiv.org/abs/1902.09928
While learning-based depth estimation from images/videos has achieved substantial progress, there still exist intrinsic limitations. Supervised methods are limited by the small amount of ground truth or labeled data, and unsupervised methods for monocular videos are mostly based on the static scene assumption, not performing well in real-world scenarios with the presence of dynamic objects. In this paper, we propose a new learning-based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision. The core contribution lies in RDN for the proper handling of rigid and non-rigid motions of various objects, such as rigidly moving cars and deformable humans. In particular, a deformation-based motion representation is proposed to model individual object motion on 2D images. This representation enables our method to be applicable to diverse unconstrained monocular videos. Our method not only achieves state-of-the-art results on the standard KITTI and Cityscapes benchmarks, but also shows promising results on a crowded pedestrian tracking dataset, which demonstrates the effectiveness of the deformation-based motion representation.
http://arxiv.org/abs/1902.09907
Alzheimer’s Disease (AD) is one of the neurodegenerative diseases of greatest concern. In the last decade, studies on AD diagnosis have attached great significance to artificial intelligence (AI)-based diagnostic algorithms. Among the diverse modalities of imaging data, T1-weighted MRI and 18F-FDG PET are widely researched for this task. In this paper, we propose a novel convolutional neural network (CNN) to fuse the multi-modality information, including T1-MRI and FDG-PET images around the hippocampal area, for the diagnosis of AD. Different from traditional machine learning algorithms, this method does not require manually extracted features, and utilizes state-of-the-art 3D image-processing CNNs to learn features for the diagnosis and prognosis of AD. To validate the performance of the proposed network, we trained the classifier with paired T1-MRI and FDG-PET images from the ADNI datasets, including 731 Normal (NL) subjects, 647 AD subjects, 441 stable MCI (sMCI) subjects and 326 progressive MCI (pMCI) subjects. We obtained maximum accuracies of 90.10% for the NL/AD task, 87.46% for the NL/pMCI task, and 76.90% for the sMCI/pMCI task. The proposed framework yields results comparable to state-of-the-art approaches. Moreover, the experimental results demonstrate that (1) segmentation is not a prerequisite when using a CNN, and (2) the hippocampal area provides enough information for AD diagnosis. Keywords: Alzheimer’s Disease, Multi-modality, Image Classification, CNN, Deep Learning, Hippocampal
http://arxiv.org/abs/1902.09904
Tissue classification is one of the significant tasks in the field of biomedical image analysis. Magnetic Resonance Imaging (MRI) is of great importance in tissue classification, especially brain tissue classification, which supports applications such as surgical planning, therapy monitoring, clinical drug trials, image registration, stereotactic neurosurgery and radiotherapy. The task of this paper is to implement different unsupervised classification algorithms in ITK and perform tissue classification (white matter, gray matter, cerebrospinal fluid (CSF) and background of the human brain). For this purpose, 5 grayscale head MRI scans are provided. In order to classify brain tissues, three algorithms are used: Otsu thresholding, Bayesian classification and Bayesian classification with Gaussian smoothing. The obtained classification results are analyzed in the results and discussion section.
http://arxiv.org/abs/1902.11131
Monitoring underwater habitats is a vital part of observing the condition of the environment. The detection and mapping of underwater vegetation, especially seagrass, has drawn the attention of the research community as early as the nineteen eighties. Initially, this monitoring relied on in situ observation by experts. Later, advances in remote-sensing technology, satellite-monitoring techniques and digital photo- and video-based techniques opened a window to quicker, cheaper and potentially more accurate seagrass-monitoring methods. So far, for seagrass detection and mapping, digital images from airborne cameras, spectral images from satellites, acoustic image data using underwater sonar technology, and digital underwater photo and video images have been used to map seagrass meadows or monitor their condition. In this article, we review the recent approaches to seagrass detection and mapping to understand the gaps in present approaches and determine further research scope for monitoring ocean health more easily. We identify four classes of approach to seagrass mapping and assessment: still image-, video data-, acoustic image-, and spectral image data-based techniques. We critically analyse the surveyed approaches and identify research gaps, including the need for quick, cheap and effective imaging techniques robust to depth, turbidity, location and weather conditions; fully automated seagrass detectors that can work in real time; accurate techniques for estimating seagrass density; and the availability of high-computation facilities for processing large-scale data. To address these gaps, future research should focus on developing cheaper image and video data collection techniques, deep learning based automatic annotation and classification, and real-time percentage-cover calculation.
http://arxiv.org/abs/1902.11114
In this paper, we present a novel strategy to design a disentangled 3D face shape representation. Specifically, a given 3D face shape is decomposed into an identity part and an expression part, which are both encoded and decoded in a nonlinear way. To solve this problem, we propose an attribute decomposition framework for 3D face meshes. To better represent face shapes, which are usually related to each other by nonlinear deformations, the face shapes are represented by a vertex-based deformation representation rather than Euclidean coordinates. The experimental results demonstrate that our method performs better than existing methods at decomposing the identity and expression parts. Moreover, more natural expression transfer results can be achieved with our method than with existing methods.
http://arxiv.org/abs/1902.09887
We examine two different techniques for parameter averaging in GAN training. Moving Average (MA) computes the time-average of parameters, whereas Exponential Moving Average (EMA) computes an exponentially discounted sum. Whilst MA is known to lead to convergence in bilinear settings, we provide the – to our knowledge – first theoretical arguments in support of EMA. We show that EMA converges to limit cycles around the equilibrium with vanishing amplitude as the discount parameter approaches one for simple bilinear games, and also enhances the stability of general GAN training. We establish experimentally that both techniques are strikingly effective in the non-convex-concave GAN setting as well. Both improve inception and FID scores on different architectures and for different GAN objectives. We provide comprehensive experimental results across a range of datasets – mixture of Gaussians, CIFAR-10, STL-10, CelebA and ImageNet – to demonstrate their effectiveness. We achieve state-of-the-art results on CIFAR-10 and produce clean CelebA face images.\footnote{~The code is available at \url{this https URL}}
https://arxiv.org/abs/1806.04498
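The EMA scheme analyzed above is simple to state: after each generator update, an exponentially discounted copy of the parameters is maintained and used at evaluation time. A plain sketch over dicts of arrays (the parameter layout and beta value are illustrative):

```python
def ema_update(ema_params, params, beta=0.999):
    """Exponential moving average of generator parameters.

    theta_ema <- beta * theta_ema + (1 - beta) * theta, applied once per
    training step; evaluation and sample generation use theta_ema rather
    than the raw (oscillating) iterates theta.
    """
    for name, value in params.items():
        ema_params[name] = beta * ema_params[name] + (1.0 - beta) * value
    return ema_params
```

Per the analysis above, as beta approaches one the averaged iterates cycle ever closer to the equilibrium in simple bilinear games.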
Macro-management is an important problem in StarCraft, which has been studied for a long time. Various datasets together with assorted methods have been proposed in the last few years. But these datasets have some defects for boosting academic and industrial research: 1) There are neither standard preprocessing, parsing and feature extraction procedures nor predefined training, validation and test sets in some datasets. 2) Some datasets are only specified for certain tasks in macro-management. 3) Some datasets are either too small or do not have enough labeled data for modern machine learning algorithms such as deep neural networks. So most previous methods are trained with various features and evaluated on different test sets from the same or different datasets, making them difficult to compare directly. To boost the research of macro-management in StarCraft, we release a new dataset, MSC, based on the platform SC2LE. MSC consists of well-designed feature vectors, pre-defined high-level actions and the final result of each match. We also split MSC into training, validation and test sets for the convenience of evaluation and comparison. Besides the dataset, we propose a baseline model and present initial baseline results for global state evaluation and build order prediction, which are two of the key tasks in macro-management. Various downstream tasks and analyses of the dataset are also described for the sake of research on macro-management in StarCraft II. Homepage: https://github.com/wuhuikai/MSC.
http://arxiv.org/abs/1710.03131
Optimization-inspired networks can bridge convex optimization and neural networks in compressive sensing (CS) reconstruction of natural images; an example is ISTA-Net+, which maps the iterative shrinkage-thresholding algorithm (ISTA) into a network. However, the measurement matrix and input initialization are still hand-crafted, and multi-channel feature maps contain information at different frequencies that is treated equally across channels, hindering the CS reconstruction ability of optimization-inspired networks. To solve these problems, we propose MC-ISTA-Net.
http://arxiv.org/abs/1902.09878
We examine a number of methods to compute a dense vector embedding for a document in a corpus, given a set of word vectors such as those from word2vec or GloVe. We describe two methods that can improve upon a simple weighted sum, and that are optimal in the sense that they maximize a particular weighted cosine similarity measure. We consider several weighting functions, including inverse document frequency (idf), smooth inverse frequency (SIF), and the sub-sampling function used in word2vec. We find that idf works best for our applications. We also use the common component removal proposed by Arora et al. as a post-process and find it is helpful in most cases. We compare these embedding variants to the doc2vec embedding on a new evaluation task using TripAdvisor reviews, and also on the CQADupStack benchmark from the literature.
http://arxiv.org/abs/1902.09875
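A compact sketch of the weighted-sum embedding with idf weighting and the common-component-removal post-process discussed above; the data layout is assumed, and the weighting function could equally be SIF or the word2vec sub-sampling weight:

```python
import numpy as np

def doc_embedding(tokens, word_vecs, idf, corpus_embeddings=None):
    """idf-weighted average of word vectors for one document.

    word_vecs: dict token -> (D,) vector (e.g. from word2vec or GloVe);
    idf: dict token -> inverse document frequency weight;
    corpus_embeddings: optional (N, D) matrix of document embeddings used
    to estimate the common component to remove (Arora et al.).
    """
    vecs = [idf[t] * word_vecs[t] for t in tokens if t in word_vecs and t in idf]
    if not vecs:
        raise ValueError("no tokens with known vectors and idf weights")
    v = np.mean(vecs, axis=0)
    if corpus_embeddings is not None:
        # project out the corpus's first principal direction
        u = np.linalg.svd(corpus_embeddings, full_matrices=False)[2][0]
        v = v - (v @ u) * u
    return v
```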