Welcome to AMDS123 Blog!

Recent Papers about CV, CL and SD

A Sliding Mode Force and Position Controller Synthesis for Series Elastic Actuators

2019-03-13

Emre Sariyildiz, Rahim Mutlu, Haoyong Yu

arXiv_RO

Abstract

This paper deals with the robust force and position control problems of Series Elastic Actuators. It is shown that a Series Elastic Actuator’s force control problem can be described by a second-order dynamic model which suffers from only matched disturbances. However, the position control dynamics of a Series Elastic Actuator are fourth-order and include matched and mismatched disturbances. In other words, a Series Elastic Actuator’s position control is more complicated than its force control, particularly when disturbances are considered. A novel robust motion controller is proposed for Series Elastic Actuators by using a Disturbance Observer and Sliding Mode Control. When the proposed robust motion controller is implemented, a Series Elastic Actuator can precisely track desired trajectories and safely make contact with an unknown and dynamic environment. The proposed motion controller does not require precise dynamic models of the actuator and environment. Therefore, it can be applied to many different advanced robotic systems such as compliant humanoids and exoskeletons. The validity of the motion controller is experimentally verified.

URL

http://arxiv.org/abs/1903.05337

PDF

http://arxiv.org/pdf/1903.05337
Efficient Search-Based Weighted Model Integration

2019-03-13

Zhe Zeng, Guy Van den Broeck

arXiv_AI

arXiv_AI Sparse Inference
Abstract

Weighted model integration (WMI) extends weighted model counting (WMC) to the integration of functions over mixed discrete-continuous domains. It has shown tremendous promise for solving inference problems in graphical models and probabilistic programming. Yet, state-of-the-art tools for WMI are limited in terms of performance and ignore the independence structure that is crucial to improving efficiency. To address this limitation, we propose an efficient model integration algorithm for theories with tree primal graphs. We exploit the sparse graph structure by using search to perform integration. Our algorithm greatly improves the computational efficiency on such problems and exploits context-specific independence between variables. Experimental results show dramatic speedups compared to existing WMI solvers on problems with tree-shaped dependencies.

URL

http://arxiv.org/abs/1903.05334

PDF

http://arxiv.org/pdf/1903.05334
Multi-Geometry Spatial Acoustic Modeling for Distant Speech Recognition

2019-03-13

Kenichi Kumatani, Wu Minhua, Shiva Sundaram, Nikko Strom, Bjorn Hoffmeister

arXiv_SD

arXiv_SD Speech_Recognition Optimization RNN Recognition
Abstract

The use of spatial information with multiple microphones can improve far-field automatic speech recognition (ASR) accuracy. However, conventional microphone array techniques degrade speech enhancement performance when there is an array geometry mismatch between design and test conditions. Moreover, such speech enhancement techniques do not always yield ASR accuracy improvement due to the difference between speech enhancement and ASR optimization objectives. In this work, we propose to unify an acoustic model framework by optimizing spatial filtering and long short-term memory (LSTM) layers from multi-channel (MC) input. Our acoustic model subsumes beamformers with multiple types of array geometry. In contrast to deep clustering methods that treat a neural network as a black box tool, the network encoding the spatial filters can process streaming audio data in real time without the accumulation of target signal statistics. We demonstrate the effectiveness of such MC neural networks through ASR experiments on real-world far-field data. We show that our two-channel acoustic model can on average reduce word error rates (WERs) by 13.4% and 12.7% compared to a single-channel ASR system with the log-mel filter bank energy (LFBE) feature under the matched and mismatched microphone placement conditions, respectively. Our results also show that our two-channel network achieves a relative WER reduction of over 7.0% compared to conventional beamforming with seven microphones overall.

URL

http://arxiv.org/abs/1903.06539

PDF

http://arxiv.org/pdf/1903.06539
Silicon Nitride Stress Liner Impacts on the Electrical Characteristics of AlGaN/GaN HEMTs

2019-03-13

Wei-Chih Cheng, Tao Fang, Siqi Lei, Yunlong Zhao, Minghao He, Mansun Chan, Guangrui (Maggie)Xia, Feng Zhao, Hongyu Yu

arXiv_CV

arXiv_CV GAN Face
Abstract

Due to the piezoelectric nature of GaN, the 2DEG in AlGaN/GaN HEMT could be engineered by strain. In this work, SiNx deposited using dual-frequency PECVD was used as a stressor. The output performance of the devices was dominated by the surface passivation instead of the stress effect. However, the threshold voltage was increased by the induced stress, supporting strain engineering as an effective approach to pursue the normally-off operation of AlGaN/GaN HEMTs.

URL

https://arxiv.org/abs/1903.05290

PDF

https://arxiv.org/pdf/1903.05290
All You Need is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification

2019-03-13

Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu

arXiv_CV

arXiv_CV Sparse CNN Image_Classification Optimization Inference Classification
Abstract

The shift operation is an efficient alternative to depthwise separable convolution. However, it is still bottlenecked by the way it is implemented, namely memory movement. To push this direction forward, a new basic component named Sparse Shift Layer (SSL) is introduced in this paper to construct efficient convolutional neural networks. In this family of architectures, the basic block is composed only of 1x1 convolutional layers with only a few shift operations applied to the intermediate feature maps. To make this idea feasible, we introduce a shift operation penalty during optimization and further propose a quantization-aware shift learning method to make the learned displacement more inference-friendly. Extensive ablation studies indicate that only a few shift operations are sufficient to provide spatial information communication. Furthermore, to maximize the role of SSL, we redesign an improved network architecture to Fully Exploit the limited capacity of neural Network (FE-Net). Equipped with SSL, this network can achieve 75.0% top-1 accuracy on ImageNet with only 563M M-Adds. It surpasses counterparts constructed with depthwise separable convolution and networks searched by NAS in terms of accuracy and practical speed.

URL

http://arxiv.org/abs/1903.05285

PDF

http://arxiv.org/pdf/1903.05285
Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy

2019-03-13

Yunhao Tang, Mingzhang Yin, Mingyuan Zhou

arXiv_AI

arXiv_AI Optimization
Abstract

Due to the high variance of policy gradients, on-policy optimization algorithms are plagued by low sample efficiency. In this work, we propose the Augment-Reinforce-Merge (ARM) policy gradient estimator as an unbiased low-variance alternative to previous baseline estimators on tasks with a binary action space, inspired by the recent ARM gradient estimator for discrete random variable models. We show that the ARM policy gradient estimator achieves variance reduction with theoretical guarantees, and leads to significantly more stable and faster convergence of policies parameterized by neural networks.
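To make the idea concrete, here is a minimal sketch of the underlying ARM estimator for a single Bernoulli variable, not the paper's full policy-gradient algorithm. It assumes z ~ Bernoulli(sigmoid(phi)) and estimates the gradient of E[f(z)] with respect to phi from uniform samples. The names `arm_gradient`, `exact_gradient` and `reward` are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def arm_gradient(f, phi: float, num_samples: int, rng: random.Random) -> float:
    """Monte Carlo ARM estimate of d/dphi E[f(z)], z ~ Bernoulli(sigmoid(phi)).

    Each uniform draw u yields two coupled binary samples; the estimator
    averages (f(z_swapped) - f(z_plain)) * (u - 1/2).
    """
    total = 0.0
    for _ in range(num_samples):
        u = rng.random()
        z_swapped = 1 if u > sigmoid(-phi) else 0
        z_plain = 1 if u < sigmoid(phi) else 0
        total += (f(z_swapped) - f(z_plain)) * (u - 0.5)
    return total / num_samples


def exact_gradient(f, phi: float) -> float:
    """Closed-form gradient for one Bernoulli variable, used as a check."""
    s = sigmoid(phi)
    return s * (1.0 - s) * (f(1) - f(0))


def reward(z: int) -> float:
    # Hypothetical reward function; any function of a binary action works.
    return (z - 0.3) ** 2


estimate = arm_gradient(reward, 0.5, 100_000, random.Random(0))
reference = exact_gradient(reward, 0.5)
print(f"ARM estimate: {estimate:.4f}, exact gradient: {reference:.4f}")
```

With enough samples the estimate should approach the closed-form value, which is what unbiasedness means here. In a policy-gradient setting, `reward` would presumably be replaced by a return estimate for the chosen binary action.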

URL

http://arxiv.org/abs/1903.05284

PDF

http://arxiv.org/pdf/1903.05284
End-To-End Speech Recognition Using A High Rank LSTM-CTC Based Model

2019-03-12

Yangyang Shi, Mei-Yuh Hwang, Xin Lei

arXiv_CL

arXiv_CL Speech_Recognition RNN Classification Recognition
Abstract

Long Short Term Memory Connectionist Temporal Classification (LSTM-CTC) based end-to-end models are widely used in speech recognition due to their simplicity in training and efficiency in decoding. In conventional LSTM-CTC based models, a bottleneck projection matrix maps the hidden feature vectors obtained from the LSTM to the softmax output layer. In this paper, we propose to use a high rank projection layer to replace the projection matrix. The output from the high rank projection layer is a weighted combination of vectors that are projected from the hidden feature vectors via different projection matrices and a non-linear activation function. The high rank projection layer is able to improve the expressiveness of LSTM-CTC models. The experimental results show that on the Wall Street Journal (WSJ) corpus and the LibriSpeech data set, the proposed method achieves 4%-6% relative word error rate (WER) reduction over the baseline CTC system. The proposed models outperform other published CTC based end-to-end (E2E) models under the condition that no external data or data augmentation is applied. Code has been made available at https://github.com/mobvoi/lstm_ctc.

URL

http://arxiv.org/abs/1903.05261

PDF

http://arxiv.org/pdf/1903.05261
Syntax-aware Neural Semantic Role Labeling with Supertags

2019-03-12

Jungo Kasai, Dan Friedman, Robert Frank, Dragomir Radev, Owen Rambow

arXiv_CL

arXiv_CL RNN
Abstract

We introduce a new syntax-aware model for dependency-based semantic role labeling that outperforms syntax-agnostic models for English and Spanish. We use a BiLSTM to tag the text with supertags extracted from dependency parses, and we feed these supertags, along with words and parts of speech, into a deep highway BiLSTM for semantic role labeling. Our model combines the strengths of earlier models that performed SRL on the basis of a full dependency parse with more recent models that use no syntactic information at all. Our local and non-ensemble model achieves state-of-the-art performance on the CoNLL 09 English and Spanish datasets. SRL models benefit from syntactic information, and we show that supertagging is a simple, powerful, and robust way to incorporate syntax into a neural SRL system.

URL

http://arxiv.org/abs/1903.05260

PDF

http://arxiv.org/pdf/1903.05260
Towards Unsupervised Cancer Subtyping: Predicting Prognosis Using A Histologic Visual Dictionary

2019-03-12

Hassan Muhammad, Carlie S. Sigel, Gabriele Campanella, Thomas Boerner, Linda M. Pak, Stefan Büttner, Jan N.M. IJzermans, Bas Groot Koerkamp, Michael Doukas, William R. Jarnagin, Amber Simpson, Thomas J. Fuchs

arXiv_CV

arXiv_CV CNN
Abstract

Unlike common cancers, such as those of the prostate and breast, tumor grading in rare cancers is difficult and largely undefined because of small sample sizes, the sheer volume of time needed to undertake such a task, and the inherent difficulty of extracting human-observed patterns. One of the most challenging examples is intrahepatic cholangiocarcinoma (ICC), a primary liver cancer arising from the biliary system, for which there is well-recognized tumor heterogeneity and no grading paradigm or prognostic biomarkers. In this paper, we propose a new unsupervised deep convolutional autoencoder-based clustering model that groups together cellular and structural morphologies of tumor in 246 ICC digitized whole slides, based on visual similarity. From this visual dictionary of histologic patterns, we use the clusters as covariates to train Cox-proportional hazard survival models. In univariate analysis, three clusters were significantly associated with recurrence-free survival. Combinations of these clusters were significant in multivariate analysis. In a multivariate analysis of all clusters, five showed significance to recurrence-free survival; however, the overall model was not measured to be significant. Finally, a pathologist assigned clinical terminology to the significant clusters in the visual dictionary and found evidence supporting the hypothesis that collagen-enriched fibrosis plays a role in disease severity. These results offer insight into the future of cancer subtyping and show that computational pathology can contribute to disease prognostication, especially in rare cancers.

URL

http://arxiv.org/abs/1903.05257

PDF

http://arxiv.org/pdf/1903.05257
The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

2019-03-12

Andrew Hundt, Varun Jain, Chia-Hung Lin, Chris Paxton, Gregory D. Hager

arXiv_AI

arXiv_AI Prediction
Abstract

A robot can now grasp an object more effectively than ever before, but once it has the object what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation. We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural network based models using a novel multiple-input HyperTree MetaModel, and find a final model which makes reasonable 3D pose predictions for grasping and stacking on our dataset. The CoSTAR BSD, code, and instructions are available at https://sites.google.com/site/costardataset.

URL

http://arxiv.org/abs/1810.11714

PDF

http://arxiv.org/pdf/1810.11714
Learning Feature Aggregation in Temporal Domain for Re-Identification

2019-03-12

Jakub Špaňhel, Jakub Sochor, Roman Juránek, Petr Dobeš, Vojtěch Bartl, Adam Herout

arXiv_CV

arXiv_CV Re-identification Attention Person_Re-identification
Abstract

Person re-identification is a standard and established problem in the computer vision community. In recent years, vehicle re-identification has also been getting more attention. In this paper, we focus on both these tasks and propose a method for aggregation of features in the temporal domain, as it is common to have multiple observations of the same object. The aggregation is based on weighting different elements of the feature vectors by different weights and it is trained in an end-to-end manner by a Siamese network. The experimental results show that our method outperforms other existing methods for feature aggregation in the temporal domain on both vehicle and person re-identification tasks. Furthermore, to push research in vehicle re-identification further, we introduce a novel dataset CarsReId74k. The dataset is not limited to frontal/rear viewpoints. It contains 17,681 unique vehicles, 73,976 observed tracks, and 277,236 positive pairs. The dataset was captured by 66 cameras from various angles.

URL

http://arxiv.org/abs/1903.05244

PDF

http://arxiv.org/pdf/1903.05244
A Visually Plausible Grasping System for Object Manipulation and Interaction in Virtual Reality Environments

2019-03-12

Sergiu Oprea, Pablo Martinez-Gonzalez, Alberto Garcia-Garcia, John Alejandro Castro-Vargas, Sergio Orts-Escolano, Jose Garcia-Rodriguez

arXiv_CV

arXiv_CV Face Quantitative
Abstract

Interaction in virtual reality (VR) environments is essential to achieve a pleasant and immersive experience. Most existing VR applications lack robust object grasping and manipulation, which are the cornerstones of interactive systems. Therefore, we propose a realistic, flexible and robust grasping system that enables rich and real-time interactions in virtual environments. It is visually realistic because it is completely user-controlled, flexible because it can be used for different hand configurations, and robust because it allows the manipulation of objects regardless of their geometry, i.e., the hand is automatically fitted to the object shape. In order to validate our proposal, an exhaustive qualitative and quantitative performance analysis has been carried out. On the one hand, qualitative evaluation was used to assess abstract aspects such as hand movement realism, interaction realism and motor control. On the other hand, for the quantitative evaluation a novel error metric has been proposed to visually analyze the performed grips. This metric is based on the computation of the distance from the finger phalanges to the nearest contact point on the object surface. These contact points can be used for different application purposes, mainly in the field of robotics. In conclusion, the system evaluation reports similar performance between users with previous experience in virtual reality applications and inexperienced users, indicating a steep learning curve.

URL

http://arxiv.org/abs/1903.05238

PDF

http://arxiv.org/pdf/1903.05238
Bootstrapping Method for Developing Part-of-Speech Tagged Corpus in Low Resource Languages Tagset - A Focus on an African Igbo

2019-03-12

Onyenwe Ikechukwu E, Onyedinma Ebele G, Aniegwu Godwin E, Ezeani Ignatius M

arXiv_CL

arXiv_CL Face Speech_Recognition Recognition
Abstract

Most languages, especially in Africa, have few or no established part-of-speech (POS) tagged corpora. However, a POS tagged corpus is essential for natural language processing (NLP) to support advanced research such as machine translation, speech recognition, etc. Even in cases where there is no POS tagged corpus, there are some languages for which parallel texts are available online. The task of POS tagging a new language corpus with a new tagset usually faces a bootstrapping problem at the initial stages of the annotation process. The unavailability of automatic taggers to help the human annotator makes it appear infeasible to quickly produce adequate amounts of POS tagged corpus for advanced NLP research and for training the taggers. In this paper, we demonstrate the efficacy of a POS annotation method that employs two automatic approaches to assist POS tagged corpus creation for a language new to NLP. The two approaches are cross-lingual and monolingual POS tag projection. We used the cross-lingual approach to automatically create an initial ‘errorful’ tagged corpus for a target language via word alignment. The resources for creating this are derived from a source language rich in NLP resources. A monolingual method is applied to clean the induced noise via an alignment process and to transform the source language tags into the target language tags. We used English and Igbo as our case study. This is possible because parallel texts exist between English and Igbo, and the source language English has available NLP resources. The results of the experiment show a steady improvement in accuracy and rate of tag transformation, with scores ranging from 6.13% to 83.79% and from 8.67% to 98.37% respectively. The rate of tag transformation measures the rate at which source language tags are translated to target language tags.

URL

http://arxiv.org/abs/1903.05225

PDF

http://arxiv.org/pdf/1903.05225
'Hang in There': Lexical and Visual Analysis to Identify Posts Warranting Empathetic Responses

2019-03-12

Mimansa Jaiswal, Sairam Tabibu, Erik Cambria

arXiv_CL

arXiv_CL Sentiment Caption
Abstract

In the past few years, social media has risen as a platform where people express and share personal incidents of abuse, violence and mental health issues. There is a need to pinpoint such posts and learn the kind of response expected. For this purpose, we analyze the sentiment that a personal story elicits in different posts on different social media sites, on the topics of abuse or mental health. In this paper, we propose a method supported by hand-crafted features to judge whether a post requires an empathetic response. The model is trained on posts from various web pages and their corresponding comments, using both the captions and the images. We were able to obtain 80% accuracy in tagging posts requiring empathetic responses.

URL

http://arxiv.org/abs/1903.05210

PDF

http://arxiv.org/pdf/1903.05210
Big Data Analytics and AI in Mental Healthcare

2019-03-12

Ariel Rosenfeld, David Benrimoh, Caitrin Armstrong, Nykan Mirchi, Timothe Langlois-Therrien, Colleen Rollins, Myriam Tanguay-Sela, Joseph Mehltretter, Robert Fratila, Sonia Israel, Emily Snook, Kelly Perlman, Akiva Kleinerman, Bechara Saab, Mark Thoburn, Cheryl Gabbay, Amit Yaniv-Rosenfeld

arXiv_AI

Abstract

Mental health conditions cause a great deal of distress or impairment; depression alone will affect 11% of the world’s population. The application of Artificial Intelligence (AI) and big-data technologies to mental health has great potential for personalizing treatment selection, prognosticating, monitoring for relapse, detecting and helping to prevent mental health conditions before they reach clinical-level symptomatology, and even delivering some treatments. However, unlike similar applications in other fields of medicine, there are several unique challenges in mental health applications which currently pose barriers to the implementation of these technologies. Specifically, there are very few widely used or validated biomarkers in mental health, leading to a heavy reliance on patient- and clinician-derived questionnaire data as well as interpretation of new signals such as digital phenotyping. In addition, diagnosis lacks the objective ‘gold standard’ available in other fields such as oncology, where clinicians and researchers can often rely on pathological analysis for confirmation of diagnosis. In this chapter we discuss the major opportunities, limitations and techniques used for improving mental healthcare through AI and big data. We explore the computational, clinical and ethical considerations and best practices, and lay out the major research directions for the near future.

URL

http://arxiv.org/abs/1903.12071

PDF

http://arxiv.org/pdf/1903.12071
Arithmetic-Geometric Mean Robustness for Control from Signal Temporal Logic Specifications

2019-03-12

Noushin Mehdipour, Cristian-Ioan Vasile, Calin Belta

arXiv_RO

Abstract

We present a new average-based robustness score for Signal Temporal Logic (STL) and a framework for optimal control of a dynamical system under STL constraints. By averaging the scores of different specifications or subformulae at different time points, our new definition highlights the frequency of satisfaction, as well as how robustly each specification is satisfied at each time point. We show that this definition provides a better score for how well a specification is satisfied. Its usefulness in monitoring and control synthesis problems is illustrated through case studies.

URL

http://arxiv.org/abs/1903.05186

PDF

http://arxiv.org/pdf/1903.05186
Topological Analysis of Syntactic Structures

2019-03-12

Alexander Port, Taelin Karidi, Matilde Marcolli

arXiv_CL

arXiv_CL Relation
Abstract

We use the persistent homology method of topological data analysis and dimensional analysis techniques to study data of syntactic structures of world languages. We analyze relations between syntactic parameters in terms of dimensionality, of hierarchical clustering structures, and of non-trivial loops. We show there are relations that hold across language families and additional relations that are family-specific. We then analyze the trees describing the merging structure of persistent connected components for languages in different language families and we show that they partly correlate to historical phylogenetic trees but with significant differences. We also show the existence of interesting non-trivial persistent first homology groups in various language families. We give examples where explicit generators for the persistent first homology can be identified, some of which appear to correspond to homoplasy phenomena, while others may have an explanation in terms of historical linguistics, corresponding to known cases of syntactic borrowing across different language subfamilies.

URL

http://arxiv.org/abs/1903.05181

PDF

http://arxiv.org/pdf/1903.05181
Richness of Deep Echo State Network Dynamics

2019-03-12

Claudio Gallicchio, Alessio Micheli

arXiv_AI

arXiv_AI RNN Gradient_Descent
Abstract

Reservoir Computing (RC) is a popular methodology for the efficient design of Recurrent Neural Networks (RNNs). Recently, the advantages of the RC approach have been extended to the context of multi-layered RNNs, with the introduction of the Deep Echo State Network (DeepESN) model. In this paper, we study the quality of state dynamics in progressively higher layers of DeepESNs, using tools from the areas of information theory and numerical analysis. Our experimental results on RC benchmark datasets reveal the fundamental role played by the strength of inter-reservoir connections to increasingly enrich the representations developed in higher layers. Our analysis also gives interesting insights into the possibility of effective exploitation of training algorithms based on stochastic gradient descent in the RC field.

URL

http://arxiv.org/abs/1903.05174

PDF

http://arxiv.org/pdf/1903.05174
On the Pitfalls of Measuring Emergent Communication

2019-03-12

Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin

arXiv_AI

arXiv_AI Reinforcement_Learning Survey Quantitative Detection Recommendation
Abstract

How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agent’s learned communication abilities. As we move towards more complex environments, it becomes imperative to have a set of finer tools that allow qualitative and quantitative insights into the emergence of communication. This may be especially useful to allow humans to monitor agents’ behaviour, whether for fault detection, assessing performance, or even building trust. In this paper, we examine a few intuitive existing metrics for measuring communication, and show that they can be misleading. Specifically, by training deep reinforcement learning agents to play simple matrix games augmented with a communication channel, we find a scenario where agents appear to communicate (their messages provide information about their subsequent action), and yet the messages do not impact the environment or other agent in any way. We explain this phenomenon using ablation studies and by visualizing the representations of the learned policies. We also survey some commonly used metrics for measuring emergent communication, and provide recommendations as to when these metrics should be used.

URL

http://arxiv.org/abs/1903.05168

PDF

http://arxiv.org/pdf/1903.05168
Search-based 3D Planning and Trajectory Optimization for Safe Micro Aerial Vehicle Flight Under Sensor Visibility Constraints

2019-03-12

Matthias Nieuwenhuisen, Sven Behnke

arXiv_RO

arXiv_RO Optimization
Abstract

Safe navigation of Micro Aerial Vehicles (MAVs) requires not only obstacle-free flight paths according to a static environment map, but also the perception of and reaction to previously unknown and dynamic objects. This implies that the onboard sensors cover the current flight direction. Due to the limited payload of MAVs, full sensor coverage of the environment has to be traded off with flight time. Thus, often only a part of the environment is covered. We present a combined allocentric complete planning and trajectory optimization approach taking these sensor visibility constraints into account. The optimized trajectories yield flight paths within the apex angle of a Velodyne Puck Lite 3D laser scanner enabling low-level collision avoidance to perceive obstacles in the flight direction. Furthermore, the optimized trajectories take the flight dynamics into account and contain the velocities and accelerations along the path. We evaluate our approach with a DJI Matrice 600 MAV and in simulation employing hardware-in-the-loop.

URL

http://arxiv.org/abs/1903.05165

PDF

http://arxiv.org/pdf/1903.05165
Scaling Multi-Domain Dialogue State Tracking via Query Reformulation

2019-03-12

Pushpendre Rastogi, Arpit Gupta, Tongfei Chen, Lambert Mathias

arXiv_CL

arXiv_CL Tracking
Abstract

We present a novel approach to dialogue state tracking and referring expression resolution tasks. Successful contextual understanding of multi-turn spoken dialogues requires resolving referring expressions across turns and tracking the entities relevant to the conversation across turns. Tracking conversational state is particularly challenging in a multi-domain scenario when there exist multiple spoken language understanding (SLU) sub-systems, and each SLU sub-system operates on its domain-specific meaning representation. While previous approaches have addressed the disparate schema issue by learning candidate transformations of the meaning representation, in this paper, we instead model the reference resolution as a dialogue context-aware user query reformulation task: the dialog state is serialized to a sequence of natural language tokens representing the conversation. We develop our model for query reformulation using a pointer-generator network and a novel multi-task learning setup. In our experiments, we show a significant improvement in absolute F1 on an internal benchmark as well as on a soon-to-be-released public benchmark.

URL

http://arxiv.org/abs/1903.05164

PDF

http://arxiv.org/pdf/1903.05164
Simple Physical Adversarial Examples against End-to-End Autonomous Driving Models

2019-03-12

Adith Boloor, Xin He, Christopher Gill, Yevgeniy Vorobeychik, Xuan Zhang

arXiv_RO

arXiv_RO Adversarial Deep_Learning
Abstract

Recent advances in machine learning, especially techniques such as deep neural networks, are promoting a range of high-stakes applications, including autonomous driving, which often relies on deep learning for perception. While deep learning for perception has been shown to be vulnerable to a host of subtle adversarial manipulations of images, end-to-end demonstrations of successful attacks, which manipulate the physical environment and result in physical consequences, are scarce. Moreover, attacks typically involve carefully constructed adversarial examples at the level of pixels. We demonstrate the first end-to-end attacks on autonomous driving in simulation, using simple physically realizable attacks: the painting of black lines on the road. These attacks target deep neural network models for end-to-end autonomous driving control. A systematic investigation shows that such attacks are surprisingly easy to engineer, and we describe scenarios (e.g., right turns) in which they are highly effective, and others that are less vulnerable (e.g., driving straight). Further, we use network deconvolution to demonstrate that the attacks succeed by inducing activation patterns similar to entirely different scenarios used in training.

URL

http://arxiv.org/abs/1903.05157

PDF

http://arxiv.org/pdf/1903.05157
Read All
A Path Planning Framework for a Flying Robot in Close Proximity of Humans

2019-03-12

Hyung-Jin Yoon, Christopher Widdowson, Thiago Marinho, Ranxiao Frances Wang, Naira Hovakimyan

arXiv_RO

arXiv_RO Attention
Abstract

We present a path planning framework that takes into account the human’s safety perception in the presence of a flying robot. The framework addresses two objectives: (i) estimation of the uncertain parameters of the proposed safety perception model based on test data collected using a Virtual Reality (VR) testbed, and (ii) offline optimal control computation using the estimated safety perception model. Due to the unknown factors in the human test data, it is not suitable to use standard regression techniques that minimize the mean squared error (MSE). We propose to use a Hidden Markov model (HMM) approach where the human’s attention is considered as a hidden state to infer whether the data samples are relevant to learning the safety perception model. The HMM approach improved log-likelihood over the standard least squares solution. For path planning, we use Bernstein polynomials for discretization, as the resulting path remains within the convex hull of the control points, providing guarantees for deconfliction with obstacles at low computational cost. An example of optimal trajectory generation using the learned human model is presented. The optimal trajectory generated using the proposed model results in a reasonable safety distance from the human. In contrast, the paths generated using the standard regression model have undesirable shapes due to overfitting. The example demonstrates that the HMM approach is more robust to the unknown factors than the standard MSE model.
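
The convex hull property mentioned in the abstract follows from the Bernstein basis functions being non-negative and summing to one, so every path point is a convex combination of the control points. The following is a minimal sketch of that property only, not the authors' planner; the control points and the function names `bernstein_basis` and `bernstein_curve` are hypothetical and chosen for illustration.

```python
from math import comb


def bernstein_basis(n, i, t):
    """i-th Bernstein basis polynomial of degree n, evaluated at t in [0, 1]."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bernstein_curve(control_points, t):
    """Evaluate a 2-D Bernstein (Bezier) curve at t.

    Returns the curve point and the basis weights used to form it.
    """
    n = len(control_points) - 1
    weights = [bernstein_basis(n, i, t) for i in range(n + 1)]
    x = sum(w * p[0] for w, p in zip(weights, control_points))
    y = sum(w * p[1] for w, p in zip(weights, control_points))
    return (x, y), weights


# Hypothetical control points (not taken from the paper).
control_points = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

# Sample the curve at 21 evenly spaced parameter values.
samples = [bernstein_curve(control_points, k / 20) for k in range(21)]
```

Because the weights form a convex combination, keeping the control points clear of an obstacle's convex region is enough to keep the whole path clear of it, which is presumably why the check is cheap.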

URL

http://arxiv.org/abs/1903.05156

PDF

http://arxiv.org/pdf/1903.05156
Read All
A Sequential Set Generation Method for Predicting Set-Valued Outputs

2019-03-12

Tian Gao, Jie Chen, Vijil Chenthamarakshan, Michael Witbrock

arXiv_AI

arXiv_AI Regularization Classification Prediction
Abstract

Consider a general machine learning setting where the output is a set of labels or sequences. This output set is unordered and its size varies with the input. Whereas multi-label classification methods seem a natural first resort, they are not readily applicable to set-valued outputs because of the growth rate of the output space; and because conventional sequence generation doesn’t reflect sets’ order-free nature. In this paper, we propose a unified framework–sequential set generation (SSG)–that can handle output sets of labels and sequences. SSG is a meta-algorithm that leverages any probabilistic learning method for label or sequence prediction, but employs a proper regularization such that a new label or sequence is generated repeatedly until the full set is produced. Though SSG is sequential in nature, it does not penalize the ordering of the appearance of the set elements and can be applied to a variety of set output problems, such as a set of classification labels or sequences. We perform experiments with both benchmark and synthetic data sets and demonstrate SSG’s strong performance over baseline methods.

URL

http://arxiv.org/abs/1903.05153

PDF

http://arxiv.org/pdf/1903.05153
Read All
STRATA: A Unified Framework for Task Assignments in Large Teams of Heterogeneous Robots

2019-03-12

Harish Ravichandar, Kenneth Shaw, Sonia Chernova

arXiv_RO

arXiv_RO
Abstract

Large teams of robots have the potential to solve complex multi-task problems that are intractable for a single robot working independently. However, solving complex multi-task problems requires leveraging the relative strengths of different robots in the team. We present Stochastic TRAit-based Task Assignment (STRATA), a unified framework that models large teams of heterogeneous robots and performs optimal task assignments. Specifically, given information on which traits (capabilities) are required for various tasks, STRATA computes the optimal assignments of robots to tasks such that the task-trait requirements are achieved. Inspired by prior work in robot swarms and biodiversity, we categorize robots into different species (groups) based on their traits. We model each trait as a continuous variable and differentiate between traits that can and cannot be aggregated from different robots. STRATA is capable of reasoning about both species-level and robot-level differences in traits. Further, we define measures of diversity for any given team based on the team’s continuous-space trait model. We illustrate the necessity and effectiveness of STRATA using detailed simulations and a capture the flag game environment.

URL

http://arxiv.org/abs/1903.05149

PDF

http://arxiv.org/pdf/1903.05149
Read All
Low Power Inference for On-Device Visual Recognition with a Quantization-Friendly Solution

2019-03-12

Chen Feng, Tao Sheng, Zhiyu Liang, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Matthew Ardi, Alexander C. Berg, Yiran Chen, Bo Chen, Kent Gauen, Yung-Hsiang Lu

arXiv_CV

arXiv_CV Inference Recognition
Abstract

The IEEE Low-Power Image Recognition Challenge (LPIRC) is an annual competition started in 2015 that encourages joint hardware and software solutions for computer vision systems with low latency and power. Track 1 of the competition in 2018 focused on the innovation of software solutions with a fixed inference engine and hardware. This decision allows participants to submit models online and not worry about building and bringing custom hardware on-site, which attracted a historically large number of submissions. Among the diverse solutions, the winning solution proposed a quantization-friendly framework for MobileNets that achieves an accuracy of 72.67% on the holdout dataset with an average latency of 27ms on a single CPU core of a Google Pixel 2 phone, which is superior to the best real-time MobileNet models at the time.

URL

http://arxiv.org/abs/1903.06791

PDF

http://arxiv.org/pdf/1903.06791
Read All
Unsupervised Discovery of Parts, Structure, and Dynamics

2019-03-12

Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

arXiv_AI

arXiv_AI Image_Caption
Abstract

Humans easily recognize object parts and their hierarchical structure by watching how they move; they can then predict how each part moves in the future. In this paper, we propose a novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos. Our Parts, Structure, and Dynamics (PSD) model learns to, first, recognize the object parts via a layered image representation; second, predict hierarchy via a structural descriptor that composes low-level concepts into a hierarchical structure; and third, model the system dynamics by predicting the future. Experiments on multiple real and synthetic datasets demonstrate that our PSD model works well on all three tasks: segmenting object parts, building their hierarchical structure, and capturing their motion distributions.

URL

http://arxiv.org/abs/1903.05136

PDF

http://arxiv.org/pdf/1903.05136
Read All
Universally Slimmable Networks and Improved Training Techniques

2019-03-12

Jiahui Yu, Thomas Huang

arXiv_AI

arXiv_AI Super_Resolution Reinforcement_Learning Classification
Abstract

Slimmable networks are a family of neural networks that can instantly adjust the runtime width. The width can be chosen from a predefined set of widths to adaptively optimize accuracy-efficiency trade-offs at runtime. In this work, we propose a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width, and generalizing to networks both with and without batch normalization layers. We further propose two improved training techniques for US-Nets, named the sandwich rule and inplace distillation, to enhance the training process and boost testing accuracy. We show improved performance of universally slimmable MobileNet v1 and MobileNet v2 on the ImageNet classification task, compared with individually trained ones and 4-switch slimmable network baselines. We also evaluate the proposed US-Nets and improved training techniques on tasks of image super-resolution and deep reinforcement learning. Extensive ablation experiments on these representative tasks demonstrate the effectiveness of our proposed methods. Our discovery opens up the possibility to directly evaluate the FLOPs-Accuracy spectrum of network architectures. Code and models will be available at: https://github.com/JiahuiYu/slimmable_networks

URL

http://arxiv.org/abs/1903.05134

PDF

http://arxiv.org/pdf/1903.05134
Read All
Communication-efficient distributed SGD with Sketching

2019-03-12

Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora

arXiv_AI

arXiv_AI
Abstract

Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming algorithms, we propose a sketching-based approach to minimize the communication costs between nodes without losing accuracy. In our proposed method, workers in a distributed, synchronous training setting send sketches of their gradient vectors to the parameter server instead of the full gradient vector. Leveraging the theoretical properties of sketches, we show that this method recovers the favorable convergence guarantees of single-machine top-$k$ SGD. Furthermore, when applied to a model with $d$ dimensions on $W$ workers, our method requires only $\Theta(kW)$ bytes of communication, compared to $\Omega(dW)$ for vanilla distributed SGD. To validate our method, we run experiments using a residual network trained on the CIFAR-10 dataset. We achieve no drop in validation accuracy with a compression ratio of 4, or about 1 percentage point drop with a compression ratio of 8. We also demonstrate that our method scales to many workers.
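
To make the idea of sending sketches instead of full gradients concrete, here is a minimal sketch assuming a Count Sketch, a standard linear sketching structure that the abstract itself does not name. It is not the authors' implementation: the class `CountSketch`, its parameters, and the toy gradients are hypothetical. The property it illustrates is linearity, which lets a server add worker sketches and obtain the sketch of the summed gradient.

```python
import random
from statistics import median


class CountSketch:
    """Toy Count Sketch over vectors of a fixed dimension.

    Bucket and sign assignments are stored as explicit tables for clarity;
    a practical implementation would compute them with hash functions.
    """

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)  # same seed => same assignments on every worker
        self.rows, self.cols = rows, cols
        self.bucket = [[rng.randrange(cols) for _ in range(dim)] for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)] for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def add_vector(self, vec):
        """Accumulate every coordinate of vec into one bucket per row."""
        for r in range(self.rows):
            for i, v in enumerate(vec):
                self.table[r][self.bucket[r][i]] += self.sign[r][i] * v

    def merge(self, other):
        """Add another sketch built with the same seed (sketches are linear)."""
        for r in range(self.rows):
            for c in range(self.cols):
                self.table[r][c] += other.table[r][c]

    def estimate(self, i):
        """Median-of-rows estimate of coordinate i."""
        return median(
            self.sign[r][i] * self.table[r][self.bucket[r][i]] for r in range(self.rows)
        )


# Hypothetical setup: two workers, a 1000-dimensional gradient, 5 x 20 sketch.
dim, rows, cols, seed = 1000, 5, 20, 42
g1 = [0.0] * dim
g1[3] = 5.0
g2 = [0.0] * dim
g2[3] = 4.0
g2[700] = -2.0

worker_sketches = []
for grad in (g1, g2):
    sk = CountSketch(rows, cols, dim, seed)
    sk.add_vector(grad)
    worker_sketches.append(sk)

# The server only ever sees rows * cols numbers per worker.
server = CountSketch(rows, cols, dim, seed)
for sk in worker_sketches:
    server.merge(sk)

# Reference: sketching the summed gradient directly gives the same table.
reference = CountSketch(rows, cols, dim, seed)
reference.add_vector([a + b for a, b in zip(g1, g2)])
```

In this toy setup each worker transmits 100 numbers rather than 1000; how the sketch size should scale with $k$ and how large coordinates are recovered on the server are details left to the paper.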

URL

http://arxiv.org/abs/1903.04488

PDF

http://arxiv.org/pdf/1903.04488
Read All
A Quantization-Friendly Separable Convolution for MobileNets

2019-03-12

Tao Sheng, Chen Feng, Shaojie Zhuo, Xiaopeng Zhang, Liang Shen, Mickey Aleksic

arXiv_CV

arXiv_CV Image_Classification Inference Classification Deep_Learning
Abstract

As deep learning (DL) is being rapidly pushed to edge computing, researchers have invented various ways to make inference computation more efficient on mobile/IoT devices, such as network pruning and parameter compression. Quantization, as one of the key approaches, can effectively offload the GPU and make it possible to deploy DL on a fixed-point pipeline. Unfortunately, not all existing network designs are friendly to quantization. For example, although the popular lightweight MobileNetV1 successfully reduces parameter size and computation latency with separable convolution, our experiments show that its quantized models have a large accuracy gap against its floating-point models. To resolve this, we analyzed the root cause of the quantization loss and proposed a quantization-friendly separable convolution architecture. Evaluated on the image classification task on the ImageNet2012 dataset, our modified MobileNetV1 model achieves an 8-bit inference top-1 accuracy of 68.03%, almost closing the gap to the float pipeline.
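
As a rough illustration of why some value distributions quantize poorly, the sketch below applies generic 8-bit affine quantization to a small list of floats. This is a textbook scheme used here as an assumption, not the architecture change proposed in the paper; `quantize_uint8`, `dequantize`, `max_error`, and the sample values are hypothetical. It shows that a single outlier widens the quantization range and increases the rounding error on the remaining values.

```python
def quantize_uint8(values):
    """Affine-quantize floats to integers in [0, 255].

    Returns the quantized values plus the scale and zero point needed
    to map them back to floats.
    """
    lo = min(min(values), 0.0)  # keep 0.0 inside the representable range
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(-lo / scale)
    q = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [scale * (x - zero_point) for x in q]


def max_error(values):
    """Largest absolute round-trip error over the first five values."""
    q, scale, zero_point = quantize_uint8(values)
    restored = dequantize(q, scale, zero_point)
    return max(abs(a - b) for a, b in zip(values[:5], restored[:5]))


# Hypothetical activations: a narrow range, then the same values plus one outlier.
narrow = [-1.0, -0.5, 0.0, 0.25, 1.0]
with_outlier = narrow + [100.0]

err_narrow = max_error(narrow)
err_outlier = max_error(with_outlier)
```

With the outlier present, the step size grows from roughly 2/255 to roughly 101/255, so the small values are rounded far more coarsely.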

URL

http://arxiv.org/abs/1803.08607

PDF

http://arxiv.org/pdf/1803.08607
Read All
A total variation based regularizer promoting piecewise-Lipschitz reconstructions

2019-03-12

Martin Burger, Yury Korolev, Carola-Bibiane Schönlieb, Christiane Stollenwerk

arXiv_CV

arXiv_CV Regularization
Abstract

We introduce a new regularizer in the total variation family that promotes reconstructions with a given Lipschitz constant (which can also vary spatially). We prove regularizing properties of this functional and investigate its connections to total variation and infimal convolution type regularizers TVLp and, in particular, establish topological equivalence. Our numerical experiments show that the proposed regularizer can achieve similar performance as total generalized variation while having the advantage of a very intuitive interpretation of its free parameter, which is just a local estimate of the norm of the gradient. It also provides a natural approach to spatially adaptive regularization.

URL

http://arxiv.org/abs/1903.05079

PDF

http://arxiv.org/pdf/1903.05079
Read All
Blackbox End-to-End Verification of Ground Robot Safety and Liveness

2019-03-12

Brandon Bohrer, Yong Kiam Tan, Stefan Mitsch, Andrew Sogokon, André Platzer

arXiv_RO

arXiv_RO
Abstract

We formally prove end-to-end correctness of a ground robot implemented in a simulator. We use an untrusted controller supervised by a verified sandbox. Contributions include: (i) A model of the robot in differential dynamic logic, which specifies assumptions on the controller and robot kinematics, (ii) Formal proofs of safety and liveness for a waypoint-following problem with speed limits, (iii) An automatically synthesized sandbox, which is automatically proven to enforce model compliance at runtime, and (iv) Controllers, planners, and environments for the simulations. The verified sandbox is used to safeguard (unverified) controllers in a realistic simulated environment. Experimental evaluation of the resulting sandboxed implementation confirms safety and high model-compliance, with an inherent trade-off between compliance and performance. The verified sandbox thus serves as a valuable bidirectional link between formal methods and implementation, automating both enforcement of safety and model validation simultaneously.

URL

http://arxiv.org/abs/1903.05073

PDF

http://arxiv.org/pdf/1903.05073
Read All
Dense Classification and Implanting for Few-Shot Learning

2019-03-12

Yann Lifchitz, Yannis Avrithis, Sylvaine Picard, Andrei Bursuc

arXiv_CV

arXiv_CV Knowledge Classification
Abstract

Training deep neural networks from few examples is a highly challenging and key problem for many computer vision tasks. In this context, we are targeting knowledge transfer from a set with abundant data to other sets with few available examples. We propose two simple and effective solutions: (i) dense classification over feature maps, which for the first time studies local activations in the domain of few-shot learning, and (ii) implanting, that is, attaching new neurons to a previously trained network to learn new, task-specific features. On miniImageNet, we improve the prior state-of-the-art on few-shot classification, i.e., we achieve 62.5%, 79.8% and 83.8% on 5-way 1-shot, 5-shot and 10-shot settings respectively.

URL

http://arxiv.org/abs/1903.05050

PDF

http://arxiv.org/pdf/1903.05050
Read All
Placental Flattening via Volumetric Parameterization

2019-03-12

S. Mazdak Abulnaga, Esra Abaci Turk, Mikhail Bessmeltsev, P. Ellen Grant, Justin Solomon, Polina Golland

arXiv_CV

arXiv_CV Gradient_Descent
Abstract

We present a volumetric mesh-based algorithm for flattening the placenta to a canonical template to enable effective visualization of local anatomy and function. Monitoring placental function in vivo promises to support pregnancy assessment and to improve care outcomes. We aim to alleviate visualization and interpretation challenges presented by the shape of the placenta when it is attached to the curved uterine wall. We flatten the volumetric mesh that captures placental shape to resemble the well-studied ex vivo shape. We formulate our method as a map from the in vivo shape to a flattened template that minimizes the symmetric Dirichlet energy density to control distortion throughout the volume. Local injectivity is enforced via constrained line search during gradient descent. We evaluate the proposed method on 28 placenta shapes extracted from MRI images in a study of placental function. We achieve sub-voxel accuracy in mapping the boundary of the placenta to the template while successfully controlling distortion throughout the volume. We illustrate how the resulting mapping of the placenta enhances visualization of the placental anatomy and function. Our code is freely available at https://github.com/mabulnaga/placenta-flattening .

URL

http://arxiv.org/abs/1903.05044

PDF

http://arxiv.org/pdf/1903.05044
Read All
Character Eyes: Seeing Language through Character-Level Taggers

2019-03-12

Yuval Pinter, Marc Marone, Jacob Eisenstein

arXiv_CL

arXiv_CL Face RNN
Abstract

Character-level models have been used extensively in recent years in NLP tasks as both supplements and replacements for closed-vocabulary token-level word representations. In one popular architecture, character-level LSTMs are used to feed token representations into a sequence tagger predicting token-level annotations such as part-of-speech (POS) tags. In this work, we examine the behavior of POS taggers across languages from the perspective of individual hidden units within the character LSTM. We aggregate the behavior of these units into language-level metrics which quantify the challenges that taggers face on languages with different morphological properties, and identify links between synthesis and affixation preference and emergent behavior of the hidden tagger layer. In a comparative experiment, we show how modifying the balance between forward and backward hidden units affects model arrangement and performance in these types of languages.

URL

http://arxiv.org/abs/1903.05041

PDF

http://arxiv.org/pdf/1903.05041
Read All
An End-to-End Network for Panoptic Segmentation

2019-03-12

Huanyu Liu, Chao Peng, Changqian Yu, Jingbo Wang, Xu Liu, Gang Yu, Wei Jiang

arXiv_CV

arXiv_CV Segmentation Relation
Abstract

Panoptic segmentation, which needs to assign a category label to each pixel and segment each object instance simultaneously, is a challenging topic. Traditionally, the existing approaches utilize two independent models without sharing features, which makes the pipeline inefficient to implement. In addition, a heuristic method is usually employed to merge the results. However, the overlapping relationship between object instances is difficult to determine without sufficient context information during the merging process. To address the problems, we propose a novel end-to-end network for panoptic segmentation, which can efficiently and effectively predict both the instance and stuff segmentation in a single network. Moreover, we introduce a novel spatial ranking module to deal with the occlusion problem between the predicted instances. Extensive experiments have been done to validate the performance of our proposed method and promising results have been achieved on the COCO Panoptic benchmark.

URL

http://arxiv.org/abs/1903.05027

PDF

http://arxiv.org/pdf/1903.05027
Read All
Generating Compact Geometric Track-Maps for Train Positioning Applications

2019-03-12

Hanno Winter, Stefan Luthardt, Volker Willert, Jürgen Adamy

arXiv_CV

arXiv_CV Optimization
Abstract

In this paper we present a method to generate compact geometric track-maps for train-borne localization applications. We first give a brief overview on the role of track maps and it becomes apparent that there are hardly any adequate methods to generate suitable geometric track-maps. Therefore, we present a novel map generation procedure that uses an optimization formulation to find the continuous sequence of track geometries that fits the available measurement data best. The optimization is initialized with the results from a localization filter developed in our previous work. The filter also provides the required information for shape identification and measurement association. The approach will be evaluated using simulated data in comparison to the typically used data-point based maps.

URL

http://arxiv.org/abs/1903.05014

PDF

http://arxiv.org/pdf/1903.05014
Read All
A Study on Passage Re-ranking in Embedding based Unsupervised Semantic Search

2019-03-12

Md Faisal Mahbub Chowdhury, Vijil Chenthamarakshan, Rishav Chakravarti, Alfio M. Gliozzo

arXiv_CL

arXiv_CL Embedding
Abstract

State-of-the-art approaches for (embedding-based) unsupervised semantic search exploit either compositional similarity (of a query and a passage) or pair-wise word (or term) similarity (from the query and the passage). By design, word-based approaches do not incorporate similarity in the larger context (query/passage), while compositional similarity-based approaches are usually unable to take advantage of the most important cues in the context. In this paper we propose a new compositional similarity-based approach, called variable centroid vector (VCVB), that tries to address both of these limitations. We also present results using a different type of compositional similarity-based approach that exploits universal sentence embedding. We provide empirical evaluation on two different benchmarks.

URL

http://arxiv.org/abs/1804.08057

PDF

http://arxiv.org/pdf/1804.08057
Read All
Theory III: Dynamics and Generalization in Deep Networks

2019-03-12

Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Bob Liang, Jack Hidary, Tomaso Poggio

arXiv_AI

arXiv_AI Review Classification Gradient_Descent
Abstract

We review recent observations on the dynamical systems induced by gradient descent methods used for training deep networks and summarize properties of the solutions they converge to. Recent results illuminate the absence of overfitting in the special case of linear networks for binary classification. They prove that minimization of loss functions such as the logistic, the cross-entropy and the exponential loss yields asymptotic convergence to the maximum margin solution for linearly separable datasets, independently of the initial conditions. Here we discuss the case of nonlinear DNNs near zero minima of the empirical loss, under exponential-type and square losses, for several variations of the basic gradient descent algorithm, including a new NMGD (norm minimizing gradient descent) version that converges to the minimum norm fixed points of the gradient descent iteration. Our main results are: 1) gradient descent algorithms with weight normalization constraint achieve generalization; 2) the fundamental reason for the effectiveness of existing weight normalization and batch normalization techniques is that they are approximate implementations of maximizing the margin under unit norm constraint; 3) without unit norm constraints some level of generalization can still be obtained for not-too-deep networks because the balance of the weights across different layers, if present at initialization, is maintained by the gradient flow. In the perspective of these theoretical results, we discuss experimental evidence around the apparent absence of overfitting, that is the observation that the expected classification error does not get worse when increasing the number of parameters. Our explanation focuses on the implicit normalization enforced by algorithms such as batch normalization. In particular, the control of the norm of the weights is related to Halpern iterations for minimum norm solutions.

URL

http://arxiv.org/abs/1903.04991

PDF

http://arxiv.org/pdf/1903.04991
Read All
Cascaded Projection: End-to-End Network Compression and Acceleration

2019-03-12

Breton Minnehan, Andreas Savakis

arXiv_CV

arXiv_CV CNN Optimization Classification Gradient_Descent
Abstract

We propose a data-driven approach for deep convolutional neural network compression that achieves high accuracy with high throughput and low memory requirements. Current network compression methods either find a low-rank factorization of the features that requires more memory, or select only a subset of features by pruning entire filter channels. We propose the Cascaded Projection (CaP) compression method that projects the output and input filter channels of successive layers to a unified low dimensional space based on a low-rank projection. We optimize the projection to minimize classification loss and the difference between the next layer’s features in the compressed and uncompressed networks. To solve this non-convex optimization problem we propose a new optimization method of a proxy matrix using backpropagation and Stochastic Gradient Descent (SGD) with geometric constraints. Our cascaded projection approach leads to improvements in all critical areas of network compression: high accuracy, low memory consumption, low parameter count and high processing speed. The proposed CaP method demonstrates state-of-the-art results compressing VGG16 and ResNet networks with over 4x reduction in the number of computations and excellent performance in top-5 accuracy on the ImageNet dataset before and after fine-tuning.

URL

http://arxiv.org/abs/1903.04988

PDF

http://arxiv.org/pdf/1903.04988
Read All
Fair comparison of skin detection approaches on publicly available datasets

2019-03-12

Alessandra Lumini, Loris Nanni

arXiv_CV

arXiv_CV Object_Detection Attention Face Tracking Detection Face_Detection Recognition
Abstract

Skin detection is the process of discriminating skin and non-skin regions in a digital image, and it is widely used in applications ranging from hand gesture analysis to tracking body parts and face detection. Skin detection is a challenging problem which has drawn extensive attention from the research community. Nevertheless, a fair comparison among approaches is very difficult due to the lack of a common benchmark and a unified testing protocol. In this work, we investigate the most recent research in this field and we propose a fair comparison among approaches using several different datasets. The major contributions of this work are a framework to evaluate and combine different skin detector approaches, whose source code will be made freely available for future research, and an extensive experimental comparison among several recent methods, which have also been used to define an ensemble that works well in many different problems. Experiments are carried out on 10 different datasets including more than 10000 labelled images: experimental results confirm that the proposed ensemble obtains very good performance with respect to other stand-alone approaches, without requiring ad hoc parameter tuning. A MATLAB version of the framework for testing and of the ensemble proposed in this paper will be freely available from (https://www.dei.unipd.it/node/2357 + Pattern Recognition and Ensemble Classifiers).

URL

http://arxiv.org/abs/1802.02531

PDF

http://arxiv.org/pdf/1802.02531
Read All
DREAM-NAP: Decay Replay Mining to Predict Next Process Activities

2019-03-12

Julian Theis, Houshang Darabi

arXiv_AI

arXiv_AI RNN Deep_Learning Prediction
Abstract

In complex processes, various events can happen in different sequences. The prediction of the next event activity given an a-priori process state is of importance in such processes. Recent methods leverage deep learning techniques such as recurrent neural networks to predict event activities from raw process logs. However, deep learning techniques cannot efficiently model logical behaviors of complex processes. In this paper, we take advantage of Petri nets as a powerful tool in modeling logical behaviors of complex processes. We propose an approach which first discovers Petri nets from event logs utilizing a recent process mining algorithm. In a second step, we enhance the obtained model with time decay functions to create timed process state samples. Finally, we use these samples in combination with token movement counters and Petri net markings to train a deep learning model that predicts the next event activity. We demonstrate significant performance improvements and outperform the state-of-the-art methods on eight out of nine real-world benchmark event logs in accuracy.

URL

http://arxiv.org/abs/1903.05084

PDF

http://arxiv.org/pdf/1903.05084
Read All
Iterated two-phase local search for the Set-Union Knapsack Problem

2019-03-12

Zequn Wei, Jin-Kao Hao

arXiv_AI

arXiv_AI Optimization
Abstract

The Set-union Knapsack Problem (SUKP) is a generalization of the popular 0-1 knapsack problem. Given a set of weighted elements and a set of items with profits where each item is composed of a subset of elements, the SUKP involves packing a subset of items in a capacity-constrained knapsack such that the total profit of the selected items is maximized while their weights do not exceed the knapsack capacity. In this work, we present an effective iterated two-phase local search algorithm for this NP-hard combinatorial optimization problem. The proposed algorithm iterates through two search phases: a local optima exploration phase that alternates between a variable neighborhood descent search and a tabu search to explore local optimal solutions, and a local optima escaping phase to drive the search to unexplored regions. We show the competitiveness of the algorithm compared to the state-of-the-art methods in the literature. Specifically, the algorithm discovers 18 improved best results (new lower bounds) for the 30 benchmark instances and matches the best-known results for the 12 remaining instances. We also report the first computational results with the general CPLEX solver, including 6 proven optimal solutions. Finally, we investigate the effectiveness of the key ingredients of the algorithm on its performance.
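
The defining feature of the SUKP is that an element shared by several selected items counts toward the weight only once. The sketch below illustrates the problem definition on a tiny instance and solves it by exhaustive enumeration; this brute-force search is a stand-in for illustration and is not the iterated two-phase local search described in the abstract. The instance data and the function names `evaluate` and `brute_force` are hypothetical.

```python
from itertools import combinations


def evaluate(selection, item_profits, item_elements, element_weights):
    """Return (total profit, total weight) of a set of selected items.

    The weight is summed over the union of the items' elements, so a
    shared element is counted once.
    """
    profit = sum(item_profits[i] for i in selection)
    covered = set().union(*(item_elements[i] for i in selection))
    weight = sum(element_weights[e] for e in covered)
    return profit, weight


def brute_force(item_profits, item_elements, element_weights, capacity):
    """Enumerate all item subsets and keep the most profitable feasible one."""
    best = (0, ())
    items = range(len(item_profits))
    for size in range(1, len(item_profits) + 1):
        for selection in combinations(items, size):
            profit, weight = evaluate(
                selection, item_profits, item_elements, element_weights
            )
            if weight <= capacity and profit > best[0]:
                best = (profit, selection)
    return best


# Hypothetical instance: four elements, four items, capacity 6.
element_weights = {"a": 2, "b": 3, "c": 4, "d": 1}
item_profits = [5, 4, 3, 6]
item_elements = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"a", "d"}]
capacity = 6

best_profit, best_items = brute_force(
    item_profits, item_elements, element_weights, capacity
)
# best_profit is 11 with items (0, 3): elements a, b, d weigh 2 + 3 + 1 = 6.
```

Items 0 and 3 both contain element "a", and the selection fits only because "a" is counted once. Enumeration grows exponentially with the number of items, which is why heuristics such as local search are used on benchmark-sized instances.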

URL

http://arxiv.org/abs/1903.04966

PDF

http://arxiv.org/pdf/1903.04966
Read All
Discriminative Principal Component Analysis: A REVERSE THINKING

2019-03-12

Hanli Qiao

arXiv_CV

arXiv_CV Face Recognition Face_Recognition
Abstract

In this paper, we propose a novel approach named Discriminative Principal Component Analysis (Discriminative PCA), which enhances the separability of PCA by means of Linear Discriminant Analysis (LDA). The proposed method performs feature extraction by determining a linear projection that captures the most scattered discriminative information. The main innovation of Discriminative PCA is performing PCA on a discriminative matrix rather than on the original sample matrix. To calculate the required discriminative matrix at low complexity, we apply LDA to a converted matrix to obtain its within-class and between-class matrices. During the computation, we use direct linear discriminant analysis (DLDA) to solve the small sample size (SSS) problem that arises. To evaluate the performance of Discriminative PCA in face recognition, we analytically compare it with DLDA and PCA on four well-known facial databases: PIE, FERET, YALE and ORL. Accuracy and running time obtained with a nearest-neighbour classifier are compared for different numbers of training images per person. Discriminative PCA shows not only superior recognition rates but also comparable running times.

URL

http://arxiv.org/abs/1903.04963

PDF

http://arxiv.org/pdf/1903.04963
Read All
Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

2019-03-12

Haotian Fu, Hongyao Tang, Jianye Hao, Zihan Lei, Yingfeng Chen, Changjie Fan

arXiv_AI

arXiv_AI Knowledge Reinforcement_Learning
Abstract

Deep Reinforcement Learning (DRL) has been applied to a variety of cooperative multi-agent problems with either discrete or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between agents are used to facilitate training, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and the game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
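
As a rough illustration of what a discrete-continuous hybrid (parameterized) action is, the sketch below picks an action for a single agent in the general style of parameterized Q-learning: each discrete action `k` has its own continuous parameter, and the agent chooses the pair with the highest Q-value. This is not Deep MAPQN or Deep MAHHQN. The function `select_hybrid_action`, the toy parameter functions and the toy Q-function are all assumptions standing in for learned networks.

```python
def select_hybrid_action(state, param_fns, q_fn):
    """Greedy choice of (discrete action index, continuous parameter).

    param_fns[k](state) proposes the continuous parameter for discrete action k.
    q_fn(state, k, x) scores the hybrid action (k, x).
    Both are hypothetical stand-ins for learned networks.
    """
    best_k, best_x, best_q = None, None, float("-inf")
    for k, param_fn in enumerate(param_fns):
        x_k = param_fn(state)
        q = q_fn(state, k, x_k)
        if q > best_q:
            best_k, best_x, best_q = k, x_k, q
    return best_k, best_x


# Toy example: two discrete actions, each with one scalar parameter.
param_fns = [lambda s: 0.5 * s, lambda s: -s]
q_fn = lambda s, k, x: -(x - 1.0) ** 2 + 0.1 * k

print(select_hybrid_action(2.0, param_fns, q_fn))
```

In a multi-agent setting with decentralized execution, each agent would run a selection of this kind on its own local observation.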

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.04959

PDF

http://arxiv.org/pdf/1903.04959
Read All
Noisy Supervision for Correcting Misaligned Cadaster Maps Without Perfect Ground Truth Data

2019-03-12

Nicolas Girard (UCA, TITANE), Guillaume Charpiat (TAU), Yuliya Tarabalka (UCA, TITANE)

arXiv_CV

arXiv_CV
Abstract

In machine learning, the best performance on a given task is achieved by fully supervised methods when perfect ground truth labels are available. However, labels are often noisy, especially in remote sensing where manually curated public datasets are rare. We study the multi-modal cadaster map alignment problem, for which the available annotations are misaligned polygons, resulting in noisy supervision. We set up a multi-round training scheme which corrects the ground truth annotations at each round to better train the model at the next round. We show that it is possible to reduce the noise of the dataset by iteratively training a better alignment model to correct the annotation alignment.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.06529

PDF

http://arxiv.org/pdf/1903.06529
Read All
Probabilistic Temporal Logic over Finite Traces

2019-03-12

Fabrizio M. Maggi, Marco Montali, Rafael Peñaloza

arXiv_AI

arXiv_AI Attention Inference
Abstract

Temporal logics over finite traces have recently gained attention due to their use in real-world applications, in particular in business process modelling and planning. In real life, processes contain some degree of uncertainty that is impossible to handle with classical logics. We propose a new probabilistic temporal logic over finite traces based on superposition semantics, where all possible evolutions are possible, until observed. We study the properties of the logic and provide automata-based mechanisms for deriving probabilistic inferences from its formulas. We ground the approach in the context of declarative process modelling, showing how the temporal patterns used in Declare can be lifted to our setting, and discussing how probabilistic inferences can be exploited to provide key offline and runtime reasoning tasks, and how to discover probabilistic Declare patterns from event data by minor adjustments to existing discovery algorithms.
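
For readers unfamiliar with temporal logic over finite traces, the following sketch evaluates a few classical (non-probabilistic) operators on a finite trace, including the Declare "response" pattern. It does not implement the probabilistic superposition semantics proposed in the paper. The function names and the trace encoding (a list of sets of propositions) are assumptions for illustration, and traces are assumed to be non-empty.

```python
def eventually(trace, prop, start=0):
    """F prop: prop holds at some position from `start` to the end of the trace."""
    return any(prop in trace[i] for i in range(start, len(trace)))


def always(trace, prop, start=0):
    """G prop: prop holds at every position from `start` to the end of the trace."""
    return all(prop in trace[i] for i in range(start, len(trace)))


def response(trace, a, b):
    """Declare response(a, b), i.e. G(a -> F b): each a is eventually followed by b."""
    return all(
        eventually(trace, b, i)
        for i in range(len(trace))
        if a in trace[i]
    )


# Each position of the trace is the set of propositions true at that step.
trace_ok = [{"a"}, {"c"}, {"b"}]
trace_bad = [{"a"}, {"c"}, {"b"}, {"a"}]

print(response(trace_ok, "a", "b"))   # the single a is followed by b
print(response(trace_bad, "a", "b"))  # the last a has no later b
```

Because the trace is finite, an obligation that is still pending at the last position counts as violated, which is the main difference from the infinite-trace reading.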

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.04940

PDF

http://arxiv.org/pdf/1903.04940
Read All
Goal-Directed Behavior under Variational Predictive Coding: Dynamic Organization of Visual Attention and Working Memory

2019-03-12

Minju Jung, Takazumi Matsumoto, Jun Tani

arXiv_RO

arXiv_RO Knowledge Attention GAN Inference
Abstract

Mental simulation is a critical cognitive function for goal-directed behavior because it is essential for assessing actions and their consequences. When a self-generated or externally specified goal is given, a sequence of actions that is most likely to attain that goal is selected among other candidates via mental simulation. Therefore, better mental simulation leads to better goal-directed action planning. However, developing a mental simulation model is challenging because it requires knowledge of self and the environment. The current paper studies how adequate goal-directed action plans of robots can be mentally generated by dynamically organizing top-down visual attention and visual working memory. For this purpose, we propose a neural network model based on variational Bayes predictive coding, where goal-directed action planning is formulated by Bayesian inference of latent intentional space. Our experimental results showed that cognitively meaningful competencies, such as autonomous top-down attention to the robot end effector (its hand) as well as dynamic organization of occlusion-free visual working memory, emerged. Furthermore, our analysis of comparative experiments indicated that introduction of visual working memory and the inference mechanism using variational Bayes predictive coding significantly improve the performance in planning adequate goal-directed actions.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.04932

PDF

http://arxiv.org/pdf/1903.04932
Read All
UAV/UGV Autonomous Cooperation: UAV Assists UGV to Climb a Cliff by Attaching a Tether

2019-03-12

Takahiro Miki, Petr Khrapchenkov, Koichi Hori

arXiv_RO

arXiv_RO
Abstract

This paper proposes a novel cooperative system for an Unmanned Aerial Vehicle (UAV) and an Unmanned Ground Vehicle (UGV) which utilizes the UAV not only as a flying sensor but also as a tether attachment device. The two robots are connected by a tether, allowing the UAV to anchor the tether to a structure at the top of steep terrain that is impossible for UGVs to reach. This enhances the poor traversability of the UGV, not only by providing a wider range of scanning and mapping from the air, but also by allowing the UGV to climb steep terrain by winding the tether. In addition, we present an autonomous framework for collaborative navigation and tether attachment in an unknown environment. The UAV employs visual inertial navigation with 3D voxel mapping and obstacle avoidance planning. The UGV makes use of the voxel map and generates an elevation map to execute path planning based on a traversability analysis. Furthermore, we compare the pros and cons of possible tether anchoring methods from multiple points of view. To increase the probability of successful anchoring, we evaluated the anchoring strategy experimentally. Finally, the feasibility and capability of the proposed system were demonstrated in an autonomous field mission with an obstacle and a cliff.

Abstract (translated by Google)

URL

http://arxiv.org/abs/1903.04898

PDF

http://arxiv.org/pdf/1903.04898
Read All

123/266
