Predicting the 2D LiDAR map at the next moment is a significant problem for robot navigation and path planning. To tackle this problem, we resort to the motion flow between adjacent maps, a powerful tool for processing and analyzing dynamic data that is known as optical flow in video processing. However, unlike video, which contains abundant visual features in each frame, a 2D LiDAR map lacks distinctive local features. To alleviate this challenge, we propose to estimate the motion flow with deep neural networks, inspired by their powerful representation-learning ability in estimating the optical flow of video. To this end, we design a recurrent neural network based on the gated recurrent unit, named LiDAR-FlowNet. As a recurrent neural network can encode temporal dynamic information, our LiDAR-FlowNet can estimate the motion flow between the current map and the unknown next map from the current and previous frames alone. A self-supervised strategy is further designed to train the LiDAR-FlowNet model effectively, so that no training data need to be manually annotated. With the estimated motion flow, it is straightforward to predict the 2D LiDAR map at the next moment. Experimental results verify the effectiveness of our LiDAR-FlowNet as well as the proposed training strategy. The results of the predicted LiDAR map also show the advantages of our motion-flow-based method.
http://arxiv.org/abs/1902.06919
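The last step of the pipeline above, predicting the next map from the estimated flow, amounts to warping the current occupancy grid along the flow field. A minimal NumPy sketch of that warping step (the grid and the uniform flow here are toy stand-ins, not the paper's model outputs):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_map(current_map, flow):
    """Warp a 2D occupancy grid one step forward using a dense flow field.

    current_map: (H, W) array of occupancy values in [0, 1].
    flow: (2, H, W) array; flow[0] is the per-cell displacement in rows,
          flow[1] in columns, as a network such as LiDAR-FlowNet might emit.
    """
    h, w = current_map.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Backward warping: sample the current map at positions displaced by -flow,
    # so each cell of the predicted map pulls its value from where it came from.
    coords = np.stack([rows - flow[0], cols - flow[1]])
    return map_coordinates(current_map, coords, order=1, mode="nearest")

# Toy example: a single occupied cell drifting one cell to the right.
grid = np.zeros((5, 5)); grid[2, 2] = 1.0
flow = np.zeros((2, 5, 5)); flow[1] += 1.0  # uniform flow of +1 column/frame
print(warp_map(grid, flow).round(2))        # mass moves to column 3
```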
Deep learning models have significantly improved the visual quality and accuracy of compressive sensing recovery. In this paper, we propose an algorithm for signal reconstruction from compressed measurements with image priors captured by a generative model. We search over, and constrain, the latent variable space to keep the method stable when the number of compressed measurements is extremely limited. We show that, by exploiting certain structures of the latent variables, the proposed method produces improved reconstruction accuracy and preserves realistic and non-smooth features in the image. Our algorithm achieves high computation speed by projecting between the original signal space and the latent variable space in an alternating fashion.
http://arxiv.org/abs/1902.06913
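A minimal sketch of the core idea, searching the latent space of a generative model for a code whose output matches the compressed measurements; the toy generator, box constraint, and all hyperparameters below are illustrative assumptions, not the paper's model or latent-space structure:

```python
import torch

# Optimize a latent code z so the generated signal G(z) matches measurements
# y = A x. The generator is an untrained stand-in purely for illustration.
torch.manual_seed(0)
n, m, k = 256, 32, 16                    # signal dim, measurements, latent dim
G = torch.nn.Sequential(                 # toy generator G: R^k -> R^n
    torch.nn.Linear(k, 128), torch.nn.ReLU(), torch.nn.Linear(128, n))
A = torch.randn(m, n) / m ** 0.5         # random measurement matrix
x_true = G(torch.randn(k)).detach()      # a signal the prior can represent
y = A @ x_true                           # compressed measurements

z = torch.zeros(k, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(2000):
    opt.zero_grad()
    loss = ((A @ G(z) - y) ** 2).sum()   # measurement-consistency loss
    loss.backward()
    opt.step()
    with torch.no_grad():                # crude latent constraint: keep z in a
        z.clamp_(-3.0, 3.0)              # bounded region for stability
print(f"recon error: {(G(z) - x_true).norm() / x_true.norm():.3f}")
```

Alternating between a gradient step on the measurement loss and a projection onto the constrained latent region is one simple reading of the "projecting between the original signal space and the latent variable space" described above.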
We propose a novel appearance-based gesture recognition algorithm using compressed domain signal processing techniques. Gesture features are extracted directly from the compressed measurements, which are the block averages and the coded linear combinations of the image sensor’s pixel values. We also improve both the computational efficiency and the memory requirement of the previous DTW-based K-NN gesture classifiers. Both simulation testing and hardware implementation strongly support the proposed algorithm.
http://arxiv.org/abs/1903.00100
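A minimal sketch of the dynamic time warping (DTW) distance underlying the DTW-based K-NN gesture classifier mentioned above: align two feature sequences of possibly different lengths and accumulate the cheapest matching cost. The random feature vectors stand in for compressed-domain gesture features:

```python
import numpy as np

def dtw(a, b):
    """Classic O(n*m) DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(0)
seq_a = rng.standard_normal((20, 4))       # 20 frames of 4-dim features
seq_b = rng.standard_normal((25, 4))       # a sequence of different length
print(f"DTW distance: {dtw(seq_a, seq_b):.2f}")
```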
We present an algorithm that produces a plan for relocating obstacles so that a robotic manipulator can grasp a target in clutter without collisions. We consider configurations where objects are densely packed in a constrained and confined space, so no collision-free path exists for the manipulator without relocating obstacles. Since the problem of planning for object rearrangement has been shown to be NP-hard, it is difficult to perform such manipulation tasks efficiently, even though they frequently arise in service domains (e.g., taking a target out of a shelf or a fridge). Our proposed planner employs a collision avoidance scheme that has been widely used in mobile robot navigation. The planner determines an obstacle to be removed quickly in real time. It can also deal with dynamic changes in the configuration (e.g., changes in object poses). Our method is shown to be complete and runs in polynomial time. Experimental results in a realistic simulated environment show that our method reduces execution time by up to 31% compared to competing methods.
http://arxiv.org/abs/1902.06907
Humans use language not only as a referential tool for physical entities but also to collectively execute complex strategies. While existing approaches study the emergence of language in settings where language mainly acts as a referential tool, in this paper we study the role of emergent languages in discovering and implementing strategies in a multi-agent setting. The agents in our setup are connected via a network and are allowed to exchange messages in the form of sequences of discrete symbols. We formulate the problem as a voting game, where two candidate agents contest an election and aim to convince the population members (other agents) in the network to vote for them by sending them messages. We use neural networks to parameterize the policies followed by agents in the game. We investigate the effect of different training objectives and strategies for the agents and make observations about the emergent language in each case. To the best of our knowledge, this is the first work that explores the emergence of language for discovering and implementing strategies in a setting where agents are connected via an underlying network.
http://arxiv.org/abs/1902.06897
We introduce a novel, perceptually derived metric (P-Reverb) that relates the just-noticeable difference (JND) of the early sound field (also called early reflections) to the late sound field (known as late reflections or reverberation). Early and late reflections are crucial components of the sound field and provide multiple perceptual cues for auditory displays. We conduct two extensive user evaluations that relate the JNDs of early reflections and late reverberation in terms of the mean-free path of the environment and present a novel P-Reverb metric. Our metric is used to estimate dynamic reverberation characteristics efficiently in terms of important parameters like reverberation time (RT60). We show the numerical accuracy of our P-Reverb metric in estimating RT60. Finally, we use our metric to design an interactive sound propagation algorithm and demonstrate its effectiveness on various benchmarks.
http://arxiv.org/abs/1902.06880
Safety perception measurement has been a subject of interest in many cities of the world, owing to its social relevance and its effect on some local economic activities. Even though people's safety perception is a subjective topic, it is sometimes possible to find common patterns within a restricted geographical and sociocultural context. This paper presents an approach that uses image processing and machine learning techniques to detect, with high accuracy, urban environment patterns that could affect citizens' safety perception.
http://arxiv.org/abs/1902.06871
State-of-the-art algorithms successfully localize and recognize traffic signs on existing datasets, which are limited in terms of the type and severity of challenging conditions. It is therefore not possible to estimate the performance of traffic sign detection algorithms under overlooked challenging conditions. Another shortcoming of existing datasets is the limited utilization of temporal information and the unavailability of consecutive frames and annotations. To overcome these shortcomings, we generated the CURE-TSD video dataset and hosted the first IEEE Video and Image Processing (VIP) Cup within the IEEE Signal Processing Society. In this paper, we provide a detailed description of the CURE-TSD dataset, analyze the characteristics of the top performing algorithms, and provide a performance benchmark. Moreover, we investigate the robustness of the benchmarked algorithms with respect to sign size, challenge type, and severity. The benchmarked algorithms are based on state-of-the-art and custom convolutional neural networks and achieved a precision of 0.55, a recall of 0.32, an F0.5 score of 0.48, and an F2 score of 0.35. Experimental results show that the benchmarked algorithms are highly sensitive to the tested challenging conditions, which cause an average performance drop of 0.17 in precision and 0.28 in recall under severe conditions. The dataset is publicly available at https://ghassanalregib.com/curetsd/.
http://arxiv.org/abs/1902.06857
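The F0.5 and F2 figures quoted above follow directly from the reported precision and recall via the standard F-beta formula; a quick check:

```python
def f_beta(precision, recall, beta):
    """F-beta score: weights recall beta times as much as precision."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.55, 0.32                          # benchmark figures from the abstract
print(f"F0.5 = {f_beta(p, r, 0.5):.2f}")   # 0.48
print(f"F2   = {f_beta(p, r, 2.0):.2f}")   # 0.35
```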
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian. The challenge focuses on the problem of precise localization of human faces and bodies, and accurate association of identities. It comprises three tracks: (i) WIDER Face, which aims at soliciting new approaches to advance the state-of-the-art in face detection; (ii) WIDER Pedestrian, which aims to find effective and efficient approaches to address the problem of pedestrian detection in unconstrained environments; and (iii) WIDER Person Search, which presents an exciting challenge of searching for persons across 192 movies. In total, 73 teams made valid submissions to the challenge tracks. We summarize the winning solutions for all three tracks and present discussions on open problems and potential research directions in these topics.
http://arxiv.org/abs/1902.06854
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure. An inappropriate selection can lead to the loss of information about the input during forward propagation and the exponential vanishing/exploding of gradients during back-propagation. Understanding the theoretical properties of untrained random networks is key to identifying which deep networks may be trained successfully, as recently demonstrated by Samuel et al. (2017), who showed that for deep feedforward neural networks only a specific choice of hyperparameters known as the ‘Edge of Chaos’ can lead to good performance. While the work by Samuel et al. (2017) discusses trainability issues, we focus here on training acceleration and overall performance. We give a comprehensive theoretical analysis of the Edge of Chaos and show that we can indeed tune the initialization parameters and the activation function in order to accelerate training and improve performance.
http://arxiv.org/abs/1902.06853
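A rough numerical sketch of the mean-field Edge of Chaos condition this line of work builds on: for weights drawn from N(0, sigma_w^2/fan_in) and biases from N(0, sigma_b^2), iterate the variance map to its fixed point q*, and locate where the gradient multiplier chi equals 1. The tanh activation, Monte-Carlo estimation, and bisection here are illustrative choices, not the paper's analysis:

```python
import numpy as np

# Variance map: q -> sigma_w^2 * E[phi(sqrt(q) Z)^2] + sigma_b^2, Z ~ N(0, 1).
# Edge of Chaos: chi = sigma_w^2 * E[phi'(sqrt(q*) Z)^2] = 1.
rng = np.random.default_rng(0)
Z = rng.standard_normal(100_000)             # Monte-Carlo samples of N(0, 1)
phi = np.tanh
dphi = lambda x: 1.0 - np.tanh(x) ** 2

def chi(sigma_w, sigma_b, iters=50):
    q = 1.0                                   # iterate the variance map to q*
    for _ in range(iters):
        q = sigma_w**2 * np.mean(phi(np.sqrt(q) * Z) ** 2) + sigma_b**2
    return sigma_w**2 * np.mean(dphi(np.sqrt(q) * Z) ** 2)

# chi grows with sigma_w (ordered -> chaotic regime), so bisect for chi = 1.
sigma_b, lo, hi = 0.5, 0.5, 3.0
for _ in range(30):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if chi(mid, sigma_b) < 1.0 else (lo, mid)
print(f"EOC at sigma_b={sigma_b}: sigma_w ~ {0.5 * (lo + hi):.3f}")
```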
With the ubiquity of social media platforms, millions of people share their online persona by expressing their thoughts, moods, emotions, feelings, and even their daily struggles with mental health issues, voluntarily and publicly. Unlike most existing efforts, which study depression by analyzing textual content, we examine and exploit multimodal big data to discern depressive behavior using a wide variety of features, including individual-level demographics. By developing a multimodal framework and employing statistical techniques to fuse heterogeneous sets of features obtained by processing visual, textual, and user interaction data, we significantly enhance the current state-of-the-art approaches for identifying depressed individuals on Twitter (improving the average F1-Score by 5 percent), and facilitate demographic inference from social media for broader applications. Besides providing insights into the relationship between demographics and mental health, our research assists in the design of a new breed of demographic-aware health interventions.
http://arxiv.org/abs/1902.06843
We present a novel image editing system that generates images as the user provides free-form masks, sketches, and color as input. Our system consists of an end-to-end trainable convolutional network. In contrast to existing methods, our system fully utilizes free-form user input for both color and shape. This allows the system to respond to the user’s sketch and color input, using them as guidelines to generate an image. In our particular work, we trained the network with an additional style loss, which made it possible to generate realistic results despite large portions of the image being removed. Our proposed network architecture, SC-FEGAN, is well suited to generating high-quality synthetic images from intuitive user inputs.
http://arxiv.org/abs/1902.06838
Over the past ten years, we have seen a democratization of range sensing technology. While range sensors were previously highly expensive and accessible only to a few domain experts, such sensors are nowadays ubiquitous and can even be found in the latest generation of mobile devices, e.g., current smartphones. This democratization of range sensing technology was started by the release of the Microsoft Kinect, and since then many commodity range sensors have followed its lead, such as the Primesense Carmine, the Asus Xtion Pro, and the Structure Sensor from Occipital. The availability of cheap range sensing technology led to a big leap in research, especially in the context of more powerful static and dynamic reconstruction techniques, from 3D scanning applications, such as KinectFusion, to highly accurate face and body tracking approaches. In this chapter, we take a detailed look at the different types of existing range sensors. We discuss the two fundamental types of commodity range sensing techniques in detail, namely passive and active sensing, and we explore the principles on which these technologies are based. Our focus is on modern active commodity range sensors based on time-of-flight and structured light. We conclude by discussing the noise characteristics, working ranges, and types of errors made by the different sensing modalities.
http://arxiv.org/abs/1902.06835
End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon. In addition, word models may also be easier to integrate with downstream tasks such as spoken language understanding, because inference (search) is much simplified compared to phoneme, character or any other sort of sub-word units. In this paper, we describe methods to construct contextual acoustic word embeddings directly from a supervised sequence-to-sequence acoustic-to-word speech recognition model using the learned attention distribution. On a suite of 16 standard sentence evaluation tasks, our embeddings show competitive performance against a word2vec model trained on the speech transcriptions. In addition, we evaluate these embeddings on a spoken language understanding task, and observe that our embeddings match the performance of text-based embeddings in a pipeline of first performing speech recognition and then constructing word embeddings from transcriptions.
http://arxiv.org/abs/1902.06833
Revenue management can enable airline corporations to maximize the revenue generated from each scheduled flight departing in their transportation network by finding the optimal policies for differential pricing, seat inventory control, and overbooking. As different demand segments in the market have different Willingness-To-Pay (WTP), airlines use differential pricing, booking restrictions, and service amenities to define fare classes or products targeted at each of these demand segments. Because seats are limited for each flight, airlines also need to allocate seats among these fare classes to prevent lower fare class passengers from displacing higher fare class ones, and to set overbooking limits in anticipation of cancellations and no-shows, such that revenue is maximized. Previous work addresses these problems using optimization techniques or classical Reinforcement Learning methods. This paper focuses on the latter problem - the seat inventory control problem - casting it as a Markov Decision Process in order to find the optimal policy. Multiple fare classes, concurrent continuous arrival of passengers of different fare classes, overbooking, and random cancellations independent of class are considered in the model. We address this problem using Deep Q-Learning with the goal of maximizing the reward for each flight departure. This technique allows us to employ a large continuous state space and also presents the opportunity to test on real-time airline data. To generate data and train the agent, a basic air-travel market simulator was developed. The performance of the agent in different simulated market scenarios was compared against theoretically optimal solutions and was found to be close to the expected optimal revenue.
http://arxiv.org/abs/1902.06824
Recent breakthrough methods in machine learning make use of increasingly large deep neural networks. The gains in performance have come at the cost of a substantial increase in computation and storage, making real-time implementation on limited hardware a very challenging task. One popular approach to address this challenge is to perform low-bit-precision computations via neural network quantization. However, aggressive quantization generally entails a severe penalty in terms of accuracy, and usually requires retraining the network or resorting to higher-bit-precision quantization. In this paper, we formalize the linear quantization task as a Minimum Mean Squared Error (MMSE) problem for both weights and activations, allowing low-bit-precision inference without full network retraining. The main contributions of our approach are the optimization of the constrained MSE problem at each layer of the network, the hardware-aware partitioning of the neural network parameters, and the use of multiple low-precision quantized tensors for poorly approximated layers. The proposed approach allows, for the first time, linear 4-bit integer precision (INT4) quantization for deployment of pretrained models on limited hardware resources.
http://arxiv.org/abs/1902.06822
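A minimal sketch of per-tensor MMSE linear quantization in the spirit of the abstract above: choose the quantization scale that minimizes the mean squared error between a tensor and its INT4 reconstruction. The grid search is a simple illustrative stand-in for the paper's per-layer optimization:

```python
import numpy as np

def quantize(w, scale, n_bits=4):
    """Symmetric linear quantization to n_bits signed integers."""
    qmin, qmax = -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1   # [-8, 7] for INT4
    return np.clip(np.round(w / scale), qmin, qmax) * scale

def mmse_scale(w, n_bits=4, n_grid=200):
    """Grid-search the scale minimizing reconstruction MSE."""
    best_scale, best_mse = None, np.inf
    for s in np.linspace(1e-3, 1.0, n_grid) * np.abs(w).max():
        mse = np.mean((w - quantize(w, s, n_bits)) ** 2)
        if mse < best_mse:
            best_scale, best_mse = s, mse
    return best_scale, best_mse

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=10_000)           # stand-in for a weight tensor
naive = np.abs(w).max() / 7                   # scale that just avoids clipping
s, mse = mmse_scale(w)
print(f"naive MSE: {np.mean((w - quantize(w, naive))**2):.2e}, MMSE: {mse:.2e}")
```

For bell-shaped weight distributions, the MMSE scale typically clips part of the tails in exchange for finer resolution near zero, which is why it beats the naive clipping-free scale.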
In this work, we propose a novel transformation for events from an event camera that is equivariant to optical flow under convolutions in the 3-D spatiotemporal domain. Events are generated by changes in the image, which are typically due to motion, either of the camera or of the scene. As a result, different motions produce different sets of events. For learning-based tasks on a static scene, such as classification, that directly use the events, we must either rely on the learning method to learn the underlying object separately from the motion, or memorize all possible motions for each object with extensive data augmentation. Instead, we propose a novel transformation of the input event data that normalizes the $x$ and $y$ positions by the timestamp of each event. We show that this transformation generates a representation of the events that is equivariant to this motion when the optical flow is constant, allowing a deep neural network to learn the classification task without the need for expensive data augmentation. We test our method on the event-based N-MNIST dataset, as well as a novel dataset, N-MOVING-MNIST, with significantly more variety in motion than the standard N-MNIST dataset. In all sequences, we demonstrate that our transformed network achieves similar or better performance than a network with a standard volumetric event input, and performs significantly better when the test set has a larger set of motions than seen at training.
http://arxiv.org/abs/1902.06820
The thresholded feature has recently emerged as an extremely efficient, yet rough, empirical approximation of the time-consuming sparse coding inference process. Such an approximation has not yet been rigorously examined, and standard dictionaries often lead to non-optimal performance when used for computing thresholded features. In this paper, we first present two theoretical recovery guarantees for the thresholded feature to exactly recover the nonzero support of the sparse code. Motivated by these guarantees, we then formulate the Dictionary Learning for Thresholded Features (DLTF) model, which learns an optimized dictionary for applying the thresholded feature. In particular, for the $(k, 2)$ norm involved, a novel proximal operator with log-linear time complexity $O(m\log m)$ is derived. We evaluate the performance of DLTF on a wide range of synthetic and real-data tasks, where DLTF demonstrates remarkable efficiency, effectiveness, and robustness in all experiments. In addition, we briefly discuss the potential link between DLTF and deep learning building blocks.
http://arxiv.org/abs/1804.05515
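A minimal sketch of the thresholded feature the abstract refers to: a single soft-thresholding of $D^T x$, contrasted with full sparse-coding inference (a few ISTA iterations). The random dictionary and signal are illustrative stand-ins; with a generic dictionary, support recovery is not guaranteed, which is exactly the gap DLTF addresses by learning $D$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 64, 128, 5
D = rng.standard_normal((n, m)); D /= np.linalg.norm(D, axis=0)
z_true = np.zeros(m)
z_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
x = D @ z_true                                # noiseless sparse signal

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Thresholded feature: one-shot O(nm) approximation of the sparse code.
f = soft(D.T @ x, 0.5)

# Reference: ISTA for the lasso, many O(nm) iterations.
L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of gradient
z = np.zeros(m)
for _ in range(200):
    z = soft(z - (D.T @ (D @ z - x)) / L, 0.1 / L)

support = lambda v: set(np.flatnonzero(np.abs(v) > 1e-3))
overlap = len(support(f) & support(z_true))
print(f"|support(f)| = {len(support(f))}, overlap with true support: {overlap}/{k}")
```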
Large-scale annotation of image segmentation datasets is often prohibitively expensive, as it usually requires a huge number of worker hours to obtain high-quality results. Abundant and reliable data has, however, been crucial for the advances on image understanding tasks achieved by deep learning models. In this paper, we introduce FreeLabel, an intuitive open-source web interface that allows users to obtain high-quality segmentation masks from just a few freehand scribbles, in a matter of seconds. The efficacy of FreeLabel is quantitatively demonstrated by experimental results on the PASCAL dataset as well as on a dataset from the agricultural domain. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced and private annotation, and has a modular structure that can be easily adapted to any image dataset.
http://arxiv.org/abs/1902.06806
Many industries are now investing heavily in data science and automation to replace manual tasks and/or to help with decision making, especially in the realm of leveraging computer vision to automate many monitoring, inspection, and surveillance tasks. This has resulted in the emergence of the ‘data scientist’ who is conversant in statistical thinking, machine learning (ML), computer vision, and computer programming. However, as ML becomes more accessible to the general public and more aspects of ML become automated, applications leveraging computer vision are increasingly being created by non-experts with less opportunity for regulatory oversight. This points to the overall need for more educated responsibility for these lay-users of usable ML tools in order to mitigate potentially unethical ramifications. In this paper, we undertake a SWOT analysis to study the strengths, weaknesses, opportunities, and threats of building usable ML tools for mass adoption for important areas leveraging ML such as computer vision. The paper proposes a set of data science literacy criteria for educating and supporting lay-users in the responsible development and deployment of ML applications.
http://arxiv.org/abs/1902.06804
Time-aligned lyrics can enrich the music listening experience by enabling karaoke, text-based song retrieval, intra-song navigation, and other applications. Compared to text-to-speech alignment, lyrics alignment remains highly challenging, despite many attempts to break down the problem by combining numerous sub-modules, including vocal separation and detection. Furthermore, training has required fine-grained annotations to be available in some form. Here, we present a novel system based on a modified Wave-U-Net architecture, which predicts character probabilities directly from raw audio using learnt multi-scale representations of the various signal components. There are no sub-modules whose interdependencies need to be optimized. Our training procedure is designed to work with the weak, line-level annotations available in the real world. With a mean alignment error of 0.35 s on a standard dataset, our system outperforms the state-of-the-art by an order of magnitude.
http://arxiv.org/abs/1902.06797
Voice cloning technologies have found applications in a variety of areas, ranging from personalized speech interfaces to advertisement, robotics, and so on. Existing voice cloning systems are capable of learning speaker characteristics and using trained models to synthesize a person’s voice from only a few audio samples. Advances in cloned speech generation are capable of producing speech that is perceptually indistinguishable from bona-fide speech. These advances pose new security and privacy threats to voice-driven interfaces and speech-based access control systems. State-of-the-art speech synthesis technologies use trained or tuned generative models for cloned speech generation. Trained generative models rely on linear operations, learned weights, and an excitation source for cloned speech synthesis, and these systems leave characteristic artifacts in the synthesized speech. Higher-order spectral analysis is used to capture attributes that differentiate bona-fide from cloned audio. Specifically, quadrature phase coupling (QPC) in the estimated bicoherence, Gaussianity test statistics, and linearity test statistics are used to capture generative model artifacts. The performance of the proposed method is evaluated on cloned audio generated with speaker adaptation- and speaker encoding-based approaches. Experimental results on a dataset consisting of 126 cloned and 8 bona-fide speech samples indicate that the proposed method detects bona-fide and cloned audio with a close-to-perfect detection rate.
http://arxiv.org/abs/1902.06782
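A rough sketch of a segment-averaged bicoherence estimate, the higher-order spectral quantity whose phase coupling (QPC) the method above inspects. The signal, segment length, windowing, and normalization are generic assumptions, not the paper's exact estimator:

```python
import numpy as np

def bicoherence(x, nfft=128):
    """Normalized bicoherence |E[X(k1) X(k2) X*(k1+k2)]| over segment spectra."""
    segs = x[: len(x) // nfft * nfft].reshape(-1, nfft)
    X = np.fft.rfft(segs * np.hanning(nfft), axis=1)     # per-segment spectra
    f = X.shape[1] // 2                                   # keep k1 + k2 in range
    num = np.zeros((f, f), dtype=complex)
    den1 = np.zeros((f, f)); den2 = np.zeros((f, f))
    for k1 in range(f):
        for k2 in range(f):
            t = X[:, k1] * X[:, k2] * np.conj(X[:, k1 + k2])
            num[k1, k2] = t.mean()
            den1[k1, k2] = np.mean(np.abs(X[:, k1] * X[:, k2]) ** 2)
            den2[k1, k2] = np.mean(np.abs(X[:, k1 + k2]) ** 2)
    return np.abs(num) / np.sqrt(den1 * den2 + 1e-12)

# Phase-coupled test signal: the 150 Hz tone sits at 60 + 90 Hz with locked
# phases, so a bicoherence peak appears near the corresponding bin pair.
t = np.arange(16_000) / 1000.0                           # fs = 1 kHz
x = np.cos(2*np.pi*60*t) + np.cos(2*np.pi*90*t) + 0.5*np.cos(2*np.pi*150*t)
b = bicoherence(x)
k1, k2 = np.unravel_index(np.argmax(b), b.shape)
print(f"peak bicoherence {b[k1, k2]:.2f} at bins ({k1}, {k2})")
```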
Human beings can make use of various reactive strategies, e.g., foot location adjustment and upper-body inclination, to keep balance while walking under dynamic disturbances. In this work, we propose a novel Nonlinear Model Predictive Control (NMPC) framework for versatile bipedal gait pattern generation, with the capabilities of footstep adjustment, Center of Mass (CoM) height variation, and angular momentum adaptation. These features are realized by constraining the Zero Moment Point motion while considering the variable CoM height and angular momentum change of the Inverted Pendulum plus Flywheel Model. In addition, the NMPC framework takes into account the constraints on footstep location, CoM vertical motion, upper-body inclination, and joint torques, and is finally formulated as a quadratically constrained quadratic program, which can be solved efficiently by Sequential Quadratic Programming. Using this unified framework, versatile walking patterns that exploit time-varying CoM height trajectories and angular momentum changes can be generated from terrain information alone. Furthermore, improved capability for balance recovery under external pushes is demonstrated through simulation studies.
http://arxiv.org/abs/1902.06770
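The coupling behind this framework is the ZMP of the variable-height inverted pendulum plus flywheel. A standard form of that relation, under the usual flat-ground assumption with the ground at $z = 0$ (sign conventions may differ from the paper's):

```latex
p_x = c_x - \frac{c_z\,\ddot{c}_x + \dot{L}_y/m}{\ddot{c}_z + g},
\qquad
p_y = c_y - \frac{c_z\,\ddot{c}_y - \dot{L}_x/m}{\ddot{c}_z + g}
```

Here $c$ is the CoM, $\dot{L}$ the angular momentum rate of the flywheel, and $p$ the ZMP, which must stay inside the support polygon. Because products of decision variables such as $c_z\,\ddot{c}_x$ appear, the constraint is quadratic, which is what makes the overall program the quadratically constrained quadratic program mentioned above.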
Mobile robots need to create high-definition 3D maps of the environment for applications such as remote surveillance and infrastructure mapping. Accurate semantic processing of the acquired 3D point cloud is critical for allowing the robot to obtain a high-level understanding of the surrounding objects and perform context-aware decision making. Existing techniques for point cloud semantic segmentation are mostly applied on a single-frame or offline basis, with no way to integrate the segmentation results over time. This paper proposes an online method for mobile robots to incrementally build a semantically-rich 3D point cloud of the environment. The proposed deep neural network, MCPNet, is trained to predict class labels and object instance labels for each point in the scanned point cloud in an incremental fashion. A multi-view context pooling (MCP) operator is used to combine point features obtained from multiple viewpoints to improve the classification accuracy. The proposed architecture was trained and evaluated on ray-traced scans derived from the Stanford 3D Indoor Spaces dataset. Results show that the proposed approach led to a 15% improvement in point-wise accuracy and a 7% improvement in NMI compared to the next best online method, with only a 6% drop in accuracy compared to the PointNet-based offline approach.
http://arxiv.org/abs/1902.06768
This contribution examines two radically different explanations of our phenomenal intuitions, one reductive and one strongly non-reductive, and identifies two germane ideas that could benefit many other theories of consciousness. Firstly, the ability of sophisticated agent architectures with a purely physical implementation to support certain functional forms of qualia or proto-qualia appears to entail the possibility of machine consciousness with qualia, not only for reductive theories but also for the non-reductive ones that regard consciousness as ubiquitous in Nature. Secondly, analysis of introspective psychological material seems to hint that, under the threshold of our ordinary waking awareness, there exist further ‘submerged’ or ‘subliminal’ layers of consciousness which constitute a hidden foundation and support, and a further source, of our phenomenal intuitions. These ‘submerged’ layers might help explain certain puzzling phenomena concerning subliminal perception, such as the apparently ‘unconscious’ multisensory integration and learning of subliminal stimuli.
http://arxiv.org/abs/1903.03418
Autonomous agents trained via reinforcement learning present numerous safety concerns: reward hacking, negative side effects, and unsafe exploration, among others. In the context of near-future autonomous agents, operating in environments where humans understand the existing dangers, human involvement in the learning process has proved a promising approach to AI Safety. Here we demonstrate that a precise framework for learning from human input, loosely inspired by the way humans parent children, solves a broad class of safety problems in this context. We show that our Parenting algorithm solves these problems in the relevant AI Safety gridworlds of Leike et al. (2017), that an agent can learn to outperform its parent as it “matures”, and that policies learnt through Parenting are generalisable to new environments.
http://arxiv.org/abs/1902.06766
Large-scale behavioral datasets enable researchers to use complex machine learning algorithms to better predict human behavior, yet this increased predictive power does not always lead to a better understanding of the behavior in question. In this paper, we outline a data-driven, iterative procedure that allows cognitive scientists to use machine learning to generate models that are both interpretable and accurate. We demonstrate this method in the domain of moral decision-making, where standard experimental approaches often identify relevant principles that influence human judgments, but fail to generalize these findings to “real world” situations that place these principles in conflict. The recently released Moral Machine dataset allows us to build a powerful model that can predict the outcomes of these conflicts while remaining simple enough to explain the basis behind human decisions.
http://arxiv.org/abs/1902.06744
Mobile ground robots operating on unstructured terrain must predict which areas of the environment they are able to pass in order to plan feasible paths. We address traversability estimation as a heightmap classification problem: we build a convolutional neural network that, given an image representing the heightmap of a terrain patch, predicts whether the robot will be able to traverse such patch from left to right. The classifier is trained for a specific robot model (wheeled, tracked, legged, snake-like) using simulation data on procedurally generated training terrains; the trained classifier can be applied to unseen large heightmaps to yield oriented traversability maps, and then plan traversable paths. We extensively evaluate the approach in simulation on six real-world elevation datasets, and run a real-robot validation in one indoor and one outdoor environment.
http://arxiv.org/abs/1709.05368
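A minimal sketch of the classification setup described above: a small CNN that maps a single-channel heightmap patch to the probability that the robot can traverse it left to right. The layer sizes and the 64x64 patch size are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TraversabilityNet(nn.Module):
    """Binary classifier over heightmap patches: traversable or not."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(64, 1)        # logit for "traversable"

    def forward(self, patch):               # patch: (B, 1, 64, 64) heightmap
        return self.head(self.features(patch).flatten(1))

net = TraversabilityNet()
patch = torch.randn(8, 1, 64, 64)           # batch of simulated patches
prob = torch.sigmoid(net(patch))            # per-patch traversability
print(prob.shape)                            # torch.Size([8, 1])
```

Evaluating such a classifier on rotated crops of a large heightmap is what yields the oriented traversability maps mentioned above.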
Robots should understand both semantics and physics to be functional in the real world. While robot platforms provide the means for interacting with the physical world, they cannot autonomously acquire object-level semantics without human input. In this paper, we investigate how to minimize human effort and intervention in teaching robots to perform real-world tasks that incorporate semantics. We study this question in the context of visual servoing of mobile robots and propose DIViS, a Domain Invariant policy learning approach for collision-free Visual Servoing. DIViS incorporates high-level semantics from previously collected, static, human-labeled datasets and learns collision-free servoing entirely in simulation, without any real robot data. Nevertheless, DIViS can be deployed directly on a real robot and is capable of servoing to user-specified object categories while avoiding collisions in the real world. DIViS is not constrained to be queried with the final view of the goal; rather, it can robustly servo to image goals taken from the initial robot view under heavy occlusion, without this impairing its ability to maintain a collision-free path. We show the generalization capability of DIViS on real mobile robots in more than 90 real-world test scenarios with various unseen object goals in unstructured environments. DIViS is compared to prior approaches via real-world experiments and rigorous tests in simulation. For supplementary videos, see: \href{https://fsadeghi.github.io/DIViS}{https://fsadeghi.github.io/DIViS}
http://arxiv.org/abs/1902.05947
We tackle the problem of automatically reconstructing a complete 3D model of a scene from a single RGB image. This challenging task requires inferring the shape of both visible and occluded surfaces. Our approach utilizes a viewer-centered, multi-layer representation of scene geometry adapted from recent methods for single-object shape completion. To improve the accuracy of view-centered representations for complex scenes, we introduce a novel “Epipolar Feature Transformer” that transfers convolutional network features from an input view to other virtual camera viewpoints, and thus better covers the 3D scene geometry. Unlike existing approaches that first detect and localize objects in 3D, and then infer object shape using category-specific models, our approach is fully convolutional, end-to-end differentiable, and avoids the resolution and memory limitations of voxel representations. We demonstrate the advantages of multi-layer depth representations and epipolar feature transformers on the reconstruction of a large database of indoor scenes.
http://arxiv.org/abs/1902.06729
Hyperspectral image (HSI) classification is widely used for the analysis of remotely sensed images. Hyperspectral imagery comprises many spectral bands of images. The Convolutional Neural Network (CNN) is one of the most frequently used deep learning methods for visual data processing, and its use for HSI classification is also visible in recent works. These approaches are mostly based on 2D CNNs, whereas HSI classification performance is highly dependent on both spatial and spectral information. Very few methods have utilized 3D CNNs because of the increased computational complexity. This letter proposes a Hybrid Spectral Convolutional Neural Network (HybridSN) for HSI classification. Basically, the HybridSN is a spectral-spatial 3D-CNN followed by a spatial 2D-CNN. The 3D-CNN facilitates joint spatial-spectral feature representation from a stack of spectral bands, and the 2D-CNN on top of it further learns a more abstract spatial representation. Moreover, the use of hybrid CNNs reduces the complexity of the model compared to a 3D-CNN alone. To test the performance of this hybrid approach, very rigorous HSI classification experiments are performed on the Indian Pines, Pavia University, and Salinas Scene remote sensing datasets. The results are compared with state-of-the-art hand-crafted as well as end-to-end deep learning based methods. Very satisfactory performance is obtained using the proposed HybridSN for HSI classification. The source code can be found at \url{https://github.com/gokriznastic/HybridSN}.
http://arxiv.org/abs/1902.06701
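A minimal sketch of the hybrid spectral-spatial idea: 3-D convolutions over the (spectral, height, width) axes of an HSI patch, followed by a reshape that folds the remaining spectral axis into channels for 2-D convolutions. Filter counts and the 30-band, 25x25 patch below are illustrative; see the linked repository for the authors' exact configuration:

```python
import torch
import torch.nn as nn

class HybridSketch(nn.Module):
    def __init__(self, n_classes=16):
        super().__init__()
        self.conv3d = nn.Sequential(             # joint spectral-spatial features
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3)), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=(5, 3, 3)), nn.ReLU())
        self.conv2d = nn.Sequential(             # abstract spatial features
            nn.Conv2d(16 * 20, 64, 3), nn.ReLU())
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                        # x: (B, 1, 30 bands, 25, 25)
        x = self.conv3d(x)                       # -> (B, 16, 20, 21, 21)
        x = x.flatten(1, 2)                      # fold spectral axis into channels
        x = self.conv2d(x)                       # -> (B, 64, 19, 19)
        return self.head(x.mean(dim=(2, 3)))     # global pool + classifier

out = HybridSketch()(torch.randn(2, 1, 30, 25, 25))
print(out.shape)                                 # torch.Size([2, 16])
```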
We report, to our knowledge, the first end-to-end application of Generative Adversarial Networks (GANs) to the synthesis of Optical Coherence Tomography (OCT) images of the retina. Generative models have gained recent attention for the increasingly realistic images they can synthesize, given a sampling of a data type. In this paper, we apply GANs to a sampling distribution of OCTs of the retina. We observe the synthesis of realistic OCT images depicting recognizable pathology such as macular holes, choroidal neovascular membranes, myopic degeneration, cystoid macular edema, and central serous retinopathy, amongst others. This represents the first such report of its kind. Potential applications of this new technology include surgical simulation, treatment planning, disease prognostication, and accelerating the development of new drugs and surgical procedures to treat retinal disease.
http://arxiv.org/abs/1902.06676
In classifier (or regression) fusion, the aim is to combine the outputs of several algorithms to boost overall performance. Standard supervised fusion algorithms often require accurate and precise training labels. However, accurate labels may be difficult to obtain in many remote sensing applications. This paper proposes novel classification and regression fusion models that can be trained given ambiguously and imprecisely labeled training data, in which training labels are associated with sets of data points (i.e., “bags”) instead of individual data points (i.e., “instances”), following a multiple instance learning framework. Experiments were conducted with the proposed algorithms on both synthetic data and applications such as target detection and crop yield prediction from remote sensing data. The proposed algorithms show effective classification and regression performance.
http://arxiv.org/abs/1803.04048
This work investigates segmentation approaches for sentiment analysis of informal short texts in Turkish. The two building blocks of the proposed work are segmentation and a deep neural network model. Segmentation focuses on preprocessing the text with different methods, grouped into four categories: morphological, sub-word, tokenization, and hybrid approaches. We analyze several variants of each of these four methods. The second stage focuses on evaluating the neural model for sentiment analysis. The performance of each segmentation method is evaluated with the Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models proposed in the literature for sentiment classification.
http://arxiv.org/abs/1902.06635
Predicting salient regions in natural images requires the detection of objects that are present in a scene. To develop robust representations for this challenging task, high-level visual features at multiple spatial scales must be extracted and augmented with contextual information. However, existing models aimed at explaining human fixation maps do not incorporate such a mechanism explicitly. Here we propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task. The architecture forms an encoder-decoder structure and includes a module with multiple convolutional layers at different dilation rates to capture multi-scale features in parallel. Moreover, we combine the resulting representations with global scene information for accurately predicting visual saliency. Our model achieves competitive results on two public saliency benchmarks and we demonstrate the effectiveness of the suggested approach on selected examples. The network is based on a lightweight image classification backbone and hence presents a suitable choice for applications with limited computational resources to estimate human fixations across complex natural scenes.
http://arxiv.org/abs/1902.06634
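A minimal sketch of the multi-scale module described above: several convolutions applied in parallel at different dilation rates over the same encoder features, then concatenated. The rates and channel counts are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class MultiDilationBlock(nn.Module):
    """Parallel 3x3 convolutions at growing dilation rates, concatenated."""
    def __init__(self, in_ch=256, branch_ch=64, rates=(1, 4, 8, 12)):
        super().__init__()
        # padding=dilation keeps the spatial size fixed for every branch,
        # so the branch outputs can be concatenated along channels.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r) for r in rates)

    def forward(self, x):
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

feats = torch.randn(1, 256, 40, 40)          # encoder feature map
out = MultiDilationBlock()(feats)
print(out.shape)                              # torch.Size([1, 256, 40, 40])
```

Each branch sees the same input with a different receptive field, which is what lets the model capture multi-scale features in parallel rather than through extra pooling stages.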
This paper is an appendix to the paper “Cut-free Calculi and Relational Semantics for Temporal STIT logics” by Berkel and Lyon, 2019. It provides the completeness proof for the basic STIT logic Ldm (relative to irreflexive, temporal Kripke STIT frames) as well as gives the derivation of the independence of agents axiom for the logic Xstit.
http://arxiv.org/abs/1902.06632
This paper focuses on a comparative evaluation of the most common and modern methods for text classification, including the recent deep learning strategies and ensemble methods. The study is motivated by a challenging real data problem, characterized by high-dimensional and extremely sparse data, deriving from incoming calls to the customer care of an Italian phone company. We will show that deep learning outperforms many classical (shallow) strategies but the combination of shallow and deep learning methods in a unique ensemble classifier may improve the robustness and the accuracy of “single” classification methods.
http://arxiv.org/abs/1902.07068
In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in the context of applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces and sparse rewards.
http://arxiv.org/abs/1708.07902
We report on structural and optical properties of InGaN/GaN thin films, with a 0.46° misalignment between the surface and the (0001) plane, which were grown by metal-organic chemical vapor deposition (MOCVD) on 0.34° miscut sapphire substrates. X-ray diffraction and X-ray reflectivity were used to precisely measure the degree of miscut. Reciprocal space mapping was employed to determine the lattice parameters and strain state of the InGaN layers. Rutherford backscattering spectrometry with channeling was employed to measure their composition and crystalline quality with depth resolution. No strain anisotropy was observed. Polarization-dependent photoluminescence spectroscopy was carried out to examine the effect of the miscut on the bandedge emission of the epilayer.
https://arxiv.org/abs/1902.06592
In this paper, we investigate the reliability of online recognition platforms, Amazon Rekognition and Microsoft Azure, with respect to changes in background, acquisition device, and object orientation. We focus on platforms that are commonly used by the public to better understand their real-world performances. To assess the variation in recognition performance, we perform a controlled experiment by changing the acquisition conditions one at a time. We use three smartphones, one DSLR, and one webcam to capture side views and overhead views of objects in a living room, an office, and photo studio setups. Moreover, we introduce a framework to estimate the recognition performance with respect to backgrounds and orientations. In this framework, we utilize both handcrafted features based on color, texture, and shape characteristics and data-driven features obtained from deep neural networks. Experimental results show that deep learning-based image representations can estimate the recognition performance variation with a Spearman’s rank-order correlation of 0.94 under multifarious acquisition conditions.
http://arxiv.org/abs/1902.06585
Terms of service of on-line platforms too often contain clauses that are potentially unfair to the consumer. We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses. Results show that the proposed system could provide a valuable tool for lawyers and consumers alike.
http://arxiv.org/abs/1805.01217
In image-based camera localization systems, information about the environment is usually stored in some representation, which can be referred to as a map. Conventionally, most map representations are built upon hand-crafted features. Recently, neural networks have attracted attention as a data-driven map representation and have shown promising results in visual localization. However, these neural network maps are generally unreadable and hard to interpret. A readable map is not only accessible to humans, but also provides a way to verify the map when the ground-truth pose is unavailable. To tackle this problem, we propose Generative Map, a new framework for learning human-readable neural network maps. Our framework can be used for localization, like previous learned maps, and also allows us to inspect the map by querying images from specified viewpoints of interest. We combine a generative model with the Kalman filter, which exploits the sequential structure of the localization problem. This also allows our approach to naturally incorporate additional sensor information and a transition model of the system. For evaluation, we use real-world images from the 7-Scenes dataset. We show that our approach can be used for localization tasks. For readability, we demonstrate that our Generative Map can be queried with poses from the test sequence to generate images that closely resemble the true images.
http://arxiv.org/abs/1902.11124
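A minimal sketch of the Kalman filtering that the framework above combines with a generative model: a constant-velocity transition model smooths a sequence of noisy per-frame pose estimates. The matrices and noise levels are illustrative assumptions; the paper filters in its own state space:

```python
import numpy as np

dt = 1.0
F = np.block([[np.eye(2), dt * np.eye(2)],        # constant-velocity transition
              [np.zeros((2, 2)), np.eye(2)]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])      # we only observe position
Q, R = 0.01 * np.eye(4), 0.25 * np.eye(2)         # process / measurement noise

x, P = np.zeros(4), np.eye(4)                     # state: (px, py, vx, vy)
for z in [np.array([1.0, 0.1]), np.array([2.1, 0.2]), np.array([2.9, 0.35])]:
    # Predict: propagate state and covariance through the transition model.
    x, P = F @ x, F @ P @ F.T + Q
    # Update: correct with the per-frame measurement (e.g., a pose estimate
    # derived from the current image).
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
print("filtered position:", x[:2].round(2))
```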
We propose a novel biophysical and dichromatic reflectance model that efficiently characterises spectral skin reflectance. We show how to fit the model to multispectral face images enabling high quality estimation of diffuse and specular shading as well as biophysical parameter maps (melanin and haemoglobin). Our method works from a single image without requiring complex controlled lighting setups yet provides quantitatively accurate reconstructions and qualitatively convincing decomposition and editing.
http://arxiv.org/abs/1902.06557
Data-driven approaches for grasping have shown significant advances recently, but they usually require large amounts of training data. To increase the efficiency of grasp data collection, this paper presents a novel grasp training system covering the whole pipeline from data collection to model inference. The system can collect effective grasp samples with a corrective strategy assisted by an antipodal grasp rule, and we design an affordance interpreter network to predict a pixel-wise grasp affordance map. We define graspability, ungraspability, and background as grasp affordances. The key advantage of our system is that the pixel-level affordance interpreter network, trained with only a small number of grasp samples under the antipodal rule, achieves significant performance on totally unseen objects and backgrounds. Training samples are collected only in simulation. Extensive qualitative and quantitative experiments demonstrate the accuracy and robustness of our proposed approach. In real-world grasp experiments, we achieve a grasp success rate of 93% on a set of household items and 91% on a set of adversarial items with only about 6,300 simulated samples. We also achieve 87% accuracy in cluttered scenarios. Although the model is trained using only RGB images, it also performs well when the background textures change, achieving up to 94% accuracy on the set of adversarial objects, which outperforms current state-of-the-art methods.
http://arxiv.org/abs/1902.06554
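A minimal sketch of the antipodal grasp rule used to label samples in systems like the one above: a two-finger grasp is antipodal when the line between the contact points lies inside both friction cones, i.e. each surface normal nearly opposes the grasp axis. The points, normals, and friction coefficient below are illustrative:

```python
import numpy as np

def is_antipodal(p1, n1, p2, n2, mu=0.4):
    """p1, p2: contact points; n1, n2: outward unit surface normals."""
    axis = (p2 - p1) / np.linalg.norm(p2 - p1)    # grasp axis, contact 1 -> 2
    half_angle = np.arctan(mu)                    # friction cone half-angle
    # The squeezing force at each contact must lie inside its friction cone:
    # +axis within angle of -n1, and -axis within angle of -n2.
    ok1 = np.dot(-n1, axis) >= np.cos(half_angle)
    ok2 = np.dot(n2, axis) >= np.cos(half_angle)
    return ok1 and ok2

# Two opposite faces of a box: a valid antipodal pair.
print(is_antipodal(np.array([0., 0, 0]), np.array([-1., 0, 0]),
                   np.array([0.05, 0, 0]), np.array([1., 0, 0])))    # True
# Perpendicular faces: rejected.
print(is_antipodal(np.array([0., 0, 0]), np.array([-1., 0, 0]),
                   np.array([0.05, 0.05, 0]), np.array([0., 1, 0])))  # False
```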
While modern convolutional neural networks achieve outstanding accuracy on many image classification tasks, they are, compared to humans, much more sensitive to image degradation. Here, we describe a variant of Batch Normalization, LocalNorm, that regularizes the normalization layer in the spirit of Dropout while dynamically adapting to the local image intensity and contrast at test-time. We show that the resulting deep neural networks are much more resistant to noise-induced image degradation, improving accuracy by up to three times, while achieving the same or slightly better accuracy on non-degraded classical benchmarks. In computational terms, LocalNorm adds negligible training cost and little or no cost at inference time, and can be applied to already-trained networks in a straightforward manner.
http://arxiv.org/abs/1902.06550
Stain variation is a phenomenon observed when distinct pathology laboratories stain tissue slides that exhibit similar but not identical color appearance. Due to this color shift between laboratories, convolutional neural networks (CNNs) trained with images from one lab often underperform on unseen images from the other lab. Several techniques have been proposed to reduce the generalization error, mainly grouped into two categories: stain color augmentation and stain color normalization. The former simulates a wide variety of realistic stain variations during training, producing stain-invariant CNNs. The latter aims to match training and test color distributions in order to reduce stain variation. For the first time, we compared some of these techniques and quantified their effect on CNN classification performance using a heterogeneous dataset of hematoxylin and eosin histopathology images from 4 organs and 9 pathology laboratories. Additionally, we propose a novel unsupervised method to perform stain color normalization using a neural network. Based on our experimental results, we provide practical guidelines on how to use stain color augmentation and stain color normalization in future computational pathology applications.
http://arxiv.org/abs/1902.06543
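A minimal sketch of stain color augmentation, one of the two technique families compared above: decompose an H&E image into stain channels, apply a small random gain and offset per stain, and recompose. The perturbation ranges are illustrative assumptions; the paper evaluates several concrete variants:

```python
import numpy as np
from skimage.color import rgb2hed, hed2rgb

def stain_jitter(rgb, sigma=0.05, rng=np.random.default_rng(0)):
    """Randomly perturb the H, E, and D stain channels of an RGB image."""
    hed = rgb2hed(rgb)                                  # RGB -> stain densities
    alpha = rng.uniform(1 - sigma, 1 + sigma, size=3)   # per-stain gain
    beta = rng.uniform(-sigma, sigma, size=3)           # per-stain offset
    return np.clip(hed2rgb(hed * alpha + beta), 0, 1)   # back to valid RGB

img = np.random.rand(64, 64, 3)                # stand-in for an H&E patch
augmented = stain_jitter(img)
print(augmented.shape, augmented.dtype)        # (64, 64, 3) float64
```

Training on such perturbed patches exposes the CNN to a wide range of plausible stain appearances, which is what makes augmentation-based approaches stain-invariant.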
In this paper, we propose a new learning technique named message-dropout to improve the performance for multi-agent deep reinforcement learning under two application scenarios: 1) classical multi-agent reinforcement learning with direct message communication among agents and 2) centralized training with decentralized execution. In the first application scenario of multi-agent systems in which direct message communication among agents is allowed, the message-dropout technique drops out the received messages from other agents in a block-wise manner with a certain probability in the training phase and compensates for this effect by multiplying the weights of the dropped-out block units with a correction probability. The applied message-dropout technique effectively handles the increased input dimension in multi-agent reinforcement learning with communication and makes learning robust against communication errors in the execution phase. In the second application scenario of centralized training with decentralized execution, we particularly consider the application of the proposed message-dropout to Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which uses a centralized critic to train a decentralized actor for each agent. We evaluate the proposed message-dropout technique for several games, and numerical results show that the proposed message-dropout technique with proper dropout rate improves the reinforcement learning performance significantly in terms of the training speed and the steady-state performance in the execution phase.
http://arxiv.org/abs/1902.06527
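A minimal sketch of the block-wise message-dropout described above: during training, each incoming message is zeroed out as a whole block with probability p, and surviving blocks are rescaled so the expected input is unchanged (the inverted-dropout form of the weight correction the abstract mentions). Shapes are illustrative:

```python
import torch

def message_dropout(messages, p=0.3, training=True):
    """messages: (batch, n_senders, msg_dim) received message blocks."""
    if not training or p == 0.0:
        return messages
    # One Bernoulli draw per message block, not per element.
    keep = (torch.rand(messages.shape[:2], device=messages.device) > p).float()
    return messages * keep.unsqueeze(-1) / (1.0 - p)   # drop whole blocks

msgs = torch.randn(4, 3, 8)                # 4 agents, 3 senders, 8-dim messages
out = message_dropout(msgs)
print((out == 0).all(dim=-1).float().mean())   # fraction of dropped blocks, ~p
```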
Deep convolutional network architectures are often assumed to guarantee generalization for small image translations and deformations. In this paper we show that modern CNNs (VGG16, ResNet50, and InceptionResNetV2) can drastically change their output when an image is translated in the image plane by a few pixels, and that this failure of generalization also happens with other realistic small image transformations. Furthermore, the deeper the network the more we see these failures to generalize. We show that these failures are related to the fact that the architecture of modern CNNs ignores the classical sampling theorem so that generalization is not guaranteed. We also show that biases in the statistics of commonly used image datasets makes it unlikely that CNNs will learn to be invariant to these transformations. Taken together our results suggest that the performance of CNNs in object recognition falls far short of the generalization capabilities of humans.
http://arxiv.org/abs/1805.12177
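A quick probe of the translation sensitivity reported above: shift an image by a few pixels and watch the top-1 prediction of a pretrained network. The random input and the omission of ImageNet preprocessing keep the snippet self-contained; substitute a real, normalized photo to reproduce a meaningful version of the effect:

```python
import torch
import torchvision.transforms.functional as TF
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1").eval()
img = torch.rand(1, 3, 224, 224)    # replace with a real, normalized photo
with torch.no_grad():
    for dx in range(0, 9, 2):
        # Translate horizontally by dx pixels; everything else unchanged.
        shifted = TF.affine(img, angle=0.0, translate=[dx, 0],
                            scale=1.0, shear=[0.0])
        probs = model(shifted).softmax(dim=1)
        top = probs.max(dim=1)
        print(f"shift {dx}px: class {top.indices.item()}, "
              f"p={top.values.item():.3f}")
```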
Deep neural networks have recently demonstrated the traffic prediction capability with the time series data obtained by sensors mounted on road segments. However, capturing spatio-temporal features of the traffic data often requires a significant number of parameters to train, increasing computational burden. In this work we demonstrate that embedding topological information of the road network improves the process of learning traffic features. We use a graph of a vehicular road network with recurrent neural networks (RNNs) to infer the interaction between adjacent road segments as well as the temporal dynamics. The topology of the road network is converted into a spatio-temporal graph to form a structural RNN (SRNN). The proposed approach is validated over traffic speed data from the road network of the city of Santander in Spain. The experiment shows that the graph-based method outperforms the state-of-the-art methods based on spatio-temporal images, requiring much fewer parameters to train.
http://arxiv.org/abs/1902.06506
Close human-robot cooperation is a key enabler for new developments in advanced manufacturing and assistive applications. Close cooperation requires robots that can predict human actions and intent and understand human non-verbal cues. Recent approaches based on neural networks have led to encouraging results on the human action prediction problem, in both continuous and discrete spaces. Our approach extends the research in this direction. Our contributions are three-fold. First, we validate the use of gaze and body pose cues as a means of predicting human action through a feature selection method. Next, we address two shortcomings of the existing literature: predicting multiple and variable-length action sequences. This is achieved by introducing an encoder-decoder recurrent neural network topology for the discrete action prediction problem. In addition, we theoretically demonstrate the importance of predicting multiple action sequences as a means of estimating the stochastic reward in a human-robot cooperation scenario. Finally, we show the ability to effectively train the prediction model on an action prediction dataset involving human motion data, and explore the influence of the model’s parameters on its performance. Source code repository: https://github.com/pschydlo/ActionAnticipation
http://arxiv.org/abs/1802.10503